Upgrading to v3
This page summarizes the breaking changes between Apify Python SDK v2.x and v3.0.
Python version support
Support for Python 3.9 has been dropped. The Apify Python SDK v3.x now requires Python 3.10 or later. Make sure your environment is running a compatible version before upgrading.
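Before upgrading, you can add a small guard to a project's entry point so it fails early on an unsupported interpreter. The sketch below uses only the standard library. The helper name meets_minimum is hypothetical and is not part of the Apify SDK. It deliberately avoids newer syntax so the check itself still runs on older interpreters.

```python
import sys

# Minimum Python version required by Apify Python SDK v3.x, per this guide.
MIN_PYTHON = (3, 10)


def meets_minimum(version_info=tuple(sys.version_info)):
    """Return True if the given (major, minor, ...) version is 3.10 or later."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == '__main__':
    if not meets_minimum():
        raise SystemExit(f'Python {sys.version.split()[0]} is too old; Apify SDK v3.x needs 3.10+.')
    print('Python version OK for Apify SDK v3.x')
```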
Changes in storages
Apify Python SDK v3.0 includes Crawlee v1.0, which brings significant changes to the storage APIs. In Crawlee v1.0, the Dataset, KeyValueStore, and RequestQueue storage APIs have been updated for consistency and simplicity. Below is a detailed overview of what's new, what's changed, and what's been removed.
See Crawlee's Storages guide for more details.
Dataset
The Dataset API now includes several new methods, such as:
- get_metadata - retrieves metadata information for the dataset.
- purge - completely clears the dataset, including all items (keeps the metadata only).
- list_items - returns the dataset's items in a list format.
Some older methods have been removed or replaced:
- The from_storage_object constructor has been removed. Use the open method with either a name or an id parameter instead.
- The get_info method and the storage_object property have been replaced by the new get_metadata method.
- The set_metadata method has been removed.
- The write_to_json and write_to_csv methods have been removed; use the export_to method to export data in different formats instead.
Key-value store
The KeyValueStore API now includes several new methods, such as:
- get_metadata - retrieves metadata information for the key-value store.
- purge - completely clears the key-value store, removing all keys and values (keeps the metadata only).
- delete_value - deletes a specific key and its associated value.
- list_keys - lists all keys in the key-value store.
Some older methods have been removed or replaced:
- from_storage_object - removed; use the open method with either a name or an id instead.
- get_info and storage_object - replaced by the new get_metadata method.
- The set_metadata method has been removed.
Request queue
The RequestQueue API now includes several new methods, such as:
- get_metadata - retrieves metadata information for the request queue.
- purge - completely clears the request queue, including all pending and processed requests (keeps the metadata only).
- add_requests - replaces the previous add_requests_batched method, offering the same functionality under a simpler name.
Some older methods have been removed or replaced:
- from_storage_object - removed; use the open method with either a name or an id instead.
- get_info and storage_object - replaced by the new get_metadata method.
- get_request now takes a unique_key argument instead of request_id, because the id field was removed from Request.
- The set_metadata method has been removed.
Some changes in the related model classes:
- resource_directory in RequestQueueMetadata - removed; use the corresponding path_to_* property instead.
- stats field in RequestQueueMetadata - removed as it was unused.
- RequestQueueHead - replaced by RequestQueueHeadWithLocks.
Removed Actor.config property
The Actor.config property has been removed. Use Actor.configuration instead.
Actor initialization and ServiceLocator changes
Actor initialization and the setup of the global service_locator services are now stricter and more predictable.
- Services in Actor can't be changed after calling Actor.init, after entering the async with Actor context manager, or after they have been requested from the Actor.
- Services in Actor can be different from the services used by a crawler.
Now (v3.0):
```python
import asyncio

from crawlee.configuration import Configuration
from crawlee.crawlers import BasicCrawler
from crawlee.events import LocalEventManager
from crawlee.storage_clients import MemoryStorageClient

from apify import Actor


async def main() -> None:
    async with Actor():
        # This crawler uses the same services as the Actor and the global service_locator.
        crawler_1 = BasicCrawler()

        # This crawler uses its own custom services.
        custom_configuration = Configuration()
        custom_event_manager = LocalEventManager.from_config(custom_configuration)
        custom_storage_client = MemoryStorageClient()

        crawler_2 = BasicCrawler(
            configuration=custom_configuration,
            event_manager=custom_event_manager,
            storage_client=custom_storage_client,
        )


if __name__ == '__main__':
    asyncio.run(main())
```
Default storage ids in configuration changed to None
- Configuration.default_key_value_store_id changed from 'default' to None.
- Configuration.default_dataset_id changed from 'default' to None.
- Configuration.default_request_queue_id changed from 'default' to None.
Previously, using the default storage without specifying its ID in Configuration led to a specific storage with the ID 'default'. Now, a newly created unnamed storage is used, with its ID assigned by the Apify platform. Consecutive calls to get the default storage return the same storage.