Full Scan
This is the "last resort" implementation type. It will be used if no other implementation type is provided.
The full scan strategy fetches all records from the external app and compares them with the previously fetched records to determine the changes.
How It Works
- Fetches all records from the collection
- Compares current records with previously stored records
- Detects created, updated, and deleted records by comparing snapshots
- Generates appropriate events for each change
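The comparison steps above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual implementation: it assumes each record has a stable ID and that a snapshot is held as a dictionary keyed by that ID. The function name `diff_snapshots` and the event shape are hypothetical.

```python
from typing import Any

Record = dict[str, Any]


def diff_snapshots(
    previous: dict[str, Record], current: dict[str, Record]
) -> list[dict[str, Any]]:
    """Compare two snapshots keyed by record ID and list the changes."""
    events: list[dict[str, Any]] = []

    # Records in the current snapshot are either new or possibly changed.
    for record_id, record in current.items():
        if record_id not in previous:
            events.append({"type": "created", "id": record_id, "record": record})
        elif record != previous[record_id]:
            events.append({"type": "updated", "id": record_id, "record": record})

    # Records that existed before but are missing now were deleted.
    for record_id, record in previous.items():
        if record_id not in current:
            events.append({"type": "deleted", "id": record_id, "record": record})

    return events
```

Because both snapshots must be held in full to run this comparison, memory use and runtime grow with the size of the collection.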
Configuration Example
# spec.yml
name: Collection Name
events:
  created:
    implementationType: full-scan
    parameters:
      ignoredFields:
        - lastViewedAt
        - downloadUrl
        - temporaryToken
  updated:
    implementationType: full-scan
    parameters:
      ignoredFields:
        - lastViewedAt
        - downloadUrl
        - temporaryToken
  deleted:
    implementationType: full-scan
Parameters
ignoredFields: a list of fields to ignore when comparing records. For example, dynamically generated fields such as download URLs or last-viewed timestamps should be ignored to avoid false update events.
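To illustrate why ignored fields matter, here is a small sketch of a comparison that skips them. It assumes ignored fields are matched by top-level field name only; whether the real implementation also supports nested paths is not stated here. The helper names `strip_ignored` and `records_differ` are hypothetical.

```python
from typing import Any, Iterable


def strip_ignored(record: dict[str, Any], ignored: set[str]) -> dict[str, Any]:
    """Return a copy of the record without the ignored top-level fields."""
    return {key: value for key, value in record.items() if key not in ignored}


def records_differ(
    old: dict[str, Any], new: dict[str, Any], ignored_fields: Iterable[str] = ()
) -> bool:
    """True if the records differ in any field that is not ignored."""
    ignored = set(ignored_fields)
    return strip_ignored(old, ignored) != strip_ignored(new, ignored)


old = {"id": 1, "name": "Report", "downloadUrl": "https://example.com/a"}
new = {"id": 1, "name": "Report", "downloadUrl": "https://example.com/b"}

# Without ignoredFields, the regenerated URL looks like an update.
print(records_differ(old, new))
# With downloadUrl ignored, the records compare as unchanged.
print(records_differ(old, new, ["downloadUrl"]))
```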
Limitations
Because a full scan must read the entire collection and compare the complete list of records from the previous read with the current one, it can become quite resource-intensive.
To avoid unexpected excessive use of resources (worker time, API calls), each full scan event pull is limited to:
- 100,000 records in a collection (to avoid running out of memory)
- 1,000 pages of records in a collection (to avoid excessive use of API requests)
- 20 minutes of runtime of pulling all collection records (to avoid unexpected cost associated with long-running background tasks)
Whenever one of these limits is reached, the full scan pull fails.
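The way such limits abort a pull can be sketched as below. This is an illustration under assumptions: the page source is any iterable of record lists, the constants mirror the limits listed above, and the exact boundary behavior (failing at the limit versus just past it) is a guess. `fetch_all_pages` and `FullScanLimitExceeded` are hypothetical names.

```python
import time
from typing import Any, Callable, Iterable

MAX_RECORDS = 100_000          # guards against running out of memory
MAX_PAGES = 1_000              # guards against excessive API requests
MAX_RUNTIME_SECONDS = 20 * 60  # guards against long-running background tasks


class FullScanLimitExceeded(Exception):
    """Raised when a full scan pull goes past one of its limits."""


def fetch_all_pages(
    pages: Iterable[list[dict[str, Any]]],
    max_records: int = MAX_RECORDS,
    max_pages: int = MAX_PAGES,
    max_runtime: float = MAX_RUNTIME_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> list[dict[str, Any]]:
    """Collect every record from every page, failing once a limit is passed."""
    started = clock()
    records: list[dict[str, Any]] = []
    for page_number, page in enumerate(pages, start=1):
        if page_number > max_pages:
            raise FullScanLimitExceeded(f"more than {max_pages} pages")
        records.extend(page)
        if len(records) > max_records:
            raise FullScanLimitExceeded(f"more than {max_records} records")
        if clock() - started > max_runtime:
            raise FullScanLimitExceeded(f"ran longer than {max_runtime} seconds")
    return records
```

In this sketch the pull fails as a whole: no partial list of records is returned once a limit is passed.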
When to Use
- Last resort only - when no other implementation type is available
- Small to medium-sized collections (under 100k records)
- APIs without webhook or change tracking support
- Simple data structures without frequent updates
Not Recommended For
- Large collections (>100k records)
- Frequently updated data
- APIs with rate limiting
- Production systems with high performance requirements
Alternative Recommendations
If you expect your collections to be larger than the limits, we recommend implementing periodic full re-sync of the whole collection into your database instead of relying on events generated by the full scan.
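One possible shape for such a periodic re-sync is sketched below, using SQLite purely for illustration. It assumes each record carries an `id` field and that the records arrive as an iterable you obtain from the external app yourself. The table layout and the `resync` function are hypothetical, not part of any platform API.

```python
import json
import sqlite3
from typing import Any, Iterable


def resync(conn: sqlite3.Connection, records: Iterable[dict[str, Any]]) -> None:
    """Replace the stored copy of a collection with a fresh full read."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "id TEXT PRIMARY KEY, data TEXT NOT NULL, sync_run INTEGER NOT NULL)"
    )
    (last_run,) = conn.execute(
        "SELECT COALESCE(MAX(sync_run), 0) FROM records"
    ).fetchone()
    run = last_run + 1

    with conn:  # commit on success, roll back if anything raises
        for record in records:
            # Insert new records and overwrite existing ones in place.
            conn.execute(
                "INSERT OR REPLACE INTO records (id, data, sync_run) VALUES (?, ?, ?)",
                (str(record["id"]), json.dumps(record), run),
            )
        # Rows not touched by this run no longer exist in the source.
        conn.execute("DELETE FROM records WHERE sync_run < ?", (run,))
```

Because records are written as they are read, this approach does not need to hold two complete snapshots in memory, which is why it can scale past the full scan limits.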
Best Practices
- Use ignoredFields to exclude dynamic or frequently changing fields