When Batch Isn’t Enough
Data Cloud’s batch ingestion (SFTP files, scheduled pulls from other platforms) is the simplest onboarding path. It’s fine for most use cases — nightly syncs, daily engagement updates, periodic reconciliations.
Streaming ingestion is for the jobs that can’t wait:
- Personalization that must react to an event within seconds.
- Fraud detection on transactions as they happen.
- Abandoned-cart signals driving email within minutes.
- Real-time segmentation for ad bidding.
For these, you use the Ingestion API in streaming mode.
The Ingestion API
The Data Cloud Ingestion API accepts HTTP POSTs to an Ingestion Source. Each Ingestion Source is defined in Data Cloud setup:
- Source type: Ingestion API.
- Schema: the fields you’ll send.
- Data stream name: identifies the stream that lands events as rows in a DLO.
- Authentication: OAuth 2.0 JWT bearer flow with an Ingestion API named credential.
Configured once. Then clients POST events to the endpoint.
The Request Shape
POST /api/v1/ingest/sources/{source-api-name}/{object-name}
Authorization: Bearer {access_token}
Content-Type: application/json
{
  "data": [
    {
      "customer_id": "cust-100",
      "event_type": "page_view",
      "url": "https://example.com/products/42",
      "timestamp": "2026-04-23T14:22:01Z",
      "session_id": "sess-abc"
    }
  ]
}
Each POST can include multiple records in the data array. The endpoint returns a success response once the batch is accepted; actual processing happens asynchronously.
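A minimal Python sketch of this call, assuming the requests library; the tenant host, source API name ("web_events"), object name ("page_view"), and the access-token helper are placeholders, not part of the API:

import requests

INSTANCE = "your-tenant.example.com"   # placeholder Data Cloud tenant host
SOURCE = "web_events"                  # placeholder Ingestion Source API name
OBJECT = "page_view"                   # placeholder object name in that source

def post_events(events, access_token):
    # POST a batch of events; a 2xx only means the batch was accepted,
    # since actual processing is asynchronous.
    url = f"https://{INSTANCE}/api/v1/ingest/sources/{SOURCE}/{OBJECT}"
    response = requests.post(
        url,
        json={"data": events},
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=10,
    )
    response.raise_for_status()
    return response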
Idempotency
Streaming ingestion is at-least-once. A client retry due to a network blip can result in duplicate events landing.
Mitigations:
- Include a unique event ID as a field. Downstream processing (calculated insights, identity resolution) can deduplicate by this ID.
- Use primary key fields with consistent hashing. The Data Cloud ingestion pipeline deduplicates on primary keys when configured.
Design for duplicates. Do not rely on “I’ll only send it once.”
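One way to make retries harmless is to derive the event ID deterministically from the fields that identify the event, so a resend carries the same ID. A sketch, assuming the field names from the request example above:

import uuid

def make_event_id(event):
    # A stable ID derived from identifying fields: resending the same event
    # yields the same ID, so downstream dedup by event_id collapses retries.
    key = f"{event['customer_id']}|{event['event_type']}|{event['timestamp']}|{event.get('session_id', '')}"
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))

event = {
    "customer_id": "cust-100",
    "event_type": "page_view",
    "timestamp": "2026-04-23T14:22:01Z",
    "session_id": "sess-abc",
}
event["event_id"] = make_event_id(event)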
Throughput and Volume
Streaming ingestion is rate-limited per org. Typical limits:
- Requests per second: hundreds to thousands depending on license tier.
- Records per request: up to tens of thousands.
Plan client-side batching. Sending 100 small requests per second is worse than one batched request per second carrying 100 events.
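A rough sketch of that batching, assuming the post_events helper from earlier; the size cap and flush interval are illustrative, not documented limits:

import time

BATCH_SIZE = 100        # illustrative cap per request
FLUSH_INTERVAL = 1.0    # seconds between flushes

_buffer = []
_last_flush = time.monotonic()

def enqueue(event, access_token):
    # Accumulate events and send one batched request instead of many small ones.
    global _last_flush
    _buffer.append(event)
    if len(_buffer) >= BATCH_SIZE or time.monotonic() - _last_flush >= FLUSH_INTERVAL:
        post_events(list(_buffer), access_token)
        _buffer.clear()
        _last_flush = time.monotonic()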
Monitor in Setup → Data Cloud → Ingestion API Monitoring for accepted vs. rejected rates.
Schema Evolution
Streaming ingestion couples tightly to the Data Cloud schema for that source. Adding fields is allowed (backward compatible); removing or renaming fields breaks consumers.
Version your events:
- Include an event_version field.
- Never remove fields from an existing schema version — add new versions instead.
- Consumers decide which versions to accept.
This costs some schema clutter but prevents silent data corruption when the schema changes.
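For illustration (field names assumed, not a prescribed schema), a widened payload ships as a new version alongside the old one rather than mutating it:

# v1 keeps its original shape frozen.
event_v1 = {"event_version": "1", "customer_id": "cust-100", "event_type": "page_view"}

# v2 adds fields under a new version; consumers that only accept v1 keep working.
event_v2 = {"event_version": "2", "customer_id": "cust-100", "event_type": "page_view",
            "referrer": "https://example.com/home"}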
Error Handling
A POST can be rejected for:
- Schema mismatch. Your payload doesn’t match the expected shape.
- Auth failure. Token expired or invalid.
- Rate limit. Too many requests.
- Transient platform errors. 5xx.
Client retry policy:
- 4xx (except 429): don’t retry. Fix the payload or auth.
- 429: retry with exponential backoff.
- 5xx: retry with exponential backoff.
- Network errors: retry with exponential backoff.
Failed events that can’t be ingested should land in a dead-letter queue for manual review. Don’t silently drop.
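A sketch of that policy, assuming the post_events helper from earlier (which raises requests.HTTPError on non-2xx responses) and a hypothetical dead_letter() hook that persists failures for review:

import time
import requests

def send_with_retry(events, access_token, max_attempts=5):
    # Retry 429, 5xx, and network errors with exponential backoff;
    # other 4xx are client bugs and go straight to the dead-letter queue.
    delay = 1.0
    for _ in range(max_attempts):
        try:
            return post_events(events, access_token)
        except requests.HTTPError as err:
            status = err.response.status_code
            if 400 <= status < 500 and status != 429:
                break                      # fix the payload or auth; do not retry
        except requests.ConnectionError:
            pass                           # network blip; retry
        time.sleep(delay)
        delay *= 2                         # exponential backoff
    dead_letter(events)                    # hypothetical hook: persist for manual review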
Common Patterns
1. Web Event Stream
Web clickstream → JavaScript SDK → HTTP POST to Ingestion API → Data Cloud DLO.
Events include customer_id (from cookie or login), event_type, page_url, timestamp.
Downstream: calculated insights compute session counts, most-visited pages, cart activity. Segments trigger on engagement thresholds.
2. App Event Stream
Mobile app → backend → Ingestion API.
Events include app-specific context: screen views, feature usage, transactions.
3. IoT Telemetry
Device events → gateway or IoT platform → Ingestion API.
Use when device behavior drives customer-centric insights (smart home usage, wearable engagement).
4. Transaction Stream
Commerce system → order events → Ingestion API.
Each order or refund event lands as a transaction DLO row. Drives revenue insights, upsell recommendations.
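For illustration, an order event in this pattern might look like the following; the field names are assumptions, not a prescribed schema:

order_event = {
    "event_id": "ord-evt-7781",        # idempotency key, as discussed above
    "customer_id": "cust-100",
    "event_type": "order_placed",      # or "order_refunded"
    "order_id": "ord-5532",
    "amount": 129.95,
    "currency": "USD",
    "timestamp": "2026-04-23T14:25:40Z",
}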
Integration With Calculated Insights
Streaming data becomes useful when insights and segments aggregate it. Real-time insights continuously update; batch insights run on schedule.
Decide per use case:
- Real-time: for sub-minute actions (ad bidding, fraud alerts).
- Batch: for everything else. Batch is cheaper and simpler.
Most streaming data is fine landing in real-time DLOs and being aggregated by batch insights daily. Don’t pay for real-time insight if a daily refresh meets the need.
Monitoring
Ingestion rates: accepted vs. rejected per source. Spikes in rejections indicate client bugs.
Latency: end-to-end time from event emitted to landing in the DLO. For real-time use cases, target < 30 seconds.
Queue depth: if events accumulate in the ingestion queue, downstream consumers are working with stale data. Scale the ingestion capacity or slow the publisher.
Storage growth: streaming sources accumulate data fast. Lifecycle rules (retention, archiving) keep costs sane.
Security
The Ingestion API uses OAuth 2.0 JWT bearer. The JWT is signed with a private key held by the client; Data Cloud verifies with the registered public key.
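A minimal sketch of that exchange with PyJWT, assuming a connected-app-style client ID, a registered RSA key pair, and the standard Salesforce token endpoint; the claim values and endpoint are assumptions to check against current Data Cloud docs, and some setups require a further Data Cloud token exchange after this step:

import time
import jwt        # PyJWT
import requests

def get_access_token(client_id, username, private_key_pem):
    # Sign the JWT assertion with the client-held private key; the platform
    # verifies it against the registered public key and returns a token.
    claims = {
        "iss": client_id,                       # consumer key of the connected app
        "sub": username,                        # integration user
        "aud": "https://login.salesforce.com",  # assumed audience value
        "exp": int(time.time()) + 300,
    }
    assertion = jwt.encode(claims, private_key_pem, algorithm="RS256")
    resp = requests.post(
        "https://login.salesforce.com/services/oauth2/token",
        data={
            "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
            "assertion": assertion,
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]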
Best practices:
- Rotate signing keys annually or on personnel changes.
- Issue separate JWT credentials per client — don’t share.
- Monitor for unusual ingestion patterns that might indicate compromise.
Anti-Patterns
Using streaming for batch data. Nightly syncs don’t need streaming; they need SFTP or file-drop ingestion.
No idempotency keys. Retries create duplicates; dedup at calculation time is expensive.
No batching on the client. One event per request wastes throughput and quota.
Ignoring dead-lettered events. Failed events accumulate and the root cause stays unfixed.
Sending all data to streaming “just in case.” Expensive and complicates the pipeline. Stream only what needs to be real-time.
Frequently Asked Questions
How does streaming ingestion interact with identity resolution?
Streamed records participate in the next resolution run. For identity-critical streaming use cases, configure more frequent resolution schedules.
Is there a native connector for Kafka?
Partner connectors for Kafka exist; direct native Kafka ingestion is on the roadmap. Check current availability.
What file formats does streaming support?
Streaming is JSON over HTTPS. Batch ingestion supports CSV, Parquet, and JSON files.
Can I ingest from multiple clients to the same source?
Yes — multiple clients can POST to the same Ingestion Source. Include a client identifier in events if you need to distinguish.