When Batch Isn’t Enough
Data Cloud’s batch ingestion (SFTP files, scheduled pulls from other platforms) is the simplest onboarding path. It’s fine for most use cases — nightly syncs, daily engagement updates, periodic reconciliations.
Streaming ingestion is for the jobs that can’t wait:
- Personalization that must react to an event within seconds.
- Fraud detection on transactions as they happen.
- Abandoned-cart signals driving email within minutes.
- Real-time segmentation for ad bidding.
For these, you use the Ingestion API in streaming mode.
The Ingestion API
The Data Cloud Ingestion API accepts HTTP POSTs to an Ingestion Source. Each Ingestion Source is defined in Data Cloud setup:
- Source type: Ingestion API.
- Schema: the fields you’ll send.
- Data stream name: identifies the stream that lands events as rows in a DLO.
- Authentication: OAuth 2.0 JWT bearer flow with an Ingestion API named credential.
Configured once. Then clients POST events to the endpoint.
The Request Shape
POST /api/v1/ingest/sources/{source-api-name}/{object-name}
Authorization: Bearer {access_token}
Content-Type: application/json
{
  "data": [
    {
      "customer_id": "cust-100",
      "event_type": "page_view",
      "url": "https://example.com/products/42",
      "timestamp": "2026-04-23T14:22:01Z",
      "session_id": "sess-abc"
    }
  ]
}
Each POST can include multiple records in the data array. The endpoint returns a success response once the batch is accepted; actual processing happens asynchronously.
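A minimal Python sketch of this call, assuming the requests library; the tenant host, source API name ("web_events"), object name ("page_view"), and the access-token helper are placeholders, not part of the API:

import requests

INSTANCE = "your-tenant.example.com"   # placeholder Data Cloud tenant host
SOURCE = "web_events"                  # placeholder Ingestion Source API name
OBJECT = "page_view"                   # placeholder object name in that source

def post_events(events, access_token):
    # POST a batch of events; a 2xx only means the batch was accepted,
    # since actual processing is asynchronous.
    url = f"https://{INSTANCE}/api/v1/ingest/sources/{SOURCE}/{OBJECT}"
    response = requests.post(
        url,
        json={"data": events},
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=10,
    )
    response.raise_for_status()
    return response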
Idempotency
Streaming ingestion is at-least-once. A client retry due to a network blip can result in duplicate events landing.
Mitigations:
- Include a unique event ID as a field. Downstream processing (calculated insights, identity resolution) can deduplicate by this ID.
- Use primary key fields with consistent hashing. The Data Cloud ingestion pipeline deduplicates on primary keys when configured.
Design for duplicates. Do not rely on “I’ll only send it once.”
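One way to make retries harmless is to derive the event ID deterministically from the fields that identify the event, so a resend carries the same ID. A sketch, assuming the field names from the request example above:

import uuid

def make_event_id(event):
    # A stable ID derived from identifying fields: resending the same event
    # yields the same ID, so downstream dedup by event_id collapses retries.
    key = f"{event['customer_id']}|{event['event_type']}|{event['timestamp']}|{event.get('session_id', '')}"
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))

event = {
    "customer_id": "cust-100",
    "event_type": "page_view",
    "timestamp": "2026-04-23T14:22:01Z",
    "session_id": "sess-abc",
}
event["event_id"] = make_event_id(event)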
Throughput and Volume
Streaming ingestion is rate-limited per org. Typical limits:
- Requests per second: hundreds to thousands depending on license tier.
- Records per request: up to tens of thousands.
Plan client-side batching. Sending 100 small requests per second is worse than one batched request per second carrying 100 events.
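A rough sketch of that batching, assuming the post_events helper from earlier; the size cap and flush interval are illustrative, not documented limits:

import time

BATCH_SIZE = 100        # illustrative cap per request
FLUSH_INTERVAL = 1.0    # seconds between flushes

_buffer = []
_last_flush = time.monotonic()

def enqueue(event, access_token):
    # Accumulate events and send one batched request instead of many small ones.
    global _last_flush
    _buffer.append(event)
    if len(_buffer) >= BATCH_SIZE or time.monotonic() - _last_flush >= FLUSH_INTERVAL:
        post_events(list(_buffer), access_token)
        _buffer.clear()
        _last_flush = time.monotonic()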
Monitor in Setup → Data Cloud → Ingestion API Monitoring for accepted vs. rejected rates.
Schema Evolution
Streaming ingestion couples tightly to the Data Cloud schema for that source. Adding fields is allowed (backward compatible); removing or renaming fields breaks consumers.
Version your events:
- Include an event_version field.
- Never remove fields from an existing schema version — add new versions instead.
- Consumers decide which versions to accept.
This costs some schema clutter but prevents silent data corruption when the schema changes.
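For illustration (field names assumed, not a prescribed schema), a widened payload ships as a new version alongside the old one rather than mutating it:

# v1 keeps its original shape frozen.
event_v1 = {"event_version": "1", "customer_id": "cust-100", "event_type": "page_view"}

# v2 adds fields under a new version; consumers that only accept v1 keep working.
event_v2 = {"event_version": "2", "customer_id": "cust-100", "event_type": "page_view",
            "referrer": "https://example.com/home"}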
Error Handling
A POST can be rejected for:
- Schema mismatch. Your payload doesn’t match the expected shape.
- Auth failure. Token expired or invalid.
- Rate limit. Too many requests.
- Transient platform errors. 5xx.
Client retry policy:
- 4xx (except 429): don’t retry. Fix the payload or auth.
- 429: retry with exponential backoff.
- 5xx: retry with exponential backoff.
- Network errors: retry with exponential backoff.
Failed events that can’t be ingested should land in a dead-letter queue for manual review. Don’t silently drop.
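A sketch of that policy, assuming the post_events helper from earlier (which raises requests.HTTPError on non-2xx responses) and a hypothetical dead_letter() hook that persists failures for review:

import time
import requests

def send_with_retry(events, access_token, max_attempts=5):
    # Retry 429, 5xx, and network errors with exponential backoff;
    # other 4xx are client bugs and go straight to the dead-letter queue.
    delay = 1.0
    for _ in range(max_attempts):
        try:
            return post_events(events, access_token)
        except requests.HTTPError as err:
            status = err.response.status_code
            if 400 <= status < 500 and status != 429:
                break                      # fix the payload or auth; do not retry
        except requests.ConnectionError:
            pass                           # network blip; retry
        time.sleep(delay)
        delay *= 2                         # exponential backoff
    dead_letter(events)                    # hypothetical hook: persist for manual review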
Common Patterns
1. Web Event Stream
Web clickstream → JavaScript SDK → HTTP POST to Ingestion API → Data Cloud DLO.
Events include customer_id (from cookie or login), event_type, page_url, timestamp.
Downstream: calculated insights compute session counts, most-visited pages, cart activity. Segments trigger on engagement thresholds.
2. App Event Stream
Mobile app → backend → Ingestion API.
Events include app-specific context: screen views, feature usage, transactions.
3. IoT Telemetry
Device events → gateway or IoT platform → Ingestion API.
Use when device behavior drives customer-centric insights (smart home usage, wearable engagement).
4. Transaction Stream
Commerce system → order events → Ingestion API.
Each order or refund event lands as a transaction DLO row. Drives revenue insights, upsell recommendations.
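For illustration, an order event in this pattern might look like the following; the field names are assumptions, not a prescribed schema:

order_event = {
    "event_id": "ord-evt-7781",        # idempotency key, as discussed above
    "customer_id": "cust-100",
    "event_type": "order_placed",      # or "order_refunded"
    "order_id": "ord-5532",
    "amount": 129.95,
    "currency": "USD",
    "timestamp": "2026-04-23T14:25:40Z",
}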
Integration With Calculated Insights
Streaming data becomes useful when insights and segments aggregate it. Real-time insights continuously update; batch insights run on schedule.
Decide per use case:
- Real-time: for sub-minute actions (ad bidding, fraud alerts).
- Batch: for everything else. Batch is cheaper and simpler.
Most streaming data is fine landing in real-time DLOs and being aggregated by batch insights daily. Don’t pay for real-time insight if a daily refresh meets the need.
Monitoring
Ingestion rates: accepted vs. rejected per source. Spikes in rejections indicate client bugs.
Latency: end-to-end time from event emitted to landing in the DLO. For real-time use cases, target < 30 seconds.
Queue depth: if events accumulate in the ingestion queue, downstream consumers are working with stale data. Scale the ingestion capacity or slow the publisher.
Storage growth: streaming sources accumulate data fast. Lifecycle rules (retention, archiving) keep costs sane.
Security
The Ingestion API uses OAuth 2.0 JWT bearer. The JWT is signed with a private key held by the client; Data Cloud verifies with the registered public key.
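A minimal sketch of that exchange with PyJWT, assuming a connected-app-style client ID, a registered RSA key pair, and the standard Salesforce token endpoint; the claim values and endpoint are assumptions to check against current Data Cloud docs, and some setups require a further Data Cloud token exchange after this step:

import time
import jwt        # PyJWT
import requests

def get_access_token(client_id, username, private_key_pem):
    # Sign the JWT assertion with the client-held private key; the platform
    # verifies it against the registered public key and returns a token.
    claims = {
        "iss": client_id,                       # consumer key of the connected app
        "sub": username,                        # integration user
        "aud": "https://login.salesforce.com",  # assumed audience value
        "exp": int(time.time()) + 300,
    }
    assertion = jwt.encode(claims, private_key_pem, algorithm="RS256")
    resp = requests.post(
        "https://login.salesforce.com/services/oauth2/token",
        data={
            "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
            "assertion": assertion,
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]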
Best practices:
- Rotate signing keys annually or on personnel changes.
- Issue separate JWT credentials per client — don’t share.
- Monitor for unusual ingestion patterns that might indicate compromise.
Anti-Patterns
Using streaming for batch data. Nightly syncs don’t need streaming; they need SFTP or file-drop ingestion.
No idempotency keys. Retries create duplicates; dedup at calculation time is expensive.
No batching on the client. One event per request wastes throughput and quota.
Ignoring dead-lettered events. Failed events accumulate and the root cause stays unfixed.
Sending all data to streaming “just in case.” Expensive and complicates the pipeline. Stream only what needs to be real-time.
Frequently Asked Questions
How does streaming ingestion interact with identity resolution?
Streamed records participate in the next resolution run. For identity-critical streaming use cases, configure more frequent resolution schedules.
Is there a native connector for Kafka?
Partner connectors for Kafka exist; direct native Kafka ingestion is on the roadmap. Check current availability.
What file formats does streaming support?
Streaming is JSON over HTTPS. Batch ingestion supports CSV, Parquet, and JSON files.
Can I ingest from multiple clients to the same source?
Yes — multiple clients can POST to the same Ingestion Source. Include a client identifier in events if you need to distinguish.