Streaming data refers to continuous, real-time data flows generated by sources such as sensors, applications, devices, and systems. Unlike batch data, which is collected and processed at scheduled intervals, streaming data is processed incrementally as each record arrives, so insights and actions can follow almost immediately.
Key characteristics of streaming data include:

- High-speed ingestion, often thousands to millions of events per second.
- Massive, continuously growing data volumes that require scalable processing infrastructure.
- Unbounded streams with no natural end, which require long-running processing pipelines.
- Latency requirements ranging from sub-second down to milliseconds for time-sensitive applications.
- Event order and temporal relationships that must be preserved.
- Delivery guarantees, typically at-least-once or exactly-once semantics.
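The ordering and delivery guarantees above interact in practice: an at-least-once transport can redeliver events and deliver them out of order, so consumers often deduplicate and reorder before processing. The sketch below illustrates one simple way to do that. It assumes, hypothetically, that every event carries a unique `event_id` and a gap-free per-source sequence number `seq`; these fields, and the `Event` and `process_in_order` names, are illustrative and not taken from any particular broker.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Set


@dataclass(frozen=True)
class Event:
    event_id: str  # hypothetical unique ID assigned by the producer
    seq: int       # hypothetical gap-free per-source sequence number
    payload: float


def process_in_order(events: Iterable[Event], start_seq: int = 0) -> Iterator[Event]:
    """Yield each event once, in sequence order, from an at-least-once,
    possibly out-of-order stream."""
    seen: Set[str] = set()           # IDs already accepted (unbounded in this sketch)
    pending: Dict[int, Event] = {}   # events held back until earlier gaps are filled
    next_seq = start_seq
    for event in events:
        if event.event_id in seen or event.seq < next_seq:
            continue  # redelivered duplicate or already-emitted position: drop it
        seen.add(event.event_id)
        pending[event.seq] = event
        # Release the contiguous run of events starting at next_seq.
        while next_seq in pending:
            yield pending.pop(next_seq)
            next_seq += 1
```

Here the stream `[a(0), c(2), b(1), c(2)]` comes out as `a, b, c`: event `c` waits until `b` fills the gap, and the redelivered `c` is dropped. A production consumer would also bound the `seen` set (for example by expiring old IDs) and handle sequence gaps that never fill.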
| Aspect | Streaming | Batch |
|---|---|---|
| Data Processing | Continuous, real-time | Periodic, scheduled |
| Latency | Milliseconds to seconds | Minutes to hours |
| Data Size | Small, incremental | Large, accumulated |
| Use Case | Real-time analytics, monitoring | Historical analysis, reporting |
| Complexity | Higher (state management, windowing) | Lower (simpler logic) |
| Resource Usage | Continuous consumption | Periodic spikes |
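The table lists state management and windowing as sources of streaming complexity. A tumbling window, a common windowing scheme, groups an unbounded stream into fixed, non-overlapping time intervals and keeps per-window state until the window closes. The following minimal sketch counts events per key in tumbling windows. It assumes events arrive in timestamp order with integer millisecond timestamps. Real stream processors such as Flink also handle late and out-of-order events (for example with watermarks), which this sketch omits.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Iterator, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_ms: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Count events per key in fixed, non-overlapping (tumbling) time windows.

    Assumes events arrive in timestamp order. A window is emitted once an event
    from a later window arrives; the final window is flushed at end of input.
    """
    current_start = None
    counts: DefaultDict[str, int] = defaultdict(int)  # per-window operator state
    for ts_ms, key in events:
        start = ts_ms - ts_ms % window_ms  # align the timestamp to its window start
        if current_start is None:
            current_start = start
        elif start != current_start:
            yield current_start, dict(counts)  # window closed: emit and reset state
            counts.clear()
            current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)
```

With one-second windows, events at 0, 500, 999, 1000 and 2500 ms fall into the windows starting at 0, 1000 and 2000 ms. Windows that receive no events produce no output in this sketch.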
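Lambda architecture combines a batch processing layer and a stream processing layer, so that accurate results recomputed over the full history can be served together with low-latency results for recent data.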
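The Lambda idea can be illustrated with a toy page-view counter. In this sketch, the batch layer periodically recomputes counts from an append-only master dataset, the speed layer counts events that arrived since the last batch run, and a query merges both views. The class name and the page-view domain are hypothetical, and everything runs in memory; the sketch also assumes no events arrive while a batch run is in progress, which real systems must handle.

```python
from collections import Counter
from typing import List


class LambdaPageViews:
    """Toy Lambda-style page-view counter (hypothetical example domain)."""

    def __init__(self) -> None:
        self.master: List[str] = []              # append-only master dataset
        self.batch_view: Counter = Counter()     # output of the last batch run
        self.realtime_view: Counter = Counter()  # speed-layer increments since then

    def ingest(self, page: str) -> None:
        self.master.append(page)       # retained for the next batch recomputation
        self.realtime_view[page] += 1  # speed layer reflects the event immediately

    def run_batch(self) -> None:
        # Batch layer: recompute from the full dataset, then drop the speed-layer
        # state it now covers (assumes no events arrive during the run).
        self.batch_view = Counter(self.master)
        self.realtime_view.clear()

    def query(self, page: str) -> int:
        # Serving layer: merge the batch and real-time views at query time.
        return self.batch_view[page] + self.realtime_view[page]
```

The cost of this design is visible even in the toy: the same counting logic exists twice, once in `run_batch` and once in `ingest`, and both must stay consistent.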
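Kappa architecture is a stream-first simplification of Lambda that drops the separate batch layer and handles both real-time and historical processing in a single streaming pipeline.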
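In a Kappa-style design, historical reprocessing is typically done by replaying a durable, retained event log through the (possibly updated) streaming job rather than running a separate batch job. The sketch below models this with an in-memory `EventLog` standing in for a broker topic and two hypothetical versions of the job logic; all names and the purchase-event fields `user_id` and `amount` are illustrative assumptions.

```python
from typing import Callable, Dict, List

Event = Dict[str, int]
View = Dict[int, int]


class EventLog:
    """Append-only, replayable log standing in for a durable broker topic."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the appended event

    def read_from(self, offset: int) -> List[Event]:
        return self._events[offset:]


def rebuild_view(log: EventLog, update: Callable[[View, Event], None]) -> View:
    """Recompute a materialized view by replaying the log from offset 0."""
    view: View = {}
    for event in log.read_from(0):
        update(view, event)
    return view


# Version 1 of the streaming job: total amount per user.
def total_per_user(view: View, event: Event) -> None:
    view[event["user_id"]] = view.get(event["user_id"], 0) + event["amount"]


# Version 2 with changed logic: number of large purchases per user.
def large_purchases_per_user(view: View, event: Event) -> None:
    if event["amount"] >= 100:
        view[event["user_id"]] = view.get(event["user_id"], 0) + 1
```

Deploying version 2 then means replaying the same log through `large_purchases_per_user`; there is only one code path to maintain, at the cost of retaining enough history in the log to replay.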
Event-driven microservices are distributed services that communicate by publishing events to, and consuming events from, shared event streams.
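The decoupling this gives can be sketched with a minimal in-memory publish/subscribe bus: services know only topic names, not each other. The `EventBus`, `InventoryService` and `NotificationService` classes and the topic names are hypothetical. Delivery here is synchronous and nothing is persisted, whereas a real broker such as Kafka or Pulsar delivers asynchronously and stores events durably.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class EventBus:
    """In-memory stand-in for a broker: publish to topics, subscribe handlers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Dict[str, Any]) -> None:
        for handler in self._subscribers[topic]:
            handler(event)  # synchronous delivery, for illustration only


class InventoryService:
    """Hypothetical service: reacts to orders and emits stock changes."""

    def __init__(self, bus: EventBus) -> None:
        self.stock: Dict[str, int] = {"widget": 10}
        self.bus = bus
        bus.subscribe("order_placed", self.on_order_placed)

    def on_order_placed(self, event: Dict[str, Any]) -> None:
        sku = event["sku"]
        self.stock[sku] -= event["qty"]
        self.bus.publish("stock_changed", {"sku": sku, "remaining": self.stock[sku]})


class NotificationService:
    """Hypothetical service: alerts when stock runs low."""

    def __init__(self, bus: EventBus) -> None:
        self.sent: List[str] = []
        bus.subscribe("stock_changed", self.on_stock_changed)

    def on_stock_changed(self, event: Dict[str, Any]) -> None:
        if event["remaining"] < 5:
            self.sent.append(f"Low stock: {event['sku']}")
```

Adding another consumer of `stock_changed`, such as a reporting service, requires no change to `InventoryService`, which is the main appeal of the pattern.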
| Technology | Type | Key Strength |
|---|---|---|
| Apache Kafka | Message Broker | High-throughput, durable event streaming |
| Apache Flink | Stream Processor | Stateful computations, exactly-once semantics |
| Apache Pulsar | Message Broker | Multi-tenancy, geo-replication |
| Apache Spark Streaming | Stream Processor | Unified batch and stream processing |
| Amazon Kinesis | Cloud Service | Managed streaming on AWS |
| Apache Storm | Stream Processor | Low-latency distributed processing |
Adopting streaming offers several benefits:

- Make decisions based on current data rather than yesterday's batch reports.
- Process data incrementally instead of repeatedly scanning the full dataset.
- Deliver real-time features such as live dashboards and instant notifications.
- Handle growing data volumes by scaling horizontally across clusters.
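Incremental processing avoids full-dataset scans because many aggregates can be updated in constant time per event. As one illustration, Welford's online algorithm, chosen here as an example and not implied by the text, maintains a running count, mean and variance without storing or revisiting earlier values. The `RunningStats` class name is hypothetical.

```python
class RunningStats:
    """Running count, mean and population variance via Welford's online algorithm.

    Each update is O(1) and uses only the new value, so no rescan of history is needed.
    """

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses the already-updated mean

    @property
    def variance(self) -> float:
        # Population variance; 0.0 until at least one value has been seen.
        return self._m2 / self.n if self.n else 0.0
```

Feeding the values 2, 4, 4, 4, 5, 5, 7, 9 one at a time yields a mean of 5 and a population variance of 4, the same result a full scan would give, while the state stays three numbers regardless of stream length.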
This standard defines: