📚 Learning Resources

Core Technologies

Apache Kafka

Distributed event streaming platform capable of handling trillions of events per day.

Tags: Message Broker, Event Streaming, Durability

Links:
• Official Docs: kafka.apache.org
• GitHub: github.com/apache/kafka
• Quickstart: kafka.apache.org/quickstart
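Kafka routes each record to a partition, and by default it does so by hashing the record key. The sketch below shows that idea in miniature; the real Java client uses murmur2 hashing, so the CRC32 here (and the `choose_partition` name) is purely illustrative.

```python
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition, mimicking Kafka's hash-based
    default partitioner (the real client uses murmur2; CRC32 here is
    illustrative only)."""
    return zlib.crc32(key) % num_partitions

# Records with the same key always land on the same partition,
# which is what preserves per-key ordering.
p1 = choose_partition(b"user-42", 6)
p2 = choose_partition(b"user-42", 6)
assert p1 == p2
```

This is why partition-key choice matters: all records sharing a key are ordered relative to each other, but a skewed key distribution produces hot partitions.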

Apache Flink

Stateful computations over data streams with exactly-once semantics and event-time processing.

Tags: Stream Processing, Stateful, Low Latency

Links:
• Official Docs: flink.apache.org
• Training: flink.apache.org/training
• Playgrounds: flink.apache.org/try-flink
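Event-time processing means grouping records by the timestamp they carry, not by when they arrive. A minimal sketch of the tumbling-window assignment Flink's window operator performs (function name is ours; watermarks and late-data handling are omitted):

```python
from collections import defaultdict

def tumbling_windows(events, window_ms):
    """Group (timestamp_ms, value) events into tumbling event-time
    windows: each event is assigned to the window containing its own
    timestamp, regardless of arrival order."""
    windows = defaultdict(list)
    for ts, value in events:
        window_start = ts - (ts % window_ms)  # align to window boundary
        windows[window_start].append(value)
    return dict(windows)

events = [(1000, "a"), (1500, "b"), (2300, "c")]
result = tumbling_windows(events, 1000)
# {1000: ['a', 'b'], 2000: ['c']}
```

In a real Flink job, watermarks tell the operator when a window is complete enough to emit; this sketch only shows the assignment step.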

Apache Pulsar

Cloud-native distributed messaging and streaming platform with multi-tenancy and geo-replication.

Tags: Multi-tenancy, Geo-replication, Cloud Native

Links:
• Official Docs: pulsar.apache.org
• Tutorials: pulsar.apache.org/docs/next
• Community: pulsar.apache.org/community

Apache Spark Streaming

Scalable stream processing with micro-batch architecture and unified batch/stream API.

Tags: Micro-batching, Unified API, Scalable

Links:
• Official Docs: spark.apache.org/streaming
• Examples: spark.apache.org/examples
• Structured Streaming: spark.apache.org/docs/latest/structured-streaming
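Micro-batching means the engine buffers incoming records and processes them as a sequence of small batch jobs rather than one record at a time. A toy sketch of that buffering loop (Spark batches by time interval, not count; the count-based version below just keeps the example deterministic):

```python
def micro_batches(stream, batch_size):
    """Yield fixed-size micro-batches from an event stream: buffer
    incoming records, emit a batch when the buffer fills, and flush
    whatever remains at end of stream."""
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

batches = list(micro_batches(range(7), 3))
# [[0, 1, 2], [3, 4, 5], [6]]
```

The trade-off this illustrates: per-record latency is bounded below by the batch interval, in exchange for the throughput and fault-tolerance machinery of batch execution.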

Amazon Kinesis

Fully managed streaming service on AWS for real-time data ingestion and processing.

Tags: AWS, Managed, Serverless

Links:
• Official Docs: aws.amazon.com/kinesis
• Developer Guide: docs.aws.amazon.com/kinesis
• Tutorials: aws.amazon.com/kinesis/getting-started

Apache Storm

Real-time distributed computation system for processing unbounded streams of data.

Tags: Real-time, Distributed, Fault-tolerant

Links:
• Official Docs: storm.apache.org
• Tutorial: storm.apache.org/releases/current/Tutorial.html
• GitHub: github.com/apache/storm

Client Libraries & SDKs

• KafkaJS (JavaScript/TypeScript, Node.js): Modern Kafka client for Node.js with async/await support
• confluent-kafka-python (Python, any platform): High-performance Python client based on librdkafka
• kafka-go (Go, any platform): Pure Go Kafka client with zero dependencies
• kafka-clients (Java, JVM): Official Apache Kafka Java client
• librdkafka (C/C++, any platform): High-performance C library used by many language bindings
• Sarama (Go, any platform): Pure Go client with consumer group support

Books & Publications

📖 Streaming Systems

Authors: Tyler Akidau, Slava Chernyak, Reuven Lax

Comprehensive guide to stream processing concepts, windowing, watermarks, and state management.

Tags: Fundamentals, Advanced

📖 Kafka: The Definitive Guide

Authors: Neha Narkhede, Gwen Shapira, Todd Palino

Complete reference for Apache Kafka covering architecture, operations, and use cases.

Tags: Kafka, Production

📖 Designing Event-Driven Systems

Author: Ben Stopford

Patterns and concepts for building event-driven architectures with Kafka.

Tags: Architecture, Patterns

📖 Stream Processing with Apache Flink

Authors: Fabian Hueske, Vasiliki Kalavri

Practical guide to building streaming applications with Apache Flink.

Tags: Flink, Hands-on

Tools & Utilities

• Confluent Control Center (Kafka management): Monitoring, alerting, cluster management, stream processing
• Kafka Manager (CMAK) (cluster management): Topic management, consumer groups, cluster monitoring
• Kafdrop (web UI): Browse topics, view messages, monitor consumers
• Schema Registry (schema management): Avro/Protobuf/JSON schema versioning and validation
• Kafka Connect (data integration): Source/sink connectors for databases, cloud services, etc.
• ksqlDB (stream processing): SQL-like queries on Kafka streams
• Burrow (monitoring): Consumer lag monitoring and alerting
• Cruise Control (operations): Automated cluster rebalancing and anomaly detection
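The metric lag monitors like Burrow report is simple at its core: for each partition, the broker's log-end offset minus the offset the consumer group last committed. A minimal sketch (function and variable names are ours, not Burrow's API):

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition consumer lag: log-end offset minus the group's
    committed offset. A partition with no commit yet is treated as
    starting from offset 0."""
    return {
        partition: end_offsets[partition] - committed_offsets.get(partition, 0)
        for partition in end_offsets
    }

lag = consumer_lag({0: 120, 1: 95}, {0: 100, 1: 95})
# partition 0 is 20 messages behind; partition 1 is caught up
assert lag == {0: 20, 1: 0}
```

What tools like Burrow add on top is the part that matters operationally: evaluating lag *trends* over time, since a large but shrinking lag is healthy while a small but growing one is not.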

WIA-DATA-013 Specification Documents

📄 PHASE 1: Data Format

Event serialization formats, schemas, and data modeling for streaming systems.

Tags: Avro, Protobuf, JSON
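Schema-aware serialization usually means framing each message with a schema reference so consumers can look up the right schema. Confluent's Schema Registry wire format, for example, prefixes the payload with a magic byte and a 4-byte schema ID. A sketch of that envelope, using JSON as a stand-in payload where real deployments would use Avro or Protobuf (`encode`/`decode` names are ours):

```python
import json
import struct

MAGIC_BYTE = 0

def encode(schema_id: int, payload: dict) -> bytes:
    """Frame a payload with a magic byte and a 4-byte big-endian
    schema ID, in the style of Schema Registry's wire format."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + json.dumps(payload).encode()

def decode(message: bytes):
    """Split a framed message back into (schema_id, payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    assert magic == MAGIC_BYTE, "unknown serialization format"
    return schema_id, json.loads(message[5:])

msg = encode(7, {"user": "42"})
assert decode(msg) == (7, {"user": "42"})
```

The schema ID, not the schema itself, travels with every message; that keeps messages small while still letting consumers resolve and validate the exact schema version that produced each record.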

📄 PHASE 2: API

Producer/consumer APIs, admin operations, and client interaction patterns.

Tags: REST API, SDK, gRPC

📄 PHASE 3: Protocol

Wire protocols, communication patterns, and network optimization strategies.

Tags: TCP, Binary Protocol, Compression
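A recurring pattern in these wire protocols (Kafka's included) is length-prefixed framing over TCP: each message is preceded by a 4-byte big-endian size, so the receiver knows exactly where one message ends and the next begins. A sketch of both sides (`frame`/`unframe` are illustrative names, not any library's API):

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix a payload with its 4-byte big-endian length."""
    return struct.pack(">I", len(payload)) + payload

def unframe(buffer: bytes):
    """Split a byte buffer into complete messages; returns the
    messages plus any trailing partial frame still awaiting bytes."""
    messages = []
    while len(buffer) >= 4:
        (size,) = struct.unpack(">I", buffer[:4])
        if len(buffer) < 4 + size:
            break  # incomplete frame; wait for more data
        messages.append(buffer[4:4 + size])
        buffer = buffer[4 + size:]
    return messages, buffer

msgs, rest = unframe(frame(b"hello") + frame(b"world") + b"\x00")
assert msgs == [b"hello", b"world"] and rest == b"\x00"
```

Handling the partial-frame case is the non-obvious part: TCP delivers a byte stream, not message boundaries, so a read can end mid-frame and the leftover bytes must be carried into the next read.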

📄 PHASE 4: Integration

Connectors, adapters, and integration patterns with databases and cloud services.

Tags: CDC, Connectors, Cloud

Best Practices

  1. Schema Management: Use Schema Registry for schema evolution and compatibility
  2. Partitioning: Choose partition keys carefully for even distribution and ordering
  3. Consumer Groups: Scale consumers horizontally within consumer groups
  4. Monitoring: Track lag, throughput, error rates, and latency continuously
  5. Error Handling: Implement dead-letter queues for failed messages
  6. Exactly-Once: Use idempotent producers and transactions (with consumers reading at the read_committed isolation level) when needed
  7. Backpressure: Handle slow consumers with proper buffering and flow control
  8. Security: Enable SSL/TLS, SASL authentication, and ACLs in production
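The retry-then-dead-letter pattern from practice 5 can be sketched as follows: each record gets a bounded number of attempts, and records that still fail are diverted rather than blocking the stream (function name and in-memory lists are illustrative; in production the dead-letter destination would be a separate topic):

```python
def process_with_dlq(records, handler, max_retries=3):
    """Run each record through `handler`, retrying on failure; records
    that exhaust their retries go to a dead-letter list along with the
    error, so the rest of the stream keeps flowing."""
    processed, dead_letter = [], []
    for record in records:
        for attempt in range(max_retries):
            try:
                processed.append(handler(record))
                break
            except Exception as exc:
                if attempt == max_retries - 1:
                    dead_letter.append((record, str(exc)))  # give up, park it
    return processed, dead_letter

ok, dlq = process_with_dlq(["1", "2", "oops"], int)
assert ok == [1, 2] and len(dlq) == 1
```

Parking the failure with its error message preserves it for later inspection and replay, which is the whole point of a dead-letter queue: one poison message should never stall an entire partition.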