WIA Emotion AI Standard Ebook | Chapter 6 of 8
Hongik Ingan (弘益人間)
"Benefit All Humanity"
Real-time emotion streaming enables continuous affective computing for applications that require immediate feedback and temporal analysis.
Phase 3 defines the real-time streaming protocol for continuous emotion analysis. Four transports are supported; choose based on your deployment:
| Protocol | Use Case | Latency |
|---|---|---|
| WebSocket | Web/mobile apps, bidirectional | <100ms |
| gRPC | Server-to-server, high performance | <50ms |
| MQTT | IoT devices, low bandwidth | <200ms |
| Server-Sent Events | One-way streaming | <150ms |
Endpoint:
wss://stream.wiastandards.com/emotion-ai/v1/stream
Connection Request:
GET /emotion-ai/v1/stream HTTP/1.1
Host: stream.wiastandards.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
X-WIA-API-Key: your_api_key
Connection Response:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
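The handshake above can be sketched with the standard library alone. This builds the Upgrade request and computes the Sec-WebSocket-Accept value a client should verify against the server's response (the SHA-1/base64 rule and fixed GUID come from RFC 6455); a real client would normally delegate all of this to a WebSocket library.

```python
import base64
import hashlib
import os

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from RFC 6455

def build_upgrade_request(host: str, path: str, api_key: str) -> bytes:
    """Build the HTTP Upgrade request shown above."""
    # The nonce is 16 random bytes, base64-encoded (RFC 6455, section 4.1).
    nonce = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {nonce}",
        "Sec-WebSocket-Version: 13",
        f"X-WIA-API-Key: {api_key}",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def expected_accept(nonce: str) -> str:
    """Value the server must echo in Sec-WebSocket-Accept for this nonce."""
    digest = hashlib.sha1((nonce + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

req = build_upgrade_request("stream.wiastandards.com",
                            "/emotion-ai/v1/stream", "your_api_key")
```

For the sample key shown above (dGhlIHNhbXBsZSBub25jZQ==), expected_accept returns s3pPLMBiTxaQ9kYGzzhZRbK+xOo=, matching the response.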
| Type | Direction | Purpose |
|---|---|---|
| config | Client → Server | Configure stream parameters |
| frame | Client → Server | Send video/audio frame |
| result | Server → Client | Emotion analysis result |
| error | Server → Client | Error notification |
| ping/pong | Bidirectional | Keep-alive |
{
"type": "config",
"stream_id": "stream_abc123",
"modalities": ["facial", "voice"],
"facial": {
"frame_rate": 30,
"resolution": "720p",
"encoding": "jpeg",
"return_action_units": true
},
"voice": {
"sample_rate": 16000,
"encoding": "LINEAR16",
"language": "en-US"
},
"output": {
"rate": "per_frame", // or "per_second", "on_change", "threshold"
"include_dimensions": true,
"include_raw_scores": false
}
}
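A small builder for the config message above; make_config and its defaults are hypothetical helpers, only the field names and rate values come from the spec.

```python
import json

# Output rate modes defined by the spec.
ALLOWED_RATES = {"per_frame", "per_second", "on_change", "threshold"}

def make_config(stream_id: str, modalities: list, rate: str = "per_frame") -> str:
    """Serialize a config message, rejecting rates the spec does not define."""
    if rate not in ALLOWED_RATES:
        raise ValueError(f"unsupported output rate: {rate}")
    return json.dumps({
        "type": "config",
        "stream_id": stream_id,
        "modalities": modalities,
        "output": {
            "rate": rate,
            "include_dimensions": True,
            "include_raw_scores": False,
        },
    })

msg = json.loads(make_config("stream_abc123", ["facial", "voice"]))
```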
Binary Frame Format:
┌──────────────────────────────────────────────────┐
│ Header (18 bytes)                                │
│ - Magic: 4 bytes, 0x57494145 ("WIAE")            │
│ - Version: 1 byte (currently 1)                  │
│ - Modality: 1 byte (1=facial, 2=voice, 3=both)   │
│ - Timestamp: 8 bytes (ms since epoch)            │
│ - Length: 4 bytes (payload size)                 │
├──────────────────────────────────────────────────┤
│ Payload (variable length)                        │
│ - JPEG image data or PCM audio                   │
└──────────────────────────────────────────────────┘
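Packing the header with Python's struct module, as a sketch: the listed fields (4-byte magic, 1-byte version, 1-byte modality, 8-byte timestamp, 4-byte length) total 18 bytes. Big-endian (network) byte order is an assumption here; the spec does not state it.

```python
import struct
import time

# >IBBQI = big-endian: u32 magic, u8 version, u8 modality, u64 timestamp,
# u32 payload length. Byte order is an assumption, not stated by the spec.
HEADER = struct.Struct(">IBBQI")
MAGIC = 0x57494145  # spells "WIAE" in ASCII

MODALITY_FACIAL, MODALITY_VOICE, MODALITY_BOTH = 1, 2, 3

def pack_frame(modality: int, payload: bytes) -> bytes:
    """Prepend the binary frame header to a JPEG or PCM payload."""
    ts_ms = int(time.time() * 1000)  # milliseconds since epoch
    return HEADER.pack(MAGIC, 1, modality, ts_ms, len(payload)) + payload

frame = pack_frame(MODALITY_FACIAL, b"\xff\xd8\xff")  # real JPEG bytes go here
```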
{
"type": "result",
"stream_id": "stream_abc123",
"frame_number": 1542,
"timestamp": "2025-12-19T10:30:15.123Z",
"latency_ms": 42,
"emotions": {
"primary": { "label": "happiness", "confidence": 0.85 },
"change_detected": false
},
"dimensions": {
"valence": 0.72,
"arousal": 0.48
},
"action_units": [
{ "au": "AU6", "intensity": 0.78 },
{ "au": "AU12", "intensity": 0.82 }
],
"face": {
"detected": true,
"tracking_id": "face_001",
"bbox": { "x": 120, "y": 80, "width": 200, "height": 250 }
}
}
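Consuming a result message might look like this; summarize_result is a hypothetical helper, only the field names come from the message format above.

```python
import json

def summarize_result(raw: str) -> str:
    """Reduce a result message to a one-line summary for logging."""
    msg = json.loads(raw)
    if msg.get("type") != "result":
        raise ValueError("not a result message")
    primary = msg["emotions"]["primary"]
    parts = [f'{primary["label"]} ({primary["confidence"]:.2f})']
    if "dimensions" in msg:  # only present when include_dimensions was set
        d = msg["dimensions"]
        parts.append(f'valence={d["valence"]:+.2f} arousal={d["arousal"]:+.2f}')
    return " ".join(parts)
```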
| Use Case | Minimum FPS | Recommended FPS |
|---|---|---|
| Video calls | 15 | 30 |
| Gaming/VR | 30 | 60 |
| Driver monitoring | 30 | 60 |
| Healthcare | 10 | 30 |
| Retail analytics | 5 | 15 |
| Component | Target | Maximum |
|---|---|---|
| Frame capture | <10ms | 20ms |
| Network transfer | <30ms | 50ms |
| Face detection | <15ms | 30ms |
| Emotion classification | <20ms | 40ms |
| Response delivery | <10ms | 20ms |
| Total (end-to-end) | <85ms | 160ms |
"per_frame":
- Result for every input frame
- Highest bandwidth, lowest latency
- Best for gaming, VR
"per_second":
- Aggregated result every second
- Lower bandwidth
- Best for analytics, monitoring
"on_change":
- Result only when emotion changes
- Lowest bandwidth
- Best for event-driven applications
"threshold":
- Result when confidence exceeds threshold
- Configurable sensitivity
- Best for alert systems
{
"type": "event",
"event_type": "emotion_change",
"stream_id": "stream_abc123",
"timestamp": "2025-12-19T10:30:15.123Z",
"previous_emotion": {
"label": "neutral",
"confidence": 0.75
},
"current_emotion": {
"label": "happiness",
"confidence": 0.82
},
"transition_duration_ms": 450,
"trigger": "gradual" // or "sudden"
}
| Event | Trigger | Use Case |
|---|---|---|
| emotion_change | Primary emotion changes | Logging, alerts |
| high_arousal | Arousal exceeds threshold | Stress detection |
| negative_valence | Valence drops below threshold | Dissatisfaction detection |
| face_lost | Face tracking lost | Attention monitoring |
| face_detected | New face detected | Presence detection |
| micro_expression | Brief expression detected | Deception detection |
API Key (in header):
X-WIA-API-Key: sk_live_abc123
JWT Token (in query or header):
wss://stream.wiastandards.com/v1/stream?token=eyJhbG...
Token Refresh:
{
"type": "auth_refresh",
"token": "new_jwt_token"
}
{
"type": "error",
"code": "FACE_NOT_DETECTED",
"message": "No face detected in frame 1543",
"frame_number": 1543,
"recoverable": true
}
| Code | Recoverable | Action |
|---|---|---|
| FACE_NOT_DETECTED | Yes | Continue sending frames |
| INVALID_FRAME | Yes | Skip frame, continue |
| RATE_EXCEEDED | Yes | Reduce frame rate |
| AUTH_EXPIRED | Yes | Refresh token |
| STREAM_LIMIT | No | Close and reconnect |
| SERVER_ERROR | No | Retry with backoff |
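The table above maps directly to a dispatch function. The action names here are illustrative; only the error codes come from the spec.

```python
def error_action(code: str) -> str:
    """Map an error code from the server to a client-side action."""
    if code in ("FACE_NOT_DETECTED", "INVALID_FRAME"):
        return "continue"          # skip the frame, keep streaming
    if code == "RATE_EXCEEDED":
        return "reduce_frame_rate"
    if code == "AUTH_EXPIRED":
        return "refresh_token"     # send an auth_refresh message
    # STREAM_LIMIT, SERVER_ERROR, and unknown codes: tear down the connection
    # and reconnect (with backoff for SERVER_ERROR).
    return "reconnect"
```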
Exponential Backoff:
Attempt 1: Wait 1 second
Attempt 2: Wait 2 seconds
Attempt 3: Wait 4 seconds
Attempt 4: Wait 8 seconds
Attempt 5: Wait 16 seconds
Maximum: 60 seconds
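The schedule above can be sketched as:

```python
def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before reconnect attempt `attempt` (1-indexed).

    Doubles per attempt (1s, 2s, 4s, ...) and caps at 60s, matching the
    schedule above. Production clients often add random jitter on top to
    avoid reconnect stampedes; that is omitted here for determinism.
    """
    return min(cap, base * 2 ** (attempt - 1))
```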
Resume Stream:
{
"type": "resume",
"stream_id": "stream_abc123",
"last_frame_number": 1542
}
1. Connect (WebSocket handshake)
2. Configure (send config message)
3. Stream (send frames, receive results)
4. Pause (optional)
5. Resume (optional)
6. Close (graceful disconnect)
Session Limits:
- Maximum duration: 4 hours
- Maximum idle time: 5 minutes
- Maximum frames: 1,000,000
On session close, receive:
{
"type": "session_summary",
"stream_id": "stream_abc123",
"duration_ms": 3600000,
"frames_processed": 108000,
"emotion_distribution": {
"happiness": 0.45,
"neutral": 0.35,
"sadness": 0.12,
"other": 0.08
},
"average_dimensions": {
"valence": 0.42,
"arousal": 0.38
},
"events": {
"emotion_changes": 47,
"high_arousal_moments": 12
}
}
Chapter 6 Complete
Next: Chapter 7 - Phase 4: Integration
WIA - World Certification Industry Association