WIA Emotion AI Standard Ebook | Chapter 6 of 8
Hongik Ingan (弘益人間)
"Benefit All Humanity"
Real-time emotion streaming enables continuous affective computing for applications that require immediate feedback and temporal analysis.
Phase 3 defines the real-time streaming protocol for continuous emotion analysis. Four transports are supported; choose based on your deployment:
| Protocol | Use Case | Latency |
|---|---|---|
| WebSocket | Web/mobile apps, bidirectional | <100ms |
| gRPC | Server-to-server, high performance | <50ms |
| MQTT | IoT devices, low bandwidth | <200ms |
| Server-Sent Events | One-way streaming | <150ms |
Endpoint:
wss://stream.wiastandards.com/emotion-ai/v1/stream
Connection Request:
GET /emotion-ai/v1/stream HTTP/1.1
Host: stream.wiastandards.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
X-WIA-API-Key: your_api_key
Connection Response:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
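The handshake above can be sketched with the standard library alone. This builds the Upgrade request and computes the Sec-WebSocket-Accept value a client should verify against the server's response (the SHA-1/base64 rule and fixed GUID come from RFC 6455); a real client would normally delegate all of this to a WebSocket library.

```python
import base64
import hashlib
import os

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from RFC 6455

def build_upgrade_request(host: str, path: str, api_key: str) -> bytes:
    """Build the HTTP Upgrade request shown above."""
    # The nonce is 16 random bytes, base64-encoded (RFC 6455, section 4.1).
    nonce = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {nonce}",
        "Sec-WebSocket-Version: 13",
        f"X-WIA-API-Key: {api_key}",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def expected_accept(nonce: str) -> str:
    """Value the server must echo in Sec-WebSocket-Accept for this nonce."""
    digest = hashlib.sha1((nonce + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

req = build_upgrade_request("stream.wiastandards.com",
                            "/emotion-ai/v1/stream", "your_api_key")
```

For the sample key shown above (dGhlIHNhbXBsZSBub25jZQ==), expected_accept returns s3pPLMBiTxaQ9kYGzzhZRbK+xOo=, matching the response.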
| Type | Direction | Purpose |
|---|---|---|
| config | Client → Server | Configure stream parameters |
| frame | Client → Server | Send video/audio frame |
| result | Server → Client | Emotion analysis result |
| error | Server → Client | Error notification |
| ping/pong | Bidirectional | Keep-alive |
{
"type": "config",
"stream_id": "stream_abc123",
"modalities": ["facial", "voice"],
"facial": {
"frame_rate": 30,
"resolution": "720p",
"encoding": "jpeg",
"return_action_units": true
},
"voice": {
"sample_rate": 16000,
"encoding": "LINEAR16",
"language": "en-US"
},
"output": {
"rate": "per_frame", // or "per_second", "on_change", "threshold"
"include_dimensions": true,
"include_raw_scores": false
}
}
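A small builder for the config message above; make_config and its defaults are hypothetical helpers, only the field names and rate values come from the spec.

```python
import json

# Output rate modes defined by the spec.
ALLOWED_RATES = {"per_frame", "per_second", "on_change", "threshold"}

def make_config(stream_id: str, modalities: list, rate: str = "per_frame") -> str:
    """Serialize a config message, rejecting rates the spec does not define."""
    if rate not in ALLOWED_RATES:
        raise ValueError(f"unsupported output rate: {rate}")
    return json.dumps({
        "type": "config",
        "stream_id": stream_id,
        "modalities": modalities,
        "output": {
            "rate": rate,
            "include_dimensions": True,
            "include_raw_scores": False,
        },
    })

msg = json.loads(make_config("stream_abc123", ["facial", "voice"]))
```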
Binary Frame Format:
┌──────────────────────────────────────────────────┐
│ Header (18 bytes)                                │
│ - Magic: 4 bytes, 0x57494145 ("WIAE")            │
│ - Version: 1 byte (currently 1)                  │
│ - Modality: 1 byte (1=facial, 2=voice, 3=both)   │
│ - Timestamp: 8 bytes (ms since epoch)            │
│ - Length: 4 bytes (payload size)                 │
├──────────────────────────────────────────────────┤
│ Payload (variable length)                        │
│ - JPEG image data or PCM audio                   │
└──────────────────────────────────────────────────┘
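Packing the header with Python's struct module, as a sketch: the listed fields (4-byte magic, 1-byte version, 1-byte modality, 8-byte timestamp, 4-byte length) total 18 bytes. Big-endian (network) byte order is an assumption here; the spec does not state it.

```python
import struct
import time

# >IBBQI = big-endian: u32 magic, u8 version, u8 modality, u64 timestamp,
# u32 payload length. Byte order is an assumption, not stated by the spec.
HEADER = struct.Struct(">IBBQI")
MAGIC = 0x57494145  # spells "WIAE" in ASCII

MODALITY_FACIAL, MODALITY_VOICE, MODALITY_BOTH = 1, 2, 3

def pack_frame(modality: int, payload: bytes) -> bytes:
    """Prepend the binary frame header to a JPEG or PCM payload."""
    ts_ms = int(time.time() * 1000)  # milliseconds since epoch
    return HEADER.pack(MAGIC, 1, modality, ts_ms, len(payload)) + payload

frame = pack_frame(MODALITY_FACIAL, b"\xff\xd8\xff")  # real JPEG bytes go here
```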
{
"type": "result",
"stream_id": "stream_abc123",
"frame_number": 1542,
"timestamp": "2025-12-19T10:30:15.123Z",
"latency_ms": 42,
"emotions": {
"primary": { "label": "happiness", "confidence": 0.85 },
"change_detected": false
},
"dimensions": {
"valence": 0.72,
"arousal": 0.48
},
"action_units": [
{ "au": "AU6", "intensity": 0.78 },
{ "au": "AU12", "intensity": 0.82 }
],
"face": {
"detected": true,
"tracking_id": "face_001",
"bbox": { "x": 120, "y": 80, "width": 200, "height": 250 }
}
}
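Consuming a result message might look like this; summarize_result is a hypothetical helper, only the field names come from the message format above.

```python
import json

def summarize_result(raw: str) -> str:
    """Reduce a result message to a one-line summary for logging."""
    msg = json.loads(raw)
    if msg.get("type") != "result":
        raise ValueError("not a result message")
    primary = msg["emotions"]["primary"]
    parts = [f'{primary["label"]} ({primary["confidence"]:.2f})']
    if "dimensions" in msg:  # only present when include_dimensions was set
        d = msg["dimensions"]
        parts.append(f'valence={d["valence"]:+.2f} arousal={d["arousal"]:+.2f}')
    return " ".join(parts)
```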
| Use Case | Minimum FPS | Recommended FPS |
|---|---|---|
| Video calls | 15 | 30 |
| Gaming/VR | 30 | 60 |
| Driver monitoring | 30 | 60 |
| Healthcare | 10 | 30 |
| Retail analytics | 5 | 15 |
| Component | Target | Maximum |
|---|---|---|
| Frame capture | <10ms | 20ms |
| Network transfer | <30ms | 50ms |
| Face detection | <15ms | 30ms |
| Emotion classification | <20ms | 40ms |
| Response delivery | <10ms | 20ms |
| Total (end-to-end) | <85ms | 160ms |
"per_frame":
- Result for every input frame
- Highest bandwidth, lowest latency
- Best for gaming, VR
"per_second":
- Aggregated result every second
- Lower bandwidth
- Best for analytics, monitoring
"on_change":
- Result only when emotion changes
- Lowest bandwidth
- Best for event-driven applications
"threshold":
- Result when confidence exceeds threshold
- Configurable sensitivity
- Best for alert systems
{
"type": "event",
"event_type": "emotion_change",
"stream_id": "stream_abc123",
"timestamp": "2025-12-19T10:30:15.123Z",
"previous_emotion": {
"label": "neutral",
"confidence": 0.75
},
"current_emotion": {
"label": "happiness",
"confidence": 0.82
},
"transition_duration_ms": 450,
"trigger": "gradual" // or "sudden"
}
| Event | Trigger | Use Case |
|---|---|---|
| emotion_change | Primary emotion changes | Logging, alerts |
| high_arousal | Arousal exceeds threshold | Stress detection |
| negative_valence | Valence drops below threshold | Dissatisfaction detection |
| face_lost | Face tracking lost | Attention monitoring |
| face_detected | New face detected | Presence detection |
| micro_expression | Brief expression detected | Deception detection |
API Key (in header):
X-WIA-API-Key: sk_live_abc123
JWT Token (in query or header):
wss://stream.wiastandards.com/v1/stream?token=eyJhbG...
Token Refresh:
{
"type": "auth_refresh",
"token": "new_jwt_token"
}
{
"type": "error",
"code": "FACE_NOT_DETECTED",
"message": "No face detected in frame 1543",
"frame_number": 1543,
"recoverable": true
}
| Code | Recoverable | Action |
|---|---|---|
| FACE_NOT_DETECTED | Yes | Continue sending frames |
| INVALID_FRAME | Yes | Skip frame, continue |
| RATE_EXCEEDED | Yes | Reduce frame rate |
| AUTH_EXPIRED | Yes | Refresh token |
| STREAM_LIMIT | No | Close and reconnect |
| SERVER_ERROR | No | Retry with backoff |
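The table above maps directly to a dispatch function. The action names here are illustrative; only the error codes come from the spec.

```python
def error_action(code: str) -> str:
    """Map an error code from the server to a client-side action."""
    if code in ("FACE_NOT_DETECTED", "INVALID_FRAME"):
        return "continue"          # skip the frame, keep streaming
    if code == "RATE_EXCEEDED":
        return "reduce_frame_rate"
    if code == "AUTH_EXPIRED":
        return "refresh_token"     # send an auth_refresh message
    # STREAM_LIMIT, SERVER_ERROR, and unknown codes: tear down the connection
    # and reconnect (with backoff for SERVER_ERROR).
    return "reconnect"
```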
Exponential Backoff:
Attempt 1: Wait 1 second
Attempt 2: Wait 2 seconds
Attempt 3: Wait 4 seconds
Attempt 4: Wait 8 seconds
Attempt 5: Wait 16 seconds
Maximum: 60 seconds
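The schedule above can be sketched as:

```python
def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before reconnect attempt `attempt` (1-indexed).

    Doubles per attempt (1s, 2s, 4s, ...) and caps at 60s, matching the
    schedule above. Production clients often add random jitter on top to
    avoid reconnect stampedes; that is omitted here for determinism.
    """
    return min(cap, base * 2 ** (attempt - 1))
```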
Resume Stream:
{
"type": "resume",
"stream_id": "stream_abc123",
"last_frame_number": 1542
}
1. Connect (WebSocket handshake)
2. Configure (send config message)
3. Stream (send frames, receive results)
4. Pause (optional)
5. Resume (optional)
6. Close (graceful disconnect)
Session Limits:
- Maximum duration: 4 hours
- Maximum idle time: 5 minutes
- Maximum frames: 1,000,000
On session close, receive:
{
"type": "session_summary",
"stream_id": "stream_abc123",
"duration_ms": 3600000,
"frames_processed": 108000,
"emotion_distribution": {
"happiness": 0.45,
"neutral": 0.35,
"sadness": 0.12,
"other": 0.08
},
"average_dimensions": {
"valence": 0.42,
"arousal": 0.38
},
"events": {
"emotion_changes": 47,
"high_arousal_moments": 12
}
}
Chapter 6 Complete
Next: Chapter 7 - Phase 4: Integration
WIA - World Certification Industry Association