💗 WIA Emotion AI Standard Ebook | Chapter 6 of 8


💗 Chapter 6: Phase 3 - Streaming Protocol

Hongik Ingan (弘益人間)

"Benefit All Humanity"

Real-time emotion streaming enables continuous affective computing for applications requiring immediate feedback and temporal analysis.


6.1 Overview

6.1.1 Purpose

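Phase 3 defines the real-time streaming protocol for continuous emotion analysis. It targets applications that need immediate feedback, such as the use cases listed in Section 6.3.1: video calls, gaming/VR, driver monitoring, healthcare, and retail analytics.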

6.1.2 Protocol Options

Protocol            Use Case                            Latency
WebSocket           Web/mobile apps, bidirectional      <100ms
gRPC                Server-to-server, high performance  <50ms
MQTT                IoT devices, low bandwidth          <200ms
Server-Sent Events  One-way streaming                   <150ms

6.2 WebSocket Protocol

6.2.1 Connection

Endpoint: wss://stream.wiastandards.com/emotion-ai/v1/stream

Connection Request:
GET /emotion-ai/v1/stream HTTP/1.1
Host: stream.wiastandards.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
X-WIA-API-Key: your_api_key

Connection Response:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
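
The Sec-WebSocket-Accept value in the response is derived from the request's Sec-WebSocket-Key as defined by RFC 6455: append a fixed GUID, hash with SHA-1, and Base64-encode the digest. As a minimal sketch (the function name compute_accept is illustrative and not part of the WIA standard), a client could verify the server's response like this:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def compute_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value expected for a given key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))
# s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The key and accept values shown in the handshake above are the sample pair from RFC 6455; a real client generates a fresh random 16-byte key for every connection.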

6.2.2 Message Types

Type       Direction        Purpose
config     Client → Server  Configure stream parameters
frame      Client → Server  Send video/audio frame
result     Server → Client  Emotion analysis result
error      Server → Client  Error notification
ping/pong  Bidirectional    Keep-alive

6.2.3 Configuration Message

{
    "type": "config",
    "stream_id": "stream_abc123",
    "modalities": ["facial", "voice"],

    "facial": {
        "frame_rate": 30,
        "resolution": "720p",
        "encoding": "jpeg",
        "return_action_units": true
    },

    "voice": {
        "sample_rate": 16000,
        "encoding": "LINEAR16",
        "language": "en-US"
    },

    "output": {
        "rate": "per_frame",  // or "per_second", "on_change", "threshold"
        "include_dimensions": true,
        "include_raw_scores": false
    }
}
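
A client has to serialize this message as strict JSON, so the `//` annotation in the example cannot be sent on the wire. The sketch below builds a config message from the fields shown above. The helper name build_config and the sets of accepted values are assumptions drawn from this chapter's examples; the standard may allow more.

```python
import json

# Values taken from the example above and Section 6.3.3 (assumed, not exhaustive).
KNOWN_MODALITIES = {"facial", "voice"}
KNOWN_RATES = {"per_frame", "per_second", "on_change", "threshold"}

def build_config(stream_id, modalities, rate="per_frame"):
    """Serialize a config message (hypothetical helper, not part of the standard)."""
    unknown = set(modalities) - KNOWN_MODALITIES
    if unknown:
        raise ValueError(f"unknown modalities: {sorted(unknown)}")
    if rate not in KNOWN_RATES:
        raise ValueError(f"unknown output rate: {rate!r}")
    message = {
        "type": "config",
        "stream_id": stream_id,
        "modalities": list(modalities),
        "output": {"rate": rate, "include_dimensions": True, "include_raw_scores": False},
    }
    # Per-modality settings copied from the example message.
    if "facial" in modalities:
        message["facial"] = {"frame_rate": 30, "resolution": "720p",
                             "encoding": "jpeg", "return_action_units": True}
    if "voice" in modalities:
        message["voice"] = {"sample_rate": 16000, "encoding": "LINEAR16",
                            "language": "en-US"}
    return json.dumps(message)

print(build_config("stream_abc123", ["facial"], rate="on_change"))
```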

6.2.4 Frame Message (Binary)

Binary Frame Format:
┌─────────────────────────────────────────────┐
│ Header (16 bytes)                           │
│   - Magic: 0x57494145 ("WIAE")              │
│   - Version: 1                              │
│   - Modality: 1=facial, 2=voice, 3=both     │
│   - Timestamp: 8 bytes (ms since epoch)     │
│   - Length: 4 bytes                         │
├─────────────────────────────────────────────┤
│ Payload (variable length)                   │
│   - JPEG image data or PCM audio            │
└─────────────────────────────────────────────┘
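
The layout above can be sketched with Python's struct module. Several details are assumptions, because the diagram does not state them: big-endian byte order, a 1-byte Version field, and a 1-byte Modality field. With the 4-byte magic, 8-byte timestamp, and 4-byte length, those widths give an 18-byte header rather than the 16 bytes stated in the diagram, so treat the exact widths as unconfirmed.

```python
import struct
import time

MAGIC = 0x57494145  # ASCII "WIAE"

# ASSUMED layout, big-endian, no padding:
# magic (u32), version (u8), modality (u8), timestamp ms (u64), payload length (u32)
HEADER = struct.Struct(">IBBQI")

def pack_frame(modality, payload, timestamp_ms=None, version=1):
    """Prefix a JPEG or PCM payload with the frame header."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return HEADER.pack(MAGIC, version, modality, timestamp_ms, len(payload)) + payload

def unpack_frame(data):
    """Return (version, modality, timestamp_ms, payload) or raise ValueError."""
    magic, version, modality, timestamp_ms, length = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return version, modality, timestamp_ms, payload

frame = pack_frame(1, b"\xff\xd8 fake jpeg bytes")
print(frame[:4], HEADER.size)
```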

6.2.5 Result Message

{
    "type": "result",
    "stream_id": "stream_abc123",
    "frame_number": 1542,
    "timestamp": "2025-12-19T10:30:15.123Z",
    "latency_ms": 42,

    "emotions": {
        "primary": { "label": "happiness", "confidence": 0.85 },
        "change_detected": false
    },

    "dimensions": {
        "valence": 0.72,
        "arousal": 0.48
    },

    "action_units": [
        { "au": "AU6", "intensity": 0.78 },
        { "au": "AU12", "intensity": 0.82 }
    ],

    "face": {
        "detected": true,
        "tracking_id": "face_001",
        "bbox": { "x": 120, "y": 80, "width": 200, "height": 250 }
    }
}
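
On the client side, a result message can be reduced to the fields an application usually needs. This is a minimal sketch: the EmotionResult class and parse_result function are hypothetical names, and "dimensions" is treated as optional on the assumption that include_dimensions in the config message controls whether it is present.

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class EmotionResult:
    frame_number: int
    label: str
    confidence: float
    valence: Optional[float]
    arousal: Optional[float]
    face_detected: bool

def parse_result(raw: str) -> EmotionResult:
    """Parse one JSON result message into an EmotionResult."""
    msg = json.loads(raw)
    if msg.get("type") != "result":
        raise ValueError(f"not a result message: {msg.get('type')!r}")
    primary = msg["emotions"]["primary"]
    dims = msg.get("dimensions", {})  # assumed optional
    return EmotionResult(
        frame_number=msg["frame_number"],
        label=primary["label"],
        confidence=primary["confidence"],
        valence=dims.get("valence"),
        arousal=dims.get("arousal"),
        face_detected=msg.get("face", {}).get("detected", False),
    )

sample = json.dumps({
    "type": "result", "frame_number": 1542,
    "emotions": {"primary": {"label": "happiness", "confidence": 0.85}},
    "dimensions": {"valence": 0.72, "arousal": 0.48},
    "face": {"detected": True},
})
print(parse_result(sample))
```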

6.3 Frame Rate and Latency

6.3.1 Frame Rate Requirements

Use Case           Minimum FPS  Recommended FPS
Video calls        15           30
Gaming/VR          30           60
Driver monitoring  30           60
Healthcare         10           30
Retail analytics   5            15

6.3.2 Latency Targets

Component               Target  Maximum
Frame capture           <10ms   20ms
Network transfer        <30ms   50ms
Face detection          <15ms   30ms
Emotion classification  <20ms   40ms
Response delivery       <10ms   20ms
Total (end-to-end)      <85ms   160ms
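
The end-to-end figures are the sums of the per-component figures. The sketch below reproduces that arithmetic and shows one way to flag components that exceed their maximum; the dictionary keys and the over_budget helper are illustrative names, not part of the standard.

```python
# (target_ms, maximum_ms) per component, copied from the table above.
LATENCY_BUDGET = {
    "frame_capture": (10, 20),
    "network_transfer": (30, 50),
    "face_detection": (15, 30),
    "emotion_classification": (20, 40),
    "response_delivery": (10, 20),
}

def over_budget(measured_ms):
    """Return the components whose measured latency exceeds the maximum."""
    return [name for name, ms in measured_ms.items()
            if ms > LATENCY_BUDGET[name][1]]

total_target = sum(target for target, _ in LATENCY_BUDGET.values())
total_maximum = sum(maximum for _, maximum in LATENCY_BUDGET.values())
print(total_target, total_maximum)  # 85 160

# Hypothetical measurements: only the network leg is over its 50 ms maximum.
print(over_budget({"frame_capture": 8, "network_transfer": 62, "face_detection": 14}))
```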

6.3.3 Output Rate Modes

"per_frame":
  - Result for every input frame
  - Highest bandwidth, lowest latency
  - Best for gaming, VR

"per_second":
  - Aggregated result every second
  - Lower bandwidth
  - Best for analytics, monitoring

"on_change":
  - Result only when emotion changes
  - Lowest bandwidth
  - Best for event-driven applications

"threshold":
  - Result when confidence exceeds threshold
  - Configurable sensitivity
  - Best for alert systems
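
To make the four modes concrete, the sketch below applies each one to the same sequence of per-frame results. It is a client-side illustration only. The aggregation rule for "per_second" (majority label within each one-second window) and the default threshold of 0.8 are assumptions, since the text above does not define them.

```python
from collections import Counter
from itertools import groupby

def select_results(results, mode, threshold=0.8):
    """Filter (timestamp_ms, label, confidence) tuples, given in time order."""
    if mode == "per_frame":
        return list(results)
    if mode == "per_second":
        out = []
        # Group frames by whole second, then report the majority label.
        for second, group in groupby(results, key=lambda r: r[0] // 1000):
            frames = list(group)
            label, _ = Counter(r[1] for r in frames).most_common(1)[0]
            confs = [r[2] for r in frames if r[1] == label]
            out.append((second * 1000, label, sum(confs) / len(confs)))
        return out
    if mode == "on_change":
        out, previous = [], None
        for r in results:
            if r[1] != previous:  # emit only when the label changes
                out.append(r)
                previous = r[1]
        return out
    if mode == "threshold":
        return [r for r in results if r[2] >= threshold]
    raise ValueError(f"unknown output rate mode: {mode!r}")

frames = [(0, "neutral", 0.7), (500, "neutral", 0.9), (900, "happiness", 0.85),
          (1200, "happiness", 0.6), (1800, "happiness", 0.9)]
for mode in ("per_frame", "per_second", "on_change", "threshold"):
    print(mode, len(select_results(frames, mode)))
```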

6.4 Event-Based Streaming

6.4.1 Emotion Events

{
    "type": "event",
    "event_type": "emotion_change",
    "stream_id": "stream_abc123",
    "timestamp": "2025-12-19T10:30:15.123Z",

    "previous_emotion": {
        "label": "neutral",
        "confidence": 0.75
    },

    "current_emotion": {
        "label": "happiness",
        "confidence": 0.82
    },

    "transition_duration_ms": 450,
    "trigger": "gradual"  // or "sudden"
}

6.4.2 Event Types

Event             Trigger                        Use Case
emotion_change    Primary emotion changes        Logging, alerts
high_arousal      Arousal exceeds threshold      Stress detection
negative_valence  Valence drops below threshold  Dissatisfaction detection
face_lost         Face tracking lost             Attention monitoring
face_detected     New face detected              Presence detection
micro_expression  Brief expression detected      Deception detection
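
The triggers in this table can be expressed as comparisons between two consecutive results. The sketch below is one possible reading, not the standard's definition: the thresholds are passed in by the caller because the table gives no values, threshold events are assumed to fire only on the frame where the value crosses the threshold, and micro_expression is left out because it needs a temporal model.

```python
def detect_events(previous, current, arousal_threshold, valence_threshold):
    """Compare two consecutive results and return the event types that fire.

    Each result is a dict with "label", "valence", "arousal", "face_detected".
    """
    events = []
    if previous["face_detected"] and not current["face_detected"]:
        events.append("face_lost")
    if not previous["face_detected"] and current["face_detected"]:
        events.append("face_detected")
    if current["label"] != previous["label"]:
        events.append("emotion_change")
    # Edge-triggered (assumption): fire only when the threshold is crossed.
    if current["arousal"] > arousal_threshold >= previous["arousal"]:
        events.append("high_arousal")
    if current["valence"] < valence_threshold <= previous["valence"]:
        events.append("negative_valence")
    return events

before = {"label": "neutral", "valence": 0.1, "arousal": 0.3, "face_detected": True}
after = {"label": "anger", "valence": -0.6, "arousal": 0.9, "face_detected": True}
# Threshold values here are hypothetical examples.
print(detect_events(before, after, arousal_threshold=0.8, valence_threshold=-0.5))
```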

6.5 Security

6.5.1 Transport Security

Streams are carried over TLS through the wss:// endpoint given in Section 6.2.1; the chapter summary names TLS 1.3 as the transport security level.

6.5.2 Authentication

API Key (in header):
X-WIA-API-Key: sk_live_abc123

JWT Token (in query or header):
wss://stream.wiastandards.com/emotion-ai/v1/stream?token=eyJhbG...

Token Refresh:
{
    "type": "auth_refresh",
    "token": "new_jwt_token"
}
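
As a small sketch of the token flow, the helpers below build the query-string form of the stream URL and the auth_refresh message. The function names are hypothetical, the endpoint is the one from Section 6.2.1, and the token strings are placeholders rather than real JWTs.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

STREAM_URL = "wss://stream.wiastandards.com/emotion-ai/v1/stream"

def url_with_token(token: str) -> str:
    """Attach a JWT as the `token` query parameter, URL-encoding it safely."""
    return f"{STREAM_URL}?{urlencode({'token': token})}"

def auth_refresh_message(new_token: str) -> str:
    """Build the message that replaces an expiring token on an open stream."""
    return json.dumps({"type": "auth_refresh", "token": new_token})

url = url_with_token("placeholder.jwt.value")
print(url)
print(auth_refresh_message("new_placeholder.jwt.value"))
```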

6.5.3 Privacy Protection


6.6 Error Handling

6.6.1 Error Message

{
    "type": "error",
    "code": "FACE_NOT_DETECTED",
    "message": "No face detected in frame 1543",
    "frame_number": 1543,
    "recoverable": true
}

6.6.2 Error Codes

Code               Recoverable  Action
FACE_NOT_DETECTED  Yes          Continue sending frames
INVALID_FRAME      Yes          Skip frame, continue
RATE_EXCEEDED      Yes          Reduce frame rate
AUTH_EXPIRED       Yes          Refresh token
STREAM_LIMIT       No           Close and reconnect
SERVER_ERROR       No           Retry with backoff
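
A client can turn this table into a lookup that decides what to do after each error message. In the sketch below the action names (such as "skip_frame") are hypothetical labels chosen for illustration, and falling back to the message's own "recoverable" flag for unknown codes is an assumption.

```python
# (recoverable, suggested action) per error code, following the table above.
ERROR_TABLE = {
    "FACE_NOT_DETECTED": (True, "continue"),
    "INVALID_FRAME": (True, "skip_frame"),
    "RATE_EXCEEDED": (True, "reduce_frame_rate"),
    "AUTH_EXPIRED": (True, "refresh_token"),
    "STREAM_LIMIT": (False, "reconnect"),
    "SERVER_ERROR": (False, "retry_with_backoff"),
}

def next_action(error_message):
    """Map a parsed error message (dict) to the client's next step."""
    code = error_message.get("code")
    if code in ERROR_TABLE:
        return ERROR_TABLE[code][1]
    # Unknown code: rely on the "recoverable" flag carried by the message.
    return "continue" if error_message.get("recoverable") else "reconnect"

print(next_action({"type": "error", "code": "FACE_NOT_DETECTED", "recoverable": True}))
# continue
```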

6.6.3 Reconnection Strategy

Exponential Backoff:
  Attempt 1: Wait 1 second
  Attempt 2: Wait 2 seconds
  Attempt 3: Wait 4 seconds
  Attempt 4: Wait 8 seconds
  Attempt 5: Wait 16 seconds
  Maximum: 60 seconds

Resume Stream:
{
    "type": "resume",
    "stream_id": "stream_abc123",
    "last_frame_number": 1542
}
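
The schedule above doubles the wait on each attempt. A minimal sketch follows, assuming that "Maximum: 60 seconds" is a cap on the individual wait rather than on total retry time; the helper names are illustrative.

```python
import json

def backoff_delay(attempt, base_s=1.0, cap_s=60.0):
    """Seconds to wait before reconnect attempt N (1-based), capped at cap_s."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    return min(cap_s, base_s * 2 ** (attempt - 1))

def resume_message(stream_id, last_frame_number):
    """Build the resume message sent after a successful reconnect."""
    return json.dumps({"type": "resume", "stream_id": stream_id,
                       "last_frame_number": last_frame_number})

print([backoff_delay(n) for n in range(1, 9)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
print(resume_message("stream_abc123", 1542))
```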

6.7 Session Management

6.7.1 Session Lifecycle

1. Connect (WebSocket handshake)
2. Configure (send config message)
3. Stream (send frames, receive results)
4. Pause (optional)
5. Resume (optional)
6. Close (graceful disconnect)

Session Limits:
  - Maximum duration: 4 hours
  - Maximum idle time: 5 minutes
  - Maximum frames: 1,000,000

6.7.2 Session Summary

When the session closes, the server sends a summary message:

{
    "type": "session_summary",
    "stream_id": "stream_abc123",
    "duration_ms": 3600000,
    "frames_processed": 108000,

    "emotion_distribution": {
        "happiness": 0.45,
        "neutral": 0.35,
        "sadness": 0.12,
        "other": 0.08
    },

    "average_dimensions": {
        "valence": 0.42,
        "arousal": 0.38
    },

    "events": {
        "emotion_changes": 47,
        "high_arousal_moments": 12
    }
}
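
The aggregate fields can be cross-checked on the client from the per-frame results it received. The sketch below assumes that emotion_distribution is the fraction of frames per primary label and that average_dimensions is a plain mean; the chapter does not state how the server computes either, and the summarize helper is a hypothetical name.

```python
from collections import Counter

def summarize(results):
    """Aggregate per-frame dicts with "label", "valence", and "arousal"."""
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    counts = Counter(r["label"] for r in results)
    return {
        "frames_processed": n,
        # Fraction of frames on which each label was the primary emotion.
        "emotion_distribution": {label: c / n for label, c in counts.items()},
        "average_dimensions": {
            "valence": sum(r["valence"] for r in results) / n,
            "arousal": sum(r["arousal"] for r in results) / n,
        },
    }

frames_seen = [
    {"label": "happiness", "valence": 0.5, "arousal": 0.5},
    {"label": "happiness", "valence": 0.5, "arousal": 0.5},
    {"label": "neutral", "valence": 0.0, "arousal": 0.0},
    {"label": "sadness", "valence": 0.0, "arousal": 0.0},
]
print(summarize(frames_seen))
```

In the example summary above, 108,000 frames over 3,600,000 ms corresponds to 30 frames per second, and the distribution values sum to 1.0.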

6.8 Chapter Summary

Key Takeaways:

  1. WebSocket Primary: Real-time bidirectional communication
  2. Low Latency: Target <85ms end-to-end (160ms maximum)
  3. Flexible Output: per_frame, per_second, on_change, threshold
  4. Event-Based: emotion_change, high_arousal, etc.
  5. Security: TLS 1.3, authentication, privacy
  6. Resilience: Reconnection, session resume

Chapter 6 Complete | Approximate pages: 14

Next: Chapter 7 - Phase 4: Integration


WIA - World Certification Industry Association

https://wiastandards.com