πŸ’— WIA Emotion AI Standard Ebook | Chapter 5 of 8


πŸ’— Chapter 5: Phase 2 - API Interface

Hongik Ingan (εΌ˜η›ŠδΊΊι–“)

"Benefit All Humanity"

A well-designed API enables developers to easily integrate emotion recognition into their applications, regardless of the underlying implementation.


5.1 API Design Principles

5.1.1 Core Principles

Principle       Implementation
RESTful         Standard HTTP methods, resource-based URLs
JSON            All requests and responses in JSON format
Versioned       API version in URL path (/v1/)
Authenticated   API keys or OAuth 2.0
Rate Limited    Rate limits published per plan and reported in response headers

5.1.2 Base URL

Production: https://api.wiastandards.com/emotion-ai/v1
Staging:    https://api-staging.wiastandards.com/emotion-ai/v1

5.2 Authentication

5.2.1 API Key Authentication

Header: X-WIA-API-Key: your_api_key_here

Example Request:
curl -X POST https://api.wiastandards.com/emotion-ai/v1/analyze/face \
  -H "X-WIA-API-Key: sk_live_abc123" \
  -H "Content-Type: application/json" \
  -d '{"image_url": "https://example.com/face.jpg"}'

5.2.2 OAuth 2.0 (Optional)

Header: Authorization: Bearer <access_token>

Token Endpoint: POST /oauth/token
Scopes:
  - emotion:read    - Read emotion analysis results
  - emotion:analyze - Submit content for analysis
  - emotion:stream  - Real-time streaming access
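
The sketch below shows one way a client might obtain an access token in Python. The standard specifies only the token endpoint and scopes; the client_credentials grant type, the form-encoded parameter names, and the placement of the token URL under the documented base URL are illustrative assumptions.

# Illustrative sketch: obtain an OAuth 2.0 access token.
# The client_credentials grant and parameter names are assumptions;
# consult your WIA account configuration for the actual flow.
import requests

TOKEN_URL = "https://api.wiastandards.com/emotion-ai/v1/oauth/token"  # assumed location

def get_access_token(client_id: str, client_secret: str) -> str:
    resp = requests.post(
        TOKEN_URL,
        data={
            "grant_type": "client_credentials",   # assumed grant type
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": "emotion:read emotion:analyze",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

# Usage: pass the token in the Authorization header.
# headers = {"Authorization": f"Bearer {get_access_token(cid, secret)}"}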

5.3 Facial Emotion Analysis API

5.3.1 Analyze Face Image

POST /v1/analyze/face

Request Body:
{
    "image_url": "https://example.com/face.jpg",
    // OR
    "image_base64": "data:image/jpeg;base64,/9j/4AAQ...",

    "options": {
        "return_action_units": true,
        "return_dimensions": true,
        "return_landmarks": false,
        "min_face_size": 50,
        "max_faces": 5
    }
}

Response (200 OK):
{
    "request_id": "req_abc123",
    "processing_time_ms": 145,
    "faces": [
        {
            "face_id": 0,
            "bbox": { "x": 120, "y": 80, "width": 200, "height": 250 },
            "emotions": {
                "primary": { "label": "happiness", "confidence": 0.87 },
                "all": [
                    { "label": "happiness", "confidence": 0.87 },
                    { "label": "neutral", "confidence": 0.08 },
                    { "label": "surprise", "confidence": 0.05 }
                ]
            },
            "dimensions": {
                "valence": 0.72,
                "arousal": 0.45
            },
            "action_units": [
                { "au": "AU6", "intensity": 0.8 },
                { "au": "AU12", "intensity": 0.9 }
            ]
        }
    ]
}
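
A minimal Python sketch of the call above, sending a local file as image_base64 and printing the primary emotion per detected face. The helper name and option values are illustrative; the request and response fields follow the schema shown.

# Minimal sketch: analyze a local image via POST /analyze/face.
import base64
import requests

BASE_URL = "https://api.wiastandards.com/emotion-ai/v1"
API_KEY = "sk_live_abc123"  # replace with your key

def analyze_face(image_path: str) -> dict:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    payload = {
        "image_base64": f"data:image/jpeg;base64,{b64}",
        "options": {"return_action_units": True, "return_dimensions": True},
    }
    resp = requests.post(
        f"{BASE_URL}/analyze/face",
        headers={"X-WIA-API-Key": API_KEY, "Content-Type": "application/json"},
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

result = analyze_face("face.jpg")
for face in result["faces"]:
    primary = face["emotions"]["primary"]
    print(face["face_id"], primary["label"], primary["confidence"])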

5.3.2 Analyze Video Frame

POST /v1/analyze/face/video

Request Body:
{
    "video_url": "https://example.com/video.mp4",
    // OR
    "frames_base64": ["data:image/jpeg;base64,...", ...],

    "options": {
        "sample_rate": 5,  // Analyze every 5th frame
        "start_time_ms": 0,
        "end_time_ms": 10000,
        "track_faces": true
    }
}

Response (200 OK):
{
    "request_id": "req_video123",
    "duration_ms": 10000,
    "frame_count": 60,
    "analyzed_frames": 12,
    "timeline": [
        {
            "timestamp_ms": 0,
            "faces": [{ ... }]
        },
        {
            "timestamp_ms": 833,
            "faces": [{ ... }]
        }
    ],
    "summary": {
        "dominant_emotion": "happiness",
        "average_valence": 0.65,
        "average_arousal": 0.42,
        "emotion_transitions": 3
    }
}
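
As a post-processing illustration, the sketch below derives an emotion-transition count from a timeline like the one above. The field names follow the response schema; the counting rule (a transition whenever the primary label of the first detected face changes between analyzed frames) is an assumption about how such a summary value could be computed client-side.

# Illustrative sketch: count emotion transitions in a
# /analyze/face/video timeline (rule is an assumption).
def count_emotion_transitions(timeline: list[dict]) -> int:
    transitions = 0
    previous_label = None
    for frame in timeline:
        faces = frame.get("faces", [])
        if not faces:
            continue
        label = faces[0]["emotions"]["primary"]["label"]
        if previous_label is not None and label != previous_label:
            transitions += 1
        previous_label = label
    return transitions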

5.4 Voice Emotion Analysis API

5.4.1 Analyze Audio

POST /v1/analyze/voice

Request Body:
{
    "audio_url": "https://example.com/audio.wav",
    // OR
    "audio_base64": "data:audio/wav;base64,...",

    "options": {
        "language": "en-US",
        "return_transcript": true,
        "return_prosody": true,
        "segment_by": "utterance"
    }
}

Response (200 OK):
{
    "request_id": "req_voice456",
    "duration_ms": 5500,
    "language_detected": "en-US",
    "transcript": "I'm really excited about this opportunity!",

    "emotions": {
        "primary": { "label": "excitement", "confidence": 0.82 },
        "all": [
            { "label": "excitement", "confidence": 0.82 },
            { "label": "happiness", "confidence": 0.65 },
            { "label": "neutral", "confidence": 0.12 }
        ]
    },

    "dimensions": {
        "valence": 0.78,
        "arousal": 0.85
    },

    "prosody": {
        "pitch_mean_hz": 185.5,
        "pitch_range_hz": 120.3,
        "intensity_db": 68.2,
        "speech_rate_wpm": 145,
        "pause_ratio": 0.12
    },

    "segments": [
        {
            "start_ms": 0,
            "end_ms": 2500,
            "text": "I'm really excited",
            "emotion": "excitement",
            "confidence": 0.85
        },
        {
            "start_ms": 2500,
            "end_ms": 5500,
            "text": "about this opportunity!",
            "emotion": "happiness",
            "confidence": 0.78
        }
    ]
}
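
The following Python sketch submits an audio URL to the endpoint above and prints the emotion for each returned segment. Option values and timeout are illustrative; the fields follow the response schema.

# Minimal sketch: analyze hosted audio via POST /analyze/voice.
import requests

BASE_URL = "https://api.wiastandards.com/emotion-ai/v1"

def analyze_voice(audio_url: str, api_key: str) -> dict:
    resp = requests.post(
        f"{BASE_URL}/analyze/voice",
        headers={"X-WIA-API-Key": api_key},
        json={
            "audio_url": audio_url,
            "options": {"return_transcript": True,
                        "return_prosody": True,
                        "segment_by": "utterance"},
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

result = analyze_voice("https://example.com/audio.wav", "sk_live_abc123")
for seg in result["segments"]:
    print(f'{seg["start_ms"]}-{seg["end_ms"]}ms: '
          f'{seg["emotion"]} ({seg["confidence"]:.2f}) "{seg["text"]}"')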

5.4.2 Real-time Voice Analysis

POST /v1/analyze/voice/stream

Request: WebSocket upgrade (see Chapter 6)

Initial Message:
{
    "type": "config",
    "sample_rate": 16000,
    "encoding": "LINEAR16",
    "language": "en-US"
}

Audio Chunks:
Binary audio data (PCM)

Response Messages:
{
    "type": "partial",
    "timestamp_ms": 1500,
    "emotion": { "label": "neutral", "confidence": 0.7 }
}

{
    "type": "final",
    "segment": {
        "start_ms": 0,
        "end_ms": 3000,
        "text": "Hello, how are you?",
        "emotion": { "label": "happiness", "confidence": 0.82 }
    }
}
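
A rough client sketch of this streaming flow is shown below, using the third-party websockets package. The wss:// URL and the way the API key is attached are assumptions for illustration only; Chapter 6 defines the streaming protocol in full.

# Rough sketch of the real-time voice stream (full protocol: Chapter 6).
# The wss:// URL and api_key query parameter are assumptions.
import asyncio
import json
import websockets

WS_URL = "wss://api.wiastandards.com/emotion-ai/v1/analyze/voice/stream"  # assumed

async def stream_audio(pcm_chunks, api_key: str):
    async with websockets.connect(f"{WS_URL}?api_key={api_key}") as ws:
        # 1. Send the configuration message.
        await ws.send(json.dumps({
            "type": "config",
            "sample_rate": 16000,
            "encoding": "LINEAR16",
            "language": "en-US",
        }))
        # 2. Send binary PCM chunks.
        for chunk in pcm_chunks:
            await ws.send(chunk)
        # 3. Read partial and final emotion messages until the server closes.
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "partial":
                print("partial:", event["emotion"]["label"])
            elif event["type"] == "final":
                print("final:", event["segment"]["emotion"]["label"])

# asyncio.run(stream_audio(chunks, "sk_live_abc123"))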

5.5 Text Sentiment Analysis API

5.5.1 Analyze Text

POST /v1/analyze/text

Request Body:
{
    "text": "I absolutely love this product! Best purchase ever.",
    "language": "en",

    "options": {
        "return_aspects": true,
        "return_entities": true,
        "detect_sarcasm": true
    }
}

Response (200 OK):
{
    "request_id": "req_text789",
    "text_length": 52,
    "language": "en",

    "sentiment": {
        "polarity": 0.92,
        "subjectivity": 0.85,
        "label": "very_positive"
    },

    "emotions": {
        "primary": { "label": "happiness", "confidence": 0.91 },
        "all": [
            { "label": "happiness", "confidence": 0.91 },
            { "label": "excitement", "confidence": 0.75 },
            { "label": "satisfaction", "confidence": 0.68 }
        ]
    },

    "dimensions": {
        "valence": 0.88,
        "arousal": 0.65
    },

    "aspects": [
        {
            "aspect": "product",
            "sentiment": 0.95,
            "mentions": ["product", "purchase"]
        }
    ],

    "entities": [
        {
            "text": "product",
            "type": "PRODUCT",
            "emotion": "happiness",
            "sentiment": 0.95
        }
    ],

    "sarcasm": {
        "detected": false,
        "confidence": 0.02
    }
}
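
Once the response is parsed, the relevant fields can be summarized for display or logging. The helper below is an illustrative sketch over the schema above; the sarcasm note reflects a common interpretation (detected sarcasm may invert the apparent polarity) rather than a rule of the standard.

# Illustrative sketch: summarize a /analyze/text response.
def summarize_text_result(result: dict) -> str:
    sentiment = result["sentiment"]
    primary = result["emotions"]["primary"]
    lines = [
        f'Sentiment: {sentiment["label"]} (polarity {sentiment["polarity"]:+.2f})',
        f'Primary emotion: {primary["label"]} ({primary["confidence"]:.2f})',
    ]
    for aspect in result.get("aspects", []):
        lines.append(f'Aspect "{aspect["aspect"]}": sentiment {aspect["sentiment"]:+.2f}')
    if result.get("sarcasm", {}).get("detected"):
        lines.append("Note: sarcasm detected; stated polarity may not be literal.")
    return "\n".join(lines)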

5.5.2 Batch Text Analysis

POST /v1/analyze/text/batch

Request Body:
{
    "texts": [
        { "id": "review_1", "text": "Great product!" },
        { "id": "review_2", "text": "Terrible experience." },
        { "id": "review_3", "text": "It was okay, nothing special." }
    ],
    "language": "en"
}

Response (200 OK):
{
    "request_id": "req_batch001",
    "results": [
        {
            "id": "review_1",
            "sentiment": { "polarity": 0.85, "label": "positive" },
            "emotion": { "label": "happiness", "confidence": 0.82 }
        },
        {
            "id": "review_2",
            "sentiment": { "polarity": -0.78, "label": "negative" },
            "emotion": { "label": "anger", "confidence": 0.71 }
        },
        {
            "id": "review_3",
            "sentiment": { "polarity": 0.1, "label": "neutral" },
            "emotion": { "label": "neutral", "confidence": 0.85 }
        }
    ],
    "summary": {
        "average_sentiment": 0.06,
        "positive_count": 1,
        "negative_count": 1,
        "neutral_count": 1
    }
}
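
When a corpus is larger than a single call can comfortably carry, texts can be submitted in chunks and the per-item results concatenated. The sketch below assumes a hypothetical 100-item chunk size; the standard does not specify a batch size limit.

# Illustrative sketch: submit many texts via /analyze/text/batch in chunks.
# The 100-item chunk size is a hypothetical limit, not one defined here.
import requests

BASE_URL = "https://api.wiastandards.com/emotion-ai/v1"

def analyze_texts(texts: list[dict], api_key: str, chunk_size: int = 100) -> list[dict]:
    results = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        resp = requests.post(
            f"{BASE_URL}/analyze/text/batch",
            headers={"X-WIA-API-Key": api_key},
            json={"texts": chunk, "language": "en"},
            timeout=60,
        )
        resp.raise_for_status()
        results.extend(resp.json()["results"])
    return results

reviews = [{"id": f"review_{i}", "text": t} for i, t in
           enumerate(["Great product!", "Terrible experience."], start=1)]
print(analyze_texts(reviews, "sk_live_abc123"))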

5.6 Biosignal Analysis API

5.6.1 Analyze Biosignals

POST /v1/analyze/biosignal

Request Body:
{
    "signals": {
        "ecg": {
            "sample_rate": 256,
            "data": [0.12, 0.15, 0.18, ...],
            "unit": "mV"
        },
        "eda": {
            "sample_rate": 4,
            "data": [2.5, 2.6, 2.8, ...],
            "unit": "uS"
        }
    },
    "duration_ms": 60000,

    "options": {
        "return_hrv": true,
        "return_stress": true,
        "return_engagement": true
    }
}

Response (200 OK):
{
    "request_id": "req_bio001",
    "duration_ms": 60000,

    "heart_rate": {
        "mean_bpm": 72,
        "min_bpm": 65,
        "max_bpm": 82,
        "std_bpm": 5.2
    },

    "hrv": {
        "rmssd_ms": 42.5,
        "sdnn_ms": 55.3,
        "pnn50": 0.18,
        "lf_hf_ratio": 1.8
    },

    "eda": {
        "scl_mean": 3.2,
        "scr_count": 5,
        "scr_amplitude_mean": 0.8
    },

    "derived_states": {
        "stress_level": 0.35,
        "relaxation": 0.55,
        "engagement": 0.72,
        "arousal": 0.45
    },

    "timeline": [
        { "time_ms": 0, "stress": 0.3, "engagement": 0.7 },
        { "time_ms": 10000, "stress": 0.35, "engagement": 0.75 },
        { "time_ms": 20000, "stress": 0.4, "engagement": 0.68 }
    ]
}
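
Building the request body requires raw samples plus their sample rates. The sketch below assembles one from in-memory lists and checks that both channels cover roughly the same time window; the consistency check and tolerance are the author's suggestion, not a requirement of the standard.

# Illustrative sketch: assemble a /analyze/biosignal request body.
def build_biosignal_request(ecg: list[float], eda: list[float],
                            ecg_rate: int = 256, eda_rate: int = 4) -> dict:
    duration_ms = int(len(ecg) / ecg_rate * 1000)
    eda_duration_ms = int(len(eda) / eda_rate * 1000)
    # Suggested sanity check: both channels should span a similar window.
    if abs(duration_ms - eda_duration_ms) > 1000:
        raise ValueError("ECG and EDA cover different time windows")
    return {
        "signals": {
            "ecg": {"sample_rate": ecg_rate, "data": ecg, "unit": "mV"},
            "eda": {"sample_rate": eda_rate, "data": eda, "unit": "uS"},
        },
        "duration_ms": duration_ms,
        "options": {"return_hrv": True, "return_stress": True,
                    "return_engagement": True},
    }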

5.7 Multimodal Fusion API

5.7.1 Multimodal Analysis

POST /v1/analyze/multimodal

Request Body:
{
    "modalities": {
        "face": {
            "image_url": "https://example.com/face.jpg"
        },
        "voice": {
            "audio_url": "https://example.com/audio.wav"
        },
        "text": {
            "text": "I'm feeling great today!"
        }
    },

    "fusion": {
        "method": "weighted_average",
        "weights": {
            "face": 0.5,
            "voice": 0.3,
            "text": 0.2
        }
    }
}

Response (200 OK):
{
    "request_id": "req_multi001",

    "fused_result": {
        "emotions": {
            "primary": { "label": "happiness", "confidence": 0.88 }
        },
        "dimensions": {
            "valence": 0.75,
            "arousal": 0.52
        }
    },

    "modality_results": {
        "face": {
            "emotions": { "primary": { "label": "happiness", "confidence": 0.85 } },
            "dimensions": { "valence": 0.72, "arousal": 0.48 }
        },
        "voice": {
            "emotions": { "primary": { "label": "happiness", "confidence": 0.82 } },
            "dimensions": { "valence": 0.78, "arousal": 0.55 }
        },
        "text": {
            "emotions": { "primary": { "label": "happiness", "confidence": 0.90 } },
            "dimensions": { "valence": 0.80, "arousal": 0.60 }
        }
    },

    "fusion_details": {
        "method": "weighted_average",
        "weights_used": { "face": 0.5, "voice": 0.3, "text": 0.2 },
        "agreement_score": 0.92
    }
}
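
For the weighted_average method, the fused dimensional scores in the example can be reproduced directly from the per-modality results and the requested weights, as the short check below shows.

# Reproduce the fused valence/arousal from the example response
# using the weighted_average method and the requested weights.
weights = {"face": 0.5, "voice": 0.3, "text": 0.2}
dimensions = {
    "face":  {"valence": 0.72, "arousal": 0.48},
    "voice": {"valence": 0.78, "arousal": 0.55},
    "text":  {"valence": 0.80, "arousal": 0.60},
}

fused = {axis: sum(weights[m] * dimensions[m][axis] for m in weights)
         for axis in ("valence", "arousal")}
print(f'valence={fused["valence"]:.3f}, arousal={fused["arousal"]:.3f}')
# valence=0.754, arousal=0.525 -> reported as 0.75 / 0.52 in fused_result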

5.8 Error Handling

5.8.1 Error Response Format

{
    "error": {
        "code": "INVALID_IMAGE",
        "message": "The provided image could not be processed",
        "details": {
            "reason": "No face detected in image",
            "suggestion": "Ensure the image contains a clearly visible face"
        }
    },
    "request_id": "req_err001"
}

5.8.2 Error Codes

HTTP Status   Error Code         Description
400           INVALID_REQUEST    Malformed request body
400           INVALID_IMAGE      Image cannot be processed
400           NO_FACE_DETECTED   No face found in image
401           UNAUTHORIZED       Invalid or missing API key
403           FORBIDDEN          Insufficient permissions
429           RATE_LIMITED       Too many requests
500           INTERNAL_ERROR     Server error
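
A small client-side helper can translate this envelope into exceptions. The sketch below is illustrative; treating RATE_LIMITED and INTERNAL_ERROR as retryable is a common convention, not a rule of the standard.

# Illustrative sketch: turn the error envelope into Python exceptions.
import requests

RETRYABLE_CODES = {"RATE_LIMITED", "INTERNAL_ERROR"}  # assumed convention

class WIAAPIError(Exception):
    def __init__(self, code: str, message: str, request_id: str, retryable: bool):
        super().__init__(f"{code}: {message} (request_id={request_id})")
        self.code = code
        self.retryable = retryable

def check_response(resp: requests.Response) -> dict:
    if resp.ok:
        return resp.json()
    body = resp.json()
    error = body["error"]
    raise WIAAPIError(
        code=error["code"],
        message=error["message"],
        request_id=body.get("request_id", ""),
        retryable=error["code"] in RETRYABLE_CODES,
    )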

5.9 Rate Limits

Plan         Requests/Minute   Requests/Day
Free         10                100
Developer    60                10,000
Business     300               100,000
Enterprise   Custom            Custom
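
Clients should back off when they receive 429 RATE_LIMITED. The sketch below retries with exponential backoff and honors a Retry-After header if the server sends one; the specific rate-limit header names are an assumption, as this chapter does not enumerate them.

# Illustrative sketch: retry a request on 429 with exponential backoff.
# Honoring Retry-After is an assumption about the server's headers.
import time
import requests

def post_with_backoff(url: str, api_key: str, payload: dict,
                      max_retries: int = 5) -> requests.Response:
    delay = 1.0
    for attempt in range(max_retries + 1):
        resp = requests.post(url, headers={"X-WIA-API-Key": api_key},
                             json=payload, timeout=30)
        if resp.status_code != 429:
            return resp
        retry_after = resp.headers.get("Retry-After")
        wait = float(retry_after) if retry_after else delay
        time.sleep(wait)
        delay *= 2  # double the wait between attempts
    return resp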

5.10 Chapter Summary

Key Takeaways:

  1. RESTful Design: Standard HTTP methods, JSON format
  2. Four Modality APIs: Face, Voice, Text, Biosignal
  3. Multimodal Fusion: Combine modalities with configurable weights
  4. Comprehensive Output: Emotions, dimensions, AUs, metadata
  5. Error Handling: Clear error codes and messages
  6. Rate Limiting: Tiered plans for different use cases

Chapter 5 Complete | Approximate pages: 16

Next: Chapter 6 - Phase 3: Streaming Protocol


WIA - World Certification Industry Association

Hongik Ingan - Benefit All Humanity

https://wiastandards.com