WIA Emotion AI Standard Ebook | Chapter 3 of 8
Hongik Ingan (弘益人間)
"Benefit All Humanity"
The WIA Emotion AI Standard is a comprehensive framework for ethical, accurate, and interoperable affective computing. It aims to establish a universal, open standard for emotion recognition systems that prioritizes human wellbeing, privacy, and accuracy while enabling innovation and interoperability.
| Goal | Description | Benefit |
|---|---|---|
| Interoperability | Common data formats and APIs | No vendor lock-in |
| Accuracy | Minimum accuracy thresholds | Reliable results |
| Ethics | Privacy and consent requirements | Responsible AI |
| Fairness | Bias testing requirements | Equitable performance |
| Transparency | Clear documentation | User understanding |
The WIA Emotion AI Standard is organized into four phases, each addressing a specific layer of the affective computing stack:
```
┌──────────────────────────────────────────────────────────────┐
│ Phase 4: Integration                                         │
│ Healthcare │ Education │ Marketing │ Automotive │ XR         │
├──────────────────────────────────────────────────────────────┤
│ Phase 3: Protocol                                            │
│ WebSocket │ REST │ Real-time Streaming │ Security            │
├──────────────────────────────────────────────────────────────┤
│ Phase 2: API Interface                                       │
│ Facial │ Voice │ Text │ Biosignal │ Multimodal Fusion        │
├──────────────────────────────────────────────────────────────┤
│ Phase 1: Data Format                                         │
│ JSON Schema │ Emotions │ AU Codes │ V-A │ Metadata           │
└──────────────────────────────────────────────────────────────┘
```
The WIA Standard supports Ekman's six basic emotions plus neutral:
| Emotion | Label (EN) | Label (KO) | Emoji | V-A Typical Range |
|---|---|---|---|---|
| Happiness | happiness | 행복 | 😊 | V: 0.5~1.0, A: 0.2~0.8 |
| Sadness | sadness | 슬픔 | 😢 | V: -0.8~-0.3, A: -0.5~0.1 |
| Anger | anger | 분노 | 😠 | V: -0.7~-0.2, A: 0.3~0.9 |
| Fear | fear | 공포 | 😨 | V: -0.7~-0.2, A: 0.4~0.9 |
| Disgust | disgust | 혐오 | 🤢 | V: -0.8~-0.3, A: -0.1~0.5 |
| Surprise | surprise | 놀람 | 😮 | V: -0.2~0.5, A: 0.5~1.0 |
| Neutral | neutral | 중립 | 😐 | V: -0.2~0.2, A: -0.2~0.2 |
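The taxonomy above can be expressed as data. The sketch below encodes the seven basic labels with their typical V-A ranges from the table; the dictionary layout and field names are illustrative, not the normative WIA schema.

```python
# The seven WIA basic emotion labels with the typical valence (V) and
# arousal (A) ranges from the table above. Layout is illustrative only.
BASIC_EMOTIONS = {
    "happiness": {"valence": (0.5, 1.0),   "arousal": (0.2, 0.8)},
    "sadness":   {"valence": (-0.8, -0.3), "arousal": (-0.5, 0.1)},
    "anger":     {"valence": (-0.7, -0.2), "arousal": (0.3, 0.9)},
    "fear":      {"valence": (-0.7, -0.2), "arousal": (0.4, 0.9)},
    "disgust":   {"valence": (-0.8, -0.3), "arousal": (-0.1, 0.5)},
    "surprise":  {"valence": (-0.2, 0.5),  "arousal": (0.5, 1.0)},
    "neutral":   {"valence": (-0.2, 0.2),  "arousal": (-0.2, 0.2)},
}

def in_typical_range(label, valence, arousal):
    """Check whether a (V, A) point lies in the typical range for a label."""
    r = BASIC_EMOTIONS[label]
    return (r["valence"][0] <= valence <= r["valence"][1]
            and r["arousal"][0] <= arousal <= r["arousal"][1])
```

Note that the ranges overlap (anger and fear occupy nearly the same region), which is why classifiers emit discrete labels alongside V-A coordinates rather than deriving one from the other.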
The WIA Standard also supports dimensional representation:
Valence-Arousal Space:
```
                 +1.0 (High Arousal)
                         │
         Angry           │           Excited
                         │
 -1.0 ───────────────────┼─────────────────── +1.0
 (Negative)              │              (Positive)
                         │
         Sad             │           Calm
                         │
                 -1.0 (Low Arousal)
```
Value Ranges:
- Valence: -1.0 (most negative) to +1.0 (most positive)
- Arousal: -1.0 (low energy) to +1.0 (high energy)
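A minimal sketch of mapping a V-A point to its quadrant in the diagram above. The quadrant names follow the diagram labels; the function itself is an assumption for illustration, not part of the standard.

```python
def va_quadrant(valence, arousal):
    """Map a valence-arousal point to its quadrant in the V-A space.

    Quadrant names follow the diagram labels (illustrative helper).
    """
    if not (-1.0 <= valence <= 1.0 and -1.0 <= arousal <= 1.0):
        raise ValueError("valence and arousal must be in [-1.0, 1.0]")
    if valence >= 0:
        return "excited" if arousal >= 0 else "calm"
    return "angry" if arousal >= 0 else "sad"
```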
Beyond basic emotions, the standard supports extended labels for finer granularity:
| Category | Extended Labels |
|---|---|
| Positive High-Arousal | excited, elated, enthusiastic, amused |
| Positive Low-Arousal | content, relaxed, calm, serene |
| Negative High-Arousal | stressed, anxious, frustrated, irritated |
| Negative Low-Arousal | bored, tired, depressed, melancholic |
| Cognitive States | confused, focused, interested, engaged |
The WIA Standard supports the complete FACS system with 44 Action Units:
| AU Range | Region | Count |
|---|---|---|
| AU1-AU7 | Upper Face (Brows, Forehead) | 7 AUs |
| AU9-AU17 | Nose and Upper Lip | 8 AUs |
| AU18-AU28 | Lower Face (Lips, Jaw) | 11 AUs |
| AU41-AU46 | Eyelids | 6 AUs |
| AU51-AU58 | Head Position | 8 AUs |
| AU61-AU64 | Eye Position | 4 AUs |
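The range table above lends itself to a simple lookup. This sketch mirrors the table's AU ranges (end bounds made inclusive); the helper name is an assumption for illustration.

```python
# Illustrative lookup from AU number to facial region, mirroring the
# WIA range table above. Ranges are inclusive of both endpoints.
AU_REGIONS = [
    (range(1, 8),   "Upper Face (Brows, Forehead)"),
    (range(9, 18),  "Nose and Upper Lip"),
    (range(18, 29), "Lower Face (Lips, Jaw)"),
    (range(41, 47), "Eyelids"),
    (range(51, 59), "Head Position"),
    (range(61, 65), "Eye Position"),
]

def au_region(au):
    """Return the facial region for an AU number, per the table above."""
    for rng, region in AU_REGIONS:
        if au in rng:
            return region
    raise ValueError(f"AU{au} is outside the WIA-supported ranges")
```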
Action Unit intensities are encoded on a 0-1 scale:
Intensity Mapping:

| Intensity | FACS Grade |
|---|---|
| 0.00 | Not present |
| 0.01-0.20 | Trace (A) |
| 0.21-0.40 | Slight (B) |
| 0.41-0.60 | Marked (C) |
| 0.61-0.80 | Pronounced (D) |
| 0.81-1.00 | Maximum (E) |
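The mapping above is a straightforward threshold chain. A minimal sketch, assuming intensities have already been normalized to the 0-1 scale:

```python
def intensity_grade(intensity):
    """Map a normalized 0-1 AU intensity to its FACS letter grade."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be in [0.0, 1.0]")
    if intensity == 0.0:
        return "not present"
    # Upper bounds of each band, per the intensity mapping above.
    for upper, grade in [(0.20, "A"), (0.40, "B"), (0.60, "C"),
                         (0.80, "D"), (1.00, "E")]:
        if intensity <= upper:
            return grade
```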
| Facial Analysis | Specification |
|---|---|
| Input Type | Image (JPEG, PNG) or Video (H.264, VP9) |
| Resolution | Minimum 480p, Recommended 720p+ |
| Frame Rate | Minimum 15 fps, Recommended 30 fps |
| Output | Emotion labels, AU intensities, V-A coordinates |
| Latency Target | < 100 ms per frame |
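To make the three facial outputs concrete, here is a hypothetical per-frame result combining emotion labels, AU intensities, and V-A coordinates. All field names are illustrative assumptions, not the normative WIA schema (which Chapter 4 covers).

```python
import json

# Hypothetical per-frame facial analysis result. Field names are
# illustrative only; see the Phase 1 data format for the real schema.
frame_result = {
    "timestamp_ms": 1234,
    "emotions": {"happiness": 0.82, "neutral": 0.11, "surprise": 0.07},
    "action_units": {"AU6": 0.64, "AU12": 0.71},  # cheek raiser, lip corner puller
    "valence": 0.62,
    "arousal": 0.41,
}

payload = json.dumps(frame_result)  # wire format for the API layer
```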
| Voice Analysis | Specification |
|---|---|
| Input Type | Audio (WAV, MP3, WebM) |
| Sample Rate | Minimum 16 kHz, Recommended 44.1 kHz |
| Channels | Mono or Stereo |
| Features | Pitch, intensity, speech rate, voice quality, prosody |
| Output | Emotion labels, V-A coordinates, confidence |
| Text Analysis | Specification |
|---|---|
| Input Type | UTF-8 text |
| Languages | 100+ languages supported |
| Max Length | 10,000 characters per request |
| Output | Sentiment polarity, emotion labels, entity emotions |
| Features | Sarcasm detection, aspect sentiment, intensity |
| Biosignal Analysis | Specification |
|---|---|
| Supported Signals | ECG/HR, EDA/GSR, EEG, Respiration |
| Sample Rates | HR: 1 Hz+, EDA: 4 Hz+, EEG: 128 Hz+ |
| Format | JSON array or CSV |
| Output | Arousal level, stress indicators, engagement |
The WIA Standard supports multiple fusion approaches:
| Strategy | Description | Use Case |
|---|---|---|
| Early Fusion | Combine raw features before classification | When modalities are synchronized |
| Late Fusion | Combine classification outputs | When modalities are independent |
| Decision Fusion | Voting or weighted averaging of decisions | Simple, robust approach |
| Attention Fusion | Learned weights based on context | When reliability varies |
Default weights for multimodal fusion:
Default Weights (configurable):
- Facial: 0.40 (highest reliability for discrete emotions)
- Voice: 0.25 (good for arousal detection)
- Text: 0.20 (context-dependent)
- Biosignal: 0.15 (hard to fake, but noisy)

Weights should be adjusted based on:
- Signal quality
- Context (e.g., voice-only call)
- Cultural factors
- Individual calibration
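A minimal sketch of weighted late fusion using the default weights above. Dropping missing modalities and renormalizing the remaining weights is one way to handle the context adjustments the standard calls for (e.g., a voice-only call); the function name and signature are assumptions.

```python
def fuse_valence(scores, weights=None):
    """Weighted late fusion of per-modality valence scores.

    `scores` maps modality name to a valence estimate in [-1, 1].
    Missing modalities are dropped and the remaining default WIA
    weights renormalized. Illustrative sketch, not the normative API.
    """
    defaults = {"facial": 0.40, "voice": 0.25, "text": 0.20, "biosignal": 0.15}
    w = weights or defaults
    active = {m: w[m] for m in scores if m in w}
    total = sum(active.values())
    if total == 0:
        raise ValueError("no weighted modalities present")
    return sum(scores[m] * active[m] / total for m in active)
```

With all four modalities present the default weights sum to 1.0, so the result is a plain weighted average; with only voice present, the voice estimate passes through unchanged.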
| Level | Name | Requirements | Use Cases |
|---|---|---|---|
| 1 | Compliant | Follows data format, 75% accuracy | Research, prototypes |
| 2 | Certified | Full API compliance, 80% accuracy, bias testing | Commercial products |
| 3 | Certified Plus | All requirements, 85% accuracy, audited | Healthcare, sensitive apps |
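The level table above can be read as a cascade of requirements. A sketch under the assumption that accuracy is a fraction and that compliance checks reduce to booleans; the predicate names are illustrative:

```python
def certification_level(accuracy, api_compliant, bias_tested, audited):
    """Return the highest WIA certification level met (0 = none).

    Thresholds follow the certification table; the boolean predicates
    are illustrative stand-ins for the full compliance checks.
    """
    level = 0
    if accuracy >= 0.75:
        level = 1  # Compliant: data format + 75% accuracy
    if accuracy >= 0.80 and api_compliant and bias_tested:
        level = 2  # Certified: full API compliance + bias testing
    if accuracy >= 0.85 and api_compliant and bias_tested and audited:
        level = 3  # Certified Plus: all requirements + audit
    return level
```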
Key Takeaways:
- The standard is organized into four phases: data format, API interface, protocol, and integration.
- Seven basic emotion labels (Ekman's six plus neutral) are complemented by dimensional valence-arousal coordinates and extended labels for finer granularity.
- Facial analysis uses the full FACS system of 44 Action Units, with intensities encoded on a 0-1 scale mapped to FACS grades A-E.
- Multimodal fusion combines facial, voice, text, and biosignal channels with configurable weights.
- Three certification levels (Compliant, Certified, Certified Plus) set rising accuracy, bias-testing, and auditing requirements.
In Chapter 4, we will dive deep into Phase 1: Emotion Data Format, covering JSON schemas, field specifications, and practical examples.
Chapter 3 Complete
Next: Chapter 4 - Phase 1: Emotion Data Format
WIA - World Certification Industry Association