Reading the Mind in the Eyes Test - Revised (RMET-R)

Version: v1 (current)

An advanced theory of mind assessment measuring the ability to infer complex mental states from facial expressions in the eye region.

Overview

The Reading the Mind in the Eyes Test - Revised (RMET-R) is the updated version of the classic Mind in the Eyes task, with improved stimuli, more sophisticated mental state terms, and enhanced psychometric properties. Participants view photographs showing only the eye region of faces and select which complex mental state (from four options) best describes what the person is thinking or feeling.

Unlike basic emotion recognition tasks, the RMET-R requires nuanced social-cognitive inference about sophisticated mental states like "contemplative," "suspicious," "playful," "skeptical," or "thoughtful." The task taps into advanced theory of mind abilities essential for understanding others' intentions and navigating complex social situations.

The RMET-R is often described as the gold-standard measure of adult-level mentalizing ability, although its construct validity has been questioned (Oakley et al., 2016). It is used extensively in autism research, social neuroscience, empathy studies, and clinical assessment of social cognition deficits.

Scientific Background

Classic Findings:

Theory of Mind Sensitivity: Requires inferring internal mental states from minimal visual cues
Autism Marker: Individuals with autism spectrum conditions score significantly lower (mean ~22/36 vs. ~26/36 in controls)
Sex Differences: Women typically score 1-2 points higher than men on average
Brain Correlates: Activates medial prefrontal cortex (mPFC), temporoparietal junction (TPJ), and superior temporal sulcus (STS)
Empathy Relationship: Correlates moderately with self-reported empathy (r ~ 0.3-0.4)
Stability: Test-retest reliability ~0.63, indicating moderate stability

Key Mechanisms:

Mentalizing: Attributing mental states to others (beliefs, desires, intentions)
Social Perception: Extracting social information from facial features
Semantic Knowledge: Understanding nuanced emotion/cognition vocabulary
Integrative Processing: Combining perceptual cues with social knowledge

Seminal Papers:

Baron-Cohen, Wheelwright, Hill, Raste, & Plumb (2001): The "Reading the Mind in the Eyes" Test revised version: A study with normal adults, and adults with Asperger syndrome or high-functioning autism
Baron-Cohen et al. (2015): Elevated fetal steroidogenic activity in autism
Oakley, Brewer, Bird, & Catmur (2016): Theory of mind is not theory of emotion

Why Researchers Use This Task

Autism Research: Core social-cognitive assessment; identifies mentalizing deficits even in high-functioning individuals
Social Neuroscience: Map neural networks supporting theory of mind using fMRI/EEG
Empathy Studies: Correlate mentalizing ability with empathy, prosocial behavior, and relationship quality
Clinical Assessment: Evaluate social cognition in schizophrenia, personality disorders, social anxiety, depression
Developmental Research: Track development of advanced mentalizing across the lifespan (childhood through aging)
Intervention Studies: Measure improvements in social cognition following therapy or training

Current Implementation Status

Fully Implemented:

✅ 36-item revised version with standardized eye stimuli
✅ Four-option forced-choice format
✅ Correct answer key for automated scoring
✅ Response time measurement per item
✅ Practice trials with feedback
✅ Glossary of mental state terms (optional accessibility feature)

Partially Implemented:

⚠️ Limited to digital stimuli (original paper version used physical booklet)
⚠️ English mental state terms only (no multilingual versions yet)

Not Yet Implemented:

❌ Child version (simplified terms and stimuli)
❌ Short-form version (10-12 items for screening)
❌ Adaptive administration based on performance

Configuration Parameters

Task Structure

Parameter	Type	Default	Description
Num Items	number	36	Number of eye stimuli (standard = 36)
Randomize Order	boolean	true	Randomize item presentation order
Show Glossary	boolean	false	Provide definitions of mental state terms
Self Paced	boolean	true	Allow unlimited time per item (recommended)

Response Parameters

Parameter	Type	Default	Description
Num Options	number	4	Response choices per item (always 4 in standard RMET-R)
Response Timeout (ms)	number	0	Max time per item (0 = unlimited, recommended)
Show Progress	boolean	true	Display item number (e.g., "Item 5 of 36")

Practice Configuration

Parameter	Type	Default	Description
Practice Mode	string	'optional'	Practice availability ('none', 'optional', 'mandatory')
Practice Items	number	2	Number of practice items with feedback

Accessibility

Parameter	Type	Default	Description
Font Size (px)	number	18	Size of mental state term text
High Contrast	boolean	false	Enhanced visual contrast for low vision
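The parameter tables above can be collected into a single configuration object. The following is a minimal sketch, not the platform's actual schema: the class name `RmetConfig`, the snake_case field names, and the `problems` helper are all hypothetical, and the rule that a self-paced task should not also carry a timeout is an assumption drawn from the parameter descriptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RmetConfig:
    """Hypothetical container; defaults mirror the parameter tables."""
    num_items: int = 36
    randomize_order: bool = True
    show_glossary: bool = False
    self_paced: bool = True
    num_options: int = 4
    response_timeout_ms: int = 0  # 0 = unlimited
    show_progress: bool = True
    practice_mode: str = "optional"
    practice_items: int = 2
    font_size_px: int = 18
    high_contrast: bool = False

    def problems(self) -> List[str]:
        """Return a list of inconsistencies; an empty list means none were found."""
        found = []
        if not 1 <= self.num_items <= 36:
            found.append("num_items must be between 1 and 36")
        if self.num_options != 4:
            found.append("the standard RMET-R always uses 4 options")
        if self.response_timeout_ms < 0:
            found.append("response_timeout_ms must be >= 0")
        # Assumption: self-paced administration and a time limit are mutually exclusive.
        if self.self_paced and self.response_timeout_ms > 0:
            found.append("self_paced conflicts with a non-zero timeout")
        if self.practice_mode not in ("none", "optional", "mandatory"):
            found.append("practice_mode must be 'none', 'optional', or 'mandatory'")
        if self.practice_items < 0:
            found.append("practice_items must be >= 0")
        return found


# Defaults correspond to the standard self-paced administration.
standard = RmetConfig()
# A time-limited variant: 10 seconds per item, self-pacing switched off.
timed = RmetConfig(self_paced=False, response_timeout_ms=10_000)
```

Calling `standard.problems()` or `timed.problems()` returns an empty list, while a configuration that sets a timeout and leaves `self_paced` enabled reports one conflict.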

Data Output

Markers

{
  "type": "stimulus_shown",
  "ts": "2024-01-01T00:00:01.000Z",
  "hr": 1234.56,
  "data": {
    "trial_index": 0,
    "item_id": "rmet_001",
    "image_url": "eyes/001.jpg",
    "correct_answer": "playful",
    "options": ["playful", "comforting", "irritated", "bored"]
  }
}

Response Data

{
  "trial_index": 0,
  "item_id": "rmet_001",
  "selected_option": "playful",
  "correct": true,
  "latency_ms": 4820,
  "source": "button",
  "ts": "2024-01-01T00:00:05.820Z",
  "hr": 6054.56
}
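In the two samples above, the response `hr` value minus the marker `hr` value equals `latency_ms` (6054.56 - 1234.56 = 4820). The sketch below assumes that `hr` is a high-resolution clock in milliseconds and that latency is derived this way. The function name `build_response` is hypothetical, and the `ts` field is omitted for brevity.

```python
def build_response(marker: dict, selected_option: str, response_hr: float,
                   source: str = "button") -> dict:
    """Build a response record from a stimulus_shown marker (hypothetical helper)."""
    data = marker["data"]
    return {
        "trial_index": data["trial_index"],
        "item_id": data["item_id"],
        "selected_option": selected_option,
        "correct": selected_option == data["correct_answer"],
        # Assumption: latency is the difference between the two hr readings, in ms.
        "latency_ms": round(response_hr - marker["hr"]),
        "source": source,
        "hr": response_hr,
    }


marker = {
    "type": "stimulus_shown",
    "hr": 1234.56,
    "data": {
        "trial_index": 0,
        "item_id": "rmet_001",
        "correct_answer": "playful",
        "options": ["playful", "comforting", "irritated", "bored"],
    },
}
response = build_response(marker, "playful", 6054.56)
```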

Summary Artifact

{
  "task_kind": "rmet_revised",
  "total_items": 36,
  "completed_items": 36,
  "correct": 28,
  "accuracy": 0.778,
  "mean_rt_ms": 4650,
  "median_rt_ms": 4120,
  "performance_category": "typical_range",
  "by_item": [
    {
      "item_id": "rmet_001",
      "correct_answer": "playful",
      "selected": "playful",
      "correct": true,
      "rt_ms": 4820
    }
  ]
}
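The aggregate fields of the summary can be derived from the per-item response records. This is a sketch under assumptions rather than the platform's scoring code: the function name `summarize` is hypothetical, and accuracy is assumed to be the number correct divided by the total number of items, so unanswered items count against the score.

```python
from statistics import mean, median


def summarize(responses: list, total_items: int = 36) -> dict:
    """Aggregate response records into summary fields (hypothetical helper)."""
    n_correct = sum(1 for r in responses if r["correct"])
    rts = [r["latency_ms"] for r in responses]
    return {
        "task_kind": "rmet_revised",
        "total_items": total_items,
        "completed_items": len(responses),
        "correct": n_correct,
        # Assumption: the denominator is total_items, not completed_items.
        "accuracy": round(n_correct / total_items, 3),
        "mean_rt_ms": round(mean(rts)) if rts else None,
        "median_rt_ms": round(median(rts)) if rts else None,
    }


# Illustrative data only: 36 responses, the first 28 of them correct.
demo = [{"correct": i < 28, "latency_ms": 4000 + 10 * i} for i in range(36)]
summary = summarize(demo)
```

With 28 of 36 items correct, `summary["accuracy"]` is 0.778, matching the sample artifact.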

Example Research Configurations

Standard Clinical/Research Administration

Items: Full 36-item revised version
Presentation: Randomized order
Timing: Self-paced (no time limit)
Glossary: Available but not shown by default
Practice: 2 items with feedback
Scoring: Total correct out of 36
Analysis: Compare to normative data (mean ~26, SD ~3.5)

Screening Version (Time-Limited)

Items: 36 items
Timing: 10-second limit per item
Purpose: Reduce ceiling effects in high-functioning samples
Analysis: Speed-accuracy trade-off

Autism Research Protocol

Items: 36 items
Presentation: Randomized
Additional: Record eye-tracking to examine fixation patterns
Analysis: Compare ASD vs. controls on accuracy and viewing strategies

Intervention Study (Pre-Post)

Items: 36 items at baseline and follow-up
Order: Different randomization at each timepoint
Analysis: Change in total score after social skills training
Interpretation: Increase of 3+ points = meaningful improvement
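The pre-post rule above reduces to a difference score compared against a threshold. A minimal sketch, with a hypothetical function name and the 3-point threshold taken from the interpretation line:

```python
def classify_change(baseline: int, follow_up: int, threshold: int = 3) -> dict:
    """Compare total scores (0-36) from two timepoints (hypothetical helper)."""
    change = follow_up - baseline
    return {
        "change": change,
        # An increase of 3 or more points is treated as meaningful improvement.
        "meaningful_improvement": change >= threshold,
    }


result = classify_change(baseline=24, follow_up=28)
```

Here `result["change"]` is 4 and `result["meaningful_improvement"]` is `True`. Given the moderate test-retest reliability reported for this task, a threshold of this kind is best treated as a rough guide, not a formal reliable-change criterion.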

Participant Experience

Instructions: "You will see photographs of people's eyes. For each photo, choose which word best describes what the person is thinking or feeling."
Glossary Option: (If enabled) "A glossary of terms is available if you need to check the meaning of any words."
Practice Items (if enabled):
- See example eye photo with 4 options
- Make selection
- Receive feedback: "Correct! The person looks 'playful'" or "Not quite. The correct answer was 'playful'"
- 2 practice items
Main Task:
- For each of 36 items:
  - View eye photograph (center of screen)
  - Read 4 mental state options (below or around image)
  - Click/tap the best answer
  - No feedback during main task
- Progress indicator shows "Item X of 36"
Completion: "Task complete. You answered [X] out of 36 items correctly."

Design Recommendations

General Guidelines

Self-Paced: Recommended to allow time for careful consideration
No Time Pressure: Timed administration adds speed demands and may reduce validity; apply a time limit only for a specific purpose, such as reducing ceiling effects
Randomization: Counterbalances order effects across participants
Glossary: Helpful for non-native speakers or individuals unfamiliar with complex emotion terms
Practice: Ensures participants understand task format
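Randomized item order can be made reproducible by seeding the shuffle per participant and session. This also gives a different order at each timepoint in a pre-post design. The sketch below is one possible approach, not the task's actual randomization: the function name and the seed format are hypothetical, and item IDs are assumed to follow the `rmet_001` pattern seen in the data samples.

```python
import random
from typing import List


def item_order(participant_id: str, session: str, num_items: int = 36) -> List[str]:
    """Return a reproducible shuffled item order (hypothetical helper)."""
    # A string seed makes the order deterministic for a given participant and session.
    rng = random.Random(f"{participant_id}:{session}")
    items = [f"rmet_{i:03d}" for i in range(1, num_items + 1)]
    rng.shuffle(items)
    return items


baseline_order = item_order("p001", "baseline")
follow_up_order = item_order("p001", "follow_up")
```

Calling `item_order` again with the same arguments returns the same order, which helps when a session has to be resumed or audited.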

Scoring and Interpretation

Normative Data (Baron-Cohen et al., 2001):

Typical Adults: Mean ~26/36 (SD ~3.5), range 18-34
Female Advantage: Women score ~1-2 points higher on average
Autism Spectrum: Mean ~22/36 (SD ~4), range 10-32
Cutoff: Scores ≤21 suggest mentalizing difficulties (not diagnostic alone)
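A total score can be placed against these norms with a z-score and a cutoff check. The following sketch uses the approximate values listed above (mean 26, SD 3.5, cutoff 21). The constant and function names are hypothetical, and the output is descriptive only, not diagnostic.

```python
NORM_MEAN = 26.0  # approximate typical-adult mean listed above
NORM_SD = 3.5     # approximate typical-adult SD listed above
CUTOFF = 21       # scores at or below this suggest mentalizing difficulties


def interpret_score(total_correct: int) -> dict:
    """Express a total score relative to the typical-adult norms (hypothetical helper)."""
    if not 0 <= total_correct <= 36:
        raise ValueError("total_correct must be between 0 and 36")
    z = (total_correct - NORM_MEAN) / NORM_SD
    return {
        "total_score": total_correct,
        "z": round(z, 2),
        "below_cutoff": total_correct <= CUTOFF,
    }


typical = interpret_score(28)
low = interpret_score(21)
```

A score of 28 gives z = 0.57, since (28 - 26) / 3.5 is about 0.571. A score of 21 gives z = -1.43 and is flagged by the cutoff.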

Scoring Methods:

Total Correct: Primary outcome (0-36)
Response Time: Secondary measure; very slow RTs may indicate difficulty or overthinking
Item Analysis: Some items are more difficult; can examine patterns

Population-Specific Adaptations

Adolescents (13+ years):

Full 36-item version appropriate
Consider glossary for less familiar terms
Scores slightly lower than adults (developmental trajectory)

Older Adults (65+):

Full version appropriate
Larger font size (20-24px) recommended
High-contrast mode for visual accessibility
Scores generally stable with aging (no major decline in healthy aging)

Non-Native Speakers:

Critical: Glossary should be mandatory
Consider translating mental state terms (requires validation)
Cultural differences in expression recognition possible

Autism Spectrum:

Standard administration appropriate
Expect lower scores but wide variability
Some individuals develop compensatory strategies (explicit rule-learning)
Scores don't correlate with intelligence in ASD

Clinical Populations:

Schizophrenia: Moderate impairment (mean ~21-23)
Borderline Personality Disorder: Variable performance
Social Anxiety: Typically normal or slightly reduced scores
Depression: May show slight reduction during acute episodes

Common Issues and Solutions

Issue	Solution
Ceiling effects (many perfect or near-perfect scores)	Add time pressure (10s/item) or use more difficult items
Participant unfamiliar with vocabulary	Enable glossary feature; provide definitions before task
Very slow responses (>10s per item)	Normal; allow self-pacing; only flag if consistently >15s
Low motivation/engagement	Encourage participants to pick the "best fit" rather than worry about right or wrong answers; use progress bar
Cultural differences in expressions	Acknowledge limitation; use caution comparing across cultures
Participants overthink responses	Instruct to go with "first impression" or "gut feeling"

Validity and Reliability

Psychometric Properties:

Internal Consistency: Cronbach's α = 0.60-0.70 (modest due to item diversity)
Test-Retest: r = 0.63 over 1 year (moderate stability)
Convergent Validity: Correlates with other ToM measures (r ~ 0.3-0.5)
Discriminant Validity: Low correlation with basic emotion recognition (r ~ 0.2-0.3)
Sensitivity: Discriminates autism from typical development (Cohen's d ~ 0.8-1.2)
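Internal consistency for a new sample can be checked from the item-level data. The sketch below computes Cronbach's α with the standard formula α = k/(k-1) × (1 - Σ item variances / variance of total scores). The function name and the tiny data set are illustrative only. In practice each row would hold one participant's 36 item scores (1 = correct, 0 = incorrect).

```python
from statistics import variance
from typing import List


def cronbach_alpha(scores: List[List[int]]) -> float:
    """Cronbach's alpha for a participants-by-items matrix of 0/1 scores."""
    k = len(scores[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*scores)]  # sample variance per item
    total_var = variance([sum(row) for row in scores])   # variance of total scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Illustrative data: 4 participants, 3 items.
demo_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(demo_scores)
```

For this toy matrix the item variances sum to 5/6 and the total-score variance is 5/3, so α = 1.5 × (1 - 0.5) = 0.75.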

Scoring Output Fields

{
  "total_score": 28,
  "accuracy": 0.778,
  "percentile": 60,
  "performance_band": "typical",
  "comparison_to_mean": "+0.57 SD",
  "items_correct": 28,
  "items_total": 36,
  "mean_rt_ms": 4650,
  "median_rt_ms": 4120,
  "very_fast_responses": 0,
  "very_slow_responses": 2,
  "clinical_interpretation": "Score within typical adult range"
}
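The `very_fast_responses` and `very_slow_responses` fields are counts of out-of-range response times. The thresholds are not specified in this document, so the sketch below uses assumptions: 15 seconds for slow responses, borrowed from the flagging guidance in Common Issues and Solutions, and a purely hypothetical 500 ms for fast responses. The function name is also hypothetical.

```python
from typing import List


def count_rt_flags(latencies_ms: List[float],
                   fast_below_ms: float = 500,       # hypothetical threshold
                   slow_above_ms: float = 15_000) -> dict:  # assumed from the 15 s guidance
    """Count response times outside the expected range (hypothetical helper)."""
    return {
        "very_fast_responses": sum(1 for rt in latencies_ms if rt < fast_below_ms),
        "very_slow_responses": sum(1 for rt in latencies_ms if rt > slow_above_ms),
    }


flags = count_rt_flags([300, 4820, 16_000, 20_000])
```

For these four latencies the result is one very fast and two very slow responses.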

References

Baron-Cohen, S., Wheelwright, S., Hill, J., Raste, Y., & Plumb, I. (2001). The "Reading the Mind in the Eyes" Test revised version: A study with normal adults, and adults with Asperger syndrome or high-functioning autism. Journal of Child Psychology and Psychiatry, 42(2), 241-251.
Baron-Cohen, S., & Wheelwright, S. (2004). The empathy quotient: An investigation of adults with Asperger syndrome or high-functioning autism, and normal sex differences. Journal of Autism and Developmental Disorders, 34(2), 163-175.
Oakley, B. F., Brewer, R., Bird, G., & Catmur, C. (2016). Theory of mind is not theory of emotion: A cautionary note on the Reading the Mind in the Eyes Test. Journal of Abnormal Psychology, 125(6), 818-823.
Prevost, M., Carrier, M. E., Chowne, G., Zelkowitz, P., Joseph, L., & Gold, I. (2014). The Reading the Mind in the Eyes test: Validation of a French version and exploration of cultural variations in a multi-ethnic city. Cognitive Neuropsychiatry, 19(3), 189-204.
Vellante, M., Baron-Cohen, S., Melis, M., Marrone, M., Petretto, D. R., Masala, C., & Preti, A. (2013). The "Reading the Mind in the Eyes" test: Systematic review of psychometric properties and a validation study in Italy. Cognitive Neuropsychiatry, 18(4), 326-354.
