Test Data Analytics and Download
Quick Summary: Learn how to download individual test data, combine CSV files from multiple participants, and prepare data for statistical analysis.
What You'll Learn
- Downloading data for individual tests vs. entire studies
- Combining participant data files into single datasets
- Understanding PEBL test data file formats
- Preparing data for analysis in R, Python, SPSS, or Excel
- Using advanced download options (filtering, combining)
Overview
The PEBL Online Platform stores data separately for each participant and test. This guide covers how to download and combine these files for statistical analysis, including the powerful Combined Data Download feature that merges participant files automatically.
Understanding Test Data Structure
Storage Organization
Data files are organized hierarchically:
uploads/
  {STUDY_TOKEN}/
    v{version}/                              # Snapshot version
      {test_name}/
        {participant_id}/
          {test}-{participant}.csv           # Trial-by-trial data
          {test}-pooled.csv                  # Summary data
          {test}-{participant}-detail.csv    # Additional details
Example:
uploads/STUDY_ABC123/
  v1/
    corsi/
      FP_a1b2c3/
        corsi-FP_a1b2c3.csv    # Trial data for this participant
        corsi-pooled.csv       # Summary scores
    stroop/
      FP_a1b2c3/
        stroop-FP_a1b2c3.csv
        stroop-pooled.csv
  v2/                          # After parameter changes
    corsi/
      FP_d4e5f6/
        ...
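The layout above can be read back programmatically. The following is a minimal sketch, assuming paths follow the uploads/{STUDY_TOKEN}/v{version}/{test_name}/{participant_id}/{file} pattern exactly as shown; the DataFileInfo type and parse_data_path function are hypothetical names used for illustration and are not part of the platform.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class DataFileInfo(NamedTuple):
    """Hypothetical container for the parts of one data-file path."""
    study_token: str
    version: str
    test_name: str
    participant_id: str
    filename: str


def parse_data_path(path: str) -> Optional[DataFileInfo]:
    """Split uploads/{token}/v{n}/{test}/{participant}/{file} into its parts.

    Returns None when the path does not have the assumed six-part layout.
    """
    parts = PurePosixPath(path).parts
    if len(parts) != 6 or parts[0] != "uploads" or not parts[2].startswith("v"):
        return None
    _, token, version, test, participant, filename = parts
    return DataFileInfo(token, version, test, participant, filename)


info = parse_data_path("uploads/STUDY_ABC123/v1/corsi/FP_a1b2c3/corsi-pooled.csv")
print(info)
```

Grouping files by the version field is a simple way to keep data from different snapshots apart before combining them.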
Common File Types
Most PEBL tests generate two main files per participant:
- Trial-by-trial data ({test}-{participant}.csv)
  - One row per trial
  - Contains stimulus, response, RT, accuracy
  - Used for detailed analysis
- Pooled/summary data ({test}-pooled.csv)
  - One row per participant (or per condition)
  - Contains aggregate scores, spans, averages
  - Used for quick analysis
Step-by-Step Guide
Step 1: Access Test Data
- Log into the PEBL Online Platform
- Click Browse Data in the main menu
- Select your study from the list
- You'll see a table showing:
- Test names
- Number of participants
- Number of data files
- Download options
Step 2: Download Individual Test Data
Option A: Download Raw Files (ZIP)
Best for: Getting original files exactly as generated
- Find your test in the test list
- Click Download ZIP next to the test name
- You receive a ZIP file containing:
- All participant subdirectories
- All CSV files for that test
- Original file structure preserved
Structure of downloaded ZIP:
corsi-data-2026-01-03.zip
├── FP_a1b2c3/
│   ├── corsi-FP_a1b2c3.csv
│   └── corsi-pooled.csv
├── FP_d4e5f6/
│   ├── corsi-FP_d4e5f6.csv
│   └── corsi-pooled.csv
└── FP_g7h8i9/
    ├── corsi-FP_g7h8i9.csv
    └── corsi-pooled.csv
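Before combining anything by hand, it can help to list what the ZIP actually contains. This is a minimal sketch using Python's standard zipfile module, assuming each CSV sits inside a folder named after the participant as in the structure above. The files_by_participant function is a hypothetical helper, and a tiny in-memory archive stands in for the real download.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def files_by_participant(zip_source):
    """Map each participant folder in the ZIP to the CSV files it contains.

    zip_source may be a path to the downloaded ZIP or a binary file object.
    """
    grouped = defaultdict(list)
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            path = PurePosixPath(name)
            # Skip directory entries, non-CSV files, and files with no parent folder
            if path.suffix.lower() != ".csv" or len(path.parts) < 2:
                continue
            grouped[path.parts[-2]].append(path.name)
    return {pid: sorted(names) for pid, names in grouped.items()}


# Demo archive built in memory (illustrative contents only)
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("FP_a1b2c3/corsi-FP_a1b2c3.csv", "sub\n")
    zf.writestr("FP_a1b2c3/corsi-pooled.csv", "sub\n")
    zf.writestr("FP_d4e5f6/corsi-pooled.csv", "sub\n")
demo.seek(0)

listing = files_by_participant(demo)
print(listing)
```

With a real download, pass the file path instead, for example files_by_participant("corsi-data-2026-01-03.zip").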
Option B: Download Combined CSV
Best for: Immediate analysis without manual file combining
- Find your test in the test list
- Click Download Combined CSV
- The download options dialog opens with these settings:
| Option | Description | Example |
|---|---|---|
| Include pattern | Only include matching files | *-pooled.csv (only summary files) |
| Exclude pattern | Skip matching files | test-* (skip test runs) |
| Add filename column | Prepend filename as first column | Identifies source file per row |
| Files have headers | Files contain header row | Usually YES for CSV files |
- Click Download
- You receive a single CSV file with all participant data combined
Step 3: Understanding Combined Data Format
Example: Corsi Block Test
Individual file (corsi-pooled.csv for participant FP_a1b2c3):
sub,forward_span,backward_span,total_correct,total_time
FP_a1b2c3,6,5,42,187.3
Combined file (with filename column enabled):
filename,sub,forward_span,backward_span,total_correct,total_time
corsi-pooled.csv,FP_a1b2c3,6,5,42,187.3
corsi-pooled.csv,FP_d4e5f6,7,6,48,172.1
corsi-pooled.csv,FP_g7h8i9,5,4,38,201.5
Without filename column:
sub,forward_span,backward_span,total_correct,total_time
FP_a1b2c3,6,5,42,187.3
FP_d4e5f6,7,6,48,172.1
FP_g7h8i9,5,4,38,201.5
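The combined formats above can be reproduced offline from raw files. The sketch below is not the platform's implementation; it is one way to get the same shape of output, assuming every file shares the same header. The combine_csv_texts function and its parameters are hypothetical, and the sample rows are abbreviated to three columns.

```python
import csv
import io


def combine_csv_texts(files, add_filename=True, has_headers=True):
    """Combine (filename, csv_text) pairs into one CSV string.

    Keeps only the first header row and raises ValueError if a later
    file has a different header.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    expected_header = None
    for filename, text in files:
        rows = [row for row in csv.reader(io.StringIO(text)) if row]
        if has_headers and rows:
            header, rows = rows[0], rows[1:]
            if expected_header is None:
                expected_header = header
                writer.writerow((["filename"] if add_filename else []) + header)
            elif header != expected_header:
                raise ValueError(f"header mismatch in {filename}")
        for row in rows:
            writer.writerow(([filename] if add_filename else []) + row)
    return out.getvalue()


files = [
    ("corsi-pooled.csv", "sub,forward_span,backward_span\nFP_a1b2c3,6,5\n"),
    ("corsi-pooled.csv", "sub,forward_span,backward_span\nFP_d4e5f6,7,6\n"),
]
combined = combine_csv_texts(files)
print(combined)
```

Calling combine_csv_texts(files, add_filename=False) gives the layout shown under "Without filename column".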
Step 4: Advanced Filtering
Example 1: Download Only Summary Files
Many tests generate both trial-level and summary files. To download only summaries:
Settings:
- Include pattern: *-pooled.csv
- Exclude pattern: (leave empty)
Result: Only files matching *-pooled.csv are included
Example 2: Exclude Test/Pilot Runs
If participants used IDs like "test123" or "pilot01":
Settings:
- Include pattern: *.csv
- Exclude pattern: test-*,pilot-*
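Result: Skips files whose names start with "test-" or "pilot-". Because data files are named {test}-{participant}.csv, an ID such as "test123" appears after the test name (for example, corsi-test123.csv), so a pattern such as *-test*,*-pilot* may be needed instead. Check the actual filenames in the raw ZIP first.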
Example 3: Specific File Types
For tests with multiple output files:
Settings:
- Include pattern: *-summary*.csv
- Exclude pattern: *-debug*
Result: Only summary files, no debug files
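To preview what a pattern would select before downloading, the filtering can be approximated locally. This sketch assumes the platform's patterns behave like shell-style globs matched against the bare filename, with commas separating alternatives; that is an assumption, and the platform's real matching rules may differ. The select_files function is a hypothetical helper built on Python's standard fnmatch module.

```python
from fnmatch import fnmatchcase


def select_files(filenames, include="*", exclude=""):
    """Keep names matching any include pattern and no exclude pattern.

    Patterns are comma-separated shell-style globs (assumed semantics).
    """
    includes = [p.strip() for p in include.split(",") if p.strip()] or ["*"]
    excludes = [p.strip() for p in exclude.split(",") if p.strip()]
    return [
        name
        for name in filenames
        if any(fnmatchcase(name, p) for p in includes)
        and not any(fnmatchcase(name, p) for p in excludes)
    ]


names = ["corsi-FP_a1b2c3.csv", "corsi-pooled.csv", "test-run.csv", "pilot-01.csv"]
summaries = select_files(names, include="*-pooled.csv")
no_pilots = select_files(names, include="*.csv", exclude="test-*,pilot-*")
print(summaries)
print(no_pilots)
```

Running the helper over the filenames found in a raw ZIP shows quickly whether a pattern is too broad or too narrow.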
Data Analysis Workflows
Workflow 1: R Analysis with Combined Data
Step 1: Download combined CSV
Test: corsi
Pattern: *-pooled.csv
Filename column: YES
Step 2: Load in R
library(tidyverse)
# Load combined data
corsi <- read_csv("combined-data-STUDY_ABC123-corsi-2026-01-03.csv")
# Check structure
glimpse(corsi)
# Basic statistics
corsi %>%
summarize(
n = n(),
mean_forward = mean(forward_span, na.rm = TRUE),
sd_forward = sd(forward_span, na.rm = TRUE),
mean_backward = mean(backward_span, na.rm = TRUE),
sd_backward = sd(backward_span, na.rm = TRUE)
)
# Test correlation
cor.test(corsi$forward_span, corsi$backward_span)
Workflow 2: Python Analysis
Download combined CSV (same as above)
Load in Python:
import pandas as pd
import numpy as np
from scipy import stats
# Load data
corsi = pd.read_csv("combined-data-STUDY_ABC123-corsi-2026-01-03.csv")
# Check structure
print(corsi.info())
print(corsi.describe())
# Basic statistics
print(f"N = {len(corsi)}")
print(f"Forward span: M = {corsi['forward_span'].mean():.2f}, SD = {corsi['forward_span'].std():.2f}")
print(f"Backward span: M = {corsi['backward_span'].mean():.2f}, SD = {corsi['backward_span'].std():.2f}")
# Correlation
r, p = stats.pearsonr(corsi['forward_span'], corsi['backward_span'])
print(f"Correlation: r = {r:.3f}, p = {p:.3f}")
Workflow 3: Excel/SPSS
Option A: Use combined CSV directly
- Download combined CSV
- Open in Excel or SPSS
- Data is already in single-table format
- Ready for analysis
Option B: Combine raw files manually
- Download ZIP of raw files
- Extract all CSV files
- Open first file in Excel
- Copy/paste data from other files below
- Remove duplicate headers manually
Workflow 4: Combining Multiple Tests
Scenario: Merge Corsi and Stroop data by participant ID
Step 1: Download each test
Test 1: corsi, Pattern: *-pooled.csv
Test 2: stroop, Pattern: *-pooled.csv
Step 2: Merge in R
library(tidyverse)
corsi <- read_csv("combined-data-STUDY_ABC123-corsi-2026-01-03.csv")
stroop <- read_csv("combined-data-STUDY_ABC123-stroop-2026-01-03.csv")
# Merge by participant ID (sub column)
data <- corsi %>%
inner_join(stroop, by = "sub", suffix = c("_corsi", "_stroop"))
# Now analyze relationships
cor.test(data$forward_span, data$stroop_effect)
Step 2 alternative: Merge in Python
import pandas as pd
corsi = pd.read_csv("combined-data-STUDY_ABC123-corsi-2026-01-03.csv")
stroop = pd.read_csv("combined-data-STUDY_ABC123-stroop-2026-01-03.csv")
# Merge
data = pd.merge(corsi, stroop, on="sub", suffixes=("_corsi", "_stroop"))
# Analyze
print(data[['forward_span', 'stroop_effect']].corr())
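Both merges above are inner joins, so any participant missing from either file is dropped without a message. The sketch below checks the overlap first, using only the standard library. It assumes both files use a sub column for the participant ID; read_ids and compare_ids are hypothetical helper names, and the ID lists in the demo are illustrative.

```python
import csv


def read_ids(path, id_column="sub"):
    """Read the participant ID column from a combined CSV file."""
    with open(path, newline="") as handle:
        return [row[id_column] for row in csv.DictReader(handle)]


def compare_ids(ids_a, ids_b):
    """Report which participant IDs appear in both lists or in only one."""
    set_a, set_b = set(ids_a), set(ids_b)
    return {
        "both": sorted(set_a & set_b),
        "only_first": sorted(set_a - set_b),
        "only_second": sorted(set_b - set_a),
    }


# Illustrative IDs; with real files use read_ids("...corsi....csv") and so on
corsi_ids = ["FP_a1b2c3", "FP_d4e5f6", "FP_g7h8i9"]
stroop_ids = ["FP_a1b2c3", "FP_g7h8i9", "FP_j1k2l3"]
overlap = compare_ids(corsi_ids, stroop_ids)
print(overlap)
```

If only_first or only_second is not empty, the merged dataset will have fewer rows than either input, which is worth noting in the analysis script.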
Test-Specific Data Formats
Corsi Block Test
Files generated:
- corsi-{participant}.csv - Trial-by-trial data
- corsi-pooled.csv - Summary scores
corsi-pooled.csv columns:
sub,forward_span,backward_span,total_correct,total_time,forward_rt_mean,backward_rt_mean
Key variables:
- forward_span: Longest forward sequence recalled
- backward_span: Longest backward sequence recalled
- total_correct: Total trials correct
- *_rt_mean: Average reaction time
Stroop Test
Files generated:
- stroop-{participant}.csv - Trial data
- stroop-pooled.csv - Condition summaries
stroop-pooled.csv columns:
sub,congruent_acc,congruent_rt,incongruent_acc,incongruent_rt,stroop_effect
Key variables:
- *_acc: Accuracy (proportion correct) per condition
- *_rt: Mean RT per condition
- stroop_effect: RT difference (incongruent - congruent)
Tower of London
Files generated:
- tol-{participant}.csv - Trial data
- tol-pooled.csv - Summary
tol-pooled.csv columns:
sub,total_correct,total_moves,planning_time_mean,execution_time_mean,efficiency
Key variables:
- total_correct: Number of puzzles solved optimally
- total_moves: Total moves across all trials
- planning_time_mean: Average time before first move
- efficiency: Ratio of optimal to actual moves
Custom Tests
For custom tests, check the test code to understand output format:
- Look for FilePrint() calls in the test's .pbl file
- The first FilePrint() usually defines the header
- Subsequent calls write data rows
- The header defines column names and order
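When the test code is not at hand, the output file itself shows the format. This is a minimal sketch that reads the header and the first few rows of a CSV with Python's standard csv module. peek_csv is a hypothetical helper, and the column names in the demo are invented for illustration.

```python
import csv
import io


def peek_csv(handle, n_rows=3):
    """Return the header row and up to n_rows data rows from an open CSV."""
    reader = csv.reader(handle)
    header = next(reader, [])
    sample = [row for _, row in zip(range(n_rows), reader)]
    return header, sample


# Invented example content standing in for a custom test's output file
demo = io.StringIO("sub,trial,stim,resp,rt\nFP_a1b2c3,1,A,A,512\nFP_a1b2c3,2,B,A,640\n")
header, sample = peek_csv(demo, n_rows=1)
print(header)
print(sample)
```

For a file on disk, open it with open(path, newline="") and pass the handle to peek_csv.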
Common Data Preparation Tasks
Task 1: Remove Practice Trials
Many tests include practice trials in data files.
Option 1: Filter during download
Exclude pattern: *-practice-*
Option 2: Filter in analysis
# R: Remove practice trials
data_main <- data %>%
filter(!grepl("practice", trial_type))
# Python (na=False treats missing trial types as non-practice)
data_main = data[~data['trial_type'].str.contains("practice", na=False)]
Task 2: Exclude Participants
By ID pattern:
# R: Exclude test participants
data_clean <- data %>%
filter(!grepl("test|pilot", sub, ignore.case = TRUE))
By performance criterion:
# R: Exclude participants with <60% accuracy
data_clean <- data %>%
filter(accuracy >= 0.60)
Task 3: Handle Missing Data
Check for missing:
# R
summary(data) # Shows NA counts
data %>%
summarize(across(everything(), ~sum(is.na(.))))
# Python
data.info()
data.isnull().sum()
Remove participants with missing critical data:
# R
data_complete <- data %>%
filter(!is.na(forward_span) & !is.na(backward_span))
# Python
data_complete = data.dropna(subset=['forward_span', 'backward_span'])
Task 4: Calculate Derived Measures
Example: Stroop interference score
# R
data <- data %>%
mutate(
stroop_interference = incongruent_rt - congruent_rt,
stroop_interference_pct = (incongruent_rt - congruent_rt) / congruent_rt * 100
)
Example: Working memory composite
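# R: Z-score and average
# as.numeric() converts the one-column matrix returned by scale() to a plain vector
data <- data %>%
mutate(
forward_z = as.numeric(scale(forward_span)),
backward_z = as.numeric(scale(backward_span)),
wm_composite = (forward_z + backward_z) / 2
)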
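The same composite can be computed in Python. This is a minimal sketch using only the standard library; it uses the sample standard deviation (n - 1 denominator), which matches what R's scale() does. The z_scores helper is a hypothetical name, and the spans are the three example participants from the combined Corsi file.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values using the sample standard deviation (n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


forward = [6, 7, 5]    # forward_span for FP_a1b2c3, FP_d4e5f6, FP_g7h8i9
backward = [5, 6, 4]   # backward_span for the same participants

# Average the two z-scores per participant
wm_composite = [(f + b) / 2 for f, b in zip(z_scores(forward), z_scores(backward))]
print(wm_composite)
```

Z-scores depend on the sample they are computed in, so apply exclusions before standardizing rather than after.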
Task 5: Wide to Long Format
Some analyses require long format (here, one row per participant per condition):
# R: Convert wide to long
data_long <- data %>%
pivot_longer(
cols = c(congruent_rt, incongruent_rt),
names_to = "condition",
values_to = "rt"
)
# Python
data_long = pd.melt(
data,
id_vars=['sub'],
value_vars=['congruent_rt', 'incongruent_rt'],
var_name='condition',
value_name='rt'
)
Validation and Quality Checks
Check 1: Verify Sample Size
# R
n_participants <- n_distinct(data$sub)
print(paste("N =", n_participants))
# Expected vs. actual
expected_n <- 100
if (n_participants < expected_n) {
warning(paste("Missing", expected_n - n_participants, "participants"))
}
Check 2: Check for Duplicates
# R: Check for duplicate participant IDs
duplicates <- data %>%
count(sub) %>%
filter(n > 1)
if (nrow(duplicates) > 0) {
warning("Duplicate participant IDs found:")
print(duplicates)
}
Check 3: Validate Data Ranges
# R: Check for impossible values
data %>%
filter(
forward_span < 0 | forward_span > 9 | # Corsi span should be 0-9
accuracy < 0 | accuracy > 1 # Accuracy should be 0-1
) %>%
select(sub, forward_span, accuracy)
Check 4: Identify Outliers
# R: RT outliers (> 3 SD from mean)
# as.numeric() converts the one-column matrix returned by scale() to a plain vector
data <- data %>%
mutate(
rt_z = as.numeric(scale(rt_mean)),
is_outlier = abs(rt_z) > 3
)
outliers <- data %>%
filter(is_outlier)
print(paste("Found", nrow(outliers), "RT outliers"))
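The checks in this section are shown in R only. The following is a rough Python counterpart using the standard library, written for rows loaded with csv.DictReader (so values arrive as strings). The function names, the rt_mean column, and the demo rows are assumptions for illustration.

```python
from collections import Counter
from statistics import mean, stdev


def duplicate_ids(rows, id_column="sub"):
    """Return participant IDs that appear in more than one row."""
    counts = Counter(row[id_column] for row in rows)
    return sorted(pid for pid, n in counts.items() if n > 1)


def out_of_range(rows, column, low, high, id_column="sub"):
    """Return IDs whose value in column falls outside [low, high]."""
    return [row[id_column] for row in rows if not low <= float(row[column]) <= high]


def rt_outliers(rows, column="rt_mean", threshold=3.0, id_column="sub"):
    """Return IDs whose value is more than threshold sample SDs from the mean."""
    values = [float(row[column]) for row in rows]
    m, s = mean(values), stdev(values)
    return [row[id_column] for row, v in zip(rows, values) if abs(v - m) / s > threshold]


# Invented demo rows: one duplicated ID and one impossible span
rows = [
    {"sub": "FP_a1b2c3", "forward_span": "6"},
    {"sub": "FP_d4e5f6", "forward_span": "12"},
    {"sub": "FP_a1b2c3", "forward_span": "5"},
]
print(duplicate_ids(rows))
print(out_of_range(rows, "forward_span", 0, 9))
```

Each function returns a list of participant IDs, so an empty list means the check passed.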
Troubleshooting
Problem: Header Mismatch Error
Symptom: Combined download fails with "header mismatch" error
Cause: Different participants have files with different column orders or names
Common reasons:
- Test parameters changed mid-study
- Test version updated
- Different test variants used
Solution:
- Check if the study has multiple snapshot versions
- Download each version separately:
Browse Data → Study → Version 1 → Download
Browse Data → Study → Version 2 → Download
- Or use raw ZIP download and combine manually
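When combining manually, grouping the raw files by header row shows which participants have the differing format. This is a minimal sketch using Python's standard csv module. group_by_header is a hypothetical helper, and the file names and contents in the demo are invented.

```python
import csv
import io
from collections import defaultdict


def group_by_header(files):
    """Group (name, csv_text) pairs by their header row."""
    groups = defaultdict(list)
    for name, text in files:
        header = next(csv.reader(io.StringIO(text)), [])
        groups[tuple(header)].append(name)
    return dict(groups)


# Invented demo: the third file has an extra column
files = [
    ("FP_a1b2c3/corsi-pooled.csv", "sub,forward_span\nFP_a1b2c3,6\n"),
    ("FP_d4e5f6/corsi-pooled.csv", "sub,forward_span\nFP_d4e5f6,7\n"),
    ("FP_g7h8i9/corsi-pooled.csv", "sub,forward_span,backward_span\nFP_g7h8i9,5,4\n"),
]
groups = group_by_header(files)
for header, names in groups.items():
    print(len(names), "file(s) with header:", ",".join(header))
```

More than one group means the files cannot be stacked as-is; each group can be combined on its own.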
Problem: Empty Combined File
Symptom: Combined CSV downloads but is empty
Cause: Include/exclude patterns too restrictive
Solution:
- Try with default settings first (include: *, exclude: empty)
- Check actual filenames in raw ZIP
- Adjust patterns to match actual filenames
Problem: Missing Participants
Symptom: Fewer participants in combined file than expected
Cause: Some participants' files don't match include pattern
Solution:
- Download raw ZIP to see all files
- Check if some participants have different filename format
- Adjust include pattern or use *.csv to get all files
Problem: Duplicate Headers
Symptom: Header row appears multiple times in combined file
Cause: "Files have headers" option was set to NO when they do have headers
Solution:
- Re-download with "Files have headers" = YES
- Or remove duplicate headers in analysis:
# R: Remove repeated header rows, then re-detect column types
# (stray header rows cause numeric columns to be read as text)
data <- data %>%
filter(sub != "sub") %>%
type_convert()
Problem: Can't Merge Tests
Symptom: Joining data from two tests results in no matches
Cause: Participant ID column has different name in each test
Solution:
# R: Rename before joining
corsi <- corsi %>% rename(participant_id = sub)
stroop <- stroop %>% rename(participant_id = subnum)
data <- inner_join(corsi, stroop, by = "participant_id")
Best Practices
1. Download Combined Data for Analysis
Do: Use combined CSV download for analysis
- Faster than manual combining
- Removes duplicate headers automatically
- Validates header consistency
- Adds filename tracking
Don't: Manually copy/paste from individual files
- Time-consuming
- Error-prone
- Difficult to reproduce
2. Keep Raw Data Archives
Do: Download raw ZIP periodically as backup
- Original files preserved
- Can re-combine with different settings
- Acts as archive
Don't: Rely only on combined downloads
- Can't change filtering later
- Original structure lost
3. Document Your Data Processing
Do: Create analysis script that documents all steps
# Example R script
# Study: Working Memory Battery
# Downloaded: 2026-01-03
# Analyst: J. Smith
# Load data
corsi <- read_csv("combined-data-STUDY_ABC123-corsi-2026-01-03.csv")
# Exclusions
# 1. Remove test participants
corsi <- corsi %>% filter(!grepl("test", sub))
# N excluded: 3
# 2. Remove participants with missing data
corsi <- corsi %>% filter(!is.na(forward_span))
# N excluded: 2
# Final N
final_n <- nrow(corsi)
# N = 95
Don't: Do manual steps without documentation
- Can't reproduce analysis
- Difficult for collaborators
- Errors hard to find
4. Validate After Download
Do: Check data immediately after download
# Quick validation
summary(data)
glimpse(data)
table(is.na(data))
Don't: Assume download worked perfectly
- Files may be corrupted
- Wrong test downloaded
- Patterns may have excluded important data
5. Use Consistent Naming
Do: Use descriptive, dated filenames
combined-data-WM-Study-corsi-2026-01-03.csv
combined-data-WM-Study-stroop-2026-01-03.csv
analysis-script-WM-Study-2026-01-03.R
Don't: Use generic names
data.csv
data2.csv
script.R
Related Topics
- Managing and Downloading Data - Overall data management
- Study Snapshots - Understanding data versions
- Configuring Test Parameters - Test settings
- Data Combiner Implementation - Technical details (for developers)
Need more help?
- Check test-specific documentation for data format details
- See Troubleshooting for common issues
- Contact platform administrator for technical support