Category: Data Management Level: Intermediate Reading time: 15 minutes Updated: 2026-01-03

Test Data Analytics and Download

Quick Summary: Learn how to download individual test data, combine CSV files from multiple participants, and prepare data for statistical analysis.

What You'll Learn

  • Downloading data for individual tests vs. entire studies
  • Combining participant data files into single datasets
  • Understanding PEBL test data file formats
  • Preparing data for analysis in R, Python, SPSS, or Excel
  • Using advanced download options (filtering, combining)

Overview

The PEBL Online Platform stores data separately for each participant and test. This guide covers how to download and combine these files for statistical analysis, including the powerful Combined Data Download feature that merges participant files automatically.

Understanding Test Data Structure

Storage Organization

Data files are organized hierarchically:

uploads/
  {STUDY_TOKEN}/
    v{version}/              # Snapshot version
      {test_name}/
        {participant_id}/
          {test}-{participant}.csv          # Trial-by-trial data
          {test}-pooled.csv                 # Summary data
          {test}-{participant}-detail.csv   # Additional details

Example:

uploads/STUDY_ABC123/
  v1/
    corsi/
      FP_a1b2c3/
        corsi-FP_a1b2c3.csv       # Trial data for this participant
        corsi-pooled.csv          # Summary scores
    stroop/
      FP_a1b2c3/
        stroop-FP_a1b2c3.csv
        stroop-pooled.csv
  v2/                              # After parameter changes
    corsi/
      FP_d4e5f6/
        ...

Common File Types

Most PEBL tests generate two main files per participant:

  1. Trial-by-trial data ({test}-{participant}.csv)
    • One row per trial
    • Contains stimulus, response, RT, accuracy
    • Used for detailed analysis
  2. Pooled/summary data ({test}-pooled.csv)
    • One row per participant (or per condition)
    • Contains aggregate scores, spans, averages
    • Used for quick analysis
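
As a quick illustration, here is a minimal R sketch that loads both file types for a single participant. The paths follow the example layout above, and the column names will vary by test:

library(tidyverse)

# Trial-by-trial data: one row per trial for one participant
trials <- read_csv("uploads/STUDY_ABC123/v1/corsi/FP_a1b2c3/corsi-FP_a1b2c3.csv")

# Pooled/summary data: one summary row per participant (or per condition)
pooled <- read_csv("uploads/STUDY_ABC123/v1/corsi/FP_a1b2c3/corsi-pooled.csv")

nrow(trials)   # number of trials this participant completed
nrow(pooled)   # typically a single summary row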

Step-by-Step Guide

Step 1: Access Test Data

  1. Log into the PEBL Online Platform
  2. Click Browse Data in the main menu
  3. Select your study from the list
  4. You'll see a table showing:
    • Test names
    • Number of participants
    • Number of data files
    • Download options

Step 2: Download Individual Test Data

Option A: Download Raw Files (ZIP)

Best for: Getting original files exactly as generated

  1. Find your test in the test list
  2. Click Download ZIP next to the test name
  3. You'll receive a ZIP file containing:
    • All participant subdirectories
    • All CSV files for that test
    • Original file structure preserved

Structure of downloaded ZIP:

corsi-data-2026-01-03.zip
├── FP_a1b2c3/
│   ├── corsi-FP_a1b2c3.csv
│   └── corsi-pooled.csv
├── FP_d4e5f6/
│   ├── corsi-FP_d4e5f6.csv
│   └── corsi-pooled.csv
└── FP_g7h8i9/
    ├── corsi-FP_g7h8i9.csv
    └── corsi-pooled.csv
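
If you prefer to combine the raw files yourself rather than using the Combined CSV option described below, a short R sketch like this reproduces the basic idea for the pooled files (the extracted folder name matches the example ZIP above):

library(tidyverse)

# Find every pooled summary file in the extracted ZIP
pooled_files <- list.files(
  "corsi-data-2026-01-03",
  pattern = "-pooled\\.csv$",
  recursive = TRUE,
  full.names = TRUE
)

# Read each file and stack them, recording the source file for every row
corsi_pooled <- pooled_files %>%
  set_names() %>%
  map_dfr(read_csv, .id = "filename")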

Option B: Download Combined CSV

Best for: Immediate analysis without manual file combining

  1. Find your test in the test list
  2. Click Download Combined CSV
  3. A download options dialog opens

Available options:

  Option                 Description                          Example
  Include pattern        Only include matching files          *-pooled.csv (only summary files)
  Exclude pattern        Skip matching files                  test-* (skip test runs)
  Add filename column    Prepend filename as first column     Identifies the source file for each row
  Files have headers     Files contain a header row           Usually YES for CSV files

  4. Click Download
  5. You'll receive a single CSV file with all participant data combined

Step 3: Understanding Combined Data Format

Example: Corsi Block Test

Individual file (corsi-pooled.csv for participant FP_a1b2c3):

sub,forward_span,backward_span,total_correct,total_time
FP_a1b2c3,6,5,42,187.3

Combined file (with filename column enabled):

filename,sub,forward_span,backward_span,total_correct,total_time
corsi-pooled.csv,FP_a1b2c3,6,5,42,187.3
corsi-pooled.csv,FP_d4e5f6,7,6,48,172.1
corsi-pooled.csv,FP_g7h8i9,5,4,38,201.5

Without filename column:

sub,forward_span,backward_span,total_correct,total_time
FP_a1b2c3,6,5,42,187.3
FP_d4e5f6,7,6,48,172.1
FP_g7h8i9,5,4,38,201.5

Step 4: Advanced Filtering

Example 1: Download Only Summary Files

Many tests generate both trial-level and summary files. To download only summaries:

Settings:

  • Include pattern: *-pooled.csv
  • Exclude pattern: (leave empty)

Result: Only files matching *-pooled.csv are included

Example 2: Exclude Test/Pilot Runs

If participants used IDs like "test123" or "pilot01":

Settings:

  • Include pattern: *.csv
  • Exclude pattern: test-*,pilot-*

Result: Skips files starting with "test-" or "pilot-"

Example 3: Specific File Types

For tests with multiple output files:

Settings:

  • Include pattern: *-summary*.csv
  • Exclude pattern: *-debug*

Result: Only summary files, no debug files
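
If you are unsure whether a pattern will match your filenames, one way to check is to test the equivalent glob locally in R against filenames from a raw ZIP download. This is only a sanity-check sketch (the filenames are made up, and the platform's matching rules may differ in details such as case sensitivity):

# Translate the include/exclude globs into regular expressions
include_re <- glob2rx("*-summary*.csv")
exclude_re <- glob2rx("*-debug*")

files <- c("corsi-FP_a1b2c3-summary.csv", "corsi-FP_a1b2c3-debug.csv", "corsi-pooled.csv")

# Keep files matching the include pattern but not the exclude pattern
files[grepl(include_re, files) & !grepl(exclude_re, files)]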

Data Analysis Workflows

Workflow 1: R Analysis with Combined Data

Step 1: Download combined CSV

Test: corsi
Pattern: *-pooled.csv
Filename column: YES

Step 2: Load in R

library(tidyverse)

# Load combined data
corsi <- read_csv("combined-data-STUDY_ABC123-corsi-2026-01-03.csv")

# Check structure
glimpse(corsi)

# Basic statistics
corsi %>%
  summarize(
    n = n(),
    mean_forward = mean(forward_span, na.rm = TRUE),
    sd_forward = sd(forward_span, na.rm = TRUE),
    mean_backward = mean(backward_span, na.rm = TRUE),
    sd_backward = sd(backward_span, na.rm = TRUE)
  )

# Test correlation
cor.test(corsi$forward_span, corsi$backward_span)

Workflow 2: Python Analysis

Download combined CSV (same as above)

Load in Python:

import pandas as pd
import numpy as np
from scipy import stats

# Load data
corsi = pd.read_csv("combined-data-STUDY_ABC123-corsi-2026-01-03.csv")

# Check structure
print(corsi.info())
print(corsi.describe())

# Basic statistics
print(f"N = {len(corsi)}")
print(f"Forward span: M = {corsi['forward_span'].mean():.2f}, SD = {corsi['forward_span'].std():.2f}")
print(f"Backward span: M = {corsi['backward_span'].mean():.2f}, SD = {corsi['backward_span'].std():.2f}")

# Correlation
r, p = stats.pearsonr(corsi['forward_span'], corsi['backward_span'])
print(f"Correlation: r = {r:.3f}, p = {p:.3f}")

Workflow 3: Excel/SPSS

Option A: Use combined CSV directly

  1. Download combined CSV
  2. Open in Excel or SPSS
  3. Data is already in single-table format
  4. Ready for analysis
Option B: Manual combining in Excel

  1. Download ZIP of raw files
  2. Extract all CSV files
  3. Open the first file in Excel
  4. Copy/paste data from the other files below it
  5. Remove duplicate headers manually

Recommendation: Use the combined CSV download - it's much faster and less error-prone.

Workflow 4: Combining Multiple Tests

Scenario: Merge Corsi and Stroop data by participant ID

Step 1: Download each test

Test 1: corsi, Pattern: *-pooled.csv
Test 2: stroop, Pattern: *-pooled.csv

Step 2: Merge in R

library(tidyverse)

corsi <- read_csv("combined-data-STUDY_ABC123-corsi-2026-01-03.csv")
stroop <- read_csv("combined-data-STUDY_ABC123-stroop-2026-01-03.csv")

# Merge by participant ID (sub column)
data <- corsi %>%
  inner_join(stroop, by = "sub", suffix = c("_corsi", "_stroop"))

# Now analyze relationships
cor.test(data$forward_span, data$stroop_effect)

Step 2 alternative: Merge in Python

import pandas as pd

corsi = pd.read_csv("combined-data-STUDY_ABC123-corsi-2026-01-03.csv")
stroop = pd.read_csv("combined-data-STUDY_ABC123-stroop-2026-01-03.csv")

# Merge
data = pd.merge(corsi, stroop, on="sub", suffixes=("_corsi", "_stroop"))

# Analyze
print(data[['forward_span', 'stroop_effect']].corr())

Test-Specific Data Formats

Corsi Block Test

Files generated:

  • corsi-{participant}.csv - Trial-by-trial data
  • corsi-pooled.csv - Summary scores

corsi-pooled.csv columns:

sub,forward_span,backward_span,total_correct,total_time,forward_rt_mean,backward_rt_mean

Key variables:

  • forward_span: Longest forward sequence recalled
  • backward_span: Longest backward sequence recalled
  • total_correct: Total trials correct
  • *_rt_mean: Average reaction time

Stroop Test

Files generated:

  • stroop-{participant}.csv - Trial data
  • stroop-pooled.csv - Condition summaries

stroop-pooled.csv columns:

sub,congruent_acc,congruent_rt,incongruent_acc,incongruent_rt,stroop_effect

Key variables:

  • *_acc: Accuracy (proportion correct) per condition
  • *_rt: Mean RT per condition
  • stroop_effect: RT difference (incongruent - congruent)
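
If you want to double-check the stored effect against the condition means, it can be recomputed directly from the pooled columns. A minimal R sketch, assuming the combined Stroop pooled file has already been downloaded (the filename mirrors the earlier workflow examples):

library(tidyverse)

stroop <- read_csv("combined-data-STUDY_ABC123-stroop-2026-01-03.csv")

# Recompute the effect from the condition means and compare with the stored column
stroop <- stroop %>%
  mutate(stroop_effect_check = incongruent_rt - congruent_rt)

summary(stroop$stroop_effect - stroop$stroop_effect_check)  # should be at or near zero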

Tower of London

Files generated:

  • tol-{participant}.csv - Trial data
  • tol-pooled.csv - Summary

tol-pooled.csv columns:

sub,total_correct,total_moves,planning_time_mean,execution_time_mean,efficiency

Key variables:

  • total_correct: Number of puzzles solved optimally
  • total_moves: Total moves across all trials
  • planning_time_mean: Average time before first move
  • efficiency: Ratio of optimal to actual moves

Custom Tests

For custom tests, check the test code to understand output format:

  1. Look for FilePrint() calls in test .pbl file
  2. First FilePrint() usually defines header
  3. Subsequent calls write data rows
  4. Header defines column names and order
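
A quick way to see exactly which columns a custom test writes is to look at the header row of one generated data file. A small base-R sketch (the path and test name are illustrative):

# Read just the header row of one participant's file
header <- readLines("mytest/FP_a1b2c3/mytest-FP_a1b2c3.csv", n = 1)

# Split it into individual column names
strsplit(header, ",")[[1]]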

Common Data Preparation Tasks

Task 1: Remove Practice Trials

Many tests include practice trials in data files.

Option 1: Filter during download

Exclude pattern: *-practice-*

Option 2: Filter in analysis

# R: Remove practice trials
data_main <- data %>%
  filter(!grepl("practice", trial_type))

# Python
data_main = data[~data['trial_type'].str.contains("practice")]

Task 2: Exclude Participants

By ID pattern:

# R: Exclude test participants
data_clean <- data %>%
  filter(!grepl("test|pilot", sub, ignore.case = TRUE))

By performance criterion:

# R: Exclude participants with <60% accuracy
data_clean <- data %>%
  filter(accuracy >= 0.60)

Task 3: Handle Missing Data

Check for missing:

# R
summary(data)  # Shows NA counts
data %>%
  summarize(across(everything(), ~sum(is.na(.))))

# Python
data.info()
data.isnull().sum()

Remove participants with missing critical data:

# R
data_complete <- data %>%
  filter(!is.na(forward_span) & !is.na(backward_span))

# Python
data_complete = data.dropna(subset=['forward_span', 'backward_span'])

Task 4: Calculate Derived Measures

Example: Stroop interference score

# R
data <- data %>%
  mutate(
    stroop_interference = incongruent_rt - congruent_rt,
    stroop_interference_pct = (incongruent_rt - congruent_rt) / congruent_rt * 100
  )

Example: Working memory composite

# R: Z-score and average
data <- data %>%
  mutate(
    forward_z = scale(forward_span),
    backward_z = scale(backward_span),
    wm_composite = (forward_z + backward_z) / 2
  )

Task 5: Wide to Long Format

Some analyses require long format (one row per trial):

# R: Convert wide to long
data_long <- data %>%
  pivot_longer(
    cols = c(congruent_rt, incongruent_rt),
    names_to = "condition",
    values_to = "rt"
  )

# Python
data_long = pd.melt(
    data,
    id_vars=['sub'],
    value_vars=['congruent_rt', 'incongruent_rt'],
    var_name='condition',
    value_name='rt'
)

Validation and Quality Checks

Check 1: Verify Sample Size

# R
n_participants <- n_distinct(data$sub)
print(paste("N =", n_participants))

# Expected vs. actual
expected_n <- 100
if (n_participants < expected_n) {
  warning(paste("Missing", expected_n - n_participants, "participants"))
}

Check 2: Check for Duplicates

# R: Check for duplicate participant IDs
duplicates <- data %>%
  count(sub) %>%
  filter(n > 1)

if (nrow(duplicates) > 0) {
  warning("Duplicate participant IDs found:")
  print(duplicates)
}

Check 3: Validate Data Ranges

# R: Check for impossible values
data %>%
  filter(
    forward_span < 0 | forward_span > 9 |  # Corsi span should be 0-9
    accuracy < 0 | accuracy > 1             # Accuracy should be 0-1
  ) %>%
  select(sub, forward_span, accuracy)

Check 4: Identify Outliers

# R: RT outliers (> 3 SD from mean)
data <- data %>%
  mutate(
    rt_z = scale(rt_mean),
    is_outlier = abs(rt_z) > 3
  )

outliers <- data %>%
  filter(is_outlier)

print(paste("Found", nrow(outliers), "RT outliers"))

Troubleshooting

Problem: Header Mismatch Error

Symptom: Combined download fails with "header mismatch" error

Cause: Different participants have files with different column orders or names

Common reasons:

  1. Test parameters changed mid-study
  2. Test version updated
  3. Different test variants used

Solution:

  1. Check if the study has multiple snapshot versions
  2. Download each version separately, then combine them afterwards (see the sketch below):
     Browse Data → Study → Version 1 → Download
     Browse Data → Study → Version 2 → Download
  3. Or use the raw ZIP download and combine manually
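
Once each version has been downloaded, the per-version files can usually be stacked in R. This is only a sketch, assuming the two combined CSVs were saved under illustrative names; bind_rows() fills columns that exist in only one version with NA:

library(tidyverse)

# Illustrative filenames - use whatever names your per-version downloads have
v1 <- read_csv("combined-data-STUDY_ABC123-corsi-v1.csv")
v2 <- read_csv("combined-data-STUDY_ABC123-corsi-v2.csv")

# Stack both versions; columns present in only one version are filled with NA
corsi_all <- bind_rows(v1 = v1, v2 = v2, .id = "snapshot_version")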

Problem: Empty Combined File

Symptom: Combined CSV downloads but is empty

Cause: Include/exclude patterns too restrictive

Solution:

  1. Try with default settings first (include: *, exclude: (empty))
  2. Check the actual filenames in the raw ZIP (see the sketch below)
  3. Adjust patterns to match actual filenames
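
A quick way to list the filenames inside the raw ZIP without extracting it is base R's unzip() with list = TRUE (the ZIP name below is illustrative):

# List the contents of the downloaded ZIP without extracting it
zip_contents <- unzip("corsi-data-2026-01-03.zip", list = TRUE)
zip_contents$Name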

Problem: Missing Participants

Symptom: Fewer participants in combined file than expected

Cause: Some participants' files don't match include pattern

Solution:

  1. Download raw ZIP to see all files
  2. Check if some participants have different filename format
  3. Adjust include pattern or use *.csv to get all

Problem: Duplicate Headers

Symptom: Header row appears multiple times in combined file

Cause: "Files have headers" option was set to NO when they do have headers

Solution:

  1. Re-download with "Files have headers" = YES
  2. Or remove duplicate headers in analysis:
   # R: Remove non-data rows
   data <- data %>%
     filter(sub != "sub")  # Remove header rows

Problem: Can't Merge Tests

Symptom: Joining data from two tests results in no matches

Cause: Participant ID column has different name in each test

Solution:

# R: Rename before joining
corsi <- corsi %>% rename(participant_id = sub)
stroop <- stroop %>% rename(participant_id = subnum)

data <- inner_join(corsi, stroop, by = "participant_id")

Best Practices

1. Download Combined Data for Analysis

Do: Use combined CSV download for analysis

  • Faster than manual combining
  • Removes duplicate headers automatically
  • Validates header consistency
  • Adds filename tracking

Don't: Manually copy/paste from individual files

  • Time-consuming
  • Error-prone
  • Difficult to reproduce

2. Keep Raw Data Archives

Do: Download raw ZIP periodically as backup

  • Original files preserved
  • Can re-combine with different settings
  • Acts as archive

Don't: Rely only on combined downloads

  • Can't change filtering later
  • Original structure lost

3. Document Your Data Processing

Do: Create analysis script that documents all steps

# Example R script
# Study: Working Memory Battery
# Downloaded: 2026-01-03
# Analyst: J. Smith

# Load data
corsi <- read_csv("combined-data-STUDY_ABC123-corsi-2026-01-03.csv")

# Exclusions
# 1. Remove test participants
corsi <- corsi %>% filter(!grepl("test", sub))
# N excluded: 3

# 2. Remove participants with missing data
corsi <- corsi %>% filter(!is.na(forward_span))
# N excluded: 2

# Final N
final_n <- nrow(corsi)
# N = 95

Don't: Do manual steps without documentation

  • Can't reproduce analysis
  • Difficult for collaborators
  • Errors hard to find

4. Validate After Download

Do: Check data immediately after download

# Quick validation
summary(data)
glimpse(data)
table(is.na(data))

Don't: Assume download worked perfectly

  • Files may be corrupted
  • Wrong test downloaded
  • Patterns may have excluded important data

5. Use Consistent Naming

Do: Use descriptive, dated filenames

combined-data-WM-Study-corsi-2026-01-03.csv
combined-data-WM-Study-stroop-2026-01-03.csv
analysis-script-WM-Study-2026-01-03.R

Don't: Use generic names

data.csv
data2.csv
script.R



Need more help?

  • Check test-specific documentation for data format details
  • See Troubleshooting for common issues
  • Contact platform administrator for technical support
