Category: Data Management Level: Intermediate Reading time: 15 minutes Updated: 2026-01-03

Test Data Analytics and Download

Quick Summary: Learn how to download individual test data, combine CSV files from multiple participants, and prepare data for statistical analysis.

What You'll Learn

  • Downloading data for individual tests vs. entire studies
  • Combining participant data files into single datasets
  • Understanding PEBL test data file formats
  • Preparing data for analysis in R, Python, SPSS, or Excel
  • Using advanced download options (filtering, combining)

Overview

The PEBL Online Platform stores data separately for each participant and test. This guide covers how to download and combine these files for statistical analysis, including the powerful Combined Data Download feature that merges participant files automatically.

Understanding Test Data Structure

Storage Organization

Data files are organized hierarchically:

uploads/
  {STUDY_TOKEN}/
    v{version}/              # Snapshot version
      {test_name}/
        {participant_id}/
          {test}-{participant}.csv          # Trial-by-trial data
          {test}-pooled.csv                 # Summary data
          {test}-{participant}-detail.csv   # Additional details

Example:

uploads/STUDY_ABC123/
  v1/
    corsi/
      FP_a1b2c3/
        corsi-FP_a1b2c3.csv       # Trial data for this participant
        corsi-pooled.csv          # Summary scores
    stroop/
      FP_a1b2c3/
        stroop-FP_a1b2c3.csv
        stroop-pooled.csv
  v2/                              # After parameter changes
    corsi/
      FP_d4e5f6/
        ...

Common File Types

Most PEBL tests generate two main files per participant:

  1. Trial-by-trial data ({test}-{participant}.csv)
    • One row per trial
    • Contains stimulus, response, RT, accuracy
    • Used for detailed analysis
  2. Pooled/summary data ({test}-pooled.csv)
    • One row per participant (or per condition)
    • Contains aggregate scores, spans, averages
    • Used for quick analysis
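
As a quick illustration, here is a minimal R sketch that loads both file types for a single participant. The paths follow the example layout above, and the column names will vary by test:

library(tidyverse)

# Trial-by-trial data: one row per trial for one participant
trials <- read_csv("uploads/STUDY_ABC123/v1/corsi/FP_a1b2c3/corsi-FP_a1b2c3.csv")

# Pooled/summary data: one summary row per participant (or per condition)
pooled <- read_csv("uploads/STUDY_ABC123/v1/corsi/FP_a1b2c3/corsi-pooled.csv")

nrow(trials)   # number of trials this participant completed
nrow(pooled)   # typically a single summary row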

Step-by-Step Guide

Step 1: Access Test Data

  1. Log into the PEBL Online Platform
  2. Click Browse Data in the main menu
  3. Select your study from the list
  4. You'll see a table showing:
    • Test names
    • Number of participants
    • Number of data files
    • Download options

Step 2: Download Individual Test Data

Option A: Download Raw Files (ZIP)

Best for: Getting original files exactly as generated

  1. Find your test in the test list
  2. Click Download ZIP next to the test name
  3. You'll receive a ZIP file containing:
    • All participant subdirectories
    • All CSV files for that test
    • Original file structure preserved

Structure of downloaded ZIP:

corsi-data-2026-01-03.zip
├── FP_a1b2c3/
│   ├── corsi-FP_a1b2c3.csv
│   └── corsi-pooled.csv
├── FP_d4e5f6/
│   ├── corsi-FP_d4e5f6.csv
│   └── corsi-pooled.csv
└── FP_g7h8i9/
    ├── corsi-FP_g7h8i9.csv
    └── corsi-pooled.csv
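
If you prefer to combine the raw files yourself rather than using the Combined CSV option described below, a short R sketch like this reproduces the basic idea for the pooled files (the extracted folder name matches the example ZIP above):

library(tidyverse)

# Find every pooled summary file in the extracted ZIP
pooled_files <- list.files(
  "corsi-data-2026-01-03",
  pattern = "-pooled\\.csv$",
  recursive = TRUE,
  full.names = TRUE
)

# Read each file and stack them, recording the source file for every row
corsi_pooled <- pooled_files %>%
  set_names() %>%
  map_dfr(read_csv, .id = "filename")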

Option B: Download Combined CSV

Best for: Immediate analysis without manual file combining

  1. Find your test in the test list
  2. Click Download Combined CSV
  3. A download options dialog opens

Available options:

  Option                 Description                          Example
  Include pattern        Only include matching files          *-pooled.csv (only summary files)
  Exclude pattern        Skip matching files                  test-* (skip test runs)
  Add filename column    Prepend filename as first column     Identifies the source file for each row
  Files have headers     Files contain a header row           Usually YES for CSV files

  4. Click Download
  5. You'll receive a single CSV file with all participant data combined

Step 3: Understanding Combined Data Format

Example: Corsi Block Test

Individual file (corsi-pooled.csv for participant FP_a1b2c3):

sub,forward_span,backward_span,total_correct,total_time
FP_a1b2c3,6,5,42,187.3

Combined file (with filename column enabled):

filename,sub,forward_span,backward_span,total_correct,total_time
corsi-pooled.csv,FP_a1b2c3,6,5,42,187.3
corsi-pooled.csv,FP_d4e5f6,7,6,48,172.1
corsi-pooled.csv,FP_g7h8i9,5,4,38,201.5

Without filename column:

sub,forward_span,backward_span,total_correct,total_time
FP_a1b2c3,6,5,42,187.3
FP_d4e5f6,7,6,48,172.1
FP_g7h8i9,5,4,38,201.5

Step 4: Advanced Filtering

Example 1: Download Only Summary Files

Many tests generate both trial-level and summary files. To download only summaries:

Settings:

  • Include pattern: *-pooled.csv
  • Exclude pattern: (leave empty)

Result: Only files matching *-pooled.csv are included

Example 2: Exclude Test/Pilot Runs

If participants used IDs like "test123" or "pilot01":

Settings:

  • Include pattern: *.csv
  • Exclude pattern: test-*,pilot-*

Result: Skips files starting with "test-" or "pilot-"

Example 3: Specific File Types

For tests with multiple output files:

Settings:

  • Include pattern: *-summary*.csv
  • Exclude pattern: *-debug*

Result: Only summary files, no debug files
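
If you are unsure whether a pattern will match your filenames, one way to check is to test the equivalent glob locally in R against filenames from a raw ZIP download. This is only a sanity-check sketch (the filenames are made up, and the platform's matching rules may differ in details such as case sensitivity):

# Translate the include/exclude globs into regular expressions
include_re <- glob2rx("*-summary*.csv")
exclude_re <- glob2rx("*-debug*")

files <- c("corsi-FP_a1b2c3-summary.csv", "corsi-FP_a1b2c3-debug.csv", "corsi-pooled.csv")

# Keep files matching the include pattern but not the exclude pattern
files[grepl(include_re, files) & !grepl(exclude_re, files)]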

Data Analysis Workflows

Workflow 1: R Analysis with Combined Data

Step 1: Download combined CSV

Test: corsi
Pattern: *-pooled.csv
Filename column: YES

Step 2: Load in R

library(tidyverse)

# Load combined data
corsi <- read_csv("combined-data-STUDY_ABC123-corsi-2026-01-03.csv")

# Check structure
glimpse(corsi)

# Basic statistics
corsi %>%
  summarize(
    n = n(),
    mean_forward = mean(forward_span, na.rm = TRUE),
    sd_forward = sd(forward_span, na.rm = TRUE),
    mean_backward = mean(backward_span, na.rm = TRUE),
    sd_backward = sd(backward_span, na.rm = TRUE)
  )

# Test correlation
cor.test(corsi$forward_span, corsi$backward_span)

Workflow 2: Python Analysis

Download combined CSV (same as above)

Load in Python:

import pandas as pd
import numpy as np
from scipy import stats

# Load data
corsi = pd.read_csv("combined-data-STUDY_ABC123-corsi-2026-01-03.csv")

# Check structure
print(corsi.info())
print(corsi.describe())

# Basic statistics
print(f"N = {len(corsi)}")
print(f"Forward span: M = {corsi['forward_span'].mean():.2f}, SD = {corsi['forward_span'].std():.2f}")
print(f"Backward span: M = {corsi['backward_span'].mean():.2f}, SD = {corsi['backward_span'].std():.2f}")

# Correlation
r, p = stats.pearsonr(corsi['forward_span'], corsi['backward_span'])
print(f"Correlation: r = {r:.3f}, p = {p:.3f}")

Workflow 3: Excel/SPSS

Option A: Use combined CSV directly

  1. Download combined CSV
  2. Open in Excel or SPSS
  3. Data is already in single-table format
  4. Ready for analysis
Option B: Manual combining in Excel

  1. Download ZIP of raw files
  2. Extract all CSV files
  3. Open the first file in Excel
  4. Copy/paste data from the other files below it
  5. Remove duplicate headers manually

Recommendation: Use the combined CSV download - it's much faster and less error-prone.

Workflow 4: Combining Multiple Tests

Scenario: Merge Corsi and Stroop data by participant ID

Step 1: Download each test

Test 1: corsi, Pattern: *-pooled.csv
Test 2: stroop, Pattern: *-pooled.csv

Step 2: Merge in R

library(tidyverse)

corsi <- read_csv("combined-data-STUDY_ABC123-corsi-2026-01-03.csv")
stroop <- read_csv("combined-data-STUDY_ABC123-stroop-2026-01-03.csv")

# Merge by participant ID (sub column)
data <- corsi %>%
  inner_join(stroop, by = "sub", suffix = c("_corsi", "_stroop"))

# Now analyze relationships
cor.test(data$forward_span, data$stroop_effect)

Step 2 alternative: Merge in Python

import pandas as pd

corsi = pd.read_csv("combined-data-STUDY_ABC123-corsi-2026-01-03.csv")
stroop = pd.read_csv("combined-data-STUDY_ABC123-stroop-2026-01-03.csv")

# Merge
data = pd.merge(corsi, stroop, on="sub", suffixes=("_corsi", "_stroop"))

# Analyze
print(data[['forward_span', 'stroop_effect']].corr())

Test-Specific Data Formats

Corsi Block Test

Files generated:

  • corsi-{participant}.csv - Trial-by-trial data
  • corsi-pooled.csv - Summary scores

corsi-pooled.csv columns:

sub,forward_span,backward_span,total_correct,total_time,forward_rt_mean,backward_rt_mean

Key variables:

  • forward_span: Longest forward sequence recalled
  • backward_span: Longest backward sequence recalled
  • total_correct: Total trials correct
  • *_rt_mean: Average reaction time

Stroop Test

Files generated:

  • stroop-{participant}.csv - Trial data
  • stroop-pooled.csv - Condition summaries

stroop-pooled.csv columns:

sub,congruent_acc,congruent_rt,incongruent_acc,incongruent_rt,stroop_effect

Key variables:

  • *_acc: Accuracy (proportion correct) per condition
  • *_rt: Mean RT per condition
  • stroop_effect: RT difference (incongruent - congruent)
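
If you want to double-check the stored effect against the condition means, it can be recomputed directly from the pooled columns. A minimal R sketch, assuming the combined Stroop pooled file has already been downloaded (the filename mirrors the earlier workflow examples):

library(tidyverse)

stroop <- read_csv("combined-data-STUDY_ABC123-stroop-2026-01-03.csv")

# Recompute the effect from the condition means and compare with the stored column
stroop <- stroop %>%
  mutate(stroop_effect_check = incongruent_rt - congruent_rt)

summary(stroop$stroop_effect - stroop$stroop_effect_check)  # should be at or near zero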

Tower of London

Files generated:

  • tol-{participant}.csv - Trial data
  • tol-pooled.csv - Summary

tol-pooled.csv columns:

sub,total_correct,total_moves,planning_time_mean,execution_time_mean,efficiency

Key variables:

  • total_correct: Number of puzzles solved optimally
  • total_moves: Total moves across all trials
  • planning_time_mean: Average time before first move
  • efficiency: Ratio of optimal to actual moves

Custom Tests

For custom tests, check the test code to understand output format:

  1. Look for FilePrint() calls in test .pbl file
  2. First FilePrint() usually defines header
  3. Subsequent calls write data rows
  4. Header defines column names and order
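
A quick way to see exactly which columns a custom test writes is to look at the header row of one generated data file. A small base-R sketch (the path and test name are illustrative):

# Read just the header row of one participant's file
header <- readLines("mytest/FP_a1b2c3/mytest-FP_a1b2c3.csv", n = 1)

# Split it into individual column names
strsplit(header, ",")[[1]]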

Common Data Preparation Tasks

Task 1: Remove Practice Trials

Many tests include practice trials in data files.

Option 1: Filter during download

Exclude pattern: *-practice-*

Option 2: Filter in analysis

# R: Remove practice trials
data_main <- data %>%
  filter(!grepl("practice", trial_type))

# Python
data_main = data[~data['trial_type'].str.contains("practice")]

Task 2: Exclude Participants

By ID pattern:

# R: Exclude test participants
data_clean <- data %>%
  filter(!grepl("test|pilot", sub, ignore.case = TRUE))

By performance criterion:

# R: Exclude participants with <60% accuracy
data_clean <- data %>%
  filter(accuracy >= 0.60)

Task 3: Handle Missing Data

Check for missing:

# R
summary(data)  # Shows NA counts
data %>%
  summarize(across(everything(), ~sum(is.na(.))))

# Python
data.info()
data.isnull().sum()

Remove participants with missing critical data:

# R
data_complete <- data %>%
  filter(!is.na(forward_span) & !is.na(backward_span))

# Python
data_complete = data.dropna(subset=['forward_span', 'backward_span'])

Task 4: Calculate Derived Measures

Example: Stroop interference score

# R
data <- data %>%
  mutate(
    stroop_interference = incongruent_rt - congruent_rt,
    stroop_interference_pct = (incongruent_rt - congruent_rt) / congruent_rt * 100
  )

Example: Working memory composite

# R: Z-score and average
data <- data %>%
  mutate(
    forward_z = scale(forward_span),
    backward_z = scale(backward_span),
    wm_composite = (forward_z + backward_z) / 2
  )

Task 5: Wide to Long Format

Some analyses require long format (one row per trial):

# R: Convert wide to long
data_long <- data %>%
  pivot_longer(
    cols = c(congruent_rt, incongruent_rt),
    names_to = "condition",
    values_to = "rt"
  )

# Python
data_long = pd.melt(
    data,
    id_vars=['sub'],
    value_vars=['congruent_rt', 'incongruent_rt'],
    var_name='condition',
    value_name='rt'
)

Validation and Quality Checks

Check 1: Verify Sample Size

# R
n_participants <- n_distinct(data$sub)
print(paste("N =", n_participants))

# Expected vs. actual
expected_n <- 100
if (n_participants < expected_n) {
  warning(paste("Missing", expected_n - n_participants, "participants"))
}

Check 2: Check for Duplicates

# R: Check for duplicate participant IDs
duplicates <- data %>%
  count(sub) %>%
  filter(n > 1)

if (nrow(duplicates) > 0) {
  warning("Duplicate participant IDs found:")
  print(duplicates)
}

Check 3: Validate Data Ranges

# R: Check for impossible values
data %>%
  filter(
    forward_span < 0 | forward_span > 9 |  # Corsi span should be 0-9
    accuracy < 0 | accuracy > 1             # Accuracy should be 0-1
  ) %>%
  select(sub, forward_span, accuracy)

Check 4: Identify Outliers

# R: RT outliers (> 3 SD from mean)
data <- data %>%
  mutate(
    rt_z = scale(rt_mean),
    is_outlier = abs(rt_z) > 3
  )

outliers <- data %>%
  filter(is_outlier)

print(paste("Found", nrow(outliers), "RT outliers"))

Troubleshooting

Problem: Header Mismatch Error

Symptom: Combined download fails with "header mismatch" error

Cause: Different participants have files with different column orders or names

Common reasons:

  1. Test parameters changed mid-study
  2. Test version updated
  3. Different test variants used

Solution:

  1. Check if the study has multiple snapshot versions
  2. Download each version separately, then combine them afterwards (see the sketch below):
     Browse Data → Study → Version 1 → Download
     Browse Data → Study → Version 2 → Download
  3. Or use the raw ZIP download and combine manually
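
Once each version has been downloaded, the per-version files can usually be stacked in R. This is only a sketch, assuming the two combined CSVs were saved under illustrative names; bind_rows() fills columns that exist in only one version with NA:

library(tidyverse)

# Illustrative filenames - use whatever names your per-version downloads have
v1 <- read_csv("combined-data-STUDY_ABC123-corsi-v1.csv")
v2 <- read_csv("combined-data-STUDY_ABC123-corsi-v2.csv")

# Stack both versions; columns present in only one version are filled with NA
corsi_all <- bind_rows(v1 = v1, v2 = v2, .id = "snapshot_version")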

Problem: Empty Combined File

Symptom: Combined CSV downloads but is empty

Cause: Include/exclude patterns too restrictive

Solution:

  1. Try with default settings first (include: *, exclude: (empty))
  2. Check the actual filenames in the raw ZIP (see the sketch below)
  3. Adjust patterns to match actual filenames
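
A quick way to list the filenames inside the raw ZIP without extracting it is base R's unzip() with list = TRUE (the ZIP name below is illustrative):

# List the contents of the downloaded ZIP without extracting it
zip_contents <- unzip("corsi-data-2026-01-03.zip", list = TRUE)
zip_contents$Name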

Problem: Missing Participants

Symptom: Fewer participants in combined file than expected

Cause: Some participants' files don't match include pattern

Solution:

  1. Download raw ZIP to see all files
  2. Check if some participants have different filename format
  3. Adjust include pattern or use *.csv to get all

Problem: Duplicate Headers

Symptom: Header row appears multiple times in combined file

Cause: "Files have headers" option was set to NO when they do have headers

Solution:

  1. Re-download with "Files have headers" = YES
  2. Or remove duplicate headers in analysis:
   # R: Remove non-data rows
   data <- data %>%
     filter(sub != "sub")  # Remove header rows

Problem: Can't Merge Tests

Symptom: Joining data from two tests results in no matches

Cause: Participant ID column has different name in each test

Solution:

# R: Rename before joining
corsi <- corsi %>% rename(participant_id = sub)
stroop <- stroop %>% rename(participant_id = subnum)

data <- inner_join(corsi, stroop, by = "participant_id")

Best Practices

1. Download Combined Data for Analysis

Do: Use combined CSV download for analysis

  • Faster than manual combining
  • Removes duplicate headers automatically
  • Validates header consistency
  • Adds filename tracking

Don't: Manually copy/paste from individual files

  • Time-consuming
  • Error-prone
  • Difficult to reproduce

2. Keep Raw Data Archives

Do: Download raw ZIP periodically as backup

  • Original files preserved
  • Can re-combine with different settings
  • Acts as archive

Don't: Rely only on combined downloads

  • Can't change filtering later
  • Original structure lost

3. Document Your Data Processing

Do: Create analysis script that documents all steps

# Example R script
# Study: Working Memory Battery
# Downloaded: 2026-01-03
# Analyst: J. Smith

# Load data
corsi <- read_csv("combined-data-STUDY_ABC123-corsi-2026-01-03.csv")

# Exclusions
# 1. Remove test participants
corsi <- corsi %>% filter(!grepl("test", sub))
# N excluded: 3

# 2. Remove participants with missing data
corsi <- corsi %>% filter(!is.na(forward_span))
# N excluded: 2

# Final N
final_n <- nrow(corsi)
# N = 95

Don't: Do manual steps without documentation

  • Can't reproduce analysis
  • Difficult for collaborators
  • Errors hard to find

4. Validate After Download

Do: Check data immediately after download

# Quick validation
summary(data)
glimpse(data)
table(is.na(data))

Don't: Assume download worked perfectly

  • Files may be corrupted
  • Wrong test downloaded
  • Patterns may have excluded important data

5. Use Consistent Naming

Do: Use descriptive, dated filenames

combined-data-WM-Study-corsi-2026-01-03.csv
combined-data-WM-Study-stroop-2026-01-03.csv
analysis-script-WM-Study-2026-01-03.R

Don't: Use generic names

data.csv
data2.csv
script.R



Need more help?

  • Check test-specific documentation for data format details
  • See Troubleshooting for common issues
  • Contact platform administrator for technical support
