SYNAPSE

SYNthetic Agent Populations And Social Environments

Privacy-Preserving Digital Twins for Policy Research—High-Fidelity Simulation Foundations When Real Data Can't Be Shared

Zero Risk

Re-identification impossible

$570K–$730K

Full integrated development

Open Source

Community-validated tools

At a Glance

SYNAPSE is an AI-powered tool that generates high-fidelity synthetic populations with integrated social network structures, enabling privacy-preserving data sharing and policy research currently blocked by confidentiality restrictions. These synthetic datasets fuse multiple existing data sources with overlapping and complementary coverage. They preserve the complex statistical relationships and realistic contact structures that analyses and simulation models of economic and social system dynamics depend on, while eliminating re-identification risks and data access barriers. Building on our pioneering research in synthetic health data generation and network modeling, we deliver open-source tools for applications spanning pandemic preparedness, insurance reform analysis, and any domain requiring detailed individual and network data that cannot be shared.

The Challenge

Policy researchers and public health modelers face an impossible dilemma: their most important analyses require detailed individual-level data capturing demographics, behaviors, and social connections—yet access to such data is increasingly restricted by privacy regulations, institutional barriers, and confidentiality requirements. This creates critical gaps.

Existing Approaches Fall Short:

Data Access Barriers

Real data can't be shared across institutions, limiting collaboration and reproducibility. Researchers spend years negotiating data access that may never materialize.

Inadequate Anonymization

Simple statistical sampling loses the complex correlations essential for valid policy simulations. Traditional anonymization techniques either destroy analytical utility or fail to deliver privacy guarantees.

Missing Network Structure

Even when individual-level data is available, social network connections are rarely included—yet disease spread, information diffusion, and social influence all depend critically on who connects to whom.

The result? Critical policy questions go unanswered because we can't access or share the data needed to address them—even when that data exists. Pandemic preparedness models lack realistic contact networks. Insurance reform analyses can't replicate detailed individual heterogeneity. Researchers can't validate their methods without reproducible benchmark datasets.

The Solution: AI-Powered Synthetic Population Generation

Our synthetic population framework does what conventional approaches cannot: generate completely artificial populations and social networks that preserve the statistical fidelity of real data while enabling unrestricted sharing and flexible scenario analysis—no privacy compromises, no data access bottlenecks.

Three Transformative Capabilities

1. Statistical Fidelity Beyond Simple Sampling

Deep generative models learn complex joint distributions across demographic, socioeconomic, behavioral, and health variables
Preserves intricate correlational structures essential for policy analysis—not just marginal distributions
Captures realistic heterogeneity and rare subpopulations often missed by traditional methods
Fuses multiple data sources: census, surveys, administrative records with overlapping/complementary coverage

Why it matters: Synthetic individuals behave like real ones in simulations because they reflect real-world complexity: the correlations among income, health status, employment, family structure, and geographic location are all preserved.
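
To make the contrast between marginal-only sampling and joint modeling concrete, here is a minimal sketch. It is not SYNAPSE code: the two attributes, the toy data, and the function names are all illustrative assumptions, and a fitted Gaussian stands in for the deep generative models described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" data: two correlated attributes (hypothetical stand-ins
# for a standardized income and a health score).
n = 5000
income = rng.normal(0.0, 1.0, n)
health = 0.7 * income + rng.normal(0.0, 0.5, n)
real = np.column_stack([income, health])


def sample_marginals_independently(data, size, rng):
    """Resample each column on its own: marginals survive, correlations do not."""
    cols = [rng.choice(data[:, j], size=size, replace=True)
            for j in range(data.shape[1])]
    return np.column_stack(cols)


def sample_joint_gaussian(data, size, rng):
    """Fit a mean vector and covariance matrix, then draw from the joint fit."""
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=size)


def corr(data):
    """Pearson correlation between the first two columns."""
    return float(np.corrcoef(data, rowvar=False)[0, 1])


independent = sample_marginals_independently(real, n, rng)
joint = sample_joint_gaussian(real, n, rng)

real_corr, indep_corr, joint_corr = corr(real), corr(independent), corr(joint)
print(f"real={real_corr:.2f} independent={indep_corr:.2f} joint={joint_corr:.2f}")
```

The independently resampled columns keep each attribute's distribution but lose the income-health relationship, while the joint sample retains it. Real populations have far more variables and non-Gaussian structure, which is where richer generative models come in.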

2. Network-Level Realism for Transmission and Diffusion Models

Generates social contact networks with empirically-grounded mixing patterns (age-assortative, household, workplace, community)
Incorporates spatial proximity, demographic constraints, and context-specific interaction rules
Supports dynamic scenarios: pandemic distancing, workplace closures, targeted interventions
Multiple network layers: family ties, workplace contacts, casual interactions, online social networks

Why it matters: Disease spread, information diffusion, and social influence depend on who connects to whom—synthetic networks get the structure right, enabling realistic epidemic modeling and intervention evaluation
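
One simple way to picture age-assortative contact generation is a stochastic block model, where the chance of a contact depends only on the two people's age groups. The sketch below is an assumption-laden illustration, not the SYNAPSE method: the group labels, the mixing matrix values, and `generate_contacts` are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: 300 synthetic people in three age groups
# (0 = child, 1 = adult, 2 = senior).
n_people = 300
age_group = rng.integers(0, 3, size=n_people)

# Illustrative contact probabilities between groups. The large diagonal
# encodes age-assortative mixing. Values are invented, not empirical.
mixing = np.array([
    [0.060, 0.010, 0.005],
    [0.010, 0.040, 0.010],
    [0.005, 0.010, 0.050],
])


def generate_contacts(age_group, mixing, rng):
    """Draw an undirected edge for each pair with a group-dependent probability."""
    n = len(age_group)
    i, j = np.triu_indices(n, k=1)           # every unordered pair exactly once
    p = mixing[age_group[i], age_group[j]]   # per-pair contact probability
    keep = rng.random(len(p)) < p
    return np.column_stack([i[keep], j[keep]])


edges = generate_contacts(age_group, mixing, rng)
same_group = age_group[edges[:, 0]] == age_group[edges[:, 1]]
print(f"{len(edges)} edges, {same_group.mean():.0%} within the same age group")

# A distancing scenario can be sketched by scaling the matrix,
# here by halving every contact probability.
distanced_edges = generate_contacts(age_group, 0.5 * mixing, rng)
print(f"{len(distanced_edges)} edges under the distancing scenario")
```

Separate matrices per layer (household, workplace, community) would give a multi-layer network in the same style; spatial and demographic constraints would further restrict which pairs are eligible.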

3. Privacy-Preserving Sharing Without Restrictions

Zero risk of re-identification—synthetic individuals never existed
Enables open data sharing, reproducible science, and cross-institutional collaboration
Supports custom policy scenarios without returning to data owners for permission
Facilitates method validation: researchers can benchmark algorithms on shared synthetic data with known ground truth

Why it matters: Researchers can share datasets publicly, accelerating discovery and enabling validation studies impossible with confidential data. Models developed on synthetic populations can be openly published, peer-reviewed, and improved by the research community

What Synthetic Populations Enable That Real Data Cannot

For Public Health Emergency Response

Agent-based disease transmission models with realistic contact networks and individual risk profiles
Scenario planning for pandemics, bioterrorism, or natural disasters across diverse populations
Intervention targeting based on network position, demographics, and vulnerability—without privacy concerns
Shareable benchmark datasets for comparing epidemic models and intervention strategies
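
As a rough illustration of the first item, the sketch below runs a discrete-time SIR model over a contact network and compares a baseline with a contact-reduction scenario. Everything here is a hypothetical placeholder: the random network stands in for a synthetic contact network, and `simulate_sir` and its parameter values are assumptions chosen for the example.

```python
import numpy as np


def simulate_sir(n_people, edges, beta, gamma, n_seeds, n_steps, rng):
    """Discrete-time SIR on an undirected contact network.

    beta:  per-contact, per-step transmission probability
    gamma: per-step recovery probability
    Returns an array of (S, I, R) counts per step, including step 0.
    """
    S, I, R = 0, 1, 2
    state = np.full(n_people, S)
    state[rng.choice(n_people, size=n_seeds, replace=False)] = I
    history = [np.bincount(state, minlength=3)]
    a, b = edges[:, 0], edges[:, 1]
    for _ in range(n_steps):
        # Susceptible endpoints of edges whose other endpoint is infectious.
        exposed_b = (state[a] == I) & (state[b] == S)
        exposed_a = (state[b] == I) & (state[a] == S)
        targets = np.concatenate([b[exposed_b], a[exposed_a]])
        infected_now = targets[rng.random(len(targets)) < beta]
        # Recoveries are drawn from those infectious at the start of the step.
        recovering = (state == I) & (rng.random(n_people) < gamma)
        state[infected_now] = I
        state[recovering] = R
        history.append(np.bincount(state, minlength=3))
    return np.array(history)


rng = np.random.default_rng(2)
n_people = 500

# Placeholder network: each pair is connected with probability 0.02.
i, j = np.triu_indices(n_people, k=1)
keep = rng.random(len(i)) < 0.02
edges = np.column_stack([i[keep], j[keep]])

baseline = simulate_sir(n_people, edges, beta=0.05, gamma=0.1,
                        n_seeds=5, n_steps=100, rng=rng)

# Intervention scenario: remove roughly half of the contacts at random.
reduced = edges[rng.random(len(edges)) < 0.5]
distanced = simulate_sir(n_people, reduced, beta=0.05, gamma=0.1,
                         n_seeds=5, n_steps=100, rng=rng)

print("ever recovered, baseline:", baseline[-1, 2], "distanced:", distanced[-1, 2])
```

Because the inputs are synthetic, a script like this and the network it runs on could be published together, which is the reproducibility benefit described above.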

For Health Policy Evaluation

Microsimulation of insurance reforms, screening programs, or prevention initiatives across heterogeneous populations
Distributional impact analyses showing which demographic groups gain or lose from policy changes
Multi-state comparisons using standardized synthetic populations calibrated to each state
Open sharing of model inputs enabling reproducible policy analysis

For Social Policy and Economic Analysis

Computational models of how policy changes propagate through social networks
Employment, education, and welfare program simulations capturing household and community dynamics
Transportation, housing, and urban planning models requiring detailed population spatial distributions
Privacy-preserving analysis of sensitive topics like poverty, discrimination, or criminal justice

For Methodological Research and Training

Benchmark datasets for algorithm development where ground truth is known
Teaching and training datasets students can use without IRB restrictions
Validation studies comparing model performance across synthetic and real data
Reproducible computational experiments that other researchers can replicate exactly

Innovation Grounded in Established Expertise

Proven Track Record

Our approach builds on pioneering research in AI-driven synthetic data generation:

"Using Artificial Intelligence to Generate Synthetic Health Data" (RAND WRA2892-1): Demonstrated novel applications of variational autoencoders and generative adversarial networks to create synthetic health datasets preserving complex statistical relationships
"Deep Generative Modeling in Network Science with Applications to Public Policy Research" (RAND WRA843-1): Extended deep learning methods to generate realistic social network structures for policy analysis

Reference Links:
• WRA2892-1: Using AI to Generate Synthetic Health Data
• WRA843-1: Deep Generative Modeling in Network Science

Technical Foundation

Deep Generative Models & Data Fusion

Variational autoencoders, generative adversarial networks, and flow-based models adapted for tabular and network data—learning high-dimensional joint distributions while fusing multiple data sources (census, surveys, contact studies, administrative records) into unified synthetic populations

AI-Powered Calibration

Simulation-based inference for rapid alignment to target distributions with flexible constraint incorporation for demographic targets, mixing matrices, and geographic boundaries—enabling tailored synthetic populations for specific policy contexts
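
For intuition about aligning a population to demographic targets, the sketch below uses iterative proportional fitting (raking). This is a classical technique swapped in for illustration, not the simulation-based inference described above; the seed table, the targets, and the `rake` function are hypothetical.

```python
import numpy as np


def rake(table, row_targets, col_targets, n_iter=100, tol=1e-10):
    """Iterative proportional fitting on a two-way table of counts.

    Alternately rescales rows and columns until both sets of margins
    match their targets. Row and column targets must share one total.
    """
    fitted = table.astype(float).copy()
    for _ in range(n_iter):
        fitted *= (row_targets / fitted.sum(axis=1))[:, None]
        fitted *= (col_targets / fitted.sum(axis=0))[None, :]
        # Columns match exactly after the second step, so check the rows.
        if np.allclose(fitted.sum(axis=1), row_targets, atol=tol, rtol=0):
            break
    return fitted


# Hypothetical seed counts: rows are 2 regions, columns are 3 age groups.
seed = np.array([[30.0, 20.0, 10.0],
                 [20.0, 40.0, 30.0]])
row_targets = np.array([400.0, 600.0])          # target people per region
col_targets = np.array([300.0, 450.0, 250.0])   # target people per age group

fitted = rake(seed, row_targets, col_targets)
print(np.round(fitted, 1))
print("row sums:", fitted.sum(axis=1), "column sums:", fitted.sum(axis=0))
```

Raking only rescales rows and columns, so the association pattern in the seed table (its odds ratios) carries through to the calibrated table while the margins move to the targets.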

Validation & Quality Framework

Rigorous comparison of synthetic vs. real data statistical properties, utility-privacy trade-off quantification, and downstream model performance evaluation—ensuring synthetic populations meet quality standards for policy applications
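
The kinds of checks this framework implies can be sketched with three simple metrics: total variation distance for a categorical marginal, the largest gap between correlation matrices, and distance to the closest real record as a rough privacy signal. The metric choices, function names, and toy data below are assumptions for illustration, not the project's actual validation suite.

```python
import numpy as np


def total_variation(real_col, synth_col, categories):
    """Total variation distance between two categorical distributions (0 to 1)."""
    p = np.array([(real_col == c).mean() for c in categories])
    q = np.array([(synth_col == c).mean() for c in categories])
    return 0.5 * np.abs(p - q).sum()


def max_corr_gap(real, synth):
    """Largest absolute difference between the two correlation matrices."""
    gap = np.corrcoef(real, rowvar=False) - np.corrcoef(synth, rowvar=False)
    return float(np.abs(gap).max())


def distance_to_closest_record(real, synth):
    """For each synthetic row, Euclidean distance to its nearest real row."""
    diffs = synth[:, None, :] - real[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)


rng = np.random.default_rng(3)
cov = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.2],
                [0.3, 0.2, 1.0]])
real = rng.multivariate_normal(np.zeros(3), cov, size=500)

# A toy generator: sample from a Gaussian fitted to the real data.
synth = rng.multivariate_normal(real.mean(axis=0),
                                np.cov(real, rowvar=False), size=500)
# A deliberately bad "generator" that simply copies real rows.
copied = real[rng.choice(500, size=500)]

# Toy categorical attribute drawn from the same distribution in both sets.
probs = [0.4, 0.3, 0.2, 0.1]
real_group = rng.choice(4, size=500, p=probs)
synth_group = rng.choice(4, size=500, p=probs)

tvd = total_variation(real_group, synth_group, categories=range(4))
gap = max_corr_gap(real, synth)
dcr_synth = distance_to_closest_record(real, synth)
dcr_copied = distance_to_closest_record(real, copied)

print(f"TVD={tvd:.3f}  max correlation gap={gap:.3f}")
print(f"min distance to a real record: synth={dcr_synth.min():.3f}, "
      f"copied={dcr_copied.min():.3f}")
```

A copied record shows up as a zero distance, which is the sort of leak a utility-privacy assessment is meant to flag; a full framework would add downstream model comparisons on top of checks like these.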

Open and Accessible

Open-source software tools with comprehensive documentation
Modular design supporting extensions to new data sources and application domains
Community validation and continuous improvement through transparent development
Integration with existing simulation frameworks (agent-based models, microsimulation platforms)

Flexible Development Pathways

SYNAPSE development follows a phased approach ensuring each stage delivers validated tools while building toward comprehensive population-network synthesis capabilities.

Phase	Investment	Timeline	Key Deliverables
Phase I: Core Population Synthesis	$190K–$245K	9–12 months	• Individual-level synthetic population generation • Demographic, socioeconomic, and health attributes • Validation framework comparing synthetic vs. real data • Initial software tools and documentation • Working prototype for tabular synthetic data
Phase II: Network Integration	$270K–$350K	12–15 months	• Social network generation with contact patterns and mixing matrices • Spatial and geographic constraint integration • Dynamic scenario capabilities (e.g., pandemic contact reductions) • Multiple network layers (household, workplace, community) • Complete population-network synthesis platform
Phase III: Application-Specific Extensions	$95K–$125K	6–9 months	• Custom modules for specific domains (epidemiology, health economics, social policy) • Integration with existing simulation frameworks • Advanced features: temporal dynamics, intervention modeling, counterfactuals • Domain-tailored tools ready for policy applications

Investment Options

Phase I Only

$190K–$245K

9–12 months

Core population synthesis for tabular data; working prototype with validation framework

Phases I + II

$460K–$595K

21–27 months

Core population + network integration; complete population-network synthesis platform

Full Integration (All Phases)

$570K–$730K

27–36 months

Complete system with application-specific extensions; domain-tailored tools ready for policy applications

Why Now

Privacy Regulations Tightening

GDPR, HIPAA, and state privacy laws increasingly restrict data sharing, so synthetic alternatives are becoming essential for collaborative research

AI Capabilities Maturing

Deep generative models have reached the sophistication needed for high-fidelity population synthesis; what was impossible five years ago is now achievable

Policy Crises Demanding Rapid Analysis

From pandemic response to healthcare reform, policymakers need simulation tools they can deploy immediately without lengthy data access negotiations

Synthetic populations are no longer a future possibility—they're a present necessity. Organizations that invest now will be positioned to conduct policy research others cannot, share data others cannot share, and answer questions others cannot address. SYNAPSE transforms privacy barriers into opportunities for open, reproducible, collaborative science.

Ready to Discuss SYNAPSE for Your Organization?

Whether you're a public health agency, policy research organization, academic institution, or government department, we'd like to explore how synthetic populations can unlock policy research capabilities for your organization.

CausalPaths Analytics LLC
Advancing expert modeling through AI-augmented capabilities and 20+ years of domain expertise