SYNAPSE
SYNthetic Agent Populations And Social Environments
Privacy-Preserving Digital Twins for Policy Research—High-Fidelity Simulation Foundations When Real Data Can't Be Shared
At a Glance
SYNAPSE is an AI-powered tool that generates high-fidelity synthetic populations with integrated social network structures. It enables privacy-preserving data sharing and policy research that confidentiality restrictions currently block. The synthetic datasets fuse multiple existing data sources with overlapping and complementary coverage. They preserve the complex statistical relationships and realistic contact structures that analyses and simulation models of economic and social system dynamics depend on, while sharply reducing re-identification risk and removing data access barriers. Building on our pioneering research in synthetic health data generation and network modeling, we deliver open-source tools for applications spanning pandemic preparedness, insurance reform analysis, and any domain requiring detailed individual and network data that cannot be shared.
The Challenge
Policy researchers and public health modelers face an impossible dilemma: their most important analyses require detailed individual-level data capturing demographics, behaviors, and social connections—yet access to such data is increasingly restricted by privacy regulations, institutional barriers, and confidentiality requirements. This creates critical gaps.
Existing Approaches Fall Short:
Data Access Barriers
Real data often can't be shared across institutions, limiting collaboration and reproducibility. Researchers can spend years negotiating data access that may never materialize.
Inadequate Anonymization
Simple statistical sampling loses the complex correlations essential for valid policy simulations. Traditional anonymization techniques either destroy analytical utility or fail to provide meaningful privacy protection.
Missing Network Structure
Even when individual-level data is available, social network connections are rarely included—yet disease spread, information diffusion, and social influence all depend critically on who connects to whom.
The result? Critical policy questions go unanswered because we can't access or share the data needed to address them—even when that data exists. Pandemic preparedness models lack realistic contact networks. Insurance reform analyses can't replicate detailed individual heterogeneity. Researchers can't validate their methods without reproducible benchmark datasets.
The Solution: AI-Powered Synthetic Population Generation
Our synthetic population framework does what conventional approaches cannot. It generates fully artificial populations and social networks that preserve the statistical properties of real data while enabling open sharing and flexible scenario analysis, without exposing real individuals' records or waiting on data access negotiations.
Three Transformative Capabilities
1. Statistical Fidelity Beyond Simple Sampling
- Deep generative models learn complex joint distributions across demographic, socioeconomic, behavioral, and health variables
- Preserves intricate correlational structures essential for policy analysis—not just marginal distributions
- Captures realistic heterogeneity and rare subpopulations often missed by traditional methods
- Fuses multiple data sources: census, surveys, administrative records with overlapping/complementary coverage
Why it matters: Synthetic individuals behave like real ones in simulations because they reflect real-world complexity, with correlations among income, health status, employment, family structure, and geographic location all preserved.
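The difference between matching marginals and matching the joint distribution can be shown with a toy comparison. The example is a minimal sketch under stated assumptions, not SYNAPSE's method. Instead of a deep generative model, it uses a bivariate Gaussian copula, one of the simplest synthesizers that keeps each variable's marginal distribution together with their rank dependence. The "real" income and health data are simulated for illustration, and all function names are hypothetical. Resampling each column independently reproduces the marginals but drives the correlation toward zero. The copula retains it.

```python
import math
import random
from statistics import NormalDist

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def normal_scores(values):
    """Map values to standard-normal scores via their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        scores[i] = NormalDist().inv_cdf((rank + 0.5) / n)
    return scores

def empirical_quantile(sorted_vals, u):
    """Empirical quantile of sorted_vals at probability u in [0, 1]."""
    return sorted_vals[min(int(u * len(sorted_vals)), len(sorted_vals) - 1)]

def independent_marginals(x, y, n, rng):
    """Naive synthesis: resample each column separately, discarding dependence."""
    return [rng.choice(x) for _ in range(n)], [rng.choice(y) for _ in range(n)]

def gaussian_copula(x, y, n, rng):
    """Bivariate Gaussian copula: keep both empirical marginals plus rank dependence."""
    rho = pearson(normal_scores(x), normal_scores(y))
    sx, sy, std = sorted(x), sorted(y), NormalDist()
    out_x, out_y = [], []
    for _ in range(n):
        g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
        z1, z2 = g1, rho * g1 + math.sqrt(1 - rho ** 2) * g2  # correlated normals
        out_x.append(empirical_quantile(sx, std.cdf(z1)))
        out_y.append(empirical_quantile(sy, std.cdf(z2)))
    return out_x, out_y

rng = random.Random(0)
# Hypothetical "real" data: a health score that rises with log income, plus noise.
income = [math.exp(rng.gauss(10.5, 0.6)) for _ in range(2000)]
health = [5 * math.log(inc) + rng.gauss(0, 2) for inc in income]

naive = independent_marginals(income, health, 2000, rng)
copula = gaussian_copula(income, health, 2000, rng)
print(f"real   corr: {pearson(income, health):.2f}")
print(f"naive  corr: {pearson(*naive):.2f}")
print(f"copula corr: {pearson(*copula):.2f}")
```

Both synthetic sets have realistic-looking income and health distributions on their own. Only the copula output would support a valid analysis of how health varies with income.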
2. Network-Level Realism for Transmission and Diffusion Models
- Generates social contact networks with empirically-grounded mixing patterns (age-assortative, household, workplace, community)
- Incorporates spatial proximity, demographic constraints, and context-specific interaction rules
- Supports dynamic scenarios: pandemic distancing, workplace closures, targeted interventions
- Multiple network layers: family ties, workplace contacts, casual interactions, online social networks
Why it matters: Disease spread, information diffusion, and social influence depend on who connects to whom. Synthetic networks get this structure right, enabling realistic epidemic modeling and intervention evaluation.
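A minimal sketch of age-assortative network generation follows. It uses a stochastic block model, a classical random-graph model, as a stand-in for SYNAPSE's network generator. A link between two people forms with a probability that depends only on their age groups. The mixing probabilities below are invented for illustration and are not contact-survey estimates. The function names are hypothetical. A distancing scenario is modeled simply by scaling every probability down. Separate layers, such as household and workplace, could each be generated this way and then combined, but that step is omitted here.

```python
import random
from itertools import combinations

def mixing_network(group_of, p, rng):
    """Stochastic block model: link persons i and j with probability p[g_i][g_j]."""
    return [(i, j) for i, j in combinations(range(len(group_of)), 2)
            if rng.random() < p[group_of[i]][group_of[j]]]

def contacts_between(group_of, edges, a, b):
    """Number of edges joining group a to group b (a == b counts within-group edges)."""
    return sum({group_of[i], group_of[j]} == {a, b} for i, j in edges)

rng = random.Random(42)
AGE_GROUPS = ["child", "adult", "older"]
group_of = [0] * 100 + [1] * 100 + [2] * 100  # 300 hypothetical people
# Hypothetical, illustrative probabilities: the diagonal is largest,
# so people mostly mix within their own age group.
p = [[0.060, 0.010, 0.005],
     [0.010, 0.040, 0.010],
     [0.005, 0.010, 0.030]]
baseline = mixing_network(group_of, p, rng)

# Hypothetical distancing scenario: halve every contact probability.
distanced = mixing_network(group_of, [[x * 0.5 for x in row] for row in p], rng)

for a in range(3):
    for b in range(a, 3):
        print(AGE_GROUPS[a], "-", AGE_GROUPS[b],
              contacts_between(group_of, baseline, a, b))
print("edges:", len(baseline), "baseline vs", len(distanced), "distanced")
```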
3. Privacy-Preserving Sharing Without Restrictions
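- Minimal re-identification risk: synthetic individuals never existed, and residual disclosure risk (for example, a generator memorizing rare real records) is measured rather than assumed away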
- Enables open data sharing, reproducible science, and cross-institutional collaboration
- Supports custom policy scenarios without returning to data owners for permission
- Facilitates method validation: researchers can benchmark algorithms on shared synthetic data with known ground truth
Why it matters: Researchers can share datasets publicly, accelerating discovery and enabling validation studies that are impossible with confidential data. Models developed on synthetic populations can be openly published, peer-reviewed, and improved by the research community.
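Before a synthetic dataset is released, one can check empirically whether any synthetic rows reproduce real individuals too closely. The sketch below shows one common heuristic, distance to closest record (DCR), which flags synthetic rows lying within a small distance of some real row. DCR is a diagnostic, not a formal privacy guarantee; formal guarantees require techniques such as differential privacy. The records, the threshold, and the function names are hypothetical. Features are assumed to be numeric and pre-scaled to comparable ranges.

```python
import math

def distance_to_closest_record(synthetic, real):
    """For each synthetic row, the Euclidean distance to its nearest real row.
    Assumes all features are numeric and already scaled to comparable ranges."""
    return [min(math.dist(s, r) for r in real) for s in synthetic]

def near_copy_rate(synthetic, real, threshold):
    """Share of synthetic rows lying within `threshold` of some real row."""
    distances = distance_to_closest_record(synthetic, real)
    return sum(d <= threshold for d in distances) / len(distances)

# Hypothetical records with two features scaled to [0, 1] (e.g. age, income).
real = [(0.25, 0.40), (0.60, 0.55), (0.80, 0.20)]
synthetic_ok = [(0.30, 0.45), (0.70, 0.35)]
synthetic_leaky = [(0.25, 0.40), (0.62, 0.55)]  # first row copies a real record

print(near_copy_rate(synthetic_ok, real, threshold=0.01))     # 0.0
print(near_copy_rate(synthetic_leaky, real, threshold=0.01))  # 0.5
```

In practice, the distribution of synthetic-to-real distances would typically be compared against real-to-real distances on a held-out sample, rather than against a single fixed threshold.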
What Synthetic Populations Enable That Real Data Cannot
For Public Health Emergency Response
- Agent-based disease transmission models with realistic contact networks and individual risk profiles
- Scenario planning for pandemics, bioterrorism, or natural disasters across diverse populations
- Intervention targeting based on network position, demographics, and vulnerability—without privacy concerns
- Shareable benchmark datasets for comparing epidemic models and intervention strategies
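The sketch below shows how an agent-based transmission model consumes a contact network. It is a discrete-time SIR simulation run on a random (Erdős–Rényi) network, which stands in for a synthetic contact network. It compares average outbreak size before and after a hypothetical intervention that removes half of all contacts. The transmission and recovery probabilities (`beta`, `gamma`), the network density, and all function names are illustrative assumptions, not calibrated values.

```python
import random
from itertools import combinations

def random_network(n, p, rng):
    """Erdos-Renyi contact network: each pair is linked with probability p."""
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]

def simulate_sir(n, edges, beta, gamma, seeds, rng, max_steps=500):
    """Discrete-time agent-based SIR on a network. Each step, every infectious
    agent infects each susceptible neighbour with probability beta, then
    recovers with probability gamma. Returns the number of agents ever infected."""
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    state = ["S"] * n
    for s in seeds:
        state[s] = "I"
    for _ in range(max_steps):
        infectious = [i for i in range(n) if state[i] == "I"]
        if not infectious:
            break
        newly_infected = {j for i in infectious for j in neighbours[i]
                          if state[j] == "S" and rng.random() < beta}
        for i in infectious:
            if rng.random() < gamma:
                state[i] = "R"
        for j in newly_infected:  # new cases become infectious next step
            state[j] = "I"
    return sum(s != "S" for s in state)

rng = random.Random(1)
N = 500
contacts = random_network(N, 0.01, rng)  # mean degree about 5
# Hypothetical intervention: remove half of all contacts (e.g. distancing).
reduced = [e for e in contacts if rng.random() < 0.5]

def mean_attack(edges, runs=20):
    """Average number ever infected over several runs, each seeded with 5 cases."""
    return sum(simulate_sir(N, edges, beta=0.1, gamma=0.2,
                            seeds=rng.sample(range(N), 5), rng=rng)
               for _ in range(runs)) / runs

baseline_size = mean_attack(contacts)
reduced_size = mean_attack(reduced)
print(f"mean outbreak size: {baseline_size:.0f} baseline vs {reduced_size:.0f} with distancing")
```

With a synthetic population, the same simulation could also record which demographic groups were infected. That is how intervention targeting by network position or vulnerability would be evaluated.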
For Health Policy Evaluation
- Microsimulation of insurance reforms, screening programs, or prevention initiatives across heterogeneous populations
- Distributional impact analyses showing which demographic groups gain or lose from policy changes
- Multi-state comparisons using standardized synthetic populations calibrated to each state
- Open sharing of model inputs enabling reproducible policy analysis
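At its core, a distributional impact analysis runs each synthetic individual through baseline and reform rules and then summarizes the change by group. The sketch below assumes a hypothetical premium rule that caps out-of-pocket premiums at a share of income. The rule, the cap shares, the records, and the function names are invented for illustration and do not model any actual insurance program.

```python
from collections import defaultdict

def out_of_pocket(person, cap_share):
    """Premium paid under a hypothetical rule capping premiums at a share of income."""
    return min(person["premium"], cap_share * person["income"])

def mean_change_by_group(population, baseline_cap, reform_cap):
    """Average change in out-of-pocket premium (reform minus baseline) per group.
    Negative values mean the group pays less under the reform."""
    changes = defaultdict(list)
    for person in population:
        delta = out_of_pocket(person, reform_cap) - out_of_pocket(person, baseline_cap)
        changes[person["group"]].append(delta)
    return {g: sum(d) / len(d) for g, d in changes.items()}

# Four hypothetical synthetic individuals; values are illustrative only.
population = [
    {"group": "low income", "income": 20_000, "premium": 6_000},
    {"group": "low income", "income": 30_000, "premium": 6_000},
    {"group": "middle income", "income": 60_000, "premium": 6_000},
    {"group": "high income", "income": 150_000, "premium": 6_000},
]
for group, change in mean_change_by_group(population, 0.15, 0.085).items():
    print(f"{group}: {change:+.0f}")
```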
For Social Policy and Economic Analysis
- Computational models of how policy changes propagate through social networks
- Employment, education, and welfare program simulations capturing household and community dynamics
- Transportation, housing, and urban planning models requiring detailed population spatial distributions
- Privacy-preserving analysis of sensitive topics like poverty, discrimination, or criminal justice
For Methodological Research and Training
- Benchmark datasets for algorithm development where ground truth is known
- Teaching and training datasets students can use without IRB restrictions
- Validation studies comparing model performance across synthetic and real data
- Reproducible computational experiments that other researchers can replicate exactly
Innovation Grounded in Established Expertise
Proven Track Record
Our approach builds on pioneering research in AI-driven synthetic data generation:
- "Using Artificial Intelligence to Generate Synthetic Health Data" (RAND WRA2892-1): Demonstrated novel applications of variational autoencoders and generative adversarial networks to create synthetic health datasets preserving complex statistical relationships
- "Deep Generative Modeling in Network Science with Applications to Public Policy Research" (RAND WRA843-1): Extended deep learning methods to generate realistic social network structures for policy analysis
Technical Foundation
Deep Generative Models & Data Fusion
Variational autoencoders, generative adversarial networks, and flow-based models adapted for tabular and network data—learning high-dimensional joint distributions while fusing multiple data sources (census, surveys, contact studies, administrative records) into unified synthetic populations
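The basic idea of fusing sources with overlapping variables can be illustrated with classical statistical matching (nearest-neighbour hot deck), which is shown here as a simpler stand-in for the learned fusion described above. Each record in one source receives the missing variables of its closest match in another source, where closeness is measured on the variables both sources share. This technique assumes that the donated variable is conditionally independent of recipient-only variables given the shared ones, a limitation that model-based fusion aims to reduce. The survey records, scales, and function names are hypothetical.

```python
import math

def fuse_nearest_neighbour(recipients, donors, common, donated, scales):
    """Statistical matching (nearest-neighbour hot deck): give each recipient the
    `donated` fields of the donor closest on the shared `common` variables.
    Each common variable is divided by its scale so units are comparable."""
    def key(record):
        return [record[c] / scales[c] for c in common]
    fused = []
    for r in recipients:
        best = min(donors, key=lambda d: math.dist(key(r), key(d)))
        fused.append({**r, **{f: best[f] for f in donated}})
    return fused

# Hypothetical sources: a health survey and a contact survey sharing age and income.
health_survey = [
    {"age": 24, "income": 31_000, "chronic_condition": 0},
    {"age": 71, "income": 28_000, "chronic_condition": 1},
]
contact_survey = [
    {"age": 22, "income": 35_000, "daily_contacts": 14},
    {"age": 45, "income": 80_000, "daily_contacts": 11},
    {"age": 68, "income": 30_000, "daily_contacts": 5},
]
fused = fuse_nearest_neighbour(health_survey, contact_survey,
                               common=["age", "income"], donated=["daily_contacts"],
                               scales={"age": 10, "income": 10_000})
for row in fused:
    print(row)
```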
AI-Powered Calibration
Simulation-based inference for rapid alignment to target distributions, with flexible incorporation of constraints such as demographic targets, mixing matrices, and geographic boundaries, enabling synthetic populations tailored to specific policy contexts
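The framework above describes calibration through simulation-based inference. As a simpler and well-established stand-in, the sketch below uses iterative proportional fitting (IPF, also called raking). IPF has long been used in synthetic population work to rescale a survey cross-tabulation until it matches known census margins while keeping the survey's interaction structure. The table values, targets, and function name are hypothetical.

```python
def ipf(seed, row_targets, col_targets, max_iter=500, tol=1e-8):
    """Iterative proportional fitting (raking) of a 2-D table.
    Alternately rescales rows and columns until both margins match their targets;
    the seed's odds ratios (its interaction structure) are left unchanged.
    Assumes positive seed cells and sum(row_targets) == sum(col_targets)."""
    table = [list(map(float, row)) for row in seed]
    for _ in range(max_iter):
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            table[i] = [v * target / total for v in table[i]]
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / total
        # Columns are exact after the column step; stop once rows also match.
        if all(abs(sum(row) - t) < tol for row, t in zip(table, row_targets)):
            break
    return table

# Hypothetical survey cross-tab (rows: age 18-34, 35-64, 65+; cols: employed, not employed)
survey = [[30, 10], [50, 40], [5, 25]]
# Hypothetical census margins for the target area (both sum to 5,000 people).
fitted = ipf(survey, row_targets=[1200, 2600, 1200], col_targets=[3100, 1900])
for row in fitted:
    print([round(v, 1) for v in row])
```

Because IPF only rescales whole rows and columns, the association between age and employment in the survey carries over to the fitted table, while its totals now agree with the census.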
Validation & Quality Framework
Rigorous comparison of the statistical properties of synthetic and real data, quantification of the utility-privacy trade-off, and evaluation of downstream model performance, ensuring synthetic populations meet quality standards for policy applications
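One simple utility check is to compare the one-way and two-way marginal distributions of real and synthetic data using total variation distance (TVD). The sketch below is a minimal version of that check. The toy data show why the two-way comparison matters: both one-way TVDs are zero, yet the synthetic rows break the relationship between employment and insurance type. The data and function names are hypothetical, and a full validation framework would add higher-order and model-based comparisons.

```python
from collections import Counter
from itertools import combinations

def tvd(real, synth):
    """Total variation distance between two empirical categorical distributions
    (0 = identical, 1 = no overlap)."""
    p, q = Counter(real), Counter(synth)
    return 0.5 * sum(abs(p[c] / len(real) - q[c] / len(synth)) for c in set(p) | set(q))

def marginal_report(real_rows, synth_rows, columns):
    """TVD for every one-way and two-way marginal; two-way terms catch broken correlations."""
    report = {}
    for c in columns:
        report[(c,)] = tvd([r[c] for r in real_rows], [s[c] for s in synth_rows])
    for a, b in combinations(columns, 2):
        report[(a, b)] = tvd([(r[a], r[b]) for r in real_rows],
                             [(s[a], s[b]) for s in synth_rows])
    return report

# Hypothetical data: in the "real" rows, employment fully determines insurance type.
real = ([{"employment": "employed", "insurance": "private"}] * 2
        + [{"employment": "unemployed", "insurance": "public"}] * 2)
synthetic = [
    {"employment": "employed", "insurance": "private"},
    {"employment": "employed", "insurance": "public"},
    {"employment": "unemployed", "insurance": "private"},
    {"employment": "unemployed", "insurance": "public"},
]
for key, value in marginal_report(real, synthetic, ["employment", "insurance"]).items():
    print(key, round(value, 2))
```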
Open and Accessible
- Open-source software tools with comprehensive documentation
- Modular design supporting extensions to new data sources and application domains
- Community validation and continuous improvement through transparent development
- Integration with existing simulation frameworks (agent-based models, microsimulation platforms)
Flexible Development Pathways
SYNAPSE development follows a phased approach ensuring each stage delivers validated tools while building toward comprehensive population-network synthesis capabilities.
| Phase | Investment | Timeline | Key Deliverables |
|---|---|---|---|
| Phase I: Core Population Synthesis | $190K–$245K | 9–12 months | • Individual-level synthetic population generation • Demographic, socioeconomic, and health attributes • Validation framework comparing synthetic vs. real data • Initial software tools and documentation • Working prototype for tabular synthetic data |
| Phase II: Network Integration | $270K–$350K | 12–15 months | • Social network generation with contact patterns and mixing matrices • Spatial and geographic constraint integration • Dynamic scenario capabilities (e.g., pandemic contact reductions) • Multiple network layers (household, workplace, community) • Complete population-network synthesis platform |
| Phase III: Application-Specific Extensions | $95K–$125K | 6–9 months | • Custom modules for specific domains (epidemiology, health economics, social policy) • Integration with existing simulation frameworks • Advanced features: temporal dynamics, intervention modeling, counterfactuals • Domain-tailored tools ready for policy applications |
Investment Options
Phase I Only
Core population synthesis for tabular data; working prototype with validation framework
Phases I + II
Core population + network integration; complete population-network synthesis platform
Full Integration (All Phases)
Complete system with application-specific extensions; domain-tailored tools ready for policy applications
Why Now
Privacy Regulations Tightening
GDPR, HIPAA, and state privacy laws increasingly restrict data sharing, making synthetic alternatives essential for collaborative research
AI Capabilities Maturing
Deep generative models have reached the sophistication needed for high-fidelity population synthesis, making practical what was infeasible only a few years ago
Policy Crises Demanding Rapid Analysis
From pandemic response to healthcare reform, policymakers need simulation tools they can deploy immediately without lengthy data access negotiations
Synthetic populations are no longer a future possibility—they're a present necessity. Organizations that invest now will be positioned to conduct policy research others cannot, share data others cannot share, and answer questions others cannot address. SYNAPSE transforms privacy barriers into opportunities for open, reproducible, collaborative science.
Ready to Discuss SYNAPSE for Your Organization?
Whether you're a public health agency, policy research organization, academic institution, or government department, we'd like to explore how synthetic populations can unlock policy research capabilities for your organization.
CausalPaths Analytics LLC
Advancing expert modeling through AI-augmented capabilities and 20+ years of domain expertise