Data Scientist

Bay Area | Full-Time | USD 150,000-350,000 per year | Posted Jun 29, 2026

About Arena Intelligence

Arena is the platform for evaluating how AI models perform in the real world. Founded by researchers from UC Berkeley's SkyLab, we're on a mission to measure and advance the frontier of AI for real-world use, and to build the foundation for everyone to understand, shape, and benefit from it.

Tens of millions of people use Arena each month to evaluate how frontier systems handle the work they actually do. The preferences they share power the most transparent, rigorous, and human-centered evaluations in AI. Leading AI labs, enterprises, and independent researchers rely on our work and open datasets to understand how models behave in real workflows: agentic coding, creative generation, professional productivity, and beyond. We go beyond leaderboards and decompose what human experience reveals about AI, so models advance toward the work people actually do.

We're a team of researchers, academics, builders, and creatives from UC Berkeley, Google, Stanford, and DeepMind. We seek truth, move fast, and value craftsmanship, curiosity, and impact over hierarchy. We're building a company where thoughtful, curious people from all backgrounds can do their best work together, in an office culture that radiates excellence, energy, and focus.

About the Role

We are seeking a Data Scientist with expertise in experimentation, causal inference, and retention analytics to drive data-informed decision-making and optimize user engagement. In this role, you will design and analyze experiments (A/B tests, quasi-experiments), develop measurement frameworks for key metrics (DAU, WAU, MAU, retention), and provide actionable insights to improve product growth and user retention. Proficiency in PySpark is highly desirable for working efficiently with large-scale datasets.

Responsibilities

Experimentation & Causal Inference
- Design, implement, and analyze A/B tests, multi-armed bandits, and quasi-experiments to measure the impact of product changes.
- Apply causal inference techniques (e.g., difference-in-differences, propensity score matching, synthetic control, regression discontinuity) to estimate treatment effects in non-randomized settings.
- Collaborate with product, engineering, and marketing teams to define hypotheses, success metrics, and statistical power requirements.
- Ensure rigorous statistical validity (e.g., bias control, multiple-testing corrections, confidence intervals).

Retention & Engagement Analytics
- Develop and refine retention measurement frameworks (e.g., cohort analysis, survival analysis, churn prediction).
- Define and track core engagement metrics (DAU, WAU, MAU, rolling retention, N-day retention) and diagnose trends.
- Identify key drivers of retention through segmentation, funnel analysis, and predictive modeling.
- Work with growth teams to optimize onboarding, engagement loops, and monetization strategies.

Data Infrastructure & Scalable Analytics
- Build and maintain scalable data pipelines (using PySpark, SQL, or big data tools) to process and analyze large datasets.
- Develop automated dashboards and reports (e.g., Tableau, Looker, Metabase) to monitor experiment performance and retention trends.
- Ensure data quality and consistency in metric definitions across teams.
- Optimize queries and computations for performance and cost efficiency in distributed systems (e.g., Databricks, AWS EMR, GCP BigQuery).

Cross-Functional Collaboration
- Partner with product managers, engineers, and marketers to translate business questions into data-driven analyses.
- Present findings and recommendations to executive stakeholders in clear, actionable formats.
- Mentor junior data scientists and analysts on best practices in experimentation and retention analytics.

You’ll have

- 3+ years of experience in data science, analytics, or experimentation (or equivalent academic research experience).
- Strong background in statistics and causal inference (hypothesis testing, Bayesian methods, experimental design).
- Hands-on experience with SQL and Python (Pandas, NumPy, SciPy, StatsModels, Scikit-learn).
- Proficiency in experimentation tools (e.g., Optimizely, Statsig, Eppo, or custom in-house systems).
- Experience defining and analyzing retention metrics (DAU/WAU/MAU, cohort retention, churn).
- Familiarity with big data tools (PySpark, Hadoop, or similar distributed computing frameworks).

Highly Desirable:

- Expertise in PySpark for large-scale data processing and analytics.
- Experience with time-series forecasting, survival analysis, or uplift modeling.
- Knowledge of ML for retention (e.g., propensity models, clustering, recommendation systems).
- Experience with data visualization tools (Tableau, Looker, Plotly, Matplotlib/Seaborn).
- Background in growth analytics, product analytics, or marketing analytics.

Nice to Have:

- Advanced degree (MS/PhD) in Statistics, Economics, Computer Science, or a quantitative field.
- Experience with reinforcement learning or bandit algorithms for dynamic experimentation.
- Knowledge of MLOps or productionizing models (e.g., MLflow, Airflow, Docker).

What we offer

- Competitive compensation and equity aligned to the markets where our team members are based. The base salary range will depend on the candidate’s permanent work location.
- Comprehensive health and wellness benefits, including medical, dental, vision, and additional support programs.
- The opportunity to work on cutting-edge AI with a small, mission-driven team.
- A culture that values transparency, trust, and community impact.

Come help build the space where anyone can explore and help shape the future of AI.

Arena Intelligence provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability, genetics, sexual orientation, gender identity, or gender expression. We are committed to a diverse and inclusive workforce and welcome people from all backgrounds, experiences, perspectives, and abilities.