Responsibilities
We are looking for a Data Scientist to contribute across the full ML development lifecycle, from model building and experimentation to production deployment and monitoring. Core responsibilities are in applied data science and MLOps, with secondary contributions to data engineering and light platform operations. This role works within established platform patterns alongside dedicated infrastructure engineers and handles routine ML and data tasks without requiring their involvement. All work is performed in a HIPAA-governed, FedRAMP-compliant healthcare analytics environment.
What you'll do:
- Develop, train, and evaluate ML models (classification, regression, clustering, anomaly detection) and contribute to LLM-based capabilities such as RAG pipelines and prompt evaluation.
- Support model governance and deployment practices using MLflow, including experiment tracking, model versioning, registry promotion workflows, and automated testing across the ML lifecycle.
- Contribute to production ML operations: model performance monitoring, drift detection, automated alerting, and incident escalation to maintain reliability and SLA compliance.
- Build and improve model serving infrastructure, feature pipelines, and lifecycle automation to support reproducible, scalable model development and inference.
- Apply explainability techniques (e.g., SHAP, LIME) and produce technical documentation to support stakeholder transparency and compliance requirements.
- Contribute to data ingestion, ELT/ETL transformation, and pipeline reliability using Spark and SQL-based frameworks within Snowflake and Databricks environments.
- Support pipeline orchestration, medallion architecture conventions, and data stewardship practices (metadata management, PII handling, lineage tracking in Unity Catalog).
- Perform occasional system administration tasks in collaboration with platform teams, including environment configuration, access management, compute troubleshooting, and secrets handling using platform-native tools.
Qualifications
Basic Qualifications:
- Master's degree with 2+ years of relevant experience, Bachelor's degree with 4+ years of relevant experience, Associate's degree with 6 years of experience, or High School diploma with 8 years of experience.
- Demonstrated experience with SQL and Python, including Python-based ML frameworks (e.g., scikit-learn, XGBoost, PyTorch, or TensorFlow).
- Hands-on experience with MLflow or equivalent tools for experiment tracking, model governance, and lifecycle management.
- Strong understanding of SDLC fundamentals and experience with GitHub or equivalent version control.
- Experience with distributed compute environments (e.g., Spark, Databricks) and cloud-native services.
- Basic proficiency with Bash or shell scripting for automation and environment setup.
- Ability to collaborate across multidisciplinary teams and communicate technical concepts to varied audiences.
- Ability to obtain and maintain a Public Trust clearance.
- Must be a U.S. citizen, or a Green Card holder who has been in the United States for 3 of the last 5 years.
Preferred Qualifications:
- Experience with MLOps practices including CI/CD for ML, containerization, feature pipeline automation, and model deployment frameworks.
- Experience with Databricks E2 components (Unity Catalog, Feature Store, Delta Live Tables) and/or model serving and drift monitoring tools (e.g., Databricks Model Serving, Evidently).
- Experience with LLM frameworks (e.g., LangChain, LlamaIndex, Hugging Face Transformers) and familiarity with model explainability libraries (e.g., SHAP, LIME).
- Advanced Spark performance optimization experience and/or API development using Databricks REST APIs.
- Experience with healthcare analytics data (preferably Medicare or Medicaid) and familiarity with HIPAA or FedRAMP compliance constraints.
- Experience building data pipelines in a Snowflake or Databricks environment.
- Familiarity with orchestration tools (Airflow, Databricks Workflows).
- Exposure to streaming data patterns using Spark Structured Streaming, Delta Live Tables, or Kafka.
- Familiarity with environment reproducibility tooling (Docker, conda) and scripting (Python, Bash) to support automation and CI/CD tasks.
Peraton Overview
Peraton is a next-generation national security company that drives missions of consequence spanning the globe and extending to the farthest reaches of the galaxy. As the world’s leading mission capability integrator and transformative enterprise IT provider, we deliver trusted, highly differentiated solutions and technologies to protect our nation and allies. Peraton operates at the critical nexus between traditional and nontraditional threats across all domains: land, sea, space, air, and cyberspace. The company serves as a valued partner to essential government agencies and supports every branch of the U.S. armed forces. Each day, our employees do the can’t be done by solving the most daunting challenges facing our customers. Visit peraton.com to learn how we’re keeping people around the world safe and secure.