Data Scientist I

Posted 10 hours ago

Viridien

Navi Mumbai, IN · Full-time · INR 800,000 – INR 1,200,000

About this role

Viridien is an advanced technology, digital and Earth data company that pushes the boundaries of science for a more prosperous and sustainable future. This Data Scientist role focuses on developing data-transformation modules to support complex natural resource, digital, energy transition, and infrastructure challenges.

You will develop systematic data-transformation modules to gather, clean, validate, and structure raw datasets. You will collaborate closely with Subject Matter Experts and the labeling team, providing tools and continuous feedback to improve annotation workflows.

You will work closely with cross-functional partners in production and technology to ensure alignment and transparency. You will maintain a deep understanding of the Data Hub technology stack and data schemas, staying current with emerging technologies.

You will contribute to broader initiatives in data integration, feature design, and machine learning. You will design and run experiments, iterate based on results, and fine-tune pre-trained ML models on internal datasets. You will also document and share learnings with the team.

Requirements

Background in data science or a related field (Master’s degree preferred).
Proficient in at least one programming language—ideally Python—with experience in common ML libraries.
Experience with hybrid ML workflows (traditional ML, LLMs, embeddings, ontologies, knowledge graphs).
Comfortable working with relational, NoSQL, and graph databases.
Strong data-processing skills, including cleaning, filtering, and feature extraction.
Clear communicator with strong collaboration and presentation skills.

Responsibilities

Develop systematic data-transformation modules to gather, clean, validate, and structure raw datasets.
Support SMEs and the labeling team by providing tools, solutions, and continuous technical feedback to improve annotation workflows.
Build scalable, reusable data-processing solutions and maintain version control using GitLab.
Troubleshoot and diagnose data issues, proactively flagging inconsistencies or risks.
Maintain a deep understanding of the Data Hub technology stack and data schemas.
Design and run experiments, iterate based on results, and document and share learnings with the team.
Use pre-trained ML models, and train or fine-tune models on internal datasets with proper evaluation and validation.