Duration: 6-Month Contract
Job Description:
- The project involves analyzing the results of Human Data in the coding domain.
- We are seeking a highly skilled Data Modeling Engineer with a strong understanding of Large Language Models (LLMs) to join our team as a Quality Analyst for Human-Created Data.
- In this role, you will work closely with highly technical modeling teams as well as vendor leads to understand the data needs of the LLM modeling teams. You’ll need a strong ability to comprehend LLM use cases, a nuanced understanding of good versus great content, and the ability to use both quantitative and qualitative techniques to analyze LLM datasets.
- Write Extract, Transform, and Load (ETL) logic to automate data collection and reporting processes/pipelines, including data quality and monitoring.
- Build complex and reliable data pipelines in SQL to serve as the backbone for input and output to several ML models. Additionally, deploy SQL workflows in conjunction with Python code.
Responsibilities:
- Work with human data leads and modeling leads to define human data creation instructions and rubrics, and then evaluate the coding-related data created by humans.
- Perform data analysis, meet with rater pool leads and modeling leads, and report on status.
- Define data requirements based on a deep understanding of modeling team goals and analysis of model loss patterns.
- Benchmark vendor data quality against competitor models.
- Use quantitative techniques to analyze vendor-produced data, ensuring high quality and driving rater pool optimization.
- Audit datasets for quality issues and develop tools to accelerate qualitative analysis.
- Apply modeling and experimentation techniques to demonstrate data impact and identify blind spots.
- Use your knowledge of data processing, technical systems, and project management to enhance our existing data and machine learning platforms for internal use cases.
- Collaborate with data scientists to drive operational efficiency and make our machine learning data workflows more reliable.
Mandatory:
- Experience coding in SQL and Python.
- Understanding of LLM capabilities and limitations.
- Understanding of LLM processes like Pre-training, RLHF, SFT, Evals, etc.
- Experience with training/tuning models, prompt engineering, and evaluating LLM outputs is a plus.
- Experience writing, maintaining, and monitoring both streaming and batch ETLs operating on a variety of structured and unstructured sources.
- Familiarity with machine learning libraries (such as TensorFlow, Scikit-learn, Keras) or exploratory/statistical analysis using Python or R.
- Experience with the software development life cycle.
- Experience with ML / AI is a plus.
- Experience with prompt engineering and writing prompts for GenAI is a plus.
- Advanced ability to write English prose.
Education:
- Bachelor’s or higher in CS or related field.
About US Tech Solutions:
US Tech Solutions is a global staff augmentation firm providing a wide range of talent on-demand and total workforce solutions. To know more about US Tech Solutions, please visit www.ustechsolutions.com.
US Tech Solutions is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, colour, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.
Recruiter Details:
Name: Devesh
Email: devesh@ustechsolutionsinc.com
Internal Id: 24-28153