Job Title: Technical AI Operations Engineer
Location: Remote
Contract: 12 Months (Possibility for Extension)
Must-have skills:
- Enterprise Consulting experience with a focus on Data, Machine Learning, and GenAI solutions (15+ years preferred)
- Proficiency in designing and delivering solutions that leverage GenAI technologies (e.g., LLMs, Foundation Models)
- Deep familiarity with relevant concepts and models/technologies (e.g., transformer models, prompt engineering, model fine-tuning)
- Experience delivering and scaling complex infrastructure solutions across diverse platforms
- Ability to translate complex processes and business problems into technical solutions
Strong knowledge of:
- vLLM
- OpenShift AI
- Prometheus
- Grafana
- Aqua
- Automation of deployment and execution of pipelines
Proficient knowledge of:
- Python and SQL
- Apache Spark, Apache Hadoop, Informatica, and similar data processing tools
- Proven experience building test procedures and ensuring data pipeline quality, reliability, performance, and scalability
- Proven experience developing a comprehensive set of process documents, runbooks, and plans covering key operational procedures and activities, including:
  - Operational procedures
  - Activities and tasks
  - Escalation processes
  - Communication plans
- Ability to develop applications that expose and consume RESTful APIs for data querying and ingestion
- Understanding of the AI tooling ecosystem (e.g., Kubernetes, MLOps, AIOps)
- Strong communication and customer-facing skills
- Ability to work efficiently in collaborative teams using Agile methodologies
- Ability to influence and interact with confidence and credibility at all levels