AIOps Infrastructure Engineer
- Hybrid
- Herstal, Walloon Region, Belgium
- Infrastructure
Job description
Looking for a change? Ready for new challenges?
We are seeking a highly motivated and detail-oriented AIOps Infrastructure Engineer to join our team!
NRB Group is one of the leading IT players in Belgium.
We provide end-to-end IT services covering all needs: Infrastructure & Cloud, Software Development, Consultancy, and Managed Staffing.
Learn more: https://www.nrb.be/
As an AIOps Infrastructure Engineer, you will be responsible for implementing and maintaining our AIOps platform, drawing on your knowledge of AI model architectures, containerization environments, and hardware systems (GPU, DPU, and other accelerator technologies). You will also participate in the design and development of this multi-tenant AI infrastructure platform.
Key responsibilities:
1. AIOps Platform Development:
Work on the development and maintenance of our AIOps platform, using containerization technology and CI/CD pipelines.
Optimize platform performance with respect to AI workloads, available hardware, and infrastructure costs.
Collaborate with the team to design and implement new features and enhancements to the platform.
Troubleshoot and resolve issues with the platform, working closely with the operations team.
2. Automation and Scripting:
Develop automation scripts to perform routine tasks, such as data processing, reporting, and alerting.
Automate workflows and processes using tools such as Ansible, Terraform, PowerShell or Python.
3. Integration and Testing:
Integrate the AIOps platform with other systems and tools, such as monitoring systems, incident management systems, and data storage systems.
Develop and execute test plans to ensure the platform is functioning correctly.
4. Documentation and Knowledge Management:
Maintain accurate and up-to-date documentation of the AIOps platform, including technical documentation.
Contribute to the organization's knowledge base by documenting best practices, troubleshooting guides, and how-to guides.
5. Collaboration and Communication:
Work closely with the operations team, IT architects, and stakeholders (development teams) to understand requirements and provide solutions.
Communicate technical information to non-technical stakeholders, using clear and concise language.
OUR OFFER
A job function rich in responsibilities and challenges
A team of enthusiastic, professional colleagues in which there is an excellent atmosphere and where expertise is shared
Tools and infrastructure that are consistently at the forefront of innovation
Personalized career support to help you with your development
Many training opportunities and certifications
A dynamic, innovation-oriented company with a friendly working atmosphere
Fun times – a food truck, barbecue, after-work events, family day!
Complete salary package
Teleworking opportunities (up to 3 days a week)
Job requirements
YOUR PROFILE
Bachelor's degree in Computer Science, Information Technology, or a related field.
3-4 years of experience in datacenter, network and server administration.
Experience with GPU/DPU and other accelerator technologies, and with inference solutions.
Experience with machine learning and AI concepts (LLMs, model architectures).
Experience with containerization (Docker, Podman, k8s, OCP, …), microservices architecture, and virtualization (VMware, KVM, …).
Experience with automation and scripting tools, such as Ansible, Terraform, PowerShell, or Python.
Experience with DevOps tools and practices, such as version control, continuous integration/continuous deployment (CI/CD), GitHub, and Argo CD.
Knowledge of IT service management frameworks, such as ITIL.
Knowledge of job scheduling, resource management, and HPC cluster administration.
Familiarity with cloud-based AIOps platforms and services, such as AWS, Azure, or Google Cloud.
Programming skills in languages such as Python, Java, or C++.
Strong problem-solving skills and attention to detail.
Excellent communication and collaboration skills.
Ability to work in a fast-paced environment and adapt to changing priorities.
Good command of both French and English is required.