Distributed Computing Infrastructure Developer - VCP
Remote Poland



Project Description

Client description:

Our client is one of the world's leading manufacturers of semiconductor-chip-making equipment. A majority of the world's microchips receive their critical lithographic patterning in machines made by
this client. In addition, they produce metrology tools and advanced applications to analyze and optimize the performance of their customers' production processes.

Job Mission:

Participate in the development of a distributed data and compute platform infrastructure. Be accurate, be precise, and own the specification, design and implementation of features and fixes. Onboard, integrate and configure open-source and other packages that support the development of semiconductor process-tuning applications on the client's platform. Support installation of these platforms in Korea, Taiwan, Israel, China, the US and elsewhere.
Be part of a compute platform that is one of the main pillars supporting the production of the next generation of microchips for Apple, Samsung and many others.

Other details about the working environment:

You will be working in the Business Line Applications (BL Apps). BL Apps develops Analytics & Control solutions that improve the accuracy of performance metrics (such as overlay, focus and critical dimension) as measured on the end product of a fab process (wafers with chip structures). You will work on the distributed computing platform underneath these processing algorithms. This platform provides value to the client's customers all over the world, making sure the chips of the next generation are produced efficiently and with the highest quality.
There are 3-4 infrastructure teams (20-30 engineers, plus Product Owners and Scrum Masters) working on the platform layers.
The business-critical applications are developed by 15-25 application development teams.


    You will be working as an engineer on the virtual compute platform (VCP). This platform is developed inside the client's company to host compute and analytics applications that aim to improve the yield in the semiconductor factories of their customers.
    These applications take data from the client's scanners and metrology equipment and combine it into real-time production corrections and scanner process diagnostics. The corrections are sent back to the production equipment. Failure of the platform has high impact: it would mean failure of the customer's (TSMC, Samsung, Intel, etc.) production facility.
    The platform is currently based on DC/OS; in 2021 the migration to Kubernetes started. The platform aspects are developed internally: scheduling of resources, containerization, fail-over, and data collection from scanner and measurement devices inside the fab. The uptime expectation is four nines (99.99%). As a true distributed computing expert you will have your own view on such a baseline expectation; this might be a nice topic to discuss during an interview.
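    To put the four-nines expectation in perspective, here is a minimal illustrative sketch (standard availability arithmetic, not part of the role description) of the downtime budget such a target implies:

```python
# Downtime budget implied by an "N nines" availability target.
# Illustrative arithmetic only; figures are generic, not client-specific.

def downtime_minutes_per_year(nines: int) -> float:
    """Minutes of allowed downtime per year for availability 1 - 10**-nines
    (nines=4 means 99.99% uptime)."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes in a non-leap year
    return 10.0 ** -nines * minutes_per_year

print(round(downtime_minutes_per_year(4), 1))  # four nines: ~52.6 minutes/year
```

    In other words, four nines leaves less than an hour of unplanned downtime per year, which is why automated fail-over and testing feature so prominently in this role.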
    Installations and upgrades are automated with Ansible. Other technologies you may encounter are Spark for data processing and Kafka for notifications and high-volume data ingestion. Hadoop and HBase are used for data storage. We are open to your well-substantiated input on the suitability of stable alternatives for these technologies where they better suit the client's business case.

    Your responsibilities:
    -Design and implement the product with the team
    -Ensure automated tests accompany every delivery
    -Help application developers understand the infrastructure / cluster / system
    -Make the VCP reliable by improving system resilience (bug-fixing and beyond)


Must have

    -Knowledge of distributed computing systems, practical experience (must)
    -Kubernetes configuration, not just development on top of it (must)
    -Ansible playbooks and programming
    -Spark, Kafka (topics) and ZooKeeper for cluster management
    -Experienced in CI/CD, including git, test automation, etc.
    -Familiar with at least one scripting language (e.g. Python)
    -Linux expertise

Nice to have

    -Experience introducing new technology with zero downtime, including data migration
    -DC/OS
    -HBase and Hadoop: configuration, troubleshooting, failover, replication
    -Fan of automated testing and qualification
    -Availability to work (remotely) outside regular office hours when the attempt to build a fail-safe system proves not yet successful; we really want this to be an exception, not a rule

    -You like to solve problems (permanently)
    -You are open to challenges
    -You think outside the box
    -You can look at things through the customer's eyes
    -You automate everything
    -You have a positive attitude

    We also value:
    -Collaboration with stakeholders
    -Curiosity: understanding how the system works
    -Ability to dive deep into a specific topic
    -Being able to combine the individual elements and requests into a system design
    -Having fun


English: B2 Upper Intermediate



Relocation package

If needed, we can help you with the relocation process.

Work Type

BigData Platform

Ref Number