Position

Senior Java Developer with Scala, Remote Poland

Location


Remote Poland

Project Description


You’ll be working with various ETL (Extract, Transform, Load) pipelines, based mainly on Apache Spark in AWS, for batch and streaming processing, communicating daily with colleagues distributed locally and abroad. The project includes numerous activities such as design, technical supervision, and development. You should be comfortable discussing selected technical solutions with the customer, as well as assessing design risks and flaws and eliminating issues. We expect you to coordinate technical work among Big Data developers, QA engineers, and DevOps engineers to ensure appropriate code quality, continuous integration, and continuous delivery in line with software development best practices.
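
To give a flavour of the stack described above, below is a minimal sketch (not project code) of a Spark Structured Streaming job that reads events from Kafka and lands them on S3 as Parquet; the topic name, broker address, schema, and S3 paths are placeholders invented purely for illustration.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    // Requires the spark-sql-kafka connector on the classpath (an assumption here).
    object ClickstreamIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("clickstream-ingest") // hypothetical job name
          .getOrCreate()

        // Assumed event schema, purely illustrative
        val schema = new StructType()
          .add("userId", StringType)
          .add("eventType", StringType)
          .add("ts", TimestampType)

        // Read a stream of JSON events from Kafka (topic and broker are placeholders)
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "clickstream-events")
          .load()
          .select(from_json(col("value").cast("string"), schema).as("e"))
          .select("e.*")

        // Land the parsed events on S3 as Parquet, partitioned by event type
        events.writeStream
          .format("parquet")
          .option("path", "s3://example-bucket/clickstream/") // placeholder bucket
          .option("checkpointLocation", "s3://example-bucket/checkpoints/clickstream/")
          .partitionBy("eventType")
          .start()
          .awaitTermination()
      }
    }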

Examples of project pipelines:

• Spark pipeline to cleanse a dataset for further analysis by ML algorithms owned by other teams. Deployed on AWS EMR using CDK. Scheduled daily to process ~1 TB of data
• Workflow to pull pre-processed data from a Teradata database, hash sensitive fields, and store the partitioned data on S3 in Parquet format (sketched below). Scheduled daily on an on-prem server as an Oozie workflow
• Hourly Spark pipeline to normalize a highly dynamic JSON dataset so that it satisfies SQL standards. Deployed on AWS EMR using CDK
• Daily Spark pipeline to ingest data into a Redshift database for BA analysis. The dataset is extremely dynamic (from 1k to 7k columns per day), which won’t fit a classic SQL database by default
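
As a rough illustration of the second pipeline above, here is a minimal Spark batch sketch, assuming the pre-processed Teradata extract has already been staged on S3; the column names, paths, and partition key are invented for illustration.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object HashAndPartition {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hash-and-partition") // hypothetical job name
          .getOrCreate()

        // Assume the pre-processed Teradata extract is staged as Parquet;
        // the path is a placeholder for illustration only.
        val input = spark.read.parquet("s3://example-bucket/teradata-extract/")

        // Hash the sensitive columns (names are invented) with SHA-256
        val hashed = input
          .withColumn("customer_id", sha2(col("customer_id"), 256))
          .withColumn("email", sha2(col("email"), 256))

        // Write back to S3 as Parquet, partitioned by load date
        hashed.write
          .mode("overwrite")
          .partitionBy("load_date")
          .parquet("s3://example-bucket/curated/customers/")

        spark.stop()
      }
    }

In the real workflow, a step like this would be orchestrated by Oozie on-premises or run as an EMR step provisioned with CDK, as in the examples above.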

Responsibilities


    • Gather requirements from users (Data Analytics); design and implement ETL pipelines from scratch
    • Support existing workflows and migrate them from on-premises to AWS
    • Apply Data Lake and Data Warehouse concepts
    • Analyze business requirements and estimate tasks
    • Refactor the existing application; fix defects
    • Design and implement ETL pipelines from scratch on distributed systems
    • Support PROD releases; write release documentation
    • Automate releases using AWS CDK and CloudFormation templates

Skills


Must have

    5+ years' experience in Java/Scala software development
    2+ years' experience with AWS

    Tech stack
    • Programming languages: Scala, Java
    • Hadoop ecosystem: Spark, Kafka, YARN, HDFS, Oozie (Spark is the basis; you have to know Scala)
    • AWS: EMR, S3, Redshift, Glue, Athena, CloudFormation, CDK, CloudWatch, Secrets Manager
    • non-AWS: Couchbase, Elasticsearch
    • Data formats: Parquet, Avro, JSON

Nice to have

    • Scripting languages: Python, Groovy, JavaScript
    • AWS: Redshift Spectrum, Step Functions, Lambda, SNS, SQS
    • non-AWS: Snowflake, NiFi

Languages


English: B2 Upper Intermediate

Relocation package


If needed, we can help you with the relocation process.

Work Type


Java

Ref Number


VR-56501
