We’re looking for a Data Engineer who will work on collecting, storing, processing, and analyzing huge data sets. The primary focus will be on choosing optimal solutions for these purposes, then implementing, maintaining, and monitoring them. You will also be responsible for integrating them with the architecture used across the company.
In a typical week as a Data Engineer at Relevon you’ll:
- Select and integrate any Big Data tools and frameworks required to provide requested capabilities.
- Build the infrastructure required for optimal ETL of data from various data sources.
- Monitor performance and advise on any necessary infrastructure changes.
- Create and maintain optimal data pipeline architecture, and define data retention policies.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, designing infrastructure for greater scalability.
- Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
- Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
- Keep our data separated and secure across multiple data centers.
- Create data tools for analytics and data science team members that assist them in building and optimizing our product into an innovative industry leader.
What we’re looking for:
Don’t worry if you don’t meet 100% of these requirements. Your desire to learn and your ability to work in a forward-thinking, collaborative environment are just as important to us.
- 3+ years of experience as a Data Engineer. Graduate degree in Computer Science, Statistics, Informatics or another quantitative field is preferred.
- Experience supporting and working with cross-functional teams in a dynamic environment, remote or in-office.
- Experience with data warehousing and cloud data solutions such as Google BigQuery and AWS services (EC2, EMR, RDS, Redshift).
- Experience with relational SQL and NoSQL databases, such as Postgres, MongoDB, or Cassandra.
- Ability to troubleshoot and resolve ongoing issues with cluster operations.
- Proficiency with Hadoop v2, MapReduce, and HDFS.
- Experience with big data tools: Spark, Kafka, Hadoop, Pig, Hive, Impala etc.
- Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
- Experience with integrating data from multiple data sources.
- Knowledge of various ETL techniques and frameworks, such as Flume.
- Experience with object-oriented or functional scripting languages (Python, Java, C++) is a plus.
- Excellent problem solving and analytical skills.