Site Reliability Engineer (SRE) Job at Openkyber, California

  • Openkyber
  • California

Job Description

Overview:

Dataflix is seeking a highly experienced Senior or Lead Platform Engineer/Site Reliability Engineer (SRE)/Hadoop Admin to manage and enhance our petabyte-scale, on-premises data platform, which is built on the open-source Hadoop ecosystem. The ideal candidate brings deep technical expertise, a strong understanding of distributed systems, and extensive experience operating and optimizing large-scale data infrastructure.

Responsibilities:
  • Own and operate the end-to-end infrastructure of a large-scale, on-prem Hadoop-based data platform, ensuring high availability and reliability.
  • Design, implement, and maintain core platform components, including Hadoop, Hive, Spark, NiFi, Iceberg, ELK, OpenSearch, and Ambari.
  • Automate infrastructure management, monitoring, and deployments using CI/CD pipelines (GitLab) and scripting.
  • Implement and enforce security controls, access management, and compliance standards.
  • Perform system upgrades, patching, performance tuning, and troubleshooting across platform components.
  • Optimize observability and telemetry using tools like Prometheus, Grafana, and OpenTelemetry for real-time performance monitoring and alerting.
  • Proactively monitor system health, resolve incidents, and conduct root-cause analyses to prevent recurrence.
  • Collaborate with data engineering, analytics, and infrastructure teams to align platform capabilities with evolving needs.

Requirements:
  • 10+ years of experience in Platform Engineering, Site Reliability Engineering, or similar roles, with proven success managing large-scale, distributed Hadoop infrastructure.
  • Deep expertise in the Hadoop ecosystem, including HDFS, YARN, Hive, Spark, NiFi, Ambari, and Iceberg.
  • Strong Linux system administration skills (CentOS/Rocky preferred), including system tuning, performance optimization, and troubleshooting.
  • Proficiency in containerization and orchestration using Docker and Kubernetes.
  • Solid experience with automation and Infrastructure as Code, using tools like GitLab CI/CD and scripting in Python and Bash.
  • Practical knowledge of monitoring and observability tools (e.g., Prometheus, Grafana, OpenTelemetry) and understanding of system health, alerting, and telemetry.
  • Familiarity with networking concepts, security protocols, and data compliance requirements.
  • Experience managing petabyte-scale data platforms and implementing disaster recovery strategies.
  • Understanding of data governance, metadata management, and operational best practices.
