TL;DR

Senior Site Reliability Engineer (Starlink): upgrading distributed systems for sharding and geo-redundancy and advancing deployment, monitoring, and alerting infrastructure in a multi-region environment, with an emphasis on managing petabyte-scale bare-metal compute clusters. The role focuses on improving system performance, scalability, and maintainability across the software development lifecycle for critical missions.

Location: Onsite in Hawthorne, CA, USA. Must be a U.S. citizen, U.S. lawful permanent resident, Refugee, or Asylee to conform to U.S. Government export regulations.

Salary: $160,000 - $220,000 per year

Company

SpaceX is developing advanced technologies like Starlink, the world's largest satellite constellation, to enable humanity's multi-planetary future.

What you will do

Upgrade existing distributed systems to be sharded and geo-redundant across multiple data centers.
Advance existing deployment, monitoring, and alerting infrastructure to support a multi-region environment.
Manage petabyte-scale bare metal compute clusters.
Closely collaborate with engineers across all programs to create highly operable, scalable, and maintainable products.
Engage with services throughout their entire software development lifecycle.
Identify performance bottlenecks and apply performance improvement techniques.

Requirements

Bachelor's degree in computer science, engineering, math, or a scientific discipline and 5 years of software development experience; OR, in lieu of a degree, 7+ years of professional experience building software in site reliability or DevOps roles.
Experience with Linux operating systems.
Active Top Secret or TS/SCI clearance is required.
Must be a U.S. citizen, U.S. lawful permanent resident, Refugee, or Asylee to conform to U.S. Government export regulations.

Nice to have

5+ years of rigorous experience with site reliability or DevOps.
Experience with Kubernetes and Istio for on-premises deployment.
Experience with in-stream data processing and analytics using open-source platforms such as Apache Kafka, Spark, HBase, HDFS, and Flink.
Experience troubleshooting hardware and network-layer issues.
Programming experience in Python, C#, Java, Scala, or Go.
Good understanding of version control, testing, continuous integration, build, deployment, and monitoring.

Culture & Benefits

Opportunity to make an impact on a truly inspiring mission (Starlink).
Full ownership of challenging problems, working with enthusiastic engineers.
Access to comprehensive medical, vision, and dental coverage.
Access to a 401(k) retirement plan and various other discounts and perks.
Paid parental leave, short- and long-term disability insurance, and life insurance.
3 weeks of paid vacation and 10 or more paid holidays per year.
Potential eligibility for long-term incentives (company stock, options, cash awards).

Sr. Site Reliability Engineer - Top Secret Clearance (Starlink)

Job description