TL;DR

Senior Site Reliability Engineer (Starlink): Upgrading distributed systems for sharding and geo-redundancy, and advancing deployment, monitoring, and alerting infrastructure in a multi-region environment, with an emphasis on managing petabyte-scale bare metal compute clusters. Focus on improving system performance, scalability, and maintainability across the software development lifecycle for critical missions.

Location: Onsite in Hawthorne, CA, USA. Must be a U.S. citizen, U.S. lawful permanent resident, Refugee, or Asylee to conform to U.S. Government export regulations.

Salary: $160,000 - $220,000 per year

Company

SpaceX is developing advanced technologies like Starlink, the world's largest satellite constellation, to enable humanity's multi-planetary future.

What you will do

  • Upgrade existing distributed systems to be sharded and geo-redundant across multiple data centers.
  • Advance existing deployment, monitoring, and alerting infrastructure to support a multi-region environment.
  • Manage petabyte-scale bare metal compute clusters.
  • Closely collaborate with engineers across all programs to create highly operable, scalable, and maintainable products.
  • Engage throughout the whole software development lifecycle of services.
  • Focus on performance bottlenecks and performance improvement techniques.

Requirements

  • Bachelor's degree in computer science, engineering, math, or a scientific discipline and 5 years of software development experience; OR 7+ years of professional experience building software in site reliability or DevOps in lieu of a degree.
  • Experience with Linux operating systems.
  • Active Top Secret or TS/SCI clearance is required.
  • Must be a U.S. citizen, U.S. lawful permanent resident, Refugee, or Asylee to conform to U.S. Government export regulations.

Nice to have

  • 5+ years of rigorous experience with site reliability or DevOps.
  • Experience with Kubernetes and Istio for on-premises deployment.
  • Experience with in-stream data processing and analytics using open-source platforms such as Apache Kafka, Spark, HBase, HDFS, and Flink.
  • Experience troubleshooting hardware and network-layer issues.
  • Programming experience in Python, C#, Java, Scala, or Go.
  • Good understanding of version control, testing, continuous integration, build, deployment, and monitoring.

Culture & Benefits

  • Opportunity to make an impact on a truly inspiring mission (Starlink).
  • Full ownership of challenging problems, working with enthusiastic engineers.
  • Access to comprehensive medical, vision, and dental coverage.
  • Access to a 401(k) retirement plan and various other discounts and perks.
  • Paid parental leave, short- and long-term disability insurance, and life insurance.
  • 3 weeks of paid vacation and 10 or more paid holidays per year.
  • Potential eligibility for long-term incentives (company stock, options, cash awards).