TL;DR
Senior Site Reliability Engineer (Starlink): Upgrading distributed systems for sharding and geo-redundancy, and advancing deployment, monitoring, and alerting infrastructure in a multi-region environment, with an emphasis on managing petabyte-scale bare metal compute clusters. Focus on improving system performance, scalability, and maintainability across the software development lifecycle for critical missions.
Location: Onsite in Hawthorne, CA, USA. Must be a U.S. citizen, U.S. lawful permanent resident, Refugee, or Asylee to conform to U.S. Government export regulations.
Salary: $160,000 - $220,000 per year
Company
SpaceX is developing advanced technologies like Starlink, the world's largest satellite constellation, to enable humanity's multi-planetary future.
What you will do
- Upgrade existing distributed systems to become sharded and geo-redundant in multiple data centers.
- Advance existing deployment, monitoring, and alerting infrastructure to support a multi-region environment.
- Manage petabyte-scale bare metal compute clusters.
- Closely collaborate with engineers across all programs to create highly operable, scalable, and maintainable products.
- Engage throughout the whole software development lifecycle of services.
- Identify performance bottlenecks and apply performance improvement techniques.
Requirements
- Bachelor's degree in computer science, engineering, math, or a scientific discipline and 5 years of software development experience; OR, in lieu of a degree, 7+ years of professional experience building software in site reliability or DevOps.
- Experience with Linux operating systems.
- Active Top Secret or TS/SCI clearance is required.
- Must be a U.S. citizen, U.S. lawful permanent resident, Refugee, or Asylee to conform to U.S. Government export regulations.
Nice to have
- 5+ years of rigorous experience with site reliability or DevOps.
- Experience with Kubernetes and Istio for on-premises deployment.
- Experience with in-stream data processing and analytics using open-source platforms such as Apache Kafka, Spark, HBase, HDFS, and Flink.
- Experience troubleshooting hardware and network-layer issues.
- Programming experience in Python, C#, Java, Scala, or Go.
- Good understanding of version control, testing, continuous integration, build, deployment, and monitoring.
Culture & Benefits
- Opportunity to make an impact on a truly inspiring mission (Starlink).
- Full ownership of challenging problems, working with enthusiastic engineers.
- Access to comprehensive medical, vision, and dental coverage.
- Access to a 401(k) retirement plan and various other discounts and perks.
- Paid parental leave, short- and long-term disability insurance, and life insurance.
- 3 weeks of paid vacation and 10 or more paid holidays per year.
- Potential eligibility for long-term incentives (company stock, options, cash awards).
