Site Reliability Engineer, Entry Level

Job Description

Job Summary:

HappyRobot is a voice AI tool that automates phone operations for the logistics and fleet management sectors. The company is seeking a Site Reliability Engineer to lead the scaling of operational resilience, ensuring system stability and observability while improving developer focus and system uptime.

Responsibilities:

• Own the stability, observability, and debugging workflows that keep our systems running smoothly.

• Be the go-to person for untangling complex failures in real time.

• Design tools that turn chaos into clarity.

• Help shift from reactive to proactive operations.

• Reduce incident load, build internal tooling, and directly improve developer focus and system uptime.

Qualifications:

Required:

• 1+ years of hands-on experience debugging production systems (logs, traces, incidents, etc.)

• Strong problem-solving skills and ability to dive into unfamiliar backend codebases

• Comfort with Python and Go for reading code and writing small tools/utilities

• Familiarity with observability and monitoring tools (e.g., Datadog, Prometheus, Sentry)

• Clear, calm communication under pressure — especially during live incidents

Preferred:

• Experience working with distributed systems or services at scale

• Experience building or maintaining internal tooling for on-call teams or reliability workflows

• Familiarity with deployment pipelines, CI/CD, or infra-as-code

• Experience improving system observability (e.g., custom metrics, traces, log pipelines)

Company:

HappyRobot is a voice AI tool that automates phone operations for the logistics and fleet management sectors. Founded in 2022, the company is headquartered in San Francisco, California, USA, and has a team of 11-50 employees. It is currently an early-stage company. HappyRobot has a track record of offering H1B sponsorships.
