Lead Site Reliability Engineer

Dublin, OH 43017

You have a life. We like that about you.

At OCLC, we believe you'll do the best work of your life when you're living the best life possible.

We work hard to build the technology that connects thousands of today's libraries. But we also work hard to make a job at OCLC a meaningful part of a balanced life, not a substitute for one.

The Job Details are as follows:

Discover. Innovate. Collaborate. Inform. A few words we use to describe a career at OCLC.

Technology with a Purpose. OCLC supports thousands of libraries in making information more accessible and more useful to people around the world. OCLC provides shared technology services, original research and community programs that help libraries meet the ever-evolving needs of their users, institutions and communities. With office locations around the globe, OCLC employees are dedicated to offering premier services and software to help libraries cut costs while keeping pace with the demands of our information-driven society.

As a Lead Site Reliability Engineer (SRE), you will help build a meaningful engineering discipline that combines software and systems engineering to develop creative solutions to operations problems and to build and run large-scale, distributed, fault-tolerant systems. SRE ensures that OCLC's internally critical systems have reliability and uptime appropriate to users' needs while continuously monitoring capacity and performance. SRE is a mindset and a set of engineering approaches focused on optimizing existing systems, building infrastructure, and eliminating work through automation. You'll join a team of problem solvers with a diverse set of perspectives that is focused on ensuring a consistent environment and supporting the day-to-day operations of a global search platform.


Responsibilities:

  • Design, code, test and deliver software to automate manual operational work.
  • Develop and support Java applications, streaming applications (Spark Streaming and Kafka), and SQL and NoSQL databases (specifically HBase).
  • Migrate applications and systems to internal and external clouds.
  • Troubleshoot priority incidents, facilitate post-mortems and ensure permanent closure of incidents.
  • Engage with other development teams throughout the life cycle to help develop software for reliability and scale, ensuring minimal refactoring or changes.
  • Identify application patterns and analytics in support of better service level objectives.
  • Design self-healing and resiliency patterns.
  • Design automated software and product upgrades, change management, and release management solutions.
  • Promote and implement pipeline orchestration for continuous delivery of applications.
  • Participate in 24x7 support coverage as needed.


Qualifications:

  • 8+ years of experience in software or systems engineering.
  • 8+ years of DevOps experience.
  • Mastery of designing, coding, testing and delivering software in Java.
  • Experience monitoring, supporting and tuning a production application stack.
  • Experience with the Hadoop ecosystem and associated components (MapReduce, HBase, ZooKeeper, etc.).
  • Experience with scripting and automation frameworks.
  • Dedication to supporting full-stack solutions, including applications, servers, networks, data pipelines and data platforms.
  • Excellent troubleshooting skills.
  • Ability to demonstrate an objective, data-driven approach to problem-solving.
  • Excellent collaboration and communication skills.
  • Experience working across silos in a change-controlled environment.
  • Experience with cloud hosting technologies (e.g., AWS, Azure, Google).
  • Experience with containerization platforms (e.g., Docker, Kubernetes).

Other traits:

  • Excitement at being assigned tasks or projects that you have no idea how to do
  • A "No, I have never touched it before, but give me a chance and I will figure it out!" attitude
  • Driven to learn and try new things
  • An analytical, creative, and innovative approach to solving problems
  • An interest in working hard and being challenged in a fast-paced environment, and having fun while doing it

Posted: 2020-04-27 Expires: 2020-05-26