We are looking for a Site Reliability Engineer to develop automated techniques that harden our services, improve system uptime, and support our rapid-deployment model. This is a multi-hat role that requires both strong diagnostic skills and robust coding ability. We design, develop, and maintain a repository of infrastructure software that lays the foundation for our engineers to iterate rapidly on development, automate system deployment, and monitor system status, and that helps consumers of our services deploy their analytics to our platform. Additionally, we diagnose, track, and resolve operational issues as they arise to keep the system stable and available.
The ideal candidate has a solid foundation of programming experience in JVM languages (Kotlin/Java), Python, or Go, fairly extensive Linux experience, and familiarity with Docker, PKI-based security, and networking. Any combination of experience with container orchestration platforms (Mesos/Marathon, Kubernetes), databases (SQL, NoSQL), web services, monitoring solutions (ELK, TICK), the Hadoop ecosystem (HDFS, Accumulo, ZooKeeper), NiFi, configuration management (Salt/Puppet), and/or experience as a technical lead for a small team would be a significant asset.
A Bachelor’s degree in Computer Science or a related discipline from an accredited college or university is required. Four (4) years of software engineering (SWE) experience on projects with similar software processes may be substituted for a bachelor’s degree.
- Analyze user requirements to derive software design and performance requirements
- Design and code new software or modify existing software to add new features
- Debug existing software and correct defects
- Integrate existing software into new or modified systems or operating environments
- Develop simple data queries for existing or proposed databases or data repositories
- Provide recommendations for improving documentation and software development process standards