Apple’s Applied Machine Learning team has built systems for a number of large-scale data science applications. We work on many high-impact projects that serve various Apple lines of business. We use the latest open source technology, and as committers on some of these projects, we help push its boundaries. Working with multiple lines of business, we handle many streams of Apple-scale data. We bring it all together and extract the value. We do all this with an exceptional group of software engineers, data scientists, SRE/DevOps engineers, and managers.
12+ years of overall experience, including at least 5 years of management experience leading a team of engineers
5+ years of experience providing reliability engineering for distributed applications and technologies such as Cassandra, Solr, and Kafka
Hands-on manager who likes fixing complex performance and scale problems
Excellent problem-solving, critical-thinking, and communication skills; leads by example to motivate and challenge the team to deliver their best
Strong experience leading multi-functional initiatives and providing thought leadership
Able to zoom in and zoom out to resolve ambiguity and set a clear path forward
Passion for automation, demonstrated by building tools in Python, Java, or other JVM languages
Strong expertise in solving complex production issues
Should be adept at prioritizing multiple issues in a high-pressure environment
Should be able to understand complex architectures and be comfortable working with different teams
Ability to conduct performance analysis on, and fix issues in, large-scale distributed systems
Should be highly proactive, with a keen focus on improving the uptime and availability of our mission-critical services
Comfortable working in a fast-paced environment while continuously evaluating emerging technologies
The position requires solid knowledge of secure coding practices and experience with open source technologies
We are seeking a hands-on manager who has end-to-end experience managing distributed applications and their underlying data layers. You are responsible for the availability, security, and reliability of many key applications in the AML portfolio. You have grown into leadership roles after proving your technical skills in individual contributor roles, but you still enjoy hands-on work when the situation calls for it. You have designed and built large-scale application environments for availability, security, and reliability, serving hundreds of millions of requests per day and billions of database calls per day. You keep yourself informed about the choices and trade-offs as new technology evolves in the data and infrastructure landscape. You have an eye for talent, and you hire and grow your engineers by mentoring and challenging them. You will collaborate across many teams to deliver on projects and provide SRE support for the reliability of these managed services.
You will have significant opportunity to influence and shape our platform and infrastructure strategy as we work on the next generation of our architecture, platform and processes.
Education & Experience
• B.Tech/BE degree or equivalent from a reputed college
Experience running infrastructure on AWS and Kubernetes
Experience building and operating large-scale Hadoop/Spark/Kafka data infrastructure used for machine learning in a production environment
Experience tuning complex Hive and Spark queries
Expertise in debugging Hadoop/Spark/Hive issues using NameNode, DataNode, NodeManager, and Spark executor logs
Experience in capacity management on multi-tenant Hadoop clusters
Experience in workflow and data pipeline orchestration (Airflow, Oozie, Jenkins, etc.)
Experience with Jupyter-based notebook infrastructure