The Human Masterminds Behind AI at AppDynamics

Summary
What is the critical factor needed to successfully apply machine learning and artificial intelligence? Human intelligence. Meet some key members of the data science team at AppDynamics who are building the next generation of AI-powered solutions.

A renowned data scientist at Bell Laboratories, Tian Bu was ready for a new challenge in early 2015. But of all the places he imagined himself working, Cisco wasn’t on the list. Bu thought of Cisco as a hardware company whose business appeared to lack the very thing that mattered most to him: compelling problems that could be solved through a deep understanding of data. However, at the urging of a friend, Bu agreed to take a closer look.

What he found surprised and intrigued him. Earlier that year, Cisco had begun talking up a more software-centric approach with the announcement of the Cisco ONE software licensing program. But there was a great deal more to the new software-centric strategy than what had been publicly announced. Cisco was planning to disrupt the market and itself with a highly secure, intelligent networking platform designed to continually learn, adapt, automate, and protect. Such a platform would depend on machine learning and artificial intelligence. Cisco was offering Bu an opportunity he had been preparing for his entire career.

Bu had joined the Labs in 2002 as a member of the technical staff after distinguishing himself as a Ph.D. student at the University of Massachusetts, Amherst. With the support of DARPA and in collaboration with the Lawrence Berkeley National Laboratory, he had applied the same tomographic techniques used in medical imaging to the Internet, creating algorithms for predicting bottlenecks and other issues. A paper he co-authored on the project, “Network Tomography on General Topologies,” was published in the Proceedings of the 2002 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems and recognized ten years later with a “Test of Time” award.

In 2007, Bell Labs Ventures approached Bu about creating an internal startup to commercialize his research on analyzing and optimizing wireless networks. Within 18 months, the technology was deployed in several Tier One networks. Momentum continued to build, and the startup was acquired by Alcatel-Lucent’s network intelligence business unit in 2010. In 2012, the Labs lured Bu back with the promise of applied research. For nearly three more years, he delved into questions about wireless networking and data monetization.

Joining Cisco would represent a radical change. If Cisco succeeded in its transformation, Bu would be at the forefront of figuring out how to automate IT and design genuinely self-healing systems. Not all the pieces were in place, but neither Cisco nor Bu could afford to wait. He decided to take a leap of faith and begin building a team.

His first hire was Anne Sauve, an expert in forecasting with a Ph.D. in electrical engineering. Sauve had a unique background, which Bu believed would be useful in finding insights into the millions of metrics per second that were streaming in from modern IT systems. During her doctoral studies at the University of Michigan, Sauve had specialized in statistical signal processing. Since then she had built up six years of experience in bioinformatics and genomics and nine years in medical imaging and 3D modeling. Her last job before joining Cisco was at a startup, where she developed a churn predictor for customer renewals and natural language processing algorithms to derive insights from customer tickets.

“What I liked about Cisco was its culture of rigorous engineering and the fact that it is grounded in reality,” she said. As Sauve dove into her work, producing a time series clustering algorithm to help determine the root cause of performance issues from streaming data and a new ensemble approach to forecasting, a second data scientist named Jiabin Zhao joined the group. An internal transfer from Cisco, Zhao brought more than a decade of experience working with IT data.

When Cisco acquired AppDynamics and Perspica in 2017, the size of the team more than doubled. AppDynamics had two seasoned data scientists: Yuchen Zhao and Yi Hong. Both had worked for several years applying machine learning to the root cause analysis of problems affecting application performance. Their work included the algorithms that allowed customers to search for the relevant fields that were causing a business transaction to slow down. In addition, Yuchen Zhao had shared two patents with Arjun Iyer, the senior engineering director, on automating log analysis and anomaly detection.

While AppDynamics’ strength lay in surfacing insights from stored data, Perspica applied machine learning and artificial intelligence to massive amounts of streaming data. Its cloud-based analysis engine could ingest and process millions of data points in real time. It offered the ability to automate threshold management and root cause analysis (RCA) and to predict problems at scale, complementing AppDynamics’ approach to those problems. While the pieces would have to be integrated, together they represented an extremely powerful AI solution.

From Bu’s point of view, the influx of talent from AppD and Perspica was as important as the technology. J.F. Huard, Perspica’s founder, now CTO of Data Science at AppDynamics, and Philip Labo, Perspica’s principal data scientist, were particularly strong additions to the team. Like Bu, Huard had spent time at Bell Labs, in his case in the early 1990s while simultaneously earning a doctorate at Columbia University. His research focus in those days was expert systems for network management. After graduating, he pioneered the application of advanced math to provide QoS in programmable networks at a company he co-founded called Xbind. He subsequently started three more companies, including one that managed dynamic resource allocation based on game theory and another that focused on predictive analytics. Perspica was Huard’s fifth company.

Years of experience had brought Bu and Huard to the same conclusion: progress in machine learning and AI came from applying the right solution. It was insight and experience that distinguished one data scientist from another.

Labo was a post-doc at Stanford University when he met Huard to interview for a job at Perspica. He remembered how Huard had enthusiastically described a problem and then asked him to solve it. “I was thinking of elaborate solutions based on my work at Stanford,” Labo recalled. “JF was like, ‘No! Principal Component Analysis.’” PCA was a statistical procedure invented in 1901, and Labo was initially unimpressed. But as he thought about it more, he realized PCA represented an elegant and simple solution to the problem Huard had posed.

Labo was drawn to the opportunity to put his background in applied math to work solving real-world problems for customers. In graduate school he had developed expertise in real-time multivariate analysis. Though the focus of his work was change point detection in yeast population evolution, the underlying ideas were curiously applicable to multivariate anomaly detection in computer data. “There’s something really funny about math in general and applied math in particular,” he said. “It just kind of works in a lot of different situations.”

Bu said Labo’s training has indeed been useful as the team has doubled down on multivariate anomaly detection. Overall, the diversity of backgrounds and depth of experience ensure that AppD will not blindly apply AI, but will choose the most appropriate solutions: ones that are both high quality and efficient to implement.

Given an industry shortage of senior data scientists, Bu said he feels particularly lucky to have a team that has spent years applying machine learning and AI to the entire stack—from applications to the network and beyond. “The strength of the team is that we are not just data scientists who know our math, we are also very familiar with the IT analytics domain,” he said.

The automation of IT at AppDynamics and Cisco is well on its way, Bu noted, with the right people applying the right solutions to important industry problems. For now, the team is focused on time series analysis, classification, and clustering. AppDynamics will be talking more in the near future about how customers can leverage their progress to spot problems sooner, find the root cause faster, and reduce system downtime.

Until then? “We are full speed ahead,” Bu said.