DevOps Scares Me – Part 2

What is DevOps?

In the first post of this series, my colleague Jim Hirschauer defined what DevOps means and how it impacts organizations. He concluded by defining DevOps as “A software development method that stresses communication, collaboration and integration between software developers and information technology professionals with the goal of automating as much as possible different operational processes.”

I like to think DevOps can be explained simply as operations working together with engineers to get things done faster in an automated and repeatable way.

DevOps Cycle

From developer to operations – one and the same?

As a developer I have always dabbled lightly in operations. I always wanted to focus on making my code great and let an operations team worry about setting up the production infrastructure. It used to be easy! I could just FTP my files to production, and voila! My app was live and it was time for a beer. Real applications, though, are much more complex. As my skillset evolved I started to do more and expand my operations knowledge.

Infrastructure Automation

When I was young you actually had to build a server from scratch, buy power and connectivity in a data center, and manually plug a machine into the network. After wearing the operations hat for a few years, I have learned that many operations tasks are mundane, manual, and often have to be done at two in the morning after something has gone wrong. DevOps is predicated on the idea that all elements of technology infrastructure can be controlled through code. With the rise of the cloud, it can all be done in real time via a web service.

Growing pains

When you are responsible for large distributed applications the operations complexity grows quickly.

  • How do you provision virtual machines?
  • How do you configure network devices and servers?
  • How do you deploy applications?
  • How do you collect and aggregate logs?
  • How do you monitor services?
  • How do you monitor network performance?
  • How do you monitor application performance?
  • How do you alert and remediate when there are problems?

Combining the power of developers and operations

The focus on developer/operations collaboration enables a new approach to managing the complexity of real-world operations. I believe operations complexity breaks down into a few main categories: infrastructure automation, configuration management, deployment automation, log management, performance management, and monitoring. Below are some tools I have used to help solve these tasks.

Infrastructure Automation

Infrastructure automation solves the problem of having to be physically present in a data center to provision hardware and make network changes. The benefit of using cloud services is that costs scale with demand and you can provision automatically as needed, without having to pay for hardware up front.

Configuration Management

Configuration management solves the problem of having to manually install and configure packages once the hardware is in place. The benefit of using configuration automation solutions is that servers are deployed exactly the same way every time. If you need to make a change across ten thousand servers you only need to make the change in one place.

There are other vendor-specific DevOps tools as well.

Deployment Automation

Deployment automation solves the problem of deploying an application with an automated and repeatable process.

Log Management

Log management solves the problem of aggregating, storing, and analyzing all logs in one place.

Performance Management

Performance management is about ensuring your network and application are performing as expected and providing intelligence when you encounter problems.

Monitoring

Monitoring and alerting are a crucial piece of managing operations, making sure people are notified when infrastructure and related services go down.

In my next post of this series we will dive into each of these categories and explore the best tools available in the DevOps space. As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Click here for DevOps Scares Me Part 3.

DevOps Scares Me – Part 1

DevOps is scary stuff for us pure Ops folks who thought we left coding behind a long, long time ago. Most of us Ops people can hack out some basic (or maybe even advanced) shell scripts in Perl, ksh, bash, csh, etc… But the term DevOps alone makes me cringe and think I might really need to know how to write code for real (which I don’t enjoy; that’s why I’m an ops guy in the first place).

So here’s my plan. I’m going to do a bunch of research, play with relevant tools (what fun is IT without tools?), and document everything I discover here in a series of blog posts. My goal is to educate myself and others so that we operations types can get more comfortable with DevOps. By breaking down this concept and figuring out what it really means, hopefully we can figure out how to transition pure Ops guys into this new IT management paradigm.

What is DevOps

Here we go: I’m probably about to open Pandora’s box by trying to define what DevOps means, but to me that is the foundation of everything else I will discuss in this series. I started my research by asking Google “what is devops”. Naturally, Wikipedia was the first result, so that is where we will begin. The first sentence on Wikipedia defines DevOps as “a software development method that stresses communication, collaboration and integration between software developers and information technology (IT) professionals.” Hmmm… This is not a great start for us Ops folks who don’t really want anything to do with programming.

Reading further down the page I see something more interesting to me… “The goal is to automate as much as possible different operational processes.” Now that is an idea I can stand behind. I have always been a fan of automating whatever repetitive processes I can (usually by way of shell scripts).

Taking Comfort

My next stop on this DevOps train led me to a very interesting blog post by the folks at the agile admin. In it they discuss the definition and history of DevOps. Here are some of the nuggets that were of particular interest to me:

  • “Effectively, you can define DevOps as system administrators participating in an agile development process alongside developers and using many of the same agile techniques for their systems work.”
  • “It’s a misconception that DevOps is coming from the development side of the house – DevOps, and its antecedents in agile operations, are largely being initiated out of operations teams.”
  • “The point is that all the participants in creating a product or system should collaborate from the beginning – business folks of various stripes, developers of various stripes, and operations folks of various stripes, and all this includes security, network, and whoever else.”

Wow, that’s a lot more comforting to my fragile psyche. The idea that DevOps is being largely initiated out of the operations side of the house makes me feel like I misunderstood the whole concept right from the start.

For even more perspective I read a great article on O’Reilly Radar from Mike Loukides. In it he explains the origins of dev and ops and shows how operations has been changing over the years to include much more automation of tasks and configurations. He also explains that there is no expectation of all-knowing developer/operations superhumans; instead, operations staff need to work closely with, or even be in the same group as, the development team.

When it comes right down to it there are developers and there are operations staff. The two groups have worked too far apart for far too long. The DevOps movement is an attempt to bring these worlds together so that they can achieve the effectiveness and efficiency that the business deserves. I really do feel a lot better about DevOps now that I have done more research into the basic meaning and I hope this helps some of you who were feeling intimidated like I was. In my next post I plan to break down common operations tasks and talk about the tools that are available to help automate those tasks and their associated processes.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Click here for DevOps Scares Me Part 2.

AppDynamics & Splunk – Better Together

A few months ago I saw an interesting partnership announcement from Foursquare and OpenTable. Users can now make OpenTable reservations at participating restaurants directly within the Foursquare mobile app. My first thought was, “What the hell took you guys so long?” That integration makes sense on so many levels, I’m surprised it hadn’t already been done.

So when AppDynamics recently announced a partnership with Splunk, I viewed that as another no-brainer.  Two companies with complementary solutions making it easier for customers to use their products together – makes sense right?  It does to me, and I’m not alone.

I’ve been demoing a prototype of the integration for a few months now at different events across the country, and at the conclusion of each walk-through I’d get some variation of the same question, “How do I get my hands on this?”  Well, I’m glad to say the wait is over – the integration is available today as an App download on Splunkbase.  You’ll need a Splunk and AppDynamics license to get started – if you don’t already have one, you can sign up for free trials of Splunk and AppDynamics online.

APM for the Non-Java Guru: What leak?

Memory, Memory, Memory…

Memory, and in particular the management of memory, is a critical part of Java. As a developer, memory management is not something you want to be doing on a regular basis, nor is it something you want to do manually. One of the great benefits of Java is its ability to take care of the memory model for you: when objects are no longer in use, the garbage collector cleans them up.
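Even with garbage collection, a leak can still happen when objects stay reachable forever. A minimal sketch of the classic pattern, a static cache that is never evicted (the `LeakyCache` class and its sizes are purely illustrative, not from any real application):

```java
import java.util.HashMap;
import java.util.Map;

public class LeakyCache {
    // A static map holds strong references for the life of the JVM,
    // so the garbage collector can never reclaim its entries.
    private static final Map<String, byte[]> CACHE = new HashMap<>();

    public static void remember(String key) {
        // Each new key pins another 1 MB in memory; nothing ever evicts it.
        CACHE.put(key, new byte[1024 * 1024]);
    }

    public static int entries() {
        return CACHE.size();
    }
}
```

Because every array stays strongly reachable from the static map, the collector sees the entries as live forever; the usual fixes are a bounded cache with an eviction policy, or weak references (e.g. `WeakHashMap`).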

What’s Under Your Hood? APM for the Non Java Guru: ORMs & Slow SQL

It’s time for another update to our series on Top Application Performance Challenges. Last time I looked at Java’s synchronization mechanism as a source for performance problems. This time around I take on what is likely the Performance Engineer’s bread and butter … slow database access!

Behind this small statement lies a tricky and multifaceted discussion. For now, I’m going to focus on just one particular aspect – the Object Relational Mapper (ORM).

The ORM has become a method of choice for bringing together the two foundational technologies that we base business applications on today: object-oriented applications (Java, .NET) and relational databases (Oracle, MySQL, PostgreSQL, etc.). For many developers, this technology can seem like a godsend, eliminating the need to drill down into the intricacies of how these two technologies interact. But at the same time, ORMs can place an additional burden on applications, significantly impacting performance while everything looks fine on the surface.

Here’s my two cents on ORMs and why developers should take a longer look under the hood:

In the majority of cases, the time and resources taken to retrieve data are orders of magnitude greater than what’s required to process it. It is no surprise that performance considerations should always include the means and ways of accessing and storing data.

I already mentioned the two major technology foundations on which we build business applications today: object-oriented applications, used to model and execute the business logic, and relational databases, used to manage and store the data. Object-oriented programs unite data and logic into object instances; relational databases, on the other hand, isolate data into columns and tables, joined by keys and indexes.

This leaves a pretty big gap to bridge, and it falls upon the application to do the legwork. Since bridging this gap is something many applications must do, enter the ORM as a convenient, reusable framework. Hibernate, for instance, is quite likely the most popular ORM out there.

While intuitive for an application developer to use (ORMs do hide the translation complexities) an ORM can also be a significant weight on an application’s performance.

Let me explain.

Take a Call Graph from AppDynamics and follow the execution path of a transaction, method by method, from the moment a user request hits the application until calls to the database are issued to retrieve the requested information. Then size up the layers of code this path has to go through to get to the data. If your application has implemented an ORM like Hibernate, I assure you’ll be surprised how much stuff is actually going on in there.

No finger-pointing intended. A developer will emphasize that the benefit of just using the ORM component (without having to understand how it works) is a great increase in productivity. Point taken. It does pay, however, to review an ORM’s data access strategy.

I recently worked with a company and saw transactions (single user requests) that each bombarded the database with 3,000+ distinct calls. Seems a little excessive? You would be right. Moreover, nobody knew that this was happening, and certainly nobody intended for it to be so.

In many cases, simple configuration settings or a different ‘fetch’ method offered by the ORM itself can affect performance significantly. Whether the ORM, for instance, accesses each row of the customer table individually to fill an array of customers in your code, or constructs a single query that encompasses the whole expected result set and retrieves it in one fell swoop, makes a big difference.
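The contrast can be sketched without any real ORM or database at all, just by counting the SQL statements each strategy would issue. The `Database` class below is a hypothetical stand-in for a JDBC connection, not a real Hibernate or JDBC API:

```java
import java.util.ArrayList;
import java.util.List;

public class FetchStrategies {
    // Hypothetical stand-in for a database connection: only counts statements.
    static class Database {
        int statementsIssued = 0;

        List<String> query(String sql) {
            statementsIssued++;
            return new ArrayList<>(); // result rows elided for the sketch
        }
    }

    // "N+1" pattern: one query for the ids, then one more query per customer.
    static int loadCustomersOneByOne(Database db, int customerCount) {
        db.query("SELECT id FROM customer");
        for (int id = 0; id < customerCount; id++) {
            db.query("SELECT * FROM customer WHERE id = " + id);
        }
        return db.statementsIssued;
    }

    // Eager pattern: the whole expected result set in a single round trip.
    static int loadCustomersInOneQuery(Database db) {
        db.query("SELECT * FROM customer");
        return db.statementsIssued;
    }
}
```

With 100 customers the row-by-row strategy issues 101 statements while the eager strategy issues one, and in practice the per-statement network round trip is usually what dominates the response time.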

Seems obvious, right? But do you actually know what your ORM really does when retrieving the information?

You might be surprised.

DevOps Days – Visibility & Aligning Goals

This summer, I had a chance to sit on a panel at DevOps Days with a few industry colleagues to discuss the role of monitoring, testing, and performance in solving DevOps issues. It was a lively discussion on the teams, tools, and techniques that play a role in managing application and network performance.

Application Virtualization & A Free iPad!

Interested in winning an iPad? Take our Application Virtualization survey, and we’ll give you a more-than-decent shot at winning your very own!

In talking to our customers, AppDynamics recognizes that virtualizing mission-critical applications is at the top of everyone’s mind, even though some companies are at different stages of the process than others.

Many IT departments have already finished virtualizing their less critical apps and they realize that there are many benefits to be had in virtualizing the rest. But, in many cases, application owners are concerned about performance impacts to their mission-critical apps, and they’re saying, “Hands off my app.”

What’s it like at your organization? Is the path to app virtualization a smooth one, or are you experiencing some bumps along the way? We’d like to find out more, and we’d like your help to do it.

Answer a few short questions on your app virtualization strategy and be automatically entered to win an iPad. In addition, we’ll send you a copy of the results, so you can discover what your peers at other organizations are doing.

Find out:

– What percentage of Tier 1 apps have been virtualized by other companies
– Whether the virtualization project is a stepping stone towards the cloud
– What challenges are preventing people from finishing (or even starting) virtualization of Tier 1 apps

The survey is only 10 questions long–and again, entering gives you the possibility of winning an iPad, as well as a guaranteed copy of the survey results.

Take the survey now!

We look forward to reviewing your responses!

APM for the Non Java Guru – Threading & Synchronization Issues

This is the first post in a series on the Top Application Performance Challenges.

Of the many issues affecting the performance of Java/.NET applications, synchronization ranks near the top. Issues arising from synchronization are often hard to recognize, and their impact on performance can become significant. What’s more, they are often, at least in principle, avoidable.

The fundamental need to synchronize lies with Java’s support for concurrency, which is implemented by allowing the execution of code by separate threads within the same process. Separate threads can share the same resources: objects in memory. While this is a very efficient way to get more work done (while one thread waits for an IO operation to complete, another thread gets the CPU to run a computation), it also exposes the application to interference and consistency problems.

The JVM/CLR does not guarantee an execution order of the code running in concurrent threads. If multiple threads reference the same object, there is no telling what state that object will be in at a given moment in time. The repercussions of that simple fact can be enormous, with, for example, one thread running calculations and returning wrong results because a concurrent thread is accessing and modifying shared bits of information at the same time.

To prevent such a scenario (a program needs to execute correctly, after all) a programmer uses the “synchronized” keyword in his/her program to force order on concurrent thread execution. Using “synchronized” prevents threads from obtaining the same object at the same time.

In practice, however, this simple mechanism comes with substantial side effects. Modern business applications are typically highly multi-threaded. Many threads execute concurrently, and consequently “contend” heavily for shared objects. Contention occurs when a thread wants to access a synchronized object that is already held by another thread. All threads contending effectively “block,” halting their execution until they can acquire the object. Synchronization effectively forces concurrent processing back into sequential execution.
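A minimal sketch of both halves of the story: the increment below is a read-modify-write that could silently lose updates without the lock, and with the lock every contending thread serializes on the same monitor. The class and the thread counts are illustrative, not from any particular application:

```java
public class Counter {
    private long value = 0;

    // Without "synchronized", value++ is a read-modify-write that two
    // threads can interleave, silently losing increments.
    public synchronized void increment() {
        value++;
    }

    public synchronized long get() {
        return value;
    }

    // Spin up several threads that all contend for the same lock.
    public static long countWith(int threads, int perThread) {
        Counter c = new Counter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) c.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // wait for every worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return c.get();
    }
}
```

The synchronized version always produces the correct total, but while the threads run, each increment admits only one thread at a time, which is exactly the sequential-execution effect described above.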

With just a few metrics we can show the effects of synchronization on an application’s performance. For instance, take a look at the graph below.

While increasing load (number of users = blue), we see that at some point midway the response time (yellow) takes an upward curve, while at the same time resource usage (CPU = red) somewhat increases to eventually plateau and even recedes. It almost looks like the application runs with the “handbrake on,” a classic, albeit high-level, symptom of an application that has been “over-synchronized.”

With every new version of the JVM/CLR, improvements are made to mitigate this issue. However, while helpful, these improvements can’t fully resolve it or undo the performance penalty it imposes on the application.

Also, developers have come to adopt “defensive” coding practices, synchronizing large pieces of code to prevent possible problems. In large development organizations this problem is further magnified, as no one developer or team has full ownership of an application’s entire code base. The practice of erring on the side of safety can quickly compound, with large portions of synchronized code significantly limiting an application’s potential throughput.

It is often too arduous a task to maintain a locking strategy fine-grained enough to ensure that only the necessary minimum of execution paths is synchronized. New approaches to better manage state in a concurrent environment are available in newer versions of Java, such as ReadWriteLock, but they are not widely adopted yet. These approaches promise a higher degree of concurrency, but it will always be up to the developer to implement and use the mechanism correctly.
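As a small sketch of that idea, here is a read-mostly registry built on `java.util.concurrent.locks.ReentrantReadWriteLock` (available since Java 5): any number of readers can hold the read lock at once, and only writers take the exclusive lock. The registry class itself is an illustration, not a recommended production design:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadMostlyRegistry {
    private final Map<String, String> data = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Many readers may hold the read lock concurrently; reads never
    // block each other, only a writer blocks them.
    public String get(String key) {
        lock.readLock().lock();
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers take the exclusive lock, so writes still serialize.
    public void put(String key, String value) {
        lock.writeLock().lock();
        try {
            data.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

For workloads that read far more often than they write, this recovers much of the concurrency that a single `synchronized` monitor would forfeit, but as the text notes, it is still on the developer to pair every lock with an unlock (hence the `try`/`finally` blocks).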

Is synchronization, then, always going to result in a high MTTR?

New technologies exist on the horizon that may lend some relief.  Software Transactional Memory Systems (STM), for example, might become a powerful weapon for dealing with synchronization issues. They may not be ready for prime time yet, but given what we’ve seen with database systems, they might be the key to taming the concurrency challenges affecting applications today. Check out JVSTM, Multiverse and Clojure for examples of STMs.

For now, the best development organizations are the ones that can walk the fine line of balancing code review/rewrite burdens against concessions to performance. APM tools can help quite a lot in such scenarios, allowing teams to monitor application execution under high load (aka “in production”) and quickly pinpoint the execution times of particular highly contended objects, database connections being a prime example. With the right APM in place, identifying thread synchronization issues becomes much easier, and overall MTTR drops dramatically.

Application Performance Issues? You’re Not Alone…

Last week we hosted a presentation by AppDynamics customer, Priceline, along with an informative customer roundtable on application performance. During our roundtable, we polled the audience on their challenges with application performance – we asked questions about their architectures, development philosophies, common performance problems, downtime and visibility. Roughly 100 attendees from companies big and small responded.

The results were interesting. Here’s a look at what we found:

When asked about performance challenges, 88% of respondents noted they’ve experienced application performance problems in the last 12 months. The most common performance issues cited were Slow Response, Stalls, Errors and Memory Leaks. (Stay tuned this August for our upcoming blog series on what you should know about the Top 5 App Performance Challenges).

Additionally, 67% of respondents said they struggled to determine the root cause of their application performance problem in a timely manner, and roughly 33% responded that end-users called the helpdesk to complain about poor online experiences. Surprisingly, only 42% of participants admitted that they’ve experienced production downtime.

When asked to share some information about their application environments, participants responded that:

• 73% have a multi-tier, distributed application
• 70% leverage both open source and commercial application infrastructure (i.e. app servers)
• 52% say they follow an agile development philosophy
• 82% said that they lack visibility in key areas of their application

For the most part, we weren’t surprised by these answers. We hear from our customers every day that they are shifting toward SOA, virtualization and cloud computing infrastructures. These responses also confirmed our experience that many organizations have built JBoss and Tomcat into their application architectures, whereas five years ago most companies would primarily have had WebLogic or WebSphere.

It’s also not surprising that such a large number of respondents had experienced issues with performance. Distributed applications, especially those that use a mix of open source and proprietary components, are extremely difficult to monitor and troubleshoot. They often contain blind spots that result in critical performance problems.

Nor was it surprising that the majority of companies lack visibility into their applications. This is the most common refrain we hear when talking to companies for the first time. It doesn’t matter if they are running a $10 billion business or a $10 million business… visibility is always a concern. At the end of the day, “You can’t manage what you can’t see.”

Management really is the kicker. Web-applications have become transaction-oriented and revenue-critical, and it is more important than ever to be able to not only monitor, but quickly diagnose and fix performance issues before they impact your business.

“This Week in Cloud Computing” – Agility & the Cloud

I recently had a chance to join the webcast “This Week in Cloud Computing” and share some of my thoughts about cloud trends and application performance management. One thread of the conversation that I found particularly interesting was the discussion of agility in cloud computing. Although this theme comes up from time to time, most discussions I hear on cloud computing focus on cost-cutting and security.

These are extremely important concerns, of course; security in particular can be seen as a prerequisite of any sound cloud computing strategy. But there’s a “forest for the trees” risk in focusing too much on cloud computing’s pitfalls at the expense of recognizing its benefits, of which agility is certainly a major component.

We’re seeing with our own customers the need to be even more agile than before, with scrums becoming common and engineering stand-ups becoming a way of life. Any process change that helps speed up the application deployment chain is more than just a “nice to have”; it’s a sea change in companies’ ability to deliver value to their end users.

Bernard Golden makes some interesting points about two types of cloud computing agility in this discussion on CIO – definitely worth a read if you’re interested in the topic.

In case you missed the live webcast last week, here’s the video: