IT Operations and the Team USA Speed Skating Disaster

In case you weren’t aware, Team USA speed skating came home from the Olympic Games in Sochi with zero medals. Not just zero gold medals, zero medals in total (not counting short track, which netted only one silver medal). Now, I’m not a big speed skating fan, but what happened in Sochi reminded me of what I’ve seen all too often while working in operations at some very large enterprises. So that’s the topic for today’s blog: what IT organizations can learn from the USA speed skating meltdown in Sochi (and vice versa).

As the Team USA speed skaters turned in poor performance after poor performance on the ice, their coaches tried to figure out why their athletes were not competing at the expected level and how to fix it. The same thing happens in IT organizations when application performance goes wrong: the business leans on the IT organization, which scrambles to figure out what is going wrong so it can fix the issue.

Team USA decided that their new suits could be the reason no medals had been won yet and requested a switch back to the suits they had been using previously. Did they not test the new suits at all? Of course they tested them. The important question is: how did they perform those tests?

Testing Parameters

Altitude, air density, humidity, varying velocity, body positions, body shapes, and more. There are a ton of factors that may or may not have been accounted for during the testing of these suits. It’s not possible to test every variant of every parameter before using the suits in a race, just as it’s not possible to properly test an application for every scenario before it gets released into production.

The fact is, at some point you have to transition from testing to real-life use. For Team USA, that means using the new suits in races. For IT professionals, it means deploying your application into production.

Cover Up the Air Vent (Blame It On the Database)

The initial reaction by Team USA was to guess that the performance problem was a result of the back air vent system creating turbulent air flow during the race. They proceeded to cover up this air vent with no improvement in results. This is the IT equivalent of blaming application performance problems on the database or the network. This is a natural reaction in the absence of data that can help isolate the location of the problem.

Isolating the bottleneck in a production application can be easy. Here we see there is a very slow call to the database.

When it was obvious that simply closing the air vent did not have the desired effect, Team USA decided it was time to switch back to their old suits. In the IT world we call this the back-out plan: roll back the code to the last known good version and hope for the best.

It’s All About the Race Results (aka Production)

No matter how well you test speed skating suits or application releases, the measure of success or failure is how well you do on race day (or in production, for the IT crowd). As Team USA found out, you can’t predict how things will go in production from your tests alone. Just as an IT organization holds off on application changes before major events (end-of-year financials, Black Friday, Cyber Monday, and so on), Team USA should have proven their new suits in races leading up to their biggest event instead of making a suit change right before it.

I feel bad for the amazing athletes of Team USA speed skating who had to deal with so much drama during the Olympics. I imagine it was difficult to perform at the highest level with a major distraction like the new suits hanging over them. In the end, there was no difference in results between the new suits and the old ones. The suits were just a distraction from whatever the real issue was.

IT professionals have the luxury of tools that help them find and fix problems in production.

Fortunately for IT professionals, we have tools that help us find problems in production instead of just guessing at the root cause. If you’re a fan of USA speed skating, sorry for bringing up a sore subject, but hopefully there is a lesson to be learned for next time. If you’re an IT professional without the proper monitoring tools to find the root cause of your production issues, sign up today for a free trial of AppDynamics and see what you’ve been missing.

Why BSM Fails to Provide Timely Business Insight

Business Service Management (BSM) projects have always had a reputation for over-promising and under-delivering. Most people know BSM as the alleged “manager of managers” or “single source of truth.” The latest ITIL definition describes BSM as “the management of business services delivered to business customers.” Like much of ITIL, this description is rather ambiguous.

Wikipedia, however, currently describes BSM’s purpose as facilitating a “cultural change from one which is very technology-focused to a position which understands and focuses on business objectives and benefits.” Nearly every organization I talk to highlights being technology-focused as one of its biggest challenges, along with a desire for greater alignment to business goals. BSM should therefore be the answer everyone is looking for… it’s just a shame BSM has always been such a challenge to deliver.

Some years ago I worked as a consultant for a small startup that provided BSM software and services. I got to work with many large organizations that all had one common goal: to make sense of how well IT was supporting their business. It was a tremendous learning experience, and I frequently witnessed just how little most organizations really understood about the impact major IT events had on their business. For example, I remember working with a major European telco that held an executive review meeting on the 15th calendar day of each month to review the previous month’s IT performance. The meeting was held on that date because it took four people two weeks to collate all the information and crunch it into a “mega-spreadsheet.” That’s 40 man-days of effort to report on the previous 30-day period!

As organizations collect an increasing amount of data from a growing list of sources, more and more of the organizations I talk to are looking for ways to manage this kind of “information fogginess,” but they are skeptical about undertaking large-scale BSM projects because of the implementation timescale and overall cost.

Implementing BSM

I’m sure the person who first coined the term “scope creep” must have been involved in implementing BSM, as most BSM projects have a nasty habit of growing arms and legs during the implementation phase. I dread to think how many BSM projects have actually provided a return on their substantial investments.

BSM has always been a heavily services-led undertaking because it attempts to uniquely model and report on an organization. No two organizations are structured in quite the same way; each has its own IT architecture, operating model, tools, challenges, and business goals. This is why BSM projects almost always begin with a team of consultants conducting lots of interviews.

Let’s look at the cost of implementation for a typical deployment such as the European telco example I described earlier. This type of project could easily require 100–200 man-days of professional services to deliver. Factoring in software license, training, and support and maintenance costs, the project needs to deliver a pretty substantial return in order to justify the spend:

Cost of BSM implementation:

  • Professional services (100–200 days @ $1800 per day): $180,000 – $360,000
  • Software license: $200,000 – $500,000
  • Annual support and maintenance: $40,000 – $100,000
  • Training: $25,000
  • TOTAL: $445,000 – $985,000

Now if we compare to the pre-existing cost of manually producing the monthly analysis:

Existing manual process costs:

  • Days per month creating reports: 10
  • Number of people: 4
  • Total man-days of effort per year: 480
  • Average annual salary: $45,000
  • Total working days per year: 225
  • Annual cost to generate reports: $96,000

Even with the most conservative estimates, it would take almost five years before this organization saw a return on its investment, by which time things will probably have changed enough to require a bunch of additional service days to update the BSM implementation. This high cost of implementation is one reason there is such reluctance to take the leap of faith needed to implement these technologies.
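For reference, the back-of-the-envelope arithmetic behind that payback estimate, using the low end of the implementation cost and the figures above:

$45,000 / 225 working days = $200 per man-day
480 man-days x $200 = $96,000 per year
$445,000 / $96,000 per year ≈ 4.6 years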

The most successful BSM implementations I am aware of have typically been the smaller projects, primarily focused on data visualization; but with powerful open-source reporting tools such as Graphite, Graphiti, or Plotly available for free, I wonder whether BSM still has a place even in these smaller projects today.

What does success look like?

Fundamentally, BSM is about mapping business services to their supporting IT components. However, modern IT environments have become highly distributed, with service-oriented architectures and data dispersed across numerous cloud environments, and it is simply not feasible to map basic 1:1 relationships between business and IT functions any more. This growing complexity only increases the time and money it takes to complete a traditional BSM implementation. A simplified, more achievable approach is needed to provide meaningful business insight from today’s complex IT environments.

In 2011, Netscape co-founder Marc Andreessen famously described how heavily today’s businesses depend on applications when he wrote that “software is eating the world.” These applications are built to support whatever the individual business goals are. It seems logical, then, that organizations should look into the heart of these applications to get a true understanding of how well the business is functioning.

In a previous post I described how this can be achieved using AppDynamics Real-time Business Metrics (RtBM) to give multiple parts of an IT organization access to business metrics from within these applications. By instrumenting the key information points in your application code and gathering business metrics in real time, such as the number of orders being placed or the amount of revenue per transaction, AppDynamics enables everyone in your organization to focus on the success or failure of the most important business metrics.

These goals are very similar to those of a traditional BSM project. However, in stark contrast to every BSM project I have ever heard of, AppDynamics can be deployed in under an hour, without the need for any costly services, as detailed in a previous blog post introducing Real-time Business Metrics.

Instead of interviewing dozens of people, architecting and building complex dependency models, and gathering and analyzing data to make sense of what is happening, AppDynamics Real-time Business Metrics focuses on the key metrics that matter to your business, providing focus and a common measurement of success across IT and the business.

So before you embark on a long and costly BSM project to understand what is happening in your business, why not download a free trial of AppDynamics and see for yourself: there is an easier way!

Managing the Performance of Cloud Based Applications

In the last post I covered several architectural techniques you can use to build a highly scalable, failure resistant application in the cloud. However, these architectural changes – along with the inherent unreliability of the cloud – introduce some new problems for application performance management. Many organizations rely on logging, profilers, and legacy application performance monitoring (APM) solutions to monitor and manage performance in the data center, but these strategies and solutions simply aren’t enough when you move into the cloud. Here are a few important considerations for choosing an APM solution that works in the cloud.

Business Transactions

Many monitoring solutions check for server availability and alert users when a server goes down. In the cloud, however, servers can come and go all the time, so alerting on availability will result in a lot of false positives. In addition, many of the server-level metrics that APM tools and server monitoring tools report are no longer as relevant as they were on a vertically scaled system. For example, what does 90% CPU utilization mean to the behavior of your cloud application? Does it mean there is an impending performance problem that needs to be addressed? Or does it mean that more servers need to be added into that tier? This goes for other metrics, too, like physical memory usage, JVM memory usage, thread usage, database connection pool usage, and so on. These are all good indicators of the performance of a single server, but when servers can come and go they’re no longer the best approximation of the performance of your application as a whole.

Instead, it’s best to understand performance in terms of Business Transactions. A business transaction is essentially a user request – for an eCommerce application, “Check out” or “Add to Cart” may be two important business transactions. Each business transaction includes all of the downstream activities until the end user receives a response (and perhaps more, if your application uses asynchronous communication). For example, an application may define a service that performs request validation, stores data in a database, and then publishes a request to a topic. A JMS listener might receive that message from the topic, make a call to an external service, and then store the data in a Hadoop cluster. All of these activities need to be grouped together into a single Business Transaction so that you can understand how every part of your system affects your end users.
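As a rough illustration, here is a minimal sketch of that flow using Spring MVC and Spring JMS. The class names, the “orders.topic” destination, and the OrderRepository interface are illustrative assumptions, not part of any particular product; an APM agent that can follow the message from the controller to the listener reports both legs as one end-to-end Business Transaction.

import java.io.Serializable;

import org.springframework.jms.annotation.JmsListener;
import org.springframework.jms.core.JmsTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
class CheckoutController {

    private final OrderRepository repository; // hypothetical data-access interface
    private final JmsTemplate jms;

    CheckoutController(OrderRepository repository, JmsTemplate jms) {
        this.repository = repository;
        this.jms = jms;
    }

    // Synchronous leg of the "Check out" Business Transaction: validate, store, publish.
    @PostMapping("/checkout")
    String checkout(@RequestBody Order order) {
        if (order.amount <= 0) {                   // request validation
            throw new IllegalArgumentException("invalid order");
        }
        repository.save(order);                    // store data in a database
        jms.convertAndSend("orders.topic", order); // publish a request to a topic
        return "accepted";
    }
}

class OrderAnalyticsListener {

    // Asynchronous leg of the same Business Transaction: consume the event, call an
    // external service, and hand the result to downstream storage (e.g. a Hadoop
    // cluster) -- both calls omitted here.
    @JmsListener(destination = "orders.topic")
    void onOrder(Order order) {
        // externalService.enrich(order);
        // analyticsStore.write(order);
    }
}

class Order implements Serializable {
    public String id;
    public double amount;
}

interface OrderRepository {
    void save(Order order);
}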

Tiers

With these various tiers tracked at the Business Transaction level, the next step is to measure performance at the tier level. While it is important to know when a Business Transaction is behaving abnormally, it is equally important to detect performance anomalies at the tier level. If the response time of a Business Transaction as a whole is slower by one standard deviation (which is acceptable) but one of its tiers is slower by three standard deviations, you may have a problem developing, even though it hasn’t affected your end users yet. Chances are the tier’s problem will evolve into a systemic problem that causes multiple Business Transactions to suffer.

Returning to our earlier example, let’s say the web service behaves well, but the topic listener is significantly slower than usual. The topic listener has not caused a problem in the Business Transaction itself, but it has slowed down enough to cause concern, so there might be an issue that needs to be addressed. Business Transactions, therefore, need to be evaluated both as a whole and at the tier level in order to identify performance issues before they arise. The only way to effectively monitor the performance of an application in a dynamic environment is to capture metrics at both the Business Transaction level and the tier level.

Baselines

One of the most important reasons that many organizations move to the cloud is to be able to scale applications up and down rapidly as load changes. If the load on your application fluctuates dramatically over the day, week or year, the cloud will allow you to scale your application infrastructure efficiently to meet that load. However, most application monitoring tools are not equipped to handle such dramatic shifts in load or performance. Application monitoring tools that rely on static thresholds for alerting and data collection will create alert storms when load increases and miss potential problems when it decreases. You need to be able to understand what normal performance is for a given time of day, day of the week or time of the year, which is best done by baselining the performance of your application over time.

Baselining your application essentially means collecting data about how your application (or a specific Business Transaction) performs at any given time. Having this data allows you (or your APM solution) to determine whether current performance is normal or might indicate a problem. Baselines can be defined on a per-hour basis over a period of time – for example, for the past 30 days, how has Checkout performed from 9:00am to 10:00am? In this configuration, the response time of a specific Business Transaction is compared to the average response time for that Business Transaction over the past 30 days, between the hours of 9:00am and 10:00am. If the response time deviates by more than some measurable value, such as two standard deviations, then the monitoring system should raise an alert. Figure 4 attempts to show this graphically.

The average response time for this Business Transaction is about 1.75 seconds, with two standard deviations being between 1.5 seconds and 2 seconds, captured over the past 30 days. All incoming occurrences of this Business Transaction during this hour (9:00am to 10:00am in this example) will be compared to the average of 1.75 seconds, and if the response time exceeds two standard deviations from this normal (2 seconds), then an alert will be raised.
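Here is a minimal sketch of that check in plain Java. The mean, standard deviation, and threshold come from the example above; a real APM product computes and stores its baselines differently, so treat this as an illustration of the logic only.

import java.util.List;

// Hourly baseline for one Business Transaction: the mean and standard deviation of
// response times observed during the same hour of day over the past 30 days.
class HourlyBaseline {

    private final double mean;
    private final double stdDev;

    HourlyBaseline(List<Double> samplesForThisHour) {
        double sum = 0;
        for (double t : samplesForThisHour) {
            sum += t;
        }
        mean = sum / samplesForThisHour.size();

        double squaredDiffs = 0;
        for (double t : samplesForThisHour) {
            squaredDiffs += (t - mean) * (t - mean);
        }
        stdDev = Math.sqrt(squaredDiffs / samplesForThisHour.size());
    }

    // True if the observed response time deviates from the baseline by more than
    // the given number of standard deviations (e.g. 2.0).
    boolean isAnomalous(double observedSeconds, double sigmas) {
        return Math.abs(observedSeconds - mean) > sigmas * stdDev;
    }
}

// With a 1.75-second mean and a 0.125-second standard deviation (so two deviations
// spans 1.5s to 2.0s), a 2.3-second Checkout between 9:00am and 10:00am is flagged:
//   new HourlyBaseline(last30DaysOfNineToTenSamples).isAnomalous(2.3, 2.0)  ->  true

The same check applies at the tier level discussed earlier; you would simply keep a baseline per tier and use a wider threshold, such as the three standard deviations in that example.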

What happens if the behavior of your users differs from day to day or month to month? Your monitoring solution should be configurable enough to handle this. Banking applications probably have spikes in load twice a month when most people get paid, and eCommerce applications are inundated on Black Friday. By baselining the performance of these applications over the year, an APM tool could anticipate this load and expect slower performance during these times. Make sure your APM tool is configurable or intelligent enough that it can understand what’s “normal” behavior for your app.

Dynamic Application Mapping

Many monitoring solutions today require manual configuration to instrument and monitor a new server. If new servers are disappearing and appearing all the time, however, this will result in blind spots as you update the tool to reflect the new environment. This will quickly become untenable as your application scales. A cloud-ready monitoring tool must automatically detect and map the application in real time, so you always have an up-to-date idea of what your application looks like. For agent-based monitoring solutions, this can be accomplished by deploying your agent along with your application so that new nodes are automatically instrumented by your APM solution of choice.

 

Take five minutes to get complete visibility into the performance of your cloud applications with AppDynamics today.

3 Reasons Financial Services Companies Need APM Right Now

Financial Services companies operate in a difficult environment. Many of their applications are absolutely vital to the proper workings of the global economy. They are one of the most heavily regulated industries in the world, and they are a constant target of hackers. Their systems need to be available, performant, and secure while generating the revenue sought by Wall Street and their shareholders.

In this article we’ll take a look at 3 factors that impact revenue and how APM is used to mitigate risk.

1. Customers Hate to Wait

Losing a customer is a bad thing no matter the circumstances. Losing a customer due to poor application performance and stability is preventable and should never happen! If you lose customers, you either need to win them back or attract new customers.

Fred Reichheld of Bain & Company reports that:

  • Over a five-year period, businesses may lose as many as half of their customers
  • Acquiring a new customer can cost six to seven times more than retaining an existing one
  • Businesses that boosted customer retention rates by as little as 5% saw profit increases ranging from 5% to a whopping 95%

Based on this research, you should do everything in your power to retain your existing customers. Providing a great end user experience and level of service is what every customer expects.

APM software tracks every transaction flowing through an application, recognizes when there are problems and can even automatically fix the issue.

FS Transaction View

Understanding the components involved in servicing each transaction is the first step in proper troubleshooting and problem resolution.

2. Customer Loyalty = Revenue Growth

On the flip side, better performance means more customer loyalty and, as a result, revenue gains. The Net Promoter methodology, developed by Satmetrix in cooperation with Bain & Company and Fred Reichheld, is a standard way to measure customer satisfaction and loyalty through an indexed score. In their Net Promoter whitepaper, Satmetrix found a direct correlation between high customer loyalty scores and revenue growth: the higher the loyalty score a company achieved, the higher its rate of revenue growth over a five-year period.

With applications playing a dominant role as the most common interaction between company and customer, it is imperative that customers have a great experience every time they use your application. Slow transactions, errors, and unavailable platforms leave customers dissatisfied and will reduce your loyalty score. Over time, this will have a significant impact on revenue.

So if we accept the premise that performance should be top of mind for anyone with a critical banking or FS application, what do we do next? How do we improve our application management strategy to prevent loss of revenue and improve customer loyalty? The answer: by taking a transaction-based approach to application performance management.

APM software tracks the performance of all transactions, dynamically baselines normal performance, and alerts when transactions deviate from their normal behavior. In this way you’re able to identify performance problems as they begin instead of waiting until customers are frustrated and abandoning your applications.

FS Business Transaction List

List of business transactions classified by their level of deviation from normal performance.

3. Transactions = Money

Transactions are the lifeblood of banking. From making an online payment or converting currency to buying or selling stock, just about everything a bank does involves transactions. Furthermore, a significant portion of banks’ revenue comes from transaction fees for activities ranging from ATM withdrawals to currency conversion and credit card usage. For these fee-based transactions, the faster you can ring the cash register (the response time of business transactions), the more money you will make and the better the likelihood that your customer will come back for their next transaction.

With this in mind, it is imperative that IT organizations take a user-centric, or rather, transaction-centric approach to managing application performance.

APM software provides context that enables you to understand the business impact of failed and/or slow transactions. The data gathered by APM software can be used to focus on improving functionality that is used most often or responsible for the most revenue.

FS Prioritization

Having various data perspectives allows application support and development teams to prioritize what needs to be updated in the next release.

If you aren’t using an APM tool yet, or if your APM tool isn’t providing the value that it should be, then you need to take a free trial of AppDynamics and see what you and your customers have been missing.

Architecting for the Cloud

The biggest difference between cloud-based applications and the applications running in your data center is scalability. The cloud offers scalability on demand, allowing you to expand and contract your application as load fluctuates. This scalability is what makes the cloud appealing, but it can’t be achieved by simply lifting your existing application into the cloud. In order to take advantage of what the cloud has to offer, you need to re-architect your application around scalability. The other business benefit comes in terms of price: in the cloud, costs scale linearly with demand.

Sample Architecture of a Cloud-Based Application

Designing an application for the cloud often requires re-architecting your application around scalability. The figure below shows what the architecture of a highly scalable cloud-based application might look like.

The Client Tier: The client tier contains user interfaces for your target platforms, which may include a web-based user interface, a mobile user interface, or even a thick-client user interface. There will typically be a web application that performs actions such as user management, session management, and page construction, but for the rest of its interactions the client makes RESTful service calls to the server.

Services: The server side is composed of two kinds of services: caching services, from which the clients read data and which host the most recently known good state of all of the systems of record, and aggregate services, which interact directly with the systems of record for destructive operations (operations that change the state of the systems of record).

Systems of Record: The systems of record are your domain-specific servers that drive your business functions. These may include user management or CRM systems, purchasing systems, reservation systems, and so forth. While these can be new systems in the application you’re building, they are most likely legacy systems with which your application needs to interact. The aggregate services are responsible for abstracting your application from the peculiarities of the systems of record and providing a consistent front end for your application.

ESB: When systems of record change data, such as by creating a new purchase order, a user “liking” an item, or a user purchasing an airline ticket, the system of record raises an event to a topic. This is where the idea of an event-driven architecture (EDA) comes to the forefront of your application: when the system of record makes a change that other systems may be interested in, it raises an event, and any system interested in that system of record listens for changes and responds accordingly. This is also the reason for using topics rather than using queues: queues support point-to-point messaging whereas topics support publish-subscribe messaging/eventing. If you don’t know who all of your subscribers are when building your application (which you shouldn’t, according to EDA) then publishing to a topic means that anyone can later integrate with your application by subscribing to your topic.

Whenever you interface with legacy systems, it is desirable to shield the legacy system from load. Therefore, we implement a caching system that maintains the currently known good state of all of the systems of record. This caching system uses the EDA paradigm to listen for changes in the systems of record and update the versions of the data it hosts to match the data in the systems of record. This is a powerful strategy, but it also changes the consistency model from being consistent to being eventually consistent. To illustrate what this means, consider posting an update on your favorite social media site: you may see it immediately, but it may take a few seconds or even a couple of minutes before your friends see it. The data will eventually be consistent, but there will be times when the data you see and the data your friends see don’t match. If you can tolerate this type of consistency, you can reap huge scalability benefits.
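As a minimal sketch of that read-side cache (using Spring’s @JmsListener; the destination name and the Order type are illustrative assumptions, not a prescribed implementation):

import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.jms.annotation.JmsListener;

// Read-side cache: subscribes to change events raised by the systems of record and
// keeps the "last known good" copy of each record in memory. Reads never touch the
// legacy system, and they see eventually consistent data.
class OrderCache {

    private final Map<String, Order> lastKnownGood = new ConcurrentHashMap<>();

    @JmsListener(destination = "orders.topic") // a topic, so any number of caches can subscribe
    void onOrderChanged(Order order) {
        lastKnownGood.put(order.id, order);    // overwrite with the new state
    }

    Order read(String orderId) {
        return lastKnownGood.get(orderId);     // may briefly lag the system of record
    }
}

class Order implements Serializable {
    public String id;
}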

NoSQL: Finally, there are many storage options available, but if your application needs to store a huge amount of data, it is far easier to scale using a NoSQL document store. There are various NoSQL stores, and the one you choose should match the nature of your data. For example, MongoDB is good for storing searchable documents, Neo4j is good at storing highly interrelated data, and Cassandra is good at storing key/value pairs. I typically also recommend some form of search index, such as Solr, to accelerate queries to frequently accessed data.

Let’s begin our deep-dive investigation into this architecture by reviewing service-oriented architectures and REST.

REpresentational State Transfer (REST)

The best pattern for dividing an application into tiers is to use a service-oriented architecture (SOA). There are two main options for this: SOAP and REST. There are many arguments for each that I won’t go into here, but for our purposes REST is the better choice because it is more scalable.

REST was defined in 2000 by Roy Fielding in his doctoral dissertation and is an architectural style that models elements as a distributed hypermedia system riding on top of HTTP. Rather than thinking about services and service interfaces, REST defines its interface in terms of resources, and services define how we interact with those resources. HTTP serves as the foundation for RESTful interactions, and RESTful services use the HTTP verbs, summarized below, to interact with resources:

  • GET: retrieve a resource

  • POST: create a resource

  • PUT: update a resource

  • PATCH: partially update a resource

  • DELETE: delete a resource

  • HEAD: does this resource exist OR has it changed?

  • OPTIONS: what HTTP verbs can I use with this resource?

For example, I might create an Order using a POST, retrieve an Order using a GET, change the product type of the Order using a PATCH, replace the entire Order using a PUT, delete an Order using a DELETE, send a version (passing the version as an Entity Tag or eTag) to see if an Order has changed using a HEAD, and discover permissible Order operations using OPTIONS. The point is that the Order resource is well defined and then the HTTP verbs are used to manipulate that resource.
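Here is a minimal sketch of such an Order resource using Spring MVC annotations. The URL layout, the fields, and the in-memory store are illustrative assumptions; HEAD and OPTIONS are typically answered by the framework itself and are omitted.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.web.bind.annotation.DeleteMapping;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PatchMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.PutMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/orders")
class OrderResource {

    private final Map<String, Order> store = new ConcurrentHashMap<>();

    @PostMapping                       // POST /orders        -> create a resource
    Order create(@RequestBody Order order) {
        store.put(order.id, order);
        return order;
    }

    @GetMapping("/{id}")               // GET /orders/{id}    -> retrieve it
    Order get(@PathVariable String id) {
        return store.get(id);
    }

    @PutMapping("/{id}")               // PUT /orders/{id}    -> replace it entirely
    Order replace(@PathVariable String id, @RequestBody Order order) {
        order.id = id;
        store.put(id, order);
        return order;
    }

    @PatchMapping("/{id}")             // PATCH /orders/{id}  -> partial update (product type only)
    Order changeProduct(@PathVariable String id, @RequestBody String productType) {
        Order order = store.get(id);
        order.productType = productType;
        return order;
    }

    @DeleteMapping("/{id}")            // DELETE /orders/{id} -> delete it
    void delete(@PathVariable String id) {
        store.remove(id);
    }
}

class Order {
    public String id;
    public String productType;
}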

In addition to keeping application resources and interactions clean, using the HTTP verbs can greatly enhance performance. Specifically, if you define a time-to-live (TTL) on your resources, then HTTP GETs can be cached by the client or by an HTTP cache, which offloads the server from constantly rebuilding the same resource.
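For example, with Spring MVC the TTL and version can be attached as standard Cache-Control and ETag headers. The 60-second TTL and the lookup method below are illustrative assumptions.

import java.util.concurrent.TimeUnit;

import org.springframework.http.CacheControl;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
class CachedOrderEndpoint {

    @GetMapping("/orders/{id}")
    ResponseEntity<String> get(@PathVariable String id) {
        String body = loadOrderJson(id);                                 // illustrative lookup
        return ResponseEntity.ok()
                .cacheControl(CacheControl.maxAge(60, TimeUnit.SECONDS)) // the TTL
                .eTag("\"" + body.hashCode() + "\"")                     // version tag for conditional requests
                .body(body);
    }

    private String loadOrderJson(String id) {
        return "{\"id\":\"" + id + "\",\"status\":\"PLACED\"}";
    }
}

A client or intermediate HTTP cache that already holds a fresh copy can skip the request entirely; with explicit If-None-Match handling (or a filter such as Spring’s ShallowEtagHeaderFilter) the server can also answer 304 Not Modified instead of resending the body.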

REST defines three maturity levels, affectionately known as the Richardson Maturity Model (because it was developed by Leonard Richardson):

  1. Define resources

  2. Properly use the HTTP verbs

  3. Hypermedia Controls

Thus far we have reviewed levels 1 and 2, but what really makes REST powerful is level 3. Hypermedia controls allow resources to define business-specific operations or “next states” for resources. So, as a consumer of a service, you can automatically discover what you can do with the resources. Making resources self-documenting enables you to more easily partition your application into reusable components (and hence makes it easier to divide your application into tiers).

Sideline: you may have heard the acronym HATEOAS, which stands for Hypermedia as the Engine of Application State. HATEOAS is the principle that clients can interact with an application entirely through the hypermedia links that the application provides. This is essentially the formalization of level 3 of the Richardson Maturity Model.
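A minimal, hand-rolled sketch of what such a representation can look like (the link relations and URLs are illustrative; libraries such as Spring HATEOAS automate this kind of thing):

import java.util.LinkedHashMap;
import java.util.Map;

// A level-3 representation: the Order advertises its own valid next states as links,
// so clients discover what they can do with it instead of hard-coding URLs.
class OrderRepresentation {

    public String id;
    public String status;
    public Map<String, String> _links = new LinkedHashMap<>();

    static OrderRepresentation of(String id, String status) {
        OrderRepresentation rep = new OrderRepresentation();
        rep.id = id;
        rep.status = status;
        rep._links.put("self", "/orders/" + id);
        if ("PLACED".equals(status)) {
            // only meaningful while the order is still in the PLACED state
            rep._links.put("cancel", "/orders/" + id + "/cancellation");
            rep._links.put("payment", "/orders/" + id + "/payment");
        }
        return rep;
    }
}

Serialized to JSON (for example by Jackson), the _links map tells the client which operations are currently permissible for this particular Order.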

RESTful resources maintain their own state, so RESTful web services (the operations that manipulate RESTful resources) can remain stateless. Statelessness is a core requirement of scalability because it means that any service instance can respond to any request. Thus, if you need more capacity on any service tier, you can add additional virtual machines to that tier to distribute the load. To illustrate why this is important, let’s consider a counter-example: the behavior of stateful servers. When a server is stateful, it maintains some client state, which means that subsequent requests from a client need to be sent to that specific server instance. If that tier becomes overloaded, adding new server instances may help new client requests, but it will not help existing client requests because the load cannot be easily redistributed.

Furthermore, the resiliency requirements of stateful servers hinder scalability because of fail-over constraints. What happens if the server to which your client is connected goes down? As an application architect, you want to ensure that client state is not lost, so how do we gracefully fail over to another server instance? The answer is that we need to replicate client state across multiple server instances (or at least one other instance) and then define a fail-over strategy so that the application automatically redirects client traffic to the failed-over server. The replication overhead and network chatter between replicated servers mean that no matter how optimal the implementation, scalability can never be linear with this approach.

Stateless servers do not suffer from this limitation, which is another benefit to embracing a RESTful architecture. REST is the first step in defining a cloud-based scalable architecture. The next step is creating an event-driven architecture.

Deploying to the Cloud

This paper has presented an overview of a cloud-based architecture and provided a cursory look at REST and EDA. Now let’s review how such an application can be deployed to and leverage the power of the cloud.

Deploying RESTful Services

RESTful web services, or the operations that manage RESTful resources, are deployed to a web container and should be placed in front of the data store that contains their data. These web services are themselves stateless and only reflect the state of the underlying data they expose, so you are able to use as many instances of these servers as you need. In a cloud-based deployment, start enough server instances to handle your normal load and then configure the elasticity of those services so that new server instances are added as these services become saturated and the number of server instances is reduced when load returns to normal. The best indicator of saturation is the response time of the services, although system resources such as CPU, physical memory, and VM memory are good indicators to monitor as well. As you are scaling these services, always be cognizant of the performance of the underlying data stores that the services are calling and do not bring those data stores to their knees.

The graphic above shows that the services that interact with Document Store 1 can be deployed separately, and thus scaled independently, from the services that interact with Document Store 2. If Service Tier 1 needs more capacity, add more server instances to Service Tier 1 and then distribute load to the new servers.

Deploying an ESB

The choice of whether or not to use an ESB will dictate the EDA requirements for your cloud-based deployment. If you do opt for an ESB, consider partitioning the ESB based on function so that excessive load on one segment does not take down other segments.

The importance of segmentation is to isolate the load generated by System 1 from the load generated by System 2. Stated another way, if System 1 generates enough load to slow down the ESB, it will slow down its own segment, but not System 2’s segment, which is running on its own hardware. In our initial deployment we had all of our systems publishing to a single segment, which exhibited just this behavior! Additionally, with segmentation you are able to scale each segment independently by adding multiple servers to that segment (if your ESB vendor supports this).

Cloud-based applications are different from traditional applications because they have different scalability requirements. Namely, cloud-based applications must be resilient enough to handle servers coming and going at will, must be loosely coupled, must be as stateless as possible, must expect and plan for failure, and must be able to scale from a handful of servers to tens of thousands of servers.

There is no single correct architecture for cloud-based applications, but this paper presented an architecture that has proven successful in practice making use of RESTful services and an event-driven architecture. While there is much, much more you can do with the architecture of your cloud application, REST and EDA are the basic tools you’ll need to build a scalable application in the cloud.

 

Take five minutes to get complete visibility into the performance of your cloud applications with AppDynamics today.

Monitoring Apps on the Cloud Foundry PaaS

At AppDynamics, we pride ourselves on making it easier to monitor complex applications. This is why we are excited to announce our partnership with Pivotal to make it easier to deploy built-in application performance monitoring to the cloud.

 

Getting started with Pivotal’s Cloud Foundry Web Service

Cloud Foundry is the open platform as a service, developed and operated by Pivotal. You can deploy applications to the hosted Pivotal Web Services (much like you host apps on Heroku) or you can run your own Cloud Foundry PaaS on premise using Pivotal CF. Naturally, Cloud Foundry is an open platform that is used and operated by many companies and service providers.

1) Sign up for a Pivotal CF account and AppDynamics Pro SaaS account

In the future, Pivotal Web Services will include the AppDynamics SaaS APM services, so you’ll only need to sign up for Pivotal Web Services and it will automatically create an AppDynamics account.

2) Download the Cloud Foundry CLI (Command Line Interface)

Pivotal Web Services has both a web-based GUI and a full-featured, Linux-style command-line interface (CLI). Once you have a PWS account, you can download the CLI for Mac, Windows, or Linux from the “Tools” tab in the PWS dashboard, or download it directly for OS X, Linux, and Windows.

Pivotal Web Services CLI

3) Sign in with your Pivotal credentials

Using the CLI, log in to your Pivotal Web Services account. Remember to preface all commands given to Cloud Foundry with “cf”. Individual Cloud Foundry PaaS clouds are identified by their API endpoint; for PWS, the endpoint is api.run.pivotal.io. The system will automatically target your default org (you can change this later) and ask you to select a space (a space is similar to a project or folder where you can keep a collection of apps).

$ cf login

Cloud Foundry CLI 

Monitoring Cloud Foundry apps on Pivotal Web Services

Cloud Foundry uses a flexible approach called buildpacks to dynamically assemble and configure a complete runtime environment for executing a particular class of applications. Rather than specifying how to run applications, your developers can rely on buildpacks to detect, download, and configure the appropriate runtimes, containers, and libraries. The AppDynamics agent is built into the Java buildpack for easy instrumentation, so if you have an AppDynamics monitoring service defined, the Cloud Foundry DEA will auto-detect the service and enable the agent in the buildpack. If you add AppDynamics monitoring to an app that is already running, just restart the app and the DEA will auto-detect the new service.

1) Clone the Spring Trader demo application

The sample Spring Trader app is provided by Pivotal as a demonstration, and we’ll use it to show how monitoring works. First, git clone the app from the GitHub repository.

$ git clone https://github.com/cloudfoundry-samples/rabbitmq-cloudfoundry-samples

2) Create a user provided service to auto-discover the AppDynamics agent

$ cf create-user-provided-service demo-app-dynamics-agent -p "host-name,port,ssl-enabled,account-name,account-access-key"

Cloud Foundry CLI

Find out more about deploying on PWS in the Java buildpack docs.

3) Use the Pivotal Web Services add-on marketplace to add a cloud-based AMQP service and a PostgreSQL database instance

$ cf create-service elephantsql turtle demo-db

$ cf create-service cloudamqp lemur demo-amqp

Cloud Foundry CLI

4) Bind PostgreSQL, AMQP, and AppDynamics services to app

$ git clone https://github.com/cloudfoundry-samples/rabbitmq-cloudfoundry-samples

$ cd rabbitmq-cloudfoundry-samples/spring

$ mvn package

$ cf bind-service demo-app demo-app-dynamics-agent

$ cf bind-service demo-app demo-amqp

$ cf bind-service demo-app demo-db

Cloud Foundry CLI

5) Push the app to production using the Cloud Foundry CLI (Command Line Interface)

$ cf push demo-app -i 1 -m 512M -n demo-app -p target/rabbitmq-spring-1.0-SNAPSHOT.war

Cloud Foundry CLI

Spring AMQP Stocks Demo App

Spring Trader

Pivotal Web Services Console

Pivotal PaaS CloudFoundry

 

 

Production monitoring with AppDynamics Pro

Monitor your critical cloud-based applications with AppDynamics Pro for code level visibility into application performance problems.

AppD Dashboard

Pivotal is the proud sponsor of Spring and the related open-source JVM technologies Groovy and Grails. Spring helps development teams build simple, portable, fast, and flexible JVM-based systems and applications, and it is the most popular application development framework for enterprise Java. The AppDynamics Java agent supports the latest Spring framework and Groovy natively. Monitor the entire Pivotal stack, including tc Server and Web Server, Greenplum, RabbitMQ, and the popular Spring framework:

AppD

 

Take five minutes to get complete visibility into the performance of your production applications with AppDynamics today.