Why Every PHP Application Should Use an OpCache

PHP 5.5 introduced opcode caching into the core via OPCache.  OPCache was previously known as Zend Optimizer+, and although free, was closed source. Zend decided to open source the implementation, and include it in the core PHP distribution. OPCache is also available as an extension through pecl, and is compatible all the way back to PHP 5.2. While other opcode caching solutions like APC exist, now that OPCache is bundled with PHP, it will likely become the standard going forward.

What is an opcode cache, and how does it work? Every time a PHP script is requested, it is parsed and compiled into opcode, which is then executed by the Zend Engine. This is what allows PHP developers to skip the compilation step required in other languages like Java or C# — you can make changes to your PHP code and see those changes immediately. However, the parsing and compiling steps increase your response time, and in a non-development environment they are often unnecessary, since your application code changes infrequently.

When an opcode cache is introduced, after a PHP script is interpreted and turned into opcode, it’s saved in shared memory, and subsequent requests will skip the parsing and compilation phases and leverage the opcode stored in memory, reducing the execution time of PHP.

How much benefit can you expect from an opcode cache? Like many things in life, the answer is: it depends. To test the benefit of OPCache, we took an existing PHP demo application used at AppDynamics and installed OPCache. The OPCache settings were fairly straightforward, but we opted to use 0 for the refresh rate, which means a script will never be checked to see if it has been updated. While appropriate for a production environment, this means you must clear the opcode cache when deploying new code.
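
The exact directives aren’t listed in the post, but a minimal php.ini sketch of the configuration described above might look like the following; everything other than the timestamp-validation setting is an assumption:

; hypothetical sketch -- only the validation setting reflects the test described above
zend_extension=opcache.so
opcache.enable=1
opcache.memory_consumption=128
opcache.max_accelerated_files=4000
; never re-check scripts for changes; clear the cache (e.g. with opcache_reset()) on each deploy
opcache.validate_timestamps=0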

The demo application is a simple e-commerce site built on top of Symfony 2 and PHP 5.4, leveraging a MySQL database, memcache and a backend Java service. For the test, the demo application is running on a medium ec2 instance (database, memcached, and Java services are on separate instances) with a small but steady amount of load on four different pages within the application.

In order to understand the performance benefit of enabling OPCache, the AppDynamics PHP agent was installed. The PHP agent auto-discovers application topology, and tracks metrics and flow maps for business transactions, app services, and backends in your web application by injecting instrumentation in the PHP-enabled web server instance at runtime. By leveraging the metrics collected by AppDynamics, we can see the decrease in response time OPCache provides.

Once OPCache was enabled on the application, there was a 14% reduction in response time for the application overall. AppDynamics has a feature called “compare releases” which allows you to select two separate time ranges and compare key metrics. In the screenshot below, we are comparing two small time ranges – March 14th from 9:00am to 12:00pm and March 14th from 1:00pm to 4:00pm – as OPCache was enabled at 12:10pm on March 14th.

While a 14% decrease in response time is good, especially considering the minimal amount of work required to install and enable OPCache, it may be less than you were expecting. The overall application decrease in response time obscures the variation seen across different pages within the application.

AppDynamics analyzes a concept called a business transaction, which represents an aggregation of similar user requests to accomplish a logical user activity. In this demo application, we were generating load on four specific business transactions: View Product, Search, Login, and Loop. Using the compare releases functionality from AppDynamics again, this time focusing on the individual business transactions rather than the application as a whole, we see a lot of variation in response time between the different business transactions once OPCache was introduced.

Let’s look at each business transaction and determine why some transactions saw a large reduction in response time, while others experienced a moderate or minimal decrease in response time.

The Login business transaction saw a substantial decrease in response time: 74%.

The Login business transaction is relatively simple, as shown by the AppDynamics flow map below (a flow map graphically represents the tiers, nodes, and backends and the process flows between them in a managed application). The transaction goes through a standard Symfony controller and renders a basic HTML form to log in — there are no databases or external services involved. On this particular business transaction, the majority of the response time was spent parsing and compiling the PHP. Once those steps are removed via an opcode cache, the response time drops dramatically.

The Product View business transaction experienced a similar decrease in response time at 74%.

The Product View business transaction relies on both memcache and a MySQL database, although only 2% of the request time is spent outside of PHP (after OPCache was turned on, this increased to 8%), and hence we see a large benefit from opcode caching, like we saw in the Login business transaction.

The Search business transaction response time dropped by only 8%.

Looking at the flow map, the majority of the response time is spent on the Java backend service, with a small amount of time spent in the network. Enabling OPCache resulted in a 70% reduction in response time in PHP, but since PHP was only 11% of the overall response time, the effect was muted.

Before:

After:

The Loop business transaction is not part of the demo application, but was added specifically for this test. The Loop business transaction saw only a 6% decrease in time.

Loop is 3 lines of code – it loops 10 million times and increments a counter on each loop. The amount of time it takes to parse and compile the code is small compared with the time it takes to actually execute the opcode, hence the small decrease in response time from enabling opcode caching.
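
The Loop script itself isn’t shown in the post, but a minimal sketch of the behavior described above might look like this:

$counter = 0;
for ($i = 0; $i < 10000000; $i++) {
    $counter++; // nearly all the time goes into executing opcode, not parsing or compiling
}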

To illustrate the difference we can review a call graph of each transaction. AppDynamics captures snapshots of certain requests, and a snapshot contains the call graph. Looking at the call graphs below for Loop and Login, we see Login has a lot more PHP code to parse and compile:

 

Loop

 

Login

In summary, opcode caches provide a quick way to decrease the latency of your PHP application and should always be enabled in production PHP environments. The decrease in response time will primarily depend on two things:

  1. The amount of time the request spends in PHP. If your application spends a lot of time waiting for a database to return results or relies on slow third-party web services, the decrease in response time from an opcode cache will be on the lower side.
  2. The amount of code that has to be parsed and compiled. If your PHP scripts are very basic, including only the minimal amount of code needed to process the request (as compared to using a framework), then the reduction in response time will also be limited.

Get started by upgrading to PHP 5.5 or installing Zend OpCache today.

Take five minutes to get complete visibility and control into the performance of your production applications with AppDynamics Pro today.

Big Data Monitoring

The term “Big Data” is quite possibly one of the most difficult IT-related terms ever to pin down. There are so many potential types of, and applications for, Big Data that it can be a bit daunting to consider all of the possibilities. Thankfully, for IT operations staff, Big Data is mostly a bunch of new technologies that are being used together to solve some sort of business problem. In this blog post I’m going to focus on what IT Operations teams need to know about big data technology and support.

Big Data Repositories

At the heart of any big data architecture is going to be some sort of NoSQL data repository. If you’re not very familiar with the various types of NoSQL databases that are out there today, I recommend reading this article on the MongoDB website. These repositories are designed to run in a distributed/clustered manner so they can process incoming queries as fast as possible on extremely large data sets.

MongoDB Request Diagram

Source: MongoDB

An important concept to understand when discussing big data repositories is sharding. Sharding is when you take a large database and break it down into smaller sets of data which are distributed across server instances. This is done to improve performance, as the database can be highly distributed and each instance has less data to query than the same database without sharding. It also allows you to keep scaling horizontally, which is usually much easier than having to scale vertically. If you want more details on sharding you can reference this Wikipedia page.
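
The post stays conceptual here; purely as an illustration (not tied to any particular database or driver), hash-based routing of a record to a shard might look like the sketch below, where the hostnames and shard key are made up:

// illustrative only: pick a shard for a record by hashing its shard key
function shardFor($shardKey, $shardCount)
{
    return crc32($shardKey) % $shardCount;
}

$shards = array('shard-0.example.com', 'shard-1.example.com', 'shard-2.example.com');
$target = $shards[shardFor('customer-42', count($shards))];
// a query for customer-42 only has to touch $target, not every server in the cluster

Real systems like MongoDB handle this routing for you once a shard key is configured for a collection.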

Application Performance Considerations

Monitoring the performance of big data repositories is just as important as monitoring the performance of any other type of database. Applications that want to use the data stored in these repositories will submit queries in much the same way as traditional applications querying relational databases like Oracle, SQL Server, Sybase, DB2, MySQL, PostgreSQL, etc… Let’s take a look at more information from the MongoDB website. In their documentation there is a section on monitoring MongoDB that states “Monitoring is a critical component of all database administration.” This is a simple statement that is overlooked all too often when deploying new technology in most organizations. Monitoring is usually only considered once major problems start to crop up and by that time there has already been impact to the users and the business.

RedisDashboard

Dashboard showing Redis key metrics.

One thing that we can’t forget is just how important it is to monitor not only the big data repository, but to also monitor the applications that are querying the repository. After all, those applications are the direct clients that could be responsible for creating a performance issue and that certainly rely on the repository to perform well when queried. The application viewpoint is where you will first discover if there is a problem with the data repository that is actually impacting the performance and/or functionality of the app itself.

Monitoring Examples

So now that we have built a quick foundation of big data knowledge, how do we monitor them in the real world?

End to end flow – As we already discussed, you need to understand if your big data applications are being impacted by the performance of your big data repositories. You do that by tracking all of the application transactions across all of the application tiers and analyzing their response times. Given this information it’s easy to identify exactly which components are experiencing problems at any given time.

FS Transaction View

Code level details – When you’ve identified that there is a performance problem in your big data application you need to understand what portion of the code is responsible for the problems. The only way to do this is by using a tool that provides deep code diagnostics and is capable of showing you the call stack of your problematic transactions.

Cassandra_Call_Stack

Back end processing – Tracing transactions from the end user, through the application tier, and into the backend repository is required to properly identify and isolate performance problems. Identification of poor performing backend tiers (big data repositories, relational databases, etc…) is easy if you have the proper tools in place to provide the view of your transactions.

Backend_Detection

AppDynamics detects and measures the response time of all backend calls.

Big data metrics – Each big data technology has its own set of relevant KPIs, just like any other technology used in the enterprise. The important part is to understand what normal behavior is for each metric while performance is good, and then identify when KPIs are deviating from normal. This, combined with the end-to-end transaction tracking, will tell you if there is a problem, where the problem is, and possibly the root cause. AppDynamics currently has monitoring extensions for HBase, MongoDB, Redis, Hadoop, Cassandra, CouchBase, and CouchDB. You can find all AppDynamics platform extensions by clicking here.
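
The post doesn’t prescribe a particular baselining algorithm; as a simple illustration of the idea of deviating from normal, a monitoring script might flag a KPI sample that strays more than a few standard deviations from its recent history (the three-sigma threshold and variable names below are arbitrary choices):

// illustrative sketch: flag a KPI sample that deviates from its rolling baseline
function isAnomalous(array $history, $sample, $sigmas = 3.0)
{
    $mean = array_sum($history) / count($history);

    $sumSquares = 0;
    foreach ($history as $value) {
        $sumSquares += ($value - $mean) * ($value - $mean);
    }
    $stdDev = sqrt($sumSquares / count($history));

    return abs($sample - $mean) > $sigmas * $stdDev;
}

// e.g. compare the latest reads-per-second sample against the last hour of samples
$alert = isAnomalous($lastHourOfSamples, $currentReadsPerSecond);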

HadoopDashboard

Hadoop KPI Dashboard 1

HadoopDashboard2

Hadoop KPI Dashboard 2

Big data deep dive – Sometimes KPIs aren’t enough to help solve your big data performance issues. That’s when you need to pull out the big guns and use a deep dive tool to assist with troubleshooting. Deep dive tools will be very detailed and very specific to the big data repository type that you are using/monitoring. In the screen shots below you can see details of AppDynamics monitoring for MongoDB.

MongoDB Monitoring 1

 MongoDB Monitoring 2

MongoDB Monitoring 3

MongoDB Monitoring 4

If your company is using big data technology, it’s IT operations’ responsibility to deploy and support a cohesive performance monitoring strategy for the inevitable performance degradation that will cause business impact. See what AppDynamics has to offer by signing up for our free trial today.

IT Operations and the Team USA Speed Skating Disaster

Photo Credit: http://sports.yahoo.com/news/speed-skating-u-miss-medals-again-172417859–oly.html

In case you weren’t aware, Team USA speed skating came home from the Olympic games in Sochi with zero medals. Not just zero Gold medals, zero medals in total (not counting short track skating, which netted only one silver medal). Now I’m not a big speed skating fan, but what happened during the Olympic games reminded me of what I had seen all too often while working in operations at some very large enterprises. So that’s the topic for today’s blog… what IT organizations can learn from the USA speed skating meltdown in Sochi (and vice versa).

As the Team USA speed skaters were turning in poor performance after poor performance on the ice, their coaches were trying to figure out why their athletes were not competing at the level they expected and how to fix the issue. The same thing happens in IT organizations when things go wrong with application performance. The business leans on the IT organization, and they try to figure out what is going wrong so they can fix the issue.

Team USA decided that their new suits could be the reason why no medals had been won yet and requested to change back to the suits they had been using before they made the switch. Did they not test the new suits at all? Of course they tested them. The important question is: how did they perform these tests?

Testing Parameters

Altitude, air density, humidity levels, varying velocity, body positions, body shapes, etc… There are a ton of different factors that may or may not have been accounted for during the testing of these suits. It’s not possible to test every variant of every parameter before using the suits in a race, just like it’s not possible to properly test an application for every scenario before it gets released into production.

The fact is, at some point you have to transition from testing to using in a real life scenario. For Team USA, that means using the new suits in races. For IT professionals, it means deploying your application into production.

Cover Up the Air Vent (Blame It On the Database)

The initial reaction by Team USA was to guess that the performance problem was a result of the back air vent system creating turbulent air flow during the race. They proceeded to cover up this air vent with no improvement in results. This is the IT equivalent of blaming application performance problems on the database or the network. This is a natural reaction in the absence of data that can help isolate the location of the problem.

Isolating the bottleneck in a production application can be easy. Here we see there is a very slow call to the database.

When it was obvious that a simple closure of the air vent did not have the desired effect, Team USA then decided it was time to switch back to their old suits. In the IT world we call this the back out plan. Roll back the code to the last known good version and hope for the best.

It’s All About the Race Results (aka Production)

No matter how well you test speed skating suits or application releases, the measure of success or failure is how well you do on race day (or in production for the IT crowd). As Team USA found out, you can’t predict how things will go in production from your tests alone. Similar to an IT organization holding off on application changes before major events (like end of year financials, Black Friday, Cyber Monday, etc…), Team USA should have proven their new suits in some races leading up to their biggest event instead of making a suit change right before it.

I feel bad for the amazing athletes of Team USA Speed Skating that had to deal with so much drama during the Olympics. I would imagine it was difficult to perform at the highest level required with a major distraction like their new suits. In the end, there was no difference in results between using the new suits or the old ones. The suits were just a distraction from whatever the real issue was.

IT professionals have the luxury of tools that help them find and fix problems in production.

Fortunately for IT professionals we have tools that help us find problems in production instead of just having to guess what the root cause of the issue might be. If you’re a fan of USA Speed Skating, sorry for bringing up a sore subject but hopefully there is a lesson that has been learned for next time. If you’re an IT professional without the proper monitoring tools to figure out the root cause of your production issues you need to sign up today for a free trial of AppDynamics and see what you’ve been missing.

The Digital Enterprise – Problems and Solutions

According to a recent article featured in Wall Street and Technology, Financial Services (FS) companies have a problem. The article explains that FS companies built more datacenter capacity than they needed when profits were up and demand was rising. Now that profits are lower and demand has not risen as expected the data centers are partially empty and very costly to operate.

FS companies are starting to outsource their IT infrastructure and this brings a new problem to light…

“It will take a decade to complete the move to a digital enterprise, especially in financial services, because of the complexity of software and existing IT architecture. “Legacy data and applications are hard to move” to a third party, Bishop says, adding that a single application may touch and interact with numerous other applications. Removing one system from a datacenter may disrupt the entire ecosystem.”

Serious Problems

The article calls out a significant problem that FS companies are facing now and will be for the next decade but doesn’t mention a solution.

The problem is that you can’t just pick up an application and move it without impacting other applications. Based upon my experience working with FS applications I see multiple related problems:

  1. Disruption of other applications
  2. Baselining performance and availability before the move
  3. Ensuring performance and availability after the move

All of these problems increase risk and the chance that users will be impacted.

Solutions

1. Disruption of other applications – The solution to this problem is easy in theory and traditionally difficult in practice. The theory is that you need to understand all of the external interactions with the application you want to move.

One solution is to use ADDM (Application Discovery and Dependency Mapping) tools that scan your infrastructure looking for application components and the various communications to and from them. This method works okay (I have used it in the past) but typically requires a lot of manual data manipulation after the fact to improve the accuracy of the discovered information.

ADDM1

ADDM product view of application dependencies.

Another solution is to use an APM (Application Performance Management) tool to gather the information from within the running application. The right APM tool will automatically see all application instances (even in a dynamically scaled environment) as well as all of the communications into and out of the monitored application.

Distributed Application View

APM visualization of an application and its components with remote service calls.

Remote Services 1

APM tool list of remote application calls with response times, throughput and errors.

 

A combination of these two types of tools would provide the ultimate in accurate and easy-to-consume information (APM strength) along with the flexibility to cover all of the one-off custom application processes that might not be supported by an APM tool (ADDM strength).

2. Baselining performance and availability before the move – It’s critically important to understand the performance characteristics of your application before the move. This will provide the baseline required for comparison after you make the move. The last thing you want to do is degrade application performance and user satisfaction by moving an application. The solution here is leveraging the APM tool referenced in solution #1. This is a core strength of APM and should be leveraged from multiple perspectives:

  1. Overall application throughput, response times, and availability
  2. Individual business transaction throughput and response times
  3. External dependency throughput and response times
  4. Application error rate and type

Application overview and baseline

Application overview with baseline information.

transactions and baselines

Business transaction overview and baseline information.

3. Ensuring performance and availability after the move – Now that your application has moved to an outsourcer, it’s more important than ever to understand performance and availability. Invariably your application performance will degrade and the finger pointing between you and your outsourcer will begin. That is, unless you are using an APM tool to monitor your application. The whole point of APM tools is to end finger pointing and to reduce mean time to restore service (MTRS) as much as possible. By using APM after the application move you will provide the highest possible level of service to your customers.

Compare Releases

Comparison of two application releases. Granular comparison to understand before and after states. – Key Performance Indicators

Compare releases 2

Comparison of two application releases. Granular comparison to understand before and after states. – Load, Response Time, Errors

If you’re considering or in the process of transitioning to a digital enterprise you should seriously consider using APM to solve a multitude of problems. You can click here to sign up for a free trial of AppDynamics and get started today.

IT holds more business influence than they realise

A ‘well oiled’ organization is one where IT and the rest of the business are working together and on the same page. In order to achieve this there needs to be good communication, and for good communication there needs to be a common language.

In most organizations, while IT are striving to achieve their goal of 99.999% availability, the rest of the business is looking to drive additional revenue, increase user satisfaction, and reduce customer churn.

Ultimately everyone should be working towards a common goal: SUCCESS. Unfortunately different teams define their success in different ways and this lack of alignment often results in a mistrust between IT departments and the rest of the business.

Measuring success

Let’s look at how various teams within a typical organization define success today:

Operations:
IT ops teams are responsible for reducing risk, ensuring the application is available and the ‘lights are green’. The number ‘99.9’ can either be IT Ops’ best friend or its worst enemy. Availability targets such as these are often the only measure of ops success or failure, meaning many of the other things you are doing often go unnoticed.

Availability targets don’t show business insight, or the positive impact you’re having on the business. For instance, how much did performance improve after you implemented that change last week? Has the average order size increased? How many additional orders can the application process since re-platforming? Is anyone even measuring these gains?

Development:
Dev teams are focussed on change. The Business demands they release more frequently, with more features, fewer defects, fewer resources and often less sleep! Dev teams are often targeted according to the number of updates and changes they can release, but nobody is measuring the success of these changes. Can anyone in your dev team demonstrate what the impact of your last code release was? Did revenues climb? Were users more satisfied? Were more orders placed?

‘The Business’:
The business is focussed on targets; last month’s achievements and end of year goals. This means they concentrate on the past and the future, but have little or no idea what impact IT is having on the business in the present. Consulting a data warehouse to gather ‘Business Intelligence’ at the end of the month does not allow you to keep your finger on the pulse of the business.

With everyone focussing on different targets there is no real alignment to the overall business goals between different parts of an organization. One reason for this disconnect is the lack of meaningful shared metrics. More specifically, it’s access to these metrics in real time that is the missing link.

If I asked how much revenue has passed through your application since reading this blogpost, or what impact your last code release had on customer adoption, how quickly could you find the answers? How quickly could anyone in your organization find the answers?

What if answers to these questions only took seconds?

Monitoring the Business in Real-time

In a previous post, I introduced AppDynamics Real-time Business Metrics which enables you to easily collect, auto-baseline, alert, and report on the Business data that is flowing through your applications… as it’s really happening.

This post demonstrates how to configure AppDynamics to extract all checkout revenue values from every business transaction and make this available as a new metric “Checkout Revenue” which can be reported in real-time just like any other AppDynamics metric.

With IT Ops, Dev and Business Owners all supporting business-critical applications that are responsible for generating revenue, Checkout Revenue is a great example of a business metric that could be used by every team to measure success.

Let’s look at a few examples of how this could change the way you do business, if everyone was jointly focussed on the same business metric.

Outage cost
The example below shows the revenue per minute vs. the response time per minute of an application. This application has obviously suffered an outage that lasted approximately 50 minutes, and it’s clear to see the impact it has had on the business in lost revenue. The short spike in revenue seen after the outage indicates users who returned to complete their transactions, but this is not enough to recover the lost revenue for the period.

RtBM - outage

Impact of agile releases
This example shows the result of a performance improvement program that has taken place. The overall response time has improved by over a second across three code releases and you can clearly see the additional revenue that has been generated as a result of the code releases.

RtBM - agile releases

Here a 1 second improvement in response time has increased the revenue being generated by the online booking system by more than 30%. The value a development team is delivering back to the business is clearly visible with each new release, allowing developers to focus on the areas that drive the most return and quantify the value they are delivering.

Marketing campaign
This example is a little more complex. At midday there is a massive increase in the number of people visiting this eCommerce website due to an expensive TV advertising campaign. The increased load on the system has resulted in a small increase in the overall response time but nothing too significant. However, despite the increased traffic to the site, the revenue has not improved. If we take a look at the Number of Checkouts, which is a second Business Metric that has been configured, it’s clear the advertising campaign has driven additional users to the site, but these users have not generated additional revenue.

RtBM - marketing

Common metrics for common success

With traditional methods of measuring success in different ways it’s impossible to align towards a common goal. This creates siloed working environments that make it impossible for teams to collaborate and prioritise.

By enabling all parts of the business to focus on the business metrics that really matter, organizations benefit from being able to proactively prioritise and resolve issues when they occur. It helps IT truly align with the priorities and needs of the business, allowing them to speak the same language and manage the bottom line. For example, after implementing AppDynamics Real-time Business Metrics Doug Strick, who is the Internet Application Admin at Garmin, said the following:

“We can now understand how the application is growing over time. This data will prove invaluable in guiding future decisions in IT.”
-Doug Strick, Internet Application Admin

AppDynamics Real-time Business Metrics enable you to identify business challenges and react to them immediately, instead of waiting hours, days or even weeks for answers. Performance, user experience, and business metrics are correlated together in real time and in one place.

If you want to capture the business performance and measure your success against it in real time, you can get started today with Real-time Business Metrics by signing up and taking a free trial of AppDynamics Pro here.

A Real Example of the Database to Storage Performance Relationship

Most enterprise databases today run on shared storage volumes (SAN, NAS, etc…) that are accessed over the network or via Fibre Channel connection. The shared storage concept is great for helping to keep storage infrastructure and management costs relatively low but creates cross silo finger pointing when there are performance issues. In this blog post we will explore a real world example of how to avoid finger pointing and get right down to figuring out how to fix the problem.

One Rotten Apple Can Ruin The Whole Bunch

This story dates back to June of 2012, but I just came across it so it is new to me. One of our customers had an event which impacted the performance of multiple databases. All of these databases were connected to the same NetApp storage array. Often when there is an issue with database performance, the DBAs will point the finger at the storage team and the storage team will tell the DBA team that everything looks good on their side. This finger pointing between silos is a common occurrence between various groups (network, storage, database, application support, etc…) within enterprise organizations.

In the chart below (screen grab taken from AppDynamics for Databases) you can see that there was a significant increase in I/O activity on dw_logvol. This issue impacted the performance of the entire NetApp storage array.

NetApp Storage Issue

As it turns out dw_logvol was used as a temporary storage location for web logs. There was a process that would copy log files to this location, decompress them, and insert them into an Oracle data warehouse for long term storage. This process normally would not impact the performance of anything else connected to the same storage array but in this case there happened to be corrupted log files that could not be properly decompressed. This resulted in multiple attempts to retransmit and decompress the same files.

Context and Collaboration to the Rescue

Storage teams normally don’t have access to application context and application teams normally don’t have access to storage metrics. In this case though, both teams were able to collaborate and quickly realize what the problem was as a result of having a monitoring solution that was available to everyone. The fix for this problem was really easy: just remove the corrupted files and replace them with versions without any corruption. You can see activity return to normal in the chart below.

NetApp Storage Issue After

Modern application architectures require collaboration across all silos in order to identify and fix issues in a timely manner. One of the key enablers of cross-silo collaboration is intelligent monitoring at each layer of the application and the infrastructure components that provide the underlying resources. AppDynamics provides end-to-end visibility in an analytics-based solution that helps you identify, isolate and remediate issues. Try AppDynamics for Databases and Storage for free today and bring a new level of collaboration to your organization.

What’s up with the Network and Storage teams?

A few weeks ago I was presenting at CMG Performance and Capacity 2013 and during my presentation we (myself and a few audience members) got slightly side-tracked. Our conversation somehow took a turn and became a question of why it was so hard to get performance data from the network and storage teams. Audience members were asking me why, when they requested this type of data, they were typically stonewalled by these organizations.

I didn’t have a good answer for this question and in fact I have run into the same problem. Back when I was working in the Financial Services sector I was part of a team that was building a master dashboard that collected data from a bunch of underlying tools and displayed it in a drill-down dashboard format. It was, and still is, a great example of how to provide value to your business by bringing together the most relevant bits of data from your tooling ecosystem.

This master dashboard was focused on applications and included as many of the components involved in delivering each application as possible. Web servers, application servers, middleware, databases, OS metrics, business metrics, etc… were all included and the key performance indicators (KPIs) for each component were available within the dashboard. The entire premise of this master dashboard relied upon getting access to the tools that collected the data via API or through database queries.

The only problems that our group faced in getting access to the data we needed were with the network and storage teams. Why was that? Was it because these teams did not have the data we needed? Was it because these teams did not want anyone to see when they were experiencing issues? Was it for some other reason?

I know the network team had the data we required because they eventually agreed to provide the KPIs we had asked for. This was great, but the process was very painful and took way longer than it should have. Still, we eventually got access to the data. The big problem is that we never got access to the storage data. To this day I still don’t know why we were blocked at every turn but I’m hoping that some of the readers of this blog can share their insight.

Back in the day, when my team was chasing the storage team for access to their monitoring data, there weren’t really any tools that we could find for performance monitoring of storage arrays besides the tools that came with the arrays. These days I would have been able to get the data I needed for NetApp storage by using AppDynamics for Databases (which includes NetApp performance monitoring capabilities). You can read more about it by clicking here.

Have you been stonewalled by the network or storage or some other team? Did you ever get what you were after? Based upon my experiences talking with a lot of different folks at different organizations this seems to be a significant problem today. Are you on a network or storage team? Does your team freely share the data they have? Please share your experience, insight, or questions in the comments below. Just to clarify, I hold no ill will against any of you network or storage professionals out there. I’d just like to hear some opinions and gain some perspective.

An Example of How Node.js is Faster Than PHP – Part 2

In my previous post I installed and configured Ghost (a node.js based blogging platform) and WordPress (a PHP based blogging platform and CMS). The purpose of that blog post was to test the relative performance of the two platforms to see which one could handle more load. The test doesn’t compare like code between node.js and PHP, but instead was designed to understand which platform was faster from a basic blog functionality standpoint.

The result of the first set of tests was that Ghost was 678% faster than WordPress in their “out of the box” configurations. The test and results spurred a lot of interesting dialogue with many people requesting another test where an opcode cache was in place for WordPress. So that is exactly what this next blog post is about.

The Setup

I fired up the exact servers that I had used in my last round of testing, so I had the same configuration as in my original blog post. For this set of tests I stuck with Apache as the web server for both Ghost and WordPress. I also added the APC opcode cache by following the instructions in this blog post. It was pretty easy and painless getting APC installed and functional, and it definitely made a nice difference in the performance of WordPress.
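
The linked instructions aren’t reproduced here, but getting APC running generally boils down to a PECL install plus a couple of php.ini directives; the shared-memory size below is an assumption rather than the value used in this test:

$ sudo pecl install apc

; php.ini
extension=apc.so
apc.enabled=1
apc.shm_size=64M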

The Results

As before, I used Siege to apply load to the platforms. As a reminder of our WordPress baseline I ran a load test on Apache + WordPress first without the APC opcode cache. Those results are shown below.

Apache+Wordpress-NoCache-HeavyLoad

Apache+Wordpress under heavy load.

Wordpress-Heavy-Load-CPU

CPU utilization during Apache + WordPress load test.

This load test resulted in 100% CPU utilization just as we had seen in my last blog post. I load tested Apache + Ghost again so that we could compare the base configurations and those results are shown below.

Apache+Ghost-HeavyLoad

Apache + Ghost heavy load test results.

Ghost-Heavy-Load-CPU

CPU utilization during Apache + Ghost heavy load test.

As expected, Ghost had much higher transactional throughput: ~654% more than WordPress. So now came the real fun: configure PHP to use APC, restart Apache, and restart the load test. Those results are shown below.

Apache+Wordpress+APC-HeavyLoad

Apache + WordPress + APC heavy load test results.

Much better results for WordPress this time, with a ~159% improvement in throughput over WordPress without an opcode cache. Transaction response times were also much better, showing a ~70% reduction in the shortest response time and a ~63% reduction in the longest response time. That’s a nice performance gain for a small bit of work installing and configuring APC. I have included a couple of screenshots for those who are curious about key cache metrics (notice the high cache hit rate)…

APC Cache Info

APC Cache Hit Rate

While the improvement to WordPress was admirable, the fact still remains that Ghost handled the load far better than WordPress. The results of this test show Ghost with a ~190% lead over WordPress when it comes to total throughput, ~51% faster for the shortest response time, and ~80% faster for the longest response time.

It’s worth mentioning that the CPU load did not decrease while using the opcode cache during this test. Utilization stayed pegged at 100% for the duration of the test even though throughput and responsiveness improved.

What about lighter loading?

It’s also interesting to understand the difference in platform response time under light loading conditions. The following screen shots all show loads of 10 concurrent users in batches that are spaced 5 seconds apart. The combination of Apache and Ghost is just flat out fast and sets the bar for transaction response time with .01 seconds for the fastest transaction and .07 seconds for the slowest transaction.
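
The exact Siege invocation isn’t shown in the post; a light-load run like the one described might look roughly like the command below, where -c sets the number of concurrent users, -d the maximum random delay between requests, and the URL and repetition count are placeholders:

$ siege -c 10 -d 5 -r 20 http://your-blog.example.com/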

Apache+Ghost-LightLoad

Apache + Ghost light load test results.

Apache and WordPress without any opcode cache (shown below) is respectably fast, coming in at .20 seconds for the fastest transaction and .66 seconds for the slowest. That is 1900% and 842% worse than Ghost, respectively. The percentages are high, but the reality is that the page loads are still fast.

Apache+Wordpress-LightLoad

Apache + WordPress light load test results.

Adding the APC opcode cache to the Apache and WordPress combination clearly makes pages load faster even under light load. You can see below that the fastest transaction took .07 seconds and the slowest took .25 seconds. That’s a very nice improvement in speed. It’s still considerably slower than Ghost response times but at these speeds nobody will notice the difference.

Apache+Wordpress+APC-LightLoad

Apache + WordPress + APC light load test results.

Conclusion

One of the major differences between these two platforms is that Ghost was designed to be just a blogging platform, so it is not bloated like WordPress is these days. I love the functionality that WordPress offers, but as far as plain old blogging platforms go, I think Ghost is going to be pretty tough to beat if you need a high-throughput platform.

No matter what programming language is used on a project, there will always be good code and bad code. By that I mean code that is efficient and effective (good) versus code that is resource heavy and potentially buggy (bad). If your application isn’t performing the way you want it to or the way the business needs it to, then you should try installing AppDynamics for free and figure out what the problems are.

AppDynamics goes to QCon San Francisco

AppDynamics is at QCon San Francisco this week for another stellar event from the folks at InfoQ. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. If you are in the area this week stop by our booth and say hello!

I presented the Performance Testing Crash Course, highlighting how to capacity plan and load test your applications to guarantee a smooth launch.

Take five minutes to get complete visibility into the performance of your production applications with AppDynamics Pro today.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

PHP Performance Crash Course, Part 2: The Deep Dive

In the first post of this series I covered some basic tips for optimizing performance in PHP applications. In this post we are going to dive a bit deeper into the principles and practical tips for scaling PHP.

Top engineering organizations think of performance not as a nice-to-have, but as a crucial feature of their product. Those organizations understand that performance has a direct impact on the success of their business.

Ultimately, scalability is about the entire architecture, not some minor code optimizations. Oftentimes people get this wrong and naively think they should focus on the edge cases. Solid architectural decisions like doing blocking work in the background via tasks, proactively caching expensive calls, and using a reverse proxy cache will get you much further than arguing about single quotes versus double quotes.

Just to recap some core principles for performant PHP applications:

The first few tips don’t really require elaboration, so I will focus on what matters.

Optimize your sessions

In PHP it is very easy to move your session store to Memcached:

1) Install the Memcached extension with PECL

pecl install memcached

2) Customize your php.ini configuration to change the session handler


session.save_handler = memcached
session.save_path = "localhost:11211"

If you want to support a pool of memcache instances you can separate with a comma:

session.save_handler = memcached
session.save_path = "10.0.0.10:11211,10.0.0.11:11211,10.0.0.12:11211"

The Memcached extension has a variety of configuration options available; see the full list on GitHub. The ideal configuration I have found when using a pool of servers:

session.save_handler = memcached
session.save_path = "10.0.0.10:11211,10.0.0.11:11211,10.0.0.12:11211"

memcached.sess_prefix = "session."
memcached.sess_consistent_hash = On
memcached.sess_remove_failed = 1
memcached.sess_number_of_replicas = 2
memcached.sess_binary = On
memcached.sess_randomize_replica_read = On
memcached.sess_locking = On
memcached.sess_connect_timeout = 200
memcached.serializer = "igbinary"

That’s it! Consult the documentation for a complete explanation of these configuration directives.

Leverage caching

Any data that is expensive to generate or query and long-lived should be cached in-memory if possible. Common examples of highly cacheable data include web service responses, database result sets, and configuration data.
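
As a simple illustration (not from the original post), caching an expensive lookup with the Memcached extension might look like the sketch below; the key name, TTL, and the loadConfigurationFromDatabase() helper are hypothetical:

// illustrative sketch: cache an expensive lookup in Memcached for 10 minutes
$cache = new Memcached();
$cache->addServer('localhost', 11211);

$config = $cache->get('app_config');
if ($config === false && $cache->getResultCode() === Memcached::RES_NOTFOUND) {
    $config = loadConfigurationFromDatabase(); // hypothetical expensive call
    $cache->set('app_config', $config, 600);   // expire after 600 seconds
}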

Using the Symfony2 HttpFoundation component for built-in http caching support

I won’t attempt to explain HTTP caching. Just go read the awesome post from Ryan Tomayko, Things Caches Do, or the more in-depth guide to HTTP caching from Mark Nottingham. Both are stellar posts that every professional developer should read.

With the Symfony2 HttpFoundation component it is easy to add caching support to your HTTP responses. The component is completely standalone and can be dropped into any existing PHP application to provide an object-oriented abstraction around the HTTP specification. The goal is to help you manage requests, responses, and sessions. Add “symfony/http-foundation” to your Composer file and you are ready to get started.

Expires based http caching flow

use Symfony\Component\HttpFoundation\Response;

$response = new Response('Hello World!', 200, array('content-type' => 'text/html'));

$response->setCache(array(
    'etag'          => 'a_unique_id_for_this_resource',
    'last_modified' => new \DateTime(),
    'max_age'       => 600,
    's_maxage'      => 600,
    'private'       => false,
    'public'        => true,
));

If you use both the request and response from the http foundation you can check your conditional validators from the request easily:


use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;

$request = Request::createFromGlobals();

$response = new Response('Hello World!', 200, array('content-type' => 'text/html'));

if ($response->isNotModified($request)) {
    $response->send();
}

Find more examples and complete documentation from the very detailed Symfony documentation.

Caching result sets with Doctrine ORM

If you aren’t using an ORM or some form of database abstraction you should consider it. Doctrine is the most fully featured database abstraction layer and object-relational mapper available for PHP. Of course, adding abstractions comes at the cost of performance, but I find Doctrine to be extremely fast and efficient if used properly. If you leverage the Doctrine ORM you can easily enable caching result sets in Memcached:


$memcache = new Memcache();
$memcache->connect('localhost', 11211);

$memcacheDriver = new \Doctrine\Common\Cache\MemcacheCache();
$memcacheDriver->setMemcache($memcache);

$config = new \Doctrine\ORM\Configuration();
$config->setQueryCacheImpl($memcacheDriver);
$config->setMetadataCacheImpl($memcacheDriver);
$config->setResultCacheImpl($memcacheDriver);

// entity metadata and proxy settings are assumed to be configured elsewhere
$entityManager = \Doctrine\ORM\EntityManager::create(array('driver' => 'pdo_sqlite', 'path' => __DIR__ . '/db.sqlite'), $config);

$query = $entityManager->createQuery('select u from Entities\User u');
$query->useResultCache(true, 60);

$users = $query->getResult();

Find more examples and complete documentation from the very detailed Doctrine documentation.

Caching web service responses with Guzzle HTTP client

Interacting with web services is very common in modern web applications. Guzzle is the most fully featured http client available for PHP. Guzzle takes the pain out of sending HTTP requests and the redundancy out of creating web service clients. It’s a framework that includes the tools needed to create a robust web service client. Add “guzzle/guzzle” to your Composer file and you are ready to get started.

Not only does Guzzle support a variety of authentication methods (OAuth 1+2, HTTP Basic, etc.), it also supports best practices like retries with exponential backoff as well as HTTP caching.


$memcache = new Memcache();
$memcache->connect('localhost', 11211);

$memcacheDriver = new \Doctrine\Common\Cache\MemcacheCache();
$memcacheDriver->setMemcache($memcache);

$client = new \Guzzle\Http\Client('http://www.test.com/');

$cachePlugin = new \Guzzle\Plugin\Cache\CachePlugin(array(
    'storage' => new \Guzzle\Plugin\Cache\DefaultCacheStorage(
        new \Guzzle\Cache\DoctrineCacheAdapter($memcacheDriver)
    )
));
$client->addSubscriber($cachePlugin);

$response = $client->get('http://www.wikipedia.org/')->send();

// the response will come from the cache if the server sends 304 Not Modified
$response = $client->get('http://www.wikipedia.org/')->send();

Following these tips will allow you to easily cache all your database queries, web service requests, and http responses.

Moving work to the background with Resque and Redis

Any process that is slow and not important for the immediate HTTP response should be queued and processed via non-blocking background tasks. Common examples are sending social notifications (like Facebook, Twitter, LinkedIn), sending emails, and processing analytics. There are a lot of systems available for managing messaging layers or task queues, but I find Resque for PHP dead simple. I won’t provide an in-depth guide as Wan Qi Chen has already published an excellent blog post series about getting started with Resque. Add “chrisboulton/php-resque” to your Composer file and you are ready to get started. A very simple introduction to adding Resque to your application:

1) Define a Redis backend

Resque::setBackend('localhost:6379');

2) Define a background task

class MyTask
{
    public function perform()
    {
        // Work work work
        echo $this->args['name'];
    }
}

3) Add a task to the queue

Resque::enqueue('default', 'MyTask', array('name' => 'AppD'));

4) Run a command-line task to process the queued tasks in the background with five workers

$ QUEUE=* COUNT=5 bin/resque

For more information read the official documentation or see the very complete tutorial from Wan Qi Chen:

Monitor production performance

AppDynamics is application performance management software designed to help dev and ops troubleshoot performance problems in complex production applications. The application flow map allows you to easily monitor calls to databases, caches, queues, and web services, with code-level detail on performance problems:

Symfony2 Application Flow Map

Take five minutes to get complete visibility into the performance of your production applications with AppDynamics Pro today.

If you prefer slide format these posts were inspired from a recent tech talk I presented:

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.