Why DevOps Often Fails in FinServ

Financial services organisations are often late adopters of new technologies and methodologies. There are a number of reasons for this: some relate to the way regulators and auditors operate in the industry, others to the difficulty of replacing old systems and practices, which often operate at such a scale that replacing them without significant disruption is rarely feasible.

Furthermore, banks and other FinServ companies are often some of the oldest and largest businesses around, making technological innovation much more challenging than it might be for smaller and newer companies.

Despite these hurdles, large FinServ firms have been investing heavily in digital transformation programmes over the last few years, as modernising the technology stack becomes a competitive necessity. These programmes are aimed at aligning the bank’s technology with recent industry standards, as well as looking for new ways to improve the use of said technology. They often involve the adoption of cloud services, improved automation, better scalability, usage of IaaS/PaaS, and adoption of DevOps methodologies.

A transformation journey of this kind is challenging in many ways. This blog will take a closer look at the DevOps adoption hurdles facing FinServ organisations in their digital transformation.

DevOps Challenges

Why is it so difficult to introduce DevOps in large FinServ companies? I have already highlighted some of the reasons why digitalising large financial services can be challenging, but what specific difficulties do organisations face?

Processes and Structure

By the nature of their business, financial services are heavily process-driven, relying on bureaucracy and structure established over a long period of time. They are also heavily regulated and audited, and as such must carefully consider potential changes such as new methodologies.

One area that might limit the success of DevOps methodologies is the segregation between development and production services. This segregation is driven by role-based access control, which in many cases doesn’t allow software developers to access production environments. As a result, DevOps teams—comprised mostly of developers—can’t access the very software they’re expected to operate in production.
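
To make this concrete, here is a minimal sketch of the kind of role-based access control involved. The role names and environments are illustrative, not any particular bank’s policy:

```python
# Hypothetical RBAC sketch: developers simply hold no role that grants
# production access, which is what separates them from the software they run.
ROLE_PERMISSIONS = {
    "developer": {"dev", "uat"},             # no production entry at all
    "production_support": {"prod"},          # Ops-only access
    "sre": {"dev", "uat", "prod_readonly"},  # one possible middle ground
}

def can_access(role: str, environment: str) -> bool:
    """Return True if the given role may log in to the environment."""
    return environment in ROLE_PERMISSIONS.get(role, set())

assert not can_access("developer", "prod")       # Dev is locked out of prod
assert can_access("production_support", "prod")  # Ops retains access
```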

Segregation is caused not only by regulation, but also by the historical structure of many FinServ companies, where management of the support groups and development teams converges only at the CIO level, which means that connecting Dev and Ops is not always feasible.

Another common theme that separates Dev and Ops is the company’s budget structure. Here’s where two acronyms come into play: Change the Bank (CTB) refers to development projects and budgets assigned for short delivery durations (usually annual projects), while Run the Bank (RTB) refers to operational budgets that get renewed annually.

RTB (or business-as-usual, BAU) support teams are usually handed over completed projects/products that are ready for operations. They will start familiarising themselves with the new products, whereas the CTB teams will most likely be dissolved or move on to develop new products. This approach can conflict with the expected continuity of DevOps methodologies, where a team shepherds products through their entire lifecycle.

Here’s another interesting angle about the way budgets are processed in financial services: budget approvals often require committed financial benefits, which means that newly funded projects are expected to drive future savings. This approach doesn’t always align well with the actual need for DevOps, which is mostly about enabling growth and continuation, and not always cost reduction.

ITIL vs. DevOps

ITIL is a set of detailed practices for IT service management (ITSM). Many articles have addressed the ITIL vs. DevOps debate—whether IT organisations should choose one over the other, or whether DevOps can work together with ITSM disciplines. I’m not taking sides on whether the two are friends or enemies; however, the way ITSM/ITIL is often implemented in FinServ might limit the benefits of DevOps.

One example is the frequency of releases, a core tenet of Agile and DevOps. It’s difficult to operate DevOps methodologies where there are frequent release freezes, or expectations that release cycles adhere to approval processes that may take weeks before a release can go ahead.
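
As an illustration, such a gate can be encoded directly in the release pipeline. This is a hedged sketch with invented freeze dates and a stubbed approval flag, not a real change-management integration:

```python
from datetime import date

# Invented year-end freeze window; in practice these dates and the approval
# status would come from the change-management system, not constants.
FREEZE_WINDOWS = [(date(2024, 12, 15), date(2025, 1, 5))]

def release_allowed(today: date, change_approved: bool) -> bool:
    """A release proceeds only outside freeze windows and with an approved change."""
    in_freeze = any(start <= today <= end for start, end in FREEZE_WINDOWS)
    return change_approved and not in_freeze

# A CI/CD pipeline would call this as its final gate:
if not release_allowed(date.today(), change_approved=False):
    raise SystemExit("Release blocked: freeze window or pending approval")
```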

One FinServ organisation I worked with was aiming to drive their digital transformation programme as a standalone development entity with full autonomy. The teams were highly innovative and started developing products using cutting-edge technology and DevOps practices.

However, when the programme started looking into the practicalities of releasing products to production, the teams had to take several steps back and reconsider their technology decisions, some of which weren’t approved as part of the organisation’s technology stack. The teams were also required to prepare support handover documentation and follow existing change management procedures. Furthermore, they weren’t allowed access to production environments.

This example highlights the fact that even cutting-edge technology programmes must consider the existing technologies and methodologies used by the organisation, as they’re not likely to be allowed to ignore them.

Skipping the Agile Phase

Since FinServ firms are often late adopters of new technology, they have opportunities to skip, or leapfrog, incremental stages that other enterprises and industries have gone through. But while there are advantages to leapfrogging a few steps, these maturity phases are often necessary for successful implementation.

Many organisations and industries went through long periods of introducing, fine-tuning and mastering agile development methodologies before complementing them with DevOps. FinServ orgs, by contrast, are trying to move from structured, waterfall-based methodologies straight to DevOps, and this transition introduces maturity risks that are not easy to overcome.

When speaking with colleagues in financial services on this topic, I often hear quotes like: “The terms ‘Agile’ and ‘DevOps’ are used as excuses for lack of planning.” I’ve also witnessed a few examples where projects were approved and funded with no clear target deliveries and dates—all on the basis that “This is Agile.”

How FinServ Can Make DevOps Work

Looking at the challenges described above, successful introduction and adoption of DevOps methodologies in FinServ sounds difficult. But being aware of these challenges—and carefully considering them as part of the digital transformation programme—can help prevent most known issues. Here are some suggestions and shared experiences.

Consider a hybrid approach: Don’t get caught up in a quest for DevOps purity; focus instead on achieving successful agile development. Introduce operational methodologies that contribute to improved agility, especially where role-based access control and segregation of duties are required by regulators.

Bring Dev and Ops closer: Focusing on DevOps can alienate production services teams, as their roles are theoretically under threat. However, the support of these teams is often critical to the success of new methodologies.

There are a few ways to achieve better relationships between existing support teams and new DevOps teams. Examples include:

  • Integrate support staff into the DevOps pods.

  • Introduce site reliability engineering (SRE) teams that, in addition to providing support services, also focus on DevOps autonomy and reduction of overhead. These teams can eventually integrate with production services.

  • Invest in skilled operations: For Ops teams to become agents of transformation, they need to have the right experience, exposure and skills. In addition to investing in experienced staff, you must provide existing staff with development opportunities.

Another way to introduce DevOps methodologies in FinServ is to identify independent business areas, projects and products that can be delivered and operated using DevOps practices.

There are a few examples in the industry where senior managers gave full backing to DevOps methods in certain projects or product lines. In these successful adoption examples, the DevOps teams managed to isolate their deliveries from the wider technology and process-related dependencies.

One caveat: Having such isolation during development, delivery and operations is rarely feasible in FinServ, so these examples are not common.

Investment in Tools

While the general use of technology in FinServ may lag behind that of other industries, budgeting for some of the best tech tools has never been a problem, and that includes DevOps tools.

The use of tools for development, deployment, testing automation, pipeline automation, analytics, etc., is a must-have for DevOps enablement. But it’s not only about the right tools, it’s also about the successful adoption and use of these tools.

A mature CI/CD process can encompass a significant part of ITSM requirements, and provide governance and confidence in the release process. And with good, well-adopted deployment tools, developers won’t need to access production servers during the release process. Successful adoption of the right monitoring and analytics tools will also provide the information that DevOps teams need to capture and troubleshoot potential faults—again, removing the need for access to production environments.
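
As a sketch of what this can look like, a pipeline stage might raise the change record and perform the deployment itself, so no human ever needs a production login. The ITSM endpoint and payload below are placeholders, not a real product API:

```python
import json
import urllib.request

ITSM_API = "https://itsm.example.internal/api/changes"  # placeholder endpoint

def open_change_record(release_id: str, description: str) -> str:
    """Log the release in the ITSM tool and return the change number."""
    payload = json.dumps({"release": release_id, "description": description}).encode()
    req = urllib.request.Request(ITSM_API, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["change_number"]

def deploy(release_id: str) -> None:
    """Runs under the pipeline's service account; developers never touch prod."""
    change = open_change_record(release_id, "Automated deployment via CI/CD")
    print(f"Deploying {release_id} under change {change}")
```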

Summary

A significant gap remains between IT industry gurus who often live and breathe DevOps, and financial services enterprises which—despite their best efforts—face many hurdles when trying to introduce these methodologies. The good news is that FinServ can make DevOps work by embracing a pragmatic approach to DevOps, improving relationships between existing support teams and new DevOps teams, investing in the right tools for DevOps enablement, and other steps.

AppDynamics has been used by many of the top financial services organisations in their own digital transformation journeys. Learn more about how AppDynamics can help accelerate your own DevOps adoption.

Best Practice for ITSM Professionals Using Monitoring and Alerting

I’ve only been with AppDynamics a few months and wish I knew 10 years ago what I know now. Businesses have come on in leaps and bounds through technology, but the fundamental enterprise IT challenges remain the same, just with increased complexity.

Ten years ago, as the Director of IT Business and Service Management at a large European investment bank, I was responsible for the IT governance and control environment. My main goal was to ensure IT had full visibility of the service it provided to the business.

The goal of the program was to ensure IT managed every business-critical application issue:

  • Restoring service while informing the customer about the problem and its business impact
  • Notifying the customer of application issues and the expected mean time to resolution (MTTR)

This seems easy enough when condensed into two sentences, and in the modern world of IT in an investment bank it really should be. However, as anyone who has lived through it knows, it’s anything but, and the tasks were more accurately:

  • Knowing all the technical intricacies for every business service
  • Knowing the underlying configuration items (CIs) that supported each technical service
  • Monitoring the performance and status of every CI
  • Knowing, every time the configuration of the IT estate changed, what impact it would have on the business service

Historically, in an ideal world

To help with our role, we deployed an application discovery and dependency-mapping tool, which continually monitored the topology of our estate. This tool populated our configuration management database (CMDB) with all the changes and reconciled them against the approved state of the estate, informing us of any deviations.
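
A toy version of that reconciliation pass, with invented CI names and versions, looks something like this:

```python
# Compare the discovered topology against the approved CMDB baseline and
# report deviations, as the dependency-mapping tool did for us continuously.
approved = {"web01": "v1.2", "app01": "v3.0", "db01": "v9.4"}       # CMDB baseline
discovered = {"web01": "v1.2", "app01": "v3.1", "cache01": "v2.0"}  # live scan

drift = {ci for ci in approved if ci in discovered and approved[ci] != discovered[ci]}
unknown = discovered.keys() - approved.keys()  # CIs nobody approved
missing = approved.keys() - discovered.keys()  # approved CIs that vanished

print(f"Version drift:  {drift}")    # {'app01'}
print(f"Unapproved CIs: {unknown}")  # {'cache01'}
print(f"Missing CIs:    {missing}")  # {'db01'}
```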

We implemented monitoring tools on all of the CIs to ensure proper performance. To receive proper notifications, we configured the tools to alert us any time there was a performance issue with any CI, in theory keeping the status of the technical and business services up to date. The IT service owner would then confirm the service was restored and create a problem record.

Once the problem record was created, the IT team would analyze it, look for the root cause of the issue and log an error record. This ideal procedure would keep IT within the bank in a balanced, controlled state. However, the situation was anything but ideal.

In principle this all seems relatively simple, but the maintenance and manual control of the environment proved unachievable:

  • The CMDB was not updated accurately
  • The alerting system was not continuously integrated
  • The technical services were not updated with changes
  • Often, the root cause analysis was not confirmed
  • It was unlikely the errors were logged
Why was this the case, considering we had (in essence) deployed an out-of-the-box ITSM environment based on ITIL best practice? Simply put, here’s why:

  • Alerting was based on static thresholds (see the sketch after this list)
  • The estate changed rapidly and we couldn’t model the CMDB quickly enough
  • The lack of dynamic baselines resulted in inaccurate alert storms and made root cause analysis impossible
  • Without knowing the root cause, we couldn’t correctly log the errors
  • No changes were made without validating the errors
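
To make the first and third points concrete, here is a minimal sketch, on synthetic numbers, of why a static threshold misses anomalies that a dynamic baseline catches:

```python
import statistics

response_ms = [110, 120, 115, 980, 130, 125, 118, 122]  # synthetic samples

# Static threshold: one fixed number for every hour of every day.
STATIC_LIMIT = 1000
static_alerts = [x for x in response_ms if x > STATIC_LIMIT]

# Dynamic baseline: alert when a sample deviates from recent behaviour.
mean = statistics.mean(response_ms)
stdev = statistics.stdev(response_ms)
dynamic_alerts = [x for x in response_ms if abs(x - mean) > 2 * stdev]

print(static_alerts)   # []    -- the 980ms spike slips under the fixed limit
print(dynamic_alerts)  # [980] -- the anomaly stands out against the baseline
```
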
How AppDynamics helps IT

Don’t get me wrong: we weren’t completely incompetent; we still had manual governance and control over the critical business processes and services. But all we really had was a state-of-the-art ITSM solution adhering to ITIL best practice, and we went about our day jobs in pretty much the same way as we had before. It was like having a Ferrari sitting in your garage collecting dust.

So this brings me back to where I am today, working at AppDynamics and a little smarter than I was 10 years ago. With AppDynamics, you can:

  • Monitor business transactions at a code level
  • Provide a continuously updated topology of the business service
  • Receive alerts based on dynamic baselines
  • Using the AppDynamics flow map, update the business and technical services to improve the overall quality
  • Easily see the root cause within the environment
  • Update the problem records in a service management toolset
If we had had AppDynamics at the bank, our lives would have been much easier and the bank would have performed optimally, instead of suffering the bottlenecks and broken flows we had mapped out.

This is the benefit of next-generation application intelligence tools. They make the important measurable, not the measurable important. Please check out my white paper on dynamic ticketing with our ServiceNow integration here.

Why BSM Fails to Provide Timely Business Insight

Business Service Management (BSM) projects have always had a reputation for overpromising and underdelivering. Most people know BSM as the alleged “manager of managers” or “single source of truth.” According to the latest ITIL definition, BSM is “the management of business services delivered to business customers.” Like much of ITIL, this description is rather ambiguous.

Wikipedia, however, currently describes BSM’s purpose as facilitating a “cultural change from one which is very technology-focused to a position which understands and focuses on business objectives and benefits.” Nearly every organization I talk to highlights being technology-focused as one of its biggest challenges, along with a desire for greater alignment with business goals. BSM should therefore be the answer everyone is looking for… it’s just a shame BSM has always been such a challenge to deliver.

Some years ago I worked as a consultant for a small startup which provided BSM software and services. I got to work with many large organizations who all had one common goal: to make sense of how well IT was supporting their business. It was a tremendous learning experience, and I frequently witnessed just how little most organizations really understood about the impact major IT events had on their business. For example, I remember working with a major European telco that held an exec review meeting on the 15th calendar day of each month to review the previous month’s IT performance. The meeting was held on that date because it took four people two weeks to collate all the information and crunch it into a “mega-spreadsheet.” That’s 40 man-days of effort to report on the previous 30-day period!

As organizations collect ever more data from a growing list of sources, more and more of the organizations I talk to are looking for solutions to manage this kind of “information-fogginess,” but they are skeptical about undertaking large-scale BSM projects because of the implementation timescale and overall cost.

Implementing BSM

I’m sure the person who first coined the term “scope creep” must have been involved in implementing BSM, as most BSM projects have a nasty habit of growing arms and legs during the implementation phase. I dread to think how many BSM projects have actually provided a return on their substantial investments.

BSM has always been a heavily services-led undertaking, as it attempts to uniquely model and report on an organization. No two organizations are structured in quite the same way; each has its own IT architecture, operating model, tools, challenges and business goals. This is why BSM projects almost always begin with a team of consultants conducting lots of interviews.

Let’s look at the cost of implementation for a typical deployment such as the European telco example I described earlier. A project of this type could easily require 100-200 man-days of professional services to deliver. Factoring in software license, training, and support and maintenance costs, the project needs to deliver a pretty substantial return in order to justify the spend:

Cost of BSM implementation:

  • Professional services (100-200 days @ $1,800 per day): $180,000 – $360,000
  • Software license: $200,000 – $500,000
  • Annual support and maintenance: $40,000 – $100,000
  • Training: $25,000
  • TOTAL: $445,000 – $985,000

Now compare that to the pre-existing cost of manually producing the monthly analysis:

Existing manual process costs:

  • Days per month creating reports: 10
  • Number of people: 4
  • Total man-days of effort per year: 480
  • Average annual salary: $45,000
  • Total working days per year: 225
  • Annual cost to generate reports: $96,000

Even with the most conservative estimates, it would take almost five years for this organization to see a return on its investment, by which time things will probably have changed enough to require a batch of additional service days to update the BSM implementation. This high cost of implementation is one reason why there is such a reluctance to take the leap of faith needed to implement such technologies.
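
For anyone who wants to check that payback figure, here is the back-of-the-envelope arithmetic using the numbers above (ignoring annual support and maintenance, which would stretch the payback even further):

```python
# Payback check using the figures from the two tables above.
bsm_cost_low, bsm_cost_high = 445_000, 985_000  # one-off implementation cost
manual_cost_per_year = 480 / 225 * 45_000       # 480 man-days at a 225-day year

print(f"Manual reporting cost: ${manual_cost_per_year:,.0f}/year")                # $96,000
print(f"Payback (best case):  {bsm_cost_low / manual_cost_per_year:.1f} years")   # ~4.6
print(f"Payback (worst case): {bsm_cost_high / manual_cost_per_year:.1f} years")  # ~10.3
```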

The most successful BSM implementations I am aware of have typically been the smaller projects, primarily focused on data visualization; but with powerful open-source reporting tools such as Graphite, Graphiti or Plotly available for free, I wonder whether BSM still has a place even in these small projects today.

What does success look like?

Fundamentally, BSM is about mapping business services to their supporting IT components. However, modern IT environments have become highly distributed, with service-oriented architectures and data dispersed across numerous cloud environments, and it is simply no longer feasible to map basic 1:1 relationships between business and IT functions. This growing complexity only increases the time and money it takes to complete a traditional BSM implementation. A simpler, more achievable approach is needed to provide meaningful business insight from today’s complex IT environments.

In 2011, Netscape founder Marc Andreessen famously captured how heavily today’s businesses depend on applications when he wrote that “software is eating the world.” These applications are built to support whatever the individual business goals are. It seems logical, then, that organizations should look into the heart of these applications to get a true understanding of how well the business is functioning.

In a previous post I described how this can be achieved using AppDynamics Real-time Business Metrics (RtBM), which lets multiple parts of an IT organization access business metrics from within these applications. By instrumenting the key information points within your application code and gathering business metrics in real time, such as the number of orders being placed or the amount of revenue per transaction, AppDynamics can enable everyone in your organization to focus on the success or failure of the most important business metrics.
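
In spirit, the instrumentation looks something like the sketch below. To be clear, this is not the AppDynamics API; it just illustrates the idea of emitting business metrics from the exact points in the code where the business events happen:

```python
# Illustrative only -- not the AppDynamics API. The idea: record business
# metrics at the moment the business event occurs in the application code.
metrics: dict[str, float] = {}

def record(metric: str, value: float) -> None:
    """Accumulate a named business metric; a real agent would ship it off-host."""
    metrics[metric] = metrics.get(metric, 0.0) + value

def place_order(order_value: float) -> None:
    # ... existing order-processing logic ...
    record("orders.placed", 1)
    record("revenue.total", order_value)

place_order(250.00)
place_order(99.99)
print(metrics)  # {'orders.placed': 2.0, 'revenue.total': 349.99}
```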

These goals are very similar to those of a traditional BSM project. However, in stark contrast to every BSM project I have ever heard of, AppDynamics can be deployed in under an hour, without the need for any costly services, as detailed in a previous blog post introducing Real-time Business Metrics.

Instead of interviewing dozens of people, architecting and building complex dependency models, and then gathering and analyzing data to make sense of what is happening, Real-time Business Metrics focuses on the key metrics that matter to your business, providing focus and a common measure of success across IT and the business.

So before you embark on a long and costly BSM project to understand what is happening in your business, why not download a free trial of AppDynamics and see for yourself: there is an easier way!

DevOps Is No Replacement for Ops

DevOps is gaining traction, according to a late-2012 study by Puppet Labs which concluded that DevOps adoption within companies grew by 26% year over year. DevOps is still misunderstood and still has tremendous room for greater adoption, but let’s be clear about one very important thing: DevOps is not a replacement for operations!

If you’re not as well versed in the term DevOps as you want to be, I suggest you read my “DevOps Scares Me” series.

Worst Developer on the Planet

The real question in my mind is where you draw the line between dev and ops. Before you get upset and start yelling that dev and ops should collaborate and not have lines between them, let me explain. If developers are to take on more aspects of operations, and operations personnel are to work more closely with developers, what duties can be shared and what duties must be held separate?

To illustrate this point I’ll use myself as an example. I am a battle-hardened operations guy. I know how to get servers racked, stacked, cabled, loaded, partitioned, secured, tuned, monitored, and so on. What I don’t know how to do is write code. Simply put, I’m probably the worst developer on the planet. I would never trust myself to write code in any language to run a business. In this particular case it comes down to the fact that I simply don’t possess the skills to be a developer, but that’s exactly my point. For DevOps to succeed, there needs to be a clear delineation of duties where expertise can be developed and applied.

Experts Needed

Operations is a mix of implementing standard processes, planning for the future, and sometimes working feverishly to solve problems that are impacting the business. Just like writing good code, providing good operational support takes experience. There are many areas where you need to develop expertise, and you shouldn’t just start doing operations work unless you have some experience and a peer review of your work plan.

You do have a work plan, don’t you? A work plan could be something as formal as a change control (ITIL is worth reading about here) or something as informal as a set of steps written on a bar napkin (hopefully not, but it’s possible). The point is that you need a plan and you need someone to review it to see if it makes sense. Here are some of the things you need to consider when creating your plan (a minimal sketch of such a plan as a data structure follows the list):

  • Implementation process – What are you actually going to do?
  • Test plan – How will you be able to tell if everything is working properly after you implement your change?
  • Backout process – How do you revert if things don’t go right?
  • Impacted application – What application are you actually working on?
  • Dependent application(s) – What applications depend upon the application you are changing?
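
Here is that sketch: a minimal, hypothetical change record that refuses to proceed until every section of the checklist is filled in and peer-reviewed:

```python
from dataclasses import dataclass, field

@dataclass
class ChangePlan:
    implementation_steps: list[str]  # what you are actually going to do
    test_plan: list[str]             # how you verify everything still works
    backout_steps: list[str]         # how you revert if things go wrong
    impacted_application: str        # what you are working on
    dependent_applications: list[str] = field(default_factory=list)
    reviewer: str = ""               # no peer review, no change

    def ready(self) -> bool:
        """A plan is ready only when every section is filled in and reviewed."""
        return all([self.implementation_steps, self.test_plan,
                    self.backout_steps, self.impacted_application, self.reviewer])
```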

The point I’m trying to make here is that DevOps does not change the fact that well-defined operations processes and practices are what keep companies’ IT departments working smoothly. Adopting DevOps doesn’t mean you can revert to the days of the wild west and run around making changes however you see fit. If you are going to adopt DevOps as a philosophy within your organization, you need to blend the best of what each practice has to offer and combine those practices in a manner that best meets your organization’s requirements and goals.

Bad operations decisions can cost companies a lot of money and significant reputational damage in a short period of time. Bad code written by inexperienced developers can have the same effect, but it can be detected before causing impact. Did someone just release new code for your application to production without testing it properly? Find the application problems and bad code before they find you by trying AppDynamics for free today.