How Anti-Patterns Can Stifle Microservices Adoption in the Enterprise

In my last article, Microservice Patterns That Help Large Enterprises Speed Development, Deployment and Extension, we went over some deployment and communication patterns that help keep microservices manageable as you use more of them. I also promised that in my next post, I’d get into how microservice patterns can become toxic, create more overhead and unreliability, and become an unmanageable mess. So let’s dig in.

First things first: Patterns are awesome.

They help formalize ideas into reusable chunks that you can distribute and communicate easily to your teams. Here are some useful things that patterns do for engineering departments of any size:

  • Make ideas distributable

  • Lower the barriers to success for junior members

  • Create building blocks

  • Create consistency in disparate/complex systems

Adopting a pattern is usually an intentional act. When we put patterns in place, we’re making a clear choice and acting on a decision to make things less painful. But not all patterns are helpful. An anti-pattern, by contrast, has the potential to create more trouble for the engineering team and the business.

The Anatomy of An Anti-Pattern

Like patterns, anti-patterns tend to be identifiable and repeatable.

Anti-patterns are rarely intentional, and usually you identify them long after their effects are visible. Individuals in your organization often make well-meaning (if poor) choices in the pursuit of faster delivery, rushed deadlines, and so on. These anti-patterns are often perpetuated by other employees who decide, “Well, this must be how it’s done here.”

In this way, anti-patterns can become a very dangerous norm.

For companies that want to migrate their architecture to microservices, anti-patterns are a serious obstacle to success. That’s why I’d like to share with you a few common anti-patterns I’ve seen repeated many times in companies making the switch to microservices. These moves eventually compromised their progress and created more of the problems they were trying to avoid.

Data Taffy

The first anti-pattern is the most common, and the most subtle in the chaos and damage it causes.

The Problem

The data taffy anti-pattern can manifest in a few different ways, but the short explanation is that it occurs when all services have full access to all objects in the database.

That doesn’t sound so bad, right?

You want to be able to create complex queries and complex data-ingestion scenarios that span many domains. So at first glance, it makes sense to have everything call what it needs directly from the database. But that becomes a problem when you need to scale an individual domain in your application. Data rarely grows uniformly across all domains; rather, it grows in bursts on individual domains, and it’s often very difficult to predict which domains will grow the fastest. The entangled data becomes a lot like taffy: difficult to pull apart. It stretches and gets stuck in the cogs of the business.

In this scenario, companies will have lots of stored procedures, complex queries embedded across many services, and object-relational mappers all accessing the database—each with its own understanding of how a domain is “supposed” to be used. This nearly always leads to data contamination and performance issues.

But there are even bigger challenges, most notably when you need to make structural changes to your database.

Here’s an example based on a real-life experience I had with a large, privately owned company, one that started small and expanded rapidly to service tens of thousands of clients. Say you have a table whose primary key is an int starting at 0. You have 2,147,483,647 objects before you’ll run out of keys—no big deal, right?

So you start building out services, and this table becomes a cornerstone object in your application that every other domain touches in some meaningful way. Before you know it, there are 125 applications calling into this table from queries or stored procedures, totaling some 13,000 references to the table. Because this is a core table, it gets a ton of data. Soon you’re at 2,100,000,000 objects with 10,000,000 new records being added daily.

That leaves roughly 47 million IDs, and at 10 million new records a day, you have four days before things go bad—real bad.

You try adding negative values to buy time, not realizing that half the services have hard-coded rules that IDs must be greater than 0. So you bite the bullet and manually scrub through EVERY SERVICE to find every instance of every object that uses this data, then update the type from a 32-bit integer to a 64-bit integer. You then have to update several other tables, objects, and stored procedures with foreign key relationships. This becomes a hugely painful effort, with all hands on deck frantically trying to keep the company’s flagship product from being DOA.

Clearly, not an ideal scenario for any company.

Now if this domain were contained behind a single service, you’d know exactly how the table is being used and could find creative solutions to maintain backwards compatibility. After all, you can do a lot with code that you simply can’t do by changing a data value in a database. For instance, in the example above, there are about 800 million available IDs that could be reclaimed and mapped to new inserts, which would buy enough time for a long-term plan that doesn’t require a frantic, all-hands-on-deck approach. This could even be combined with a two-key system based on a secondary value used to partition the data effectively. In addition, there’s one partitionable field we could use to give us 10,000x more available integers, as well as a five-year window to create more permanent solutions with no changes to any consuming services.

This is just one anecdote, but I have seen this problem consistently halt scale strategies for companies at crucial times of growth. So try to avoid this anti-pattern.

How to Solve

To solve the data taffy problem, you must isolate data to specific domains, accessible only through services designed to serve them. The data may start out in the same database, but you use schemas and access policies to limit access to a single service. This lets you change databases, create partitions, or move to entirely new data storage systems without any other service or system having to know or care.
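As a rough illustration, here’s a minimal Java sketch of that isolation. All of the names (AccountService, the accounts domain) are invented for the example: the domain service is the only code that touches its own storage, so consumers depend on the service contract rather than on the table.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: the accounts domain owns its data outright. Other
// services never touch the accounts tables; they call AccountService, so the
// storage can be re-partitioned or swapped without consumers knowing or caring.
public class AccountService {

    // Stand-in for the private schema only this service is allowed to access.
    private final Map<Long, String> accountsTable = new ConcurrentHashMap<>();
    private final AtomicLong idSequence = new AtomicLong();

    public long createAccount(String ownerName) {
        long id = idSequence.incrementAndGet();
        accountsTable.put(id, ownerName);
        return id;
    }

    public String getOwnerName(long accountId) {
        return accountsTable.get(accountId);
    }

    public static void main(String[] args) {
        AccountService accounts = new AccountService();
        long id = accounts.createAccount("ACME Corp");
        // A consuming service sees only the contract, never the table.
        System.out.println(accounts.getOwnerName(id));
    }
}

In a real system the in-memory map would of course be a schema whose credentials only this service holds—which is exactly what turns a key-type change from a 125-application problem into a one-service problem.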

Dependency Disorder

Say you’ve switched to microservices, but deployments are taking longer than ever. You wish you had never tried to break down the monolith. If this sounds familiar, you may be suffering from dependency disorder.

The Problem

Dependency disorder is one of the easiest anti-patterns to detect. If you have to know the exact order in which services must be deployed to keep them from failing, it’s a clear signal that the dependencies have become nested in a way that won’t scale well. Dependency disorder generally comes from domains calling sideways, from one domain’s stack to another, instead of calls flowing down the stack from the UI to the gateway and then to the services behind the gateway. Another big problem resulting from dependency disorder: unknown execution paths that take arbitrarily long to execute.

How to Solve

An APM solution is a great starting point for resolving dependency disorder problems. Try to utilize a solution that provides a complete topology of your service execution paths. By leveraging these maps, you can make precision cuts in the call chain and refocus gateways to make fan-out calls that execute asynchronously rather than doing sideways calls. For some examples of helpful patterns, check out part one of this series. Ideally, we want to avoid service-to-service calls that create a deep and unmanageable call stack and favor a wider set of calls from the gateway.
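To make the fan-out idea concrete, here’s a hedged Java sketch—the gateway and both downstream service calls are invented for illustration—in which the gateway issues the calls in parallel and composes the results, instead of one domain service calling sideways into another:

import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: the gateway fans out to two domain services in parallel
// and composes the result, instead of ServiceA calling ServiceB sideways.
public class OrderPageGateway {

    // Stand-ins for calls to two independent domain services.
    private CompletableFuture<String> fetchCustomer(long customerId) {
        return CompletableFuture.supplyAsync(() -> "customer-" + customerId);
    }

    private CompletableFuture<String> fetchOrders(long customerId) {
        return CompletableFuture.supplyAsync(() -> "orders-for-" + customerId);
    }

    public String buildOrderPage(long customerId) {
        CompletableFuture<String> customer = fetchCustomer(customerId);
        CompletableFuture<String> orders = fetchOrders(customerId);
        // Both calls are in flight at once; the gateway joins the results,
        // keeping the call graph wide and shallow rather than deep.
        return customer.join() + " / " + orders.join();
    }

    public static void main(String[] args) {
        System.out.println(new OrderPageGateway().buildOrderPage(42L));
    }
}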

Microlith

Microliths are basically well-meaning, cleanly separated services that take dependency disorder to its maximum entropic state.

The Problem

Imagine having a really well-designed service, database and gateway implementation that you decide to isolate into a container—you feel great! You have a neat-and-tidy set of rules for how data gets stored, managed and scaled.

Believing you’ve reached microservice nirvana, you breathe a sigh of relief and wait for the accolades. Then you notice the releases gradually start taking longer and longer, and that data-coupling is happening in weird ways. You also find yourself deploying nearly the entire suite of services with every deployment, causing testing issues and delays. More and more trouble tickets are coming in each quarter, and before you know it, the organization is ready to scrap microservices altogether.

The apparent promise of microservices is that there are no rules—you just put out whatever you want. The problem is that without a clear definition of how data flows down the stack, you’re basically creating a hybrid of the data taffy and dependency disorder problems.

How to Solve

The remediation process here is effectively the same as with dependency disorder. If you are working with a full-blown microlith, however, it will take some diligence to get back on stable footing. The best advice I can give is to try to get to a point where you can deploy a commit as soon as it’s in. If your automation and dependency order are well aligned, new service features should always be ready to roll out as soon as the developer commits to the code base. Don’t stand on formality. If this process is painful, do it more. Smooth out your automated testing and deployment so that you can reliably get commits deployed to production with no downtime.

Final Thoughts

I hope this gets the wheels spinning in your head about some of the microservices challenges you may be having now or setting yourself up for in the future. This information is all based on my personal experiences, as well as my daily conversations with others in the industry who think about these problems. I’d love to hear your input, too. Reach out via email, chase.aucoin@appdynamics.com; Twitter, https://twitter.com/ChaseAucoin; or LinkedIn, https://www.linkedin.com/in/chaseaucoin/. If you’ve got some other thoughts, I’d love to hear from you!

Microservice Patterns That Help Large Enterprises Speed Development, Deployment and Extension

This is the first in a two-part series on microservice patterns and anti-patterns. In this article, we’ll focus on some useful patterns that, when leveraged, can speed up development, deployment, and extension. In the next article, we’ll focus on how microservice patterns can become toxic, create more overhead and unreliability, and become an unmanageable mess.

Microservice patterns assume a large-scale, enterprise-style environment. If you’re working with only a small set of services (one to five), you won’t feel the positive impact as strongly as organizations with 10, 100, or 1,000+ services. My biggest piece of advice for startups and smaller projects is to not overthink things or add complexity for complexity’s sake. Patterns are meant to aid in solving problems—they are not a hammer with which to bludgeon the engineering team. So use them with this in mind.

Open for Extension, Closed for Modification

We’re going to start our patterns talk with a principle rather than a pattern. Software development teams working on microservices get more mileage out of this one principle than almost any other pattern or principle. It’s a classic principle from SOLID, popularized by Robert C. Martin (Uncle Bob).

In short, being open for extension and closed for modification means leaving your code open to add new functionality via inheritance but closed for direct modifications. I take a bit of a looser definition that tends to be more pragmatic. My definition is, “Don’t break existing contracts.” It’s fine to add methods but don’t change the signature of existing methods. Nor should you change the functionality of an established method.

Why is This Pattern So Powerful in Microservices?

When you have disparate teams working on different services that have to interoperate with one another, you need a certain level of reliability. I, as a consumer of a service, need to be able to depend on the service in the future, even as new features are added.

How Do We Manifest this Pattern?

Easy: don’t break your contracts. There’s never a good reason to break an existing production contract. However, there could be lots of good reasons to add to it, thereby keeping it “open for extension and closed for modification.” For example, if you have to start collecting new data as part of a service, add a new endpoint and set a timeline to deprecate the old service call, but don’t do both in a single step. Likewise with data management: if you need to rename a column of data, just add a new column and leave the old column in place for a while. When you deprecate an old service, that’s a good time to do any clean-up that goes with the deprecation. If you can adhere to this principle, everyone in your organization will have a better development experience.
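Here’s a minimal sketch of what that looks like in code, assuming a hypothetical InvoiceService contract: the existing method keeps its exact signature, and the new data need arrives as a new method (or new versioned endpoint) that can be rolled out and adopted on its own timeline.

// Hypothetical sketch of "don't break existing contracts": new functionality
// is added as a new method/endpoint; the existing signature never changes.
public interface InvoiceService {

    // Existing contract: consumers already depend on this exact signature,
    // so its parameters, return type, and behavior stay fixed.
    InvoiceSummary getInvoice(long invoiceId);

    // Extension: a new data need gets a new call. If getInvoice is ever
    // retired, it is deprecated on a published timeline, not changed in place.
    InvoiceDetails getInvoiceWithTaxBreakdown(long invoiceId);
}

// Data carriers for the old and extended contract shapes.
record InvoiceSummary(long id, double total) {}
record InvoiceDetails(long id, double total, double tax) {}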

Pattern: Enterprise Services with SPA Gateways

When we start building out large applications and moving towards a microservice paradigm, the issue of manageability quickly rises to the surface. We have to address manageability at many layers, and also must consider dependency management. Microservices can quickly become a glued-together mess with tightly coupled dependencies. Instead of having a monolithic “big ball of mud,” we create a “mudslide.”

One way to address these problems is to introduce the notion of Enterprise Domain Services that are responsible for the tasks within different domains in your organization, and then combine that domain-specific logic into more meaningful activities (i.e., product features) at the Single Page Application (SPA) gateway layer. The SPA gateway takes some subset of the overall functionality of an application (i.e., a single page worth), codifies that functionality, and delegates the “hard parts” (persistence, state management, third-party calls, etc.) to the associated enterprise services. In this pattern, each enterprise service owns its own data, whether as a single database, a collection of databases, or a dedicated schema within a larger enterprise database.
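Here’s a small, hypothetical Java sketch of that layering (the customer and billing domain services are stand-ins): the SPA gateway owns no persistence or business rules of its own, it simply composes two enterprise domain services into the model one page needs.

// Hypothetical sketch: a SPA gateway endpoint composes two enterprise domain
// services into one page model; persistence and rules live in the domains.
interface CustomerDomainService {          // enterprise service: owns customer data
    String customerName(long customerId);
}

interface BillingDomainService {           // enterprise service: owns billing data
    double outstandingBalance(long customerId);
}

record AccountPageModel(String name, double balance) {}

public class AccountPageGateway {
    private final CustomerDomainService customers;
    private final BillingDomainService billing;

    public AccountPageGateway(CustomerDomainService customers, BillingDomainService billing) {
        this.customers = customers;
        this.billing = billing;
    }

    // One gateway call per page worth of functionality.
    public AccountPageModel accountPage(long customerId) {
        return new AccountPageModel(
                customers.customerName(customerId),
                billing.outstandingBalance(customerId));
    }

    public static void main(String[] args) {
        AccountPageGateway gateway = new AccountPageGateway(
                id -> "Jane Doe",        // stub domain services for the sketch
                id -> 125.50);
        System.out.println(gateway.accountPage(7L));
    }
}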

Pattern: SPA Services with Gateway and ETL

Now we are going to ramp up the complexity a bit. One of the big questions people run into when they start down the microservices path is, “How do I join complex data?” In the Enterprise Services with SPA Gateways example above, you would just call into multiple services. This is fine when you’re combining two to three points of data, but what about when you need really in-depth questions answered? How do you find, for instance, all the demographic data for one region’s customers who had invoices for the green version of an item in the second quarter of the year?

This question isn’t incredibly difficult if you have all the data together in a single database, but then you start violating single-responsibility principles pretty fast. The goal, then, is to delegate that responsibility to a service that’s really good at just joining data via ETL (Extract, Transform, Load). ETL is a pattern for data warehousing where you extract data from disparate data sources, transform the data into something meaningful to the business, and load the transformed data somewhere else for utilization. The team that owns the domain asking these types of demographic questions is responsible for the care and feeding of the services that perform the ETL, the database or schema where the transformed data is stored, and the service(s) that provide access to it.
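As a rough sketch of the shape of such a job, here’s a hypothetical Java example (the Customer, Invoice, and reporting-row types are invented): it extracts records from two source domains, transforms them into a denormalized row built for the demographic question, and loads that row into storage that only the reporting domain’s services read.

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an ETL job owned by the reporting/demographics domain.
public class DemographicsEtlJob {

    record Customer(long id, String region) {}                      // extracted from the customer domain
    record Invoice(long customerId, String sku, String quarter) {}  // extracted from the invoicing domain
    record ReportRow(String region, String sku, String quarter) {}  // transformed, denormalized row

    private final List<ReportRow> reportingTable = new ArrayList<>(); // stand-in for the warehouse

    public void run(List<Customer> customers, List<Invoice> invoices) {
        for (Invoice invoice : invoices) {
            customers.stream()
                    .filter(c -> c.id() == invoice.customerId())
                    .findFirst()
                    // Transform: join the two domains' data into one shape built for the question.
                    .ifPresent(c -> reportingTable.add(
                            new ReportRow(c.region(), invoice.sku(), invoice.quarter())));
        }
    }

    public static void main(String[] args) {
        DemographicsEtlJob job = new DemographicsEtlJob();
        job.run(
                List.of(new Customer(1, "EMEA")),
                List.of(new Invoice(1, "WIDGET-GREEN", "Q2")));
        System.out.println(job.reportingTable);
    }
}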

Why Not Just Make a Multi-Domain Call at the Database?

This is a fair question, and on a small project it may be reasonable to do so. But on large projects with lots of moving parts, each part must be able to move independently. If we combine directly at the DB level, we’re pretty much guaranteeing that the data will only ever travel together on that single DB, which is no big deal with small volumes of data. However, once we start dealing with tens, hundreds, or thousands of terabytes, it becomes a much bigger deal because it greatly impacts the way we scale domains independently. Using ETLs and data warehousing strategies to provide an abstraction layer on the movement and combination of our data might require us to update our ETL if we move the data around, but that feat is much more manageable than trying to untangle thousands of nested stored procedures across every domain.

Closing Thoughts

Remember, these are just some of the available patterns. Your goal here is to solve problems, not create more problems. If a particular pattern isn’t working well for your purpose, it’s okay to create your own, or mix and match.

One of the easiest ways to get a handle on large projects with lots of domains is to use tools like AppDynamics with automatic mapping functionality to get a better understanding of the dependency graph. This will help you sort out the tangled mess of wires.

Remember, the best way to eat an elephant is one bite at a time.

In my next blog, we’ll look at some common anti-patterns, which at first may seem like really good ideas. But anti-patterns can cause problems pretty quickly and make it difficult to scale and maintain your projects.

Common Application Problems and How to Fix Them: The Select N + 1 Problem

At AppDynamics we get the opportunity to see the inner workings of a lot of applications. While these applications all seem pretty different to their end users, what’s under the hood usually doesn’t vary that much (sorry, your app isn’t a unique snowflake after all). They all have similar service-oriented architectures using a variety of databases, caches, and queues. This holds true for the performance issues these apps experience and the anti-patterns that cause them. The same problems show up in high-speed trading applications, e-commerce sites, mobile apps and online games – so we thought we’d put together some of the most common performance problems in a blog series to show you how to find, fix and prevent them. In this blog post we’ll take a look at a pretty common problem that can be tricky to detect in a large application: the Select N + 1 problem.

What is it?

The N + 1 problem is a performance anti-pattern in which an application makes N + 1 database calls (where N is the number of objects fetched). Like most antipatterns, this isn’t necessarily a problem in itself, but under certain circumstances (where N is large, for example) it will cause performance to degrade by making hundreds or even thousands of database calls for a single business transaction.

In plain English: You’re spamming your database with really small, fast queries instead of using one or two more complex ones.

Here’s how it usually goes down: You have two database tables with a parent/child relationship (like blogs and posts, or products and line items), and you want to iterate through all of them. So you do this:

SELECT id FROM Parent

and then execute a query for each record:

SELECT * FROM Child WHERE parent_id = ?

This isn’t necessarily a bad way of doing it, especially if there aren’t many parents or children. But what if you’re a giant e-commerce company with thousands of products and line items? Suddenly each transaction is calling the database thousands of times. Even if each database call is super fast, the cumulative response time of that transaction will be seconds, if not longer—for example, 5,000 queries at just one millisecond each add five full seconds. Not an ideal situation, and the problem only gets worse as traffic increases.

How to find it with AppDynamics

Like I said, this isn’t always a problem, but when it is, it can be hard to find. The database will probably look normal to the DBA: they won’t see any long-running queries, and CPU may look fine under normal load. The best way to find a problem like this is to use a performance monitoring solution that lets you drill into a particular Business Transaction (user request) and see what code is executed and how it accesses the database.

Here’s an example from an AppDynamics customer:

The best part about AppDynamics is its request snapshots. They not only allow the team to troubleshoot performance problems faster, they also help them perfect their code. “AppDynamics allows us to see what is going on and identify the issues that could be refactored and made faster,” he said. “It lets us bridge the gap between anecdotes from users and actual, actionable information.” – Cornell University


How to fix it

This problem happens most often when you’re using a persistence engine or an object-relational mapper (ORM) like Hibernate with lazy loading enabled. Be sure to understand the defaults of any object-relational mapper before you begin using it.
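For the ORM case, here’s a hedged sketch using standard JPA (which Hibernate implements); the Parent and Child entities mirror the tables above and are assumptions for the example. With a lazy one-to-many, iterating over parents and touching each parent’s children triggers one extra query per parent; a fetch join retrieves everything in a single round trip.

import jakarta.persistence.*;
import java.util.List;

// Hedged JPA sketch: a lazy @OneToMany produces the N + 1 pattern when the
// children are accessed in a loop; a fetch join loads parents and children
// together in one query.
@Entity
class Parent {
    @Id @GeneratedValue Long id;
    String name;

    @OneToMany(mappedBy = "parent", fetch = FetchType.LAZY)
    List<Child> children;
}

@Entity
class Child {
    @Id @GeneratedValue Long id;

    @ManyToOne Parent parent;
}

class ParentQueries {
    // N + 1: one query for the parents, then one more per parent when each
    // children collection is lazily initialized.
    static List<Parent> naive(EntityManager em) {
        return em.createQuery("SELECT p FROM Parent p", Parent.class).getResultList();
    }

    // Fix: fetch the association eagerly for this query only, in a single round trip.
    static List<Parent> withFetchJoin(EntityManager em) {
        return em.createQuery(
                "SELECT DISTINCT p FROM Parent p LEFT JOIN FETCH p.children",
                Parent.class).getResultList();
    }
}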

If you’re just writing raw SQL, you may want to fetch all your data at once and then join the two sets of records, like this:


SELECT p.id, p.name, c.id AS child_id, g.id AS grandchild_id
FROM Parent p
LEFT OUTER JOIN child c ON p.id = c.parent_id
LEFT OUTER JOIN grandchild g ON c.id = g.parent_id

Optimizing database access

Many times, how an application accesses the database will be a focus of optimization. Stay tuned for our next blog post in the series, where we’ll discuss the importance of caching database access. If you enjoyed this blog post, check out our ebook on Java performance problems.

Find out more about AppDynamics Pro and get started optimizing your application with a free 15-day trial.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Thoughts? Let us know on Twitter @AppDynamics!