After adopting microservice architecture, whether from the ground up or after splitting a monolith into microservices, many engineers are wondering what’s next. Susan Fowler (author of Production-Ready Microservices) introduces some of the challenges that microservice architecture presents, and then argues that architectural, operational, and organizational standardization is necessary to mitigate the tradeoffs that come with microservices.
Austin: Coming up next, we’ve got Susan Fowler, who worked at CERN for a little while, which I think is cool. She was at Uber as an SRE and now is at Stripe doing stuff that she’s not going to tell anybody about, apparently, which sucks. She also wrote Production-Ready Microservices, which O’Reilly has published. She’s going to talk about microservice standardization. Go ahead.
All right. So, I’m sorry I’m the last thing standing between everyone and beer and hanging out, but I’m going to try to make this as exciting as possible. If you’re sick of microservices, you’re going to love this talk. If you’re not sick of microservices, you’re still going to love this talk. So, I feel like it’s a win-win for everyone. I’ll go ahead and get started.
A little bit about me … I started off thinking I was going to do particle physics forever. I never expected to be talking to anyone about software infrastructure. Turns out there are no jobs in physics, at all. I worked at CERN, on the ATLAS experiment and on the CMS experiment for a little while. I did some hardware, did some software, did some data analysis, and then left that for the tech world. I worked at two small startups before joining Uber about a year and four months ago, I believe.
While I was at Uber, I got to work on a really, really crazy project. Basically, when I got there they had around 1000 microservices, and it was kind of a mess, to be honest. They had some really high-priority microservices that were doing really well. They had a lot of attention from our SREs. They were really available, really reliable, really stable. But then there were all these other teams, and there just weren’t enough SREs to go around to dedicate to those teams.
What happened was that a small task force was formed, called the SRE Consulting Team. I was one of its founding members. We were given a directive from on high, and they said, All right, your job is to find a way to make all the other Uber microservices scalable and stable and reliable. That was, at the time, about 850 microservices that needed that. It quickly rose to about 1800 by the time this whole process was done. I’ll get a little into that later.
I’m no longer at Uber. I’m now at Stripe, working on some special projects that I will be announcing soon and that you will all think are really cool. I also wrote a book, as Austin pointed out, called Production-Ready Microservices. The book is about everything that I’m going to talk about today, except it goes into way more detail. Sadly, I can’t go into all the implementation details here, which are the really cool details that you probably all want to know. However, you do not have to spend a bunch of money buying my book if you don’t want to or if the cost is prohibitive. There is a free summary of the book called Microservices in Production that you can find if you Google Microservices in Production or look me up online. So, check that out.
All right. We’re going to start off talking about six things that every microservice organization is going to run into at scale. You might start seeing these even in a very small organization with only a few microservices. But when you start to get to something like 100, 1000, 2000 or more, you’re definitely going to run into these problems.
First challenge: organizational siloing and sprawl. I know we just talked about Conway’s Law, and we talked about Inverse Conway’s Law, which is basically that the organizational structure of the company is going to mirror its architecture. So, if you have a monolith, Inverse Conway’s Law says that your communication structure and your organizational structure are going to match that. If you have a bunch of microservices … say that you have 1000 microservices, each with its own development team, and no one really knows what’s going on in all the teams … the organizational structure is going to be basically the same as the architecture.
So what happens is that you get all these different microservice teams that are all very siloed. You get a lot of technical sprawl because nobody knows what everyone else is doing. No one knows what the best practices are for pretty much anything. They don’t know what the best practices are for, let’s say, monitoring. How should they be monitoring things? How should they be deploying? How should they be dealing with development? How should they be doing tests? Unless all these things are standardized, it’s kind of a free-for-all.
Microservice developers and development teams become just like microservices. They become really good at doing one thing: whatever is related to their microservice, and nothing else.
If a developer wants to transfer to another team, and there are 1000 other teams, or even just a few hundred, the skills that they developed on their old team might not necessarily translate. One of the things that I found in talking with developers on these teams was that they would go to another team and say, Yeah, I feel like I’m at a totally different company. This is not the same company. That shouldn’t happen, right? So you get all these communication problems.
A related problem is that, like I was saying earlier, when you have all of these development teams and all of these microservices, it’s absolutely impossible to staff every single microservice with a bunch of operations engineers who take over all of the maintenance and running of the application, plus a bunch of developers who only work on features. Imagine how bloated an organization you would end up with if you had that many services. And if you’re moving really fast, with a lot of development and new features all the time, operations engineers aren’t going to have any time to really understand the services. With developers churning out new features, the operations engineers who are trying to run and maintain everything will spend most of their time trying to catch up and figure out what’s going on.
You end up with development teams that need to shoulder all the operational tasks, something that a lot of them really aren’t used to, right? Most developers don’t know how to deal with configuration management. Most developers don’t necessarily know how to design a deployment pipeline. Not all developers know how to write really intense load tests, integration tests, and end-to-end tests.
The second challenge is that when you’re dealing with microservices at scale, you have a lot more ways to fail. If we know anything about distributed systems, we know that they can fail in pretty much every way possible, and they will fail. If there is a way the system can fail, it will fail at some point. Especially when you reach hundreds or thousands of services, every single microservice becomes a point of failure.
Of course, the goal is that it won’t be. That’s sort of the lie we tell ourselves when we adopt microservices: Oh, it’s a black box. Oh, we don’t have to deal with it. Oh, everything is very isolated, so we can sometimes treat these things as third-party services. But really, it’s a point of failure, right? So, you have to find a way to overcome that … to make sure that microservices themselves are not points of failure.
Challenge number three is competition for resources. I like to think about organizations that have adopted microservice architecture as ecosystems. They’re really, really complicated. They’re really, really delicate, and they’re just like any other ecosystem in the natural world. Hardware resources are really scarce. Engineering resources are scarce. It’s really hard to prioritize when you have hundreds of services, thousands of services … who is really important? Which team should get this extra hardware? How should we prioritize these things? Who should get headcount? None of these things are free. They’re really expensive. Engineers are expensive. Hardware is expensive. It’s really difficult to scale. You can’t just throw unlimited hardware or unlimited engineers at the problem. You might try at first, because it seems easy when you have just a few microservices, and by a few I mean a few dozen, a few hundred. Beyond that, it’s just not tenable anymore.
The fourth challenge is the list of misconceptions about microservices that so many developers in an organization will have. It’s not just developers, although it’s primarily developers; some managers have them too. These can get really dangerous. If you adopt microservices all of a sudden, or if you bring in people who have never quite worked on microservices but have heard or read a lot about them, they have these misconceptions that I’ve listed here.
A lot of people think that microservices are the Wild West: you can do whatever you want, just get the job done however you want. They think that this means free rein over architecture decisions. It doesn’t matter, right? You’re so isolated that, as long as other services can depend on you and you fulfill whatever specifications you were told you needed to fulfill, you’re fine.
Another myth is freedom to choose any programming language. I hear it all the time. People say, Oh, you write about microservices. Or, Oh, you worked on microservices. Or, Oh, we’re adopting microservices, so that means I can write everything in Haskell. No. That’s not how this works. If you wonder why that is a myth: it’s fine to write something in whatever language you want if you’re the one who’s going to write all the libraries for it. But when you start having thousands of services and you need a centralized application platform, and you need to make sure that everyone is using the same libraries and everyone is held to the same standards … every single language costs a lot. It costs a lot of engineering hours. It costs a lot of engineering resources. You need to rein that in. There’s no such thing as using any programming language you like and still being effective and efficient.
Freedom to choose any database. Another big myth … usually when people bring this up to me I say, Okay, who’s maintaining it? If it’s the microservice team, very few people have expertise in the kind of fancy databases that teams often want to use. A really good example of this is when a team says, Oh, well, we heard Cassandra is really amazing. We’re going to use Cassandra. Okay, great, but we don’t use Cassandra. It’s not one of our managed databases, so you’re going to have to maintain it on your own. To which they say, Well, this is supposed to be microservices; we’re supposed to use whatever we want. It doesn’t work like that.
Microservices are a silver bullet. There’s all this talk about how, Oh, microservices will save us from everything. We can be scalable now. We can be super efficient. We can have crazy developer velocity. Not necessarily true. You can get a lot of benefits from microservices, but I think the wrong way to look at this is to say, We’re having trouble with our system, let’s switch to microservices. Instead it’s, Oh, we’ve reached a certain scale, now it’s time to switch to microservices. It’s a step in the evolution. It’s not some big prize or a way out of engineering challenges.
So basically, all of this can be summarized, I think, by the one on the bottom, which I’m going to read out: adopting microservices means that any developer can build any service that does one thing extraordinarily well, and they can do whatever they need or want to do to build it, as long as it gets the job done. That summarizes all of the big misconceptions that organizations run into as soon as they start working with microservices.
Number five: technical sprawl and technical debt. When you have all these different teams and you’re using microservices at scale, everyone is using their favorite tools. Everyone starts to deploy with custom scripts. Everyone builds custom infrastructure. Pretty scary. There are a thousand ways to do every single thing. Anytime there’s a new service, anytime there’s a new team, they define from scratch, most of the time, how they’re going to do everything. This isn’t scalable.
What happens is that you end up with technical sprawl, which is pretty much all the different ways to do everything, scattered all around. Nobody knows what anyone else is doing, so you don’t know if your dependencies are deploying correctly. You don’t know how they’re monitoring. You don’t necessarily know that they’re reporting their availability correctly. You get a lot of technical debt because people move between teams, or new services get built to replace old services, and the old stuff is still sitting somewhere running. There are scripts running in cron jobs on some box somewhere, God knows what they’re doing, and no one knows. There’s never any team that goes and cleans that stuff up. No one wants to do that. Everyone wants to be the cool new development team that’s going to build these new features, build these new services. You get thousands of services. A bunch of them are running. Most of them are maintained. Many of them are forgotten about. Not good.
Here’s our last challenge, which is the inherent lack of trust. Imagine … put yourself in the shoes of one of the developers on one of these teams. You have done the best job possible to make sure that your service is as good as you can make it. Your team is really close-knit. You’ve tried to figure out all the best practices for how to build a microservice, how to make sure it’s stable, how to make sure it’s reliable. But you know that you can’t necessarily trust that all your dependencies are doing the same thing. You don’t know that your dependencies are going to be able to give you the data that you want when you need it, in the time that you need it. You start to get scared.
Let’s say it goes even a little bit below that. You start worrying about your infrastructure. Maybe your monitoring, or your data store … whatever teams are maintaining that on the infrastructure side, they don’t seem to be listening to you that much, or they don’t seem that transparent. You just don’t know what everyone is doing. Everyone is just so spread out. So you say, Okay, we’re going to just build our own infrastructure, right? If we can’t trust the infrastructure that everyone else is using, because we don’t know what’s going on over there or why they’re making these decisions, then we’re going to just build our own. That’s really dangerous. It’s really, really dangerous, but it happens a lot. It happens at a lot of companies. I’ve heard horror stories about it.
A way to think about this is, you have all these microservices, right? And they live in these very complex dependency chains. I have some diagrams that I’ll show you in a second that will illustrate this. They are completely reliant on each other. The idea that microservices are very independent and isolated is kind of a myth. The teams get really isolated from each other, but the microservices themselves don’t. You get these complex dependency chains. There’s no way to know that the dependencies are reliable. There’s no way to know that clients won’t compromise your own microservice. You don’t know that they won’t one day do surprise load testing in production and take your service out.
There’s no trust at the organizational level. No trust at the team level. No trust across teams. There’s no way of knowing that microservices can be trusted with production traffic, which is the real issue. The real issue is that you have production traffic coming in that, externally, customers are relying upon. They need it to work. They need all these complex services and these crazy chains to work. But, most of the time, you have no way of knowing that all the services that are running can actually be trusted with production traffic. They’re not production ready. That’s where this concept of production readiness comes in.
There is a way around these challenges. It’s standardizing these services at a crazy scale. The reality is that microservices are not isolated systems at all. The truth is that no microservice or set of microservices should ever compromise the integrity of the overall product or system. You should never have to worry that somewhere in those complex dependency chains somebody isn’t following best practices and therefore they’re going to knock down all the dominoes.
Here’s the diagram that I made. When I was trying to figure out how to think about all this, I realized that there was a really easy way to separate out the different layers of the microservice ecosystem. Let’s start from the bottom. We have the hardware layer. We like to abstract and make layer models of things. This has all of your servers, all of your databases, your resource abstraction, resource isolation, configuration management, all the host level monitoring, all the host level logging and things on that level.
Above that you have communication. You have all these hosts and then you have the communication layer which is like the web on which they talk to one another. That’s going to be network. That’s going to be DNS. That’s going to be RPC, endpoints, messaging, service discovery, service registry, load balancing. On top of that you have the platform that needs to stand between the hardware, the communication, and all of the microservices on top. I call that the application platform. That’s all your self service development tools. That’s your dev environments, your testing, your building, all the packaging, release, your deployment pipelines, your application logging, your application level monitoring, everything that’s standing between those.
On the top you have the microservices, which is only the microservices and the microservice-specific configurations. Now, notice that I separated these out really purposefully, because the microservices … that’s what the microservice development teams need to work on. Everything else needs to be abstracted away if microservice architecture is going to be successful. If you have microservice development teams writing their own stuff at the communication or application platform level, or even at the hardware level if people are doing their own configuration management, you get a lot of technical sprawl: a thousand ways to do something and no real accountability across the ecosystem. Basically, you make sure that all of this is abstracted away, and then the microservice teams only work on the microservices.
Here’s a diagram I made of an example microservice, to show what I mean when I talk about dependencies and clients. This is one microservice. Let’s say that you have a thousand microservices; this is one. It has upstream clients that need to get information from it, and they call it via some API endpoints. It’s got a database that it needs to talk to, whether that’s shared or dedicated. It has its own dependencies that it needs to get information from. It possibly has a message broker too: if it needs to process a lot of tasks, a client makes a request, the service goes and gets information from some dependencies, calculates something, and then sends those tasks over to a message broker and some distributed task workers or something like that. This is pretty complicated. I mean, it looks simple, but it’s pretty complicated.
There’s a lot going on here. I think the biggest thing to notice is that you have all these dependencies. You have enough sources of failure just in your own microservice, your database, and your task workers. But when you have these dependencies, any one of their failures is going to affect you. Let me give you a really good example of this. Let’s say you want to have four nines of availability. If dependency A has two nines, you’re never going to reach four nines. And then think about what dependency A looks like. Dependency A has its own little crazy web of complication just like this, right? It has its own architecture. It probably has its own database. It has its own dependencies that it needs to get information from. God knows if those are reliable, if those are available, if those are stable, if those are going to scale when you need to scale. It gets really complicated real fast.
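The four-nines example is simple arithmetic once you assume that failures are independent and that every dependency is a hard requirement for serving a request: the best-case availability of the whole path is the product of the availabilities along it. The sketch below is an illustration under exactly those assumptions (no retries, caches, or fallbacks); the function names are made up for this example.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def serial_availability(*availabilities: float) -> float:
    """Best-case availability of a request path that needs every component.

    Assumes failures are independent and that each dependency is a hard
    (synchronous, no-fallback) requirement for serving the request.
    """
    return prod(availabilities)


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of unavailability per year at a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR


own = 0.9999          # the service itself: four nines
dependency_a = 0.99   # dependency A: two nines

combined = serial_availability(own, dependency_a)
print(f"combined availability: {combined:.6f}")
print(f"expected downtime: {downtime_minutes_per_year(combined):,.0f} minutes/year")
```

The product lands just under 0.99, slightly worse than dependency A on its own, and no amount of work inside the service can lift it above 0.99 while A stays a hard dependency. Dependency A's own dependencies multiply in the same way, which is why the chains get "really complicated real fast."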
Here’s the solution. What I believe is the solution to all of these challenges is that you need to hold all your microservices to high architectural, operational, and organizational standards. A microservice that meets these standards is deemed production ready so that it can be trusted with production traffic. This is really high level so I’m going to go into how we do this and what it actually looks like in practice.
There are two approaches we can take. The first is local standardization. This is actually something that I’ve seen a lot of companies start doing. It’s really hard to do when you’re just doing it locally, but they determine standards on a microservice-by-microservice basis. They treat each service as an isolated system. They say, All right, what are the best requirements? What are the best practices? How can we make this the best possible service? The problem is that you don’t get real organizational and team trust. It also adds to technical sprawl and technical debt, because then you have a whole bunch of teams running around with various ideas about what they believe production readiness is. It’s not scalable, because it takes a lot of effort and time to come up with your own set of production readiness standards, and you still don’t know if all the services you depend on are production ready.
The second one is global standardization. This is where you determine standards at a very high level that apply to every single microservice at the company. This is what I did at Uber: I literally came up with standards that applied to thousands of microservices. You have to make them really general, because they need to apply to every single service, but they also need to be specific enough that they’re quantifiable and produce measurable results. So, what do I mean by this? I mean that if you decide that you need to build a specific kind of deployment pipeline for every single microservice, then the only way that you can make it a production readiness standard is if it actually makes the service better in some very quantifiable, measurable way. You would have to be able to take this structure of a deployment pipeline, and then for every single microservice that used it, you would have to see a dramatic increase in availability, reliability, or something along those lines.
The problem is that it’s very hard to determine from scratch what appropriate standards are. You can pull them out of thin air, you can look at books, but it’s pretty difficult. It’s really, really hard to figure out standards that apply to all microservices and that actually make a difference.
So, oh, it looks like I’m missing a … No, I’m missing a slide. Well, I know what I’m going to say, so sorry it won’t be up here. Basically, because it’s really hard to come up with standards, the way to approach it is to think of a goal. The goal that works really well, especially for microservice ecosystems at scale, is availability. When you’re at any large company, dealing with any large system, that’s a really good way to measure success. You say, Well, what’s our availability? But it’s really high level. It’s kind of useless for direction. If you sit down with a developer and say, Hey, go make your service more available, they’ll be like, Huh? Okay. How do I do that?
It’s not a production readiness standard, but it’s a goal. Then you think, Okay, so we want to have availability. Now, what do we need to do to get there? Well, then we can come up with some standards. The standards that we have are stability, reliability, scalability, performance, fault tolerance, catastrophe preparedness, monitoring, and documentation. I’m going to go through these and give some quick examples. I’m not sure how I’m doing on time. All right, we’re not doing too bad.
For stability and reliability, here are the guiding principles. We get increased developer velocity with microservices, so there are more changes, more deployments, more instability. Stability allows us to reach availability by giving us ways to responsibly handle changes to microservices. When we have a stable, reliable microservice, we know, at a very abstract level, that even when it’s changing, even when it’s deploying, even when it’s being developed, we can still trust it. We know it’s not going to go down. We know it’s not going to break and destroy our own services. Right?
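One way to picture the goal-to-standards structure is as a checklist: availability is the goal, the eight standards group concrete requirements, and a service counts as production ready only when it meets all of them. The sketch below is a minimal illustration, not the checklist from the book; every requirement string and the `payments` service are hypothetical placeholders, since each organization has to define its own requirements.

```python
from dataclasses import dataclass, field

# The eight standards come from the talk. The requirement under each one is a
# hypothetical placeholder: every organization has to define its own.
STANDARDS: dict[str, list[str]] = {
    "stability": ["uses the staged deployment pipeline"],
    "reliability": ["handles dependency failures"],
    "scalability": ["has a known growth scale"],
    "performance": ["has identified resource bottlenecks"],
    "fault tolerance": ["has documented failure modes"],
    "catastrophe preparedness": ["passes regular resiliency tests"],
    "monitoring": ["alerts on the key metrics"],
    "documentation": ["has reviewed architecture with clients"],
}


@dataclass
class Service:
    name: str
    met: set[str] = field(default_factory=set)  # requirements already satisfied


def readiness_gaps(service: Service) -> dict[str, list[str]]:
    """Map each standard the service falls short on to its unmet requirements."""
    gaps = {}
    for standard, requirements in STANDARDS.items():
        missing = [req for req in requirements if req not in service.met]
        if missing:
            gaps[standard] = missing
    return gaps


def is_production_ready(service: Service) -> bool:
    """A service is production ready only if no standard has unmet requirements."""
    return not readiness_gaps(service)


# Hypothetical service that has covered only two requirements so far.
payments = Service("payments", {"uses the staged deployment pipeline",
                                "alerts on the key metrics"})
for standard, missing in readiness_gaps(payments).items():
    print(f"{payments.name}: {standard} -> {missing}")
print("production ready:", is_production_ready(payments))
```

Keeping the requirements as data rather than code is a deliberate choice in this sketch: the same checker can then apply to every service in the ecosystem, which is the point of global rather than local standardization.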
A reliable microservice is one that can be trusted by its clients, its dependencies, and by the ecosystem as a whole. So, stability and reliability are linked. Most stability requirements are going to have an accompanying reliability requirement. An example of this would be a deployment pipeline. I go into this a lot more in the book and in the Microservices in Production ebook that you can get for free. Basically, one of the ways to make sure that you have a stable, reliable microservice is to make sure that you have a stable, reliable deployment pipeline.
What does this mean? It means that you have a deployment pipeline with several stages, so that you can catch any bugs before they hit production. You’ll have something like development, staging, canary, and production, so that by the time a new build is rolled out, you know it is not going to compromise any of the services. That would be a very stable and reliable deployment pipeline, which would in turn make the microservice very stable and reliable.
The next two are scalability and performance. We often think that we get scalability and performance with microservices for free, but it’s actually not true when you get to a crazy scale. Microservices need to be able to scale appropriately with increases in traffic. A lot of microservices, honestly, don’t scale. It might be a choice of language. Some programming languages are not designed for concurrency, partitioning, and efficiency. I’m not going to name any, because I don’t want to offend anyone by calling out your favorite language, but there are probably some that come to mind. Microservices that are built like this, that can’t scale with expected growth, have increased latency. They have poor availability. They have a lot of incidents and outages.
When scalability becomes a problem, it brings down a service. If you can’t handle a large number of requests, then you’re compromising everything in that dependency chain. Scalability and performance are linked in the same way that stability and reliability are linked: scalability is how many requests a microservice can handle, whereas performance is how well the service can process those tasks. So a scalable microservice is one that can handle growing numbers of requests, and a performant microservice would be one that handles requests really quickly, processes tasks efficiently, and properly utilizes resources. This would be one that has very good capacity planning, good resource abstraction, resource awareness, and knowledge of where the resource bottlenecks and limitations are.
Fault tolerance and catastrophe preparedness … here are the two other standards. These are ones that often come to mind, but it’s not exactly clear what needs to be done to make sure that services meet these requirements. We know that microservices live in these really complicated ecosystems and these complex dependency chains. They fail all the time. They fail in every way that they possibly can. Availability is our goal, right? So, to ensure it, we need to make sure that none of the ways that a microservice can fail will actually take down the system. That means we need to know all the failure modes. We need to make sure that we test for these failure modes. We need to make sure that if we can’t architect them away, we at least have backups, ways to mitigate any damage that might occur.
An example of how you would make these microservices fault tolerant and catastrophe prepared is extensive resiliency testing: a lot of code testing in the development process, load testing, and a lot of chaos testing. For every single failure mode you can think of that could affect the service, make sure that you actually push the service, in production and in real time, to fail in that way, and see how it survives.
Our last two standards are monitoring and documentation. Good monitoring allows us to know the state of the system at all times. You can’t know if your service is available, or any of the other things, if you’re not monitoring it correctly. If you want to make your services production ready, you want to make sure that they meet the monitoring standard … this would mean something like coming up with a list of key metrics that every service at your company needs to monitor, figuring out appropriate thresholds, figuring out what needs to be alerted on, and going through that whole chain. If there’s something wrong with our system, will we know?
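The staged pipeline described above boils down to one rule: a build only moves to the next stage after passing the gate of the current one, so a build that misbehaves in canary never reaches production. Here is a minimal sketch of that promotion logic. The stage names follow the talk; the check functions and build names are hypothetical stand-ins for whatever tests and metrics an organization actually gates on, and "rolling back" is only reported, not performed.

```python
from typing import Callable

# Stage order from the talk. A "check" stands in for whatever gates a stage
# (tests, error rates, latency); these checks are hypothetical.
STAGES = ["development", "staging", "canary", "production"]
Check = Callable[[str], bool]


def promote(build: str, checks: dict[str, Check]) -> str:
    """Deploy a build stage by stage; stop at the first stage whose check fails.

    Returns the last stage where the build passed its check, or "none" if it
    failed in development. A failure at canary therefore means the build
    never reaches production.
    """
    last_good = "none"
    for stage in STAGES:
        passed = checks.get(stage, lambda _b: True)(build)  # no check = pass
        if not passed:
            print(f"{build}: check failed in {stage}; rolling back, not promoting")
            return last_good
        print(f"{build}: passed {stage}")
        last_good = stage
    return last_good


# Hypothetical gate: build-42 shows an elevated error rate on the canary hosts.
checks = {"canary": lambda build: build != "build-42"}
print(promote("build-41", checks))  # reaches production
print(promote("build-42", checks))  # stops after staging
```

In a real pipeline each check would read test results or canary metrics rather than compare build names; the structure of "gate, then promote" is what makes the service stable even while it is constantly changing.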
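Capacity planning, one of the properties listed above, often starts as back-of-the-envelope arithmetic: take the projected peak traffic, divide by what one instance can safely handle, and round up. The sketch below assumes throughput scales linearly with instance count and that per-instance capacity has been measured, for example by a load test; the parameter names and the traffic numbers are hypothetical.

```python
import math


def instances_needed(peak_rps: float, rps_per_instance: float,
                     growth: float = 1.0, headroom: float = 0.3) -> int:
    """Instances needed to serve projected peak traffic with spare headroom.

    peak_rps          current peak requests per second
    rps_per_instance  capacity of one instance, e.g. measured by a load test
    growth            multiplier for expected traffic growth (1.5 = +50%)
    headroom          fraction of each instance kept free (0.3 = run at <=70%)
    """
    if rps_per_instance <= 0 or growth <= 0:
        raise ValueError("capacity and growth must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_per_instance = rps_per_instance * (1 - headroom)
    return math.ceil(peak_rps * growth / usable_per_instance)


# Hypothetical numbers: 12,000 rps at peak today, 500 rps per instance.
print(instances_needed(12_000, 500))              # today's peak
print(instances_needed(12_000, 500, growth=1.5))  # after expected growth
```

The headroom term is what keeps a traffic spike or the loss of a host from immediately turning into the latency and availability problems described above; where that number should sit is a judgment each organization makes from its own load tests.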
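Production chaos testing is normally done with dedicated tooling against real infrastructure, but the underlying idea can be shown in miniature: deliberately make a dependency fail, send traffic through, and check that the service degrades gracefully instead of falling over. The sketch below is an in-process illustration only; the price-lookup handler, its cached fallback, and all names are hypothetical.

```python
import random
from typing import Callable


class InjectedFailure(ConnectionError):
    """Simulated dependency outage raised by the fault injector."""


def inject_faults(call: Callable[[], str], failure_rate: float,
                  rng: random.Random) -> Callable[[], str]:
    """Wrap a dependency call so it fails with probability `failure_rate`."""
    def wrapped() -> str:
        if rng.random() < failure_rate:
            raise InjectedFailure("injected failure")
        return call()
    return wrapped


def get_price(fetch_price: Callable[[], str], cached: str) -> str:
    """Hypothetical handler: serve a cached value if the dependency is down."""
    try:
        return fetch_price()
    except (ConnectionError, TimeoutError):
        return cached


def run_experiment(failure_rate: float, requests: int = 1000,
                   seed: int = 0) -> dict[str, int]:
    """Send traffic through the faulty dependency and count each outcome."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    flaky = inject_faults(lambda: "live price", failure_rate, rng)
    counts = {"live price": 0, "cached price": 0}
    for _ in range(requests):
        counts[get_price(flaky, cached="cached price")] += 1
    return counts


print(run_experiment(failure_rate=0.2))
print(run_experiment(failure_rate=1.0))  # total outage of the dependency
```

If the handler did not catch the failure, the experiment itself would crash, which is the kind of finding resiliency testing is meant to surface before a real outage does.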
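The monitoring standard described above (a shared list of key metrics, thresholds, and alerts) can be sketched as data plus one evaluation function. The metric names and limits below are hypothetical; real ones should come from each organization's measured baselines. One design choice worth noting: a key metric that is not being reported at all also fires an alert, because that is exactly the case where "will we know?" has the answer no.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    limit: float
    above_is_bad: bool = True  # True for latency or error rate; False for availability


# Hypothetical company-wide key metrics; names and limits are assumptions.
KEY_METRICS = [
    Threshold("p99_latency_ms", 250.0),
    Threshold("error_rate", 0.01),
    Threshold("cpu_utilization", 0.80),
    Threshold("availability", 0.9999, above_is_bad=False),
]


def alerts(observed: dict[str, float]) -> list[str]:
    """Return one alert per key metric that is breached or not reported."""
    fired = []
    for t in KEY_METRICS:
        value = observed.get(t.metric)
        if value is None:
            fired.append(f"{t.metric}: no data (is it being monitored?)")
            continue
        breached = value > t.limit if t.above_is_bad else value < t.limit
        if breached:
            fired.append(f"{t.metric}: {value} breaches limit {t.limit}")
    return fired


observed = {"p99_latency_ms": 310.0, "error_rate": 0.002, "cpu_utilization": 0.55}
for alert in alerts(observed):
    print(alert)
```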
It also means having really good logging. Logging is part of monitoring. One of the things that I discovered in a really terrifying way with microservice architecture is that the state of the system is never the same from one second to another. So, really good logging is essential. You will almost never be able to replicate a bug that happened, ever. The only way to know what happened is to make sure that you recorded the state of the system at that time. The only way to do that is through proper logging. It’s so incredibly important. If you know the state of the system at all times through good logging and good monitoring then it makes it really easy to be able to trust your services because you know what’s going on.
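The point about recording the state of the system can be made concrete with structured logs: one JSON object per event, carrying a request ID and whatever state matters at that moment, so the event can be reconstructed and searched later. The sketch below uses Python's standard `logging` module; the field names (`service`, `request_id`, `state`) and the `orders` and `inventory` services are hypothetical conventions for this example, not a standard.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object so it can be searched later."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "service": getattr(record, "service", None),
            "request_id": getattr(record, "request_id", None),
            "message": record.getMessage(),
        }
        # Extra state attached by the caller, e.g. details of a dependency call.
        payload.update(getattr(record, "state", {}))
        return json.dumps(payload)


logger = logging.getLogger("orders")  # hypothetical service name
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # avoid duplicate lines if the root logger is configured

logger.info(
    "dependency call failed",
    extra={"service": "orders", "request_id": "req-123",
           "state": {"dependency": "inventory", "status": 503, "retry": 2}},
)
```

Because every line carries the request ID, the log entries from each service a request touched can later be joined back together, which is usually the only way to see what happened when the bug cannot be reproduced.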
Documentation is really important, and it’s one that often gets forgotten about. It removes technical debt. It makes sure that when engineers move from team to team, or when new engineers are brought in, they can learn about the system. There’s another thing that’s part of documentation that I like to call understanding. This is organizational-level understanding of the microservice. It means putting a team into a room and having them whiteboard out the architecture of their service. If they can do that, then they understand their service, right? Having them share that with their dependencies and with their clients is really important.
Unfortunately, I’m running low on time. But … oops, I’m pressing the wrong button. All right, last slide and then we’ll go on.
Okay, now you might say, Wow we just went through that really fast, none of that made sense. Oh my gosh! Two things that I want to say is, first of all, you should check out the Microservices in Production report if you’re interested in learning more. Basically, I know I just ran through those standards really quickly, but there are sets of requirements for each of them that you can take look at and that you can apply to your microservices. Once you get to that point you say, Okay, I want to try this out. I want to standardizing my microservices. There are three steps that you can do.
The first is to get buy-in from all levels of the organization. It really needs to be adopted at every level. It can't just be top down, because people will resent it and won't understand it if they're just being told that they need to build everything a certain way. You need to make sure that engineers, engineering managers, and infrastructure engineers all recognize that there is a level of misunderstanding, which they'll already know about. If you talk to developers at companies who are working on microservices, they usually say very similar things, like, Yeah, I have no idea what anyone else is doing. Or, I have no idea how that service works. Or, That service is terrible. Or, Oh, they need to fix their documentation. Or, I have no idea how my service works, which is the scariest one to hear.
Everyone has those concerns. If you talk to engineers and ask them what you can do to make sure they don't have these concerns anymore, a lot of those conversations naturally lead to talks about standardization, about making sure that everyone is following some best practices.
The next step is to determine your organization's production readiness requirements. Every organization is going to be different. Every organization is going to have a different application platform and different technologies in each layer of the ecosystem. So what makes a microservice stable at your company is going to be different from what makes a microservice stable at my company or any other company. Production readiness requirements need organizational context. You will need to sit down and think, Okay, what's our goal? What are the standards that we want to have? And then, What requirements can we add that will produce measurable results? Then try it out with a few services. See how it goes. See what improvements you can make.
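The idea of requirements that produce measurable results, piloted on a few services, can be sketched as a simple checklist audit. The `Service` fields, the requirement names, and the `audit` function below are hypothetical, loosely based on the standards mentioned in this talk (monitoring, logging, documentation, on-call ownership). Each organization would write its own list, so treat this as an illustration rather than a prescribed set.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Service:
    """Hypothetical self-reported description of one microservice."""
    name: str
    has_alerts: bool = False
    has_structured_logs: bool = False
    has_architecture_diagram: bool = False
    documented_endpoints: int = 0
    total_endpoints: int = 0
    on_call_rotation: List[str] = field(default_factory=list)


@dataclass
class Requirement:
    """One production readiness requirement with a pass/fail check."""
    name: str
    check: Callable[[Service], bool]


# Example requirements only; an organization would define its own.
REQUIREMENTS: List[Requirement] = [
    Requirement("monitoring: alerts configured", lambda s: s.has_alerts),
    Requirement("logging: structured logs", lambda s: s.has_structured_logs),
    Requirement("docs: architecture diagram", lambda s: s.has_architecture_diagram),
    Requirement("docs: all API endpoints described",
                lambda s: s.total_endpoints > 0 and s.documented_endpoints == s.total_endpoints),
    Requirement("ownership: on-call rotation of 2+ engineers",
                lambda s: len(s.on_call_rotation) >= 2),
]


def audit(service: Service, requirements: List[Requirement] = REQUIREMENTS) -> Dict[str, object]:
    """Return the failing requirements and a score in [0, 1] that can be tracked over time."""
    failing = [r.name for r in requirements if not r.check(service)]
    passed = len(requirements) - len(failing)
    return {
        "service": service.name,
        "score": passed / len(requirements),
        "failing": failing,
    }


if __name__ == "__main__":
    pilot = Service("payments", has_alerts=True, has_structured_logs=True,
                    documented_endpoints=3, total_endpoints=4, on_call_rotation=["a", "b"])
    print(audit(pilot))
```

Running an audit like this on a handful of pilot services before and after a round of improvements gives the team a number to compare. The list of failing items reads as a to-do list for the service owners rather than a deployment gate.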
Last, you need to make production readiness part of the engineering culture. It's not a hindrance; it's not a gate. One of the scariest things for developers is thinking that they need to meet some rules that seem arbitrary just to make somebody at the top happy, right? Thinking that maybe they can't deploy, or can't put new features out, just because there are some rules they have to follow … that doesn't make people happy. That doesn't help them do their best work. Honestly, these developers want to do good work. They want to build the best service possible. When you give them a set of production readiness requirements, you're not giving them a set of rules. You're not giving them a set of constraints. You're giving them a guide, and you're arming and empowering them. You're saying, Okay, you want to build the best service? Here, let's do it. Here's how you can make sure that you can trust every other service and every other service can trust you.
There are a lot of questions they don't necessarily know the answers to, like, What's the best database? Or, How should I be deploying? Or, How should I be dealing with my development environment? When you have those answers for the whole company, and they have good reasons behind them and produce actual results, then it becomes part of the culture and it will get embraced. If you want to learn more, you can always hit me up on Twitter. You can also check out my books; I write a lot of blog posts about microservices, and of course you can always talk to me at any time.
I'm going to go ahead and open it up to questions.
All right, I see a couple over there, but first: you described taking an organization whose microservices had no standardization and having to standardize them. How long did that take? How costly was that?
It's still an ongoing process. It pretty much started because there were some services that were really hurting, some teams that really needed guidance, and some teams that just needed people with experience in reliability and scalability, so my team went in and worked with these teams. We made sure that we could actually make their services better and help them, empowering them so they knew what they needed to do and what the stack below them looked like. Most developers don't know what their application platform, hardware, or communication layers look like. When you let them know those things, they feel really empowered, because then they know what they're dealing with. They know what they're building on top of.
Gradually it got picked up. It’s still an ongoing process, right. I’m sure they’re still coming up with more requirements and figuring out new ways. They probably have version 2 or version 3 out right now.
You mentioned that proper documentation is one of the crucial steps. In my experience, code changes so frequently that it's really hard to maintain and update documentation. How do you approach this problem?
Susan Fowler: Could you say that last part again?
Flynn: How do you approach the problem of keeping documentation up to date in a world where code changes as rapidly as code tends to change?
That's a really good question. I like to take a higher-level approach to documentation: keep documentation to only what is useful and relevant. A really good example of this is something that is not going to change very often: the architecture. In your documentation, you should have an architecture diagram. You should also have a description of the API endpoints. What are the endpoints? What are the responses? Stuff like that. I think that a lot of times documentation is presented as something very tedious, kind of like a postmortem for an outage. You think, I made a code change, now I need to write a postmortem. No, no, no. That's not what it is at all. Actually, one thing that I instituted back at Uber was something called architecture reviews.
Basically, we would all meet with the team and we would say, Okay, we’re all going to sit down and make sure we have an understanding of what’s going on and then we’ll update the documentation. You can schedule these things. I find that’s the most effective way.
Audience Member 2: You had shown a diagram where there was a split between hardware and operations, and microservices at the top. Isn't that kind of contradictory to DevOps, where developers are supposed to be doing the operations part as part of owning their microservices? Here we're saying that the operations team is going to be separate. Can you please explain that part?
Yeah. So basically, everyone needs to be an engineer on both the development side and the operations side, and do both of those things at every level. For example, if you have these microservice development teams, those teams are going to be, or should be, on call and responsible for all the maintenance of their own services. Then, on the application platform layer, you're going to have teams that are building out the deployment pipelines, all the logging, and all of the monitoring. They're building these, and they're on call, maintaining them and running them. So basically, I think that in these really large microservice ecosystems, ops and development need to meld into the same thing. The developers and the ops engineers need to be able to fill both of those roles.
Flynn: Anything else or is everyone just jonesing for beer now? I think I’m going to take that as everyone is jonesing for beer. Thanks very much.
Susan Fowler: Thank you so much.
Give Susan a hand. All right we did it. We’re done. Thanks everyone for coming out. Thanks to all the staff who’ve done production and have kept the place clean today. Really really amazing stuff. Thank you to all the speakers, fantastic talks. If you guys want to talk microservices but you need alcohol to do that, we’ve got that. So, we’ll do this again next year. My name is Austin, this has been a Datawire Production. You guys have a good night. Cheers.