Rapidly Updating Microservices – Screencast

Richard Li and Rafael Schloming (Datawire)

Description

This video introduces the open source Datawire Microservices Development Kit and Mission Control, which let developers quickly build, launch, and debug microservices using the frameworks they already know in Java, JavaScript, Ruby, and Python.

Transcript

Richard Li:
"We're going to talk about rapidly updating microservices. Microservices is an architectural shift for the distributed development of cloud applications. We're all familiar, I think at this point, with the concept of microservices, but the reason we talk about distributed development is because it lets different teams independently release features. With microservices, instead of one long release cycle of monolithic applications, you actually can release each of your microservices with separate release cycles that are really tailored to the duration that you actually need."

"What do developers actually need to build microservices? Here at Datawire, we've been asking ourselves this question. The answer is you don't actually need that much more. You have a modern web application stack that includes business logic, a web application framework, like Rails or Flask or Django, and a persistence layer. This actually gives you most of what you actually need to build a microservice. The key difference is that web applications don't typically interact with other web applications. They interact with users and their browsers."

"With microservices, because they interact with other microservices, they actually need an additional layer, or framework, that lets them do that. Examples of interactions with other microservices include locating other available microservices that you can communicate with. It's advertising your own availability as a service, saying, for example, that you are the payment service with a particular version, so that other services that need payments functionality can actually communicate with you. It's about logging, because now if a request goes across multiple microservices, the context for that request isn't just one microservice, it's actually multiple microservices."
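
The locate-and-advertise interactions described here can be illustrated with a minimal sketch. It is not the MDK API: `Instance`, `Registry`, `advertise`, `withdraw`, `locate`, and the example address are all hypothetical names, and the registry is a plain in-memory stand-in for a real discovery service.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Instance:
    service: str  # logical name, e.g. "payments"
    version: str  # advertised version, e.g. "1.0"
    address: str  # where callers can reach this instance


class Registry:
    """Hypothetical in-memory stand-in for a discovery service."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[Instance]] = {}

    def advertise(self, instance: Instance) -> None:
        # A service announces that it is available under a name and version.
        self._instances.setdefault(instance.service, []).append(instance)

    def withdraw(self, instance: Instance) -> None:
        # A service removes itself, for example on shutdown.
        nodes = self._instances.get(instance.service, [])
        if instance in nodes:
            nodes.remove(instance)

    def locate(self, service: str, version: Optional[str] = None) -> List[Instance]:
        # Callers look up instances by name, optionally pinned to a version.
        nodes = self._instances.get(service, [])
        return [n for n in nodes if version is None or n.version == version]


registry = Registry()
payments = Instance("payments", "1.0", "http://10.0.0.5:8080")  # made-up address
registry.advertise(payments)
found = registry.locate("payments")
```

A caller that needs payments functionality would pick one of the instances in `found` and send its request to that instance's `address`.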

"It's about error handling, so if one request doesn't actually ... one microservice doesn't respond, you actually have a plan to handle that situation. We're actually introducing the open source Microservices Development Kit that gives you all the behavior you need for those interactions with other microservices. It's actually complementary to your web application framework. What we're going to do now is we're actually going to show you the MDK in action."

Rafael Schloming:
"As we've been saying, one of the great things about microservices is you can build them using all the tools and techniques that you're familiar with, but the irony of this is that there's actually hidden danger with this. So much is the same about building microservices and building traditional web apps that it can be easy to ignore the differences. When you ignore the differences between building microservices and building traditional web services, that's where you can get into some trouble. As we've mentioned, the key differences are around interaction points with other microservices and this is where the MDK comes in."

"It's not a framework. It does not replace any of the tools you know and love. It's a library that interfaces with a number of open protocols designed to work in cooperation with your favorite language and framework and give you the tools you need to handle those interaction points safely. To show this working, we've taken the most popular frameworks in Python, Java, Ruby, and JavaScript and we have ... using the MDK, we've made it possible to build a microservice ... build a full-fledged feature as a microservice in each one of these languages in your framework of choice within a few minutes. To show you how this works, I'm actually going to start with Flask, a very popular Python web app framework, and show you how we can use the MDK to turn the hello world Flask application into a microservice."

Richard Li:
"What happens when you actually have multiple microservices? When you have multiple microservices, it's an environment that's much more dynamic. You're actually constantly updating microservices. You may have one team that's reverting a microservice to an older version for, say, performance reasons, or another team that's rolling out a new microservice to canary test a new feature. A third team might be working on updating a microservice with a bug fix, and so on and so forth. With all these constant changes, each of these changes introduces the possibility that you might actually have a bug. When you roll out a new change, it might introduce a regression in the business logic. The update might not be able to handle the load once it gets to production. Perhaps the new version actually is fully tested by itself, but once you roll it into production, you realize that there's another microservice that depends on that microservice, and that service is no longer respecting that contract, which causes the calling service to throw an exception."

"What do you do about this situation? The traditional answer to this is canary testing. What you do with canary testing is you roll out a version of the microservice, you route a small percentage of your traffic to that microservice, monitor it for errors, and gradually increase the traffic over time until you've actually migrated from one percent of the traffic to this microservice to 100%. Canary testing is a very powerful tool. Unfortunately, canary testing is really designed to help you with scale and performance testing. It helps you quickly understand whether or not your new version of the service is ready for production."
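
The traffic split behind canary testing can be sketched as weighted random routing. This is an illustration under stated assumptions, not how the MDK or Mission Control routes traffic: `ramp` and `pick_version` are hypothetical helpers, and the version labels are only examples.

```python
import random
from collections import Counter
from typing import Dict


def ramp(canary: str, stable: str, share: float) -> Dict[str, float]:
    """Build routing weights that send `share` of traffic to the canary."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return {stable: 1.0 - share, canary: share}


def pick_version(weights: Dict[str, float], rng: random.Random) -> str:
    """Choose the version that serves one request, proportionally to its weight."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


# Start by sending roughly one percent of requests to the canary.
rng = random.Random(0)  # seeded only so the sketch is repeatable
weights = ramp(canary="1.2", stable="1.1", share=0.01)
counts = Counter(pick_version(weights, rng) for _ in range(10_000))
```

Ramping up is then a matter of calling `ramp` with a larger `share` while the error rate for the canary stays acceptable, until `share` reaches 1.0.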

"However, sometimes your canary turns out to be an ostrich. By that I mean you actually canary test something, and it actually works just fine under load. The problem is there's actually a software bug. That software bug is created by a human. An example of this might be that the new version actually returns an object or a value that is not understandable by the calling service, and that calling service throws an exception. What do you do? You don't want to bury your head in the sand like an ostrich, so we're actually introducing Mission Control, which actually helps you manage and monitor multiple microservices."

"Mission Control has a dashboard, and you can see here the dashboard shows the two instances of the microservices from the hello world example. What we're going to do is we're actually going to kill both of these instances, and you're going to see Mission Control actually update. It's going to shut down. What we're going to do is we're actually going to launch a microservice application. The countdown application, which will simulate multiple microservices, four of them. You can see them starting up here. There are two and now four healthy services spread across 11 instances for high availability. The countdown service features a front end service that you can curl to. That talks to three back end services: the time service, the election service, and the Olympics service."

"The idea behind this is that each of these services represents features that you can rapidly iterate on independently. You can see here we've done a curl to the front end service, and we've gotten results from the time, election, and Olympics microservices. All of these services also have logging information. If you actually go to the log console, you'll see the request actually come across in the log console. If we look at the log console here, we can actually zoom in and look at all the log messages involved in this one request. This is one of the reasons before that we were using the MDK to actually log."

"If you just use a regular logging system, and you have 11 processes running in the cloud somewhere, all of these logs are going to go ... they're going to go to different places, and the machines they're running on are going to not necessarily have synchronized clocks. You're going to have a lot of trouble actually figuring out what log messages on one service caused log messages to occur on another service. This makes it really hard to understand what's going on in the context of a distributed request like this that is spread across so many different servers. Because the MDK is actually managing a distributed trace ID behind the scenes for you, when you use the MDK to log, it makes it really easy to take this and pass this token around amongst your services. That means we can actually construct a complete causal order for all of the log events that occurred by all the services involved in this distributed request."

"You can see the front end logs that it has received an incoming request. It logs that it's initiating a request to the time node. The time node logs that it receives an incoming request and that it responded, and so forth. You can see this is the exact same order you would expect these to occur if they were all within a single process on one machine with a single clock."
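
One common way to get a causal order without synchronized clocks is to pass a trace ID and a logical (Lamport-style) counter along with each request. The sketch below assumes that approach; the transcript does not say how the MDK implements its trace token, and `TraceContext`, `LogEvent`, `log`, and `causal_order` are hypothetical names.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class TraceContext:
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    clock: int = 0  # logical counter, not wall-clock time

    def tick(self) -> int:
        self.clock += 1
        return self.clock

    def merge(self, remote_clock: int) -> None:
        # On receiving a reply, never fall behind the remote side's counter.
        self.clock = max(self.clock, remote_clock)


@dataclass(frozen=True)
class LogEvent:
    trace_id: str
    clock: int
    service: str
    message: str


COLLECTED: List[LogEvent] = []  # stands in for a central log console


def log(ctx: TraceContext, service: str, message: str) -> None:
    COLLECTED.append(LogEvent(ctx.trace_id, ctx.tick(), service, message))


def causal_order(events: List[LogEvent], trace_id: str) -> List[LogEvent]:
    # Select one distributed request and sort its events by logical time.
    return sorted((e for e in events if e.trace_id == trace_id), key=lambda e: e.clock)


# Simulate the front end calling the time service within one trace.
front = TraceContext()
log(front, "frontend", "received incoming request")
log(front, "frontend", "initiating request to time")
# The token (trace ID plus counter) travels with the outgoing request.
time_ctx = TraceContext(trace_id=front.trace_id, clock=front.clock)
log(time_ctx, "time", "received incoming request")
log(time_ctx, "time", "responded")
front.merge(time_ctx.clock)  # the reply carries the counter back
log(front, "frontend", "received response from time")
```

Even if the events reach the log console out of order, `causal_order(COLLECTED, front.trace_id)` puts them back in the sequence in which they happened. With concurrent calls to several back ends, counters can tie, so a real system would need an additional tie-breaking rule.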

"What we're going to do now is we're actually going to show canary testing by rolling a new version of the time service, 1.2, into production. We start the time 1.2 service, and you can see when we curl to the front end that some of the results will be from the time 1.0 service, some of these will be from time 1.1, so the results will be a mix of time 1.0, 1.1, and 1.2. Some of the requests, a small percentage of requests, are actually going to the time 1.2 service. Again, as we mentioned previously, this works great as you're incrementally ramping up production. However, we're now going to show you an example where there's actually a bug. That bug is actually in the election microservice. We're going to show you what happens when that bug is an ostrich and how you actually handle that situation."

"This is election version 1.3. We actually started out with election version 1.1, which was a very shallow bug. For every request we got, we just blew up. It turned out this was actually a little bit too simple of a case. The root cause for this bug is really obvious, the symptom is right ... or the cause is right where the symptom is. This kind of bug, you generally catch with local testing. Version 1.2 of this, we actually inserted a time.sleep in this service. Again, that kind of bug, it was pretty obvious which service was misbehaving. That's actually an ideal case for something like canary testing. This version 1.3 is actually a full-on integration bug."

"The reason it's tricky is because what we've actually done here is taken the original hello world service that we started out with, and we've actually modified it to advertise itself as election version 1.3. The reason this is a problem is because every other service in this ... or all the services that depend on the election service in this topology, they're not expecting to get back this hello world string. They're expecting to get back a JSON data structure full of some useful information, but this version of the election service actually thinks it's completely healthy and running just fine. If we start this on localhost port 6000, we can actually curl directly to it and see it thinks everything is hunky-dory."

"If we actually go to our front end again, and we might have to make a few requests here, we can see when we actually hit this thing, there's an error, but if I keep hitting it, you'll see that error doesn't pop up again. That's because the MDK actually ... the circuit breaker logic in the MDK will trigger as soon as that error is actually detected. It will temporarily blacklist that particular node that the front end was talking to when that error occurred. This gives you ... this mitigates the impact of actually pushing this integration bug into production. It gives you the opportunity to go look at mission control and figure out what's actually going wrong."
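
The temporary blacklisting described here can be sketched as a per-node circuit breaker. This is a simplified illustration, not the MDK's implementation: `CircuitBreaker`, `call`, the 30-second cooldown, and the node names are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict, List


class CircuitBreaker:
    """Skips a node for `cooldown` seconds after it causes an error."""

    def __init__(self, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown = cooldown
        self.clock = clock  # injectable so the behavior can be tested
        self._blocked_until: Dict[str, float] = {}

    def record_failure(self, node: str) -> None:
        self._blocked_until[node] = self.clock() + self.cooldown

    def available(self, nodes: List[str]) -> List[str]:
        now = self.clock()
        return [n for n in nodes
                if self._blocked_until.get(n, float("-inf")) <= now]


def call(breaker: CircuitBreaker, nodes: List[str],
         request: Callable[[str], Any]) -> Any:
    """Send one request to the first node that is not currently blacklisted."""
    candidates = breaker.available(nodes)
    if not candidates:
        raise RuntimeError("no healthy nodes available")
    node = candidates[0]
    try:
        return request(node)
    except Exception:
        # The caller still sees this one error, but later calls skip the node.
        breaker.record_failure(node)
        raise


# Hypothetical setup: one misbehaving node and one healthy node.
now = [0.0]  # fake clock so the example does not need to sleep
breaker = CircuitBreaker(cooldown=30.0, clock=lambda: now[0])
nodes = ["election-1.3", "election-1.0"]


def request(node: str) -> dict:
    if node == "election-1.3":
        raise ValueError("response is not valid JSON")
    return {"node": node}
```

The first `call(breaker, nodes, request)` raises, matching the single error seen in the demo; subsequent calls are served by the healthy node until the cooldown expires and the blacklisted node is tried again.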

"You can see, I can filter down and look at ... there was actually a request with an error. I can actually see the stack trace for the error here. I can zoom in and look at that in the context of the full distributed trace. Now, as we predicted, the front end tried to parse the hello world value as a JSON object, and that didn't work out so well. It's pretty obvious in this distributed trace what's going on because, when the error was actually reported by the front end, the MDK saw that we were actually talking to election version 1.3, and it included that information in this error report. If it's not obvious to us why we blew up right there, and the cause isn't local, this gives us a big hint as to where I might go looking for changes in this particular suspiciously new version of the election service that was launched to see what the root cause of an integration bug like this might be."
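
The failure mode in this demo, a caller expecting JSON and receiving a plain string, can be reproduced in a few lines. The sketch below is an assumption-laden illustration, not the front end's actual code: `UpstreamContractError` and `parse_election_response` are hypothetical names, and the error message format is made up. It shows why attaching the upstream service name and version to the error is useful.

```python
import json
from typing import Any, Dict


class UpstreamContractError(RuntimeError):
    """Raised when an upstream service returns something the caller cannot use."""


def parse_election_response(body: str, service: str, version: str) -> Dict[str, Any]:
    """Parse an upstream reply, tagging any failure with who sent it."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        # Include the upstream name and version so the report points at the culprit.
        raise UpstreamContractError(
            f"{service} {version} returned a non-JSON body: {body!r}"
        ) from exc
    if not isinstance(data, dict):
        raise UpstreamContractError(
            f"{service} {version} returned JSON that is not an object: {body!r}"
        )
    return data
```

Calling `parse_election_response("Hello World!", "election", "1.3")` raises an `UpstreamContractError` whose message names election 1.3, which is the kind of hint the error report in the demo provides.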

"The question is how do you actually run as fast as an ostrich? When you have a software bug and it's being shipped, you actually want to cut it immediately out of rotation. Even if one percent of your requests are going to a system with a software bug like we just showed with election 1.3, that's one percent of the requests that are actually failing. A circuit breaker integrated with service discovery actually gives you that resilience. The answer to this is you want to limit the impact of the failure with something like a circuit breaker and service discovery. The second thing you want to do is you want to be able to quickly root cause those issues with distributed tracing and logging and a visualization for that. That's what we do with Mission Control."

"Datawire, with the MDK and Mission Control, integrates service discovery, dynamic routing, distributed logging, circuit breakers, and a dashboard so that using your existing web application framework, you can actually ship a feature as an independent microservice right away. The MDK and Mission Control, they're engineered and designed for open source. The MDK is 100% open source. Not only are the server components in Mission Control open source, all the protocols between the MDK and Mission Control are also open source. We actually formally define the exact protocols for discovery, logging, and feature protocols as part of the MDK, and using our Quark transpiler technology, we actually transpile those reference definitions of the protocols into language native implementations in Java, JavaScript, Ruby, and Python."

"We're working on additional back ends. The most popular request for us today is Go. The reason why we do this and we wrote a compiler is because it really makes it so that each of these languages can interoperate seamlessly. We think that's critical when you're adopting microservices, because there's no one language that fits for everyone. The beta is available today. You can go to app.datawire.io, create a free account, and get started in ten minutes."
