Microservice, Microservice, Wherefore Art Thou Microservice – Nic Jackson

Nic Jackson (Notonthehighstreet.com)

Description

“Microservice, Microservice, wherefore art thou Microservice?” If you are hoping this talk will be littered with Shakespeare references, you are in for a disappointment. If you are hoping it will unlock the secrets to some effective patterns for service discovery in your microservice architecture, you are in for a treat.

Service discovery can be one of the most difficult techniques to learn when adopting microservice patterns, especially if you are running a containerised system. In this talk, I will discuss some of the common patterns for service discovery and how they can be used in your environment. Additionally, we will explore a couple of frameworks which do the hard work for you.

Key takeaways:

• What service discovery is and why you need it
• Introduction to common service discovery patterns
• Patterns for fault tolerance and high availability

Transcript

Austin : “Hey guys, this is Nic Jackson. Thank you Daniel Bryant for that fantastic presentation and amazing Q&A session. I saw a lot of comments about that coming in so that’s awesome. We’re enjoying the Q&A and thank you guys for putting questions in the chat and sending out questions down there below in the Q&A session. It’s just fantastic. I want to introduce you guys to Nic Jackson from the UK. Nic, I know you’ve been watching us earlier, we’ve gone to [inaudible 00:00:42] to UK, with a little bit on the east coast and we’re going to gradually just time zone our way across the globe and finally, end up on the west coast in the US.”

“Nic is the engineering evangelist for a really fast-growing e-commerce company, Notonthehighstreet, in the UK. He manages all the engineering practices and stuff, but I’ll have him talk to you about that. Welcome, Nic.”

Nic : “Awesome. Thanks. That’s me, right? Okay, so it’s really great to be here today. I’m super excited about everything that’s going on in microservices at the moment. For me, it’s one of the most interesting fields in a long while because it’s almost, in some ways, a realization of a lot of dreams and problems which have been occurring in the industry since we started trying to push applications on to the internet. As Austin was saying, my name is Nic Jackson, I’m an engineering evangelist. I work for a company in the UK called Notonthehighstreet. Notonthehighstreet are a reasonable-sized company. I mean we turn over about 160 million pounds, about $210 million per year. We’re reasonably scaled. We’ve been [inaudible 00:02:04] about 10 years now. In fact, we’ve just had our tenth birthday, and I suppose we’re still technically a startup in the fact that we are VC-backed and still have the original founders.”

“Certainly, we’ve got huge aspirations and really big plans for what we want to do with our business, how we can grow it and continue to sell some amazing product. One of the things that we’ve been looking at is microservices. The reason for that is that 10 years ago, we started out with a PHP application, and for a time, that PHP application served our business incredibly well. It allowed our business to grow. We went from a zero-turnover business up to like a million pounds, but then we needed to evolve on from that. We then built a Ruby on Rails application. Again, this was the first foray for the organization into e-commerce where we actually had the capability to sell these products online, which was a fantastic thing for us.”

“That allowed our revenue to grow from half a million dollars … Sorry, about a million pounds up to where we are today. As we’ve continued to grow, we’ve realized that this sort of monolithic pattern is not serving us so well. The fact that the engineering team has gone from three developers to nearly 15 developers now has introduced a hell of a lot of complexity, especially because we practice continuous integration and continuous deployment, so the teams are regularly pushing stuff into production maybe 10 times or more a day, which is a wonderful way to work.”

“We realized about 18 months ago that if we wanted to scale, if we wanted to take advantage of the cloud which can bring us closer to our customers when we expand internationally, then we really needed to change our patterns, we needed to change the way that we were doing some things. Microservices came up, and Docker, and everybody was like “Oh wow, we’re going to build it all again.” That was amazing until we realized just what we didn’t know about this.”

“Microservice, Microservice, Wherefore Art Thou Microservice. If anybody out there is hoping we’re going to go heavy on the Shakespearean references and that this is going to be delivered in flowery prose, you’re going to be disappointed; it’s not going to happen. I don’t know a great deal about Shakespeare and I don’t know a lot about flowery prose, but what I do want to share with you is some of the things that we’ve learned and some of the things that I hope will be able to help you when you’re starting out on your journey, or just continuing on your journey, into the wonderful world of microservices.”

“For me, the thing about microservices is that a microservice architecture is like a big truck. You think “Well okay, big truck,” but the thing about a big truck is if you try and operate it without the correct instruction, then it’s not going to go particularly well. Now, the core problem for a lot of people who’ve been working in the industry and working with [inaudible 00:05:37] monolithic applications is they look at that big truck and they see a big car. Because they’ve got a driving license, because they know how to operate a car, they say “Well, hey, I must be able to drive a truck, correct?” Well, you go from this, you’re looking at that little car. That’s a lovely picture of a gentleman in a Pike which is exactly how we drive our cars over here in the UK. But the second you get into that big truck and set off down the street and start driving, you end up in this situation.”

“It’s only when you end up in this situation that you actually realize that you might need a little bit of help in operating this big truck. The thing is that they’re not so different, but you just need to understand some of the common patterns, and I hope by the end of this talk I will have been able to help you with that. What are we going to cover? Today, I want to cover some things around service discovery, and I want to cover some software patterns which can help you on the client and help you protect your services against cascading failure.”

“We’re going to look at service discovery, both server side and client side, and we’re also going to look at some fault tolerance patterns. We’re going to take a look at some things like circuit breaking, some load balancing, and some ways we can do nice management around decoupling ourselves. If you’re not already practicing microservices but you’ve been working in the industry for a while, then at some point, you’ve most likely been exposed to service-oriented architecture, and even if you’ve only been working with monolithic applications, you’ve most likely interacted with third party services.”

“What I’m not going to do is tell you why you should adopt microservices, because you’re already here, you’ve already made that decision. Let’s look at the history and see how things are different today and how we can improve things. In any application, services need to communicate. Now, whether those services, like in a monolithic application, are actually classes talking through normal procedure calls, method calls, or whether they are actually distinctly separate applications communicating over REST or RPC like in a service-oriented architecture, the thing that’s very common is that in both instances, the endpoints of both of the services are known to each other. They don’t change very often.”

“They’re running at fixed, well-known locations, and that’s what enables them to communicate without you having to worry about it. When we implement microservices, that all changes. It changes because the applications are typically running in a virtualized or containerized environment, with Docker, and they’re dynamically changing all of the time as the various different forces are applied to your application. And that’s amazing, right? That means that you can scale different parts of your application as and when the demand requires it. You don’t have to scale the whole thing at once. But while that’s an incredibly powerful thing to have, how the hell do you know where the services are running?”

“How do I know which box my services are running on, which port they’re running on, and what the IP address is? It’s a very different, constantly changing environment. Daniel had a great quote which, I don’t know if he spoke about it in his earlier presentation, I missed a little bit of it, but he mentioned it when I saw him a few weeks ago, and that is: microservices are easy, but building microservice systems is hard, and I [inaudible 00:09:54] a lot. He mentioned earlier that what you’re actually looking at is a truck but you’re seeing a car. Let’s see how we can solve that problem. Let’s have a quick look at service discovery.”

“There are two main patterns for service discovery: we’ve got server-side and we’ve got client-side discovery. This is probably what your setup’s going to look like: you’ve got multiple instances of the same service. They are potentially running on different machines, and you’ve got a client which needs to speak to a single instance. It doesn’t care which one, it just wants an answer. You need a method for your service to register itself. Service A needs to register itself so that the client can call the registry and say “Hey, where the hell are you?”

“Let’s have a look at some of those options. What we don’t want is to be in this situation here. Seriously, if you’re running in a microservice environment and you don’t have a solid base for service discovery, then every day your clients are playing Where’s Wally? That’s not a good place to be because it’s going to lead to errors, it’s going to lead to problems, it’s going to lead to frustrations for the engineers.”

“DNS, right? DNS is probably the most simplistic approach. It’s also probably one of the most well-understood approaches because it’s been around since the dawn of time. Well, the dawn of the internet anyway. Simply, what we would do with DNS is register the service and allow the DNS server to respond with the records, and we round-robin through those. We can expose SRV records so we can get both the port and the IP address and … Sorry, but something was just beeping there. That’s a really nice thing, but the disadvantages with DNS come from the very things it was designed to deal with. I mean we’ve got TTL, right?”

“Now, in a dynamically-changing environment, TTL is an absolute no-no. You can’t really be dealing with the fact that you might have an hour-long cache on a DNS record. Now, of course, you can reduce the cache right down, but if you reduce the TTL to next to nothing, you’ve got an inefficiency in the system as well, in that you have to make a lot of calls. How do you update it? How are you going to update your DNS records? These sorts of problems, I think, almost rule it out … There may be some uses, but I want to look at some more methods and some pieces of software out there which implement a better approach with dynamic service registries.”
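To make the SRV approach concrete, here is a minimal Go sketch of resolving a service through SRV records. The service name and domain are hypothetical placeholders; with Consul’s DNS interface (discussed below), the name would look more like api.service.consul.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Look up SRV records for a hypothetical "api" service over TCP.
	// "service.example.com" is a placeholder domain.
	_, addrs, err := net.LookupSRV("api", "tcp", "service.example.com")
	if err != nil {
		panic(err)
	}
	for _, srv := range addrs {
		// SRV records carry both the host and the port, which is why
		// they suit dynamic environments better than plain A records.
		fmt.Printf("target: %s port: %d\n", srv.Target, srv.Port)
	}
}
```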

“Some of these are Zookeeper, Consul, Eureka, which is Netflix’s, and we’ve got Etcd and SkyDNS. Now, Zookeeper was originally developed as part of the Hadoop project and it does a bunch of interesting stuff including leader election, message queuing and key values, but at its heart, it is just a hierarchical namespace for storing information. Now, the problems with Zookeeper are that it’s, I suppose in modern terms, a little bit outdated. It does maybe too many things. It can also be quite difficult to set up and manage. We have a bit of a love-hate relationship with Zookeeper at Notonthehighstreet. We run a Zookeeper cluster, but we only run it because we’re on Mesos, and Mesos and Marathon need Zookeeper. For our service registry, we use Consul, which I want to get on to next.”

“Consul is a lovely little product. It’s out there from HashiCorp, and they’re doing some pretty amazing stuff in the field of microservices right now. It does two things, no, maybe three. You’ve got your key value store, which you can use for your configuration, and again it’s a useful thing having distributed configuration. It does service discovery. You can either access the service discovery over standard RESTful HTTP requests, or you can use Consul’s DNS interface and query the service records over standard DNS.”

“One of the nice things that Consul does is health checks. You can tell Consul to check the health endpoint of your service, and you can choose to only accept registered endpoints which are healthy, which can be quite a nice feature, all of that out of the box. I definitely recommend you check this out. It’s a really great product and I think certainly one of the better ones which I’m dealing with right now.”
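As a rough illustration of that health-filtered lookup, this Go sketch queries Consul’s HTTP health endpoint for passing instances of a hypothetical “users” service. The service name and the local agent address are assumptions, not details from the talk.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// entry mirrors just the fields we need from Consul's
// /v1/health/service response.
type entry struct {
	Service struct {
		Address string
		Port    int
	}
	Node struct {
		Address string
	}
}

func main() {
	// ?passing asks Consul to return only instances whose health
	// checks are currently passing.
	resp, err := http.Get("http://localhost:8500/v1/health/service/users?passing")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var entries []entry
	if err := json.NewDecoder(resp.Body).Decode(&entries); err != nil {
		panic(err)
	}
	for _, e := range entries {
		addr := e.Service.Address
		if addr == "" {
			addr = e.Node.Address // services may register without their own address
		}
		fmt.Printf("healthy instance: %s:%d\n", addr, e.Service.Port)
	}
}
```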

“What about Eureka? Now, Eureka is from Netflix. It doesn’t try to do everything like Consul or Zookeeper. It provides basic load balancing and a REST-based endpoint, so you can write your own clients. There’s also a Java client which has got additional capabilities. It can do health checking and stuff like that, but if you’re not using Java, you might need to find something else. Etcd. Now, Etcd is a distributed key value store which can be used for shared config and service discovery. It’s incredibly reliable. It’s used by things like Kubernetes.”

“From an API perspective, it’s all RPC, which might not fit into your setup depending on what sort of communication protocols you’re using, but there’s actually quite a nice add-on to Etcd which is SkyDNS. I think SkyDNS is also used as part of the Kubernetes project, but what it does is it sits on top of Etcd and provides a thin layer which allows you to query the store using DNS service records. SkyDNS also implements a simple RESTful interface. So we’ve got services, we’ve got our registry; how are we going to make sure that the service registry is kept updated? Personally, we really like Registrator. Registrator’s been around, I don’t know how long, but we’ve been using it for about 12 months now.”

“What it does is it’s a very lightweight process that runs as a Docker container on your cluster. It reads the Docker events and it will detect when a container is started and when a container is stopped. It’ll take that information and send it to a backend service registry like Consul or Etcd, and I think it’ll work with Eureka and quite a few of the other backends. It allows you to configure things in quite a nice way: you’re only using environment variables as part of your container config, which is quite unobtrusive, and Registrator will sit there quite happily in the background. There’s a couple of other things which I want to talk about a little bit more in depth later on, because I want to talk about the Autopilot pattern, and we have Container Pilot as well from Joyent, which I think is a lovely piece of software, so we’ll take a look at that.”

“Server-side discovery. We’ve got our registry, but how do we solve the problem of our clients knowing where the services are and which one to contact? One of the patterns which you’ll commonly see used is server-side discovery, and the client in essence is possibly an API gateway. It could be something like Kong, it could be NGINX, HAProxy or a combination of multiple things. This is quite a nice pattern to use because you can put a whole load of logic inside that gateway. You can deal with health checking and load balancing and circuit breaking. You can cluster them; it’s nice stuff. Personally, I think this pattern is mainly useful as part of an API gateway, for requests from outside of your services coming into the cluster, and they could be public or they could be protected third party requests.”

“One of the modifications to this, which is something that we’re starting to look at very heavily when we’re talking about how Service A talks to Service B, or Service B talks to Service C, is client-side discovery. With client-side discovery, what you’re doing is delegating the responsibility for service discovery and circuit breaking and load balancing to the client that’s making the request. One of the beauties of doing this is that you can handle that logic internally on an individual, app-by-app basis. You get more control, but you’re also taking the shortest path between your services. You’re not going through a centralized source, which is more of an old school Enterprise Service Bus pattern. Service buses are perfectly fine, but I’m not really sure you need them. I think this is a good pattern.”
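A minimal sketch of that client-side pattern might look like the following; the lookup function is a stand-in for whatever registry client you actually use (Consul, Etcd, Eureka), and the addresses and service names are made up.

```go
package main

import (
	"fmt"
	"math/rand"
	"net/http"
)

// lookup stands in for a real registry query; it should return the
// currently healthy instances of the named service.
func lookup(service string) []string {
	return []string{"10.0.1.5:8080", "10.0.2.9:8080"} // hypothetical data
}

// call implements naive client-side discovery: resolve, pick an
// instance at random (a trivial load-balancing policy), then call it.
func call(service, path string) (*http.Response, error) {
	instances := lookup(service)
	if len(instances) == 0 {
		return nil, fmt.Errorf("no healthy instances of %s", service)
	}
	target := instances[rand.Intn(len(instances))]
	return http.Get("http://" + target + path)
}

func main() {
	resp, err := call("bar", "/v1/items")
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Status)
}
```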

“How do we implement this pattern in the client, or even if you’re writing your own API gateway? We need to consider a few fault tolerance patterns. We need to look at things like timeouts and circuit breaking. We need to look at how we’re going to put bulkheads in our services, how we’re going to fail fast. These are the things that are going to keep our application up and running. We had an incident at Notonthehighstreet when we first put our application up: it went down, it crashed. The entire stack went down. It wasn’t a malicious request, it wasn’t too much load; we did it ourselves.”

“We basically implemented a timeout pattern which was inefficient. We tied up all of the connections to the server, and then all of our health checks started failing and it was just a massive cascading failure. I learned a lot from that experience, and from taking a slight bit of a beating from our CTO at the time. Let’s take a look at them anyway. Timeouts first. Now, it’s essential that any resource which is blocking your thread has got a timeout, because we need to ensure that the thread eventually becomes unblocked whether the service is available or not.”

“From a client’s perspective, making me wait is a bad thing. What you want to do is return a response, success or fail, as quickly as you possibly can. It’s also the way that you’re going to protect your downstream services and your system, because you’re not tying up a thread in a good service waiting for one which has already failed downstream, thus causing that cascading failure where the whole thing goes down because one weak link has introduced a point of failure.”
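For illustration, here is a small Go sketch of that advice: bound every blocking call so the caller always becomes unblocked. The URL and the two-second budget are hypothetical values, not anything from the talk.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Never use a client without a timeout for service-to-service
	// calls: a hung downstream service would block you forever.
	client := &http.Client{Timeout: 2 * time.Second}

	// A context deadline bounds the whole operation as well, so the
	// calling goroutine is guaranteed to become unblocked.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	req, err := http.NewRequest("GET", "http://bazz.service.local/v1/stock", nil)
	if err != nil {
		panic(err)
	}
	resp, err := client.Do(req.WithContext(ctx))
	if err != nil {
		fmt.Println("failing fast:", err) // report the failure upstream immediately
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```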

“Circuit breaking combined with timeouts is a really lovely way to deal with this. I’ve lifted this diagram out of, I think, Michael Nygard’s book Release It! I either stole it out of that or out of Sam Newman’s Building Microservices; both are fantastic pieces of reading. The circuit breaker pattern is implemented in the client. In a normal situation, when the circuit breaker is closed, the request will pass straight through to the downstream service, the call will be made, the response will be returned and everything will be happy.”

“Now, in the instance that the downstream service doesn’t respond, the circuit breaker will increment its failure count. You are setting that tolerance, right? You control that in your business logic. You may determine that one failure is enough to break a circuit, or you may say “You know what, I’m going to give it a couple of chances.” Either way, once that failure count has reached its threshold, the circuit breaker enters the open state. If you try to make any further requests to that downstream service, it’s not even going to try. It’s immediately just going to return a fail state.”

“After a period of time, the circuit breaker will then enter a half-open state, which will allow another request through to the downstream service. If it succeeds, the circuit breaker will return to the closed state, but if it fails, it will immediately revert back into the open state. It won’t wait for any failure count. This is a really nice pattern to help protect yourself. If you’re using this with something like load balancing, then if it’s a particular instance in a cluster which is misbehaving, you can make the attempt to request some information from it from your client.”

“When it fails, you break the circuit and you can move on to the next one in your load balancer’s list. You can be sure that you’re not going to be wasting effort retrying the dead service before the service registry has had time to remove it, and then your service discovery catches up. That’s a really nice pattern. There are a lot of libraries out there which implement this. I think it was probably popularized by, again, Netflix with Hystrix. Hystrix has been copied in a number of places, using something a little bit more modern than Java, which … I’ve got nothing against Java, honestly, but I am a professional Java troll according to my Slack profile, so I have to get at least one bit of trolling in.”
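A deliberately minimal sketch of the closed/open/half-open state machine just described; libraries like Hystrix and its ports do far more, and the threshold and cooldown here are arbitrary example values.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// Breaker opens after `threshold` consecutive failures, stays open for
// `cooldown`, then lets a single trial call through (half-open); that
// call's outcome decides whether the circuit closes or re-opens.
type Breaker struct {
	mu        sync.Mutex
	failures  int
	threshold int
	cooldown  time.Duration
	openedAt  time.Time
}

var ErrOpen = errors.New("circuit open: failing fast")

func (b *Breaker) Call(f func() error) error {
	b.mu.Lock()
	if b.failures >= b.threshold && time.Since(b.openedAt) < b.cooldown {
		b.mu.Unlock()
		return ErrOpen // open: don't even attempt the downstream call
	}
	b.mu.Unlock() // closed, or half-open after the cooldown: let it through

	err := f()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		b.openedAt = time.Now() // (re-)open immediately on failure
		return err
	}
	b.failures = 0 // a success closes the circuit again
	return nil
}

func main() {
	b := &Breaker{threshold: 3, cooldown: 10 * time.Second}
	_ = b.Call(func() error { return nil /* call the downstream service */ })
}
```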

“I mean, again, one of the things around services is that you possibly have a service which is providing information to many other services. In this instance, we have Foo and Bar, and Foo and Bar both depend on some information that Bazz can provide. Let’s make an assumption that the software engineers who have been working on Bar haven’t been particularly gracious in the way they consume the resources of Bazz. What they do is execute too many requests, or they send some bad information. Whatever they do, they cause Bazz to fail, and because Bazz has failed, Foo and Bar and Bazz have all failed because they can’t complete their actions.”

“One of the things you can do, if you’ve got a dependent service which is of a higher priority than the others, and in this instance we’re going to say Foo is our highest priority service: this is our payment service. Bar is just an email service, so we can deal with some latency there, but we can’t deal with latency on payments. It’s of primary importance to us as a business to make that money. What you can do is use a bulkhead, and a bulkhead is exactly the same as what you would find on a ship. It compartmentalizes … Good God, it is [evening 00:28:07] over here. It allows you to say, right, this section of the cluster of Bazz is protected and is only usable by Foo, and this section is only usable by Bar, or whatever. If the same developers write that bad code which causes Bazz to misbehave and Bar goes down, we don’t care, because our payment gateway, which is our Foo service, still has uninterrupted access to Bazz.”
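A bulkhead can be as simple as a bounded pool per consumer. This Go sketch uses a buffered channel as a semaphore; the compartment sizes for Foo and Bar are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Bulkhead is a bounded pool of slots: each consumer of Bazz gets its
// own, so one consumer exhausting its pool cannot starve the others.
type Bulkhead chan struct{}

func NewBulkhead(size int) Bulkhead { return make(chan struct{}, size) }

var ErrFull = errors.New("bulkhead full: failing fast")

func (b Bulkhead) Do(f func() error) error {
	select {
	case b <- struct{}{}: // acquire a slot
		defer func() { <-b }() // release it when done
		return f()
	default:
		return ErrFull // no capacity left: don't queue, fail fast
	}
}

func main() {
	// Payments (Foo) gets a bigger compartment than email (Bar), and
	// neither can touch the other's slots.
	fooToBazz := NewBulkhead(20)
	barToBazz := NewBulkhead(5)

	_ = fooToBazz.Do(func() error { return nil /* call Bazz */ })
	err := barToBazz.Do(func() error { return nil /* call Bazz */ })
	fmt.Println(err)
}
```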

“Again, this is a technique we employ at Notonthehighstreet. We have services which are used by our front end consumer-facing application, and we have services which are used by our API, which the partners use, and by our partner management system. We’ve implemented bulkheads separating those. If a partner causes the API to run slowly, that’s not going to have the negative effect of slowing down the consumer experience. It’s something that’s working out really nicely for us.”

“I love this picture. This picture is awesome. Ultimately, it’s not just about failure, right? The key thing about failure is that you need to fail fast. You don’t want to fail slowly; you want to fail fast, and you want to understand why you failed and learn from your mistakes. There are a number of things that can cause a failure, and there are a number of ways that you can ensure that your service fails fast. One of those things is smart health checks. Now, again quoting Release It!, which I think is just a super book, Michael Nygard talks about handshaking. He advocates a pattern where before you make a call to a downstream service, what you’re actually going to do is handshake with it. You’re going to make a request to it, maybe do a health check on it. You’re going to say “Hey, are you okay? Can you accept my call?”

“What you can do with that is immediately determine whether the request is going to succeed, so you don’t need to wait for any timeouts. Again, you’re failing fast. Now, the disadvantage of that is that you’re making an additional network call, which is increasing the latency in your application. So a pattern which I’m looking at at the moment, and which I advocate you also look at, is that the service itself knows its own state, right? Again, using our service Bazz: if Bazz knows it’s in trouble, if Bazz can monitor its connection pool so it can understand when that is exhausted, if Bazz can understand that it’s run out of disk space, for example, or, in a little bit more of an advanced case, that it’s actually running slower than its SLA.”

“Why doesn’t Bazz immediately just reject the request? As a client, I make a request to Bazz, and Bazz goes “I’m sorry. I’m just too busy. I can’t deal with this right now,” and immediately sends a fail state. The client can then handle that by breaking the circuit and trying the next one in the loop. In terms of the efficiency of that call, it’s just the one call to the downstream service to understand that you’ve failed. We don’t have to do any additional handshakes, which can be a really nice pattern, and I think that’s something that we’re really looking into so we can put some more intelligence into the way that we’re load balancing our services.”
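One possible shape for that “reject when too busy” idea is a small load-shedding middleware. The in-flight limit below is an invented number; a real service might also watch its connection pool, disk space, or latency against its SLA, as described above.

```go
package main

import (
	"net/http"
	"sync/atomic"
)

var inFlight int64

const maxInFlight = 100 // hypothetical capacity for this instance

// shed wraps a handler so an overloaded instance rejects work
// immediately instead of queuing it: the caller gets a fast failure it
// can route around (break the circuit, try the next instance).
func shed(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt64(&inFlight, 1) > maxInFlight {
			atomic.AddInt64(&inFlight, -1)
			http.Error(w, "too busy, try another instance", http.StatusServiceUnavailable)
			return
		}
		defer atomic.AddInt64(&inFlight, -1)
		next.ServeHTTP(w, r)
	})
}

func main() {
	work := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok")) // stand-in for real work
	})
	http.ListenAndServe(":8080", shed(work))
}
```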

“In certain instances, we will have long-running, CPU-intensive or just machine-intensive operations, and it would be nice to be able to smartly distribute the load across our services. I mentioned earlier that we were going to take a look at the autopilot pattern as well. This is something that I really like because, for me, I’m a big fan of decoupling and abstraction. I don’t really like being constrained to a particular platform. That’s one of the reasons why I love Docker so much. It gives me the flexibility to easily move a pre-packaged application around a variety of distributions.”

“The autopilot pattern, which I’m guessing was developed by Joyent, and the excellent Container Pilot software, which I really recommend you check out, is a great way of managing a lot of the things that you’re going to deal with in service discovery. We’ve already talked about how we get things into the service registry. We’ve said that for service-to-service communication we’re going to use client-based service discovery, but we didn’t really talk about where that client is going to get the service information from. We said that it could get it from DNS or it could get it from the APIs, obviously depending on whether you’re using Etcd or Consul, but what is the standard pattern for doing that?”

“Autopilot is a nice way of doing it. I suppose you would have a system which is working almost like a [inaudible 00:34:04]; it’s responsible for monitoring not just the health of your application, but also for talking to your service discovery layer. In effect, it is an abstraction, right? The actual service itself and the service level code have no knowledge of whether they’re talking to Consul or Etcd or anything like that. That knowledge is completely taken away from it, which is a great thing.”

“Container Pilot deals with that level of communication. It will talk to Consul, and if it detects that there has been an update in the service discovery … Sorry, in the service registry, then it can pass the message back to your application. The other nice thing is that Container Pilot will also be responsible for starting your application and, therefore, registering it with the service registry in the first place. It can also execute health checks against the application, so that if the application starts entering a fail state, it can de-register it from the service registry, and all of the other containers and application services in the cluster aren’t going to be trying to contact it. It’s a really nice thing. They’ve got plug-ins for a whole bunch of stuff; it will do Consul, Etcd. Very simple config, very lightweight, written in Go; definitely, definitely check that out.”
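As a sketch of what this abstraction buys you: if a supervisor like Container Pilot is configured to signal the application when the registry changes, the service code only has to handle the signal; it never talks to Consul or Etcd itself. The SIGHUP choice and the loadBackends stub below are assumptions for illustration.

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

// loadBackends would re-read whatever config file the supervisor (or
// something like Consul Template) has just rewritten; stubbed out here.
func loadBackends() { log.Println("reloading backend list") }

func main() {
	loadBackends()

	// The supervisor sends us a signal when the service registry
	// changes; the application itself has no registry knowledge.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGHUP)
	for range sigs {
		loadBackends()
	}
}
```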

“Now, Frameworks, right? If I look at the history of where we’ve been with microservices, I think initially maybe 12, 18 months ago, we had a lot of problems with how we could build these things and how we could deploy them and I think that problem is fast becoming solved now. There’s people like Codeship, there’s people like CircleCI. You can run a bunch of stuff in Jenkins if you want to roll your own. You’ve got Travis. Building and deployment is really not a problem. We can run a job just about anywhere and we can push that container to a registry.”

“Then, orchestration was a bit of a problem. You were almost forced down one of two routes: you had to either use a relatively immature version of Elastic Beanstalk, where you’re running just a single instance of Docker running one container, or you’ve got to spin up a huge cluster with Mesos and Marathon, which is a fabulous piece of software. For a smaller business which doesn’t necessarily have the operational expertise around managing Zookeeper clusters and all the other stuff, that can be a little bit daunting, but orchestration again is becoming a solved problem.”

“I think you’ve got people like Microsoft who are doing Mesos as a service. I think I read something today that DigitalOcean have partnered with Mesosphere and you’re going to be able to get DC/OS on their platform. That’s awesome. You’ve got Docker Cloud, which is a great service. Google’s container service has come a hell of a long way and it’s looking really solid, as has Amazon’s ECS, the AWS container service.”

“Orchestration, I think, is now a solved problem for us. You’ve got multiple choices. You can take whatever you want based on your favorite flavor of ice cream and the scale of your business. Service discovery is still something that’s a little bit untidy; we’re still spinning up our own stacks of Consul, or we’re still spinning up our own stacks of Etcd, and that, for me, is a problem. I shouldn’t have to deal with service discovery. And yes, okay, with DC/OS I can get service discovery, and yes, with Docker 1.12 I can get service discovery and things like that with Swarm, but again, I’m locking myself into a platform, and it feels a little bit against this pattern here, the autopilot pattern, which is that my containers should not really be bound to any knowledge of what they’re running on.”

“There’s some really interesting stuff out there. Like Datawire, right? Datawire, who are hosting this today, and who I think have got something really interesting going on with their platform. I only really looked at this the other day, and I love the idea that you can have service discovery and a service registry as a service. As an application engineer, I want to write application code. I don’t want to be dealing with infrastructure. I think they’re doing a lot of smart stuff with the patterns that they’re employing.”

“The MDK is doing load balancing, it’s doing circuit breaking, it’s doing automatic service discovery. Definitely check out the stuff that’s just launched today. I’ve not had a full chance to check it out yet, but the thing I love the most about this, and I’m not saying this just because of Datawire, is that they are now starting to solve one of the last problems that I have with microservices. Soon, I think within 12 months, this again is not going to be a problem. When you’re asking yourself “Should I build microservices or should I build a monolith?”, we won’t even need to ask that question. We’re going to have the things that we need. We’re going to have the services covered, the orchestration covered; we’re going to have an event bus, potentially, as a service.”

“All of that is really going to allow us to concentrate on creating great applications, great businesses, delivering great value to our customers, and having an absolutely awesome time as software engineers, because we’re not dealing with problems we don’t want to deal with.”

“Lastly, for me, I’ve really enjoyed talking about this today. It’s something I’m really passionate about. Please feel free to contact me, ask me some questions now, obviously, or later if there’s something you haven’t thought of. I’ve got some stuff on GitHub. I’m a big fan of open source; I try to push as much as I can on there. Notonthehighstreet, we’re hiring. I’ve got to say that. They pay my wages. They’re a lovely company, don’t get me wrong. I couldn’t be more happy to work for them, but seriously, we are hiring. We’re definitely more awesome than any other company out there. I think we’ve gotten [inaudible 00:41:09] … any questions.”

Flynn: “Okay. Well, first off, thanks very much for [inaudible 00:41:18]. Glad to hear you’re taking a look at [inaudible 00:41:23]. I actually was going to ask you: you’ve mentioned a bunch of tools that are built on this model of strong consistency. Have you thought much about strong consistency versus eventual consistency versus throwing consistency out the window? Do you have any thoughts on that?”

Nic : “[inaudible 00:41:48]”

Flynn: “Actually, I can’t hear you right now. Oh, all right. Technical difficulties. We could do this by sign language, interpretative dance. Can somebody on the chat let me know if you can hear me? Excellent. Okay. In that case, Nic, why don’t you go ahead, if others can hear you and we’ll just assume it’s something on my end. No? All right, so … Okay. Austin, is there a magic you want to do here? When in doubt, call the producer back into the room.”

Nic : “Eastern [inaudible 00:42:45]”

Flynn: “Looks like Nic’s going to come back in a moment, we hope.”

Austin : “Do we lose Nic?”

Flynn: “There’s Austin, there’s Nic. Nic, I can hear you again. Excellent.”

Austin : “It’s the tubes across the [Atlantic 00:43:02].”

Nic : “Awesome. Sorry, about that. I think I’m having a little bit of problem.”

Speaker 3: “The packets guys are confused about whether they’re supposed to be on the right or the left.”

Austin : “It’s Brexit.”

Nic : “Yeah, exactly, Brexit. I was just about to say. I think we’re having a few problems with our economy over here. Somebody maybe forgot to pay the bill on the internet and we got a little bit cut off. You’re asking about eventual consistency and …”

Flynn: “[inaudible 00:43:32] Yeah.”

Nic : “… and then things like that. A lot of the software right now is trying to solve the problem of eventual consistency, so things like Consul and Zookeeper and Etcd are trying to reduce the latency on that so that the services in the cluster are all as up-to-date as possible. Now, I think that’s a great thing, but it’s an incredibly difficult problem to solve. I’m also going to say that maybe it’s not a really important problem to solve; maybe it’s okay if it happens eventually.”

“If you’re dealing with the problem at the client level, if you’re implementing the right sort of fault tolerance at the code level in your clients, it doesn’t really matter if you’ve got slightly out of date information. I am talking slightly out of date; I’m not talking hours. If you’ve got slightly out of date information from your service registries, all you’re going to do is open the circuit. You’re going to be clustering your microservices anyway, right? That’s one of the beauties of being able to run that fault-tolerant, multi-instance pattern.”

Flynn: “When you get down to it, even if your [inaudible 00:44:55] is out of date, as long as you still have services that you can talk to that you know about, you’ll probably still get through [inaudible 00:45:02] that.”

Nic : “Then, I think one thing that I maybe failed to mention when I was talking about a lot of the patterns is that the key thing is that you need to know when these things are happening. I need to know very, very quickly when my circuits are breaking. You should really make sure that you’re monitoring and reporting those things out. Again, that’s core to anything that implements the Hystrix pattern. It’s really useful for both software engineers and operational engineers, as a devops mentality, to be able to see that a particular service is flapping, or that a link in a circuit is permanently broken.”

Flynn: “Yeah, and all of those are unfortunately difficult problems. Telling the difference between transient and permanent failures and all that.”

Nic : “Yeah.”

Flynn: “Somebody named Daniel Bryant, but I don’t know who that is, is asking about balancing abstraction versus mechanical sympathy. As an applications programmer, you want to be able to work on your application; you don’t want to have to mess about with all that deployment business and that sort of thing. Could you talk a bit more about that? Maybe the way I’d put this is: at what point would you be inclined to tell the applications developer “No, really, you’re going to have to understand some of this stuff” versus …”

Nic : “[inaudible 00:46:31] talking like day one, right? I mean, the thing is … I think, again, I wish I had caught Daniel’s talk, I’m going to watch the videos later on, but he mentioned the full stack thing, right?”

Speaker 3: “Yes. I thought it was a remarkable point.”

Nic : “Yeah. The way I like to look at it is that as an application developer, you treat your application like a child. You are responsible from birth right the way through until you deliver it through university, to the point when it can move on and meet another application developer who will take care of it … for the rest of its life. But you have to understand the full life cycle of the stuff that you’re dealing with. It’s absolutely incorrect and wrong for an application developer to say “Well, hey, I’ve written the code. I don’t need to know about Docker, right? I don’t even need to know how Dockerfiles work.””

“Well, yes you do, you really, really do, because the Dockerfile is kind of the final layer of abstraction between you and the metal. There’s a lot of stuff in there that is code-related. You might have config files being generated using Consul Template or something like that. You really should be looking at things like Docker as a great way to enhance the quality of your application: you can leverage it to test your services, to test the boundaries of the services.”

“You shouldn’t have to be testing Mesos or Kubernetes or something like that. There’s going to be a vast number of people dealing with that problem. But the life cycle of that application, you should know about. You should understand how the service discovery works, because you need to be writing the code that interacts with it. You need to understand what the consistency on it is, because you need to understand what levels of failure tolerance you’re going to have to deal with. Dealing with failure in a microservice environment is absolutely essential. It’s going to happen. It’s just the way that it works, and it shouldn’t be a problem if you take a little bit of care.”

Flynn: “In fact, I would argue that … I would say that it would happen frequently.”

Nic : “Yeah. I mean, you will, you know, on a daily basis, right? We will lose a Mesos slave.”

Flynn: “Daily, hourly.”

Nic : “Daily, yeah. We don’t care. We’ll see the messages coming in: production incident, Mesos slave has gone away. I’m like “Yeah, whatever.” It will recreate.”

Flynn: “In a perfect world, and here, I’m going to go back a bit to Dan’s point about the full stack developer … Excuse me, and the quotation he threw up on the screen was that if you’re not designing the chips as well as writing the web application, then, we don’t want to talk to you, right?”

Nic : “Yeah.”

Flynn: “How far down the stack, though, would you say, in a perfect world, the applications guys should really need to worry about? Are you looking ahead to a future where a developer doesn’t necessarily need to know about the Dockerfile, or are you thinking more that even if you don’t need to, it’ll probably still be useful for you?”

Nic : “I think you do. Our application engineers write the Dockerfile. They do so paired up with operations and devops, because obviously there’s an area of expertise which can be brought to that process, but they’re responsible for doing it. The reason that they’re responsible for doing it is, again, because they’re responsible for taking the code and ensuring it gets deployed into QA or production or wherever it is, and that it’s running correctly. They’re also the people who are going to get woken up in the middle of the night …”

Flynn: “We’re just going to have to [inaudible 00:50:36] that.”

Nic : “… if something goes wrong, right?”

Flynn: “Yeah.”

Nic : “If I’m going to get woken up in the middle of the night, the first thing is I’m going to make sure I know what’s going on, because I want to get back to sleep quickly. The second thing is I’m going to try and do my very best to ensure that I don’t get woken up in the first place. I think, from a curiosity perspective, I’m personally a genuinely curious guy, right? If somebody says “Hey, we’re running this on Mesos,” my next question is “Why? Really? How does that work?” I want to know this stuff. I just like knowing how stuff works, and I think everybody should be encouraged to do that, because again, I don’t have to have in-depth knowledge of how to install a Mesos cluster, but I need to understand that containers are transient, that they can move amongst boxes.”

“I need to understand that they can just disappear. I need to understand that Mesos could just terminate my container halfway through a process. What happens if I’m dealing with a request which is a payment or something like that? Mesos could just kill my container, right? I could get halfway through, so I could have taken the money from the customer but not actually got to the point where I’m dispatching the goods. Being aware of those situations, and of how things work a little bit lower down, helps me to design better, more robust application code.”

Flynn: “Yeah. I’m sorry, I was just having flashbacks of how you deal with robust payment code.”

Nic : “Yeah.”

Flynn: “One of the things I think you mentioned earlier also was about configuration with respect to your services. Are you guys just throwing everything into Consul and Zookeeper and the like, or have you evolved another scheme, or …?”

Nic : “Well, we’re looking at that right now. We’re looking at a pattern. We genuinely like Consul as a key value store. It affords us a lot of flexibility. What we don’t do is touch Consul manually. All of that configuration is versioned in Git [inaudible 00:52:56] and it’s an automated process which will update Consul. One of the things that we’re trying to look at is how we can be a little bit leaner on Consul, because if we’ve got …”

Flynn: “Leaner on what?”

Nic : “… services which are … Like, being a little bit lighter on the resource usage, because it’s really easy to just write a Consul template which is running a watch on, say, a hundred different keys. You do that on a thousand instances of different services and your Consul cluster can be very quickly overwhelmed. We’re looking and saying “Well, there should be a better way of doing that.” Could we, should we, tie application deployments to config, so that if we update the application config, we just restart the service? Then Consul Template will only run once. That’s actually an okay pattern to use when we’re running multiple instances, because Marathon will do a rolling restart on our clusters … Sorry, on our services, to ensure that we don’t lose capacity.”

“We’re in a place at the moment where it’s not really a problem. It will become a problem, and I think the Container Pilot pattern is something that we find quite appealing, because what Container Pilot can do is poll Consul rather than listening for changes and keeping connections open. You can define how often it will poll. Then, as part of that process, I could still run Consul Template, generate my config file, and send a message to the application to do an update. I think it’s really easy to just say “Well, hey, we’re in the cloud, so therefore resources are unlimited.” Yeah, they are, until you find they’re not. I think it’s nice not to get too hung up on premature optimization, but just think about some stuff and realize …”

Flynn: “There’s difference between premature optimization and warranted optimization, right?”

Nic : “Yeah.”
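As a sketch of the polling approach Nic describes (as opposed to holding open watches), something like the following reads a Consul KV key on a fixed interval. The key name, interval, and agent address are all made up for the example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	// ?raw returns the bare value instead of the JSON-wrapped form.
	url := "http://localhost:8500/v1/kv/apps/users/config?raw"

	// Poll on a fixed interval instead of holding a long-lived watch
	// per key, trading a little staleness for far less registry load.
	for range time.Tick(30 * time.Second) {
		resp, err := http.Get(url)
		if err != nil {
			continue // registry briefly unavailable; try again next tick
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Printf("config is now: %s\n", body)
	}
}
```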

Flynn: “Let’s see. A question showed up about whether you have a preference on the ideal number of developers for a given service. What size of team do you like to see working? How do you structure things at Notonthehighstreet?”

Nic : “Sorry, was that what size is our team and how do we structure things?”

Flynn: “Well, for a given service, how do you attempt to structure the teams there? Do you carve out teams for a service? Can you talk a little about that in general?”

Nic : “Yeah, of course. We’re experimenting a little bit at the moment. What we have is about, I think, eight product teams, and the product teams are broken up into different areas of the business. We’ll break things up into consumer, which is, in effect, the front end, and partner, which is the application services which serve our partners, the people who make the wonderful products.”

“Also, payments and checkout and authorization, and that kind of gets into another sort of subdivision. Each of the teams is responsible for the work that they produce, and they’re responsible for maintaining those services for … Yeah, exactly, two-pizza … Well, two-pizza teams. Pizzas aren’t that big in the UK; I mean, seriously, these hungry developers really need more than one pizza. But in essence, they may develop a service and they will own it for the life cycle of that service. They’ll look after it, and if it goes down in the middle of the night, it’s the developers who built it who will get up.”

“Now, the benefit that brings is that we’re not putting any constraints on what languages they’re working in, right? If somebody wakes up in the morning, reads a great article saying that Elixir and Phoenix are the greatest thing in the world, and they want to build a service in Elixir … then you can do that, because ultimately, I’m not going to be waking up in the middle of the night if your service goes down.”

“If you believe that that’s the right thing to do and you’re willing to support it, then you’ve got the architectural freedom to do it. That causes a few problems, because you can end up in a highly polyglot environment. You do have to think about the life cycle of the service, but again, it depends on the scale. The teams also have options. We don’t want to be restrictive around backlogs. I don’t want to be waiting on team B because team B are the only people who can make a change to, say, the payment service which I need on the front end. We don’t have any hard rules on that. I have the ability to go and change the code …

Speaker 3: “Who’s got [inaudible 00:58:35] or …

Nic : “… in team B’s service.

Flynn: “Sorry, let me [inaudible 00:58:39] that. Is that something …”

Nic : “But because I’m not going …”

Flynn: “[inaudible 00:58:39] need permissions, or is that something where you deliberately structure things [inaudible 00:58:44] and then the developers can go and touch it [inaudible 00:58:42]?”

Nic : “Yeah, we don’t do any heavy hierarchical level of permissions. We have one permission level, which is developer, and you’ve got access to every piece of source code.”

Flynn: “Your secrets and stuff.”

Nic : “I mean my secrets and stuff are stored in a different place, but as far as GitHub goes, if you’re a developer, you’ve got read and write to everything, so I can change a different team’s service. Now, the other team doesn’t have to accept my change, right? Because ultimately, they’re the ones who are supporting the service. If I’ve written junk, they don’t want to just accept anything [inaudible 00:59:28], but what we’re trying to encourage is that I will have that conversation, maybe go to their stand-up: “Hey look, I really want to make this change to the payment service. This is what it’s going to do. Do you have any objections?” “No, we don’t.” “Do you have any useful information for me?” “Yeah, this, this, this and this will be really helpful for you.” And I’m like, “Okay, great.”

“When I’ve made it, are we cool to pair and look at the pull request, and I’ll just make sure that the coding standards are right for your team and that I’ve got everything okay? And they’re like “Yeah, okay, let’s do that.” We’ll do that and we’ll have that collaborative approach. They’ll check my code. They’ll look at it and go “Dude, you seriously don’t know how to code Java. This is BS. Let me rewrite this for you.”

Flynn: “Yeah, it makes sense.”

Nic : “No, it’s … but the [inaudible 01:00:21] and then, we have that collaborative approach. Yeah, my Java was really bad. I’ve not coded Java in 16 years.”

Speaker 3: “I’m going to say nothing of that Java …”

Nic : “Those times [inaudible 01:00:37] without a platform.”

Flynn: “Yeah, we’ll just leave it at that. To kind of close the circle on that one: where do you store secrets? Preferably, I’d like the keys to the vault for that.”

Nic : “Yeah, no problem. I’ll email you that later on. We use Vault, Vault for passwords and things like that. The access to those is … I don’t even know what they are, to be perfectly honest with you.”

Speaker 3: “That’s actually great.”

Nic : “I genuinely couldn’t tell you. I’m not just lying here. Yeah, I’ve never needed to know where they are. The way that the system works is that they … are just already there in the key value store for me to use as an application, so I’ve got no idea where the operations guys store the secrets. I’ve got every confidence that they are somewhere safe and that the password won’t [inaudible 01:01:46].”

Speaker 3: “You know who to call, right?”

Nic : “You’ve got a question there. Yeah: do we do TLS between services internally? We don’t. Right now, we are looking at how we’re managing our services and how we’re going to deal with a better level of authentication and a level of trust. We don’t want to go down the sort of desktop security process which we’re unfortunately verging towards at the moment. We are going to implement a principle of no trust between services. A service will have to pass a JWT, the JWT will have to be signed, and the receiving service will have to validate the signature before it will accept the request.”
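A minimal sketch of that validation step on the receiving side. The shared-secret HMAC and the jwt-go library are assumptions made to keep the example short; the plan described here, with rotating keys, would more likely use asymmetric key pairs.

```go
package main

import (
	"fmt"
	"net/http"

	jwt "github.com/dgrijalva/jwt-go"
)

// A shared HMAC secret keeps the sketch simple; real deployments
// would distribute and rotate this material carefully.
var key = []byte("not-a-real-secret")

// authenticated enforces "no trust between services": requests whose
// token is missing, unsigned, or tampered with are refused.
func authenticated(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		tokenString := r.Header.Get("Authorization")
		token, err := jwt.Parse(tokenString, func(t *jwt.Token) (interface{}, error) {
			// Reject tokens signed with an unexpected algorithm.
			if _, ok := t.Method.(*jwt.SigningMethodHMAC); !ok {
				return nil, fmt.Errorf("unexpected signing method %v", t.Header["alg"])
			}
			return key, nil
		})
		if err != nil || !token.Valid {
			http.Error(w, "invalid token", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```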

“We’re taking that a level further: we’re going to put TLS between the services, using self-generated keys to secure the traffic. So yes, we are … We don’t do that right now, but that’s something that we’re working towards implementing, hopefully, across this summer. We’ve got to figure a lot of things out around how we can rotate the keys around the various different services, and we’ve got half an idea of how we’re going to do that, but we really need to test it. We don’t want to be in a situation where we rotate a key and find out that none of our services can communicate because it hasn’t propagated properly.”

“Yeah, I think it’s a great pattern. There was a great talk. Again, it’s called desktop security. You can find it … I think it was at O’Reilly Conf in April earlier this year. You can find it on Safari Books and stuff like that. Definitely check that out. There are some great patterns around implementing security in your microservices.”

“I believe there’s also a great talk from DockerCon, which has just happened. I haven’t watched that yet. I’m going to check that out this weekend when I’ve got a little more free time, but yeah, check that out. Should you do TLS? Personally, my perspective is yes, yes, you should do it.”

Speaker 3: “It’s relatively easy [crosstalk 01:04:02]”

Nic : “It’s not going to cause any problems.”

Speaker 3: “… getting your certificates for authentication, and so they’re just protecting the channel. Yeah. We should probably talk a little bit more about the fun we’ve had with JWT and Java not too long ago. On that note, I think that we are out of time and need to wrap this one up. Thank you very much, Nic.”

Nic : “It’s been really great speaking with you all and thank you so much for listening. Seriously, any questions, I love talking about stuff. Just hit me up.”
