Ancestry’s Journey towards Microservices, Containerization and Kubernetes – Paul MacKay

Paul MacKay (Ancestry)

Description

Adopting new development approaches such as containerization is a big change for traditional enterprise environments. Ancestry, the global leader in family history and consumer genomics, has been a big data company since long before the term existed, with billions of historical records and millions of family trees, much of which ran in a traditional IT environment. With a new flood of genomic data from its AncestryDNA test and the desire to continue to increase the speed of innovation, Ancestry adopted containerization and microservices using Kubernetes orchestration APIs. This session will describe Ancestry’s journey to containerization and how a coherent and consistent API set such as Kubernetes can aid companies looking to make a similar transition. Paul MacKay, one of Ancestry’s software architects, will discuss what the company has learned during the past few years of development from both a technical and cultural-change perspective.

Transcript

Kelsey Evans:                         All right, hello everyone, and welcome back to day two of the Microservices Practitioner Virtual Summit. Thank you all for joining. I’m here with Paul MacKay from Ancestry, who’s going to talk about their adoption of Microservices and Kubernetes. Just to point out the features, if you have questions throughout the presentation, you can post them in the Q&A using the Q&A button at the bottom of your screen. There’s also a chat button. Feel free to use that as well. And additionally, if you would like to live chat with attendees during the presentation, you can use our Gitter channel, which is gitter.im/datawire/summit. So with that, Paul, I’ll hand it over to you.

Paul MacKay:                        Well, thanks. It’s exciting to be with you this morning. My name is Paul MacKay, I’m a Software Engineer Architect at Ancestry. And I’m excited to be a part of this summit, and to kind of explain some of Ancestry’s journey towards Microservices, Containerization, and Kubernetes. And hopefully, there are some things that you can glean from us, as we glean from others, to help you in such a process.

As with any type of journey, it’s always good to know where one has come from, and to figure out the items that you want to change. And also to ascertain along this journey the things that you’ve learned. And then, of course, to discover where we are and, as the adage goes, where we want to be up around the bend and up the grade.

Anyway, I’d like to start with just kind of where we come from. Many of you probably know and have heard about Ancestry. We truly are a science and technology company, and we try to make sure that we bring people together. As far as background goes, we started in 1983 as a publishing company. And back in 1996, ancestry.com was launched to be a site for family history.

And then over the years, we’ve become the global leader in family history and consumer genomics. And our whole goal, the reason why we’re here, is to help harvest all that information to provide connections with family trees and historical records, and DNA to help people find more about themselves and how they became who they are because of their ancestries and connections they have.

It’s interesting, with our DNA, that we’ve been able to … Ancestry DNA, also to provide some interesting connections with ethnic groups and cultures and histories that people may have not known about. And hopefully that people will find out that we’re more related than we’re not in this world.

As you can imagine, data drives our business. We have a lot of data. And what we try to do is present that data in a way that will help people achieve this goal that we have for them, to be connected. We have over 20 billion historical records, and beyond that we have 90 million separate family trees with over 10 billion profiles. There are, as is noted here, 175 million shareable photos, documents, written stories, and collections that provide an incredible wealth of information for people. A large sum of data, petabytes of data. And in the Ancestry DNA network, we have over four million members. And currently we have 37 million third cousins and closer matches that we’ve been able to find to help people, again, achieve that mission.

Let’s talk about some of the technologies that we’ve used in the past, what we’re currently using, and maybe some of the difficulties that we’ve had along the way. First of all, as I indicated, we began in 1996, and we started out on Microsoft Windows, using C# and the .NET Framework to provide our services. We of course used a lot of the great technology that Microsoft has, SQL Server, IIS, and other things, to grow our business, to grow our site.

And over time, we also adopted other open-source technologies, Java and Node.js, Python, and then ran them on other OSes such as Linux. We’ve had our own private data center, in which we’ve hosted Ancestry. And in that data center, as you can imagine, we have thousands and thousands of servers, and thousands of VMs running on those servers. And when we talk about the whole experience of all of Ancestry’s properties, we have hundreds of services.

Along the way, we have at times broken up our services from being very macro, or these large services, into micro-size services. And how they all interact with each other is usually through REST interfaces or through other protocols such as Thrift, Protobuf for high speed on the back end. We’re a traditional deployment shop, meaning that we believe in the adage of continuous integration and deployment. We use Jenkins and Go and other things that are very useful and great technologies to help us in managing our deployments.

What we’ve done in the past, as I mentioned, we were using a lot of .NET and other technologies that provide some great, rich environments. But what we did in our data center was kind of create this convention of one service, whether a large macro service or a microservice, per VM. And so the deployment model was such that you’d have to provision the VM and customize it for the environment that the service is to run in.

And this process would take quite a bit of time from our staging environments to production. And some services would take 20 minutes, 50 minutes, over an hour to get from that staging and integration type environment to the live production environment. So let me just talk about how our journey has been changing over time, and why we chose to change.

About three years ago I attended QCon, which is a great conference. And in that conference, the buzz was Docker. Now, in my former lives in other companies, I have worked with software virtualization, I have a Unix/Linux background, and so I understood the need, or the benefit I should say, of Linux containerization. And so when I saw Docker as being a way of providing such a great, nice runtime for containers I thought, “Wow, this is something that we ought to adopt.”

And so we began to experiment with Docker. And we used Docker Compose to facilitate bringing up various services to coordinate with each other. We actually created our own little agent, a Docker agent, to run on Linux boxes so that we could do some remote deployments. We could start and stop, we could destroy or delete, and then pull down these images and such. And it really turned out to be quite good. We were able to demonstrate to ourselves, as well as to others in management, that it was pretty easy to deploy and to scale up these services.
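
As a hedged illustration of the kind of remote operations such an agent performs, here is a minimal sketch using the Docker SDK for Python. This is not Ancestry’s agent; the registry, image, name, and port values are placeholders.

```python
# Hedged sketch of agent-style operations: pull an image, run it, then stop
# and remove it. Not Ancestry's agent; registry/image/port values are placeholders.
import docker

client = docker.from_env()

# Pull the image down to this host, then start a container from it.
client.images.pull("registry.example.com/hints-service", tag="1.0.0")
container = client.containers.run(
    "registry.example.com/hints-service:1.0.0",
    name="hints-service",
    detach=True,
    ports={"8080/tcp": 8080},
)

# Later, when rolling out a new version: stop and delete the old container.
container.stop()
container.remove()
```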

And we were able to really prove that the deployment times would be greatly reduced from the current method of a single service per VM. And again, because of this great model of containerization, microservices or any kind of services are just so much easier to deploy. And one kind of a nice benefit to all of this is that we were able to prove out the notion of a greater density than just having this one VM per service, and so we were able to utilize our computing resources more effectively.

And then what we were able to do is create our own. We had some services that we were experimenting with for the benefits of microservices, and so we broke them apart. Actual services that make up the site. And we actually started deploying them using this method, using Docker itself. Well, you can imagine that in such a process, you learn a lot. And so, I’d like to just discuss a little bit of some of the things that we did learn.

First of all, adopting new technologies is hard. It’s hard to do. And one of the reasons why, I think, is we developers are kind of stubborn. We are very comfortable with things as they are, how they’re done today. We kind of feel like the old adage, that hey, the old way’s working, why would I want to change? And we actually feel like we’re faster. We know that way of doing things, and so why would we need to change?

And it’s hard for us developers to actually see the advantages of some new technologies or paradigms. To be able to see that hey, changing to something new really would be beneficial. Especially given schedules, and the pressures that are put upon getting the features out. But as in any type of adoption of new technologies, processes and paradigms, there’s a real cost to it. You just can’t … There’s nothing free in this life.

And so what’s interesting about it is, that you have to recognize right up front that any type of adoption of new technologies takes away from developing the features, or things that you’re actually providing to your customers. I mean, we’re a genealogy and a genomics, a consumer genomics company. We’re not in it to create deployment methodologies or infrastructure, or whatever. We’re in it to create great features, secure and wonderful features for our customers.

And so, new technologies can be very disruptive to schedules, and it causes product management and other upper management to really have to be convinced as far as why would we want to divert our attentions, our resources, our efforts away from new features, or at least postponing them, to develop or to integrate new technologies into the company.

Well, so, those are some of the interesting challenges that we learned along the way. And some of the early discoveries are things some people would say, “Well, yeah, all that’s kind of obvious.” But developers have a lot of different opinions as to the appropriate size of their services. Some are very comfortable with large, macro, monolithic-type services, and others are very accustomed to the notion of microservices, to have single intent and all the benefits of a microservice.

And so you just have this wide spectrum of macro to micro, and those opinions, probably in your corporations too, still exist. The other thing is, when we understood that containerization was going to be a great benefit to us, we realized soon that the typical off-the-shelf Linux distros were just too big. One, they’re not built specifically to run containers or Docker. And they have just a very large footprint when it comes to deploying them off-the-shelf.

We also realized that with these kinds of off-the-shelf Linuxes, to just run the containers that we wanted to, there are so many packages that they would install. And to cull it down took a real process to make it lean and mean. And so we also found that more packages, of course, means more updates and perhaps even more of a footprint for security vulnerabilities.

We also found, as probably many of you have, that Docker containerization really needs newer Linux kernels than most of the supported distros that are out there. And in fact, most of the good environments for running Docker are running on almost the bleeding edge, the leading edge, of the kernel.

The other thing is that we discovered that even though we knew the benefits of Docker and containerization, and we knew the benefits of microservices and such, we needed some training. We needed to train again. We had a lot of Windows developers, good Windows developers, that were not accustomed to Linux, containerization, the various tools, the various concepts, and even some of the paradigms of how to decompose various services into smaller chunks.

And the other thing that we found out early on is, you just can’t dictate to developers how big a service should be. There are appropriate sizes, and not one size fits all. And so it was detrimental to try to say, “Okay, everything needs to be broken apart, everything needs to be in this type of a size or decoupled methodology.” What we also realized was, okay, we found out how this wonderful flow of taking containers and services, whether they be big or small, all the way to production is really facilitated using containers. But we found out that container orchestration is hard to do really well. And so it became a bit of a need to go beyond our own capabilities to accomplish that.

The other thing I think is important, maybe I should go back to a little bit too … Let’s go here. In adopting these new paradigms, I found in this exercise, as well as many other exercises, that there are some basic principles or patterns in any type of adoption of these new technologies or paradigms.

First of all, you have to understand your current technologies. You have to understand how developers do things today, why they do them, how they do them. And it’s that old adage, it’s better to understand first than to be understood. One cannot dictate a new technology just because it’s new. You have to understand how to adapt to somebody’s environment or philosophy in how they do things.

The other big thing is that in any adoption of technology and in changing, you need a patron. You need somebody to help give you that air cover. It gives you the time, the resources, the real ability to experiment, to do new things. And you just can’t do that without having somebody having your back. In that same vein, you need support from up above, but you also need this groundswell. So it is a very complementary, cooperative type of approach. You have that groundswell of pushing technologies from the developer level, but you also have somebody above that can do the things they need to do to help make that survive.

The other thing is that in any type of new technology adoption, as I mentioned before, you just can’t push it on people and just say, “Hey, new is good. And what I’ve read, Hey, I read it on the net, on the Web, it’s gotta be good stuff. We need to adopt it.” You really have to own your own stuff, meaning you have to change your own way of doing things. You need to adopt your own ways of adopting this new technology, in order to be real. You just can’t do simple POCs. You actually have to do something that shows that the technology is useful and is viable.

The other thing, too, is to really create a partnership. In any type of a rolling-out of a technology or a paradigm such as decoupling into microservices, you really need to create a partnership. A partnership with pilot teams. Now, when I talk about pilot teams, I’m not talking about just POCs or proof of concept. I’m talking about teams that have real problems, and also have a diversity of those problems. It just can’t be a simple way of showing, okay, this works for this team, so therefore it must work across all teams.

We wanted to make sure that the teams knew that we were with them, that if they were not successful, we were not successful. That we were part, meaning we had a service or services that were part of this change process. And there’s another need: to be very agile. It’s amazing when you adopt new technologies or new paradigms, that you’re really learning on the fly.

There are a lot of things that you can anticipate, but there are many things that you can’t. And so both from your own perspective, from those that are involved as far as these pilot teams, as well as the patrons or your management that is supporting you, there’s this notion that there will be mistakes made. And that’s okay, that we just need to work together, and that we’re all going to make each other successful.

The other interesting thing, when it comes to services, microservices, is this notion of what is the right size of a service? I mean, there are some nice best practices, and there are some good things that we’ve all heard and continue to hear, and there’s some great wisdom that’s been learned by many people who’ve gone through these processes. But one thing is that I think the key is to be very pragmatic. The notion of breaking up a service just because hey, that’s the vogue thing to do, is just not going to fly.

So we just don’t break up a service just to break up a service. And that there’s this notion that, again, that things are not free. That there is a cost in managing these services. There are things to consider, such as network latency, among these various services. The notion of monitoring now. If you break up a service into many services or microservices, you have to worry about monitoring. You have to worry about coordinating these, the deployments of all these services. You need to understand how you scale. Do you scale a portion of these microservices, or a subset of these microservices? Or do you scale all of it at once? So there’s a lot of things to consider, as far as the cost of managing these services.

And the other thing, too, is this notion of, are they really going to be used independently by other services? Yes, there are some things that are good for academia. But really, truly, down-to-earth, day-to-day: are these things going to be used by other services? Does it really make sense that these services will exist by themselves, and that it really is something that is useful for the ecosystem? And I guess at the bottom of every kind of decision is to be pragmatic and not dogmatic when it comes to determining the size of services. And I think it’s being very flexible in that pragmatic approach.

So as I mentioned with Linux, when we wanted to run containerization, we realized soon that we needed to find something that was really geared towards containerization, towards Docker. And so early on, we chose CoreOS. Many of you may be using CoreOS, or have heard of it. It’s a very lightweight OS that has running containers in Docker as a first-class citizen. It’s not kind of bolted on after the fact. It’s also nice that the OS updates are holistic. Meaning, updates can automatically be pushed to machines, to a specific partition, kind of a kernel partition if you will; it has two partitions. And so you can actually be running on one version of the OS, and then you can switch to another one. And if things aren’t working well, then you just go back to the previous version.

Again, there’s not this notion of bringing down all these packages and hoping they all work together. No, you update CoreOS as a whole. And then as I said, it’s nice to be able to roll back if there’s something that’s not working well. Again, coming from a Unix/Linux background, less is more. And so this notion that if you have fewer packages, you then have fewer vulnerabilities. The attack surface is less. And so this is an important factor when you’re considering this. And again, because only Docker containers run on there, and it’s so small as an OS, it’s very infrequent that we have a direct need, or a need for direct access to the actual machines.

So now let’s come to the realization of, how do we orchestrate with containers? As I mentioned, it’s not an easy task if you’re trying to do it all by yourself. And again, I think we all know, given this summit and others that have experienced it, containerization is really an incredible asset to anyone’s environment that wants to use microservices.

So as I mentioned, we started early on. And so, over two years ago, when I was looking at how to do orchestration of containers, and as I mentioned we kind of created our own agent to do it, and found out that it was kind of hard to do. I looked out and saw that there was this thing called Kubernetes. And the notion of Kubernetes was such that it was pre-beta. But I could see, given my previous experiences, that this was something that was going to be very, very useful to us.

At the time, of course, there were other orchestration-type tools such as Mesos, which came from a grid, large, large cluster environment, and those are wonderful for those types of environments. But we were coming from kind of a medium-small to larger environment. And in looking at the architecture of Kubernetes and where they were trying to go with it, I kind of put my stake in the sand and said, “Okay, this is going to be our orchestration technology,” even though it was beta.

But yet I felt comfortable. Again, when we talk about adopting technologies, new technologies or paradigms, it’s good, as they say, to stand on the shoulders of giants. And so it’s good to be able to see that people are actually doing or have done it, and they’ve got a track record. They’ve got some wounds to explain how to do things. And of course, as we all know, Google maintains, monitors, and launches billions of containers a week. And so we knew that the technology, or at least the idea, the architecture, that was coming out in an open-source form in Kubernetes was something that would be extremely viable.

So we created a very small, kind of a sandbox cluster, for these pilot teams. We gathered, as I mentioned, “committed” pilot teams. Now I say “committed,” and it’s not just saying, “Hey, that’s kind of cool stuff, I’d like to be part of it.” No, we really wanted committed pilot teams. We wanted their management to commit to it. We wanted the individuals that were a part of those pilot teams to be committed to this effort, as we were trying to break their services apart, and as we tried to use Kubernetes for the deployment of those containers.

And so, you really do need to have committed teams. We had daily stand-ups. We made sure that … We wanted to make sure that we addressed their problems. And again, when I talk about the committed pilot teams, we also wanted to make sure that they were a diverse set, that there were some hard problems to be solved. Solvable problems, not things that would just cause a short-circuit of any adoption because we just couldn’t do it. But hard and unique and different problems. Meaning, we had teams that came with some large services, we had some others that came with smaller microservices. We had some that came with Java, Node.js and Python or other things.

And so we wanted to make sure that we had a good spectrum to understand all the various nuances of adopting the new technologies. And that we made sure that they knew that we were there to stand with them every day, and find out how to address their concerns early on.

And so we did this, and we … Also as I mentioned, we had to provide some training, not just in the Docker and Kubernetes training, but Linux and other things, to kind of onboard people. It’s one thing to say, “Hey, this is cool,” but it’s another thing to make sure they feel comfortable, that they feel empowered to be able to adopt this, that they don’t feel like … That they’re so … It’s so foreign that they just want to go back to their old ways.

So we started providing training, and we also started providing some templates and best practices and scripts to really bootstrap, and to help them really jump-start their initiation into how to break up these services, and also how then to deploy them using containers and orchestrate them using Kubernetes.

So now we come down to kind of where we are. Where are we in this journey, and where do we think we’re going? Well, what we’ve developed over the past couple of years: as I mentioned, we started with Docker almost three years ago, and with Kubernetes about two and a half years ago now. It’s great that Kubernetes just last week had the second birthday of its first release. We actually went into production about a year and a half ago with some services.

And so given that couple of years of experience with Kubernetes, we’ve been able to develop some deployment standards, or at least suggestions, in deploying services. For one, we create a namespace for each service. It doesn’t matter the size, whether it’s a macro or a microservice. We feel that the notion, or the convention, of a Kubernetes namespace per service is very viable, and actually very useful.

As you know, in Kubernetes, services in a namespace have this notion of being self-discoverable. And so we came up with a naming convention, so we could kind of make sense of the big mess of all these services, and also make sure that it used the lowest common denominator of being DNS-friendly. So all of our namespaces, our services, follow a convention of a functional group name and then the actual service name, so that we can actually know what’s going on and who belongs to what, or what belongs to whom.
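
As a hedged illustration of this convention, creating such a namespace with the official Kubernetes Python client might look like the following; the functional group and service names here are hypothetical, not Ancestry’s.

```python
# Minimal sketch of the namespace-per-service convention with a DNS-friendly
# "<functional-group>-<service>" name. Names here are hypothetical examples.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
core = client.CoreV1Api()

functional_group = "dna"   # hypothetical functional group
service_name = "matching"  # hypothetical service name
namespace = f"{functional_group}-{service_name}"  # lowercase, hyphenated, DNS-friendly

core.create_namespace(
    client.V1Namespace(metadata=client.V1ObjectMeta(name=namespace))
)
```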

We also came up with this convention that you should only have one container per pod. Now, again, it’s a convention. We just feel like a pod is the deployment unit. And so therefore, oftentimes, just one container per pod is a viable paradigm to follow.
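
A sketch of what that convention looks like in practice: a Deployment whose pod template holds exactly one container. The namespace, labels, image, and replica count below are placeholders, not Ancestry’s.

```python
# Sketch of the one-container-per-pod convention using the Kubernetes Python
# client. Namespace, labels, image, and replica count are placeholders.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="matching", namespace="dna-matching"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(match_labels={"app": "matching"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "matching"}),
            spec=client.V1PodSpec(containers=[
                # exactly one container in the pod, by convention
                client.V1Container(
                    name="matching",
                    image="registry.example.com/dna/matching:1.0.0",
                    ports=[client.V1ContainerPort(container_port=8080)],
                ),
            ]),
        ),
    ),
)
apps.create_namespaced_deployment(namespace="dna-matching", body=deployment)
```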

The other thing that we found useful, and continue to find useful, is we started with very wide privileges. And this is dealing with the deployment model. And we’ve only narrowed them as we’ve needed to. The areas that we’ve narrowed would of course be secrets, or other things that are necessary to provide the very secure environment which we have to have here at Ancestry. But we still did, and still do, allow developers to deploy all the way to production. And as I mentioned, we do control certain things that need to be controlled by operations and our InfoSec group.

Another thing that’s, I don’t think unique to us, but something that we’ve been able to establish as a convention, is we have multiple clusters to provide all these deployments into. Some people use a cluster and then use namespaces to divide those up. We find that it’s more secure for our purposes and easier to do, to create separate clusters for each environment. So we have a dev environment, we have a staging environment, and then of course we have a production environment.

And as I mentioned, we do have our own private data center, but we also are moving to the Cloud. And with our great partnership with AWS, we then have kind of a high [inaudible 00:31:26] environment. So we actually have clusters here in our data center, as well as we have clusters in AWS. And so we find that that’s a very good model for us in our deployment.

The other thing that’s wonderful about Kubernetes, as you may know, in deploying microservices, is this notion that there is an intra-cluster DNS server, kube-dns. And there’s service discovery. And so one doesn’t have to go out of the cluster to talk to another service if they don’t want to. So again, as I talked about with microservices, one of the drawbacks could be network latency. The thing that we found is that that is greatly reduced in Kubernetes by using the intra-cluster DNS service.
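
A hedged sketch of what that looks like from a calling service: the standard in-cluster DNS name keeps the call inside the cluster. The service, namespace, port, and path are illustrative, not Ancestry’s.

```python
# Calling a sibling service through kube-dns service discovery. The DNS form
# <service>.<namespace>.svc.cluster.local is standard; the names are made up.
import requests

url = "http://matching.dna-matching.svc.cluster.local:8080/api/v1/matches"
resp = requests.get(url, timeout=2)  # traffic never leaves the cluster
resp.raise_for_status()
print(resp.json())
```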

Our production environment is … As such, we do have our own Docker registry. As I mentioned before, we don’t have a dogmatic approach to the size of a service. So each of the services could be large or small, but each of them do have their own repository in our private registry.

For monitoring, as I mentioned, with microservices and any services in general, monitoring is a huge concern. Making sure that things, at any size, are conforming to the SLAs, the service that you really want to render for other services that are calling you, as well as of course for your ultimate customers. And we use Prometheus to do that. Again, it knows Kubernetes intimately, and so we can go from the container level to the pod level to understand how the services are running. So it’s been a very useful tool for us.
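
One common way to expose pod-level metrics for Prometheus to scrape is the prometheus_client library; the sketch below is a generic example under that assumption, not Ancestry’s instrumentation, and the metric names and port are illustrative.

```python
# Minimal sketch of exposing metrics for Prometheus to scrape at the pod level.
# Metric names and the scrape port are illustrative.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("matching_requests_total", "Total requests handled")
LATENCY = Histogram("matching_request_seconds", "Request latency in seconds")

if __name__ == "__main__":
    start_http_server(9090)  # Prometheus scrapes this port via Kubernetes discovery
    while True:
        with LATENCY.time():                   # record how long the "work" takes
            time.sleep(random.random() / 10)   # stand-in for real request handling
        REQUESTS.inc()
```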

As I mentioned also, with all these services, these hundreds and hundreds of services, logging becomes a very crucial thing. What we’ve done is, we’ve created our own cluster-wide logging. We’ve indicated that if you want to participate in this, all logging is put to standard out. And so then we have, on each node, this log [inaudible 00:33:56], and this sends it off to a centralized Kafka cluster, and we then use ELK and other things so our teams understand where their logs are and how they can get their logs, no matter the size of the service.
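
Inside a service, the convention of writing structured logs only to standard out, with shipping handled outside the process, might look like this sketch; the field names and service name are illustrative.

```python
# Sketch of the log-to-stdout convention: the service writes JSON lines to
# standard out and a node-level forwarder (not shown) ships them to Kafka/ELK.
import json
import logging
import sys

handler = logging.StreamHandler(sys.stdout)  # stdout only; no local log files
handler.setFormatter(logging.Formatter("%(message)s"))
log = logging.getLogger("matching")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info(json.dumps({
    "service": "dna-matching",   # illustrative service name
    "level": "info",
    "msg": "match computed",
}))
```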

We also created a namespace portal that we use, so as I talk about these microservices or macroservices, and all these different clusters that we have, we made it so that it was easy to create these namespaces, these services, and that it would be propagated across all of the various clusters, both on-prem and in AWS.

We also created an authorization framework, if you will, for Kubernetes that allows us to have a finer granularity. So we can start very broad, and narrow as we need to, the capabilities that developers have inside the environment.

And one thing, too, is that we require resource quotas. It doesn’t matter how small the service is; there’s an impact on the whole when you’re putting them on the node. So we do require that there is a CPU and a memory quota for every service. And the reason why I say it’s a soft quota is because we kind of establish, like, here’s the bar right here. But given, of course, the need of a team, they could come and say, “Oh, I really need this number of cores and this amount of memory,” and we can certainly up it for them.
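
A sketch of such a per-service quota, created in the service’s namespace with the Kubernetes Python client; the CPU and memory figures are placeholders, not Ancestry’s defaults.

```python
# Sketch of a per-namespace ResourceQuota enforcing CPU and memory limits for a
# service. The namespace name and quota numbers are placeholders.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="default-quota", namespace="dna-matching"),
    spec=client.V1ResourceQuotaSpec(
        hard={
            "requests.cpu": "2",       # placeholder "bar"; can be raised per team
            "requests.memory": "4Gi",
            "limits.cpu": "4",
            "limits.memory": "8Gi",
        }
    ),
)
core.create_namespaced_resource_quota(namespace="dna-matching", body=quota)
```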

We have also provided some quick-start tools. Again, this notion that if every team has to adopt this new technology or paradigm, and they have to do it all themselves, reinvent it all themselves, that provides kind of a roadblock, a barrier to adoption. And so we created some internal tools that would help. The tools quickly deploy any size of service. And what’s nice about these tools is they work across all clusters. So you can deploy, with a very easy pipeline, across any of these clusters that we have.

Also, it’s nice to be able to have some tools that provide some best practices or conventions. For instance, the services have certain annotations in AWS to make sure that they’re private, not public, ELBs, or that they have certain X-Forwarded-For-type things, so that we can see IP addresses and such.
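
For example, the upstream Kubernetes AWS annotations for an internal load balancer and for preserving client IPs can be attached to a Service as sketched below; annotation names and accepted values vary by Kubernetes version, and the service details are placeholders, not Ancestry’s.

```python
# Sketch of a LoadBalancer Service carrying AWS-specific annotations: keep the
# ELB internal (private) and enable proxy protocol so client IPs are preserved.
# Accepted values vary by Kubernetes version (older releases documented
# "0.0.0.0/0" for the internal annotation); names and ports are placeholders.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

service = client.V1Service(
    metadata=client.V1ObjectMeta(
        name="matching",
        namespace="dna-matching",
        annotations={
            "service.beta.kubernetes.io/aws-load-balancer-internal": "true",
            "service.beta.kubernetes.io/aws-load-balancer-proxy-protocol": "*",
        },
    ),
    spec=client.V1ServiceSpec(
        type="LoadBalancer",
        selector={"app": "matching"},
        ports=[client.V1ServicePort(port=80, target_port=8080)],
    ),
)
core.create_namespaced_service(namespace="dna-matching", body=service)
```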

The other thing, too, is that we want to make sure that there’s no magic behind any of our tools. We don’t want to get in front of any of our developers. One of the things that I have seen in the past is sometimes dev ops can control too much of the environment, and there’s just too much behind the curtain. Nobody knows what’s really going on. And so we try to be very transparent. We provide this quick, nice tool to aid in deployment. But we make sure that it’s transparent, that they can generate just the standard resource files for Kubernetes, and then they can deploy it themselves that way if they want to. But we do have this tool to help them.

We’ve also created some scripts to help with the insertion of secrets, and to version them. We also provide some tools that we created to back up our clusters. And my old adage is, I really don’t care about backups, I care about restores. And so there’s the notion that we can actually stand up a cluster very readily and rehydrate everything that exists into that environment.

And then we have created these other scripts, as I mentioned, that help us to create clusters on-prem or in AWS very quickly, whether they be for individual teams, or for R&D-type purposes, or for specialized environments that need to be there. And what’s nice about all this is that we’ve been able to facilitate the deployment of any type of service, any size of service. We’re not dogmatic, as I mentioned before, in what that means or what that looks like.

So, where are we? And where do we hope to be in the future with microservices, containers, and Kubernetes? Well, as I mentioned, we do have several clusters. I think the last count was nine clusters, both on-prem and in AWS. In the production clusters, there are hundreds of nodes, and in those environments we have hundreds and hundreds of namespaces, or individual services, and then thousands of pods running within them.

I also mentioned that we have our private data center, but we’re also moving to the Cloud. And so again, we’re not stuck with any one solution. We can use Kubernetes to deploy across both our own data center and into the Cloud. And as I mentioned, we do have hundreds of these services, and it doesn’t matter what the sizes are.

We do have live production traffic, as I mentioned. One of the interesting things that we launched early on is an app called “We’re Related.” And with the We’re Related app, it’s kind of interesting, this team decided to take this adage of breaking things apart. And so it is made up of 14 different microservices that coordinate among themselves to provide this single experience for our customers. And what’s nice about it too is, as I mentioned, that inside Kubernetes, to reduce the network latency, these microservices don’t have to go outside of the cluster at all to communicate with one another. And so it really does provide a really seamless, wonderful environment for them.

And so we have very many of the services that Ancestry provides running in Kubernetes today. And we truly feel that containerization is really the easiest path for developers to make those decisions, the appropriate decisions at their level, as to what size their services should be and how they decouple things.

So as I mentioned, I think everyone who’s a programmer is familiar with this notion of a REPL environment, this notion of read, evaluate, print, and loop. What I find with Kubernetes, in deploying services of any size, is that now you have this compile, deploy, execute, and loop. And I think that’s important for this audience, in microservices specifically, because there’s no longer this barrier of coordinating deployment when experimenting with the various sizes and decoupling. Now you can truly compile, get it deployed, figure out what it looks like in the environment, and then you can iterate and figure out what’s appropriate or not.
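
A toy sketch of that compile-deploy-execute loop, driving standard kubectl commands from Python; the manifest path, deployment name, and namespace are placeholders.

```python
# Toy sketch of the "compile, deploy, execute, loop" cycle: apply a manifest,
# wait for the rollout, then inspect the result before iterating again.
# Manifest path, deployment name, and namespace are placeholders.
import subprocess

def deploy_and_inspect(manifest="deploy.yaml", deployment="matching", ns="dna-matching"):
    subprocess.run(["kubectl", "apply", "-f", manifest, "-n", ns], check=True)
    subprocess.run(["kubectl", "rollout", "status", f"deployment/{deployment}", "-n", ns],
                   check=True)
    subprocess.run(["kubectl", "logs", f"deployment/{deployment}", "-n", ns, "--tail=50"],
                   check=True)

if __name__ == "__main__":
    deploy_and_inspect()  # edit code or manifest, then loop again
```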

And so we find that Kubernetes and containerization really have been a great boon to us. And as such, we hope that Ancestry is providing the services that people, our customers, are wanting. Again, it’s wonderful to leverage the community that’s out there, both the open source community and others, that helps us provide the services. Again, we’re a consumer genomics and genealogy and family history science company. So we’re not here to build all these things ourselves.

So with that, I’d like to maybe just open up to any questions that you might have for me. And if there’s anything I can explain more about what I’ve said.

Kelsey Evans:                         Great. Thanks, Paul. So, we have had a few questions come in. If you didn’t get a chance to type yours yet, go ahead and put those in the Q&A or the chat section. So we’ll get started with these. So Paul, you mentioned how you have separate dev, stage and production clusters. How do you ensure isolation between teams working on different services? For example, you don’t want a team working on a service X to break everyone else who depends on service X during the development process.

Paul MacKay:                        Yeah. Well, that’s a good question. In fact, I’m not sure if we have a silver bullet for that. As I indicated, we do have the separate clusters for dev, stage and live. And so yes, it does provide this notion that you can change a version of your service. You can maybe tweak a little bit, hopefully not break everybody. But because we have these separate clusters, one can deploy it to the dev cluster, experiment with it, coordinate with your customers that are using your services, your clients, and then propagate it up. So we find that that model works well. And also this notion of separate namespaces. You know, that there’s not this stepping on each other’s toes with config maps or secrets, or what have you. So that’s kind of our model right now.

Kelsey Evans:                         Okay. We’ve also … We’ve gotten a lot of questions about whether or not the video and the slides will be available. And you will be getting an email, all of you, later this afternoon with the video and the slides, so you can see them.

Okay. So, how did you address Docker security concerns?

Paul MacKay:                        Well, so, the key with Docker security is this notion that, one, as I mentioned, it starts at the OS level, and it really applies to containers in general. Meaning, you want your runtime containers to be very, very lightweight and small. You don’t want to create a container that has a whole distro in it. So a lot of our services are built on Alpine, and we utilize other methods to inspect the containers to make sure that there aren’t any vulnerabilities in the libraries that they’re using. Again, we try to keep it very, very small.

And, as I mentioned, with CoreOS, they’re very, very tightly aware of Docker and security. So we leverage that OS greatly to help make sure that our security concerns are met. Again, data is our company, and we want to make sure that we’re extremely secure. And so we’re constantly analyzing, monitoring, and measuring our security infrastructure. So, yes there are concerns, we continue to look at them and to address them.

Kelsey Evans:                         Okay. How long did it take to deploy Kubernetes?

Paul MacKay:                        So I assume the question is, how long did it take for us to deploy, meaning from our initial sandbox cluster to production? I assume that’s the question. We started this effort about two years ago now with the pilot groups. And because it was rather new, both the Docker and also the Kubernetes, we took several months to make sure we understood all the various nuances. And so the actual production traffic did not occur until a year and a half ago. So I would say it took us about three or four months to make a full adoption into a production environment. It takes a little bit of effort, but I would say now it’s a lot easier to do than it was when we started.

Kelsey Evans:                         In VM or Legacy versus containers, how do you break out your SLAs or QOSes while using containers?

Paul MacKay:                        That’s a very interesting question. So by convention inside of Ancestry, every service has to have an endpoint to give its SLAs. Meaning, each endpoint gives a status of its dependencies, the downstream, as well as its own criteria, as far as where [inaudible 00:46:51]. So it really hasn’t changed between VMs and containers. These services inside the containers still have these endpoints. The SLAs have to be established and well-maintained between the clients and the service. And so I’m not sure if we’ve really found any difference or any problems in going to containerization versus the VMs. But we have this convention, and we’re constantly monitoring those QoSes and the SLAs.
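
A hedged sketch of such a status endpoint as a small Flask service: it reports its own state plus the state of its downstream dependencies. The path, dependency names, and URLs are illustrative, not Ancestry’s convention verbatim.

```python
# Sketch of a status/SLA endpoint that reports the service's own health and the
# health of its downstream dependencies. Names, paths, and URLs are made up.
import requests
from flask import Flask, jsonify

app = Flask(__name__)

DOWNSTREAM = {
    "hints": "http://hints.tree-hints.svc.cluster.local/status",  # illustrative
}

@app.route("/status")
def status():
    deps = {}
    for name, url in DOWNSTREAM.items():
        try:
            deps[name] = "ok" if requests.get(url, timeout=1).ok else "degraded"
        except requests.RequestException:
            deps[name] = "down"
    return jsonify({"service": "dna-matching", "status": "ok", "dependencies": deps})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```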

Kelsey Evans:                         How big is your team, and how many teams were involved in completing this effort?

Paul MacKay:                        Well, so, what’s interesting about it, for those that have used Kubernetes, or at least are investigating it, kind of the interesting adage within a lot of the corporations, specifically like Netflix and Google and others, is that as teams grow, meaning development teams grow, the support groups should not grow. Automation, other things that facilitate the environment, should take over rather than personnel.

So actually, we started small and we remained small. We started with three individuals on our core team, and we actually have four right now. And what’s interesting is, as I mentioned before, how important it is to own your own service as well, to be real. We also own a service that is a vital part of Ancestry’s site. So if we do things badly, it affects things. So even though it’s a team of four, we have a service that we maintain as well as all of these clusters.

So a relatively small team, and it’s been incredible, as far as the community of Kubernetes, as to help one another out. I started the Utah Kubernetes meetup here, and we have an incredible number of companies and teams that we all kind of coordinate and collaborate. So it’s … You don’t have to have a huge … You can start small, and it seems to be okay to grow big and be small, as far as teams.

Kelsey Evans:                         Great. What benefits have you seen from using the namespace per service model?

Paul MacKay:                        As I mentioned before, what we found is that it provides a good containment of security credentials, of config maps. So again, there’s not this stepping on each other’s toes. We’ve facilitated again with tools and other things, that it’s not a burden to have so many. And so people are … As I mentioned, like with the We’re Related app or Hints apps or others, that they may have tens of microservices that are each individual namespaces, but they’re easily deployed either as a whole, or not using some of the tools as well as just [inaudible 00:49:59] in general. So we’ve found it to be very, very beneficial to have this separation, or this sizing of … or one service per namespace.

Kelsey Evans:                         Great. Okay, well that looks like all the questions that have come in. If anybody had a question that they didn’t get a chance to ask, feel free to go over to the Gitter throughout the day and post it there. Again, it’s gitter.im/datawire/summit. So with … Oh, there’s one more. What are the big challenges in integration of AWS with your private cloud?

Paul MacKay:                        One thing that’s nice about the community, is there’s a lot that the community has put forth to facilitate that integration with AWS. I’m assuming that what you’re talking about is, of course we have our own clusters inside the, our private cloud if you will, in AWS. But if you look at the cloud providers that Kubernetes has, one of them is AWS. So when you create … For instance, as an example, when you create a service in Kubernetes, ELBs are automatically created. You can also make sure that it’s automated across availability zones. You can also specify that it’s private, not public, when you first create it. You can also make sure that if you want to do SSL termination, that the ELB, it can actually be done at the ELB.

But as far as Kubernetes goes, it’s all seamless and underneath you as a developer. As far as challenges go, I think there have been challenges in making sure that we understand … For instance, for those who are familiar with AWS, there’s the notion of IAM for roles and such. We actually have augmented an open source project called kube2iam, that’s a numeric two, to facilitate it so that our pods, running on a Kubernetes node, can assume a role, and then can do the various things that the policies associated with that role allow.
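
With kube2iam, the role a pod may assume is typically expressed as a pod annotation; the sketch below uses a placeholder role ARN, labels, and image, and is an illustration rather than Ancestry’s configuration.

```python
# Sketch of a pod template carrying the kube2iam role annotation so the pod can
# assume an AWS IAM role. The role ARN, labels, and image are placeholders.
from kubernetes import client

pod_template = client.V1PodTemplateSpec(
    metadata=client.V1ObjectMeta(
        labels={"app": "matching"},
        annotations={
            # kube2iam intercepts metadata-API calls and hands out credentials
            # for the role named here.
            "iam.amazonaws.com/role": "arn:aws:iam::123456789012:role/matching-service",
        },
    ),
    spec=client.V1PodSpec(containers=[
        client.V1Container(
            name="matching",
            image="registry.example.com/dna/matching:1.0.0",
        ),
    ]),
)
```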

We also have augmented it so that we have a proxy agent up in the cloud, and in our private data center, in our private Kubernetes clusters, it’s seamless. Meaning the pods even in our private data center can assume roles up in AWS and have access to AWS resources in a very secure manner. So there are some challenges, but I think we’ve been able to overcome a lot of them.

Kelsey Evans:                         Okay, and another one. Do you use only Docker containers? Do you have any limitations of Docker being vendor property versus open source? Did you try LXC containers?

Paul MacKay:                        Well, so, that’s a good question. As I mentioned before, from my previous experiences at other companies, I understood the notion of zones and chroots and containerization and LXC. What I found with Docker specifically is that it just wraps it all so much more nicely for regular developers. Meaning you don’t have to know all the intricacies of LXC to do it well. As you may know, there are other container runtimes. We currently do use Docker, but there are also other things that we use; some containers for the cluster use rkt (Rocket), because we’re on CoreOS. But we haven’t really had too much of a concern. There’s the OCI, the Open Container Initiative, that’s out there too. So it doesn’t really worry us, I guess, and we haven’t seen too many limitations in what we’ve seen in Docker right now. So hopefully that answers the question.

Kelsey Evans:                         Okay. All right, so those are all of our questions that have come in. Again, if you didn’t get to post them, feel free to post them in the Gitter. So that wraps up our presentations for today. We will be back tomorrow at 1:00 p.m. Eastern. Rafael [Schloming 00:54:18] will be talking about how to have developers adopt microservices. So we’ll see you all tomorrow, and thanks so much, Paul.

Paul MacKay:                        Thank you.
