Microservice Ecosystems at Scale

Randy Shoup (Stitch Fix, eBay, Google)

Description

While first-order goals are almost always driven by the needs of scalability and velocity, this evolution also produces second-order effects on the organization. This session will discuss building and operating modern microservice architectures at scale, using specific examples from both Google and eBay.

Transcript

Randy: Great, so thanks for having me. Yeah, so I’m Randy Shoup. I used to work as Chief Architect at eBay and then I was Director of Engineering at Google for Google App Engine, which is Google’s Platform as a Service. And the reason why I tell you that is because I’m going to use both of those examples to tell you about what microservice ecosystems are like at large scale. And what I want to leave you with is, in some sense, how it feels to be in one of these large-scale microservice ecosystems.

Because what I found in talking to people is that it’s a bit non-intuitive and non-obvious how it might actually feel to be in a world where there are hundreds and thousands of microservices flying around with lots of different teams. Okay, but first I want to start with how companies typically get to this point. So depending on how you count, eBay is sort of on the fifth generation of its architecture. It started famously about 20 years ago when the founder wrote the first version of eBay in a three-day weekend, so that was a monolithic Perl application.

The next generation was a monolithic C++ application, which at its worst was 3.4 million lines of code in a single DLL. So you want to talk monolith, there you go. Don’t do that, by the way. Next generation, the third version, was a rewrite in Java and that wasn’t service-oriented but it did partition the application up into lots of smaller, individual applications. And then I think it’s fair to say, on its current generation, that eBay is a polyglot set of microservices.

So Twitter has gone through a similar evolution. It started famously as a monolithic Rails application. And my friends there, I love this, my friends there call it the Monorail, so what a wonderful pun there. The front end got pulled out of the Monorail and became more JavaScript. The back end became more services, mostly written in Scala. And now I think it’s fair to characterize Twitter as, again, a set of polyglot microservices.

Amazon has gone through a similar evolution. I would say that we’re sort of less clear on the demarcations between the architecture generations, but it started as a monolithic C++ application, migrating out a bunch of the back end into Java and Scala services. And now, Amazon’s a great example of, again, a polyglot microservices architecture. So there’s clearly a reason to talk about microservices, because a lot of the large scale companies have kind of ended up moving there.

But I want to talk to you about what it feels like to be in that end state. So first I want to talk about the ecosystem of services and what that is like. Next, I’ll talk about designing a service for one of those microservice ecosystems. And then I’ll talk about building and operating a service in one of those ecosystems. So first we’ll talk about the ecosystem. As you might imagine, at large scale, we’re not talking about ones, or tens, or dozens of services. We’re typically talking, at a Google or Amazon scale, of hundreds to thousands of independent services.

And as Ben noted in answer to one of the questions, we’re not actually talking about tiers of services. We’re talking about many layers of dependencies that end up being more of a graph than a hierarchy. So there’s a graph of relationships among the individual services, rather than some sort of strict tiering or strict hierarchy, as we might have built enterprise applications 10 or 20 years ago.

So I’ll give a particular example of that layering with some Google services. And I can tell you this particular one because all the services are public or publicly known about. So at the top layer is a service that Google offers as part of its cloud platform called the Cloud Datastore. And that’s a NoSQL service. As you might imagine, it’s highly scalable, highly resilient. It has strong transactional consistency, which is relatively rare for NoSQL systems. And it has some pretty SQL-like, rich query capabilities.

So this is the thing we see from the outside, and if you use Google Cloud Platform, this is the NoSQL database that you can use. Okay, that is built on a service called Megastore. There was a paper about this in 2009. That is a geo-scale or, to use Amazon terminology, a regional-scale structured database. So Megastore offers multi-row transactions and synchronous cross-data center replication. So any write that goes to any of the elements of the Megastore ring is going to be replicated to all the other data centers in that ring before the ack goes back to the client.

So that’s one of the important things that Megastore provides. Megastore is built, in turn, on another service called Bigtable. There was a paper about this in 2006. That’s cluster-level or data center-level structured storage, where we can address a cell with a triple of a row, a column, and a timestamp. Bigtable is in turn built on Google’s clustered file system, the current generation of which is called Colossus. There was a paper about that in 2004. And that does file system stuff, so that’s block distribution and replication within a particular cluster, meaning within a particular data center.
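
To make the (row, column, timestamp) addressing concrete, here is a minimal, hypothetical Python sketch of a Bigtable-style cell map. The class and method names are invented for illustration and are not Google’s actual API.

    from collections import defaultdict

    class ToyTable:
        """Toy in-memory table that addresses cells by (row, column, timestamp)."""

        def __init__(self):
            # (row, column) -> {timestamp: value}
            self._cells = defaultdict(dict)

        def put(self, row, column, timestamp, value):
            self._cells[(row, column)][timestamp] = value

        def get(self, row, column, timestamp=None):
            versions = self._cells.get((row, column), {})
            if not versions:
                return None
            if timestamp is None:
                # Default to the most recent version, as Bigtable-style stores do.
                timestamp = max(versions)
            return versions.get(timestamp)

    table = ToyTable()
    table.put("user:42", "profile:name", 1, "Alice")
    table.put("user:42", "profile:name", 2, "Alice B.")
    print(table.get("user:42", "profile:name"))     # "Alice B." (latest version)
    print(table.get("user:42", "profile:name", 1))  # "Alice"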

All of those things are built on Google’s cluster management infrastructure, which is code-named Borg. There was a paper about that in 2014 or 2015, something like that. And that’s going to do sort of cluster management stuff. So that’s going to do task scheduling, machine assignment, resource allocation, and that sort of thing. So you look at this and it seems like a pretty reasonable layering, right? You can see that every layer in this sort of layer cake is doing something that the layer below couldn’t have done very well and it’s providing extra capabilities.

And you can see how you might build it up this way. But it turns out this system was never designed from the top down. It actually evolved from the bottom up. And you can see that from the dates on the papers that I gave you: first the file system was built, then the structured storage, then Megastore, and so on. So in one of these microservice ecosystems, it’s really much more evolution than intelligent design. Does it make sense what I mean?

It’s not a top-down, sort of God-like architect that sees the Cloud Datastore and then manifests the layers below it. It’s much messier and much more evolutionary, as we might expect in biological systems. So there was never a centralized top-down design of that system or really any system at Google. It’s much more like, to borrow a biological term, variation and natural selection. So we create and extract new services when we have a particular need to solve a particular problem.

And services continue to be invested in. They continue to sort of justify their existence as people continue to use them. So as long as a service continues to be used, we continue to invest in it and keep it up and running. And then when services are no longer used, they get deprecated. So there’s a fair amount of this churn of new services coming in and new versions of existing services, and then services being deprecated and retired. But there’s a little bit of an ironic Google engineering proverb where people say, “Feels like every service at Google is either deprecated or not ready yet.”

It’s actually not true in practice, but it can sometimes feel that way in one of these large-scale ecosystems because there are so many new services being built and then so many services being retired or deprecated. So is this an example of an architecture without an architect? It totally is. So at Google, there is nobody who has the title or the role of Architect. And that appearance of clean layering, of wow, that really seems like a clean way of solving that Cloud Datastore problem, that was completely an emergent property of individual groups at Google solving the problem that they had at a particular moment, building a nice, clean, well-designed system to solve that problem, and then, when the next order of problems arose, building the next service and the next service and the next service.

And so there’s actually no central approval for technology decisions at Google. So the vast, vast majority of technology decisions that are made, are made actually locally within individual service teams, rather than made globally. Does that make sense? Okay. So that was a little bit about what an ecosystem at large scale can feel like. Now I want to talk next about how one would design services within one of those ecosystems.

So I’m surprised we’ve gotten three talks in at a Microservice Summit and no one’s really defined a microservice, but I guess we sort of all know it when we see it. I will define, very briefly, the characteristics of effective services. In my mind, they should be single purpose. They should have a simple, well-defined interface. Ideally, they do one thing and they do one thing very well. They’re modular and independent so we can compose them together and make lots of different, interesting applications out of the same service or combinations of services.

And as we’ll talk about in a couple of moments, one of the critical aspects of an effective service within this kind of microservice approach is that the individual services have their own isolated persistence, so they’re not sharing databases essentially among the individual services. We’ll talk a little bit about why that’s true and examples of when we didn’t do that in a couple of moments. So, now I want to talk a little bit about from the perspective of designing services, some anti-patterns that I’ve seen.

So when you’re maybe new to services or you’re in an organization that’s new to services, there are lots of ways of going wrong, and there are sort of three patterns that I’ve seen that I want to sort of highlight here in the hopes that you’ll be able to avoid the mistakes that I personally have made. So the first mistake when people have moved to services is often what I’ll characterize as the megaservice, so this is by contrast to microservice. A megaservice is one that’s trying to do too much. It has this overbroad area of responsibility, which means that it’s really difficult to reason about what that service does and difficult to change the service.

It also, pretty obviously, is going to lead that service to have a lot more upstream and downstream dependencies. If that particular service is doing a huge percentage of the work that you’re doing at that company, almost everybody’s going to be depending on it and it’s going to depend on almost every sort of underlying service that you have there. So that’s an anti-pattern that is actually one of the reasons why I think historically service-oriented architectures have sometimes gotten a little bit of a bad rap in enterprises, is that we’ve taken this megaservice approach rather than building services that are really small and composable.

So the next anti-pattern that I wanted to talk about is something that I call the leaky abstraction service. And Nic, who just spoke from New Relic, had an even better way of describing it. This is what he called the inside-out service, where the service provider essentially builds the interface that matches the service provider’s view of the world. Right? So a simple and obvious example of doing this incorrectly is, “Well, I’m going to store the state for this in a database table.”

So essentially my interface will be create, update, delete, and just sort of mirror what you can do with a database table in the interface. And just as Nic said a few moments ago, it’s much better to design the service outside-in. It’s much better to think of the consumer’s model of how the interaction should go, and express that as part of the interface. And it actually ends up being a lot easier for you, as a service provider, if you do that. First because the consumer’s model ends up being a lot simpler.

It’s typically a lot more abstract than the one that you would do that mirrors your implementation. And it also tends to be a lot more aligned with the domain. So it actually tends to be a much simpler and easier interface to express. But also, if we leak our own model, our own implementation details, into the interface, it makes it really difficult for us to iterate on the service. If my implementation details are exposed in the interface, it’s very difficult for me to switch to another way of storing the data or another model of representing it.
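
To illustrate the contrast, here is a hypothetical Python sketch; the names are invented for the example. The first interface leaks a table-shaped implementation, while the second expresses the consumer’s model of the interaction.

    # Inside-out (leaky): mirrors the database table, so every client is
    # coupled to the rows and columns we happen to store today.
    class OrderRowService:
        def insert_row(self, row: dict) -> int: ...
        def update_row(self, row_id: int, columns: dict) -> None: ...
        def delete_row(self, row_id: int) -> None: ...

    # Outside-in: expresses the consumer's model of the interaction, which
    # leaves us free to change how (or where) the data is stored.
    class OrderService:
        def place_order(self, customer_id: str, items: list) -> str: ...
        def cancel_order(self, order_id: str) -> None: ...
        def get_order_status(self, order_id: str) -> str: ...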

So that really constrains the evolution of the implementation if we do it that way. So the final anti-pattern that I want to mention here is when we have shared persistence. And the critical aspect, as we mentioned, about microservices is that they have a simple, well-defined interface that completely encapsulates what that service does. When we share our persistence tier, like we share a database with another service or another set of applications, we’ve essentially broken that encapsulation boundary.

So if other people can read and write to the tables that we’re storing, essentially those tables become another aspect of our interface whether we like it or not. Does that make sense? What we want to have happen is that the only way in and out of the service is through the published, well-defined, supported interface. And when we share a data tier, we end up having this really unhealthy and really almost invisible coupling with other services.

When eBay first went into service-oriented mode, they broke up the business logic area into individual services, but the databases all still remained shared. And that ended up really constraining the effectiveness of that move to services, because whether people were doing it intentionally or unintentionally, maliciously or otherwise, lots of systems were able to break the boundaries of those individual services by going direct against the databases.

So this is a strong reason why, as I mentioned before, that a good microservice is going to have its own persistence that no one else gets to see. So I wanted to spend, since this is a kind of really important point, I wanted to spend a little bit of time talking about what are some ways we can achieve that. So the first obvious way that we can achieve that is that every service team operates its own data store.

And this totally works and it’s a perfectly legitimate approach. So the service team that I’m running is going to operate its own instances of whatever data storage mechanism we’d like: MySQL or whatever. And that’s going to be owned and operated by our team. So that’s a perfectly legitimate approach. Another way, which has become much more common and much better and easier to use in the cloud era, is to leverage a persistence service, so to leverage a service that’s operated on our behalf by somebody else.

So we still want it to be our own little area, so we’ll store to our own partition of, say, Amazon Dynamo or Google Bigtable or fill in the blank other persistence service. And the idea there is it’s our own partition, so nobody else is able to see or operate on the data that we’re storing for our service. But the persistence service is operated on our behalf by some other team. And the critical idea to take away here is that we really want the only external access to that data store for our service to be through the published service interface that we maintain.
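
As a hedged sketch of that isolation, here is a hypothetical Python service whose persistence is private to it. A local SQLite file stands in for the team’s own partition of a managed store; the only way in and out of the data is the published methods.

    import sqlite3

    class InventoryService:
        """Hypothetical service: its persistence is private; callers only see
        the published methods, never the underlying tables."""

        def __init__(self, db_path="inventory_service.db"):
            # In production this might be our own partition or keyspace of a
            # managed store owned by this team; here a private SQLite database
            # stands in for it.
            self._db = sqlite3.connect(db_path)
            self._db.execute(
                "CREATE TABLE IF NOT EXISTS stock (sku TEXT PRIMARY KEY, qty INTEGER)"
            )

        # --- published interface: the only external access to the data ---
        def add_stock(self, sku: str, qty: int) -> None:
            self._db.execute(
                "INSERT INTO stock (sku, qty) VALUES (?, ?) "
                "ON CONFLICT(sku) DO UPDATE SET qty = qty + excluded.qty",
                (sku, qty),
            )
            self._db.commit()

        def available(self, sku: str) -> int:
            row = self._db.execute(
                "SELECT qty FROM stock WHERE sku = ?", (sku,)
            ).fetchone()
            return row[0] if row else 0

    svc = InventoryService(":memory:")
    svc.add_stock("widget-1", 5)
    print(svc.available("widget-1"))  # 5 -- no other service reads these tables directly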

Okay, so the next thing we want to think about as we’re designing and iterating our service is maintaining the stability of the interface over time. What this means is that we want to be able to make modifications to our service without coordinating those changes with our clients. So what that implies for us, as service providers, is that now we need to make any changes that we make to our interface both backward compatible and forward compatible. So the fundamental tenet here is that we can never break our clients’ code.

So often, that means that, as a service provider, we might be expressing multiple versions of our interface if there are breaking changes or challenging changes. Sometimes it actually might mean that we’ve deployed multiple instances of the service, so we might have a version 11 of the service that we’re operating in addition to a version 12 of the service. And fortunately, the majority of changes that we make aren’t actually changing the interface so we’re not always in this situation.

But these disciplines we have to take on as a service provider in order to continue to meet that requirement that client code never gets broken. That also means that as a service provider we’re going to end up having an explicit deprecation policy. So let’s say we’re running version 11 and version 12 of the service in parallel. At some point, we want to be able to shut down version 11, so we want all the clients that continue to use version 11 to be migrated off. What’s interesting about this approach is that now my team has a strong incentive to help wean those customers off version 11.
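
As a rough sketch of what running two interface versions side by side can look like, here is a hypothetical Python example, not any particular company’s framework: the old version is kept alive as a thin adapter over the new one, with a deprecation warning, so existing clients never break while they migrate.

    import warnings

    # v12 renamed a field; v11 is kept running in parallel so old clients still work.
    def get_user_v12(user_id: str) -> dict:
        return {"id": user_id, "display_name": "Ada Lovelace"}

    def get_user_v11(user_id: str) -> dict:
        warnings.warn("v11 is deprecated; please migrate to v12", DeprecationWarning)
        new = get_user_v12(user_id)
        # Adapt the new response back to the old contract (backward compatibility).
        return {"id": new["id"], "name": new["display_name"]}

    HANDLERS = {"v11": get_user_v11, "v12": get_user_v12}

    def handle(version: str, user_id: str) -> dict:
        return HANDLERS[version](user_id)

    print(handle("v11", "42"))  # old clients keep working
    print(handle("v12", "42"))  # new clients get the new shape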

Does it make sense? I ideally don’t want to be maintaining these two versions of the service in parallel forever, so it’s strongly in my interest to make it super easy for all my clients to move from one to the other. And actually, there were some very interesting and sometimes challenging discussions at Google about that. There was a particular service that we at App Engine were depending on; sadly, no other team in the company needed the old version, but it had some capabilities that we needed and no one else did.

And so we begged the team, “Could you please continue to support it in addition to the new, fancy one?” And, correctly, they didn’t want to. So we actually ended up adopting it ourselves, because we needed those capabilities. So we ended up adopting that old version of the service ourselves and continuing to run it for us, simply because no one else needed it. A microservice ecosystem is going to bring all these dependency issues to the fore.

Okay, so that was talking a little bit about designing a service. In the last section, I want to talk about what it’s like to build and operate a service in one of these environments. So I want to start with my goals as a service owner, so what am I trying to achieve? So what I’m trying to achieve is meet the needs of my clients, the needs all in, right? So I want to meet their needs in terms of functionality. I want to meet those needs in terms of quality, in terms of performance, in terms of stability and reliability.

And also, there’s this implicit expectation that I should be constantly improving this service over time. And I want to meet all those needs at minimum cost and minimum effort to myself and my team, which implicitly encourages me to be leveraging common tools and infrastructure. So I’m not going to be reinventing wheels if I don’t need to. It encourages me to leverage other services that are available in the environment. It encourages me to automate all the aspects of building and operating the service that I can, so automate the build process, automate the deploy process, automate as much of the operational processes as I can.

And it also strongly incentivizes me to optimize for efficient use of resources. So if I can do the same thing with half the hardware or a quarter of the hardware, then I’m strongly incented to do that. In terms of the responsibilities of the service owner, so in all of the examples of successful, long-term microservice ecosystems that I can think of: the Googles, the Amazons, the Netflixes of the world, the team that owns the service has end-to-end ownership of that service from cradle to grave.

So the team is going to own the service starting from the design point through to development and deployment, all the way through to retirement. And some of us may remember the old days where, as a development organization, we were done when we shipped some code. And now we’re only done when the service that we’re building is retired and no one’s using it anymore. So there’s typically no separate maintenance team, no separate sort of sustaining engineering team that does the old boring stuff. That’s the same team that builds it. And this is essentially the dev ops philosophy of you build it, you run it.

Along with that comes the autonomy and accountability of that team. So in many of these environments, within that service boundary, teams have the freedom to choose the technology that they build with, their working methodology, and often their working environment. But at the same time, my team has the responsibility for the results of those choices. So if I make a poor choice of language or framework, that’s something that’s sort of on me.

So when we talk about a large-scale ecosystem with hundreds or thousands of services, often the reaction, if you’ve not lived in one of those environments, the reaction can be, “Wow, isn’t that complicated? Isn’t it complicated to have all these different services flying around?” And it turns out that it’s actually a lot less complicated for an individual service owner than you might imagine. So as a service owner, the service provides for me this bounded context.

If I do it properly, it encapsulates all the aspects, all the logic, and the data storage of the service within that boundary. And as a service owner, my primary focus is on the service that I’m building. So the fact that there are hundreds and thousands of other services is actually not a day-to-day worry for me in particular. I’m worried, as Nic mentioned in his talk, I’m worried about the clients that depend on me. I’m worried about my customers.

And then I’m also worried about the services that I depend on, for which I’m the client. And that’s it. So if there are other clients of the services that I depend on, I’m not really worrying about those day-to-day. If there are other parts of the graph that are disconnected from me, where we don’t share services in any way, I’m not really thinking about those at all. So there’s actually very little worry. We see these diagrams, and Adrian calls them these death star diagrams of all the different interrelationships.

Most services that live in one of those environments aren’t having to experience the ugliness of that death star. There’s very little worry about the overall ecosystem and there’s typically very little worry about the underlying infrastructure, which is nice because it means that as a service owner, my cognitive load is really bounded. I’m simply focusing my energies and my team’s energies on the service that we’re building and, again, the services that we depend on and the services that depend on us. That has a nice organizational benefit: it means we can have really small, really nimble service teams, because we’re only focusing on the things that we need to get done for our customers.

So the next thing I wanted to talk about is the relationships between services. So the way I like to think about it is even within the same company, we should have a relationship that’s more like vendors and customers, meaning we should be sort of friendly and cooperative, but we should have a structured relationship where clients have particular expectations of what I build as a service and I give them guarantees in return. We can call these contracts, actually.

So there’s a clear ownership and a clear division of responsibility between what the service provider does and what the service consumer does. And one of the most critical disciplines that we can have as an organization to make this work in a healthy way is that customers ought to be able to choose whether to use the service or not. So I, as a consumer of a service, ought to be able to choose whether I use the services that I consume or not.

And that’s a corollary of the autonomy and accountability that I have as a service team, but that also gives a really healthy aspect to the environment that’s again a lot more like biological systems, where I need to continue to provide value to my customers or else they’re going to go use another service. And as well they should.

Okay, so the last thing I wanted to talk about in this particular section is that once you’ve been doing microservices for a while, almost every organization that I’ve seen do this has ended up moving to a model of charging for usage of services.

So you end up charging customers, even within your own company, on a usage basis for using a particular service. And the reason we do this is not that, as a service provider, I’m looking to make a profit on the side. It’s because we’re trying to align the economic incentives between the customer teams and the provider teams. And when we charge money or beers or credits or whatever for things, that motivates both sides to optimize for efficiency.
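
As an illustration of the mechanics, here is a minimal, hypothetical chargeback calculation in Python; the resource rates and team names are invented for the example.

    # Allocate a provider's costs to consumer teams in proportion to metered usage.
    RATES = {"cpu_core_hours": 0.04, "gb_stored": 0.02, "requests_millions": 0.50}

    usage_by_team = {
        "search":   {"cpu_core_hours": 12_000, "gb_stored": 800, "requests_millions": 90},
        "checkout": {"cpu_core_hours": 3_000,  "gb_stored": 150, "requests_millions": 25},
    }

    def monthly_bill(usage: dict) -> float:
        return sum(RATES[resource] * amount for resource, amount in usage.items())

    for team, usage in usage_by_team.items():
        print(f"{team}: ${monthly_bill(usage):,.2f}")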

And by contrast, you won’t see this immediately, but over time what you’ll see is that if the usage of a service is free, people are going to use it wastefully. I don’t have a strong incentive to optimize my usage of something if it’s effectively free. And there’s no incentive for me to control that usage. We see this in the economy. This is what economists call negative externalities.

So it’s like pollution when there’s no charge for polluting. And there was a particular example at Google, where again we operated App Engine, which is a Platform as a Service. And there was another Google product that was leveraging App Engine to build their product, and we were very happy to have that happen. For a long time, they were using a huge percentage of some particularly scarce resources on our part and we sort of begged them, we asked them, we sent emails to them, “Could you please back off a little bit and see if you can optimize your usage?”

But at the time, the usage of App Engine for other Google products was free. As soon as, and I underline that, as soon as we started charging that other team for their usage, they immediately found the priority and the time to go and make what were actually very small fixes on their side, and they ended up reducing their usage of this particular App Engine resource, not by a little bit, but by 10X. So with a small set of changes on their part, they were able to make a 10X reduction in their usage of resources on our side.

We were happier, and actually they were happier, because they were doing 10X less work. And what ended up motivating them to put that at the top of their priority list rather than at the bottom was this charging idea, was the economic incentive to use resources within the company efficiently. So in closing, I want to show a little bit this architecture evolution slide and now I want to talk about it in just a slightly different way.

So notice that this is an evolutionary process. So notice that none of these companies that you’ve heard of here started with microservices. They all ended up starting with a monolithic application and they ended up evolving over time to take pieces of that, and ultimately ending up with microservices. And there’s actually no shame in this, and there’s everything right about this. You should think carefully about the place you are in your development cycle, the place you are in the business cycle, the business life cycle of your company, and think about whether a monolith is appropriate or microservices or whatever.

And there may well have been another eBay that started at the same time but built microservices from the beginning. And there’s a reason why we’ve never heard of that mythical eBay with microservices. It’s because they spent all their time building technology that they were only going to need 2 or 5 or 10 years later, instead of getting a product out. So as I like to say, “If you don’t end up regretting your early technology decisions, you probably over-engineered.” Great, so I will leave you with that. Thank you very much.

Question:

Randy: Great, so I’m totally cognizant of the fact that I am standing between you and lunch so I’ll keep that in mind. So question, yeah.

Question:

Randy: Great, so let me say that back before I forget all the great things that you asked. So to say it back: from a data science perspective, data is super valuable, and if we only think about services in the real-time context, aren’t we missing something? Aren’t we missing the ability, or shouldn’t we have services where we log and where we keep historical data? And the answer is absolutely. So every place that I mentioned has services that do very real-time, fast, low-latency work. And they also have services which store data for the long term, where you can get at historical data.

And you asked a little bit about, is it appropriate for…you asked several questions about whose responsibility it is to build that. So in all the cases that I can think of, there is typically a common logging service. And the reason why we have that is because we want to build dashboards and tools and alerting around the logs that everybody generates. And so typically it’s not that you’re forced to use it, but it’s so nice and convenient that everybody ends up using the logging service, and there are lots of other operations-related services that are helpful in that way.

Then you asked whose responsibility should it be to expose the historical data of a service. And, yeah, if it’s my data it’s my responsibility to expose that. There are many ways that I could do that. I could have another interface or as part of my interface a way to get at my historical data. Or I could also, with everybody’s consent, ship it off to a place where we store data in a way that’s much better for long-term. So that might be more like a Hadoop or a Data Warehouse.

As we all know, there are structures to data that are much more convenient for historical querying and getting at large bulk historical data. But yeah, ultimately it’s my responsibility as a service owner to make sure that my data can get out one way or another and can be used by the rest of the company. Great question.
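
As a hypothetical sketch of that responsibility, a service might expose history through its own interface and also export a bulk feed for a warehouse. The class, method, and file names here are invented for illustration.

    import json
    import time

    class PaymentService:
        """Hypothetical sketch: the service owns its data and also makes history
        available, via a query method and via a bulk export for a warehouse."""

        def __init__(self):
            self._events = []  # private event log

        def record_payment(self, order_id: str, amount: float) -> None:
            self._events.append({"ts": time.time(), "order_id": order_id, "amount": amount})

        # Part of the published interface: point-in-time historical queries.
        def history(self, since_ts: float) -> list:
            return [e for e in self._events if e["ts"] >= since_ts]

        # Bulk export, e.g. shipped periodically to a warehouse or Hadoop-style store.
        def export_batch(self, path: str) -> None:
            with open(path, "w") as f:
                for event in self._events:
                    f.write(json.dumps(event) + "\n")

    svc = PaymentService()
    svc.record_payment("order-1", 25.00)
    print(svc.history(since_ts=0))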

Austin: Let me piggyback on that one for a second with a question from online. How do you deal with data that are important for multiple services? Related to your “Everybody should have their own persistence service” comment or data storage comment.

Randy: Yeah, sure.

Austin: And kind of related to that, how do you deal with referential integrity or, more often, the lack of referential integrity?

Randy: Yeah, right. I’ll do it in the reverse order. So typically, there’s very little guarantee, if any, in a distributed system around referential integrity. So we typically have to achieve that through discipline and through checking it at a higher level. And that’s common. If we’re going to take a monolithic database and chop it up into individual pieces, then to the extent that we need referential integrity, we absolutely have to layer that on as part of the application.

And typically, that’s done in an asynchronous either event-driven or kind of batch-driven way. And we can talk about this a lot, but I’ll give a particular example. If that seems scary to people that are here or are listening, that’s exactly how financial transactions are done between banks in the world.
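
For a concrete, if simplified, picture of that asynchronous checking, here is a hypothetical reconciliation job in Python; the fetch functions are stand-ins for calls to real service interfaces, not actual APIs.

    # With the database split across services, referential integrity is checked by
    # an asynchronous job (cron/batch or event-triggered) rather than by foreign keys.

    def fetch_orders():
        # Would call the order service's published interface in reality.
        return [{"order_id": "o1", "customer_id": "c1"},
                {"order_id": "o2", "customer_id": "c9"}]  # c9 may be dangling

    def fetch_customer_ids():
        # Would call the customer service's published interface in reality.
        return {"c1", "c2", "c3"}

    def reconcile():
        """Run periodically; repair, quarantine, or alert on violations."""
        known_customers = fetch_customer_ids()
        dangling = [o for o in fetch_orders() if o["customer_id"] not in known_customers]
        for order in dangling:
            print(f"integrity violation: {order['order_id']} references missing "
                  f"{order['customer_id']}")

    reconcile()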

Austin: I don’t think that makes it any less scary.

Randy: Well, it at least works-ish. So the other question was.

Austin: Ish.

Randy: The other aspect of the question was around what do you do with common data? The simplest answer is that becomes its own service, and that’s almost always the right answer. Sometimes the right answer is there is one place that acts as a system of record for that data, and then other people that need rapid access to that data maintain their own caches.
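
A minimal sketch of that second pattern, assuming a hypothetical system-of-record client, might look like this read-through cache with bounded staleness; the names and values are invented for the example.

    import time

    class SystemOfRecordClient:
        """Stand-in for the service that owns the shared data."""
        def get_exchange_rate(self, currency: str) -> float:
            return {"EUR": 1.08, "GBP": 1.27}.get(currency, 1.0)

    class CachedRates:
        """Another team's local read-through cache over the system of record,
        trading bounded staleness (ttl) for fast repeated access."""

        def __init__(self, source: SystemOfRecordClient, ttl_seconds: float = 60.0):
            self._source = source
            self._ttl = ttl_seconds
            self._cache = {}  # currency -> (value, fetched_at)

        def get(self, currency: str) -> float:
            hit = self._cache.get(currency)
            if hit and time.time() - hit[1] < self._ttl:
                return hit[0]
            value = self._source.get_exchange_rate(currency)
            self._cache[currency] = (value, time.time())
            return value

    rates = CachedRates(SystemOfRecordClient())
    print(rates.get("EUR"))  # fetched from the system of record
    print(rates.get("EUR"))  # served from the local cache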

Question:

Randy: Yeah, so the question was if shared libraries are dangerous, as Ben asserted in his talk, what’s up with Google because they have their own single repository. And the answer is yes, Google does have a single repository for essentially all the source code and it’s available, at least for read access, for everybody. That’s a huge benefit. The fact that I can see somebody’s code doesn’t necessarily mean that I’m going to use it.

So those are a bit orthogonal questions, if you like. The fact that I can see and leverage shared code means that if I’m going to leverage it, then I take on the responsibility for making sure that it evolves in a way that helps me, and so on.

Question:

Randy: Yeah, so at Google, what happens is that when common libraries are updated (and I’m talking about libraries here), the next time I build my service it’s going to be built with the new version, and I have to sort that out. That’s how it works, but that’s what it means to use a shared library.

Austin: Kind of related to that, how have you seen the structure of the development teams evolving as you’ve shifted from monolithic code into the microservice world?

Randy: Yeah, great. Well this is one of the great benefits of microservices is that we can have really small teams. So we all know the Amazon two-pizza team thing. A team shouldn’t be larger than can be fed by two large pizzas. So typically, we can have teams of three to five. We can have teams of one. It might get lonely, but we can have one person who’s responsible for building and maintaining a particular service or maybe several.

And that’s both encouraged and enabled by the fact that we have built these services that are really simple and well-defined, and that build on other things. So we’re only focusing on the particular business logic and the particular data that we need to deal with.

Austin: I think we have time for one more.

Randy: Okay. Yeah, please.

Question:

Randy: So the question is about the charging and cost allocation: doesn’t that change the way people work, and has that worked properly in other cases? And yes, it changes the way people work, and usually that’s a good thing. You used the particular example of drought and water, and I can actually talk for an hour or more on how messed up water usage is in California. We have a drought in large part because there’s a lot less rain, but the way that costs are not allocated for water is one of the big reasons why the drought is as bad as it is.

So that’s a great example of where, if we let the benefits and the burdens run, which is a legal term, meaning that upon extracting a resource I get charged for it, that would be a huge advantage. And the same thing that applies to water policy in California also applies to systems. So you asked whether I have seen that work at organizations. Yes, absolutely. There are lots of…

Question:

Randy: Typically, actual dollars. What’s the currency, was the question. So the several places that I know reasonably well all express it in terms of dollars. And actually, it would have been hard in the world where we all operated our own data centers, which I’m sure many of you still live in. But in the cloud world, where we’re already being charged on a usage basis for memory, storage, and CPUs, the fact that we’re already being charged at the lower level at this very fine granularity makes it super, super easy to implement a chargeback model all the way up.

And essentially what you’re doing is not trying to make money, but trying to make sure that the costs actually get allocated where the usage happens. And the final thing I want to say about that is that when you’re entering microservices, that’s not step one. That’s step N, where N is reasonably large. But again, one of the motivations of my talk here was what it feels like to be at large scale and what you have to think about, and this is one of the things you ultimately have to think about when there are lots and lots of services flying around. Great.

Question:

Randy: Yeah, one way, yeah. The commenter suggests that one way to do it is to think of it as quotas. So we could say, “You’re only allowed to use X or Y or Z out of the resource.” That’s one way to do it. Yeah, absolutely.
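
A minimal sketch of the quota approach, with invented numbers and team names, might look like this: instead of billing, each consumer gets a cap on a scarce resource and requests beyond the cap are rejected.

    QUOTAS = {"team-a": 1_000, "team-b": 250}  # e.g. requests per minute
    _used = {}

    def allow(team: str, amount: int = 1) -> bool:
        used = _used.get(team, 0)
        if used + amount > QUOTAS.get(team, 0):
            return False  # over quota: shed or queue the work
        _used[team] = used + amount
        return True

    print(allow("team-b", 200))  # True
    print(allow("team-b", 100))  # False -- would exceed the 250 cap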

Question:

Randy: Yeah, so again, to the comment: don’t do this cost thing straight out of the gate, and try not to do it too early. Again, every company that I’ve seen do this for a while ultimately ends up getting to the point where they need to do it, and that’s what I wanted to highlight. But yeah, it’s not a step zero. Thanks.
