The real strength of microservices is physical isolation, more so than logical isolation. That is, there is an 80/20 rule to scaling.
For instance, there is probably one function of your system that is responsible for a huge amount of work. If everything is in one database (say SQL, Mongo, ...), you have a complex system that is hard to scale. If you split the heavy load out, you might find the scaling problems vanish (because the high-load function no longer has the burden of excess data), and even if there is still a problem, it is much easier to optimize and scale a system that does just one thing.
The most disturbing thing about microservice enthusiasts is that they immediately jump to: oh, we can write these services and clients in ColdFusion, Ruby, COBOL, Scala, Clojure, PHP, and even as we write them, the great thing is "WE DON'T HAVE TO SHARE ANY INFRASTRUCTURE!"
That's bogus to the Nth degree, because a lot of the BS involved with distributed systems has to do with boring things like serialization, logging, service management, etc.
I think you still want to use the same language, same serialization libraries, management practices, etc. across all of these services; otherwise you are going to get eaten alive dealing with boring but essential problems.
The real strength of microservices is that they should force you to specify your services by API and protocol. This is what allows them to scale to larger groups of developers at the same time as scaling to large customer bases.
Both Netflix and Twitter, for example, have fallen into the trap you describe of using a common set of infrastructure libraries. At Twitter this has manifested in their microservices actually being a tightly coupled distributed application, not a loosely coupled set of services. Netflix, on the other hand, has the set of libraries, yet has many groups not using them; and because they went with libraries instead of APIs/protocols, the applications that don't use the libraries don't follow the standards.
You need fully specified APIs and protocols, implemented on at least two if not three different platforms, to keep you out of the trap of common infrastructure.
The dream of reuse has made a mess of many systems. The value of components (you can call them microservices if you like) lies in independent entities communicating over well-defined interfaces. This means you can split work up into manageable units (and distribute the work across teams if you want to). But if things are genuinely independent, then we have to accept that there will be some redundancy.

As long as you stick to communicating with another component through its top-level protocol/interface, things are OK. The temptation is to notice that there is something else the other component does internally that you would like to reuse in your implementation, so you ask them to expose that. This is where it starts to go wrong: you have increased the complexity of interaction with the other component just to get some reuse. Do this a few more times and the system becomes unmanageably complex.

It's like any organisation: if you want to succeed you need to be able to delegate fully, not micromanage people for the sake of some minor optimisation.
So the thing is to be disciplined. Only communicate over the formal interface and mind your own business/implementation. That means don't get involved in other people's business, and when other people try to get involved in yours, tell them to $@&*!... please move along. Michael Feathers has a good post [1] about how in COM it was hard to create components, and so this enforced that kind of discipline. I think another thing about COM was that you never got the source code for a component; it was always binary, and the only thing published was the interface. This prevented people from nosing around in other people's implementations. Feathers' point is that microservices are similar: it's hard to create a microservice, so the granularity doesn't end up getting too fine. But to my mind there is no need to pay the price (failures, administration) of having to distribute a service just to get low granularity. Sure, if you want to distribute in order to scale, that's fine. But otherwise just have your components in-process and be disciplined about your interfaces. (Besides, microservices aren't that hard to make once you have made a few of them, so you are going to have to have discipline there too.)
There's a way to eliminate the reuse problem as well. If you make the services event-based and only let them speak to each other through messages then a service can never use another service. Services only listen and dispatch, no interservice request/reply calls.
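That listen/dispatch pattern can be sketched with an in-process bus standing in for a real message broker; the service and event names here are purely illustrative:

```python
from collections import defaultdict

class Bus:
    """Minimal in-process stand-in for a message broker (e.g. RabbitMQ, Kafka)."""
    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self.handlers[event_type]:
            handler(payload)

# Each "service" only listens for events and dispatches new ones;
# neither service ever calls the other directly.
def billing_service(bus):
    def on_order_placed(order):
        # ... charge the customer, then announce the result
        bus.publish("order.billed", {"order_id": order["order_id"]})
    bus.subscribe("order.placed", on_order_placed)

def shipping_service(bus, shipped):
    def on_order_billed(event):
        shipped.append(event["order_id"])  # record the shipment
    bus.subscribe("order.billed", on_order_billed)

bus = Bus()
shipped = []
billing_service(bus)
shipping_service(bus, shipped)
bus.publish("order.placed", {"order_id": 42})
# shipping reacted to billing's event without either knowing about the other
```

Because all coupling runs through event names rather than direct calls, neither service can reach into the other's implementation, which is exactly the reuse-by-request temptation this avoids.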
Docker is a promising building block for solving logging and service management and that's something I've been playing with for a while. You stream your logs to your container's stdout/stderr and can pick them up from the Docker REST API. You can also hit the Docker REST APIs across your hosts to discover the IP addresses and ports of the containers running various images. None of this is "solved" yet but it's getting there.
Serialization... depends, but you can get pretty far with JSON over HTTP.
So I guess you do need your microservices to be uniform in some respects, but the consistency of "every microservice is a Docker container exposing a REST API that speaks JSON and streams logs to stdout" would seem to be enough.
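A minimal sketch of that uniform shape using only Python's stdlib; the endpoint, port, and payload fields are illustrative, not prescriptive:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_status(payload):
    """Pure request handler: easy to test without a running server."""
    return {"service": "example", "echo": payload.get("msg", ""), "ok": True}

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        result = handle_status(json.loads(body or b"{}"))
        data = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, fmt, *args):
        # send access logs to stdout so the container runtime collects them
        print(fmt % args)

# To run the service inside a container:
#   HTTPServer(("", 8080), Handler).serve_forever()
```

Any language with an HTTP server and a JSON library can hit this same contract, which is the point: the uniformity lives in the protocol, not the stack.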
It's easy to build that in several languages... if Rails makes sense, use Rails. If Sinatra makes sense, use Sinatra. If Java makes sense, use Java.
You could run into trouble if you start using languages that only one or two of your staff members know, but it does allow for some flexibility (i.e. just because certain parts of the application are enterprisey doesn't mean everything has to be.)
There are a few issues with this approach. First, many security regulations require deterministic capture of audit-oriented log events (e.g. login failure/success, invalid access attempts, etc.). Typically, you are required to provide a local spool to handle circumstances where a remote logging service is unavailable/overloaded. Simply logging to stdout is insufficient, and hoping it gets picked up by a process over HTTP is woefully inadequate in these circumstances. Second, as mentioned previously, HTTP headers will be larger than most logging events. Therefore, this approach likely doubles/triples (admittedly a back-of-the-napkin estimate) the data being transferred. rsyslog and its ilk learned this lesson long ago, using a lightweight TCP protocol to minimize transfer overhead. Personally, I favor a locally spooling rsyslog instance in each container configured to push to a central logging service.
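That last setup can be sketched as an rsyslog configuration fragment using a disk-assisted forwarding queue; the hostname, port, and sizes are illustrative:

```
# Push all logs to the central log host over TCP. The disk-assisted
# queue spools events locally when the remote end is unreachable,
# and resumeRetryCount="-1" retries forever instead of dropping.
# (Hostname, port, and sizes are illustrative.)
action(type="omfwd"
       target="logs.example.com"
       port="514"
       protocol="tcp"
       queue.type="LinkedList"
       queue.filename="central_fwd"
       queue.maxDiskSpace="1g"
       queue.saveOnShutdown="on"
       action.resumeRetryCount="-1")
```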
Besides the issues of implementation, log data needs to be analyzed. Standardizing on a logging library and associated configuration provides important consistency to use a common set of log analytics in alerting systems not to mention keeping system admins performing forensics operations sane. Finally, in addition to logging, there is service instrumentation which requires consistency to yield useful visualization and alerting for operations teams.
While Docker may permit a vast polyglot infrastructure, carefully choosing the stacks you deploy will have a tremendous impact on operational robustness. Greater consistency across containers reduces the effort required to deploy and operate new services.
What you're attacking is not how Docker logging works.
The Docker daemon collects stdout/stderr from the beginning of each container's life, always, and stores it locally. You can then ask the Docker API for all the log events from a particular container ID at any time.
You would have one system that polls the Docker APIs on all your hosts and dumps the logs into the management system of your choice at some interval.
Log events are not going out over HTTP live; you are not making an HTTP request per entry, and the event will be captured whether or not the network is available at the moment.
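A rough sketch of such a collector, assuming the Docker daemon's remote API is exposed over TCP (e.g. started with `-H tcp://0.0.0.0:2375`); the host names and container ID are illustrative:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def logs_url(host, container_id, port=2375):
    """Build the Docker remote API URL for a container's buffered logs."""
    query = urlencode({"stdout": 1, "stderr": 1, "timestamps": 1})
    return f"http://{host}:{port}/containers/{container_id}/logs?{query}"

def fetch_logs(host, container_id):
    """Pull everything the daemon has collected for this container so far."""
    with urlopen(logs_url(host, container_id)) as resp:
        return resp.read()

# A collector would loop over all hosts/containers at some interval
# and ship the results to its log management system, e.g.:
#   for host in ("docker-host-1", "docker-host-2"):
#       raw = fetch_logs(host, "abc123")
```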
I definitely like the direction that Docker is going.
JSON over HTTP sucks. I mean, it's OK if user experience doesn't matter, but if user experience does matter, then latency matters, and if latency matters you find pretty quickly that serialization/deserialization overhead is a big deal -- particularly in a microservices environment, where one external request means a large number of microservice requests, which means data could be serialized/deserialized many times. If you're happy being in the middle of the pack, go on with JSON; but if you are playing to win, you need something faster (preferably with a framework that makes development as easy as, or easier than, JSON).
With multiple languages you don't just have the problem of training but you have other complexity-multiplying problems.
For instance, one of the reasons why languages like Haskell are always a bridesmaid and never a bride is that they lack a complete "standard library". You need a urlencode() and decode function if you write web apps. If you write your own you may be more concerned with making it tail recursive than making it correct. I remember how the urldecode in Perl was busted for a decade and never got fixed, and that's a mainstream language. As flawed as PHP is, it took off because it had "good enough" implementations of everything web developers need in the stdlib.
In a microservices system there tend to be functions similar to urlencode that need to be replicated across the system, and if these are implemented over and over again in different languages it is inevitable that some or all implementations will be buggy. If you can centralize these things in one library everybody depends upon, however, life is easy and happy.
> For instance, one of the reasons why languages like Haskell are always a bridesmaid and never a bride is that they lack a complete "standard library". You need a urlencode() and decode function if you write web apps.
Okay:
cabal install urlencoded
> If you write your own you may be more concerned with making it tail recursive than making it correct
You know, if there is one criticism I've never heard -- or expected to hear -- of Haskell programmers in general, it's that they are overly concerned with optimizations over correctness.
You don't need to use the same language to use the same serialization library, however. If the latency of HTTP/JSON is truly too high, there's Protocol Buffers, Thrift, and a host of others in that space.
HTTP/JSON is nice, however, for the wealth of tooling that already exists and the relative ease of standardizing API design with REST.
If you like the REST architecture it works just fine with the message body being Thrift or whatever. I think of the ElasticSearch API which lets you use Thrift or XML or JSON.
The interesting distinction between binary serialization formats is if the schema is separated from the data.
For instance, if both sides of the system know you are encoding a 24 bit color value you can send 3 bytes and that is it; coding and decoding can be very quick and even possibly done on a "zero copy" basis.
If you are using something like JSON you not only have the waste involved with converting "255" to an #FF byte, but you also have to embed the schema in the sense of "here is an array of three integers (which happen to be bytes)" or "here is key 'red' and value R,..."
Thus, JSON is not "schema-free", it is "schema embedded in the data" and this inevitably bulks up the data and slows down encoding and decoding. Yes, general-purpose compression eats some of the storage/network encoding overhead, but you'll get it even tighter if you eliminate that fat before you put it through the compressor.
Now, separating the schema from the data means you need to make both sides aware of it, which is why you need some strategy for handling this in a systematic way rather than hoping things will work out OK without a plan.
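The 24-bit color example above can be made concrete with a quick stdlib sketch comparing the two encodings:

```python
import json
import struct

# Both sides agree out-of-band that a color is three unsigned bytes (R, G, B):
# that agreement IS the schema, and it never travels on the wire.
color = (255, 128, 0)
packed = struct.pack("BBB", *color)  # exactly 3 bytes

# Schema embedded in the data: key names and decimal digits travel
# with every single value.
as_json = json.dumps({"red": 255, "green": 128, "blue": 0}).encode()

print(len(packed), len(as_json))  # 3 vs 37 bytes
r, g, b = struct.unpack("BBB", packed)  # decoding is a fixed-layout read
```

The same asymmetry shows up in encode/decode time, not just size, since the binary form needs no key parsing or number-to-text conversion.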
> With multiple languages you don't just have the problem of training but you have other complexity-multiplying problems.
> For instance, one of the reasons why languages like Haskell are always a bridesmaid and never a bride is that they lack a complete "standard library". You need a urlencode() and decode function if you write web apps. If you write your own you may be more concerned with making it tail recursive than making it correct. I remember how the urldecode in Perl was busted for a decade and never got fixed, and that's a mainstream language. As flawed as PHP is, it took off because it had "good enough" implementations of everything web developers need in the std lib.
I don't understand what you're saying here - are you arguing that we shouldn't use languages that don't have web functionality built into the std lib? What about web frameworks, e.g. Java + Spring?
Haskell's libraries and packages are actually very well fleshed out at this point. That's not the reason for poor adoption. It really is just that it's academic and mathy and difficult to understand.
In Java you can find implementations of that stuff and will probably use it. If there is anything wrong it is that there are too many different implementations out there.
In the case of Haskell you ~might~ find implementation of that stuff but you'll probably think you're too smart to have to reuse somebody else's code and also be too smart to have to deal with the corner cases.
> In the case of Haskell you ~might~ find implementation of that stuff but you'll probably think you're too smart to have to reuse somebody else's code and also be too smart to have to deal with the corner cases.
What is there besides extreme language bigotry to suggest that Haskell programmers tend to think they are too smart to do either of these things? Haskell programmers use other people's code all the time. And Haskell programmers handle corner cases as much as any other programmers.
Your criticism of Haskell here just doesn't have the ring of truth to me. Haskell has lots of good, modern frameworks and libraries for doing web-related things, which aren't difficult to use or any more corner-case-y than their corollaries in other languages.
I get the vibe that there is some underlying point you're attempting to make and that the Haskell-specific stuff is secondary. Maybe your point is that using a technology stack with unproven maturity in a given domain (in this case web application development) is riskier and likely more time-consuming than using the more common stacks for that domain. If so, then I agree. (But I'm also very appreciative of the early-movers who put in the time and effort to make immature ecosystems around nice technologies more mature; somebody has to do it!)
I'm curious--what sort of workload are you looking at where JSON/HTTP is insufficient?
We do a lot of realtime signal streaming, and haven't had many problems. We'll be switching to a different format (or embedded format) to better handle some numerical pickiness on our end, but the fundamental transport is fine.
I think 'streaming' is the operative difference there. If you're opening a connection and sending large amounts of JSON one-way, then header/encoding overhead is basically not a problem. But a two-way RPC protocol is basically making a separate HTTP request each time. For log messages you're going to be sending more headers and object definitions than actual data.
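A back-of-the-napkin illustration of that ratio; the header values and the log event are made up for the example:

```python
# A typical HTTP/1.1 request header block for shipping one log event.
headers = (
    "POST /logs HTTP/1.1\r\n"
    "Host: logs.example.com\r\n"
    "Content-Type: application/json\r\n"
    "User-Agent: service-a/1.0\r\n"
    "Accept: */*\r\n"
    "Content-Length: 60\r\n"
    "\r\n"
)
# The event itself, with its keys repeated in every message.
event = '{"level": "info", "ts": 1400000000, "msg": "request served"}'

print(len(headers), len(event))  # the envelope outweighs the payload
```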
I'm curious how you're formatting your JSON too - some formats are going to be way lighter than others.
Why not SNMP? It seems to be the protocol everyone loves to hate, but there's a ton of ecosystem around it and it directly solves the status/discovery case.
I think the advantage of the "We don't have to share any infrastructure" mindset is that standardization is a process, not a destination. A living system will always have some diversity of libraries, versions, languages, etc. While it's important to keep it under control, a system that is robust to being half written in PHP and half in $goodlanguage is much more consistent with the messy reality than a plan of "Step 1, port everything to Ruby, Step 2, write a single set of SOA tools, Step 3, profit"
I've always found it best to allow the developers the freedom to use whatever they want as the official 'standard' and allow them to self organize into a few de-facto standards. You naturally standardize around better tech and attract better developers while not restricting future innovation.
The one problem with this system is that it can be fairly easy to torpedo when you get new management who sees the lack of rules as a lack of organization. They will then enforce some arbitrary standard and waste a lot of resources figuring out their mistake.
> I think you still want to use the same language, same serialization libraries, management practices, etc. across all of these services otherwise you are going to get eaten alive dealing with boring but essential problems.
Standard formats and protocols are probably more important than same languages and libraries for that particular issue; but, at the same time, not being tied to a particular language/library doesn't mean you should add them willy-nilly. It means that when a particular component has a compelling reason to use a different language or library, you aren't constrained from doing so (or, if you decide for a good reason to begin a major transition from one language to another, or one library to another, you can do it incrementally rather than in a big-bang transition or behind an additional adapter layer.)
Adding multiple languages is, by itself, a bad thing. If there is a good reason to add another language (like a lot of stuff already written), then the good outweighs the bad.
I hear so much crazy stuff about microservices like: we can have different teams making all the microservices and they don't need to share any people or talk to each other at all, etc.
Practically, microservices contribute to real-world agility not when they encourage silo-building, but when you can move developers from task to task as necessary. Perhaps microservice A has reached stability and doesn't need to change much. The more standardization you have, the more you can move somebody who worked on A to work on microservice C, and then have somebody on B make a quick fix to A when the first person is busy on C.
> I hear so much crazy stuff about microservices like: we can have different teams making all the microservices and they don't need to share any people or talk to each other at all, etc.
Minimizing coupling between components -- teams -- in the development process (as well as in the deployed systems) is a major benefit of SOA/microservices.
> Practically microservices contribute to real world agility not when they encourage silo-building, but when you can move developers from task to task as necessary.
Sure, being able to share staff in series is a benefit. Not needing to share resources in parallel is also a benefit. Silo building is harmful, because sharing best practices and tools that are a good fit for different teams' tasks is a good thing.
Coupling that limits the flexibility of one team to meet its requirements because of conflicts with another team's needs is bad. There's a big difference between enabling sharing (which is mostly a social/process thing) and requiring sharing (because of poor architecture choices.)
That silo-building is sort of the logical extreme.
The happy medium is more like "because our services are loosely coupled, we're free to make code changes to our service and not worry about inadvertently breaking any other team's service as long as we keep our API stable." Which is a huge win over a monolithic system where everything is tightly coupled.
I think the real strength of microservices is that it encourages software that is structured like the teams that wrote it.
The general rule is that this is true anyway. However, having a strategy that actually encourages it removes many of the communication headaches that accompany monolithic or highly distributed builds.
Is that an argument for a "one executable per team" architecture? I'd support that, but it seems closer to what most people mean by "monolithic" than what most people mean by "microservices" in my experience.