Github and Engineyard part ways (engineyard.com)
157 points by jcapote on Sept 11, 2009 | 95 comments


Hey all. Wanted to chime in on the discussion.

My blog post is direct and to the point. There's honestly no mystery about this, I laid out all the details, possibly excepting one.

We designed our infrastructure offering 3 years ago. We knew that the vast majority of websites would produce nearly read-only file I/O. GFS's less than stellar write performance isn't a problem in that typical case.

Along comes Github, and they have an entirely different disk I/O profile to the rest of our customers. Github was built on a shared-something architecture because GFS made it quick and easy to get up and running, developing features, and attracting users. This is a good thing!

Github has been very vocal about their dislike of the GFS filesystem we use for shared filesystem access. GFS doesn't scale forever, and we've never suggested it does. It could scale far larger than it has at Github, but fizx hit the nail on the head: we weren't willing to do it for free, and Github was unwilling to pay our price.

We warned Github many, many moons ago that given their growth rate, in order to scale their application smoothly and inexpensively, a shared-nothing architecture was eventually going to be needed. We offered to help them with that architecture, as we saw Github running wonderfully atop a cloud service such as EC2.

Rather than do the re-architecture now, they've chosen instead to move to a vendor who can provide them a high performance, high availability, non-commodity, proprietary network file server infrastructure. I suspect it will work well for them, and we'll all enjoy a faster Github. :-)

From my perspective, I really want everyone to understand that there's no bad blood on the EY side, and hopefully none on the Github side. Business is business, decisions need to be made each and every day.


> Rather than do the re-architecture now, they've chosen instead to move to a vendor who can provide them a high performance, high availability, non-commodity, proprietary network file server infrastructure. I suspect it will work well for them, and we'll all enjoy a faster Github. :-)

GitHub has been working on a re-architecture since our last major feature release, which was GitHub Issues in April.

And we can't wait to roll it out.


Great news, Chris, sorry to misrepresent.

So, the new Github architecture is shared nothing?


> So, the new Github architecture is shared nothing?

This is a pet peeve of mine -- what does "shared nothing" even mean?

A database, a "NoSQL" key store, and memcached are all sharing something, and that something is a contention point.

Specifically in this context, how do you implement transactional SCM semantics without sharing something (such as a distributed lock system)?
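One common answer, sketched here under the assumption that every repository is owned by exactly one storage node: if no other node ever writes a given repo, the lock that serializes pushes can live inside that node's process, and no cluster-wide lock service is needed. `RepoNode`, `update_ref` and `get_ref` are hypothetical names for illustration, not how GitHub or Engine Yard actually did it.

```python
from __future__ import annotations

import threading
from contextlib import contextmanager
from typing import Iterator, Optional


class RepoNode:
    """Hypothetical storage node that is the sole owner of its repositories.

    Because no other node writes these repos, serializing pushes only needs
    locks inside this process; there is no cluster-wide lock service.
    """

    def __init__(self) -> None:
        self._guard = threading.Lock()               # protects the lock table itself
        self._locks: dict[str, threading.Lock] = {}  # one lock per repository
        self._refs: dict[tuple[str, str], str] = {}  # (repo, ref name) -> commit id

    @contextmanager
    def _repo_lock(self, repo: str) -> Iterator[None]:
        with self._guard:
            lock = self._locks.setdefault(repo, threading.Lock())
        with lock:
            yield

    def update_ref(self, repo: str, ref: str, old: Optional[str], new: str) -> bool:
        """Atomically move ref from old to new; False if it no longer points at old."""
        with self._repo_lock(repo):
            if self._refs.get((repo, ref)) != old:
                return False
            self._refs[(repo, ref)] = new
            return True

    def get_ref(self, repo: str, ref: str) -> Optional[str]:
        with self._repo_lock(repo):
            return self._refs.get((repo, ref))


node = RepoNode()
print(node.update_ref("rails/rails", "refs/heads/master", None, "a1"))  # True
print(node.update_ref("rails/rails", "refs/heads/master", None, "b2"))  # False: stale old value
```

The compare-and-swap in `update_ref` roughly mirrors a non-forced push: a second pusher holding a stale `old` value is rejected instead of silently overwriting. The contention point the question raises doesn't vanish; it shrinks to one lock per repository on one machine.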


Shared nothing usually means that each node in a system shares no source data with any other node; it has its own discrete database/disk/etc. However, these systems commonly share a communication link, so they can tell each other about changes made on their database/disk etc. With a truly shared-nothing platform, an entire node can go down and all of the other nodes continue to operate normally (except for the increase in load as users are migrated from the dead system to the live systems).

The bad part about shared nothing is quite simple: it's damn hard to do, and harder to get right. One of the biggest problems is handling conflicting data between nodes. You either have to handle the conflict in your application, or your database needs to automatically choose the best copy (based on date, etc.).
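The "choose the best copy based on date" option is usually called last-writer-wins. A minimal sketch, assuming each write carries a wall-clock timestamp and the id of the node that made it (both hypothetical fields), with the node id breaking ties so every node picks the same winner:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # wall-clock time of the write on the node that made it
    node_id: str      # tie-breaker when two writes carry the same timestamp


def pick_winner(a: Versioned, b: Versioned) -> Versioned:
    """Last-writer-wins: newest timestamp wins, then highest node id."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


def merge(local: dict[str, Versioned], remote: dict[str, Versioned]) -> dict[str, Versioned]:
    """Merge a peer's copy into ours, resolving each conflicting key independently."""
    merged = dict(local)
    for key, theirs in remote.items():
        ours = merged.get(key)
        merged[key] = theirs if ours is None else pick_winner(ours, theirs)
    return merged


node_a = {"title": Versioned("Old title", 100.0, "a")}
node_b = {"title": Versioned("New title", 105.0, "b"), "tag": Versioned("v1", 90.0, "b")}
print(merge(node_a, node_b)["title"].value)  # New title
```

This is where "harder to get right" bites: the losing write is silently discarded, and a node whose clock runs fast keeps winning. Handling the conflict in the application instead means surfacing both versions rather than calling `pick_winner`.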


I much prefer being talked to this way instead of hearing about "non-Ruby repositories."


Personally, I like hearing about both the technical and the commercial reasons driving a business decision. The fact that GitHub did not serve Ruby projects exclusively (or primarily) was an important factor to EY - why should they neglect to mention that?


Agreed. That bit sounded a bit pathetic when the issues were clearly technical and not ideological.

Are we really to believe that EY would have facilitated GitHub's growth had the repos remained overwhelmingly Ruby?


This is a much more... reasonable explanation than "GitHub offers the largest free storage quota among the big SCM hosters, and we came to the conclusion that we didn’t want to subsidize that quota for non-Ruby developers."

Which had me honestly scratching my head. Then again, I prefer Mercurial and Python, so to each his own I suppose.


I applaud the transparency of EY here. This is far more honesty than you get from most hosting companies.

I'm also pleased that github is upgrading, since I've noticed gradually decreasing performance for about a year now, which prompted me to set up my own private repo server (but I missed the github front-end). Kudos to github for doing something about it.


Would have been nice to see a joint statement (or at least concurrent statements) out of EY and Github. This way it looks like EY is just trying to beat them to the punch...either out of bad blood or to get in front of any potential bad PR.


I think they could have worded this better: "GitHub offers the largest free storage quota among the big SCM hosters, and we came to the conclusion that we didn’t want to subsidize that quota for non-Ruby developers."

It sounds odd Engine Yard hosts an entire for-profit company for free just in exchange for some complimentary accounts.

My imagination tells me GitHub is tired of being slow and Engine Yard couldn't do anything more to help them. Instead of the news being "GitHub Leaves Engine Yard Because It's Too Slow" the news is "Engine Yard Kicks GitHub To The Curb Because GitHub has Non-Ruby Repos."

I don't see either headline being positive towards Engine Yard.


Ok, as far as I can tell, the real story is that Engine Yard relies on a GFS/SAN setup that doesn't scale in the unique way that Github needs.

If you think about it, Github is one of the few sites that actually directly uses the filesystem heavily. Everyone else hits scaling issues on the DB first.

  The sad thing of all of this is it's not really a matter  
  of scaling, and it never has been. Our bottleneck has 
  always been the file system. GFS just... sucks. I'm sorry, 
  but I have to say it. Case in point, your graph. The first 
  rebuild I ran timed out because of GFS. The second one ran 
  fine, took maybe a minute to process, if that. GFS impacts 
  everything... gem build failures due to cloning... GFS. 
  Network graphs taking long time to build... GFS. Caching 
  jobs not completing... GFS. I think you see where I'm 
  going here. There's no plans to deploy the new code to the 
  live servers, and I think the reason is that we're afraid 
  it'll make GFS performance worse, not better. But on the 
  new servers where we don't have to fight GFS, it's 
  amazing.


Funny thing is that we told github that gfs would not scale for them over a year ago, we also outlined how to move to a shared nothing chunk server architecture. They didn't take our advice so it's mostly their own architecture decisions that were holding them back with regards to gfs.
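The thread doesn't describe the proposed chunk-server design, so what follows is only a generic sketch of one piece every shared-nothing layout needs: deciding which storage node owns a repository. It uses consistent hashing with virtual nodes, a swapped-in technique that neither EY nor GitHub is said to use here; `HashRing` and the node names are hypothetical.

```python
from __future__ import annotations

import bisect
import hashlib


def _point(key: str) -> int:
    """Map a string to a position on the ring (first 8 bytes of its SHA-1)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring assigning each repository to exactly one node."""

    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        # Each node gets several "virtual" points so load spreads evenly.
        self._ring = sorted(
            (_point(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def node_for(self, repo: str) -> str:
        """Owner is the first node point clockwise from the repo's hash."""
        idx = bisect.bisect(self._points, _point(repo)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["fs1", "fs2", "fs3"])
print(ring.node_for("rails/rails"))
```

Adding a node only moves the repositories whose hash now lands on one of the new node's points; everything else stays where it was, so the cluster can grow without reshuffling every repo.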

Anyway, there seems to be plenty of armchair quarterbacking on this one. The real story is that we can't afford to host them for free anymore.


FWIW - thanks for hosting them for so long. GH is a wonderful service, and I'm sure EY contributed greatly to its success.


Is it so hard to imagine that they weren't jumping at the chance to write their own custom non-filesystem storage backend?


Well, EY could make Github fast again by providing more resources. But then, someone would have to pay for the additional resources. EY doesn't think it's worth it, and Github can get the additional resources elsewhere for cheaper (EY is expensive, because you pay for premium support).

It seems like it's more about the fit between the companies, than any particular performance issues.


It's also inaccurate: GitHub doesn't offer the largest free storage quota. That's still Google Code, which offers 1024 MB/project, rather than 300 MB/user.


Google is also happy to bump up your quotas if you have a genuine need.


As are we, disk space should never be an issue with us unless you're doing something very wrong (hosting videos, etc).


The GNU Emacs repository uses almost all of my (paid) storage space. I am not doing anything wrong, either, Emacs just has a lot of history and a lot of code.


Emacs officially still uses CVS, right?


bzr.

But throwing everything on github is a lot easier than using bzr, so I do that. There is no bzrhub, and I don't have commit-access to GNU Emacs.


> It sounds odd Engine Yard hosts an entire for-profit company for free just in exchange for some complimentary accounts.

Why? It's fairly high-quality, targeted advertising with a built-in demonstration.


> It sounds odd Engine Yard hosts an entire for-profit company for free just in exchange for some complimentary accounts.

Well, sort of. It's really a value trade-off. EY offers its customers something that would cost them $50 a month, so it's costing Github (customers) * $50 to offer that. They exchange the lost value for added value in hosting (i.e., they're "losing" a few thousand a month in free accounts for our customers, so we'll offer them free hosting to continue to offer that).

It's simple barter.
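The barter described above comes down to one multiplication. A minimal sketch, where the number of comped accounts and the hosting bill are hypothetical inputs (the comment only says "a few thousand a month"):

```python
import math

PLAN_PRICE = 50.0  # per-account monthly value, as stated in the comment


def comped_value(accounts: int, plan_price: float = PLAN_PRICE) -> float:
    """Monthly list value of the free GitHub accounts given to EY customers."""
    return accounts * plan_price


def accounts_to_cover(hosting_bill: float, plan_price: float = PLAN_PRICE) -> int:
    """Fewest comped accounts whose list value covers a given monthly hosting bill."""
    return math.ceil(hosting_bill / plan_price)


# Hypothetical figures: a $3,000/month hosting bill is covered by 60 accounts.
print(accounts_to_cover(3000.0), comped_value(60))  # 60 3000.0
```

If the cost of hosting grows faster than the value of the comped accounts, the trade stops balancing, which is the argument several later comments make.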


Very excited to see GitHub continuing to grow and EngineYard and GitHub handling their barter deals so cordially. Both companies are class acts!!!


That assumes that Github is slow because of their hosting (Engineyard) and not because of how it's written.

I dunno, it could go either way for me: "Engineyard drops Github for being too costly" vs "Github needs better hosting, so we're dropping Engineyard"


It's probably more about costs for both sides. I doubt that EY is _slow_ without regard to cost. Just that instead of rewriting GitHub's web app to deal with inherent performance issues, it makes more sense to throw more hardware at the problem. There does come a point where owning more of your server stack makes more financial sense.


It is going to be interesting to see how the performance of Github and their freemium offering changes when they start paying for infrastructure somewhere else. Unless they can smooth talk another host into giving them free infrastructure.


Based on replies from my support queries about performance, they are aware that paying customers expect better/more-consistent performance than what we've had of late.


I see, so even if they did pay, it still wouldn't be fast enough. So free even has its costs, in terms of dissatisfied customers.


It's like when your boss lets you "resign" so you don't have to tell a future employer you were fired. EY was fired.


How can github fire EY when EY was giving them free stuff. Why doesn't github just pay EY for the services they are using?

This is another failure of Freemium. GH is only able to provide such large repos because they get free infrastructure.

I can't really blame EY for bailing. The partnership is no longer equitable. GH is getting something for nothing. I shouldn't say nothing, it's something, but it's not enough.


If your service doesn't meet my needs, it doesn't matter how cheap it is - including $0. According to my understanding, EngineYard doesn't meet GitHub's needs anymore.


I don't think that's totally fair to say, we've put a considerable amount of effort into GitHub over here at EY, across all of our departments. GitHub offers a great product and it's really amazing to see them grow so fast, but that success also introduces a whole lot of new problems to the situation, many of which can be quite costly to solve.

It's just a conclusion that makes the most sense for both sides of the arrangement.


EY is worth every penny they charge because they provide an incredible quality and commitment of service. They are without a doubt the best hosting company I've ever had the pleasure of working with. Not only are they quick with assistance 24/7, but they all know what they're doing.

This really shows poorly on the guys at Github. It sounds like EY went out of their way to accommodate the growth rate (I have yet to hear the Github guys dispute that), and just because EY was asking to be paid for all the hard work and Github was unwilling, for once, to pay... Github ditched them.


If the expectation was to be paid, then EY would have offered a contract with a future non-free date. Nobody expected this eventuality, or if they did both sides were OK with reaping the benefits in the short term.


What about appreciation for a company that essentially helped sponsor your company's incubation?

One of the things I appreciate about EngineYard, and the Ruby development world in general, is that they appreciate the benefit of helping to incubate open-source and small startup operations. It's about re-investing back into the community. They did this when they helped incubate a startup company with some great programming minds behind it, which they knew would assist in the rapid development of open-source Ruby libraries.

But it seems GitHub hasn't absorbed any of that mindset by their move to ditch their supporter as soon as they're in a position to show appreciation for the help by actually paying for the service they receive for once.


And what if GitHub can't afford EngineYard?


I find that hard to believe. They're one of the most popular and successful companies in the Ruby community and they have paid accounts that they've been collecting on for quite some time now. EY has thousands of customers who pay them every month and can afford it. How would it be that one of their most popular customers who doesn't pay anything for hosting wouldn't be able to pay them for once?


EY is super expensive. Also most of the cost is for support techs who can read/understand ruby code and gems. Chances are the github guys don't really need that.


How many of those customers have VC funding? Github is completely bootstrapped.


scott_s is scott shacon and works for github



Indeed, I am not Scott Shacon. Too many Scotts! One time in undergrad three of us sat in a column and confused the hell out of our compilers professor.


No, it's not. Scott Chacon's surname starts with a C, not an S.

http://github.com/schacon

http://progit.org/


Do you know if RackSpace will be giving GitHub free infrastructure? If they're not then do you expect them to fail?

FWIW, I first heard about Engine Yard through GitHub, and I'm sure I'm not the only one.


I do not expect rackspace to give them free infrastructure, no. I also don't think this will cause them to fail. They're pretty smart folks and they have revenue coming in, so they'll probably be fine.

After reading more into the thread and the issue being mainly with GFS, then perhaps it won't be that difficult to improve performance at Rackspace because they'll control the hardware and be able to use any file system they want.

Knowing why they can't run at EY clears up a lot of the questions. The scalability at EY is simply too limiting for GH, paid or not. It probably doesn't make sense for EY to install a new file system simply because GH wants it, and GH probably can't afford (or doesn't want to pay) the amount it would take for EY to move all their customers to a more scalable file system.

If you look beyond all the drama here, this really is a rational decision on both sides. It's a testament to the community here that two groups parting ways in such a dramatic way can both participate in this very comment thread together and with civility. It speaks volumes and I think everyone involved has handled themselves very well.

I can't imagine this is a pleasurable experience for either group, but when one party wants something the other is unwilling to provide (and in this case that's true of both EY, which won't switch file systems, and GH, which won't pay for the change), it's rare for both to understand the other side's perspective and avoid irrational name calling and bickering. I'm glad this breakup hasn't gone down that route.


Come on, it's not fair to say something for nothing. As much as you can't belittle the importance of infrastructure, you also can't act as if it's the only thing that matters. GitHub obviously has a product outside the raw hosting they provide, and each additional user brings on extra work on their end.

For all we know it could have been equally lopsided in the other direction: if EngineYard was sending them tons of traffic through free accounts, it could have proved costly in terms of support and negative effect on paying customers.

We of course don't know.


When evaluating hosting providers, the quote Engineyard gave us was tremendously expensive.

They offered a great deal of value-added services, but we found that it would be cheaper to go with the second-most expensive option (Amazon EC2) while also hiring a qualified full-time sysadmin.

Whether this was a wise choice or not given our experience with EC2 is up for debate. But even if we had gone with Engineyard and had no problems at all, it would be hard to justify a couple of developers' salaries for it.


It's worth noting that you get more than just EC2 + a full-time sysadmin with EY (speaking as an EY customer myself), and I don't mean this to question your own decision...that's something we all have to make considering the budgets we have available.

With EY, you get EC2 + several heavily experienced sysadmins who work every day on the exact stack that you're working with...they know it in and out like the back of their hand + the EY stack which has been battle-tested for several years now in all sorts of unique situations + performance that you couldn't replicate yourself without all that experience.

Again, not attacking your choice at all, I'm just saying that it's hard to appreciate all that you're getting until you actually use the service, and once you do, it's easy to justify the costs.


Yes, Engineyard offered by far the best impression of competence and having sysadmins who would actually work well with us. We've heard great things from everyone we've heard of who uses them.

I've also heard great things about the Porsche 911 turbo from everyone who drives one. We were expecting a very expensive quote, but we were all surprised by how high it actually was.


Just out of curiosity, what were your options that were cheaper than EC2, and why'd you reject them?


There were several hosting services that were a good deal cheaper. It costs $576/mo to run an extra-large EC2 instance full-time, while purchasing a server with similar specs only costs $2k or so. The data center across town could give us an equal amount of server power for much cheaper. We're in our own data center right now, in another city, and we simply have a contractor we pay to go in and occasionally replace broken equipment. It really is quite cheap to do it this way.
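The buy-versus-rent comparison above can be made concrete. A rough sketch using the two figures quoted ($576/mo on demand, about $2k to buy), assuming a 30-day month and lumping colocation, power, bandwidth and the contractor into one hypothetical monthly overhead parameter:

```python
import math

HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month, instance running full-time


def implied_hourly_rate(monthly_cost: float) -> float:
    """Back out the hourly price from a full-time monthly bill."""
    return monthly_cost / HOURS_PER_MONTH


def breakeven_months(purchase_price: float, cloud_monthly: float,
                     owned_overhead_monthly: float = 0.0) -> float:
    """Months until owning a server becomes cheaper than renting the instance.

    owned_overhead_monthly is a hypothetical catch-all for colo space, power,
    bandwidth and hands-on contractor time.
    """
    monthly_saving = cloud_monthly - owned_overhead_monthly
    if monthly_saving <= 0:
        return math.inf  # owning never pays off
    return purchase_price / monthly_saving


print(f"${implied_hourly_rate(576):.2f}/hr, break-even after {breakeven_months(2000, 576):.1f} months")
# $0.80/hr, break-even after 3.5 months
```

Any real overhead lengthens the payback period; with a hypothetical $376/mo of overhead, `breakeven_months(2000, 576, 376)` returns 10.0.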

Various hosts offered greater degrees of managed services at various prices. None really offered the flexibility that EC2 offered, or offered that much over simply colocating our own servers. With less traffic or fewer servers I can see many of these hosts being a great deal.

There also were several places that gave us completely screwy quotes that I can imagine were only meant to trick CEOs who don't know any better. One example is a place that offered us the use of an HTTP load balancer for the "low, low price" of $1000 a month. A lot of places offered expensive managed services, but when talking to the people who would serve us it was clear they didn't know what they were talking about. We did not seriously consider these places.

To Engineyard's credit, they were the only managed host who sounded like they had knowledge that would be really valuable to us. They also made clear that they would devote significant resources to getting our site working smoothly. We would have paid a premium for this service, but not the premium they asked.

EC2 offers an amazing amount of flexibility that we found highly valuable. A sysadmin and developer spent most of a day diagnosing an issue with a DB server that ended up being a flaky drive. In EC2, we would have simply killed the server, fired up a new instance, waited an hour for its replication to catch up, and returned it to the rotation.
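The "kill it and start a new one" workflow described above, as a minimal sketch. `Cloud`, `launch_replica`, `replication_lag_seconds` and the rotation set are hypothetical stand-ins, not the EC2 API; a real version would call the provider's SDK and query the database's replication status.

```python
from __future__ import annotations

import time
from typing import Callable, Protocol


class Cloud(Protocol):
    """Hypothetical provider interface; not the actual EC2 API."""

    def terminate(self, instance_id: str) -> None: ...
    def launch_replica(self) -> str: ...
    def replication_lag_seconds(self, instance_id: str) -> float: ...


def replace_db_replica(
    cloud: Cloud,
    rotation: set[str],
    bad_id: str,
    max_lag: float = 1.0,
    poll_every: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Swap a misbehaving read replica for a fresh one instead of diagnosing it."""
    rotation.discard(bad_id)          # stop routing queries to the flaky box
    cloud.terminate(bad_id)           # no hardware triage: just throw it away
    new_id = cloud.launch_replica()   # fresh instance starts replicating from the primary
    while cloud.replication_lag_seconds(new_id) > max_lag:
        sleep(poll_every)             # the "wait an hour" step
    rotation.add(new_id)              # caught up: back into the read pool
    return new_id
```

Passing `sleep` in as a parameter keeps the polling loop testable without actually waiting an hour.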

To return to EC2 we would have to drastically reduce the load on our MySQL instances, since we found that EBS volumes had somewhat unpredictable and not that great performance. Reducing our reliance on MySQL is something we're doing anyways, since on its current course it would have been difficult and expensive to continue scaling up in any data center.


Not to second-guess you, but extra-large reserved instances going full-time for 3 years go for $250/mo via http://aws.typepad.com/aws/2009/08/lower-pricing-for-amazon-...
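Over a full three-year term the gap is large. A quick sketch, taking the quoted $250/mo as an effective all-in monthly rate (an assumption: the linked pricing may be structured as an upfront fee plus an hourly rate):

```python
def term_total(monthly: float, months: int = 36) -> float:
    """Total spend over a fixed term at a flat monthly rate."""
    return monthly * months


on_demand = term_total(576)  # on-demand figure quoted earlier in the thread
reserved = term_total(250)   # 3-year reserved figure quoted above
saving = on_demand - reserved
print(f"on-demand ${on_demand:,.0f}, reserved ${reserved:,.0f}, saving ${saving:,.0f} ({saving / on_demand:.0%})")
# on-demand $20,736, reserved $9,000, saving $11,736 (57%)
```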


Please, write it up and post it to HN! It deserves its own discussion.


Jesus Christ, there's a lot of drama-mongering in this thread. Grow the fuck up, it's a business decision that both sides had an interest in seeing happen.


> We identified the bottlenecks and supported GitHub and the community by making patches to ssh to allow key lookup in MySQL rather than a text file. That remains, to this day, one of the finest examples of Engine Yard support and it makes me extremely proud just thinking of it.
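The thread doesn't show Engine Yard's ssh patch, but the reason a database helps is easy to sketch. With a flat authorized_keys file, every login scans lines until one matches, so cost grows with the number of users; a table keyed on the public key answers from an index. This assumes `sqlite3` (Python's standard library) as a stand-in for MySQL, a hypothetical `public_keys` table, and authorized_keys lines with no leading options.

```python
import sqlite3
from typing import Iterable, Optional


def scan_authorized_keys(lines: Iterable[str], key_blob: str) -> Optional[str]:
    """Flat-file style: read every line until one matches, O(n) per login.

    Assumes "<type> <base64-blob> <username>" lines with no options prefix;
    the comment field holding the username is a convention of this sketch.
    """
    for line in lines:
        parts = line.split()
        if len(parts) >= 3 and parts[1] == key_blob:
            return parts[2]
    return None


def create_key_table(conn: sqlite3.Connection) -> None:
    # A TEXT PRIMARY KEY gets an automatic unique index in SQLite,
    # so lookups by key are an index search rather than a full scan.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS public_keys ("
        "key_blob TEXT PRIMARY KEY, username TEXT NOT NULL)"
    )


def add_key(conn: sqlite3.Connection, key_blob: str, username: str) -> None:
    conn.execute(
        "INSERT INTO public_keys (key_blob, username) VALUES (?, ?)",
        (key_blob, username),
    )


def lookup_user(conn: sqlite3.Connection, key_blob: str) -> Optional[str]:
    row = conn.execute(
        "SELECT username FROM public_keys WHERE key_blob = ?", (key_blob,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_key_table(conn)
add_key(conn, "AAAAexampleblob1", "alice")
print(lookup_user(conn, "AAAAexampleblob1"))  # alice
```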

This seems... odd to me. It doesn't feel like the right boundary between businesses. From a hosting provider, I expect good, steady service, reboots, a root console, and that they'll fix anything that's on their end (hardware, for instance).

Patching ssh is development work, and is something I would expect to pay for to meet specific goals, but not something that comes as part of my hosting package. I mean, what if you are just cruising along, and don't need any deep hacking over a couple of months. Is your money being wasted? With actual developers, I could redeploy them to do other things. Can I do that with EY?

They seem like really good, sharp guys, but I don't quite get the business, I guess.


From our perspective, it was a matter of helping our partner and the Ruby community succeed.

We're working really hard to make Ruby on Rails succeed. This is just one of the many ways that we've pitched in to help it do so.

If Github had failed at scale during that downtime, when the "Twitter's problems are Ruby on Rails" FUD was running high, would anyone have believed it wasn't a Ruby on Rails problem? We weren't willing to test it, so we solved the problem, i.e. we put our money where our mouth is: Ruby on Rails scales just fine; the bottlenecks are generally elsewhere.


> From our perspective, it was a matter of helping our partner and the Ruby community succeed.

I suppose what seems confusing to me is that there are a lot of things you could do to help your customers succeed, but many of them are fairly expensive - such as cutting your rates, or doing high end development work.

With more basic hosting, say EC2, I know that Amazon isn't going to do beans for me, so I'm on my own. I know exactly what the price does and doesn't include, and what I have to provide myself. EY seems fuzzier... it's almost like having an extra developer on staff in terms of talent, but it is and it isn't: you can't tell that person to go off and do something else.

Say I move one of my sites to EY; is your "whatever it takes" attitude going to include fixing up my ugly design/graphics work? :-)


It seems to me that the patch wasn't for "GitHub, the EngineYard client", but instead, "GitHub, the Ruby community resource", just as they support JRuby, Rubinius, and Rails development.

If you operated a site that EngineYard saw as a valuable resource for the Ruby community, I'm sure they'd help you out in whatever way possible as well.


What you describe is that paying customers subsidize 'community resources', which is great for me as a mooching Rails (and github) user, but perhaps not so great for paying customers. I don't think they'd put it that way; and indeed I believe they must provide a lot of 'extras' for their customers for the cost of that service. Still though, it seems that it's a service that will be best with clients who make a lot of help requests, getting their money's worth.


Maybe the developers building apps that run on EY aren't the best developers to be patching ssh? Is your money wasted if you hire a lawyer on retainer and don't ask enough questions?


EY publicizes their Ruby expertise, not their 'hack on security-critical crypto code' expertise, and while it's impressive, and to their credit that they were able to do it, it still leaves me thinking that a fixed monthly fee is the wrong way to pay for that kind of talent.

> if you hire a lawyer on retainer and don't ask enough questions?

At a certain point, yes.


Engine Yard and Github's business relationship aside, is anyone else sick of everything these days being about the tool or language people are using? For some reason we hear more from the Ruby side than any other about how great their language is, but frankly the only thing that matters are the products made from it. I don't know, I'm just kind of sick of Ruby fan boys.


I'm a Python guy, and that community's both sinned against and sinning. You have prominent Ruby guys claiming there's no momentum - YouTube, FriendFeed, Dropbox, loads of newspapers - behind the language. But, then, I'm chippy enough to notice...

It's the Perl community who just seem to get it done. Amazon and BBC iPlayer have more traffic than most of us, I bet!


"we hear more from the Ruby side than any other about how great their language is"

Um, that crown unquestionably goes to a big chunk of the python community right now. Reddit alone provides a steady stream of both extreme fawning over python and vitriol targeted at ruby, and your comment and its upvotes demonstrate that it's present here, as well.

It's also quite a colossal double standard to complain about Ruby-specific hosts considering the existence of GAE, all the django-targeting hosting companies http://www.google.com/search?q=django+hosting and all the companies providing language-specific services for PHP, Java and every other mainstream language.


I'm not complaining about the hosts, I'm complaining about the extreme focus we put on the language being used rather than the products being made.


"I'm complaining about the extreme focus we put on the language being used rather than the products being made."

What "extreme focus on the language"?

Simply mentioning "Ruby" in the context of a Ruby-only hosting company's decisions does not constitute "extreme focus" any more than mentioning "Python" or, more recently, "Java" when discussing the Google App Engine. It's completely uncontroversial to anyone who doesn't have an axe to grind.

Also, the services provided by the companies (or as you put it, "the products being made") are Ruby and Git hosting, with EY also maintaining several Ruby implementations. I'm not sure how you could say anything about EY's "products being made" without mentioning Ruby.


Perhaps in the context of what services EY are providing it isn't controversial, since they clearly just provide Ruby hosting. In general though I feel like lately everyone is so dead set on the langs/tools they are using that we tend to focus on the languages rather than the general CS topics behind all of them. This was just a good time to bring up my annoyance I guess.


Bottom line, it's a business decision. I'm sure if staying with EngineYard was an option they would have gone that route. Moving to EC2 is probably a lot cheaper than pricier solutions like Rackspace. It's part of growing up: you sometimes make new friends, but never forget those you grew up with. I wonder how many new EY customers use their github accounts?


Does anyone have any idea what host github is moving to?


We're moving to Rackspace.


Cloud or dedicated/managed?


Real hardware and lots of it. Once we get moved over and everything's humming along nicely, we'll do a writeup or two about the new infrastructure.


Nice, I'm really looking forward to that. I'd love to hear about the transition too - big moves aren't easy, as I'm sure you guys know.


What FS will you use? SSDs?


That'll be great...and hopefully address one of our concerns about having our private repos sitting on ATAoE storage.


What concerns are those?


that was my guess, you guys have a big dev following, rackspace has marketing money that can be reallocated to hosting github.

good luck


It doesn't actually say Github is moving. It just says that Github isn't going to be supported by EngineYard in the same way. It's entirely feasible that Github remain at EngineYard.

Edit: Whoops. Looks like I missed the key part of a sentence there. Sorry folks.


Accordingly, we are currently working directly with their next provider to maintain smooth service as they cut over.


> From what I understand, we're at the data migration point, and we're waiting on EY for that.

s/working directly with/delaying/

http://support.github.com/discussions/site/721-network-feed-...


That's exactly what the article says:

"However, in the end, the best arrangement for them was moving."


"Accordingly, we are currently working directly with their next provider to maintain smooth service as they cut over."

"Provider" really sounds like they are talking about infrastructure


Silly question: the GFS referred to here is Global File System (http://www.redhat.com/gfs/) not Google File System, right?


You are correct, sir.


I think it's rather funny that this would never be news if not for the keywords. I mean, a customer changes a hosting provider for business reasons. Yawn. Who cares.

But, drop in "Ruby" and "Git" and see it as front page news on HN with a discussion going off on tangents.


Don't offer a free storage quota or limit how many projects you can host. There, problem solved. The people who leave will no longer be a drain on your servers and the people who stay will be paying for the maintenance of those servers.


That would kill the majority of open source projects hosted on github, free hosting for public repos is one of the reasons it's so great. I'm willing to pay for private repos, but if I had to pay for my public ones too, I'd just not host them on github.


I'd be willing to pay Github for some other premium service besides private repos (they're not of much use to me, tbh).

Uhh... can't think of anything off the top of my head that I really need or want though... OH, wait, I know, how about Bespin integration?


Maybe those open source projects should charge a fee as well?


I'm a paying GitHub customer and having big projects like Ruby on Rails and Perl hosted there makes me very comfortable. I figure if they can handle the big stuff, they can handle me! Plus, I use as much free software as everyone else and am happy to help out indirectly (by being a paying customer).

I expect prices will rise soon as well. I'm not sure anyone has mentioned this yet, even though it seems inevitable. Still, I think GitHub is fantastic and I'm planning to increase my usage.


The reliability will need to go WAAY up before they can even think about touching prices. I'm being about as patient as I can be, given that I'm spending $$$ per month on it.


Does this mean I'll no longer get a free GitHub account as an EngineYard customer?

EDIT: N/M, found the answer.

Does this mean I can have a free GitHub account because I'm also a Rackspace customer :)



