Getting started with EC2 and SOA
November 29, 2013 11:54 AM

I want to write a web app that deploys to EC2, but not having any prior experience I'm struggling to find the info I need to get started.

My problem here really is that I have an experimental project I want to do, but I'm experimenting with a lot of things at once and it's hard to orient myself. I've been working as a software developer for a while, but I haven't done web programming in a long time, and my intent here is to get a feel for the nitty-gritty of creating a distributed web service. What I'm imagining in my head is an app (in my case, one written in a JVM language) that composes several specialized services, and ideally incorporates some sort of self-monitoring and automatic self-scaling depending on load.

My questions are:

1. How do I get started with EC2? It seems like there are a lot of intro tutorials to show people how to throw up a simple, monolithic app on a single instance using the AWS console, and then a lot of more advanced material aimed at people who are familiar with the service, and not much in between to guide someone like me towards my goal. I suppose I could just dive into familiarizing myself with the EC2 API, but I'd like something a little more structured than that.

2. Service-Oriented Architecture sounds nice, but how are projects exemplifying those principles organized in practice? Is each specialized service essentially its own hermetic project/repository? How does deployment work when your functionality is spread across several projects? Do you use something like chef-solo to prepare a VM, or is the AWS API sufficient? How are test suites organized? Are projects like these generally tightly coupled to the specifics of EC2, or is it worth creating a more general interface to abstract away the specifics of the hosting solution?

3. For the monitoring and load balancing elements, what drives the creation of specialized services like the ones that Netflix maintains versus just using Amazon's services?

Sorry, I know that's actually a lot of questions, but I'm having a hard time even figuring out what to ask given the breadth of the topic. Your help is very much appreciated.
posted by invitapriore to Computers & Internet (7 answers total) 8 users marked this as a favorite
 
I'd love answers to several of these questions myself, but I did want to point you at one thing that you might find educational if you haven't looked at it already: Elastic Beanstalk, which is the relatively turnkey, self-monitoring, auto-scaling AWS solution. The base technology is deploying a package containing application code (Java, PHP, Node.js, Python, Ruby in a ZIP, WAR, etc.) to a number of EC2 instances behind a load balancer.
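
If it helps to see the moving parts spelled out, here's roughly what a scripted Beanstalk deploy looks like with the AWS SDK for Java. This is just a sketch; the bucket, application, environment, and artifact names are placeholders, and credentials are assumed to come from the usual environment/config chain.

```java
// Sketch: push a new WAR to an existing Elastic Beanstalk environment.
// All names here are made up for illustration.
import java.io.File;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.elasticbeanstalk.AWSElasticBeanstalkClient;
import com.amazonaws.services.elasticbeanstalk.model.CreateApplicationVersionRequest;
import com.amazonaws.services.elasticbeanstalk.model.S3Location;
import com.amazonaws.services.elasticbeanstalk.model.UpdateEnvironmentRequest;

public class BeanstalkDeploy {
    public static void main(String[] args) {
        String bucket = "my-deploy-bucket";   // assumed to already exist
        String key = "myapp-1.0.war";
        String version = "v1.0";

        // 1. Upload the build artifact to S3.
        AmazonS3Client s3 = new AmazonS3Client();
        s3.putObject(bucket, key, new File("target/myapp-1.0.war"));

        // 2. Register it as a new application version.
        AWSElasticBeanstalkClient eb = new AWSElasticBeanstalkClient();
        eb.createApplicationVersion(new CreateApplicationVersionRequest()
                .withApplicationName("myapp")
                .withVersionLabel(version)
                .withSourceBundle(new S3Location()
                        .withS3Bucket(bucket)
                        .withS3Key(key)));

        // 3. Point the running environment at the new version; Beanstalk
        //    rolls it out to the EC2 instances behind the load balancer.
        eb.updateEnvironment(new UpdateEnvironmentRequest()
                .withEnvironmentName("myapp-prod")
                .withVersionLabel(version));
    }
}
```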
posted by jjwiseman at 12:56 PM on November 29, 2013


At its core, EC2 is just providing a virtual machine, typically running Linux or Windows (though there are other options). The APIs and fancy stuff like autoscaling, virtual private cloud, load balancing, etc. can mostly be looked at as layers on top of that. So I'd recommend just firing up a single instance and playing with it to get your feet wet. You can do everything you need from the web console, no need to touch the API. Then as your project needs it, you can start bringing in the other stuff.
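
If you do eventually want to poke at the API rather than the console, launching a single instance is only a few lines with, say, the AWS SDK for Java. The AMI ID, key pair, and security group below are placeholders, not real values.

```java
// Sketch: launch one small instance through the EC2 API.
import com.amazonaws.services.ec2.AmazonEC2Client;
import com.amazonaws.services.ec2.model.Instance;
import com.amazonaws.services.ec2.model.RunInstancesRequest;
import com.amazonaws.services.ec2.model.RunInstancesResult;

public class LaunchInstance {
    public static void main(String[] args) {
        // Credentials are picked up from the environment/config chain.
        AmazonEC2Client ec2 = new AmazonEC2Client();

        RunInstancesRequest request = new RunInstancesRequest()
                .withImageId("ami-xxxxxxxx")   // placeholder AMI ID
                .withInstanceType("t1.micro")
                .withKeyName("my-keypair")     // placeholder key pair
                .withSecurityGroups("default")
                .withMinCount(1)
                .withMaxCount(1);

        RunInstancesResult result = ec2.runInstances(request);
        for (Instance i : result.getReservation().getInstances()) {
            System.out.println("Launched " + i.getInstanceId());
        }
    }
}
```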

Beanstalk is a decent way to get things up and running quickly but sometimes I feel like it brings in a bunch of complexity you don't need when you're just starting out.
posted by primethyme at 1:08 PM on November 29, 2013


You may want to look into using Heroku for hosting your web application. Heroku runs on AWS but shields you from all of the infrastructure complexity. If you need to take advantage of other AWS services, you can do so either programmatically or by moving your app onto an EC2 instance that you maintain. Basically it lets you punt on the tough stuff until you need to tackle it.
posted by askmehow at 1:46 PM on November 29, 2013


Best answer: My EC2-hosted app is, at this exact moment, serving a touch over 600,000 requests per minute. This year it varies from 150K to 1.2MM req/minute.

1) Just get started. Dive in with small instances. Figure out how to use CloudFormation and Auto Scaling Groups to deploy a simple app and have it respond to load. This will get you a long way, though if you have success you'll probably need to implement predictive scaling rather than relying only on the reactive capabilities that are built into EC2. I didn't find any documentation that was better than Amazon's, and I looked hard for it.
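
To make the reactive version concrete: a simple scale-out rule is an Auto Scaling Group plus a scaling policy that a CloudWatch alarm triggers. Here's a rough sketch with the AWS SDK for Java; every name and threshold is a placeholder, it assumes a launch configuration called "web-lc" already exists, and a CloudFormation template can express the same thing declaratively.

```java
// Sketch: add instances when average CPU across the group stays high.
import com.amazonaws.services.autoscaling.AmazonAutoScalingClient;
import com.amazonaws.services.autoscaling.model.CreateAutoScalingGroupRequest;
import com.amazonaws.services.autoscaling.model.PutScalingPolicyRequest;
import com.amazonaws.services.autoscaling.model.PutScalingPolicyResult;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClient;
import com.amazonaws.services.cloudwatch.model.Dimension;
import com.amazonaws.services.cloudwatch.model.PutMetricAlarmRequest;

public class ReactiveScaling {
    public static void main(String[] args) {
        AmazonAutoScalingClient autoscaling = new AmazonAutoScalingClient();
        AmazonCloudWatchClient cloudwatch = new AmazonCloudWatchClient();

        // A group of 2-10 web instances spread over two availability zones.
        autoscaling.createAutoScalingGroup(new CreateAutoScalingGroupRequest()
                .withAutoScalingGroupName("web-asg")
                .withLaunchConfigurationName("web-lc")   // assumed to exist
                .withAvailabilityZones("us-east-1a", "us-east-1b")
                .withMinSize(2)
                .withMaxSize(10));

        // Policy: add two instances when triggered, then cool down.
        PutScalingPolicyResult policy = autoscaling.putScalingPolicy(
                new PutScalingPolicyRequest()
                        .withAutoScalingGroupName("web-asg")
                        .withPolicyName("scale-out")
                        .withAdjustmentType("ChangeInCapacity")
                        .withScalingAdjustment(2)
                        .withCooldown(300));

        // Alarm: fire the policy when average CPU > 70% for two 5-minute periods.
        cloudwatch.putMetricAlarm(new PutMetricAlarmRequest()
                .withAlarmName("web-high-cpu")
                .withNamespace("AWS/EC2")
                .withMetricName("CPUUtilization")
                .withDimensions(new Dimension()
                        .withName("AutoScalingGroupName").withValue("web-asg"))
                .withStatistic("Average")
                .withPeriod(300)
                .withEvaluationPeriods(2)
                .withThreshold(70.0)
                .withComparisonOperator("GreaterThanThreshold")
                .withAlarmActions(policy.getPolicyARN()));
    }
}
```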

2) This is personal preference. Each service must work on its own and have its own tests, but I don't think it matters much how you organize them. You'll want to use some sort of configuration management and orchestration tooling (Chef, Ansible, Puppet, Salt, whatever) to keep your sanity. Amazon's OpsWorks is basically Chef, but I've never used it so I can't comment on its utility.

3) Amazon's monitoring is close to useless. Low granularity data, little history, hard to query, limited alerting options. You'll need more information to operate at scale. For load-balancing, I use haproxy on a bunch of instances scattered over many regions and AZs, and use Akamai GTM to balance the traffic to the load balancers. That said, ELBs work well enough for a lot of purposes.

If you're going to run your app on EC2, make sure you understand how to scale up and down automatically, and make sure your app is designed to withstand any piece of the application getting shot in the head (or simply crippled) at random. If you don't do that, there's no benefit to the platform.
posted by grudgebgon at 6:14 PM on November 29, 2013 [2 favorites]


Best answer: Seconding askmehow that a PaaS like Heroku or AppFog is more beginner-friendly than jumping straight into EC2.

Netflix is unusually open about its architecture - see their tech blog and Adrian Cockcroft's presentations. This blog post outlines their redesigned API.

Why use a custom load balancer like HAProxy or Nginx? Elastic Load Balancers (ELB) don't support weighted or least-connection algorithms, URL-based routing, or advanced health checks. They also can't be used for the apex of a domain (unless you use Amazon Route 53 for DNS) and take a while to scale up in response to sudden load increases. If none of those matter to you, they're great.
posted by djb at 6:19 PM on November 29, 2013 [1 favorite]


Response by poster: Thanks, everyone, that helps. For what it's worth, I've made several apps that deploy to Heroku, so I feel like I understand that pretty well -- EC2 just seems like a different beast, since the deployment workflow is lower-level than merely taking advantage of some git hooks.
posted by invitapriore at 2:59 PM on December 3, 2013


Best answer: So, it's been a long time since I asked this question, and I've picked up a whole lot in the meantime. I want to share what my process looks like now, specifically with regard to sub-question 2:
2. Service-Oriented Architecture sounds nice, but how are projects exemplifying those principles organized in practice? Is each specialized service essentially its own hermetic project/repository? How does deployment work when your functionality is spread across several projects? Do you use something like chef-solo to prepare a VM, or is the AWS API sufficient? How are test suites organized? Are projects like these generally tightly coupled to the specifics of EC2, or is it worth creating a more general interface to abstract away the specifics of the hosting solution?
I've come to find that keeping them in separate projects is really the only answer -- both for conceptual ease and because we want deployment for a specific service not to be impacted by changes made in another service.

And as for the rest... basically, this is where Docker and the assorted tooling around it come to the rescue. Configuration management is still necessary to provision hosts, but app dependencies other than Docker are handled within the Dockerfile itself. It also keeps our AWS dependencies nicely sequestered in our configuration logic, and with tools like Mesos we can even keep a lot of our scaling logic from knowing about the specifics of the underlying infrastructure provider. Not to mention that we can combine this with tools like Fig to quite painlessly run integration tests across the whole portfolio of services, and to develop against an environment that matches production much more closely than is easily achievable with VM-oriented development.
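
To give a rough idea of the shape of it (all names here are made up), a single JVM service's Dockerfile ends up being only a few lines that bake the built jar into an image:

```dockerfile
# Sketch of a Dockerfile for one JVM service; the base image and jar name
# are placeholders for whatever JRE image and build artifact you use.
FROM java:7-jre
ADD target/orders-service.jar /app/orders-service.jar
EXPOSE 8080
CMD ["java", "-jar", "/app/orders-service.jar"]
```

And a fig.yml (the format that later became docker-compose.yml) composes that service with its dependencies so local development and integration tests run against something production-shaped:

```yaml
# Sketch of a fig.yml wiring one service to a backing database.
orders:
  build: ./orders-service   # directory containing the Dockerfile above
  ports:
    - "8080:8080"
  links:
    - ordersdb
ordersdb:
  image: postgres
```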

It's really amazing being able to bootstrap what is basically an entire organization's infrastructure in the small on a single machine with minimal fiddling, and I really recommend everybody give this stack a try where practical.
posted by invitapriore at 2:55 PM on October 23, 2014

