Nutchance of making this succeed without help
January 24, 2013 2:04 AM

I'm looking for a guide on how to set up and use Nutch, Solr and Drupal, particularly Nutch and Solr. I have never coded before.

I am, as per the books, a complete dummy. I am competent in using Microsoft Office products, for whatever that is worth. I have a reasonable user understanding of what Nutch, Solr and Drupal do and have read up on them. I have worked on development projects in the content management space before as a project owner.

I now want to try them out for myself (before writing a spec to hand over to 3rd party developers) so I can at least get a sense of what is what.

Online tutorials like this assume more knowledge of the very basics than I have. For example, the instructions for setting up Nutch say:
1. Setup Nutch from binary distribution

- Download a binary package (apache-nutch-1.X-bin.zip) from here.
- Unzip your binary Nutch package. There should be a folder apache-nutch-1.X.
- cd apache-nutch-1.X/ [HUH?]
- From now on, we are going to use ${NUTCH_RUNTIME_HOME} to refer to the current directory (apache-nutch-1.X/).

I am using a Mac, and am a recent switcher from PCs.

I don't mind paying for access to content or buying books. I already own the most recent Packt guide on Solr, which makes sense to me as long as I ignore the parts that talk about coding. I don't mind paying someone to sit down and show me step by step how this works. Apparently you can set up a basic Nutch web crawl in one hour. I suspect this is not the case for me.

Is there a guide out there for complete beginners? Failing that, does anybody know a good, cheap London-based person who could sit me down and give me a basic tutorial in this stuff?
posted by MuffinMan to Computers & Internet (12 answers total)
If "cd apache" is confusing, I suggest you find some really basic screencasts on how to use the command line. Get some basic tips on using a command line, and then find someone to sit down with you and help you install the other tools.
posted by kamelhoecker at 2:35 AM on January 24, 2013


I think you're probably in over your head if those instructions are baffling to you. I'd maybe start with Linux for Dummies (I know you're using OS X, but the command line in OS X is pretty close to Linux's, and you'll be using mostly Unix-like functionality when running apps like this.)
posted by empath at 2:48 AM on January 24, 2013


I agree with the other responders that you need to become familiar with the command line interface in OS X before you can do what you want to do.

"cd" means "change directory"
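To make that concrete, here is a small practice session you could type into Terminal, one line at a time. It is only a sketch: the folder names `terminal-practice` and `my-folder` are made up for the exercise. The Nutch tutorial's `cd apache-nutch-1.X/` is the same command, run from whichever folder you unzipped the package into.

```shell
# Make a throwaway practice folder (with a subfolder) in your home directory.
# -p means "also create parent folders, and don't complain if they exist".
mkdir -p ~/terminal-practice/my-folder

# cd = "change directory": step into the subfolder
cd ~/terminal-practice/my-folder

# pwd = "print working directory": shows where you are now
pwd

# ".." means "the folder one level up", so this moves back to ~/terminal-practice
cd ..
pwd

# ls lists what is in the current folder; it should show my-folder
ls
```

When you are done, `rm -r ~/terminal-practice` deletes the practice folder. Double-check the path before pressing Return, since `rm` does not move things to the Trash.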

Here is one introductory explanation of the Terminal (the application you use in OS X to access the command line interface).

One thing I would not do is start with Linux for Dummies. Though it is true that the command line in OS X is "pretty similar" to that of Linux, there are differences, and if you're really a beginner you're going to get confused and/or irritated by those differences.
posted by dfriedman at 4:22 AM on January 24, 2013


I almost think it would be better to just install a Linux VM on OS X, though, and learn that way, since you'll almost never install any of those products on an OS X box in production. If you learn how to do it on OS X, you're probably going to have to learn how to install it all over again on Linux anyway.
posted by empath at 4:41 AM on January 24, 2013


To elaborate on the last answer: I know you can get all these apps working on OS X natively, but the vast majority of tutorials you'll find assume you're running Linux (or possibly Windows), and if you aren't familiar with both OS X and Linux, you're going to constantly run into problems following directions. The other advantage of using a VM is that if you mess up the install at some point, you don't need to figure out what you did wrong; you can just reload the image from scratch and try again.
posted by empath at 4:45 AM on January 24, 2013


empath--I think that's a fair point, if the OP's intention is to use this in a production environment. But it's not clear if that's the case. If it's only for his own enjoyment or curiosity, Linux may just add to the confusion.
posted by dfriedman at 5:15 AM on January 24, 2013


It's not just curiosity - I may need to use this in a production environment eventually, although not from a local machine. The idea of installing a Linux VM makes sense. Any pointers on how to do this would be appreciated.
posted by MuffinMan at 5:33 AM on January 24, 2013


It's actually pretty easy. You can pretty much just click Next through the whole process and it'll work. Since you're not going to be setting up a production server yourself, you don't need to worry about getting in too deep with configuring Linux.

Once you're in Linux, any of the tutorials should be a bit easier to follow (for example, to install Solr in Ubuntu, you can just copy and paste three commands).
posted by empath at 6:06 AM on January 24, 2013


Regarding VMs, I don't see why VMware would be needed. VirtualBox is easy to install on OS X and will work fine for this.

This is a little outdated, but the procedure won't have changed much except for version numbers: virtual Ubuntu LAMP server on OS X with VirtualBox.

If you want a graduated introduction to the command line, I'd suggest first breezing through Neal Stephenson's extended essay In the Beginning was the Command Line, downloadable freely here. It's something of a polemic, but it's a lot easier to get through than a dry how-to compendium as a first-read.

After that I'd go directly to TLDP and start reading the Introduction to Linux: Hands on Guide.

After that, buy a copy of the UNIX and Linux System Administration Handbook (which also covers the more UNIX-like command syntax variations that OS X often expects) and turn your attention to your project.

Also find the IRC support channels and be aware of the listserv archives for the open-source projects you're using.
posted by snuffleupagus at 7:20 AM on January 24, 2013


It is fine to ask for help, but I think to succeed in doing what you are setting out to do, you need to focus on becoming more self-reliant.

First, you shouldn't be afraid to try stuff. That is the only way you are really going to learn anything. There is little you can do from a command line that is truly catastrophic. Just avoid doing anything as the 'root' user (which requires that you either log in as root, or that you escalate privileges by invoking the 'sudo' or 'su' commands). The last line of defense against disaster is good backups. You have a Mac. It has Time Machine for hourly automatic backups. Use it.
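One way to act on that advice is to check which user a Terminal session is running as before typing anything risky. A minimal sketch using standard commands that work the same way on OS X and Linux (`whoami` prints your user name; `id -u` prints your numeric user ID, and root's ID is 0):

```shell
# Print the name of the user this shell is running as
whoami

# id -u prints the numeric user ID; root's is always 0
if [ "$(id -u)" -eq 0 ]; then
  echo "Careful: this shell is running as root"
else
  echo "OK: running as an ordinary user"
fi
```

Typing `sudo whoami` prints `root` (after asking for your password), which is a quick way to see what `sudo` actually does: it runs that single command as root.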

Next, I'm not sure how you settled on the combination of Nutch, Solr and Drupal, but I'm guessing you did some googling. Good, because you need to do a lot more of that. Trying things and googling the error messages will carry you a long way.

All that said, a few things that will help:

As others have suggested, any book or website with an introduction to Linux or FreeBSD system administration will cover command-line ('shell prompt' or 'bash prompt') basics and clarify some of the points in the Nutch tutorial that you got hung up on.

And the bit about NUTCH_RUNTIME_HOME: that is telling you to set and then export an environment variable by that name so that when you run Nutch, it knows where it is supposed to find and stash its data, config files, components, etc. That probably doesn't make a whole lot of sense to you. That's fine; I've deliberately used some vocabulary that you can use to help Google your way to fuller explanations.
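In shell terms, setting and exporting that variable looks roughly like this. It is a sketch, assuming you run it in Terminal after `cd`-ing into the unzipped Nutch folder. The tutorial quoted in the question says it uses `${NUTCH_RUNTIME_HOME}` to refer to that folder, so defining it yourself mainly lets you copy the tutorial's commands as written.

```shell
# Run this from inside the unzipped Nutch folder.
# Store the current folder's full path in NUTCH_RUNTIME_HOME, and export it
# so that programs started from this shell can see the variable too.
export NUTCH_RUNTIME_HOME="$(pwd)"

# Check the value
echo "$NUTCH_RUNTIME_HOME"

# Later, from anywhere in the same Terminal window, jump back to that folder
cd "$NUTCH_RUNTIME_HOME"
```

The variable only lasts for that Terminal window; closing it forgets the value. People commonly put the `export` line (with the real path written out instead of `$(pwd)`) in a shell startup file such as `~/.bash_profile` so it is set in every new window.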

Finally, though: given where you are in terms of your knowledge, why are you going to be specifying technologies that developers should be building on?
posted by Good Brain at 2:22 PM on January 24, 2013


Next, I'm not sure how you settled on the combination of Nutch, Solr and Drupal, but I'm guessing you did some googling.

Just to be 100% clear. I've worked with CMS systems for several years, but not from a nuts and bolts development perspective. I haven't just googled it. I've left it open for the developers to suggest alternatives for the specific project I have, but I'm pretty clear that the options I've looked at are the best fit.
posted by MuffinMan at 12:30 AM on January 25, 2013


That's cool--it's worth noting that Drupal is one of the less approachable of the major CMSes, with the trade-off that doing more complex things with it is usually less clunky once you're up to speed.
posted by snuffleupagus at 5:49 AM on January 25, 2013

