Pharma Spam on a web site....
July 1, 2008 9:17 AM

A friend of mine's site is plagued with pharma spam in Google's cache.

A friend of mine's site is hosted on Dreamhost, and when you view the site and its code directly, nothing looks wrong. However, if you search for just about any drug on the planet, his site comes up pretty high in the ranks on Google.

When you follow the link from Google, it makes no sense why his site is appearing. However, if you view the Google cache for the page... tons of pharma spam. This is despite no one but the friend having passwords or access to the site, and he's running the latest version of his CMS (Expression Engine), which is, by all accounts we could find, secure. His site also doesn't allow trackbacks, comments, or anything else that gives others access of any kind to the CMS.

So...

1) How did these spammers do this?
2) How can we ensure they don't do it again?
3) What should we do regarding Google? (ie, alert them? ignore it and hope it's gone the next time the robot comes thru?)
4) I did a bit of searching and found some other people with this problem... they're also on DH, though DH won't own up to it and blames wordpress (which isn't installed and never has been) and the friend not being diligent with his password (he is).
posted by dobbs to Computers & Internet (21 answers total)
 
Go ahead and give us the site - it'll be easier to diagnose the problem that way.
posted by unixrat at 9:54 AM on July 1, 2008


What's probably happening is that your friend's site was altered to display the spam only when the page is accessed by Google's crawler bot. This is done to boost the ranking of the spammers' pages.
posted by qvtqht at 10:59 AM on July 1, 2008


Solution: Move off of Dreamhost.

Problem: Someone has found a way to exploit Dreamhost's servers and Dreamhost is playing dumb. But you get what you pay for.

To answer your questions:
1) Some exploit to the system. It could be by taking advantage of some failure point in Dreamhost's general configuration or by hacking a blog or CMS you're running. After that they probably defaced your site in a non-obvious way (text hidden by style sheets, etc., or by feeding search robots different text than regular browsers).

2) Get a new host.

3) Ignore it; it will go away pretty quickly. Alerting Google could get you delisted because your site has been compromised and is spammy and trying to manipulate page rankings. Simply get set up at the new host and ask Google to crawl it again. And if you're feeling up to it, add a site map.
posted by Ookseer at 11:08 AM on July 1, 2008


That was to cover the 1). Since you mention that they are running a CMS, what probably happened was that the spammers exploited a security hole in the CMS before it was patched. They probably did not do this individually to your site, but did a mass sweep of all sites running that CMS that they could find.

2) Have someone look at your web site and find where this alteration took place and remove it. To make sure it doesn't happen again, keep up on security updates and check your site regularly like you did when you found this. Check your site's cache on Google, visit your site, and also visit your site by clicking a link from Google.

3) You can wait for Google to re-index your site, or you can ask them to do it faster by using Webmaster Tools. Using the Add URL tool may also work.
posted by qvtqht at 11:12 AM on July 1, 2008


Response by poster: Thanks for the answers thus far.

What's probably happening is that your friend's site was altered to only display the spam when the page is accessed by Google's crawler bot.

How did they do this, though?

I have already recommended the guy move from Dreamhost. I recommend Pair--is anyone aware of a similar problem happening there?

Also... why would anyone do this? I don't understand how they benefit.
posted by dobbs at 11:24 AM on July 1, 2008


It's worth looking at the page source for your friend's site. If the black hats have gotten in, one trick they can pull is to insert their material but bracket it in JavaScript in a way that prevents it from displaying on a normal browser. (There are several ways of doing that.) That means that crawlers see it but you don't.

So just because you can view your friend's page and don't see anything untoward, that doesn't mean his page is clean. The only way to be sure is to load the page in your browser, and then tell it to show you the raw HTML source that was downloaded.
posted by Class Goat at 11:26 AM on July 1, 2008


Response by poster: By how I don't mean how did they get in (seems clear it was some security issue)... but what was altered? What is altered so that only Google's crawler sees it? I've examined the code, the htaccess file, and everything else I can think of and find nothing suspicious.
posted by dobbs at 11:30 AM on July 1, 2008


How did they do this, though?

There are several ways it can be done. One way is to put the black hat code inside an iframe and set the size of the iframe to one pixel by one pixel.
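
A rough way to check saved page source for this kind of thing is a short Python script. This is only a sketch: the function and class names are made up for illustration, it only looks at inline width/height attributes and inline style attributes, and it will miss anything hidden through an external stylesheet or script.

```python
import re
from html.parser import HTMLParser

# Inline styles that hide an element from a normal visitor.
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.I)


def _tiny(value):
    """True if a width/height attribute is 0 or 1 (optionally with 'px')."""
    if value is None:
        return False
    match = re.fullmatch(r"\s*(\d+)\s*(px)?\s*", value, re.I)
    return bool(match) and int(match.group(1)) <= 1


class HiddenContentFinder(HTMLParser):
    """Collects tiny iframes and elements hidden with inline styles."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        line, _ = self.getpos()
        if tag == "iframe" and (
            _tiny(attributes.get("width")) or _tiny(attributes.get("height"))
        ):
            self.findings.append((line, "tiny iframe", attributes.get("src")))
        if HIDDEN_STYLE.search(attributes.get("style") or ""):
            self.findings.append((line, "hidden element", tag))


def find_hidden(html):
    """Return a list of (line, kind, detail) tuples for suspicious tags."""
    finder = HiddenContentFinder()
    finder.feed(html)
    return finder.findings
```

Save the raw source from the browser's "view source" to a file, read it into a string, and pass it to find_hidden(). An empty list only means these particular tricks were not found.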

Also... why would anyone do this? I don't understand how they benefit.

Search Engine Optimization. They're gaming the google rating system by hijacking sites which Google considers "clean" and making them point to their own page, to boost their own rating.
posted by Class Goat at 11:30 AM on July 1, 2008


Response by poster: Yeah, I've already done that. There is nothing untoward in the code. I've also had the CMS makers look at it and they find nothing strange either.
posted by dobbs at 11:31 AM on July 1, 2008


Response by poster: Sorry, we're talking over each other here. :)

Class Goat, that's one of the things I don't get. When you do a search at Google for the drugs and his page comes up and you click the link, it just goes to his web site and loads the proper page. It doesn't redirect anywhere or give popups or anything. It just loads his normal web site.
posted by dobbs at 11:32 AM on July 1, 2008


Please give us the page URL. If you're not familiar with this kind of thing, you may have missed the trick. For instance, the black hat code could have been hidden inside an include file rather than being in the main line. It could have been obscured by being hex encoded. There are a lot of ways of obfuscating this.
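
If you have shell or FTP access to the files, a small Python script can at least flag the usual obfuscation patterns across the main templates and any include files. This is a sketch under assumptions: the three patterns below are common examples of my own choosing, not a complete list, and the file suffixes are guesses about what a PHP-based site contains.

```python
import re
from pathlib import Path

# Heuristic patterns only; a determined attacker can evade all of them.
SUSPICIOUS = {
    "hex-escaped string": re.compile(r"(?:\\x[0-9a-fA-F]{2}){8,}"),
    "eval of decoded data": re.compile(
        r"eval\s*\(\s*(?:base64_decode|gzinflate|str_rot13)\s*\(", re.I
    ),
    "long base64 literal": re.compile(r"[\"'][A-Za-z0-9+/]{200,}={0,2}[\"']"),
}


def scan_text(text):
    """Return (line_number, label) for each suspicious line in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SUSPICIOUS.items():
            if pattern.search(line):
                hits.append((number, label))
    return hits


def scan_tree(root, suffixes=(".php", ".inc", ".html", ".htaccess")):
    """Scan every matching file under root, including include files."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.name.endswith(suffixes):
            hits = scan_text(path.read_text(errors="replace"))
            if hits:
                results[str(path)] = hits
    return results
```

Run scan_tree() on a downloaded copy of the site directory and read every flagged line by hand. Legitimate code sometimes uses these constructs too, so a hit is a lead, not proof.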
posted by Class Goat at 11:33 AM on July 1, 2008


Response by poster: Class Goat, I memailed you the url. I feel weird posting it here for eternity in a thread about Pharm Spam when we're basically trying to disassociate his site from that. I hope that's okay.
posted by dobbs at 11:39 AM on July 1, 2008


Not a problem.

As I mentioned in my response, it looks like your friend's page is clean now. The residual problem is that Google has a long memory and doesn't necessarily visit very often.

qvtqht's third suggestion is your next thing to do.
posted by Class Goat at 12:03 PM on July 1, 2008


Response by poster: Thanks!
posted by dobbs at 12:23 PM on July 1, 2008


Just because the page looks clean to you does not mean that it is clean, especially since the owner has made no changes to it. The blackhat code probably prints the spam based on either the user agent string or the IP address of the client -- showing itself to search engines, but hiding itself from everyone else.

I doubt that it would be using javascript or iframe tricks, because that would not help them with ranking.

The first thing to try is to mask your browser as Googlebot and try viewing your site. This will only work if the blackhat code looks at the user agent and not the IP address.
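
If a browser extension won't do it, the same test can be run from a script. Here is a minimal Python sketch using only the standard library. The Googlebot string is the one Google publishes for its crawler; the browser string, the 0.9 threshold, and all the function names are my own assumptions for illustration.

```python
import difflib
import urllib.request

GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
# Illustrative browser string; any ordinary browser User-Agent will do.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"


def build_request(url, user_agent, referer=None):
    """Build a request that claims the given User-Agent (and Referer)."""
    headers = {"User-Agent": user_agent}
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def fetch(url, user_agent, referer=None, timeout=15):
    """Download the page body as text using the spoofed headers."""
    request = build_request(url, user_agent, referer)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def similarity(a, b):
    """Ratio between 0.0 and 1.0 of how alike two page bodies are."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def looks_cloaked(url, threshold=0.9):
    """True if the 'Googlebot' copy differs a lot from the browser copy."""
    as_browser = fetch(url, BROWSER_UA)
    as_bot = fetch(url, GOOGLEBOT_UA)
    return similarity(as_browser, as_bot) < threshold
```

Call looks_cloaked() with the page URL, and if it returns True, diff the two bodies by hand. The optional referer argument lets you also pretend you arrived from a Google results page. As noted, none of this catches code that keys on the crawler's IP address, and pages with a lot of dynamic content may need a lower threshold.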

The code could be hidden by using Apache directives in the .htaccess file, so they may not be obvious by looking at just the CMS code.
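
To make that concrete, the thing to look for is a mod_rewrite RewriteCond that tests %{HTTP_USER_AGENT} or %{HTTP_REFERER} against crawler or search engine names. A small Python sketch that flags such lines follows; the list of crawler names and the function name are my own assumptions, and it only understands this one directive.

```python
import re

# A RewriteCond that branches on the visitor's User-Agent or Referer.
CONDITION = re.compile(
    r"^\s*RewriteCond\s+%\{HTTP_(USER_AGENT|REFERER)\}\s+(\S+)", re.I
)
# Assumed list of names worth flagging; extend as needed.
CRAWLER_HINT = re.compile(r"google|slurp|msnbot|yahoo", re.I)


def suspicious_conditions(htaccess_text):
    """Return (line_number, header, line) for conditions naming crawlers."""
    findings = []
    for number, line in enumerate(htaccess_text.splitlines(), start=1):
        match = CONDITION.match(line)
        if match and CRAWLER_HINT.search(match.group(2)):
            findings.append((number, match.group(1).upper(), line.strip()))
    return findings
```

Remember that Apache reads .htaccess files from every directory on the way down to the requested file, so check each one, not just the copy in the site root.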

Finally, Dreamhost has its flaws, but I don't think this problem was caused by Dreamhost, especially if you or your friend installed the CMS yourself. It is more likely that the blackhats exploited a bug in your CMS than that they broke into Dreamhost's servers. Moving to a new host is not guaranteed to resolve your problem.
posted by qvtqht at 12:41 PM on July 1, 2008


Qvtqht, I looked at the source and there isn't anything that would do that. It's not that it "looks clean" when loaded in the browser, it's that it looked clean when I read the HTML source.
posted by Class Goat at 12:43 PM on July 1, 2008


Class Goat: Did you read above about user agent strings, etc.? Did you look at the source code on the server side? If not, what user agent string did you send to the server when requesting the HTML?
posted by qvtqht at 12:52 PM on July 1, 2008


Use something like refspoof to fake like you're googlebot, and then hit your site. You'll find out pretty quickly whether it's still hax0red
posted by Mach5 at 1:09 PM on July 1, 2008


Response by poster: qvtqht, I memail'd you the url as requested.

The CMS in question is Expression Engine and I could find no one else on the web reporting this problem using that CMS. I reported the issue on their forum and also had no one say they had it. Three of their people also investigated the issue promptly and, I thought, rather thoroughly, and found nothing amiss. They could be CYAing but based on years of using their CMS for numerous clients (none of which are having this issue), I believe them.

Mach5, that refspoof doesn't seem to work with the latest FF, which is all I have.
posted by dobbs at 1:18 PM on July 1, 2008


As a related note on the issue of hidden, Google-only content, are you using a third-party "skin" or visual design template that you did not author yourself? In the WordPress universe, there have been cases of skins being distributed for free, but with hidden links to linkfarm sites.
posted by squid patrol at 3:43 PM on July 1, 2008


Response by poster: squid patrol, no there's no skin. Oh, and I didn't design the site for the guy, just for the record. I merely installed the CMS which is not wordpress.
posted by dobbs at 4:52 PM on July 1, 2008

