What are these strange requests in my HTTP server log?
June 12, 2005 10:43 AM

Why would requests for other websites show up in my web server's access log, and does it indicate that there's a vulnerability in my server?

I've been running a small web site on a custom web server for the past month. Recently I've noticed in the logs a bunch of really odd HTTP requests showing up where the request is actually to a different web site: for instance, where a normal request is getting "/images/banner.png" or "/" or "/index" or something, there are requests for "http://www.sciencedirect.com/" and a couple other academically-related URLs. (In case you don't know, ScienceDirect is one of those sites that serves up academic papers and charges you — or more likely the school you go to — an arm and a leg for the privilege. It's unlikely that they're trying to spam server logs.) The requests come from a bunch of different source IP addresses, but the requests are always the same. I tried telnetting to the server and forging various GET requests that I can imagine would place an entry like that in my server's log, but I never get anything in response other than an error message.

Any ideas what's causing this? And more importantly, is it possible that my web server is doing something it shouldn't be doing and that I'll need to fix?
posted by jacobm to Computers & Internet (14 answers total)
I'm not sure either, but for what it's worth, we get this too. The requests are for websites which -we- linked to, though.
posted by Count Ziggurat at 10:50 AM on June 12, 2005

Can you give a single line log example?
posted by Kickstart70 at 10:51 AM on June 12, 2005

Can you post a couple of the log lines in question? One way this might happen is if, say, a person were at your site and typed in the offending URL into their location bar, but appended it to your URL instead of replacing it. In that case, the request would look something like (if the person were at the root level of your site):


This could account for a couple of strays, but it sounds like you're seeing more repetition than would be explained by that. Can you look at the referrer for the requests? Another explanation would be if someone mis-formatted a link like:

that might be the more likely explanation...

posted by TonyRobots at 10:53 AM on June 12, 2005

Are you sure you aren't seeing the referrer? Sounds like referrer spam to me.
posted by sbutler at 10:59 AM on June 12, 2005

Kickstart70 and TonyRobots: sure, but like I said this isn't Apache or anything (if you care, it's the PLT Web Server) and the logging is a bit ... erm ... minimal. Here are a few lines from the log, a set of 6 sequential requests from two different IPs. The first three are what I'm talking about, and they're all from the same source; the second three are from another computer and they're normal.

(from "xxx.xxx.xxx.xxx" to "yyy.yyy.yyy.yyy" for "http://www.sciencedirect.com/" at "Friday, May 27th, 2005 10:11:10pm")
(from "xxx.xxx.xxx.xxx" to "yyy.yyy.yyy.yyy" for "http://ieeexplore.ieee.org/search/advsearch.jsp" at "Friday, May 27th, 2005 10:16:06pm")
(from "xxx.xxx.xxx.xxx" to "yyy.yyy.yyy.yyy" for "http://www.google.com/intl/zh-CN/" at "Friday, May 27th, 2005 10:18:59pm")
(from "zzz.zzz.zzz.zzz" to "yyy.yyy.yyy.yyy" for "/" at "Friday, May 27th, 2005 10:24:04pm")
(from "zzz.zzz.zzz.zzz" to "yyy.yyy.yyy.yyy" for "/images/banner.jpg" at "Friday, May 27th, 2005 10:24:04pm")
(from "zzz.zzz.zzz.zzz" to "yyy.yyy.yyy.yyy" for "/css/base.css" at "Friday, May 27th, 2005 10:24:04pm")

sbutler: I don't think it's referrer spam because the URLs in question aren't your normal referrer-spam suspects. Unfortunately, the log doesn't include the referrer (yeah, I know, kinda ghetto, but there you go) so I can't see if it's a broken link on somebody else's web page, but even if it were, WHY would they be linking to my web page like that? It'd be pretty odd: my web site has nothing to do with any of the URLs appearing in the log. (I certainly don't link to anything remotely like them on my site anywhere at all.)
posted by jacobm at 11:05 AM on June 12, 2005

Ahhh... I see. After looking at those logs I'd agree with TonyRobots.
posted by sbutler at 11:09 AM on June 12, 2005

They look like something is trying to use your server as an HTTP proxy server. A request to an HTTP proxy looks just like a request to an HTTP server, except the full address is used in place of "/" or "/index.html". This is consistent with your log results.

Why they'd be doing this, I have no idea.
posted by cillit bang at 11:30 AM on June 12, 2005
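
For illustration, the distinction described above (a full URL where a path like "/" would normally appear) can be turned into a small log filter. This is only a sketch: it assumes the log format of the lines quoted earlier in the thread, and the names `is_proxy_style` and `proxy_style_requests` are made up for this example rather than part of any server.

```python
import re
from urllib.parse import urlsplit

# Assumed log format, taken from the lines quoted earlier in the thread:
# (from "SRC" to "DST" for "TARGET" at "TIMESTAMP")
LOG_LINE = re.compile(
    r'^\(from "(?P<src>[^"]*)" to "(?P<dst>[^"]*)" '
    r'for "(?P<target>[^"]*)" at "(?P<when>[^"]*)"\)$'
)


def is_proxy_style(target: str) -> bool:
    """True if the request target is a full URL rather than a path."""
    parts = urlsplit(target)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def proxy_style_requests(lines):
    """Yield (source, target) for each log line whose target is a full URL."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and is_proxy_style(match.group("target")):
            yield match.group("src"), match.group("target")


# Two of the lines quoted in the thread: one proxy-style, one normal.
sample = [
    '(from "xxx.xxx.xxx.xxx" to "yyy.yyy.yyy.yyy" for '
    '"http://www.sciencedirect.com/" at "Friday, May 27th, 2005 10:11:10pm")',
    '(from "zzz.zzz.zzz.zzz" to "yyy.yyy.yyy.yyy" for "/" '
    'at "Friday, May 27th, 2005 10:24:04pm")',
]

for src, target in proxy_style_requests(sample):
    print(src, target)
```

Note that a target such as "/http://www.google.com/" starts with a slash, so the filter treats it as an ordinary path and does not flag it.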

These are probably open proxy scans: HTTP proxies expect the full URL in the GET. People scan for open HTTP proxies because they can be used either to cover one's tracks on the web (whether for illegal activity or for exploiting software vulnerabilities) or to send spam (you can spam SMTP through an HTTP proxy by using POST requests).

TonyRobots's explanation, though creative, doesn't work: concatenating two URLs would cause the logged request to be /http://whateverURL.
posted by fvw at 11:32 AM on June 12, 2005

When I saw this I first thought they were trying to use your server as an HTTP proxy server, like cillit bang says. What happens when you go to http://yourserver/http://www.google.com/? It shouldn't work.

If they are trying to use your server as a proxy, they might do this to superficially hide which sites they are visiting or to abuse resources you have that they don't (a license to journals on ScienceDirect).
posted by grouse at 11:37 AM on June 12, 2005

Going to http://yourserver/http://www.google.com/ won't test for an HTTP proxy (it might work if you were using a CGI proxy installed at /, but that's not common). If you want to test whether your web server is proxying, you can set it as your proxy in your web browser settings (it might just proxy for your machine and not for people on the big bad internet, though; there's no substitute for checking the config and reading the documentation).
posted by fvw at 11:46 AM on June 12, 2005

cillit bang, fvw, grouse: that makes sense, the server is located on an academic network and it makes sense that people might be scanning for proxies so they could get access to journals and so on. And since my server doesn't respond to those requests by actually proxying for them, I guess there's no problem. That's a relief.

fvw: I've tried directly asking my server for the given url like this

$ telnet myserver 80
GET http://google.com HTTP/1.0

and I get an error rather than the page. Is that enough to verify that my server's not inadvertently acting as a proxy?
posted by jacobm at 11:56 AM on June 12, 2005

As long as you included the two newlines at the end, yes, that means you're fine. Mind you, there was no reason to assume you were running an open proxy, these scans are a bulk thing, I regularly see them scroll past in my webserver logs too.
posted by fvw at 12:19 PM on June 12, 2005
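
The telnet check discussed above can also be scripted. The following is a rough sketch, not a definitive test: `build_probe`, `status_line`, and `probe` are hypothetical helper names, and the defaults (port 80, a 10-second timeout) are assumptions. It sends the same proxy-style GET, including the blank line that ends the request, and returns the first line of the reply.

```python
import socket


def build_probe(url: str) -> bytes:
    """Build an HTTP/1.0 request that uses a full URL as the request target."""
    # The second CRLF is the blank line that ends the request; without it
    # the server keeps waiting for more input.
    return f"GET {url} HTTP/1.0\r\n\r\n".encode("ascii")


def status_line(response: bytes) -> str:
    """Return the first line of a raw HTTP response ('' if it is empty)."""
    lines = response.splitlines()
    return lines[0].decode("latin-1") if lines else ""


def probe(host: str, url: str, port: int = 80, timeout: float = 10.0) -> str:
    """Send the probe to host:port and return the status line of the reply."""
    buffer = b""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_probe(url))
        # Read until the first line is complete or the server closes.
        while b"\n" not in buffer:
            data = sock.recv(4096)
            if not data:
                break
            buffer += data
    return status_line(buffer)
```

For example, `probe("myserver", "http://google.com")` repeats the telnet session, with "myserver" standing in for the real host name. An error status is consistent with the server refusing to proxy. A success status is not proof of proxying on its own, since some servers answer such requests with their own pages, so the returned body would still need to be compared against the remote site.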

Thanks for correcting my error, fvw.

I have just checked the Apache access_log I have access to on an academic web server. I'm also finding IPs from a Chinese network trying to GET http://www.sciencedirect.com/. It doesn't work on this server either, so it looks like someone is trolling for an open door. Here is the whois for that IP in case you want to compare.

inetnum: -
netname: CHINANET-JS
descr: CHINANET jiangsu province network

posted by grouse at 2:17 PM on June 12, 2005

Yeah, I used to get the sciencedirect.com requests too when I was on a university network. Often wondered about those...
posted by Xelf at 4:37 PM on June 12, 2005
