Searching the net for an uploaded photo
June 25, 2005 8:18 AM

Is there a way to search for a photograph online, not by caption, keywords, etc., but by uploading a photo and then searching for that photo?
posted by rabbus to Computers & Internet (13 answers total)
 
I'm not sure I understand. How would you "search" for a photo without using some sort of keyword or tag? Secondly, if you uploaded it you would know its location, so hypothetically, you wouldn't have to search for it.

Can you give an example?

That being said, flickr is a service that allows you to upload photos and search for them.

Google Images also allows you to search for photos, but it is done by keyword.
posted by bwilms at 8:38 AM on June 25, 2005


I think you might be talking about Content-Based Image Retrieval, in which you search for photos that actually look similar to the one you upload. There is a large research project going on at Penn State that is looking into this problem. The airplane-enthusiast site airliners.net has a prototype of a CBIR system in place, but as far as searching the Internet at large, I know of no publicly-deployed solution.
posted by TPIRman at 8:58 AM on June 25, 2005


The ideal CBIR system from a user perspective would involve what is referred to as semantic retrieval, where the user makes a request like "find pictures of dogs" or even "find pictures of Abraham Lincoln".

That does indeed sound difficult but I think what rabbus is asking is much easier to implement and is something I've had a use for.

How can I find all instances of a given photo on the Internet? If you ignore resizing and cropping and just focus on the *exact* same photo, even if it has a different name, this doesn't sound too hard. Some kind of hash could be generated for the photo file, then if you know that hash it would be easy to have a search engine help you find all instances, again, of that same exact photo.
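
A minimal sketch of that exact-match idea, assuming a crawler already has each image's raw bytes. SHA-256 is just one reasonable choice of hash, and the URLs and the `index_image` / `find_copies` names are made up for illustration:

```python
import hashlib
from collections import defaultdict

# Maps hex digest -> list of URLs where a byte-identical file was seen.
index = defaultdict(list)

def file_hash(data: bytes) -> str:
    """Hash the raw file contents; the filename plays no part."""
    return hashlib.sha256(data).hexdigest()

def index_image(url: str, data: bytes) -> None:
    """Record that this URL serves a file with these exact bytes."""
    index[file_hash(data)].append(url)

def find_copies(data: bytes) -> list:
    """Return every indexed URL whose file is byte-for-byte identical."""
    return list(index.get(file_hash(data), []))

# Stand-in bytes; a real crawler would fetch actual image files.
original = b"pretend these are the bytes of a photo"
index_image("http://example.com/sunset.png", original)
index_image("http://example.org/stolen/img001.png", original)  # renamed copy
index_image("http://example.net/other.png", b"some other photo")

copies = find_copies(original)
print(copies)
```

This only finds byte-identical files. Re-saving, resizing, or editing the image changes the bytes and therefore the digest.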

This may be highly useful for photographers looking for places where their images have been just renamed and hosted somewhere else.
posted by vacapinta at 9:21 AM on June 25, 2005


I was looking for something like this a while back. You would think computers could do this easily, with user set tolerances for composition and color. You'd get more hits if the program could use parts of photos in any orientation to match yours. The results would be interesting, because you could get pictures of completely different things that look alike.


“And that group of clouds over there gives me the impression of the stoning of Stephen . . . I can see the apostle Paul standing there to one side.”
posted by weapons-grade pandemonium at 9:27 AM on June 25, 2005


A somewhat related, but unfortunately (for you) unanswered AskMe is here.
posted by odinsdream at 9:35 AM on June 25, 2005


vacapinta, that might be less useful than you think. If there were a service like that, any savvy photo stealer would simply change one pixel to avoid matches.

weapons-grade pandemonium, computers can't do this kind of pattern matching easily yet. You know those graphics of words that you have to type in when you're buying tickets online or registering for a site? Computers can't recognize the words in those for much the same reason.
posted by Hubajube at 9:41 AM on June 25, 2005


Oh, I also wanted to add a reference to my previous question on this matter.

"vacapinta, that might be less useful than you think. If there were a service like that, any savvy photo stealer would simply change one pixel to avoid matches."

I'm beginning to understand mathowie's point about countering every possible objection... sure, that solution is 0th level. But there are higher-level hashes that would be robust even under resizing, etc.
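
One simple example of a hash like that is an "average hash". This is only a sketch of the general idea, not any particular product's method: shrink the image to an 8x8 grid, mark each cell as brighter or darker than the mean, and compare hashes by counting differing bits. The images here are synthetic grayscale grids (lists of rows), assumed to be at least 8 pixels on each side:

```python
def shrink(pixels, size=8):
    """Block-average a grayscale image (list of rows) down to size x size cells."""
    h, w = len(pixels), len(pixels[0])
    cells = []
    for by in range(size):
        for bx in range(size):
            y0, y1 = by * h // size, (by + 1) * h // size
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    return cells

def average_hash(pixels, size=8):
    """One bit per cell: 1 if the cell is brighter than the overall mean."""
    cells = shrink(pixels, size)
    mean = sum(cells) / len(cells)
    return [1 if c > mean else 0 for c in cells]

def hamming(a, b):
    """Number of bit positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))

# Synthetic 64x64 test image: dark left half, bright right half.
img = [[50 if x < 32 else 200 for x in range(64)] for y in range(64)]

tweaked = [row[:] for row in img]
tweaked[5][5] = 255                              # the "change one pixel" trick

small = [row[::2] for row in img[::2]]           # crude resize to 32x32

# A genuinely different picture: dark top half, bright bottom half.
other = [[50 if y < 32 else 200 for x in range(64)] for y in range(64)]

print(hamming(average_hash(img), average_hash(tweaked)))
print(hamming(average_hash(img), average_hash(small)))
print(hamming(average_hash(img), average_hash(other)))
```

A search would then treat a small Hamming distance as "probably the same photo" rather than requiring an exact match.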
posted by vacapinta at 9:43 AM on June 25, 2005


Yes, there is a company that provides this exact service - it's aimed at large organisations whose products are frequently pirated. For example, Nike could give them a copy of their 'swoosh' logo and the software would spider the web and find all unauthorised uses of the Nike logo - which could then be used to track down unauthorised retailers, counterfeiters etc.

No, sadly, I can't remember what the company was called or even if they still exist; I met them at a trade fair a few years ago and they were a startup at the time. I also seem to remember that the service was hugely expensive and only worked properly on logos. Still, it proves that it's possible, if not practical.

On preview, Hubajube: a couple of students came up with an algorithm to crack those "re-type the following letters" tests a few months ago. They won't be around for much longer.

the tests, that is: not the students
posted by blag at 9:49 AM on June 25, 2005


"Some kind of hash could be generated for the photo file, then if you know that hash it would be easy to have a search engine help you find all instances, again, of that same exact photo."

That's exactly what bitzi does, but it's tough to explain and needs to be built into tools to be interesting.
posted by mathowie at 10:09 AM on June 25, 2005


blag, perhaps you're thinking of BayTSP?

"BayTSP's spider programs use patented algorithms to scour public web siteslooking for pictures, video, and music files. "Our algorithms are adaptive,"claims Ishikawa. "You can cut a picture in half and we'll still find it,matching the cut-down version against a database of originals, effectively matching the electronic DNA of the target."
http://www.pbs.org/cringely/pulpit/pulpit20020919.html
posted by trevyn at 12:59 PM on June 25, 2005


This is the coming thing, awaiting predictable increases in processing power and bandwidth, and not only for image content but also for video (the holy grail) and audio. In theory, it's not all that complicated, certainly for audio (spectrum matching using FFTs). You just need more juice than most of us have, probably some kind of distributed processing algorithm. God help us when it becomes commonplace, by the way. That will be a brave new world. Among other things, anonymity will be a thing of the past (think about an early application, those face-matching applications used for security screening at things like the Super Bowl).
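
A toy sketch of the audio spectrum-matching idea, under heavy assumptions: the "recordings" are short synthetic sine tones, and a naive DFT stands in for a real FFT library (it computes the same magnitudes, just slowly). The `tone` helper and the cosine-similarity score are illustrative choices, not a description of any deployed system:

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Naive O(n^2) DFT magnitudes for the first n/2 frequency bins."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n)
                for t, s in enumerate(samples)))
        for k in range(n // 2)
    ]

def cosine_similarity(a, b):
    """1.0 means identical spectral shape, 0.0 means no overlap."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def tone(freq_bin, n=64, phase=0.0):
    """Synthetic test signal: one sine wave with a whole number of cycles."""
    return [math.sin(2 * math.pi * freq_bin * t / n + phase) for t in range(n)]

reference = magnitude_spectrum(tone(5))
shifted = magnitude_spectrum(tone(5, phase=1.0))   # same sound, different start
different = magnitude_spectrum(tone(12))           # a different pitch

same_score = cosine_similarity(reference, shifted)
diff_score = cosine_similarity(reference, different)
print(round(same_score, 3), round(diff_score, 3))
```

Comparing magnitudes rather than raw samples is what makes the match insensitive to where in the waveform the clip starts.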
posted by realcountrymusic at 5:32 PM on June 25, 2005


I'm sure there are other packages that do this, but the one I'm familiar with for Windows is called D'Peg! It can analyze image files for things like average brightness, quadrant RGB values, and such. It stores this in a database for each file, then runs a comparison search, finding images that match, plus or minus a certain tolerance.
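
A rough sketch of that kind of comparison, as a guess at the general approach rather than how D'Peg! actually works. Each image is reduced to a 13-number signature (overall average brightness plus the mean R, G, B of each quadrant), and two images "match" if every number is within a tolerance. The filenames, colors, and the `four_quadrants` test-image helper are invented for the example:

```python
def mean_rgb(pixels):
    """Average (r, g, b) over a flat list of pixel tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def signature(image):
    """image: list of rows of (r, g, b) tuples, at least 2x2 pixels."""
    h, w = len(image), len(image[0])
    everything = [p for row in image for p in row]
    sig = [sum(mean_rgb(everything)) / 3]          # overall average brightness
    for rows in (range(0, h // 2), range(h // 2, h)):
        for cols in (range(0, w // 2), range(w // 2, w)):
            sig.extend(mean_rgb([image[y][x] for y in rows for x in cols]))
    return sig

def matches(sig_a, sig_b, tolerance=10.0):
    """True if every signature component is within plus or minus tolerance."""
    return all(abs(a - b) <= tolerance for a, b in zip(sig_a, sig_b))

def four_quadrants(tl, tr, bl, br, size=8):
    """Build a synthetic test image with one flat color per quadrant."""
    half = size // 2
    return [[(tl if x < half else tr) if y < half else (bl if x < half else br)
             for x in range(size)] for y in range(size)]

blue, sand = (60, 120, 220), (220, 200, 150)
green, brown = (30, 140, 50), (90, 60, 30)

# The "database": one stored signature per file.
database = {
    "beach.jpg": signature(four_quadrants(blue, blue, sand, sand)),
    "forest.jpg": signature(four_quadrants(green, green, brown, brown)),
}

# Query with a slightly brightened copy of the beach picture.
query = signature(four_quadrants((65, 125, 225), (65, 125, 225),
                                 (225, 205, 155), (225, 205, 155)))
hits = [name for name, sig in database.items() if matches(query, sig)]
print(hits)
```

Note that the search compares the query against every stored signature, which is cheap for two files and much less so for a large collection.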

D'Peg! is only for your local machine and locally accessible filesystems. Run it with a few options turned on, and if you have more than a few hundred images, you'll quickly realize why no web-wide system has been deployed.

I've been thinking about this problem for a while, and I think it could be done with a distributed infrastructure, and a Gnutella-like search system, where the query is passed to many nodes at once. Each participating machine could "know about" a few thousand images. The obvious use for this would be locating the source or copyright-holder of an otherwise anonymous image, and the problem would be controlling how people tag images as their own. I think the searching part can be done, though.
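
To make the "query is passed to many nodes at once" part concrete, here is a small in-memory simulation of flooding a query through peers with a hop limit (TTL). Everything in it is hypothetical: the `Node` class, the signature strings, and the owner names. The shared `seen` set is a simulation shortcut standing in for the duplicate-message detection a real network would need:

```python
class Node:
    def __init__(self, name, known):
        self.name = name
        self.known = dict(known)   # image signature -> claimed owner
        self.peers = []

    def query(self, sig, ttl, seen=None):
        """Answer from local knowledge, then forward to peers while TTL lasts."""
        seen = set() if seen is None else seen
        if self.name in seen:      # already handled this query; drop it
            return []
        seen.add(self.name)
        hits = [(self.name, self.known[sig])] if sig in self.known else []
        if ttl > 0:
            for peer in self.peers:
                hits.extend(peer.query(sig, ttl - 1, seen))
        return hits

def link(x, y):
    """Connect two nodes in both directions."""
    x.peers.append(y)
    y.peers.append(x)

# Topology: a - b - c, and a - d. Nodes c and d both know the image.
a = Node("a", {})
b = Node("b", {"sig-999": "carol"})
c = Node("c", {"sig-123": "alice"})
d = Node("d", {"sig-123": "bob"})
link(a, b)
link(b, c)
link(a, d)

two_hops = a.query("sig-123", ttl=2)
one_hop = a.query("sig-123", ttl=1)
print(two_hops)
print(one_hop)
```

The two conflicting answers for the same signature illustrate the tagging problem: the search can find claims of ownership, but it cannot say which one is true.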
posted by Myself at 10:26 PM on June 25, 2005


blag: "Yes, there is a company that provides this exact service - it's aimed at large organisations whose products are frequently pirated. ... No, sadly, I can't remember what the company was called or even if they still exist"

Sounds like you're talking about DigiMarc, which does "digital watermarking". I have a couple of friends who used to work there.

I think that even if you cut out a piece of a much larger "DigiMarc'ed" image, the watermark can still be detected as it was repeated over and over and over in the image.

They used to have (and probably do still have) something called MarcSpider that would scan the web and report back if it found any of your "marked" images being used by others.

But as far as I know, the only way DigiMarc can check for images is to check for the actual watermark on an image; the image is just a "host". No way for them to say "...find me a photo online where the ratio of red to green is X, the file is lighter on the right-hand side than on the left, and contains a large peach-colored circle..."

If memory serves, all recent versions of Photoshop check each and every image you open (even if it's an image you just made from scratch yesterday) for their watermarking.
(yeah, this does kind of bug me since, on a G3, any extra scanning is just more time to wait--but who knows, I could just be a nefarious bootlegger of The Lockhorns or something)
posted by blueberry at 8:58 AM on June 27, 2005

