Bookmarklets and PHP parsing of websites
March 8, 2006 9:24 AM

I've been trying to find good resources on creating a bookmarklet for a web site I'm working on, and then parsing the passed URL with PHP.

Basically I want to write a bookmarklet that lets a user "post this page" (the page being a YouTube or Google Video page) to a website I'm working on. I haven't been able to find good resources on integrating that with a PHP script that would then load the page, parse it, and extract the "embed this" object tag. Any pointers on parsing web pages with PHP, and on setting up a web page to take URLs passed by bookmarklets, would be greatly appreciated.

I have root access to the server, and we're running Apache 1.3.
posted by sourbrew to Computers & Internet (16 answers total)
 
This is easy. You use JavaScript to copy various elements of the page into GET variables in the URL, which your PHP script then picks up from $_GET["variablename"]. I'll make you a contrived example in a minute to get you started.
posted by evariste at 9:29 AM on March 8, 2006


Best answer: Here's your basic bookmarklet. You want to compress it all into one logical line of code (no newlines), although you can have multiple javascript statements separated by ;'s.

javascript:d=document;w=window;t='';if(d.selection){t=d.selection.createRange().text}else%20if(d.getSelection){t=d.getSelection()}else%20if(w.getSelection){t=w.getSelection();}void(w.open('http://yourdomain.com/path/to/yourscript.php?pagetitle='+escape(d.title)+'&url='+escape(d.location.href)+'&quote='+escape(t),'_blank','status=yes,resizable=yes,scrollbars=yes'))

This creates a new window that loads a URL that looks like http://yourdomain.com/path/to/yourscript.php?pagetitle=something&url=somethingelse&quote=theselectedtextonthepage
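
For readability, the URL-building step can be sketched as an ordinary function before it gets compressed to one line. This is only a sketch: the name buildPostUrl is made up for illustration, and it uses encodeURIComponent in place of escape(), which is a substitution (it also encodes "+" and turns non-ASCII characters into UTF-8 percent sequences).

```javascript
// Hypothetical helper: builds the URL that the bookmarklet opens.
// base is the address of your own handler script.
function buildPostUrl(base, title, href, quote) {
  var params = [
    'pagetitle=' + encodeURIComponent(title),
    'url=' + encodeURIComponent(href),
    'quote=' + encodeURIComponent(quote)
  ];
  return base + '?' + params.join('&');
}

// Example call with placeholder values.
var exampleUrl = buildPostUrl(
  'http://yourdomain.com/path/to/yourscript.php',
  'A & B',
  'http://www.youtube.com/watch?v=abc',
  ''
);
console.log(exampleUrl);
```

In the bookmarklet itself the three values would be d.title, d.location.href and the selected text t.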

Continued in the next comment...
posted by evariste at 9:57 AM on March 8, 2006


Response by poster: What if I don't want the user to actually have to select the text, though, and just want to parse the site using PHP? I'm familiar with streams in C++, Java, and some other languages, but I've been having a hard time finding GOOD help sites on setting them up in PHP. Should I just cave and buy the O'Reilly book?
posted by sourbrew at 10:05 AM on March 8, 2006


Best answer: Here's the form part of yourscript.php, which handles all the GET variables passed in the URL:

<form method="post" class="myformclass" action="backendscript.php" name="bookmarklet_handler">
<table>
<tr>
<td align="right"><label for="title_field" class="mylabelclass">Headline:</label></td>
<td><input class="mytextboxclass" id="title_field" name="pagetitle" size="48" value="<?=htmlspecialchars($_GET["pagetitle"])?>" /></td>
</tr>
<tr>
<td colspan="2"><textarea class="mytextareaclass" id="new_post_textarea" name="mainbody" style='width:800px;height:500px;'><?=htmlspecialchars("<a href=\"" . $_GET["url"] . "\">" . $_GET["pagetitle"] . "</a>\n\n<blockquote>" . $_GET["quote"] . "</blockquote>")?></textarea></td>
</tr>
</table>
</form>


Then all you have left to do is write "backendscript.php", which will handle the POSTed form input after your user has edited the new post to their liking, and add it to the database. I assume you know how to handle a POST form in PHP.
posted by evariste at 10:07 AM on March 8, 2006


Response by poster: I suppose the functionality I envision goes like this

User sees "phatty" video

User clicks bookmarklet

My php file receives url

Loads source for passed url

Determines if it's YouTube or Google Video

Parses page to find a specific set of tags

Inputs tags into submission box.

I sort of feel like I'm asking for too much with all of that. A pointer to a good reference manual covering stream parsing would probably be adequate for now.
posted by sourbrew at 10:07 AM on March 8, 2006


Response by poster: Also, that base code for the bookmarklet should help a lot in at least getting things started, thanks.
posted by sourbrew at 10:11 AM on March 8, 2006


sourbrew: streams? You're barking up the wrong tree! You want to use the DOM (Document Object Model) in JavaScript to pick out the parts of the document that you want to pass to PHP, in GET variables. Read up on the DOM to figure out how to select the particular tags that you're looking for; there are tons of excellent DOM tutorials on the web if you google. JavaScript in the bookmarklet + DOM -> escape()'ed GET vars in the URL -> PHP form -> form handling script -> database -> website.

If you google around, you can also find a website that will take your multiline javascript and compress it into a single line with no spaces, suitable for use in a bookmarklet. That way you can write it in a comfortable way in your favorite code editor, and then turn it into a single-line, properly-escaped bookmarklet.
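
If you would rather do the compression yourself, a minimal sketch could look like the following. The name toBookmarklet is made up for illustration, and the function is deliberately naive: it assumes line breaks only occur after ";", "{" or "}" and that the source has no // comments, because joining lines would otherwise change what the code means.

```javascript
// Hypothetical helper: turns multi-line source into a one-line bookmarklet.
// Assumes line breaks only follow ';', '{' or '}' and there are no // comments.
function toBookmarklet(source) {
  var oneLine = source
    .split('\n')
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return line.length > 0; })
    .join('');
  // Encode literal percent signs first, then spaces, so the result
  // survives being stored as a link.
  return 'javascript:' + oneLine.replace(/%/g, '%25').replace(/ /g, '%20');
}

console.log(toBookmarklet('var d = document;\n  alert(d.title);\n'));
```

A real minifier handles far more cases (comments, strings containing newlines, and so on), so treat this as a way to see what such a tool does rather than a replacement for one.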
posted by evariste at 10:11 AM on March 8, 2006


Loads source for passed url

Determines if it's YouTube or Google Video

Parses page to find a specific set of tags

You can do all this in JavaScript with its regular expressions. Just figure out if the URL (d.location.href in the bookmarklet above) contains youtube, and if it does, use the DOM to locate the embed or object tag, if any, and set a variable to contain the (properly escaped) innerHTML property of the tag. Then you handle it in PHP as above.
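
A minimal sketch of the "which site is this" check, matching on the hostname instead of the whole URL so that a query string mentioning youtube does not trigger a false match. The function name detectVideoSite and the two hostnames are assumptions for illustration; check them against the real sites.

```javascript
// Hypothetical helper: classifies a page URL by its hostname.
// Returns 'youtube', 'googlevideo', or null for anything else.
function detectVideoSite(href) {
  var match = /^https?:\/\/([^\/:?#]+)/i.exec(href);
  if (!match) { return null; }
  var host = match[1].toLowerCase();
  // Accept youtube.com itself and any subdomain such as www.youtube.com.
  if (/(^|\.)youtube\.com$/.test(host)) { return 'youtube'; }
  if (host === 'video.google.com') { return 'googlevideo'; }
  return null;
}

console.log(detectVideoSite('http://www.youtube.com/watch?v=abc'));
```

In the bookmarklet you would pass d.location.href to it and only go looking for the embed tag when the result is not null.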
posted by evariste at 10:17 AM on March 8, 2006


More specifically, if you know that YouTube always gives the embed tag you're looking for a specific id, you can do this in javascript:

embtag=escape(document.getElementById("the_youtube_id").innerHTML);

and then pass it in the URL to your php script. Look at YouTube's source code on a couple of different pages and see if the pickings are as easy as that. I suspect they are.
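
One thing the one-liner does not handle is a page where that id is missing: getElementById returns null there, and reading innerHTML from null throws an error. A guarded version might look like this sketch. The name getEmbedMarkup is made up, "the_youtube_id" is still a placeholder, and encodeURIComponent is swapped in for escape().

```javascript
// Hypothetical helper: returns the encoded markup inside the element with
// the given id, or null when the page has no such element.
// doc is anything with a getElementById method, normally window.document.
function getEmbedMarkup(doc, id) {
  var el = doc.getElementById(id);
  if (!el) { return null; }
  return encodeURIComponent(el.innerHTML);
}

// Stand-in for window.document so the sketch runs outside a browser.
var fakeDocument = {
  getElementById: function (id) {
    return id === 'the_youtube_id' ? { innerHTML: '<embed src="x">' } : null;
  }
};
console.log(getEmbedMarkup(fakeDocument, 'the_youtube_id'));
```

In the bookmarklet you would call getEmbedMarkup(document, "the_youtube_id") and skip the embed parameter when it comes back null.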
posted by evariste at 10:22 AM on March 8, 2006


Anyway, good luck. This should be enough pointers for you to be able to figure out how to do this.
posted by evariste at 10:23 AM on March 8, 2006


My php file receives url

Loads source for passed url


$source = file_get_contents( $_GET['url'] );

You should do the parsing server-side. You could do it client-side and that would be faster, but it's likely not worth it. At some point after you release the bookmarklet, Google Video or YouTube will change their HTML format. Then you'll need to change your parser. If the parser is on your server, you can change it for everyone all at once. If it's in the bookmarklet in each individual user's browser, each user will first need to realize the bookmarklet is broken and then return to your site to get the updated parser.
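
As a rough idea of what the parser itself might do once it has the page source as a string, here is a sketch that pulls out the first object block or embed tag with regular expressions. It is written in JavaScript to match the other snippets in this thread; the same two patterns should port to PHP's preg_match run against $source. The name findEmbedTag is made up, and regex matching on HTML is fragile, which is exactly why keeping it somewhere you can update in one place matters.

```javascript
// Hypothetical helper: returns the first <object>...</object> block, or
// failing that the first <embed ...> tag, found in a page's HTML source.
// Returns null when neither is present.
function findEmbedTag(html) {
  var objectMatch = /<object\b[\s\S]*?<\/object>/i.exec(html);
  if (objectMatch) { return objectMatch[0]; }
  var embedMatch = /<embed\b[^>]*>/i.exec(html);
  return embedMatch ? embedMatch[0] : null;
}

var samplePage = '<p>x</p><object width="1"><embed src="a.swf"></object><p>y</p>';
console.log(findEmbedTag(samplePage));
```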
posted by scottreynen at 11:35 AM on March 8, 2006


Response by poster: Thank you sir. I won't be tackling this until tomorrow; I wanted to get my ducks in a row. I'll let you know how it works out.
posted by sourbrew at 11:36 AM on March 8, 2006


Response by poster: scottreynen, yeah, I was already planning for that eventuality.
posted by sourbrew at 11:37 AM on March 8, 2006


scottreynen: ah, good point about the format changing.
posted by evariste at 11:44 AM on March 8, 2006


You might want to look at this FPP before you spend too much time on this.
posted by scottreynen at 1:28 PM on March 8, 2006


Response by poster: scottreynen, yeah, I've seen that program before. It doesn't really apply to our site though; we made a conscious decision not to make downloads very easy. With the frequent takedown notices to YouTube, it seemed like it had the potential to call forth nasty-grams from the sky.
posted by sourbrew at 1:58 PM on March 8, 2006

