Bookmarklets and PHP parsing of websites
March 8, 2006 9:24 AM
I've been trying to find good resources on creating a bookmarklet for a web site I'm working on, and then parsing the passed URL with PHP.
Basically I want to write a bookmarklet that allows a user to "post this page," the page being a YouTube or Google Video page, to a website I'm working on. I haven't really been able to find good resources on integrating that with a PHP script that would then load the page, parse it, and in the process extract the "embed this" object tag. Any info on where I could find help on parsing web pages with PHP, and how one goes about setting up a web page to take URLs passed by bookmarklets, would be greatly appreciated.
I have root access to the server, and we're running Apache 1.3.
Best answer: Here's your basic bookmarklet. You want to compress it all into one logical line of code (no newlines), although you can have multiple javascript statements separated by ;'s.
javascript:d=document;w=window;t='';if(d.selection){t=d.selection.createRange().text}else%20if(d.getSelection){t=d.getSelection()}else%20if(w.getSelection){t=w.getSelection();}void(w.open('http://yourdomain.com/path/to/yourscript.php?pagetitle='+escape(d.title)+'&url='+escape(d.location.href)+'&quote='+escape(t),'_blank','status=yes,resizable=yes,scrollbars=yes'))
This creates a new window that loads a URL that looks like http://yourdomain.com/path/to/yourscript.php?pagetitle=something&url=somethingelse&quote=theselectedtextonthepage
Continued in the next comment...
posted by evariste at 9:57 AM on March 8, 2006
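For readers who find the one-line bookmarklet hard to follow, here is a minimal sketch of just its URL-building step, written out as a function. The name buildPostUrl and its parameters are illustrative and not part of the original answer. It also swaps escape() for encodeURIComponent(), on the assumption that the receiving PHP script expects standard percent-encoding; escape() encodes non-ASCII characters in a non-standard way.

```javascript
// Readable sketch of the bookmarklet's URL-building step.
// buildPostUrl and its parameter names are illustrative assumptions.
function buildPostUrl(baseUrl, pageTitle, pageUrl, selectedText) {
  // encodeURIComponent escapes &, =, ? and non-ASCII characters so each
  // value survives as a single query parameter.
  return baseUrl +
    '?pagetitle=' + encodeURIComponent(pageTitle) +
    '&url=' + encodeURIComponent(pageUrl) +
    '&quote=' + encodeURIComponent(selectedText);
}

// Inside a bookmarklet it would be called roughly like this:
// window.open(buildPostUrl('http://yourdomain.com/path/to/yourscript.php',
//   document.title, location.href, String(window.getSelection())), '_blank');
```

Keeping the URL logic in one small function makes it easier to test outside the browser before compressing everything into a single line.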
Response by poster: What if I don't want the user to actually have to select the text, though, and just want to parse the site using PHP? I'm familiar with streams in C++, Java, and some other languages, but I've been having a hard time finding GOOD help sites on setting them up in PHP. Should I just cave and buy the O'Reilly book?
posted by sourbrew at 10:05 AM on March 8, 2006
Best answer: Here's the form part of yourscript.php, which handles all the GET variables passed in in the URL:
<form method="post" class="myformclass" action="backendscript.php" name="bookmarklet_handler">
<table>
<tr>
<td align="right"><label for="title_field" class="mylabelclass">Headline:</label></td>
<td><input class="mytextboxclass" id="title_field" name="pagetitle" size="48" value="<?php echo htmlspecialchars($_GET["pagetitle"]); ?>" /></td>
</tr>
<tr>
<td colspan="2"><textarea class="mytextareaclass" id="new_post_textarea" name="mainbody" style='width:800px;height:500px;'><?php echo htmlspecialchars("<a href=\"" . $_GET["url"] . "\">" . $_GET["pagetitle"] . "</a>\n\n<blockquote>" . $_GET["quote"] . "</blockquote>"); ?></textarea></td>
</tr>
</table>
</form>
Then all you have left to do is write "backendscript.php", which will handle the POSTed form input after your user has edited the new post to their liking and add it to the database. I assume you know how to handle a POST form in php.
posted by evariste at 10:07 AM on March 8, 2006
Response by poster: I suppose the functionality I envision goes like this:
User sees "phatty" video
User clicks bookmarklet
My PHP file receives URL
Loads source for passed URL
Determines if it's YouTube or Google Video
Parses page to find a specific set of tags
Inputs tags into submission box.
I sort of feel like I'm asking for too much with all of that; a pointer to a good reference manual covering stream parsing would probably be adequate for now.
posted by sourbrew at 10:07 AM on March 8, 2006
Response by poster: Also, that base code for the bookmarklet should help a lot in at least getting things started, thanks.
posted by sourbrew at 10:11 AM on March 8, 2006
sourbrew-streams? You're barking up the wrong tree! You want to use the DOM (document object model) in Javascript to pick out the parts of the document that you want to pass to php, in GET variables. Read up on the DOM to figure out how to select particular tags that you're looking for; there are tons of excellent DOM tutorials on the web if you google. javascript in the bookmarklet + DOM -> escape()'ed GET vars in the URL -> PHP form -> form handling script -> database -> website.
If you google around, you can also find a website that will take your multiline javascript and compress it into a single line with no spaces, suitable for use in a bookmarklet. That way you can write it in a comfortable way in your favorite code editor, and then turn it into a single-line, properly-escaped bookmarklet.
posted by evariste at 10:11 AM on March 8, 2006
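The compression step described above can also be approximated by hand. The following is a rough sketch, not a replacement for a real minifier: toBookmarklet is a hypothetical helper, and it assumes every statement ends with a semicolon and that the source contains no // comments, since joining lines would otherwise change what the code does.

```javascript
// Hypothetical helper: collapse a multi-line script into one bookmarklet line.
// Assumes every statement ends with ';' and the source has no // comments.
function toBookmarklet(source) {
  const oneLine = source
    .split('\n')
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return line.length > 0; })
    .join('');
  // encodeURI turns spaces into %20 (as in "else%20if" in the bookmarklet
  // earlier in the thread) but leaves punctuation such as ; ( ) ' and = alone.
  return 'javascript:' + encodeURI(oneLine);
}
```

For example, passing it a two-line script with indented lines yields a single javascript: URL with the line breaks and indentation removed and the remaining spaces percent-encoded.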
Loads source for passed URL
Determines if it's YouTube or Google Video
Parses page to find a specific set of tags
You can do all this in JavaScript with its regular expressions. Just figure out if the URL (d.location.href in the bookmarklet above) contains youtube, and if it does, use the DOM to locate the embed or object tag, if any, and set a variable to contain the (properly escaped) innerHTML property of the tag. Then you handle it in PHP as above.
posted by evariste at 10:17 AM on March 8, 2006
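The "figure out if the URL contains youtube" check might look something like the sketch below. detectVideoSite is a hypothetical helper, and the host names it matches are assumptions that should be checked against the URLs the two sites actually use. It tests the host part only, so a URL that merely mentions youtube in its query string is not misclassified.

```javascript
// Hypothetical helper: classify a page URL by host name.
// The host patterns are assumptions, not taken from the thread.
function detectVideoSite(pageUrl) {
  // Capture the host part of an http(s) URL, ignoring any port.
  const match = /^https?:\/\/([^\/:?#]+)/i.exec(pageUrl);
  if (!match) return null;
  const host = match[1].toLowerCase();
  if (/(^|\.)youtube\.com$/.test(host)) return 'youtube';
  if (/^video\.google\.com$/.test(host)) return 'googlevideo';
  return null;
}
```

In the bookmarklet this would be called with d.location.href, and the result could be passed along as one more GET variable.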
More specifically, if you know that YouTube always gives the embed tag you're looking for a specific id, you can do this in javascript:
embtag=escape(document.getElementById("the_youtube_id").innerHTML);
and then pass it in the URL to your php script. Look at YouTube's source code on a couple of different pages and see if the pickings are as easy as that. I suspect they are.
posted by evariste at 10:22 AM on March 8, 2006
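A slightly more defensive version of that lookup is sketched here. getEmbedMarkup is a hypothetical name, and the id is still a placeholder. Note that innerHTML returns what is inside an element, so this assumes the id belongs to a container wrapped around the embed or object tag rather than to the tag itself.

```javascript
// Sketch: read a container's markup defensively.
// "doc" is any object with getElementById (the browser's document, or a stub);
// containerId is a placeholder, as in the answer above.
function getEmbedMarkup(doc, containerId) {
  const el = doc.getElementById(containerId);
  // getElementById returns null when no element has that id.
  if (!el) return '';
  return encodeURIComponent(el.innerHTML);
}
```

Embed markup can be long, and some browsers cap URL length, so it may be worth checking the size of the result before appending it to a GET request.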
Anyway, good luck. This should be enough pointers for you to be able to figure out how to do this.
posted by evariste at 10:23 AM on March 8, 2006
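My PHP file receives URL
Loads source for passed URL
$source = file_get_contents( $_GET['url'] ); // needs allow_url_fopen; check that the URL is an http(s) address on an expected host first, or this call can read local files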
You should do the parsing server-side. You could do it client-side and that would be faster, but it's likely not worth it. At some point after you release the bookmarklet, Google Video or YouTube will change HTML format. Then you'll need to change your parser. If the parser is on your server, you can change it for everyone all at once. If it's on each individual user's client browser in the bookmarklet, each user will need to first realize the bookmarklet is broken and then return to your site to get the updated parser.
posted by scottreynen at 11:35 AM on March 8, 2006
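The parsing step itself, once the page source is in a string, could be as simple as the regular-expression sketch below. It is written in JavaScript to match the bookmarklet code in this thread rather than in PHP; similar patterns should carry over to PHP's preg_match. extractEmbedTag is a hypothetical name, and the sketch assumes the page contains a literal object or embed tag, which would need to be confirmed against the real page source.

```javascript
// Sketch of the parsing step; extractEmbedTag is a hypothetical name.
// Assumes the page contains a literal <object>...</object> or <embed ...> tag.
function extractEmbedTag(html) {
  // Prefer a whole <object> element; [\s\S] lets the match span newlines.
  const objectMatch = /<object\b[\s\S]*?<\/object>/i.exec(html);
  if (objectMatch) return objectMatch[0];
  // Otherwise fall back to a standalone <embed> tag.
  const embedMatch = /<embed\b[^>]*>/i.exec(html);
  return embedMatch ? embedMatch[0] : null;
}
```

Because this lives in one place, a change in either site's HTML means updating a single pattern, which is the maintenance advantage described above.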
Response by poster: Thank you sir, I won't be tackling this until tomorrow; I wanted to get my ducks in a row. I'll let you know how it works out.
posted by sourbrew at 11:36 AM on March 8, 2006
Response by poster: scottreynen, yeah i was already planning for that eventuality.
posted by sourbrew at 11:37 AM on March 8, 2006
scottreynen-ah, good point about the format changing.
posted by evariste at 11:44 AM on March 8, 2006
You might want to look at this FPP before you spend too much time on this.
posted by scottreynen at 1:28 PM on March 8, 2006
Response by poster: scottreynen, yeah, I've seen that program before. It doesn't really apply to our site though; we made a conscious decision not to make downloads very easy. With the frequent takedown notices to YouTube, it seemed like it had the potential to call forth nasty-grams from the sky.
posted by sourbrew at 1:58 PM on March 8, 2006
This thread is closed to new comments.
posted by evariste at 9:29 AM on March 8, 2006