[!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"]
[html]
[head]
[meta http-equiv="content-type" content="text/html; charset=windows-1250"]
[meta name="generator" content="PSPad editor, www.pspad.com"]
[title]Sample Document[/title]
[/head]
[body]
[p]
[img src="http://blah.com/sample.jpg"]
[/p]
[p]
Some text is [a href="fjkj.html"]here[/a]
[/p]
[/body]
[/html]
All I want out of that thing is:
bodyText = document.getElementsByTagName('body')[0].textContent;
<html>
<head>
<title>replace js</title>
<script>
function replace(){
  extractedHTML = document.getElementById('favouritecolour').innerHTML;
  // use the DOM to get the contents of that div
  reg = new RegExp("My favourite colour is [^.]+.");
  /* any colour will match because the regex is for "any string of characters up to the next period". */
  replacementHTML = extractedHTML.replace(reg, "My favourite colour is orange");
  // whatever colour they put, change it to orange
  document.getElementById('favouritecolour').innerHTML = replacementHTML;
  // put the tweaked HTML back in to the div
}
</script>
</head>
<body>
<p> blah blah blah</p>
<div id="favouritecolour"> My favourite colour is blue. </div>
<p> <a href="javascript:replace()">replace</a></p>
<p> blah blah blah</p>
</body>
</html>
replaceSearchTerms('ph00dz', 'dumkopf');
// this code came from http://www.nsftools.com/misc/SearchAndHighlight.htm
// it was originally a highlighter... but repurposed for this thing!
function doReplace(bodyText, searchTerm, replaceText)
{
// find all occurrences of the search term in the given text,
// and add some "highlight" tags to them (we're not using a
// regular expression search, because we want to filter out
// matches that occur within HTML tags and script blocks, so
// we have to do a little extra validation)
var newText = "";
var i = -1;
var lcSearchTerm = searchTerm.toLowerCase();
var lcBodyText = bodyText.toLowerCase();
while (bodyText.length > 0)
{
i = lcBodyText.indexOf(lcSearchTerm, i+1);
if (i < 0)
{
newText += bodyText;
bodyText = "";
}
else
{
// skip anything inside an HTML tag
if (bodyText.lastIndexOf(">", i) >= bodyText.lastIndexOf("<", i))
{
// skip anything inside a <script> block
if (lcBodyText.lastIndexOf("/script>", i) >= lcBodyText.lastIndexOf("<script", i))
{
newText += bodyText.substring(0, i) + replaceText;
bodyText = bodyText.substr(i + searchTerm.length); // bodyText.substr(i, searchTerm.length)
lcBodyText = bodyText.toLowerCase();
i = -1;
} // end if
} // end if
} // end else
} // end while
return newText;
} // end function
function replaceSearchTerms(searchText, replaceText)
{
var searchArray = [searchText];
var bodyText = content.document.body.innerHTML;
for (var i = 0; i < searchArray.length; i++)
{
bodyText = doReplace(bodyText, searchArray[i], replaceText);
}
content.document.body.innerHTML = bodyText;
return true;
}
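For comparison, here is a minimal sketch of the same idea (replace a term only where it appears outside HTML tags) written as a pure string function, so it can be tried without a browser. The name replaceOutsideTags is hypothetical and not from the code above. It assumes tags never contain a literal ">" inside an attribute value, and unlike doReplace it does not skip the contents of script blocks.

```javascript
// Hypothetical helper, not from the thread: replace searchTerm only in text
// that sits outside HTML tags. Matching is case-insensitive, like doReplace.
function replaceOutsideTags(html, searchTerm, replaceText) {
  // Escape regex metacharacters so the search term is matched literally.
  var escaped = searchTerm.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  var termPattern = new RegExp(escaped, "gi");
  // The capturing group makes split() keep the tags in the result array,
  // so odd indexes hold tags and even indexes hold the text between them.
  var parts = html.split(/(<[^>]*>)/);
  for (var i = 0; i < parts.length; i += 2) {
    // A function replacement avoids special handling of "$" in replaceText.
    parts[i] = parts[i].replace(termPattern, function () {
      return replaceText;
    });
  }
  return parts.join("");
}
```

Called as replaceOutsideTags(document.body.innerHTML, 'ph00dz', 'dumkopf'), it leaves attribute values such as href targets alone and only changes the visible text.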
javascript:for(var i=0; !document.childNodes[i].innerHTML; ++i); document.body.innerHTML=document.childNodes[i].innerHTML.replace(/<.*?>/gm, '');
mrbill@ohno:~> links -dump test.html
Some text is here
mrbill@ohno:~> lynx -dump test.html
[sample.jpg]
Some text is [1]here
References
1. file://localhost/disk/home/mrbill/fjkj.html
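The tag-stripping regex in the bookmarklet can also be tried on a plain string. This is only a rough sketch: stripTags is a made-up name, and it uses [^>]* instead of .*? because "." does not match newlines, so a tag broken across lines would otherwise be missed. It will still be fooled by a ">" inside an attribute value, and it does not decode entities like &amp;.

```javascript
// Hypothetical helper, not from the thread: crude regex-based tag stripper.
function stripTags(html) {
  return html
    .replace(/<[^>]*>/g, "") // drop anything that looks like a tag
    .replace(/\s+/g, " ")    // collapse runs of whitespace and newlines
    .trim();                 // remove leading and trailing whitespace
}

// The body of the sample document from the question, with real angle brackets.
var sample =
  '<p>\n<img src="http://blah.com/sample.jpg">\n</p>\n' +
  '<p>\nSome text is <a href="fjkj.html">here</a>\n</p>';

console.log(stripTags(sample)); // prints "Some text is here"
```

That matches what links -dump prints for the same file.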
http://www.w3schools.com/js/js_examples_3.asp
Fact of the matter is, it's the best tool for doing this, far better and more reliable than regexps!
getElementsByTagName, for example, would be one way to find the title. There may even be convenience API these days where you can ask just for the page title.
Once you have an element, you can read its innerText or (more compatible) innerHTML. Both are properties, not methods, so there are no parentheses.
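A small sketch of the DOM route, assuming a browser that supports the standard textContent property. The function name extractPageText is made up for illustration; it takes the document as a parameter so it can be pointed at any page. For the title alone, document.title is the convenience shortcut.

```javascript
// Hypothetical helper, not from the thread: pull the title and the visible
// body text out of a document using only getElementsByTagName.
function extractPageText(doc) {
  var titles = doc.getElementsByTagName("title");
  var body = doc.getElementsByTagName("body")[0];
  return {
    // Fall back to an empty string when the element is missing.
    title: titles.length > 0 ? titles[0].textContent : "",
    text: body ? body.textContent : ""
  };
}
```

In a browser you would call extractPageText(document) and read the .title and .text fields of the result.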
Another good site when shit inevitably breaks in some browsers. :)
posted by symphonik at 7:26 PM on March 26, 2006