One problem is enough
April 28, 2010 10:41 PM

Can somebody point me to the documentation for Microsoft's htmlfile ActiveXObject?

I need a little Windows script that uploads all the PDF files in the current folder to a web/FTP server, then regenerates an HTML page containing links to the uploaded files.

I'm writing it in JScript, and I would like it to end up as a simple, single, double-clickable .js file, so Windows Script Host will be what runs it, not a browser.

It's all done except for the part that reads in the existing HTML file, replaces the existing list of links with the regenerated list, and writes out the new HTML file. Assorted searches are giving me the hint that the first line I should be writing for this is

var doc = new ActiveXObject("htmlfile");

but I have no clue what to do with the resulting doc object. Can I populate it from a text stream containing HTML? How can I fiddle with its bits? How can I write the result out to a new text stream? If I do it this way, will any of the existing HTML indenting and/or commenting make it through to the output file?

So I'm looking for the official Microsoft documentation on the htmlfile object, and utterly failing to find it on MSDN. Can some kind soul link it for me? I so don't want to do this with regexes.
posted by flabdablet to Computers & Internet (4 answers total)
 
Would it not be easier to deal with the file as XML? Since you're controlling the file you can make sure it's compliant. The MSXML parser is well documented.
posted by DWRoelands at 11:09 PM on April 28, 2010


Best answer: Digging through a few links, this seems to be what you want. I'm assuming that based on this link (see post #3 in the thread).
posted by jangie at 11:19 PM on April 28, 2010


Response by poster: Based on this, I saved the following as foo.js and tried it out:

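// Read original.html into an htmlfile document, then write the parsed result to out.html.
var doc = new ActiveXObject("htmlfile");
var fso = new ActiveXObject("Scripting.FileSystemObject");
var html = fso.OpenTextFile("original.html", 1, false); // 1 = ForReading; do not create a missing file
doc.open();
doc.write(html.ReadAll());
doc.close();
html.Close();
html = fso.CreateTextFile("out.html", true); // true = overwrite an existing out.html
html.Write(doc.documentElement.outerHTML);
html.Close();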

It works, except that it removes most whitespace and re-renders all tags in uppercase; the result would be rather unpleasant for our web guy to work on by hand. It also loses the initial <!DOCTYPE ... > header.
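If the htmlfile route were kept, the lost DOCTYPE at least could be carried over by hand from the original text. The following is only a sketch under assumptions: the helper names extractDoctype and restoreDoctype are made up for illustration, and the pattern assumes the declaration is the first thing in the file and contains no ">" inside it. It does nothing about the collapsed whitespace or the uppercase tags.

```javascript
// Hypothetical helper: return the leading <!DOCTYPE ...> declaration, or "" if there is none.
// Assumes the declaration comes first in the file and has no ">" inside it.
function extractDoctype(sourceHtml) {
    var match = /^\s*(<!DOCTYPE[^>]*>)/i.exec(sourceHtml);
    return match ? match[1] : "";
}

// Hypothetical helper: put the original DOCTYPE back in front of re-serialized markup.
function restoreDoctype(sourceHtml, serializedHtml) {
    var doctype = extractDoctype(sourceHtml);
    return doctype ? doctype + "\r\n" + serializedHtml : serializedHtml;
}
```

In the script above, the text returned by ReadAll() would be kept in a variable and passed as sourceHtml, with doc.documentElement.outerHTML as serializedHtml.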

The documentElement property is also not listed in the IHTMLDocument2 interface reference that jangie found, so I'm still basically working in the dark with this thing. I hate working in the dark.

So for all those reasons, I guess I'll be doing this the wrong way after all, with text search and replace. Sigh.
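For what it's worth, the text search-and-replace route can avoid regexes on the HTML itself if the link list sits between two marker comments. This is a minimal sketch, not a tested solution for the real page: the marker strings and the function names (escapeHtml, buildLinkList, replaceBetweenMarkers) are assumptions made up for illustration, and the markers would have to be added to the HTML file by hand once. Everything outside the markers, including indentation, comments and the DOCTYPE, passes through untouched.

```javascript
// Hypothetical marker comments that bracket the generated list in the HTML file.
var BEGIN_MARKER = "<!-- BEGIN PDF LINKS -->";
var END_MARKER = "<!-- END PDF LINKS -->";

// Escape the characters that are special in HTML text.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Build one <li> per file name, each on its own line with the given indent.
function buildLinkList(fileNames, indent) {
    var lines = [];
    for (var i = 0; i < fileNames.length; i++) {
        lines.push(indent + '<li><a href="' + encodeURIComponent(fileNames[i]) + '">' +
            escapeHtml(fileNames[i]) + '</a></li>');
    }
    return lines.join("\r\n");
}

// Replace everything between the two markers and leave the rest of the file alone.
function replaceBetweenMarkers(html, newContent) {
    var start = html.indexOf(BEGIN_MARKER);
    var end = html.indexOf(END_MARKER);
    if (start < 0 || end < start) {
        throw new Error("Link markers not found in the HTML file");
    }
    start += BEGIN_MARKER.length;
    return html.substring(0, start) + "\r\n" + newContent + "\r\n" + html.substring(end);
}
```

Under Windows Script Host the input string would come from ReadAll() on the existing page and the result would go to Write() on the new file, as in the script above.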

Embedded-systems programmers should not work on web stuff. It hurts our brains.
posted by flabdablet at 3:40 AM on April 29, 2010


documentElement is from the IHTMLDocument3 interface. When you're accessing a COM object through scripting (using IDispatch), you can use any interface that the object supports.

But you're probably right that this is a cumbersome approach for modifying HTML documents in-place.

You can't use Python and Beautiful Soup, or something like that?
posted by zixyer at 9:33 AM on April 29, 2010

