Rename HTML files with title tag
February 13, 2006 10:11 PM

I have a large number of HTML files with numeric names (001.html, 2990.html, etc.). I would like to have each file renamed with text drawn from each file's HTML title tag.

I have searched high and low for a way to do this, but have not been successful. Anyone know of any scripts or apps that can do this?
posted by tranquileye to technology (6 comments total)
I'm not very familiar with the program, but I'm willing to bet that you can somehow get AutoIt to do it for you.
posted by charmston at 10:18 PM on February 13, 2006


Because I like a challenge... here's a Perl script to do it (tries to avoid shell metacharacters and duplicates):

#!/usr/bin/perl -w

use strict;

use File::Find;
use File::Spec::Functions qw/rel2abs :DEFAULT/;
use HTML::TreeBuilder;

if (@ARGV != 2) {
    print STDERR "usage: ./html-rename <in dir> <out dir>\n";
    exit 1;
}

my $in = rel2abs( shift );
my $out = rel2abs( shift );
find( \&wanted, $in );

# Called by File::Find for each entry: $_ holds the file name and the
# current directory is the one that contains it.
sub wanted {
    my ($fname, $tree, $title);
    return unless /\.html?$/;

    $tree = HTML::TreeBuilder->new_from_file( $_ );
    if ($title = $tree->look_down( '_tag', 'title' )) {
        $fname = $title->as_text . '.html';
        # strip shell metacharacters and path separators
        $fname =~ s/[\|&`\$\*\?~\/]//g;
    } else {
        # no <title> element: keep the original name
        $fname = $_;
    }
    $tree->delete;    # free the parse tree on every path

    if (-e catfile( $out, $fname )) {
        print STDERR "DUPLICATE: $fname\n";
        return;
    }

    # hard link into the output directory; the original file stays put
    link $_, catfile( $out, $fname )
        or print STDERR "FAILED: $_ -> $fname: $!\n";
}

posted by sbutler at 11:00 PM on February 13, 2006


What OS are you using?
posted by normy at 10:05 AM on February 14, 2006


BASH oneliner:

for x in `ls *.html`; do foo=`grep '' $x`; foo2=`echo $foo|sed 's/<title>//'|sed 's###'`; mv $x $foo2.html; echo $foo2;done

Note that this has no error checking and will fail on pages where the title has funky characters like spaces in it.
posted by Kickstart70 at 11:56 AM on February 14, 2006
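
The one-liner above seems to have lost its angle-bracketed patterns when it was posted, so here is a rough sketch of the same idea that also copes with spaces in titles. It is not the poster's original code. The function names html_title and rename_by_title are made up for illustration, and it assumes each file has a lowercase <title>...</title> pair on a single line.

```shell
#!/bin/sh
# Sketch: rename each *.html file in a directory after its <title> text.
# Assumption: <title> and </title> are lowercase and on the same line.

# Hypothetical helper: print the title text of file $1 (empty if none found).
# Slashes become dashes because a file name cannot contain a slash.
html_title() {
    sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p' "$1" | head -n 1 | tr '/' '-'
}

# Hypothetical helper: rename every *.html file in directory $1.
rename_by_title() {
    dir=$1
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue          # the glob matched nothing
        title=$(html_title "$f")
        [ -n "$title" ] || continue      # no title found: leave the file alone
        target="$dir/$title.html"
        if [ -e "$target" ]; then
            echo "skipping $f: $target already exists" >&2
            continue
        fi
        mv -- "$f" "$target"             # quoted, so spaces in titles are fine
    done
}
```

Running rename_by_title /path/to/files after defining the functions renames in place, so try it on a copy of the directory first. Files with no title, or whose title collides with an existing name, are left untouched.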


Dammit. Metafilter ruined my code.

check here instead.
posted by Kickstart70 at 11:57 AM on February 14, 2006


Thanks folks! Excellent help.
posted by tranquileye at 10:14 AM on February 16, 2006

