Using awk to export many files from one text file
February 12, 2010 10:50 AM
How can I use awk (or something better) to cut a big file into groups of files, based on the occurrence of a pattern?
I have a structure of folders on a server, that works out like this:
Client A
--Job 1
--Job 4
--Job 5
Client B
--Job 2
--Job 3
And so on, for many different clients and jobs. I wish to automate pulling a directory listing (already done via find /start/here/ -type d -maxdepth 2), then output a separate document for each client containing the list of jobs stored within it.
I've gotten as far as redefining my awk delimiter from space to / (this is on OS X), and determining that the client name appears in $5, but I don't yet know enough awk to craft "gather all lines where $5 is the same, and put them into their own text file".
Or... am I using the wrong tool for the job? Is there a better way to do this?
~/scratch/fgf $ ls | while read d ; do seq 10 | while read n ; do mkdir -p "$d/Job$n" ; done ; done
~/scratch/fgf $ ls *
ClientA:
Job1 Job10 Job2 Job3 Job4 Job5 Job6 Job7 Job8 Job9
ClientB:
Job1 Job10 Job2 Job3 Job4 Job5 Job6 Job7 Job8 Job9
~/scratch/fgf $ ls | while read d; do ls $d > "joblist$d" ; done
~/scratch/fgf $ ls
ClientA ClientB joblistClientA joblistClientB
~/scratch/fgf $ cat joblistClientA
Job1
Job10
Job2
Job3
Job4
Job5
Job6
Job7
Job8
Job9
~/scratch/fgf $ cat joblistClientB
Job1
Job10
Job2
Job3
Job4
Job5
Job6
Job7
Job8
Job9
posted by doteatop at 11:03 AM on February 12, 2010
Try something like:
print "whatever data" >> ("/output/" $5 ".txt")
You'll get text files named by client.
My syntax here might be shaky. It's been a few years.
posted by DarkForest at 11:04 AM on February 12, 2010
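To fill in DarkForest's idea, here is a minimal sketch of the whole pipeline. Assumptions: split_by_client is a hypothetical helper name, and rather than hard-coding $5 it counts the /-separated fields in the start path so the client is always the next field. It sorts the find output in the C locale and closes each file when the client changes, because some awk implementations cap how many output files can be open at once and the question mentions many clients. Clients with no job folders get no file.

```shell
# split_by_client is a hypothetical helper name.
# Usage: split_by_client START_DIR OUT_DIR
split_by_client() {
  start=${1%/}                   # drop one trailing slash so field numbers are predictable
  out=$2
  # Number of "/"-separated fields in START_DIR; the client is the next field.
  n=$(printf '%s\n' "$start" | awk -F/ '{ print NF }')
  mkdir -p "$out"
  # -mindepth 2 keeps only the job folders (START_DIR/client/job).
  # Sorting in the C locale keeps each client's lines together.
  find "$start" -mindepth 2 -maxdepth 2 -type d | LC_ALL=C sort |
    awk -F/ -v n="$n" -v out="$out" '
      {
        file = out "/" $(n + 1) ".txt"
        if (file != prev) {          # new client: close the previous file
          if (prev != "") close(prev)
          prev = file
        }
        print $(n + 2) > file        # ">" truncates on first open, then appends
      }'
}
```

For example, split_by_client /start/here/ ~/joblists would write one text file per client (that has at least one job) into ~/joblists.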
I think this will do what you need. If not, we may need more details:
for i in Client*;do find $i -type d -maxdepth 2 >`basename $i`.txt;done
posted by chrisamiller at 11:05 AM on February 12, 2010
Something like:
DIR=/start/here
while IFS= read -r client
do
  name=$(basename "$client")
  echo "$name" > "$client/$name.log"
  find "$client" -mindepth 1 -maxdepth 1 -type d -exec echo "--{}" \; >> "$client/$name.log"
done < <(find "$DIR" -mindepth 1 -maxdepth 1 -type d)
posted by rhizome at 11:11 AM on February 12, 2010
You could do something like this. If there is some pattern that distinguishes the Client Lines from the Job Lines, you will want to match for those.
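BEGIN { FS = "/" }   # split fields on slashes; other setup goes here, if needed
{
    if ( /pattern that matches Client lines/ ) {
        outputfile = $5 ".txt"     # start a new file named after the client
    }
    if ( /pattern that matches Job lines/ ) {
        print $0 >> outputfile     # append the job line to the current client file
    }
}
END { }              # clean-up at the end, if needed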
posted by jefbla at 11:27 AM on February 12, 2010
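jefbla's template can be made concrete for the find output in the question. This is a sketch, not jefbla's exact code: jobs_per_client is a hypothetical name, and instead of regular expressions it tells client lines from job lines by their field count relative to the start directory. It relies on find printing each folder before its contents (the default unless -depth is given), so each client line arrives before that client's jobs, and it creates the file on the client line so a client with no jobs still gets an empty file.

```shell
# jobs_per_client is a hypothetical helper name.
# Usage: jobs_per_client START_DIR OUT_DIR
jobs_per_client() {
  start=${1%/}                   # drop one trailing slash so field counts are predictable
  out=$2
  depth=$(printf '%s\n' "$start" | awk -F/ '{ print NF }')
  mkdir -p "$out"
  find "$start" -mindepth 1 -maxdepth 2 -type d |
    awk -F/ -v d="$depth" -v out="$out" '
      NF == d + 1 {                      # client line: START_DIR/client
        if (outputfile != "") close(outputfile)
        outputfile = out "/" $NF ".txt"
        printf "%s", "" > outputfile     # create or truncate, even if no jobs follow
        next
      }
      NF == d + 2 {                      # job line: START_DIR/client/job
        print $NF > outputfile
      }'
}
```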
Response by poster: Whoa, cool. Lots of approaches for me to interpret; this is great, thanks.
Chrisamiller: I'm dangerously close to getting yours to work, but one hangup. The Client name often contains spaces. I'm getting exactly the output I need if my client is "blackstapler", but "Coffee mug" yields:
find: /root/directory/Coffee: No such file or directory
find: mug: No such file or directory
I must be one pair of properly spaced quotes away from making it work, but I haven't determined where they go.....
posted by Steve3 at 11:45 AM on February 12, 2010
Best answer: Try this:
for i in Client*;do find "$i" -type d -maxdepth 2 >`basename "$i"`.txt;done
It's good practice to quote bash variables anyway (though I tend to be lazy and only do it when I have to).
posted by chrisamiller at 12:06 PM on February 12, 2010 [1 favorite]
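The reason the quotes fix it: without them, the shell splits $i on whitespace before find ever runs, so Coffee mug reaches find as the two arguments Coffee and mug, which matches the two error lines above. A tiny demonstration, using a hypothetical count_args function that just reports how many arguments it received:

```shell
# count_args is a hypothetical helper: it prints how many arguments it got.
count_args() { echo "$#"; }

i="Coffee mug"
count_args $i     # unquoted: the shell splits on the space, prints 2
count_args "$i"   # quoted: passed as a single argument, prints 1
```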
Response by poster: That works perfectly. Thanks, chrisamiller.
posted by Steve3 at 12:28 PM on February 12, 2010
"ls | while read d" is analogous to the Useless Use of cat: there is no need to run a whole subprocess and pipe here, since the shell is perfectly capable of expanding globs on its own. "for d in *" is the right way. Actually, in this case you should use "for d in */" so that the glob matches only directories and not files.
posted by Rhomboid at 4:41 PM on February 12, 2010
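Rhomboid's suggestion applied to doteatop's joblist loop, as a sketch: write_joblists is a hypothetical name, and it assumes it is run from the folder that holds the client folders. Quoting "$client" also keeps names with spaces, like Coffee mug, in one piece.

```shell
# write_joblists is a hypothetical helper; run it from the folder holding the client folders.
write_joblists() {
  for d in */; do
    [ -d "$d" ] || continue      # with no matches, the glob stays the literal "*/"
    client=${d%/}                # strip the trailing slash that the */ glob adds
    ls "$client" > "joblist$client"
  done
}
```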
posted by phrontist at 10:59 AM on February 12, 2010