How do I print an old DOS-era PostScript file (font encoding: codepage 850) containing German umlauts?
May 16, 2009 9:26 AM

MS-DOS/legacy file format filter: How do I convert an old PostScript file (font encoding: codepage 850) containing German umlauts to PDF? How do I print such a file on modern printers?

Hi everybody,

Basically, I need to print or convert to PDF several really old PostScript files that were created by a DOS application (1) and contain German umlauts. The font encoding of the files is such that, e.g., ü is stored as character number \201 (an octal escape, i.e. byte 0x81 in codepage 850).
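A PostScript string escape such as \201 is an octal byte value. As a quick sanity check (a sketch, not part of the original thread; the helper name decode_ps_octal is made up), Python's built-in cp850 codec confirms that byte 0o201 (0x81) is ü in codepage 850, while the same byte is an unprintable control code in ISO 8859-1:

```python
def decode_ps_octal(escape: str, codepage: str) -> str:
    """Decode a PostScript octal escape like '\\201' with a single-byte codepage."""
    code = int(escape.lstrip("\\"), 8)  # '\201' -> 0o201 == 129 == 0x81
    return bytes([code]).decode(codepage)

print(decode_ps_octal("\\201", "cp850"))          # ü
print(repr(decode_ps_octal("\\201", "latin-1")))  # '\x81', a C1 control code
```

So the byte itself is fine; what matters is which encoding the font that draws it uses.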

If I open these files with Ghostview or GSView, the umlauts are not displayed at all. But I guess if I could get hold of an old MS-DOS-era printer (one that understands codepage 850), the files could be printed correctly with a little utility, Printfile, which can send PostScript files directly to a printer.

Since such a printer is hard to come by nowadays, I tried another approach:
I sent the files with Printfile to a virtual Adobe PostScript printer, which in turn should generate a new PostScript file with an up-to-date character set. However, the result was the same: no umlauts were displayed.

Does anybody know how to solve this problem? From my point of view, the best solution would be to convert the PostScript files to PDF. But if that is not possible, I would still be very!!! happy if someone could tell me how to at least print these files.

Thanks in advance,

pu9iad





(1) I don't recall the application's name, and I don't have copies of the documents in any other file format.
posted by pu9iad to Computers & Internet (13 answers total)
 
What happens under Linux if you run ps2pdf?
posted by yoyo_nyc at 9:36 AM on May 16, 2009


Photoshop should be able to load it and let you convert it into a high-res raster image, if that's useful to you.
posted by glider at 9:46 AM on May 16, 2009


Could it be that the fonts used do not have umlauts? It should be possible to find out which font is used by opening the .ps file in a text editor like Notepad. Postscript is more or less human readable, so you might also be able to extract the content that way.
posted by dhoe at 10:54 AM on May 16, 2009
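Checking the fonts can also be scripted instead of eyeballed. A minimal sketch in Python (the regex and the name fonts_used are illustrative assumptions): it only catches fonts selected with a literal name followed by findfont, so fonts chosen indirectly, e.g. through re-encoded copies defined in a prologue, will not show up.

```python
import re

# PostScript programs commonly select a font with a literal name followed by
# the findfont operator, e.g. "/Courier findfont 10 scalefont setfont".
FINDFONT = re.compile(r"/([A-Za-z0-9.-]+)\s+findfont\b")

def fonts_used(ps_text: str) -> set:
    """Return the font names passed directly to findfont in ps_text."""
    return set(FINDFONT.findall(ps_text))

sample = "/Courier findfont 10 scalefont setfont (f\\201r) show"
print(sorted(fonts_used(sample)))  # ['Courier']
```

On a real file, read it with `open("file.ps", encoding="latin-1").read()` so that any raw high bytes do not cause decode errors.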


Hi,

Thanks for the many answers so far. However, the problem is not solved yet.

@yoyo_nyc: No umlauts are displayed there either. I half expected that; after all, as far as I know, ps2pdf is also based on Ghostscript.

@glider: That is an interesting idea. However, the demo version I just downloaded only displays some weird symbols instead of the umlauts.

@dhoe: The content can already be viewed and printed, but umlauts and other characters frequently used in German are missing.

As far as the fonts go, I don't think they are the cause of this problem:

% Do the foreign encoding now
/InitForeign % P:nothing, R:nothing
{
/Courier-SW /Courier MakeForeignFont
/Courier-Bold-SW /Courier-Bold MakeForeignFont
/Courier-Oblique-SW /Courier-Oblique MakeForeignFont
/Courier-BoldOblique-SW /Courier-BoldOblique MakeForeignFont
/Times-Roman-SW /Times-Roman MakeForeignFont
/Times-Bold-SW /Times-Bold MakeForeignFont
/Times-Italic-SW /Times-Italic MakeForeignFont
/Times-BoldItalic-SW /Times-BoldItalic MakeForeignFont
/Helvetica-SW /Helvetica MakeForeignFont
/Helvetica-Bold-SW /Helvetica-Bold MakeForeignFont
/Helvetica-Oblique-SW /Helvetica-Oblique MakeForeignFont
/Helvetica-BoldOblique-SW /Helvetica-BoldOblique MakeForeignFont
} bd

Maybe the PostScript version information helps:
%!PS-Adobe-1.0
% Postscript Initialization Prologue
% Copyright SoftWorks Development, 1990
% All Rights Reserved
% Version 1.00

Personally, I strongly suspect it has something to do with the character numbers.
In the DOS file, the word für is encoded as follows:
f\201r

But if I create a text file containing that word and let the Ghostscript driver generate a PostScript file, the same word is encoded as follows:
f\374r
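Those two numbers are the same letter in two different single-byte encodings: \201 (0x81) is ü in codepage 850, and \374 (0xFC) is ü in ISO 8859-1 (Latin-1), presumably the encoding of the fonts in the Ghostscript-generated file. A short Python sketch (the helper name octal_escape is made up for illustration) lists both escapes for all German special characters:

```python
GERMAN = "äöüÄÖÜß"

def octal_escape(ch: str, encoding: str) -> str:
    """Return the PostScript octal escape for ch in a single-byte encoding."""
    (byte,) = ch.encode(encoding)  # exactly one byte per character
    return "\\%03o" % byte         # e.g. 129 -> '\201'

print("char  cp850  latin-1")
for ch in GERMAN:
    print(f"{ch}     {octal_escape(ch, 'cp850')}   {octal_escape(ch, 'latin-1')}")
```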

I'd really appreciate any further ideas.
posted by pu9iad at 11:40 AM on May 16, 2009


I haven't worked a whole lot with PostScript (I tend to stay on the PDF side of things), but couldn't you do a search and replace of "\201" with "ü" in the ps file?
posted by niles at 12:02 PM on May 16, 2009


PostScript fonts, certainly ones of that era, didn't use an encoding that matched any standard codepage. Instead they used Adobe Standard Encoding, and you pretty much had to make your own encoding vector, which is what the MakeForeignFont routine would be doing. So unless the PS code is doing something heinous (which, sadly, was likely: early code assumed a lot about the interpreter it was running on), it should distill to correct PDF. Clearly, though, it doesn't.
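For reference, the conventional way to build such an encoding vector is to copy a font's dictionary, replace its /Encoding with a modified copy, and register the result under a new name with definefont. The Python sketch below emits that common PostScript idiom for the German characters at their codepage 850 positions. It is an assumption about what MakeForeignFont does (its definition is not shown in the thread), and the function name reencode_ps is hypothetical.

```python
# Adobe glyph names for the German special characters.
GLYPH_NAMES = {
    "ä": "adieresis", "ö": "odieresis", "ü": "udieresis",
    "Ä": "Adieresis", "Ö": "Odieresis", "Ü": "Udieresis",
    "ß": "germandbls",
}

def reencode_ps(base: str, new: str, codepage: str = "cp850") -> str:
    """Emit PostScript defining font `new` as a copy of `base` whose encoding
    places the German glyphs at their code positions in `codepage`."""
    puts = [
        f"  dup {ch.encode(codepage)[0]} /{name} put"
        for ch, name in GLYPH_NAMES.items()
    ]
    return "\n".join([
        f"/{base} findfont",
        "dup length dict begin",                                # copy the font dict
        "  {1 index /FID ne {def} {pop pop} ifelse} forall",  # ...except its FID
        "  /Encoding Encoding 256 array copy def",             # private encoding
        "  Encoding",
        *puts,                                                  # patch positions
        "  pop",
        "  currentdict",
        "end",
        f"/{new} exch definefont pop",                          # register new font
    ])

print(reencode_ps("Courier", "Courier-SW"))
```

Text shown with the resulting font can keep the codepage 850 byte values; it is only the base font (with its StandardEncoding) that cannot.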

In a past life, I was a PostScript maven in a multilingual dictionary company, so if you can share any files, I'll see what I can do.
posted by scruss at 12:19 PM on May 16, 2009


Yeah, hand editing the postscript is probably going to be the way to go here. Then convert to pdf.
posted by pharm at 12:28 PM on May 16, 2009


Since you have access to a Linux box, this is very easy:

% sed 's/\\201/\\374/g' < evil.ps > good.ps

should do the job.
posted by b1tr0t at 12:44 PM on May 16, 2009


And to be pedantic, you don't want to hand edit. Instead, you want to build up a series of sed scripts that do the conversion for you. Actual hand editing is very error-prone.

I would keep a backup of the original file, as well as each intermediate file, until I get the output I want.
posted by b1tr0t at 12:46 PM on May 16, 2009
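Following that advice, the sed script can be generated rather than typed. A Python sketch, assuming the fonts that finally render the text are Latin-1 encoded (as in the Ghostscript-generated comparison file above); the name sed_script is made up. Save its output as, e.g., cp850.sed and apply it with `sed -f cp850.sed evil.ps > good.ps`.

```python
GERMAN = "äöüÄÖÜß"

def sed_script(chars: str = GERMAN) -> str:
    """Build one sed substitution per character, mapping its codepage 850
    octal escape to the ISO 8859-1 (Latin-1) one."""
    lines = []
    for ch in chars:
        src = ch.encode("cp850")[0]
        dst = ch.encode("latin-1")[0]
        # "\\\\" in this Python literal is "\\" in the script, which sed
        # reads as one literal backslash on both sides of s///.
        lines.append("s/\\\\%03o/\\\\%03o/g" % (src, dst))
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(sed_script(), end="")
```

None of the target codes is also a source code, so the order of the substitutions does not matter.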


Hello again,

Running a search-and-replace script is definitely a good idea, but I'd prefer a simpler and more reliable solution. Hypothetically speaking, what would happen if a line contains the character sequence \201, but there \201 does not stand for ü and was typed literally into the document?

@scruss: Thank you for that offer. That is very kind. I pasted a file stripped of all but one line of the text to http://pastebin.com/m7a8eeb0b. The document should read: "Diese heißt nämlich überhaupt nicht". If it does not take too much of your time, it would be fantastic if you could take a quick look.
posted by pu9iad at 2:01 PM on May 16, 2009
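Inside a PostScript string, a literally typed backslash has to be written as \\ (assuming the DOS application escaped it correctly), so typed text "\201" would appear in the file as \\201. A plain sed substitution still matches the last four characters of \\201, though. The Python sketch below reads escapes left to right so that an escaped backslash is consumed as a unit; the function name is hypothetical. It assumes the umlauts are stored as octal escapes (not raw bytes or hex strings), does not distinguish strings from comments, and only helps if the fonts finally used are Latin-1 encoded. If the file's own re-encoded (-SW) fonts are in effect, the codes must stay in codepage 850.

```python
import re

# One PostScript string escape: 1-3 octal digits, or any other single
# character after the backslash (this includes "\\", an escaped backslash).
ESCAPE = re.compile(r"\\([0-7]{1,3}|.)", re.DOTALL)

def cp850_escapes_to_latin1(ps_text: str) -> str:
    """Rewrite octal escapes from codepage 850 to ISO 8859-1 codes."""
    def fix(m):
        body = m.group(1)
        if body[0] not in "01234567":
            return m.group(0)      # \\, \(, \n ... stay as they are
        code = int(body, 8)
        if code < 0x80 or code > 0xFF:
            return m.group(0)      # ASCII (same in both) or out of byte range
        ch = bytes([code]).decode("cp850")
        try:
            (new,) = ch.encode("latin-1")
        except UnicodeEncodeError:
            return m.group(0)      # no Latin-1 equivalent, e.g. box drawing
        return "\\%03o" % new
    return ESCAPE.sub(fix, ps_text)

print(cp850_escapes_to_latin1(r"(f\201r) show"))       # (f\374r) show
print(cp850_escapes_to_latin1(r"(typed \\201) show"))  # (typed \\201) show
```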


Thanks, pu9iad - that brought back a bunch of memories -- all unpleasant -- of the dreadful ad-hoc typesetting systems we used to write to run on PS RIPs, as they had faster CPUs than the workstations of the day.

The example file wouldn't print as expected when sent to a LaserJet 4M+, which is only slightly newer than your application. I do have a more period-appropriate printer, but it's a balky thing, can only be manually fed, takes about 10 minutes a page, and I'm not sure if I still have a computer with a parallel port for it to talk to.

Going through the encoding vector in the file, it looks like it's not doing something right: the code intended for ß really is encoded to Æ. The files may be broken as designed, and may never have printed correctly on anything.

How many of these files are there, and how important are they? This may take some time, and $, to fix. Stop by comp.lang.postscript and see if anyone takes on your plight. Aandi Inston would be able to sell you software that would help, but it's expensive. I could find people to help; mefimail me.
posted by scruss at 10:13 AM on May 17, 2009


Hang on, scratch that bit about font encoding. It does look like it's setting up the vector correctly. It's clearly not using it correctly, though ...
posted by scruss at 10:28 AM on May 17, 2009
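One explanation consistent with both symptoms (umlauts missing, ß printed as Æ) is that the strings end up being shown with a font that still has Adobe StandardEncoding rather than the re-encoded -SW copy. In StandardEncoding, codes \200 to \240 are unassigned (.notdef, which normally prints nothing) and \341 is /AE. The Python sketch below checks the codepage 850 positions of the German characters against a hand-copied subset of StandardEncoding (only the entries needed here). It is a diagnostic hypothesis, not a confirmed analysis of the pasted file.

```python
def standard_encoding_glyph(code: int) -> str:
    """Glyph name at `code` in a hand-copied subset of Adobe StandardEncoding."""
    if 0o200 <= code <= 0o240:
        return ".notdef"  # this whole range is unassigned in StandardEncoding
    return {0o341: "AE", 0o373: "germandbls"}.get(code, "(not in this subset)")

for ch in "äöüÄÖÜß":
    code = ch.encode("cp850")[0]
    print(ch, "\\%03o" % code, standard_encoding_glyph(code))
```

All six umlauts land in the unassigned range, and ß (\341 in codepage 850) lands on AE, while StandardEncoding keeps germandbls at \373.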


@all: Thanks for your help.
I finally managed to print it correctly on a colleague's printer, an HP LaserJet 4200 PS.
posted by pu9iad at 11:47 AM on May 22, 2009

