.TH DOC2TXT 1
.SH NAME
doc2txt, olefs, mswordstrings \- extract printable strings from Microsoft Word documents
.SH SYNOPSIS
.B doc2txt
[
.I file.doc
]
.br
.B aux/olefs
[
.B -m
.I mtpt
]
.I file.doc
.br
.B aux/mswordstrings
.I /mnt/doc/WordDocument
.SH DESCRIPTION
.I Doc2txt
is a shell script that uses
.I olefs
and
.I mswordstrings
to extract the printable text from the body of a Microsoft Word document.
.PP
Microsoft Office documents are stored in OLE
(Object Linking and Embedding)
format, which is a scaled down version of Microsoft's FAT file system.
.I Olefs
presents the contents of an Office document as a file system on
.IR mtpt ,
which defaults to
.BR /mnt/doc .
.I Mswordstrings
parses the
.I WordDocument
file inside an Office document, extracting the text stream.
.SH SOURCE
.B /sys/src/cmd/aux/mswordstrings.c
.br
.B /sys/src/cmd/aux/olefs.c
.br
.B /rc/bin/doc2txt
.SH SEE ALSO
.IR strings (1)
.br
``Microsoft Word 97 Binary File Format'',
available on line at Microsoft's developer home page.
.br
``LAOLA Binary Structures'',
.IR snake.cs.tu-berlin.de:8081/~schwartz/pmh .
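.SH EXAMPLE
A minimal sketch of the steps that
.I doc2txt
combines, run by hand from
.IR rc (1).
It assumes a Word document named
.B report.doc
(a hypothetical file name) in the current directory
and uses only the invocations shown in the synopsis;
the script itself may differ in detail.
.IP
.EX
# present the document as a file system on /mnt/doc
aux/olefs -m /mnt/doc report.doc
# for a Word document the listing should include WordDocument
ls /mnt/doc
# extract the text stream
aux/mswordstrings /mnt/doc/WordDocument
# detach the file system when done
unmount /mnt/doc
.EE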