doc (HWPF) Overview

HWPF provides early, partial support for the legacy Word 97-2003 binary .doc format. Estimated coverage is about 25%.

Status

HWPF is useful today for text extraction, simple indexing and migration workflows, and limited edits to the main document body.

It can open OLE2 .doc files, parse the File Information Block (FIB) and the selected table stream, and extract main document text from the CLX/piece table. It can also extract table and header/footer text, expose a minimal Range/Paragraph/CharacterRun model, and preserve unedited OLE streams and storages during no-op or limited body edits.
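
To illustrate what extracting text from the piece table involves, here is a minimal sketch of decoding a single piece. It is not HWPF's implementation: `PieceDecoder` and `DecodePiece` are hypothetical names, and the layout assumed here (bit 30 of a piece's `fc` value marks 8-bit text stored at `fc / 2`, otherwise the text is UTF-16LE stored at `fc`) follows the published Word binary format as I understand it. Latin-1 decoding is an approximation, since Word remaps some byte values in the 0x82-0x9F range. `Encoding.Latin1` requires .NET 5 or later.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the HWPF API: decodes the text of one piece
// from the raw bytes of the WordDocument stream.
static class PieceDecoder
{
    // Bit 30 of the piece's fc field marks 8-bit ("compressed") text.
    const uint CompressedFlag = 0x40000000;

    public static string DecodePiece(byte[] wordDocumentStream, uint fcRaw, int charCount)
    {
        bool compressed = (fcRaw & CompressedFlag) != 0;

        // Keep the low 30 bits; bit 30 is the flag and bit 31 is reserved.
        int fc = (int)(fcRaw & 0x3FFFFFFF);

        if (compressed)
        {
            // One byte per character, stored at fc / 2.
            // Assumption: Latin-1 stands in for Word's 8-bit character mapping.
            return Encoding.Latin1.GetString(wordDocumentStream, fc / 2, charCount);
        }

        // Two bytes per character (UTF-16LE), stored at fc.
        return Encoding.Unicode.GetString(wordDocumentStream, fc, charCount * 2);
    }
}
```

A full extractor would walk every piece in the table in character-position order and concatenate the decoded strings. The sketch covers only the per-piece step and does no bounds checking.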

It is not a complete Word binary editing engine. Images, footnotes, comments, fields, and tracked changes are not modeled through public APIs.

Basic Text Extraction

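using System;
using System.IO;
using DotnetPoi.HWPF.UserModel;

// Open the legacy .doc file and parse it.
using var stream = File.OpenRead("input.doc");
using var doc = new HWPFDocument(stream);

// Print the extracted text.
Console.WriteLine(doc.getText());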
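
Because HWPF reads only OLE2 .doc files, a caller may want to reject other inputs (for example a .docx file renamed to .doc) before constructing the document. The following is a minimal sketch of such a check using only the standard library. `Ole2Check` and `LooksLikeOle2` are hypothetical names and are not part of the HWPF API. The eight signature bytes are the standard OLE2 compound file header.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the HWPF API.
static class Ole2Check
{
    // First eight bytes of every OLE2 compound file.
    static readonly byte[] Signature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Returns true if the stream starts with the OLE2 signature at its current position.
    // Requires a seekable stream (such as a FileStream); the position is restored afterwards.
    public static bool LooksLikeOle2(Stream stream)
    {
        long start = stream.Position;
        try
        {
            var header = new byte[Signature.Length];
            int read = 0;
            while (read < header.Length)
            {
                int n = stream.Read(header, read, header.Length - read);
                if (n == 0) return false; // input is shorter than the signature
                read += n;
            }
            return header.AsSpan().SequenceEqual(Signature);
        }
        finally
        {
            stream.Position = start; // rewind so the caller can parse from the same position
        }
    }
}
```

A caller could run this check on the opened stream and pass the stream on for parsing only when it returns true. A matching signature shows only that the file is an OLE2 container, not that it holds a Word document.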

Limited Body Editing

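using System.IO;
using DotnetPoi.HWPF.UserModel;

using var stream = File.OpenRead("input.doc");
using var doc = new HWPFDocument(stream);

// Append a paragraph, then replace the {{name}} placeholder text.
doc.appendParagraph("Added by dotnet-poi");
doc.replaceText("{{name}}", "Example Corp");

// Write the edited document to a separate file; input.doc is not overwritten.
using var output = File.Create("output.doc");
doc.write(output);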

Supported Today

Limitations

For new documents, prefer the .docx format and XWPF.