Format Coverage

Legend: complete / ⚠️ partial (write-only, etc.) / 🔵 preserved as unknown parts, not editable / not implemented / not applicable

xlsx / XSSF (~78%)

| Category | Feature | Status | Notes |
|---|---|---|---|
| Cell values | string, numeric, date, boolean, error | | |
| Formulas | formula text write/read + cached value | | Excel recalculation-on-open works |
| Formulas | full formula evaluation | ❌ deferred | See package-split |
| Styles | fonts, fills, borders, number formats, alignment | | Round-trip verified |
| Layout | merged cells, column width, row height | | |
| Layout | hidden rows/columns, freeze panes | | |
| Layout | active cell, sheet selection, active sheet | | Active cell/selected in-memory; active sheet round-trips |
| Layout | print settings (margins, paper size, orientation, headers) | | |
| Drawings | images, anchors, rotation, hyperlinks | | |
| Drawings | charts | 🔵 | Existing chart parts are preserved on round-trip; new creation/editing is not modeled |
| Review | comments | | Cell comment read/create/edit/remove is modeled via `XSSFComment`, cell/sheet lookup, and VML/comment part write/read. Rich formatting and VML shape styling are still minimal |
| Drawings | auto-shapes, group shapes, connectors | 🔵 | Unknown `xdr:twoCellAnchor` children in `drawing.xml` preserved verbatim via raw XML capture/re-emission. Currently only `xdr:pic` is modeled; all other element types survive round-trip. |
| Data | data validation, conditional formatting, auto filter | | |
| Data | pivot tables | ⚠️ | Programmatic creation works; editing existing not modeled |
| Strings | shared strings, rich text runs | | Per-character formatting via `XSSFRichTextString` |
| Protection | workbook/sheet protection | | |
| Macros | xlsm preservation | | VBA bytes preserved on round-trip |
| Other | sparklines | | |
| Other | external data connections | 🔵 | `xl/connections.xml` / `xl/externalLinks/*` round-trip via `_preservedEntries` |

docx / XWPF (~65%)

| Category | Feature | Status | Notes |
|---|---|---|---|
| Paragraphs/runs | text, font name/size/color, bold, italic, underline, strikeout | | Round-trip verified |
| Paragraphs | alignment, indents, spacing, bullet/numbered lists | | |
| Tables | create/read tables, rows, cells | | Round-trip verified |
| Tables | cell merging, borders | 🔵 | Round-trip preserved via raw XML capture/re-emission; API-level creation not modeled |
| Sections | page size, margins, orientation | | |
| Sections | headers and footers | | Round-trip verified. Rich content (images, formatting) preserved via `_preservedEntries` when not modified via API. |
| Sections | columns | | `setColumns()` API, round-trip verified |
| Links | hyperlinks (external URLs) | | |
| Images | inline images with rotation | | |
| Images | floating (anchored) images | 🔵 | `<wp:anchor>` elements preserved via raw XML capture/re-emission |
| Images | text boxes (`w:txbxContent`) | | Text extraction from inline and anchored drawing textboxes is supported |
| Annotations | comments | | Existing comments round-trip via preservation; minimal `XWPFComment` lookup/text APIs, comment metadata/text editing, and paragraph range comment creation are modeled |
| Annotations | footnotes, endnotes | 🔵 | `word/footnotes.xml` / `word/endnotes.xml` round-trip preserved |
| Fields | TOC, page numbers, mail merge | | Write/read/round-trip |
| SDT | content controls (block-level and inline) | 🔵 | Block-level `w:sdt` in `w:body` and inline `w:sdt` inside `w:p` preserved via raw XML capture/re-emission. |
| Styles | paragraph style reference (`pStyle`) | | `setStyle()`/`getStyleID()` API, round-trip verified. `word/styles.xml` 🔵 preserved + default styles auto-generated. Character/table styles ❌ |
| Track Changes | revision marks | 🔵 | Tracked-change XML (`w:ins`, `w:del`, moves, etc.) is preserved in body/paragraph child order during round-trip; accept/reject/create/edit APIs are not modeled |
| Other | OLE embeddings | 🔵 | `word/embeddings/*` round-trip via `_preservedEntries` |
| Other | docm macro preservation | | VBA bytes preserved |
| Other | unknown part preservation | | `_preservedEntries` mechanism preserves non-model ZIP entries |
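
Several rows above depend on carrying non-model ZIP entries through a load/save cycle. The idea can be sketched in a few lines of language-neutral Python. This is an illustration only, not the library's implementation: `MODELED_PARTS`, `load`, and `save` are hypothetical names, and a real OPC package would also need its content types and relationships kept consistent.

```python
import io
import zipfile

# Hypothetical: the set of parts the object model understands and rewrites.
MODELED_PARTS = {"word/document.xml"}


def load(package_bytes):
    """Split a package into modeled parts and entries preserved as raw bytes."""
    modeled, preserved = {}, {}
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        for name in zf.namelist():
            target = modeled if name in MODELED_PARTS else preserved
            target[name] = zf.read(name)
    return modeled, preserved


def save(modeled, preserved):
    """Write preserved entries back unchanged next to the (edited) modeled parts."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in {**preserved, **modeled}.items():
            zf.writestr(name, data)
    return buf.getvalue()
```

With this split, an edit to `word/document.xml` leaves entries such as `word/embeddings/*` byte-identical after saving.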

pptx / XSLF (~40%)

| Category | Feature | Status | Notes |
|---|---|---|---|
| Slides | create/read slides | | |
| Slides | slide size | | |
| Slides | notes slides | 🔵 | `ppt/notesSlides/notesSlide*.xml` round-trip via `_preservedEntries` |
| Text | text boxes (`p:sp`) create/read | | Round-trip verified |
| Text | multiple paragraphs, run formatting | | bold, italic, underline, size, font, color |
| Shapes | images with position, size, rotation | | Round-trip verified |
| Shapes | tables (`p:graphicFrame`/`a:tbl`) | | Round-trip verified |
| Shapes | grouping, connectors, lines | 🔵 | Unknown `p:spTree` children (`grpSp`, `cxnSp`, etc.) preserved verbatim via raw XML capture/re-emission |
| Shapes | SmartArt, charts | 🔵 | Preserved as unknown parts |
| Media | video/audio embedding | 🔵 | Non-image `ppt/media/*` round-trip via `_preservedEntries` |
| Animation | animations, transitions | 🔵 | Preserved as unknown parts |
| Theme | layout, master, theme | 🔵 | Not editable, preserved on round-trip |
| Other | pptm macro preservation | | VBA bytes preserved |

xls / HSSF (~35%)

| Category | Feature | Status | Notes |
|---|---|---|---|
| Cell values | string, numeric, boolean, blank, error | | BIFF8 LabelSST/Number/BoolErr/Blank round-trip covered |
| Sheets | multiple sheets, sparse rows/cells, high column indexes | | |
| Styles | fonts, data formats, alignment, wrap, borders, fills | ⚠️ | Common `HSSFFont`/`HSSFCellStyle` round-trip works; not full BIFF style parity |
| Layout | column width, row height, hidden rows/columns, merged regions, freeze panes | | |
| Formulas | formula text + cached value read | ⚠️ | Existing POI formula fixtures can be read; new BIFF formula token writing and evaluation are not implemented |
| Compatibility | representative Apache POI .xls fixtures load | | Includes basic, styles, formulas, hyperlinks, comments, drawings, images, macro files as load/preservation cases |
| Interop | Java POI bidirectional fixtures | ⚠️ | basic/styles/layout/unicode/comprehensive coverage |
| Preservation | non-Workbook OLE streams, VBA streams, unknown BIFF records | | Light edits preserve unmodeled streams/records where possible |
| Not modeled | images/shapes/charts/comments/hyperlink editing/filters/pivots | | Some are load/preservation fixtures, but not public usermodel creation/edit APIs |

doc / HWPF (~25%)

| Category | Feature | Status | Notes |
|---|---|---|---|
| Reading | OLE2 .doc open, FIB/table stream parsing | | `WordDocument` + `0Table`/`1Table` selection and fallback covered |
| Text | main body text extraction | | CLX/piece table based extraction with compressed and Unicode text pieces |
| UserModel | Range, Paragraph, CharacterRun | ⚠️ | Paragraph/run splitting and text composition covered for representative fixtures |
| Formatting | character and paragraph properties | ⚠️ | CHPX-derived font name/size/bold/italic/underline/strike plus minimal PAPX fields |
| Extraction | header/footer and table structures | | `getHeaderStoryRange()`, table row/cell iteration implemented |
| Editing | no-op write, append paragraph, simple text replacement | ⚠️ | Limited main-body edit path, not a full Word binary editing engine |
| Preservation | OLE streams/storages, embedded OLE | | Unedited stream/storage content is preserved in representative fixtures |
| Interop | Java POI bidirectional testing | ⚠️ | Java POI correctly extracts tables and header/footer text from dotnet-poi saved files |
| Not modeled | images/footnotes/comments/fields API | | Existing streams may be preserved, but these are not usermodel creation/edit features |
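
The "compressed and Unicode text pieces" distinction in the Text row can be illustrated with a short Python sketch. It assumes the piece-descriptor layout described in the [MS-DOC] specification, where bit 30 of the `fc` field marks a compressed (8-bit) piece stored at half the stated offset; the function names and the tuple shape of `pieces` are hypothetical, and decoding compressed text as CP1252 is a simplification.

```python
F_COMPRESSED = 0x40000000  # bit 30 of the fc field: piece is stored as 8-bit text
FC_MASK = 0x3FFFFFFF       # low 30 bits hold the stream offset


def piece_text(stream, cp_start, cp_end, fc_raw):
    """Decode one text piece from the WordDocument stream bytes."""
    char_count = cp_end - cp_start
    if fc_raw & F_COMPRESSED:
        # Compressed piece: one byte per character, real offset is fc / 2.
        offset = (fc_raw & FC_MASK) // 2
        return stream[offset:offset + char_count].decode("cp1252", errors="replace")
    # Unicode piece: two bytes per character, UTF-16LE.
    offset = fc_raw & FC_MASK
    return stream[offset:offset + 2 * char_count].decode("utf-16-le")


def extract_text(stream, pieces):
    """Concatenate pieces given as (cp_start, cp_end, fc_raw) tuples in CP order."""
    return "".join(piece_text(stream, *piece) for piece in pieces)
```

A document that mixes both encodings is simply a list of pieces, some with the compressed bit set and some without.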

ppt / HSLF (~12%)

| Category | Feature | Status | Notes |
|---|---|---|---|
| Reading | OLE2 .ppt open and stream inventory | | Detects `PowerPoint Document`, `Current User`, and summary streams |
| Records | record tree scan with raw bytes | | Container/atom hierarchy is retained for preservation |
| Slides | slide count and order | | Persist-pointer based order covered by representative fixtures |
| Text | TextChars/TextBytes extraction | | UTF-16LE and CP1252 text atoms supported |
| Editing | no-op write / round-trip | | OLE2 compound file is preserved through write |
| Preservation | OLE streams/storages, images, comments, OLE | | Representative preservation fixtures covered |
| Interop | Java POI direction B | ⚠️ | C# fixture generation exists; Java-side assertion still pending |
| Not modeled | slide creation, shape editing, images, animations | | New presentation authoring is not implemented for HSLF |
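
The TextChars/TextBytes row comes down to picking a decoder per atom type. The Python sketch below shows that choice on a flat run of records, using the 8-byte record header and the record type values from the [MS-PPT] specification (`0x0FA0` for TextCharsAtom, `0x0FA8` for TextBytesAtom). It is an illustration under those assumptions, not the library's code: the function names are hypothetical, and it does not descend into container records the way a real record-tree scan would.

```python
import struct

RT_TEXT_CHARS_ATOM = 0x0FA0  # payload is UTF-16LE
RT_TEXT_BYTES_ATOM = 0x0FA8  # payload is one byte per character


def decode_text_atom(rec_type, payload):
    """Decode the payload of a text atom according to its record type."""
    if rec_type == RT_TEXT_CHARS_ATOM:
        return payload.decode("utf-16-le")
    if rec_type == RT_TEXT_BYTES_ATOM:
        return payload.decode("cp1252", errors="replace")
    raise ValueError(f"not a text atom: 0x{rec_type:04X}")


def iter_text(records):
    """Yield text from a flat sequence of records, skipping non-text records."""
    pos = 0
    while pos + 8 <= len(records):
        # Header: version/instance (2 bytes), type (2 bytes), length (4 bytes).
        _ver_inst, rec_type, rec_len = struct.unpack_from("<HHI", records, pos)
        payload = records[pos + 8:pos + 8 + rec_len]
        if rec_type in (RT_TEXT_CHARS_ATOM, RT_TEXT_BYTES_ATOM):
            yield decode_text_atom(rec_type, payload)
        pos += 8 + rec_len
```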