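I'll take a tentative stab at the first question: the rendered width of any text can be computed exactly for a given font and size, so you can estimate how many spaces are needed for alignment, or decide where to break lines or pages. In Python, for example, you can use the `PIL.ImageFont.FreeTypeFont.getlength` method, and R has `systemfonts::string_width()`. The implementation details may differ, but underneath both are thin wrappers around FreeType.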
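As a rough illustration of the "how many spaces" estimate, here is a minimal sketch. The helper name `spaces_to_align` and the `measure` callback are hypothetical; in practice `measure` could be the `getlength` method of a Pillow font object. It also assumes that widths add up linearly, which kerning can break, so treat the result as an estimate.

```python
import math
from typing import Callable


def spaces_to_align(text: str, target_width: float,
                    measure: Callable[[str], float]) -> int:
    """Estimate how many spaces to append so `text` spans `target_width`.

    `measure` is any function that returns the rendered width of a string,
    in the same unit as `target_width` (for example pixels).
    """
    space_width = measure(" ")
    if space_width <= 0:
        raise ValueError("the font reports a non-positive width for a space")
    gap = target_width - measure(text)
    # Round up so the padded text reaches at least the target width.
    return max(0, math.ceil(gap / space_width))


# Stand-in measurer: pretends every character is 10 units wide.
# With Pillow installed, something like
#   font = ImageFont.truetype("SomeFont.ttf", 12); measure = font.getlength
# could be passed instead (the font file name here is a placeholder).
def fake_measure(s: str) -> float:
    return 10.0 * len(s)


print(spaces_to_align("Name", 120.0, fake_measure))
```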
The second question is honestly beyond me; I never properly learned these document toolchains... So I took the liberty of asking GPT-5 Thinking High:
If the list style I set in the reference_docx: template.docx (pandoc template) for an R Markdown + officedown + officer document is not applied correctly, is there a way to fix it? For example, is it a robust solution to use Python and modify the generated Word document at the Office Open XML level directly?
Short answer: yes, it's fixable, but directly rewriting the DOCX's OOXML with Python should be a last resort. You'll get a much more robust pipeline by fixing how lists are produced (Pandoc/officedown/officer + a clean reference_docx) than by post-editing the XML.
## What's going on
- Pandoc -> DOCX lists don't take a named "List Style" from your template. Pandoc generates its own numbering definitions in numbering.xml and applies them to paragraphs (usually styled List Paragraph). Your template's List Paragraph spacing/indents will carry through, but bullet shapes / numbering formats generally won't.
- If your goal is precise control (custom bullets, multi-level numbering, spacing), styling only in reference_docx often isn't enough.
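To check what Pandoc actually emitted, the numbering definitions can be read straight out of the package. The following is a minimal sketch using only the standard library; the function name `list_definitions` is made up for illustration, and it only reports the first level (`ilvl` 0) of each list in use.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def list_definitions(docx_path):
    """Map each numId used in document.xml to its level-0 (numFmt, lvlText)."""
    with zipfile.ZipFile(docx_path) as z:
        doc = ET.fromstring(z.read("word/document.xml"))
        num = ET.fromstring(z.read("word/numbering.xml"))

    # numIds referenced by paragraphs in the body
    used = {n.get(W + "val") for n in doc.iter(W + "numId")}

    # w:num carries numId as an attribute and points at an abstractNum
    # through a child element.
    num_to_abs = {
        n.get(W + "numId"): n.find(W + "abstractNumId").get(W + "val")
        for n in num.iter(W + "num")
    }
    abstract = {a.get(W + "abstractNumId"): a for a in num.iter(W + "abstractNum")}

    result = {}
    for num_id in sorted(k for k in used if k in num_to_abs):
        a = abstract.get(num_to_abs[num_id])
        if a is None:
            continue
        for lvl in a.findall(W + "lvl"):
            if lvl.get(W + "ilvl") == "0":
                fmt = lvl.find(W + "numFmt")
                txt = lvl.find(W + "lvlText")
                result[num_id] = (
                    fmt.get(W + "val") if fmt is not None else None,
                    txt.get(W + "val") if txt is not None else None,
                )
    return result
```

Calling `list_definitions("report.docx")` on a rendered file (the file name is a placeholder) returns a dict keyed by `numId`, which makes it easy to see whether the bullets and number formats come from Pandoc's generated definitions.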
## Recommended, robust fixes (in order)
1. Start from a clean template
   - Create a fresh Word doc, define Normal, Heading 1..N, and List Paragraph (set indents/spacing only; avoid attaching numbering here).
   - Turn off "Automatically update document styles" in Word. Avoid any direct formatting in example list paragraphs.
   - Use that as reference_docx.
2. Let officedown/officer own the lists you care about
   - For lists where you require exact bullets/numbering, don't rely on Pandoc's default list output. Bind them to a named list style defined in your template instead. officedown exposes this through the `lists` option of `rdocx_document`, which maps ordered and unordered lists to Word list styles:

     ```yaml
     ---
     output:
       officedown::rdocx_document:
         reference_docx: template.docx
         lists:
           # List Styles (not just paragraph styles) that must exist in template.docx.
           # "My Numbered" and "My Bullet" are placeholder names.
           ol.style: "My Numbered"
           ul.style: "My Bullet"
     ---
     ```

   - This route is stable: you edit the list definition once in Word, and officedown will apply it consistently.
3. If you must keep plain Markdown lists as Pandoc renders them
   - Accept that you can reliably control spacing/indents via List Paragraph in the template, but not bullet/number glyphs across levels.
   - For minor tweaks (e.g., spacing), this is fine.
## When to consider OOXML post-processing
Only when you can't change the authoring (must keep Markdown lists) and you need enforced numbering/bullets beyond what Pandoc emits.
- Pros: Full control after the fact; can standardize all lists across the doc.
- Cons: Brittle. Pandoc/officedown updates can change numId/abstractNum generation; Track Changes may complicate the XML; python-docx doesn't expose numbering APIs (you'll end up editing raw XML).
## Minimal Python approach (outline)
- Treat the .docx as a zip; edit word/numbering.xml (and sometimes word/styles.xml for paragraph indents).
- Find the w:numId values actually used in word/document.xml, map to w:abstractNum via numbering.xml.
- Patch each referenced w:abstractNum/w:lvl:
  - w:numFmt -> decimal, lowerLetter, bullet, etc.
  - w:lvlText -> like "%1.", "%1.%2.", or a bullet char.
  - w:suff -> tab or space.
  - Optional: set w:pPr/w:ind for hanging indents.
Example (uses lxml; intentionally simple and idempotent-ish):

```python
import os
import tempfile
import zipfile

from lxml import etree

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}
W = "{%s}" % W_NS

# Child order the schema expects inside w:lvl; new elements are inserted accordingly.
LVL_ORDER = ["start", "numFmt", "lvlRestart", "pStyle", "isLgl", "suff",
             "lvlText", "lvlPicBulletId", "legacy", "lvlJc", "pPr", "rPr"]


def set_lvl_child(lvl_node, name, value):
    """Set <w:name w:val=value> under a w:lvl, creating it in schema order if absent."""
    child = lvl_node.find("w:" + name, namespaces=NS)
    if child is None:
        child = etree.Element(W + name)
        later = {W + n for n in LVL_ORDER[LVL_ORDER.index(name) + 1:]}
        for pos, sibling in enumerate(lvl_node):
            if sibling.tag in later:
                lvl_node.insert(pos, child)
                break
        else:
            lvl_node.append(child)
    child.set(W + "val", value)


def patch_docx_lists(src, dst, desired):
    """Rewrite the list levels used in `src` and save the result as `dst`.

    desired: dict like { level:int -> dict(numFmt='decimal', lvlText='%1.', suff='tab') }
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        with zipfile.ZipFile(src) as zin:
            zin.extractall(tmpdir)

        doc_path = os.path.join(tmpdir, "word", "document.xml")
        num_path = os.path.join(tmpdir, "word", "numbering.xml")

        # 1) Collect used numIds from document.xml
        doc = etree.parse(doc_path)
        num_ids = {int(n.get(W + "val"))
                   for n in doc.xpath("//w:numPr/w:numId", namespaces=NS)}

        # 2) Map numId -> abstractNumId. Note that numId is an attribute of
        #    w:num, while abstractNumId is a child element.
        num = etree.parse(num_path)
        abstract_ids = set()
        for n in num.xpath("//w:num", namespaces=NS):
            if int(n.get(W + "numId")) in num_ids:
                aid = n.xpath("./w:abstractNumId/@w:val", namespaces=NS)[0]
                abstract_ids.add(int(aid))

        # 3) Patch the levels present in 'desired' on each referenced abstractNum
        for a in num.xpath("//w:abstractNum", namespaces=NS):
            if int(a.get(W + "abstractNumId")) not in abstract_ids:
                continue
            for lvl_node in a.xpath("./w:lvl", namespaces=NS):
                spec = desired.get(int(lvl_node.get(W + "ilvl")))
                if spec is None:
                    continue
                set_lvl_child(lvl_node, "numFmt", spec.get("numFmt", "decimal"))
                set_lvl_child(lvl_node, "lvlText", spec.get("lvlText", "%1."))
                set_lvl_child(lvl_node, "suff", spec.get("suff", "tab"))

        num.write(num_path, xml_declaration=True, encoding="UTF-8", standalone=True)

        # 4) Repack (the temporary directory is removed when the block exits)
        with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
            for root, _, files in os.walk(tmpdir):
                for f in files:
                    full = os.path.join(root, f)
                    zout.write(full, os.path.relpath(full, tmpdir))


# Example usage:
desired = {
    0: dict(numFmt="decimal", lvlText="%1.", suff="tab"),
    1: dict(numFmt="lowerLetter", lvlText="%2)", suff="tab"),
    2: dict(numFmt="bullet", lvlText="•", suff="tab"),
}
patch_docx_lists("in.docx", "out.docx", desired)
```
## Notes
- This edits only the **abstract numbering** referenced by lists actually present, so it's less fragile than hard-coding IDs.
- If you also need consistent indents, add/update w:pPr/w:ind under each w:lvl.
- Validate the output by opening in Word once (Word will normalize the package).
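The indent tweak mentioned in the notes above could look roughly like this. It is a sketch under assumptions: the helper name `set_hanging_indent` is hypothetical, it uses the standard library's `xml.etree.ElementTree` so it runs without lxml, and it assumes any existing w:pPr has no children that the schema requires to come after w:ind. Values are in twentieths of a point, so 720 is half an inch.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS
ET.register_namespace("w", W_NS)


def set_hanging_indent(lvl, left, hanging):
    """Set w:pPr/w:ind on one w:lvl element (units: twentieths of a point)."""
    ppr = lvl.find(W + "pPr")
    if ppr is None:
        ppr = ET.Element(W + "pPr")
        rpr = lvl.find(W + "rPr")
        # Inside w:lvl, pPr is expected before rPr, so insert rather than
        # append when an rPr is already there.
        if rpr is None:
            lvl.append(ppr)
        else:
            lvl.insert(list(lvl).index(rpr), ppr)
    ind = ppr.find(W + "ind")
    if ind is None:
        ind = ET.SubElement(ppr, W + "ind")
    ind.set(W + "left", str(left))
    ind.set(W + "hanging", str(hanging))


# A tiny stand-in for one level of an abstractNum definition.
sample = (
    f'<w:lvl xmlns:w="{W_NS}" w:ilvl="0">'
    '<w:start w:val="1"/><w:numFmt w:val="decimal"/>'
    '<w:lvlText w:val="%1."/><w:rPr/></w:lvl>'
)
lvl = ET.fromstring(sample)
set_hanging_indent(lvl, left=720, hanging=360)
print(ET.tostring(lvl, encoding="unicode"))
```

In the patcher, the same logic would be applied to each `lvl_node` next to the numFmt/lvlText/suff updates.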
## Practical decision rule
- Need perfect, repeatable list styling? Use officedown/officer to insert lists bound to a named List Style in the template.
- OK with just spacing/indent tweaks? Tune List Paragraph in reference_docx.
- Locked into Markdown lists but need exact bullets/numbering? Use an XML post-processor (like the Python outline above), and add tests to ensure idempotence across Pandoc/officedown upgrades.
If you want, I can sketch a small R helper that runs after render: detect lists with docxtractr/officer, call the Python patcher, and drop the fixed file back in place. This keeps your R Markdown workflow intact while giving you the precision of OOXML only when necessary.
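The idempotence test suggested in the decision rule can be sketched with the standard library alone. The names `numbering_xml` and `is_idempotent` are hypothetical, and `patch` stands for any function with the signature `patch(src, dst)`, such as a list patcher. Comparing raw bytes is a deliberately strict check; comparing canonicalized XML would be a more lenient alternative.

```python
import os
import tempfile
import zipfile


def numbering_xml(docx_path):
    """Return the raw bytes of word/numbering.xml from a .docx package."""
    with zipfile.ZipFile(docx_path) as z:
        return z.read("word/numbering.xml")


def is_idempotent(patch, src):
    """Apply patch(src, dst) twice; True if the second pass leaves numbering.xml unchanged."""
    with tempfile.TemporaryDirectory() as tmp:
        once = os.path.join(tmp, "once.docx")
        twice = os.path.join(tmp, "twice.docx")
        patch(src, once)
        patch(once, twice)
        return numbering_xml(once) == numbering_xml(twice)
```

Running this check in CI against a small fixture document after each Pandoc or officedown upgrade would flag changes in how numbering definitions are generated.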