How do I align the signature block when R Markdown generates a Word document from a reference template?

chuxinyuan

Problem description

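I have recently been looking at using R Markdown to automatically generate Word documents that comply with the 《党政机关公文格式》 standard (the layout standard for official documents of Party and government organs). That would save a lot of the time otherwise spent adjusting document formatting, especially in China, where Word is the default tool and even most papers are typeset in it.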

So far I have a first version of the document template, built with R Markdown + officedown + officer. The generated documents mostly look right, but a few details still fall short:

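1. The position of the issuing authority (the signing organization) is produced by prepending a number of spaces, computed from the length of the date of issue and the length of one line of body text. It does not always line up exactly with the date, so for a real red-header document (红头文件) I still have to set tab stops and drag the ruler by hand. My first idea was to handle this in the template, but the character length of the date keeps changing while the template parameters are fixed, so I had to give up on that.
2. The list format defined in the template does not seem to take effect; the generated lists never match the template. Admittedly, four heading levels are usually enough for an official document, and lists are rarely used.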

My code

---
title: "公文标题"
output: 
  officedown::rdocx_document:
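    # The reference template controls the formatting of the generated Word file
    reference_docx: template.docx
    page_margins:
      top: 1.458     # top margin in inches (37 mm)
      bottom: 1.380  # bottom margin in inches (35 mm)
      left: 1.063    # left margin in inches (27 mm)
      right: 1.063   # right margin in inches (27 mm)
      gutter: 0.00   # gutter (binding margin)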
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```

```{r}
# Work around the blank line before the document salutation
officer::ftext("")
officer::fp_par(text.align = "center")
```

```{r}
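# Document salutation (addressee line), styled with the template's "Salutation" style
salutation = "公文抬头："  # placeholder text, "Document salutation:"
paragraph_salutation = officer::fpar(
  officer::ftext(salutation),
  fp_p = officer::fp_par_lite(
    # Parameters left unset inherit the defaults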
    word_style = "Salutation"
  )
)
officer::block_list(paragraph_salutation)
```

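First paragraph (size 3 / 三号, 仿宋体_GB2312).

# Level-1 heading

## Level-2 heading

### Level-3 heading

#### Level-4 heading

##### Level-5 heading

###### Level-6 heading

Body text (size 3 / 三号, 仿宋体_GB2312; Latin characters use the Chinese font). Character spacing is set for 28 characters per line, and line spacing is 28 pt.

- List item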

```{r}
# Leave one blank line between the body text and the attachment note
officer::ftext("")
```

```{r}
# Attachment note; same format as the body text, so no separate styling is needed
officer::ftext(paste0("附件：", "附件材料"))  # "Attachment: attachment material"
```

```{r}
officer::ftext("")  # first blank line before the signature block
```

```{r}
officer::ftext("")  # second blank line before the signature block
```

```{r}
officer::ftext("")  # third blank line before the signature block
```

```{r}
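# Name of the issuing authority (signature)
signature = "发文机关"  # placeholder text, "Issuing authority"

# Build the date of issue in Chinese format from today's date, e.g. "2025年9月9日"
date = as.Date(Sys.Date())
year  = as.integer(format(date, '%Y'))
month = as.integer(format(date, '%m'))
day = as.integer(format(date, '%d'))
date_signature = paste0(year, "年", month, "月", day, "日")

# Number of spaces to put in front of the date of issue
date_signature_width = nchar(date_signature, type = "width") + 5
date_signature_space = 56 - 12 - date_signature_width

# Number of spaces to put in front of the issuing authority.
# The 1e-9 nudges exact .5 values upward, since R's round() rounds half to even.
signature_width = nchar(signature, type = "width")
signature_space = round(
  date_signature_space +
    (date_signature_width - signature_width) / 2 + 1e-9, 0
)

# Insert the issuing authority, positioned relative to the date of issue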
officer::ftext(
  paste0(strrep(" ", signature_space), signature)
)
```

```{r}
# Insert the date of issue, styled with the template's "Signature Date" style
paragraph_date_signature = officer::fpar(
  officer::ftext(date_signature), 
  fp_p = officer::fp_par_lite(
    word_style = "Signature Date"
  )
)
officer::block_list(paragraph_date_signature)
```

Could anyone share how they have solved these problems?

nan.xiao

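Let me take a stab at the first question: the rendered width of any text can be computed exactly, so you can work out how many spaces are needed for alignment, or decide where lines and pages break. In Python, for example, you can use the PIL.ImageFont.ImageFont.getlength method, and in R there is systemfonts::string_width(). The implementation details may differ, but underneath both are thin wrappers around FreeType.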
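As a rough illustration of the idea, the sketch below estimates the rendered width of a string without reading any font file. It assumes a CJK font in which full-width characters occupy one em and half-width characters (digits, Latin letters, spaces) occupy half an em. That assumption and the helper names `estimated_width` and `padding_spaces` are illustrative, not part of any library; for exact metrics, measure with the real font file.

```python
import unicodedata


def estimated_width(text: str, font_size: float) -> float:
    """Estimate the rendered width of `text` in points.

    Assumption: East Asian wide/full-width characters take one em,
    every other character takes half an em.
    """
    width = 0.0
    for ch in text:
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += font_size
        else:
            width += font_size / 2
    return width


def padding_spaces(text: str, line_width: float, font_size: float) -> int:
    """Number of half-width spaces that push `text` to the right edge of the line."""
    space_width = font_size / 2  # assumed width of one half-width space
    free = line_width - estimated_width(text, font_size)
    return max(0, int(free // space_width))


date_text = "2025年9月9日"
print(estimated_width(date_text, 16))          # 96.0
print(padding_spaces(date_text, 27 * 16, 16))  # 42
```

Swapping `estimated_width` for a function that queries the actual font keeps the padding logic unchanged while removing the half-em assumption.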

The second question is beyond me, since I never really got to grips with these document toolchains... so I took the liberty of asking GPT-5 Thinking High:

If the list style I set in the reference_docx: template.docx (pandoc template) for an R Markdown + officedown + officer document is not applied correctly, is there a way to fix it? For example, is it a robust solution to use Python and modify the generated Word document at the Office Open XML level directly?

Short answer: yes, it's fixable, but directly rewriting the DOCX's OOXML with Python should be a last resort. You'll get a much more robust pipeline by fixing how lists are produced (Pandoc/officedown/officer + a clean reference_docx) than by post-editing the XML.

## What's going on

- Pandoc -> DOCX lists don't take a named "List Style" from your template. Pandoc generates its own numbering definitions in numbering.xml and applies them to paragraphs (usually styled List Paragraph). Your template's List Paragraph spacing/indents will carry through, but bullet shapes / numbering formats generally won't.
- If your goal is precise control (custom bullets, multi-level numbering, spacing), styling only in reference_docx often isn't enough.
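To see which numbering definitions a generated file actually carries, a small inspection script can help. The sketch below uses only the Python standard library; the function name `list_definitions` is made up for this example, and the demo runs on a tiny in-memory package rather than a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_definitions(docx):
    """Return {numId: [(ilvl, numFmt, lvlText), ...]} read from word/numbering.xml.

    `docx` is a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as zf:
        if "word/numbering.xml" not in zf.namelist():
            return {}
        root = ET.fromstring(zf.read("word/numbering.xml"))

    # abstractNumId -> list of (level, format, text) tuples
    abstract = {}
    for a in root.findall(f"{{{W}}}abstractNum"):
        levels = []
        for lvl in a.findall(f"{{{W}}}lvl"):
            fmt = lvl.find(f"{{{W}}}numFmt")
            txt = lvl.find(f"{{{W}}}lvlText")
            levels.append((
                int(lvl.get(f"{{{W}}}ilvl")),
                fmt.get(f"{{{W}}}val") if fmt is not None else None,
                txt.get(f"{{{W}}}val") if txt is not None else None,
            ))
        abstract[a.get(f"{{{W}}}abstractNumId")] = levels

    # numId is an attribute of w:num; the abstract definition is referenced by a child element
    result = {}
    for n in root.findall(f"{{{W}}}num"):
        ref = n.find(f"{{{W}}}abstractNumId")
        if ref is not None:
            result[int(n.get(f"{{{W}}}numId"))] = abstract.get(ref.get(f"{{{W}}}val"), [])
    return result


# Demo on a minimal in-memory package
sample = (
    f'<w:numbering xmlns:w="{W}">'
    '<w:abstractNum w:abstractNumId="0">'
    '<w:lvl w:ilvl="0"><w:numFmt w:val="bullet"/><w:lvlText w:val="-"/></w:lvl>'
    '</w:abstractNum>'
    '<w:num w:numId="1"><w:abstractNumId w:val="0"/></w:num>'
    '</w:numbering>'
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/numbering.xml", sample)
print(list_definitions(buf))  # {1: [(0, 'bullet', '-')]}
```

Running it on the rendered .docx and on the template side by side shows whether the template's list definitions survived rendering.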

## Recommended, robust fixes (in order)

1. Start from a clean template
   - Create a fresh Word doc, define Normal, Heading 1..N, and List Paragraph (set indents/spacing only; avoid attaching numbering here).
   - Turn off "Automatically update document styles" in Word. Avoid any direct formatting in example list paragraphs.
   - Use that as reference_docx.

2. Let officedown bind the lists you care about to a named list style
   - For lists where you require exact bullets/numbering, don't rely on Pandoc's default list output. officedown's rdocx_document has a lists option (ul.style / ol.style) that replaces the style of the unordered and ordered lists Pandoc creates with a named Word List Style from your template. Check the option names against the documentation of your installed officedown version:

---
output:
  officedown::rdocx_document:
    reference_docx: template.docx
    lists:
      # In your template, create List Styles with these (hypothetical) names and the desired levels.
      ul.style: "My Bullet"    # a Word List Style (not just a paragraph style)
      ol.style: "My Numbered"
---

- First
- Second
- Third

   - This route is stable: you edit the list definition once in Word, and officedown will apply it consistently.

3. If you must keep Markdown lists
   - Accept that you can reliably control spacing/indents via List Paragraph in the template, but not bullet/number glyphs across levels.
   - For minor tweaks (e.g., spacing), this is fine.

## When to consider OOXML post-processing

Only when you can't change the authoring (must keep Markdown lists) and you need enforced numbering/bullets beyond what Pandoc emits.

- Pros: Full control after the fact; can standardize all lists across the doc.
- Cons: Brittle. Pandoc/officedown updates can change numId/abstractNum generation; Track Changes may complicate the XML; python-docx doesn't expose numbering APIs (you'll end up editing raw XML).

## Minimal Python approach (outline)

- Treat the .docx as a zip; edit word/numbering.xml (and sometimes word/styles.xml for paragraph indents).
- Find the w:numId values actually used in word/document.xml, map to w:abstractNum via numbering.xml.
- Patch each referenced w:abstractNum/w:lvl:
 - w:numFmt -> decimal, lowerLetter, bullet, etc.
 - w:lvlText -> like "%1.", "%1.%2.", or a bullet char.
 - w:suff -> tab or space.
 - Optional: set w:pPr/w:ind for hanging indents.

Example (uses lxml; intentionally simple and idempotent-ish):

import zipfile, shutil, tempfile, os
from lxml import etree

NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}

def patch_docx_lists(src, dst, desired):
    # desired: dict like { level:int -> dict(numFmt='decimal', lvlText='%1.', suff='tab') }
    tmpdir = tempfile.mkdtemp()
    with zipfile.ZipFile(src) as zin:
        zin.extractall(tmpdir)

    doc_path = os.path.join(tmpdir, "word", "document.xml")
    num_path = os.path.join(tmpdir, "word", "numbering.xml")

    # 1) Collect used numIds from document.xml
    doc = etree.parse(doc_path)
    numIds = set(int(n.get("{%s}val" % NS["w"])) for n in doc.xpath("//w:numPr/w:numId", namespaces=NS))

    # 2) Patch corresponding abstractNum in numbering.xml
    num = etree.parse(num_path)
    num_map = {}  # numId -> abstractNumId
    for n in num.xpath("//w:num", namespaces=NS):
        nid = int(n.get("{%s}numId" % NS["w"]))  # numId is an attribute of w:num, not a child element
        if nid in numIds:
            aid = int(n.xpath("./w:abstractNumId/@w:val", namespaces=NS)[0])
            num_map[nid] = aid

    for a in num.xpath("//w:abstractNum", namespaces=NS):
        aid = int(a.get("{%s}abstractNumId" % NS["w"]))
        if aid in num_map.values():
            # Patch levels present in 'desired'
            for lvl_node in a.xpath("./w:lvl", namespaces=NS):
                lvl = int(lvl_node.get("{%s}ilvl" % NS["w"]))
                if lvl in desired:
                    spec = desired[lvl]
                    # numFmt
                    nf = lvl_node.find("w:numFmt", namespaces=NS)
                    if nf is None:
                        nf = etree.SubElement(lvl_node, "{%s}numFmt" % NS["w"])
                    nf.set("{%s}val" % NS["w"], spec.get("numFmt", "decimal"))
                    # lvlText
                    lt = lvl_node.find("w:lvlText", namespaces=NS)
                    if lt is None:
                        lt = etree.SubElement(lvl_node, "{%s}lvlText" % NS["w"])
                    lt.set("{%s}val" % NS["w"], spec.get("lvlText", "%1."))
                    # suff: the schema orders it before w:lvlText, so insert it there
                    sf = lvl_node.find("w:suff", namespaces=NS)
                    if sf is None:
                        sf = etree.Element("{%s}suff" % NS["w"])
                        lt.addprevious(sf)
                    sf.set("{%s}val" % NS["w"], spec.get("suff", "tab"))

    num.write(num_path, xml_declaration=True, encoding="UTF-8", standalone="yes")

    # 3) Repack
    with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for root, _, files in os.walk(tmpdir):
            for f in files:
                full = os.path.join(root, f)
                arc = os.path.relpath(full, tmpdir)
                zout.write(full, arc)

    # 4) Remove the temporary extraction directory
    shutil.rmtree(tmpdir)

# Example usage:
desired = {
    0: dict(numFmt="decimal", lvlText="%1.", suff="tab"),
    1: dict(numFmt="lowerLetter", lvlText="%2)", suff="tab"),
    2: dict(numFmt="bullet", lvlText="•", suff="tab"),
}
patch_docx_lists("in.docx", "out.docx", desired)

## Notes

- This edits only the **abstract numbering** referenced by lists actually present, so it's less fragile than hard-coding IDs.
- If you also need consistent indents, add/update w:pPr/w:ind under each w:lvl.
- Validate the output by opening in Word once (Word will normalize the package).
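Before opening the file in Word, a cheap automated check is that every XML part of the package still parses. The sketch below does only that, using the standard library; it does not validate against the OOXML schema (element order, required attributes), and the name `malformed_parts` is made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def malformed_parts(docx):
    """Return the names of XML parts in a .docx package that fail to parse."""
    bad = []
    with zipfile.ZipFile(docx) as zf:
        for name in zf.namelist():
            if name.endswith((".xml", ".rels")):
                try:
                    ET.fromstring(zf.read(name))
                except ET.ParseError:
                    bad.append(name)
    return bad


# Demo on an in-memory package with one deliberately broken part
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", "<doc><p/></doc>")
    zf.writestr("word/numbering.xml", "<numbering><lvl></numbering>")
print(malformed_parts(buf))  # ['word/numbering.xml']
```

An empty result only means the parts are well-formed XML; a final look in Word is still worthwhile.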

## Practical decision rule

- Need perfect, repeatable list styling? Use officedown/officer to insert lists bound to a named List Style in the template.
- OK with just spacing/indent tweaks? Tune List Paragraph in reference_docx.
- Locked into Markdown lists but need exact bullets/numbering? Use an XML post-processor (like the Python outline above), and add tests to ensure idempotence across Pandoc/officedown upgrades.

If you want, I can sketch a small R helper that runs after render: detect lists with docxtractr/officer, call the Python patcher, and drop the fixed file back in place. This keeps your R Markdown workflow intact while giving you the precision of OOXML only when necessary.

chuxinyuan

nan.xiao, thank you very much.

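On the first question: I counted 27 characters per line, and size 3 (三号) 仿宋_GB2312 is 16 pt. I reworked the code along the lines you suggested and the result looks quite good. The revised code: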

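  # Number of spaces in front of the date of issue
  # (widths are in points; dividing by 8 treats one space as 8 pt wide)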
  date_signature_width = systemfonts::string_width(
    strings = date_signature, 
    family = "FangSong_GB2312",
    size = 16
  )
  date_signature_space = (27 * 16 - 6 * 16 - date_signature_width) / 8
  
  # Number of spaces in front of the issuing authority
  signature_width = systemfonts::string_width(
    strings = signature, 
    family = "FangSong_GB2312",
    size = 16
  )
  signature_space = floor(
    date_signature_space + 
    ((date_signature_width - signature_width) / 2) / 8
  )
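The arithmetic above can be checked by hand with a small Python port. This is only a sketch: instead of querying the font it assumes that full-width characters are 16 pt wide and half-width characters 8 pt wide, which may differ slightly from what `systemfonts::string_width()` reports for the real font. The helper `width_pt` is hypothetical.

```python
import math
import unicodedata


def width_pt(text, size=16):
    # Assumption: full-width characters are one em wide, all others half an em
    return sum(size if unicodedata.east_asian_width(c) in ("W", "F") else size / 2
               for c in text)


FONT_SIZE = 16          # size 3 (三号) is 16 pt
CHARS_PER_LINE = 27
SPACE = FONT_SIZE / 2   # assumed width of one half-width space

signature = "发文机关"
date_signature = "2025年9月9日"

# Same formulas as the R code, with the estimated widths plugged in
date_w = width_pt(date_signature)
date_space = (CHARS_PER_LINE * FONT_SIZE - 6 * FONT_SIZE - date_w) / SPACE
sig_w = width_pt(signature)
sig_space = math.floor(date_space + ((date_w - sig_w) / 2) / SPACE)

# Centres of both strings, measured from the left margin
date_centre = date_space * SPACE + date_w / 2
sig_centre = sig_space * SPACE + sig_w / 2
print(date_space, sig_space, date_centre, sig_centre)  # 30.0 32 288.0 288.0
```

With these inputs both centres land at 288 pt, so the issuing authority sits centred over the date. When the width difference is not a multiple of 16 pt, `floor` leaves an offset of less than one space.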

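On the second question: judging from the officedown demo, list styles from the template can apparently be applied correctly, so the problem is probably in my template settings. The official demo: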

dir <- system.file(package = "officedown", "examples", "bookdown")
file.copy(dir, getwd(), recursive = TRUE, overwrite = TRUE)
rmarkdown::render_site("bookdown")
browseURL("bookdown/_book/bookdown.docx")

See: https://ardata-fr.github.io/officeverse/officedown-for-word.html