在线文档 - Google 文档的数据协议设计
Google 文档作为 G Suite 重要的产品套件之一,作为优秀的在线协作文档而经常被开发者所讨论,在 Google 文档背后,有着一整套优秀的相关架构设计支撑,数据协议设计就是其中之一,非常具有学习和研究价值。
前言
截至 2020 年,Google 旗下的 G Suite 用户量达到 20 亿,而 Google 文档作为其重要的产品套件之一,作为优秀的在线协作文档而经常被开发者所讨论,在Google文档背后,有着一整套优秀的相关架构设计支撑,数据协议设计就是其中之一,非常具有学习和研究价值,本文旨在向研发同学详细介绍 Google 文档的数据协议设计精髓。
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-747W8POL-1642041334639)(https://docs.corp.kuaishou.com/d/loadimage/5343388864566933907)]
定义
在线协作文档的内容分类
在设计一个数据协议之前,需要将一个在线协作文档所包含的全部功能进行整理:
文字
- 文字内容
- 特殊转义符号
- 换行符,制表符等等
- 人类文字
- 文字样式
- 大小
- 字体
- 加粗
- …
文档基本信息
- 背景色
- 纸张尺寸
- 纸张内外间距
插件
- 图片
- 超链接
- 评论
- 目录
- 表格
- …
在线文档的用户操作分类
在对用户操作进行分类之前,我们需要引入操作指令(command) 这个概念。简单来说,用户每次改变文档的操作,都将抽象成一次 command 发送到服务端,再由服务端将这次 command 分发给其他协作者的客户端。
在线协作文档有查看历史、协作、撤销的产品特性,故用户的每一个command都需要被原子化。我们对 Google文档的 command 进行分类(用户的每一次操作都应该可以通过这些类别的 command 组合来清晰的表达):
create 创建
- create 创建图片、列表项等实体
- insertAfter 在指定的位置后插入
- insertBefore 在指定的位置前插入
update 更新
- 更新已有的属性
delete 删除
- deleteAt 在指定的位置删除
tether 绑定
- 将某些内容和create的实体绑定起来
分析用户对 Google 文档的操作
打开一个 Google 文档,正文书写为 ”快手,拥抱每一种生活“,并对文本进行一些样式修改,例如加粗,修改文本颜色等
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-ylYGet8F-1642041334641)(https://docs.corp.kuaishou.com/d/loadimage/-1598048658281776554)]
然后,通过其提供的版本历史记录功能,我们可以获取查看一个文档的历史版本的请求接口
https://docs.google.com/document/u/0/d/${docId}/showrevision
剔除掉一些非关键信息,我们得到以下 json 结构,很明显这是一个数组,接下来我们简称这个数组为 commands 并逐个进行分析
[{"ty":"is","ibi":1,"s":"快手,拥抱每一种生活"},{"ty":"as","st":"document","si":0,"ei":0,"sm":{"ds_pw":595.4399999999999,"ds_lhs":1,"ds_ph":841.68}},{"ty":"as","st":"headings","si":0,"ei":0,"sm":{"hs_h3":{"sdef_ps":{"ps_sb_i":false,"ps_sb":16},"sdef_ts":{"ts_fgc":"#434343","ts_bd":false,"ts_fgc_i":false,"ts_bd_i":false}},"hs_t":{"sdef_ps":{"ps_sb_i":false,"ps_sa":3,"ps_sa_i":false,"ps_sb":0},"sdef_ts":{"ts_bd":false,"ts_bd_i":true,"ts_fs":26,"ts_fs_i":false}},"hs_h2":{"sdef_ps":{"ps_sa":6,"ps_sa_i":false},"sdef_ts":{"ts_bd":false,"ts_bd_i":false,"ts_fs":16,"ts_fs_i":false}},"hs_h1":{"sdef_ps":{"ps_sb_i":false,"ps_sb":20},"sdef_ts":{"ts_bd":false,"ts_bd_i":true,"ts_fs":20,"ts_fs_i":false}},"hs_nt":{"sdef_ps":{"ps_lslm":1,"ps_lslm_i":false,"ps_sm":0,"ps_sm_i":false}},"hs_st":{"sdef_ps":{"ps_sb_i":false,"ps_sa":16,"ps_sa_i":false,"ps_sb":0},"sdef_ts":{"ts_ff_i":false,"ts_it":false,"ts_fs":15,"ts_ff":"Arial","ts_it_i":false,"ts_fs_i":false}},"hs_h6":{"sdef_ps":{"ps_sb_i":false,"ps_sa":4,"ps_sa_i":false,"ps_sb":12},"sdef_ts":{"ts_fgc":"#666666","ts_fgc_i":false,"ts_it":true,"ts_bd_i":true,"ts_fs":11,"ts_it_i":false,"ts_bd":false,"ts_fs_i":false}},"hs_h5":{"sdef_ps":{"ps_sb_i":false,"ps_sa":4,"ps_sa_i":false,"ps_sb":12},"sdef_ts":{"ts_fgc":"#666666","ts_bd":false,"ts_fgc_i":false,"ts_bd_i":true}},"hs_h4":{"sdef_ps":{"ps_sb_i":false,"ps_sa":4,"ps_sa_i":false,"ps_sb":14},"sdef_ts":{"ts_fgc":"#666666","ts_bd":false,"ts_fgc_i":false,"ts_bd_i":true}}}},{"ty":"as","st":"language","si":0,"ei":0,"sm":{"lgs_l":"zh_CN"}},{"ty":"as","st":"paragraph","si":11,"ei":11,"sm":{"ps_klt_i":true,"ps_awao_i":true,"ps_sm_i":true,"ps_ls_i":true,"ps_il_i":true,"ps_ir_i":true,"ps_al_i":true,"ps_bl_i":true,"ps_sd_i":true,"ps_sb_i":true,"ps_sa_i":true,"ps_lslm_i":true,"ps_br_i":true,"ps_bbtw_i":true,"ps_kwn_i":true,"ps_bt_i":true,"ps_ifl_i":true,"ps_bb_i":true}},{"ty":"as","st":"text","si":0,"ei":11,"sm":{"ts_un":false,"ts_un_i":true,"ts_sc":false,"ts_st_i":true,"ts_bgc":null,"ts_fs_i":true,"ts_bgc_i":true,"ts_ff_i":true,"ts_bd_i":true,"ts_va_i":true,"ts_fs":11,"ts_ff":"Arial","ts_bd":false,"ts_tw":400,"ts_it_i":true,"ts_fgc":"#000000","ts_fgc_i":true,"ts_it":false,"ts_va":"nor","ts_st":false,"ts_sc_i":true}},{"ty":"as","st":"text","si":1,"ei":2,"sm":{"ts_un":true,"ts_fgc":"#00796b","ts_un_i":false,"ts_fgc_i":false,"ts_bd_i":false,"ts_st":false,"ts_bd":true,"ts_st_i":false}},{"ty":"as","st":"text","si":3,"ei":10,"sm":{"ts_fgc":"#00796b","ts_st":false,"ts_fgc_i":false,"ts_st_i":false}},{"ty":"as","st":"text","si":11,"ei":11,"sm":{"ts_fgc":"#ff9900","ts_fgc_i":false}}
]
创建字符
commands[0]
{"ty":"is","ibi":1,"s":"快手,拥抱每一种生活"}
首先 “ty” 是 “type” 的缩写, “is” 是 “insertSpacers” 的缩写,然后 “ibi” 是 “insertBeforeIndex” 的缩写, “s” 是 “spacers” 的缩写,那么这个重新理解下这个 command
{"type":"insertSpacers","insertBeforeIndex":1,"spacers":"快手,拥抱每一种生活"
}
含义:在文档字符内容索引 1 的位置前插入 “快手,拥抱每一种生活”
创建文档基本信息
command[1]
{"ty":"as","st":"document","si":0,"ei":0,"sm":{"ds_pw":595.4399999999999,"ds_lhs":1,"ds_ph":841.68}
}
我们继续分析下一个 command ,“as” 是 “applyStyle” 的缩写,“st” 是 “styleType” 的缩写, “si” 是 “startIndex” 的缩写,“ei” 是 “endIndex” 的缩写, “sm” 是 “styleMap” 的缩写,“ds_pw” 是 “documentStyle_pageWidth” 的缩写,“ds_pw” 是 “documentStyle_pageHeight” 的缩写,“ds_lhs” 是 “documentStyle_lineHeightStrategy” 的缩写。
{"type":"applyStyle","styleType":"document","startIndex":0,"endIndex":0,"styleMap":{"documentStyle_pageWidth":595.4399999999999,"documentStyle_lineHeightStrategy":1,"documentStyle_pageHeight":841.68}
}
含义:这是一个文档全局配置,描述文档的纸张宽度为 595 point,高度为 841 point,因为要兼容不同设备的尺寸。
看到这,大家可能好奇为什么我们能够一眼就看出这个协议的含义。实际上,通过 debugger 调试 google docs 压缩后的代码,能够比较快的找到线索。后面的 command 我们就直接写翻译结果了。
https://docs.google.com/static/document/client/js/3556551332-client_js_prod_kix_core__zh_cn.js
创建标题样式(默认)
commands[2]
{"type":"applyStyle","styleType":"headings","startIndex":0,"endIndex":0,"styleMap":{"headStyle_h3":{"styleDefault_paragraphStyle":{"paragraphStyle_spacingBefore_inherit":false,"paragraphStyle_spacingBefore":16},"styleDefault_textStyle":{"textStyle_foregroundColor":"#434343","textStyle_bold":false,"textStyle_foregroundColor_inherit":false,"textStyle_bold_inherit":false}},"headStyle_title":{"styleDefault_paragraphStyle":{"paragraphStyle_spacingBefore_inherit":false,"paragraphStyle_spacingAfter":3,"paragraphStyle_spacingAfter_inherit":false,"paragraphStyle_spacingBefore":0},"styleDefault_textStyle":{"textStyle_bold":false,"textStyle_bold_inherit":true,"textStyle_fontSize":26,"textStyle_fontSize_inherit":false}},"headStyle_h2":{"styleDefault_paragraphStyle":{"paragraphStyle_spacingAfter":6,"paragraphStyle_spacingAfter_inherit":false},"styleDefault_textStyle":{"textStyle_bold":false,"textStyle_bold_inherit":false,"textStyle_fontSize":16,"textStyle_fontSize_inherit":false}},"headStyle_h1":{"styleDefault_paragraphStyle":{"paragraphStyle_spacingBefore_inherit":false,"paragraphStyle_spacingBefore":20},"styleDefault_textStyle":{"textStyle_bold":false,"textStyle_bold_inherit":true,"textStyle_fontSize":20,"textStyle_fontSize_inherit":false}},"headStyle_normalText":{"styleDefault_paragraphStyle":{"paragraphStyle_lslm":1,"paragraphStyle_lslm_i":false,"paragraphStyle_spacingMode":0,"paragraphStyle_spacingMode_inherit":false}},"headStyle_subTitle":{"styleDefault_paragraphStyle":{"paragraphStyle_spacingBefore_inherit":false,"paragraphStyle_spacingAfter":16,"paragraphStyle_spacingAfter_inherit":false,"paragraphStyle_spacingBefore":0},"styleDefault_textStyle":{"textStyle_fontFamily_inherit":false,"textStyle_italic":false,"textStyle_fontSize":15,"textStyle_fontFamily":"Arial","textStyle_italic_inherit":false,"textStyle_fontSize_inherit":false}},"headStyle_h6":{"styleDefault_paragraphStyle":{"paragraphStyle_spacingBefore_inherit":false,"paragraphStyle_spacingAfter":4,"paragraphStyle_spacingAfter_inherit":false,"paragraphStyle_spacingBefore":12},"styleDefault_textStyle":{"textStyle_foregroundColor":"#666666","textStyle_foregroundColor_inherit":false,"textStyle_italic":true,"textStyle_bold_inherit":true,"textStyle_fontSize":11,"textStyle_italic_inherit":false,"textStyle_bold":false,"textStyle_fontSize_inherit":false}},"headStyle_h5":{"styleDefault_paragraphStyle":{"paragraphStyle_spacingBefore_inherit":false,"paragraphStyle_spacingAfter":4,"paragraphStyle_spacingAfter_inherit":false,"paragraphStyle_spacingBefore":12},"styleDefault_textStyle":{"textStyle_foregroundColor":"#666666","textStyle_bold":false,"textStyle_foregroundColor_inherit":false,"textStyle_bold_inherit":true}},"headStyle_h4":{"styleDefault_paragraphStyle":{"paragraphStyle_spacingBefore_inherit":false,"paragraphStyle_spacingAfter":4,"paragraphStyle_spacingAfter_inherit":false,"paragraphStyle_spacingBefore":14},"styleDefault_textStyle":{"textStyle_foregroundColor":"#666666","textStyle_bold":false,"textStyle_foregroundColor_inherit":false,"textStyle_bold_inherit":true}}}}
含义:这个 command 比较大,也比较特殊,它用来描述标题的默认样式。你可以将其理解为一个配置项,是标题、副标题等的默认样式。
创建语言配置
commands[3]
{"type":"applyStyle","styleType":"language","startIndex":0,"endexIndex":0,"styleMap":{"language_locale":"zh_CN"}}
含义:当前文档的语言
创建段落样式
commands[4]
{"type":"applyStyle","styleType":"paragraph","startIndex":11,"endIndex":11,"styleMap":{"paragraphStyle_keepLineTogether_inherit":true,"paragraphStyle_avoidWindowaAndOrphan_inherit":true,"paragraphStyle_spacingMode_inherit":true,"paragraphStyle_lineSpacing_inherit":true,"paragraphStyle_indentLeft_inherit":true,"paragraphStyle_indentRight_inherit":true,"paragraphStyle_alignment_inherit":true,"paragraphStyle_borderLeft_inherit":true,"paragraphStyle_styleDefault_inherit":true,"paragraphStyle_spacingBefore_inherit":true,"paragraphStyle_spacingAfter_inherit":true,"paragraphStyle_lineSpacing_lm_inherit":true,"paragraphStyle_borderRight_inherit":true,"paragraphStyle_borderBottomtw_inherit":true,"paragraphStyle_keepWidthNext_inherit":true,"paragraphStyle_borderTop_inherit":true,"paragraphStyle_ifl_inherit":true,"paragraphStyle_borderBottom_inherit":true}
}
含义:在以文档索引 11 位置为开始的段落样式
创建文本样式
commands[5] ~ commands[8], 我们通过 commands[5] 来介绍创建文本样式的数据结构
{"type":"applyStyle","styleType":"text","startIndex":0,"endIndex":11,"styleMap":{"textStyle_underline":false,"textStyle_underline_inherit":true,"textStyle_small_caps":false,"textStyle_strikethough_inherit":true,"textStyle_backgroundcolor":null,"textStyle_fontSize_inherit":true,"textStyle_backgroundcolor_inherit":true,"textStyle_fontFamily_inherit":true,"textStyle_bold_inherit":true,"textStyle_verticalAligment_inherit":true,"textStyle_fontSize":11,"textStyle_fontFamily":"Arial","textStyle_bold":false,"textStyle_textWeight":400,"textStyle_italic_inherit":true,"textStyle_foregroundColor":"#000000","textStyle_foregroundColor_inherit":true,"textStyle_italic":false,"textStyle_verticalAligment":"nor","textStyle_strikethough":false,"textStyle_small_caps_inherit":true}
}
**含义:**在文档内容索引0~ 11 的位置创建一个文本样式,此时我们能发现文档的层级自上到下是 :
文档→ 段落→ 文本
创建图片
这是通过 chrome 调试器查看 save 接口的请求体
https://docs.google.com/document/d/1VPHd0n4xbjMgRH8RWaG7nus7mU0s84TUztZaY4aUXwA/save
很明显,这是由多个 command 组合的,我们还是翻译一下
[{"type":"multi","multiCommands":[{"type":"insertSpacer","insertBeforeIndex":12,"spacers":"*"},{"type":"addEntity","entityType":"inline","id":"kix.mzzo7hjigmkz","entityPropertyMap":{"entity_embedded_object":{"embedded_object_marginLeft":9,"embedded_object_marginRight":9,"embedded_object_marginTop":9,"embedded_object_marginBottom":9,"embedded_object_type":0,"image_width":369.75,"image_height":272.25,"image_src":"","image_cid":"PLACEHOLDER_1d402a7d7b3610f3_0"}}},{"type":"TETHER_ENTITY","id":"kix.mzzo7hjigmkz","spaceIndex":12}]}
]
这个地方我们需要说明下插入图片(插件)操作的设计,实际这个操作是由 3 个 command 组合而成的,首先在文档内容的 12 位置插入一个特殊字符 “*”,然后给 12 位置创建一个”tether“类的command,然后这个“tether”类的command和 "addEntity"类的 command 建立链接,通过 "id: kix.mzzo7hjigmkz"建立关联
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-ERzYccp8-1642041334642)(https://docs.corp.kuaishou.com/d/loadimage/-8658999341642970419)]
创建表格
创建一个 2 行 3 列 的表格
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-s2MzucNf-1642041334642)(https://docs.corp.kuaishou.com/d/loadimage/-889372112303852054)]
还是通过 Google Docs 的 Save 接口获得创建表格的 commands
[{"type": "multi","mts": [{"type": "insertSpacers","insertBeforeIndex": 14,"spacers": "\n\u0010\u0012\u001c\n\u001c\n\u001c\n\u0012\u001c\n\u001c\n\u001c\n\u0011"},cellTextStyleCommand1,cellTextStyleCommand2,cellTextStyleCommand3,cellTextStyleCommand...]
}]
由于这个 commands 非常大,需要把对单元格创建文本样式的操作都简化为 cellTextStyleCommand1,.cellTextStyleCommand , 创建表格的创建字符内容非常特殊,是 “\n\u0010\u0012\u001c\n\u001c\n\u001c\n\u0012\u001c\n\u001c\n\u001c\n\u0011”, 我们把它换种形式去看,他就是这样的
【TableStart】\u0010【Row】\u0012【CELL】\u001c\n 【CELL】\u001c\n 【CELL】 \u001c\n【Row】\u0012【CELL】\u001c\n 【CELL】\u001c\n 【CELL】 \u001c\n【TableEnd】\u0011
显而易见, Google 文档通过转义字符来代表表格起始、行起始、单元格起始以及表格结束。对表格内进行文字输入,实际就是在这些特殊的转义字符后添加字符串。
删除字符
假如我在文档上删除 “快手,拥抱每一种生活” 中的 “每一种”
{"type":"deleteSpacers","startIndex":6,"endIndex":8
}
含义:将文档索引 6 到 8 的位置删除
用户操作的合并
实际上用户通过对一个文档不断的操作,产生越来越多的 commands ,页面重新打开,Google 文档不会直接返回全量的 commands,而是对所有操作的 commands 进行一次合并,例如
{"type":"insertSpacers","insertBeforeIndex":1,"spacers":"快手","createdTime":"2020-11-03"
}
和
{"type":"insertSpacers","insertBeforeIndex":3,"spacers":",拥抱每一种生活","createdTime":"2020-11-04"
}
最终合并
{"type":"insertSpacers","insertBeforeIndex":1,"spacers":"快手,拥抱每一种生活","createdTime":"2020-11-05"
}
总结
通过对抓取到的 command 进行翻译,我们能够理解到: Google文档设计的前后端交互协议*(command)*能够在保障涵盖对一篇文档全部内容的同时,最大限度的减轻服务端压力。通过这次对Google文档的系统分析,我们可以对在线协作文档的前后端交互协议设计(本文中的command)得出如下建议:
- 遵循“用最少的数据,涵盖最全面的编辑器展示场景”的方针
- 分粒度维护用户可编辑的每一个场景(如文字、段落、表格等)
- 每次交互协议都应该是完全可预测且可逆的
实际上 Google 文档还有诸多特性涉及数据设计,例如分页、页眉、页脚、评论、批注等等,它又是如何设计的,好奇的同学可以在评论留言,我会为大家解答。
#专栏作家#
张驰Terry,微信公众号: zhangchi_insight,SaaS 领域连续创业者,9年从业经验,高级技术专家,专注于 CRM SaaS 和 Productivy SaaS。
本文为原创发布,未经许可,禁止转载。