👨🎓 About the Author
I completed my undergraduate degree in the Department of Chinese Language & Literature at the Chinese University of Hong Kong (CUHK) from 2015 to 2019. While there, I devoted most of my studies to classical Chinese philology and palaeography, with an emphasis on dynastic bibliographies and comparisons of manuscript versions. My final-year dissertation was supervised by Prof. Shen Pei. I spent the following year, 2019-2020, at Edinburgh University, UK. My graduate dissertation, supervised by Prof. Joachim Gentz, focused on the foundational research and collation of the National Museum of Scotland (NMS) collection of 1,777 oracle bones in Edinburgh. In 2022, I was admitted to Cambridge University to conduct further research on the Lionel Charles Hopkins collection of oracle bones held at Cambridge University Library, under the supervision of Prof. Roel Sterckx. The website launched here forms part of my ongoing PhD research.
A great deal of data gathering and coding took place between the initial idea and the launch of the current website. Since the summer of 2020, I have taught myself several programming languages, including Python, C, C++, MATLAB and JavaScript. I built this website because the field of oracle bone research seriously lacks standardized font databases and transcription databases, which makes searching for a given graph or for the relevant decipherment literature quite miserable, to say the least. For the longest time, even the simple task of typing oracle bone characters in Microsoft Word could not be achieved, and the field has remained largely static in terms of digitization: thumbnail images must be used to display graphs, and every single rubbing has to be leafed through by hand to build even a basic corpus of the materials needed. Compared with classical Chinese philology, which has many excellent digital text databases such as "Zhongguo jingdian gujiku", "Erudition Database" and "Shutongwen Database" that aid various aspects of historical research, digital research on oracle bones remains at a stage that can only be described as primitive.
Products such as the Chant Database from CUHK, which supports over 100,000 entries of oracle bone transcriptions, do exist. However, the digital content published at that time is now quite outdated, and databases of this kind are very hard to update, largely because of the fixed nature of their character encodings and modern-archaic mappings. These cannot handle characters that have no modern equivalent, multiple modern equivalents, or uncoded modern equivalents, nor can such text databases respond in real time to constantly evolving research in character decipherment. As a result, they lose their usefulness for precisely filtering relevant characters and linguistic materials, and researchers ultimately spend a great deal of manual effort consulting physical books, making back-and-forth visual comparisons and memorizing forms. This creates a very steep learning curve for researchers who are new to the field, since they must be very familiar with various handbooks, character compilations and concordances before they can do any efficient and informed research. The unnecessary technicality and tedium of this work impedes other essential areas of study, such as general history, linguistics and education, and discourages scholars from other fields from engaging in active discussion of oracle bones.
It is not difficult to see that the main obstacle to the development of oracle bone research is not only the difficulty of the academic research itself, but also the lack of technical means. The lack of progress in oracle bone character databases, vocabulary databases, transcription databases, OCR, NLP, etc. has the same fundamental cause: there is no sound encoding solution for oracle bone characters. In current oracle bone research, the vast majority of cases are ones where one oracle bone character corresponds to multiple modern equivalents, multiple modern characters correspond to one oracle bone character, multiple modern characters correspond to multiple oracle bone characters, or the oracle bone character has no modern equivalent at all. In addition, oracle bone graphs may be merged with, separated from, or cancelled against other "subgraphs" (variant forms), and the related modern equivalents may not exist in the CJK character set, which ultimately makes it difficult to establish a long-term unified character database and a stable transcription database. The construction of a full character library that can be extended in a reasonable and efficient manner is thus of great academic significance. For this reason, since 2021 I have focused on researching and developing font generation and automatic tracing technology for oracle bone characters. After a long period of repeated experiments, I devised a process that converts hand-drawn lines and strokes into high-definition vector characters, and I persisted in drawing 200-500 oracle bone characters daily until I had collected all the characters included in Prof. Li Zongkun's "Oracle Bone Character Compilation".
After that, I went through a long period of character corrections, tracing corrections, supplementary collection and mapping to modern Unicode characters, and finally completed this version of the oracle bone character library/database/font, which includes 50,000+ characters, in order to provide a standardized solution for the digital processing of oracle bone characters. Producing this character database is equivalent to completing an electronic character compilation, but unlike traditional printed compilations, the character shapes and modern character mappings in this library can be updated and revised at any time, making it far more flexible and extensible than any physical compilation.
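To make the encoding problem concrete, here is a minimal Python sketch of a revisable mapping table. It is not the platform's actual data model: the class names, the `OB-…` identifiers and the placeholder "modern" values `A`, `B`, `C` are all hypothetical and do not represent real decipherments. The assumption being illustrated is that each graph is keyed by a stable ID of its own, so a graph may have zero, one or several modern equivalents, and a mapping can be revised without re-encoding anything.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GlyphEntry:
    """One oracle bone graph, keyed by a stable ID (hypothetical scheme)."""
    glyph_id: str
    # Zero, one, or many modern equivalents; empty means undeciphered.
    modern: List[str] = field(default_factory=list)
    # ID of the parent graph if this entry is treated as a variant form.
    variant_of: Optional[str] = None


class GlyphTable:
    """In-memory table supporting many-to-many glyph/modern lookups."""

    def __init__(self) -> None:
        self._entries: Dict[str, GlyphEntry] = {}

    def add(self, entry: GlyphEntry) -> None:
        self._entries[entry.glyph_id] = entry

    def set_modern(self, glyph_id: str, modern: List[str]) -> None:
        # Revise a decipherment; the glyph ID itself never changes.
        self._entries[glyph_id].modern = list(modern)

    def find_by_modern(self, char: str) -> List[str]:
        # All graphs currently mapped to the given modern character.
        return sorted(e.glyph_id for e in self._entries.values() if char in e.modern)

    def undeciphered(self) -> List[str]:
        # Graphs with no modern equivalent at all.
        return sorted(e.glyph_id for e in self._entries.values() if not e.modern)


table = GlyphTable()
table.add(GlyphEntry("OB-1", ["A", "B"]))  # one graph, two modern equivalents
table.add(GlyphEntry("OB-2", ["A"]))       # a second graph sharing "A"
table.add(GlyphEntry("OB-3"))              # no modern equivalent yet
print(table.find_by_modern("A"))
print(table.undeciphered())
```

Because lookups go through the table rather than through a fixed code point, a new decipherment is a single `set_modern` call, which is the kind of flexibility a printed compilation cannot offer.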
This website was established to provide a platform for downloading and searching the Jingyuan Oracle Bone Font, together with some related auxiliary tools and visualization modules. More research results will be added gradually in the future. In doing so, I hope to offer some new methods and perspectives for oracle bone research and to contribute to its development.
📚 About the Website
The Jingyuan Oracle Bone Digital Platform is named after Yu Jiaxi's saying "distinguishing the academic, examining the source (镜原 jingyuan)". The website is designed as a digital humanities project for oracle bone research. It aims to systematically digitize, analyze and study some of the technical and difficult problems in oracle bone research through digital means, including computer vision (CV), natural language processing (NLP), deep learning and full-stack web technologies. The entire project is divided into four phases. The current stage focuses mainly on the oracle bone font, the oracle bone character database, the oracle bone input method, etc., and some related visualization pages have been added. More functions and the integration of academic resources will be introduced in the future, including:
- Phase One: Oracle Bone Font, Oracle Bone Character Database, Oracle Bone Smart IDE text editor, Oracle Bone Visualization Tools, etc. This part is the currently released content. It mainly aims to provide downloads of the related fonts, quick searches across all oracle bone characters, quick input and typing methods, auto-completion of linguistic entries, and visualization of character types and the typological evolution timeline.
- Phase Two: Direct mapping between oracle bone decipherment literature and characters, detailed character explanations, annotations of proper nouns, and automatic OCR recognition of oracle bone texts. This part will be the primary focus of the next stage of development. Its main goals are to realize direct correspondence between previous oracle bone literature and characters, to add detailed character explanations and annotations for people, places, clans, officials, events and objects, and to train and release large deep-learning-based models for oracle bone OCR. It is expected to be completed within the next year.
- Phase Three: Establishment of a large, unified and stable database of oracle bone texts/transcriptions. This part uses the currently published oracle bone fonts, query methods and IDE to input and integrate the main oracle bone texts, establishing a holistic database that supports filtering on all oracle bone characters, reflects the latest academic research, and offers multi-functional text retrieval. This task is expected to be completed within the next two years.
- Phase Four: Establishment of a large database of oracle bone rejoining (fragment reassembly) and a multi-functional query interface. Previous websites related to oracle bone rejoining, such as the Yin Qiwenyuan rejoining database and Fudan University's Combination of Jade and Pearls, mainly focus on collecting and querying fragments that have already been joined; for pieces that are still unjoined but could be joined with computer assistance, there is currently no implementable solution. Many institutions and individuals have published rejoining algorithms and AI models, but these generally apply only to smaller datasets and rely heavily on extensive manual labeling. For the current 150,000 oracle bones, applying computer-aided rejoining at a scale that truly counts as an "application" means comparing every fragment with every other fragment, which amounts to about 11.25 billion pairwise comparisons (150,000 × 149,999 / 2). If a single comparison takes 100 ms, which is relatively fast for a complex image processing algorithm, the full run would still take about 35.7 years on a single process, and storing the results would occupy thousands or tens of thousands of TB. A project like this therefore requires not only some manual labeling to restrict the candidate combinations (by category, material annotation, etc.), but also large-scale computer clusters and distributed computing, with extensive parallel computing and storage units. This stage of the project is expected to be completed in the next three years.
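The Phase Four estimate can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes each unordered pair of fragments is compared exactly once and uses a 365-day year, and the function name `pairwise_cost` is illustrative. Under those assumptions, 150,000 fragments give about 11.25 billion comparisons, which is the count consistent with the 35.7-year figure at 100 ms per comparison.

```python
from math import comb

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year


def pairwise_cost(n_fragments: int, seconds_per_comparison: float):
    """Return (number of unordered pairs, single-process runtime in years)."""
    pairs = comb(n_fragments, 2)  # n * (n - 1) / 2
    total_seconds = pairs * seconds_per_comparison
    return pairs, total_seconds / SECONDS_PER_YEAR


pairs, years = pairwise_cost(150_000, 0.1)
print(f"{pairs:,} comparisons, about {years:.1f} years on one process")
# prints: 11,249,925,000 comparisons, about 35.7 years on one process
```

Since the pair count grows quadratically, restricting comparisons to fragments sharing a label (category, material, etc.) shrinks the workload far more than a faster single comparison would.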
Realizing these functions requires a great deal of specialized computing expertise. Taking the first two phases as an example, completing the work requires, in terms of programming languages, Python, C++, MATLAB, JavaScript/TypeScript, HTML/CSS/SCSS/Sass, SQL, etc.; in terms of graphics processing, Python OpenCV, NumPy, SciPy, Matplotlib, Pillow, PyTorch, etc.; and in terms of full-stack web technology, Django, Node.js, Vue, Vite, Nuxt, Tailwind CSS, ECharts, GSAP, etc., which make up the project's current frameworks. Learning these languages and frameworks is a huge challenge for a student who did not major in computer science, which is why I have spent so much time and effort over the past few years just to complete the first phase, a relatively small part of the whole project. The large amount of code and the number of frameworks involved unfortunately also mean that technical problems and functional defects in current and future website modules are inevitable. I hope that users will be understanding and supportive, and I welcome comments and suggestions on the future of the website.
Of course, the launch of this website is not the product of one person's efforts alone. In the later stages of the project, I received support and help from many institutions and individuals at home and abroad, which made the launch of the website possible. In particular, I would like to thank the Oracle Bone Digital Laboratory at Anyang Normal University and the Pre-Qin History Research Department of the Chinese Academy of Social Sciences Institute, which gave more than generous support for the launch and provided many practical resources and suggestions. Special thanks go to:
- Dean Liu Yongge, Li Bang, Qiao Yanqun of Anyang Normal University
- He Xiaoping, Dong Gaoshan, Gao Yue, Li Qian and other students of Anyang Normal University
- Prof. Song Zhenhao, Dr. Zhi Xiaona, Dr. Sun Yabing of the Pre-Qin History Research Department of the Chinese Academy of Social Sciences Institute
- Ph.D. student Wang Mengwei of the University of Chinese Academy of Social Sciences
- Wang Chaoyang, Hui Pengyu of Tencent SSV Laboratory
- Tencent Cloud and Huawei Cloud Xuchang Supercomputing Center operations staff
- Dr. He Yan, Head of the Chinese Section at Cambridge University Library, and Prof. Roel Sterckx of the FAMES faculty at Cambridge University
📜 Update Log
This log records updates to the website's modules, mainly newly released features and bug fixes for the website's pages and components:
2024-10-15
🚀 New Features
- Released the Integrated Rubbing-Transcription Viewing Page, which can be accessed here. The page includes a total of 10,077 rubbings, 21,941 inscription sentences and 115,319 characters, and provides viewing and adjustment functions for images, as well as both archaic and modernized forms of the transcriptions for reference. The data on the page is sourced from the "Oracle Bone Multi-modal Dataset" released on July 5, 2024 by the "Digital Oracle Bone Collaboration Center", and the character transcriptions, inscription units and sentence order are generated dynamically from the labeled information in the dataset. For more information, please refer to the relevant page.
2024-10-05
🚀 New Features
- Released Jingyuan Oracle Bone Font v1.0.1.
- Added query functions for 祭名 (Sacrificial Rite) and 田猎地名 (Hunting Location) proper nouns in the font database.
- Added the Related Decipherment Literature database, and opened a Relevant Literature section under the glyph details page, providing bibliographic search and direct links to PDFs, such as the article "释甲骨文“𩂣”及相关诸字" under the glyph 𩂣. Currently, dozens of corresponding entries have been added, and more will be gradually added in the future.
- Added the Language Switch function, which supports switching the website between English and Chinese. At the moment, internationalization support has been added for part of the content, and the rest will be gradually completed in the future.
- Added the Theme Switch function, which supports switching between a few lighter themes of the website. The current theme is set to light mode by default, and the dark mode will be gradually optimized in the future.
- Added media queries for responsive web page design and completed the adaptation for mobile devices such as phones and tablets. The main site pages can now be viewed on mobile devices.
🔧 Fixes
- Fixed the bug in the Character Database page where the Component Query mode could not be accessed normally and only non-repeating components could be queried. The component query mode now supports multiple repeating components, such as 日、日、日, 水、目、口, etc.
- Improved the loading performance of the Oracle Bone Chronological Timeline page, reducing the API requests from 200 to 1 and improving the overall page loading speed.
- Improved the rendering performance of the Font Documentation page by moving the release log of version v1.0.0 to the dropdown display, reducing the initial page loading time.
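One way to think about the repeated-component query mentioned in the fixes above is as multiset containment: a glyph matches when it contains each queried component at least as many times as the query lists it. The Python sketch below illustrates that idea only; it is not the website's implementation, the function name is hypothetical, and the component lists are made-up decompositions for illustration.

```python
from collections import Counter
from typing import Iterable


def matches_components(glyph_components: Iterable[str], query: Iterable[str]) -> bool:
    """True if the glyph has every queried component with enough repetitions."""
    have = Counter(glyph_components)
    need = Counter(query)
    # A Counter returns 0 for missing keys, so absent components fail the check.
    return all(have[component] >= count for component, count in need.items())


# Hypothetical decompositions, for illustration only.
print(matches_components(["日", "日", "日"], ["日", "日"]))  # True
print(matches_components(["日", "月"], ["日", "日"]))        # False
print(matches_components(["水", "目", "口"], ["目", "水"]))  # True
```

A plain set-based check would treat 日、日、日 the same as a single 日, which is why counting occurrences matters for queries with repeated components.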