Linguistics Manifesto: PreBabel --- the universal & perfect language

{Go to, let us go down, and there confound their language, that they may not understand one another’s speech. So the LORD scattered them abroad from thence, upon the face of all the earth: and they left off to build the City. Therefore is the name of it called Babel, because the LORD did there confound the language of all the earth: and from thence did the LORD scatter them abroad upon the face of all the earth. (Genesis, chapter 11: 7 to 9)}

God did it, says the Bible.

The longing for a universal language has been a dream of mankind since antiquity, as the Biblical story of Babel shows. In human history, many languages (such as Greek, Latin, Arabic, or English) have claimed to be a universal language on the strength of political or economic supremacy, but only for a limited period (a few hundred years) and mainly within the area their political power could reach. Nonetheless, a few languages have served as trans-national and trans-racial literary languages for millennia, such as the Chinese written language in China, Vietnam, Korea, and Japan. However, there are at least two difficulties for any natural language to become a true universal language.

First, no natural language is easy. Fewer than 15% of people truly master their mother tongue to a scholarly level, and, in general, learning another natural language as a second language is about 10 times harder than learning the mother tongue. Thus, even if we all politically accepted one particular natural language (such as English) as the lingua franca, the illiteracy rate in that language would still be higher than 85% worldwide.

Second, just as all de facto world languages owe their status to historical political supremacy, proposing any given natural language as the universal language has strong political implications, and the major world powers will never agree to it. Thus, the best hope for a universal language, if one is possible at all, lies in choosing a politically insignificant language or a constructed one, such as Esperanto.

The above analysis shows that every lingua franca, past or present, is a product of political power, not a true universal language in the linguistic sense.

Given these realities, a universal language, if there is to be one, must be:

a second language for all people, and
a constructed language.

Then, we must answer the following questions.

1. Can a constructed language have the same scope as a natural language?

2. Can a small set of root words (humanly readable, not machine codes) be found to encode the entire vocabulary of a natural language?

3. What is the minimum number of root words needed for such an encoding?

First things first: can question 1 be answered, at least in principle? The answer is a big yes.

Every kind of encryption constructs a new language out of a natural language. One of the simplest encryptions for English moves the first letter of every word to the end. The resulting encrypted vocabulary is, of course, a constructed language, and it is identical to the old language in scope. Thus, finding a set of symbols to encode all English words is possible in principle.
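To make the toy cipher concrete, here is a minimal Python sketch, assuming words are separated by whitespace and ignoring punctuation; the function names are illustrative only. Because moving the first letter to the end can always be undone, every English word maps to exactly one encrypted word and back, which is why the encrypted vocabulary has the same scope as English while adding no new structure.

```python
def encode_word(word: str) -> str:
    """Move the first letter of a word to the end (the toy cipher in the text)."""
    return word[1:] + word[:1]


def decode_word(token: str) -> str:
    """Undo encode_word by moving the last letter back to the front."""
    return token[-1:] + token[:-1]


def encode_text(text: str) -> str:
    # Assumption: words are whitespace-separated; punctuation is not handled.
    return " ".join(encode_word(w) for w in text.split())


def decode_text(text: str) -> str:
    return " ".join(decode_word(t) for t in text.split())


if __name__ == "__main__":
    sentence = "the lord did confound the language"
    secret = encode_text(sentence)
    print(secret)  # het ordl idd onfoundc het anguagel
    # Round trip: nothing is gained or lost, so the scope is identical.
    assert decode_text(secret) == sentence
```

The round trip is the whole point: the mapping is a one-to-one relabeling of words, so it reproduces English exactly but reveals nothing new about it.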

However, this newly encrypted English offers zero linguistic gain. Thus, the key issue is question 2: can we find an axiomatic set, with a finite number of members and rules, that can regenerate a natural language in its entirety and can easily be read by humans (not just machines)?

This book tries to show that a PreBabel universal language is indeed a reality. In this preface, I will go over the history of how this PreBabel discovery developed.

In the early 1990s, computer scientists were searching for a universal programming language that could run on all computers regardless of their underlying architectures. The solution was Java, with its Java virtual machine, developed by Sun Microsystems.

At that time, my reaction was: can we also construct a universal natural language?

I immediately came up with the following criteria for this universal (natural) language (the u-language):

1. The theoretical definition -- a universal language (u-language) must be able to "re-produce" every natural language in existence. Here, "re-produce" does not mean translation. It means that the entire language system (vocabulary and grammar) of a selected language can be rewritten with the PreBabel codes, that is, the vocabulary of the u-language. In fact, the selected language (such as English, Japanese, etc.) must be 100% isomorphic to a subset of this u-language. If such a u-language can be constructed, then a true automatic translation machine can be built.

2. The practical constraints -- if a u-language is too difficult for an average person (not a machine) to learn, it will become a dead language right after its birth. The rule of thumb is that it must not be more difficult than any natural language learned as a second language. In fact, the design criterion should be that it is 10 times easier to learn than any natural language learned as a second language. Yet it is hard to say what "10 times" means, so we should give it a quantified criterion: it must be learnable in 100 days by a person (12 years or older) who spends 3 hours a day in serious (no playing around) study, that is, about 300 hours in total.

3. The attributes --

It is a second language for speakers of many natural languages; that is, no particular natural language is a prerequisite for learning this u-language. A u-language must be learnable without any particular natural language as its language environment. It must be learned as a body of knowledge (like chemistry or arithmetic), not as a living habit.
It has to be a mute, or silent, language (at least at the beginning) so that it can carry all natural spoken languages as its dialects.
Of course, any word token can always carry a sound. However, the pronunciation of u-language word tokens should evolve with the community that uses them. Then the spoken form of the u-language will become a truly universal spoken language.

With the above criteria, I proved two laws (in 1997):

PB Law 1: When encoded with a closed set of root words (the PreBabel root set), any arbitrary vocabulary-type language will be organized into a logically linked linear chain.

PB Theorem 0: If a closed set of root words can encode one natural language, it can encode ALL natural languages.

Note: a closed set means that the parts (radicals) of every word in a language's vocabulary contain no symbol outside the given (finite) root-word set.
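A minimal Python sketch of this closed-set test follows. It uses a handful of hypothetical toy decompositions of well-known characters, not the author's 220-root PreBabel set, which is not listed here. A vocabulary counts as closed when no word has a part outside the root set. The check is flat, so it assumes each word has already been broken down into its final parts.

```python
# Hypothetical toy root set; NOT the PreBabel root set.
ROOT_SET = {"日", "月", "木", "女", "子"}

# Hypothetical toy decompositions of a few common characters into parts.
DECOMPOSITION = {
    "明": ["日", "月"],        # bright = sun + moon
    "林": ["木", "木"],        # grove = tree + tree
    "森": ["木", "木", "木"],  # forest = three trees
    "好": ["女", "子"],        # good = woman + child
}


def foreign_parts(decomposition, root_set):
    """Return {word: [parts outside root_set]} for every word that has such parts."""
    result = {}
    for word, parts in decomposition.items():
        outside = [p for p in parts if p not in root_set]
        if outside:
            result[word] = outside
    return result


def is_closed(decomposition, root_set):
    """A vocabulary is closed over root_set if no word has a part outside it."""
    return not foreign_parts(decomposition, root_set)


if __name__ == "__main__":
    print(is_closed(DECOMPOSITION, ROOT_SET))  # True
    # 休 (rest) = person (written 亻 as a component) + tree; 人 is not in the toy root set.
    extended = {**DECOMPOSITION, "休": ["人", "木"]}
    print(foreign_parts(extended, ROOT_SET))  # {'休': ['人']}
```

Adding 休 without also adding 人 to the root set breaks closure, which is exactly the condition the note describes.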

PB Law 2: When every natural language is encoded with a universal set of root words, a true Universal Language emerges.

With these two laws, I immediately concluded that I was unable to construct such a universal natural language, for three reasons:

1. Although English has only a finite number of word tokens (letters and root words), it obviously cannot meet the above criteria.

2. I had no idea how to construct a closed set of codes (root words or radicals) to encode a (any) natural language.

3. Even if I tried to invent a universal code set, it would be a nightmare to prove or test that the set does indeed encode a (any) natural language in its entirety.

For these three reasons, I did not think that searching for a universal (natural) language was a worthwhile project.

In 2001, I was at a party where an old man (about 70 years old) talked about the evils of the simplified Chinese writing system. At that time, I had not learned anything about the simplified system and was in no position to comment. Furthermore, by then I had not used (read or written) the traditional Chinese writing system for 30 years; that is, I could not even write a simple Chinese sentence without wondering how to write this or that character (even a mother tongue can be forgotten). Coming home from the party, I asked my father (a professor of Chinese literature at Taiwan Central University) about the evils of the simplified system. He gave me two books {康熙字典 (Kangxi Dictionary) and 說文解字 (Shuowen Jiezi)} and said: study these two books and you will know the answer.

Both are dictionaries. Read dictionaries? Yes, I did.

康熙字典 (Kangxi Dictionary) is organized by 部首 (radicals), but it describes each word in terms of its pronunciation. In Chinese, each word can have multiple pronunciations (heteronyms): for a word W, when it is pronounced X, it means A; when it is pronounced Y, it means B; and so on.

So 康熙字典 is all about a word's pronunciations, which determine its meanings and usages.

As a dictionary, there is no right or wrong issue for 康熙字典.

Note: while homographs/heteronyms are exceptions in English, they are the rule in Chinese; that is, each and every Chinese word is a homograph/heteronym.

On the other hand, 說文解字 (Shuowen Jiezi) is all about the STRUCTURE of words (how they are composed of radicals and parts), based on a set of 540 radicals. That is, the meaning of a word derives from those radicals, while the sound of the word is given without any theoretical explanation. Although it describes the 六書 (six ways of constructing Chinese words), namely 象形 (pictographs), 指事 (indicatives), 會意 (compound ideographs), 形聲 (phono-semantic compounds), 轉注 (synonymous derivation), and 假借 (phonetic borrowing), 90% of the words in the book (about 9,000) are classified as 象形. Thus, historically, the Chinese writing system has been described as a pictographic system.

Clearly, these two dictionaries describe the Chinese character system along two completely different pathways. From this inconsistency, I developed the "New Chinese Etymology," with three results:

One, all Chinese written words (about 60,000 today) can be constructed from a finite set of 220 root words.

Two, the meaning of each and every Chinese written word can be read from its face (by decoding its component radicals).

Three, the sound (pronunciation) of each and every Chinese written word can be read from its face as well.

With these findings, I published {Chinese Word Roots and Grammar; US copyrighted on May 5, 2006, TX 6-514-465}. This book was written in Chinese.

On January 16, 2008, I published {Chinese Etymology; US TX 6-917-909}. This book is a textbook (in English) for foreigners (such as Americans) to learn Chinese via this new system.

On May 24, 2012, I published {Chinese Etymology Workbook One; US TX 7-539-827}. This is a workbook for the above textbook.

It took me three years (from 2002 to 2005) to read the two dictionaries, and another three years (from 2005 to 2008) to write two books (one in Chinese and one in English) on this new Chinese etymology. During those years, I worked on Chinese etymology every day without thinking about anything else.

One day in September 2008, I made a statement: the entire Chinese written language (one of the natural languages) can be encoded with a finite set of radicals. Then lightning struck: what about my u-language laws of 1997?

I had now found a closed set of codes that can encode the entire Chinese written language; therefore, by my PB Law 1 and Theorem 0, this set should be able to encode all natural languages.

In addition to constructing a u-language via my u-language theorem (1997) plus the new Chinese etymology (which encodes the entire Chinese language), I developed a u-language theoretically via the Martian Language Thesis (MLT): any human language can always establish communication with Martian or Martian-like languages. Thus, the Martian Language Thesis is the first principle of linguistics. It has the following attributes:

Permanent confinement -- no language (Martian or otherwise) can escape from it.

Infinite flexibility -- it can encompass any kind of language structure.

This MLT is based on the following two principles:

Universal principle I -- all languages (human or Martian) share the identical metalanguage.

Universal principle II -- all language structures are subsets of a universal language structure.

What is the meta-language then?

Meta-language consists of four parts:

One: the continent of universal laws (physics, math, etc.): all universal events are described by the universal laws.

Two: the continent of universal consciousness (meaning): human consciousness views the universal laws in an identical way, obtaining the identical MEANING for all universal laws.

Three: there is a Grand Canyon between these two continents.

Four: Human natural languages are different symbol systems for connecting these two universal continents.

Thus, the universal language must have the following three attributes:

A. Forming words --- using a finite number of symbols to form an unlimited number of words, while the meaning and the pronunciation of each word can be read from its face.

B. Unique meaning of each word --- every word carries a "unique" meaning, not multiple meanings.

C. Universal grammar --- a grammar that is the mother of all grammars.
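Attribute A can be illustrated with simple counting. The Python sketch below assumes, purely for illustration, that a word is an ordered sequence of roots drawn from a set of 220 (the size of the root set cited earlier); real characters arrange their parts in two dimensions, and most possible sequences are not actual words. Even so, the number of possible forms grows without bound as the allowed length increases, which is how a finite set of symbols can supply an unlimited vocabulary. The function name is hypothetical.

```python
def count_sequences(num_roots: int, max_len: int) -> int:
    """Number of distinct ordered root sequences of length 1..max_len.

    Assumption (illustrative only): a word is a linear sequence of roots,
    and any root may appear in any position, including repeats.
    """
    return sum(num_roots ** k for k in range(1, max_len + 1))


if __name__ == "__main__":
    # 220 roots, as in the count of root words given in the text.
    for length in range(1, 5):
        print(length, count_sequences(220, length))
```

Already at three roots per word the count exceeds ten million possible forms, far more than the roughly 60,000 Chinese characters mentioned above.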

To address these issues, I launched a new website {http://www.prebabel.info/ } in June 2009. On October 12, 2010, I published {Linguistics Manifesto --- Universal Language & The super Unified Linguistic Theory; US TX 7-290-840}. The issue of the two continents is briefly discussed in Chapter Twelve of this book. For the details of the universal grammar, I published {The Great Vindications; US copyright TX 7-667-010} on January 23, 2013.

The key emphasis of this book is the issue of the perfect language: is the u-language also the PERFECT language?

What is the perfect language?

A perfect language should have three attributes:

One, it has only a finite number of tokens for constructing an unlimited number of words (vocabulary).

Two, the pronunciation of a word (character) can be read from its face.

Three, the meaning of a word (character) can be read from its face.

Of course, a perfect language might not be a universal language. Although the universal language issue was addressed in detail in my previous two books, I will nonetheless revisit it repeatedly in this book.

English scores 220 points out of a maximum of 300: 100 for attribute one, as it has only 26 letters; 100 for attribute two, as almost every word can be pronounced from its face; and 20 for attribute three, as only the meanings of words with roots/prefixes/suffixes can be guessed.
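The scoring is simple arithmetic. Here is a minimal Python sketch that reproduces the English figure, assuming each attribute is capped at 100 points (as the English breakdown implies). The attribute labels paraphrase the three attributes above, and the function name is hypothetical.

```python
MAX_PER_ATTRIBUTE = 100

# Paraphrases of the three attributes of a perfect language.
ATTRIBUTES = ("finite tokens", "sound readable from face", "meaning readable from face")

# Maximum total under the assumed 100-point cap per attribute.
MAX_SCORE = MAX_PER_ATTRIBUTE * len(ATTRIBUTES)

# Per-attribute points for English as stated in the text.
SCORES = {"English": (100, 100, 20)}


def perfection_score(points):
    """Sum per-attribute points after checking each lies in 0..MAX_PER_ATTRIBUTE."""
    if len(points) != len(ATTRIBUTES):
        raise ValueError("expected one score per attribute")
    for p in points:
        if not 0 <= p <= MAX_PER_ATTRIBUTE:
            raise ValueError(f"score {p} is outside 0..{MAX_PER_ATTRIBUTE}")
    return sum(points)


if __name__ == "__main__":
    for language, points in SCORES.items():
        print(f"{language}: {perfection_score(points)} / {MAX_SCORE}")  # English: 220 / 300
```

Under this cap, a full 300 points can only mean full marks on all three attributes.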

On the other hand, I will show that the Chinese written language is THE perfect natural language, scoring the full 300 points.

That is, I will address three linguistic issues:

One, the Chinese written language can be encoded with a closed set of radicals (roots).

Two, with my u-language theorem of 1997 plus the Martian Language Thesis, I have constructed a u-language.

Three, I have defined what "perfect" should mean.

Now, returning to the issue of the simplified Chinese system that got me started: I discovered that the simplified system was created because the May 4th Movement scholars, who pushed for abandoning the traditional Chinese written language, viewed it as the worst language in the world, as "dog turd"; see the video {https://www.youtube.com/watch?v=HjbmAlWe_Ig } and Chapter One.

I then further discovered that the Chinese government issued a language law in April 2006 prohibiting the use of any other forms (especially the traditional form) of the Chinese writing system, and planned to abandon even the simplified system by 2016 and go 100% with romanization (Pinyin). Yet, following my publication of {Chinese Etymology} also in 2006, China abandoned its romanization plan on August 30, 2017; see the news article {统编教材9月启用拼音晚学一个月 (Unified textbooks in use from September: pinyin to be taught one month later), http://www.xinhuanet.com//local/2017-08/29/c_1121559170.htm } and https://www.linkedin.com/pulse/amen-victory-entire-chinese-people-jeh-tween-gong/ . That is, I have saved the Chinese writing system single-handedly. These events are addressed in detail in Chapter One of this book.

On the surface, this book discusses the details of Chinese etymology, but that is not the point. Its key aims are to prove the reality of the universal language and of the perfect language.

In fact, you (the readers) need not know a single Chinese character to comprehend this book, as all those Chinese characters can be viewed as a set of Lego pieces. The key points of the book are the principles, laws, and theorems for organizing those Lego pieces; it is these principles, laws, and theorems that bring the universal language to life. This book simply uses Chinese etymology as one example to demonstrate them.

Of course, this book can be very helpful for anyone who is interested in learning Chinese linguistics via this new Chinese etymology. However, the base of this new Chinese etymology (220 word roots and 300 sound modules) is not provided in its entirety in this book. If you (the readers) want to learn Chinese writing system via this new Chinese etymology, you must use the textbook {Chinese Etymology; US TX 6-917-909}.

This book is, in fact, a thread that sews together all my previous books on the following issues:

One, the theory of universal language.

Two, the definition of perfect language.

Three, the actual construction of a u-language and the proof of a perfect language.

Four, the great historical event of saving humanity's perfect language from disastrous destruction.

In Chapters One through Twelve, I use Chinese etymology as an example to demonstrate the theory of universal language and to provide a real example of a perfect language. Chapter Thirteen, however, recaps all of the PreBabel principles and laws while also providing a real model of a PreBabel language.

Thus, this book is for linguists, so that they can witness the evidence of a PERFECT language system and the reality of the universal language.

In addition to this book, you (the readers) are encouraged to read the following books.

One, Linguistics Manifesto --- Universal Language & The super Unified Linguistic Theory; written in English, US copyright TX 7-290-840.

Two, The Great Vindications; written in English and Chinese, US copyright TX 7-667-010.

Three, Chinese Etymology; written in English, US copyright TX 6-917-909.

Four, Bible of China Studies & new Political Science; written in English, US copyright TX 8-685-690.

Five, 中文的字根與文法: 天馬行空的漢語 (Chinese Word Roots and Grammar); written in Chinese, US copyright TX 6-514-465.

Some information about these books is available in the Appendix of this book.

Wednesday, October 8, 2025
