Charles Goldfarb: The Markup Pioneer Who Revolutionized Digital Documents

In the history of computing, a handful of pioneers have changed the trajectory of the field and transformed entire industries along the way. Charles Goldfarb is one of them. Over a career spanning five decades, much of it at IBM, Goldfarb introduced the concept of generalized markup languages, paving the way for the World Wide Web and the digital content explosion. But his journey from lawyer to "father of markup" was anything but predictable.

The Accidental Technologist

Charles Goldfarb never set out to change the world of computing. After graduating from Harvard Law School in 1964 and practicing law in Boston, Goldfarb joined IBM in 1967. He quickly found himself drawn to the company's work on computers and information processing. When the opportunity arose to transfer to IBM's Cambridge Scientific Center and work on a primitive document management system, Goldfarb jumped at the chance.

It was there, in 1969, that Goldfarb had his eureka moment. He realized that for a document management system to work across different applications, there needed to be a standard way to represent the structure and meaning of the content, independent of its formatting. Together with fellow researchers Ed Mosher and Ray Lorie, Goldfarb developed the first markup language, dubbed the Generalized Markup Language or GML.

GML introduced the basic concepts that would come to define markup languages:

  • Wrapping content elements in descriptive tags, like <name> or <address>
  • Using start and end tags to define element boundaries
  • Focusing tags on semantic structure rather than visual formatting
  • Creating a markup syntax that was human-friendly and machine-readable
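
The principles above can be illustrated with a minimal Python sketch. It makes several assumptions: XML-style syntax stands in for GML (whose own tag syntax differed), and the document, the tag names, and the helper functions `extract_fields`, `render_plain`, and `render_label` are all hypothetical. The point is only that tags describing what the content is, not how it looks, let one source feed several presentations.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tags say what each piece of content IS,
# not how it should be displayed. XML syntax is a stand-in for GML here.
DOC = """
<contact>
  <name>Ada Lovelace</name>
  <address>12 St James's Square, London</address>
</contact>
"""


def extract_fields(source: str) -> dict[str, str]:
    """Map each child element's tag name to its trimmed text content."""
    root = ET.fromstring(source.strip())
    return {child.tag: (child.text or "").strip() for child in root}


def render_plain(fields: dict[str, str]) -> str:
    """One presentation: a 'Label: value' listing."""
    return "\n".join(f"{tag.title()}: {text}" for tag, text in fields.items())


def render_label(fields: dict[str, str]) -> str:
    """A second presentation of the same content: a mailing label."""
    return f"{fields['name'].upper()}\n{fields['address']}"


fields = extract_fields(DOC)
print(render_plain(fields))
print(render_label(fields))
```

Neither rendering function needs to know about the other, and the source document never changes. That is the separation of structure from formatting in miniature.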

Goldfarb and his colleagues also made GML extensible, allowing users to define their own tags and document types. This proved to be a key to GML's popularity within IBM. "Customizers quickly discovered that they could create new tag sets for new kinds of documents," Goldfarb recalled. "GML was being used for all kinds of applications I had never dreamed of."

Developing a Standard for Structured Documents

By the mid-1970s, as word of GML spread, Goldfarb recognized the need to standardize the language. In 1974, he convened a committee to develop what would become the Standard Generalized Markup Language, or SGML. Over the next 12 years, the committee (which Goldfarb chaired) worked to formalize GML's ad hoc features and add powerful new capabilities.

The SGML standard, published in 1986, retained GML's basic tag structure but added:

  • Document Type Definitions (DTDs) – schemas that define a document's element types, attributes, and structure. This allowed SGML to be used for a wide range of document types with rigorous, machine-checkable rules.

  • A formal grammar – SGML introduced a standardized, unambiguous syntax for defining markup languages using the Backus-Naur Form (BNF) meta-syntax notation. This made SGML markup and DTDs interoperable across systems.

  • Link processing – SGML could express links and cross-references within and between documents, foreshadowing hypertext and the Web.

  • Concurrent document types – SGML documents could contain multiple DTDs, allowing for compound documents with different content types like text and graphics.

But perhaps the most groundbreaking innovation in SGML was Goldfarb's concept of the validating parser. An SGML parser could read a document's markup and content, check it against the provided DTD, and flag any errors in tagging or structure. This allowed for automated processing of SGML documents at scale with predictable, reliable results.

In Goldfarb's view, the parser was the key to unlocking the full potential of descriptive markup. "Once you have a validating parser, you can write other applications that count on it to monitor the input for them," he explained. "They don't have to check for every possible error themselves; the markup takes care of that."
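
To make the idea of validation concrete, here is a toy sketch in Python. It is not an SGML parser and does not read real DTDs. It assumes XML-style syntax, and the `RULES` table and `validate` function are hypothetical stand-ins: each element is declared with the exact sequence of child elements it must contain, which is far simpler than an actual DTD content model.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules playing the role of a DTD: each declared element maps
# to the exact sequence of child elements it must contain.
# An empty tuple means "text only, no child elements".
RULES = {
    "memo": ("to", "from", "body"),
    "to": (),
    "from": (),
    "body": (),
}


def validate(source: str) -> list[str]:
    """Return a list of error messages; an empty list means the document passed."""
    try:
        root = ET.fromstring(source)
    except ET.ParseError as exc:
        # Tagging errors such as a missing end tag are caught here.
        return [f"not well-formed: {exc}"]

    errors = []
    stack = [root]
    while stack:
        elem = stack.pop()
        if elem.tag not in RULES:
            errors.append(f"undeclared element <{elem.tag}>")
            continue
        expected = RULES[elem.tag]
        actual = tuple(child.tag for child in elem)
        if actual != expected:
            errors.append(f"<{elem.tag}> contains {actual}, expected {expected}")
        stack.extend(elem)  # check the children as well
    return errors


good = "<memo><to>Ed</to><from>Charles</from><body>Draft is ready.</body></memo>"
bad = "<memo><to>Ed</to><body>Draft is ready.</body></memo>"
print(validate(good))
print(validate(bad))
```

An application that only ever receives documents for which `validate` returns an empty list can assume every memo has exactly one recipient, one sender, and one body, without re-checking. That is the division of labor the quote describes.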

Rise of the Machines (and the Humans)

SGML's powerful features quickly made it the lingua franca for publishing systems and technical documentation, especially in complex domains like aerospace, defense, manufacturing and government. For example, the U.S. Department of Defense standardized on SGML for its technical manuals, allowing for interoperable documents across different agencies and contractors.

But Goldfarb always intended for SGML to be useful for individual authors as well as large organizations. "I thought the great power of generic markup was that it could be used by document designers to create any kind of markup language," he said. "The problem was they had to be fairly sophisticated to design something that really worked well."

This vision of "markup for the masses" was realized in a big way with the rise of HTML and the World Wide Web in the 1990s. HTML was directly based on SGML, adopting its tag structure and basic syntax. But HTML streamlined the SGML feature set and provided a fixed vocabulary of tags suited for simple hypertext documents. Suddenly, anyone could easily create and share content on this new global medium, the Web.

The impact was staggering. As Goldfarb put it: "People often credit the Web's success to its ability to allow anybody to be a publisher. I think that's only half the story. The real power of the Web is that it allows anybody to be a reader." The Web exploded from 130 sites in 1993 to over 650,000 in 1997, enabled by the dead-simple HTML markup Goldfarb had inspired.

An Enduring Legacy in a Changing World

In the years since the Web's rise, Goldfarb's ideas have continued to shape the evolution of digital content and communication. In 1998, XML (eXtensible Markup Language) was introduced as a streamlined successor to SGML, optimized for Web-based applications. XML has gone on to become a key enabling technology, powering everything from APIs and data exchange to ebooks and office document formats.

More recently, Goldfarb's vision of semantic, structured content has been embraced by new approaches like Content Management Systems, Linked Data, and the Semantic Web. By separating content from presentation, and focusing on meaningful structure, these technologies aim to make information more dynamic, interactive and intelligent.

Beyond inspiring new technologies directly, Goldfarb's conceptual breakthroughs have stood the test of time. The core principles he identified – descriptive vs. procedural markup, separation of content and formatting, machine-processable and human-readable code – are now design axioms for modern information systems of all kinds.

As an author and speaker, Goldfarb has also been a tireless advocate and educator for the field he helped create. His book "The SGML Handbook" (1990) remains an essential reference and guide for markup practitioners. And he continues to advise and consult on content engineering for a new generation.

For his immense contributions, Goldfarb has been widely honored by his peers and industry. But perhaps the greatest testament to his impact is that we now take for granted the way we effortlessly create, share and interact with digital content of all kinds, every day. Markup languages have woven themselves into the fabric of our information-centric world.

Looking ahead, as new frontiers like voice interfaces, AR/VR, and AI transform the way we engage with information and each other, Goldfarb's foundational ideas about meaningful content structure will no doubt continue to light the way. As he puts it, "The world is constantly changing, but there are some verities. Markup is one of them."