
SGML -- it's not just for documents anymore

Kurt W. Conrad
The Sagebrush Group

Many people mistakenly believe that SGML (the Standard Generalized Markup Language, ISO 8879) is useful only for document production. SGML can also be used for non-document applications, for example, to manage administrative and financial information data sets to support project planning, process improvement, and re-engineering efforts. SGML can help balance mechanical (efficiency-oriented) and organic (flexibility-oriented) approaches to information management, thereby contributing to the adaptability and well-being of an organization. This article looks at SGML implementation efforts at the Department of Energy's Hanford, Washington site and discusses the value of the standard for managing information in a changing organizational environment.

The effects of change at Hanford

Like many organizations within the federal government and in the private sector, the Department of Energy's (DOE) Hanford site in Washington State is facing cataclysmic, all-consuming change. The latest evidence is seen in a special issue of the Hanford Reach, the site's weekly newsletter. The tone of this issue is summarized in the opening article:

For the foreseeable future, perhaps the only constant at Hanford will be change. Even the site's geography will be transformed as cleanup progresses and Hanford land is released for other uses. . . . The need to fundamentally change our focus, the way we organize ourselves and the way we do our work is rooted in many complex, interrelated forces. [DOE-RL, p. 1]

Within this context of fundamental change, DOE has directed its civilian contractors to implement SGML-based life cycle information management and delivery processes for scientific and technical information. Westinghouse Hanford Company (WHC) and Boeing Computer Services Richland (BCSR) -- which functions as the information resources management department for WHC -- have a rich legacy of document production standards that predispose them towards a site-wide infrastructural approach to implementing SGML. Given the radical nature of the changes at Hanford (including plans to restructure all contracting relationships in 1996), long-term investments are being de-emphasized and the costs associated with such an implementation strategy are difficult to justify.

The need to build a business case for implementing SGML throughout the Hanford site has resulted in a quest to understand and identify opportunities for strategic value. A technology-driven approach may seem inappropriate, but it reflects the inductive thinking process described by Hammer and Champy in Reengineering the Corporation. It is also consistent with trends in the SGML industry, where externally imposed requirements are translated into opportunities to improve internal information management processes.

The General Accounting Office's Executive Guide: Improving Mission Performance through Strategic Information Management and Technology and Paul Strassmann's The Business Value of Computers have been very helpful in identifying strategic opportunities for improving information management. These and other sources have led me to view SGML not simply as a document production or information management standard, but as a tool for improving the overall well-being of the organization.

This article summarizes a few of the major conclusions reached as a result of BCSR's efforts to implement SGML to manage information:

Performance management systems can provide more strategic value than document production processes

It's hard today to find an organization that isn't facing tremendous competition, change, and uncertainty. This is definitely the case for Department of Energy contractors, both at Hanford and at other sites. With the end of the cold war, the emphasis at Hanford shifted from the production of nuclear weapons to environmental restoration. Given this changed focus, Hanford, as well as many other federally-funded sites, has been forced to rethink its business processes. Routine budget cuts have increased competition among individuals and organizations for scarce resources. These changes are not unique to the federal system; organizations outside the government are facing similar pressures.

At Hanford, accelerating change has increased the stress on individuals and management systems alike. Accident rates are a major concern. Information and management overloads are becoming commonplace. Planning horizons are shrinking. Contract reform, outsourcing, activity-based management, and process management initiatives are cascading down on Hanford's civilian contractor organizations with increasing frequency.

The typical use of SGML -- to produce and deliver technical documents -- would do little to improve overall organizational performance at Hanford. The document production process is important, but it is not the root cause of organizational stress. Instead, the effectiveness and flexibility of Hanford's performance management systems are a greater concern. The pace of change has outstripped the capabilities of the processes and supporting information systems that direct, control, and monitor work. Many of the existing processes and technologies are not well suited to the changed demands of the workplace.

For the Hanford site, as for other organizations that face similar instabilities, the strategic value of SGML lies in helping to address the shortcomings of the systems that measure and track organizational performance. This article examines mechanical and organic approaches to managing information technology and explores how SGML can be used to integrate and balance seemingly conflicting approaches to managing organizational information resources.

Engineering information flows to improve mechanical efficiency and cost effectiveness within organizations

The notion of engineering the flow of information derives from ideas associated with improving mechanical efficiency, minimizing transformation costs, and optimizing the use of resources. Within such a framework, the value of information is measured primarily in terms of the inputs consumed: materiel, labor, dollars, time. Paradoxically, the desire for mechanical efficiency within workgroups generally increases the overall life cycle costs of information. This is because the optimization of information investments at a workgroup level results in suboptimization when viewed from a life cycle perspective.

This suboptimization occurs because computer applications almost always target a subset of the information life cycle and have a narrow organizational focus. In addition, most commercial applications retain, at their core, a paper paradigm, in which the computer functions like a big, powerful pen. Computer applications can dramatically improve one's ability to generate information products, but they do little to manage the information content. In many organizations, the increases in personal productivity afforded by computing applications do little to enhance overall organizational performance.

Barriers to information exchange also hamper information flows within an organization. Reliance on proprietary technologies (hardware and software) and encoding schemes creates pockets of information that cannot be easily integrated. Duplicate systems increase development and maintenance costs and raise serious concerns about the integrity of data and information. If information cannot be re-used, it must be recreated and reconciled or undergo expensive, labor-intensive conversions, with the result that its life cycle costs increase.

The value of information can be enhanced by expanding the engineering effort to encompass a broader organizational and information life cycle focus. In this context, the use of formalized structures reflects an engineering approach to information management.

Using SGML to improve the information life cycle for non-document applications

The SGML user community has traditionally focused on how to improve mechanical efficiency during the production of documents. Many of the same issues apply to other, non-document forms of information, including graphics applications (for example, technical illustrations or spatial modeling), chemical analysis data packages, a data dictionary, and electronic forms.

Graphics software has historically taken an appearance-centric approach designed to produce paper deliverables. This has resulted in an expanding variety of encoding schemes, which, in turn, limit reusability and make text and graphics integration difficult. The fixation on visual appearance is also a barrier to integrating the information content. The need to better manage the information content of images (as opposed to the printed appearance) is leading to new approaches. Some companies are exploring the development of SGML-based graphics encoding standards. SGML-encoding would make it easier to attach metadata (data about data) and "intelligence" to drawings, thereby increasing the value of the information objects throughout the information life cycle and facilitating logical links to related information objects.

Spatial modeling represents a shift away from "drawing maps" to a focus on managing a more complex set of information that can be used to drive a number of specific visual representations, including virtual reality interfaces. The need to make spatial modeling data more readily available is driving discussions about possible SGML, HTML (Hypertext Markup Language), and VRML (Virtual Reality Modeling Language) encoding of the data to support online delivery. In addition, there is interest in linking or integrating spatial data with traditional document applications to enhance navigation and retrieval.

A group within the Westinghouse Hanford Company is exploring the development of SGML standards for chemical analysis data packages. The group procures analytical services and must devote substantial labor to validating that all of the requested information is present before actually checking the correctness of the content. One engineer estimates that the automated validation of SGML-based data packages could cut labor costs by 90%, in addition to improving the usability of the information by downstream processes.
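The kind of completeness check described above can be sketched in a few lines. The following Python example is a minimal illustration under stated assumptions, not the Hanford group's tooling: it uses XML and the standard library's `xml.etree.ElementTree` as a stand-in for an SGML parser, and the element names (`datapackage`, `sample`, `analyte`, `method`, `result`, `units`) and sample values are hypothetical. It reports which required elements are missing from each sample before anyone reviews the values themselves.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of child elements every <sample> must contain.
REQUIRED_FIELDS = ("analyte", "method", "result", "units")


def missing_fields(package_xml):
    """Return {sample id: [missing element names]} for incomplete samples."""
    root = ET.fromstring(package_xml)
    problems = {}
    for sample in root.iter("sample"):
        absent = [name for name in REQUIRED_FIELDS if sample.find(name) is None]
        if absent:
            problems[sample.get("id", "(no id)")] = absent
    return problems


# Made-up data package: the second sample lacks <method> and <units>.
PACKAGE = """
<datapackage>
  <sample id="S-001">
    <analyte>nitrate</analyte>
    <method>ion chromatography</method>
    <result>12.4</result>
    <units>mg/L</units>
  </sample>
  <sample id="S-002">
    <analyte>nitrate</analyte>
    <result>9.8</result>
  </sample>
</datapackage>
"""

if __name__ == "__main__":
    for sample_id, absent in missing_fields(PACKAGE).items():
        print(f"{sample_id}: missing {', '.join(absent)}")
```

In an SGML setting, a validating parser would perform this presence check against the DTD's content model, so the list of required fields would live in the DTD rather than in program code.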

During 1994, work was begun to develop SGML DTDs (Document Type Definitions) for a Hanford site data dictionary. A significant part of the justification was mechanical efficiency. The existing data dictionary resided on a mainframe. Vendor- and system-neutral encoding structures for data dictionary information were expected to improve cost effectiveness, in large part through the use of PC-based software tools. It was also envisioned that SGML-based standards for software development documents could be developed to collect and feed information into the site data dictionary at a much lower incremental cost.

Boeing Computer Services Richland is also studying the feasibility of encoding electronic forms in SGML. In the past, considerable labor has been required to recreate many of the forms when commercial software is upgraded. If we can demonstrate feasibility, we anticipate that SGML encoding will reduce redevelopment costs and simplify integration with databases and workflow management systems.

Improving the mechanical efficiency of performance management systems

Many of the issues involved in managing information also apply to managing organizational performance. The information that relates to organizational performance is routinely spread across a wide variety of applications and processes that include planning, project management, budgetary, financial, billing, quality, scheduling, calendaring, and tracking systems. With the increasing interest in business process re-engineering (BPR), commercial software systems that collect and store BPR data sets are starting to appear. In some cases, they are even being linked to computer-aided software engineering (CASE) tools.

One approach to integrating performance management systems is to create a single set of integrated process and data models that describe the organization's behaviors and the associated data flows. SGML encoding of these models can add value in a number of ways. SGML DTDs can be used to define interface specifications and facilitate the exchange of information between dissimilar computing systems. SGML encoding can also simplify the formatting of multiple output streams, especially for online and electronic delivery. Through the use of HyTime and other addressing mechanisms, data sets may not even need to be translated into SGML to be integrated with SGML data streams. HyTime's addressing characteristics provide a way of layering SGML-compliant metadata on top of non-SGML data sets so that an SGML system can interact with the data without costly conversions.
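The idea of layering metadata over a data set that stays in its native format can be illustrated without HyTime itself. The Python sketch below is only a loose analogy and does not use HyTime syntax: a small XML "overlay" (with hypothetical element and attribute names) addresses cells in a CSV export by row and column, and a resolver fetches the values on demand, so the CSV data is never converted. The CSV contents are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up CSV export from a hypothetical scheduling system; it stays as CSV.
SCHEDULE_CSV = """task,owner,due
Update safety plan,Ops,1995-06-30
Close audit finding,QA,1995-07-15
"""

# Hypothetical overlay: each <item> names a location in the CSV
# (zero-based data row, column header) instead of copying the value.
OVERLAY_XML = """
<overlay source="schedule.csv">
  <item name="next-milestone" row="0" column="due"/>
  <item name="audit-owner" row="1" column="owner"/>
</overlay>
"""


def resolve(overlay_xml, csv_text):
    """Look up each overlay item in the CSV and return {name: value}."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    overlay = ET.fromstring(overlay_xml)
    return {
        item.get("name"): rows[int(item.get("row"))][item.get("column")]
        for item in overlay.iter("item")
    }


if __name__ == "__main__":
    for name, value in resolve(OVERLAY_XML, SCHEDULE_CSV).items():
        print(f"{name}: {value}")
```

The design point is that the structured layer holds only names and addresses; the source system remains the single copy of the data.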

Redesigning unintegrated performance management systems is an important strategy for improving organizational effectiveness and resiliency. The same characteristics that enable SGML to integrate document production processes can be applied to the information life cycles that affect organizational productivity. Improving the mechanics of information creation and management may not, however, be sufficient to change organizational behavior significantly. The following section describes another information management approach that attempts to address some of the cultural and behavioral issues of organizational behavior.

Emerging "organic" information management models are likely to improve organizational learning and adaptation

Although mechanical efficiency is important, information management systems that are too rigid can hamper organizational flexibility. As organizations evolve from "lumbering bureaucracies" to "fluid, interdependent groups of problem solvers" [Barrett], biological models are being applied to organizations and information systems. SGML is well-suited to providing the additional flexibility required for such organic approaches.

The limitations of engineering models

While the engineering of information management life cycles can provide tremendous value, a narrow focus on mechanical efficiency can be limiting when an organization is trying to nurture change. The mechanical goal of reducing an organization and the individuals who comprise it to a finite set of entities, relationships, and rules is an unrealistic one.

Reengineering is the supernova of our old approaches to organizational change, the last gasp of efforts that have consistently failed. What is reengineering but another attempt, usually by people at the top, to impose new structures over old -- to take one set of rigid rules and guidelines and impose them on the rest of the organization? It's a mechanical view of organizations and people -- that you can "design" a perfect solution and then the machine will comply with this new set of instructions. . . . Reengineering doesn't change what needs to be changed most: the way that people at all levels relate to the enterprise. We need to be asking: Has the organization's capacity to change increased and improved? . . . Or have we just created a new structure that will atrophy as the environment shifts? [Wheatley]

Even more fundamentally, a quest for mechanical efficiency in human systems is based on idealized views of what science and engineering are all about. This idealized view of engineering influences most corporate re-engineering initiatives and might explain why 70% of all re-engineering efforts fail.

As originally described by Hammer and Champy in Reengineering the Corporation, re-engineering is much more of an organic process that relies on inductive thinking and the breaking of rules to "create new ways of working." The institutionalization of inductive thinking is, in turn, an ongoing effort of study and learning, aimed at identifying promising technologies and mapping out their possible applications long before they reach commercial viability. Re-engineering, like most organic processes, is not designed for optimization and predictability. "To re-engineer a company is to take a journey from the familiar into the unknown" [Hammer and Champy, p. 101].

Organic models

Organic systems are not engineered, they evolve. Accordingly, biological metaphors are more useful than mechanical metaphors for describing the challenges facing organizations today. "[O]ur strongest terms of change are rooted in the organic: grow, develop, evolve, mutate, learn, metamorphose, adapt. Nature is the realm of ordered change" [Kelly, p. 353].

Our understanding of organic systems has direct implications for developing IT strategies to improve organizational performance. Information architectures have traditionally attempted to maximize the mechanical efficiency of computer systems and databases, structuring information sharing and minimizing redundant data. But such formalizations are difficult to align with the dynamics of most organizations and the time frames in which change must occur.

To be effective, information architectures need to reflect the ways that people really produce, deliver, and use information to effect change. ". . . managers get two-thirds of their information from face-to-face or telephone conversations; they acquire the remaining third from documents, most of which come from outside the organization and aren't on the computer system" [Davenport, p. 121]. People are a critical delivery vehicle because their interpretations increase the value of information and add context.

An obvious example of organic information management systems is the Internet. The statistics are immense, the chaos is unimaginable, and the attention is becoming frenzied. No one owns it. No one controls it. But as TCP/IP and the World Wide Web demonstrate, the Internet is becoming an increasingly important laboratory for international computing standards.

SGML and organic information architectures

SGML is well suited to supporting a "human-centered" approach to information management. SGML is a user-driven standard and the industry has evolved with a strong emphasis on usability and interchange. SGML was designed to support multiple representations, allowing interfaces to be customized for different sets of users. Together with the HyTime standard, SGML can dramatically expand the types of information that are organized and referenced, well beyond what is normally "computerized."

Perhaps even more important are the conventions that are evolving for the modeling of information (document analysis) and their formalization as DTDs. Davenport places particular emphasis on having individuals design their own information environments because that participation directly influences their willingness to use the resulting conventions.

The need for participation introduces a whole new set of issues. First, non-technologists must be kept engaged in the modeling process. After their Fall 1993 conference and exposition, Seybold Publications reported seeing SGML used for a number of "database" applications. As compared with the use of relational database technology, they concluded that SGML makes it easier to construct and interact with complex hierarchical information models like those found in dictionary entries.

Second, Davenport observed that "the more a company knows and cares about its core business area, the less likely employees will be to share a common definition of it" [Davenport, p. 122]. DTD development is frequently described as a "contact sport" because of the conflict that results when a varied group of information producers and consumers attempts to develop common definitions. The steadily rising number of SGML applications attests to the flexibility of the standard and the ability of facilitation processes to negotiate these definitions successfully.

In the context of internal information architectures, however, precise uniformity may be counterproductive. "All information doesn't have to be common; an element of flexibility and disorder is desirable" [Davenport, p. 122]. In addition, it becomes important to "assume transience of solutions" and "multiple meanings of terms" and to "build point-specific structures" [Davenport, p. 123].

To support these additional requirements, "organic" DTDs are likely to differ from interchange DTDs in a number of ways:

Applying organic information management models to performance management systems

How do organic information management approaches relate to performance management systems? Most organizations are involved in an ongoing learning process. Structuring and integrating the information from related systems is a powerful tool for both individual and organizational learning. Among Federal agencies and their civilian contractors, for example, the Government Performance and Results Act (GPRA) and its mandate of activity-based management (ABM) are driving new ways of thinking about financial systems. In addition, competitive pressures are changing accepted definitions of organizational performance and measurement approaches.

The learning curves associated with changing ways of thinking and doing are reflected at the workgroup level within organizations at the Hanford site. Groups and individuals have different levels of experience and understanding of emerging performance measurement approaches. In addition, different organizations have been collecting performance metrics of varying quality. Many organizations engage in ongoing process definition or process change activities. Also, a wide variety of tools are in use that, again, vary at the workgroup level.

Taken together, these factors make it unlikely that a single, monolithic performance management "system" will meet the needs of all the organizations at the Hanford site. It is also unlikely, given the site culture of disparate software applications, that a single system would be culturally acceptable. "Some managers have always been distrustful of the information systems approaches of their companies, largely because they didn't understand them. In many cases, they were right to feel uneasy. As the diverse company experiences suggest, grand IT schemes that don't match what rank-and-file users want simply won't work" [Davenport, p. 131].

Accordingly, a decentralized, evolutionary approach may be required. Instead of designing and building a monolithic site-wide system, SGML-based performance management could be built around individual information components, which would be modeled one dataset at a time. Initially, planning datasets are being explored. A strategic planning facilitation and information model has been analyzed and an SGML DTD has been developed for Site Support Program Planning (SSPP) sheets.

Information management architectures that do not allow for change hamper the learning process that is inherent in most organizations, including the Hanford site. SGML is flexible enough to allow localized variations and its strong emphasis on usability and interchange makes it an effective mechanism for focusing attention on an organization's core information management issues. The following section describes SGML's ability to enable a holistic approach to information management, thus balancing engineering and organic models.

Using SGML-based approaches to balance mechanical and organic information management approaches

A critical challenge for any information management system is to balance the benefits of mechanical efficiency with the need to facilitate the learning and flexibility necessary for timely adaptation to change.

In most organizations today, both mechanical and organic approaches exist, but they are not balanced. Formalized entity-relationship models, structured databases, and standardized interfaces are in widespread use. Likewise, localized adaptation and redundancy are quite the norm. By some estimates, over 200 different commitment tracking systems are in use at the Hanford site. The numerous variations do not add value; instead they are carefully engineered suboptimizations that become barriers to information exchange. Likewise, most existing applications lack the flexibility necessary to support expected rates of organizational change.

SGML can add functionality to an engineered information management architecture by helping to integrate and augment a wide variety of information technologies and by protecting information from technology changes. In addition, the formal validation process allows the structures of individual sets of information to be tested for conformance. The standard is flexible enough to allow individual organizations to determine how stringent or forgiving the conformance testing should be. Rules-based formatting maximizes flexibility during information delivery while limiting redundancy and duplication.
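One way to picture "stringent or forgiving" conformance testing is to apply the same structural rules at two levels. The Python sketch below is a simplification under stated assumptions: a real SGML parser validates against a DTD content model, whereas here a hypothetical rule table lists required and optional child elements for a commitment record, and a `strict` flag decides whether unrecognized (locally added) elements count as errors. The element names and the sample record are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for a <commitment> record; not taken from any Hanford DTD.
REQUIRED = {"id", "owner", "due"}
OPTIONAL = {"status", "notes"}


def check(record_xml, strict=True):
    """Return a list of conformance errors for one <commitment> record.

    Missing required elements are always errors. Elements outside the
    rule table are errors only when strict is True.
    """
    record = ET.fromstring(record_xml)
    present = {child.tag for child in record}
    errors = [f"missing <{name}>" for name in sorted(REQUIRED - present)]
    if strict:
        unknown = present - REQUIRED - OPTIONAL
        errors += [f"unexpected <{name}>" for name in sorted(unknown)]
    return errors


# Made-up record containing one locally added element.
RECORD = """
<commitment>
  <id>C-17</id>
  <owner>Group A</owner>
  <due>1995-09-01</due>
  <local-priority>high</local-priority>
</commitment>
"""

if __name__ == "__main__":
    print("strict:   ", check(RECORD, strict=True))
    print("forgiving:", check(RECORD, strict=False))
```

In strict mode the locally added element is flagged; in forgiving mode the record passes because every required element is present. An organization could choose either level per data set.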

SGML supports the decentralized autonomy of organic approaches by allowing local variations in document structures and naming conventions. Used with architectural forms and HyTime, SGML can standardize interactions with non-standard data types and data structures to help embrace the chao