<?xml version="1.0" encoding="iso-8859-1"?>
<?xml-stylesheet type="text/css" href="docbook.css"?>
<book>
<bookinfo>
<bookbiblio>
<maintitle pagenum="1">Getting started with SGML</maintitle>
<subtitle>A guide to the Standard Generalized Markup Language and its role in
information management</subtitle>
<authorgroup>
<corpauthor>ArborText, Inc.</corpauthor>
<othercredit>
<authorblurb>
<para>Ann Nobody, Michigan</para></authorblurb></othercredit></authorgroup>
<pubdate>18 June 2002</pubdate>
<abstract>
<para>As the world standard for textual information, SGML has gained
prominence in many industries. Hundreds of companies have adopted SGML and
thousands are considering it. If your organization produces a high volume of
technical or business information of significant value, and if that
information lends itself to a regular structure, then SGML probably offers
significant benefits to you and your organization.</para>
<para>This white paper examines the factors that led to the development of
SGML, the basic knowledge you need to understand it, and the reasons for
adopting it. It also lists the industries where SGML use is already
widespread, along with resources for more information and
training.</para></abstract></bookbiblio></bookinfo>
<chapter>
<title pagenum="2">The Business Challenge</title>
<para revision="None" xrefLabel="None" role="None">The explosive success of
the Internet is an obvious example of an information revolution that's well
under way. Companies that realize the tremendous cost and value of
information management are reengineering their processes for creating,
distributing and accessing information. The opportunities in each of these
areas can be enormous:</para>
<sect1 status="final" label="docbook">
<title>Information Creation</title>
<para revision="None" xrefLabel="None" role="None">By some estimates, 20% of
our GNP is spent on generating new information. And over 90% of that
information is in documents, not databases. When was the last time you took a
close look at how much your organization invests in the creation of
information?</para>
<para>In conventional word processing and desktop publishing systems, your
authors spend up to 30% of their time searching for information, and another
30% of their time applying styles and squeezing paragraphs so that each
printed page looks nice. Plus, nearly every 18 months, technology changes
completely, so you're continually paying for data conversions as software and
hardware become obsolete.</para></sect1>
<sect1 status="final" label="docbook">
<title pagenum="3">Information Distribution</title>
<para revision="None" xrefLabel="None" role="None">A few years ago, you could
provide your information on paper alone. Then CD-ROM technology became
low-cost and widespread, so you've either already faced or soon expect to
face the massive re-publishing effort needed to make all your information
available electronically. And in just the last year, the World Wide Web has
thundered out of nowhere, creating yet another new format for your
information.</para>
<para>At the same time, your customers don't want to wade through huge
technical manuals that describe all system variations and all possible uses
for all possible users. They want information tailored to their own needs,
so they can get to it and use it fast.</para></sect1>
<sect1 status="draft" label="xlink">
<title>XLink sample</title>
<a xmlns:xlink="http://www.w3.org/1999/xlink" id="xlink" name="xlink"
xlink:href="http://www.w3.org/TR/2000/PR-xlink-20001220"
xlink:type="simple">XLink Recommendation</a></sect1>
<sect1 status="draft" label="xhtml">
<title pagenum="4">XHTML sample</title>
<div xmlns="http://www.w3.org/1999/xhtml">
<p><strong>Amaya</strong> is a Web client that acts both as a browser and as
an authoring tool. It has been designed by the <a
href="http://www.w3.org/">World Wide Web Consortium (<acronym
title="World Wide Web Consortium">W3C</acronym>)</a> with the primary
purpose of demonstrating new Web technologies in a What You See Is What You
Get (<acronym title="What You See Is What You Get">WYSIWYG</acronym>)
environment. The current version implements the Hypertext Markup Language
(<acronym title="Hypertext Markup Language">HTML</acronym>), Extensible
Hypertext Markup Language (<acronym
title="Extensible Hypertext Markup Language">XHTML</acronym>), Mathematical
Markup Language (<acronym
title="Mathematical Markup Language">MathML</acronym>), Scalable Vector
Graphics (<acronym title="Scalable Vector Graphics">SVG</acronym>), Cascading
Style Sheets (<acronym title="Cascading Style Sheets">CSS</acronym>),
and&nbsp;Hypertext Transfer Protocol (<acronym
title="Hypertext Transfer Protocol">HTTP</acronym>).</p>
</div>
</sect1>
<sect1 status="draft" label="svg">
<title>SVG sample</title><svg:svg xmlns:svg="http://www.w3.org/2000/svg"
     width="8cm" height="8cm" viewBox="0 0 220 230" version="1.1">
  <svg:polygon style="fill:red; stroke:blue; stroke-width:7"
           points="71,0 88,50 142,51 99,83 115,135 71,104 27,135 43,83 0,51 54,51"
           transform="translate(39,46)"/>
</svg:svg>
</sect1>
<sect1 status="draft" label="math">
<title>Math sample</title><math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mi>y</mi>
    <mo>=</mo>
    <mfrac>
      <mn>1</mn>
      <msqrt>
        <mrow>
          <msup>
            <mi>x</mi>
            <mn>2</mn>
          </msup>
          <mo>+</mo>
          <mn>1</mn>
        </mrow>
      </msqrt>
    </mfrac>
  </mrow>
</math></sect1>
<sect1 status="final" label="docbook">
<title pagenum="5">Information Access</title>
<para revision="None" xrefLabel="None" role="None">In the U.S. alone,
businesses produce 92 billion documents every year&mdash;and that number is
skyrocketing. Can your people easily access the information you create in
your own company? How about the information you receive from other
companies?</para>
<para>An organization's future can depend on how effectively it identifies,
manages, and uses its information. The latest thinking in information
management takes an enterprise-wide approach to the creation, distribution
and maintenance of information. Organizations that have taken this broad view
have realized enormous improvements in the cost, accuracy, timeliness,
accessibility, and variety of the information they create and use.</para>
<para>As part of this movement, companies in some industries are joining
together to develop standards for exchanging information with each other and
with their customers. Companies that keep up-to-date with these standards
will be able to do business more efficiently and compete more effectively in
global markets. This white paper describes how one such standard, the
Standard Generalized Markup Language (SGML), works as part of an overall
information management strategy.</para></sect1></chapter>
<chapter>
<title>Unleashing the Power of Information</title>
<para>Traditional documents and the methods for handling them suffer many
limitations. The printed document is often the result of a sophisticated
information process. Once it's printed, however, the document represents a
dead-end in the information flow because it has no link to the electronic
information base.</para>
<para>Raw data may start in the form of technical specifications or
engineering data. This information must be gathered, sorted, organized, and
then manually assembled into hard copy documents. Each step in the
documentation process is a chance for the information to change by mistake. The
further removed the result is from the original source of information, the
greater the risk of erroneous data. The problem can become so large that a
majority of documents go out of date as soon as they are printed.</para>
<para>A systematic approach to information management treats text and
graphics as part of an organization's electronic information base. This gives
everyone access to the information. By taking a broad view of the information
creation and delivery process, you can see documents as any composition of
information&mdash;the output from a database query, a printed document, an
on-line diagnostic manual, an illustrated parts catalog, a collection of
video clips, or a home page on the Internet's World Wide Web.</para>
<para>SGML allows you to manage information as data objects instead of
characters on a page. Rather than a stream of indistinguishable bits and
bytes, the data is 
<quote>chunked</quote> into identifiable discrete elements of information.
This technology enables you to store and reuse the information efficiently,
share it with many users, and maintain it in a database.</para></chapter>
<chapter>
<title>Getting to Know SGML</title>
<para>This white paper provides an introduction to existing SGML technology,
its advantages and benefits, as well as an overview of some related standards
and how they fit into an overall approach to managing information. We also
define some of the terminology and acronyms to familiarize you with the
language associated with SGML. While SGML is a fairly recent technology, the
use of 
<quote>markup</quote> in computer-based documents has existed for a while.
Let's first look at earlier markup schemes that led to SGML.</para>
<sect1>
<title>What is markup?</title>
<para>Markup is everything in a document that is not content. Markup
originally referred to the handwritten notations that a designer would add to
typewritten text; these notations contained instructions to a typesetter
about how to lay out the copy and what typeface to use. This kind of markup
is known as 
<firstterm>procedural markup</firstterm>.</para>
<sect2>
<title>Procedural markup</title>
<graphic entityref="markups"></graphic>
<para>Most electronic publishing systems today, such as word processing
software and desktop publishing software, use procedural markup. Procedural
markup is typically unique to a specific software package such as 
<trademark>Microsoft</trademark> Word or 
<trademark>QuarkXPress</trademark>. Each package has its own set of markup
codes that only that package understands. This markup usually takes the form
of formatting codes that are mixed in with the text of the document. Procedural
markup codes apply to a single way of presenting the information, such as a
printed page, and provide no capability to define appearance for other media,
such as CD-ROM and the Internet.</para></sect2>
<sect2>
<title>Descriptive markup</title>
<graphic entityref="generic"></graphic>
<para>Descriptive markup, also known as 
<quote>generic markup,</quote> describes the purpose of the text in a
document, rather than its physical appearance on the page. The basic concept
of descriptive markup is that the content of a document should remain
separate from its style. Descriptive markup is based on the 
<firstterm>structure</firstterm> or 
<firstterm>content</firstterm> of a document and identifies elements
accordingly&mdash;such as a chapter, a section, or a table of
contents&mdash;using notations that describe what the element is, not how it
appears. By separating presentation information ( 
<foreignphrase>i.e.</foreignphrase>, style) from the structure and content,
descriptive markup allows for multiple presentations of the same information.
For example, you can publish on paper, on-line, on CD-ROM and on the World
Wide Web (Internet), all from the same set of source files with descriptive
markup.</para></sect2>
<sect2>
<title>Drawbacks of procedural markup</title>
<para>Producers of technical documentation increasingly prefer descriptive
markup over procedural markup. Procedural markup is tedious and expensive;
authors can spend 15% to 50% of their time on the appearance of each page. If
style guidelines change, or if you need to present the same information in a
different format, massive re-formatting is usually required. When a company
changes software or hardware systems, enormous data translation tasks arise,
often resulting in errors. Because procedural markup is tied to one final
printed product, you cannot change formats easily. Interchanging documents
based on procedural markup works easily only if both parties have the same
hardware and software system.</para></sect2></sect1>
<sect1>
<title>What is SGML?</title>
<para>The Standard Generalized Markup Language, or SGML, is an international
standard (ISO 8879) published in 1986. SGML prescribes a standard format for
embedding descriptive markup within a document. More importantly, and crucial
to its real value and power, SGML also specifies a standard method for
describing the structure of a document.</para>
<para>In other words, SGML allows you to set up structural rules for each
type of document you produce. SGML ensures that each element, which is
labeled with descriptive markup such as 
<quote>chapter,</quote> 
<quote>title,</quote> and 
<quote>paragraph,</quote> fits in the logical, predictable structure of your
document type.</para>
<para>SGML supports an infinite variety of document structures. Users
typically create a different document structure for each category of
information they produce: information bulletins, technical manuals, parts
catalogs, design specifications, reports, letters and memos.</para>
<para>SGML allows you to create documents that are independent of any
specific hardware or software. Since SGML documents conform to an
international standard, they are portable. You can exchange them seamlessly
with users who have different systems.</para>
<para>The world of photography demonstrates the power of standards: SGML is
to documents as standardized film speed is to cameras. Today you can purchase
a roll of film marked 
<quote>ISO 100,</quote> put the film in your camera, set the camera's film
speed to 100 (which many cameras do automatically), and you're ready to
shoot. You don't have to worry that the brand of film is not compatible with
your particular make of camera. The film and camera manufacturing
industries&mdash;through the International Organization for Standardization
(ISO) and American Standards Association (ASA)&mdash;have agreed on standards
for film speeds. Many industries plan to use SGML so that their documents
work as easily on different computers as film works in different
cameras.</para></sect1>
<sect1>
<title>How does SGML work?</title>
<para>To understand SGML we must look at the three layers of a typical
document: structure, content, and style. SGML separates these three aspects,
but deals mainly with the relationship between structure and content.</para>
<sect2>
<title>Structure</title>
<para>At the heart of an SGML application is a file called the 
<firstterm>DTD</firstterm>, or 
<firstterm>Document Type Definition</firstterm>. The DTD sets up the
structure of a document, much like a database schema describes the types of
information it handles. A DTD provides a framework for the types of elements
(such as chapters and chapter headings, sections, and topics) that constitute
a document.</para>
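<para>As a minimal sketch of what such a framework might look like, the
following hypothetical DTD fragment declares a chapter that begins with a
heading and then holds sections, topics, and paragraphs. The element names 
<sgmltag class="element">chapter</sgmltag>, 
<sgmltag class="element">heading</sgmltag>, 
<sgmltag class="element">section</sgmltag>, 
<sgmltag class="element">topic</sgmltag>, and 
<sgmltag class="element">par</sgmltag> are assumptions chosen for
illustration and do not come from any published DTD: 
<programlisting>&lt;!-- Hypothetical fragment; all element names are illustrative --&gt;
&lt;!ELEMENT chapter - - (heading, section+)&gt;
&lt;!ELEMENT section - - (topic+)&gt;
&lt;!ELEMENT topic   - - (par+)&gt;
&lt;!ELEMENT heading - - (#PCDATA)&gt;
&lt;!ELEMENT par     - - (#PCDATA)&gt;</programlisting></para>
<para>Each declaration names an element type and gives its content model. In
a content model, a comma means 
<quote>followed by</quote> and a plus sign means 
<quote>one or more.</quote> The two hyphens state that neither the start tag
nor the end tag may be omitted.</para>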
<para>A DTD also specifies rules for the relationships between elements; for
example, 
<quote>a chapter heading must be the first element after the start of a
chapter</quote>; or 
<quote>each list must contain at least two items.</quote> These rules, which
the DTD defines, help ensure that documents have a consistent, logical
structure. A DTD accompanies an SGML document wherever it goes. A 
<quote>document instance</quote> is a document whose content has been tagged
in conformance with a particular DTD.</para></sect2>
<sect2>
<title>Content</title>
<para>Content is the information itself: content includes titles, paragraphs,
lists, tables, graphics, and audio. The method for identifying the content's
position within the DTD structure is called 
<quote>tagging.</quote> Creating an SGML document involves inserting tags
around content. These tags mark the beginning and end of each part of the
structure and identify the type of contents they enclose. In the following
example, 
<sgmltag class="starttag">par</sgmltag> indicates the start of a paragraph,
and 
<sgmltag class="endtag">par</sgmltag> indicates the end of the paragraph: 
<programlisting>&lt;par&gt;Paragraph
content.&lt;/par&gt;</programlisting></para>
<para>You can nest elements within other elements; in the following example,
the paragraph ( 
<sgmltag class="element">par</sgmltag>) is an element within the topic ( 
<sgmltag class="element">topic</sgmltag>): 
<programlisting>&lt;topic&gt;&lt;par&gt;Content.&lt;/par&gt;&lt;/topic&gt;</programlisting></para>
<para>The structure of a particular document is revealed by the nesting of
tags: 
<programlisting>&lt;section&gt;&lt;subhead&gt;Content&lt;/subhead&gt;
&lt;par&gt;Content is the information
itself.&lt;/par&gt;&lt;/section&gt;</programlisting></para>
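<para>To show how tagged content and its DTD fit together, the section
example above can be sketched as a complete document. This is a minimal
sketch that assumes the declarations are carried inside the document type
declaration itself; the content models are illustrative assumptions: 
<programlisting>&lt;!DOCTYPE section [
  &lt;!-- Assumed content model: a subhead followed by one or more paragraphs --&gt;
  &lt;!ELEMENT section - - (subhead, par+)&gt;
  &lt;!ELEMENT subhead - - (#PCDATA)&gt;
  &lt;!ELEMENT par     - - (#PCDATA)&gt;
]&gt;
&lt;section&gt;&lt;subhead&gt;Content&lt;/subhead&gt;
&lt;par&gt;Content is the information
itself.&lt;/par&gt;&lt;/section&gt;</programlisting></para>
<para>Under these assumed declarations, a validating parser would accept this
document and would reject one in which the 
<sgmltag class="element">par</sgmltag> came before the 
<sgmltag class="element">subhead</sgmltag>.</para>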
<para>Fortunately, authors usually don't have to type in tags by hand or
check that all the tags are there. Some
SGML-based authoring software programs make it easy to enter tags by clicking
on pull-down menus that guide you by listing only those tags that are valid
at the cursor's current position in the document. These programs rely on a
software module called a 
<quote>parser</quote> that verifies that the document follows the rules of
the DTD. (The parser also verifies that the DTD itself is structurally
correct.) The following illustration shows how an SGML-based authoring
program would display the tags for the previous ASCII example:</para>
<graphic entityref="sgmlexa"></graphic></sect2>
<sect2>
<title>Style</title>
<para>SGML itself has nothing to do with setting standards for style, so most
systems still rely on proprietary methods of setting style. It is the style
that determines the final appearance of the document information. Some
efforts are being made to develop standards-based style sheets; two of these
efforts have resulted in the mature OS standard and the still unreleased
DSSSL standard.</para>
<para>The U.S. Department of Defense CALS initiative developed its own style
standard, known as the Output Specification (OS). The OS is in the form of a
particular DTD that allows the user to create a Formatting Output
Specification Instance, or FOSI (usually pronounced 
<quote>fossy</quote>), that is well suited to both print and electronic
output.</para>
<para>A FOSI is essentially a powerful style sheet that specifies the
formatting for each tag in a DTD. With the FOSI, the document, and the DTD,
you have a complete interchange package for printed documents that maintains
its format and style as it is interchanged among systems. In early 1995, an
ISO committee released a draft of the Document Style Semantics and
Specification Language (DSSSL), which is on its way to becoming an
international standard for presenting SGML-based documents. Official release
is expected later this year.</para>
<para>The complete DSSSL standard covers a broad scope, so subsets are being
developed to handle varying levels of functionality. A subset whose
functionality is approximately equivalent to FOSIs is expected, and work on
tools to convert FOSIs to and from DSSSL is under way.</para>
<para>Many military contracts currently require FOSIs, and many non-defense
firms have also embraced the Department of Defense's OS standard because it's
a mature and supported standard. It is expected that both DSSSL and FOSIs
will remain important standards for the foreseeable
future.</para></sect2></sect1></chapter>
<chapter>
<title>What Does SGML Give Me?</title>
<para>SGML has become mainstream technology that you can use with confidence.
Your adoption of SGML will allow your organization to gain the maximum value
from your generation and use of information:</para>
<sect1>
<title>Increased productivity</title>
<para>A structured approach to documents helps writers organize the
information as they are creating it, and keeps content separate from style.
This separation enables you to set up centrally controlled style guidelines,
so authors can focus on generating the content rather than adjusting each
document's appearance. That change alone can as much as double your authors'
productivity.</para>
<para>You can also improve efficiency by keeping a central information base
so that authors don't have to recreate the same information in order to use
it. This also ensures that the most current information is available to
all, and a single update to the information base automatically updates every
document created from it.</para></sect1>
<sect1>
<title>Reusability</title>
<para>A printed document is just one of many possible products from
SGML-based information. For example, a technical publications group can use
tags to identify a procedure as a sequence of steps. In this case, you
identify the beginning and end of the procedure, and each step within the
procedure. The same procedure can now appear in several forms: maintenance
and operational manuals, on-line technical manuals, training guides, etc.
More importantly, since the tags are machine-readable, the computer can
manage and maintain the many different uses of the same single source of
information, so no re-keying is required to produce this information in new
document formats.</para></sect1>
<sect1>
<title>Information longevity</title>
<para>SGML is a simple, standard file format with an indefinite shelf life;
you'll never again have to convert your documents when a hardware or software
system becomes obsolete. Once you set up your SGML information base, the
information will always be available, because it carries everything needed to
create a document. So even when your hardware or software becomes obsolete,
your information remains usable, portable, and available.</para></sect1>
<sect1>
<title>Improved data integrity</title>
<para>Defining a document's structure helps ensure that the right information
is in the right place, which improves the organization of your information.
Because SGML eliminates the need to convert data as it passes between
systems, you reduce the risk of losing information when data is filtered from
one format to another.</para></sect1>
<sect1>
<title>Better data control</title>
<para>With SGML, you can define and manipulate information elements at any
level of detail. A tagged element can have attributes that describe
characteristics or properties of the element. This attribute information
is useful for managing and manipulating the information elements. For
example, an ID (identifier) attribute can uniquely identify a single
paragraph, a whole section, a legal notice, an illustration, a task, or any
element that you may want to use repeatedly. The following example shows a
paragraph with an ID attribute: 
<programlisting>&lt;para
id="p431"&gt;Content.&lt;/para&gt;</programlisting></para>
<para>By simply referencing the ID, you can include this information in
your document in as many places as you need. This eliminates re-typing and
ensures that the information is identical in every instance.</para>
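<para>As a hedged sketch of how such an ID might be declared and referenced,
the hypothetical fragment below gives 
<sgmltag class="element">para</sgmltag> an attribute of type ID and adds a 
<sgmltag class="element">pararef</sgmltag> element that points to it through
an IDREF attribute. The 
<sgmltag class="element">pararef</sgmltag> element and its 
<sgmltag class="attribute">refid</sgmltag> attribute are assumptions made for
illustration. An ID value is a name and so begins with a letter, which is why
the sketch uses p431. A validating parser checks only that each ID is unique
and that every reference points to an existing ID; pulling the referenced
content into the output is left to the processing application: 
<programlisting>&lt;!-- In the DTD (hypothetical; pararef and refid are illustrative names) --&gt;
&lt;!ELEMENT para    - - (#PCDATA)&gt;
&lt;!ATTLIST para    id    ID    #IMPLIED&gt;
&lt;!ELEMENT pararef - O EMPTY&gt;
&lt;!ATTLIST pararef refid IDREF #REQUIRED&gt;

&lt;!-- In the document instance --&gt;
&lt;para id="p431"&gt;Content.&lt;/para&gt;
&lt;pararef refid="p431"&gt;</programlisting></para>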
<para>Plus, the IDs you set are machine readable so that the computer can
find and link related information. This allows you to use IDs for a variety
of information management controls. These controls can help you: 
<itemizedlist>
<listitem>
<para>Manage the security of information by allowing only certain people to
view or change information with selected IDs.</para></listitem>
<listitem>
<para>Automate the information flow&mdash;for example, updating the data in
one place can trigger the update of the same information in other places
within the same document and in other
documents.</para></listitem></itemizedlist></para></sect1>
<sect1>
<title>Shareability</title>
<para>Since SGML is aware of the individual components of a document, you can
easily build entirely new documents out of existing information. This
capability enables users to share the latest information without duplicating
it. An example of this might be a standard legal notice or copyright
statement appearing in documents throughout a company. The legal department
maintains this module of information, updating it on occasion. A single tag
in your document can pull in the current legal notice each time you access or
output your document, eliminating needless duplication of information and
ensuring the accuracy of your information.</para></sect1>
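<para>The legal notice scenario above can be sketched with an external
entity. This is a minimal sketch under stated assumptions: the document type 
<sgmltag class="element">manual</sgmltag>, the entity name 
<sgmltag class="genentity">legal</sgmltag>, and the file name are all
hypothetical, and the way a system identifier maps to a file depends on the
SGML system in use: 
<programlisting>&lt;!DOCTYPE manual [
  &lt;!-- Hypothetical names: "legal" points at a file kept by the legal department --&gt;
  &lt;!ELEMENT manual - - (par+)&gt;
  &lt;!ELEMENT par    - - (#PCDATA)&gt;
  &lt;!ENTITY legal SYSTEM "legal.sgm"&gt;
]&gt;
&lt;manual&gt;
&lt;par&gt;Content.&lt;/par&gt;
&amp;legal;
&lt;/manual&gt;</programlisting></para>
<para>Each time the document is parsed, the reference 
<literal>&amp;legal;</literal> is replaced by the current content of the file,
which is assumed here to hold one or more 
<sgmltag class="element">par</sgmltag> elements.</para>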
<sect1>
<title>Portability of information</title>
<para>Today, information networks proliferate where different computers,
operating systems, and applications must share information. In this sort of
network, portability becomes the key to making sure that all who need the
information can access it. Thanks to the hardware and software independence of
SGML, you can easily exchange SGML documents among different
environments.</para></sect1>
<sect1>
<title>Flexibility beyond traditional publishing</title>
<para>The information you create today may be used a year from now in ways
you haven't yet anticipated. Just last year, the need to publish on the World
Wide Web did not even exist! The spectacular growth of the Web serves as
dramatic proof that we simply cannot anticipate all the purposes for which
our information may eventually be used.</para>
<para>SGML permits you to use your information for applications beyond
traditional publishing. For example: 
<itemizedlist>
<listitem>
<para>World Wide Web pages</para></listitem>
<listitem>
<para>information databases</para></listitem>
<listitem>
<para>diagnostic/expert systems</para></listitem>
<listitem>
<para>electronic mail</para></listitem>
<listitem>
<para>hypermedia and hypertext documents</para></listitem>
<listitem>
<para>database publishing</para></listitem>
<listitem>
<para>CD-ROM publishing</para></listitem>
<listitem>
<para>Interactive Electronic Technical Manuals (IETMs)</para></listitem>
<listitem>
<para>electronic
review</para></listitem></itemizedlist></para></sect1></chapter>
<chapter>
<title>Is SGML Right for Me?</title>
<para>In the life cycle of a product, the cost of gathering, producing, and
maintaining the necessary technical information can exceed the initial
hardware cost. For many industries, technical information is part of a
deliverable product, or a product in itself. Any industry whose product line
is heavily dependent on information can benefit from SGML.</para>
<para>In evaluating how SGML can help your organization, you may wish to
consider some strategic business issues to help in your information
management plan. A strategic approach should prompt you to examine your
current information needs and your current document management methodology.
Some questions to consider include: 
<itemizedlist>
<listitem>
<para>Does your information require a long life-span? (For example, technical
information related to airplanes often needs to be maintained for over 20
years.)</para></listitem>
<listitem>
<para>Do you need to exchange documents across mixed hardware
environments?</para></listitem>
<listitem>
<para>Do you need to produce large documents with a disciplined
structure?</para></listitem>
<listitem>
<para>Do your documents contain information common to other documents within
a department, across corporate divisions, or even across separate
organizations?</para></listitem>
<listitem>
<para>Do you have information that's used for different purposes? (For
example, a part number may appear in a maintenance manual as well as a parts
inventory database.)</para></listitem>
<listitem>
<para>Does your information change frequently and get used
often?</para></listitem>
<listitem>
<para>Do you produce information that needs to comply with industry or company
guidelines?</para></listitem></itemizedlist></para>
<para>By examining your requirements, you can evaluate how SGML fits into
your information management strategy. Standardizing on SGML doesn't mean you
need to use it for all documents; SGML is most useful for documents with a
definable structure. Since SGML handles documents as collections of
distinguishable data elements, it is useful to think in terms of modules of
information, rather than complete printed documents.</para>
<para>SGML is most useful as a tool in an integrated information management
strategy. Making such a strategic choice and planning its implementation
are tasks for a company's high-level management. There will be initial
implementation costs in moving to SGML, but the payback comes from benefits
that accrue over time and enhance your investment in information. Any
organization that exchanges information between systems, applications,
departments, and companies will realize these benefits.</para></chapter>
<chapter>
<title>What Is a Good SGML System?</title>
<para>By design, SGML applications are meant to be customized. Just as
there's no out-of-the-box database application that can serve all the needs of an
organization, there is no one-size-fits-all SGML application. Since each
organization's information requirements are different, there are many DTDs.
More organizations are also looking at industry-wide information needs and
developing standards for handling that information.</para>
<para>A number of products on the market handle SGML to some degree. But not
all products handle all the features of the SGML standard. The sections that
follow describe some basic requirements.</para>
<sect1>
<title>Provides real-time interactive parsing</title>
<para>An invaluable feature in an SGML system is real-time, interactive SGML
validation. This feature allows the software to provide context-sensitive
editing assistance based on the cursor's current position in the document.
For example, if the cursor is immediately after the beginning tag for a
section, and all sections must have a section heading, the software allows
you to insert only a section heading tag. This feature ensures that the
author tags the document correctly at all times and so creates a valid
SGML document the first time.</para>
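<para>The rule behind this example can be sketched as a single content model.
The element names 
<sgmltag class="element">section</sgmltag>, 
<sgmltag class="element">heading</sgmltag>, and 
<sgmltag class="element">par</sgmltag> are hypothetical and stand in for
whatever the DTD in use defines: 
<programlisting>&lt;!-- Assumed rule: every section starts with exactly one heading --&gt;
&lt;!ELEMENT section - - (heading, par+)&gt;
&lt;!ELEMENT heading - - (#PCDATA)&gt;
&lt;!ELEMENT par     - - (#PCDATA)&gt;</programlisting></para>
<para>With the cursor directly after 
<sgmltag class="starttag">section</sgmltag>, this content model permits only a 
<sgmltag class="element">heading</sgmltag>, so that is the only tag an
interactive editor would need to offer.</para>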
<para>By contrast, systems that use batch parsing allow authors to insert
tags and text without checking each action against the DTD. In this approach,
authors create documents in one format, then filter parts of the document
into SGML, and then run the SGML through a validating parser. When the parser
finds errors, the author must correct the original document, then filter and
parse the changes again. The author must repeat this cycle until the entire
document parses successfully. This approach adds steps to the publishing
process that add no value. Time saved by authoring in a familiar format is
lost in the filtering and validating process. A system that creates native
SGML information eliminates the costly, time-consuming, and often error-prone
process of retrofitting documents into valid SGML.</para></sect1>
<sect1>
<title>Uses real SGML</title>
<para>If your authoring software merely produces SGML as output, then your
information is still tied to a proprietary format, and still at the mercy of
software and hardware obsolescence. A publishing system that uses SGML as its
native file format allows your information to remain accessible and usable
regardless of hardware and software changes. If you need your information to
remain accessible as you grow into new systems and new technologies, then
using a native SGML file format provides a distinct advantage over a system
that filters the data into SGML. Here's an acid test to identify a real SGML
system: can the software accept any SGML document, display that document, and
then save that document, leaving it unchanged?</para></sect1>
<sect1>
<title>Supports any DTD</title>
<para>To be fully usable, a good SGML product allows you to create a variety
of new document types in addition to accepting existing DTDs used in some
industries. This feature is sometimes called the ability to handle 
<firstterm>arbitrary</firstterm> or user-defined DTDs. With arbitrary DTDs
you are free to create any document type.</para></sect1>
<sect1>
<title>Supports SGML features</title>
<para>The developers of SGML built into the standard a number of features
that facilitate automated publishing and document reuse. A fully-featured
SGML publishing package should support this functionality. Some of the basic
features to look for include: 
<itemizedlist>
<listitem>
<para>
<firstterm>Marked sections.</firstterm> Marked sections let you create
multiple versions from a single master document using regions of conditional
text that only appear in specified versions. For example, you might want to
build a single source document that describes two variations of your product.
You simply write the source document with marked sections for the areas that
differ. The system can then identify these areas and produce two different
versions of your information from the same source file.</para></listitem>
<listitem>
<para>
<firstterm>External file entities.</firstterm> A file entity is simply a
pointer to a separate document file. You can use file entities to break a
large document into subdocuments. You can also use a file entity to reference
frequently repeated boilerplate information such as an electrical
caution.</para></listitem>
<listitem>
<para>
<firstterm>Graphic entities.</firstterm> A graphic entity is a pointer to a
separate graphic file.</para></listitem>
<listitem>
<para>
<firstterm>Text entities.</firstterm> A text entity is a short name that
stands for a common phrase repeated throughout a document. This allows you to
reference the name instead of re-keying the phrase each time you need to use
it.</para></listitem></itemizedlist></para></sect1></chapter>
<chapter>
<title>Who Uses SGML Now?</title>
<para>Early in its history, the primary adopters of SGML were defense
contractors. In the last two years, however, the trickle of commercial users
has turned into a torrent. Many leading industrial groups recognize the
benefits SGML offers and have adopted it for information management and
exchange among their members, and between members and their vendors and
customers.</para>
<para>Several industries have developed standards for information
exchange:</para>
<variablelist>
<varlistentry>
<term>AAP</term>
<listitem>
<para>The Association of American Publishers developed The American National
Standard for Electronic Manuscript Preparation and Markup, a general purpose
book DTD for publishers, authors and editors.</para></listitem></varlistentry>
<varlistentry>
<term>ATA (airlines)</term>
<listitem>
<para>The Air Transport Association, a consortium representing the commercial
airline industry, developed several DTDs under the ATA-100 specification. The
ATA's European counterpart, AECMA, is also adopting standards based on
SGML.</para></listitem></varlistentry>
<varlistentry>
<term>ATA (trucking)</term>
<listitem>
<para>The Maintenance Council of the American Trucking Associations has
initiated a task force with the mission of 
<quote>Establishing the Standard for Electronic Service Information.</quote>
This task force represents large truck manufacturers and fleet operators
interested in standardizing the interchange of service information, and they
are developing the T2008 DTD, modeled after the SAE's J2008 DTD for
automobiles and light trucks. The first release of the standard is expected
in 1996.</para></listitem></varlistentry>
<varlistentry>
<term>DocBook</term>
<listitem>
<para>Founded by ten major producers and consumers of technical documentation
for computer systems, the Davenport Group has developed the DocBook DTD for
exchanging and delivering computer documentation. Founding members included
Novell, O'Reilly &amp; Associates, Fujitsu OSSI, Hewlett-Packard, Digital
Equipment Corporation, SCO, Hal Computer Systems, Hitachi Computer Products,
SunSoft and Unisys.</para></listitem></varlistentry>
<varlistentry>
<term>DoD</term>
<listitem>
<para>The U.S. Department of Defense created the Continuous Acquisition and
Life-Cycle Support (CALS) initiative (recently renamed from Computer-aided
Acquisition and Logistic Support). The next section describes CALS in more
detail.</para></listitem></varlistentry>
<varlistentry>
<term>Pinnacles</term>
<listitem>
<para>Led by Intel, National Semiconductor, Texas Instruments, Philips, and
Hitachi, the Pinnacles Group is developing the Pinnacles Component
Information Standard (PCIS) to allow reusability of component data by
semiconductor customers and vendors. This data can include descriptions,
specifications, physical diagrams, code fragments, behavior models, and other
text, tables, graphics, and technical data.</para></listitem></varlistentry>
<varlistentry>
<term>SAE</term>
<listitem>
<para>The Society of Automotive Engineers is developing the J2008 DTD for
electronic interchange of service and diagnostic information. The J2008 Task
Force is part of the Vehicle Electronic/Electrical Systems Committee, whose
mission is to increase customer satisfaction and lower product life cycle
costs by recommending standards that promote more effective diagnosis of
vehicle systems. The DTD is expected to be released for approval as a
Technical Draft Standard in 1995. After three years, it will be voted upon
again to determine if it should become a Recommended
Practice.</para></listitem></varlistentry>
<varlistentry>
<term>TCIF</term>
<listitem>
<para>The Telecommunications Industry Forum is an international association
of carriers and major vendors of telecommunications products and services.
The TCIF initiative is focused on the re-use of technical information across
multiple applications and different
environments.</para></listitem></varlistentry></variablelist>
<para>Many SGML applications are in commercial use. Other industries moving
to SGML include pharmaceuticals, publishing, and manufacturing.</para>
<para>Overseas, SGML is gaining wide acceptance. Airbus, a European
consortium of companies in the commercial aircraft industry, has adopted
SGML. Telecommunications, aerospace, manufacturing, and other commercial and
military interests throughout Europe are also using SGML.</para></chapter>
<chapter>
<title>What Is CALS?</title>
<para>CALS stands for Continuous Acquisition and Life-Cycle Support (recently
renamed from Computer-aided Acquisition and Logistic Support). It is a
large-scale, long-term information management project initiated by the U.S.
Department of Defense (DoD). Since the DoD receives goods and services from a
wide range of suppliers, contractors and subcontractors, it constantly
handles massive quantities of technical information. Today's weapon systems
are technologically complex and can have a life span of 20 years or more. As
a result, the amount of technical data needed to support and maintain these
systems is overwhelming.</para>
<para>The CALS standards that apply to maintaining technical information
include: 
<itemizedlist>
<listitem>
<para>MIL-STD-1840: Automated Interchange of Technical Information, the
umbrella standard specifying overall guidelines for electronic data
storage and exchange of CALS documents on magnetic tape.</para></listitem>
<listitem>
<para>MIL-M-28001: SGML (Standard Generalized Markup Language) for exchanging
text.</para></listitem>
<listitem>
<para>MIL-D-28000: IGES (Initial Graphics Exchange Specification), an
object-oriented format for technical drawings.</para></listitem>
<listitem>
<para>MIL-R-28002: CCITT Group 4 (International Telegraph and Telephone
Consultative Committee) for raster images.</para></listitem>
<listitem>
<para>MIL-D-28003: CGM (Computer Graphics Metafile) for object-oriented
graphics.</para></listitem></itemizedlist></para></chapter>
<chapter>
<title>Resources</title>
<para>Here are a few resources for more information on SGML.</para>
<sect1>
<title>Conferences, tutorials, and training</title>
<para>The Graphic Communications Association (GCA) was instrumental in the
development of SGML. The GCA provides conferences, tutorials, newsletters,
and publication sales for both members and non-members. 
<literallayout>Graphic Communications Association
100 Daingerfield Road
Alexandria, Virginia 22314&ndash;2804 USA
+1 703.519.8160</literallayout></para>
<para>SGML Open is a non-profit, international consortium of providers of
SGML products and services dedicated to accelerating the further adoption,
application, and implementation of SGML. 
<literallayout>SGML Open
910 Beaver Grade Road, #3008
Coraopolis, Pennsylvania 15108 USA
+1 412.264.4258</literallayout></para>
<para>ArborText also offers SGML training courses at introductory through
advanced levels, including DTD and FOSI training. For further information on
ArborText's training services, schedules, and course descriptions, please
contact the ArborText Training Team at +1 313.996.3566.</para>
<bridgehead>Books on SGML</bridgehead>
<para>
<citation>SGML: An Author's Guide to the Standard Generalized Markup
Language</citation>, Martin Bryan, Addison-Wesley, 1988, ISBN
0&ndash;201&ndash;17537&ndash;5</para>
<para>
<citation>The SGML Handbook</citation>, Charles Goldfarb, Oxford University
Press, 1990, ISBN 0&ndash;19&ndash;863737&ndash;9</para>
<para>
<citation>Practical SGML</citation>, Eric van Herwijnen, Kluwer Academic
Publishers, 1994, ISBN
0&ndash;7923&ndash;9434&ndash;8</para></sect1></chapter>
<glossary>
<title>Glossary</title>
<glossentry>
<glossterm>ASCII</glossterm>
<glossdef>
<para>(American Standard Code for Information Interchange) This standard
character encoding scheme is used extensively in data
transmission.</para></glossdef></glossentry>
<glossentry>
<glossterm>ANSI</glossterm>
<glossdef>
<para>(American National Standards Institute) This group is the U.S. member
organization that belongs to the ISO, the International Organization for
Standardization.</para></glossdef></glossentry>
<glossentry>
<glossterm>attribute</glossterm>
<glossdef>
<para>An attribute provides more information about an element such as
classification level, unique reference identifiers, or formatting
information.</para></glossdef></glossentry>
<glossentry>
<glossterm>CCITT Group 4</glossterm>
<glossdef>
<para>(International Telegraph and Telephone Consultative Committee) This
CALS standard for raster graphics incorporates tiling, which divides a large
image into smaller tiles. You can exchange graphic files in CCITT/4 format in
a compressed state so they take up much less file
space.</para></glossdef></glossentry>
<glossentry>
<glossterm>CITIS</glossterm>
<glossdef>
<para>(Contractor Integrated Technical Information Service) As part of CALS
Phase II, CITIS is a draft functional specification for services. DoD
acquisition managers designed CITIS as a plan to gain access to
product-related digital technical information.</para></glossdef></glossentry>
<glossentry>
<glossterm>CGM</glossterm>
<glossdef>
<para>(Computer Graphics Metafile) CGM is one of the CALS standard formats
for representing 2&ndash;D technical illustrations. CGM is an object-oriented
graphic format.</para></glossdef></glossentry>
<glossentry>
<glossterm>DSSSL</glossterm>
<glossdef>
<para>(Document Style Semantics and Specification Language) This draft
international standard (DIS 10179) applies to the specification of processing
information for SGML documents. DSSSL is expected to become an international
standard.</para></glossdef></glossentry>
<glossentry>
<glossterm>DTD</glossterm>
<glossdef>
<para>(Document Type Definition) A DTD is the formal definition of the
elements, structures, and rules for marking up a given type of SGML document.
You can store a DTD at the beginning of a document or externally in a
separate file.</para></glossdef></glossentry>
<glossentry>
<glossterm>EDI</glossterm>
<glossdef>
<para>(Electronic Data Interchange) This is a set of computer interchange
standards for business documents such as invoices, bills, and purchase
orders.</para></glossdef></glossentry>
<glossentry>
<glossterm>element</glossterm>
<glossdef>
<para>An element is a structural component of a document, such as a paragraph
or a chapter. An element may contain text, other elements (subelements), or
both.</para></glossdef></glossentry>
<glossentry>
<glossterm>element declaration</glossterm>
<glossdef>
<para>An element declaration is a statement in the DTD that defines an
element and declares which other elements it may contain and in what order
they may appear.</para></glossdef></glossentry>
<glossentry>
<glossterm>entity</glossterm>
<glossdef>
<para>An entity is a self-contained piece of data that can be referenced as a
unit. You can refer to an entity by a symbolic name in the DTD or the
document. An entity can be a string of characters, a symbol character
(unavailable on a standard keyboard), a separate text file, or a separate
graphic file.</para></glossdef></glossentry>
<glossentry>
<glossterm>entity declaration</glossterm>
<glossdef>
<para>An entity declaration is a statement in the DTD or document that assigns
an SGML name to an entity so you can reference it.</para></glossdef></glossentry>
<glossentry>
<glossterm>FOSI</glossterm>
<glossdef>
<para>(Formatting Output Specification Instance) A FOSI is used for
formatting SGML documents for printing and other outputs. It is a separate
file that contains formatting information for each element in a
document.</para></glossdef></glossentry>
<glossentry>
<glossterm>HTML</glossterm>
<glossdef>
<para>(HyperText Markup Language) This is the format of files published on
the World Wide Web. HTML is an application of SGML; to author in HTML using
SGML-based authoring software, you simply need the HTML
DTD.</para></glossdef></glossentry>
<glossentry>
<glossterm>IGES</glossterm>
<glossdef>
<para>(Initial Graphics Exchange Specification) The IGES standard for
engineering, product design, and manufacturing drawings is one of the CALS
standard graphics formats.</para></glossdef></glossentry>
<glossentry>
<glossterm>Internet</glossterm>
<glossdef>
<para>The Internet is a worldwide communications network originally developed
by the U.S. Department of Defense as a distributed system with no single
point of failure. The Internet has seen an explosion in commercial use since
the development of easy-to-use software for accessing the
Internet.</para></glossdef></glossentry>
<glossentry>
<glossterm>ISO</glossterm>
<glossdef>
<para>(International Organization for Standardization) The ISO is an
industry-supported organization that establishes worldwide standards for
everything from data interchange formats to film speed
specifications.</para></glossdef></glossentry>
<glossentry>
<glossterm>markup</glossterm>
<glossdef>
<para>Markup is anything added to the content of the document that describes
the text.</para></glossdef></glossentry>
<glossentry>
<glossterm>parser</glossterm>
<glossdef>
<para>A parser is a specialized software program that recognizes SGML markup
in a document. A parser that reads a DTD and checks and reports on markup
errors is a validating SGML parser. A parser can be built into an SGML editor
to prevent incorrect tagging and to check whether a document contains all the
required elements.</para></glossdef></glossentry>
<glossentry>
<glossterm>PDES/STEP</glossterm>
<glossdef>
<para>(Product Data Exchange Standard/Standard for the Exchange of Product
Model Data). PDES/STEP are standards under development for communicating a
complete product model with sufficient information content that advanced
CAD/CAM applications can interpret. PDES is under development as a national
standard and STEP is under development as its international
counterpart.</para></glossdef></glossentry>
<glossentry>
<glossterm>tag</glossterm>
<glossdef>
<para>In the world of SGML, a tag is a marker embedded in a document that
indicates the purpose or function of the element. Each element has a
beginning tag and an end tag.</para></glossdef></glossentry>
<glossentry>
<glossterm>World Wide Web</glossterm>
<glossdef>
<para>Often referred to as WWW or the Web, this term refers to information
available on the Internet that can be easily accessed with software commonly
called a 
<quote>browser.</quote> Organizations publish their information on the Web in
a format known as HTML; this information is often referred to as their 
<quote>home page</quote> or 
<quote>web site</quote>.</para></glossdef></glossentry></glossary></book>
