
UIEE.com
Universal Information Exchange Environment
UIEE © Version 2.44 Functional Specification
By Thomas A. Sawyer, TAS Software Innovations
© Copyright 1989-2003 Thomas A. Sawyer/TAS
Software Innovations. This specification offers no guarantee
with respect to fitness, accuracy, or suitability for any
application. TAS Software disclaims any responsibility for
liability incurred from the use of the information presented
herein. This specification imparts no rights of reproduction,
redistribution, or proprietary usage, either expressed or
implied. Misrepresentation by any person or company claiming
ownership, authorship, or copyright of any part of this
specification is in violation of Federal Copyright laws, and
will be prosecuted.
The UIEE specification has been
placed in the public domain in the year 2003 to serve as an
authoritative resource for reference and development purposes,
and to provide a common set of rules and standards for
programmers writing compatible assemblers and disassemblers.
This specification is actively maintained. Developers are
directed NOT to make any alterations to this specification in
their proprietary software, since doing so defeats the purpose
of the specification and simply creates problems for everyone.
Instead, developers are strongly encouraged to contact
us directly for answers to questions, to report errors or
omissions, or to issue requests for updates to the
specification.
TAS Software provides a range of
products and services for companies requiring fast and
efficient UIEE assembly and disassembly.
Introduction
This specification
describes the structure and application of UIEE (Universal
Information Exchange Environment), a file format and
protocol for exchanging data between dissimilar information
systems. The format allows both text and object data to be
reconstructed according to the requirements of the receiving
system, while retaining the original integrity of both the
data and the record structure. This specification supports
UIEE version 2.44.
Originally created in 1989, UIEE was
adopted as the standardized means of information exchange by
the out-of-print (OP) book trade in the 1990s, in particular
by Interloc, Bibliofind, Advanced Book Exchange, Alibris,
Amazon, and others. Its adoption has served to remove
virtually all of the problems associated with delimited files,
and has helped to jump-start an important segment of the
Internet economy. Developers are encouraged to use the UIEE
protocol, but only as specified; proprietary alteration of
this specification is expressly prohibited.
The intent
of UIEE is to be able to extract and assemble a complete
record from a user's own database, send the record to a remote
location, then receive the record back again in a form such
that the field information is retained and all components are
present and intact. A properly-defined UIEE record should be
capable of being imported back into the original database such
that all of the record's original content and characteristics
are restored. To this extent, UIEE is self-defining and does
not suffer from the traditional problems resulting from
delimited files, which traditionally rely upon positional
significance or substantial user interaction to define how
data is treated when imported or exported.
This
specification provides a basis for on-going development. UIEE
Version 2.44 presents a protocol for handling both text data
and specialized information treated as embedded objects, with
particular attention paid to embedded images. New rules
govern the ability for files to be embedded directly within
the UIEE file at the sending end, then subsequently extracted
intact at the receiving end without data loss. Automatic
extraction instructions are contained in the text portion of
the UIEE file. There are no restrictions with respect to the
types of files that can be embedded.
UIEE contains data
records whose text fields are pre-defined through
tokens. A token is a two-character code which precedes
a field, separated by a vertical bar ( | ) character. The
complete collection of tokens used for a particular
application is referred to as a Token Set. There are
presently five (5) defined token sets, which can be viewed in
the Tables section.
UIEE Philosophy and
Objectives
Users should not have to be
software engineers to exchange data with one another or with
external systems. The generally accepted approach to
integrating an image into a database is for the address of the
image to serve as the object reference. This could be (for
example) a path on a disk drive, a URL, or even an embedded
bitmap in a rich-text document. This is fine as long as the
data resides on the user's computer, or retains its static
characteristics through compound addressing.
The
problem with this approach is that once the data has left the
user's computer, the addresses change. A remote record is no
longer a mirror image of the original record, but rather a
complex hybrid residing at a remote location, or even multiple
locations. Reassembly becomes very difficult or even
impossible. Each component retains its original identity by
name only because the exchange process itself imposes changes
upon the record structure.
Many applications have been
produced by companies that perform the transmission of both
text and object data. However, records are often treated as
proprietary output in the form of HTML or XML, which may not
be usable by other information providers, and certainly not by
most users. For example, should the user's database be
damaged, requiring a re-import of the original records, there
is a low probability that the record structure will be
preserved by reading an HTML or XML file sent back from the
destination site. Some upload software produced by third-party
companies effectively manages text and images, but in
practice, these are often tied into proprietary processes
handled through a dedicated user login channel. Information is
not "exchanged" as much as "piped" into dedicated locations
with the expectation that data will appear in a certain way,
according to a predefined template.
To compound
matters, users must frequently upload images by hand as
individually-named files despite the software's
sophistication, and this can take a vast amount of time. To do
anything else useful with their records, users must export
data to an intermediary form on their own computer (such as an
Excel spreadsheet or Access database). To say that the process
of HTML-to-text-to-Spreadsheet and vice versa (with associated
images) is cumbersome would be the pinnacle of understatement.
Strangely, this has become the accepted industry norm. People
have gotten used to it because "that's the way it's done" even
though every company does it slightly differently. There is a
distinct lack of standardization.
There are file format
inconsistencies as well. Many of Microsoft's products, for
example, do not strip out delimiter characters from exported
text, making the file impossible to read by another computer.
Excel spreadsheet files, often used as an intermediary medium,
are not foolproof and the steps required to import them are
too complex for many users to follow. Clearly, a better
solution is needed.
It is the humble opinion of this
author that the industry norm of exchanging text on a
delimited basis, and then subsequently handling attachments
(particularly images) on a manual basis as separate files
imposes a cumbersome burden upon both users and the systems
that support their activities. UIEE attempts to provide a
satisfactory alternative by bundling related information in a
form such that the individual components retain their
identities, directly associated with the text forming the
basis of the database records. Aside from the obvious
convenience of not having to deal with multiple elements to
create a single record, the other advantage of this approach
is that the record retains its original characteristics and
can be re-imported at a later date without suffering data loss
or alteration.
It was for these reasons that UIEE was
originally created, and why it has now been updated. It is the
objective of this specification to present a comprehensive set
of standards which will move the development of UIEE forward
to meet the needs of the market in which it is in common use
today, and thus increase the likelihood of productive use by
all who rely upon it to serve their personal and professional
needs.
What's New in UIEE
2.44
UIEE's simple goal is to make life
easier, both for end users and the systems that support their
activities. UIEE 2.44 still provides all of the same text-only
functionality as previously existed, and as such none of the
basic rules have changed. However, new rules have been added
to support different types of information, while
simultaneously retaining downward compatibility with existing
UIEE-compatible systems to the furthest possible
extent.
Particular emphasis has been given to images,
because it is expected that these will be the most
commonly-used objects that will be associated with database
records. That said, it is important to understand that any
type of object can now be embedded in a UIEE file. These
include multimedia files, HTML documents, ZIP files, even
executable files. This is a very powerful
format.
Objects can reside within the UIEE file, or
they can be remotely stored and accessed via embedded HTTP
addresses, or they can be directly retrieved by the remote
server for inclusion in the destination database. Obviously,
the manner in which embedded objects are handled will vary
according to the needs of the systems which receive and
process the information. However, the behavioral aspects of
UIEE have also been updated to provide different situational
control over how an embedded object is treated.
New
Token Sets have been defined to support multiple listing and
sales channels. These are described in the Tables
section.
In addition, new tokens have been added to
each token set to provide more universal support for document
assembly and retrieval. For example, a new Language
field is now a standard component of all UIEE records. The
field provides native support for the ISO 639-x language codes
as defined in the MARC specification. It is hoped the use of
this field will be adopted by the many services who make use
of UIEE, in order to make it easier for both users and systems
to index and retrieve records with different language
representations.
Comparison Of
UIEE File Formats
The text portions of the
old and new UIEE file formats are essentially the same.
Pre-2.44 disassemblers can read 2.44 files, with the exception
that they will ignore the new data that has been added. A
comparison of the old and new formats is shown below:

Each UIEE
2.44 file contains one, two, or three components, always
appearing in this order:
Text data
Pointer data
Binary Data
UIEE files containing embedded objects
will always contain all three components. Should the pointers
contain only URL addresses, then the pointer portion of the
file is present but the binary portion is absent. Should there
be no associated objects, then both the pointer and binary
portions of the file are absent.
By definition, the
embedded pointers are relative to the start of the binary
data, not the start of the UIEE file. This was done to
preserve the ability to edit the text portion of the UIEE data
without disrupting the binary positioning, or to separate the
two and then re-combine them after the fact. It also provides
an absolute means of detecting the binary data boundary,
through the use of the Ctrl-Z character (ASCII 26). This
approach also ensures that old UIEE parsers can always
continue to read UIEE 2.44 files, the only difference being
that any embedded data will be ignored by the old
parser.
UIEE File
Header
The UIEE file header consists of five
lines of text which are common to all versions of
UIEE:
Line 1 — User Id
Line 2 — Token Set
Line 3 — Date (MM-DD-YYYY)
Line 4 — Time (HH:MM:SS)
Line 5 — Blank [CR] [LF]
Line 5 denotes start of
first UIEE text record. Note that blank lines should consist
of a [CR] [LF] sequence. Although other EOL conventions exist
for different operating systems, this sequence is the only one
that satisfies all functional criteria for all systems, and is
therefore recommended.
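The header layout above can be read with a few lines of code. The following is a minimal Python sketch, not a reference implementation. It assumes [CR][LF] line endings throughout, and the function name parse_uiee_header and the user id JSMITH are hypothetical, chosen only for illustration.

```python
from datetime import datetime


def parse_uiee_header(text):
    """Split UIEE text into (header dict, remaining text after the header)."""
    lines = text.split("\r\n")
    if len(lines) < 5:
        raise ValueError("a UIEE header needs five lines")
    user_id, token_set, date_str, time_str, blank = lines[:5]
    if blank != "":
        raise ValueError("line 5 of the header must be blank")
    # Line 3 is MM-DD-YYYY and line 4 is HH:MM:SS.
    stamp = datetime.strptime(date_str + " " + time_str, "%m-%d-%Y %H:%M:%S")
    header = {"user_id": user_id, "token_set": token_set, "timestamp": stamp}
    return header, "\r\n".join(lines[5:])


# Hypothetical sample header followed by the first line of a record.
sample = "JSMITH\r\nBOOKS\r\n03-15-2003\r\n14:30:00\r\n\r\nUR|MYBOOKS000552\r\n"
header, body = parse_uiee_header(sample)
```

The sketch rejects a file whose fifth line is not blank rather than guessing where the first record starts.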
Line 2 represents the Token
Set being used. To preserve downward compatibility with
older UIEE versions, this should normally be the word "BOOKS."
However, in UIEE 2.44, the following expanded token sets are
defined (four standard and one non-standard):
ANTIQUES: Indicates general Antiques or
Collectibles database. Records are parsed for minimum
conformity (generally Record Number, Title, Description,
Price, and Listing Codes). Tokens are based on standard
ANTIQUES field assignments (see Tables
section).
AUCTION: Indicates all records in the
file are auction records and are thus given special parsing to
ensure conformity to auction venue rules. For example, if
directed to eBay then all records must contain a valid eBay
User Id, opening bid price, reserve price, e-mail address,
category code, etc. Tokens are based on standard AUCTION field
assignments (see Tables
section).
BOOKS: Normal default, refers to
general books database, new or used. Records are parsed only
for minimum conformity (generally Record Number, Title, Price,
and Listing Codes). Tokens are based on standard BOOKS field
assignments (see Tables
section).
RETAIL: Indicates general retail
merchandise. Records are parsed for minimum conformity
(generally Record Number, Title, Price, and Listing Codes).
Records in this category are intended for inclusion in retail
databases, and as such must specify a pointer to one or more
destinations (such as Half.com and similar vendors). Tokens
are based on standard RETAIL field assignments (see Tables
section).
CUSTOM: Indicates self-defining usage.
No parsing is performed. All tokens are alphanumeric and field
assignments are based on the requirements of the destination
database. Normally only used for test purposes or for special
cases requiring unusual or proprietary field assignments (see
Tables
section).
Thus, there are five distinct sets of
UIEE tokens currently defined, each set associated with a
particular usage. Token parsing is also different for each
usage. This is an important consideration when making a design
decision with respect to how to handle the disassembled files
at the server side.
For production environments, the
recommended procedure is to establish an Identity File
and associate it with a dedicated program designed to handle
all UIEE functionality, such as TAS Software's UIEE
2.44 Distributor or other program that both disassembles
and distributes files according to the needs of the receiving
system. The Identity File tells the disassembler what to do
with the information it extracts and the locations in which it
should reside. This file also provides the system
administrator with a direct means of configuring each user's
requirements to support the system(s) in which the information
will be applied. An Identity File is similar to an INF file
and is described later in this
specification.
Basic UIEE Text
Record Structure
Each UIEE text record
contains a variable number of lines, each of which is preceded
by a two-character prefix and a vertical bar or "pipe" symbol
( | ). A blank line [CR] [LF] delimits the end of each text
record (see Prefix Code table for specific usage of each
field). Thus, the basic record structure is:
TD|(text data)
TD|(text data)
TD|(text data)
:
:
TD|(text data)
[CR][LF]
where "TD" is any
2-character token representing text data. Any text appearing
on a separate line that does NOT contain a pipe symbol in the
3rd character position is ignored. Thus, any
non-tokenized text, regardless of where it occurs, is treated
as a remark only and not part of the UIEE text data. For
example:
TD|(text data)
TD|(text data)
This line will be ignored
TD|(text data)
:
:
TD|(text data)
[CR][LF]
A consecutive token
appearing on the next line following the same token is treated
as an extension of the previous line, separated by a space
character. In the example below, a Comments field appears in
two lines:
TD|(text data)
TD|(text data)
NC|This line contains text normally
NC|appearing in the Comments field
TD|(text data)
:
:
TD|(text data)
[CR][LF]
A common mistake made
when constructing a UIEE text record is to wrap text to the
next line without including the identifying token. Thus, the
following example would not be read correctly and only the
first five words in the Comments field would be parsed because
the 2nd token was missing:
TD|(text data)
TD|(text data)
NC|This line contains text normally
appearing in the Comments field
TD|(text data)
:
:
TD|(text data)
[CR][LF]
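The three parsing rules described in this section (a pipe in the 3rd character position, remarks ignored, consecutive identical tokens joined by a space) can be sketched in Python as follows. This is an illustrative sketch only. The function name parse_record is hypothetical, and it assumes that an ignored remark line does not break a continuation, a case this specification does not address.

```python
def parse_record(lines):
    """Return a list of (token, value) pairs for one UIEE text record."""
    fields = []
    for line in lines:
        # A data line has a two-character token followed by a pipe in the
        # 3rd character position; anything else is treated as a remark.
        if len(line) < 3 or line[2] != "|":
            continue
        token, value = line[:2], line[3:]
        if fields and fields[-1][0] == token:
            # Same token as the previous data line: extend that field,
            # separated by a single space.
            fields[-1] = (token, fields[-1][1] + " " + value)
        else:
            fields.append((token, value))
    return fields


record = [
    "TI|Sample Title",
    "This line will be ignored",
    "NC|This line contains text normally",
    "NC|appearing in the Comments field",
]
parsed = parse_record(record)

# The "common mistake": the wrapped line has no token and is dropped.
broken = parse_record(["NC|This line contains text normally",
                       "appearing in the Comments field"])
```

In the second call only the first five words of the Comments field survive, which is the failure described above.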
Line Lengths and Wrapping
Lines can
be of any length. However, it is recommended that the software
creating the UIEE files should wrap the text such that it can
be easily viewed in any editor for debugging purposes, as
shown below:
TD|(text data)
TD|(text data)
NC|Four score and seven years ago our fathers brought
NC|forth on this continent a new nation, conceived in
NC|Liberty, and dedicated to the proposition that all
NC|men are created equal.
TD|(text data)
:
:
TD|(text data)
[CR][LF]
The recommended wrap
length is 70 characters, but it can occur at any desired line
length. Wrapping should always occur at the point a space
character (ASCII 32) appears in the text. If no space
character is present, the line should not be
wrapped.
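A wrapping routine that follows these recommendations might look like the Python sketch below. It uses the standard textwrap module and breaks only at spaces. The function name wrap_field is hypothetical, and the sketch assumes the 70-character length applies to the field text, not counting the three-character token prefix.

```python
import textwrap


def wrap_field(token, text, width=70):
    """Return token-prefixed lines, wrapping the text only at spaces."""
    parts = textwrap.wrap(
        text,
        width=width,
        break_long_words=False,   # never split a run that has no space
        break_on_hyphens=False,   # spaces are the only wrap points
    ) or [""]
    # Every wrapped line repeats the token so the parser can rejoin them.
    return [token + "|" + part for part in parts]


speech = ("Four score and seven years ago our fathers brought forth on this "
          "continent a new nation, conceived in Liberty, and dedicated to "
          "the proposition that all men are created equal.")
wrapped = wrap_field("NC", speech, width=50)
unbroken = wrap_field("NC", "x" * 100)
```

Joining the wrapped values with single spaces reproduces the original text, provided the original contains only single spaces between words.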
Text Record Parsing
and Field Sequencing
Each token set contains
certain elements which must always be present. For the
purposes of this specification, the BOOKS token set is used to
serve as examples. However, it is important to keep in mind
that the rules governing token sets are specific to the
particular set being used, and the parsing rules are slightly
different for each set (see Tables
section).
For example, in the BOOKS set, the following
tokens must always appear, and should appear in this
order:
UR or RE  User Record Number (UR preferred, RE supported for compatibility).
TI  Book Title
(remaining fields appear in any order)
PR  Price (parsed in For-Sale records only).
XA  Lifespan
XB  Action Code
XC  Family Code
XD  Database Code
An example of a
text record which follows this format is shown below:
UR|MYBOOKS000552
TI|The Missions of New Mexico, 1776
AA|Adams, Eleanor B. and Chavez, Angelico
CN|Very fine w/fine dj
PP|Albuquerque
DP|1956
NC|Well-preserved, excellent binding, good color in dj.
MT|American History
KE|Architecture
KE|New Mexico
KE|Religion
KE|Western Americana
LG|eng
WT|18
PR|145.00
XA|4
XB|1
XC|BO
XD|S
[CR][LF]
In
the above example, the UIEE record starts with the UR (User
Record Number) tag. This is the standard convention.
Subsequently, the remaining fields can appear up to the first
listing code field (XA) in any order. Thus, it is always
preferable to place the record number first and the listing
codes last for easier inspection and validation.
The
listing codes for this particular record indicate that it has
an unlimited lifespan (XA=4), it is a new or replacement
record (XB=1), is a member of the "Books" family (XC=BO) and
is to be entered in the For-Sale database (XD=S).
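A receiving system could check for the mandatory BOOKS tokens along the lines of the Python sketch below. It is illustrative only. The function name missing_books_tokens is hypothetical, and the sketch reads "parsed in For-Sale records only" to mean that PR is required only when XD is S, which is an interpretation rather than a stated rule.

```python
REQUIRED_BOOKS = ("TI", "XA", "XB", "XC", "XD")


def missing_books_tokens(fields):
    """Return the mandatory BOOKS tokens absent from (token, value) pairs."""
    tokens = dict(fields)
    missing = []
    # Either UR (preferred) or RE satisfies the record-number requirement.
    if "UR" not in tokens and "RE" not in tokens:
        missing.append("UR")
    missing += [t for t in REQUIRED_BOOKS if t not in tokens]
    # Assumption: a price is only required for For-Sale records (XD=S).
    if tokens.get("XD") == "S" and "PR" not in tokens:
        missing.append("PR")
    return missing


record = [("UR", "MYBOOKS000552"), ("TI", "The Missions of New Mexico, 1776"),
          ("PR", "145.00"), ("XA", "4"), ("XB", "1"), ("XC", "BO"), ("XD", "S")]
complete = missing_books_tokens(record)
no_price = missing_books_tokens([f for f in record if f[0] != "PR"])
```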
In
UIEE 2.44, this record may have objects associated with it.
However, note that there is no such indication at this point.
It is not until the disassembler encounters a PD (pointer
data) token that this fact becomes relevant. Hence, downward
compatibility is preserved for older UIEE parsers — the
remaining data appearing at the end of the file can be
discarded if the receiving system is not capable of
disassembling it.
Note: This example also
contains a Language (LG) field. The Language field has always
been a part of the UIEE specification, but it has been treated
as optional in the past. However, the 2.44 specification
strongly recommends that a Language field should always
be included as part of every UIEE record, whether or
not the receiving system will make use of it. The code
contained in this field corresponds to the ISO-639/2
List of Recommended Language Identifiers, as per the MARC
specification. Either the ISO-639-1 (two-character codes) or
ISO-639-2 (three-character codes) may be used, with preference
given to the ISO-639-2/T three-character code set (see Tables
section).
Allowed ASCII
Characters
The only ASCII codes for which
specific restrictions exist are control characters (ASCII 0
through 31). These cannot appear as part of any text field. A
properly-constructed UIEE parser will strip these characters
from text before passing it along as output. In addition, it
is recommended that any spurious pipe ( | ) symbols appearing
in text should be stripped as well, to avoid possible problems
downstream.
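The stripping described above amounts to a simple filter. The Python sketch below removes control characters (ASCII 0 through 31) and pipe symbols from a field value. The function name clean_field_text is hypothetical.

```python
def clean_field_text(text):
    """Drop control characters (ASCII 0-31) and pipe symbols from a field."""
    return "".join(ch for ch in text if ord(ch) >= 32 and ch != "|")


cleaned = clean_field_text("Very fine\x07 w/fine | dj")
```

Characters are removed, not replaced, so a stripped pipe that sat between two spaces leaves both spaces behind.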
Generally speaking, it is recommended that
only ASCII 32 through ASCII 127 should appear as part of UIEE
text records, as these are the only characters having
universal recognition. That said, upper-order ASCII codes
(128-254) may be used and there is no specific restriction to
their use. However, in most character sets, these codes
correspond to foreign language characters or graphical
symbols, which are loosely defined as any ASCII code greater
than 127. Many different symbols can appear for these codes.
Although you are able to insert these characters in your
records, they may not display or print as you intend, and for
this reason they are not recommended for generic
use.
HTML in UIEE Text
Fields
In general, the use of HTML in UIEE
text fields is strongly discouraged. By definition, a text
field is expected to contain only the text portion of a
record. Markup languages like HTML further augment the text
portion of a record through extended functionality, but such
functionality is not appropriate in most cases for data
regarded as input by downstream systems. For example, various
search engines parse HTML data differently and adding HTML to
text fields may disrupt the ability for the text to be found.
In addition, certain fields may have clearly-defined length
limits and HTML greatly lengthens the text in any field.
Indeed, there are many reasons why HTML should not be included
in a text-only field.
However, UIEE does support the
presence of HTML and in fact is transparent to it. Therefore,
HTML can be included directly in UIEE text fields, but care
must be exercised to ensure the HTML tags are properly
formatted such that they will be reassembled correctly further
downstream. In particular, there can be no spurious line
breaks and each complete field should be capable of direct
HTML parsing. For example, the record in the previous example
could appear as:
TD|(text data)
TD|(text data)
NC|{font face=arial size=2}Four score and seven years
NC|ago our fathers brought forth on this continent a new
NC|nation, conceived in Liberty, and dedicated to the
NC|proposition that all men are created equal.{/font}
TD|(text data)
:
:
TD|(text data)
[CR][LF]
Note: Because
every system is slightly different and business needs vary
considerably, embedded HTML in the text portion of a UIEE
record might not be reassembled at the receiving end in the
same sequence in which it originated. A lot can happen further
downstream when UIEE text data is applied to a system. UIEE
assumes that each individual text field is a portion of a
larger record, and hence they can appear in any order. Thus,
in the example above, the HTML tags would probably not
reassemble properly if they were in separate fields, even
if they appeared that way in the original record.
Hence, the general rule is that if a UIEE file is to
contain embedded HTML tags in one or more text fields, the
tags must always be specific to the field(s) in which they
appear, and the HTML must stand on its own after reassembly.
Document tags, such as {HTML} and {BODY}, should never be
included in the text portion of a UIEE record. Such tags
should only appear in complete HTML document objects
referenced as binary data, not as part of individual UIEE text
data fields.
Finally, if HTML data exists in a database
field, in general it should NOT be stripped by the sending
software. Doing so defeats the objective of intact data
exchange. Rather, it is the responsibility of the receiving
system to perform any stripping or preconditioning such that
the field data conforms to the system
requirements.
HTML
Representations
If HTML is to be used, then
there are several characters commonly used in the text portion
of a UIEE file whose representations are preferred as HTML
rather than their ASCII literal codes. In particular, the
double quote ("), the ampersand (&), the less-than (<), and
greater-than (>) symbols should all be represented as their
respective HTML equivalents. In this context, a literal ASCII
value representation (such as &#38;) is preferred over a tag
representation. This convention serves
to prevent any erroneous or conflicting HTML parsing that
might occur if formatting tags are present in the text portion
of the record.
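An assembler option for this conversion could be sketched in Python as below. The sketch takes "literal ASCII value representation" to mean a numeric character reference built from the character's ASCII code, which is an assumption about the intended form. The function name to_html_codes is hypothetical.

```python
# Numeric references built from the ASCII codes 34, 38, 60 and 62.
_HTML_CODES = str.maketrans({
    '"': "&#34;",
    "&": "&#38;",
    "<": "&#60;",
    ">": "&#62;",
})


def to_html_codes(text):
    """Replace the four reserved characters in a single pass."""
    return text.translate(_HTML_CODES)


converted = to_html_codes('Rugs & Carpets <"new">')
```

Because the translation is done in one pass, the ampersands introduced by the replacements are not converted a second time. The routine should only be applied to literal text, since it would also convert the angle brackets of any intentional formatting tags.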
Note: A standard UIEE parser
performs no direct translations of HTML tags and merely passes
along the original text to the receiving system. Although most
users do not enter HTML tags in text fields in their
databases, there may exist software options to convert
characters to HTML representations when assembling the UIEE
file. Once again, it is up to the receiving system to
translate these tags if required, and it is up to the import
software to reverse-translate them should this be needed.
Nevertheless, in both cases, there is a preference for the
presence of unambiguous HTML representations, especially if
other HTML tags are present and the text will be used as input
for an Internet-based database system.
UIEE Listing
Codes
Listing codes tell the receiving
database what to do with each UIEE record when it is received.
There are four listing codes common to all UIEE records,
regardless of the token set used, as shown in the following
example:
XA|4
XB|1
XC|BO
XD|S
Generally,
listing codes should appear as the last four codes in a UIEE
record. All four listing codes must be present in every UIEE
record. As a general parsing rule, any record not containing
all four listing codes or for which any one code is invalid
will be rejected by the receiving system.
XA - Lifespan Code
The
Lifespan Code defines how long a record should reside in the
destination database. For all token sets except AUCTION, this
code is a single digit from 0 to 4. The default is XA|4
(unlimited). The XA prefix must be followed by a single digit,
as below:
0  30 days
1  90 days
2  180 days
3  1 year
4  Unlimited
For the AUCTION token set,
the lifespan code defines the number of days the auction
should run, and may be up to three digits in length. Typical
values are 3, 5, 7, and 10, but any value greater than zero
and less than 999 will be accepted by a UIEE
parser.
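The two interpretations of the XA value can be sketched in Python as follows. This is a hedged illustration. The function name lifespan_days is hypothetical, "1 year" is taken to be 365 days, and None stands for an unlimited lifespan.

```python
# Standard (non-AUCTION) lifespan codes; None means unlimited.
LIFESPAN_DAYS = {"0": 30, "1": 90, "2": 180, "3": 365, "4": None}


def lifespan_days(code, token_set="BOOKS"):
    """Translate an XA value into a number of days (None = unlimited)."""
    if token_set == "AUCTION":
        # Up to three digits, greater than zero and less than 999.
        if (code.isascii() and code.isdigit() and len(code) <= 3
                and 0 < int(code) < 999):
            return int(code)
        raise ValueError("invalid AUCTION lifespan: " + repr(code))
    if code in LIFESPAN_DAYS:
        return LIFESPAN_DAYS[code]
    raise ValueError("invalid lifespan code: " + repr(code))


unlimited = lifespan_days("4")
one_week_auction = lifespan_days("7", token_set="AUCTION")
```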
XB - Action
Code
The Action Code tells the receiving
system what action to take when a record is received, and
provides a visual indication within the record itself of the
current working status. There is no default; the action must
be explicitly set. The XB prefix must be followed by a single
digit, as follows:
1 - List This Record: This is
the normal (default) code. It indicates the receiving database
should treat the record as a new or replacement
record.
2 - Sold: Remove: Indicates the item has
been sold and that the record should be removed from the
receiving database. In some systems, this action will generate
a Realization record indicative of the sale.
3 -
Acquired: Remove: Intended mainly for Wants, indicates the
desired item has been acquired and that the record should be
removed from the receiving database.
4 - Traded:
Remove: Indicates a trade for the item has been concluded,
and that the record should be removed from the receiving
database.
5 - Listing Withdrawn: This code
simply removes the on-line listing. No Realization record
should be generated.
6 - Do Not List: This code
indicates that the record should not be listed under any
circumstances. The sending software should normally prevent
the inclusion of these records in the UIEE file, but the UIEE
parser at the receiving end must be able to detect and discard
them should any appear. Does NOT result in a removal from the
receiving database.
7 - On Hold: Indicates a
buyer has inquired about the item but that the transaction has
not yet been concluded, and that the record should be removed
from the receiving database.
XC - Family Code
The Family Code
specifies the broad classification under which the record
falls. This code is used primarily by receiving databases to
limit search results to specific types of records. There is no
default; the family code must be explicitly set. The XC prefix
must be followed by a two-character code, the contents of
which are specific to the token set used by the UIEE file:
BOOKS
BO - Books General
AU - Autographs
EB - Electronic Book
EP - Ephemera
FC - Facsimiles
LE - Letters
MS - Manuscripts
MP - Maps
MT - Miniatures
PA - Pamphlets/Offprints
PH - Photographs
PO - Posters
SI - Serial Issues
SR - Serial Runs
SV - Serial Volumes
TC - Trade Catalogs
UN - Undefined
ANTIQUES
AG - General Merchandise (defined by category token)
AL - Architectural
AQ - Antiquities
AS - Asian
BO - Books
DE - Decorative Arts
EH - Ethnographic
FN - Furniture
MP - Maps
MR - Maritime
MU - Musical Instruments
PM - Primitives
RG - Rugs & Carpets
SC - Scientific
SL - Silver
TX - Textiles
UN - Undefined
AUCTION
AG - General Merchandise (defined by category token)
AN - Antiques
AR - Art
AV - Automotive
BA - Baby
BD - Building Materials
BO - Books
BU - Business
CA - Cameras
CG - Clothing
CH - Charity
CL - Collectibles
CP - Computers
DL - Dolls
DV - DVD's
EL - Electronics
FD - Food
GI - Gifts
GL - Glassware
HE - Health
HO - Hobbies
HF - Home Furnishings
HI - Home Improvement
JE - Jewelry
MO - Motorcycle
MU - Music
NU - Numismatic
OF - Office
PE - Pets
PL - Philatelic
PR - Professional
PT - Pottery
RS - Real Estate
SP - Sporting Goods
TI - Tickets
TV - Travel
TY - Toys
UN - Undefined
VG - Video Games
RETAIL
AG - General Merchandise (defined by category token)
IR - In-Store Merchandise
NR - Consignment Merchandise
PS - POS Merchandise
RR - Remaindered Merchandise
WR - Warehouse Merchandise
UN - Undefined
CUSTOM
Any defined family code
from any other token set may be used as a CUSTOM family
code.
XD -
Database Code
The Database Code tells the
receiving system in which database to store the uploaded
record. The default is XD|S (For Sale). The XD prefix must be
followed by a single character, as below:
S - For-Sale Database
W - Wants Database
T - For-Trade Database
M - Remainders Database
R - Realizations Database
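The rejection rule for listing codes (all four present, each one valid) can be sketched in Python as below. The sketch is illustrative and covers the non-AUCTION case only, where XA is a single digit from 0 to 4. The function name listing_code_errors is hypothetical, and only the BOOKS family codes are built in. A caller would pass a different set for other token sets.

```python
BOOKS_FAMILY = {"BO", "AU", "EB", "EP", "FC", "LE", "MS", "MP", "MT",
                "PA", "PH", "PO", "SI", "SR", "SV", "TC", "UN"}


def listing_code_errors(codes, family_codes=BOOKS_FAMILY):
    """Return a list of problems; an empty list means the codes are valid."""
    missing = [t + " missing" for t in ("XA", "XB", "XC", "XD")
               if t not in codes]
    if missing:
        return missing
    errors = []
    if codes["XA"] not in set("01234"):      # lifespan, non-AUCTION sets
        errors.append("XA invalid")
    if codes["XB"] not in set("1234567"):    # action code
        errors.append("XB invalid")
    if codes["XC"] not in family_codes:      # family code for the token set
        errors.append("XC invalid")
    if codes["XD"] not in set("SWTMR"):      # database code
        errors.append("XD invalid")
    return errors


ok = listing_code_errors({"XA": "4", "XB": "1", "XC": "BO", "XD": "S"})
bad = listing_code_errors({"XA": "4", "XB": "9", "XC": "BO", "XD": "S"})
```

A record for which the function returns a non-empty list would be rejected by the receiving system.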
Pointer Data
In UIEE 2.44, text
records are essentially the same as in older UIEE versions,
but attached object data is stored in binary format at the end
of the UIEE file. Specific pointers to the binary data appear
at the end of the text portion of the UIEE file, just before
the binary data begins. It is therefore possible to "view" a
UIEE 2.44 file right up to the point where the text data ends
and the binary data begins. This is an extremely useful side
benefit for troubleshooting or validating UIEE records. It
also permits older UIEE parsers to read UIEE 2.44 files
without suffering data loss.
Pointer data is
represented in UIEE 2.44 by a single token, PD, which
always appears five (5) times for each associated object.
The 1st token contains the User Record Number
with which the object is associated (represented in the text
portion of the UIEE record by UR or RE).
The 2nd token represents the literal name of
the object (without path data).
The 3rd and 4th tokens represent the
start and end bytes of the embedded binary object.
The 5th and last token is a disposition
code, which contains information about how to handle the
embedded object after it has been extracted.
The end of
the pointer data is indicated by the presence of a [CR] [LF]
sequence. Multiple representations will appear one after the
other, in the same sequence in which the text records appear.
The end of all pointer data is represented by a single ASCII
26 (Ctrl-Z) character. The next byte is the first byte of the
binary data.
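Because a single ASCII 26 byte separates the pointer data from the binary data, a disassembler can find the boundary with one scan. The following is a minimal Python sketch, not a reference implementation. It assumes that the first Ctrl-Z in the file is the terminator (that is, the text portion never contains ASCII 26), and the function name split_uiee is hypothetical.

```python
CTRL_Z = b"\x1a"  # ASCII 26, marks the end of all pointer data


def split_uiee(data: bytes):
    """Split raw UIEE 2.44 bytes into (text_portion, binary_portion).

    Assumption: the first Ctrl-Z found is the end-of-pointer-data marker.
    """
    marker = data.find(CTRL_Z)
    if marker == -1:
        # No marker found: treat the whole file as text with no
        # embedded objects (an assumption, not stated in the spec).
        return data, b""
    # The byte immediately after Ctrl-Z is the first byte of binary data.
    return data[:marker], data[marker + 1:]
```

Only the first marker is significant in this sketch, so any ASCII 26 bytes that occur inside the binary data are left untouched.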
An example of a complete UIEE 2.44 file
containing a single record with pointers to a single embedded
object is shown below:

From this
example, it can be seen that the binary pointers 0 and
23691 are offsets measured from the start of the binary
data portion of the UIEE file only, not the start of the UIEE
file itself. To determine the size of the embedded object,
subtract the start pointer from the end pointer and add 1. In
the above example, the physical size in
bytes of the embedded JPEG file "mybook001234a.jpg" would be
(23691 - 0) + 1 = 23,692 bytes.
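The size arithmetic and the zero-based, inclusive pointers can be illustrated with a short Python sketch. The function names object_size and extract_object are hypothetical, and the sketch assumes the binary portion of the file has already been read into memory as bytes.

```python
def object_size(start: int, end: int) -> int:
    """Size in bytes of an object whose pointers are inclusive."""
    return end - start + 1


def extract_object(binary: bytes, start: int, end: int) -> bytes:
    """Return the embedded object located at start..end (inclusive).

    Offsets are relative to the start of the binary portion,
    not to the start of the UIEE file.
    """
    if start < 0 or end < start or end >= len(binary):
        raise ValueError("pointers fall outside the binary data")
    # Python slices exclude the upper bound, hence end + 1.
    return binary[start:end + 1]
```

For the pointers in the example, object_size(0, 23691) gives 23692.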
Pointer Data Token Rules and
Restrictions
As previously stated, if
pointer data is present, it is represented by five (5) tokens
for each record. Multiple object representations will appear
one after the other, in the same sequence in which the records
appeared in the text portion of the UIEE file. The following
summarizes pointer token usage for each object:
Token 1 Contains the User Record Number
with which the object is associated (represented in the text
portion of the UIEE record by UR or RE). This should always be
a unique value and should normally be assigned automatically
by the software creating the UIEE file. The value in the first
PD field must be a precise replica of the value in the UR or
RE field in the text portion of the UIEE record with which the
object is associated. The only characters allowed in this
field are A-Z, 0-9, and an underscore (_).
Record numbers should always appear in UPPER
CASE.
Token 2 Represents the name of the
object. In the case of an embedded file, this is the file name
without path data. In the case of a URL, this must be a
complete HTTP address. Long file names are allowed, but are
discouraged as they add ambiguity to the parsing process.
Regardless of the content, all data appearing after the pipe
symbol is assumed to be a part of the file name or URL. If the
information in the 2nd token is a URL rather than
an embedded object, then both the 3rd and
4th tokens are set to zero (see
below).
Tokens 3 and 4 Represent the
start and end bytes of the embedded binary object. Binary
objects have an option base of zero (0). For example, a file
having an embedded length of 1000 bytes must be SEEK'ed from
byte position zero through 999 to be retrieved intact. Thus,
the first pointer in every data set is an offset to position
zero relative to the start of the binary portion of the UIEE
file. To determine the file size, subtract one from the other
and add 1. If both tokens are set to zero, then no extraction
takes place.
Token 5 Contains the
disposition code, which defines how to handle the
embedded object after it has been extracted. The default is
zero (0) which means that the object is either new or a
replacement and that further action is entirely dependent upon
the operation of the system into which the object is being
placed. The possible disposition codes are:
0 - (default) Object is an embedded, named file; server
extracts and stores or forwards to destination as part of
record.
1 - Object is represented by a static URL; server does not
retrieve, only URL is passed as part of record.
2 - Object is represented by a static URL; server retrieves
object and stores or forwards to destination as part of
record.
3 - Object not embedded; remove relationship from record.
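The five-token rules above can be collected into a small validating parser. This is a hedged Python sketch: the names Disposition, Pointer and parse_pointer are hypothetical, and the function assumes the caller has already stripped the PD| prefix from each of the five token values.

```python
import re
from dataclasses import dataclass
from enum import IntEnum


class Disposition(IntEnum):
    EMBEDDED_FILE = 0      # extract and store or forward
    URL_PASS_THROUGH = 1   # pass the URL only, do not retrieve
    URL_RETRIEVE = 2       # retrieve the object behind the URL
    REMOVE = 3             # remove the relationship from the record


# Token 1 may contain only A-Z, 0-9 and underscore.
RECORD_NUMBER = re.compile(r"[A-Z0-9_]+")


@dataclass
class Pointer:
    record: str
    name: str
    start: int
    end: int
    disposition: Disposition

    @property
    def needs_extraction(self) -> bool:
        # Both pointers set to zero means no extraction takes place.
        return not (self.start == 0 and self.end == 0)


def parse_pointer(values):
    """Build a Pointer from the five PD token values of one object."""
    if len(values) != 5:
        raise ValueError("expected exactly five PD tokens")
    record, name, start, end, disposition = values
    if not RECORD_NUMBER.fullmatch(record):
        raise ValueError("invalid User Record Number: " + record)
    # Disposition() raises ValueError for codes outside 0-3.
    return Pointer(record, name, int(start), int(end),
                   Disposition(int(disposition)))
```

Rejecting lower-case record numbers outright is a design choice of this sketch; a more lenient receiver might instead convert them to upper case.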
Multiple
Object Identifiers
There is no limit to the
number of objects that can be associated with one UIEE text
record. In cases of multiple associations with a single text
record, each pointer data set carries the same User Record
Number (UR). These appear as separate UIEE definitions in
sequence, as shown in the example below:
PD|MYBOOK001234 PD|frontcover001234.jpg PD|0 PD|36449 PD|0
PD|MYBOOK001234 PD|backcover001234.jpg PD|36450 PD|70831 PD|0
PD|MYBOOK001234 PD|illustr001234.jpg PD|70832 PD|99002 PD|0
Note
that the sequence of multiple pointers carries intrinsic
meaning. By default, the sequence of appearance should be
interpreted by the receiving system as the same sequence of
intended end usage. For example, in the case of images, the
appearance of image 1, image 2, image 3, etc should coincide
with their final intended sequence of appearance in a database
or document. Obviously this will be overridden if the objects
are explicitly referenced by name further downstream, but in
the absence of an explicit sequence, the order of appearance
in the UIEE file represents a default sequence. Hence, the
UIEE software assembler should reflect the end usage sequence
of the objects being embedded in the UIEE
file.
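A receiving system can preserve this default sequence by grouping object names under their User Record Number in order of appearance. The Python sketch below is illustrative only: the function name group_objects is hypothetical, and it assumes that the literal text "PD|" never occurs inside a file name or URL, so that the pointer text can be split on that marker.

```python
def group_objects(pointer_text: str):
    """Map each User Record Number to its object names, in file order."""
    # Everything between one "PD|" marker and the next is one token value.
    values = [v.strip() for v in pointer_text.split("PD|")[1:]]
    if len(values) % 5:
        raise ValueError("pointer data must come in sets of five PD tokens")
    groups = {}  # dicts keep insertion order in Python 3.7+
    for i in range(0, len(values), 5):
        record, name = values[i], values[i + 1]
        groups.setdefault(record, []).append(name)
    return groups


EXAMPLE = (
    "PD|MYBOOK001234 PD|frontcover001234.jpg PD|0 PD|36449 PD|0\n"
    "PD|MYBOOK001234 PD|backcover001234.jpg PD|36450 PD|70831 PD|0\n"
    "PD|MYBOOK001234 PD|illustr001234.jpg PD|70832 PD|99002 PD|0\n"
)
print(group_objects(EXAMPLE))
```

The list stored for MYBOOK001234 keeps the front cover, back cover and illustration in that order, which is the default end usage sequence.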
Uniform Resource
Locators (URL's) as Embedded Objects
An
embedded object in a UIEE file does not have to be a physical
file. An object can be declared by reference through the use
of a URL. In such a case, the file name in the 2nd PD token is
replaced by a URL corresponding to the location at which the
object resides. If a URL is present, then the 3rd and 4th PD
tokens are set to zero to indicate that no extraction is
required. However, the manner in which the URL is handled is
determined by the disposition code in the 5th PD
token.
The disposition code determines how the server
should handle the URL in a manner complementary to that of an
embedded physical file. Depending upon the degree of UIEE
interoperability of the receiving system, URL's may require
the receiving server to perform additional work. Disposition
codes are defined in the Tables
section.
Extracted URL's are always created as Internet
shortcut files with a .LNK file suffix. See UIEE Identity
Files below for additional information about how URL's and
other embedded objects can be handled by the receiving
system.
UIEE Identity
Files
An Identity File is not a part of a
UIEE file, but is rather an optional file entity used to tell
the assembling and/or disassembling software how to handle
embedded UIEE components. Identity files are companion files
to the use of UIEE and are not required, but are nevertheless
presented in the context of this specification as a simple,
standardized means of configuring sending software and
receiving systems. In addition, an Identity File is able to
preserve a user's configuration preferences at the sending end
and can serve as a basis for self-configuration, debugging,
and troubleshooting.
An Identity File defines the
existence of one or more users and tells the sending software
or receiving system how to handle each user's data. This is
especially useful for multiple-user systems requiring specific
handling and redirection of extracted data to meet the
system's operational needs. Each record defines an individual
user in a program or system, consisting of simple flat text
with a format similar to an INF file.
Each Identity
File record entry consists of several lines, each beginning
with a three-character identifier followed by an equals (=)
sign and the text:
UID = [User Id]
TFD = [Destination path for extracted UIEE text]
TFF = [File format of extracted UIEE text]
TFN = [Naming convention for extracted UIEE text]
OFD = [Destination path for extracted objects]
OFN = [Naming convention for extracted objects]
User Id: (UID)
This is the normal login ID of the user.
Text File
Destination: (TFD) This is generally a named subdirectory
in a system, or a common directory requiring explicit naming
of the UIEE file. Relative path names should not be used
unless system security requires it. Otherwise, by default the
entry should correspond to a complete path stemmed from the
root directory.
Text File Format: (TFF) This
defines the format of the extracted text. "UIEE" is the
default (the assumption being that the most common application
is for a redirector to handle importing extracted UIEE text
further downstream). If the "DELIMITED" option is used, then
there must be an accompanying extraction template which
defines how the UIEE fields are mapped to the delimited file,
what delimiter character(s) are used, etc.
Text File
Naming Convention: (TFN) There are three possibilities:
"LITERAL," "INCREMENTAL," and "RANDOM." The literal option means
that output files have the same names as the original UIEE
files. The incremental option means that output files are
given sequential names by the disassembler. The random option
means that files are given random names.
Object File
Destination: (OFD) This is generally a named subdirectory
in a system, or a common directory requiring explicit naming
of the UIEE file. Relative path names should not be used
unless system security requires it. Otherwise, by default the
entry should correspond to a complete path stemmed from the
root directory.
Object File Naming Convention:
(OFN) Generally, extracted objects should retain their literal
names and so the use of "LITERAL" is highly recommended. The
"INCREMENTAL" and "RANDOM" options can be used, but they may
produce undesirable results if other UIEE components reference
an object's name literally within other objects (such as an
HTML file). The choice is a function of the needs of the
receiving system and how it handles object data further
downstream.
An example of a typical Identity File
record is shown below:
UID = MYUSERID
TFD = \System\Users\MYUSERID\Data
TFF = UIEE
TFN = LITERAL
OFD = \System\Users\MYUSERID\Objects
OFN = LITERAL
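Reading an Identity File amounts to splitting each line at the first equals sign. The Python sketch below is one possible approach under stated assumptions: each entry sits on its own line, a UID line begins a new user record, and a missing TFF falls back to "UIEE", the documented default. The function name parse_identity_file and the sample text are hypothetical.

```python
def parse_identity_file(text: str):
    """Return a list of dicts, one per user record."""
    records = []
    for raw in text.splitlines():
        if "=" not in raw:
            continue  # skip blank or malformed lines
        key, value = (part.strip() for part in raw.split("=", 1))
        # Assumption: every user record starts with its UID line.
        if key == "UID" or not records:
            records.append({})
        records[-1][key] = value
    for record in records:
        record.setdefault("TFF", "UIEE")  # "UIEE" is the default format
    return records


SAMPLE = r"""
UID = MYUSERID
TFD = \System\Users\MYUSERID\Data
TFN = LITERAL
OFD = \System\Users\MYUSERID\Objects
OFN = LITERAL
"""
print(parse_identity_file(SAMPLE))
```

Splitting on the first "=" only means that a destination path containing an equals sign is still read whole.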
Note that extracted URL's are
always created as Internet shortcut files with a .LNK file
suffix. If the LITERAL option is chosen, the disassembler uses
the User Record Number as the file prefix, followed by a
single character (a-z) to denote the sequence. Thus, if UIEE
text record MYBOOKS001234 has an associated object URL of:
http://www.images.com/myaccount/myimages/image001.jpg
then this URL would be extracted as an Internet shortcut file
and given the filename:
MYBOOKS001234a.lnk
Additional URL objects associated with the same record would
be given alphabetical suffixes of "b," "c," etc. up through the
letter "z." Thus, up to 26 external URL object associations
can exist directly for a single UIEE text record. Records
requiring more than this number should use an embedded
document defining all of the URL's in a single file, in which
there is no limit to the number of URL's that may
exist.
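The shortcut naming rule can be expressed compactly. The Python sketch below assumes the URL's for a record are supplied in their order of appearance; the function name shortcut_names is hypothetical.

```python
import string


def shortcut_names(record_number: str, urls):
    """Return one .lnk file name per URL: record number plus a-z."""
    if len(urls) > 26:
        raise ValueError("at most 26 URL objects per text record")
    # zip stops at the shorter sequence, so only as many letters
    # are used as there are URL's.
    return [record_number + letter + ".lnk"
            for letter, _ in zip(string.ascii_lowercase, urls)]
```

Called with MYBOOKS001234 and the single URL shown above, the function returns a list containing only MYBOOKS001234a.lnk.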
Note: Unless a receiving system or
program is performing an unusual or proprietary set of
functions, it is highly recommended that both UIEE and object
naming conventions be set to LITERAL, such that no alteration
of the names of embedded objects takes place. Renaming can
affect the transportability of UIEE data. Should naming
conflicts arise, the system should handle them as exceptions
and notify the user accordingly, rather than perform an
automatic rename and continue
processing.