UIEE.com Universal Information Exchange Environment

UIEE© Version 2.44 Functional Specification

By Thomas A. Sawyer, TAS Software Innovations

© Copyright 1989-2003 Thomas A. Sawyer/TAS Software Innovations. This specification offers no guarantee with respect to fitness, accuracy, or suitability for any application. TAS Software disclaims any responsibility for liability incurred from the use of the information presented herein. This specification imparts no rights of reproduction, redistribution, or proprietary usage, either expressed or implied. Misrepresentation by any person or company claiming ownership, authorship, or copyright of any part of this specification is in violation of Federal Copyright laws, and will be prosecuted.

The UIEE specification has been placed in the public domain in the year 2003 to serve as an authoritative resource for reference and development purposes, and to provide a common set of rules and standards for programmers writing compatible assemblers and disassemblers. This specification is actively maintained. Developers are directed NOT to make any alterations to this specification in their proprietary software, since doing so defeats the purpose of the specification and simply creates problems for everyone. Instead, developers are strongly encouraged to contact us directly for answers to questions, to report errors or omissions, or to issue requests for updates to the specification.

TAS Software provides a range of products and services for companies requiring fast and efficient UIEE assembly and disassembly.


Introduction

This specification describes the structure and application of UIEE (Universal Information Exchange Environment), a file format and protocol for exchanging data between dissimilar information systems. The format allows both text and object data to be reconstructed according to the requirements of the receiving system, while retaining the original integrity of both the data and the record structure. This specification supports UIEE version 2.44.

Originally created in 1989, UIEE was adopted as the standardized means of information exchange by the out-of-print (OP) book trade in the 1990s, in particular by Interloc™, Bibliofind™, Advanced Book Exchange™, Alibris™, Amazon™, and others. Its adoption has served to remove virtually all of the problems associated with delimited files, and has helped to jump-start an important segment of the Internet economy. Developers are encouraged to use the UIEE protocol, but only as specified; proprietary alteration of this specification is expressly prohibited.

The intent of UIEE is to be able to extract and assemble a complete record from a user's own database, send the record to a remote location, then receive the record back again in a form such that the field information is retained and all components are present and intact. A properly-defined UIEE record should be capable of being imported back into the original database such that all of the record's original content and characteristics are restored. To this extent, UIEE is self-defining and does not suffer from the problems of delimited files, which traditionally rely upon positional significance or substantial user interaction to define how data is treated when imported or exported.

This specification provides a basis for on-going development. UIEE Version 2.44 presents a protocol for handling both text data and specialized information treated as embedded objects, with particular attention paid to embedded images. New rules govern the ability for files to be embedded directly within the UIEE file at the sending end, then subsequently extracted intact at the receiving end without data loss. Automatic extraction instructions are contained in the text portion of the UIEE file. There are no restrictions with respect to the types of files that can be embedded.

UIEE contains data records whose text fields are pre-defined through tokens. A token is a two-character code which precedes a field, separated by a vertical bar ( | ) character. The complete collection of tokens used for a particular application is referred to as a Token Set. There are presently five (5) defined token sets, which can be viewed in the Tables section.



UIEE Philosophy and Objectives

Users should not have to be software engineers to exchange data with one another or with external systems. The generally accepted approach to integrating an image into a database is for the address of the image to serve as the object reference. This could be (for example) a path on a disk drive, a URL, or even an embedded bitmap in a rich-text document. This is fine as long as the data resides on the user's computer, or retains its static characteristics through compound addressing.

The problem with this approach is that once the data has left the user's computer, the addresses change. A remote record is no longer a mirror image of the original record, but rather a complex hybrid residing at a remote location, or even multiple locations. Reassembly becomes very difficult or even impossible. Each component retains its original identity by name only because the exchange process itself imposes changes upon the record structure.

Many applications have been produced by companies that perform the transmission of both text and object data. However, records are often treated as proprietary output in the form of HTML or XML, which may not be usable by other information providers, and certainly not by most users. For example, should the user's database be damaged, requiring a re-import of the original records, there is a low probability that the record structure will be preserved by reading an HTML or XML file sent back from the destination site. Some upload software produced by third-party companies effectively manages text and images, but in practice, these are often tied into proprietary processes handled through a dedicated user login channel. Information is not "exchanged" as much as "piped" into dedicated locations with the expectation that data will appear in a certain way, according to a predefined template.

To compound matters, users must frequently upload images by hand as individually-named files despite the software's sophistication, and this can take a vast amount of time. To do anything else useful with their records, users must export data to an intermediary form on their own computer (such as an Excel spreadsheet or Access database). To say that the process of HTML-to-text-to-Spreadsheet and vice versa (with associated images) is cumbersome would be the pinnacle of understatement. Strangely, this has become the accepted industry norm. People have gotten used to it because "that's the way it's done" even though every company does it slightly differently. There is a distinct lack of standardization.

There are file format inconsistencies as well. Many of Microsoft's products, for example, do not strip out delimiter characters from exported text, making the file impossible to read by another computer. Excel spreadsheet files, often used as an intermediary medium, are not foolproof and the steps required to import them are too complex for many users to follow. Clearly, a better solution is needed.

It is the humble opinion of this author that the industry norm of exchanging text on a delimited basis, and then subsequently handling attachments (particularly images) on a manual basis as separate files imposes a cumbersome burden upon both users and the systems that support their activities. UIEE attempts to provide a satisfactory alternative by bundling related information in a form such that the individual components retain their identities, directly associated with the text forming the basis of the database records. Aside from the obvious convenience of not having to deal with multiple elements to create a single record, the other advantage of this approach is that the record retains its original characteristics and can be re-imported at a later date without suffering data loss or alteration.

It was for these reasons that UIEE was originally created, and why it has now been updated. It is the objective of this specification to present a comprehensive set of standards which will move the development of UIEE forward to meet the needs of the market in which it is in common use today, and thus increase the likelihood of productive use by all who rely upon it to serve their personal and professional needs.



What's New in UIEE 2.44

UIEE's simple goal is to make life easier, both for end users and the systems that support their activities. UIEE 2.44 still provides all of the same text-only functionality as previously existed, and as such none of the basic rules have changed. However, new rules have been added to support different types of information, while simultaneously retaining downward compatibility with existing UIEE-compatible systems to the furthest possible extent.

Particular emphasis has been given to images, because it is expected that these will be the most commonly-used objects that will be associated with database records. That said, it is important to understand that any type of object can now be embedded in a UIEE file. These include multimedia files, HTML documents, ZIP files, even executable files. This is a very powerful format.

Objects can reside within the UIEE file, or they can be remotely stored and accessed via embedded HTTP addresses, or they can be directly retrieved by the remote server for inclusion in the destination database. Obviously, the manner in which embedded objects are handled will vary according to the needs of the systems which receive and process the information. However, the behavioral aspects of UIEE have also been updated to provide different situational control over how an embedded object is treated.

New Token Sets have been defined to support multiple listing and sales channels. These are described in the Tables section.

In addition, new tokens have been added to each token set to provide more universal support for document assembly and retrieval. For example, a new Language field is now a standard component of all UIEE records. The field provides native support for the ISO 639-x language codes as defined in the MARC specification. It is hoped the use of this field will be adopted by the many services that make use of UIEE, in order to make it easier for both users and systems to index and retrieve records with different language representations.



Comparison Of UIEE File Formats

The text portions of the old and new UIEE file formats are essentially the same. Pre-2.44 disassemblers can read 2.44 files, with the exception that they will ignore the new data that has been added. A comparison of the old and new formats is shown below:



Each UIEE 2.44 file contains one, two, or three components, always appearing in this order:

  • Text data
  • Pointer data
  • Binary Data

    UIEE files containing embedded objects will always contain all three components. Should the pointers contain only URL addresses, then the pointer portion of the file is present but the binary portion is absent. Should there be no associated objects, then both the pointer and binary portions of the file are absent.

    By definition, the embedded pointers are relative to the start of the binary data, not the start of the UIEE file. This was done to preserve the ability to edit the text portion of the UIEE data without disrupting the binary positioning, or to separate the two and then re-combine them after the fact. It also provides an absolute means of detecting the binary data boundary, through the use of the Ctrl-Z character (ASCII 26). This approach also ensures that old UIEE parsers can always continue to read UIEE 2.44 files, the only difference being that any embedded data will be ignored by the old parser.



    UIEE File Header

    The UIEE file header consists of five lines of text which are common to all versions of UIEE:

  • Line 1 — User Id
  • Line 2 — Token Set
  • Line 3 — Date (MM-DD-YYYY)
  • Line 4 — Time (HH:MM:SS)
  • Line 5 — Blank [CR] [LF]

    Line 5 denotes start of first UIEE text record. Note that blank lines should consist of a [CR] [LF] sequence. Although other EOL conventions exist for different operating systems, this sequence is the only one that satisfies all functional criteria for all systems, and is therefore recommended.
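The five header lines described above can be read with very little code. The following is only a minimal sketch, assuming the file text has already been decoded to a string and uses [CR] [LF] line endings; the names `UieeHeader` and `parse_header`, and the sample user id in the usage note, are hypothetical and not part of the specification.

```python
from datetime import datetime
from typing import NamedTuple


class UieeHeader(NamedTuple):
    user_id: str
    token_set: str
    stamp: datetime


def parse_header(text: str) -> UieeHeader:
    """Parse the five header lines; raise ValueError if they are malformed."""
    lines = text.split("\r\n")
    if len(lines) < 5:
        raise ValueError("header needs five lines")
    user_id, token_set, date_s, time_s, blank = lines[:5]
    if blank != "":
        raise ValueError("line 5 of the header must be blank")
    # Line 3 is MM-DD-YYYY and line 4 is HH:MM:SS; strptime rejects bad values.
    stamp = datetime.strptime(f"{date_s} {time_s}", "%m-%d-%Y %H:%M:%S")
    return UieeHeader(user_id.strip(), token_set.strip(), stamp)
```

For instance, `parse_header("MYBOOKS\r\nBOOKS\r\n03-15-2003\r\n14:30:00\r\n\r\n")` would yield a header whose `token_set` is `"BOOKS"`. Combining the date and time into one `datetime` is a convenience of this sketch only.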

    Line 2 represents the Token Set being used. To preserve downward compatibility with older UIEE versions, this should normally be the word "BOOKS." However, in UIEE 2.44, the following expanded token sets are defined (four standard and one non-standard):

    ANTIQUES: Indicates general Antiques or Collectibles database. Records are parsed for minimum conformity (generally Record Number, Title, Description, Price, and Listing Codes). Tokens are based on standard ANTIQUES field assignments (see Tables section).

    AUCTION: Indicates all records in the file are auction records and are thus given special parsing to ensure conformity to auction venue rules. For example, if directed to eBay™, then all records must contain a valid eBay User Id, opening bid price, reserve price, e-mail address, category code, etc. Tokens are based on standard AUCTION field assignments (see Tables section).

    BOOKS: Normal default, refers to general books database, new or used. Records are parsed only for minimum conformity (generally Record Number, Title, Price, and Listing Codes). Tokens are based on standard BOOKS field assignments (see Tables section).

    RETAIL: Indicates general retail merchandise. Records are parsed for minimum conformity (generally Record Number, Title, Price, and Listing Codes). Records in this category are intended for inclusion in retail databases, and as such must specify a pointer to one or more destinations (such as Half.com™ and similar vendors). Tokens are based on standard RETAIL field assignments (see Tables section).

    CUSTOM: Indicates self-defining usage. No parsing is performed. All tokens are alphanumeric and field assignments are based on the requirements of the destination database. Normally only used for test purposes or for special cases requiring unusual or proprietary field assignments (see Tables section).

    Thus, there are five distinct sets of UIEE tokens currently defined, each set associated with a particular usage. Token parsing is also different for each usage. This is an important consideration when making a design decision with respect to how to handle the disassembled files at the server side.

    For production environments, the recommended procedure is to establish an Identity File and associate it with a dedicated program designed to handle all UIEE functionality, such as TAS Software's UIEE 2.44 Distributor or other program that both disassembles and distributes files according to the needs of the receiving system. The Identity File tells the disassembler what to do with the information it extracts and the locations in which it should reside. This file also provides the system administrator with a direct means of configuring each user's requirements to support the system(s) in which the information will be applied. An Identity File is similar to an INF file and is described later in this specification.



    Basic UIEE Text Record Structure

    Each UIEE text record contains a variable number of lines, each of which is preceded by a two-character prefix and a vertical bar or "pipe" symbol ( | ). A blank line [CR] [LF] delimits the end of each text record (see Prefix Code table for specific usage of each field). Thus, the basic record structure is:

    TD|(text data)
    TD|(text data)
    TD|(text data)
    :
    :
    TD|(text data)
    [CR][LF]

    where "TD" is any 2-character token representing text data. Any text appearing on a separate line that does NOT contain a pipe symbol in the 3rd character position is ignored. Thus, any non-tokenized text, regardless of where it occurs, is treated as a remark only and not part of the UIEE text data. For example:


    TD|(text data)
    TD|(text data)
    This line will be ignored
    TD|(text data)
    :
    :
    TD|(text data)
    [CR][LF]
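The "pipe in the 3rd character position" rule can be illustrated with a short sketch. It assumes the text of one record is already available as a string; the function name `tokenized_lines` is hypothetical.

```python
def tokenized_lines(record_text: str) -> list[tuple[str, str]]:
    """Return (token, value) pairs, skipping lines without a pipe in position 3."""
    pairs = []
    # Split on [CR][LF], tolerating a bare [LF] as well.
    for line in record_text.replace("\r\n", "\n").split("\n"):
        # A field line is a two-character token with "|" as the 3rd character.
        if len(line) >= 3 and line[2] == "|":
            pairs.append((line[:2], line[3:]))
        # Anything else is a remark and is ignored.
    return pairs
```

Given the lines `UR|1`, `This line will be ignored`, and `PR|10.00`, only the first and last would be returned.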


    A consecutive token appearing on the next line following the same token is treated as an extension of the previous line, separated by a space character. In the example below, a Comments field appears in two lines:

    TD|(text data)
    TD|(text data)
    NC|This line contains text normally
    NC|appearing in the Comments field

    TD|(text data)
    :
    :
    TD|(text data)
    [CR][LF]
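The continuation rule can be sketched as follows, assuming the record has already been reduced to a list of (token, value) pairs in file order. The name `merge_continuations` is hypothetical, and whether an intervening remark line should break a run is not stated in the text, so this sketch does not address it.

```python
def merge_continuations(pairs):
    """Join runs of the same token into one field, separated by a space."""
    merged = []
    for token, value in pairs:
        if merged and merged[-1][0] == token:
            # Same token as the previous line: extend that field.
            merged[-1] = (token, merged[-1][1] + " " + value)
        else:
            merged.append((token, value))
    return merged
```

Applied to the two NC lines in the example above, this would produce a single Comments field reading "This line contains text normally appearing in the Comments field".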

    A common mistake made when constructing a UIEE text record is to wrap text to the next line without including the identifying token. Thus, the following example would not be read correctly and only the first five words in the Comments field would be parsed because the 2nd token was missing:

    TD|(text data)
    TD|(text data)
    NC|This line contains text normally
    appearing in the Comments field

    TD|(text data)
    :
    :
    TD|(text data)
    [CR][LF]




    Line Lengths and Wrapping

    Lines can be of any length. However, it is recommended that the software creating the UIEE files should wrap the text such that it can be easily viewed in any editor for debugging purposes, as shown below:


    TD|(text data)
    TD|(text data)
    NC|Four score and seven years ago our fathers brought
    NC|forth on this continent a new nation, conceived in
    NC|Liberty, and dedicated to the proposition that all
    NC|men are created equal.

    TD|(text data)
    :
    :
    TD|(text data)
    [CR][LF]

    The recommended wrap length is 70 characters, but it can occur at any desired line length. Wrapping should always occur at the point a space character (ASCII 32) appears in the text. If no space character is present, the line should not be wrapped.
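One way to wrap a field according to these recommendations is Python's standard `textwrap` module, as in the sketch below. The function name `wrap_field` is hypothetical, and the sketch assumes the 70 characters include the token and pipe, which the text does not state.

```python
import textwrap


def wrap_field(token: str, text: str, width: int = 70) -> list[str]:
    """Wrap one field into tokenized lines, breaking only at spaces."""
    chunks = textwrap.wrap(
        text,
        width=width - len(token) - 1,  # leave room for the token and the pipe
        break_long_words=False,        # never split a run that contains no space
        break_on_hyphens=False,        # break at spaces only, not at hyphens
    )
    # An empty field still yields one tokenized line.
    return [f"{token}|{chunk}" for chunk in chunks] or [f"{token}|"]
```

Every output line repeats the token, so the wrapped field can be rejoined by a parser that treats consecutive identical tokens as one field.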



    Text Record Parsing and Field Sequencing

    Each token set contains certain elements which must always be present. For the purposes of this specification, the BOOKS token set is used for the examples. However, it is important to keep in mind that the rules governing token sets are specific to the particular set being used, and the parsing rules are slightly different for each set (see Tables section).

    For example, in the BOOKS set, the following tokens must always appear, and should appear in this order:

    UR or RE User Record Number (UR preferred, RE supported for compatibility).
    TI Book Title
    (remaining fields appear in any order)
    PR Price (parsed in For-Sale records only).
    XA Lifespan
    XB Action Code
    XC Family Code
    XD Database Code

    An example of a text record which follows this format is shown below:

    UR|MYBOOKS000552
    TI|The Missions of New Mexico, 1776
    AA|Adams, Eleanor B. and Chavez, Angelico
    CN|Very fine w/fine dj
    PP|Albuquerque
    DP|1956
    NC|Well-preserved, excellent binding, good color in dj.
    MT|American History
    KE|Architecture
    KE|New Mexico
    KE|Religion
    KE|Western Americana
    LG|eng
    WT|18
    PR|145.00
    XA|4
    XB|1
    XC|BO
    XD|S
    [CR][LF]

    In the above example, the UIEE record starts with the UR (User Record Number) tag. This is the standard convention. Subsequently, the remaining fields can appear up to the first listing code field (XA) in any order. Thus, it is always preferable to place the record number first and the listing codes last for easier inspection and validation.

    The listing codes for this particular record indicate that it has an unlimited lifespan (XA=4), it is a new or replacement record (XB=1), is a member of the "Books" family (XC=BO) and is to be entered in the For-Sale database (XD=S).
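A receiving system might check the BOOKS requirements described above along the following lines. This is only a sketch: the function name `check_books_record` and the `for_sale` flag are hypothetical, and because the text says the tokens "should" appear in order, order deviations are reported here as warnings rather than errors.

```python
def check_books_record(tokens, for_sale=True):
    """Return (errors, warnings) for the token sequence of one BOOKS record."""
    errors, warnings = [], []
    if "UR" not in tokens and "RE" not in tokens:
        errors.append("missing UR or RE")
    # PR is only parsed in For-Sale records.
    required = ["TI", "XA", "XB", "XC", "XD"] + (["PR"] if for_sale else [])
    for token in required:
        if token not in tokens:
            errors.append(f"missing {token}")
    # Ordering is a recommendation, so it only produces warnings.
    if tokens[:1] not in (["UR"], ["RE"]):
        warnings.append("record number is not first")
    if tokens[1:2] != ["TI"]:
        warnings.append("TI is not second")
    if tokens[-4:] != ["XA", "XB", "XC", "XD"]:
        warnings.append("listing codes are not the last four fields")
    return errors, warnings
```

The token sequence of the example record above (UR, TI, AA, ... PR, XA, XB, XC, XD) would pass with no errors and no warnings.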

    In UIEE 2.44, this record may have objects associated with it. However, note that there is no such indication at this point. It is not until the disassembler encounters a PD (pointer data) token that this fact becomes relevant. Hence, downward compatibility is preserved for older UIEE parsers — the remaining data appearing at the end of the file can be discarded if the receiving system is not capable of disassembling it.

    Note: This example also contains a Language (LG) field. The Language field has always been a part of the UIEE specification, but it has been treated as optional in the past. However, the 2.44 specification strongly recommends that a Language field should always be included as part of every UIEE record, whether or not the receiving system will make use of it. The code contained in this field corresponds to the ISO-639/2 List of Recommended Language Identifiers, as per the MARC specification. Either the ISO-639-1 (two-character codes) or ISO-639-2 (three-character codes) may be used, with preference given to the ISO-639-2/T three-character code set (see Tables section).



    Allowed ASCII Characters

    The only ASCII codes for which specific restrictions exist are control characters (ASCII 0 through 31). These cannot appear as part of any text field. A properly-constructed UIEE parser will strip these characters from text before passing it along as output. In addition, it is recommended that any spurious pipe ( | ) symbols appearing in text should be stripped as well, to avoid possible problems downstream.
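The stripping described above amounts to a simple character filter. The sketch below assumes it is applied to a field's value after the token and its pipe have been split off (otherwise the separating pipe would be removed as well); the name `clean_field` is hypothetical.

```python
def clean_field(value: str) -> str:
    """Remove ASCII 0-31 and stray pipe symbols from one field's text."""
    return "".join(ch for ch in value if ord(ch) >= 32 and ch != "|")
```

Note that removed characters are not replaced by spaces in this sketch, so a tab between two words would cause them to run together.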

    Generally speaking, it is recommended that only ASCII 32 through ASCII 127 should appear as part of UIEE text records, as these are the only characters having universal recognition. That said, upper-order ASCII codes (128-254) may be used and there is no specific restriction to their use. However, in most character sets, these codes correspond to foreign language characters or graphical symbols, which are loosely defined as any ASCII code greater than 127. Many different symbols can appear for these codes. Although you are able to insert these characters in your records, they may not display or print as you intend, and for this reason they are not recommended for generic use.



    HTML in UIEE Text Fields

    In general, the use of HTML in UIEE text fields is strongly discouraged. By definition, a text field is expected to contain only the text portion of a record. Markup languages like HTML further augment the text portion of a record through extended functionality, but such functionality is not appropriate in most cases for data regarded as input by downstream systems. For example, various search engines parse HTML data differently and adding HTML to text fields may disrupt the ability for the text to be found. In addition, certain fields may have clearly-defined length limits and HTML greatly lengthens the text in any field. Indeed, there are many reasons why HTML should not be included in a text-only field.

    However, UIEE does support the presence of HTML and in fact is transparent to it. Therefore, HTML can be included directly in UIEE text fields, but care must be exercised to ensure the HTML tags are properly formatted such that they will be reassembled correctly further downstream. In particular, there can be no spurious line breaks and each complete field should be capable of direct HTML parsing. For example, the record in the previous example could appear as:


    TD|(text data)
    TD|(text data)
    NC|{font face=arial size=2}Four score and seven years
    NC|ago our fathers brought forth on this continent a new
    NC|nation, conceived in Liberty, and dedicated to the
    NC|proposition that all men are created equal.{/font}

    TD|(text data)
    :
    :
    TD|(text data)
    [CR][LF]

    Note: Because every system is slightly different and business needs vary considerably, embedded HTML in the text portion of a UIEE record might not be reassembled at the receiving end in the same sequence in which it originated. A lot can happen further downstream when UIEE text data is applied to a system. UIEE assumes that each individual text field is a portion of a larger record, and hence they can appear in any order. Thus, in the example above, the HTML tags would probably not reassemble properly if they were in separate fields, even if they appeared that way in the original record.

    Hence, the general rule is that if a UIEE file is to contain embedded HTML tags in one or more text fields, the tags must always be specific to the field(s) in which they appear, and the HTML must stand on its own after reassembly. Document tags, such as {HTML} and {BODY}, should never be included in the text portion of a UIEE record. Such tags should only appear in complete HTML document objects referenced as binary data, not as part of individual UIEE text data fields.

    Finally, if HTML data exists in a database field, in general it should NOT be stripped by the sending software. Doing so defeats the objective of intact data exchange. Rather, it is the responsibility of the receiving system to perform any stripping or preconditioning such that the field data conforms to the system requirements.



    HTML Representations

    If HTML is to be used, then there are several characters commonly used in the text portion of a UIEE file whose representations are preferred as HTML rather than their ASCII literal codes. In particular, the double quote ("), the ampersand (&), the less-than (<), and the greater-than (>) symbols should all be represented as their respective HTML equivalents. In this context, a literal ASCII value representation (such as &#38;) is preferred over a tag representation. This convention serves to prevent any erroneous or conflicting HTML parsing that might occur if formatting tags are present in the text portion of the record.
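Sending software that offers this conversion could do it in a single pass, as in the sketch below. The names `NUMERIC_REFS` and `to_numeric_refs` are hypothetical; the numeric values are simply the ASCII codes of the four characters.

```python
NUMERIC_REFS = {'"': "&#34;", "&": "&#38;", "<": "&#60;", ">": "&#62;"}


def to_numeric_refs(text: str) -> str:
    """Replace the four reserved characters with numeric character references."""
    # One pass over the input, so the "&" inside an emitted reference
    # is never converted a second time.
    return "".join(NUMERIC_REFS.get(ch, ch) for ch in text)
```

This sketch converts every occurrence, so it is only suitable for literal text; applied to a field that intentionally contains formatting tags, it would convert the tag delimiters as well.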

    Note: A standard UIEE parser performs no direct translations of HTML tags and merely passes along the original text to the receiving system. Although most users do not enter HTML tags in text fields in their databases, there may exist software options to convert characters to HTML representations when assembling the UIEE file. Once again, it is up to the receiving system to translate these tags if required, and it is up to the import software to reverse-translate them should this be needed. Nevertheless, in both cases, there is a preference for the presence of unambiguous HTML representations, especially if other HTML tags are present and the text will be used as input for an Internet-based database system.



    UIEE Listing Codes

    Listing codes tell the receiving database what to do with each UIEE record when it is received. There are four listing codes common to all UIEE records, regardless of the token set used, as shown in the following example:

    XA|4
    XB|1
    XC|BO
    XD|S

    Generally, listing codes should appear as the last four codes in a UIEE record. All four listing codes must be present in every UIEE record. As a general parsing rule, any record not containing all four listing codes or for which any one code is invalid will be rejected by the receiving system.


    XA - Lifespan Code

    The Lifespan Code defines how long a record should reside in the destination database. For all token sets except AUCTION, this code is a single digit from 0 to 4. The default is XA|4 (unlimited). The XA prefix must be followed by a single digit, as below:

    0 - 30 days
    1 - 90 days
    2 - 180 days
    3 - 1 year
    4 - Unlimited

    For the AUCTION token set, the lifespan code defines the number of days the auction should run, and may be up to three digits in length. Typical values are 3, 5, 7, and 10, but any value greater than zero and less than 999 will be accepted by a UIEE parser.


    XB - Action Code

    The Action Code tells the receiving system what action to take when a record is received, and provides a visual indication within the record itself of the current working status. There is no default; the action must be explicitly set. The XB prefix must be followed by a single digit, as follows:

    1 - List This Record: This is the normal (default) code. It indicates the receiving database should treat the record as a new or replacement record.

    2 - Sold: Remove: Indicates the item has been sold and that the record should be removed from the receiving database. In some systems, this action will generate a Realization record indicative of the sale.

    3 - Acquired: Remove: Intended mainly for Wants, indicates the desired item has been acquired and that the record should be removed from the receiving database.

    4 - Traded: Remove: Indicates a trade for the item has been concluded, and that the record should be removed from the receiving database.

    5 - Listing Withdrawn: This code simply removes the on-line listing. No Realization record should be generated.

    6 - Do Not List: This code indicates that the record should not be listed under any circumstances. The sending software should normally prevent the inclusion of these records in the UIEE file, but the UIEE parser at the receiving end must be able to detect and discard them should any appear. Does NOT result in a removal from the receiving database.

    7 - On Hold: Indicates a buyer has inquired about the item but that the transaction has not yet been concluded, and that the record should be removed from the receiving database.


    XC - Family Code

    The Family Code specifies the broad classification under which the record falls. This code is used primarily by receiving databases to limit search results to specific types of records. There is no default; the family code must be explicitly set. The XC prefix must be followed by a two-character code, the contents of which are specific to the token set used by the UIEE file:

    BOOKS

    BO - Books General
    AU - Autographs
    EB - Electronic Book
    EP - Ephemera
    FC - Facsimiles
    LE - Letters
    MS - Manuscripts
    MP - Maps
    MT - Miniatures
    PA - Pamphlets/Offprints
    PH - Photographs
    PO - Posters
    SI - Serial Issues
    SR - Serial Runs
    SV - Serial Volumes
    TC - Trade Catalogs
    UN - Undefined

    ANTIQUES

    AG - General Merchandise (defined by category token)
    AL - Architectural
    AQ - Antiquities
    AS - Asian
    BO - Books
    DE - Decorative Arts
    EH - Ethnographic
    FN - Furniture
    MP - Maps
    MR - Maritime
    MU - Musical Instruments
    PM - Primitives
    RG - Rugs & Carpets
    SC - Scientific
    SL - Silver
    TX - Textiles
    UN - Undefined

    AUCTION

    AG - General Merchandise (defined by category token)
    AN - Antiques
    AR - Art
    AV - Automotive
    BA - Baby
    BD - Building Materials
    BO - Books
    BU - Business
    CA - Cameras
    CG - Clothing
    CH - Charity
    CL - Collectibles
    CP - Computers
    DL - Dolls
    DV - DVDs
    EL - Electronics
    FD - Food
    GI - Gifts
    GL - Glassware
    HE - Health
    HO - Hobbies
    HF - Home Furnishings
    HI - Home Improvement
    JE - Jewelry
    MO - Motorcycle
    MU - Music
    NU - Numismatic
    OF - Office
    PE - Pets
    PL - Philatelic
    PR - Professional
    PT - Pottery
    RS - Real Estate
    SP - Sporting Goods
    TI - Tickets
    TV - Travel
    TY - Toys
    UN - Undefined
    VG - Video Games

    RETAIL

    AG - General Merchandise (defined by category token)
    IR - In-Store Merchandise
    NR - Consignment Merchandise
    PS - POS Merchandise
    RR - Remaindered Merchandise
    WR - Warehouse Merchandise
    UN - Undefined

    CUSTOM

    Any defined family code from any other token set may be used as a CUSTOM family code.


    XD - Database Code

    The Database Code tells the receiving system in which database to store the uploaded record. The default is XD|S (For Sale). The XD prefix must be followed by a single character, as below:

    S - For-Sale Database
    W - Wants Database
    T - For-Trade Database
    M - Remainders Database
    R - Realizations Database
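The value rules for the four listing codes can be collected into one check, sketched below. The names `DATABASE_CODES` and `listing_code_errors` are hypothetical, and the family code is checked only for its two-character shape rather than against the full tables above.

```python
DATABASE_CODES = {"S", "W", "T", "M", "R"}


def listing_code_errors(codes: dict[str, str], token_set: str = "BOOKS") -> list[str]:
    """Return the listing codes that are missing or invalid."""
    bad = []
    xa = codes.get("XA", "")
    if token_set == "AUCTION":
        # Auction lifespan is a number of days: greater than 0, less than 999.
        xa_ok = xa.isascii() and xa.isdigit() and len(xa) <= 3 and 0 < int(xa) < 999
    else:
        xa_ok = xa in {"0", "1", "2", "3", "4"}
    if not xa_ok:
        bad.append("XA")
    if codes.get("XB", "") not in {"1", "2", "3", "4", "5", "6", "7"}:
        bad.append("XB")
    if len(codes.get("XC", "")) != 2:
        bad.append("XC")
    if codes.get("XD", "") not in DATABASE_CODES:
        bad.append("XD")
    return bad
```

A record would be rejected whenever the returned list is non-empty, which matches the general parsing rule that all four codes must be present and valid.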




    Pointer Data

    In UIEE 2.44, text records are essentially the same as in older UIEE versions, but attached object data is stored in binary format at the end of the UIEE file. Specific pointers to the binary data appear at the end of the text portion of the UIEE file, just before the binary data begins. It is therefore possible to "view" a UIEE 2.44 file right up to the point where the text data ends and the binary data begins. This is an extremely useful side benefit for troubleshooting or validating UIEE records. It also permits older UIEE parsers to read UIEE 2.44 files without suffering data loss.

    Pointer data is represented in UIEE 2.44 by a single token, PD, which always appears five (5) times for each embedded object.

  • The 1st token contains the User Record Number with which the object is associated (represented in the text portion of the UIEE record by UR or RE).

  • The 2nd token represents the literal name of the object (without path data).

  • The 3rd and 4th tokens represent the start and end bytes of the embedded binary object.

  • The 5th and last token is a disposition code, which contains information about how to handle the embedded object after it has been extracted.

    The end of the pointer data is indicated by the presence of a [CR] [LF] sequence. Multiple representations will appear one after the other, in the same sequence in which the text records appear. The end of all pointer data is represented by a single ASCII 26 (Ctrl-Z) character. The next byte is the first byte of the binary data.
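
    A reader for this layout might be sketched in Python as shown here. The names split_uiee, parse_pointers and Pointer are hypothetical. The sketch assumes that the text portion contains no Ctrl-Z byte of its own and that text can be decoded as Latin-1; the specification text above does not state an encoding.

```python
from typing import NamedTuple

CTRL_Z = b"\x1a"  # ASCII 26 marks the end of all pointer data


class Pointer(NamedTuple):
    record_number: str  # 1st PD token
    name: str           # 2nd PD token (file name or URL)
    start: int          # 3rd PD token
    end: int            # 4th PD token
    disposition: int    # 5th PD token


def split_uiee(data: bytes):
    """Split raw file bytes into (text_lines, binary_portion)."""
    head, _sep, binary = data.partition(CTRL_Z)  # first Ctrl-Z only
    lines = [ln.decode("latin-1") for ln in head.split(b"\r\n") if ln]
    return lines, binary


def parse_pointers(lines):
    """Collect PD token values and group them in fives."""
    # Everything after the first pipe belongs to the value.
    values = [ln.split("|", 1)[1] for ln in lines if ln.startswith("PD|")]
    if len(values) % 5 != 0:
        raise ValueError("PD tokens must appear in groups of five")
    return [
        Pointer(values[i], values[i + 1],
                int(values[i + 2]), int(values[i + 3]), int(values[i + 4]))
        for i in range(0, len(values), 5)
    ]
```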

    An example of a complete UIEE 2.44 file containing a single record with pointers to a single embedded object is shown below:



    From this example, it can be seen that the binary pointers 0 and 23691 are relative to the start of the binary data portion of the UIEE file only, not the start of the UIEE file itself. To determine the size of the embedded object, subtract the start pointer from the end pointer and add 1. In the above example, the physical size in bytes of the embedded JPEG file "mybook001234a.jpg" would be (23691 - 0) + 1 = 23,692 bytes.
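
    The arithmetic and the extraction step can be sketched in Python as follows. The function names object_size and extract_object are hypothetical, and the argument binary is assumed to hold only the bytes that follow the Ctrl-Z marker. Both pointers being zero is treated as "no extraction", as the token rules state; a one-byte object at position zero would be indistinguishable from that case, which this sketch does not try to resolve.

```python
def object_size(start: int, end: int) -> int:
    """Size in bytes of an object stored at inclusive offsets."""
    return end - start + 1


def extract_object(binary: bytes, start: int, end: int) -> bytes:
    """Return the embedded object from the binary portion.

    Offsets are zero-based and inclusive, relative to the first
    byte after the Ctrl-Z marker.
    """
    if start == 0 and end == 0:
        return b""  # both zero: no extraction takes place
    if not 0 <= start <= end < len(binary):
        raise ValueError("pointers fall outside the binary portion")
    return binary[start:end + 1]
```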



    Pointer Data Token Rules and Restrictions

    As previously stated, if pointer data is present, it is represented by five (5) tokens for each object. Multiple object representations will appear one after the other, in the same sequence in which the associated records appear in the text portion of the UIEE file. The following summarizes pointer token usage for each object:

    Token 1 Contains the User Record Number with which the object is associated (represented in the text portion of the UIEE record by UR or RE). This should always be a unique value and should normally be assigned automatically by the software creating the UIEE file. The value in the first PD field must be a precise replica of the value in the UR or RE field in the text portion of the UIEE record with which the object is associated. The only characters allowed in this field are A-Z, 0-9, and an underscore (_). Record numbers should always appear in UPPER CASE.

    Token 2 Represents the name of the object. In the case of an embedded file, this is the file name without path data. In the case of a URL, this must be a complete HTTP address. Long file names are allowed, but are discouraged as they add ambiguity to the parsing process. Regardless of the content, all data appearing after the pipe symbol is assumed to be a part of the file name or URL. If the information in the 2nd token is a URL rather than an embedded object, then both the 3rd and 4th tokens are set to zero (see below).

    Tokens 3 and 4 Represent the start and end bytes of the embedded binary object. Binary objects have an option base of zero (0). For example, a file having an embedded length of 1000 bytes must be SEEK'ed from byte position zero through 999 to be retrieved intact. Thus, the first pointer in every data set is an offset to position zero relative to the start of the binary portion of the UIEE file. To determine the file size, subtract the start byte from the end byte and add 1. If both tokens are set to zero, then no extraction takes place.

    Token 5 Contains the disposition code, which defines how to handle the embedded object after it has been extracted. The default is zero (0) which means that the object is either new or a replacement and that further action is entirely dependent upon the operation of the system into which the object is being placed. The possible disposition codes are:

    0 - (default) Object is an embedded, named file; server extracts and stores or forwards to destination as part of record.

    1 - Object is represented by a static URL; server does not retrieve, only URL is passed as part of record.

    2 - Object is represented by a static URL; server retrieves object and stores or forwards to destination as part of record.

    3 - Object not embedded; remove relationship from record.
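
    The token rules above lend themselves to a simple validation pass. The following Python sketch is illustrative only: the names Disposition and check_pointer_set are hypothetical, and returning a list of problem strings is a design choice of this example rather than anything the specification requires.

```python
import re
from enum import IntEnum

# Token 1: only A-Z, 0-9 and underscore, in upper case.
RECORD_NUMBER = re.compile(r"[A-Z0-9_]+")


class Disposition(IntEnum):
    EMBEDDED_FILE = 0  # embedded, named file (default)
    URL_REFERENCE = 1  # static URL, passed along without retrieval
    URL_RETRIEVE = 2   # static URL, server retrieves the object
    REMOVE = 3         # object not embedded, remove relationship


def check_pointer_set(record_number, name, start, end, disposition=0):
    """Return a list of rule violations (empty if none were found)."""
    problems = []
    if not RECORD_NUMBER.fullmatch(record_number):
        problems.append("record number must use only A-Z, 0-9 and _")
    if not name:
        problems.append("object name is empty")
    if start > end:
        problems.append("start byte is greater than end byte")
    try:
        code = Disposition(disposition)
    except ValueError:
        problems.append("unknown disposition code")
        return problems
    is_url = code in (Disposition.URL_REFERENCE, Disposition.URL_RETRIEVE)
    if is_url and (start, end) != (0, 0):
        problems.append("URL objects must set tokens 3 and 4 to zero")
    return problems
```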



    Multiple Object Identifiers

    There is no limit to the number of objects that can be associated with one UIEE text record. In cases of multiple associations with a single text record, each pointer data set carries the same User Record Number (UR). These appear as separate UIEE definitions in sequence, as shown in the example below:

    PD|MYBOOK001234
    PD|frontcover001234.jpg
    PD|0
    PD|36449
    PD|0

    PD|MYBOOK001234
    PD|backcover001234.jpg
    PD|36450
    PD|70831
    PD|0

    PD|MYBOOK001234
    PD|illustr001234.jpg
    PD|70832
    PD|99002
    PD|0

    Note that the sequence of multiple pointers carries intrinsic meaning. By default, the sequence of appearance should be interpreted by the receiving system as the same sequence of intended end usage. For example, in the case of images, the appearance of image 1, image 2, image 3, etc. should coincide with their final intended sequence of appearance in a database or document. Obviously this will be overridden if the objects are explicitly referenced by name further downstream, but in the absence of an explicit sequence, the order of appearance in the UIEE file represents a default sequence. Hence, the UIEE software assembler should reflect the end usage sequence of the objects being embedded in the UIEE file.
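
    A receiving system could preserve this default sequence by grouping pointer sets under their record number in order of appearance. The Python sketch below uses the three pointer sets from the example above; the function name group_objects is hypothetical, and it relies on Python dictionaries and lists keeping insertion order.

```python
def group_objects(pointer_sets):
    """Map each record number to its objects, in file order.

    Each pointer set is a tuple of the five PD values:
    (record_number, name, start, end, disposition).
    """
    grouped = {}
    for record_number, name, start, end, _disposition in pointer_sets:
        size = end - start + 1  # inclusive offsets
        grouped.setdefault(record_number, []).append((name, size))
    return grouped


example = [
    ("MYBOOK001234", "frontcover001234.jpg", 0, 36449, 0),
    ("MYBOOK001234", "backcover001234.jpg", 36450, 70831, 0),
    ("MYBOOK001234", "illustr001234.jpg", 70832, 99002, 0),
]

for position, (name, size) in enumerate(group_objects(example)["MYBOOK001234"], 1):
    print(position, name, size)
```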



    Uniform Resource Locators (URL's) as Embedded Objects

    An embedded object in a UIEE file does not have to be a physical file. An object can be declared by reference through the use of a URL. In such a case, the file name in the 2nd PD token is replaced by a URL corresponding to the location at which the object resides. If a URL is present, then the 3rd and 4th PD tokens are set to zero to indicate that no extraction is required. However, the manner in which the URL is handled is determined by the disposition code in the 5th PD token.

    The disposition code determines how the server should handle the URL in a manner complementary to that of an embedded physical file. Depending upon the degree of UIEE interoperability of the receiving system, URL's may require the receiving server to perform additional work. Disposition codes are defined in the Tables section.

    Extracted URL's are always created as Internet shortcut files with a .LNK file suffix. See UIEE Identity Files below for additional information about how URL's and other embedded objects can be handled by the receiving system.



    UIEE Identity Files

    An Identity File is not a part of a UIEE file, but is rather an optional file entity used to tell the assembling and/or disassembling software how to handle embedded UIEE components. Identity files are companion files to the use of UIEE and are not required, but are nevertheless presented in the context of this specification as a simple, standardized means of configuring sending software and receiving systems. In addition, an Identity File is able to preserve a user's configuration preferences at the sending end and can serve as a basis for self-configuration, debugging, and troubleshooting.

    An Identity File defines the existence of one or more users and tells the sending software or receiving system how to handle each user's data. This is especially useful for multiple-user systems requiring specific handling and redirection of extracted data to meet the system's operational needs. Each record defines an individual user in a program or system, consisting of simple flat text with a format similar to an INF file.

    Each Identity File record entry consists of several lines preceded by a three-character identifier and an equal (=) sign separating the identifier and the text:

    UID = [User Id]
    TFD = [Destination path for extracted UIEE text]
    TFF = [File format of extracted UIEE text]
    TFN = [Naming convention for extracted UIEE text]
    OFD = [Destination path for extracted objects]
    OFN = [Naming convention for extracted objects]

    User Id: (UID) This is the normal login ID of the user.

    Text File Destination: (TFD) This is generally a named subdirectory in a system, or a common directory requiring explicit naming of the UIEE file. Relative path names should not be used unless system security requires it. Otherwise, by default the entry should correspond to a complete path stemmed from the root directory.

    Text File Format: (TFF) This defines the format of the extracted text. "UIEE" is the default (the assumption being that the most common application is for a redirector to handle importing extracted UIEE text further downstream). If the "DELIMITED" option is used, then there must be an accompanying extraction template which defines how the UIEE fields are mapped to the delimited file, what delimiter character(s) are used, etc.

    Text File Naming Convention: (TFN) There are three possibilities: "LITERAL," "INCREMENTAL," and "RANDOM." The literal option means that output files have the same names as the original UIEE files. The incremental option means that output files are given sequential names by the disassembler. The random option means that files are given random names.

    Object File Destination: (OFD) This is generally a named subdirectory in a system, or a common directory requiring explicit naming of the UIEE file. Relative path names should not be used unless system security requires it. Otherwise, by default the entry should correspond to a complete path stemmed from the root directory.

    Object File Naming Convention: (OFN) Generally, extracted objects should retain their literal names and so the use of "LITERAL" is highly recommended. The "INCREMENTAL" and "RANDOM" options can be used, but they may produce undesirable results if other UIEE components reference an object's name literally within other objects (such as an HTML file). The choice is a function of the needs of the receiving system and how it handles object data further downstream.

    An example of a typical Identity File record is shown below:

    UID = MYUSERID
    TFD = \System\Users\MYUSERID\Data
    TFF = UIEE
    TFN = LITERAL
    OFD = \System\Users\MYUSERID\Objects
    OFN = LITERAL
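
    Reading such a record is straightforward. The Python sketch below parses a single record into a dictionary; the function name parse_identity_record is hypothetical, and how multiple user records are separated within one Identity File is not described here, so the sketch handles one record only. Applying "UIEE" when TFF is absent follows the default stated above.

```python
IDENTITY_KEYS = ("UID", "TFD", "TFF", "TFN", "OFD", "OFN")


def parse_identity_record(text: str) -> dict:
    """Parse one Identity File record into a dict keyed by identifier."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or key not in IDENTITY_KEYS:
            raise ValueError(f"unrecognized Identity File line: {line!r}")
        record[key] = value.strip()
    record.setdefault("TFF", "UIEE")  # documented default text format
    return record
```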

    Note that extracted URL's are always created as Internet shortcut files with a .LNK file suffix. If the LITERAL option is chosen, the disassembler uses the User Record Number as the file prefix, followed by a single character (a-z) to denote the sequence. Thus, if UIEE text record MYBOOKS001234 has an associated object URL of:
    http://www.images.com/myaccount/myimages/image001.jpg
    then this URL would be extracted as an Internet shortcut file and given the filename:
    MYBOOKS001234a.lnk
    Additional URL objects associated with the same record would be given alphabetical appendments of "b," "c," etc. up through the letter "z." Thus, up to 26 external URL object associations can exist directly for a single UIEE text record. Records requiring more than this number should use an embedded document defining all of the URL's in a single file, in which there is no limit to the number of URL's that may exist.
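
    The naming rule for extracted URL's can be sketched in Python as follows. The function name shortcut_names is hypothetical; raising an error beyond 26 URL's is an assumption of this example, since the text above only says that such records should use an embedded document instead.

```python
import string


def shortcut_names(record_number: str, url_count: int) -> list:
    """Return the shortcut file names for a record's URL objects.

    Names are the User Record Number plus a letter a-z, so at most
    26 URL objects can be named directly for one record.
    """
    if not 0 <= url_count <= 26:
        raise ValueError("a record can name at most 26 URL objects")
    letters = string.ascii_lowercase[:url_count]
    return [f"{record_number}{letter}.lnk" for letter in letters]
```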

    Note: Unless a receiving system or program is performing an unusual or proprietary set of functions, it is highly recommended that both UIEE and object naming conventions be set to LITERAL, such that no alteration of the names of embedded objects takes place. Renaming can affect the transportability of UIEE data. Should naming conflicts arise, the system should handle them as exceptions and notify the user accordingly, rather than perform an automatic rename and continue processing.