e
olif
The olif element is the base document element of a document in Open
Lexicon Interchange Format (OLIF).
a
OlifVersion
The OlifVersion attribute holds data about the version of
OLIF to which the XML instance (document) conforms. The OLIF
Consortium publishes the string identifier that might be
used for the OlifVersion attribute.
e
body
The body element groups a list of entries which contain
linguistic/lexical/terminological data categories for entry
strings/designators.
e
entry
The entry element groups all of the
linguistic/lexical/terminological data categories related to a single
entry string/designator.
e
mono
The mono element groups the monolingual data within an entry.
e
crossRefer
The crossRefer element groups the data categories for
cross-references. Cross-references define relations
between the given entry (link source) and other entries in the lexicon
(link target) in the same language.
e
crLinkType
The crLinkType element classifies the relation between
the entry from which the link originates and the entry to which the link
points. The possible relations include ISO relations (most of which
formally apply to concepts rather than the terms themselves; they have
been adapted here for the purposes of OLIF) and the analysis contained
in EuroWordNet (July, 2000).
Example values: synonym, antonym
e
orthVariantType
The orthVariantType element classifies the type of
orthographic variant that the target of a cross-reference represents
(currently only used for German; used for example to list old/new
spelling).
Example values: german-4
e
monoDC
The monoDC element groups optional data categories for administrative,
morphological, syntactic and semantic data.
e
monoAdmin
The monoAdmin element groups the administrative data within a
monolingual entry.
e
userDesignat
The userDesignat element holds a user designator of an entry string.
The userDesignat element can be used when the entry string also needs
to be represented in a form other than its canonical form.
e
syllabification
The syllabification element holds data about the syllable boundaries
within the entry string.
Example use: do-cu-men-ta-ry, li-be-ra-li-ty
e
geogUsage
The geogUsage element holds data about the geographical usage, or
dialect, of the entry string.
Example values: CA, GB
e
entryType
The entryType element classifies the entry string as being a product
name, trademark, or orthographic variant (note that orthographic
variants may also be encoded as cross-references).
Example values: trademark, orth-var
e
entryFormation
The entryFormation element classifies the shape/structure of the entry
string.
Example values: abb, acr
e
phraseType
The phraseType element classifies the phrasal type of an entry.
Example values: mw
e
entryStatus
The entryStatus element classifies the entry status of an entry within
a given lexicon/termbase (note that there exists a separate data
category for the administrative status).
Example values: word
e
entrySource
The entrySource element holds data about the entry source, or the
lexicon/termbase that the entry originated from.
Example use: TermDB for software package X
e
originator
The originator element holds data about the individual who originated
the entry.
Example use: Christopher Columbus
e
adminStatus
The adminStatus element classifies the administrative status of an
entry relative to a given work environment.
Example values: ver
e
company
The company element holds information about the company/organisation
for which the entry is valid.
Example use: LongDistanceRunners Ltd.
e
abbrev
The abbrev element holds data about an abbreviated form of the entry
string (note that abbreviations may also be encoded as cross-references).
Example use: ERP
e
orthVariant
The orthVariant element holds data about an orthographic variant of the
entry string (note that orthographic variants may also be encoded as
cross-references).
Example use: auf Grund
e
depSynonym
The depSynonym element holds data about a rejected or deprecated
synonym of the entry string.
Example use: IS-H
e
timeRestrict
The timeRestrict element holds data about a time restriction, or
the period of time during or since which usage of the entry is
valid.
Example use: 20011115T140324Z/20011215T140324Z
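Reading the example value as an ISO 8601 interval of two UTC timestamps, a consumer could test whether an entry is usable at a given moment. The following Python sketch assumes exactly that 'start/end' shape in the YYYYMMDDThhmmssZ form; open-ended ('since') restrictions and other ISO 8601 interval forms are not handled, and the function names are illustrative only.

```python
from datetime import datetime, timezone

OLIF_TS = "%Y%m%dT%H%M%SZ"  # basic-format UTC timestamp, e.g. 20011115T140324Z


def parse_ts(value: str) -> datetime:
    """Parse a YYYYMMDDThhmmssZ value into a timezone-aware UTC datetime."""
    return datetime.strptime(value, OLIF_TS).replace(tzinfo=timezone.utc)


def parse_time_restrict(value: str) -> tuple[datetime, datetime]:
    """Split an (assumed) 'start/end' value into two UTC datetimes."""
    start_text, end_text = value.split("/")
    start, end = parse_ts(start_text), parse_ts(end_text)
    if start > end:
        raise ValueError("interval start lies after its end")
    return start, end


def is_valid_at(value: str, moment: datetime) -> bool:
    """True if `moment` lies within the closed interval given by `value`."""
    start, end = parse_time_restrict(value)
    return start <= moment <= end


restriction = "20011115T140324Z/20011215T140324Z"
print(is_valid_at(restriction, datetime(2001, 12, 1, tzinfo=timezone.utc)))  # True
print(is_valid_at(restriction, datetime(2002, 1, 1, tzinfo=timezone.utc)))   # False
```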
e
product
The product element holds data about a product for which an entry
is valid.
Example use: Spreadsheet3005
e
project
The project element holds data about a project for which an entry is
valid.
Example use: localization of product X from English into German
e
confidence
The confidence element holds data from terminology extraction.
The value of the confidence element indicates how confident the term
extraction program is that the term really is a term.
Example values: 0.99, high
e
monoMorph
The monoMorph element groups the morphological information within a
monolingual entry.
e
morphStruct
The morphStruct element holds data about the morphological structure
of the entry string (note the possibilities provided for multiwords
by means of the synStruct element).
Example use: #[[gebrauch+s]:[gegen+stand]]#
e
inflection
The inflection element holds data about the inflection pattern(s)
of the entry string (or its head in case of a multiword/phrasal
entry).
Example use: book, 16
e
head
The head element holds data about the head word in a
multiword/phrasal entry string.
Example use: infotype (planned compensation infotype)
e
gender
The gender element classifies grammatical gender.
Example values: m, f
e
case
The case element classifies grammatical case.
Example values: d, a, loc
e
number
The number element classifies grammatical number.
Example values: sg, du
e
person
The person element classifies grammatical person.
Example values: first, sec
e
tense
The tense element classifies verb tense.
Example values: pres, fut
e
mood
The mood element classifies verb mood or mode.
Example values: imper, cond
e
aspect
The aspect element classifies verbal aspect.
Example values: perf, iter
e
degree
The degree element classifies adjectival degree type.
Example values: comp, sup
e
auxType
The auxType element classifies the auxiliary type for an
auxiliary verb.
Example values: have, faire
e
monoSyn
The monoSyn element groups the syntactic information within a
monolingual entry.
e
synType
The synType element classifies the general syntactic behavior of
the entry string.
Example values: cnt, refl, attrib
e
synPosition
The synPosition element classifies the unmarked positioning of the
entry string syntactically.
Example values: prenoun, cl-init
e
transType
The transType element classifies the transitivity type of a verb.
Example values: trans, ditrans
e
synStruct
The synStruct element holds data about the constituent structure of a
multiword entry string (note the possibilities provided for single
words by means of the morphStruct element).
Example use: [[adj][noun]] (General Ledger)
e
synFrame
The synFrame element classifies the syntactic frame for the
entry string (subcategorisation).
Example values: subj-imps-opt, dobj-opt
e
prep
The prep element holds data about prepositions that further specify
syntactic frame elements.
Example use: into, about, from, mit, wegen, ausser
e
verbPart
The verbPart element holds data about verb particles that further
specify syntactic frame elements.
Example use: down, up, over
e
monoSem
The monoSem element groups the semantic information within
a monolingual entry.
e
definition
The definition element holds a prose definition of the entry
string.
Example use: Collection of interfaces usable by a programmer
e
natGender
The natGender element classifies the biological gender associated
with the entry.
Example values: m, f, un
e
semType
The semType element classifies an entry string with
respect to a semantic type classification structure.
Example values: anim-hum-pn, cnc-class
e
header
The header element groups data categories holding information about the
data that has been encoded (that is, header holds metadata).
e
dataCatReg
The dataCatReg element groups data categories for extensions to
extensible OLIF data categories (like ptOfSpeech). Whenever users make
use of an extension (for example, by supplying their own tag set for
part-of-speech), they document the complete list of data categories
and values they use (for example, via a URL placed in the
ptOfSpeechDCS element of the dataCatReg element). The dataCatReg
element contains several data category specifications (DCS).
e
ptOfSpeechDCS
The ptOfSpeechDCS element (DCS is short for data category
specification) holds data about a user-extended scheme for describing
the part-of-speech of OLIF entries. Users can for example describe
their additional part-of-speech tags by means of a URL or by means
of CDATA sections.
Example use:
http://www.company.com/nlp/ptOfSpeech/projectX.htm
e
subjFieldDCS
The subjFieldDCS element holds data about a user-extended scheme for
describing the subject field information of OLIF entries (see the
comment for the ptOfSpeechDCS element for more information).
e
semReadingDCS
The semReadingDCS element holds data about a user-extended scheme for
describing the semantic reading information of OLIF entries (see the
comment for the ptOfSpeechDCS element for more information).
e
crLinkTypeDCS
The crLinkTypeDCS element holds data about a user-extended scheme for
describing the types of cross-references between OLIF entries (see the
comment for the ptOfSpeechDCS element for more information).
e
orthVariantTypeDCS
The orthVariantTypeDCS element holds data about a user-extended
scheme for describing the orthographic variants of OLIF entries (see
the comment for the ptOfSpeechDCS element for more information).
e
morphStructDCS
The morphStructDCS element holds data about a user-extended scheme for
describing the internal morphological structure of entry
strings/designators (see the comment for the ptOfSpeechDCS element for
more information).
e
inflectionDCS
The inflectionDCS element holds data about a user-extended
scheme for describing the inflection of OLIF entries (see
the comment for the ptOfSpeechDCS element for more information).
e
aspectDCS
The aspectDCS element holds data about a user-extended
scheme for describing the aspect of OLIF entries (see
the comment for the ptOfSpeechDCS element for more information).
e
synTypeDCS
The synTypeDCS element holds data about a user-extended
scheme for describing the syntactic type of OLIF entries (see
the comment for the ptOfSpeechDCS element for more information).
e
synFrameDCS
The synFrameDCS element holds data about a user-extended
scheme for describing the syntactic frames of OLIF entries (see
the comment for the ptOfSpeechDCS element for more information).
e
synStructDCS
The synStructDCS element holds data about a user-extended
scheme for describing the syntactic structures of OLIF entries (see
the comment for the ptOfSpeechDCS element for more information).
e
semTypeDCS
The semTypeDCS element holds data about a user-extended
scheme for describing the semantic types of OLIF entries (see
the comment for the ptOfSpeechDCS element for more information).
e
conceptHierarchyDCS
The conceptHierarchyDCS element holds data about a user-extended
scheme for describing the concept hierarchy/ontology of OLIF entries
(see the comment for the ptOfSpeechDCS element for more
information).
e
contentInfo
The contentInfo element groups data categories related to the
practice adopted for encoding quotation marks, abbreviations etc.
e
quotMarkInfo
The quotMarkInfo element holds data about editorial practice
adopted with respect to quotation marks.
Example use: our open quote is '!' and our closing quote is '$'
e
syllabificationMarkInfo
The syllabificationMarkInfo element holds data about editorial
practice adopted with respect to syllabification in the original.
Example use: we use '*' as marker
e
abbrevHandling
The abbrevHandling element holds data about how abbreviations
are represented. Two options exist: via the abbrev element or via a
crossRefer element.
Example use: we use both the abbrev element and the crossRefer element
e
langIdUse
The langIdUse element holds data about the way language
identifiers have been used.
Possible values:
region_standard - the region part of a locale (e.g. the CA
in FR_CA) has been used even if the term also
exists in the unrestricted locale (e.g. French
as a whole).
region_exception - the region part of a locale has been used
only if the term does not exist in the
unrestricted locale.
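The two policies differ only in when the region part is kept. A minimal Python sketch of the decision, assuming the underscore-separated locale form of the FR_CA example; the function name and its parameters are hypothetical:

```python
def choose_lang_id(locale: str, in_unrestricted: bool, policy: str) -> str:
    """Return the language identifier to record for a term.

    locale          -- regional locale the term is used in, e.g. "FR_CA"
    in_unrestricted -- whether the term also exists in the bare language
                       (e.g. French as a whole)
    policy          -- the langIdUse value: "region_standard" or "region_exception"
    """
    language = locale.split("_")[0]
    if policy == "region_standard":
        # Always keep the region part, even for terms shared with the bare language.
        return locale
    if policy == "region_exception":
        # Keep the region part only when the term is specific to that region.
        return language if in_unrestricted else locale
    raise ValueError(f"unknown langIdUse value: {policy!r}")


print(choose_lang_id("FR_CA", True, "region_standard"))    # FR_CA
print(choose_lang_id("FR_CA", True, "region_exception"))   # FR
print(choose_lang_id("FR_CA", False, "region_exception"))  # FR_CA
```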
e
valueDefaults
The valueDefaults element groups information about the default
values for various data categories. Whenever an OLIF entry does not
specify a value for one of these data categories, information from
the valueDefaults element should be applied.
e
valDefault
The valDefault element holds data about the default
value for one specific data category.
Example use: the following sets the default for the data category
'product' to the string 'OLIF Converter':
<valDefault ValDefaultRefType="el" ValDefaultRefName="product">OLIF Converter</valDefault>
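A consumer might apply value defaults roughly as follows. This Python sketch assumes that each valDefault element names its target through the ValDefaultRefType and ValDefaultRefName attributes and carries the default value as its text content; it models an entry as a plain dictionary from data category name to value and handles element defaults only. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed markup for a valueDefaults group with one element default.
HEADER_XML = """
<valueDefaults>
  <valDefault ValDefaultRefType="el" ValDefaultRefName="product">OLIF Converter</valDefault>
</valueDefaults>
"""


def load_element_defaults(xml_text: str) -> dict[str, str]:
    """Collect element defaults (ValDefaultRefType="el") keyed by element name."""
    root = ET.fromstring(xml_text.strip())
    defaults = {}
    for vd in root.iter("valDefault"):
        if vd.get("ValDefaultRefType") == "el":
            defaults[vd.get("ValDefaultRefName")] = (vd.text or "").strip()
    return defaults


def apply_defaults(entry: dict[str, str], defaults: dict[str, str]) -> dict[str, str]:
    """Return a copy of `entry` in which missing data categories take their defaults."""
    merged = dict(defaults)
    merged.update(entry)  # values given explicitly in the entry always win
    return merged


defaults = load_element_defaults(HEADER_XML)
print(apply_defaults({"canForm": "success story"}, defaults))
# {'product': 'OLIF Converter', 'canForm': 'success story'}
print(apply_defaults({"product": "Spreadsheet3005"}, defaults))
# {'product': 'Spreadsheet3005'}
```

Entry values are merged over the defaults rather than the other way round, so a default can never overwrite data that an entry states explicitly.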
e
workflowInfo
The workflowInfo element holds data about user-specific workflow
support.
Example use: to be validated by 31 Dec 2001 at the latest
e
termExtractInfo
The termExtractInfo element holds data which is relevant for
terminology extraction (e.g. name and size of corpus to
which term extraction has been applied).
e
fileDesc
The fileDesc element groups data categories relating to physical
features of the OLIF instance (document).
e
fileName
The fileName element holds data about the name of the OLIF file.
Example use: olifForAgency14Jan02.xml
e
fileId
The fileId element holds data about a unique identifier (e.g. a
globally unique identifier) of the OLIF file.
Example use: 011000358700000683362001E.xml
e
fileExtent
The fileExtent element groups data categories related to counts of
items (for example number of entries) in the contents of the OLIF
instance.
e
conceptCount
The conceptCount element holds data about the number of concepts in
the OLIF document.
e
entryCount
The entryCount element holds data about the number of entries in the
OLIF document.
e
termCount
The termCount element holds data about the number of terms
(generally defined as those entries that are not general vocabulary
and that are distinguished from one another by the values of the key
data categories) in the OLIF document.
e
byteCount
The byteCount element holds data about the size of the OLIF document
including its tags, in its representation as a text file encoded in
the character set mentioned in the encoding attribute of the XML
declaration. This is useful for calculating media requirements or file
download times.
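As a sketch of how the byteCount value could be computed, the Python below reads the encoding from the XML declaration (falling back to UTF-8, the XML default, when none is declared) and counts the encoded bytes of the whole document, tags included. The regular expression is a simplification, a byte-order mark is not counted, and the function name is illustrative.

```python
import re

# Captures the encoding pseudo-attribute of an XML declaration such as
# <?xml version="1.0" encoding="ISO-8859-1"?>
ENCODING_RE = re.compile(r"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")


def byte_count(document: str) -> int:
    """Bytes occupied by `document`, tags included, in its declared encoding."""
    match = ENCODING_RE.match(document)
    encoding = match.group(1) if match else "utf-8"  # XML default
    return len(document.encode(encoding))


doc_utf8 = '<?xml version="1.0" encoding="UTF-8"?><canForm>Müll</canForm>'
doc_latin1 = '<?xml version="1.0" encoding="ISO-8859-1"?><canForm>Müll</canForm>'
print(byte_count(doc_utf8) - len(doc_utf8))      # 1: 'ü' takes two bytes in UTF-8
print(byte_count(doc_latin1) - len(doc_latin1))  # 0: one byte per character
```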
e
publStmt
The publStmt element groups data categories related to the distributor
and the owner of the OLIF document. The publStmt element also gives
supplementary information about the OLIF document (e.g. copyright
protection).
e
distributor
The distributor element holds data about the person or
institution who distributes the OLIF document.
e
address
The address element holds data about a postal address of the
distributor.
e
telephone
The telephone element holds data about the telephone number of the
person or institution who distributes the OLIF file (preferably in a
format conformant to ITU-T/CCITT Recommendation E.123).
e
fax
The fax element holds data about the fax number of the person or
institution who distributes the OLIF file (preferably in a format
conformant to ITU-T/CCITT Recommendation E.123).
e
eAddress
The eAddress element holds data about an electronic address of the
person or institution who distributes the OLIF file. Note that more
than one occurrence of this tag can appear, so that multiple addresses
(possibly of different types) can be included.
e
availability
The availability element holds data about the availability
of an OLIF file, for example, any restrictions on its use or distribution,
its copyright status, etc. A company may use 'Available upon written
agreement' to indicate that the OLIF file may not be freely
redistributed.
e
idNo
The idNo element holds data about a number (e.g. ISBN) used to identify
an OLIF document.
e
date
The date element holds data about a date. Its value must be in ASCII,
in the format YYYYMMDDThhmmssZ (e.g. 19970811T133402Z for August 11th
1997 at 13:34:02). This is one of the options described in
ISO 8601:1988. The value is preferably given in Coordinated Universal
Time (UTC), as indicated by the terminal Z. The DateValue attribute can
be used to specify the date in an arbitrary format.
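The fixed YYYYMMDDThhmmssZ form maps directly onto Python's strptime/strftime directives. A minimal sketch for reading and writing such values, converting other time zones to UTC on output; the helper names are illustrative:

```python
from datetime import datetime, timezone

OLIF_DATE_FORMAT = "%Y%m%dT%H%M%SZ"  # ISO 8601 basic format, UTC


def parse_olif_date(value: str) -> datetime:
    """Parse e.g. '19970811T133402Z' into a timezone-aware UTC datetime."""
    if not value.isascii():
        raise ValueError("date values must be ASCII")
    return datetime.strptime(value, OLIF_DATE_FORMAT).replace(tzinfo=timezone.utc)


def format_olif_date(moment: datetime) -> str:
    """Render an aware datetime as YYYYMMDDThhmmssZ, converting it to UTC first."""
    return moment.astimezone(timezone.utc).strftime(OLIF_DATE_FORMAT)


d = parse_olif_date("19970811T133402Z")
print(d.isoformat())        # 1997-08-11T13:34:02+00:00
print(format_olif_date(d))  # 19970811T133402Z
```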
e
owner
The owner element holds data about the person or institution that
owns the OLIF document.
e
replacements
The replacements element groups data categories for string
replacements that should be applied to the document. Replacements help
to compress data; for example, a single replacement might specify one
value for the date element of a list of 1000 elements.
e
mapping
The mapping element groups a mappingValue and a mappingTarget. The
mappingValue should be used for the item designated by the mappingTarget.
e
mappingValue
The mappingValue element holds data about a replacement string that is
used in a mapping.
e
mappingTarget
The mappingTarget element holds data about an item to which a
replacement should be applied.
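How a mappingTarget designates an item is not specified above. Purely as an illustration, the Python sketch below assumes that the target is a placeholder string that appears as the complete text of elements in the body, and replaces each occurrence with the corresponding mappingValue. The placeholder $D1, the position of replacements inside header, and the function name are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one date value defined once, referenced by a placeholder.
DOC = """
<olif>
  <header>
    <replacements>
      <mapping>
        <mappingValue>20011115T140324Z</mappingValue>
        <mappingTarget>$D1</mappingTarget>
      </mapping>
    </replacements>
  </header>
  <body>
    <entry><modDate>$D1</modDate></entry>
    <entry><modDate>$D1</modDate></entry>
  </body>
</olif>
"""


def expand_replacements(root: ET.Element) -> None:
    """Replace body element texts equal to a mappingTarget with its mappingValue."""
    table = {}
    for m in root.iter("mapping"):
        target = m.findtext("mappingTarget", default="").strip()
        value = m.findtext("mappingValue", default="").strip()
        if target:  # ignore empty targets so whitespace-only text is never matched
            table[target] = value
    for body in root.iter("body"):  # only entry data, never the header itself
        for el in body.iter():
            if el.text is not None and el.text.strip() in table:
                el.text = table[el.text.strip()]


root = ET.fromstring(DOC.strip())
expand_replacements(root)
print([d.text for d in root.iter("modDate")])
# ['20011115T140324Z', '20011115T140324Z']
```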
e
name
The name element holds data about a name (e.g. of a distributor or
owner).
e
prop
The prop element holds data about non-standard (proprietary)
information in an OLIF document. It may be used for communicating
tool-specific information.
a
CreaTool
The CreaTool attribute holds data about the tool that
created the OLIF document. Its possible values are not specified in
OLIF but each tool provider will publish the string identifier it
uses.
Example use: CoolTermExtract
a
CreaToolVersion
The CreaToolVersion attribute holds data about the version of the
tool that created the OLIF document. Its possible values are not
specified in OLIF but each tool provider will publish the string
identifier it uses.
Example use: 2.14
a
OrigFormat
The OrigFormat attribute holds data about the format of the file from
which the OLIF document has been generated. The format specification may
include a product name and even a version tag. This may lead
to format specifications like the following:
LOGOS-eSense
LOGOS-LDE-1.1
LOGOS-LDE-1.2
a
AdminLang
The AdminLang attribute holds data about
the default language for the administrative and informative elements
'note' and 'prop'. The value of the AdminLang attribute must be one of
the ISO 639 language identifiers (2- or 3-letter code) or one of
the standard locale identifiers (2- or 3-letter language code, dash,
2-letter ISO 3166 territory/country code).
Example use: en
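A consumer may want to check the shape of an AdminLang value before using it. The Python sketch below accepts a 2- or 3-letter language code, optionally followed by a dash and a 2-letter territory code, ignoring case; it checks the form only and does not consult the ISO code lists. The function name is illustrative.

```python
import re

# 2- or 3-letter language code, optionally "-" plus a 2-letter territory code.
ADMIN_LANG_RE = re.compile(r"[a-z]{2,3}(?:-[a-z]{2})?", re.IGNORECASE)


def is_well_formed_admin_lang(value: str) -> bool:
    """Check the form of an AdminLang value (not membership in ISO 639/3166)."""
    return ADMIN_LANG_RE.fullmatch(value) is not None


for candidate in ["en", "deu", "en-GB", "en_GB", "english"]:
    print(candidate, is_well_formed_admin_lang(candidate))
```

Here "en", "deu" and "en-GB" are accepted, while "en_GB" (wrong separator) and "english" (not a code) are rejected.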
a
CreaDate
The CreaDate attribute holds data about the date of the creation of
the element. Its value must be in ASCII, in the format YYYYMMDDThhmmssZ
(e.g. 19970811T133402Z for August 11th 1997 at 13:34:02). This is one
of the options described in ISO 8601:1988.
The value should be given in Coordinated Universal Time (UTC; as
indicated by the terminal Z).
Example use: 19970811T133402Z
a
CreaId
The CreaId attribute holds data about the user who created the element.
Example use: Lars Nauter
a
DCSType
The DCSType attribute classifies a data category
specification.
Possible values:
replacement - replace existing OLIF values
extension - extend (add to) the predefined OLIF values.
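The effect of the two DCSType values on the set of permitted values can be sketched in a few lines of Python. The predefined part-of-speech set and the user tag 'particle' are made-up illustrations, not the actual OLIF value lists:

```python
# Illustrative subset only; not the actual OLIF part-of-speech value list.
PREDEFINED_PT_OF_SPEECH = {"noun", "verb", "adj"}


def effective_values(predefined: set[str], user: set[str], dcs_type: str) -> set[str]:
    """Values in force for a data category, given a user DCS and its DCSType."""
    if dcs_type == "replacement":
        return set(user)           # user values replace the OLIF values
    if dcs_type == "extension":
        return predefined | user   # user values are added to the OLIF values
    raise ValueError(f"unknown DCSType: {dcs_type!r}")


print(sorted(effective_values(PREDEFINED_PT_OF_SPEECH, {"particle"}, "extension")))
# ['adj', 'noun', 'particle', 'verb']
print(sorted(effective_values(PREDEFINED_PT_OF_SPEECH, {"particle"}, "replacement")))
# ['particle']
```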
a
InflectionDCSType
The InflectionDCSType attribute classifies
the way inflection information has been encoded.
Possible values:
classDesignator - reference to a code/designator from a
classification scheme
inflectsLike - reference to an example word that inflects the same way
a
QuotMarkRet
The QuotMarkRet attribute classifies the convention used for
retaining quotation marks.
Possible values:
none - no quotation marks have been retained
some - some quotation marks have been retained
all - all quotation marks have been retained
a
QuotMarkForm
The QuotMarkForm attribute classifies the standardization of
quotation marks.
Possible values:
std - use of quotation marks has been standardized and open and
close quote marks are distinct
nonStd - open and close quote marks are represented indiscriminately
unknown* - use of quotation marks is unknown
a
ValDefaultRefType
The ValDefaultRefType attribute classifies the OLIF
item to which a value default refers.
Possible values:
el - element
att - attribute
en - entity.
a
ValDefaultRefName
The ValDefaultRefName attribute holds data about the
name of the element, attribute or entity to which a value default
is related.
a
ByteCountUnit
The ByteCountUnit attribute classifies the unit in which the byte count
is measured.
Possible values:
bytes - bytes
kb* - kilobytes
mb - megabytes
gb - gigabytes
a
DistributorType
The DistributorType attribute classifies a distributor.
Possible values:
person - name of a person
place - name of a place
org - name of an organization
cmp - name of a company
a
EAddressType
The EAddressType attribute classifies the electronic
address (email address, web site, ftp site, etc.).
Possible values:
email* - the value is an electronic mail address
url - the value is a URL
a
Region
The Region attribute holds data about the territories within
which rights related to the OLIF data apply.
Possible values:
world* - the text is freely available
eu - European Union only
a
PubStatus
The PubStatus attribute classifies the current availability of the
OLIF data.
Possible values:
restricted - the text is not freely available
unknown* - the status of the text is unknown
free - the text is freely available
a
IdNoType
The IdNoType attribute holds data about a name or abbreviation
(e.g., isbn) identifying what type of identifying number is given.
Possible values:
isbn* - the value is an International Standard Book
Number (ISBN)
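When IdNoType is isbn, the idNo value can be sanity-checked with the standard ISBN check-digit rules. The Python sketch below is an illustration, not something OLIF prescribes: it ignores hyphens and spaces and accepts both 10- and 13-digit ISBNs.

```python
def is_valid_isbn(value: str) -> bool:
    """Validate an ISBN-10 or ISBN-13 check digit (hyphens and spaces ignored)."""
    s = value.replace("-", "").replace(" ", "").upper()
    if len(s) == 10:
        if not (s[:9].isdecimal() and (s[9].isdecimal() or s[9] == "X")):
            return False
        digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        # Weights 10, 9, ..., 1; the weighted sum must be divisible by 11.
        return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0
    if len(s) == 13 and s.isdecimal():
        # Alternating weights 1 and 3; the weighted sum must be divisible by 10.
        return sum((1 if i % 2 == 0 else 3) * int(c) for i, c in enumerate(s)) % 10 == 0
    return False


print(is_valid_isbn("978-0-306-40615-7"))  # True
print(is_valid_isbn("978-0-306-40615-8"))  # False: wrong check digit
```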
a
DateValue
The DateValue attribute holds data about a date in
ISO 8601 format.
a
OwnerType
The OwnerType attribute classifies an owner.
Possible values:
natPerson - name of a person
place - name of a place
org - name of an organization
cmp - name of a company
a
PropType
The PropType attribute holds data about the kind of data a
prop element represents.
a
PropLang
The PropLang attribute holds data about the language used in a
prop element.
e
keyDC
The keyDC element groups the five key data categories whose values
uniquely identify an entry.
e
canForm
The canForm element holds the entry string, represented in canonical
form in accordance with OLIF guidelines.
Example use: success story
e
language
The language element encodes the language to which the entry
string belongs.
Example values: fr, en
e
ptOfSpeech
The ptOfSpeech element classifies the part-of-speech represented by
the entry string. In cases of phrases/multiword entries, the value for
part-of-speech depends on the function of the phrase/multiword within
a clause; the part-of-speech of the head element often indicates the
part-of-speech value for the entire phrase/multiword string.
Example values: noun, verb
e
subjField
The subjField element classifies the knowledge domain to which the
lexical/terminological entry is assigned.
Example values: agriculture, aviation
e
semReading
The semReading element classifies readings for entries with
identical values for canonical form, language, part-of-speech, and
subject field.
Example values: color, definite space
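Because the five key data categories together identify an entry, a lexicon can be checked for accidental duplicates by treating them as one composite key. A minimal Python sketch; the field names are snake_case renderings of canForm, language, ptOfSpeech, subjField and semReading, and the sample values are made up:

```python
from typing import NamedTuple


class KeyDC(NamedTuple):
    """Composite key built from the five key data categories of an entry."""
    can_form: str
    language: str
    pt_of_speech: str
    subj_field: str
    sem_reading: str


def find_duplicate_keys(keys: list[KeyDC]) -> set[KeyDC]:
    """Return every key occurring more than once, i.e. entries that are not unique."""
    seen: set[KeyDC] = set()
    duplicates: set[KeyDC] = set()
    for key in keys:
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates


a = KeyDC("display", "en", "noun", "aviation", "color")
b = KeyDC("display", "en", "noun", "aviation", "definite space")  # other semReading
print(len(find_duplicate_keys([a, b])))     # 0: semReading tells them apart
print(len(find_duplicate_keys([a, b, a])))  # 1: the second "a" is a duplicate
```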
e
generalDC
The generalDC element groups general data categories. General data
categories are optional elements that can be used in any of the
top-level OLIF groups for entries (mono, crossRefer, or transfer).
e
updater
The updater element holds data about the individual who last modified
the entry.
Example use: Jessica King
e
modDate
The modDate element holds data about the date on which the
entry was last modified.
Example use: 20011115T140324Z
e
example
The example element holds data about a sample text or portion
of text that contains the entry string as an illustration of
usage.
Example use: ERP is on the rise again.
e
usage
The usage element holds data about a usage note for the
entry string.
Example use: Never use this when talking about ERP.
e
note
The note element holds data about a note, or commentary, on an entry
by a lexicographer/terminologist.
Example use: Never translate this.
e
locInfo
The locInfo element holds data about localization-relevant
information (e.g. product version, component name, operating system
platform, or build number).
a
KeyDCUserId
The KeyDCUserId attribute holds data about a user-defined identifier
of a grouping of OLIF key data categories. This identifier can for
example be used in cross-references.
a
KeyDCUniversalId
The KeyDCUniversalId attribute holds data about a universal identifier
(i.e. one that is unique not only in the user's environment but
worldwide) of a grouping of OLIF key data categories. This identifier
can for example be used in cross-references.
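OLIF does not state here how such an identifier is generated. One common choice, assumed in this Python sketch, is a random (version 4) UUID from the standard uuid module; the helper name is illustrative:

```python
import uuid


def new_universal_id() -> str:
    """Return a random (version 4) UUID string, e.g. for KeyDCUniversalId."""
    return str(uuid.uuid4())


ids = {new_universal_id() for _ in range(1000)}
print(len(ids))  # 1000: collisions between random UUIDs are vanishingly unlikely
```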
a
NoteType
The NoteType attribute holds data for categorizing notes (e.g.
'for localizer', 'for quality management').
e
transfer
The transfer element groups data categories which define bilingual
transfer relations between the given entry and other entries in the
lexicon in different languages (cf. crossRefer elements, which
point to entries in the same language).
e
trRestrictStmt
The trRestrictStmt element groups multiple related transfer
restrictions (e.g. alternatives connected via the logical
operator OR).
e
trRestrict
The trRestrict element groups data categories for a single transfer
restriction.
e
structChangeStmt
The structChangeStmt element groups multiple
related structural changes (which can be connected via the logical
operator AND).
e
structChange
The structChange element groups data categories related to a
change in the target language vis-a-vis the source structure based
on the transfer restriction having been satisfied. Structural
changes are definable for the following parts-of-speech: noun, verb,
adjective, preposition.
e
changeType
The changeType element holds data related to the type of change.
Example values: change-role, add-in-target
e
changePOS
The changePOS element holds data about the part of speech of an
element being added or deleted.
Example values: noun, adj
e
changeValue
The changeValue element holds data about the string or
data category being changed.
Example values: active, subj-dobj
e
equival
The equival element holds data about the degree of transfer
relationship between words/phrases in two different languages.
Example values: full, partial
e
contextStmt
The contextStmt element groups multiple related contexts (contexts can be
connected by means of logical operators).
e
context
The context element holds data about one of the following:
a) the context for a given translation of a source word/phrase into
a target word/phrase
b) the context for a structural change in the target language
Example values: pp, genobj
e
testStmt
The testStmt element groups multiple related tests (connected
by means of logical operators).
e
test
The test element holds data about a single test.
e
testType
The testType element holds data about the type of test.
Example values: string, datacat
e
testDC
The testDC element holds data about a data category
to which a test pertains.
Example values: semType, tense
e
testValue
The testValue element holds data about the string or
data category being tested in the context(s) (e.g. 'sg' if the
test is on the data category for grammatical number).
Example values: anim-hum, sg
e
logOp
The logOp element holds data about a logical operator.
Possible values:
AND - for trRestrictStmt and structChangeStmt
OR - for trRestrictStmt
NOT - for trRestrictStmt
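The way tests and logical operators combine can be sketched as a small expression tree. The Python below is a hypothetical model, not the OLIF processing semantics: a context is a dictionary of data category values plus a 'string' key holding the word form, a 'datacat' test compares the value of testDC with testValue, a 'string' test compares the word form with testValue, and NOT is taken to mean that none of the grouped items holds.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class OlifTest:
    """A single test: testType, testValue and (for 'datacat' tests) testDC."""
    test_type: str          # "string" or "datacat"
    test_value: str
    test_dc: str = ""


@dataclass
class LogicalGroup:
    """Tests or nested groups joined by one logOp value."""
    log_op: str             # "AND", "OR" or "NOT"
    items: list["Node"]


Node = Union[OlifTest, LogicalGroup]


def evaluate(node: Node, context: dict[str, str]) -> bool:
    """Evaluate a test tree against a context dictionary."""
    if isinstance(node, OlifTest):
        if node.test_type == "datacat":
            return context.get(node.test_dc) == node.test_value
        if node.test_type == "string":
            return context.get("string") == node.test_value
        raise ValueError(f"unknown testType: {node.test_type!r}")
    results = [evaluate(item, context) for item in node.items]
    if node.log_op == "AND":
        return all(results)
    if node.log_op == "OR":
        return any(results)
    if node.log_op == "NOT":
        return not any(results)  # assumption: none of the items may hold
    raise ValueError(f"unknown logOp: {node.log_op!r}")


# Human, singular context (semType and number values from the examples above).
restriction = LogicalGroup("AND", [
    OlifTest("datacat", "anim-hum", "semType"),
    OlifTest("datacat", "sg", "number"),
])
print(evaluate(restriction, {"semType": "anim-hum", "number": "sg"}))  # True
print(evaluate(restriction, {"semType": "anim-hum", "number": "du"}))  # False
```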
e
logOpAnd
The logOpAnd element holds data about the logical operator AND.
a
TrTarget
The TrTarget attribute holds data about the target entry of a transfer
relationship.
a
TrDefault
The TrDefault attribute holds data about the default transfer.
e
workflowInfo
The workflowInfo element holds data about workflow-related
information like the task that is currently performed, its
deadlines, and the person responsible for executing the task.