Council of East Asian Libraries
Committee on Technical Processing
Wednesday, March 30, 2005
The 2005 annual meeting of the Council of East Asian
Libraries (CEAL) Committee on Technical Processing (CTP) was
called to order at 4:00 pm on Wednesday, March 30, 2005 in Columbus Hall
KL, Hyatt Regency Hotel in
The first presentation, “Virtual International Authority File (VIAF)” by Dr. Edward T. O’Neill (Consulting Research Scientist, OCLC), was a progress report on a project, jointly undertaken by Die Deutsche Bibliothek (DDB), the Library of Congress, and OCLC, to test the VIAF concept.
The Project demonstrates the feasibility of VIAF by linking the personal name authority records in DDB’s Personennormdatei (PND) with those in the Library of Congress Name Authority File (LCNAF). VIAF is characterized as metadata that leads users from records in one agency’s personal name authority file to the corresponding authorities in other authority files. It is designed to permit the linking of any number of authority files. Open Archives Initiative (OAI) protocols will be used to harvest metadata from the agencies’ authority files. Web access will be provided through a specially designed user interface. VIAF will support multilingual and multi-script capability.
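As a rough illustration of the harvesting step, an OAI-PMH harvester sends HTTP requests such as ListRecords to a repository's base URL and follows resumption tokens until the list is exhausted. The sketch below only builds the request URLs. The base URL is a placeholder, not an actual DDB, LC, or OCLC endpoint, and `oai_dc` is used only because every OAI-PMH repository must support it; the metadata format used by the Project is not stated in these minutes.

```python
from urllib.parse import urlencode

# Placeholder endpoint for illustration only; not a real project server.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: it is sent
        # with the verb alone, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# First request of a harvest, then a follow-up with a (made-up) token.
first_request = list_records_url(BASE_URL)
next_request = list_records_url(BASE_URL, resumption_token="token-123")
```

A harvester would repeat the second form of the request for as long as the repository keeps returning a resumption token.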
The Project consists of five phases:
1. Create enhanced authority files for both PND and LC personal names
2. Match PND and LC enhanced authority records to create the initial version of VIAF
3. Build an OAI server
4. Harvest metadata using OAI protocols
5. Develop an end user interface with Unicode displays.
Because authority records generally include very few, if any, details about the person or their publishing history, additional information is necessary to determine whether different authority records represent the same person. To match authority records unambiguously, information from bibliographic records is used to enhance the authority records in Phase 1: Creating the Enhanced Authority Files. Four situations, some of them problematic, were identified in Phase 1 in the LCNAF and PND authority files:
1. A person may have the same established form in both authority files;
2. Different people may be assigned the same established form: Adams, Mike;
3. Different forms of the name may be established for the same person: Morel, Pierre (LCNAF) = Morellus, Petrus (PND);
4. A particular person may not be established in both files.
LC authority records are brief:
010 n 84044261
040 DLC $c DLC $d DLC
100 1 Larson, Jack
670 Thomson, V. The cat, c1982: $b t.p. (Jack Larson)
From the bibliographic records, we gain significant additional information
about Jack Larson: (1) he is a lyricist;
(2) his primary subject area is music; (3) he was published in the 80s and 90s
by G. Schirmer and Belwin Mills in New York; (4) he worked with Virgil Thomson
and Gerhard Samuel; and (5) Jack Larson is the only name he has used on his
publications, etc.
Dr. O’Neill illustrated how information mined from a bibliographic record is used to create derived authority records, a prerequisite for enhancing the authorities, using ocm10025532, Virgil Thomson’s musical score The cat. The record is also found in the Library of Congress online catalog (LC Control Number 84758340):
LC Control Number: 84758340
000 00901ncm a2200289 a 450
001 5588276
005 19841210000000.0
008 840627s1982 nyuuua n eng
035 __ |9 (DLC) 84758340
906 __ |a 7 |b cbc |c orignew |d 3 |e ncip |f 19 |g y-genmusic
010 __ |a 84758340
020 __ |c $2.95
028 22 |a 48418 |b G. Schirmer
040 __ |a DLC |c DLC |d DLC
045 2_ |b d198006 |b d198007
048 __ |b va01 |b ve01 |a ka01
050 00 |a M1529.3 |b .T
100 1_ |a Thomson, Virgil, |d 1896-
245 14 |a The cat : |b duet for soprano and baritone / |c Virgil Thomson ; [words by Jack Larson].
260 __ |a
300 __ |a 1 score (11 p.) ; |c 31 cm.
500 __ |a For soprano, baritone, and piano.
650 _0 |a Vocal duets with piano.
600 10 |a Larson, Jack |x Musical settings.
700 1_ |a Larson, Jack.
Information extracted from ocm10025532 (LCCN 84758340) is added to LCNAF record n 84044261 to create a derived authority record. The extracted data are recorded in 9XX variable fields, with all text normalized (i.e., in lower case only), as follows: LCCN in 903; title in 910; publisher in 921; place of publication in 922; added personal entry extracted from the 700 1 field in 930; language in 940; broad subject area in 942; publication date by decade in 943; material type in 944; and information extracted from the 100 1 field of the mined bibliographic record in 950 1. The enhanced record for Larson, Jack given below incorporates the frequency count for each 9XX field, recorded in subfield ‘9’.
00824nz 2200301n 4500
001 oca01144962
005 19840809154202.7
008 840702n| acannaab | | n aaa | | |
010 $a n 84044261
040 $a DLC $c DLC $d DLC
100 1 $a Larson, Jack.
670 $a Thomson, V. The cat, c1982: $b t.p. (Jack Larson)
903 $a 84758340 $9 1
903 $a 93710923 $9 1
910 11 $a the cat $b duet for soprano and baritone $9 1
910 11 $a sun like $b on a poem by jack larson $9 1
921 11 $a g schirmer $9 1
921 11 $a belwin mills publ. corp $9 2
922 $a nyu $9 2
930 $a jack larson $9 1
940 $a eng $9 2
942 $a 234 $9 2
943 $a 198x $9 1
943 $a 197x $9 1
944 $a cm $9 2
950 11 $a thomson, virgil $d 1896 $9 1
950 11 $a samuel, gerhard $9 1
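The mining step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Project's code: the dictionary layout of a bibliographic record, the function names, and the exact normalization rule (lower-casing and replacing punctuation with spaces) are all hypothetical. Only the 9XX tag assignments and the idea of a frequency count in subfield ‘9’ come from the presentation.

```python
import re
from collections import Counter


def normalize(text):
    """Lower-case the text and replace punctuation with spaces (assumed rule)."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def decade(year):
    """Reduce a publication year to its decade, e.g. 1982 -> '198x'."""
    return f"{year // 10}x"


def derive_fields(bib):
    """Turn one bibliographic record (hypothetical dict layout) into 9XX pairs."""
    fields = [
        ("903", bib["lccn"]),
        ("910", normalize(bib["title"])),
        ("921", normalize(bib["publisher"])),
        ("922", bib["place_code"]),
        ("940", bib["language"]),
        ("943", decade(bib["year"])),
        ("950", normalize(bib["main_entry"])),
    ]
    for name in bib.get("added_entries", []):
        fields.append(("930", normalize(name)))
    return fields


def enhance(bib_records):
    """Aggregate derived fields over all records; the count plays the role of $9."""
    counts = Counter()
    for bib in bib_records:
        counts.update(derive_fields(bib))
    return [(tag, value, n) for (tag, value), n in sorted(counts.items())]


# Values taken from the bibliographic record for "The cat" shown earlier.
the_cat = {
    "lccn": "84758340",
    "title": "The cat : duet for soprano and baritone",
    "publisher": "G. Schirmer",
    "place_code": "nyu",
    "language": "eng",
    "year": 1982,
    "main_entry": "Thomson, Virgil, 1896-",
    "added_entries": ["Larson, Jack."],
}

for tag, value, count in enhance([the_cat]):
    print(tag, "$a", value, "$9", count)
```

When several bibliographic records share a value, such as the same publisher or language, the counter raises the corresponding frequency count instead of repeating the field.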
The details of usable authorities at the end of Phase 1 are given below:
                                                LC          DDB
Number of established names:                    3,834,162   2,498,071
Number of names used in bibliographic records
(enhanced authority records):                   2,159,315   2,255,187
Phase 2 of the project focused on matching the enhanced LCNAF and PND authorities using matching algorithms. To be considered for a match, two names must be consistent. For example, the names “Smith, J. William” and “Smith, John” are consistent, while “Smith, J. William” and “Smith, John Q.” are inconsistent. Records from both files with consistent names are then compared, and a numeric similarity measure is computed for each pair of records. The pair of records with the highest similarity is considered the best match. If the similarity is greater than the critical level, the pair of authority records is considered a match. As of March 29, 2005, the Project was focusing on the similarity measures. Dr. O’Neill shared the first VIAF record with the participants:
Rec stat: n    Entered: 20030225
Type: z        Upd status: a     Enc lvl: n      Source:
Roman:         Ref status: a     Mod rec:        Name use: a
Gov agn:       Auth status: a    Subj: a         Sub use: b
Series: n      Auth/ref: a       Geo subd: n     Ser use: b
Ser num: n     Name: a           Subdiv tp: n    Rules: a
010 1
040 VIAF $c VIAF
700 17 Valk, J. P. de $2 loc $0 n 82238624
700 17 Valk, Johannes P. de $d 1946- $2 pnd $0 122519973
In this record, subfields ‘2’ and ‘0’ in the 7XX fields are defined for the source and the control number of the authority file, respectively. Source codes ‘loc’ and ‘pnd’ represent LCNAF and PND respectively.
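Returning to the Phase 2 matching step, the two-stage logic (a name consistency filter, then a similarity score compared against a critical level) might be sketched as below. The consistency rule, the Jaccard similarity over mined attributes, and the critical level of 0.5 are all assumptions chosen so that the “Smith” examples behave as reported. The presentation did not specify the Project's actual algorithms, and this simple rule would not connect variant forms such as Morel, Pierre and Morellus, Petrus.

```python
import re


def parse_name(name):
    """Split 'Surname, Forenames' into (surname, forename tokens), lower-cased."""
    surname, _, rest = name.partition(",")
    tokens = re.findall(r"[^\W\d_]+", rest.lower())
    return surname.strip().lower(), tokens


def tokens_compatible(a, b):
    """Tokens agree if equal, or if one is an initial of the other."""
    if a == b:
        return True
    if len(a) == 1:
        return b.startswith(a)
    if len(b) == 1:
        return a.startswith(b)
    return False


def names_consistent(name_a, name_b):
    """Assumed rule: same surname, and forenames agree position by position."""
    sur_a, fore_a = parse_name(name_a)
    sur_b, fore_b = parse_name(name_b)
    if sur_a != sur_b:
        return False
    return all(tokens_compatible(x, y) for x, y in zip(fore_a, fore_b))


def similarity(attrs_a, attrs_b):
    """Jaccard overlap of mined attributes (an assumed similarity measure)."""
    a, b = set(attrs_a), set(attrs_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(record, candidates, critical_level=0.5):
    """Return the most similar consistent candidate above the critical level."""
    best, best_score = None, 0.0
    for cand in candidates:
        if not names_consistent(record["name"], cand["name"]):
            continue
        score = similarity(record["attrs"], cand["attrs"])
        if score > best_score:
            best, best_score = cand, score
    return best if best_score > critical_level else None


print(names_consistent("Smith, J. William", "Smith, John"))     # True
print(names_consistent("Smith, J. William", "Smith, John Q."))  # False
```

Filtering on name consistency first keeps the more expensive pairwise comparison limited to plausible candidates.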
The Project will go on to build an OAI server (Phase 3), maintain ongoing metadata harvesting using OAI protocols (Phase 4), and finally build an end user interface with Unicode display (Phase 5), building on the local system’s authority structure. If the proof of concept is successful, VIAF may be expanded to include other authority files for personal names and other types of authorities, such as corporate and geographic names.
At the end of the presentation, Dr. O’Neill invited the participants to visit the following site for further progress reports on the project:
http://www.oclc.org/research/projects/viaf
The second presentation, “Hong Kong Chinese Authority Name” by Ms. Maria Lai-che Lau (Chinese University of Hong Kong), Mr. Patrick Lo, and Mr. Owen M.L. Tam (both at Lingnan University), informed the participants of the latest developments in the HKCAN Project. Initiated in 1999 by six academic libraries, HKCAN became a cooperative project in 2001. Its aims are to build a Chinese name authority file with CJK scripts that meets the needs of the bilingual community, to improve and streamline authority control operations, and to participate in regional and global cooperative activities on authority work. HKCAN members are:
As of January 2005, HKCAN holds over 127,000 records, including over 51,000 records from the Library of Congress and over 76,000 records created originally by HKCAN members. The breakdown of the 127,000 HKCAN records follows:
personal names: 88,000 (69%)
corporate names: 15,000 (12%)
conference names: 1,100 (1%)
uniform titles: 23,000 (18%)
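The reported percentages can be checked against the rounded counts with a few lines of arithmetic. The category counts are those given above; nothing else is assumed.

```python
# Rounded record counts by heading type, as reported for January 2005.
counts = {
    "personal names": 88_000,
    "corporate names": 15_000,
    "conference names": 1_100,
    "uniform titles": 23_000,
}

total = sum(counts.values())  # 127100, consistent with "over 127,000"
shares = {kind: round(100 * n / total) for kind, n in counts.items()}

for kind, pct in shares.items():
    print(f"{kind}: {counts[kind]:,} ({pct}%)")
```

The rounded shares come out to 69%, 12%, 1%, and 18%, matching the figures in the breakdown.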
Between August 2004 and January 2005, HKCAN members contributed over 7,900
records.
A new XML version (2003) was developed to facilitate searching in simplified or traditional Chinese characters or in pinyin form. This version supports Unicode; offers export in text or MARC format; supports CJK index and phrase searching regardless of whether the input characters are simplified or traditional; supports the Z39.50 protocol; and enables interactive transfer to the INNOPAC system.
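One common way to make a search independent of the input script is to fold both the indexed headings and the query to a single search key. The sketch below illustrates that idea only; it is not a description of the HKCAN software. The five-character mapping table is a hand-made example (a real system needs a complete table), the function names are hypothetical, and pinyin searching is not covered.

```python
# Tiny hand-made table folding traditional forms to simplified forms.
# Example entries only; a production table would be far larger.
TRAD_TO_SIMP = str.maketrans({
    "劉": "刘",
    "張": "张",
    "陳": "陈",
    "華": "华",
    "國": "国",
})


def search_key(text):
    """Fold traditional characters to simplified and drop whitespace."""
    return "".join(text.translate(TRAD_TO_SIMP).split())


def build_index(headings):
    """Map each search key to the headings that produce it."""
    index = {}
    for heading in headings:
        index.setdefault(search_key(heading), []).append(heading)
    return index


def search(index, query):
    """Look up a query in either script; returns the stored headings."""
    return index.get(search_key(query), [])


index = build_index(["陳國華", "劉德華"])
print(search(index, "陈国华"))  # simplified query finds the traditional heading
print(search(index, "劉德華"))  # traditional query finds it as well
```

Folding toward simplified forms sidesteps the cases where one simplified character corresponds to several traditional ones, at the cost of occasionally retrieving headings that differ only in such characters.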
The next enhancement will add an option to export in XML format and OAI-PMH compliance. The presenters illustrated the search interface with examples of personal, corporate, and conference names and uniform titles, and introduced the system’s support for searching authors in Chinese characters, the workflow of HKCAN data processing, and special features of the HKCAN XML software.
The HKCAN XML software uses Unicode UTF-8 to store Chinese characters and EACC code to store MARC format records, following the MARC XML structure. Chinese characters are exported for subsequent storage in EACC or Unicode UTF-8. Records can be exported in MARC format, uploaded to INNOPAC individually or in large batches, and displayed in MARC or text format. User authentication can be controlled by user name and password or via IP address. Records can be updated under management mode before downloading. Statistical data collection is facilitated. Modeled after the concept of the Virtual Authority File, HKCAN offers one-stop searching in multiple authority files concurrently. In the future HKCAN plans to enhance the effectiveness of Chinese authority work among Chinese libraries
worldwide and to promote sharing of existing resources among
The presentation was concluded with the invitation to the project site:
http://hkcan.ln.edu.hk/
The third presentation dealt with cataloging questions received by the Committee. On behalf of the Committee, the Chair, Mr. Morimoto, presented detailed answers to the seven questions directed to the Committee by CEAL members.
The fourth presentation of the session, “LC Cataloging Update 2005” by Mr. Kio Kanda, covered the latest developments in CJK-related cataloging at the Library of Congress.
(1) The latest version is being tested for Unicode implementation at LC. LC’s new OPAC with JACKPHY support is expected within this year. The new OPAC offers JACKPHY searching and connection to authority files [Contact: Barbara Tillett (btil@loc.gov) and Ann Della Porta (adel@loc.gov)].
(2) Unicode in LC Classification Minaret is expected within a couple of months. The sacred book section of the BQ schedule may be the first to have original scripts. This may be accomplished using several web sites for sacred books such as:
www.cbeta.org (Chinese Buddhist Electronic Text Association)
www.sutra.re.kr (Tripitaka Koreana)
http://www.l.u-tokyo.ac.jp/~sat/ (machine-readable text database of the Taisho Tripitaka, the Taisho Shinshu Daizokyo)
[Contact: Kio Kanda (kkan@loc.gov)]
(3) Chinese Law Classification Project, Law Library of Congress. There are over 10,000 titles in 54,000 volumes with “050 LAW” that need to be reclassed to KNP, KNQ, and KNR. There are 2,000 titles with “050 LAW” in the RLIN database. Until the beginning of the
1990’s, there was no law classification schedule for
The project began in December 2004 and will continue in the Law Library until
completed. The next target languages are Arabic and Japanese. [Contact: Marie Whited, Cataloging Law Liaison, Law Library of Congress (mswhited@loc.gov)]
(4) Korean cataloging projects include the following:
- Revision of Korean language word division and Romanization guidelines (the first draft will be sent to CEAL later this spring);
- Cataloging of a collection of Korean gray literature about the democratic movements in South Korea in the late 1980s. The 234-piece collection is being described in 120 bibliographic records. Each bibliographic record will bear the name of the collection in a 710 field: Minjuhwa Undong Collection (Library of Congress);
- Korean/Chinese Team members Sook Hee Weidman and Sarah Byun have begun to work with Library staff and CEAL members to draft guidelines for the cataloging of Korean rare materials.
(5) Japanese mathematics books (Wasansho) listed in Shojo Honda’s bibliography have been cataloged. Pre-Meiji works (5,200 titles) have been cataloged, leaving 400 titles listed in Honda’s bibliography on Japanese literature, performing arts, and reference books. The search term ‘cw: JARB’ in RLIN will retrieve Japanese rare books. Descriptive cataloging guidelines for pre-Meiji Japanese books have been in a holding pattern, partly because LC’s Cataloging of rare books has been under revision.
[Contact: Isamu Tsuchitani (itsu@loc.gov)]
(6) Arrearage reduction of law materials in Japanese and Korean and retroactive classification of Japanese law serials are handled at the Serials Record Division, with Gary Bush as team leader.
(7) LCSH includes the following changes: from “Bonpo (Sect)” to “Bon (Tibetan religion)”, with the modifier “Bonpo” changed to “Bon”; from “Orientalists” to “Asianists” and “Middle East specialists”; from “Oriental languages”/“Oriental literature” to “Asian languages”/“Asian literature”. LC authority records were changed for individual art/religious objects: “Emaki” ‘painted scrolls’ in 130 fields. Data in bibliographic records have not been updated yet.
(8) There are 32.5 CJK catalogers at LC, distributed across four divisions: Regional and Cooperative Cataloging Division (RCCD), Serials Record Division (SRD), Geography and Map Division (G&M), and Special Materials Cataloging Division (SMCD):
RCCD SRD G&M SMCD
Chinese: 10 1 1.5
Japanese: 10 2 0.5
Korean: 5 1 0.5 1
The last segment of the session consisted of the Committee Report and General Remarks, presented to the membership by the Chair, Mr. Hideyuki Morimoto. The 2005 CTP annual session adjourned at 5:55 pm.
Respectfully submitted,
Hisami Konishi Springer
During the past year the committee members engaged in the following activities:
(1) Planning /preparation for committee session at 2005 annual meeting
(2) Considering relevance of (a) Committee workshop(s), and, if deemed
significant and feasible, planning/preparation for such workshop.
In 2005 the Committee planned and implemented
CEAL-Sponsored SCCTP Cataloging Workshops for Electronic Serials and
Integrating Resources at the
(3) Further work on AACR2 workbook for East Asian publications, 2nd ed. Shiok Lim, Hee-sook Shin, and Hisami Springer collaborated with Phillip Melzer as focal point, releasing Descriptive cataloging of East Asian material: chapters 1-2, 5-7; draft chapters 9, 23, 25-26; and Appendix C at URL: http://www.loc.gov/catdir/cpso/CJKIntro2.html
(4) Maintenance of the Committee web site (http://cealctp.lib.uci.edu/) with
managers Hee-sook Shin (contents) and
Abraham Yu (site).
(5) Collecting/organizing pinyin Romanization questions from CEAL members for securing answers from LC. On March 19, 2004, with Daphne Wang and Iping Wei serving as focal points, the Committee submitted to LC a summary of the collected reactions of CEAL members to the Feb. 23, 2004 documents.
(6) 053 addition in literary author name authority records, based on the lists previously compiled by the Committee, cycle 1999-2002. Focal points: Daphne Wang and Iping Wei [Chinese authors]; Hisami Springer [Japanese authors]; Hideyuki Morimoto, with help of Shiok Lim [Korean authors]. Approximately 1,250 Chinese literary author numbers were added to name authority records, while all files related to Japanese literary authors were lost through inundation damage.
(7) Participation in HKCAN use: a committee member has been consulting HKCAN since June 2004 for the creation and updating of name authority records.
Mr. Morimoto concluded his presentation with the following general remarks:
(1) Harvard-Yenching Library started contributing to CONSER, e.g., OCLC: 57425266.
(2) LC started entering 13-digit ISBNs, when available.
(3) Some CEAL members reviewed and commented on the draft of AACR3, Pt. 1 (description), Dec. 2004-Feb. 2005. The last update to AACR2 is expected in summer 2005. There will be an ALA program on June 26, 2005, 8:30 am-12:00 noon, in Chicago, entitled “AACR3: The Next Big Thing in Cataloging.” The publication target for AACR3 is 2007.
After the general remarks, the Chair invited questions from the audience. A question was raised on how to access the HKCAN database. Ms. Lau and Mr. Lo reiterated their project URL, which will be accessible worldwide within six months. In response to an inquiry about the release of the Unicode-based LC OPAC, Mr. Kanda replied that the timeline is within a year.