Association for Asian Studies
Council on East Asian Libraries
Committee on Technical Processing
Program Minutes
(Philadelphia Marriott Downtown, Grand Ballroom Salon A/B)
March 24, 2010, 8:30-9:30 am
CTP Program
A. Cooperative Identities Hub (Karen Smith-Yoshimura, OCLC Research)
Summary:
Names touch everything, but they can be very ambiguous. People change their names depending on context (period, country and language). The qualifiers in the authority files are not sufficient. The WorldCat Identities Hub creates a framework to concatenate and merge authoritative information, builds a gateway to all forms of a name without preferring one form over another, uses a social networking model, provides a switch to extract relevant information for re-use in local contexts, and produces a federated TRUST environment to authenticate and authorize contributors.
The Hub's objectives are to increase the efficiency of metadata creation, to establish identity regardless of language or discipline, to let each context determine its own preferred form, to enable contributing agencies to augment their own data resources, and to expose information about persons and corporate bodies beyond their original contexts. Functionally, the Hub can be searched by both people and software applications; information can be added, merged, split and flagged; new entities can be created; batch updates are supported; and discussion is encouraged.
The WorldCat Identities Hub creates a summary page for every name found in WorldCat. It also creates a publication timeline showing an author's publication history: works about the author, works by the author, and even works by the author published posthumously. The Virtual International Authority File (VIAF) tries to match names across 20 authority files (13 million name records and 10 million personas). VIAF is all about creating links between existing files of names. The result is a VIAF record that shows information derived from the cluster of linked name records and their associated bibliographic records.
At the end of the presentation, OCLC's participation in other name efforts was mentioned: ISNI (International Standard Name Identifier) and ORCID (Open Researcher and Contributor ID). Conclusion: for everything you want to talk about, give it a URI, provide useful information at that URI, use structured data such as metadata, and link it to other resources. The Hub tries to bridge the gap between our technologies and the rest of the world.
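The idea of linking name records from several authority files into one cluster can be pictured with a small sketch. The Python below is only an illustration under stated assumptions: the record layout, the source labels, the identifiers, and the matching rule (same name tokens after stripping diacritics, plus the same birth year) are all hypothetical and are not taken from the presentation or from VIAF itself.

```python
import unicodedata
from collections import defaultdict


def match_key(name, birth_year):
    """Build a crude matching key (hypothetical rule, not VIAF's)."""
    # Decompose accented letters so the combining marks can be dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    # Keep letters, digits and spaces; this drops commas and combining marks.
    kept = "".join(ch for ch in decomposed if ch.isalnum() or ch.isspace())
    # Sort the tokens so "Surname, Given" and "Given Surname" agree.
    tokens = tuple(sorted(kept.lower().split()))
    return tokens, birth_year


def cluster_name_records(records):
    """Group records from different files that share a matching key."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[match_key(rec["name"], rec["birth"])].append(rec)
    return dict(clusters)


# Hypothetical sample records; sources and ids are placeholders.
sample = [
    {"source": "file-A", "id": "a1", "name": "Natsume, Sōseki", "birth": 1867},
    {"source": "file-B", "id": "b7", "name": "Natsume Soseki", "birth": 1867},
    {"source": "file-A", "id": "a2", "name": "Mori, Ōgai", "birth": 1862},
]
clusters = cluster_name_records(sample)
```

A rule this simple only links romanized forms that differ in punctuation, word order or diacritics. Linking original-script forms to romanized ones would need further evidence, such as the associated bibliographic records mentioned in the summary.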
B. Processing Update from the Library of Congress (Philip Melzer, LC)
Philip reported on the following LC activities:
1. Technical services statistics of the LC Asian and Middle East Division.
2. Use of bibliographic data from vendors.
3. Transliteration tools for Chinese and Korean languages with Voyager. (LC is seeking a collaborator for a Japanese tool.)
4. Use of KOMARC records.
5. Authority non-Latin reference pre-population project.
6. PCC non-Latin guidelines.
7. LCRI 25.3A for named individual works of art: use the English uniform title if available.
8. Bibliographic File Maintenance for Korean records: replacing ayn and alif with apostrophe.
9. Korean authority records: should we retain 4xx references? (Feedback is needed.)
10. Korean Romanization and word division guidelines.
C. Emerging standards and guidelines related to cataloging (Charlene Chou)
In her presentation, Charlene Chou gave a briefing on emerging standards and guidelines that concern cataloging practice in general, and on their potential impact on CJK cataloging practice. Her presentation was in two parts:
1. Emerging standards and guidelines at the international level from IFLA (International Federation of Library Associations and Institutions)
Charlene currently serves as liaison from the Standard Committee, Continuing Resources Section, Association for Library Collections and Technical Services (ALCTS), American Library Association (ALA), to the IFLA Cataloguing Section. In that role she is exposed to, and in some cases actively involved in, several IFLA sections and task forces that are developing new standards and guidelines. She shared these with the group and encouraged the CEAL community to provide input or to work on CJK languages specifically. For example, members could provide input on the National Bibliographies in the Digital Age: Guidance and New Directions, or follow the Guidelines for Multilingual Thesauri as a research model for CJK languages.
2. RDA testing at the national level in the US
Charlene is currently an active RDA testing participant at Columbia University. She shared detailed RDA test planning and described her efforts to advocate for including CJK materials in the testing.
March 24, 2010, 11 am-12:30 pm
CPS/CTP Joint Program
1. Discovering East Asian Resources through Next Generation Catalogs (NGC): Enhancements and Issues (Xiuying Zou, University of Pittsburgh)
Xiuying covered the landscape of the next generation catalog, why it is needed, its enhanced features, issues with East Asian resources, and suggestions for the East Asian library community. She mentioned the following applications: Encore, Primo, AquaBrowser, VuFind, LibraryFind, WorldCat Local, WebVoyage, and SearchWorks.
2. Next Generation OPAC – VuFind at the University of Michigan (Mari Suzuki, University of Michigan)
Mari Suzuki shared the background to the selection of VuFind as the University of Michigan Library's next generation OPAC. She discussed the features of VuFind and how it differs from the previous Aleph OPAC. She also examined issues related to CJK display, which were viewed as an inherent problem in a Western-language-centered environment. User experiences with Michigan's VuFind OPAC were also discussed. One of the search problems mentioned in the presentation (title phrase search) was solved by the Library Systems Office as of April 2010. For details, see her full presentation slides.
3. Yufind & OPAC requirements for CJK plus (Keiko Suzuki & Tang Li, Yale University)
• Tang Li and Keiko Suzuki gave a presentation on Yale's implementation of VuFind: Yufind.
• VuFind was developed by Villanova University; it is open source and compatible with large systems.
• Other universities, such as Michigan, were already using the system before Yale decided to implement it.
• Implementation started in April 2008 at the main library; the pilot did not include non-Western languages.
• The interface defaults to simple search, with an advanced search option.
• Features of the new generation catalogue include faceted navigation, relevance sorting, item availability and an RSS feed option.
• A project initiated in late November 2009 set out to identify issues related to multi-script functionality in MARC-based discovery applications, user requirements and expectations, etc.
• Among the issues related to CJK are:
  o Word segmentation and spaces (e.g., Japanese kana treatment)
  o Problems with character variation (traditional vs. simplified)
  o Interface
  o Display of CJK characters
• Display of CJK headings presents a problem, as they are not authority-controlled. Authority control may be used as a way to integrate records lacking original script.
• At the end, they requested feedback and comments on the “desired requirements for CJK”.
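Two of the CJK issues listed above, word segmentation and character variation, can be made concrete with a short sketch. The Python below shows overlapping character bigrams, one widely used way to index text that has no spaces between words, together with a toy table that folds traditional characters to simplified ones. It is an illustration only: the function names and the three-character mapping are hypothetical, and nothing here is claimed to be how Yufind or VuFind handles CJK.

```python
# Toy variant table (hypothetical and far from complete).
TRADITIONAL_TO_SIMPLIFIED = {"圖": "图", "書": "书", "館": "馆"}


def fold_variants(text):
    """Map each traditional character to its simplified form when known."""
    return "".join(TRADITIONAL_TO_SIMPLIFIED.get(ch, ch) for ch in text)


def cjk_bigrams(text):
    """Split on whitespace, then emit overlapping two-character tokens."""
    tokens = []
    for run in text.split():
        if len(run) < 2:
            tokens.append(run)  # a single character stays as it is
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


indexed = cjk_bigrams("東京大学")   # title entered without a space
query = cjk_bigrams("東京 大学")    # user query typed with a space
query_matches = set(query) <= set(indexed)
```

With bigrams, a query typed with a space still matches a title recorded without one, because every query token also occurs among the indexed tokens. Folding both the index and the query through the same variant table lets a simplified query find a traditional heading. The cost is a larger index and some false matches across word boundaries.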
[Presentation file cannot be released]
Past:
Google's mission is to organize the world's information and make it universally accessible and useful. Google Books will provide a limited preview from publishers and authors for each book.
Vital statistics of Google Books project:
• Over 12 million books scanned
  o out of 120 million works (174 million manifestations)
• About 4 billion pages
• Over 2 trillion words
• 40+ libraries
• 400+ languages
  o 60 books in Hmong
  o 233 in Tai
  o 5 in Sichuan Yi
  o 1583 in Mongolian
  o 447 in Khmer
  o 2820 in Nepali
  o 573 in Nepal Bhasa (3 in classical Nepal Bhasa)
• Most of our corpus is non-English
Metadata creation:
1. Collect metadata from 100+ sources (libraries, commercial aggregators, union catalogs, publishers, retailers)
2. Parse the records into our internal format
  o MARC, ONIX, others...
  o "UVA stores item data and call numbers in 955$a..."
3. Cluster the records into expressions and manifestations
4. Create a "best of" record for each cluster
5. Index and display elements of that record on books.google.com
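Steps 3 and 4 above can be sketched in a few lines. The minutes do not say how Google clusters records or how the "best of" record is chosen, so everything in the Python below is an assumption made for illustration: records are plain dictionaries, clustering uses a single shared key (an ISBN-like field), and the merged record takes the most frequent non-empty value for each field.

```python
from collections import Counter, defaultdict


def cluster_by_key(records, key="isbn"):
    """Group parsed records that share the same value for `key`."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec[key]].append(rec)
    return dict(clusters)


def best_of(cluster):
    """Merge one cluster by majority vote on each field (hypothetical rule)."""
    fields = {name for rec in cluster for name in rec}
    merged = {}
    for name in sorted(fields):
        # Ignore missing or empty values so they cannot win the vote.
        values = [rec[name] for rec in cluster if rec.get(name)]
        if values:
            merged[name] = Counter(values).most_common(1)[0][0]
    return merged


# Hypothetical records from three sources describing the same book.
sample = [
    {"isbn": "0000000001", "title": "Kokoro", "publisher": ""},
    {"isbn": "0000000001", "title": "Kokoro", "publisher": "Iwanami"},
    {"isbn": "0000000001", "title": "Kokoro /", "publisher": "Iwanami"},
]
merged = {k: best_of(v) for k, v in cluster_by_key(sample).items()}
```

Majority voting is only one possible policy. A production system would more likely weight sources by reliability, and clustering into expressions and manifestations would need more evidence than a single identifier.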
Following the 880 linkages to extract bibliographic data in other scripts (and better handling of invalid UTF-8 conversions)
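In MARC 21, an original-script field is carried in an 880 field and tied to its romanized counterpart through subfield $6: the regular field holds a value such as 880-01, and the matching 880 field holds a value such as 245-01/$1. The sketch below follows those linkages. It is an illustration only: the record representation (a list of tag and subfield-dictionary pairs) and the function name are hypothetical, only subfield $a is read, and this is not presented as Google's parser.

```python
def pair_linked_fields(fields):
    """Pair each 880 field with the regular field it links to via $6."""
    # Index regular fields by (tag, occurrence number) from "$6 880-NN".
    regular = {}
    for tag, subs in fields:
        link = subs.get("6", "")
        if tag != "880" and link.startswith("880-"):
            regular[(tag, link[4:6])] = subs

    pairs = []
    for tag, subs in fields:
        if tag != "880" or "6" not in subs:
            continue
        # An 880 $6 looks like "245-01/$1": linked tag, occurrence, script.
        linked_tag, _, rest = subs["6"].partition("-")
        occurrence = rest[:2]
        if occurrence == "00":
            continue  # occurrence 00 means the 880 has no linked field
        match = regular.get((linked_tag, occurrence))
        if match is not None:
            pairs.append((linked_tag, match.get("a", ""), subs.get("a", "")))
    return pairs


# Hypothetical record with one romanized title and its original-script form.
sample = [
    ("245", {"6": "880-01", "a": "Kokoro"}),
    ("880", {"6": "245-01/$1", "a": "こころ"}),
    ("880", {"6": "500-00/$1", "a": "注記"}),
]
linked = pair_linked_fields(sample)
```

Each result is a tuple of the linked tag, the romanized value and the original-script value, which is the kind of pairing needed before non-Latin titles and names can be indexed or displayed alongside their romanized forms.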
Present:
Copyright continues to be an important issue.
Google Book Settlement: If approved, it resolves the lawsuit brought against Google by the Association of American Publishers (AAP) and the Authors Guild (AG).
Benefits:
Future:
Three Stages of Google Books:
• Scanning
• Scaling
• Structure
Structure:
• "Copyright lasts way too long to strike the balance between benefits to the author and the public."
• "The entire raison d'être of copyright is to strike a balance between benefits to the author and the public."
• "Thus the optimal copyright term is c(x) = 14 (n + 1)."
Read anywhere:
Currently our main focus is books. Just as web search indexes the entire web, we'd love to index all books. However, for image processing as well as document understanding, no algorithm will work on every volume. One example is annotations in the margins of Chinese books.
Books as a corpus of human knowledge:
• Understand one book
• Understand all books
• Understand relations between books
Linguistic analysis: "Research that performs linguistic analysis over the Research Corpus to understand language, linguistic use, semantics and syntax as they evolve over time and across different genres or other classifications of Books."
To gain insights into human progress.
The “Great Men” theory.
To create curated online bookshelves.