2010 Annual Meeting Program Minutes
(Philadelphia Marriott Downtown, Grand Ballroom Salon A/B)
March 24, 2010, 8:30-9:30 am CTP Program
March 24, 2010, 11 am-12:30 pm CPS/CTP Joint Program
A. Cooperative Identities Hub (Karen Smith-Yoshimura, OCLC Research)
Summary: Names touch everything, but they can be very ambiguous. People change their names depending on context (period, country and language), and the qualifiers in authority files are not sufficient to tell them apart. WorldCat Identities Hub creates a framework to concatenate and merge authoritative information, builds a gateway to all forms of a name without preferring one form over another, uses a social networking model, provides a switch to extract relevant information for re-use in one’s own context, and produces a federated TRUST environment to authenticate and authorize contributors. The Hub’s objectives are to increase the efficiency of metadata creation, to establish identity regardless of language or discipline, to determine the preferred form of a name within its own context, to enable contributing agencies to augment their own data resources, and to expose information about persons and corporate bodies beyond their original contexts. The Hub is designed to be searched by both people and software applications; users can add, merge, split and flag information, create new entities and perform batch updates, and discussion is encouraged. WorldCat Identities Hub creates a summary page for every name found in WorldCat. In addition, a publication timeline shows an author’s publication history: works about the author, works by the author, and even works by the author published posthumously. The Virtual International Authority File (VIAF) attempts to match names across 20 authority files (13 million name records and 10 million personas). VIAF is all about creating links between existing files of names; the result is a VIAF record that shows information derived from the cluster of linked name records and their associated bibliographic records. At the end of the presentation, OCLC’s participation in other name identifier efforts was mentioned: ISNI (International Standard Name Identifier) and ORCID (Open Researcher and Contributor ID).
Conclusion: give everything you want to talk about a URI, provide useful information at that URI, use structured data such as metadata, and link it to other resources. The Hub tries to bridge the gap between our technologies and the rest of the world.
B. Processing Update from the Library of Congress (Philip Melzer, LC)
Philip reported on the following LC activities:
- Technical services statistics of the LC Asian and Middle East Division.
- Use of bibliographic data from vendors.
- Transliteration tools for Chinese and Korean for use with Voyager. (LC is seeking a collaborator for a Japanese tool.)
- Use of KOMARC records.
- Authority non-Latin reference pre-population project.
- PCC non-Latin guidelines.
- LCRI 25.3A for named individual works of art: using the English uniform title if available.
- Bibliographic File Maintenance for Korean records: replacing ayn and alif with an apostrophe.
- Korean authority records: should 4xx references be retained? (Feedback is requested.)
- Korean Romanization and word division guidelines.
C. Emerging standards and guidelines related to cataloging (Charlene Chou, Columbia University)
In her presentation, Charlene Chou gave a briefing on emerging standards and guidelines concerning cataloging practice in general, as well as their potential impact on CJK cataloging practice. Her presentation had two parts:
- Emerging standards and guidelines at the international level from IFLA (International Federation of Library Associations and Institutions)
- RDA testing at the national level in the US
As the liaison from the Standards Committee of the Continuing Resources Section, Association for Library Collections & Technical Services (ALCTS), American Library Association (ALA), to the IFLA Cataloging Section, Charlene has had opportunities to follow, and take an active part in, several IFLA sections and task forces that are developing new standards and guidelines. She shared these with us and encouraged the CEAL community to provide input or to work on CJK languages specifically: for example, by providing input on National Bibliographies in the Digital Age: Guidance and New Directions, or by following the Guidelines for Multilingual Thesauri as a research model for CJK languages.
As an active participant in RDA testing at Columbia University, Charlene shared detailed plans for the RDA test and described her efforts to advocate for including CJK materials in the testing.
CPS/CTP Joint Program
A. Next generation catalogs: North American libraries experiences
- Discovering East Asian Resources through Next Generation Catalogs (NGC): Enhancements and Issues (Xiuying Zou, University of Pittsburgh)
- Next Generation OPAC – VuFind at the University of Michigan (Mari Suzuki, University of Michigan)
- Yufind & OPAC requirements for CJK plus (Keiko Suzuki & Tang Li, Yale University)
Xiuying covered the landscape of the next generation catalog, why we need the next generation catalog, its enhanced features, issues with East Asian resources, and suggestions for the East Asian library community. She mentioned the following applications: Encore, Primo, AquaBrowser, VuFind, LibraryFind, WorldCat Local, WebVoyage, and SearchWorks.
Mari Suzuki shared background information on the selection of VuFind as the University of Michigan Library’s next generation OPAC. She discussed features of VuFind and how it differs from the previous Aleph OPAC. She also examined issues related to CJK display, which were viewed as a problem inherent in a Western-language-centered environment. User experiences with Michigan’s VuFind OPAC were also discussed. One of the search problems mentioned in the presentation (title phrase search) was solved by the Library Systems Office as of April 2010. For details, see her full presentation slides.
Tang Li and Keiko Suzuki gave a presentation on Yale’s implementation of VuFind, called Yufind:
- VuFind was developed by Villanova University; it is open source and compatible with large systems
- Other universities, such as Michigan, were already using the system before Yale decided to implement it
- Implementation of the system started in April 2008 at the main library; the pilot did not include non-Western languages
- The interface defaults to simple search, with an advanced search option
- Features of the next generation catalog include faceted navigation, relevance sorting, item availability, and RSS feeds
- A project initiated in late November 2009 set out to identify issues related to multi-script functionality in MARC-based discovery applications, user requirements and expectations, etc.
- CJK-related issues they identified include:
  - Word segmentation and spaces (e.g., treatment of Japanese kana)
  - Character variants (traditional vs. simplified)
  - Display of CJK characters
  - CJK headings present a display problem because they are not authority-controlled; authority control could be used to integrate records lacking original script
- At the end, they requested feedback and comments on the “desired requirements for CJK”
B. Google: Past, Present and Future (Jon Orwant, Google Representative)
- Over 12 million books scanned
- About 4 billion pages
- Over 2 trillion words
- 40+ libraries
- 400+ languages
  - 60 books in Hmong
  - 233 in Tai
  - 5 in Sichuan Yi
  - 1583 in Mongolian
  - 447 in Khmer
  - 2820 in Nepali
  - 573 in Nepal Bhasa (3 in classical Nepal Bhasa)
- Most of our corpus is non-English
[Presentation file cannot be released]
Google’s mission is to organize the world’s information and make it universally accessible and useful. Google Books will provide limited preview from publishers and authors for each book.
Vital statistics of Google Books project:
- out of 120 million works (174 million manifestations)
- Collect metadata from 100+ sources (libraries, commercial aggregators, union catalogs, publishers, retailers)
- Parse the records into our internal format
- MARC, ONIX, others...
- "UVA stores item data and call numbers in 955$a..."
Copyright continues to be an important issue.
Google Book Settlement: if approved, it would resolve the lawsuit brought against Google by the AAP (Association of American Publishers) and the AG (Authors Guild).
- Rightsholder control
- Previews expanded from snippets to up to 20% of a book
- Library subscriptions
- Research corpus
- Free terminals in every US public library building
- Downloadable books for purchase
- Access for the print-disabled
- Book Rights Registry: a non-profit organization to find and pay rights holders
Three Stages of Google Books:
- "Copyright lasts way too long to strike the balance between benefits to the author and the public."
- "The entire raison d'être of copyright is to strike a balance between benefits to the author and the public."
- "Thus the optimal copyright term is c(x) = 14 (n + 1)."
Read anywhere: Currently our main focus is books. Just as web search indexes the entire web, we would love to index all books. However, in both image processing and document understanding, no algorithm will work on every volume; one example is annotations in the margins of Chinese books.
Books as a corpus of human knowledge:
- Understand one book
- Understand all books
- Understand relations between books
Linguistic analysis: "Research that performs linguistic analysis over the Research Corpus to understand language, linguistic use, semantics and syntax as they evolve over time and across different genres or other classifications of Books."
To gain insights into human progress.
The “Great Men” theory.
To create curated online bookshelves.