(on using META-SHARE records in CLARIN)

(jpiitula 2012-06-05 in Oslo)


1 World Situation: CLARIN and META-SHARE

Two large language resource metadata systems-to-be in Europe:

http://www.clarin.eu/external/ (go to VLO)

http://www.meta-net.eu/ http://www.meta-share.org/ http://metashare.csc.fi/

2 Our Situation: in Both

3 CLARIN Metadata Architecture

CLARIN has the component metadata model.

http://www.clarin.eu/cmdi http://catalog.clarin.eu/ds/ComponentRegistry/# http://www.isocat.org/

4 META-SHARE Metadata Architecture

META-SHARE has (will have) a defined content model for metadata

META-SHARE nodes provide an editor for resourceInfo records.

Records can be published directly from the editor.

UHEL node: http://metashare.csc.fi

5 CLARIN Harvests, META-SHARE Self-Synchs

CLARIN uses OAI-PMH to provide records for harvesting

META-SHARE nodes will synchronize with each other automatically

(Figure: architecture diagram; architecture.svg, architecture.png)

http://www.openarchives.org/OAI/openarchivesprotocol.html
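For orientation, harvesting over OAI-PMH comes down to plain HTTP requests with a `verb` parameter. The following is a minimal sketch of how a harvester might build a `ListRecords` request. The base URL is a hypothetical placeholder, not a real CLARIN or META-SHARE endpoint, and `oai_dc` is used as the metadata prefix only because every OAI-PMH provider must support it; the prefix under which a provider offers CMDI is up to that provider.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: only the verb goes with it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            # Selective harvesting: only records changed since this date.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical provider address, for illustration only.
BASE_URL = "http://provider.example.org/oai"
print(list_records_url(BASE_URL, from_date="2012-06-01"))
```

The `from` argument is what makes accurate datestamps on the provider side matter: a harvester that asks only for recent changes will miss any edit whose datestamp was not updated.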

6 The Issues are Three

I see three issues in sending META-SHARE records to CLARIN:

  1. Format shift from resourceInfo to CMDI/ISOCat and OAI-PMH
    • should be a natural match (except mandatory DC)
    • may still need much work
  2. Tracking the identity of the reused records
    • important
    • not trivial
  3. Authorization or identification of reusable records
    • whose records are they?
    • is this an issue?

7 First, Format Shifting Issue

Shifting from resourceInfo to CMDI/ISOCat should be straightforward:

Most likely implemented as XSL Transforms (XSLT). Identify the META-SHARE records as the original.

http://metashare.ilsp.gr/portal/knowledgebase/resourceInfo

http://catalog.clarin.eu/ds/ComponentRegistry/# http://www.isocat.org/
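The mandatory Dublin Core part can be sketched as follows. This is an illustration in Python rather than the XSLT suggested above, and it shows only the target shape: the input dictionary is a hypothetical stand-in for values already extracted from a resourceInfo record, and no actual resourceInfo element names are assumed. The two namespace URIs are the standard ones for `oai_dc` and the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Dublin Core elements this sketch emits; the full set has fifteen.
DC_FIELDS = ("title", "description", "identifier", "language")


def to_oai_dc(record):
    """Serialize a dict of value lists as an oai_dc XML fragment."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("{%s}dc" % OAI_DC_NS)
    for field in DC_FIELDS:
        # Every Dublin Core element is optional and repeatable.
        for value in record.get(field, []):
            child = ET.SubElement(root, "{%s}%s" % (DC_NS, field))
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, standing in for an extracted META-SHARE record.
sample = {
    "title": ["Example Corpus"],
    "identifier": ["hypothetical-id-1"],
    "language": ["fi"],
}
print(to_oai_dc(sample))
```

A real bridge would also need the richer CMDI rendering of the same record; the Dublin Core version is only the minimum that OAI-PMH itself requires.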

8 Second, Identity Tracking Issue

Records in META-SHARE get edited, sometimes removed.

CLARIN OAI-PMH providers must track the changes to an existing record accurately.

(Different records on the same resource are not a major issue. Also, they may be detected by persistent identifiers, but hardly resolved automatically.)

9 Possible Record Identity in META-SHARE?

META-SHARE plans to have an identifier for each record.

It may not be the right kind, though.

So there may be a problem. Not sure. Wait for v3, this summer.

10 Identity Tracking in OAI-PMH provider

When a META-SHARE record is considered by an OAI-PMH provider for CLARIN:

If already there:

If not there, add it as a new item with a new OAI-PMH identifier

When a META-SHARE record is already in an OAI-PMH provider:

The provider might notice the absence of the old record and the presence of a new one, but we don't really want a trail of deletions when there have really been only changes.
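One possible shape for the bookkeeping is sketched below. It rests on a loud assumption: that each META-SHARE record carries an identifier that stays the same across edits, which is exactly what is still uncertain. The class, its method names, and the `oai:provider.example.org:` prefix are all hypothetical. Under that assumption an edit keeps the OAI-PMH identifier and only moves the datestamp, and a record is flagged as deleted only when it has really disappeared.

```python
from dataclasses import dataclass


@dataclass
class Item:
    oai_id: str
    content: str
    datestamp: str
    deleted: bool = False


class IdentityTracker:
    """Map META-SHARE record ids to stable OAI-PMH items (a sketch)."""

    def __init__(self, prefix="oai:provider.example.org:"):
        self.prefix = prefix  # hypothetical identifier prefix
        self.items = {}       # META-SHARE record id -> Item
        self.counter = 0

    def consider(self, ms_id, content, today):
        """Add a new record, or update an existing one in place."""
        item = self.items.get(ms_id)
        if item is None:
            # Not there: new item with a new OAI-PMH identifier.
            self.counter += 1
            item = Item(f"{self.prefix}{self.counter}", content, today)
            self.items[ms_id] = item
        elif item.deleted or item.content != content:
            # Already there but changed: same identifier, new datestamp.
            item.content = content
            item.datestamp = today
            item.deleted = False
        return item.oai_id

    def sweep(self, seen_ids, today):
        """Flag as deleted every item whose source record is gone."""
        for ms_id, item in self.items.items():
            if ms_id not in seen_ids and not item.deleted:
                item.deleted = True
                item.datestamp = today


tracker = IdentityTracker()
first = tracker.consider("ms-1", "version 1", "2012-06-01")
second = tracker.consider("ms-1", "version 2", "2012-06-02")
print(first == second)
```

If the META-SHARE identifier turns out to change when a record is edited, this scheme degrades into the trail of deletions and additions that the slide warns about, and some other matching key would be needed.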

11 Third, Authorization Issue

How do we notice when a new META-SHARE record is available for CLARIN? Scenarios:

  1. Someone personally makes their own records available.
  2. CLARIN harvests META-SHARE directly somehow.
    • is there a mechanism for such access?
    • a node admin could leak the contents
    • not sure of legal issues at all
  3. Other.

At least some of the records there are ours.

12 Summary

META-SHARE has a defined content model with a supporting editor for describing language resources. META-SHARE synchronizes itself.

CLARIN has a compatible but flexible format and semantics. CLARIN harvests its own providers.

We are in both. We desire a bridge.

  1. Authorize inclusion of (some) META-SHARE records in CLARIN somehow.
  2. Track the identity of an included record through changes and deletions somehow.
  3. Need CMDI profiles, OAI-PMH envelopes, mandatory DC, transforms (likely XSL). Adding to ISOCat might be useful for CLARIN. (Profile can be loose if the bridge is one-way but should be precise if two-way.)

Author: Jussi Piitulainen <jpiitula@hippu4.csc.fi>

Date: 2012-06-04 21:34:21 EEST
