Thomas Baker
| Organization | Dublin Core Metadata Initiative |
|---|---|
| Organization type | Other |
| Country | United States of America |
This member participated in the following forums:
Forum: "Building the CIARD Framework for Data and Information Sharing", April 2011
Question 4: What actions should now be facilitated by the CIARD Task Forces?
Congratulations to FAO for the exciting news about AGROVOC, VocBench, and Agrotagger!
As defined in the "five-star" approach, the fourth star is about making your resources "citable" by identifying them with URIs, and the fifth star -- the summit of the Linked Data mountain -- is about "linking your data to other people's data to provide context".
As I see it, linking your data to others' data is about embedding your data into a rich web of cross-references -- pathways by which people can discover your data.
Some of those pathways may connect your resources with other resources -- "this research report is the basis for that article", or "this news item summarizes that conference paper". Other pathways connect people to resources -- "Hugo wrote this report" or "Sanjay recommends that blog". Others connect resources to "topics", as in "this research report is about maize (http://aims.fao.org/aos/agrovoc/c_12332)".
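To make these connections concrete, here is a minimal sketch in Python using the rdflib library. The report, article, and person URIs are hypothetical placeholders; only the AGROVOC URI for maize is the real one cited above.

```python
from rdflib import Graph, Namespace, URIRef

DCT = Namespace("http://purl.org/dc/terms/")
EX = Namespace("http://example.org/")  # hypothetical URI base, for illustration only

g = Graph()
g.bind("dct", DCT)

report = EX["report/42"]    # placeholder: a research report
article = EX["article/7"]   # placeholder: an article based on it
hugo = EX["person/hugo"]    # placeholder: a person
maize = URIRef("http://aims.fao.org/aos/agrovoc/c_12332")  # AGROVOC: maize

g.add((article, DCT.source, report))   # resource-to-resource link
g.add((report, DCT.creator, hugo))     # person-to-resource link
g.add((report, DCT.subject, maize))    # resource-to-topic link

print(g.serialize(format="turtle"))
```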
Focusing on simple connections suggests a way forward:
1) Ask: what Resources, People, and Topics are important enough to be linked to or cited? Then aim at providing guidance on how to give those things URLs.
2) Then ask: What are the most important ways to link those things? One could perhaps boil this down to a few types of statements such as those listed above. Then aim at providing guidance on publishing simple metadata to make those connections. The guidance would describe how to extract basic information from existing data.
3) Then ask: How can we pull these links together and make them searchable? Some of these goals are already implicit in the CIARD Pathways to Research Uptake (http://www.ciard.net/pathways), just with a tighter focus on harvesting and querying the linked data.
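As a sketch of point 3, assuming triples like the ones above have been harvested from several partners into one store, a SPARQL query can pull together everything said to be about maize. The data and URIs are the same placeholders as in the previous sketch.

```python
from rdflib import Graph

g = Graph()  # stands in for a store of harvested triples
g.parse(data="""
    @prefix dct: <http://purl.org/dc/terms/> .
    <http://example.org/report/42>
        dct:subject <http://aims.fao.org/aos/agrovoc/c_12332> .
""", format="turtle")

# Find every harvested resource about maize (AGROVOC c_12332).
results = g.query("""
    PREFIX dct: <http://purl.org/dc/terms/>
    SELECT ?resource WHERE {
        ?resource dct:subject <http://aims.fao.org/aos/agrovoc/c_12332> .
    }
""")
for row in results:
    print(row.resource)
```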
A colleague of mine experienced in "selling" linked data approaches to organizations tells me that the single most convincing demonstration of the utility of the new approach is when people see their own data linked and discoverable in a new context.
Question 2: What are the prospects for interoperability in the future?
I recognize, with Diane, that part of the problem has indeed been the use of technologies pushed by IT departments because those technologies lie within the departments' comfort zones -- which typically means XML and SQL. (It should, however, be added that not all data needs to be exposed as linked data, and that managing data in XML or SQL may in many circumstances be the most practical solution.)
That being the case, the question becomes: How can this or that SQL or XML database be tweaked to expose linked data -- perhaps only an extract of the full data, or perhaps on the fly? Data can be managed in XML or SQL and exposed as RDF. If a given XML or SQL database was originally designed with linked data in mind, or if it happens to map cleanly to linked-data structures, such transformations will be that much easier to implement.
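As a minimal illustration of such an export -- a sketch only, with an invented table and URI pattern -- one might walk a relational table and mint a URI per row:

```python
import sqlite3
from rdflib import Graph, Namespace, Literal

DCT = Namespace("http://purl.org/dc/terms/")
EX = Namespace("http://example.org/")  # hypothetical URI base

# A stand-in for an existing relational database of publications.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (id INTEGER, title TEXT)")
conn.execute("INSERT INTO reports VALUES (42, 'Maize yields in 2010')")

# Expose an extract of the table as RDF, on the fly.
g = Graph()
for row_id, title in conn.execute("SELECT id, title FROM reports"):
    subject = EX[f"report/{row_id}"]   # mint a URI per row
    g.add((subject, DCT.title, Literal(title)))

print(g.serialize(format="turtle"))
```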
The VIVO project has a lot to say about this, as much of their data is extracted and converted from the wide range of databases and formats used on their campuses. In today's world, the (growing) diversity of data formats is a given. It is precisely because the linked data approach does not require data to be managed in a particular format that it stands a chance of succeeding.
san_jay writes about the Interoperability Triangle:
> It is good to see that some of us are trying to bring the human factor in
> interoperability. ...
> But if I summarise everything from this thread, doesn't everything
> come down to people, processes and technology?
kbheenick writes:
> I feel that the concept of 'interoperability' needs to be considered,
> ranging all the way from people collaborating to systems collaborating,
> with concepts and information interoperability being somewhere in
> between. ...
> People successfully interoperating means that there has been...
> an agreed set of communication protocols...
I like Sanjay's notion of an Interoperability Triangle
of "People, Processes, and Technology", and I also like
Krishan's point that "processes" have to do with "concepts"
and "communication".
One might summarize this as a triangle of "People --
Communication -- Technology".
PEOPLE
I enthusiastically agree with the emerging emphasis in this
discussion on the "human factor" in interoperability. VIVO is
an excellent example, as the emphasis since its beginnings
some five years ago has been on "connecting people" and
"creating a community" [1].
COMMUNICATION
What makes Linked Data technology different from traditional
IT approaches is that it is analogous to the most familiar
of all communication technologies -- human language.
RDF is the grammar for a language of data. The words of
that language are URIs -- URIs for naming both the things
described and the concepts used to describe those things, from
verb-like "properties" to noun-like "classes" and "concepts".
The sentences of that language -- RDF triples -- mirror the
simple three-part structure of subject, predicate, and object
familiar from natural languages. It is a language designed
by humans for processing by machines.
The language of Linked Data does not itself solve the
difficulties of human communication any more than the
prevalence of English guarantees world understanding.
However, it does support communication across a similarly
broad spectrum.
When used with "core" vocabularies such as the fifteen-element
Dublin Core, the result may be a "pidgin" for the sort
of rudimentary but serviceable communication that occurs
between speakers of different languages. When used with
richer vocabularies, it supports the precision needed for
communication among specialists. And just as English provides
a basis for second-language communication among non-native
speakers, RDF provides a common second language into which
local data formats can be translated and exposed.
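As an illustration of that "pidgin" register -- a minimal
sketch, with a placeholder document URI -- a resource can be
described with nothing but the fifteen Dublin Core elements:

```python
from rdflib import Graph, Namespace, URIRef, Literal

# The fifteen-element Dublin Core vocabulary.
DC = Namespace("http://purl.org/dc/elements/1.1/")
doc = URIRef("http://example.org/report/42")  # placeholder URI

g = Graph()
g.bind("dc", DC)
# Each triple is a "sentence": subject (the report),
# predicate (a DC element), object (a value).
g.add((doc, DC.title, Literal("Maize yields in 2010")))
g.add((doc, DC.creator, Literal("Hugo")))
g.add((doc, DC.subject, Literal("maize")))

print(g.serialize(format="turtle"))
```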
TECHNOLOGY
Given the speed of technical change, it is inevitable that the
software applications and user interfaces we use today will
soon be superseded. The Linked Data approach acknowledges this
by addressing the problem on a level above specific formats and
software solutions, expressing data in a generic form designed
for ease of translation into different formats. It is an
approach designed to make data available for unanticipated
uses, both present and future.
[1] http://www.dlib.org/dlib/july07/devare/07devare.html
Question 1: What are we sharing and what needs to be shared?
Asad, are you saying that data should be validated in the sense of "schema validation" -- i.e., making sure the data conforms to a format and constraints understood by particular software applications?
Or do you mean "validation" to refer to an evaluation of the quality of information or to verification that the information comes from a reliable source (or even that it has been vetted by experts)?
Both senses of validation are significant but would require different approaches.
kbheenick writes:
> Does that mean that we need to look at our information with
> new 'lenses' and label it with appropriate keywords so they
> can be 'found'. Does it mean that we have to repackage our
> information into different modular formats such that they
> can fit into the larger information systems; or can the
> technology do all that for us?
I once worked at an economic research institute which
found that people in the region obtained jobs less by
reading classified ads or visiting employment offices than
through the advice of friends or relatives.
A few years later, a class of mine at the Asian Institute of
Technology in Bangkok found that members of the AIT faculty
each tended to identify with a specialized sub-field consisting
of some 100 colleagues spread over the globe. To remain
current, these faculty members relied less on generalized
literature searches than on recommendations and advice from
their international colleagues.
The general point is that as we design information systems
to serve different audiences, we should also consider that
people like to find things by asking other people or looking
to them for recommendations. Assembling information into
coherent packages for particular target audiences is not just
a question of formats but of enabling people to discover
information by following links from people they know or trust.
jimcory wrote:
> I know from working with CrisisCommons that there are
> structured tweets, email chains and skype chats that are
> important to capture for future reference. Forums are
> perhaps more formal ways of capturing discussions, but in
> some cases the immediacy of chat is necessary. Do we rely on
> the conversation participants to capture the info into more
> traditional forms (wikis, summary papers) or do we need to
> somehow tap into live discussions? What does this entail
> when older chats/emails may be archived?
RDF and OWL are great, but much of the utility of Linked
Data derives simply from its use of URIs as globally citable
identifiers for making cross-references between things.
W3C working groups provide a fine example of how URIs,
generated automatically and routinely by the software
environment in which their teleconferences are held, make it
easy to link from live discussions to other types of resources.
Consider, for example, a mailing-list posting of 16 February
[1], which refers to an ACTION recorded in the teleconference
minutes of 10 February [2] -- minutes which were, in turn,
generated automatically from the chat channel log [3].
To me, this is related to what makes a good Tweet -- being
able to: 1) provide a comment, 2) refer to a person (e.g.,
@jenit), 3) give the comment a subject (#tpac), and 4) link
to a document in a compact form that is easy to scan, as in:
@jenit Core vocabularies - FOAF, DC, SKOS etc - reduce
need for invention, provide focus for tools #tpac
http://bit.ly/c1mqxn
Note that this tweet is itself citable with a URI [4].
Tweets and triples use URIs to tie things together. The trick
is to make it easy for people to make these connections,
for example by making URI generation into something that just
happens in the underlying software -- and to make it easy for
people to leverage those URIs effectively when they search
for things.
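One might restate that tweet as triples. This is a sketch
only: the property choices (Dublin Core terms) and the person
and tag URIs are illustrative, not an established convention.

```python
from rdflib import Graph, Namespace, URIRef, Literal

DCT = Namespace("http://purl.org/dc/terms/")
tweet = URIRef("http://twitter.com/#!/tombaker/status/1270560629727232")
jenit = URIRef("http://example.org/person/jenit")  # hypothetical URI
tpac = URIRef("http://example.org/tag/tpac")       # hypothetical URI
page = URIRef("http://bit.ly/c1mqxn")

g = Graph()
g.add((tweet, DCT.description,
       Literal("Core vocabularies - FOAF, DC, SKOS etc - reduce "
               "need for invention, provide focus for tools")))
g.add((tweet, DCT.contributor, jenit))  # one reading of the @-mention
g.add((tweet, DCT.subject, tpac))       # the #-hashtag as a topic
g.add((tweet, DCT.references, page))    # the shortened link
```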
Tom
[1] http://lists.w3.org/Archives/Public/public-xg-lld/2011Feb/0034.html
[2] http://www.w3.org/2005/Incubator/lld/minutes/2011/02/10-lld-minutes.htm…
[3] http://www.w3.org/2011/02/10-lld-irc#T16-02-40
[4] http://twitter.com/#!/tombaker/status/1270560629727232
Scientists will be more motivated to share when the benefits of doing so can be demonstrated -- not just to themselves but to their employers or funders. Search engines that target Linked Data, perhaps for a specific domain such as "agricultural research", will be able to follow incoming links to a scientist's work in order to generate statistics and analytics, as Twitter engines do for "trending topics".
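A sketch of the kind of analytics such an engine might run, assuming triples crawled from many sites into one store; the dct:references property and all URIs here are illustrative.

```python
from rdflib import Graph

g = Graph()  # stands in for triples crawled from many sites
g.parse(data="""
    @prefix dct: <http://purl.org/dc/terms/> .
    <http://example.org/article/7> dct:references <http://example.org/report/42> .
    <http://example.org/article/9> dct:references <http://example.org/report/42> .
""", format="turtle")

# Count incoming links per cited work -- a crude citation metric.
results = g.query("""
    PREFIX dct: <http://purl.org/dc/terms/>
    SELECT ?work (COUNT(?citer) AS ?links) WHERE {
        ?citer dct:references ?work .
    } GROUP BY ?work ORDER BY DESC(?links)
""")
for work, links in results:
    print(work, links)
```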