Forum: "Building the CIARD Framework for Data and Information Sharing" April, 2011
Question 2: What are the prospects for interoperability in the future?
29/03/2011
"Interoperabilty"1 is a feature both of data sets and of information services that gives access to data sets. When a data set or a service is interoperable it means that data coming from it can be easily "operated" also by other systems. The easier it is for other systems to retrieve, process, re-use and re-package data from a source, and the less coordination and tweaking of tools is required to achieve this, the more interoperable that source is.
Interoperability ensures that distributed data can be exchanged and re-used by and between partners without the need to centralize data or standardise software.
Some examples of scenarios where data sets need to be interoperable:
transferring data from one repository to another;
harmonizing different data and metadata sets;
aggregating different data and metadata sets;
virtual research environments;
creating documents from distributed data sets;
reasoning on distributed data sets;
creating new information services using distributed data sets.
There are current examples of how an interesting degree of internal interoperability is achieved through centralized systems. Facebook and Google are the largest examples of centralized systems that allow easy sharing of data and a very good level of inter-operation within their own services. This is due to the use of uniform environments (software and database schemas) that can easily make physically distributed information repositories interoperable, but only within the limits of that environment. What is interesting is that centralized services like Google, Facebook and all social networks are adopting interoperable technologies in order to expose part of their data to other applications, because the huge range of social platforms is distributed and has to meet the needs of users in terms of easier access to information across different platforms.
Since there are social, political and practical reasons why centralization of repositories or homogenization of software and working tools will not happen, a higher degree of standardization and generalization ("abstraction") is needed to make data sets interoperable across systems.
The alternative to centralizing data or homogenizing working environments is the development of a set of standards, protocols and tools that make distributed data sets interoperable and make sharing possible among heterogeneous and uncoordinated systems ("loose coupling").
This has been addressed by the W3C with the concept of the "semantic web". The semantic web heralds the goal of global interoperability of data on the WWW. The concept was proposed more than 10 years ago. Since then the W3C has developed a range of standards to achieve this goal, specifically semantic description languages (RDF, OWL), which should get data out of isolated database silos and give structure to text that was born unstructured. Interoperability is achieved when machines understand the meaning of distributed data and are therefore able to process them correctly.
1 Interoperability http://en.wikipedia.org/wiki/Interoperability
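To make the idea concrete, here is a minimal sketch of what "getting data out of a database silo" can look like in practice: one local record re-expressed as RDF triples with globally resolvable URIs. This assumes Python with the rdflib library; the record, the namespaces and the class are all invented placeholders, not a prescribed model.

```python
# Minimal sketch: one database record re-expressed as RDF triples.
# Assumes Python + rdflib; every URI and value below is an invented placeholder.
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import DC, RDF

EX = Namespace("http://example.org/dataset/")   # hypothetical local namespace

g = Graph()
record = EX["record-42"]                        # the thing being described
g.add((record, RDF.type, EX.SurveyReport))      # invented class
g.add((record, DC.title, Literal("Soil survey of an example river basin")))
g.add((record, DC.creator, Literal("A. Example")))
g.add((record, DC.subject, URIRef("http://example.org/vocab/soil-surveys")))

print(g.serialize(format="turtle"))             # the same record, now as shareable triples
```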
Linked Open Data is probably the way to go. But there is a chicken-and-egg dilemma here: why would people make the investment and expose their data if nobody comes to use it and there is little data to combine with?
I think CIARD, or the agricultural information community in general, can play a role; I can think of at least two ways:
- Formulate and find funding for projects that use LOD as a technology and that solve real-life problems. The way to engage would be to expose your data in the right format.
- As Diane pointed out (and I hinted at it), documenting the data (how it is collected and what the parameters mean) is a lot of work and scientists need to provide most of it. Translating to LOD is still more work: you do not just have to say what the rows and columns in your spreadsheet mean, you should also think of the right encodings (URIs) for the things, properties and values. But this is something where the community (through CIARD or otherwise) can help by developing guidelines. AIMS has made a start by working on guidelines for the exchange of bibliographic information (LODE), but what about other forms of information typically exchanged for agriculture, like field trials, soil surveys, farm data etc.?
I am aware that I am inclined to talk about datasets in the first place; that is what I am working on at the moment. But I think much of this is also applicable to other forms of information, like news, project descriptions etc.
There will not be a bulk transformation of data sets into "LOD". This will happen in a pragmatic, case-by-case manner. A need comes up, e.g. "comprehensive information on data regarding animal feed", and the "needy institution" will contact the data owners to mobilize the data. In another scenario a worldwide community of researchers in a specific area will do the necessary transformation work to foster collaboration. In a third scenario a data owner will transfer his data into "LOD" to push them more strongly into worldwide use.
In a way, data sets are much easier to transform into a format sharable through RDF/LOD. They are already structured and the meanings of fields and columns are defined; sometimes it is a straightforward transformation process. There remains the problem of "provenance", which exists with all kinds of data: text, numbers, pictures and others.
Hm... it seems nobody has realized that this thread has started... :-)
I think it is very important to agree on what interoperability is and what different levels and types of interoperability can be achieved.
I guess we all agree that a web page listing some database records in HTML is not interoperable, while the same list of records as an RSS 2.0 feed is interoperable, but for example do we all agree that the same list as an RSS 1.0 (RDF) feed extended with richer metadata and using Agrovoc or NALT terms (or even better URIs) for subject indexing is more interoperable?
This is important because once we decide to make the effort of making our sources interoperable it is worth opting for the better forms of interoperability.
I would say that the level of interoperability of an information source corresponds to the number of "lives" that data from that source can live.
- Information in an HTML page doesn't live any more lives than its own.
- Information in a basic RSS 2.0 feed lives potentially infinite new lives in its "first generation": it re-lives in all the websites and RSS readers that display it. But basic RSS metadata do not allow for efficient filtering and semantic re-use of the information, and most websites and RSS readers cannot do much with basic RSS feeds besides displaying them, which in turn means again HTML pages, and no more "lives".
- Information in an extended RSS 1.0 (RDF) feed can live the same lives as basic RSS 2.0 feeds in its "first generation", but with the additional advantage that RDF triples (and URIs) and richer metadata can be more easily re-processed and re-packaged as new interoperable sources, allowing information to live additional lives, through several "generations".
- Information in a highly specialized XML format can be easily re-processed and re-packaged by specialized clients, but few consumers will be aware of the specialized metadata (provider and consumer are "tightly coupled"), thus limiting the number of "lives" it can live.
Usually, specialized formats, vocabularies and protocols allow for more advanced re-processing of the information, including semantic re-organizations, but only by specialized consumers, while simple protocols (RSS) in simple formats (plain-structure XML) using well-known vocabularies (DC, FOAF) are easily understood by any consumer (like RSS readers): provider and consumer are "loosely coupled".
These are two different types of interoperability, and it's not always easy to decide which one is better. In most cases the best option could be to expose data in more than one format / protocol.
In general, it seems to me that RSS 1.0 (RDF) feeds using extended metadata and URIs from standard vocabularies combine the best of the two worlds.
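To illustrate the comparison above, here is a rough sketch, assuming Python with the rdflib library, of the kind of "extended RSS 1.0" item being discussed: an RDF resource carrying Dublin Core metadata and a subject expressed as a vocabulary URI rather than free text. The item URL and the AGROVOC concept identifier are placeholders, not real ones.

```python
# Sketch of an "extended" RSS 1.0 (RDF) item: richer metadata plus a subject URI.
# Assumes Python + rdflib; the item URL and the AGROVOC concept ID are placeholders.
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import DC

RSS = Namespace("http://purl.org/rss/1.0/")

g = Graph()
item = URIRef("http://example.org/news/2011/03/maize-trial")
g.add((item, RSS.title, Literal("Results of the 2010 maize field trial")))
g.add((item, RSS.link, Literal("http://example.org/news/2011/03/maize-trial")))
g.add((item, DC.date, Literal("2011-03-29")))
# Subject as a URI: any consumer that knows the vocabulary can filter or link on it.
g.add((item, DC.subject, URIRef("http://aims.fao.org/aos/agrovoc/c_XXXXX")))  # placeholder concept

print(g.serialize(format="xml"))   # RDF/XML, the syntax RSS 1.0 feeds use
```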
The concept sounds very appealing as it will open doors for better information sharing. Perhaps, from another angle, it will also help reduce wastage of resources, for instance duplication of work.
However, I must also confess that interoperability also looks like Utopia! For the simplest of reasons (as our friend Andry pointed out for Q1) - the Digital Divide.
For interoperability to work, websites / platforms etc. must have a minimum level of features that will allow them to handle metadata (and all the associated titbits) and to communicate among themselves.
Then there is also the issue of standards. Whenever you speak about this to heads of institutions, it's as if you are speaking Martian and as if you are asking for Mega $$$$.
I believe, if we are to overcome such massive constraints, CIARD will have to be proactive and invite all NARS across the globe to first join the CIARD community.
By bringing them to a single platform, it would be easier to communicate with them and push forward ideas that in isolation would seem impossible to conceptualise.
Instead of explaining the process of interoperability, the best approach will be to first explain the benefits that THEY would reap with interoperability ...
...
Sorry for repeating this, as I wanted to bring this reply here. My name is Sallam, working in the Technology Dissemination Department of the Research Authority in Yemen.
I see that previous contributions have raised many issues, many of which go beyond information sharing but need to be reflected in future trends that help improve information documentation and information sharing. Some issues, like "another continent … another dream", lack of knowledge of computers and web 2.0, "my interest first", cultural heritage, lack of knowledge-packed products that are in the interest of farmers, lack of incentives among researchers (especially in many developing countries), lack of a clear sharing culture, how to document and make outputs visible, and many other issues could remain as obstacles affecting information sharing.
I go back again to the persistent gap between the efforts of producing knowledge, which remains in the minds of researchers or in technical reports or even in scientific articles, and the efforts of integrating this knowledge into simple and visible outputs that are of interest to farmers, especially poor farmers. Many researchers in many countries think that their end product is publishing their research results in scientific journals, where they gain scientific recognition and job promotion.
It would be wise if we thought of ways of motivating researchers to put effort into making visible outputs and success stories in formats that are in the interest of farmers rather than the sole interest of the scientific community. I can give a story from my experience, as I tried to pull my work experience of the past 20 years into two success stories. One of these stories was published as a study, not as a scientific article, although it was reviewed. The other story was published by GFAR as a competitive work. I also tried to prepare many small booklets and leaflets that are useful for farmers, as they are supported by results from marketing surveys and a marketing information system. The issue is that when I submitted all my work for scientific promotion, all of it was rejected as it was not published in scientific journals, including the piece published by GFAR/AARINENA.
Now, the issue is how we can think of suggestions that could help facilitate better recognition of research efforts and contribute to breaking the vicious circle in the integration of scientific and indigenous knowledge as well as the mechanisms that facilitate more participatory and farmer-centered approaches leading to suitable formats of publishing and sharing information.
I totally agree with San_jay's contributions, particularly that this is the way to go in terms of opening doors to sharing information, especially for developing countries, and I want to believe it can happen even with the digital divide and our institutional heads not understanding the new terminologies.
First we need champions who can push the gospel of features and standards that enable sharing of information, by promoting platforms like CIARD as a reference point.
On the other hand, CIARD needs to document processes, for example Valeria's contributions where she explains the pathways and the results (what you can achieve if you use RSS); this would enable users to easily choose which tools work for them. Second, document best practices and success stories that can be used as case studies by others, for example what we have done in KAINet has resulted in GAINS and ZAR4IN, or look at what the guys at ILRI are doing in sharing information despite being in a developing country with the digital divide.
I think with interoperability we can dream with Johannes, but at different levels.
In my country of origin, the discussion about animal feeds and their economic, ecological and social impact is a hot topic at the moment.
In a lunch meeting today with a view of the Spree (but with a telephone connection to Munich :-)) we discussed the possibility of a web portal that brings together all information on animal feeds.
Let's take soya as an example:
the different types of soya, in which feeds it is used, what the formulations are, for which animals it is used, which countries deliver it, who the producers are, how trade streams evolve, prices, which pesticides are used, what the residue limits are, which incidents involving contaminants have been reported, analyses published by the producers, what the energy balance is under different ecological conditions, laws.
A lot of this information is available but it is hardly accessible, because of the time it takes a human to screen all the available information. A lot of information is also unavailable because neither the will nor an obligation to publish exists.
We discussed how to overcome these problems and to set up a nice little prototype by December, but I will discuss the way to do this under the next topic. :-)
Now I want to point out three somewhat hierarchical conditions of sharing data and making shared data processing possible.
a) data need to be public. This is not a technical but a societal issue. It is not a small issue, but awareness that openness and transparency are good is growing. This is the basis, but it creates only a dispersed universe of datasets (in a broad sense, considering that text is also data).
b) data need to be published in a way that machines can process them. XML was created to make texts machine-processable and to exchange data between databases. XML is only a syntax and does not express meaning, so the W3C added RDF and OWL to express meaning. There is technology to use data expressed in RDF or OWL. Encoding your data this way is sometimes an investment, but not rocket science.
c) the semantics between different datasets must be understood by machines. This is much more tricky and, without reference to common vocabularies/ontologies, very labor-intensive. If dispersed datasets refer to common published vocabularies/ontologies, it is much easier (see the small sketch below). This is the reason why our AIMS team in FAO is so heavily investing in AGROVOC and similar vocabularies.
If all these 3 conditions were met by all data sets, the construction of our portal on animal feeds would be really easy. And not only a portal on animal feeds. We could have "google desktops" with the information we need for our work.
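A small sketch of conditions (b) and (c), assuming Python with the rdflib library; all dataset URIs and the concept URI are invented placeholders. Two independently published datasets both point at the same vocabulary concept, so a machine can merge them and ask questions across them without any prior coordination between the two providers.

```python
# Sketch of conditions (b) and (c): two uncoordinated datasets share a concept URI.
# Assumes Python + rdflib; every URI here is an invented placeholder.
from rdflib import Graph, URIRef

CONCEPT = URIRef("http://example.org/vocab/soya")   # stand-in for a published concept URI

feeds = Graph()   # dataset 1: feed compositions exposed as RDF
feeds.add((URIRef("http://feeds.example.org/product/17"),
           URIRef("http://feeds.example.org/schema/mainIngredient"), CONCEPT))

trade = Graph()   # dataset 2: trade statistics exposed as RDF
trade.add((URIRef("http://trade.example.org/flow/2010/BR-DE"),
           URIRef("http://trade.example.org/schema/commodity"), CONCEPT))

merged = feeds + trade                 # graphs merge trivially because URIs are global
for s, p, _ in merged.triples((None, None, CONCEPT)):
    print(s, p)                        # everything the two sources say about soya
```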
See eScienceNews: there is no human editor behind this news aggregation service. It's automated, and how is it done? It builds on the conditions for sharing data (that Johannes mentions earlier): opening up access to data, exposing the data in a format that is easily consumable, and describing it well (semantics) - the pillars that facilitate interoperability among data and data sources.
eScienceNews is a harvester based on Drupal that screens press releases and blogs from all the important universities. The content is then indexed against an internal categorization scheme by a machine-run algorithm and displayed.
If the categorization scheme of eScienceNews were based on LOD vocabularies using URIs, it could automatically link further to other resources that are marked up/indexed with the same URIs.
I am somewhat familiar with the eScienceNews system and although I haven't looked at the underlying technologies the site uses for its implementation, I have a pretty good guess as to what it's doing. I suspect that it's using a system called OpenCalais (http://www.opencalais.com/), a web service that does a semantic analysis of "documents" using Natural Language Processing, machine learning and other methods to provide entities/tags that are delivered back to the client, which can then be used to enhance the discovery of those documents by providing information on what a document is about.
When we're talking about where we can go in the future in sharing information, tools such as OpenCalais that let the machine do some of the work to improve interoperability and the discovery of information will become quite valuable. Another project that I am familiar with is the AgroTagger system, which essentially uses a similar text analysis approach and then applies Agrovoc terms for tagging the document.
Hi John,
There is a note written by the creator of eScienceNews.com (http://drupal.org/node/261340). Our team has also made a trial based on that description. So far there is less semantic calculation involved: it uses a Naïve Bayes algorithm to calculate the similarities among online texts aggregated from the RSS sources, which requires a training process, as AgroTagger does, I guess.
I agree about the future you described that tools like OpenCalais and TextWise could bring. Johannes once introduced a website to me, which you may already know: Phase2 Technology (http://www.phase2technology.com/) has been working in this area.
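As a toy illustration of the trained Naïve Bayes step described above (not the actual eScienceNews or AgroTagger code), here is a small sketch assuming Python with scikit-learn; the training texts and the category labels are invented.

```python
# Toy sketch of categorizing aggregated items with a trained Naive Bayes model.
# Assumes Python + scikit-learn; texts and labels are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = [
    "new maize variety shows drought tolerance in field trials",
    "soil nutrient survey of irrigated rice paddies",
    "dairy cattle feed formulation and milk yield",
]
train_labels = ["crops", "soils", "livestock"]       # invented category scheme

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(train_texts)            # bag-of-words features
classifier = MultinomialNB().fit(X, train_labels)    # the "training process"

new_item = ["soya based feed rations for poultry"]
print(classifier.predict(vectorizer.transform(new_item)))   # predicted category
```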
Information exchange and networking is most effective if it is shaped by reciprocity - farmers' contributions might also be of importance for researchers. That implies that, independent of the finally chosen technical solution, farmers or other stakeholders should be motivated to reply, meaning that commenting on a report must not be a disincentive. Therefore it should be considered right from the beginning whether it will be possible to establish "regional data transformation centres". Certainly, such a solution might raise the costs of running the information network, but it might be worth thinking about.
I don't like the term "extension service" very much. The language implies that something is only extended passively to another group. But this is not the case. The results and outputs of science have to be transformed into practice, into technology and business.
But without any doubt there has to be a layer between scientists and practitioners (farmers, enterprises) that mediates between groups of people who often hardly speak the same language. And this is, as IAMO pointed out, an investment, but a necessary one.
Please have another look at what Krishan wrote about the different stakeholders. He explicitly included information management specialists among the stakeholders. I agree very much with this, because they have a role in creating systems that link science information to advisory services and practice.
I think quite a good example of this is TECA (http://teca.fao.org), which is facilitated by colleagues in FAO but has stakeholders ranging from farmer organizations to scientists.
We developed an Information exchange connecting agri-experts in India with farmers. The interactions start as a question from a farmer with a response from an expert and additional dialogue in some cases.
There are now about 35,000 posts on aaqua.org, about 1/3 of them coming from the farming community, 1/3 from agri-experts and 1/3 from agri-consultants/students/teachers/researchers etc.
We have tried to cluster the Q&A into topics by (a) identifying agri-topics and (b) counting the number of Q&As on each topic. This analysis is shared with agri-extension organizations, helping them prioritize farmers' needs and prepare content in those areas so that information can be provided pro-actively.
The forums are used by wealthy, educated farmers but are not limited to them. A large number of questions come from an author who is asking the question on behalf of a small farmer who is a friend, relative, beneficiary etc.
Challenges include capacity building of agri-extension organizations so that they have 10-hour Internet access, power and at least two people who can (a) provide answers to questions coming in online and (b) archive Q&A discussions happening on the phone.
aaqua.org
Is interoperability the basis of 'collaboration'? Collaboration is our goal (& hopefully a common one, even though our derived benefits may differ) and interoperability is one of the means of achieving it.
I feel that the concept of 'interoperability' needs to be considered, ranging all the way from people collaborating to systems collaborating, with concepts and information interoperability being somewhere in between.
People successfully interoperating means that there has been a mutual recognition of value of the knowledge/information, an understanding of each others' context, an agreed set of communication protocols, and an agreed vision of the process of 'interoperating'.
The same would apply when dealing with concepts (whether it is among concepts in people's minds - tacit - or concepts that have been described in ontologies (and vocabularies?) - explicit).
I am trying not to use big words or acronyms that I have often seen so far, because some of these are quite new to many of us and that should not affect our 'interoperability' within this discussion.
Thus, I also feel that interoperability among systems may not happen until we have interoperability among people, concepts followed by data, before we can put them together again as contents (or constructs) of systems.
So, as we go through this discussion, it may be useful to try to classify the technologies we are talking about according to which of the four they are suited to address:
When Sanjay talks of seeking support from Management to invest in his institution's system's interoperability, he is looking for tools that address interoperability among people - can these be policy statements from international forums, like GCARD; intergovernmental statements about the need for regional integration;
When Hugo, in response to Qu. 1, states that he did not see the collaboration among people to define or discover new uses of information, was he looking for tools that facilitate interoperability around concepts? When we talked about the variety of information needs that have to be satisfied under Qu. 1, were we referring to the need to improve the information provider's understanding of the concepts of the user? Do LOD, RDF & URIs address this need sufficiently? (I would not know)
What about when it comes to data? Do the 'standards' or 'most popular formats' of storage and information exchange address the interoperability among databases? It seems that we have historically spent more time on this aspect (until the technology forced us to move on to new concepts?). Based on the responses, it also seems that to take us forward, the focus will have to be elsewhere, or on more than just interoperability around data exchange.
So, finally, when we talk of interoperability among systems, is it a combination of the above? Laurent seems to be saying that we need to go beyond the three levels as distinct domains, and start defining new constructs that mix the terms above - which then enables the systems to share meaningful information. Does the example of an Artificial Intelligence behind the editing of e-sciencenews give us an example of how we may need to build these constructs in the future?
The above is just me thinking aloud... but then, what does all this discussion mean to the person sitting in an institution in the developing world, with poor connectivity and blackouts (in a financially stalled government, as Dick put it)? What picture do we paint about interoperability for the person who has to decide what the next step in their institution is? We need to be able to paint a scenario for them to illustrate that the effort they put into a system now is not going to be wasted, or that it can already, in collaboration with another (external, regional?) partner, be contributing to the global pool.
We also need to be able to paint a scenario for ourselves on this forum of how these different techniques of facilitating interoperability fit together!
Krishan wrote:
>When Hugo, in response to Qu. 1, states that he did not see the collaboration among >people to define or discover new uses of information, was he looking for tools that >facilitate interoperability around concepts?
Krishan, I am not sure what you are referring to. If you mean my example of bringing together different types of data relating to climate and agriculture: I meant to say exactly the opposite. What I hinted at is that bringing these things together is not just a technical or logical issue. There is a human side to it as well. But it needs to be done, especially for more insight in urgent problems like agriculture and climate change. Not necessarily easy, but like - just a random example :=) - in a marriage, it is worth the effort to learn to talk to each other.
When Krishan brought up the topic of interoperability among people, I thought that might be a good opportunity to introduce (for those that are not familiar with it) a project developed out of Cornell called VIVO (http://www.vivoweb.org). I'm hoping that my boss (the original developer and current development manager) will chime in to provide greater detail, but VIVO is an open source semantic web application originally developed and implemented at Cornell. When installed and populated with researcher interests, activities, and accomplishments, it enables the discovery of research and scholarship across disciplines at that institution and beyond.
There is currently a large NIH-funded VIVO project underway that involves seven institutions to create a national network of scientists that will facilitate the discovery of researchers and collaborators across the country. Essentially, it is being implemented to facilitate interoperability among people.
Although VIVO has not yet found its way into the Agricultural Information Systems domain in any sort of production environment, there has been a great amount of interest.
For example, during a recent visit to several institutions in Costa Rica we talked about developing a community-of-experts system that might involve institutions associated with the SIDALC project in Latin America. That includes 158 institutions in 22 different countries.
There is also a project at the United States Department of Agriculture (USDA) that has committed to using VIVO to create a one-stop shop for federal agriculture expertise and research results. Here's the official announcement: http://www.ars.usda.gov/is/pr/2010/101005.htm
Personally, I've done a bit of work integrating VIVO with Drupal based systems and the creation of a "Semantic Services" project that is being used in a few Cornell departments to provide faculty information to students.
When talking about interoperability among people, I think taking a serious look at VIVO is warranted.
It is good to see that some of us are trying to bring the human factor into interoperability. This is often overlooked because we have a tendency to concentrate on the technology aspect. And perhaps this is why at times it is difficult to get buy-in. Or, in many cases, haven't we seen systems being set up and then losing momentum until they eventually become souvenirs!
But if I summarise everything from this thread, doesn't everything come down to people, processes and technology?
We cannot separate any of the three and still hope for a system to be interoperable. Or can we?
Indeed, the triangle is pivotal no matter what the final system for information sharing looks like. Going through the contributions it becomes obvious that many expect more than passive sharing of information and are thinking of active networking.
In fact there is a gradient shaping the interaction and the processes: starting from nearly solely technical exchange with almost no personal component, and ending with multilateral face-to-face communication.
The first might be efficient; the latter is generally more effective, particularly if you are thinking of "new knowledge" (see Hugo's contribution), because personal communication in a group is said to lead to the greatest creativity. Furthermore, personal communication makes a network more sustainable, because closer contacts and trust are established.
On the gradient between these two poles there are communication forms like e-mail, Skype conversations, bilateral meetings ...
Meaning, establishing a network for sharing information seems to be most promising if it includes the different communication forms.
I totally agree with IAMO's statement that establishing a network for sharing information seems to be most promising if it includes the different communication forms. This could be considered an incentive that encourages participation in information sharing and could reinforce more win-win information sharing.
Interoperability seems more like an irresistible force than a strategy. People want information, and providers wishing to be accessed provide it in several usable formats. As standards become commonly available, major data managers, document publishers and content streams adopt them in order to remain competitive or viable. This is a natural progression and can be observed as one looks back on the evolution of information technology.
Data sharing standards are inevitably accompanied by open access tools that form the glue to tie separate bits together. Information consumers follow after, looking to create new analyses and perspectives. On an as-needed basis, the pieces are arranged and connected in a freeform construction of content and functionality. Each of these unique triangles is designed for a subset of information consumers. Each of the triangles can in turn be linked to other information networks by using standards to create yet more community-specific applications.
There is no one-size-fits-all. Standards and flexible linking provide for all the uses one can imagine. They are also constantly evolving to provide the next great trend in information sharing.
I agree that interoperability looks to be inevitable. The only question is how quickly it will happen and what can we do to make it happen faster?
Education is one thing. Although it may seem obvious to the likes of us, there is still a lot of basic awareness raising needed about the benefits of standards and sharing. Building standards and mechanisms for data exchange into the tools that people use in their everyday life is another, so that the data is captured in an interoperable form from the start. Standards compliance should 'just happen' without people having to think about it.
Thinking is good. That's why any of this is happening. It is also inevitable. It is one of the things we are good at. In regards to that, I have been thinking...
I agree with you that there are things we can do to facilitate the adoption and use of standards rather than letting things take their own course. You mentioned education, which is something that needs to happen at all levels of the information spectrum, from data/tool producers to information consumers. One way of spreading the word might be to come up with a set of recommendations for different categories of interaction that are tailored to the needs of specific user groups. Are there different technical requirements for research groups as opposed to community farmers, and can those information tool kits be adjusted to accommodate different cultural expectations?
Possible criteria for the tool kits might include ease of implementation, affordability, robustness of the standard, current level of adoption, flexibility, extensibility, and infrastructure opportunities and limitations. Development of new and better standards will continue, but for putting tools in place now we need solutions that work out of the box.
san_jay writes about the Interoperability Triangle:
> It is good to see that some of us are trying to bring the human factor in
> interoperability. ...
> But if I summarise from everything from this thread, doesn't everything
> comes to people, processes and technology?
kbheenick writes:
> I feel that the concept of 'interoperability' needs to be considered ,
> ranging all the way from people collaborating to systems collaborating,
> with concepts and information interoperability being somewhere in
> between. ...
> People successfully interoperating means that there has been...
> an agreed set of communication protocols...
I like Sanjay's notion of an Interoperability Triangle
of "People, Processes, and Technology", and I also like
Krishan's point that "processes" have to do with "concepts"
and "communication".
One might summarize this as a triangle of "People --
Communication -- Technology".
PEOPLE
I enthusiastically agree with the emerging emphasis in this
discussion on the "human factor" in interoperability. VIVO is
an excellent example, as the emphasis since its beginnings
some five years ago has been on "connecting people" and
"creating a community" [1].
COMMUNICATION
What makes Linked Data technology different from traditional
IT approaches is that it is analogous to the most familiar
of all communication technologies -- human language.
RDF is the grammar for a language of data. The words of
that language are URIs -- URIs for naming both the things
described and the concepts used to describe those things, from
verb-like "properties" to noun-like "classes" and "concepts".
The sentences of that grammar -- RDF triples -- mirror the
simple three-part grammar of subject, predicate, and object
common to all natural languages. It is a language designed
by humans for processing by machines.
The language of Linked Data does not itself solve the
difficulties of human communication any more than the
prevalence of English guarantees world understanding.
However, it does support communication across a similarly
broad spectrum.
When used with "core" vocabularies such as the fifteen-element
Dublin Core, the result may be a "pidgin" for the sort
of rudimentary but serviceable communication that occurs
between speakers of different languages. When used with
richer vocabularies, it supports the precision needed for
communication among specialists. And just as English provides
a basis for second-language communication among non-native
speakers, RDF provides a common second language into which
local data formats can be translated and exposed.
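As a purely illustrative aside, here is what such a "pidgin" utterance can look like in practice, sketched in Python with the rdflib library; the resource URI and values are invented. A provider who knows nothing about its eventual consumers describes a resource with the fifteen-element Dublin Core, and any RDF-aware consumer can still read the sentences.

```python
# Sketch of a Dublin Core "pidgin" description: each triple is one simple sentence
# (subject, predicate, object). Assumes Python + rdflib; the URI and values are invented.
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import DC

doc = URIRef("http://example.org/report/feed-survey")   # hypothetical resource

g = Graph()
g.add((doc, DC.title, Literal("Survey of animal feed imports")))
g.add((doc, DC.creator, Literal("Example Institute")))
g.add((doc, DC.date, Literal("2011-03")))
g.add((doc, DC.language, Literal("en")))

for subject, predicate, obj in g:     # four rudimentary but serviceable "sentences"
    print(subject, predicate, obj)
```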
TECHNOLOGY
Given the speed of technical change, it is inevitable that the
software applications and user interfaces we use today will
soon be superseded. The Linked Data approach acknowledges this
by addressing the problem on a level above specific formats and
software solutions, expressing data in a generic form designed
for ease of translation into different formats. It is an
approach designed to make data available for unanticipated uses
-- uses unanticipated both in the present and for the future.
[1] http://www.dlib.org/dlib/july07/devare/07devare.html
It seems we all agree that Linked Data is the way to go. So the framework is set. But within this framework, the issue of defining a minimum set of data that allows information of a certain type to be interoperated by other systems is still open.
It is not so much an issue of which description vocabularies (Dublin Core, FOAF, MODS, AgMES, Darwin Core, geoRSS...) to use, since this can be tackled by mapping vocabularies and using stylesheets - although the LOD recommendation is always to use widely adopted vocabularies - but it is more an issue of which data should be included in an information object so that it is fully interoperable.
For instance, if we are exchanging data about events, is it enough to use the basic RSS metadata set? RSS 1.0 is RDF, can use URIs and can be LOD-compliant, but if we don't include information on the dates and the location of the event in specific RDF properties, is an RSS feed of events fully interoperable?
An example of a service that aggregates events from different sources is AgriFeeds. The added-value service that AgriFeeds offers in aggregating events is that users can browse events chronologically in a calendar and geographically by region and country. A feed of events that doesn't have properties for the start and end date of the event and for the location is not interoperable for AgriFeeds. In fact, it is not discarded but it is treated as a basic news feed, without the possibility to exploit the advanced chronological and geographical browsing.
Another similar issue is subject indexing. Since none of the sources aggregated by AgriFeeds uses Agrovoc or other subject lists mapped to Agrovoc to tag news and events, no coherent subject browsing is possible.
In this sense, defining the actual data (or the metadata set, in traditional terms) that are recommended for each information type is more important than agreeing on a specific standard in terms of DTD or RDF schema (the "description vocabulary"). Vocabulary issues can be solved from a technical point of view, but if the data we need are not there "interoperation" and therefore re-use of information may not be possible.
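To make the point concrete, here is a hedged sketch of an event item that carries the data described above: explicit start and end dates, a location, and a subject URI. It assumes Python with the rdflib library; the ev: property names follow the RSS 1.0 event module as commonly documented, the event URL and the AGROVOC concept identifier are placeholders, and whether AgriFeeds expects exactly these properties is an assumption.

```python
# Sketch of an event item with the data needed for chronological, geographical
# and subject browsing. Assumes Python + rdflib; the ev: properties follow the
# RSS 1.0 event module, and the item URL and concept ID are placeholders.
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import DC

RSS = Namespace("http://purl.org/rss/1.0/")
EV = Namespace("http://purl.org/rss/1.0/modules/event/")

g = Graph()
item = URIRef("http://example.org/events/agri-data-workshop")
g.add((item, RSS.title, Literal("Workshop on agricultural data sharing")))
g.add((item, EV.startdate, Literal("2011-06-14")))
g.add((item, EV.enddate, Literal("2011-06-16")))
g.add((item, EV.location, Literal("Nairobi, Kenya")))
g.add((item, DC.subject, URIRef("http://aims.fao.org/aos/agrovoc/c_XXXXX")))  # placeholder concept

print(g.serialize(format="xml"))
```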
Just a hint at still another "prospect", which may be better covered in the next thread on latest developments.
It is good to agree on LOD as the future of interoperability, but what are we going to say to institutions that are supposed to produce and consume LOD and don't have tools that allow them to do it?
It is true that software tools are clearly moving towards LOD, but we have to keep monitoring developments in this field in order to be able to recommend tools that are not only capable of producing LOD (and therefore of creating a triple store of all contents managed in the system) but also flexible enough to allow customizing the classes and properties used in the triple store.
More perhaps in the next thread.
I like very much the concept of people - communication - technology. In my case, as head of the open access network in my organisation, INRA, I can act on the people side and do everything possible to communicate.
The Information System Division in the curl
But even if I am aware of what LOD can bring to data dissemination, I have to work with the Information System Division on all institutional projects. They have different purposes. They choose the technology: SQL databases at first, then XML ones. They don't want to invest in RDF. A group of information managers inside INRA is working on semantic projects to demonstrate the ability to use this technology and to achieve scientific goals. It is the only way to convince the IS Division to go further with RDF and LOD...
Question of skill
It is not easy in France to find computer scientists with RDF skills to work with. Most of them have never heard about OAI-PMH, and OAI is much easier to learn than RDF. Even companies that provide computer services are not yet ready for RDF development. I would be interested in knowing the situation in other countries, as well as potential subcontractors!
Diane
Dear colleagues. I might return to some ideas concerning re-thinking how to encourage information sharing and create shared values. In addition to the already raised obstacles to information sharing and interoperability, such as lack of clear policy and investment plans, lack of incentives, time constraints, cultural heritage, lack of relevant knowledge-packed products, etc., in my experience the information technologists' and information experts' community is moving much faster than the research and development community, especially researchers and extensionists in developing countries, as too many terminologies, data formats and information platforms are introduced, and there is an increasing mismatch between national, regional and international information systems. This situation makes it difficult to create a shared culture and harmony, as well as mutual trust, in the step-by-step improvement of information structuring and information sharing. So, keeping this in mind, we should think of mechanisms and ways of reducing this gap. Creating learning processes, integration and synergy that allow all stakeholders to contribute is important.
I recognize, with Diane, that part of the problem has indeed been the use of technologies pushed by IT departments because they lie within their comfort zones, which typically means XML and SQL. (It should however be added that not all data needs to be exposed as linked data, and that managing data in XML or SQL may in many circumstances be the most practical solution.)
That being the case, the question becomes: how can this or that SQL or XML database be tweaked to expose linked data -- perhaps only an extract of the full data, or perhaps on the fly? Data can be managed in XML or SQL and exposed as RDF. If a given XML or SQL database was originally designed with linked data in mind, or if it happens to map cleanly to linked-data structures, such transformations will be that much easier to implement.
The VIVO project has a lot to say about this, as much of their data is extracted and converted from the wide range of databases and formats used on their campuses. In today's world, the (growing) diversity of data formats is a given. It is precisely because the linked data approach does not require data to be managed in a particular format that it stands a chance of succeeding.
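A rough sketch of that idea, assuming Python with sqlite3 and rdflib; the table, the namespace and the class are invented. The data stays in SQL, and a thin layer mints URIs and emits an RDF view of the rows on the fly instead of migrating the database.

```python
# Sketch: keep managing data in SQL, expose an RDF view of it on the fly.
# Assumes Python + sqlite3 + rdflib; table, namespace and class are invented.
import sqlite3
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DC, RDF

EX = Namespace("http://example.org/trials/")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trial (id INTEGER, crop TEXT, year INTEGER)")
conn.execute("INSERT INTO trial VALUES (1, 'soya', 2010)")

g = Graph()
for row_id, crop, year in conn.execute("SELECT id, crop, year FROM trial"):
    subject = EX[f"trial/{row_id}"]              # mint a URI per row
    g.add((subject, RDF.type, EX.FieldTrial))    # invented class
    g.add((subject, EX.crop, Literal(crop)))
    g.add((subject, DC.date, Literal(str(year))))

print(g.serialize(format="turtle"))              # the SQL rows, exposed as linked data
```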
Hi Diane
I started to be aware of the acceptance and application of Open Access as well as the Web of Linked Data (WLD) in China since last year. When I wanted to know how many institutes, organizations or projects have adopted these two approaches to run their applications, the number was very limited.
Yes, perhaps people are more used to the tools they already use well, and there are many usable tools to develop a CMS application or a simple online query system, which are enough for users to find the info they want.
But to me WLD or LOD is not difficult; perhaps it requires more patience than technical skill. An LOD demonstration seldom looks attractive (see those samples from the W3C).
When data are used in models other than the original one, they often cannot be used as is; some kind of transformation is needed. Such transformations can be divided into three cases: formal transformation, semantic transformation, and a combination of both. Formal transformation is about data type, precision, format, etc. All of these formal concerns can be handled by LOD standards. Semantic transformation is about the conceptual and logical aspects of data and cannot be handled by LOD. Semantic transformation cannot even be fully done by ontological methods, because logical inference can only handle formal semantic problems; most real problems cannot be formalized completely, and commonsense computation must be introduced to solve them. Today's commonsense computation is represented by Watson-like machines and is still far from mature. Commonsense computation will remain the bottleneck of interoperability in the coming years.
Prospects for interoperability in the future are promising. Already there are many organisations that are sharing data, especially business entities: they are making some of their data accessible to, and usable by, other organizations, and vice versa. For this to truly work it requires changing negative mindsets regarding making available and sharing existing information and knowledge. Interoperability will reduce duplication and the amount of time spent generating new information and, where possible, knowledge, and will improve the efficiency of agricultural research systems.
Prospects for interoperability are very high. However they depend on the agreement of international standards for data formats, designs and tools. The agreements to share information located in institutional repositories will also be key. We need to approach this issue at global, regional, national and institutional levels. We also need to be aware of where the technology is moving in relation to web design.
My name is Andriamparany, working at the Ministry of Agriculture in Madagascar.
I see that the previous contributions have raised many issues, many of which go beyond information sharing but need to be reflected in future trends that help improve information documentation and information sharing. Some issues, like "another continent ... another dream", lack of knowledge about computers and web 2.0, "my interest first", cultural heritage, lack of knowledge-packed products that are in the interest of farmers, lack of incentives among researchers (particularly in many developing countries), lack of a clear sharing culture, how to document and make outputs visible, and many other issues could remain as obstacles affecting information sharing. Now, the question is how we can think of suggestions that could facilitate better recognition of research efforts and contribute to breaking the vicious circle in the integration of scientific and indigenous knowledge, as well as mechanisms that facilitate more participatory and farmer-centred approaches leading to suitable formats for publishing and sharing information.