Valeria Pesce

Valeria Pesce

Organization Global Forum on Agricultural Research and Innovation (GFAR)
Organization type International Organization
Organization role
Partnerships facilitator & digital innovation adviser
Country Italy
Area of Expertise
data sharing, data policies and rights, digital agriculture, information management, open data, data science

I am currently project manager and convener at the Secretariat of the Global Forum on Agricultural Research and Innovation (GFAR) and data scientist at FAO, and I collaborate with the Secretariat of the Global Open Data for Agriculture and Nutrition initiative (GODAN). I have represented FAO and GFAR in EC-funded projects on data infrastructures (agINFRA, Big Data Europe) and I manage the CIARD RING, AgriProfiles and Agrisemantics Map open data platforms in coordination with other global and regional actors. More recently, I have dedicated a good part of my time to planning and convening workshops and webinars on the issue of farmers' data rights.

This member participated in the following Forums

Forum: "Building the CIARD Framework for Data and Information Sharing" April, 2011

Question 1: What are we sharing and what needs to be shared?

Submitted by Valeria Pesce on Thu, 04/07/2011 - 17:47

Just to remind everybody that the forum thread on question 2 is open!

Submitted by Valeria Pesce on Wed, 04/06/2011 - 18:43

I agree with John.

I think the audience is more relevant when it comes to deciding on which topic / subject area we want to share information and provide information services. Of course a Research Institute on plant genetic resources will consider it essential to provide its audience with information services on plants and genes, and less important to provide, for instance, information on national government bodies or other things (even if they have that information).

But that has to do with the topic/domain of what we are sharing and the scope of our activities, not with the type of "information object" that can be shared.

In whatever subject domain we work, information can be "serialized" (?) in different ways, it can be bibliographic records, news items, blog posts, pictures,  datasets (contact lists, raw scientific data, directories of projects - datasets in the end include everything)...
When it comes to deciding which of these types of information needs to be shared, I think the audience doesn't matter: depending on specific conditions, they may need one or the other, so it's worth sharing everything. And, as John says in his post above, in making this decision we have to consider machines (other systems) even more than the human audience :-)


Submitted by Valeria Pesce on Tue, 04/05/2011 - 17:09

Reading some of the posts here, I keep finding out something new about what other Institutions are sharing and in which form. So maybe there is one thing we have to add to the list of what we need to share: we need to share information on what we are sharing :-)

Seriously, we usually find out about interesting information services because we know who is managing them, because there is a short email campaign publicizing them, or because other websites link to them. But when we are looking for information sources of a specific type (RSS, OAI providers...) because we want to build an added-value service or just aggregate information in our RSS reader, we may not find everything that exists, and we may even miss the best sources.

This was indeed the main reason behind the idea of creating the RING (http://ring.ciard.net) under the CIARD initiative: a directory of information sources in agriculture, described and classified, especially with regard to technical features that are relevant to interoperability.

(Of course, at a higher level of complexity, also some techniques for "auto-discovery" could be used, adopting XML / RDF descriptions of sources: already the RSS channel metadata and the OAI-PMH Identify verb go in this direction).
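As a small illustration of that auto-discovery idea, here is a Python sketch that parses the XML self-description a repository returns for the OAI-PMH Identify verb. The sample response below is made up for the example (the repository name and base URL are not real); the point is only that a harvester can learn about a source from the source itself.

```python
# Sketch of "auto-discovery": reading the self-description that an
# OAI-PMH repository advertises via the Identify verb. The sample
# response below is illustrative, not from a real repository.
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# A minimal OAI-PMH Identify response (what a repository returns
# for the request <baseURL>?verb=Identify)
identify_xml = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2011-04-05T00:00:00Z</responseDate>
  <Identify>
    <repositoryName>Example Agricultural Repository</repositoryName>
    <baseURL>http://repo.example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""

def describe_oai_source(xml_text):
    """Extract the self-description a repository advertises via Identify."""
    root = ET.fromstring(xml_text)
    ident = root.find("oai:Identify", OAI_NS)
    return {
        "name": ident.findtext("oai:repositoryName", namespaces=OAI_NS),
        "baseURL": ident.findtext("oai:baseURL", namespaces=OAI_NS),
        "protocol": "OAI-PMH "
        + ident.findtext("oai:protocolVersion", namespaces=OAI_NS),
    }

print(describe_oai_source(identify_xml))
```

A directory like the RING could harvest such descriptions automatically instead of relying only on manual registration.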

Submitted by Valeria Pesce on Mon, 04/04/2011 - 18:31

This study on information exchange patterns gives an interesting perspective...

Of course, given its scope, this study is focussed on exchange among scientists.
I think what most of the other stakeholders in agriculture (donors, extensionists, farmers) are concerned with is the sharing of heterogeneous information (scientific, raw data, management data) among different types of actors.
So, in my opinion, even if top-down information sharing policies apparently do not influence the way scientists exchange information, they may still be needed if, in the view of donors or project managers, it is essential that scientific information be shared beyond the scientific community.

Submitted by Valeria Pesce on Mon, 04/04/2011 - 18:23

I like the focus on LOD and RSS in the above posts. And also the mention of integrated relational "content models".
The difficulty is sometimes in sharing highly structured and integrated information in a simple way (like RSS) while retaining the relationships across different systems, but I'm sure this will be the subject of another discussion in another more technical thread of this e-consultation.

In reply to the question of "What needs to be shared", I think that it is not always obvious to imagine what can be useful to others. An Institution that produces scientific publications and manages projects in the area of plant diseases may decide to share only peer-reviewed articles, final project outputs and a list of projects, and then maybe what a certain extensionist in the field is looking for is a picture of a specific plant disease or the addresses of plant clinics in a certain country. That Institution may have had that picture in some project documentation and some addresses of plant clinics in its contact lists, but perhaps those "information objects" were not considered worth sharing. So the full potential of the information owned by the Institution is not exploited.
(I am speaking here independently of technical difficulties and other constraints in setting up a sharing environment, only focussing on "what needs to be shared")

What I mean is that we don't necessarily need to make an assessment of the usefulness of certain "information products" at the source level (except for a quality assessment of course): thanks to intelligent aggregators, consumer services and the final user can decide what information to assemble and from where.

But in order for intelligent aggregators to work, what matters is the "re-usability" (already mentioned by Diane above) of the information we share:
the more granularly we structure our contents (identifying the smallest possible information units), the better other services can re-use and re-package them for different end-users.

So in my opinion any potentially useful piece of information is worth sharing, possibly as small information units with metadata, like single records (e.g. the name and specialization of an expert, or scientific data on a gene), single electronic resources (e.g. pictures or videos), even single semantic units automatically extracted from an article...
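To make the idea of "small information units with metadata" concrete, here is a Python sketch that wraps a single electronic resource (a picture of a plant disease, in the spirit of the extensionist example above) as one RSS item carrying a Dublin Core subject. The title, link and subject values are invented for the example.

```python
# Sketch: packaging one small information unit (a single picture) as an
# RSS item with Dublin Core metadata, so that an aggregator can re-use
# it on its own. All values here are illustrative.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

def unit_to_item(title, link, subject):
    """Build a self-contained <item> element for a single information unit."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "{%s}subject" % DC).text = subject
    return item

item = unit_to_item(
    "Photo: late blight lesion on potato leaf",
    "http://repo.example.org/media/blight-42.jpg",  # hypothetical URL
    "plant diseases",
)
print(ET.tostring(item, encoding="unicode"))
```

Because the unit travels with its own metadata, a consumer service can pick it out of a feed and re-package it without any knowledge of the project documentation it originally came from.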

Question 2: What are the prospects for interoperability in the future?

Submitted by Valeria Pesce on Thu, 04/07/2011 - 17:44

Hm... it seems nobody has realized that this thread has started... :-)

I think it is very important to agree on what interoperability is and what different levels and types of interoperability can be achieved.

I guess we all agree that a web page listing some database records in HTML is not interoperable, while the same list of records as an RSS 2.0 feed is interoperable, but for example do we all agree that the same list as an RSS 1.0 (RDF) feed extended with richer metadata and using Agrovoc or NALT terms (or even better URIs) for subject indexing is more interoperable?

This is important because once we decide to make the effort of making our sources interoperable it is worth opting for the better forms of interoperability.
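To make the comparison concrete, here is a small Python sketch of the same record in the two forms, and of what a consumer can extract from each: both carry a human-readable title, but only the RDF-extended item carries a machine-usable subject URI. The snippets and the Agrovoc-style concept URI are illustrative, not taken from a real feed.

```python
# The same record in two serializations (illustrative snippets; the
# subject URI below is a made-up Agrovoc-style identifier). A generic
# RSS reader can display both, but only the extended RDF form exposes
# a subject URI that a machine can act on.
import xml.etree.ElementTree as ET

rss2_item = """<item>
  <title>New maize variety released</title>
  <category>maize</category>
</item>"""

rss1_item = """<item xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <title>New maize variety released</title>
  <dc:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_12332"/>
</item>"""

def subject_uri(item_xml):
    """Return the item's subject URI if it carries one, else None."""
    item = ET.fromstring(item_xml)
    subj = item.find("{http://purl.org/dc/elements/1.1/}subject")
    if subj is not None:
        return subj.get("{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource")
    return None

print(subject_uri(rss2_item))  # None: only a free-text <category>
print(subject_uri(rss1_item))  # the concept URI, ready for semantic re-use
```

The free-text `<category>` in the first item can only be string-matched; the URI in the second can be joined against a vocabulary, translated, or used to merge items from different providers.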

I would say that the level of interoperability of an information source corresponds to the number of "lives" that data from that source can live.

  • Information in an HTML page doesn't live any more lives than its own.
     
  • Information in a basic RSS 2.0 feed lives potentially infinite new lives in its "first generation": it re-lives in all the websites and RSS readers that display it. But basic RSS metadata do not allow for efficient filtering and semantic re-use of the information, and most websites and RSS readers cannot do much with basic RSS feeds besides displaying them, which in turn means again HTML pages, and no more "lives".
     
  • Information in an extended RSS 1.0 (RDF) feed can live the same lives as basic RSS 2.0 feeds in its "first generation", but with the additional advantage that RDF triples (and URIs) and richer metadata can be more easily re-processed and re-packaged as new interoperable sources, allowing information to live additional lives, through several "generations".
     
  • Information in a highly specialized XML format can be easily re-processed and re-packaged by specialized clients, but few consumers will be aware of the specialized metadata (provider and consumer are "tightly coupled"), thus limiting the number of "lives" it can live.

Usually, specialized formats, vocabularies and protocols allow for more advanced re-processing of the information, including semantic re-organizations, but only by specialized consumers, while simple protocols (RSS) in simple formats (plain-structure XML) using well-known vocabularies (DC, FOAF) are easily understood by any consumer (like RSS readers): provider and consumer are "loosely coupled".

These are two different types of interoperability, and it's not always easy to decide which one is better. In most cases the best option could be to expose data in more than one format / protocol.

In general, it seems to me that RSS 1.0 (RDF) feeds using extended metadata and URIs from standard vocabularies combine the best of the two worlds.
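A sketch of what combining the two worlds means on the consumer side, assuming a feed whose items may or may not carry a dc:subject extension (the feed content below is invented): any reader can list the titles, and a smarter consumer can additionally filter on the richer metadata when it is present, without breaking on items that lack it.

```python
# Sketch of a "loosely coupled" consumer: every item is usable as a
# plain RSS entry (title only), but items carrying the dc:subject
# extension can also be filtered semantically. Feed content is made up.
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

feed = """<channel xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item><title>Rice blast outbreak reported</title>
        <dc:subject>plant diseases</dc:subject></item>
  <item><title>Newsletter issue 12</title></item>
</channel>"""

def read(feed_xml, subject=None):
    """List item titles; optionally keep only items tagged with `subject`."""
    out = []
    for item in ET.fromstring(feed_xml).findall("item"):
        if subject is None:
            out.append(item.findtext("title"))
        else:
            tags = [s.text for s in item.findall(DC + "subject")]
            if subject in tags:
                out.append(item.findtext("title"))
    return out

print(read(feed))                            # any reader can do this
print(read(feed, subject="plant diseases"))  # needs the richer metadata
```

The basic behaviour degrades gracefully for simple feeds, while the extended metadata opens the door to the additional "generations" of re-use described above.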
