Question 3: What are the emerging tools, standards and infrastructures?

The new paradigm for interoperability on the web and for building the basic layer for a semantic web is the concept of Linked Open Data1 (LOD).

Instead of pursuing ad hoc solutions for the exchange of specific data sets, the concept of linked open data establishes the possibility to express structured data in a way that it can be linked to other data sets that are following the same principle. Examples of an extensive use of "linked open data" technologies are the NYT or the BBC news service. Some governments too are pressing heavily to publish administrative information as LOD.

The Linking Open Data cloud diagram

The technology of LOD is based on W3C standards such as the "Resource Description Framework2" (RDF), which facilitates the exchange of structured information regardless of the specific structure in which they are expressed at the source level. Any database can easily be expressed using the RDF, but also structured textual information from content management systems can be expressed in RDF. The presentation of data in RDF makes them understandable and processable by machines, which are able to mash up data from different sites. There are now mainstream open source data management tools like Drupal or Fedora commons which already include RDF as the way to present data.

Within the area of agricultural research for development an infrastructure to facilitate the production of linked open data is needed. The four key elements to make this possible are:

a registry of services and data sets (CIARD RING,http://www.ring.ciard.net);

common vocabularies to facilitate automatic data linking (thesauri, authority files, value vocabularies);

technology (content management systems, RDF wrappers for legacy systems);

training and capacity development

1 Linked Data - Connect Distributed Data across the Web http://linkeddata.org/ Last accessed March 2011
2 Resource Description Framework http://www.w3.org/RDF/ Last accessed March 2011

Submitted by Valeria Pesce on Sun, 04/10/2011 - 18:15

I would start with a short list of some interesting recent developments (in terms of tools, standards and infrastructures) that can help achieve better interoperability of information in agriculture going in the direction of Linked Open Data (LOD).

1) The publication of "authority data" that are relevant to the agricultural sector (and here I include subject vocabularies, KOS, authority lists of special entities like journals or authors, geographic entities...) as Linked Open Data. An example is Agrovoc. Also the geopolitical ontology is ready to be published as LOD. And an authority list of journals on agriculture has been published by FAO.

2) The mapping of some of these authority data between each other (e.g. Agrovoc to NALT, and several geographic encoding standards mapped in the geopolitical ontology)

3) Software tools (document management systems, content management systems, blogging platforms etc.) going towards LOD.
In the AgriDrupal community, we are experimenting with the Drupal CMS and its RDF features. Drupal can expose all contents as a triple store, mapping all data in the system to classes and properties from any namespace (also through a SPARQL engine) and consume Linked Data by importing RDF records both from files and from SPARQL queries.

4) A very recent development: the preparation of some "recommendations" for publishing bibliographic records as LOD: this is interesting because it goes beyond the concept of a rigid RDF schema and proposes several options for each RDF property, taken from vocabularies such as Dublin Core, Bibo and AgMES, giving options both for literal values and for URIs and different options depending on the granularity of description desired. These recommendations should be published at the end of April.
Similar things could be done for other information types.

5) A portal keeping track of all the information services / sources exploiting these standards and tools: the CIARD RING.

Submitted by Krishan Bheenick on Mon, 04/11/2011 - 12:07

Thank you Valeria for enlightening us on the trends in tools and technologies.
I believe this kind of information, i.e. being described at a more general level, and which most of us can relate to, is what we need more of.

Too often, I have felt that the CIARD initiative is trying to talk to a broad range of specialists, each more interested in one area of the spectrum of disciplines that the CIARD has to deal with. However, each group of specialists needs to use some technical terms that others may not be familiar with. In the end we have useful conversations going on in pockets of specialised areas, and news of the success stories or significant initiatives are not reaching the rest of the CIARD stakeholders.

I would like to suggest that we try to draw a conceptual model of how our interventions fit in the broad sense of what CIARD is trying to help us achieve.
If we take the LOD as an example, it needs to be explained to all our stakeholders so we all understand why a group among us is getting particularly excited about LOD – generally its those people involved in vocabularies, ontologies and (?) who seem to be saying a lot about LOD. What is it that the extension and advisory services information specialists need to grasp about LOD that will also bring them on board? We need to have some of these examples brought to the surface.

During the past 2 questions, there has been an extended discussion on the issue of the Research-Extension-Farmer linkages, with a new breed of Agricultural Information Managers sitting somewhere within that triangle. There has also been talk of multiple triangles of people:processes:technologies being mentioned which are interlinked, and the need to define these.

So, in addition to the tools, standards and infrastructures, we seem to need the conceptual framework within which we are working to be better defined. Maybe I should say that there is a need for the conceptual model to be formalized, as we all seem to have an idea of what CIARD helps us to do, but we have not yet shared our mental models to agree on a formal model that tries to describe the CIARD framework. Could this interfacing be visualized in the form of those triangles being interlinked at the people/processes/technologies points such that they bridge people from different ends of the CIARD stakeholder spectrum? Perhaps that is an activity that could be tried out during a face-to-face gathering discussion the CIARD.

Submitted by Hugo Besemer on Mon, 04/11/2011 - 16:30

Well, I have tried to do some of what you propose: explain to a wider audience why we get excited about LOD. In a general introcution for information committees within our science groups here I had included it in a general intro on data repositories and data curation. I could as well have spoken about Paracelsus' prognostications, and I have skipped it for later presentations.

I guess that people will understand it when they see a result. And it should be something that really could not have been done in another way. Just pulling in additional information from other sources is not good enough, people are used to web 2.0 mashups.

Submitted by Marco Fahmi on Fri, 04/15/2011 - 06:51

For my work, I use the two frameworks to present research based on aggregating and synthesising data: a Researcher/Policy Maker/Grower framework the clearly identifies the benefits of the work for each of these groups. For the scientists, this usually cover common themes discussed here such as standardisation of data. But also, higher exposure of research data and the (re-)use of data to build better models.

For the policy maker, publicising, standardising and sharing data is a win because it is a better investment of public money. Often the data is generated using public funding or public infrastructure. Open access to the data removed barriers for better exploitation of the data. It also means that research projects have tangible outcomes in the form of readily available data that can be used by scientists and possibly growers. Finally, the re-use element of data sharing is attractive as it means better return on the investment since the data can be later used for longitudial studies or for larger scale models without any outlay of money for new data collection programs (in comparision, data management and curation is cheaper).

For the grower, the immediate benefits might be a little trickier to convey but they often grasp the need for better models which can only be obtained with larger data sets with high temporal or geographic resolution.

The other frame I use is the People, Technology, Process continuum. This highlights that the technology part of data sharing is often that easiest or least problematic. We also have fundamental issues to deal with in research practice (when and how well do we document data? What is a data set? How to we guarantee that data is shared in safe and secure way? How can we combine data from various sources in a meaning way?)

The people part focusses on intrinc and extrinic rewards for sharing data. Incentives to change attitudes about open access of data and impositions to change behaviours. Also, the ability to acknowledge and rewards early adopters of open access etc.

Submitted by Diane Le Hénaff on Mon, 04/11/2011 - 13:52

Thank you a lot for all these interesting references. I received a draft version of the recommendations mentionned in 4) and I can say it is very useful.

Diane

Submitted by Jiae Liu on Mon, 04/11/2011 - 10:00

LOD is not so heavy as ontology, but I am worried about its ability to express semantic.

Submitted by Hugo Besemer on Mon, 04/11/2011 - 11:03

For a complete picture: maybe Linked Open Data can be the way to go for many types of data, but not for all. For many of us disk space seems to be unlimited, but there are also scientists who manage to get beyond those limits. Observational data like spectral data sometimes comes as multidimensional arrays and it may come in terabytes of it. Even a simple marked-up character based format like csv may become too large. There is a binary exchange format NetCDF http://www.unidata.ucar.edu/software/netcdf/docs/faq.html . NetCDF automatically documents the data structure as well (but of course all sorts of additional data documentation is needed for re-use.) There is specific indexing software http://opendap.org/ to query and transfer parts of a dataset (for files of this size transport is also an issue)
Although linjked open data is out of the question for the datasets themselves, the metadata that descripes them may very well be LOD. An example of a repository that exposes its metadata as LOD and exchanges as NetCDF is 3TU Datacentrum http://datacentrum.3tu.nl/en/home/

Submitted by Marco Fahmi on Fri, 04/15/2011 - 06:04

A number of organisations here in Australia have used NetCDF with great success. Namely, it has been used to store remote sensing data (AusCover) and gas emissions data (OzFlux). It has also been used (with custom variations) to store marine data (eMII).

As Hugo notes, NetCDF is very convenient in that it encapsulates both data and metadata in one file without having to worry about low-level technical details such as the order of data rows, number formatting etc.

What is not encapsulated is a controlled vocabulary or an ontology; this will have to be linked to externally. This is not necessarily a bad thing a loosely-coupled ontology means you can easily mix and match ontologies (or have none) based on your need.

On the other hand, I think that preparing and sharing NetCDF documents is probably only convenient for large homogeneous data sets and where the scientific community has pretty much agreed on the general format of the data. I wonder if preparing NetCDF files (and discovering the content of NetCDF files) when data sets are small or there are no standards is too burdensome for the individual researcher.

Submitted by Diane Le Hénaff on Mon, 04/11/2011 - 13:49

CIARD RING as well as this forum is really interesting to identify tools, infrastructures. It gives the opportunity for partnerships.

Submitted by Nabeel Abu-Shriha on Mon, 04/11/2011 - 17:58

Dear All,

I am sorry for being late in contributing to this valuable E-discussion and my points will concentrate mainly on defining the main data users and beneficiaries and what will be our main purpose from this approach in data management and utilization for AR4D.

I am much worried about how it will be easy for every one looking for such data to use it in a very simple way without going through this long series of procedures and recommendations.

Me as rural community development expert I am much interested in how I can utilize this data when planning for my programs and activities, how can i transfere this data to actual programs and projects that benefits farmers and how i can introduce appropriate technology/information in developing my agriculture and farmers standards of living.

They said if you have the information you have the power but how can I use this information to develop my techniques and ways for more production, good quality and better market.

Charing of information are very essential as first step to start maximize our research benefits and for wider distribution and benefit but with now tools in how to use it in practice and apply it in the ground to the designated target groups will be ONLY adding new books on the Shelf.

Thank you again for giving us this opportunity to take part in CIARD valuable initiatives and Best Regards,

Nabeel Abu-Shriha

Amman-JORDAN

Submitted by Simon Wilkinson on Tue, 04/12/2011 - 11:14

I feel compelled to appeal for consideration of an "appropriate level of technology". There is a huge gap in the capacity of institutions to deal with a lot of this technology.

A case in point: Most of the research centres in our network do not have an IT department. When you get down into the state/provincial level institutions they may not have any IT staff or information specialists at all. Whatever systems they have tend to be run by enthusiasts from some other discipline rather than professionals in this area.

This group needs simple tools that will allow them to share their data in a meaningful way (perhaps with larger systems operated by better-resourced institutions that can archive it). They need things that are easy to set up, use and maintain.

If I may labour the point, tools that require specialised server environments and complex configuration to set up and a programmer or engineer to maintain are beyond their capacity. Such resources are just not available to them.

There's a place for both 'high end' or complex tools that require the resources of a large institution to run and 'low end' tools that don't. I guess I'm saying that the low end should not be neglected, as this will help perpetuate the digital divide.

Submitted by Simon Wilkinson on Tue, 04/12/2011 - 18:32

Maybe I should offer an example. We have been doing some work to add support for Dublin core and OAI-PMH to the content management system we use, ImpressCMS.

The first module we have released is a 'podcasting' tool for publishing audio and video recordings. As you might expect, recordings can be browsed online, downloaded, streamed or accessed with podcasting clients via RSS feed with enclosures, and shared via social media. Installation is a simple two-click process. The data entry form uses unqualified Dublin Core fields to capture metadata in a convenient form. The module also supports a zero-configuration implementation of the OAI-PMH. The key point is that the operator doesn't *need* to know the details of Dublin Core or understand OAI-PMH in order to establish an OAI repository with this module. It 'just works' out of the box and can be used by *non-specialists*. A live demo of the Podcast module is available here (info about the archive including the base URL is available here). A second more general purpose "library" module is currently in beta. Both are distributed under the GPL V2.

Submitted by Johannes Keizer on Tue, 04/12/2011 - 13:24

Greetings from Brussels. I am here to participate in a meeting between the EC and some enterprises about "Linked Open Government Data". I was invited to present at this meeting AGROVOC published as Linked Open Data set. There is a growing awareness also among donors that we need to improve the information (management) architecture in the area of agricultural research and innovation. The EC has recently offered funding for 2 projects in this area The AIMS team is involved in a project called agINFRA, which is just negotiating the budget with the commission. agINFRA is not a research project, but it's aim is to create useful and usable systems and tools. Part of agINFRA are also partners from China, India and Ecuador, so not only European institutions. I am informing here about agINFRA because I want the involvement of the entire community which is discussing here. We will need feedback to do something useful. Therefore I want here to share some points of the work program for agINFRA

- we will invest in components like AgriDrupal and AgriOceanDspace that can be used efficiently also in institutions and working groups without IT department and without bigger IT support and which do not require constant connectivity neither for use within the institution. These components will stay within their open source community and the project will give only a boost to adapt these tools for our purposes. Further in the year, when the project officially is signed and will start we will approach all of you to ask for particpation in the requirements formulation

- we will improve the AGROVOC VocBench to make it a serious reference system for concepts and entities in agricultural research and innovation that can easily be used by everyone, online and offline. We have started two weeks ago with AGROVOC as "Linked Open Data" and we have already 20,000 links from AGROVOC LOD to other systems and vocabularies. The power of this has to be made usable for the entire community

- we will improve what our colleagues at IIT Kanpur have developed with agroTagger. Many of you will have seen was Thomson Reuters has developed with open Calais. We will make AgroTagger the openCalais of our community. We think that marking up the agricultural content on the web as much as possible with concept URIs from AGROVOC would bring us very much forward in the ability of connecting data.

- part of agINFRA is the improvement of the CIARD RING to create a gloabl switchboard for information services

- agINFRA will also look to cloud computing and develop prototypes how cloud services with high computing power (i.e for big sets of LOD triples) can be used to process data for other partners which do not have this computing power.

These are only some spots. All upcoming tools and technologies will have to address the heterogenicity of partners, data and services. They should not aim to overcome this heterogenicity, but to make data sharing possible within this heterogenicity.

The goal is not to creae one big infrastructure in which all have to participate. The goal is to create components, methodologies and standards which serve to create many interoperable infrastructures.

Submitted by Diane Le Hénaff on Tue, 04/12/2011 - 15:26

There is a -just started - national (French) project about building a Scientific Digital Library Infrastructure that will not only focus on archiving publications.

I am involved in some of the working groups and LOD is being considered as a relevant technology.

I would have liked to go beyond geographical barriers as I think that thematic networks and infrastructures are more relevant than countries. Then it is more a funding issue...

Submitted by XiaoLu SU on Tue, 04/12/2011 - 16:51

Automatic annotation can help to improve tagging or indexing of contents. If annotated properly, with a well organized thesaurus, a featured term map can be extracted from each text. With these featured maps, contents of these texts can be identified from each other, links to terms can be extended to other vocabularies, even in other languages, that make possible to automatic key work tagging, and ease the job of search engine. A network analysis method can make automatic annotation more accurate. And the thesaurus can also get benefits from annotation results to improve its structure.

Submitted by Johannes Keizer on Wed, 04/13/2011 - 08:28

I agree very much with you! Our friends and colleagues at IIT Kanpur have developed a tool called AgroTagger, which is now available also on the web. It uses AGROVOC to analyze text for concepts that covered in AGROVOC and then produces AGROVOC tags/URIS to which later then can be referred.

Submitted by XiaoLu SU on Thu, 04/14/2011 - 06:34

We have also developed an auto-tagging system, using China agriculture thesaurus. It is focused on Chinese processing. Now it runs as windows application, and can not be accessed from Internet. We have planned to develop a web version soon, and wish it can interoperable with AgroTagger. if AgroTagger can not handle Chinese, I think our system may do some help.

Submitted by Johannes Keizer on Thu, 04/14/2011 - 12:29

woww! this is very precious Info. AgroTagger at the moment processes English, Hindi and French as I know, but having Chinese would be essential. We should set up collaboration on this.

AgroTagger is part of the agINFRA FP7 project. CAAS AII and ITKanpur are both part of agINFRA :-) So the opportunity is very good to create a multilingual automatic indexing for agricultural information!

Submitted by Prabhakar Tadianda on Thu, 04/14/2011 - 16:46

Agrotagger was developed in collaboration with FAO and ICRISAT. For a given document it can suggest keywords from Agrovoc or from a small (about 3000) subset of Agrovoc(named agrotags). You can check it out at http://agropedialabs.iitk.ac.in/Tagger/. The documents can be in formats like PDF, DOC, .. This software is also mirrored at MIMOS Malasia http://202.73.13.50:58300/agroTagger/

Submitted by Karin Nichterlein on Tue, 04/12/2011 - 18:19

Dear all, let me present you a concrete information and communication system that focuses on sharing of information on problems, technologies and good practices relevant for small producers. This system is provided by the tri-lingual platform on Technologies for Agriculture (TECA) established by FAO in English, French and Spanish that was first introduced as a supply-driven repository for technologies validated with producers. An evaluation revealed that TECA is a good system to document and share otherwise dispersed information on technologies for smallholders with the advantage of providing a standard framework for description. The evaluation as well found that the platform lacked interactivity and needed to become demand driven. A new TECA version in Drupal was launched that combines the repository with communication tools allowing exchanges by users on technology and information needs and sharing of experiences. The new platform and its new functions such as an exchange group function was tested a) with rural users in a field pilot in Uganda and b) with a community focusing on the subject area beekeeping and marketing, in close collaboration with the International Federation of Beekeepers’ Associations, both groups using facilitators to encourage sharing of experiences and technologies. Lessons learned are being used to further improve the platform to make practical information easier accessible (low band use) and understandable for those working closely with small holders: producer organizations, public and private extension agents, NGOs, training institutions, research systems or institutions, input suppliers, marketing agencies, etc. While improving the TECA platform http://teca.fao.org/ (including upgrading it to Drupal 7), we are also revising the TECA guidelines and will soon have a package ready consisting of the improved TECA Drupal components and the guidelines that will help to integrate TECA into national programmes, institutions, development projects etc. for sharing of technologies and good practices with those most in need and often neglected, rural people and small holders and those directly supporting them. Related to the content management system we use for the platform, we closely collaborate with the AgriDrupal community. The platform under revision will also be adjusted to the latest Agrovoc developments.

Kind regards

Karin Nichterlein

FAO, Research and Extension Branch

Submitted by Sanjay Chandrabose Sembhoo on Tue, 04/12/2011 - 19:48

I have been trying hard to absorb all the valuable insights provided in this thread.

I think everyone agrees that linked data / interoperability is the way / pathway to better globalised information sharing.

I appreciate also the fact that several tools - Agricdrupal, AgriOceanDspace, CMS etc..., have been mentioned, but I am really questioning myself on whether we are being democratic enough and whether we are helping the community ...

So in addition to what Krishan and Ajit have already mentioned ...

I would have been more keen to learn the other way round and focus on the features that need to be present for any information system, however small it may be, to play its role fully.

For instance, I want to run a library management system and I want to share my contents internationally, in this case, other than document/media management aspects (circulation, loans, barcoding etc..) of the system, on what criteria should I be chosing a particular system? What should there be in the tummy of this software, for it to be able to communicate outside and share what we have?

Other than features, let us not be overwhelmed with technology and overlook the human factor and processes involved. As long as a partcular system is secure, stable and has the desired features / components, why not use the easiest and most aesthetically pleasing ones.Why make life difficult!

Bottom line:

1. Yes we have certain tools out there that can work wonders. But what's in a name!

2. Our priority here is sharing information. The way - linked data. For that purpose, it is not singled out tools that we need.

3. We need to understand as a community the core of these tools, so that we have the freedom to choose the one we feel at ease with. This is how we will all start using systems that can communicate among themselves. Otherwise, we will end up frightening a large fraction who won't be on the same boat.

4. It would be helpful if we (especially the IT people) could come up with a list of features / standards required for particular systems to effectively communicate with each other.

5. This will be our buy in - Where we tell the community - Listen... if we want our systems to communicate, we need to use for XYZ system that has ABC features ... and here are a LIST of tools that have them. Have your pick and let us start!

Apologies if I went out of context!

(Johannes - I could not help smiling at this: Real world problems are resolved only by handling these technical details.

Bro... am not sure if nature works this way...)

Submitted by Hugo Besemer on Wed, 04/13/2011 - 09:44

I was quite intrigued by your post , San_Jay, and some on this forum (at least Valeria, Johannes) know why. This post addresses the question at what level CIARD should be involved with tools, their selection, support and development.
I can see a number of levels:
- List the features that certain tools should have to share your information with the rest of the world, as CIARD is advocating. It should for example be harvestable with the OAI-PMH protocol or produce sitemap files for indexing in search engines. This to me seems to be what you are advocating. One of the iterations of the "tools piece" that we are struggling with did exactly that. We were not sure whether this was helpful enough for all: it helped setting up a shopping list but gave no information about brands and addresses of shops.
- A next iteration under development now attempts to list tools that have a number of these desired features. Such a list should come with more relevant faetures ranging from hard technical issues (operating system) to more soft issues (skill level required to run it) This approach will only be succesfull of the CIARD commmunity as a whole is willing to share it's experience to maintain all this.
- A next level of involvement coud be assembling a package or toolkit that fulfills the neds of most of the community. A sort of filled shopping bag with what many of us need anyway. FAO has assembled Agridrupal to support a number of partners. FAO contributes it as an option to the CIARD community but other CIARD partners follow different lines. Do we expect this level of support from CIARD?

I think it comes back to the question of standards. We can only achieve all the things we want if we follow certain standards. As Johannes said standards are not set but accepted. Standards will not be accepted if there are no tools that handle them. For example: XML and XSLT would not have been accepted if Michael Kay would not have developed the Saxon processor, to get people started. Do we need such seminal technoogies for the standards that we are tring to get accepted?

I suggest we use question 4 of this discussion to discuss how CIARD should proceed with all this.

Submitted by Sanjay Chandrabose Sembhoo on Thu, 04/14/2011 - 05:27

Hugo ... Intriguing? I wish I also knew why?

Thanks for commenting on the post. You got the picture right and the points (levels ...) brought forward are pertinent. The issue of standards is a priority.

We shall definitely catch up on Q4, but till then, I would also appreciate some reflection on below:

I believe it is not only about standards and features but also of 'freedom to choose'. This is very important for the community to join the wagon. If not, the process will either be limited or fail.

If an organisation has choice, then it becomes easier to SELECT tools that can INTEGRATE existing institutional infrastructure and / or cohabit with existing tools. It hardly bears financial costs and is more easily accepted by management (Let's not forget that we need the green light from our bosses before overhauling anything).

Again, if there is choice, it is most likely that the easiest system adapted to current needs will be selected - less staff training will be required and adaptation time will be less.

And so forth...

The community is made of thousands of institutions scattered around the globe. Most of them are under staffed and under budgetary constraints and CIARD's initiatives might definitely not be their priority of the day. Should they be willing to share information, which tool will they readily opt for or at least try: One selected out of many that meets and suits their current needs and situation or will it be singled out ones!

Submitted by Valeria Pesce on Thu, 04/14/2011 - 17:45

I think what Hugo is saying here is very interesting and gives a good answer to Sanjay's concerns, which have been often expressed also by other colleagues when talking about giving advice also on which tools to use for better information sharing.

In particular, when Hugo says that the list of "features" _may_ not be enough ("it helped settingup a shopping list but gave no information about brands and addresses of shops"), I think he is raising a point that is very important when you think of the decision making process: if you give a list of features to a technical person, this person may be able to evaluate several tools and identify those that have the necessary features, but managers who don't have technical staff or time to conduct this evaluation may just want to know which tools have the necessary features ("brands and addresses of shops") and they might look for a source of such information that has some consensus and "authority", and maybe CIARD at this point could represent that consensus, but this will happen only if the community sharing their experiences with tools is large enough and if there is some agreement on which "features" are most relevant for which needs.

And one personal and questionable (and it will be questioned I'm sure) opinion on being completely tool-agnostic: the fact is that there _are_ differences between tools, and being too neutral can also be damaging. There are tools that have proven more suitable to information management and integration of standards than others, and more sustainable.

And tools aren't just "technology", they are the instrument through which you implement the standards and in the end you make your information accessible.

Submitted by Marco Fahmi on Fri, 04/15/2011 - 06:38

In a recent seminar, someone said in a presentation that "standards are like toothbrushes, everyone wants their own". And that's a really tricky situation. Most of us cannot afford building our own standards; and even if we ever do, we cannot deploy them in a sufficiently large scale to become accepted. Yet, we often find existing standards (if any) lacking or inadequate for our own purposes.

Since I work for a small research institute, we have elected to borrow somebody else's toothbrush. We agreed on some minimum requirements (XML, OAI-PMH) and then went out and looked at what is available out there that meets our needs. We were fortunate to find something that did (in our case EML - Ecological Metadata Language) but this came at a cost.

The protocol is general-purpose enough and the software is forgiving enough that you can deposit any of our data in it. But this means we have to give up powerful search and indexing capabilities and sophisticated ways to manipulate the data. This could only be done using robust semantic technology which we do not have. It is like buying pants that are a couple of sizes too big but missing the belt that will keep them up.

It is virtually impossible to translate knowledge from one ontology to another without running into problems. But we are too small to do such work. Fortunately a consortium has recently emerged (DataONE) to specifically look into this for the ecology space. Amongst what they do, they are now working on making their technology (the EML-based system we use in our institute) interoperable with other platforms.

I think this probably goes some way in delineating what can and cannot be done for research organisations and for consortia: like us, a bottom-up approach for the former based on immediate needs. For the latter, a top-down strategy focussed on interoperability (semantic and technological).

Submitted by Richard Tinsley on Tue, 04/12/2011 - 22:17

As I have been following the forum it appears the emphasis remains sharing research information within the research community across the globe. Perhaps that is where most of the sharing is needed. But there is a question that I have been thinking of the last couple days, and that is to what end. My guess is that:

1. occassionally the sharing of information will lead to a completely innovative idea, that can then be researched in detail, etc.

2. More commonly it is simple a sharing of similar information that reinforces or fine tune already estabished innovations. As such if gives the scientists confidence in what they are doing. But since agriculture is always a very local science how critical this is may be a worthy question to address.

3. It also tend to be commodity specific or even sub-commodity specific looking at fertility, weed control, pest management separately and treating the research commodity as if it were the only commodity involved.

This is all well and good and is done in an ideal research environment with few if any limitations on resources needed to conduct the experiments. As mentioned before this gives excellent results as to the physical environment potential, but does not measure the drag on this potential that result from the limited resources the end user may have when attempting to apply the research to their specific fields, and integrate it into the rest of their farm enterprises, both crop and animal as appropritate.

My concern along with that of several other members is how the research information gets to the farmer end user, particularly the smallholders with the very limited resources to implement it and in the overall economic and administrative environment found in most developing countries. That is virtually no tax base to fund agriculture support services.

The need here is for more general information on all the activities needed to optimally producer a crop. This is normally done as strictly an educational effort for detailed recommendations and assuming that farmers can readily accept the research result once they understand it. If not it is the failure on their part to learn and they need to be repeately taught. But what happens to the research result when the basic crop establishment, by a hungry exhausted farmer who can only work 3 or 4 hours a day, takes 8 weeks instead of the anticipated 2 weeks? Will the rest of the detailed message still be valid, or will the plant population, weeding, fertilizer response be severely compromised? How does this drag get fed back into the research program so adjustment can be made? I wish the forum could spend a couple days address that issue.

Also, most smallholder are involved several crop and animal enterprise the management of which has to be integrated to obtain the maximum yield of all farm enterprises and priortize their crop program, usually giving priority to subsistence crops over cash crops. Again that can compromise the best research effort. But are there any return links so the researchers can better appreciate the constraints under which the farmers are operating and not simply blame them for failure to learn and appreciate the research results. The alternative is to keep pushing out solid research results that are beyond the means of smallholder farmers to utilize.

This still leaves the question of the means used to convey research information to the farmers. With most extension officers concnetrate on collaborating with development project as a means of getting some supplemental income, most of this is through the donor assisted NGO. They then try to operate this though farmer organizations and cooperatives, but this leaves the question is this best method, or dispite the social ideal is the accompaning business model just to cumbersome and inconveninet that the farmers are better off taking their business elsewhere. I fear the latter is the case and an issue that needs to be address in some forum, if not this one.

Thank you.

Submitted by Burley Zhong Wang on Thu, 04/14/2011 - 10:40

Very good points related issues mentioned from the post of Nabeel Abu-Shriha. I have some limited experiences on participating the projects serving the practical end users (farmers and enterprenuers) with ICTs. there are very good project designed well but failed to generate a sustainable effect after the some local service center invested. it seems that the end users wanna some "fast food" that easily understood to act, and can bring them benefit in a see-able future, and better free of charge. other wise, they'd like to ask the real person.

the PC, mobile, even CATV have been utilized as network terminals already, and there are certain orgs input region-oriented info, but is that enough? can we make make more valuable customized service and regarding them as knowledge farmer/agri-entreprenuer, as what we did to users in business world?

and finally, I do feel there are much more resources for research community comparing to those for those end users, is it really hard to meet their requirement on information based on the existed tools, standards and infrastructures?

Submitted by Sylvester Dickson Baguma on Wed, 04/13/2011 - 21:37

I agree with what is written. LOD is the way to go. Content management systyems will play a leading role in mashing up different contents to get analyses of interest. There might be new tools and approaches that are in use and hence the more justification of our meeting. However , application of these tools and approaches will require capacity development especially in the South.

Submitted by Nodumo Dhlamini on Thu, 04/14/2011 - 19:54

The models of social networks such as Face Book, Linked In, Twitter will determine issues of interoperability in a big way. User generated information is here to stay
The concepts of semantic web design will also feature in a great way.
Cloud computing is clearly the way to go.
Translation will be a key issue to be addressed – countries like Africa where the languages are quite diverse will continue to pose a challenge. Thus the support of initiatives such as google translate are key
Mobile technologies and platforms cannot be ignored

Submitted by Elizabeth Dodsworth on Fri, 04/15/2011 - 17:51

I would be interested to know what emerging technologies focus on producing good translations of scientific and technical content. During my recent visit to South and Central America some researchers suggested that Google Translate is not good enough for highly technical language - from English to Spanish. It throws up ambiguities. Are there any tools that focus on technical translations. Could CIARD provide a directory of those tools with recommendations?

Submitted by Jim Cory on Fri, 04/15/2011 - 17:09

We have talked about systems for sharing data within a broad group of users. Perhaps we need to think about these systems as a hierarchy. At one level are a set of systems that fully support global standards and provide automatic semantic matching.

At an intermediate level are a group of systems that are more localized and which understand the global standards and intake and output data to and from global systems, but are focused on incorporating and sharing local data gathered in less formal ways. These intermediate systems work with local developers to provide access to global and local information in a format that is relevant to local practitioners.

At the practioner level there are flexible delivery and reporting tools that provide current, localized, market oriented data and also incorporate crowd sourcing methods for gathering information from individual practitioners.

All levels may not support the same standards, but there could be localized methods for taking data in and then translating them into global standards as the data moves up the hierarchy.

Submitted by Laurent Lefort on Fri, 04/15/2011 - 19:09

Data citation

One way to encourage the sharing of data is to develop the practice of data citation.

Here are two useful background documents, one from the Australian National Data Service (ANDS) and the other from the

ANDS and the other from Gen2Phen, an EU project focusing on Health and Life science research data.

- Data Citation Awareness http://ands.org.au/guides/data-citation-awareness.html

- D9.3 Draft Report on Incentives and Rewards in the Field of Biomedical Research Databases http://www.gen2phen.org/system/files/private/D9.3%20Draft%20Report%20on…

Standards and tools to transform data from SQL to RDF

I recommend to RDF beginners to start with tools which implements a Direct Mapping from a database. Direct Mapping is defined as a mapping that mirrors the database schema in RDF with a minimal effort required to implement it. There have also been efforts to let users annotate the SQL code to provide the same capability e.g the work done by the FlyWeb project

- My first mapping from RDB to RDF using a direct mapping http://ivan-herman.name/2010/11/19/my-first-mapping-from-direct-mapping/

- Future of FlyWeb work on Chado OWL ontology and RDF mapping http://generic-model-organism-system-database.450254.n5.nabble.com/Futu…

These tools are not only interesting for users willing to transform an existing database. MOLGENIS http://sourceforge.net/projects/molgenis/ is a tool to allow users to create their own TAB-delimited format to record science data and then move it into RDF (using D2RQ) to allow other apps to process it. The starting point is a couple of XML files (one for the data model and one for the UI) with a simple syntax (there is a utility to extract the data model one from an existing database). Out of these original "models", the MOLGENIS project aims to derive a range of tools including a R API and RDF access using D2RQ in a comparable way to what has been done in the FlyWeb project.

LOD and ontology

The gap between "lightweight semantics" like LOD and ontology-based approach is much smaller than it used to be. The Datatype Reasoning capabilities enabled by the OWL2 standard http://www.w3.org/TR/owl2-overview/ and the new features provided by ontology engineering tools like SPARQL-DL http://www.w3.org/2001/sw/wiki/SPARQL-DL can help LOD users to exploit ontology content even if it is mixed with numerical data.

Graph databases (NoSQL)

Finally, Graph databases may also have a role to play in future Linked Open Data infrastructures because there are new (and also old) products now fighting over a market niche which is roughly half way between traditional databases and triple stores.

Sandro Hawke Toward Standards for NoSQL NoSQL Live … from Boston March 11, 2010 http://www.w3.org/2010/Talks/0311-nosql/talk.pdf

Become a member

As e-Agriculture Forum member you can contribute to ongoing discussions, receive regular updates via email and browse fellow members profiles.

Contact

[email protected]

e-Agriculture

Forum: "Building the CIARD Framework for Data and Information Sharing" April, 2011

29/03/2011

Become a member

Contact

e-Agriculture