Global Forum on Food Security and Nutrition (FSN Forum)

Adrian Muller

Research Institute of Organic Agriculture FiBL
Switzerland

Section 1:

Example 1 of the matrix seems very general; I think the examples have to be much more concrete to be helpful as illustrations.

Section 2:

On poor data quality, p 19: one may add the following: data is often inconsistent, e.g. in the spelling of commodities (e.g. “soybeans” vs. “soyabeans” in different FAOSTAT tables), and this also causes problems. Furthermore, data is sometimes available only in PDF format, or in Excel sheets organised in such a way that systematic use by data processing and analysis programs becomes particularly tedious.
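To illustrate the harmonisation point, a minimal sketch in Python; the synonym map below is invented for illustration and is not an official FAOSTAT vocabulary:

```python
# Harmonising inconsistent commodity spellings before merging datasets.
# The synonym map is illustrative only, not an official FAOSTAT list.

CANONICAL = {
    "soyabeans": "soybeans",
    "soya beans": "soybeans",
    "maize (corn)": "maize",
}

def normalise(name: str) -> str:
    """Lower-case, trim, and map known synonyms to one canonical spelling."""
    key = name.strip().lower()
    return CANONICAL.get(key, key)

rows = ["Soybeans", "soyabeans", "Soya beans ", "Maize (corn)"]
print([normalise(r) for r in rows])
# all three soy variants collapse to "soybeans"
```

In practice such a map has to be curated by hand (or semi-automatically, e.g. via fuzzy matching with manual review), but even this simple step makes datasets mergeable.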

Generally, regarding the data challenges, one may also emphasise that some of them are problems of data science and not specifically related to food systems; improvements could thus be sourced from the vast expertise in data science and with the help of data experts who have no specific relation to food systems. One should also explicitly try to learn how existing large data users solve their data problems, e.g. big data in astronomy, particle physics and neuroscience, in large companies (Amazon, etc.), in social media companies, and so on.

Section 3:

Section 3.1.2 states: «Thus, while technological advances may reduce cost and widen the reach of surveys, the social divide may lead to the underrepresentation of those with poorer digital access and literacy (LeFevre et al., 2021) Policies and interventions that are based on such data generated from skewed sampling are therefore not useful to the unrepresented stakeholders who may have the utmost need for data-driven policy and support (Bell et al., 2017; LeFevre et al., 2021).» True, but biased sampling is a key mistake in almost any data collection, and one should be very aware of it also in most traditional approaches to collecting data that aim at a representative picture, as all too often the chosen sampling strategy does not allow this. New digital devices may add a further source of bias, but once one is aware of it, it can be dealt with. What I want to emphasise here: biased sampling is also a problem in all other cases where no new technologies are involved, and awareness of it needs to be increased there as well. This is taken up in section 3.3.2, so I would refer to 3.3.2 here in 3.1.2, and also vice versa.
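To illustrate that such bias can be dealt with once one is aware of it, here is a minimal post-stratification sketch in Python; all stratum shares and indicator means are invented numbers, purely for illustration:

```python
# Post-stratification sketch: reweight a digitally skewed sample so that
# stratum shares match known population shares (all numbers invented).

population_share = {"good_access": 0.40, "poor_access": 0.60}
sample_share     = {"good_access": 0.80, "poor_access": 0.20}  # skewed sample

weights = {g: population_share[g] / sample_share[g] for g in population_share}
# poor-access respondents get weight 3.0, good-access respondents 0.5

# surveyed mean of some food-security indicator per stratum (invented):
stratum_mean = {"good_access": 0.9, "poor_access": 0.4}

naive = sum(sample_share[g] * stratum_mean[g] for g in sample_share)
adjusted = sum(sample_share[g] * weights[g] * stratum_mean[g] for g in sample_share)
print(naive, adjusted)  # naive 0.80 overstates; adjusted 0.60 matches the population
```

This of course only works when the population shares of the relevant strata are known from other sources (e.g. a census), which is exactly why awareness of the sampling strategy matters in the first place.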

 “3.1.5. Lack of stakeholder engagement

Finally, the usability of the data is limited when stakeholders have not been involved in the survey planning and there is inadequate dissemination or access to information on what data is available and how it can be used by the stakeholder. These limitations to the access and use of data for improved decision-making, make it difficult to advocate for further funding and commitments towards the collection and analysis of food security and nutrition data.”

Regarding the quote above, I would say that this very much depends on the problem at hand, the solutions identified and the data needed to implement them; stakeholder interaction is not always needed, or, if it is needed, it has to be specified in more detail. Thus, data usability is not limited in all cases where stakeholders have not been involved in survey planning; it depends on the goals and the topic.

Section 3.3, p36: the following is indeed a key challenge to work on: “– reveals the overall scarcity of a minimally sufficient, statistical and quantitative analytic literacy, needed to ensure the validity of the results presented and their proper use.” Even more than getting more data, we have to assure that the data we already get is of good quality, and that the people analysing it know what they are doing and what can and cannot be done with the data at hand. There is, for example, a gap in literacy on how to set up useful data structures (relational databases, etc.), as you note when quoting Rosenberg at the end of the introduction to 3.3.

Take the data on the Infoods page as an example: the tables are in Excel, they all look somewhat to very different, and they do not follow relational database guidelines, so it is difficult to work with them. Improving this would be a first and easy step. In more detail, in Infoods there are, e.g., cells that contain both a value and a footnote index next to the value; there are merged cells that disrupt the matrix structure; there are empty cells implicitly to be filled with the last previous entry in the same column; some tables are available in PDF format only; etc. This is a very sub-optimal data structure.
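As a small illustration of the fill-down issue (the implicit “repeat the last previous entry in the same column” convention), a Python sketch with invented values:

```python
# Repairing a table where merged cells leave implicit blanks that should
# repeat the last value in the same column (a common spreadsheet issue).
# Rows and values below are invented for illustration.

raw = [
    ["Cereals", "wheat",    340],
    ["",        "rice",     360],   # blank = repeat "Cereals"
    ["Legumes", "lentils",  350],
    ["",        "soybeans", 450],   # blank = repeat "Legumes"
]

def fill_down(rows, col=0):
    """Replace empty cells in one column with the last non-empty value above."""
    last = None
    out = []
    for row in rows:
        row = list(row)          # copy, leave the input untouched
        if row[col] == "":
            row[col] = last
        else:
            last = row[col]
        out.append(row)
    return out

tidy = fill_down(raw)
print(tidy[1][0])  # -> "Cereals"
```

Once every row carries its full key, the table follows the relational guideline of one self-contained record per row and can be loaded into a database or joined with other tables.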

3.3. Lack of data processing and analytical capabilities – important section.

Sections 3.3.1–3.3.6 are very important; please invest in making them as helpful as possible. One input on proxies: sometimes the art of choosing an indicator is to avoid overly costly data collection requirements while still being able to make statements about the topic of interest. Wisely chosen proxies can be very helpful; identifying them is a challenge, but it is often worth the effort.

One may also add a section on “robustness” – not in the sense of uncertainty or noise (3.3.1), but relating to how good the data has to be to support advice on actions to be taken. In some cases there are “robust” patterns that can be identified with a range of approaches and without much sensitivity to changes in the values of relevant parameters; in such cases, data requirements are much lower than in cases where results are very sensitive to the value a specific indicator may take. Identifying these robust areas can be very helpful, as it reduces the costs of data collection while still making it possible to derive advice for actions that will lead to the intended outcome with high certainty. I would add such aspects to the framework presented at the beginning, e.g. giving explicit advice on how to refine step 2 (data) of the four steps above:

Given the priority, the problem to be solved and the question to be answered: which data is needed? Then: which data is already there, and which has to be collected? From the data that needs to be collected, first identify the data that is useful in the robust way just described: are there parts of the priority/problem/question where solutions seem to be quite clear and robust to how detailed the data is? Then go for those first. Also try to identify the big leverage points that may provide much effect on the basis of relatively little data, and do not focus on minor aspects that may lead to incremental improvements but require large quantities of data.
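Such a robustness check can be made operational in a very simple way; the following Python sketch uses invented benefit functions and parameter values purely for illustration:

```python
# Robustness sketch: if the preferred action is the same across the whole
# plausible range of an uncertain parameter, coarse data suffices; if the
# choice flips within the range, better data is worth collecting.
# Benefit functions and numbers are invented for illustration.

def preferred_action(yield_gain):
    """Choose between two hypothetical interventions given an uncertain yield gain."""
    benefit_a = 100 * yield_gain - 20   # e.g. improved seed, gain-dependent
    benefit_b = 50                      # e.g. storage upgrade, fixed benefit
    return "A" if benefit_a > benefit_b else "B"

plausible_range = [0.3, 0.5, 0.8, 1.2]
choices = {preferred_action(g) for g in plausible_range}
robust = len(choices) == 1
print(choices, "robust" if robust else "sensitive: collect better data")
```

In this invented case the choice flips within the plausible range, so more precise data on the yield gain would pay off; had the same action won everywhere, the data requirement would have been much lower.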

Related to this, maybe some thoughts on the following statement in section 3.1.2: «For example, new data analytic architectures that generate farm and field level data allows farmers and stakeholders to monitor processes and make a decision for the precision livestock farming. (Fote et al., 2020). The use of these advanced technologies provides a level of granularity and immediate access to data that was lacking in traditional surveys.» True, but the first question again needs to be: which data is needed? There is some danger that the possibility of collecting more granular, detailed data at lower cost results in collecting it without a clear aim and without a clear rationale that this data really contributes to increased food security. Thus, also with new technologies and the huge potential of cell-phone-based data collection, the first step always needs to be, as indicated in the framework, to determine which data is needed to solve which problem. Only then is the decision taken on how to collect it.

Section 4:

This is somewhat confusingly structured and superficial.

As stated there, many new technologies and approaches produce data. These could be named in relation to how data is generated today, but which of them is useful has to be decided on the basis of the framework introduced: what is the problem, which data is needed, how is it collected? There, sensors of the IoT may become important, or not. So I would much more frame this discussion of how to collect data as instrumental to what is needed, rather than as a self-contained description of what is out there. Whether IoT sensors or crowdsourcing is the best source of data strongly depends on what is needed. Related to the source of data, one can then discuss which requirements arise to transform the data into information; but there too, this should be strongly guided by the needs.

Furthermore, the chapter as it stands covers a variety of concepts that are not all related to this data-to-information step, or are related to it in very different ways. Blockchain, for example, plays a totally different role in this than virtual reality or social media, so here as well I would differentiate much more in relation to the needs. One may even add this as a step in the framework suggested above, given that there is a deluge of data and extracting information from it gets more and more challenging: i.e. between “2. Data” and “3. Translation” one may add a separate step, “X. Information”, thus highlighting the crucial need to think very explicitly about and discuss how information is gained from the available data, always guided by what is needed.

The steps may then look as follows: 1. Problem; 2. Information needed to solve it; 3. Data needed to get this information; 4. How to collect and analyse this data; 5. Translation; etc. The information step should thus be addressed earlier, before collecting data, as it is the focus of interest; only when we know which information is needed are we able to collect the adequate data.

Similarly for 4.1.3, “processing data”: this is not a goal in itself, so address it again in closest connection to the goals formulated. It is a service that becomes a topic due to the huge amount of data available and the related challenge of processing it to extract the information needed; but this can be addressed on a purely technical level.

Chapters 4.2 on new technologies and 4.3 on how these support FSN are much too general. Here I would rather provide two to four in-depth examples, presented in considerable detail, to illustrate certain key aspects in concrete cases, than provide extensive lists and references of examples without further content.

Sections 4.4 and 4.5 are very important, but they could also be combined, each time discussing the risks and the mitigation approaches together rather than in separate subsections.

Section 5:

Governance: this is also a central issue. I do not have much to add here, besides the following point:

One aspect that could be important is to think about where data collection and analysis can be AVOIDED, e.g. by sort of “self-organised” actions on a very small-scale level. Take, for example, a remotely organised extension service based on cell-phone pictures of pests and diseases and their damages. Such a system can work well without collecting and analysing the data in detail; it requires a functioning cell-phone infrastructure as well as enough well-educated farm advisors. Thus, the answer to a problem related to pest outbreaks in a region may not necessarily be to collect data, but to establish a good remotely organised advisory system (I use this example just to illustrate my point; there will be better examples). Clearly, some data is needed at the beginning (on which pests are there, etc.), but what I want to emphasise is that in the framework of 1. Priority – 2. Data – 3. Translation – 4. Utilization, the data part can be really small, really only as much as needed. Clearly, in such a context, more data can be collected to have better information for other cases, or maybe to better manage the given case, but again, this should be driven by the problems to be solved and not by the possibility that data can be collected relatively easily.

Thus, I would say that a guiding principle should be to always collect as little data as possible to address the stated problem with the identified data need; this then also simplifies data governance.

Some further general comments:

  • One may make a stronger statement somewhere at the beginning of the report in the following direction: all these new data technologies are only tools in the FSN context and not goals in themselves. I have the feeling that we sometimes tend to give them too much significance. We should definitely avoid adopting an approach that implicitly runs somewhat as follows: “we have technology X – so let’s see what we can solve with it and how we can apply it.” As laid out in section 1, the course of action really needs to be: “we have problem Y – let us identify which technology is most adequate to solve it!” Such thoughts could be emphasised somewhat more, I think.
  • The report goes quite far from data, information and analysis into a discussion of physical devices and physical aspects (e.g. e-waste, section 4.4.5), which I would not have expected from the title and goals of the report. One may either rephrase to really focus on the data/information/analysis part only and drop the rest, or broaden the scope, state this at the beginning of the report, and then also include things such as 3D printing of spare parts to mend broken machines while avoiding the need for complicated and time-consuming transport to remote areas, etc.

Related to this is the following: I think it is somewhat unclear whether the focus of the report should be on data and data analysis for FSN (as stated at the beginning) or whether digitalization is also a central aspect (as suggested here and there in the text). I would separate them more clearly, as data and data analysis is one specific aspect of gaining information for management and policy design, while digitalization is more about certain TOOLS to implement this and about agronomic practices, etc. Data and data analysis is about how to get information on the situation; digital technologies can partly help with this, but many other tools can contribute there as well, depending on the goals.

It may also be helpful to separate information provision approaches from data collection and analysis; e.g. virtual reality may be stronger as an educational tool than as a tool for data analysis.