The use of the term “model” often evokes images of small-scale physical representations of larger objects, such as planes, buildings, ships or cars. Such models can be classified as “physical models” and are a subset of the possible existing range of models. Regardless of whether they are scaled down physical models, graphical models or mathematical models, models attempt to represent some larger or more complex phenomenon with a simpler or smaller representation.
The simple, common interpretation of a model is in fact a reasonable starting point to better appreciate what a mathematical model is. As an illustration, prior to constructing a building, an architect or engineer will make a small physical model of the building in order to represent its features and appearance. The purpose of this model is to allow stakeholders to better visualize what the final product will be, and to experiment and suggest changes based on the small-scale model prior to the final building being constructed. Mathematical models, like the scaled down physical models used to help visualize buildings or other similar structures, are used to simplify and represent a real system in a manner that helps us visualize, describe and manipulate it.
A mathematical model takes the description of a real system (for instance, the processing of fish products), including the conceptual understanding of how the process works and any associated data, and translates this into a system of mathematical relations. The mathematical model generated in this way allows the process being described to be clearly and transparently illustrated, and more importantly, to be investigated and changed at the mathematical level to see what effects might occur at the large-scale level. (This could be viewed as similar to the changes that might be made to the architectural model of the building prior to construction, except in this case changes can be made to the system on an ongoing basis if needed.) In microbial risk assessment, the basic goal is to translate pathogen product combination systems into mathematical models.
In general, risk assessments can be broadly classified as qualitative or quantitative. Qualitative risk assessments involve the descriptive treatment of information, whereas quantitative assessments work with numerical data. It is important to recognize that the decision to perform a qualitative or quantitative risk assessment should be viewed as a sequential process, as opposed to an either/or decision. Progression from qualitative to quantitative would occur when the issue is of such a nature that the time and resources required can be warranted (although qualitative risk assessments done in an appropriate fashion also require time and resources). There is a key advantage in progressing to, and investing the resources in, the quantitative approach: moving from qualitative to quantitative increases the flexibility, acceptability, objectivity and power of the decisions made.
This report is concerned primarily with the more quantitative aspects of mathematical modelling; however, in the interest of completeness it should be noted that qualitative risk assessments should not be interpreted as basic literature reviews, as is sometimes done, and passed off as a risk assessment. Qualitative assessments still need to arrive at some estimate of the magnitude of the probable harm. In these types of assessments, it is important to recognize the necessity of being precise about “qualitative” statements and measurements since descriptive characterizations of likelihood and impact can be misinterpreted. A qualitative estimate of risk may be conducted by assigning ratings such as negligible, low, medium, or high to the risk factors. If such a system is used, specific guidelines and definitions of the assigned ranges for each rating must be clearly described and justifiable.
Mathematical models do not necessarily have to be obscure or built from complicated mathematical equations. As an example, a model to describe the time spent catching fish at sea by a fishing vessel could be described by the following equation:
t = C/r
where “t” is the time spent fishing, “C” is the number of fish the vessel sets out to catch and “r” is the rate at which the vessel catches fish. This could be considered a simple mathematical model.
It should be noted that this model is probably an oversimplification of how the fishing duration system might actually work, but at a general overview level, this may be appropriate. Additional complexity could be incorporated by recognizing that the components of this model may in fact be functions of additional parameters, for instance, “C” may itself be a function of the time of year (greater quotas at different seasons) or geographic locations (certain areas with greater quotas based on the quality of the catch). The denominator “r” may also be a function of other parameters, including the type of vessel being used, the waters being fished and any other number of parameters that might affect the rate at which fish are caught.
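As a sketch only, the fishing-time model and the elaborations described above might be written in code as follows; the season and vessel-type parameters, and all quota and rate values, are hypothetical and chosen purely for illustration:

```python
# Simple fishing-duration model: t = C / r, with C and r elaborated as
# functions of other parameters. All numeric values are hypothetical.

def quota(season: str) -> float:
    """Number of fish the vessel sets out to catch, C, by season (illustrative)."""
    return {"spring": 800.0, "summer": 1200.0, "autumn": 1000.0, "winter": 600.0}[season]

def catch_rate(vessel_type: str) -> float:
    """Rate at which fish are caught, r, in fish per hour (illustrative)."""
    return {"trawler": 150.0, "longliner": 60.0}[vessel_type]

def fishing_time(season: str, vessel_type: str) -> float:
    """Time spent fishing, t = C / r, in hours."""
    return quota(season) / catch_rate(vessel_type)

print(fishing_time("summer", "trawler"))  # 1200 / 150 = 8.0 hours
```

The point is simply that each term of the base equation can itself be replaced by a function of further parameters without changing the structure of the model.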
In general, mathematical models can be highly variable in their complexity and, as a result, in the ease with which they can be solved. The simple example illustrated above is reasonably easy to solve; however, this situation can often be the exception rather than the rule. The ability to arrive at a solution to the mathematical model can be classified into the following situations:
If an analytical solution exists and is reasonably easy to obtain, then the analytical solution should be pursued. However, when this is not possible, as in the last two scenarios above, then the model needs to be analysed by means of simulation, and we then refer to the model as a simulation model. Simulation models can be classified into three categories, discussed below.
Static and dynamic models can be differentiated along the lines of their treatment of time. Static simulation models attempt to characterize the behaviour of a system at a particular and fixed moment in time. Dynamic models, on the other hand, represent a system as it changes over the course of time. An example of a static model might be an equation that estimates the level of contamination in water given the current surrounding and environmental conditions. A model that predicts the changes in the level of contamination on a day-to-day basis would be an example of a dynamic model.
Continuous and discrete models are differentiated as a result of the discrete or continuous nature of the systems they attempt to describe. Very simply, discrete systems can be viewed as systems in which the variables describing the system change instantaneously at a point in time (the objects might be discrete units, and as a result, if a unit is removed the system as a whole changes instantly). A continuous system is one in which the changes in the system occur continuously. An alternative visualization of the difference between a continuous and discrete system is shown in Figure 2.1.
Comparison between a continuous and discrete system
Deterministic and stochastic models can be differentiated along the lines of their treatment of randomness and probability. Deterministic models do not include any form of randomness or probability in their characterization of a system. In a deterministic model, regardless of its complexity, the outputs are determined once the inputs have been defined. Conversely, stochastic models include components of randomness within their definition; as a result, the outputs are in fact estimates of the true system. Stochastic models tend to be a better representation of natural systems, given the randomness inherent in nature itself. It would be unlikely that a model to describe a natural system could be deterministic. (However, theoretically when details down to the genetic level can be perfectly understood, it is feasible.)
In many papers and reports related to quantitative risk assessment, deterministic and stochastic models have been differentiated on the basis that one uses single point-estimate values and the other uses ranges or statistical distributions of values. The true definition of a deterministic model is given in the prior paragraph; however, the use of point estimates in a model produces the impression that the system being modelled is deterministic. In essence, single point values input into the mathematical model produce single point value outputs that appear fully determined. Two approaches that have received substantial attention in the recent past, models using point estimates and stochastic models, are discussed in detail below.
Historically, many risk assessments have used point-estimates, single values such as the mean or maximum values of variable data sets, to generate a single numerical value for the risk estimate generated through a model. The most common concern raised against single value types of assessments is that they frequently use the extremes, or “worst-case” of the risk situation, without regard for how likely these extremes are to occur. Alternatively, if the “average” risk is calculated based on mean values, the extremes are disregarded, which may be important since they could represent highly susceptible subpopulations or infrequent, but severe circumstances.
The alternative to using point estimates is to consider the stochastic approach. The stochastic approach constructs risk assessments that incorporate the variability inherent in the system itself as well as the uncertainty in the input parameters. This is accomplished by utilizing probability and uncertainty distributions. It is important to recognize that there is a difference between uncertainty and variability, which is addressed further on.
To illustrate the implications of the point estimate and stochastic approach to risk assessment, the following is a hypothetical scenario for an exposure assessment that estimates the dose of a pathogen ingested by the consumer based on the concentration of the pathogen in a seafood product at harvest, and accounts for growth and inactivation prior to consumption. This is a simplified example to contrast the point estimate and stochastic approach with only a few input parameters considered: the concentration of the pathogen in the raw product at harvest, growth during initial transportation prior to refrigeration, die-off during secondary transportation as a result of freezing, and inactivation as a result of cooking. Figure 2.2 graphically represents the model and Table 2.1 summarizes the parameter values.
Schematic of the simplified model
Parameter values used in simplified model
| Parameter | Minimum | Mean / most likely | Maximum | Units |
|---|---|---|---|---|
| Concentration in seafood product | 0.5 | 2.0 | 3.5 | Log CFU/g |
| Growth prior to freezing | 0.0 | 1.7 | 4.0 | Log |
| Die-off during frozen storage | 0.0 | 0.7 | 1.5 | Log |
| Inactivation during cooking | 1.0 | 2.8 | 4.0 | Log |
| Amount of product consumed | 100 | 150 | 200 | grams |
In this example, the seafood product is assumed to have a pathogen concentration between a minimum of 0.5 log CFU/g and a maximum of 3.5 log CFU/g, with a mean concentration of 2.0 log CFU/g. The catch is stored on board the shipping vessel, but due to potential delays and inefficient refrigeration, it is assumed that growth can occur at this point. Growth between 0 and 4 log is assumed to occur as a result of these delays and inefficiencies. The product is subsequently frozen and this results in a 0 to 1.5 log die-off. Finally, the product is cooked prior to consumption, and cooking results in a 1 to 4 log reduction with the consumer eating between 100 and 200 gram portions of the product.
If we were to use a point estimate approach, we could use the mean values of the input variables, such as 2.0 log CFU/g for the concentration and 2.8 log inactivation during cooking. These point estimates are then used to calculate the “best estimate” for the number of organisms ingested by the consumer:
Dose ingested = 10^(2.0 + 1.7 − 0.7 − 2.8) × 150 ≈ 238 cells
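This best-estimate calculation can be sketched directly in code, using the central values from Table 2.1 (the microbial terms are on the log10 scale):

```python
# Point-estimate ("best estimate") dose calculation.
concentration = 2.0    # log CFU/g at harvest
growth        = 1.7    # log increase prior to freezing
die_off       = 0.7    # log decrease during frozen storage
cooking       = 2.8    # log decrease during cooking
portion       = 150.0  # grams consumed

log_conc_at_consumption = concentration + growth - die_off - cooking  # = 0.2
dose = 10 ** log_conc_at_consumption * portion
print(round(dose))  # about 238 cells
```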
The analysis can be taken further by using the maximum and minimum point estimates of each of the variables to calculate possible outcomes based on different combinations of concentration, reductions due to cooking, and consumed amounts. However, as the model gets more complex the number of possible combinations increases dramatically. In the simple example shown above there are 243 different possible combinations that could be generated, calculated as:
No. of scenarios = (No. of point estimates per variable)^(No. of variables) = 3^5 = 243
It is unlikely that all the different possibilities would be evaluated when the point estimate approach is used; rather, it would suffice to calculate the bounds and know that all the values between these bounds are possible. In this example, the maximum possible dose that might be ingested occurs when the concentration of the pathogen is at a maximum, the growth that occurs prior to freezing is at a maximum, the die-off during freezing is at a minimum, the inactivation during cooking is at a minimum and the largest portion sizes are consumed. Using these values we estimate ingestion of approximately 6.3e8 cells as the “worst case” estimate. It is important to note that in a reasonably straightforward example such as this one, it is possible to determine the combination that leads to a “worst-case” scenario; however, this becomes increasingly difficult as the model becomes more complex.
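The full enumeration of the 243 combinations, and the bounds they imply, can be sketched as follows using the three point estimates per variable from Table 2.1:

```python
from itertools import product

# Enumerate all point-estimate combinations: 3 values per variable,
# 5 variables -> 3**5 = 243 scenarios.
concentration = [0.5, 2.0, 3.5]   # log CFU/g
growth        = [0.0, 1.7, 4.0]   # log
die_off       = [0.0, 0.7, 1.5]   # log
cooking       = [1.0, 2.8, 4.0]   # log
portion       = [100, 150, 200]   # grams

doses = [10 ** (c + g - d - k) * m
         for c, g, d, k, m in product(concentration, growth, die_off, cooking, portion)]

print(len(doses))            # 243 scenarios
print(f"{max(doses):.2g}")   # worst case: ~6.3e+08 cells
print(f"{min(doses):.2g}")   # best case
```

The worst case corresponds, as described above, to maximum concentration and growth combined with minimum die-off and cooking inactivation and the largest portion: 10^(3.5 + 4.0 − 0.0 − 1.0) × 200 ≈ 6.3e8 cells.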
Often when a point estimate approach is used, the tendency has been to take a “conservative” approach, meaning a margin of safety has tended to be incorporated so that errors occur on the side of safety. However, this practice can become a contentious issue, especially when debating how much conservatism is enough and the impact of propagating conservatism through the model, which can result in truly unrealistic estimates. Conservative point estimates tend to reduce the credibility of the assessment and essentially result in risk management decisions based not on scientific realities or all the information, but rather on regulatory guidelines or the assessor’s conservatism. The effect of the conservative point estimate on the results of a risk assessment has been succinctly stated by Burmaster (1996) as follows:
Another drawback of the point estimate approach is that the likelihood or probability of a point estimate risk actually occurring is ignored. All values between the minimum and maximum points are regarded as equally likely to occur. In reality, however, some values within the interval are more likely to occur than others. Using the illustration above, while it may be true that consumers eat between 100 and 200 grams, it is more likely that only a small proportion of people eat the extreme amounts and that the true consumption pattern follows some statistical distribution (the normal distribution might be an example). This is an example of where probabilistic techniques can be applied to provide more accurate estimates, giving the risk manager more information and representing reality better than point estimates, without propagating conservative values through the model.
Probabilistic/stochastic assessments represent all the information available for each parameter, described as a distribution of possible values. The distribution used to describe a data set is dependent on the amount of data available and knowledge about the nature of the phenomenon. For the example described, distributions can be used to replace the point estimate representations (Figure 2.3).
Distribution values used in stochastic example model
It should be noted that these distributions are only illustrative and are not necessarily the most appropriate for describing the variables listed. The concentration of the pathogen in the food is represented by a normal distribution centred at 2.0 log CFU/g, indicating that this value will be the most frequently occurring concentration found in the food. Triangular distributions are used to describe some of the other parameters, assuming that the amount of information available here is limited (minimum, most likely, maximum).
The outcome of the probabilistic analysis is a distribution of possible ingested doses. The results are shown in Figures 2.4A and 2.4B. For comparison, the bounds generated using point estimates are also shown.
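A minimal Monte Carlo evaluation of this stochastic version can be sketched with Python's standard library. Note that only the centre of the concentration distribution (2.0 log CFU/g) is stated above, so the standard deviation of 0.5 used here is an assumption, and the remaining inputs are treated as triangular distributions with the minimum, most likely and maximum values from Table 2.1:

```python
import random
import statistics

random.seed(1)  # fixed seed for a reproducible illustration

def one_iteration() -> float:
    """One Monte Carlo iteration of the ingested-dose model (dose in cells)."""
    conc    = random.gauss(2.0, 0.5)            # log CFU/g (SD is an assumption)
    growth  = random.triangular(0.0, 4.0, 1.7)  # log increase prior to freezing
    die_off = random.triangular(0.0, 1.5, 0.7)  # log decrease during frozen storage
    cooking = random.triangular(1.0, 4.0, 2.8)  # log decrease during cooking
    portion = random.triangular(100, 200, 150)  # grams consumed
    return 10 ** (conc + growth - die_off - cooking) * portion

doses = [one_iteration() for _ in range(10_000)]
print(f"median dose: {statistics.median(doses):.0f} cells")
print(f"95th percentile: {sorted(doses)[int(0.95 * len(doses))]:.0f} cells")
```

The output is a distribution of ingested doses rather than a single number, which is what allows the likelihood of extreme scenarios to be assessed.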
FIGURE 2.4A and 2.4B
Probabilistic analysis results for example model
It can be seen that using point estimates to set the bounds does not provide as much information as may be necessary to make the best management decisions. In comparison with the distribution of likely exposures in Figure 2.4A, it is readily evident that when a “worst-case” scenario is used to derive a point estimate of maximum ingested dose, the estimated value is high (8.8 logs, or 6.3e8 cells) but this would occur very rarely. In Figure 2.4B, the x-axis is plotted on a non-log scale. Using this scale, the very low likelihood of the “worst-case” scenario occurring becomes even more evident, especially when one considers that the point labelled “Max” on the figure would in reality be located far off the page to the right.
If the risk manager were presented with this worst-case scenario, without any indication of how likely it is that the event will occur, the risk manager may inappropriately allocate valuable resources to reduce an event that rarely occurs. It should be kept in mind, however, that if the outcome of ingestion of this particular pathogen is severe, it might be an appropriate management decision to ensure that this adverse outcome, however unlikely, is prevented.
A mathematical description of the production and consumption of a food using probability distributions is very difficult to calculate analytically. While some analysis is practical on very small and simple models, a compound model of food production involving pathogen growth, destruction and infection is often too complex to solve analytically. As described previously, this is a situation in which it becomes necessary to employ simulation methods in order to solve the mathematical models. Monte Carlo analysis is a mathematical simulation tool well suited to solving stochastic simulation models. The method has become even more attractive in the past 10 to 15 years with the availability of reasonably cheap computer processing power.
The mathematician Stanislaw Ulam (1909−1984) is the person most credited or associated with the development of Monte Carlo simulation. Ulam and John von Neumann, at the Hydrogen Bomb Super Conference in 1946, realized the potential application of the method to simulating the probabilistic problem concerned with random neutron diffusion in fissile materials (Rugen and Callahan, 1996). Despite their initial development and application in the late 1940s, Monte Carlo methods were largely ignored in the risk assessment arena until quite recently. The method was viewed negatively after the 1950s as a result of being used to attempt to solve all types of mathematical and physical problems, without regard to its suitability to efficiently solve some problems and not others (Moore, 1996). The Monte Carlo method has since seen applications ranging from science to economics and from engineering to insurance. Monte Carlo simulation is very computationally intensive, therefore the availability of powerful desktop computers has also served to increase its use in recent times.
Monte Carlo analysis as it applies to risk assessment is a relatively straightforward procedure. The method can be applied to models already developed using a deterministic method by replacing the point estimates with probability distributions. Monte Carlo simulation involves randomly sampling each probability distribution within the model hundreds or even thousands of times, producing a new scenario at each iteration. In essence, a new “point-estimate” is generated at each iteration for each parameter within the model and the result recorded. The process is then repeated until each individual probability distribution has been sufficiently recreated.
Monte Carlo simulation of a triangular distribution (Triangular [1,4,8])
Figure 2.5 shows how a triangular distribution with a minimum value of 1, maximum of 8 and most likely value of 4 is recreated as the iterations in the simulation proceed. At the first iteration, only one value has been selected, which is comparable to a simple point estimate randomly selected between the limits of the distribution. After 5 iterations, the distribution still appears to be random point estimates between the limits. However, after 100 iterations it can be seen that values around the most likely value, 4, have been selected more frequently than those at the extremes. Finally, after 5 000 iterations the triangular distribution can be observed to be sufficiently recreated, with the majority of the samples selected around the most likely value and values towards the extremes sampled with decreasing frequency.
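This recreation process can be sketched with Python's standard library; binning the samples into unit-wide intervals is an illustrative choice, made only so the emerging shape can be printed as counts:

```python
import random

random.seed(42)  # fixed seed for a reproducible illustration

# Recreating a Triangular(min=1, most likely=4, max=8) distribution by
# repeated sampling, as in Figure 2.5: with few iterations the samples look
# like arbitrary point estimates; with many, the triangular shape emerges.
def sample_counts(iterations: int) -> dict[int, int]:
    """Histogram of triangular samples bucketed into unit-wide bins."""
    counts: dict[int, int] = {}
    for _ in range(iterations):
        value = random.triangular(1, 8, 4)  # (low, high, mode)
        bin_ = int(value)
        counts[bin_] = counts.get(bin_, 0) + 1
    return counts

for n in (1, 5, 100, 5000):
    print(n, dict(sorted(sample_counts(n).items())))
```

After 5 000 iterations, the bins around the most likely value of 4 dominate, while the bins near the extremes of 1 and 8 receive comparatively few samples.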
In this example only one parameter is shown; depending on the complexity of the model, there could be several distributions sampled at each iteration, with the model re-evaluated after each iteration and the results stored. The sampling of the input distributions in this way and the subsequent evaluation of the model then generates a distribution for the output of interest. The output represents a result that encompasses most of the possible combinations for the inputs.
An important characteristic of Monte Carlo analysis, in addition to the repetitive sampling, is the selection of samples at every iteration based on the defined probability distribution. Thus, based on the parameters of a distribution, some values are selected much more frequently than others, which reflects real world events much more faithfully. An analogy of this could be the height of people in the population; it is well known that there is a range of heights in the general population. However, the majority of people fall within a much smaller range, around a mean of, say, 5 ft 10 in. If we had the resources to perform a “point-estimate” simulation that evaluated all the possibilities that existed, we could select thousands of samples from between the maximum and minimum heights observed, evaluate our model thousands of times and tally the results. This method, however, would be making the assumption that it was just as likely to find a 7 ft individual as a 6 ft individual. Monte Carlo analysis selects thousands of samples as well; however, based on the input distribution, 6 ft individuals are selected much more frequently than 7 ft individuals. The model that these samples are fed into thus generates estimates that reflect a much more realistic scenario.
The simulation of a model using Monte Carlo analysis allows the model to be used for more than just the estimation of risk. By conducting a sensitivity analysis on the model, the parameters or variables within the model that influence the outcome can be determined. This analysis can serve to focus analysis, research, management or modelling efforts. Variables that have a significant impact on the output being investigated should be addressed, first by the assessor who may have made unacceptable simplifications, and then by the manager who should implement action if feasible. Alternatively, the identification of variables that do not influence the output allows resources to be spent on more important, immediate concerns.
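One common way to perform such a sensitivity analysis is to compute the rank (Spearman) correlation between each sampled input and the model output; inputs with large absolute correlations drive the result. The sketch below applies this to the earlier ingested-dose example (the normal distribution's standard deviation of 0.5 is again an assumption, and the hand-rolled Spearman implementation ignores ties, which is adequate for continuous inputs):

```python
import random

random.seed(7)  # fixed seed for a reproducible illustration

def ranks(values):
    """Rank of each value (1 = smallest); ties not specially handled."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

inputs = {"conc": [], "growth": [], "die_off": [], "cooking": [], "portion": []}
doses = []
for _ in range(5000):
    s = {"conc": random.gauss(2.0, 0.5),
         "growth": random.triangular(0.0, 4.0, 1.7),
         "die_off": random.triangular(0.0, 1.5, 0.7),
         "cooking": random.triangular(1.0, 4.0, 2.8),
         "portion": random.triangular(100, 200, 150)}
    for name, value in s.items():
        inputs[name].append(value)
    doses.append(10 ** (s["conc"] + s["growth"] - s["die_off"] - s["cooking"]) * s["portion"])

for name, values in inputs.items():
    print(f"{name}: {spearman(values, doses):+.2f}")
```

In this particular model, growth prior to freezing spans the widest range on the log scale, so it tends to show the strongest correlation with the dose, while portion size, which varies by only a factor of two, contributes comparatively little.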
Inherent in the stochastic approach is the concept of uncertainty and variability. As described earlier, the point estimate approach tends to ignore the existence of uncertainty and variability. Uncertainty and variability both characterize the existence of a range of values; however, the cause of the existence of the range is different, as described below. In addition, the ramifications of the two are also different.
Probabilistic analysis attempts to characterize the variations that are inherent in most parameters related to time, space or populations. In conducting a risk assessment that describes the variations that occur in a parameter, recognition has to be given to the existence of variations as a result of two distinct phenomena: uncertainty and variability.
Variability is essentially a property of nature, a result of the natural random processes. It represents diversity in a well-characterized population or parameter. Variability cannot be reduced through further study or additional measurements. An example of this could be the amounts of food people eat. Conducting surveys on eating habits provides information on how much food people eat; however, some people will always eat more or less than others regardless of how much data we collect on them.
Uncertainty is a property of the risk assessor. It results from the lack of knowledge about a phenomenon or parameter and the inability to characterize it. Uncertainty can, in many cases, be reduced through further measurement or study. An illustration of this could again be drawn from the description of how much food people eat. With little information available, perhaps a minimum and maximum amount of food consumed could be estimated. By conducting additional research, the amount of food people eat and how frequently different amounts are consumed could be determined.
Uncertainty and variability have different ramifications in the results of a risk assessment and the risk management decisions that are pursued. Uncertainty, as described, would imply the need to understand the phenomenon. Variability would initiate control of the phenomenon. If the output of interest is influenced by the uncertainty in a parameter, the management decision may be to focus more research or data collection activities to this parameter. The parameter could then be better characterized or understood, and the assessment re-done. If the variability in a parameter is the driving force, then controls placed on this parameter to potentially reduce the variability may be the favoured decision.
To illustrate the difference in decision-making, let us assume that a risk assessment model is generated. The model looks at the concentration of a pathogen in a food product at the processor, simulates the transportation of the product to retail, evaluates the cooking and eating preferences at home and arrives at an estimate of the risk. The model estimates that the risk to consumers from the pathogenic organism is influenced by the concentration of the pathogen on the raw product. Unfortunately, the concentration on the product is highly uncertain; while most studies have shown the presence of the organism in the product, there has been no quantification of how much of the organism is present. As a result, the risk assessor has had to describe the quantity with a uniform distribution that implies the concentration could be anywhere between 1 log CFU/g to 6 log CFU/g. In this situation, the most appropriate management decision may be to determine the true concentration in the product by commissioning research, or the collection of more data.
An alternate scenario might be a situation where there have been extensive surveys conducted on the temperature of the product during refrigerated transportation (temperature is well characterized), but the data shows there is variability in the temperature because of poor refrigerated transportation protocols. If it is then determined in the assessment that the risk to the consumer is most influenced by the growth of the pathogen as a result of temperatures experienced during transportation from the processor to retail, the most appropriate management decision in this case would be to implement controls on the transportation of the product to reduce the variability, preventing the temperature from reaching dangerous levels.
As a result of these two very different implications, risk assessors and researchers have been pushing for the separate treatment of variability and uncertainty. Most probabilistic assessments use probability distributions in which the uncertainty and variability are combined. The separation of variability and uncertainty in risk assessment is a computationally expensive undertaking, especially as the complexity of the model increases. The simulation techniques to separate out variability and uncertainty in a Monte Carlo analysis are beyond the scope of this section. It may be appropriate, however, to illustrate a simple example of the difference between the variability and uncertainty associated with a parameter.
Variability, as defined earlier, relates to the diversity in a well-characterized parameter. As an example, we may want to describe the variability in the concentration of an organism in a medium. If we assume excellent knowledge about the concentration, perhaps this quantity can be described with a normal distribution with a mean of 3 and a standard deviation of 2. This describes the variability in the concentration of the organism that we may find at any period in time. Since organisms can grow and die, if we were to take a measurement at any one point in time, we could expect to get a slightly different value, hence the variability in the concentration. However, if we are uncertain about the parameters of the distribution, due to imperfect knowledge about the parameter, we can incorporate our uncertainty by describing the parameters of the distribution with a distribution of their own. The mean could range from between 2 and 4, and the standard deviation could range from 1 to 3. Thus, we have uncertainty in the variability, and as a result, the parameter could take on any number of distributional forms between the extremes:
Variability only:

Conc. = Normal (MU, SD)

Variability and uncertainty:

Conc. = Normal (MU, SD)
MU = Triangular (min, most likely, max)
SD = Triangular (min, most likely, max)
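This separation can be sketched as a second-order Monte Carlo simulation: an outer loop samples the uncertain distribution parameters, and an inner loop samples the variability given those parameters. The most likely values of 3 and 2 used for the triangular distributions below are assumptions, since only the ranges 2−4 and 1−3 are stated above:

```python
import random
import statistics

random.seed(3)  # fixed seed for a reproducible illustration

# Second-order Monte Carlo: outer loop = uncertainty about the distribution
# parameters; inner loop = variability in concentration given those
# parameters. Each outer iteration yields one plausible variability
# distribution, comparable to the family of curves in Figure 2.6.
summaries = []
for _ in range(50):                                     # uncertainty loop
    mu = random.triangular(2.0, 4.0, 3.0)               # uncertain mean (log CFU/g)
    sd = random.triangular(1.0, 3.0, 2.0)               # uncertain SD (log CFU/g)
    conc = [random.gauss(mu, sd) for _ in range(1000)]  # variability loop
    summaries.append((mu, sd, statistics.mean(conc)))

for mu, sd, sample_mean in summaries[:3]:               # a few alternative curves
    print(f"mu={mu:.2f} sd={sd:.2f} -> sample mean {sample_mean:.2f}")
```

Reducing the uncertainty corresponds to narrowing the ranges sampled in the outer loop; in the limit, only the inner variability loop remains.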
Figure 2.6 shows the variability in a parameter and the associated uncertainty. The solid black line represents the distribution with only variability present, the light grey lines reflect the uncertainty in the distribution, either through the mean or the standard deviation. Figure 2.6A illustrates the original level of uncertainty that may exist in describing the distribution for a parameter. As more information becomes available, either through research or data collection activities, the description of the parameter begins to approach Figures 2.6B and 2.6C. In Figure 2.6C, the uncertainty, or possible range in the mean value and standard deviation has been reduced by ¾ of a log CFU/gram at both extremes. In this situation, the solid line represents the limiting case. In other words, if we removed all the uncertainty that related to the parameter then we would be left with only variability. If the variation at that point was unacceptable according to the assessment, steps could be taken to reduce the spread of the distribution in the form of controls or interventions.
Changing the uncertainty in a parameter