
9.3 Web usage analysis

Guideline

Web usage analysis is provided by CIOK. In this context it refers to the use of FAO-owned Web sites by the global Web user community.

CIO monitors the Web site, generates automated reports from the Web server logs, and maintains and configures Google Analytics profiles.

Further, CIO can help you obtain meaningful and actionable information from those reports, and can suggest which other reports might be helpful.

To request statistics for your site, including specific directories or any special requirements your site may have (e.g. aliases, redirects and multiple locations), contact: web-stats@fao.org.

Statistics on visits, visitors, top pages etc. can help you understand what visitors do on your Web site, help you spot its strengths and weaknesses, and take any needed action to improve the site, information and service for your visitors.

From traffic analysis you can also obtain data on the types of computer and browser your visitors used to access your site.

Tools

In FAO there are currently two main tools used to produce Web access statistics: AWSTATS and Google Analytics.

  1. AWSTATS

    AWSTATS is a free, open source Web server log analysis tool written in Perl. AWSTATS runs daily on machines in CIO and provides static reports in HTML. Because AWSTATS analyses Web server logs, it can detect hits on page elements other than HTML, such as JavaScript and GIF files. It can also count PDF accesses, crawler accesses and failed page accesses (such as 404 Not Found or 500 Internal Server Error). In addition, you can go back and reprocess old Web server logs if you wish to do more detailed historical analysis.

  2. Google Analytics

    Google Analytics is a free tool provided by Google, which uses client-side JavaScript to record Web site access. It does not require any software to be installed locally; however, every page that is to be counted by Google Analytics must include the Google Analytics JavaScript. This is usually fairly easy to do if the site is driven by a CMS and uses templates.
    Google Analytics provides a rich Web interface to access and explore the statistics. You can set email alerts that are triggered by thresholds of Web activity, and you can have PDF reports emailed to you automatically.
    Geolocation is particularly good compared to AWSTATS, and single reports can be generated for access across multiple domains relatively easily.
    There are, however, some disadvantages to Google Analytics. It only starts analyzing Web usage from the day you install and configure it on your site, so you cannot use it for historical analysis.
    Likewise, it only counts HTML page views, and only on pages that include the JavaScript. Access by any client that does not support JavaScript will not be counted.
    Also, PDF downloads are not usually counted. It is possible to count clicks on links that lead to PDF downloads, but this will usually undercount PDF access, because some downloads come from links on pages that do not carry the Google Analytics JavaScript (for example, search engine results pages).
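
To illustrate why a log-based tool can see PDF downloads and failed requests that a JavaScript-based tool misses, here is a minimal Python sketch that counts both from Web server log lines. It is not how AWSTATS itself works internally. It assumes the logs use the Apache Common or Combined Log Format, and the function name `summarize` and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# Matches the start of a Common/Combined Log Format line (assumed format).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def summarize(lines):
    """Count successful PDF requests and failed requests by status code."""
    pdf_hits = 0
    errors = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        status = int(match.group("status"))
        path = match.group("path").split("?", 1)[0]  # drop any query string
        if status >= 400:
            errors[status] += 1
        elif path.lower().endswith(".pdf"):
            pdf_hits += 1
    return pdf_hits, errors

# Made-up sample lines, for illustration only.
SAMPLE = [
    '203.0.113.5 - - [10/Oct/2011:13:55:36 +0000] "GET /docs/report.pdf HTTP/1.1" 200 51234',
    '203.0.113.5 - - [10/Oct/2011:13:55:40 +0000] "GET /missing.html HTTP/1.1" 404 312',
    '198.51.100.7 - - [10/Oct/2011:13:56:02 +0000] "GET /index.html HTTP/1.1" 200 1043',
]

pdf_hits, errors = summarize(SAMPLE)
print(pdf_hits, dict(errors))
```

Because the server records every request it receives, the PDF download is counted regardless of which page linked to it.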

Automated crawlers

Web sites are visited both by users with browsers and by automated software, called crawlers or spiders, which systematically downloads pages.

Typically, automated software is trying to harvest information about the site, and the largest amount of crawler traffic comes from search engines keeping their indexes up to date.

Generally speaking, Web traffic analysis tries to gather information about what real users are doing on the site. It can, however, also be important to know something about automated access to the site.

For example, if all or part of your site cannot be found in search engine results, it’s worth checking to see if the crawlers have visited all parts of your site that should be represented in search engine indexes.

It is also not always possible to eliminate all crawler activity from the statistics. In some cases the technology does not allow it: for FTP server statistics, for example, the FTP server does not record the kind of software visiting the site, because the FTP protocol has no means of conveying that information.

In other cases, crawlers may pretend to be users. They can occasionally be detected by a large amount of activity from a single IP address that does not involve downloading images. Other crawlers can be extremely subtle, behaving exactly like a user, and so are impossible to detect. These are, however, exceptions.
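
The detection heuristic mentioned above (many requests from one IP address, none of them for images) could be sketched as follows. This is an illustrative Python example only, not a description of how CIO filters crawlers. The function name `suspected_crawlers`, the list of image suffixes and the threshold of 100 requests are all assumptions chosen for the example.

```python
from collections import defaultdict

# Assumed set of image file suffixes; extend as needed.
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")

def suspected_crawlers(requests, min_requests=100):
    """Return IPs with many requests but no image downloads.

    `requests` is an iterable of (ip, path) pairs. `min_requests` is an
    arbitrary illustrative threshold, not a recommended value.
    """
    totals = defaultdict(int)
    images = defaultdict(int)
    for ip, path in requests:
        totals[ip] += 1
        if path.lower().endswith(IMAGE_SUFFIXES):
            images[ip] += 1
    return sorted(
        ip for ip, count in totals.items()
        if count >= min_requests and images[ip] == 0
    )

# Made-up traffic: one address fetches 150 pages and no images,
# another behaves like a browser and also fetches an image.
sample = [("192.0.2.10", f"/page{i}.html") for i in range(150)]
sample += [("198.51.100.7", "/index.html"), ("198.51.100.7", "/logo.gif")]
print(suspected_crawlers(sample))
```

A heuristic like this only catches the crude cases. A crawler that also downloads images, or spreads its requests over many addresses, would not be flagged.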

Separation of FAO and non-FAO usage

In general, Web statistics separate access by internal FAO users from access by external users, in order to give a better idea of how many non-FAO users access sites. FAO usage is distinguished from non-FAO usage by IP address.

Internal users include the regional offices, sub-regional offices, country offices and liaison offices (because they use the same 168.202.x.x IP addresses). Only a handful of small offices are not on the WAN; these come through the Internet with public addresses.
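
As a rough illustration of separating usage by IP address, the Python sketch below classifies visitor addresses using the standard `ipaddress` module. It assumes that 168.202.x.x corresponds to the network 168.202.0.0/16. The function names `is_internal` and `split_usage` and the sample addresses are invented for the example.

```python
import ipaddress

# 168.202.x.x from the text, written as a /16 network (an interpretation).
FAO_NETWORK = ipaddress.ip_network("168.202.0.0/16")

def is_internal(address):
    """Return True if the address falls inside the assumed FAO range."""
    return ipaddress.ip_address(address) in FAO_NETWORK

def split_usage(addresses):
    """Split visitor addresses into (internal, external) lists."""
    internal = [a for a in addresses if is_internal(a)]
    external = [a for a in addresses if not is_internal(a)]
    return internal, external

# Made-up visitor addresses, for illustration only.
visitors = ["168.202.10.5", "203.0.113.9", "168.202.255.1", "198.51.100.7"]
internal, external = split_usage(visitors)
print(internal)
print(external)
```

Under this approach, an office that reaches the site through the Internet with a public address would presumably be classified as external, since its address falls outside the range.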

References and resources

Web Traffic Analysis [internal]