Enhanced Statistics
Overview
In 2009, a survey was conducted in the DSpace community to find the most important feature requests for the next major revision of DSpace. One hundred responses later, enhanced statistics reporting emerged as the clear leader. The statistics that DSpace provided were always quite basic and, recognising this, we have always offered a Google Analytics report alongside the repository with Open Repository. Whilst Google Analytics provides ways of visualising the activity on a repository that surpass the reporting capability of any standard repository instance, there have historically been a number of drawbacks to this approach:
1) It does not capture file/bitstream downloads
2) It does not understand the structure of the repository (i.e. it cannot give an overview of activity within a collection or community)
3) It does not allow you to see reports within the context of the repository, or otherwise make the reports public
Whilst we were able to mostly resolve the first problem by providing specific hooks to record clicks on the bitstream view/open links, the other two problems (understanding the repository structure, and reporting within the repository) could not be addressed for a long time, as there was no way of retrieving the usage data from the analytics account. This changed last year, with the release of the Google Analytics API. With DSpace 1.6 due to be released this month with a new statistics reporting module, we also gained a framework with which to enhance the statistics capabilities of Open Repository. In doing so, we have learnt more about how Google Analytics works, allowing us to enhance the data that is captured.
Item Statistics
Underlying everything, we have the statistics for each item. The link to the statistics for an item is shown in the row of buttons alongside 'show full item record', 'recommend this item', etc. At the top of the item statistics is a count of the number of times the item has been viewed - this only relates to the metadata pages (the simple and full item records). If any files have been uploaded to the item, then the download counts for these are displayed underneath. Note that if a file has not been downloaded at all, then its name will not show in the table. These values are totals for the whole time that the item has been in the repository. Next, we have the recent accesses, broken down by month, over the previous six months. The values shown are the item views and the cumulative total of all file downloads (if there are multiple files for this item, then it will be a single number that is the total of downloads for all of them). Finally, there is a breakdown of the combined item views and file downloads by location - the top 10 countries and cities. Like the item views and file downloads at the top of the page, these numbers relate to the entire time the item has been in the repository.
See an example of statistics for an item
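The way the item page is put together can be illustrated with a minimal sketch. This is not the Open Repository implementation: the `summarise_item` function, the `(kind, file_name, month)` event tuples, and the `recent_months` parameter are all hypothetical names chosen for illustration, and it assumes the usage data has already been retrieved as a simple list of events.

```python
from collections import Counter


def summarise_item(events, recent_months):
    """Summarise usage for one item (illustrative only).

    events: iterable of (kind, file_name, month) tuples, where kind is
            "view" or "download", file_name is None for views, and
            month is a "YYYY-MM" string. This shape is an assumption.
    recent_months: the months to include in the recent breakdown,
            e.g. the previous six months.
    """
    total_views = 0
    # A Counter only holds files that were downloaded at least once,
    # so files with no downloads never appear in the table.
    downloads_by_file = Counter()
    monthly = {m: {"views": 0, "downloads": 0} for m in recent_months}

    for kind, file_name, month in events:
        if kind == "view":
            total_views += 1
            if month in monthly:
                monthly[month]["views"] += 1
        elif kind == "download":
            downloads_by_file[file_name] += 1
            if month in monthly:
                # One cumulative number per month across all files.
                monthly[month]["downloads"] += 1

    return {
        "views": total_views,
        "downloads_by_file": dict(downloads_by_file),
        "monthly": monthly,
    }
```

The all-time totals count every event, while the monthly breakdown only counts events that fall inside the requested months, mirroring the two sections of the page described above.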
Collection / Community Statistics
Statistics are also available at collection and community levels. On the home page of the collection or community, a link to 'view statistics' is placed at the bottom of the page, below the latest submissions. Like the item statistics, this starts with a breakdown of the community or collection views for the entire time of the repository. The views for the community or collection itself relate to its home page, and any browse pages scoped to just the items within that community or collection. The 'All Items' entry is a total of all the views of metadata (simple and full record) pages for items that are in that community or collection. Similarly, 'All downloads' is a total of all downloads of files that are attached to items within that community or collection. Below that, the recent accesses are shown, broken down by month, over the previous six months. Like the table above, this is broken down into community or collection views, and totals for item metadata views and file downloads. Next are the top item views (based on the metadata pages) and the top file downloads for the contents of that community or collection. The top items table provides the name of the item, a link to the item page, and a link to the statistics for that item. The top downloads table shows the file name, the name of (and a link to) the item that it is part of, and a link to the item's statistics page. Whilst only the top 10 entries are provided in each table, a link to 'view all' is displayed underneath it. Finally, there is a breakdown of activity by location - this also covers the entire time of the repository. However, it only shows the views of the community or collection pages themselves - not the items or file downloads within.
See an example of statistics for a collection
Repository Statistics
As with communities and collections, statistics can be displayed covering the repository as a whole. The presentation is very similar. This time, the breakdown of views is based on 'All Communities' (community home and browse pages), 'All Collections' (collection home and browse pages), 'All Items' (simple and full metadata records), and 'All downloads' (accesses of the files attached to items). The top country and city lists are based on all activity across the repository - browse pages, search pages, communities, collections, items, and file downloads.
Visualization
Long lists of names and numbers can be overwhelming. To aid with understanding the statistics, we have included a number of visualizations. The total views and downloads for items, collections, communities, and files at the top of each page are represented by a bar chart. Additionally, when you 'view all' the top items or top downloads for a collection, a bar chart is presented alongside the table. A line chart depicts the activity over the most recent six months, showing community or collection views as appropriate, as well as item views and file downloads. Finally, alongside the top country and top city lists, an interactive map highlights the countries with the most accesses.
Most Accessed Items
A common request is to display the most accessed items for the repository. As part of the new statistics package, we are able to show a list of the most accessed items at the overall repository level, and specific to each community or collection. The number of accesses can be determined by the number of views of the metadata pages, or the number of file downloads (total for an item), or a combination of both metadata views and file downloads.
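The three ways of counting accesses can be sketched as a small ranking function. This is an illustration under stated assumptions, not the actual Open Repository code: the `most_accessed` function, the `metric` names, and the per-item `views` and `downloads` fields are hypothetical.

```python
def most_accessed(items, metric="combined", limit=10):
    """Return the most accessed items, highest first (illustrative only).

    items: list of dicts with "views" (metadata page views) and
           "downloads" (total file downloads for the item). This record
           shape is an assumption made for the example.
    metric: "views", "downloads", or "combined" (views + downloads).
    """
    scorers = {
        "views": lambda item: item["views"],
        "downloads": lambda item: item["downloads"],
        "combined": lambda item: item["views"] + item["downloads"],
    }
    if metric not in scorers:
        raise ValueError(f"unknown metric: {metric!r}")
    # sorted() is stable, so items with equal scores keep their input order.
    return sorted(items, key=scorers[metric], reverse=True)[:limit]
```

To produce a list for a single community or collection rather than the whole repository, the same function could be called with only the items that belong to it.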
Security
In the introduction, we highlighted that one of the drawbacks of Google Analytics was that it was not possible to have public access to the report data, and that these enhancements allowed us not only to understand the usage within the context of the repository, but also to make this data available to any visitor to the repository. But allowing public access to the statistics is not always desirable. For repositories where it is not, it is possible to restrict access to the statistics data to just registered users, or to specific groups set up within the repository.
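The three access levels described here (public, registered users only, specific groups only) could be modelled along the following lines. This is only a sketch: the `can_view_statistics` function, the `policy` dictionary, and its `mode` values are hypothetical and do not reflect how the restriction is actually configured in DSpace or Open Repository.

```python
def can_view_statistics(policy, user=None):
    """Decide whether a visitor may see the statistics (illustrative only).

    policy: dict with "mode" set to "public", "registered", or "groups",
            plus a "groups" set of permitted group names when mode is
            "groups". These keys are assumptions for the example.
    user:   None for an anonymous visitor, otherwise a dict with a
            "groups" set listing the groups the user belongs to.
    """
    mode = policy["mode"]
    if mode == "public":
        return True
    if user is None:
        # Both restricted modes require a logged-in user.
        return False
    if mode == "registered":
        return True
    if mode == "groups":
        # Allowed if the user is in at least one permitted group.
        return bool(policy.get("groups", set()) & user.get("groups", set()))
    raise ValueError(f"unknown mode: {mode!r}")
```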
Enhanced Logging
With a greater understanding of the Google Analytics logging mechanism, we have been able to improve the range and accuracy of data that is captured in the Google Analytics reports. File downloads are now tracked accurately in all circumstances, even when direct URLs to the files are shared between users. Additionally, through the event logging mechanism, we are now able to show how much bandwidth a repository is using to serve the incoming requests.






