<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Murray Lab &#187; Analysis</title>
	<atom:link href="http://genetics.uiowa.edu/category/analysis/feed/" rel="self" type="application/rss+xml" />
	<link>http://genetics.uiowa.edu</link>
	<description></description>
	<lastBuildDate>Fri, 20 Nov 2009 16:58:40 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Re-analyzing Analytics</title>
		<link>http://genetics.uiowa.edu/2009/09/16/re-analyzing-analytics/</link>
		<comments>http://genetics.uiowa.edu/2009/09/16/re-analyzing-analytics/#comments</comments>
		<pubDate>Wed, 16 Sep 2009 23:57:34 +0000</pubDate>
		<dc:creator>Rory O&#39;Connell</dc:creator>
				<category><![CDATA[Analysis]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[supplemental]]></category>
		<category><![CDATA[update]]></category>
		<category><![CDATA[website analysis]]></category>

		<guid isPermaLink="false">http://genetics.uiowa.edu/wp/2009/09/16/re-analyzing-analytics/</guid>
		<description><![CDATA[I expected something like the results I got from the Google Analytics numbers we have for the past couple of years, but I was really surprised by the almost 9:1 discrepancy.  It didn’t quite sit right, so after a bit of thought I decided to look at the results in a different way.
How Analytics [...]]]></description>
			<content:encoded><![CDATA[<p>I expected something like the results I got from the Google Analytics numbers we have for the past couple of years, but I was really surprised by the almost 9:1 discrepancy.  It didn’t quite sit right, so after a bit of thought I decided to look at the results in a different way.<span id="more-71"></span></p>
<h2>How Analytics works</h2>
<p>After signing up for the free Google Analytics service, you are given a piece of code to add to your website pages.  This code (anonymously) collects some information about each visitor, such as which browser they use, and sends it back to Google.  It also keeps track of usage statistics, such as the time spent on a single page.  The key point, however, is that the code has to be added to every web page you want to track.  I did not add the code to any of the internal Tools pages, since it wouldn’t make sense to lump that data in with general website traffic.</p>
<p>The code that Google provides will catch when someone goes from a web page to a downloadable file on the site, such as the many PDF files and supplements we host.  However, Analytics cannot catch traffic that goes directly to a downloadable file, because the tracking code only runs when one of our HTML pages is loaded, and a PDF opened straight from a search result never loads it.  For instance, if someone were to google ‘dmso in pcr’ and the Google result linked directly to one of our protocol PDFs that uses DMSO in a reaction, Analytics would not catch that.</p>
<p>However, it is possible to catch that type of traffic using the web server’s logs.  A web server writes a text file containing one line for every request for every element on the site (pages, images, PDFs and so on).  Since these files can get very large quickly, it’s standard practice to rotate them.  The server setup I was using previously, however, rotated out log files more than 7 days old by deleting them entirely.  I replaced that system a little over a month ago, but data from before then is lost.</p>
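<p>To make "one line per request" concrete, here is a minimal Python sketch that splits a single access-log line into its fields.  It assumes the server writes the widely used Apache/NCSA "combined" log format; that assumption, the sample line, and the <code>parse_line</code> helper are illustrative only and are not taken from our server.</p>

```python
import re

# Assumption: the server writes the Apache/NCSA "combined" log format.
# Positional groups: 1 client, 2 timestamp, 3 method, 4 path,
# 5 status, 6 bytes, 7 referrer, 8 user agent.
COMBINED = re.compile(
    r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+) "([^"]*)" "([^"]*)"'
)

FIELDS = ("client", "time", "method", "path", "status", "bytes", "referrer", "agent")


def parse_line(line):
    """Return a dict of fields for one access-log line, or None if it doesn't match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


# Hypothetical sample line: a PDF requested straight from a Google result.
sample = ('192.0.2.10 - - [15/Sep/2009:14:03:22 -0500] '
          '"GET /protocols/pcr.pdf HTTP/1.1" 200 48213 '
          '"http://www.google.com/search?q=dmso+in+pcr" "Mozilla/5.0"')
print(parse_line(sample)["path"])  # prints /protocols/pcr.pdf
```

<p>A line that doesn't fit the format (for example, a truncated write) returns None instead of raising, so a whole log file can be parsed without stopping on the odd bad line.</p>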
<h2>Digging into the heart</h2>
<p>This project started when I noticed that, after about 2 pm on the 15th, server traffic jumped by a very large amount.  Since this was directly after my presentation, I figured that was the cause, but I was still curious about where all of the traffic was going.  After looking at the Analytics site and seeing results that did not jibe with the server traffic, I made the connection I alluded to above: in certain conditions, Analytics simply cannot see the traffic.</p>
<p>To see how different the results might be from the Analytics data I used previously, I analyzed the access log files I have from the web server.  By themselves the files are mostly useless, but a powerful analysis program can break apart and group the data in access log files to generate reports from it.  Using the quite neat <a href="http://deep-software.com/download.asp">Deep Log Analyzer</a>, I took a look at just the previous week.</p>
<p><a href="http://genetics.uiowa.edu/wp-content/uploads/2009/09/web-analysis-1.PNG"><img class="alignnone size-medium wp-image-73" title="web analysis 1" src="http://genetics.uiowa.edu/wp-content/uploads/2009/09/web-analysis-1-300x210.PNG" alt="web analysis 1" width="300" height="210" /></a></p>
<p>Since the log file contains all site accesses, and I did not filter any results, there are more elements than in Analytics; for instance, all of the Tools pages are shown.  The top element is a small program that returns the results from the SDS tool, which gets accessed quite a bit.  Ignoring that and the other Tools, the trends still follow the Analytics results (n.b. I’m comparing one week of log data to 2 years of Analytics data, so…)</p>
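<p>At its core, a "top pages" report like the one above is a tally of requested paths.  The sketch below assumes combined-format log lines; the <code>top_paths</code> function and the sample paths (including the SDS results path) are hypothetical, and it is not meant to reproduce Deep Log Analyzer's own reports.</p>

```python
import re
from collections import Counter

# Assumption: combined log format, where the request field looks like
# "GET /some/path HTTP/1.1". Group 1 captures the requested path.
REQUEST = re.compile(r'"[A-Z]+ (\S+) [^"]*"')


def top_paths(lines, n=10):
    """Tally hits per requested path (query string dropped); return the n most common."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match:
            path = match.group(1).split("?", 1)[0]
            counts[path] += 1
    return counts.most_common(n)


# Hypothetical sample lines; a real run would pass the lines of a log file instead.
sample_log = [
    '192.0.2.1 - - [15/Sep/2009:14:00:01 -0500] "GET /tools/sds/results?id=7 HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.2 - - [15/Sep/2009:14:00:09 -0500] "GET /tools/sds/results?id=8 HTTP/1.1" 200 498 "-" "-"',
    '192.0.2.3 - - [15/Sep/2009:14:01:30 -0500] "GET /index.html HTTP/1.1" 200 7311 "-" "-"',
]
print(top_paths(sample_log, 2))  # [('/tools/sds/results', 2), ('/index.html', 1)]
```

<p>Dropping the query string is what lets a single results program rise to the top of the list: every call to it counts toward the same path, no matter which result was requested.</p>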
<p>Something I can track by analyzing the logs, but cannot with Analytics, is direct downloads that do not come from a link on the site.  This shows the total number of hits to all of the downloadable files in the past week:</p>
<p><a href="http://genetics.uiowa.edu/wp-content/uploads/2009/09/web-analysis-2.png"><img class="alignnone size-medium wp-image-74" title="web analysis 2" src="http://genetics.uiowa.edu/wp-content/uploads/2009/09/web-analysis-2-300x210.png" alt="web analysis 2" width="300" height="210" /></a></p>
<p>Our published Protocols clearly generate the most download traffic, traffic that Analytics did not catch at all, because these files were accessed directly from Google and not from a link on our pages.  This also shows much more traffic to our publication PDFs, another thing that wasn&#8217;t caught in Analytics.</p>
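<p>The "direct download" distinction comes down to the referrer field recorded in each log line.  Here is a minimal sketch, assuming combined-format logs: a request counts as a direct download when the path ends in a download extension and the referrer is not one of our own pages (a missing "-" referrer counts as direct, too).  The extension list, the sample lines, and the function name are hypothetical choices for illustration.</p>

```python
import re
from urllib.parse import urlparse

# Assumption: combined log format. Group 1 is the requested path, group 2 the referrer.
LINE = re.compile(r'"[A-Z]+ (\S+) [^"]*" \d{3} \S+ "([^"]*)"')

OWN_HOSTS = {"genetics.uiowa.edu"}
# Hypothetical set of extensions treated as "downloadable files".
DOWNLOAD_EXTS = (".pdf", ".doc", ".ppt", ".xls", ".zip")


def is_direct_download(line):
    """True when a download file was requested without coming from one of our pages."""
    match = LINE.search(line)
    if match is None:
        return False
    path, referrer = match.groups()
    if not path.split("?", 1)[0].lower().endswith(DOWNLOAD_EXTS):
        return False
    # urlparse("-").hostname is None, so a missing referrer counts as direct.
    return (urlparse(referrer).hostname or "") not in OWN_HOSTS


from_google = ('192.0.2.5 - - [15/Sep/2009:15:10:00 -0500] "GET /protocols/pcr.pdf HTTP/1.1" '
               '200 48213 "http://www.google.com/search?q=dmso+in+pcr" "Mozilla/5.0"')
from_our_page = ('192.0.2.6 - - [15/Sep/2009:15:11:00 -0500] "GET /protocols/pcr.pdf HTTP/1.1" '
                 '200 48213 "http://genetics.uiowa.edu/protocols/" "Mozilla/5.0"')
print(is_direct_download(from_google), is_direct_download(from_our_page))  # True False
```

<p>Summing <code>is_direct_download</code> over every line of a week's log gives a rough count of the downloads that Analytics never sees.  It is only an approximation, since some browsers and proxies blank out or strip the referrer.</p>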
<p>Looking at the collected Google search terms…</p>
<p><a href="http://genetics.uiowa.edu/wp-content/uploads/2009/09/web-analysis-3.png"><img class="alignnone size-medium wp-image-75" title="web analysis 3" src="http://genetics.uiowa.edu/wp-content/uploads/2009/09/web-analysis-3-300x210.png" alt="web analysis 3" width="300" height="210" /></a></p>
<p>These look quite different from the terms I found in Analytics, with a higher proportion of scientific terms.  Unfortunately, the free version of the utility does not allow exporting reports, so I can’t post the dataset.</p>
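<p>Search terms can be recovered from the logs because Google's result-page URLs typically carry the search phrase in a <code>q</code> parameter, and that URL ends up in the referrer field.  Here is a minimal Python sketch of the extraction, assuming that URL layout; the helper names and sample referrers are hypothetical, and the "google." host check is a rough heuristic rather than a complete list of Google domains.</p>

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def google_query(referrer):
    """Return the lower-cased search phrase from a Google referrer URL, or None."""
    parts = urlparse(referrer)
    # Rough heuristic: matches www.google.com, www.google.co.uk, etc.
    if "google." not in (parts.hostname or ""):
        return None
    phrase = parse_qs(parts.query).get("q")  # parse_qs turns "+" into spaces
    return phrase[0].lower() if phrase else None


def count_search_terms(referrers):
    """Tally search phrases across an iterable of referrer strings."""
    return Counter(q for q in map(google_query, referrers) if q)


refs = [
    "http://www.google.com/search?q=DMSO+in+PCR",
    "http://www.google.com/search?q=dmso+in+pcr",
    "http://genetics.uiowa.edu/protocols/",
    "-",
]
print(count_search_terms(refs))  # Counter({'dmso in pcr': 2})
```

<p>Lower-casing merges phrases that differ only in capitalization, which matters when comparing name searches against scientific terms.</p>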
<p>The page access numbers taken directly from the log files still correlate with what I found in Analytics.  From just a cursory glance, however, the ratio of Google search terms would likely even out somewhat, and would certainly differ from the 9:1 ratio I got from Analytics previously.  It would take more analysis of log files over a longer period of time to really see the discrepancies.  I still stand by my original premise that a very large portion of the traffic comes from name searches, but it is probably not as drastic as I originally saw.</p>
<p>Once we start using WordPress to manage content, however, downloadable files such as my slides from the presentation will be placed on a page alongside related content.  Google should then recognize the PDF as part of that page, so people searching for those terms should land on the page rather than on the downloadable file itself, which lets Analytics pick them up again.  I’m really interested in converting all of our content to WordPress in a way that Analytics can track, and then re-analyzing the Analytics data in a year.</p>
]]></content:encoded>
			<wfw:commentRss>http://genetics.uiowa.edu/2009/09/16/re-analyzing-analytics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
