Tuesday, June 19, 2007

How is my Web Analytics data captured?

Once you have grasped the basic Web Analytics fundamentals, the next important concept is how Analytics software captures data. The basic requirement for tracking data on a Webpage is the Web Analytics code, or tag. For example, to check whether a page carries the Google Analytics code, go to View -> Source and press Ctrl+F to search for analytics.js. This is JavaScript code that can be pasted into a Webpage to track Analytics data. A different technology uses a .gif image call to the Server to track the same kind of data. To find an image call, go to www.msn.com or www.yahoo.com, view the source, and search for c.gif and p.gif respectively. Below is my understanding of the steps that take place before the data shows up in the Analytics tools. I have summed them up in a seven-step hierarchy that I call the Web Analytics Data Lifecycle.

Webpage -> JavaScript Tag/Image Call -> Web Server -> Log Files -> Processing -> Databases -> Web Analytics Tool

1) Webpage - Web pages are the pivotal part of this hierarchy. They can be HTML, ASP, ASP.NET, or any other kind of page hosted on a Web Server.

2) JavaScript/Image call – Omniture, Google Analytics, WebTrends etc. use the JavaScript tag. This code needs to be placed on the site owner's Webpage so that it can track the parameters needed to make important business decisions. The JavaScript runs when a person lands on the page and sends parameters such as page name and screen resolution to the vendor's data center, which also sees the visitor's IP address from the request itself. The other widely used technology is the image call, which is a simple (img src) tag that requests a .gif image from the appropriate Server. Again, once the page is rendered, it requests the image and a hit is registered. It is usually recommended that you place these tags at the top of your page so that they get called even if the page does not load fully.
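To make the image call concrete, here is a minimal sketch in Python of the kind of URL a tag might request. The host stats.example.com and the parameter names pn, res and ref are made up for illustration; each vendor uses its own endpoint and its own parameter names.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical collection endpoint, not a real vendor URL.
COLLECTOR = "https://stats.example.com/c.gif"


def build_hit_url(page_name, resolution, referrer=""):
    """Return the URL a tag could request to register one page view."""
    # Assumed parameter names: pn = page name, res = resolution, ref = referrer.
    params = {"pn": page_name, "res": resolution, "ref": referrer}
    return COLLECTOR + "?" + urlencode(params)


url = build_hit_url("Home Page", "1280x800")
print(url)

# The server can recover the parameters from the query string.
print(parse_qs(urlsplit(url).query))
```

Everything the tag wants to report travels in the query string of that one request, which is why a 1x1 .gif is enough to register a hit.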

3) Web Server – This is the data center/Web Server that receives and stores the parameters captured by the page tag. These Servers are usually very powerful, can store terabytes of data, and are also responsible for setting cookies on the visitor's machine. These cookies are the crux of Web Analytics because they let the tool recognize returning visitors, which is crucial for analyzing user behavior.
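As a rough sketch of the cookie part, the Python below shows how a server could reuse an existing visitor ID or issue a new one. The cookie name visitor_id and the one-year lifetime are my assumptions; real tools choose their own names and expiry rules.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def visitor_cookie(cookie_header):
    """Return (visitor_id, set_cookie_value) for one incoming request.

    set_cookie_value is None when the visitor already has an ID.
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if COOKIE_NAME in jar:
        # Returning visitor: keep the ID that the browser sent back.
        return jar[COOKIE_NAME].value, None

    # New visitor: generate an ID and build the Set-Cookie header value.
    visitor_id = uuid.uuid4().hex
    out = SimpleCookie()
    out[COOKIE_NAME] = visitor_id
    out[COOKIE_NAME]["path"] = "/"
    out[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365  # assumed: one year
    return visitor_id, out[COOKIE_NAME].OutputString()


first_id, header = visitor_cookie("")  # first visit, no cookie yet
same_id, no_header = visitor_cookie(COOKIE_NAME + "=" + first_id)  # next visit
print(header)
print(first_id == same_id, no_header)
```

Because the same ID comes back on every later request, hits from one browser can be tied together into visits.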

4) Log Files – The parameters captured by the Analytics tag are stored in the Server logs, which are designed to store data systematically as text files, for example in TXT or CSV format. These log files are huge, so it is recommended to store them as compressed archives.
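Here is a small Python sketch of what one CSV log line and a compressed archive could look like. The column layout and the sample values are invented for illustration; actual log formats differ from server to server.

```python
import csv
import gzip
import io

# Assumed column layout for one hit.
FIELDS = ["timestamp", "ip", "visitor_id", "page_name", "user_agent"]


def to_csv_line(hit):
    """Format one hit (a dict) as a single CSV line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow([hit[f] for f in FIELDS])
    return buf.getvalue()


def archive(lines):
    """Compress log lines in memory and return the gzip bytes."""
    return gzip.compress("".join(lines).encode("utf-8"))


hit = {
    "timestamp": "2007-06-19T10:00:00",
    "ip": "203.0.113.5",
    "visitor_id": "v1",
    "page_name": "Home",
    # The comma in this value forces the csv module to quote the field.
    "user_agent": "Mozilla/5.0 (KHTML, like Gecko)",
}
line = to_csv_line(hit)
packed = archive([line] * 1000)
print(line, end="")
print(len(line) * 1000, "bytes raw,", len(packed), "bytes compressed")
```

Log lines are highly repetitive, which is why compression pays off so well on them.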

5) Processing – The log files are then processed by the operations or database team via an ETL (Extract, Transform, Load) process. This is a very complex step that requires a team with strong technical expertise. This team is also responsible for filtering out the so-called bot traffic.
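The extract and transform parts can be sketched in a few lines of Python. The sample log and the user-agent keywords used to spot bots are assumptions of mine; production pipelines rely on much richer rules, such as maintained bot lists and behavioral checks.

```python
import csv
import io

# Assumed heuristic: treat a hit as bot traffic if its user agent
# contains one of these words.
BOT_MARKERS = ("bot", "crawler", "spider")

# Made-up sample log for illustration.
RAW_LOG = """timestamp,ip,page_name,user_agent
2007-06-19T10:00:00,203.0.113.5,Home,Mozilla/5.0 (Windows NT 5.1)
2007-06-19T10:00:02,198.51.100.7,Home,Googlebot/2.1 (+http://www.google.com/bot.html)
2007-06-19T10:00:09,203.0.113.5,Pricing,Mozilla/5.0 (Windows NT 5.1)
"""


def extract(text):
    """Extract: parse the raw log text into one dict per hit."""
    return list(csv.DictReader(io.StringIO(text)))


def is_bot(row):
    """Return True when the user agent matches one of the bot markers."""
    user_agent = row["user_agent"].lower()
    return any(marker in user_agent for marker in BOT_MARKERS)


def transform(rows):
    """Transform: drop the hits that look like bot traffic."""
    return [row for row in rows if not is_bot(row)]


clean = transform(extract(RAW_LOG))
for row in clean:
    print(row["timestamp"], row["page_name"])
```

The load part would then write the remaining rows into the database that feeds the reports.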

6) Data Warehouse – A Data Warehouse stores the filtered data that will be displayed in the Web Analytics Tool. These databases are mostly used by Digital Analysts who want to create custom reports that are usually not possible within the Web Analytics tools. They can write their own queries to build a report that the analytics tool does not offer.
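A custom query of this kind might look like the Python sketch below, which uses an in-memory SQLite database as a stand-in for the warehouse. The hits table, its columns and the sample rows are hypothetical.

```python
import sqlite3

# Made-up hits: (day, visitor_id, page_name).
ROWS = [
    ("2007-06-19", "v1", "Home"),
    ("2007-06-19", "v1", "Pricing"),
    ("2007-06-19", "v2", "Home"),
    ("2007-06-20", "v1", "Home"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (day TEXT, visitor_id TEXT, page_name TEXT)")
conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", ROWS)

# Custom report: page views and unique visitors per page.
QUERY = """
SELECT page_name,
       COUNT(*) AS page_views,
       COUNT(DISTINCT visitor_id) AS visitors
FROM hits
GROUP BY page_name
ORDER BY page_views DESC, page_name
"""
report = conn.execute(QUERY).fetchall()
for page_name, page_views, visitors in report:
    print(page_name, page_views, visitors)
```

With these sample rows, Home gets 3 page views from 2 visitors and Pricing gets 1 page view from 1 visitor.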

7) Web Analytics Tool – This is the final step of the Web Analytics Data Lifecycle and presents the data in GUI form. Tools like Adobe Analytics and Google Analytics are the backbone of the Analytics done nowadays. Exporting data and creating graphs and charts are basic features of these tools. They help organizations make the business decisions that generate revenue and make appropriate changes to their Web pages to retain users and entice them to come back.

So you have read how a simple web page forms the basis of the complex processes that transform a simple HTTP request into data that generates revenue. I hope you liked this article, and I would appreciate it if you critiqued it by commenting.
