3

Monitoring requests

You can record all the requests to a website in a file or in the database to spot an invading robot or display statistics like the total number of visitors or the 10 most consulted pages.

The configuration parameters are defined in the file config.inc.

global $log_dir;

$log_dir = ROOT_DIR . DIRECTORY_SEPARATOR . 'log';

global $track_db, $track_log;
global $track_visitor, $track_visitor_agent;
global $track_agent_blacklist;

$track_db=false;
$track_log=false;       // true, file name or false
$track_visitor=false;
$track_visitor_agent=false;
$track_agent_blacklist=false;   // false or array of agent signatures

Setting $track_visitor to true enables the logging of requests. $track_visitor_agent adds the content of the User-Agent header to the recorded data. $track_db gives the name of the DB table which contains the log; if $track_db is just true, the table is named track by default. $track_log gives the name of the file which contains the log; if $track_log is just true, the file is track.log in the folder defined by $log_dir. If both $track_db and $track_log are false, no logging is performed.
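As an illustration, the following fragment of config.inc would log every request to the default file without recording the User-Agent header. The values are only an example chosen to show how the parameters combine, not a recommended setup.

```php
<?php
// Example settings for config.inc (illustrative values only).
global $track_db, $track_log;
global $track_visitor, $track_visitor_agent;

$track_db=false;            // no logging in the DB
$track_log=true;            // log in the default file track.log in $log_dir
$track_visitor=true;        // enable the logging of requests
$track_visitor_agent=false; // do not record the User-Agent header
```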

To filter out requests sent by known services, such as Google, Facebook or Nagios, set the parameter $track_agent_blacklist to an array of the lowercase signatures these services write in the User-Agent header of their requests.

$track_agent_blacklist=array('facebook', 'facebot', 'googlebot', 'nagios');

In this configuration, the function track will ignore the requests sent by the Facebook and Google robots and Nagios probes.
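The filtering can be pictured with a minimal sketch. The function agent_is_blacklisted below is hypothetical and is not the code of track; it only assumes that a request is ignored when its User-Agent, converted to lowercase, contains one of the signatures. It accepts either an array of signatures or a single string of signatures separated by |.

```php
<?php
// Hypothetical helper, NOT part of iZend: returns true when the User-Agent
// contains one of the blacklisted signatures (case-insensitive match).
function agent_is_blacklisted($user_agent, $blacklist) {
    if (!$user_agent || !$blacklist) {
        return false;   // nothing to compare
    }

    // Accept an array of signatures or a string like 'googlebot|nagios'.
    $signatures = is_array($blacklist) ? $blacklist : explode('|', $blacklist);

    $agent = strtolower($user_agent);

    foreach ($signatures as $signature) {
        if ($signature !== '' && strpos($agent, $signature) !== false) {
            return true;
        }
    }

    return false;
}
```

A plain substring search is used here instead of a regular expression so that a signature containing a special character cannot break the match.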

Logging requests is managed by the function dispatch in engine.php:

function dispatch($languages) {
    global $base_path;
...
    global $track_visitor, $track_visitor_agent;

    $req = $base_path ? substr(request_uri(), strlen($base_path)) : request_uri();

    if ($track_visitor) {
        track($req, $track_visitor_agent);
    }
...
}
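The expression which computes $req removes the base path of the website from the URI before it is logged. The stand-alone function strip_base_path below is hypothetical; it just isolates that expression so its effect can be checked with sample values.

```php
<?php
// Hypothetical stand-alone version of the expression used in dispatch.
function strip_base_path($request_uri, $base_path) {
    // Without a base path, the URI is kept as is.
    return $base_path ? substr($request_uri, strlen($base_path)) : $request_uri;
}

// With a site installed under /mysite (example value):
echo strip_base_path('/mysite/en/home', '/mysite'), "\n";
```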

The database records the information about a request in the table track.

CREATE TABLE `track` (
  `track_id` INT(10) UNSIGNED NOT NULL,
  `time_stamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `ip_address` INT(10) UNSIGNED NOT NULL,
  `request_uri` VARCHAR(255) NOT NULL,
  `user_agent` VARCHAR(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

ALTER TABLE `track` ADD PRIMARY KEY (`track_id`);
ALTER TABLE `track` MODIFY `track_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT;
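The column ip_address is an unsigned integer, which suggests that an IPv4 address is stored in numeric form. The sketch below assumes this conversion is equivalent to ip2long; the functions ip_to_column and column_to_ip are hypothetical names used for illustration and are not part of iZend. Note that an IPv6 address does not fit in such a column.

```php
<?php
// Hypothetical helpers, NOT part of iZend.

// Converts a dotted IPv4 address to the unsigned decimal value
// which fits an INT UNSIGNED column. Returns false for anything else.
function ip_to_column($ip) {
    $n = ip2long($ip);

    if ($n === false) {
        return false;   // not a valid IPv4 address, e.g. an IPv6 address
    }

    return sprintf('%u', $n);   // unsigned decimal string
}

// Converts the value read from the column back to a dotted address.
// Assumes a 64-bit build of PHP so that the cast to int does not overflow.
function column_to_ip($value) {
    return long2ip((int)$value);
}
```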

USAGE

To display the content of the connection log:

$ tail track.log

To obtain the total number of visitors:

$ cut -f 1 track.log | cut -d' ' -f 3 | sort | uniq | wc -l

To list the 10 most consulted pages:

$ cut -f 2 track.log | sort | uniq -c | sort -rn | head -10
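The same statistics can be computed in PHP. The function track_log_stats below is a hypothetical sketch, not part of iZend. It assumes the line layout implied by the cut commands above: fields separated by tabs, the first one holding the date, the time and the IP address separated by spaces, the second one holding the requested URI.

```php
<?php
// Hypothetical helper, NOT part of iZend.
// Assumed line layout, inferred from the cut commands:
// "<date> <time> <ip>\t<request uri>[\t<user agent>]"
function track_log_stats($lines, $top=10) {
    $visitors = array();    // distinct IP addresses
    $pages = array();       // number of requests per URI

    foreach ($lines as $line) {
        $fields = explode("\t", rtrim($line, "\r\n"));

        if (count($fields) < 2) {
            continue;   // skip lines which do not match the assumed layout
        }

        $tokens = explode(' ', $fields[0]);
        if (isset($tokens[2])) {
            $visitors[$tokens[2]] = true;
        }

        $uri = $fields[1];
        $pages[$uri] = isset($pages[$uri]) ? $pages[$uri] + 1 : 1;
    }

    arsort($pages);     // most consulted pages first

    return array(count($visitors), array_slice($pages, 0, $top, true));
}
```

Calling track_log_stats(file('track.log')) returns an array with the number of distinct IP addresses followed by the list of the most consulted pages with their counts.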

To check the DB:

mysql> SELECT * FROM track;

NOTE: If necessary, add the prefix $db_prefix defined in db.inc to the name of the table track.

To obtain the total number of visitors:

mysql> SELECT COUNT(DISTINCT ip_address) FROM track;

To list the 10 most consulted pages:

mysql> SELECT request_uri, COUNT(request_uri) AS count FROM track GROUP BY request_uri ORDER BY count DESC LIMIT 10;

IMPORTANT: The amount of data generated can rapidly fill up the DB and the log file. Choose only one mode by setting either $track_db or $track_log to false. Once a campaign for analyzing the types of clients (browsers, mobiles, robots, etc.) is over, set the parameter $track_visitor_agent back to false.

SEE ALSO

track, log, useragent
