Life and Physical Science Laboratory: Masako Takasu's home page


Collecting access statistics for a home page
Perl script: file.pl

            by Takasu    97.8.2


1. Purpose of this script

The script reads access_log, counts how often each file was accessed,
and writes the result as HTML.

A sample of the output is available here.

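2. Preparation

(1) Check the format in which access_log is written.

On the server I use, the configuration was changed recently, so each entry is now written as follows (a single log line, wrapped here).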

inet-proxy2.toshiba.co.jp - - [01/Jul/1997:12:31:20 +0900] 0 GET
/~takasu/link/mo.html HTTP/1.0 200 1856
http://hiwa003.s.kanazawa-u.ac.jp/~takasu/comp/mo.html Mozilla/3.0
(WinNT; I) -

In this format, the file name is the part between GET and HTTP.
If your log is written in a different format, change $file_start and
$file_end in the program to match it.
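As a small illustration of this step, the following sketch pulls the file name out of one log line. It is not the code used in file.pl itself: the helper name extract_path is hypothetical, and it uses index and substr rather than split, assuming the two markers appear in the line in that order.

```perl
use strict;
use warnings;

# Markers that surround the requested path in this particular log format.
my $file_start = "GET";
my $file_end   = "HTTP";

# Return the text between the two markers, or "" if either one is missing.
# (extract_path is a hypothetical helper name, not part of file.pl.)
sub extract_path {
    my ($line) = @_;
    my $from = index($line, $file_start);
    return "" if $from < 0;
    $from += length($file_start);
    my $to = index($line, $file_end, $from);
    return "" if $to < 0;
    my $path = substr($line, $from, $to - $from);
    $path =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    return $path;
}

my $sample = 'inet-proxy2.toshiba.co.jp - - [01/Jul/1997:12:31:20 +0900] 0 GET '
           . '/~takasu/link/mo.html HTTP/1.0 200 1856 '
           . 'http://hiwa003.s.kanazawa-u.ac.jp/~takasu/comp/mo.html '
           . 'Mozilla/3.0 (WinNT; I) -';

print extract_path($sample), "\n";    # prints /~takasu/link/mo.html
```

If the log puts the request somewhere else, only the two marker strings should need to change.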

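(2) Specify the files to leave out of the statistics.

List the file names you do not want counted in @grep, near the top of
the program. Files that have already been deleted, for example, go here.
Because this check looks at the whole access_log line, you can also
remove your own accesses by listing your machine name.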
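The idea can be sketched as below. This is only an illustration: the sub name is_excluded, the shortened word list, the host names and the three log lines are all made up for the example and do not come from a real log.

```perl
use strict;
use warnings;

# Substrings that disqualify a log line (a shortened, partly hypothetical list;
# myhost.example.jp stands in for your own machine name).
my @exclude = ("1.0 404", "cgi-bin", ".gif", "myhost.example.jp");

# Return 1 if $line contains any of the given substrings, otherwise 0.
# index returns -1 when the substring is absent and 0 for a match at the
# very start of the line, so the test has to be ">= 0".
sub is_excluded {
    my ($line, @words) = @_;
    foreach my $w (@words) {
        return 1 if index($line, $w) >= 0;
    }
    return 0;
}

# Hypothetical log lines, for illustration only.
my @lines = (
    'a.example.com - - [..] 0 GET /~takasu/link/mo.html HTTP/1.0 200 1856',
    'a.example.com - - [..] 0 GET /~takasu/old.html HTTP/1.0 404 210',
    'myhost.example.jp - - [..] 0 GET /~takasu/index.html HTTP/1.0 200 900',
);

foreach my $l (@lines) {
    my $verdict = is_excluded($l, @exclude) ? "skip" : "keep";
    print "$verdict: $l\n";
}
```

The first line is kept; the second is skipped because of the 404 status, and the third because the machine name appears at the start of the line.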

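If a file was simply moved to another directory, specify the old and new locations in @move_from and @move_to, so that accesses to both are counted together.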
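A minimal sketch of such a mapping follows. The sub name remap_path is hypothetical; the two lists use the values from the script. The arrays are passed as references here because Perl flattens arrays given directly as arguments into a single list.

```perl
use strict;
use warnings;

# Old and new locations, paired by position.
my @move_from = ("comp/card.html");
my @move_to   = ("comp/mg/card.html");

# Rewrite the first matching old location in $path to its new location.
# $from and $to are array references. (remap_path is a hypothetical name.)
sub remap_path {
    my ($path, $from, $to) = @_;
    for my $i (0 .. $#{$from}) {
        my ($old, $new) = ($from->[$i], $to->[$i]);
        # \Q...\E makes "." in the file name match literally
        last if $path =~ s/\Q$old\E/$new/;
    }
    return $path;
}

print remap_path("/~takasu/comp/card.html", \@move_from, \@move_to), "\n";
# prints /~takasu/comp/mg/card.html
```

Paths that match no entry of @move_from come back unchanged.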

(3) Choose whether to produce HTML.

Setting $html_or_doc to yes, near the top of the program,
produces HTML output.
In that case, set $url to the URL of your server.
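To show what the two settings produce, here is a small sketch of one output row in each mode. The sub name format_row and the count 12 are made up for the example; the spacing follows the print statements in the script.

```perl
use strict;
use warnings;

my $url = "http://hiwa003.s.kanazawa-u.ac.jp";

# Build one output row: a plain "count  path" line, or the same line with
# the path turned into a link. (format_row is a hypothetical helper name.)
sub format_row {
    my ($count, $path, $as_html) = @_;
    return " $count  $path \n" unless $as_html eq "yes";
    return " $count  <a href=\"$url$path\">$path</a> \n";
}

# The count 12 is an invented value, for illustration only.
print format_row(12, "/~takasu/link/mo.html", "yes");
print format_row(12, "/~takasu/link/mo.html", "no");
```

The link target is simply $url followed by the path from the log, so $url should not end in a slash.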

3. Running the script

Save the source below to a file, for example file.pl, and make it executable with

      chmod 755 file.pl

Put the name of the input file in $file_name. Then

     file.pl > out1

writes the output to the file out1.

4. Source code

#!/usr/bin/perl
#   file.pl               by M. Takasu,  August 4, 1997                     
#     version 2.1
#   analyze access_log and get statistics for html files
#
##########    input from user
$html_or_doc = "yes";     # yes for html, no for document
$file_name = "log5";      # access_log file to read

# substrings to exclude: a log line containing any of these is skipped
@grep   = ("1.0 404","1.0 401","cgi-bin","icons",
        ".jpg",".jpeg",".gif",".pl","/results_ct.html", "j.html",
        "kanri","riji","Photo/97.6.22","photo.hyoushi");

$file_start = "GET";      # look at access_log to determine position
$file_end   = "HTTP";
 
#  the files that were moved recently 
@move_from = ("comp/card.html");
@move_to   = ("comp/mg/card.html");

#  url to be added 
$url = "http://hiwa003.s.kanazawa-u.ac.jp";

#########  program part: don't change unless necessary

$html1 = "<a href=\"";
$html2   = "\">";
$html3   = "</a>";

$nword = 0;
open(INPUTFILE, $file_name) || die "can not open file $file_name: $!\n";

while (<INPUTFILE>) {
   $line = $_;
   ($where, $tail1) = split($file_start, $line);   # $tail1: text after GET
   ($file, $tail2) = split($file_end, $tail1);     # $file: text before HTTP
   $find = &checkline($line, @grep);      # 1 if the line contains a word in @grep

   if ($find == 0){
      $word = &cleanup($file);     # clean up the word
      $word = &checkfile($word, @move_from, @move_to);

      if ($word ne ""){      # check if the word is not empty
         $co{$word}++;       # counter for words
         $nword++;           # total number of counted accesses
      }
   }
}
close(INPUTFILE);

### Sort the words with counter and word ##########

@sorted = sort by_counter_word keys %co;
foreach $word1 (@sorted){

   if ($html_or_doc eq "yes"){       # for html
      print " ", $co{$word1}, "  ", $html1, $url, $word1, $html2,
            $word1, $html3, " \n";
   }
   if ($html_or_doc eq "no"){        # for documents
      print " ", $co{$word1}, "  ", $word1, " \n";
   }
}
print "nword = ", $nword, "\n";      # total number of counted accesses

###  subroutines below ###################################
### check if $line1 contains a word in list @file1

sub checkline{

   my($line1, @file1) = @_;
   my($i) = 0;
   my($find) = 0;

   while ($i < @file1 && $find == 0) {
      # index returns -1 when the word is absent and 0 when it
      # matches at the very start of the line (e.g. a machine name)
      if (index($line1, $file1[$i]) >= 0) {
         $find = 1;      # the line contains an excluded word
      }
      $i++;
   }
   $find;
}

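#### if a word $w1 contains an expression from the first list,
####  change it to the matching expression from the second list.
####  Called as &checkfile($word, @move_from, @move_to): Perl flattens
####  the two arrays into one list, so the list is split in half here.

sub checkfile{

   my($w1, @files) = @_;
   my($n) = int(@files / 2);          # number of (from, to) pairs
   my($i) = 0;
   while ($i < $n){
      my($from, $to) = ($files[$i], $files[$i + $n]);
      $w1 =~ s/\Q$from\E/$to/;        # \Q..\E: match the name literally
      $i++;
   }
   $w1;
}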
#### clean up the word ###################################
sub cleanup{

   local($w1) = @_;
   $w1 =~ s/\/\s*$//;                # get rid of last / plus space \s
   $w1 =~ s/\/index\.html\s*$//;     # get rid of trailing /index.html
   $w1 =~ s/\s*$//;                  # get rid of trailing space
   $w1 =~ s/^\s*//;                  # get rid of leading space
   $w1;
}

########  sort definition ######################

sub by_counter_word{
    ($co{$b} <=> $co{$a}) || ($a cmp $b);
}
exit 0;
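The counting and sorting at the heart of the script can be tried on their own with a sketch like the following. The sub name rank_paths and the list of paths are hypothetical; the comparison is the same as in by_counter_word (higher count first, ties in alphabetical order).

```perl
use strict;
use warnings;

# Count a list of paths and return [path, count] pairs, most accessed
# first; paths with equal counts are ordered alphabetically.
# (rank_paths is a hypothetical helper name, not part of file.pl.)
sub rank_paths {
    my @paths = @_;
    my %count;
    $count{$_}++ for @paths;
    my @order = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
    return map { [$_, $count{$_}] } @order;
}

# Hypothetical accesses, for illustration only.
my @seen = ("/b.html", "/a.html", "/c.html", "/a.html",
            "/b.html", "/a.html", "/c.html");

for my $row (rank_paths(@seen)) {
    printf "%3d  %s\n", $row->[1], $row->[0];
}
```

With this input, /a.html (3 accesses) comes first, followed by /b.html and /c.html (2 each) in alphabetical order.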
