Masako Takasu's home page (生命物理科学研究室)


ref.pl: a Perl script that counts the jumps
to your home page from links on other pages

            by Takasu, Jan 31, 1998


1. Purpose of this script

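The script reads referer_log, counts how many accesses arrive from each
referring link, and lists the links in order of frequency. The result can
be written out as HTML.

Sample output is here.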

2. Preparation

(1) Check how your server writes referer_log.

On the server I use, an entry looks like this:

http://www02.so-net.or.jp/~suzuden/lynk.htm -> /~takasu/comp/mo.html
== [02/Jan/1998:18:02:46 +0900]

So the part before "->" is what needs to be examined.
If your server writes the log in a different format,
the program has to be changed to match it.
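As a minimal sketch of this step, the following takes one log line in the format shown above and keeps only the referring URL. The helper name referer_of is hypothetical and is not part of ref.pl, and the sample line is shortened to the part up to the requested file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# referer_of is a hypothetical helper, not part of ref.pl.
# It returns the text before "->" with surrounding whitespace removed,
# or nothing (undef in scalar context) when the line has no "->".
sub referer_of {
    my ($line) = @_;
    return unless $line =~ /->/;
    my ($where) = split /->/, $line, 2;
    $where =~ s/^\s+|\s+$//g;
    return $where;
}

my $sample = "http://www02.so-net.or.jp/~suzuden/lynk.htm -> /~takasu/comp/mo.html\n";
print referer_of($sample), "\n";
```

For a server that uses another separator, only the pattern given to split would need to change.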

(2) Specify what you do not want included in the statistics.

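List the words or phrases to exclude in @omit_words, near the top of
the program. Any referrer containing one of them is left out.
In my case, I omit accesses from search engines and the like.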
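A sketch of how such a list can be applied: a referrer is dropped when it contains any of the listed words. The short list and the name is_omitted are illustrative assumptions, not the code of ref.pl. Wrapping each word in \Q...\E makes it match literally, so the dots in a host name are not treated as regex wildcards.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative subset of @omit_words; is_omitted is a hypothetical name.
my @omit_words = ("goo.ne.jp", "yahoo", "search");

# Return 1 if $where contains any of the words as a literal substring,
# 0 otherwise.
sub is_omitted {
    my ($where, @words) = @_;
    foreach my $word (@words) {
        return 1 if $where =~ /\Q$word\E/;
    }
    return 0;
}

foreach my $where ("http://www.yahoo.co.jp/",
                   "http://www02.so-net.or.jp/~suzuden/lynk.htm") {
    printf "%-50s %s\n", $where,
        is_omitted($where, @omit_words) ? "omit" : "keep";
}
```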

(3) Specify whether to produce HTML.

Near the top of the program, set $html_or_doc to "yes" to get HTML
output, or to "no" to get plain text.
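To illustrate the difference between the two settings, the sketch below builds one output line in each mode. The function format_line is hypothetical and only mirrors the two branches of print_link in the listing; the rank and count values are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# format_line is a hypothetical helper that mirrors print_link:
# "yes" wraps the link in an <a href="..."> tag, anything else gives plain text.
sub format_line {
    my ($html_or_doc, $rank, $count, $link) = @_;
    if ($html_or_doc eq "yes") {
        return " $rank     $count  <a href=\"$link\">$link</a> \n";
    }
    return " $rank $count  $link \n";
}

# Made-up rank (1) and count (12), for illustration only.
my $link = "http://www02.so-net.or.jp/~suzuden/lynk.htm";
print format_line("yes", 1, 12, $link);
print format_line("no",  1, 12, $link);
```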

3. Running the script

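Save the source below in a file, for example ref.pl, and make it
executable with

      chmod 755 ref.pl

Put the name of the input file in $file_input. Then

     ref.pl > out1

writes the output to the file out1. (Use ./ref.pl if the current
directory is not in your PATH.)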

4. Source code

#!/usr/local/bin/perl
#   ref.pl               by M. Takasu,  Jan 31, 1998
#   using perl 5
#   analyze referer_log and count the jumps from other pages
#
####   user input   

$html_or_doc = "yes";    # yes for html, no for document
$file_input = "ref5";   # referer_log

#       omit lines that contain any one of the following.
@omit_words   = ("goo.ne.jp", "infoseek",  "INFOSEEK", "odin.cgi", 
        "hole-in-one",
        "kuamp.kuamp.kyoto-u.ac.jp", "yahoo", "search", "query",
        "altavista.digital.com",  "jp.excite.com",
        "netscape", "qtfd", "yubitoma", "bookmark", "internal",
        "doc/mail", "nustub");
#
######  program part
#         don't change below unless necessary

$html1 = "<a href=\"";
$html2   = "\">";
$html3   = "</a>";

open(inputfile, $file_input) || die "cannot open $file_input: $!\n";

while (<inputfile>) {
   ($where, $file) = split(/->/, $_);
   $where =~ s/\s+$//;      # drop the blank left in front of "->"

    $yn_engine = &check_omit($where, *omit_words);

    if ($yn_engine eq "no"){  
       $count_link{$where}++;  
    }
}
close(inputfile);

&sort_link;

################### start subroutines ##############################

#-----------------------------------------
sub sort_link{
    @sorted = sort by_counter_link keys %count_link;
    local $item_number = 1;    # to be used in &print_link
    my $njump = 0;

    foreach $link (@sorted){
       $njump = $njump + $count_link{$link};
       &print_link;
       $item_number ++;
    }
    print "total ", $njump, "\n";
}
#-----------------------------------------
sub print_link{
   if ($html_or_doc eq "yes"){      # for html
       print " ", $item_number, "     ",$count_link{$link}, 
       "  ",$html1, $link, $html2,
       $link, $html3, " \n";
   }
   if ($html_or_doc eq "no"){       # for documents
       print " ", $item_number, " ", $count_link{$link},"  ", $link," \n";
   }
}
#-----------------------------------------
sub by_counter_link{
    ($count_link{$b} <=> $count_link{$a}) || ($a cmp $b);
}
#-----------------------------------------
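sub check_omit{    # copied from list.pl
#
#   $yn = &check_omit($line, list of words to be omitted) 
#   returns "yes" if $line contains any of the words, "no" otherwise
#
  local ($line, *omit) = @_;
  my($yn) = "no";
  # stop at the first match; \Q...\E matches each word literally
  for ($i =0; $yn eq "no" && $i < @omit ; ++$i){
     if ($line =~ /\Q$omit[$i]\E/) {
        $yn = "yes";
     }
  }
  $yn;
}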
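
For comparison, the counting and sorting done by the listing can be sketched in a stricter style with use strict and lexical variables. This is not the author's code: count_referers is a hypothetical name, and the log lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_referers is a hypothetical name. It takes log lines and returns
# a hash reference mapping each referrer to its number of jumps.
sub count_referers {
    my (@lines) = @_;
    my %count;
    foreach my $line (@lines) {
        next unless $line =~ /->/;            # skip lines without a referrer
        my ($where) = split /->/, $line, 2;
        $where =~ s/^\s+|\s+$//g;             # trim surrounding whitespace
        $count{$where}++;
    }
    return \%count;
}

# Made-up log lines, for illustration only.
my @lines = (
    "http://a.example/links.html -> /~takasu/index.html\n",
    "http://b.example/list.html -> /~takasu/comp/mo.html\n",
    "http://a.example/links.html -> /~takasu/comp/mo.html\n",
);

my $count = count_referers(@lines);

# Most jumps first; ties are broken alphabetically, as in by_counter_link.
my @sorted = sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count;

my $rank = 1;
foreach my $link (@sorted) {
    print " ", $rank++, " ", $count->{$link}, "  ", $link, "\n";
}
```

Filtering with @omit_words is left out here to keep the sketch short.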
