« Previous Next »

Thread: Filtering IIS logs of bot entries and outputing to multiple files

Last post 06-27-2007 5:59 PM by LogParser User : mila. 2 replies.

Average Rating Rate It (5)

RSS

Page 1 of 1 (3 items)

Sort Posts:

  • 06-26-2007, 5:04 PM

    Filtering IIS logs of bot entries and outputing to multiple files

    I raised the question below to google groups and received an answer (also below). I'm new to SQL, Logparser, scripting with autoit, and IIS so please excuse my ignorance. I'm having trouble with syntax despite having looked at the documentation. I need to figure our how to write the two queries suggested in the answer to my google groups question pasted further below.

    One thing I'm trying to figure out is whether there is an abbreviated way to refer to all fields in a WHERE clause. I want to remove all lines in the log file that contain "botname". Its easy enough to say:

    logparser "SELECT* INTO A1.log FROM A.log WHERE cs(User-Agent) NOT LIKE '%botname%' " -i:IISW3C -o:IIS

    But how do I tell logparser to query against not just the cs(User-Agent) field, but all fields? Or do I even need to be searching in the other fields for this particular task?

    Here, in a question posted to google, are more details on what I am trying to do (the query I will ultimately need will be more complicated than the above example, but it would be nice to get it working in the simplified form first):

    I just started reading about logparser and I would like to use it to
    rid IIS logs of entries containing bot names listed in a text file.
    I'll be using autoit and Logparser COM.

    I was wondering what the best way to go about this would be. From what
    I have read in the logparser documentation, it seems that the
    following would not be possible:

    1. Logparser query cannot be setup such that: it is passed a file (say
    a list of bot names) and told to filter the inputfiles of all lines
    containing any of the terms in said file.

    2. Logparser cannot produce output upon running a query on a set of
    multiple files such that the output is organized into files that
    correspond to the source filenames. (So if files A, B, C are to be
    filtered, the output cannot automatically be placed in files A1, B1,
    C1 such that the contents of A1 = filtered contents of A, and
    similarly for B1 and C1).

    Given the above assumptions, I have concluded that If I want to filter
    a large set of files and if the filtering involves a large set of
    filter terms, the best way to utilize logparser would be to run a
    query multiple times in nested loops (inner loop traverses the list of
    bots, and outer loop traverses the list of files. [In effect: For each
    file A, [for each filter term 'bot' run a query that retreives all
    lines NOT containing "bot", placing the output in A1. Repeat query on
    file A for all bots names] Repeat for all files].

    I'd like to confirm whether or not my assumptions 1 and 2 above are
    mistaken, and if my suggested way of completing this task is over
    complicating things. Also any further tips or tool /scripting
    recommendations are most welcome. Thanks.


    ANSWER:
    I think your approach is a bit complicated

    You overlook the fact that LogParser allows you to make a SQL query
    against an aggregation of log sources while retaining knowledge of
    which log source a particular log entry came from. This means that
    what you want to do can be done with two commands which pass over the
    logged data only twice.

    The first command is a LogParser query over the aggregate of the log
    sources with a SQL WHERE clause constructed from the bot names in the
    text file. This simultaneously walks both your nested loops with a
    single SQL query. This allows File:A,File:B,File:C --(SQL-based-
    Filter)-> File:A1B1C1

    The second command chops up the output of the first command based on
    the Filename field which indicates the original log source that the
    data came from. This allows you to reconstruct File:A1, File:B1,
    File:C1 given a file whose contents is A1B1C1 concatenated

    Of course, it is possible that aggregating everything into memory all
    at once is too much, in which case your incremental approach would be
    necessary.

    //David
  • 06-27-2007, 4:19 PM In reply to

    RE: Filtering IIS logs of bot entries and outputing to multiple files

    Hope this is useful to an apprentice like myself:

    After some trial and error I have a working version of the second query using multiplexing.

    The first query simply filters out rows containing a certain string from an aggregate of IISW3C input sources and outputs to a single w3c output format file named w3c_out.log.

    It contains rows whose first fields are the original IIS log source path (it took me a few tried to figures out that the field name is logfilename2, not logfilename1). I use this field to create an output structure origfoldername/filenames via multiplexing (put the rows in separate files depending on where they originally came from as follows):

    The query in run on w3c_out.log:

    logparser "SELECT EXTRACT_PATH(logfilename2), EXTRACT_FILENAME(logfilename2), date, time, c-ip, s-ip, cs-method, cs-uri-stem, cs-uri-query, sc-statu
    s, sc-bytes, cs-bytes, time-taken, cs-version, cs(User-Agent), cs(Cookie), cs(Referer) INTO *_Filt\FILT_* FROM w3c_out.log" -i:W3C -o:W3C -e:10

    The original folder and files were named:
    c:\logs\april2007\log1.log ....log5.log

    The first query produced c:\w3c_out.log

    The 2nd query used multiplexing to reorganize the output records as follows:
    c:\logs\april2007_filt\filt_log1.log ...filt_log5.log

    Now I need to figure out the syntax for running the first query against not a single search term but many terms in a txt file. Actually it seems this is possible by putting the filter terms in a csv and using a SELECT embedded in a NOT IN:

    SELECT* FROM c:\logs\april2007\*.* WHERE cs(uri-agent) NOT IN(SELECT botname FROM badbots.csv)" i:IISW3C o:W3C
  • 06-27-2007, 5:59 PM In reply to

    RE: Filtering IIS logs of bot entries and outputing to multiple files

    Well this query did not work:

    SELECT* FROM c:\logs\april2007\*.* WHERE cs(uri-agent) NOT IN(SELECT botname FROM badbots.csv)" i:IISW3C o:W3C

    The second select is being executed as having the input format specification IISW3C.
Page 1 of 1 (3 items)
Microsoft Communities