lordchariot
2 Posts
Custom TSV input format
Dec 14, 2010 05:47 AM
I am new to LogParser, but have to say it looks great...except for some really annoying things with log inputs.
I have a variety of logs I need to parse for one-time analysis and statistics. They will be in any number of formats, mostly proxy logs. Some are ISA and IIS, but some are SQUID, BlueCoat, SmartFilter, Apache and any number of other formats. The order of log fields is unpredictable and most have space delimiters, but sometimes commas or tabs. And there could be 20-30 GB of logs in any processing batch.
Each LogParser input format creates its own challenge with any of these logs. Some have #Fields:, others don't, some have quoted strings, some have timestamps like this: [29/May/2010:19:48:21 -0400] (which parses to 2 fields with a space delimiter)
I'm torn between the TSV and W3C formats. Each has _almost_ what I need for everything, but not quite all.
TSV has: headerRow/iHeaderFile, nSkipLines & lineFilter.
W3C has: dQuotes, #Fields: header row parsing.
Neither has both. So, is there a way to extend either of these formats so it has the other's features? I would settle for just dQuotes in the TSV format, but I will take what I can get. (There are lots of "User-Agent" strings that really throw off parsing.)
I understand there is some COM extensibility. I've looked at some sample code, but it just isn't clicking on how to put it all together in C# yet. I need to study more examples, but haven't found what I need to make the pieces fall together yet. Any suggestions for examples for this task specifically? (Note that I would prefer not to use regex. I have a Perl tool that does it already and am trying to replace that instead of having people hand-edit regexes all day.)
My desired end-state is to have a C# front-end that can load the first N lines of any file format into a datagrid, and assign fieldnames from a selection so I can proceed with some of the statistics I want to gather.
If there is not something already canned for the COM plugin, can you guide me on how to write one to get really close to how TSV works?
(I guess the LogParser source itself isn't available, huh? )
Thanks.
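A minimal sketch of the combination asked for above: TSV-style options (skip lines, header row, line filter) together with dQuotes-style quoting, previewing the first N records. This is not LogParser and not a COM plugin. It is written in Python only for brevity, and the logic should port to a C# front-end. The function `preview`, its parameters, and the sample data are all hypothetical names made up for illustration, loosely modeled on the option names in the post.

```python
import csv
import io
from itertools import islice

def preview(stream, delimiter=" ", n_skip_lines=0, header_row=True,
            skip_prefixes=(), field_names=None, max_rows=10):
    """Return (names, rows) for the first max_rows records of a text stream."""
    lines = islice(stream, n_skip_lines, None)           # similar in spirit to nSkipLines
    lines = (ln for ln in lines                           # similar in spirit to lineFilter
             if ln.strip() and not ln.startswith(tuple(skip_prefixes)))
    # csv.reader accepts any single-character delimiter and keeps
    # "double-quoted strings" together as one field.
    reader = csv.reader(lines, delimiter=delimiter, quotechar='"')
    names = field_names
    if header_row:
        header = next(reader, [])
        names = names or header          # caller-supplied names win over the header
    rows = list(islice(reader, max_rows))
    if names is None:
        # No header and no names given: fall back to Field1..FieldN.
        width = max((len(r) for r in rows), default=0)
        names = ["Field%d" % (i + 1) for i in range(width)]
    return list(names), rows

# Hypothetical sample: one junk line, a header row, two records.
sample = io.StringIO(
    '# exported by a hypothetical proxy\n'
    'src_ip status "user_agent"\n'
    '10.0.0.1 200 "Mozilla/4.0 (compatible; MSIE 8.0)"\n'
    '10.0.0.2 404 ""\n'
)
names, rows = preview(sample, n_skip_lines=1)
print(names)
for row in rows:
    print(row)
```

Because only the first `max_rows` records are pulled from the stream, the same approach should stay cheap on multi-gigabyte files. Bracketed timestamps are not handled here and would still split into two fields.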
logparser tsv custom format
ron_bo
52 Posts
Re: Custom TSV input format
Jan 02, 2011 08:32 PM
Your data is either tsv or w3c (or neither, I guess), so not sure why you need to combine both ... You should post some sample data ...
lordchariot
2 Posts
Re: Custom TSV input format
Jan 10, 2011 08:51 PM
So this might be an example of one file (first 2 lines):
#author src_ip server_ip "auth_user" time_stamp "req_line" status_code bytes_from_client bytes_to_client "referer" "user_agent" "attribute" block_res "media_type" "profile" elapsed_time "virus_name" rep_level cache_status
client 192.168.2.10 72.14.204.99 "domain\username1" [01/Jan/2011:13:50:41 -0500] "GET http://www.google.com/ HTTP/1.1" 200 592 14179 "" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0)" "gr" - "text/html" "Typical_Business_Filter" 22.847 "-" Neutral -
And this might be an example of another (no field headers):
10.11.12.13 - username [01/Jun/2008:00:00:03 +0000] "GET http://mail.google.com HTTP/1.1" 200 146 ALLOW "Web Mail"
In these instances, the [square brackets] don't parse very well.
I see a lot of talk about LogParser being able to define your own input formats, but I don't see any real examples of it (in C#).
Can someone point out a real example of parsing a custom format that I can at least look at?
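For the square-bracket problem described above, one regex-free option is a small character-by-character tokenizer that treats both "..." and [...] as a single field. The sketch below is an assumption about how such a splitter could look, not a LogParser input format or its COM interface. It is in Python for brevity (the loop translates directly to C#), and `split_fields` and its parameters are hypothetical names.

```python
def split_fields(line, delimiters=" \t", pairs=(('"', '"'), ("[", "]"))):
    """Split a line on delimiters, keeping quoted or bracketed runs as one field."""
    closers = dict(pairs)   # opening character -> closing character
    fields, buf = [], []
    closing = None          # closer we are waiting for, or None outside a group
    in_field = False        # True once the current field has started
    for ch in line.rstrip("\r\n"):
        if closing is not None:
            if ch == closing:
                closing = None           # group ends, delimiters count again
            else:
                buf.append(ch)           # delimiters inside a group are data
        elif ch in closers and not in_field:
            closing = closers[ch]        # an opener only counts at field start
            in_field = True
        elif ch in delimiters:
            if in_field:                 # runs of delimiters collapse to one
                fields.append("".join(buf))
                buf, in_field = [], False
        else:
            buf.append(ch)
            in_field = True
    if in_field:
        fields.append("".join(buf))
    return fields

# Second sample line from the post above (the one without field headers).
line = ('10.11.12.13 - username [01/Jun/2008:00:00:03 +0000] '
        '"GET http://mail.google.com HTTP/1.1" 200 146 ALLOW "Web Mail"')
fields = split_fields(line)
for index, value in enumerate(fields, start=1):
    print(index, value)
```

The quotes and brackets are stripped from the returned values. Because consecutive delimiters are collapsed, this sketch suits space-delimited logs; a comma-delimited file with empty fields would need that branch changed.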
jonathan.h
1 Post
Re: Custom TSV input format
Jan 30, 2013 03:18 PM
I'm having this issue also.
How do we parse the data below using the W3C input format?
Filename: access.log
Content:
I tried using the query statement below:
but it returned data misaligned with the headers. But when I try to add -dquotes:ON, I get an error:
What do you think I missed?
ron_bo
52 Posts
Re: Custom TSV input format
Feb 18, 2013 01:32 PM
jonathan.h, I could not duplicate your error (when you enabled dQuotes). However, the core problem is that the time_stamp field does not have quotes around it, and because it contains a space, it splits into two fields ... I think everything after it is exactly one field/column off because of the space in the bracketed time_stamp field ...
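The one-column shift can be reproduced outside LogParser. In the sketch below, Python's `csv` module stands in for a quote-aware, space-delimited parser (an assumption; it is not the W3C input format), using the header and record lordchariot posted earlier in the thread. The helper `quote_bracketed_timestamp` is hypothetical and assumes the first " [" ... "]" pair on a line is the timestamp.

```python
import csv

def parse(line):
    """Split on single spaces, keeping "double-quoted strings" as one field."""
    return next(csv.reader([line], delimiter=" ", quotechar='"'))

def quote_bracketed_timestamp(line):
    """Rewrite the first ' [..]' pair as ' ".."' so a quote-aware parser keeps it whole."""
    start = line.find(" [")
    if start == -1:
        return line
    end = line.find("]", start)
    if end == -1:
        return line
    return line[:start + 1] + '"' + line[start + 2:end] + '"' + line[end + 1:]

header = (r'#author src_ip server_ip "auth_user" time_stamp "req_line" status_code '
          r'bytes_from_client bytes_to_client "referer" "user_agent" "attribute" '
          r'block_res "media_type" "profile" elapsed_time "virus_name" rep_level cache_status')
record = (r'client 192.168.2.10 72.14.204.99 "domain\username1" '
          r'[01/Jan/2011:13:50:41 -0500] "GET http://www.google.com/ HTTP/1.1" '
          r'200 592 14179 "" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; '
          r'WOW64; Trident/4.0)" "gr" - "text/html" "Typical_Business_Filter" '
          r'22.847 "-" Neutral -')

names = parse(header)
raw = parse(record)                                # timestamp splits in two
fixed = parse(quote_bracketed_timestamp(record))   # timestamp stays whole

print(len(names), len(raw), len(fixed))
print(dict(zip(names, raw))["status_code"])    # shifted: holds the request line
print(dict(zip(names, fixed))["status_code"])  # aligned: holds the status
```

Rewriting the brackets as quotes in a preprocessing pass is one possible workaround; whether LogParser then accepts the file with dQuotes enabled is not confirmed in this thread.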
ron_bo
52 Posts
Re: Custom TSV input format
Feb 18, 2013 01:36 PM
lordchariot -
This file is NCSA format ...