Hi all,
I would like to know which directories on my servers have more than 100 files or more than 50MB worth of files and of the directories that exceed either of those parameters I want to know which extension types (and how many files of each and sum(size) of those files) are in the directory.
The below query kinda works however it only returns paths where a single file type has more than 100 files or has a sum(size) or greater than 52428800 Bytes (50MB). So if I have a directory (eg Z:PUB) that uses 1GB and 500MB is doc files, 451MB is xls files and 49MB is txt files it will only return results on the doc and xls files and I will have no idea what other file types are in the directory. I need to base my size and file total criteria off of the directory as a whole not the specific file types within.
This will tell me that Z:\PUB contains 500MB of doc files and 451MB of xls files however it will return nothing about the 49MB of txt files.
SELECT count(*) as Files, TO_LOWERCASE(EXTRACT_PATH(Path)) as Path, TO_LOWERCASE(EXTRACT_EXTENSION(Path)) as Extension,
MIN(LastWriteTime) as MinLastModified, MAX(LastWriteTime) as MaxLastModified,
DIV(DIV(MUL(1.0, SUM(Size)), 1024), 1024) as SizeMB, DIV(MUL(1.0, SUM(Size)), 1024) as SizeKB, SUM(Size) as SizeBytes, SYSTEM_TIMESTAMP() as LastUpdate
INTO Filecount
FROM %source%
GROUP BY TO_LOWERCASE(EXTRACT_PATH(Path)), TO_LOWERCASE(EXTRACT_EXTENSION(Path))
HAVING Sizebytes > 52428800
OR Files > 100
Adding an additional SELECT statement in my WHERE clause would get me what I want as I could apply my first select statement only to directories that exceed the criteria defined in my where clause and still keep my group by extension type information as well. Unfortunately when I attempt this I am running into syntax errors.
Here's the query using a subquery in my where clause. Variables are passed from a Perl script and are the same in both select statements.
SELECT count(*) as Files, TO_LOWERCASE(EXTRACT_PATH(Path)) as Path, TO_LOWERCASE(EXTRACT_EXTENSION(Path)) as Extension,
MIN(LastWriteTime) as MinLastModified, MAX(LastWriteTime) as MaxLastModified,
DIV(DIV(MUL(1.0, SUM(Size)), 1024), 1024) as SizeMB, DIV(MUL(1.0, SUM(Size)), 1024) as SizeKB, SUM(Size) as SizeBytes, SYSTEM_TIMESTAMP() as LastUpdate
INTO Filecount
FROM %source%
WHERE (SELECT COUNT(*) FROM %source% WHERE Size > 0 GROUP BY TO_LOWERCASE(EXTRACT_PATH(Path))) > 100
GROUP BY TO_LOWERCASE(EXTRACT_PATH(Path)), TO_LOWERCASE(EXTRACT_EXTENSION(Path))
This returns the follow error when I run it
"Syntax Error: <term2>: expecting matching closing parenthesis instead of token 'COUNT(*)'
I really don't have the option of running the job twice to parse through my first pass as the job will already take a long time. I also cannot remove the file size or file number criteria (though the numbers being used here are just examples).
Any help would be greatly appreciated.
Thanks!