Showing results for tags 'log files'.



Found 2 results

  1. I've got a log file that I want to parse so I can use the data behind some kind of "dashboard". I don't have anything specific in mind for the dashboard yet, aside from an approach I describe at the end of this post. Let me briefly describe what's going on first.

    Basically, the client has a bunch of offices around the country. At the end of the day, someone in each office drops a pile of forms into a scanner, which scans them all into a single PDF file -- I call this an "aggregate PDF file". These aggregate files are uploaded to an FTP area for us. We get anywhere from 10 to 40 of them to process daily, and each contains anywhere from one to 200+ scanned forms. It's a batch process, and I was assigned to maintain the code that performs it.

    When I took this on, there were some problems, but they were totally invisible, so nobody knew about them. The log file was only documenting a few errors, and it turned out there were other errors that weren't being detected or reported. So my first task was to enhance the log file to the point where I could use it to audit the results and ultimately track down errors. I've tracked down most of the errors at this point, and I've found structural issues and other things that have been around for ages that nobody considers "problems" -- but that's a story for another day.

    Here's a snippet of the log file showing what's recorded for a typical aggregate file. This is one of 37 aggregate PDFs in this particular batch, and it contains four documents. One of them appears to be junk, which is common. (Some offices drop dozens of the wrong forms into the scanner; we just ignore them.) FYI, there's an OCR process for each document where we try to extract a couple of numbers; they show up here as DocLink / Pick Ticket# / PTN, and OrderNum / Order#. The database is queried for a corresponding invoice that refers to one of these so they can be matched up. In a lot of cases, there isn't one.
    (Another issue that nobody sees as a problem.) Note that I separate different sections with ========================== flags. Don't look too closely at the details, as I've doctored it a bit to show some variations.

    ===============================================================================
    >> Processing Aggregate file 3/37: 0100_TEMSTOD4_%D%T_200507162634.pdf
    >> 4 Files extracted --
    =====================================
    -- Now processing single-page PDF (1 of 4)
    -- PDF file: 0100_TEMSTOD4_%D%T_200507162634.1.pdf
    >> Pick Ticket# found : 10058190
    >> Order # found : 11452907 (corrected from: 1[452907)
    =====================================
    -- Now processing single-page PDF (2 of 4)
    -- PDF file: 0100_TEMSTOD4_%D%T_200507162634.2.pdf
    >> Pick Ticket# found : 10052571 (corrected from: |0052571)
    >> Order # found : 11416133 (corrected from: I1416133)
    =====================================
    -- Now processing single-page PDF (3 of 4)
    -- PDF file: 0100_TEMSTOD4_%D%T_200507162634.3.pdf
    ?? didn't find a PTN or OrdNum where we expected them to be ()
    Maybe it's upside-down ... let's rotate it and try again
    :( Nothing useful here. Moving on...
    =====================================
    -- Now processing single-page PDF (4 of 4)
    -- PDF file: 0100_TEMSTOD4_%D%T_200507162634.4.pdf
    ?? didn't find a PTN or OrdNum where we expected them to be ()
    Maybe it's upside-down ... let's rotate it and try again
    >> Pick Ticket# found : 10061327
    >> Order # found : 11547908
    ============================
    == CombineMultiPageOrders ==
    >> RelatesToPage: first-pg=[0800_LVCopierBW05_08_2002_59_44.6.pdf] this-pg=[0800_LVCopierBW05_08_2002_59_44.7.pdf]
    >> RelatesToPage: first-pg=[0800_LVCopierBW05_08_2002_59_44.8.pdf] this-pg=[0800_LVCopierBW05_08_2002_59_44.9.pdf]
    -- Deleting file #8: C:\Loader\Out\0800_LVCopierBW05_08_2002_59_44.9.pdf
    -- Deleting file #7: C:\Loader\Out\0800_LVCopierBW05_08_2002_59_44.8.pdf
    -- Deleting file #6: C:\Loader\Out\0800_LVCopierBW05_08_2002_59_44.7.pdf
    -- Deleting file #5: C:\Loader\Out\0800_LVCopierBW05_08_2002_59_44.6.pdf
    >> 2 pages in C:\Loader\Out\2_1_0800_LVCopierBW05_08_2002_59_44.6.pdf | Doclink=10062731 OrderNum=11561321
    >> 2 pages in C:\Loader\Out\2_1_0800_LVCopierBW05_08_2002_59_44.8.pdf | Doclink=10060874 OrderNum=11558566
    ================================
    == ProcessDocumentAttachments ==
    >> Processing Attachment File 1/4: "C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.1.pdf" DocLink=10058190 OrderNum=11452907
    == RemoveDuplicateAttachments ==
    -- Removing 1 Dup attachment(s) [20383603]
    -- TZipStore.Delete#1([20383603])
    -- ZipFileName: \\xxxx.yyyy.com\attachments\2020-05-07\21419503.zip
    -- TZipStore.Delete#2(20383603)
    -- ZipFileName: \\xxxx.yyyy.com\attachments\2020-05-07\21419503.zip
    == LinkAttachmentToDocument ==
    -- Pick Ticket found; linking with DocLink (PTN) = 10058190 docid = 3362739699
    -- INSERTED document attachment named "Pick_Ticket_p10058190.pdf" with document_id=3362739699
    -- TStorageMgr.StoreAttachment.FullName: \\xxxx.yyyy.com\attachments\2020-05-07\21419511.zip
    -- TStorageMgr.StoreAttachment( ZS(Not NIL), 20382703, C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.1.pdf )
    -- an = "C:\Loader\Out\20382703"
    -- FileExists(C:\Loader\Out\20382703) = YES!
    -- ZS.Add(an) --> 1 !! SUCCEEDED !!
    -- StorageMgr: storing C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.1.pdf --> 20382703 (renamed before adding)
    -- File: C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.1.pdf does NOT exist (should NOT)
    -- File: C:\Loader\Out\20382703 DOES exist (should)
    >> Processing Attachment File 2/4: "C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.2.pdf" DocLink=10052571 OrderNum=11416133
    ** No matching doc (TicketHash) found for PTN: 10052571
    ** Document with this order number already has a pick ticket (1 #row(s) found) : 11416133
    >> Processing Attachment File 3/4: "C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.3.pdf" DocLink=10060319 OrderNum=-UNKNOWN-
    ** No matching doc (TicketHash) found for PTN: 10060319
    ** Page OCR read problem: Missing orderNum
    >> Processing Attachment File 4/4: "C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.4.pdf" DocLink=10061327 OrderNum=-UNKNOWN-
    == RemoveDuplicateAttachments ==
    -- No duplicate attachments found
    == LinkAttachmentToDocument ==
    -- Pick Ticket found; linking with DocLink (PTN) = 10061327 docid = 3362739763
    -- INSERTED document attachment named "Pick_Ticket_p10061327.pdf" with document_id=3362739763
    -- TStorageMgr.StoreAttachment.FullName: \\xxxx.yyyy.com\attachments\2020-05-07\21419495.zip
    -- TStorageMgr.StoreAttachment( ZS(Not NIL), 20382705, C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.4.pdf )
    -- an = "C:\Loader\Out\20382705"
    -- FileExists(C:\Loader\Out\20382705) = YES!
    -- ZS.Add(an) --> 1 !! SUCCEEDED !!
    -- StorageMgr: storing C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.4.pdf --> 20382705 (renamed before adding)
    -- File: C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.4.pdf does NOT exist (should NOT)
    -- File: C:\Loader\Out\20382705 DOES exist (should)
    ========================
    == SetFilesAsBillable ==
    -- IDs found that are ready to bill: [8124785177]
    ===================
    == UpdateWebview ==
    -- Checking to see if there are any datafiles to sync with webview ...
    -- IDs found to sync with webview: [21419511,21419495]
    ===============================================================================
    >> Processing Complete for Aggregate file #3/37 : 0100_TEMSTOD4_%D%T_200507162634.pdf ... cleaning up ...
    ===============================================================================

    What I want to do is parse these entries out to support some kind of "dashboard". Today's log file was nearly 14k lines long, and it's pretty useless as-is: it's the same patterns over and over -- only the numbers and filenames differ, and none of them are meaningful in isolation. What's helpful is to see how they relate, and sometimes to be able to view the documents themselves.

    As you can see, the structure is fairly simple, with some variations in each block, so it's easy to use regular expressions to recognize and parse the different parts. What I'm wondering is: what might be the best approach for ingesting data like this? I can tell, for instance, when I've started processing an aggregate PDF (the whole example above covers a single aggregate file), and I can distinguish each of the different "sections" and "files" being processed. That much is easy. But would you build some kind of "parse tree" for this internally? Or would you just take the data as it's parsed and display it, with bits and pieces attached as objects for when more details are wanted?
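    One lightweight way to frame the ingest question above is a line-oriented scan: classify each line with a regex and grow a small tree (batch > aggregate file > document) as you go, keeping anomaly lines attached to the document they belong to for later drill-down. Here's a minimal sketch in Python for brevity (the same shape maps directly to Delphi records or classes); all the patterns and field names below are guesses based on the sample lines in this post, not the actual loader's code:

    ```python
    import re

    # Regexes for the line shapes shown in the sample log (assumed, not exhaustive).
    AGG_RE = re.compile(r'>> Processing Aggregate file (\d+)/(\d+): (\S+)')
    DOC_RE = re.compile(r'-- Now processing single-page PDF \((\d+) of (\d+)\)')
    PTN_RE = re.compile(r'>> Pick Ticket# found\s*:\s*(\d+)(?:\s*\(corrected from:\s*(\S+)\))?')
    ORD_RE = re.compile(r'>> Order # found\s*:\s*(\d+)(?:\s*\(corrected from:\s*(\S+)\))?')

    def parse_log(lines):
        """Build a nested batch -> aggregates -> documents structure from log lines."""
        batch = {'aggregates': []}
        agg = doc = None
        for line in lines:
            if m := AGG_RE.search(line):
                agg = {'index': int(m.group(1)), 'total': int(m.group(2)),
                       'file': m.group(3), 'documents': []}
                batch['aggregates'].append(agg)
            elif (m := DOC_RE.search(line)) and agg is not None:
                doc = {'page': int(m.group(1)), 'events': []}
                agg['documents'].append(doc)
            elif (m := PTN_RE.search(line)) and doc is not None:
                doc['ptn'] = m.group(1)
                doc['ptn_corrected'] = m.group(2)   # None if OCR read it cleanly
            elif (m := ORD_RE.search(line)) and doc is not None:
                doc['order'] = m.group(1)
                doc['order_corrected'] = m.group(2)
            elif doc is not None and line.startswith(('??', ':(', '**')):
                doc['events'].append(line.strip())  # keep anomalies for drill-down
        return batch
    ```

    The tree is cheap to build in one pass, and the dashboard can walk it directly; whether it's worth formalizing beyond nested records is exactly the judgment call being asked about here.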
    Here's the statistical summary I show at the very end:

    ===============================================
    ============= S T A T I S T I C S =============
    ===============================================
    ==  37 Aggregate PDF files
    == 900 Documents processed
    == 804 Pick Tickets identified -- 89%
    ==  51 Corrected PTNs
    == 143 Unreadable PTNs
    == 754 Order Numbers identified -- 83%
    == 246 Corrected Order Nums
    == 142 Unreadable OrdNums
    ==   0 Pick Tickets with no PTN found
    == 194 PTNs not matching any documents -- 21%
    ==  96 Forms found that were not identified as Pick Tickets -- 10%
    == 187 Docs with no matching Order Nums
    ==   2 Docs with OrdNum that already have a PT attached
    ==   6 Docs attached using OrdNums -- 0%
    ==  52 Pages rotated to get viable data -- 5%
    ===============================================

    I'm thinking it might be nice to have something similar act as the "entry point" to the dashboard: you could click on one of the lines and see the data from that perspective, drilling down into the details behind a given statistic. It might also show overlaps with other items where they're meaningful. There's also some interesting data that could be gathered by looking at this longitudinally. (Am I right in guessing that this edges into the world of "analytics"?)

    If you've got any experience with things like this, I'd be really interested in your thoughts on how you might approach it. For instance, would you store the parse tree anywhere? It takes less than 2 seconds to parse this file, so I'm not sure what might be gained from saving it. But I don't know -- that's what I'm asking about.
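    On the "would you store it anywhere" question: since the parse itself is cheap, one option is to skip persisting the tree and instead flatten each parsed document into one row of a small database (SQLite, say), appended per run. The nightly statistics and any longitudinal views then become plain SQL, and a dashboard drill-down is just a WHERE clause. A sketch of that idea; the table and column names are illustrative assumptions, not from the actual loader:

    ```python
    import sqlite3

    def store_documents(conn, run_date, docs):
        """Flatten parsed per-document records into one table, appending per run."""
        conn.execute("""CREATE TABLE IF NOT EXISTS doc (
            run_date      TEXT,     -- date of the batch run
            agg_file      TEXT,     -- aggregate PDF the page came from
            page          INTEGER,  -- page index within the aggregate
            ptn           TEXT,     -- Pick Ticket# (NULL when unreadable)
            ptn_corrected INTEGER,  -- 1 when the OCR value had to be corrected
            order_num     TEXT,     -- Order# (NULL when unreadable)
            status        TEXT      -- e.g. 'linked', 'no-match', 'unreadable'
        )""")
        conn.executemany(
            "INSERT INTO doc VALUES (?,?,?,?,?,?,?)",
            [(run_date, d['agg'], d['page'], d.get('ptn'),
              int(d.get('ptn_corrected') is not None),
              d.get('order'), d['status']) for d in docs])
        conn.commit()
    ```

    With rows stored this way, "which pages had unreadable PTNs this month" is something like `SELECT agg_file, page FROM doc WHERE ptn IS NULL AND run_date >= '2020-05-01'`, and the summary percentages fall out of COUNT(*) queries grouped by status or run_date, which is where the longitudinal "analytics" angle starts.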
  2. log file policies (posted by David Schwartz)

    I'm curious if anybody has any particular policies they follow when it comes to the use of log files. I've actually never run into any such policy anywhere I've ever worked. But we've run into a problem that is leaving us open to someone asking, quite legitimately, "What sort of policies do you guys follow when it comes to log files and data logging?"