
David Schwartz

Members
  • Content Count

    1264
  • Days Won

    26

Everything posted by David Schwartz

  1. David Schwartz

    Trying to DL image yields a web page instead

    OK, I set up a TWebBrowser in one tab and the image-processing stuff in another tab. I open the ticket and drag-n-drop the image link onto an area at the top. The app switches to the 2nd tab, and I click a Process button. That makes the browser navigate to the image, which it loads into the browser window. Perfect. Now I just need to grab it. But ... while I'm getting the height and width of the image, most of the time the image itself doesn't show up. Sometimes it does, but mostly not.

    I'm finding the image on the page using IHTMLElement2.getelementsByTagName('img') and returning the first one (since that's all there is on the page):

        img := getFirstImage;
        Image1_frame.Height := img.height+2;
        Image1_frame.Width := img.width+2;
        rnd := img as IHTMLElementRender;
        rnd.DrawToDC(Image1.Canvas.Handle);

    Image1 is aligned to Client on a panel, Image1_frame. So I set the frame's H & W, and they get set OK. But the image is usually not visible. I see that DrawToDC is deprecated, but I haven't found what to replace it with. What am I missing here?
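Two things seem worth ruling out here, offered as guesses rather than a diagnosis. First, DrawToDC may run before the image has finished loading. Second, drawing through a raw HDC obtained from Image1.Canvas.Handle bypasses the VCL's change notification, so the TImage isn't told to repaint; also, the bitmap behind a TImage's Canvas keeps whatever size the control had when that bitmap was first created. The sketch below waits for the browser's ReadyState, renders into a fresh bitmap sized to the image, and assigns it to the TImage. CaptureImage is a hypothetical name, and the code assumes IHTMLElementRender.DrawToDC accepts an HDC, as it does in the code above (some MSHTML.pas imports declare it differently).

```delphi
unit ImageCapture;

interface

uses
  Vcl.Graphics, Vcl.ExtCtrls, Vcl.Forms, SHDocVw, MSHTML;

// Hypothetical helper: renders one <img> element from the browser into a TImage.
// Assumes IHTMLElementRender.DrawToDC takes an HDC, as in the question's code.
procedure CaptureImage(Browser: TWebBrowser; const Img: IHTMLImgElement;
  Target: TImage);

implementation

procedure CaptureImage(Browser: TWebBrowser; const Img: IHTMLImgElement;
  Target: TImage);
var
  Bmp: TBitmap;
begin
  // Don't render until the browser reports the document is fully loaded.
  while Browser.ReadyState <> READYSTATE_COMPLETE do
    Application.ProcessMessages;

  Bmp := TBitmap.Create;
  try
    // Size a fresh bitmap to the image instead of reusing the bitmap that
    // Target.Canvas created earlier at the control's size at that time.
    Bmp.SetSize(Img.width, Img.height);
    (Img as IHTMLElementRender).DrawToDC(Bmp.Canvas.Handle);
    // Assign raises the picture's OnChange, so the TImage repaints itself;
    // drawing straight onto Target.Canvas.Handle skips that notification.
    Target.Picture.Assign(Bmp);
  finally
    Bmp.Free;
  end;
end;

end.
```

If the image still comes out blank after this, the load-timing guess is the more likely culprit; if it appears but clipped, the bitmap-size guess is.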
  2. David Schwartz

    Trying to DL image yields a web page instead

    hmmmm ... now that's an interesting approach ... I'll still need a way to select each individual DL link, because in this case we use the same ticket to collect these requests for a whole month, and there can be a dozen requests or more. We only want to deal with the latest ones, and there might be one, two, or even three at once.
  3. David Schwartz

    Trying to DL image yields a web page instead

    Yeah, I woke up this morning thinking it's probably looking for a login cookie. I wonder if there's some way to have the http component look up the cookie in the other browser's cache? I'm clicking and dragging from browser window A to the app, and I guess the http component looks like an unrelated browser window B. I don't really want to force the user into a second login. That said, I could add Name + Pwd fields to this little app and save them, but that's getting into a very muddy area here....

    There are a few Authorization events in the IdHttp component:

      * OnAuthorization
      * OnProxyAuthorization
      * OnSelectAuthorization
      * OnSelectProxyAuthorization

    I guess Right-Click --> Save As ... runs in the security context of browser window A, but a drag&drop runs in the context of browser window B. I wonder if I can set up a proxy of some kind? They really should be the same context.
  4. David Schwartz

    how to get a pseudo-design mode at run-time

    I'm working with TMS Web Core, which pretty much mirrors the VCL. I want to be able to switch between a "configuration" mode and a "run" mode. So say you click an Admin button somewhere and it flips a switch so that whenever you click on something it sends the object's instance pointer to a single handler rather than activating the normal mouse events. Then depending on the object selected, a property form will popup that lets you change certain properties. Some of the properties represent meta-data, and some are actual content. For example, if you click on a box that represents a video, then it will ask for a link to a video and a thumbnail. If you click on a text box, then it will ask you for the text you want to display. I don't need a full designer, as I don't want to move anything around. I just want to suppress the normal mouse events and route everything through a single handler. If this was just a normal Delphi VCL app, what's the best way to handle this at run-time? (No IDE is running, just the EXE.)
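For the plain VCL case, one way to get a pseudo-design mode without a designer is to intercept mouse messages application-wide before they reach any control. The sketch below hooks TApplicationEvents.OnMessage; while Active is True it looks up the control under the cursor with FindDragTarget, swallows the click, and hands that control to a single OnSelect handler, which could then pop up the property form. TConfigRouter, Exempt, and OnSelect are hypothetical names, and this covers the VCL only; TMS Web Core would need its own mechanism.

```delphi
unit ConfigRouter;

interface

uses
  Winapi.Windows, Winapi.Messages, System.Classes, Vcl.Controls, Vcl.AppEvnts;

type
  // Hypothetical "configuration mode" switch: while Active, left-button
  // clicks on any control are swallowed and reported to OnSelect instead.
  TConfigRouter = class
  private
    FEvents: TApplicationEvents;
    FActive: Boolean;
    FExempt: TControl;
    FOnSelect: TNotifyEvent;
    procedure AppMessage(var Msg: TMsg; var Handled: Boolean);
  public
    constructor Create;
    destructor Destroy; override;
    property Active: Boolean read FActive write FActive;
    // A control that keeps working normally, e.g. the Admin toggle button.
    property Exempt: TControl read FExempt write FExempt;
    // Sender is the control that was clicked.
    property OnSelect: TNotifyEvent read FOnSelect write FOnSelect;
  end;

implementation

constructor TConfigRouter.Create;
begin
  inherited Create;
  FEvents := TApplicationEvents.Create(nil);
  FEvents.OnMessage := AppMessage;
end;

destructor TConfigRouter.Destroy;
begin
  FEvents.Free;
  inherited;
end;

procedure TConfigRouter.AppMessage(var Msg: TMsg; var Handled: Boolean);
var
  Target: TControl;
begin
  if not FActive then
    Exit;
  case Msg.message of
    WM_LBUTTONDOWN, WM_LBUTTONUP, WM_LBUTTONDBLCLK:
      begin
        // Msg.pt is in screen coordinates; FindDragTarget also returns
        // non-windowed controls such as TLabel and TImage.
        Target := FindDragTarget(Msg.pt, True);
        if (Target = nil) or (Target = FExempt) then
          Exit;
        Handled := True; // the control itself never sees the click
        if (Msg.message = WM_LBUTTONDOWN) and Assigned(FOnSelect) then
          FOnSelect(Target);
      end;
  end;
end;

end.
```

Wiring it up from the main form might look like Router.OnSelect := EditProperties; Router.Exempt := btnAdmin; Router.Active := True; (EditProperties and btnAdmin being hypothetical). Only left-button clicks are intercepted here, so keyboard input and right-clicks still reach the controls.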
  5. David Schwartz

    how to get a pseudo-design mode at run-time

    Hmmm, gotta see how that's done in Web Core.
  6. David Schwartz

    Trying to DL image yields a web page instead

    You missed the part where I said this HTML page is a LOGIN page. The only image on it is the company's logo. That's not what I'm looking for. The URL is pointing to an IMAGE FILE. Not a login page. The MIME type is "image/png", not "text". And in a half-dozen questions on Stack Overflow where people asked how to DL specific files, this is the approach that was recommended. Not one of them even hinted that a TWebBrowser is needed. Right-click --> Save As ... actually saves a PNG file. Not an HTML page. Always. The image is in a download folder. Not on a LOGIN page.
  7. I've never had a problem installing a bunch of component libs and then restarting Delphi afterwards. I don't understand why so many libs that GetIt installs require Delphi to be restarted after each one. And after it restarts, it doesn't even have the courtesy of leaving you back where you were when it initiated the restart.
  8. David Schwartz

    Why does GetIt require Delphi to restart so often?

    this makes sense if I'm going to USE what I just installed. But if I'm in the process of installing several components, then there's no need to restart, because none of them depend on or refer to the others. They're all self-contained. It works fine to restart after you've installed the whole batch. At least, I've never had a problem, except for some unusually complex installs.
  9. What's your take on whether an FE and a BE should be accessible from the same page / form or completely separately? I've seen desktop apps where the Admin / setup stuff is a totally separate app, and apps where there's a Setup / Config / Options link in a menu. Wordpress is infamous for its "Meta" section with a "Login" link to get to the Admin dashboard. You can't separate them even if you wanted to. SaaS solutions often take you to an Admin area that's separate from where your Users will go, and that generates the User's view elsewhere (frequently a subdomain). I've never given much thought to this. But with things like TMS WebCore, IntraWeb, UniGui, and others for building web apps, now I'm curious.
  10. David Schwartz

    Threading question

    I'm working with some code that was written many years ago, and it uses threading to circumvent delays that tended to occur back when computers were slower and it took a while for the UI to update. For the most part, these apps were headless -- just stuff that ran on a server and spat out log files to document their travails.

    The thing is, the log files are too sparse; there's not enough info in them to support any sort of auditing. So now we're finding that the program has been failing to update things in the DB and side storage for years and nobody knew. I'm trying to add code to capture more detail in the log, but I'm getting random AVs that I suspect are due to the threading not properly synchronizing with the main form:

        Access violation at address 0040DD41 in module 'DocLink.exe'. Read of address 00000001

    The code uses this to send data from the thread to the main form:

        procedure TSequence.StatusOut(const stat: string);
        begin
          if (MainWinHandle <> 0) then
            Windows.SendMessage(MainWinHandle, WM_STATUSOUT, LongInt(PChar(stat)), 0)
        end;
        //----------------------------------------------
        // this seems to be the other side in the main form:
        procedure WMStatusOut(var Message: TMessage); message WM_STATUSOUT;
        . . .
        procedure StatusOut(const stat: string);
        begin
          LogWrite(stat, true, true);
          FrmMain.memStatus.Lines.Add(stat);
          Application.ProcessMessages;
        end;

        procedure TfrmMain.WMStatusOut(var Message: TMessage);
        begin
          StatusOut(PChar(Message.WParam)); // local StatusOut procedure
          inherited;
        end;

    The main form stuffs it into a TMemo (memStatus) and also sends it to a logger. I've done some research, and it seems that SendMessage should be sufficient. It's OK for low volumes of traffic, but as I send more data, the threads start throwing these AVs.

    Personally, I don't see any need for threading today, but it is what it is, and I'm not about to redo everything to remove the threading. Any suggestions on how to deal with this?

    BTW, I'm only guessing that this is the problem. We've got a version of this code that has been running for 2-1/2 years with none of these errors showing up. It's only now, as I'm trying to capture more details in the log file, that I'm getting these errors.
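One commonly suggested alternative is to stop passing a raw PChar through SendMessage and let TThread.Queue marshal each status line to the main thread instead. A plausible (unconfirmed) suspect in the code above is the Application.ProcessMessages call inside the WM_STATUSOUT handler: it re-enters the message loop while the worker is still blocked in SendMessage, so another thread's WM_STATUSOUT can be handled in the middle of the current one. The console sketch below assumes a Delphi version with anonymous methods and TThread.Finished; TWorker and Received are hypothetical stand-ins for TSequence and memStatus/LogWrite. In a VCL app the main message loop calls CheckSynchronize for you.

```delphi
program QueueStatusDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

type
  // Hypothetical stand-in for TSequence: a worker that reports status lines.
  TWorker = class(TThread)
  private
    FId: Integer;
  protected
    procedure Execute; override;
  public
    constructor Create(AId: Integer);
  end;

var
  Received: TStringList; // stands in for memStatus and the logger

// Sketch of a StatusOut replacement. TThread.Queue returns immediately and
// later runs the anonymous method on the main thread. The method captures
// its own copy of the text, so no raw PChar crosses threads.
procedure StatusOut(const Stat: string);
var
  Line: string;
begin
  Line := Stat;
  TThread.Queue(nil,
    procedure
    begin
      Received.Add(Line); // main thread: safe to touch the memo/logger here
    end);
end;

constructor TWorker.Create(AId: Integer);
begin
  FId := AId;
  inherited Create(False); // the thread starts after the constructor returns
end;

procedure TWorker.Execute;
var
  I: Integer;
begin
  for I := 1 to 250 do
    StatusOut(Format('worker %d: line %d', [FId, I]));
end;

const
  WorkerCount = 4;

var
  Workers: array[0..WorkerCount - 1] of TWorker;
  I: Integer;
  AllDone: Boolean;
begin
  Received := TStringList.Create;
  try
    for I := 0 to WorkerCount - 1 do
      Workers[I] := TWorker.Create(I);
    // A console app has no VCL message loop, so pump the queue by hand.
    repeat
      CheckSynchronize(10);
      AllDone := True;
      for I := 0 to WorkerCount - 1 do
        AllDone := AllDone and Workers[I].Finished;
    until AllDone;
    while CheckSynchronize do
      ; // drain anything queued just before the last worker finished
    for I := 0 to WorkerCount - 1 do
      Workers[I].Free;
    Writeln(Received.Count, ' status lines received'); // prints "1000 status lines received"
  finally
    Received.Free;
  end;
end.
```

Queue doesn't wait, so a burst of log lines no longer blocks the workers. If a line must be on screen before the worker continues, TThread.Synchronize is the blocking counterpart.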
  11. David Schwartz

    Front-end vs Back-end question

    you could answer nearly every question posted here with "it depends" and stop right there.
  12. I have mixed feelings about this stuff. One big one is that Borland / Inprise really dropped the ball back in the D6/D7 years when they thought it was a Good Idea to hitch their wagon to .NET and everything Microsoft. They made some improvements in the language that left a lot of customers in the dust holding a bag of rocks. Here we are today, and they're complaining that these same users STILL don't think it's worthwhile to invest in moving past D6/D7. Sheesh. Developers cost 2x-3x more today than they did back then, and if it didn't make financial sense to upgrade back then, it surely makes even less sense today. Embt is not making any more friends by complaining about the resources these legacy clients are costing them. The problem isn't the compiler -- it's the 3rd-party components like Dream Components that died on the vine and couldn't easily move forward. If they want to fix the problem, Embt should consider buying the rights to these old component libs and investing their own resources in making them work on the latest versions of Delphi. Add them to GetIt and give people a legitimate upgrade path. Whoa! What a novel idea! Even so, a lot of folks won't consider upgrading because it's harder than ever to find developers with solid Delphi skills today. (I think it's easier to find COBOL programmers today than Delphi folks!) Another option is to have a separate maintenance program for legacy products. I have not found a single job in the past decade doing NEW Delphi work -- it's all supporting LEGACY apps that were written in the D4-D7 years. Maybe they're using newer versions of the compiler, but it seems silly to me that the company is COMPLAINING about the fact that all of these old legacy clients are refusing to pay their ridiculous maintenance fees to stay exactly where they are. 
It's nice that Embt wants people to move forward, but until more jobs start showing up for NEW DELPHI PROJECTS, they're doing little more than Sisyphus pushing a rock up a hill while complaining about the effort involved. They (previous Mgt) created this problem, but they don't seem to want to fix it. The world is moving to Open Source Software. Delphi is one of the few remaining products that's not just NOT OSS, but VERY EXPENSIVE for commercial use. Microsoft subsidizes the crap out of their dev tools, as do others like IBM and Oracle. I think the best thing for Delphi would be for Embt to push to get Delphi acquired by a company that can afford to move it in the direction of OSS by subsidizing it from other product revenues. Instead, they keep raising the costs to customers who are mostly using it to MAINTAIN OLD CODE. I'm working on my 4th or 5th gig since 2009 that's maintaining code written prior to D2007, and it hasn't changed at all. The company has NO PLANS FOR FURTHER DELPHI DEVELOPMENT beyond maintaining their legacy code. They pay for maintenance updates, but so what? A couple of places I worked are extremely hesitant to allow any sort of large-scale refactoring -- they say if they wanted to invest that amount of work, they'd just as soon switch to rebuilding the thing from scratch in C#/.NET or something else -- not Delphi. WHERE ARE THE NEW PROJECTS THAT ARE CREATING MORE DELPHI JOBS? This is a MARKETING PROBLEM for Embt. I don't think they have any right to complain when they have steadfastly maintained a posture that has gotten them exactly nowhere in the market. There's no evidence that their products are being used more for NEW product development than for supporting LEGACY projects. Where's the beef? Or rather, where's the NEW work? (And don't respond with, "well, we're doing new stuff!" If you are, say how many devs you've hired to help with the NEW stuff vs. to maintain the OLD code. 
Rather, show me, say, 10 job postings made to any of the popular job boards that are legit posts to hire people for NEW DELPHI-based projects. Nobody hires new devs for new Delphi work -- it's a reward given to long-time employees. The new hires are almost always for back-filling open spots maintaining the old code. We've lost 3 people in the past 6 months who worked with Delphi, and I'm the only new hire to replace one. Now Mgt is running around like chickens with their heads cut off because they failed to plan for this. Two of these guys left to work on stuff that's "more fun": one on something non-Delphi, and one on another legacy project, but one with some slow growth of new features. EVERYTHING I've seen in the past decade, or been contacted by recruiters about, has been MAINTAINING LEGACY CODE. I've found NO NEW WORK on Delphi, especially within 500 miles of where I live.)
  13. David Schwartz

    Front-end vs Back-end question

    depends on what? I'm curious what folks think when it comes to web apps vs. desktop apps.
  14. David Schwartz

    app logins

    I'd like to hear people's thoughts about this topic. I'm working with TMS WebCore and their MyCloudData to prototype something. There's a kind of utopian idea that you can "build once, deploy anywhere", but there's a fly in the ointment that nobody seems to talk about. It seems to me that web apps come in two flavors: those that are open and accessible to all and don't tend to save data; and everything else, which lets you do stuff and save data across some notion of "sessions". The former might deliver some kind of utility, like prettyprinting code or translating data from one format to another. The latter is what I'll generically refer to here as a "membership site". (Perhaps another term is more appropriate; this is just how I think of them.) Historically speaking, desktop apps had no form of "login" -- they relied on the fact that there was a login on the computer, and assumed anybody who could get on the computer was permitted to access the software on it. This assumption still lives on today on desktops as well as mobile devices. Which means you cannot simply take a desktop app that saves user data and drop it on a website to turn it into a web-based app. A lot of existing apps DO, in fact, offer, if not require, a login, and there are a lot of reasons for that besides keeping your saved data separate from others'. One big reason is to access walled-off services that require a paid subscription, for example. (At the very least, a registration is required in any case.) The thing is, the front-end or web app could use something like OAuth2 to verify your login. If it's simply to gain access to some stuff kept behind a paywall, that's fine. But what if it uses your login to partition your data from everybody else's? Back-end services typically have a login; in many cases, they're used by the developer or vendor to ensure nobody else can use the resource(s). 
For example, if my app uses SQL Server or MySQL, I have a login that all of my apps probably use to access my DBs. They may all share the same credentials. But they're MY credentials, as the developer. What about the users? How deep do you push the use of user credentials? The user could log in just to prove they have a current account, then everything else could be done with MY (developer's) credentials. If you need HIPAA or PCI compliance, tho, I'm not sure that would fly. I'm wondering about this b/c I work in an environment now where user credentials go all the way down to the bedrock for desktop stuff. I'm not sure about our web tools, except they do require logins that are integrated into our single-sign-on protocols. I can see that a lot of services my software might access do not need to be partitioned for use by each user with their own credentials. But, in some cases, they might. So let's say you have an app that requires a login to access and maintain some personal (but not very sensitive) data; then it can drop a cookie (in the web-app case) that lasts, say, a month. (I see this on lots of my phone apps.) The login controls access to some common data as well as a limited set of personal data. This isn't how desktop apps normally work -- Windows or MacOS or *nix logins run the show in most cases. I'm not sure about mobile apps. Web apps designed like Delphi apps are still rather new. (Any IntraWeb users wanna chime in here?) But you don't design php or Wordpress sites as if they're Delphi apps. (In Wordpress, everybody gets a login, but the underlying resources all rely on a common access login. Strangely, it's common for membership sites that run inside of Wordpress to have a completely separate way of managing users rather than using the logic built into Wordpress. I think that's because the membership sites want more meta-data than WP can collect on its users.) 
What do you do when you can build web apps in Delphi that look and feel more like normal Delphi desktop apps? (I'm not saying they MUST or even should, only that they can.) Have you given this any thought? If so, I'd love to hear your ideas.
  15. David Schwartz

    app logins

    @stijnsanders and @Kas Ob. -- I appreciate the depth of your replies, but in this case there's nothing really on the server that's personal other than when a user subscribed, for how long, if they're still active, and their renewal rate, along with their first name, pwd hash, and email address. I suppose I could encrypt that stuff, but it's not particularly sensitive. However, the front-end needs to read some of it -- in particular, the FE needs to know if a visitor is a currently active subscriber, and possibly if their subscription is close to expiring so they can renew it. Is any of this a problem with GDPR? I do like the idea of validating using a 3rd-party login like FB, Twitter, etc., as an option. I'll look into that. I imagine it puts the onus of user validation on these other systems, right?
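The front-end check described here needs very little data: the subscription's end date, or just a state derived from it. A minimal sketch, assuming a hypothetical 14-day renewal window; it doesn't answer the GDPR question, it only shows how small the FE's view of the subscriber record can be. GetSubscriptionState and RenewalWarningDays are illustrative names.

```delphi
program SubscriptionCheck;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.DateUtils;

type
  TSubscriptionState = (ssLapsed, ssExpiringSoon, ssActive);

const
  // Hypothetical window for showing a "renew now" prompt.
  RenewalWarningDays = 14;
  StateNames: array[TSubscriptionState] of string =
    ('lapsed', 'expiring soon', 'active');

// Only the end date is needed to decide what the front end shows;
// whole days are compared, so the time of day doesn't matter.
function GetSubscriptionState(const ExpiresOn, Today: TDateTime): TSubscriptionState;
begin
  if DateOf(Today) > DateOf(ExpiresOn) then
    Result := ssLapsed
  else if DaysBetween(DateOf(Today), DateOf(ExpiresOn)) <= RenewalWarningDays then
    Result := ssExpiringSoon
  else
    Result := ssActive;
end;

var
  Today: TDateTime;
begin
  Today := EncodeDate(2020, 5, 7);
  Writeln(StateNames[GetSubscriptionState(EncodeDate(2020, 5, 1), Today)]);  // lapsed
  Writeln(StateNames[GetSubscriptionState(EncodeDate(2020, 5, 15), Today)]); // expiring soon
  Writeln(StateNames[GetSubscriptionState(EncodeDate(2020, 8, 1), Today)]);  // active
end.
```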
  16. David Schwartz

    app logins

    This is what Wordpress does ... which I addressed. Seems in your hurry, you have nothing to add. Better off just moving on next time.
  17. David Schwartz

    Record and process audio

    Mitov.com has a bunch of things you can use. You want AudioLab. It works great.
  18. While your approach makes a little sense, many of us learned very painfully that centralized collections of things become a huge, ugly mess over time. If you're using OOP, the notion of "encapsulation" carries over to other things besides classes. Classes are contained in things, and those things are contained in things, and so forth. Put stuff in the unit that manages it. Use some form of dependency injection to pass objects into others that need them, and Factories to request things you need from common locations.
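A minimal illustration of the last two sentences: the consumer receives its dependency through its constructor (constructor injection), and a small factory is the one place that decides which concrete class gets handed out. ILogger, TBatchLoader, TListLogger, and TLoggerFactory are hypothetical names chosen for the example, not part of any library.

```delphi
program InjectionSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

type
  ILogger = interface
    ['{8E2B1C44-3D1A-4F7B-9C1E-2A6D5B7F0A11}']
    procedure Log(const Msg: string);
  end;

  // Two concrete loggers; the consumer below never names either of them.
  TConsoleLogger = class(TInterfacedObject, ILogger)
  public
    procedure Log(const Msg: string);
  end;

  TListLogger = class(TInterfacedObject, ILogger)
  private
    FLines: TStrings;
  public
    constructor Create(ALines: TStrings);
    procedure Log(const Msg: string);
  end;

  // Consumer: its dependency is injected instead of pulled from a
  // central "globals" unit.
  TBatchLoader = class
  private
    FLog: ILogger;
  public
    constructor Create(const ALog: ILogger);
    procedure Load(const FileName: string);
  end;

  // Factory: the single place that decides which logger the app gets.
  TLoggerFactory = class
  public
    class function NewLogger: ILogger;
  end;

procedure TConsoleLogger.Log(const Msg: string);
begin
  Writeln(Msg);
end;

constructor TListLogger.Create(ALines: TStrings);
begin
  inherited Create;
  FLines := ALines;
end;

procedure TListLogger.Log(const Msg: string);
begin
  FLines.Add(Msg);
end;

constructor TBatchLoader.Create(const ALog: ILogger);
begin
  inherited Create;
  FLog := ALog;
end;

procedure TBatchLoader.Load(const FileName: string);
begin
  FLog.Log('Loading ' + FileName);
end;

class function TLoggerFactory.NewLogger: ILogger;
begin
  Result := TConsoleLogger.Create;
end;

var
  Loader: TBatchLoader;
begin
  Loader := TBatchLoader.Create(TLoggerFactory.NewLogger);
  try
    Loader.Load('batch.pdf'); // prints "Loading batch.pdf"
  finally
    Loader.Free;
  end;
end.
```

Because TBatchLoader only knows ILogger, a test can inject a TListLogger that collects lines into a TStringList instead of writing to the console, without touching any shared state.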
  19. David Schwartz

    How to flip image taken from front camera

    There's a setting somewhere that tells the camera to "mirror" selfie images. For whatever reason it's set OFF by default, so most people end up with their selfies backwards. I don't know why this is so frigging confusing to people, or why the folks who make the camera software don't make it easier to flip images so they look correct without having to open your Gallery and edit the images one-by-one.
  20. I have Raize Components (errr ... Konopka Bonus thingies, or whatever they're called now) installed, and hitting F1 for help doesn't do anything useful (in Tokyo 10.2.3). I know Raize Components had lots of great help files. So I'm curious whether the GetIt facility just chucks them aside, or whether they're present but not installed. I'm not sure how to check.
  21. David Schwartz

    Do GetIt libs install any Help files?

    not here.... I did find the help files, but they're pretty out-of-date. (Look at TRzMRUComboBox: only 2 of its numerous events are documented. In particular, OnCloseup and OnSelect are missing.) Also, TRzRegIniFile is missing a ton of stuff.
  22. David Schwartz

    Do GetIt libs install any Help files?

    where do I find them? And then install them into the Help system?
  23. I'm curious whether anything has been said about language enhancements in 10.4. I tend to avoid language enhancements, but I'd love to see nullables as well as ?? and ??= operators, and even a ternary ( x ? y : z ) operator, especially one that works with nullables. I've seen that nullables are in the pipeline, but the overviews are getting so high-level that it's nearly impossible to discern any specifics.
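Until something like that ships, a hand-rolled generic record gets part of the way there. The sketch below is a hypothetical TNullable<T> (similar in spirit to Spring4D's Nullable<T>, but not that API): OrElse stands in for ??, and an explicit HasValue test stands in for ??=. For the ternary, System.Math.IfThen is the usual stand-in, but it's an ordinary function, so both branches are evaluated before it picks one.

```delphi
program NullableSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Math;

type
  // Hypothetical nullable wrapper, not an RTL type.
  TNullable<T> = record
  private
    FValue: T;
    // Strings are always initialized to '' by the compiler, even in
    // uninitialized locals, so an empty flag reliably means "no value".
    FFlag: string;
    function GetHasValue: Boolean;
    function GetValue: T;
  public
    constructor Create(const AValue: T);
    class function Empty: TNullable<T>; static;
    class operator Implicit(const AValue: T): TNullable<T>;
    function OrElse(const Fallback: T): T; // emulates "X ?? Fallback"
    property HasValue: Boolean read GetHasValue;
    property Value: T read GetValue;
  end;

constructor TNullable<T>.Create(const AValue: T);
begin
  FValue := AValue;
  FFlag := '*';
end;

class function TNullable<T>.Empty: TNullable<T>;
begin
  Result.FValue := Default(T);
  Result.FFlag := '';
end;

class operator TNullable<T>.Implicit(const AValue: T): TNullable<T>;
begin
  Result := TNullable<T>.Create(AValue);
end;

function TNullable<T>.GetHasValue: Boolean;
begin
  Result := FFlag <> '';
end;

function TNullable<T>.GetValue: T;
begin
  if not HasValue then
    raise EInvalidOpException.Create('Nullable has no value');
  Result := FValue;
end;

function TNullable<T>.OrElse(const Fallback: T): T;
begin
  if HasValue then
    Result := FValue
  else
    Result := Fallback;
end;

var
  Discount: TNullable<Integer>;
begin
  Discount := TNullable<Integer>.Empty;
  Writeln(Discount.OrElse(0));                // 0   (Discount ?? 0)
  if not Discount.HasValue then               // Discount ??= 5
    Discount := 5;
  Writeln(Discount.OrElse(0));                // 5
  // Ternary stand-in: IfThen evaluates BOTH branches before choosing.
  Writeln(IfThen(Discount.Value > 3, 1, 2));  // 1
end.
```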
  24. David Schwartz

    Why upgrade?

    If you're working with 32-bit apps and they work perfectly, and you're not using any of the new stuff being added to the Delphi ecosystem at its fringes, and upper management asks why you need to upgrade everything to work with the latest Delphi release, what do you tell them? The code here was built in 2003 - 2011. It's stable, bug-free (aside from an occasional memory leak), and gains nothing by being rebuilt on a newer version of the compiler. They really don't want to do any more work in Delphi, other than necessary maintenance. Their intention is to move all software over to C#. I don't know why; they just think it's a "better platform", and I don't really have anything to argue with. It's the same story I've been hearing for a decade now from pretty much everyplace I've worked.
  25. I've got a log file that I want to parse so I can use the data behind some kind of "dashboard". I don't have anything specific in mind yet for the dashboard, except maybe an approach I describe at the end of this post. Let me briefly describe what's going on first.

    Basically, the client has a bunch of offices around the country, and at the end of the day someone drops a pile of forms into a scanner, which scans them all into a single PDF file. I call this an "aggregate PDF file." These aggregate files are uploaded to an FTP area for us. We get anywhere from 10-40 of them to process daily, and they contain anywhere from one to 200+ scanned forms. It's a batch process.

    I was assigned to maintain the code that performs that batch process. When I took this on, there were some problems, but they were totally invisible, so nobody knew about them. The log file was only documenting a few errors, and it turned out there were other errors that weren't being detected or reported. So my first task was to enhance the log file to the point where I could use it to audit the results, and ultimately track down errors. I've tracked down most of the errors at this point, and I've found structural issues and other things that have been around for ages that nobody considers "problems", but that's a story for another day.

    Here's a snippet of the log file showing what's recorded for a typical aggregate file. This is one of 37 aggregate PDFs in this particular batch, and it contains four documents. One of them appears to be junk, which is common. (Some offices drop dozens of the wrong forms into the scanner; we just ignore them.)

    Just FYI, there's an OCR process that runs on each document where we try to extract a couple of numbers; they show up here as DocLink / Pick Ticket# / PTN, and OrderNum / Order#. The database is queried for a corresponding invoice that refers to one of these so they can be matched up. In a lot of cases, there isn't one. 
(Another issue that nobody sees as a problem.) Note that I separate different sections with ========================== flags. Don't look too closely and try to make sense of the details, as I've doctored it a bit to show some variations.

        ===============================================================================
        >> Processing Aggregate file 3/37: 0100_TEMSTOD4_%D%T_200507162634.pdf
        >> 4 Files extracted
        -- =====================================
        -- Now processing single-page PDF (1 of 4)
        -- PDF file: 0100_TEMSTOD4_%D%T_200507162634.1.pdf
        >> Pick Ticket# found : 10058190
        >> Order # found : 11452907 (corrected from: 1[452907)
        =====================================
        -- Now processing single-page PDF (2 of 4)
        -- PDF file: 0100_TEMSTOD4_%D%T_200507162634.2.pdf
        >> Pick Ticket# found : 10052571 (corrected from: |0052571)
        >> Order # found : 11416133 (corrected from: I1416133)
        =====================================
        -- Now processing single-page PDF (3 of 4)
        -- PDF file: 0100_TEMSTOD4_%D%T_200507162634.3.pdf
        ?? didn't find a PTN or OrdNum where we expected them to be ()
        Maybe it's upside-down ... let's rotate it and try again
        :( Nothing useful here. Moving on...
        =====================================
        -- Now processing single-page PDF (4 of 4)
        -- PDF file: 0100_TEMSTOD4_%D%T_200507162634.4.pdf
        ?? didn't find a PTN or OrdNum where we expected them to be ()
        Maybe it's upside-down ... let's rotate it and try again
        >> Pick Ticket# found : 10061327
        >> Order # found : 11547908
        ============================
        == CombineMultiPageOrders ==
        >> RelatesToPage: first-pg=[0800_LVCopierBW05_08_2002_59_44.6.pdf] this-pg=[0800_LVCopierBW05_08_2002_59_44.7.pdf]
        >> RelatesToPage: first-pg=[0800_LVCopierBW05_08_2002_59_44.8.pdf] this-pg=[0800_LVCopierBW05_08_2002_59_44.9.pdf]
        -- Deleting file #8: C:\Loader\Out\0800_LVCopierBW05_08_2002_59_44.9.pdf
        -- Deleting file #7: C:\Loader\Out\0800_LVCopierBW05_08_2002_59_44.8.pdf
        -- Deleting file #6: C:\Loader\Out\0800_LVCopierBW05_08_2002_59_44.7.pdf
        -- Deleting file #5: C:\Loader\Out\0800_LVCopierBW05_08_2002_59_44.6.pdf
        >> 2 pages in C:\Loader\Out\2_1_0800_LVCopierBW05_08_2002_59_44.6.pdf | Doclink=10062731 OrderNum=11561321
        >> 2 pages in C:\Loader\Out\2_1_0800_LVCopierBW05_08_2002_59_44.8.pdf | Doclink=10060874 OrderNum=11558566
        ================================
        == ProcessDocumentAttachments ==
        >> Processing Attachment File 1/4: "C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.1.pdf" DocLink=10058190 OrderNum=11452907
        == RemoveDuplicateAttachments ==
        -- Removing 1 Dup attachment(s) [20383603]
        -- TZipStore.Delete#1([20383603])
        -- ZipFileName: \\xxxx.yyyy.com\attachments\2020-05-07\21419503.zip
        -- TZipStore.Delete#2(20383603)
        -- ZipFileName: \\xxxx.yyyy.com\attachments\2020-05-07\21419503.zip
        == LinkAttachmentToDocument ==
        -- Pick Ticket found; linking with DocLink (PTN) = 10058190 docid = 3362739699
        -- INSERTED document attachment named "Pick_Ticket_p10058190.pdf" with document_id=3362739699
        -- TStorageMgr.StoreAttachment.FullName: \\xxxx.yyyy.com\attachments\2020-05-07\21419511.zip
        -- TStorageMgr.StoreAttachment( ZS(Not NIL), 20382703, C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.1.pdf )
        -- an = "C:\Loader\Out\20382703"
        -- FileExists(C:\Loader\Out\20382703) = YES!
        -- ZS.Add(an) --> 1 !! SUCCEEDED !!
        -- StorageMgr: storing C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.1.pdf --> 20382703 (renamed before adding)
        -- File: C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.1.pdf does NOT exist (should NOT)
        -- File: C:\Loader\Out\20382703 DOES exist (should)
        >> Processing Attachment File 2/4: "C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.2.pdf" DocLink=10052571 OrderNum=11416133
        ** No matching doc (TicketHash) found for PTN: 10052571
        ** Document with this order number already has a pick ticket (1 #row(s) found) : 11416133
        >> Processing Attachment File 3/4: "C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.3.pdf" DocLink=10060319 OrderNum=-UNKNOWN-
        ** No matching doc (TicketHash) found for PTN: 10060319
        ** Page OCR read problem: Missing orderNum
        >> Processing Attachment File 4/4: "C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.4.pdf" DocLink=10061327 OrderNum=-UNKNOWN-
        == RemoveDuplicateAttachments ==
        -- No duplicate attachments found
        == LinkAttachmentToDocument ==
        -- Pick Ticket found; linking with DocLink (PTN) = 10061327 docid = 3362739763
        -- INSERTED document attachment named "Pick_Ticket_p10061327.pdf" with document_id=3362739763
        -- TStorageMgr.StoreAttachment.FullName: \\xxxx.yyyy.com\attachments\2020-05-07\21419495.zip
        -- TStorageMgr.StoreAttachment( ZS(Not NIL), 20382705, C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.4.pdf )
        -- an = "C:\Loader\Out\20382705"
        -- FileExists(C:\Loader\Out\20382705) = YES!
        -- ZS.Add(an) --> 1 !! SUCCEEDED !!
        -- StorageMgr: storing C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.4.pdf --> 20382705 (renamed before adding)
        -- File: C:\Loader\Out\0100_TEMSTOD4_%D%T_200507162634.4.pdf does NOT exist (should NOT)
        -- File: C:\Loader\Out\20382705 DOES exist (should)
        ========================
        == SetFilesAsBillable ==
        -- IDs found that are ready to bill: [8124785177]
        ===================
        == UpdateWebview ==
        -- Checking to see if there are any datafiles to sync with webview ...
        -- IDs found to sync with webview: [21419511,21419495]
        ===============================================================================
        >> Processing Complete for Aggregate file #3/37 : 0100_TEMSTOD4_%D%T_200507162634.pdf ... cleaning up ...
        ===============================================================================

    What I want to do is parse these out to support some kind of "dashboard". Today's log file was nearly 14k lines long, and it's pretty useless as-is. It all looks like the same patterns over and over -- only the numbers and filenames differ, and none of them are really meaningful on their own. What's helpful is to see how they relate, and sometimes to be able to view the documents themselves.

    As you can see, the structure is fairly simple, with some variations in each block. It's easy to use regular expressions to recognize and parse the different parts. What I'm wondering is: what might be the best approach to ingesting data like this? I can tell, for instance, when I've started processing an aggregate PDF (the whole example above is one of these), and I can distinguish each of the different "sections" and "files" that are being processed. This much is easy. But would you build some kind of "parse tree" for this internally? Or would you just take the data as it's parsed and display it, with bits and pieces attached as objects for when more details are wanted?
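One way to answer the parse-tree question is a single pass that builds a small two-level tree (aggregate file, then the single-page documents extracted from it) and hangs the OCR results on each document node. A sketch, assuming each log entry sits on its own line in the real file; the regular expressions are guesses derived from the sample log, and TDocNode, TAggregateNode, Grab, and ParseLog are hypothetical names. The later sections (attachments, billing, webview sync) could be attached the same way with more patterns.

```delphi
program LogTreeSketch;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes, System.RegularExpressions,
  System.Generics.Collections;

const
  // Patterns are guesses based on the sample log; adjust to the real format.
  ReAggregate = '^>> Processing Aggregate file \d+/\d+: (.+)$';
  RePdf = '^-- PDF file: (.+)$';
  RePtn = '^>> Pick Ticket# found : (\d+)';
  ReOrder = '^>> Order # found : (\d+)';
  ReRotate = 'Maybe it''s upside-down';

type
  TDocNode = class // one single-page PDF split out of an aggregate file
  public
    FileName, PickTicket, OrderNum: string; // '' means OCR found nothing
    Rotated: Boolean;
  end;
  TAggregateNode = class(TObjectList<TDocNode>) // owns (and frees) its documents
  public
    FileName: string;
  end;

// True if S matches Pattern; Value receives capture group 1, if any.
function Grab(const S, Pattern: string; out Value: string): Boolean;
var
  M: TMatch;
begin
  M := TRegEx.Match(S, Pattern);
  Result := M.Success;
  Value := '';
  if Result and (M.Groups.Count > 1) then
    Value := M.Groups[1].Value;
end;

// One pass: an aggregate header opens a node, a "PDF file" line opens a
// document under it, and detail lines fill in the current document.
function ParseLog(Lines: TStrings): TObjectList<TAggregateNode>;
var
  Line, S, V: string;
  Agg: TAggregateNode;
  Doc: TDocNode;
begin
  Result := TObjectList<TAggregateNode>.Create;
  Agg := nil; Doc := nil;
  for Line in Lines do
  begin
    S := Trim(Line);
    if Grab(S, ReAggregate, V) then
    begin
      Agg := TAggregateNode.Create;
      Agg.FileName := V;
      Result.Add(Agg);
      Doc := nil;
    end
    else if (Agg <> nil) and Grab(S, RePdf, V) then
    begin
      Doc := TDocNode.Create;
      Doc.FileName := V;
      Agg.Add(Doc);
    end
    else if Doc <> nil then
    begin
      if Grab(S, RePtn, V) then Doc.PickTicket := V
      else if Grab(S, ReOrder, V) then Doc.OrderNum := V
      else if Grab(S, ReRotate, V) then Doc.Rotated := True;
    end;
  end;
end;

var
  Log: TStringList;
  Tree: TObjectList<TAggregateNode>;
  Doc: TDocNode;
begin
  Log := TStringList.Create;
  Tree := nil;
  try
    // A few lines from the sample; in practice, Log.LoadFromFile(...).
    Log.Add('>> Processing Aggregate file 3/37: 0100_TEMSTOD4_%D%T_200507162634.pdf');
    Log.Add('-- PDF file: 0100_TEMSTOD4_%D%T_200507162634.1.pdf');
    Log.Add('>> Pick Ticket# found : 10058190');
    Log.Add('>> Order # found : 11452907 (corrected from: 1[452907)');
    Log.Add('-- PDF file: 0100_TEMSTOD4_%D%T_200507162634.4.pdf');
    Log.Add('Maybe it''s upside-down ... let''s rotate it and try again');
    Log.Add('>> Pick Ticket# found : 10061327');
    Tree := ParseLog(Log);
    for Doc in Tree[0] do
      Writeln(Doc.FileName, ' PTN=', Doc.PickTicket, ' Order=', Doc.OrderNum,
        ' Rotated=', BoolToStr(Doc.Rotated, True));
  finally
    Tree.Free;
    Log.Free;
  end;
end.
```

Since a full file parses in under 2 seconds, re-parsing on demand is probably fine for a single day's view; persisting the tree mostly pays off for the longitudinal view, e.g. one row per document per day in a table.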
    Here's a statistical summary I show at the very end:

        ===============================================
        ============= S T A T I S T I C S =============
        ===============================================
        == 37 Aggregate PDF files
        == 900 Documents processed
        == 804 Pick Tickets identified -- 89%
        == 51 Corrected PTNs
        == 143 Unreadable PTNs
        == 754 Order Numbers identified -- 83%
        == 246 Corrected Order Nums
        == 142 Unreadable OrdNums
        == 0 Pick Tickets with no PTN found
        == 194 PTNs not matching any documents -- 21%
        == 96 Forms found that were not identified as Pick Tickets -- 10%
        == 187 Docs with no matching Order Nums
        == 2 Docs with OrdNum that already have a PT attached
        == 6 Docs attached using OrdNums -- 0%
        == 52 Pages rotated to get viable data -- 5%
        ===============================================

    I'm thinking it might be nice to have something similar act as the "entry point" to the dashboard, letting you click on one of the lines and display data starting from that perspective. You could drill down to see the details of what went into a given statistic. It might also let you see overlaps with other items if they were meaningful. There's some interesting data that could be gathered by looking at this data longitudinally. (Am I right in guessing that this edges into the world of "analytics"?)

    If you've got any experience with things like this, I'd be really interested in your thoughts on how you might approach it. Like ... would you store the parse tree anywhere? It takes less than 2 seconds to parse this file, so I'm not sure what might be gained from saving it. But I don't know ... that's what I'm asking about.
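For the drill-down idea, one approach is to have each statistic remember which documents it counted, not just how many, so clicking a line in the summary can list them. A sketch with hypothetical names (TStats, Hit, Docs, Percent); the file names are taken from the sample log. Incidentally, the summary's percentages look truncated rather than rounded (754 of 900 is 83.8%, shown as 83%; 52 of 900 is 5.8%, shown as 5%), which plain integer division reproduces.

```delphi
program StatDrillDown;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

type
  // Each statistic keeps the list of documents it counted, so a dashboard
  // line can expand into "which ones?" instead of just "how many?".
  TStats = class
  private
    FBuckets: TObjectDictionary<string, TList<string>>;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Hit(const Stat, DocFile: string);
    function Docs(const Stat: string): TArray<string>;
    function Count(const Stat: string): Integer;
  end;

constructor TStats.Create;
begin
  inherited Create;
  // doOwnsValues: the dictionary frees each TList when it is destroyed.
  FBuckets := TObjectDictionary<string, TList<string>>.Create([doOwnsValues]);
end;

destructor TStats.Destroy;
begin
  FBuckets.Free;
  inherited;
end;

procedure TStats.Hit(const Stat, DocFile: string);
var
  L: TList<string>;
begin
  if not FBuckets.TryGetValue(Stat, L) then
  begin
    L := TList<string>.Create;
    FBuckets.Add(Stat, L);
  end;
  L.Add(DocFile);
end;

function TStats.Docs(const Stat: string): TArray<string>;
var
  L: TList<string>;
begin
  if FBuckets.TryGetValue(Stat, L) then
    Result := L.ToArray
  else
    Result := nil;
end;

function TStats.Count(const Stat: string): Integer;
begin
  Result := Length(Docs(Stat));
end;

// Integer division truncates, matching the sample summary's percentages.
function Percent(Part, Total: Integer): Integer;
begin
  if Total = 0 then
    Result := 0
  else
    Result := Part * 100 div Total;
end;

var
  Stats: TStats;
  F: string;
begin
  Stats := TStats.Create;
  try
    // Hits recorded while parsing the sample aggregate file.
    Stats.Hit('Pages rotated', '0100_TEMSTOD4_%D%T_200507162634.3.pdf');
    Stats.Hit('Pages rotated', '0100_TEMSTOD4_%D%T_200507162634.4.pdf');
    Stats.Hit('Corrected PTNs', '0100_TEMSTOD4_%D%T_200507162634.2.pdf');
    Writeln('== ', Stats.Count('Pages rotated'), ' Pages rotated');
    // "Clicking" that line: list the documents behind the number.
    for F in Stats.Docs('Pages rotated') do
      Writeln('     ', F);
    Writeln(Percent(754, 900), '%'); // prints 83%
  finally
    Stats.Free;
  end;
end.
```

Keeping the document lists also makes overlaps cheap to find later: intersecting two statistics' Docs arrays shows, for example, which rotated pages also had corrected PTNs.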