
Posts posted by Anders Melander


  1. 3 hours ago, Anders Melander said:

    I will verify that the older version of llvm-pdbutil (the version I targeted with the YAML output) doesn't fail like that.

    llvm-pdbutil.exe version 9.0.0.0 doesn't fail with the new PDBs.

     

    However, with the large blocksize (map2pdb -blocksize:8192 ...) it fails as expected (it's documented to only support a blocksize of 4096):

    Quote

    llvm-pdbutil: The data is in an unexpected format. Unsupported block size.

     

     

    4 hours ago, Jan Rysavy said:

    Latest llvm-pdbutil (16.0.4) "dump --all" returns the following error on a MAP2PDB PDB:

    
    Unexpected error processing modules: PDB does not contain the requested image section header type

     

    The source of that error message is here:

    https://github.com/llvm/llvm-project/blob/f74bb326949aec0cac8e54ff00fc081f746ff35d/llvm/tools/llvm-pdbutil/DumpOutputStyle.cpp#L395

    It would have been nice if the message specified what header type it was looking for, but it doesn't.

     

    In any case, it's a bit suspicious that it mentions "image section header" because that's a thing in the PE image and the PDB doesn't contain any PE-related stuff. It could just be a bad choice of words.

     

    Anyway, moving along. So the error message is produced by loadSectionHeaders(type).

    loadSectionHeaders(type) is called from dumpSectionHeaders(type), which is called from dumpSectionHeaders() with type=SectionHdr and type=SectionHdrOrig.

     

    I haven't implemented either type, but I have a comment in my source about SectionHdr:

    function TDebugInfoPdbWriter.EmitDBISubstreamDebugHeader(Writer: TBinaryBlockWriter): Cardinal;
    begin
      Result := Writer.Position;
    
      // We don't provide any of the optional debug streams yet
      for var HeaderType := Low(PDBDbgHeaderType) to High(PDBDbgHeaderType) do
        Writer.Write(Word(TMSFStream.NullStreamIndex));
    
      // TODO : I believe the SectionHdr stream contains the segment names
      (*
      for var Segment in FDebugInfo.Segments do
      begin
        xxx.Add(Segment.SegClassName);
        xxx.Add(Segment.Name);
      end;
      *)
    
      Result := Writer.Position - Result;
    end;

    Since these streams are optional, it's pretty stupid of llvm-pdbutil to bug out if they aren't present.

    The old llvm-pdbutil didn't read them (which is probably why I didn't know the format or content of the streams), but since the new one does, it should be possible to deduce the format from its source. I'm just concerned that this might be a completely wasted effort if msdia140.dll is failing on something else.
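
    For what it's worth, llvm's source suggests that the SectionHdr stream is simply a copy of the PE image's section headers - an array of IMAGE_SECTION_HEADER records - which would also explain the wording of the error message. If that's correct, a Pascal declaration of the record (this is the standard PE/COFF layout from the Windows SDK, not anything from map2pdb) would be:

    type
      // Standard PE/COFF section header (40 bytes); mirrors the Windows SDK's
      // IMAGE_SECTION_HEADER. If the SectionHdr stream really is an array of
      // these, emitting it would mean writing one record per segment.
      TImageSectionHeader = packed record
        Name: array[0..7] of AnsiChar;  // Section name, e.g. '.text'
        VirtualSize: Cardinal;          // Size of the section in memory
        VirtualAddress: Cardinal;       // RVA of the start of the section
        SizeOfRawData: Cardinal;        // Size of the section data on disk
        PointerToRawData: Cardinal;     // File offset of the section data
        PointerToRelocations: Cardinal; // Unused in images
        PointerToLinenumbers: Cardinal; // Deprecated
        NumberOfRelocations: Word;
        NumberOfLinenumbers: Word;
        Characteristics: Cardinal;      // IMAGE_SCN_* flags
      end;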


  2. 6 minutes ago, Jan Rysavy said:

    Is that normal?

    No, not as far as I remember.

    I don't have the map2pdb project on the system I'm on right now, but when I get home I will verify that the older version of llvm-pdbutil (the version I targeted with the YAML output) doesn't fail like that.


  3. So I'm looking at checkInvariants in msf.cpp and there's this comment:

    Quote

    check that every page is either free, freed, or in use in exactly one stream

     

    MSF is the container format for a PDB file. The MSF format is pretty much a mini FAT file system, the files being the different PDB tables: Source file names, line numbers, symbols, etc.

     

    Internally the MSF file system is divided into intervals.

    Each interval contains <blocksize> blocks and each block is <blocksize> bytes long. The blocksize used to be 4096. Now it can apparently also be 8192.

    At the start of each interval, there are two blocks (the Free Page Map) that contain a bitmap of allocated blocks in the interval. Allocated as in "in-use" by a stream.

    A stream is just a collection of blocks.

    All streams are listed in the stream directory. The stream directory is stored in one or more blocks.

    At the start of the file is the superblock. It contains various info about the file: blocksize, index of the first Free Page Map, the number of blocks in the file, a pointer to the stream directory, etc.
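
    For reference, here's the superblock layout in Pascal, as I read it from llvm's MSF documentation (the field names are llvm's; the translation is mine):

    type
      TMSFSuperBlock = packed record
        FileMagic: array[0..31] of AnsiChar; // "Microsoft C/C++ MSF 7.00\r\n"...
        BlockSize: Cardinal;                 // 4096 - or now, apparently, 8192
        FreeBlockMapBlock: Cardinal;         // Index of the active FPM block (1 or 2)
        NumBlocks: Cardinal;                 // File size = NumBlocks * BlockSize
        NumDirectoryBytes: Cardinal;         // Size of the stream directory in bytes
        Unknown: Cardinal;
        BlockMapAddr: Cardinal;              // Block index of the stream directory's block map
      end;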

     

      MSF Block and Interval layout
    
      Interval    |                          0                           |                    1
      ------------+------------+------+------+------+------+------+------+------+------+------+------+------+- - -
      Block index |      0     |   1  |   2  |   3  |   4  | ...  | 4095 | 4096 | 4097 | 4098 | 4099 | 4100 | ...
      ------------+------------+------+------+------+------+------+------+------+------+------+------+------+- - -
      Log block   |   0 (N/A)  | N/A  | N/A  |   1  |   2  | ...  | ...  | 4094 | N/A  | N/A  | 4095 | ...  | ...
      ------------+------------+------+------+------+------+------+------+------+------+------+------+------+- - -
      Phys offset |      0     | 4096 | 8192 |12288 |16384 | ...  | ...  |4096^2|+4096 |+8192 | ...  | ...  | ...
      ------------+------------+------+------+------+------+------+------+------+------+------+------+------+- - -
      Content     | Superblock | FPM1 | FPM2 | Data | Data | Data | Data | Data | FPM1 | FPM2 | Data | Data | Data
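
    To make the layout arithmetic concrete, here's a minimal sketch (my own illustration, following the numbering in the table above - not code from map2pdb) of how a logical, non-FPM block number maps to a physical block index:

    // Maps a logical block number (superblock = 0, FPM blocks skipped) to a
    // physical block index. The byte offset is then PhysicalBlock * BlockSize.
    function LogicalToPhysicalBlock(LogicalBlock, BlockSize: Cardinal): Cardinal;
    begin
      // Each interval holds BlockSize physical blocks, two of which (the FPM)
      // don't count as logical blocks
      var Interval := LogicalBlock div (BlockSize - 2);
      var Offset := LogicalBlock mod (BlockSize - 2);

      // Skip the two FPM blocks at interval offset 1 and 2
      if (Offset >= 1) then
        Inc(Offset, 2);

      Result := Interval * BlockSize + Offset;
    end;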

    So in theory, some of the blocks in an interval can be in use and some can be free, and those that are in use should be referenced by the higher-level stream's index of MSF blocks - otherwise they are "leaked". I believe this is what checkInvariants verifies.

    Now, since I'm writing the PDB file in one go and never have a need to go back and free or modify an already allocated MSF block, I always mark all blocks as allocated when I start a new interval in the file. This means that I can, and most likely will, end up with blocks marked as allocated in the bitmap but not actually in use (or physically present in the file).

    procedure TBinaryBlockWriter.WriteBlockMap;
    const
      NoVacancies: Byte = $FF;
    begin
      BeginBlock;
      Assert((BlockIndex mod FBlockSize) in [1, 2]);
    
      // Mark all BlockSize*8 blocks occupied
      for var i := 0 to FBlockSize-1 do
        FStream.WriteBuffer(NoVacancies, 1);
    
      FStreamSize := Max(FStreamSize, FStream.Position);
    
      EndBlock(True);
    end;

     

    So why wasn't this a problem with the old version? In the old Microsoft source, checkInvariants is only active in debug builds, so my guess is that the old version simply doesn't perform this validation.

     

    Anyway, it's the best guess I have right now, so it should be pursued. I'm not sure when I will get the time to do so, though.
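
    If this turns out to be the cause, here's a rough sketch of one possible fix (made-up names, not actual map2pdb code, and following WriteBlockMap's convention above where a set bit means "allocated"): once the final block count is known, seek back into the Free Page Map and clear the bits of the blocks that were marked allocated but never written. Since the file is written sequentially, only the last interval can contain such blocks:

    procedure ClearUnusedBlockBits(Stream: TStream; BlockSize, BlockCount: Cardinal);
    begin
      var LastInterval := (BlockCount - 1) div BlockSize;
      // FPM1 is the second block of the interval (see the table above);
      // in practice FPM2 - or at least whichever copy is active - would
      // need the same treatment
      var FPMOffset: Int64 := (Int64(LastInterval) * BlockSize + 1) * BlockSize;

      // Clear the bit of every block from BlockCount to the end of the interval
      for var Block := BlockCount to (LastInterval + 1) * BlockSize - 1 do
      begin
        var BlockInInterval := Block mod BlockSize;
        var BytePos: Int64 := FPMOffset + BlockInInterval div 8;
        var Mask: Byte := 1 shl (BlockInInterval mod 8);
        var Bits: Byte;

        Stream.Position := BytePos;
        Stream.ReadBuffer(Bits, 1);
        Bits := Bits and not Mask; // clear the bit, i.e. mark the block free
        Stream.Position := BytePos;
        Stream.WriteBuffer(Bits, 1);
      end;
    end;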


  4. 25 minutes ago, Jan Rysavy said:

    Look at difference in msdia140!MSF_HB::checkInvariants return value rax = 0 vs rax = 1.

    Yes, interesting; that's probably the place where it goes wrong. Unfortunately, I have no idea what "Invariant" refers to.

    So assuming the problem lies in the MSF format (which is just a container format - a mini internal file system), I can't see anything in it that could be described as an invariant.

     

    I have reviewed the code and as far as I can tell I'm completely blocksize agnostic; nowhere do I assume a blocksize of 4096. I have tried changing the blocksize to 8192, but that just makes the old VTune choke on the file and the new one still can't read it.

    I will now try to see if the new VTune can work with the PDB files that ship with the old VTune (they have a block size of 4096). If it can then the problem is with map2pdb (i.e. I'm doing something wrong). If it can't then the PDB format has changed and I'm f*cked because there's no way to know what changed. The first time around I reverse-engineered some of the format by examining the files in a hex editor and I'm not doing that again.


  5. 24 minutes ago, David Heffernan said:

    if ever the collection has a significant size

    With a list of TEdits? Not likely.

     

    I would go for an encapsulated TList<T>:

    type
      TSetOfStuff<T> = class
      private
        FList: TList<T>;
      public
        function Contains(const Value: T): boolean;
        function Add(const Value: T): integer;
        procedure Remove(const Value: T);
        function GetEnumerator: TEnumerator<T>;
      end;
    
    function TSetOfStuff<T>.Contains(const Value: T): boolean;
    begin
      var Index: integer;
      Result := FList.BinarySearch(Value, Index);
    end;
    
    function TSetOfStuff<T>.Add(const Value: T): integer;
    begin
      if (not FList.BinarySearch(Value, Result)) then
        FList.Insert(Result, Value);
    end;
    
    procedure TSetOfStuff<T>.Remove(const Value: T);
    begin
      var Index: integer;
      if (FList.BinarySearch(Value, Index)) then
        FList.Delete(Index);
    end;
    
    function TSetOfStuff<T>.GetEnumerator: TEnumerator<T>;
    begin
      Result := FList.GetEnumerator;
    end;
    
    etc...
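
    Usage would look something like this (assuming a constructor and destructor that manage the inner TList<T> - elided above). Note that Add inserts at the position BinarySearch reports, so the list stays sorted, which is what makes Contains O(log n); the default comparer just compares object references, which is fine for a set of controls:

    var Stuff := TSetOfStuff<TEdit>.Create;
    try
      Stuff.Add(Edit1);
      Stuff.Add(Edit2);

      if (Stuff.Contains(Edit1)) then
        Edit1.Clear;

      Stuff.Remove(Edit2);
    finally
      Stuff.Free;
    end;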

     


  6. 18 minutes ago, David Heffernan said:

    One thing you could easily do would be to always output the db instructions, but put the asm in comments, then it would be just as readable, but have no ifdefs

    ...and no errors from the compiler if the asm is wrong.


  7. 1 minute ago, Fr0sT.Brutal said:

    Why not extract only necessary external requirements to custom low-fat units and leave the unit of interest as is? Or there's too much deps?

    I guess that's one way to remove the dependencies. I hadn't thought of that.

     

    But I would still have to clean up the remaining source and rewrite parts of it, so again: a fork.
    If I go that way, I would probably prefer to simply start from the original, pre-JEDI version of the source instead of trying to polish the turd it has become.


  8. 39 minutes ago, Jan Rysavy said:

    Yes, see attached 'wt' output from old msdia140.dll 14.29.30035.0 for the same input PDB file. In this case loadDataFromPdb succeeds.

    msdia14_142930035.zip

    Ew! It looks like they have done a complete rewrite. No wonder it's broken.

     

    So I guess this is an example of the main problem with the PDB format: Microsoft considers it their own internal format to do with what they like. They have their own (undocumented) writer, their own reader, and no documentation.


  9. 35 minutes ago, Jan Rysavy said:

    In msdia140.zip is attached 'wt' command output for msdia140!CDiaDataSource::loadDataFromPdb.

    Excellent!

    A few quick observations:

     

    First of all, it's strange that the error isn't logged. That would have made it much easier to tell exactly what it's choking on.

     

    StrmTbl::internalSerializeBigMsf
    Lots of calls to this. I'm guessing it's reading MSF blocks and deblocking them into linear memory streams. This is probably the new code that supports the 8192-byte "big" MSF block size.

     

    MSF_HB::load

    Probably the code that loads the PDB tables from the memory streams.

     

    StrmTbl::~StrmTbl

    Lots of calls to this. Probably clean up after the load has been aborted.

     

    PortablePDB::PortablePDB
    Something wrong here. "Portable PDB" is the .NET PDB format. It's a completely different file format.
    I'm guessing it's falling back to that format after failing to validate the file as PDB.


  10. 3 hours ago, Stefan Glienke said:
    Quote

    we added a use_large_pdbs build setting which would switch the PDB page size to 8 KiB. However this setting initially needed to be off by default due to a lack of complete tool support.

    If the block size being 4096 is the only problem (I somehow doubt that I'm that lucky) then this is the line that needs to be changed to write 8192-byte blocks:

    https://bitbucket.org/anders_melander/map2pdb/src/2341200827af24f7dd75cb695a668dfa9564bcf5/Source/debug.info.writer.pdb.pas#lines-225
     

    constructor TDebugInfoPdbWriter.Create;
    begin
      Create(4096);
    end;
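
    That is, a large-block variant (untested, and assuming the rest of the writer really is blocksize agnostic) would simply be:

    constructor TDebugInfoPdbWriter.Create;
    begin
      Create(8192);
    end;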

     


  11. 45 minutes ago, Stefan Glienke said:

    Yeah... Not too keen on that as a first approach.

    The last time I tried using the llvm pdb support as a reference, I wasted a lot of time before I found out that it was very incomplete, to the point of being unusable by VTune. It has probably improved since then, but it's hard to tell what state it's in.

    https://github.com/llvm/llvm-project/issues/37279

    https://github.com/llvm/llvm-project/issues/28528

     

    I will try to see if I can reproduce and spot the problem in the source before I go down that road. Thanks anyway.


  12. 28 minutes ago, Stefan Glienke said:

    Looks like it.

     

     

    1 hour ago, Jan Rysavy said:

    This also probably explains why Profiler in Visual Studio (2022) stopped displaying function names in recent versions. It uses newer versions of msdia140.dll.

    Ooooh, interesting. Maybe they've accidentally broken support for the older format and not noticed it because they're only testing the new format now.

     

    The article Stefan linked to makes me think that even though the PDB format supported large PDB files, the PDB reader (msdia140.dll) didn't. Otherwise, they would only have had to update their PDB writer to support large PDB files.


  13. 8 minutes ago, Jan Rysavy said:

    I can confirm replacing C:\Program Files (x86)\Intel\oneAPI\vtune\2023.1.0\bin64\amplxe_msdia140.dll (version 14.34.31942.0) with version 14.28.29910.0 from VTune 2022.4.1 solves this problem!

    That means the bug is most likely in map2pdb because that DLL is Microsoft's API for reading PDB files.


  14. I'm writing a shaper for complex text layout and for that, I need to do Unicode decomposition and composition (NFD and NFC normalization).
     

    Does anyone know of a Pascal library that can do this?

     

    I have the following basic requirements:

    • Open source with a workable license (i.e. not GPL).
    • Cross platform (i.e. not tied to Windows or whatever).
    • Operate on UCS4/UTF-32 strings.
    • Based on the standard Unicode data tables.
    • Must have the ability to update tables when new Unicode tables are published.
    • Must support both NFD decomposition and NFC composition.
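
    To make the NFD/NFC requirement concrete, this is the kind of API I'm after (a hypothetical interface; the names are made up and don't belong to any of the libraries below):

    type
      IUnicodeNormalizer = interface
        // NFD: canonical decomposition.
        // E.g. "é" (U+00E9) decomposes to "e" + combining acute: (U+0065, U+0301)
        function Decompose(const CodePoints: TArray<UCS4Char>): TArray<UCS4Char>;
        // NFC: canonical decomposition followed by canonical composition.
        // E.g. (U+0065, U+0301) composes back to "é" (U+00E9)
        function Compose(const CodePoints: TArray<UCS4Char>): TArray<UCS4Char>;
      end;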

     

    So far I have found the following candidates:

    Delphi Unicode libraries

    • PUCU Pascal UniCode Utils Library
      🔵 Origin: Benjamin Rosseaux.
      ✅ Maintained: Yes.
      ✅ Maintained by author: Yes.
      ✅ License: Zlib.
      ⛔ Readability: Poor. Very bad formatting.
      ✅ Performance: The native string format is UCS4 (32-bit).
      ✅ Features: Supports decomposition and composition.
      ✅ Dependencies: None.
      ✅ Data source: Unicode tables are generated from official Unicode data files. Source for converter provided.
      ⛔ Table format: Generated inline arrays and code.
      ✅ Completeness: All Unicode tables are available.
      ✅ Hangul decomposition: Yes.
      ✅ Customizable: Required data and structures are exposed.
      ✅ Unicode level: Currently at Unicode v15.
      ⛔ Unicode normalization test suite: Fail/Crash
       

    • FreePascal RTL
      🔵 Origin: Based on code by Inoussa Ouedraogo.
      ✅ Maintained: Yes.
      🔵 Maintained by author: No.
      🔵 License: GPL with linking exception.
      ✅ Readability: Good. Code is clean.
      ✅ Performance: Code appears efficient.
      ⛔ Features: Only supports decomposition. Not composition.
      ✅ Dependencies: None.
      ✅ Data source: Unicode tables are generated from official Unicode data files. Source for converter provided.
      ✅ Table format: Generated arrays in include files.
      ⛔ Completeness: Only some Unicode tables are available.
      ⛔ Hangul decomposition: No.
      ⛔ Customizable: Required data and structures are private.
      ✅ Unicode level: Currently at Unicode v14.
      ⛔ Unicode normalization test suite: N/A; composition not supported.

       
    • JEDI jcl
      🔵 Origin: Based on Mike Lischke's Unicode library.
      🔵 Maintained: Sporadically.
      🔵 Maintained by author: No.
      ✅ License: MPL.
      ✅ Readability: Good. Code is clean.
      ⛔ Performance: Very inefficient. String reallocations. The native string format is UCS4 (32-bit).
      ✅ Features: Supports decomposition and composition.
      ⛔ Dependencies: Has dependencies on a plethora of other JEDI units.
      ✅ Data source: Unicode tables are generated from official Unicode data files. Source for converter provided.
      ✅ Table format: Generated resource files.
      🔵 Unicode level: Currently at Unicode v13.
      ✅ Completeness: All Unicode tables are available.
      ✅ Hangul decomposition: Yes.
      ⛔ Customizable: Required data and structures are private.
      ⛔ Other: Requires installation (to generate the JEDI.inc file).
      🔵 Unicode normalization test suite: Unknown

     

    The FPC implementation has had the composition part removed, which immediately disqualifies it. The JEDI implementation, while based on an originally nice and clean implementation, has gotten the usual JEDI treatment and pulls in the rest of the JEDI jcl as dependencies. I could clean that up, but it would amount to a fork of the code, and I would prefer not to have to maintain that piece of code as well.

    That leaves the PUCU library, and I currently have that integrated and working - or so I thought... Unfortunately, I have now found a number of severe defects in it, and that has prompted me to search for alternatives again.

     

    Here's the project I need it for: https://gitlab.com/anders.bo.melander/pascaltype2


  15. 51 minutes ago, Stefan Glienke said:

    Cannot locate debugging information for file `somepath\Tests.exe'. Cannot match the module with the symbol file `somepath\Tests.pdb'. Make sure to specify the correct path to the symbol file in the Binary/Symbol Search list of directories.

    I think that is a generic error message meaning "Something went wrong and our error handling sucks". As far as I remember you get a message like that regardless of what problem VTune encounters when resolving through the PDB file.


  16. 11 minutes ago, Anders Melander said:

    address bitness

    I meant what do you mean by this ^

     

    Do you mean 32- vs 64-bit addresses? AFAIR there are no choices or ambiguities in the PDB format with regard to the size of an address value, relative or absolute, but I would have to take a look at the source to make sure.
