Posts posted by Anders Melander
2 hours ago, Jan Rysavy said: Do you have any idea what they are testing in checkInvariants / loc_1800FCE71?
No. I can't figure it out.
-
3 hours ago, Anders Melander said: I will verify that the older version of llvm-pdbutil (the version I targeted with the YAML output) doesn't fail like that.
llvm-pdbutil.exe version 9.0.0.0 doesn't fail with the new PDBs.
However, with the large blocksize (map2pdb -blocksize:8192 ...) it fails as expected (it's documented to only support a blocksize of 4096):
Quote: llvm-pdbutil: The data is in an unexpected format. Unsupported block size.
4 hours ago, Jan Rysavy said: Latest llvm-pdbutil (16.0.4) "dump --all" returns following error on MAP2PDB PDB:
Unexpected error processing modules: PDB does not contain the requested image section header type
The source of that error message is here:
It would have been nice if the message specified what header type it was looking for, but it doesn't.
In any case, it's a bit suspicious that it mentions "image section header" because that's a thing in the PE image and the PDB doesn't contain any PE related stuff. It could just be a bad choice of words.
Anyway, moving along. So the error message is produced by loadSectionHeaders(type).
loadSectionHeaders(type) is called from dumpSectionHeaders(type) which is called from dumpSectionHeaders() with type=SectionHdr and type=SectionHdrOrig.
I have implemented neither type, but I have a comment in my source about SectionHdr:
function TDebugInfoPdbWriter.EmitDBISubstreamDebugHeader(Writer: TBinaryBlockWriter): Cardinal;
begin
  Result := Writer.Position;

  // We don't provide any of the optional debug streams yet
  for var HeaderType := Low(PDBDbgHeaderType) to High(PDBDbgHeaderType) do
    Writer.Write(Word(TMSFStream.NullStreamIndex));

  // TODO : I believe the SectionHdr stream contains the segment names
  (*
  for var Segment in FDebugInfo.Segments do
  begin
    xxx.Add(Segment.SegClassName);
    xxx.Add(Segment.Name);
  end;
  *)

  Result := Writer.Position - Result;
end;
Since these streams are optional it's pretty stupid of llvm-pdbutil to bug out if they aren't present.
The old llvm-pdbutil didn't read them (which is probably why I didn't know the format or content of the streams), but since the new one does, it should be possible to deduce the format from its source. I'm just concerned that this might be a completely wasted effort if msdia140.dll is failing on something else.
-
6 minutes ago, Jan Rysavy said: Is that normal?
No, not as far as I remember.
I don't have the map2pdb project on the system I'm on right now, but when I get home I will verify that the older version of llvm-pdbutil (the version I targeted with the YAML output) doesn't fail like that.
-
4 hours ago, Anders Melander said: I'm not sure when I will get time to do so though.
Of course I couldn't help myself. The Bugfix/LargePDB branch contains the changes so the Free Page Map now contains the correct values. Unfortunately, that didn't solve the problem.
-
40 minutes ago, Jan Rysavy said: CONFIRMED!
Wow. Thanks!
-
So I'm looking at checkInvariants in msf.cpp and there's this comment:
Quote: check that every page is either free, freed, or in use in exactly one stream
MSF is the container format for a PDB file. The MSF format is pretty much a mini FAT file system, the files being the different PDB tables: Source file names, line numbers, symbols, etc.
Internally the MSF file system is divided into intervals.
Each interval contains <blocksize> blocks and each block is <blocksize> bytes long. The blocksize used to be 4096. Now it can apparently also be 8192.
At the start of each interval (at block offsets 1 and 2 within the interval) there are two blocks (the Free Page Map) that contain a bitmap of the allocated blocks in the interval. Allocated as in "in-use" by a stream.
A stream is just a collection of blocks.
All streams are listed in the stream directory. The stream directory is stored in one or more blocks.
At the start of the file is the superblock. It contains various info about the file: blocksize, index of the first Free Page Map, the number of blocks in the file, a pointer to the stream directory, etc.
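To make the layout concrete, here is a small illustrative sketch of the block classification just described (Python, purely for illustration; the helper names are mine, and the offsets-1-and-2 rule matches the assert in my own WriteBlockMap code below):

```python
# The classic MSF block size; newer tools apparently also allow 8192.
BLOCK_SIZE = 4096

def classify_block(index: int, blocksize: int = BLOCK_SIZE) -> str:
    """Classify an MSF block index according to the layout described above."""
    if index == 0:
        return "Superblock"
    # Position of the block within its interval of <blocksize> blocks
    offset = index % blocksize
    if offset == 1:
        return "FPM1"  # first block of the Free Page Map pair
    if offset == 2:
        return "FPM2"  # second block of the Free Page Map pair
    return "Data"

def physical_offset(index: int, blocksize: int = BLOCK_SIZE) -> int:
    """A block's offset in the file is simply index * blocksize."""
    return index * blocksize
```

So with a 4096 blocksize, the Free Page Map pairs sit at blocks 1/2, 4097/4098, 8193/8194, and so on.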
MSF Block and Interval layout

Interval    |                            0                           |                 1
------------+------------+------+------+------+------+------+--------+------+------+------+------+- - -
Block index |     0      |  1   |  2   |  3   |  4   | ...  |  4095  | 4096 | 4097 | 4098 | 4099 | ...
Log block   |  0 (N/A)   | N/A  | N/A  |  1   |  2   | ...  |  4093  | 4094 | N/A  | N/A  | 4095 | ...
Phys offset |     0      | 4096 | 8192 |12288 |16384 | ...  |  ...   |4096^2|+4096 |+8192 | ...  | ...
Content     | Superblock | FPM1 | FPM2 | Data | Data | Data |  Data  | Data | FPM1 | FPM2 | Data | Data
So in theory some of the blocks in an interval can be in use and some of them can be free and those that are in use should be referenced in the higher level stream's index of MSF blocks - otherwise, they are "leaked". I believe this is what checkInvariants verifies.
Now, since I'm writing the PDB file in one go and never have a need to go back and free or modify an already allocated MSF block, I always mark all blocks as allocated when I start a new interval in the file. This means that I can, and most likely will, end up with blocks marked as allocated in the bitmap but not actually in use (or physically present in the file).
procedure TBinaryBlockWriter.WriteBlockMap;
const
  NoVacancies: Byte = $FF;
begin
  BeginBlock;

  Assert((BlockIndex mod FBlockSize) in [1, 2]);

  // Mark all BlockSize*8 blocks occupied
  for var i := 0 to FBlockSize-1 do
    FStream.WriteBuffer(NoVacancies, 1);
  FStreamSize := Max(FStreamSize, FStream.Position);

  EndBlock(True);
end;
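If my guess about checkInvariants is right, the invariant could be sketched along these lines (an illustrative Python sketch of the idea only; the real check lives in msf.cpp and its exact details are unknown to me):

```python
def find_leaked_blocks(fpm_allocated: set[int],
                       stream_blocks: list[list[int]],
                       num_blocks: int,
                       blocksize: int = 4096) -> set[int]:
    """Return blocks marked allocated in the Free Page Map that no
    stream references ("leaked" blocks).

    The superblock and the Free Page Map pair of each interval are
    implicitly in use, so they are never considered leaked.
    """
    referenced = {block for blocks in stream_blocks for block in blocks}
    for index in range(num_blocks):
        if index == 0 or index % blocksize in (1, 2):
            referenced.add(index)
    return {block for block in fpm_allocated if block not in referenced}
```

With the mark-everything-allocated strategy above, any allocated-but-unreferenced block would show up in this set, which would make a strict checker reject the file.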
So why wasn't this a problem with the old version? In the old Microsoft source checkInvariants is only active in debug builds so my guess is that the old version simply doesn't perform this validation.
Anyway, it's the best guess I have right now so it should be pursued. I'm not sure when I will get time to do so though.
-
11 minutes ago, Stefan Glienke said: I want a sex change operation so I can have your children.
(it means "thank you" in case you wondered)
-
8 minutes ago, Anders Melander said: I will now try to see if the new VTune can work with the PDB files that ship with the old VTune
It can.
-
25 minutes ago, Jan Rysavy said: Look at difference in msdia140!MSF_HB::checkInvariants return value rax = 0 vs rax = 1.
Yes, interesting; probably the place where it goes wrong. Unfortunately, I have no idea what "Invariant" refers to.
So assuming the problem lies in the MSF format (which is just a container format - a mini internal file system), I can't see anything in the MSF format that could be referred to as variant/invariant.
I have reviewed the code and as far as I can tell I'm completely blocksize-agnostic; nowhere do I assume a blocksize of 4096. I have tried changing the blocksize to 8192 but that just makes the old VTune choke on the file, and the new one still can't read it.
I will now try to see if the new VTune can work with the PDB files that ship with the old VTune (they have a block size of 4096). If it can then the problem is with map2pdb (i.e. I'm doing something wrong). If it can't then the PDB format has changed and I'm f*cked because there's no way to know what changed. The first time around I reverse-engineered some of the format by examining the files in a hex editor and I'm not doing that again.
-
14 minutes ago, Jan Rysavy said: Ah, I see. So the 'wt' is a command you give to the debugger and it traces all calls made in the call tree? I thought the trace was something that msdia140.dll produced on its own.
-
24 minutes ago, David Heffernan said: if ever the collection has a significant size
With a list of TEdits? Not likely.
I would go for an encapsulated TList<T>:
type
  TSetOfStuff<T> = class
  private
    FList: TList<T>;
  public
    function Contains(const Value: T): boolean;
    function Add(const Value: T): integer;
    procedure Remove(const Value: T);
    function GetEnumerator: TEnumerator<T>;
  end;

function TSetOfStuff<T>.Contains(const Value: T): boolean;
begin
  var Index: integer;
  Result := FList.BinarySearch(Value, Index);
end;

function TSetOfStuff<T>.Add(const Value: T): integer;
begin
  if (not FList.BinarySearch(Value, Result)) then
    FList.Insert(Result, Value);
end;

procedure TSetOfStuff<T>.Remove(const Value: T);
begin
  var Index: integer;
  if (FList.BinarySearch(Value, Index)) then
    FList.Delete(Index);
end;

function TSetOfStuff<T>.GetEnumerator: TEnumerator<T>;
begin
  Result := FList.GetEnumerator;
end;

etc...
-
18 minutes ago, David Heffernan said: One thing you could easily do would be to always output the db instructions, but put the asm in comments, then it would be just as readable, but have no ifdefs
...and no errors from the compiler if the asm is wrong.
-
1 minute ago, Fr0sT.Brutal said: Why not extract only necessary external requirements to custom low-fat units and leave the unit of interest as is? Or there's too much deps?
I guess that's one way to remove the dependencies. I hadn't thought of that.
But I would still have to clean up the remaining source and rewrite parts of it, so again: A fork.
If I go that way I would probably prefer to simply start from the original, pre-JEDI version of the source instead of trying to polish the turd it has become.
-
39 minutes ago, Jan Rysavy said: Yes, see attached 'wt' output from old msdia140.dll 14.29.30035.0 for the same input PDB file. In this case loadDataFromPdb succeeds.
Ew! It looks like they have done a complete rewrite. No wonder it's broken.
So I guess this is an example of the main problem with the PDB format: Microsoft considers it their own internal format to do with what they like. They have their own (undocumented) writer, their own reader, and no documentation.
-
35 minutes ago, Jan Rysavy said: In msdia140.zip is attached 'wt' command output for msdia140!CDiaDataSource::loadDataFromPdb.
Excellent!
A few quick observations:
First of all, it's strange that the error isn't logged. That would have made this a lot easier.
StrmTbl::internalSerializeBigMsf
Lots of calls to this. I'm guessing it's reading MSF blocks and deblocking them into linear memory streams. This is probably the new code that supports the 8192-byte "big" MSF block size.
MSF_HB::load
Probably the code that loads the PDB tables from the memory streams.
StrmTbl::~StrmTbl
Lots of calls to this. Probably clean up after the load has been aborted.
PortablePDB::PortablePDB
Something wrong here. "Portable PDB" is the .NET PDB format. It's a completely different file format.
I'm guessing it's falling back to that format after failing to validate the file as a PDB.
-
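As an aside, the deblocking step guessed at above (reassembling a stream's bytes from its scattered MSF blocks) is conceptually simple. A Python sketch, purely illustrative; it assumes the block list comes from the stream directory:

```python
def read_stream(file_bytes: bytes, block_indices: list[int],
                stream_size: int, blocksize: int = 4096) -> bytes:
    """Reassemble a stream's contents from its (possibly scattered) MSF blocks.

    block_indices is the stream's block list from the stream directory;
    the last block is usually only partially used, hence the truncation
    to stream_size.
    """
    data = b"".join(file_bytes[i * blocksize:(i + 1) * blocksize]
                    for i in block_indices)
    return data[:stream_size]
```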
3 hours ago, Stefan Glienke said: This could be some relevant info: https://randomascii.wordpress.com/2023/03/08/when-debug-symbols-get-large/
Quote: we added a use_large_pdbs build setting which would switch the PDB page size to 8 KiB. However this setting initially needed to be off by default due to a lack of complete tool support.
If the block size being 4096 is the only problem (I somehow doubt that I'm that lucky) then this is the line that needs to be changed to write 8192-byte blocks:
constructor TDebugInfoPdbWriter.Create;
begin
  Create(4096);
end;
-
2 hours ago, Jan Rysavy said: MS released debug symbols for msdia140.dll... nice
Neat. If you can spot where it gives up on the pdb file and returns an error that would be suuuuper nice.
Does it produce any debug output while loading?
-
45 minutes ago, Stefan Glienke said: @Anders Melander Might be worth looking into https://github.com/llvm/llvm-project/blob/main/llvm/lib/DebugInfo/PDB/Native/PDBFileBuilder.cpp
Yeah... Not too keen on that as a first approach.
The last time I tried using the llvm pdb support as a reference I wasted a lot of time before I found out that it was very incomplete, to the point of being unusable by VTune. It has probably improved since then, but it's hard to tell what state it's in.
https://github.com/llvm/llvm-project/issues/37279
https://github.com/llvm/llvm-project/issues/28528
I will try to see if I can reproduce and spot the problem in the source before I go down that road. Thanks anyway.
-
28 minutes ago, Stefan Glienke said: This could be some relevant info: https://randomascii.wordpress.com/2023/03/08/when-debug-symbols-get-large/
Looks like it.
1 hour ago, Jan Rysavy said: This also probably explains why Profiler in Visual Studio (2022) stopped displaying function names in recent versions. It uses newer versions of msdia140.dll.
Ooooh, interesting. Maybe they've accidentally broken support for the older format and not noticed it because they're only testing the new format now.
The article Stefan linked to makes me think that even though the PDB format supported large PDB files, the PDB reader (msdia140.dll) didn't. Otherwise, they would only have had to update their PDB writer to support large PDB files.
-
51 minutes ago, Lars Fosdal said: Is there Unicode support in @Alexander Sviridenkov's https://delphihtmlcomponents.com/index.html ?
Doesn't really matter:
2 hours ago, Anders Melander said: I have the following basic requirements:
- Open source with a workable license (i.e. not GPL).
-
8 minutes ago, Jan Rysavy said: I can confirm replacing C:\Program Files (x86)\Intel\oneAPI\vtune\2023.1.0\bin64\amplxe_msdia140.dll (version 14.34.31942.0) with version 14.28.29910.0 from VTune 2022.4.1 solves this problem!
That means the bug is most likely in map2pdb because that DLL is Microsoft's API for reading PDB files.
-
I'm writing a shaper for complex text layout and for that, I need to do Unicode decomposition and composition (NFD and NFC normalization).
Does anyone know of a Pascal library that can do this?
I have the following basic requirements:
- Open source with a workable license (i.e. not GPL).
- Cross platform (i.e. not tied to Windows or whatever).
- Operate on UCS4/UTF-32 strings.
- Based on the standard Unicode data tables.
- Must have the ability to update tables when new Unicode tables are published.
- Must support both NFD decomposition and NFC composition.
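For comparison, this is what the required NFD/NFC round-trip looks like in Python's unicodedata module (shown purely to illustrate the behaviour a Pascal library would need to match, including the algorithmic Hangul decomposition):

```python
import unicodedata

s = "\u00E9"  # é: LATIN SMALL LETTER E WITH ACUTE

nfd = unicodedata.normalize("NFD", s)    # decompose: 'e' + U+0301 COMBINING ACUTE ACCENT
nfc = unicodedata.normalize("NFC", nfd)  # compose back to the single code point

print([hex(ord(c)) for c in nfd])  # ['0x65', '0x301']
print(nfc == s)                    # True

# Hangul requires algorithmic (arithmetic) decomposition into Jamo:
hangul = "\uD55C"  # 한: HANGUL SYLLABLE HAN
jamo = unicodedata.normalize("NFD", hangul)
print(len(jamo))  # 3: U+1112, U+1161, U+11AB
```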
So far I have found the following candidates:
Delphi Unicode libraries
-
PUCU Pascal UniCode Utils Library
🔵 Origin: Benjamin Rosseaux.
✅ Maintained: Yes.
✅ Maintained by author: Yes.
✅ License: Zlib.
⛔ Readability: Poor. Very bad formatting.
✅ Performance: The native string format is UCS4 (32-bit).
✅ Features: Supports decomposition and composition.
✅ Dependencies: None.
✅ Data source: Unicode tables are generated from official Unicode data files. Source for converter provided.
⛔ Table format: Generated inline arrays and code.
✅ Completeness: All Unicode tables are available.
✅ Hangul decomposition: Yes.
✅ Customizable: Required data and structures are exposed.
✅ Unicode level: Currently at Unicode v15.
⛔ Unicode normalization test suite: Fail/Crash
-
FreePascal RTL
🔵 Origin: Based on code by Inoussa Ouedraogo.
✅ Maintained: Yes.
🔵 Maintained by author: No.
🔵 License: GPL with linking exception.
✅ Readability: Good. Code is clean.
✅ Performance: Code appears efficient.
⛔ Features: Only supports decomposition. Not composition.
✅ Dependencies: None.
✅ Data source: Unicode tables are generated from official Unicode data files. Source for converter provided.
✅ Table format: Generated arrays in include files.
⛔ Completeness: Only some Unicode tables are available.
⛔ Hangul decomposition: No.
⛔ Customizable: Required data and structures are private.
✅ Unicode level: Currently at Unicode v14.
⛔ Unicode normalization test suite: N/A; composition not supported.
-
JEDI jcl
🔵 Origin: Based on Mike Lischke's Unicode library.
🔵 Maintained: Sporadically.
🔵 Maintained by author: No.
✅ License: MPL.
✅ Readability: Good. Code is clean.
⛔ Performance: Very inefficient. String reallocations. The native string format is UCS4 (32-bit).
✅ Features: Supports decomposition and composition.
⛔ Dependencies: Has dependencies on a plethora of other JEDI units.
✅ Data source: Unicode tables are generated from official Unicode data files. Source for converter provided.
✅ Table format: Generated resource files.
🔵 Unicode level: Currently at Unicode v13.
✅ Completeness: All Unicode tables are available.
✅ Hangul decomposition: Yes.
⛔ Customizable: Required data and structures are private.
⛔ Other: Requires installation (to generate the JEDI.inc file).
🔵 Unicode normalization test suite: Unknown
The FPC implementation has had the composition part removed, so that immediately disqualifies it. The JEDI implementation, while based on an originally nice and clean implementation, has gotten the usual JEDI treatment, so it pulls in the rest of the JEDI jcl as dependencies. I could clean that up, but it would amount to a fork of the code, and I would prefer not to have to maintain that piece of code as well.
That leaves the PUCU library and I currently have that integrated and working - or so I thought... Unfortunately, I have now found a number of severe defects in it and that has prompted me to search for alternatives again.
Here's the project I need it for: https://gitlab.com/anders.bo.melander/pascaltype2
-
51 minutes ago, Stefan Glienke said: Cannot locate debugging information for file `somepath\Tests.exe'. Cannot match the module with the symbol file `somepath\Tests.pdb'. Make sure to specify the correct path to the symbol file in the Binary/Symbol Search list of directories.
I think that is a generic error message meaning "Something went wrong and our error handling sucks". As far as I remember you get a message like that regardless of what problem VTune encounters when resolving through the PDB file.
-
11 minutes ago, Anders Melander said: address bitness
I meant: what do you mean by this ^
Do you mean 32- vs 64-bit addresses? AFAIR there's no choice or ambiguity in the PDB format with regard to the size of an address value, relative or absolute, but I would have to take a look at the source to make sure.
MAP2PDB - Profiling with VTune
in Delphi Third-Party
Posted
Yes, it definitely should. I'm just too tired to dissect the file in a hex editor right now 🙂
Maybe later tonight.