Feature req: Compiler unit dependency graph / log with warnings about circularity

Lars Fosdal · July 11, 2023

https://quality.embarcadero.com/browse/RSP-41961

The process of cleaning up circular references can be quite challenging, since today we have no good tool to discover and track unit interdependency. "Blatant" circular references are explicitly forbidden, but since we can include units in both the interface and the implementation section, it is quite easy to circumvent this rule.

Another challenge is when you inadvertently drag in a massive collection of units into your project, because someone needed a structure or function from a specific unit - which again uses other units, which uses others again - and so forth.

Discovering where a unit is included, and when in the compilation it is parsed, would be significantly helped by a simple build log: a sequential log of the compilation of each unit in the application, indicating where it first became necessary to compile another unit to complete the current unit.

I suggested it could look something like this - more comments in the QP issue.

Pls vote/comment if you find it interesting.

unit1 compiling... 
   unit2 compiling... 
     unit3 compiling...
     unit3 compiled (lines, warnings, hints)
   unit2 compiled (lines, warnings, hints)
unit1 compiled (lines, warnings, hints)
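
To make the proposed log format concrete, here is a minimal Python sketch that produces the same kind of nested output from a dependency map. It is only a simulation under stated assumptions: the `USES` map, the `build_log` function and the warning text are hypothetical, and nothing here reflects how the Delphi compiler actually schedules units.

```python
# Hypothetical dependency map: unit name -> units it uses, in uses-clause order.
USES = {
    "unit1": ["unit2"],
    "unit2": ["unit3"],
    "unit3": [],
}


def build_log(root, uses):
    """Return log lines for a simulated depth-first 'compile' starting at root."""
    lines = []
    done = set()         # units that are fully compiled
    in_progress = set()  # units currently being compiled (on the call stack)

    def visit(unit, depth):
        indent = "  " * depth
        lines.append(f"{indent}{unit} compiling...")
        in_progress.add(unit)
        for dep in uses.get(unit, []):
            if dep in in_progress:
                # The dependency is still being compiled: this edge closes a cycle.
                lines.append(f"{indent}  warning: {unit} -> {dep} closes a cycle")
            elif dep not in done:
                visit(dep, depth + 1)
        in_progress.discard(unit)
        done.add(unit)
        lines.append(f"{indent}{unit} compiled")

    visit(root, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(build_log("unit1", USES)))
```

Each unit is visited the first time something needs it, so the indentation shows which unit first pulled it into the build.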

RCrandall · July 11, 2023

I too would love to see a solution for this. I have read up on the problem but have no practical way to tackle it yet and I don't have the smarts to just create something from first principles. I would pay good money for a utility that just does it and could clean up my code base (20 years and counting) in a day. Sigh.

Der schöne Günther · July 12, 2023

Can you elaborate on the motivation? Why should one care about unit interdependency? To speed up compilation?

Lars Fosdal · July 12, 2023

3 minutes ago, Der schöne Günther said:

To speed up compilation?

Among other things. See comments in issue.

It has also been said that dealing with circularity is increasingly challenging for the compiler.

Der schöne Günther · July 12, 2023

Yes, I have read that comment, but I am not much wiser. I have honestly never bothered with this, and now I wonder if I should have.

So the core of the motivation is "Refactor the code so the compiler trips over its own feet less"? What is the "DCU cache stability" Marco mentions? Is it that Delphi often does not notice that source files have changed (for example, after switching version control branches) and then tries to link in outdated DCU files?

Uwe Raabe · July 12, 2023

Avoiding circular unit references speeds up compilation significantly. Let me cite @Bill Meyer in his excellent book Delphi Legacy Projects: Strategies and Survival Guide:

Quote

When you are working with 1,000 or more units, and the majority of them are participating in dozens (or more) of such cycles, and the length of the cycles is dozens (or more) units, then you have a real snarl.

And worse, you will find that the days of building in seconds are long gone, and that building the application now takes minutes.

As slower compile times also affect the IDE, especially Code Insight, we are not just talking about longer build times, but responsiveness of the IDE itself.

Also mentioned in the book, and backed by my own experience and involvement: a very helpful tool is the Unit Dependency Analyzer built into MMX Code Explorer (also available as a standalone tool). Bill dedicates a whole chapter of his book to Cleaning Uses Clauses.

Lajos Juhász · July 12, 2023

For me, MMX Code Explorer (Unit Dependency Analyzer) works perfectly for reducing circular references. For compiler stability, it is possibly more important to write the full unit name, e.g. Winapi.Windows instead of Windows, rather than relying on a large list of unit scope names in the options.

Uwe Raabe · July 12, 2023

1 minute ago, Lajos Juhász said:

For compiler stability, it is possibly more important to write the full unit name, e.g. Winapi.Windows instead of Windows, rather than relying on a large list of unit scope names in the options.

Relying on unit scope names also makes compiling slower. Besides MMX, there is also UsesCleaner, its command-line companion, for resolving these issues.

Lajos Juhász · July 12, 2023

3 minutes ago, Uwe Raabe said:

Relying on unit scope names also makes compiling slower.

For me it made the compiler unstable. There was no chance of doing a Compile All for a large project group. The solution was to add scopes to unit names, improve the code structure, and reduce circular references.

Fr0sT.Brutal · July 12, 2023

Sounds interesting. Another issue that this feature would make easier to solve is the inability to build some unit deep in the dependency graph. Currently the compiler just throws the error "cannot compile used unit" at the position of the project's main uses section, leaving the task of finding the offending unit to you.

Edited July 12, 2023 by Fr0sT.Brutal

dummzeuch · July 12, 2023

I guess going back to the feature set of Turbo Pascal 3 would also improve the stability of the IDE and the compile speed

[/sarcasm]

Brandon Staggs · July 12, 2023

18 hours ago, Lars Fosdal said:

it is quite easy to circumvent this rule.

There is no rule being circumvented. The rule is you can't have a circular dependency in the interface. You are free to create them in the implementation.

I agree that should be avoided as much as possible and it often signals a need for some refactoring (breaking things out into smaller units). But it is not always wrong.

I agree that it would be nice to have tooling in the IDE to work on uses clauses to track interdependent units, unnecessary unit references, etc. I would especially like to be able to see exactly why a unit is being included in a compilation. In a very large project it can be difficult to track down who is bringing in a unit when you don't expect it to be needed.
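
As a rough illustration of the "why is this unit being included" question, here is a small Python sketch that finds the shortest chain of uses references from a project's main unit to an unexpected unit. It assumes the uses clauses have already been extracted into a plain dictionary. The unit names and the `why_included` function are made up for the example and do not correspond to any existing tool.

```python
from collections import deque

# Hypothetical, already-extracted uses clauses: unit -> units it references.
EXAMPLE_USES = {
    "App": ["MainForm", "Settings"],
    "MainForm": ["Orders", "Settings"],
    "Orders": ["Reporting"],
    "Reporting": ["HugeLegacyUnit"],
    "Settings": [],
    "HugeLegacyUnit": [],
}


def why_included(root, target, uses):
    """Return the shortest uses chain from root to target, or None if unreachable."""
    parent = {root: None}  # doubles as the 'visited' set
    queue = deque([root])
    while queue:
        unit = queue.popleft()
        if unit == target:
            chain = []
            while unit is not None:  # walk back to the root
                chain.append(unit)
                unit = parent[unit]
            return chain[::-1]
        for dep in uses.get(unit, []):
            if dep not in parent:
                parent[dep] = unit
                queue.append(dep)
    return None


if __name__ == "__main__":
    print(" -> ".join(why_included("App", "HugeLegacyUnit", EXAMPLE_USES)))
```

A breadth-first search is used so that the reported chain is the shortest one, which is usually the easiest place to start cutting.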

Lots could be done here for sure.

Brandon Staggs · July 12, 2023

7 hours ago, Der schöne Günther said:

Why should one care about unit interdependency? To speed up compilation?

It's not always bad, but generally you should structure your code so that you don't have "chicken and egg" issues. I tend to think that if you have two units depending on each other to compile, the parts that are creating the dependency should be broken out into another unit that isn't dependent on either one. The most obvious example is simple types appearing in the interface sections of both units when each unit needs simple types from the other. Adding a third unit to define the simple types is an easy fix that just makes code easier to manage and extend later.
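
The effect of that "third unit" fix can be shown on a toy dependency graph. The following Python sketch, with hypothetical unit names (`UnitA`, `UnitB`, `SharedTypes`) and a hypothetical `find_cycle` helper, uses a depth-first search to report one cycle if any exists. It models units only as nodes in a graph and says nothing about interface versus implementation sections.

```python
def find_cycle(uses):
    """Return one cycle as a list of units (first == last), or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current path / finished
    state = {}
    path = []

    def visit(unit):
        state[unit] = GREY
        path.append(unit)
        for dep in uses.get(unit, []):
            dep_state = state.get(dep, WHITE)
            if dep_state == GREY:
                # dep is already on the current path: we found a cycle.
                return path[path.index(dep):] + [dep]
            if dep_state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[unit] = BLACK
        return None

    for unit in uses:
        if state.get(unit, WHITE) == WHITE:
            found = visit(unit)
            if found:
                return found
    return None


# Before: each unit needs simple types declared in the other one.
before = {"UnitA": ["UnitB"], "UnitB": ["UnitA"]}
# After: the simple types live in a third unit that depends on neither.
after = {"UnitA": ["SharedTypes"], "UnitB": ["SharedTypes"], "SharedTypes": []}

if __name__ == "__main__":
    print(find_cycle(before))
    print(find_cycle(after))
```

In the "before" graph the search reports the `UnitA`/`UnitB` loop; in the "after" graph both units point at `SharedTypes` and no cycle remains.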

Bill Meyer · July 13, 2023

On 7/12/2023 at 9:22 AM, Brandon Staggs said:

It's not always bad, but generally you should structure your code so that you don't have "chicken and egg" issues. I tend to think that if you have two units depending on each other to compile, the parts that are creating the dependency should be broken out into another unit that isn't dependent on either one. The most obvious example is simple types appearing in the interface sections of both units when each unit needs simple types from the other. Adding a third unit to define the simple types is an easy fix that just makes code easier to manage and extend later.

There can be exceptional cases which justify sparing use of circular dependencies. Donald Knuth makes the point with regard to sorting algorithms. @Stefan Glienke has also argued for such cases.

My point, and what @Uwe Raabe was speaking very explicitly about, is large programs where circularity is commonplace and not at all exceptional. There is no rational case which can support that practice. In the majority of cases, circular references are needed because of badly organized code modules. And I say that based on thousands of modules and millions of lines of code, over a period of 15+ years.

Bill Meyer · July 13, 2023

On 7/12/2023 at 3:31 AM, Uwe Raabe said:

Avoiding circular unit references speeds up compilation significantly.

I might go so far as to call circular unit references code rot. And they are one of the most common issues in legacy code, in my experience.

Bill Meyer · July 13, 2023

On 7/11/2023 at 2:39 PM, Lars Fosdal said:
https://quality.embarcadero.com/browse/RSP-41961

The process of cleaning up circular references can be quite challenging, since today we have no good tool to discover and track unit interdependency. "Blatant" circular references are explicitly forbidden, but since we can include units in both the interface and the implementation section, it is quite easy to circumvent this rule.

Another challenge is when you inadvertently drag in a massive collection of units into your project, because someone needed a structure or function from a specific unit - which again uses other units, which uses others again - and so forth.

Discovering where a unit is included, and when in the compilation it is parsed, would be significantly helped by a simple build log: a sequential log of the compilation of each unit in the application, indicating where it first became necessary to compile another unit to complete the current unit.

I suggested it could look something like this - more comments in the QP issue.

Pls vote/comment if you find it interesting.
unit1 compiling... 
   unit2 compiling... 
     unit3 compiling...
     unit3 compiled (lines, warnings, hints)
   unit2 compiled (lines, warnings, hints)
unit1 compiled (lines, warnings, hints)

One of the issues is that in a large project with many circular references, that log will show modules appearing again and again.

In one large project, I found over 20,000 such references, and a cold build was up to eight minutes. Reducing the circular references by about 25% brought the build time down to about a minute. And the time was stable over several builds in a session, where with 20K+ it had been increasing on successive builds.

MMX does produce a report of circularity chains. It is large, and not friendly to humans, but it is easy to parse, and I built a simple tool which delivers in a grid the counts for each of the modules with circular references, and which can compare two reports, which is useful as you continue to work on the project.
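
The kind of per-module counting and report comparison described above could look roughly like this Python sketch. It deliberately makes no assumption about the MMX report format: it presumes the cycle chains have already been parsed into lists of unit names, and the function names `cycle_counts` and `compare` are invented for the example. This is not the tool mentioned in the post.

```python
from collections import Counter


def cycle_counts(cycles):
    """Count, per unit, how many cycle chains it takes part in."""
    counts = Counter()
    for cycle in cycles:
        counts.update(set(cycle))  # count each unit at most once per chain
    return counts


def compare(before, after):
    """Return {unit: (count_before, count_after)} for units whose count changed."""
    units = set(before) | set(after)
    # Counter returns 0 for units missing from one of the reports.
    return {u: (before[u], after[u]) for u in sorted(units) if before[u] != after[u]}


if __name__ == "__main__":
    # Hypothetical chains, as they might look after parsing two reports.
    old = cycle_counts([["A", "B"], ["A", "C", "D"]])
    new = cycle_counts([["A", "B"]])
    for unit, (was, now) in compare(old, new).items():
        print(f"{unit}: {was} -> {now}")
```

Comparing two snapshots this way gives a simple progress indicator while cycles are being removed.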

Bill Meyer · July 13, 2023

Let me make a rash assertion. If you have a large project in which some modules are large (>15K lines) and contain many (>200) uses references, then you are referencing some of them simply to gain access to relatively simple types. Move those to other modules with no code. Modules of types and consts can be global with no negative impact. But when those declarations are no longer found in massive units, you will begin to see your uses clauses shrink.

Edited July 14, 2023 by Bill Meyer

Vincent Parrett · July 14, 2023

9 hours ago, Bill Meyer said:

Move those to other modules with no code.

This ^^^ - it's one of the first things I look to do when refactoring code.

9 hours ago, Bill Meyer said:

large (>15K lines)

Units with 15K lines - well that's a code smell - units that long are horrible to work with. As an example, VirtualTrees.pas from Virtual-TreeView used to be 38K lines long. Reviewing the code, it was easy to see why it was so long: circular references between classes (often unnecessary) and heavy use of friend access to private variables. Refactoring it was not easy and did require some minor breaking changes - but the code is now much more manageable - VirtualTrees.pas is now only 2K lines and code is split into units that make sense.

Uwe Raabe · July 14, 2023

Just a small anecdote: Working on one customer's code base, which is heavily convoluted with circular references, I was able to break a cycle simply by using a string literal instead of a global constant declared in one of the units, knowingly sacrificing at least some of the Clean Code principles. The constant was declared like this:

const
  cLocalHost = 'localhost';

BTW, a valuable tool for me to understand someone else's code and detect the fibers a cycle is made from is SciTools Understand. Although it may look a bit expensive for some at first, the time saving effects are absolutely worth it.

Edited July 14, 2023 by Uwe Raabe

Vincent Parrett · July 14, 2023

16 minutes ago, Uwe Raabe said:

BTW, a valuable tool for me to understand someone else's code and detect the fibers a cycle is made from is SciTools Understand. Although it may look a bit expensive for some at first, the time saving effects are absolutely worth it.

Does it actually support Delphi? I couldn't find any mention of it on their site. Also, if I have to contact a vendor for a price I immediately lose interest... that always smells like their sales people rubbing their hands - "ooh, let's research the prospect and see how much they can afford to pay". There is a saying: "If you have to ask, you can't afford it".

Uwe Raabe · July 14, 2023

1 minute ago, Vincent Parrett said:

Does it actually support Delphi

Yes, it does: Supported Languages. They are also providing new releases in a reasonable time frame. When I provide a test case showing some syntax confusing the parser, they usually fix it in the next two releases.

6 minutes ago, Vincent Parrett said:

Also, if I have to contact a vendor for a price I immediately lose interest

OK, but that doesn't say anything about the quality and usability of the product.

I may have reacted the same when I had found the website myself, but I already had another product from their German reseller, when they contacted me with a trial of Understand. That was 2014 and I declined with a comment about the poor Delphi support. In 2016 they came up with that again, and I agreed, because the Delphi support was sufficient for me at that time. Meanwhile, with some significant help from myself, it became even better.

I often have to cope with foreign code, and for that it proved very helpful. It turned out that it also gives some valuable insights into my own code, especially the sort of code that evolved over time. I am glad to have this tool at hand. One can argue about their sales channel, but IMHO that doesn't diminish the product itself. BTW, one can always try to negotiate with the reseller.

Bill Meyer · July 14, 2023

2 hours ago, Vincent Parrett said:

Units with 15K lines - well that's a code smell

Absolutely! But as you point out, it is also a serious task to refactor such things.

dummzeuch · July 14, 2023

I wonder whether the Unit Dependency graph in PasDoc or the Project Dependencies expert in GExperts might be of any help here. They both contain the required information, but possibly not in a helpful format. Maybe they could be improved to become more helpful.

Uwe Raabe · July 14, 2023

8 hours ago, Vincent Parrett said:

VirtualTrees.pas is now only 2K lines and code is split into units that make sense.

Nevertheless, there are still several cycles, as shown by the MMX Unit Dependency Analyzer:

Here is a part of the Dependency Graph from Understand for the VirtualTrees -> VirtualTrees.WorkerThread cycle with information where the dependencies come from in the Dependency Browser below:

Edited July 14, 2023 by Uwe Raabe

Brandon Staggs · July 14, 2023

17 hours ago, Bill Meyer said:

In the majority of cases, circular references are needed because of badly organized code modules. And I say that based on thousands of modules and millions of lines of code, over a period of 15+ years.

Completely agree.
