Lars Fosdal

Feature req: Compiler unit dependency graph / log with warnings about circularity


https://quality.embarcadero.com/browse/RSP-41961

 

The process of cleaning up circular references can be quite challenging, as we currently have no good tool to discover and track unit interdependency. "Blatant" circular references are explicitly forbidden, but since we can include units in both the interface and the implementation section, it is quite easy to circumvent this rule.

 

Another challenge is when you inadvertently drag a massive collection of units into your project because someone needed a structure or function from a specific unit, which in turn uses other units, which use others again, and so forth.

 

Discovering where a unit is included, and when in the compilation it is parsed, would be significantly helped by a simple build log: a sequential log of the compilation of each unit in the application, indicating where it first became necessary to compile another unit to complete the current unit.

 

Please vote or comment if you find it interesting; there are more comments in the QP issue.

I suggested it could look something like this:

unit1 compiling... 
   unit2 compiling... 
     unit3 compiling...
     unit3 compiled (lines, warnings, hints)
   unit2 compiled (lines, warnings, hints)
unit1 compiled (lines, warnings, hints)
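
To make the proposed format concrete, here is a minimal sketch that walks a hard-coded uses graph and prints a nested log in the same shape. Everything in it is an assumption for illustration: the program name, the `UsesMap` contents, and the "circular reference" message are hypothetical, and the real compiler's ordering is certainly more involved than a plain depth-first walk.

```delphi
program BuildLogSketch;

{$APPTYPE CONSOLE}

uses
  System.Generics.Collections;

var
  UsesMap: TDictionary<string, TArray<string>>; // hypothetical uses graph
  InProgress: TList<string>;                    // units currently being "compiled"
  Done: TList<string>;                          // units already "compiled"

procedure Visit(const AUnit: string; ADepth: Integer);
var
  Dep: string;
  Deps: TArray<string>;
  Indent: string;
begin
  Indent := StringOfChar(' ', ADepth * 2);
  if InProgress.Contains(AUnit) then
  begin
    // The unit is further up the current chain: this is a cycle.
    Writeln(Indent, AUnit, ' already compiling (circular reference)');
    Exit;
  end;
  if Done.Contains(AUnit) then
    Exit; // compiled earlier; nothing new to log
  InProgress.Add(AUnit);
  Writeln(Indent, AUnit, ' compiling...');
  if UsesMap.TryGetValue(AUnit, Deps) then
    for Dep in Deps do
      Visit(Dep, ADepth + 1);
  InProgress.Remove(AUnit);
  Done.Add(AUnit);
  Writeln(Indent, AUnit, ' compiled');
end;

begin
  UsesMap := TDictionary<string, TArray<string>>.Create;
  InProgress := TList<string>.Create;
  Done := TList<string>.Create;
  try
    // Illustrative data only: unit2 refers back to unit1.
    UsesMap.Add('unit1', TArray<string>.Create('unit2'));
    UsesMap.Add('unit2', TArray<string>.Create('unit3', 'unit1'));
    UsesMap.Add('unit3', nil);
    Visit('unit1', 0);
  finally
    Done.Free;
    InProgress.Free;
    UsesMap.Free;
  end;
end.
```

The indentation shows which unit first pulled in another one, and the extra line marks the point where a chain loops back on itself.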

 


I too would love to see a solution for this.  I have read up on the problem but have no practical way to tackle it yet and I don't have the smarts to just create something from first principles.  I would pay good money for a utility that just does it and could clean up my code base (20 years and counting) in a day.  Sigh.

3 minutes ago, Der schöne Günther said:

To speed up compilation?

Among other things.  See comments in issue. 

It has also been said that dealing with circularity is increasingly challenging for the compiler.


Yes, I have read that comment, but I am not much wiser. I have honestly never bothered with this, and now I wonder if I should have.

 

So the core of the motivation is "Refactor the code, so the compiler trips over its own feet less"? What is the "DCU cache stability" Marco mentions? Is it that Delphi often fails to notice updated source files (for example, after switching version control branches) and then tries to link in outdated DCU files?


Avoiding circular unit references speeds up compilation significantly. Let me cite @Bill Meyer in his excellent book Delphi Legacy Projects: Strategies and Survival Guide:

Quote

When you are working with 1,000 or more units, and the majority of them are participating in dozens (or more) of such cycles, and the length of the cycles is dozens (or more) units, then you have a real snarl.

And worse, you will find that the days of building in seconds are long gone, and that building the application now takes minutes.

As slower compile times also affect the IDE, especially Code Insight, we are not just talking about longer build times, but responsiveness of the IDE itself.

 

Also mentioned in the book, and backed by my own experience and involvement: a very helpful tool is the Unit Dependency Analyzer built into MMX Code Explorer (also available standalone). Bill dedicates a whole chapter of his book to Cleaning Uses Clauses.


For me MMX Code Explorer (Unit Dependency Analyzer) works perfectly to reduce circular references. For compiler stability it is possibly more important to write the full unit name, e.g. Winapi.Windows instead of Windows, rather than relying on a large list of unit scope names in the options.

1 minute ago, Lajos Juhász said:

For compiler stability it is possibly more important to write the full unit name, e.g. Winapi.Windows instead of Windows, rather than relying on a large list of unit scope names in the options.

Relying on unit scope names also makes compiling slower. Besides MMX, there is also UsesCleaner, its command-line companion, which resolves these issues.

3 minutes ago, Uwe Raabe said:

Relying on unit scope names also makes compiling slower.

For me it made the compiler unstable. There was no chance of doing a Compile All for a large project group. The solution was to add the scope names to the unit names, improve the code structure, and reduce circular references.


Sounds interesting. Another issue that this feature would make easier to solve is the inability to build some unit deep in the dependency graph. Currently the compiler just throws the error "cannot compile used unit" at the position of the project's main uses section, leaving the task of finding the troublesome unit to you.

Edited by Fr0sT.Brutal


I guess going back to the feature set of Turbo Pascal 3 would also improve the stability of the IDE and the compile speed

[/sarcasm]

18 hours ago, Lars Fosdal said:

it is quite easy to circumvent this rule.

 

There is no rule being circumvented. The rule is you can't have a circular dependency in the interface. You are free to create them in the implementation.
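
As a minimal sketch of that distinction, assume two hypothetical units, `UnitA` and `UnitB`, each in its own .pas file (the names and classes are made up for illustration). `UnitA` needs `UnitB` in its interface section, while `UnitB` only needs `UnitA` inside its implementation section, so the pair compiles even though each unit refers to the other:

```delphi
// ---- file: UnitA.pas (hypothetical) ----
unit UnitA;

interface

uses
  UnitB; // interface reference: TA's public declaration mentions TB

type
  TA = class
  public
    function MakeB: TB;
  end;

implementation

function TA.MakeB: TB;
begin
  Result := TB.Create;
end;

end.

// ---- file: UnitB.pas (hypothetical) ----
unit UnitB;

interface

type
  TB = class
  public
    function Describe: string;
  end;

implementation

uses
  UnitA; // implementation reference back to UnitA: allowed

function TB.Describe: string;
begin
  Result := TA.ClassName + ' -> ' + ClassName;
end;

end.
```

Moving the `uses UnitA;` line of `UnitB` up into its interface section would turn this into an interface-level cycle, which the compiler rejects as a circular unit reference.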

 

I agree that should be avoided as much as possible and it often signals a need for some refactoring (breaking things out into smaller units). But it is not always wrong.

 

I agree that it would be nice to have tooling in the IDE to work on uses clauses to track interdependent units, unnecessary unit references, etc. I would especially like to be able to see exactly why a unit is being included in a compilation. In a very large project it can be difficult to track down who is bringing in a unit when you don't expect it to be needed.

 

Lots could be done here for sure.

7 hours ago, Der schöne Günther said:

Why should one care about unit interdependency? To speed up compilation?

It's not always bad, but generally you should structure your code so that you don't have "chicken and egg" issues. I tend to think that if you have two units depending on each other to compile, the parts that are creating the dependency should be broken out into another unit that isn't dependent on either one. The most obvious example is simple types appearing in the interface sections of both units when each unit needs simple types from the other. Adding a third unit to define the simple types is an easy fix that just makes code easier to manage and extend later.
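
A minimal sketch of that third-unit fix, using made-up names (`Shared.Types`, `Orders`, `Customers`) and assuming each unit lives in its own .pas file. Both units take their simple types from a unit that contains declarations only and depends on neither of them:

```delphi
// ---- file: Shared.Types.pas (hypothetical) ----
unit Shared.Types;

interface

type
  // Simple types that both units need; no code, no further uses.
  TOrderId = type Integer;
  TCustomerId = type Integer;
  TOrderStatus = (osNew, osShipped, osCancelled);

implementation

end.

// ---- file: Orders.pas (hypothetical) ----
unit Orders;

interface

uses
  Shared.Types; // no reference to Customers needed any more

type
  TOrder = record
    Id: TOrderId;
    Customer: TCustomerId;
    Status: TOrderStatus;
  end;

implementation

end.

// ---- file: Customers.pas (hypothetical) ----
unit Customers;

interface

uses
  Shared.Types; // no reference to Orders needed any more

type
  TCustomer = record
    Id: TCustomerId;
    LastOrder: TOrderId;
  end;

implementation

end.
```

Without `Shared.Types`, `Orders` would have to use `Customers` for `TCustomerId` and `Customers` would have to use `Orders` for `TOrderId`, which is exactly the chicken-and-egg situation.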

On 7/12/2023 at 9:22 AM, Brandon Staggs said:

It's not always bad, but generally you should structure your code so that you don't have "chicken and egg" issues. I tend to think that if you have two units depending on each other to compile, the parts that are creating the dependency should be broken out into another unit that isn't dependent on either one. The most obvious example is simple types appearing in the interface sections of both units when each unit needs simple types from the other. Adding a third unit to define the simple types is an easy fix that just makes code easier to manage and extend later.

There can be exceptional cases which justify sparing use of circular dependencies. Donald Knuth makes the point with regard to sorting algorithms. @Stefan Glienke has also argued for such cases.

My point, and what @Uwe Raabe was speaking very explicitly about is large programs where circularity is commonplace, and not at all exceptional. There is no rational case which can support that practice. In the majority of cases, circular references are needed because of badly organized code modules. And I say that based on thousands of modules and millions of lines of code, over a period of 15+ years.

On 7/12/2023 at 3:31 AM, Uwe Raabe said:

Avoiding circular unit references speeds up compilation significantly.

I might go so far as to call circular unit references code rot. And they are one of the most common issues in legacy code, in my experience.

On 7/11/2023 at 2:39 PM, Lars Fosdal said:

unit1 compiling... 
   unit2 compiling... 
     unit3 compiling...
     unit3 compiled (lines, warnings, hints)
   unit2 compiled (lines, warnings, hints)
unit1 compiled (lines, warnings, hints)

 

One of the issues is that in a large project with many circular references, that log will show modules appearing again and again.

In one large project, I found over 20,000 such references, and a cold build was up to eight minutes. Reducing the circular references by about 25% brought the build time down to about a minute. And the time was stable over several builds in a session, where with 20K+ it had been increasing on successive builds.

 

MMX does produce a report of circularity chains. It is large and not friendly to humans, but it is easy to parse, and I built a simple tool which shows in a grid the counts for each of the modules with circular references. It can also compare two reports, which is useful as you continue to work on the project.


Let me make a rash assertion. If you have a large project in which some modules are large (>15K lines) and contain many (>200) uses references, then you are referencing some of them simply to gain access to relatively simple types. Move those to other modules with no code. Modules of types and consts can be global with no negative impact. But when those declarations are no longer found in massive units, you will begin to see your uses clauses shrink.

Edited by Bill Meyer

9 hours ago, Bill Meyer said:

Move those to other modules with no code.

This ^^^ - it's one of the first things I look to do when refactoring code.  

9 hours ago, Bill Meyer said:

large (>15K lines)

Units with 15K lines - well, that's a code smell - units that long are horrible to work with. As an example, VirtualTrees.pas from Virtual-TreeView used to be 38K lines long. Reviewing the code, it was easy to see why it was so long: circular references between classes (often unnecessary) and heavy use of friend access to private variables. Refactoring it was not easy and did require some minor breaking changes, but the code is now much more manageable. VirtualTrees.pas is now only 2K lines, and the code is split into units that make sense.


Just a small anecdote: Working on a customer's code base, which is heavily entangled in circular references, I was able to break a cycle by simply using a string literal instead of a global constant declared in one of the units, knowingly sacrificing at least some of the Clean Code principles. The constant was declared like this:

const
  cLocalHost = 'localhost';

BTW, a valuable tool for me to understand someone else's code and detect the fibers a cycle is made from is SciTools Understand. Although it may look a bit expensive for some at first, the time saving effects are absolutely worth it.

Edited by Uwe Raabe

16 minutes ago, Uwe Raabe said:

BTW, a valuable tool for me to understand someone else's code and detect the fibers a cycle is made from is SciTools Understand. Although it may look a bit expensive for some at first, the time saving effects are absolutely worth it.

Does it actually support Delphi? I couldn't find any mention of it on their site. Also, if I have to contact a vendor for a price I immediately lose interest... that always smells like their sales people rubbing their hands: "ooh, let's research the prospect and see how much they can afford to pay". There is a saying: "If you have to ask, you can't afford it".

1 minute ago, Vincent Parrett said:

Does it actually support Delphi

Yes, it does: Supported Languages. They are also providing new releases in a reasonable time frame. When I provide a test case showing some syntax confusing the parser, they usually fix it in the next two releases.

 

6 minutes ago, Vincent Parrett said:

Also, if I have to contact a vendor for a price I immediately lose interest

OK, but that doesn't say anything about the quality and usability of the product.

 

I might have reacted the same way had I found the website myself, but I already had another product from their German reseller when they contacted me with a trial of Understand. That was in 2014, and I declined with a comment about the poor Delphi support. In 2016 they came up with it again, and I agreed, because the Delphi support was sufficient for me by then. Meanwhile, with some significant help from myself, it has become even better.

 

I often have to cope with foreign code, and for that it has proved very helpful. It turned out that it also gives some valuable insights into my own code, especially the sort of code that evolved over time. I am glad to have this tool at hand. One can argue about their sales channel, but IMHO that doesn't diminish the product itself. BTW, one can always try to negotiate with the reseller.

2 hours ago, Vincent Parrett said:

Units with 15K lines - well that's a code smell

Absolutely! But as you point out, it is also a serious task to refactor such things.


I wonder whether the Unit Dependency graph in PasDoc or the Project Dependencies expert in GExperts might be of any help here. They both contain the required information, but possibly not in a helpful format. Maybe they could be improved to become more helpful.

8 hours ago, Vincent Parrett said:

VirtualTrees.pas is now only 2K lines and code is split into units that make sense. 

Nevertheless, there are still several cycles, as shown by the MMX Unit Dependency Analyzer:

 

[Screenshot: MMX Unit Dependency Analyzer listing the remaining cycles]

 

Here is a part of the Dependency Graph from Understand for the VirtualTrees -> VirtualTrees.WorkerThread cycle with information where the dependencies come from in the Dependency Browser below:

 

[Screenshot: Understand Dependency Graph for the VirtualTrees -> VirtualTrees.WorkerThread cycle, with the Dependency Browser below]

Edited by Uwe Raabe

17 hours ago, Bill Meyer said:

In the majority of cases, circular references are needed because of badly organized code modules. And I say that based on thousands of modules and millions of lines of code, over a period of 15+ years.

Completely agree.
