Tommi Prami

Does anyone have a library/unit that makes coding a tokeniser/parser/some kind of tree easier?

It doesn't need to be a ready-to-run thing, just something that makes things a little easier, so I don't have to code everything from scratch.

It would be a good exercise, but I don't have enough free time to tinker with it.

An article on how to actually code one would also help.

-Tee- 

Edited by Tommi Prami

@Tommi Prami That's easy: https://github.com/RomanYankovsky/DelphiAST

You might want to try the version with all my patches in it: https://github.com/JBontes/DelphiAST

It seems Roman Yankovsky has not had time to integrate all my issues yet.

It **is** a very mature library and using it will serve you well. If you want to change the code, grokking the calls between the different parser levels does take some getting used to. I suggest first reading up on the difference between a parser and a lexer if you want to do that.
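To make the lexer/parser distinction concrete, here is a minimal sketch of the lexer half only. It is not taken from DelphiAST; the names `TTokenKind`, `TToken`, `Tokenize` and `KindNames` are made up for illustration, and it assumes a desktop Delphi compiler with 1-based strings. The lexer turns a flat string into a flat list of tokens. A parser would then consume that list and decide how the tokens nest.

```pascal
program LexerSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

type
  // Hypothetical token kinds for a tiny assignment language
  TTokenKind = (tkNumber, tkIdentifier, tkSymbol);

  TToken = record
    Kind: TTokenKind;
    Text: string;
  end;

const
  KindNames: array[TTokenKind] of string = ('number', 'identifier', 'symbol');

// The lexer only groups characters into tokens; it knows nothing about
// which token sequences are valid. The caller owns the returned list.
function Tokenize(const Source: string): TList<TToken>;
var
  I, Start: Integer;
  Token: TToken;
begin
  Result := TList<TToken>.Create;
  I := 1; // assumes 1-based strings (the desktop compiler default)
  while I <= Length(Source) do
  begin
    if Source[I] = ' ' then
    begin
      Inc(I); // whitespace separates tokens but is not a token itself
      Continue;
    end;
    Start := I;
    if CharInSet(Source[I], ['0'..'9']) then
    begin
      Token.Kind := tkNumber;
      while (I <= Length(Source)) and CharInSet(Source[I], ['0'..'9']) do
        Inc(I);
    end
    else if CharInSet(Source[I], ['A'..'Z', 'a'..'z', '_']) then
    begin
      Token.Kind := tkIdentifier;
      while (I <= Length(Source)) and
        CharInSet(Source[I], ['A'..'Z', 'a'..'z', '0'..'9', '_']) do
        Inc(I);
    end
    else
    begin
      Token.Kind := tkSymbol; // any other character is a one-character symbol
      Inc(I);
    end;
    Token.Text := Copy(Source, Start, I - Start);
    Result.Add(Token);
  end;
end;

procedure Main;
var
  Tokens: TList<TToken>;
  Token: TToken;
begin
  Tokens := Tokenize('total = price * 3');
  try
    // A parser would consume this list and build a tree from it,
    // for example an assignment node with a multiplication child.
    for Token in Tokens do
      Writeln(KindNames[Token.Kind], ': ', Token.Text);
  finally
    Tokens.Free;
  end;
end;

begin
  Main;
end.
```

Keeping the two stages separate means the parser never has to deal with whitespace or individual characters; it only ever looks at token kinds and texts.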

 

PS don't be put off by the XML output, that's just a cheap trick to display the contents of the tree. You don't need it to work with DelphiAST.

Edited by Johan Bontes

3 hours ago, Johan Bontes said:

It seems Roman Yankovsky has not had time to integrate all my issues yet. 

It is very unfortunate that you used merge to get the latest master changes into your fork rather than rebasing.

That makes looking at your changes very difficult (I had to look back through the commits until I found yours that were not integrated into master via PR).

Also, having all changes in one branch will result in one huge PR, whereas putting each change into its own branch and submitting a separate PR for each would be easier to review.

 

I don't know about Roman but I would not accept a PR with a ton of different fixes where I cannot separate them from each other and validate them individually.

I just bring this up because I want those bugfixes incorporated back into the main project.

1 hour ago, Tommi Prami said:

More than less general. Any syntax basically...  

What does that mean? Do you want a Pascal parser, or an any/all language parser?

The former exists, the latter does not.

Obviously there are parsers for other languages, just not one parser for every language.

 

There is, however, Langserver; you might want to have a look at that.

Although it has been pretty quiet over there for some years, I still use the GOLD Parsing System in some of my applications. Besides some pre-built grammars, it lets you create your own as needed.

@Uwe Raabe I read a little about this parser, great tool! There is a ready-to-use grammar for Delphi, though without support for newer language features such as generics, anonymous functions and closures. And there are several GOLD Parser Engines with sources written in Delphi. I took the most recent version and updated it a bit, so now it can be compiled in Delphi 10.2.3. I put it on GitHub.

Look at https://gitlab.com/teo-tsirpanis/gold-parser-Lazarus for a version 5 compatible GOLD engine. Version 5 grammars are not compatible with version 1 engines.

But IMHO the good old lex/yacc https://github.com/RomanYankovsky/ndyacclex can be more easily integrated with your Delphi projects, with no need to rely on third-party libraries. One could develop and test the grammar with GOLD, I suppose, and then translate it to lex/yacc.

Edited by pyscripter

@pyscripter Thank you for the finds! I had some free time, so I adapted Teo's parser for Delphi. I also corrected some bugs in this parser and in the grammar for the Delphi 7.0 language. https://github.com/Kryuski/GOLD-Parsing-System-For-Delphi

@Markus Kinzler Interesting. I will definitely follow this project.

Update: Parse::Easy is written in Perl. Yes, it can generate a lexer and a parser in the Delphi language, but hey! I do not want to install another IDE and learn a new language when such things may well be written in Delphi. ErrorInsight was once written in J#, and I think that was a bad decision. Maybe later I or someone else can rewrite Parse::Easy in Delphi. But for now it does not make sense, since it is still in the beta stage.

Edited by Kryvich

On 11/2/2018 at 3:41 PM, Johan Bontes said:

What does that mean? Do you want a Pascal parser, or an any/all language parser?

The former exists, the latter does not.

Obviously there are parsers for other languages, just not one parser for every language.

 

There is, however, Langserver; you might want to have a look at that.

OK, I'll rephrase 🙂

Any chosen syntax. I forgot about DelphiAST; that could be a first stepping stone to start from.

I've just been thinking that it would be cool to study how it is done, and maybe write a small app in the process.

 

-Tee-

There is a book by Terence Parr on Language Implementation Patterns that might be of interest.

 

Sue

What you want is called a "lexer" (which splits text into tokens) or a "parser" (which builds a structure from those tokens), and it should not actually be hard to write. Below is a lexer base class that I just wrote myself; feel free to use/modify/delete/eat/drink it or do whatever you want with it if it helps you (untested!).

 

unit Lexer;

interface

uses
  System.SysUtils, System.Types, System.Generics.Collections;

type
  TLexer = class abstract
  public type
    ETokenError = class(Exception);

    TToken = record
    private
      FText: String;
      FPosition: TPoint;
      FKind: Byte;
    public
      property Text: String read FText;
      property Position: TPoint read FPosition;
      property Kind: Byte read FKind;
    end;
  private
    FTokens: TList<TToken>;
    function GetTokenCount: Integer;
    function GetTokens(const AIndex: Integer): TToken;
  protected
    // Check if end of text is reached
    function EndsText(const AChar: PChar): Boolean; virtual;
    // Check if end of line is reached
    function BreaksLine(const AChar: PChar): Boolean; virtual;
    // Check if Char is valid (abort if not)
    function IsValidChar(const AChar: Char; const AKind: Byte): Boolean; virtual;
    // Get kind of new token
    function TokenKind(var AChar: PChar): Byte; virtual; abstract;
    // Check if token ends here
    function EndsToken(var AChar: PChar; const AKind: Byte): Boolean; virtual; abstract;
    // Convert token kind if necessary
    procedure ConvertToken(var AChar: PChar; var AKind: Byte); virtual; abstract;
  public
    property Tokens[const AIndex: Integer]: TToken read GetTokens;
    property TokenCount: Integer read GetTokenCount;
    constructor Create(const AText: String);
    destructor Destroy; override;
  end;

implementation

{ TLexer }

function TLexer.BreaksLine(const AChar: PChar): Boolean;
begin
  Result := String.Create([AChar[0], AChar[1]]).Equals(sLineBreak);
end;

constructor TLexer.Create(const AText: String);

  procedure Parse;
  var
    Current: PChar;
    Previous: PChar;
    Token: TToken;
    Kind: Byte;
    Position: TPoint;
    StringBuilder: TStringBuilder;
  begin
    Position := Default(TPoint);
    Current := PChar(AText);
    StringBuilder := TStringBuilder.Create;
    try
      while not EndsText(Current) do
      begin
        Previous := Current;
        Kind := TokenKind(Current);
        while not (EndsText(Current) or EndsToken(Current, Kind)) do
        begin
          if not IsValidChar(Current[0], Kind) then
          begin
            // The format string needs a %s placeholder for the offending character
            raise ETokenError.CreateFmt('Invalid character: %s', [String.Create([Current[0]]).QuotedString]);
          end;
          ConvertToken(Current, Kind);
          if BreaksLine(Current) then
          begin
            Inc(Position.Y);
          end;
          // Append only the current character; appending the PChar itself
          // would add the whole remaining text
          StringBuilder.Append(Current[0]);
          Inc(Current);
        end;
        // Guard against descendants that consume nothing, which would loop forever
        if Current = Previous then
        begin
          raise ETokenError.Create('Lexer made no progress');
        end;
        Token.FText := StringBuilder.ToString;
        Token.FKind := Kind;
        // PChar subtraction already yields a character count,
        // so it must not be divided by SizeOf(Char)
        Inc(Position.X, Integer(Current - Previous));
        Token.FPosition := Position;
        StringBuilder.Clear;
        FTokens.Add(Token);
      end;
    finally
      StringBuilder.Free;
    end;
  end;

begin
  inherited Create;
  FTokens := TList<TToken>.Create;
  Parse;
end;

destructor TLexer.Destroy;
begin
  FTokens.Free;
  inherited;
end;

function TLexer.EndsText(const AChar: PChar): Boolean;
begin
  Result := AChar[0] = #0;
end;

function TLexer.GetTokenCount: Integer;
begin
  Result := FTokens.Count;
end;

function TLexer.GetTokens(const AIndex: Integer): TToken;
begin
  Result := FTokens[AIndex];
end;

function TLexer.IsValidChar(const AChar: Char; const AKind: Byte): Boolean;
begin
  Result := CharInSet(AChar, [Low(AnsiChar) .. High(AnsiChar)]);
end;

end.

 

Note however, that this is not optimal performance-wise.

Edited by Dennis07

Hey, I've just subscribed because I was looking for a Delphi forum to learn a bit, and found this.

Some months ago I wanted to build a GUI/compiler for Core War games (the one invented in '84 where you program a warrior in asm to fight another warrior in a core).

So I had this exact same problem. I tried the GOLD Parser stuff, but I was unable to understand all the underlying stuff.

What I finally did was remember the compiler lessons from university: it's really not that complex to build your own lexer/parser.

If I remember right, the perfect book was the one with a dragon on the cover? Or was that the one about operating systems?

By the way, I still haven't finished it, but I got it working up to the symbol table :D (because Core War assemblers need macro processing, etc.)

Anyway, depending on what you want to parse and do, maybe it's easier to build your own. There is a very good tutorial online in Pascal about this; if you are interested, I can have a look.

Regards, and thanks everyone, this is my first post :D

Quite a few years ago, Jack Crenshaw wrote a series of articles on how to build a simple compiler. He wrote it in Turbo Pascal and develops the whole thing step by step. It is a recursive descent compiler that originally emitted assembly code for the Motorola 68000. The articles have been further developed over the years, and the target changed to the 80x86. Their value is that he wrote them for people who are not writing a compiler professionally but who wish to learn the mechanics of compiling code. The writing is clear, and the whole development is easy to follow. The articles can be found here:

http://www.pp4s.co.uk/main/tu-trans-comp-jc-intro.html
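For a rough feel of what "recursive descent" means, here is a small sketch in Delphi. It is not Crenshaw's code: his compiler emits assembly, whereas this one simply evaluates the expression as it parses. The class `TExprParser` and its method names are made up for the example, and it only handles non-negative integers, `+ - * /` and parentheses. Each grammar rule becomes one method, and the methods call each other in the same way the rules refer to each other.

```pascal
program RecursiveDescentSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  EParseError = class(Exception);

  // Hypothetical evaluator for this grammar:
  //   expression = term   { ('+' | '-') term }
  //   term       = factor { ('*' | '/') factor }
  //   factor     = number | '(' expression ')'
  TExprParser = class
  private
    FText: string;
    FPos: Integer;
    procedure SkipSpaces;
    function Peek: Char;
    procedure Expect(C: Char);
    function ParseExpression: Integer;
    function ParseTerm: Integer;
    function ParseFactor: Integer;
  public
    function Evaluate(const AText: string): Integer;
  end;

procedure TExprParser.SkipSpaces;
begin
  while (FPos <= Length(FText)) and (FText[FPos] = ' ') do
    Inc(FPos);
end;

// Returns the next non-space character without consuming it (#0 at the end)
function TExprParser.Peek: Char;
begin
  SkipSpaces;
  if FPos <= Length(FText) then
    Result := FText[FPos]
  else
    Result := #0;
end;

procedure TExprParser.Expect(C: Char);
begin
  if Peek <> C then
    raise EParseError.CreateFmt('Expected "%s" at position %d', [C, FPos]);
  Inc(FPos);
end;

function TExprParser.ParseFactor: Integer;
var
  Start: Integer;
begin
  if Peek = '(' then
  begin
    Inc(FPos);
    Result := ParseExpression; // recursion: a factor may contain an expression
    Expect(')');
  end
  else if CharInSet(Peek, ['0'..'9']) then
  begin
    Start := FPos;
    while (FPos <= Length(FText)) and CharInSet(FText[FPos], ['0'..'9']) do
      Inc(FPos);
    Result := StrToInt(Copy(FText, Start, FPos - Start));
  end
  else
    raise EParseError.CreateFmt('Unexpected input at position %d', [FPos]);
end;

function TExprParser.ParseTerm: Integer;
var
  Op: Char;
begin
  Result := ParseFactor;
  while CharInSet(Peek, ['*', '/']) do
  begin
    Op := Peek;
    Inc(FPos);
    if Op = '*' then
      Result := Result * ParseFactor
    else
      Result := Result div ParseFactor; // integer division
  end;
end;

function TExprParser.ParseExpression: Integer;
var
  Op: Char;
begin
  Result := ParseTerm;
  while CharInSet(Peek, ['+', '-']) do
  begin
    Op := Peek;
    Inc(FPos);
    if Op = '+' then
      Result := Result + ParseTerm
    else
      Result := Result - ParseTerm;
  end;
end;

function TExprParser.Evaluate(const AText: string): Integer;
begin
  FText := AText;
  FPos := 1; // assumes 1-based strings (the desktop compiler default)
  Result := ParseExpression;
  if Peek <> #0 then
    raise EParseError.CreateFmt('Unexpected input at position %d', [FPos]);
end;

var
  Parser: TExprParser;
begin
  Parser := TExprParser.Create;
  try
    Writeln(Parser.Evaluate('2 + 3 * (4 - 1)')); // prints 11
  finally
    Parser.Free;
  end;
end.
```

Operator precedence falls out of the call structure: `ParseExpression` calls `ParseTerm`, so multiplication binds tighter than addition without any precedence table.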

 
