mazluta

Check If a File Is What It Claims to Be


Someone asked in the UNIGUI forum how to check whether a file really is what its extension "says".

For example, is MyFile.DocX really a Word DOCX file?

Since I have a DMS (Data Management System), and it is all about files and documents, I have a Pascal unit just for that.
Anyone who uses it can do with it what they want, but I don't take any responsibility for any use of this unit.

Add the unit to the uses clause.

Then call:

if IsFileTypeAsClaim('c:\aa\a1.mkv' {full file name and path}) then
    showmessage('file is ok.')
else
    showmessage('The file is not of the declared type.');

uCheckFileType.pas

Edited by mazluta

7 hours ago, mazluta said:

if IsFileTypeAsClaim('c:\aa\a1.mkv' {full file name and path}) then
    showmessage('file is ok.')
else
    showmessage('The file is not of the declared type.');

To write such a function, you have to open the file and check its content to see whether it is what the file extension claims.

Each file type has a specific internal format, usually identified by a header. You have to look at the specification of each file format and write code to do more or less thorough checks. There is a website which could help you: https://docs.fileformat.com/
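A minimal sketch of that idea in Delphi (also compiling under Free Pascal in Delphi mode): read the first bytes of the file and compare them with the known signature of the claimed format, here the 8-byte PNG signature. `ReadFileHeader` and `StartsWithBytes` are hypothetical helper names, not taken from uCheckFileType.pas, and the demo file name `header_demo.bin` is just an assumption.

```pascal
program HeaderCheck;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

const
  // PNG files always start with these 8 bytes: #$89 'PNG' CR LF #$1A LF
  PngSignature: array[0..7] of Byte = ($89, $50, $4E, $47, $0D, $0A, $1A, $0A);

// Hypothetical helper: returns at most MaxCount bytes from the start of the
// file (fewer if the file is shorter).
function ReadFileHeader(const FileName: string; MaxCount: Integer): TBytes;
var
  Stream: TFileStream;
  BytesRead: Integer;
begin
  SetLength(Result, 0);
  if MaxCount <= 0 then
    Exit;
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Result, MaxCount);
    BytesRead := Stream.Read(Result[0], MaxCount);
    SetLength(Result, BytesRead);
  finally
    Stream.Free;
  end;
end;

// Hypothetical helper: True if Data begins with every byte of Signature.
function StartsWithBytes(const Data, Signature: array of Byte): Boolean;
var
  I: Integer;
begin
  Result := Length(Data) >= Length(Signature);
  if not Result then
    Exit;
  for I := 0 to High(Signature) do
    if Data[I] <> Signature[I] then
      Exit(False);
end;

var
  Stream: TFileStream;
begin
  // Demo: write a fake file that starts with the PNG signature, then check it.
  Stream := TFileStream.Create('header_demo.bin', fmCreate);
  try
    Stream.WriteBuffer(PngSignature, SizeOf(PngSignature));
  finally
    Stream.Free;
  end;
  if StartsWithBytes(ReadFileHeader('header_demo.bin', 8), PngSignature) then
    WriteLn('header_demo.bin starts with the PNG signature')
  else
    WriteLn('header_demo.bin is not a PNG');
  DeleteFile('header_demo.bin');
end.
```

A matching signature only shows the header looks right; it does not prove the rest of the file is valid.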


This code would be much better if each file header check was done by the same code, against a signature declared either in a constant or perhaps in a file linked in as a resource. That would make the code much cleaner, without so much repetition, and would let you extend it very easily.
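A minimal Delphi/Free Pascal sketch of that table-driven approach: each row pairs an extension with the offset and hex-encoded bytes of its signature, and one function checks any row. `TFileSignature`, `Signatures`, `MatchesHexSignature` and `IsDeclaredTypeValid` are hypothetical names, and the table is a small illustrative subset of well-known magic numbers, not the contents of uCheckFileType.pas. Storing signatures as hex text avoids code-page surprises with `#$89`-style AnsiString literals and reads like the format listings on docs.fileformat.com.

```pascal
program SignatureTable;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  TFileSignature = record
    Ext: string;     // extension including the dot, compared case-insensitively
    Offset: Integer; // position in the file where the signature starts
    HexSig: string;  // signature bytes as hex, two characters per byte
  end;

const
  // Illustrative subset; supporting a new type means adding a row, not code.
  Signatures: array[0..8] of TFileSignature = (
    (Ext: '.png';  Offset: 0; HexSig: '89504E470D0A1A0A'),
    (Ext: '.jpg';  Offset: 0; HexSig: 'FFD8FF'),
    (Ext: '.gif';  Offset: 0; HexSig: '474946383761'),  // 'GIF87a'
    (Ext: '.gif';  Offset: 0; HexSig: '474946383961'),  // 'GIF89a'
    (Ext: '.bmp';  Offset: 0; HexSig: '424D'),          // 'BM'
    (Ext: '.pdf';  Offset: 0; HexSig: '255044462D'),    // '%PDF-'
    (Ext: '.mkv';  Offset: 0; HexSig: '1A45DFA3'),      // EBML header
    (Ext: '.mp4';  Offset: 4; HexSig: '66747970'),      // 'ftyp' box
    (Ext: '.docx'; Offset: 0; HexSig: '504B0304')       // ZIP container only
  );

// True if the bytes at Data[Offset..] equal the hex-encoded signature.
function MatchesHexSignature(const Data: array of Byte; Offset: Integer;
  const HexSig: string): Boolean;
var
  I, ByteCount: Integer;
begin
  ByteCount := Length(HexSig) div 2;
  Result := Length(Data) >= Offset + ByteCount;
  if not Result then
    Exit;
  for I := 0 to ByteCount - 1 do
    if not SameText(IntToHex(Data[Offset + I], 2), Copy(HexSig, 2 * I + 1, 2)) then
      Exit(False);
end;

// True if Header matches at least one row for the declared extension.
// Unknown extensions return False (there is no rule to confirm them).
function IsDeclaredTypeValid(const Ext: string; const Header: array of Byte): Boolean;
var
  I: Integer;
begin
  for I := Low(Signatures) to High(Signatures) do
    if SameText(Signatures[I].Ext, Ext) and
       MatchesHexSignature(Header, Signatures[I].Offset, Signatures[I].HexSig) then
      Exit(True);
  Result := False;
end;

begin
  // The same JPEG header, presented under two different extensions.
  if IsDeclaredTypeValid('.jpg', [$FF, $D8, $FF, $E0]) then
    WriteLn('.jpg: accepted');
  if not IsDeclaredTypeValid('.png', [$FF, $D8, $FF, $E0]) then
    WriteLn('.png: rejected');
end.
```

To check a real file, pass `ExtractFileExt(FileName)` as `Ext` and the first few dozen bytes of the file as `Header`. Note that for .docx this can only confirm a ZIP container; telling a Word document apart from any other ZIP needs a look inside the archive.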


Something like this (for the image formats I detect):



// Uses TBytes helper methods (Count, StartsWith, Contains) that are not shown here.
FUNCTION ImageFormat(CONST Data : TBytes) : CpuInt;
  BEGIN
    IF Data.Count<100 THEN // too little data to decide
      Result:=TImageFormat.Unknown
    ELSE IF Data.StartsWith([$89,$50,$4E,$47]) THEN // #$89 'PNG'
      Result:=TImageFormat.PNG
    // FF D8 FF plus 'JFIF' at offsets 6..9; Exif JPEGs ('Exif' at those offsets) are not recognized as JPG
    ELSE IF Data.StartsWith([$FF,$D8,$FF]) AND (Data[6]=$4A) AND (Data[7]=$46) AND (Data[8]=$49) AND (Data[9]=$46) THEN
      Result:=TImageFormat.JPG
    ELSE IF Data.StartsWith([$42,$4D]) THEN // 'BM'
      Result:=TImageFormat.BMP
    ELSE IF Data.StartsWith('GIF89') OR Data.StartsWith('GIF87') THEN
      Result:=TImageFormat.GIF
    ELSE IF Data.Contains('<svg ') THEN // Data.StartsWith('<?xml version="1.0" encoding="UTF-8"') OR Data.StartsWith('<svg ') THEN
      Result:=TImageFormat.SVG
    ELSE
      Result:=TImageFormat.Unknown
  END;


 

It uses a TBytes helper providing "StartsWith" and "Contains", but it should give you the idea.

 

It's not 100% reliable, but it's enough for my usage and should allow you to adapt the methodology to your own needs.


For images, in the old days, we opened the file and tried to get its size. If that worked, the image was re-saved so that the original file was never used. Of course, re-saving can change the image quality, and this assumes the loading/get-size code itself has no security issues.


That won't detect a .PNG file with .JPG content (it will merely detect that it is some form of picture).

11 minutes ago, HeartWare said:

That won't detect a .PNG file with .JPG content (it will merely detect that it is some form of picture). 

If you open the file with TPNGImage, it will raise an exception if the content is not PNG, so it is not hard to detect that the content is not PNG.


It's funny to see various people posting their own solutions when the original post contains a comprehensive implementation.... 

1 hour ago, David Heffernan said:

It's funny to see various people posting their own solutions when the original post contains a comprehensive implementation.... 

Bike shedding

 

5 hours ago, David Heffernan said:

This code would be much better if each check of file header was done with same same code, against a signature

For example like this:

https://github.com/graphics32/graphics32/blob/b45d1108a8f57b66739731e952f81e2636c63abd/Source/GR32.ImageFormats.pas#L386

 

Example of use:

const
  FileSignaturePNG: AnsiString        = #$89#$50#$4e#$47#$0d#$0a#$1a#$0a;
  FileSignaturePNGMask: AnsiString    = #$ff#$ff#$ff#$ff#$ff#$ff#$ff#$ff;

...

function TImageFormatAdapterPNG.CanLoadFromStream(AStream: TStream): boolean;
begin
  Result := CheckFileSignature(AStream, FileSignaturePNG, FileSignaturePNGMask);
end;
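The mask is what lets one routine handle formats whose header contains variable bytes. Below is a minimal Delphi/Free Pascal sketch of the same idea, written from scratch: it is not the Graphics32 `CheckFileSignature` implementation, and `MatchesMaskedSignature`, `WebPSignature` and `WebPMask` are hypothetical names. WebP is used as the example because its 12-byte RIFF header has a 4-byte file size between 'RIFF' and 'WEBP', which a plain byte-by-byte comparison could not skip.

```pascal
program MaskedSignature;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

const
  // 'RIFF' <4-byte size> 'WEBP': the size differs per file, so its mask bytes are $00.
  WebPSignature: array[0..11] of Byte =
    ($52, $49, $46, $46, $00, $00, $00, $00, $57, $45, $42, $50);
  WebPMask: array[0..11] of Byte =
    ($FF, $FF, $FF, $FF, $00, $00, $00, $00, $FF, $FF, $FF, $FF);

// True if, for every byte I, (Data[I] and Mask[I]) = (Signature[I] and Mask[I]).
// Signature and Mask must have the same length; otherwise the check fails.
function MatchesMaskedSignature(const Data, Signature, Mask: array of Byte): Boolean;
var
  I: Integer;
begin
  Result := (Length(Signature) = Length(Mask)) and (Length(Data) >= Length(Signature));
  if not Result then
    Exit;
  for I := 0 to High(Signature) do
    if (Data[I] and Mask[I]) <> (Signature[I] and Mask[I]) then
      Exit(False);
end;

begin
  // Two RIFF headers with arbitrary size fields: one WebP, one WAVE.
  if MatchesMaskedSignature([$52, $49, $46, $46, $10, $20, $30, $40, $57, $45, $42, $50],
    WebPSignature, WebPMask) then
    WriteLn('RIFF....WEBP: matches');
  if not MatchesMaskedSignature([$52, $49, $46, $46, $10, $20, $30, $40, $57, $41, $56, $45],
    WebPSignature, WebPMask) then
    WriteLn('RIFF....WAVE: no match');
end.
```

With an all-$FF mask, as in the PNG constants quoted above, the check reduces to an exact prefix comparison.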

 

Edited by Anders Melander
