Jump to content
david_navigator

Routine to check if set of dates match

Recommended Posts

I have a two sets of datetimes which I need to frequently check to see if the two lists are identical. There isn't the time to iterate through the lists whenever I need this info, so I'm thinking I need to store a value that represents the data.
I think it's a hash, but this isn't something I've done before and everything I've attempted hasn't worked i.e I've changed some of the datetimes and ended up with the same result.

 

If it's relevant there are usually about 60 in each set, but it could be between two and a thousand (always an even number) and each is only stored to the minute i.e seconds & mS are always zero.

 

Can anyone point me in the correct direction please ?

 

Many thanks

 

David

Share this post


Link to post

Simple approach: Create a string representation of each list and compare the strings when needed.

 

Of course you have to re-create the string whenever a list changes.

Share this post


Link to post
Guest

I think that using the follow, will be more flexible;

-- aux: array, dictionary, list, record, class, etc...

<dateOne, dateTwo, boolean> = 25/02/2022, 24/02/2022, false  == a "class" can allow more controll about setters and getters.

if dont need the date-values, then, store one "a boolean"

Share this post


Link to post
45 minutes ago, Uwe Raabe said:

Simple approach: Create a string representation of each list and compare the strings when needed.

 

Of course you have to re-create the string whenever a list changes.

Crafty 🙂

Share this post


Link to post
38 minutes ago, joaodanet2018 said:

I think that using the follow, will be more flexible;

-- aux: array, dictionary, list, record, class, etc...

<dateOne, dateTwo, boolean> = 25/02/2022, 24/02/2022, false  == a "class" can allow more controll about setters and getters.

if dont need the date-values, then, store one "a boolean"

Thanks, for the idea, though I don't it won't work in this particular case as there are actually many lists and I need to know if any list matches my "current" list.

Share this post


Link to post
6 minutes ago, david_navigator said:

Crafty 🙂

Perhaps a check if total strings lengths are equal, before constructing actual strings. Calculating lengths is usually faster then concatenating strings. Could save some time.

  • Like 1

Share this post


Link to post

four questions:

1. Where do the datetimes come from and how/when do they change? For example, if you're comparing file datetimes, then maintaining a status when the data changes might be better than checking when you need to know. They might change a lot less often than you check for a difference.

2. What is it you really need to know: whether any particular pair is different? or whether any difference exists anywhere in the list?

3. Is that always simply a 'difference exists' boolean? Or would knowing [greater than | equal to | less than] also needed?

4. Is there a guarantee that for every element[x] in list one, the same element exists at position[x] in list two? How is that guarantee enforced or checked?

 

All of these affect your potential solution options. For the first example, if the list changes much less often than the need to check, then you might create a tracking object with two boolean values: FUpdated and FDifferent. Set FUpdated to true when any list element changes. When checking, if FUpdated is false, then simply return the FDifferent boolean, or if true recheck the list, store and return the comparison operation result, and reset FUpdated to false. This can save you thousands of list comparisons if the lists are much less volatile than the need to know whether they're the same. Answers to the other questions could modify the structure of this tracking object.

Share this post


Link to post
Guest

You say "sets", does that imply that the date-lists are unsorted?

Share this post


Link to post
25 minutes ago, Dany Marmur said:

You say "sets", does that imply that the date-lists are unsorted?

The date lists are sorted. 

There can be N date lists,
I need to know which date lists match the current one.

Share this post


Link to post
50 minutes ago, bobD said:

four questions:

1. Where do the datetimes come from and how/when do they change? For example, if you're comparing file datetimes, then maintaining a status when the data changes might be better than checking when you need to know. They might change a lot less often than you check for a difference.

2. What is it you really need to know: whether any particular pair is different? or whether any difference exists anywhere in the list?

3. Is that always simply a 'difference exists' boolean? Or would knowing [greater than | equal to | less than] also needed?

4. Is there a guarantee that for every element[x] in list one, the same element exists at position[x] in list two? How is that guarantee enforced or checked?

 

All of these affect your potential solution options. For the first example, if the list changes much less often than the need to check, then you might create a tracking object with two boolean values: FUpdated and FDifferent. Set FUpdated to true when any list element changes. When checking, if FUpdated is false, then simply return the FDifferent boolean, or if true recheck the list, store and return the comparison operation result, and reset FUpdated to false. This can save you thousands of list comparisons if the lists are much less volatile than the need to know whether they're the same. Answers to the other questions could modify the structure of this tracking object.

1. My UI, so I "know" when a user changes a date.
2. I need to know if any lists are the same as my list.
3. answered by 2 - I don't want to know anything if they're different.
4. No. Imagine each list represents a hotel room and each element of the list is the check in and check out time for each visitor over the next 6 months - I need to know if any two rooms have exactly the same booking schedules.

 

The dates change frequently, the comparison gets run infrequently, but the compare has to be very fast and the date changes can be slow.

Share this post


Link to post
17 minutes ago, david_navigator said:

The dates change frequently, the comparison gets run infrequently, but the compare has to be very fast and the date changes can be slow.

I still suggest my string representation approach. With each dates list store its string representation, which is calculated each time the list changes (or with a lazy scheme). When comparing use the string representation. Comparing strings is usually pretty fast in Delphi.

Share this post


Link to post
4 hours ago, david_navigator said:

I have a two sets of datetimes which I need to frequently check to see if the two lists are identical. There isn't the time to iterate through the lists whenever I need this info, so I'm thinking I need to store a value that represents the data.
I think it's a hash, but this isn't something I've done before and everything I've attempted hasn't worked i.e I've changed some of the datetimes and ended up with the same result.

 

If it's relevant there are usually about 60 in each set, but it could be between two and a thousand (always an even number) and each is only stored to the minute i.e seconds & mS are always zero.

 

Can anyone point me in the correct direction please ?

 

Many thanks

 

David

Since you only need to be precise to the minute do not store TDatetimes but multiply the values by 60*24 and truncate the result to give an integer (perhaps an int64, you could reduce the range necessary by subtracting a suitable reference date first, e.g. EncodeDate(2020,1,1)). This gives you the number of minutes from the reference date. Store these into a TList<integer>. To calculate a hash use the list's ToArray method to get an array you can feed to a suitable hash algorithm from the System.Hash unit.

Edited by PeterBelow

Share this post


Link to post
22 hours ago, david_navigator said:

1. My UI, so I "know" when a user changes a date.
2. I need to know if any lists are the same as my list.
3. answered by 2 - I don't want to know anything if they're different.
4. No. Imagine each list represents a hotel room and each element of the list is the check in and check out time for each visitor over the next 6 months - I need to know if any two rooms have exactly the same booking schedules.

 

The dates change frequently, the comparison gets run infrequently, but the compare has to be very fast and the date changes can be slow.

"I need to know if any lists are the same as my list."

This implies an 'active record' where the question is really "does list <x> (my list) already exist in set of lists <y>"

"if any two rooms have exactly the same booking schedules."

This implies a much more complex question: "given set of lists <y>, are any two [or more] lists the same?" (basically, an iteration of the first question where x = 0 to number of lists -1.)

which is it?

 

Suppose a user changes a date-item for a room. One of four things happen:

--the list, previously distinct, is now a duplicate of another

--the list, previously a duplicate of another, is now distinct

--the list was and remains distinct

--the list was and remains a duplicate

Can we rule out any of these cases as impossible?

Does anything have to happen as a result of any of these cases? (For example, the change not being allowed)

 

Is there any persistence aspect to the question? What's being stored?

 

My initial design thought is to maintain a sorted list of list hashes, so that the question "does list <x> (my list) already exist in set of lists <y>" is a simple look-up on the list hash (which can itself be calculated and stored with the list whenever it changes). 

 

however, the statement "The dates change frequently, the comparison gets run infrequently" makes the whole idea of pre-calculation a case of swimming upstream. You don't generally make a frequent operation slower in order to speed up a relatively infrequent operation. So I agree with Joseph Mitzen's response that cautioned that some instrumentation is called for.

 

Edited by bobD

Share this post


Link to post
16 minutes ago, bobD said:

makes the whole idea of pre-calculation a case of swimming upstream

That's why I suggested a lazy scheme for the calculation. When date changes invalidate the calculated value. When comparing and the value is invalid calculate it. As every list has to be compared to each other, calculating at first use or before will still have its benefits.

 

22 hours ago, david_navigator said:

4. No. Imagine each list represents a hotel room and each element of the list is the check in and check out time for each visitor over the next 6 months - I need to know if any two rooms have exactly the same booking schedules.

As stated above this depends on the time range the lists cover. It is possible that one rooms list matches another ones only partly. I am unsure if this is also something being asked for.

Share this post


Link to post
15 hours ago, Joseph MItzen said:

How do you know this? How long is it taking?

Needs to be question #1. It is not uncommon for people to invest much time and energy in pursuit of optimizations they later find were not necessary.

As an old friend told me many years ago: First make it work, then make it fast.

Profiling always trumps supposition.

Share this post


Link to post

A hash of the list would only tell you if the list was identical across all items.  Keep in mind that any positive matches would have to be checked as it is mathematically possible to create the same hash value from different source values, more so if you are just using the DigestAsInteger function to grab a simple integer.  Also, it is entirely possible to generate a hash value ANYWHERE in the integer range, so don't rely on any special values (for example, 0 is a possible result). 

 

If the lists have partial matches you need to report on, you could build a list of lists for each list.  The idea is that each level would contain more information.  So first list would be a single year and contain a list of months, which contains a list of days, which contains lists of hours, and the next level minutes.  Each list can be a static array so there is no walking through the list, just checking against nil to see if that array position has a list (and potential values) or not.  It has the disadvantage of using more memory, and requires time to build/destroy the list, but should be very quick to access.  You just need to do a decodeDateTime to break a datetime into its parts, then use that to see if there is a value in the other list(s) that matches.  The idea is to create something manageable and eliminate a need to loop and compare more than necessary.  If you need to support multiple years, I would add another level to the start that would be a dynamic array of years and compute the array position as selected year - lowest year supported and grow your array as needed.  if the year in the comparison list is outside of the bounds of that array, then it doesn't match anything that year.

 

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×