Berocoder 28 Posted 10 hours ago (edited) Our team have a big problem now but I don't expect people here have the complete solution. Maybe I get a hint. Background I work in a team with for developers. We maintain a big ERP desktop program for transport logistics. Size is about 9 MLOC but that include Devexpress grid and components. The company have about 200 active users. What is a bit unusual is that we use Bold for Delphi as framework between database and application code. That means 99% of SQL for read and 100 % of write queries are generated by Bold. Application is deployed on GCP (Google Computing Platform). The code is far from perfect, there is for example random exceptions in a dialog that handle ferries... We made a patch 29 April morning. People worked normal on that day. Next day 30 April we got reports on slow performance. Slow SQL queries are logged. So I checked that 24 - 29 April there was about 80-90 queries per day that took more than 15 seconds. 30 April and forward there was 200-400 queries per day. There is a slight increases of deadlocks. Now 30 - 40 per day. Previosly it was 15 - 25. I don't get it. Previous patch was made 11 April. Commits in git after patch was not so complicated. A new column was added to a table that replaced a calculated value for vehicles driver. And reads of that was also significantly faster. And also some other minor changes. Anyway if the patch is the reason 29 April should also been a slow day. It wasn't. This is a sample slow query. PlanMission table represent a Trip for vehicle. But each vehicle unit have own insteance. So for example if a truck combination consists of a hauler and a trailer that is 2 PlanMissions. The PlanMission.Trailer link is connected in that case. The purpose of query is to have one row per combination. Trailers are excluded. 2025-05-06 10:13:05:801 (21820) PERF: TBoldUniDACQuery.Open took 58.803 seconds (0.000s cpu) 1 sql for SELECT PlanMission.bold_id FROM PlanMission WHERE (statePlanOpen > 0 OR stateInProcess > 0) AND PlanMission.bold_id NOT IN (SELECT Trailer FROM PlanMission) It took almost a minute to execute. Result is around 1000 rows. One thing I thought about is that we have always used READ_COMMITED isolation in MS SQL Server database. I am no DB guru but that means if any row in the result is updating but not committed queries like above have to wait until done. Advantage is that read query got the latest data. Disadvantage is that user have to wait for the result. But write queries should be fast so don't know either.... If I run query above it takes 0 seconds but that might use the cache in database so maybe not fair comparison. Other things to mention. There is a big project to improve logging but it is not in production yet. Unfortunately that would be handy now. Currently the log is written to a text-file. One per user and session. In the future all logs will be sent to a LogManager application that save it in a logdatabase instead. This will make it easier to analyze problems where different users read and write same data. That is also a problem we had in the past. A group of users both read and modify vehicle-trips and parcels. This is slow. But currently much slower than normal. Can only guess of the reason. It can be the patch. Or it can be some other external issue that came now. As I said I don't expect a solution but maybe a discussion that generate ideas 🙂 Edited 10 hours ago by Berocoder Share this post Link to post
PeaShooter_OMO 34 Posted 8 hours ago (edited) @Berocoder Does your SQL Servers update automatically through Windows Updates or some other mechanism? Was there any other Windows Updates that took place? What does the Performance monitoring on those machines/VMs (client and server) tell you about the current state of the machine when this issue appears or when you run those slow queries? Edited 8 hours ago by PeaShooter_OMO Share this post Link to post
Die Holländer 83 Posted 6 hours ago Is it always on specific SQL statements/queries, like the Select you showed? You can try not to lock those Selects by: 'select * from "myTable" where ..... with(NOLOCK)'; Just to see if the locking READ COMMITTED is the problem. See if the database drivers your users using are still the same. Share this post Link to post
Berocoder 28 Posted 6 hours ago (edited) 2 hours ago, PeaShooter_OMO said: @Berocoder Does your SQL Servers update automatically through Windows Updates or some other mechanism? Was there any other Windows Updates that took place? What does the Performance monitoring on those machines/VMs (client and server) tell you about the current state of the machine when this issue appears or when you run those slow queries? The OS behind SQL Server is handled by Google. None of us have access to it. We see only MSSQL Instance. On applications server was latest updates 10 April so before this problem starts. Edited 5 hours ago by Berocoder Share this post Link to post
Berocoder 28 Posted 5 hours ago 5 minutes ago, Die Holländer said: Is it always on specific SQL statements/queries, like the Select you showed? You can try not to lock those Selects by: 'select * from "myTable" where ..... with(NOLOCK)'; Just to see if the locking READ COMMITTED is the problem. See if the database drivers your users using are still the same. I have actually tested READ_COMMITTED in SQL Studio by open 2 tabs. One with update query with no commit. And the other with read query for same data. And read query is waiting until I commit on first tab. So that is how READ_COMMITTED works. But I also think that update should not take long time. Obviously it is more complicated than that. Had meeting today and we looked at slow queries from Google admin console. Most was select queries but there was also some updates. Single slowest update for one row was 16 seconds. So this feels weird. It should be done in milliseconds. I suppose it waiting on something else so it is a chain reaction. We also saw that some logs was missing 30.4 the day problem started. Might be that the problem starts in Google. We have made a support ticket to them. But much more questions than answers now. Share this post Link to post
Berocoder 28 Posted 3 hours ago (edited) We have an application that pump data from main database to another smaller database. So it just reads from production DB. But it reads a lot! When we turned off that most of the conflicts was gone. The use case for the application is analyze of data. Turn off that is not so popular in management. But we know more now at least Edited 3 hours ago by Berocoder 1 Share this post Link to post
Brian Evans 122 Posted 3 hours ago Storage IO suddenly being slower will cause similar symptoms. All writes and any reads that need to get data from the disk instead of the cache bottleneck. Things that used to not interfere with each other now do. Years ago a battery on a RAID card in a SQL Server cluster failed making write performance 5% of normal. All sorts of odd issues until the anemic disk performance was noticed and bunch of disk heavy tasks were temporarily disabled to allow the main work to still get done. Share this post Link to post