Fighting system slowness and improving DB performance

Our customer’s database has been experiencing high CPU pressure for some time, so we have been tasked to investigate this matter. User workstations would stop responding, freeze, and display DB connection problems.

 

Analysis

First of all, standard information required in such cases (logs, screenshots, files) has been collected and passed on to the vendor of application that ran on the SQL Server DB.

Due to the high usage and presence of the underlying problem the DB server has crashed, leaving behind logs and a stack dump.

DeadlockedSchedulers

There was no chance of getting this immediately analyzed by the vendor due the time of crash – late hours of Friday, so I have started the process myself. Luckily, there is information online of how to tell what has caused the server crash. I.e. if we get entries like the ones highlighted below, then the state of threads within the memory dump clearly identifies an issue with parallelism.

DumpContents

That prompted us to check the existing parallelism setting on the server and it was set to 0 (with 1024 as the maximum value). During the later peer review of our actions within SQL Server community it has been confirmed that this setting (0, with 1024 as maximum DOP) could indeed cause the aforementioned problem.

We came up with an initial suggestion to set the Maximum Degree of Parallelism setting on the server to 8 (as that was coherent with the number of schedulers present in this SQLServer – 24), but later discovered that in this particular system the value of 1 yields best performance. It is unusual to have it at 1, but some environments benefit from such setting (apparently Microsoft Sharepoint and Axapta should have MAXDOP 1 setting).

What we have observed below was particularly interesting. When query hints with OPTION clause were used in production environment, with nearly 30 agents updating the same two tables on a regular basis, a typical application query took over 20 seconds to execute with MAXDOP 8:

MAXDOP8

and only 1-2 seconds to execute with MAXDOP 1:

MAXDOP1

When the application ran on a small-scale POC with just the DB and a single agent updating the concerned table, the results were completely opposite – query with MAXDOP 8 took less time than MAXDOP 1. This has been noticed by a few people running different SQL Server instances, and could be related to the fact that in presence of significantly larger amount of table writes parallelism can have detrimental effect on server performance (even causing stack dumps, as we can see above).

 

Outcome

Since we have applied the change the system did not experience high CPU pressure anymore. For a number of busy days operators have been happy with the performance of the system: there were no crashes and disconnections.

 

Closing the loop

Our findings have been relayed to the vendor and their recommendation was to set the MAXDOP setting to 8, instead of 1 or the initial 0. We have also followed vendor’s recommendations to configure Full Text catalogue automatic updates on certain tables to make performance even better.

This turned out to be the first time this particular setting (MAXDOP) was mentioned in technical support records of the vendor. That highlights how large the workflow of this particular customer was, and how previously undiscovered aspects of database performance can be revealed in such circumstances (the DB was experiencing high usage due to a large number of FileSystemWatcher notifications being sent by the SAN, causing metadata update activities).

As a long term result, vendors now became aware of this setting (MAXDOP) and have suggested that it may be used in future software versions as part of the individual queries (after all the due testing, development and validation).

Update:

It appears that following our discovery the vendor has now added the MAXDOP setting to the DB monitoring page:

MAXDOP setting has been added to the DB monitoring page