MAXDOP – Taking Away Parallelism

What happens when you take parallelism away from SQL Sever. (nothing good)

Steps:

  1. Run the database and table creation script.
  2. Jam some data in the table using SQLQueryStress.
  3. Clear out the wait stats. (I’m only interested in the waits associated with the MAXDOP statement not the insertion waits)
  4. Issue a query against the database with using OPTION (MAXDOP 1).

 

In the first pic we remove parallelism from my laptop. (4 core with 8 g/ram)

The first waiter and winner by a long shot is IO_COMPLETION, which is often a great sign for subsystem issues.

Let’s take a look at how the files on the subsystem are performing.

WithMAXDOP

A quick look at sys.dm_io_virtual_file_stats shows the average write stalls for the tempdb data file are off the map.

At first glance I’m guessing SGAM contention.

How can we tell if this is SGAM contention or just a shitty subsystem?

The first clue are the stalls on the tempdb data file.

The data file shows the worst wait for both reads and writes for any file.

Again, this reinforces my belief that this is tempdb. (wrong assumption on my part)

 

shittySubsytem

It just so happens we have a script for that. (more on SGAM contention)

With Tasks
As (Select session_id,
        wait_type,
        wait_duration_ms,
        blocking_session_id,
        resource_description,
        PageID = Cast(Right(resource_description, Len(resource_description)
                - Charindex(':', resource_description, 3)) As Int)
    From sys.dm_os_waiting_tasks
    Where wait_type Like 'PAGE%LATCH_%'
    And resource_description Like '2:%')
Select session_id,
        wait_type,
        wait_duration_ms,
        blocking_session_id,
        resource_description,
    ResourceType = Case
        When PageID = 1 Or PageID % 8088 = 0 Then 'Is PFS Page'
        When PageID = 2 Or PageID % 511232 = 0 Then 'Is GAM Page'
        When PageID = 3 Or (PageID - 1) % 511232 = 0 Then 'Is SGAM Page'
        Else 'Is Not PFS, GAM, or SGAM page'
    End
From Tasks;

When the results show us it’s not tempdb, we have our answer. A shitty subsystem.

So why aren’t the reads higher for all the other data files?

Because for our contrived example tempdb was the most utilized.

If we add some more OLTP like action the average write and read stalls would show subsystem shittyness.

Resources:

SGAM Script with Explanation

Brent Ozar’s Tempdb Explanation

 

 

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>