Extended Events Intro

来源:互联网 时间:1970-01-01

Extended Events is supposed to be taking over for most of Profiler and server-side tracing functionality, but there were people like me out there that took their time making the switch. For me the excuse was SQL Batch Completed isn’t in 2008 / 2008 R2, most databases are slow to move off of 2008 because it’s hard to say 2008 can’t do everything you want, and, well, I really like SQL Batch Completed!!!

Now I’m losing my excuses. It’s been three years since 2012 came out and all the new servers I’m installing are 2012 or 2014, so I lost that excuse. Then I came up with “I don’t know how to use them, but I know how to use something that does almost the same thing”; that’s great logic during an outage, but terrible logic for the long term. So, I’m learning it, still learning it, and I have something to help out those who made it to 2012 but are still stuck on the “I don’t know how to use them” phase of denial.

In several of my blog posts I referenced my favorite server-side trace. Capture everything on the server that takes over X seconds, with 5 being my favorite starting place. Yes, this adds about 1ms to anything that takes over 5 seconds, which adds up to…insignificant. Seriously, if this kills your server then it was dead already, but you will see arguments against creating this for long-term runs. I don’t agree with those arguments, but they’re your servers and I want you to make an informed decision on what you run.

Anyways, here’s how I set it up with Extended Events, for which I used Jonathan Kehayias’s ( b | t ) Trace to XE Converter to get started.

IF EXISTS (SELECT 1 FROM sys.server_event_sessions WHERE name = 'DurationOver5Sec')DROP EVENT SESSION [DurationOver5Sec] ON SERVER;GOCREATE EVENT SESSION [DurationOver5Sec]ON SERVERADD EVENT sqlserver.rpc_completed(ACTION ( sqlserver.client_app_name-- ApplicationName from SQLTrace, sqlserver.client_hostname-- HostName from SQLTrace, sqlserver.client_pid-- ClientProcessID from SQLTrace, sqlserver.database_id-- DatabaseID from SQLTrace, sqlserver.request_id-- RequestID from SQLTrace, sqlserver.server_principal_name-- LoginName from SQLTrace, sqlserver.session_id-- SPID from SQLTrace)WHERE (duration >= 5000000)),ADD EVENT sqlserver.sql_batch_completed(ACTION ( sqlserver.client_app_name-- ApplicationName from SQLTrace, sqlserver.client_hostname-- HostName from SQLTrace, sqlserver.client_pid-- ClientProcessID from SQLTrace, sqlserver.database_id-- DatabaseID from SQLTrace, sqlserver.request_id-- RequestID from SQLTrace, sqlserver.server_principal_name-- LoginName from SQLTrace, sqlserver.session_id-- SPID from SQLTrace)WHERE (duration >= 5000000)),ADD EVENT sqlos.wait_info(ACTION ( sqlserver.client_app_name-- ApplicationName from SQLTrace, sqlserver.client_hostname-- HostName from SQLTrace, sqlserver.client_pid-- ClientProcessID from SQLTrace, sqlserver.database_id-- DatabaseID from SQLTrace, sqlserver.request_id-- RequestID from SQLTrace, sqlserver.server_principal_name-- LoginName from SQLTrace, sqlserver.session_id-- SPID from SQLTrace)WHERE(duration > 5000 --This one is in milliseconds, and I'm not happy about that AND ((wait_type > 0 AND wait_type < 22) -- LCK_ waits OR (wait_type > 31 AND wait_type < 38) -- LATCH_ waits OR (wait_type > 47 AND wait_type < 54) -- PAGELATCH_ waits OR (wait_type > 63 AND wait_type < 70) -- PAGEIOLATCH_ waits OR (wait_type > 96 AND wait_type < 100) -- IO (Disk/Network) waits OR (wait_type = 107) -- RESOURCE_SEMAPHORE waits OR (wait_type = 113) -- SOS_WORKER waits OR (wait_type = 120) -- SOS_SCHEDULER_YIELD waits OR (wait_type = 178) -- WRITELOG waits OR (wait_type > 174 AND wait_type < 177) -- FCB_REPLICA_ waits OR (wait_type = 186) -- CMEMTHREAD waits OR (wait_type = 187) -- CXPACKET waits OR (wait_type = 207) -- TRACEWRITE waits OR (wait_type = 269) -- RESOURCE_SEMAPHORE_MUTEX waits OR (wait_type = 283) -- RESOURCE_SEMAPHORE_QUERY_COMPILE waits OR (wait_type = 284) -- RESOURCE_SEMAPHORE_SMALL_QUERY waits--OR (wait_type = 195) -- WAITFOR)))ADD TARGET package0.event_file(SET filename = 'DurationOver5Sec.xel',max_file_size = 10,max_rollover_files = 5)WITH (MAX_MEMORY = 10MB, MAX_EVENT_SIZE = 10MB, STARTUP_STATE = OFF, MAX_DISPATCH_LATENCY = 5 SECONDS, EVENT_RETENTION_MODE = ALLOW_MULTIPLE_EVENT_LOSS);ALTER EVENT SESSION DurationOver5SecON SERVERSTATE = START; What’s it all mean?

This captures all SQL Batch Completed and RPC Completed events that took over 5 seconds along with any waits that took over 5 seconds. Seems kind simple and much easier to read than the script to create a trace, but there are some things to point out here.

Duration – Milliseconds or Microseconds File Location Restart on server reboot

First, duration, is it milliseconds or microseconds? Trick question, it’s both!!! Ola Hallengren ( b | t ) opened This Connect Item which resulted in Microsoft apologizing and saying they’ll fix it in SQL 2016. They can “fix” it one of two ways. Either they’ll document them as being different scales and your script will still work without any modifications, or they’ll make them all be the same scale and the script I have here will behave differently on different versions of SQL Server. Anyways, the script I have here is 5 seconds all around, which is 5,000,000 microseconds for the SQL and 5,000 milliseconds for the waits.

Continuing on the duration confusion, the wait info’s duration still isn’t what you might think it is. This is the duration for a single instance of the wait, not cumulative waits for the duration of a specific query. If you made a loop that executed 10 times calling a wait for delay of 1 second each time then those are 10 separate 1 second waits. More important, if your query is doing physical reads from disk then it’s a ton of small PageIOLatch waits, not one large one this session will see.

The rest of this isn’t nearly as confusing, I promise.

The file location I’m using is just a filename without a path, which will default to where your error log files are. It’s going to be a much more complex name once SQL Server is done with it, and not just a underscore with a rollover count like server-side traces. However,it will start how it’s written here and you can use wildcards to say which files you want to read.

Now when you restarted the server, or just SQL services, with server-side traces you’d have to manually startthat trace again if you wanted it to run all the time. This is exactly how the script I have here works, too. However, Extended Events also added “Startup State” which means it will start when the server starts. The only issue I have with this is that it will ONLY start when the services start, so the session you define now won’t start until the next restart. That’s all fine and good if you’re a patient person….it’s not all fine and good for me.

The other minor details I can see having questions on are on the Max Dispatch Latency and Event Retention Mode. These are limits on how it writes to the file location I’m using. Max Dispatch means that SQL Server will write it to the output file within that many seconds after the event actually happens, so it’s obviously an asynchronous action. Event Retention Mode can give SQL Server permission to say it’s too overworked to write the entries and skip them if needed, and I chose to tell SQL Server to drop as many as it wants to if I’d be slowing it down. It’s all documented in BOL, and it’d be good for you to read through that instead of just listening to me picking out select details.

So, how do you read this? It’s going to put it all in XML, isn’t it!?!?

If you can read this, Kendra Little ( b | t ) may suggest you’re a demon robot ! Well, she’s probably right in calling this next script names because friendly robots don’t treat your CPU like this. In fact, if you’re on a stressed server or want to do a lot of analysis on this, it’s going to be best to copy the .xel files to a server you care a little less about and shred the XML there.

Unfortunately, I wrote my script to automatically find the files and add a wildcard character, so you’d have to modify my script to run it on another server. The point is getting you started with as little frustration as possible, and this is perfect for that reason. The biggest catch is that the session has to be running for this to work without changes because it’s grabbing the full file name with path from the DMVs for running sessions.

Anyways, here goes….

DECLARE @SessionName SysName , @TopCount Int = 1000--SELECT @SessionName = 'UserErrors'SELECT @SessionName = 'DurationOver5Sec'--SELECT @SessionName = 'system_health'/* SELECT * FROM sys.tracesSELECT Session_Name = s.name, s.blocked_event_fire_time, s.dropped_buffer_count, s.dropped_event_count, s.pending_buffersFROM sys.dm_xe_session_targets tINNER JOIN sys.dm_xe_sessions s ON s.address = t.event_session_addressWHERE target_name = 'event_file'--*/SET STATISTICS IO, TIME ONIF OBJECT_ID('tempdb..#Events') IS NOT NULL BEGINDROP TABLE #EventsENDIF OBJECT_ID('tempdb..#Queries') IS NOT NULL BEGINDROP TABLE #Queries ENDDECLARE @Target_File NVarChar(1000), @Target_Dir NVarChar(1000), @Target_File_WildCard NVarChar(1000)SELECT @Target_File = CAST(t.target_data as XML).value('EventFileTarget[1]/File[1]/@name', 'NVARCHAR(256)')FROM sys.dm_xe_session_targets tINNER JOIN sys.dm_xe_sessions s ON s.address = t.event_session_addressWHERE s.name = @SessionNameAND t.target_name = 'event_file'SELECT @Target_Dir = LEFT(@Target_File, Len(@Target_File) - CHARINDEX('/', REVERSE(@Target_File))) SELECT @Target_File_WildCard = @Target_Dir + '/' + @SessionName + '_*.xel'--SELECT @Target_File_WildCardSELECT TOP (@TopCount) CAST(event_data AS XML) AS event_data_XMLINTO #EventsFROM sys.fn_xe_file_target_read_file(@Target_File_WildCard, null, null, null) AS FORDER BY File_name DESC, file_offset DESC SELECT EventType = event_data_XML.value('(event/@name)[1]', 'varchar(50)'), Duration_sec = CAST(event_data_XML.value ('(/event/data[@name=''duration'']/value)[1]', 'BIGINT')/CASE WHEN event_data_XML.value('(event/@name)[1]', 'varchar(50)') LIKE 'wait%' THEN 1000.0 ELSE 1000000.0 END as DEC(20,3)) , CPU_sec = CAST(event_data_XML.value ('(/event/data[@name=''cpu_time'']/value)[1]', 'BIGINT')/1000000.0 as DEC(20,3)), physical_reads_k = CAST(event_data_XML.value ('(/event/data [@name=''physical_reads'']/value)[1]', 'BIGINT')/1000.0 as DEC(20,3)), logical_reads_k = CAST(event_data_XML.value ('(/event/data [@name=''logical_reads'']/value)[1]', 'BIGINT') /1000.0 as DEC(20,3)), writes_k = CAST(event_data_XML.value ('(/event/data [@name=''writes'']/value)[1]', 'BIGINT')/1000.0 as DEC(20,3)), row_count = event_data_XML.value ('(/event/data [@name=''row_count'']/value)[1]', 'BIGINT'), Statement_Text = ISNULL(event_data_XML.value ('(/event/data [@name=''statement'']/value)[1]', 'NVARCHAR(4000)'), event_data_XML.value ('(/event/data [@name=''batch_text'' ]/value)[1]', 'NVARCHAR(4000)')) , TimeStamp = DateAdd(Hour, DateDiff(Hour, GetUTCDate(), GetDate()) , CAST(event_data_XML.value('(event/@timestamp)[1]', 'varchar(50)') as DateTime2)), SPID = event_data_XML.value ('(/event/action [@name=''session_id'']/value)[1]', 'BIGINT'), Username = event_data_XML.value ('(/event/action [@name=''server_principal_name'']/value)[1]', 'NVARCHAR(256)'), Database_Name = DB_Name(event_data_XML.value ('(/event/action [@name=''database_id'']/value)[1]', 'BIGINT')), client_app_name = event_data_XML.value ('(/event/action [@name=''client_app_name'']/value)[1]', 'NVARCHAR(256)'), client_hostname = event_data_XML.value ('(/event/action [@name=''client_hostname'']/value)[1]', 'NVARCHAR(256)'), result = ISNULL(event_data_XML.value('(/event/data [@name=''result'']/text)[1]', 'NVARCHAR(256)'),event_data_XML.value('(/event/data [@name=''message'']/value)[1]', 'NVARCHAR(256)')), Error = event_data_XML.value ('(/event/data [@name=''error_number'']/value)[1]', 'BIGINT'), Severity = event_data_XML.value ('(/event/data [@name=''severity'']/value)[1]', 'BIGINT'), EventDetails = event_data_XML INTO #QueriesFROM #EventsSELECT q.EventType, q.Duration_sec, q.CPU_sec, q.physical_reads_k, q.logical_reads_k, q.writes_k, q.row_count, q.Statement_Text, q.TimeStamp, q.SPID, q.Username, q.Database_Name, client_app_name = CASE LEFT(q.client_app_name, 29)WHEN 'SQLAgent - TSQL JobStep (Job 'THEN 'SQLAgent Job: ' + (SELECT name FROM msdb..sysjobs sj WHERE substring(q.client_app_name,32,32)=(substring(sys.fn_varbintohexstr(sj.job_id),3,100))) + ' - ' + SUBSTRING(q.client_app_name, 67, len(q.client_app_name)-67)ELSE q.client_app_nameEND , q.client_hostname, q.result, q.Error, q.Severity, q.EventDetailsFROM #Queries q--WHERE eventtype NOT IN /*rather typical filtering*/ ('security_error_ring_buffer_recorded', 'sp_server_diagnostics_component_result', 'scheduler_monitor_system_health_ring_buffer_record')--AND eventtype NOT IN /*specific troubleshooting filtering*/ ('connectivity_ring_buffer_recorded', 'wait_info')ORDER BY TimeStamp DESC

The top is all I typically edit. The Session Name is ‘DurationOver5Sec’ for this purpose, but as you can see it can also read some of the information from the system_health session. The TopCount is just saying I’m only going to look at the most recent 1,000 records BEFORE I shred andfilter it, mostly because I love my CPU and this changes how much XML I’m throwing at it.

No matter how I write this to filter before playing with XML or try to be 100% safe, my scripts will not be perfect for you. In fact they may crash your server. Unfortunately, I’m only half kidding, because nothing is 100%. The space is limited to 50 MB, the overhead of writing to this file should be fine since we’re only working with events that took over 5 seconds asynchronously with data loss allowed, so I don’t foresee any issues in most cases unless you change these numbers. However, reading them gets expensive and you have to use the most caution here. Keep in mind that you can put the .xel files on any server with SQL installed, even developer edition, and query it.

Why’d I write this and what else is going through my head?

The inspiration for this post actually came from a question a former coworker asked me quite a while ago, just seeing if I knew a lot about Extended Events. At the time the technically correct answer was “I don’t know”, but instead I told him that although I didn’t have experience with Extended Events that he could look at Jonathan Kehayias’s( b | t ) blog, specifically his XEvent A Day Series . However, I also warned him that Jonathan’s work tends to go deeper than a beginner is prepared for and that he should start with his coworker,Erin Stellato’s ( b | t ), version that is easier to start with, specifically her Making the Leap From Profilers to Extended Events sessionshe made available on YouTube.This actually inspired two blog posts, I just haven’t written the How to Say “I don’t know” Better post yet.

Have fun, experiment (with queries), and ask questions.

Filed under: Monitoring , SQL Server , Troubleshooting , Tuning Tagged: baseline data , monitoring , Profiler , Script , XEvent %20%20 %20%20%20%20



相关阅读:
Top