With a little help from SQL Server’s tracing tools you can easily get to the bottom of almost all login failures (18456 errors).
If the login failure is against SQL Server 2005 or above and the default trace is running just go to my SQL Server default trace post.
If you’d prefer to use extended events (as SQL Profiler is now deprecated), go to Capturing login failures via SQL Server Audit.
Otherwise the technique being described below is pretty much version independent, but assumes you know how to use SQL Server Profiler and capture a trace.
SQL Server 18456 errors
A login failure will throw an 18456 error and will be accompanied by the following entry in the SQL Server error log (SQL Server 2000 does not display the IP address):
2009-01-15 09:40:24.55 Logon Error: 18456, Severity: 14, State: 8. 2009-01-15 09:40:24.55 Logon Login failed for user ‘Domain\User’. [CLIENT: xxx.xxx.xxx.xxx]
The severity of the error indicates the seriousness of the error. A severity level of 14 indicates an error in the range described as user correctable, which is understandable for login failures. The next item of information the error provides is the state number. Most errors have a state number associated with them which provides further information which is usually unique to the error that has been thrown. For a login error, state: 8, shown in the above example, indicates an invalid password was used.
Watch out for the most common failure whereby a user will specify a SQL Authentication in their client and enter a Windows account as the login, or vice-versa; this will show up clearly as the cause in the error log.
These cases are usually down to the user trying to connect as a different AD account to the one they are logged in as. Point them to RunAs.
The state number therefore provides invaluable information about the reason for the login failure and can often be enough to identify the cause of an 18456 error.
The table below illustrates what some of these state values mean:
Login failure error codes
Microsoft’s Bob Dorr has an excellent post giving details of the failure IDs in 18056 errors which map to the cryptic state code which contain the root cause of the failure, which should fill in the gaps if you’re looking for login failures against older versions of SQL Server. The state codes with Login in the name relate to 18456 errors and those with Redo in the name relate to 18056 (connection pooling related) errors.
If it’s a SQL Server 2008 (or onwards) server you’re in luck as the state information is now (finally!) dumped into the SQL Server error log.
The next item of information is the login (SQL Server or Windows) generating the failure, followed by the IP address of the host from which the login was attempted. The IP address provides a useful cross-reference to confirm we’re looking at the right host when we’re trying to isolate the login failure.
As an aside, a classic login failure is the following 18452 error:
Logon Error: 18452, Severity: 14, State: 1. Logon Login failed. The login is from an untrusted domain and cannot be used with Windows authentication. [CLIENT: xxx.xxx.xxx.xxx]
These are just due to the underlying account having changed its password whilst still being logged on to SQL Server.
Isolating the login failure
If the information provided in the error log is not enough to isolate the source or the cause of the error, the next step is to trace it.
If you’re running SQL Server 2005 or above you’re in luck. The default trace was introduced in this version and is enabled by default. Check out my SQL Server default trace post instead and/or just run the query below:
select te.name as [event], e.applicationname, e.textdata, e.spid, e.starttime, e.databasename as db, e.loginname as [login], e.hostname as host, e.clientprocessid as pid, (select [path] from sys.traces where is_default = 1 and is_shutdown=0) as tracefile from fn_trace_gettable((select [path] from sys.traces where is_default = 1 and is_shutdown=0), default) e inner join sys.trace_events te on e.eventclass=te.trace_event_id where e.eventclass = 20 order by e.starttime
This will display any login failures from the current set of trace files along with the name of the trace file(s) those errors were captured from. Save off those trace files before they got rolled over and the information gets lost.
Figure 1: Login failures captured in th SQL Server default trace
If the default trace is disabled, or the information is no longer held in the current batch of default trace files, or you have an earlier version of SQL Server, then read on.
Creating a trace template
Start SQL Server Profiler (SSP), and, using either your favourite trace template, or via a new trace template (File > Templates > New Templates…) make sure the following events are selected:
- Audit Login Failed (under Security Audit)
- Errorlog (under Errors & Warnings)
- User Error Message *
* User Error Message is required for older versions of SQL Server where e.g. the login failure is due to an invalid/offline database name being specified as the default database because pre-SQL Server 2012 the invalid/offline database name was not printed in the errorlog.
Clear the Show All Events tick box to reduce the clutter, then make sure the following columns are selected:
- StartTime **
These columns can be found in the Trace Properties dialog on the Events Selection tab. If they are not visible, tick the Show all columns checkbox. Note that SPID is always a selected column by default, and cannot be de-selected.
**Adding StartTime to the list may be useful here in order to help isolate login failures that only occur at certain times.
When done, clear the Show all columns tick box and you’ll end up with something like the template show nin Figure 2 below.
This minimal set of events will help ensure that any performance impact from running the trace can be kept to a minimum.
Next we need to filter for the login error. Click on the Column Filters… button and under Edit Filter select Error and on the right hand side under Equals enter 18456 as the error number (Figure 2). This will filter out unwanted benign errors like 5701 and 5703 that tend to pervade a lot of systems:
Figure 2: Filtering in only 18456 login failure errors
Finally, click on the Organize Columns… button to re-order the columns. The order is a personal preference but I tend to have it as follows: EventClass, StartTime, Hostname, ClientProcessID, Error, NTUserName, LoginName, NTDomainName, DatabaseName, ApplicationName and so on. Figure 3 below shows a completed trace template:
Figure 3: A completed SQL Profiler trace template to narrow down 18456 errors.
It might look a bit sparse, but we are only interested in a specific error.
On a production system it’s never really advisable (imho) to run a graphical SSP trace on the server (although I do admit to having fallen off my high-horse on that front in the past). If the login failure being tracked happens once every few days or so, always use a server side trace which you can just set and forget, but as this is a very lightweight trace its impact on the server in most environments should be negligible.
Save the modified trace template and make a note of the name it was saved as.
Launching a trace to trap SQL Server error 18456
Launch a new trace after making sure you select that newly saved template as the template for the new trace. With the trace launched, just wait for the login failures. Stop the trace after a login failure has been generated.
The Hostname column should have recorded the name of the server that the invalid login emanated from and the ClientProcessID should have captured the Process ID, or PID of the offending process (or processes if there are multiple processes involved). Please note: if the connection is via JDBC no client PID will get picked up and it will just report a value of 123.
Log on to the server triggering the errors, and list the PIDs of the relevant processes. This can be done using Tasklist or Task Manager. To view the PIDs via Task Manager, start Task Manager (Shift+Ctrl+Esc), go to View > Select Columns… and tick the checkbox labelled PID (Process Identifier) and click OK.
Figure 4 below shows this Task Manager option:
Figure 4: How to show a process’s PID
Click on the Processes tab to bring all the processes running on that server into view (make sure Show all processes from all users is ticked) and click on the PID column heading to sort the PIDs in descending or ascending order.
If the processeses are refreshing too quickly, we can slow down or stop them being refreshed via another menu option: View > Update speed and set it to Low or Paused (don’t forget to reset it afterwards).
Once you’ve isolated the process responsible via the PID it should just be a matter of identifying where that process stores the credentials it uses for logging into SQL Server and verifying them. Usually, the process will be a service, so it’s just a question of bringing up the Services plugin via Control Panel, or Start > Run > services.msc should also do the trick.
That’s it, so happy hunting!
Note: If the login failure is emanating from a Java based client, the Application Name and ClientProcessID will not be captured, and they will show up with default labels, which will be jTDS for ApplicationName and 123 as the ClientProcessID.
[EDIT: In the brave new world that exists from SQL Server 2012 SQL Profiler has been deprecated and replaced by Extended Events.
Auditing related events, including login failures, have been transferred to the SQL Server Audit component which is covered in the Capturing login failures via SQL Server Audit post.]