Wednesday, March 13, 2013

Active Directory server unavailable causes VMware SSO failure

I had an interesting issue today with VMware Single Sign On (SSO) Server: it stopped authenticating connections. I saw the following in the SSO log located at C:\Program Files\VMware\Infrastructure\SSOServer\logs\ssoAdminServer.
[2013-03-13 07:41:49,575 ERROR opID=10d57a6f-20e6-4621-ba44-e90ae7cd8751 pool-31-thread-7  com.vmware.vim.sso.admin.vlsi.PrincipalDiscoveryServiceImpl] Error connecting to the identity source
com.rsa.common.ConnectionException: Error connecting to the identity source
 Caused by: javax.naming.NamingException: getInitialContext failed. javax.resource.spi.ResourceAdapterInternalException: Unable to create a managed connection 'ldaps://<REDACTED>:3269' with 'GSSAPI' Reason: javax.resource.spi.ResourceAdapterInternalException: Unable to create managed connection <REDACTED>:3269 [Root exception is javax.resource.spi.ResourceAdapterInternalException: Unable to create a managed connection 'ldaps://<REDACTED>:3269' with 'GSSAPI' Reason: javax.resource.spi.ResourceAdapterInternalException: Unable to create managed connection <REDACTED>:3269]
 Caused by: javax.resource.spi.ResourceAdapterInternalException: Unable to create a managed connection 'ldaps://<REDACTED>:3269' with 'GSSAPI' Reason: javax.resource.spi.ResourceAdapterInternalException: Unable to create managed connection <REDACTED>:3269
 Caused by: javax.resource.spi.ResourceAdapterInternalException: Unable to create managed connection <REDACTED>:3269
 Caused by: javax.naming.CommunicationException: <REDACTED>:3269 [Root exception is java.net.ConnectException: Connection timed out: connect]
 Caused by: java.net.ConnectException: Connection timed out: connect
What's interesting is that we have two Active Directory servers specified in the SSO configuration, Server-A and Server-B. The SSO log only showed errors for Server-B, which is currently unavailable while it is being upgraded to Windows Server 2012.
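
Since the root cause in the log is a plain TCP timeout on port 3269, a quick way to confirm which directory servers are reachable from the SSO host is a small connection test. This is only a rough sketch: the function name `can_connect` and the 5 second timeout are my own choices, and it only checks that the port accepts a TCP connection, not that LDAPS or GSSAPI authentication would succeed.

```python
import socket

# Port taken from the log entries above (ldaps://...:3269).
GC_SSL_PORT = 3269


def can_connect(host, port=GC_SSL_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout.

    Hypothetical helper for illustration. It does not perform a TLS
    handshake or an LDAP bind; it only tests basic reachability.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name resolution failures.
        return False
```

Running `can_connect("Server-B")` from the SSO server (substituting the real hostname) should return False while the server is down for its upgrade, and `can_connect("Server-A")` should return True.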

After seeing the errors in the log, I restarted the VMware SSO service. Within a few minutes the service was running again, logins worked, and I could see that Server-A was being used.

Curious, I checked our SSO settings for the domain and confirmed that Server-A was set to "Primary" and Server-B was "Secondary". Why didn't SSO automatically switch to using Server-A if Server-B could not be contacted?
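
To be clear about what I expected, the failover logic I had in mind looks roughly like the sketch below: walk the identity sources in priority order and use the first one that responds. This is purely my assumption about how it should behave, not how SSO is actually implemented. The names `pick_identity_source` and `is_reachable` are made up for illustration.

```python
def pick_identity_source(servers, is_reachable):
    """Return the first server, in priority order, that is reachable.

    servers      -- hostnames ordered by priority (primary first)
    is_reachable -- callable taking a hostname and returning True/False
    Returns None when no server responds.
    """
    for server in servers:
        if is_reachable(server):
            return server
    return None


# Example with Server-B down, as it was during the upgrade.
down = {"Server-B"}
chosen = pick_identity_source(
    ["Server-A", "Server-B"],
    lambda host: host not in down,
)
print(chosen)  # Server-A
```

With that behavior, an unavailable Server-B would never have blocked logins as long as Server-A was up.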

I'm not sure why Server-A was not used automatically, but the log clearly indicated the error, and restarting the SSO service fixed it.