Testing IM and Web-Conferencing Archiving set to Critical

March 31, 2017, 11:00 pm

Organizations enable IM Archiving for different reasons. Some do it for record keeping, while others have a regulatory/compliance requirement to ensure that every IM and Web-Conferencing session is archived.

When an organization is archiving for regulatory/compliance purposes, it may be required to stop the service if Archiving is failing. In Lync Server 2013 and Skype for Business Server 2015, we offer this by means of the BlockOnArchiveFailure parameter of the Set-CsArchivingConfiguration cmdlet.

Parameter	Required	Type	Description
BlockOnArchiveFailure	Optional	System.Boolean	If True, then the IM service will be suspended any time instant messages cannot be archived. If set to False (the default value), IM will continue even if instant messages cannot be archived.
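
As a rough sketch, critical mode could be turned on from the Management Shell along the following lines. The global scope and the ImAndWebConf archiving level are assumptions made for illustration; a site-level archiving configuration would need its own Identity.

```powershell
# Assumption: archiving is configured at the global scope.
# Archive IM and web conferences, and block IM whenever archiving fails.
Set-CsArchivingConfiguration -Identity "global" -EnableArchiving ImAndWebConf -BlockOnArchiveFailure $true

# Confirm the resulting values.
Get-CsArchivingConfiguration -Identity "global" |
    Select-Object Identity, EnableArchiving, BlockOnArchiveFailure
```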

This can also be set from the Control Panel, where it looks similar to the image below.

IM Archiving set to Critical

Just as organizations perform Disaster Recovery exercises to validate that their infrastructure works as intended, and that the organization (or organizational unit) has up-to-date documentation should a disaster occur, they may also want to test and/or prove that IM and Web-Conferencing would fail if archiving were to fail.

With Lync Server 2013 and Skype for Business Server 2015, proving that IM and Web-Conferencing would stop if Archiving were to fail can be a bit of a challenge. Here’s why:

Challenge #1
If the Archiving Database is offline, Lync will export storage data to the Web-Services file share (for example \\contoso.com\LyncFileShare\1-WebServices-1\StorageService\DataArchive\20161122\LyncStd01.contoso.com\).

Challenge #2
If the Archiving Database is offline and the Web-Services file share is not accessible, we would see EVENT ID 32080 and the system would fall back to C:\ProgramData\Microsoft\Skype for Business Server 2015\StorageService

Challenge #3
Assume the Archiving Database is offline, the Web-Services file share is not accessible (EVENT ID 32080), and access to the path C:\ProgramData\Microsoft\Skype for Business Server 2015\StorageService is also blocked. The local database can still hold 5,000 items or up to 10 GB (SQL Express limitation).

The challenges mentioned above can make a simulated failure operationally challenging to undo. Undoing the changes can take considerable time, which can cause a loss of productivity.

Solution #1
Stop the LyncLocal instance on every Lync Front-End Server in the pool where we want to simulate a failure. Since this is a rather common solution, I want to introduce another one.

Solution #2
Set the LySS database offline in SQL, so that all access from the communications server is blocked. This can be accomplished by running the following query on each Front-End Server:

ALTER DATABASE LySS SET OFFLINE WITH ROLLBACK IMMEDIATE

As soon as this is completed on an Enterprise Edition pool or a Standard Edition pool, IM messages will stop transmitting from the pool. Presence will still be available, but both IM and Web-Conferencing will fail.

To bring services back to business as usual, bring the database back online by running

ALTER DATABASE LySS SET ONLINE

Once the databases in your routing group are online, you will be able to use IM and Web-Conferencing again.
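
To confirm the state of the database before and after the test, a check along these lines may help. This is only a sketch: it assumes it is run locally on each Front-End Server, that the LySS database lives in the local LYNCLOCAL SQL Express instance, and that sqlcmd.exe is available on the PATH.

```powershell
# Assumptions: run on the Front-End Server itself; .\LYNCLOCAL is the local
# SQL Express instance hosting the LySS database; sqlcmd.exe is installed.
# state_desc reports ONLINE or OFFLINE for the database.
sqlcmd -S ".\LYNCLOCAL" -E -Q "SELECT name, state_desc FROM sys.databases WHERE name = 'lyss';"
```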

Here are some event log entries that may be useful during testing. I am adding them so that the web page is indexed and administrators reach an authoritative source when they search for EVENT IDs or descriptions.

EVENT ID	Source	Event ID Description
56717	LS Data Collection	IM was blocked in critical archiving mode due to local Storage Service is full or unavailable. Cause: Storage Service or its dependent components are not running. Resolution: Ensure the local Storage Service database is not full and target storages such as SQL Server or Exchange Server are available.
56800	LS Data Collection	Failed to commit session data into the local Storage Service database. Error: SessionUpdateException: code=Success, reason=, Unable to finalize session, no session items removed, no new items enqueued at Microsoft.Rtc.Internal.Storage.Queue.LyssQueueDal.FinalizeSession(StoreContext ctx, Guid adapterID, HashSet`1 sessionIDs, List`1 queueItemList) at Microsoft.Rtc.Server.UdcAdapters.UcSessionAdapter.WrapperFinalizeSession(StoreContext ctx, LyssQueueDal dal, HashSet`1 sessionIds, List`1 queueItems) at Microsoft.Rtc.Server.UdcAdapters.UcSessionAdapter.FinalizeSession(StoreContext ctx, LyssQueueDal dal, HashSet`1 sessionIds, List`1 persistItems) at Microsoft.Rtc.Server.UdcAdapters.UcSessionAdapter.PersistSession(StoreContext ctx, LyssQueueDal dal, SessionState entry, Boolean isCriticalMode) Cause: Storage Service or its dependent components are not running. Resolution: Ensure the local Storage Service database is not full and target storages such as SQL Server or Exchange Server are available.
32042	LS Storage Service	Storage Service API failed to add a message to the queue. Add Queue Message failure. EnqueueException: code=ErrorQueueUnhealthy, reason=Unable to Enqueue Message: Storage Queue is not healthy due to errors: Storage Service Database is full. . Please retry later. at Microsoft.Rtc.Internal.Storage.Api.StorageService.BeginEnqueueMessages(EnqueueMessagesRequest enqueueMessagesRequest, AsyncCallback asyncCallback, Object state) Cause: Authentication or authorization failure, bad input parameters, fabric errors, timeouts, other errors. Resolution: Check event details. Ensure that the caller of Storage Service is properly authenticated using windows authentication, and has the required authorization based on security group membership. Verify that inputs were valid. If problem persists, notify your organization’s support team with the event detail.
32008	LS Storage Service	Unexpected exception. Message=Error: Path \\contoso.com\LyncFileShare\1-WebServices-1\StorageService\DataArchive\20161122\LyncStd01.contoso.com\ failed to be read for flushed data. Error details: System.IO.IOException: The network path was not found. at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath) at System.IO.FileSystemEnumerableIterator`1.CommonInit() at System.IO.FileSystemEnumerableIterator`1..ctor(String path, String originalUserPath, String searchPattern, SearchOption searchOption, SearchResultHandler`1 resultHandler, Boolean checkHost) at System.IO.Directory.GetFiles(String path, String searchPattern, SearchOption searchOption) at Microsoft.Rtc.Internal.Storage.Sql.LyssDal.CheckFilePathForFlushedFiles(StoreContext ctx, String parentFilePath, Boolean checkArchived, Boolean& errorOccurred, Int32& numDataFilesToReport) Exception: The network path was not found. Stack Trace: at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath) at System.IO.FileSystemEnumerableIterator`1.CommonInit() at System.IO.FileSystemEnumerableIterator`1..ctor(String path, String originalUserPath, String searchPattern, SearchOption searchOption, SearchResultHandler`1 resultHandler, Boolean checkHost) at System.IO.Directory.GetFiles(String path, String searchPattern, SearchOption searchOption) at Microsoft.Rtc.Internal.Storage.Sql.LyssDal.CheckFilePathForFlushedFiles(StoreContext ctx, String parentFilePath, Boolean checkArchived, Boolean& errorOccurred, Int32& numDataFilesToReport) Cause: Unexpected exception. Resolution: If problem persists, notify your organization’s support team with the event detail.
32013	LS Storage Service	Cannot perform a LYSS database operation. Message=#CTX#{ctx:{traceId:18446744072925107599, activityId:”c0af1230-6791-473f-a13a-76795835de80″}}#CTX# FinalizeSession sproc failed: SprocNativeError = [1105] Exception: System.Data.SqlClient.SqlException (0x80131904): Could not allocate space for object ‘dbo.ItemQueue’.’CL_ItemQueue’ in database ‘lyss’ because the ‘PRIMARY’ filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup. at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData() at System.Data.SqlClient.SqlDataReader.get_MetaData() at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString) at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds, Boolean describeParameterEncryptionRequest) at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite) at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method) at 
System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method) at System.Data.SqlClient.SqlCommand.ExecuteReader() at Microsoft.Rtc.Common.Data.DBCore.Execute(SprocContext sprocContext, SqlConnection sqlConnection, SqlTransaction sqlTransaction) ClientConnectionId:8d59a7be-4c40-4747-9d00-33b889057e0c Error Number:1105,State:2,Class:17 Stack Trace: at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData() at System.Data.SqlClient.SqlDataReader.get_MetaData() at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString) at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds, Boolean describeParameterEncryptionRequest) at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite) at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method) at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method) at System.Data.SqlClient.SqlCommand.ExecuteReader() at Microsoft.Rtc.Common.Data.DBCore.Execute(SprocContext sprocContext, SqlConnection sqlConnection, SqlTransaction sqlTransaction) Cause: 
Cannot perform an LYSS database operation. Resolution: Verify that the data is valid and that the LYSS database is available and healthy. If this error caused by Violation of UNIQUE KEY constraint ‘CL_ItemQueue’, then most likely it is due to at attempt to load a duplicate item from the file share. If so, find flushed xml file that contains duplicated key and move the xml file to somewhere else. In addition, please verify the file share is healthy.
32059	LS Storage Service	Space Used by Storage Service DB is at or above the Critical Threshold. SQL Edition=Express Edition (64-bit); Space Used Percent=87.5; Critical Threshold Percent=80 queue item counts summary: owned: True, status: 2, critical: True, count: 356 Total queue items: 356, total archived items: 0 Cause: The DB size can grow bigger under heavier usage as the data in the Storage Service Queue and/or Cache grows. Once Storage Service finishes processing the data, the db will shrink back to normal size. However breaching the critical threshold implies that the normal processing of the data is slow or blocked resulting in so much excessive DB growth that service functionality is now affected and blocked. Resolution: Check event details to find the root cause of why data is not getting processed. Resolve the root cause to allow Storage Service to start shrinking the DB down naturally. If problem persists, notify your organization’s support team with the event details.
32089	LS Storage Service	A flush of queue items from the Storage Service DB was initiated, and items were exported to the file system. Queue size: Error, flushed 1 files to the filesystem. success: True. Files: \\contoso.com\LyncFileShare\1-WebServices-1\StorageService\DataArchive\20161122\LyncStd01.contoso.com\e1dc38d13ed15269b601a5460e8f9631__1.xml Cause: Periodically, or in reaction to the size of the Storage Service queue, we may purge items from the database, exporting them to the file system in order to ensure performance isn’t impacted due to the accumulation of data. These items should be re-imported after the root cause of the accumulation is resolved. Typically this would occur due to an outage of a data storage endpoint (like Exchange), or could be due to a sustained period of high system load. Resolution: The resource kit tool is available to import exported items back into the DB for processing.
32090	LS Storage Service	Flushed queue Items from the Storage Service DB have been left unattended to for some amount of time and require attention to be imported back. Parent Path \\contoso.com\LyncFileShare\1-WebServices-1\StorageService\. 112 data files are over 5 days old. Cause: Periodically, or in reaction to the size of the Storage Service Queue, we may purge items from the database, exporting them to the file system in order to ensure performance isn’t impacted due to the accumulation of data. These items should be re-imported after the root cause of the accumulation has been resolved. Typically this would occur due to an outage of a data storage endpoint (like Exchange), or could be due to a sustained period of load
32080	LS Storage Service	A queue flush operation has encountered a file error. Preliminary primary fileShareName parameter: \\contoso.com\LyncFileShare\1-WebServices-1\StorageService is unusable. Exception: System.IO.DirectoryNotFoundException: Failed to get DirectoryInfo of \\contoso.com\LyncFileShare\1-WebServices-1\StorageService at Microsoft.Rtc.Internal.Storage.Sql.LyssDal.ValidateFileShareName(StoreContext ctx, String fileShareName, String timestamp, LyssDBUsageStatus usageLevel, Boolean isTenantMigration) Cause: There may be permission issues to the file share, local file location, temporary directory, or disk is full. Resolution: Please check event detail and trace log for more information. Please ensure there is write permission to required file locations.
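
During a test it can be handy to pull just these events from a Front-End Server. The following is a minimal sketch; it assumes the events are written to the "Lync Server" event log, and the one-hour window is an arbitrary choice.

```powershell
# Event IDs taken from the table above.
$ids = 56717, 56800, 32042, 32008, 32013, 32059, 32089, 32090, 32080

# Assumption: the events land in the "Lync Server" log on the local server.
# -ErrorAction SilentlyContinue suppresses the error raised when nothing matches.
Get-WinEvent -FilterHashtable @{ LogName = 'Lync Server'; Id = $ids; StartTime = (Get-Date).AddHours(-1) } -ErrorAction SilentlyContinue |
    Sort-Object TimeCreated |
    Format-Table TimeCreated, Id, ProviderName -AutoSize
```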

References:

· Import Storage Service Data

· Archiving Options in Lync Server 2013

· The LCSLog SQL Database is not logging any archiving content

· Understanding Monitoring and Archiving on Lync Server 2013

↧

After an in-place upgrade to Skype for Business Server 2015 Response Group Workflows Fail to work

April 10, 2017, 4:00 am

In a peculiar case, we noticed that after an in-place upgrade from Lync Server 2013 to Skype for Business Server 2015, Response Group Workflows stopped working. The initial troubleshooting involved verifying that (i) the Response Group Service was starting, (ii) the workflow was defined in Skype for Business Server 2015 after the upgrade, and (iii) the AD object for the workflow existed.

To troubleshoot further, as a simple step, we deleted and re-created the workflow, and could still reproduce the error. The client logs showed 500 Internal Server Error, so I first wanted to make sure the permissions in AD were set correctly, based on Issue: Calls to certain Response groups fail with an Error “500 Internal Server Error”. Once I had verified that the issue was not related to AD permissions, I started to investigate further.

Looking through the built-in scenarios, I found one called RGS, but I wasn’t sure whether its components would suffice.

Description	Centralized Logging Tool Scenario	Logging Tool Components
Response Group Service	RGS	RgsClientsLib

I decided to create a custom scenario, call it RGSCustom, and use it for logging. I used the commands below to create the custom scenario:

$SIPStack = New-CsClsProvider -Name "SIPStack" -Type "WPP" -Level "Debug" -Flags "All"
$S4 = New-CsClsProvider -Name "S4" -Type "WPP" -Level "Debug" -Flags "All"
$RGSClientLib = New-CsClsProvider -Name "RGSClientLib" -Type "WPP" -Level "Debug" -Flags "All"
$RgsCommonLibrary = New-CsClsProvider -Name "RgsCommonLibrary" -Type "WPP" -Level "Debug" -Flags "All"
$RgsDatastores = New-CsClsProvider -Name "RgsDatastores" -Type "WPP" -Level "Debug" -Flags "All"
$RgsDeploymentApi = New-CsClsProvider -Name "RgsDeploymentApi" -Type "WPP" -Level "Debug" -Flags "All"
$RgsDeploymentLibrary = New-CsClsProvider -Name "RgsDeploymentLibrary" -Type "WPP" -Level "Debug" -Flags "All"
$RgsDiagnostics = New-CsClsProvider -Name "RgsDiagnostics" -Type "WPP" -Level "Debug" -Flags "All"
$RgsHostingFramework = New-CsClsProvider -Name "RgsHostingFramework" -Type "WPP" -Level "Debug" -Flags "All"
$RgsMatchMakingService = New-CsClsProvider -Name "RgsMatchMakingService" -Type "WPP" -Level "Debug" -Flags "All"
$RgsDBSyncAgent = New-CsClsProvider -Name "RgsDBSyncAgent" -Type "WPP" -Level "Debug" -Flags "All"

New-CsClsScenario -Identity "Global/RGSCustom" -Provider @{Add=$SIPStack, $S4, $RGSClientLib, $RgsCommonLibrary, $RgsDatastores, $RgsDeploymentApi, $RgsDeploymentLibrary, $RgsDiagnostics, $RgsHostingFramework, $RgsMatchMakingService, $RgsDBSyncAgent}

Next, it was time to start logging and, while logging was running, to restart the Response Group Service, so that any failure encountered during service startup would be captured.

Start-CsClsLogging -Scenario "RGSCustom" -Pools SFBSTD01.contoso.com
Stop-CsClsLogging -Scenario "RGSCustom"
Search-CsClsLogging -Pools SFBSTD01.contoso.com -OutputFilePath "C:\Windows\Temp\RGSCustom.txt" -StartTime (Get-Date).AddMinutes(-30)

As I was investigating the CLS logs, I noticed a peculiar error: “Message: CALLCONTROL: Call declined because CallControl is not started”

TL_INFO(TF_COMPONENT) [SFBSTD01\SFBSTD01] 208D8.4700::05/03/2016-19:46:01.484.0000A42D (RgsCommonLibrary,RgsLogMessage.ReportMessageInternal:rgslogmessage.cs(452))
_rgs_message_begin_
Direction: Incoming
From: sip:sritodi@contoso.com
To: sip:sampleworkflow@contoso.com
Message: CALLCONTROL: Call declined because CallControl is not started
WorkflowId: c338e8ba142042b9b30023269d29daa0
_rgs_message_end_

I decided to look through the event logs when the service was starting and noticed EVENT ID 31067

This sounded a little strange as I was aware that every Lync Server 2013 pool has 2 Application Contacts, one RGS Presence Watcher and another RGS Announcement Service. This event was speaking about 3 RGS Presence Watcher Contact objects.

I decided to query the number of RGS Presence Watcher contact objects

Get-CsApplicationEndpoint | Where-Object DisplayName -eq "RGS Presence Watcher" | Ft Identity, DisplayName, RegistrarPool

This could also be accomplished by running

Get-ADObject -Filter 'msRTCSIP-OwnerUrn -eq "urn:application:RGS"' -SearchBase 'CN=Configuration,DC=contoso,DC=Com' -Properties msRTCSIP-PrimaryUserAddress, displayName | ft msRTCSIP-PrimaryUserAddress, displayName -AutoSize

I quickly realized that the pools that had been upgraded from Lync Server 2013 to Skype for Business Server 2015, and where calls to response groups were failing, had only a single RGS Presence Watcher contact object.

Solution: We decided to republish the topology by running

Enable-CsTopology -Verbose

Once the topology was republished, we could see 3 RGS Presence Watcher contact objects. After a restart of the RGS service, calls started to work as expected.
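
The verification and restart could be sketched roughly as follows, run on a Front-End Server of the affected pool. The service name RTCRGS is an assumption here and is worth confirming with Get-CsWindowsService before use.

```powershell
# Count the RGS Presence Watcher contact objects after Enable-CsTopology.
Get-CsApplicationEndpoint |
    Where-Object { $_.DisplayName -eq "RGS Presence Watcher" } |
    Measure-Object |
    Select-Object -ExpandProperty Count

# Restart the Response Group service on the local server.
# Assumption: RTCRGS is the Windows service name of the Response Group service.
Stop-CsWindowsService -Name "RTCRGS"
Start-CsWindowsService -Name "RTCRGS"
```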

↧

Understanding User Replicator in Lync Server 2013 and Skype for Business Server 2015

April 17, 2017, 3:59 pm

Reference: https://blogs.technet.microsoft.com/toml/2005/05/09/lcs-2005-user-replicator-faq/

This post starts off with a reference at the top because it is a very well written blog post from a little over 10 years ago. We have had User Replicator since LCS 2005, and while a lot has changed, the principles behind User Replicator are essentially the same.

User Replicator runs under the Front-End Service context rather than as a separate service. It now writes to the SQL Express installation on each server (RTCLocal instance) and runs on every server in the pool. It runs on any server that has the Registrar role installed.

What does User Replicator do?

User Replicator is responsible for ensuring that the Lync Server or Skype for Business Server database and Active Directory are synchronized. This means that any time a user object or contact object is created or modified in Active Directory, it is User Replicator’s responsibility to ensure that the changes are propagated to the database. To accomplish this, User Replicator first performs a Full Sync (or Initial Sync) and then subscribes to a Delta Sync (incremental changes) using DirSync.

Which settings in User Replicator are configurable?

With Lync Server 2010 we introduced Set-CsUserReplicatorConfiguration to allow an organization to control User Replicator. Here we discuss the different parameters:

ReplicationCycleInterval – Since User Replicator only tracks delta changes from Active Directory (AD), using a smaller replication interval, such as 5 minutes, ensures that Distribution List Expansion (DL Expansion) and Address Book Web-Query (ABWQ) provide accurate information. It also allows users created in Active Directory to be provisioned in Lync or Skype for Business within minutes. Note that since we only subscribe to delta changes, the load on a domain controller is negligible.

ADDomainNamingContextList – Specifies the domains that may have user objects and contact objects that need to be synchronized. When this is not set, User Replicator will try to locate all the domains and perform replication. ADDomainNamingContextList can be used to exclude, say, an empty root domain, or a domain that is used only to store computer accounts.

SkipFirstSyncAllowedDowntime – This was introduced in Skype for Business Server 2015. It moves the Front-End Service (RTCSrv) from pending to started even though the initial sync hasn’t completed.

DomainControllerList – This was introduced in Skype for Business Server 2015 and allows you to specify a list of domain controllers. However, we suggest leaving this at its default. I will explain why in a little bit.
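
As an illustration, the current settings can be reviewed and the replication interval changed roughly like this. The 5-minute value simply mirrors the example interval mentioned above and is not a recommendation for every environment.

```powershell
# Review the current User Replicator settings (global scope).
Get-CsUserReplicatorConfiguration -Identity "global"

# Example only: run a delta sync every 5 minutes (value format is hh:mm:ss).
Set-CsUserReplicatorConfiguration -Identity "global" -ReplicationCycleInterval "00:05:00"
```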

Can I control which DCs User Replicator connects to in order to perform synchronization?

In Skype for Business Server 2015 (not in previous versions) it is configurable, but it’s not recommended, because User Replicator uses a Windows API called DsGetDcName to connect to a Domain Controller. The response of the DsGetDcName API depends on how your Active Directory administrator has configured AD Sites and Services in your organization. The response is either (i) an in-site Domain Controller or (ii) an out-of-site Domain Controller.

Note that the definition of Site here is an AD Site, which is defined by a list of subnets and should typically be a representation of your physical site.

To find out which site your Lync / Skype for Business Server belongs to, run nltest.exe /DSGetSite from a command prompt. If the server is not associated with a site, chances are User Replicator will connect to a less than optimal domain controller for both the initial sync and delta syncs.

If AD Sites are configured correctly, either an in-site domain controller (if one exists) is chosen, or an out-of-site domain controller with the lowest cost (based on the cost configured in AD Sites and Services). If the Lync or Skype for Business Server is not a member of any AD site, then the Lync / Skype for Business Server will connect to a random domain controller, which may not even be on the same continent.
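
Both the site membership and the domain controller returned by the locator can be checked from the server itself. In this sketch, contoso.com is only a placeholder for your own AD domain name.

```powershell
# AD site the local server is associated with.
nltest.exe /dsgetsite

# Domain controller the DC locator returns for a domain.
# Assumption: contoso.com stands in for your own AD domain name.
nltest.exe /dsgetdc:contoso.com
```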

How long does the initial replication cycle typically take?

There are a number of variables that affect the length of the initial cycle, chief among them the number of objects (user objects and contact objects combined) being synchronized, the domain controller that was chosen, the available bandwidth, and the load on the domain controller. Assuming minimum spec hardware or better and no serious network latency/bandwidth issues, an initial cycle with 100,000 objects will take about 30 minutes. In contrast, an SBA server can be in a remote location with limited bandwidth and potentially no in-site domain controller. In such a case, the initial sync can take considerably longer.

Example #1:
An SBA server didn’t belong to any AD Site, which caused the User Replicator initial sync to connect to a Domain Controller on a different continent with poor network connectivity. The sync eventually took well over 6 hours, leaving the Front-End Service in Starting mode for 6+ hours. After a simple AD Site configuration change, the initial sync was interrupted and the service was restarted, and it then started in ~45 minutes. With Skype for Business Server 2015, the SkipFirstSyncAllowedDowntime parameter of Set-CsUserReplicatorConfiguration would have been useful. This is one of the many reasons why we recommend not configuring the DomainControllerList parameter using Set-CsUserReplicatorConfiguration.

Example #2:
In a particular case that I handled several months ago, we found that AD replication between sites was configured to occur only between 06:00 PM and 06:00 AM, in 30-minute intervals. As a result, users in one site could communicate with a new hire almost immediately, while it took several hours (up to 12 hours) for users in another site to see the newly created user. Once AD replication was set to run in 30-minute intervals around the clock, a newly created user was accessible from both sites in ~30+ minutes.

↧

Configuring Alternate Login ID for Skype for Business

May 5, 2017, 7:13 am

Please note that it is now possible to configure an alternate login ID (in some circumstances) for Skype for Business/Lync. This will most likely be of interest in cases where end users have email or SIP addresses that differ from their UPN and/or a UPN that is non-routable (jdoe@contoso.local).

You can read full requirements and other important technical details in the Windows IT Center.

Configuring Alternate Login ID

↧

Lync Backup Service – EVENT ID 4060 – The server principal “CONTOSO\skype-pool1-FE2$” is not able to access the database “msdb” under the current security context.

May 25, 2017, 2:00 am

Recently, I was working on a pool-pairing case with a unique twist. The import for the Conferencing Module was working, but the import for the User Module was failing. Upon looking further, I noticed EVENT ID 4060 with the following text:

Log Name: Lync Server
Source: LS Backup Service
Date: 5/16/2017 11:25:48 AM
Event ID: 4060
Task Category: (4000)
Level: Error
Keywords: Classic
User: N/A
Computer: skype-pool1-FE2.contoso.com
Description:
Skype for Business Server 2015, Backup Service user store backup module failed to complete import operation.

Configurations:
Backup Module Identity:UserServices.PresenceFocus
Working Directory path:\\contoso.com\SFBShare\1-BackupService-2\BackupStore\Temp
Local File Store Unc path:\\contoso.com\SFBShare\1-BackupService-2\BackupStore
Remote File Store Unc path:\\contoso.com\SFBShare\2-BackupService-1\BackupStore

Additional Message:
Exception: Microsoft.Rtc.BackupService.ImportOperationException: Import operation (from zip archive ) is failed due to: Failed to execute stored procedure XdsQueryReplicaStatus. Native Error: 916, Exception: The server principal “CONTOSO\skype-pool1-FE2$” is not able to access the database “msdb” under the current security context.. Retriable: False. Cookie: . —> System.Data.SqlClient.SqlException: The server principal “CONTOSO\skype-pool1-FE2$” is not able to access the database “msdb” under the current security context.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlDataReader.TryHasMoreRows(Boolean& moreRows)
at System.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)
at System.Data.SqlClient.SqlDataReader.TryNextResult(Boolean& more)
at System.Data.SqlClient.SqlDataReader.NextResult()
at Microsoft.Rtc.Common.Data.DBCore.ParseResults(SprocContext sprocContext, SqlDataReader sqlReader)
at Microsoft.Rtc.Common.Data.DBCore.Execute(SprocContext sprocContext, SqlConnection sqlConnection, SqlTransaction sqlTransaction)
— End of inner exception stack trace —
at Microsoft.Rtc.BackupService.BackupModules.XdsBackupModuleBase.QueryBackupStatus()
at Microsoft.Rtc.BackupService.BackupModules.UserStoreBackupModule.GetBackupCookie()
at Microsoft.Rtc.BackupService.BackupModuleHandler.ReceiveBackupDataTask.GetBackupCookie(Boolean& isModuleInitialized)
at Microsoft.Rtc.BackupService.BackupModuleHandler.ReceiveBackupDataTask.InternalExecute()
at Microsoft.Rtc.Common.TaskManager`1.ExecuteTask(Object state)

Cause: Either network or permission issues. Please look through the exception details for more information.
Resolution:
Resolution

So I decided to collect logs using the built-in scenario HADR. The HADR scenario has the following components:

BackupService
PowerShell
RtcDbSyncAgent
UserServices

Since the scenario included both BackupService and UserServices, I was optimistic that I would nail down the issue. However, the logs did not lead me to a solution. They only provided the name of the sproc, XdsQueryReplicaStatus. This sproc exists on both the Front-End Servers and the Back-End Servers, so I needed more information on which database to troubleshoot.

Next I decided to collect a memory dump using the command

ProcDump.exe -ma -e System.Data.SqlClient.SqlException LyncBackupService.exe

Upon investigation, I finally found that the sproc XdsQueryReplicaStatus was connecting to the Lync Back-End Server. Now that I knew the issue was with SQL, it certainly seemed like a permissions issue, so I compared the permissions against those in a working environment. However, I was unable to find any difference in the permissions of the databases used by Skype for Business Server 2015.

Finally, I decided to check the permissions on the system databases, and found that in the failing environment the GUEST user had been removed from MSDB (a system database). So I decided to grant the GUEST user the CONNECT permission on the MSDB database by running

USE msdb;
GRANT CONNECT TO guest;
GO

Next, since this was a change to a system database, we restarted the SQL services, and the issue was resolved.

Resource:
You should not disable the guest user in the msdb database in SQL Server
What does the status reported by Get-CsBackupServiceStatus mean?

↧

Retirement of the Lync Connectivity Analyzer Tool

June 13, 2017, 10:22 am

As some of you may have noticed, the Lync Connectivity Analyzer tool has been retired and is no longer available for public download. We are interested in hearing how you used the tool, and what we could provide in the form of a tool that would better assist you in troubleshooting connectivity or sign-in issues.

Thanks,

Kris Korff, Sr. Supportability Program Manager

↧

Simplified Port Requirements for Skype for Business Online

June 13, 2017, 10:25 am

Hi all,

Please visit our Techcommunity to read Thomas Binder’s new blog on Simplified port requirements for Skype for Business Online!

Thanks,

Kris Korff, Sr. Supportability Program Manager, Skype for Business

↧

.net Framework 4.7 and Skype for Business (&Lync) Server

June 23, 2017, 11:52 am

With the release of .NET Framework 4.7, the Skype for Business/Lync team would like to remind everyone that we generally follow the Exchange team’s guidance when it comes to .NET Framework releases, so please check their guidance here. At this time, that guidance is: please DO NOT install .NET Framework 4.7 on any Skype for Business or Lync servers.

We will update this thread once testing has been completed.

↧

Skype for Business Server Address Book Normalization Rules–Failing Normalization

July 7, 2017, 2:20 pm

Recently I was working on a Service Request where address book normalization was not able to normalize simple 10-digit numbers. I tried about half a dozen normalization rules to make sure that the pattern would capture the phone numbers, but normalization kept failing.

As I was reading Ken’s Blog ( See: http://ucken.blogspot.com/2015/05/skype4b-address-book-normalization.html ), I noticed that the default E164 rule was indeed missing

(UPDATE 2015-Aug-13: Hany Elkady noted in the comments below that removing the E164 rule seemed to stop AD phone numbers already stored in E.164 format from appearing in the address book. I personally haven’t seen this occur but you might want to leave that rule in place.)

I decided to rebuild the default Normalization rule by running the following and noticed that Normalization started to occur.

New-CsAddressBookNormalizationRule -Parent Global -Name 'Generic_E164' -Description "Generic_E164" -Pattern E164 -Translation NULL

Since then, our documentation for New-CsAddressBookNormalizationRule (see: https://technet.microsoft.com/en-us/library/dn985803.aspx) has been updated with the following notes:

Note: E164 is a well-known keyword which translates to tel:+digits-that-match. If E164 is specified and the phone number doesn’t need to be translated, then NULL is an expected response.

and also

Note: If the pattern is E164, then NULL is a valid value for Translation since the number doesn’t need to be translated.

↧

EVENT ID 56208 – Resolving Issues with CDR Throttling

August 10, 2017, 11:00 pm

In my previous blog post, I explained what causes EVENT ID 56208 and alluded to changing the threshold as a work-around. This post shows how to identify the issue further and apply that work-around.

Running the simple SQL query below provides a list of the top 10 MS-Diagnostic IDs that occurred most often in this environment within the last 30 days.

USE LcsCDR
GO

SELECT TOP 10 DiagnosticId, COUNT(DiagnosticId) AS 'Frequency'
FROM [LcsCDR].[dbo].[SessionDetails]
WHERE SessionIdTime >= DATEADD(day, DATEDIFF(day, 0, GETDATE()) - 30, 0)
  AND SessionIdTime < DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0)
GROUP BY [LcsCDR].[dbo].[SessionDetails].DiagnosticId
ORDER BY Frequency DESC
GO

The query can easily be modified to change the period from the current 30 days (the -30 applied to getdate() in the query) to, say, 1 day, 7 days, 15 days, or 365 days. This helps show whether the issue has been occurring for a longer period of time and has only now reached the tipping point.
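The windowing logic of the query can be easier to reason about outside SQL. The following Python sketch mirrors it under some assumptions: the rows are hypothetical (SessionIdTime, DiagnosticId) pairs that have already been read from SessionDetails, and the function names `window` and `top_diagnostic_ids` are made up for illustration. It is not a replacement for the SQL query, only a way to see how the day boundaries behave when the period changes.

```python
from collections import Counter
from datetime import datetime, timedelta


def window(now, days):
    """Return (start, end): midnight `days` days ago up to, but excluding, midnight today."""
    end = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return end - timedelta(days=days), end


def top_diagnostic_ids(rows, now, days=30, top=10):
    """Count diagnostic IDs inside the window and return the most frequent ones.

    rows: iterable of (session_time, diagnostic_id) pairs (hypothetical sample data).
    Rows without a diagnostic ID are skipped, as COUNT(DiagnosticId) ignores NULLs.
    """
    start, end = window(now, days)
    counts = Counter(
        diag for ts, diag in rows
        if diag is not None and start <= ts < end
    )
    return counts.most_common(top)


if __name__ == "__main__":
    now = datetime(2017, 8, 10, 15, 30)
    sample = [
        (datetime(2017, 8, 9, 10, 0), 52094),
        (datetime(2017, 8, 8, 11, 0), 52094),
        (datetime(2017, 8, 1, 9, 0), 7008),
    ]
    for days in (1, 7, 30):
        print(days, top_diagnostic_ids(sample, now, days=days))
```

Comparing the output for 1, 7, and 30 days side by side is one way to spot a diagnostic ID that only recently became frequent.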

Let's say MsDiagId 52094 has the highest frequency, and has had this frequency only in the last week, not in the data collected over, say, the last 30 days. You can then modify the threshold for that ID:

USE LcsCDR
GO

UPDATE dbo.MsDiagMetaData
SET ThrottleLimit = 20
WHERE MsDiagId = 52094
GO

Once the threshold has been modified, you will notice considerable improvement over the next couple of hours as any data backlogged will now be committed.

Please be aware that this is not a solution, just a work-around to prevent problems within Lync and Skype for Business. To find a solution, we need to find the actual reason behind the diagnostic. Some example questions, with possible answers, are:

Why do clients report MS-DiagnosticID 52094? - A temporary loss of WiFi can cause this issue.
Why does this event come mostly from a few IP addresses? - This can help narrow the source to a location or geography.
Why do only PC clients report the issue? - Most users at the location were using PC clients.
Why do mobile clients not report this issue? - Mobile devices probably used data networks (3G/4G/LTE networks).
Why do we see this MS-DiagnosticID only for this period of 1 week? - A large conference was hosted last week, to which a large population from the company was invited; the temporary WiFi set-up was suffering from issues, and one AP in particular was problematic.

As I mentioned, I can only provide a workaround that gives relief on the Lync or Skype for Business Servers and in monitoring. The real issue has to be investigated separately. Getting clear answers to the questions above will pretty much point you to the actual cause and its resolution.

↧

Understanding the relationships between UCMA Trusted Application objects

September 18, 2017, 10:10 am

Author: Zack Campbell, Service Engineer, Microsoft Skype for Business Online Services

I was recently engaged by the owner of multiple high-visibility and business-critical UCMA Trusted Applications, requesting my assistance to replace the Trusted App Computers associated with a large list of Endpoints. I didn't know the full backstory, but apparently their servers were VMs, hosted on Hyper-V hosts which they were under a hard deadline to vacate. Anyway, the initial request seemed simple enough, but as I dug in, I quickly realized that there was no direct relationship between their Trusted App Computers and their Endpoints.

I also realized (not so quickly) that there was not a one-to-one relationship between those objects. I knew I had the data I needed to figure out those relationships, but -- being the ~~smart~~ lazy SfB admin that I am -- I started by digging around the Internet for some background, only to come up empty.

Anyone who has worked much with UCMA Trusted Apps probably already understands this, but I was just then realizing that I had some quick scrambling to do to ensure I didn't cause an outage for their Trusted Apps… while implementing a change to prevent a different outage. That's never a good day.

You're really going to make me work at this, huh?

Up to this point, I had been able to slide by on most UCMA change work, without having a super clear understanding of the relationships between UCMA Trusted Apps and their respective Pool, Endpoint, and Computer objects.

This time, however, while the app owner was somewhat confident of the App Pools and Apps associated with his various Computers and Endpoints, he wasn't totally certain, nor was he sure which ones corresponded with which. Unfortunately, that's not good enough, when doing change work… so it fell to me to really figure this whole model out, so I could give the App owner solid consulting advice, make the right changes at the right times, and help him avoid any impact.

Getting all the pieces together

Not having found any help on the Internet, I got to work, picking through my company SfB deployment's various existing Trusted Apps, Computers, App Pools and Endpoints, looking closely at the object properties that tied them together. I made several interesting observations, which I was later able to stitch together into a fairly simple model:

Trusted App Computers are directly related to a single Trusted App Pool.
1. The property that ties them together is the App Pool's PoolFqdn (corresponding to the Trusted App Computers' Pool property).
2. A Trusted App Pool requires at least one Trusted App Computer.

Trusted Apps are directly related to a single Trusted App Pool.
1. The property that ties them together is the App Pool's PoolFqdn (corresponding to the Trusted Apps' TrustedApplicationPoolFqdn property).
2. The App Pool has a corresponding Applications multivalued property, populated by all the Trusted Apps associated with it (via their ApplicationId properties).
3. A Trusted App Pool doesn't have to have any Trusted Apps associated with it (it doesn't do much good without them, but there's nothing preventing this from happening).

Trusted App Endpoints are directly related to a single Trusted App.
1. The property that ties them together is the Trusted App's ApplicationId (corresponding to the Trusted App Endpoints' OwnerUrn property).
2. A Trusted App doesn't have to have any Trusted App Endpoints associated with it.
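The chain from an Endpoint to its Computers can be sketched as a pair of lookups. This is only an illustration in Python with hypothetical sample objects: the property names OwnerUrn, ApplicationId, TrustedApplicationPoolFqdn, PoolFqdn, and Pool come from the observations above, while the `Fqdn` and `SipAddress` keys, the host names, and the function name are assumptions made for the example. It also assumes that an ApplicationId is unique across pools in the sample data.

```python
# Hypothetical sample objects; only the linking property names come from the text above.
pools = [
    {"PoolFqdn": "apppool1.contoso.com", "Applications": ["urn:application:helpdesk"]},
    {"PoolFqdn": "apppool2.contoso.com", "Applications": []},
]
computers = [
    {"Fqdn": "appsrv1.contoso.com", "Pool": "apppool1.contoso.com"},
    {"Fqdn": "appsrv2.contoso.com", "Pool": "apppool1.contoso.com"},
    {"Fqdn": "appsrv3.contoso.com", "Pool": "apppool2.contoso.com"},
]
apps = [
    {
        "ApplicationId": "urn:application:helpdesk",
        "TrustedApplicationPoolFqdn": "apppool1.contoso.com",
    },
]
endpoints = [
    {"SipAddress": "sip:helpdesk@contoso.com", "OwnerUrn": "urn:application:helpdesk"},
]


def computers_for_endpoint(endpoint, apps, computers):
    """Walk Endpoint -> App (OwnerUrn) -> Pool (TrustedApplicationPoolFqdn) -> Computers (Pool)."""
    apps_by_id = {app["ApplicationId"]: app for app in apps}
    app = apps_by_id.get(endpoint["OwnerUrn"])
    if app is None:
        return []  # the endpoint's owner is not among the known Trusted Apps
    pool_fqdn = app["TrustedApplicationPoolFqdn"]
    return sorted(c["Fqdn"] for c in computers if c["Pool"] == pool_fqdn)


if __name__ == "__main__":
    for endpoint in endpoints:
        print(endpoint["SipAddress"], computers_for_endpoint(endpoint, apps, computers))
```

There is no direct Endpoint-to-Computer link in this model; the result is always derived through the App and its Pool, which is why one Endpoint can map to several Computers.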

Ok, that's not so bad

These observations were helpful, but not the kind of thing that's easy to remember or use. More to the point, I prefer pictures, so I made one. Nice, huh? This is a lot easier…

Application

With this model in hand, it was a simple matter to build the list of all Trusted App Pools and respective Trusted App Computers associated with the Endpoints my customer provided, and it helped them as well, to see/understand how their objects related to each other.

My hope is that it'll be useful to other SfB admins, as well. Don't hesitate to provide comments and feedback. I'll be happy to update this, as needed.

↧

Help! My Director is consuming all my resources!

September 20, 2017, 7:30 am

Author: DJ Ball, Senior Escalation Engineer, Skype for Business

Recently I worked on a couple of cases where the administrators were reporting higher than average CPU consumption on their Director pool servers. They reported seeing sustained 80 to 90% CPU consumption during peak business hours. This was most noticeable around the top of each hour. Then, a few hours before the end of their day, the CPU would begin to fall back to their normal 20 to 30% average (normal for these customers, every customer should have their own baseline!).

As we began to troubleshoot the issue over several days, we noticed that only two or three servers in the pool would have high CPU consumption on a given day. We were able to confirm that every server in the pool had high CPU consumption at some point, so this problem was definitely affecting all members of the pool (just not all at the same time) .

Watching Task Manager was enough to figure out that RTCHost.exe was the top consumer of CPU time. Now we needed to determine what was causing the problem. Was load not well balanced among the servers in the pool? Was something different on the problem servers (or on problem servers on problem days)? Was there any increase in users or devices on problem days?

A custom perfmon counter log was needed to dig deeper and understand why this service was consuming more CPU. Here is the Logman command line that allowed the customer to easily create the counter log on each server. I have provided the Performance Counter text file that contains all the counters that we used.

PerformanceCounters1

Create command:

logman -create counter SFBPERF -f bin -v mmddhhmm -cf PerformanceCounters.txt -o %systemdrive%\Perflog\%COMPUTERNAME%.LOG -y -cnf 24:00:00

Start command:

logman start SFBPERF

Stop command:

Logman stop SFBPERF

I had the customer run these perfmon logs on each server on issue and non-issue days (so we could compare problematic vs. non-problematic). Once I had this data, it was a time-consuming task to pick it apart.

In reviewing the perfmon logs, I started off by adding these two counters. They showed that the RTCHost.exe process trended up exactly in line with the total CPU usage. Rtchost was using ~20% of the Processor time\_Total.

Process\% processor Time\RTCHost

Processor\% processor Time\_Total

Then I overlaid these additional counters to look at user load:

LS:SIP protocol\SIP - Incoming Messages /Sec

LS:SIP - Load Management\SIP - Average Holding Time For Incoming Messages

It was very clear that SIP - Incoming Messages /Sec jumped from an average of 3080 to 4380. That is about a 40% jump in traffic over the course of ~3 minutes. SIP - Average Holding Time For Incoming Messages also rose from basically 0 to 13.9 at this same time. But when I compared these peaks against other servers in the pool, they were no higher than on servers that were not having high CPU. What I had established was that the 10:00 am hour was a peak time for users joining meetings.

What is RTChost doing when it is consuming so much CPU? Next was to add these counters to the view:

Process\Private bytes\RtcHost

Memory\Available Mbytes

.Net CLR Memory\% Time in GC

The Private Bytes counter showed that the RtcHost process grew from consuming about 1 GB of memory to a peak of just over 13 GB in the span of 9 minutes. The Available Mbytes counter showed that free system memory went from averaging ~14 GB to 3.6 GB over that same period. % Time in GC is a counter that shows how much .NET garbage collection is occurring for that process. Our jump in user load caused the process to consume much more memory, which caused GC to kick into overdrive, which drove up the CPU usage.

Now that we knew GC was our bottleneck, I discovered the customer was still running the old .NET 4.0 framework. The .NET 4.6.2 release has improved memory management performance, and Skype for Business Server has supported .NET 4.6.2 since the February 2017 update. We do not support .NET 4.7, as it has not been fully tested. The 4.6.2 version can be found here.

The .NET Garbage Collector serves as the automatic memory manager for applications written in .NET. While GC is running, the other worker threads are blocked until GC finishes. The more often GC is running, the less often other work can be done. As a process becomes busier, GC will run more often and for longer periods of time.

Garbage collection has two modes, Server and Workstation. The Rtchost process is configured to use Workstation mode by default. Workstation mode has 1 thread to perform GC and 1 memory heap, whereas Server mode has 1 heap per logical CPU core and 1 GC thread per CPU core. These differences can cause a process to consume as much as 2.5 times the amount of memory. You need to check the Memory\Available Mbytes counter closely to ensure you have enough system memory to handle this change. For a deep dive on GC, the Fundamentals of Garbage Collection is a great resource and the Exchange Team Blog has this excellent post.

Once the servers were updated with .NET 4.6.2, I had the customer enable server mode GC with concurrency in the Rtchost config file as shown below. You should make a backup of this file before adding the two lines to the <runtime> section. This change does require a reboot to be picked up.

Default path - "C:\Program Files\Skype for Business Server 2015\Server\Core\RtcHost.Exe.config"

<?xml version="1.0" encoding="utf-8" ?>

<configuration>

<runtime>

<generatePublisherEvidence enabled="false"/>

<gcServer enabled="true"/>

<gcConcurrent enabled="true"/>

</runtime>

<system.serviceModel>

<services>

If you think this change may help your environment, you need to consider the following caveats:

Per Server requirements for Skype for Business Server 2015, Director role servers are recommended to have 16GB of memory. You need to closely monitor the Memory\Available Mbytes counter before and after making this change. You should have at least 1.5GB free during peak times.
Future Cumulative updates may overwrite your custom RtcHost.Exe.config. You will need to check this setting after each update. This is a custom configuration that needs to be set for each environment.
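Because the setting has to be re-checked after every update, a small verification script can help. The sketch below is a hedged illustration in Python, not a supported tool: it parses a config document and reports whether server GC and concurrent GC are enabled under <runtime>. The function name `gc_settings` and the inline sample are assumptions, and treating `gcConcurrent` as the "concurrency" line is my reading of the standard .NET runtime configuration elements rather than something the config excerpt above spells out.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample of a config file with both settings enabled.
SAMPLE_CONFIG = """
<configuration>
  <runtime>
    <generatePublisherEvidence enabled="false"/>
    <gcServer enabled="true"/>
    <gcConcurrent enabled="true"/>
  </runtime>
</configuration>
"""


def gc_settings(config_xml):
    """Return {'gcServer': bool, 'gcConcurrent': bool} for the <runtime> section."""
    runtime = ET.fromstring(config_xml).find("runtime")
    result = {}
    for name in ("gcServer", "gcConcurrent"):
        element = None if runtime is None else runtime.find(name)
        # A missing element or any value other than "true" counts as not enabled here.
        result[name] = element is not None and element.get("enabled", "").lower() == "true"
    return result


if __name__ == "__main__":
    # To check a real server, read the file at the default path mentioned above
    # and pass its text to gc_settings() instead of the sample.
    print(gc_settings(SAMPLE_CONFIG))
```

Running a check like this on each Director after a Cumulative Update is one way to notice that the custom configuration was overwritten.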

Thanks for reading!

DJ.

↧

High Glitch rate in QoE Report

September 21, 2017, 11:00 pm

If your organization uses Lync Monitoring Reports or CQM, you may occasionally see a high glitch rate in one or more calls. Both reports typically show you calls with a very high glitch rate.

In order to resolve the issue, we first need to understand what a glitch really is.

A glitch is defined as a short-lived fault in a system. It is often used to describe a transient fault that corrects itself, and is therefore difficult to troubleshoot.

In a Lync or Skype for Business setting, a glitch is a short span of time when the application was unable to have exclusive control of the device. This can happen because another application wanted to use the device, such as a browser auto-playing a video or a PowerPoint presentation that has audio in it.

Our recommendations for glitch rates are as follows:

AudioSpeakerGlitchRate
Average glitches per five minutes for the loudspeaker rendering. For good quality, this should be less than one per five minutes. Not reported by A/V Conferencing Servers, Mediation Servers, or IP phones.

AudioMicGlitchRate
Average glitches per five minutes for the microphone capture. For good quality this should be less than one per five minutes. Not reported by A/V Conferencing Servers, Mediation Servers, or IP phones.

See: https://technet.microsoft.com/en-us/library/gg398064.aspx
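As a worked example of the "less than one per five minutes" guidance, the following Python sketch normalizes a glitch count to a per-five-minutes rate. The function names and the idea of deriving the rate from a raw count and a call duration are assumptions for illustration; the QoE report already contains the computed AudioSpeakerGlitchRate and AudioMicGlitchRate values, and the text does not describe how the server computes them.

```python
FIVE_MINUTES_IN_SECONDS = 300.0


def glitch_rate_per_five_minutes(glitch_count, call_seconds):
    """Average number of glitches per five minutes over the length of the call."""
    if call_seconds <= 0:
        raise ValueError("call_seconds must be positive")
    return glitch_count * FIVE_MINUTES_IN_SECONDS / call_seconds


def meets_guidance(rate):
    """Good quality is described above as fewer than one glitch per five minutes."""
    return rate < 1.0


if __name__ == "__main__":
    # Hypothetical call: 3 glitches during a 30-minute (1800-second) call.
    rate = glitch_rate_per_five_minutes(3, 1800)
    print(rate, meets_guidance(rate))
```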

Every Lync call samples audio at either 8,000 samples/sec (narrowband) or 16,000 samples/sec (wideband), and a glitch is a failure to capture one or more consecutive samples.

If your organization is using a capture and render device that is optimized for Lync then you should indeed expect few glitches.

Our certification Process requires the following

Lync User Experience

Plug and play: once connected, a device registers on Lync server and is ready to use

First run experience: Automatic detection of a device with a direct link to user guide and other useful tools and documentation

Mute/unmute across PC and device

Audio quality (embedded in the device): no echo or excessive glitches, echo cancellation across devices, wideband / Microsoft media platform with RT audio

Anti-flicker support for webcams (global powerline frequency) admin experience

See: https://technet.microsoft.com/en-us/office/dn788944.aspx

If you do experience an issue with glitches, I would recommend beginning troubleshooting by updating every device driver used for audio and video on the PC where the glitch rate is high, by downloading and installing the latest versions from the appropriate vendor’s web site.

Next, I would suggest testing again. If the issue continues to occur, or isn’t reduced drastically, check all USB peripherals and update those devices too. If you are using, say, a universal docking station, please download its device drivers from the vendor.

If you are using a USB hub or connecting the device through a docking station, it would be prudent to test with a direct connection. Even using a different USB port may help.

To troubleshoot the issue, I would typically perform the following:

Request the user to Run MSInfo32.exe and then Click File –> Save ( Save in NFO format)
Run the following in PowerShell to get a list of all available drivers
dir C:\windows\System32\drivers\*.* | %{ $_.VersionInfo} | ConvertTo-Html > C:\Windows\Temp\drivers.html
Collect Data using Windows Performance Recorder (WPR)
1. Download and Install Windows Assessment and Deployment Kit (Windows ADK) from https://developer.microsoft.com/en-us/windows/hardware/windows-assessment-deployment-kit for your operating System
2. Once you have installed Windows ADK, search for Windows Performance Recorder (WPR), select the scenario Audio Glitches, and make sure to de-select “First Level Triage”. Also change the logging to Verbose and Logging Mode to File
3. Click on Start, and then attempt a call in Lync / Skype for Business
4. Once you have about 60 seconds in the call, click on SAVE in Windows Performance Recorder (WPR)
5. At this point, we have the ETL file
6. We would also want to click SAVE in the next window.

Once you have the data from the steps above, it can be analyzed by Microsoft Premier Support.

If you are interested in learning what’s in the collected data and how we analyze it, you might want to view https://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-151-Media-eXperience-Analyzer-part-3-Audio-Glitch-Analysis

↧

EVENT ID 56416 – Failed to post QoE report to External Consumer

September 26, 2017, 6:00 pm

Starting in Lync Server 2010, we added functionality to enable our partners to provide insights into call quality by sending a copy of the Voice Quality Report (VQReport) directly from the server. At that time, I knew of a handful of companies that would have you configure QoEConfiguration so they could generate reports and provide insights about your network and configuration. Over time, with Call Quality Methodology and later with Call Quality Dashboard, and also integrating CQD Online

To send your QoE reports to a third party, all you had to do within Lync was run

Set-CsQoEConfiguration -EnableExternalConsumer $true -ExternalConsumerName <Friendly Name of the Third Party Consumer> -ExternalConsumerURL "HTTPS URL Provided by the third party"

As soon as replication was complete, and presuming DNS, certificates, and firewall were in order, all new QoE reports would also be sent to the third party. If the third party was busy or unavailable, the messages would be queued up (in MSMQ in Lync Server 2010, and in LySS in Lync Server 2013 and above) and then retried.

If, for some reason, the organization decided to change its course and use either Call Quality Methodology or Call Quality Dashboard, you could use Set-CsQoEConfiguration to remove the configuration.

Over time, the strategy may have changed while the configuration remained in place. The third-party provider may have chosen to block connections from your organization, or a new pool may have been deployed from which outbound connections to port 443 on the external consumer are not possible. In such cases, you could see EVENT ID 56416 occur in your organization.

Time:     5/2/2017 2:49:54 PM

ID:       56416

Level:    Error

Source: LS Data Collection

Machine: SKYPESTD01.contoso.com

Message: Failed to post QoE report to External Consumer.

Error: System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond <IP Address of the Provider >:443

at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)

at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Exception& exception)

--- End of inner exception stack trace ---

at System.Net.HttpWebRequest.GetRequestStream(TransportContext& context)

at System.Net.HttpWebRequest.GetRequestStream()

at Microsoft.Rtc.Server.UdcAdapters.QoE.HttpSender.SendReports(LyncMessageDetails msgDetails)

at Microsoft.Rtc.Server.UdcAdapters.QoE.QoEProcessor.ProcessQueueItems(LyssQueueItem queueItem)

Cause: Configurations for the external reports consumer are not set correctly.

Resolution:

Check the External Consumer configurations. If the problem persists, notify your organization's support team with the relevant details.

Depending on the cause, and if the intent is still to send the data to the third party, fixing this is a matter of checking why the connection to port 443 is failing and correcting it. Once the connection issue has been resolved, it’s just a matter of waiting for all the VQReports to be delivered to the third party. It may take a couple of hours, depending on the robustness of the third-party system and how long you have been experiencing the failure.
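A quick first check is whether a plain TCP connection to the consumer on port 443 succeeds at all from the affected server. The Python sketch below is a minimal illustration; the function name `can_connect` and the host name in the usage example are hypothetical. It only tests TCP reachability and says nothing about TLS, certificates, or whether the provider accepts the posted reports.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical consumer host; replace it with the host from your ExternalConsumerURL.
    print(can_connect("qoe-consumer.example.com", 443))
```

A False result from the server that logs EVENT ID 56416, combined with a True result from another network segment, would point toward a firewall or routing change rather than a problem on the provider's side.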

If the intent is no longer to send the data to the third party, you may want to run Invoke-CsStorageServiceFlush to move all the data from the existing queues to the network file share, so that resources like CPU, RAM, and SQL storage (this is held in SQL Express) are not wasted. The Storage Service may also perform a flush automatically under several conditions. You might want to read my blog post Testing IM and Web-Conferencing Archiving set to Critical to understand the robustness of LYSS.

↧

Have you heard the one about the delegate who accepted the meeting request?

October 5, 2017, 10:58 am

Author: DJ Ball, Senior Escalation Engineer, Microsoft CSS Support

Another day supporting Skype for Business, and another topic to discuss. Today we will look at delegation, and how to troubleshoot it.

The case came in as a customer reporting that several of their Administrative Assistants could not create a Skype for Business meeting on behalf of one specific VIP user.

This VIP had configured delegates in Outlook for 4 different assistants. All four would get this error when they clicked the “Skype Meeting” button on a calendar item they were creating for their manager.

“The person you are scheduling on behalf of is not UC enabled or there may be configuration issues with the account. Make sure you are signed in to the same account you use for Microsoft Outlook. If the problem continues, please contact your support team.”

There are two ways that Skype can be configured for delegation. The most common scenario is for the manager to assign a delegate using Outlook. This post covers setting up delegates and the various permissions that can be granted. The Skype for Business client has a process, UCMapi.exe, that will sync down delegates that are configured in the mailbox. So once this is set in Outlook, within a few minutes the delegate should see the yellow banner in the client showing they were added as a delegate.

So how do we troubleshoot this? The first thing to verify is that the delegate can open the Manager’s calendar using Outlook. Then verify the delegate can create a regular meeting on behalf of the manager without clicking the “Skype Meeting” button. Once we established these were working, we could move on to more in-depth poking.

I had the manager remove the delegate, wait several minutes, and re-add them. We noticed the delegate never received the yellow banner. The next thing to check was the EnableExchangeDelegateSync attribute on the assigned Client policy.

This must be set to True to allow delegates to schedule meetings on behalf of someone. Once this was enabled, the delegates saw the yellow banner message that they were added as a delegate. However, they still got the same error message for a Skype meeting.

You can check the registry to see if a client ever dismissed the yellow banner. We have two keys that track this. They are the Last DelegatorList and Dismissed DelegatorList. You can verify these keys under this path.

HKEY_CURRENT_USER\SOFTWARE\Microsoft\Office\16.0\Lync\user@domain.com

Name: Dismissed DelegatorList

Type: REG_MULTI_SZ

Data: disabledmanager@djball1.lab

Name: Last DelegatorList

Type: REG_MULTI_SZ

Data: disabledmanager@djball1.lab

managertwo@djball1.lab

The Skype for Business client includes an Outlook Add-in named “Skype Meeting Add-in for Microsoft Office 2016”. When the “Skype Meeting” button is clicked, code from this add-in is run. To crack this case, I had to turn to debugging the functions that run when this button is clicked. My debug session led me to an old Lync 2010 article that talks about the possible Add-In error messages. The chart has a very similarly worded error, but it was different enough that previous searches never found it. The possible cause is listed as “…Lync 2010 located the SMTP address of the person you are scheduling a meeting for but cannot determine his or her SIP address from Exchange Server or Lync”.

When the “Skype Meeting” button is clicked, we look up the Manager’s contact in the Global Address book. We then loop through all the proxy addresses looking for a matching SIP: address. If we do not find this, we fail and log the error message. This behavior has not changed since the 2010 days.
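The lookup described above can be sketched in a few lines. This Python illustration is not the add-in's code: the function name, the sample addresses, and the case-insensitive match on the `sip:` prefix are assumptions made to show the idea of scanning a contact's proxy addresses for a SIP entry.

```python
def find_sip_address(proxy_addresses):
    """Return the first proxy address with a SIP: prefix, or None if there is none."""
    for address in proxy_addresses:
        if address.lower().startswith("sip:"):
            return address
    return None


if __name__ == "__main__":
    # Hypothetical proxy addresses for a manager's Global Address List entry.
    healthy = ["SMTP:manager@contoso.com", "smtp:mgr@contoso.com", "sip:manager@contoso.com"]
    broken = ["SMTP:manager@contoso.com", "smtp:mgr@contoso.com"]
    print(find_sip_address(healthy))
    print(find_sip_address(broken))  # None: the case where scheduling on behalf fails
```

A missing SIP entry corresponds to the failure case in this post, where the address list visible to the client did not carry the manager's SIP address.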

This was a hybrid deployment for Skype for Business Server and Exchange 2013. The Manager and Delegates were all fully moved to the cloud for both services. We found that the on-premises Exchange Global Address List did have the correct SIP address for the manager, but this attribute had not correctly replicated to the cloud. We worked with their on-premises Exchange team and the Office 365 operations team to correct this. Once replication took place, all the assistants could create Skype meetings without any issue. Another problem solved.

But wait, remember I said there were two ways to set up delegates? When the Skype for Business user is enabled for Enterprise Voice, a “Call Forwarding” option is lit up in Tools/Options. This allows you to explicitly set up a Skype for Business delegate that can create meetings, and also allows the delegate to answer incoming phone calls. This delegation is not configured when the manager creates the Outlook delegate. This method is used for scenarios where Outlook is not used by the clients.

https://support.office.com/en-us/article/Set-up-a-Lync-Meeting-on-behalf-of-someone-else-DD35D8E8-147A-4B51-ABF2-9B02121EA3C3

↧

How to Set up and configure Cloud Voicemail for Skype for Business Online users

October 13, 2017, 1:29 am

What is Cloud PBX Voice Mail? How does it Work? How to Configure Cloud PBX Voice Mail for SFB online users?

Cloud PBX Voicemail is a new service offering provided by Microsoft using O365 Cloud PBX. It is available ONLY to customers who use O365 Skype for Business Online with Cloud PBX and PSTN Calling, or Cloud PBX with on-premises Cloud Connector Edition (CCE). You CANNOT use Cloud Voicemail if you use Skype for Business Server or Lync Server on-premises, nor can it be used with an on-premises PBX/IP-PBX phone system.

General Pre-Requisites for Cloud Voice Mail

1. Cloud Voicemail works ONLY if your users are homed in SFB Online. If you have SFB deployed in a hybrid scenario, ONLY users who are homed online will use Cloud Voicemail.

2. You need Microsoft Exchange as your email solution. Exchange can be either online with O365 or on-premises. If Exchange is on-premises, the minimum supported version for Cloud Voicemail is Exchange 2013 with CU12.

Below is a Table that Specifies the Supported Configurations for Voicemail available with Skype for Business

SFB User is Homed Online, Exchange Mailbox is Online:

When an SFB user is homed in O365 and has a mailbox in Exchange Online, voicemail for that user is handled by the Cloud PBX Voicemail service.

Please NOTE that Exchange UM is NOT used to provide voicemail in this scenario.

Voicemail is actually handled by the Cloud PBX Voicemail service.

When the Cloud PBX Voicemail service takes a voicemail for a user, it deposits the message into the user's Exchange mailbox, which is why Exchange is still needed.

Exchange is needed ONLY to store a user's voicemail, as there is no native storage for voicemail anywhere in SFBO at the moment.

Exchange does NOT play any role in collecting a user's voicemail.

How Does it Work?

1. User E1 calls User E2.
2. User E2 declines the call.
3. The call is now sent to the SFB Online FE server where User E2 is homed.
4. The FE server now has to decide what to do with the call.
5. The FE server checks User E2’s HostedVoicemail attribute to see whether it is set to $TRUE.
6. If it is, the FE server checks the Hosted Voicemail policy applied to the user to find where to send the call.
7. The Hosted Voicemail policy should point to the SFB Online resource forest Edge pool (sipedgebl20r.infra.lync.com), which then routes the call to the Cloud PBX Voicemail service.
8. Once the FE server finds that the call needs to go to sipedgebl20r.infra.lync.com, it generates a new INVITE and sends the call to the SFB Online Edge server associated with the FE pool.
9. The job of the FE server is done at this point.
10. The Edge server associated with User E2's FE pool receives this call/INVITE.
11. That Edge server sends the call to the resource forest Edge pool, sipedgebl20r.infra.lync.com.
12. An Edge server in the sipedgebl20r.infra.lync.com pool receives the call.
13. That Edge server sends the call to its next hop, which is an FE server in its pool.
14. The FE server in the pool receives the call.
15. Based on custom routing rules, this FE server sends the call to the Cloud PBX Voicemail service “noam.voicemail.services.skypeforbusiness.com”, which runs in Azure.
16. Before sending the call to the Azure voicemail service, the FE server also adds a special CONTEXT header to the SIP INVITE. The header carries information about User E2, such as SMTP address, Object ID, Tenant ID, and display name. The Azure voicemail service uses this information to find the user's language setting and the email address to which the voicemail is delivered.
17. The Azure voicemail service “noam.voicemail.services.skypeforbusiness.com” answers the call, records the voicemail, and sends it as an email to User E2's mailbox.
18. The voicemail is delivered to the user's mailbox as an attachment, and the user's mailbox is the ONLY location where it is stored.

PRE-REQUISITES:

Provisioning and license

To ensure Cloud PBX Voicemail works for your users, verify that the Cloud PBX user is provisioned correctly. To do this, ensure the following:

Verify that the SFB Online user has the following licenses:

Cloud PBX for SFB

SFB PSTN Calling

Make sure the user has a telephone number assigned and is enabled for Enterprise Voice.

You can check this using the command below from SFB Online remote PowerShell:

Get-CsOnlineUser | fl Alias,lineuri,Enterprisevoiceenabled*

The user must also have a valid Exchange mailbox (online or on-premises) in order to receive and check voicemail.

CONFIGURATION:

There is NO CONFIGURATION required for Cloud PBX Voicemail to work. If the user is provisioned with the correct licenses and is enabled for Enterprise Voice with a telephone number, Cloud PBX Voicemail is automatically enabled and configured in the back end. To confirm that it is configured, check the user configuration using SFB Online remote PowerShell.

Collect the output of the following command from the SFB Online Management Shell:

Get-CsOnlineUser | fl Alias,EnterpriseVoiceEnabled,Lineuri,Hostedvoicemail,hostedvoicemailpolicy

For a user to have Azure Voicemail, the HostedVoicemail attribute should be set to TRUE.

When a user is enabled for EV and LineURI is set, the HostedVoicemail attribute is automatically provisioned to TRUE for the user (this can take some time for newly created or moved users, sometimes up to 24 hours). If you find that the user still doesn’t have HostedVoicemail set to $true, you can set this attribute manually using remote PowerShell with the command below. (You should NEVER have to do this manually, however; ideally it happens automatically as soon as the user is provisioned correctly in O365.)

Set-CsUser -Identity e2@mshaikh.onmicrosoft.com -HostedVoiceMail $true

The HostedVoicemailPolicy attribute should be populated with the name “Businessvoice”.

When a user is enabled for EV and LineURI is set, the HostedVoicemailPolicy attribute is also provisioned automatically for the user (this can take some time for newly created or moved users, sometimes up to 24 hours). If you find that the attribute is still not populated with “Businessvoice”, you CANNOT set it manually using remote PowerShell, as the command to do this is NOT available in SFB Online PowerShell.

The Hosted Voicemail policy “Businessvoice” is a default policy that exists in O365 by design and is the ONLY policy that needs to be used for Cloud Voicemail. This policy is automatically applied to any user who is enabled for Cloud PBX with PSTN Calling.
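The checks described in this section can be collected into one small routine. The following is a minimal sketch in Python that works on user records already exported from the Get-CsOnlineUser output. How the data is exported is left open, and the function names and sample records are hypothetical.

```python
def is_true(value) -> bool:
    """Accept real booleans as well as exported text such as 'True' or '$true'."""
    return str(value).strip().lower() in ("true", "$true")


def voicemail_provisioning_issues(user: dict) -> list:
    """List the Cloud Voicemail prerequisites that a user record does not meet."""
    # Property names are compared case-insensitively, as PowerShell does.
    record = {key.lower(): value for key, value in user.items()}
    issues = []
    if not is_true(record.get("enterprisevoiceenabled")):
        issues.append("EnterpriseVoiceEnabled is not True")
    if not str(record.get("lineuri") or "").strip():
        issues.append("LineURI is empty")
    if not is_true(record.get("hostedvoicemail")):
        issues.append("HostedVoiceMail is not True")
    policy = str(record.get("hostedvoicemailpolicy") or "").strip().lower()
    if policy != "businessvoice":
        issues.append("HostedVoicemailPolicy is not BusinessVoice")
    return issues


# Hypothetical records, for illustration only.
ready = {"Alias": "e2", "EnterpriseVoiceEnabled": True, "LineURI": "tel:+14255550100",
         "HostedVoiceMail": True, "HostedVoicemailPolicy": "BusinessVoice"}
pending = {"Alias": "e3", "EnterpriseVoiceEnabled": "True", "LineURI": "tel:+14255550101",
           "HostedVoiceMail": "False", "HostedVoicemailPolicy": ""}

print(voicemail_provisioning_issues(ready))    # []
print(voicemail_provisioning_issues(pending))
```

A record like the second one may simply be waiting for back-end provisioning, which, as noted above, can take up to 24 hours.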

Side note: You can check the properties of the Hosted Voicemail Policy “Businessvoice” using the command Get-CsHostedVoicemailPolicy.

The Destination attribute here points to the SFB Online resource forest pool where the Cloud Voicemail service resides. This destination field tells the online FE servers where to send calls for voicemail.

You should NOT try to enable your user for UM.

If the settings listed above have been configured/provisioned correctly, the user is silently enabled for UM in the back end using a custom process. Remember, Exchange UM is NOT used for voicemail in this scenario. Exchange is used ONLY to store voicemail messages, so there is NO need to enable UM for users manually. This is done automatically in the back end.

You can however verify if it is done from the Skype for Business admin center in the O365 Portal. Below is a screen shot showing this;

If you do NOT see the Unified Messaging flag set as enabled, or other settings such as HostedVoicemail or HostedVoiceMailPolicy are not populated, try calling your user and leaving a voicemail. Normally, once you leave a voicemail, all the settings and flags are set automatically.

What if we have SFB Online but Exchange on-premises?

If you are using SFB Online with Cloud PBX and PSTN Calling and have Exchange Server on-premises, you need to configure OAuth between on-premises Exchange and Skype for Business Online so that the Cloud Voicemail service can deposit voicemail messages in your users' mailboxes. The article below has information on how to achieve this:

https://support.microsoft.com/en-us/help/3195158/customer-issues-between-exum-and-azure-voicemail

Feature Matrix for Cloud PBX Voicemail

The table below lists the features that are available with Cloud PBX Voicemail as compared to traditional Exchange UM.

Again, keep in mind that Exchange UM is NOT used for voicemail in this scenario, so the features available with Exchange UM are not necessarily available with Cloud PBX Voicemail.

Helpful Articles

https://support.office.com/en-us/article/Set-up-Phone-System-voicemail-Admin-help-9c590873-b014-4df3-9e27-1bb97322a79d

↧

SDN Interface Setup/Configuration Recommendations

November 28, 2017, 7:17 am

Author: Steve Schiemann

Introduction and Background:

This post is intended for customers currently implementing or planning to implement the Software Defined Networking (SDN) Interface for Lync/Skype for Business on-premises servers. As noted in my previous blog post, the SDN Interface uses open protocols to apply software control to network hardware. There are three primary components to the SDN Interface:

· The Dialog Listener that captures signaling and quality observations about media traffic between Skype for Business endpoints. The Listener component (a.k.a. “LDL”, or Lync Dialog Listener) needs to be installed on each Front-End server.

· At least one SDN Manager that collects data from Dialog Listeners and distributes to third-party network management systems (“Subscribers”, or network controllers). If a single Manager or manager failover configuration is deployed, call quality data is stored in memory on each manager. This is transient data, meaning that as soon as it is sent to a controller, it goes away. In a manager pool configuration, a data store that maintains the shared state among all SDN Managers in a single pool is required. The data store could be a SQL database or Redis cache system.

· One or more Subscribers. These controllers support a RESTful (REpresentational State Transfer) web service to receive and analyze the call- and media-quality data posted from the SDN Managers. Based on the call quality data received, these third-party network management systems can make real-time adjustments to optimize network traffic.

This blog post applies primarily to the Manager component. This component can be deployed in different ways:

· In a manager pool

· In a failover configuration

· As a single manager

· Manager and Listener components collocated on the same server.

For more context, please refer to my previous blog post, online SDN Interface documentation, and/or the documents downloaded with each version of the SDN Interface.

Statement:

If you are deploying multiple SDN Interface managers, the Skype for Business product group now strongly recommends installing SDN Manager pools instead of deploying managers in a failover configuration.

Why?

When connection issues cause the configured limits to be reached, a Dialog Listener fails over. The problem is that there is no coordination among the listeners. The failover configuration is really intended as a disaster mitigation solution. If failover is configured, the limits in the parameters (such as submitqueuelen and maxretrybeforefailover in our documentation) should be set high enough to prevent a failover from happening in unintended situations. Details regarding these parameters are beyond the scope of this post.

When a disconnected Dialog Listener attempts to deliver messages to the primary SDN Manager, a failover protection algorithm will switch to the alternative SDN Manager to ensure that the SDN Interface provides continuous service when server failures occur. In this case, the alternate SDN Manager becomes the new primary service provider. Call states are lost during the failover transition, because state is kept in memory on the primary SDN Manager. This may cause inconsistent or incomplete message reporting delivered to subscribers until the new active SDN Manager can establish a consistent view of the ongoing media streams.

In the event of a failover, the secondary computer is promoted to the new primary node. When the other node is restored, it automatically becomes the secondary node, and the new primary node stays in place until it fails over. Listeners will not “fail back” automatically to their original manager.

Why is an SDN Manager Pool Better than a Failover configuration?

In a Skype for Business SDN Interface pool configuration, all Dialog Listeners are connected to a DNS load-balanced pool of SDN Manager servers.

In this configuration, the size of the pool scales with the message load produced by the Skype for Business Servers and Dialog Listeners. The pool automatically handles most server failures. Network controllers (subscribers) connected to this SDN Manager pool receive a consistent state about applicable media streams handled by the connected Skype for Business Server front end pools.

Disaster scenarios can be dealt with by defining the SDN Manager pool across different locations. The failover configuration addresses similar scenarios, but it needs careful configuration and is therefore not recommended. Setting up an SDN Manager pool across different locations is preferred.

The disadvantage is that you need a data store (Redis or SQL) for a manager pool, but the advantage is load-sharing within the pool, instead of having a passive backup.

Summary

Although the documentation that is downloaded with each version of the SDN Interface treats each possible manager deployment equally, the product group is considering deprecating the failover scenario. If deploying multiple SDN managers, strongly consider using one or more manager pools.

↧

Event ID 32099 – Attempt to automatically import a flushed file back into Storage Service encountered error

January 17, 2018, 8:00 am

In Skype for Business Server 2015, the LYSS service (Lync Storage Service) began automatically importing data from the file share assigned to the respective pool. The import runs every 30 minutes. When an import fails, event ID 32099 is generated.

An example of this event:

Log Name: Lync Server
Source: LS Storage Service
Date: 1/4/2018 9:20:30 AM
Event ID: 32099
Task Category: LS Storage Service
Level: Error
Keywords: Classic
User: N/A
Computer: SKYPEFE01.contoso.com
Description:
Attempt to automatically import a flushed file back into Storage Service encountered error.
The following automatic flushed file import error events occurred.
#CTX#{ctx:{traceId:10001, activityId:"8454c5d4-57f6-437a-9cf7-46fc15960492"}}#CTX# File: \\contoso.com\LyncRootDFS\RTCShare\1-WebServices-1\StorageService\DataExport\20140625\SKYPEFE01.contoso.com\0640daf8d97b5199b82663737356b525__14.xml, items deserialized 3, items failing re-import: 3

#CTX#{ctx:{traceId:10001, activityId:"8454c5d4-57f6-437a-9cf7-46fc15960492"}}#CTX# File: \\contoso.com\LyncRootDFS\RTCShare\1-WebServices-1\StorageService\DataExport\20140625\SKYPEFE03.contoso.com\b20f02d8943b53dc89ddcb2ff106f912__6.xml, items deserialized 2, items failing re-import: 2
:
Cause: Bad input data, or error calling Storage Service, or other errors.
Resolution:
Please look at event details and use the correlation ID to view corresponding traces to resolve the error.

To investigate, I would start with the XML files. First, simply open them in a browser or another application to view the contents. A quick visual spot check can provide information about the failures.

<?xml version="1.0"?>
-<LyssQueueItem Version="1" xmlns="http://schemas.microsoft.com/RtcServer/2012/11/lyssimpexp">
-<QueueItems>
+<ItemQueue ItemQueueID="0282d738-9468-e711-8108-0050569e79b5" GroupID="38e63176-1723-5d4d-8a40-ff2e2c436899">
+<ItemQueue ItemQueueID="e081e54a-9468-e711-8108-0050569e79b6" GroupID="38e63176-1723-5d4d-8a40-ff2e2c436899">
+<ItemQueue ItemQueueID="82c21bcb-9768-e711-8108-0050569e79b6" GroupID="38e63176-1723-5d4d-8a40-ff2e2c436899">
+<ItemQueue ItemQueueID="3cd27fce-9a68-e711-8108-0050569e79b6" GroupID="38e63176-1723-5d4d-8a40-ff2e2c436899">
+<ItemQueue ItemQueueID="48b5359a-a368-e711-8108-0050569e79b6" GroupID="38e63176-1723-5d4d-8a40-ff2e2c436899">
+<ItemQueue ItemQueueID="44bb9767-a768-e711-8108-0050569e79b6" GroupID="38e63176-1723-5d4d-8a40-ff2e2c436899">
+<ItemQueue ItemQueueID="8b35a1e5-be68-e711-8108-0050569e79b6" GroupID="38e63176-1723-5d4d-8a40-ff2e2c436899">
+<ItemQueue ItemQueueID="4ff14355-c068-e711-8108-0050569e79b6" GroupID="38e63176-1723-5d4d-8a40-ff2e2c436899">
+<ItemQueue ItemQueueID="b6e98870-c168-e711-8108-0050569e79b6" GroupID="38e63176-1723-5d4d-8a40-ff2e2c436899">
+<ItemQueue ItemQueueID="c0204b47-c568-e711-8108-0050569e79b6" GroupID="38e63176-1723-5d4d-8a40-ff2e2c436899">
+<ItemQueue ItemQueueID="6c59d707-c668-e711-8108-0050569e79b6" GroupID="38e63176-1723-5d4d-8a40-ff2e2c436899">
+<ItemQueue ItemQueueID="90482b96-c768-e711-8108-0050569e79b6" GroupID="38e63176-1723-5d4d-8a40-ff2e2c436899">
+<ItemQueue ItemQueueID="6c76a50e-ca68-e711-8108-0050569e79b6" GroupID="38e63176-1723-5d4d-8a40-ff2e2c436899">
</QueueItems>
</LyssQueueItem>

Here we can count 13 items and may be able to investigate them individually. If there were many items, however, we would not want to count them and check what’s wrong by hand.

PowerShell can help here. The item count can be obtained by running:

[XML] $a=Get-Content '.\LYSS_Sample.XML'
$a.LyssQueueItem.QueueItems.ChildNodes.Count

1289

Disclaimer: Importing an XML file in PowerShell can be very resource intensive, and it is highly recommended not to do this on a Lync or Skype for Business server.
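In line with the disclaimer above, the count can also be taken on an admin workstation against a copy of the file. The following is a minimal sketch in Python, assuming the file uses the namespace shown in the sample above. The function name and the tiny sample file are made up for illustration. The file is streamed rather than loaded in one piece, which keeps memory use low.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Namespace copied from the sample file shown earlier in this post.
LYSS_NS = "http://schemas.microsoft.com/RtcServer/2012/11/lyssimpexp"
ITEM_TAG = "{" + LYSS_NS + "}ItemQueue"


def count_queue_items(path: str) -> int:
    """Count ItemQueue elements while streaming through the file."""
    count = 0
    for _event, element in ET.iterparse(path, events=("end",)):
        if element.tag == ITEM_TAG:
            count += 1
            element.clear()  # drop the item's content once it has been counted
    return count


# Tiny stand-in for a real export file (hypothetical IDs).
SAMPLE = """<?xml version="1.0"?>
<LyssQueueItem Version="1" xmlns="http://schemas.microsoft.com/RtcServer/2012/11/lyssimpexp">
  <QueueItems>
    <ItemQueue ItemQueueID="a" GroupID="g"/>
    <ItemQueue ItemQueueID="b" GroupID="g"/>
    <ItemQueue ItemQueueID="c" GroupID="g"/>
  </QueueItems>
</LyssQueueItem>
"""

with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "LYSS_Sample.xml")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE)
    print(count_queue_items(sample_path))  # 3
```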

Next, we can look at the characteristics of the issue by running the command below. We can see that every item in this XML file is tagged with ItemStatus 5 and a particular AdapterID.

$a.LyssQueueItem.QueueItems.ItemQueue | ft ItemStatus,AdapterID

ItemStatus AdapterID
---------- ---------
5          cde2bace-f515-444d-a3f1-858a7fc8728f
5          cde2bace-f515-444d-a3f1-858a7fc8728f
5          cde2bace-f515-444d-a3f1-858a7fc8728f
5          cde2bace-f515-444d-a3f1-858a7fc8728f
5          cde2bace-f515-444d-a3f1-858a7fc8728f
5          cde2bace-f515-444d-a3f1-858a7fc8728f
5          cde2bace-f515-444d-a3f1-858a7fc8728f
5          cde2bace-f515-444d-a3f1-858a7fc8728f
5          cde2bace-f515-444d-a3f1-858a7fc8728f
5          cde2bace-f515-444d-a3f1-858a7fc8728f
5          cde2bace-f515-444d-a3f1-858a7fc8728f
5          cde2bace-f515-444d-a3f1-858a7fc8728f
5          cde2bace-f515-444d-a3f1-858a7fc8728f

If all the items have ItemStatus 5 and the AdapterID is cde2bace-f515-444d-a3f1-858a7fc8728f, and you have users who are enabled for Skype for Business but whose mailboxes are hosted in an email system that is either a non-Microsoft solution (EWS doesn’t exist) or doesn’t support OAuth (Exchange Server 2010 and earlier), then these items are most likely Server-Side Conversation History entries that cannot be saved. If there is no plan to move the mailboxes to Exchange Server 2013 or later, these files can be ignored.

$a.LyssQueueItem.QueueItems.ItemQueue | Group-Object AdapterID | FT Count, Name
Count       Name
-----       ----
60         cde2bace-f515-444d-a3f1-858a7fc8728f
12         36AA818F-00BB-43BC-88E7-6840ECA732C6
5          0947BCF3-7D50-40A7-9E3A-F07B9DC4CEF1

If the name matches "CDE2BACE-F515-444D-A3F1-858A7FC8728F", you might want to follow the instructions in “The LCSLog SQL Database is not logging any archiving content”.

If the name matches "36AA818F-00BB-43BC-88E7-6840ECA732C6", it is possible that you also have the issue described in “EVENT ID 56208 – Resolving Issues with CDR Throttling”.

If the name matches "0947BCF3-7D50-40A7-9E3A-F07B9DC4CEF1", it is possible that you have the issue described in “EVENT ID 56416 – Failed to post QoE report to External Consumer”.
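The grouping and the three pointers above can be combined in one pass. The following is a minimal sketch in Python, again meant for a copy of the file on a workstation. The sample does not show whether AdapterID is stored as an attribute or as a child element of ItemQueue, so the hypothetical helper `read_field` tries both. The sample data is made up.

```python
import xml.etree.ElementTree as ET
from collections import Counter

LYSS_NS = "{http://schemas.microsoft.com/RtcServer/2012/11/lyssimpexp}"

# Pointers taken from the text above, keyed by upper-cased AdapterID.
FOLLOW_UP = {
    "CDE2BACE-F515-444D-A3F1-858A7FC8728F": "The LCSLog SQL Database is not logging any archiving content",
    "36AA818F-00BB-43BC-88E7-6840ECA732C6": "EVENT ID 56208 – Resolving Issues with CDR Throttling",
    "0947BCF3-7D50-40A7-9E3A-F07B9DC4CEF1": "EVENT ID 56416 – Failed to post QoE report to External Consumer",
}


def read_field(item, name: str) -> str:
    """Read a value stored either as an attribute or as a child element (assumption)."""
    if name in item.attrib:
        return item.attrib[name]
    child = item.find(LYSS_NS + name)
    return child.text.strip() if child is not None and child.text else ""


def summarize_adapters(xml_text: str) -> list:
    """Return (AdapterID, item count, pointer) tuples, most frequent adapter first."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        read_field(item, "AdapterID").upper()
        for item in root.iter(LYSS_NS + "ItemQueue")
    )
    return [(adapter, count, FOLLOW_UP.get(adapter, "no pointer listed"))
            for adapter, count in counts.most_common()]


# Made-up sample that uses both storage styles.
SAMPLE = (
    '<LyssQueueItem Version="1" xmlns="http://schemas.microsoft.com/RtcServer/2012/11/lyssimpexp">'
    "<QueueItems>"
    "<ItemQueue><AdapterID>cde2bace-f515-444d-a3f1-858a7fc8728f</AdapterID></ItemQueue>"
    "<ItemQueue><AdapterID>cde2bace-f515-444d-a3f1-858a7fc8728f</AdapterID></ItemQueue>"
    '<ItemQueue AdapterID="36AA818F-00BB-43BC-88E7-6840ECA732C6"/>'
    "</QueueItems></LyssQueueItem>"
)

for adapter, count, pointer in summarize_adapters(SAMPLE):
    print(count, adapter, "->", pointer)
```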

If there are more than 2,000 items in any XML file (this shouldn’t typically happen), the auto-import functionality may fail to parse the file. In such a case, we recommend using ImportStorageServiceData.exe or reaching out to Microsoft Premier Support.

If the folder contains data older than your retention period (CDR, QoE, IM & Web Conferencing retention), for example XML files generated 12 months ago when the retention periods are 90 days, it is safe to delete the XML files that are more than 90 days old.
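To see which files fall outside the retention period before deleting anything, a listing along these lines may help. This is a minimal sketch in Python. It assumes that the yyyymmdd folder names seen in the event above (for example 20140625) reflect when the files were generated, which should be confirmed in your environment. The function name is hypothetical, and the sketch only lists files; it never deletes them.

```python
import tempfile
from datetime import date, datetime, timedelta
from pathlib import Path


def expired_export_files(export_root, retention_days: int, today: date = None) -> list:
    """List XML files under yyyymmdd folders that are older than the retention period."""
    today = today or date.today()
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for day_folder in sorted(Path(export_root).iterdir()):
        name = day_folder.name
        if not (day_folder.is_dir() and len(name) == 8 and name.isdigit()):
            continue  # not a date-named folder, leave it alone
        try:
            folder_date = datetime.strptime(name, "%Y%m%d").date()
        except ValueError:
            continue  # eight digits, but not a valid calendar date
        if folder_date < cutoff:
            expired.extend(sorted(day_folder.rglob("*.xml")))
    return expired


# Demonstration on a throwaway folder tree with made-up file names.
with tempfile.TemporaryDirectory() as root:
    old_file = Path(root, "20170101", "SKYPEFE01.contoso.com", "old.xml")
    new_file = Path(root, "20180110", "SKYPEFE01.contoso.com", "new.xml")
    for sample in (old_file, new_file):
        sample.parent.mkdir(parents=True)
        sample.write_text("<placeholder/>")
    for path in expired_export_files(root, retention_days=90, today=date(2018, 1, 17)):
        print(path.name)  # old.xml
```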

As always, when in doubt, we request you to open a Service Request with Microsoft Premier Support

↧

Skype for Business Client-Side Anti-Virus Scanning

February 6, 2018, 7:53 am

by Steve Schiemann

Microsoft has found that some client-side issues can arise because of anti-virus interference with normal operations. These issues include but are not limited to downloading the address book, response problems when performing various tasks, or outright crashes.

To ensure that the antivirus scanner does not interfere with the operation of Skype for Business (SfB) clients, exclude the client tracing/profile directories and the Office installation directories on each workstation that runs a file-level antivirus scanner.

Note:

Folder and file locations listed below are the default locations for various client installations. For any locations for which you did not use the default, exclude the locations you specified for your installation instead of the default locations specified in this writing.

Important:

Please note that some antivirus programs may need absolute, not relative paths, for their exclusion list.

Client Tracing / Profile Directories

Office 2016:

%userprofile%\AppData\Local\Microsoft\Office\16.0\Lync

Office 2013:

%userprofile%\AppData\Local\Microsoft\Office\15.0\Lync

Office 2016 Installation Directories

Click-to-Run:

C:\Program Files (x86)\Microsoft Office\root\Office16

MSI-based Installations:

· 64-bit Office on 64-bit Windows:

C:\Program Files\Microsoft Office\Office16\

· 32-bit Office:

C:\Program Files (x86)\Microsoft Office\Office16\

Office 2013 Installation Directories

· 64-bit Office on 64-bit Windows

C:\Program Files\Microsoft Office\Office15\

· 32-bit Office:

C:\Program Files (x86)\Microsoft Office\Office15\
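As noted earlier, some antivirus programs need absolute paths, while the profile directories above are written with %userprofile%. The following is a minimal sketch in Python of turning such an entry into an absolute path. The function name and the sample user profile are hypothetical; on a real workstation the mapping could come from `os.environ`.

```python
import re


def expand_windows_path(path: str, environment: dict) -> str:
    """Replace %NAME% placeholders using a case-insensitive variable lookup."""
    lookup = {name.upper(): value for name, value in environment.items()}

    def substitute(match):
        # Leave the placeholder untouched when the variable is unknown.
        return lookup.get(match.group(1).upper(), match.group(0))

    return re.sub(r"%([^%]+)%", substitute, path)


# Hypothetical profile location, for illustration only.
environment = {"USERPROFILE": r"C:\Users\alice"}
relative = r"%userprofile%\AppData\Local\Microsoft\Office\16.0\Lync"

print(expand_windows_path(relative, environment))
# C:\Users\alice\AppData\Local\Microsoft\Office\16.0\Lync
```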

Must I Exclude These Directories?

The short answer is no, but please take into consideration that we in Microsoft Customer Service and Support have resolved many issues simply by taking A/V scanning out of the picture. This happens both server- and client-side. Customers often push back when asked to remove A/V software, or even to disable it for testing purposes. We understand your concerns, but this software can be very intrusive. Even when it is disabled, hooks are left in place that can interfere with Skype for Business clients. For another perspective, please see this blog. Here is an excerpt: “AV or security software manufacturers tend to understand ‘Disabled’ as a ‘I’ll continue with all my intrusive way of doing, only that if I detect something suspicious I won’t tell anyone. But I can keep being the cause of performance problems, memory leaks, or memory corruptions.’”

Eicar Test

The Eicar (European Institute for Computer Antivirus Research) test allows anyone to see whether a certain folder on their machine is being scanned. Simply copy and paste the 68-byte ASCII text into Notepad and save it locally. Your scanner should pick up this innocuous file and flag it as a threat. I did this, saving the file to my Lync/SfB profile folder, and was immediately informed of a “severe” threat by Windows Defender. If I suspected A/V of causing issues with SfB, I would exclude this folder from scanning.

Grab the Eicar test and details from http://www.eicar.org/86-0-Intended-use.html

Conclusion

In most SfB client cases, A/V software runs fine without any special configuration and does not interfere with SfB functionality. If you have read this page however, you understand why customers might be asked to exclude certain directories from scanning, or to disable, or remove A/V software for testing purposes.

Note:

We are not aware of a risk of excluding the specific files or folders that are mentioned in this article from scans that are made by your antivirus software. However, your system may be safer if you do not exclude any files or folders from scans.

Resources

Antivirus scanning exclusions for Lync Server 2013

https://technet.microsoft.com/en-us/library/dn440138(v=ocs.15).aspx

Plan antivirus scanning for Outlook 2013

https://technet.microsoft.com/en-us/library/dn769141.aspx?f=255&MSPPError=-2147217396

↧

Persistent Chat room lock-up and become unavailable when a user is either added/removed from the Room/Category

February 16, 2018, 9:00 am

The latest update for Lync Server 2013 (July 2017) has the following fix

The update for Skype for Business Server 2015 (May 2017) has the following fix

KB4015910 Event ID 53106 "Unable to Save Message" occurs in Skype for Business Server 2015 Persistent Chat Server

Even when these updates (or a later CU) are installed, the issue can persist in the environment.

Log Name:      Lync Server
Source:        LS Persistent Chat Server
Date:          10/10/2017 1:01:02 PM
Event ID:      53508
Task Category: (1098)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      PCHATServer.contoso.com
Description:
Failed to release the admin lock. Administrative command processing cannot proceed.

Log Name:      Lync Server
Source:        LS Persistent Chat Server
Date:          10/10/2017 5:44:36 AM
Event ID:      53555
Task Category: (1098)
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      PCHATServer.contoso.com
Description:
An inconsistent state between the server cache and the database was detected and the server cache will be reloaded.
The Persistent Chat server will reload its cache from the database.
Cause: This can be caused by Persistent Chat servers failing to communicate with each other.

Log Name:      Lync Server
Source:        LS Persistent Chat Compliance Server
Date:          10/8/2017 1:29:31 PM
Event ID:      53106
Task Category: (1097)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      PCHATServer.contoso.com
Description:
Unable to save message 10/8/2017 8:24:59 PM PART ma-chan://contoso.com/6f41dceb-69ae-434a-9699-123e8eb5f675 0 39000 to database due to exception:
CmdID: c5409a64-b11d-4d49-90f5-fa694cd4555f The server could not restore db connection within the allowed time (00:10:00) using connection string: Data Source=sql01.contoso.com\RTC;Initial Catalog=mgccomp;Integrated Security=SSPI;Failover Partner=sql02.contoso.com\RTC. at
at Microsoft.Rtc.Internal.Chat.Server.ServerCommon.Database.DbCommand.executeUntilSuccessOrTimeout[TR](Fun`2 executeDelegate, RetryInfo retryInfo)
at Microsoft.Rtc.Internal.Chat.Server.ServerCommon.Database.DbCommand.executeImp[TR](Fun`2 executeDelegate, Int32 retryTimeoutInMs)
at Microsoft.Rtc.Internal.Chat.Server.ServerCommon.Database.DbCommand.ExecuteNonQuery(Int32 retryTimeoutInMs)
at Microsoft.Rtc.Internal.Chat.Server.Compliance.ComplianceDataAccess.Save(RawComplianceData data)
at Microsoft.Rtc.Internal.Chat.Server.Compliance.ComplianceServer.Save(RawComplianceData data).

This issue stems from design and scalability. When Persistent Chat Server was designed, it wasn't expected that users would be added and removed on a continual basis. Also, to ensure that only participants who belong in a chat room have access, we verify the permissions for every user, every category, and every chat room, even when only a single user is added or removed. This works well in small environments, but it fails to scale as usage grows. To address this, we added a new flag that changes the behavior so that no checks are performed and the change is simply applied. What does this mean in daily usage? Under the default behavior, when a user is removed from a chat room, access to that room is also removed from the client immediately. With the flag set, businesses trade that immediacy for performance and for the prevention of SQL lock-ups: access to the chat room may not be revoked until the client signs out and signs back in.

RESOLUTION:

Connect to the MGC database in your environment and retrieve the contents of the dbo.tblConfig table. It should look like this:

configLabel configPoolID configContent

pool 9CFB3493-89B2-447C-8487-9C19C13E1694 <?xml version="1.0"....

We are interested in the ConfigContent. It should look like

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<configuration version="1">
  <pool>
    <db>
      <retry_ms>600000</retry_ms>
      <lossdetection_ms>120000</lossdetection_ms>
    </db>
    <channelserver>
      <ADConnect>
        <GlobalCatalog>
          <findgc>True</findgc>
          <host></host>
          <adsynchfreq>480</adsynchfreq>
        </GlobalCatalog>
      </ADConnect>
      <adupdate>
        <batchsize>5000</batchsize>
        <sleeptime_ms>10000</sleeptime_ms>
        <accesspoll_ms>604800000</accesspoll_ms>
        <accesspoll_size>50</accesspoll_size>
        <accesspoll_enabled>False</accesspoll_enabled>
      </adupdate>
      <serverbackchat>
        <cache_size_limit>2500000</cache_size_limit>
      </serverbackchat>
      <watermarks>
        <batch_message_count_max>20</batch_message_count_max>
        <async_send_max>100</async_send_max>
        <async_send_max_lo>90</async_send_max_lo>
        <outbound_queue_max>100000</outbound_queue_max>
        <outbound_queue_max_lo>90000</outbound_queue_max_lo>
        <low_priority_queue_max>500</low_priority_queue_max>
        <inbound_queue_size_max>10000</inbound_queue_size_max>
        <channelinvitemax>50</channelinvitemax>
      </watermarks>
    </channelserver>
    <webservice>
      <maxchunksizeinkb>1024</maxchunksizeinkb>
    </webservice>
  </pool>
</configuration>

We need to edit the contents to insert the line <notify_users>0</notify_users> inside the <serverbackchat> element, immediately after <cache_size_limit>, as shown in the updated configuration below. Once this is done, we recommend restarting the Persistent Chat services.

<?xml version="1.0" encoding="utf-8" standalone="yes"?>

            <configuration version="1">

              <pool>

                <db>

                  <retry_ms>600000</retry_ms>

                  <lossdetection_ms>120000</lossdetection_ms>

                </db>

                <channelserver>

                  <ADConnect>

                    <GlobalCatalog>

                      <findgc>True</findgc>

                      <host></host>

                      <adsynchfreq>480</adsynchfreq>

                    </GlobalCatalog>

                  </ADConnect>

                  <adupdate>

                    <batchsize>5000</batchsize>

                    <sleeptime_ms>10000</sleeptime_ms>

                    <accesspoll_ms>604800000</accesspoll_ms>

                    <accesspoll_size>50</accesspoll_size>

                    <accesspoll_enabled>False</accesspoll_enabled>

                  </adupdate>

                  <serverbackchat>

                    <cache_size_limit>2500000</cache_size_limit>

                    <notify_users>0</notify_users>

                  </serverbackchat>

                  <watermarks>

                    <batch_message_count_max>20</batch_message_count_max>

                    <async_send_max>100</async_send_max>

                    <async_send_max_lo>90</async_send_max_lo>

                    <outbound_queue_max>100000</outbound_queue_max>

                    <outbound_queue_max_lo>90000</outbound_queue_max_lo>

                    <low_priority_queue_max>500</low_priority_queue_max>

                    <inbound_queue_size_max>10000</inbound_queue_size_max>

                    <channelinvitemax>50</channelinvitemax>

                  </watermarks>

                </channelserver>

                <webservice>

                    <maxchunksizeinkb>1024</maxchunksizeinkb>

                </webservice>

              </pool>

            </configuration>
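
If you prefer to script the edit rather than make it by hand, the change above can be sketched as follows. This is only an illustration using Python's standard library; the function name add_notify_users and the trimmed-down SAMPLE document are assumptions made for the example and are not part of the product. Back up the real configuration file before changing it, and note that re-serializing XML this way does not preserve the original declaration or whitespace.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the configuration shown above (assumption:
# only the elements needed to locate <serverbackchat> are included).
SAMPLE = """<configuration version="1">
  <pool>
    <channelserver>
      <serverbackchat>
        <cache_size_limit>2500000</cache_size_limit>
      </serverbackchat>
    </channelserver>
  </pool>
</configuration>"""


def add_notify_users(xml_text: str, value: str = "0") -> str:
    """Return xml_text with <notify_users> set inside <serverbackchat>.

    Hypothetical helper: adds the element if it is missing, otherwise
    updates its text, so running it twice does not create a duplicate.
    """
    root = ET.fromstring(xml_text)
    backchat = root.find("./pool/channelserver/serverbackchat")
    if backchat is None:
        raise ValueError("serverbackchat element not found")

    node = backchat.find("notify_users")
    if node is None:
        # Appended as the last child, i.e. after <cache_size_limit>.
        node = ET.SubElement(backchat, "notify_users")
    node.text = value

    return ET.tostring(root, encoding="unicode")


updated = add_notify_users(SAMPLE)
```

The result in updated contains <notify_users>0</notify_users> as the last child of <serverbackchat>, matching the manually edited file shown above. A service restart is still required after the file is changed.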

For user removals, you may be able to run Revoke-CsClientCertificate for the removed user, after which the user is signed out from all endpoints that do not use UCWA. They can then sign in again and continue using the service. Be aware that this cmdlet may disrupt any calls, conferences, or IM conversations the user is in at the time.

Please check your business requirements and the available trade-offs to decide if you want to proceed with altering the configuration. Also note that a service restart is required.
