Investigating File System Watcher buffer overruns

An exciting story about peculiar storage behavior causing IPDirector problems. Warning: log reading, some programming and PowerShell scripting inside!

 

Symptoms
Some of my customers have been experiencing problems with IPDirector: clips remained ‘growing’ long after they were complete, and newly exported files did not appear in Database Explorer. Customer engineers had to resort to the ‘Search for New Clips and Folders’ function, or even to re-scanning the Nearlines, to get the correct clip information displayed.

 
A few words about the SyncroDB service
In computing there are two basic ways of maintaining an up-to-date list of files on network storage:

Polling: from time to time, the application can request a list of files on the storage, compare it to its internal list of known files and identify changes that have taken place since the last survey.

Notifications: the application can sit back, relax and wait for some other entity to inform it of any changes as they happen. In the case of files these could be File Created / Changed / Deleted messages. Upon receiving such a message, the application updates its internal list of known files and takes whatever action the reported change requires.
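To make the polling approach concrete, here is a minimal sketch in Python. It is only an illustration of the general technique, not how SyncroDB implements it: the function names snapshot and diff_snapshots and the file names are made up for the example. Each survey records the size and modification time of every file, and two surveys are compared to find what was created, changed or deleted.

```python
import os
import tempfile


def snapshot(root):
    """Map each file path under root to its (size, modification time)."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # the file vanished between listing and stat
            files[path] = (info.st_size, info.st_mtime_ns)
    return files


def diff_snapshots(previous, current):
    """Return sorted (created, changed, deleted) path lists between two surveys."""
    created = sorted(set(current) - set(previous))
    deleted = sorted(set(previous) - set(current))
    changed = sorted(path for path in current
                     if path in previous and current[path] != previous[path])
    return created, changed, deleted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        clip = os.path.join(root, "clip.mxf")  # hypothetical file name
        with open(clip, "wb") as handle:
            handle.write(b"first frames")
        before = snapshot(root)

        with open(clip, "ab") as handle:  # the clip keeps growing
            handle.write(b"more frames")
        export = os.path.join(root, "export.mxf")  # hypothetical new export
        with open(export, "wb") as handle:
            handle.write(b"new export")

        created, changed, deleted = diff_snapshots(before, snapshot(root))
        print("created:", created)
        print("changed:", changed)
        print("deleted:", deleted)
```

The trade-off is visible even at this scale: polling never misses a change for long, but every survey has to walk the whole tree, which gets expensive on a large Nearline.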

The EVS IPDirector suite of applications is based on the Microsoft .NET Framework, which provides plenty of basic functions for interacting with file systems. One of its components is called File System Watcher (the FileSystemWatcher class). It allows a service (in our case, SyncroDB) to define a directory to watch and start receiving messages that reflect any changes that happen there, e.g. clips being written to the folder by an NLE like Adobe Premiere, or users manually deleting assets via Windows Explorer.
Sometimes SyncroDB has to deal with storages that don’t send notifications at all, and that’s when we select the ‘Other’ management mode in IPDirector Remote Installer. This is the implementation of the aforementioned ‘Polling’ approach.
When we do trust our storage to notify IPDirector of all the changes within a Nearline defined in Remote Installer, we choose the ‘MS Windows’ management mode. Some Windows-compatible shares are based on the SMB protocol, and they also send such notifications.
It has to be noted that the implementation of the SMB protocol in network storage products is entirely up to their developers, and because of that there can be significant differences in how storages behave.

 
Significant differences
Let’s head back to where the article started: “IPDirector has stopped reflecting changes that took place in the Nearline”. Care to guess what could be the problem here? 🙂
Upon examining the SyncroDB logs, I spotted the following entry:

Exception\DirectoryManager_4373.log 397 2016-04-11 16:13:04,9631493 – System.IO.InternalBufferOverflowException: Too many changes at once in directory:\\generic.domain.tv\Customer\Storage\Entertainment\HiRes\.

A quick search through the Microsoft documentation reveals that ‘InternalBufferOverflowException’ is reported to the application (SyncroDB in our case) when the internal buffer of the .NET Framework File System Watcher component overflows with notifications. The maximum buffer size that a developer can set is 64 KB. If the component receives more notifications from the storage than the buffer can hold, some of them are lost and the application remains unaware of some of the changes.
In our case that would explain why IPDirector remains unaware of some newly created files and has not been notified that some files have stopped growing and are complete.
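A rough back-of-the-envelope calculation shows how quickly such a buffer can fill. The sketch below assumes that each notification is stored like a Win32 FILE_NOTIFY_INFORMATION record: three 4-byte fields followed by the file name in UTF-16, padded to a 4-byte boundary. Whether FileSystemWatcher packs its buffer exactly this way for an SMB share is an assumption, and the file name is made up, so treat the numbers as an order-of-magnitude estimate only.

```python
HEADER_BYTES = 12  # NextEntryOffset, Action, FileNameLength: three 4-byte fields


def record_size(relative_name):
    """Approximate size in bytes of one notification record for a file name."""
    name_bytes = len(relative_name.encode("utf-16-le"))  # 2 bytes per character here
    unpadded = HEADER_BYTES + name_bytes
    return (unpadded + 3) // 4 * 4  # round up to the next 4-byte boundary


def records_per_buffer(relative_name, buffer_bytes=64 * 1024):
    """How many records of this size fit into a buffer of the given size."""
    return buffer_bytes // record_size(relative_name)


if __name__ == "__main__":
    name = "clip0001.mxf"  # hypothetical file name, relative to the watched folder
    print(record_size(name))               # 36 bytes per notification
    print(records_per_buffer(name))        # 1820 notifications in 64 KB
    print(records_per_buffer(name, 8192))  # 227 notifications in 8 KB
```

Under these assumptions, a storage that emits a burst of a couple of thousand ‘Changed’ notifications for a single growing file can exhaust even the largest buffer before the application has drained it, and longer paths in subfolders reduce the capacity further.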

 

“We need to go deeper”
It was time to take this up with the storage manufacturer. However, after a brief remote session and a demonstration of the problem, my customer received a response that this needed further investigation and that it was unclear whether the problem lay within the EVS software or within the storage itself. I felt obliged to help both parties understand better where the problem lay.
To eliminate EVS from the equation, I took an open-source template for using the Windows File System Watcher (http://www.codeproject.com/Articles/26528/C-Application-to-Watch-a-File-or-Directory-using-F), added an error handler and set the internal buffer of the File System Watcher component to the highest value allowed by Windows (64 KB):

[Screenshot: File System Watcher .NET fix]

Then I exported two files from the archive system to the Nearline located on the network storage in question:

[Screenshot: FSW running]

The storage was still overflowing the internal buffer of the FileSystemWatcher component with too many notifications.

Some may argue that a piece of C# code freely available on the internet is not guaranteed to be well written. So let’s go to an even more basic level. Just a few lines of Microsoft PowerShell will also show that the number of notifications sent by the storage is too large.

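# Watch a UNC path and print every Changed notification and any watcher error
$watcher = New-Object System.IO.FileSystemWatcher
$watcher.Path = "\\UNCpath\folder"
$watcher.IncludeSubdirectories = $true
# 64 KB expressed in bytes, the largest buffer size discussed above
$watcher.InternalBufferSize = 65536
$changed = Register-ObjectEvent $watcher "Changed" -Action { Write-Host "Changed: $($eventArgs.FullPath)" }
# $Error is a built-in automatic variable, so the subscription gets its own name
$errorSub = Register-ObjectEvent $watcher "Error" -Action { Write-Host "Error: $($eventArgs.GetException())" }
# Start raising events only once both handlers are registered
$watcher.EnableRaisingEvents = $true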

[Screenshot: PowerShell output]

Even the simple PowerShell script given above shows the same behavior: the network storage is generating too many notifications.

Now the question is – is the problem specific to this SAN? What if we write to a slightly different storage?

I wrote the same file to an alternative storage; the result is visible in the window on the left. The window on the right shows exactly the same file being written to the problematic network share:


[Screenshot: storage vs storage]

Of course, it could be that the application writing the file is at fault, for instance by writing in small batches that each trigger a change notification. It is also possible that the other storage (pictured on the left) has a different notification setting enabled.

Still, the investigation above proved to be a good starting point for the conversation with the network storage developer.
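To put numbers on a side-by-side comparison like this one, the console output captured from each watcher could be tallied per file. The sketch below assumes the output uses the ‘Changed: <path>’ line format printed by the PowerShell script; the sample lines and paths are invented purely for illustration and are not taken from the real captures.

```python
from collections import Counter


def count_events(lines, prefix="Changed: "):
    """Count how many notification lines were logged for each path."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith(prefix):
            counts[line[len(prefix):]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical captured output, not real data from either storage
    captured = [
        r"Changed: \\UNCpath\folder\clip0001.mxf",
        r"Changed: \\UNCpath\folder\clip0001.mxf",
        r"Changed: \\UNCpath\folder\clip0002.mxf",
        "Error: watcher buffer overflow",
    ]
    for path, hits in count_events(captured).most_common():
        print(hits, path)
```

Running the same tally on the output from both storages for the same written file would show directly how many more notifications one of them sends.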

EDIT: Soon after our investigation, the vendor kindly acknowledged this issue and stated that they were working on a patch.

EDIT2: The network storage manufacturer has now released new firmware with a notification throttling option.