A number of tests assume that there will only be a single change set. This assumption only holds if everything between start() and stopAfter() happens within a single polling interval. If things are running slowly and making the changes to the file system takes longer than the polling interval, the watcher will produce multiple change sets. We could increase the polling interval to reduce the chance of the changes taking longer than one interval, but that would make the tests take longer to run. Alternatively, we could make the tests tolerate multiple change sets by combining them and asserting that, across the n change sets, only the expected changes were detected. The latter is more robust and also avoids slowing the tests down, so it's my preferred option.
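A minimal sketch of the combining approach, written in Java on the assumption that start() and stopAfter() belong to a Java file system watcher. The `Change` record and the `combine` helper are hypothetical stand-ins for illustration, not the watcher's real API. The idea is that the test flattens however many change sets were reported into one set and then asserts equality with the expected changes, which checks both that every expected change was seen and that nothing else was.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ChangeSetMerging {

    // Hypothetical stand-in for a single detected change: a path plus the kind of change.
    record Change(String path, String type) {}

    // Hypothetical helper: merges every change set reported between start() and stopAfter()
    // into one set, so assertions don't depend on how many polling intervals the changes spanned.
    // Identical changes reported in more than one set collapse into a single entry.
    static Set<Change> combine(List<Set<Change>> changeSets) {
        Set<Change> combined = new LinkedHashSet<>();
        for (Set<Change> changeSet : changeSets) {
            combined.addAll(changeSet);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Simulates a slow run where the watcher split the changes across two change sets.
        List<Set<Change>> changeSets = new ArrayList<>();
        changeSets.add(Set.of(new Change("a.txt", "ADD")));
        changeSets.add(Set.of(new Change("b.txt", "MODIFY")));

        // Equality checks both directions: all expected changes present, and no unexpected ones.
        Set<Change> expected = Set.of(new Change("a.txt", "ADD"), new Change("b.txt", "MODIFY"));
        Set<Change> actual = combine(changeSets);
        if (!actual.equals(expected)) {
            throw new AssertionError("Unexpected changes: " + actual);
        }
        System.out.println("Detected exactly the expected changes: " + actual.size());
    }
}
```

The same assertion passes whether the watcher reports one change set or several, which is what lets the tests keep a short polling interval.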