dzn: Refactor the event logic to avoid splitting cmdlists
Here's an attempt at reducing the cost of event set/wait (as suggested by @jenatali a while ago). Intra-cmdlist dependencies are now implemented with a pipeline flush (AKA NULL UAV barrier), since that's the best we can do with the standard barrier API. Inter-cmdlist and host -> device dependencies are still implemented with ID3D12Fence signals/waits, but we no longer split the cmdlist. This means an event might take a bit longer to expose its new state to the host, but it shouldn't impact inter-cmdlist dependencies, since those are serialized at the ExecuteCommandLists() level anyway.
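To make the split between the two paths concrete, here is a rough, self-contained model of the decision described above. It is only a sketch: `CmdList`, `set_event`, `wait_event` and `submit` are hypothetical names made up for illustration, not dzn's actual structures, and the strings stand in for the real D3D12 calls. It also assumes the fence signal for a set event is queued after the whole cmdlist has executed, which is one way to read "takes a bit longer to expose its new state to the host".

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one command list; not dzn's real data structures.
struct CmdList {
    std::vector<std::string> ops;      // commands recorded in the list, in order
    std::set<int> events_set;          // events set while recording this list
    std::set<int> waits_before_exec;   // fence waits queued before execution
    std::set<int> signals_after_exec;  // fence signals queued after execution

    void dispatch(const std::string &name) { ops.push_back("dispatch:" + name); }

    void set_event(int ev) {
        assert(ev >= 0);  // event ids are non-negative in this sketch
        events_set.insert(ev);
        // Assumption: the new state becomes visible outside the list only
        // once the whole list has executed, so no split is needed here.
        signals_after_exec.insert(ev);
    }

    void wait_event(int ev) {
        assert(ev >= 0);
        if (events_set.count(ev)) {
            // Intra-cmdlist dependency: stand-in for a NULL UAV barrier.
            ops.push_back("flush");
        } else {
            // Inter-cmdlist or host -> device dependency: fence wait
            // queued ahead of the list instead of splitting it.
            waits_before_exec.insert(ev);
        }
    }
};

// Queue-level operations generated when the list is submitted.
std::vector<std::string> submit(const CmdList &cl) {
    std::vector<std::string> queue_ops;
    for (int ev : cl.waits_before_exec)
        queue_ops.push_back("fence_wait:" + std::to_string(ev));
    queue_ops.push_back("execute(" + std::to_string(cl.ops.size()) + " ops)");
    for (int ev : cl.signals_after_exec)
        queue_ops.push_back("fence_signal:" + std::to_string(ev));
    return queue_ops;
}
```

In this model, a set followed by a wait on the same event inside one list only adds a `flush` entry to that list, while a wait on an event set elsewhere adds nothing to the list itself and shows up as a fence wait at submit time.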
(based on !54 (merged))
/cc @lfrb