executor: add a new artifact cache
The current artifact caching solution is based on CacheControl, a library that plugs into the requests HTTP library to provide caching transparently.
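For context, the current setup boils down to wrapping a requests session, roughly like this (the cache directory and URL are illustrative, not the actual executor code):

```python
import requests
from cachecontrol import CacheControl
from cachecontrol.caches.file_cache import FileCache

# Wrap a requests session so that responses are cached on disk and served
# transparently on subsequent GETs.
session = CacheControl(requests.Session(), cache=FileCache(".artifact_cache"))

# Fetching a large artifact goes through CacheControl's machinery, which is
# where the limitations listed below show up.
response = session.get("https://example.com/images/rootfs.img")
data = response.content
```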
However, CacheControl has the following issues:
- Concurrent requests for the same artifact on a cold cache are not de-duplicated: each one triggers its own download until the first one finishes
- High memory usage (6x the artifact size)
- Streaming resources is unreliable, since a dead connection interrupts the stream
- Cached data may be incomplete without CacheControl noticing, leading to hard-to-debug failures
In a world where a single Raspberry Pi should be able to control tens of test machines, some of which use artifacts weighing hundreds of MB (think complete OS images), this level of inefficiency and unreliability is unacceptable.
Instead, we propose an artifact cache designed with the following requirements:
- Resource-efficient: Minimize network bandwidth, RAM, and CPU usage
- Reliable: The cache should be resilient to network/download failures and automatically restart failed downloads when possible.
- Low-latency: Artifacts should be streamable transparently from the cache or from the network, without first waiting for the download to finish.
- Multi-process: Use the minimal amount of synchronisation needed to make parallel access to the cache from multiple threads or processes as fast as possible. Cache requests for the same URL coming from multiple processes should be de-duplicated and result in a single download.
- Simple to use: Provide the methods needed to either stream the artifact or get a filepath, without having to think about locking or about using it optimally (see the usage sketch after this list).
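As a rough sketch of the intended usage (ArtifactCache, stream() and get_filepath() below are placeholder names used for illustration, not necessarily the real API; check artifactcache.py for the actual one):

```python
from artifactcache import ArtifactCache  # placeholder import path

# Placeholder names and paths, for illustration only
cache = ArtifactCache(root="/var/cache/executor/artifacts")
url = "https://example.com/images/full-os-image.img"

# Stream the artifact: bytes are served from the cache when present and fetched
# from the network otherwise, without waiting for the full download to finish.
total = 0
with cache.stream(url) as artifact:
    for chunk in iter(lambda: artifact.read(1 << 20), b""):
        total += len(chunk)

# Or block until the artifact is fully cached and use it as a local file.
path = cache.get_filepath(url)
print(f"streamed {total} bytes, cached at {path}")
```

Multiple threads or processes can issue these calls for the same URL at the same time; the cache is expected to de-duplicate them into a single download.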
More details can be found in artifactcache.py :)