(Solved) Scalability
The problem
The issue is solved, but the solution is still open for improvement. Suggestions for this problem can be added to the open issue.
Scalability works well for repository-service-tuf-api, since you can scale it horizontally: multiple instances of the API server send all requests to the Broker.
Scalability for repository-service-tuf-worker is limited. The workers pick up tasks in arbitrary order, but the tasks are executed one at a time because we use a lock.
The behavior
The problem is the process of writing the role metadata files. For example, whenever you add a target to a delegated hash bin role (i.e. bins-e), you need to write a new <version>.bins-e.json, bump the <version>.snapshot.json, and bump the <version>.timestamp. If you have hundreds or thousands of requests to add targets, you might have multiple new <version>.bins-e.json files, each followed by bumps in snapshot and timestamp. There is a risk of race conditions.
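The chain of bumps can be sketched as follows. This is an illustrative model, not the actual RSTUF code: the role names follow the text above, while the in-memory version counters and the add_target helper are hypothetical.

```python
# Illustrative sketch: adding a target to a delegated hash bin forces a
# chain of version bumps all the way up to timestamp.
versions = {"bins-e": 1, "snapshot": 1, "timestamp": 1}

def add_target(path: str) -> list[str]:
    """Record a new target and return the metadata files that must be written."""
    versions["bins-e"] += 1      # new <version>.bins-e.json
    versions["snapshot"] += 1    # snapshot must reference the new bin version
    versions["timestamp"] += 1   # timestamp must reference the new snapshot
    return [
        f"{versions['bins-e']}.bins-e.json",
        f"{versions['snapshot']}.snapshot.json",
        f"{versions['timestamp']}.timestamp",
    ]

# One request produces three writes; many concurrent requests racing on
# these shared counters is where the inconsistency can appear.
print(add_target("pkg-1.0.tar.gz"))
```

Every single target addition touches the same two top-level files (snapshot and timestamp), which is why concurrent writers conflict even when they touch different bins.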
Example
At one level, we optimize this by grouping all changes for the same delegated hash bin role, avoiding multiple writes of that role in the same task. However, we still have a problem with the snapshot and timestamp.
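The grouping optimization can be sketched like this. The helper names (bin_for, group_targets) and the single-character bin prefix are assumptions for illustration, not the RSTUF API:

```python
# Illustrative sketch: group queued target additions by their delegated
# hash bin, so each bin role is written once per batch rather than once
# per target.
import hashlib
from collections import defaultdict

def bin_for(path: str, prefix_len: int = 1) -> str:
    """Map a target path to a hash bin role name (e.g. 'bins-e')."""
    digest = hashlib.sha256(path.encode()).hexdigest()
    return f"bins-{digest[:prefix_len]}"

def group_targets(paths: list[str]) -> dict[str, list[str]]:
    grouped: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        grouped[bin_for(path)].append(path)
    return dict(grouped)
```

Many requests collapse into one write per affected bin role, but the snapshot and timestamp bumps are still shared across all bins, so the contention on those two roles remains.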
To avoid the problem, we use a lock system that runs one task at a time. The lock protects against the race condition but does not solve the scalability: even with dozens of repository-service-tuf-worker instances, the metadata-writing process does not scale.
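The effect of the lock can be sketched with plain threads. This is a minimal model, not the actual RSTUF lock (which is held in a shared backend service rather than in-process): it shows that the shared lock keeps the version bumps correct while forcing them to run sequentially.

```python
# Illustrative sketch: every worker must take the same lock before
# touching snapshot and timestamp, so the critical section runs one
# task at a time regardless of how many workers exist.
import threading

write_lock = threading.Lock()
versions = {"snapshot": 0, "timestamp": 0}

def worker_task() -> None:
    with write_lock:
        # Serialized section: read-modify-write of the shared versions.
        versions["snapshot"] += 1
        versions["timestamp"] += 1

threads = [threading.Thread(target=worker_task) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Correct (no lost updates), but fully serialized: 50 workers,
# 50 sequential bumps instead of 50 parallel ones.
assert versions == {"snapshot": 50, "timestamp": 50}
```

Adding more workers only lengthens the queue at the lock; the throughput of the metadata-writing step stays that of a single worker.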
The expected behavior
Suggestions for this problem can be added to the open issue.