(Solved) Scalability

The problem

The issue is solved but still open for improvement. Suggestions for this problem can be added to the open issue.

Scalability works well for repository-service-tuf-api, since you can scale it horizontally, running multiple instances of the Server API that send all requests to the Broker.
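
For illustration, here is a minimal sketch of how a stateless API instance can hand work to the Broker, assuming Celery as the task queue; the broker URL, result backend, and task name are assumptions for this example, not the project's actual configuration:

    from uuid import uuid4

    from celery import Celery

    app = Celery(
        "rstuf_api",
        broker="amqp://guest:guest@localhost:5672",  # assumed message broker
        backend="redis://localhost:6379",            # assumed result backend
    )

    def enqueue_add_targets(payload: dict) -> str:
        # Any API replica can enqueue the request; no shared state is needed,
        # which is why the API layer scales horizontally without trouble.
        task_id = uuid4().hex
        app.send_task(
            "repository_service_tuf_worker",  # hypothetical task name
            kwargs={"action": "add_targets", "payload": payload},
            task_id=task_id,
        )
        return task_id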

Scalability for repository-service-tuf-worker, however, is not functional.

The repository workers pick up tasks randomly, but the tasks are executed in order because we use a lock.

The behavior

The problem is the process of writing the role metadata files.

For example, whenever you add a target to a delegated hash role (e.g. bins-e), you need to write a new <version>.bins-e.json, bump <version>.snapshot.json, and bump timestamp.json (a code sketch follows the diagram below).

@startuml

participant "Broker/Backend" as broker
participant "add-target" as add_target
participant "Storage Backend" as storage #Grey

broker o-> add_target: [task 01] <consumer>

add_target -> storage: loads latest bin-e.json
add_target <-- storage: 3.bin-e.json
add_target -> add_target: Add target\nBump version
add_target -> storage: writes 4.bin-e.json
note right: 4.bin-e.json\n\tfile001

add_target -> storage: loads latest Snapshot
add_target <-- storage: 41.snapshot.json
add_target -> add_target: Add <bin-e> meta\nbump version
add_target -> storage: writes 42.snapshot.json
note right: 4.bin-e.json\n\tfile001\n42.snapshot.json\n\t4.bin-e

add_target -> storage: loads Timestamp
add_target <-- storage: Timestamp.json (version 83)
add_target -> add_target: Add 42.snapshot.json
add_target -> storage: writes timestamp.json
note right: 4.bin-e.json\n\t file001\n42.snapshot.json\n\t4.bin-e\ntimestamp.json\n\t42.snapshot.json
add_target -> broker: [task 01] <publish> result

@enduml
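
The same write sequence can be expressed with python-tuf's Metadata API. The sketch below is illustrative only: file names, versions, and the local directory layout are assumptions, and re-signing each role before writing is omitted for brevity:

    from tuf.api.metadata import Metadata, MetaFile, Snapshot, TargetFile, Targets, Timestamp

    def add_target(storage_dir: str, target_path: str, local_path: str) -> None:
        # 1. Load the latest delegated hash bin (3.bin-e.json in the diagram),
        #    add the target file, and bump its version (3 -> 4).
        bin_e: Metadata[Targets] = Metadata.from_file(f"{storage_dir}/3.bin-e.json")
        bin_e.signed.targets[target_path] = TargetFile.from_file(target_path, local_path)
        bin_e.signed.version += 1
        bin_e.to_file(f"{storage_dir}/{bin_e.signed.version}.bin-e.json")

        # 2. Load the latest snapshot, point it at the new bin version, and
        #    bump snapshot itself (41 -> 42).
        snapshot: Metadata[Snapshot] = Metadata.from_file(f"{storage_dir}/41.snapshot.json")
        snapshot.signed.meta["bin-e.json"] = MetaFile(version=bin_e.signed.version)
        snapshot.signed.version += 1
        snapshot.to_file(f"{storage_dir}/{snapshot.signed.version}.snapshot.json")

        # 3. Load timestamp, point it at the new snapshot version, and bump it.
        timestamp: Metadata[Timestamp] = Metadata.from_file(f"{storage_dir}/timestamp.json")
        timestamp.signed.snapshot_meta = MetaFile(version=snapshot.signed.version)
        timestamp.signed.version += 1
        timestamp.to_file(f"{storage_dir}/timestamp.json")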

If you have hundreds or thousands of requests to add targets, you might have multiple new <version>.bins-e.json files, each followed by bumps in snapshot and timestamp. There is a risk of race conditions.

Example

@startuml

participant "Broker/Backend" as broker
participant "add-target" as add_target
participant "Storage Backend" as storage #Grey

broker o-[#Blue]> add_target: [task 01] <consumer>
add_target -[#Blue]> storage: loads latest bin-e.json
broker o-[#Green]> add_target: [task 02] <consumer>
add_target -[#Green]> storage: loads latest bin-p.json
add_target <[#Blue]-- storage: 3.bin-e.json
add_target <[#Green]-- storage: 15.bin-p.json
add_target -[#Blue]-> add_target: 3.bin-e.json\n Add target\nBump version to 4
add_target -[#Green]> add_target: 15.bin-p.json\n Add target\nBump version to 16
add_target -[#Blue]> storage: writes 4.bin-e.json
add_target -[#Green]> storage: writes 16.bin-p.json
note right: 4.bin-e.json\n\tfile001\n16.bin-p.json\n\tfile003\n\tfile005


add_target -[#Blue]> storage: loads latest Snapshot
add_target -[#Green]> storage: loads latest Snapshot

add_target <[#Blue]-- storage: 41.snapshot.json
add_target <[#Green]-- storage: 41.snapshot.json

add_target -[#Blue]> add_target: Add <bin-e> meta\nbump version
add_target -[#Green]> add_target: Add <bin-p> meta\nbump version

add_target -[#Blue]> storage: writes 42.snapshot.json
note right: 4.bin-e.json\n\t \
file001\n16.bin-p.json\n\tfile003\n\tfile005 \
\n42.snapshot.json\n\t4.bin-e
add_target -[#Green]-> storage: writes 42.snapshot.json
destroy storage
note right#FFAAAA: 4.bin-e.json\n\t \
file001\n16.bin-p.json\n\tfile003\n\tfile005 \
\n42.snapshot.json\n\t16.bin-p \
\n\t**missing 4.bin-e**

add_target -[#Blue]> storage: loads Timestamp
add_target -[#Green]> storage: loads Timestamp
add_target <[#Blue]-- storage: Timestamp.json (version 83)
add_target -[#Blue]> add_target: Add 42.snapshot.json
add_target -[#Blue]> storage: writes timestamp.json (version 84)
note right#FFAAAA: 4.bin-e.json\n\t \
file001\n16.bin-p.json\n\tfile003\n\tfile005 \
\n42.snapshot.json\n\t16.bin-p \
\n\t**missing 4.bin-e** \
\ntimestamp.json \
\n\tversion 84 \
\n\t42.snapshot

add_target -[#Blue]> broker: [task 01] <publish> result

add_target <[#Green]-- storage: Timestamp.json (version 84)
add_target -[#Green]> add_target: Add 42.snapshot.json
add_target -[#Green]> add_target: Bump version to 85
add_target -[#Green]> storage: writes timestamp.json (version 85)
note right#FFAAAA: 4.bin-e.json\n\t \
file001\n16.bin-p.json\n\tfile003\n\tfile005 \
\n42.snapshot.json\n\t16.bin-p \
\n\t**missing 4.bin-e** \
\ntimestamp.json \
\n\tversion 85 \
\n\t42.snapshot
add_target -[#Green]> broker: [task 02] <publish> result

@enduml

At one level, we optimize this by grouping all changes for the same delegated hash role, avoiding multiple writes to that role within the same task.
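
A minimal sketch of that grouping, assuming a succinct hash-bin layout where the bin is derived from the SHA-256 of the target path; the number of bins and the role naming are assumptions for the example:

    import hashlib
    from collections import defaultdict

    NUMBER_OF_BINS = 256  # assumed; gives 2-hex-digit bin names such as "bins-3e"
    PREFIX_LEN = len(f"{NUMBER_OF_BINS - 1:x}")

    def bin_for(target_path: str) -> str:
        # Map a target path to its delegated hash bin role name.
        digest = hashlib.sha256(target_path.encode()).hexdigest()
        return f"bins-{digest[:PREFIX_LEN]}"

    def group_by_bin(target_paths: list[str]) -> dict[str, list[str]]:
        # Bucket all queued targets by bin, so each bin role is loaded,
        # updated, and written once per batch instead of once per target.
        groups: dict[str, list[str]] = defaultdict(list)
        for path in target_paths:
            groups[bin_for(path)].append(path)
        return groups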

However, we still have a problem with the snapshot and timestamp. To avoid it, we use a lock system that allows only one task at a time.

The lock protects against the race condition, but it does not solve scalability: even dozens of repository-service-tuf-worker instances do not scale the metadata-writing process.
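
For illustration, a minimal sketch of serializing the snapshot/timestamp write behind a distributed lock, assuming a Redis-backed lock via redis-py; the lock name, timeouts, and the protected callable are assumptions for the example:

    from typing import Callable

    import redis

    redis_client = redis.Redis(host="localhost", port=6379)

    def publish_with_lock(write_snapshot_and_timestamp: Callable[[], None]) -> None:
        # Only one worker at a time enters this block, so two tasks can never
        # overwrite each other's <version>.snapshot.json or timestamp.json.
        # Other workers wait for the lock to be released, which is exactly why
        # adding more workers does not speed up the metadata-writing step.
        with redis_client.lock("lock_targets", timeout=60.0, blocking_timeout=5.0):
            write_snapshot_and_timestamp()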

The expected behavior

Suggestions for this problem can be added to the open issue.