(Solved) Scalability

The problem

The issue is solved but still open for improvement. Suggestions for this problem can be added to the open issue.

Scalability works well for repository-service-tuf-api, since you can scale it horizontally, running multiple instances of the Server API that send all requests to the Broker.
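
For illustration, here is a minimal sketch of how a stateless API instance can hand work to the Broker, assuming Celery as the task queue; the broker URL, result backend, and task name are assumptions for this example, not the project's actual configuration:

    from uuid import uuid4

    from celery import Celery

    app = Celery(
        "rstuf_api",
        broker="amqp://guest:guest@localhost:5672",  # assumed message broker
        backend="redis://localhost:6379",            # assumed result backend
    )

    def enqueue_add_targets(payload: dict) -> str:
        # Any API replica can enqueue the request; no shared state is needed,
        # which is why the API layer scales horizontally without trouble.
        task_id = uuid4().hex
        app.send_task(
            "repository_service_tuf_worker",  # hypothetical task name
            kwargs={"action": "add_targets", "payload": payload},
            task_id=task_id,
        )
        return task_id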

Scalability for repository-service-tuf-worker, however, is not functional.

The repository workers pick up tasks randomly, but the tasks are executed in order because we use a lock.

The behavior

The problem is the process of writing the role metadata files.

For example, whenever you add a target to a delegated hash role (e.g. bins-e), you need to write a new <version>.bins-e.json, bump <version>.snapshot.json, and bump timestamp.json (a code sketch follows the diagram below).

@startuml

participant "Broker/Backend" as broker
participant "add-target" as add_target
participant "Storage Backend" as storage #Grey

broker o-> add_target: [task 01] <consumer>

add_target -> storage: loads latest bin-e.json
add_target <-- storage: 3.bin-e.json
add_target -> add_target: Add target\nBump version
add_target -> storage: writes 4.bin-e.json
note right: 4.bin-e.json\n\tfile001

add_target -> storage: loads latest Snapshot
add_target <-- storage: 41.snapshot.json
add_target -> add_target: Add <bin-e> meta\nbump version
add_target -> storage: writes 42.snapshot.json
note right: 4.bin-e.json\n\tfile001\n42.snapshot.json\n\t4.bin-e

add_target -> storage: loads Timestamp
add_target <-- storage: Timestamp.json (version 83)
add_target -> add_target: Add 42.snapshot.json
add_target -> storage: writes timestamp.json
note right: 4.bin-e.json\n\t file001\n42.snapshot.json\n\t4.bin-e\ntimestamp.json\n\t42.snapshot.json
add_target -> broker: [task 01] <publish> result

@enduml
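
The same write sequence can be expressed with python-tuf's Metadata API. The sketch below is illustrative only: file names, versions, and the local directory layout are assumptions, and re-signing each role before writing is omitted for brevity:

    from tuf.api.metadata import Metadata, MetaFile, Snapshot, TargetFile, Targets, Timestamp

    def add_target(storage_dir: str, target_path: str, local_path: str) -> None:
        # 1. Load the latest delegated hash bin (3.bin-e.json in the diagram),
        #    add the target file, and bump its version (3 -> 4).
        bin_e: Metadata[Targets] = Metadata.from_file(f"{storage_dir}/3.bin-e.json")
        bin_e.signed.targets[target_path] = TargetFile.from_file(target_path, local_path)
        bin_e.signed.version += 1
        bin_e.to_file(f"{storage_dir}/{bin_e.signed.version}.bin-e.json")

        # 2. Load the latest snapshot, point it at the new bin version, and
        #    bump snapshot itself (41 -> 42).
        snapshot: Metadata[Snapshot] = Metadata.from_file(f"{storage_dir}/41.snapshot.json")
        snapshot.signed.meta["bin-e.json"] = MetaFile(version=bin_e.signed.version)
        snapshot.signed.version += 1
        snapshot.to_file(f"{storage_dir}/{snapshot.signed.version}.snapshot.json")

        # 3. Load timestamp, point it at the new snapshot version, and bump it.
        timestamp: Metadata[Timestamp] = Metadata.from_file(f"{storage_dir}/timestamp.json")
        timestamp.signed.snapshot_meta = MetaFile(version=snapshot.signed.version)
        timestamp.signed.version += 1
        timestamp.to_file(f"{storage_dir}/timestamp.json")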

If you have hundreds or thousands of requests to add targets, you might have multiple new <version>.bins-e.json files, each followed by bumps in snapshot and timestamp. There is a risk of race conditions.

Example

@startuml

participant "Broker/Backend" as broker
participant "add-target" as add_target
participant "Storage Backend" as storage #Grey

broker o-[#Blue]> add_target: [task 01] <consumer>
add_target -[#Blue]> storage: loads latest bin-e.json
broker o-[#Green]> add_target: [task 02] <consumer>
add_target -[#Green]> storage: loads latest bin-p.json
add_target <[#Blue]-- storage: 3.bin-e.json
add_target <[#Green]-- storage: 15.bin-p.json
add_target -[#Blue]-> add_target: 3.bin-e.json\n Add target\nBump version to 4
add_target -[#Green]> add_target: 15.bin-p.json\n Add target\nBump version to 16
add_target -[#Blue]> storage: writes 4.bin-e.json
add_target -[#Green]> storage: writes 16.bin-p.json
note right: 4.bin-e.json\n\tfile001\n16.bin-p.json\n\tfile003\n\tfile005


add_target -[#Blue]> storage: loads latest Snapshot
add_target -[#Green]> storage: loads latest Snapshot

add_target <[#Blue]-- storage: 41.snapshot.json
add_target <[#Green]-- storage: 41.snapshot.json

add_target -[#Blue]> add_target: Add <bin-e> meta\nbump version
add_target -[#Green]> add_target: Add <bin-p> meta\nbump version

add_target -[#Blue]> storage: writes 42.snapshot.json
note right: 4.bin-e.json\n\t \
file001\n16.bin-p.json\n\tfile003\n\tfile005 \
\n42.snapshot.json\n\t4.bin-e
add_target -[#Green]-> storage: writes 42.snapshot.json
destroy storage
note right#FFAAAA: 4.bin-e.json\n\t \
file001\n16.bin-p.json\n\tfile003\n\tfile005 \
\n42.snapshot.json\n\t16.bin-p \
\n\t**missing 4.bin-e**

add_target -[#Blue]> storage: loads Timestamp
add_target -[#Green]> storage: loads Timestamp
add_target <[#Blue]-- storage: Timestamp.json (version 83)
add_target -[#Blue]> add_target: Add 42.snapshot.json
add_target -[#Blue]> storage: writes timestamp.json (version 84)
note right#FFAAAA: 4.bin-e.json\n\t \
file001\n16.bin-p.json\n\tfile003\n\tfile005 \
\n42.snapshot.json\n\t16.bin-p \
\n\t**missing 4.bin-e** \
\ntimestamp.json \
\n\tversion 84 \
\n\t42.snapshot

add_target -[#Blue]> broker: [task 01] <publish> result

add_target <[#Green]-- storage: Timestamp.json (version 84)
add_target -[#Green]> add_target: Add 42.snapshot.json
add_target -[#Green]> add_target: Bump version to 85
add_target -[#Green]> storage: writes timestamp.json (version 85)
note right#FFAAAA: 4.bin-e.json\n\t \
file001\n16.bin-p.json\n\tfile003\n\tfile005 \
\n42.snapshot.json\n\t16.bin-p \
\n\t**missing 4.bin-e** \
\ntimestamp.json \
\n\tversion 85 \
\n\t42.snapshot
add_target -[#Green]> broker: [task 02] <publish> result

@enduml

At one level, we optimize this by grouping all changes for the same delegated hash role, avoiding multiple writes to that role within the same task.
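
A minimal sketch of that grouping, assuming a succinct hash-bin layout where the bin is derived from the SHA-256 of the target path; the number of bins and the role naming are assumptions for the example:

    import hashlib
    from collections import defaultdict

    NUMBER_OF_BINS = 256  # assumed; gives 2-hex-digit bin names such as "bins-3e"
    PREFIX_LEN = len(f"{NUMBER_OF_BINS - 1:x}")

    def bin_for(target_path: str) -> str:
        # Map a target path to its delegated hash bin role name.
        digest = hashlib.sha256(target_path.encode()).hexdigest()
        return f"bins-{digest[:PREFIX_LEN]}"

    def group_by_bin(target_paths: list[str]) -> dict[str, list[str]]:
        # Bucket all queued targets by bin, so each bin role is loaded,
        # updated, and written once per batch instead of once per target.
        groups: dict[str, list[str]] = defaultdict(list)
        for path in target_paths:
            groups[bin_for(path)].append(path)
        return groups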

However, we still have a problem with the snapshot and timestamp. To avoid it, we use a lock system that allows only one task at a time.

The lock protects against the race condition, but it does not solve scalability: even dozens of repository-service-tuf-worker instances do not scale the metadata-writing process.
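
For illustration, a minimal sketch of serializing the snapshot/timestamp write behind a distributed lock, assuming a Redis-backed lock via redis-py; the lock name, timeouts, and the protected callable are assumptions for the example:

    from typing import Callable

    import redis

    redis_client = redis.Redis(host="localhost", port=6379)

    def publish_with_lock(write_snapshot_and_timestamp: Callable[[], None]) -> None:
        # Only one worker at a time enters this block, so two tasks can never
        # overwrite each other's <version>.snapshot.json or timestamp.json.
        # Other workers wait for the lock to be released, which is exactly why
        # adding more workers does not speed up the metadata-writing step.
        with redis_client.lock("lock_targets", timeout=60.0, blocking_timeout=5.0):
            write_snapshot_and_timestamp()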

The expected behavior

Suggestions for this problem can be added to the open issue.