Admission control system
Blocking I/O and non-blocking I/O clients
Admission control system: a system with load shedding and rate limiting -> it sends an exception back to the client -> HTTP 429 (Too Many Requests) and 503 (Service Unavailable)
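For illustration, a minimal sketch of load shedding with a concurrency limit; the AdmissionController class, the Runnable-based handler and the limit of 1000 are assumptions for the example, not from the notes:
java
import java.util.concurrent.Semaphore;

// hypothetical sketch: reject work once the server is already handling the maximum number of requests
class AdmissionController {
    private final Semaphore permits = new Semaphore(1000); // assumed limit; tune to real capacity

    int handle(Runnable request) {
        if (!permits.tryAcquire()) {
            return 429; // Too Many Requests: shed load instead of queueing indefinitely
        }
        try {
            request.run();
            return 200;
        } finally {
            permits.release();
        }
    }
}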
Clients should avoid immediate retries -> retrying with exponential backoff and jitter is better
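A minimal client-side sketch under assumed values (MAX_ATTEMPTS, BASE_DELAY_MS, and a caller that returns the HTTP status code): it retries only on 429/503 and sleeps for a random "full jitter" delay whose cap grows exponentially:
java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.IntSupplier;

// hypothetical sketch: retry on 429/503 with exponential backoff and full jitter
class RetryingClient {
    private static final int MAX_ATTEMPTS = 5;         // assumed values, not from the notes
    private static final long BASE_DELAY_MS = 100;
    private static final long MAX_DELAY_MS = 10_000;

    int callWithRetry(IntSupplier call) throws InterruptedException {
        for (int attempt = 0; ; attempt++) {
            int status = call.getAsInt();               // the call returns an HTTP status code
            boolean retryable = status == 429 || status == 503;
            if (!retryable || attempt == MAX_ATTEMPTS - 1) {
                return status;                          // success, non-retryable error, or out of attempts
            }
            long cap = Math.min(MAX_DELAY_MS, BASE_DELAY_MS << attempt); // exponential cap
            long sleepMs = ThreadLocalRandom.current().nextLong(cap + 1); // full jitter: [0, cap]
            Thread.sleep(sleepMs);
        }
    }
}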
What if servers are not elastic and the problem is not transient?
Blocking / non-blocking client (sync and async)
Pros
Difference: how they handle concurrent requests and threads
How many concurrent requests can a client generate? An example
Nowadays the industry is shifting to non-blocking; a blocking server typically goes together with blocking clients
No matter which model is used, a server can suffer from this issue: when incoming requests exceed what it can process and send out, in-flight requests accumulate -> exhausting its resources
One way to resolve this is to stop sending requests for a while, to allow the server to catch up -> circuit breaker pattern
Circuit breaker finite-state machine
Important considerations about the circuit breaker pattern
When the client receives failed responses from the server, it counts them; when the count reaches a limit, it stops calling the server for a while
There are open-source libraries for this -> Resilience4j or Polly -> you specify three things
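A rough idea of such a configuration using Resilience4j's CircuitBreakerConfig builder (method names may differ between library versions; the threshold values and the callBackend() helper are made up, and the three knobs shown are typical examples rather than necessarily the exact three the notes refer to):
java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import java.time.Duration;

CircuitBreakerConfig config = CircuitBreakerConfig.custom()
        .failureRateThreshold(50)                         // 1) % of failed calls that trips the breaker
        .waitDurationInOpenState(Duration.ofSeconds(30))  // 2) how long to stay open before probing
        .permittedNumberOfCallsInHalfOpenState(5)         // 3) trial calls allowed while half-open
        .build();

CircuitBreaker breaker = CircuitBreakerRegistry.of(config).circuitBreaker("backend");

// wraps the remote call; fails fast while the circuit is open
String response = breaker.executeSupplier(() -> callBackend());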
A state machine that explains the state transitions
If, while in half-open, some other exception occurs, it remains half-open
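A simplified, hand-rolled sketch of that state machine (not the internals of any library): in this version any failure during half-open re-opens the circuit, whereas real libraries can be configured to ignore certain exception types, which leaves the state unchanged:
java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// illustrative closed -> open -> half-open state machine
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures before opening
    private final Duration openTimeout;   // how long to stay open before a trial call
    private State state = State.CLOSED;
    private int failureCount = 0;
    private Instant openedAt;

    SimpleCircuitBreaker(int failureThreshold, Duration openTimeout) {
        this.failureThreshold = failureThreshold;
        this.openTimeout = openTimeout;
    }

    synchronized <T> T call(Supplier<T> request) {
        if (state == State.OPEN) {
            if (Instant.now().isAfter(openedAt.plus(openTimeout))) {
                state = State.HALF_OPEN;   // timeout elapsed: allow a trial call
            } else {
                throw new IllegalStateException("circuit is open, failing fast");
            }
        }
        try {
            T result = request.get();
            state = State.CLOSED;          // trial (or normal) call succeeded
            failureCount = 0;
            return result;
        } catch (RuntimeException e) {
            failureCount++;
            if (state == State.HALF_OPEN || failureCount >= failureThreshold) {
                state = State.OPEN;        // failed trial call or too many failures: open again
                openedAt = Instant.now();
            }
            throw e;
        }
    }
}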
Things to consider
Problems with slow services (chain reactions, cascading failures) and ways to solve them
Bad service
It's better to fail fast than to fail slow. Fail immediately and visibly -> applies to both distributed systems and OOD
e.g.
Object initialization: throw an exception if the object cannot be initialized completely
java
import java.util.Objects;

public class SomeClass {
    private final String username; // immutable: assigned exactly once, in the constructor

    public SomeClass(@NotNull String username) { // @NotNull from your annotation library of choice
        // fail fast: refuse to construct a partially initialized object
        this.username = Objects.requireNonNull(username, "username must not be null");
    }
    ...
}
Preconditions: implement preconditions for a function's input parameters:
java
// Preconditions.checkArgument comes from Guava (com.google.common.base.Preconditions)
public static double sqrt(double value) {
    Preconditions.checkArgument(value >= 0.0, "value must be non-negative: %s", value);
    ...
}
Configuration validation: read properties from the configuration file; fail fast rather than silently falling back to default values
java
public int maxQueueSize() {
    String property = Config.getProperty("maxQueueSize");
    if (property == null) {
        // fail fast at startup instead of silently falling back to a default
        throw new IllegalStateException("...");
    }
    ...
}
Request validation: return an exception back to the client; don't substitute a value and continue handling the request
java
private void validateRequestParameter(String param) {
if (param == null) {
throw new IllegalArgumentException("...");
}
...
}
Slow services kill themselves along with their servers, e.g.
When a single consumer starts to slow down, all the sender threads are impacted; the notification queue is drained more slowly and fills up with messages, so producers slow down and messages can no longer be pushed to, or pulled from, the queue (back pressure).
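A sketch of the queue side of that example, assuming a hypothetical NotificationQueue and a made-up capacity of 10,000: because the queue is bounded, a slow consumer letting it fill up pushes back on producers instead of growing memory without limit:
java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// hypothetical sketch: a bounded notification queue that applies back pressure to producers
class NotificationQueue {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000); // bounded, not unbounded

    // producer side: refuse (or slow down) instead of accumulating messages forever
    boolean publish(String message) throws InterruptedException {
        return queue.offer(message, 100, TimeUnit.MILLISECONDS); // false -> caller must back off or shed
    }

    // consumer side: a slow consumer is what makes the queue fill up and trigger the back pressure
    String take() throws InterruptedException {
        return queue.take();
    }
}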
Cascading failure: the failure of one component causes other, different parts of the system to fail
Chain reaction: or the server identifies the slow consumer and stops sending it messages (e.g., via a circuit breaker) -> this puts more load on the other consumers and slows them down too (failure spreads among components of the same type)
For cascading failures, clients should protect themselves -> convert slow calls into fast failures by identifying and isolating bad dependencies.
How to implement
Partition resources into isolated groups of limited size, to isolate the impact of failed parts from the other, healthy parts.
In the example above, instead of a single thread pool we can have a separate thread pool per consumer, each with a limited number of threads; one consumer (and its pool) slowing down will not impact the others (see the sketch after this list).
More examples -> this pattern is used in services that have many dependencies
Finding the right group limits can be hard -> these limits need to be revised from time to time -> load testing
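A sketch of the per-consumer thread pools under assumed names (PerConsumerSender, consumerA/consumerB) and an assumed pool size of 10: a slow consumer can only tie up the threads of its own pool, so deliveries to the other consumers stay fast:
java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// hypothetical sketch: one small, fixed-size thread pool per consumer (bulkhead-style isolation)
class PerConsumerSender {
    private final Map<String, ExecutorService> pools = Map.of(
            "consumerA", Executors.newFixedThreadPool(10),
            "consumerB", Executors.newFixedThreadPool(10));

    void send(String consumer, Runnable delivery) {
        // a slow consumer exhausts only its own 10 threads; the other pools keep working
        pools.get(consumer).submit(delivery);
    }
}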
How to implement
Note