Everything is unstable. You cannot assume that every external service call will succeed on the first attempt - the load balancer might be removing a failing host, the database might be mid-patch, or the Ops person might have tripped over the network cable and quickly plugged it back in, hoping the boss didn't notice. If you make an outbound call, you probably need an easy way to retry a couple of times before giving up and raising an exception.

As applications get more distributed, as the use of microservices increases, and as the quality of SaaS products with APIs improves, the applications we develop become ever more reliant on external calls - usually HTTP requests. What is often forgotten is that this naturally makes applications less stable.

Do it the same, but better.

In many situations, when an outbound HTTP request or SQL query fails, the simplest thing to do is just make the call again. If it succeeds then great, no need to page the on-call DevOps person! If it keeps failing, at some point you need to give up trying and throw the error.
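Written out by hand, that inline retry might look something like the sketch below. The `fetchProduct` delegate is a stand-in for the real outbound call (e.g. `() => httpClient.Get(...)`), here simulating two transient failures so the loop can be seen working - note how the retry plumbing swamps the one line of real work:

```csharp
using System;
using System.Threading;

// Fake flaky call standing in for the real one, e.g.
// () => httpClient.Get("https://example.com/api/products/1")
var calls = 0;
Func<string> fetchProduct = () =>
{
    calls++;
    if (calls < 3) throw new Exception("service unavailable"); // two transient failures
    return "{ \"id\": 1 }";
};

string product = null;
var attempts = 0;
const int maxAttempts = 3;
while (true)
{
    try
    {
        attempts++;
        product = fetchProduct(); // the one line of real work
        break; // success - stop retrying
    }
    catch (Exception)
    {
        if (attempts >= maxAttempts)
            throw; // out of retries - give up and surface the error
        Thread.Sleep(TimeSpan.FromSeconds(2)); // crude fixed pause before trying again
    }
}
```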

You should not retry everything, and you should not retry hundreds of times. If the SMTP server has sent the email you told it to send but throws an exception writing a log entry about it because the disk is temporarily full, you do not want to retry 1,000 times as quickly as possible: all those emails will send and the recipient's computer will grind to a halt. Ask me how I know.

But limited and careful use of the retry pattern can improve your application's reliability and reduce exceptions and their resulting bug reports - because you are tracking the errors in your production application, right?! And fewer support tickets is always a good thing.

Most developers have tried to implement retries at some point, and the result is frequently code that is much harder to read and maintain:

  • Retry logic should not obscure the actual application logic, making the code harder to understand later.
  • Retry logic is probably a cross-cutting concern and should be centralised. Avoid duplicating that retry looping code.
  • You may want to be able to configure the retry behaviour without recompilation. What happens when everything is overloaded and a colleague suggests "we need to disable all the retries and fail fast"?
  • Use it with care: badly implemented or misconfigured, it can add even more load to an already failing system.
  • If your requirements are complicated, consider the excellent Polly library; however, that may be overkill for many situations, and one helper utility is often enough.

A simple synchronous retry helper example

If you are not using async/await then one static method and a few lines may be all you need.
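A minimal sketch of such a helper might look like this - the class and method names match the usage shown below, and the `Trace.TraceError` call is a placeholder for whatever logging your application already uses:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class RetryHelper
{
    // Runs `operation`, retrying on any exception up to `times` attempts,
    // pausing `delay` between attempts. The final failure is rethrown.
    public static void RetryOnException(int times, TimeSpan delay, Action operation)
    {
        var attempts = 0;
        do
        {
            try
            {
                attempts++;
                operation();
                break; // success - stop retrying
            }
            catch (Exception ex)
            {
                if (attempts == times)
                    throw; // out of retries - let the exception propagate

                // Log every failed attempt so transient errors stay visible in monitoring
                Trace.TraceError($"Exception on attempt {attempts} of {times}: {ex.Message}");

                Thread.Sleep(delay); // synchronous pause before the next attempt
            }
        } while (true);
    }
}
```

A `catch (Exception)` this broad is deliberate for a general-purpose helper, but you could narrow it (e.g. to `HttpRequestException`) if you only want to retry specific failures.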

To apply the helper, just use a lambda to wrap the action that you want to retry on error. Let's assume we want the HTTP call that fetches some product data JSON to retry 3 times on failure before throwing the exception. Our original code might look something like:

product = httpClient.Get("https://example.com/api/products/1");  

So now we wrap the call with our helper:

var maxRetryAttempts = 3;  
var pauseBetweenFailures = TimeSpan.FromSeconds(2);  
RetryHelper.RetryOnException(maxRetryAttempts, pauseBetweenFailures, () => {  
    product = httpClient.Get("https://example.com/api/products/1");
});

Ideally the values for retry attempts and seconds paused between failures should be loaded from configuration.
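In a classic .NET application that could be as simple as reading App.config values - the key names below are made up for illustration, so adjust them to your own configuration scheme:

```csharp
using System;
using System.Configuration; // classic .NET; add a reference to System.Configuration

// Hypothetical appSettings keys, with sensible fallbacks if they are absent.
var maxRetryAttempts = int.Parse(
    ConfigurationManager.AppSettings["MaxRetryAttempts"] ?? "3");
var pauseBetweenFailures = TimeSpan.FromSeconds(double.Parse(
    ConfigurationManager.AppSettings["RetryPauseSeconds"] ?? "2"));
```

Dropping the attempts value to 1 in config then becomes the "disable all the retries and fail fast" switch a colleague might ask for, with no rebuild needed.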

Observe that we are still logging each error so that we can monitor and track failures, just without necessarily failing the whole operation in a transient-failure situation. Note that this synchronous version blocks the calling thread while it pauses between attempts; an async version can instead use Task.Delay to release the thread to do other work while we wait for our HTTP server to recover.

Next up we will add an async version of the retry helper, and later we will look at using the Polly library to do retries.
