Copy paste test

Written by Danny Fallon
Updated over 6 years ago

responding at all. The exact cause of the 500 errors is not known (due to a lack of logs/reporting from the app); the assumption is that Heroku limits outbound connections from a single dyno, or that some other issue caused the Ruby app to fail when trying to download the image. Given the CloudFront distribution's settings, requests that resolved in time but with a 500 status code were cached in CloudFront, which later made it necessary to purge the CF cache.
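
For illustration, the sketch below shows how the two CloudFront-side fixes could look in Python with boto3: disabling the caching of origin 500s by setting their error-caching TTL to zero, and purging what had already been cached. The distribution ID is hypothetical, and this is a sketch of the idea rather than the actual incident tooling.

```python
import time
import boto3

cloudfront = boto3.client("cloudfront")
DISTRIBUTION_ID = "E1EXAMPLE"  # hypothetical distribution ID

# Stop CloudFront from caching origin 500s: set the error-caching
# minimum TTL for 500 responses to zero seconds.
config_response = cloudfront.get_distribution_config(Id=DISTRIBUTION_ID)
config = config_response["DistributionConfig"]
config["CustomErrorResponses"] = {
    "Quantity": 1,
    "Items": [{"ErrorCode": 500, "ErrorCachingMinTTL": 0}],
}
cloudfront.update_distribution(
    Id=DISTRIBUTION_ID,
    DistributionConfig=config,
    IfMatch=config_response["ETag"],  # ETag acts as an optimistic lock
)

# Purge the error responses that were already cached, as was needed here.
cloudfront.create_invalidation(
    DistributionId=DISTRIBUTION_ID,
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/*"]},
        "CallerReference": str(time.time()),  # must be unique per request
    },
)
```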

Number of requests per minute at the CloudFront gateway by Hit, Miss, and Error (gaps are due to missed log file downloads; time in UTC)

The app ran on an unknown Heroku account, with an unknown dyno type and apparently no autoscaling enabled, which contributed to the app not recovering on its own.
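
Had the app's location been known, it could have been scaled through the Heroku Platform API. The sketch below shows what that call looks like in Python; the app name, dyno size, and count are all hypothetical.

```python
import os
import requests

APP_NAME = "image-proxy"  # hypothetical; the actual account/app were unknown
API_KEY = os.environ["HEROKU_API_KEY"]

# Scale the web formation via the Heroku Platform API
# (PATCH /apps/{app}/formation/{type}).
resp = requests.patch(
    f"https://api.heroku.com/apps/{APP_NAME}/formation/web",
    headers={
        "Accept": "application/vnd.heroku+json; version=3",
        "Authorization": f"Bearer {API_KEY}",
    },
    json={"quantity": 4, "size": "standard-2X"},
    timeout=10,
)
resp.raise_for_status()
```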

The unresponsive proxy service produced errors at CloudFront, which in turn responded to clients with 503s. As a result, images in affected comments (where image URLs had been replaced with proxied URLs) failed to load and did not show in the client browser (Inbox or Messenger).
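
The production proxy was a Ruby app; the Python sketch below only illustrates the general image-proxy pattern, with an explicit upstream timeout so a slow or throttled origin fetch fails fast with a gateway error instead of hanging (a hang is what surfaced here as CloudFront 503s). All names are hypothetical.

```python
from flask import Flask, Response, abort, request
import requests

app = Flask(__name__)

UPSTREAM_TIMEOUT = 5  # seconds; fail fast rather than hang

@app.route("/proxy")
def proxy_image():
    url = request.args.get("url")
    if not url or not url.startswith("https://"):
        abort(400)
    try:
        # Hard timeout: if outbound connections are throttled (one of
        # the suspected causes above), return a gateway error quickly
        # instead of leaving CloudFront waiting on a silent origin.
        upstream = requests.get(url, timeout=UPSTREAM_TIMEOUT)
    except requests.RequestException:
        abort(502)
    if upstream.status_code != 200:
        abort(502)
    return Response(
        upstream.content,
        mimetype=upstream.headers.get("Content-Type", "application/octet-stream"),
    )
```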

Detection

After a customer reported that they could not see uploaded images, the PR containing the gem bump was identified as responsible and was reverted. At the same time, an attempt was made to scale up the proxy service, but it failed because the location of the service was unknown.

The majority of errors occurred in the Inbox (app.intercom.io in the graph below), and the rest on individual customers' Messengers on their own webpages:

Distribution of requests on CloudFront by referrer between 14:35 and 15:37 UTC

Revert

Deploying the revert restored image uploads: images uploaded after the revert was deployed were no longer transformed to proxied URLs, so they loaded directly again.
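
For context, the URL transformation that the revert disabled typically looks like the camo-style sketch below (an assumption; the actual proxy host and signing scheme are not described in this post-mortem). Signing the upstream URL with a shared secret prevents the proxy from being used to fetch arbitrary URLs.

```python
import hashlib
import hmac
from urllib.parse import quote

# Hypothetical values; not taken from the real service.
PROXY_HOST = "https://images-proxy.example.com"
SECRET_KEY = b"shared-secret"

def proxied_url(image_url: str) -> str:
    """Rewrite an uploaded image's URL to go through the proxy."""
    digest = hmac.new(SECRET_KEY, image_url.encode(), hashlib.sha1).hexdigest()
    return f"{PROXY_HOST}/{digest}?url={quote(image_url, safe='')}"

# After the revert, new uploads skipped this transformation and kept
# their original URLs, so they loaded without the failing proxy.
print(proxied_url("https://uploads.example.com/cat.png"))
```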