How to use a saved network log to debug issues on a remote machine

Steps to reproduce an error

We encounter "steps to reproduce an error" with some regularity:

  • A defect that comes through from QA

  • Bug reports in an open-source project or collected from your application in production

  • Your colleague informally describing an issue they observed

Upon receiving these instructions, which can vary wildly in amount of detail and accuracy, we face a few roadblocks when trying to replicate the error:

  • Our environment is different from the one used in the report - this even applies when it is our own report!

  • The information might be missing a few key details - screenshots don’t capture what we might want to see, and pasted payloads may not be the ones that we really need.

  • The information might be inaccurate

  • We aren’t able to use the tools that we normally would to debug it, such as inspecting payloads and headers, seeing which services were called and in which order, and which service(s) returned error codes.

A HAR (HTTP Archive) file is an export of all of the network traffic from the Network tab of the browser dev tools (supported in all major browsers). You can read more about it in this post. We can drag and drop this file into the Network tab of our own dev tools and it displays all of the captured network traffic, allowing us to filter and inspect the requests and responses in seconds.
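Because a HAR file is plain JSON (a top-level "log" object with an "entries" array), it can also be inspected outside the browser. Here is a minimal sketch, assuming Node.js/TypeScript and a hypothetical file name, that lists every captured call which returned an error status:

```ts
// Minimal sketch: list failing calls from a HAR file.
// Assumes Node.js; "capture.har" is a hypothetical file name.
import { readFileSync } from "node:fs";

const har = JSON.parse(readFileSync("capture.har", "utf8"));

// Each entry holds the request, the response, and timing for one network call.
for (const entry of har.log.entries) {
  if (entry.response.status >= 400) {
    console.log(
      `${entry.startedDateTime} ${entry.response.status} ` +
      `${entry.request.method} ${entry.request.url}`
    );
  }
}
```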

Even if we can perfectly reproduce all of the steps, it is more reliable to use this file, because it captures a snapshot of the calls at the exact point in time that led to the error. It is therefore not susceptible to the usual reasons that we "can’t replicate the issue":

  • intermediate environments and services might have been sending unexpected responses then but aren’t now

  • the environment targeted in the report can be different from ours

  • it might not be possible to create the test data needed to replicate the issue at the time that we are able to debug it

What can we gain by using the HAR file?

A time reduction from days to seconds

In the worst-case scenario it takes days to get the right test data and environment set up; in the best-case scenario we still need to replicate all of the steps instead of simply dragging and dropping the file into the Network tab.

Accuracy when investigating the issue

The network calls are exactly as they were when the error occurred; nothing about the environment, data, or timing can vary between the report and our investigation.

Reduction in back-and-forth communication

We might still need to communicate in order to get all of the information related to the calls (e.g. what exactly the user was doing in the UI), but it is very likely that looking at the log is sufficient to fully track down the issue.

Using the debugging tools that we are familiar with already

Debugging from screenshots or copy-pasted logs takes away the familiar set of tools we use to inspect payloads and the order and timing of calls. Even if we receive the exact response that we need via copy-paste, we still have to put it into another tool to get information out of it (e.g. if it is a JSON response, we would want to format it or drill down into it). That functionality is already built into our debugging tools, so with a HAR file we don’t need to copy-paste anything anywhere.
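When we do want to pull a specific response body out of a HAR file outside the browser, a few lines of script are enough. This is a sketch under the same Node.js assumption as above; the file name and URL filter are placeholders:

```ts
// Sketch: extract and pretty-print the response body of a particular call.
// "capture.har" and the "/api/signup" filter are placeholders.
import { readFileSync } from "node:fs";

const har = JSON.parse(readFileSync("capture.har", "utf8"));

for (const entry of har.log.entries) {
  if (!entry.request.url.includes("/api/signup")) continue;
  const { text, encoding, mimeType } = entry.response.content ?? {};
  if (!text) continue;
  // Response bodies may be base64-encoded inside the HAR.
  const body = encoding === "base64" ? Buffer.from(text, "base64").toString("utf8") : text;
  // Format JSON payloads so they are easy to drill into.
  console.log((mimeType ?? "").includes("json") ? JSON.stringify(JSON.parse(body), null, 2) : body);
}
```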

What might be lost?

There are some considerations to take into account when using these files:

File size

The files can be quite large - tens of megabytes is typical. However, the file is pure text, so it can be compressed quite heavily. This is worth keeping in mind if it is being attached to something like a JIRA ticket.
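As a rough illustration of how well a text-only HAR compresses before attaching it to a ticket, here is a small sketch (Node.js assumed; file names are hypothetical):

```ts
// Compress a HAR file before attaching it to a ticket.
import { readFileSync, writeFileSync } from "node:fs";
import { gzipSync } from "node:zlib";

const raw = readFileSync("capture.har");
const zipped = gzipSync(raw);
writeFileSync("capture.har.gz", zipped);

// Print the size before and after compression.
console.log(`original: ${(raw.length / 1e6).toFixed(1)} MB, gzipped: ${(zipped.length / 1e6).toFixed(1)} MB`);
```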

Sensitive data

These files contain all of the network traffic, including headers and payloads, so user tokens, user data, and other sensitive information are contained within. If a file comes from an outside source (e.g. from a customer) we want to take care that we store it appropriately. Even if it comes from within the organization, we would want to check it to ensure that keys, tokens, etc. are appropriately masked.
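A small script can help with this masking. The sketch below (Node.js assumed) redacts a few obviously sensitive fields; the header names and file names are illustrative, not exhaustive, so the output should still be reviewed before it is attached anywhere:

```ts
// Sketch: mask obviously sensitive values in a HAR file before sharing it.
import { readFileSync, writeFileSync } from "node:fs";

const SENSITIVE_HEADERS = new Set(["authorization", "cookie", "set-cookie", "x-api-key"]);

const har = JSON.parse(readFileSync("capture.har", "utf8"));

for (const entry of har.log.entries) {
  // Mask sensitive request/response headers.
  for (const header of [...entry.request.headers, ...entry.response.headers]) {
    if (SENSITIVE_HEADERS.has(header.name.toLowerCase())) header.value = "***";
  }
  // Mask all cookie values outright.
  for (const cookie of [...(entry.request.cookies ?? []), ...(entry.response.cookies ?? [])]) {
    cookie.value = "***";
  }
}

writeFileSync("capture.redacted.har", JSON.stringify(har, null, 2));
```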

Added step when creating the tickets

This may be a concern, especially if you have automated tests: you have to set up the capture of the HAR files, which may not always be straightforward. However, most automation tools (including Cypress and Selenium, via plugins, proxies, or their dev-tools integrations) can record a HAR file, since they interface directly with the browser dev tools.
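As one concrete illustration of capturing a HAR from an automated run, the sketch below uses Playwright, which is not mentioned above but exposes HAR recording directly as a context option; the URL and file name are placeholders:

```ts
// Sketch: record a HAR file during an automated browser run with Playwright.
import { chromium } from "playwright";

(async () => {
  const browser = await chromium.launch();
  const context = await browser.newContext({
    recordHar: { path: "test-run.har" }, // the HAR is written when the context closes
  });
  const page = await context.newPage();
  await page.goto("https://example.com/signup"); // placeholder URL
  // ... drive the failing scenario here ...
  await context.close(); // flushes test-run.har to disk
  await browser.close();
})();
```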

An anecdote

The other day I went through a back-and-forth exchange with QA on a ticket. The ticket itself had a good bit of detail: steps to reproduce, screenshots, actual vs. expected behaviour; but the process involved in reproducing this issue took a few days. This was due to a few factors:

  • There were environment-related issues so we weren’t able to create test data

  • When the test data was available, the endpoints weren’t, because some back-end services were down at the time

  • When everything else was working, I was unavailable (at night, over the weekend)

When we eventually got this reproduced together, we realized that the issue happened to be a network call returning the wrong payload much further back in the chain of calls than where we were testing (the application is a multi-page sign-up process).

Based on this experience we came up with a more reliable process: every ticket is now created using the following checklist.

  • Detailed steps to reproduce

  • "Expected" vs "actual" behaviour

  • HAR file captured at the time of the error

  • Console log captured at the time of the error

  • Comparison against the behaviour in production (if available)

Summary

  • Refuse to debug an issue without a HAR file if the issue involves a network request of any kind; otherwise you risk wasting time and resources (and you may not even be able to reproduce the error). Consider requesting the console log output as well.

  • You can drag-and-drop a HAR file into most browser dev-tools to inspect the requests

  • HAR files are widely supported: you can use Charles Proxy to replay or modify the requests, and you can create them from many tools like Selenium and Cypress (and from almost all current browsers, including IE back to version 9)