Troubleshooting: how to solve your problems in nine steps

Posted on July 4, 2017 • 6 min read • 1,238 words

I have been working with various computer systems for quite a few years now. I got my first computer when I was in elementary school. I belonged to the MSX fanbase. Early on, I realized that not everything always worked the way it was supposed to. The trick was figuring out what was going wrong and why. If you know what is going wrong and why, you can at least fix it.

I have applied and refined this little trick over the years. I am in the process of describing it for an article on VMGuru.com. Basically, it boils down to the following:

One of the most important things when it comes to troubleshooting is that you have a system or workflow. You need to tackle the problem methodically and analytically. In my eyes, there is nothing worse than troubleshooting at random until things suddenly start working again. You still don’t know what was wrong, which means you won’t be able to prevent it next time.

The approach  

There are various ways to solve a problem. My approach is roughly outlined below. Despite saying that you should use a workflow, I don’t always do it in the same order myself. Here it is:

  1. Take a deep breath, grab a cup of coffee or tea, and think calmly about your problem.
  2. Describe the problem in as much detail as possible.
  3. Check if the problem is still occurring.
  4. Check for known issues and solutions.
  5. Sketch the situation.
  6. Check the connectivity between components.
  7. Check components and error logs.
  8. Modify components according to your documentation.
  9. ‘If everything else fails, try a new approach.’

I won’t describe the steps in full detail, but I hope this gives you enough structure for your troubleshooting journey.

1. Take a deep breath, grab a cup of coffee or tea, and think calmly about your problem  

This step is possibly the most important one. Don’t panic, just sit and relax. Take your time. If you start tampering with various things right away, you may end up further from a solution. Another thing you definitely should not do is CYA (cover your ass), or whatever you want to call it. Don’t delete log files; rather, admit that you made a mistake. It will come out eventually anyway.

First, focus on solving the problem. Saving your job or career comes only after that.

2. Describe the problem  

It’s important to describe the problem as accurately as possible. ‘It doesn’t work’ is not really a good description of the problem. It also doesn’t help you fix the problem.

What exactly isn’t working? Are you unable to install or use something? Can you not access the system? Is there no display? Does the system not behave as you’d expect? And if so, how does that manifest?

Be as precise as possible. Describe as many symptoms as you can. And no, ‘I think this or that happened’ or ‘someone somewhere told someone else it was broken’ doesn’t cut it. Also, don’t filter out information. The description needs to be as specific as possible, but leaving out symptoms might prevent you from solving the problem.

As part of describing your problem, you can use the Five Ws and one H (who, what, when, where, why, and how) and consider a few additional questions:

  1. Who is experiencing the problem?
  2. What is the problem?
  3. What were you doing when the problem occurred?
  4. Why is it a problem?
  5. When does the problem occur?
  6. Where does the problem occur?
  7. (if possible) Why does the problem occur?
  8. Did it ever work in the first place?
  9. What has changed since it last worked?
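
If you want to make sure none of these questions gets skipped, you could capture them in a small structure. The Python sketch below is only an illustration: the `ProblemReport` class, its field names, and the example values are all made up for this post and are not part of any tool.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class ProblemReport:
    """One field per question from the list above (names are hypothetical)."""
    who: str = ""
    what: str = ""
    doing_what: str = ""
    why_a_problem: str = ""
    when: str = ""
    where: str = ""
    suspected_cause: str = ""  # optional: "if possible"
    ever_worked: str = ""
    changed_since: str = ""

    def unanswered(self) -> list[str]:
        """Return the required questions that still have no answer."""
        optional = {"suspected_cause"}
        return [
            f.name
            for f in fields(self)
            if f.name not in optional and not getattr(self, f.name).strip()
        ]


# Example values are invented for illustration.
report = ProblemReport(who="finance team", what="cannot send email")
print("Still to answer:", ", ".join(report.unanswered()))
```

An empty result from `unanswered()` does not mean the description is good, only that nothing was left blank.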

3. Check if the problem is still occurring  

Replicate the problem. Verify it yourself. Don’t just trust another architect/consultant/engineer/technician/<fill in the blank>. Trust is good, verification is better.

Trying to solve a problem that has since disappeared, or one you haven’t witnessed yourself, is a waste of time. It’s important to confirm that the issue still exists. When you expect error messages but don’t see them, try to provoke them. Maybe the problem was temporary and had an external cause, like a power outage or network failure, that has already been resolved at another level.

4. Check for known issues and solutions  

Search the internet for known issues and their solutions. Often, you can find a knowledge base on the supplier’s or manufacturer’s website, where you can search specifically for symptoms. This often yields a solution, or at least a direction for further searching.

5. Sketch the situation  

Drawing a sketch or diagram involving the problematic component can give you a good idea of where to look. By sketching the situation, you organize the information for yourself. And if you are troubleshooting with a team, it helps ensure everyone has the same understanding of the infrastructure or the situation.

6. Check the connectivity between the components  

Now that you know how the components are related, you can check how the communication between them is proceeding. By communication, I mean any exchange. This could be, for example, the fuel flow from tank to engine, as well as the connection between your email client and the email server on the internet. The check can often be quite basic. For an internet connection, you can execute a ‘ping’ command. For a fuel flow, you could disconnect one end of the hose to see if fuel comes out.
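
Ping is not always available, and firewalls often block it, so another equally basic check is whether a TCP connection to the other component can be opened at all. A minimal sketch using Python’s standard `socket.create_connection`; the function name `can_connect` is my own, and the host in the usage comment is a placeholder.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and attempts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Usage (placeholder host, adjust to your own sketch of the situation):
# print(can_connect("mail.example.com", 587))
```

A `True` result only tells you that the path and the listening port are there, not that the service behind it behaves correctly. That is what the next step is for.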

7. Check components and error logs  

Check all components based on your sketch. Review all error logs. Components that have changed since the last time the system worked are suspect.

  1. Determine what the component is supposed to do and check if it indeed does it.
  2. Check for error messages, event logs, log files, onboard management systems, and similar systems.
  3. Review the configuration against your documentation (you have documentation, right?), but don’t change anything yet!!
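
Reviewing a configuration against the documentation can be as simple as listing the settings whose values differ. A minimal sketch, assuming both can be represented as flat dictionaries; `diff_config`, the `MISSING` marker, and the example settings are hypothetical. It only reports differences and changes nothing.

```python
MISSING = "<missing>"  # marker for a setting that exists on one side only


def diff_config(documented: dict, actual: dict) -> dict:
    """Return {setting: (documented value, actual value)} for every mismatch."""
    differences = {}
    for key in sorted(documented.keys() | actual.keys()):
        expected = documented.get(key, MISSING)
        found = actual.get(key, MISSING)
        if expected != found:
            differences[key] = (expected, found)
    return differences


# Example values are invented for illustration.
documented = {"dns": "10.0.0.53", "mtu": 1500, "ntp": "pool.ntp.org"}
actual = {"dns": "10.0.0.53", "mtu": 9000}

for key, (expected, found) in diff_config(documented, actual).items():
    print(f"{key}: documented={expected!r}, actual={found!r}")
```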

Keep in mind that a lack of error messages can itself indicate a fault in certain cases. Also, check if you can increase the logging level. In certain systems, you can turn on ‘debug’ logs, allowing you to see all actions and activities as they happen. This gives you insight into what’s going wrong.
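
How you enable debug logs differs per system. As one illustration, this is what raising the level looks like with Python’s standard `logging` module. The logger name and the messages are invented for the example.

```python
import io
import logging


def run_with_level(level: int) -> str:
    """Log one message per severity and return what was actually recorded."""
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

    logger = logging.getLogger("troubleshooting_demo")  # hypothetical name
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.propagate = False
    logger.setLevel(level)

    logger.debug("opening connection to mail server")
    logger.info("connection established")
    logger.error("login rejected")
    return buffer.getvalue()


print("At INFO level:")
print(run_with_level(logging.INFO))
print("At DEBUG level:")
print(run_with_level(logging.DEBUG))
```

At the default level you only see that the login was rejected. At debug level you also see the steps leading up to it, which is usually where the clue is.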

8. Modify components according to your documentation  

When you have finished reviewing the systems and components, you will undoubtedly have a list of items that differ from the documentation. This is the time to adjust those items, after you have, of course, made sure you can revert to the previous situation if necessary, for example by creating a backup.

  1. Change one thing at a time
  2. Document your change
  3. Check if it works now. If it doesn’t work, revert.
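
The three rules above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the configuration is a dictionary, and `works` is a check you supply yourself. The function name and the example settings are hypothetical.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple


def try_changes_one_at_a_time(
    config: Dict[str, Any],
    changes: Dict[str, Any],
    works: Callable[[Dict[str, Any]], bool],
) -> Tuple[Dict[str, Any], List[str]]:
    """Apply each change on its own, document it, and revert if it does not help."""
    change_log: List[str] = []
    for key, new_value in changes.items():
        candidate = copy.deepcopy(config)  # the original stays untouched as the backup
        candidate[key] = new_value         # change exactly one thing
        if works(candidate):
            change_log.append(f"{key} -> {new_value!r}: kept, problem solved")
            return candidate, change_log
        change_log.append(f"{key} -> {new_value!r}: no effect, reverted")
    return copy.deepcopy(config), change_log


# Example values are invented; here only the MTU setting is the culprit.
current = {"dns": "10.0.0.1", "mtu": 9000}
planned = {"dns": "10.0.0.53", "mtu": 1500}
fixed, log = try_changes_one_at_a_time(current, planned, lambda c: c["mtu"] == 1500)
for entry in log:
    print(entry)
```

Because every attempt starts from the untouched original, a change that did not help never lingers in the configuration to confuse the next attempt.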

9. ‘If everything else fails, try a new approach’  

If the above steps haven’t helped, or if it’s taking too long, don’t search for the cause indefinitely. Take action:

  1. Consult the supplier/manufacturer
  2. Replace components with those you are sure work
  3. ‘Break’ what worked. At least this way, you will know what to expect.

In conclusion  

Every problem is unique. I often use the above process, not just to solve problems on my own, but also when I have to solve a problem with a team. It helps to outline what everyone needs to do or check within their responsibility.

A problem doesn’t need to cause panic as long as you have a process with which to fight the problem.
