All the general rules of fixing things apply, but technology adds some considerations of its own.
Before you ever panic or give up, run a simple web search for the problem, since others have probably already found the answer.
- To be more thorough, use search operators (e.g., an exact phrase in quotes) to narrow what you’re looking for.
- Look for similar model numbers or alternate-language implementations of the same thing.
- What worked for others may not work for you as-is, but how you can adapt what worked matters more than exactly what they did.
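As a minimal sketch of narrowing a search, the helper below assembles a query from an exact phrase, an optional site restriction, and excluded terms. Quoted phrases, site: and a leading minus are operators most major search engines accept, but support varies, and the function name, model number, and site here are hypothetical.

```python
def build_query(phrase, site=None, exclude=()):
    """Assemble a narrowed search query string (hypothetical helper)."""
    parts = [f'"{phrase}"']              # quotes ask for the exact phrase
    if site:
        parts.append(f"site:{site}")     # restrict results to one site
    parts.extend(f"-{term}" for term in exclude)  # drop noisy terms
    return " ".join(parts)


# Hypothetical model number and site, for illustration only.
print(build_query("XYZ-100 link light amber", site="example.com", exclude=["buy"]))
```

Swapping the model number for a sibling model is a quick way to apply the "similar model numbers" advice above.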
Networked
Typically, it helps to triple-check that every networked component is offline, and preferably unplugged.
- A powered-off device can still register as connected (especially network switches).
Network Chains
As counter-intuitive as it sounds, move things around frequently to see if anything changes.
- Change out port locations, plug things into various locations, swap out hardware.
- Often, the programmer wrote code that worked for the original situation (e.g., check ports 1 and 2 on a two-port switch) and never updated it for a later hardware release (e.g., check all available ports).
- Sometimes, the code may need something to change to escape an endlessly looping subroutine.
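The hard-coded port example above can be sketched like this. It is an illustration under assumptions, not any real device's firmware: the function names and the link-status dictionary are made up for the example.

```python
def find_active_ports_hardcoded(link_status):
    """Brittle version: written for a two-port switch and never updated."""
    return [port for port in (1, 2) if link_status.get(port, False)]


def find_active_ports(link_status):
    """Checks every port the device reports, whatever the port count."""
    return sorted(port for port, is_up in link_status.items() if is_up)


# Hypothetical status for a later four-port model: only port 4 has a link.
four_port_switch = {1: False, 2: False, 3: False, 4: True}

print(find_active_ports_hardcoded(four_port_switch))  # []
print(find_active_ports(four_port_switch))            # [4]
```

Moving the cable from port 4 to port 1 would make the brittle version "work" again, which is why shuffling hardware around can expose this kind of bug.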
Your most difficult challenge will first be in making the problem reproducible, then in localizing it.
- If the issue keeps cropping up in the same area, keep splitting that area in half until the fault is isolated.
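Splitting the problem area in half is a binary search. The sketch below assumes the problem is already reproducible, that there is a single fault, and that you can test the system with only the first few components in play; localize_fault and works_with are hypothetical names.

```python
def localize_fault(components, works_with):
    """Return the first component whose inclusion makes the system fail.

    components: ordered list of suspects (cables, hops, config lines, ...).
    works_with: callable taking a prefix of components and returning True
                when the system still works with only that prefix in play.
    Assumes one fault, and that the full list fails while the empty one works.
    """
    low, high = 0, len(components)  # prefix of length low works, length high fails
    while high - low > 1:
        mid = (low + high) // 2
        if works_with(components[:mid]):
            low = mid    # fault lies in the later half
        else:
            high = mid   # fault lies in the earlier half
    return components[high - 1]


# Hypothetical chain where the patch cable is the bad part.
chain = ["modem", "router", "switch", "patch cable", "printer"]
print(localize_fault(chain, lambda prefix: "patch cable" not in prefix))
```

Each test halves the remaining suspects, so even a chain of a thousand components needs only about ten tests.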
Perception
The ability to know exactly which details are significant can only come from experience.
- A network technician with 2 years of hands-on experience with that particular software or hardware is worth one with 6 or 10 years on anything else.
Repetition
One of the fortunate aspects of most computer troubleshooting (with the important exception of anything involving AI) is that the system is highly fine-tuned, so it’s unlikely that more than one thing broke at once.
Complications
Since computers are inherently complicated, do not do anything to make things more complicated. This is not easy for the types of people who use computers.
To avoid reference issues, don’t let a computer run updates or install anything while it’s busy with other tasks:
- The code on the computer is instructed to write information to Point A.
- After Point A has been designated, but before anything is written, a tech-savvy user changes Point A into Point B while trying to be efficient with something else.
- Computer writes to Point A.
- Computer later glitches out because everything that was relative to Point A is now only accurate relative to Point B.
- Worst-case scenario is that the tech-savvy user must do something far more dramatic, like reinstall the OS or extract data from a hard drive.
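The sequence above is a stale-reference problem, and a toy model makes it concrete. This is a sketch only: real installers and file systems are far more involved, and run_update, storage, and the location names are hypothetical.

```python
def run_update(relocate_mid_update):
    """Simulate an update that resolves its write target before writing."""
    storage = {"point_a": None, "point_b": None}
    active_location = "point_a"

    # The updater is told where to write and remembers that answer.
    target = active_location

    # Before the write happens, the user moves things to Point B.
    if relocate_mid_update:
        active_location = "point_b"

    # The updater still writes to the location it resolved earlier.
    storage[target] = "new settings"

    # Everything else now reads from wherever the system currently points.
    return storage[active_location]


print(run_update(relocate_mid_update=False))  # new settings
print(run_update(relocate_mid_update=True))   # None
```

In the second run the data was written successfully, just not where the rest of the system looks for it, which is why the failure tends to show up later rather than during the update.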
If you must roll back updates, turn off the auto-update features first, and make sure to roll back all the connected dependencies. Rolling back is like heart surgery, so only do it if you have no choice.
Repairing
The best way to repair depends heavily on the domain.
It’s always important to have done some preventative work before you needed to repair it:
- Have the same or similar extra hardware available for replacement.
- Keep offline media of the current software versions available, or at least have another means to connect to the source of that software (e.g., mobile hotspot cellphone subscription).
- Keep ready access to the precise technical documentation that indicates how to reset or reinstall something.
The easiest preventative measure is to always keep multiple backups.
- If you’re pressed for storage space, space out the backup cycle as you go farther back (e.g., keep a copy for each week for the past month, a copy from every month for the past year, etc.).
- If you must manually run the backup, you should be spending more time saving backups than loading them.
For the most part, software fixes simply require having the software pre-downloaded for quick transfer, but it’s worth keeping some hardware available, just in case:
- USB drive loaded with a plethora of diagnostic apps, preferably two of them (one for Windows, and the other with a lightweight Linux distro on the drive)
- A wired USB keyboard, which is less trouble to set up
- A Bluetooth keyboard, for mobile devices
- Non-magnetized screwdriver set with Torx, flat, and cross-point drivers
- Antistatic mat or antistatic wrist strap
- Head-mounted magnifier or magnifying glass
- POST card for boot issues
- Loopback plug for network diagnosis
- Multimeter for testing circuitry
- Power supply tester
- Soldering iron with solder wire
Hardware
Find a sufficient replacement that does the job.
- It can be an upgrade if the situation permits (e.g., keyboard, mouse), but make sure it’s compatible before getting it.
- Don’t worry too much about overkill (e.g., a newer model with more features) or reliability, since you can replace it again when it’s not urgent.
Software
Try to reinstall or reload the code.
- If you have access to the code, you may be able to change a reference, but don’t try rebuilding the code until after it’s back online and no longer urgent.
If anything depends on it, don’t upgrade it.
- Unless a dependency elsewhere has been upgraded and dropped support for the current version, try to reinstall what existed.
- Updates are generally not good to roll out during a repair: they will frequently overwrite hundreds, maybe thousands, of references, and you may need to debug new updates and features on top of addressing your current problem.
Data
Unfortunately, recovering data can be tremendously difficult.
- Look for proprietary software to recover the data, which may require decrypting the device’s memory.
- If the data is particularly secure or proprietary, you may need another piece of hardware that’s precisely the same type (e.g., a specific brand of disk drive).
- Sometimes, you’ll simply have to hack the solution by ripping out the data yourself, then find the protocol that translates the raw data into a usable format.
AI
If a model’s training data has been poisoned, you have several options:
- Start all over and retrain. This is technically the most obvious, but also the most time-consuming and potentially the most expensive.
- Train the entire model on fixed, predictable, safe data, which dilutes the poison. It’s not foolproof, but it’s technically the lowest-effort, and further exposure to good data will make the model fix itself over time.
- Delete and retrain the specific faulty nodes. If you can pull it off, this is ideal.
Postmortem
After fixing anything, always document the new rules and what happened. Otherwise, you’ve made life worse for someone else (including Future You).
Case Studies
Weird failures:
- Print this file, your printer will jam
- We can’t send email more than 500 miles
- Car allergic to vanilla ice cream
Impressive fixes: