All the general rules of fixing things apply, but there are technology-specific considerations as well.
Before you ever panic or give up, run a simple web search for your problem, since others have probably found the answer.
- To be more thorough, use search operators (e.g., exact phrases in quotation marks) to narrow what you’re looking for.
- Look for similar model numbers or alternate-language implementations of the same thing.
- What worked for others may not work for you, but knowing how to adapt what worked matters more than copying exactly what was done.
Most of the time, it helps to triple-check that every networked component is offline, and preferably unplugged.
- A powered-off device can still register as connected (especially network switches).
As counter-intuitive as it sounds, move things around frequently to see if anything changes.
- Switch ports, plug things into different locations, and swap out hardware.
- Often, the programmer wrote code that worked for the original situation (e.g., check ports 1 and 2 on a two-port switch), and it was never updated for a later hardware release (e.g., check all available ports).
- Sometimes, the code may need something to change to escape an endlessly looping subroutine.
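The two-port example above can be sketched in a few lines. This is only an illustration: the function names, the `is_link_up` callback, and the eight-port switch are all hypothetical, not taken from any real firmware.

```python
from typing import Callable, List


def find_active_ports_hardcoded(is_link_up: Callable[[int], bool]) -> List[int]:
    # Written for a two-port switch: only ports 1 and 2 are ever checked.
    return [port for port in (1, 2) if is_link_up(port)]


def find_active_ports(port_count: int, is_link_up: Callable[[int], bool]) -> List[int]:
    # Checks every port the hardware reports, so later models still work.
    return [port for port in range(1, port_count + 1) if is_link_up(port)]


# Hypothetical eight-port switch with the only cable plugged into port 5.
link_state = {port: port == 5 for port in range(1, 9)}


def port_is_up(port: int) -> bool:
    return link_state[port]


print(find_active_ports_hardcoded(port_is_up))  # []
print(find_active_ports(8, port_is_up))         # [5]
```

With the hardcoded version, moving the cable to port 1 or 2 would make the device "work" again, which is why shuffling things around can reveal this kind of bug.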
Your most difficult challenge will first be in making the problem reproducible, then in localizing it.
- If the problem keeps cropping up in the same area, split that area in half, test each half, and repeat on the failing half until the fault is isolated.
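Splitting the area in half is a binary search. The sketch below assumes a single culprit and a reproducible test; `first_failing_index`, `works_up_to`, and the component list are hypothetical names chosen for illustration.

```python
from typing import Callable, Sequence


def first_failing_index(steps: Sequence[str], works_up_to: Callable[[int], bool]) -> int:
    """Return the index of the step that breaks the system.

    works_up_to(n) must return True when the system works with only the
    first n steps enabled. Assumes one culprit: once a step breaks things,
    every longer prefix is broken too.
    """
    low, high = 0, len(steps)  # known good with `low` steps, known bad with `high`
    while high - low > 1:
        mid = (low + high) // 2
        if works_up_to(mid):
            low = mid    # the fault lies in the later half
        else:
            high = mid   # the fault lies in the earlier half
    return high - 1


chain = ["router", "switch", "patch cable", "wall jack", "NIC", "driver"]
tests_run = []


def works_up_to(n: int) -> bool:
    tests_run.append(n)
    return n <= 3  # pretend the wall jack (index 3) is the faulty part


culprit = first_failing_index(chain, works_up_to)
print(chain[culprit], "found after", len(tests_run), "tests")
```

Here two tests are enough to single out one of six components, and the saving grows quickly with longer chains.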
The ability to know exactly which details are significant can only come from experience.
- A network technician with 2 years of hands-on experience with that particular software or hardware is worth one with 6 or 10 years on anything else.
One of the fortunate aspects of most computer troubleshooting (with the important exception of anything involving AI) is that the system is highly fine-tuned, so it’s unlikely that more than one thing broke at once.
Since computers are inherently complicated, do not do anything to make things more complicated. This is not easy for the types of people who use computers.
To avoid reference issues, don’t let a computer run updates or install anything while it’s busy with something else:
- The program specifies that it will write information to Point A.
- After Point A has been designated, but before anything is written, a tech-savvy user turns Point A into Point B because he’s trying to be more efficient.
- The computer writes to Point A anyway.
- The computer later glitches out because everything that was relative to Point A is now relative to Point B.
- The tech-savvy user must then do something far more dramatic, like reinstalling the OS or extracting data from a hard drive.
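The sequence above can be reproduced on a small scale with a temporary folder. This is a sketch under stated assumptions: the folder names, file names, and the idea that the installer recreates its missing folder are all made up for illustration.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def simulate_moved_target() -> Tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as root:
        point_a = Path(root) / "point_a"
        point_b = Path(root) / "point_b"
        point_a.mkdir()
        (point_a / "settings.txt").write_text("existing data")

        # 1. The installer decides where it will write, but has not written yet.
        planned_target = point_a / "update.txt"

        # 2. Meanwhile, the user "tidies up" by renaming Point A to Point B.
        point_a.rename(point_b)

        # 3. The installer carries on with its stale plan and writes to Point A
        #    (assumption: it recreates the folder when it finds it missing).
        planned_target.parent.mkdir(exist_ok=True)
        planned_target.write_text("new data")

        # 4. Everything else now lives at Point B, so the two halves never meet.
        update_found_at_b = (point_b / "update.txt").exists()
        settings_found_at_a = (point_a / "settings.txt").exists()
        return update_found_at_b, settings_found_at_a


print(simulate_moved_target())  # (False, False)
```

Neither location holds a complete, consistent set of files afterward, which is the small-scale version of the glitch described above.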
If you must roll back updates, turn off the auto-update features first, and make sure to roll back all the connected dependencies. Rolling back is like heart surgery, so only do it if you have no choice.
The best way to repair depends heavily on the domain.
It’s always important to have done some preventative work before you needed to repair it:
- Have the same or similar extra hardware available for replacement.
- Keep offline media of the current software versions available, or at least have another means to connect to the source of that software (e.g., a cellphone plan with a mobile hotspot).
- Keep ready access to the precise technical documentation that indicates how to reset or reinstall something.
The easiest preventative measure is to always keep multiple backups.
- If you’re pressed for storage space, space out the backup cycle as you go farther back (e.g., keep a copy for each week for the past month, a copy from every month for the past year, etc.).
- If you must manually run the backup, you should be spending more time saving backups than loading them.
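The thinning schedule described above might look like the following. It is a minimal sketch, assuming one backup is identified by its date; the 28-day and 365-day cutoffs and the name `backups_to_keep` are illustrative choices, not a standard.

```python
from datetime import date
from typing import Dict, Iterable, Set, Tuple


def backups_to_keep(backup_dates: Iterable[date], today: date) -> Set[date]:
    """Keep the newest backup per week for 28 days, then per month for a year."""
    newest_per_bucket: Dict[Tuple, date] = {}
    for day in sorted(backup_dates):
        age = (today - day).days
        if age < 0 or age > 365:
            continue  # future-dated or older than a year: not kept by this policy
        if age <= 28:
            bucket = ("week", tuple(day.isocalendar()[:2]))  # (ISO year, ISO week)
        else:
            bucket = ("month", (day.year, day.month))
        newest_per_bucket[bucket] = day  # later dates overwrite earlier ones
    return set(newest_per_bucket.values())


today = date(2024, 6, 30)
existing = [
    date(2024, 6, 30), date(2024, 6, 29),  # same week: only the newer one survives
    date(2024, 6, 20),                     # a different week within the past month
    date(2024, 4, 10), date(2024, 4, 25),  # same month: only the newer one survives
    date(2022, 1, 1),                      # older than a year
]
kept = backups_to_keep(existing, today)
print(sorted(kept))
```

Anything not in the returned set is a candidate for deletion, but it is worth reviewing the list by hand before actually removing old backups.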
Find a sufficient replacement that does the job.
- It can be an upgrade if the situation permits (e.g., keyboard, mouse), but make sure it’s compatible before getting it.
- Don’t worry too much about overkill (e.g., a newer model with more features) or reliability, since you can replace it again when it’s not urgent.
Try to reinstall or reload the code.
- If you have access to the code, you may be able to change a reference, but don’t try rebuilding the code until after it’s back online and no longer urgent.
If anything depends on it, don’t upgrade it.
- Unless a dependency elsewhere had upgraded and deprecated support for the current version, try to reinstall what existed.
- Updates are generally not good to roll out during a repair: software updates will frequently overwrite hundreds, maybe thousands, of references, and you may need to debug new updates and features on top of fixing your current problem.
Unfortunately, recovering data can be tremendously difficult.
- Look for proprietary software to recover the data, which may require decrypting the device’s memory.
- If the data is particularly secure or proprietary, you may need another piece of hardware that’s the exact same type (e.g., a specific brand of disk drive).
- Sometimes, you’ll simply have to hack the solution by ripping out the data yourself, then find the protocol that translates the raw data into a usable format.
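Translating raw data into a usable format usually means working out the record layout and unpacking it. The sketch below uses Python's standard `struct` module; the layout itself (a little-endian 32-bit timestamp, a 16-bit temperature in tenths of a degree, and a status byte) is entirely hypothetical and stands in for whatever format the real device uses.

```python
import struct
from typing import List, Tuple

# Hypothetical record layout: uint32 timestamp, int16 tenths of a degree,
# uint8 status flag, little-endian with no padding (7 bytes per record).
RECORD = struct.Struct("<IhB")


def decode_records(raw: bytes) -> List[Tuple[int, float, int]]:
    # Ignore a truncated trailing record, which is common in damaged dumps.
    usable = len(raw) - len(raw) % RECORD.size
    return [
        (timestamp, tenths / 10, status)
        for timestamp, tenths, status in RECORD.iter_unpack(raw[:usable])
    ]


# Two complete records followed by two stray bytes from a cut-off third record.
raw_dump = (
    RECORD.pack(1700000000, 215, 1)
    + RECORD.pack(1700000060, -35, 0)
    + b"\x01\x02"
)
print(decode_records(raw_dump))
# [(1700000000, 21.5, 1), (1700000060, -3.5, 0)]
```

In practice, most of the effort goes into discovering the layout, typically by comparing dumps of known values, rather than into the unpacking code.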
If a machine-learning model’s training data has been poisoned, you have several options:
- Start all over and retrain. This is technically the most obvious, but also the most time-consuming and potentially the most expensive.
- Train the entire model further on fixed, predictable, safe data, which dilutes the poison. It’s not foolproof, but it’s the lowest-effort option, and further exposure to good data may gradually correct the model over time.
- Delete and retrain the specific faulty nodes. If you can pull it off, this is ideal.
A few classic troubleshooting stories show how strange the real cause can be:
- Print this file, your printer will jam
- We can’t send email more than 500 miles
- Car allergic to vanilla ice cream