Thinking about Troubleshooting and Problem Resolution

You are a helpdesk technician with NotReallyACompany, and two issues land on your desk on the same day:

Issue 1: "My Screen is Black"
Reporter: Marketing Guy
Priority: 3
Description: The screen on my laptop has gone black and I can't see anything. It's one of the Dells that we have.

Issue 2: "No Display"
Reporter: Research Engineer
Priority: 1
Description: I have an issue with the screen on my Mac being black. I can see nothing on the screen and I need this laptop today.

There are two ways to approach these issues. The first is to triage each one separately. Admittedly, there's good justification for doing so.

  • These machines have different hardware architectures
  • The machines have different operating systems
  • The individuals have expressed different levels of urgency in their priorities

It would not, therefore, be unreasonable to triage the Mac first, determine the next course of action, and then decide whether it is better to fix that machine before even beginning to triage the Marketing machine.

However, with a little more depth of thought (and foreknowledge of what the issue is, because this is based on a real-world experience, but jumping the gun makes for a bad narrative, so just follow me here), these are the same issue.

When I first encountered these issues together, I did what I suggested above. I was a junior technician on a university helpdesk and they seemed extremely different. At the end of that day I was completing the write-ups of all of my tickets for the shift and I noticed that... well... they weren't different.

2 machines, 2 procedures

There are issues that you never really think about on Windows systems that come up on Macs. For instance: the need to reset the Non-Volatile RAM, or NVRAM, on an Intel Mac. I'm showing my age here a bit, since the M1 and newer series Macs take care of this automatically, but there are still a lot of Intel Macs out there. Windows systems use something similar to NVRAM, but before this becomes a technical lesson, just know that Windows systems, and most *nix systems, take care of this problem for you at boot. With Intel Macs, needing to reset it was a common issue. This is done by holding the key combination Cmd+Opt+P+R during startup. The next most common issue with Macs was needing to hit Command+F1 some number of times to engage the display.

At this point in my career, that is actually how I thought about problems. Each system had a discrete set of steps to resolve an issue, regardless of underlying cause. This was naive, and brought on by an education that encouraged doing well on the exam, regardless of underlying knowledge.

On Windows systems, it was almost always a case of needing to pull the battery, hold down the power button, boot the system, and press Windows+P until the proper monitor engaged.

Both of these procedures were different, but with the number of things that could happen on the Macs everything always *seemed* more complicated.

Ultimately, both of these are the same solution.

2 procedures, 1 process

Both of these operating systems were fundamentally trying to accomplish the same thing: engage the onboard video hardware to display output on the built-in screen. Both machines had been plugged into external displays before being closed **and then** unplugged. This saved to persistent memory which display was being used. Each one, when it next booted, went through the same process:

  1. POST
  2. Load the OS Boot RAM State
  3. Boot the OS from the RAM State
  4. Query the Default Hardware
  5. Engage the Display Hardware
  6. Begin Displaying Output

And they just kept carrying on. As far as the machines were concerned, they were doing what they were supposed to do: sending output data to the display hardware. The display hardware, on the other hand, had nowhere to display it. It was also doing its job in taking the data from the processor... before dumping it right into nothingness.

The issue was that both systems had a stuck state to resolve: the belief that the secondary display was to be the default in this situation.

Ultimately, the processes of unplugging the battery and holding the power button on the Dell and holding Cmd+Opt+P+R on the Mac did the same thing. They forced the system to boot cleanly with no preset functions for things such as what hardware is connected, what drives should be present, which display should be used, etc. All of these were reset back to their (generally) least problematic default values.
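The stuck state and its reset can be sketched as a toy model. This is only an illustration under loud assumptions: the `Laptop` class, its fields, and `reproduce_and_fix` are hypothetical names invented for this example, and nothing here reflects how Dell firmware or Mac NVRAM actually stores its settings.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the two displays in this toy model.
BUILT_IN = "built-in"
EXTERNAL = "external"


@dataclass
class Laptop:
    # Display remembered in persistent memory from the last session.
    saved_display: str = BUILT_IN
    # Displays physically attached right now.
    connected: set = field(default_factory=lambda: {BUILT_IN})

    def boot(self) -> str:
        """Send output to the saved display, whether or not it is attached."""
        if self.saved_display in self.connected:
            return f"output visible on {self.saved_display}"
        return "black screen"

    def reset_persistent_state(self) -> None:
        """Stand-in for the battery pull or the NVRAM reset: back to defaults."""
        self.saved_display = BUILT_IN


def reproduce_and_fix() -> tuple:
    """Replay the scenario: dock, remember the external display, undock, boot."""
    laptop = Laptop()
    laptop.connected.add(EXTERNAL)      # plugged into an external display
    laptop.saved_display = EXTERNAL     # the preference is persisted
    laptop.connected.discard(EXTERNAL)  # lid closed, cable pulled
    before = laptop.boot()
    laptop.reset_persistent_state()
    after = laptop.boot()
    return before, after
```

Calling `reproduce_and_fix()` returns the boot result before and after the reset. Notice that the model never mentions Windows or macOS: the same state, and the same fix, apply to both.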

In my case, however, these systems still needed more coaxing. The operating system had *also* been set to view the secondary display as default. This is frustrating but has several solutions:

  • Plug in a secondary display and reset the display settings with the GUI
  • Engage the Operating System's built-in display management to set the proper display

Sometimes, if the stars align and you have all the right cables handy, the first one is faster. That has never happened to me, except when I was sitting at a desk with a machine I was actively working on and had accidentally caused the problem myself, or when a dock happened to be plugged in right where I was. On the other hand, both systems have a convenient key combination to resolve this issue. (This is a convenient time to remind folks that computers are meant to be used... engineers and developers go through a lot of effort to make that the case...) On Windows operating systems the combination is Windows+P, and on Macs it's Command+F1.

Both key combinations do the same thing: engage the operating system's built-in display management and set a display. Both work largely the same way. If you hold the modifier key (Windows or Command) and repeatedly press the operation key (P or F1), they will cycle through the available display options. Depending on OS version and hardware this can be several options, usually including "Primary Only", "Secondary Only", "Duplicate", "Extend", and "Off".
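To make the "press it again until the right display engages" behaviour concrete, here is a minimal sketch of cycling through display modes. The mode names come from the list above, but their order, the omission of "Off", and the function names `cycle_mode` and `presses_needed` are assumptions made for illustration; real systems vary by OS version and hardware.

```python
# Assumed cycle order; actual order depends on the OS and hardware.
DISPLAY_MODES = ["Primary Only", "Duplicate", "Extend", "Secondary Only"]


def cycle_mode(current: str, presses: int = 1) -> str:
    """Return the mode reached after pressing the operation key `presses` times."""
    index = DISPLAY_MODES.index(current)
    return DISPLAY_MODES[(index + presses) % len(DISPLAY_MODES)]


def presses_needed(current: str, target: str) -> int:
    """Count the key presses needed to get from `current` to `target`."""
    start = DISPLAY_MODES.index(current)
    end = DISPLAY_MODES.index(target)
    return (end - start) % len(DISPLAY_MODES)
```

Because the options wrap around, a machine stuck on "Secondary Only" with a black built-in screen can be recovered blind: keep pressing, pause after each press, and the picture eventually comes back.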

The Outcome

Ultimately, this experience taught me several important lessons:

  1. It's important to know *why* when doing technical work.
  2. It's important to remember that Operating Systems are abstractions of hardware functions. What you can do in one, you can usually do in another.
  3. It's bad practice to remember only procedures and not understand what those procedures are doing.
  4. It's important to document your work. Sometimes, it's when doing the documentation that everything comes together to make sense.

This all comes up because there was a similar problem at my current employer recently. An engineer has a Mac for his kiddo's school, and it was giving him some issues. No one on the helpdesk wanted to help, partly because of time constraints (fair), but, disturbingly, also because they could not understand the problem: a problem that they had solved countless times on its Windows (and *nix) counterparts. Their fear came from missing lessons 2 and 3 above.

Working in technology is a constant learning exercise. At some point, to succeed, you need to develop the confidence to take a new experience (like troubleshooting a Mac) and apply your existing skills to that endeavour.

In other words, sometimes Fucking Around and Finding Out is the best way to learn if you're up to the task or not. Just don't do that where it's critical... or in production ;)

