Archive for the ‘Troubleshooting’ Category

Firefox’s Stability?

Thursday, October 22nd, 2009

Today several colleagues started talking about something I had, until the conversation started, taken for granted: the stability of the Firefox browser. It seems that some people are experiencing frequent crashes, hangs, and other annoying behaviors. Several have taken the radical (and undesirable) step of removing the application completely, migrating to another product like Chrome or Safari.

If anyone from Microsoft is reading this, don’t think it means IE is better. The same people said they disliked Microsoft’s product as “too clunky,” “too slow,” or just plain annoying. I don’t know which version they were referring to, but since most of the discussion was from science writers I doubt they were using some ancient and unsupported version. These folks tend to be pretty current with their technology.

The whole discussion made me curious, since I (and many others) use Firefox daily with absolutely no problem whatsoever. Many suggestions were made in terms of possible diagnostics or causes, so it only seems reasonable to pass them along. After all, if a few writers are experiencing browser problems it seems likely others are as well.

First, many users suggested specific websites might be more problematic than others. One noted that (of all things) scientificamerican.com relied on lots of heavy mouse-over advertisements and odd popups that could cause browser problems. The solution (if you’re using Firefox) is to install the wonderful AdBlock plugin, which should disable most, if not all ads found on these websites. Installing any ad blocker might help, but this seems to be one of the best. Plus, it’ll save bandwidth by preventing the download of all that cruft.

Second, a plugin like NoScript (also Firefox only) could easily cause slow page loading in some cases. This is because whole sections of a site might be blocked by the NoScript application; if these are somehow critical to the page, bad things might happen. This said, NoScript is a very handy application that can easily save your PC from infection by a bogus website. So don’t disable it unless you’re really certain a given site is safe.

Next, it’s possible that cruft (junk) from older installations is causing Firefox crashes. One user said they’d totally cured their browser-crash problem by completely uninstalling and re-installing Firefox 3.5.3.

All in all, Firefox is a great browser. But like any piece of software, it can run badly in certain contexts. Outside applications, not to mention viruses or other malware, could affect its stability. If you’re having browser problems, try to isolate the type of crash you’re having (e.g. is it triggered by a certain website or action?). Hunt around the net for help. Ask a friend. Make sure your copy isn’t badly outdated or no longer supported. Try removing and re-installing it.

If nothing else works, try another browser. If that also crashes, you have much bigger problems to worry about.

Obscure and Difficult Troubleshooting

Tuesday, September 22nd, 2009

Recently a friend emailed me with an odd problem. His Windows XP (SP3) box had suddenly lost its ability to use USB devices, and he couldn’t figure out why. This guy has been in the field for a long time and has performed many troubleshooting tasks on all sorts of unusual hardware. If he was stumped, the problem was fundamentally obscure or just plain weird.

The basic symptoms were as follows: connect a mass storage device (external drive, point-and-shoot camera, etc.) to the PC, and either it would appear as an “unknown device” or the system would lock up. Occasionally a device would be recognized, then would just as suddenly become inaccessible again. Deleting and rebuilding the USB device database by removing devices from Device Manager had no effect, nor did other suggestions like disconnecting the machine from power for half an hour or resetting the system BIOS.

He finally decided that maybe the built-in USB device hardware was defective, so we found a PCI-based replacement card and he gave it a go. Same result. But this experiment actually uncovered the root cause and resolved the issue.

He’d noticed his Webcam worked just fine if it was connected by itself, and thought maybe the problem involved mass storage drivers in Windows (he’d also just upgraded to SP3). Then, he unplugged the cable he was using to connect his disks…and suddenly the Webcam came back to life. On a hunch, he threw that cable aside and grabbed a spare. Suddenly all devices were working just fine.

A $5.00 USB cable had gone bad for unknown reasons, throwing his whole system into an unstable state. This exercise probably cost him a full day in terms of troubleshooting time, plus the cost of a new USB card and the hair he’d pulled while diagnosing it.

The moral of the story is simple. Don’t discount any piece of hardware, no matter how inconsequential it might seem, when your system starts acting up. A terminator, a cable, or a duff connector can ruin your whole day.

You Too Can Repair Things

Tuesday, September 15th, 2009

We live in a throw-it-away-and-buy-another-one world these days. People regularly bin items that, only a decade or so ago, would have been considered way too valuable not to repair. Cameras, cell phones, PCs, and other bits of electrical gear (not to mention cars) are tossed out as unrepairable, or too expensive to warrant a repair.

Personally, I think part of this attitude is fostered by manufacturers. Making people believe something can’t be repaired easily or cost effectively means they’ll sell more goods. And if they set repair costs so high that buying another unit is cheaper to the consumer than the repair estimate, then their markets are more or less guaranteed.

I’m mentioning this because, as I wrote a week or two ago, I ended up buying a new digital camera because my old one (bought in 2005) accidentally fell off the bumper of my car. Long story. But the unit still worked — all that happened was that the screen on the back was cracked — and I kept looking at the forlorn piece of electronics sitting on my desk. Would it really be that difficult to repair? And could I even find the parts…surely Sony wouldn’t sell me a new screen, or they’d charge five times its value.

Hello Ebay. I decided to search for the camera’s model number and manufacturer, just to see if someone had another broken unit I could pick up for parts. Five minutes later I’d found a small company that apparently specializes in buying up old digital cameras and other devices for pennies on the dollar. They cannibalize them, test the parts, and sell them on the cheap. A $20 Paypal payment later, and a used/tested replacement screen was on its way to my house.

It arrived today. I pulled out a tiny jeweler’s screwdriver, along with a small set of pliers used for fine electrical work, and started the task. I wasn’t sure how hard it would be to disassemble the case, but it turned out to be a simple job involving 5 tiny Phillips screws. Off came the back of the case, revealing the cracked screen. It was connected to the camera’s electronics with 2 small ribbon cables, easily manipulated with the pliers.

Ten minutes later, my old camera was working again with its shiny new screen. For $20 and a few minutes’ work, I fixed a “complex” piece of electronics while sitting at my desk. Obviously, more serious damage (e.g. a broken case or fried main board) probably would have made the job far less worthwhile, but this shows the average semi-competent person can still do their own repairs if they want to. So don’t pitch that “broken” item too quickly. With a little research and some careful work, you might save yourself a whole lot of cash.

The Wrong Way To Correct Performance

Wednesday, August 5th, 2009

Earlier today, I was astonished by a Facebook posting by a friend, who basically said her PC was performing badly…so she wiped and reloaded the whole OS. Now she’d realized how many applications were installed, and was lamenting all the time she’d have to spend re-installing them.

The reason this posting caught my eye was that another friend once told me he used the same method as his “tried and true” solution to any Windows problem. Lose a file association? Wipe and re-install. Machine running a bit slower than you’d like? Same solution. What an incredible waste of time. There are far better methods that are significantly less labor intensive and time consuming.

Apparently many people are convinced that any slowness in Windows must be the result of a virus or other piece of malware. This simply isn’t true (though I’ll bet millions of users have infected systems and are completely unaware of the fact). Well written viruses are stealthy. They won’t slow down systems so badly that owners are tempted to search for problems.

Badly written viruses are, of course, another ball of wax. They, like any other defective piece of code, could cause massive performance hits or repeated Blue Screens of Death.

I’ve suggested the use of a good Registry cleaner and a disk defragmentation tool (even the built-in Windows version is pretty good) on many occasions. These tools, along with a decent firewall to keep bad guys away, are still your best line of defense in terms of performance preservation. Machines degrade over time due to fragmented disks and bogus/unneeded Registry entries. They need maintenance, just like a car or any other electro-mechanical device.

You (hopefully) don’t swap out your car’s engine every time the oil is due to be changed. That’s what the “wipe and rebuild” method equates to, and it’s massive overkill. Install the right utilities, back up your disks regularly, and your system will effectively maintain itself.

Advanced Disk Repair

Monday, July 6th, 2009

Recovering from a hardware failure is not always easy or straightforward, as my recent disk woes have shown. When the system disk began throwing errors, the obvious course of action was to transfer the contents of the failing drive to a new unit. I’ve done it before, but this time it wasn’t so easy.

Let’s review. The drive was throwing bad block errors in Event Viewer, and occasionally the whole system would freeze with a solid disk activity light. No data was being lost. Diskeeper was reporting no errors. Chkdsk, however, would not complete (it would hang, and I even allowed the system to sit and process for 8 hours with no results). Attempts to migrate the disk using Partition Magic failed for the same reason — solid activity light, no progress. Even repairing Windows failed to cure the problem.

The new Seagate disk (500GB SATA Barracuda) came with Seatools and Seagate’s own disk analysis/migration tool. Surely one of these would identify and correct the problem? No. They also failed, hanging in the same place. It was obvious a serious analysis/recovery tool was needed.

While researching the problem, I’d run across a few articles that mentioned Spinrite, a highly regarded disk analysis tool from Gibson Research. It’s been around since the 1980s, when we used it in the university’s PC repair shop. A friend who happened to stop by also mentioned it, and I decided it was worth a shot. If it didn’t cure the problem, nothing would and I’d simply have to re-install from scratch on the new drive.

Running Spinrite on Level 2 disclosed no errors, so I started it again on Level 4. In this mode, it completely reads and re-writes every bit on the disk in order to check for bad sectors. It ran along merrily until it hit the bad sectors, then slowed to a crawl–but progress was being made. As it was the holiday weekend, I let it run overnight. By morning it had completed, after finding and correcting 5 subtle errors in two sectors on the disk.

Voila! I was able to boot the system with no hangs, create a fresh new image backup of the whole drive, and clone the disk onto my new SATA unit. The machine lives, the old disk is on its way back to Seagate, and the problem is resolved. But it took about 3 days’ work to find and isolate what turned out to be a very unusual failure.

This is why repairs can be expensive, and why you should keep good backups.

PC Resurrection

Friday, July 3rd, 2009

As most of you know, I’ve been having problems with my primary Windows PC’s C drive. It’s turned out to be a fairly complex problem, and the solution has been challenging. The process also shows how easy it is for multiple problems to turn up simultaneously.

The initial problem was slow read performance on the C drive (lots of solid disk activity lights, random slowness), which was finally traced to bad blocks on the disk. Since that’s an easy swap, I ordered a replacement 500GB Seagate drive, opened the case, connected it to a spare SATA port, and powered on the system. It would no longer boot, and wasn’t even running its self tests. I theorized that the board itself was going bad, and this was causing the disk errors.

Since the board was a 2003 model, a full upgrade seemed useful. My new Intel board and 3.0GHz processor, along with 2GB of 800MHz Corsair memory, arrived a few days later. After a quick hardware swap, I discovered the system disk had apparently lost some Windows executive files. I really wanted my original OS back and didn’t just want to re-install, so I decided to try an experiment.

I put in the original XP installation CD and booted the Recovery Console. I tried both fixboot and fixmbr to see if they’d restore the drive’s boot blocks, but they were inadequate to the task. I booted the XP CD again and told it to Repair the existing Windows installation. An hour later, the system booted on its own. Phase one was accomplished.

I then installed the new motherboard’s drivers and a shiny new GeForce 9500GT SLI video card, and checked performance again. The original disk is still throwing errors, so we now know the problem was probably a combination of a failing motherboard and a flaky disk. The original drive is now being cloned onto a brand new 500GB Seagate Barracuda. Hopefully this will be the end of the diagnostic process and my (completely refurbished) system will be back to normal.

The object lessons are as follows. First, good backups (which I have) are key. Second, not all performance issues are software related. Event Viewer can be your friend. And keep a copy of a partitioning package (I use Partition Commander) around at all times. It might just save the day when a disk decides it’s fed up with life.

When Disks Go Bad

Monday, June 22nd, 2009

Recently I’ve been having a difficult performance issue on my primary Windows PC. Basically it’s involved sporadic system hangs with lots of disk activity that didn’t seem related to a specific application. The HD activity light would come on for 5-15 seconds, hanging the machine. Then it would recover and all would be well for a random amount of time.

I put the system through all the usual checks — spyware, viruses, and so forth. Nothing showed up (as it should not, since the machine is pretty heavily protected). I tried shutting down various services and applications, like the firewall and various System Tray applications. No effect. Next, I updated video drivers and made sure there were no known issues involving compatibility or Windows updates that occurred recently. The problem persisted.

Finally it occurred to me that I was over-thinking the problem, and that it might lie at a much lower level. So I opened Event Viewer, cleared all the logs, worked for a while, then opened the System log and took a look. The problem was immediately visible: a disk is going bad. The log is showing multiple cases of bad blocks on HardDisk0, which is the system drive. That’s not good, and it definitely explains everything. Now the problem is to get a new disk, open the box, hook up the new drive temporarily while the old one is still in place, and use Partition Magic to clone the partitions from the failing drive onto the new one.

Of course, I need to do all this before the existing disk decides it’s time to go to the Great Silicon Graveyard. In the meantime, I’m trying to pull a backup from both partitions on the failing drive, just in case it fails completely before the replacement shows up in the mail.

The lesson is clear: you can’t blame all performance problems on spyware or disk fragmentation. Sometimes the problem is much more fundamental. If you’re having a problem like this, check the basics. Make sure no errors are showing up in the system logs, or in another hardware-related location. The data you save may be your own.
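For anyone who would rather not eyeball the System log, here is a minimal Python sketch of the same check. It assumes the log has been exported to text, one event per line, and that a bad-block event mentions the device path followed by "has a bad block". The sample lines and their comma-separated layout are made up for illustration; a real export may look different.

```python
import re
from collections import Counter

# Assumed wording of a bad-block message; adjust to match your own export.
BAD_BLOCK = re.compile(r"\\Device\\Harddisk(\d+)\S*\s+has a bad block", re.IGNORECASE)


def count_bad_blocks(lines):
    """Return a Counter mapping disk number -> number of bad-block events."""
    counts = Counter()
    for line in lines:
        match = BAD_BLOCK.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Hypothetical exported log lines (not taken from a real machine).
sample_log = [
    r"Error,Disk,The device, \Device\Harddisk0\D, has a bad block.",
    r"Information,Service Control Manager,The service entered the running state.",
    r"Error,Disk,The device, \Device\Harddisk0\D, has a bad block.",
]

bad = count_bad_blocks(sample_log)
for disk, hits in sorted(bad.items()):
    print(f"HardDisk{disk}: {hits} bad-block event(s)")
```

Any disk that shows up in the output at all is worth backing up right away.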

Problems in the Darnedest Places

Wednesday, May 20th, 2009

A few weeks ago I mentioned my primary Windows PC was randomly slowing down for no apparent reason. I went through numerous diagnostics, including removal of various bits of software (BitTorrent DNA was apparently part of the problem) but the issue persisted. To recap: on random occasions I’d try to open an application, file, or folder and suddenly the disk I/O light would come on solidly for up to 5 minutes. Logging into the system after a reboot took up to 15 minutes.

For a while, I thought I’d been hit by malware of some type. But I’m very paranoid about what makes it onto my system and have run Zone Alarm for several years with no incidents. I finally isolated the problem today, and the steps taken might be useful for others with an intractable issue like this.

The first step was to open Task Manager. Then I worked as usual until the disk I/O light came on and the system hung. At that time I checked active tasks, and clicked on the CPU column twice to re-order the applications. I wanted the processes using the most CPU at the top so I could see what was happening. This didn’t show anything unusual, but I did notice that vsmon.exe (Zone Alarm’s scanning process) was taking a steady 2-5% of CPU time when the system was hung. So, on a hunch, I added disk-related columns (disk read and write) to Task Manager’s output using its View->Select Columns option.

This showed that the vsmon.exe process was performing huge numbers of read operations while the system was otherwise hung. Now I was onto something, but I needed to know which file it was accessing. That’s easy. The nice guys at Sysinternals have a utility called FileMon (very much like the lsof utility on UNIX) that shows, in real time, which files each process is accessing.

Running this utility, it turns out vsmon.exe was constantly re-reading a game patch I downloaded a month or so ago. This file is 1.3GB in size. When I rebooted, scanning this file could take 10 minutes (rendering the system unusable during that time). Periodically, vsmon.exe apparently decided it needed to re-scan the same file again to see if anything had changed. I have no idea why, but suspect Zone Alarm somehow flagged the file as suspicious.
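The Task Manager trick boils down to comparing per-process read counters before and during a hang, then looking at whoever grew the most. Here is a rough Python sketch of that comparison. The two snapshots are invented numbers, and the function name is mine. On a real system the counters would have to come from Task Manager's I/O columns or a similar tool.

```python
def rank_by_reads(before, after, top=3):
    """Rank processes by growth in cumulative read operations between two snapshots."""
    deltas = {name: after[name] - before.get(name, 0) for name in after}
    # Largest increase first: that process is doing the disk reading.
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical snapshots: process name -> cumulative I/O read count.
before = {"firefox.exe": 12000, "vsmon.exe": 50000, "thunderbird.exe": 8000}
after = {"firefox.exe": 12040, "vsmon.exe": 93500, "thunderbird.exe": 8010}

ranking = rank_by_reads(before, after)
for name, delta in ranking:
    print(f"{name:16} +{delta} reads")
```

Ranking by the change rather than the raw total matters, because a long-running process can have a huge cumulative count without being the one hammering the disk right now.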

The fix was simple. I deleted the game patch, which I’d already installed anyway and no longer needed. The problem is now totally gone…applications open like lightning, and there’s no more disk thrashing. Why was the file flagged, and why is vsmon.exe so paranoid about it? Your guess is as good as mine.

XP Behaving Badly, Part II

Tuesday, May 12th, 2009

Yesterday I blogged about a performance problem on my XP box that was proving difficult to track down. The last stage in the game involved cleaning up Zone Alarm’s list of permitted applications. My hunch was that it might have grown too large as the result of repeated instances of adding and removing programs over the years.

Sadly I have no way to confirm this scientifically through repetition, but cleaning out the list of permitted apps seems to have made a massive difference in performance. I simply opened Zone Alarm’s list of applications and started deleting entries I knew were outdated or no longer installed on the system. I also knew that the firewall would ask again for permission if I happened to delete an entry that was still active, so I wasn’t too worried about making things worse.

It’s now been 24 hours, and the “30 second delay” problem has not reappeared. I can switch among active applications with no delay whatsoever. The system appears much quicker overall, even when starting new applications.

What’s the explanation? I suspect Zone Alarm allocates a certain amount of RAM as cache space for the permitted application list, and loads as many entries as possible into memory when it starts up. This makes sense, since it would improve performance by eliminating the need to read a new entry from disk every time an event occurred.

However, if there’s a limit to the amount of allocated memory, what may have happened is that Zone Alarm had to go back and reload the cache periodically. Depending on how the process is designed, such activity could cause a fairly significant delay when switching applications. I’m not sure this is the actual explanation, but the Zone Alarm vsmon.exe process was definitely consuming CPU (2-4% on average) during the delay period. Now it almost never shows up as anything but 0% on Task Manager.
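To be clear, I have no idea how Zone Alarm is actually built. But the guess above is easy to illustrate with a toy model in Python: a fixed-size cache that evicts the least recently used entry, with every miss standing in for a trip to the disk. The capacity, entry counts, and names below are all invented for the sketch.

```python
from collections import OrderedDict


def count_disk_reads(lookups, capacity):
    """Count cache misses (pretend disk reads) for a least-recently-used cache."""
    cache = OrderedDict()
    misses = 0
    for key in lookups:
        if key in cache:
            cache.move_to_end(key)  # mark as recently used
        else:
            misses += 1  # entry not in memory: would be read from disk
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return misses


def lookups_for(entry_count, rounds=10):
    """Simulate walking through the whole permitted-application list several times."""
    return [f"app{i}" for _ in range(rounds) for i in range(entry_count)]


small_list = count_disk_reads(lookups_for(50), capacity=100)
large_list = count_disk_reads(lookups_for(150), capacity=100)
print("50 entries:", small_list, "disk reads")
print("150 entries:", large_list, "disk reads")
```

In this model, a list that fits in the cache is read from disk once per entry and never again. A list that is only somewhat too big misses on every single lookup, which is the kind of cliff that would make pruning the list feel like a dramatic fix.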

Keep this incident in mind if you’re running the Zone Alarm suite and experience delays on your PC. A bit of housekeeping might correct the problem. You could also try shutting down the firewall temporarily to see if it changes anything. If it does, and if the performance problem shows up again when you restart the firewall, you’re probably on to something.

When troubleshooting a system (or, for that matter, a car or any other device), the ability to toggle a behavior at will is a good sign you’re close to the source of the problem. It’s the scientific method at its best.

XP Behaving Badly

Monday, May 11th, 2009

Recently my XP machine started misbehaving, and I have yet to track down the problem. The diagnostic path has been interesting though, and it shows how misleading some behaviors can be.

The problem first manifested itself a week or so ago, about the time I was installing the BitTorrent client, as noted in an earlier article. Part of the performance problem, which showed up at boot time in the form of a massive delay in system startup, was traced to the BitTorrent DNA application. The system has been booting normally since I removed this troublesome piece of code. The problem now is that, in many cases, switching among active programs has become appallingly slow. Also, in some cases it takes far longer than normal to start some applications. But there’s no discernible pattern.

For instance, usually I have both Firefox and Thunderbird active, with the email client in the foreground. If I click on the Firefox window to bring it to the front, the disk activity light can come on solidly for up to 30 seconds before the application switch occurs. During this period, Thunderbird is still accessible (I can switch back to it just fine). But Firefox appears hung, until suddenly its window again becomes active. The same happens with other program combinations, so it’s not isolated to a specific application.

One problem was disk fragmentation. Several months ago I’d installed Diskeeper 2009, and all 3 local disks were set to automatic (background) defragmentation. I opened the Diskeeper manager, only to find that this setting had somehow — I suspect a Windows update — been changed. The C drive was a mess, but has been cleaned up. The application switching delay persists. Zone Alarm shows no viruses or other malware.

On a hunch, I opened Zone Alarm’s Program Control center and removed literally hundreds of old entries from it. Every setup program, installer, and other temporary application leaves an entry behind in the “permitted application” list, and it had grown significantly over time. We will see if pruning it has a positive effect on performance.

Diagnosing performance issues often isn’t easy. And system slowness isn’t always caused by viruses or other malware.