Archive for May, 2009

Content Management 101

Thursday, May 28th, 2009

I attended a small conference today, and a lot of the buzz was around the concept of “content management” in corporations and other entities. No, the term doesn’t refer to keeping porn off your Intranet. Instead, it’s all about figuring out what information you own and how to leverage it.

The problem, as I’ve known for some time, is that companies are drowning in information. Many corporations are saddled with information-related regulations requiring them to keep copies of “pertinent” documents for a given period. Agencies, especially state and federal groups, have to conform to Freedom of Information Act requests. Thus, they need the ability to produce documents on demand.

However, many of them have no idea what information is being stored, where it is, or what it’s related to. Generally they follow one of two rules: a) delete everything that’s not nailed down, or b) keep everything forever. The first is guaranteed to get many companies into hot water with their own employees — information like email shouldn’t just be deleted automatically after 90 days or some other arbitrary period. The second, which is apparently more common, means exponential growth in storage requirements (translation: lots and lots of disk space).

And even if you keep everything, you still need the ability to find and use it. Holding onto a critical email message from 1998 is fine, but no one’s going to know where it is unless the company has a well-defined content management system that can retrieve data as needed.

Many companies (Microsoft, IBM and Oracle, to name a few) sell fairly heavy-duty CM systems. They aren’t cheap, but they might save your company a lot of time and disk space. One vendor estimates that switching from “keep everything” to “keep just what you need” can save a 10,000-person company about a million bucks a year in storage costs. And if you’re sued, having a good CM system in place may help you deal with legal “document discovery” issues with relative ease.

If your company isn’t using such a system, it might not be a bad idea to check out the offerings.

Office 14

Thursday, May 28th, 2009

Today I attended a conference that included several discussions of soon-to-be-released software. One of the topics included a short (since it’s still under wraps) chat about what’s been known at Microsoft as “Office 14,” or the next release of MS Office. This is the version that will follow Office 2007, which is itself bloatware extraordinaire in my opinion.

The public release name for the new product will be Office 2010; the buzz I’m hearing is that it’ll be out sometime this year. Of course, in 2007 people were saying it would be shipped in 2008, so who knows what the reality of the situation may be.

According to another site, it’s now due in the latter half of 2009. Features are as yet unclear, but one interesting claim is that a tool called OffiSync “allows you to work on your Google Docs from within Office Applications.” The evaluation published on the same site says this tool also provides the ability to collaborate on documents (SharePoint, anyone?) and do other interesting tasks.

It seems clear that Microsoft is dipping its toes, or maybe a whole leg, into the cloud paradigm. There’s apparently a version of 2010 called “Office for the Web,” which sounds a whole lot like a cloud-resident version. One blogger who’s been following the product’s development posted the following excerpt from a presentation Microsoft gave.

Office 14 can be delivered to end users in the traditional manner, as a virtual application, through browser-based applications or even over a mobile phone, all through a single deployment infrastructure. Through integration with Microsoft System Center, and SharePoint Server, provisioning Office across a range of user experiences is easier than ever

Then there’s an allegation that non-Windows OSes will be supported: “Microsoft Office 14 for the web will run on operating systems other than Windows. This, of course, includes the Mac and Linux based operating systems.”

Wow. Office running on Linux? That means Redmond is acknowledging the existence of another OS! And it’s Linux, which it was dismissing as “unstable,” “too buggy,” and “unscalable” while it forked over “licensing” money to lawsuit-happy SCO in order to boost SCO’s legal battle against IBM some years back.

This is very interesting. Microsoft may have a winner: a release of Office with features that are not only snazzy, but actually potentially useful. Now let’s see how heinous the memory footprint is, and how unusable the latest UI might be. I still want the speed of Office 2003, and the old-style UI as well.

HTML 5 Looms Large

Wednesday, May 27th, 2009

For years software development companies have prophesied the advent of totally web-based applications that will remove the need for local installation, storage, and so forth. Software as a Service (SaaS) is part of that model, and various companies (like Google) have experimented with it. In some respects, it makes sense. Why have users buy hundreds of dollars worth of software they’ll use only rarely? Why chew up gigs of disk space for huge applications that could be run directly across the Internet from central servers? On the other hand, is this model stable and scalable enough for widespread adoption?

One of the problems is that these snazzy new applications are too complex for the current release of HTML, which is at version 4 and has been for some years. In the early days of the web, HTML versions changed pretty rapidly; 2.0 came out in 1995, was succeeded by 3.2 in 1997/98, then quickly surpassed by 4.0 (in reality 4.01) around the turn of the millennium. HTML 5’s working draft just came out in 2008, and many hurdles have yet to be overcome before it really becomes a usable standard.

What all this means is that no one is going to produce huge web-based applications based on the HTML 4 language. For one, it would be technically difficult if not impossible to do so due to limitations in the current version. For another, it makes almost zero sense to write new code against an ancient standard like 4.01. So the “growing sense that the Internet and browsers–rather than a computer’s operating system–will be the future foundation for application development” is just that…a sense. It’s not reality yet.

The other issue, of course, is that every browser now in use speaks 4.01 and older HTML only. Users would need to upgrade to new (and as yet unwritten) versions of IE, Firefox, Chrome, etc. in order to make use of the HTML 5 standard. And the proposed new release involves lots of interesting new bells & whistles. Apparently there are “five main HTML 5 concepts: canvas tags, video tags, geolocation, application caching and database, and Web Workers.” They’re all designed to extend the HTML 4 standard by providing new capabilities, including a scriptable drawing surface for graphics (canvas tags).

Mozilla and (I suspect) Microsoft are working on HTML 5 browsers. They’re probably hedging their bets on which portions and variants of the proposed standard will end up being adopted. I suspect we have some time to wait before real “cloud” computing becomes a reality — if it ever does. The dream of a new form of “dumb terminal” computing based on network loading of OS and software has been an I.T. wet dream since the mid 1980s. I’m not sure it’ll ever be as widely adopted as some expect.

A PC on the Cheap

Tuesday, May 26th, 2009

A week or so ago an old friend emailed me. She knows I “work in computing” (translation to non-techies: “you’re a PC and Windows guy”) and wanted to ask what machine she should buy. That’s a pretty broad question, so I asked what she wanted to do with it. Basically, she’s looking for a small system to carry to work and cruise the Internet during breaks and quiet periods, so I suggested a Netbook or other small laptop. She’s happy with that concept, since she can get a decent machine for $300 or so.

But the question had me thinking about PC hardware in general, and what’s available for short money these days. As usual, the main issue is the same one I asked her: “what do you want to do with it?” The answer helps determine how much you really need to spend. I also find that most home users really don’t need cutting-edge systems, since they don’t do work that significantly stresses the CPU or disk. Unless you’re a hard-core gamer, developer, or audio/video producer you can probably get away with last year’s model (or even older units) at a fraction of the cost of a new system.

For instance, a quick search on eBay resulted in soon-expiring auctions for refurbished Dell Optiplex GX520 Desktop systems in the $100-150 range. That’s a system with a 3GHz Pentium 4 CPU, 1GB of RAM, a 250GB disk, DVD burner, decent video chipset, and XP Pro. I’d beef that up to 2GB of RAM just to improve overall performance a bit, but otherwise it would make a perfectly usable system for many home users. Add a monitor, keyboard, and mouse and you’re ready to go. If you already own those accessories, chances are you can re-use them with a new system.

Another alternative, if you don’t trust eBay or a third-party refurbishing company, is the “outlet centers” many manufacturers operate. Lenovo, Dell, HP, and nearly everyone else sell new-in-the-box or refurbished systems with full warranties and far more current hardware. Recent prices on these outlet stores are in the $300-500 range for Intel Core 2 Quad - 2330 systems with a 320 GB SATA Hard Drive, 2GB RAM, DVD, and Vista Home Premium.

You don’t need a cutting-edge system to run a browser and email program. Of course that’s not what manufacturers want you to think, and obviously you shouldn’t buy an older system if you need high performance. But if you’re looking for a basic system, why spend money on the best?

No Longer Your Parents’ TV

Monday, May 25th, 2009

I’m old enough to remember the days of dial-tuner TVs (19″ was huge, 25″ was nirvana) capable of receiving exactly 12 channels. For those of you too young to remember, these were VHF channels 2-13. Of course, those were also the days when you were lucky if 3 or 4 channels were available in your area. My, how things have changed.

I just bought a new set, replacing a 46″ rear-projection HDTV I bought in 2002 — yes, I was an early adopter. The old unit was in perfect shape, but was too big for the available space in my new house. It’s now been donated to a senior citizens home, where I hope it sees good use. This particular unit was exceptionally “modern” for its time, as one of the first units with an HDMI input.

The replacement is another Sony, also 46″, also HDTV of course. Not only does it have multiple HDMI inputs (7, to be exact), but it’s also a 120Hz 1080p unit that hangs on the wall. It uses about half the power, and is Energy Star 3.0 compliant. It adjusts its backlighting to match ambient room light. And I’ll bet it has a much wider viewing angle. The rear-projection unit was limited in this respect.

One thing that attracted me to this particular model is that it has “a PC input (HD-15 pin) [that] offers the added versatility of using your HDTV as a computer monitor.” So I can grab a decent spare PC, put in a really good video card, put it on my home network, and watch videos on Hulu or other services on a screen that’s 5 times the size of my standard PC monitor. It also takes Blu-ray and any number of other formats, and can receive various types of USB and memory cards.

Today’s models are not just a “TV” anymore — instead they’re multimedia output devices that can be used for a wide range of entertainment and educational purposes. The sad thing is that, even with 200 channels to choose from, there’s still nothing to watch!

2000 Movies on a DVD?

Thursday, May 21st, 2009

The expansion in capacity available for portable mass storage devices over the last few decades is astonishing. The 2GB thumb drive I carry on my keychain holds well over twice the space that was installed on a room-sized VAX system I managed in the mid 1980s, and that system used 16″ platters in drives that weighed in at about 100 lbs each.

The first floppies I ever saw were 5.25″ 180KB (yes, Kilobytes) types, also in the early 80s. Some readers may remember the progression from “single sided, single density” to double density, then double sided double density (DS-DD) models. Those were, I believe, 360 and 720KB respectively. Then we achieved what we thought was Nirvana with 3.5″ HD 1.44MB floppies. Anyone who installed early versions of Windows or Office using stacks of 3.5″ disks (I recall over 40 being used in one release) was certainly thinking “there has to be a better way.”

Segue to today, when we carry 50GB on an iPod that fits in a pocket. DVDs can handle several GB, and now several very clever Australian researchers have announced a new dimension in storage. Rather than squeezing a few extra bits into the existing technology, thereby maybe increasing capacity by a few percent, they appear to have changed the medium completely.

Whereas “standard DVDs are made with three spatial dimensions, the Aussie researchers added two more.” In doing so, they made a quantum (almost literally) leap in technology. It appears they can store up to 2000 full-length movies on a single enhanced DVD.

Let’s think about that. Say a full-length movie takes a paltry 1GB, or roughly a fifth of a single-layer DVD’s 4.7GB capacity. Given the new technology, that’s 2000GB, or 2 Terabytes (TB) per enhanced DVD. And if their predictions are correct, we may have these discs in our hands in 5-10 years. Of course, cost is another matter that has to be considered. These things will not be cheap to produce or burn, and I suspect they’ll be pretty sensitive (at least initially) to damage.
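
For anyone who wants to rerun that back-of-the-envelope math with different numbers, here is a small Python sketch. The 1GB-per-movie figure is the guess made above, not a measurement; decimal units (1TB = 1000GB) are assumed, and the function name is made up for illustration.

```python
# Back-of-the-envelope capacity check; every input is an assumption from this post.
GB_PER_TB = 1000  # decimal units, as printed on drive packaging


def disc_capacity_gb(movies: int, gb_per_movie: float) -> float:
    """Total capacity in GB needed to hold the given number of movies."""
    return movies * gb_per_movie


total_gb = disc_capacity_gb(movies=2000, gb_per_movie=1.0)
total_tb = total_gb / GB_PER_TB
thumb_drives = total_gb / 2  # the 2GB keychain drive mentioned earlier

print(f"{total_gb:.0f} GB = {total_tb:.1f} TB, or {thumb_drives:.0f} 2GB thumb drives")
```

Change `gb_per_movie` to something like 4 for a less compressed movie and the same disc would need to hold four times as much, so the headline number depends heavily on how big a "movie" is assumed to be.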

That’s okay, though. I remember when burning a single-speed CD (all 650MB) required lots of special handling, a blindingly fast Mac II, and a completely clean hard drive. If you didn’t meet all the conditions, the burn just failed. Nowadays, I can slap in a CD and burn at 16-32x while visiting websites and running Second Life. I’m pretty sure any technological barriers to this technology will be overcome as well.

Problems in the Darnedest Places

Wednesday, May 20th, 2009

A few weeks ago I mentioned my primary Windows PC was randomly slowing down for no apparent reason. I went through numerous diagnostics, including removal of various bits of software (BitTorrent DNA was apparently part of the problem) but the issue persisted. To recap: on random occasions I’d try to open an application, file, or folder and suddenly the disk I/O light would come on solidly for up to 5 minutes. Logging into the system after a reboot took up to 15 minutes.

For a while, I thought I’d been hit by malware of some type. But I’m very paranoid about what makes it onto my system and have run Zone Alarm for several years with no incidents. I finally isolated the problem today, and the steps taken might be useful for others with an intractable issue like this.

The first step was to open Task Manager. Then I worked as usual until the disk I/O light came on and the system hung. At that point I checked active tasks, and clicked on the CPU column twice to re-order the applications. I wanted the processes using the most CPU at the top so I could see what was happening. This didn’t show anything unusual, but I did notice that vsmon.exe (Zone Alarm’s scanning process) was taking a steady 2-5% of CPU time when the system was hung. So, on a hunch, I added disk-related columns (disk read and write) to Task Manager’s output using its View->Select Columns option.

This showed that the vsmon.exe process was performing huge numbers of read operations while the system was otherwise hung. Now I was onto something, but I needed to know which file it was accessing. That’s easy. The nice guys at Sysinternals have a utility called FileMon (very much like the lsof utility on UNIX) that shows, in real time, which files each process is accessing.

Running this utility showed that vsmon.exe was constantly re-reading a game patch I downloaded a month or so ago. This file is 1.3GB in size. When I rebooted, scanning this file could take 10 minutes (rendering the system unusable during that time). Periodically, vsmon.exe apparently decided it needed to re-scan the file to see if anything had changed. I have no idea why, but suspect Zone Alarm somehow flagged the file as suspicious.

The fix was simple. I deleted the game patch, which I’d already installed anyway and no longer needed. The problem is now totally gone…applications open like lightning, and there’s no more disk thrashing. Why was the file flagged, and why is vsmon.exe so paranoid about it? Your guess is as good as mine.
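
If you would rather script the "who is hammering the disk?" check than stare at Task Manager, the idea can be sketched in a few lines of Python. This is only an illustration: the snapshot format, the function name, and the byte counts below are all made up, and how you collect the per-process read counters on your own system is left out.

```python
from typing import Dict, List, Tuple

# pid -> (process name, cumulative bytes read); this shape is hypothetical
Snapshot = Dict[int, Tuple[str, int]]


def rank_by_reads(before: Snapshot, after: Snapshot) -> List[Tuple[str, int]]:
    """Return (name, bytes read between the two snapshots), busiest reader first."""
    deltas = []
    for pid, (name, read_after) in after.items():
        # A process that started after the first snapshot counts from zero.
        read_before = before.get(pid, (name, 0))[1]
        deltas.append((name, read_after - read_before))
    return sorted(deltas, key=lambda item: item[1], reverse=True)


# Made-up numbers, purely to show the shape of the result.
before = {101: ("vsmon.exe", 5_000_000), 202: ("explorer.exe", 1_000_000)}
after = {101: ("vsmon.exe", 905_000_000), 202: ("explorer.exe", 1_200_000)}

for name, delta in rank_by_reads(before, after):
    print(f"{name:15s} {delta / 1_000_000:8.1f} MB read")
```

Taking two snapshots and comparing them matters because the counters are cumulative: a process that read a lot at boot but is idle now would otherwise look like the culprit.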

Darwin Online

Tuesday, May 19th, 2009

Amazingly enough, if you look deeply enough on the Internet you can actually find worthwhile, educational material. I know that’s shocking, what with all the porn and financial scams one could peruse first. However, here’s the link (iTunes required) that will take you to the lectures. They’re a set of 8 public events commemorating Charles Darwin’s 200th birthday, and they aren’t bad. Apple iTunes hosts these, so you can download them free of charge.

This is the type of information sharing for which, startlingly enough, the Web was actually created all those years ago. The first uses of hyperlinking and embedded data were related to scientific study, and were designed to get users from text matter in a document body to footnotes or even a link to another document with related data. Now we’ve evolved (sorry, had to) into a more sophisticated model in which we’re not just linking flat ASCII text, but whole multimedia productions.

The cool thing about this case is that access and storage for this purely academic, non-profit lecture series is being provided by an application usually used to download music for which there’s a cost basis. However, the people at Apple are also providing free access to certain well-deserving works. The Darwin lectures certainly qualify. Here’s hoping we see more such lectures made publicly available from other institutes or universities.

There’s also PLoS, the Public Library of Science. This is a web site where academic research papers are freely available, rather than encapsulated in journals that cost $10,000 a year to subscribe to (many are read by no more than a dozen researchers). So if you really want to learn about Darwin and his theories — and I will note that most people who claim to understand them are totally wrong — here’s a good place to get started. The lectures are interesting, they’re not dry or boring, and they actually show what it’s all about.

Maybe you’ll learn something. High technology being used to provide free academic access. It can’t get any better than this.

When it Rains, it Pours…Software

Monday, May 18th, 2009

Sometimes events arrive in rapid-fire succession after a long period of relative calm. Now it turns out Microsoft apparently has some major releases coming out of the development pipe within the next year. Redmond will be a busy place, what with Windows 7, Windows Server 2008 R2, and Exchange 2010 (what?!) hitting at more or less the same time.

This will cause a lot of work for Support personnel who have to deal with the inevitable deluge of bugs and enhancement requests. At the same time, IT people at large client sites will need to strategize about deployment timetables and rollout methodologies. Most companies don’t allow employees to install new software on their own these days, you see. Instead they have a few early adopters (maybe sales and consulting) who need the latest and greatest on their desks at all times. IT staff also test major new releases in isolated environments to ensure no major compatibility problems exist with other in-house software or websites.

And right after all this hits, Office 2010 takes the stage. I’m not sure how many people will care, since Office 2007 was a major resource hog. It offered nothing new, aside from a fancy new menu system no one understands and a new document format that shreds compatibility with other platforms. Here’s hoping 2010 provides an option to revert to the 2003-style menus.

Also interesting is that a whole raft of code-named projects are due to hit the streets fairly soon. According to the buzz, these include a “new Application Server technology for Internet Information Server (Dublin); a client console for Forefront security software (Stirling); a distributed cache system for clustering technology (Velocity); and a componentized version of Windows Embedded for devices (Quebec).”

Some of these technologies will be used only in a few rarefied environments, like high-end server farms and development shops. Others, like the Application Server, will be hidden behind websites as middleware. Redmond is going to be a busy place. Hope they haven’t laid off or “reassigned” all the Support staff.

Google’s Glitch

Thursday, May 14th, 2009

As many people on Google’s Gmail and search services certainly noticed, the huge provider experienced a severe service provision problem between roughly 10:30 and 11:30AM Thursday (US Eastern time). The issue, which apparently involved a routing problem that pushed too much traffic toward servers based in Asia, caused delays in the all-important search function, as well as problems on YouTube and other services. Fundamentally, “many Web sites took twice as long to load and were twice as likely to fail during Google’s disruption” according to one report.

The main issue here is that Google has largely become the go-to provider for search and much else. Since its services are “used by hundreds of millions of people, even a breakdown affecting a small percentage of its audience can have a huge impact. Google’s search engine, by far the most popular on the Internet, fields more than 9 billion monthly search requests in the United States alone.” Ergo, if Google goes down — even for a short period — a lot of services simply stop working.

As is the case with many large Internet and technology companies in general, the company has distributed its services worldwide to guard against a major meltdown at a particular facility. However, in this case nothing actually went down. Instead, a routing issue overloaded one data center until the problem was corrected. Basically, they’re “doing it right” from an overall design standpoint, but somehow managed to put too much of a load on one location.

The bad thing is that this interrupted services worldwide for about an hour, and certainly caused a lot of user frustration. The good thing is that annoyed users simply did the right thing by going elsewhere. Rather than using Google, they used Yahoo or another search engine until the problem was resolved.

Another good thing (though bad from a production computing standpoint, since outages like this are embarrassing) is that this sort of problem helps big companies test the resiliency of their services. Companies can run as many simulated disaster drills as they want; sometimes it takes a real emergency to find holes in the recovery process and areas for improvement.

Will someone lose their job over this? Possibly. Will Google learn from it? Hopefully.