Sun Microsystems - Low-Cost High Availability

Sun Microsystems is one of the world's leading producers of computing systems and is the originator of many groundbreaking technologies such as Java. One of their key media customers needed a way to loosely cluster a large number of workstations and servers so that database jobs could be replicated across a number of remote sites, guaranteeing that a database sort would complete even if groups of systems failed or became unavailable in a given location.

Project Requirements
Sun Microsystems' customer had many small servers and workstations that were often heavily underused. At the same time, the customer could not afford to build a new cluster infrastructure or buy in off-the-shelf solutions. The customer had a very large library of standard analysis routines that produced specialised reports. At many offices, local versions of these reports existed, customised for specific end customers. The number of programs in the library, and the specifics of how they ran database jobs and produced reports for customers, meant that making wide-scale changes to in-house applications would be neither possible nor cost-effective.

 

The Solution
Layer3 carried out an analysis to see how a short-term quick win might be achieved that would make use of existing systems and minimise change to programs and local working practice. We quickly established that, whilst there was great variability within the library of programs and many differences in the order in which each office might use programs from the library, the way the programs were actually run was very similar at every location.
Layer3 created a job control system that allowed multiple copies of jobs to be run on multiple remote systems. This job control system allowed underused systems in different time zones to cooperate in producing results. If any given system failed there would be a number of others running the job.
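
As an illustration of how copies of a job might be spread across sites, the sketch below picks target systems by taking one host per site in turn, so that the loss of any single location leaves other copies running. It is a minimal sketch under stated assumptions, not Layer3's implementation: the class name ReplicaPlanner, the method plan and the site and host names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: chooses which systems should each run a copy of a job.
final class ReplicaPlanner {

    /**
     * Picks up to `copies` hosts, taking one host per site in turn so that
     * the copies land in as many different sites as possible.
     */
    static List<String> plan(Map<String, List<String>> hostsBySite, int copies) {
        // One cursor per site, in the map's iteration order.
        List<Iterator<String>> cursors = new ArrayList<>();
        for (List<String> hosts : hostsBySite.values()) {
            cursors.add(hosts.iterator());
        }

        List<String> chosen = new ArrayList<>();
        boolean progressed = true;
        // Keep sweeping over the sites until enough hosts are chosen
        // or every site has run out of hosts.
        while (chosen.size() < copies && progressed) {
            progressed = false;
            for (Iterator<String> cursor : cursors) {
                if (chosen.size() >= copies) {
                    break;
                }
                if (cursor.hasNext()) {
                    chosen.add(cursor.next());
                    progressed = true;
                }
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Assumed example inventory; real site and host names are not given in the source.
        Map<String, List<String>> hostsBySite = new LinkedHashMap<>();
        hostsBySite.put("london", List.of("lon-ws1", "lon-ws2"));
        hostsBySite.put("new-york", List.of("nyc-srv1"));
        hostsBySite.put("sydney", List.of("syd-ws1", "syd-ws2"));
        System.out.println(plan(hostsBySite, 4));
    }
}
```

With the assumed inventory above, four copies would be placed on one system in each of the three sites before a second system is used in the first site.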


Benefits
• No hardware change or upgrade cost. 
• No change to existing working practices.
• Developed in Java in under 3 months. 
• Existing job control was enhanced so that a request could specify how many copies should be run and how they should be distributed.
• Termination could be requested once a given number of completed solutions were available.
• Operation was spread across multiple continents, providing disaster recovery (DR) benefits.
• Cost savings were in the region of £950,000.
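
The idea of running several copies and terminating once enough completed solutions are available can be sketched in Java as follows. This is an assumption-laden illustration rather than the delivered system: local threads stand in for remote systems, and the names ReplicatedJob and runUntil are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: each Callable stands in for one copy of a job on a remote system.
final class ReplicatedJob {

    /**
     * Runs every copy in parallel and returns as soon as `needed` copies have
     * completed successfully. Failed copies are skipped and any copies still
     * running at that point are cancelled.
     */
    static <T> List<T> runUntil(List<Callable<T>> copies, int needed) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, copies.size()));
        CompletionService<T> completed = new ExecutorCompletionService<>(pool);
        List<Future<T>> futures = new ArrayList<>();
        List<T> results = new ArrayList<>();
        try {
            for (Callable<T> copy : copies) {
                futures.add(completed.submit(copy));
            }
            // Collect copies in completion order until enough have succeeded
            // or every copy has been accounted for.
            for (int seen = 0; seen < copies.size() && results.size() < needed; seen++) {
                try {
                    results.add(completed.take().get());
                } catch (ExecutionException failedCopy) {
                    // This copy failed (standing in for a lost system); rely on the others.
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        } finally {
            // Request termination of whatever is still running.
            for (Future<T> future : futures) {
                future.cancel(true);
            }
            pool.shutdownNow();
        }
        return results;
    }
}
```

A caller would pass one Callable per copy together with the number of completed solutions required, for example runUntil(copies, 2) to stop after the first two successful results.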