Enable existing applications for grid
If you are familiar with the six strategies for grid application enablement, you might want to apply one of the first three of them to an existing application. This article shows you what to consider when you grid-enable platform-specific distributed applications (monolithic and modular) and Web-enabled applications (servlet-centric and database-centric).
Introduction
Before using the techniques described in this article, make sure you are familiar with the six strategies for grid application enablement described in the following "Six strategies for grid application enablement" series of articles:
- Part 1 provides a series overview of the six strategies, and summarizes the characteristics and benefits of each strategy.
- Part 3: Strategy 3 Parallel Batch and Strategy 4 Service discusses grid enablement using these two mutually exclusive strategies. In Strategy 3, a batch job is subdivided. Its many independent subjobs run concurrently on behalf of the user who submitted the aggregate job. Strategy 4 discusses implementations of a service-oriented architecture in a grid environment.
In this article, we introduce the basic architectural pattern that fits most cases when you enable existing code. Next, we discuss a few strategies for enabling existing code, based on the most common architectures we have encountered over the years. We start with platform-specific distributed applications. We study their two most common architectural variations (monolithic and modular) and the most common way to make each one fit into the enablement pattern. We conclude with two scenarios for Web-enabled applications (servlet-centric and database-centric).
There's a pattern
Most efforts to enable existing code on batch-oriented grid infrastructure software are similar, so there's a pattern you can follow. You can use this pattern to implement the first three strategies of grid adoption.
The base case for these three strategies for grid adoption is a program that takes command-line parameters and uses files or databases as specified in the command-line parameters. If the program is licensed, the grid infrastructure will require license management capabilities.
In general, the first three strategies to enable an existing application to run on a grid all require the user to send requests to a client application, which acts as a requester to the grid infrastructure. The client to the client application can be the actual user or a portal. The grid infrastructure takes care of deploying the actual application, which becomes a provider to the grid infrastructure. See Figure 1.
Figure 1. Integration pattern for enabling existing code

 
The key point is that the client program (job submission driver) should talk to the grid infrastructure as if it were talking directly to the application. The simplest way to do this is to have the client program issue command-line instructions to the virtualized application.
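The following is a minimal sketch, in Python, of a job submission driver that does exactly that. It turns a user request into the command line the application already understands, then wraps that command line in a grid submission command. The submission command `gridsubmit`, its `--` separator, the application name `riskcalc`, and its flags are all hypothetical. Substitute whatever command or API your grid infrastructure product actually provides.

```python
import shlex
import subprocess
from typing import Mapping, Sequence

# Hypothetical submission command; replace it with the one your grid product provides.
GRID_SUBMIT = ["gridsubmit"]


def build_app_command(app: str, params: Mapping[str, str], files: Sequence[str]) -> list[str]:
    """Turn a user request into the command line the application already understands."""
    cmd = [app]
    for name, value in sorted(params.items()):  # sorted for a reproducible command line
        cmd.append(f"--{name}={value}")
    cmd.extend(files)  # input files or database names, exactly as the program expects them
    return cmd


def build_submission(app_cmd: Sequence[str]) -> list[str]:
    """Wrap the application command line in the (hypothetical) grid submission command."""
    return [*GRID_SUBMIT, "--", *app_cmd]


def submit(app_cmd: Sequence[str], dry_run: bool = True) -> str:
    """Submit the job; in dry-run mode, only return the command that would be executed."""
    full_cmd = build_submission(app_cmd)
    if dry_run:
        return shlex.join(full_cmd)
    result = subprocess.run(full_cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()  # many schedulers print a job ID on standard output


if __name__ == "__main__":
    cmd = build_app_command("riskcalc", {"scenario": "q3", "region": "emea"}, ["/data/portfolio.csv"])
    print(submit(cmd))
```

The driver contains no business logic. The grid infrastructure sees nothing but an ordinary command line, which is what keeps the application unaware of the grid.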
Implementing this scenario is easy when the application is a stand-alone program with minimal deployment requirements. But when you're dealing with integrated applications, the whole thing might require a little creativity.
Known scenarios
Given the current state of the software industry, enabling existing code with batch-oriented grid infrastructure software comes down to a finite number of known scenarios. Two scenarios exist, both involving essentially the same type of application:
- Enabling platform-specific distributed applications, which covers client/server, transactional, and batch-oriented applications written before Web applications existed.
- Enabling Web-enabled applications, which covers those platform-specific distributed applications that were given Web "front ends" or "wrappers" to make them work as Web applications. In most cases, the integration strategy involves enabling servlet-centric or database-centric applications.
Enabling platform-specific distributed applications
As previously mentioned, this scenario applies to client/server, transactional, and batch-oriented applications. Two architectural tendencies prevail in these three types of applications: monolithic and modular. 
The case of monolithic applications
In general, deploying platform-specific monolithic applications on batch-oriented grid infrastructure software is as simple as installing the application on all grid nodes and writing some "glue code" that will integrate user requests, parameter passing, and program calls from the grid infrastructure. See Figure 2. 
Figure 2. Deploying monolithic applications using the standard enablement pattern

 
This "glue code" we're talking about is what makes this an integration job. In most cases, you can integrate a monolithic application and a batch-oriented grid infrastructure product through scripting. You can use Perl, Python, or ordinary shell scripts to integrate user requests, parameter passing, and application calls within the context of the grid infrastructure software. 
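As an illustration, here is a minimal node-side glue script in Python. It assumes the grid infrastructure starts the script on each node and passes the request through environment variables. The variable names `GRID_JOB_ID`, `APP_ARGS`, and `JOB_OUT_DIR` are hypothetical. The script only translates those parameters into the application's own command line, runs the locally installed application, captures its output, and hands the exit status back to the grid infrastructure.

```python
import os
import shlex
import subprocess
import sys
from pathlib import Path


def run_job(app_path: str, env: dict[str, str], out_dir: Path) -> int:
    """Glue code: map grid-supplied parameters onto the application's command line."""
    job_id = env.get("GRID_JOB_ID", "local")          # hypothetical variable name
    app_args = shlex.split(env.get("APP_ARGS", ""))   # hypothetical variable name
    out_dir.mkdir(parents=True, exist_ok=True)
    log_file = out_dir / f"{job_id}.log"

    # Run the application exactly as a user would from the command line.
    with log_file.open("w") as log:
        completed = subprocess.run([app_path, *app_args], stdout=log, stderr=subprocess.STDOUT)

    # Most batch schedulers use the exit code to mark a job as done or failed.
    return completed.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python run_job.py /opt/app/bin/riskcalc
    sys.exit(run_job(sys.argv[1], dict(os.environ), Path(os.environ.get("JOB_OUT_DIR", "."))))
```

The same script could be written in Perl or as a shell script. What matters is that the monolithic application itself is left untouched.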
There's a caveat
The caveat has to do with what a monolithic application does and how it does it. Monolithic applications tend to try to be all things to all people. That's one reason they're monolithic: some people don't trust modules. Sometimes, monolithic applications have, among other things, embedded grid functionality. Which features are embedded, and how that functionality was implemented, will determine whether the application can run on a grid.
For instance, a monolithic application without any built-in grid infrastructure functionality will be easier to enable than one that, on top of doing what it is supposed to do, also handles such tasks as scheduling which instance processes which request, deciding which database tables need to be locked on behalf of a given user, or enforcing transaction affinity. In that second case, we still need to look into how the built-in grid functionality was implemented.
If the built-in grid functionality can be turned off from within the application, then this monolithic piece of code will be able to run on top of batch-oriented grid infrastructure software with no problem. If, on the other hand, the grid functionality is built deep into the application and it cannot be turned off, then we have a problem. 
In general, turning off built-in grid infrastructure functionality takes a great deal of work. When programmers were learning to reuse code, they also learned to abstract functionality both ways: up and down. Embedding grid infrastructure features in business logic frameworks is an example of abstracting functionality down.
We can't blame programmers for doing this. When most platform-specific distributed applications were written, very few people were thinking about grid computing. Nobody even considered the possibility of finding ready-made containers for their applications, much less complete grid infrastructures where they could just deploy their code and move on with their lives.
The case of modular applications
In general, enabling modular platform-specific distributed applications will be easier than enabling monolithic applications, because modular applications give you choices about how to deploy them. There are caveats, just as in the previous case, but modular applications offer easier ways to get around them.
Turning off modules
One of the main advantages of modular applications is the possibility of turning off modules when necessary. This way, any environment-related functionality can be handed over to the grid infrastructure.
As in the case of monolithic applications, the existence of built-in grid features and whether they can be turned off or stripped out will also determine the degree of difficulty for the grid enablement effort.
The difference is that most modular applications, when they have any built-in grid features, will most likely concentrate that functionality in a single module, or a group of specialized modules. This should make it easier (in theory) to turn those modules off or to just eliminate them altogether. 
There's another caveat
This caveat is inter-module communication. The degree of difficulty in turning off, or stripping out, application modules depends on how the designers implemented inter-module communication. In general, the simpler the transport, the lower the degree of difficulty. 
For instance, it is common for applications of this kind to have a dedicated module to handle all database calls. In some cases, the module not only acts as a universal database client by supporting ODBC or JDBC drivers for several vendors but also does something we can call "table access scheduling," which is sort of an intra-application table-locking mechanism that allows the application to handle table locking independent of the database. 
Having a universal database client is a good idea. However, if the application is to be grid-enabled, it is better to leave table locking to the data grid infrastructure (let's assume that's what we're doing with the application). So, all we need to do is replace the module with a regular database client, deploy the RDBMS into the data grid, and we've got ourselves a grid-enabled application.
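The following sketch shows the idea behind that substitution. It uses Python's built-in `sqlite3` module purely as a stand-in for the RDBMS, and the class name `PlainDbClient` is hypothetical. The replacement client keeps no in-application lock table. It wraps each unit of work in a database transaction and lets the database engine serialize conflicting writers.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


class PlainDbClient:
    """A regular database client that leaves locking to the database engine."""

    def __init__(self, path: str) -> None:
        # isolation_level=None turns off sqlite3's implicit transactions,
        # so BEGIN/COMMIT/ROLLBACK are issued explicitly below.
        self._conn = sqlite3.connect(path, isolation_level=None)

    @contextmanager
    def transaction(self) -> Iterator[sqlite3.Connection]:
        # BEGIN IMMEDIATE acquires SQLite's write lock up front. A server RDBMS
        # would apply its own row or table locking here instead.
        self._conn.execute("BEGIN IMMEDIATE")
        try:
            yield self._conn
            self._conn.execute("COMMIT")
        except Exception:
            self._conn.execute("ROLLBACK")  # undo the whole unit of work
            raise

    def close(self) -> None:
        self._conn.close()
```

Because the locking lives in the database rather than in one application process, every application instance on the grid sees the same locks. An intra-application "table access scheduling" module cannot guarantee that once several instances run on different nodes.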
Most database clients and listeners rely on TCP/IP sockets to get their orders from a program. The DB2® client listens by default on port 50000, for example. But what if the application designers decided that tried-and-true TCP/IP sockets were not fancy enough for their application? What if they decided to go with a proprietary mechanism for inter-module communication? Then the problem is no longer so straightforward.
There is another aspect to inter-module communication that can turn out to be a show-stopper. If the application is to be deployed on a computational grid, there can be several instances of several modules running concurrently on the grid. If the grid infrastructure software cannot relay the transport mechanism, or if the transport mechanism itself cannot function in a grid environment, the application simply will not work as expected.
In that case, the difficulty of the grid enablement project will be directly proportional to the effort needed to replace the inter-module communication mechanism.
Deployment strategies: best-case scenario
An ideal modular application should handle inter-module communication with a dispatcher -- or broker -- module. This plan would allow modules to be deployed anywhere on the grid because inter-module communication will always happen through the broker. See Figure 3. 
Figure 3. Ideal deployment for modular applications

 
The best-case scenario has two very desirable behaviors. First, application modules should be atomic enough to be deployed independently of one another. The dispatcher-broker module should take care of all inter-module communication and data exchange.
As for shared libraries, the grid infrastructure should be able to handle them if they're installed as part of a system-wide installation. If not, they can be included as part of the provisioning policy for all modules so that all nodes have a local copy.
Second, application modules should be granular enough to allow multiple instances of the same module to run concurrently, at least on the same machine. A module should take care of its own results aggregation. Application-level results aggregation can be handled by the dispatcher-broker module or through the database.
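A minimal in-process sketch of the dispatcher-broker idea follows. It assumes modules can be modeled as named handlers; all names are hypothetical. In a real grid deployment, the broker would relay messages over a network transport, and module instances would run on different nodes. The sketch shows only the two behaviors described above. Modules never call each other directly, and several instances of one module can register with the broker, which also owns application-level results aggregation.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[Any], Any]


class Broker:
    """Routes every inter-module message; modules never reference each other."""

    def __init__(self) -> None:
        self._instances: dict[str, list[Handler]] = defaultdict(list)
        self._next: dict[str, int] = defaultdict(int)

    def register(self, module: str, handler: Handler) -> None:
        # Several instances of the same module may register (e.g., one per node).
        self._instances[module].append(handler)

    def send(self, module: str, payload: Any) -> Any:
        instances = self._instances[module]
        if not instances:
            raise LookupError(f"no instance of module {module!r} is registered")
        # Simple round-robin across the registered instances of the target module.
        i = self._next[module] % len(instances)
        self._next[module] += 1
        return instances[i](payload)

    def scatter_gather(self, module: str, payloads: list[Any],
                       combine: Callable[[list[Any]], Any]) -> Any:
        # Application-level results aggregation stays in the broker.
        return combine([self.send(module, p) for p in payloads])
```

Because callers know only a module name, an instance can be moved to another node, or a second instance can be added, without changing any other module.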
Deployment strategies: most-common scenario
Unfortunately, most modular applications are not that well behaved. In some cases, module encapsulation is not atomic enough to allow for true independent deployment. In other cases, the dispatcher-broker module doesn't take care of all inter-module communication and data exchange. Instead, some modules call on each other directly, which forces them to reside on the same machine.
Results aggregation also represents a problem, especially when the dispatcher-broker doesn't fully own the task of managing inter-module communication. Some modules might feed their results into other modules instead of just passing them back to the broker. Whatever the situation, the most common scenario is to deploy the entire application on all grid nodes as shown in Figure 4. 
Figure 4. Most common scenario for deploying modular applications

 
To an extent, the most common scenario means that, if worse comes to worst, a modular application can be deployed as a monolithic application. It should work, but the advantage of being modular will be lost, because the grid infrastructure won't exploit it as in the ideal case.
In the same vein, think of a monolithic application as a single-module modular application and treat it as such when devising a strategy for managing results aggregation in the case of multiple concurrent instances.
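Here is one common results-aggregation strategy, sketched in Python under the assumption that the work can be partitioned by input record. Each instance runs on its own slice of the input and produces a partial result. A separate aggregation step merges the partials after all instances finish. The thread pool stands in for concurrent grid jobs, and `app_instance` is a hypothetical stand-in for the application (here it simply counts words).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable


def partition(records: list, n: int) -> list[list]:
    """Split the input into n interleaved slices, one per application instance."""
    return [records[i::n] for i in range(n)]


def app_instance(records: Iterable[str]) -> Counter:
    """Hypothetical stand-in for one instance of the application."""
    counts: Counter = Counter()
    for line in records:
        counts.update(line.split())
    return counts


def aggregate(partials: Iterable[Counter]) -> Counter:
    """Merge partial results; this runs once, after all instances have finished."""
    total: Counter = Counter()
    for p in partials:
        total.update(p)
    return total


def run(records: list[str], instances: int) -> Counter:
    # The thread pool stands in for concurrent jobs on the grid.
    with ThreadPoolExecutor(max_workers=instances) as pool:
        partials = list(pool.map(app_instance, partition(records, instances)))
    return aggregate(partials)
```

This works only if the per-slice results can be combined without re-running the application, as counts or sums can. If they can't, the aggregation step has to be designed around what the application actually produces.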
These two cases, monolithic and modular applications, represent the simplest scenario when it comes to grid enabling platform-specific distributed applications. The situation changes dramatically when we deal with Web-enabled applications, as you'll see in the next section. 
 
Enabling Web-enabled applications
A Web-enabled application is not a true J2EE application. We call "Web-enabled" those applications that were written originally as platform-specific distributed applications but run as Web applications, thanks to a Web front end.
The most common architectures are known as servlet-centric and database-centric, and each poses its own challenges to grid enablement. 
Servlet-centric applications
Servlet-centric applications, in general, follow the architectural pattern illustrated in Figure 5. 
Figure 5. Most common Servlet-centric architectural pattern

 
The platform-specific application in Figure 5 is, in some cases, patched up to support things such as XML and other technologies. It is enabled to talk to a Java™ Virtual Machine via JNI or a proprietary connector framework. As for database support, it is common to use ODBC-to-JDBC bridges or to just stay with ODBC.
When "ported" to run on J2EE application servers, servlet-centric applications interact with clients via a gateway servlet, which relays requests to the connector and thus to the actual application. A typical porting to an application server, such as IBM® WebSphere® Application Server, looks like the one illustrated in Figure 6. 
Figure 6. Typical servlet-centric Web enablement strategy

 
Enabling an application that follows this execution pattern to run on a grid requires at least the following steps:
- Modify the deployment descriptor so that the only component actually deployed on WebSphere Application Server is the gateway servlet, which will become the portal for the users.
- Take the Java piece of the connector framework (JNI or proprietary) that acts as an interface to the core application and deploy it as a stand-alone Java application. This will be the client program or job submission driver.
- Deploy the core of the application as a monolithic or modular application (whichever applies) on a batch-oriented grid infrastructure.
The resulting scenario is illustrated in Figure 7.
Figure 7. Typical servlet-centric grid deployment

 
It might be necessary to change some of the original assumptions when implementing this scenario. For instance, a deployment like the one shown in Figure 7 would probably be easier to manage if the gateway servlet ran on a lighter container, such as WebSphere Application Server Express (which doesn't include an EJB container) or Apache Tomcat, rather than on WebSphere Application Server Advanced Edition. A change of this nature would actually benefit your customers, because it could lower the total cost of the application.
Keep in mind this is just one way of doing it. There may be better ways to architect the deployment pattern depending on the characteristics of the application. The best solution should provide the best returns in terms of feasibility, effort, usability, and administration. 
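To make the gateway's reduced role concrete, here is a minimal sketch. Python's standard `http.server` module stands in for the gateway servlet on WebSphere Application Server, purely for illustration. The gateway parses request parameters and relays them to the job submission driver, then returns a job ID. It contains no business logic. The `driver` callable and the JSON response shape are assumptions, not part of any product API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable
from urllib.parse import parse_qs, urlparse

# The job submission driver: takes request parameters and returns a job ID (assumed shape).
Driver = Callable[[dict[str, str]], str]


def make_gateway(driver: Driver) -> type:
    class Gateway(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            query = parse_qs(urlparse(self.path).query)
            # Keep the first value of each parameter, as a simple gateway often does.
            params = {k: v[0] for k, v in query.items()}
            job_id = driver(params)  # relay the request; no business logic here
            body = json.dumps({"job": job_id}).encode()
            self.send_response(202)  # Accepted: the job runs asynchronously on the grid
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # keep the example quiet

    return Gateway


def start_gateway(driver: Driver, port: int = 0) -> ThreadingHTTPServer:
    """Serve the gateway in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), make_gateway(driver))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Returning 202 rather than waiting for the job to finish keeps the Web container from holding a request thread open for the length of a batch job. That matters given the Web container bottlenecks discussed next.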
Caveats
The issues affecting monolithic and modular applications can also affect servlet-centric grid enablements. In addition, issues related to application performance can also arise.
Servlet-centric applications, in most cases, will experience performance problems at the Web container level. The use of a gateway servlet can create a sometimes nasty bottleneck that stems from, among other reasons: 
- The servlet execution model, especially if the gateway servlet relies on JavaServer Pages (JSPs) for presentation logic
- The latency created by connector frameworks such as JNI. In some JVM implementations (ours, for example), JNI calls force the JVM to create pointers in order to allocate the requested platform-specific processes. Sometimes these pointers occupy large chunks of memory, and the JVM must refresh them continuously to prevent the garbage collection thread from picking them up while they're still active (you don't want that to happen). This creates additional overhead on the JVM, which translates into higher CPU and memory heap utilization by the Java process in which the JVM is running.
These issues have nothing to do with the grid infrastructure. They can cause problems even if the application is not running on a grid. You need to be aware that you will have to tune the application server under the new conditions once the application is deployed on a grid. Other issues can stem from the interaction of the application server and the grid infrastructure. For instance: 
- The security overhead between the application server and the grid infrastructure. Given that grids are security freaks, and given that JNI calls don't always like to be asked to authenticate, the application sometimes ends up logging on to the grid infrastructure every single time it makes a connector call. This can slow things down.
- Network latency between the application server and the grid infrastructure, especially in geographically dispersed deployments.
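One common way to reduce the per-call logon overhead is to authenticate once and reuse the resulting credential until it is close to expiring. The Python sketch below assumes the grid infrastructure issues a token with a known lifetime. The `authenticate` callable and the `Credential` shape are hypothetical, because the real mechanism depends on the grid product's security model.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Credential:
    token: str
    expires_at: float  # seconds since the epoch, on the same clock as time.time()


class CredentialCache:
    """Log on to the grid once and reuse the credential across connector calls."""

    def __init__(self, authenticate: Callable[[], Credential], margin: float = 30.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._authenticate = authenticate  # hypothetical logon call
        self._margin = margin              # refresh this many seconds before expiry
        self._clock = clock
        self._lock = threading.Lock()
        self._cred: Optional[Credential] = None

    def token(self) -> str:
        # Connector calls may arrive from several request threads at once.
        with self._lock:
            if self._cred is None or self._clock() >= self._cred.expires_at - self._margin:
                self._cred = self._authenticate()
            return self._cred.token
```

The refresh margin keeps a call from starting with a token that expires mid-request. Whether caching credentials this way is acceptable depends on the site's security policy.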
Sometimes these problems, when they all surface at the same time, make the whole grid enablement exercise too complicated and too costly to be worth the effort. Sometimes it's better to consider the possibility of deploying the platform-specific piece of the application as a regular monolithic or modular application (whichever applies), or to rewrite the platform-specific piece as a J2EE-compliant set of components and look for an SOA-based grid infrastructure. 
Database-centric applications
Database-centric applications became the de-facto standard back in the day of the client/server paradigm. Tools such as PowerBuilder, Oracle 2000, PacBase, Progress, and other products were widely used to create monolithic, footprint-heavy, and large applications that required proprietary languages and specialized skills to understand them.
Some products of this type managed to adapt to the new distributed paradigm and gave us the Web-enablement hybrids we now know as database-centric applications. Some vendors claim to have re-engineered their products to be truly distributed and Web-native, but in some products the old client/server, monolithic architecture remains untouched under the hood.
This situation is understandable given that most vendors even invented their own languages to describe highly complex frameworks aimed at facilitating what was called in those days "Rapid Application Prototyping and Development."
Regardless of the technical value of these products, vendors need to preserve this intellectual capital for one simple reason: A lot of money was invested in their development. Therefore, database-centric applications are going to be around for a long time.
Common implementations of database-centric applications revolve around proprietary frameworks. In most cases, these frameworks have been "patched" to support Java technology, XML, JMS, and other now-popular industry standards. Figure 8 illustrates the most common flavor of this architecture.
Figure 8. Most common database-centric architectural pattern

 
In most cases, the application and the database are bound together in a single package, and there's no difference between business logic and database operational logic. Because of this, added support modules do not interact natively with the original application.
Porting scenarios for database-centric applications are usually done through the Web container, as shown in Figure 9. 
Figure 9. Typical database-centric Web enablement scenario

 
The strategy is similar to the one used for servlet-centric applications. It involves writing a gateway servlet that accesses the proprietary framework through the add-on Java support modules as dependent classes, or through XML files. 
Treat it as a monolithic application
The easiest way to grid-enable a database-centric application is to treat it as a monolithic application. In this case, you would need to implement the deployment pattern shown in Figure 9.
But there's an interesting characteristic about database-centric applications that might provide a more efficient way to deploy certain applications on a grid. 
Virtualizing data
Database-centric applications use the database not just to store data but also to keep configuration information, workflow data, and even presentation metadata. This dependency on the database is sometimes so tight that it is impossible to separate the database from the runtime environment. In some cases, the application actually runs on top of the database engine.
In cases where the database run time is the application run time, what can be virtualized is not the business logic but the data. You need to use a data grid instead of a computational grid as in the previous scenarios.
By virtualizing a database-centric application on a data grid, you would be indirectly virtualizing the application run time and, thus, the application itself. All the data the application needs to run will be made available locally on all nodes on the grid and, whenever the application changes, the modifications will be propagated automatically.
What happens is that you get location independence for the requests going to the database while the data is propagated all across the grid. To the requesting program, the data seems to be local all the time when, in reality, it can be located anywhere on the grid. So, what runs as a single-instance batch job is the request broker (the Java interface and a driver for the thick client) while the data is virtualized by the grid infrastructure.
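The following Python sketch simulates that behavior in memory, purely to illustrate the idea. It assumes a hypothetical data grid that replicates every write to all nodes. Each node answers reads from its local copy, so the single-instance request broker can send a request to any node without knowing where the data originated. A real data grid would propagate changes asynchronously over the network and would have to handle conflicts and node failures, which this sketch ignores. All class names are hypothetical.

```python
from typing import Any, Optional


class Node:
    """One grid node holding a local replica of the application's data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.replica: dict[str, Any] = {}


class DataGrid:
    """Toy data grid: every write is propagated to every node; reads stay local."""

    def __init__(self, node_names: list[str]) -> None:
        self.nodes = {n: Node(n) for n in node_names}

    def write(self, key: str, value: Any) -> None:
        # Synchronous propagation keeps the sketch simple.
        for node in self.nodes.values():
            node.replica[key] = value

    def read(self, node_name: str, key: str) -> Optional[Any]:
        # Location independence: whichever node receives the request answers it locally.
        return self.nodes[node_name].replica.get(key)


class RequestBroker:
    """The single-instance batch job: forwards each request to the next node in turn."""

    def __init__(self, grid: DataGrid) -> None:
        self.grid = grid
        self._order = list(grid.nodes)
        self._i = 0

    def query(self, key: str) -> Optional[Any]:
        node = self._order[self._i % len(self._order)]
        self._i += 1
        return self.grid.read(node, key)
```

Configuration, workflow, and presentation data would all live in the replicated store, so the requester gets the same answer from every node.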
Depending on how the business logic is written, you might be able to virtualize the data and the business logic as shown in Figure 10. 
Figure 10. Data virtualization scenario for database-centric applications

 
Note that you can virtualize data and business logic on a data grid only if the business logic runs in processes triggered by the database engine, as is the case with most proprietary frameworks.
If, on the other hand, the business logic can be triggered by an outside process, such as the job submission driver, you might be able to actually separate data from business logic in what would be a combination of a computational grid for the business logic and a data grid for the database. This approach, however, might not be a feasible solution in some cases because it could introduce too much complexity into the deployment model. 
Caveats
Database-centric applications are usually heavy on CPU and memory usage. This demand becomes especially visible when the database run time also runs applications.
In some cases, a grid deployment exercise might not work as expected due to the overhead created by the data grid itself, plus the overhead caused by a resource-hungry database runtime environment. In other cases, when it is possible to separate the data and the application, the situation resembles the case of monolithic applications and the same issues will apply.
 
Conclusion
The information in this article should give you enough ammunition to start brainstorming about the best way to grid-enable your product. If you want to grid-enable platform-specific distributed applications (both monolithic and modular) and Web-enabled applications (both servlet-centric and database-centric), this article shows you what to think about. 