Wednesday, April 09, 2008

Adaptive Persistence Model

Background

Complex enterprise systems have a large number of persistent objects with complex relationships among them. In the J2EE architecture these persistent objects are represented as entity beans. In this article we assume that the architecture is based on Container Managed Persistence (CMP), although the solution is the same for Bean Managed Persistence (BMP). Relationships between persistent objects (entity beans) are represented as Container Managed Relationships (CMR).

To avoid a remote call for every field access on an entity bean, practical architectures use local interfaces, the Session Facade pattern and Value Objects. We assume that readers are familiar with these patterns and do not discuss them here.

When the business tier needs data from the persistence tier, it sends a request to the session facade. The session facade initializes the requested entity bean. This entity bean might also have several dependent objects defined by CMR; e.g., a Person entity bean might have an Address as a dependent entity bean, and a client that requests a Person object might also need its Address object. There are two options for loading these related objects, discussed in the following sections.

Aggressive loading

When an entity bean is initialized, all related entity beans defined by CMR are loaded too. When the entity bean's 'getValueObject()' method is invoked, it also loads value objects from the dependent CMPs, builds a complete object graph and returns it. Because of the cascading reads of dependent objects from the database, this call is very expensive in terms of both performance and resource usage.

The benefit of this approach is that the caller has access to all fields and attributes of the requested persistent object, including its dependents, without making additional remote invocations or trips to the database.

The problem with this approach is that it loads all related entity beans regardless of whether the caller needs them. If the caller does not need the dependent objects, they consume system memory unnecessarily. Because this call has a significantly slower response and high resource usage, using it for every persistence-tier call can degrade system performance enough to make it an unacceptable solution.
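A minimal sketch of aggressive loading, assuming hypothetical in-memory `Row` records in place of entity beans and their CMR fields: `getValueObject()` follows every relationship recursively and builds the whole graph whether or not the caller needs it. The `reads` counter stands in for database reads; all names are illustrative, and the code uses Java 9+ collection factories.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical loader standing in for entity beans whose getValueObject() cascades through every CMR.
class AggressiveLoader {
    private final Map<String, Row> database;
    int reads = 0; // simulated database reads

    AggressiveLoader(Map<String, Row> database) {
        this.database = database;
    }

    // Builds the complete object graph reachable from the requested row.
    ValueObject getValueObject(String id) {
        return load(id, new HashMap<>());
    }

    private ValueObject load(String id, Map<String, ValueObject> built) {
        ValueObject existing = built.get(id);
        if (existing != null) {
            return existing; // already in this graph; also stops bidirectional CMRs from recursing forever
        }
        Row row = database.get(id);
        reads++;
        ValueObject vo = new ValueObject(row.id, row.cmpFields);
        built.put(id, vo);
        for (String relatedId : row.cmrIds) {
            vo.dependents.add(load(relatedId, built)); // cascade read, needed or not
        }
        return vo;
    }

    public static void main(String[] args) {
        Map<String, Row> db = new HashMap<>();
        db.put("person:1", new Row("person:1", Map.of("name", "Alice"),
                List.of("address:1", "order:1", "order:2")));
        db.put("address:1", new Row("address:1", Map.of("city", "Boston"), List.of()));
        db.put("order:1", new Row("order:1", Map.of("total", "10"), List.of()));
        db.put("order:2", new Row("order:2", Map.of("total", "25"), List.of()));

        AggressiveLoader loader = new AggressiveLoader(db);
        ValueObject person = loader.getValueObject("person:1");
        // The caller may only want the name, but all four rows were read.
        System.out.println(person.dependents.size() + " dependents, " + loader.reads + " reads");
        // prints "3 dependents, 4 reads"
    }
}

// Hypothetical stored row: CMP field values plus the ids of related rows (its CMRs).
class Row {
    final String id;
    final Map<String, String> cmpFields;
    final List<String> cmrIds;

    Row(String id, Map<String, String> cmpFields, List<String> cmrIds) {
        this.id = id;
        this.cmpFields = cmpFields;
        this.cmrIds = cmrIds;
    }
}

// Value object carrying CMP fields and fully populated dependents.
class ValueObject {
    final String id;
    final Map<String, String> fields;
    final List<ValueObject> dependents = new ArrayList<>();

    ValueObject(String id, Map<String, String> fields) {
        this.id = id;
        this.fields = fields;
    }
}
```

The single call pays for every dependent up front, which is exactly the trade-off described above.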

Lazy loading

The second approach to loading CMR beans is lazy loading. When an entity bean is initialized, only its CMP fields are loaded; initialization of all CMR fields is deferred. The entity bean's 'getValueObject()' method returns a value object in which only the CMP fields are populated; all CMR fields on the value object remain null. The value objects are enhanced so that each getter that returns a dependent object implements the following logic:

  1. If the dependent object has already been loaded, simply return it.

  2. If the dependent object has not been loaded, retrieve it by making a request to the persistence tier, store it locally (to avoid remote calls on subsequent requests) and then return it.
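The two getter rules above can be sketched as follows. `PersistenceTier`, `loadAddress` and `CountingTier` are hypothetical names: the interface stands in for the remote session-facade call, and the fake implementation only counts round trips so the effect of caching is visible.

```java
// Hypothetical demo: the caller just calls getAddress(); lazy loading stays invisible to it.
class LazyLoadingDemo {
    public static void main(String[] args) {
        CountingTier tier = new CountingTier();
        PersonVO person = new PersonVO("person:1", "Alice", tier);
        person.getAddress(); // first access: one remote call
        person.getAddress(); // second access: served from the local copy
        System.out.println("remote calls: " + tier.calls); // prints "remote calls: 1"
    }
}

// Hypothetical remote view of the persistence tier (for example, a session facade).
interface PersistenceTier {
    AddressVO loadAddress(String personId);
}

// Fake tier that counts round trips instead of calling a server and a database.
class CountingTier implements PersistenceTier {
    int calls = 0;

    @Override
    public AddressVO loadAddress(String personId) {
        calls++;
        return new AddressVO("Boston");
    }
}

class AddressVO {
    final String city;

    AddressVO(String city) {
        this.city = city;
    }
}

// Value object as returned by getValueObject(): CMP fields set, CMR field not yet loaded.
class PersonVO {
    final String id;
    final String name;
    private final PersistenceTier tier;
    private AddressVO address;     // CMR field, null until first requested
    private boolean addressLoaded; // distinguishes "not loaded" from "loaded, but no address"

    PersonVO(String id, String name, PersistenceTier tier) {
        this.id = id;
        this.name = name;
        this.tier = tier;
    }

    AddressVO getAddress() {
        if (!addressLoaded) {
            // Rule 2: not loaded yet, so make one remote call and keep the result locally.
            address = tier.loadAddress(id);
            addressLoaded = true;
        }
        // Rule 1: already loaded, so return the local copy.
        return address;
    }
}
```

The separate `addressLoaded` flag is a design choice of this sketch: it keeps a legitimately absent dependent from triggering a new round trip on every call.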

With this approach, the caller does not have to write special code to load partial objects; lazy loading is transparent to the caller.

The benefit of this approach is that it does not load unnecessary objects into memory, which saves unnecessary database calls and server resources. Since the call to 'getValueObject()' is relatively inexpensive, the caller might observe a significant improvement in response time.

However, if the caller needs a dependent object and calls the corresponding getter on the value object, this approach requires a round trip to the persistence tier, involving a remote call and a database operation. If only a few round trips are made, this approach outperforms the first one and should be acceptable. But in a complex business application the total number of objects in an object graph can easily be in the hundreds or even thousands. A business rule might need most or all dependent objects in order to make a business decision. In that scenario the value objects make hundreds of remote calls and database operations to load all the required dependent objects, which can significantly degrade system performance.

Hybrid option

Obviously, neither of the two approaches above provides an elegant solution to the varied persistence needs of most enterprise systems. One might suggest a hybrid approach. For example, the client could specify a list of all required dependent objects when it requests a persistent object from the persistence tier. The persistence tier then pre-loads only the requested CMR fields and leaves all other CMR fields unloaded. If the caller happens to call the getter for a dependent object that was not retrieved, it is retrieved using the lazy-loading approach, so the caller does not see an unexpected NullPointerException. Although this solution seems elegant in most cases, it has the following problems:

  1. It requires careful inspection of the code to determine the optimal set of CMRs to pre-load.

  2. The caller has knowledge of the persistence tier's internal implementation, which breaks the decoupling rule.

  3. Since customer requirements change, this approach also requires developers to make sure that they modify the CMR set accordingly. In other words, this approach is very vulnerable to changes in requirements and implementation.
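A sketch of this client-specified hybrid, assuming a hypothetical `getPerson(id, preload)` facade method in which the caller names the CMR fields to pre-load. Anything left out falls back to the lazy getter, so no `NullPointerException` reaches the caller. Note that the caller has to know CMR names such as `"address"`, which is the coupling described in problem 2. All names are illustrative (Java 9+).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class HybridDemo {
    public static void main(String[] args) {
        Facade facade = new Facade();
        EntityVO person = facade.getPerson("person:1", Set.of("address"));
        person.getRelated("address"); // pre-loaded: no extra call
        person.getRelated("orders");  // not requested: lazy fallback
        System.out.println("remote calls: " + facade.calls); // prints "remote calls: 2"
    }
}

// Hypothetical session facade; each counted call stands for one remote call plus database work.
class Facade {
    int calls = 0;

    // The caller decides which CMRs are loaded together with the entity.
    EntityVO getPerson(String id, Set<String> preload) {
        calls++; // the initial request
        EntityVO vo = new EntityVO(id, this);
        for (String cmr : preload) {
            vo.related.put(cmr, fetch(id, cmr)); // loaded within the same request
        }
        return vo;
    }

    // Lazy fallback used by the value object for CMRs that were not pre-loaded.
    Object loadRelated(String id, String cmr) {
        calls++; // a separate round trip
        return fetch(id, cmr);
    }

    private Object fetch(String id, String cmr) {
        return cmr + " of " + id; // placeholder for the real dependent value object
    }
}

class EntityVO {
    final String id;
    final Map<String, Object> related = new HashMap<>();
    private final Facade facade;

    EntityVO(String id, Facade facade) {
        this.id = id;
        this.facade = facade;
    }

    // Serves pre-loaded or previously fetched CMRs locally; otherwise lazy-loads and caches.
    Object getRelated(String cmr) {
        return related.computeIfAbsent(cmr, name -> facade.loadRelated(id, name));
    }
}
```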

Adaptive hybrid approach

This approach is largely based on the hybrid approach and uses value objects similar to those used in lazy loading. It differs from the regular hybrid approach in that, instead of requiring the client to specify the set of CMRs to pre-load, it uses an adaptive rules engine to make that decision. The rules engine uses the caller's context and a knowledge base to determine the set of CMRs that should be pre-loaded with the requested entity bean. In the beginning, when the rules engine's knowledge base does not yet have enough information, the engine might not determine the correct set of CMRs to pre-load. In that case the caller may still have to make a remote invocation (and a database call) to retrieve a dependent object. The rules engine uses each such incident to learn and enhance its knowledge base. Over time, as the knowledge base evolves, the rules engine should be able to make the correct decision and pre-load only those CMRs the caller requires. The whole pre-loading mechanism is hidden from the caller. The following section discusses the design of the rules engine.

The rules engine uses the caller's stack trace to determine its context. It stores the caller's context and the set of requested CMRs in the knowledge base. One caller context might have multiple entries in the knowledge base, each with a different CMR set. The engine assigns a weight to each entry, and this weight is updated during the engine's learning process. The engine decides which CMRs to load based on 1) the caller's context and 2) the highest-weighted entry in the knowledge base for that context.
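A minimal sketch of the knowledge base described above. It derives the caller's context from `Thread.currentThread().getStackTrace()`, taking the first frame outside the engine (an assumed rule; a real engine would also skip the session facade's frames), keeps weighted CMR sets per context, and returns the highest-weighted set. `KnowledgeBase`, `record` and `preloadFor` are hypothetical names; the code needs Java 10+ for `Set.copyOf`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class RulesEngineDemo {
    public static void main(String[] args) {
        KnowledgeBase kb = new KnowledgeBase();
        String ctx = "OrderService.approve"; // hypothetical caller context
        kb.record(ctx, Set.of("address"), 1);
        kb.record(ctx, Set.of("address", "orders"), 3);
        System.out.println(kb.preloadFor(ctx).size()); // prints 2 (the entry with weight 3)
        System.out.println(KnowledgeBase.callerContext()); // for example "RulesEngineDemo.main"
    }
}

class KnowledgeBase {
    // caller context -> (CMR set -> weight)
    private final Map<String, Map<Set<String>, Integer>> entries = new HashMap<>();

    // Derives a context from the first stack frame outside the rules engine.
    static String callerContext() {
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            String cls = frame.getClassName();
            if (cls.equals(Thread.class.getName()) || cls.equals(KnowledgeBase.class.getName())) {
                continue; // skip getStackTrace() itself and the engine's own frames
            }
            return cls + "." + frame.getMethodName();
        }
        return "unknown";
    }

    // Adds an entry for this context, or adjusts the weight of an identical CMR set.
    synchronized void record(String context, Set<String> cmrs, int weightDelta) {
        entries.computeIfAbsent(context, c -> new HashMap<>())
               .merge(Set.copyOf(cmrs), weightDelta, Integer::sum);
    }

    // Highest-weighted CMR set for the context; an empty set means plain lazy loading.
    synchronized Set<String> preloadFor(String context) {
        Map<Set<String>, Integer> sets = entries.get(context);
        if (sets == null || sets.isEmpty()) {
            return Set.of();
        }
        return Collections.max(sets.entrySet(), Map.Entry.comparingByValue()).getKey();
    }
}
```

Returning an empty set for an unknown context means a new caller simply starts out with lazy loading, matching the learning behavior described for the adaptive approach.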

When the system creates a value-object graph, it assigns a unique id to that graph. All value objects in the graph share this id, and it differs from the id of any other object graph. The rules engine stores this id, the pre-loaded CMRs and the caller context in temporary storage. If a value object has to retrieve a dependent object from the server because it was not pre-loaded (known as a miss), it also passes the id to the persistence tier. The rules engine uses this id and the requested-object information to update its knowledge base, adding the requested object to the pre-loaded CMR set. Each value object also keeps track of whether it was ever used by the caller. When value objects are garbage collected, the system calls their 'finalize' method, and the value objects use this method to send information back to the rules engine to help it improve its knowledge base.
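The bookkeeping in this paragraph can be sketched as below. `RulesEngine`, `GraphSession` and `TrackedVO` are hypothetical names: each graph gets an id from a counter, the session records the caller context and pre-loaded CMRs, misses extend that set, and value objects flag themselves as used. Because `Object.finalize()` has been deprecated since Java 9, this sketch calls `report()` explicitly where the article relies on the finalizer; `java.lang.ref.Cleaner` is the standard replacement for acting when an object becomes unreachable.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class GraphTrackingDemo {
    public static void main(String[] args) {
        RulesEngine engine = new RulesEngine();
        long graphId = engine.startGraph("OrderService.approve", Set.of("address"));
        TrackedVO address = new TrackedVO(graphId, "address", engine); // pre-loaded, never read
        engine.recordMiss(graphId, "orders"); // orders was not pre-loaded: one extra round trip
        address.report(); // stands in for the finalize() notification
        GraphSession s = engine.session(graphId);
        System.out.println("cmrs=" + s.cmrs.size() + ", used=" + s.used.size()); // prints "cmrs=2, used=1"
    }
}

// Temporary record the rules engine keeps for one value-object graph.
class GraphSession {
    final String context;
    final Set<String> cmrs;                   // pre-loaded CMRs plus every miss
    final Set<String> used = new HashSet<>(); // CMRs the caller actually read

    GraphSession(String context, Set<String> preloaded) {
        this.context = context;
        this.cmrs = new HashSet<>(preloaded);
    }
}

// Methods are synchronized because life-cycle reports may arrive from another thread.
class RulesEngine {
    private long nextGraphId = 0;
    private final Map<Long, GraphSession> sessions = new HashMap<>();

    // Called when a graph is built; every value object in it carries the returned id.
    synchronized long startGraph(String context, Set<String> preloaded) {
        long id = ++nextGraphId;
        sessions.put(id, new GraphSession(context, preloaded));
        return id;
    }

    // Called when a value object fetched a CMR that was not pre-loaded (a miss).
    synchronized void recordMiss(long graphId, String cmr) {
        GraphSession s = sessions.get(graphId);
        s.cmrs.add(cmr);
        s.used.add(cmr); // the caller asked for it, so it counts as used
    }

    // Called from a value object's life-cycle hook ('finalize' in the article).
    synchronized void recordUsage(long graphId, String cmr, boolean wasUsed) {
        if (wasUsed) {
            sessions.get(graphId).used.add(cmr);
        }
    }

    synchronized GraphSession session(long graphId) {
        return sessions.get(graphId);
    }
}

// Value object that remembers whether the caller ever read it.
class TrackedVO {
    final long graphId;
    final String cmr;
    private final RulesEngine engine;
    private volatile boolean used;

    TrackedVO(long graphId, String cmr, RulesEngine engine) {
        this.graphId = graphId;
        this.cmr = cmr;
        this.engine = engine;
    }

    void touch() { used = true; } // would be called from each getter

    void report() { engine.recordUsage(graphId, cmr, used); }
}
```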

When the rules engine receives a notification from a value object about its life-cycle event, it updates the CMR set associated with the id. If a value object was never used by the caller, the engine removes that object from the CMR set. After the rules engine has finished updating the CMR set, it checks this set against the existing sets for the same caller in the knowledge base. If the new CMR set already has an entry, the rules engine just updates its weight. Otherwise it adds a new entry to the knowledge base with the new CMR set and assigns it a weight.
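The update step can be sketched as follows, assuming that once a graph's value objects have reported back, the engine holds the graph's CMR set (pre-loaded plus misses) and the subset the caller actually used. Unused CMRs are pruned, then the result either reinforces an existing entry for the caller context or is added as a new one. The article does not say how weights change, so the initial weight of 1 and the increment of 1 are assumptions; `KnowledgeBase`, `update` and `weight` are hypothetical names (Java 10+).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class KnowledgeUpdateDemo {
    public static void main(String[] args) {
        KnowledgeBase kb = new KnowledgeBase();
        // Graph 1: address was pre-loaded but never used; orders was a miss.
        kb.update("OrderService.approve", Set.of("address", "orders"), Set.of("orders"));
        // Graph 2: same caller, same outcome.
        kb.update("OrderService.approve", Set.of("orders"), Set.of("orders"));
        System.out.println(kb.weight("OrderService.approve", Set.of("orders"))); // prints 2
    }
}

class KnowledgeBase {
    private static final int INITIAL_WEIGHT = 1; // assumption: not specified in the article
    private static final int REINFORCEMENT = 1;  // assumption: not specified in the article

    // caller context -> (CMR set -> weight)
    private final Map<String, Map<Set<String>, Integer>> entries = new HashMap<>();

    // Called once all value objects of a graph have reported their life-cycle events.
    synchronized void update(String context, Set<String> cmrs, Set<String> used) {
        Set<String> pruned = new HashSet<>(cmrs);
        pruned.retainAll(used); // drop CMRs whose value objects were never read
        Set<String> key = Set.copyOf(pruned);
        Map<Set<String>, Integer> forContext = entries.computeIfAbsent(context, c -> new HashMap<>());
        if (forContext.containsKey(key)) {
            forContext.merge(key, REINFORCEMENT, Integer::sum); // known set: strengthen it
        } else {
            forContext.put(key, INITIAL_WEIGHT); // new set: add it with an initial weight
        }
    }

    // Current weight of a CMR set for a context, or 0 if there is no such entry.
    synchronized int weight(String context, Set<String> cmrs) {
        return entries.getOrDefault(context, Map.of()).getOrDefault(Set.copyOf(cmrs), 0);
    }
}
```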
