Friday, April 18, 2008

Why I like the Anemic Domain Model

Martin Fowler, whom I respect, coined the term Anemic Domain Model (ADM) and called it an anti-pattern.

At first, one tends to agree with what Martin is saying. In OO design, objects must carry their behavior as well as their state. Objects with only state (Entity Beans, Hibernate entities) are essentially C structs. Similarly, objects with only behavior (Stateless Session Beans) give you the Transaction Script pattern, a procedural style of programming.

In a pure OO world, domain objects should have both state (the business information) and behavior (the business rules). People call this the Rich Domain Model (RDM).
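To make the distinction concrete, here is a minimal sketch. The Account/AccountService names and the deposit rule are hypothetical illustrations, not taken from any framework: the anemic version keeps state in one class and the rule in a separate service, while the rich version puts the same rule inside the domain object.

```java
// Anemic style: the domain object carries state only.
class Account {
    private long balanceCents;
    public long getBalanceCents() { return balanceCents; }
    public void setBalanceCents(long b) { balanceCents = b; }
}

// Behavior lives in a separate, transaction-script-style service.
class AccountService {
    public void deposit(Account a, long cents) {
        if (cents <= 0) throw new IllegalArgumentException("amount must be positive");
        a.setBalanceCents(a.getBalanceCents() + cents);
    }
}

// Rich style: the same business rule lives inside the domain object.
class RichAccount {
    private long balanceCents;
    public long getBalanceCents() { return balanceCents; }
    public void deposit(long cents) {
        if (cents <= 0) throw new IllegalArgumentException("amount must be positive");
        balanceCents += cents;
    }
}
```

Both variants enforce the same rule; the whole debate below is about where that rule should live in a multi-tier system.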

Sounds good? Okay...it sounds good only for a small, single-tier desktop application.

What about multi-tier and/or distributed applications? Well...then many questions pop into mind.

  • What about separation of concerns? Aren't we mixing the persistence concern with the business logic concern?

  • Are Rich Domain objects more reusable across different applications? It doesn't look right. Generally, business information is more reusable across applications than the business rules. That is why integration between different applications is web-services (XML) based. If that weren't true, we would have seen Jini become more popular for enterprise integration than XML.

  • What about distributing the work in a large team? Doesn't a Rich Domain Model require all team members to be experts in all the technologies involved? Doesn't this increase the cost of development?

  • I still don't have much trust in MDA, and hence don't believe that complex Rich Domain objects can be auto-generated. C# has partial classes to address this kind of problem, but there is no equivalent in Java.

  • In a real enterprise business application, business rules and business policies are more volatile than the business information. In a Rich Domain Model both are combined into the same classes, so there is no exercise of distinguishing them and breaking the classes into a more stable package versus a less stable one. Whether the business information changes or the business rules/policies change, the same set of classes is affected. Does that sound like a good approach? Not to me.

  • What about using a rules engine? Can one still be used with a Rich Domain Model? If so, how? Doesn't it require the rules and policies that alter the data to be separated from the data itself?

  • How are these Rich Domain objects implemented? Are they simply POJOs, or are they objects similar to classes that combine a Stateful Session Bean and a persistent entity? From articles on the internet and material in books, the answer appears to be POJOs.

  • In a multi-tier application, what does the client/presentation tier see? Are Rich Domain objects exposed to the client? Can the client invoke business methods locally?

  • What about transactional boundaries? How do I ensure that business rules execute as part of a transaction? If the client can call the business-rule methods locally, there is no transaction and no data source available. In a real enterprise application the connection to the database server is protected; only the application server and a few DBAs are allowed to connect to it. For a web-only application the data-source issue shouldn't arise, but what about an application that uses a rich desktop client (Swing)?

Well, even after reading many books that advocate the Rich Domain Model, I don't seem to get clear answers to the above questions.

The Rich Domain Model is good from an OO-purism point of view. In fact, it's 'the' way of writing non-distributed applications. It should also be the choice for distributed applications written with Jini.

However, for real enterprise applications, I am more inclined to use an ADM because:

  • I know that ADM works.

  • It allows good work distribution in a large team.

  • The model can be created in an iteration that precedes development of the service objects and of the presentation (and client) tier.

  • Project sponsors in a corporation don't care about OO purism. The bottom line for them is an application that is easy for the developers to write, can be delivered on time and within budget, and works.

Wednesday, April 09, 2008

Adaptive Persistence Model

Background

Complex enterprise systems have a large number of persistent objects with complex relationships among them. In the J2EE architecture these persistent objects are represented as Entity Beans. In this article we will assume an architecture based on Container Managed Persistence (CMP), although the solution remains the same for Bean Managed Persistence. A relationship between persistent objects (Entity Beans) is represented as a Container Managed Relationship (CMR).

To avoid a remote call for each field access on an entity bean, all practical architectures use local interfaces, session facades, and value objects. We assume readers are familiar with these patterns and will not discuss them here.

When the business tier needs data from the persistence tier, it makes a request to the session facade, which initializes the requested entity bean. This entity bean might have several dependent objects defined by CMRs; e.g., a Person entity bean might have Address as a dependent entity bean. A client who requested the Person object might also need the Address object. There are two options for loading these related objects, discussed in the following sections.
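The Person/Address setup can be sketched as below. These value-object and facade classes are hypothetical stand-ins (no real EJB container is involved; InMemoryPersonFacade simply fakes the persistence tier so the sketch runs):

```java
// Hypothetical value objects for the Person/Address example.
class AddressVO implements java.io.Serializable {
    final String city;
    AddressVO(String city) { this.city = city; }
}

class PersonVO implements java.io.Serializable {
    final String name;
    final AddressVO address;   // CMR-backed field; may be null if not loaded
    PersonVO(String name, AddressVO address) { this.name = name; this.address = address; }
}

// The session facade the business tier talks to.
interface PersonFacade {
    PersonVO getPerson(long personId);   // returns a detached value-object graph
}

// In-memory stand-in so the sketch is runnable without an EJB container.
class InMemoryPersonFacade implements PersonFacade {
    public PersonVO getPerson(long personId) {
        return new PersonVO("Alice", new AddressVO("Springfield"));
    }
}
```

The two loading strategies below differ only in when the facade populates the CMR-backed fields of such a value-object graph.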

Aggressive loading

When an entity bean is initialized, all related entity beans defined by its CMRs are also loaded. When the entity bean's 'getValueObject()' method is invoked, it also loads value objects from the dependent CMPs, builds a complete object graph, and returns it. Because of the cascading reads of dependent objects from the database, this method call is very expensive in terms of both performance and resource usage.
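A rough sketch of what aggressive 'getValueObject()' amounts to: one call eagerly copies every CMR-reachable entity into the value-object graph. The entity interfaces here are plain stand-ins for CMP entity beans, and all names are illustrative:

```java
// Stand-ins for CMP entity beans (hypothetical interfaces).
interface AddressEntity { String getCity(); }
interface PersonEntity  { String getName(); AddressEntity getAddress(); }

class AddressValue { final String city; AddressValue(String c) { city = c; } }
class PersonValue  {
    final String name; final AddressValue address;
    PersonValue(String n, AddressValue a) { name = n; address = a; }
}

class EagerMapper {
    // One call walks the whole CMR graph; in a real container each
    // dependent access below would translate into a database read.
    static PersonValue getValueObject(PersonEntity p) {
        AddressEntity a = p.getAddress();            // cascading read
        AddressValue addr = (a == null) ? null : new AddressValue(a.getCity());
        return new PersonValue(p.getName(), addr);
    }
}
```

With deeper CMR chains this recursion fans out, which is exactly where the cost described above comes from.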

The benefit of this approach is that the caller has access to all fields/attributes of the requested persistent object without additional remote invocations or trips to the database.

But the problem with this approach is that it loads all related entity beans regardless of whether the caller needs them. If the caller does not need the dependent objects, the loaded objects unnecessarily consume system memory. Since this method call has significantly slower response times and high resource usage, using it for every persistence-tier call might degrade system performance enough to make it an unacceptable solution.

Lazy loading

The second approach to loading CMR beans is based on lazy loading. When an entity bean is initialized, only its CMP fields are loaded; initialization of all fields defined by CMRs is deferred. The 'getValueObject()' method of the entity bean returns a value object with only the CMP fields populated; all CMR fields on the value object remain null. The value objects are enhanced to implement the following logic in the getter methods that return dependent objects:

  1. If the dependent object has already been loaded, simply return it.

  2. If the dependent object has not been loaded, retrieve it by making a request to the persistence tier, store it locally (to avoid remote calls for subsequent requests), and then return it.

With this approach the caller does not have to write special code for handling partially loaded objects; lazy loading is transparent to the caller.
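The two steps above can be sketched in a value object like this. AddressFinder is a hypothetical stand-in for the remote persistence-tier call; in practice it would be a remote stub behind the session facade:

```java
// Stand-in for the remote persistence-tier lookup (hypothetical).
interface AddressFinder { LazyAddressVO findAddressForPerson(long personId); }

class LazyAddressVO { final String city; LazyAddressVO(String c) { city = c; } }

// A value object whose getter implements the two lazy-loading steps.
class LazyPersonVO {
    private final long id;
    private final AddressFinder finder;   // would be a remote stub in practice
    private LazyAddressVO address;        // null until first requested

    LazyPersonVO(long id, AddressFinder finder) { this.id = id; this.finder = finder; }

    public LazyAddressVO getAddress() {
        if (address == null) {                          // step 1: already loaded?
            address = finder.findAddressForPerson(id);  // step 2: fetch and cache
        }
        return address;
    }
}
```

The first call to getAddress() pays for a round trip; every subsequent call is served from the locally cached field, which is why lazy loading stays transparent to the caller.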

The benefit of this approach is that it does not load any unnecessary objects into memory, saving needless database calls and server resources. Since the call to 'getValueObject()' is relatively inexpensive, the caller might observe a significant improvement in response time.

But if the caller needs a dependent object and calls the corresponding getter on the value object, this approach requires a round trip to the persistence tier, which involves a remote call and a database operation. If only a few round trips to the server are made, this approach will outperform the first one and should be acceptable. But in a complex business application the total number of objects in the graph might easily be in the hundreds or even thousands, and a business rule might need most or all dependent objects in order to make a decision. That scenario requires the value objects to make hundreds of remote calls and database operations to load all the required dependent objects, which might degrade system performance significantly.

Hybrid option

Obviously, neither of the above approaches provides an elegant solution to the varied needs of the persistence model required by most enterprise systems. One might suggest a hybrid approach: allow the client to specify a list of the required dependent objects when requesting a persistent object from the persistence tier. The persistence tier pre-loads only the requested CMR fields and leaves all the others. If the caller happens to call a getter for a non-retrieved dependent object, it is retrieved via the lazy-loading approach so that the caller does not see an unexpected NullPointerException. Although this solution seems elegant in most cases, it has the following problems:

  1. It requires careful inspection of the code to determine the optimal set of CMRs to pre-load.

  2. The caller has knowledge of the persistence tier's internal implementation, which breaks decoupling.

  3. Since customer requirements change, this approach also requires developers to make sure they modify the CMR set appropriately. In other words, it is very vulnerable to requirement/implementation changes.
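The hybrid facade contract might look like the following sketch. All names are illustrative, and the body fakes the database reads so the example is self-contained:

```java
import java.util.Set;

class HAddressVO { final String city; HAddressVO(String c) { city = c; } }

class HPersonVO {
    final String name;
    HAddressVO address;    // populated only when "address" was requested
    HPersonVO(String name) { this.name = name; }
}

// Hybrid approach: the caller names which CMR fields to pre-load.
class HybridPersonFacade {
    HPersonVO getPerson(long id, Set<String> cmrsToPreload) {
        HPersonVO vo = new HPersonVO("Alice");          // CMP fields always loaded
        if (cmrsToPreload.contains("address")) {        // pre-load requested CMRs only
            vo.address = new HAddressVO("Springfield"); // stands in for a DB read
        }
        return vo;   // unrequested CMRs stay null and would lazy-load on access
    }
}
```

The string set passed in by the caller is exactly the leak described in problem 2: the caller must know the CMR field names of the persistence tier's internals.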

Adaptive hybrid approach

This approach is mostly based on the hybrid approach. It uses value objects similar to those used in lazy loading, but instead of requiring the client to specify the set of CMRs to pre-load, it uses an adaptive rules engine to make that decision. The rules engine uses the caller's context and a knowledge base to determine which CMRs should be pre-loaded with the requested entity bean. In the beginning, when the knowledge base does not yet have enough information, the rules engine might not determine the correct set of CMRs to pre-load; in that case the caller may still have to make a remote invocation (and database call) to retrieve a dependent object. The rules engine uses each such incident to learn and enhance its knowledge base. Over time, as the knowledge base evolves, the rules engine should be able to make the correct decision and pre-load only those CMRs the caller requires. The whole pre-loading mechanism is encapsulated from the caller. The following sections discuss the design of the rules engine.

The rules engine uses the caller's stack trace to determine its context. It stores the caller's context and the set of requested CMRs in the knowledge base. One caller context might have multiple entries in the knowledge base, each with a different set of CMRs, and each entry is assigned a weight that is updated during the engine's learning process. The engine bases its decision about which CMRs to load on 1) the caller's context and 2) the highest-weighted entry in the knowledge base.
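A minimal sketch of such a weighted knowledge base, assuming the caller context has already been reduced to a plain string key (in the scheme above it would be derived from the stack trace; all names here are illustrative):

```java
import java.util.*;

class KnowledgeBase {
    // context -> (candidate CMR set -> weight)
    private final Map<String, Map<Set<String>, Integer>> entries = new HashMap<>();

    // Learning step: bump the weight of this CMR set for this context.
    void reinforce(String callerContext, Set<String> cmrSet) {
        entries.computeIfAbsent(callerContext, k -> new HashMap<>())
               .merge(cmrSet, 1, Integer::sum);
    }

    // Decision step: return the highest-weighted CMR set for the context,
    // or the empty set when the engine has not learned anything yet.
    Set<String> cmrsToPreload(String callerContext) {
        Map<Set<String>, Integer> candidates = entries.get(callerContext);
        if (candidates == null) return Collections.emptySet();
        Set<String> best = Collections.emptySet();
        int bestWeight = 0;
        for (Map.Entry<Set<String>, Integer> e : candidates.entrySet()) {
            if (e.getValue() > bestWeight) { bestWeight = e.getValue(); best = e.getKey(); }
        }
        return best;
    }
}
```

An unknown context yields the empty set, which degenerates gracefully to pure lazy loading until the engine has learned something about that caller.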

When the system creates a value-object graph, it assigns the graph a unique id. All value objects in the graph share this id, which differs from that of any other graph. The rules engine stores this id, the pre-loaded CMRs, and the caller context in temporary storage. If any value object has to retrieve a dependent object from the server because it was not pre-loaded (known as a miss), it also passes the id to the persistence tier; the rules engine uses the id and the requested-object information to update its knowledge base, adding the requested object to the pre-loaded CMR set. Value objects also keep track of whether they were ever invoked by the caller. When the value objects are garbage collected, the system calls their 'finalize' method, which the value objects use to send information back to the rules engine to help improve its knowledge base.

When the rules engine receives a notification from a value object about its life-cycle event, it updates the CMR list associated with the id. If a value object was never used by the caller, the engine removes it from the CMR list. After the rules engine has finished updating the CMR set, it compares the set against the existing entries for the same caller in the knowledge base. If the new CMR set already has an entry, the rules engine just updates its weight; otherwise it adds a new entry with the new CMR set and assigns it a weight.
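The per-graph feedback step can be sketched as follows: start from the set of CMRs that were pre-loaded for one value-object graph, add misses, drop fields the caller never touched, and hand the corrected set back for reinforcement. The class and method names are illustrative, not from any product:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class GraphFeedback {
    final String callerContext;
    private final Set<String> cmrSet;   // starts as the pre-loaded set

    GraphFeedback(String callerContext, Set<String> preloaded) {
        this.callerContext = callerContext;
        this.cmrSet = new HashSet<String>(preloaded);
    }

    void onMiss(String cmr)   { cmrSet.add(cmr); }     // needed but not pre-loaded
    void onUnused(String cmr) { cmrSet.remove(cmr); }  // pre-loaded but never read

    // The corrected set is what gets reinforced in the knowledge base.
    Set<String> correctedSet() { return cmrSet; }
}
```

One design caveat worth noting: relying on 'finalize' for the unused-field notifications makes the feedback timing dependent on garbage collection, so the knowledge base may lag behind actual usage.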