Basant's Technical Corner: 2008

Tuesday, May 13, 2008

VMware accuses Microsoft of playing dirty tricks to prevent customers from using virtual machines.
http://www.vmware.com/solutions/whitepapers/msoft_licensing_wp.html

Let us see how long Microsoft will be able to deny the evolution in hardware virtualization.

Monday, May 12, 2008

Should I use JPA in my Business Application?

I love JPA because it promises to relieve an architect from making an early decision about the persistence engine. A development team can start development with the most common persistence engine. Since the application uses standard JPA, the application architect can take time to evaluate and identify the best persistence engine for the application.

Well! Does JPA 1.0 live up to its promise? Not really. I learned it the hard way. Many features that are crucial in building an enterprise application are missing from JPA 1.0, even though all major persistence engines already implement them. Some of these missing features are:

JPA does not support deleting orphaned children. For example, I have two persistent entities, Order and LineItem. The relation from Order to LineItem is composition: an Order can contain many LineItems, and a LineItem's life is bound to its Order. It cannot live on its own. Now a user creates an order, adds a line item to it and saves it. The user then deletes the old line item from the order, adds a new line item and re-saves the order. During the second save, JPA instructs the underlying engine to delete the association between the first line item and the order, but the line item itself is never deleted. Its order id will be null. If you put a constraint in the database that order_id in the lineitem table cannot be null, the second save will fail with a database constraint exception. Hibernate has an additional annotation, @Cascade(CascadeType.DELETE_ORPHAN), to delete such orphaned children. TopLink has a similar extension. I don't know how the JPA spec team missed such an important feature.
JPA 1.0 lacks custom datatype mapping to the database. For example, support for Boolean is weak. Database-savvy people (especially Oracle-savvy ones) like to represent a boolean flag as a single-character column with possible values 'Y', 'N' or null, whereas Java people like to represent it as an attribute of type Boolean. How do we bridge the gap? Hibernate lets you specify a different mapping type for a boolean; you can choose yes_no as the type. JPA 1.0 has no equivalent.
JPA supports JPQL, which is a huge step forward from the old EJB QL. However, since JPA has no Criteria API (similar to the one in Hibernate), developers are left to deal with the mess of string manipulation to build dynamic JPQL for the given search criteria. Dynamic search criteria are not uncommon in enterprise applications. In a small application, you might see very messy code that does criteria comparison and string manipulation. In big projects, you will see a home-grown solution that is similar to the Hibernate Criteria API.
This list can go on and on (missing support for primitive arrays, for instance). However, I think I have made the point.
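To make the string-manipulation complaint concrete, here is a minimal sketch of what such hand-rolled query building tends to look like. The entity name PurchaseOrder, its fields and the OrderSearchCriteria class are hypothetical and only serve the illustration; the named parameters collected here would later be bound with Query.setParameter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical search form; a null field means "no filter on this attribute". */
class OrderSearchCriteria {
    String customerName;
    String status;
    Integer minTotal;
}

/** Hand-rolled JPQL assembly, the kind of code a Criteria API would replace. */
class OrderQueryBuilder {
    /** Named parameters collected while building the query text. */
    final Map<String, Object> parameters = new LinkedHashMap<String, Object>();

    String build(OrderSearchCriteria criteria) {
        List<String> conditions = new ArrayList<String>();
        if (criteria.customerName != null) {
            conditions.add("o.customerName LIKE :customerName");
            parameters.put("customerName", criteria.customerName + "%");
        }
        if (criteria.status != null) {
            conditions.add("o.status = :status");
            parameters.put("status", criteria.status);
        }
        if (criteria.minTotal != null) {
            conditions.add("o.total >= :minTotal");
            parameters.put("minTotal", criteria.minTotal);
        }
        // PurchaseOrder is an assumed entity name, not one from a real schema.
        StringBuilder jpql = new StringBuilder("SELECT o FROM PurchaseOrder o");
        for (int i = 0; i < conditions.size(); i++) {
            // The first condition needs WHERE, every later one needs AND.
            jpql.append(i == 0 ? " WHERE " : " AND ").append(conditions.get(i));
        }
        return jpql.toString();
    }
}
```

Every new search field means another if block, and forgetting the WHERE/AND bookkeeping is an easy way to produce invalid JPQL at runtime.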

Now in order to fill the gap, we have three options:

We can write application code to fill the gaps.
We can write a home-grown abstraction layer that hides the internal details of vendor-specific extensions.
Or we can commit ourselves to a specific persistence engine and use its extended features.

I think everybody will agree with me that option 1 should be out unless you know what you are doing (a special defense project, a need for an extremely high performance system, a manager with too much money to spend).
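As a small illustration of what option 1 means in practice, here is a sketch of bridging the 'Y'/'N' column by hand. CustomerRecord and its fields are hypothetical; in a real JPA 1.0 entity the String field would be the mapped column and the Boolean accessors would presumably be marked @Transient so the provider ignores them.

```java
/** Hypothetical entity fragment: the column holds 'Y', 'N' or null. */
class CustomerRecord {
    // In a real entity this would be the persistent, column-mapped field.
    private String activeFlag;

    /** Java-friendly view of the flag; null stays null. */
    Boolean getActive() {
        if (activeFlag == null) {
            return null;
        }
        return Boolean.valueOf("Y".equals(activeFlag));
    }

    /** Translates the Boolean back into the single-character form. */
    void setActive(Boolean active) {
        if (active == null) {
            activeFlag = null;
        } else {
            activeFlag = active.booleanValue() ? "Y" : "N";
        }
    }

    /** Raw column value, as the persistence engine would see it. */
    String getActiveFlag() {
        return activeFlag;
    }
}
```

This is only a few lines, but it has to be repeated for every flag on every entity, which is exactly the kind of cost that makes option 1 unattractive.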

If I am standards savvy, a Sun Microsystems employee or working on a large project, I will choose the second option, as it still leaves the door open to using a different persistence engine in the future.

Well, I am standards savvy myself. But over many years I have learned that a business does not give a “****” about standards. All it cares about is a solution that can be developed within budget by the given developers and that will keep the business running. So for small to medium size projects, I recommend using a concrete persistence engine. Believe me, it will save you a lot of time and energy during design and development.

I hope that I don't have to stick with my decision for a long time. I have seen the JPA 2.0 specification, which seems to address many issues with JPA 1.0. I hope that with the release of JPA 2.0 we will be able to use a standards-based persistence solution in the application.

Friday, April 18, 2008

Why I like Anemic Domain Model?

Martin Fowler, whom I respect, coined the term Anemic Domain Model (ADM) and called it an anti-pattern.

At first, one tends to agree with what Martin is saying. In OO design, objects must carry their state as well as their behavior. Having objects with just state (Entity Beans, Hibernate entities) is the same as having a C struct. Similarly, having objects with just behavior (Stateless Session Beans) is the Transaction Script pattern, which is a procedural style of programming.

In a pure OO world, domain objects should have both state, which is the business information, and behavior, which is the business rules. People call this a Rich Domain Model (RDM).
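A minimal sketch of the two styles, using a hypothetical bank account with a single withdrawal rule (all class and field names are made up for illustration):

```java
/** Anemic style: the entity carries state only, much like a C struct. */
class AccountData {
    long balanceInCents;
}

/** Anemic style: a stateless service holds the business rule. */
class AccountService {
    void withdraw(AccountData account, long amountInCents) {
        if (amountInCents > account.balanceInCents) {
            throw new IllegalStateException("Insufficient funds");
        }
        account.balanceInCents -= amountInCents;
    }
}

/** Rich style: the same state and the same rule live in one class. */
class RichAccount {
    private long balanceInCents;

    RichAccount(long openingBalanceInCents) {
        this.balanceInCents = openingBalanceInCents;
    }

    void withdraw(long amountInCents) {
        if (amountInCents > balanceInCents) {
            throw new IllegalStateException("Insufficient funds");
        }
        balanceInCents -= amountInCents;
    }

    long getBalanceInCents() {
        return balanceInCents;
    }
}
```

In the anemic version AccountData can be shipped to another tier without dragging the rule along, while the rich version keeps the rule next to the data it protects.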

Sounds good? Okay... it sounds good only for a small single-tier desktop application.

What about multi-tier and/or distributed applications? Well... then many questions come to mind.

What about separation of concerns? Aren't we mixing the persistence concern with the business logic concern?
Are rich domain objects more reusable across different applications? It doesn't look that way. Generally, business information is more reusable across different applications than the business rules. That is why integration between different applications is based on web services (XML). If that weren't true, we would have seen Jini become more popular than XML for enterprise integration.
What about distributing the work in a large team? Doesn't a Rich Domain Model require all team members to be experts in all technologies? Doesn't this increase the cost of development?
I still don't have much trust in MDA and hence don't believe that complex rich domain objects can be auto-generated. I think C# supports partial classes to solve this problem, but there is no equivalent in Java.
In a real enterprise business application, business rules and business policies are more volatile than the business information. In a Rich Domain Model both are combined into the same classes, so there is no exercise to distinguish them and split the classes into a more stable package and a less stable package. Regardless of whether the business information changes or the business rules/policies change, the same set of classes is affected. Does that sound like a good approach? Not to me.
What about using a rules engine? Can one still be used with a Rich Domain Model? If so, how? Doesn't this require the rules and policies that alter the data to be separated from the data itself?
How are these rich domain objects implemented? Are they simply POJOs, or are they objects that combine a Stateful Session Bean and a persistent entity? From the articles on the internet and the material in books, it appears that the answer is POJOs.
In a multi-tier application, what does the client/presentation tier see? Are rich domain objects exposed to the client? Can the client invoke business methods locally?
What about transactional boundaries? How do I ensure that business rules are executed as part of a transaction? If the client can call the business rule methods locally, there is no transaction and no data source available. In a real enterprise application, the connection to the database server is protected, and only the application server and a few DBAs are allowed to connect to it. For a web-only application the data source issue shouldn't arise, but what about an application that uses a rich desktop client (Swing)?

Well, even after reading many books that advocate the Rich Domain Model, I can't seem to get clear answers to the above questions.

The Rich Domain Model is good from an OO purism point of view. In fact it's 'the' way of writing non-distributed applications. It should also be the choice for distributed applications written in Jini.

However, for real enterprise applications, I will be more inclined to use ADM because:

I know that ADM works.
It allows good work distribution in a large team.
The model can be created in an iteration that precedes the development of the service objects and the presentation tier (and client tier).
Project sponsors in a corporation don't care about OO purism. The bottom line for them is an application that is easy for the developers to write, can be delivered on time and within budget, and works.

Wednesday, April 09, 2008

Adaptive Persistence Model

Background

Complex enterprise systems have a large number of persistence objects with complex relationships among them. In the J2EE architecture these persistence objects are represented as Entity Beans. In this article we will assume that the architecture is based on Container Managed Persistence (CMP), although the solution remains the same for Bean Managed Persistence. The relationship between persistence objects (Entity Beans) is represented as a Container Managed Relationship (CMR).

In order to avoid remote calls when accessing each field of an entity bean, all practical architectures use local interfaces, a session facade and value objects. We will assume that readers are familiar with these patterns and will not discuss them here.

When the business tier needs some data from the persistence tier, it sends a request to the session facade. The session facade initializes the requested entity bean. This entity bean might also have several dependent objects defined by CMR; e.g., a Person entity bean might have Address as a dependent entity bean. A client that requested the Person object might also need the Address object. There are two options for loading these related objects, discussed in the following sections.

Aggressive loading

When an entity bean is initialized, all other related entity beans defined by CMR are also loaded. When the 'getValueObject()' method of the entity bean is invoked, it also loads value objects from the dependent CMP beans, builds a complete object graph and returns it. Because of the cascading read of dependent objects from the database, this method call is very expensive in terms of both performance and resource usage.

The benefit of this approach is that the caller has access to all fields/attributes of the requested persistence object without having to make an additional remote invocation and a trip to the database.

The problem with this approach is that it loads all related entity beans regardless of whether the caller needs them. If the caller does not need the dependent objects, the loaded objects consume system memory unnecessarily. Since this method call has a significantly slower response and high resource usage, using it for every persistence tier call might degrade system performance enough to make it an unacceptable solution.

Lazy loading

The second approach to loading CMR beans is based on lazy loading. When an entity bean is initialized, only its CMP fields are loaded. Initialization of all fields defined by CMR is deferred. The 'getValueObject()' method of the entity bean returns a value object that has just the CMP fields populated. All CMR fields on the value object remain null. The value objects are enhanced to implement the following logic in the getter methods that return a dependent object:

If the dependent object has already been loaded, simply return it.
If the dependent object has not been loaded, retrieve it by making a request to the persistence tier, store it locally (to avoid remote calls for subsequent requests) and then return it.
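The getter logic above can be sketched roughly as follows. AddressLoader is a hypothetical stand-in for the call to the persistence tier (in practice a session facade lookup), and the class names are illustrative only.

```java
/** Dependent value object, reduced to one field for the sketch. */
class AddressValueObject {
    final String city;

    AddressValueObject(String city) {
        this.city = city;
    }
}

/** Hypothetical stand-in for the remote call to the persistence tier. */
interface AddressLoader {
    AddressValueObject loadAddress(long personId);
}

/** Value object whose CMR field is fetched on first access only. */
class PersonValueObject {
    private final long id;
    private final AddressLoader loader;
    private AddressValueObject address;
    // Separate flag, so that a legitimately null address is not reloaded.
    private boolean addressLoaded;

    PersonValueObject(long id, AddressLoader loader) {
        this.id = id;
        this.loader = loader;
    }

    AddressValueObject getAddress() {
        if (!addressLoaded) {
            // Miss: one round trip to the persistence tier, then cache locally.
            address = loader.loadAddress(id);
            addressLoaded = true;
        }
        return address;
    }
}
```

The caller simply calls getAddress() and never sees whether the address was pre-loaded or fetched on demand.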

Using this approach, the caller does not have to write special code for loading partial objects. Lazy loading is transparent to the caller.

The benefit of this approach is that it does not load any unnecessary objects into memory and thus saves unnecessary database calls and server resources. Since the call to 'getValueObject()' is relatively inexpensive, the caller might observe a significant improvement in response time.

But if the caller needs a dependent object and calls the corresponding getter method on the value object, this approach requires a round trip to the persistence tier, which involves a remote call and a database operation. If round trips to the server are made only a few times, this approach will outperform the first one and should be acceptable. But in a complex business application the total number of objects in the object graph might easily be in the hundreds or even thousands. A business rule might need most or all dependent objects in order to make a business decision. This scenario requires the value objects to make hundreds of remote calls and database operations to load all required dependent objects, and might affect system performance significantly.

Hybrid option

Obviously neither of the above two approaches provides an elegant solution to the varied needs of the persistence model required by most enterprise systems. One might suggest a hybrid approach. One such approach is to allow the client to specify a list of all required dependent objects when requesting a persistence object from the persistence tier. The persistence tier pre-loads only the requested CMR fields and leaves all other CMR fields unloaded. If the caller happens to call the getter for a dependent object that was not retrieved, it is retrieved using the lazy-loading approach so that the caller does not see an unexpected NullPointerException. Although this solution seems elegant in most cases, it has the following problems:

It requires careful inspection of the code to determine the optimal set of CMRs to pre-load.
The caller has knowledge of the persistence tier's internal implementation, which breaks the decoupling rule.
Since customer requirements change, this approach also requires developers to make sure that they modify the CMR set appropriately. In other words, this approach is very vulnerable to requirement/implementation changes.

Adaptive hybrid approach

This approach is mostly based on the hybrid approach. It uses value objects similar to the ones used in lazy loading. But it differs from the regular hybrid approach in that, instead of requiring the client to specify the set of CMRs to pre-load, it uses an adaptive rules engine to make that decision. The rules engine uses the caller's context and a knowledge base to determine the set of CMRs that should be pre-loaded with the requested entity bean. In the beginning, when the knowledge base does not have enough information, the rules engine might not determine the correct set of CMRs to pre-load. In this case the caller may still have to make a remote invocation (and database call) to retrieve a dependent object. The rules engine uses such incidents to learn and enhance its knowledge base. Over time, as the knowledge base evolves, the rules engine should be able to make the correct decision and pre-load only those CMRs required by the caller. The whole pre-loading mechanism is hidden from the caller. The rest of this section discusses the design of the rules engine.

The rules engine uses the caller's stack trace to determine its context. It stores the caller's context and the set of requested CMRs in the knowledge base. One caller context might have multiple entries in the knowledge base with differing sets of CMRs. The engine assigns a weight to each entry, and this weight is updated during the engine's learning process. It bases its decision about which CMRs to load on 1) the caller's context and 2) the highest-weighted entry in the knowledge base.

When the system creates the value object graph, it assigns a unique id to that object graph. All value objects in the graph share the id, and the id differs from that of any other object graph. The rules engine stores this id, the pre-loaded CMRs and the caller context in temporary storage. If a value object has to retrieve a dependent object from the server because it was not pre-loaded (known as a miss), it also passes the id to the persistence tier. The rules engine uses this id and the requested object information to update its knowledge base: it adds the requested object to the pre-loaded CMR set. Each value object also keeps track of whether it was ever invoked by the caller. When value objects are garbage collected, the system calls their 'finalize' method. Value objects use this method to send information back to the rules engine to help it improve its knowledge base.

When the rules engine receives a notification from a value object about its life cycle event, it updates the CMR list associated with the id. If a value object was never used by the caller, the engine removes that object from the CMR list. After the rules engine has finished updating the CMR set, it checks this set against the existing sets for the same caller in the knowledge base. If the new CMR set already has an entry in the knowledge base, the rules engine just updates its weight. Otherwise it adds a new entry to the knowledge base with the new CMR set and assigns it a weight.
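The knowledge base described above might be sketched as follows. This is a simplification under stated assumptions: the caller context is passed in as a plain string (the article derives it from the stack trace), the per-graph id and temporary storage are left out, and a weight increment of 1 per observation is my own choice, since the article does not specify how weights are computed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of the knowledge base: caller context -> (CMR set -> weight). */
class PreloadKnowledgeBase {
    private final Map<String, Map<Set<String>, Integer>> entries =
            new HashMap<String, Map<Set<String>, Integer>>();

    /** Returns the highest-weighted CMR set for the context, or an empty set if unknown. */
    Set<String> choosePreloadSet(String callerContext) {
        Map<Set<String>, Integer> candidates = entries.get(callerContext);
        if (candidates == null) {
            return Collections.emptySet();
        }
        Set<String> best = Collections.emptySet();
        int bestWeight = 0;
        for (Map.Entry<Set<String>, Integer> entry : candidates.entrySet()) {
            if (entry.getValue() > bestWeight) {
                best = entry.getKey();
                bestWeight = entry.getValue();
            }
        }
        // Copy, so that callers cannot mutate a key stored in the map.
        return new TreeSet<String>(best);
    }

    /**
     * Feedback once an object graph has been used: add the misses, drop the
     * pre-loaded CMRs that were never touched, then bump the resulting set's weight.
     */
    void learn(String callerContext, Set<String> preloaded,
               Set<String> misses, Set<String> unused) {
        Set<String> observed = new TreeSet<String>(preloaded);
        observed.addAll(misses);
        observed.removeAll(unused);
        Map<Set<String>, Integer> candidates = entries.get(callerContext);
        if (candidates == null) {
            candidates = new HashMap<Set<String>, Integer>();
            entries.put(callerContext, candidates);
        }
        Integer weight = candidates.get(observed);
        // Assumed weighting scheme: each observation adds 1.
        candidates.put(observed, weight == null ? 1 : weight + 1);
    }
}
```

On the first request for a given context the engine returns an empty set, so everything is loaded lazily; after a few rounds of feedback the most frequently observed CMR set wins.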

Thursday, March 13, 2008

E4X (ECMAScript for XML)

E4X is a scripting language extension that adds native XML support to ECMAScript. It does this by providing access to the XML document in a form that feels natural to ECMAScript programmers. The goal is to provide a simpler API for accessing XML documents than other common APIs such as DOM or XSLT.

Let's take a look at a few examples of how you can read XML data using E4X.

You will be able to create variables of type XML by parsing a String. But XML literals will now be supported as well:

var employees:XML =
<employees>
    <employee ssn="123-123-1234">
        <name first="John" last="Doe"/>
        <address>
            <street>11 Main St.</street>
            <city>San Francisco</city>
            <state>CA</state>
            <zip>98765</zip>
        </address>
    </employee>
    <employee ssn="789-789-7890">
        <name first="Mary" last="Roe"/>
        <address>
            <street>99 Broad St.</street>
            <city>Newton</city>
            <state>MA</state>
            <zip>01234</zip>
        </address>
    </employee>
</employees>
;

Instead of using DOM-style APIs like firstChild, nextSibling, etc., with E4X you just “dot down” to grab the node you want. Multiple nodes are indexable with [n], similar to the elements of an Array:

trace(employees.employee[0].address.zip);
—
98765
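For contrast, here is roughly what the same lookup costs with a DOM-style API. This is a sketch in Java using the standard javax.xml and org.w3c.dom packages rather than ActionScript, and the class name DomZipLookup is made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** DOM equivalent of employees.employee[0].address.zip. */
class DomZipLookup {
    static String firstEmployeeZip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Each "dot" in the E4X expression becomes a lookup plus a cast.
            Element employee = (Element) doc.getDocumentElement()
                    .getElementsByTagName("employee").item(0);
            Element address = (Element) employee
                    .getElementsByTagName("address").item(0);
            Element zip = (Element) address
                    .getElementsByTagName("zip").item(0);
            return zip.getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```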

To grab an attribute, you just use the .@ operator:

If you don’t pick out a particular node, you get all of them, as an indexable list:

trace(employees.employee.name);
—


<name first="John" last="Doe"/>
<name first="Mary" last="Roe"/>

(And note that nodes even toString() themselves into formatted XML!)

A handy double-dot operator lets you omit the “path” down into the XML expression, so you could shorten the previous three examples to

trace(employees..zip[0]);
trace(employees..name);

You can use a * wildcard to get a list of multiple nodes or attributes with various names, and the resulting list is indexable:

trace(employees.employee[0].address.*);
—


<street>11 Main St.</street>
<city>San Francisco</city>
<state>CA</state>
<zip>98765</zip>

trace([EMAIL PROTECTED]);

—
Doe

You don’t have to hard-code the identifiers for the nodes or attributes… they can themselves be variables:


var whichNode:String = "zip";
trace(employees.employee[0].address[whichNode]);

—
98765


var whichAttribute:String = "ssn";
trace([EMAIL PROTECTED]);

—
789-789-7890

A new for-each loop lets you loop over multiple nodes or attributes:


for each (var ssn:XML in [EMAIL PROTECTED])
{
    trace(ssn);
}

—
123-123-1234
789-789-7890

Most powerful of all, E4X supports “predicate filtering” using the syntax .(condition), which lets you pick out nodes or attributes that meet a condition you specify using a Boolean expression. For example, you can pick out the employee with a particular social security number like this, and get her state:


var ssnToFind:String = "789-789-7890";
trace(employees.employee.(@ssn == ssnToFind)..state);

Instead of using a simple conditional operator like ==, you can also write a complicated predicate filtering function to pick out the data you need.

E4X has complete support for XML namespaces.

Compared with the current XML support in Adobe Flash, E4X allows you to write less code and execute it faster because more processing can be done at the native speed of C++.

Friday, January 04, 2008

Persistence Objects and Object Equality in Java

When implementing the equals and hashCode methods in persistent domain objects, the natural instinct is to assume that you only need to compare the database key, also known as the surrogate key.

When you recommend this approach, bright-minded people will dismiss the idea with the following arguments:

The Java Language Specification says that two objects are equal if and only if their state is equal.
In long-lived enterprise applications (i.e. applications that will live longer than 5-6 years), the database keys for some tables might have to be recycled because of the size constraints of the primary key field. In that case you cannot use the database key, as two different records can have the same primary key (i.e. one before the recycle and the second after it).

In my opinion the above arguments are only partially valid.

In most applications (including enterprise, scientific and novice ones), the equality requirement says that two objects are equal only if they represent the same physical entity, regardless of their state. If two different objects share the same state, they are identical but not equal. For example, in real life, if you take snapshots of one person at different times, the state of that person might differ, but the snapshots represent the same person. A record in the database represents an entity. Similarly, two persons might share the same state (name, physical attributes, health condition, place, academics, jobs, etc.), but they ARE different persons, and that is why the government came up with an artificial concept such as the Social Security number to distinguish them.
If two objects have the same primary key because of a recycle, these objects will have been created at least 5-6 years apart. Also, since the database is very strict about not allowing duplicate primary keys, the previous set of records must have been archived into a data warehouse and removed from the database used for OLTP. In that case I don't see why and how you would create these two objects in the same application. As a result, I don't see why you would run into the hypothetical problem described in point 2.
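A minimal sketch of key-based equality along these lines. PersonEntity and its fields are hypothetical, and the handling of not-yet-persisted objects (a null id falls back to reference identity) is my own assumption rather than something the argument above prescribes.

```java
/** Hypothetical persistent object whose identity is its surrogate key. */
class PersonEntity {
    private final Long id;   // surrogate key; null until the row exists
    private String name;     // state, deliberately ignored by equals

    PersonEntity(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    void setName(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof PersonEntity)) {
            return false;
        }
        PersonEntity that = (PersonEntity) other;
        // Two snapshots of the same row are equal even if their state differs.
        return id != null && id.equals(that.id);
    }

    @Override
    public int hashCode() {
        // Consistent with equals: objects with equal ids share a hash code.
        return id == null ? 0 : id.hashCode();
    }
}
```

With this definition, two objects loaded from the same row at different times compare equal, while two rows that happen to hold the same name do not.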
