Thursday, August 11, 2005

Intermediate Representations vs Software Performance

Over the last few days some of my friends at Scali-Norway were writing Data Access Objects (DAOs) to access system data. They wrote their DAOs in C++ and used a third-party driver to access the PostgreSQL database. In the simplest scenario a DAO reads the rows of a table, converts them to an intermediate object representation and returns the collection of objects. The problem they now face is poor performance and high memory usage. Going through their code, I found that they could have avoided these problems if they had designed the DAOs correctly in the first place. In this post I will discuss how to represent your data in an intermediate representation without compromising performance or memory usage.

To demonstrate the concept I will take a simple example. Assume you have a "customer_data" table. You need to send mails to all the customers who have not paid their bills for the last month. You need to interact with an external billing service to check whether a particular customer has paid or not. Also, for extensibility, you need to wrap each customer's data in a "Customer" object, which is your intermediate representation. Think for a while about the design you would propose to solve this problem....

The approach my friends have taken is as follows:

They have a DAO class which iterates through the database recordset and creates a collection of Customer objects, which is returned to the upper layer.

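// needs java.util.ArrayList and java.util.List
public Customer[] getAllCustomers() {
    // Every row is converted and kept in memory before anything is returned.
    List<Customer> customers = new ArrayList<Customer>();
    RecordSet rs = getSQLRecordSet();
    while (rs.hasnext()) {
        customers.add(convertRecordToCustomer(rs.next()));
    }
    return customers.toArray(new Customer[0]);
}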


Then in the business method they iterate through the given Customer collection and send a mail if that particular customer has not paid his bill.

public void informCustomers() {
    Customer[] customers = DAO.getAllCustomers();
    for (Customer customer : customers) {
        // hasPaidLastBill() is a placeholder for the external billing check
        if (!hasPaidLastBill(customer)) {
            sendMailToCustomer(customer);
        }
    }
}


If we analyze the above solution, we can see that to perform the operation we need to go through two loops. Say they had 1000 records in the resultset: first they run 1000 iterations to convert the recordset rows into the intermediate representation, and then another 1000 iterations to perform the actual business process on those records. The worst part is that you load all your data into a collection of objects and pass the collection to the upper layer. Imagine you have 10M records in your database; you will probably run out of memory trying to perform the above operation.


The good news is that you can easily find a better solution to this problem by applying a simple design pattern. The important point is that the business process does not need all the customer objects in memory at once; it needs them one at a time. So in our new implementation we write our business logic in a listener class that implements an interface called "CustomerListner", as follows.


interface CustomerListner {
    public void onRecord(Customer cust);
}

class BusinessProcessAction implements CustomerListner {
    public void onRecord(Customer cust) {
        // hasPaidLastBill() is a placeholder for the external billing check
        if (!hasPaidLastBill(cust)) {
            sendMailToCustomer(cust);
        }
    }
}

Now we call the business process in the following manner. We create an instance of BusinessProcessAction, which deals with one customer instance at a time and runs the logic on it. Then we ask the DAO to notify the BusinessProcessAction instance as and when it reads a data record from the database. The code looks as follows:

BusinessProcessAction action = new BusinessProcessAction ();
DAO.process(action);

Our DAO acts as a producer of customer instances. Once we call the DAO.process() method it starts fetching records from the database, converts each record to a Customer instance and then asks our BusinessProcessAction to perform the business logic on that customer.

class DAO {
    public void process(CustomerListner listner) {
        RecordSet rs = getSQLRecordSet();
        while (rs.hasnext()) {
            // Convert one row, hand it over, then move on; nothing is accumulated.
            Customer customer = convertRecordToCustomer(rs.next());
            listner.onRecord(customer);
        }
    }
}
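The snippets above leave out the record source and the billing check, so here is a minimal, self-contained sketch of the same idea that compiles and runs on its own. The in-memory `Iterator<String[]>` standing in for the recordset, the `name` and `paid` fields of `Customer`, and the `mailed` list that replaces real mail sending are all assumptions made for illustration; they are not the code my friends wrote.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ListenerSketch {

    // Intermediate representation (the two fields are assumed for this sketch).
    static class Customer {
        final String name;
        final boolean paid;

        Customer(String name, boolean paid) {
            this.name = name;
            this.paid = paid;
        }
    }

    // Callback through which the DAO hands over one customer at a time.
    interface CustomerListner {
        void onRecord(Customer cust);
    }

    // Producer: converts one raw row at a time and passes it to the listener.
    static class DAO {
        private final Iterator<String[]> rows; // stand-in for a real recordset

        DAO(Iterator<String[]> rows) {
            this.rows = rows;
        }

        void process(CustomerListner listner) {
            while (rows.hasNext()) {
                String[] row = rows.next();
                listner.onRecord(new Customer(row[0], Boolean.parseBoolean(row[1])));
            }
        }
    }

    // Business logic: records who would be mailed instead of sending real mail.
    static class BusinessProcessAction implements CustomerListner {
        final List<String> mailed = new ArrayList<>();

        @Override
        public void onRecord(Customer cust) {
            if (!cust.paid) {
                mailed.add(cust.name);
            }
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"alice", "true"},
                new String[] {"bob", "false"});
        BusinessProcessAction action = new BusinessProcessAction();
        new DAO(rows.iterator()).process(action);
        System.out.println("Mailed: " + action.mailed);
    }
}
```

Only one `Customer` object is reachable from our own code at any moment; whether the driver underneath buffers more rows is a separate question that depends on the driver.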


This will do the job for you. If you look closely at this implementation you can see we have achieved all our objectives:
1) Altogether we run only 1000 loop iterations for 1000 records
2) Our code never holds more than one Customer object in memory at once
3) Most important, our business method still uses the intermediate representation and is independent of database formats


This pattern gives a definite edge especially when producing each record is slow or asynchronous. Assume in the above example that fetching data from the data source takes 1 hour per record. With the earlier implementation it would take 1000 hours before control is transferred to the business layer. With the second approach the processing happens as the data arrives: as soon as a record is fetched from the database, a mail is sent to that customer.
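The difference in timing is easy to see if we simply log the order of events. The following is a small sketch, not a benchmark: `fetch` only pretends to be a slow data source, customers are reduced to integer ids, and the names `eager`, `streaming` and `produce` are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class OrderingSketch {

    // Simplified listener: a customer is just an id here.
    interface CustomerListner {
        void onRecord(int customerId);
    }

    // Pretend fetch from a slow source; each call is written to the log.
    static int fetch(int id, List<String> log) {
        log.add("fetch " + id);
        return id;
    }

    // First approach: fetch everything, then process everything.
    static List<String> eager(int n) {
        List<String> log = new ArrayList<>();
        List<Integer> all = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            all.add(fetch(i, log));
        }
        for (int id : all) {
            log.add("mail " + id);
        }
        return log;
    }

    // Producer used by the second approach: notify the listener per record.
    static void produce(int n, List<String> log, CustomerListner listner) {
        for (int i = 1; i <= n; i++) {
            listner.onRecord(fetch(i, log));
        }
    }

    // Second approach: each record is processed as soon as it is fetched.
    static List<String> streaming(int n) {
        List<String> log = new ArrayList<>();
        produce(n, log, id -> log.add("mail " + id));
        return log;
    }

    public static void main(String[] args) {
        System.out.println(eager(2));     // [fetch 1, fetch 2, mail 1, mail 2]
        System.out.println(streaming(2)); // [fetch 1, mail 1, fetch 2, mail 2]
    }
}
```

The total number of fetches and mails is the same in both cases; what changes is how long the first customer waits and how much has to be held in memory in between.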


This example is just to demonstrate the concept of the pattern. You can extend it to achieve much more flexibility once you think it through for a specific problem. If you start looking at problems with this pattern in mind, you will often be able to save a good part of the processing time and memory usage of your programs....

6 comments:

Hasith Yaggahavita said...

I forgot to mention the purpose of the listener interface... Generally we don't want our DAOs to be dependent on the business tier, but the business tier will be dependent on the DAOs. That is why the CustomerListner interface is defined in the DAO layer and our business action implements it. Without that interface the DAO would have to reference the business class directly, which would cause a cyclic dependency between the DAO and the business tier, and that is not acceptable..

88Pro said...

Hasith,

I have a few questions just to clarify some of my doubts. In your improved DAO there is a process method which is void. However, I am sure that as a DAO it would also need to return all Customers (Customer[]) for some other functions, say for listing. So will the existing getAllCustomers method still be there in the DAO as well?

Also, I think there are disconnected ResultSets in the latest JDBC, where you wouldn't need intermediate objects. If you have the luxury of using disconnected ResultSets, what would be the considerations in choosing between the listener pattern and cached ResultSets?

Hasith Yaggahavita said...

Well.. there may be requirements where you need a set of objects to be in memory, for example in a case of object comparison. But I don't think you need to do so for a listing of objects. Assume the following scenario.. You need to show the names of all the customers in a Swing listbox. Then your UI composite may implement the listener interface, and your UI controller will pass your UI composite to the DAO.process() method. The DAO will call the "onRecord" method of the given listener for each record. The UI component will add the customer name to the listbox as and when its "onRecord" is called. The same concept can be applied in a Struts or JSP environment as well.
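To make the listbox scenario a bit more concrete, here is a rough sketch. It assumes a `Customer` with only a `name` field and a DAO backed by a plain list of names; `CustomerListView` is a made-up class that fills a `javax.swing.DefaultListModel`, which a `JList` could then display. It also shows one possible way to keep a `getAllCustomers()` method for the cases that really need the whole set, by building it on top of `process()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.swing.DefaultListModel;

public class ListingSketch {

    static class Customer {
        final String name;

        Customer(String name) {
            this.name = name;
        }
    }

    interface CustomerListner {
        void onRecord(Customer cust);
    }

    static class DAO {
        private final List<String> names; // stand-in for the database table

        DAO(List<String> names) {
            this.names = names;
        }

        void process(CustomerListner listner) {
            for (String name : names) {
                listner.onRecord(new Customer(name));
            }
        }

        // The traditional call, implemented with a collecting listener.
        List<Customer> getAllCustomers() {
            List<Customer> all = new ArrayList<>();
            process(all::add);
            return all;
        }
    }

    // UI-side listener: adds each name to the list model as it arrives.
    // In a real Swing application this update belongs on the event dispatch thread.
    static class CustomerListView implements CustomerListner {
        final DefaultListModel<String> model = new DefaultListModel<>();

        @Override
        public void onRecord(Customer cust) {
            model.addElement(cust.name);
        }
    }

    public static void main(String[] args) {
        DAO dao = new DAO(Arrays.asList("Ann", "Raj"));
        CustomerListView view = new CustomerListView();
        dao.process(view);
        System.out.println(view.model.getSize() + " names in the list model");
    }
}
```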

The concept is to avoid the physical separation of system layers while still keeping them well separated logically. This is something we Java guys don't think about much.. we generally use a lot of serialization when passing data from one tier to another. We try to keep our layers physically separated, which is great if you want your layers to be deployed on separate VMs. But that is not a requirement in many cases.

When talking about JDBC, there are lots of standard features we may use, like disconnected resultsets, paging capabilities, etc.. (but not all of these cool features are supported by all of the database drivers we use). Even in the case of a disconnected resultset we will be holding the results in memory. If you do not need that overhead and you require only one object in memory for your operation, listeners will perform better.

Also, this is not a pattern just for implementing DAOs.. I took this example only to demonstrate the idea. In the Java world we will be using a database framework for database access in most cases. But there are instances where we have to write our own data access layers for different data sources, e.g. socket data streams, CIM servers, web services etc...

Hasith Yaggahavita said...

This pattern has some drawbacks also.. One problem is that you don't get full control of the process within the business tier. The process is actually executed within the data access layer. This can cause issues when you have complex exception handling requirements.
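One way to live with that, sketched below under my own assumptions, is to let an unchecked exception thrown by the listener travel back through `process()` to the business tier, while the DAO makes sure its record source is released. `RecordSource`, its `closed` flag and the `delivered` counter are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FailureSketch {

    interface CustomerListner {
        void onRecord(String customerName);
    }

    // Stand-in for a recordset or connection that must be released.
    static class RecordSource implements AutoCloseable {
        private final Iterator<String> it;
        boolean closed = false;

        RecordSource(List<String> names) {
            this.it = names.iterator();
        }

        boolean hasNext() {
            return it.hasNext();
        }

        String next() {
            return it.next();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    static class DAO {
        int delivered = 0; // records whose listener call completed normally

        void process(RecordSource source, CustomerListner listner) {
            // try-with-resources closes the source even if the listener throws.
            try (RecordSource s = source) {
                while (s.hasNext()) {
                    listner.onRecord(s.next());
                    delivered++;
                }
            }
        }
    }

    public static void main(String[] args) {
        RecordSource source = new RecordSource(Arrays.asList("alice", "bob", "carol"));
        DAO dao = new DAO();
        try {
            dao.process(source, name -> {
                if (name.equals("bob")) {
                    throw new IllegalStateException("mail server down");
                }
            });
        } catch (IllegalStateException e) {
            // The business tier decides what to do: retry, log, or give up.
            System.out.println("Stopped after " + dao.delivered + " record(s): " + e.getMessage());
        }
        System.out.println("Source closed: " + source.closed);
    }
}
```

The business tier still cannot resume from the failed record by itself; if that is needed, the listener would have to catch and handle the error inside `onRecord`.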

One appropriate use of this is writing a middle data access tier on top of a database. Assume you are going to implement a CIM API for users to work on CIM objects. You will be providing a layer which hides the actual database representation and allows users to work with CIM standard objects. In your API you can allow users to access their objects either by implementing object listeners or by obtaining the object list in the traditional way.

In the above example it is obvious that the clients of your API will run on the same VM. Then there is no requirement for serialization of data. Thus the use of listeners will give higher performance and lower memory usage.

Anonymous said...

Hi Hasith Ayya,

I think here we only get a memory usage optimization and not a performance optimization, because in both cases the same number of statements are run inside the loops. But it is a great design pattern for optimizing memory while querying a database.

Lakmal