Jeyanthi's Java Cafe: 2010

Converting TimeZones

Converting timezones can be a bit complicated if you are attempting to do it for the first time. There are problems using the java.util classes that can introduce bugs in your application.

Let's say you have a java.util.date field that you retrieved from the database and you want to display this value in the user's local timezone.

java.util.Date always reflects UTC since they do not have timezones. So you cannot convert a Date from one timezone to another. Only a Calendar or a textual representation of a date can be converted into a different time zone.

So the first step to do this would be convert the Date to a Calendar.

The normal way of doing it would be

You would then use a DateFormat to display this time in whatever format and timezone is desired.

Example:

But Calendar.getTime() is not recommended and there is a bug explaining the workaround.
Bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4031021

So we would use a method like the one below to get the time value out of the Calendar object.

So the date display would now be:

But using Calendar.setTime() might not convert the UTC date into a local timezone correctly. The problem I noticed was a one hour difference when the UTC date ends up in the next day corresponding to the user's local time. Also, this only occurs if you are dealing with a day later than the current day.

Example: Lets say the UTC date value is Dec 21st 2010 2:45 AM
We need this value to be converted to a user's local timezone, let's say EST time.

So the correct converted date would be
Dec 20th 2010 9:45 PM (after DST change)

There is an alternative way of converting timezones to fix the above problem.

Step 1:
Convert the Date field to a Calendar object. All timezone conversions should be done in a java.util.Calendar.

Note that only the "time" field is stored in the resulting Calendar object. The date will be set in Step 4. This workaround is required for the problem described above to be resolved.

Step 2:

Convert the timezone on the Time field:

The resulting Calendar object will have the correct time set, but will still have today's(current day's) date.

Step 3:

Calculate the difference between the current day and the original date that had to be converted:

Step 4:
Add the difference to the converted Calendar object. The final code combining all the above steps is as follows:

Reference:
Date and time in Java

How to get the MD5 checksum for a file

MD5 Hashsums are use to confirm the integrity of files. That is especially important when very large files are downloaded or moved from one location to another. To check the integrity of a file, its corresponding MD5 file can be downloaded and compared to the MD5 created locally on the file downloaded.

The program below can be used to create the MD5 hash for a file using the Apache Commons Codec lib:

The Windows program "md5sums" creates MD5 hashes.

In Unix, the BSD program "md5sum" creates MD5 hashes.

Reference
http://www.rgagnon.com/javadetails/java-0416.html

http://commons.apache.org/codec/

http://www.pc-tools.net/win32/md5sums/

Reading a file into a Java String

Comparing Java Date and Oracle Timestamp

Oracle 9.2 through 10.2 jdbc drivers causes java.util.Date to be mapped to java.sql.Date which does not include time information. Hence a direct comparison of a Java Date object to an Oracle Timestamp column like below will throw database errors

new java.sql.Date( .getTime() )

ORA-00932: inconsistent datatypes: expected TIMESTAMP got NUMBER

To be able to compare an Oracle Timestamp to a Java Date object, follow the steps below:

Step 1: Convert the Java Date to a String

Example:

Step 2: Use the String value in the SQL and use Oracle to_date() to convert it into Oracle date

Example:

Step 3: Cast the Oracle date into an Oracle timestamp
Example:

Reference
Oracle JDBC FAQ

Java Snippets on Date and Timestamp

Creating a Java Timestamp object

Convert String to Date

Converting java.util.Date to java.sql.Date

The difference between both these date formats is the extra millisecond in java.sql.date

java.util.date ---------->dd mm yyyy hh:mm:ss

java.sql.date----------->dd mm yyyy hh:mm:ss: ms

The conversion is simple

Setting Date fields in the database using PreparedStatement
While setting a date using PreparedStatement, using setDate strips out the Time value. If you need the time field to be retained, use PreparedStatement setTime instead.

Getting Date fields from the database using ResultSet
While retrieving the Date value from the database, using getDate on ResultSet normally strips out the Time Value. To preserve the time fields, use ResultSet getTimeStamp instead.

CLOB's and Java Spring Framework

A CLOB (Character Large Object) is an Oracle data type that can hold up to 4 GB of data. CLOB's are handy for storing large text since a VARCHAR has a maximum limit of 4000-bytes.

Here is an example of how to create a table with a CLOB:

Some common operations on CLOB's:

Writing to a CLOB:

Substr:

Finding the length of a CLOB:

Some caveats when dealing with LOB's:

LOBs are not allowed in GROUP BY, ORDER BY, SELECT DISTINCT, aggregates and JOINS
You cannot alter a table to add a CLOB column after creation. CLOB's have to be created at table creation time
The Oracle SQL parser can only handle string literals up to 4000 characters in length

Dealing with CLOB’s in Java using Spring Framework:

The code below shows the basic parts of how to read and write to a CLOB using Spring 2.x.

There are also other options like using a MappingSqlQuery to read from the CLOB.

Reference:
Short examples from ASKTOM

Handling CLOBs - Made easy with Oracle JDBC

DbUnit Framework

DbUnit extends the popular JUnit test framework and is a useful and powerful tool for simplifying unit testing of database operation.
The core components of DBUnit are:

IDatabaseConnection

DatabaseConnection - wraps a JDBC connection
DatabaseDataSourceConnection - wraps a JDBC Data Source

IDataSet

FlatXmlDataSet
DefaultDataSet
XmlDataSet
CompositeDataSet
StreamingDataSet
FilteredDataSet
DatabaseDataSet
XlsDataSet
QueryDataSet
ReplacementDataSet

DatabaseOperation
Operations (named "DatabaseOperation."xyz")

UPDATE
TRUNCATE
INSERT
REFRESH
DELETE
CLEAN_INSERT
DELETE_ALL
NONE

Other Operations

CompositeOperation
TransactionOperation
IdentityInsertOperation

Steps :

Create your dataset load file
Write your testcases
Implement your testXXX() methods

Step 1: Create your dataset load file:
One unit test database instance can be created per developer with no data. DbUnit has the ability to export and import your database data to and from XML datasets. If the testcases are setup correctly, the data required for testing is loaded from the XML files before testcase execution and deleted after it is complete. So there will be no need to cleanup afterwards.
Data may also be generated from a number of sources:

flat XML files, of which we will see an example very shortly
tables in the database
database queries created using SQL
less obvious sources, such as Microsoft Excel spreadsheets

Here is a simple example of how to extract data into XML files from a Postgres database:

Step 2: Write your testcases

Create a wrapper class by extending the DBTestCase class. The wrapper class has the following important methods:

getDataSet (): you can use this to indicate how to load test data
getSetUpOperation() : defines the operation to be executed before the test is executed
getTearDownOperation() : defines clean up operation

To make testing easier, you can create a base class that extends DBTestCase for the entire project. It will look something like this:

Step 3. Implement your testXXX() methods
Implement your test methods using Junit. Your database is now initialized before and cleaned-up after each test methods according to what you did in previous steps.

By extending the base class created above, we can create any number of DBUnit testcases . Here is an example of a complete DBUnit testcase for testing code used for executing log queries using Spring framework. The methods tested here are getLogInDateRange(), addLogEntry() and updateLogEntry().

Advantages of DBUnit:

It is an excellent way to avoid the myriad of problems that can occur when one test case corrupts the database and causes subsequent tests to fail or exacerbate the damage
DbUnit can also help you to verify that your database data match an expected set of values and it has methods for comparing data, between flat files, queries and database tables
Because the test data remains unchanged in an XML file, testcases can be executed repeatedly after iterative code changes to verify if the code didn't break.

References:
http://www.dbunit.org/index.html

JIRA and Crucible

Recently I've started using some new tools from Atlassian and they have some really cool features compared to others I've used so far.

JIRA is Atlassian's issue tracking system and these are some of the things I like about it:
1. A cool web-based UI

2. Plug-ins for IntelliJ IDEA that makes it so much easier to use

3. Workflow customizable to any project

4. A customizable dashboard with project reports and statistics

5. Multi-dimensional reporting - reports can be read in a variety of dimensions, including pie chart, linear or column graphs, and statistical tables

It was first launched in 2003 and so is relatively new compared to Bugzilla and other similar tools

Crucible is Atlassian's code review tool. It is a lightweight, web-based tool that makes code review a breeze.
For someone used to various code review tools, this seems to be the best and all developers will love it.

These are some of the cool things about Crucible:
1. The team doesn't have to meet together in one place. Code revies can be completed asynchronusly
2. Sub-tasks or issues can be created out of code review comments
3. JIRA integration
4. Inline comments makes it so much more easier to track the review and provide feedback
5. The diff tool is very useful and pleasant to use

The many uses of DBMS_RANDOM

DBA's are often faced with scenarios where they have to generate test data for developers. Let's say you have a test database with no values in a certain field. Using DBMS_RANDOM, you can just generate them on your own.
Example: Refer the table structure in my previous post. Let's say I don't have any values for Medication. My code values for Medication are 4 digits and range from 0001 to 0100.
These are some ways I would add data for these columns: Let's say I now need to set a few of them to '-1', which is the code for missing data. Here is a great way to do it:
Pick any random value from some other field and use it to assign values to this one. What if you had to generate all the data by yourself? It's not a big deal. Here are some tips:
Generate Time with AM/PM: Random date: One digit value with range: A random string with specific length: Generate a random end date when start date is provided: Date of birth to Age calculation:

Reverse crosstab query in Oracle

Most of us are familiar with cross-tabbing in queries. It's basically changing rows into columns. If you have a requirement to do the reverse, do the following: Let’s say this is the initial table structure: Let's say you need to change this into the following structure: (Columns into Rows) Here is the sql for reverse cross tabbing: This would be the result:

CID	DATE_REPORTED	MED_CODE	MEDICATION	REPORTING_PROVIDER	SOURCE	MEDICATION_TYPE
77518254	14.10.2009	0032	ARTANE(TRIHEXYPHENIDYL)(ANTI-CHOLINERGICS)	20	MICP	MEDICATION 1
77518254	14.10.2009	0008	LUVOX(FLUVOXAMINE)(ANTI-DEPRESSANTS)	20	MICP	MEDICATION 3
77518254	14.10.2009	0101	MARPLAN(ISOCARBOXAZID)(ANTI-DEPRESSANTS)	20	MICP	MEDICATION 3
77518254	14.10.2009	0112	MARPLAN(ISOCARBOXAZID)(ANTI-DEPRESSANTS)	20	MICP	MEDICATION 4

Enforcing database security policies

Here is a script that displays the attributes you’ll use most often for enforcing database security policies: This view provides useful information that can be used in implementing your security policies. For example, the user’s IP address and authentication mode are available and can be used to govern what a user has access to. The Client Identifier can be set by an application and used to track and trace connected sessions. Reference: Effective Oracle Database 10g Security by Design by David Knox

Surrogate Keys

Surrogate keys are those that serve as the join fields between the fact and dimension tables in a dimension data warehouse. Surrogate keys are widely used in data warehouses instead of merely referencing the operational natural keys due to its advantages:

Performance – using a 4-byte integer usually makes a great surrogate key because it can represent 2³² values. This is compact and efficient compared to bulky alphanumeric natural keys used by the operational systems. This translates into better performance by using more efficient joins, smaller indices, and more fact rows per block.
Space saving - As surrogate keys are only 4 bytes long, they occupy less space that the bulky alphanumeric production keys. For example the date data type occupies 8 bytes. So by using surrogate keys we save 4 bytes. If we have 1 billion records in the fact table, we would be saving 4x1billion bytes = 3.73 GB of space!!!!
Buffer from operational key management practices – reuse of operational keys pose a problem using them in a data warehouse that stores historical data. Using a surrogate key creates a buffer from potential problems like these.
Mapping to integrate disparate sources – Systems developed independently in company divisions may not use the same keys, or they may use keys that conflict with data in the systems of other divisions. This situation may not cause problems when each division independently reports summary data, but it cannot be permitted in the data warehouse where data is consolidated. Since data warehouses are populated from many different operational data sources, using surrogate key acts as an integration mechanism and the ETL process uses a mapping table to uniquely identify the linked natural keys.
Handle unknown or not applicable conditions – there are cases when the data in the operational data source may violate certain rules followed by the data warehouse. For example, the operational key could have null values which may pose a problem since these are the foreign keys to the fact table which cannot have null values. Using a surrogate key prevents conditions like these.
Track changes in dimension attribute values – Though dimension attribute values are more static than fact tables, they may still undergo changes. Using a surrogate key, we can generate new keys for each change and easily handle the versioning. This enables easier changes to attributes when business requirements/rules change
Easier implementation - Surrogate keys enable easier implementations of concurrency than natural keys. In the natural key case, if two users are concurrently working with the same information set, such as a customer row, and one of the users modifies the natural key value, then an update by the second user will fail because the customer code they are updating no longer exists in the database. In the surrogate key case, the update will process successfully because immutable ID values are used to identify the rows in the database, not mutable customer codes.

A surrogate key may be meaningful or meaningless. A meaningful key stores information within the key. Meaningless keys are sometimes called system keys because they only have meaning to the system. The advantages of meaningful keys are that they are constructed for data clustering, reducing retrieval times. The disadvantages of meaningful keys are that they are inflexible and rely on a code structure that could break.

A surrogate key can be created by using CREATE SEQUENCE command.

Jeyanthi's Java Cafe