CLOBs and the Java Spring Framework

A CLOB (Character Large Object) is an Oracle data type that can hold up to 4 GB of data. CLOBs are handy for storing large text, since a VARCHAR2 column has a maximum limit of 4,000 bytes.


Here is an example of how to create a table with a CLOB:
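A minimal sketch (the table and column names are illustrative):

    CREATE TABLE documents (
        doc_id    NUMBER(10)  PRIMARY KEY,
        doc_text  CLOB        -- large free-text content
    );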


Some common operations on CLOBs:

Writing to a CLOB:
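One common pattern is to insert an empty LOB locator with EMPTY_CLOB() and then write into it with DBMS_LOB.WRITE; a sketch, using the documents table above:

    DECLARE
        v_clob  CLOB;
        v_text  VARCHAR2(100) := 'Some text to store in the CLOB';
    BEGIN
        -- insert an empty locator and get a writable handle back
        INSERT INTO documents (doc_id, doc_text)
        VALUES (1, EMPTY_CLOB())
        RETURNING doc_text INTO v_clob;

        -- write the buffer into the CLOB starting at offset 1
        DBMS_LOB.WRITE(v_clob, LENGTH(v_text), 1, v_text);
        COMMIT;
    END;
    /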


Substr:
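DBMS_LOB.SUBSTR reads a piece of the CLOB; note that it takes the amount first and the offset second, the reverse of SQL's SUBSTR. For example:

    -- first 200 characters of the CLOB
    SELECT DBMS_LOB.SUBSTR(doc_text, 200, 1) AS first_chunk
    FROM   documents
    WHERE  doc_id = 1;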




Finding the length of a CLOB:
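DBMS_LOB.GETLENGTH returns the length of a CLOB in characters:

    SELECT DBMS_LOB.GETLENGTH(doc_text) AS clob_length
    FROM   documents
    WHERE  doc_id = 1;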



Some caveats when dealing with LOBs:
  • LOBs are not allowed in GROUP BY, ORDER BY, SELECT DISTINCT, aggregates or join predicates
  • You cannot change an existing column to a CLOB with a simple ALTER TABLE ... MODIFY; converting, say, a VARCHAR2 column generally means adding a new CLOB column, copying the data over and dropping the old column
  • The Oracle SQL parser can only handle string literals up to 4,000 characters in length

Dealing with CLOBs in Java using the Spring Framework:

The code below shows the basic parts of reading from and writing to a CLOB using Spring 2.x.
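A sketch using Spring's LobHandler support (the DAO class and the documents table are illustrative):

    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.jdbc.core.RowMapper;
    import org.springframework.jdbc.core.support.AbstractLobCreatingPreparedStatementCallback;
    import org.springframework.jdbc.support.lob.DefaultLobHandler;
    import org.springframework.jdbc.support.lob.LobCreator;
    import org.springframework.jdbc.support.lob.LobHandler;

    public class DocumentDao {

        // use OracleLobHandler instead when working with older Oracle drivers
        private final LobHandler lobHandler = new DefaultLobHandler();
        private final JdbcTemplate jdbcTemplate;

        public DocumentDao(DataSource dataSource) {
            this.jdbcTemplate = new JdbcTemplate(dataSource);
        }

        // write a String into the CLOB column
        public void insertDocument(final long docId, final String text) {
            jdbcTemplate.execute(
                "INSERT INTO documents (doc_id, doc_text) VALUES (?, ?)",
                new AbstractLobCreatingPreparedStatementCallback(lobHandler) {
                    protected void setValues(PreparedStatement ps, LobCreator lobCreator)
                            throws SQLException {
                        ps.setLong(1, docId);
                        lobCreator.setClobAsString(ps, 2, text);
                    }
                });
        }

        // read the CLOB column back as a String
        public String getDocumentText(long docId) {
            return (String) jdbcTemplate.queryForObject(
                "SELECT doc_text FROM documents WHERE doc_id = ?",
                new Object[] { new Long(docId) },
                new RowMapper() {
                    public Object mapRow(ResultSet rs, int rowNum) throws SQLException {
                        return lobHandler.getClobAsString(rs, "doc_text");
                    }
                });
        }
    }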




There are also other options like using a MappingSqlQuery to read from the CLOB.
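A sketch of the MappingSqlQuery variant (same illustrative table):

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Types;
    import javax.sql.DataSource;

    import org.springframework.jdbc.core.SqlParameter;
    import org.springframework.jdbc.object.MappingSqlQuery;
    import org.springframework.jdbc.support.lob.DefaultLobHandler;
    import org.springframework.jdbc.support.lob.LobHandler;

    public class DocumentTextQuery extends MappingSqlQuery {

        private final LobHandler lobHandler = new DefaultLobHandler();

        public DocumentTextQuery(DataSource dataSource) {
            super(dataSource, "SELECT doc_text FROM documents WHERE doc_id = ?");
            declareParameter(new SqlParameter(Types.NUMERIC));
            compile();
        }

        // called once per row; materializes the CLOB as a plain String
        protected Object mapRow(ResultSet rs, int rowNum) throws SQLException {
            return lobHandler.getClobAsString(rs, "doc_text");
        }
    }

Usage would look like: String text = (String) new DocumentTextQuery(dataSource).findObject(new Object[] { new Long(1) });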




References:
Short examples from ASKTOM

Handling CLOBs - Made easy with Oracle JDBC

DbUnit Framework

DbUnit extends the popular JUnit test framework and is a useful and powerful tool for simplifying unit testing of database operations.
The core components of DbUnit are:

  • IDatabaseConnection
    • DatabaseConnection - wraps a JDBC connection
    • DatabaseDataSourceConnection - wraps a JDBC Data Source
  • IDataSet - the primary abstraction used by DbUnit to manipulate tabular data. Commonly used implementations are:
    • FlatXmlDataSet
    • DefaultDataSet
    • XmlDataSet
    • CompositeDataSet
    • StreamingDataSet
    • FilteredDataSet
    • DatabaseDataSet
    • XlsDataSet
    • QueryDataSet
    • ReplacementDataSet
  • DatabaseOperation - an operation performed on the database before or after each test, accessed as DatabaseOperation.xyz:
    • UPDATE
    • TRUNCATE_TABLE
    • INSERT
    • REFRESH
    • DELETE
    • CLEAN_INSERT
    • DELETE_ALL
    • NONE
  • Other Operations
    • CompositeOperation
    • TransactionOperation
    • IdentityInsertOperation
Steps:

  1. Create your dataset load file
  2. Write your testcases
  3. Implement your testXXX() methods


Step 1: Create your dataset load file:
One unit-test database instance can be created per developer, with no data in it. DbUnit can export and import your database data to and from XML datasets. If the testcases are set up correctly, the data required for testing is loaded from the XML files before each testcase executes and deleted after it completes, so there is no need to clean up afterwards.
Data may also be generated from a number of sources:

  • flat XML files, of which we will see an example very shortly
  • tables in the database
  • database queries created using SQL
  • less obvious sources, such as Microsoft Excel spreadsheets
Here is a simple example of how to extract data into XML files from a Postgres database:
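A sketch (the driver class, connection details and table names are illustrative):

    import java.io.FileOutputStream;
    import java.sql.Connection;
    import java.sql.DriverManager;

    import org.dbunit.database.DatabaseConnection;
    import org.dbunit.database.IDatabaseConnection;
    import org.dbunit.database.QueryDataSet;
    import org.dbunit.dataset.xml.FlatXmlDataSet;

    public class DataExporter {
        public static void main(String[] args) throws Exception {
            // plain JDBC connection to the Postgres test database
            Class.forName("org.postgresql.Driver");
            Connection jdbcConnection = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/testdb", "user", "password");
            IDatabaseConnection connection = new DatabaseConnection(jdbcConnection);

            try {
                // export whole tables, or the result of a query, to a flat XML dataset
                QueryDataSet dataSet = new QueryDataSet(connection);
                dataSet.addTable("log_entry");
                dataSet.addTable("recent_logs",
                        "SELECT * FROM log_entry WHERE log_date >= DATE '2009-01-01'");
                FlatXmlDataSet.write(dataSet, new FileOutputStream("dataset.xml"));
            } finally {
                connection.close();
            }
        }
    }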


Step 2: Write your testcases

Create a wrapper class by extending the DBTestCase class. The wrapper class has the following important methods:

  • getDataSet(): indicates how the test data should be loaded
  • getSetUpOperation(): defines the operation to be executed before each test
  • getTearDownOperation(): defines the clean-up operation executed after each test
To make testing easier, you can create a base class that extends DBTestCase for the entire project. It will look something like this:
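A sketch (the connection properties are illustrative placeholders):

    import java.io.FileInputStream;

    import org.dbunit.DBTestCase;
    import org.dbunit.PropertiesBasedJdbcDatabaseTester;
    import org.dbunit.dataset.IDataSet;
    import org.dbunit.dataset.xml.FlatXmlDataSet;
    import org.dbunit.operation.DatabaseOperation;

    public abstract class BaseDbUnitTestCase extends DBTestCase {

        public BaseDbUnitTestCase(String name) {
            super(name);
            // tell DbUnit how to reach the test database
            System.setProperty(PropertiesBasedJdbcDatabaseTester.DBUNIT_DRIVER_CLASS,
                    "org.postgresql.Driver");
            System.setProperty(PropertiesBasedJdbcDatabaseTester.DBUNIT_CONNECTION_URL,
                    "jdbc:postgresql://localhost/testdb");
            System.setProperty(PropertiesBasedJdbcDatabaseTester.DBUNIT_USERNAME, "user");
            System.setProperty(PropertiesBasedJdbcDatabaseTester.DBUNIT_PASSWORD, "password");
        }

        // subclasses say which XML file holds their test data
        protected abstract String getDataSetFileName();

        protected IDataSet getDataSet() throws Exception {
            return new FlatXmlDataSet(new FileInputStream(getDataSetFileName()));
        }

        // load fresh data before each test...
        protected DatabaseOperation getSetUpOperation() throws Exception {
            return DatabaseOperation.CLEAN_INSERT;
        }

        // ...and remove it afterwards
        protected DatabaseOperation getTearDownOperation() throws Exception {
            return DatabaseOperation.DELETE_ALL;
        }
    }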



Step 3: Implement your testXXX() methods
Implement your test methods using JUnit. Your database is now initialized before and cleaned up after each test method, according to what you did in the previous steps.

By extending the base class created above, we can create any number of DbUnit testcases. Here is an example of a complete DbUnit testcase for testing code that executes log queries using the Spring framework. The methods tested here are getLogInDateRange(), addLogEntry() and updateLogEntry().
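A sketch of such a testcase (LogDao, LogEntry, the Spring context file and the dataset contents are all illustrative):

    import java.sql.Date;
    import java.util.List;

    import org.springframework.context.ApplicationContext;
    import org.springframework.context.support.ClassPathXmlApplicationContext;

    public class LogDaoTest extends BaseDbUnitTestCase {

        private LogDao logDao; // hypothetical Spring-based DAO under test

        public LogDaoTest(String name) {
            super(name);
        }

        protected String getDataSetFileName() {
            return "log-dataset.xml"; // assume it seeds two log rows dated in 2009
        }

        protected void setUp() throws Exception {
            super.setUp(); // DbUnit performs the CLEAN_INSERT here
            ApplicationContext ctx = new ClassPathXmlApplicationContext("test-context.xml");
            logDao = (LogDao) ctx.getBean("logDao");
        }

        public void testGetLogInDateRange() {
            List logs = logDao.getLogInDateRange(
                    Date.valueOf("2009-01-01"), Date.valueOf("2009-12-31"));
            assertEquals(2, logs.size()); // the two rows seeded by log-dataset.xml
        }

        public void testAddLogEntry() {
            logDao.addLogEntry(new LogEntry("77518254", "new medication entry"));
            List logs = logDao.getLogInDateRange(
                    Date.valueOf("2009-01-01"), Date.valueOf("2009-12-31"));
            assertEquals(3, logs.size()); // the seeded rows plus the new one
        }

        public void testUpdateLogEntry() {
            LogEntry entry = (LogEntry) logDao.getLogInDateRange(
                    Date.valueOf("2009-01-01"), Date.valueOf("2009-12-31")).get(0);
            entry.setMessage("updated message");
            logDao.updateLogEntry(entry);

            // re-read and verify the change stuck
            LogEntry reloaded = (LogEntry) logDao.getLogInDateRange(
                    Date.valueOf("2009-01-01"), Date.valueOf("2009-12-31")).get(0);
            assertEquals("updated message", reloaded.getMessage());
        }
    }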




Advantages of DBUnit:

  • It is an excellent way to avoid the myriad problems that can occur when one testcase corrupts the database and causes subsequent tests to fail or exacerbates the damage
  • DbUnit can also help you verify that your database data matches an expected set of values, and it has methods for comparing data between flat files, queries and database tables
  • Because the test data remains unchanged in an XML file, testcases can be executed repeatedly after iterative code changes to verify that the code didn't break

References:
http://www.dbunit.org/index.html

JIRA and Crucible

Recently I've started using some new tools from Atlassian, and they have some really cool features compared to other tools I've used so far.

JIRA is Atlassian's issue tracking system and these are some of the things I like about it:
1. A cool web-based UI

2. Plug-ins for IntelliJ IDEA that make it so much easier to use

3. A workflow that is customizable to any project

4. A customizable dashboard with project reports and statistics

5. Multi-dimensional reporting - reports can be viewed in a variety of formats, including pie charts, line or column graphs, and statistical tables

It was first launched in 2003, so it is relatively new compared to Bugzilla and other similar tools.

Crucible is Atlassian's code review tool. It is a lightweight, web-based tool that makes code review a breeze.
Having used various code review tools, this seems to be the best of them, and I think most developers will love it.

These are some of the cool things about Crucible:
1. The team doesn't have to meet together in one place; code reviews can be completed asynchronously
2. Sub-tasks or issues can be created out of code review comments
3. JIRA integration
4. Inline comments make it so much easier to track the review and provide feedback
5. The diff tool is very useful and pleasant to use

The many uses of DBMS_RANDOM

DBAs are often faced with scenarios where they have to generate test data for developers. Let's say you have a test database with no values in a certain field. Using DBMS_RANDOM, you can just generate them on your own.

Example: Refer to the table structure in my previous post. Let's say I don't have any values for Medication. My code values for Medication are 4 digits and range from 0001 to 0100. Let's say I also need to set a few of them to '-1', which is the code for missing data. Another handy trick is to pick a random value from some other field and use it to assign values to this one.

What if you had to generate all the data by yourself? It's not a big deal. DBMS_RANDOM covers cases like these:

  • Generating a time with AM/PM
  • Generating a random date
  • Generating a one-digit value within a range
  • Generating a random string with a specific length
  • Generating a random end date when the start date is provided
  • Calculating age from date of birth

Some of these are sketched below.
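A sketch (the patient_log table and its columns are illustrative):

    -- random 4-digit medication codes in the range 0001-0100
    UPDATE patient_log
    SET    medication = LPAD(TRUNC(DBMS_RANDOM.VALUE(1, 101)), 4, '0');

    -- mark roughly 10% of the rows with the missing-data code
    UPDATE patient_log
    SET    medication = '-1'
    WHERE  DBMS_RANDOM.VALUE < 0.1;

    -- a random time with AM/PM
    SELECT TO_CHAR(TRUNC(SYSDATE) + DBMS_RANDOM.VALUE, 'HH:MI AM') FROM dual;

    -- a random date within the past year
    SELECT TRUNC(SYSDATE - DBMS_RANDOM.VALUE(0, 365)) FROM dual;

    -- a random uppercase string of length 8
    SELECT DBMS_RANDOM.STRING('U', 8) FROM dual;

    -- a random end date up to 30 days after a given start date
    SELECT DATE '2009-10-14' + TRUNC(DBMS_RANDOM.VALUE(1, 31)) FROM dual;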

Reverse crosstab query in Oracle

Most of us are familiar with cross-tabbing in queries: it's basically changing rows into columns. Sometimes you need the reverse - changing columns into rows. Let's say the initial table stores one row per client with a repeating group of medication columns, and you need one row per medication instead. Here is one way to write the SQL for reverse cross-tabbing, sketched with a hypothetical med_log table whose repeating columns are MEDICATION1 through MEDICATION4 (the actual structure is described in the earlier post):
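    SELECT t.cid,
           t.date_reported,
           DECODE(n.rn, 1, t.medication1,
                        2, t.medication2,
                        3, t.medication3,
                        4, t.medication4) AS med_code,
           'MEDICATION ' || n.rn          AS medication_type
    FROM   med_log t,
           (SELECT LEVEL rn FROM dual CONNECT BY LEVEL <= 4) n
    WHERE  DECODE(n.rn, 1, t.medication1,
                        2, t.medication2,
                        3, t.medication3,
                        4, t.medication4) IS NOT NULL;

With the remaining columns filled in the same way, this would be the result: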
CID       DATE_REPORTED  MED_CODE  MEDICATION                                   REPORTING_PROVIDER  SOURCE  MEDICATION_TYPE
77518254  14.10.2009     0032      ARTANE(TRIHEXYPHENIDYL)(ANTI-CHOLINERGICS)   20                  MICP    MEDICATION 1
77518254  14.10.2009     0008      LUVOX(FLUVOXAMINE)(ANTI-DEPRESSANTS)         20                  MICP    MEDICATION 3
77518254  14.10.2009     0101      MARPLAN(ISOCARBOXAZID)(ANTI-DEPRESSANTS)     20                  MICP    MEDICATION 3
77518254  14.10.2009     0112      MARPLAN(ISOCARBOXAZID)(ANTI-DEPRESSANTS)     20                  MICP    MEDICATION 4

Enforcing database security policies

Here is a script that displays the session attributes you'll use most often for enforcing database security policies - a sketch built on SYS_CONTEXT('USERENV', ...), where the exact attribute names can vary slightly between Oracle releases:
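    CREATE OR REPLACE VIEW session_security_attributes AS
    SELECT SYS_CONTEXT('USERENV', 'SESSION_USER')        AS username,
           SYS_CONTEXT('USERENV', 'IP_ADDRESS')          AS ip_address,
           SYS_CONTEXT('USERENV', 'HOST')                AS host,
           SYS_CONTEXT('USERENV', 'OS_USER')             AS os_user,
           SYS_CONTEXT('USERENV', 'AUTHENTICATION_TYPE') AS auth_type,
           SYS_CONTEXT('USERENV', 'CLIENT_IDENTIFIER')   AS client_id
    FROM   dual;

This view provides useful information that can be used in implementing your security policies. For example, the user's IP address and authentication mode are available and can be used to govern what a user has access to. The Client Identifier can be set by an application and used to track and trace connected sessions.

Reference: Effective Oracle Database 10g Security by Design by David Knox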

Surrogate Keys

Surrogate keys are those that serve as the join fields between the fact and dimension tables in a dimensional data warehouse. Surrogate keys are widely used in data warehouses instead of the operational natural keys because of their advantages:

  1. Performance – a 4-byte integer usually makes a great surrogate key because it can represent 2^32 values. This is compact and efficient compared to the bulky alphanumeric natural keys used by the operational systems, and it translates into better performance through more efficient joins, smaller indices, and more fact rows per block.
  2. Space saving – as surrogate keys are only 4 bytes long, they occupy less space than the bulky alphanumeric production keys. For example, the DATE data type occupies 8 bytes, so by using surrogate keys we save 4 bytes per row. If we have 1 billion records in the fact table, we would be saving 4 x 1 billion bytes, roughly 3.7 GB of space!
  3. Buffer from operational key management practices – the reuse of operational keys poses a problem in a data warehouse that stores historical data. Using a surrogate key creates a buffer from potential problems like these.
  4. Mapping to integrate disparate sources – systems developed independently in company divisions may not use the same keys, or they may use keys that conflict with data in the systems of other divisions. This situation may not cause problems when each division independently reports summary data, but it cannot be permitted in the data warehouse, where data is consolidated. Since data warehouses are populated from many different operational data sources, the surrogate key acts as an integration mechanism, and the ETL process uses a mapping table to uniquely identify the linked natural keys.
  5. Handle unknown or not-applicable conditions – there are cases where the data in the operational data source violates rules followed by the data warehouse. For example, the operational key could have null values, which pose a problem because the dimension keys are foreign keys in the fact table and cannot be null. Using a surrogate key prevents conditions like these.
  6. Track changes in dimension attribute values – Though dimension attribute values are more static than fact tables, they may still undergo changes. Using a surrogate key, we can generate new keys for each change and easily handle the versioning. This enables easier changes to attributes when business requirements/rules change
  7. Easier implementation - Surrogate keys enable easier implementations of concurrency than natural keys. In the natural key case, if two users are concurrently working with the same information set, such as a customer row, and one of the users modifies the natural key value, then an update by the second user will fail because the customer code they are updating no longer exists in the database. In the surrogate key case, the update will process successfully because immutable ID values are used to identify the rows in the database, not mutable customer codes.

A surrogate key may be meaningful or meaningless. A meaningful key stores information within the key itself; meaningless keys are sometimes called system keys because they only have meaning to the system. The advantage of meaningful keys is that they can be constructed to cluster related data, reducing retrieval times. The disadvantage is that they are inflexible and rely on a code structure that could break.

A surrogate key can be created by using the CREATE SEQUENCE command.
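A sketch (the sequence and table names are illustrative):

    -- sequence that supplies the surrogate key values
    CREATE SEQUENCE customer_key_seq
        START WITH 1
        INCREMENT BY 1
        CACHE 1000;

    -- assign a new surrogate key as each dimension row is loaded
    INSERT INTO customer_dim (customer_key, customer_code, customer_name)
    VALUES (customer_key_seq.NEXTVAL, 'C-1042', 'Acme Corp');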