Converting timezones can be a bit complicated if you are attempting to do it for the first time. There are problems using the java.util classes that can introduce bugs in your application.
Let's say you have a java.util.date field that you retrieved from the database and you want to display this value in the user's local timezone.
java.util.Date always reflects UTC since they do not have timezones. So you cannot convert a Date from one timezone to another. Only a Calendar or a textual representation of a date can be converted into a different time zone.
So the first step to do this would be convert the Date to a Calendar.
The normal way of doing it would be
You would then use a DateFormat to display this time in whatever format and timezone is desired.
Example:
But Calendar.getTime() is not recommended and there is a bug explaining the workaround.
Bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4031021
So we would use a method like the one below to get the time value out of the Calendar object.
So the date display would now be:
But using Calendar.setTime() might not convert the UTC date into a local timezone correctly. The problem I noticed was a one hour difference when the UTC date ends up in the next day corresponding to the user's local time. Also, this only occurs if you are dealing with a day later than the current day.
Example: Lets say the UTC date value is Dec 21st 2010 2:45 AM
We need this value to be converted to a user's local timezone, let's say EST time.
So the correct converted date would be
Dec 20th 2010 9:45 PM (after DST change)
There is an alternative way of converting timezones to fix the above problem.
Step 1:
Convert the Date field to a Calendar object. All timezone conversions should be done in a java.util.Calendar.
Note that only the "time" field is stored in the resulting Calendar object. The date will be set in Step 4. This workaround is required for the problem described above to be resolved.
Step 2:
Convert the timezone on the Time field:
The resulting Calendar object will have the correct time set, but will still have today's(current day's) date.
Step 3:
Calculate the difference between the current day and the original date that had to be converted:
Step 4:
Add the difference to the converted Calendar object. The final code combining all the above steps is as follows:
Reference:
Date and time in Java
How to get the MD5 checksum for a file
MD5 Hashsums are use to confirm the integrity of files. That is especially important when very large files are downloaded or moved from one location to another. To check the integrity of a file, its corresponding MD5 file can be downloaded and compared to the MD5 created locally on the file downloaded.
The program below can be used to create the MD5 hash for a file using the Apache Commons Codec lib:
The Windows program "md5sums" creates MD5 hashes.
In Unix, the BSD program "md5sum" creates MD5 hashes.
Reference
http://www.rgagnon.com/javadetails/java-0416.html
http://commons.apache.org/codec/
http://www.pc-tools.net/win32/md5sums/
The program below can be used to create the MD5 hash for a file using the Apache Commons Codec lib:
The Windows program "md5sums" creates MD5 hashes.
In Unix, the BSD program "md5sum" creates MD5 hashes.
Reference
http://www.rgagnon.com/javadetails/java-0416.html
http://commons.apache.org/codec/
http://www.pc-tools.net/win32/md5sums/
Comparing Java Date and Oracle Timestamp
Oracle 9.2 through 10.2 jdbc drivers causes java.util.Date to be mapped to java.sql.Date which does not include time information. Hence a direct comparison of a Java Date object to an Oracle Timestamp column like below will throw database errors
new java.sql.Date(.getTime() )
ORA-00932: inconsistent datatypes: expected TIMESTAMP got NUMBER
To be able to compare an Oracle Timestamp to a Java Date object, follow the steps below:
Step 1: Convert the Java Date to a String
Example:
Step 2: Use the String value in the SQL and use Oracle to_date() to convert it into Oracle date
Example:
Step 3: Cast the Oracle date into an Oracle timestamp
Example:
Reference
Oracle JDBC FAQ
new java.sql.Date(
ORA-00932: inconsistent datatypes: expected TIMESTAMP got NUMBER
To be able to compare an Oracle Timestamp to a Java Date object, follow the steps below:
Step 1: Convert the Java Date to a String
Example:
Step 2: Use the String value in the SQL and use Oracle to_date() to convert it into Oracle date
Example:
Step 3: Cast the Oracle date into an Oracle timestamp
Example:
Reference
Oracle JDBC FAQ
Java Snippets on Date and Timestamp
Creating a Java Timestamp object
Convert String to Date
Converting java.util.Date to java.sql.Date
The difference between both these date formats is the extra millisecond in java.sql.date
java.util.date ---------->dd mm yyyy hh:mm:ss
java.sql.date----------->dd mm yyyy hh:mm:ss: ms
The conversion is simple
Setting Date fields in the database using PreparedStatement
While setting a date using PreparedStatement, using setDate strips out the Time value. If you need the time field to be retained, use PreparedStatement setTime instead.
Getting Date fields from the database using ResultSet
While retrieving the Date value from the database, using getDate on ResultSet normally strips out the Time Value. To preserve the time fields, use ResultSet getTimeStamp instead.
Convert String to Date
Converting java.util.Date to java.sql.Date
The difference between both these date formats is the extra millisecond in java.sql.date
java.util.date ---------->dd mm yyyy hh:mm:ss
java.sql.date----------->dd mm yyyy hh:mm:ss: ms
The conversion is simple
Setting Date fields in the database using PreparedStatement
While setting a date using PreparedStatement, using setDate strips out the Time value. If you need the time field to be retained, use PreparedStatement setTime instead.
Getting Date fields from the database using ResultSet
While retrieving the Date value from the database, using getDate on ResultSet normally strips out the Time Value. To preserve the time fields, use ResultSet getTimeStamp instead.
CLOB's and Java Spring Framework
A CLOB (Character Large Object) is an Oracle data type that can hold up to 4 GB of data. CLOB's are handy for storing large text since a VARCHAR has a maximum limit of 4000-bytes.
Here is an example of how to create a table with a CLOB:
Some common operations on CLOB's:
Writing to a CLOB:
Substr:
Finding the length of a CLOB:
Some caveats when dealing with LOB's:
Dealing with CLOB’s in Java using Spring Framework:
The code below shows the basic parts of how to read and write to a CLOB using Spring 2.x.
There are also other options like using a MappingSqlQuery to read from the CLOB.
Reference:
Short examples from ASKTOM
Handling CLOBs - Made easy with Oracle JDBC
Here is an example of how to create a table with a CLOB:
Some common operations on CLOB's:
Writing to a CLOB:
Substr:
Finding the length of a CLOB:
Some caveats when dealing with LOB's:
- LOBs are not allowed in GROUP BY, ORDER BY, SELECT DISTINCT, aggregates and JOINS
- You cannot alter a table to add a CLOB column after creation. CLOB's have to be created at table creation time
- The Oracle SQL parser can only handle string literals up to 4000 characters in length
Dealing with CLOB’s in Java using Spring Framework:
The code below shows the basic parts of how to read and write to a CLOB using Spring 2.x.
There are also other options like using a MappingSqlQuery to read from the CLOB.
Reference:
Short examples from ASKTOM
Handling CLOBs - Made easy with Oracle JDBC
DbUnit Framework
DbUnit extends the popular JUnit test framework and is a useful and powerful tool for simplifying unit testing of database operation.
The core components of DBUnit are:
Step 1: Create your dataset load file:
One unit test database instance can be created per developer with no data. DbUnit has the ability to export and import your database data to and from XML datasets. If the testcases are setup correctly, the data required for testing is loaded from the XML files before testcase execution and deleted after it is complete. So there will be no need to cleanup afterwards.
Data may also be generated from a number of sources:
Step 2: Write your testcases
Create a wrapper class by extending the DBTestCase class. The wrapper class has the following important methods:
Step 3. Implement your testXXX() methods
Implement your test methods using Junit. Your database is now initialized before and cleaned-up after each test methods according to what you did in previous steps.
By extending the base class created above, we can create any number of DBUnit testcases . Here is an example of a complete DBUnit testcase for testing code used for executing log queries using Spring framework. The methods tested here are getLogInDateRange(), addLogEntry() and updateLogEntry().
Advantages of DBUnit:
References:
http://www.dbunit.org/index.html
The core components of DBUnit are:
- IDatabaseConnection
- DatabaseConnection - wraps a JDBC connection
- DatabaseDataSourceConnection - wraps a JDBC Data Source
- IDataSet This is the primary abstraction used by DbUnit to manipulate tabular data. Commonly used implementations are:
- FlatXmlDataSet
- DefaultDataSet
- XmlDataSet
- CompositeDataSet
- StreamingDataSet
- FilteredDataSet
- DatabaseDataSet
- XlsDataSet
- QueryDataSet
- ReplacementDataSet
- DatabaseOperation
- Operations (named "DatabaseOperation."xyz")
- UPDATE
- TRUNCATE
- INSERT
- REFRESH
- DELETE
- CLEAN_INSERT
- DELETE_ALL
- NONE
- Other Operations
- CompositeOperation
- TransactionOperation
- IdentityInsertOperation
- Create your dataset load file
- Write your testcases
- Implement your testXXX() methods
Step 1: Create your dataset load file:
One unit test database instance can be created per developer with no data. DbUnit has the ability to export and import your database data to and from XML datasets. If the testcases are setup correctly, the data required for testing is loaded from the XML files before testcase execution and deleted after it is complete. So there will be no need to cleanup afterwards.
Data may also be generated from a number of sources:
- flat XML files, of which we will see an example very shortly
- tables in the database
- database queries created using SQL
- less obvious sources, such as Microsoft Excel spreadsheets
Step 2: Write your testcases
Create a wrapper class by extending the DBTestCase class. The wrapper class has the following important methods:
- getDataSet (): you can use this to indicate how to load test data
- getSetUpOperation() : defines the operation to be executed before the test is executed
- getTearDownOperation() : defines clean up operation
Step 3. Implement your testXXX() methods
Implement your test methods using Junit. Your database is now initialized before and cleaned-up after each test methods according to what you did in previous steps.
By extending the base class created above, we can create any number of DBUnit testcases . Here is an example of a complete DBUnit testcase for testing code used for executing log queries using Spring framework. The methods tested here are getLogInDateRange(), addLogEntry() and updateLogEntry().
Advantages of DBUnit:
- It is an excellent way to avoid the myriad of problems that can occur when one test case corrupts the database and causes subsequent tests to fail or exacerbate the damage
- DbUnit can also help you to verify that your database data match an expected set of values and it has methods for comparing data, between flat files, queries and database tables
- Because the test data remains unchanged in an XML file, testcases can be executed repeatedly after iterative code changes to verify if the code didn't break.
References:
http://www.dbunit.org/index.html
JIRA and Crucible
Recently I've started using some new tools from Atlassian and they have some really cool features compared to others I've used so far.
JIRA is Atlassian's issue tracking system and these are some of the things I like about it:
1. A cool web-based UI
2. Plug-ins for IntelliJ IDEA that makes it so much easier to use
3. Workflow customizable to any project
4. A customizable dashboard with project reports and statistics
5. Multi-dimensional reporting - reports can be read in a variety of dimensions, including pie chart, linear or column graphs, and statistical tables
It was first launched in 2003 and so is relatively new compared to Bugzilla and other similar tools
Crucible is Atlassian's code review tool. It is a lightweight, web-based tool that makes code review a breeze.
For someone used to various code review tools, this seems to be the best and all developers will love it.
These are some of the cool things about Crucible:
1. The team doesn't have to meet together in one place. Code revies can be completed asynchronusly
2. Sub-tasks or issues can be created out of code review comments
3. JIRA integration
4. Inline comments makes it so much more easier to track the review and provide feedback
5. The diff tool is very useful and pleasant to use
JIRA is Atlassian's issue tracking system and these are some of the things I like about it:
1. A cool web-based UI
2. Plug-ins for IntelliJ IDEA that makes it so much easier to use
3. Workflow customizable to any project
4. A customizable dashboard with project reports and statistics
5. Multi-dimensional reporting - reports can be read in a variety of dimensions, including pie chart, linear or column graphs, and statistical tables
It was first launched in 2003 and so is relatively new compared to Bugzilla and other similar tools
Crucible is Atlassian's code review tool. It is a lightweight, web-based tool that makes code review a breeze.
For someone used to various code review tools, this seems to be the best and all developers will love it.
These are some of the cool things about Crucible:
1. The team doesn't have to meet together in one place. Code revies can be completed asynchronusly
2. Sub-tasks or issues can be created out of code review comments
3. JIRA integration
4. Inline comments makes it so much more easier to track the review and provide feedback
5. The diff tool is very useful and pleasant to use
The many uses of DBMS_RANDOM
DBA's are often faced with scenarios where they have to generate test data for developers.
Let's say you have a test database with no values in a certain field. Using DBMS_RANDOM, you can just generate them on your own.
Example: Refer the table structure in my previous post. Let's say I don't have any values for Medication. My code values for Medication are 4 digits and range from 0001 to 0100.
These are some ways I would add data for these columns: Let's say I now need to set a few of them to '-1', which is the code for missing data. Here is a great way to do it:
Pick any random value from some other field and use it to assign values to this one. What if you had to generate all the data by yourself? It's not a big deal. Here are some tips:
Generate Time with AM/PM: Random date: One digit value with range: A random string with specific length: Generate a random end date when start date is provided: Date of birth to Age calculation:
Example: Refer the table structure in my previous post. Let's say I don't have any values for Medication. My code values for Medication are 4 digits and range from 0001 to 0100.
These are some ways I would add data for these columns: Let's say I now need to set a few of them to '-1', which is the code for missing data. Here is a great way to do it:
Pick any random value from some other field and use it to assign values to this one. What if you had to generate all the data by yourself? It's not a big deal. Here are some tips:
Generate Time with AM/PM: Random date: One digit value with range: A random string with specific length: Generate a random end date when start date is provided: Date of birth to Age calculation:
Reverse crosstab query in Oracle
Most of us are familiar with cross-tabbing in queries. It's basically changing rows into columns.
If you have a requirement to do the reverse, do the following:
Let’s say this is the initial table structure:
Let's say you need to change this into the following structure: (Columns into Rows)
Here is the sql for reverse cross tabbing:
This would be the result:
CID | DATE_REPORTED | MED_CODE | MEDICATION | REPORTING_PROVIDER | SOURCE | MEDICATION_TYPE |
77518254 | 14.10.2009 | 0032 | ARTANE(TRIHEXYPHENIDYL)(ANTI-CHOLINERGICS) | 20 | MICP | MEDICATION 1 |
77518254 | 14.10.2009 | 0008 | LUVOX(FLUVOXAMINE)(ANTI-DEPRESSANTS) | 20 | MICP | MEDICATION 3 |
77518254 | 14.10.2009 | 0101 | MARPLAN(ISOCARBOXAZID)(ANTI-DEPRESSANTS) | 20 | MICP | MEDICATION 3 |
77518254 | 14.10.2009 | 0112 | MARPLAN(ISOCARBOXAZID)(ANTI-DEPRESSANTS) | 20 | MICP | MEDICATION 4 |
Enforcing database security policies
Here is a script that displays the attributes you’ll use most often for enforcing database security policies:
This view provides useful information that can be used in implementing your security policies. For example, the user’s IP address and authentication mode are available and can be used to govern what a user has access to. The Client Identifier can be set by an application and used to track and trace connected sessions.
Reference:
Effective Oracle Database 10g Security by Design by David Knox
Surrogate Keys
Surrogate keys are those that serve as the join fields between the fact and dimension tables in a dimension data warehouse. Surrogate keys are widely used in data warehouses instead of merely referencing the operational natural keys due to its advantages:
- Performance – using a 4-byte integer usually makes a great surrogate key because it can represent 232 values. This is compact and efficient compared to bulky alphanumeric natural keys used by the operational systems. This translates into better performance by using more efficient joins, smaller indices, and more fact rows per block.
- Space saving - As surrogate keys are only 4 bytes long, they occupy less space that the bulky alphanumeric production keys. For example the date data type occupies 8 bytes. So by using surrogate keys we save 4 bytes. If we have 1 billion records in the fact table, we would be saving 4x1billion bytes = 3.73 GB of space!!!!
- Buffer from operational key management practices – reuse of operational keys pose a problem using them in a data warehouse that stores historical data. Using a surrogate key creates a buffer from potential problems like these.
- Mapping to integrate disparate sources – Systems developed independently in company divisions may not use the same keys, or they may use keys that conflict with data in the systems of other divisions. This situation may not cause problems when each division independently reports summary data, but it cannot be permitted in the data warehouse where data is consolidated. Since data warehouses are populated from many different operational data sources, using surrogate key acts as an integration mechanism and the ETL process uses a mapping table to uniquely identify the linked natural keys.
- Handle unknown or not applicable conditions – there are cases when the data in the operational data source may violate certain rules followed by the data warehouse. For example, the operational key could have null values which may pose a problem since these are the foreign keys to the fact table which cannot have null values. Using a surrogate key prevents conditions like these.
- Track changes in dimension attribute values – Though dimension attribute values are more static than fact tables, they may still undergo changes. Using a surrogate key, we can generate new keys for each change and easily handle the versioning. This enables easier changes to attributes when business requirements/rules change
- Easier implementation - Surrogate keys enable easier implementations of concurrency than natural keys. In the natural key case, if two users are concurrently working with the same information set, such as a customer row, and one of the users modifies the natural key value, then an update by the second user will fail because the customer code they are updating no longer exists in the database. In the surrogate key case, the update will process successfully because immutable ID values are used to identify the rows in the database, not mutable customer codes.
A surrogate key may be meaningful or meaningless. A meaningful key stores information within the key. Meaningless keys are sometimes called system keys because they only have meaning to the system. The advantages of meaningful keys are that they are constructed for data clustering, reducing retrieval times. The disadvantages of meaningful keys are that they are inflexible and rely on a code structure that could break.
Subscribe to:
Posts (Atom)