Chapter 2. Installation


  1. System Requirements
  2. Setting the CLASSPATH
  3. Loading the Driver
  4. Connecting to the Database

System Requirements

HXTT Excel packages include a Type 4 JDBC driver. Type 4 indicates that the driver is written in pure Java and communicates in the database system's own network protocol. Because of this, the driver is platform independent: once compiled, it can be used on any system. HXTT Excel can run on any platform with a Java VM, including Microsoft Windows, Novell NetWare, OS/2, UNIX, and Linux. HXTT Excel supports PersonalJava, JDK1.0.X, JDK1.1.X, JDK1.2.X, JDK1.3.X, JDK1.4.X, JDK1.5.X, JDK1.6.X, JDK1.7.X, JDK1.8.X, and JDK1.9.X. HXTT Excel includes a database engine that supports multi-user access. It supports { UNION | INTERSECT | EXCEPT | MINUS } [ ALL ] queries, INNER JOIN, FULL JOIN, LEFT JOIN, RIGHT JOIN, NATURAL JOIN, CROSS JOIN, and subqueries, including single-row subqueries, multirow subqueries, multiple-column subqueries, inline views, and correlated subqueries.

Setting the CLASSPATH

When Java loads a class, it searches a list known as the classpath. This is a list of directories where classes are placed, a list of jar files (archives containing classes and other resources), or both. The HXTT Excel driver is a Type 4 driver; to use it, its jar file must be on the classpath. You can do this in many different ways, but the most common are:

  1. Setting the CLASSPATH environment variable.
  2. Putting it on the command line using the -cp parameter.
  3. Placing it in the JVM's lib/ext directory (JDK 8 and earlier; the extension mechanism was removed in JDK 9).
  4. Extracting all files in the jar file into the directory of your application.

For detailed information, see "Setting the Classpath" in your JDK Tools and Utilities documentation. Take the JDBC 4.0 package as a simple example. To put Excel_JDBC40.jar into your classpath, use "export CLASSPATH=/usr/share/lib/Excel_JDBC40.jar:$CLASSPATH" on Solaris and Linux, and "SET CLASSPATH=\javalib\Excel_JDBC40.jar;%classpath%" on Windows.
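As a quick check, the sketch below looks for Excel_JDBC40.jar among the entries of the java.class.path system property. ClasspathCheck and containsJar are hypothetical names, not part of the HXTT package. The check only sees jars supplied through the CLASSPATH variable or -cp, so a jar placed in lib/ext or extracted into your application directory will not be reported.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical helper: reports whether a jar with the given file name
// appears as an entry on a classpath string.
final class ClasspathCheck {

    // Returns true if any classpath entry's file name equals jarName.
    static boolean containsJar(String classpath, String separator, String jarName) {
        if (classpath == null || classpath.isEmpty()) {
            return false;
        }
        for (String entry : classpath.split(Pattern.quote(separator))) {
            // Compare only the file-name part so absolute and relative entries both match.
            if (new File(entry.trim()).getName().equals(jarName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path");
        boolean found = containsJar(cp, File.pathSeparator, "Excel_JDBC40.jar");
        System.out.println("Excel_JDBC40.jar on classpath: " + found);
    }
}
```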

Loading the Driver

Any source file that uses JDBC needs to import the java.sql package with "import java.sql.*;".

The HXTT Excel driver's class name is com.hxtt.sql.excel.ExcelDriver. You can load it without hard-coding the driver into your code by setting the jdbc.drivers system property. For example, for command-line applications you can use:
java -Djdbc.drivers=com.hxtt.sql.excel.ExcelDriver yourApp
DriverManager then loads the listed drivers automatically when it initializes. Some applications (JBoss, Tomcat, etc.) support a .properties file for this setting, which saves you from putting it on the command line.
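The same property can also be set from code. The sketch below assumes it runs before anything else in the JVM uses DriverManager, because DriverManager reads jdbc.drivers only once, during its initialization. The property value is a colon-separated list of driver class names; DriversProperty and joinDrivers are hypothetical names.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class DriversProperty {

    // jdbc.drivers expects a colon-separated list of driver class names.
    static String joinDrivers(List<String> driverClassNames) {
        return String.join(":", driverClassNames);
    }

    public static void main(String[] args) {
        // Same effect as -Djdbc.drivers=..., provided DriverManager
        // has not been used yet in this JVM.
        System.setProperty("jdbc.drivers",
                joinDrivers(Arrays.asList("com.hxtt.sql.excel.ExcelDriver")));

        // Print the drivers DriverManager registered. A class that cannot be
        // found is skipped rather than reported as an exception.
        for (Driver d : Collections.list(DriverManager.getDrivers())) {
            System.out.println(d.getClass().getName());
        }
    }
}
```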

The second method is the most common and involves loading the driver yourself. It's simple:
Class.forName("com.hxtt.sql.excel.ExcelDriver");
From then on you can get connections from DriverManager.
Note: If Class.forName() throws ClassNotFoundException, you should check your classpath.
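A slightly more defensive version of the Class.forName() call follows. It is a sketch with a hypothetical helper, loadDriver, that turns a ClassNotFoundException into a hint about the classpath instead of a stack trace.

```java
final class DriverLoader {

    // Loads a class by name. For a JDBC driver, loading the class runs its
    // static initializer, which normally registers the driver with DriverManager.
    static boolean loadDriver(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            System.err.println("Driver class " + className
                    + " not found; check that the driver jar is on the classpath.");
            return false;
        }
    }

    public static void main(String[] args) {
        if (loadDriver("com.hxtt.sql.excel.ExcelDriver")) {
            System.out.println("HXTT Excel driver loaded.");
        }
    }
}
```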

Connecting to the Database

After the driver has been registered with the DriverManager, you can obtain a Connection instance that is connected to a particular database by calling DriverManager.getConnection(). With JDBC, a database is represented by a URL (Uniform Resource Locator).

        Embedded:
                jdbc:excel:[//]/[DatabasePath][?prop1=value1[;prop2=value2]] (the "//" characters can sometimes be omitted)
                        For example:
                                "jdbc:excel:/c:/data" on Windows
                                "jdbc:excel:///c:/data" on Windows
                                "jdbc:excel:////usr/data" on UNIX or Linux
                                "jdbc:excel://///" for a UNC path
        Remote Access (client/server mode):
                        For example: "jdbc:excel://" if an ExcelServer is running on port 3099 of
        Compressed Database: (.ZIP, .JAR, .GZ, .TAR, .BZ2, .TGZ, .TAR.GZ, .TAR.BZ2, .7z)
                The JDBC URL format is the same as for embedded and remote URLs.
                        For example:
        Memory-only Database:
        URL Database: (HTTP, HTTPS, FTP, and SFTP protocols)
                        For example:
                                "jdbc:excel:" //Note: The FTP site's user/password should be set in the FTP URL. It cannot be set through JDBC connection properties, because the user/password connection properties belong to server/client connections.
                                               //The FTP protocol supports explicit SSL/TLS encryption (FTPES) and implicit SSL/TLS (FTPS). FTPES is detected from the FTP server's reply, and FTPS is used if you specify port 990 in the FTP URL.
        SAMBA Database:(smb protocol)
                        For example:
                                "jdbc:excel:smb://test1:123@" //Note: The SAMBA user/password should be set in the SMB URL. It cannot be set through JDBC connection properties, because the user/password connection properties belong to server/client connections.
        UNC path JDBC URL:
                        For example:
        Free JDBC URL: (Warning: use it only for special projects)
                "jdbc:excel:/" or "jdbc:excel:///". Then you can use full UNC path names in SQL to access any location your Java VM has permission to access.
                        For instance:
                                select * from \\amd2500\e$\excelfiles\test;
                                select * from "\\amd2500\d$\excelfiles".test;
                                select * from ".".test;

         HXTT Excel supports seamless data mining across memory-only, physical, URL, compressed, and SAMBA tables in a single SQL statement. More details are in the Advanced Programming chapter.
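To avoid hand-assembling embedded URLs, a small helper can build them. This is a hypothetical sketch (ExcelUrl and embeddedUrl are illustrative names): it always includes the optional "//", which yields "jdbc:excel:///c:/data" and "jdbc:excel:////usr/data" as in the examples above. It appends properties as ?prop1=value1;prop2=value2 without any escaping, so it assumes property values contain no ';' or '?' characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class ExcelUrl {

    // Builds jdbc:excel:[//]/[DatabasePath] with the optional "//" always present,
    // followed by "?k1=v1;k2=v2" when properties are given (values are not escaped).
    static String embeddedUrl(String databasePath, Map<String, String> props) {
        String url = "jdbc:excel:///" + databasePath;
        if (props.isEmpty()) {
            return url;
        }
        StringJoiner query = new StringJoiner(";", "?", "");
        for (Map.Entry<String, String> e : props.entrySet()) {
            query.add(e.getKey() + "=" + e.getValue());
        }
        return url + query;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>(); // keeps insertion order
        props.put("firstRowHasNames", "true");
        props.put("maxScanRows", "0");
        // prints jdbc:excel:///c:/data?firstRowHasNames=true;maxScanRows=0
        System.out.println(embeddedUrl("c:/data", props));
    }
}
```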

To connect, you need to get a Connection instance from JDBC. To do this, you use the DriverManager.getConnection() method:

Connection con = DriverManager.getConnection(url, properties);

There are a few different signatures for the getConnection() method. You should see the API documentation that comes with your JDK for more specific information on how to use them. You can specify additional properties to the JDBC driver by placing them in a java.util.Properties instance and passing that instance to the DriverManager when you connect.
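Putting the pieces together, the sketch below passes a java.util.Properties instance to getConnection(). The property values are illustrative picks from the table that follows, and c:/data is the sample Windows path from the URL examples; adjust both for your own data. ConnectExample and buildProperties are hypothetical names. try-with-resources closes the Connection automatically.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

final class ConnectExample {

    // Collects a few connection properties; the chosen values are illustrative.
    static Properties buildProperties() {
        Properties props = new Properties();
        props.setProperty("firstRowHasNames", "true");
        props.setProperty("maxScanRows", "0");          // 0 = scan the entire file for column types
        props.setProperty("dateFormat", "yyyy-MM-dd");
        return props;
    }

    public static void main(String[] args) {
        String url = "jdbc:excel:///c:/data";
        // The Connection is closed automatically when the try block exits.
        try (Connection con = DriverManager.getConnection(url, buildProperties())) {
            DatabaseMetaData meta = con.getMetaData();
            System.out.println("Connected via " + meta.getDriverName());
        } catch (SQLException e) {
            System.err.println("Connection failed: " + e.getMessage());
        }
    }
}
```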

Property Name | Description | Default Value
versionNumber | Excel version number. You can use null, "BIFF7" (MS Excel 95), "BIFF8" (MS Excel 97/98/2000/2001/XP/2002/2003/2004), "XLSX" (MS Excel 2007/2010/2013/2016), or "Strict" (Strict Open XML format). This parameter is used only for CREATE DATABASE. | BIFF8
otherExtension | Indicates whether the Excel driver supports extensions other than 'xls', 'xlsx', and 'xlsm'. Use commas to assign more than one extension, for instance, otherExtension=DB,ACR. | null
firstRowHasNames | Indicates whether the first row of worksheet data contains column names. If false, there is no table header recognition. If true, the table header is recognized from the first row. If you use an int value n, row n of the worksheet data is used for column names, and data rows start from row n+1 (or from the row given by the dataStartRow property). It supports complicated table headers spanning more than one row for ETL: headers and sub-headers can be used as column names, and parallel tables can be recognized. |
endRowHasNames | Indicates the last row of worksheet data that contains column names. By default it equals firstRowHasNames. The driver automatically examines the rows from row firstRowHasNames to row endRowHasNames to decide whether there are parent columns. | null
dataStartRow | Indicates the first data row of the worksheet. Use it only when there are comment rows between the table header and the data rows. Without it, data rows start at the first row (if there is no table header) or at row firstRowHasNames+1 (if there is a table header at row firstRowHasNames). | null
dataEndTag | Sometimes a spreadsheet has comment rows after the table body. dataEndTag defines a dynamic-range table: HXTT Excel ignores the following rows once it meets the dataEndTag value. | null
maxScanRows | Indicates how many rows should be scanned when determining column types. If you set maxScanRows to 0, the entire file is scanned. If you set it to a negative value, the file won't be scanned. | 10
withFormat | Indicates whether number values (for instance, currency) are returned with their formatting. | false
cloneMergedCellValues | Indicates whether to clone values into all merged cells. | false
readOnlyMode | Indicates whether to use read-only mode for speed optimization. Read-only mode means multi-user read mode and skips formula parsing. If it is true, HXTT Excel calculates values according to the related formulas when it updates a cell. | true
emptiesAreVisible | Indicates whether Excel's ResultSet includes empty rows. | true
tmpdir | Indicates a temp directory. Default: the value of the JVM's "" property. If that value is incorrect, the directory of the JDBC URL is used. _memory_ means large data is kept in memory. | null
delayedClose | Indicates the delay in seconds before a transaction is closed. This option is used to avoid frequent close/open table operations for subsequent SQL statements. Automatic temporary indexes are disabled when delayedClose<=60s. You can use 0~120 seconds. Default: 3. | null
refreshInterval | Specifies a refresh interval in seconds for FTP/SFTP/HTTP/HTTPS database files, which determines how long the content cache is kept before it is discarded. | 60
lockTimeout | Specifies the HXTT Excel driver's timeout in milliseconds to wait until other processes or applications release a record lock or table lock. 0 means a default value, and <0 means no wait. In server/client mode, the remote client connection also uses this parameter (default value: 30000 ms) to wait for a response from the server side. | 1000
maxIdleTime | Indicates the maximum idle time in minutes for a remote connection. This option is mainly used to avoid automatically closing idle remote connections in a connection pool. Embedded idle connections won't be closed automatically except by garbage collection. You can use 1~1440 minutes. Default: 30. | null
soTimeout | Enables/disables a socket read timeout with the specified timeout, in milliseconds. With this option set to a non-zero timeout, a read() call on the InputStream associated with the Socket blocks for only this amount of time. If the timeout expires, an exception is raised, though the Socket is still valid. The option must be enabled prior to entering the blocking operation to have effect. The timeout must be > 0. A timeout of zero is interpreted as an infinite timeout. | 1000
charSet | Specifies a character encoding scheme other than the client default. You can find the list of supported encodings at file:///c|/jdk1.2/docs/guide/internat/encoding.doc.html. Cp895 (Czech MS-DOS 895), Cp620 (Polish MS-DOS 620), and Mazovia are additionally supported although the JVM doesn't support them. | null
ignoreCompressionFile | Indicates whether all tables in compressed files can be listed. You can use null, true, or false. | true
ODBCTrimBehavior | Indicates whether to work like the MS Access ODBC driver and ignore trailing space characters in condition expressions. You can use null, true, or false. | false
caseInsensitive | Indicates whether to work like the MS Access ODBC driver and be case-insensitive for string comparison. You can use null, true, or false. | false
emptyDecimalAsZero | Indicates whether empty decimals are returned as zero values. You can use null, true, or false. | false
emptyStringAsNull | Indicates whether empty strings are returned as null values. You can use null, true, or false. | true
dateFormat | Specifies the default parse pattern for dates (default: 'yyyy-MM-dd'). Date and time patterns follow the Java java.text.SimpleDateFormat standard. You can use __or__ to define more than one format for ETL data type detection, but only the first format is used for output. | yyyy-MM-dd
timeFormat | Specifies the default parse pattern for times (default: 'hh:mm:ss'). Date and time patterns follow the Java java.text.SimpleDateFormat standard. You can use __or__ to define more than one format for ETL data type detection, but only the first format is used for output. | hh:mm:ss
timestampFormat | Specifies the default parse pattern for timestamps (default: 'yyyy-MM-dd hh:mm:ss'). Date and time patterns follow the Java java.text.SimpleDateFormat standard. You can use __or__ to define more than one format for ETL data type detection, but only the first format is used for output. | yyyy-MM-dd hh:mm:ss
decimalFormat | Specifies the default parse pattern for decimal numbers. You can use __or__ to define more than one format for ETL data type detection, but only the first format is used for output. | null
decimalSeparator | Specifies the default character for the decimal sign, which differs for French and other locales. | null
groupingSeparator | Specifies the default character for the thousands separator, which differs for French and other locales. | null
timezone | Specifies a default time zone for calendars. "local" means use the local calendar. | null
focusValue | When extracting XML/JSON data into a relational table, if set to true, the driver fetches the parsed SQL value rather than the XML element or JSON object. | true
stuffAutomatically | When extracting XML/JSON data into a relational table, if set to true, the driver can split one complicated element/object into multiple data rows. If set to false, it still flattens XML/JSON into an SQL table, but won't add data rows by stuffing automatically. | true
maxStuffColumnCount | When stuffAutomatically=true, this value limits the maximum number of expanded child columns. Note that one stuffed column can own many child columns. | 2
maxStuffLevel | Limits the maximum level of expanded child columns. For instance, the `SubImageList/SubImageInfoObject/ImageID` column is shown for maxStuffLevel=3. The `SubImageList/SubImageInfoObject/` column is shown for maxStuffLevel=2, and you can use `SubImageList/SubImageInfoObject/`->'ImageID' to reference the value of ImageID. | 2
hyphen4name | When extracting XML/JSON data into a relational table, the driver can automatically convert a sub-element/object into an SQL column. The default column name is ParentName_ChildName when '_' is used as the separator. For instance, "name": {"first": "John", "last": "Doe"} becomes two columns, name_first and name_last. If you use null or an empty string, the column name becomes ParentNameChildName, and you can then use select namefirst,namelast from yourJsonTable. | _
hyphenInColumName | '_', '-', and ' ' (space) can occur in column names. You can set hyphenInColumName=_ (or another string) to convert those special characters (_, -, space) so that database tools can read them. | null
maxCacheSize | Indicates the maximum memory use per table for automatic temporary indexes or matched-result caches. You can use 16~65536 kilobytes. Default: 1024. | null
host | The remote host on which an ExcelServer is running | null
port | The port on which an ExcelServer is listening | null
serverType | The type of ExcelServer on the remote host | null
user | The user to connect as | null
password | The password to use when connecting | null
cryptType | Specifies a crypt type for Table Encryption and Column Level Encryption. All new tables created in this connection will be encrypted. You can currently use DES, TRIDES, BLOWFISH, and AES. | null
cryptKey | Specifies a crypt key. Without an encryption key, CREATE TABLE won't create an encrypted table. | null
storeCryptKey | Indicates whether the crypt key is stored in the encrypted table. If it is stored, the encrypted table can be opened automatically in any connection without predefined crypt properties. If it is not stored, the encrypted table can only be opened with the correct key, and no one, including us, can help you recover your data without it. | false
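The __or__ syntax of dateFormat, timeFormat, and timestampFormat can be pictured with the sketch below. It is not HXTT's code: it is a hypothetical illustration (MultiFormat and normalize are invented names), built on java.text.SimpleDateFormat, of the documented rule that every listed pattern is tried when detecting a value while only the first pattern is used for output.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Pattern;

// Illustration only: tries each "__or__"-separated pattern in order and
// formats the first successful parse with the first pattern.
final class MultiFormat {

    static String normalize(String value, String formatSpec) {
        String[] patterns = formatSpec.split(Pattern.quote("__or__"));
        SimpleDateFormat output = new SimpleDateFormat(patterns[0]);
        for (String p : patterns) {
            SimpleDateFormat in = new SimpleDateFormat(p);
            in.setLenient(false);
            ParsePosition pos = new ParsePosition(0);
            Date d = in.parse(value, pos);
            // Accept the match only if the whole string was consumed.
            if (d != null && pos.getIndex() == value.length()) {
                return output.format(d);
            }
        }
        return null; // no listed pattern matched
    }
}
```

For example, normalize("2019/03/05", "yyyy-MM-dd__or__yyyy/MM/dd") is detected by the second pattern and returned as "2019-03-05".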

If your code then tries to open a Connection and a "No driver available" SQLException is thrown, the cause is probably that the driver is not on the classpath or that the JDBC URL is incorrect.
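One way to tell those two causes apart is sketched below; ConnectionDiagnosis and diagnose are hypothetical names. The helper first checks whether the driver class can be loaded, then calls DriverManager.getDriver(), which throws an SQLException when no registered driver accepts the URL.

```java
import java.sql.DriverManager;
import java.sql.SQLException;

final class ConnectionDiagnosis {

    // Distinguishes a missing driver class from a URL no registered driver accepts.
    static String diagnose(String driverClass, String url) {
        try {
            Class.forName(driverClass);
        } catch (ClassNotFoundException e) {
            return "Driver class not found: check the classpath.";
        }
        try {
            // getDriver throws SQLException if no registered driver accepts the URL.
            DriverManager.getDriver(url);
            return "A registered driver accepts this URL.";
        } catch (SQLException e) {
            return "No registered driver accepts this URL: check the JDBC URL.";
        }
    }

    public static void main(String[] args) {
        System.out.println(diagnose("com.hxtt.sql.excel.ExcelDriver", "jdbc:excel:///c:/data"));
    }
}
```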

To close the database connection, simply call the close() method on the Connection:

con.close();

Copyright © 2003-2019 Heng Xing Tian Tai Lab | All Rights Reserved. |