Main   Products   Offshore Outsourcing   Customers   Partners   ContactUs  
JDBC Databases
  HXTT Access v5.2
  HXTT Cobol v2.1
  HXTT DBF v5.2
  HXTT Excel v4.2
  HXTT Paradox v5.2
  HXTT Text(CSV) v5.2
 
  Buy Now
  Support
  Download
  Document
  FAQ
  HXTT XML v1.2
Offshore Outsourcing
Oracle Data Import/Export
DB2 Data Import/Export
Sybase Data Import/Export
Free Resources
  Firewall Tunneling
  Search Indexing Robot
  Conditional Compilation
  Password Recovery for MS Access
  Password Recovery for Corel Paradox
  Checksum Tool for MD5
  Character Set Converter
  Pyramid - Poker of ZYH
   
   
   
Hongxin Technology & Trade Ltd. of Xiangtan City (abbr, HXTT)

HXTT Text(CSV)
Extremely big csv
Denny
2014-10-15 05:21:52.0
dear hxtt,

i have a csv files which size is about 4Gb

issuing "select * from csvfile limit 100" will last forever, consume memory & eventually crash

is there any way to do this efficiently .. i would have to take the whole file also to feed to another database, and probably have to run complex query (joins, subselect, aggregate)

please advice

regards
Re:Extremely big csv
HXTT Support
2014-10-15 06:13:48.0
Try
jdbc:csv:/d:/test/text/?_CSV_Header=true;_CSV_Quoter=;readOnlyMode=true;
Re:Re:Extremely big csv
Denny
2014-10-15 10:06:29.0
tried these:

maxScanRows=-1;readOnlyMode=true;soloMode=true;

still no change

can you confirm how much time it should be required to load 100 rows from 4gb csv with 10 columns ?

regards
Re:Re:Re:Extremely big csv
HXTT Support
2014-10-16 06:11:38.0
Checked. No relation with the connection property.
2014-10-16 fixed a load bug which waiting the thread to parse all CSV rows.
Please download the latest package.
Re:Re:Re:Re:Extremely big csv
Denny
2014-10-16 16:39:05.0
dear hxtt,

working very nice,
thank you for great support

regards
Re:Re:Re:Re:Re:Extremely big csv
Denny
2014-10-17 12:42:28.0
Hello,

i found a problem after the last update..

"select count(*) from any_table" always returns 1073741823
"select count(1) from any_table" : correct result
"select count(column1) from any_table" : correct result

can this be fixed ?

regards
Re:Re:Re:Re:Re:Re:Extremely big csv
Denny
2014-10-17 14:25:01.0
dear hxtt,

more problems coming one some csv file:

04:11:44,454 INFO [STDOUT] java.lang.NullPointerException
04:11:44,464 INFO [STDOUT] at com.hxtt.sql.text.b.if(Unknown Source)
04:11:44,471 INFO [STDOUT] at com.hxtt.sql.text.b.a(Unknown Source)
04:11:44,475 INFO [STDOUT] at com.hxtt.sql.text.b.run(Unknown Source)
04:11:44,477 INFO [STDOUT] at java.lang.Thread.run(Thread.java:662)

java.sql.SQLException: Failed to go Record 132001 of table customer_dst2:Inner Error: a corrupted CSV Index File.
at com.hxtt.global.SQLState.setNextException(Unknown Source)
at com.hxtt.sql.text.l.a(Unknown Source)
at com.hxtt.sql.a7.a(Unknown Source)
at com.hxtt.sql.da.a(Unknown Source)
at com.hxtt.sql.da.a(Unknown Source)
at com.hxtt.sql.fj.y(Unknown Source)
at com.hxtt.sql.d4.a(Unknown Source)
at com.hxtt.sql.d4.a(Unknown Source)
at com.hxtt.sql.d4.a(Unknown Source)
at com.hxtt.sql.b0.a(Unknown Source)
at com.hxtt.sql.b0.a(Unknown Source)
at com.hxtt.sql.b0.a(Unknown Source)
at com.hxtt.sql.am.a(Unknown Source)
at com.hxtt.sql.d6.execute(Unknown Source)


please advice

regards
Re:Re:Re:Re:Re:Re:Re:Extremely big csv
HXTT Support
2014-10-18 06:53:49.0
Supported. Please redownload it.
Re:Extremely big csv
Denny
2014-11-10 19:45:19.0
dear hxtt,

there is new problems from the last package:

1.
jdbc:CSV:ftp://user:***@127.0.0.1:2100?;_CSV_Separator=\u005E;csvfileExtension=txt;_CSV_Quoter=;{{{charSet=437}}};_CSV_EOL=\u000A
- select count(*) from table.csv from ftp: 400000 (correct result)
- select * from table.csv : hangs
stacktrace:
java.io.RandomAccessFile.readBytes(Native Method)
java.io.RandomAccessFile.read(RandomAccessFile.java:322)
com.hxtt.concurrent.z.d(Unknown Source)
com.hxtt.concurrent.r.do(Unknown Source)
com.hxtt.concurrent.o.a(Unknown Source)
com.hxtt.concurrent.v.d(Unknown Source)
com.hxtt.concurrent.r.do(Unknown Source)
com.hxtt.sql.text.b.a(Unknown Source)
com.hxtt.sql.text.l.a(Unknown Source)
com.hxtt.sql.a7.a(Unknown Source)
com.hxtt.sql.da.a(Unknown Source)
com.hxtt.sql.da.a(Unknown Source)
com.hxtt.sql.fj.y(Unknown Source)
com.hxtt.sql.d4.a(Unknown Source)
com.hxtt.sql.d4.a(Unknown Source)
com.hxtt.sql.dh.try(Unknown Source)
com.hxtt.sql.dh.next(Unknown Source)


2.
jdbc:CSV:///c:\afolder?;_CSV_Separator=\u005E;csvfileExtension=txt;_CSV_Quoter=\u0022;;_CSV_EOL=\u000A;
- select count(*) from table.csv from local path : 3312 (wrong)
- select * from table.csv : 3312 (wrong)

the file used for point 1 & 2 is the same file, the difference is
point 1 is accessed from ftp, point 2 is accessed from local path


please advice

regards



Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv
HXTT Support
2014-11-11 04:52:07.0
Please email us your table.csv sample.
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv
Denny
2014-11-11 08:27:47.0
sample emailed to webmaster@hxtt.com
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv
HXTT Support
2014-11-11 20:57:38.0
>jdbc:CSV:ftp://user:***@127.0.0.1:2100?;_CSV_Separator=\u005E;csvfileExtension=txt;_CSV_Quoter=;{{{charSet=437}}};_CSV_EOL=\u000A
>- select count(*) from table.csv from ftp: 400000 (correct result)
>- select * from table.csv : hangs
It should be
;_CSV_Separator=\u005E;csvfileExtension=txt;_CSV_Quoter=;charSet=437;_CSV_EOL=\\u000A",

>- select * from table.csv : hangs
It can return 400000 for local or http protocol. Hasn't tested ftp protocol.

>- select count(*) from table.csv from local path : 3312 (wrong)
>- select * from table.csv : 3312 (wrong)
Failed to parse row 3313:quote character " can't exist in data, which doesn't begined with quote character.

Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv
Denny
2014-11-11 21:06:08.0
>It should be
>;_CSV_Separator=\u005E;csvfileExtension=txt;_CSV_Quoter=;charSet=437;_CSV_EOL=\\u000A",

sorry, that special URL is actually processed again in my code to produce correct url, i'm pasting the wrong url

>- select * from table.csv : hangs
> It can return 400000 for local or http protocol. Hasn't tested ftp protocol.

then maybe the problem is in the ftp protocol, because in the stacktrace it hangs when reading local file, while infact the file was on ftp

>- select count(*) from table.csv from local path : 3312 (wrong)
>- select * from table.csv : 3312 (wrong)
>Failed to parse row 3313:quote character " can't exist in data, which doesn't >begined with quote character.

is it possible for the driver to throw exception with the error (not silently drop rows) ?

Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv
HXTT Support
2014-11-12 04:05:03.0
>then maybe the problem is in the ftp protocol, because in the stacktrace
> it hangs when reading local file, while infact the file was on ftp
It will fetch that file from ftp into memory/local temp directory.

>is it possible for the driver to throw exception with the error (not silently drop rows) ?
2014-11-12 CSV parse warning can be fetched through Statement.getWarings(). The prior version used ResultSet.getWarning().
warnings=stmt.getWarnings();
if(warnings!=null){
do{
System.out.println(warnings.getMessage());
System.out.println("Error Code:"+warnings.getErrorCode());
System.out.println("SQL State:"+warnings.getSQLState());
}while((warnings=warnings.getNextWarning())!=null);
stmt.clearWarnings();
}


Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv
Denny
2014-11-12 04:28:39.0
>then maybe the problem is in the ftp protocol, because in the stacktrace
> it hangs when reading local file, while infact the file was on ftp
> It will fetch that file from ftp into memory/local temp directory.

so there is no resolution to this ?

the old driver (jar size 1,324,231) has no problem (can fetch 400000) in few seconds
the new driver (jar size 1,449,148) hangs

both with the same url
jdbc:CSV:ftp://user:***@127.0.0.1:2100?;_CSV_Separator=\u005E;csvfileExtension=txt;_CSV_Quoter=;_CSV_EOL=\u000A

if this is the case then i have to revert to old version, which will waste all previous effort above

please advice

regards
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv
HXTT Support
2014-11-12 05:12:20.0
>the old driver (jar size 1,324,231) has no problem (can fetch 400000) in few seconds
>the new driver (jar size 1,449,148) hangs
For ftp protocol? I will check it.
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv
HXTT Support
2014-11-13 18:26:42.0
Checked. Ftp can work normal.
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv
Denny
2014-11-13 18:36:17.0
have you tested with the sample I provide ?

you have to fetch the rows, because it can retrieve about 20K rows before hang

it also can fetch small files normally

I have tested it over&over again and it always hang at about 20K rows out of 400K

but when I switch to old driver, it works ok
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv
HXTT Support
2014-11-13 18:47:44.0
>I have tested it over&over again and it always hang at about 20K rows out of 400K
I can fetch all 400000 rows through testText.java
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv
HXTT Support
2014-11-14 21:49:55.0
>>I have tested it over&over again and it always hang at about 20K rows out of 400K
Failed to recur your issue.
1st. Please download the latest package, and replace all old package files.
2nd. Retest your code to recur your issue. If it exists still, please email us your testCode. I tested it through the sample code testText.java .
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv
Denny
2014-11-18 00:45:14.0
after some further testing, i have found that it was not a hang condition, but
instead a very slow fetching process at about 10 rows/seconds

it can fetch fast at around first 25000 rows for 2-3 seconds, then become very slow
(10 rows per second for the rest of rows)

the ftp server is on the same machine, so this is not a connection issue

and this is does not happen with the old driver (very fast about 5000 row/sec)

my test code:

import java.sql.*;

public class Test {
public static void main(String [] args) throws Throwable{
try {
Driver d = (Driver)Class.forName("com.hxtt.sql.text.TextDriver").newInstance();

Connection connection = d.connect("jdbc:CSV:ftp://user:user@192.168.48.9:2100?;_CSV_Separator=\\u005E;csvfileExtension=txt;_CSV_Quoter=;_CSV_EOL=\\u000A",null);

try {
PreparedStatement PS = connection.prepareStatement("select * from customer_dst2");

try {
ResultSet RS = PS.executeQuery();

try {
int i=0;

while (RS.next()) {
i++;
System.out.println(i+". "+RS.getString(1));
}
} finally {
RS.close();
}
} finally {
PS.close();
}
} finally {
connection.close();
}
} catch (Throwable e) {
e.printStackTrace();
}
}
}
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv
HXTT Support
2014-11-19 20:19:45.0
Supported. Please download the latest package.

Search Key   Search by Last 50 Questions




Google
 

Address: 9 Station Rd., Xiangtan City, Hunan Province, P.R. China
Postcode: 411100
Phone: (86)731-58225727
Fax: (86)731-58225727
Email: webmaster@hxtt.com
Copyright © 1999-2011 Hongxin Technology & Trade Ltd. | All Rights Reserved. | Privacy | Legal | Sitemap