Extremely big csv |
Denny |
2014-10-15 05:21:52 |
dear hxtt,
I have a CSV file whose size is about 4 GB.
Issuing "select * from csvfile limit 100" takes forever, consumes memory, and eventually crashes.
Is there any way to do this efficiently? I would also have to feed the whole file into another database, and will probably have to run complex queries (joins, subselects, aggregates).
please advise
regards
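For reference, the behaviour being asked for (reading only the first N rows without materialising the whole file) can be sketched in plain Java, independently of any driver; the file name, delimiter, and naive comma split here are purely illustrative:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class FirstRows {
    // Stream only the first `limit` data rows of a CSV; the rest of the
    // file is never read, so memory stays flat regardless of file size.
    static List<String[]> firstRows(Path csv, int limit) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader r = Files.newBufferedReader(csv)) {
            r.readLine();                          // skip the header line
            String line;
            while (rows.size() < limit && (line = r.readLine()) != null) {
                rows.add(line.split(","));         // naive split; ignores quoted fields
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".csv");
        Files.write(tmp, List.of("id,name", "1,a", "2,b", "3,c"));
        List<String[]> rows = firstRows(tmp, 2);
        System.out.println(rows.size());           // 2
        System.out.println(rows.get(0)[1]);        // a
        Files.delete(tmp);
    }
}
```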
|
Re:Extremely big csv |
HXTT Support |
2014-10-15 06:13:48 |
Try
jdbc:csv:/d:/test/text/?_CSV_Header=true;_CSV_Quoter=;readOnlyMode=true;
|
Re:Re:Extremely big csv |
Denny |
2014-10-15 10:06:29 |
tried these:
maxScanRows=-1;readOnlyMode=true;soloMode=true;
still no change
can you confirm how much time should be required to load 100 rows from a 4 GB CSV with 10 columns?
regards
|
Re:Re:Re:Extremely big csv |
HXTT Support |
2014-10-16 06:11:38 |
Checked. It has no relation to the connection properties.
On 2014-10-16 we fixed a load bug that waited for the thread to parse all CSV rows.
Please download the latest package.
|
Re:Re:Re:Re:Extremely big csv |
Denny |
2014-10-16 16:39:05 |
dear hxtt,
it is working very nicely,
thank you for the great support
regards
|
Re:Re:Re:Re:Re:Extremely big csv |
Denny |
2014-10-17 12:42:28 |
Hello,
I found a problem after the last update:
"select count(*) from any_table" always returns 1073741823
"select count(1) from any_table" : correct result
"select count(column1) from any_table" : correct result
can this be fixed?
regards
|
Re:Re:Re:Re:Re:Re:Extremely big csv |
Denny |
2014-10-17 14:25:01 |
dear hxtt,
more problems came up on some CSV files:
04:11:44,454 INFO [STDOUT] java.lang.NullPointerException
04:11:44,464 INFO [STDOUT] at com.hxtt.sql.text.b.if(Unknown Source)
04:11:44,471 INFO [STDOUT] at com.hxtt.sql.text.b.a(Unknown Source)
04:11:44,475 INFO [STDOUT] at com.hxtt.sql.text.b.run(Unknown Source)
04:11:44,477 INFO [STDOUT] at java.lang.Thread.run(Thread.java:662)
java.sql.SQLException: Failed to go Record 132001 of table customer_dst2:Inner Error: a corrupted CSV Index File.
at com.hxtt.global.SQLState.setNextException(Unknown Source)
at com.hxtt.sql.text.l.a(Unknown Source)
at com.hxtt.sql.a7.a(Unknown Source)
at com.hxtt.sql.da.a(Unknown Source)
at com.hxtt.sql.da.a(Unknown Source)
at com.hxtt.sql.fj.y(Unknown Source)
at com.hxtt.sql.d4.a(Unknown Source)
at com.hxtt.sql.d4.a(Unknown Source)
at com.hxtt.sql.d4.a(Unknown Source)
at com.hxtt.sql.b0.a(Unknown Source)
at com.hxtt.sql.b0.a(Unknown Source)
at com.hxtt.sql.b0.a(Unknown Source)
at com.hxtt.sql.am.a(Unknown Source)
at com.hxtt.sql.d6.execute(Unknown Source)
please advise
regards
|
Re:Re:Re:Re:Re:Re:Re:Extremely big csv |
HXTT Support |
2014-10-18 06:53:49 |
|
Supported. Please redownload it.
|
Re:Extremely big csv |
Denny |
2014-11-10 19:45:19 |
dear hxtt,
there are new problems with the last package:
1.
jdbc:CSV:ftp://user:***@127.0.0.1:2100?;_CSV_Separator=\u005E;csvfileExtension=txt;_CSV_Quoter=;{{{charSet=437}}};_CSV_EOL=\u000A
- select count(*) from table.csv from ftp: 400000 (correct result)
- select * from table.csv : hangs
stacktrace:
java.io.RandomAccessFile.readBytes(Native Method)
java.io.RandomAccessFile.read(RandomAccessFile.java:322)
com.hxtt.concurrent.z.d(Unknown Source)
com.hxtt.concurrent.r.do(Unknown Source)
com.hxtt.concurrent.o.a(Unknown Source)
com.hxtt.concurrent.v.d(Unknown Source)
com.hxtt.concurrent.r.do(Unknown Source)
com.hxtt.sql.text.b.a(Unknown Source)
com.hxtt.sql.text.l.a(Unknown Source)
com.hxtt.sql.a7.a(Unknown Source)
com.hxtt.sql.da.a(Unknown Source)
com.hxtt.sql.da.a(Unknown Source)
com.hxtt.sql.fj.y(Unknown Source)
com.hxtt.sql.d4.a(Unknown Source)
com.hxtt.sql.d4.a(Unknown Source)
com.hxtt.sql.dh.try(Unknown Source)
com.hxtt.sql.dh.next(Unknown Source)
2.
jdbc:CSV:///c:\afolder?;_CSV_Separator=\u005E;csvfileExtension=txt;_CSV_Quoter=\u0022;;_CSV_EOL=\u000A;
- select count(*) from table.csv from local path : 3312 (wrong)
- select * from table.csv : 3312 (wrong)
the file used for points 1 & 2 is the same file; the only difference is that
point 1 is accessed over ftp, while point 2 is accessed from a local path
please advise
regards
|
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv |
HXTT Support |
2014-11-11 04:52:07 |
|
Please email us your table.csv sample.
|
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv |
Denny |
2014-11-11 08:27:47 |
|
sample emailed to webmaster@hxtt.com
|
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv |
HXTT Support |
2014-11-11 20:57:38 |
>jdbc:CSV:ftp://user:***@127.0.0.1:2100?;_CSV_Separator=\u005E;csvfileExtension=txt;_CSV_Quoter=;{{{charSet=437}}};_CSV_EOL=\u000A
>- select count(*) from table.csv from ftp: 400000 (correct result)
>- select * from table.csv : hangs
It should be
;_CSV_Separator=\u005E;csvfileExtension=txt;_CSV_Quoter=;charSet=437;_CSV_EOL=\\u000A",
>- select * from table.csv : hangs
It can return 400000 for the local or http protocol. We haven't tested the ftp protocol.
>- select count(*) from table.csv from local path : 3312 (wrong)
>- select * from table.csv : 3312 (wrong)
Failed to parse row 3313: the quote character " can't exist in data that doesn't begin with the quote character.
|
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv |
Denny |
2014-11-11 21:06:08 |
>It should be
>;_CSV_Separator=\u005E;csvfileExtension=txt;_CSV_Quoter=;charSet=437;_CSV_EOL=\\u000A",
sorry, that special URL is actually processed again in my code to produce the correct URL; I pasted the wrong one
>- select * from table.csv : hangs
> It can return 400000 for local or http protocol. Hasn't tested ftp protocol.
then maybe the problem is in the ftp protocol, because the stack trace shows it hanging while reading a local file, when in fact the file was on ftp
>- select count(*) from table.csv from local path : 3312 (wrong)
>- select * from table.csv : 3312 (wrong)
>Failed to parse row 3313:quote character " can't exist in data, which doesn't begined with quote character.
is it possible for the driver to throw an exception with the error (not silently drop rows)?
|
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv |
HXTT Support |
2014-11-12 04:05:03 |
>then maybe the problem is in the ftp protocol, because in the stacktrace
> it hangs when reading local file, while infact the file was on ftp
It will fetch that file from ftp into memory/local temp directory.
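The general fetch-to-temp approach can be sketched in plain Java; note the driver's actual internal mechanism is not documented, so this is only an assumed illustration (the demo uses a file: URL so it runs without an ftp server):

```java
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class FetchToTemp {
    // Copy a remote resource (e.g. an ftp:// URL) to a local temp file,
    // so subsequent reads are local; roughly the assumed driver behaviour.
    static Path fetch(URL url) throws Exception {
        Path tmp = Files.createTempFile("csv", ".tmp");
        try (InputStream in = url.openStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }

    public static void main(String[] args) throws Exception {
        Path src = Files.createTempFile("src", ".csv");
        Files.writeString(src, "id,name\n1,a\n");
        Path copy = fetch(src.toUri().toURL());     // file: URL stands in for ftp:
        System.out.println(Files.readString(copy).contains("1,a"));  // true
        Files.delete(src);
        Files.delete(copy);
    }
}
```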
>is it possible for the driver to throw exception with the error (not silently drop rows) ?
As of 2014-11-12, CSV parse warnings can be fetched through Statement.getWarnings(). The prior version used ResultSet.getWarnings().
warnings = stmt.getWarnings();
if (warnings != null) {
    do {
        System.out.println(warnings.getMessage());
        System.out.println("Error Code: " + warnings.getErrorCode());
        System.out.println("SQL State: " + warnings.getSQLState());
    } while ((warnings = warnings.getNextWarning()) != null);
    stmt.clearWarnings();
}
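That traversal is standard JDBC warning chaining; a self-contained illustration with manually constructed SQLWarning objects (no driver needed, messages are made up):

```java
import java.sql.SQLWarning;

public class WarningChainDemo {
    public static void main(String[] args) {
        // Build a chain of two warnings, as a driver would via setNextWarning.
        SQLWarning first = new SQLWarning("bad quote in row 3313", "01000", 0);
        first.setNextWarning(new SQLWarning("row 3314 dropped", "01000", 0));

        // Walk the chain the same way stmt.getWarnings() is walked above.
        int count = 0;
        for (SQLWarning w = first; w != null; w = w.getNextWarning()) {
            System.out.println(w.getMessage() + " [" + w.getSQLState() + "]");
            count++;
        }
        System.out.println("warnings: " + count);   // warnings: 2
    }
}
```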
|
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv |
Denny |
2014-11-12 04:28:39 |
>then maybe the problem is in the ftp protocol, because in the stacktrace
> it hangs when reading local file, while infact the file was on ftp
> It will fetch that file from ftp into memory/local temp directory.
so there is no resolution to this ?
the old driver (jar size 1,324,231) has no problem (can fetch 400000) in few seconds
the new driver (jar size 1,449,148) hangs
both with the same url
jdbc:CSV:ftp://user:***@127.0.0.1:2100?;_CSV_Separator=\u005E;csvfileExtension=txt;_CSV_Quoter=;_CSV_EOL=\u000A
if this is the case then I have to revert to the old version, which will waste all the previous effort above
please advise
regards
|
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv |
HXTT Support |
2014-11-12 05:12:20 |
>the old driver (jar size 1,324,231) has no problem (can fetch 400000) in few seconds
>the new driver (jar size 1,449,148) hangs
For ftp protocol? I will check it.
|
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv |
HXTT Support |
2014-11-13 18:26:42 |
Checked. FTP can work normally.
|
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv |
Denny |
2014-11-13 18:36:17 |
have you tested with the sample I provided?
you have to fetch the rows, because it can retrieve about 20K rows before it hangs
it can also fetch small files normally
I have tested it over and over again and it always hangs at about 20K rows out of 400K
but when I switch to the old driver, it works OK
|
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv |
HXTT Support |
2014-11-13 18:47:44 |
>I have tested it over&over again and it always hang at about 20K rows out of 400K
I can fetch all 400000 rows through testText.java
|
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv |
HXTT Support |
2014-11-14 21:49:55 |
>>I have tested it over&over again and it always hang at about 20K rows out of 400K
Failed to reproduce your issue.
1st. Please download the latest package, and replace all old package files.
2nd. Retest your code to reproduce the issue. If it still exists, please email us your test code. I tested it through the sample code testText.java.
|
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv |
Denny |
2014-11-18 00:45:14 |
after some further testing, I have found that it was not a hang condition, but
instead a very slow fetching process at about 10 rows/second
it fetches the first ~25000 rows quickly in 2-3 seconds, then becomes very slow
(10 rows per second for the rest of the rows)
the ftp server is on the same machine, so this is not a connection issue
and this does not happen with the old driver (very fast, about 5000 rows/sec)
my test code:

import java.sql.*;

public class Test {
    public static void main(String[] args) throws Throwable {
        try {
            Driver d = (Driver) Class.forName("com.hxtt.sql.text.TextDriver").newInstance();
            Connection connection = d.connect("jdbc:CSV:ftp://user:user@192.168.48.9:2100?;_CSV_Separator=\\u005E;csvfileExtension=txt;_CSV_Quoter=;_CSV_EOL=\\u000A", null);
            try {
                PreparedStatement PS = connection.prepareStatement("select * from customer_dst2");
                try {
                    ResultSet RS = PS.executeQuery();
                    try {
                        int i = 0;
                        while (RS.next()) {
                            i++;
                            System.out.println(i + ". " + RS.getString(1));
                        }
                    } finally {
                        RS.close();
                    }
                } finally {
                    PS.close();
                }
            } finally {
                connection.close();
            }
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
}
|
Re:Re:Re:Re:Re:Re:Re:Re:Extremely big csv |
HXTT Support |
2014-11-19 20:19:45 |
|
Supported. Please download the latest package.
|