java


Java Reading large files into byte array chunk by chunk


So I've been trying to make a small program that reads a file into a byte array, then turns that byte array into hex, then binary. It will then play with the binary values (I haven't thought of what to do when I get to this stage) and save the result as a custom file.
I studied a lot of code online, and I can turn a file into a byte array and into hex, but I can't turn huge files into byte arrays (I run out of memory).
This is the code that is not a complete failure:
public void rundis(Path pp) {
    byte[] bb = null;
    try {
        bb = Files.readAllBytes(pp); // loads the whole file into memory at once
        System.out.println("byte array made");
    } catch (Exception e) {
        e.printStackTrace();
    }
    // the null check must come first, otherwise bb.length throws when the read failed
    if (bb != null && bb.length != 0) {
        System.out.println("byte array filled");
        // send to method to turn into hex
    } else {
        System.out.println("byte array NOT filled");
    }
}
I know how the process should go, but I don't know how to code that properly.
The process, if you are interested:
1. Input the file using File
2. Read the file chunk by chunk into a byte array, e.g. each chunk holds 600 bytes
3. Send that chunk to be turned into a hex value --> Integer.toHexString
4. Send that hex value chunk to be made into a binary value --> Integer.toBinaryString
5. Mess around with the binary value
6. Save to a custom file line by line
Problem: I don't know how to turn a huge file into a byte array chunk by chunk so it can be processed.
Any and all help will be appreciated, thank you for reading :)
To chunk your input, use a FileInputStream:
Path pp = FileSystems.getDefault().getPath("logs", "access.log");
final int BUFFER_SIZE = 1024 * 1024; // chunk size in bytes (1 MiB)
FileInputStream fis = new FileInputStream(pp.toFile());
byte[] buffer = new byte[BUFFER_SIZE];
int read = 0;
while ((read = fis.read(buffer)) > 0) {
    // buffer[0..read) holds the current chunk; call your other methods here...
}
fis.close();
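The same loop can also be written with try-with-resources, so the stream is closed even if processing a chunk throws. This is only a sketch: the class name ChunkReader, the method countBytes and the temporary file are made up for illustration, and the "processing" here just counts bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class ChunkReader {
    static final int BUFFER_SIZE = 1024 * 1024; // 1 MiB per chunk

    // Reads the file chunk by chunk and returns the total number of bytes seen.
    static long countBytes(Path path) throws IOException {
        long total = 0;
        byte[] buffer = new byte[BUFFER_SIZE];
        // the stream is closed automatically when the try block exits
        try (InputStream in = Files.newInputStream(path)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                // only buffer[0..read) holds valid data for this chunk
                total += read;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // hypothetical test input: a temporary 3,000,000-byte file
        Path tmp = Files.createTempFile("chunk", ".bin");
        try {
            Files.write(tmp, new byte[3_000_000]);
            System.out.println(countBytes(tmp)); // prints 3000000
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Only one buffer of BUFFER_SIZE bytes is alive at any time, so memory use does not grow with the file size.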
To stream a file, you need to step away from Files.readAllBytes(). It's a nice utility for small files but, as you noticed, not so much for large ones.
In pseudocode it would look something like this:
while there are more bytes available
read some bytes
process those bytes
(write the result back to a file, if needed)
In Java, you can use a FileInputStream to read a file byte by byte or chunk by chunk. Let's say we want to write back our processed bytes. First we open the files:
FileInputStream is = new FileInputStream(new File("input.txt"));
FileOutputStream os = new FileOutputStream(new File("output.txt"));
We need the FileOutputStream to write back our results - we don't want to just drop our precious processed data, right? Next we need a buffer which holds a chunk of bytes:
byte[] buf = new byte[4096];
How many bytes is up to you; I kinda like chunks of 4096 bytes. Then we need to actually read some bytes:
int read = is.read(buf);
This will read up to buf.length bytes and store them in buf. It returns the number of bytes actually read, or -1 once the end of the file is reached. Then we process the bytes:
//Assuming the processing function looks like this:
//byte[] process(byte[] data, int bytes);
byte[] ret = process(buf, read);
process() in the above example is your processing method. It takes a byte array and the number of bytes it should process, and returns the result as a byte array.
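The byte count matters because a read does not have to fill the buffer; the last chunk of a file is usually shorter. A small sketch of that behaviour, using an in-memory ByteArrayInputStream instead of a file (the class ReadCounts and the method readCounts are invented for this illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class ReadCounts {
    // Collects the return value of every read() call until end of stream.
    static List<Integer> readCounts(InputStream in, int bufferSize) throws IOException {
        List<Integer> counts = new ArrayList<>();
        byte[] buf = new byte[bufferSize];
        int read;
        while ((read = in.read(buf)) != -1) {
            counts.add(read);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // 10 bytes of input read through a 4-byte buffer
        InputStream in = new ByteArrayInputStream(new byte[10]);
        System.out.println(readCounts(in, 4)); // prints [4, 4, 2]
    }
}
```

After the final short read, the rest of the buffer still contains bytes from the previous chunk, so processing the whole array instead of the first `read` bytes would produce garbage at the end of the output.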
Last, we write the result back to a file:
os.write(ret);
We have to execute this in a loop until there are no bytes left in the file, so let's write a loop for it:
int read = 0;
while ((read = is.read(buf)) > 0) {
    byte[] ret = process(buf, read);
    os.write(ret);
}
Finally, we close the streams:
is.close();
os.close();
And that's it. We processed the file in 4096-byte chunks and wrote the result back to a file. What you do with the result is up to you: you could send it over TCP, or drop it if it's not needed, or read from TCP instead of a file. The basic logic is the same.
This still needs proper error handling for missing files or wrong permissions, but that's up to you to implement.
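One possible shape for that error handling, as a sketch only: it uses try-with-resources and the java.nio.file exception types, and the class ChunkedCopy, the method copy and the messages are hypothetical. The processing step is left as a plain copy here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

class ChunkedCopy {
    // Copies `in` to `out` in 4096-byte chunks and returns the bytes written.
    static long copy(Path in, Path out) throws IOException {
        long written = 0;
        byte[] buf = new byte[4096];
        // both streams are closed automatically, even when an exception is thrown
        try (InputStream is = Files.newInputStream(in);
             OutputStream os = Files.newOutputStream(out)) {
            int read;
            while ((read = is.read(buf)) != -1) {
                // placeholder: a real program would transform buf[0..read) here
                os.write(buf, 0, read);
                written += read;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: ChunkedCopy <input> <output>");
            return;
        }
        try {
            System.out.println(copy(Paths.get(args[0]), Paths.get(args[1])) + " bytes written");
        } catch (NoSuchFileException e) {
            System.err.println("No such file: " + e.getFile());
        } catch (AccessDeniedException e) {
            System.err.println("Permission denied: " + e.getFile());
        } catch (IOException e) {
            System.err.println("I/O error: " + e.getMessage());
        }
    }
}
```

The specific exceptions are caught before the general IOException because they are subclasses of it; the general handler covers everything else, such as a full disk.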
An example implementation for the process method:
// returns the hex representation of the bytes as ASCII characters
// requires: import java.nio.charset.StandardCharsets;
public static byte[] process(byte[] bytes, int length) {
    final byte[] hexchars = "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII);
    byte[] ret = new byte[length * 2]; // two hex digits per input byte
    for (int i = 0; i < length; ++i) {
        int b = bytes[i] & 0xFF; // treat the byte as unsigned
        ret[i * 2] = hexchars[b >>> 4]; // high nibble
        ret[i * 2 + 1] = hexchars[b & 0x0F]; // low nibble
    }
    return ret;
}
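The question also asks for a binary representation. A processing method for that could follow the same pattern, emitting eight ASCII '0'/'1' characters per input byte. This is a sketch under that assumption; the class BinaryText and the method toBinary are illustrative names, not part of the answer above.

```java
import java.nio.charset.StandardCharsets;

class BinaryText {
    // Returns the first `length` bytes as ASCII '0'/'1' characters,
    // eight per input byte, most significant bit first.
    static byte[] toBinary(byte[] bytes, int length) {
        byte[] ret = new byte[length * 8];
        for (int i = 0; i < length; i++) {
            int b = bytes[i] & 0xFF; // treat the byte as unsigned
            for (int bit = 0; bit < 8; bit++) {
                boolean set = ((b >> (7 - bit)) & 1) == 1;
                ret[i * 8 + bit] = (byte) (set ? '1' : '0');
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        byte[] chunk = {0x0F, (byte) 0xA5};
        byte[] bits = toBinary(chunk, chunk.length);
        System.out.println(new String(bits, StandardCharsets.US_ASCII));
        // prints 0000111110100101
    }
}
```

Working on the bytes directly avoids going through Integer.toBinaryString, which drops leading zeros and would need padding to keep every byte at eight digits.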