
Pack header and data layout in one byte array using ByteBuffer in an efficient way?


I have a header and data that I need to represent in one byte array. The header has its own packing format, and the data has a different one. Once I have both, I need to combine them into one final byte array.
Below is the layout as it is defined in C++, which I have to reproduce in Java.
// below is my header offsets layout
// addressedCenter must be the first byte
static constexpr uint32_t addressedCenter = 0;
static constexpr uint32_t version = addressedCenter + 1;
static constexpr uint32_t numberOfRecords = version + 1;
static constexpr uint32_t bufferUsed = numberOfRecords + sizeof(uint32_t);
static constexpr uint32_t location = bufferUsed + sizeof(uint32_t);
static constexpr uint32_t locationFrom = location + sizeof(CustomerAddress);
static constexpr uint32_t locationOrigin = locationFrom + sizeof(CustomerAddress);
static constexpr uint32_t partition = locationOrigin + sizeof(CustomerAddress);
static constexpr uint32_t copy = partition + 1;
// this is the full size of the header
static constexpr uint32_t headerOffset = copy + 1;
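Mirroring these offsets as Java constants makes the 36-byte header size explicit instead of a magic number. This is a minimal sketch; the class and constant names are hypothetical, and it assumes CustomerAddress is 8 bytes (a uint64_t) and the counts are 4-byte uint32_t values.

```java
// Hypothetical Java mirror of the C++ header offsets above.
public final class HeaderLayout {
    public static final int ADDRESSED_CENTER = 0;                           // 0
    public static final int VERSION = ADDRESSED_CENTER + 1;                 // 1
    public static final int NUMBER_OF_RECORDS = VERSION + 1;                // 2
    public static final int BUFFER_USED = NUMBER_OF_RECORDS + Integer.BYTES; // 6
    public static final int LOCATION = BUFFER_USED + Integer.BYTES;         // 10
    public static final int LOCATION_FROM = LOCATION + Long.BYTES;          // 18
    public static final int LOCATION_ORIGIN = LOCATION_FROM + Long.BYTES;   // 26
    public static final int PARTITION = LOCATION_ORIGIN + Long.BYTES;       // 34
    public static final int COPY = PARTITION + 1;                           // 35
    // Full header size, the equivalent of headerOffset.
    public static final int HEADER_SIZE = COPY + 1;                         // 36

    private HeaderLayout() {}
}
```

With absolute offsets you can also read a field back without parsing everything before it, e.g. `ByteBuffer.wrap(packed).getInt(HeaderLayout.BUFFER_USED)`.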
CustomerAddress is a typedef for uint64_t, and it is built like this:
typedef uint64_t CustomerAddress;
void client_data(uint8_t datacenter,
uint16_t clientId,
uint8_t dataId,
uint32_t dataCounter,
CustomerAddress& customerAddress)
{
customerAddress = (uint64_t(datacenter) << 56)
+ (uint64_t(clientId) << 40)
+ (uint64_t(dataId) << 32)
+ dataCounter;
}
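Java has no unsigned byte, short or int, so a direct port of client_data has to mask each field; otherwise a negative value sign-extends and overwrites the fields above it. Below is a sketch of a pack/unpack pair under that assumption; the class and method names are hypothetical.

```java
// Hypothetical codec for the CustomerAddress bit layout:
// datacenter (8 bits) | clientId (16 bits) | dataId (8 bits) | dataCounter (32 bits)
public final class CustomerAddressCodec {
    // Masks emulate C++'s uint8_t/uint16_t/uint32_t so nothing sign-extends.
    public static long pack(byte datacenter, short clientId, byte dataId, int dataCounter) {
        return ((datacenter & 0xFFL) << 56)
             | ((clientId & 0xFFFFL) << 40)
             | ((dataId & 0xFFL) << 32)
             | (dataCounter & 0xFFFFFFFFL);
    }

    // Decoders return the unsigned value of each field.
    public static int datacenter(long address) { return (int) (address >>> 56); }
    public static int clientId(long address)   { return (int) ((address >>> 40) & 0xFFFF); }
    public static int dataId(long address)     { return (int) ((address >>> 32) & 0xFF); }
    public static long dataCounter(long address) { return address & 0xFFFFFFFFL; }

    private CustomerAddressCodec() {}
}
```

For example, `pack((byte) 0, (short) 0, (byte) 0, -1)` yields `0xFFFFFFFFL`, whereas an unmasked `(long) -1` would set all 64 bits.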
And below is my data layout -
// below is my data layout -
//
// key type - 1 byte
// key len - 1 byte
// key (variable size = key_len)
// timestamp (sizeof uint64_t)
// data size (sizeof uint16_t)
// data (variable size = data size)
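Each record therefore has a fixed 12-byte part (1 + 1 + 8 + 2) plus the key and the data. The sketch below writes one record in that layout; the names are hypothetical, and the length checks follow from the 1-byte key length and 2-byte (uint16_t) data size in the layout.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for a single record of the data layout.
public final class RecordLayout {
    // key type (1) + key len (1) + timestamp (8) + data size (2)
    public static final int FIXED_RECORD_BYTES = 1 + 1 + Long.BYTES + Short.BYTES; // 12

    public static int recordSize(byte[] key, byte[] data) {
        return FIXED_RECORD_BYTES + key.length + data.length;
    }

    // Assumes the caller has sized `out` with recordSize (or more).
    public static void writeRecord(ByteBuffer out, byte keyType, byte[] key, long timestamp, byte[] data) {
        if (key.length > 255) throw new IllegalArgumentException("key longer than 255 bytes");
        if (data.length > 65535) throw new IllegalArgumentException("data longer than 65535 bytes");
        out.put(keyType);
        out.put((byte) key.length);         // read back as an unsigned byte
        out.put(key);
        out.putLong(timestamp);
        out.putShort((short) data.length);  // read back as an unsigned 16-bit value
        out.put(data);
    }

    private RecordLayout() {}
}
```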
Problem Statement:-
As part of a project, I am trying to represent all of this in one Java class, so that I can pass in the necessary fields and get back a final byte array containing the header first and then the data:
Below is my DataFrame class:
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Map;

public final class DataFrame {
private final byte addressedCenter;
private final byte version;
private final Map<byte[], byte[]> keyDataHolder;
private final long location;
private final long locationFrom;
private final long locationOrigin;
private final byte partition;
private final byte copy;
public DataFrame(byte addressedCenter, byte version,
Map<byte[], byte[]> keyDataHolder, long location, long locationFrom,
long locationOrigin, byte partition, byte copy) {
this.addressedCenter = addressedCenter;
this.version = version;
this.keyDataHolder = keyDataHolder;
this.location = location;
this.locationFrom = locationFrom;
this.locationOrigin = locationOrigin;
this.partition = partition;
this.copy = copy;
}
public byte[] serialize() {
// All of the data is embedded in a binary array with fixed maximum size 70000
ByteBuffer byteBuffer = ByteBuffer.allocate(70000);
byteBuffer.order(ByteOrder.BIG_ENDIAN);
int numOfRecords = keyDataHolder.size();
int bufferUsed = getBufferUsed(keyDataHolder); // 36 + dataSize + 1 + 1 + keyLength + 8 + 2;
// header layout
byteBuffer.put(addressedCenter); // byte
byteBuffer.put(version); // byte
byteBuffer.putInt(numOfRecords); // int
byteBuffer.putInt(bufferUsed); // int
byteBuffer.putLong(location); // long
byteBuffer.putLong(locationFrom); // long
byteBuffer.putLong(locationOrigin); // long
byteBuffer.put(partition); // byte
byteBuffer.put(copy); // byte
// now the data layout
for (Map.Entry<byte[], byte[]> entry : keyDataHolder.entrySet()) {
byte keyType = 0;
byte keyLength = (byte) entry.getKey().length;
byte[] key = entry.getKey();
byte[] data = entry.getValue();
short dataSize = (short) data.length;
ByteBuffer dataBuffer = ByteBuffer.wrap(data);
long timestamp = 0;
if (dataSize > 10) {
timestamp = dataBuffer.getLong(2);
}
byteBuffer.put(keyType);
byteBuffer.put(keyLength);
byteBuffer.put(key);
byteBuffer.putLong(timestamp);
byteBuffer.putShort(dataSize);
byteBuffer.put(data);
}
return byteBuffer.array();
}
private int getBufferUsed(final Map<byte[], byte[]> keyDataHolder) {
int size = 36;
for (Map.Entry<byte[], byte[]> entry : keyDataHolder.entrySet()) {
size += 1 + 1 + 8 + 2;
size += entry.getKey().length;
size += entry.getValue().length;
}
return size;
}
}
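Note that getBufferUsed already computes the exact number of bytes the header and records occupy, so the buffer could be allocated with that size instead of 70000. The sketch below assumes the receiver does not require fixed 70000-byte frames (the comment in serialize suggests it might, in which case keep the fixed size); the class name is hypothetical and the timestamp rule is copied from the question.

```java
import java.nio.ByteBuffer;
import java.util.Map;

// Hypothetical exact-size variant of DataFrame.serialize().
public final class ExactSizeSerializer {
    static final int HEADER_SIZE = 36; // from the C++ headerOffset computation

    // Same arithmetic as getBufferUsed(): header plus 12 fixed bytes, key and data per record.
    public static int bufferUsed(Map<byte[], byte[]> records) {
        int size = HEADER_SIZE;
        for (Map.Entry<byte[], byte[]> e : records.entrySet()) {
            size += 1 + 1 + 8 + 2 + e.getKey().length + e.getValue().length;
        }
        return size;
    }

    public static byte[] serialize(byte addressedCenter, byte version, Map<byte[], byte[]> records,
                                   long location, long locationFrom, long locationOrigin,
                                   byte partition, byte copy) {
        int bufferUsed = bufferUsed(records);
        ByteBuffer buf = ByteBuffer.allocate(bufferUsed); // big-endian by default
        buf.put(addressedCenter).put(version)
           .putInt(records.size()).putInt(bufferUsed)
           .putLong(location).putLong(locationFrom).putLong(locationOrigin)
           .put(partition).put(copy);
        for (Map.Entry<byte[], byte[]> e : records.entrySet()) {
            byte[] key = e.getKey();
            byte[] data = e.getValue();
            // As in the question: the timestamp is the 8 bytes at offset 2 of longer payloads.
            long timestamp = data.length > 10 ? ByteBuffer.wrap(data).getLong(2) : 0L;
            buf.put((byte) 0).put((byte) key.length).put(key)
               .putLong(timestamp).putShort((short) data.length).put(data);
        }
        return buf.array(); // every byte was written, so nothing needs trimming
    }

    private ExactSizeSerializer() {}
}
```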
And below is how I am using my above DataFrame class:
public static void main(String[] args) throws IOException {
// header layout
byte addressedCenter = 0;
byte version = 1;
long location = packCustomerAddress((byte) 12, (short) 13, (byte) 32, (int) 120);
long locationFrom = packCustomerAddress((byte) 21, (short) 23, (byte) 41, (int) 130);
long locationOrigin = packCustomerAddress((byte) 21, (short) 24, (byte) 41, (int) 140);
byte partition = 3;
byte copy = 0;
// this map will have key as the actual key and value as the actual data, both in byte array
// for now I am storing only two entries in this map
Map<byte[], byte[]> keyDataHolder = new HashMap<byte[], byte[]>();
for (int i = 1; i <= 2; i++) {
keyDataHolder.put(generateKey(), getMyData());
}
DataFrame records =
new DataFrame(addressedCenter, version, keyDataHolder, location, locationFrom,
locationOrigin, partition, copy);
// this will give me final packed byte array
// which will have header and data in it.
byte[] packedArray = records.serialize();
}
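private static long packCustomerAddress(byte datacenter, short clientId, byte dataId,
int dataCounter) {
// mask each field so negative Java values do not sign-extend into the higher fields (the C++ types are unsigned)
return ((datacenter & 0xFFL) << 56) | ((clientId & 0xFFFFL) << 40) | ((dataId & 0xFFL) << 32)
| (dataCounter & 0xFFFFFFFFL);
}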
As you can see in my DataFrame class, I am allocating the ByteBuffer with a predefined size of 70000. Is there a better way by which I can allocate the size I am using while making ByteBuffer instead of using a hardcoded 70000?
Also is there any better way as compared to what I am doing which packs my header and data in one byte array? I also need to make sure it is thread safe since it can be called by multiple threads.
Is there a better way by which I can allocate the size I am using while making ByteBuffer instead of using a hardcoded 70000?
There are at least two non-overlapping approaches, and you may use both.
One is buffer pooling. Find out how many buffers you need during peak periods and set the pool's limit somewhat above that maximum, e.g. max + max / 2, max + average, max + mode, or 2 * max.
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.function.Consumer;
import java.util.function.Function;
public class ByteBufferPool {
private final int bufferCapacity;
private final LinkedBlockingDeque<ByteBuffer> queue;
public ByteBufferPool(int limit, int bufferCapacity) {
if (limit < 0) throw new IllegalArgumentException("limit must not be negative.");
if (bufferCapacity < 0) throw new IllegalArgumentException("bufferCapacity must not be negative.");
this.bufferCapacity = bufferCapacity;
this.queue = (limit == 0) ? null : new LinkedBlockingDeque<>(limit);
}
public ByteBuffer acquire() {
ByteBuffer buffer = (queue == null) ? null : queue.pollFirst();
if (buffer == null) {
buffer = ByteBuffer.allocate(bufferCapacity);
}
else {
buffer.clear();
buffer.order(ByteOrder.BIG_ENDIAN);
}
return buffer;
}
public boolean release(ByteBuffer buffer) {
if (buffer == null) throw new IllegalArgumentException("buffer must not be null.");
if (buffer.capacity() != bufferCapacity) throw new IllegalArgumentException("buffer has unsupported capacity.");
if (buffer.isDirect()) throw new IllegalArgumentException("buffer must not be direct.");
if (buffer.isReadOnly()) throw new IllegalArgumentException("buffer must not be read-only.");
return (queue == null) ? false : queue.offerFirst(buffer);
}
public void withBuffer(Consumer<ByteBuffer> action) {
if (action == null) throw new IllegalArgumentException("action must not be null.");
ByteBuffer buffer = acquire();
try {
action.accept(buffer);
}
finally {
release(buffer);
}
}
public <T> T withBuffer(Function<ByteBuffer, T> function) {
if (function == null) throw new IllegalArgumentException("function must not be null.");
ByteBuffer buffer = acquire();
try {
return function.apply(buffer);
}
finally {
release(buffer);
}
}
public <T> CompletionStage<T> withBufferAsync(Function<ByteBuffer, CompletionStage<T>> asyncFunction) {
if (asyncFunction == null) throw new IllegalArgumentException("asyncFunction must not be null.");
ByteBuffer buffer = acquire();
CompletionStage<T> future = null;
try {
future = asyncFunction.apply(buffer);
}
finally {
if (future == null) {
release(buffer);
}
else {
future = future.whenComplete((result, throwable) -> release(buffer));
}
}
return future;
}
}
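One detail worth making explicit when using a pool: once a buffer is released, another caller may reuse it, so the bytes you need must be copied out before release. The sketch below uses a stripped-down, hypothetical MiniPool (acquire/release only) as a stand-in for ByteBufferPool so that it compiles on its own.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.concurrent.LinkedBlockingDeque;

public final class PoolUsageSketch {
    // Hypothetical stand-in for ByteBufferPool, reduced to acquire/release.
    public static final class MiniPool {
        private final int capacity;
        private final LinkedBlockingDeque<ByteBuffer> queue;

        public MiniPool(int limit, int capacity) {
            this.capacity = capacity;
            this.queue = new LinkedBlockingDeque<>(limit);
        }

        public ByteBuffer acquire() {
            ByteBuffer b = queue.pollFirst();
            if (b == null) return ByteBuffer.allocate(capacity);
            b.clear(); // reset position and limit for the next user
            return b;
        }

        public void release(ByteBuffer b) {
            queue.offerFirst(b); // silently dropped if the pool is full
        }
    }

    // Writes into a pooled buffer and copies out only the bytes written,
    // because the buffer itself goes back to the pool.
    public static byte[] serializeWithPool(MiniPool pool, byte[] payload) {
        ByteBuffer buf = pool.acquire();
        try {
            buf.putInt(payload.length).put(payload);
            return Arrays.copyOf(buf.array(), buf.position());
        } finally {
            pool.release(buf);
        }
    }
}
```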
The withBuffer methods allow straightforward use of the pool, while acquire and release let you acquire and release a buffer at separate points in your code.
Another approach is to put the serialization operations (e.g. put, putInt and putLong) behind an interface, and then implement one class that only counts bytes and another that actually writes them into a buffer. Add a method to the interface that tells whether the serializer is counting or writing, so that callers can avoid generating bytes that would only be counted, and another method that increments the byte count directly, which is useful e.g. for the size of a string in some encoding, computed without actually serializing it.
public interface ByteSerializer {
ByteSerializer put(byte value);
ByteSerializer putInt(int value);
ByteSerializer putLong(long value);
boolean isSerializing();
ByteSerializer add(int bytes);
int position();
}
public class ByteCountSerializer implements ByteSerializer {
private int count = 0;
@Override
public ByteSerializer put(byte value) {
count += 1;
return this;
}
@Override
public ByteSerializer putInt(int value) {
count += 4;
return this;
}
@Override
public ByteSerializer putLong(long value) {
count += 8;
return this;
}
@Override
public boolean isSerializing() {
return false;
}
@Override
public ByteSerializer add(int bytes) {
if (bytes < 0) throw new IllegalArgumentException("bytes must not be negative.");
count += bytes;
return this;
}
@Override
public int position() {
return count;
}
}
import java.nio.ByteBuffer;
public class ByteBufferSerializer implements ByteSerializer {
private final ByteBuffer buffer;
public ByteBufferSerializer(int bufferCapacity) {
if (bufferCapacity < 0) throw new IllegalArgumentException("bufferCapacity must not be negative.");
this.buffer = ByteBuffer.allocate(bufferCapacity);
}
@Override
public ByteSerializer put(byte value) {
buffer.put(value);
return this;
}
@Override
public ByteSerializer putInt(int value) {
buffer.putInt(value);
return this;
}
@Override
public ByteSerializer putLong(long value) {
buffer.putLong(value);
return this;
}
@Override
public boolean isSerializing() {
return true;
}
@Override
public ByteSerializer add(int bytes) {
if (bytes < 0) throw new IllegalArgumentException("bytes must not be negative.");
for (int b = 0; b < bytes; b++) {
buffer.put((byte)0);
}
return this;
// or throw new UnsupportedOperationException();
}
@Override
public int position() {
return buffer.position();
}
public ByteBuffer buffer() {
return buffer;
}
}
In your code, you'd do something along these lines (not tested):
ByteCountSerializer counter = new ByteCountSerializer();
dataFrame.serialize(counter);
ByteBufferSerializer serializer = new ByteBufferSerializer(counter.position());
dataFrame.serialize(serializer);
ByteBuffer buffer = serializer.buffer();
buffer.flip(); // switch the buffer from writing to reading before handing it to I/O
// ... write buffer, ?, profit ...
Your DataFrame.serialize method should be refactored to accept a ByteSerializer, and in cases where it would generate data, it should check isSerializing to know if it should only calculate the size or actually write bytes.
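The two-pass idea also solves the chicken-and-egg problem of bufferUsed, which must be written in the header before the records. Below is a self-contained sketch; the Sink interface is a hypothetical variant of ByteSerializer that adds putShort and put(byte[]) because the record layout needs them, and it omits isSerializing since nothing here is expensive to generate.

```java
import java.nio.ByteBuffer;

public final class TwoPassSketch {
    // Hypothetical variant of ByteSerializer with the extra operations the layout needs.
    public interface Sink {
        Sink put(byte v);
        Sink putShort(short v);
        Sink putInt(int v);
        Sink putLong(long v);
        Sink put(byte[] v);
        int position();
    }

    // Pass 1: only counts bytes.
    public static final class Counter implements Sink {
        private int count;
        @Override public Sink put(byte v) { count += 1; return this; }
        @Override public Sink putShort(short v) { count += 2; return this; }
        @Override public Sink putInt(int v) { count += 4; return this; }
        @Override public Sink putLong(long v) { count += 8; return this; }
        @Override public Sink put(byte[] v) { count += v.length; return this; }
        @Override public int position() { return count; }
    }

    // Pass 2: writes into an exactly sized heap buffer.
    public static final class Writer implements Sink {
        private final ByteBuffer buf;
        public Writer(int capacity) { buf = ByteBuffer.allocate(capacity); }
        @Override public Sink put(byte v) { buf.put(v); return this; }
        @Override public Sink putShort(short v) { buf.putShort(v); return this; }
        @Override public Sink putInt(int v) { buf.putInt(v); return this; }
        @Override public Sink putLong(long v) { buf.putLong(v); return this; }
        @Override public Sink put(byte[] v) { buf.put(v); return this; }
        @Override public int position() { return buf.position(); }
        public byte[] bytes() { return buf.array(); }
    }

    // One record in the question's data layout, written to whichever Sink is given.
    static void writeRecord(Sink out, byte[] key, long timestamp, byte[] data) {
        out.put((byte) 0).put((byte) key.length).put(key)
           .putLong(timestamp).putShort((short) data.length).put(data);
    }

    public static byte[] serialize(byte[] key, long timestamp, byte[] data) {
        Counter counter = new Counter();
        writeRecord(counter, key, timestamp, data);      // pass 1: size only
        Writer writer = new Writer(counter.position());  // exact capacity
        writeRecord(writer, key, timestamp, data);       // pass 2: real bytes
        return writer.bytes();
    }
}
```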
I leave combining both approaches as an exercise, mainly because it depends a lot on how you decide to do it.
For instance, you may make ByteBufferSerializer use the pool directly and keep an arbitrary capacity (e.g. your 70000); you may pool ByteBuffers by capacity (instead of the exact needed capacity, use the smallest power of 2 that is at least that large, and set the buffer's limit to the needed capacity before returning from acquire); or you may pool ByteBufferSerializers directly, as long as you add a reset() method.
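For the pool-by-capacity option, rounding up to a power of two keeps the number of distinct buffer sizes small. A minimal sketch, with hypothetical method names; a real pool would look up a queue per bucket size instead of allocating every time.

```java
import java.nio.ByteBuffer;

public final class PowerOfTwo {
    // Smallest power of two that is >= n, for 1 <= n <= 2^30.
    public static int ceilPowerOfTwo(int n) {
        if (n < 1 || n > (1 << 30)) throw new IllegalArgumentException("n out of range: " + n);
        return n == 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    // Hypothetical helper: allocate a bucket-sized buffer but expose only `needed` bytes.
    public static ByteBuffer bucketBuffer(int needed) {
        ByteBuffer buf = ByteBuffer.allocate(ceilPowerOfTwo(needed));
        buf.limit(needed); // writes past `needed` now throw BufferOverflowException
        return buf;
    }

    private PowerOfTwo() {}
}
```

With this rounding, a 70000-byte frame would come from the 131072-byte bucket.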
Also is there any better way as compared to what I am doing which packs my header and data in one byte array?
Yes. Pass around the byte buffering instance instead of having certain methods return byte arrays which are discarded the moment after their length is checked or their contents are copied.
I also need to make sure it is thread safe since it can be called by multiple threads.
As long as each buffer is used by only one thread at a time, and every hand-off between threads is properly synchronized, you don't have to worry.
Proper synchronization means that your pool's acquire and release methods have acquire and release semantics, and that if a buffer passes between threads while it is out of the pool, the thread that stops using it performs a release action and the thread that starts using it performs a matching acquire action. For instance, you don't have to worry about this if you pass the buffer through CompletableFutures, or if you hand it over explicitly with an Exchanger or a proper implementation of BlockingQueue.
From java.util.concurrent's package description:
The methods of all classes in java.util.concurrent and its subpackages extend these guarantees to higher-level synchronization. In particular:
Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread.
Actions in a thread prior to the submission of a Runnable to an Executor happen-before its execution begins. Similarly for Callables submitted to an ExecutorService.
Actions taken by the asynchronous computation represented by a Future happen-before actions subsequent to the retrieval of the result via Future.get() in another thread.
Actions prior to "releasing" synchronizer methods such as Lock.unlock, Semaphore.release, and CountDownLatch.countDown happen-before actions subsequent to a successful "acquiring" method such as Lock.lock, Semaphore.acquire, Condition.await, and CountDownLatch.await on the same synchronizer object in another thread.
For each pair of threads that successfully exchange objects via an Exchanger, actions prior to the exchange() in each thread happen-before those subsequent to the corresponding exchange() in another thread.
Actions prior to calling CyclicBarrier.await and Phaser.awaitAdvance (as well as its variants) happen-before actions performed by the barrier action, and actions performed by the barrier action happen-before actions subsequent to a successful return from the corresponding await in other threads.
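The Future guarantee above is enough for the simplest safe design: every task allocates its own buffer (thread confinement), and the caller only sees the result through Future.get(). A sketch with a hypothetical 12-byte payload standing in for a real frame; it does not use a pool.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public final class ConfinedSerialization {
    // Each call allocates its own buffer, so no other thread ever touches it.
    static byte[] encode(int id) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + Long.BYTES);
        buf.putInt(id).putLong(id * 10L);
        return buf.array();
    }

    public static List<byte[]> encodeAll(int n) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int id = i;
                futures.add(executor.submit(() -> encode(id)));
            }
            List<byte[]> results = new ArrayList<>();
            for (Future<byte[]> f : futures) {
                // Future.get() gives the happens-before edge from the worker to this thread.
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}
```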
Another way of doing it would be via a DataOutputStream wrapped around a ByteArrayOutputStream. But you should concentrate your performance tuning where it's needed, and this isn't one of those places: efficiency isn't any kind of an issue here, because the network I/O will dominate by orders of magnitude.
Another reason to use a ByteArrayOutputStream is that you don't have to guess the buffer size in advance: it will grow as necessary.
To keep it thread-safe, use only local variables.
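A sketch of that approach, using only local variables. DataOutputStream writes multi-byte values big-endian, matching the ByteBuffer version. The class name is hypothetical; bufferUsed is still computed first because the header contains it, and it doubles as an initial-capacity hint (the stream would grow anyway).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Map;

public final class StreamSerializer {
    public static byte[] serialize(byte addressedCenter, byte version, Map<byte[], byte[]> records,
                                   long location, long locationFrom, long locationOrigin,
                                   byte partition, byte copy) {
        int bufferUsed = 36; // header size
        for (Map.Entry<byte[], byte[]> e : records.entrySet()) {
            bufferUsed += 1 + 1 + 8 + 2 + e.getKey().length + e.getValue().length;
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(bufferUsed);
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(addressedCenter);
            out.writeByte(version);
            out.writeInt(records.size());
            out.writeInt(bufferUsed);
            out.writeLong(location);
            out.writeLong(locationFrom);
            out.writeLong(locationOrigin);
            out.writeByte(partition);
            out.writeByte(copy);
            for (Map.Entry<byte[], byte[]> e : records.entrySet()) {
                byte[] key = e.getKey();
                byte[] data = e.getValue();
                // Same timestamp rule as the question's serialize().
                long timestamp = data.length > 10 ? ByteBuffer.wrap(data).getLong(2) : 0L;
                out.writeByte(0);             // key type
                out.writeByte(key.length);
                out.write(key);
                out.writeLong(timestamp);
                out.writeShort(data.length);
                out.write(data);
            }
        } catch (IOException e) {
            // Not expected with an in-memory ByteArrayOutputStream.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    private StreamSerializer() {}
}
```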