Monday, December 10, 2012

Message / Packet Parsing

A common problem in designing and writing distributed systems is the handling of the wire protocol. To help in solving these problems both many programmers go it on their own writing their own serialization tools, while others trust third-party tools to ease their development.  After doing a little bit of both I'm not convinced I prefer one approach over another.

The example:

First let me provide an example of a message:

SampleMessage {
  int record; // unique id
  Type type; // some enumed field
  bytes message; // The data
  bytes signature; // Integrity and authenticity of this "SampleMessage"

Custom made serializer:

Using our own serializer, we could do the following assuming a SampleMessage msg:
bytes buffer;


And then on the parsing side:
bytes buffer;

int length = buffer.read_int(8); 

msg.set_message(buffer.mid(12, length));
int signature_length = buffer.read_int(12 + length); 
msg.set_signature(buffer.mid(12 + length + 4, signature_length));

So the major caveats are the following: what is the size of Type, is it uniform across all platforms? Also we're making a lot of potentially unnecessary copies for what might be large datagrams.

Third-party serializers (without message definition):

Alternatively, let's assume we have something like QDataStream:
QByteArray data;
QDataStream stream(&data, QIODevice::in);
stream << msg.record() << msg.type() << msg.message() << msg.signature();
// or maybe even
stream << msg;

For parsing:
QDataStream stream(data);
stream >> msg;
// or maybe not...
int record;
Type type;
QByteArray message;
QByteArray signature;
stream >> record >> type >> message >> signature;

In this case, we just have to check that our output is sane or perhaps look at the QDataStream and ensure that it is still in good working order (Status != ReadPastEnd), but how do we check the signature matches the tuple (record, type, message) in any efficient manner?

Third-party serializers (with message definition):

A completely different serializer, protobufs would work as such:
std::stringstream stream(std::stringstream::out);
string output = stream.str();

And on the return:
std::stringstream stream(output, std::stringstream::out);

Protobuf doesn't handle the signature issue any easier and requires both an external compiler and a library.

Thus far...

Protobufs would be great if we could encapsulate a Message within a SignedMessage, then we *should* be able to get the original character array used for constructing the Message and verify that the signature is correct.  Unfortunately that does not happen.

QByteArray does allow for constructing a QByteArray from another without doing a copy of the underlying array.  However, we do not have the access we need from QDataStream to know where into the QByteArray to construct the base (unsigned) message.

Using our own method allows us to have this fine grained control but at the cost of writing more expressive code and having more debugging routines.

Similar Packets

Ideally we want to reduce our packet parsing as much as possible.  So we can embed multiple packets in the same path.  Using something like protobuf, where we must define the data we expect to be pushing around, makes it complicated for this arbitrary behavior.  Requiring us to embed packets of one type as bytes in another or requiring this lower level packet to know about higher layer packets breaking modularity.  The same could be said about QDataStream, but then again it allows us to avoid unnecessary copies.  In either case, both scenarios feel unnatural.  If we want our home grown packets to have these features, the code will start feeling bloated and potentially complex -- welcome to a whole new world of coding bugs.

I'm still brainstorming on my conclusion and hopefully I'll update when I'm satisfied until then....