Day 16 Binary decoding

Wow this is a long description. The first thing to catch my eye here is that some of the analysis is going to cross byte boundaries. So in the example packet, bits 7-12 are the first number, and crosses the 8bit boundary.

We're going to need to turn the hexedecimal start into a binary stream and then start processing that stream.

The second thing that I notice is that we've got the concept of a packet and a subpacket, and I'm immedietely suspicious that our subpackets themselves might contain their own subpackets. That could be recursive, so our packet parser is going to have to think about how to represent that.

Our question asks us to only create the sum of the packet versions.

This time, I think what I'm going to do is use some python classes, mostly because I haven't done that yet, and partly because this lets us use a pattern around decompposition. If we have a Packet class, it can contain a list of subpackets. Each packet class can encode the data we care about, so version, type. We can store data if there is any (the operators don't seem to have their own data, but literal value packets do). We can then ask a packet for it's VersionChecksum, which will add together all the VersionChecksums of the subpackets and add it;s own version.

Secondly, we're going to need a parse function that takes a set of bytes and parses them as a bitstream. There's several ways of doing this, but the easiest in python is to use the bitstring package. I thought I'd use it in Day 3 but I never ended up needing it. bitstream has got a neat ability to consume bits of the stream, make decisions and then interpret the next chunk of the stream.

Let's give the classes a define first and then try parsing a literal value from a class first

Ok, we've got some packets. We don't really understand what these operators are, so I'm storing TypeId for now, but I suspect that we'll end up mapping them to operations like add, subtract and so on.

Let's get onto parsing the bitstream.

Our most complex thing is that we're going to parse the bitstream, but if we need subpackets, we actually need a way to recurse into creating packets from the remainder of the stream before we create the operator object. We'll use recursion here. But there's an oddity, when we parse a given bitstream, we might return a series of packets.

Think about the example of an operator packet that contains two subpackets. We're going to parse the version, type and then our length, whuch says we have a further 27 bits containing multiple packets. We then need to consume the next 27 bits, and pass it to a function that can return us multiple packets from it.

At the outermost layer we can only have a single packet.

We can either have two functions doing something very similar, or we can have a recursive function and put up with the fact that we're going to return a list of packets from it. The second feels more sensible here.

Ok, we've got something that can parse literal packets, but operator packets are a bit more complex because there's two different mechanisms for storing subpackets. We're going to read the length field and switch those modes and see if we can get the example operator packets to parse

Ok, this is a really interesting error, so rather than debug it, for explanatory purposes I'm going to leave this here as it is.

The examples are a little frustrating as they often don't give the version numbers or operator type in the examples, so I've put 0's into those and then I'd inspect the output and validate it.

What we've got here is the first operator packat works, so we're parsing the operator and getting the subpart of the stream, but the second operator packet, using the length type of 1 is not quite working correctly. It seems to be returning OperatorPacket(Packet(0,0) []).

Now I'm hoping that in python, if we recurse into parsepacket to read the stream, it will correctly advance the stream and return correctly....

And I've literally just written that and realised that my return value is in the exception handler, not at the end of teh stream. So far, I was somewhat expecting the process to end because we've reached the end of teh stream. This is the first time that we're ending because I hit the "number of packets to read" variable reaching 0, and as such, there's no exception, just None.

Let's fix that and see if that solves my problems

Ok, there we go. I added in some debugging. In this case because we are recursing, the debugging can be a pain to read normally. Because I prefix the print lines with the debug string, I get indention and line identification, but further when I recurse, we further indent the debugging lines for the each level of recursion, that'll make it far easier to read if I have to debug some of the multi level nested operators in the future.

The issue here incidentally was about 3 main things. I wasn't returning the packets as I thought, but that wasn't all I wasn't actually decrementing the packetstoread counter either and finally, I was trying to read 15 bits of length, which was resulting in consuming the next 4 bits of the first packet, resulting in some odd packets.

The debugging really showed this up, especially printing out the length fiels which made me go "huh, parsing 15,000 odd packets, that can't be right".

With this done, let's try on some test data and validate that our version calculations work as well

Ok, that works, let's try the production data

Part 2 - Operating on thin ground

Now we know how to parse packets, time to implement them. This is where our classes are going to help.

We're going to say that all packets have a value, and each subclass must implement its value method according to the list.

We then need to update the parser to create not just Operator packets, but the right kind of operator packet, but that shouldn't be too complex. We might want to refactor out the shared logic of parsing the subpackets though.

Now we need to parse our new packets out. Note that there's a refactoring we could do here, I'll come back to that later I think.

Ok, that seems to parse and setup correctly. Let's see if the values work as expected in the test cases

Ok, let's try running that production program

That worked.

I talked about a refactoring that we could do.

We're storing the typeid in the class, as typeid. We could also build a simple lookup table, so that we cna fetch the appropriate class based on the typeid.

That way we could simply do

version, typeid = stream.readlist("uint:3, 3")
if typeid == 4:
    # get data here
    packet = getPacketType(typeid)(version, data)
else:
    # get subpackets
    packet = getPacketType(typeid)(version, subpackets)

One could also try to abstract away the getsubpackets and get data into a routine that can return either, and then we can pass the result to the packet constructor and hope it works properly.

But I'm otherwise happy with this.

One thing to note, due to debugging this, I went back to edit the class definitions slightly. In particular I changed the repr implementation in Packet to print itself out and rely on data that can be overridden by the subclasses. This is cleaner and means less typing and repetition as we create all the subclasses.

I wonder if we'll see these data types another day in advent?