Day 9 - Moving memory

Created on Monday, December 9, 2024.

Day 9 Part 1

Ok, this looks like a fun little problem. We have a dense format that is designed to represent memory layout, so 995989 represents a 9 block file (with id 0) at location 0, a 9 block space, a 5 block file with id 1 at location 18, a 9 block space, an 8 block file with id 2 starting at location 32, and then 9 blocks of space.
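
Written out block by block, with . marking free space the way the puzzle's own visualisation does, that expands to:

000000000.........11111.........22222222.........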

There's a cute visual representation which I think might be a bit of a red herring, but dense representation 2333133121414131402 should look like

00...111...2...333.44.5555.6666.777.888899

We defrag and make space by moving blocks from the end of memory to the first available free space. Critically, we need to remember the original IDs as well, as our checksum at the end is the sum over every block of its position multiplied by the ID of the file stored there (so a block holding file 5 at position 12 contributes 60).

How to approach this

There's lots of potentially smart things we could do, but the brute force and simple technique here might be to do as the example has done, and represent each block in memory individually. I note that the example above conveniently has only ten files, so the IDs 0 to 9 fit in a single character, but looking at my input, we're at about 20,000 characters, so we need to allow IDs up to around 10,000, which rules out a single byte per block. We're going to be doing lots of swaps, which is in theory inefficient, but we can probably worry about that later.

My bigger initial worry was just how big this space gets. Any puzzle that has a compressed input has the potential to ask you to allocate terabytes of RAM if you aren't careful.

With 20,000 digits, if every single one were a 9, we'd be looking at around 9 memory blocks for every digit, so 9 * 20,000, or roughly 180,000 blocks, which even at two bytes per block is well under half a megabyte. I think we're fine, but I'm keeping an eye on that in case it bites me later.

So if we're fine with that, we're going to go for the simplest thing that works: unpack the dense string into a memory array, which is a vector of u16s with one entry per block holding the ID of the file stored there (which reminds me, we need a special value for representing free space); run a defragment algorithm, which will proceed along the blocks looking for spaces and, when it finds one, grab the last full block and move it in; and finally generate a checksum and return it.

Let's give that a go with sample input and see if we can generate a memory array that looks right.
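
The unpacking itself is nothing clever. Here's a minimal sketch of it, with an assumed unpack name and FREE defined as u16::MAX, the same sentinel the search functions below compare against:

const FREE: u16 = u16::MAX;

/// Expand the dense format into one entry per block: file blocks hold their id,
/// free blocks hold the FREE sentinel. Digits alternate file length, free length.
fn unpack(dense: &str) -> Vec<u16> {
    let mut memory = Vec::new();
    for (i, c) in dense.trim().chars().enumerate() {
        let length = c.to_digit(10).expect("dense map should only contain digits") as usize;
        // Even positions are files (with id i / 2), odd positions are free space
        let value = if i % 2 == 0 { (i / 2) as u16 } else { FREE };
        memory.extend(std::iter::repeat(value).take(length));
    }
    memory
}

With the memory array in hand, the two little search helpers: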

/// We find free blocks by starting at start and counting forwards until we find a u16::MAX
fn find_next_free(input: &[u16], start: usize) -> usize {
    let mut start = start;
    while input[start] != u16::MAX {
        start += 1;
    }
    start
}

/// We find blocks to move by starting at start and counting backwards until we find a block that isn't u16::MAX
pub fn find_next_block(input: &[u16], start: usize) -> usize {
    let mut start = start;
    while input[start] == u16::MAX {
        start -= 1;
    }
    start
}

That's very simple, so simple in fact that I wonder if we might be better off with an iterator over an enumeration, but with us constantly changing the underlying memory data structure, I think we'd run into some awkward borrowing issues.

So let's work on the actual defragment operation. In essence we keep going while find_next_block > find_next_free, moving the block we found at the end into the free slot.

Let's see if it works on the test data.

pub fn defragment(input: &mut Vec<u16>) -> Vec<u16> {
    let mut next_block: usize = find_next_block(input, input.len() - 1);
    let mut next_free: usize = find_next_free(input, 0);
    while next_free < next_block {
        input[next_free] = input[next_block];
        input[next_block] = FREE;
        next_block = find_next_block(input, next_block - 1);
        next_free = find_next_free(input, next_free + 1);
    }
    input.clone()
}

And that works on the test data as expected, which was a bit of a surprise, to me at least! Even adding two simple tests for boundary conditions (what if there's a free block at space 0, and equally what if there's a full block at the last position) seems to work.
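
Those tests are nothing fancier than something like this (a sketch rather than the exact code, again assuming FREE is u16::MAX):

#[test]
fn defragment_fills_leading_free_block() {
    // A free block at position 0 should be filled by the last file block
    let mut memory = vec![FREE, 1];
    assert_eq!(defragment(&mut memory), vec![1, FREE]);
}

#[test]
fn defragment_moves_trailing_file_block() {
    // A full block in the last position should move into the first gap
    let mut memory = vec![0, FREE, 1];
    assert_eq!(defragment(&mut memory), vec![0, 1, FREE]);
}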

So on with calculating the checksum. This really is a simple iteration over the enumeration of the location and the file ID, multiplying them if the ID isn't FREE, and then summing the total...

input
    .iter()
    .enumerate()
    .map(|(i, id)| if *id != FREE { (i as u16) * *id } else { 0 })
    .sum()

A bit of funkiness with turning the usize into a u16, dereferencing the ID and multiplying, but otherwise nice and clean.... (right?)

Type errors and math

So, I got a comically low number and didn't twig for some time. I spent a bit of time staring at my code, and even went and got an example with longer input from Reddit in order to check I wasn't doing something completely stupid...

Except of course I was. I wasn't doing the thing others seemed to do, which is treating the underlying string as the memory and forgetting that IDs can be quite big, but there was something silly in my line above.

For a 20,000 character line, the IDs run up to around 10,000 and the positions up to around 180,000, so i as u16 already wraps, and the products blow straight past the 65,535 ceiling of a simple 16 bit number!

A quick hack to use a 64 bit number, and we get a result. It's a bit clunky because Rust doesn't do implicit numeric promotion: a u64 multiplied by a u16 feels like it should just give you a u64, but the standard multiplication ops are only defined for matching operand types, so u64 * u64 is fine and we have to explicitly promote the u16 into a u64 first.

That does give me a slightly nicer to read statement though (RunResult being the 64 bit alias used for the answer):

.map(|(i, id)| {
    if *id != FREE {
        (i as RunResult) * (*id as RunResult)
    } else {
        0
    }
})

Part 2 - Files who move together stay together

Ok, I was expecting something much worse than this. We now just change our defragment function to keep whole files together. That means we're going to have to change our find_next_free to take a target block size. Critically, looking at their example, we see that the two 9 blocks move from the end into the first gap, but the 8s don't move at all. We need to find a contiguous space, or return some kind of error and move on to the next file, which is a concept we haven't needed so far.

I kind of wish I'd used a different representation now, but I think we can do this, so let's give it a go.

Firstly, we need to be able to go through the blocks one at a time by ID, so we need to find a block by its ID. But we also need to know how long it is.

/// We find a block by finding the first block with the id, and its size by continuing until the block isn't that id
fn find_block_by_id(input: &[u16], id: u16) -> (usize, usize) {
    let mut start = 0;
    let mut size: usize = 0;
    while input[start] != id {
        start += 1;
    }
    // Guard the index in case the block runs right up to the end of the disk
    while start + size < input.len() && input[start + size] == id {
        size += 1;
    }
    (start, size)
}

Secondly, we need to find a free run of the right size. This is going to be rife with boundary conditions, not least because a gap of exactly the right size is just what we want. Luckily this is a greedy algorithm, so we can just return as soon as we find the first gap that's big enough.

/// We look for free blocks of at least the right size that are before end
pub fn find_first_free_of_size(input: &[u16], size: usize, end: usize) -> usize {
    let mut start = 0;
    let mut found = 0;
    while found < size && start < end {
        while input[start] != FREE {
            start += 1;
        }
        while input[start + found] == FREE {
            found += 1;
            if found == size {
                return start;
            }
        }
        found = 0;
        start += 1;
    }
    return MAX;
}

Returning MAX here is a bit ugly; we should probably be returning an Ok or an Err, but I'll come back to try that when refactoring.
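
For what it's worth, the change is small. A sketch of what I mean, using an Option rather than the MAX sentinel (the _opt name is just for illustration, and the behaviour is otherwise the same):

/// As above, but report "no suitable gap" with None instead of a MAX sentinel
pub fn find_first_free_of_size_opt(input: &[u16], size: usize, end: usize) -> Option<usize> {
    let mut start = 0;
    while start < end {
        // Count the run of consecutive FREE blocks beginning at start (stays in bounds)
        let run = input[start..].iter().take_while(|&&b| b == FREE).count();
        if run >= size {
            return Some(start);
        }
        // Jump past this too-small run, or past a single occupied block
        start += run.max(1);
    }
    None
}

The caller then reads if let Some(free) = find_first_free_of_size_opt(input, size, block) rather than comparing against MAX.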

So now we have those, the updated defragment simply goes from the highest ID down to 0, looking to see if there's a gap to move each file into that is earlier than its existing position. I wondered if we could stop once the pointers cross, but since we don't know whether the last move will be moving file 1 from position 1 to position 0, I think we have to go the whole way down.

Ugly as sin, but it works

pub fn defragment_without_splits(input: &mut Vec<u16>) -> Vec<u16> {
    // First we find the last ID in the disk
    let mut i = input.len() - 1;
    while input[i] == FREE {
        i -= 1
    }
    let mut id = input[i];
    // Now we go through them all
    while id > 0 {
        let (block, size) = find_block_by_id(input, id);
        let free = find_first_free_of_size(input, size, block);
        if free != MAX {
            for j in 0..size {
                input[free + j] = id;
                input[block + j] = FREE;
            }
        }
        id -= 1;
    }
    input.clone()
}
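
For completeness, the wiring around all this looks roughly like the following, reusing the unpack sketch from earlier; the part2 name and the plain u64 checksum type are illustrative rather than anything canonical:

fn part2(dense: &str) -> u64 {
    let mut memory = unpack(dense);
    let defragmented = defragment_without_splits(&mut memory);
    // Same checksum as part 1: sum position * id over every file block
    defragmented
        .iter()
        .enumerate()
        .map(|(i, id)| if *id != FREE { (i as u64) * (*id as u64) } else { 0 })
        .sum()
}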

I found a bug in the find_first_free_of_size function above as well. Once we start counting free memory, we don't check whether we've gone past our boundary condition, so I had to change

while input[start + found] == FREE

to

while input[start + found] == FREE && start < end 

But it works, and in under 250 milliseconds. There are some obvious optimisations I could do, but I think I'll call it there.
