Day 6 Multiplying fish

I want to make a one fish, two fish joke here, but can't think of it.

This feels like a form of cellular automata. In this case, we can mostly ignore the fish part, and focus on what we can see. We have an input, a list of numbers, and each iteration is the result of applying a function to the list of numbers.

The rules for the list of numbers is that the next iteration produces: For an item 8-1 => item-1 For an item 0 => 8 and a 0

So we need a function that can produce 1 or 2 numbers, and then we can iterate over the list, and produce a number 1 or 2 numbers for the next generation for each item in teh current generation.

So, I think in this case, I'm going to use Python's new match feature again. This matches the number and returns either fish-1 in most cases, or when it gets to 0 returns a tuple of (6,8) to represent both the reset of the frequency for the fish and a new fish of value 8.

That works, we get a 6,8 pair if we have a fish at time 0, and otherwise we get a fish at time-1 So now we can take a given generational list, and generate a new list that is all the new generations.

That works, it's a bit ugly, and we can tidy it up later. Because we return an integer in one case, and a tuple in the toher, we need to alternate between append and extend.

But that should let us see if we can handle the test data and 80 generations.

That worked! And far better than I expected it too. Let's see how it works on the production data. I'm expecting this to be too slow for production data

Wow, that's a lot of numbers! That took a little while, but not as slow as I thought.

I'm pretty confident that part 2 is going to require a lot more time and effort though, and there's some nasty bits in that code.

Let's tidy up some of that code then.

Firstly, I'm unhappy with the append/extend thing. It would be far easier if next_for_fish returned a tuple of list in each case. That would also remove some of the complexity from the get_next_generation

Ok, that's better. I tried to use a list comprehension for get_next_generation, but because the extend function modifies its called rather than returning a new list, I couldn't work out how to get it to work. I also tried to use map and reduce here. This feels like a problem that should be intrinsically solvable with a really nice functional foldLeft or reduce call, because it's running the same operation on each generation, and then folding them together, but I couldn't get it to work in the way my functions operate, and this is good enough

Part 2 - How fast is your code?

Let's try for 256 days

No matter how I optimise this, that list is getting really big, and very slow, and we're not even halfway to 256 days. Beyond around 140 days or so, the duration starts running into seconds, and it's doubling everytime I add a generation, so this is way way too slow. I also note that this list now contains around 1/2 million items. If we assume it's 1 byte per item, that's 1/2 a megabyte in ram now, and it's going to double each time. (That's probably part of the cause of the slowdown as well, 10 more of them, and we're talking 1/2Gig of ram and counting)

We're going to need a totally different approach for this.

What if instead of a list of fish, like 1,2,3,2,3,4, we think of the fish as a dictionary of ages? So we can say there's two fish age 2, two fish age 3, one fish aged 4 and so on.

So that would like { 1:1, 2:2, 3:2, 4:1 }

And now we would need a generation function that takes a dictionary and returns a new dictionary based on the ages all dropping by 1 (and new fish being spawned).

So we're going to walk over the dictionary, and handle 2 special cases.

Firstly, the number of fish aged 6 is the number of fish from the last generation aged 7, plus all the fish that were age 0 that respawned at 6. Secondly, the number of fish aged 8 is equal to the number of fish aged 0 who spawned new fish. Finally, for every other age, it's just equal to the number of fish who were aged age+1 last generation.

Right. That's a funky way of doing it, but that seems to be far more memory and CPU friendly, lets try on the longer test cases and then production data.

I'll add that I tried mentally adding together the numbers to turn lists into dictionaries, but it was a pain, so I wrote a list_to_dict function as well. I've got a nagging feeling that something in Python's collections module can do this for me (a countedset or something) but I was heading back to meetings so I just wrote it.

The same is true for the ugly as since d = {0:0,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0}. This is there because I coudn't be bothered to use collections.defaultdict anywhere, and rewriting my code to use d.get(index,default) was a faff. By starting the dictionary with 0's in all indexes, I just know that there's a number in each slot that can be manipulated.

Let's have a go on the test data

Ok, that works and feels really fast, let's try the same on production, first on the part 1, and then on part 2

Done it. I've had better ideas and better implementations.

There's something nagging me about this. It really does feel like it should be solvable in a really simple functional way here, because the result of each generation is simply the same function iterated on the previous generation. The problem for me is working out the stopping condition, and having time and energy to go rewrite the code to look like that.