This is interesting. I recently had the idea of making up an instruction set for bit-strings, then generating a bunch of programs in that instruction set to compress them: a short instruction sequence that reproduces a bit-string gets a high score, a longer one gets a low score.
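A minimal sketch of what I mean, assuming a made-up four-op stack machine (the opcodes and the exact scoring rule here are my own guesses for illustration, not a fixed design): a program emits a bit-string, and it scores well only if it reproduces the target in fewer instructions than the target has bits.

```python
def run(program, max_stack=4096):
    """Interpret a program; return the bit-string it emits."""
    stack, out = [], []
    for op in program:
        if op == "PUSH0":
            stack.append(0)
        elif op == "PUSH1":
            stack.append(1)
        elif op == "DBL" and len(stack) <= max_stack:
            stack = stack + stack   # duplicate the whole stack
        elif op == "EMIT_ALL":
            out.extend(stack)       # flush the stack to the output
            stack = []
    return out

def score(program, target):
    """MDL-style score: instructions saved by describing `target` as
    `program`. Programs that fail to reproduce the target are rejected."""
    if run(program) != target:
        return float("-inf")
    return len(target) - len(program)
```

For example, `score(["PUSH1", "DBL", "DBL", "DBL", "EMIT_ALL"], [1] * 8)` is 3: eight bits described in five instructions, whereas a literal push-and-emit encoding of the same string would score negatively.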
The tricky part is designing the feedback loop that trains the instruction generator, and, as in this paper, I needed to include some non-differentiable stack operations. It's surprisingly hard to find work that combines neural networks and compression algorithms, even though they seem like an obvious fit. It also opens up downstream tasks that aren't possible with plain vector spaces: a network that can compress bit-strings must be encoding non-trivial features of the data set, so it can augment downstream tasks with differentiable compression.
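One standard way around the non-differentiable interpreter is a score-function (REINFORCE) estimator: sample whole programs from the generator, run them through the black-box interpreter, and push gradients only through the log-probabilities of the sampled ops. A toy sketch with a single softmax policy over four hypothetical opcodes (all names here are mine, not from any particular paper):

```python
import math
import random

OPS = ["PUSH0", "PUSH1", "DBL", "EMIT_ALL"]  # hypothetical opcode set

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample_program(logits, length, rng):
    probs = softmax(logits)
    return [rng.choices(OPS, weights=probs)[0] for _ in range(length)]

def reinforce_step(logits, reward_fn, rng, lr=0.1, batch=32, length=5):
    """One score-function gradient step. The interpreter inside `reward_fn`
    can be arbitrary non-differentiable code: gradients flow only through
    log p(op), i.e. d/dlogit_j log softmax(i) = 1[i == j] - probs[j]."""
    programs = [sample_program(logits, length, rng) for _ in range(batch)]
    rewards = [reward_fn(p) for p in programs]
    baseline = sum(rewards) / batch          # variance-reducing baseline
    probs = softmax(logits)
    grads = [0.0] * len(OPS)
    for prog, r in zip(programs, rewards):
        adv = r - baseline
        for op in prog:
            i = OPS.index(op)
            for j in range(len(OPS)):
                grads[j] += adv * ((1.0 if j == i else 0.0) - probs[j])
    return [l + lr * g / batch for l, g in zip(logits, grads)]
```

Any program scorer plugs in as `reward_fn`, e.g. the compression score above; a real version would replace the flat softmax with a sequence model over opcodes, but the gradient estimator is the same.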