This is really powerful. As a use case I'm considering the graph of streets in a city. A task would be to make predictions on the next nodes visited by a vehicle given the history of the recent nodes. I'm not sure how you deal with mini batches. Since you need the Adj matrix of the whole graph?
Mini-batching is indeed tricky, as you need the information of the complete local neighborhood for all nodes in a mini-batch. Let's say you select N nodes for a mini-batch, then you would also have to provide all nodes of the k-th order neighborhood for a neural net with k layers (if you want an exact procedure). I'd recommend to do some subsampling in this case, although this is not trivial. We're currently looking into this. Otherwise full-batch gradient descent tends to work very well on most datasets. Datasets up to ~1 million nodes should fit into memory and the training updates should still be quite fast.