A few months ago I came across maximum mean discrepancy as a measure of the difference between distributions. Today I read the term again, had totally forgotten what it means, and had to find a YouTube video to refresh my understanding. This happens a lot in my research. Unless something is really basic (e.g. CNNs, cross-entropy) and used a lot in my day-to-day model building, I easily forget what I have read. Is it just because I have a bad memory, or because I don't have a good way to organize information?
The only papers whose workings I remember are the ones I tried to implement myself.
Yeah, that works for me, but there are too many papers out there.
Try a hyperbolic time chamber.
Or try to teach it to someone else.
By implementing a model without looking at the paper, you essentially perform autoencoding/masked language modelling and learn a more compact latent representation.