LoRA is typically understood as adding low-rank updates that align with the dominant singular value subspaces of pretrained weight matrices. That framing makes intuitive sense: the dominant subspaces are where the action is, where most of the model's computation lives. A new paper from Sten Rüdiger and Sebastian Raschka argues this intuition is exactly backwards for knowledge acquisition tasks. Their method, Minor Component Adaptation (MiCA), targets the least significant singular value subspaces instead, and reports up to 5.9x more knowledge acquisition than LoRA while using only 6 to 60 percent of LoRA's parameters.
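For reference, LoRA's update structure is easy to sketch: the delta applied to a frozen weight matrix is the product of two thin matrices, so its rank is bounded by a small r. A minimal NumPy sketch (dimensions and scales are illustrative, not from the paper):

```python
import numpy as np

d_out, d_in, r = 512, 512, 8           # illustrative sizes; rank r << d
W = np.random.randn(d_out, d_in)       # frozen pretrained weight

# LoRA factors. In training, B usually starts at zero so delta_W = 0 at
# initialisation; both are random here purely to illustrate the rank bound.
B = np.random.randn(d_out, r) * 0.01
A = np.random.randn(r, d_in) * 0.01

delta_W = B @ A                        # the trainable low-rank update
W_adapted = W + delta_W                # effective weight at inference

# The update's rank can never exceed r, far below the full matrix rank.
assert np.linalg.matrix_rank(delta_W) <= r
```

Nothing in this parameterisation steers the update toward any particular subspace; where the learned delta ends up landing is exactly what the paper interrogates.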

The reasoning behind the design is cleverly simple. Dominant subspaces encode the model's core pretrained representations: the geometry of language, reasoning patterns, general capabilities. Updating those subspaces during fine-tuning disrupts precisely what makes the model useful, forcing a tradeoff between retaining general ability and absorbing new information. Minor subspaces, by contrast, are low-variance and underutilised. They have headroom to absorb new factual content without colliding with the model's existing representational structure. MiCA uses SVD to identify these minor components and constrains parameter updates to that space.
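The paper's exact construction is not yet public (code is promised on acceptance), but the core idea, restricting updates to the minor singular subspace, can be sketched: SVD the frozen weight, keep the trailing k singular directions, and project any candidate update onto that span. A speculative NumPy reconstruction, not the authors' implementation:

```python
import numpy as np

def minor_subspace_projectors(W, k):
    """Projectors onto the span of W's k smallest singular directions.

    Illustrative reconstruction of the idea only; MiCA's actual
    parameterisation may differ.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)  # S sorted descending
    U_minor = U[:, -k:]       # left singular vectors, smallest singular values
    V_minor = Vt[-k:, :].T    # right singular vectors, smallest singular values
    return U_minor @ U_minor.T, V_minor @ V_minor.T

d, k = 64, 4
W = np.random.randn(d, d)                    # frozen pretrained weight
P_left, P_right = minor_subspace_projectors(W, k)

raw_update = np.random.randn(d, d) * 0.01    # e.g. a gradient step
constrained = P_left @ raw_update @ P_right  # confined to the minor subspace

# The constrained update has rank at most k and is orthogonal to the
# dominant singular directions of W, so it cannot disturb them.
assert np.linalg.matrix_rank(constrained) <= k
```

The contrast with LoRA is that the low-rank budget is spent where the pretrained weight has the least variance, rather than wherever gradient descent happens to push it.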

MiCA is specifically designed for tasks where you need to inject factual knowledge into a model — domain information, company-specific data, encyclopaedic content. This is distinct from style transfer, instruction following, or capability elicitation, where LoRA performs well. The paper's claim is not that MiCA outperforms LoRA universally; it is that LoRA systematically underperforms when the goal is to add new factual knowledge to a model, because it adapts subspaces that are already doing useful work.

If you're working on knowledge injection — RAG augmentation, enterprise domain adaptation, fine-tuning on proprietary corpora — MiCA is worth a look. If 6 percent of LoRA's parameters can deliver several times the knowledge retention, standard LoRA configurations are leaving a lot on the table. The practical question is whether MiCA is stable enough to use in production without extensive tuning. The paper controls for hyperparameter optimisation, so the claimed gains hold under that condition, but real-world fine-tuning pipelines often do not run exhaustive hyperparameter searches.

The paper is brand new and has not yet been through peer review. The 5.9x figure was obtained under optimised hyperparameters. Code and datasets are promised on acceptance. The stronger claim, that MiCA outperforms full fine-tuning on knowledge acquisition, is the one most worth scrutinising when independent reproductions appear. If it holds, the standard recommendation for knowledge injection tasks will need to change.