1.58-bit large language model
A 1.58-bit large language model (1.58-bit LLM, also ternary LLM) is a version of a transformer large language model whose weights take only three values: -1, 0, and +1. In principle, this restriction allows the model to replace costly multiplications with additions and reduces the memory needed to store the weights. Because the end-task performance and perplexity of 1.58-bit LLMs, at least for smaller model sizes (up to 3-4B parameters), are close to those of their "full-precision" (16-bit FP16 or BF16) counterparts, this design allows reaching the same artificial intelligence goals with much lower hardware requirements, latency, and training effort.
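The arithmetic simplification can be seen in a toy sketch (an illustration only, not code from the cited papers): because every weight is -1, 0, or +1, each weight-activation product reduces to an addition, a subtraction, or a skip.

```python
# Toy example: a matrix-vector product with ternary weights in {-1, 0, +1}.
# Each product w * x becomes "add x", "subtract x", or "do nothing",
# so no general-purpose multiplications are needed.

def ternary_matvec(weights, x):
    """weights: rows of -1/0/+1 values; x: list of float activations."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1: add the activation
            elif w == -1:
                acc -= xi      # -1: subtract the activation
            # 0: skip the activation entirely
        out.append(acc)
    return out

# A 2x3 ternary weight matrix applied to a 3-dimensional input.
W = [[1, 0, -1],
     [-1, 1, 0]]
print(ternary_matvec(W, [0.5, -2.0, 3.0]))  # [-2.5, -2.5]
```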
The name comes from the fact that a single trit, the ternary equivalent of a bit that can take the values {-1, 0, 1}, carries log2 3 ≈ 1.58 bits of information. 1.58-bit LLMs are also loosely called 1-bit LLMs, although true 1-bit models with binary weights also exist.
BitNet
In 2024, Ma et al., researchers at Microsoft, declared that their 1.58-bit model BitNet b1.58 is comparable in performance to the 16-bit Llama 2 and opens the era of 1-bit LLMs. The BitNet creators did not use post-training quantization of weights but instead relied on a new BitLinear transform that replaces the nn.Linear layer of the traditional transformer design.
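A minimal PyTorch sketch of a BitLinear-style layer is shown below. It is an illustration only, not the authors' implementation: the class name TernaryLinear is ours, it assumes the absmean weight-quantization rule described in the BitNet b1.58 report, and it omits activation quantization, normalization, and the straight-through estimator needed for training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TernaryLinear(nn.Module):
    """Simplified sketch of a BitLinear-style layer (hypothetical code).
    Weights are stored in full precision and quantized on the fly to
    {-1, 0, +1}; activation quantization and normalization are omitted."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)

    def ternarize(self, w, eps=1e-5):
        # Absmean rule: scale by the mean absolute weight,
        # then round and clip to the three values {-1, 0, +1}.
        gamma = w.abs().mean()
        return (w / (gamma + eps)).round().clamp(-1, 1), gamma

    def forward(self, x):
        w_t, gamma = self.ternarize(self.weight)
        # Rescale by gamma so output magnitudes roughly match nn.Linear.
        return F.linear(x, w_t) * gamma

# Used as a drop-in replacement for nn.Linear (toy dimensions):
layer = TernaryLinear(8, 4)
print(layer(torch.randn(2, 8)).shape)  # torch.Size([2, 4])
```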
In 2025, Microsoft researchers released BitNet b1.58 2B4T, a model with open weights and open inference code, demonstrating performance competitive with full-precision models at 2 billion parameters and 4 trillion training tokens.
Critique
Some researchers point out that the scaling laws of large language models favor low-bit weights only in the case of undertrained models. As the number of training tokens increases, the deficiencies of low-bit quantization become apparent.
Sources
- Ma, Shuming; Wang, Hongyu; Ma, Lingxiao; Wang, Lei; Wang, Wenhui; Huang, Shaohan; Dong, Li; Wang, Ruiping; Xue, Jilong; Wei, Furu (2024-02-27). "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits". arXiv:2402.17764 [cs.CL].
- Ma, Shuming; Wang, Hongyu; Huang, Shaohan; Zhang, Xingxing; Hu, Ying; Song, Ting; Xia, Yan; Wei, Furu (2025). "BitNet b1.58 2B4T Technical Report". arXiv:2504.12285 [cs.CL].
- Friha, Othmane; Amine Ferrag, Mohamed; Kantarci, Burak; Cakmak, Burak; Ozgun, Arda; Ghoualmi-Zine, Nassira (2024). "LLM-Based Edge Intelligence: A Comprehensive Survey on Architectures, Applications, Security and Trustworthiness". IEEE Open Journal of the Communications Society. 5: 5799–5856. doi:10.1109/OJCOMS.2024.3456549. ISSN 2644-125X.
- Hutson, Matthew (2024-05-30). "1-bit LLMs Could Solve AI's Energy Demands". IEEE Spectrum. Retrieved 2025-04-22.
- Huyen, Chip (2024-12-04). AI Engineering. O'Reilly Media, Inc. ISBN 978-1-0981-6627-4. Retrieved 2025-04-22.
- Kumar, Tanishq; Ankner, Zachary; Spector, Benjamin F.; Bordelon, Blake; Muennighoff, Niklas; Paul, Mansheej; Pehlevan, Cengiz; Ré, Christopher; Raghunathan, Aditi (2024). "Scaling Laws for Precision". arXiv:2411.04330 [cs.LG].
- Morales, Jowi (2025-04-17). "Microsoft researchers build 1-bit AI LLM with 2B parameters". Tom's Hardware. Retrieved 2025-04-21.
- Ouyang, Xu; Ge, Tao; Hartvigsen, Thomas; Zhang, Zhisong; Mi, Haitao; Yu, Dong (2024). "Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens". arXiv:2411.17691 [cs.LG].
- Wang, Hongyu; Ma, Shuming; Dong, Li; Huang, Shaohan; Wang, Huaijie; Ma, Lingxiao; Yang, Fan; Wang, Ruiping; Wu, Yi; Wei, Furu (2023). "BitNet: Scaling 1-bit Transformers for Large Language Models". arXiv:2310.11453 [cs.CL].