Today, Stability AI released its first Japanese language model (LM), Japanese StableLM Alpha, the best-performing openly available LM created for Japanese speakers.
Japanese StableLM is a 7 billion-parameter general-purpose language model. On a benchmark suite that compares it against four sets of other Japanese LMs, it ranks as the top-performing publicly available Japanese language model.
Japanese StableLM Base Alpha 7B will be released under the Apache License 2.0, which permits commercial use. Japanese StableLM Instruct Alpha 7B is a model created for research purposes and is released exclusively for research use. For details, please refer to the Hugging Face Hub page.
“We are proud of our first big step towards contributing to the Japanese generative AI ecosystem,” said Meng Lee, Project Lead of Japanese StableLM. “We look forward to continuing to create models across several modalities, built specifically to reflect Japanese culture, language and aesthetics.”
Japanese StableLM Base Alpha 7B is trained for text generation using large-scale data sourced mainly from the Web. The training data is predominantly composed of Japanese and English text, with the remaining 2 percent of material in the form of source code.
In addition to open datasets, the training data includes datasets created by Stability AI Japan and datasets created with the cooperation of the Japanese team of the EleutherAI Polyglot project, along with members of Stability AI Japan’s community.
For training, we used software that extends EleutherAI’s GPT-NeoX. The model architecture incorporates recent techniques such as SwiGLU and xPos. A cumulative total of 750 billion tokens were processed across epochs.
The Japanese StableLM Instruct Alpha 7B model is a language model that is additionally tuned to follow user instructions.
Supervised fine-tuning (SFT) was employed for the additional training, using multiple open datasets. As discussed below, SFT also significantly improves the evaluation score measured with lm-evaluation-harness.
To evaluate performance, we tested the model on tasks that include sentence classification, sentence pair classification, question answering, and sentence summarization. We measured performance using EleutherAI’s lm-evaluation-harness benchmark.
Following the convention of the Open LLM Leaderboard, the average of the scores across the eight tasks is used as the overall evaluation of each model. Japanese StableLM Instruct Alpha 7B scored 54.71, which places it far ahead of other Japanese models. Stability AI Japan is also in the process of improving the evaluation methodology for testing these models.
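As a rough illustration of how an overall score of this kind is computed, the sketch below takes the unweighted mean of eight per-task scores. The task names and score values are hypothetical placeholders chosen for the example; they are not the actual benchmark tasks or results, and the equal weighting is an assumption based on the description above.

```python
from statistics import mean

# Hypothetical per-task scores on a 0-100 scale.
# Placeholder names and values for illustration only, NOT actual results.
example_scores = {
    "task_1": 50.0,
    "task_2": 60.0,
    "task_3": 40.0,
    "task_4": 70.0,
    "task_5": 30.0,
    "task_6": 80.0,
    "task_7": 55.0,
    "task_8": 45.0,
}


def overall_score(task_scores: dict[str, float]) -> float:
    """Return the unweighted mean of the per-task scores."""
    if not task_scores:
        raise ValueError("at least one task score is required")
    return mean(list(task_scores.values()))


print(f"{overall_score(example_scores):.2f}")  # prints 53.75
```

Because every task carries equal weight, a large gain on a single task moves the overall score by only one eighth of that gain.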
The models are available on Hugging Face Hub, where they can be used for inference and additional training. For more details, please refer to the Hugging Face Hub pages.
Stability AI is an open access generative AI company working with partners to deliver next-generation infrastructure globally. Headquartered in London with developers around the world, Stability AI’s open philosophy provides new avenues for cutting-edge research in imaging, language, code, audio, video, 3D content, design, biotechnology, and other scientific research. For more information, visit https://stability.ai/.