Edition Palladium

How to run LLaMA-13B or OpenChat-8192 on a Single GPU

by Admin
July 14, 2023
in Natural Language Processing


Recently, numerous open-source large language models (LLMs) have been released. These powerful models hold great potential for a wide range of applications. However, one major challenge is the limited resources available for testing them. While platforms like Google Colab Pro make it possible to test models of up to 7B parameters, what options do we have when we want to experiment with even larger models, such as 13B?

In this blog post, we will see how to run the Llama 13B and OpenChat 13B models on a single GPU. Here we are using Google Colab Pro's GPU, which is a T4 with 25 GB of system RAM. Let's go through how to run it step by step.

Step 1:

Install the requirements. You need to install accelerate and transformers from source and make sure you have the latest version of the bitsandbytes library installed (0.39.0 at the time of writing).

!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install sentencepiece
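After the installs finish, it can help to confirm what actually landed in the environment before loading a large model. The sketch below uses only the Python standard library; the helper names (installed_version, version_tuple, meets_minimum) are hypothetical and not part of any of the libraries above, and the version parsing is deliberately simple.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Keep the leading integer of each dotted piece, stopping at the first
    piece that has none: '4.31.0.dev0' becomes (4, 31, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(package, minimum):
    """True if `package` is installed and its version is at least `minimum`."""
    found = installed_version(package)
    if found is None:
        return False
    have, want = version_tuple(found), version_tuple(minimum)
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))  # pad so '0.39' compares equal to '0.39.0'
    want += (0,) * (width - len(want))
    return have >= want


for name in ("bitsandbytes", "transformers", "peft", "accelerate", "sentencepiece"):
    print(name, installed_version(name) or "not installed")
print("bitsandbytes >= 0.39.0:", meets_minimum("bitsandbytes", "0.39.0"))
```

If the last line prints False, re-running the first install command and restarting the Colab runtime is a reasonable first thing to try.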

Step 2:

Our approach relies on quantization, using the bitsandbytes integration in the transformers library. It supports several 4-bit variants, such as NF4 (normalized float 4) and pure FP4. In BitsAndBytesConfig the variant is chosen with the bnb_4bit_quant_type parameter, which defaults to "fp4", so NF4 has to be requested explicitly with "nf4". With 4-bit bitsandbytes, weights are stored in 4 bits, while the computation can still happen in 16 or 32 bits. Any of float16, bfloat16, and float32 can be chosen as the compute dtype.
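To make the idea of "stored in 4 bits, computed in higher precision" concrete, here is a toy sketch in plain Python. It is an assumption-laden simplification: it uses uniform absmax rounding to integer codes in [-7, 7], which is not the NF4 or FP4 code book that bitsandbytes actually implements, and the function names are hypothetical.

```python
def quantize_4bit(weights):
    """Map a block of floats to integer codes in [-7, 7] (small enough for
    4 bits) plus a single float scale shared by the whole block."""
    absmax = max(abs(w) for w in weights)
    scale = absmax / 7 if absmax > 0 else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize_4bit(codes, scale):
    """Recover approximate float weights; this is the form used for computation."""
    return [c * scale for c in codes]


block = [0.12, -0.70, 0.33, 0.05, -0.21, 0.70, -0.48, 0.00]
codes, scale = quantize_4bit(block)
restored = dequantize_4bit(codes, scale)
max_error = max(abs(a - b) for a, b in zip(block, restored))

print("codes:   ", codes)
print("restored:", [round(r, 3) for r in restored])
print("largest rounding error:", max_error)
```

Only the small integer codes and one scale per block need to be kept in memory; the floats are rebuilt when a computation needs them, at the cost of a rounding error of at most half a scale step per weight.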

For faster matrix multiplication and training, we recommend a 16-bit compute dtype; the default is torch.float32. The BitsAndBytesConfig class recently added to transformers lets you change these parameters to suit your requirements.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Store the weights in 4 bits and run the computation in bfloat16
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
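The configuration above leaves the 4-bit variant at its default. As a hedged sketch of how the other options could be spelled out, the helper below only builds the keyword arguments, so it runs without torch or transformers installed. The helper name four_bit_config_kwargs is hypothetical; bnb_4bit_quant_type and bnb_4bit_use_double_quant are BitsAndBytesConfig parameters that the original configuration does not set.

```python
ALLOWED_QUANT_TYPES = ("nf4", "fp4")


def four_bit_config_kwargs(quant_type="nf4", compute_dtype=None, double_quant=False):
    """Build keyword arguments for transformers.BitsAndBytesConfig (4-bit loading)."""
    if quant_type not in ALLOWED_QUANT_TYPES:
        raise ValueError(
            f"quant_type must be one of {ALLOWED_QUANT_TYPES}, got {quant_type!r}"
        )
    kwargs = {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": quant_type,          # "nf4" or "fp4"
        "bnb_4bit_use_double_quant": double_quant,  # also quantize the quantization constants
    }
    if compute_dtype is not None:
        kwargs["bnb_4bit_compute_dtype"] = compute_dtype
    return kwargs


kwargs = four_bit_config_kwargs(quant_type="nf4", double_quant=True)
print(kwargs)

# In the notebook, with torch and transformers installed, this would be used as:
#   import torch
#   from transformers import BitsAndBytesConfig
#   quantization_config = BitsAndBytesConfig(
#       **four_bit_config_kwargs("nf4", compute_dtype=torch.bfloat16, double_quant=True)
#   )
```

Rejecting unknown variant names up front gives a clear error before any multi-gigabyte download starts.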

Step 3:

Once the configuration is in place, the next step is to load the tokenizer and the model. Here we are using the OpenChat model, but you can use any 13B model available on the Hugging Face Model Hub.

If you want to use the Llama 13B model, just change the model_id to "openlm-research/open_llama_13b" and run the steps below again.

model_id = "openchat/openchat_8192"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model_bf16 = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)

Step 4:

Once the model is loaded, it is time to test it. You can provide any input of your choice, and you can raise the "max_new_tokens" parameter to the number of tokens you want to generate.

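# Prompt in question/answer form; "\n" starts the answer on a new line
text = "Q: What is the largest animal?\nA:"
device = "cuda:0"
# Tokenize the prompt and move the tensors to the GPU
inputs = tokenizer(text, return_tensors="pt").to(device)
# Generate up to 35 new tokens and decode them back to text
outputs = model_bf16.generate(**inputs, max_new_tokens=35)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))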
Output:

You can run any 13B model with this quantization technique on a single GPU or on Google Colab Pro.
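A rough way to judge whether a given model is likely to fit is to estimate the memory taken by its weights alone. The sketch below is back-of-the-envelope arithmetic under stated assumptions: "13B" is taken as exactly 13e9 parameters, the T4 is assumed to have its nominal 16 GB of GPU memory, and the overhead from quantization constants, activations, and the generation cache is ignored. The function name is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 13e9   # "13B" taken at face value; real checkpoints differ slightly
T4_MEMORY_GB = 16   # assumption: nominal GPU memory of a T4

for label, bits in (("float32", 32), ("float16 / bfloat16", 16), ("4-bit", 4)):
    needed = weight_memory_gb(NUM_PARAMS, bits)
    verdict = "fits" if needed < T4_MEMORY_GB else "does not fit"
    print(f"{label:>20}: {needed:5.1f} GB of weights -> {verdict} in {T4_MEMORY_GB} GB")
```

Under these assumptions the 16-bit weights need about 26 GB, well beyond the card, while the 4-bit weights need about 6.5 GB, which leaves headroom for the overhead the estimate ignores.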



Copyright © 2023 Rosa Eterna | All Rights Reserved.
