
Build an AI Chatbot with Gradio, Hugging Face, Pytorch

Thinking about creating an AI chatbot? Overwhelmed by the number of tools out there purporting to do this? Whether you want to take control and fully customize the chatbot for your particular use case, company, or product, or just want to explore, this step-by-step guide will show you how to get up and running without wasting time.

I’ve been exploring various AI tools for almost a decade now, mostly as a hobby with a few professional POCs mixed in. The technology has been through tons of ups and downs, but the volatility from the flood of everyone trying to get in on the AI game over the last few years has been something different. With the fear of getting left behind driving so many tools to market, the promises have often failed to deliver.

However, thanks to the wonderful work from the folks over at Hugging Face, with their hub of AI models and datasets and their open-source libraries, taking the first steps into the craziness that is the AI world has been a much, much better experience and a little less overwhelming.

And, I’m here to share that knowledge with you. I’ll walk you through creating an AI chatbot backed by whatever model you want from Hugging Face. Then, you’ll be able to demo different models, learn a tiny bit about the AI behind the scenes, and have the foundation to launch from there or take it further and customize to your liking. Let’s get started!

Prereqs

This guide works in python due to the ubiquity of its use in AI/ML/Data Science. You should be aware that there are ways to connect what we’ll build to other languages and frameworks, but those details are for another article.

If you’re already familiar with pip and virtual environments feel free to skip ahead to the next section.

If you’re not familiar, then here’s just a quick primer on pip and virtual environments. 

  • pip is the package manager that’s shipped with modern python3 and will be used in this guide
  • virtual environments are similar to VMs (virtual machines) in that they create an isolated, well, environment for python and dependencies like python packages. They’re not strictly necessary to use, but highly recommended for this. They help protect your application and its dependencies from breaking, as well as protecting external systems and applications that have their own python or package dependencies.

That’s all you need to know and you can skip the next section if you want, otherwise, continue reading for a little deeper explanation of pip and virtual environments in python.

Precursor: Pip and Python virtual environments

pip is python’s package manager; it installs packages from PyPI (the Python Package Index) and is invoked from the command line, which is why the two names often get used interchangeably. It plays the same role as npm/yarn/pnpm and the seemingly 1800 other package managers for Node.js, Maven/Gradle for Java, NuGet for .NET, Composer for PHP, Cargo for Rust, vcpkg/Conan/Xrepo/Hunter for C and C++, CocoaPods for Objective-C/Swift…I’m sure you get the point (and no offense to those not listed, this is just an illustrative example pulled from the top of my head at the time of writing this). conda is also commonly used with python, but for this article we’ll stick with pip as it’s shipped with modern python3.

Much like docker containers or VMs, virtual environments are a way to create an isolated environment for a particular python project. The latest versions of python3 even ship with venv, a tool to create them.

It’s highly recommended to make use of virtual environments for this and when building different applications in python. Virtual environments create an isolated, well, environment which can have its own version of python and python packages. This isolation helps bidirectionally to protect the environment from unintended consequences from changes being made elsewhere as well as unintended consequences externally due to changes within the environment. That was a mouthful of a sentence, but one example that might help illustrate is if you upgrade python without using virtual environments, you might unintentionally break your OS because it relied on some methods that were deprecated in that new version.

And, this isn’t just a made up example as I can confirm from my own personal experience. A lot of Unix-based OS’s (including macOS) have a version of python they depend on and if there happens to be any breaking changes in the new version you change to, it can put your machine in a severely funky state that might require a reinstall to recover from, losing data that wasn’t backed up (again, I can attest to this from personal experience. Lesson learned 😭.).

And, it’s not only python itself; the same goes for any other packages that are installed. Virtual environments also help with reproducibility because you can create a sort of recipe for recreating the environment, ensuring the circumstances are the same when trying your app out on another machine or at a later time.
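One common way to capture that recipe with pip (a quick sketch, assuming your virtual environment is active; requirements.txt is just the conventional file name):

# write the currently installed packages and their versions to a file
pip freeze > requirements.txt

# later, or on another machine, recreate the same setup
pip install -r requirements.txt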

Create a virtual environment

Since python v3.5, venv has been officially recommended by python and is bundled with its distributions.

You can create one with the following command:

* you might need to replace python with python3 depending on your system setup, as python is sometimes reserved for v2 of python for backwards compatibility. You can check using which python, or an equivalent command, and see if it returns a path to python3. For this step, it’s okay to use the python that’s already installed, if there is one. Be careful if it doesn’t point to python3 and you want to alias python to python3, because the OS or other apps might depend on the existing python (though this is less common now that python v2 is no longer supported and we’re well into python v3’s lifecycle).

which python
# OUTPUT:
# OK if... 
# /path/to/bin/python3 or python: aliased to python3
# Wrong python if...
# /path/to/bin/python2 or /path/to/bin/python

# create the virtual environment
python -m venv /path/to/new/virtual/environment

Install required Python packages

There are a few packages that’ll be used by this project, so, after ensuring that the virtual environment is active (if using one), you can install them with pip using the following command:

# activate the virtual environment first (if using one)
# source /path/to/new/virtual/environment/bin/activate

# -U means upgrade packages to their latest versions, and is not strictly necessary
pip install -U torch transformers gradio
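To sanity-check that everything installed correctly (purely optional), you can try importing the packages and printing their versions:

python -c "import torch, transformers, gradio; print(torch.__version__, transformers.__version__, gradio.__version__)"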

Building an app with Gradio

Gradio started back in 2020 to give researchers a way to quickly demo their AI/ML models and has experienced significant growth over the years, reaching over 1 million monthly developers in 2025 using the python library to create all kinds of web apps utilizing AI.

Along with the higher-level, quick ways of building apps for when customization is not as much of a concern, Gradio provides a newer way of building with more control handed over to the developer to trick out their apps, namely the Blocks class. For the sake of speed in this post, let’s stick with the higher-level ChatInterface, which is specifically designed for creating AI chatbots without having to learn a new library or spend a bunch of time tweaking a bunch of different knobs just to get a basic demo to work. Let’s jump in!
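To give a sense of how little is needed, here’s a minimal sketch of a ChatInterface app; the echo-style predict function is just a placeholder until we wire in a real model:

import gradio as gr

# placeholder predict function: ChatInterface passes in the latest message and the chat history
def predict(message, history):
    return f"You said: {message}"

gr.ChatInterface(fn=predict, type="messages").launch()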

‘Predict’ functions when building with Gradio

Gradio interfaces like ChatInterface, and even apps built with Blocks, need a predict function. This function is essentially the heart of your demo; it’s the “AI thing” you want to demo.

Here are a couple of high-level examples to illustrate the idea: 

1. You want to demo a computer vision model capable of doing object detection.

The predict function might be something like:

# this predict function takes in an image and 
# returns a list of objects found using a particular computer vision model
def predict(image):
    return computer_vision_model.detect(image)

So, the demo app might look something like a file upload button for the user to add an image, and a box below where they get back a list of objects found.

2. Or, you want to demo a model capable of transcribing speech found in an audio file.

A predict function for this might look something like:

# this predict function takes in an audio file and
# returns the speech transcription
def predict(audio):
    return automatic_speech_recognition.transcribe(audio)

Your app might have an upload file button, or maybe a textbox to enter a url where an audio file can be pulled from. Then, once the user provides the file, the app runs and returns the automatically transcribed speech.

For our AI chatbot use case, we’ll be using an LLM to respond to messages from the user with a predict function that looks like:

# this predict function takes in a user's prompt and
# returns a response given by an LLM
def predict(prompt):
    return llm.chat(prompt)

These are simplified, high-level examples. In reality, your predict function might be much more complex than this. And, you might have multiple predict functions that depend on the user’s action (e.g., different buttons that perform different actions). But it also might not need to be if you just need a quick demo of what an AI model is capable of!
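One detail worth noting for our chatbot case: gr.ChatInterface actually calls the predict function with two arguments, the new message and the chat history. A sketch of what that looks like (with type="messages", the history is a list of role/content dicts; llm here is still just a placeholder):

def predict(message, history):
    # history holds the previous turns, e.g.,
    # [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello!"}]
    return llm.chat(message)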

How to build a chatbot with Gradio, Hugging Face (and their transformers library).

Example app on GitHub: gradio-huggingface-chatbot.

Create a virtual environment

Highly recommended

Create a python virtual environment with venv or another virtual environment manager like uv or virtualenv

venv

python3 -m venv {path/to/new/virtual/env}

# activate the environment (if not already active)
source {path/to/new/virtual}/bin/activate

uv

# creates environment in .venv
uv venv

# activate the environment
source .venv/bin/activate

virtualenv

# creates environment in ./venv
virtualenv venv

# activate the environment
source venv/bin/activate

Install dependencies

  • transformers: Hugging Face Transformers
  • gradio: Gradio for app building
  • torch: PyTorch
  • (optional) hf_xet: Hugging Face Xet for faster downloads

pip (venv or virtualenv)

# optionally, add hf-xet as well
pip install -U transformers gradio torch

uv

# optionally, add hf-xet as well
# note: uv add expects a uv project (a pyproject.toml); with just a venv, `uv pip install` works instead
uv add transformers gradio torch

Building with Gradio

*A quick note on convention: the gradio library is often imported as gr to make things a little easier when coding or reading the code.

Gradio provides ChatInterface and Chatbot which can be used on their own or combined for further customization. This makes it super easy to create a quick chat interface. 

import gradio as gr


with gr.Blocks(fill_height=True) as demo:
    gr.Markdown("# SMOL Chatbot")

    smol_chatbot = gr.Chatbot(
        type="messages",
        placeholder="""# Hi! I'm Smolly 👋\n 
### 😊 A big brain in a little package. Ask Me Anything""",
        # height="20vh",
        label="smol chatbot",
        min_height="10vh",
        max_height="40vh",
        resizable=True,
        avatar_images=(
            None,
            "https://huggingface.co/front/assets/huggingface_logo-noborder.svg",
        ),
        layout="panel",
        show_copy_all_button=True,
        watermark="built by frontegg",
    )
    smol_chat = gr.ChatInterface(
        # smol_predict is the chatbot's predict function, defined in the sections below
        fn=smol_predict,
        type="messages",
        chatbot=smol_chatbot,
        autofocus=True,
        examples=["What's the smallest model?", "What's an LLM?", "Where is Smallville?"],
    )

# launch the app (needed when running the file with `python your_file.py`)
demo.launch()

Chatbot models (LLMs) on Hugging Face

You can try some of the models out before choosing one with Hugging Face’s chat app.

Search for LLMs using the following filters on their models page:

  • task: Text Generation or Text2Text Generation
    • these are the tasks used for chatbot style AI Agents and only differ in the underlying techniques used for creating the LLM
  • library: Transformers
    • important as we’re using the transformers library


Once you find one you want to try, open its page and copy the model id and name, or “path”, shown at the top of the page (e.g., meta-llama/Llama-3.2-1B-Instruct).

Using high-level Transformers Pipeline

Pipelines in the Transformers library simplify the process by handling a lot of the work behind the scenes, so you don’t need to understand as many of the technical details of AI and LLMs. You only need to give a model id and a task, and even the task can sometimes be automatically inferred depending on the available info from the model.
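For instance, when given a Hugging Face model id without a task, the pipeline will usually infer the task from the model’s info on the Hub; a small sketch using the same model as the example below:

from transformers import pipeline

# no task passed; it gets inferred from the model's metadata on the Hub
generate = pipeline(model="meta-llama/Llama-3.2-1B-Instruct")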

And, if you’re using a model on Hugging Face, then you only need to give it the model path you got in the last step. Using it is then just a matter of calling the pipeline with the user’s prompt or message. For a chat-style response, we’ll want to use a chat template, which is simply a format that includes the role of the entity each message belongs to.

In the pipeline call, use a chat template format for the user’s message: {"role": "user", "content": message}.

For example:

from transformers import pipeline

# use the pipeline utility method to initialize a text generation pipeline with the Llama 3.2 1B Instruct LLM
generate = pipeline("text-generation", model="meta-llama/Llama-3.2-1B-Instruct")

# ...
# once the user enters something into the chat, get the user's prompt, e.g., `user_prompt`
# ...

# call the pipeline with the user's prompt using the chat template
chat_message = [{"role": "user", "content": user_prompt}]
response = generate(chat_message, return_full_text=False)

Setting return_full_text to False tells the pipeline to return only the newly generated text instead of echoing the prompt back with it. This is a convenience to make parsing the response a little easier.

So, if the user enters “Is llama fur soft?”, the response will look something like this:

[{'generated_text': 'Llama fur is generally considered to be relatively soft and warm. Llamas are South American camelids, and their fur is known for its unique characteristics.'}]

and we can use the generated_text field to respond in the chat as the agent.

The response from the text generation pipeline using a chat input format looks something like this:

pipeline(user_chat_message, return_full_text=False)

# returns
[{'generated_text': 'I am a helpful AI assistant named SmolLM, trained by Hugging Face. I am here to assist you in various aspects of life, from personal to professional. Whether you are looking for advice on a specific topic or seeking help with a particular task, I am here to provide guidance and support.'}]

Tip: the max_new_tokens parameter can be used to control the length of the response, but you might end up with sentences that are cut off. You can play around with the value or add something like a length penalty to make it more likely a response ends before reaching the max. The higher the max value, though, the longer it’ll likely take to generate.
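Putting the pieces together, here’s a sketch of what the smol_predict function passed to gr.ChatInterface earlier might look like, parsing the pipeline’s response as shown above. The model id and max_new_tokens value are just illustrative; swap in whichever model you picked:

from transformers import pipeline

# illustrative model choice; any chat-capable text generation model from the Hub works here
generate = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

def smol_predict(message, history):
    # wrap the user's message in the chat template format
    chat_message = [{"role": "user", "content": message}]
    # max_new_tokens caps the response length (see the tip above)
    response = generate(chat_message, max_new_tokens=256, return_full_text=False)
    # pull the generated text out of the pipeline's response
    return response[0]["generated_text"]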

Run the app from the command line (making sure you’ve activated your virtual environment) with python {your_file}.py, or, if you want the app to reload when changes are detected in your file, use the gradio command like this: gradio {your_file}.py. Then, open your browser and navigate to the url shown after running the command.
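In other words, replacing your_file.py with the name of your script:

# run the app once
python your_file.py

# or run it with auto-reload when the file changes
gradio your_file.py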
