Bring Your Text to Life: Create a Word Cloud Generator with Python

Word clouds offer a quick, engaging way to see which terms dominate a body of text. In this post, we’ll walk through building a Word Cloud Generator with Python and Streamlit that lets users generate unigram and bigram word clouds from their own text.

Tech Stack and Libraries

We will use the following Python libraries:

  • Streamlit → To build an interactive web app
  • NLTK (Natural Language Toolkit) → For text preprocessing
  • WordCloud → To generate word clouds
  • Matplotlib → A WordCloud dependency that supplies its color maps

Steps to Build the Word Cloud Generator

1. Set Up Your Environment

First, install the required libraries using:

pip install streamlit wordcloud nltk matplotlib

Additionally, download NLTK stopwords:

import nltk
nltk.download('stopwords')

2. Preprocess the Text

Before creating a word cloud, we must clean the text by:
✅ Converting to lowercase
✅ Removing punctuation
✅ Tokenizing words
✅ Eliminating stopwords

Instead of using nltk.word_tokenize(), which requires downloading punkt, we keep tokenization simple and split on whitespace with text.split().

from nltk.corpus import stopwords
import string

def preprocess_text(text):
    stop_words = set(stopwords.words('english'))
    text = text.lower().translate(str.maketrans('', '', string.punctuation))
    tokens = [word for word in text.split() if word not in stop_words]
    return tokens
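
To see what each cleaning step does without downloading anything, here is a minimal sketch that mirrors preprocess_text but swaps NLTK's list for a tiny hand-picked stop-word set. The names DEMO_STOP_WORDS and preprocess_demo are made up for this illustration; the real app should keep using stopwords.words('english').

```python
import string

# Assumption: a tiny stand-in for NLTK's English stop-word list
DEMO_STOP_WORDS = {"the", "is", "a", "and", "of"}

def preprocess_demo(text, stop_words=DEMO_STOP_WORDS):
    # Lowercase, then delete every ASCII punctuation character
    cleaned = text.lower().translate(str.maketrans('', '', string.punctuation))
    # Split on whitespace and keep only words that are not stop words
    return [word for word in cleaned.split() if word not in stop_words]

print(preprocess_demo("The cloud is a picture of words, and words matter!"))
# ['cloud', 'picture', 'words', 'words', 'matter']
```

Note that string.punctuation covers ASCII punctuation only, so curly quotes pasted from a web page pass through untouched. If that matters for your text, add those characters to the translation table.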

3. Generate Word Clouds

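We use the WordCloud library to create unigram (single-word) and bigram (two-word) clouds. Each n-gram is counted as one unit and passed to generate_from_frequencies(), because generate() re-tokenizes its input and would split a joined bigram back into separate words.

from collections import Counter
from wordcloud import WordCloud

def generate_wordcloud(tokens, ngram=1):
    # Build n-grams as single strings so a bigram stays one unit in the cloud
    ngrams = [' '.join(tokens[i:i+ngram]) for i in range(len(tokens) - ngram + 1)]
    frequencies = Counter(ngrams)

    # Size each n-gram by how often it occurs
    wordcloud = WordCloud(width=800, height=400, background_color='white')
    return wordcloud.generate_from_frequencies(frequencies)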
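
To make the bigram step concrete, the sketch below pairs each token with its neighbor and counts the pairs using only the standard library. The helper name count_ngrams is hypothetical; a frequency table like this is one possible input for WordCloud's generate_from_frequencies() method.

```python
from collections import Counter

def count_ngrams(tokens, n=2):
    # Slide a window of size n over the tokens and join each window into one string
    ngrams = [' '.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(ngrams)

tokens = ["word", "cloud", "generator", "word", "cloud"]
counts = count_ngrams(tokens, n=2)
print(counts["word cloud"])       # 2
print(counts["cloud generator"])  # 1
```

With fewer than n tokens the result is an empty Counter, so the caller should check for that before trying to draw a cloud.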

4. Build the Streamlit App

We create a simple Streamlit UI to take user input and display the generated word clouds.

import streamlit as st

st.title("WordCloud Generator")
text = st.text_area("Enter text for WordCloud")

if st.button("Generate"):
    processed_text = preprocess_text(text)

    # WordCloud raises ValueError on empty input, and a bigram needs two words
    if len(processed_text) < 2:
        st.warning("Please enter at least two words that are not stopwords.")
    else:
        st.subheader("Unigram WordCloud")
        unigram_wc = generate_wordcloud(processed_text, ngram=1)
        st.image(unigram_wc.to_array(), use_container_width=True)

        st.subheader("Bigram WordCloud")
        bigram_wc = generate_wordcloud(processed_text, ngram=2)
        st.image(bigram_wc.to_array(), use_container_width=True)

5. Deploy the Streamlit App

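Save the code from the steps above in a single file (for example app.py, a name chosen here for illustration) and start it locally with streamlit run app.py. For online access, deploy the same file to a cloud host that supports Streamlit apps.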

Final Thoughts

This project is a great example of how Streamlit simplifies data visualization. With just a few lines of code, we built an interactive Word Cloud generator that can process and display word frequencies dynamically.

Want to try it out? 🚀 Deploy your own and enhance it with features like custom stopwords, color themes, or additional NLP preprocessing!
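
As a starting point for the custom stopwords idea, here is a minimal sketch that merges user-supplied words into an existing stop-word set. The function names and the comma-separated input format are assumptions made for this example; in the app, the raw string could come from an extra Streamlit text field.

```python
def parse_custom_stopwords(raw):
    # Split a comma-separated string, trim spaces, lowercase, drop empty entries
    return {word.strip().lower() for word in raw.split(',') if word.strip()}

def remove_stopwords(tokens, base_stop_words, custom_raw=""):
    # Combine the base list with the user's extra words, then filter the tokens
    stop_words = set(base_stop_words) | parse_custom_stopwords(custom_raw)
    return [token for token in tokens if token not in stop_words]

tokens = ["data", "python", "cloud", "data"]
print(remove_stopwords(tokens, {"the"}, "Data, ,Cloud "))
# ['python']
```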

Let me know in the comments if you have any questions or improvements! 😊

About the Author

Results-driven Data Analyst with expertise in SQL, Power BI, Tableau, and Excel. Proven track record in data extraction, cleaning, and analysis, driving data-driven decisions. Skilled in collaborating with cross-functional teams to enhance data quality and deliver actionable insights.