Bring Your Text to Life: Create a Word Cloud Generator with Python

Word clouds offer a quick, engaging way to see which terms dominate a piece of text. In this post, we’ll walk through building a Word Cloud Generator with Python and Streamlit that produces unigram and bigram word clouds from any text a user pastes in.

Tech Stack and Libraries

We will use the following Python libraries:

Streamlit → To build an interactive web app
NLTK (Natural Language Toolkit) → For text preprocessing
WordCloud → To generate word clouds
Matplotlib → Used by WordCloud for color maps; also handy for previewing clouds outside Streamlit

Steps to Build the Word Cloud Generator

1. Set Up Your Environment

First, install the required libraries using:

pip install streamlit wordcloud nltk matplotlib

Additionally, download NLTK stopwords:

import nltk
nltk.download('stopwords')

2. Preprocess the Text

Before creating a word cloud, we must clean the text by:
✅ Converting to lowercase
✅ Removing punctuation
✅ Tokenizing words
✅ Eliminating stopwords

Instead of using nltk.word_tokenize(), which requires downloading extra tokenizer data (punkt), we keep things simple and split on whitespace with text.split().

from nltk.corpus import stopwords
import string

def preprocess_text(text):
    # Use a set for fast membership checks
    stop_words = set(stopwords.words('english'))
    # Lowercase, then strip ASCII punctuation
    text = text.lower().translate(str.maketrans('', '', string.punctuation))
    # Split on whitespace and drop stopwords
    tokens = [word for word in text.split() if word not in stop_words]
    return tokens
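
To see what each cleaning step does, here is a small self-contained sketch of the same pipeline. It swaps NLTK's list for a tiny hand-written stopword set (DEMO_STOP_WORDS, an assumption made only so the snippet runs without any download), and preprocess_demo is a hypothetical name that is not part of the app.

```python
import string

# Tiny stand-in for NLTK's English stopwords (assumption, for illustration only).
DEMO_STOP_WORDS = {"the", "is", "a", "and", "of", "to", "in"}

def preprocess_demo(text, stop_words=DEMO_STOP_WORDS):
    """Lowercase, strip ASCII punctuation, split on whitespace, drop stopwords."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans('', '', string.punctuation))
    return [word for word in no_punct.split() if word not in stop_words]

sample = "The cloud is a picture of words, and the words tell a story!"
print(preprocess_demo(sample))
# ['cloud', 'picture', 'words', 'words', 'tell', 'story']
```

One thing to keep in mind: because punctuation is stripped before the stopword check, a contraction such as "don't" becomes "dont" and is only filtered out if the stopword list contains that exact form.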

3. Generate Word Clouds

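We use the WordCloud library to create unigram (single-word) and bigram (two-word) clouds. We count the n-grams ourselves and pass the counts to generate_from_frequencies(), so each bigram stays intact as one phrase instead of being split back into single words.

from collections import Counter
from wordcloud import WordCloud

def generate_wordcloud(tokens, ngram=1):
    # Build space-joined n-grams: single words for ngram=1, word pairs for ngram=2
    ngrams = [' '.join(tokens[i:i + ngram]) for i in range(len(tokens) - ngram + 1)]
    # Count each n-gram; the cloud sizes phrases by these counts
    frequencies = Counter(ngrams)

    wordcloud = WordCloud(width=800, height=400, background_color='white')
    return wordcloud.generate_from_frequencies(frequencies)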
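
For the bigram cloud, each two-word phrase needs to be treated as a single unit with its own count. Here is a minimal sketch of that counting step using only the standard library; count_ngrams is a hypothetical helper name, and the token list is made up for illustration.

```python
from collections import Counter

def count_ngrams(tokens, n=2):
    """Return a Counter mapping each space-joined n-gram to its frequency."""
    grams = [' '.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)

tokens = ["word", "cloud", "generator", "word", "cloud", "app"]
bigram_counts = count_ngrams(tokens, n=2)
print(bigram_counts.most_common(1))
# [('word cloud', 2)]
```

A mapping of this shape is what WordCloud's generate_from_frequencies() method accepts, which keeps each phrase together rather than letting the library re-split the text into single words.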

4. Build the Streamlit App

We create a simple Streamlit UI that takes user input and displays the generated word clouds. Since WordCloud raises an error when it has nothing to draw, the app first checks that enough words are left after preprocessing.

import streamlit as st

st.title("WordCloud Generator")
text = st.text_area("Enter text for WordCloud")

if st.button("Generate"):
    processed_text = preprocess_text(text)

    if len(processed_text) < 2:
        # A bigram needs at least two tokens
        st.warning("Please enter at least two words that are not stopwords.")
    else:
        st.subheader("Unigram WordCloud")
        unigram_wc = generate_wordcloud(processed_text, ngram=1)
        st.image(unigram_wc.to_array(), use_container_width=True)

        st.subheader("Bigram WordCloud")
        bigram_wc = generate_wordcloud(processed_text, ngram=2)
        st.image(bigram_wc.to_array(), use_container_width=True)

5. Deploy the Streamlit App

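Save the code in a single script (for example, app.py) and run it locally with streamlit run app.py, or deploy it to a cloud host such as Streamlit Community Cloud for online access.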

Final Thoughts

This project is a great example of how Streamlit simplifies data visualization. With just a few lines of code, we built an interactive Word Cloud generator that can process and display word frequencies dynamically.

Want to try it out? 🚀 Deploy your own and enhance it with features like custom stopwords, color themes, or additional NLP preprocessing!
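
One of those enhancements, custom stopwords, could be sketched as below. This is only an illustration: merge_stop_words is a hypothetical helper, and the base set is a stand-in for NLTK's list.

```python
def merge_stop_words(base_words, custom_text):
    """Combine a base stopword collection with comma-separated user input."""
    # Normalize user entries: trim whitespace, lowercase, skip empty pieces
    custom = {w.strip().lower() for w in custom_text.split(',') if w.strip()}
    return set(base_words) | custom

base = {"the", "is", "a"}  # stand-in for NLTK's list (assumption)
stop_words = merge_stop_words(base, "Python, streamlit ,")
print(sorted(stop_words))
# ['a', 'is', 'python', 'streamlit', 'the']
```

In the app, the comma-separated string could come from an extra st.text_input field, with the merged set used in place of stop_words inside preprocess_text.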

Let me know in the comments if you have any questions or improvements! 😊

Analytix Edge