Word clouds offer a quick, engaging way to see which terms dominate a body of text. In this post, we’ll walk through building a Word Cloud Generator with Python and Streamlit that lets users generate unigram and bigram word clouds on the fly.
Tech Stack and Libraries
We will use the following Python libraries:
- Streamlit → To build an interactive web app
- NLTK (Natural Language Toolkit) → For text preprocessing
- WordCloud → To generate word clouds
- Matplotlib → To display the word cloud images
Steps to Build the Word Cloud Generator
1. Set Up Your Environment
First, install the required libraries using:
pip install streamlit wordcloud nltk
Additionally, download NLTK stopwords:
import nltk
nltk.download('stopwords')
2. Preprocess the Text
Before creating a word cloud, we must clean the text by:
✅ Converting to lowercase
✅ Removing punctuation
✅ Tokenizing words
✅ Eliminating stopwords
Instead of using nltk.word_tokenize(), which requires downloading punkt, we simplify tokenization with text.split().
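from nltk.corpus import stopwords
import string

def preprocess_text(text):
    # Requires the stopwords corpus downloaded in step 1.
    stop_words = set(stopwords.words('english'))
    # Lowercase the text and strip ASCII punctuation.
    text = text.lower().translate(str.maketrans('', '', string.punctuation))
    # Split on whitespace and drop stopwords.
    tokens = [word for word in text.split() if word not in stop_words]
    return tokens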
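To make the cleaning steps concrete, here is a small self-contained sketch of the same pipeline. It is illustrative only: `DEMO_STOP_WORDS` and `preprocess_demo` are hypothetical names, and the tiny stopword set is a stand-in for NLTK's English list so the example runs without any download.

```python
import string

# Hypothetical stand-in for NLTK's English stopword list, kept tiny so the
# example runs without downloading any corpus.
DEMO_STOP_WORDS = {"the", "is", "a", "and", "of", "to"}

def preprocess_demo(text, stop_words=DEMO_STOP_WORDS):
    # Step 1: lowercase everything.
    lowered = text.lower()
    # Step 2: delete ASCII punctuation characters.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    # Steps 3 and 4: split on whitespace, then drop stopwords.
    return [word for word in no_punct.split() if word not in stop_words]

tokens = preprocess_demo("The quick, brown fox is a friend of the lazy dog!")
print(tokens)  # ['quick', 'brown', 'fox', 'friend', 'lazy', 'dog']
```

One limitation worth knowing: stripping punctuation turns a contraction like "don't" into "dont", which may no longer match the stopword list, and `string.punctuation` covers only ASCII characters, so curly quotes pass through unchanged.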
3. Generate Word Clouds
We use the WordCloud library to create unigram (single-word) and bigram (two-word) clouds.
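from collections import Counter
from wordcloud import WordCloud
import matplotlib.pyplot as plt

def generate_wordcloud(tokens, ngram=1):
    # Build the phrases to count: single words, or adjacent word pairs.
    if ngram == 1:
        phrases = tokens
    else:
        phrases = [' '.join(tokens[i:i+2]) for i in range(len(tokens) - 1)]
    # Pass explicit counts so each bigram is drawn as one phrase;
    # generate() would split the joined text back into single words.
    frequencies = Counter(phrases)
    wordcloud = WordCloud(width=800, height=400, background_color='white')
    return wordcloud.generate_from_frequencies(frequencies)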
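To see what a bigram cloud is meant to count, here is a standalone sketch that builds bigram frequencies using only the standard library. The helper name `bigram_frequencies` and the sample tokens are made up for illustration.

```python
from collections import Counter

def bigram_frequencies(tokens):
    """Count adjacent token pairs, keyed by the two words joined with a space."""
    pairs = zip(tokens, tokens[1:])
    return Counter(" ".join(pair) for pair in pairs)

tokens = ["machine", "learning", "models", "need", "machine", "learning", "data"]
freqs = bigram_frequencies(tokens)
print(freqs.most_common(1))  # [('machine learning', 2)]
```

A mapping like this can be handed to WordCloud's `generate_from_frequencies` method, which keeps "machine learning" together as a single phrase in the image.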
4. Build the Streamlit App
We create a simple Streamlit UI to take user input and display the generated word clouds.
import streamlit as st

st.title("WordCloud Generator")
text = st.text_area("Enter text for WordCloud")

if st.button("Generate"):
    processed_text = preprocess_text(text)
    if not processed_text:
        # WordCloud raises an error when it has no words to draw.
        st.warning("Please enter some text containing words other than stopwords.")
    else:
        st.subheader("Unigram WordCloud")
        unigram_wc = generate_wordcloud(processed_text, ngram=1)
        st.image(unigram_wc.to_array(), use_container_width=True)
        if len(processed_text) > 1:
            # A bigram needs at least two tokens.
            st.subheader("Bigram WordCloud")
            bigram_wc = generate_wordcloud(processed_text, ngram=2)
            st.image(bigram_wc.to_array(), use_container_width=True)
5. Deploy the Streamlit App
Live Demo
Final Thoughts
This project is a great example of how Streamlit simplifies data visualization. With just a few lines of code, we built an interactive Word Cloud generator that can process and display word frequencies dynamically.
Want to try it out? 🚀 Deploy your own and enhance it with features like custom stopwords, color themes, or additional NLP preprocessing!
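As a starting point for the custom stopwords idea, here is one possible sketch. It assumes the user types extra stopwords as a comma-separated string; `parse_custom_stopwords` and `remove_stopwords` are hypothetical helper names, not part of the app above.

```python
def parse_custom_stopwords(raw):
    """Turn a comma-separated string into a set of lowercase stopwords."""
    return {word.strip().lower() for word in raw.split(",") if word.strip()}

def remove_stopwords(tokens, base_stop_words, custom_raw=""):
    # Merge the base list with whatever the user typed, then filter.
    stop_words = set(base_stop_words) | parse_custom_stopwords(custom_raw)
    return [token for token in tokens if token not in stop_words]

tokens = ["streamlit", "makes", "data", "apps", "easy"]
print(remove_stopwords(tokens, {"makes"}, "Data, apps ,"))  # ['streamlit', 'easy']
```

In the app, the raw string could come from an extra text input, with the NLTK stopword list as the base set.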
Let me know in the comments if you have any questions or improvements! 😊