Site-wide Blogs Topic Clustering

Implement site-wide topic clustering for your blogs using semantic clustering.

Mohammad Abdin


February 9, 2024

In this blog post, I'll show how 'Moonlit' can be used for topic clustering, a useful technique for boosting your SEO efforts. It's one of the more advanced use cases, so I'll go step by step through fetching, clustering, and visually representing blog content using Moonlit.

Topic clustering supports SEO in several ways:

  • Improved Keyword Strategy: 'Moonlit' aids in uncovering new keyword opportunities by identifying clusters, refining your SEO approach.
  • Organised Content: It helps organise content thematically, improving site structure and user navigation, both vital for SEO.
  • Trend Identification: The tool enables you to detect emerging trends within your content, keeping you ahead in your industry.
  • Enhanced User Engagement: By grouping similar content, 'Moonlit' helps in retaining user interest, increasing site engagement.
  • Gap Analysis in Content: It identifies content gaps, guiding you to create focused posts that cater to your audience's needs.
  • Efficient Internal Linking: Understanding clusters with 'Moonlit' assists in developing a more impactful internal linking strategy, a cornerstone of SEO success.

⭐ You can try out the App here

The Final Result of our Topic Clustering App

Setting the Inputs

To kick off our project, we first gather essential user inputs. These inputs play a pivotal role in how our AI app functions:

  • Root URL: This acts as a starting point for fetching the sitemap, typically located at /sitemap.xml. We utilise this to access all links under the domain.
  • Number of Clusters: Indicates how many topic clusters to create for the blog posts. This aids in organising content effectively.
  • Blog Prefix: Helps in filtering out non-blog pages (like contact or legal pages) from the sitemap. For instance, setting this to 'post' targets links formatted as /post/{blog-title}.

These inputs provide the foundation for our app, ensuring it operates with precision tailored to user needs.
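
To make the role of the three inputs concrete, here is a minimal sketch of how they might be represented and used. The names `ClusteringInputs`, `sitemap_url`, and `is_blog_url` are hypothetical and not part of Moonlit; the sketch also assumes the sitemap sits at the conventional /sitemap.xml path.

```python
from dataclasses import dataclass
from urllib.parse import urljoin, urlparse


@dataclass
class ClusteringInputs:
    """Hypothetical container for the three user inputs."""
    root_url: str
    num_clusters: int
    blog_prefix: str


def sitemap_url(root_url: str) -> str:
    # Assumption: the sitemap lives at the conventional /sitemap.xml path.
    return urljoin(root_url.rstrip("/") + "/", "sitemap.xml")


def is_blog_url(url: str, blog_prefix: str) -> bool:
    # True when the first path segment equals the prefix and a slug follows,
    # e.g. /post/{blog-title} for blog_prefix="post".
    segments = [s for s in urlparse(url).path.split("/") if s]
    return len(segments) >= 2 and segments[0] == blog_prefix.strip("/")
```

With a prefix of 'post', a link such as /post/my-title passes the filter, while /contact or a bare /post does not.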

Fetching the Blogs

This step is where the complexity ramps up. We employ a custom Python node, as shown below, to fetch blog posts and structure them into a table with titles, URLs, and content. The process starts with the root URL to access the sitemap, then filters and scrapes content using Python libraries. The outcome is a neatly organised list of blog entries.

The 3 inputs and first logic node
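
For readers who want a feel for what such a node could contain, below is a rough standard-library sketch. It is not the script used in the app: the function names are made up for illustration, it takes the page's visible text as the 'content', and it assumes a flat sitemap (no nested sitemap index).

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.request import urlopen


def extract_links(sitemap_xml: str) -> list:
    # Collect every <loc> value, ignoring the XML namespace prefix.
    root = ET.fromstring(sitemap_xml)
    return [el.text.strip() for el in root.iter()
            if el.tag.rsplit("}", 1)[-1] == "loc" and el.text]


class _PageText(HTMLParser):
    """Collects the <title> and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._current = None  # enclosing title/script/style tag, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "script", "style"):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._current == "title":
            self.title += text
        elif self._current is None:  # skip script and style bodies
            self.chunks.append(text)


def parse_blog(url: str, html: str) -> dict:
    # One table row: title, URL, and content.
    parser = _PageText()
    parser.feed(html)
    return {"title": parser.title, "url": url,
            "content": " ".join(parser.chunks)}


def fetch_blogs(root_url: str, blog_prefix: str) -> list:
    # Network-dependent: downloads the sitemap, then each matching page.
    with urlopen(root_url.rstrip("/") + "/sitemap.xml") as resp:
        links = extract_links(resp.read().decode("utf-8", "replace"))
    rows = []
    for link in links:
        if f"/{blog_prefix}/" in link:
            with urlopen(link) as resp:
                page = resp.read().decode("utf-8", "replace")
            rows.append(parse_blog(link, page))
    return rows
```

The parsing helpers work on plain strings, so they can be tried on a saved sitemap or page without touching the network; only `fetch_blogs` makes requests.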

Semantic Clustering & Grouping

Now comes the magic: semantic clustering! We use a K-means clustering node, making sure to select the 'Text Clustering' option. This transforms the blog text into vectors, which are then clustered.
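
To illustrate the idea of turning text into vectors and running K-means on them, here is a toy, standard-library-only sketch. It is a stand-in, not what the Moonlit node does internally: it uses simple word-count vectors rather than semantic embeddings, and it seeds the centroids with the first k texts so the result is deterministic. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(texts):
    # Word-count vectors over a shared vocabulary, scaled to unit length.
    tokenized = [re.findall(r"[a-z0-9]+", t.lower()) for t in texts]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_texts(texts, k, max_iter=20):
    if not 1 <= k <= len(texts):
        raise ValueError("k must be between 1 and the number of texts")
    vectors = vectorize(texts)
    # Assumption: deterministic start, using the first k vectors as centroids.
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each text goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # no change, so the clustering has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Each returned integer is the cluster assigned to the text at the same position, which is the integer column the later steps work with. On real blog content, embeddings from a language model would capture meaning far better than word counts.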

After clustering, we employ a Group By node to categorise the table by cluster, using the 'Concatenate' option to amalgamate text from each cluster. The result? A table neatly organised into distinct topic clusters.

K Means Semantic Clustering and Grouping by Cluster
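
The grouping step can be pictured with a few lines of plain Python. This is a sketch of the general group-and-concatenate idea, assuming each row is a dict with a 'cluster' key; the function name and the " | " separator are my own choices, not Moonlit's.

```python
def group_by_cluster(rows, separator=" | "):
    # Gather the values of every non-cluster column per cluster.
    grouped = {}
    for row in rows:
        bucket = grouped.setdefault(row["cluster"], {})
        for column, value in row.items():
            if column != "cluster":
                bucket.setdefault(column, []).append(str(value))
    # Concatenate each column's values into one string per cluster.
    return [
        {"cluster": cluster,
         **{col: separator.join(vals) for col, vals in cols.items()}}
        for cluster, cols in sorted(grouped.items())
    ]
```

The output has one row per cluster, with the titles, URLs, and content of its posts joined into single strings.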

Labelling the Clusters

The clusters initially appear as integers, so our next task is to make them meaningful. We use GPT to generate descriptive titles for each cluster. Before this, we remove the 'content' column to streamline the process and avoid token limit issues, focusing on URLs and Blog Titles for insights. This step is crucial for understanding and navigating our clustered content efficiently.

Labelling the clusters using GPT
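
As a rough illustration of dropping the 'content' column before prompting, the sketch below trims each grouped row and appends the slimmed table to the instructions as JSON. The helper names are hypothetical, and serialising the table as JSON is just one convenient choice.

```python
import json


def drop_column(rows, column):
    # Return copies of the rows without the given column.
    return [{k: v for k, v in row.items() if k != column} for row in rows]


def build_label_prompt(instructions, grouped_rows):
    # Only cluster, title, and url reach the model; the long text is left out.
    slim = drop_column(grouped_rows, "content")
    return instructions.rstrip() + "\n\n" + json.dumps(slim, indent=2)
```

Because the concatenated content can run to many thousands of words per cluster, leaving it out keeps the prompt well within the model's token limit.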

Here is the prompt used:

“The table below is the result of performing semantic clustering on a list of blogs. Currently each cluster is indicated by an integer. Your task is to give a meaningful topic (a 1-, 2-, or 3-word phrase) to encompass that collection of blogs. You'll notice that the data has been grouped by cluster and the text in the columns has been concatenated; use the concatenated title to infer the topic. Your response should be a JSON mapping of each cluster to its topic, for example:

{"0": "Walkthrough Guides",
 "1": "Digital Marketing",
 ...continue for all clusters}

Here is the data:”

You can ask it to return the data in whatever format you want. If you want it as a table, change the prompt to tell it to return the data in the same format, just with a cluster title instead of a cluster integer. Also make sure that the ‘Force JSON’ option is ticked in the GPT node so that it outputs valid JSON that a Table Output node can parse.
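
If you keep the JSON mapping format, applying it back to the table is a small step. Here is a minimal sketch, assuming the model's reply is a JSON object keyed by cluster number as in the prompt's example; `apply_labels` is a hypothetical helper, not a Moonlit function.

```python
import json


def apply_labels(rows, gpt_response):
    # Parse the mapping, e.g. {"0": "Walkthrough Guides", "1": "Digital Marketing"}.
    mapping = json.loads(gpt_response)
    labelled = []
    for row in rows:
        key = str(row["cluster"])  # JSON object keys are always strings
        # Fall back to the original number if the model skipped a cluster.
        labelled.append({**row, "cluster": mapping.get(key, key)})
    return labelled
```

The fallback keeps a row usable even when the reply omits one of the clusters.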

Visualisation (Bonus)

For an added touch, we visualise the data. I've crafted a Python function that generates an HTML string, presenting the clusters in a dynamic map diagram. This, coupled with our HTML Output node, brings a visually engaging element to our data representation.

Custom Python Script for generating a dynamic topic map
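
The script itself isn't reproduced here, but the general shape of a function that returns an HTML string can be sketched as follows. This is a much simpler stand-in than a dynamic map diagram: it renders each topic as a collapsible list of links, and the function name and markup are my own assumptions.

```python
from html import escape


def render_topic_map(rows):
    # Group the labelled rows by topic, keeping first-seen order.
    topics = {}
    for row in rows:
        topics.setdefault(row["cluster"], []).append(row)
    parts = ['<div class="topic-map">']
    for topic, posts in topics.items():
        # One collapsible section per topic, with its post count.
        parts.append(
            f"<details open><summary>{escape(str(topic))} ({len(posts)})</summary><ul>"
        )
        for post in posts:
            href = escape(post["url"], quote=True)
            parts.append(f'<li><a href="{href}">{escape(post["title"])}</a></li>')
        parts.append("</ul></details>")
    parts.append("</div>")
    return "\n".join(parts)
```

Escaping titles and URLs matters here because they come from scraped pages and may contain characters such as & or <.
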
Written By

Mohammad Abdin

Mohammad is a full-stack developer, and the founder of Moonlit Platform. He holds a Bachelor's degree in Computer Science & Artificial Intelligence, and is committed to continuous learning and skill enhancement. His journey is marked by a steadfast dedication to developing and delivering exceptional product experiences.

