Automate Internal Linking

In this post, I'll share my journey developing an AI-powered app that automates the process of naturally injecting internal links into a given blog post. This wouldn't have been possible, or at least not as powerful, without the latest large language models (LLMs). I'm relying on two primary technologies here: semantic clustering and the aforementioned LLMs. Semantic clustering helps us work around the biggest current limitation of LLMs, the context window. Here's a short outline of how it works:

1. Scrape the website's sitemap and retrieve all the pages, including their titles and meta descriptions.
2. Perform semantic clustering on these pages to find the ones most relevant to our target blog post (this shortlists the options so we don't exceed the context limit).
3. Pass the shortlisted pages to an LLM along with the target blog post, and prompt it to inject internal links where relevant.
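
To make the first step of the outline concrete, here is a minimal sketch of pulling page URLs out of a sitemap and reading a page's title and meta description. It is not the code behind Moonlit's nodes: the function names are hypothetical, it works on XML and HTML strings you have already downloaded, and it assumes a standard sitemaps.org `urlset` document.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found in a sitemap XML document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


class _MetaParser(HTMLParser):
    """Collects the <title> text and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_metadata(url: str, html: str) -> dict:
    """Build one table row (url, title, description) from a page's HTML."""
    parser = _MetaParser()
    parser.feed(html)
    return {"url": url, "title": parser.title.strip(), "description": parser.description}
```

Downloading the sitemap and each page (for example with `urllib.request`) is left out so the sketch stays focused on the parsing.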

If you're just interested in the tool itself, you can try it using the link in the Testing & Results section below.

Why Internal Linking Matters:

1. Enhanced User Experience: Internal links provide your readers with a roadmap to more relevant content, keeping them engaged and improving the overall user experience on your blog.

2. Improved SEO: Search engines rely on internal links. They help search engines index your content more effectively, understand the structure of your site, and distribute page authority throughout your website.

3. Reduced Bounce Rates: By offering additional reading material through internal links, you make readers more likely to stay longer on your site, which can reduce bounce rates.

4. Boosted Page Views: Each internal link is an opportunity to increase page views, as readers discover more of your content.

How it works

Imagine you have a website with hundreds of blog posts. Manually reading each post to find relevant sections for internal linking is time-consuming. Here's how our tool simplifies the process.

First, provide the URL of your website's sitemap, which gives us access to all your blog posts, along with the URL of the specific post where you want to add internal links. Utilising Moonlit's SEO tooling, we begin by refining the sitemap URLs through semantic K-means clustering. This process filters out irrelevant URLs, keeping only the pertinent ones.

Next, we use a large language model such as GPT-4, which has proven effective in our experience. The model analyses your target blog post alongside the curated list of relevant posts. It then identifies sentences or sections in the target post where internal links from the list can be seamlessly integrated.
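
To give a feel for the clustering idea, here is a small self-contained sketch of K-means over text vectors. It is an illustration under stated assumptions, not how Moonlit's node is implemented: `embed` is a toy word-count stand-in for a real embedding model, and the centroids are seeded with a simple farthest-point rule so the result is deterministic.

```python
import math
from collections import Counter


def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy stand-in for a real embedding model: L2-normalised word counts."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def init_centroids(vectors: list[list[float]], k: int) -> list[list[float]]:
    """Deterministic seeding: start at the first vector, then repeatedly
    add the vector that is farthest from its nearest chosen centroid."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors: list[list[float]], k: int, iterations: int = 20) -> list[int]:
    """Lloyd's algorithm; returns one cluster label per input vector."""
    centroids = init_centroids(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


texts = ["blue cheese guide", "cheese tasting guide",
         "electric car review", "car battery review"]
vocab = sorted({word for text in texts for word in text.lower().split()})
labels = kmeans([embed(text, vocab) for text in texts], k=2)
```

In this toy run the two cheese posts end up sharing one label and the two car posts the other. With real embeddings, posts are grouped by meaning rather than by shared words.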

App Breakdown

All apps in Moonlit consist of Inputs, Logic, and Outputs. Let's break down what we did for each section.

Inputs

Sitemap URL: This is passed to the 'Extract Sitemap URLs' logic node to fetch the 'url', 'title', and 'description' of each page.
Blogs Prefix: Filters URLs by a specific path segment. If all blog posts live under /blog/{blog_title}, we can set the value to '/blog/' to keep only those.
Blog Post URL: This is the target blog post into which we will inject internal links.
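
As a rough illustration of the Blogs Prefix input, the filter could look like the sketch below. The function name is hypothetical, and matching on the start of the URL path is an assumption; the node may match path segments differently.

```python
from urllib.parse import urlparse


def filter_by_prefix(urls: list[str], prefix: str) -> list[str]:
    """Keep URLs whose path starts with `prefix`; an empty prefix keeps all."""
    return [url for url in urls if urlparse(url).path.startswith(prefix)]


urls = [
    "https://example.com/blog/internal-linking",
    "https://example.com/pricing",
    "https://example.com/blog/seo-basics",
]
blog_urls = filter_by_prefix(urls, "/blog/")
```

Here `blog_urls` keeps the two /blog/ pages and drops the pricing page.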

Logic

Step 1: In the first step of our logic we run two functions in parallel. The Extract Sitemap URLs node returns a table containing each post's url, title, and description. We also run the Scrape Webpage node with the Output Format option set to 'Main Text' to retrieve the post body text.
Step 2: In this step we use a custom Python snippet to make sure that our target blog post URL is included in the table returned by the Extract Sitemap URLs node. A quick caveat here: since we only have the text body and not the meta title and description, I've used the body in place of the description and the URL in place of the title, which shouldn't negatively affect our next step.
Step 3: The K-Means Clustering node is used with 'Text Clustering' ticked, so it converts all the text in our table to vector embeddings. It then computes the distance between them and places the posts into 5 separate buckets, where each bucket contains the posts most relevant to each other.
Step 4: As the last pre-processing step before prompting our large language model, I filter out all the clusters except the one that contains the URL of our target blog post. I also drop the 'cluster' column since it's irrelevant to our prompt.
Step 5: This is where the actual internal linking happens. We pass the List of Relevant Blog Posts along with the Target Blog Post Copy and ask GPT to add the internal links.
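
Steps 2 and 4 are plain table manipulation, and a sketch of them might look like the following. This is not the actual snippet used in the app: the function names are hypothetical, the table is assumed to be a list of dicts, and the 'cluster' value is assumed to be added to each row by the clustering step.

```python
def ensure_target_row(rows: list[dict], target_url: str, body_text: str) -> list[dict]:
    """Step 2: append the target post if the sitemap table lacks it."""
    if any(row["url"] == target_url for row in rows):
        return rows
    # Only the body text is available, so the URL stands in for the title
    # and the body stands in for the description.
    return rows + [{"url": target_url, "title": target_url, "description": body_text}]


def keep_target_cluster(rows: list[dict], target_url: str) -> list[dict]:
    """Step 4: keep rows sharing the target's cluster, then drop 'cluster'."""
    target_cluster = next(row["cluster"] for row in rows if row["url"] == target_url)
    return [
        {key: value for key, value in row.items() if key != "cluster"}
        for row in rows
        if row["cluster"] == target_cluster
    ]


clustered = [
    {"url": "https://example.com/blog/a", "title": "A", "description": "...", "cluster": 2},
    {"url": "https://example.com/blog/b", "title": "B", "description": "...", "cluster": 0},
    {"url": "https://example.com/blog/target", "title": "T", "description": "...", "cluster": 2},
]
shortlist = keep_target_cluster(clustered, "https://example.com/blog/target")
```

In this example `shortlist` holds post A and the target post, both without the 'cluster' key, since post B landed in a different cluster.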

Here is the full prompt used:

List of Blog Posts:

{{custom_python_function_655266}}

Target Blog Post:

{{scrape_webpage_417380}}

You are given a list of blog posts that contains the url, title, and description for each post, you are also provided with the extracted contents of a 'target' blog post. Your task is to find internal linking opportunities in the target blog post using the list of blog posts. While reading the target blog post You must try to find natural ways to inject relevant internal links throughout the target blog post content.

For example if the post is talking about topic X in one of the sections and you notice that there is a post in the provided list of blog posts that is relevant to it, you can add something like:

<a href="https://example.com/blog/x">you can learn more about X here</a>

try to maintain the same tone while making these minor changes.

The links should be spaced out and naturally injected where it is best suited without feeling forced, here is a bad and good example of what I mean.

Bad (too many links next to each other):

I've written about cheese <a href="https://example.com/page1">so</a> <a href="https://example.com/page2">many</a> <a href="https://example.com/page3">times</a> <a href="https://example.com/page4">this</a> <a href="https://example.com/page5">year</a>.

Better (links are spaced out with context):

I've written about cheese so many times this year: who can forget the <a href="https://example.com/blue-cheese-vs-gorgonzola">controversy over blue cheese and gorgonzola</a>, the <a href="https://example.com/worlds-oldest-brie">world's oldest brie</a> piece that won the Cheesiest Research Medal, the epic retelling of <a href="https://example.com/the-lost-cheese">The Lost Cheese</a>, and my personal favourite, <a href="https://example.com/boy-and-his-cheese">A Boy and His Cheese: a story of two unlikely friends</a>.

Of course this is just an example of how you would naturally inject natural link, the blog post has nothing to do with cheese.

Don't force links where they are not relevant, if you can't find any relevant links to inject just respond with saying so. Otherwise your response should be the target post with the internal links injected. Please maintain the exact same body of text but you can change the wording a bit in the sections you want to fit an internal link for a more natural read.
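
The two {{...}} placeholders at the top of the prompt appear to be filled with the outputs of earlier nodes when the app runs. If you wanted to reproduce that substitution outside Moonlit, a minimal sketch could look like this. The helper names are hypothetical, and serialising the table as JSON is an assumption, since the post doesn't say how the table is rendered into the prompt.

```python
import json
import re


def rows_to_text(rows: list[dict]) -> str:
    """Serialise the shortlisted posts; JSON is an assumed format."""
    return json.dumps(rows, indent=2)


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Replace each {{name}} placeholder with its value.
    A placeholder with no matching value raises KeyError."""
    return re.sub(r"\{\{(\w+)\}\}", lambda match: values[match.group(1)], template)


template = "List of Blog Posts:\n\n{{posts}}\n\nTarget Blog Post:\n\n{{target}}"
rows = [{"url": "https://example.com/blog/x", "title": "X", "description": "About X"}]
prompt = render_prompt(template, {"posts": rows_to_text(rows), "target": "Post body text"})
```

Raising on a missing placeholder is a deliberate choice here: a prompt sent with an unfilled {{name}} would be easy to miss and hard to debug.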

Outputs

- Original Post: The original post body, without the added links, for direct comparison.

- Improved Post: The final result, containing the improved post with internal links.

Testing & Results

Now that everything in the 'build' section is complete, we can test the app. I've used Intercom as an example. It took some minor refinements to the prompt and testing with different models until I landed on good results. The model was able to return the same text body with 3 relevant internal links injected throughout.
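
When testing different prompts and models, it can help to check the output mechanically as well as by eye. The sketch below is not part of the app described above; it is one hypothetical way to count the links in the improved post and flag any URL that was not in the shortlisted table, since a model can occasionally invent links.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a piece of HTML or text."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def audit_links(improved_post: str, allowed_urls: set[str]) -> dict:
    """Count injected links and list any that are not in the shortlist."""
    collector = _LinkCollector()
    collector.feed(improved_post)
    unknown = [href for href in collector.hrefs if href not in allowed_urls]
    return {"count": len(collector.hrefs), "unknown": unknown}


improved = (
    'Read about <a href="https://example.com/blog/a">topic A</a> and '
    '<a href="https://example.com/blog/made-up">topic B</a>.'
)
report = audit_links(improved, {"https://example.com/blog/a"})
```

In this example `report` counts two links and lists the made-up URL as unknown.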

You can try it out and see the results here: https://app.moonlitplatform.com/apps/8qeIZfXGmS7CIo0tWvHS

Final Thoughts

This journey into integrating AI for internal linking in blogs has been pretty interesting. With the AI doing the heavy lifting, the process of adding relevant links to your posts becomes simpler and more efficient. It's not just about saving time, but also about adding value to your readers' experience and giving your SEO a little nudge.

In short, this tool is like having a smart assistant who knows your content inside out. It helps you connect the dots within your blog, making sure your readers find the good stuff you've written about, without you having to comb through every post.

Remember, the goal here is to make your blog not only more reader-friendly but also more visible online. It's a simple, yet effective way to keep your content interconnected and interesting. Keep blogging, and let AI handle the links!

Written By

Mohammad Abdin

Mohammad is a passionate tech enthusiast with a Bachelor's degree in Computer Science & Artificial Intelligence. With a rich background in Digital Marketing, he has honed his skills both within dynamic agency environments and as a freelancer, serving a diverse array of clients across various industries. Leveraging his extensive expertise in digital marketing, web development, and artificial intelligence, Mohammad founded Moonlit Platform. His vision is to revolutionize the SEO and content creation landscapes by providing SEO and content specialists with innovative tools and an open, flexible environment, thereby enhancing their ability to implement advanced AI workflows effectively.