Content

Auditing Website Content at Scale with AI

Mohammad Abdin
June 2, 2024
9
min read

Regularly auditing website content is crucial for maintaining the quality and relevance of medium to large websites. However, manually reviewing hundreds of pages can be an incredibly tedious and time-consuming task. Some of the key problems with manual content auditing include:

  • It's a slow and labor-intensive process
  • It's prone to human error and inconsistencies
  • It's difficult to maintain a consistent quality standard across all pages
  • It's challenging to identify and prioritize areas for improvement

In this article, we'll explore how Large Language Models (LLMs) can streamline the content auditing process, and provide a step-by-step guide on automating content auditing using the Moonlit platform.

Content Auditing using traditional language analysis techniques

Traditional methods for analyzing text, such as n-grams and basic Natural Language Processing (NLP) techniques, have been used for content auditing in the past. These approaches can identify keywords, analyze sentiment, and extract basic entities from the text. However, they have limitations when it comes to understanding the context, coherence, and overall quality of the content.

Harnessing the Power of LLMs for Efficient Content Auditing

LLMs, such as GPT-3 and LLaMa, have revolutionized the field of Natural Language Processing. These models are trained on vast amounts of data and can understand and generate human-like text with remarkable accuracy. When applied to content auditing, LLMs can:

  • Understand the context and meaning of the content
  • Assess the relevance and usefulness of the information
  • Identify areas for improvement in terms of clarity, structure, and readability
  • Provide specific suggestions for optimization

Choosing the right model

When selecting an LLM for content auditing at scale, several factors should be considered:

  • Cost: Different LLMs have varying pricing models. It's essential to choose a model that fits your budget, especially when processing hundreds of pages.
  • Speed: The time taken to process each page impacts the overall efficiency of the auditing process. Faster models can significantly reduce the total time required.
  • Quality: The accuracy and reliability of the LLM's output directly influence the effectiveness of the content audit. Models with higher quality output should be preferred.
  • Context Window: The maximum input size that the model can process is crucial. Models with larger context windows can handle longer pages without the need for truncation or splitting.

How to Automate Content Auditing for your website using Moonlit

While LLMs offer a powerful solution for content auditing, implementing them in production at scale presents several challenges:

  • API rate limits can restrict the number of pages that can be processed within a given timeframe
  • The cost of using LLMs can quickly add up when processing a large number of pages
  • Developing a custom solution requires significant upfront development effort

Moonlit provides a no-code platform that simplifies the process of automating content auditing using LLMs. Let's walk through the steps to set up an automated content auditing workflow using Moonlit.

Step 1: Preparing the data

To begin, you'll need a list of page URLs that you want to audit. You can manually create a CSV file with a column named 'loc' containing the URLs, or you can use an online XML sitemap to CSV converter to extract all URLs from your website's sitemap.

Here's an example of how your CSV file might look:

Alternatively, you can use Google Search Console to export a list of your website's pages. To do this:

  1. Go to Google Search Console and select your website
  2. Navigate to the "Pages" tab under "Index"
  3. Click on "Export" to download a CSV file containing your website's pages

Step 2: Building the Workflow

With Moonlit's intuitive no-code App Editor, creating a content auditing workflow is simple. Our app will consist of two main steps:

  1. Scrape the content from a given page URL
  2. Pass the scraped content through an LLM prompt to assess it against Google's "helpful content guidelines"
Content Auditor Tool Underlying Logic

The prompt we'll be using for the LLM has been carefully crafted and tested by AI SEO specialist Jonathan Boshoff:

Your task is to provide a report on a webpage content 
Check if this page meets Google Helpful Content Guidelines.

You must Give specific and actionable examples for how to further improve it. 

Do not make a recommendation if you can not provide a specific example. Recommendations should provide specific 
examples of text on the page. Please be very scrutinizing. You should only pass a page if it is exceptional. Even if 
you pass a page, you must further improve it.

You must exclude recommendations for images or links. Do not assume the page does not have images or links as you 
can not detect them.

If the page meets guidelines, provide specific examples of how it could be further improved. 

Use line breaks and spacing to make output easy to read. Content can always be improved.

Markdown format: Your response output must be in markdown format. Using headings and sub headings, bold, lists, and 
line breaks. This improves reading clarity.

-----
# Page:
{{Page Content}}
-----

# Guidelines:
The guidelines are provided in questions format, don't directly answer these questions but use them for writing your report.

- Does the content provide original information, reporting, research, or analysis?
- Does the content provide a substantial, complete, or comprehensive description of the topic?
- Does the content provide insightful analysis or interesting information that is beyond the obvious?
- Does the main heading or page title provide a descriptive, helpful summary of the content?
- Does the main heading or page title avoid exaggerating or being shocking in nature?
- Is this the sort of page you'd want to bookmark, share with a friend, or recommend?
- Would you expect to see this content in or referenced by a printed magazine, encyclopedia, or book?
- Does the content provide substantial value when compared to other pages in search results?
- Does the content have any spelling or stylistic issues?
- Is the content produced well, or does it appear sloppy or hastily produced?

-----

Please proceed with writing the report in markdown format.

We've also added another LLM step after this responsible for assigning a 1-10 priority value, with the prompt:

Based on this report, can you provide us with a number between 1-10, 1 being the page needs immediate attention and 10 being a perfect page. This will help us with prioritization,
please only respond with the number and nothing more or less.

You can find out more about prompt chaining in Moonlit in this guide. But essentially, we're just ticking the Include Message History box in the first LLM and then referencing it in the Message History field of the second LLM, that does mean the first LLM will output a list of messages, so to only get the report in our output we can use the "dot notation" (ex. {{first_llm.1.content}}) where "1" is the index of the message, index "0" would be our prompt.

Model Choice

Considering the factors mentioned earlier, we opted for LLaMa 3 as our LLM of choice. LLaMa 3 offers a good balance of speed and quality at a reasonable cost, making it suitable for processing a large number of pages. If you're dealing with a smaller set of pages (e.g., less than 20), you might consider using a more powerful model like Claude 3 or GPT-4 for even better results.

Feel free to clone the app into your own Moonlit project and customize it to fit your specific needs and test with different models.

Step 3: Run it at Scale

With our app tested and ready, it's time to process our entire list of pages using Moonlit's "Bulk Runs" feature.

Create a new Job

In the Bulk Runs tab, click on "New Job" and upload the CSV file containing your page URLs. Map the 'loc' column to the corresponding input field in the app.

Start Job

Once your data is loaded, click on "Start Job" to begin the content auditing process. Moonlit will execute the app for each row in your CSV, processing 5 rows simultaneously. You can continue working on other tasks while the job runs in the background. Upon completion, you'll receive an email notification with the results.

Bulk Run Sample

The screenshot above is only for a sample run. To use this data efficiently, you can download the CSV and then sort the pages by priority level (1 being highest priority, and 10 being lowest), then for each page, read the report and try to apply the recommended improvements.

Streamlining Content Audits with Moonlit and LLMs

In this article, we've explored the challenges of manually auditing website content and how LLMs can significantly streamline the process. By leveraging Moonlit's no-code platform, you can easily create and run a content auditing workflow at scale, saving time and resources while ensuring a consistent quality standard across your website.

Ready to take your content auditing to the next level? Sign up for Moonlit today and start automating your content workflows with the power of LLMs!

About the Author
Mohammad Abdin

Mohammad is a tech enthusiast, he boasts a Bachelor's degree in Computer Science & Artificial Intelligence. With a rich background in Digital Marketing, he has honed his skills both within dynamic agency environments and as a freelancer, serving a diverse array of clients across various industries. Leveraging his extensive expertise in digital marketing, web development, and artificial intelligence, Mohammad founded Moonlit Platform with the purpose of empowering content and SEO specialists with the tools for utilizing AI in their field to its full potential.