Enriching Leads using Large Language Models

Mohammad Abdin
June 1, 2024

Lead enrichment involves gathering and analyzing additional data about potential customers to enhance the information already in your CRM or sales database. This process can significantly improve lead qualification, personalize sales approaches, and increase the efficiency of marketing strategies. However, manually performing lead enrichment poses considerable challenges:

  • Time-consuming: Manually sourcing data requires significant time and effort.
  • Outdated information: Keeping data current manually is challenging, often resulting in outdated information that can negatively impact sales and marketing efforts.
  • Inefficiency: Manual data collection is often inefficient, consuming resources that could be better spent on strategic activities.

The data you collect will depend on:

A. The current information you have.

B. Why you need the data.

At this pivotal moment in our company's journey, we value the opportunity to engage personally with our users. These conversations are crucial for gathering feedback and rapidly refining the Moonlit experience. So in this article we are sharing our own case study: using only email addresses and names, we want to retrieve each user's LinkedIn profile, along with notes about the individual. We'll walk through building the Moonlit app, running it at scale, and customizing it for your own use case.

Building Process

[Figure: Leads Enricher, full workflow]

After setting up two inputs, one for the email and one for the name, we added a custom Python function that preprocesses this information into a Google search query, which we then use to fetch public information about the user.

Our goal is to infer the lead's company from the domain of their email address. If they use a common personal email provider instead of a work email, the search query consists of the prospect's name only.

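import re

# Wrapped in a function so it runs standalone; the original snippet is a
# function body that reads the workflow inputs from the `inputs` dict.
def build_search_query(inputs):
    search_query = inputs.get('Name', "")
    email = inputs.get('Email', "")

    # Capture everything after the "@", e.g. "acme.com" in "jane@acme.com"
    domain_regex = r'@([\w.-]+\.\w+)'
    match = re.search(domain_regex, email)

    # Free email providers say nothing about the lead's company
    common_hostnames = {'gmail', 'hotmail', 'yahoo',
                        'outlook', 'aol', 'comcast',
                        'live', 'msn', 'passport',
                        'ymail', 'icloud', 'mail',
                        'zoho', 'gmx', 'me',
                        'yandex', 'proton'}

    if match:
        # Lowercase so "Jane@Gmail.com" is still recognized as a free provider
        domain = match.group(1).lower()
        # First label of the domain ("acme" in "acme.com") is the company hint
        host = domain.split('.')[0]
        if host not in common_hostnames:
            search_query = search_query + " " + host
    return search_query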

Gathering & Processing Data

We passed the search query returned by the function above to a Google Search function that returns the top 5 results, and then passed these results to an LLM (Large Language Model) to summarize them and structure them properly.
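How the results reach the prompt depends on the search step's output format. As a minimal sketch, assuming each result is a dict with hypothetical `title`, `link`, and `snippet` fields (the Google Search function in Moonlit may return a different shape), the top 5 results could be flattened into the plain-text block that fills the prompt's `{{Search Results}}` slot. The helper name `format_search_results` is also hypothetical.

```python
from typing import TypedDict


class SearchResult(TypedDict):
    """Hypothetical shape of one search result; real field names may differ."""
    title: str
    link: str
    snippet: str


def format_search_results(results: list[SearchResult], top_n: int = 5) -> str:
    """Turn the first `top_n` results into a numbered plain-text block."""
    blocks = []
    for i, result in enumerate(results[:top_n], start=1):
        blocks.append(
            f"{i}. {result['title']}\n"
            f"   URL: {result['link']}\n"
            f"   Snippet: {result['snippet']}"
        )
    # Blank line between results so the model can tell them apart
    return "\n\n".join(blocks)
```

Putting each URL on its own line is meant to make it easy for the model to copy a LinkedIn link verbatim rather than reconstruct it from the title.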

The LLM is tasked with returning the prospect's LinkedIn profile, if found, along with notes about the prospect.

We've used the following prompt:

I programmatically ran a Google search to get the top 5 results for one of my leads. The query used was: "{{Search Query}}"

Here are the results:
{{Search Results}}

Your task is to read each one of these results to try to derive any public information about this lead in a 
structured format. I want you to respond with a valid JSON object containing two keys: `linkedin` and `notes`. 
The `linkedin` value should be the lead's LinkedIn profile URL if it was found in the results; otherwise, set 
it to "n/a". You must make sure that the profile corresponds to the lead. To help you with that, I'll provide you 
with some context on my target audience, so if you notice their LinkedIn summary or headline fits the criteria, then 
it's likely them. The `notes` key should contain a summarized paragraph of everything you found in the Google 
results; that can include their company, their work, or literally anything that can help me personalize my 
communications with them.

# Context (target audience):
The lead is for a B2B SaaS: a no-code platform that allows SEO & content specialists to automate AI workflows 
by building super customizable input/output tools. Hence the target audience includes SEO managers, agency managers, 
SEO consultants, content specialists, and any adjacent roles.

# Example response:

{
  "linkedin": "",
  "notes": "Works at xyz. Is skilled in xyz..."
}

Please do not include any characters before or after the JSON object, as your response will be passed directly to a JSON parser.

Remember to change the `# Context (target audience)` section to describe your own target audience.
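The `{{Search Query}}` and `{{Search Results}}` placeholders in the prompt are filled from the outputs of the earlier steps. For anyone reproducing this step outside Moonlit, here is a minimal sketch of that substitution, assuming the plain `{{Name}}` placeholder syntax used in the prompt; `render_template` is a hypothetical helper, not a Moonlit API.

```python
import re


def render_template(template: str, variables: dict[str, str]) -> str:
    """Replace each {{Name}} placeholder with variables["Name"].

    Unknown placeholders are left untouched so that a misspelled
    variable name is easy to spot in the rendered prompt.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1).strip()
        return variables.get(name, match.group(0))

    # Non-greedy match so "{{A}} and {{B}}" yields two placeholders, not one
    return re.sub(r"\{\{(.*?)\}\}", substitute, template)
```

Passing a function to `re.sub` instead of a replacement string means backslashes inside search snippets are inserted literally rather than interpreted as escape sequences.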

We've specifically prompted the AI to return its results in JSON format so that we can place the LinkedIn profile and the notes in separate outputs.
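Even with that instruction, a model can occasionally wrap the JSON in a code fence or add a stray sentence. A defensive parsing step, sketched below with a hypothetical `parse_enrichment` helper, extracts the outermost `{...}` span, falls back to "n/a" for anything missing or unparseable, and returns one value per output. This is an illustration, not how Moonlit maps JSON keys to outputs internally.

```python
import json


def parse_enrichment(response: str,
                     keys: tuple[str, ...] = ("linkedin", "notes")) -> dict[str, str]:
    """Parse the LLM's JSON reply into one string value per output key."""
    # Take the outermost {...} span to tolerate code fences or extra prose
    start, end = response.find("{"), response.rfind("}")
    data = {}
    if start != -1 and end > start:
        try:
            data = json.loads(response[start:end + 1])
        except json.JSONDecodeError:
            data = {}
    if not isinstance(data, dict):
        data = {}
    # Empty or missing values become "n/a", matching the prompt's convention
    return {key: str(data.get(key) or "n/a") for key in keys}
```

If you extend the prompt to return other social accounts, pass the extra key names in `keys`, e.g. `keys=("linkedin", "notes", "twitter")`.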

[Figure: Enriching Leads, outputs]

Running it at scale

To run this workflow at scale for all of our leads, we used Moonlit's Bulk Run feature, which lets you provide a CSV with thousands of rows of inputs and run the workflow across them in parallel.
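Moonlit's Bulk Run handles this through its interface. Purely to illustrate the idea, here is a minimal sketch of running a workflow over CSV rows in parallel with Python's thread pool. `bulk_run` and `fake_enrich` are hypothetical names, `fake_enrich` stands in for the real search-and-LLM workflow, and the output column names are assumptions.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def bulk_run(
    csv_text: str,
    workflow: Callable[[dict[str, str]], dict[str, str]],
    max_workers: int = 8,
) -> list[dict[str, str]]:
    """Run `workflow` on every CSV row in parallel.

    Each returned row holds the original input columns followed by the
    workflow's output columns (inputs on the left, outputs on the right).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, even if rows finish out of order
        outputs = list(pool.map(workflow, rows))
    return [{**row, **out} for row, out in zip(rows, outputs)]


def fake_enrich(inputs: dict[str, str]) -> dict[str, str]:
    """Hypothetical stand-in for the real search + LLM workflow."""
    return {"LinkedIn": "n/a", "Notes": f"Looked up {inputs['Name']}"}
```

Threads suit this workload because each row spends most of its time waiting on network calls. Note that `pool.map` re-raises the first exception when results are collected, so a production version would catch errors per row.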

After exporting all of our leads to a CSV file, we created a new bulk job.

[Figure: New Bulk Job modal]

This creates a table with the inputs on the left and the outputs on the right.

Starting the job runs the workflow on each row and fills in the output values.

[Figure: Bulk Run table]

Final Thoughts

This concludes our own case study with Moonlit. It saved us days of researching each prospect. We still check each prospect manually before reaching out to make sure the information is correct. We've made this app clonable, so you can access it here, clone it, and make any custom edits. You can add more inputs if you have more information about your prospects to get more accurate results, and you can also extend its functionality to return other social accounts besides LinkedIn.

If you need any help with such use cases, feel free to schedule a call with us and we'll always be happy to help.

About the Author
Mohammad Abdin

Mohammad is a tech enthusiast who holds a Bachelor's degree in Computer Science & Artificial Intelligence. With a rich background in digital marketing, he has honed his skills both in dynamic agency environments and as a freelancer, serving a diverse array of clients across various industries. Drawing on his expertise in digital marketing, web development, and artificial intelligence, Mohammad founded Moonlit Platform to empower content and SEO specialists to use AI in their field to its full potential.