Exploring self-supervised webscraper code generation with LLMs

Using LLMs to fully self-supervise the creation of a webscraper for a given website

poster

Description

The rapid adoption of LLMs in programming has massively sped up the process of writing code. However, most workflows are still largely “human-in-the-loop”, with a programmer running, verifying, and helping to debug generated programs. We want to test the ability of LLMs to fully self-supervise the code generation process, that is, to verify for themselves whether the generated outputs are correct. We test this in the limited context of code generation for website scrapers and achieve a proof of concept that LLMs are able to fully self-supervise the creation of webscrapers, enabling their use as a fully automated, generalized scraping tool.
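
As a rough illustration of what "fully self-supervised" could mean here, the sketch below shows one possible generate, run, verify loop. It is not the project's implementation: the function `self_supervised_scraper`, the `Attempt` record, and the `fake_*` helpers are hypothetical names, and the LLM calls are replaced by simple stand-ins (the "generated code" is just a regular expression) so the example runs without a model or network access.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    code: str              # the scraper produced in this round
    output: Optional[str]  # what it extracted, or None if it crashed
    feedback: str          # the verifier's comment, fed into the next round


def self_supervised_scraper(
    html: str,
    generate: Callable[[str, str], str],
    run: Callable[[str, str], str],
    verify: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Tuple[Optional[str], List[Attempt]]:
    """Generate a scraper, run it, and let a verifier accept or reject it."""
    feedback = ""
    history: List[Attempt] = []
    for _ in range(max_rounds):
        code = generate(html, feedback)
        try:
            output = run(code, html)
        except Exception as exc:  # a crash becomes feedback, not a human task
            feedback = f"scraper raised {exc!r}"
            history.append(Attempt(code, None, feedback))
            continue
        ok, feedback = verify(html, output)
        history.append(Attempt(code, output, feedback))
        if ok:
            return code, history
    return None, history  # no attempt passed verification


# Stand-ins for the model calls (assumptions, for illustration only).
def fake_generate(html: str, feedback: str) -> str:
    # First try a wrong pattern; "repair" it once feedback is available.
    return r"<h2>(.*?)</h2>" if feedback else r"<h3>(.*?)</h3>"


def fake_run(code: str, html: str) -> str:
    return "\n".join(re.findall(code, html))


def fake_verify(html: str, output: str) -> Tuple[bool, str]:
    if output:
        return True, "looks plausible"
    return False, "output was empty"


PAGE = "<h2>First post</h2><h2>Second post</h2>"
code, history = self_supervised_scraper(PAGE, fake_generate, fake_run, fake_verify)
```

In a real setting, `generate` and `verify` would each be backed by an LLM prompt, and `run` would execute the generated program in a sandbox. The point of the sketch is only that no step in the loop requires a human to inspect the output.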