Using LLMs to Validate AI Results: A Case Study in the Domain of Floor Plan Vectorization

Background

AI-based services can save expert workers valuable time when they work with vast amounts of data. While AI produces sufficiently correct results in many cases, there are often instances where it errs, and humans must be kept in the loop to correct these erroneous results. Such potentially erroneous instances are often identified using statistical measures, such as confidence scores provided by the model itself. We aim to use the capabilities of general-purpose and vision-enabled LLMs to support this initial validation.

To digitally reconstruct building models from construction plans, we use computer vision models to segment and vectorize these plans. Automating this task with AI saves a considerable amount of time. Still, in some cases human intervention is necessary to correct or fine-tune the AI results. The correctness of these results can be validated against a set of domain-specific rules and conventions.

Research Goal

We aim to explore and evaluate how well Large Language Models can validate the correctness of AI results, so that potentially erroneous instances can be flagged for human review and correction. To obtain a complete picture of these capabilities, it is necessary to review the existing literature and to perform a technical evaluation in the given domain. This evaluation must cover several dimensions, such as performance compared to human validation, input modalities, and the transparency of results.

Working on this thesis, you will:
  • Identify and design an appropriate method for technically evaluating the capabilities of LLMs to validate AI results in the target domain
  • Implement and conduct the evaluation using the selected method
  • Quantitatively and qualitatively analyze the opportunities and challenges of using LLMs to validate AI results
  • Embed your findings in the existing scientific literature

We look forward to receiving your application if you:
  • are interested in the field of machine learning and human-AI collaboration
  • are highly motivated to work on current real-world problems in a self-organized and goal-oriented manner and bring in your own ideas
  • are open-minded and willing to familiarize yourself with the application domain of the construction industry
  • have very good English skills, as the thesis will be written in English

Details
  • Start: immediately 
  • Duration: 6 months
  • Location: up to you

We offer you a challenging research topic, close supervision, and the opportunity to develop practical and theoretical skills. If you are interested, send your CV, transcript of records, and a brief letter of motivation to sebastian schaefer2 does-not-exist.kit edu.