AI-Enhanced Data Standardisation and Integration: Leveraging Large Language Models to Transform the Construction Sector 

[Figure: AI-DSI graphical abstract]

Funded by:

Construct Innovate Seed Fund 2025

Project start & end date:

September 2025 to September 2026

Collaborators:

  • University of Galway (lead)
  • KOSMOS

Motivation:

This project emerged from strategic initiatives within the KOSMOS digitalisation team to tackle critical data fragmentation in construction workflows. The team identified not only internal obstacles but also broader industry inefficiencies, in particular that valuable historical project data remain inaccessible because of inconsistent formats and terminology.

Motivated by the potential for sector-wide impact, the team initiated an AI-driven approach that leverages large language models (LLMs) to automate and scale data standardisation. The solution aims to enable smarter, future-ready decision-making across the construction domain. While early results are promising, further development is essential, particularly given the variability of the unstructured data collected by multiple teams across different countries, each with its own formatting conventions.

Project summary:

This project will develop and deploy an LLM-powered framework to classify, structure, and standardise construction project data across formats and platforms. Drawing on advanced natural language processing, the system will interpret unstructured inputs, such as material descriptions and legacy cost records, and transform them into AI-ready records described by consistent semantic attributes. The solution will be refined iteratively using expert feedback and validated on real-world pilot datasets provided by the industrial partner, KOSMOS. The expected outcome is a transferable tool that reduces manual data processing and improves digital workflows across the industry. These data are used to estimate project cost and carbon at different project stages. Because such estimates rely on assumptions drawn from historical data, the quality of that data and of its analysis directly affects the decisions made by project stakeholders.

Objectives: 

  • Develop an LLM-based engine for semantic classification of fragmented construction data to achieve interoperability across platforms. 
  • Define a standardised schema aligned with domain-specific ontologies and industry practice to reduce manual data entry and cleaning in target workflows.
  • Enable advanced analytics (e.g. cost and carbon prediction) through AI-ready datasets.

Aligned SDGs:

SDG 8, SDG 9, SDG 11, SDG 13

Contact us:

PI: Dr Yadong Jiang, yadong.jiang@universityofgalway.ie 

Mrs Baran Moradkhani, B.Moradkhani1@universityofgalway.ie