Sayaka Nishio presented a poster at IEEE IEEM.
2024-12-18
- news
- research

Sayaka Nishio of the Knowledge Mining (Nonaka) Laboratory at Aichi Institute of Technology, a laboratory we collaborate with, presented a poster at the IEEE International Conference on Industrial Engineering and Engineering Management (IEEM).
The research proposes a method that uses an LLM (Llama3) and embedding models to extract research objectives, ML methods, and dataset names from academic papers, and visualizes their interrelationships through network analysis. The approach proved effective, particularly on papers in the financial domain, and pointed to potential applications for research that uses ESG data.
Title: Extraction of Research Objectives, ML Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis
Authors: S. Nishio, H. Nonaka, N. Tsuchiya, A. Migita, Y. Banno (Aichi Institute of Technology), T. Hayashi (The University of Tokyo), H. Sakaji (Hokkaido University), T. Sakumoto (Nagaoka University of Technology), K. Watabe (Saitama University)
Abstract: Machine learning (ML) has become a key tool in many sectors. Selecting the most suitable ML models and datasets for particular applications is essential to the successful use of ML in industry. Nonetheless, this process demands expertise in both the ML techniques and the specific application domain, which can result in a steep learning curve. Consequently, research aimed at automatically extracting research goals, ML techniques, and datasets from academic literature is crucial for streamlining recommendations of appropriate methods. Traditional approaches for extracting information from literature have typically been confined to recognizing entities such as ML models. In response to this limitation, the current study introduces a novel approach that focuses on extracting tasks, ML methodologies, and dataset names from research papers, while also exploring the interconnections between them by leveraging large language models (LLMs), embedding techniques, and network clustering. The proposed approach, using Llama3, demonstrates strong performance, with an F-score above 0.8 in various categories, thereby confirming its effectiveness. Moreover, evaluations on research papers within the financial domain validate the utility of this method, presenting critical insights into the deployment of contemporary datasets, particularly those associated with ESG data.