Academic Paper

PDF-Based Chatbot Development Using LLAMA2 and LangChain: Training and Deployment for Document Interaction
Document Type
Conference
Source
2024 OPJU International Technology Conference (OTCON) on Smart Computing for Innovation and Advancement in Industry 4.0, pp. 1-6, Jun. 2024
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Power, Energy and Industry Applications
Signal Processing and Analysis
Training
Technological innovation
Systems operation
Buildings
Chatbots
Portable document format
Vectors
Natural Language Processing
Vectorizing
Embedding
Large Language Models
Document Chatbot
Language
Abstract
In recent years, chatbots have taken on a growing role in document interaction: they help users navigate large volumes of text and make search faster. This study examines the process of building a PDF-based chatbot that runs on the LLAMA2 model, with a focus on the training and deployment steps. The methodology proceeds in the following order, combining document understanding with response generation. First, structured data is collected by dedicated loaders that handle a variety of formats from a specified location. Next, a text splitter segments the documents into manageable chunks so that they can be processed efficiently. The resulting chunks are then converted to embeddings by a Hugging Face embeddings model, which captures the semantic information needed for later operations. A vector store is then built on a FAISS index to allow the fast similarity search that question answering depends on. The LLAMA2 model, which is designed for conversational AI, is embedded in the chatbot, where it receives user questions and generates relevant responses that take the context into account. A prompt template structures answer generation so that contextual information and user queries are incorporated easily. The chatbot system is built as a RetrievalQA chain that combines LLAMA2 as the language model with the retrieval mechanism of the FAISS vector store. This integration aims for the chatbot (i) to find fitting document extracts and incorporate them into the response and (ii) to return useful responses.
The chatbot is refined through iterative testing, which improves response quality and allows it to answer questions about the document collection thoroughly and interactively.
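
The retrieval steps described in the abstract (splitting, embedding, similarity search, prompt assembly) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for the Hugging Face embeddings model, the brute-force `ToyVectorStore` is a stand-in for a FAISS index, and the final call to the language model is omitted. All function names, the chunk sizes, and the prompt wording are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, overlap=40):
    """Split text into overlapping character windows (hypothetical sizes)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy embedding: sparse term counts. A real system would use a
    learned sentence-embedding model here; this is only a stand-in."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Brute-force similarity search; stands in for a FAISS index."""

    def __init__(self, chunks):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query, k=2):
        query_vec = embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine(query_vec, entry[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:k]]


# Hypothetical prompt wording; the paper's actual template is not given.
PROMPT_TEMPLATE = (
    "Use the context below to answer the question. "
    "If the answer is not in the context, say you do not know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


def build_prompt(store, question, k=2):
    """Retrieve the top-k chunks and fill the prompt template."""
    context = "\n---\n".join(store.search(question, k=k))
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

In a full system the string returned by `build_prompt` would be passed to the language model, which is the role a retrieval question-answering chain plays when it connects the vector store to the model.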