University of North Texas


Generative AI: LLM: Knowledge Base: Documents Q&A – Search Systems: PART II
Thuan L Nguyen, PhD

2: Generative AI: LLM: Knowledge Base: QA & Search
AI Deep Learning (Source: mindovermachines.com)

3: Artificial Intelligence: Generative AI – What Is It?
Generative AI is a category of artificial intelligence focused on using deep learning models to generate new content, including text, images, audio, video, and more. The content is novel but looks realistic and may be indistinguishable from human-created work.

4: Artificial Intelligence: Generative AI: LLM – What Is It?
Generative AI is built on NLP technologies such as Natural Language Understanding (NLU) and Conversational AI (AI dialogues), which are among the most challenging tasks AI needs to solve.

5: Artificial Intelligence: Generative AI: LLMs – Large Language Models
Large Language Models (LLMs) are revolutionary deep learning neural networks that excel in natural language understanding (NLU) and content generation.
• "LARGE" refers to the vast scale of data and parameters used to train them, allowing LLMs to develop a comprehensive understanding of language.
• They are typically transformer-based models trained on massive text datasets using deep learning techniques.
• They are able to learn complex language patterns, capture nuances like grammar and tone, and generate coherent and contextually relevant text.

7: Q&A – Search Systems: Overview
This project aims to develop a knowledge-based Question-Answer and Search System. The project is done using the Cloud Integrated Development Environment System (CIDES) provided by Google Cloud Platform (GCP) Vertex AI services. The user can design, build, and test generative AI applications using CIDES Vertex AI, taking advantage of its rich features and ample resources, especially the integration of GCP CIDES Vertex AI with the LangChain generative AI platform, in which LangChain's Retrieval-Augmented Generation libraries are tightly merged with the advanced vector-embedding matching techniques of GCP Vertex AI.
Source: Thuan L Nguyen - AI generated images using Google DeepMind Imagen 2

8: Q&A – Search Systems: Overview
It is assumed that each group is an AI system development team in a business organization. With the explosion in popularity and widespread use of generative AI in real-world management and business activities, the corporation's leaders want the team to develop a generative AI system that company employees can use to perform content searches, ask questions, and get answers about the contents of the organization's proprietary documents.
Source: Thuan L Nguyen - AI generated images using Google DeepMind Imagen 2

9: Q&A – Search Systems: Overview
The team will adopt GCP CIDES Vertex AI to design, build, and test the system throughout the project, using cloud storage, vector embeddings generation, vector database management, and advanced vector search technologies, among others.
For development, the group will use Python, with Google Colaboratory (Colab) as the coding IDE. The group also plans to use popular generative AI techniques, including but not limited to Retrieval Augmented Generation (RAG), Sentence Transformers, and tools provided by generative AI platforms such as LangChain and Hugging Face.
Source: Thuan L Nguyen - AI generated images using Google DeepMind Imagen 2

10: Q&A – Search Systems
Phase II: Develop and Test the Q&A – Search System
Source: Thuan L Nguyen - AI generated images using Google DeepMind Imagen 2

11: Phase II - Step 1: Matching Engine Index and Endpoint: Specify Parameters
To perform Q&A search tasks, the system uses GCP Vertex AI Matching Engine (ME), a powerful vector-embedding matching system. To use Matching Engine, a Matching Engine index of the knowledge base and an ME endpoint must be created, both of which require several important parameters. Use the following code to set the values of the required parameters.

12: Phase II - Step 2: Matching Engine Index and Endpoint: Create GCP Cloud Storage Buckets
GCP Cloud Storage buckets must be created to store the Matching Engine index. Use the following code to create the required buckets.
IMPORTANT NOTES:
--) When rerunning the code of the project from the start, the above buckets must be deleted, OR new buckets for the Matching Engine index must be created with new names, different from the currently used ones.
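Steps 1-2 can be sketched as below. This is a minimal illustration, not the course's official code: the project ID, region, bucket prefix, and embedding dimensionality are placeholder assumptions, and the bucket-name helper simply encodes the rerun note above (new names on each rerun) by appending a timestamp.

```python
import re
from datetime import datetime

# --- Step 1: parameters for the Matching Engine index and endpoint ---
# All values are placeholders; substitute your own GCP project ID,
# region, and embedding size (768 for the Vertex AI Gecko text models).
PROJECT_ID = "my-gcp-project"      # hypothetical project ID
REGION = "us-central1"
ME_DIMENSIONS = 768                # embedding vector dimensionality
ME_DISPLAY_NAME = "qa_search_index"

def unique_bucket_name(prefix: str) -> str:
    """Build a globally unique, GCS-legal bucket name.

    Rerunning the project requires a NEW bucket name (see the
    IMPORTANT NOTES above), so a timestamp suffix is appended.
    """
    stamp = datetime.utcnow().strftime("%Y%m%d%H%M%S")
    name = f"{prefix}-{stamp}".lower()
    # GCS bucket names: lowercase letters, digits, dashes, max 63 chars.
    return re.sub(r"[^a-z0-9-]", "-", name)[:63]

ME_INDEX_BUCKET = unique_bucket_name(f"{PROJECT_ID}-me-index")

def create_bucket(bucket_name: str, project: str, region: str):
    """Step 2: create the Cloud Storage bucket for the index files."""
    # Imported inside the function so the pure helpers above can run
    # without the google-cloud-storage package or GCP credentials.
    from google.cloud import storage
    client = storage.Client(project=project)
    return client.create_bucket(bucket_name, location=region)

# Usage (requires an authenticated Colab session):
# bucket = create_bucket(ME_INDEX_BUCKET, PROJECT_ID, REGION)
```

Because the timestamp changes on every run, rerunning the notebook from the start automatically satisfies the "different from the currently used names" requirement.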
13: Phase II - Step 3: Initialize the Newly Created Matching Engine Index Folder with a Dummy Embeddings File
The newly created Cloud Storage buckets used to store the Matching Engine index must be initialized with a dummy embeddings JSON file. Use the following code to create the dummy file and save it into the buckets.

14: Phase II - Step 4: Create the Matching Engine
An instance of the GCP Vertex AI Matching Engine (ME), now called Vector Search Engine (VSE), must be created and used for the Q&A search tasks. Use the following code to create the instance.

15: Phase II - Step 5: Create the Matching Engine Index
The ME (or VSE) index is one of the most important components of the ME or VSE system. These indexes are specialized data structures used to store vector embeddings. They are optimized for fast, efficient similarity search, especially for the GCP proprietary Approximate Nearest Neighbor (ANN) vector search algorithm. To perform Q&A search on the knowledge base of documents, the Matching Engine index must be created. Use the following code to invoke a Matching Engine method that creates the index.
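A sketch of Steps 3-5 follows. It assumes the bucket from the previous step; the record id, neighbor count, and distance measure are illustrative choices, and the SDK import is local so the JSON helper can run without GCP installed.

```python
import json

# --- Step 3: dummy embeddings record to initialize the index folder ---
# The index folder must contain at least one JSON-lines record before
# the index is created; a single zero-vector placeholder is enough.
DIMENSIONS = 768

def dummy_embedding_record(dimensions: int = DIMENSIONS) -> str:
    """Return one JSON-lines record with a placeholder embedding."""
    record = {"id": "dummy", "embedding": [0.0] * dimensions}
    return json.dumps(record)

# Write the record locally, then copy it into the bucket, e.g.:
#   with open("embeddings_0.json", "w") as f:
#       f.write(dummy_embedding_record())
#   !gsutil cp embeddings_0.json gs://YOUR_ME_INDEX_BUCKET/init_index/

# --- Steps 4-5: create the Matching Engine (Vector Search) index ---
def create_me_index(project: str, region: str, bucket: str,
                    display_name: str = "qa_search_index"):
    """Create a Tree-AH (approximate nearest neighbor) index."""
    from google.cloud import aiplatform
    aiplatform.init(project=project, location=region)
    return aiplatform.MatchingEngineIndex.create_tree_ah_index(
        display_name=display_name,
        contents_delta_uri=f"gs://{bucket}/init_index",
        dimensions=DIMENSIONS,
        approximate_neighbors_count=150,          # illustrative value
        distance_measure_type="DOT_PRODUCT_DISTANCE",
    )
```

Index creation is a long-running operation; the call blocks until the index resource exists.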
17: Phase II - Step 6: Deploy the ME Index to an ME Endpoint
ME (or VSE) endpoints serve as the access points for interacting with the Matching Engine instance.
• Endpoints accept queries containing vector embeddings and return the most similar items from the indexed data.
• Endpoints can be private, accessible only within your GCP project, or public, allowing external applications to interact with your Matching Engine.
IMPORTANT NOTES:
--) This step can be fast, or it can take some time - several minutes or even longer - to complete.
--) Keep the computer and the monitor actively running, not hibernating or sleeping, while the system is running to avoid disruption.

18: Phase II - Step 6 (continued): Use the following code to deploy the ME (or VSE) index to the ME (or VSE) endpoint.

19: Phase II - Step 7: Load PDFs as "Documents" Using LangChain Document Loaders
To process the PDFs, the knowledge-base files are first loaded and their metadata are created.

20: Phase II - Step 8: Verify the Newly Created Metadata
After creating metadata for the knowledge-base PDFs, use the following code to verify the metadata of the first PDF in the knowledge base.
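Steps 6-8 might look like the sketch below. The endpoint display name, deployed-index ID, folder path, and metadata schema are assumptions for illustration; the cloud and LangChain imports are local so the pure metadata helper runs anywhere.

```python
def deploy_index(project: str, region: str, index,
                 deployed_id: str = "qa_search_deployed"):
    """Step 6: create an ME endpoint and deploy the index to it.

    Deployment may take several minutes; keep the session awake.
    """
    from google.cloud import aiplatform
    aiplatform.init(project=project, location=region)
    endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
        display_name="qa_search_endpoint",
        public_endpoint_enabled=True,  # set False for a private endpoint
    )
    endpoint.deploy_index(index=index, deployed_index_id=deployed_id)
    return endpoint

def make_metadata(filename: str, page: int) -> dict:
    """Illustrative per-page metadata attached to each Document."""
    return {"source": filename, "page": page}

def load_pdfs(folder: str):
    """Step 7: load every PDF in `folder` as LangChain Documents."""
    from pathlib import Path
    from langchain_community.document_loaders import PyPDFLoader
    documents = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        for doc in PyPDFLoader(str(pdf_path)).load():
            doc.metadata.update(
                make_metadata(pdf_path.name, doc.metadata.get("page", 0)))
            documents.append(doc)
    return documents

# Step 8: verify the metadata of the first PDF, e.g.:
# documents = load_pdfs("./knowledge_base")
# print(documents[0].metadata)
```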
21: Phase II - Step 9: Recursively Split by Character (Chunk Documents)
To process the data, each file needs to be chunked, i.e., split into smaller pieces, to fit within the context window of the large language model (LLM). In this project, Gemini 1.0 Pro is used; its context window is 32,000 tokens. Use the following code to recursively split the documents.

22: Phase II - Step 10: Verify the Split Data
After splitting the PDFs, use the following code to verify the split data for the first document.

23: Phase II - Step 11: Create an Instance of the Matching Engine (or Vector Search Engine) Configured as a Vector Database (GCP Vector Store): Get the ME Index and ME Endpoint
To perform Q&A search tasks efficiently with vector search features, the vector embeddings are stored in Vector Store, the GCP proprietary vector database. First, use the following code to get the Matching Engine (ME) / Vector Search Engine (VSE) index ID and endpoint ID.

25: Phase II - Step 12: Create an Instance of the Matching Engine (or Vector Search Engine) Configured as a Vector Database (GCP Vector Store)
One advanced and powerful feature of GCP Vertex AI is configuring the Matching Engine instance as a vector database, i.e., a GCP Vector Store. Next, use the following code to create an instance of the Matching Engine (ME) or Vector Search Engine (VSE) with embeddings, configured as a GCP Vector Store.
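Steps 9, 11, and 12 can be sketched as follows. The chunk size, overlap, 4-chars-per-token rule of thumb, and embedding model name are assumptions; `MatchingEngine.from_components` is the LangChain community wrapper around a deployed index and endpoint, whose numeric resource IDs come from Step 11.

```python
# --- Step 9: split documents into chunks that fit the LLM context ---
# Gemini 1.0 Pro's context window is about 32,000 tokens; chunks are
# kept far smaller so several retrieved chunks plus the question fit.
def estimate_chunk_chars(max_tokens: int = 500,
                         chars_per_token: int = 4) -> int:
    """Rough chunk size in characters (4 chars/token is a common
    rule of thumb, not an exact tokenizer measurement)."""
    return max_tokens * chars_per_token

def split_documents(documents, chunk_chars: int = 1000, overlap: int = 100):
    """Recursively split LangChain Documents into overlapping chunks."""
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_chars, chunk_overlap=overlap)
    return splitter.split_documents(documents)

# --- Steps 11-12: wrap the deployed index as a LangChain vector store ---
def make_vector_store(project: str, region: str, bucket: str,
                      index_id: str, endpoint_id: str):
    """Configure Matching Engine as a GCP Vector Store for LangChain.

    index_id / endpoint_id are the numeric resource IDs fetched from
    the deployed Matching Engine index and endpoint (Step 11).
    """
    from langchain_google_vertexai import VertexAIEmbeddings
    from langchain_community.vectorstores import MatchingEngine
    embeddings = VertexAIEmbeddings(model_name="textembedding-gecko@003")
    return MatchingEngine.from_components(
        project_id=project,
        region=region,
        gcs_bucket_name=bucket,
        embedding=embeddings,
        index_id=index_id,
        endpoint_id=endpoint_id,
    )
```

Verifying the split data (Step 10) is then just inspecting the first few chunks, e.g. `print(chunks[0].page_content[:200], chunks[0].metadata)`.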
24: Phase II - Step 13: Create Metadata for Each Document Chunk
Documents are split into smaller chunks that are transformed into vector embeddings using the GCP Vertex AI Embeddings API. The embeddings are then added to the Matching Engine (ME) / Vector Search Engine (VSE) index. First, use the following code to prepare the text and metadata for each document chunk before adding their vector embeddings to the index.

26: Phase II - Step 14: Add Documents as Embeddings to the Matching Engine (or Vector Search Engine) Index
Next, use the following code to add the vector embeddings of each document chunk to the Matching Engine (ME) / Vector Search Engine (VSE) index.
IMPORTANT NOTES:
--) This step may take some time - several minutes or even longer - to complete.
--) Keep the computer and the monitor actively running, not hibernating or sleeping, while the system is running to avoid disruption.

27: Phase II - Step 15: Test Semantic Search Using the GCP Vertex AI Matching Engine / Vector Search Engine
Run tests to verify that semantic search with the Matching Engine (ME) / Vector Search Engine (VSE) works correctly. Use the following code to run Test 1 of the verification.
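Steps 13-15 can be sketched as below. The metadata schema (source file plus chunk number) and the sample query are illustrative; `vector_store` is the LangChain Matching Engine wrapper from Step 12.

```python
# --- Step 13: prepare text and metadata for each chunk ---
def chunk_payload(chunks):
    """Return parallel lists of chunk texts and metadata dicts.

    The schema here is illustrative; any JSON-serializable metadata
    the team needs can be attached instead.
    """
    texts, metadatas = [], []
    for i, chunk in enumerate(chunks):
        texts.append(chunk.page_content)
        metadatas.append({
            "source": chunk.metadata.get("source", "unknown"),
            "chunk": i,
        })
    return texts, metadatas

# --- Step 14: embed the chunks and add them to the index ---
# This can take several minutes; keep the Colab session awake.
# texts, metadatas = chunk_payload(doc_splits)
# vector_store.add_texts(texts=texts, metadatas=metadatas)

# --- Step 15: test semantic search against the index ---
# results = vector_store.similarity_search(
#     "What is Retrieval Augmented Generation?", k=3)
# for doc in results:
#     print(doc.metadata["source"], "->", doc.page_content[:120])
```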
28: Phase II - Step 15 (continued): The Q&A search results of Test 1.

29: Phase II - Step 16: Test Semantic Search Using the GCP Vertex AI Matching Engine / Vector Search Engine
Use the following code to run Test 2 of the verification.

30: Phase II - Step 16 (continued): The Q&A search results of Test 2.

Generative AI: LLM: Knowledge Base: Documents Q&A – Search Systems: PART III
Thuan L Nguyen, PhD
10: Q&A – Search Systems
Phase III: Clean Up (Remove and Delete)
Source: Thuan L Nguyen - AI generated images using Google DeepMind Imagen 2

11: Phase III - Step 1: Set the Flag and Get the ME Index ID and ME Endpoint ID
After running all the steps, it is necessary to delete all the resources the system required while running, to avoid unnecessary charges. Use the following code to set the flag and get the Matching Engine index ID and ME endpoint ID.

12: Phase III - Step 2: Delete the ME Endpoint
Use the following code to delete the Matching Engine endpoint.

13: Phase III - Step 3: Delete the ME Index
Use the following code to delete the Matching Engine index.

14: Phase III - Step 4: Delete the Cloud Storage Bucket Contents
Use the following code to delete the contents of the GCP Cloud Storage buckets used to store items created during the system's execution.

15: Q&A – Search Systems
Source: Thuan L Nguyen - AI generated images using Google DeepMind Imagen 2

ADTA 5760: Generative AI with Large Language Models
Thuan L Nguyen, PhD
Final Project

1.
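Phase III Steps 1-4 can be sketched together as below. The `CLEANUP_FLAG` mirrors the "set flag" step so the cell is harmless unless explicitly armed; all resource IDs and the bucket name are placeholders.

```python
# --- Phase III: delete resources to stop unnecessary charges ---
CLEANUP_FLAG = False  # Step 1: flip to True to arm the cleanup

def clean_up(project: str, region: str, index_id: str, endpoint_id: str,
             bucket_name: str):
    """Undeploy and delete the endpoint, index, and bucket contents."""
    if not CLEANUP_FLAG:
        return "cleanup skipped (CLEANUP_FLAG is False)"
    from google.cloud import aiplatform, storage
    aiplatform.init(project=project, location=region)
    # Step 2: an endpoint must be emptied before it can be deleted.
    endpoint = aiplatform.MatchingEngineIndexEndpoint(endpoint_id)
    endpoint.undeploy_all()
    endpoint.delete()
    # Step 3: delete the Matching Engine index itself.
    aiplatform.MatchingEngineIndex(index_id).delete()
    # Step 4: delete every object in the project's storage bucket.
    bucket = storage.Client(project=project).bucket(bucket_name)
    for blob in bucket.list_blobs():
        blob.delete()
    return "cleanup complete"
```

Keeping the flag separate from the deletion code means the notebook can be rerun end-to-end without accidentally destroying a live index.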
Overview
The final project covers all the topics discussed during the course. The materials posted in any format for the class activities should be considered and used for the project. Additionally, the student can use any other source of information that he/she can gather.
IMPORTANT NOTES:
--) The student should present his/her work for each section using text and images.
--) The sources can be class lectures, assignments, and more, or other sources.
--) One picture is worth a thousand words; however, an image with text explaining what it is and what it is for is considered complete and much more convincing.
--) Images can include screenshots the student has taken while working on the classwork.
--) When discussing a topic or answering a question, the student must provide an adequate explanation and supporting details in his/her presentation.
--) If MS Word (.docx) is the document format required for submission, the student must submit the contents as .docx files, not PDF documents.
--) However, before submitting, the student should make a backup copy of the documents by converting them into PDF files that could be used for re-submission if the submitted file were corrupted.

2. Final Project: Assignment Format
The final project is assigned as a team assignment, which means that all the group's student members will collaborate while working on it. However, each student must write and submit his/her report independently. In other words, a student works on the assignment with the team but writes and submits the report as if he/she had worked independently.

3. Final Project: Overview
Each group is assumed to be an AI system development team in a business organization.
With the explosion in popularity and widespread use of generative AI in real-world management and business activities, the corporation's leaders want the team to develop a generative AI system that company employees can use to perform content searches, ask questions, and get answers about the contents of the organization's proprietary documents.
The team will adopt Google Cloud Platform (GCP) Vertex AI services as the primary system Integrated Development Environment (IDE) to design, build, and test the system throughout the project, including but not limited to cloud storage, vector embeddings generation, vector database management, and advanced vector search technologies. For development, the group will use Python, with Google Colaboratory (Colab) as the coding IDE. The group also plans to use popular generative AI techniques, including but not limited to Retrieval Augmented Generation (RAG), Sentence Transformers, and tools provided by generative AI platforms like LangChain and Hugging Face. The team plans to use Gemini 1.0 Pro and the latest versions of both Python and LangChain in the system development process.

4. PART I: Create a Google Colab Account (5 Points)
TO-DO: Individual students
• Each student must create a free account with Alphabet/Google Colaboratory.
TO-DO: Groups
• Each group must create a paid PRO account with Alphabet/Google Colaboratory.

5. PART II: Seminar: Journey of Generative AI – From Boltzmann Distribution and Markov Chain to Large Language Model (15 Points)
5.1 Overview
A seminar on the topic "The Journey of Generative AI: From Boltzmann Distribution and Markov Chain to Large Language Model" is scheduled for Thursday, 04/25/2024.
• Time: 6:30 PM – 8:30 PM
• Date: 04/25/2024
• Location: FRISCO Campus - FRLD #260
IMPORTANT NOTES:
• The class meeting of WEEK 15 is rescheduled from Wednesday 04/24/2024 to Thursday 04/25/2024.
TO-DO:
• Attend the seminar.
• Write an essay (3 pages) on the topic: What technologies have contributed to the advent and advances of generative artificial intelligence (AI)?
SUBMISSION REQUIREMENTS PART II:
--) Submit the essay.
IMPORTANT NOTES:
• The student must attend the seminar to get credit for this assignment, including the essay.
  o If the student does not attend the seminar, he/she will not receive any credit for this assignment, regardless of whether he/she writes the essay.
  o The reason is that the essay must be based solely on what the student learns at the seminar.

6. PART III: Generative AI Q&A-Search System: Planning, Requirements, Data (5 Points)
SUBMISSION REQUIREMENTS PART III:
--) Follow the submission requirements of HW 4.

7. PART IV: Generative AI Q&A-Search System: System Analysis (5 Points)
SUBMISSION REQUIREMENTS PART IV:
--) Follow the submission requirements of HW 5.

8. PART V: Generative AI Q&A-Search System: System Design (10 Points)
SUBMISSION REQUIREMENTS PART V:
--) Submit the system design – both the high-level and the detailed design – of the Q&A Search system.
IMPORTANT NOTES:
--) To get credit for PART V, the student must take notes on the lecture (in class) on Wednesday 04/17/2024 and use the notes to design the Q&A – Search system developed in the project.
• To complete this section, the student cannot use any contents or materials obtained from the Internet or any external source.

9. PART VI: Generative AI Q&A-Search System: System Set-Up (10 Points)
The project is developed using the latest versions of the Alphabet/Google cloud platform, LangChain, and the LangChain – GCP Vertex AI interface.
• google-cloud-aiplatform: Version 1.44.0 (latest as of 03/20/2024)
• LangChain: Version 0.1.12 (latest as of 03/20/2024)
• langchain-google-vertexai: Version 0.1.1 (latest as of 03/20/2024)
TO-DO:
• Use Google Colab to start a new Jupyter Notebook.
  o Name the notebook: ADTA_5760_Final_Project_QASearch_System.ipynb
• Access the document: gcp_vertexai_knowledgebase_QASearch_system_PART_I.pdf
• Add code to the notebook following the steps of PHASE I of the project discussed in the above document to set up the Cloud Integrated Development Environment System (CIDES) of the project.
IMPORTANT NOTES:
• The code for each step must be in one cell of the Jupyter Notebook.
SUBMISSION REQUIREMENTS PART VI:
--) Submit the Jupyter Notebook after completing the project.

10. PART VII: Generative AI Q&A-Search System: System Development (20 Points)
TO-DO:
• Access Google Colab and open the Jupyter Notebook started in PART VI.
  o Notebook: ADTA_5760_Final_Project_QASearch_System.ipynb
• Access the document: gcp_vertexai_knowledgebase_QASearch_system_PART_II.pdf
• Add code to the notebook following the steps of PHASE II of the project discussed in the above document to develop the Q&A – Search system based on the system design completed in PART V.
IMPORTANT NOTES:
• The code for each step must be in one cell of the Jupyter Notebook.
SUBMISSION REQUIREMENTS PART VII:
--) Submit the Jupyter Notebook after completing the project.

11.
PART VIII: Generative AI Q&A-Search System: System Testing (10 Points)
TO-DO:
• Access Google Colab and open the Jupyter Notebook started in PART VI.
  o Notebook: ADTA_5760_Final_Project_QASearch_System.ipynb
• Access the document: gcp_vertexai_knowledgebase_QASearch_system_PART_II.pdf
• Add code to the notebook following the steps of PHASE II of the project discussed in the above document to test the semantic search developed with the GCP Vertex AI Matching Engine (or Vector Search Engine).
IMPORTANT NOTES:
• The code for each step must be in one cell of the Jupyter Notebook.
SUBMISSION REQUIREMENTS PART VIII:
--) Submit the Jupyter Notebook after completing the project, including the results of the tests.

12. PART IX: Generative AI Q&A-Search System: System Clean-Up (5 Points)
TO-DO:
• Access Google Colab and open the Jupyter Notebook started in PART VI.
  o Notebook: ADTA_5760_Final_Project_QASearch_System.ipynb
• Access the document: gcp_vertexai_knowledgebase_QASearch_system_PART_III.pdf
• Add code to the notebook following the steps of PHASE III of the project discussed in the above document to clean up the project (delete and remove components) to avoid unnecessary charges after completing it.
SUBMISSION REQUIREMENTS PART IX:
--) Submit the Jupyter Notebook after completing the project, including the clean-up steps and their results.

13. PART X: Final Project: Final Presentation (20 Points)
Each team presents the Generative AI Q&A-Search system it has developed for the final project on Wednesday, 05/01/2024 (WEEK 16). For the demo of the Q&A Search system, the team must use the knowledge base, i.e., the set of PDFs, collected by the team. This means the team must make very minor code updates – only one or two lines of code may be affected – to incorporate the changes.
IMPORTANT NOTES:
--) For all sections before PART X, the assigned knowledge base is used.
--) The collected knowledge base is used only for PART X.
To get a grade for PART X, every member of each team must participate in the presentation, i.e., actively deliver part of it.
• Any team member who does not present any content in the team presentation will not get any points for PART X.
IMPORTANT NOTES:
--) All teams must be in class by the presentation start time (6:00 PM).
--) To be fair, the presentation order, i.e., which team presents first, second, and so on, is determined by randomly drawing a team number.

14. HOWTO Submit
14.1 Final Project Report and All Related Documents
The student must submit all required documents (MS Word documents) for the final project by uploading them to his/her OneDrive submission folder.
IMPORTANT NOTES:
--) The student receives a link to access the folder via UNT email.
The following documents are required for the final project reports:
1. SEMINAR essay (3 pages) (PART II)
2. Project: Business, Technical, and Data Requirements (PART III: HW 4)
3. Generative AI Q&A-Search System: System Analysis (PART IV: HW 5)
4. Generative AI Q&A-Search System: System Design (PART V)
5. Generative AI Q&A-Search System: Code: Set-Up, Develop, Test (PARTS VI, VII, VIII, and IX)
   a. Generative AI Q&A-Search System: System Set-Up (PART VI)
   b. Generative AI Q&A-Search System: System Development (PART VII)
   c. Generative AI Q&A-Search System: System Testing (PART VIII)
   d. Generative AI Q&A-Search System: System Clean-Up (PART IX)
6. Generative AI Q&A-Search System: Code: Complete Jupyter Notebook used for the final presentation
IMPORTANT NOTES:
--) For the code (PARTS VI, VII, VIII, and IX), the student must submit the entire Jupyter Notebook.
--) The student must run the code in every cell of the notebook to show the results where they are displayed.
The student is required to inform the instructor about the submission by email sent to the instructor (Thuan.Nguyen@unt.edu ) The subject of the email must be: “ADTA 5750: Final Project – Submission.” Due date & time: 8:00 AM – Wednesday 05/01/2024 IMPORTANT NOTES: --) Due to the limited time for grading and posting the grades as required by the Registrar’s Office, submissions must be on time. Page 7 of 7 Generative AI: LLM: Knowledge Base: Documents Q&A – Search Systems: PART I Thuan L Nguyen, PhD 2: Generative AI: LLM: Knowledge Base: QA & Search AI Deep learning (Source: mindovermachines.com​) 3: Generative AI: LLM: Knowledge Base: QA & Search Artificial Intelligence: Generative AI What is It? Generative AI: A category of artificial intelligence focused on using AI deep learning models to generate new contents, including text, images, audio, video, and more. The contents are novel but look realistic and may be indistinguishable from human-created ones. 4: Generative AI: LLM: Knowledge Base: QA & Search Artificial Intelligence: Generative AI: LLM What is It? Generative AI is based on the NLP technologies such as Natural Language Understanding (NLU) and Conversational AI (AI Dialogues) Those among the most challenging tasks AI needs to solve. 5: Generative AI: LLM: Knowledge Base: QA & Search Artificial Intelligence: Generative AI: LLMs Large Language Models Large Language Models (LLMs) are revolutionary AI Deep Learning neural networks that excel in natural language understanding (NLU) and content generation. • “LARGE" in LLMs refers to the vast scale of data and parameters used to train them, allowing LLMs to develop a comprehensive understanding of language. 
• They are primarily transformer-based models trained on massive text datasets using deep learning techniques.
• They are able to learn complex language patterns, capture nuances like grammar and tone, and generate coherent and contextually relevant text.

7: Generative AI: LLM: Knowledge Base: QA & Search
Q&A – Search Systems: Overview
This project aims to develop a knowledge-based Question-Answer and Search System. The project is done using the Cloud Integrated Development Environment (IDE) System (CIDES) provided by Google Cloud Platform (GCP) Vertex AI services. The user can design, build, and test generative AI applications using the CIDES Vertex AI, taking advantage of its rich features and ample resources, especially the collaboration and integration of GCP CIDES Vertex AI and the LangChain generative AI platform, in which LangChain’s Retrieval-Augmented Generation libraries are tightly merged with the advanced vector-embedding matching techniques of GCP Vertex AI.
Source: Thuan L Nguyen - AI generated images using Google DeepMind Imagen 2

8: Generative AI: LLM: Knowledge Base: QA & Search
Q&A – Search Systems: Overview
It is assumed that each group is an AI system development team in a business organization.
With the explosive popularity and widespread adoption of generative AI in real-world management and business activities, the leaders of the corporation want the team to develop a generative AI system that the company’s employees can use to perform content search, ask questions, and get answers about the contents of the organization’s proprietary documents.
Source: Thuan L Nguyen - AI generated images using Google DeepMind Imagen 2

9: Generative AI: LLM: Knowledge Base: QA & Search
Q&A – Search Systems: Overview
The team will adopt the GCP CIDES Vertex AI to design, build, and test the system throughout the project, including but not limited to cloud storage, vector embeddings generation, vector database management, and advanced vector search technologies. For development, the group will use Python for coding with Google Colaboratory (Colab) as the coding IDE. The group also plans to use popular generative AI techniques, including but not limited to Retrieval-Augmented Generation (RAG), Sentence Transformers, and tools provided by generative AI platforms like LangChain and Hugging Face.
Source: Thuan L Nguyen - AI generated images using Google DeepMind Imagen 2

10: Generative AI: LLM: Knowledge Base: QA & Search
Q&A – Search Systems
Phase I: Install, Set Up, and Develop Q&A Search Ecosystem
Source: Thuan L Nguyen - AI generated images using Google DeepMind Imagen 2

11: Generative AI: LLM: Knowledge Base: QA & Search
Phase I - Step 1: Q&A – Search Systems: GCP Vertex AI: System Setup
The project is developed using the latest versions of the Alphabet/Google cloud platform, LangChain, and the LangChain – GCP Vertex AI interface:
• google-cloud-aiplatform: Version 1.44.0 (latest 03/20/2024)
• LangChain: Version 0.1.12 (latest 03/20/2024)
• langchain-google-vertexai: Version 0.1.1 (latest 03/20/2024)
IMPORTANT NOTES:
--) It takes time for the code to set up the system, maybe between 10 – 20 minutes.
--) The kernel must be restarted after the installation of all the modules has been completed so that the latest modules are used.
• To ensure that all the installations have been done: the green check mark to the left of the Jupyter Notebook cell shows up.

13: Generative AI: LLM: Knowledge Base: QA & Search
Phase I - Step 2: Q&A – Search Systems: Restart Jupyter Notebook Kernel
Restart the kernel after the installation of all the modules has been completed so that the latest modules can be accessed.
IMPORTANT NOTES:
• All the installations MUST be done before running the following code to restart the kernel: the green check mark to the left of the Jupyter Notebook cell shows up.
• Then, you MUST wait until the restart process completes before continuing.

14: Generative AI: LLM: Knowledge Base: QA & Search
Phase I - Step 3: Q&A – Search Systems: Authenticate Colab Account
Run the following piece of code to authenticate the Google Colab account, including the Google account (GMAIL), the GCP project, and more.

15: Generative AI: LLM: Knowledge Base: QA & Search
Phase I - Step 4: Q&A – Search Systems: Download Custom Python Modules
The project employs GCP Vertex AI’s Matching Engine, a proprietary, powerful technology used for generative AI search engines based on advanced vector-embedding technologies, also developed by Alphabet/Google. The system requires special Python modules proprietarily developed by GCP as well. The Q&A – Search system needs these special Python modules downloaded with the following code.
16: Generative AI: LLM: Knowledge Base: QA & Search
Phase I - Step 5: Q&A – Search Systems: Import Libraries
The Q&A – Search system requires various software libraries, including those of Python, the Vertex AI systems, and LangChain. It is necessary to import these libraries with the following code.

17: Generative AI: LLM: Knowledge Base: QA & Search
Phase I - Step 6: Q&A – Search Systems: Specify the GCP Region
The project employs the GCP Vertex AI SDK, which must run in a specified GCP cloud region. Currently, the GCP Vertex AI SDK is available in some GCP cloud regions, but not all. To be safe, choose the GCP region us-central1. Use the following code to specify the GCP project (with the project ID – NOT the project name) and the region.

18: Generative AI: LLM: Knowledge Base: QA & Search
Phase I - Step 7: Q&A – Search Systems: Define Class CustomVertexAIEmbeddings
The Q&A – Search system: create a Python class based on LangChain’s wrapper class for the GCP Vertex AI Embeddings API. This class, CustomVertexAIEmbeddings, handles vector embeddings using GCP Vertex AI services and technologies. Use the following code to define the class.

19: Generative AI: LLM: Knowledge Base: QA & Search
Phase I - Step 8: Q&A – Search Systems: Create LLM Instance and Embeddings Instance
The Q&A – Search system: create an LLM instance of the GEMINI model. Also, create an instance of the class CustomVertexAIEmbeddings to handle vector embeddings. Use the following code to create the above instances.
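The vector-embedding matching that Steps 4 through 8 rely on can be illustrated without any cloud services. The sketch below is a toy, pure-Python stand-in for the retrieval idea behind the Vertex AI Matching Engine, not its actual API: documents and the query are assumed to already be embedded as small vectors (the numbers are made up), and retrieval is plain cosine-similarity ranking.

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, doc_vecs, k=2):
    # rank document vectors by similarity to the query vector
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

# hypothetical 3-dimensional embeddings (real embeddings have hundreds of dimensions)
docs = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0], [0.0, 1.0, 0.9]]
query = [1.0, 0.05, 0.05]
print(top_k(query, docs))  # indices of the two closest documents: [0, 1]
```

A production system replaces the linear scan with an approximate nearest-neighbor index; the ranking principle is the same.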

Explanation & Answer


TITLE NAME
REGISTRATION
COURSE
DATE

Technologies Enabling the Advent and Advances of Generative AI
Generative AI has emerged as a transformative force, capable of producing realistic content across numerous modalities, including text, images, audio, and video. This remarkable achievement is the culmination of decades of research and advances in several key technologies that have laid the foundation for generative AI. In this essay, we will explore the pivotal technologies that have contributed to the advent and development of generative artificial intelligence.
Deep Learning and Neural Networks:
Deep learning, a subfield of machine learning inspired by the structure and function of the human brain, has played a crucial role in the development of generative AI. Neural networks, particularly deep neural networks with multiple layers, have demonstrated remarkable capabilities in recognizing patterns, extracting features, and learning complex representations from massive quantities of data (Vaswani, 2017). These advances have enabled the creation of powerful generative models capable of synthesizing novel data that closely resembles the training data.
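The "multiple layers" mentioned above can be sketched in a few lines. The toy forward pass below shows one fully connected layer followed by a ReLU activation; all weights, biases, and inputs are arbitrary illustrative numbers, not a trained model.

```python
def relu(x):
    # rectified linear unit: zero out negative activations
    return [max(0.0, v) for v in x]

def dense(inputs, weights, biases):
    # one fully connected layer; each inner list in `weights`
    # holds one output unit's weights over all inputs
    return [sum(i * w for i, w in zip(inputs, unit)) + b
            for unit, b in zip(weights, biases)]

x = [0.5, -1.0]                    # toy input vector
W1 = [[0.2, 0.4], [-0.3, 0.1]]     # two hidden units, two weights each
b1 = [0.5, 0.0]
hidden = relu(dense(x, W1, b1))
print(hidden)                      # approximately [0.2, 0.0]
```

A deep network simply stacks many such layers, with the weights learned from data rather than written by hand.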
Transformer Architecture:
The transformer architecture, introduced in 2017, revolutionized the field of natural language processing (NLP) and subsequently impacted other domains, including computer vision and generative AI. The architecture is based on self-attention mechanisms, allowing it to capture long-range dependencies and contextual information more effectively than traditional recurrent neural networks. The transformer architecture has been instrumental in the development of large language models (LLMs) and multimodal generative models, facilitating the generation of coherent and contextually relevant content.
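A minimal sketch of the self-attention mechanism described above, using scaled dot-product attention over three toy token vectors. All numbers are illustrative; a real transformer adds learned query/key/value projections, multiple heads, and positional information.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)            # how much each token attends to the others
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# three 2-dimensional token vectors attending to each other
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(X, X, X))
```

Because every token's output is a weighted mix of all tokens, dependencies between distant positions are captured in a single step, unlike the sequential passes of a recurrent network.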
Large Language Models (LLMs):
LLMs, such as GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers), have played a pivotal role in advancing generative AI. These models are trained on massive quantities of textual data, allowing them to develop a comprehensive understanding of language and capture intricate patterns and relationships. LLMs have demonstrated remarkable capabilities in text generation, translation, summarization, and question answering, paving the way for more advanced generative AI applications.
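At generation time, an LLM's text generation reduces to repeatedly turning a vector of logits into a probability distribution over the vocabulary and choosing a next token. The sketch below uses made-up logits over a hypothetical four-word vocabulary with greedy decoding; real models have vocabularies of tens of thousands of tokens and often sample instead of taking the maximum.

```python
import math

def softmax(logits):
    # convert raw scores into probabilities that sum to 1
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

# hypothetical logits a model might assign to four candidate next tokens
vocab = ["cat", "dog", "car", "tree"]
logits = [2.0, 1.0, 0.1, -1.0]
probs = softmax(logits)

# greedy decoding: pick the highest-probability token
next_token = vocab[probs.index(max(probs))]
print(next_token)  # "cat"
```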
Generative Adversarial Networks (GANs):
GANs, introduced in 2014, have been instrumental in the field of generative AI, particularly for image and video generation. A GAN consists of two neural networks: a generator that creates synthetic data, and a discriminator that evaluates the authenticity of the generated data (Goodfellow, 2014). Through an adversarial training process, the generator learns to produce increasingly realistic and diverse data, while the discriminator becomes better at distinguishing real from fake data. GANs have enabled the generation of highly realistic images, videos, and even 3D models.
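The adversarial objective described above can be written down directly. The sketch below evaluates the original GAN losses for one real/fake pair, with hypothetical discriminator scores standing in for the outputs of actual networks.

```python
import math

def sigmoid(x):
    # squash a raw score into a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def discriminator_loss(d_real, d_fake):
    # the discriminator wants d_real -> 1 and d_fake -> 0
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    # the generator wants the discriminator to score its fakes as real
    return -math.log(d_fake)

# hypothetical discriminator outputs midway through training
d_real = sigmoid(2.0)    # fairly confident the real sample is real
d_fake = sigmoid(-1.0)   # fairly sure the fake is fake
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```

Training alternates gradient steps on these two losses, and the competition is what pushes the generator toward realistic samples.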
Variational Autoencoders (VAEs):
VAEs are another type of generative model that has contributed to the development of generative AI (Kingma, 2013). These models learn to encode input data into a lower-dimensional latent space and then decode the latent representations back into the original data distribution. VAEs have been especially useful for generating diverse and novel data samples, as well as for enabling interpolation and manipulation of latent representations, leading to exciting applications in areas such as image editing, style transfer, and data augmentation.
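The sampling step in the latent space described above hinges on the reparameterization trick, which makes the sampling differentiable so the encoder can be trained end to end. A minimal sketch, assuming the encoder has already produced a mean and log-variance per latent dimension (the numbers here are arbitrary):

```python
import math
import random

def reparameterize(mu, log_var, rng=random):
    # z = mu + sigma * eps, with eps ~ N(0, 1); randomness is isolated in
    # eps, so gradients can flow through mu and log_var
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

random.seed(0)
mu = [0.0, 1.0]          # hypothetical encoder outputs
log_var = [0.0, -2.0]    # sigma = 1.0 and about 0.37
z = reparameterize(mu, log_var)
print(z)                 # one latent sample near mu
```

Decoding nearby latent vectors is what enables the smooth interpolation and manipulation mentioned above.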
Diffusion Models:
Diffusion models, a relatively new class of generative models, have gained considerable attention for their ability to generate high-quality images and audio. These models work by gradually adding noise to the input data and then learning to reverse the process, effectively removing the noise and reconstructing the original data (Abbeel, 2020). Diffusion models have demonstrated impressive results in generating realistic and diverse images, making them a promising avenue for future research in generative AI.
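The "gradually adding noise" process described above has a convenient closed form in DDPM-style diffusion: a noised sample x_t can be drawn directly from the clean data x_0 given the cumulative noise-schedule value alpha_bar. A toy sketch (the signal and schedule values are illustrative):

```python
import math
import random

def add_noise(x0, alpha_bar, rng=random):
    # forward process q(x_t | x_0):
    # x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, 1)
    return [math.sqrt(alpha_bar) * v
            + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
            for v in x0]

random.seed(1)
x0 = [1.0, -1.0, 0.5]                 # a toy "clean" signal
for alpha_bar in (0.99, 0.5, 0.01):   # less and less of the signal survives
    print(alpha_bar, add_noise(x0, alpha_bar))
```

Training teaches a network to predict the added noise at each step; generation then runs the process in reverse, starting from pure noise.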
Multimodal Learning:
Generative AI has also benefited from advances in multimodal learning, which involves the integration of multiple modalities, such as text, images, and audio. This has opened up exciting possibilities for applications like ...

