Sentiment analysis

Computer Science

MSc Artificial Intelligence

Bournemouth University

Question Description

Prepare the project report and put the code in a Jupyter notebook for the sentiment analysis. Work on three techniques: Support Vector Machines, Decision Tree, and Naive Bayes. Please check for Harvard plagiarism, as our university uses this checker for plagiarism.

Unformatted Attachment Preview

Artificial Intelligence Project Proposal: Sentiment Analysis

What is the problem to be solved?

The amount of information available publicly and privately over the internet is growing constantly, and a large number of texts expressing opinions are available in forums, review sites, blogs and social media. Sentiment analysis helps us turn this unstructured information automatically into structured data about public opinion on products, services, politics, brands or any other topic that people express opinions about. Sentiment analysis is an automated process for understanding the opinion expressed about a given subject in a text. A sentiment analysis system helps determine whether a piece of writing is positive, negative or neutral. Such a text analysis system combines Natural Language Processing (NLP) and machine learning techniques to assign weighted sentiment scores to the entities, topics, themes and categories within a sentence or phrase.

Why are you interested in this particular problem?

• Being able to view the sentiment behind everything from blogs, reviews and posts to news articles means being able to strategize and plan for the future.
• Sentiment analysis helps businesses process huge amounts of data in an efficient and cost-effective way.
• Extracting data from social websites is widely practiced by organizations across the world. It can be an essential part of market research and customer service support. Not only can we see what people think about our products or services, we can also see what customers think about our competitors.
• Real-time analysis: sentiment analysis can identify critical issues in real time, so sentiment models can help us spot them immediately and take action right away.

Does the problem need datasets to be available? If so, which dataset is to be used?
To solve the above-mentioned problem, datasets are crucial. We are taking datasets from multiple sources:
1. IMDB Movie Reviews Dataset
2. Amazon Product Dataset
3. Twitter US Airline Sentiment

Which approach is appropriate for solving that problem? Please describe exactly the steps i.e. how you are going to deal with the problem at hand.

Approach: In order to deliver this project, the following models are planned:

1. Natural Language Processing: NLP is used to improve the accuracy of the sentiment analysis. It plays a crucial role in communication between computers and humans. Sentiment analysis can treat a text as a bag of words that must be classified as negative or positive, and with the help of NLP we determine which of the two it is. Online text contains many ungrammatical sentences and very short reviews that are hard to interpret; NLP techniques reduce the time needed to review such sentences.

2. Machine Learning Techniques: Several machine learning techniques can be used, but we focus on three:

a. Naive Bayes: Naive Bayes is easy to understand and simple to implement, yet it is very effective for sentiment analysis. This classification algorithm is applied at the document level. The basic idea is to estimate the probabilities of words and categories. The technique is computationally very fast when making decisions.

b. Support Vector Machine: SVM is an advanced machine learning technique and is widely used for sentiment analysis. It is suitable for learning from large sets of patterns. When new patterns become available, the model can be retrained on the extended training dataset. The technique follows the principle of risk minimization, aiming for the lowest probability of error.

c. Decision Tree: The decision tree algorithm is a tree-based approach consisting of a root node and child nodes, which focuses on predicting the target value.
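The Naive Bayes idea described above (estimating the probabilities of words and categories over a bag of words) can be illustrated with a minimal sketch in plain Python. This is not the project's implementation: the whitespace tokenizer, the add-one (Laplace) smoothing, the use of log probabilities to avoid numerical underflow, and the four toy reviews are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace: a deliberately crude bag-of-words tokenizer."""
    return text.lower().split()


def train_naive_bayes(documents, labels):
    """Count how often each class occurs and how often each word occurs per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict(text, model):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        # Start from the log prior, log P(class).
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            # Add log P(word | class), with add-one smoothing (an assumption here).
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy reviews, invented for illustration only.
train_docs = [
    "great movie loved it",
    "wonderful acting and great story",
    "terrible plot hated it",
    "boring and terrible acting",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = train_naive_bayes(train_docs, train_labels)
print(predict("loved the great story", model))
print(predict("boring plot", model))
```

On real datasets such as the ones listed earlier, a library implementation with proper tokenization would normally replace this hand-written version; the sketch only shows where the word and category probabilities enter the decision.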
The decision tree has a flow-chart-like structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label or class distribution. With the help of these techniques, we are going to build our project for various datasets.

Which algorithms are planned for the application?

We will perform sentiment analysis using the following algorithms:
• Support Vector Machines.
• Naïve Bayes.
• Decision Tree.
• Natural language processing.
We will also put effort into creating a word cloud for the sentiment analysis, as well as trying other techniques.

Which quality measures are to be used to evaluate the algorithms?

Accuracy, precision and recall are standard metrics used to evaluate the performance of a classifier. Accuracy measures how many texts were predicted correctly (both as belonging to a category and as not belonging to it) out of all the texts in the corpus. Precision measures how many texts were predicted correctly as belonging to a given category out of all the texts that were predicted (correctly and incorrectly) as belonging to that category. Recall measures how many texts were predicted correctly as belonging to a given category out of all the texts that should have been predicted as belonging to that category. In general, the more data we feed our classifiers, the better recall tends to be.

Below are the quality improvement measures observed so far for the different models:

Support Vector Machines:
• Linear and non-linear kernels are available; we choose between them according to the needs of the particular application.
• The RBF kernel is a good choice if our data is not structured.
• Review the training instances when the data is unbalanced.
• Change the cross-validation function.
• Make sure the inputs are standardized properly.
• Optimize both gamma and cost using the “Grid Search” algorithm.

Naïve Bayes:
• Naïve Bayes can handle missing data.
• Use log probabilities.
• If the data is categorical, calculate the frequency of each observation; if the attributes are real-valued, summarize the density of each attribute using a Gaussian distribution.
• Use probabilities for feature selection, segment the data, parallelize the probability calculation, and remove features that are highly correlated.

Decision Tree:
• Look at the possible decisions on the tree for all final outcomes.
• Mark the decisions that carry a lot of risk and the decisions that have a low probability of a successful outcome.
• Look at the risky decisions and consider whether your business can tolerate the amount of risk.
• Review the decisions with the lowest chances of success and eliminate choices that carry a high cost or a lot of risk without a significant outcome.
• Select the path leading to a significant final outcome that has a high chance of succeeding.

References:
Dataset: https://analyticsindiamag.com/10-popular-datasets-for-sentiment-analysis/
https://www.sciencedirect.com/science/article/pii/S2090447914000550

June 2019 v1

ASSESSMENT TASK

This assignment addresses all four Intended Learning Outcomes (ILOs) for this unit (see below). There are two parts to the submission (more on this below).
1. The source code (to solve a chosen problem) that you have implemented, to provide evidence of independent, technical work (35% of total mark).
2. A technical report that covers the description of the problem, the methodology, and an empirical investigation (40% of total mark).

A key aspect of this assessment is demonstrating the ability to perform a critical analysis and evaluation. This involves empirical experiments, evaluating the performance of artificial intelligence algorithms and, potentially, data processing techniques, depending on the problem at hand. You can choose to implement one of the algorithms we cover in the class or other algorithms as required by the targeted problem.
In all cases, full details must be provided both in the documentation of the code and the report. The implementation must be in Python, Java or Matlab.

You are given the opportunity to choose yourself:
1) The project you are interested in, from any AI application: natural language processing and understanding, machine vision, speech recognition, robotics, intelligent agents, smart environments, etc.
2) Your teammates (3 people max). Please give a name to your team.

While individual projects are possible, you are encouraged to join a team. Projects developed by one person will be evaluated on that basis, but still according to the same assessment criteria.

You are asked to propose a project idea by following the traditional workflow:
• What is the problem to be solved?
• Why are you interested in this particular problem?
• Does the problem need datasets to be available? If so, which dataset is to be used?
• Which approach is appropriate for solving that problem? Please describe exactly the steps i.e. how you are going to deal with the problem at hand.
• Which algorithms are planned for the application?
• Which quality measures are to be used to evaluate the algorithms?

If you find it challenging to come up with your own project idea, you will need to discuss with the Unit Leader (UL) for advice and potential ideas by arranging such a discussion on any date before 30/03/2020. In all cases, please make sure to submit your proposal (a brief description that covers the questions mentioned earlier) as soon as you have made your choice; the deadline for all submissions is 30/03/2020 at the latest. Please note that you can submit your proposal at any time before this date, so that you have more time to develop your project.

Please note that, for your guidance, a sample of datasets will be made available on Brightspace (under the “Assessment” option).
To learn about these datasets, please read the corresponding documentation (potentially the “Readme.txt” file once you have downloaded it). You can use these datasets or propose others, depending on the idea you are exploring in your project.

Deliverables

1. D1: The proposal to be submitted, which includes a brief description that covers the questions mentioned earlier. Deadline is 30/03/2020. Please name your proposal using the name of your team, “TeamName_AI.doc” (or .pdf), and submit through Turnitin (first box).
2. D2: A detailed report that contains the following sections: Front matter, Problem definition, Methodology (all steps), Experiments & discussion, Conclusion and references. Please name it Report_AI.doc (or .pdf).
3. D3: Working and well-documented code. Please zip it and name it Code_AI.zip or Code_AI.rar. If you are using tools to develop your application, please explain exactly what, how, which parameters, etc. so that your results can be reproduced. Submit that description as a separate document and name it Code_AI.doc (or .pdf).
4. D4: A five-minute video, where you discuss your role in the group, your contribution to the final submission and the steps that you followed for completing the tasks. This is relevant only for a project team of at least 2 persons. You can either:
a. Make the videos of every group member accessible from outside Turnitin (using external drives like Google Drive). Then, please make sure to indicate the links in the project report.
b. Combine it with the rest of the deliverables and submit (see Section Submission format below).
5. D5: PowerPoint presentation + demo of the project: duration 20 min + 5 min questions. [This is not meant to be submitted.] The date of the presentation will be communicated in due time.

Please note that there is no limit on the word count for both the proposal and the final report. All of these reports will be evaluated based on their content and not their length.
But given the fact that this is a team project and for your guidance, you may try to go for about 1500 words for the proposal and about 3000 words for the final report. In the case of individual projects, you may limit your proposal and final report to 800 and 1500 words respectively. Please note that the presentation, code and video do not count towards the number of words.

SUBMISSION FORMAT

Except for the proposal, the rest of the deliverables should be submitted through Turnitin (large file box). Once you have all submission elements, please zip all of them in one file, name it “Teamname_AI.zip” and upload it in Turnitin.

MARKING CRITERIA

As noted above, there are three parts to this assessment: the technical report of an empirical investigation, the source code and the final presentation. The following criteria will be used to assess the assignment:

Quality of the report (Mark: 40%; ILOs: 1, 2, 3):
- Complexity of the project
- Clear presentation
- Critical evaluation
- Conclusions and future improvements
- Completeness

Quality of the code (Mark: 35%; ILOs: 3, 4), which covers the following elements:
- Completeness (all expected steps and functionalities must be implemented)
- Correct execution of the code
- Documentation of the code
- Demo (part of the presentation) will count towards this criterion

Quality of the presentation delivery (Mark: 25%; ILOs: 1, 2, 3, 4):
- Delivery
- Demo
- Questions

To get “pass”: you have to observe the conjunction of the following elements:
• Submit all deliverables.
• Define a project of reasonable complexity.
• Provide a decent report and fully running code (maybe using existing AI libraries).
• Deliver a good presentation and answer most questions, showing a good understanding of all facets of the project.
• Active participation in the development of the project.

To achieve a higher mark, you will need to:
• Submit all deliverables.
• Define a project of a good level of complexity.
• Provide an excellent report with details and fully running, high-quality code, with potentially most of the code implemented by yourself.
• Deliver an excellent presentation and answer all questions, showing excellent understanding of all facets of the project.
• Active participation in the development of the project.

Please note that if the project is developed as a team, the members of that team may not necessarily get the same mark. The mark is based on the contribution and involvement in the execution of the project.

LEARNING OUTCOMES

Having completed this unit, the student is expected to:
1. Demonstrate an understanding of the principal challenges involved in AI, the major research areas, and the overall historical development of the field.
2. Compare and contrast techniques from the various sub-fields of AI.
3. Demonstrate an understanding of the applicability and limitations of AI for problems in a real-world context.
4. Implement a solution for a real-world problem using AI techniques and software tools.

QUESTIONS ABOUT THE BRIEF

This assignment will be discussed in class, where students are encouraged to ask questions for clarification. Feel free to use email when no lab session is scheduled between the time the questions arise and the submission deadline.

Signature Marker
Hamid Bouchachia

HELP AND SUPPORT

• If a piece of coursework is not submitted by the required deadline, the following will apply:
1. If coursework is submitted within 72 hours after the deadline, the maximum mark that can be awarded is 40%. If the assessment achieves a pass mark and subject to the overall performance of the unit and the student’s profile for the level, it will be accepted by the Assessment Board as the reassessment piece.
The unit will count towards the reassessment allowance for the level. This ruling will apply to written coursework and artefacts only. This ruling will apply to the first attempt only (including any subsequent attempt taken as a first attempt due to exceptional circumstances).
2. If a first attempt coursework is submitted more than 72 hours after the deadline, a mark of zero (0%) will be awarded.
3. Failure to submit/complete any other types of coursework (which includes resubmission coursework without exceptional circumstances) by the required deadline will result in a mark of zero (0%) being awarded.
The Standard Assessment Regulations can be found on Brightspace.

• If you have any valid exceptional circumstances which mean that you cannot meet an assignment submission deadline and you wish to request an extension, you will need to complete and submit the Exceptional Circumstances Form for consideration to your Programme Support Officer (based in C114) together with appropriate supporting evidence (e.g., GP note), normally before the coursework deadline. Further details on the procedure and the exceptional circumstances form can be found on Brightspace. Please make sure that you read these documents carefully before submitting anything for consideration. For further guidance on exceptional circumstances please see your Programme Leader.

• You must acknowledge your source every time you refer to others’ work, using the BU Harvard Referencing system (Author Date Method). Failure to do so amounts to plagiarism, which is against University regulations. Please refer to http://libguides.bournemouth.ac.uk/bu-referencing-harvardstyle for the University’s guide to citation in the Harvard style. Also be aware of self-plagiarism, which primarily occurs when a student submits a piece of work to fulfill the assessment requirement for a particular unit and all or part of the content has been previously submitted by that student for formal assessment on the same/a different unit.
Further information on academic offences can be found on Brightspace and from https://www1.bournemouth.ac.uk/discover/library/using-library/howguides/how-avoid-academic-offences

• Students with Additional Learning Needs may contact Learning Support on www.bournemouth.ac.uk/als

• You should not be conducting any primary research (i.e. carrying out an investigation to acquire data first-hand, for example, where it involves approaching participants to ask questions or to participate in surveys, questionnaires, interviews, observations, focus groups, etc.) unless otherwise specified in the brief. However, if there is a genuine requirement to collect primary research data you will require ethical approval before doing so. In the first instance, please discuss with the Unit Leader. The collection of primary data without appropriate ethical approval is a serious breach of Bournemouth University’s Research Ethics Code of Practice and will be treated as Research Misconduct.

Disclaimer: The information provided in this assignment brief is correct at time of publication. In the unlikely event that any changes are deemed necessary, they will be communicated clearly via e-mail and Brightspace and a new version of this assignment brief will be circulated.

...