Ashford University Web Page Scraping With R to Capture Content Paper


Description

Web page scraping is a common way to collect data and search for particular items. Use R to capture the content of a job listing website of your choice such as Monster.com, Indeed.com, or any other relevant job listing website. Using R, capture the data for the postings that relate to cybersecurity for the last 30 days. In R, identify the frequency of the job listings that contain specific requirements for the CISSP® certification and the years of required experience.

In a Word document, clearly identify the site you scraped. Paste the R code you created to acquire, and identify the information relevant to the cybersecurity listings you identified from the site. Include a screen capture or other form of visual in the Word document that demonstrates the results of your R code. Submit the document to Waypoint for grading.

In your Word document,

  • Summarize the results for the elements you acquired from the job site. Be sure to identify the website scraped.
  • Include the R code you developed to acquire and identify the data.
  • Demonstrate the results of the R code for the job elements identified by including a screen capture of the data.

The Web Page Scraping With R paper

  • Must include a separate title page with the following:
    • Title of Assignment
    • Student’s name
    • Course name and number
    • Instructor’s name
    • Date submitted

Unformatted Attachment Preview

Front cover

Introduction to R in IBM SPSS Modeler
Wannes Rosius
Redpaper
International Technical Support Organization
October 2016
REDP-5388-00

First Edition (October 2016)
This edition applies to Version 18, Release 03 of IBM SPSS Modeler (product number 5725-A65).
This document was created or updated on October 13, 2016.
© Copyright International Business Machines Corporation 2016. All rights reserved.

Contents
Preface
  Introduction to this paper
  Authors
Chapter 1. System setup
  1.1 Installing R
  1.2 Enabling the R nodes
Chapter 2. R basics
  2.1 Getting started with R
Chapter 3. The basics of R nodes in IBM SPSS Modeler
  3.1 The R nodes
  3.2 Simple R code example
    3.2.1 modelerData
    3.2.2 modelerDataModel
    3.2.3 modelerModel
  3.3 Some general remarks
  3.4 Read data options
Chapter 4. Custom Dialog Builder
  4.1 About the Custom Dialog Builder
  4.2 Tools
  4.3 Custom dialogs
  4.4 Simple example
Chapter 5. Tips and tricks
  5.1 R code
    5.1.1 ibmspsscf70 library
    5.1.2 Some useful parts of R code
  5.2 Custom Dialog Builder tips
    5.2.1 How to save and share a custom dialog
    5.2.2 Link to dialog and script
  5.3 What about SQL Pushback? Hadoop Pushback?
  5.4 What about real-time scoring? and IBM SPSS Modeler Solution Publisher?
  5.5 More about the metadata in modeler and the consequences on R integration
Preface
This IBM® Redpaper™ publication focuses on the integration between IBM SPSS® Modeler and R. The paper is aimed at people who know IBM SPSS Modeler and have only a very limited knowledge of R.
Chapters 2, 3, and 4 provide a high-level understanding of R integration within SPSS Modeler, enabling you to create or recreate some very basic R models within SPSS Modeler even if you have only a basic knowledge of R. Chapter 5 provides more detailed tips and tricks. This chapter is for the experienced user and consists of items that might help you get up to speed with more detailed functions of the integration and understand some pitfalls.

Introduction to this paper
Although there are several very good articles and blogs related to IBM SPSS Modeler, many people still struggle with both R and the integration between IBM SPSS Modeler and R. The goal of this paper is to help with this situation.

At every point in the paper, we try to include R examples that you can easily copy into the appropriate R node in SPSS Modeler. Unless specified otherwise, the code snippets are based on the telco.sav data set, which can be found in the demo folder of your SPSS Modeler installation. After the source node, attach a Type node, and then the appropriate R node. Sometimes, however, only excerpts of code are shown to illustrate the idea; we clearly indicate when the code is incomplete. Code is presented in code frames throughout this document.

Some useful web addresses to help you get started:
  • Essentials for R - Installation Instructions
    https://github.com/IBMPredictiveAnalytics/R_Essentials_Modeler/releases/download/18.0/SPSS_Modeler_R_Essentials_18.0_Installation_Doc_ML.zip
  • IBM SPSS Modeler Extensions
    ftp://public.dhe.ibm.com/software/analytics/spss/documentation/modeler/18.0/en/ModelerExtensions.pdf
  • IBM developerWorks® web page IBM SPSS Predictive Analytics Downloads
    https://developer.ibm.com/predictiveanalytics/downloads/
  • IBM developerWorks blog post - SPSS Modeler and R integration - Getting started
    https://developer.ibm.com/predictiveanalytics/2014/11/25/spss-modeler-and-r-integration-getting-started
Authors
This paper was produced by the following author:

Wannes Rosius is a data scientist based in Brussels, Belgium, working for IBM within the center of excellence team of IBM predictive solutions. He has over a decade of experience in data science across multiple industry sectors and with a wide variety of data science tools, including IBM SPSS, SAS, R, Python, and others. He holds master's degrees in Mathematics and Statistics and has in-depth knowledge of applying data mining techniques. He is experienced in a wide range of industry application areas, including customer churn, customer profitability, cross-selling, retail demand forecasting, fraud intelligence, CRM, econometric modelling, debt management, behavioral credit risk modelling, and site location.

Thanks to the following people for their contributions to this project:
Martin Keen, LindaMay Patterson
International Technical Support Organization
Chapter 1. System setup
This chapter discusses setting up your system. It is assumed that you have a valid installation of IBM SPSS Modeler on your machine. For other installation topics, see the installation instructions.
This chapter contains the following sections:
  • Installing R
  • Enabling the R nodes

1.1 Installing R
Depending on the version of your IBM SPSS Modeler, install the associated version of R as shown in Table 1-1.
Table 1-1 SPSS Modeler version to R version link

SPSS Modeler version   R version   Download link (R for Windows)
16.0                   2.15.2      https://cran.r-project.org/bin/windows/base/old/2.15.2/
17.0                   3.1.0       https://cran.r-project.org/bin/windows/base/old/3.1.0/
17.1                   3.1.0       https://cran.r-project.org/bin/windows/base/old/3.1.0/
18.0                   3.2.0       https://cran.r-project.org/bin/windows/base/old/3.2.0/

After you have downloaded and installed R, you have a working R instance on your computer. As with SPSS Modeler, you can have several versions of R installed on your computer without any problem.

1.2 Enabling the R nodes
You need to install IBM SPSS Modeler Essentials for R. Perform the following steps:
1. Go to the SPSS Community Downloads page to find the essentials, at this web address: https://developer.ibm.com/predictiveanalytics/downloads/
2. Select option 2, Get Essentials for SPSS, and click Get R Essentials for SPSS Modeler.
3. On GitHub, select and download the Modeler 18 Essentials for R for your particular platform. If you require Essentials for R for earlier Modeler versions, the page provides links to older versions.
4. Run the installation. The installer asks for the path of your R installation and the path to the bin files of your SPSS Modeler installation.
Note: The prefilled path is the default path to an SPSS Modeler server. You need to change this path if you want to configure your client.
This installation places the R nodes in your SPSS Modeler node palette and installs the necessary R libraries in your R installation folder.

Chapter 2. R basics
There is a wide variety of R courses publicly available through several channels. It is not our intent to replace these courses. You do not need to be an R expert to use this document.
However, there are some basics of R code and R terminology you need to understand to exploit the integration of R and IBM SPSS Modeler. This chapter contains the following sections:
  • Getting started with R

2.1 Getting started with R
Open R in its original graphical user interface (GUI) by going to the R installation folder and opening the \bin\x64\RGUI.exe file. Figure 2-1 shows the R console.
Figure 2-1 R console
The R console is ready to run commands. You might see the term RStudio, which is a development environment built on top of R. You might prefer RStudio, which is a powerful and productive user interface for R. Installing RStudio is not required for this introduction, but might be handy for future use.
R is a powerful programming language and environment for statistical computing and graphics. Unlike IBM SPSS Modeler, R is a programming language. It is built on objects that are defined by the user. Example 2-1 shows R code you can type in the R console to see the R outputs.
Example 2-1 R code
x

Explanation & Answer

Attached. Please let me know if you have any questions or need revisions.

Web Page Scraping for Job Listings on Indeed.com Using the R Language
Student’s Name
Course Name
Course Number
Instructor’s Name
January 7, 2021

Overview
In this task, we use the R programming language to scrape job listings from Indeed.com.
The task includes a summary of the results for the elements acquired from Indeed.com, the R
code used, and a screen capture of the data. It also reports the frequency of job listings that
contain specific requirements for the CISSP® certification and the years of required experience.
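The frequency step described above can be sketched in base R as follows. This is a minimal illustration, not the code used in this paper: it assumes the job descriptions have already been collected into a character vector, and the four sample descriptions are made-up placeholders, not real Indeed.com postings. The regular expression for "years of experience" is also an assumption and will miss phrasings it does not anticipate.

```r
# Hypothetical sample text standing in for scraped job descriptions;
# these strings are illustrative only, not real Indeed.com data.
descriptions <- c(
  "Security Analyst. CISSP required. 5+ years of experience in SOC operations.",
  "Cybersecurity Engineer: 3 years experience with SIEM tools; CISSP preferred.",
  "IT Auditor with Security+ certification and 2 years of experience.",
  "Information Security Manager. Must hold CISSP. Minimum 10 years' experience."
)

# TRUE when a description mentions CISSP as a whole word (case-insensitive).
has_cissp <- grepl("\\bCISSP\\b", descriptions, ignore.case = TRUE, perl = TRUE)

# Capture the number in phrases such as "5+ years of experience",
# "3 years experience" or "10 years' experience" (assumed phrasings).
pattern <- "(\\d+)\\+?\\s*years?'?\\s+(of\\s+)?experience"
match_pos <- regexpr(pattern, descriptions, ignore.case = TRUE, perl = TRUE)

# Listings without a recognizable experience phrase stay NA.
years <- rep(NA_integer_, length(descriptions))
found <- match_pos != -1
years[found] <- as.integer(
  sub(pattern, "\\1", regmatches(descriptions, match_pos),
      ignore.case = TRUE, perl = TRUE)
)

# Frequencies requested by the assignment.
cissp_count <- sum(has_cissp)            # listings that mention CISSP
cissp_share <- mean(has_cissp)           # share of all listings
years_freq  <- table(years[has_cissp])   # experience levels among CISSP listings

print(cissp_count)
print(years_freq)
```

Keeping `years` aligned with `descriptions` (NA where nothing matched) makes it easy to put both columns into one data frame alongside the scraped titles and companies.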
Data Scraping
Since the task requires capturing data for postings that relate to cybersecurity, we use
"CISSP® certification" as the search keywords, together with the other requirements.
In R, we use the rvest library to harvest data from the website.
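The basic rvest workflow (parse HTML, select nodes with a CSS selector, extract their text) can be tried offline on an inline HTML string. In this sketch the markup is invented; the `company` and `location` class names only mirror the `.company` and `.location` selectors used in this paper's R code, and the live Indeed.com markup may differ.

```r
library(magrittr)  # provides the %>% pipe
library(rvest)     # read_html(), html_nodes(), html_text()

# A small, made-up HTML fragment; not real Indeed.com markup.
html <- '
<div class="job">
  <span class="company"> Example Corp </span>
  <div class="location">Austin, TX</div>
</div>
<div class="job">
  <span class="company">Sample LLC</span>
  <div class="location">Remote</div>
</div>'

# read_html() treats a string containing markup as literal HTML.
page <- read_html(html)

# Select nodes by CSS class and pull out their text content.
companies <- page %>% html_nodes(".company") %>% html_text(trim = TRUE)
locations <- page %>% html_nodes(".location") %>% html_text()

print(companies)
print(locations)
```

Using `trim = TRUE` here does the same job as the `stringi::stri_trim_both()` call in the company-name step of the paper's code.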
The URL of the website to be scraped is
https://www.indeed.com/jobs?q=CISSP®%20certification&fromage=30
where "CISSP® certification" is the search keyword (the q parameter) and fromage=30 is meant to
restrict the results to postings from the last 30 days.
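Rather than hard-coding the search URL, it can be assembled from its parts so that the keywords and the day window are easy to change. This is a sketch with a hypothetical helper, `build_search_url()`; it only reuses the `q` and `fromage` parameters visible in the URL above, and reading `fromage` as a "posted within N days" filter is an assumption based on that URL rather than on documented site behaviour.

```r
# Hypothetical helper: assemble the search URL from keywords and a day window.
build_search_url <- function(keywords, days) {
  # Percent-encode the keywords (spaces become %20).
  query <- utils::URLencode(keywords, reserved = TRUE)
  paste0("https://www.indeed.com/jobs?q=", query, "&fromage=", days)
}

url <- build_search_url("CISSP certification", 30)
print(url)
```

Encoding the keywords this way also covers non-ASCII characters: in a UTF-8 session the ® symbol would be sent as %C2%AE.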
The R Code

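# loading the required packages
library(tidyverse)
library(rvest)
library(xml2)

# search URL (keywords and 30-day window)
url <- "https://www.indeed.com/jobs?q=CISSP®%20certification&fromage=30"
# download and parse the results page
page <- xml2::read_html(url)

# get job title
# NOTE: the node-selection step(s) that preceded html_attr("title") are
# missing from the source text, so this step is left commented out.
# page %>%
#   ... %>%
#   rvest::html_attr("title")

# get job location
page %>%
  rvest::html_nodes(".location") %>%
  rvest::html_text()

# get company name (trim surrounding whitespace)
page %>%
  rvest::html_nodes(".company") %>%
  rvest::html_text() %>%
  stringi::stri_trim_both()

# Job Rating
page %>%
  rvest::html_nodes(".ratingsContent") %>%
  rvest::html_text() %>%
  ...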
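The 30-day window in the URL relies on the site's own filtering. As a cross-check, the "date posted" label of each listing could be converted to an age in days and filtered in R. This is a minimal sketch: the label wordings below ("Just posted", "Posted 3 days ago", "30+ days ago") and the helper `posting_age_days()` are hypothetical and should be checked against the labels actually scraped.

```r
# Hypothetical "date posted" labels; the exact wording on the live site is an
# assumption and should be checked against the scraped page.
posted <- c("Just posted", "Today", "Posted 3 days ago",
            "Posted 1 day ago", "30+ days ago")

# Convert a posted label to an age in days (NA when unrecognized).
posting_age_days <- function(labels) {
  age <- rep(NA_real_, length(labels))
  age[grepl("just posted|today", labels, ignore.case = TRUE)] <- 0
  has_number <- grepl("\\d+", labels, perl = TRUE)
  age[has_number] <- as.numeric(
    sub("^\\D*(\\d+).*$", "\\1", labels[has_number], perl = TRUE)
  )
  # "30+ days ago" only says "at least 30 days", so treat it as too old.
  age[grepl("\\d+\\+", labels, perl = TRUE)] <- Inf
  age
}

ages <- posting_age_days(posted)
last_30_days <- !is.na(ages) & ages <= 30
print(posted[last_30_days])
```

Treating "30+ days ago" as outside the window is a conservative choice: such a listing may be exactly 30 days old, but the label cannot confirm it.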

