Transactional Data and Landing Area Paper

Slide 3
When data arrives from different sources it is raw, or transactional, data, and much of it is
irrelevant. To turn that raw data into information and organize it in a structured form, we use ETL
(extract, transform, load). For example, in a company that receives data from various sources, the
data is scattered, and ETL gathers all of that transactional data and loads it into your database or
data warehouse in a structured way. This helps you analyze the data for critical business decisions
and answer business questions.
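As a rough illustration of the extract, transform, and load steps described above, here is a minimal Python sketch. The two sources, their field names, and the sample values are made up for the example, and an in-memory SQLite database stands in for the target database.

```python
import csv
import io
import sqlite3

# Two hypothetical sources with different shapes (made-up sample data).
CRM_CSV = "customer_id,amount,note\n1,19.99,ok\n2,,missing amount\n"
WEB_ORDERS = [{"cust": "3", "total": "5.50", "debug": "x"}]


def extract():
    """Gather raw rows from every source without changing them."""
    crm_rows = list(csv.DictReader(io.StringIO(CRM_CSV)))
    return crm_rows, WEB_ORDERS


def transform(crm_rows, web_rows):
    """Drop irrelevant fields and bad rows; map both sources to one shape."""
    clean = []
    for row in crm_rows:
        if row["amount"]:  # skip rows that have no amount
            clean.append((int(row["customer_id"]), float(row["amount"])))
    for row in web_rows:
        clean.append((int(row["cust"]), float(row["total"])))
    return clean


def load(rows, conn):
    """Write the structured rows into a single table."""
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three steps in order and return the target connection."""
    conn = sqlite3.connect(":memory:")
    load(transform(*extract()), conn)
    return conn
```

Calling `run_etl()` returns a connection whose `orders` table holds only the cleaned, uniformly shaped rows; the `note` and `debug` fields and the row without an amount are left behind.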
Slide 4
The landing area is where the data is first loaded. It consists of one or more tables in a
database that hold the source data as is. Few transformations are applied to the data while it is
loaded into the landing area.
The staging area contains data to which the application logic has been applied. Data from one or
more landing tables can be combined to create tables in the staging area. The data is transformed,
aggregated, and filtered, and all the business logic is applied, before it is loaded into the staging
area. Sometimes the landing and staging areas are merged into one table.
The last stage is the data warehouse area, the place that business users and the reporting layer
have access to. The data in the data warehouse is the final version of the data. Data usually passes
from staging to the data warehouse with minimal transformation.
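The three areas can be sketched with an in-memory SQLite database. This is only an illustration under assumptions: the table names (`landing_sales`, `staging_sales`, `dw_sales`), the columns, and the cleanup rules are hypothetical and not taken from the slides.

```python
import sqlite3


def build_layers(raw_rows):
    """Load raw (region, amount) text rows through landing, staging, warehouse."""
    conn = sqlite3.connect(":memory:")

    # Landing: source data stored as is, all text, no cleanup.
    conn.execute("CREATE TABLE landing_sales (region TEXT, amount TEXT)")
    conn.executemany("INSERT INTO landing_sales VALUES (?, ?)", raw_rows)

    # Staging: business logic applied (filter bad rows, normalize, aggregate).
    conn.execute(
        """
        CREATE TABLE staging_sales AS
        SELECT UPPER(TRIM(region)) AS region,
               SUM(CAST(amount AS REAL)) AS total_amount
        FROM landing_sales
        WHERE amount <> ''
        GROUP BY UPPER(TRIM(region))
        """
    )

    # Warehouse: final version, copied from staging with minimal change.
    conn.execute(
        "CREATE TABLE dw_sales AS SELECT region, total_amount FROM staging_sales"
    )
    return conn
```

Keeping the landing table untouched means the staging logic can be rerun or corrected later without going back to the source systems.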
Slide 5
AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple
and cost-effective to categorize your data, clean it, enrich it, and move it reliably between
various data stores and data streams. It runs all your jobs on a serverless, fully managed,
scale-out environment, so there are no resources to manage.
Slide 6
Less Hassle: AWS Glue integrates easily with other AWS services. For example, it can work on
data in Amazon Aurora, Amazon RDS, Amazon Redshift, and Amazon S3, and also on databases
running on Amazon EC2 instances in your VPC.
Cost Effective: As I said on my previous slide, AWS Glue is serverless, so there is no
infrastructure to provision or manage. AWS itself provisions, configures, and scales the
resources required to run ETL jobs in a fully managed environment. You pay only for the
resources used while a specific ETL job runs.
More Power: AWS Glue automatically identifies data sources and data formats, and suggests
schemas and transformations. It also automatically generates the code to execute your data
transformations and loading processes, generally in Python or Scala.
So overall, with AWS Glue you can run serverless queries, create ETL jobs, schedule those jobs,
and have Glue execute the code.

Slide 7
AWS Glue Data Catalog:
The AWS Glue Data Catalog is your persistent metadata store. It is a managed service
that lets you store, annotate, and share metadata in the AWS Cloud.
Each AWS account has one AWS Glue Data Catalog per AWS region. It provides a
uniform repository where systems can store and find metadata to keep track of data and
use that metadata to query and transform the data.
We have to use AWS Identity and Access Management (IAM) policies to control access
to the data sources managed by the AWS Glue Data Catalog. These policies allow
different services to interact with each other. IAM policies let you clearly and
consistently define which users have access to which data, regardless of its location.
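As an illustration of such a policy, the sketch below builds a read-only IAM policy document for one Data Catalog database as a Python dictionary. The region, account ID, and database name are hypothetical placeholders, and the set of actions is one plausible read-only selection rather than a recommended policy.

```python
import json

# Hypothetical placeholders: replace with real values for your account.
REGION = "us-east-1"
ACCOUNT_ID = "123456789012"
DATABASE = "sales_db"


def read_only_catalog_policy(region, account_id, database):
    """Return an IAM policy document allowing read access to one database."""
    prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartitions",
                ],
                "Resource": [
                    f"{prefix}:catalog",
                    f"{prefix}:database/{database}",
                    f"{prefix}:table/{database}/*",
                ],
            }
        ],
    }


# JSON text that could be attached to a user or role.
policy_json = json.dumps(
    read_only_catalog_policy(REGION, ACCOUNT_ID, DATABASE), indent=2
)
```

Because the statement lists only `Get*` actions, a principal with this policy could look up table definitions in the chosen database but could not create, change, or delete them.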
The following services can access the AWS Glue Data Catalog:
Amazon Athena
Amazon Redshift Spectrum
Amazon EMR
AWS Glue Crawlers and Classifiers
AWS Glue also lets you set up crawlers that can scan data in all kinds of repositories, classify it,
extract schema information from it, and store the metadata automatically in the AWS Glue Data
Catalog. From there it can be used to guide ETL operations.
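To give a feel for what a crawler does, here is a toy Python sketch that scans CSV text, guesses a type for each column, and returns a metadata entry. It is not how AWS Glue classifiers are implemented; the function names, the entry layout, and the Hive-style type names (`bigint`, `double`, `string`) are assumptions chosen for illustration.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest type that fits every non-empty value."""
    present = [v for v in values if v != ""]
    if not present:
        return "string"  # nothing to inspect, fall back to text
    for caster, name in ((int, "bigint"), (float, "double")):
        try:
            for v in present:
                caster(v)
            return name
        except ValueError:
            continue
    return "string"


def crawl_csv(table_name, text):
    """Scan CSV text and return a catalog-style metadata entry."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = {
        col: infer_type([row[col] for row in rows])
        for col in (reader.fieldnames or [])
    }
    return {"name": table_name, "classification": "csv", "columns": columns}
```

For example, `crawl_csv("trips", "id,price,city\n1,2.5,Oslo\n2,3,Rome\n")` yields an entry whose columns map `id` to `bigint`, `price` to `double`, and `city` to `string`; an ETL job could then read that entry instead of re-inspecting the file.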
The AWS Glue Jobs System
The AWS Glue Jobs system provides managed infrastructure to orchestrate your ETL
workflow. You can create jobs in AWS Glue that automate the scripts you use to extract,
transform, and transfer data to different locations. Jobs can be scheduled and chained,
or they can be triggered by events such as the arrival of new data.
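The idea of chained and event-triggered jobs can be sketched in plain Python. This toy `JobRunner` class is hypothetical and does not use the AWS Glue API; it only shows how one job finishing, or an event such as new data arriving, can start the next job. Time-based scheduling is left out.

```python
from collections import defaultdict


class JobRunner:
    """Toy orchestrator: runs a job, then any jobs chained after it."""

    def __init__(self):
        self.jobs = {}                       # job name -> callable
        self.on_success = defaultdict(list)  # job name -> downstream job names
        self.on_event = defaultdict(list)    # event name -> job names
        self.log = []                        # names of completed jobs, in order

    def add_job(self, name, func):
        self.jobs[name] = func

    def chain(self, upstream, downstream):
        """Start `downstream` whenever `upstream` completes."""
        self.on_success[upstream].append(downstream)

    def trigger_on(self, event, job_name):
        """Start `job_name` whenever `event` is emitted."""
        self.on_event[event].append(job_name)

    def run(self, name):
        self.jobs[name]()  # an exception here stops the chain
        self.log.append(name)
        for downstream in self.on_success[name]:
            self.run(downstream)

    def emit(self, event):
        for name in self.on_event[event]:
            self.run(name)
```

With jobs named `extract`, `transform`, and `load` chained in that order and `extract` registered for a `new_data` event, calling `emit("new_data")` runs all three in sequence.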
You use the AWS Glue console to define and orchestrate your ETL workflow. The
console calls several API operations in the AWS Glue Data Catalog and AWS Glue
Jobs system to perform the following tasks:
Define AWS Glue objects such as jobs, tables, crawlers, and connections.
