Backup, Disaster Recovery Plan (DRP) and Business Continuity Plan (BCP)
IST 8100
N V Raman
Data Backup
Why back up
Recover from loss of data due to corruption, hacking, or disaster
What to back up
Databases, files, applications
Image of the entire system on a server
Full, Incremental & Differential Backups
Daily Events | Full | Differential | Incremental
Monday: Full Backup | Monday | Monday | Monday
Tuesday: A Changes | Tuesday | Saves A | Saves A
Wednesday: B Changes | Wednesday | Saves A + B | Saves B
Thursday: C Changes | Thursday | Saves A + B + C | Saves C
Friday: Full Backup | Friday | Friday | Friday
If a failure occurs on Thursday, what needs to be reloaded for Full, Differential, and Incremental backups?
Which methods take longer to back up? To reload?
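The reload question can be worked through with a minimal sketch. It assumes, as the table shows, that the Full column takes a complete backup every day, that the Differential and Incremental columns both start from Monday's full backup, and that Thursday's backup finished before the failure. The function name `restore_set` is illustrative only.

```python
from typing import List

# Days from the table; Monday is always a full backup.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday"]

def restore_set(method: str, last_backup_day: str) -> List[str]:
    """Return the backups to reload, in order, to recover to last_backup_day."""
    last = DAYS.index(last_backup_day)
    if method == "full":
        # Every day is a complete copy, so only the latest one is needed.
        return [last_backup_day]
    if method == "differential":
        # Each differential holds all changes since the full backup,
        # so reload the full backup plus the most recent differential.
        return DAYS[:1] if last == 0 else [DAYS[0], last_backup_day]
    if method == "incremental":
        # Each incremental holds only one day's changes, so reload the
        # full backup and every incremental after it, in order.
        return DAYS[: last + 1]
    raise ValueError(f"unknown backup method: {method}")

for method in ("full", "differential", "incremental"):
    print(method, restore_set(method, "Thursday"))
```

Read the other way round, incremental backups copy the least each night but need the most sets at reload time, while differential backups grow through the week but never need more than two sets.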
Tape, hard drive, cloud backup
Cloud backup can be used to create a DR system
Backup Rotation: Grandfather/Father/Son
Grandfather (monthly): Dec 2013, Jan 2014, Feb 2014, Mar 2014
Father (weekly): April 30, May 6, May 13, May 20
Son (daily): May 21, May 22, May 23, May 24, May 25, May 26, May 27
Backups graduate from Son to Father to Grandfather
Frequency of backup = daily, 3 generations
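A rotation scheme like this is usually driven by a simple date rule. The sketch below assumes a hypothetical policy: the last backup of each month is kept as a Grandfather, the backup taken on a chosen weekday is kept as a Father, and every other daily backup is a Son. Defaulting `weekly_day` to Friday is an assumption; the Father dates in the figure do not all fall on one weekday.

```python
import calendar
from datetime import date

def gfs_role(d: date, weekly_day: int = calendar.FRIDAY) -> str:
    """Classify a daily backup in a Grandfather/Father/Son rotation.

    Hypothetical policy: the last backup of each month is kept as a
    Grandfather, the backup taken on weekly_day is kept as a Father, and
    every other daily backup is a Son whose media is reused the next week.
    """
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    if d.day == days_in_month:
        return "Grandfather"
    if d.weekday() == weekly_day:
        return "Father"
    return "Son"

# One week of daily backups, matching the Son dates in the figure.
for day in range(21, 28):
    backup_date = date(2014, 5, day)
    print(backup_date.isoformat(), gfs_role(backup_date))
```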
Backup Labeling
Data Set Name = Master Inventory
Volume Serial # = 14.1.24.10
Date Created = Jan 24, 2014
Accounting Period = 3W-1Q-2014
Offsite Storage Bin # = Jan 2014
Backup could be disk…
Backup & Offsite Library
Backups are kept off-site (1 or more)
Off-site location is sufficiently far away (disaster-redundant)
Library is as secure as the main site
Library has constant environmental control
(humidity- and temperature-controlled, UPS, smoke/water detectors, fire extinguishers)
A detailed inventory of storage media & files is maintained
Why DRP?
What are the Threats?
When developing a disaster recovery plan you need to address more than
just hurricanes, earthquakes and fires as causes of outages.
A/C Failure
Acid Leak
Asbestos
Bomb Threat
Bomb Blast
Brown Out
Burst Pipe
Cable Cut
Chemical Spill
CO Fire
Condensation
Construction
Coolant Leak
Cooling Tower Leak
Corrupted Data
Diesel Generator
Earthquake
Electrical Short
Epidemic
Evacuation
Explosion
Fire
Flood
Fraud
Frozen Pipes
Hacker
Hail Storm
Halon Discharge
Human Error
Humidity
Hurricane
HVAC Failure
H/W Error
Ice Storm
Insects
Lightning
Logic Bomb
Lost Data
Low Voltage
Microwave Fade
Network Failure
PCB Contamination
Plane Crash
Power Outage
Power Spike
Power Surge
Programmer Error
Raw Sewage
Relocation Delay
Rodents
Roof Cave In
Sabotage
Shotgun Blast
Shredded Data
Sick Building
Smoke Damage
Snow Storm
Sprinkler Discharge
Static Electricity
Strike Action
S/W Error
S/W Ransom
Terrorism
Theft
Toilet Overflow
Tornado
Train Derailment
Transformer Fire
UPS Failure
Vandalism
Vehicle Crash
Virus
Water (Various)
Wind Storm
Volcano
Source: Contingency Planning Research, Inc.
IT DR Plan vs BCP
[Figure: layered diagram labeled Policy, BCP, Management, IT Infrastructure, DR]
[Figure: diagram relating the BCP and the DR Plan]
Creating a DR Plan: Elements of a DR Plan
Risk identification – risk register and matrix
Assess vulnerability to these risks – Business Impact Analysis
Determine impact on the business – Business Impact Analysis
Identify critical business functions/IT services – service categories and technology dependency mapping
Design and implement mitigation strategies – putting the capability in place
Agree activation plans – write the runbook
Testing – agree testing, documentation and KPIs
Ongoing changes and maintenance – keeping the DR Plan up to date
Probability/Impact
Risk Score for a Specific Risk: Risk Score = P x I
Probability \ Impact | 1.00 | 2.00 | 3.00 | 4.00 | 5.00
0.9 | 0.90 | 1.80 | 2.70 | 3.60 | 4.50
0.7 | 0.70 | 1.40 | 2.10 | 2.80 | 3.50
0.5 | 0.50 | 1.00 | 1.50 | 2.00 | 2.50
0.3 | 0.30 | 0.60 | 0.90 | 1.20 | 1.50
0.1 | 0.10 | 0.20 | 0.30 | 0.40 | 0.50
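Because every cell is just P x I, the matrix can be regenerated from its two scales. This is a minimal sketch using the probability rows and impact ratings of the matrix above; the function names are illustrative.

```python
from typing import List

# Scales from the matrix: five probability rows and impact ratings 1 to 5.
PROBABILITIES = [0.9, 0.7, 0.5, 0.3, 0.1]
IMPACTS = [1, 2, 3, 4, 5]

def risk_score(probability: float, impact: float) -> float:
    """Risk Score = P x I, rounded to two decimals as in the matrix."""
    return round(probability * impact, 2)

def risk_matrix() -> List[List[float]]:
    """One row per probability, one column per impact rating."""
    return [[risk_score(p, i) for i in IMPACTS] for p in PROBABILITIES]

for p, row in zip(PROBABILITIES, risk_matrix()):
    print(f"{p:.1f} | " + " | ".join(f"{s:.2f}" for s in row))
```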
Risk Response Template
Project Name:
Prepared by:
Date:
Identified Risk | Statement of Impact | Probability of Occurring (A) | Impact Rating (B) | Risk Priority Number (A x B) | Mitigation Action
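One way to keep the template in code is sketched below. It is a minimal sketch, assuming A is a probability from 0 to 1 and B an impact rating from 1 to 5 as in the risk matrix. The `RiskEntry` class and the sample rows are hypothetical and only illustrate sorting the register by priority number so that mitigation effort goes to the highest A x B first.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RiskEntry:
    risk: str              # Identified Risk
    impact_statement: str  # Statement of Impact
    probability: float     # A: Probability of Occurring (0.0 to 1.0)
    impact_rating: int     # B: Impact Rating (1 to 5)
    mitigation: str        # Mitigation Action

    @property
    def priority(self) -> float:
        """Risk Priority Number = A x B."""
        return round(self.probability * self.impact_rating, 2)

def prioritize(register: List[RiskEntry]) -> List[RiskEntry]:
    """Return the register ordered from highest to lowest priority."""
    return sorted(register, key=lambda entry: entry.priority, reverse=True)

# Hypothetical register rows, for illustration only.
register = [
    RiskEntry("Power outage", "Data center offline", 0.3, 4, "UPS and diesel generator"),
    RiskEntry("Ransomware", "Files encrypted", 0.5, 5, "Offline, off-site backups"),
    RiskEntry("Burst pipe", "Server room flooded", 0.1, 3, "Water detectors"),
]
for entry in prioritize(register):
    print(f"{entry.priority:.2f}  {entry.risk}: {entry.mitigation}")
```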
Business Impact Analysis
• Identify main business functions
• Identify major activities of each function
• Identify dependencies for all major
activities
• Quantify the consequences of losing these prerequisites
• IT Perspective of BIA
- Driven by the business perspective of the
“must-have” applications
Business Impact Analysis
• Categorize Applications
- Essential to organization’s ability to operate
- Significantly reduces the organization’s
capabilities
- Useful, but not important in the short term
• Restore based upon categories
Business Impact Analysis
Example of State Government
• 1 CRITICAL
- Loss of this business function threatens the ability of the state to operate. Loss of the business function disrupts the security and well-being of the state.
• 2 SIGNIFICANT
- Loss of these business functions significantly reduces the effectiveness of the state's operations. Loss of the business functions has a negative citizen impact and affects the financial well-being of the state.
• 3 MODERATE
- Loss of the business function affects multiple state agencies/school districts and their ability to operate. Loss of the business function has a negative citizen impact.
• 4 LIMITED
- The impact of losing these business functions is limited to the person or department using the application. Loss of this business function has little or no effect on the state's ability to carry on business.
• 5 MINIMAL
- Loss of the business function does not have a direct impact on the department's ability to do business.
Business Impact Analysis
Online Sale of Licenses
Business Function | Supporting Application | Number of Users | Impact Rating (2 days, 1 week, 1 month, 3 months)
…hing & Hunting licenses, …f boat registration online to… | DNREC eGov Application | 40,000 | 5, 5, 4
…rk passes, surf fishing licenses, …handise online | DNREC eGov Application | 10,000 | 5, 5, 4
…f Waste Water Licenses Online | DNREC eGov Application | 1,000 | 5, 5, 4
…tdoor Delaware magazine | DNREC eGov Application | 10,000 | 5, 5, 5
…n Asbestos Report Online | DNREC eGov Application | 1,000 | 5, 3, 3
Recovery Strategies
• DR 1 - DR required at a minimum 150-mile radius. Offsite redundancy required.
• DR 2 - DR required at a minimum 150-mile radius. Offsite redundancy recommended.
• DR 3 - DR required. May be housed offsite at
DTI Data Center (under 150 mile radius) or other
facility.
• DR 4 - DR not required unless specified at the
department level.
• DR 5 - DR not required.
Recovery Strategies
Impact Level | Redundancy | DR Over 150 mi. | DR Under 150 mi.
1 | Required | Required | -
2 | Optional | Required | -
3 | - | - | Required
4 | - | - | Optional
5 | - | - | -
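The DR 1 to DR 5 rules can be captured as a lookup plus a distance check. This is a minimal sketch, assuming (as the table's Impact Level column suggests) that the DR tier number equals the BIA impact level. The names `DR_TIERS` and `dr_site_acceptable` are hypothetical.

```python
# DR tiers as stated on the Recovery Strategies slides (DR 1 to DR 5).
DR_TIERS = {
    1: "DR required at a minimum 150-mile radius; offsite redundancy required",
    2: "DR required at a minimum 150-mile radius; offsite redundancy recommended",
    3: "DR required; may be under 150 miles (e.g. DTI Data Center) or another facility",
    4: "DR not required unless specified at the department level",
    5: "DR not required",
}

MIN_REMOTE_MILES = 150

def dr_site_acceptable(tier: int, site_distance_miles: float) -> bool:
    """Check whether a proposed DR site distance satisfies a tier's rule."""
    if tier not in DR_TIERS:
        raise ValueError(f"unknown DR tier: {tier}")
    if tier in (1, 2):
        return site_distance_miles >= MIN_REMOTE_MILES
    # Tiers 3 to 5 place no minimum distance on the DR site.
    return True

print(DR_TIERS[2])
print(dr_site_acceptable(2, 80))
```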
Business Continuity Plan(BCP)
Definition:
The plan to ensure that the capability exists to continue essential business functions across a wide range of potential emergencies.
Essentially, the process of maintaining the essential
business of an organization in case of a disaster.
What is a BCP? What are the Aspects of a BCP?
ROADMAP FOR IMPLEMENTATION AND MANAGEMENT OF THE CONTINUATION OF BUSINESS OPERATIONS
ENSURES COMMON UNDERSTANDING OF ROLES AND RESPONSIBILITIES
EXPLAINS IMPORTANT PROCEDURES AND PROCESSES REQUIRED FOR EMERGENCY MANAGEMENT
ACCREDITATION PROGRAM
Some Definitions
Business Continuity: Offer critical services in event of
disruption
Disaster Recovery: Survive interruption to IT systems
Alternate Process Mode: Service offered by backup
system
Disaster Recovery Plan (DRP): How to transition to
Alternate Process Mode
Restoration Plan: How to return to regular system
mode
Reciprocal Agreement with another organization
MAIN ELEMENTS OF BCP
MAIN STEPS:
PROJECT INITIATION
SCOPE PROJECT
IDENTIFICATION OF ESSENTIAL FUNCTIONS
BIA
IDENTIFY PREVENTIVE CONTROLS
RECOVERY STRATEGY
PLAN DESIGN AND DEVELOPMENT
IMPLEMENTATION, TRAINING AND TESTING
MAINTENANCE
ALSO TO INCLUDE:
BCP PLAN & PROCEDURES
COMMUNICATIONS
VITAL RECORDS, SYSTEMS & EQUIPMENT
HUMAN CAPITAL
ALTERNATE FACILITIES
ORDERS OF SUCCESSION
DELEGATION OF AUTHORITY
DEVOLUTION PROCEDURES
RECONSTITUTION
TESTS, TRAINING & EXERCISES
MAINTENANCE
MAIN STEPS IN BCP DEVELOPMENT
Develop contingency planning policy statement
A formal policy provides the authority and guidance to develop an effective contingency plan
Conduct BIA
Helps to identify and prioritize critical IT components
Identify preventive controls
Measures taken to reduce the effects of system disruption can increase system availability and reduce contingency life cycle costs
Develop recovery strategies
To ensure the system can be recovered quickly and effectively following a disruption
MAIN STEPS IN BCP DEVELOPMENT (contd.)
Develop an IT contingency plan
To contain detailed guidance and procedures for restoring damaged systems
Plan testing, training and exercises
Testing identifies gaps; training prepares recovery personnel; both improve agency preparedness
Plan maintenance
The plan should be a living document and must be updated to remain current with system enhancements and organizational changes
BCP Goals
ENSURE CONTINUOUS PERFORMANCE OF ESSENTIAL FUNCTIONS
& OPERATIONS
PROTECT ESSENTIAL FACILITIES, EQUIPMENT, RECORDS, OTHER
ASSETS
REDUCE OR MITIGATE DISRUPTIONS TO BUSINESS OPERATIONS
MINIMIZE LOSS, INJURY, DAMAGE, LOSS OF LIFE
ACHIEVE TIMELY & ORDERLY RECOVERY
RESUME CRITICAL SERVICE TO CUSTOMERS
PLAN FOR MAINTAINING FAMILY HARMONY OF STAFF
RECONSTITUTION
BCP Requirements
IDENTIFY & PRIORITIZE ESSENTIAL FUNCTIONS OF THE
ORGANIZATION
DETERMINE RESOURCES REQUIRED BY ESSENTIAL
FUNCTIONS
ENSURE STAFF ARE TRAINED AND FUNCTIONAL IN PRIMARY
AND ALTERNATE ROLES & DUTIES
ASSESS ABILITY TO QUICKLY MOVE FROM PRIMARY LOCATION TO ALTERNATE LOCATION WITHIN 12 HOURS OF NOTIFICATION
ASSESS ABILITY TO CONTINUE EFFECTIVE OPERATIONS AT
ALTERNATE LOCATION FOR UP TO 30 DAYS
IDENTIFY ESSENTIAL FUNCTIONS
• IDENTIFY ALL FUNCTIONS
• IDENTIFY ESSENTIAL FUNCTIONS THAT PROVIDE VITAL SERVICES
• PRIORITIZE THE FUNCTIONS IN ORDER OF CRITICALITY
• PRIORITIZE ESSENTIAL FUNCTIONS BY:
- TIME CRITICALITY: HOW LONG BEFORE LOSS OF FUNCTION ADVERSELY AFFECTS CORE MISSION?
- RECOVERY TIME OBJECTIVE: PERIOD WITHIN WHICH SYSTEMS, PROCESSES, SERVICES, FUNCTIONS MUST BE RECOVERED
- CRITICAL SEQUENCE FOR RECOVERY
- RELATED CRITICAL PROCESSES AND SERVICES
• DETERMINE ESSENTIAL FUNCTION RESOURCE REQUIREMENTS
- ARE THERE KEY PERSONNEL ASSOCIATED WITH CERTAIN CRITICAL FUNCTIONS?
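A critical sequence for recovery follows from the dependencies among functions: anything a function relies on must come back first. A minimal sketch using Python's standard `graphlib.TopologicalSorter` (Python 3.9+) is shown below; the dependency map is hypothetical and only illustrates the ordering.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

def recovery_sequence(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order functions so every prerequisite is recovered before what needs it.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())

# Hypothetical dependency map: function -> what must be running first.
deps = {
    "Online license sales": {"Payment gateway", "eGov application"},
    "eGov application": {"Database", "Network"},
    "Payment gateway": {"Network"},
    "Database": {"Storage"},
}
order = recovery_sequence(deps)
print(order)
```

Several valid orders usually exist; the sorter only guarantees that prerequisites come first, so time criticality can still be used to break ties among independent functions.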
IDENTIFY ESSENTIAL FUNCTIONS
HOW ESSENTIAL IS THE FUNCTION? LOSS WOULD
HAVE WHICH EFFECT?
CATASTROPHIC EFFECT ON ENTIRE ORGANIZATION AND
OTHER DEPTS
CATASTROPHIC EFFECT ON OWN DEPARTMENT ONLY
MODERATE EFFECT ON DEPARTMENT
MODERATE EFFECT ON SOME DIVISIONS
MINOR EFFECT ON DEPT OR SOME DIVS
IDENTIFY ESSENTIAL FUNCTIONS
HOW LONG COULD THE ORGANIZATION CONTINUE
OPERATIONS WITH THE LOSS OF THE FUNCTION?
CANNOT WITHSTAND ANY INTERRUPTION
A FEW HOURS
UP TO ONE DAY
UP TO TWO DAYS
THREE DAYS TO A WEEK
ONE TO TWO WEEKS
Recovery Time: Terms
Interruption Window: Length of time the organization can wait between the point of failure and service resumption
Service Delivery Objective (SDO): Level of service to be provided in Alternate Mode
Maximum Tolerable Outage: Maximum time the organization can operate in Alternate Mode
[Figure: timeline from Regular Service through the Interruption (Disaster Recovery Plan implemented) and Alternate Mode at the SDO, back to Regular Service (Restoration Plan implemented); the Interruption Window and the Maximum Tolerable Outage are marked along the time axis]
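The two time limits defined above can be checked against a planned recovery timeline. This is a minimal sketch, assuming all times are in hours from the point of failure; the `RecoveryTimeline` class and its figures are hypothetical, and the SDO (level of service) is not modeled.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RecoveryTimeline:
    # Hours measured from the point of failure (all figures hypothetical).
    alternate_mode_start: float   # DRP implemented: Alternate Mode is running
    regular_service_back: float   # Restoration Plan implemented: Regular Service resumes
    interruption_window: float    # longest the organization can wait for service
    max_tolerable_outage: float   # longest it can stay in Alternate Mode

    def problems(self) -> List[str]:
        issues = []
        if self.alternate_mode_start > self.interruption_window:
            issues.append("Alternate Mode starts after the interruption window closes")
        if self.regular_service_back - self.alternate_mode_start > self.max_tolerable_outage:
            issues.append("time in Alternate Mode exceeds the maximum tolerable outage")
        return issues

plan = RecoveryTimeline(alternate_mode_start=8, regular_service_back=200,
                        interruption_window=12, max_tolerable_outage=240)
print(plan.problems() or "timeline fits both limits")
```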
RPO and RTO
Recovery Point Objective
[Figure: scale marking 1 Week, 1 Day, and 1 Hour before the Interruption]
How far back can you afford to restore to? Can you lose one week's worth of data?
Recovery Time Objective
[Figure: scale marking 1 Hour, 1 Day, and 1 Week after the Interruption]
How long can you operate without a system?
Which services can last how long?
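RPO and RTO turn into two simple checks per service. This is a minimal sketch, assuming the worst case for RPO is a failure just before the next backup (so one full backup interval of data is lost) and ignoring how long the backup itself takes. The service names and durations are hypothetical.

```python
from datetime import timedelta
from typing import Dict, List, Tuple

def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    # Worst case: failure just before the next backup loses one full interval.
    return backup_interval <= rpo

def meets_rto(estimated_recovery: timedelta, rto: timedelta) -> bool:
    return estimated_recovery <= rto

# Hypothetical services: (backup interval, estimated recovery time, RPO, RTO).
SERVICES: Dict[str, Tuple[timedelta, timedelta, timedelta, timedelta]] = {
    "Order entry": (timedelta(hours=1), timedelta(hours=4), timedelta(hours=1), timedelta(hours=8)),
    "Payroll": (timedelta(days=1), timedelta(days=2), timedelta(days=1), timedelta(days=1)),
    "Intranet wiki": (timedelta(weeks=1), timedelta(days=3), timedelta(weeks=1), timedelta(weeks=1)),
}

def gaps(services: Dict[str, Tuple[timedelta, timedelta, timedelta, timedelta]]) -> List[str]:
    found = []
    for name, (interval, recovery, rpo, rto) in services.items():
        if not meets_rpo(interval, rpo):
            found.append(f"{name}: backups too infrequent for RPO")
        if not meets_rto(recovery, rto):
            found.append(f"{name}: recovery slower than RTO")
    return found

print(gaps(SERVICES))
```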
Alternate Recovery Strategies
Duplicate or Redundant Site: Standby hot site within the
organization which can resume operations instantaneously
without anybody noticing there was an interruption
Hot Site: Fully configured, ready to operate within hours
Warm Site: Ready to operate within days: no or low power
for main computers. Does contain disks, network,
peripherals.
Cold Site: Ready to operate within weeks. Contains
electrical wiring, air conditioning, flooring – no servers
Mobile Site: Fully- or partially-configured trailer comes to
your site, with microwave or satellite communications
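Choosing among these alternatives amounts to picking the least expensive site that is ready within the RTO. The sketch below assumes hypothetical readiness limits for "within hours", "within days", and "within weeks", and assumes cost rises from cold to redundant sites. The mobile site is left out because its readiness depends on the provider.

```python
from typing import List, Tuple

# Hypothetical worst-case hours to become operational, ordered from
# least to most expensive alternative.
SITE_READY_HOURS: List[Tuple[str, float]] = [
    ("Cold site", 21 * 24),    # "within weeks": assume up to 3 weeks
    ("Warm site", 3 * 24),     # "within days": assume up to 3 days
    ("Hot site", 12),          # "within hours": assume up to 12 hours
    ("Redundant site", 0),     # resumes essentially instantaneously
]

def cheapest_site(rto_hours: float) -> str:
    """Return the least expensive alternative that is ready within the RTO."""
    for name, ready_hours in SITE_READY_HOURS:
        if ready_hours <= rto_hours:
            return name
    raise ValueError("no listed alternative meets this RTO")

print(cheapest_site(48))
```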
Disruption vs. Recovery Costs
[Figure: cost plotted against service downtime (time), with Hot Site, Warm Site, and Cold Site marked as alternative recovery strategies and the minimum-cost point indicated]
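The minimum-cost point is where disruption cost (which grows with downtime) plus recovery-strategy cost is lowest. This is a minimal sketch, assuming a flat downtime cost per hour; the option names match the figure, but every number is hypothetical.

```python
from typing import Dict, Tuple

def total_cost(downtime_hours: float, recovery_cost: float, cost_per_hour: float) -> float:
    """Disruption cost (grows with downtime) plus the cost of the recovery option."""
    return downtime_hours * cost_per_hour + recovery_cost

def minimum_cost_option(options: Dict[str, Tuple[float, float]], cost_per_hour: float) -> str:
    """Pick the option with the lowest combined disruption and recovery cost."""
    return min(options, key=lambda name: total_cost(*options[name], cost_per_hour))

# Hypothetical figures: (expected downtime in hours, cost of the option).
OPTIONS = {
    "Hot site": (8, 500_000),
    "Warm site": (72, 150_000),
    "Cold site": (500, 40_000),
}
for rate in (100, 2_000, 10_000):
    print(rate, minimum_cost_option(OPTIONS, rate))
```

As the downtime cost per hour rises, the cheapest overall choice moves from the cold site toward the hot site, which is the trade-off the figure illustrates.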
Business Continuity Process
Perform Business Impact Analysis
Prioritize services to support critical business
processes
Determine alternate processing modes for
critical and vital services
Develop the Disaster Recovery plan for IS
systems recovery
Develop BCP for business operations recovery
and continuation
Test the plans
Maintain plans
An Incident Occurs…
Emergency Response Team: human life is the first concern
Call the Security Officer (SO) or a committee member
Security Officer declares disaster
SO follows pre-established protocol
Phone tree notifies relevant participants
Public relations interfaces with media (everyone else quiet)
Mgmt, legal counsel act
IT follows Disaster Recovery Plan
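A phone tree is a breadth-first fan-out: each person called calls the next few. This is a minimal sketch that groups people by call round. The tree contents are hypothetical, and anyone listed twice is skipped so they are not called again.

```python
from collections import deque
from typing import Dict, List

def call_rounds(tree: Dict[str, List[str]], start: str) -> List[List[str]]:
    """Group people by the round of calls in which the phone tree reaches them."""
    rounds: List[List[str]] = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == len(rounds):
            rounds.append([])
        rounds[depth].append(person)
        for callee in tree.get(person, []):
            if callee not in seen:  # skip anyone already called
                seen.add(callee)
                queue.append((callee, depth + 1))
    return rounds

# Hypothetical tree: each person calls the people listed under their name.
PHONE_TREE = {
    "Security Officer": ["IT lead", "PR lead", "Legal counsel"],
    "IT lead": ["DBA", "Network admin"],
    "PR lead": ["Media desk"],
}
for n, names in enumerate(call_rounds(PHONE_TREE, "Security Officer")):
    print(f"Round {n}: {', '.join(names)}")
```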
BCP Contents
Pre-incident readiness
How to declare a disaster
Evacuation procedures
Identifying persons responsible, contact
information
vendors, insurance, recovery facilities, suppliers,
offsite media, human relations, law enforcement
(for serious security threat)
Step-by-step procedures
Required resources for recovery & continued
operations
Concerns for a BCP/DR Plan
Evacuation plan: People’s lives always take first
priority
Disaster declaration: Who, how, for what?
Responsibility: Who covers necessary disaster
recovery functions
Procedures for Disaster Recovery
Procedures for Alternate Mode operation
Resource Allocation:
During recovery & continued
operation
Copies of the plan should be off-site
Disaster Recovery
Responsibilities
General Business
First responder:
Evacuation, fire, health…
Damage Assessment
Emergency Mgmt
Legal Affairs
Transportation/Relocation/Coordination (people, equipment)
Supplies
Salvage
Training
IT-Specific Functions
Software
Application
Emergency operations
Network recovery
Hardware
Database/Data Entry
Information Security
Contact information is
important!
BCP Documents
[Figure focus labels: Event, Recovery, IT, Business, Business Continuity]
Disaster Recovery Plan: Procedures to recover at alternate site
Business Recovery Plan: Recover business after a disaster
IT Contingency Plan: Recovers major application or system
Occupant Emergency Plan: Protect life and assets during physical threat
Cyber Incident Response Plan: Malicious cyber incident
Crisis Communication Plan: Provide status reports to public and personnel
Business Continuity Plan
Continuity of Operations Plan (COOP): Longer duration outages
Disaster Recovery
Test Execution
Always tested in this order:
Desk-Based Evaluation/Paper Test: A
group steps through a paper procedure and
mentally performs each step.
Preparedness Test: Part of the full test is
performed. Different parts are tested
regularly.
Full Operational Test: Simulation of a full
disaster
Business Continuity Test Types
Checklist Review: Reviews coverage of plan – are all
important concerns covered?
Structured Walkthrough: Reviews all aspects of plan, often
walking through different scenarios
Simulation Test: Execute plan based upon a specific
scenario, without alternate site
Parallel Test: Bring up alternate off-site facility, without
bringing down regular site
Full-Interruption: Move processing from regular site to
alternate site.
Testing Objectives
Main objective: existing plans will result in
successful recovery of infrastructure & business
processes
Also can:
• Identify gaps or errors
• Verify assumptions
• Test time lines
• Train and coordinate staff
Testing Procedures
Develop test
objectives
Execute Test
Tests start simple and
become more challenging
with progress
Include an independent
3rd party (e.g. auditor) to
observe test
Retain documentation for
audit reviews
Evaluate Test
Develop recommendations
to improve test effectiveness
Follow up to ensure recommendations are implemented
Test Stages
Pre-Test: Set the stage
Set up equipment
Prepare staff
Test: Actual test
Post-Test: Cleanup
Returning resources
Calculate metrics: time required, % success rate in processing, ratio of successful transactions in Alternate mode vs. normal mode
Delete test data
Evaluate plan
Implement improvements
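The post-test metrics listed above are simple ratios over counts gathered during the test. This is a minimal sketch; the `DrTestRun` class, its field names, and the sample counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DrTestRun:
    # Raw counts collected during the test (field names are hypothetical).
    hours_required: float       # elapsed time to complete the test
    jobs_attempted: int         # processing jobs run at the alternate site
    jobs_succeeded: int
    alt_mode_successes: int     # successful transactions in Alternate mode
    normal_mode_successes: int  # successful transactions in a comparable normal-mode period

    def success_rate_pct(self) -> float:
        return 100.0 * self.jobs_succeeded / self.jobs_attempted

    def alt_vs_normal_ratio(self) -> float:
        return self.alt_mode_successes / self.normal_mode_successes

run = DrTestRun(hours_required=6.5, jobs_attempted=1000, jobs_succeeded=950,
                alt_mode_successes=900, normal_mode_successes=1000)
print(f"time {run.hours_required} h, success {run.success_rate_pct():.1f}%, "
      f"alt/normal {run.alt_vs_normal_ratio():.2f}")
```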
Gap Analysis
Comparing Current Level with Desired Level
Which processes need to be improved?
Where is staff or equipment lacking?
Where does additional coordination need to
occur?
Insurance
Data Center & Equipment Damage
- IS Equipment & Facilities: Loss of Data Center & equipment due to damage
- Business Interruption: Loss of profit due to IS interruption
- Extra Expense: Extra cost of operation following Data Center damage
Data & Media
- Valuable Papers & Records: Covers cash value of lost/damaged paper & records
- Media Reconstruction: Cost of reproduction of media
- Media Transportation: Loss of data during transport
Employee
- Fidelity Coverage: Loss from dishonest employees
- Errors & Omissions: Liability for error resulting in loss to client
Auditing BCP
Includes:
Is BIA complete with RPO/RTO defined for all services?
Is the BCP in-line with business goals, effective, and current?
Is it clear who does what in the BCP and DRP?
Is everyone trained, competent, and happy with their jobs?
Is the DRP detailed, maintained, and tested?
Are the BCP and DRP consistent in their recovery coverage?
Are the people listed in the BCP/phone tree current, and do they have a copy of the BC manual?
Are the backup/recovery procedures being followed?
Does the hot site have correct copies of all software?
Is the backup site maintained to expectations, and are the
expectations effective?
Was the DRP test documented well, and was the DRP updated?
BUSINESS OPERATIONS
BCP Checklist
Compile a list of your company’s locations and the departments, people, IT and non-IT assets
within each of those locations.
Assess which dependencies feed into and are produced from your operational processes.
Consider any rules and regulations governing your business operations (Sarbanes-Oxley, HIPAA, FFIEC, etc.).
Determine the minimum level at which your business can operate, and then identify which
departments and/or processes need to be restored first after an interruption.
Evaluate the minimum resources needed to keep your critical processes running. Establish
your recovery time objectives (RTOs).
Create a list of any natural and everyday disasters that could affect your business.
Classify events as high, medium or low likelihood. Create fully detailed plans for high-likelihood events. As event likelihood decreases, plans can become more general, but you should plan for every possible event.
Confirm scope of insurance coverage.
Store an off-site copy of your business continuity plan in a secure, disaster-proof location.
Communicate your plans with vendors, suppliers, employees, partners, etc.
COST ANALYSIS
BCP Checklist
Quantify the potential costs of downtime or a total business failure.
Assess the cost of downtime per hour for each department.
Weigh the cost of downtime versus the cost of specific recovery solutions.
STAFF
Have cash on hand for emergency payroll.
Compile a list of employee cellphone numbers for emergency communication.
Appoint recovery team members and emergency communication points of contact for each department.
Assemble an emergency preparedness kit including the following:
• Employee information
• Supplier and vendor contact information
• Disaster recovery vendor contact information
• Flashlight
• First aid kit
• Battery-operated radio to stay updated on emergency situations
Create and communicate an evacuation plan.
Communicate safety tips and emergency shelter plans.
BCP Checklist
RECOVERY LOCATIONS
Plan for short-term and long-term alternate locations.
Determine if it’s necessary to remain local for your customers or if working
remotely is an option.
If working remotely, consider whether or not you will provide accommodations for
employees’ families. If considering mobile recovery, request a site survey to assess
potential deployment areas.
NETWORK RECOVERY
If the Internet goes down, determine how your employees will gain access to the network.
Assess the different available voice continuity solutions and determine which
works best for your business’s needs.
Determine what equipment will be necessary to access the network (laptops,
computers, printers, mice, monitors, etc.) and arrange for it to be delivered
within your RTO.
BCP Checklist
VOICE RECOVERY
Determine what communications solutions will work best for your
company to restore voice connectivity. Reroute phone lines to
seamlessly answer customer calls.
Arrange for communications equipment (VoIP phones, headsets, etc.)
to arrive within your RTO.
DATA BACKUP AND RECOVERY
Identify critical data that needs to be backed up.
Assess data backup solutions (tape, deduplication, cloud, etc.)
Determine where data will be stored off-site.
Determine backup intervals for both critical and noncritical data.
Arrange for quick ship of equipment on which you can access your
data.
BCP Checklist
TESTING
Perform ongoing evaluations of business continuity
procedures to test for suitability, adequacy and
effectiveness.
Schedule tests at least once a year (be sure to include
upper management and all critical
departments/employees in any DR tests).
Ask for employee feedback following the test.
Assess the results of the tests and adjust your DR plan
accordingly.
Communicate any changes to the DR point people
throughout the company.