University of Nairobi Explain how Does the Newton Raphson Method Work Chapter Ques

User Generated

Nypbby

Programming

University of Nairobi

Description

Assignment

Instructions:

For this assignment submit a single file, MS WORD in addition to your Python codes with descriptions. The Python codes need to have comments. Please see the book for reference if needed.

Format Requirements for all assignments:

  • No page limits
  • Single spaced, 12-point or larger font size; 1-inch margins
  • Use headers and/or bullets to organize and convey key elements, and page numbers
  • Only Latin alphabet characters are allowed (i.e., do not include any words or phrases that contain non-English characters)
  • Comment all the codes provided and answer the questions in the document as needed

Requirements: As needed

Unformatted Attachment Preview

Hands-On Simulation Modeling with Python Develop simulation models to get accurate results and enhance decision-making processes Giuseppe Ciaburro BIRMINGHAM—MUMBAI Hands-On Simulation Modeling with Python Copyright © 2020 Packt Publishing All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews. Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book. Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information. Commissioning Editor: Sunith Shetty Acquisition Editor: Devika Battike Senior Editor: David Sugarman Content Development Editor: Joseph Sunil Technical Editor: Sonam Pandey Copy Editor: Safis Editing Project Coordinator: Aishwarya Mohan Proofreader: Safis Editing Indexer: Manju Arasan Production Designer: Joshua Misquitta First published: July 2020 Production reference: 1160720 Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK. ISBN 978-1-83898-509-7 www.packt.com Packt.com Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website. Why subscribe? • Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals • Improve your learning with Skill Plans built especially for you • Get a free eBook or video every month • Fully searchable for easy access to vital information • Copy and paste, print, and bookmark content Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at customercare@packtpub.com for more details. At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks. Contributors About the author Giuseppe Ciaburro holds a PhD in environmental technical physics, along with two master’s degrees. His research was focused on machine learning applications in the study of urban sound environments. He works at the Built Environment Control Laboratory at the Università degli Studi della Campania Luigi Vanvitelli, Italy. He has over 18 years’ professional experience in programming (Python, R, and MATLAB), first in the field of combustion, and then in acoustics and noise control. He has several publications to his credit. About the reviewers Greg Walters has been involved with computers and computer programming since 1972. He is well versed in Visual Basic, Visual Basic.NET, Python, and SQL, and is an accomplished user of MySQL, SQLite, Microsoft SQL Server, Oracle, C++, Delphi, Modula-2, Pascal, C, 80x86 Assembler, COBOL, and Fortran. He is a programming trainer and has trained numerous individuals in many pieces of computer software, including MySQL, Open Database Connectivity, Quattro Pro, Corel Draw!, Paradox, Microsoft Word, Excel, DOS, Windows 3.11, Windows for Workgroups, Windows 95, Windows NT, Windows 2000, Windows XP, and Linux. He is currently retired and, in his spare time, is a musician and loves to cook. He is also open to working as a freelancer on various projects. Yoon Hyup Hwang is a seasoned data scientist in the marketing and finance industries and is the author of two applied machine learning books. He has almost a decade of experience building numerous machine learning models and data science products. He holds an M.S.E. in Computer and Information Technology from the University of Pennsylvania and a B.A. in Economics from the University of Chicago. He enjoys training various martial arts, snowboarding, and roasting coffee. He currently lives in the Greater New York Area with his artist wife, Sunyoung, and a playful dog, Dali (named after Salvador Dali). Packt is searching for authors like you If you’re interested in becoming an author for Packt, please visit authors. packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea. Table of Contents Preface Section 1: Getting Started with Numerical Simulation 1 Introducing Simulation Models Introducing simulation models  4 Decision-making workflow Comparing modeling and simulation Pros and cons of simulation modeling Simulation modeling terminology 5 6 6 7 Classifying simulation models 9 Comparing static and dynamic models 9 Comparing deterministic and stochastic models 9 Comparing continuous and discrete models10 Approaching a simulationbased problem 10 Problem analysis Data collection Setting up the simulation model Simulation software selection Verification of the software solution Validation of the simulation model Simulation and analysis of results 11 11 11 13 14 15 16 Dynamical systems modeling 16 Managing workshop machinery Simple harmonic oscillator Predator-prey model 17 18 20 Summary22 2 Understanding Randomness and Random Numbers Technical requirements  Stochastic processes 24 24 Types of stochastic process 25 Examples of stochastic processes The Bernoulli process Random walk 26 26 27 The Poisson process 29 Random number simulation 30 Probability distribution Properties of random numbers 31 32 The pseudorandom number generator33 The pros and cons of a random number generator 33 Random number generation algorithms 34 Linear congruential generator 34 Random numbers with uniform distribution37 Lagged Fibonacci generator 39 Testing uniform distribution 42 The chi-squared test Uniformity test 42 45 Exploring generic methods for random distributions 51 The inverse transform sampling method 51 The acceptance-rejection method 52 Random number generation using Python 53 Introducing the random module The random.random() function The random.seed() function The random.uniform() function The random.randint() function The random.choice() function The random.sample() function Generating real-valued distributions 54 54 55 56 56 57 58 58 Summary 59 3 Probability and Data Generation Processes Technical requirements  62 Explaining probability concepts  62 Exploring probability distributions69 Understanding Bayes’ theorem 66 Probability density function Mean and variance Uniform distribution Binomial distribution Normal distribution Compound probability Bayes’ theorem Summary83 Types of events Calculating probability Probability definition with an example 62 63 63 66 68 Section 2: Simulation Modeling Algorithms and Techniques 70 71 72 76 79 4 Exploring Monte Carlo Simulations Technical requirements  88 Introducing Monte Carlo simulation88 Monte Carlo components First Monte Carlo application Monte Carlo applications  Applying the Monte Carlo method for Pi estimation 89 89 90 91 Understanding the central limit theorem96 Law of large numbers Central limit theorem  96 97 simulation101 Generating probability distributions Numerical optimization Project management Performing numerical integration using Monte Carlo Defining the problem Numerical solution Min-max detection Monte Carlo method Visual representation 101 102 103 104 104 106 108 109 111 Summary113 Applying Monte Carlo 5 Simulation-Based Markov Decision Processes Technical requirements  116 Overview of Markov processes 116 The Bellman equation explained138 The agent-environment interface Exploring MDPs Understanding the discounted cumulative reward  Comparing exploration and exploitation concepts 117 119 Multi-agent simulation 140 Summary142 Introducing Markov chains 124 Transition matrix Transition diagram Markov chain applications 122 123 125 126 127 Introducing random walks 127 Simulating a one-dimensional random walk129 Simulating a weather forecast 132 Dynamic programming concepts  Principle of optimality The Bellman equation 139 139 140 6 Resampling Methods Technical requirements  Introducing resampling methods  Sampling concepts overview Reasoning about sampling  Pros and cons of sampling Probability sampling How sampling works 144 144 145 146 146 147 147 Exploring the Jackknife technique148 Defining the Jackknife method 148 Estimating the coefficient of variation  150 Applying Jackknife resampling using Python 151 Demystifying bootstrapping  156 Introducing bootstrapping Bootstrap definition problem Bootstrap resampling using Python Comparing Jackknife and bootstrap 156 157 158 161 Explaining permutation tests 162 Approaching cross-validation techniques163 The validation set approach Leave-one-out cross validation K-fold cross validation Cross-validation using Python 163 164 165 165 Summary168 7 Using Simulation to Improve and Optimize Systems Technical requirements  Introducing numerical optimization techniques 170 170 Defining an optimization problem 171 Explaining local optimality  173 Defining the descent methods 174 Approaching the gradient descent algorithm174 Understanding the learning rate 177 Explaining the trial and error method 178 Implementing gradient descent in Python178 Facing the Newton-Raphson method183 Using the Newton-Raphson algorithm for root-finding  183 Approaching Newton-Raphson for numerical optimization Applying the Newton-Raphson technique  184 185 Deepening our knowledge of stochastic gradient descent 189 Discovering the multivariate optimization methods in Python191 The Nelder–Mead method 191 Powell's conjugate direction algorithm 195 Summarizing other optimization methodologies197 Summary198 Section 3: Real-World Applications 8 Using Simulation Models for Financial Engineering Technical requirements  Understanding the geometric Brownian motion model  202 202 Defining a standard Brownian motion 203 Addressing the Wiener process as random walk 204 Implementing a standard Brownian motion 205 Handling the stock price trend as time series213 Introducing the Black-Scholes model 215 Applying Monte Carlo simulation 216 Studying risk models for portfolio management 220 Using Monte Carlo methods for stock price prediction 207 Using variance as a risk measure 221 Introducing the value-at-risk metric 221 Estimating the VaR for some NASDAQ assets223 Exploring the Amazon stock price trend 208 Summary232 9 Simulating Physical Phenomena Using Neural Networks Technical requirements  234 Introducing the basics of neural networks  234 Understanding biological neural networks Exploring ANNs Splitting the data  256 Explaining multiple linear regression 258 Understanding a multilayer perceptron regressor model 260 235 236 Exploring deep neural networks264 Understanding feedforward neural networks 242 Exploring neural network training 243 264 265 266 Getting familiar with convolutional neural networks Examining recurrent neural networks Analyzing LSTM networks Summary267 Simulating airfoil self-noise using ANNs 244 Importing data using pandas Scaling the data using sklearn Viewing the data using matplotlib 246 249 252 10 Modeling and Simulation for Project Management Technical requirements  270 Introducing project management270 Addressing management problems using MDPtoolbox Changing the probability of fire 278 282 Understanding what-if analysis Scheduling project time using Monte Carlo simulation 284 271 Managing a tiny forest problem272 Summarizing the Markov decision process272 Exploring the optimization process 273 Introducing MDPtoolbox 275 Defining the tiny forest management example 275 Defining the scheduling grid 285 Estimating the task's time 286 Developing an algorithm for project scheduling287 Exploring triangular distribution 288 Summary294 11 What's Next? Summarizing simulation modeling concepts  295 Generating random numbers 296 Applying Monte Carlo methods 298 Addressing the Markov decision process299 Analyzing resampling methods 300 Exploring numerical optimization techniques302 Using artificial neural networks for simulation303 Applying simulation model to real life 304 Other Books You May Enjoy Leave a review - let other readers know what you think 317 Modeling in healthcare Modeling in financial applications Modeling physical phenomenon Modeling public transportation Modeling human behavior 304 305 306 307 308 Next steps for simulation modeling309 Increasing the computational power 309 Machine learning-based models  311 Automated generation of simulation models312 Summary313 Preface Simulation modeling helps you to create digital prototypes of physical models to analyze how they work and predict their performance in the real world. With this comprehensive guide, you’ll learn about various computational statistical simulations using Python. Starting with the fundamentals of simulation modeling, you’ll learn about concepts such as randomness and explore data generating processes, resampling methods, and bootstrapping techniques. You’ll then cover key algorithms such as Monte Carlo simulations and the Markov Decision Process, which are used to develop numerical simulation models, and discover how they can be used to solve real-world problems. As you make progress, you’ll develop simulation models to help you get accurate results and enhance decision-making processes. Using optimization techniques, you’ll learn to modify the performance of a model to improve results and make optimal use of resources. The book will guide you through creating a digital prototype using practical use cases for financial engineering, prototyping project management to improve planning, and simulating physical phenomena using neural networks. By the end of this book, you’ll be able to construct and deploy simulation models of your own to solve real-world challenges. Who this book is for Hands-On Simulation Modeling with Python is for simulation developers and engineers, model designers, and anyone already familiar with the basic computational methods that are used to study the behavior of systems. This book will help you explore advanced simulation techniques such as Monte Carlo methods, statistical simulations, and much more using Python. Working knowledge of the Python programming language is required. What this book covers Chapter 1, Introduction, analyzes the basics of numerical simulation and highlights the difference between modeling and simulation and the strengths of simulation models such as defects. The different types of models are analyzed, and we study practical modeling cases to understand how to elaborate a model starting from the initial considerations. viii Preface Chapter 2, Understanding Randomness and Random Numbers, defines stochastic processes and explains the importance of using them to address numerous real-world problems. The main methods for generating random numbers with practical examples in Python code, and the generation of uniform and generic distributions, are both explored. It also explains how to perform a uniformity test using the chi-square method. Chapter 3, Probability and the Data Generating Process, shows how to distinguish between the different definitions of probabilities and how they can be integrated to obtain useful information in the simulation of real phenomena. Chapter 4, Monte Carlo Simulations, explores techniques based on Monte Carlo methods for process simulation. We will first learn the basic concepts, and then we will see how to apply them to practical cases. Chapter 5, Simulation-Based Markov Decision Process, shows how to deal with decisionmaking processes with Markov chains. We will analyze the concepts underlying Markovian processes and then analyze some practical applications to learn how to choose the right actions for the transition between different states of the system. Chapter 6, Resampling Methods, shows how to apply resampling methods to approximate some characteristics of the distribution of a sample in order to validate a statistical model. We will analyze the basics of the most common resampling methods and learn how to use them by solving some practical problems. Chapter 7, Use of Simulation to Improve and Optimize Systems, shows how to use the main optimization techniques to improve the performance of our simulation models. We will see how to use the gradient descent technique, the Newton-Raphson method, and stochastic gradient descent. We will also see how to apply these techniques with practical examples. Chapter 8, Simulation Models for Financial Engineering, shows practical cases of using simulation methods in a financial context. We will learn how to use Monte Carlo methods to predict stock prices and how to assess the risk associated with a portfolio of shares. Chapter 9, Simulating Physical Phenomena with Neural Networks, shows how to develop models based on artificial neural networks to simulate physical phenomena. We will start by exploring the basic concepts of neural networks, and we will examine their architecture and its main elements. We will see how to train a network to update its weights. Preface ix Chapter 10, Modeling and Simulation for Project Management, deals with practical cases of project management using the tools we learned how to use in the previous chapters. We will see how to evaluate in advance the results of the actions undertaken in the management of a forest using Markov processes, and then move on to evaluating the time required for the execution of a project using the Monte Carlo simulation. Chapter 11, What’s Next?, provides a better understanding of the problems associated with building and deploying simulation models and additional resources and technologies to learn how to hone your machine learning skills. To get the most out of this book Working knowledge of Python programming language is required. If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code. Download the example code files You can download the example code files for this book from your account at www. packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/ support and register to have the files emailed directly to you. You can download the code files by following these steps: 1. Log in or register at www.packt.com. 2. Select the Support tab. 3. Click on Code Downloads. 4. Enter the name of the book in the Search box and follow the onscreen instructions. x Preface Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of: • WinRAR/7-Zip for Windows • Zipeg/iZip/UnRarX for Mac • 7-Zip/PeaZip for Linux The code bundle for the book is also hosted on GitHub at https://github.com/ PacktPublishing/Hands-On-Simulation-Modeling-with-Python. In case there’s an update to the code, it will be updated on the existing GitHub repository. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out! Download the color images We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/ files/downloads/9781838985097_ColorImages.pdf. Conventions used There are a number of text conventions used throughout this book. Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.” A block of code is set as follows: import random import statistics import matplotlib.pyplot as plt When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold: import random import statistics import matplotlib.pyplot as plt Preface xi Any command-line input or output is written as follows: $ python jakknife_estimator.py Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: “Select System info from the Administration panel.” Tips or important notes Appear like this. Get in touch Feedback from our readers is always welcome. General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com. Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details. Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material. If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors. packtpub.com. Reviews Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you! For more information about Packt, please visit packt.com. Section 1: Getting Started with Numerical Simulation In this section, the basic concepts of simulation modeling are addressed. This section helps you to understand the fundamental concepts and elements of numerical simulation. This section contains the following chapters: Chapter 1, Introducing Simulation Models Chapter 2, Understanding Randomness and Random Numbers Chapter 3, Probability and Data Generating Processes 1 Introducing Simulation Models A simulation model is a tool capable of processing information and data and predicting the responses of a real system to certain inputs, thus becoming an effective support for analysis, performance evaluation, and decision-making processes. The term simulation refers to reproducing the behavior of a system. In general, we speak of simulation both in the case in which a concrete model is used and in the case in which an abstract model is used that reproduces reality using a computer. An example of a concrete model is a scale model of an airplane that is then placed in a wind tunnel to carry out simulated tests to estimate suitable performance measures. Although, over the years, physicists have developed theoretical laws that we can use to obtain information on the performance of dynamic systems, often, the application of these laws to a real case takes too long. In these cases, it is convenient to construct a numerical simulation model that allows us to simulate the behavior of the system under certain conditions. This elaborated model will allow us to test the functionality of the system in a simple and immediate way, saving considerable resources in terms of time and money. 4 Introducing Simulation Models In this chapter, we're going to cover the following main topics: • Introducing simulation models • Classifying simulation models • Approaching a simulation-based problem • Dynamical systems modeling Important Note In this chapter, an introduction to simulation techniques will be discussed. In order to deal with the topics at hand, it is necessary that you have a basic knowledge of algebra and mathematical modeling. Introducing simulation models Simulation uses abstract models built to replicate the characteristics of a system. The operation of a system is simulated using probability distributions to randomly generate system events, and statistical observations are obtained from the simulated system. It plays a very important role, especially in the design of a stochastic system and in the definition of its operating procedures. By not working directly on the real system, many scenarios can be simulated simply by changing the input parameters, thus limiting the costs that would occur if this solution were not used and, ultimately, reducing the time it would take. In this way, it is possible to quickly try alternative policies and design choices and model systems of great complexity by studying their behavior and evolution over time. Important Note Simulation is used when working on real systems is not convenient due to high costs, technical impossibility, and the non-existence of a real system. Simulation allows you to predict what happens to the real system if certain inputs are used. Changing these input parameters simulates different scenarios that allow us to identify the most convenient one from various points of view. Introducing simulation models 5 Decision-making workflow In a decision-making process, the starting point is identifying the problematic context that requires a change and therefore a decision. The context that's identified is then analyzed in order to highlight what needs to be studied for the decisions that need to be made; that is, those elements that seem the most relevant are chosen, the relationships that connect them are highlighted, and the objectives to be achieved are defined. At this point, a formal model is constructed, which allows us to simulate the identified system in order to understand its behavior and to arrive at identifying the decisions to be made. The following diagram describes the workflow that allows us to make a decision, starting from observing the problematic context: Figure 1.1 – Decision-making workflow This represents a way of spreading knowledge and involves various actors. Constructing a model is a two-way process: • Definition of conceptual models • Continuous interaction between the model and reality by comparison In addition, learning also has a participatory characteristic: it proceeds through the involvement of different actors. The models also allow you to analyze and propose organized actions so that you can modify the current situation and produce the desired solution. 6 Introducing Simulation Models Comparing modeling and simulation To start, we will clarify the differences between modeling and simulation. A model is a representation of a physical system, while simulation is the process of seeing how a model-based system would work under certain conditions. Modeling is a design methodology that is based on producing a model that implements a system and represents its functionality. In this way, it is possible to predict the behavior of a system and the effects of the variations or modifications that are made on it. Even if the model is a simplified representation of the system, it must still be close enough to the functional nature of the real system, but without becoming too complex and difficult to handle. Important Note Simulation is the process that puts the model into operation and allows you to evaluate its behavior under certain conditions. Simulation is a fundamental tool for modeling because, without necessarily resorting to physical prototyping, the developer can verify the functionality of the modeled system with the project specifications. Simulation allows us to study the system through a wide spectrum of conditions so that we can understand how representative the model is of the system that it refers to. Pros and cons of simulation modeling Simulation is a tool that's widely used in a variety of fields, from operational research to the application industry. This technique can be made successful by it overcoming the difficulties that each complex procedure contains. The following are the pros and cons of simulation modeling. Let's start with the concrete advantages that can be obtained from the use of simulation models (pros): • It reproduces the behavior of a system in reference to situations that cannot be directly experienced. • It represents real systems, even complex ones, while also considering the sources of uncertainty. • It requires limited resources in terms of data. • It allows experimentation in limited time frames. • The models that are obtained are easily demonstrable. Introducing simulation models 7 As anticipated, since it is a technique capable of reproducing complex scenarios, it has some limitations (cons): • The simulation provides indications of the behavior of the system, but not exact results. • The analysis of the output of a simulation could be complex and it could be difficult to identify which may be the best configuration. • The implementation of a simulation model could be laborious and, moreover, it may take long calculation times to carry out a significant simulation. • The results that are returned by the simulation depend on the quality of the input data: it cannot provide accurate results in the case of inaccurate input data. • The complexity of the simulation model depends on the complexity of the system it intends to reproduce. Nevertheless, simulation models represent the best solution for the analysis of complex scenarios. Simulation modeling terminology In this section, we will analyze the elements that make up a model and those that characterize a simulation process. We will give a brief description of each so that you understand their meaning and the role they play in the numerical simulation process. System The context of an investigation is represented through a system; that is, the set of elements that interact with each other. The main problem linked to this element concerns the system boundaries, that is, which elements of reality must be inserted in the system that represents it and which are left out and the relationships that exist between them. State variables A system is described in each instant of time by a set of variables. These are called state variables. For example, in the case of a weather system, the temperature is a state variable. In discrete systems, the variables change instantly at precise moments of time that are finite. In continuous systems, the variables vary in terms of continuity with respect to time. 8 Introducing Simulation Models Events An event is defined as any instantaneous event that causes the value of at least one of the status variables to change. The arrival of a blizzard for a weather system is an event, as it causes the temperature to drop suddenly. There are both external events and internal events. Parameters Parameters represent essential terms when building a model. They are adjusted during the model simulation process to ensure that the results are brought into the necessary convergence margins. They can be modified iteratively through sensitivity analysis or in the model calibration phase. Calibration Calibration represents the process by which the parameters of the model are adjusted in order to adapt the results to the data observed in the best possible way. When calibrating the model, we try to obtain the best possible accuracy. A good calibration requires eliminating, or minimizing, errors in data collection and choosing a theoretical model that is the best possible description of reality. The choice of model parameters is decisive and must be done in such a way as to minimize the deviation of its results when applied to historical data. Accuracy Accuracy is the degree of correspondence of the simulation result that can be inferred from a series of calculated values with the actual data, that is, the difference between the average modeled value and the true or reference value. Accuracy, when calculated, provides a quantitative estimate of the quality expected from a forecast. Several indicators are available to measure accuracy. The most used are mean absolute error (MAE), mean absolute percentage error (MAPE), and mean squared error (MSE). Sensitivity The sensitivity of a model indicates the degree to which the model's outputs are affected by changes in the selected input parameters. A sensitivity analysis identifies the sensitive parameters for the output of the model. It allows us to determine which parameters require further investigation so that we have a more realistic evaluation of the model's output values. Furthermore, it allows us to identify which parameters are not significant for the generation of a certain output and therefore can possibly be eliminated from the model. Finally, it tells us which parameters should be considered in a possible and subsequent analysis of the uncertainty of the output values provided by the model. Classifying simulation models 9 Validation This is the process that verifies the accuracy of the proposed model. The model must be validated to be used as a tool to support decisions. It aims to verify whether the model that's being analyzed corresponds conceptually to our intentions. The validation of a model is based on the various techniques of multivariate analysis, which, from time to time, study the variability and interdependence of attributes within a class of objects. Classifying simulation models Simulation models can be classified according to different criteria. The first distinction is between static and dynamic systems. So, let's see what differentiates them. Comparing static and dynamic models Static models are the representation of a system in an instant of time, or representative models of a system in which the time variable plays no role. An example of a static simulation is a Monte Carlo model. Dynamic models, on the other hand, describe the evolution of the system over time. In the simplest case, the state of the system at time t is described by a function x (t). For example, in population dynamics, x (t) represents the population present at time t. The equation that regulates the system is dynamic: it describes the instantaneous variation of the population or the variation in fixed time intervals. Comparing deterministic and stochastic models A model is deterministic when its evolution, over time, is uniquely determined by its initial conditions and characteristics. These models do not consider random elements and lend themselves to be solved with exact methods that are derived from mathematical analysis. In deterministic models, the output is well determined once the input data and the relationships that make up the model have been specified, despite the time required for data processing being particularly long. For these systems, the transformation rules univocally determine the change of state of the system. Examples of deterministic systems can be observed in some production and automation systems. Stochastic models, on the other hand, can be evolved by inserting random elements into the evolution. These are obtained by extracting them from statistical distributions. Among the operational characteristics of these models, there is not just one relationship that fits all. There's also probability density functions, which means there is no one-to-one correspondence between the data and system history. 10 Introducing Simulation Models A final distinction is based on how the system evolves over time: this is why we distinguish between continuous and discrete simulation models. Comparing continuous and discrete models Continuous models represent systems in which the state of the variables changes continuously as a function of time. For example, a car moving on a road represents a continuous system since the variables that identify it, such as position and speed, can change continuously with respect to time. In discrete models, the system is described by an overlapping sequence of physical operations, interspersed with inactivity pauses. These operations begin and end in welldefined instances (events). The system undergoes a change of state when each event occurs, remaining in the same state throughout the interval between the two subsequent events. This type of operation is easy to treat with the simulation approach. Important Note The stochastic or deterministic, or continuous or discrete, nature of a model is not its absolute property and depends on the observer's vision of the system itself. This is determined by the objectives and the method of study, as well as by the experience of the observer. Now that we've analyzed the different types of models in detail, we will learn how to develop a numerical simulation model. Approaching a simulation-based problem To tackle a numerical simulation process that returns accurate results, it is crucial to rigorously follow a series of procedures that partly precede and partly follow the actual modeling of the system. We can separate the simulation process workflow into the following individual steps: 1. Problem analysis 2. Data collection 3. Setting up the simulation model 4. Simulation software selection Approaching a simulation-based problem 11 5. Verification of the software solution 6. Validation of the simulation model 7. Simulation and analysis of results To fully understand the whole simulation process, it is essential to analyze the various phases that characterize a study based on simulation in depth. Problem analysis In this initial step, the goal is to understand the problem by trying to identify the aims of the study and the essential components, as well as the performance measures that interest them. Simulation is not simply an optimization technique and therefore there is no parameter that needs to be maximized or minimized. However, there is a series of performance indices whose dependence on the input variables must be verified. If an operational version of the system is already available, the work is simplified as it is enough to observe this system to deduce its fundamental characteristics. Data collection This represents a crucial step in the whole process since the quality of the simulation model depends on the quality of the input data. This step is closely related to the previous one. In fact, once the objective of the study has been identified, data is collected and subsequently processed. Processing the collected data is necessary to transform it into a format that can be used by the model. The origin of the data can be different: sometimes, the data is retrieved from company databases, but more often than not, direct measurements in the field must be made through a series of sensors that, in recent years, have become increasingly smart. These operations weigh down the entire study process, thus lengthening their execution times. Setting up the simulation model This is the crucial step of the whole simulation process; therefore, it is necessary to pay close attention to it. To set up a simulation model, it is necessary to know the probability distributions of the variables of interest. In fact, to generate various representative scenarios of how a system works, it is essential that a simulation generates random observations from these distributions. 12 Introducing Simulation Models For example, when managing stocks, the distribution of the product being requested and the distribution of time between an order and the receipt of the goods is necessary. On the other hand, when managing production systems with machines that can occasionally fail, it will be necessary to know the distribution of time until a machine fails and the distribution of repair times. If the system is not already available, it is only possible to estimate these distributions by deriving them, for example, from the observation of similar, already existing systems. If, from the analysis of the data, it is seen that this form of distribution approximates a standard type distribution, the standard theoretical distribution can be used by carrying out a statistical test to verify whether the data can be well represented by that probability distribution. If there are no similar systems from which observable data can be obtained, other sources of information must be used: machine specifications, instruction manuals for the machines, experimental studies, and so on. As we've already mentioned, constructing a simulation model is a complex procedure. Referring to simulating discrete events, constructing a model involves the following steps: 1. Defining the state variables 2. Identifying the values that can be taken by the state variables 3. Identifying the possible events that change the state of the system 4. Realizing a simulated time measurement, that is, a simulation clock, that records the flow of simulated time 5. Implementing a method for randomly generating events 6. Identifying the state transitions generated by events After following these steps, we will have the simulation model ready for use. At this point, it will be necessary to implement this model in a dedicated software platform; let's see how. Approaching a simulation-based problem 13 Simulation software selection The choice of the software platform that you will perform the numerical simulation with is fundamental for the success of the project. In this regard, we have several solutions that we can adopt. This choice will be made based on our knowledge of programming. Let's see what solutions are available: • Simulators: These are application-oriented packages for simulation. There are numerous interactive software packages for simulation, such as MATLAB, COMSOL Multiphysics, Ansys, SolidWorks, Simulink, Arena, AnyLogic, and SimScale. These pieces of software represent excellent simulation platforms whose performance differs based on the application solutions provided. These simulators allow us to elaborate on a simulation environment using graphic menus without the need to program. They are easy to use but many of them have excellent modeling capabilities, even if you just use their standard features. Some of them provide animations that show the simulation in action, which allows you to easily illustrate the simulation to non-experts. The limitations presented by this software solution are the high costs of the licenses, which can only be faced by large companies, and the difficulty in modeling solutions that have not been foreseen by the standards. • Simulation languages: A more versatile solution is offered by the different simulation languages ​​available. There are solutions that facilitate the task of the programmer who, with these languages, ​​can develop entire models or sub-models with a few lines of code that would otherwise require much longer drafting times, with a consequent increase in the probability of error. An example of a simulation language ​​is the general-purpose simulation system (GPSS). This is a generic programming language that was developed by IBM in 1965. In it, a simulation clock advances in discrete steps, modeling a system as transactions enter the system and are passed from one service to another. It is mainly used as a process flow-oriented simulation language and is particularly suitable for application problems. Another example of a simulation language ​​is SimScript, which was developed in 1963 as an extension of Fortran. SimScript is an event-based scripting language, so different parts of the script are triggered by different events. 14 Introducing Simulation Models • GPSS: General-purpose programming languages are designed to be able to create software in numerous areas of application. They are particularly suitable for the development of system software such as drivers, kernels, and anything that communicates directly with the hardware of a computer. Since these languages are ​​ not specifically dedicated to a simulation activity, they require the programmer to work harder to implement all the mechanisms and data structures necessary in a simulator. On the other hand, by offering all the potential of a high-level programming language, they offer the programmer a more versatile programming environment. In this way, you can develop a numerical simulation model perfectly suited to the needs of the researcher. In this book, we will use this solution by devoting ourselves to programming with Python. This software platform offers a series of tools that have been created by researchers from all over the world that make the elaboration of a numerical modeling system particularly easy. In addition, the open source nature of the projects written in Python makes this solution particularly inexpensive. Now that we've made the choice of the software platform we're going to use and have elaborated on the numerical model, we need to verify the software solution. Verification of the software solution In this phase, a check is carried out on the numerical code. This is known as debugging, which consists of ensuring that the code correctly follows the desired logical flow, without unexpected blocks or interruptions. The verification must be provided in real time during the creation phase because correcting any concept or syntax errors becomes more difficult as the complexity of the model increases. Although verification is simple in theory, debugging large-scale simulation code is a difficult task due to virtual competition. The correctness or otherwise of executions depends on time, as well as on the large number of potential logical paths. When developing a simulation model, you should divide the code into modules or subroutines in order to facilitate debugging. It is also advisable to have more than one person review the code, as a single programmer may not be a good critic. In addition, it can be helpful to perform the simulation when considering a large variety of input parameters and checking that the output is reasonable. Approaching a simulation-based problem 15 Important Note One of the best techniques that can be used to verify a discrete-event simulation program is one based on tracking. The status of the system, the content of the list of events, the simulated time, the status variables, and the statistical counters are shown after the occurrence of each event and then compared with handmade calculations to check the operation of the code. A track often produces a large volume of output that needs to be checked event by event for errors. Possible problems may arise, including the following: • There may be information that hasn't been requested by the analyst. • Other useful information may be missing, or a certain type of error may not be detectable during a limited debugging run. After the verification process, it is necessary to validate the simulation model. Validation of the simulation model In this step, it is necessary to check whether the model that has been created provides valid results for the system in question. We must check whether the performance measurements of the real system are well approximated by the measurements generated by the simulation model. A simulation model of a complex system can only approximate it. A simulation model is always developed for a set of objectives. A model that's valid for one purpose may not be valid for another. Important Note Validation is a where the level of accuracy between the model and the system is respected. It is necessary to establish whether the model adequately represents the behavior of the system. The value of a model can only be defined in relation to its use. Therefore, validation is a process that aims to determine whether a simulation model accurately represents the system for the set objectives. In this step, the ability of the model to reproduce the real functionality the system is ascertained; that is, it is ensured that the calibrated parameters, relative to the calibration scenario, can be used to correctly simulate other system situations. Once the validation phase is over, the model can be considered transferable and therefore usable for the simulation of any new control strategies and new intervention alternatives. As widely discussed in the literature on this subject, it is important to validate the model parameters that were previously calibrated on the basis of data other than that used to calibrate the model, always with reference to the phenomenon specific to the scenario being analyzed. 16 Introducing Simulation Models Simulation and analysis of results A simulation is a process that evolves during its realization and where the initial results help lead the simulation toward more complex configurations. Attention should be paid to some details. For example, it is necessary to determine the transient length of the system before reaching stationary conditions if you want performance measures of the system at full capacity. It is also necessary to determine the duration of the simulation after the system has reached equilibrium. In fact, it must always be kept in mind that a simulation does not produce the exact values ​​of the performance measures of a system since each simulation is a statistical experiment that generates statistical observations regarding the performance of the system. These observations are then used to produce estimates of performance measures. Increasing the duration of the simulation can increase the accuracy of these estimates. The simulation results return statistical estimates of a system's performance measures. A fundamental point is that each measurement is accompanied by the confidence interval, within which it can vary. These results could immediately highlight a better system configuration than the others, but more often, more than one candidate configuration will be identified. In this case, further investigations may be needed to compare these configurations. Dynamical systems modeling In this section, we will analyze a real case of modeling a production process. In this way, we will learn how to deal with the elements of the system and how to translate the production instances into the elements of the model. A model is created to study the behavior of a system over time. It consists of a set of assumptions about the behavior of the system being expressed using mathematical logical-symbolic relationships. These relationships are between the entities that make up the system. Recall that a model is used to simulate changes in the system and predict the effects of these changes on the real system. Simple models are resolved analytically, using mathematical methods, while complex models are numerically simulated on the computer, where the data is treated as the data of a real system. Dynamical systems modeling 17 Managing workshop machinery In this section, we will look at a simple example of how a discrete event simulation of a dynamic system is created. A discrete event system is a dynamic system whose states can assume logical or symbolic values, rather than numerical ones, and whose behavior is characterized by the occurrence of instantaneous events within an irregular timing sequence not necessarily known a priori. The behavior of these systems is described in terms of states and events. In a workshop, there are two machines, which we will call A1 and A2. At the beginning of the day, five jobs need to be carried out: W1, W2, W3, W4, and W5. The following table shows how long we need to work on the machines in minutes: Figure 1.2 – Table showing work time on the machines A zero indicates that a job does not require that machine. Jobs that require two machines must pass through A1 and then through A2. Suppose that we decide to carry out the jobs by assigning them to each machine so that when they become available, the first executable job is started first, in the order from 1 to 5. If, at the same time, more jobs can be executed on the same machine, we will execute the one with a minor index first. The purpose of modeling is to determine the minimum time needed to complete all the works. The events in which state changes can occur in the system are as follows: 1. A job becomes available for a machine. 2. A machine starts a job. 3. A machine ends a job. 18 Introducing Simulation Models Based on these rules and the evaluation times indicated in the previous table, we can insert the sequence of the jobs, along with the events scheduled according to the execution times, into a table: Figure 1.3 – Table of job sequences This table shows the times of the events in sequence, indicating the start and end of the work associated with the two machines available in the workshop. At the end of each job, a new job is sent to each machine according to the rules set previously. In this way, the deadlines for the work and the subsequent start of another job are clearly indicated. This is just as easy as it is to identify the time interval in which each machine is used and when it becomes available again. The table solution we have proposed represents a simple and immediate way of simulating a simple dynamic discrete system. The example we just discussed is a typical case of a dynamic system in which time proceeds in steps, in a discrete way. However, many dynamic systems are best described by assuming that time passes continuously. In the next section, we will analyze the case of a continuous dynamic system. Simple harmonic oscillator Consider a mass m resting on a horizontal plane, without friction, and attached to a wall by an ideal spring, of elastic constant k. Suppose that, when the horizontal coordinate x is zero, the spring is at rest. The following diagram shows the scheme of a simple harmonic oscillator: Dynamical systems modeling 19 Figure 1.4 – The scheme of a harmonic oscillator If the block of mass m is moved to the right with respect to its equilibrium position (x> 0), the spring, being elongated, calls it to the left. Conversely, if the block is placed to the left of its equilibrium position (x < 0), then the spring is compressed and pushes the block to the right. In both cases, we can express the component along the x-axis of the force due to the spring according to the following formula: Here, we have the following: • is the force. • is the elastic constant. • 𝐹𝐹𝑥𝑥 = −𝑘𝑘 ∗ 𝑥𝑥 is the horizontal coordinate that indicate the position of the mass m. From the second law of dynamics, we can derive the component of acceleration along x as follows: Here, we have the following: • is the acceleration. • is the elastic constant. • • 𝑎𝑎𝑥𝑥 = − 𝑘𝑘 ∗ 𝑥𝑥 𝑚𝑚 is the mass of the block. is the horizontal coordinate that indicate the position of the mass m. 20 Introducing Simulation Models = = If we indicate with the rate of change of the speed and with we can obtain the evolution equations of the dynamic system, as follows: Here, we have the following: • 𝜔𝜔2 = − the speed, 𝑑𝑑𝑑𝑑 = 𝑣𝑣 { 𝑑𝑑𝑑𝑑 𝑑𝑑𝑑𝑑 = − 𝜔𝜔2 ∗ 𝑥𝑥 𝑑𝑑𝑑𝑑 𝑘𝑘 𝑚𝑚 For these equations, we must associate the initial conditions of the system, which we can write in the following way: 𝑥𝑥(0) = 𝑥𝑥0 { 𝑣𝑣(0) = 𝑣𝑣0 The solutions to the previous differential equations are as follows: 𝑣𝑣0 𝑥𝑥(𝑡𝑡) = 𝑥𝑥0 cos(𝜔𝜔𝜔𝜔) + sin (𝜔𝜔𝜔𝜔) { 𝜔𝜔 𝑣𝑣(𝑡𝑡) = 𝑣𝑣0 cos(𝜔𝜔𝜔𝜔) − 𝑥𝑥0 𝜔𝜔 sin (𝜔𝜔𝜔𝜔) In this way, we obtained the mathematical model of the analyzed system. In order to study the evolution of the oscillation phenomenon of the mass block m over time, it is enough to vary the time and calculate the position of the mass at that instant and its speed. In decision-making processes characterized by high levels of complexity, the use of analytical models is not possible. In these cases, it is necessary to resort to models that differ from those of an analytical type for the use of the calculator as a tool not only for calculation, such as in mathematical programming models, but also for representing the elements that make up reality why studying the relationships between them. Predator-prey model In the field of simulations, simulating the functionality of production and logistic processes is considerably important. These systems are, in fact, characterized by high complexity, numerous interrelationships between the different processes that pass through them, segment failures, unavailability, and the stochasticity of the system parameters. Dynamical systems modeling 21 To understand how complex the analytical modeling of some phenomena is, let's analyze a universally widespread biological model. This is the predator-prey model, which was developed independently by the Italian researcher Vito Volterra and the American biophysicist Alfred Lotka. On an island, there are two populations of animals: prey and predators. The vegetation of the island provides the prey with nourishment in quantities that we can consider as unlimited, while the prey is the only food available for the predators. We can consider the birth rate of the prey constant over time; this means that in the absence of predators, the prey would grow by exponential law. Their mortality rate, on the other hand, depends on the probability they have of falling prey to a predator and therefore on the number of predators present per unit area. As for the predators, the mortality rate is constant, while their growth rate depends on the availability of food and therefore on the number of prey per unit area present on the island. We want to study the trend of the size of the two populations over time, starting from a known initial situation (number of prey and predators). To carry out a simulation of this biological system, we can model it by means of the following system of finite difference equations, where x(t) and y(t) are the number of prey and predators at time t, respectively: 𝑑𝑑𝑑𝑑 = 𝛼𝛼 ∗ 𝑥𝑥(𝑡𝑡) − 𝛽𝛽 ∗ 𝑥𝑥(𝑡𝑡) ∗ 𝑦𝑦(𝑡𝑡) { 𝑑𝑑𝑑𝑑 𝑑𝑑𝑑𝑑 = 𝛾𝛾 ∗ 𝑦𝑦(𝑡𝑡) − 𝛿𝛿 ∗ 𝑥𝑥(𝑡𝑡) ∗ 𝑦𝑦(𝑡𝑡) 𝑑𝑑𝑑𝑑 Here, we have the following: • α, β, γ, δ are positive real parameters related to the interaction of the two species • is the instantaneous growth rates of the prey. • is the instantaneous growth rates of the predators. The following hypotheses underlie in this model: • In the absence of predators, the number of prey increases according to an exponential law, that is, with a constant rate. • Similarly, in the absence of prey, the number of predators decreases at a constant rate. This is a deterministic and continuous simulation model. In fact, the state of the system, represented by the size of the two populations in each instance of time, is univocally determined by the initial state and the parameters of the model. Furthermore, at least in principle, the variables – that is, the size of the populations – vary continuously over time. The differential system is in normal form and can be solved with respect to the derivatives of maximum order but cannot be separated into variables. It is not possible to solve this in an analytical form, but with numerical methods, it is solved immediately (the RungeKutta method). The solution obviously depends on the values of the four constants and the initial values. Summary In this chapter, we learned what is meant by simulation modeling. We understood the difference between modeling and simulation, and we discovered the strengths of simulation models, such as defects. To understand these concepts, we clarified the meaning of the terms that appear most frequently when dealing with these topics. We then analyzed the different types of models: static versus dynamic, deterministic versus stochastic, and continuous versus discrete. We then explored the workflow connected to a numerical simulation process and highlighted the crucial steps. Finally, we studied some practical modeling cases to understand how to elaborate on a model starting from the initial considerations. In the next chapter, we will learn how to approach a stochastic process and understand the random number simulation concepts. Then, we will explore the differences between pseudo and non-uniform random numbers, as well as the methods we can use for random distribution evaluation. 2 Understanding Randomness and Random Numbers In many real-life situations, it is useful to flip a coin in order to decide what to do. Many computers also use this procedure as part of their decision-making process. In fact, many problems can be solved in a very effective and relatively simple way by using probabilistic algorithms. In an algorithm of this type, decisions are made based on random contributions that remember the dice roll with the help of a randomly chosen value. The generation of random numbers has ancient roots, but only recently has the process been sped up, allowing it to be used on a large scale in scientific research as well. These generators are mainly used for computer simulations, statistical sampling techniques, or in the field of cryptography. In this chapter, we're going to cover the following topics: • Stochastic processes • Random number simulation • The pseudorandom number generator 24 Understanding Randomness and Random Numbers • Testing uniform distribution • Exploring generic methods for random distributions • Random number generation using Python Technical requirements In this chapter, we will introduce random number generation techniques. In order to understand these topics, a basic knowledge of algebra and mathematical modeling is needed. To work with the Python code in this chapter, you need the following files (available on GitHub at https://github.com/PacktPublishing/Hands-On-SimulationModeling-with-Python): • LinearCongruentialGenerator.py • LearmouthLewisGenerator.py • LaggedFibonacciAlgorithm.py • UniformityTest.py • Random.Generation.py Stochastic processes A stochastic process is a family of random variables that depends on a parameter, t. A stochastic process is specified using the following notation: {𝑋𝑋𝑡𝑡 , 𝑡𝑡 ∈ 𝑇𝑇} Here, t is a parameter, and T is the set of possible values ​​of t. Usually, time is indicated by t, so a stochastic process is a family of time-dependent random variables. The variability range of t, that is, the set, T, can be a set of real numbers, possibly coinciding with the entire time axis. But it can also be a discrete set of values. The random variables, Xt, are defined on the set, X, called the space of states. This can be a continuous set, in which case it is defined as a continuous stochastic process, or a discrete set, in which case it is defined as a discrete stochastic process. Consider the following elements: 𝑥𝑥0 , 𝑥𝑥1 , 𝑥𝑥2 , … , 𝑥𝑥𝑛𝑛 ∈ 𝑋𝑋 Stochastic processes 25 This means the values ​​that the random variables, Xt, can take are called system states and represent the possible results of an experiment. The Xt variables are linked together by dependency relationships. We can know a random variable if we know both the values it can assume and the probability distribution. So, to understand a stochastic process, it is necessary not only to know the values ​​that Xt can take but also the probability distributions of the variables and the joint distributions between the values. Simpler stochastic processes, in which the variability range of t is a discrete set of time values, ​​can also be considered. Important note In practice, there are numerous phenomena that are studied through the theory of stochastic processes. A classic application in physics is the study of the motion of a particle in each medium, the so-called Brownian motion. This study is carried out statistically using a stochastic process. There are processes where even by knowing the past and the present, the future cannot be determined; whereas, in other processes, the future is determined by the present without considering the past. Types of stochastic process Stochastic processes can be classified according to the following characteristics: • Space of states • Time index • Type of stochastic dependence between random variables The state space can be discrete or continuous. In the first case, the stochastic process with discrete space is also called a chain, and space is often referred to as the set of non-negative integers. In the second case, the set of values assumed by the random variables is not finite or countable, and the stochastic process is in continuous space. The time index can also be discrete or continuous. A discrete-time stochastic process is also called a stochastic sequence and is denoted as follows: Here, the set, T, is finite or countable. {𝑋𝑋𝑛𝑛 | 𝑛𝑛 ∈ 𝑇𝑇} 26 Understanding Randomness and Random Numbers In this case, the changes of state are observed only in certain instances: finite or countable. If state changes occur at any instant in a finite or infinite set of real intervals, then there is a continuous-time process, which is denoted as follows: {𝑋𝑋(𝑡𝑡) | 𝑡𝑡 ∈ 𝑇𝑇} The stochastic dependence between random variables, X(t), for different values of t characterizes a stochastic process and sometimes simplifies its description. A stochastic process is stationary in the strict sense that the distribution function is invariant with respect to a shift on the time axis, T. A stochastic process is stationary in the broad sense that the first two moments of the distribution are independent of the position on the T axis. Examples of stochastic processes The mathematical treatment of stochastic processes seems complex, yet we find cases of stochastic processes every day. For example, the number of patients admitted to a hospital as a function of time, observed at noon each day, is a stochastic process in which the space of states is discrete, being a finite subset of natural numbers, and time is discrete. Another example of a stochastic process is the temperature measured in a room as a function of time, observed at every instant, with continuous state space and continuous time. Let's now look at a number of structured examples that are based on stochastic processes. The Bernoulli process The concept of a random variable allows us to formulate models that are useful for the study of many random phenomena. An important early example of a probabilistic model is the Bernoulli distribution, named in honor of the Swiss mathematician, James Bernoulli (1654-1705), who made important contributions to the field of probability. Some of these experiments consist of repeatedly performing a given test. For example, we want to know the probability of getting a head when throwing a coin 1,000 times. In each of these examples, we look for the probability of obtaining x successes in n trials. If x indicates the successes, then n - x will be the failures. A sequence of Bernoulli trials consists of a Bernoulli trial under the following hypotheses: • There are only two possible mutually exclusive results for each trial, arbitrarily called success and failure. • The probability of success, p, is the same for each trial. • All tests are independent. Stochastic processes 27 Independence means that the result of a test is not influenced by the result of any other test. For example, the event, the third test was successful, is independent of the event, the first test was successful. The toss of a coin is a Bernoulli trial: the heads event can be considered successful, and the tails event can be considered unsuccessful. In this case, the probability of success is p = 1/2. In rolling two dice, the event, the sum of the points is seven, and the complementary event are both unsuccessful. In this case, it is a Bernoulli trial and the probability of success is p = 1/6. Important note Two events are said to be complementary when the occurrence of the first excludes the occurrence of the second but one of the two will certainly occur. Let p denote the probability of success in a Bernoulli trial. The random variable, X, which counts the number of successes in n trials is called the binomial random variable of the n and p parameters. X can take integer values between 0 and n. Random walk The random walk is a discrete parameter stochastic process in which Xt, where X represents a random variable, describes the position taken at time t by a moving point. The term, random walk, refers to the mathematical formalization of statistics that describe the displacement of an object that moves randomly. This kind of simulation is extremely important for a physicist and has applications in statistical mechanics, fluid dynamics, and quantum mechanics. Random walks represent a mathematical model that is used universally to simulate a path formalized by a succession of random steps. This model can assume a variable number of degrees of freedom, depending on the system we want to describe. From a physical point of view, the path traced over time will not necessarily simulate a real motion, but it will represent the trend of the characteristics of the system over time. Random walks find applications in chemistry, biology, and physics, but also in other fields such as economics, sociology, and information technology. 28 Understanding Randomness and Random Numbers Random one-dimensional walking is a model that is used to simulate the movement of a particle moving along a straight line. There are only two potential movements on the allowed path: either to the right (with a probability that is equal to p) or to the left (with a probability that is equal to q) of the current position. Each step has a constant length and is independent of the others, as shown in the following diagram: Figure 2.1 – One-dimensional walking The position of the point in each instant is identified by its abscissa, X(n). This position, after n steps, will be characterized by a random term. Our aim is to calculate the probability of passing from the starting point after n movements. Obviously, nothing assures us that the point will return to the starting position. The variable, X(n), returns the abscissa of the particle after n steps. It is a discrete random variable with a binomial distribution. At each instant, the particle steps right or left based on the value returned by a random variable, Z(n). This variable can take only two values: +1 and -1. It assumes a + 1 value with a probability of p > 0 and a value of -1 with a probability that is equal to q. The sum of the two probabilities is p + q = 1. The position of the particle at instant n is given by the following equation: 𝑋𝑋𝑛𝑛 = 𝑋𝑋𝑛𝑛−1 + 𝑍𝑍𝑛𝑛 ; 𝑛𝑛 = 1, 2, … This shows the average number of returns to the origin of the particle, named p. The probability of a single return is given by the following geometric series: ∞ ∞ 𝑛𝑛=0 𝑛𝑛=0 𝜇𝜇 = ∑ 𝑛𝑛 𝑝𝑝𝑛𝑛 (1 − 𝑝𝑝) = ∑ 𝑛𝑛 𝑝𝑝𝑛𝑛 1 √𝑛𝑛 ∗ 𝜋𝜋 →∞ We assume that the probability of the particle returning to the origin tends to 1. This means that despite the frequency of the returns decreasing with the increase in the number of steps taken, they will always be in an infinite value of steps taken. So, we can conclude that a particle with equal probability of left and right movement, left free to walk casually to infinity with great probability, returns infinite times to the point from which it started. Stochastic processes 29 The Poisson process There are phenomena in which certain events, with reference to a certain interval of time or space, rarely happen. The number of events that occur in that interval varies from 0 to n, and n cannot be determined a priori. For example, the number of cars passing through an uncrowded street in a randomly chosen 5-minute time frame can be considered a rare event. Similarly, the number of accidents at work that happen at a company in a week, or the number of printing errors on a page of a book, is rare. In the study of rare events, a reference to a specific interval of time or space is fundamental. For the study of rare events, the Poisson probability distribution is used, named in honor of the French mathematician, Simeon Denis Poisson (1781-1840), who first obtained the distribution. The Poisson distribution is used as a model in cases where the events or realizations of a process, distributed randomly in space or time, are counts, that is, discrete variables. The binomial distribution is based on a set of hypotheses that define the Bernoulli trials, and the same happens for the Poisson distribution. The following conditions describe the so-called Poisson process: • The realizations of the events are independent, meaning that the occurrence of an event in a time or space interval has no effect on the probability of the event occurring a second time in the same, or another, interval. • The probability of a single realization of the event in each interval is proportional to the length of the interval. • In any arbitrarily small part of the interval, the probability of the event occurring more than once is negligible. An important difference between the Poisson distribution and the binomial distribution is the number of trials and successes. In a binomial distribution, the number, n, of trials is finite and the number, x, of successes cannot exceed n; in a Poisson distribution, the number of tests is essentially infinite and the number of successes can be infinitely large, even if the probability of having x successes becomes very small as x increases. 30 Understanding Randomness and Random Numbers Random number simulation The availability of random numbers is a necessary requirement in many applications. In some cases, the quality of the final application strictly depends on the possibility of generating good quality random numbers. Think, for example, of applications such as video games, cryptography, generating visuals or sound effects, telecommunications, signal processing, optimizations, and simulations. In an algorithm of this type, decisions are made based on the pull of a virtual currency, which is based on a randomly chosen value. There is no single or general definition of a random number since it often depends on the context. The concept of random number itself is not absolute, as any number or sequence of numbers can appear to be random to an observer, but not to another who knows the law with which they are generated. Put simply, a random number is defined as a number selected in a random process from a finite set of numbers. With this definition, we focus on the concept of randomness in the process of selecting a sequence of numbers. In many cases, the problem of generating random numbers concerns the random generation of a sequence of 0 and 1, from which numbers in any format can be obtained: integers, fixed points, floating points, or strings of arbitrary length. With the right functions, it is possible to obtain good quality sequences that can also be used in scientific applications, such as the Monte Carlo simulation. These techniques should be easy to implement and be usable by any computer. In addition, like all software solutions, they should be very versatile and quickly improved. Important note These techniques have a big problem that is inherent to the algorithmic nature of the process: the final string can be predicted from the starting seed. This is why we call this process pseudorandom. Despite this, many problems of an algorithmic nature are solved very effectively and relatively simply using probabilistic algorithms. The simplest example of a probabilistic algorithm is perhaps the randomized quicksort. This is a probabilistic variant of the homonymous sorting algorithm, where, by choosing the pivot element, the algorithm manages to randomly guarantee optimal complexity in the average case, no matter the distribution of the input. Cryptography is a field in which randomness plays a fundamental role and deserves specific mention. In this context, randomness does not lead to computational advantages, but it is essential to guarantee the security of authentication protocols and encryption algorithms. Random number simulation Probability distribution It is possible to characterize a random process from different points of view. One of the most important characteristics is the probability distribution. The probability distribution is a model that associates a probability with each observable modality of a random variable. The probability distribution can be either discrete or continuous, depending on whether the variable is random, discrete, or continuous. It is discrete if the phenomenon is observable with an integer number of modes. The throw of the dice is a discrete statistical phenomenon because the number of observable modalities is equal to 6. The random variable can take only six values ​​(1, 2, 3, 4, 5, and 6). Therefore, the probability distribution of the phenomenon is discrete. The probability distribution is continuous when the random variable assumes a continuous set of values; in this case, the statistical phenomenon can be observed with an infinite or very high number of modalities. The probability distribution of body temperature is continuous because it is a continuous statistical phenomenon, that is, the values ​​of the random variable vary continuously. Let's now look at different kinds of probability distributions. Uniform distribution In many cases, processes characterized by a uniform distribution are considered and used. This means that each element is as likely as any of the others to be selected if an infinite number of extractions is performed. If you represent the elements and their respective probabilities of being extracted on a graph, you get a rectangular graph as follows: Figure 2.2 – The probabilities of the elements 31 32 Understanding Randomness and Random Numbers Since the probability is expressed as a real number between 0 and 1, where 0 represents the impossible event and 1 the certain event, in a uniform distribution, each element will have a 1/n probability of being selected, where n is the number of items. In this case, the sum of all the probabilities must give a uniform result, since, in an extraction, at least one of the elements is chosen for sure. A uniform distribution is typical of artificial random processes such as dice rolling, lotteries, and roulette and is also the most used in several applications. Gaussian distribution Another very common probability distribution is the Gaussian or normal distribution, which has a typical bell shape. In this case, the smaller values, or those that are closer to the center of the curve, are more likely to be extracted than the larger ones, which are far away from the center. The following diagram shows a typical Gaussian distribution: Figure 2.3 – Gaussian distribution Gaussian distribution is important because it is typical of natural processes. For example, it can represent the distribution of the noise in many electronic components, or it can represent the distribution of errors in measurements. It is, therefore, used to simulate statistical distributions in the fields of telecommunications or signal processing. Properties of random numbers By random number, we refer to a random variable distributed in a uniform way between 0 and 1. The statistical properties that a sequence of random numbers must possess are as follows: • Uniformity • Independence The pseudorandom number generator Suppose you divide an interval, [0.1], into n subintervals of equal amplitude. The consequence of the uniformity property is that if N observations of a random number are made, then the number of observations in each subinterval is equal to N/n. The consequence of the independence property is that the probability of obtaining a value in a particular range is independent of the values ​​that were previously obtained. The pseudorandom number generator The generation of real random sequences using deterministic algorithms is impossible: at most, pseudorandom sequences can be generated. These are, apparently, random sequences that are actually perfectly predictable and can be repeated after a certain number of extractions. A PRNG is an algorithm designed to output a sequence of values that appear to be generated randomly. The pros and cons of a random number generator A random number generation routine must be the following: • Replicable • Fast • Not have large gaps between two generated numbers • Have a sufficiently long running period • Generate numbers with statistical properties that are as close as possible to ideal ones The most common cons of random number generators are as follows: • Numbers not uniformly distributed • Discretization of the generated numbers • Incorrect mean or variance • Presence of cyclical variations 33 34 Understanding Randomness and Random Numbers Random number generation algorithms The first to deal with the random number generation problem was John von Neumann, in 1949. He proposed a method called middle-square. This method allows us to understand some important characteristics of a random number generation process. To start, we need to provide an input as the seed or a value that starts the sequence. This is necessary to be able to generate different sequences each time. However, it is important to ensure that the good behavior of the generator does not depend on which seed is used. From here, the first flaw of the middle-square method appears, that is, when using the zero value as a seed, only a sequence of zeros will be obtained. Another shortcoming of this method is the repetitiveness of the sequences. As in all of the other PRNGs that will be discussed, each value depends on the previous one, and, at most, on the internal state variables of the generator. Since this is a limited number of digits, the sequence can only be repeated from a certain point onward. The length of the sequence, before it begins to repeat itself, is called the period. A long period is important because many practical applications require a large amount of random data, and a repetitive sequence might be less effective. In such cases, it is important that the choice of the seed has no influence on the potential outcomes. Another important aspect is the efficiency of the algorithm. The size of the output data values ​​and internal state, and, therefore, the generator input (seed), are often intrinsic features of the algorithm and remain constant. For this reason, the efficiency of a PRNG should be assessed not so much in terms of computational complexity, but in terms of the possibility of a fast and efficient implementation of the calculation architectures available. In fact, depending on the architecture you are working on, the choice of different PRNGs, or different design parameters of a certain PRNG, can result in a faster implementation by many orders of magnitude. Linear congruential generator One of the most common methods for generating random numbers is the Linear Congruence Generator (LCG). The theory on which it rests is simple to understand and implement. It also has the advantage of being computationally light. The recursive relationship underlying this technique is provided by the following equation: 𝑥𝑥𝑘𝑘+1 = (𝑎𝑎 ∗ 𝑥𝑥𝑘𝑘 + 𝑐𝑐) 𝑚𝑚𝑚𝑚𝑚𝑚 𝑚𝑚 Here, we can observe the following: • a is the multiplier (non-negative integers) • c is the increment (non-negative integers) The pseudorandom number generator 35 • m is the mode (non-negative integers) • x0 is the initial value (seed or non-negative integers) The modulo function, denoted by mod, results in the remainder of the Euclidean division of the first number by the second. For example, 18 mod 4 gives 2 as it is the remainder of the Euclidean division between the two numbers. The linear congruence technique has the following characteristics: • It is cyclical with a period that is approximately equal to m • The generated numbers are discretized To use this technique effectively, it is necessary to choose very large m values. As an example, set the parameters of the method and generate the first 16 pseudorandom values. Here is the Python code that allowed us to generate that sequence of numbers: import numpy as np a = 2 c = 4 m = 5 x = 3 for i in range (1,17):     x= np.mod((a*x+c), m)     print(x) The following results are returned: 0 4 2 3 0 4 2 3 0 4 2 3 36 Understanding Randomness and Random Numbers 0 4 2 3 In this case, the period is equal to 4. It is easy to verify that, at most, m distinct integers, Xn, can be generated in the interval, [0, m - 1]. If c = 0, the generator is called multiplicative. Let's analyze the Python code line by line. The first line was used to import the library: import numpy as np numpy is a library of additional scientific functions of the Python language, designed to perform operations on vectors and dimensional matrices. numpy allows you to work with vectors and matrices more efficiently and faster than you can do with lists and lists of lists (matrices). In addition, it contains an extensive library of high-level mathematical functions that can operate on these arrays. After importing the numpy library, we set the parameters that will allow us to generate random numbers using LCG: a c m x = = = = 2 4 5 3 At this point, we can use the LCG formula to generate random numbers. We only generate the first 16 numbers, but we will see from the results that these are enough to understand the algorithm. To do this, we use a for loop: for i in range (1,17):     x= np.mod((a*x+c), m)     print(x) To generate random numbers according to the LCG formula, we have used the np.mod() function. This function returns the remainder of a division when given a dividend and divisor. The pseudorandom number generator 37 Random numbers with uniform distribution A sequence of numbers uniformly distributed between [0, 1] can be obtained using the following formula: 𝑈𝑈𝑛𝑛 = 𝑋𝑋𝑛𝑛 𝑚𝑚 The obtained sequence is periodic, with a period less than or equal to m. If the period is m, then it has a full period. This occurs when the following conditions are true: • If m and c have prime numbers • If m is divisible by a prime number, b, for which it must also be divisible • If m is divisible by 4, then a - 1 must also be divisible by 4 Important note By choosing a large value of m, you can reduce both the phenomenon of periodicity and the problem of generating rational numbers. Furthermore, it is not necessary for simulation purposes that all numbers between [0, 1] are generated, because these are infinite. However, it is necessary that as many numbers as possible within the range have the same probability of being generated. Generally, a value of m is m ≥ 109 so that the generated numbers constitute a dense subset of the interval, [0, 1]. An example of a multiplicative generator that is widely used in 32-bit computers is the Learmonth-Lewis generator. This is a generator in which the parameters assume the following values: • a = 75 • c = 0 • m = 231 – 1 Let's analyze the code that generates the first 100 random numbers according to this method: import numpy as np a = 75 c = 0 m = 2**(31) -1 x = 0.1 38 Understanding Randomness and Random Numbers for i in range(1,100):     x= np.mod((a*x+c),m)     u = x/m     print(u) The code we have just seen was analyzed line by line in the Linear congruential generator section of this chapter. The difference, in addition to the values of the parameters, lies in the generation of a uniform distribution in the range of [0, 1] through the following command: u = x/m The following results are returned: Figure 2.4 – LCG output Since we are dealing with random numbers, the output will be different from the previous one. A comparison between the different generators must be made based on the analysis of the periodicity, the goodness of the uniformity of the numbers generated, and the computational simplicity. This is because the generation of very large numbers can lead to the use of expensive computer resources. Also, if the Xn numbers get too big, they are truncated, which can cause a loss of the desired uniformity statistics. The pseudorandom number generator 39 Lagged Fibonacci generator The lagged Fibonacci algorithm for generating pseudorandom numbers arises from the attempt to generalize the method of linear congruences. One of the reasons that led to the search for new generators was the need, useful for many applications especially in parallel computing, to lengthen the generator period. The period of a linear generator when m is approximately 109 is enough for many applications, but not all of them. One of the techniques developed is to make Xn + 1 dependent on the two previous values, Xn and Xn − 1, instead of only on Xn, as is the case in the LCG method. In this case, the period may arrive close to the value, m2, because the sequence will not repeat itself until the following equality is obtained: (𝑋𝑋𝑛𝑛+𝜆𝜆 , 𝑋𝑋𝑛𝑛+𝜆𝜆+1 ) = (𝑋𝑋𝑛𝑛 , 𝑋𝑋𝑛𝑛+1 ) The simplest generator of this type is the Fibonacci sequence represented by the following equation: 𝑋𝑋𝑛𝑛+1 = (𝑋𝑋𝑛𝑛 + 𝑋𝑋𝑛𝑛−1 ) 𝑚𝑚𝑚𝑚𝑚𝑚 𝑚𝑚 This generator was first analyzed in the 1950s and provides a period, m, but the sequence does not pass the simplest statistical tests. We then tried to improve the sequence using the following equation: 𝑋𝑋𝑛𝑛+1 = (𝑋𝑋𝑛𝑛 + 𝑋𝑋𝑛𝑛−𝑘𝑘 ) 𝑚𝑚𝑚𝑚𝑚𝑚 𝑚𝑚 This sequence, although better than the Fibonacci sequence, does not return satisfactory results. We had to wait until 1958, when Mitchell and Moore proposed the following sequence: 𝑋𝑋𝑛𝑛 = (𝑋𝑋𝑛𝑛−24 + 𝑋𝑋𝑛𝑛−55 ) 𝑚𝑚𝑚𝑚𝑚𝑚 𝑚𝑚, 𝑛𝑛 ≥ 55 Here, m is even and X0, ... X54 are arbitrary integers that are not all even. Constants 24 and 55 are not chosen at random but are numbers that define a sequence whose least significant bits (Xn mod 2) have a period of length 255-1. Therefore, the sequence (Xn) must have a period of length of at least 255-1. The succession has a period of 2M-1 (255-1) where m = 2M. Numbers 24 and 55 are commonly called lags and the sequence (Xn) is called a Lagged Fibonacci Generator (LFG). The LFG sequence can be generalized with the following equation: 𝑋𝑋𝑛𝑛 = (𝑋𝑋𝑛𝑛−𝑙𝑙 ⊗ 𝑋𝑋𝑛𝑛−𝑘𝑘 ) 𝑚𝑚𝑚𝑚𝑚𝑚 2𝑀𝑀 , 𝑙𝑙 > 𝑘𝑘 > 0 40 Understanding Randomness and Random Numbers Here, ⊗ refers to any of the following operations: +, −, ×, or ⊗ (exclusive or). Only some pairs (k, l) give sufficiently long periods. In these cases, the period is 2M-1 (2l-1). The pairs (k, l) must be chosen appropriately. The only condition on the first l values is that at least one of them must be odd; otherwise, the sequence will be composed of even numbers. Let's look at how to implement a simple example of additive LFG in Python using the following parameters: x0 = x1 = 1 and m = 232. Here is the code to generate the first 100 random numbers: import numpy as np x0=1 x1=1 m=2**32 for i in range (1,101):     x= np.mod((x0+x1), m)     x0=x1     x1=x     print(x) Let's analyze the Python code line by line. The first line was used to import the library: import numpy as np After importing the numpy library, we set the parameters that will allow us to generate random numbers using LFG: x0=1 x1=1 m=2**32 At this point, we can use the LFG formula to generate random numbers. We only generate the first 100 numbers. To do this, we use a for loop: for i in range (1,101):     x= np.mod((x0+x1), m)     x0=x1     x1=x     print(x) The pseudorandom number generator 41 To generate random numbers according to the LFG formula, we use the np.mod() function. This function returns the remainder of a division when given a dividend and divisor. After generating the random number, the two previous values are updated as follows: x0=x1 x1=x The following random numbers are printed: Figure 2.5 – Table of random numbers using LFG The initialization of an LFG is particularly complex, and the results of this method are very sensitive to the initial conditions. If extreme care is not taken when choosing the initial values, statistical defects may occur in the output sequence. These defects could harden the initial values along with subsequent values that have a careful periodicity. Another potential problem with LFG is that the mathematical theory behind the method is incomplete, making it necessary to rely on statistical tests rather than theoretical performance. 42 Understanding Randomness and Random Numbers Testing uniform distribution Test adaptation (that is, the goodness of fit) in general, has the purpose of verifying whether a variable under examination does or does not have a certain hypothesized distribution on the basis, as usual, of experimental data. It is used to compare a set of frequencies observed in a sample, with similar theoretical quantities assumed for the population. By means of the test, it is possible to quantitatively measure the degree of deviation between the two sets of values. The results obtained in the samples do not always exactly agree with the theoretical results that are expected according to the rules of probability. Indeed, it is very rare for this to occur. For example, although theoretical considerations lead us to expect 100 heads and 100 tails from 200 flips of a coin, it is rare that these results are obtained exactly. However, despite this, we must not unnecessarily deduce that the coin is rigged. The chi-squared test The chi-squared test is a test of hypotheses that gives us back the significance of the relationship between two variables. It is a statistical inference technique that is based on the chi-squared statistic and its probability distribution. It can be used with nominal and/ or ordinal variables, generally arranged in the form of contingency tables. The main purpose of this statistic is to verify the differences between observed and theoretical values, called expected values, and to make an inference on the degree of deviation between the two. The technique is used with three different objectives that are all based on the same fundamental principle: • The randomness of the distribution of a categorical variable • The independence of two qualitative variables (nominal or ordinal) • The differences with a theoretical model For now, we will just consider the first aspect. The method consists of a comparison procedure between the observed empirical frequencies and the theoretical frequencies. Let's consider the following definitions: • H0: Null hypothesis or the absence of a statistical relationship between two variables • H1: Research hypothesis that supports the existence of the relationship, for instance, H1 is true if H0 is false • Fo: Observed frequencies, that is, the number of data of a cell detected • Fe: Expected frequencies, that is, the frequency that should be obtained based on the marginal totals if there was no association between the two variables considered Testing uniform distribution 43 The chi-squared test is based on the difference between observed and expected frequencies. If the observed frequency is very different from the expected frequency, then there is an association between the two variables. As the difference between the observed frequency and the expected frequency increases, so does the value of chi-squared. The chi-squared value is calculated using the following equation: χ2 = ∑ (𝐹𝐹𝑜𝑜 − 𝐹𝐹𝑒𝑒 )2 𝐹𝐹𝑒𝑒 Let's look at an example to understand how to calculate this value. We build a contingency table that shows student choices for specific courses divided by genres. These are the observed values: Figure 2.6 – Table of student choices divided by genres In addition, we calculate the representation of each value as a percentage of column totals (observed frequencies): Figure 2.7 – Table of choices in percentages of column totals Now we calculate the expected value, as follows: Expected value = (𝑇𝑇𝑇𝑇𝑇𝑇 𝑟𝑟𝑟𝑟𝑟𝑟) ∗ (𝑇𝑇𝑇𝑇𝑇𝑇 𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐) 𝑇𝑇𝑇𝑇𝑇𝑇 Let's calculate it for the first cell (Biotechnology – Male): Expected value = (985) ∗ (1166) = 512.9567 2239 44 Understanding Randomness and Random Numbers The contingency matrix of expected values is as follows: Figure 2.8 – Table of contingency matrix Let's calculate the contingency differences (Fo – Fe), as follows: Figure 2.9 – Table of contigency differences Finally, we can calculate the chi-squared value, as follows: χ2 = ∑ (𝐹𝐹𝑜𝑜 − 𝐹𝐹𝑒𝑒 )2 (−101.957)2 (−101.957)2 = + + ⋯ = 84.35 𝐹𝐹𝑒𝑒 512.9567 472.0433 If the two characters were independent, we would expect a chi-squared value of zero. On the other hand, random fluctuations are always possible. So, even in the case of perfect independence, we will never have zero. Therefore, even chi-squared values that ​​ are far from zero could make the result compatible with the null hypothesis, H0, of independence between the variables. Important note One question that arises is whether the value obtained is only the result of a fluctuation, or does it arise from the dependence between the data? Statistical theory tells us that if the variables are independent, the distribution of the chi-squared frequencies follows an asymmetric curve. In our case, we have a frequency distribution table of two features: course and gender. That is, the course feature with three modes and the genre feature with two modes. In the case of independe...
Purchase answer to see full attachment
User generated content is uploaded by users for the purposes of learning and should be used following Studypool's honor code & terms of service.

Explanation & Answer

It's the final answer from my side

Assignment

(Chapters 4, 5, 6, 7)
Chapter 4 :
1- a. Python codes were executed and following three diagrams and outputs were obtained:

Figure 4.1(i): Plot of quarter circle 𝑥 2 + 𝑦 2 = 1 and square of size 1 for 𝑁 = 100
Output: N=100 M=84 Pi=3.36

Figure 4.1(ii): Plot of quarter circle 𝑥 2 + 𝑦 2 = 1 and square of size 1 for 𝑁 = 1000
Output: N=1000 M=779 Pi=3.12

Figure 4.1(iii): Plot of quarter circle 𝑥 2 + 𝑦 2 = 1 and square of size 1 for 𝑁 = 10000
Output: N=10000 M=7836 Pi=3.13
Explanation: The initial value M=0 represents that no point was inside the circle initially.
N=100, 1000, and 10000 represents the number of iterations or number of times the
calculations performed to check if a point falls inside/on the circle or it outside the circle. At
each check of a point (x, y) a point satisfies or not satisfy the condition: 𝑥 2 + 𝑦 2 ≤ 1. If the
given condition is true, value of M is increased by 1 else it remains the same as the previous
value. Finally, M/N represents the approx. area of the circular region and 4*M/N is equal to
Pi.
As the N increases the number of points increases and thus the value is more accurate to
actual value of pi which is 3.14…

1- b. Pi estimation was conducted in python. This time we use a half circle instead of 1/4th
circle. Three figures and outputs for N=100, 1000, 10000 are shown below:

Figure 4.1(iv): Plot of quarter circle 𝑥 2 + 𝑦 2 = 1 and square of size 1 for 𝑁 = 100
Output: N=100 M=75 Pi=3.00

Figure 4.1(v): Plot of quarter circle 𝑥 2 + 𝑦 2 = 1 and square of size 1 for 𝑁 = 1000
Output: N=1000 M=793 Pi=3.17

Figure 4.1(vi): Plot of quarter circle 𝑥 2 + 𝑦 2 = 1 and square of size 1 for 𝑁 = 10000
Output: N=10000 M=7872 Pi=3.15

Explanation: All the steps are as in 1-a except we have increase the domain of x. Now the
domain of ...


Anonymous
Nice! Really impressed with the quality.

Studypool
4.7
Trustpilot
4.5
Sitejabber
4.4

Related Tags