Civil Engineering Data Collection and Analysis Discussion

Engineering

Description

Conduct a travel demand study for the Muscat region using a travel demand survey questionnaire. The team should perform a thorough literature review on travel demand surveys, covering the need for the survey, the data to be collected, preparation of a suitable questionnaire, and methods of analysis.

Attachments:-

- The Word document includes all the project details.

- The 6 PDFs are journal articles for the literature review in Chapter 2 (they will be uploaded after the question is accepted).

- The Excel document provided includes all the data needed for Chapter 4: data collection and analysis.

Unformatted Attachment Preview

Survey data (total respondents of the survey: 50)

1. Gender
   Male: 38
   Female: 12

2. Age of respondent
   Age group        Male   Female   Total
   18-25 years       22      6       28
   26-35 years        5      3        8
   36-45 years        4      3        7
   46-55 years        1      0        1
   > 55 years         6      0        6

3. Family income
   Income (OMR)     Male   Female   Total
   < 350              1      2        3
   350-500            2      6        8
   500-1000          25      3       28
   1000-2000          4      1        5
   > 2000             6      0        6

4. No. of members in household
   1 to 2: 6
   2 to 4: 10
   4 to 6: 26
   Greater than 6: 8

5. Number of vehicles owned by the household
   0 cars: 2
   1 car: 8
   2 cars: 15
   3 cars: 22
   4 cars: 3
   5 cars: 0
   Greater than 5 cars: 0

6. Occupation
   Self employed (business owner/freelancer): 5
   Government employee: 2
   Private company employee: 26
   Unemployed graduate: 2
   Home maker (housewife): 3
   Student: 11
   Retired: 1

7. Mode choice to main activity on a weekday (to work/to college/to business)
   Categories: Own car, Rented car, Taxi, Bus, Cycle, Walking, Company-provided transport (college bus/company car)
   Counts as extracted: 38 0 5 3 0 1

8. Reason for using car (out of 38 car users; respondents were asked to choose the one reason that is most important)
   Categories: Comfortable, Cheaper, Privacy, Climate, No alternatives, Social status, Door to door reach, Less waiting time, No need to change modes, Easy to carry luggage, Safety
   Counts as extracted: 6 6 9 3 0 6 13 2 1 1 3 3

Travel demand survey for Muscat: a questionnaire-based mini project

Each team has to conduct a travel demand study for the Muscat region using a travel demand survey questionnaire. The team should perform a thorough literature review on travel demand surveys, covering the need for the survey, the data to be collected, preparation of a suitable questionnaire, and methods of analysis. Once the data collection and analysis are done, each team is required to prepare and submit an online report highlighting all the salient features of the study along with all the evidence.

Methodology for the mini project: The students are expected to do an extensive literature review to arrive at the methodology for the selected project. The tutor will act as a facilitator during the development of the travel demand survey questionnaire.
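The Chapter 4 charts need percentage shares rather than raw counts. The following is a minimal sketch of that step for two of the survey questions. It assumes the flattened tallies in the preview are read so that each question sums to the 50 respondents; the dictionary and function names are hypothetical, and the empty "greater than 5 cars" category is left out.

```python
# Tallies read from the attachment preview. The row/column assignment is an
# assumption: each question is taken to sum to the 50 respondents.
vehicles_per_household = {0: 2, 1: 8, 2: 15, 3: 22, 4: 3, 5: 0}
age_group_totals = {"18-25": 28, "26-35": 8, "36-45": 7, "46-55": 1, ">55": 6}


def percentage_shares(counts):
    """Convert raw counts into percentage shares of the respondents."""
    total = sum(counts.values())
    return {key: round(100.0 * n / total, 1) for key, n in counts.items()}


def mean_vehicles(counts):
    """Average number of vehicles per household from a {vehicles: households} tally."""
    total = sum(counts.values())
    return sum(k * n for k, n in counts.items()) / total


if __name__ == "__main__":
    print(percentage_shares(age_group_totals))
    print(percentage_shares(vehicles_per_household))
    print(mean_vehicles(vehicles_per_household))
```

The same shares can be pasted into Excel or any statistical package to draw the bar or pie charts the brief asks for.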
At least six references are required to devise a suitable methodology; there is no upper limit on the number of references. Additional support on the topic will be provided via class lectures and support material that will be posted on Blackboard.

Format for report: Guidelines for the mini-project report and the marking rubrics can be downloaded from Blackboard along with other required instructions. The cover page will be uploaded to Blackboard and has to be attached as the first page of the Word document. A general outline for the report is provided below:

Chapter 1 - Introduction
● Introduction
● Aim of the project
● Objectives
● Need for the study
● Scope of the project

Chapter 2 - Literature review
At least 6 relevant works to be reviewed for arriving at the methodology.

Chapter 3 - Methodology of the project
● Detailed explanation of the methodology of the project with a flowchart
● Development of the survey questionnaire

Chapter 4 - Data collection and analysis
● Data collected in detail (both primary and secondary data may be used)
● Make Excel charts using the data provided in the Excel document.
● Analysis using any statistical package

Chapter 5 - Results/Findings

Chapter 6 - Recommendations
● Recommendations based on the project output
● Recommendations for future work

List of references (Harvard style)

Appendix

Journal of Public Economics 3 (1974) 303-328. © North-Holland Publishing Company

THE MEASUREMENT OF URBAN TRAVEL DEMAND

Daniel McFADDEN*
Department of Economics, University of California, Berkeley, U.S.A.

Transport projects involve sinking money in expensive capital investments, which have a long life and wide repercussions. There is no escape from the attempt both to estimate the demand for their services over twenty or thirty years and to assess their repercussions on the economy as a whole.
Denys Munby, Transport, 1968

1. Introduction

It is a truism that the transportation system is a critical component of every urban economy, and that transportation policy decisions can have a profound effect on the development of the urban system. Public transportation projects are often massive and mutually exclusive, with irreversible cumulative effects over long periods. If major social losses are to be avoided, careful planning based on a conceptually sound and empirically accurate benefit-cost calculus is essential. Accurate forecasts of travel demand under alternative transport policies are required for precise calculations of benefits. To be fully satisfactory, these forecasts must be sufficiently sensitive to reflect the impact of the changing urban environment over the lifetime of proposed transport projects.

Travel demand forecasting has long been the province of transportation engineers, who have built up over the years considerable empirical wisdom and a repertory of largely ad hoc models which have proved successful in various applications. The contribution of psychologists and economists to forecasting methodology has been limited; despite a surge of recent interest, there still does not exist a solid foundation in behavioral theory for demand forecasting practices.¹ Because travel behavior is complex and multifaceted, and involves 'non-marginal' choices, the task of bringing economic consumer theory to bear is a challenging one. Particularly difficult is the integration of a satisfactory behavioral theory with practical statistical procedures for calibration and forecasting.

The object of this paper is to suggest approaches to advancing the behavioral theory of travel demand, and to shed light on some currently unresolved empirical questions on the determinants of travel behavior. Section 2 discusses the dimensions of travel demand behavior and the requirements imposed on any comprehensive theory of behavior. Section 3 presents selected results from a pilot study of rapid transit demand forecasting in the San Francisco Bay Area.

*This research was supported by National Science Foundation Grants GS-27226 and GS-35890X and by the Department of Transportation, whose views are not necessarily represented by the contents of this paper. I am indebted to T. Domencich and M.K. Richter for useful comments at various stages of this research, and to M. Johnson, F. Reid, H. Varian, H. Wills and G. Duguay, who have made major contributions to the empirical analysis of this paper; and I gratefully acknowledge the valuable comments of discussants R. Cooter, F.X. de Donnea, and E. Sheshinski. I claim sole responsibility for errors.

2. The dimensions of travel demand behavior

We start with the observation that urban travel demand is the result of aggregation over the urban population, each member of which is making individual travel decisions based on his personal needs and environment. These individual decisions are complex, involving trip purpose, frequency, timing, destination, and mode of travel. Further, these choices should be analyzed in the context of simultaneous choices of automobile ownership, housing location, and end-of-trip activities. Travel is not normally an end objective of the consumer, but rather a concomitant of other activities such as work, shopping, and recreation. Thus, it is natural to analyze travel demand within the framework of the consumption activity - household production models of Court-Griliches-Becker-Lancaster.

2.1. Individual choice behavior

Classical psychological theory views the individual as having a series of basic wants or drives.² Failure to satisfy these drives leads to increased activity; the larger the increase, the greater is the level of deprivation. Behavior which decreases deprivation is reinforced, and consequently learned.
If we now assume this individual is a 'rational' economic consumer, we can postulate a 'utility' function summarizing the sense of well-being of the individual as a (decreasing) function of the level of deprivation he experiences. Suppose the individual exists over a sequence of short periods, say days, indexed $v = 0, 1, \ldots$. Assume $K$ drives, and let $D_v = (D_{v1}, \ldots, D_{vK})$ denote the vector of deprivation levels experienced by the individual in period $v$. We take the utility of the individual to be a discounted sum of 'per day' utilities, writing

$$U = \sum_{v=0}^{\infty} \delta^{v} u(D_v), \qquad (1)$$

where $\delta$ is a discount factor and the individual's horizon is taken to be infinite to simplify later calculation.³

Over his lifetime, the individual has available a set $B$ of mutually exclusive alternative choices. Each member $x \in B$ is a vector $x = (x_0, x_1, \ldots)$, with $x_v$ a sub-vector of attributes associated with the decision made in period $v$. A simple example would be an individual whose only decision in life is a binary commute mode choice; i.e., his work type and location, residential location, auto ownership, and non-work behavior are all completely determined. Let $x^a, x^b$ denote the vectors of attributes such as travel time, cost, and comfort, associated with the two modes in period 0, and suppose these attributes do not change over time.

¹Many papers in the literature deserve mention for providing key elements in the foundation of a behavioral theory of travel demand, and useful insights into travel behavior. A partial list is: J. Dupuit (1844), S. Warner (1962), T. Lisco (1967), W. Oi (1962), J. Meyer et al. (1966), R. Quandt and W. Baumol (1966), G. Quarmby (1967), P. Stopher (1968), P. Stopher and T. Lisco (1970), D. Brand (1972), and M. Ben Akiva (1972).

²See, for example, E. Thorndike, A theory of the action of the after-effects of a connection upon it, Psychology Review 40 (1933) 434-439.
Then, letting $A = \{x^a, x^b\}$, the set $B$ is the Cartesian product $B = A \times A \times \cdots$. More generally, the individual will face both long-run (residential location, auto ownership) decisions and short-run (timing of trips, mode choice) decisions, with the former decisions restricting the range of latter opportunities. The set of period $v$ decisions $x_v$ associated with $x \in B$ will not be a simple 'budget set' of the type ordinarily encountered in consumer theory because of the qualitative transport choices involved and the 'fixed charge' nature of transport in facilitating consumer activities. To simplify analysis, we shall assume that the set of options at each $v$ is finite; there is no particular technical difficulty in extending our analysis to the non-finite case.

The relation between the consumer's decision $x \in B$ and the evolution of deprivation levels over time is determined by the definition of drives and the nature of the household production technology; we assume this has the general form⁴

$$D_{v+1} = f(D_v, x_v). \qquad (2)$$

The rational economic consumer will choose $x \in B$ to maximize utility (1) subject to his initial deprivation level $D_0$ and the constraints (2). To push the analysis beyond this very general statement of the mechanism for determining behavior, we shall now make very specific and concrete assumptions on the functional forms of utility and the determination of drives; namely, utility linear in deprivation levels,

$$u(D_v) = -\beta' D_v, \qquad (3)$$

and deprivation levels evolving in a linear first-order difference equation,

$$D_{v+1} = \Gamma D_v + g(x_v). \qquad (4)$$

³The linear additive form is justified only by convenience.

⁴The assumptions that the evolution of deprivation levels follows a first-order process and that the alternative sets are independent over time are made for notational convenience. They can be relaxed explicitly, or implicitly by broadening the definitions of deprivation levels and choice attributes to include historical information.
To avoid boundary problems, we assume all real levels of deprivation, positive or negative, are defined. In these formulae, $\beta$ is a vector of non-negative parameters and $\Gamma$ is a $K \times K$ matrix. In what follows, we shall assume the roots of $\Gamma$ all lie in the interior of the unit circle; i.e., there are no self-sustaining rises in deprivation levels over time.⁵

It should be noted that while the functional forms (3) and (4) are concrete, they are not as specialized as might at first appear. First, since any sufficiently smooth utility function can be approximated on a specific set by a linear combination of appropriately chosen numerical functions, one can shape (3) by taking a sufficiently broad definition of the list of 'drives'. Second, by including historical information in the deprivation-level vector and choosing the form of the function $g$, a broad range of functional relations between the attribute vectors $x_v$ and per-day utility levels can be attained.

We next use the forms (3) and (4) to simplify the statement of the utility maximization problem. Pre-multiply (4) by $\delta^{v+1}$ and sum over $v$:

$$\sum_{v=0}^{\infty} \delta^{v+1} D_{v+1} = \delta\Gamma \sum_{v=0}^{\infty} \delta^{v} D_v + \sum_{v=0}^{\infty} \delta^{v+1} g(x_v). \qquad (5)$$

Assuming that the last term in this sum exists, we have

$$\sum_{v=0}^{\infty} \delta^{v} D_v = (I - \delta\Gamma)^{-1} \Big( D_0 + \sum_{v=0}^{\infty} \delta^{v+1} g(x_v) \Big), \qquad (6)$$

and hence

$$U = -\beta'(I - \delta\Gamma)^{-1} D_0 - \beta'(I - \delta\Gamma)^{-1} \sum_{v=0}^{\infty} \delta^{v+1} g(x_v). \qquad (7)$$

The first term in this expression is constant; hence, the utility maximization problem reduces to

$$\max U = -\min_{x = (x_0, x_1, \ldots) \in B} \; \beta'(I - \delta\Gamma)^{-1} \sum_{v=0}^{\infty} \delta^{v+1} g(x_v). \qquad (8)$$

Further simplification of the problem occurs when $B$ is a Cartesian product $B = A \times A \times \cdots$, as in the mode choice example cited above:⁶

$$\max U = -\min_{x_0 \in A} \; \frac{\delta}{1 - \delta}\, \beta'(I - \delta\Gamma)^{-1} g(x_0). \qquad (9)$$

⁵An earlier draft of this paper allowed for the possibility that some roots of $\Gamma$ could be unstable. This could correspond, for example, to the presence of drives such as 'boredom' which may have intrinsically unstable deprivation levels requiring continual monitoring and positive control, and could provide a theoretical explanation for cyclic variations in individual choice in the presence of static alternative sets. To make this possibility compatible with the earlier simplifying assumption of an infinite horizon, I had previously assumed the matrix $\delta\Gamma$ to be stable. Discussant Robert Cooter has pointed out that this leads to the implausible conclusion that unstable deprivation levels will be divergent in the optimal solution, with the individual accepting extreme values of these variables in the discounted future in exchange for the short-run benefits of 'steady-state' behavior. While the unboundedness of deprivation levels might be dismissed as a consequence of our linearization of the consumer's problem, the absence of cyclic behavior is contrary to the initial objectives of the construction. A much better approach to incorporating the possibility of cyclic behavior is to assume a finite horizon $H$ in the utility function (1) and impose no conditions on the roots of $\delta\Gamma$. Then, the analogue of eq. (7) is

$$U = -\beta' \Lambda_H D_0 - \beta' \sum_{v=0}^{H-1} \Lambda_{H-v-1}\, \delta^{v+1} g(x_v),$$

with $\Lambda_n = \sum_{v=0}^{n} \delta^{v} \Gamma^{v}$, or $[I - \delta\Gamma]\Lambda_n = I - (\delta\Gamma)^{n+1}$. It should be clear that the analysis we carry out for the case of stable $\Gamma$ could readily be adapted to this more general model. On the other hand, the assumption of a stable $\Gamma$ seems more appropriate for application to steady-state work trip commute behavior.

2.2. Population choice behavior

Before giving concrete examples showing how problems (8) or (9) can be used to obtain implications for individual behavior, it is useful to explore the link between behavioral models of the individual and the data obtained from sampling an urban population.
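The closed form for the discounted deprivation sum in eqs. (6) and (7) can be checked numerically. The sketch below does this for the scalar case (one drive, so $\Gamma$ reduces to a number `gamma`) with a constant per-period effect `g`, as in the stationary problem (9). All function names and numeric values are illustrative assumptions, not taken from the paper.

```python
def discounted_deprivation_sum(d0, gamma, delta, g, periods=2000):
    """Sum delta**v * D_v by iterating D_{v+1} = gamma * D_v + g (scalar eq. 4)."""
    total, d = 0.0, d0
    for v in range(periods):
        total += delta ** v * d
        d = gamma * d + g
    return total


def closed_form_sum(d0, gamma, delta, g):
    """Scalar version of eq. (6) with constant g: sum_v delta**(v+1) * g = delta*g/(1-delta)."""
    return (d0 + delta * g / (1.0 - delta)) / (1.0 - delta * gamma)


def best_alternative(effects, beta, gamma, delta):
    """Stationary problem (9): pick the alternative minimising the weighted effect."""
    weight = (delta / (1.0 - delta)) * beta / (1.0 - delta * gamma)
    return min(effects, key=lambda name: weight * effects[name])


if __name__ == "__main__":
    print(discounted_deprivation_sum(1.0, 0.5, 0.9, -0.2))
    print(closed_form_sum(1.0, 0.5, 0.9, -0.2))
    # Hypothetical per-period deprivation effects g(x) of two commute modes.
    print(best_alternative({"auto": -0.3, "bus": -0.1}, beta=1.0, gamma=0.5, delta=0.9))
```

Because the weight in `best_alternative` is positive whenever `beta > 0` and `0 < delta, gamma < 1`, the choice depends only on the ordering of the `g` values, which is the sense in which the initial deprivation level drops out of the problem.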
Our theory of individual behavior is not 'single-valued'; we cannot exclude the possibility that within our framework of economic rationality and postulated structure of utility maximization there will be unobserved characteristics, such as tastes and unmeasured attributes of alternatives, which vary over the population and obscure the implications of the individual behavior model. However, it is possible to deduce from the individual choice model properties of population choice behavior which have empirical content. The following rather extensive digression on this subject may clarify the conceptual issues involved.

Consider the textbook model of economic consumer behavior. The individual has a utility function $u = U(x; \rho)$, representing tastes, which is maximized subject to a budget constraint $x \in B$ at a system of demands

$$x = h(B; \rho), \qquad (10)$$

where $\rho$ is a specification of the individual's tastes [e.g., $\rho$ may be the individual's binary preference relation (for which $U$ is a representation), or a parameter vector specifying the utility function within a class of functional forms; factors influencing tastes and included in $\rho$ are observed demographic variables such as sex, age, and education, and unobserved variables such as intelligence, experience, and childhood training; textbook models usually suppress the $\rho$ argument]. The econometrician typically has data on the behavior of a cross-section of consumers drawn from a population with common observed demographic characteristics: budgets $B_t$ and demands $x_t$ for individuals $t = 1, \ldots, T$.

⁶The reader will note that we have ended up with utility expressed as a function of attributes of the chosen alternatives. The conceptual apparatus of drives and household production has mattered only in the specification of the coefficient vector, and from a formal point of view could be dispensed with altogether. However, in drawing out the empirical implications of taste and production effects, the more elaborate structure is useful.
He wishes to test hypotheses about the behavioral model (10) which may range from specific structural features of parametric demand functions (e.g., price and income elasticities) to the general revealed preference hypothesis that the observed data are generated by utility-maximizing consumers. The observed data will fail to fit eq. (10) exactly because of measurement errors in $x_t$, consumer optimization errors, and unobserved variations in the population. The procedure of most empirical demand studies is to ignore the possibility of taste variations in the sample and make the plausible and convenient, but untested, assumption that the cross-section of consumers has observed demands which are distributed randomly about the exact values $x$ for some common or representative tastes $\bar{\rho}$; i.e.,

$$x_t = h(B_t; \bar{\rho}) + \varepsilon_t, \qquad (11)$$

where $\varepsilon_t$ is an unobserved random term distributed independently of $B_t$. The relation of observed aggregate demand to individual demand under this specification is straightforward. In a population of consumers who are homogeneous with respect to budgets faced, aggregate demand will equal individual demand 'writ large', and all systematic variations in aggregate demand are interpreted as generated by a common variation at the intensive margin of the identical individual demands. In the absence of unobserved variations in tastes or budgets, there is no extensive margin affecting aggregate demand. Conventional statistical techniques can be applied to eq. (11) under the specification above to test hypotheses on the structure of $h$. In the conventional demand study, where quantities demanded vary continuously, it is reasonable to expect marginal optimization errors and measurement errors to be important, and perhaps dominate the effect of taste variations. Then the specification (11) is fairly realistic.⁷

We now re-examine the conventional demand specification in the case that the set of alternative choices is finite.
A utility maximum exists under standard conditions, and generates the demand equation (10). This equation predicts a single chosen $x$ when tastes and unobserved attributes of alternatives are assumed uniform across the population. The conventional statistical specification in (11) would then imply that all observed variation in the demand $x_t$ over the finite set of alternatives is the result of errors in measurement and optimization. The argument that measurement error is an important factor is clearly implausible. Consumer optimization errors may be important, but then we must question the relevance of this behavioral model in which a substantial proportion of the observed variation in choice is attributed to aspects of behavior described only by the ad hoc error specification.

Aggregate demand can usually be treated as a continuous variable, as the effect of the discreteness of individuals' alternatives is negligible.

⁷Under some conditions, the conclusion above on estimation of continuously varying demands will continue to hold even in the presence of some types of taste variation. Suppose one can postulate that consumers are homogeneous in tastes up to a vector of parameters that appear linearly in the demand function. (An example would be individuals with log-linear functions who face conventional budget constraints, with variation in the parameters of the utility function across individuals.) Then the demand functions can be estimated using random coefficients econometric models; what is important is that except for refinements in estimation of the error structure and the variances of estimators, this approach will lead to the same models and estimates as were obtained under the 'identical consumers' assumption. We shall next show that when consumer choice involves discrete alternatives rather than continuous choice, this 'robustness' property of the conventional model is lost.
As a result, aggregate demand may superficially resemble the demand for a population of identical individuals for a divisible commodity. However, systematic variations in the aggregate demand for the lumpy commodity are all due to shifts at the extensive margin where individuals are switching from one alternative to another, and not at the intensive margin as in the divisible commodity, identical individual case. Thus, it is fallacious to apply the latter model to obtain specifications of aggregate demand for discrete alternatives. What is required is a formulation of the demand model in which the effects of individual differences in tastes and optimization behavior on the error structure in eq. (11) are made explicit. The implications of this specification for choice among discrete alternatives differ substantially from the conventional specification, as several examples will show. For notational simplicity, the utility function given in (7) is written in these examples as a general function of the attributes of alternative decisions, $U(x)$.

Example A. Suppose each member of the population has the utility function $U(x_1, x_2) = x_1 + \alpha \log x_2$, and the budget constraint $y = p_1 x_1 + p_2 x_2$, with $y \ge p_1$ and $x_1 = 0$ or $1$. The taste parameter $\alpha$ varies in the population with a cumulative distribution function $G(\alpha)$ and mean $\bar{\alpha}$. Then, utility is maximized by purchasing a unit of good 1 when $U(1, (y - p_1)/p_2) > U(0, y/p_2)$, or $\alpha < -1/\log(1 - p_1/y)$. Hence, $\mathrm{Prob}(x_{1t} = 1) = G(-1/\log(1 - p_{1t}/y_t))$. Suppose an observed cross-section sample has $T$ income-price levels, indexed $(y_t, p_{1t}, p_{2t})$, $R_t$ individuals at income-price level $t$, and $S_t$ observed purchases of a unit of the first commodity among the individuals at level $t$. Then, the observed relative frequency $f_t = S_t/R_t$ is an estimate of the probability $P_t = G(-1/\log(1 - p_{1t}/y_t))$, and a statistical technique such as maximum likelihood or minimum chi-square can be used to estimate the unknown parameters of $G$.
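The purchase rule of Example A can be illustrated with a short sketch that compares the two utility levels directly and checks the result against the threshold $-1/\log(1 - p_1/y)$. The function names, the numeric values, and the exponential taste distribution passed in as `cdf` are all assumptions made for illustration; the paper leaves $G$ general at this point.

```python
import math


def buys_good_1(alpha, y, p1, p2):
    """Direct utility comparison U(1, (y - p1)/p2) > U(0, y/p2) for U = x1 + alpha*log(x2)."""
    utility_buy = 1.0 + alpha * math.log((y - p1) / p2)
    utility_skip = alpha * math.log(y / p2)
    return utility_buy > utility_skip


def taste_threshold(y, p1):
    """Critical taste level: good 1 is bought when alpha is below this value (needs y > p1)."""
    return -1.0 / math.log(1.0 - p1 / y)


def purchase_probability(y, p1, cdf):
    """Prob(x1 = 1) = G(threshold) for a taste distribution with CDF `cdf`."""
    return cdf(taste_threshold(y, p1))


if __name__ == "__main__":
    y, p1, p2 = 10.0, 2.0, 1.0
    print(taste_threshold(y, p1))
    print(buys_good_1(4.0, y, p1, p2), buys_good_1(5.0, y, p1, p2))
    # Hypothetical taste distribution: exponential with mean 3.
    print(purchase_probability(y, p1, lambda a: 1.0 - math.exp(-a / 3.0)))
```

With grouped data, the value returned by `purchase_probability` is the quantity that the relative frequency $f_t = S_t/R_t$ estimates at each income-price level.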
Suppose, for example, $\alpha$ has the reciprocal exponential distribution $G(\alpha) = e^{(-\theta_1/\alpha) + \theta_2}$ $(0 < \alpha \le \theta_1/\theta_2)$, where $\theta_1$ and $\theta_2$ are positive parameters. Then $\log P_t = \theta_1 \log(1 - p_{1t}/y_t) + \theta_2$, and a consistent (as all $R_t \to +\infty$) estimator of $(\theta_1, \theta_2)$ can be obtained by applying ordinary least squares to the equation $\log f_t = \theta_1 \log(1 - p_{1t}/y_t) + \theta_2 + \eta_t$. (A weighted regression yields an asymptotically efficient estimator.)

Example B. Each member of the population has a utility function $U(x_1, x_2) = x_1 + \alpha \log x_2$ and budget constraint $y = p_1 x_1 + p_2 x_2$, where $x_1, x_2$ vary continuously. The demand functions are $x_1 = \max(0, y/p_1 - \alpha)$ and $x_2 = \min(y/p_2, \alpha p_1/p_2)$. If $\alpha$ varies in the population with a cumulative distribution function $G(\alpha)$, then the probability of observing an individual with zero demand for commodity 1 is $\mathrm{Prob}(x_1 = 0) = 1 - G(y/p_1)$. One then has a limited dependent variable which assumes its limiting value with positive probability. This problem can be handled statistically by maximum likelihood methods. For a log-normal distribution this model is a version of Tobit analysis.

Example C. Each member of the population has a utility function $U(x_1, x_2) = x_1 + \alpha \log x_2$ and budget constraint $y = p_1 x_1 + p_2 x_2$, where $x_2$ is continuous, $x_1$ is integer-valued, and we assume $y/p_1$ is greater than one and non-integral. Let $m$ denote the largest integer less than $y/p_1$. Suppose $\alpha$ varies in the population with a cumulative distribution function $G(\alpha)$, $\alpha > 0$. Note that when $x_1$ is treated as a continuous variable, utility has a unique maximum subject to the budget constraint at a value of $x_1$ which is decreasing in $\alpha$. Hence, a critical value $\alpha_n$ of $\alpha$ at which an individual will switch from $n$ to $n+1$ units of good 1 is determined by equality of the utility levels for these two quantities, implying

$$\alpha_n = -1 \Big/ \log\Big\{ 1 - \frac{p_1}{y - n p_1} \Big\}, \qquad n = 0, \ldots, m-1. \qquad (12)$$

Hence, $\mathrm{Prob}[x_1 \le n] = \mathrm{Prob}[\alpha > \alpha_n] = 1 - G(\alpha_n)$ for $n = 0, \ldots, m-1$.
From these formulae, the expected or average purchase of good 1 in the population is

$$E x_1 = \sum_{n=0}^{m-1} G(\alpha_n). \qquad (13)$$

If $x_1$ is treated as a continuous variable, as in example B, the expected value is

$$E x_1 = \int_0^{y/p_1} G(\alpha)\, d\alpha.$$

A numerical example for the exponential distribution $G(\alpha) = 1 - e^{-\theta\alpha}$ gives some idea of the bias introduced by using this continuous approximation to expected demand.

θ        y/p1     True expected demand     Percentage bias in approximation
0.01      2.5           0.028                       8.94
0.01      5.5           0.145                       2.18
0.01     10.5           0.529                       0.69
1.0       2.5           1.456                       8.63
1.0       5.5           4.380                       2.84
1.0      10.5           9.376                       1.33

The positive bias implies that fitting the continuous approximation to data generated by the model will lead to underestimates of the parameter $\theta$, which in turn will give spuriously high forecasts of the response of aggregate demand for good 1 to changes in price and income.

Example D. A general model: An individual in the population has $J$ alternatives, indexed $j = 1, \ldots, J$, and described by a vector of observed attributes $x_j$ for each alternative. The individual has a utility function which can be written in the form $U = V(x) + \varepsilon(x)$, where $V$ is non-stochastic and reflects the 'representative' tastes of the population, and $\varepsilon(x)$ is stochastic and reflects the effect of individual idiosyncrasies in tastes or unobserved variations in attributes for each observed attribute vector $x$. The probability that an individual drawn randomly from the population and given the alternatives $1, \ldots, J$ will choose $i$ equals

$$P_i = \mathrm{Prob}[V(x_i) + \varepsilon(x_i) > V(x_j) + \varepsilon(x_j) \text{ for all } j \ne i] = \mathrm{Prob}[\varepsilon(x_j) - \varepsilon(x_i) < V(x_i) - V(x_j) \text{ for all } j \ne i]. \qquad (14)$$

Let $\psi(\varepsilon_1, \ldots, \varepsilon_J)$ denote the cumulative joint distribution function of $(\varepsilon(x_1), \ldots, \varepsilon(x_J))$. Let $\psi_i$ denote the derivative of $\psi$ with respect to its $i$th argument, and let $V_j = V(x_j)$. Then,

$$P_i = \int_{-\infty}^{+\infty} \psi_i(s + V_i - V_1, \ldots, s + V_i - V_J)\, ds. \qquad (15)$$

Any particular joint distribution, such as joint normal, will yield a family of probabilities depending on the unknown parameters of the distribution and of the functions $V_i$.

To illustrate the scope of this approach, suppose we assume that utility has the 'linear-in-attributes' form $U(x) = \alpha_0(x) + \alpha' x$, where $\alpha$ is a random $K$-vector of taste parameters and $\alpha_0(x)$ is a taste effect specific to $x$. Suppose $\alpha$ is distributed multivariate normal with mean $\bar{\alpha}$ and covariance matrix $A$, and that $\alpha_0(x)$ is distributed normally, independently of $\alpha$, with mean $x'\beta$ and variance $\sigma_0^2$, and independently for different alternatives. Then the vector $(U(x_2) - U(x_1), \ldots, U(x_J) - U(x_1)) = U$ is multivariate normal with mean $(\bar{\alpha} + \beta)'Z'$ and covariance matrix $\sigma_0^2 I + \sigma_0^2 e_J e_J' + Z A Z'$, with $e_J$ a $J$-vector of ones and $Z' = (x_2 - x_1, \ldots, x_J - x_1)$. The probability that alternative one is chosen equals the probability that the vector $U$ is negative. For binary choice, this probability is

$$P_1 = \Phi\left( \frac{(\bar{\alpha} + \beta)'(x_1 - x_2)}{\sqrt{2\sigma_0^2 + (x_2 - x_1)' A (x_2 - x_1)}} \right), \qquad (16)$$

where $\Phi$ is the standard cumulative normal. When $A$ is zero, this reduces to the conventional binary probit model; when $\sigma_0^2$ is zero, we obtain a model similar to the one proposed by Quandt (1966) for travel demand modeling. For multinomial choice, calculation of the choice probabilities requires numerical integration or approximation, a cumbersome requirement in non-linear statistical procedures.

A second example with considerable computational advantages is obtained by assuming the $\varepsilon(x_j)$ are independently identically distributed with the Weibull distribution

$$\mathrm{Prob}[\varepsilon(x_j) < \varepsilon] = e^{-e^{-\varepsilon}}. \qquad (17)$$

Then the choice probability for alternative $i$ is⁸

$$P_i = \frac{e^{V_i}}{\sum_{j=1}^{J} e^{V_j}}, \qquad (18)$$

and relative odds of choices satisfy

$$\log P_i/P_j = V_i - V_j. \qquad (19)$$

This is the well-known multivariate or conditional logit model which forms the starting point for much of the recent empirical work on disaggregated travel demand models.
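The bias figures quoted for Example C can be recomputed from eqs. (12) and (13). The sketch below reads the garbled exponential distribution as $G(\alpha) = 1 - e^{-\theta\alpha}$ with $\theta \in \{0.01, 1.0\}$ and writes $r = y/p_1$; both the reading and the function names are assumptions made for this check.

```python
import math


def critical_alpha(n, r):
    """alpha_n from eq. (12), written in terms of r = y/p1."""
    return -1.0 / math.log(1.0 - 1.0 / (r - n))


def true_expected_demand(r, theta):
    """Eq. (13): sum of G(alpha_n) for n = 0..m-1, with G(a) = 1 - exp(-theta*a)."""
    m = math.floor(r)  # largest integer less than r, since r is non-integral
    return sum(1.0 - math.exp(-theta * critical_alpha(n, r)) for n in range(m))


def continuous_approximation(r, theta):
    """Integral of G over [0, r], i.e. expected demand when x1 is treated as continuous."""
    return r - (1.0 - math.exp(-theta * r)) / theta


def percentage_bias(r, theta):
    """Relative overstatement of expected demand by the continuous approximation."""
    exact = true_expected_demand(r, theta)
    return 100.0 * (continuous_approximation(r, theta) - exact) / exact


if __name__ == "__main__":
    for theta in (0.01, 1.0):
        for r in (2.5, 5.5, 10.5):
            print(theta, r, round(true_expected_demand(r, theta), 3),
                  round(percentage_bias(r, theta), 2))
```

The bias shrinks as $y/p_1$ grows, which matches the intuition that the integer restriction matters less when many units are bought.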
The multivariate probit or logit models outlined above, or alternative models derived from (15), can be estimated by maximum likelihood methods, and under some data formats by modified minimum chi-square (Berkson-Theil) methods. The merits and drawbacks of these methods have been analyzed elsewhere by the author [McFadden (1973a)]; this reference also includes a survey of the statistical literature on the analysis of binary data and a discussion of the logical foundations and practical shortcomings of the logit model.

2.3. A behavioral model of mode choice

We next give an example illustrating how the consumer's optimization problem (8) and the analysis of population behavior given in example D can be combined to obtain specific models of transport demand. This example provides the framework for the empirical results reported in this paper, and is also the basis of the empirical work reported in Domencich and McFadden (1974) and McFadden (1973a).

Example E. Consider an individual whose only decision is a choice of work commute mode, all other factors such as location, auto ownership, etc., being specified. We assume the attributes of his alternatives do not change from day to day, so that his optimization problem reduces to that given in (9). We assume initially that he faces a binary choice between auto and bus transit modes; we shall later introduce the alternative of a rapid transit mode. Suppose the relevant drives are for nourishment (broadly defined), rest, and comfort. Commute alternative $i$ has attributes defined by a vector $x_i = (C_i, Tv_i, Ta_i, K_i)$ giving the cost, on-vehicle time, access time, and comfort level of this mode. Only the first three attributes are observed.

⁸See McFadden (1973a).

Let $p_w$ denote the per-period wage and $p_F$ a price index for consumption goods.
Since we have assumed the individual has no choice as to amount worked, we can normalize working hours to one and take $p_w$ to also represent per-period income. Assuming all income is spent, the individual choosing mode $i$ will purchase a quantity $(p_w - C_i)/p_F$ of consumption goods and will forego $Tv_i + Ta_i$ units of leisure beyond work time. The deprivation level of nourishment is assumed to satisfy

$$D_{1,v+1} = \gamma_1 D_{1,v} - [(p_w - C_i)/p_F - \alpha_1], \qquad (0 < \gamma_1 < 1). \qquad (20)$$

Fatigue will evolve similarly, with commute access time (involving walking and exposure to the elements) possibly being more tiring than on-vehicle time, and times being weighted by the real wage rate of the commuter,

$$D_{2,v+1} = \gamma_2 D_{2,v} - [\alpha_2 - Tv_i - \alpha_3 Ta_i]\, p_w/p_F, \qquad (0 < \gamma_2 < 1). \qquad (21)$$

Discomfort is assumed to be non-cumulative, with

$$D_{3,v+1} = -K_i. \qquad (22)$$

Combining these expressions yields the optimization problem corresponding to (9),

$$\max U = -\min_i \Big[ A + \theta_1 \frac{C_i}{p_F} + \theta_2\, Tv_i \frac{p_w}{p_F} + \theta_3\, Ta_i \frac{p_w}{p_F} - \theta_4 K_i \Big], \qquad (23)$$

where $A$ is a constant, $\theta_1 = (\delta/(1-\delta))\beta_1/(1 - \delta\gamma_1)$, $\theta_2 = (\delta/(1-\delta))\beta_2/(1 - \delta\gamma_2)$, $\theta_3 = \alpha_3\theta_2$, and $\theta_4 = (\delta/(1-\delta))\beta_3$. For a binary mode choice, the individual will select mode 1 if

$$\theta_4 (K_1 - K_2) > \theta_1 \frac{C_1 - C_2}{p_F} + \theta_2 (Tv_1 - Tv_2) \frac{p_w}{p_F} + \theta_3 (Ta_1 - Ta_2) \frac{p_w}{p_F}. \qquad (24)$$

Note that the right-hand side of this expression is the difference of the 'impedance' of the two modes, using a common definition of this term in the transportation literature. Suppose we now assume that the unobserved comfort variables $\theta_4 K_i$ have the Weibull distribution described in example D. Then, the probability that a randomly selected individual from the population will choose mode 1 is given by the binomial logit response curve

$$P_1 = \frac{1}{1 + \exp\Big[ \theta_1 \dfrac{C_1 - C_2}{p_F} + \theta_2 (Tv_1 - Tv_2) \dfrac{p_w}{p_F} + \theta_3 (Ta_1 - Ta_2) \dfrac{p_w}{p_F} \Big]}, \qquad (25)$$

obtained from eq. (18) in the case $J = 2$ and

$$V_i = -\theta_1 \frac{C_i}{p_F} - \theta_2\, Tv_i \frac{p_w}{p_F} - \theta_3\, Ta_i \frac{p_w}{p_F}. \qquad (26)$$

This response curve appears frequently in the transportation literature.
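The binomial logit response curve of Example E is easy to evaluate once the impedance difference between the two modes is formed. The sketch below does so under the assumption that the response curve has the standard form $P_1 = 1/(1 + e^{\Delta})$, with $\Delta$ the impedance of mode 1 minus that of mode 2. The parameter values and the auto and bus attributes in the usage section are hypothetical and are not estimates from the paper.

```python
import math


def impedance(cost, on_vehicle_time, access_time, p_w, p_f, theta1, theta2, theta3):
    """Observed impedance of a mode: cost in real terms plus wage-weighted times."""
    return (theta1 * cost / p_f
            + theta2 * on_vehicle_time * p_w / p_f
            + theta3 * access_time * p_w / p_f)


def binary_logit_p1(mode1, mode2, p_w, p_f, theta1, theta2, theta3):
    """Probability of choosing mode 1 from the impedance difference (binomial logit)."""
    diff = (impedance(*mode1, p_w, p_f, theta1, theta2, theta3)
            - impedance(*mode2, p_w, p_f, theta1, theta2, theta3))
    return 1.0 / (1.0 + math.exp(diff))


if __name__ == "__main__":
    # Hypothetical attributes: (cost, on-vehicle time, access time).
    auto = (1.50, 0.40, 0.05)
    bus = (0.50, 0.60, 0.25)
    params = dict(p_w=4.0, p_f=1.0, theta1=1.0, theta2=0.5, theta3=1.5)
    print(binary_logit_p1(auto, bus, **params))
    print(binary_logit_p1(bus, auto, **params))
```

Swapping the two modes returns the complementary probability, so the two printed values sum to one; equal impedances give a probability of one half.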
What is interesting here is not the fact that we are able to derive conventional response curves by a (non-unique) choice of functional forms in a behavioral model, but rather that arguments of the type we have outlined could be used to generate functional forms for practical response curves from detailed analysis of individual behavior.

There is an aspect of travel demand which has been left out of the above analysis, but which will play an important role in a comprehensive behavioral demand theory: the structure of decision making. Travel demand involves decisions along various dimensions such as mode, destination, and frequency, along with long-run decisions on auto ownership and location. If all these decisions are made jointly, the number of distinct alternatives can be immense, presenting a problem not only to the investigator but also to the individual faced with the decision. Studies of decision behavior suggest that the individual in this circumstance is likely to follow a ‘tree’ decision structure, for example, first choosing whether to go on a trip, then to what destination, and finally by what mode. Such a decision structure will normally involve recourse if a particular branch is infeasible, but will require only local optimization, with considerably less computation than would be involved in evaluating all alternatives. A successful behavioral theory should not only parallel the individual’s decision tree, but should exploit the separability of decisions implicit in this tree to make empirical analysis practical.

[Fig. 1. Decision tree: 1 auto, 3 bus, 4 rail.]

To illustrate the problem, suppose in example E that the set of alternatives is expanded to a mode choice between auto, bus and rail. The individual has the decision tree illustrated in fig. 1. This tree may correspond to a true joint decision between these three alternatives, represented as a binary bus-rail decision conditioned on transit choice, followed by an auto-transit decision based on the ‘weighted’ attributes of transit. Alternatively, it may represent a true recursive structure in which the auto-transit decision is made based on some ‘average’ perception of transit attributes, followed in the case that the transit leg is chosen by a decision among transit modes. In the first case, decisions can be viewed as being made moving down the tree; in the second case, moving up.

Assuming the unobserved term in the rail alternative has a Weibull distribution as in example E, eq. (18) provides the multiple choice probabilities in the case of a joint choice among the final alternatives. Letting $V_1$, $V_3$, $V_4$ denote the ‘representative’ utility of these three alternatives, we have

$$P_1 = \frac{e^{V_1}}{e^{V_1} + e^{V_3} + e^{V_4}} = \frac{e^{V_1}}{e^{V_1} + e^{V_2}}, \tag{27}$$

where $V_2$ is defined to satisfy $e^{V_2} = e^{V_3} + e^{V_4}$ and represents the ‘weighted’ utility of the transit alternative. The probability of bus conditioned on transit is

$$P_{3|34} = \frac{e^{V_3}}{e^{V_3} + e^{V_4}}. \tag{28}$$

On the other hand, an individual moving up the decision tree will use (28) to choose between 3 and 4 once decision point 2 is reached, but may use a different ‘weighting’ for $V_2$ in the formula

$$P_1 = \frac{e^{V_1}}{e^{V_1} + e^{V_2}}. \tag{29}$$

For example, the ‘averaging’ rule might be

$$V_2 = \max(V_3, V_4), \tag{30}$$

or

$$V_2 = V_3 P_{3|34} + V_4 P_{4|34}. \tag{31}$$

Both these rules will weigh the transit alternative less positively than the pure conditional logit weighting. The multiple choice models based on (30) and (31) are termed the ‘maximum’ and ‘cascade’ models, respectively. Although these models are plausible empirical alternatives to the conditional logit model, it should be noted that they are not derived from the utility maximization framework of example D.⁹
⁹ The consistency of decision tree models under separability assumptions on utility is discussed in detail in Domencich and McFadden (1974).

3. Empirical results¹⁰

We report here on the initial results obtained from a three-phase investigation
of patronage forecasting models for rail rapid transit, using data collected in the San Francisco Bay Area before and after the introduction of Bay Area Rapid Transit (BART). The BART system is one of the first totally new fixed rail transit systems built in the United States since the beginning of the century, and is unique in that it combines the advantages of subway-like operation in downtown areas with extensive service corridors in suburban areas. It is fully automated to achieve low running times and headways, and is designed to be competitive with the automobile in comfort. It is the prototype of a series of rapid transit projects under consideration in major American cities. Thus, there is potentially a great social return to refining patronage forecasting methods for such systems, and thereby enhancing the accuracy of the cost-benefit analyses on which design and construction decisions are made.
¹⁰ The empirical equations of this paper are revised upon the suggestion of discussants F.X. de Donnea and E. Sheshinski to incorporate the effects of income and after-tax wage (opportunity cost of travel time). A more extensive empirical analysis, including material contained in the previous version of this paper, is given in McFadden (1974).

The results below are based on a sample of 213 households living and working in BART ‘accession areas’; a detailed description of the sample is given in McFadden (1973b, ch. II), which is also the source of the following summary:

The Work Travel Study was undertaken to examine factors in the choice of travel mode to work among Bay Area residents prior to the opening of the new Bay Area Rapid Transit System. Since resources did not permit the interviewing of more than about 200 respondents, the study did not attempt a full geographic coverage of the Bay Area or a coverage of all types of commuting patterns. Rather, it focused on three considerations.
First, interviewing was confined to household residents of a Y-shaped area of Alameda and Contra Costa Counties, centering on the major industrial cities of Oakland and Berkeley and on the small city of Emeryville lying between them. It also encompassed surrounding suburban areas lying sufficiently close to the radiating BART lines to make commuting by BART into the central area a realistic possibility.

Second, interviewing was restricted to employed persons whose usual places of work were within the cities of Oakland, Berkeley, or Emeryville or across the bay in San Francisco or Daly City. This restriction was imposed in the belief that subsequent work travel on BART initiating within the study area would consist primarily of movements (a) within and between the core cities of Oakland, Berkeley and Emeryville; (b) into these core areas from surrounding suburban areas; and (c) from these areas to San Francisco or to the endpoint of the San Francisco BART line in Daly City.

Third, since persons living closest to BART stations seemed most likely to use the new system, the sample was disproportionately drawn from persons residing in census tracts containing BART stations or immediately adjacent to them (hereafter, these shall be called BART contiguous tracts). The remainder of the area was more lightly sampled. As a rough goal, the sample was to consist of approximately equal numbers of commuters residing in (a) BART contiguous tracts of the core cities, (b) other tracts of the core cities, (c) BART contiguous tracts of surrounding suburban areas, and (d) other tracts of the surrounding suburban areas.

While controlling approximate numbers in these four cells, the sample also was to be drawn in such a way as to permit the preparation of unbiased estimates of the characteristics of all household residents of the study area commuting to the designated cities.
Thus, respondents could not be chosen simply to meet a predesignated quota; rather, they had to be part of a carefully controlled probability sample. This was accomplished by dividing the study area into a number of carefully defined geographic strata and then by sampling each stratum by multistage area probability sampling methods. After the strata were designated, one or more census tracts were chosen from each stratum, with probability proportionate to the stratum’s number of housing units. One city block was then chosen from each sampled tract by the same method, a list was prepared of all housing units on each sampled block, and approximately equal numbers of housing units were then chosen from each block by systematic random sampling from the list. Thus, in each stratum all housing units had the same probability of selection. Although sampling ratios varied from stratum to stratum - that is, a larger proportion of households were chosen in some strata than in others to provide the desired numbers of commuters of each type called for by the design - estimates for the full study area could be prepared by appropriately weighting the stratum results.

The task of designing a sample to meet these goals was greatly complicated by the need to screen comparatively large numbers of households to locate persons commuting to work in the designated cities. Many suburban residents, of course, are employed in their own communities rather than in the central cities. In addition, many households - especially in the central cities - contain no employed persons but only those who are retired, unemployed, or supported by welfare. Data from a previous survey and from the 1970 Census were employed to estimate the total sample necessary to yield the desired number of cases of each type, but these provided only approximate guides, and during the course of the fieldwork it proved necessary to augment the original sample in order to obtain the desired numbers.
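The weighting of stratum results mentioned above can be sketched as follows, under the assumption that each sampled household is weighted by the inverse of its stratum's sampling ratio. The stratum names, housing-unit counts and responses are invented for illustration and do not come from the Work Travel Study.

```python
def stratum_weights(population_units, sampled_units):
    """Expansion weight per sampled unit in each stratum: N_h / n_h."""
    return {h: population_units[h] / sampled_units[h] for h in sampled_units}

def weighted_share(records, weights):
    """Weighted share of records whose flag is True.

    records is a list of (stratum, flag) pairs, one per respondent.
    """
    total = sum(weights[h] for h, _ in records)
    hits = sum(weights[h] for h, flag in records if flag)
    return hits / total

# Hypothetical design: the BART-contiguous core stratum is sampled more heavily.
POPULATION = {"core_contiguous": 1000, "suburb_other": 4000}
SAMPLED = {"core_contiguous": 50, "suburb_other": 50}

# Hypothetical responses: flag is True when the respondent commutes by bus.
records = ([("core_contiguous", i < 20) for i in range(50)]
           + [("suburb_other", i < 5) for i in range(50)])

weights = stratum_weights(POPULATION, SAMPLED)
unweighted = sum(flag for _, flag in records) / len(records)
weighted = weighted_share(records, weights)
print(unweighted, weighted)
```

In this invented design the unweighted bus share overstates the population share, because the heavily sampled stratum has more bus users; the weights undo the disproportionate sampling.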
A total of 710 occupied households was ultimately contacted to achieve a final sample of 213 interviews.

A reinterview of this sample, combined with a retrospective interview of a larger sample, will be carried out in 1975 to extend and validate the models considered in the current analysis. The survey data was augmented with careful calculations of travel time, costs, congestion, and related variables for existing auto and bus modes, and for projected BART service. The auto data was collected by F. Reid; the procedures are described in McFadden (1973b, ch. III). The bus data was collected by M. Johnson, as described in McFadden (1973b, ch. IV).

In order to limit the size of the investigation and to concentrate on the simplest and best understood travel behavior where the advantages and disadvantages of alternative models could be most easily detected, attention was confined to work trip behavior, specifically mode choice and timing of the commute trip. We report here only on the mode choice decision.

Of the 213 survey respondents, 160 used auto or bus commute modes (as opposed to walk, bicycle, etc.), had access to both modes, and had complete data on the major time and cost variables. This subsample formed the basis for the analysis. The following paragraphs point out some of the main demographic characteristics of the sample; there is no indication that the subsample utilized differs significantly except in the exclusion of Oakland-Berkeley respondents who walk or bicycle to work.

Demography of the sample. Table 1 summarizes some demographic proportions in the sample. The median income in the sample is $12,500, the average number of adults over sixteen is 2.23, the average age of the respondent is forty-one, the average number of household members employed is 1.6, and the average number of cars per worker is 1.29. These figures are generally comparable to census statistics for families with employed members.

Table 1
Demographic characteristics of the sample (sample size: 213).

Variable | Percent of sample
White | 17
Work in San Francisco | 26
One-family dwelling | 72
Male respondent | 65
Married | 69
Auto usual work mode | 78
Nuclear family | 72
Primary individual alone | 13
Has driver’s license | 90
Car available to family | 91
Respondent:
  Never uses bus | 26
  Health good or excellent | 95
  Physical handicap | 2
  Drives vehicle as part of work | 25
  Standard work week | 75
Respondent:
  Has second job | 6
  Flexible working days | 19
  Flexible working hours | 31
  Standard work period (within 6 a.m. - 6 p.m.) | 65
  Car pool used whenever car mode used | 33
  Expect could use BART | 69
  Plan to use BART regularly | 16

Binary logit response curves. Various forms of the binary logit model described in example E were estimated by maximum likelihood methods, described in McFadden (1973a). Table 2 gives estimates obtained for ‘standard’ specifications of the relative ‘impedance’ of the modes. In these models, the pure auto mode preference effect corresponds to a variable that is one for the auto alternative and zero for the bus alternatives; a positive coefficient indicates that when the remaining variables are zero, more than half the population will choose auto.

Table 2
Binary logit response curves; dependent variable: Auto-bus mode choice (zero if bus is usual or frequent mode, one otherwise); estimation method: Maximum likelihood on individual observations; sample size: 160; T-statistics in parentheses.

Independent variable | Model 1 | Model 2 | Model 3 | Model 4
Family income with ceiling of $10,000, in $ per year | 0.000065 (0.518) | 0.000064 (0.517) | 0.000095 (0.774) | 0.000074 (0.601)
Car-bus cost, in cents per round trip | -0.00920 (3.085) | -0.00915 (3.184) | -0.01022 (3.726) | -0.01165 (4.506)
Car-bus on-vehicle time times post-tax wage, in min. per 1-way x $ per hr. | -0.00858 (1.263) | -0.00852 (1.273) | -0.01479 (2.460) | -
Bus walk time times wage, in min. per 1-way x $ per hr. | -0.000092 (0.021) | -0.000080 (0.018) | - | -
Bus first wait time times wage, same units | -0.01713 (0.771) | - | - | -
Bus transfer wait time times wage, same units | -0.01902 (1.365) | - | - | -
Bus total wait time times wage, same units | - | -0.01838 (1.947) | - | -
Bus total access time times wage, same units | - | - | -0.00314 (0.818) | -
Bus total travel time times wage, same units | - | - | - | -0.00728 (2.480)
Pure auto mode preference effect (constant) | 0.1499 (0.165) | 0.1483 (0.163) | 0.3832 (0.428) | 0.5516 (0.561)
Likelihood ratio index | 0.30626 | 0.30623 | 0.2794 | 0.2633
R2 index | 0.92 | 0.93 | 0.66 | 0.61
Percent correctly predicted: Car | 85 | 85 | 84 | 83
Percent correctly predicted: Bus | 79 | 79 | 68 | 68
Value of time (percent of after-tax wage): On-vehicle | 32 | 28 | 43 | 45
Value of time (percent of after-tax wage): Wait | 56-62 | 60 | 9 |

Bus transfer wait time is calculated directly from transit schedules. Initial wait time is taken to be one-half the average headway on the initial carrier for the home to work and the work to home trips, with a ceiling of a fifteen-minute wait; this measure will be biased upward when commuters can follow transit schedules. Bus walk time is computed from the number of blocks walked at the origin and destination, assuming a walking time of two minutes per block. Models 1-4 ignore the possibility of auto access to transit even though twenty-two percent of the bus riders use auto access to bus. Thus, walk time may be a substantial overestimate of actual bus access time, particularly for suburban commuters where the ‘park-ride’ option is most common. This shortcoming of the empirical analysis may explain the unexpected insignificance of the coefficient of bus walk time in these models.¹¹
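For readers who want to see what maximum likelihood on individual observations involves, the following is a minimal sketch of a binary logit fit by Newton-Raphson, together with a likelihood ratio index taken here as one minus the ratio of the maximized log likelihood to the log likelihood with all coefficients zero. It is not the estimation program used for table 2: the data are synthetic and the "true" coefficients are hypothetical.

```python
import math
import random

def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood(beta, X, y):
    """Binary logit log likelihood; y is 1 for auto, 0 for bus."""
    ll = 0.0
    for xi, yi in zip(X, y):
        z = sum(b * v for b, v in zip(beta, xi))
        # log(1 + exp(z)) written so that large |z| cannot overflow
        ll += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return ll

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_logit(X, y, iterations=25):
    """Newton-Raphson iterations on the logit log likelihood, starting at zero."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = logistic(sum(b * v for b, v in zip(beta, xi)))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

# Synthetic data only: the coefficients below are hypothetical.
random.seed(0)
TRUE_BETA = (-0.01, -0.02, 0.2)   # cost difference, time difference, constant
X, y = [], []
for _ in range(400):
    xi = (random.uniform(-100, 100), random.uniform(-30, 30), 1.0)
    X.append(xi)
    y.append(1 if random.random() < logistic(sum(b * v for b, v in zip(TRUE_BETA, xi))) else 0)

beta_hat = fit_logit(X, y)
ll_hat = log_likelihood(beta_hat, X, y)
ll_zero = log_likelihood([0.0] * 3, X, y)
rho2 = 1.0 - ll_hat / ll_zero   # likelihood ratio index
print(beta_hat, rho2)
```

The constant term plays the role of the pure auto mode preference effect: it is the coefficient on a variable that is one for every observation in this auto-minus-bus formulation.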
The likelihood ratio index and R2 index reported in table 2 are measures of goodness of fit discussed in McFadden (1973a). The R2 index is similar to the multiple correlation coefficient in ordinary least squares; the likelihood ratio index is a more stable and statistically satisfactory measure for the estimation method used.

The models of table 2 all give coefficients of expected sign. With the exception of bus walk time, the implied valuations of time agree with previous estimates [Quarmby (1967), Thomas (1971)]; at the sample mean after-tax wage of $3.87 per hour, the value of on-vehicle time is $1.23 per hour and the value of wait time is $2.32 per hour. However, because of the low precision of the estimates of the travel time coefficients, we cannot reject at the ten percent level the hypothesis that all components of travel time are weighed equally.

Weighting of time and cost components. The specification of the models in table 2 can be tested against alternative hypotheses that different travel time and cost variables and other factors have distinguishable effects on behavior. Of particular interest are the questions of whether mode attributes can be measured generically using conventional time and cost variables, and whether components of time and cost are weighted equally. We summarize the conclusions; the estimates on which they are based are given in McFadden (1974).

(a) We accept at the ten percent level the hypothesis that auto and bus on-vehicle times are weighed the same. The power of the test is low, and the point estimates imply an average premium of $0.88 per hour in the weight attached to bus travel. This premium could reflect the reduced comfort and privacy of bus transit which are not measured directly.
¹¹ A conditional logit analysis treating bus with walk and ride access as distinct alternatives is reported in McFadden (1974).

(b) We accept at the ten percent level the hypotheses that no weight is given to schedule delay (defined as the average of the waiting times at the workplace before the job begins and after the job ends which are required to fit bus schedules), number of transfers, or auto time spent in driving on freeways at less than twenty miles per hour. The power of the tests is again low. The estimates provide speculative, but plausible, values of $3.33 per hour for schedule delay, 15.5 cents per transfer, and a premium of $2.13 per hour on auto time spent driving under congested conditions.

(c) We accept at the ten percent level the hypothesis that the value of travel time is linear homogeneous in the after-tax wage rate. The estimates suggest, however, that value of time may be an increasing function of the wage rate. This conclusion, if substantiated, would be consistent with hours-worked decisions more closely approximating the neoclassical labor-leisure margin at higher wage levels, or with imperfect correlation of measured and effective wages in labor markets segmented by wage rate.¹²

(d) We accept at the ten percent level the hypotheses of equal weighting of total auto costs and total bus costs, and of auto mileage, tolls, parking, and maintenance costs. The estimates suggest that mileage and maintenance costs may not be weighed as heavily as tolls and parking costs; however, the precision of these estimates is quite low.

Because of the small sample size, none of the tests above are conclusive, and should be taken only as suggestions for further research.

Inventory of possible explanatory variables.
In order to make an inventory of the large number of additional variables which might influence mode choice, we posed the question of whether the ‘unexplained residual’ from the binary logit model was correlated with these variables. This was done by calculating transformed residuals from the logit estimating equation, and correlating these residuals with the list of candidate explanatory variables. This method was devised for the binary logit case by Cox (1970); an essentially equivalent multinomial transformation described in McFadden (1973a) was used in the present analysis. The residuals are derived from Model 1. They are distributed with zero mean and unit variance if Model 1 is correct, and in this analysis are positive when bus is chosen, negative otherwise. (Hence, a positive correlation indicates high values of the explanatory variable are associated with increased bus use.) Table 3 is a selected list of variables correlated with the residuals; those significant at the five percent level are candidates for inclusion in further estimation. It should be noted that some of the significant correlations are with variables which we would expect to be jointly determined with mode choice rather than predetermined at the point the mode choice decision is made. The behavioral model should be expanded to include a theory of this simultaneous choice.
¹² This conclusion is based on unpublished research by Luke Chan, University of California, Berkeley.

Table 3
Correlations of unexplained residuals in binary logit analysis with candidate explanatory variables.

Variable | Correlation
‘Important to live close to public transport’ | 0.48 a
Does not have regular use of a car | 0.44 a
Number of cars in household | -0.34 a
Respondent does not drive | 0.33 a
Index of population density on street | 0.30 a
Distance to parking at home | 0.27 a
No car required in work | 0.23 a
‘Enjoy riding distances with family’ | 0.23 a
Length of residence in community | -0.23 a
Plans to use BART | -0.20 b
Adjusts travel time to traffic conditions | -0.22 b
Owns home | -0.22 b
Number of rooms in house | -0.22 a
Multiple-family dwelling unit | 0.21 b
Number of drivers in household | -0.20 b
Number of minutes can arrive late at work | -0.20 b
Expect to stay in present location for 2 years | -0.19 b
Minutes leeway allowed for emergencies | -0.18 b
‘I become angry in traffic jams’ | 0.18 b
Mixed residential/commercial street | 0.18 b
‘Bus drivers are polite’ | 0.18 b
‘Enjoy freeway driving in traffic’ | -0.16
‘Buses smell of fumes’ | -0.15
Respondent’s age | -0.13
‘I can read or study on the bus’ | 0.14
Amount varies time leaving work | -0.11
Female respondent | 0.11
‘I am lucky with parking’ | -0.12
‘People buy cars that are too big’ | 0.13
‘Fast freeway driving makes me nervous’ | 0.12
Distance respondent is willing to walk | 0.13
Number of weekend days worked | 0.10
Workplace in CBD | -0.01
Work trip in peak | -0.09
Workplace in San Francisco | -0.01
Non-white respondent | 0.08
‘Cars are no better than bus in current traffic’ | 0.10
‘A car is the ultimate convenience’ | -0.09
Years of education | 0.01
‘Poor bus service is a problem’ | -0.06
‘Protection from crime is a problem’ | -0.07
Distance to work | 0.02
Non-standard working hours | -0.02
Number of household members employed | -0.05
Marital status of respondent | 0.00

a Significant at 1% level.
b Significant at 5% level.

A number of correlations in table 3 deserve comment. First, there are variables, such as (the respondent does not drive), which indicate whether the respondent has access to the auto mode.
The model should clearly either screen out individuals with these atypical choice sets or include explanatory variables identifying these cases. There is the danger that some variables of this type are simultaneously determined by mode choice; to avoid the statistical problems associated with simultaneity, instrumental variables methods may be required. Second, variables such as (number of cars in household) tend to be correlated due to the joint determination of auto ownership and mode choice. We report elsewhere on estimation of the simultaneous auto ownership and mode choice decisions using instrumental variables methods within the binary logit framework [McFadden (1974)]. Third, variables such as (distance to parking at home) and (no car required in work) represent legitimate explanatory factors that appear to reflect attributes of modes not captured in the summary time and cost measures. Fourth, variables such as (length of residence in community) and (owns home) reflect socioeconomic factors which appear to influence the distribution of tastes. Fifth, variables such as (index of population density on street), (‘important to live close to public transit’), and to some extent (number of rooms in house), (owns home), etc. are all related to the location decision, which in turn may be made jointly with the mode choice. These correlations suggest that there is a significant relationship between these decisions. If individuals with pro-bus tastes or relatively low valuations of time locate where bus impedance is relatively low, and vice versa, and Model 1 is estimated without taking this effect into account, then the steepness of the estimated response curve is exaggerated, and one may forecast too high an incremental response to a policy change. Sixth, a few attitude variables are significant: (‘enjoy riding distances with family’), (‘I become angry in traffic jams’), and (‘bus drivers are polite’).
These may reflect a causal effect of attitudes on tastes and behavior, or alternatively may themselves be jointly determined with mode choice by more basic explanatory factors. The interest in attitude variables from the standpoint of transport policy analysis lies in the question of whether planners can influence behavior by campaigns to modify attitudes. A demand model with explanatory attitude variables is not useful in answering this question unless the mechanism for the action of public relations programs on these attitudes can be discovered. In the latter case, one may well be able to bypass the measurement of attitudes entirely, and concentrate directly on the relation between publicity campaigns and mode choice behavior. Alternatively, one may wish to develop models of the simultaneous processes of attitude formation and modification of travel behavior. Neither of these alternatives suggests that it is particularly useful to estimate travel demand models treating attitudes as pure explanatory variables. The current inventory of attitude items indicates that with the exception of (‘bus drivers are polite’), there is little relation between behavior and the attitudes that might be influenced by a campaign publicizing the attributes of transit.

Travel demand forecasts. The binary logit response curves estimated in table 2 provide a basis for predicting or forecasting individual mode choice, both for the existing auto-bus alternatives and for the auto-bus-rail alternatives available after BART is fully operational. Further, by inference from the sample to the population from which it is drawn, one can forecast aggregate modal split. Suppose we have a sample that is representative of the population and a logit model such as Model 1 estimated either from the sample or from external sources.
Then, the predicted probability for any individual in the sample is a best estimate of the distribution of responses in the population by those individuals facing the same environments. Since the sample represents a (weighted) random selection of the environments faced by the population as a whole, the (inversely weighted) average of the predicted probabilities over the sample is a best estimate of aggregate demand.¹³ The influence of transport policy on aggregate demand can then be assessed by computing its effect on the sample average. It should be noted that this procedure provides a more accurate measure of demand elasticities than can be obtained by the conventional method of computing the elasticity of the response curve at the mean of the independent variables: Aggregate demand is the average of the response curve weighted by the distribution of the independent variable. If a substantial proportion of the population faces relative impedances which are sufficiently extreme to elicit almost certain mode choices in one direction or the other, then a small change in the impedance of one of the modes will still leave the relative impedance for this proportion of the population sufficiently extreme to almost certainly determine mode choice. As a result, the response of aggregate demand to this impedance change will be low, and will bear no systematic relation to the elasticity of the response curve at the data mean.

Table 4 presents computations of the aggregate modal split (observed weighted sample frequency) for the auto-bus choice, and the elasticity of these aggregate demands with respect to changes in the explanatory variables. The elasticity values are relatively low, as is normally expected for short-run travel demand. They suggest that the most effective way to increase bus patronage is to increase auto costs, say, by introducing parking or gasoline taxes.
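The contrast between averaging the response curve over the sample and evaluating it at the mean can be illustrated with a small sketch. The two invented populations below have the same mean relative impedance and the same aggregate bus share, but one is concentrated at the mean and the other is split between two extremes; the numbers are hypothetical and only the qualitative comparison matters.

```python
import math

def p_bus(impedance_diff):
    """Binary logit probability of bus, given bus-minus-auto impedance."""
    return 1.0 / (1.0 + math.exp(impedance_diff))

def aggregate_share(diffs, weights):
    """Weighted average of individual probabilities over the sample."""
    total = sum(weights)
    return sum(w * p_bus(d) for d, w in zip(diffs, weights)) / total

def aggregate_response(diffs, weights, shift):
    """Change in aggregate bus share when every bus impedance rises by `shift`."""
    return (aggregate_share([d + shift for d in diffs], weights)
            - aggregate_share(diffs, weights))

# Hypothetical samples with equal weights and the same mean impedance (zero).
at_mean = [0.0, 0.0, 0.0, 0.0]
dispersed = [-6.0, -6.0, 6.0, 6.0]
weights = [1.0, 1.0, 1.0, 1.0]

share_at_mean = aggregate_share(at_mean, weights)
share_dispersed = aggregate_share(dispersed, weights)
response_at_mean = aggregate_response(at_mean, weights, 0.1)
response_dispersed = aggregate_response(dispersed, weights, 0.1)
print(share_at_mean, share_dispersed, response_at_mean, response_dispersed)
```

Both samples start from a bus share of one half, yet the dispersed sample responds far less to the same impedance change, which is the point made in the text about elasticities computed at the data mean.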
A ten percent reduction in bus fares or in running times would each yield a patronage increase of approximately five percent.
¹³ An alternative method of computing aggregate demands is to specify a distribution of the independent variables in the population and compute analytically or numerically the expectation of the response curve with respect to this distribution. This can be done particularly conveniently in the case of binary probit analysis: If the independent variables x are normally distributed with mean $\mu$ and covariance matrix A, and the probit response curve is $P = \Phi(\theta'x)$, then aggregate demand is given by $D = \Phi\{\theta'\mu/\sqrt{1 + \theta' A \theta}\}$. This demand again has the property that the more disperse the distribution of the independent variables, the lower the demand elasticities.

Table 4
Estimated auto-bus patronage and demand elasticities from Model 1; population: East Bay residents who commute to work in Oakland, Berkeley, Emeryville, San Francisco, or Daly City.a

 | Car demand | Bus demand
Patronage (morning commute) b | 69,488 | 23,045
Modal split c | 75.1% | 24.9%
Demand elasticity with respect to: d
  Income (with a ceiling of $10,000) | 0.09 | -0.28
  Car cost | -0.32 | 0.97
  Car on-vehicle time | -0.13 | 0.39
  Bus cost | 0.15 | -0.45
  Bus on-vehicle time | 0.15 | -0.46
  Bus walk time | 0.00 | 0.00
  Bus first wait time | 0.06 | -0.17
  Bus transfer wait time | 0.09 | -0.26

a The calibration sample weighted (by strata) to represent this population.
b Bus demand by regular or frequent users.
c The unweighted sample modal split is 75.6 percent car, 24.4 percent bus.
d The demand elasticities are computed from predicted patronage, calculated from the weighted sample.

We next turn to the question of forecasting demand for a new mode, BART. Using engineering forecasts of BART service levels made in July 1972, and taking the calibration of Model 1 to provide the appropriate weights for a generic characterization of the BART alternative, we have used the conditional logit model given in eq. (27) and (28) to compute aggregate demand forecasts for our sample. These results are preliminary due to the preliminary nature of the BART service level measurements. The conditional logit model has the ‘independence of irrelevant alternatives’ property discussed in McFadden (1973a) which may bias upward the sum of the predicted probabilities of two alternatives whose unobserved attributes are not perceived by decision-makers as independent. Since this may be the case for the two public transit modes, we also considered the ‘cascade’ and ‘maximum’ models, which view the individual as first making a choice between auto and transit, and then choosing between bus and BART if transit is selected. The results are given in table 5.

The modal splits given by these models can be compared to the sixteen percent of the sample who indicated that they planned to use BART. Since the BART system is not in full operation and actual patronage counts are not recorded by trip purpose, it is difficult to compare these forecasts with current patronage figures. In October 1973, without trans-bay service, BART averaged 9,762 daily ‘commute’ round trips in the area for which our population is defined. Since twenty-six percent of our population works in San Francisco and does not yet have the BART alternative, a crude calculation taking seventy-four percent of the conditional logit patronage forecast yields a daily forecast of 9,658. The figures of 9,762 and 9,658 are only crudely comparable since the BART actual patronage figure excludes non-peak work commutes and includes peak non-work trips, and no adjustment has been made in our forecasts for changes in the independent variables between July 1972 and October 1973, changes in population size and number of workers, or inaccuracies in weighting the sample to obtain population figures.
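The three forecasting rules differ only in how the bus and BART utilities are combined into a single transit utility before the auto-transit choice. The sketch below, a simplified illustration rather than the procedure used for table 5, aggregates individual probabilities over a weighted sample under each rule; the sample utilities and weights are hypothetical. The last lines reproduce the crude comparison in the text, taking seventy-four percent of the conditional logit BART patronage of 13,051 reported in table 5.

```python
import math

def transit_utility(v_bus, v_rail, rule):
    """Composite transit utility V2 under the three weighting rules."""
    p_bus = math.exp(v_bus) / (math.exp(v_bus) + math.exp(v_rail))  # eq. (28)
    if rule == "conditional_logit":
        return math.log(math.exp(v_bus) + math.exp(v_rail))
    if rule == "maximum":
        return max(v_bus, v_rail)                                   # eq. (30)
    if rule == "cascade":
        return v_bus * p_bus + v_rail * (1.0 - p_bus)               # eq. (31)
    raise ValueError(rule)

def mode_probabilities(v_auto, v_bus, v_rail, rule):
    """Return (P_auto, P_bus, P_rail) for one individual."""
    v2 = transit_utility(v_bus, v_rail, rule)
    p_auto = math.exp(v_auto) / (math.exp(v_auto) + math.exp(v2))   # eq. (29)
    p_bus_given_transit = math.exp(v_bus) / (math.exp(v_bus) + math.exp(v_rail))
    p_bus = (1.0 - p_auto) * p_bus_given_transit
    return p_auto, p_bus, 1.0 - p_auto - p_bus

def aggregate_split(sample, rule):
    """Weighted average of individual probabilities over the sample."""
    total_w = sum(w for w, *_ in sample)
    shares = [0.0, 0.0, 0.0]
    for w, v_auto, v_bus, v_rail in sample:
        for j, p in enumerate(mode_probabilities(v_auto, v_bus, v_rail, rule)):
            shares[j] += w * p / total_w
    return shares

# Hypothetical sample: (weight, V_auto, V_bus, V_rail) for each respondent.
SAMPLE = [(400.0, -1.0, -2.0, -1.8), (350.0, -0.5, -2.5, -2.6),
          (250.0, -2.0, -1.5, -1.0)]

splits = {rule: aggregate_split(SAMPLE, rule)
          for rule in ("conditional_logit", "maximum", "cascade")}

# Crude comparison from the text: 74 percent of the conditional logit forecast.
crude_forecast = round(0.74 * 13051)
print(splits, crude_forecast)
```

Because the log-sum of the two transit utilities is never smaller than their maximum, which in turn is never smaller than their probability-weighted average, the conditional logit rule always gives the lowest auto share of the three, in line with the ordering of the car shares in table 5.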
BART transit district forecasts for full system operation are substantially higher than those predicted by the conditional logit model, and the weight of the biases in the preceding calculation also suggests that the conditional logit forecasts may be too low.

Table 5. Modal split forecasts for car-bus-BART mode choice from Model 1. Assumptions: (1) BART running times and fares are set at the engineering specifications of July 1972, (2) car and bus running times and fares are unchanged from July 1972, (3) home to BART access is by car (park-ride), (4) trip 'generation' and 'distribution' are unchanged.(a) Population: East Bay residents who commute to work in Oakland, Berkeley, Emeryville, San Francisco, or Daly City.

                            Car       Bus       BART      Total transit   BART given transit
Conditional logit model
  Patronage                 61,110    18,371    13,051    31,422
  Modal split               66.0%     19.9%     14.1%     34.0%           41.5%
Cascade model
  Patronage                 67,495    14,911    10,126    25,037
  Modal split               72.9%     16.1%     10.9%     27.0%           40.4%
Maximum model
  Patronage                 66,067    15,740    10,724    26,464
  Modal split               71.4%     17.0%     11.6%     28.6%           40.4%

(a) The simultaneous estimation of modal split, distribution, and generation in a consistent behavioral model is discussed in Domencich and McFadden (1974); no attempt has been made to analyze generation and distribution in this study.

In the same manner as for the binary auto-bus mode split, we can compute the elasticity of the forecast aggregate demands with respect to the explanatory variables. This is done in table 6 for the conditional logit model. The elasticity of BART demand with respect to auto cost is relatively high, suggesting that policy measures such as increasing tolls or parking taxes will have a substantial effect on BART patronage. The elasticity of BART patronage with respect to BART on-vehicle time is also relatively high, indicating that maintenance of the engineering forecasts of running times is an important factor in retaining patronage.
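To make the policy reading of these elasticities concrete, the sketch below applies two of the Table 6 values to a ten percent change. It is illustrative only: the helper function and variable names are hypothetical, and the linear rule (percentage change in demand equals elasticity times percentage change in the variable) is a first-order approximation assumed here. The revenue figure quoted in the text (1.4 percent) matches the first-order sum; compounding the fare and patronage changes gives a smaller number.

```python
def pct_change_in_demand(elasticity, pct_change_in_variable):
    """First-order approximation: %dQ = elasticity * %dx (an assumption of this sketch)."""
    return elasticity * pct_change_in_variable

# Elasticities of BART demand taken from Table 6 (conditional logit model).
bart_wrt_car_cost = 0.82
bart_wrt_bart_fare = -0.86

# A ten percent rise in car cost (for example through tolls or parking taxes).
bart_gain_pct = pct_change_in_demand(bart_wrt_car_cost, 10.0)

# A ten percent rise in BART fares.
fare_up_pct = 10.0
patronage_pct = pct_change_in_demand(bart_wrt_bart_fare, fare_up_pct)

# Revenue = fare * patronage, so to first order the percentage changes add.
revenue_first_order_pct = fare_up_pct + patronage_pct
# Compounding the two changes instead of adding them.
revenue_compound_pct = ((1 + fare_up_pct / 100) * (1 + patronage_pct / 100) - 1) * 100

print(f"BART patronage after +10% car cost: {bart_gain_pct:+.1f}%")
print(f"BART patronage after +10% fares:    {patronage_pct:+.1f}%")
print(f"Revenue change, first order:        {revenue_first_order_pct:+.1f}%")
print(f"Revenue change, compounded:         {revenue_compound_pct:+.2f}%")
```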
The elasticity of BART demand with respect to BART fares is almost one in this short-run model, indicating that a ten percent increase in fares would increase revenue by only 1.4 percent.

Table 6. Demand elasticities for car-bus-BART mode choice; Model 1, conditional logit model. Assumptions and conditions of table 5.

Elasticity with respect to:             Car demand    Bus demand    BART demand
  Income (with a ceiling of $10,000)     0.15         -0.25         -0.29
  Car cost                              -0.47          0.81          0.82
  Car on-vehicle time                   -0.22          0.36          0.41
  Bus cost                               0.12         -0.58          0.28
  Bus on-vehicle time                    0.14         -0.60          0.23
  Bus walk time                          0.00          0.00          0.00
  Bus first wait time                    0.05         -0.19          0.06
  Bus transfer wait time                 0.07         -0.29          0.09
  BART cost                              0.13          0.25         -0.86
  BART on-vehicle time                   0.10          0.13         -0.60
  BART walk time                         0.00          0.00          0.00
  BART first wait time                   0.02          0.03         -0.12
  BART transfer wait time                0.11          0.16         -0.66

4. Conclusions

The reader is cautioned that, as in any pilot study, the results reported above are tentative and may not hold up under further investigation. Further, because of the specialized nature of the sample, particular care should be exercised in drawing inferences on aggregate behavior of the Bay Area population. Taken in sum, the results appear to be generally internally consistent, and consistent with the existing literature and folklore on travel demand. The behavioral methods outlined in this paper for the measurement of travel demand appear to open the possibility of analyzing hitherto unexplored aspects of the subject, with the hoped for consequence of refining the calculation of benefits of transport projects, and thus improving the quality of urban transportation planning.

References

Ben-Akiva, M., 1972, Structure of travel demand models (Transportation Systems Division, Department of Civil Engineering, M.I.T., Cambridge, Mass.) unpublished.
Brand, D., 1972, The state of the art of travel demand forecasting: A critical review (Harvard University, Cambridge, Mass.) unpublished.
Cox, D., 1970, Analysis of binary data (Methuen, London).
Domencich, T. and D. McFadden, 1974, Urban travel demand: A behavioral analysis (Charles River Associates, North-Holland, Amsterdam).
Dupuit, J., 1844, On the measurement of the utility of public works, Annales des Ponts et Chaussées, 2nd ser. 8.
Lisco, T., 1967, The value of commuters' travel time - A study in urban transportation, dissertation (University of Chicago, Chicago, Ill.).
McFadden, D., 1968, The revealed preferences of a government bureaucracy (Department of Economics, University of California, Berkeley, Calif.) unpublished.
McFadden, D., 1973a, Conditional logit analysis of qualitative choice behavior, in: P. Zarembka, ed., Frontiers in econometrics (Academic Press, New York).
McFadden, D., 1973b, Travel demand forecasting study, BART Impact Study Final Report Series (Institute of Urban and Regional Development, University of California, Berkeley, Calif.) unpublished.
McFadden, D., 1974, The measurement of urban travel demand, II (Department of Economics, University of California, Berkeley, Calif.) unpublished.
McFadden, D. and F. Reid, 1974, Aggregate travel demand forecasting from disaggregated behavioral models (Department of Economics, University of California, Berkeley, Calif.) unpublished.
Meyer, J., J. Kain and M. Wohl, 1966, The urban transportation problem (Harvard University Press, Cambridge, Mass.).
Oi, W. and P. Shuldiner, 1962, An analysis of urban travel demands (Northwestern University Press, Evanston, Ill.).
Quandt, R. and W. Baumol, 1966, The demand for abstract transport modes: Theory and measurement, Journal of Regional Science 6, 13-26.
Quarmby, G., 1967, Choice of travel mode for the journey to work: Some findings, Journal of Transport Economics and Policy 1.
Stopher, P. and T. Lisco, 1970, Modelling travel demand: A disaggregate behavioral approach - Issues and applications, Transportation Research Forum Proc., 195-214.
Thomas, T. and G. Thompson, 1971, Value of time saved by trip purpose, Highway Research Record 369, 104-113.
Warner, S., 1962, Stochastic choice of mode in urban travel: A study in binary choice (Northwestern University Press, Evanston, Ill.).

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS

Measuring Regularity of Individual Travel Patterns

Gabriel Goulet-Langlois, Haris N. Koutsopoulos, Zhan Zhao, and Jinhua Zhao

Abstract: Regularity is an important property of individual travel behavior, and the ability to measure it enables advances in behavior modeling, mobility prediction, and customer analytics. In this paper, we propose a methodology to measure travel behavior regularity based on the order in which trips or activities are organized. We represent individuals' travel over multiple days as sequences of "travel events": discrete and repeatable behavior units explicitly defined based on the research question and the available data. We then present a metric of regularity based on entropy rate, which is sensitive to both the frequency of travel events and the order in which they occur. The methodology is demonstrated using a large sample of pseudonymised transit smart card transaction records from London, U.K. The entropy rate is estimated with a procedure based on the Burrows-Wheeler transform. The results confirm that the order of travel events is an essential component of regularity in travel behavior. They also demonstrate that the proposed measure of regularity captures both conventional patterns and atypical routine patterns that are regular but not matched to the 9-to-5 working day or working week.
Unlike existing measures of regularity, our approach is agnostic to calendar definitions and makes no assumptions regarding periodicity of travel behavior. The proposed methodology is flexible and can be adapted to study other aspects of individual mobility using different data sources.

Index Terms: Regularity, intrapersonal variability, travel behavior, smart card data, entropy rate.

Manuscript received April 2, 2017; revised July 11, 2017; accepted July 16, 2017. This work was supported by Transport for London. The Associate Editor for this paper was H. S. Mahmassani. (Corresponding author: Jinhua Zhao.) G. Goulet-Langlois was with the Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139 USA. He is now with Transport for London, London SW7 2NJ, U.K. H. N. Koutsopoulos is with the Department of Civil and Environmental Engineering, Northeastern University, Boston, MA 02115 USA. Z. Zhao is with the Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139 USA. J. Zhao is with the Department of Urban Studies and Planning, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: jinhua@mit.edu). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TITS.2017.2728704

I. INTRODUCTION

Travel behavior is dynamic and varies across individuals but also for the same person over time. Interpersonal variability refers to the heterogeneous spatiotemporal preferences of people, reflecting different sociodemographic attributes, home/work locations, and lifestyle preferences [26]. Intrapersonal variability describes longitudinal variability in the characteristics of the same individual's travel behavior from trip to trip, day to day, or week to week [13], [26], [31]. Sometimes it is referred to in the literature as intraindividual [15], or day-to-day variability [17], [21], [24]. Regularity refers to the extent to which individual travel behaviors repeat over time. A person's activity choices and their associated trips are not made randomly. According to activity-based travel theory, they are dictated by preferences, constraints, and needs which recur over time to some degree [20].

While conventional cross-sectional data, one-day travel diary surveys for example, can capture the interpersonal variability, measuring intrapersonal variability/regularity requires individual-level longitudinal data. Multi-day travel surveys, often used for activity-based modeling, provide such data but are costly to collect and hence usually constrained to small sample sizes and short observation periods. However, advances in urban sensing technologies afford the opportunity to collect traces of individual mobility on a large scale and over extended periods of time. New mobility data sources, such as mobile phone records and transit smart card records, enable detailed and reliable measurement of travel regularity. No existing definition and measure of behavior regularity align with the variety in people's routines and granularity which these new data sources can capture.

Central to the definition of regularity is the definition of a unit of analysis for which repetition is considered. This unit should be chosen in line with the attributes relevant to the research question of interest and consistent with the resolution of the available sensor data. Reference [15] uses the term behaviors to describe components of travel behavior characterized by combinations of attributes, for example "driving a car to work". In this paper, we use the term "travel events" to refer to the same concept as the behaviors of [15], but with a broader connotation. A travel event is a repeatable unit describing individual travel behavior, characterized by one or more attributes such as purpose, location, and duration.
At the most basic level, a travel event is either a trip or an activity. Travel events can also be aggregated to different levels (e.g. daily or weekly) to form higher-level travel events. For example, for the analysis of individual daily routines, a travel event may be a combination of activities in one day. In this paper, if not specified otherwise, "travel events" are used to refer to the most basic building blocks of travel behavior: trips and activities.

Travel events do not occur in isolation. People's activity patterns govern the co-occurrence of multiple travel events. This is the basis of work on trip chaining behavior, e.g. [27], and activity-based models, e.g. [4]. Combinations of travel events reflect such activity patterns. Each event must be considered as part of this context. While some travel events are frequently repeated over time, their surrounding contexts may change from day to day [15]. This highlights that regularity depends not only on variability in the characteristics of a single event but also on the pattern in which multiple events are combined. In our approach, multiple travel events can be ordered over time and form "travel sequences".

1524-9050 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

In existing literature, some methods have been proposed to measure regularity by examining the periodic patterns of travel behavior [19], [32], [35]. However, periodicity is not equivalent to regularity. While periodicity only captures the cyclic repetitions of travel events at fixed time intervals (typically set as a day or a week), regularity refers to all forms of repetitions.
Travel patterns may not necessarily repeat periodically or may repeat over unconventional periods not aligned with the typical day or week. To some extent, periodicity is a special type of regularity. The order in which an individual completes trips and activities is an integral component of the structure in their travel routines. A good metric of regularity should be sensitive to such sequential dependency in a travel sequence, without a predefined periodic cycle.

In this paper, we propose a new approach to measuring the regularity of travel behavior based on the order in which travel events are organized over time in travel sequences. The definition is not tied to an underlying calendar. Hence it is flexible. We demonstrate the approach using a large sample of transit smart card transaction records over a period of a month. The ability to measure regularity improves our understanding of travel behavior, facilitates advancements in behavior modeling, and enables the development of customer analytics for travel prediction, user segmentation, and targeted demand management.

The remainder of the paper is organized as follows. We present a literature review of the related work on intrapersonal variability/regularity in Section II. Section III proposes a sequential representation of travel behavior and develops its mathematical formulation. This is followed by a description of the proposed measure of regularity based on entropy rate in Section IV. The measure is demonstrated in Section V using smart card data from London, U.K. The paper is concluded with a discussion of future research directions and potential implications in Section VI, and a summary of the main findings in Section VII.

II. LITERATURE REVIEW

While the concept of travel behavior regularity is recognized as a critical dimension of travel behavior, approaches to measure such variability remain limited in scope.
Specifically, many studies measure regularity based only on the extent to which single travel events are repeated, without consideration for how multiple events are combined. Some methods focus only on the relative frequency of trips. For example, [5] proposed a spatial repetition index corresponding to the percentage of activity locations which are visited more than once over a 7 day period. Based on survey data, this measure is computed for different time periods to evaluate the spatial stability of individual activity patterns at different times of the week. Based on smart card data, [18] identified the OD pairs that the card holder frequently travels as “regular OD” and the time of the trips between these regular ODs as “habitual time”. They measured the regularity of transit users based on the percentage of a user’s trips completed within habitual times and between regular ODs. Reference [23], using smart card data, evaluated the level of spatial and temporal variability of different users based on the frequency of trips made to different stops at different times of the day. Other studies rely on the variance of different measures to quantify longitudinal variability. References [25] and [26] evaluated the variance in number of trips per day from a 7-day travel survey. Their results differentiated the part of the variance of trip generation rates associated with intrapersonal variability from the part associated with interpersonal variability. Reference [21] analyzed variability in the departure time of the first trip of the day. Relying on the concept of individual space-time prisms, they modeled the variance of first departure time so as to differentiate the part of the variance due to randomness, from the part due to changes in the time constraints dictating an individual’s schedule. 
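As an illustration of the frequency-based measures described above, the spatial repetition index attributed to [5] can be sketched as the share of distinct activity locations visited more than once in the observation window. This is a minimal reading of the one-sentence description given here, not the authors' implementation; the function name and the sample week are hypothetical.

```python
from collections import Counter

def spatial_repetition_index(locations):
    """Percentage of distinct locations that are visited more than once.

    `locations` lists the location of every activity observed in the window
    (for example a 7-day period), in any order.
    """
    visits = Counter(locations)
    if not visits:
        return 0.0
    repeated = sum(1 for n in visits.values() if n > 1)
    return 100.0 * repeated / len(visits)

# Hypothetical week of activity locations for one respondent.
week = ["home", "work", "home", "gym", "work", "cafe"]
print(spatial_repetition_index(week))  # 2 of the 4 distinct locations are repeated
```

Because the index only counts visits, reordering the list leaves it unchanged, which is exactly the limitation of frequency-based measures that the review points out.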
Similarly, [7] also attempted to dissect the variance of the first trip departure time by formulating a multilevel model for which the variance was decomposed into five parts: inter-individual variation, inter-household variation, spatial variation, temporal variation, and intra-individual variation. Like the frequency-based measures, these variance-based measures treat each trip independently and are not concerned with the sequence of multiple trips.

Accounting for combinations of travel events has long been recognized in the literature of travel behavior modeling as important. Some models rely on the assumption that activity and trip combinations are primarily a function of days of the week. For example, using the 7-day Toronto Travel Activity Panel Survey, [12] modeled the frequency of 15 non-home/work activity categories for the 7 days of the week using 7 independent models. In contrast, some studies model the relationship between different travel events more explicitly. Reference [29] modeled preplanned and spontaneous activity duration as well as number of trips by mode, using data from the 7-day activity survey in [12]. Their approach introduces same-day effects and next-day effects to capture the relationship between multiple activities. From a long-term perspective, [3] examined the relationship between successive activities for the same purpose (e.g. shopping) using a 6-week travel survey from Karlsruhe, Germany. They modeled the time elapsed between successive activities using a multivariate hazard model. Other studies used pattern recognition techniques to directly model the activity sequence as a whole, and such techniques include Walsh-Hadamard transformation [28], sequence alignment [16], and conditional random field [2]. These studies account, to various degrees, for the relationship between travel events to improve travel demand models. They use panel survey data and do not aim at measuring regularity in the order of travel events over time.
To measure regularity in combinations of travel events, many researchers, especially in the human mobility literature, proposed methods to uncover periodic patterns. Some studies use the Fourier transform to identify underlying periods of repetition in travel from digital traces of location collected over multiple weeks. Reference [19] found daily and weekly periods to be most significant in observing individuals' connection to Wi-Fi access points (AP) on the Dartmouth campus. Reference [8] identified the same dominant periods using data from MIT's Reality Mining project. Reference [22] proposed a probabilistic measure of periodicity and demonstrated its robustness to noise and missing observations using GPS data, with superior performance over methods based on the Fourier transform. The above studies account for repetition in combinations of travel events by measuring the extent to which their co-occurrence maps to a set calendar cycle (most often a weekly cycle).

Other studies attempt to measure regularity explicitly by imposing a predefined cyclic period. For example, [35] proposed a measure of temporal irregularity in the intervals between a person's visits to a given location. They applied a weekly based measure to different data sources and found that the behavior captured from smart card data was most regular, while Wi-Fi data revealed the least regularity. Reference [32] presented another regularity measure also based on a weekly cycle. Given hourly information of a person's location over several weeks, they used the percentage of hours spent at the location most frequently visited during each hour of the week as the index of periodicity for the corresponding hour.

However, periodicity is not the same as regularity.
Regularity indicates the degree to which sub-sequences of events are repeated, and these sub-sequences do not have to align with a particular cycle. This is especially relevant to sequences of activities, as activities are likely to be organized in a logical order. For example, visiting the doctor's office, going to the pharmacy to pick up a prescription, and returning home are likely to occur in this logical order. The repetition of this sequence may not be periodic.

Furthermore, [19], [32], [35], and [8] all discuss periodicity in the context of the most conventional cycles of repetition: the day and the week. We argue that regularity is an internal property of a travel sequence and should not depend on how the sequence aligns with the calendar. Some patterns may repeat on cycles other than the day or the week. For example, certain types of employment (e.g. shift-workers, firefighters, doctors) may dictate working schedules which repeat on a cyclical unit other than the week. Periodicity measures computed on a weekly basis (as done by [32] and [35]) would fail to capture the true regularity in such cases. Similarly, a measure of daily periodicity may not be able to capture patterns spanning more than a calendar day, such as going out in the evening, sleeping at a friend's home, and then returning home the next day.

In conclusion, no index that captures repetition in the order in which events are observed has been introduced in the literature. In the following sections, we present a new metric for measuring the regularity of travel behavior that depends explicitly on the order in which travel events occur. As such, the metric avoids the issues inherent in existing periodicity-based measures which examine only co-occurring patterns of travel events and calendar events (i.e. hour, day, week).

III. SEQUENCE REPRESENTATION

Individual travel patterns can be conceptualized as a sequence of travel events.
These events unfold over time with respect to a background calendar (time of day, day of the week, month). Travel events are characterized by different aspects of behavior, including location, time of day, mode, route, travel time, activity type (or travel purpose) and activity duration. For instance, an event defined as an activity occurs at a certain time of day (8 pm on Friday), for a certain duration (2 hours), at a certain location (downtown) and for a certain purpose. As recognized by [13]–[15], variations along these behavioral dimensions are not independent. For example, an individual's choice of mode or route will significantly influence the travel time for her morning commute, which impacts her departure time.

A key component of these sequences is the order in which events take place. An appropriate measure of regularity in a person's travel behavior should capture both the extent of repetition in travel events and the order in which they are performed. To define such a regularity index, it is necessary to introduce a mathematical representation of travel sequences which captures the order of events.

We model the mobility of each individual over multiple days as a random process, which represents how often and in what order travel events are generated. The notation follows that used by [9]. Let the stochastic process corresponding to the mobility of a given individual $u$ be denoted by $\mathcal{X}_u$ and a travel event generated by this process by random variable $X_u$. Each travel event $X_u$ assumes a discrete value $x$ from the set of possible travel event outcomes $E_u$ defined for individual $u$. $x$ can be regarded as a unique identifier for a repeatable event. Two separate events assume the same value of $x$ if and only if they have the same combinations of event attributes. $X_u$ has a discrete probability distribution $p(x) = \Pr\{X_u = x\}$ for $x \in E_u$. For simplicity, subscript $u$ is omitted and all remaining notation is defined with respect to a single individual.
The stochastic process $\mathcal{X} = \{\ldots, X_{-1}, X_0, X_1, X_2, \ldots\}$ represents the ordered set of random variables $X_i$. Any finite sequence of this ordered set between event $i$ and event $j$ is denoted by the ordered subset $X_i^j = \{X_i, X_{i+1}, \ldots, X_{j-1}, X_j\}$, with $-\infty < i \le j < \infty$ such that $X_i^j \subset \mathcal{X}$. Given a finite window of analysis, we observe a specific realization $x_i^j = \{x_i, x_{i+1}, \ldots, x_{j-1}, x_j\}$ of the finite random variable sequence $X_i^j$.

Informally, set $E$ is akin to an alphabet from which a string of discrete events can be constructed. Different types of sequences, or strings, can be represented based on different definitions of travel events $x \in E$, driven by the aspects of behavior of interest. In practice, the specification of $E$ is constrained by the available data. Different data sources provide information on varying aspects of travel and at various aggregation levels. For instance, smart card data provides location information at the stop level and the timing of the event, but no direct information on activity purpose.

For consistency and computational convenience, we assume all event attributes are discrete. This assumption is common for travel behavior analysis since many travel attributes are discrete by nature, such as purpose, location and time periods (e.g. morning peak, midday, afternoon peak). Attributes that typically assume continuous values (e.g. activity duration) are discretized into a finite number of categories. The specification of these categories depends on both the goal and the data of the analysis. While a larger number of categories can capture the variation of these attributes in finer detail, it can also make the specific values less repeatable and lead to a sparse distribution of $p(x)$. Ideally, these categories should meaningfully reflect behavioral choices. For example, using some clustering approach (e.g. Gaussian mixture model), the activity duration can be discretized into three categories (long, medium, short), and each of these categories is likely to be associated with certain activity types (e.g. home, work, other).

Fig. 1. Example of travel sequences.

Fig. 1 shows how a person's travel over a day can be summarized as different travel sequences by changing the definition of travel events. For this example, we discretize activity duration into three categories, Long (> 10 hours), Short (< 2 hours) and Medium (between 2 and 10 hours), and travel duration into two categories, Long (> 30 minutes) and Short (< 30 minutes). We also characterize the trip start time using 24 hourly intervals. The level of discretization determines the granularity of travel events. Typically, finer granularity means that each travel event is more unique and less likely to repeat.

For many applications, a single aspect of travel behavior (i.e. purpose, location, or mode) is relevant. In these cases, the travel events only have a single attribute, and we may directly set the $x$ value of an event to its attribute value. For example, the first sequence in Fig. 1 focuses on the locations visited by the person. This can be represented by defining set $E$ as the set of all locations visited by the individual over the period of analysis. In this example, $x_i^j$ is simply a series of location IDs. In other contexts, it may be necessary to define events based on combinations of multiple attributes. For instance, location, function, and duration could be combined to differentiate between two activities observed in the same geographical area.
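The mapping from observed activities to a symbol sequence can be sketched as follows. Everything here is illustrative: the activity record is invented, the helper name is hypothetical, and the duration cut-offs are assumed to be Short below 2 hours, Medium from 2 to 10 hours, and Long above 10 hours.

```python
def duration_category(hours):
    """Assumed cut-offs for this sketch: Short < 2 h, Medium 2 to 10 h, Long > 10 h."""
    if hours > 10:
        return "Long"
    if hours < 2:
        return "Short"
    return "Medium"

# Hypothetical one-day activity record: (location ID, activity duration in hours).
activities = [("A", 11.0), ("B", 8.5), ("C", 1.0), ("A", 3.5)]

# Single-attribute events: the sequence is simply a series of location IDs.
location_sequence = [loc for loc, _ in activities]

# Compound events: each symbol combines location and duration category.
compound_sequence = [f"{loc}:{duration_category(h)}" for loc, h in activities]

# The alphabet E is the set of distinct symbols actually observed.
location_alphabet = sorted(set(location_sequence))
compound_alphabet = sorted(set(compound_sequence))

print(location_sequence, location_alphabet)
print(compound_sequence, compound_alphabet)
```

In the location-only sequence the symbol `A` repeats, whereas the compound definition yields four distinct symbols for the same day. This is the trade-off noted above: finer event definitions carry more detail but make each event less likely to repeat.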
In this case, the events $x$ in set $E$ are defined as compound outcomes of location, function, and purposes, as illustrated in the third sequence of Fig. 1. At different levels of aggregation, multiple trips or activities can be grouped together to define a single event. For example, all trips made on the same day can be grouped into a single event to create a binary sequence representing when the person traveled across multiple days. This representation provides a flexible approach to simplify and represent multidimensional travel behavior as a string of travel event symbols. These symbols are defined in line with the objective of the study so as not to distort or omit relevant information about aspects of travel of interest.

IV. MEASUREMENT OF REGULARITY

As described in the previous section, we model the mobility of an individual over multiple days as a sequence of events generated by a random process $\mathcal{X}$. Through this abstraction, it is possible to characterize an individual's mobility by quantifying the nature of the random process $\mathcal{X}$. Many different properties of process $\mathcal{X}$ may provide information about the individual's travel pattern. For example, consider a process $\mathcal{X}$ representing the activity sequence of an individual. In this case, the cardinality of set $E$ informs us about the diversity of activities in which the individual engages, and the mode of probability distribution $p(x)$ reveals the individual's most frequent activity. This section introduces ways to measure such properties of $\mathcal{X}$ which can be used to describe regularity of a travel sequence.

A. Entropy vs Entropy Rate

First, we examine the extent of repetition of a travel sequence regardless of the order.
Under this assumption, the regularity of a random process is solely determined by the probability distribution $p(x)$. Intuitively, on average, an outcome generated by a more regular process should be less uncertain and more predictable. In information theory, the level of randomness or unpredictability of a process can be measured using entropy. Entropy measures the average information, or surprise, provided by each realization of a random variable in bits. The entropy $H(X)$ of random variable $X$ with probability distribution $p(x) = \Pr\{X = x\}$ for $x \in E$ is defined by (1).

$$H(X) = -\sum_{x \in E} p(x) \log_2 p(x) \tag{1}$$

For the travel sequence problem, $X$ represents the random variable associated with a travel event and $E$ denotes the set of all possible travel event outcomes defined for a given individual. Entropy can be thought of as a measure of variance defined for categorical probability distributions. It accounts for both the number of possible outcomes (the cardinality of set $E$) and the relative frequency of outcomes. Hence, entropy equals 0 for a process with a single possible outcome (no uncertainty) and is highest when the probability distribution of a random variable with multiple outcomes is uniform (when all events are equally likely). Reference [30] used entropy to measure and contrast the complexity of activity patterns completed by individuals of different gender. The author points out that entropy is a good measure of the amount of heterogeneity in a categorical distribution, which is especially relevant when considering qualitative outcomes such as activities.

Although entropy is a good measure of repetition of isolated events in a travel sequence, it does not capture the extent to which ordered sub-sequences of events repeat over time. Travel sequences are not typically memoryless processes. Rather, the conditional distribution of an event $X_i$ depends on the outcome of events $X_{i-1}, X_{i-2}, \ldots$ preceding it (i.e. $p(X_i \mid X_{i-1}, X_{i-2}, \ldots) \neq p(X_i)$).
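Equation (1) can be evaluated directly from an observed sequence by replacing $p(x)$ with the empirical frequency of each event. The snippet below is a minimal plug-in sketch with hypothetical event labels; it is not the estimation procedure used in the paper.

```python
from collections import Counter
from math import log2

def entropy(sequence):
    """Plug-in estimate of H(X) in bits, using empirical event frequencies as p(x)."""
    counts = Counter(sequence)
    n = len(sequence)
    return sum(-(c / n) * log2(c / n) for c in counts.values())

single_outcome = ["home"] * 8                      # one possible event: no uncertainty
uniform = ["home", "work", "shop", "gym"] * 2      # four equally likely events
skewed = ["home"] * 6 + ["work", "shop"]           # same length, uneven frequencies

print(entropy(single_outcome))   # 0.0
print(entropy(uniform))          # 2.0
print(entropy(skewed))           # between 0 and 2 bits
```

Shuffling any of these lists leaves the result unchanged, which illustrates why entropy alone cannot detect regularity in the order of events.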
For example, observing a visit to the doctor might significantly increase the likelihood of a visit to the pharmacy in the following event. Entropy rate accounts for the order of events in a travel sequence, or more formally for the memory in process X. Entropy rate H(X) of the random process X is defined as the asymptotic rate at which the entropy of sub-sequence $X_1^n$ changes with increasing n [9], calculated using (2).

$$H(X) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, X_3, \ldots, X_n) \tag{2}$$

where $H(X_1, X_2, X_3, \ldots, X_n)$ denotes the entropy of the joint variable $X_1^n$ defined for the subsequence $X_1, X_2, \ldots, X_n$. References [9] and [6] stated that this limit exists for all stationary random processes and is equal to

$$H(X) = \lim_{n \to \infty} H(X_n \mid X_{n-1}, \ldots, X_2, X_1) = \lim_{n \to \infty} -\sum_{x_1^n \in E^n} p_n(x_1^n) \log_2 \frac{p_n(x_1^n)}{p_n(x_1^{n-1})} \tag{3}$$

where $p_n$ denotes the joint probability distribution of a subsequence of length n. As described by (2) and (3), entropy rate measures the average entropy of each new event generated by random process X, accounting for preceding events. It is measured as the entropy per event and has units in bits per event. The entropy rate of a random process with no memory is exactly equivalent to the entropy of the process as each new event...
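
The contrast between (1) and (3) can be illustrated with a small sketch. It is not the estimator used in the excerpt: as a simplifying assumption, the limit in (3) is truncated at n = 2, giving the first-order conditional entropy H(X_i | X_{i-1}) computed from observed frequencies. The function names and the two toy sequences (H for home, W for work) are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(seq):
    """Empirical Shannon entropy of the events in seq, in bits (equation (1))."""
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def first_order_conditional_entropy(seq):
    """Empirical H(X_i | X_{i-1}) in bits.

    Assumption: this truncates the limit in (3) at n = 2, so only the
    immediately preceding event is treated as memory.
    """
    pairs = Counter(zip(seq, seq[1:]))   # counts of (previous, current)
    previous = Counter(seq[:-1])         # counts of the conditioning event
    total = len(seq) - 1
    h = 0.0
    for (prev_event, _), count in pairs.items():
        p_pair = count / total                 # joint probability of the pair
        p_cond = count / previous[prev_event]  # p(current | previous)
        h -= p_pair * log2(p_cond)
    return h


# Two hypothetical sequences with identical event frequencies (6 H, 6 W).
regular = list("HWHWHWHWHWHW")    # strictly alternating
irregular = list("HHWHWWWHHWHW")  # same events, no fixed order

print(entropy(regular), first_order_conditional_entropy(regular))
print(entropy(irregular), first_order_conditional_entropy(irregular))
```

Both sequences have an entropy of 1 bit because the event frequencies are the same, but the conditional entropy of the alternating sequence is 0 (each event is fully determined by the one before it), while that of the unordered sequence stays well above 0. This is the distinction the text draws between repetition of isolated events and repetition of ordered sub-sequences.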

Travel Demand

Student’s Name
Professor’s Name
Institution
Course code
Date

Contents
Chapter 1
Introduction
Aims of the project
Objectives
Needs for the study
Scope of the project
Chapter 2
Literature review
Lifestyle and travel demand
Characteristics in individuals
Perceptions of risk and safety
Looking into travel demand in urban areas
Chapter 3
Methodology of project
Flowchart
Questionnaire
Chapter 4
Data collection and analysis
Charts/analysis
Chapter 5
Results/findings
Chapter 6
Recommendations

Chapter 1
Introduction
Aims of the project
This project studies travel demand in the Muscat region. Its findings are
intended to inform a strategic re-evaluation of the region's infrastructure,
should they show that one is needed. To that end, the project asks people
living in the Muscat region for their opinions on the aspects of travel under
consideration. Residents take part in a survey that lets them describe their
daily lives and the factors that influence their travel choices and modes.
The analysis therefore rests on the opinions gathered from the survey
respondents. Experts with an interest in the Muscat region should be able to
draw insight from the data collected and from the information the respondents
provide, and so be better informed about how to improve the region's
infrastructure.
Objectives
The project is driven by several key objectives. The first is to learn about
the main factors behind travel demand in the Muscat region, in the hope that
the study can feed into a plan to improve the region substantially. The
project is therefore a preparatory step that paves the way for the region's
infrastructural development, and it looks into several factors with the
long-term betterment of the region in mind. A further objective is to involve
all stakeholders in the Muscat region. Each stakeholder has a role to play,
and because stakeholders vary in opinion and influence, gathering their
differing views on the situation in the region will help the project. Travel
demand, the factor under study, can then be evaluated and analyzed as
thoroughly as possible. In this way the project's objectives can take shape
progressively for the benefit of the Muscat region and its population.
Needs for the study
The subject of this project is the travel demand observed in the Muscat
region. Studying it requires listing the individual factors that appear to
affect it, and the study is undertaken to account fully for the influence of
those factors on travel demand. This lets experts understand how some factors
influence others and even give rise to other phenomena. Behavior and habits
are understood to be shaped by outside influences: some are a matter of
circumstance, while others arise from the situations in which people find
themselves. The study therefore aims for a balanced understanding of the
circumstances and situations that people are exposed to, whether daily or
occasionally. This provides a basis for pinpointing how those situations and
circumstances influence travel demand in the Muscat region. The ways people
live and cope with these situations and circumstances can then be captured in
full, which gives a clear view of the decisions behind the region's travel
demand.

Scope of the project
The scope of the project comprises a set of measurable variables. The
project's goals, deliverables, and deadlines have been fully developed,
defined, and set, which gives the project the clarity it needs during
rollout. The goals center on the success of the study, as identified when the
project was initiated: the project is to collect and analyze the data the
study requires. The deliverables are the findings to be drawn from the
collected and analyzed data. The project must therefore use the best data
collection method available so that the data are accurate and free from
exaggeration or distortion. Because travel demand is dynamic and liable to
change, the data must be collected, analyzed, and delivered promptly so that
the results can be used while they are still current.
Chapter 2
Literature review
Lifestyle and travel demand
It is argued that lifestyles are responsible for the travel demands that are to be
examined in ...
