Data Analysis Using Epi Info™ 7: Descriptive Analysis
To learn about epidemiological studies, data analysis, and the use of secondary data, we will be using the Youth Risk Behavior Survey (YRBS). This school-based survey of high school students (grades 9-12) began in 1991 and is conducted every other year, in odd-numbered years. The survey focuses on six areas of health and provides selected questions for each area. The areas include:
- Behaviors that contribute to unintentional injuries and violence
- Sexual behaviors that contribute to unintended pregnancy and sexually transmitted diseases, including HIV infection
- Alcohol and other drug use
- Tobacco use
- Unhealthy dietary behaviors
- Inadequate physical activity
In Week Four, you will be asked to review the YRBS questionnaire and code to determine which topic you would like to utilize for your Final Paper that is due in Week Six. You also conducted a literature review and completed an annotated bibliography for your selected topic. This week, you will use our class topic of “cigarette smoking and marijuana use” to conduct data analysis on three research questions. For this week’s assignment, you will focus on running frequencies, determining prevalence estimates, and assessing whether these estimates are statistically significant among different groups. Upon completion of this assignment, you will be able to utilize these same data analysis skills with regard to the topic you choose for your Final Paper.
Epi Info™ 7
Ensure you are able to access and utilize the Epi Info™ 7 software. Refer to the directions in Week One for more information.
YRBS Data
Visit the CDC’s Data Files & Methods web page to download the 2011 National High School YRBS Data Files. Click the link for Access File under the “Access®” category for the year 2011. You will be prompted to either open or save the YRBS2011.zip file; save this file to a location on your computer that you can easily find. Once the file is saved, right-click the YRBS2011.zip file and select Extract All. The files will extract to the same location on your computer. You will need to upload this data into the Epi Info™ 7 software to complete the assignment below.
When analyzing a data set using statistical software packages, it is important that you become familiar with the codebook. The codebook provides details on the variables within the data set. It includes information on how the questions were asked in the study, whether the data that is presented in the data set is nominal, ordinal, interval, or ratio data, and the basic frequency numbers of the data set. Review and download the 2011 YRBS Data User’s Guide (codebook) and review the 2011 National Youth Risk Behavior Survey questionnaire to become familiar with the data variables and survey questions. Sometimes when you review the codebook (User’s Guide) and compare it to the survey, you can determine if some of the data was recoded into a more “user-friendly” variable. This will be explored further in Week Three’s assignment.
Assignment Instructions
Utilize the Epi Info™ 7 Quick Start Guide, v.0.2.2 as a resource to complete the tasks below.
First, you will need to upload your data and save the canvas file:
- Launch the Epi Info™ 7 program and, from the Menu screen, select Visual Dashboard.
- From the Visual Dashboard, you will be prompted to “Set a Data Source” by clicking either the white arrow within the blue pop-up box or by clicking on Set Data Source in the top left-hand corner of the dashboard.
- A new window titled “Select Data Source” will pop up. Change the “Database Type” to Microsoft Access 2002-2003 (.mdb) using the drop-down menu.
- Click the square button next to the “Data Source” line; a new window titled “Open Microsoft Access File” will pop up. Click the square button next to the “Database file name” line, and then locate and select the 2011 YRBS data that you saved earlier. Click OK.
- In the “Select Data Source” screen, highlight the XXHq file in the “Data Source Explorer” box. Click OK.
- From the blank canvas screen, click Save As at the top of the navigation toolbar. Save this canvas within a folder or location on your computer where you can easily access it.
Next, you will run frequency tabulations for select variables:
- Right click on the blank center canvas screen, hover over the “Add Analysis Gadget” and select Frequency from the menu.
- The “Frequency” tool box will pop up within the canvas. Select the first variable from the list (q1) from the “Frequency of” selection box, and then click Run. A frequency table will appear on the canvas that contains the statistical data for this variable.
- Complete Steps 7 and 8 (adding a Frequency gadget and running it) for each of the following variables:
- Age (q1)
- Gender (q2)
- Grade (q3)
- Race (RACEETH)
- Cigarette smoking (qn31)
- Marijuana (qn48)
- Once you have a frequency table for each variable, you can export the output to Excel by right-clicking with your mouse cursor on the table itself. From Excel, copy and paste each table output that you have created into one tab within the same Excel file. The tab should be labeled Wk2_Frequency. Save the entire Excel file as Firstname_Lastname_Week 2_Assignment.
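If you want to check your understanding of what a frequency table reports, the following minimal Python sketch tabulates counts, percentages, and cumulative percentages by hand. It is only an illustration and is not part of Epi Info™ 7: the function name `frequency_table` and the `sample_qn31` responses are hypothetical, and the sketch assumes missing responses are left out of the denominator.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, count, percent, cumulative percent).

    Assumption: missing responses (None) are excluded from the denominator.
    """
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    total = len(observed)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        percent = 100.0 * counts[value] / total
        cumulative += percent
        rows.append((value, counts[value], round(percent, 2), round(cumulative, 2)))
    return rows

# Hypothetical responses to qn31 (1 = yes, 2 = no, None = missing); not real YRBS data.
sample_qn31 = [1, 2, 2, None, 1, 2, 2, 2, None, 2]

for row in frequency_table(sample_qn31):
    print(row)
```

With these made-up responses, 2 of the 8 non-missing students are coded 1 (yes), so the "yes" row shows 25 percent.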
Then, you will complete cross-tabulations using the MxN/2x2 table feature:
- In the same Epi Info™ 7 canvas in which you made the frequency tables, right click on the blank center canvas screen, hover over the “Add Analysis Gadget,” and select MxN/2x2 from the menu.
- Use the “MxN/2x2” gadget window to determine what percentage of students use marijuana by gender, race, grade, age, and smoking status.
- Export the output to Excel by right-clicking with your mouse cursor on the table itself. From Excel, copy and paste each table output that you have created into one tab within the same Excel file. The tab should be labeled Wk3_2x2 Tables.
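The percentage of students who use marijuana within each group is a row percentage in a cross-tabulation. The short Python sketch below shows that calculation under stated assumptions: the helper names `crosstab` and `row_percent` are hypothetical, the `records` are made up rather than taken from the YRBS file, the group codes are illustrative only (consult the codebook for the real coding), and rows with a missing value are skipped.

```python
from collections import defaultdict

def crosstab(rows, exposure, outcome):
    """Count outcome values within each exposure level.

    Assumption: rows with a missing (None) exposure or outcome are skipped.
    """
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        e, o = row.get(exposure), row.get(outcome)
        if e is None or o is None:
            continue
        table[e][o] += 1
    return {e: dict(cells) for e, cells in table.items()}

def row_percent(table, level, outcome_value):
    """Percent of one exposure level (one table row) that has the given outcome value."""
    cells = table[level]
    total = sum(cells.values())
    return 100.0 * cells.get(outcome_value, 0) / total

# Hypothetical records: q2 is the grouping variable, qn48 is coded 1 = yes, 2 = no.
records = [
    {"q2": 1, "qn48": 1}, {"q2": 1, "qn48": 2}, {"q2": 1, "qn48": 2}, {"q2": 1, "qn48": 2},
    {"q2": 2, "qn48": 1}, {"q2": 2, "qn48": 1}, {"q2": 2, "qn48": 2}, {"q2": 2, "qn48": None},
]

table = crosstab(records, "q2", "qn48")
for level in sorted(table):
    print(level, table[level], round(row_percent(table, level, 1), 1))
```

Each printed line gives the group code, the cell counts for that row, and the percentage of that group coded 1 (yes) on qn48.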
- Next, we are going to look at joint marijuana use and cigarette smoking in the last 30 days. In order to do so, we are going to create a new variable in Epi Info™ that will indicate whether a student is coded 1 ("yes") on both qn31 and qn48. We will use Epi Info™'s data-editing capabilities to accomplish this task.
- Locate the “Defined Variables” feature on the left-hand side of the Visual Dashboard canvas, place your mouse over it and a box should emerge. Click on New Variable, then click With Assigned Expression. A new box titled, “Add Variable with Expression”, should appear.
- In this box, under “Assign Field”, enter cigs_and_pot (or another name of your choice). Under “Expression”, enter (2-qn31)*(2-qn48). Under “Data type”, choose Numeric. Click OK. The box will disappear, and you will have created a new variable titled cigs_and_pot.
- You might check to see what you’ve done. To do this, add a Frequency analysis gadget for each of qn31, qn48, and cigs_and_pot. In contrast to the 1 [yes] – 2 [no] coding of qn31 and qn48, cigs_and_pot is coded 0 [no] or 1 [yes]. The number of students who have used both marijuana and cigarettes cannot exceed the number of students who used cigarettes or the number of students who used marijuana. Do you see why this must be true?
Here’s an explanation of the expression (2-qn31)*(2-qn48). If a student is coded 1 [yes] for both cigarettes (qn31) and marijuana (qn48), then each term (2-qn31) and (2-qn48) is 2-1 = 1, and the product is 1*1 = 1.
If a student is coded 2 [no] for either cigarettes or marijuana, then at least one of the two terms (2-qn31) and (2-qn48) is 2-2 = 0, so the product of the two terms will be 0.
In short, the new variable cigs_and_pot will be 1 if the student is coded 1 (yes) to both cigarettes and marijuana, and will be 0 otherwise (that is, either not a cigarette smoker or not a marijuana user). From this, you can see that creating new variables in Epi Info™ via assigned expressions is quite powerful, especially if you have a good grasp of algebra.
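You can confirm the arithmetic by evaluating the expression for every combination of the 1 [yes] and 2 [no] codes. The Python sketch below does this outside of Epi Info™; the function simply mirrors the assigned expression, and its handling of missing values (returning None) is an assumption made for illustration, not a statement of how the software treats them.

```python
def cigs_and_pot(qn31, qn48):
    """Mirror the assigned expression (2-qn31)*(2-qn48) for codes 1 = yes, 2 = no.

    Assumption: if either input is missing (None), the result is left missing.
    """
    if qn31 is None or qn48 is None:
        return None
    return (2 - qn31) * (2 - qn48)

# Evaluate all four combinations of the yes/no codes.
for qn31 in (1, 2):
    for qn48 in (1, 2):
        print(f"qn31={qn31} qn48={qn48} -> cigs_and_pot={cigs_and_pot(qn31, qn48)}")
```

Only the combination qn31=1 and qn48=1 yields 1; the other three combinations yield 0.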
Lastly, using the newly created cigs_and_pot variable along with the “MxN/2x2” table analysis gadget, complete the following analyses:
- Determine the prevalence of joint marijuana use and cigarette smoking among males and females. Is there a statistically significant difference between males and females?
- Determine the prevalence of joint marijuana use and cigarette smoking by grade, race, and age. Do the prevalence estimates differ significantly across these groups?
- Export the output to Excel by right-clicking with your mouse cursor on the table itself. From Excel, copy and paste each table output that you have created into a new tab labeled Wk3_2x2 Tables_Step14 within the same Firstname_Lastname_Week 2_Assignment Excel file.
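To see where a prevalence estimate and a test of significance come from, the Python sketch below works through a 2x2 table by hand. It is a minimal illustration under stated assumptions: the counts are hypothetical, the function names are made up for this example, every record is treated equally (no survey weights), and the statistic shown is the uncorrected Pearson chi-square with 1 degree of freedom, which may be only one of several statistics your software reports.

```python
import math

def prevalence(cases, total):
    """Prevalence as a percentage: cases divided by everyone in the group."""
    return 100.0 * cases / total

def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square and p-value (1 df) for the table [[a, b], [c, d]].

    a, b = group 1 with / without the outcome; c, d = group 2 with / without.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts (not YRBS results): group 1 has 30 yes / 70 no, group 2 has 15 yes / 85 no.
a, b, c, d = 30, 70, 15, 85
print("Group 1 prevalence (%):", prevalence(a, a + b))
print("Group 2 prevalence (%):", prevalence(c, c + d))
chi2, p_value = chi_square_2x2(a, b, c, d)
print("chi-square:", round(chi2, 3), "p-value:", round(p_value, 4))
```

A p-value below 0.05 is the conventional threshold for calling the difference between the two prevalence estimates statistically significant.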
Finally, summarize your results and address the following:
- Describe your sample population
- Is the sample equally distributed?
- Are there any statistical differences within groups?
- Are there any statistical differences between groups?
- Summarize and explain the differences with the odds ratio and other pertinent data as it relates to your data analysis.
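When you interpret an odds ratio, it can help to see how it is computed from the four cells of a 2x2 table. The Python sketch below is a hedged illustration: the counts are hypothetical, the function name is made up for this example, and the 95 percent confidence interval uses the common log-based (Wald) approximation, which may not match the interval method your software reports.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for the table [[a, b], [c, d]].

    a, b = exposed with / without the outcome; c, d = unexposed with / without.
    Assumption: all four cells are greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the natural log of the odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts (not YRBS results).
odds_ratio, lower, upper = odds_ratio_with_ci(30, 70, 15, 85)
print("Odds ratio:", round(odds_ratio, 2))
print("Approximate 95% CI:", round(lower, 2), "to", round(upper, 2))
```

An odds ratio above 1 means the outcome is more common in the first group, and a confidence interval that excludes 1 is consistent with a statistically significant difference.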
Your assignment must be two to three pages (excluding title, reference, and analysis output pages) and formatted according to APA style as outlined.