Kevin Guo
Abstract: Lung cancer in never-smokers (LCINS) is one of the leading causes of cancer death in the United States. Unlike smoking-related lung cancer, LCINS is poorly understood, and research on its causes has produced conflicting results. Because its risk factors remain unclear, early diagnosis and prevention are key to reducing mortality among LCINS patients. In this study, the Prostate, Lung, Colorectal, and Ovarian (PLCO) dataset, which contains more than 155,000 participants, including more than 36,000 never-smokers, was analyzed in R and Excel to determine risk factors and imaging features of LCINS. The factors analyzed for predictive power in LCINS incidence were age, height, weight, BMI, race, income, family history, and secondhand smoke exposure. Multiple statistical methods, including t-tests, ANOVA, and logistic regression, were used to assess each factor. Comparing and cross-checking the results of these methods showed that age and race were the key factors, with statistically significant evidence of an association with LCINS incidence. Of the methods used, logistic regression provided the most information about each factor's predictive power, because the outcome (whether or not a participant developed LCINS) is binary. These results could inform future studies that apply deep learning to cross-sectional imaging to identify predictive factors for LCINS and other lung cancers.
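The abstract favors logistic regression because the outcome is binary. The sketch below illustrates that idea only; it is not the study's analysis, which was done in R and Excel. It fits a logistic regression by Newton-Raphson (iteratively reweighted least squares) and reports two-sided Wald p-values for each coefficient. The data are simulated, not PLCO data. The function name `fit_logistic`, the variables `age`, `group` (a hypothetical binary race indicator) and `noise_var`, and the "true" coefficients are all illustrative assumptions.

```python
import math
import numpy as np

def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Hypothetical helper: logistic regression via Newton-Raphson (IRLS).

    X: (n, p) predictors WITHOUT an intercept column (one is added here).
    y: (n,) 0/1 outcome, e.g. 1 = participant developed lung cancer.
    Returns (coefficients, standard errors, two-sided Wald p-values),
    with the intercept first.
    """
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted probabilities
        w = p * (1.0 - p)                      # IRLS weights
        grad = X.T @ (y - p)                   # score vector
        info = X.T @ (X * w[:, None])          # Fisher information matrix
        step = np.linalg.solve(info, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the inverse information at the final estimate
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    z = beta / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    pvals = np.array([math.erfc(abs(zi) / math.sqrt(2)) for zi in z])
    return beta, se, pvals

# Simulated cohort (assumed values, not PLCO data)
rng = np.random.default_rng(0)
n = 20000
age = rng.uniform(55, 74, n)           # assumed age range
group = rng.integers(0, 2, n)          # hypothetical binary race indicator
noise_var = rng.normal(0.0, 1.0, n)    # predictor with no true effect
true_logit = -8.0 + 0.08 * age + 0.5 * group
y = rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))

beta, se, pvals = fit_logistic(np.column_stack([age, group, noise_var]), y)
for name, b, s, pv in zip(["intercept", "age", "group", "noise_var"], beta, se, pvals):
    print(f"{name:>10}: coef={b: .3f}  se={s:.3f}  p={pv:.2g}")
```

Each coefficient is a change in log-odds per unit of the predictor. A small p-value for `age` or `group`, with a large one for `noise_var`, is the kind of evidence a logistic model gives that a two-group t-test or ANOVA on a single factor does not: every factor is adjusted for the others at once.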