Academic Paper

A Resampling Univariate Analysis Approach to Ovarian Cancer From Clinical and Genetic Data
Document Type
Periodical
Source
IEEE Access. 9:25959-25972, 2021
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Genetics
Databases
Data science
Standards
Measurement
Tumors
Sequential analysis
Bootstrap resampling
data science analytics
genetic data
hypothesis test
ovarian cancer
univariate analysis
Language
ISSN
2169-3536
Abstract
Ovarian cancer (OC) is the second most common gynecological malignancy and the gynecological tumor with the worst prognosis. Data science technologies could help improve this situation by giving clinicians a better understanding of the disease. Here, we explore OC data to discover relationships between clinical and genetic factors and disease progression. To this end, we propose an analysis framework, based on bootstrap resampling, for simple univariate statistical descriptions of features of different types. First, we define the framework for metric, categorical, and date variables and determine the advantages and disadvantages of different bootstrap resampling strategies according to their statistical basis. Then, we use it to perform a univariate analysis of an OC dataset, exploring how disease progression, with platinum-free interval as the indicator, relates to clinical and genetic features of different types. The analysis also provides a first set of variables possibly relevant for survival prediction. The results show that some features differ individually between the platinum-resistant (<6 months) and platinum-sensitive (>6 months) groups. This suggests that the database could be discriminatory for the hypotheses studied, although multivariate analyses are advisable to examine how the relationships among features are influenced.
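As a rough illustration of the kind of bootstrap-based univariate comparison the abstract describes, the following minimal sketch compares one metric feature between two patient groups. It is not the authors' implementation: the function name `bootstrap_mean_diff`, the choice of the mean as the statistic, the percentile confidence interval, and the default parameters are all assumptions made for this example.

```python
import numpy as np

def bootstrap_mean_diff(group_a, group_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the difference in means of two groups.

    Hypothetical helper for illustration only. Returns (lower, upper,
    differs), where `differs` is True when the interval excludes zero.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)

    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group independently, with replacement,
        # keeping the original group sizes.
        resample_a = rng.choice(a, size=a.size, replace=True)
        resample_b = rng.choice(b, size=b.size, replace=True)
        diffs[i] = resample_a.mean() - resample_b.mean()

    # Percentile interval at confidence level 1 - alpha.
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    differs = not (lower <= 0.0 <= upper)
    return float(lower), float(upper), differs
```

Under these assumptions, a feature would be flagged as individually different between the groups when the interval excludes zero. Categorical or date variables would need a different statistic (for example, a difference in proportions), which this sketch does not cover.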