Statistics is the science of collecting, analyzing, interpreting, and presenting data. It helps us make sense of the vast amounts of information generated in fields like healthcare, economics, and social sciences.
But what are data and how do we obtain them?
This question is fundamental to understanding statistics. Simply put, data are numerical representations of the real world obtained by counting or measuring. Data obtained by counting are “discrete” variables, while those obtained by measuring are “continuous” variables.
Discrete variables can only take specific, separate values, often integers. Examples include the number of candies in a bag, the number of patients with a disease, the number of children in a family, and the number of tablets in a prescription. These variables are often presented in tables or bar charts, which make them easier to read and analyze.
Continuous variables can take any value within a range, including fractions and decimals. Examples include body weight, blood pressure, and body temperature. These variables are often plotted as histograms or smooth curves, which show how the values are distributed across a continuum.
This definition of “data” is incomplete, however, because it covers only numerical, quantitative data. Non-numerical, qualitative data are also statistically important. Examples include blood group, gender, and disease stage (the last of which also has a natural order). Such data place each observation in a category rather than assigning it a measured or counted number.
Additionally, there are ordinal data, which fall between qualitative and quantitative data. A pain scale from 1 to 10 is one example. Ordinal values can be ranked in a meaningful way, but the intervals between the ranks are not necessarily equal: the step from a pain score of 2 to 3 need not represent the same change as the step from 8 to 9.
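To make the idea of ranking concrete, here is a minimal Python sketch. The three-level `Severity` scale and the observations are hypothetical, chosen only for illustration. It shows that order-based operations such as sorting and taking the maximum are well defined for ordinal data.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical ordinal scale: only the order of the levels is meaningful."""
    MILD = 1
    MODERATE = 2
    SEVERE = 3


# Hypothetical observations, not real patient data.
observations = [Severity.SEVERE, Severity.MILD, Severity.MODERATE, Severity.MILD]

# Ordinal values can be ranked, so sorting and taking the maximum make sense.
ranked = sorted(observations)
worst = max(observations)

print([level.name for level in ranked])  # ['MILD', 'MILD', 'MODERATE', 'SEVERE']
print(worst.name)                        # SEVERE
```

The integer codes 1, 2, and 3 serve only to fix the order. Subtracting them would suggest equal spacing between levels, which an ordinal scale does not guarantee.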
Some data can also be derived or calculated from other data rather than obtained directly through counting or measuring. For example, a patient’s BMI (Body Mass Index) is calculated from their weight and height, as weight in kilograms divided by the square of height in metres. Moreover, quantitative data are sometimes recoded into binary or other categorical forms that are not strictly numerical, such as recording only whether a measurement lies above or below a threshold, which adds another layer of complexity.
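As a rough illustration of derived data, the following Python sketch computes BMI from weight and height and then recodes the result as a binary flag. The function names `bmi` and `is_at_or_above`, the sample values, and the threshold of 25.0 are assumptions made for this example, not part of any particular library or clinical guideline.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Return Body Mass Index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def is_at_or_above(value: float, threshold: float) -> bool:
    """Recode a continuous value as a binary flag relative to a threshold."""
    return value >= threshold


# Hypothetical patient: 70 kg, 1.75 m.
value = bmi(70.0, 1.75)

# The threshold of 25.0 is an assumption used purely for illustration.
flag = is_at_or_above(value, 25.0)

print(round(value, 1))  # 22.9
print(flag)             # False
```

The continuous BMI value carries more information than the binary flag derived from it. Recoding simplifies the data but discards how far the value lies from the threshold.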