
Journal of Environmental and Social Sciences

Research Article

Impact of Multivariate Risk Factors on Children’s Health in Ekaterinburg: A Cross-Sectional Study

Ekaterina Konstantinova1* and Anatoly Varaksin2

1Senior Researcher at laboratory of Mathematical Modelling in Ecology and Medicine, Institute of Industrial Ecology, Ural Branch of Russian Academy of Sciences, S. Kovalevskaya Street, 20, 620990, Ekaterinburg, Russia
2Doctor of Sciences (Physics and Mathematics), Head of Mathematical Modeling Laboratory in Ecology and Medicine, Professor Institute of Industrial Ecology, Ural Branch of Russian Academy of Sciences, Ekaterinburg, Russia
*Corresponding author: Ekaterina Konstantinova, Senior Researcher at the Laboratory of Mathematical Modelling in Ecology and Medicine, Institute of Industrial Ecology, Ural Branch of Russian Academy of Sciences, S. Kovalevskaya Street, 20, 620990, Ekaterinburg, Russia, Work ph. +7(343) 3623514; Fax: +7(343) 3743771, E-mail: konstantinovaekateri@gmail.com
Article Information: Submission: 16/03/2015; Accepted: 31/03/2015; Published: 05/04/2015
Copyright: © 2015 Ekaterina Konstantinova, et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: This study was undertaken to assess the impact of a multifactorial environment on human health in a large industrial city. Our objective was to identify the complex of risk factors that has the greatest impact on disease prevalence.
Methods: A cross-sectional epidemiological study was performed with a representative sample of 441 preschool-aged children. The Classification Tree method was used as the main research method.
Results: Our study showed that a complex of ecological risk factors (air pollution, drinking water pollution, a gas stove in the apartment, etc.), coupled with socioeconomic risk factors, increases the prevalence of respiratory system diseases and behavioral disorders by a factor of 2.5 to 4.
Conclusions: This study shows that factors such as atmospheric pollution, a gas stove in the apartment, parental smoking, and the mother’s low educational attainment increase the prevalence of diseases even under a univariate approach. The authors have also identified the sets of risk factors that have the maximum negative influence on disease prevalence.

Keywords

Children health; Indoor and outdoor air pollution; Cross-sectional study; Multivariate risk factors analysis; Classification trees

Introduction

Within the modern paradigm of noncommunicable disease epidemiology, in contrast to the epidemiology of infectious diseases, we deal with the so-called “web of causation” rather than with relatively simple “cause-effect” connections [1]. “When many factors act together, it has been called ‘the web of causation.’ A causal web is well understood in chronic degenerative diseases such as cardiovascular disease and cancer” [2,3]. “One of the characteristics of most non-infectious diseases is that many different factors contribute to their development and can be said to be causes in the sense that, in their absence, the incidence of disease is reduced and that, in their presence, the incidence is increased” [4]. Because human health is determined by the simultaneous action of multiple risk factors (environmental, social, family, etc.) on the organism, their effects should be assessed jointly [5-8]. Thus, multivariate analysis aims to identify those combinations of risk factors that have the greatest impact on public health.
Among all risk factors of health loss, modifiable ones (those that can be changed) and non-modifiable ones can be distinguished. Environmental pollution (environmental risk factors), for example, is non-modifiable (people cannot change the environment in their habitat), whereas some family, behavioral, or social factors are largely modifiable (they can be partially or completely eliminated by changing the family setting, giving up bad habits, etc.) [6,9,10,11].
The authors propose to develop a subject-oriented statistical model describing the impact of a multifactorial environment on human health. The proposed model is based on discriminant analysis. A subject-oriented approach should have the following features: all its stages should be transparent to specialists in the subject area, and it should produce results that are important not only for academics but also for applied science (e.g., human ecology and epidemiology).
Application of discriminant analysis methods to the problem of risk factors impact on health is based on one idea. The idea is as follows: a set of predictors that can reliably classify patients and healthy individuals is the set of risk factors that has the greatest influence on the disease occurrence [12-14].
Discriminant analysis methods have not yet found wide application in problems concerning the influence of risk factors on disease prevalence, for the following reason. The decision rule constructed by any classification method is expected to reliably distinguish the sick from the healthy. In other words, it should be a “high-quality decision rule” that gives a high percentage of correct classifications. It is fundamentally impossible to obtain such a reliable decision rule in problems concerning the influence of risk factors on population health, because risk factors only increase the likelihood of disease and do not make it certain. Therefore, the presence of a complex of risk factors (even a large number of them) cannot guarantee that the disease occurs, just as the absence of risk factors does not guarantee the absence of disease. Consequently, it is also impossible to accurately predict disease occurrence using a mathematical description based on knowledge of risk factors [13-17].
Among the classification methods, let us distinguish the “black box” models (neural networks, pattern recognition techniques), which form the decision rule as an algorithm that is not open to subject-matter analysis. Several methods (linear discriminant function, logistic regression) give a decision rule in a form that is accessible to subject-matter analysis. The disadvantage of these methods is that they perform well only for unrelated risk factors (uncorrelated predictors) [14,18-21].
The authors show here that the Classification Tree (CT) method has two useful features. The first is that it can be applied to correlated predictors. The second is that it can use a decision rule that is not of high quality: one that does not necessarily clearly distinguish the sick from the healthy, but allows the population to be divided into groups with higher and lower disease prevalence.
To solve the problem of the risk factors impact on the population health, this is sufficient [13,15,17].

Materials and Methods

A study was conducted on a sample of Ekaterinburg preschool children. Ekaterinburg is the fourth-largest city in Russia (after Moscow, St. Petersburg, and Novosibirsk), with a population of about 1,425,000 (2012) and land area of 491 sq. km. Ekaterinburg is situated in the middle of Eurasia on the border of Europe and Asia.
The geographical location of Ekaterinburg is extremely favorable, and this influenced the development of the city. The city is located in the Urals where the mountains are low, which favored the construction of the main transport routes from Central Russia to Siberia through Ekaterinburg (Bolshoi Siberian road, Trans-Siberian railway). As a result, Ekaterinburg developed as a strategically important center of Russia connecting the European and Asian parts of the country, a role it retains today.
It is the third-largest transportation junction (after Moscow and Saint Petersburg), where six federal highways and seven railways meet. The city is an important transport and logistics junction on the Trans-Siberian railway, as well as a large industrial center.
Data on the prevalence of various diseases and the presence of risk factors were obtained at routine medical examinations in the sixteenth municipal Children’s hospital [22]. In total, 441 preschool children attending nursery schools in the Zheleznodorozhny district of Ekaterinburg were examined. The comprehensive database contains information on about 100 risk factors (RFs) of various kinds (ecological, social, family, etc.) collected using parent questionnaires. RFs that are not widespread (rare factors) were excluded during the preliminary analysis, as were RFs that did not demonstrate any connection with the studied diseases under a univariate approach. Ultimately, the 11 most significant and widespread RFs were selected for the final analysis of their influence on children’s health (Table 1). Among them were outdoor air pollution, drinking water quality, gas stove usage, secondhand smoke, physical inactivity, low education level of the mother, bad habits of parents, etc. [22].
Let us analyze the prevalence of RFs (Table 1), starting with the environmental RFs. The drinking water types in our study comprise three gradations: special bottled water, filtered tap water, and tap water. The data in Table 1 show that 40% of families with preschool children use unfiltered tap water for cooking (the percentage of such families is higher for boys (43%) than for girls (36%)).
Tap water in Ekaterinburg is characterized by the presence of organochlorine compounds, low fluorine content, and iodine deficiency. Consumption of chlorinated drinking water has been reported to have a hepatotoxic effect, and it may have carcinogenic and mutagenic potential. Low fluorine content has been linked to high caries rates. Iodine deficiency is known in some cases to cause congenital anomalies and reduced cognitive abilities in children and adults [23]. Only 27% of families buy special bottled water for cooking.
The level of air pollution from vehicle emissions (the authors will use the abbreviation “AUTO” hereafter) is presented in our study through three gradations: low (denoted as -1), middle (denoted as 0), and high (denoted as +1). The data in Table 1 show that 25% of children attend nursery schools located in areas with high levels of air pollution and 46% attend nursery schools located in areas with middle levels of pollution. Thus, only 29% of the children analyzed in this study attend nursery schools located in areas with relatively clean air.
A gas stove in the apartment where the child lives is the next ecological RF. A gas stove (compared with an electric one) is a well-known RF of children’s health loss. The authors found that 68% of apartments in the Zheleznodorozhny district of Ekaterinburg are equipped with gas stoves. The prevalence of an unsatisfactory sanitary state of the apartment and the prevalence of cold and mold in the apartment are 46% and 23%, respectively.
Table 1: Risk factors for health loss of children and their prevalence in Ekaterinburg.
We now proceed to the social and domestic RFs. These include socioeconomic factors such as educational level, the family’s material well-being, and the family’s psychological climate. Data on their prevalence are also given in Table 1.
Most families (77%) are complete, which is not regarded as a RF. Table 1 shows that the RF “Mother’s smoking” is present in 25% of cases, while 75% of mothers are nonsmokers.
A low education level of the mother (secondary education only) is also a RF. Table 1 shows that this RF is present in 16% of families. In 47% of cases, mothers have postsecondary education and in 38% of cases, higher education. A high level of education is regarded in our study as the absence of the RF. However, it cannot be stated definitely that the absence of the mother’s higher education is a predictor of adverse child health indicators. Nevertheless, the mother’s education can indirectly affect the life and living conditions of the family (e.g., more educated people are probably less likely to smoke in the presence of a child).
Our results demonstrate that, along with the family’s material well-being, sociobehavioral factors within the household play a notable role in children’s health. Among these, the family’s psychological climate and the character of leisure time and rest are substantial. An overwhelming majority of families (88%) have a normal (not tense) psychological climate. Slightly less than half of the children (47%) have a sufficient level of physical activity (sports or physical activity with the family), which should positively affect their health. Table 1 shows that 29% of families with preschool children rate their material well-being as insufficient, 58% as sufficient, and 13% as good.
Notably, the predictors that can be regarded as modifiable and manageable proved to be both quite common and statistically significant RFs. As noted above, RFs can be modifiable (ones that can be changed) or non-modifiable (ones that cannot). Modifiable RFs include lifestyle, eating habits, and bad habits of parents and children, that is, everything that can potentially be changed or eliminated. Our results show that the following bring substantial benefits for children’s health: parents refraining from smoking when children are present; sufficient physical activity of the child; a satisfactory state of the apartment; and a normal psychological climate in the family. All these steps are quite feasible for parents. RFs such as the type of stove in the apartment or the type of drinking water can be classified as partly modifiable, requiring some material spending. Nevertheless, these RFs can be eliminated if the parents wish, in contrast to such a persistent environmental factor as atmospheric air pollution in a large industrial city.
Calculation of the index of air pollution from vehicle emissions (AUTO):
As the air pollution index for each child, the authors take the level of atmospheric air pollution from vehicle emissions in the territories where the child spends a considerable amount of time. These are the playground of the nursery school and the area where the child lives. Because it is not known exactly how much time a child spends near home, or whether the child spends time there at all, the authors did not estimate the influence of vehicle emissions around the place of residence. The level of air pollution at the nursery school playground was assessed according to the following scheme. First, the authors surveyed traffic on the roads and at the crossroads near the nursery schools. This yielded data on the number of vehicles of different types passing along particular roads per unit time and on the number of vehicles at the crossroads. Then, based on this information, the authors estimated the gross emissions per unit time from all vehicles on major roads and crossroads located less than 200 meters from the nursery school. Roads are treated as linear sources of emissions and crossroads as point sources. The level of emission at the investigated point is then determined as the sum of emissions from all nearby sources, taking atmospheric scattering into account. The authors describe scattering (the reduction of the emission level with increasing distance from the emission source) by an exponential law. To refine the calculations, the authors use an expert coefficient that accounts for buildings and other features of the area adjoining the investigated point, that is, objects that shield it from road or crossroad emissions (buildings, shrubs, fences). As a result, the air pollution index (expressed as a continuous variable) is associated with each nursery school and, consequently, with each child.
More details about the calculation of gross emissions and methods of assessment of air pollution in a particular area have been published previously [24].
For ease of use, the air pollution index should be converted into a qualitative form with a small number of gradations. To do this, the authors introduced the “load from auto transport” index (denoted as IAUTO). This index divides the studied territories, and consequently the studied children’s population, into three groups (with low, middle, and high levels of auto transport load, or “IAUTO”). Nursery schools are accordingly divided into three groups as well: with low, middle, and high IAUTO.
For example, nursery school №32 with low IAUTO is located at a distance of 210m from the only auto road with low traffic (380 vehicles per hour on average).
As to nursery school №369, which fell into the third group with “high IAUTO”, it has three sources of emissions in nearest neighborhood. So, nursery school №369 is located 10 m (only!) from auto road with traffic of 1020 vehicles per hour; the same nursery school is located 50 m from the second road with traffic of 870 vehicles per hour; and, finally, it is located 140 m from the third road with traffic of 560 vehicles per hour.
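The index calculation described above can be illustrated with a minimal sketch. It is not the authors’ implementation: the decay length, the use of traffic intensity as a stand-in for gross emissions, the treatment of every road as a single point-like source, and the name `pollution_index` are all assumptions made for illustration. Only the 200 m cut-off, the exponential law, the expert shielding coefficient, and the road data for nursery schools №32 and №369 come from the text.

```python
import math

# Hypothetical parameters: the article gives neither the decay length nor the
# values of the expert shielding coefficient, so these are placeholders.
DECAY_LENGTH_M = 50.0   # assumed e-folding distance of the exponential law
MAX_DISTANCE_M = 200.0  # sources at or beyond this distance are ignored (from the text)


def pollution_index(sources, decay_length=DECAY_LENGTH_M, max_distance=MAX_DISTANCE_M):
    """Sum exponentially attenuated contributions of nearby emission sources.

    Each source is a tuple (distance_m, vehicles_per_hour, shielding), where
    shielding in (0, 1] plays the role of the expert coefficient (1.0 = no
    protection). Traffic intensity is used as a proxy for gross emissions.
    """
    total = 0.0
    for distance, traffic, shielding in sources:
        if distance >= max_distance:
            continue  # outside the 200 m neighbourhood of the nursery school
        total += shielding * traffic * math.exp(-distance / decay_length)
    return total


# Road data quoted in the article; shielding is set to 1.0 (assumption).
school_32 = [(210, 380, 1.0)]
school_369 = [(10, 1020, 1.0), (50, 870, 1.0), (140, 560, 1.0)]

index_32 = pollution_index(school_32)
index_369 = pollution_index(school_369)
print(index_32, round(index_369, 1))
```

Under these assumptions the only road near nursery school №32 lies beyond the 200 m cut-off and contributes nothing, while the road 10 m from nursery school №369 dominates its index, which agrees qualitatively with the low and high IAUTO groups assigned in the text.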
The analyzed diseases:
The upper respiratory tract diseases, diseases of the circulatory system (International Classification of Diseases (ICD)-10, chapter IX), diseases of the musculoskeletal system and connective tissue (ICD-10, chapter XIII), and behavioral disorders (ICD-10, chapter V) are selected in this study as analyzed diseases. The upper respiratory tract diseases (relating to “Diseases of the Respiratory System”, according to ICD-10, 10th Revision, chapter X) include ethmoiditis, rhinitis, hypertrophy of the tonsils, tonsillitis, and adenoiditis. Behavioral disorders among the preschool children (relating to “Mental and behavioral disorders” according to ICD-10, 10th Revision, chapter V) most often include sleep disorders, enuresis, dyslalia, and common speech disturbance. The authors denote a class of diseases using the letter D and the number of the ICD-10 chapter.
Statistical methods:
The Classification Tree method was used as the main research method (Breiman et al., 1984). The distinctive feature of this method is its ability to construct subject-interpretable (or subject-oriented) decision rules, which make it possible to select the most significant complex of RFs for health loss in children [15].
An algorithm for constructing and analyzing the classification tree:
Among a variety of specific implementations of classification trees, the authors propose the following algorithm for constructing the classification tree. The implementation of this algorithm will be shown below on the specific examples.
1. Ranking risk factors by largest univariate effect ΔW: In investigating the impact of RFs on population health, the univariate effect is traditionally understood as the difference
ΔW = W(RF is present) – W(RF is absent), (1)
where W(RF is present) is the prevalence of disease in the population group exposed to the RF and W(RF is absent) is the prevalence of disease in the population group not exposed to that RF.
First splitting: The building of a decision tree starts at the root node, which includes all children in the study sample, and which is characterized by the number of children in the sample n1 and the prevalence W1 of the study disease. The first splitting is made by the RF (from the list in Table 2), which gives the maximum univariate effect (1). As a result, the root node is divided into two nodes № 2 and № 3 (binary split algorithm), which are characterized by the number of children n2 and n3, and the prevalence of the study disease W2 and W3, respectively. According to the above, W2 and W3 differ maximally among all the possible variants of branching.
Repeated splittings: At this stage, an attempt is made to divide each of the nodes № 2 and № 3 (obtained at the first stage) into two new nodes by other RFs using the same principle (the branching procedure). If a node cannot be divided, it is called a terminal (final) node.
Stopping rules: There are mainly three ways to stop the branching algorithm:
• reaching a state where no further splitting of a node leads to a significant change in W;
• reaching a set number of branching levels;
• reaching a minimum number of cases or objects in a node.
In this study, the authors use all of these ways of stopping the branching in one form or another. Stopping by reaching a state where no further splitting of a node leads to a significant change in W is shown in Example 1 in the Results section. The reduction in the number of objects in the nodes after each branching makes further branching impossible in some nodes (the number of objects in these nodes becomes too small). An example of stopping by this rule is shown below in Example 2. The number of risk factors in the model is directly or indirectly determined by the number of branching levels. The number of factors in a subject-oriented statistical model that allows a substantive interpretation should not be large (3-4 factors are enough).
After the branching procedure ends, all remaining nodes are leaves (terminal nodes). It is not necessary that each terminal node contain only objects belonging to the same class (only sick or only healthy). It is sufficient that objects of one class dominate in the presence of objects of the other class.
Forming groups with low and high values of pathology prevalence (W): To solve the problem of the impact of risk factors on population health using the Classification Tree method, it is necessary to form population groups with low and high values of pathology prevalence. To do this, from all terminal nodes we select those in which the prevalence is maximal and minimal (the selection is made by expert judgment). The groups with low and high values of W are formed from these nodes.
Formulating subject-oriented decision rules: Decision rules are needed to describe the groups with high and low values of pathology prevalence. These rules are formulated by analyzing the tree branches leading from the root node to each terminal node included in these groups. The authors use a low-quality decision rule, which does not necessarily clearly distinguish the sick from the healthy, but allows us to divide the population into groups with higher and lower disease prevalence. Examples illustrating this stage of the algorithm are given below.
Analysis of those terminal nodes not included in the decision rule: Terminal nodes that are not included in the decision rule provide important information about the possibility of compensating the negative effects of some risk factors by the lack of action of others.
Recommendations for health protection: The resulting decision rule allows us to formulate recommendations for population health protection that are understandable not only to scientists and epidemiologists but also to public health practitioners.
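The ranking, splitting, and stopping steps of the algorithm can be sketched in code as follows. This is a simplified illustration rather than the authors’ software: it assumes binary risk factors only (the article also splits the three-level IAUTO factor by different gradations), and the function names, the thresholds `min_node`, `min_effect`, and `max_depth`, and the synthetic data are all hypothetical.

```python
def prevalence(rows, disease):
    """Share of rows with the disease, in percent."""
    return 100.0 * sum(r[disease] for r in rows) / len(rows)


def univariate_effect(rows, factor, disease):
    """Delta W from Eq. (1): W(RF present) - W(RF absent); None if a group is empty."""
    exposed = [r for r in rows if r[factor]]
    unexposed = [r for r in rows if not r[factor]]
    if not exposed or not unexposed:
        return None
    return prevalence(exposed, disease) - prevalence(unexposed, disease)


def build_tree(rows, factors, disease, min_node=50, min_effect=5.0, max_depth=3, depth=0):
    """Greedy binary tree: split each node on the factor with the largest |Delta W|."""
    node = {"n": len(rows), "W": prevalence(rows, disease)}
    if depth >= max_depth:
        return node  # stopping rule: set number of branching levels reached
    best = None
    for f in factors:
        exposed = [r for r in rows if r[f]]
        unexposed = [r for r in rows if not r[f]]
        if min(len(exposed), len(unexposed)) < min_node:
            continue  # stopping rule: too few objects in a child node
        effect = univariate_effect(rows, f, disease)
        if effect is None:
            continue
        if best is None or abs(effect) > abs(best[1]):
            best = (f, effect, exposed, unexposed)
    if best is None or abs(best[1]) < min_effect:
        return node  # stopping rule: no split changes W appreciably
    f, effect, exposed, unexposed = best
    rest = [g for g in factors if g != f]
    node.update(
        factor=f,
        delta_W=effect,
        present=build_tree(exposed, rest, disease, min_node, min_effect, max_depth, depth + 1),
        absent=build_tree(unexposed, rest, disease, min_node, min_effect, max_depth, depth + 1),
    )
    return node


def make_rows(n, sick, **flags):
    """Synthetic group of n children, the first `sick` of whom have the disease."""
    return [dict(flags, sick=(i < sick)) for i in range(n)]


# Synthetic data for illustration only; these are not the study data.
rows = (make_rows(100, 40, inactive=True, gas_stove=True)
        + make_rows(100, 30, inactive=True, gas_stove=False)
        + make_rows(100, 20, inactive=False, gas_stove=True)
        + make_rows(100, 10, inactive=False, gas_stove=False))

tree = build_tree(rows, ["inactive", "gas_stove"], "sick")
print(tree["factor"], tree["delta_W"])
```

In this sketch a fixed threshold on |ΔW| stands in for the “significant change in W” criterion, and the expert selection of the high- and low-prevalence terminal nodes is left to the reader, as in the algorithm itself.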

Results

Let us consider the application of the Classification Tree method to some specific problems.
Risk factors for the respiratory system diseases (Example 1):
Consider the problem of finding the complex of RFs that has the greatest influence on respiratory diseases among Ekaterinburg preschoolers. The problem is solved by constructing and analyzing a classification tree using the algorithm described above in Section 3 (Figure 1).
Table 2: The prevalence of diseases.
Ranking risk factors by largest univariate effect ΔW: There are 441 children in the studied sample of Ekaterinburg preschool children (see “Materials and Methods”). Initially, all children in the sample are assigned to the root node (node № 1 in Figure 1). The value N1 = 441 above the root node is the sample size (the number of children); W1 = 24.0% is the prevalence of respiratory diseases in the sample of 441 children. We rank the RFs by the largest univariate effect ΔW (1). The results are shown in Table 2.
First splitting: For the first splitting, we should choose the RF with the largest value of ΔW, as described in section 3 of the algorithm. In this case, the factor is “The sanitary state of an apartment.” An attempt to split our sample by this RF shows that the children are divided very unevenly: 404 children in the group with a satisfactory sanitary state of the apartment and only 37 children in the group with an unsatisfactory one. The group of 37 children is too small to be used for further branching, and therefore the factor “Sanitary state of an apartment” is not suitable for the first branching.
In such a case, we choose the next RF from Table 2 for the first branching. This factor is “Child’s physical activity”. It has two levels: sufficient activity and insufficient activity. Insufficient activity is considered a RF that increases the probability of the disease. Consequently, in terms of the Classification Tree method, dividing the sample into children with insufficient activity and children with sufficient activity yields two groups with maximally different values of the prevalence of respiratory system diseases (W2 = 29.5% for n2 = 235 children and W3 = 18.4% for n3 = 206 children; n2 + n3 = n1 = 441).
Dividing the sample by only one RF gives the relative risk RR = W2 / W3 = 29.5 / 18.4 = 1.60 (confidence interval 1.1 – 2.2). Continuing the branching should lead to an increase in the relative risk.
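The relative risk for the first split can be checked with a short sketch. The article does not state how its confidence interval was computed or, for this interval, at what level; the log-scale (Katz) interval at the 95% level used here is an assumption, as are the case counts, which are back-calculated from the reported prevalences and rounded. The function name `relative_risk` is hypothetical.

```python
import math


def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Relative risk with a log-scale (Katz) confidence interval."""
    p1 = cases_exposed / n_exposed
    p0 = cases_unexposed / n_unexposed
    rr = p1 / p0
    # Standard error of log(RR) for two independent binomial samples.
    se_log = math.sqrt(1 / cases_exposed - 1 / n_exposed
                       + 1 / cases_unexposed - 1 / n_unexposed)
    low = math.exp(math.log(rr) - z * se_log)
    high = math.exp(math.log(rr) + z * se_log)
    return rr, low, high


# Assumed case counts: 29.5% of 235 is about 69, and 18.4% of 206 is about 38.
rr, low, high = relative_risk(69, 235, 38, 206)
print(f"RR = {rr:.2f}, CI = ({low:.2f}, {high:.2f})")
```

With these rounded counts the sketch gives a relative risk close to 1.6 and an interval of roughly 1.1 to 2.3, in line with the reported values; small differences are expected because the exact counts and the interval method are not given in the text.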
Repeated splittings: Each of the two nodes obtained at the first branching (node №2 and node №3) can be divided by the same principle (the maximum difference of W between the two resulting groups). Node №2 is divided by the RF “Air pollution from vehicles emission”. Children attending nursery schools located in the low-quality air zone (IAUTO = +1) fall into terminal node №5. This node corresponds to a sample of children with a very high prevalence of respiratory system diseases, W5 = 36.0%. The other node (node №4; IAUTO = –1 and 0) is not terminal, and at the next stage of tree building it is split by the factor “Type of stove in an apartment.” This split yields two terminal nodes (№8 and №9), so the splitting of this branch of the tree ends. One of them (node №9) corresponds to a group of children with a high prevalence of pathology D10 (W9 = 30.0%), whereas the other (node №8) corresponds to a group of children with a prevalence W8 = 19.2%, close to the original W1 = 24.0%.
Node №3 (like node №2) is split by the RF “Air pollution from vehicles emission”, but by different gradations of this RF. Children attending nursery schools located in the high-quality air zone (IAUTO = -1) fall into terminal node №6. This node corresponds to a sample of children with a very low prevalence of pathology D10 (W6 = 10.7%). Node №7 is not terminal, and at the next stage of tree building it is divided by the factor “Type of stove in an apartment.” This split yields two terminal nodes (№10 and №11). One of them (node №10) corresponds to a group of children with an intermediate prevalence of pathology D10 (W10 = 16.0%), whereas the other (node №11) corresponds to a group of children with a prevalence W11 = 24.0%, equal to the original W1 = 24.0%.
Figure 1: The classification tree for respiratory system diseases.
Stopping of branching: Nodes №5, 6, 8, and 10 are considered terminal because the number of children in these nodes is rather small (50-60 children). Further branching of these nodes would lead to nodes with a very small number of objects, so any conclusions would be statistically insignificant. Nodes №9 and №11 are unpromising for further branching, as none of the remaining factors can divide these nodes with a significant increase (or decrease) in W compared with W9 and W11.
Expert forming of groups with low and high values of pathology prevalence (W): Having assessed the disease prevalence in each of the terminal nodes, we, as experts, form groups with “low” and “high” values of the prevalence (W). In our example, the group with the “low” value of W is formed from nodes №6 and №10, and the group with the “high” value of W is formed from nodes №5 and №9. Terminal nodes whose W values are close to the sample average (W1 = 24.0%) are not included in these groups.
Formulating subject-oriented decision rules: Analysis of the path that leads from the root node to each of the terminal nodes forming a group with low or high W allows us to formulate subject-oriented decision rules. Analysis of the paths to nodes №5 and №9 gives the following decision rule. A child falls into the group with a high prevalence of respiratory diseases in two cases. The first case is the RF “Physical inactivity” (node №2 is obtained from node №1) in combination with attending a nursery school in the low-quality air zone (node №5 is obtained from node №2). The second case is the same RF “Physical inactivity” in combination with a gas stove in the apartment (node №9 is obtained from node №4). Together, the two terminal nodes №5 and №9 include 178 children; the average prevalence of respiratory system diseases among them is W(max) = 32.0%.
Having analyzed the paths to nodes №6 and №10, we obtain the decision rule describing the group with a low prevalence of respiratory diseases. This is sufficient physical activity of the child in combination with attending a nursery school in the high-quality air zone, or in combination with an electric stove in the apartment (in which case the nursery school may be located in a middle- or low-quality air zone).
Together, the two terminal nodes №6 and №10 include 106 children; the average prevalence of respiratory system diseases among them is W(min) = 13.2%.
Thus, the relative risk (RR) associated with the above complex of RFs is:
RR = W(max) / W(min) = 32.0 / 13.2 = 2.4;
95% confidence interval for RR = (1.7 – 3.0)
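The arithmetic behind these figures can be written out as a worked equation. The pooling formula is an assumption: the article’s own expression for the average prevalence is not reproduced in this text, and a size-weighted mean over the terminal nodes of a group is the natural reading. The node sizes n_i appear in Figure 1 and are not repeated here; the text gives only their sums (178 children for nodes №5 and №9, 106 children for nodes №6 and №10).

```latex
W_{\mathrm{group}} = \frac{\sum_{i} n_i W_i}{\sum_{i} n_i},
\qquad
RR = \frac{W(\max)}{W(\min)} = \frac{32.0}{13.2} \approx 2.4
```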
Analysis of the terminal nodes that are not included in the decision rule:
Four of the six terminal nodes obtained for the classification tree (Figure 1) have been included in the decision rule. The two remaining nodes, №8 and №11, also provide important information on the influence of the complex of RFs on children’s health. The path to node №8 shows that the negative health effect of such a strong RF as physical inactivity can be compensated by the absence of the RFs “High IAUTO” and “Gas stove in the apartment”. That is, the presence of one RF is compensated by the absence of two others. On the other hand, the path to node №11 shows that the negative impact of the two RFs “High and middle IAUTO” and “Gas stove in the apartment” can be compensated by sufficient physical activity of the child. The latter conclusion is particularly important because “Air pollution from vehicles emission” and “Gas stove in apartment” are factors that the child’s parents are not always able to change, whereas providing the child with adequate physical activity is quite feasible for any family at any level of material well-being.
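The decision rules read off the tree can be expressed as a small lookup, shown here as an illustrative sketch. The node numbers, split factors, and prevalences are taken from the description of Figure 1 in the text; the function and variable names are hypothetical, and the function reproduces only the paths described above, not a fitted model.

```python
# Terminal-node prevalences (percent) for respiratory diseases, as quoted in the text.
NODE_PREVALENCE = {5: 36.0, 6: 10.7, 8: 19.2, 9: 30.0, 10: 16.0, 11: 24.0}
HIGH_GROUP = {5, 9}   # nodes forming the high-prevalence group
LOW_GROUP = {6, 10}   # nodes forming the low-prevalence group


def terminal_node(physically_active, iauto, gas_stove):
    """Return the terminal node of Figure 1 reached by a child.

    iauto is -1 (low), 0 (middle) or +1 (high) vehicle-emission load.
    """
    if not physically_active:          # node 2: physical inactivity
        if iauto == 1:
            return 5                   # low-quality air zone
        return 9 if gas_stove else 8   # node 4: IAUTO = -1 or 0
    if iauto == -1:                    # node 3: sufficient activity
        return 6                       # high-quality air zone
    return 11 if gas_stove else 10     # node 7: IAUTO = 0 or +1


def risk_group(physically_active, iauto, gas_stove):
    """Map a child to the high, low, or intermediate prevalence group."""
    node = terminal_node(physically_active, iauto, gas_stove)
    if node in HIGH_GROUP:
        return "high"
    if node in LOW_GROUP:
        return "low"
    return "intermediate"


# An inactive child in a clean-air zone with an electric stove reaches node 8.
print(terminal_node(False, -1, False), risk_group(False, -1, False))
```

Written this way, the compensation effect discussed above is easy to see: nodes №8 and №11 are the two paths on which the present and absent RFs offset each other, so the child ends up in neither the high nor the low group.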
This example shows that the results obtained by the classification tree method are suitable for graphical representation and are easy to interpret.
This is an important property of the classification tree method, because the results must be passed on to environmental and health authorities and to medical practitioners, who make decisions on treatment and prevention.
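The decision rule for Example 1, together with the two terminal nodes left outside it, can be written as a small Python function. This is only an illustrative sketch: the function name, the argument names, and the air-zone labels are hypothetical, and it assumes that a low-quality air zone corresponds to “High IAUTO”.

```python
def respiratory_group(inactive: bool, air_zone: str, gas_stove: bool) -> str:
    """Map a child's risk factors to a prevalence group (Figure 1).

    air_zone is the air quality around the nursery school:
    "high", "middle" or "low" (hypothetical labels).
    """
    if air_zone not in {"high", "middle", "low"}:
        raise ValueError(f"unknown air zone: {air_zone!r}")
    if inactive:
        # Nodes 5 and 9: inactivity plus polluted outdoor air or a gas stove.
        if air_zone == "low" or gas_stove:
            return "high prevalence"
        # Node 8: inactivity offset by the absence of the other two RFs.
        return "not in decision rule"
    # Nodes 6 and 10: active child with clean outdoor air or an electric stove.
    if air_zone == "high" or not gas_stove:
        return "low prevalence"
    # Node 11: polluted air and a gas stove, offset by physical activity.
    return "not in decision rule"


print(respiratory_group(inactive=True, air_zone="middle", gas_stove=True))
```

Writing the rule this way makes the compensation effects explicit: the two branches that return "not in decision rule" are exactly the cases in which the presence of some RFs is offset by the absence of others.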
Risk factors for the mental and behavioral disorders (Example 2):
Consider a second example of applying classification trees to search for the complex of RFs that most strongly influences behavioral disorders among Ekaterinburg preschoolers. According to our data, the prevalence of behavioral disorders is significantly higher in boys than in girls (Table 2). For this reason, the analysis was conducted for boys only.
The literature clearly indicates that the prevalence of behavioral disorders increases when children are exposed to environmental pollution [5,25-27]. Our results confirm this observation. The main factors increasing the prevalence of behavioral disorders are air pollution from vehicle emissions, air pollution in the apartment from the combustion products of natural gas (gas stove), and contamination of the drinking water that children consume at home.
The results of building the classification tree are shown in Figure 2. Of the five terminal nodes (№3, 5, 6, 8, 9), three (№3, 5, 6) were used to form the decision rule. Boys consuming clean drinking water (node №3, 66 children) and boys consuming filtered tap water (whose quality is slightly worse) but attending a nursery school in a high-quality air zone (node №6, 25 children) fall into the low-prevalence group. For this combined low-prevalence class of 91 boys (66 + 25 = 91), the prevalence of behavioral disorders is W(min) = 8.8%.
The prevalence W(min) is much lower than the average prevalence among all children, W1 = 23.6%.
Figure 2: The classification tree for mental and behavioral disorders among boys.
Boys consuming middle- and low-quality drinking water and attending a nursery school in zones with middle and high levels of air pollution (IAUTO = 0, +1) fall into the high-prevalence group. For this class of 124 boys, the prevalence of behavioral disorders, W(max) = W5 = 34.7%, is higher than the average prevalence among all children, W1 = 23.6%.
Thus, the relative risk (RR) associated with the complex of RFs (contamination of drinking water and air pollution) is:
RR = W(max) / W(min) = 34.7 / 8.8 = 3.9;
95% confidence interval for RR = (2.7 – 5.0).
Note that in this case it is possible to specify the RFs whose removal dramatically reduces disease prevalence (contamination of drinking water and air pollution). At the same time, the list of RFs that can dramatically increase disease prevalence is less certain in this situation. Nevertheless, the removal of only two RFs (contamination of drinking water and air pollution) reduces disease prevalence in children almost fourfold.
Let us analyze the terminal nodes that are not included in the decision rule. These nodes result from a combination of the presence of some RFs and the absence of others. Thus, node №7 shows that consuming tap water (water of poor quality) may well be offset by clean air: W7 is even lower than W1. Node №9 shows that the action of two significant RFs (poor-quality drinking water and a gas stove) does not lead to a strong increase in disease prevalence (W9 is only slightly higher than W1) if children attend a nursery school in a zone with clean air.

Discussion

Research in environmental epidemiology frequently investigates the impact of a single RF on health. Adjustments for confounding factors reduce bias, but the results still remain “single-factor” [28,29]. Meanwhile, according to the modern paradigm of the web of causation, health is influenced by a number of factors simultaneously [4,7-9]. In this article, we propose using one of the methods of multivariate data analysis, the classification tree method.
The classification tree method proved to be an effective tool for finding the RF complex (set of RFs) that has the greatest impact on population health under simultaneous action. For example (Figure 1), for respiratory system diseases in children, the most dangerous combination of RFs turns out to be air pollution (indoor and outdoor) combined with physical inactivity of the child. Air pollution is a non-modifiable RF (outdoor) or a factor that is difficult to modify (indoor), whereas physical activity can be regulated by the child’s parents. This conclusion follows from the analysis of the terminal nodes of the classification tree (Figure 1). The decision rules of the classification tree method (rules describing groups of children with low and high prevalence) are formed on the basis of this analysis. In addition, analysis of the nodes that are not included in the decision rule supports further conclusions about how the impact of some RFs changes in the presence or absence of other RFs. For example, for “Behavioral disorders”, node №9 (Figure 2) shows that the presence of the two RFs “tap water” and “gas stove” is almost fully compensated by the absence of the RF “Outdoor air pollution” (W9 is only slightly higher than W1).
Table 3: List of risk factors ranked by ΔW for respiratory system diseases.
Note that the results obtained by the classification tree method are suitable for graphical representation; therefore, the conclusions drawn from them are clear and easy to interpret for scientists in the field of medical and environmental monitoring, as well as for practicing health care professionals.
The findings of the current study about the impact of RFs on population health are consistent with those of earlier studies [29-32]. A number of recent epidemiologic studies have found an association between air pollution and respiratory disease, as well as some behavioral disorders [8,26,27].
Many large European research projects show that air pollution is an important environmental RF for the health of children and adults (APHEA; the Aphekom project; CESAR; the ENRIECO project, Environmental Health Risks in European Birth Cohorts; ESCAPE; and others). Let us discuss some of them in detail.
The Aphekom project (“Improving Knowledge and Communication for Decision Making on Air Pollution and Health in Europe”) ultimately hopes to contribute to reducing both air pollution and its impact on health and well-being across Europe.
The ESCAPE project (European Study of Cohorts for Air Pollution Effects) investigates long-term effects of air pollution exposure on human health in Europe. The background is that current estimates of the European health impact of exposure to air pollution, especially fine particles, are large. In particular, the ESCAPE study investigates relationships between air pollution and (a) adverse prenatal health outcomes and the development of diseases such as asthma in children; (b) respiratory disease endpoints in adults; (c) cardiovascular disease endpoints in adults.
The Girona project investigates the association between long-term exposure to traffic-related air pollution and subclinical atherosclerosis in Spain, and the modification of this association by diet and other factors.
The CESAR project (Central European Study on Air Pollution and Respiratory Health), in addition to studying the influence of air pollution on the respiratory health of children, also investigates the role of nutrition in children’s respiratory health.
Analysis of the above projects shows that the set of factors included in a study may be relatively narrow (air pollution only) or may be expanded with some dietary factors. Finally, research may include a rather wide range of RFs, including both social and family RFs.
The wide range of RFs included is an advantage of our study. Apart from the effect of atmospheric pollution (outdoor air), we also examine the influence of such RFs as the presence of a gas stove in the apartment (a marker of indoor air contamination), as well as factors such as maternal education, the psychological climate in the family, and so on.
In this regard, we note that our results are consistent with those of earlier studies that found an impact of social and family RFs on health [5,10].
In contrast to the studies listed above, the classification tree method allows one to analyze the combined influence of RFs of different natures on human health.
As already noted, analyzing the joint effects of RFs makes it possible to investigate how the negative impact of some RFs can be reduced by modifying other RFs. For this to be possible, some of the RFs must be modifiable (that is, they can be changed). In most cases, the modifiable factors are family, behavioral, and other factors not related to environmental pollution. It follows that studying the influence of such factors (not only environmental pollution) on public health allows us to describe a web of causation for noncommunicable diseases.

Conclusions

In summary, such factors as atmospheric pollution, a gas stove in the apartment, parental smoking, low-quality drinking water, a child’s insufficient physical activity, and a mother’s low educational attainment increase the prevalence of diseases even under a univariate approach. We have identified sets of RFs that have the maximum negative influence on the prevalence of respiratory diseases and behavioral disorders in children. A combination of physical inactivity with either attending a nursery school in a poor-quality air zone or a gas stove in the apartment increases the prevalence of respiratory diseases by a factor of 2.4. Further, a combination of poor-quality air (indoor and outdoor) with consumption of poor-quality drinking water increases the prevalence of mental and behavioral disorders among boys of preschool age in Ekaterinburg by a factor of 3.9.

References