Geographical Pathology of Acute Lymphoblastic Leukemia in Iran with Evaluation of Incidence Trends of This Disease Using Joinpoint Regression Analysis

1 Department of Epidemiology, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran
2 Workplace Health Promotion Research Center, Department of Epidemiology, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran
3 Center for Remote Sensing and GIS Research, Faculty of Earth Sciences, Shahid Beheshti University, Tehran, Iran
4 Environmental and Occupational Hazards Control Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
5 Department of Environmental Health Engineering, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran


Introduction
Blood cancers represent the fifth most prevalent type of cancer in the world, accounting for about 8% of all cancers. 1 Leukemia is the fourth leading cause of death among Iranian children aged 5 to 14 years. 2 The most prevalent childhood cancer, both worldwide and in Iran, is acute lymphoblastic leukemia (ALL). 3,4 This disease accounts for about 30% of all childhood cancers. The incidence of ALL peaks between 2 and 5 years of age, and the disease is slightly more common in males. 5 The high incidence and prevalence of leukemia are associated with significant mortality and incur high diagnostic and therapeutic costs in Iran. 6 Leukemia is a multifactorial disease, but despite the remarkable progress of the medical sciences, its etiology remains unknown. 7,8 Environmental risk factors in each region can provide the ground for ALL occurrence and clustering. 9-12 Researchers attribute variations in the incidence of ALL across communities to a number of infectious, environmental, geographic, and genetic susceptibility risk factors. 5,8-11 Researchers suspect that infectious pathogens, especially viruses, are among the predisposing, enabling, and reinforcing factors that contribute to the occurrence of leukemia. This idea is known as the infectious hypothesis, and identifying a seasonal trend in the diagnosis of leukemia would lend it further support. 12-15 In addition, infectious pathogens can easily be transmitted from patients to healthy people, and where such transmission occurs, the possibility of spatial clustering should be considered. 16,17 The role of infectious agents in the occurrence of leukemia is still unclear, but the detection of seasonal patterns 12,14 and spatial clusters 18,19 can serve as supporting evidence for this hypothesis.
Clustering and spatio-temporal analysis of leukemia can also help identify the environmental factors likely to be involved in the etiology of the disease. 10,13 So far, few studies have investigated the temporal trend of childhood cancer incidence, even though spatio-temporal analyses are important tools in the epidemiology, etiology, and surveillance of childhood cancer and can be highly effective in generating new hypotheses. 20,21 Without remote sensing and medical geographic information systems (GIS), future studies would lack the precision needed to predict the trends, distributions, causes, and methods of controlling and preventing diseases. 22 The etiology of ALL is still unknown and, to the best of the authors' knowledge, no epidemiologic and geographical pathology study of ALL has been conducted in Iran so far. This study was therefore designed to determine temporal trends in the incidence of ALL, assess spatial autocorrelation, and identify high-risk and low-risk clusters, in order to provide deeper and more accurate insights into the hypothesis that infectious pathogens and environmental risk factors affect the disease cycle.

Materials and Methods
Study Design
This ecological study used an exploratory mixed design. As exploratory research, it investigates the spatial pattern and temporal trends of disease occurrence. The mixed design means that spatial patterns (multiple-group study) and temporal trends are investigated at the same time. 23 In the present study, the incidence of ALL was compared across time periods and places in Iran.

Target and Study Populations
The target population consisted of all children aged 0-14 years diagnosed with ALL in Iran. The study population comprised children with ALL who were reported in the National Cancer Registry Program from 2006 to 2014 and met the inclusion criteria. The inclusion criteria were (1) availability of the patient's location information, (2) a diagnosis of acute lymphoblastic leukemia, and (3) disease onset between birth and 14 years of age.
Data Sources
Three types of data sources were consulted in this study. Data on ALL patients were derived from the list reported by the National Cancer Registry of Iran for 2006-2014. Data on healthy children aged 0-14 years in the Iranian provinces were obtained from the Statistical Center of Iran. Geographic coordinates of patients were obtained using Google Maps and a georeferenced layer of Iran's provinces.
Description of the Study Area
Iran is a country in southwest Asia, in the Middle East, with an area of 1 648 195 km². According to the latest World Bank information (2019), Iran's population is estimated at 82 913 906 people. Iran lies between 25° 3′ and 39° 47′ N latitude and between 44° 5′ and 63° 18′ E longitude. The geographical location of Iran is shown in Figure 1.

Statistical Analysis
Statistical analysis covered three main parts: (1) descriptive analysis of the epidemiological and demographic indicators of patients; (2) analysis of the temporal trends of incidence; and (3) analysis of the spatial pattern of disease incidence and clustering. All statistical tests were two-sided, with a significance level (α) of 0.05.

Descriptive Analysis
In this study, central tendency and dispersion indices of ALL were obtained for the descriptive analysis of data. The data were then analyzed with appropriate statistical tests, such as the Mann-Whitney U test, in the SPSS software.
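To make the comparison of two age distributions concrete, the sketch below computes a Mann-Whitney U statistic with midranks for ties and a two-sided P value from the normal approximation. It is an illustration only and does not reproduce the SPSS procedure used in the study: the function name is hypothetical, the sample values are invented, and no continuity correction is applied.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, z, two-sided p) using the normal approximation."""
    n1, n2 = len(sample_a), len(sample_b)
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                              # size of this group of tied values
        midrank = (i + 1 + j) / 2.0            # average of ranks i+1 .. j
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u_a = rank_sum_a - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (u_a - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_a, z, p_two_sided


# Invented ages (years) for two small groups, for illustration only.
u, z, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

With samples as small as these, an exact test would normally be preferred; the normal approximation is shown because it is the form that applies to large samples such as a national registry.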
Temporal-Trend Analysis
To study the trends of ALL, the incidence rate was estimated separately for each year between 2006 and 2014, and trend variations were then evaluated by joinpoint regression analysis using version 4.7.0.0 of the Joinpoint Regression Program.
Incidence rate is calculated by dividing the number of new cases of disease during a specified time interval by the average number of individuals in the population at risk in this specific time interval.
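The definition above can be sketched as a short Python function. The function name, its arguments, and the numbers in the usage line are hypothetical and are not taken from the registry data; the rate is scaled to 100 000 population, the unit used for the incidence rates in this study.

```python
def incidence_rate_per_100k(new_cases, avg_population_at_risk):
    """Crude incidence rate per 100 000 for one time interval."""
    if avg_population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return new_cases / avg_population_at_risk * 100_000


# Hypothetical illustration: 30 new cases in one year among an
# average of 1 500 000 children at risk.
example_rate = incidence_rate_per_100k(30, 1_500_000)
```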
Joinpoint regression analysis is used to identify time trends and the points at which incidence rates change significantly. In the present study, the incidence rate (per 100 000 population) and frequency of ALL were the dependent variables, and year, month, and gender were the independent variables. Given that the dependent variable followed the Poisson distribution, we used the natural log-linear model to compute the annual percent change (APC) and average annual percent change (AAPC). To determine the number of time points at which the incidence rate changes markedly (joinpoints), the Grid Search method was used. The P value was calculated using the Monte Carlo method with 4499 iterations. The Bayesian information criterion (BIC) and sum of squared errors (SSE) were used to evaluate the precision of the models and select the best model. 24 Finally, in addition to the time of the joinpoints, the estimated APC and AAPC values and the changes in regression line slopes, along with their 95% confidence intervals (CIs), were obtained for the annual trend of incidence between 2006 and 2014 for each gender. To estimate the seasonal trend of ALL, the frequency of incident cases was calculated monthly, and the time of joinpoints, APC and AAPC values, and slope changes of the monthly trend, along with their 95% CIs, were then calculated using joinpoint regression analysis.
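The log-linear model and the grid search can be illustrated with a small Python sketch. It is not the Joinpoint Regression Program: it fits at most one joinpoint by unweighted least squares on ln(rate), ignores Poisson variance, and performs no Monte Carlo test or BIC comparison. All function names are hypothetical and the rates are synthetic. The APC of a segment with slope b is taken as (exp(b) - 1) × 100.

```python
import math


def _solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def _ols(design, y):
    """Least squares via the normal equations; returns (coefficients, SSE)."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(design, y))
    return beta, sse


def apc(slope):
    """Annual percent change implied by a log-linear slope."""
    return (math.exp(slope) - 1.0) * 100.0


def fit_one_joinpoint(years, rates):
    """Grid search over interior years for a single joinpoint in ln(rate)."""
    log_rates = [math.log(r) for r in rates]
    best = None
    for tau in years[1:-1]:
        # Continuous piecewise-linear model: the slope changes by b2 after tau.
        design = [[1.0, t - years[0], max(t - tau, 0.0)] for t in years]
        beta, sse = _ols(design, log_rates)
        if best is None or sse < best[0]:
            best = (sse, tau, beta)
    sse, tau, (b0, b1, b2) = best
    return {"joinpoint": tau, "apc_before": apc(b1),
            "apc_after": apc(b1 + b2), "sse": sse}


# Synthetic rates: 10% yearly growth up to 2009, then 2% yearly growth.
years = list(range(2006, 2015))
rates = [2.0]
for y in years[1:]:
    rates.append(rates[-1] * (1.10 if y <= 2009 else 1.02))
fit = fit_one_joinpoint(years, rates)
```

On this synthetic series the search recovers the change point and the two growth rates; on real registry counts the fitted values would also carry sampling error, which is what the confidence intervals reported by the Joinpoint software quantify.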

Spatial Pattern and Cluster Analysis
In the next step, to evaluate the spatial pattern of ALL, the cumulative incidence rate (CIR) in each province of Iran was calculated in the period from 2006 to 2014 and then CIR was mapped in Iran using the ArcGIS software version 10.8.
CIR is also known as "risk" in epidemiology. It is calculated by dividing the number of new cases of disease during a specified time interval by the total number of individuals in the population at risk at the outset of that interval.
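Written out under the definitions given in this section, and scaled per 100 000 children, the incidence rate (IR) and the CIR differ only in their denominators. The symbols are introduced here for illustration and are not taken from the original analysis: D is the number of new cases in the interval from t_0 to t_1, \bar{N} is the average population at risk over that interval, and N_{t_0} is the population at risk at its outset.

```latex
\[
\mathrm{IR} = \frac{D_{[t_0,\,t_1]}}{\bar{N}_{[t_0,\,t_1]}} \times 100\,000,
\qquad
\mathrm{CIR} = \frac{D_{[t_0,\,t_1]}}{N_{t_0}} \times 100\,000
\]
```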
To assess the spatial autocorrelation of the disease, the Global Moran's I index was used. This statistic ranges from -1 to +1, with values close to +1 indicating a clustered distribution, values close to -1 indicating dispersion, and zero denoting a random distribution of the phenomenon under study. In our study, the null hypothesis stated that ALL cases were randomly distributed in Iran. Finally, Anselin Local Moran's I index was employed to identify the location of high-risk and low-risk clusters of ALL in Iran. This index assigns each polygon to one of five categories: High-High, Low-Low, Low-High, High-Low, and not significant. High-High (HH) indicates that a zone and its neighboring areas have a high incidence rate; such zones can be regarded as high-risk clusters of disease incidence, or hotspots. Low-Low (LL) indicates that a zone and its surrounding areas have a low incidence of disease; such zones serve as low-risk clusters, or coldspots. 25
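The two indices can be sketched in a few lines of Python. This is a simplified illustration, not the ArcGIS implementation used in the study: the binary contiguity weights, the four-area layout, and the rates are hypothetical, and the significance step that yields the "not significant" category is omitted, so every area receives one of the four quadrant labels.

```python
def global_morans_i(values, weights):
    """Global Moran's I for a list of values and a square weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)            # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def local_moran_quadrants(values, weights):
    """Local Moran's I and quadrant label per area (no significance test)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n                 # variance of the values
    results = []
    for i in range(n):
        # Spatial lag: weighted sum of the neighbours' deviations.
        lag = sum(weights[i][j] * dev[j] for j in range(n) if j != i)
        own = "High" if dev[i] > 0 else "Low"
        nbr = "High" if lag > 0 else "Low"
        results.append((dev[i] * lag / m2, f"{own}-{nbr}"))
    return results


# Hypothetical example: four areas in a row, each adjacent to the next.
rates = [10.0, 9.0, 2.0, 1.0]
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
moran_i = global_morans_i(rates, w)
labels = [label for _, label in local_moran_quadrants(rates, w)]
```

In this toy layout the two high-rate areas sit next to each other and so do the two low-rate areas, which gives a positive global index and High-High and Low-Low labels; alternating high and low rates along the same row would instead give a negative index.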

Results
Descriptive
In this study, 3769 ALL patients were enrolled after applying the inclusion and exclusion criteria. The average annual incidence rate of ALL was 2.25 per 100 000 children under 15 years of age.
The mean ± standard deviation of patients' age was 5.90 ± 3.68 years, with a median and an interquartile range of 5 years each. Among all patients, 1587 (42.1%) were females and 2182 (57.9%) were males, so the incidence of ALL in males was 1.37 times that in females. In females, the mean ± standard deviation of age was 5.81 ± 3.61 years, and the median and interquartile range were both 5 years. In males, the mean ± standard deviation of age was 5.97 ± 3.73 years, the median was 5 years, and the interquartile range was 6 years. Because the age distribution was non-normal and right-skewed, the Mann-Whitney U test was used to compare age between male and female children. The test showed no significant difference in age distribution between male and female patients (P = 0.261).

Temporal Trends
The temporal trends in ALL incidence by year, month, sex, and age of patients, analyzed by joinpoint regression, are shown in Table 1 and Figure 2. Table 1 and Figure 2A display the crude incidence rate of ALL by year. The incidence increased from 2006 to 2014 at an average annual rate of 7.1%, but this increase was not statistically significant (95% CI = -10.5, 28.1). However, a joinpoint was observed in 2008, with the incidence rate changing significantly at this point: the annual incidence rate rose more steeply between 2006 and 2008 (APC = 17.74%, 95% CI = -54.1, 201.8) and more slowly between 2008 and 2014 (APC = 3.79%, 95% CI = -8.6, 17.9). Further information on this temporal trend is shown in Table 1. Table 1 and Figure 2B show the crude incidence rate of ALL by year and gender. Overall, the incidence rate was higher in boys than in girls, but the pace and average annual increase in incidence rates were higher in girls than in boys during the 2006-2014 period (AAPC = 9.2% vs. 5.7%). A joinpoint again appears in 2008, with the APC higher in girls during the 2006 to 2008 period and slightly higher in boys during the 2008 to 2014 period. Overall, the results suggest that the growth in ALL incidence was, on average, greater in girls over the study period, especially in its early years, whereas toward the end of the study the incidence rose slightly faster in boys. Further information on this trend is displayed in Table 1.
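As a consistency check on the overall figures reported above, the AAPC can be approximated as a weighted average of the segment slopes on the log scale, with weights equal to the number of yearly intervals in each segment. The sketch below assumes this weighting and uses only the two segment APCs reported for the overall trend; the function name is hypothetical.

```python
import math


def aapc(segments):
    """AAPC (%) from a list of (apc_percent, n_intervals) segments."""
    total = sum(width for _, width in segments)
    weighted_slope = sum(width * math.log(1.0 + apc / 100.0)
                         for apc, width in segments) / total
    return (math.exp(weighted_slope) - 1.0) * 100.0


# Reported segment APCs: 17.74% for 2006-2008 (2 intervals)
# and 3.79% for 2008-2014 (6 intervals).
overall_aapc = aapc([(17.74, 2), (3.79, 6)])
```

Under these assumptions the result is about 7.1%, which agrees with the average annual increase reported for 2006-2014.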
Results for the trend of new ALL cases by month of diagnosis are shown in Table 1 and Figure 2C. According to the findings, the disease shows seasonal variation: incidence was highest in spring, followed by summer, winter, and autumn, and the highest and lowest numbers of patients were recorded in June (377 cases) and November (242 cases), respectively. In general, the curve of the monthly incidence rate is pyramid-shaped and has a joinpoint in April. According to joinpoint regression analysis, there is a progressive increase in the incidence rate from January to June (APC = 12.37%, 95% CI = -2.4, 29.3), which peaks in June. It is followed by a marked and significant drop from June to December (APC = -3.98%, 95% CI = -6.7, -1.1). Further information on this trend is displayed in Table 1. Table 1 and Figure 2D depict the trend of new cases of ALL by age and sex. The trend of incidence is parallel and nearly identical for girls and boys, with a peak incidence between 2 and 5 years of age. A joinpoint was found at the age of 2 to 3 years, with a significant increase in ALL incidence from birth to 2 years of age (APC = 141.89%, 95% CI = 71.9, 240.4), after which incidence dropped progressively and significantly up to the age of 14 (APC = -11%, 95% CI = -12.6, -10). Further information on this trend is displayed in Table 1.

Spatial Autocorrelation and Cluster Identification
To analyze the spatial pattern of ALL, the CIR of the disease was first calculated per 100 000 children under 15 years of age for the 2006-2014 period, and the spatial distribution map of the disease was then produced (Figure 3). The results of the Global Moran's I analysis are shown in Figure 4. The value of this index was estimated at 0.358, indicating a high degree of spatial autocorrelation and a strong tendency of ALL to cluster. The Z score = 3.71 and P value < 0.0001 indicate that the spatial autocorrelation of ALL is significant.
Given the high spatial autocorrelation and clustered spatial distribution of the disease, we next identified its hotspots and coldspots in Iran. A "hotspot" is a cluster of high incidence values, whereas a "coldspot" is a cluster of low incidence values. Statistically, an area with a high incidence rate that is surrounded by areas with high incidence rates is a hotspot, and an area with a low incidence rate whose neighbors also have low rates is a coldspot. To this end, Anselin Local Moran's I index was employed and the map was produced (Figure 5). According to the results, the provinces of Fars and Kohgiluyeh and Boyer-Ahmad constitute a hotspot, that is, a spatial cluster of high values and high risk of the disease. Kermanshah, Zanjan, and Kurdistan provinces, by contrast, were identified as a coldspot, that is, a spatial cluster of low values and low risk of the disease. The geographic coordinates of the provincial centers identified as high-risk and low-risk ALL clusters are shown in Table 2.

Discussion
The aim of the present study was to determine the temporal trend of ALL incidence, assess its spatial pattern, and identify high-risk and low-risk clusters, in order to gain deeper insights into the hypothesis that infectious pathogens and environmental risk factors affect the incidence of ALL.
The average annual incidence rate of ALL during the 2006-2014 period was 2.25 per 100 000 children under the age of 15 years. The incidence of ALL in Iranian children appears to be lower than that of developed countries and analogous to that of developing countries. ALL is more common in countries with a higher human development index (HDI). 26 Other studies have estimated the incidence rate of ALL to be between 3 and 7 per 100 000 children. 20,27-29 In the present study, the mean ± standard deviation of age was 5.90 ± 3.68 years, with a median age of 5 years. There was no statistically significant difference in age distribution between male and female children. These results regarding patients' age are in line with other studies in this field. The mean ± standard deviation of patients' age was reported as 6 ± 4 years by Parra et al, 20 5.5 ± 0.92 years by Mehrvar et al, 29 and 6.3 years by Sousa et al. 28 New cases of ALL were more frequent in male (57.9%) than in female (42.1%) children, but the pace and the average annual increase in incidence were higher in female children during the 2006-2014 period (AAPC = 9.2% vs. 5.7%). These results are consistent with other studies in which the incidence rate of ALL was higher in males than in females. 9,20,27,29 According to the American Cancer Society, 55% of new cases and 56% of ALL deaths occur in male children. 30 In summary, as expected, the demographic data obtained from Iranian children with ALL resemble those of children with ALL in the rest of the world.
An analysis of temporal trends of ALL suggests that the incidence rate of the disease increased by an average of 7.1% per year between 2006 and 2014. These findings indicate that in Iran, as in many developing countries, the incidence rate of ALL is increasing at a comparatively high pace.
In 2019, Hubbard et al showed that the incidence and AAPC of ALL have been increasing in most parts of the world. According to this study, the average increase in the incidence of ALL is greater in countries with low HDI than in countries with very high HDI (AAPC = 4.04% vs. 0.44%). The study points out that the incidence of ALL usually rises by 1% to 2% each year, but that this rate is considerably higher in West Asia (3.68%) and the Caribbean (4.4%). 25 The incidence of ALL is predicted to rise in developing countries because of their younger populations. 31 About 70% of cases are expected to emerge in developing countries by 2030. 32
The distribution of new cases by month indicates that ALL is more frequent in spring and summer, which are high-risk seasons for the transmission of infectious pathogens. Many communicable diseases and some noncommunicable diseases have a seasonal rhythm, which often reflects the role of infectious agents in the incidence of diseases. 14 Other studies have also considered the seasonal incidence of ALL as lending support to the hypothesis that the disease is infectious. 13,15 In addition, joinpoint regression analysis revealed that the incidence of disease increased progressively from January to June, reaching its peak in June, and then declined significantly from June to December. Accordingly, it can be stated that the incidence of ALL is higher during warm seasons. This finding aligns with other studies in which summer and spring were identified as high-risk and susceptible seasons for ALL. 15,33,34 This finding can be attributed to greater and prolonged exposure to UV light during the spring and summer seasons. To date, very few studies have explored the association between ALL incidence rate and UV exposure. Masamich et al found a positive and significant correlation between leukemia death risk and UVB exposure in Japan. 35 In France, Coste et al demonstrated that with every 25 J/cm² increase in energy intensity, the incidence of ALL in children under 5 years of age rises significantly (standardized incidence rate = 1.09; 95% CI = 1.03-1.14). The model fitted well for young children and irradiation above 100 J/cm². Under this condition, a 25 J/cm² increase in energy intensity yielded a standardized incidence rate of 1.24 (95% CI = 1.14-1.36). Finally, the researchers suggested that greater exposure to UV may be associated with a higher incidence of ALL in different geographic areas, and that suppression of the immune system by UV could be proposed as a biological explanation for this finding. 36 Thus, the results of our study align with those reported by Coste et al. However, some studies found that the highest incidence of disease was in winter, 37 while others provide little evidence of seasonality of the disease. 14,38 These inconsistencies may be due to divergent ways of diagnosing and reporting the disease, differences in the incubation period, differences in the frequency and pattern of population referrals to medical centers, variations in cancer registry systems, different age-gender pyramids of countries, and discrepancies in the type of study design and method of analysis.
According to the age-related results, ALL incidence is markedly higher in the early years of life, a period in which exposure to microorganisms and pathogens may also be high; as individuals age and their immune system strengthens, the incidence rate declines. ALL may have an infectious origin, with its occurrence resulting from an abnormal response of the body to common infections early in life. 15,39 The highest incidence was observed between 2 and 5 years of age. In the studies by Hjalgrim et al, 40 Dohner et al, 41 Bahoush et al, 27 and Alrudainy et al, 42 the peak incidence of the disease was likewise reported at the age of 2-5 years.
Together with the Z score and P value, the Global Moran's I of 0.358 indicated high spatial autocorrelation and a strong tendency for spatial clustering of ALL. In other words, geographically close areas had comparable incidence rates. This supports the association of environmental risk factors, especially infectious ones, with the incidence of ALL. 10,13 The results of most studies align with ours: significant clustering of ALL has been observed in other geographical areas, which supports the involvement of environmental factors, especially infections, in the incidence of the disease. 10,20,43 In the study by Nyari et al, the Global Moran's I index for spatial clustering of ALL was 0.133 and significant. The researchers hypothesized an as yet unknown environmental etiology for the incidence of ALL and posited the Chernobyl catastrophe as one reason for this clustering. 43 Some pathogens (HBV, HCV, HCG, HTLV-1) may play a role in leukemia because of their ability to cause acute and chronic infections in mononuclear blood leukocytes, especially lymphocytes (lymphotropic property). 37,44 Stephen's study suggested that the spatio-temporal pattern of abnormal leukemia incidence is consistent with the involvement of infectious agents. 45 A 2017 study by Kreis et al revealed that ETV6-RUNX1 (TEL-AML1) fusion, with an adjusted OR of 2.54, was the cause of ALL clustering. 46 FLT3 ITD gene mutations occur in ALL patients with poor prognosis. 47 The spatial distribution of the disease in Iran showed low incidence in the northern and western provinces, high incidence in the southern and eastern provinces, and moderate incidence in the central provinces. The results revealed that Fars and Kohgiluyeh and Boyer-Ahmad provinces were the hotspot and Kermanshah, Zanjan, and Kurdistan provinces were the coldspot of ALL.
The residence of children can be a reasonable proxy for environmental and local exposure because children spend a great deal of their time at home and are less likely to migrate. 8,48 Thus, it can be posited that the incidence of the disease decreases with increasing latitude and increases with increasing longitude. One reason for the high incidence rate of the disease at low latitudes is the more direct and intense exposure to UV radiation, which has been identified as a risk factor for ALL in other studies. 35,36 However, it should be noted that other climate variables, such as temperature, precipitation, and humidity, are also linked to latitude and longitude, so this relationship may reflect those variables and other confounding factors.
In conclusion, the incidence of ALL in Iranian children is lower than in developed countries, but in Iran, as in many developing countries, the incidence of new cases is rising rapidly. In addition, the average annual increase in the incidence of ALL is higher in girls than in boys. ALL is more likely to occur in spring, summer, and the warmer months, which may be due to longer exposure to UV as a carcinogen in ALL. The hotspot of the disease was detected in Fars and Kohgiluyeh and Boyer-Ahmad provinces and the coldspot in Kermanshah, Zanjan, and Kurdistan provinces. The high incidence of ALL in the early years of life, the seasonal variations, the specific spatial distribution of the disease, the strength of spatial autocorrelation, and the emergence of spatial clusters are evidence supporting the impact of environmental risk factors and infectious pathogens on the disease cycle. It can be argued that the incidence of ALL results from the synergistic interaction between environmental, infectious, geographical, and genetic risk factors.