
Study design and participants
This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines [39] and used a stratified random cluster sampling method. To assess the physical fitness status of children aged 9 to 12 years in Shandong Province, the research team selected 47 primary schools across six cities (Jinan, Qingdao, Linyi, Dongying, Heze, and Yantai) between April and May 2024. These cities were chosen to represent the eastern, western, southern, northern, and central regions of Shandong Province, providing geographic diversity and regional representativeness. Table 1 details the geographic distribution of sampled schools and the number of valid participants from each city, clarifying the sampling coverage across the province. This sampling design was intended to ensure that the sample reflected the demographic characteristics of children in Shandong Province.
The study employed a two-stage sampling method. In the first stage, stratified sampling was conducted by educational level to ensure balanced representation across grades (Grades 3–6). In the second stage, cluster sampling was implemented: natural classes within each grade were defined as sampling units, and classes were selected at random. This two-stage strategy was designed to represent each grade and to include students from varying academic levels and socio-cultural backgrounds, thereby reducing selection bias while keeping the sample diverse.
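The two-stage selection described above can be illustrated with a minimal sketch. It assumes that grades act as strata and intact classes as clusters; the function name `sample_classes`, the class labels, and the number of classes drawn per grade are hypothetical, since the text does not report how many classes were selected per grade or school.

```python
import random


def sample_classes(classes_by_grade, n_per_grade, seed=None):
    """Two-stage selection: grades are strata, intact classes are clusters.

    classes_by_grade maps a grade to the list of its class labels.
    n_per_grade is an illustrative parameter, not a value from the study.
    """
    rng = random.Random(seed)
    selected = {}
    for grade, classes in sorted(classes_by_grade.items()):
        # Stage 1: every grade (stratum) contributes to the sample.
        # Stage 2: whole classes are drawn at random, without replacement.
        k = min(n_per_grade, len(classes))
        selected[grade] = sorted(rng.sample(classes, k))
    return selected


# Hypothetical school with four classes in each of Grades 3-6.
school = {g: [f"G{g}-C{i}" for i in range(1, 5)] for g in (3, 4, 5, 6)}
chosen = sample_classes(school, n_per_grade=2, seed=2024)
for grade, classes in chosen.items():
    print(grade, classes)
```

All students in a selected class would then be invited to participate, which is what makes the class, rather than the individual student, the sampling unit.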
A total of 29,856 eligible students were included in the study. Students with severe organ diseases, significant physical disabilities or deformities, or acute symptoms such as colds or fever were excluded. All included students completed both the physical fitness tests and the questionnaire survey (participation rate 100%), and the effective response rate was 97.37%. The gender distribution of the sample was balanced, with 15,109 males (50.61%) and 14,747 females (49.39%).
Physical fitness assessment and questionnaire survey
Physical fitness assessments were conducted in accordance with the “National Student Physical Health Standard (Revised 2014)” issued by the Ministry of Education of the People’s Republic of China [40]. This standard serves as the official national guideline for evaluating the physical fitness of students across school levels, providing a systematic, comprehensive, and objective assessment framework [41]. The testing protocol includes core physical indicators such as body composition, cardiopulmonary endurance, muscular strength, flexibility, and speed, which together reflect the overall physical fitness status of school-aged children.
To ensure consistency and accuracy, all physical fitness tests were administered using standardized instruments across all testing sites. Height and weight were measured using calibrated ultrasonic height-weight measuring devices (HW-900Y, Zhengzhou Hengwei Medical Instruments Co., Ltd.), and vital capacity was assessed using portable electronic spirometers (FGC-A+, Beijing Jafron Biomedical Co., Ltd.). Sit-and-reach flexibility was tested using standardized measurement boxes. Pull-up performance (males) was assessed by counting the number of valid repetitions, with each repetition requiring an overhand grip and the chin to pass above the bar; no time limit was imposed. For female participants, sit-up performance was evaluated over a timed 1-min interval, with repetitions counted manually by trained evaluators and timing monitored using digital stopwatches accurate to ±0.01 s. Standing long jump distances were recorded using non-slip metric mats, and 50-m sprint times were measured with electronic timing gates. The 1-min rope skipping test was counted manually by trained assessors following uniform timing and counting protocols. Shuttle run (50 m × 8) performance was recorded using fixed track markings and stopwatches.
All measurement devices were calibrated before each testing session in accordance with the manufacturers’ guidelines, and periodic checks were performed throughout the data collection process to maintain measurement accuracy. To ensure the reliability and comparability of results across regions, all assessors received standardized pre-survey training, including theoretical instruction and hands-on demonstrations, focusing on equipment operation, testing procedures, and error reduction. A detailed standard operating manual was provided to all personnel, and regular supervision and quality control checks were conducted by senior researchers throughout the data collection period.
All personnel involved in testing and survey administration received standardized, comprehensive training to ensure proficiency in the physical fitness testing procedures and field epidemiological survey methods. Before the formal data collection began, the research team recruited qualified personnel and conducted a unified training program focusing on physical fitness test standards and protocols. After the training, all testers were required to pass a rigorous skill assessment to ensure consistency and reliability across different regions and personnel.
In addition, to evaluate inter-rater reliability across testing teams, a subset of participants (approximately 10% of the total sample) was independently assessed by multiple trained evaluators using the same protocols. Intraclass correlation coefficients (ICCs) were calculated for key continuous physical fitness indicators (e.g., height, weight, vital capacity, sit-and-reach, sprint time); all ICC values exceeded 0.87, indicating good to excellent reliability. These safeguards were implemented to minimize measurement bias and to keep assessment results uniform across geographic regions and testing personnel.
The assessments followed a standardized sequence. Height (cm) and weight (kg) were measured first, followed by vital capacity (mL). Subsequent tests included sit-and-reach flexibility (cm), pull-ups for males (number of repetitions), 1-min sit-ups for females (number of repetitions), standing long jump (cm), and the 50-m sprint (seconds). The final tests were 1-min rope skipping (number of repetitions) and, for participants in Grades 5–6, the 50-m × 8 shuttle run (seconds).
A total score of ≥ 60 was set as the passing benchmark, representing the threshold for physical fitness test qualification. The total score was calculated as a weighted sum of individual test scores. Body mass index (BMI) was calculated as weight (kg) divided by height (m) squared. For Grades 3–4, BMI accounted for 15%, vital capacity for 15%, the 50-m sprint for 20%, sit-and-reach flexibility for 20%, 1-min rope skipping for 20%, and 1-min sit-ups for 10%. For Grades 5–6, BMI accounted for 15%, vital capacity for 15%, the 50-m sprint for 20%, sit-and-reach flexibility for 10%, 1-min rope skipping for 10%, 1-min sit-ups for 20%, and the 50-m × 8 shuttle run for 10%.
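As a worked illustration of the scoring rule, the sketch below computes BMI and the weighted total for a Grade 3–4 student using the Grades 3–4 weights given above and the 60-point threshold. The per-item scores (0–100) are assumed to have already been obtained from the national standard's scoring tables, which are not reproduced here; the function names and the example values are hypothetical.

```python
# Grades 3-4 weights as stated in the text (percent of the total score).
WEIGHTS_G3_4 = {
    "bmi": 15,
    "vital_capacity": 15,
    "sprint_50m": 20,
    "sit_and_reach": 20,
    "rope_skipping": 20,
    "sit_ups": 10,
}
PASS_MARK = 60  # total score >= 60 counts as "qualified"


def bmi(weight_kg, height_cm):
    """Body mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def total_score(item_scores, weights):
    """Weighted sum of per-item scores (each on a 0-100 scale)."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    missing = set(weights) - set(item_scores)
    if missing:
        raise ValueError(f"missing item scores: {sorted(missing)}")
    return sum(item_scores[name] * w for name, w in weights.items()) / 100


def is_qualified(score):
    return score >= PASS_MARK


# Hypothetical per-item scores for one student (not study data).
scores = {
    "bmi": 100,
    "vital_capacity": 70,
    "sprint_50m": 60,
    "sit_and_reach": 80,
    "rope_skipping": 50,
    "sit_ups": 40,
}
total = total_score(scores, WEIGHTS_G3_4)
print(round(bmi(35, 140), 2), total, is_qualified(total))
```

For these illustrative inputs the weighted total is 67.5, so the student would be classified as qualified even though two individual items scored below 60.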
The questionnaire was administered anonymously to ensure confidentiality. It collected data in five domains. Demographic characteristics: gender, age, residential status, annual family income (in yuan) [42], and parental education levels (father’s and mother’s education). Family environment: frequency of secondhand smoke exposure, whether parents engage in physical activity, and whether parents support their children’s participation in physical activity. Physical activity (both school and non-school): frequency of physical activity per week, duration of each exercise session (hours), and exercise intensity. Lifestyle factors: daily sleep duration (hours), daily screen time (hours), and daily homework time (hours). Dietary behaviors: weekly frequency of breakfast, meat, vegetable, fruit, egg, milk, and fast-food consumption (see Supplementary Tables 2 and 3).
Before the formal survey began, the researchers explained the study’s objectives, significance, and participation process to the students and their legal guardians. After the physical fitness tests were completed, questionnaires were distributed to the students. Because of the participants’ young age, the questionnaire was completed with the assistance of their parents. Completed questionnaires were collected and organized, and the data were entered twice by two independent operators (dual entry) to ensure accuracy.
The reliability and validity of the questionnaire were assessed prior to the formal survey implementation. To evaluate test–retest reliability, the questionnaire was administered twice to a subsample of participants with a 2-week interval. The results indicated no significant differences between the two administrations, demonstrating satisfactory temporal stability.
Content validity was established through a structured expert review process. A panel of five experts was invited, including two specialists in public health, two in physical education, and one in child epidemiology. Experts were selected based on their academic qualifications (all held doctoral degrees) and extensive experience in child health and physical fitness research. Each expert independently assessed the relevance, clarity, and comprehensiveness of each item using a 5-point Likert scale (1 = not relevant, 5 = highly relevant). Based on their evaluations, the Item-level Content Validity Index (I-CVI) was calculated, and only items with I-CVI ≥ 0.80 were retained. The Scale-level CVI (S-CVI) was also computed, demonstrating acceptable overall content validity. Discrepancies were resolved through consensus discussions, and minor modifications were made to improve item clarity and cultural appropriateness for the target population.
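The content validity indices can be sketched as follows. The text does not state which Likert ratings counted as "relevant" or which S-CVI variant was used, so this example assumes that ratings of 4 or 5 count as relevant and computes the averaging form (S-CVI/Ave); the item names and ratings are hypothetical.

```python
def item_cvi(ratings, relevant_min=4):
    """I-CVI: share of experts rating the item at or above relevant_min.

    relevant_min=4 is an assumption; the text does not give the cut-off.
    """
    return sum(r >= relevant_min for r in ratings) / len(ratings)


def scale_cvi_ave(cvi_by_item):
    """S-CVI/Ave: mean of the item-level indices."""
    return sum(cvi_by_item.values()) / len(cvi_by_item)


# Hypothetical ratings from the five experts (1 = not relevant, 5 = highly relevant).
ratings = {
    "sleep_duration": [5, 5, 4, 5, 4],
    "screen_time": [5, 4, 4, 3, 5],
    "illustrative_weak_item": [3, 4, 2, 5, 3],
}

i_cvi = {item: item_cvi(r) for item, r in ratings.items()}
# Retain only items meeting the I-CVI >= 0.80 rule described in the text.
retained = {item: v for item, v in i_cvi.items() if v >= 0.80}

print(i_cvi)
print(sorted(retained), round(scale_cvi_ave(retained), 2))
```

With five experts, the 0.80 threshold means that at least four of the five must rate an item as relevant for it to be retained.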
In accordance with the STROBE guidelines, data completeness was examined before statistical analysis [39]. Questionnaires with more than 10% missing responses were excluded from the final dataset. For entries with minor missing values, multiple imputation using the Markov Chain Monte Carlo (MCMC) method was applied to minimize potential bias and maximize data utility. To check whether missingness introduced systematic bias, we compared respondents with missing data to those with complete data using an independent samples t-test (for age) and a chi-square test (for gender). No statistically significant differences were observed (p > 0.05), which is consistent with the data being missing at random with respect to these characteristics and suggests that the sample remained representative.
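The two checks described above, the 10% exclusion rule and the gender comparison between complete and incomplete respondents, can be sketched in plain Python. The study itself used SPSS; the function names, the use of `None` to mark a missing answer, the uncorrected Pearson statistic, and all counts below are assumptions made for illustration only.

```python
import math


def missing_fraction(responses):
    """Share of unanswered items; None marks a missing response (assumed coding)."""
    return sum(r is None for r in responses) / len(responses)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical 20-item questionnaire with two unanswered items (10% missing).
answers = [1, 3, None, 2, 4, 1, 2, 5, 3, 2, 1, None, 4, 2, 3, 1, 2, 4, 3, 2]
exclude = missing_fraction(answers) > 0.10  # "more than 10%" triggers exclusion

# Hypothetical counts: rows = complete / with missing data, columns = male / female.
chi2, p = chi_square_2x2(500, 480, 25, 27)
print(exclude, round(chi2, 3), p > 0.05)
```

A non-significant result of this kind shows only that missingness is unrelated to the tested variable; it does not rule out dependence on unmeasured characteristics.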
This study was approved by the Ethics Committee of Shandong Institute of Petroleum and Chemical Technology (Approval No.: KY-2024-021). All research procedures were conducted in accordance with relevant guidelines, regulations, and ethical standards. Before the study began, written informed consent was obtained from the legal guardians of all participants.
Statistical analysis
A database was established using EpiData 3.1 software, and data entry was verified through dual independent entry. Statistical analyses were performed using SPSS version 27.0. Continuous variables are presented as mean ± standard deviation (\(\overline{X} \pm S\)), and categorical variables as frequency (percentage) [n (%)]. Before analysis, data were cleaned to identify and exclude incomplete or implausible entries. Cases with missing key variables were excluded via listwise deletion, as the proportion of missing data was less than 5%. Categorical variables were coded based on standardized criteria. For instance, physical fitness was categorized as “qualified” (≥ 60 points) or “unqualified” (< 60 points), following national evaluation standards. The physical fitness qualification rate was calculated as the proportion of students whose total score met or exceeded the 60-point threshold.
Factors associated with physical fitness were analyzed using chi-square (χ²) tests and multivariable unconditional logistic regression, with the significance level set at α = 0.05. Univariate analyses also used logistic regression, with results expressed as odds ratios (OR) and 95% confidence intervals (CI) as estimates of effect size. The goodness-of-fit and calibration of the final logistic regression model were assessed with the Hosmer–Lemeshow test [43].
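For a single binary factor, the odds ratio from a univariate logistic regression equals the cross-product ratio of the corresponding 2 × 2 table, and the Wald 95% CI follows from the standard error of the log odds ratio. The sketch below illustrates that calculation only; the analyses in this study were run in SPSS, and the function name and counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI for a 2 x 2 table.

    a, b: exposed and qualified / exposed and unqualified
    c, d: unexposed and qualified / unexposed and unqualified
    z=1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) under the Wald (Woolf) approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not study data): factor present vs absent, by qualification status.
or_, lo, hi = odds_ratio_ci(300, 100, 200, 150)
print(round(or_, 2), round(lo, 2), round(hi, 2))
```

An interval that excludes 1 corresponds to a statistically significant association at α = 0.05; multivariable models additionally adjust each OR for the other covariates, which this table-based calculation does not do.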
