Nominal Classification for Determining the Geographic Origin of Cannabis: Integrating Machine Learning and Stable Isotope Analysis
Keywords: isotopic analysis; machine learning; Cannabis; biomes
This research aimed to improve isotopic models for geolocating Cannabis by integrating machine learning techniques with isotopic analysis. Isotopic data from isoscapes and from real Cannabis samples collected in different Brazilian biomes were used. The method combined bootstrap resampling, used to generate pseudo-samples, with classification algorithms including Naive Bayes, Random Forest, Neural Networks, and Support Vector Machines (SVM). The algorithms achieved high accuracy rates in classifying samples by biome, and the combination of methods proved to be an efficient strategy for tracing geographic origin. Among the classifiers evaluated, SVM performed best in both experimental settings: with the sigmoid kernel when the sample size was varied (from 50 to 500), and with the polynomial kernel when the sample percentage was varied (from 5% to 95%). The sigmoid-kernel SVM achieved the highest average accuracy across sample sizes (0.7970), indicating that it remains effective even with reduced samples. The polynomial-kernel SVM reached an average accuracy of 0.8036, the highest among all models evaluated across percentages, and stood out for its balance between precision and recall. These results indicate that integrating isotopic analysis with machine learning can contribute to forensic science, supporting greater control over drug trafficking and advancing knowledge about Brazil's biomes.
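The workflow summarized above (bootstrap resampling to generate pseudo-samples, followed by a classifier that assigns each sample to a biome) can be illustrated with a minimal sketch. It is not the authors' pipeline. The two isotope features, the labels `biome_A` and `biome_B`, and every numeric value are synthetic and hypothetical, and the resampling scheme (drawing whole rows with replacement within each biome) is an assumption. Gaussian Naive Bayes, one of the classifiers named in the abstract, is written out in plain Python so that the example needs no third-party library.

```python
import math
import random
from collections import defaultdict


def bootstrap_pseudo_samples(rows, n, rng):
    """Draw n rows with replacement from the observed rows (assumed scheme)."""
    return [rng.choice(rows) for _ in range(n)]


def fit_gaussian_nb(samples):
    """Fit per-class priors, feature means and variances.

    samples is a list of (features, label) pairs.
    """
    by_label = defaultdict(list)
    for features, label in samples:
        by_label[label].append(features)
    model = {}
    for label, rows in by_label.items():
        columns = list(zip(*rows))
        means = [sum(c) / len(c) for c in columns]
        # A small floor keeps the variance positive when resampled rows repeat.
        variances = [
            max(sum((x - m) ** 2 for x in c) / len(c), 1e-6)
            for c, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(samples), means, variances)
    return model


def predict(model, features):
    """Return the label with the highest log-posterior under the fitted model."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for x, m, v in zip(features, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))


rng = random.Random(0)

# Synthetic, hypothetical measurements: two isotope ratios per sample.
observed = {
    "biome_A": [(rng.gauss(-28.0, 0.5), rng.gauss(4.0, 0.5)) for _ in range(20)],
    "biome_B": [(rng.gauss(-25.0, 0.5), rng.gauss(7.0, 0.5)) for _ in range(20)],
}

# Expand each biome to 200 pseudo-samples by bootstrap resampling.
train = []
for label, rows in observed.items():
    train += [(f, label) for f in bootstrap_pseudo_samples(rows, 200, rng)]

model = fit_gaussian_nb(train)

# Hypothetical held-out points, one near each synthetic biome centre.
held_out = [((-28.1, 3.9), "biome_A"), ((-24.8, 7.2), "biome_B")]
accuracy = sum(predict(model, f) == y for f, y in held_out) / len(held_out)
print(f"accuracy on held-out points: {accuracy:.2f}")
```

Repeating the same loop over several pseudo-sample counts, and swapping in other classifiers such as an SVM with a sigmoid or polynomial kernel, would mirror the kind of comparison the abstract reports. The accuracies obtained on synthetic data like this say nothing about the published figures.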