
This work is licensed under a Creative Commons Attribution 4.0 International license.
Architecture for agronomic data analysis in a Big Data environment
Corresponding Author(s): Luis Felipe Vargas Rojas
Investigación e Innovación en Ingenierías,
Vol. 8 No. 2 (2020): July - December
Abstract
Agricultural data are typically noisy, heterogeneous, high-volume, and recorded at varying levels of detail. However, the technologies most commonly used to process and store these data lack methods for meeting the demands of modern systems. Objective: this study presents a Big Data architecture specialized in the crop yield prediction process. Methodology: the research followed two lines of inquiry: on the one hand, the crop yield prediction process was studied; on the other, related software architectures were reviewed. From these investigations, the requirements for a storage and processing system in this domain were defined. Results: the architecture comprised (1) a data model in MongoDB collections; (2) a Kafka queueing system; and (3) a PySpark processing system. From the technologies used, the architecture inherits the capacity for vertical and horizontal scaling and for handling heterogeneous data and domain-specific variables, and it also allows interaction with different transformations and machine learning models. Conclusion: Big Data technologies can model the crop yield prediction process; this scheme serves as a reference for carrying out agronomic data analysis in a scalable and flexible Big Data environment.
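The three components named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, with an in-memory queue standing in for Kafka, a plain list of dictionaries standing in for a MongoDB collection, and a simple aggregation standing in for a PySpark job. The ordering of the stages, the record fields (`crop`, `plot`, `yield_t_ha`, `rainfall_mm`, `soil_ph`) and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from queue import Queue
from statistics import mean

# Hypothetical heterogeneous records: each document may carry different
# fields, as a schemaless document store would allow.
RECORDS = [
    {"crop": "rice", "plot": "A1", "yield_t_ha": 5.0, "rainfall_mm": 120.0},
    {"crop": "rice", "plot": "A2", "yield_t_ha": 6.0},
    {"crop": "maize", "plot": "B1", "yield_t_ha": 8.0, "soil_ph": 6.5},
    {"crop": "maize", "plot": "B2", "soil_ph": 6.1},  # noisy record: no yield
]


def produce(records, topic):
    """Queueing stage (in-memory stand-in for a message producer)."""
    for record in records:
        topic.put(record)


def consume_and_store(topic, collection):
    """Storage stage (a list stands in for a document collection)."""
    while not topic.empty():
        collection.append(topic.get())


def mean_yield_by_crop(collection):
    """Processing stage (stand-in for a distributed group-by aggregation)."""
    yields = defaultdict(list)
    for doc in collection:
        if "yield_t_ha" in doc:  # skip documents lacking the target variable
            yields[doc["crop"]].append(doc["yield_t_ha"])
    return {crop: mean(values) for crop, values in yields.items()}


topic = Queue()
collection = []
produce(RECORDS, topic)
consume_and_store(topic, collection)
summary = mean_yield_by_crop(collection)
print(summary)
```

Keeping the stages as separate functions mirrors the decoupling that a queue between ingestion and processing is meant to provide: any stage could be replaced by its real counterpart without changing the others.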