Copyright 2020 Investigación e Innovación en Ingenierías
This work is licensed under a Creative Commons Attribution 4.0 International License.
Architecture for agronomic data analysis in a Big Data environment
Corresponding Author(s): Luis Felipe Vargas Rojas
Investigación e Innovación en Ingenierías,
Vol. 8 No. 2 (2020): July - December
Abstract
Agricultural data are typically noisy, heterogeneous, high-volume, and recorded at varying levels of detail. However, the technologies most commonly used to process and store these data lack methods to meet the demands of modern systems. Objective: This study presents a Big Data architecture specialized in the crop yield prediction process. Methodology: The research followed two lines: the crop yield prediction process on the one hand, and related software architectures on the other. From these investigations, the requirements for a storage and processing system in this domain were defined. Results: The architecture comprised (1) a data model in MongoDB collections; (2) a Kafka queuing system; and (3) a PySpark processing system. The architecture inherits from these technologies the capacity for vertical and horizontal scaling and for handling heterogeneous data and domain-specific variables, and it allows interaction with different transformations and machine learning models. Conclusion: Big Data technologies can model the crop yield prediction process; this scheme serves as a reference for carrying out agronomic data analysis on a scalable and flexible Big Data environment.
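As a rough illustration of how the three components named in the abstract might hand data to one another, the following is a minimal sketch and not the authors' implementation. It uses only the Python standard library: a `Queue` stands in for a Kafka topic, a dictionary of lists stands in for MongoDB collections, and a plain function stands in for a PySpark job. The record fields (`plot_id`, `rain_mm`, `yield_t_ha`), the sample values, and all function names are hypothetical and chosen only for the example.

```python
from collections import defaultdict
from queue import Queue
from statistics import mean

# Hypothetical heterogeneous records: fields differ by source,
# which is the situation a document store is meant to tolerate.
RAW_EVENTS = [
    {"source": "weather", "plot_id": "p1", "rain_mm": 12.0},
    {"source": "weather", "plot_id": "p1", "rain_mm": 8.0},
    {"source": "harvest", "plot_id": "p1", "yield_t_ha": 6.5},
    {"source": "weather", "plot_id": "p2", "rain_mm": 3.0},
    {"source": "harvest", "plot_id": "p2", "yield_t_ha": 4.1},
]


def ingest(events, topic):
    """Producer side: push raw events onto the queue (stand-in for a Kafka topic)."""
    for event in events:
        topic.put(event)


def store(topic, collections):
    """Consumer side: drain the queue into per-source collections (stand-in for MongoDB)."""
    while not topic.empty():
        event = topic.get()
        collections[event["source"]].append(event)


def build_features(collections):
    """Processing step (stand-in for a PySpark job): pair mean rainfall with yield per plot."""
    rain = defaultdict(list)
    for doc in collections["weather"]:
        rain[doc["plot_id"]].append(doc["rain_mm"])
    return {
        doc["plot_id"]: {
            "mean_rain_mm": mean(rain[doc["plot_id"]]),
            "yield_t_ha": doc["yield_t_ha"],
        }
        for doc in collections["harvest"]
    }


topic = Queue()
collections = defaultdict(list)
ingest(RAW_EVENTS, topic)
store(topic, collections)
features = build_features(collections)
```

The resulting `features` mapping is the kind of per-plot table that a downstream yield model could be trained on. In a real deployment each stand-in would be replaced by the corresponding client library, and the queue would decouple data producers from storage so that each stage can scale independently.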