Building Intelligent Systems on AWS: From Data Lakes to AI-Powered Insights

© 2023 by IJACT

Volume 1 Issue 3

Year of Publication : 2023

Author : Anusha Medavaka

DOI : 10.56472/25838628/IJACT-V1I3P108

Citation :

Anusha Medavaka, 2023. "Building Intelligent Systems on AWS: From Data Lakes to AI-Powered Insights", ESP International Journal of Advancements in Computational Technology (ESP-IJACT), Volume 1, Issue 3: 68-80.

Abstract :

It is now possible to build intelligent systems that analyze large volumes of data to extract meaningful information. AWS, a provider with a clear strategic focus on cloud solutions, offers several services for building smart systems that rely on data lakes and AI. This paper describes the steps for developing intelligent systems on AWS, with specific emphasis on how data lakes can be optimized and how AI can be applied to data. First, we discuss some of the fundamentals of AWS, including storage tiers, data pools, and machine learning features. We then elaborate on the processes used in developing these systems, with special emphasis on data ingestion, processing, and analysis. Although the article concentrates on how AI adds value to raw data, with a primary focus on AWS services such as Amazon S3, AWS Glue, Amazon Athena, and Amazon SageMaker, it also provides examples of heuristic applications and real-life scenarios, as well as the opportunities and challenges of intelligent system development on AWS. Finally, we consider potential future advances and emerging trends in deep learning and describe the outlook for edge computing and quantum computing. The reader will come away with a general idea of how intelligent systems are built through AWS projects in various areas of the economy and industry.

Keywords :

AWS, Intelligent Systems, Data Lakes, Machine Learning, Cloud Computing, Amazon S3, Amazon SageMaker, Data Processing.