Data Analytics - Data Valley

Fundamentals of Analytics on AWS – Part 2

This course is the second of two offerings designed to introduce learners to the current market trends in analytics. Building on the concepts introduced in Part 1, this course gives learners an overview of data lakes, data warehouses, and modern data architectures on AWS. You will learn which AWS services can be used to build data warehouses, data lakes, and modern data architectures. You will also see common modern data architecture use cases and a reference architecture.
Course level: Fundamental
Duration: 1 hour 30 minutes
Activities
This course includes lessons, videos, scenarios, and knowledge check questions.
Course objectives
In this course, you will learn to:
• Explain data lakes, benefits, and functions.
• Describe the basic data lake architecture, the AWS services used to build a data lake, and challenges with building a data lake.
• Explain AWS Lake Formation architecture, features, and benefits.
• Explain data warehousing, challenges with an on-premises data warehouse, and available AWS solutions.
• Explain modern data architecture pillars and modern data architecture concepts.
• Explain data movement scenarios.
• Describe the data mesh architecture pattern, benefits, and available AWS solutions.
• Identify the available AWS services for building modern data architectures.
• Identify the components of modern data architecture.
• Describe common use cases for modern data architecture.
Intended audience
This course is intended for:
• Cloud architects
• Data engineers
• Data analysts
• Data scientists
• Developers
Prerequisites
We recommend that attendees of this course have:
• Reviewed AWS Cloud Practitioner Essentials or equivalent
• Completed Fundamentals of Analytics on AWS – Part 1
Course outline
Section 1: Introduction
• Lesson 1: How to Use This Course
• Lesson 2: Course Overview
Section 2: Architectures
• Lesson 3: Introduction to Data Lakes
• Lesson 4: Introduction to Data Warehousing
• Lesson 5: Introduction to Modern Data Architecture
• Lesson 6: AWS Services for Modern Data Architecture
Section 3: Common Use Cases and Reference Architectures
• Lesson 7: Common Use Cases
• Lesson 8: Reference Architectures
Section 4: Conclusion
• Lesson 9: Quiz
• Lesson 10: Course Summary
• Lesson 11: Appendix of Resources
• Lesson 12: Feedback

Fundamentals of Analytics on AWS – Part 1

This course is the first of two offerings designed to introduce learners to the current market trends in analytics. In Part 1, you will learn fundamental concepts such as types of analytics, the 5 V’s of big data, and the challenges associated with processing high volumes of data. This course also maps the 5 V’s of big data to AWS services for analytics and discusses how AWS provides the most comprehensive services on the market. After completing this course, learners are encouraged to continue with Fundamentals of Analytics on AWS – Part 2.
Course level: Fundamental
Duration: 2 hours
Activities
This course includes lessons, videos, scenarios, and knowledge check questions.
Course objectives
In this course, you will learn to:
• Explain data analytics, data analysis, analytics types, techniques, and analytics challenges.
• Define machine learning (ML), ML on AWS, and the different levels of AWS services for ML.
• Define the 5 V’s of big data.
• Explain common ways to store data, challenges, characteristics of source data storage systems, and available AWS solutions.
• Explain data transportation, options for different environments, and available AWS solutions.
• Define data processing, options for each type of processing, and available AWS solutions.
• Identify different types of data structures, types of data storage, and available AWS solutions.
• Explain where ETL and ELT fit at multiple points in the analytics pipeline, the elements of an ETL and ELT process, and available AWS solutions.
• Explain the use of business intelligence tools to gain value from analytics, and available AWS solutions.
Intended audience
This course is intended for:
• Cloud architects
• Data engineers
• Data analysts
• Data scientists
• Developers
Prerequisites
We recommend that attendees of this course have:
• Reviewed AWS Cloud Practitioner Essentials or equivalent
Course outline
Section 1: Introduction
• Lesson 1: How to Use This Course
• Lesson 2: Course Overview
Section 2: Analytics Concepts
• Lesson 3: Analytics
• Lesson 4: Machine Learning
• Lesson 5: 5 Vs of Big Data
• Lesson 6: Volume
• Lesson 7: Variety
• Lesson 8: Velocity
• Lesson 9: Veracity
• Lesson 10: Value
Section 3: AWS Services for Analytics
• Lesson 11: AWS Services for Volume
• Lesson 12: AWS Services for Variety
• Lesson 13: AWS Services for Velocity
• Lesson 14: AWS Services for Veracity
• Lesson 15: AWS Services for Value
Section 4: Conclusion
• Lesson 16: Quiz
• Lesson 17: Course Summary
• Lesson 18: Appendix of Resources
• Lesson 19: Feedback

AWS Glue Getting Started

Course description
AWS Glue is a serverless data integration service that you can use to discover, prepare, and combine data for analytics, machine learning, and application development. In this course, you will learn the benefits, typical use cases, and technical concepts of AWS Glue, including AWS Glue Studio and AWS Glue DataBrew. DataBrew is a new visual data preparation tool that helps data analysts and data scientists clean and normalize data to prepare it for analytics and machine learning. You will have an opportunity to try the service through a demonstration using the AWS Management Console.
• Course level: Fundamental
• Duration: 2 hours
Activities
This course includes presentations, graphics, and a demonstration with the option to follow along.
Course objectives
In this course, you will learn to:
• Understand how AWS Glue works.
• Familiarize yourself with the technical concepts of AWS Glue and DataBrew.
• List typical use cases for AWS Glue and DataBrew.
• Specify what it would take to implement AWS Glue and DataBrew in a real-world scenario.
• Recognize the benefits of AWS Glue and DataBrew.
• Explain the cost structure of AWS Glue.
• Show how to use AWS Glue and DataBrew from the AWS Management Console.
Intended audience
This course is intended for the following roles:
• Developers
• Solutions architects
• Data engineers
• Business analysts
Prerequisites
• AWS Technical Essentials
Course outline
• AWS Glue Basics
  o What does AWS Glue do?
  o What problems does AWS Glue solve?
  o What are the benefits of AWS Glue?
  o What is the data integration engine supported by AWS Glue?
  o How is AWS Glue used to architect a cloud solution?
  o What are typical use cases for AWS Glue?
  o What else should I keep in mind when using AWS Glue?
• AWS Glue Cost Structure
  o How much does AWS Glue cost?
• Using AWS Glue Catalog and Glue Studio
  o What are the basic technical concepts I should know about AWS Glue Studio?
  o How do I crawl, catalog, and perform ETL on my data using AWS Glue?
  o Glue Studio tutorial video
• AWS Glue DataBrew Basics
  o What are the basic technical concepts I should know about AWS Glue DataBrew?
• Using AWS Glue DataBrew Data Profiling and Data Quality Checks
  o How do I profile my data, detect PII, and transform my data using AWS Glue DataBrew?
  o AWS Glue DataBrew tutorial video
• Learn More
  o How can I learn more about AWS Glue?

Amazon EMR Getting Started

Course description
Amazon EMR is the industry-leading cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks such as Apache Spark, Apache Hive, and Presto. You can use Amazon EMR to set up, operate, and scale your big data environments and automate time-consuming tasks like provisioning capacity. In this course, you will learn about Amazon EMR Serverless, a new option in Amazon EMR that makes it efficient and cost-effective for data engineers and analysts to run applications built using open-source big data frameworks without having to tune, operate, optimize, secure, or manage clusters. You will also learn the benefits, typical use cases, and technical concepts of Amazon EMR. You will have an opportunity to try Amazon EMR Serverless and Amazon EMR Cluster through tutorials using the AWS Management Console.
• Course level: Fundamental
• Duration: 1 hour
Activities
This course includes presentations, graphics, tutorials, and demonstrations with the option to follow along.
Course objectives
In this course, you will learn to:
• Understand different deployment options available with Amazon EMR.
• Understand how Amazon EMR works.
• Understand the technical concepts of Amazon EMR Serverless.
• List typical use cases for Amazon EMR Serverless.
• Understand the technical concepts of Amazon EMR Cluster.
• List typical use cases for Amazon EMR Cluster.
• Specify what it would take to implement Amazon EMR in a real-world scenario.
• Recognize the benefits of Amazon EMR.
• Explain the cost structure of Amazon EMR.
• Use Amazon EMR Serverless and Amazon EMR Cluster.
Intended audience
This course is intended for:
• Developers
• Solutions architects
• Data engineers
• Data architects
Prerequisites
• AWS Technical Essentials
• Data Analytics Fundamentals
Course outline
Introduction
• Introduction to Amazon EMR
• Amazon EMR Serverless Architecture and Use Cases
• Amazon EMR Cluster Architecture and Use Cases
Using Amazon EMR Serverless
• How Do I Run a Spark Job on Amazon EMR Serverless?
Using Amazon EMR
• How Do I Create an Amazon EMR on EC2 Cluster?
• How Do I Create an Amazon EMR Studio?
• How Do I Create an Amazon EMR Workspace?
• How Do I Run a Spark Job with Amazon EMR Studio Notebook?
Resources
• Learn More

Getting Started with Amazon OpenSearch Service

Amazon OpenSearch Service is a managed service that helps you perform interactive log analytics, real-time application monitoring, website search, and more. OpenSearch Service provisions all the resources for your cluster and launches it. It also automatically detects and replaces failed OpenSearch Service nodes, reducing the overhead associated with self-managed infrastructures. You can quickly scale your cluster with a single API call or in the console. In this course, you will learn about the benefits and technical concepts of OpenSearch Service. You will review the architecture and built-in features. You will also have an opportunity to run search queries, perform log analytics, and visualize your data using dashboards through demonstrations.
• Course level: Fundamental
• Duration: 1 hour
Activities
This course includes demonstrations, videos, and assessments.
Course objectives
In this course, you will learn to do the following:
• Understand how OpenSearch Service works.
• Familiarize yourself with the technical concepts of OpenSearch Service.
• List typical use cases for OpenSearch Service.
• Recognize the benefits of OpenSearch Service.
• Explain the cost structure of OpenSearch Service.
• Run queries to search documents in OpenSearch Service.
• Perform log analytics on web logs from the OpenSearch Service Discover page.
• Create dashboards and visualizations to visualize the data.
Intended audience
This course is intended for the following:
• Site reliability engineers
• Site reliability architects
• Operations infrastructure leads
• Cloud architects
• Cloud engineers
• DevOps engineers
• Cybersecurity engineers
• Search architects
Prerequisites
• Basic understanding of Amazon Elastic Compute Cloud (Amazon EC2), virtual private clouds (VPCs), and networking concepts
Course outline
Module 1: Introduction to Amazon OpenSearch Service
Module 2: Using OpenSearch Service to Architect a Cloud Solution
Module 3: Typical Use Cases for OpenSearch Service
Module 4: Factors to Keep in Mind When Using OpenSearch Service
Module 5: OpenSearch Service Cost Structure
Module 6: Basic Technical Concepts of OpenSearch Service
• Launch an OpenSearch Service cluster
• Ingest movie dataset to the cluster
• Query movie dataset from OpenSearch Dashboards
• Perform log analysis on sample web logs dataset
• Create visualizations and dashboards for web logs

Amazon QuickSight – Getting Started

Amazon QuickSight is a cloud-scale business intelligence (BI) service that you can use to create and publish interactive dashboards. You can access these from browsers or mobile devices. As a fully managed cloud-based service, QuickSight combines data from many different sources and provides user management tools that you can use to scale from a few users to millions. In this course, you will learn about the benefits and technical concepts of QuickSight. You will learn about the architecture and built-in features. You will have an opportunity to try key features through demonstrations.
• Course level: Fundamental
• Duration: 60 minutes
Activities
This course includes demonstrations, videos, and assessments.
Course objectives
In this course, you will learn to do the following:
• Understand how QuickSight works.
• Familiarize yourself with the technical concepts of QuickSight.
• List typical use cases for QuickSight.
• Recognize the benefits of QuickSight.
• Explain the cost structure of QuickSight.
• Design, create, and customize QuickSight dashboards to visualize data and extract business insights.
Intended audience
This course is intended for the following:
• BI developers
• Business analysts
• Data analysts
• BI managers
Prerequisites
We recommend that attendees have a basic understanding of BI and visual analytics concepts.
Course outline
Module 1: Introduction to QuickSight
Module 2: Architecture and Use Cases
Module 3: How Do I Create a QuickSight Dataset?
Module 4: How Do I Create a QuickSight Analysis?
Module 5: How Do I Customize QuickSight Using Themes?
Module 6: How Do I Publish a QuickSight Dashboard?
Module 7: How Do I Use QuickSight Q to Ask Natural Language Questions?

Data Analytics Fundamentals

In this self-paced course, you learn about the process for planning data analysis solutions and the various data analytic processes that are involved. This course takes you through five key factors that indicate the need for specific AWS services in collecting, processing, analyzing, and presenting your data. This includes learning basic architectures, value propositions, and potential use cases. The course introduces you to the AWS services and solutions to help you build and enhance data analysis solutions.
Intended Audience:
This course is intended for:
• Data architects
• Data scientists
• Data analysts
Course Objectives:
In this course, you will learn how to:
• Identify the characteristics of data analysis solutions and the characteristics that indicate such a solution may be required
• Define types of data including structured, semistructured, and unstructured data
• Define data storage types such as data lakes, AWS Lake Formation, data warehouses, and the Amazon Simple Storage Service (Amazon S3)
• Analyze the characteristics of and differences in batch and stream processing
• Define how Amazon Kinesis is used to process streaming data
• Analyze the characteristics of different storage systems for source data
• Analyze the characteristics of online transaction processing (OLTP) and online analytical processing (OLAP) systems and their impact on the organization of data within these systems
• Analyze the differences between row-based and columnar data storage methods
• Define how Amazon EMR, AWS Glue, and Amazon Redshift each work to process, cleanse, and transform data within a data analysis solution
• Analyze the concept of atomicity, consistency, isolation, and durability (ACID) compliance as well as basic availability, soft state, eventual consistency (BASE) compliance and how an extract, transform, load (ETL) process can help to ensure compliance
• Explore the concept of data schemas and understand how they define data and how this information is stored in metastores
• Analyze the concept of data versus information
• Recognize the ways to analyze data to produce information for reports using tools such as Amazon QuickSight and Amazon Athena
• Define how AWS services work together to visualize data
Prerequisites:
We recommend that attendees of this course have the following prerequisites:
• Working knowledge of database concepts
• Basic understanding of data storage, processing, and analytics
• Experience with enterprise IT systems
Delivery Method:
This course is delivered through:
• Digital training
Duration:
• 3 Hours 30 Minutes
Course Outline:
This course covers the following concepts:
• Lesson 1: Introduction to data analysis solutions
  – Data analytics and data analysis concepts
  – Introduction to the challenges of data analytics
• Lesson 2: Volume – data storage
  – Introduction to Amazon S3
  – Introduction to data lakes
  – Introduction to data storage methods
• Lesson 3: Velocity – data processing
  – Introduction to data processing methods
  – Introduction to batch data processing
  – Introduction to stream data processing
• Lesson 4: Variety – data structure and types
  – Introduction to source data storage
  – Introduction to structured data stores
  – Introduction to semistructured and unstructured data stores
• Lesson 5: Veracity – cleansing and transformation
  – Understanding data integrity
  – Understanding database consistency
  – Introduction to the ETL process
• Lesson 6: Value – reporting and business intelligence
  – Introduction to analyzing data
  – Introduction to visualizing data
• Lesson 7: Key Takeaways
  – Putting the pieces together
  – What’s next

Getting Started with Amazon Redshift

In this course, you will learn the benefits, typical use cases, and technical concepts of Amazon Redshift. You can also try the service through a demonstration using the AWS Management Console. The cloud data warehouse service integrates with data lakes based on Amazon Simple Storage Service (Amazon S3). It also integrates with relational database services such as Amazon Relational Database Service (Amazon RDS) for PostgreSQL, Amazon Aurora PostgreSQL-Compatible Edition, Amazon RDS for MySQL, and Amazon Aurora MySQL-Compatible Edition. Amazon Redshift supports building and using machine learning (ML) models using familiar SQL commands, thereby reducing the skills needed to take advantage of ML.
• Course level: Fundamental
• Duration: 1 hour
Activities
This course includes presentations, graphics, and a demonstration with the option to follow along.
Course objectives
In this course, you will learn to:
• Understand how Amazon Redshift works
• Familiarize yourself with the technical concepts of Amazon Redshift
• List typical use cases for Amazon Redshift
• Specify what it would take to implement Amazon Redshift in a real-world scenario
• Recognize the benefits of Amazon Redshift
• Explain the cost structure of Amazon Redshift
• Use Amazon Redshift from the AWS Management Console
Intended audience
This course is intended for:
• Data warehouse engineers
• Solutions architects
Prerequisites
• One or more years of data warehouse management experience
• AWS Technical Essentials
Course outline
• Amazon Redshift Basics
• Using Amazon Redshift
• Learn More

Exam Readiness: AWS Certified Data Analytics – Specialty

The AWS Certified Data Analytics – Specialty exam validates technical skills and experience in designing and implementing AWS services to derive value from data. This course helps you prepare for the exam by exploring the exam’s topic areas and familiarizing you with the question style and exam approach. The course reviews sample exam questions in each topic area and teaches you how to interpret the concepts being tested so you can more easily eliminate incorrect responses.
The course addresses each of the exam’s content domains:
• Data collection systems
• Storage and data management concerns
• Data processing solutions
• Analysis and visualization of analytical data
• Security of the data analysis system
Intended Audience:
This course is intended for:
• IT professionals
• Data platform engineers/data architects
• Data scientists
• Data analysts
• Solutions architects
Course Objectives:
In this course, you will learn how to:
• Navigate the logistics of the examination process
• Understand the exam structure and question types
• Identify how questions relate to AWS data analytics concepts
• Interpret the concepts being tested by exam questions
• Develop a personalized study plan to prepare for the exam
Prerequisites:
We recommend that attendees of this course have the following prerequisites:
• AWS Certified Cloud Practitioner or an Associate-level AWS certification
• Five or more years of hands-on experience working with complex data analytics processes and analysis solutions on AWS
Delivery Method:
This course is delivered through:
• Digital training
Duration:
• 3 Hours 30 Minutes
Course Outline:
This course covers the following concepts:
• Testing center information and expectations
• Exam overview and structure
• Question structure and interpretation techniques
• Deep dive into exam domains, including practice exam questions

Amazon Redshift Service Primer

This course introduces you to Amazon Redshift and its core features and capabilities. The course describes how this service integrates with other AWS services, introduces important terminology and technology concepts, and includes a demonstration of the service.
Intended Audience:
This course is intended for:
• IT professionals
• Data platform engineers
• Database developers
• Solutions architects
Course Objectives:
In this course, you will learn to:
• List the purpose of the service and its function
• Summarize the benefits of the service
• Recall how the service works
• Identify use cases for the service
• Recognize how the service is billed
• Recall how to get additional information on the service
• Clarify how this service integrates with other services
• Summarize the relevant terminology associated with this service
• Identify security strategies used by this service
Prerequisites:
We recommend that attendees of this course have the following prerequisites:
• None
Delivery Method:
This course is delivered through:
• Digital training
Duration:
• 20 minutes
Course Outline:
This course covers the following concepts:
• Service Introduction
• Service Technical Overview
• Service Demonstration
• Service Assessment
• Service Review

Introduction to Amazon Kinesis Data Analytics for Java Applications

The new support for Java programming in Amazon Kinesis Data Analytics helps you solve challenges, and this course will show you how. You’ll also learn how the SDKs are supported through Apache Flink libraries and see how it works in real-world use cases.

Best Practices for Data Warehousing with Amazon Redshift

In this course, you will learn about the concepts of implementing a data warehouse using Amazon Redshift. You will learn about basic table design, data storage, data ingestion techniques, and workload management. You will also learn about the effect of node and cluster sizing.

AWS Domain: Data Analytics