Holdings
| LDR | 11114nam a22004337a 4500 | ||
| 001 | 0100916072▲ | ||
| 003 | EBZ▲ | ||
| 005 | 20260130150640▲ | ||
| 006 | m d ▲ | ||
| 007 | cr |n|||||||||▲ | ||
| 008 | 240531s2024 enk o 000 0 eng d▲ | ||
| 020 | ▼a9781804616208▼q(electronic bk.)▲ | ||
| 020 | ▼a1804616206▼q(electronic bk.)▲ | ||
| 020 | ▼z1804619787▲ | ||
| 020 | ▼z9781804619780▲ | ||
| 037 | ▼a9781804619780▼bO'Reilly Media▲ | ||
| 037 | ▼a10559431▼bIEEE▲ | ||
| 040 | ▼aEBZ▼beng▼cEBZ▲ | ||
| 049 | ▼aMAIN▲ | ||
| 050 | 4 | ▼aQA76.585▲ | |
| 082 | 0 | 4 | ▼a004.67/82▼223/eng/20240624▲ |
| 100 | 1 | ▼aShah, Saba,▼eauthor.▲ | |
| 245 | 1 | 0 | ▼aDATABRICKS CERTIFIED ASSOCIATE DEVELOPER FOR APACHE SPARK USING PYTHON▼h[electronic resource] :▼bthe ultimate guide to getting certified in Apache Spark using practical examples with Python.▲ |
| 250 | ▼a1st edition.▲ | ||
| 260 | ▼aBirmingham, UK :▼bPackt Publishing Ltd.,▼c2024.▲ | ||
| 300 | ▼a1 online resource▲ | ||
| 341 | 0 | ▼atextual▼2sapdv▼3EBSCOhost▲ | |
| 505 | 0 | ▼aCover -- Title Page -- Copyright and Credits -- Foreword -- Contributors -- Table of Contents -- Preface -- Part 1: Exam Overview -- Chapter 1: Overview of the Certification Guide and Exam -- Overview of the certification exam -- Distribution of questions -- Resources to prepare for the exam -- Resources available during the exam -- Registering for your exam -- Prerequisites for the exam -- Online proctored exam -- Types of questions -- Theoretical questions -- Code-based questions -- Summary -- Part 2: Introducing Spark -- Chapter 2: Understanding Apache Spark and Its Applications -- What is Apache Spark? -- The history of Apache Spark -- Understanding Spark differentiators -- The components of Spark -- Why choose Apache Spark? -- Speed -- Reusability -- In-memory computation -- A unified platform -- What are the Spark use cases? -- Big data processing -- Machine learning applications -- Real-time streaming -- Graph analytics -- Who are the Spark users? -- Data analysts -- Data engineers -- Data scientists -- Machine learning engineers -- Summary -- Sample questions -- Chapter 3: Spark Architecture and Transformations -- Spark architecture -- Execution hierarchy -- Spark components -- Spark driver -- SparkSession -- Cluster manager -- Spark executors -- Partitioning in Spark -- Deployment modes -- RDDs -- Lazy computation -- Transformations -- Summary -- Sample questions -- Answers -- Part 3: Spark Operations -- Chapter 4: Spark DataFrames and their Operations -- Getting Started in PySpark -- Installing Spark -- Creating a Spark session -- Dataset API -- DataFrame API -- Creating DataFrame operations -- Using a list of rows -- Using a list of rows with schema -- Using Pandas DataFrames -- Using tuples -- How to view the DataFrames -- Viewing DataFrames -- Viewing top n rows -- Viewing DataFrame schema -- Viewing data vertically.▲ | |
| 505 | 8 | ▼aViewing columns of data -- Viewing summary statistics -- Collecting the data -- Using take -- Using tail -- Using head -- Counting the number of rows of data -- Converting a PySpark DataFrame to a Pandas DataFrame -- How to manipulate data on rows and columns -- Selecting columns -- Creating columns -- Dropping columns -- Updating columns -- Renaming columns -- Finding unique values in a column -- Changing the case of a column -- Filtering a DataFrame -- Logical operators in a DataFrame -- Using isin() -- Datatype conversions -- Dropping null values from a DataFrame -- Dropping duplicates from a DataFrame -- Using aggregates in a DataFrame -- Summary -- Sample question -- Answer -- Chapter 5: Advanced Operations and Optimizations in Spark -- Grouping data in Spark and different Spark joins -- Using groupBy in a DataFrame -- A complex groupBy statement -- Joining DataFrames in Spark -- Reading and writing data -- Reading and writing CSV files -- Reading and writing Parquet files -- Reading and writing ORC files -- Reading and writing Delta files -- Using SQL in Spark -- UDFs in Apache Spark -- What are UDFs? -- Creating and registering UDFs -- Use cases for UDFs -- Best practices for using UDFs -- Optimizations in Apache Spark -- Understanding optimization in Spark -- Catalyst optimizer -- Adaptive Query Execution (AQE) -- Data-based optimizations in Apache Spark -- Addressing the small file problem in Apache Spark -- Tackling data skew in Apache Spark -- Managing data spills in Apache Spark -- Managing data shuffle in Apache Spark -- Shuffle joins -- Shuffle sort-merge joins -- Broadcast joins -- Broadcast hash joins -- Narrow and wide transformations in Apache Spark -- Narrow transformations -- Wide transformations -- Choosing between narrow and wide transformations -- Optimizing wide transformations -- Persisting and caching in Apache Spark.▲ | |
| 505 | 8 | ▼aUnderstanding data persistence -- Caching data -- Unpersisting data -- Best practices -- Repartitioning and coalescing in Apache Spark -- Understanding data partitioning -- Repartitioning data -- Coalescing data -- Use cases for repartitioning and coalescing -- Best practices -- Summary -- Sample questions -- Answers -- Chapter 6: SQL Queries in Spark -- What is Spark SQL? -- Advantages of Spark SQL -- Integration with Apache Spark -- Key concepts -- DataFrames and datasets -- Getting started with Spark SQL -- Loading and saving data -- Utilizing Spark SQL to filter and select data based on specific criteria -- Exploring sorting and aggregation operations using Spark SQL -- Grouping and aggregating data -- grouping data based on specific columns and performing aggregate functions -- Advanced Spark SQL operations -- Leveraging window functions to perform advanced analytical operations on DataFrames -- User-defined functions -- Working with complex data types -- pivot and unpivot -- Summary -- Sample questions -- Answers -- Part 4: Spark Applications -- Chapter 7: Structured Streaming in Spark -- Real-time data processing -- What is streaming? -- Streaming architectures -- Introducing Spark Streaming -- Exploring the architecture of Spark Streaming -- Key concepts -- Advantages -- Challenges -- Introducing Structured Streaming -- Key features and advantages -- Structured Streaming versus Spark Streaming -- Limitations and considerations -- Streaming fundamentals -- Stateless streaming -- processing one event at a time -- Stateful streaming -- maintaining stateful information -- The differences between stateless and stateful streaming -- Structured Streaming concepts -- Event time and processing time -- Watermarking and late data handling -- Triggers and output modes -- Windowing operations -- Joins and aggregations -- Streaming sources and sinks.▲ | |
| 505 | 8 | ▼aBuilt-in streaming sources -- Custom streaming sources -- Built-in streaming sinks -- Custom streaming sinks -- Advanced techniques in Structured Streaming -- Handling fault tolerance -- Handling schema evolution -- Different joins in Structured Streaming -- Stream-stream joins -- Stream-static joins -- Final thoughts and future developments -- Summary -- Chapter 8: Machine Learning with Spark ML -- Introduction to ML -- The key concepts of ML -- Types of ML -- Types of supervised learning -- ML with Spark -- Advantages of Apache Spark for large-scale ML -- Spark MLlib versus Spark ML -- ML life cycle -- Problem statement -- Data preparation and feature engineering -- Model training and evaluation -- Model deployment -- Model monitoring and management -- Model iteration and improvement -- Case studies and real-world examples -- Customer churn prediction -- Fraud detection -- Future trends in Spark ML and distributed ML -- Summary -- Part 5: Mock Papers -- Chapter 9: Mock Test 1 -- Questions -- Answers -- Chapter 10: Mock Test 2 -- Questions -- Answers -- Index -- Other Books You May Enjoy.▲ | |
| 520 | ▼aLearn the concepts and exercises needed to get certified as a Databricks Associate Developer for Apache Spark 3.0 and validate your skills as a Spark expert with an industry-recognized credential Key Features Understand the fundamentals of Apache Spark to help you design robust and fast Spark applications Delve into various data manipulation components for each phase of your data engineering project Prepare for the certification exam with sample questions and mock exams, and get closer to your goal Purchase of the print or Kindle book includes a free PDF eBook Book Description With extensive data being collected every second, computing power cannot keep up with this pace of rapid growth. To make use of all the data, Spark has become a de facto standard for big data processing. Migrating data processing to Spark will not only help you save resources that will allow you to focus on your business, but also enable you to modernize your workloads by leveraging the capabilities of Spark and the modern technology stack for creating new business opportunities. This book is a comprehensive guide that lets you explore the core components of Apache Spark, its architecture, and its optimization. You'll become familiar with the Spark dataframe API and its components needed for data manipulation. Next, you'll find out what Spark streaming is and why it's important for modern data stacks, before learning about machine learning in Spark and its different use cases. What's more, you'll discover sample questions at the end of each section along with two mock exams to help you prepare for the certification exam. By the end of this book, you'll know what to expect in the exam and how to pass it with enough understanding of Spark and its tools. You'll also be able to apply this knowledge in a real-world setting and take your skillset to the next level. 
What you will learn Create and manipulate SQL queries in Spark Build complex Spark functions using Spark UDFs Architect big data apps with Spark fundamentals for optimal design Apply techniques to manipulate and optimize big data applications Build real-time or near-real-time applications using Spark Streaming Work with Apache Spark for machine learning applications Who this book is for This book is for you if you're a professional looking to venture into the world of big data and data engineering, a data professional who wants to endorse your knowledge of Spark, or a student. Although working knowledge of Python is required, no prior Spark knowledge is needed. Additionally, experience with Pyspark will be beneficial.▲ | ||
| 532 | 0 | ▼3EBSCOhost▼a"EBSCO evaluates our products based on the Web Content Accessibility Guidelines (WCAG) and the related Section 508 and EN 301 549 regulations in the US and EU. Most EBSCO products are substantially conformant with WCAG 2.2 level AA." Source: https://connect.ebsco.com/s/article/EBSCO-VPATs?language=en_US. Last accessed April 22, 2025.▲ | |
| 590 | ▼aAdded to collection customer.56279.3▲ | ||
| 630 | 0 | 0 | ▼aSpark (Electronic resource : Apache Software Foundation)▼xExaminations▼vStudy guides.▲ |
| 650 | 0 | ▼aCloud computing.▲ | |
| 650 | 0 | ▼aBig data.▲ | |
| 776 | 0 | 8 | ▼iPrint version:▼z1804619787▼z9781804619780▲ |
| 856 | 4 | 0 | ▼3EBSCOhost▼uhttps://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&db=nlabk&AN=3910729▲ |
DATABRICKS CERTIFIED ASSOCIATE DEVELOPER FOR APACHE SPARK USING PYTHON [electronic resource] : the ultimate guide to getting certified in Apache Spark using practical examples with Python
Material Type
Foreign eBook
Title / Statement of Responsibility
DATABRICKS CERTIFIED ASSOCIATE DEVELOPER FOR APACHE SPARK USING PYTHON [electronic resource] : the ultimate guide to getting certified in Apache Spark using practical examples with Python.
Personal Author
Shah, Saba
Edition
1st edition.
Publication Details
Birmingham, UK : Packt Publishing Ltd., 2024.
Physical Description
1 online resource
Contents Note
Cover -- Title Page -- Copyright and Credits -- Foreword -- Contributors -- Table of Contents -- Preface -- Part 1: Exam Overview -- Chapter 1: Overview of the Certification Guide and Exam -- Overview of the certification exam -- Distribution of questions -- Resources to prepare for the exam -- Resources available during the exam -- Registering for your exam -- Prerequisites for the exam -- Online proctored exam -- Types of questions -- Theoretical questions -- Code-based questions -- Summary -- Part 2: Introducing Spark -- Chapter 2: Understanding Apache Spark and Its Applications -- What is Apache Spark? -- The history of Apache Spark -- Understanding Spark differentiators -- The components of Spark -- Why choose Apache Spark? -- Speed -- Reusability -- In-memory computation -- A unified platform -- What are the Spark use cases? -- Big data processing -- Machine learning applications -- Real-time streaming -- Graph analytics -- Who are the Spark users? -- Data analysts -- Data engineers -- Data scientists -- Machine learning engineers -- Summary -- Sample questions -- Chapter 3: Spark Architecture and Transformations -- Spark architecture -- Execution hierarchy -- Spark components -- Spark driver -- SparkSession -- Cluster manager -- Spark executors -- Partitioning in Spark -- Deployment modes -- RDDs -- Lazy computation -- Transformations -- Summary -- Sample questions -- Answers -- Part 3: Spark Operations -- Chapter 4: Spark DataFrames and their Operations -- Getting Started in PySpark -- Installing Spark -- Creating a Spark session -- Dataset API -- DataFrame API -- Creating DataFrame operations -- Using a list of rows -- Using a list of rows with schema -- Using Pandas DataFrames -- Using tuples -- How to view the DataFrames -- Viewing DataFrames -- Viewing top n rows -- Viewing DataFrame schema -- Viewing data vertically.
Viewing columns of data -- Viewing summary statistics -- Collecting the data -- Using take -- Using tail -- Using head -- Counting the number of rows of data -- Converting a PySpark DataFrame to a Pandas DataFrame -- How to manipulate data on rows and columns -- Selecting columns -- Creating columns -- Dropping columns -- Updating columns -- Renaming columns -- Finding unique values in a column -- Changing the case of a column -- Filtering a DataFrame -- Logical operators in a DataFrame -- Using isin() -- Datatype conversions -- Dropping null values from a DataFrame -- Dropping duplicates from a DataFrame -- Using aggregates in a DataFrame -- Summary -- Sample question -- Answer -- Chapter 5: Advanced Operations and Optimizations in Spark -- Grouping data in Spark and different Spark joins -- Using groupBy in a DataFrame -- A complex groupBy statement -- Joining DataFrames in Spark -- Reading and writing data -- Reading and writing CSV files -- Reading and writing Parquet files -- Reading and writing ORC files -- Reading and writing Delta files -- Using SQL in Spark -- UDFs in Apache Spark -- What are UDFs? -- Creating and registering UDFs -- Use cases for UDFs -- Best practices for using UDFs -- Optimizations in Apache Spark -- Understanding optimization in Spark -- Catalyst optimizer -- Adaptive Query Execution (AQE) -- Data-based optimizations in Apache Spark -- Addressing the small file problem in Apache Spark -- Tackling data skew in Apache Spark -- Managing data spills in Apache Spark -- Managing data shuffle in Apache Spark -- Shuffle joins -- Shuffle sort-merge joins -- Broadcast joins -- Broadcast hash joins -- Narrow and wide transformations in Apache Spark -- Narrow transformations -- Wide transformations -- Choosing between narrow and wide transformations -- Optimizing wide transformations -- Persisting and caching in Apache Spark.
Understanding data persistence -- Caching data -- Unpersisting data -- Best practices -- Repartitioning and coalescing in Apache Spark -- Understanding data partitioning -- Repartitioning data -- Coalescing data -- Use cases for repartitioning and coalescing -- Best practices -- Summary -- Sample questions -- Answers -- Chapter 6: SQL Queries in Spark -- What is Spark SQL? -- Advantages of Spark SQL -- Integration with Apache Spark -- Key concepts -- DataFrames and datasets -- Getting started with Spark SQL -- Loading and saving data -- Utilizing Spark SQL to filter and select data based on specific criteria -- Exploring sorting and aggregation operations using Spark SQL -- Grouping and aggregating data -- grouping data based on specific columns and performing aggregate functions -- Advanced Spark SQL operations -- Leveraging window functions to perform advanced analytical operations on DataFrames -- User-defined functions -- Working with complex data types -- pivot and unpivot -- Summary -- Sample questions -- Answers -- Part 4: Spark Applications -- Chapter 7: Structured Streaming in Spark -- Real-time data processing -- What is streaming? -- Streaming architectures -- Introducing Spark Streaming -- Exploring the architecture of Spark Streaming -- Key concepts -- Advantages -- Challenges -- Introducing Structured Streaming -- Key features and advantages -- Structured Streaming versus Spark Streaming -- Limitations and considerations -- Streaming fundamentals -- Stateless streaming -- processing one event at a time -- Stateful streaming -- maintaining stateful information -- The differences between stateless and stateful streaming -- Structured Streaming concepts -- Event time and processing time -- Watermarking and late data handling -- Triggers and output modes -- Windowing operations -- Joins and aggregations -- Streaming sources and sinks.
Built-in streaming sources -- Custom streaming sources -- Built-in streaming sinks -- Custom streaming sinks -- Advanced techniques in Structured Streaming -- Handling fault tolerance -- Handling schema evolution -- Different joins in Structured Streaming -- Stream-stream joins -- Stream-static joins -- Final thoughts and future developments -- Summary -- Chapter 8: Machine Learning with Spark ML -- Introduction to ML -- The key concepts of ML -- Types of ML -- Types of supervised learning -- ML with Spark -- Advantages of Apache Spark for large-scale ML -- Spark MLlib versus Spark ML -- ML life cycle -- Problem statement -- Data preparation and feature engineering -- Model training and evaluation -- Model deployment -- Model monitoring and management -- Model iteration and improvement -- Case studies and real-world examples -- Customer churn prediction -- Fraud detection -- Future trends in Spark ML and distributed ML -- Summary -- Part 5: Mock Papers -- Chapter 9: Mock Test 1 -- Questions -- Answers -- Chapter 10: Mock Test 2 -- Questions -- Answers -- Index -- Other Books You May Enjoy.
Summary Note
Learn the concepts and exercises needed to get certified as a Databricks Associate Developer for Apache Spark 3.0 and validate your skills as a Spark expert with an industry-recognized credential Key Features Understand the fundamentals of Apache Spark to help you design robust and fast Spark applications Delve into various data manipulation components for each phase of your data engineering project Prepare for the certification exam with sample questions and mock exams, and get closer to your goal Purchase of the print or Kindle book includes a free PDF eBook Book Description With extensive data being collected every second, computing power cannot keep up with this pace of rapid growth. To make use of all the data, Spark has become a de facto standard for big data processing. Migrating data processing to Spark will not only help you save resources that will allow you to focus on your business, but also enable you to modernize your workloads by leveraging the capabilities of Spark and the modern technology stack for creating new business opportunities. This book is a comprehensive guide that lets you explore the core components of Apache Spark, its architecture, and its optimization. You'll become familiar with the Spark dataframe API and its components needed for data manipulation. Next, you'll find out what Spark streaming is and why it's important for modern data stacks, before learning about machine learning in Spark and its different use cases. What's more, you'll discover sample questions at the end of each section along with two mock exams to help you prepare for the certification exam. By the end of this book, you'll know what to expect in the exam and how to pass it with enough understanding of Spark and its tools. You'll also be able to apply this knowledge in a real-world setting and take your skillset to the next level. 
What you will learn Create and manipulate SQL queries in Spark Build complex Spark functions using Spark UDFs Architect big data apps with Spark fundamentals for optimal design Apply techniques to manipulate and optimize big data applications Build real-time or near-real-time applications using Spark Streaming Work with Apache Spark for machine learning applications Who this book is for This book is for you if you're a professional looking to venture into the world of big data and data engineering, a data professional who wants to endorse your knowledge of Spark, or a student. Although working knowledge of Python is required, no prior Spark knowledge is needed. Additionally, experience with Pyspark will be beneficial.
Subject
Spark (Electronic resource : Apache Software Foundation) -- Examinations -- Study guides
Cloud computing
Big data
ISBN
9781804616208 1804616206