Python Polars: The Definitive Guide
Introduction
Welcome to the official website of the book Python Polars: The Definitive Guide by Jeroen Janssens and Thijs Nieuwdorp. The book is now available in both print and ebook formats at your favorite bookstore. To decide if this book is right for you (or while you’re waiting for your copy to arrive), you can:
- Get a sneak peek by downloading the sample chapter below.
- Read what others have to say.
- Join the community to connect with other readers.
- Explore the repository, where you’ll find all the code and data from the book to follow along.
- Connect with authors Jeroen and Thijs.
Book Description
Unlock the power of Polars, a Python package for transforming, analyzing, and visualizing data. In this hands-on guide, Jeroen Janssens and Thijs Nieuwdorp walk you through every feature of Polars, showing you how to use it for real-world tasks like data wrangling, exploratory data analysis, building pipelines, and more.
Whether you’re a seasoned data professional or new to data science, you’ll quickly master Polars’ expressive API and its underlying concepts. You don’t need to have experience with pandas, but if you do, this book will help you make a seamless transition. The many practical examples and real-world datasets are available on GitHub, so you can easily follow along.
- Process data from CSV, Parquet, spreadsheets, databases, and the cloud
- Get a solid understanding of Expressions, the building blocks of every query
- Handle complex data types, including text, time, and nested structures
- Use both eager and lazy APIs, and know when to use each
- Visualize your data with Altair, hvPlot, plotnine, and Great Tables
- Extend Polars with your own Python functions and Rust plugins
- Leverage GPU acceleration to boost performance even further
Who This Book Is For
This book is designed for anyone looking to leverage the power of Polars in Python to transform, analyze, and visualize data more efficiently and effectively. Whether you’re a seasoned data analyst, a data engineer, or even someone new to the world of data science, you’ll find valuable insights and practical examples that can be applied directly to real-world challenges. To illustrate the diverse ways in which Polars can benefit different users, let’s take a look at two key personas: Hanna, a seasoned data analyst, and Kosjo, an experienced data engineer.
Hanna: The Data Analyst
Hanna is a seasoned data analyst. She’s comfortable with Python and has a good grasp of pandas, but occasionally struggles with its syntax and feels there must be a more elegant way to perform certain operations. Like many analysts, she regularly tackles exploratory data analysis (EDA) tasks that involve cleaning, transforming, and summarizing large datasets. However, she often finds herself battling with pandas’ sometimes complex and unintuitive syntax, especially when it comes to performing more advanced data manipulations or scaling her work to larger datasets.
For someone like Hanna, this book offers a streamlined, more intuitive alternative to pandas, with the added benefit of being able to handle data at a larger scale without sacrificing speed or readability. Polars provides a more Pythonic and performant way to perform the types of analyses Hanna does daily. By learning Polars, Hanna can simplify her workflow, write more elegant code, and unlock greater performance in her exploratory data analysis tasks.
Kosjo: The Data Engineer
Kosjo is an experienced data engineer, tasked with processing large volumes of data and building pipelines that support complex data workflows. They are highly skilled in Python and work with various technologies to ensure smooth data movement and processing. As part of their role, Kosjo is often responsible for optimizing processes to reduce infrastructure costs, especially when working with big data. This means reducing the time and resources required for heavy transformations without having to manage a distributed computing cluster.
Polars can help Kosjo achieve these goals. It is designed for speed and performance, especially when dealing with large datasets or intensive transformations. Its parallel execution model allows Kosjo to process data faster than traditional pandas, while its intuitive API keeps development simple. This book will guide Kosjo through leveraging Polars for complex data engineering tasks, enabling them to scale their workflows efficiently without the overhead of distributed systems or dealing with complex setup configurations.
A Broader Audience
In addition to these two personas, this book is also for data scientists, software engineers, and anyone else working with Python who is looking to explore the capabilities of Polars. Whether you handle small to medium-sized datasets or need to process terabytes of data, Polars offers a unified, high-performance approach to working with data. If you’re looking for a faster, more elegant way to analyze and manipulate your data without compromising on readability, this book will serve as a valuable resource to enhance your data-handling skills.
In summary, whether you’re looking to improve your day-to-day data analysis or streamline your data engineering workflows, Python Polars: The Definitive Guide is designed to help you unlock the full potential of Polars and solve data challenges with speed and elegance.
How This Book Is Organized
This book contains 18 chapters spread over five parts, followed by an appendix. Each chapter starts with a short introduction to the topics we’ll discuss and concludes with key takeaways.
Part 1: Begin
The first part, “Begin,” contains the first three chapters of the book. These chapters are meant to introduce you to Polars, get you up and running, and help you start using it yourself.
Chapter 1 discusses what Polars is, explains why you should use it, and demonstrates its capabilities through a showcase. Chapter 2 covers everything you need to get started with Polars, including instructions on how to install Polars and how to get the code and data used in this book. If you have any experience using pandas, then Chapter 3 will help you transition to Polars by explaining and showing the differences between the two packages.
Part 2: Form
The name of the second part, “Form,” has two meanings: it’s about the form of data structures and data types, and about forming DataFrames from some source. In other words, you’ll learn how to read and write data, and how this data is stored and handled in Polars.
Chapter 4 provides an overview of the data structures and data types that Polars supports and how missing data is handled. Chapter 5 explains the difference between the eager API, which is used for quick results, and the lazy API, which is used for optimized performance. Chapter 6 covers how to read and write data from and to various file formats, such as CSV, Parquet, and Arrow.
Part 3: Express
Expressions play a central role within Polars, so it’s only fitting that this third part, “Express,” is in the middle of the book.
Chapter 7 starts with examples of where expressions are used, provides a formal definition of an expression, and explains how you can create them. Chapter 8 enumerates the many methods for continuing expressions, including mathematical operations, working with missing values, applying smoothing, and summarizing. Chapter 9 shows how to combine multiple expressions using, for example, arithmetic and Boolean logic.
Part 4: Transform
Once you understand expressions, you can incorporate them into functions and methods to transform your data, which is what this fourth part, “Transform,” is all about.
Chapter 10 explains how to select and create columns and work with column names and selectors. Chapter 11 shows the different ways of filtering and sorting rows. Chapter 12 covers how to work with textual, temporal, and nested data types. Chapter 13 goes into grouping, aggregating, and summarizing data. Chapter 14 explains how to combine different DataFrames using joins and concatenations. Chapter 15 shows how to reshape data, through (un)pivoting, stacking, and extending.
Part 5: Advance
The last part of this book, “Advance,” contains a variety of more advanced topics.
Chapter 16 explains how to visualize data using a selection of visualization packages, including Altair, hvPlot, and plotnine. Chapter 17 shows how you can extend Polars with custom Python functions and your own Rust plugins. Chapter 18 looks behind the curtains of Polars, explaining how it’s built, how it works under the hood, and why it’s so fast.
The book concludes with an appendix that covers how to leverage the power of GPUs to accelerate Polars, offering insights into maximizing performance.
Get Free Sample Chapter
To get a good idea of what the book is all about, you can read the first chapter for free. This chapter discusses what Polars is, explains why you should use it, and demonstrates its capabilities through a showcase. Enter your name and email address below to receive an email with a link to the PDF (4.7 MB).
Feel free to unsubscribe from the newsletter once you’ve got the PDF. Stay subscribed if you’d like to receive future updates about our book and other resources related to Polars.