Python Polars: The Definitive Guide

Introduction

Welcome to the official website of the book Python Polars: The Definitive Guide by Jeroen Janssens and Thijs Nieuwdorp. The book is now available in both print and ebook formats at your favorite bookstore. To decide if this book is right for you (or while you’re waiting for your copy to arrive), you can:

  • Get a sneak peek by downloading the sample chapter below.
  • Read what others have to say.
  • Join the community to connect with other readers.
  • Explore the repository, where you’ll find all the code and data from the book to follow along.
  • Connect with authors Jeroen and Thijs.

Get the Book

The code is idiomatic, formatted using black, thoroughly tested, and sprinkled with helpful callouts.

Most visualizations are created with the plotnine package.

We cover many related topics such as encodings and floating point representations.

We dive deep into the concept of expressions, the building blocks of every query.

The book contains many tips, tricks, and warnings based on our own real-world experience with Polars.

Each chapter ends with useful takeaways.

Even the Great Tables package makes an appearance.

We collaborated with NVIDIA and Dell Technologies to benchmark Polars on the GPU.

Book Description

Unlock the power of Polars, a Python package for transforming, analyzing, and visualizing data. In this hands-on guide, Jeroen Janssens and Thijs Nieuwdorp walk you through every feature of Polars, showing you how to use it for real-world tasks like data wrangling, exploratory data analysis, building pipelines, and more.

Whether you’re a seasoned data professional or new to data science, you’ll quickly master Polars’ expressive API and its underlying concepts. You don’t need to have experience with pandas, but if you do, this book will help you make a seamless transition. The many practical examples and real-world datasets are available on GitHub, so you can easily follow along.

  • Process data from CSV, Parquet, spreadsheets, databases, and the cloud
  • Get a solid understanding of Expressions, the building blocks of every query
  • Handle complex data types, including text, time, and nested structures
  • Use both eager and lazy APIs, and know when to use each
  • Visualize your data with Altair, hvPlot, plotnine, and Great Tables
  • Extend Polars with your own Python functions and Rust plugins
  • Leverage GPU acceleration to boost performance even further

Who This Book Is For

This book is designed for anyone looking to leverage the power of Polars in Python to transform, analyze, and visualize data more efficiently and effectively. Whether you’re a seasoned data analyst, a data engineer, or even someone new to the world of data science, you’ll find valuable insights and practical examples that can be applied directly to real-world challenges. To illustrate the diverse ways in which Polars can benefit different users, let’s take a look at two key personas: Hanna, a seasoned data analyst, and Kosjo, an experienced data engineer.

Hanna: The Data Analyst

Hanna is a seasoned data analyst. She’s comfortable with Python and has a good grasp of pandas, but occasionally struggles with its syntax and feels there must be a more elegant way to perform certain operations. Like many analysts, she regularly tackles exploratory data analysis (EDA) tasks that involve cleaning, transforming, and summarizing large datasets. However, she often finds herself battling with pandas’ sometimes complex and unintuitive syntax, especially when it comes to performing more advanced data manipulations or scaling her work to larger datasets.

For someone like Hanna, this book offers a streamlined, more intuitive alternative to pandas, with the added benefit of being able to handle data at a larger scale without sacrificing speed or readability. Polars provides a more Pythonic and performant way to perform the types of analyses Hanna does daily. By learning Polars, Hanna can simplify her workflow, write more elegant code, and unlock greater performance in her exploratory data analysis tasks.

Kosjo: The Data Engineer

Kosjo is an experienced data engineer, tasked with processing large volumes of data and building pipelines that support complex data workflows. They are highly skilled in Python and work with various technologies to ensure smooth data movement and processing. As part of their role, Kosjo is often responsible for optimizing processes to reduce infrastructure costs, especially when working with big data. This means reducing the time and resources required for heavy transformations without having to manage a distributed computing cluster.

Polars can help Kosjo achieve these goals. It is designed for speed and performance, especially when dealing with large datasets or intensive transformations. Its parallel execution model allows Kosjo to process data faster than traditional pandas, while its intuitive API keeps development simple. This book will guide Kosjo through leveraging Polars for complex data engineering tasks, enabling them to scale their workflows efficiently without the overhead of distributed systems or dealing with complex setup configurations.

A Broader Audience

In addition to these two personas, this book is also for data scientists, software engineers, and anyone else working with Python who is looking to explore the capabilities of Polars. Whether you’re handling small to medium-sized datasets or need to process terabytes of data, Polars offers a unified, high-performance approach to working with data. If you’re looking for a faster, more elegant way to analyze and manipulate your data without compromising on readability, this book will serve as a valuable resource to enhance your data-handling skills.

In summary, whether you’re looking to improve your day-to-day data analysis or streamline your data engineering workflows, Python Polars: The Definitive Guide is designed to help you unlock the full potential of Polars and solve data challenges with speed and elegance.

How This Book Is Organized

This book contains 18 chapters spread over five parts, plus an appendix. Each chapter starts with a short introduction to the topics we’ll discuss and concludes with key takeaways.

Part 1: Begin

The first part, “Begin,” contains the first three chapters of the book. These chapters are meant to introduce you to Polars, get you up and running, and help you start using it yourself.

Chapter 1 discusses what Polars is, explains why you should use it, and demonstrates its capabilities through a showcase. Chapter 2 covers everything you need to get started with Polars, including instructions on how to install Polars and how to get the code and data used in this book. If you have any experience using pandas, then Chapter 3 will help you transition to Polars by explaining and showing the differences between the two packages.

Part 2: Form

The name of the second part, “Form,” has two meanings: it’s about the form of data structures and data types, and about forming DataFrames from some source. In other words, you’ll learn how to read and write data, and how this data is stored and handled in Polars.

Chapter 4 provides an overview of the data structures and data types that Polars supports and how missing data is handled. Chapter 5 explains the difference between the eager API, which is used for quick results, and the lazy API, which is used for optimized performance. Chapter 6 covers how to read and write data from and to various file formats, such as CSV, Parquet, and Arrow.

Part 3: Express

Expressions play a central role within Polars, so it’s only fitting that this third part, “Express,” is in the middle of the book.

Chapter 7 starts with examples of where expressions are used, provides a formal definition of an expression, and explains how you can create them. Chapter 8 enumerates the many methods for continuing expressions, including mathematical operations, working with missing values, applying smoothing, and summarizing. Chapter 9 shows how to combine multiple expressions using, for example, arithmetic and Boolean logic.

Part 4: Transform

Once you understand expressions, you can incorporate them into functions and methods to transform your data, which is what this fourth part, “Transform,” is all about.

Chapter 10 explains how to select and create columns and work with column names and selectors. Chapter 11 shows the different ways of filtering and sorting rows. Chapter 12 covers how to work with textual, temporal, and nested data types. Chapter 13 goes into grouping, aggregating, and summarizing data. Chapter 14 explains how to combine different DataFrames using joins and concatenations. Chapter 15 shows how to reshape data, through (un)pivoting, stacking, and extending.

Part 5: Advance

The last part of this book, “Advance,” contains a variety of more advanced topics.

Chapter 16 explains how to visualize data using a selection of visualization packages, including Altair, hvPlot, and plotnine. Chapter 17 shows how you can extend Polars with custom Python functions and your own Rust plugins. Chapter 18 looks behind the curtains of Polars, explaining how it’s built, how it works under the hood, and why it’s so fast.

The book concludes with an appendix that covers how to leverage the power of GPUs to accelerate Polars, offering insights into maximizing performance.

Get Free Sample Chapter

To get a good idea of what the book is all about, you can read the first chapter for free. This chapter discusses what Polars is, explains why you should use it, and demonstrates its capabilities through a showcase. Enter your name and email address below to receive an email with a link to the PDF (4.7 MB).

Feel free to unsubscribe from the newsletter once you’ve got the PDF. Stay subscribed if you’d like to receive future updates about our book and other resources related to Polars.