Modern companies are investing in data warehouses that centralize valuable information about their operations, products, finances, and more. Those warehouses often contain millions upon millions of records that can enable smarter, data-driven business decisions. But working with such massive datasets can quickly bog down data exploration and data visualization tools, leaving analysts waiting for charts to generate — if their tools don’t crash while trying. 

In Observable Canvases, we want users to spend time answering questions with their data, not waiting for visualizations to load. Here, we describe why charts that represent a lot of data tend to be slow, and how we keep charts fast and useful in canvases. 

Why big data visualizations are slow to load

Data visualizations that pull in huge amounts of data are slow for several reasons. 

In business analytics, the records you want to visualize often live in a cloud data warehouse. If you’re writing queries that return large numbers of records, you might be in for a long wait due to database latency (how long it takes for the database server to process and return a query response) and network latency (how long it takes to transfer data across a network, often from the database to the client).

Generating a chart from a large number of points also takes time, whether it is rendered server-side or built on the client in JavaScript, as we do in canvases. Adding millions of marks to a chart area is computationally expensive, and can exceed your device’s processing power. Even small changes to chart options, like updating the blend mode or mark opacity, can tank performance. When you reach the limits of what your computer, app, or graphics software can handle, you can be left watching a spinning wheel that eventually turns into a crash notification.

A schematic illustration showing where bottlenecks occur in data visualization with large datasets, particularly during slow data processing and transfer, and during chart rendering.

Visualizing large datasets can get bogged down while waiting for data from your company’s data warehouse, and during computationally expensive chart rendering.

When data visualizations are slow to render, you’re left waiting for a new version of your chart to appear each time you make a change. Waiting for charts to load stifles experimentation, iteration, and fluid data exploration. 

But it doesn’t have to be that way. Here’s how we keep charts snappy in Observable Canvases.

How we make faster charts in canvases

In Observable Canvases, we keep visualizations fast in part by using automated queries that return a smaller, aggregated version of the data, at a resolution still high enough to produce rich and responsive charts. The goal is to cut down how much data is transferred, and to reduce the data processing your browser must do to generate a chart.

Let’s consider an example. We want to make a histogram to explore the distribution of 8 million order prices from an online store. One way to make the chart is to write a query that returns all 8 million individual records and then do the necessary binning and counting client-side when the chart is created. That could be slow at both the data transfer and chart rendering bottlenecks.

Instead, we send an optimized query to the database server that returns data aggregated at the resolution needed to make the chart, but substantially smaller than the raw data. For our histogram example, that query might return counts of the 8 million records in 1,000 bins, which is far more bins than any reasonable histogram should contain. That reduces the starting point from 8 million records to just 1,000 values (one count per bin), plus information about the bin endpoints.
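As a rough sketch of what such a first-pass query could look like, the helper below builds a SQL string that counts records in equal-width bins. Everything specific here is an assumption for illustration: the `buildHistogramQuery` helper, the `orders` table, the `price` column, the 0 to 500 price range, and the generic SQL dialect. It is not the query that canvases actually generate.

```javascript
// Hypothetical helper: builds a first-pass histogram query that returns
// one row per non-empty bin instead of one row per record.
// Table name, column name, and SQL dialect are illustrative assumptions.
function buildHistogramQuery({table, column, min, max, bins = 1000}) {
  if (!(max > min)) throw new Error("max must be greater than min");
  const width = (max - min) / bins; // width of each fine bin
  // NOTE: identifiers are interpolated directly for brevity; real code
  // should validate or quote them to avoid SQL injection.
  return `
SELECT
  LEAST(FLOOR((${column} - ${min}) / ${width}), ${bins - 1}) AS bin,
  COUNT(*) AS count
FROM ${table}
WHERE ${column} BETWEEN ${min} AND ${max}
GROUP BY 1
ORDER BY 1`.trim();
}

const histogramSql = buildHistogramQuery({
  table: "orders",   // assumed table name
  column: "price",   // assumed column name
  min: 0,
  max: 500
});
console.log(histogramSql);
```

The `LEAST(...)` clamp keeps a value exactly equal to `max` in the last bin rather than creating an extra one. However many rows the table holds, the result has at most `bins` rows.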

After a first pass that returns aggregated data at a higher resolution than what your final chart needs, you can then do subsequent aggregations for coarser bins to produce the final histogram.

Schematic illustration showing initial data aggregation in a database query that returns aggregated data, but at a higher level of resolution than what is needed in the data visualization. Further aggregation on the client can simplify visuals further.

In Observable Canvases, charts are fast because SQL queries return a smaller subset or aggregated version of the data instead of returning all individual values represented in the chart. In the example above, the SQL query returns aggregated counts, but at a much higher resolution than what’s needed for the final chart.
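The second, client-side aggregation step can be sketched in a few lines of plain JavaScript. This is a minimal illustration under assumptions, not the code canvases or Observable Plot run: the hypothetical `rebin` function takes an array of fine-bin counts and merges every `factor` adjacent bins into one coarser bin.

```javascript
// Merge every `factor` adjacent fine bins into one coarse bin.
// `counts[i]` is the number of records in fine bin i.
function rebin(counts, factor) {
  if (!Number.isInteger(factor) || factor < 1) {
    throw new Error("factor must be a positive integer");
  }
  const coarse = new Array(Math.ceil(counts.length / factor)).fill(0);
  counts.forEach((count, i) => {
    coarse[Math.floor(i / factor)] += count;
  });
  return coarse;
}

const fineCounts = [1, 2, 3, 4, 5, 6, 7, 8];
const coarseCounts = rebin(fineCounts, 4);
console.log(coarseCounts); // [10, 26]
```

Because coarse bins are built by summing whole fine bins, totals are preserved and no raw records are needed. If the number of fine bins is not a multiple of `factor`, the last coarse bin covers fewer fine bins than the others, which is one reason to pick a fine bin count with many divisors.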

Charts made this way can still be responsive, because they have access to the aggregated but higher resolution data returned by the original query. We don’t need to run additional SQL queries against the database to re-bin values at this point: we can simply let the visualization tool (in this case, Observable Plot) do the work of additional aggregation. For example, in the histogram below the number of bins increases as a user expands the chart width to reveal more detail as space becomes available.

When the returned data is aggregated at a high resolution, charts can still be responsive as shown in the histogram above. The responsive granularity helps keep charts clear and interpretable as dimensions change.
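One way to think about responsive granularity is as a function from chart width to bin count. The sketch below is an assumption-laden illustration, not how canvases or Observable Plot choose bins: the hypothetical `binsForWidth` function aims for bars at least `minBarWidth` pixels wide (12 is an arbitrary choice) and snaps to a divisor of the fine bin count, so that coarse bin edges line up with the fine bin edges returned by the query.

```javascript
// Pick a bin count for a given chart width (in pixels).
// Snaps down to a divisor of `fineBins` so fine bins merge evenly.
function binsForWidth(chartWidth, fineBins = 1000, minBarWidth = 12) {
  const target = Math.max(1, Math.floor(chartWidth / minBarWidth));
  for (let n = Math.min(target, fineBins); n >= 1; n--) {
    if (fineBins % n === 0) return n;
  }
  return 1;
}

console.log(binsForWidth(320));  // 25
console.log(binsForWidth(960));  // 50
console.log(binsForWidth(1440)); // 100
```

As the chart widens, the bin count steps up through the divisors of 1,000, and each step only requires re-summing counts already in the browser.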

Better exploratory charts don’t always show everything

Returning an aggregated or abbreviated version of the data keeps visualizations fast, but may also leave you feeling that something is lost in the process. We get it. It can be comforting to have each data point in hand, ready to be individually inspected…even if that never actually happens. 

We also know that with really large datasets, speed isn’t the only issue. When charts try to show too much, they quickly become visually overwhelming and less clear. Meaningful patterns get muddled in the noise. 

Data aggregation, binning, and thoughtful truncation can produce more digestible charts that help viewers focus on important patterns, instead of getting lost in the weeds. 

That’s why, in some cases, we aggregate data in our first-pass SQL query as described above, then simplify even further for the final chart. For example, the SQL query for our built-in scatterplot does some initial aggregation by counting observations within bins based on x- and y-coordinates. At most 10,000 bins (the top 10,000 by count) are displayed; a text annotation ensures that a user knows when additional bins are hidden.
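The truncation step described here can be sketched as follows. The `truncateBins` function, the shape of the bin objects, and the wording of the note are all hypothetical; the sketch only shows the idea of keeping the top bins by count and reporting how many were left out.

```javascript
// Keep only the `limit` bins with the highest counts, and report how
// many bins were hidden so the chart can tell the viewer.
// Each bin is assumed to look like {x, y, count}.
function truncateBins(bins, limit = 10000) {
  const sorted = [...bins].sort((a, b) => b.count - a.count);
  const shown = sorted.slice(0, limit);
  const hidden = sorted.length - shown.length;
  const note = hidden > 0
    ? `Showing the top ${shown.length} of ${sorted.length} bins`
    : null;
  return {shown, hidden, note};
}

const sampleBins = [
  {x: 0, y: 0, count: 5},
  {x: 1, y: 0, count: 9},
  {x: 0, y: 1, count: 2}
];
const truncated = truncateBins(sampleBins, 2);
console.log(truncated.note); // "Showing the top 2 of 3 bins"
```

Returning the hidden count alongside the visible bins keeps the truncation honest: the chart stays light, and the viewer still knows that something was left out.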

These conditions produce a simpler but still information-rich chart that helps viewers discern major patterns in their data. Viewers can also lean on chart interactivity in canvases to get a closer look. Brushing over a region of the scatterplot below produces a focused version of the chart, automatically re-running the binning query over the smaller extent to show the selected data at higher granularity.

A dense scatterplot showing the relationship between temperature and electricity usage for Norwegian buildings. A note in the top right informs a viewer that the data has been truncated.

A default scatterplot in a canvas, visualizing outdoor temperature and electricity imports for Norwegian buildings. Dot size differs based on the count of observations in each 2-D bin that are returned by the “first pass” SQL query. Note the warning in the top right informing a viewer that values are truncated. Data: Lien et al. (2025)
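To see why re-binning over a smaller extent reveals more detail, consider this in-memory sketch. In canvases the re-binning happens in a new query against the database; here a hypothetical `bin2d` function and a handful of made-up points stand in for that query, purely to illustrate the effect of keeping the grid size fixed while shrinking the extent.

```javascript
// Count points in a grid x grid lattice laid over `extent`.
// Points outside the extent are ignored. Returns Map("i,j" -> count).
function bin2d(points, extent, grid = 100) {
  const [x0, x1] = extent.x;
  const [y0, y1] = extent.y;
  const counts = new Map();
  for (const [x, y] of points) {
    if (x < x0 || x > x1 || y < y0 || y > y1) continue;
    // Clamp so a point on the upper edge lands in the last bin.
    const i = Math.min(grid - 1, Math.floor(((x - x0) / (x1 - x0)) * grid));
    const j = Math.min(grid - 1, Math.floor(((y - y0) / (y1 - y0)) * grid));
    const key = `${i},${j}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Made-up sample points: three close together, one far away.
const points = [[10.15, 5.15], [10.45, 5.35], [10.95, 5.85], [42, 21]];

// Full view: the three nearby points collapse into a single bin.
const overview = bin2d(points, {x: [0, 100], y: [0, 50]}, 10);
console.log(overview.size); // 2

// Brushed view: same grid size, smaller extent, so they separate.
const brushed = bin2d(points, {x: [10, 11], y: [5, 6]}, 10);
console.log(brushed.size); // 3
```

The number of bins returned stays bounded by the grid size in both views, so the focused chart is no more expensive to transfer or draw than the overview.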

Carefully aggregated and abbreviated data can be a win-win for speed and clarity, especially in the messy middle of data analysis. But you’re not constrained to using our built-in charts in canvases. You can always write your own custom queries, and build bespoke charts in JavaScript, to create different (albeit potentially slower) views of your data.

Get past data visualization bottlenecks

In Observable Canvases, built-in SQL queries can return smaller, aggregated versions of data to create fast and clear charts, no matter how many records you’ve got. And you don’t have to follow our recipe for speed: you can build your own SQL queries and charts in JavaScript for fully customized analysis and visualization.

Learn how the speed, fluidity, and flexibility of data work in canvases help your team and stakeholders uncover and answer business questions together.