Unlocking OLAP in MySQL

This is a preliminary draft - see avaitla16.com/how-i-write

Historically you've needed an OLTP (online transaction processing system - like mysql, postgres, sqlserver, oracle) database to run transactions on and then an ETL to an OLAP system to perform reports, the modern data stack including:

  • Modern Data Stack Warehouses: Bigquery, Databricks, Snowflake, Redshift, Clickhouse, Materialize, etc
  • Enterprise Products: Sybase IQ, Vertica, Informix, Teradata, Oracle Warehouse etc

Warehouses generally operate with massive amounts of compute in memory, columnar storage, and ruthless parallelization (often times eschewing indexes all together). They also come with helpful SQL that's intended for Analysts convenience (pivoting, unnest, statistics or ml functions). An extensive background can be found here: holistics.io/blog/the-rise-and-fall-of-the-..

A discussion on the separation of OLTP and OLAP is given here: philip.greenspun.com/wtr/data-warehousing.h.. and databases which attempt to combine them is called HTAP (en.wikipedia.org/wiki/Hybrid_transactional/..)

I believe the need to have a separate OLTP and OLAP is a constraint solely due to technology and splitting the two causes a lot of problems:

  • A social separation of the development team from the data team (devs generally understand their OLTP syntax but not the OLAP syntax, and data teams understand the OLAP syntax but maybe not the OLTP one). This causes a gap in data understanding as developers who build the features and no many nuances aren't doing reporting against it. This is given to data engineering who need to unravel the development teams schemas after the fact.
  • Lack of realtime analytics (fivetrans minimum sync frequency is 5 minutes and from there materializations may run nightly)
  • There are computations that may be sophisticated that you only want to do once and not have repeated in OLAP (for risk of incorrect calculations). This includes a persons invoice which they need to see in OLTP but also finance needs to see in an analytics report.
  • Early stage companies may create more overhead of having two systems rather than doing everything in one place and artificially have to adopt the syntax and nuances of an OLAP system - often times the dev team is also much larger than the analysts team so centralization in one HTAP system means .

Indeed for folks using Postgres, they often use it for reporting as well on small datasets so you only need to know the Postgres dialect. Citus Data is one such HTAP example for postgres with distributed tables (citusdata.com), but you can also see tuning advice here (narrator.ai/blog/using-postgresql-as-a-data..). Similarly with SQL Server can also run reporting queries with a different storage engine seamlessly under the hood in SQL Server Reporting Services.

To date this has not been possible in MySQL. However there are a key set of things that have changed in the past ten years to start making this possible now:

We'll now go through a sample of how MySQL can be a blazing fast reporting engine... TODO migrate a workload and show the heatwave performance for analytics. We focus on heatwave for 100% compatibility and the ability to use it as a replication chain - so no ETL or query syntax changes needed whatsoever.