Small Data

Small data appears to be a very exciting movement that is shifting the Overton window away from Big Data and toward much simpler and cheaper solutions. Its main drivers are the following:

1. Organizations don’t use that much data
2. Hardware is getting really, really good

1. Orgs don’t use much data

Of queries that scan at least 1 MB, the median query scans about 100 MB. The 99.9th percentile query scans about 300 GB. Analytic databases like Snowflake and Redshift are “massively parallel processing” systems, but 99.9% of real world queries could run on a single large node. I did the analysis for this post using DuckDB, and it can scan the entire 11 GB Snowflake query sample on my Mac Studio in a few seconds. The small data hypothesis is true. src:…
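
To get a feel for why those query sizes fit on a single node, here is a rough back-of-envelope sketch. It uses the two scan sizes quoted above (100 MB and 300 GB); the throughput figure is purely a hypothetical placeholder and not a measurement from the source, so swap in a number measured on your own hardware.

```python
# Back-of-envelope scan-time estimate for a single node.
# ASSUMPTION: ASSUMED_THROUGHPUT is a hypothetical placeholder, not a benchmark.

MB = 10**6
GB = 10**9


def scan_seconds(bytes_scanned: int, bytes_per_second: float) -> float:
    """Time to read bytes_scanned sequentially at a fixed throughput."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return bytes_scanned / bytes_per_second


# Scan sizes quoted in the text above.
MEDIAN_QUERY_BYTES = 100 * MB
P999_QUERY_BYTES = 300 * GB

# Hypothetical sustained scan rate for one machine (bytes per second).
ASSUMED_THROUGHPUT = 5 * GB

if __name__ == "__main__":
    for label, size in [
        ("median query", MEDIAN_QUERY_BYTES),
        ("99.9th percentile query", P999_QUERY_BYTES),
    ]:
        print(f"{label}: {scan_seconds(size, ASSUMED_THROUGHPUT):.2f} s")
```

This simple model ignores compression, column pruning, caching, and parallel reads, all of which tend to change the real number, so treat the result as an order-of-magnitude estimate only.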
