
Big Data reflects today’s world, where data-generating events are measured in the billions and business decisions based on insights derived from that data must be made in seconds. Few tools provide deep insight into both live and historical data while business events are still occurring; Druid was designed specifically to serve this purpose.

If you’re not familiar with Druid, it’s a powerful, open source, real-time analytics database designed to allow queries on large quantities of streaming data, which means querying data as it’s being ingested into the system (see previous blog post). Many databases claim they are real-time because they are “real fast”; this usually works for smaller workloads or for customers with infinite IT budgets. For companies like Netflix, whose engineers use Druid to cull through 70 billion log events per day, ingesting over 2 TB per hour at peak times (more on this in a later blog post), speed alone is not enough: real-time means querying events the moment they arrive, while ingestion is still under way.
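To put the Netflix figures in per-second terms, here is a rough back-of-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes) and spreads the daily event count evenly over 24 hours. Real traffic is bursty, so the true peak event rate is higher than this average. The daily average and the peak-hour bandwidth describe different windows, so they should not be combined into a per-event size.

```python
# Back-of-envelope conversion of the Netflix figures quoted above.
# Assumption: decimal units (1 TB = 10**12 bytes); binary units would
# give a slightly higher bandwidth figure.

SECONDS_PER_DAY = 24 * 60 * 60    # 86,400
SECONDS_PER_HOUR = 60 * 60        # 3,600


def events_per_second(events_per_day: float) -> float:
    """Average event rate implied by a daily count (assumes an even spread)."""
    return events_per_day / SECONDS_PER_DAY


def bytes_per_second(terabytes_per_hour: float) -> float:
    """Ingest bandwidth implied by a terabytes-per-hour figure (decimal TB)."""
    return terabytes_per_hour * 10**12 / SECONDS_PER_HOUR


if __name__ == "__main__":
    avg_rate = events_per_second(70e9)   # 70 billion log events per day
    peak_bw = bytes_per_second(2)        # 2 TB per hour at peak
    print(f"average: {avg_rate:,.0f} events/s")        # average: 810,185 events/s
    print(f"peak ingest: {peak_bw / 1e6:,.0f} MB/s")   # peak ingest: 556 MB/s
```

In other words, a query issued at any moment has hundreds of thousands of new events per second arriving behind it, which is why "fast after loading" and "real-time" are not the same thing at this scale.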