Building Analytics Apps for External Users? Here's How to Get it Right
June 01, 2022

David Wang

Analytics has come a very long way in recent years, with transformational developments widening its impact beyond internal stakeholders and into the hands of external users. This brings with it a range of challenges, and when building analytics applications for customers, for example, one of the first considerations will be the choice of database backend.

While the main options will probably include PostgreSQL, MySQL, or even extending a data warehouse beyond its core BI dashboards and reports, it's important to keep in mind that analytics for external users can be revenue-impacting. As a result, choosing the right tool for the job is essential if organizations are to deliver a high-quality user experience.

Analytics Performance

For many users, the most frustrating element of their analytics experience is performance, and in particular, the wait-state of queries in a processing queue. It's one thing to have an internal business analyst wait a few seconds or even several minutes for a report to load; it's entirely different when analytics functionality is being offered to external users whose tolerance of processing delays will be much lower.

This problem is generally caused by the amount of data to analyze with the processing power of the database and the number of users and API calls. Collectively, this determines how well the database can keep up with the application.

However, there are a variety of approaches to building an interactive data experience with any generic OLAP database when there's a lot of data. The issue? These come at a price. For instance, precomputing all the queries makes the architecture very expensive and rigid, while aggregating the data first minimizes the insights, and limiting the data analyzed to recent events doesn't give users the complete picture. All of these ways involve making compromises.

There is, however, a no-compromise approach that can deliver an optimized architecture and data format built for interactivity at scale. This comes in the form of Apache Druid — a high-performance, real-time analytics database to power analytics applications for any number of users.

Druid employs a uniquely distributed and elastic architecture that prefetches data from a shared data layer into a near-infinite cluster of data servers. This architecture enables faster performance than a decoupled query engine like a cloud data warehouse because there's no data to move and more scalability than a scale-up database like PostgreSQL/MySQL.

Furthermore, Druid provides automatic, multi-level indexing that is built into the data format to drive more queries per core. This goes beyond the typical OLAP columnar format with the addition of a global index, data dictionary, and bitmap index. In doing so, it maximizes CPU cycles for faster crunching.

High Availability Should be a High Priority

To illustrate the value of these capabilities, consider this scenario: if a dev team is building a backend for internal reporting, does it really matter if it goes down for a few minutes or even longer?

For most, the answer is probably not and explains why there's always been tolerance for unplanned downtime and maintenance windows in classical OLAP databases and data warehouses.

But what if the dev team then needs to build an external analytics application that customers will use?

Any outages here can impact revenue, with a serious knock-on effect on issues as varied as team resources to customer satisfaction. As a result, resilience — both high availability and data durability — must be a priority when choosing a database for external analytics applications.

Delivering resilience means posing some important design criteria questions — can you protect from a node or a cluster-wide failure?

How bad would it be to lose data?

What work is involved to protect your app and your data?

The legacy approach to achieving greater resiliency is to replicate nodes and to remember to take backups. But when dev teams are building apps for customers, the sensitivity to data loss is much higher, and as a result, occasional backups aren't fit for purpose.

In contrast, Druid's core architecture is designed to withstand downtime without losing data (even recent events) by implementing high availability (HA) and durability based on automatic, multi-level replication with shared data in S3/object storage. This not only enables the HA properties dev teams expect but also a form of continuous backup that automatically protects and restores the latest state of the database even if an entire cluster is lost.

Cost-Performance Benefits

Building a database that delivers high concurrency means striking the right balance between CPU usage, scalability, and cost. Historically, addressing concurrency was a matter of allocating more hardware to the challenge, and while adding more CPUs certainly allows organizations to run more queries, it can easily become very expensive.

In contrast, databases like Apache Druid are built with optimized storage and query engine that drives down CPU usage. By only reading the data it needs to, the infrastructure can serve more queries in the same timespan.

This is also an important consideration when building external applications that will deliver the performance and resilience required both today and in the future. For those organizations focused on customer retention, being able to scale their infrastructure is key to remaining competitive.

David Wang is VP of Product Marketing at Imply
Share this

Industry News

September 22, 2022

Katalon announced the launch of the Katalon Platform, a modern and comprehensive software quality management platform that enables teams of any size to easily and efficiently test, launch, and optimize apps, products, and software.

September 22, 2022

StackHawk announced its Deeper API Security Test Coverage release.

September 21, 2022

Platform9 announced the launch of its latest open source project, Arlon.

September 21, 2022

Redpanda Data announced Redpanda Console.

September 21, 2022

mabl announced its availability as a private listing on Google Cloud Marketplace.

September 21, 2022

Zesty announced a $75 million Series B funding round led by B Capital and Series A investor Sapphire Ventures.

September 20, 2022

Opsera, the Continuous Orchestration platform for DevOps, announced a free trial of its no-code Salesforce Release Management platform for fast and secure Salesforce releases.

September 20, 2022

Sysdig announced ToDo and Remediation Guru.

September 20, 2022

AutoRABIT announced CodeScan Shield.

September 19, 2022 announced the general availability of the Akuity Platform, a fully-managed SaaS service for simpler, safer and faster Kubernetes application delivery, using Argo.

September 19, 2022

Rocket Software launched Rocket® Support for Zowe, a supporting offering for the Open Mainframe Project’s Zowe® open-source framework for z/OS® and its multiple modern interfaces.

September 19, 2022

Appfire announced the acquisition of German company 7pace.

September 15, 2022

Dell Technologies is expanding its long-standing strategic relationship with Red Hat to offer new solutions that simplify deploying and managing on-premises, containerized infrastructure in multicloud environments.

September 15, 2022

Postman announced Postman v10, the most significant upgrade to the platform in almost a year, offering new features around API governance and security, as well as expanded capabilities in collaboration and integration—and higher productivity than ever.

September 15, 2022

Harness announced the general availability of fully managed Harness GitOps-as-a-Service to enable enterprise continuous delivery (CD) workflows for application and infrastructure deployments.