Building Analytics Apps for External Users? Here's How to Get it Right
June 01, 2022

David Wang
Imply

Analytics has come a long way in recent years, with transformational developments widening its impact beyond internal stakeholders and into the hands of external users. This shift brings a range of challenges, and when building analytics applications for customers, one of the first considerations is the choice of database backend.

While the main options will probably include PostgreSQL, MySQL, or even extending a data warehouse beyond its core BI dashboards and reports, it's important to keep in mind that analytics for external users can be revenue-impacting. As a result, choosing the right tool for the job is essential if organizations are to deliver a high-quality user experience.

Analytics Performance

For many users, the most frustrating element of their analytics experience is performance, and in particular, the wait-state of queries in a processing queue. It's one thing to have an internal business analyst wait a few seconds or even several minutes for a report to load; it's entirely different when analytics functionality is being offered to external users whose tolerance of processing delays will be much lower.

This problem generally comes down to three factors: the amount of data to analyze, the processing power of the database, and the number of concurrent users and API calls. Collectively, these determine how well the database can keep up with the application.

However, there are a variety of approaches to building an interactive data experience on a generic OLAP database when there's a lot of data. The issue? They all come at a price. Precomputing all the queries makes the architecture expensive and rigid; aggregating the data first discards detail and limits the insights available (as the sketch below illustrates); and restricting analysis to recent events denies users the complete picture. Every one of these approaches involves a compromise.
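To make the second compromise concrete, here is a minimal Python sketch (using pandas and hypothetical event data) of the aggregate-first approach. Once events are rolled up to hourly totals, questions about individual events can no longer be answered:

    # A minimal sketch of the aggregate-first compromise, using pandas
    # and hypothetical event data.
    import pandas as pd

    events = pd.DataFrame({
        "timestamp": pd.to_datetime(["2022-06-01 10:05",
                                     "2022-06-01 10:40",
                                     "2022-06-01 11:15"]),
        "user_id": ["a", "b", "a"],
        "latency_ms": [120, 340, 95],
    })

    # Aggregate first: cheap to store and query, but per-event detail
    # (which user, which request) is discarded for good.
    hourly = events.groupby(pd.Grouper(key="timestamp", freq="1h")).agg(
        requests=("user_id", "count"),
        avg_latency_ms=("latency_ms", "mean"),
    )
    print(hourly)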

There is, however, a no-compromise approach that can deliver an optimized architecture and data format built for interactivity at scale. This comes in the form of Apache Druid — a high-performance, real-time analytics database to power analytics applications for any number of users.

Druid employs a uniquely distributed and elastic architecture that prefetches data from a shared data layer into a near-infinite cluster of data servers. This design enables faster performance than a decoupled query engine like a cloud data warehouse, because there's no data to move at query time, and more scalability than a scale-up database like PostgreSQL or MySQL.
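In practice, an application talks to the cluster much like any other database. As a simple illustration, the Python sketch below issues a query through Druid's SQL-over-HTTP endpoint; the router address and the "clickstream" datasource are hypothetical:

    # A minimal sketch of querying Druid over its SQL HTTP API.
    # Assumes a Druid router on localhost:8888 and a hypothetical
    # datasource named "clickstream".
    import requests

    query = """
        SELECT channel, COUNT(*) AS events
        FROM clickstream
        WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
        GROUP BY channel
        ORDER BY events DESC
    """

    resp = requests.post("http://localhost:8888/druid/v2/sql",
                         json={"query": query})
    resp.raise_for_status()
    for row in resp.json():
        print(row)

Because the data servers already hold prefetched, indexed segments, a query like this fans out across the cluster and returns without first pulling data from deep storage.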

Furthermore, Druid provides automatic, multi-level indexing that is built into the data format to drive more queries per core. This goes beyond the typical OLAP columnar format with the addition of a global index, data dictionary, and bitmap index. In doing so, it maximizes CPU cycles for faster crunching.
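To see why these indexes save CPU, consider the toy Python model below of dictionary encoding plus a bitmap index. It is illustrative only, not Druid's actual implementation, but it shows how a filter collapses into cheap bitwise operations instead of a row-by-row scan:

    # Toy model of dictionary encoding plus a bitmap index.
    # Illustrative only; not Druid's actual implementation.
    column = ["US", "UK", "US", "DE", "UK", "US"]

    # Dictionary encoding: each distinct value is stored once.
    dictionary = {value: i for i, value in enumerate(dict.fromkeys(column))}

    # Bitmap index: one bitmask per distinct value, with bit i set
    # when row i holds that value.
    bitmaps = {value: 0 for value in dictionary}
    for row, value in enumerate(column):
        bitmaps[value] |= 1 << row

    # The filter country IN ('US', 'UK') becomes a single bitwise OR.
    matches = bitmaps["US"] | bitmaps["UK"]
    print([row for row in range(len(column)) if matches >> row & 1])
    # [0, 1, 2, 4, 5]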

High Availability Should Be a High Priority

To illustrate the value of these capabilities, consider this scenario: if a dev team is building a backend for internal reporting, does it really matter if it goes down for a few minutes or even longer?

For most, the answer is probably not, which explains why there has always been a tolerance for unplanned downtime and maintenance windows in classical OLAP databases and data warehouses.

But what if the dev team then needs to build an external analytics application that customers will use?

Any outage here can impact revenue, with serious knock-on effects on everything from team resources to customer satisfaction. As a result, resilience — both high availability and data durability — must be a priority when choosing a database for external analytics applications.

Delivering resilience means posing some important design criteria questions: Can you protect against a node failure or a cluster-wide failure? How bad would it be to lose data? And what work is involved in protecting your app and your data?

The legacy approach to achieving greater resiliency is to replicate nodes and remember to take backups. But when dev teams are building apps for customers, the sensitivity to data loss is much higher, and occasional backups simply aren't fit for purpose.

In contrast, Druid's core architecture is designed to withstand downtime without losing data (even recent events) by implementing high availability (HA) and durability based on automatic, multi-level replication with shared data in S3/object storage. This not only delivers the HA properties dev teams expect, but also provides a form of continuous backup that automatically protects and restores the latest state of the database even if an entire cluster is lost.
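As an illustration, directing a Druid cluster at S3 deep storage comes down to a few common runtime properties; the bucket name and path below are hypothetical:

    # Hypothetical excerpt from common.runtime.properties, pointing
    # segment storage at S3 so data survives the loss of a cluster.
    druid.storage.type=s3
    druid.storage.bucket=my-druid-deep-storage
    druid.storage.baseKey=druid/segments

With segments persisted in object storage, a replacement cluster can reload the latest state of the database rather than depending on the most recent manual backup.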

Cost-Performance Benefits

Building a database that delivers high concurrency means striking the right balance between CPU usage, scalability, and cost. Historically, addressing concurrency was a matter of allocating more hardware to the challenge, and while adding more CPUs certainly allows organizations to run more queries, it can easily become very expensive.

In contrast, databases like Apache Druid are built with an optimized storage format and query engine that drive down CPU usage. By reading only the data a query actually needs, the infrastructure can serve more queries in the same timespan.
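Some back-of-envelope arithmetic (with purely illustrative numbers) shows why per-query CPU efficiency translates directly into concurrency on the same hardware:

    # Purely illustrative numbers, not benchmark results:
    # sustainable queries per second = cores / CPU-seconds per query.
    cores = 32
    cpu_seconds_per_query = 0.050          # 50 ms of CPU per query

    print(cores / cpu_seconds_per_query)   # 640.0 queries/second

    # Halve the CPU cost per query (e.g., via indexes that skip data)
    # and the same hardware sustains twice the query rate.
    print(cores / (cpu_seconds_per_query / 2))  # 1280.0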

This is also an important consideration when building external applications that must deliver the required performance and resilience both today and in the future. For organizations focused on customer retention, the ability to scale their infrastructure is key to remaining competitive.

David Wang is VP of Product Marketing at Imply