Building Analytics Apps for External Users? Here's How to Get it Right
June 01, 2022

David Wang
Imply

Analytics has come a very long way in recent years, with transformational developments widening its impact beyond internal stakeholders and into the hands of external users. This brings with it a range of challenges, and when building analytics applications for customers, for example, one of the first considerations will be the choice of database backend.

While the main options will probably include PostgreSQL, MySQL, or even extending a data warehouse beyond its core BI dashboards and reports, it's important to keep in mind that analytics for external users can be revenue-impacting. As a result, choosing the right tool for the job is essential if organizations are to deliver a high-quality user experience.

Analytics Performance

For many users, the most frustrating element of their analytics experience is performance, and in particular, the wait-state of queries in a processing queue. It's one thing to have an internal business analyst wait a few seconds or even several minutes for a report to load; it's entirely different when analytics functionality is being offered to external users whose tolerance of processing delays will be much lower.

This problem generally comes down to three factors: the amount of data to analyze, the processing power of the database, and the number of users and API calls. Together, these determine how well the database can keep up with the application.

There are several ways to build an interactive data experience on a generic OLAP database when there's a lot of data, but each comes at a price. Precomputing all the queries makes the architecture expensive and rigid, aggregating the data first limits the insights available, and restricting analysis to recent events doesn't give users the complete picture. Every one of these approaches involves a compromise.
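The aggregation trade-off can be made concrete with a minimal Python sketch. The event fields (`hour`, `user`, `page`) and the sample rows are hypothetical, chosen only for illustration. Once events are reduced to a count per hour and page, questions about page views are cheap to answer, but any question involving the dropped `user` field can only be answered from the raw events.

```python
from collections import Counter

# Hypothetical raw events; field names are illustrative only.
events = [
    {"hour": 9, "user": "a", "page": "/home"},
    {"hour": 9, "user": "b", "page": "/home"},
    {"hour": 9, "user": "a", "page": "/pricing"},
    {"hour": 10, "user": "c", "page": "/home"},
]

# Pre-aggregation: keep only a view count per (hour, page).
rollup = Counter((e["hour"], e["page"]) for e in events)

# Still answerable from the aggregate: views of /home during hour 9.
home_views_at_9 = rollup[(9, "/home")]

# Not answerable from the aggregate: distinct users during hour 9.
# The user field was dropped, so this needs the raw events.
distinct_users_at_9 = len({e["user"] for e in events if e["hour"] == 9})

print(len(events), "raw rows reduced to", len(rollup), "aggregate rows")
print("views of /home at 9:", home_views_at_9)
print("distinct users at 9:", distinct_users_at_9)
```

The aggregate is smaller and faster to scan, which is exactly why it is tempting, but the set of questions it can answer is fixed at the moment the data is aggregated.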

There is, however, a no-compromise approach that can deliver an optimized architecture and data format built for interactivity at scale. This comes in the form of Apache Druid — a high-performance, real-time analytics database to power analytics applications for any number of users.

Druid employs a distributed and elastic architecture that prefetches data from a shared data layer into a cluster of data servers that can scale out as needed. Because the data is already on the servers that process it, there is no data to move at query time, which enables faster performance than a decoupled query engine such as a cloud data warehouse. And because capacity grows by adding servers, it offers more scalability than a scale-up database like PostgreSQL or MySQL.

Furthermore, Druid provides automatic, multi-level indexing that is built into the data format to drive more queries per core. This goes beyond the typical OLAP columnar format with the addition of a global index, a data dictionary, and a bitmap index. As a result, each query spends fewer CPU cycles locating the rows it needs.
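To illustrate why a data dictionary and a bitmap index reduce the work per query, here is a toy Python sketch of the two general techniques. It is not Druid's implementation or storage format: the function names and sample columns are hypothetical, and a plain Python integer stands in for the compressed bitmaps a real database would use.

```python
def build_bitmap_index(column):
    """Dictionary-encode a column and build one bitmap per distinct value."""
    dictionary = {}  # value -> small integer id
    encoded = []     # the column stored as integer ids
    bitmaps = {}     # id -> int used as a bitset (bit i set means row i matches)
    for row, value in enumerate(column):
        value_id = dictionary.setdefault(value, len(dictionary))
        encoded.append(value_id)
        bitmaps[value_id] = bitmaps.get(value_id, 0) | (1 << row)
    return dictionary, encoded, bitmaps


def rows_matching(bitmap):
    """Expand a bitset into the list of row numbers it selects."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical sample columns.
country = ["US", "DE", "US", "FR", "DE", "US"]
device = ["ios", "ios", "web", "web", "ios", "ios"]

c_dict, c_enc, c_bits = build_bitmap_index(country)
d_dict, d_enc, d_bits = build_bitmap_index(device)

# A filter such as country = 'US' AND device = 'ios' becomes one bitwise AND
# over two precomputed bitmaps instead of a scan over every row.
match = c_bits[c_dict["US"]] & d_bits[d_dict["ios"]]
print(rows_matching(match))  # [0, 5]
```

The dictionary replaces repeated strings with small integers, and the bitmaps let a filter identify matching rows before any column data is read, which is where the CPU savings come from.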

High Availability Should be a High Priority

To illustrate the value of these capabilities, consider this scenario: if a dev team is building a backend for internal reporting, does it really matter if it goes down for a few minutes or even longer?

For most, the answer is probably not, which explains why there has always been tolerance for unplanned downtime and maintenance windows in classical OLAP databases and data warehouses.

But what if the dev team then needs to build an external analytics application that customers will use?

Any outage here can impact revenue, with serious knock-on effects on everything from team resources to customer satisfaction. As a result, resilience (both high availability and data durability) must be a priority when choosing a database for external analytics applications.

Delivering resilience means asking some important design questions: Can you protect against a node failure or a cluster-wide failure? How bad would it be to lose data? What work is involved in protecting your app and your data?

The legacy approach to achieving greater resiliency is to replicate nodes and to remember to take backups. But when dev teams are building apps for customers, the sensitivity to data loss is much higher, and as a result, occasional backups aren't fit for purpose.

In contrast, Druid's core architecture is designed to withstand downtime without losing data (even recent events) by implementing high availability (HA) and durability based on automatic, multi-level replication with shared data in S3/object storage. This not only enables the HA properties dev teams expect but also a form of continuous backup that automatically protects and restores the latest state of the database even if an entire cluster is lost.
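The idea of replication backed by shared object storage can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the class names, the round-robin placement, and the date-named segments are hypothetical and do not reflect Druid's actual components or APIs. The model also covers only data already written to the shared store, not events still in flight.

```python
class ObjectStore:
    """Stand-in for S3-style shared storage: the durable copy of every segment."""

    def __init__(self):
        self.segments = {}

    def put(self, segment_id, rows):
        self.segments[segment_id] = list(rows)


class DataNode:
    """Serves queries from local copies prefetched from the object store."""

    def __init__(self, name):
        self.name = name
        self.local = {}

    def load(self, store, segment_id):
        self.local[segment_id] = list(store.segments[segment_id])


def assign(store, nodes, replicas=2):
    """Place each segment on `replicas` different nodes, round-robin."""
    for i, segment_id in enumerate(sorted(store.segments)):
        for r in range(replicas):
            nodes[(i + r) % len(nodes)].load(store, segment_id)


def query_total(nodes, segment_ids):
    """Sum each segment once, using whichever node holds a copy."""
    total = 0
    for segment_id in segment_ids:
        holder = next(n for n in nodes if segment_id in n.local)
        total += sum(holder.local[segment_id])
    return total


store = ObjectStore()
store.put("2022-06-01", [1, 2, 3])
store.put("2022-06-02", [4, 5])
store.put("2022-06-03", [6])
all_ids = sorted(store.segments)

nodes = [DataNode("n1"), DataNode("n2"), DataNode("n3")]
assign(store, nodes)
before = query_total(nodes, all_ids)

# Node failure: the remaining replicas still cover every segment.
after_node_loss = query_total(nodes[1:], all_ids)

# Cluster-wide failure: brand-new nodes reload everything from the store.
rebuilt = [DataNode("m1"), DataNode("m2")]
assign(store, rebuilt)
after_rebuild = query_total(rebuilt, all_ids)

print(before, after_node_loss, after_rebuild)  # 21 21 21
```

In this model the copies on the nodes exist for speed and availability, while the object store is the source of truth, so losing every node costs recovery time but not data.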

Cost-Performance Benefits

Building a database that delivers high concurrency means striking the right balance between CPU usage, scalability, and cost. Historically, addressing concurrency was a matter of allocating more hardware to the challenge, and while adding more CPUs certainly allows organizations to run more queries, it can easily become very expensive.

In contrast, databases like Apache Druid are built with an optimized storage format and query engine that drive down CPU usage. By reading only the data it needs, the infrastructure can serve more queries in the same timespan.
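The link between CPU usage per query and cost can be shown with back-of-the-envelope arithmetic in Python. The figures below are hypothetical, not benchmarks of any database, and the model is deliberately simplified: it assumes queries are CPU-bound and spread evenly across cores.

```python
def cores_needed(target_qps, cpu_ms_per_query):
    """Cores required to sustain a query rate, rounded up to a whole core.

    Simplifying assumption: each query consumes `cpu_ms_per_query`
    milliseconds of CPU time and load is spread evenly across cores.
    """
    total_cpu_ms_per_second = target_qps * cpu_ms_per_query
    return (total_cpu_ms_per_second + 999) // 1000  # integer ceiling division


# Hypothetical workload: 500 queries per second.
baseline = cores_needed(500, cpu_ms_per_query=200)
optimized = cores_needed(500, cpu_ms_per_query=50)

print(baseline, optimized)  # 100 25
```

Under these assumptions, cutting the CPU time per query by a factor of four cuts the hardware needed for the same concurrency by the same factor, which is why per-query efficiency matters more than simply adding CPUs.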

This is also an important consideration when building external applications that will deliver the performance and resilience required both today and in the future. For those organizations focused on customer retention, being able to scale their infrastructure is key to remaining competitive.

David Wang is VP of Product Marketing at Imply