StreamSets Adds New Features to DataOps Platform
September 11, 2018

StreamSets announced innovations that help companies efficiently build and continuously operate dataflows that span their data center and leading cloud platforms — AWS, Microsoft Azure and Google Cloud Platform.

New capabilities include data drift handling for cloud data stores for improved pipeline resiliency, continuous integration and delivery (CI/CD) automation that brings DevOps-style agility to dataflow pipelines, and the ability to centrally manage in-stream data protection policies for security and compliance.

These features build on StreamSets DataOps Platform’s rich catalog of cloud connectors, its cloud-native architecture for easy cross-platform deployment, and its ability to elastically scale dataflows via Kubernetes.

Features such as data drift handling and in-stream data protection are powered by StreamSets’ unique Intelligent Pipelines capability, which inspects and analyzes data in-flow, overcoming the lack of visibility common in traditional data integration and big data ingestion approaches.

A majority of StreamSets customers already use the StreamSets DataOps Platform for cloud dataflows, executing both “lift and shift” cloud migration projects that require peak throughput, and continuous real-time streaming of data.

“As our customers embark on their hybrid cloud journey, we see first-hand their struggle to orchestrate end-to-end management of data movement across a growing range of on-premises and cloud platforms,” said Arvind Prabhakar, CTO, StreamSets. “Our DataOps platform was architected as cloud-native from the start, allowing us to easily evolve with the market. Cloud drift-handling and CI/CD for dataflows are unique enhancements that help our customers on their journey from traditional to modern data integration based on DataOps.”

The expansion of data architectures into the cloud creates challenges for enterprises that still rely on traditional data integration software or single-purpose big data ingestion tools. With these methods, pipelines take too long to build and deploy, and often depend on valuable, specialized developers. They are opaque, denying the end-to-end visibility into pipeline performance needed to prevent failures or detect sensitive personal data in the dataflow. Finally, they are rigid, breaking whenever data drift occurs, such as when fields are added or changed or data platforms are upgraded.

With these new features, which began rolling out in late August, StreamSets DataOps Platform now offers:

- Development automation through a full-featured dataflow designer that includes “easy button” connectors for Amazon S3, Elastic MapReduce (EMR) and Redshift; Azure Data Lake Storage, HDInsight and Azure Databricks; Google Dataproc; and Snowflake

- Elastic scaling of cloud, multi-cloud and reverse hybrid cloud dataflows via Kubernetes

- New data drift handling, which automatically reflects updates to source schema in Amazon Athena, Azure SQL and Google BigQuery cloud data services

- A new CI/CD framework for automating frequent changes to dataflows through iterative design, test, validate and deployment steps

- New central governance of StreamSets Data Protector policies that detect and deal with sensitive data such as PII and PHI

