NoSQL Database Types Explained: Column-oriented ... - TechTarget

Column-oriented database examples

Growing demand for high-performance analytics on large data sets is driving the adoption of columnar databases. The choice between open source and commercial columnar databases often depends on budget, required features, in-house expertise and specific use cases. Many organizations use a mix of both; they might use open source tools for some applications and commercial systems for others.

Here are a few examples of the most popular systems, both open source and commercially licensed, typically used for the most common use cases. Tools were selected using insight from G2 review rankings, research from IT Market Strategy and additional market research by TechTarget editors. This unranked list is in alphabetical order.

Amazon Redshift is a fully managed, cloud-based columnar database that organizations often use for data warehousing. It targets large-scale analytics and business intelligence use cases, and it handles complex queries across petabyte-scale data sets using massively parallel processing. A key advantage of Redshift is that it integrates seamlessly with the AWS ecosystem of services and applications. It also supports high-speed queries, elastic scaling and fast data compression that reduces storage size by up to 35%. Amazon offers pay-as-you-go pricing, which can be cost-effective and helps make Redshift a popular system for use alongside other databases. It often acts as a cost-efficient store for older, less frequently accessed data in data warehousing, reporting and analytics scenarios.
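To make the compression point concrete, here is a minimal sketch of why column-oriented storage tends to compress well: values in a single column are often repetitive, so a simple scheme such as run-length encoding can collapse them. The sample rows and the helper names (`to_columns`, `rle_encode`, `rle_decode`) are hypothetical, and this is a generic illustration rather than a description of how Redshift implements compression.

```python
from itertools import groupby

# Hypothetical sample rows (region, status, amount); not from any real system.
rows = [
    ("us-east", "shipped", 10),
    ("us-east", "shipped", 12),
    ("us-east", "pending", 7),
    ("eu-west", "pending", 7),
    ("eu-west", "shipped", 3),
    ("eu-west", "shipped", 3),
]


def to_columns(rows):
    """Pivot row tuples into one list per column (a column-oriented layout)."""
    return [list(column) for column in zip(*rows)]


def rle_encode(values):
    """Run-length encode a sequence as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


region, status, amount = to_columns(rows)

# Six stored values in the region column collapse to two runs.
print(rle_encode(region))  # [('us-east', 3), ('eu-west', 3)]
print(rle_encode(status))  # [('shipped', 2), ('pending', 2), ('shipped', 2)]
```

In a row-oriented layout the same values are interleaved with other fields, so runs like these rarely appear. Real systems typically choose among several encodings per column; the sketch shows only one.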

Apache Cassandra supports on-premises, cloud and hybrid deployments. As an open source project, it has community support through Planet Cassandra, which offers resources ranging from monthly global meetups to regular onboarding meetings for new users. However, the learning curve for initial setup and optimization is steep. The highly scalable and fault-tolerant system can handle large volumes of data distributed across multiple nodes. Cassandra's tunable consistency levels let users choose the tradeoff between data that is consistent across all servers and data that is available for use with very low latency. It's popular for IoT scenarios with streaming data and for its reduced cost of ownership.
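The consistency tradeoff can be illustrated with a little arithmetic. In a replicated store, a read is guaranteed to reach at least one replica that holds the latest acknowledged write when the number of replicas consulted on read (R) plus the number acknowledged on write (W) exceeds the replication factor (RF). The sketch below is a simplified model under that assumption, not Cassandra driver code; the function names are hypothetical, and `ONE`, `QUORUM` and `ALL` are used only as labels for the replica counts.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


RF = 3  # assumed replication factor for this illustration

# label -> (replicas consulted per read, replicas acknowledging each write)
settings = {
    "read ONE / write ONE": (1, 1),
    "read QUORUM / write QUORUM": (quorum(RF), quorum(RF)),
    "read ONE / write ALL": (1, RF),
}

for label, (r, w) in settings.items():
    print(f"{label}: overlap guaranteed = {reads_see_latest_write(r, w, RF)}")
```

With three replicas, reading and writing at one replica each is the fastest option but gives no overlap guarantee, while quorum reads and writes (two replicas each) do. Requiring more replicas per operation raises latency and lowers availability when nodes are down, which is the tradeoff the consistency setting exposes.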

ClickHouse is an open source columnar system initially developed by the Russian internet giant Yandex. It excels at OLAP and features a highly available, high-performance architecture for mission-critical analytics in real-time advertising, spot-pricing and telecommunications. ClickHouse can handle large-scale data sets with real-time data ingestion and fast query performance. Limitations include the lack of native full-text search and a more limited community and ecosystem than Apache Cassandra, a drawback for open source software.

Microsoft Azure Cosmos DB has a multimodel architecture, which means it can support various data models, such as document, key-value and graph. Column-oriented is one of its most important and commonly used configurations. The cloud-based database offers multiple APIs for developers, including SQL, MongoDB and Cassandra. To support global applications, Cosmos DB automates replication and has tunable consistency levels. It's a popular choice for web applications, especially mission-critical ones that span multiple regions.

Donald Farmer is a data strategist with 30+ years of experience, including as a product team leader at Microsoft and Qlik. He advises global clients on data, analytics, AI and innovation strategy, with expertise spanning from tech giants to startups. He lives in an experimental woodland home near Seattle.

Alex Williams is an independent IT consultant and owner of Hosting Data UK. He has almost a decade of experience as a developer and is knowledgeable in IT systems, cybersecurity, data management, internet privacy and finance.
