Dive into CnosDB 100,000 Lines of Source Code with Ease

Conquering Inaccessible Databases? CnosDB to the Rescue! Dive into 100,000 Lines of Source Code with Ease! Many members of our community have been curious about CnosDB and where to begin reading its source code. This question was discussed during a previous CnosDB HiTea livestream, and today we’ll revisit the topic. The CnosDB source code can …

The Rise of Super Unicorn Databricks

With the explosive growth of internet data, data has become a new type of resource for businesses, as valuable as oil. More companies are looking to leverage structured and unstructured data to gain a competitive edge. However, they face challenges such as complex legacy infrastructure, data silos, and high latency. Consequently, the demand for data lakes has been steadily increasing. A data lake is a storage repository that ingests vast amounts of raw data in its native format, so that enterprises can easily access it when needed.

Databricks is currently a super unicorn in the primary market. It helps businesses prepare data for analysis, supports the adoption of machine learning, and enables data-driven decision-making. It also empowers data scientists to collaborate with data engineers and other business departments to build data products. Today, it has expanded into a broader lake-warehouse integration with the Databricks Marketplace.

01 Apache Spark, the Beginning

The Databricks team, consisting of seven computer science Ph.D. holders, embarked on the development of the Spark engine for data processing. To make Spark accessible to a broader user base, they chose to open-source it and founded Databricks in 2013. In the same year, the company completed its Series A funding round, led by A16z. In 2014, the project set a world record for data sorting speed. In January 2016, Databricks appointed a new CEO. A year later, the company closed its first million-dollar deal.

Overall, the Databricks team is the core developer of Apache Spark and has significant influence and expertise. As a commercial company built around Spark, Databricks rightfully claims its position in the market.

02 Expanding Product Line for Revenue Diversification

Databricks initially focused on Spark, which was used for querying large, unstructured datasets stored in data lakes.
However, to meet market demand, Databricks expanded into a lake-warehouse platform. Built on Spark, the platform includes Delta Lake, which provides ACID transactions and data versioning for data lakes; MLflow, an open-source platform for managing machine learning workflows; and Redash, a SQL-based data analysis collaboration tool.

Overall, the Databricks lake-warehouse platform combines elements of both data lakes and data warehouses. It offers the flexibility, cost-effectiveness, and scalability of data lakes, while also providing the data management and ACID transactions typically found in data warehouses. Users can enable business intelligence and machine learning on all their data. Databricks products are available on major cloud services such as AWS, Azure, and GCP, providing a unified environment for data, analytics, and machine learning workloads. Visualization can become an integral part of these different activities.

03 Data Lake Market Growth with User Enterprises Spanning Large, Medium, and Small Size

Databricks believes that businesses are moving away from isolated systems for data storage and opting for centralized data repositories. This approach enables enterprises to gain deeper insights into past and future trends through business intelligence and predictive analytics. Data lake technology is based on precisely this concept, allowing data of all types and from all sources to be stored together.

Statistics indicate that the data lake market is projected to grow from $7.9 billion in 2019 to $20.1 billion in 2024. This growth reflects the increasing adoption and recognition of data lakes as a valuable asset for organizations across industries.

Databricks itself serves large, medium, and small enterprises across various industries. As of March 2023, it had garnered over 9,000 enterprise customers worldwide.
Some notable customers include AT&T, Shell, Burberry, Toyota, Adobe, Condé Nast, and Regeneron Pharmaceuticals. If we divide Databricks’ ARR (Annual Recurring Revenue) of $1 billion at the end of Q2 2022 by its customer count of 7,000+ at the same point, we can roughly estimate the ACV (Average Contract Value) of Databricks at around $143,000. In comparison, the estimated ACV of Snowflake is $301,000 as of Q3 2023, indicating that Databricks still has room to increase its ACV.

04 Triple Threat

Databricks faces three types of competitors. The first is Snowflake, which was founded in 2012 by former Oracle architects and has emerged as a formidable competitor to Databricks. Initially positioning itself as a cloud data platform for data warehousing and analytics workloads, Snowflake primarily targeted business analysts and data engineers, while Databricks gained favor among data scientists and machine learning engineers. However, the boundaries between the two have become blurred. For instance, Snowflake has introduced features like Snowpark for Data Science, transactional databases, and Python support, aiming to attract data scientists. Databricks, for its part, has launched products such as Databricks SQL, Delta Lake capabilities, and Unity Catalog to cater to customers focused on data storage and security.

In terms of their models, Snowflake operates within a closed-source ecosystem, while Databricks is open-source. Databricks’ primary product lines are available for free, and customers can choose Databricks’ enterprise offerings for more advanced features and support. Snowflake provides ready-made solutions that let companies quickly get started with basic analytics, while Databricks offers more customization and configuration, giving customers full control over their settings. By the end of 2022, Snowflake had an annual revenue of $2.1 billion, while Databricks projected an annual revenue of $1.4 billion.
The competition between the two is expected to intensify.

The second type of competitor is the cloud providers themselves. Databricks competes with the proprietary products offered by cloud providers. For example, AWS has Amazon EMR, Azure has Azure HDInsight, and GCP has Dataproc for big data processing. In terms of business analytics solutions, Amazon QuickSight, Azure’s Power BI Embedded, and GCP’s Looker compete with Databricks.

Lastly, Databricks faces competition from companies offering specialized data management and data science solutions. For instance, Databricks’ scheduler competes with Apache Airflow, and its MLflow product competes with DataRobot and Alteryx.

05 Sustained Revenue Growth to be a Capital-Acknowledged Unicorn

As an open-source software company, Databricks generates revenue by offering additional features and services for a fee. It provides a fully managed version of its open-source software to enterprises, along with auxiliary tools such as SaaS query-writing tools and connectors for data sources.

In terms of the pricing model, Databricks charges based on the compute resources customers consume, billed per second. To do so, it has introduced a proprietary unit of measurement called the DBU (Databricks Unit). The number of DBUs consumed by a workload depends on various factors, including the compute resources utilized, the volume of data processed, the region, the pricing tier, and the type of service being used. Additionally, like other open-source companies, Databricks offers a 14-day free trial to attract users.

On the financial side, Databricks has also achieved remarkable growth. At the end of Q3 2019, its ARR was $200 million; its revenue was $425 million for the full year 2020; and its ARR exceeded $800 million in 2021. As of August 2022, Databricks’ ARR had exceeded $1 billion, with annual growth of over 70%.
As of August 2021, Databricks’ valuation was $38 billion, and it had raised a total of $3.5 billion in funding. Its investors include A16z, Tiger Global, Amazon Web Services, Microsoft, and Coatue. However, there have also been reports that in October 2022, Databricks reduced its internal stock price, resulting in a valuation downgrade to $31 billion, a decrease of about $7 billion (roughly 18%) from the same period in 2021. Nonetheless, Databricks remains a super unicorn in the primary market.

06 Trends, Opportunities, and Risks

With the decrease in cloud storage costs and improved internet speeds, more and more companies are choosing to store all their data in centralized repositories instead of storing different types of data separately. This centralization trend helps businesses gain better insights into their operations through real-time business intelligence and predictive analytics. Additionally, the exponential growth of data has made it impractical for companies to maintain multiple large-scale data stores, leading to the convergence of data lakes and data warehouses into a single platform.

ChatGPT has become a hot topic across industries, and Databricks has embraced this wave by offering its Unified Data Analytics platform, which allows data teams to store and protect data, generate analytics and insights, and drive the development of machine learning tools. Moreover, Databricks provides integration with popular artificial intelligence frameworks such as TensorFlow and PyTorch, making it easier to build and deploy machine learning models.

Databricks relies on cloud infrastructure providers like AWS, Azure, and GCP to deliver its services. Looking back, the partnership with Microsoft was a milestone for Databricks, as it helped the company’s revenue grow from under $1 million in early 2017 to over $100 million in 2018. Any changes in its relationships with major cloud providers could affect Databricks’ ability to deliver its services.
In summary, we have reason to believe that in this era of data expansion and the rise of AI, Databricks’ offering of a unified data storage and analytics platform holds value for enterprises. The company has a great opportunity and capability to capitalize on this wave, although it also faces challenges along the way.
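
For readers who want to check the back-of-the-envelope numbers quoted in this article, here is a minimal Python sketch. It uses only figures stated above: the $1 billion ARR and 7,000 customers behind the ACV estimate, and the data lake market projection from $7.9 billion (2019) to $20.1 billion (2024). The function names are illustrative, and treating the market projection as a compound annual growth rate is our own assumption about how to read it, not something the cited statistics state.

```python
def average_contract_value(arr_usd: float, customers: int) -> float:
    """Rough ACV estimate: annual recurring revenue divided by customer count."""
    return arr_usd / customers


def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """CAGR implied by growing from `start` to `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the article (ARR and customer count at the end of Q2 2022).
databricks_acv = average_contract_value(1_000_000_000, 7_000)

# Data lake market projection quoted in the article, in billions of USD.
market_cagr = compound_annual_growth_rate(7.9, 20.1, 2024 - 2019)

print(f"Estimated Databricks ACV: ${databricks_acv:,.0f}")
print(f"Implied data lake market CAGR: {market_cagr:.1%}")
```

The first figure lands at roughly $143,000, matching the estimate in the text, and the second suggests the market projection corresponds to growth of roughly 20% per year. Because the customer count is given as "7,000+", the ACV figure is best read as an upper bound.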

Hex Tech: The Crisis and Opportunity of a Programming-enabled BI Platform

Data practitioners often jump between various tools, and this fragmentation leads to problems in collaboration, sharing, and productivity. The increase in enterprise cloud data volumes and the emergence of data transformation, model building, and visualization tools have driven the rise of modern data stacks. Most companies are increasing their investment in data teams to adapt …

CnosDB Becomes the First Time-series Database That Supports Sqllogictest, Further Improving Stability and Reliability

The cloud-native time-series database CnosDB has introduced the sqllogictest integration framework. By integrating sqllogictest into CnosDB, developers can more easily test and verify the correctness of the database, and quickly discover and solve potential issues. This new integration has significant implications for CnosDB, providing developers with more efficient and reliable testing tools. CnosDB also becomes …

DeepL: The Fast Expansion Path of a Slow Company

Why does DeepL dare to claim to be the “world’s most accurate translator”? Language is the carrier of human civilization, passed down from generation to generation. As the saying goes, “Ten miles of different sounds, five miles of different tones”, language also tells the story of …

Airbyte, The Future of Data Integration

Gartner has predicted that by 2025, 80% of organizations seeking to expand their digital businesses will fail because they have not adopted modern approaches to data and analytics governance. The data ecosystem is the most important part of the infrastructure ecosystem, and the processing, distribution, and computation of data throughout the data circulation ecosystem are …

CnosDB Assists Paifang Technology in Predictive Maintenance of Rotating Machinery

Modern industrial production is advancing rapidly toward large-scale, high-speed, and intelligent operation. As productivity and intelligence gradually improve, major manufacturers are demanding ever greater safety and reliability from their mechanical equipment, especially large rotating machinery such as electric motors, fan units, hydraulic pumping stations, and so on. They have become key equipment in modern …

Using CnosDB and TensorFlow for Time Series Prediction

CnosDB is a high-performance time-series database based on a distributed architecture. TensorFlow, on the other hand, is one of the most popular deep learning frameworks for prediction. In this article, you will learn how to use time-series data for prediction, specifically using CnosDB and TensorFlow. Due to the autocorrelation of time-series data, many data science …
