CnosDB x LangChain: Querying Time-Series Databases with Conversational Interface

Large models have dominated technical discussion for more than half a year. While following that discussion, the CnosDB technical team has been applying large models and related AI technologies to database development and practice. Following the comprehensive integration of TensorFlow into CnosDB and the use of Copilot and Cursor in production, the team has recently integrated CnosDB with LangChain, enabling users to query time-series data in natural language through the LangChain framework. Examples of the integration between CnosDB and LangChain can be found at: https://python.langchain.com/docs/ecosystem/integrations/cnosdb

Due to its support for standard SQL, CnosDB has become the world’s first time-series database to integrate with the LangChain ecosystem. After integrating the LangChain framework and incorporating GPT, customers can easily retrieve relevant query results from the database by asking questions such as “What is the average temperature of various weather observation stations in Beijing in the past hour?” or “What are the highest and lowest temperatures in Shanghai this month?” without the need to write any SQL queries.

This article shows how to connect to a CnosDB database with LangChain so that users can query the database in natural language.

Introduction to CnosDB and LangChain

CnosDB is an open-source distributed time-series database that offers high performance, high compression rates, and ease of use. It is primarily designed for applications in IoT, industrial IoT, connected vehicles, and IT operations. All the code is available on GitHub.

CnosDB possesses the following features:

  • High Performance: CnosDB addresses the issue of time series expansion, theoretically supporting unlimited time series. It supports aggregate queries along the time axis, including queries with time interval-based windowing, queries with windowing based on enumerated column values, and queries with windowing based on the time interval between adjacent time series records. It has caching capabilities for the latest data and can be configured with cache space to enable fast retrieval of the most recent data.
  • Simplicity and Ease of Use: CnosDB provides clear and straightforward interfaces with easy project configuration. It supports standard SQL, which makes it easy to get started and to integrate with third-party tool ecosystems. It offers convenient data access functionality and supports schemaless data ingestion and historical data replay (including out-of-order writes).
  • Cloud-Native: CnosDB has a native distributed design, data sharding and partitioning, storage and computation separation, Quorum mechanism, Kubernetes deployment, and comprehensive observability. It ensures eventual consistency and can be deployed on public, private, and hybrid clouds. It provides multi-tenancy with role-based access control. The compute layer supports stateless scaling of nodes, and the storage layer allows horizontal scaling to increase system storage capacity.
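
To make the time interval-based windowing mentioned in the feature list more concrete, here is a minimal Python sketch of the general idea. It is not CnosDB's implementation: the function name `window_average`, the choice of aligning windows to the Unix epoch, and the sample readings are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def window_average(rows, width):
    """Average values in fixed, non-overlapping time windows.

    Hypothetical helper for illustration only; windows are aligned to the
    Unix epoch, which is an assumption rather than documented behavior.
    """
    buckets = defaultdict(list)
    for ts, value in rows:
        # Round each timestamp down to the start of its window.
        start = ts - (ts - EPOCH) % width
        buckets[start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Illustrative (timestamp, temperature) readings.
readings = [
    (datetime(2022, 10, 19, 3, 40), 67.0),
    (datetime(2022, 10, 19, 4, 40), 69.0),
    (datetime(2022, 10, 19, 5, 40), 68.0),
]

two_hour = window_average(readings, timedelta(hours=2))
for start, avg in two_hour.items():
    print(start.isoformat(), avg)
```

With a two-hour window, the first reading falls into the window starting at 02:00 and the other two share the window starting at 04:00. In a database this grouping is done by the query engine, not in application code.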

LangChain is a framework for developing language model-driven applications. It enables the following capabilities:

  • Data Awareness: Connects the language model with other data sources.
  • Interactivity: Allows the language model to interact with its environment.

The main value of LangChain lies in:

  1. Componentization: It provides abstracted tools for using language models and offers a range of implementations for each abstracted tool. These components are modular and easy to use, regardless of whether you are using other parts of the LangChain framework.
  2. Ready-made Chain Structure: A structured combination of components for performing specific high-level tasks.

The ready-made chain structure makes it easy to get started. For more complex applications and detailed use cases, the components make it easy to customize existing chain structures or build new ones.

By utilizing LangChain’s components and ready-made chains, users no longer need to learn how to interact with databases using SQL scripts in advance, saving a significant amount of time and effort. With the powerful capabilities of LangChain, SQLDatabase, SQL Agent, and OpenAI’s large language models, we can now create applications that allow users to communicate with CnosDB using natural language.

Architecture Diagram

As the architecture diagram shows, LangChain's SQLDatabase, SQL Agent, and OpenAI's large language models sit between the user and CnosDB. The user asks a question in natural language, the model translates it into SQL, and the query runs against CnosDB, so no SQL needs to be written by hand.
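
The flow from question to SQL to answer can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not LangChain's code: SQLite stands in for CnosDB, `fake_llm` is a hypothetical stub that returns one canned query in place of a real model call, and `answer_question` is an illustrative name.

```python
import sqlite3


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: always returns one canned query."""
    return "SELECT AVG(temperature) FROM air WHERE station = 'XiaoMaiDao'"


def answer_question(conn: sqlite3.Connection, question: str) -> str:
    # 1. Describe the schema so the model knows which tables and columns exist.
    schema = "\n".join(
        row[0]
        for row in conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
    )
    # 2. Ask the model to translate the question into SQL.
    sql = fake_llm(f"Schema:\n{schema}\nQuestion: {question}\nSQL:")
    # 3. Run the generated SQL against the database.
    (value,) = conn.execute(sql).fetchone()
    # 4. Report the query and its result.
    return f"SQL: {sql}\nAnswer: {value}"


# SQLite is used here only so the sketch runs without a CnosDB instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE air (station TEXT, temperature REAL, time TEXT)")
conn.executemany(
    "INSERT INTO air VALUES (?, ?, ?)",
    [
        ("XiaoMaiDao", 67.0, "2022-10-19T03:40:00"),
        ("XiaoMaiDao", 69.0, "2022-10-19T04:40:00"),
        ("XiaoMaiDao", 68.0, "2022-10-19T05:40:00"),
    ],
)

result = answer_question(conn, "What is the average temperature at XiaoMaiDao?")
print(result)
```

In the real integration, the stub is replaced by a model call, and the schema lookup and query execution are handled by LangChain's SQLDatabase component.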

Install CnosDB

This section uses Docker as an example. For other installation methods, refer to the installation section of the official documentation (https://docs.cnosdb.com/zh/latest/deploy/install.html).

To install CnosDB using Docker, follow these steps:

  1. Install the Docker environment.
  2. Launch a container using Docker:

docker run --name cnosdb -p 8902:8902 -d cnosdb/cnosdb:community-latest cnosdb run -M singleton

  3. Enter the container:

docker exec -it cnosdb sh

  4. Run cnosdb-cli:

cnosdb-cli --port 8902

  5. After a successful connection, the following output appears:

CnosDB CLI v2.3.1
Input arguments: Args { host: "localhost", port: 8902, user: "cnosdb", password: None, database: "public", target_partitions: Some(1), data_path: None, file: [], rc: None, format: Table, quiet: false }
public 
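
Before moving on to the Python side, it can be useful to confirm from the host that the published port is reachable. The following is a small sketch using only the standard library. The helper name `port_is_open` is hypothetical, and the check only tests that something accepts TCP connections on the port, not that it is CnosDB.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# 8902 is the port published by the docker run command above.
print(port_is_open("localhost", 8902))
```

If this prints False, check that the container is running and that the -p 8902:8902 mapping was applied.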

Install LangChain

  1. Run the following command:
pip install langchain

Install CnosDB Connector

pip install cnos-connector
# cnosdb_connector 0.1.8

Connect CnosDB

To connect to CnosDB with cnosdb_connector and SQLDatabase, first create the URI that SQLDatabase needs:

from cnosdb_connector import make_cnosdb_langchain_uri
from langchain import SQLDatabase

# Use make_cnosdb_langchain_uri to create the URI
uri = make_cnosdb_langchain_uri()
# Use SQLDatabase.from_uri to create the DB
db = SQLDatabase.from_uri(uri)

Alternatively, use the SQLDatabase.from_cnosdb method:

def SQLDatabase.from_cnosdb(url: str = "127.0.0.1:8902",
                              user: str = "root",
                              password: str = "",
                              tenant: str = "cnosdb",
                              database: str = "public")

Use case:

# Use SQLDatabase to connect to CnosDB
from cnosdb_connector import make_cnosdb_langchain_uri
from langchain import SQLDatabase

uri = make_cnosdb_langchain_uri()
db = SQLDatabase.from_uri(uri)

# Create the OpenAI Chat LLM
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")

SQLDatabaseChain Use Case:

This example demonstrates how to use SQLDatabaseChain to answer a question through a database.

from langchain import SQLDatabaseChain
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)
db_chain.run(
    "What is the average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022?"
)
> Entering new  chain...
What is the average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022?
SQLQuery:SELECT AVG(temperature) FROM air WHERE station = 'XiaoMaiDao' AND time >= '2022-10-19' AND time < '2022-10-20'
SQLResult: [(68.0,)]
Answer:The average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022 is 68.0.
> Finished chain.

SQL Database Agent Use Case:

from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
agent = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)
agent.run(
    "What is the average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022?"
)
> Entering new  chain...
Action: sql_db_list_tables
Action Input: ""
Observation: air
Thought:The "air" table seems relevant to the question. I should query the schema of the "air" table to see what columns are available.
Action: sql_db_schema
Action Input: "air"
Observation: 
CREATE TABLE air (
	pressure FLOAT, 
	station STRING, 
	temperature FLOAT, 
	time TIMESTAMP, 
	visibility FLOAT
)
/*
3 rows from air table:
pressure	station	temperature	time	visibility
75.0	XiaoMaiDao	67.0	2022-10-19T03:40:00	54.0
77.0	XiaoMaiDao	69.0	2022-10-19T04:40:00	56.0
76.0	XiaoMaiDao	68.0	2022-10-19T05:40:00	55.0
*/
Thought:The "temperature" column in the "air" table is relevant to the question. I can query the average temperature between the specified dates.
Action: sql_db_query
Action Input: "SELECT AVG(temperature) FROM air WHERE station = 'XiaoMaiDao' AND time >= '2022-10-19' AND time <= '2022-10-20'"
Observation: [(68.0,)]
Thought:The average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022 is 68.0. 
Final Answer: 68.0
> Finished chain.
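
Because the SQL is generated by a model, it is worth checking a query independently before trusting the answer. The sketch below replays both generated queries against the three sample rows from the schema observation. SQLite is only a stand-in for CnosDB here, timestamps are stored as ISO 8601 strings, and the table is assumed to contain nothing beyond those three rows.

```python
import sqlite3

# SQLite stands in for CnosDB; this is an assumption made so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE air (pressure REAL, station TEXT, temperature REAL, "
    "time TEXT, visibility REAL)"
)
conn.executemany(
    "INSERT INTO air VALUES (?, ?, ?, ?, ?)",
    [
        (75.0, "XiaoMaiDao", 67.0, "2022-10-19T03:40:00", 54.0),
        (77.0, "XiaoMaiDao", 69.0, "2022-10-19T04:40:00", 56.0),
        (76.0, "XiaoMaiDao", 68.0, "2022-10-19T05:40:00", 55.0),
    ],
)

# Query produced by SQLDatabaseChain (upper bound is exclusive).
chain_sql = (
    "SELECT AVG(temperature) FROM air WHERE station = 'XiaoMaiDao' "
    "AND time >= '2022-10-19' AND time < '2022-10-20'"
)
# Query produced by the SQL agent (upper bound is inclusive).
agent_sql = (
    "SELECT AVG(temperature) FROM air WHERE station = 'XiaoMaiDao' "
    "AND time >= '2022-10-19' AND time <= '2022-10-20'"
)

chain_result = conn.execute(chain_sql).fetchall()
agent_result = conn.execute(agent_sql).fetchall()
print(chain_result, agent_result)
```

Both queries return the same average for these rows. Note that the two runs chose different upper bounds (`<` versus `<=`), so on a table with a row stamped exactly at the upper bound the results may differ. Stating the intended interval explicitly in the question helps avoid that ambiguity.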

Future: Revisiting AI4DB and DB4AI

Since its inception, CnosDB has been committed to the principles and beliefs of AI4DB and DB4AI, promoting the integration of artificial intelligence with databases and creating an eco-friendly, highly available, and stable time-series database system for artificial intelligence applications.

“AI4DB” refers to the use of AI technologies to enhance the capabilities of databases. For example, AI techniques can be utilized to extract patterns, make predictions, perform classifications from data, or implement more intelligent queries and analysis using natural language processing. This approach improves the efficiency and accuracy of databases, making them more adaptable to constantly changing data environments. On the other hand, “DB4AI” refers to leveraging databases to support AI applications. Databases provide the infrastructure for data storage and management, data cleaning and preprocessing, data access and sharing, etc., thus supporting AI applications. In this context, databases play a role in providing data to AI algorithms for training and prediction purposes.

We firmly believe that in the future, more developers will leverage large models such as GPT to create applications. Therefore, the usage of databases should be better aligned with the specific practices of large models. It is based on this belief that CnosDB has become the first time-series database product to embrace the LangChain ecosystem. A new paradigm for application development is on the horizon, and together, we can embrace the future and create powerful applications that address real-world problems.