Help Center/
DataArts Fabric/
Developer Guide/
Data-AI Convergence/
DataArts Fabric DataFrame/
Scenario Practices/
Examples of DataFrames Without UDFs
Updated on 2025-09-18 GMT+08:00
Examples of DataFrames Without UDFs
Below is an example using query1 from tpch to demonstrate the usage of DataFrame.
The query SQL is as follows:
SELECT l_returnflag, l_linestatus, sum(l_quantity) AS sum_qty, sum(l_extendedprice) AS sum_base_price, sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge, avg(l_quantity) AS avg_qty, avg(l_extendedprice) AS avg_price, avg(l_discount) AS avg_disc, count(*) AS count_order FROM lineitem WHERE l_shipdate <= CAST('1998-09-02' AS date) GROUP BY l_returnflag, l_linestatus ORDER BY l_returnflag, l_linestatus;
The corresponding DataFrame logic is as follows:
import ibis # Import Ibis dependencies. con = ibis.fabric.connect( # Call the DataArts Fabric backend connection to establish a connection. endpoint=FABRIC_ENDPOINT, # Specify the service region. For details, see Regions and Endpoints. endpoint_id=FABRIC_ENDPOINT_ID, # Querying an Endpoint ID access_key=ACCESS_KEY, # Obtaining an AK/SK secret_key=SECRET_KEY, project_id=FABRIC_PROJECT_ID, # How Do I Obtain a Project ID? catalog_name=IBIS_TEST_FABRIC_CATELOG, # Connect to a specified catalog. workspace_id=FABRIC_WORKSPACE_ID, # Obtaining a Workspace ID lakeformation_instance_id=IBIS_TEST_FABRIC_LAKEFORMATION_INSTANCE_ID, # LakeFormation instance ID. For details, see Mapping Between DataArts Fabric SQL and LakeFormation Data Types. obs_directory_base=OBS_DIRECTORY_BASE, # Storage path for UDFs in OBS. obs_bucket_name=OBS_BUCKET_NAME, # OBS bucket name. obs_server=OBS_SERVER, # OBS access address. For details, see Endpoints and Domain Names. ) t = con.table("lineitem", database="tpch") # Retrieve table information by connecting to the backend and create a table object. q = t.filter(t.l_shipdate <= add_date("1998-12-01", dd=-90)) discount_price = t.l_extendedprice * (1 - t.l_discount) charge = discount_price * (1 + t.l_tax) q = q.group_by(["l_returnflag", "l_linestatus"]) q = q.aggregate( sum_qty=t.l_quantity.sum(), sum_base_price=t.l_extendedprice.sum(), sum_disc_price=discount_price.sum(), sum_charge=charge.sum(), avg_qty=t.l_quantity.mean(), avg_price=t.l_extendedprice.mean(), avg_disc=t.l_discount.mean(), count_order=lambda t: t.count(), ) q = q.order_by(["l_returnflag", "l_linestatus"]) sql = q.compile() # Compile DataFrame into a SQL string. df = q.execute() # Execute the expression and return the result set.
Parent topic: Scenario Practices
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
The system is busy. Please try again later.
For any further questions, feel free to contact us through the chatbot.
Chatbot