Updated on 2025-09-18 GMT+08:00

ICEBERG Table Service Functions

iceberg_expire_snapshots

Description: Each commit in Iceberg generates a new snapshot while retaining old data and metadata for snapshot isolation and time travel. Expire snapshots can be used to clean up old snapshots that are no longer needed and data files only included in unnecessary snapshots to improve query and cost efficiency.

Notes: Clearly define the cleaning and retention policy for old snapshots. After executing the cleanup of old snapshots, the deleted old snapshot data will no longer be accessible through regular queries. If a running query relies on the soon-to-expire snapshot data, it may lead to query failure or inaccurate results.

Syntax:

1
iceberg_expire_snapshots('schema.table','older_than')

Input parameters:

Parameter

Type

Default Value

Mandatory

Description

table

string

None

Yes

Table where operations are to be performed

older_than

timestamp

5 days ago

Yes

Snapshots exceeding this time limit will be deleted.

Note: You need to input the specific time zone to avoid misunderstandings. It can be specified at input or through set timezone=xxx.

Return values:

Parameter

Description

deleted_data_files_count

Number of deleted data files.

deleted_position_delete_files_count

Number of deleted position delete files.

deleted_equality_delete_files_count

Number of deleted equality delete files.

deleted_manifest_files_count

Number of deleted manifest files.

deleted_manifest_lists_count

Number of deleted snapshot files.

deleted_statistics_files_count

Number of deleted statistics files.

iceberg_rewrite_manifests

Description: The Iceberg table tracks each data file. The more data files, the more metadata stored in the snapshot file. rewrite manifests can re-write the table's manifest to optimize scan planning.

Notes: This operation generates a new manifest file, potentially affecting ongoing queries and write operations, leading to metadata conflicts. Avoid concurrent execution with other operations.

Syntax:

1
iceberg_rewrite_manifests('schema.table')

Input parameters:

Parameter

Type

Default Value

Mandatory

Description

table

string

None

Yes

Table where operations are to be performed

Return values:

Parameter

Description

rewritten_manifests_count

Number of rewritten manifest files.

added_manifests_count

Number of new manifest files.

iceberg_remove_orphanfiles

Description: Files not referenced by any Iceberg metadata file are termed orphaned files. Such files might result from failed transactions, unsuccessful concurrent commits, failures in old snapshot cleanup, or metadata cleanup errors, and can generally be deleted.

Notes: Ensure no concurrent write operations exist. During deletion, if there is a running write task, it might mistakenly delete files being generated (uncommitted temporary files).

Syntax:

1
iceberg_remove_orphanfiles('schema.table')

Input parameters:

Parameter

Type

Default Value

Mandatory

Description

table

string

None

Yes

Table where operations are to be performed

Return values:

Parameter

Description

orphan_file_location

Path of the file identified as an orphaned file.

iceberg_add_files

Description: Add all files under the target folder to the Iceberg table.

This syntax recursively scans all files under the target folder and adds them to the Iceberg table.

All files must satisfy:

  • Must be in Parquet format.
  • Column names, types, and counts (including partition columns) in the Parquet file must match those in the target table.
  • All data within a single Parquet file must belong to the same partition.

Syntax:

1
iceberg_add_files(table,foldername,iscopy)

Input parameters:

  • table: table name, which is of the regclass type.
  • foldername: folder path list, which is of the text type. Multiple paths are supported and are separated by commas (,).
  • iscopy: whether to copy, which is of the Boolean type.

Return value: whether the execution is successful, which is of the Boolean type.

iceberg_show_snapshots

Definition: Retrieves snapshot information for a specified ICEBERG table.

Syntax:

1
iceberg_show_snapshots(table)

Input parameters: table: table name, which is of the regclass type.

Return values:

  • sequence_number: snapshot sequence number, which is of the int64 type.
  • snapshot_id: snapshot ID, which is of the int64 type.
  • create_time: snapshot creation time, which is of the timestamptz type.
  • parent_id: snapshot parent node, which is of the int64 type. If the value is 0, no parent node exists.
  • operation: snapshot operation type, which is of the text type.
  • manifestlist: path to the manifest list associated with the snapshot, which is of the text type.
  • summary: snapshot statistics, which is of the text type.