ICEBERG Table Service Functions
iceberg_expire_snapshots
Description: Each commit in Iceberg generates a new snapshot while retaining old data and metadata for snapshot isolation and time travel. Expire snapshots can be used to clean up old snapshots that are no longer needed and data files only included in unnecessary snapshots to improve query and cost efficiency.
Notes: Clearly define the cleaning and retention policy for old snapshots. After executing the cleanup of old snapshots, the deleted old snapshot data will no longer be accessible through regular queries. If a running query relies on the soon-to-expire snapshot data, it may lead to query failure or inaccurate results.
Syntax:
1
|
iceberg_expire_snapshots('schema.table','older_than') |
Input parameters:
Parameter |
Type |
Default Value |
Mandatory |
Description |
---|---|---|---|---|
table |
string |
None |
Yes |
Table where operations are to be performed |
older_than |
timestamp |
5 days ago |
Yes |
Snapshots exceeding this time limit will be deleted. Note: You need to input the specific time zone to avoid misunderstandings. It can be specified at input or through set timezone=xxx. |
Return values:
Parameter |
Description |
---|---|
deleted_data_files_count |
Number of deleted data files. |
deleted_position_delete_files_count |
Number of deleted position delete files. |
deleted_equality_delete_files_count |
Number of deleted equality delete files. |
deleted_manifest_files_count |
Number of deleted manifest files. |
deleted_manifest_lists_count |
Number of deleted snapshot files. |
deleted_statistics_files_count |
Number of deleted statistics files. |
iceberg_rewrite_manifests
Description: The Iceberg table tracks each data file. The more data files, the more metadata stored in the snapshot file. rewrite manifests can re-write the table's manifest to optimize scan planning.
Notes: This operation generates a new manifest file, potentially affecting ongoing queries and write operations, leading to metadata conflicts. Avoid concurrent execution with other operations.
Syntax:
1
|
iceberg_rewrite_manifests('schema.table') |
Input parameters:
Parameter |
Type |
Default Value |
Mandatory |
Description |
---|---|---|---|---|
table |
string |
None |
Yes |
Table where operations are to be performed |
Return values:
Parameter |
Description |
---|---|
rewritten_manifests_count |
Number of rewritten manifest files. |
added_manifests_count |
Number of new manifest files. |
iceberg_remove_orphanfiles
Description: Files not referenced by any Iceberg metadata file are termed orphaned files. Such files might result from failed transactions, unsuccessful concurrent commits, failures in old snapshot cleanup, or metadata cleanup errors, and can generally be deleted.
Notes: Ensure no concurrent write operations exist. During deletion, if there is a running write task, it might mistakenly delete files being generated (uncommitted temporary files).
Syntax:
1
|
iceberg_remove_orphanfiles('schema.table') |
Input parameters:
Parameter |
Type |
Default Value |
Mandatory |
Description |
---|---|---|---|---|
table |
string |
None |
Yes |
Table where operations are to be performed |
Return values:
Parameter |
Description |
---|---|
orphan_file_location |
Path of the file identified as an orphaned file. |
iceberg_add_files
Description: Add all files under the target folder to the Iceberg table.
This syntax recursively scans all files under the target folder and adds them to the Iceberg table.
All files must satisfy:
- Must be in Parquet format.
- Column names, types, and counts (including partition columns) in the Parquet file must match those in the target table.
- All data within a single Parquet file must belong to the same partition.
Syntax:
1
|
iceberg_add_files(table,foldername,iscopy) |
Input parameters:
- table: table name, which is of the regclass type.
- foldername: folder path list, which is of the text type. Multiple paths are supported and are separated by commas (,).
- iscopy: whether to copy, which is of the Boolean type.
Return value: whether the execution is successful, which is of the Boolean type.
iceberg_show_snapshots
Definition: Retrieves snapshot information for a specified ICEBERG table.
Syntax:
1
|
iceberg_show_snapshots(table) |
Input parameters: table: table name, which is of the regclass type.
Return values:
- sequence_number: snapshot sequence number, which is of the int64 type.
- snapshot_id: snapshot ID, which is of the int64 type.
- create_time: snapshot creation time, which is of the timestamptz type.
- parent_id: snapshot parent node, which is of the int64 type. If the value is 0, no parent node exists.
- operation: snapshot operation type, which is of the text type.
- manifestlist: path to the manifest list associated with the snapshot, which is of the text type.
- summary: snapshot statistics, which is of the text type.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot