In the ever-evolving landscape of data management, efficient and cost-effective solutions have become more crucial than ever. With Microsoft Fabric’s introduction, the traditional Synapse Data Warehouse has been reimagined and integrated into a unified data platform, offering enhanced features and deeper integration within the broader Fabric ecosystem. hence Microsoft Fabric, in conjunction with Fabric Data Warehouse, introduces a game-changing feature: zero-copy clones.
This innovation allows for swift table replication with minimal storage costs, revolutionizing our approach to development, testing, reporting, and data archiving. The blog offers guidance on effectively harnessing the potential of this innovation.
Zero-Copy Clone
A zero-copy clone is a data management technique that involves creating a replica of a table without duplicating the actual data content. In this process, the metadata of the table, including its structure, column names, and constraints, is copied, while the underlying data stored in the original location (such as OneLake, often as Parquet files) is not duplicated.
The creation of a zero-copy clone essentially results in a table with the same structure as the original, but without the need to reproduce the entire dataset.
Benefits of Zero-Copy Clone
- Efficient Storage – Zero-copy clones offer efficient storage by avoiding the duplication of actual data, leading to minimal additional storage costs.
- Instant Creation – The ability to generate clones almost instantly without copying the entire dataset provides a swift solution for development, testing, and reporting processes.
- Data Recovery – Serving as a backup, zero-copy clones retain the previous data state, acting as a safeguard in scenarios of failed releases or data corruption, aiding in effective data recovery.
- Historical Reporting – The feature allowing the creation of historical reports reflecting past data states makes zero-copy clones valuable for time-based analysis.
Applications of Zero-Copy Cloning
- Development and Testing
- Simplifies development and testing processes by generating copies of tables in lower environments without the need to duplicate extensive datasets.
- Analytical Workloads
- Guarantees uniform reporting and provides a mechanism for zero-copy data duplication, especially beneficial for analytical workloads, machine learning modelling, and testing.
- Data Recovery
- Serves as a safety measure for data recovery in cases of failed releases or data corruption by preserving the original data state.
- Historical Analysis
- Facilitates the creation of historical reports depicting the data state at specific points in time, supporting historical analysis and identification of trends.
Table cloning in Fabric’s Data Warehouse offers various features and considerations
Creation of a Table Clone
- A table clone in the warehouse can be swiftly created using simple T-SQL, either within the same schema or across different schemas.
- The clone can be based on the current point-in-time, representing the present state of the table, or a previous point-in-time up to seven days in the past, containing data as it appeared at that specific moment.
- No limitations exist on the number of clones within or across schemas, and multiple tables can be cloned simultaneously.
Retention of Data History
- Data history is automatically retained by the warehouse for seven calendar days, enabling the creation of clones at specific points in time.
- All inserts, updates, and deletes within the data warehouse are preserved for this designated seven-day period.
Separate and Independent
- Upon creation, a table clone is an independent and separate copy of the data from its source.
- Changes made to the source or the clone do not affect each other, whether through Data Manipulation Language (DML) or Data Definition Language (DDL) operations.
Permissions for Creating a Table Clone
- Users holding Admin, Member, or Contributor roles in the workspace have the capability to create table clones within that workspace; however, Viewers do not possess this privilege.
- A prerequisite for table cloning is having SELECT permission for all rows and columns of the source table.
- CREATE TABLE permission within the schema where the table clone will be generated is essential.
Deletion of a Table Clone
- Original sources and clones can be deleted without constraints, and a created clone persists until explicitly deleted by the user.
- Users with specific workspace roles or ALTER SCHEMA permissions can perform deletion.
Table Clone Inheritance
- The clone inherits object-level SQL security, attributes, and primary/unique key constraints from the source table.
- Read-only delta logs and data files stored as delta parquet files are created for every table clone, ensuring data integrity and protection against corruption.
Clone a table
You have the option to clone a table either by employing T-SQL or by utilizing the warehouse editor, even without any prior coding experience.
- Clone a table using T-SQL in Microsoft Fabric
Commence by following the steps below to create a table clone in Microsoft Fabric’s Warehouse using the T-SQL syntax CREATE TABLE AS CLONE OF.
- Create a table clone within the same schema in a warehouse
You have the option to clone tables within the same database. Please refer to the following video for a detailed explanation of the cloning process.
- Create a table clone across schemas within same warehouse
You can clone tables across databases within the same warehouse. For a detailed explanation of the cloning process, please refer to the following video.
- Clone tables using warehouse editor
You can effortlessly replicate tables using the warehouse editor, whether capturing the current state, exploring historical snapshots, or cloning multiple tables simultaneously.
- Clone table as of current state
To clone a table to its current state, follow these steps
- Select the table of interest.
- Right-click on the table intended for cloning and choose “Clone table.” This action will bring up the Clone Table pane.
- In the Clone Table pane, the source table schema and name are auto-populated.
- Choose “Table state” as “Current” to initiate the cloning process, capturing the current state of the table.
- Choose the destination schema and edit the pre-filled destination table name.
- Expand the SQL Statement section to view the generated T-SQL statement.
- Click Clone.
This process allows the creation of a clone reflecting the current state of the selected source table.
- Clone table as of past point-in-time.
Similarly to the current state, you can opt for “Clone table as of a past point-in-time.”
This allows you to choose the historical state of the table within the last seven days by selecting a specific date and time in UTC.
The result is a cloned table that reflects the data from a specific past point in time, offering flexibility in historical analysis.
- Clone multiple tables at once.
- You can also clone a group of tables simultaneously, which is helpful for duplicating related tables at the same past point in time.
- Right-click on the Tables menu, then choose “Clone tables.” This action opens the Clone Table pane, where you can select multiple tables to be cloned simultaneously.
- By selecting source tables, specifying the current or past table state, and designating the destination schema, you can quickly perform the clone of multiple tables.
- The default naming pattern for cloned objects is source_table_name-Clone. If customization of the name is needed, T-SQL commands for multiple CREATE TABLE AS CLONE OF statements are available.
Use Cases for Cloning in Fabric
- Data Science: Data scientists can clone a dataset to experiment with different models or transformations without modifying the original data.
- Data Engineering: Engineers can clone datasets or tables as part of ETL (Extract, Transform, Load) processes or to maintain different stages of data transformation.
- Backup and Disaster Recovery: Cloning can be used to create backups of critical data or tables, helping to mitigate the risk of data loss.
- Data Testing: Data engineers or analysts can clone datasets to test new algorithms, schema changes, or other updates without affecting production data.
| Tags | Microsoft Fabric |
| MS Learn Modules | |
Test Your Knowledge |
Quiz |
