Lesson 20 – Ingest data using Copy Activity in Microsoft Fabric

Explore data transfer in Microsoft Fabric with Copy Activity – a straightforward guide to how it moves data between stores.

Understanding Copy Activity

Copy Activity is a versatile tool within Microsoft Fabric’s Data Pipeline service. It allows users to effortlessly copy data among various data stores located in the cloud. Whether you need to move data between Azure Storage, SQL databases, or other cloud-based storage solutions, Copy Activity streamlines the process, making it a go-to tool for data engineers and analysts.

To copy data from one place to another, the Copy activity does three main things:

  • The Copy activity starts by fetching data from the source data store.
  • It performs operations like serialization/deserialization, compression/decompression, and column mapping based on your configuration.
  • The processed data is then written to the destination data store, completing the copying process.
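The three steps above can be sketched in plain Python. Everything here – the row format, the mapping dict, the function name – is a hypothetical stand-in for the stores and settings you configure in the UI, not a Fabric API:

```python
import gzip
import json

def copy_activity(source_rows, column_mapping):
    """Conceptual sketch of the three Copy activity steps.

    `source_rows` and `column_mapping` are hypothetical stand-ins for the
    configured source data store and the Mapping tab settings.
    """
    # 1. Fetch: read the raw records from the source data store.
    fetched = list(source_rows)

    # 2. Transform: apply column mapping, then serialize and compress.
    mapped = [{dest: row[src] for src, dest in column_mapping.items()}
              for row in fetched]
    payload = gzip.compress(json.dumps(mapped).encode("utf-8"))

    # 3. Write: decompress/deserialize and store at the destination.
    return json.loads(gzip.decompress(payload).decode("utf-8"))

rows = [{"trip_id": 1, "fare": 12.5}]
print(copy_activity(rows, {"trip_id": "TripId", "fare": "FareAmount"}))
# → [{'TripId': 1, 'FareAmount': 12.5}]
```

In the real service these steps run inside the pipeline engine; the sketch only illustrates the fetch → transform → write order.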

How to ingest data using Copy activity

Proceed with the steps to execute the Copy Activity.

Prerequisite

An existing data pipeline in your workspace – the steps below assume you have already created one.

  1. Add a copy activity directly

       Follow these steps to add a copy activity directly:

  • Navigate to the workspace where you established the data pipeline and access the Data Pipeline section.
  • Add a copy activity in either of these ways:
    • Copy data –> Add to canvas
    • Add pipeline activity –> Copy data

When you add a new copy activity to a pipeline, check the bottom of the screen. There, you’ll find the properties pane showing General, Source, Destination, Mapping, and Settings options.

Configure settings under General tab

General settings include the Name and Description of the activity; some options depend on the activity you choose. Enter the necessary information and switch to the next tab.

Configure settings under Source tab

Choose your data source by selecting “workspace” as the data store type and “Lakehouse” as the workspace data store type. Pick the specific lakehouse and table for copying, and preview the data if needed.

You can also use the “External” option to gather data from various external sources such as Azure Blob Storage, FTP, SQL Server, and more.
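The source choice boils down to a small configuration record – either a workspace Lakehouse table or an external connection. The dict shape and helper below are purely illustrative (not the pipeline's actual schema):

```python
def make_source_config(store_type, lakehouse=None, table=None,
                       external_connection=None):
    """Build an illustrative source configuration (hypothetical shape).

    `store_type` is "workspace" (a Lakehouse table) or "external"
    (e.g. Azure Blob Storage, FTP, SQL Server).
    """
    if store_type == "workspace":
        # Workspace sources need a specific lakehouse and table to copy.
        if not (lakehouse and table):
            raise ValueError("workspace sources need a lakehouse and a table")
        return {"type": "workspace", "lakehouse": lakehouse, "table": table}
    if store_type == "external":
        # External sources need a connection to the outside store.
        if not external_connection:
            raise ValueError("external sources need a connection")
        return {"type": "external", "connection": external_connection}
    raise ValueError(f"unknown store type: {store_type!r}")

print(make_source_config("workspace", lakehouse="SalesLakehouse",
                         table="Orders"))
```

`SalesLakehouse` and `Orders` are made-up names standing in for whatever lakehouse and table you pick in the Source tab.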

Configure settings under Destination tab

Select the Lakehouse name for your desired data destination. Alternatively, you can choose an external destination, such as Azure Blob Storage, Snowflake, etc.

Configure settings under Mappings tab

If the connector you’re using allows mapping, head to the Mapping tab to set up your configuration.

  • Expand Type conversion settings to configure type conversion.
  • Select Import schemas to import your data schema; auto-mapping is applied, showing your source and destination columns.
    • You can customize the destination column name if you create a new table in the destination.
    • You are not allowed to modify the destination column name when you write data into an existing table.
    • You can also view the data type of both source and destination columns.
    • You can add new columns or remove existing columns based on your preference.
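The mapping rules in the bullets above can be expressed as a small check. The mapping shape and the `check_mapping` helper are hypothetical illustrations, not Fabric's internal schema:

```python
def check_mapping(mapping, existing_table_columns=None):
    """Validate a column mapping against the rules described above.

    `mapping` is a list of {"source": ..., "destination": ...} entries
    (hypothetical shape). When writing into an existing table, pass its
    column names so destination renames are rejected.
    """
    for entry in mapping:
        dest = entry["destination"]
        # Renaming destination columns is only allowed for new tables.
        if existing_table_columns is not None and dest not in existing_table_columns:
            raise ValueError(
                f"cannot write to unknown column {dest!r} of an existing table")
    return True

mapping = [{"source": "lpep_pickup_datetime", "destination": "PickupTime"}]
# New table: any destination name is fine.
print(check_mapping(mapping))  # → True
# Existing table without that column: rejected.
try:
    check_mapping(mapping, existing_table_columns={"pickup_ts"})
except ValueError as exc:
    print("rejected:", exc)
```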

Configure settings under Settings tab

Customize the settings in the Settings tab according to your preferences.

Validate

Click on the “Validate” option in the menu to validate your pipeline.

Run

After successful validation, execute the pipeline by clicking “Run”. If the pipeline is unsaved, a dialog box will prompt you to save and run. Click that option to proceed.

The pipeline status is displayed as “Success”, indicating that the pipeline ran without errors or issues.
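Conceptually, waiting for a run to finish is a polling loop over its status. `get_pipeline_status` below is a stub standing in for however you query the run (the monitoring UI, or Fabric's REST APIs) – it is not a real client call:

```python
import time

def wait_for_completion(get_pipeline_status, poll_seconds=0, max_polls=10):
    """Poll a status callable until the run reaches a terminal state.

    `get_pipeline_status` is a hypothetical zero-argument callable that
    returns "InProgress", "Success", or "Failed".
    """
    for _ in range(max_polls):
        status = get_pipeline_status()
        if status in ("Success", "Failed"):
            return status
        time.sleep(poll_seconds)  # back off between polls
    raise TimeoutError("pipeline did not reach a terminal state")

# Simulated run: two in-progress polls, then success.
statuses = iter(["InProgress", "InProgress", "Success"])
print(wait_for_completion(lambda: next(statuses)))  # → Success
```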

2. Add a copy activity using copy assistant

Follow these steps to add a copy activity using the Copy Assistant:

  • Navigate to the workspace where you established the data pipeline and access the Data Pipeline section.
  • Add a copy activity using the Copy Assistant in either of these ways:
    • Copy data –> Use copy assistant
    • Data pipeline homepage –> Copy data

Configure source

  • When adding a copy activity using the Copy Assistant, you have the option to choose the data source type from different categories. Click Next.
  • In the Copy Assistant, you can select from sample datasets; in this instance, the NYC Taxi – green dataset was chosen. This lets you work with predefined datasets for testing and demonstration, simplifying the copy activity configuration. Click Next.
  • After choosing the NYC Taxi – green dataset, preview the selected dataset table, and click “Next” to proceed with the copy activity configuration.

Configure Destination

  • Select the data destination option from the category. In this case, “Lakehouse” under the workspace option has been selected. Then, click “Next” to proceed with the copy activity configuration.
  • You can either choose from an existing lakehouse or create a new lakehouse. Click “Next” to continue with the copy activity configuration.

  • Set up and map your source data to the destination, then click “Next” to finalize your destination configurations.

Review and Run your copy activity

Review the settings for your copy activity in the preceding steps and click “Save + Run” to complete the process. Alternatively, you can return to the previous steps in the tool to make edits if necessary.

The pipeline status is displayed as “Success”, indicating that the pipeline ran without errors or issues.

Tags: Microsoft Fabric
Useful links
  • Microsoft Fabric: Data pipelines
  • MS Learn Modules
