Ingesting Data with Data Factory in Microsoft Fabric

Good analytics starts with great data, and great data starts with reliable ingestion pipelines.

In Microsoft Fabric, Data Factory is the powerhouse behind that process.
It’s the next generation of Azure Data Factory, built right into the Fabric platform, making it easier than ever to:

  • Connect to hundreds of data sources
  • Transform and clean data on the fly
  • Schedule and automate ingestion (without writing code)

In this post, we’ll cover:

  1. Copy Data activity and pipeline basics
  2. How to connect to common data sources
  3. Tips for scheduled and incremental loads

Copy Data Activity & Pipeline Basics

The Copy Data activity is your go-to tool for moving data from any source to your destination in Fabric.

Creating your first pipeline:

  1. In your Fabric workspace, click New → Data pipeline
  2. Name your pipeline (e.g., SalesData_Pipeline)
  3. In the pipeline canvas, click Add activity → Copy data
  4. Choose your source (Azure SQL, Blob Storage, REST API, etc.)
  5. Choose your destination (Lakehouse table, Warehouse, etc.)
  6. Map columns if needed
  7. Save and Run the pipeline

Pro tip: Pipelines can do more than just copy data; you can chain ingestion, transformations, notifications, and even conditional logic.
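
If it helps to picture what the Copy Data activity is doing behind the scenes, here is a minimal PySpark sketch of the same idea in a Fabric notebook: read from a source, write to a Lakehouse table. The file path and table name are placeholders for illustration; the activity itself needs no code at all.

    # Conceptual sketch only: the Copy Data activity handles this for you without code.
    # Assumes a Fabric notebook, where `spark` (a SparkSession) is already provided,
    # and a hypothetical CSV sitting in the Lakehouse Files area.

    df = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("Files/raw/sales_2024.csv")   # hypothetical source path
    )

    # Write to a managed Lakehouse (Delta) table, a typical Copy Data destination.
    (
        df.write
        .format("delta")
        .mode("append")
        .saveAsTable("SalesData")          # hypothetical destination table
    )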

Connecting to Common Data Sources

Fabric’s Data Factory comes with a huge connector library.
Here are some common examples:

A. Azure SQL Database

  • Source type: Azure SQL Database
  • Authentication: SQL authentication or Microsoft Entra ID (formerly Azure AD)
  • Connection string: supply your server and database details (an illustrative format is sketched below)

Best for: Operational or transactional data ingestion
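
The connection string line above deserves a concrete illustration. Here is a minimal Python sketch using pyodbc and SQL authentication; the server, database, credential, and table names are placeholders, and in the pipeline you simply type the same details into the connection dialog instead of writing code.

    import pyodbc

    # Illustrative Azure SQL connection string; every value is a placeholder.
    conn_str = (
        "Driver={ODBC Driver 18 for SQL Server};"
        "Server=tcp:<your-server>.database.windows.net,1433;"
        "Database=<your-database>;"
        "Uid=<sql-user>;Pwd=<sql-password>;"
        "Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"
    )

    conn = pyodbc.connect(conn_str)
    cursor = conn.cursor()

    # Hypothetical table, used only to show the round trip works.
    cursor.execute("SELECT TOP 10 * FROM dbo.SalesOrders")
    for row in cursor.fetchall():
        print(row)

    conn.close()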

B. Azure Blob Storage

  • Source type: Azure Blob Storage
  • Authentication: SAS token or Managed Identity
  • Best for: Bulk loads of CSV, JSON, or Parquet files (a minimal download sketch follows below)
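
As a rough sketch of what that connection amounts to, here is how you might list and download blobs in Python with the azure-storage-blob package and a SAS token; the account, container, folder, and file names are made up for illustration.

    from azure.storage.blob import BlobServiceClient

    # Placeholders: use your storage account URL and SAS token
    # (or a managed identity credential instead of a raw token).
    ACCOUNT_URL = "https://<your-account>.blob.core.windows.net"
    SAS_TOKEN = "<your-sas-token>"

    service = BlobServiceClient(account_url=ACCOUNT_URL, credential=SAS_TOKEN)
    container = service.get_container_client("raw-data")   # hypothetical container

    # List CSV files under a hypothetical "sales/" folder, then download one.
    for blob in container.list_blobs(name_starts_with="sales/"):
        print(blob.name, blob.size)

    data = container.download_blob("sales/2024-01.csv").readall()
    print(f"Downloaded {len(data)} bytes")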

C. REST APIs

  • Source type: HTTP or REST
  • Authentication: Basic, OAuth 2.0, or API key
  • Best for: SaaS applications or streaming data feeds (see the sketch below)
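
To get a feel for what the REST connector does, here is a minimal Python sketch that pages through a hypothetical JSON API with an API key. The URL, header name, and paging scheme are assumptions; real APIs vary, which is exactly what the connector's pagination settings are there to handle.

    import requests

    # Hypothetical endpoint and API-key header; adjust to your API's documentation.
    BASE_URL = "https://api.example.com/v1/orders"
    HEADERS = {"x-api-key": "<your-api-key>"}

    def fetch_all_orders():
        """Page through the API until an empty page comes back."""
        page = 1
        records = []
        while True:
            resp = requests.get(BASE_URL, headers=HEADERS,
                                params={"page": page}, timeout=30)
            resp.raise_for_status()
            batch = resp.json()
            if not batch:
                break
            records.extend(batch)
            page += 1
        return records

    orders = fetch_all_orders()
    print(f"Fetched {len(orders)} records")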

Scheduling & Incremental Loads

A. Scheduled Loads

  • Use the Triggers tab to run pipelines hourly, daily, weekly — or even every few minutes for time-sensitive data.
  • Ideal for dashboards that must stay near real-time.

B. Incremental Loads

Move only what’s changed since the last run:

  • Use a watermark column such as LastUpdated to identify new or changed rows
  • Persist the last run’s watermark (for example, a LastRunTime value) in a control table or other durable store, since pipeline variables don’t survive between runs, and filter the source query on it (a minimal sketch follows)
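
Here is the watermark pattern as a minimal Python sketch. It assumes a hypothetical control table holding the last successful watermark and a LastUpdated column on the source table; in a Fabric pipeline you would typically build the same flow from a Lookup activity, a parameterized Copy Data activity, and a final step that updates the stored watermark.

    from datetime import datetime, timezone
    import pyodbc

    conn = pyodbc.connect("<azure-sql-connection-string>")  # placeholder
    cursor = conn.cursor()

    # 1. Read the last watermark from a hypothetical control table.
    cursor.execute(
        "SELECT LastRunTime FROM etl.WatermarkControl WHERE TableName = 'SalesOrders'"
    )
    last_run_time = cursor.fetchone()[0]

    # 2. Pull only the rows that changed since that watermark.
    new_watermark = datetime.now(timezone.utc)
    cursor.execute(
        "SELECT * FROM dbo.SalesOrders WHERE LastUpdated > ? AND LastUpdated <= ?",
        last_run_time, new_watermark,
    )
    changed_rows = cursor.fetchall()
    print(f"{len(changed_rows)} changed rows to load")

    # ... load changed_rows into the Lakehouse or Warehouse here ...

    # 3. Persist the new watermark so the next run starts where this one ended.
    cursor.execute(
        "UPDATE etl.WatermarkControl SET LastRunTime = ? WHERE TableName = 'SalesOrders'",
        new_watermark,
    )
    conn.commit()
    conn.close()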

C. Error Handling

  • Add failure paths to trigger email or Microsoft Teams alerts
  • Use retry policies to handle temporary network issues automatically (conceptually, they work like the sketch below)
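
Retry behaviour is just a setting on each activity (retry count and interval), but if it helps to see the idea, here is a rough Python sketch of what a fixed-interval retry policy amounts to; the function and numbers are made up for illustration.

    import time

    def run_with_retries(step, max_retries=3, interval_seconds=30):
        """Re-run a failing step a few times before giving up,
        roughly what an activity-level retry policy does for you."""
        for attempt in range(1, max_retries + 2):  # first try plus retries
            try:
                return step()
            except Exception as err:
                if attempt > max_retries:
                    raise  # out of retries: let the failure path / alert fire
                print(f"Attempt {attempt} failed ({err}); retrying in {interval_seconds}s")
                time.sleep(interval_seconds)

    # Hypothetical usage: wrap any flaky ingestion step.
    # run_with_retries(lambda: fetch_all_orders())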

Why It Matters

A well-designed pipeline in Microsoft Fabric’s Data Factory doesn’t just move data; it keeps your Lakehouse, Warehouse, and BI dashboards running on fresh, accurate information.

That’s the difference between a dashboard your executives trust and one they ignore.
