Ingesting Data with Data Factory in Microsoft Fabric

Good analytics starts with great data, and great data starts with reliable ingestion pipelines.

In Microsoft Fabric, Data Factory is the powerhouse behind that process.
It’s the next generation of Azure Data Factory, built right into the Fabric platform, making it easier than ever to:

  • Connect to hundreds of data sources
  • Transform and clean data on the fly
  • Schedule and automate ingestion (without writing code)

In this post, we’ll cover:

  1. Copy Data activity and pipeline basics
  2. How to connect to common data sources
  3. Tips for scheduled and incremental loads

Copy Data Activity & Pipeline Basics

The Copy Data activity is your go-to tool for moving data from any source to your destination in Fabric.

Creating your first pipeline:

  1. In your Fabric workspace, click New → Data pipeline
  2. Name your pipeline (e.g., SalesData_Pipeline)
  3. In the pipeline canvas, click Add activity → Copy data
  4. Choose your source (Azure SQL, Blob Storage, REST API, etc.)
  5. Choose your destination (Lakehouse table, Warehouse, etc.)
  6. Map columns if needed
  7. Save and Run the pipeline

Pro tip: Pipelines can do more than just copy data; you can chain ingestion, transformations, notifications, and even conditional logic.
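
If it helps to picture what the Copy Data activity is doing behind the scenes, here is a minimal PySpark sketch of the same idea in a Fabric notebook: read from a source, write to a Lakehouse table. The file path and table name are placeholders for illustration; the activity itself needs no code at all.

    # Conceptual sketch only: the Copy Data activity handles this for you without code.
    # Assumes a Fabric notebook, where `spark` (a SparkSession) is already provided,
    # and a hypothetical CSV sitting in the Lakehouse Files area.

    df = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("Files/raw/sales_2024.csv")   # hypothetical source path
    )

    # Write to a managed Lakehouse (Delta) table, a typical Copy Data destination.
    (
        df.write
        .format("delta")
        .mode("append")
        .saveAsTable("SalesData")          # hypothetical destination table
    )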

Connecting to Common Data Sources

Fabric’s Data Factory comes with a huge connector library.
Here are some common examples:

A. Azure SQL Database

  • Source type: Azure SQL Database
  • Authentication: SQL authentication or Microsoft Entra ID (formerly Azure AD)
  • Connection string: supply your server and database details (an illustrative format is sketched below)

Best for: Operational or transactional data ingestion
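
The connection string line above deserves a concrete illustration. Here is a minimal Python sketch using pyodbc and SQL authentication; the server, database, credential, and table names are placeholders, and in the pipeline you simply type the same details into the connection dialog instead of writing code.

    import pyodbc

    # Illustrative Azure SQL connection string; every value is a placeholder.
    conn_str = (
        "Driver={ODBC Driver 18 for SQL Server};"
        "Server=tcp:<your-server>.database.windows.net,1433;"
        "Database=<your-database>;"
        "Uid=<sql-user>;Pwd=<sql-password>;"
        "Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"
    )

    conn = pyodbc.connect(conn_str)
    cursor = conn.cursor()

    # Hypothetical table, used only to show the round trip works.
    cursor.execute("SELECT TOP 10 * FROM dbo.SalesOrders")
    for row in cursor.fetchall():
        print(row)

    conn.close()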

B. Azure Blob Storage

  • Source type: Azure Blob Storage
  • Authentication: SAS token or Managed Identity
  • Best for: Bulk loads of CSV, JSON, or Parquet files (a minimal download sketch follows below)
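
As a rough sketch of what that connection amounts to, here is how you might list and download blobs in Python with the azure-storage-blob package and a SAS token; the account, container, folder, and file names are made up for illustration.

    from azure.storage.blob import BlobServiceClient

    # Placeholders: use your storage account URL and SAS token
    # (or a managed identity credential instead of a raw token).
    ACCOUNT_URL = "https://<your-account>.blob.core.windows.net"
    SAS_TOKEN = "<your-sas-token>"

    service = BlobServiceClient(account_url=ACCOUNT_URL, credential=SAS_TOKEN)
    container = service.get_container_client("raw-data")   # hypothetical container

    # List CSV files under a hypothetical "sales/" folder, then download one.
    for blob in container.list_blobs(name_starts_with="sales/"):
        print(blob.name, blob.size)

    data = container.download_blob("sales/2024-01.csv").readall()
    print(f"Downloaded {len(data)} bytes")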

C. REST APIs

  • Source type: HTTP or REST
  • Authentication: Basic, OAuth 2.0, or API key
  • Best for: SaaS applications or streaming data feeds (see the sketch below)
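
To get a feel for what the REST connector does, here is a minimal Python sketch that pages through a hypothetical JSON API with an API key. The URL, header name, and paging scheme are assumptions; real APIs vary, which is exactly what the connector's pagination settings are there to handle.

    import requests

    # Hypothetical endpoint and API-key header; adjust to your API's documentation.
    BASE_URL = "https://api.example.com/v1/orders"
    HEADERS = {"x-api-key": "<your-api-key>"}

    def fetch_all_orders():
        """Page through the API until an empty page comes back."""
        page = 1
        records = []
        while True:
            resp = requests.get(BASE_URL, headers=HEADERS,
                                params={"page": page}, timeout=30)
            resp.raise_for_status()
            batch = resp.json()
            if not batch:
                break
            records.extend(batch)
            page += 1
        return records

    orders = fetch_all_orders()
    print(f"Fetched {len(orders)} records")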

Scheduling & Incremental Loads

A. Scheduled Loads

  • Use the Triggers tab to run pipelines hourly, daily, weekly — or even every few minutes for time-sensitive data.
  • Ideal for dashboards that must stay near real-time.

B. Incremental Loads

Move only what’s changed since the last run:

  • Use a watermark column such as LastUpdated to identify new or changed rows
  • Persist the last run’s watermark (for example, a LastRunTime value) in a control table or other durable store, since pipeline variables don’t survive between runs, and filter the source query on it (a minimal sketch follows)
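
Here is the watermark pattern as a minimal Python sketch. It assumes a hypothetical control table holding the last successful watermark and a LastUpdated column on the source table; in a Fabric pipeline you would typically build the same flow from a Lookup activity, a parameterized Copy Data activity, and a final step that updates the stored watermark.

    from datetime import datetime, timezone
    import pyodbc

    conn = pyodbc.connect("<azure-sql-connection-string>")  # placeholder
    cursor = conn.cursor()

    # 1. Read the last watermark from a hypothetical control table.
    cursor.execute(
        "SELECT LastRunTime FROM etl.WatermarkControl WHERE TableName = 'SalesOrders'"
    )
    last_run_time = cursor.fetchone()[0]

    # 2. Pull only the rows that changed since that watermark.
    new_watermark = datetime.now(timezone.utc)
    cursor.execute(
        "SELECT * FROM dbo.SalesOrders WHERE LastUpdated > ? AND LastUpdated <= ?",
        last_run_time, new_watermark,
    )
    changed_rows = cursor.fetchall()
    print(f"{len(changed_rows)} changed rows to load")

    # ... load changed_rows into the Lakehouse or Warehouse here ...

    # 3. Persist the new watermark so the next run starts where this one ended.
    cursor.execute(
        "UPDATE etl.WatermarkControl SET LastRunTime = ? WHERE TableName = 'SalesOrders'",
        new_watermark,
    )
    conn.commit()
    conn.close()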

C. Error Handling

  • Add failure paths to trigger email or Microsoft Teams alerts
  • Use retry policies to handle temporary network issues automatically (conceptually, they work like the sketch below)
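
Retry behaviour is just a setting on each activity (retry count and interval), but if it helps to see the idea, here is a rough Python sketch of what a fixed-interval retry policy amounts to; the function and numbers are made up for illustration.

    import time

    def run_with_retries(step, max_retries=3, interval_seconds=30):
        """Re-run a failing step a few times before giving up,
        roughly what an activity-level retry policy does for you."""
        for attempt in range(1, max_retries + 2):  # first try plus retries
            try:
                return step()
            except Exception as err:
                if attempt > max_retries:
                    raise  # out of retries: let the failure path / alert fire
                print(f"Attempt {attempt} failed ({err}); retrying in {interval_seconds}s")
                time.sleep(interval_seconds)

    # Hypothetical usage: wrap any flaky ingestion step.
    # run_with_retries(lambda: fetch_all_orders())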

Why It Matters

A well-designed pipeline in Microsoft Fabric’s Data Factory doesn’t just move data; it keeps your Lakehouse, Warehouse, and BI dashboards running on fresh, accurate information.

That’s the difference between a dashboard your executives trust and one they ignore.
