Introduction
Microsoft Fabric is a powerful platform that brings together data engineering, pipelines, lakehouse, and Power BI into a single ecosystem.
However, once you move from demos to real production workloads, things start to break.
In this post, I will walk through:
- A typical Fabric data platform architecture
- Real production issues I faced
- Why they happen
- How to fix them with practical solutions
This is based on building a real Magento to Fabric data pipeline, not just theoretical examples.
Fabric Architecture in Production
A common real-world setup looks like this:
Ingestion
Data is ingested with Fabric Pipelines or Azure Data Factory into the Bronze layer, where it lands as raw files.
Processing
Fabric notebooks transform the data into the Silver layer as Delta tables.
Consumption
Power BI semantic models are built on top of curated datasets.
What Breaks in Production
Fabric works smoothly in controlled demos, but in production you will encounter:
- Datatype mismatches
- Schema drift
- Merge failures
- String size limitations
- Differences from Databricks
Let’s go through the most important issues.
Issue 1: SqlDecimal cannot be converted to Double
Error
SqlDecimal cannot be converted to Double
Why it happens
Systems like Magento store numeric values as high-precision decimals such as DECIMAL(38,18). Fabric pipelines sometimes try to insert these into floating-point columns, which causes conversion failures.
Fix
Always control datatypes explicitly.
In SQL:
CAST(column AS DECIMAL(38,10))
In PySpark:
from pyspark.sql.functions import col

df = df.withColumn("price", col("price").cast("decimal(20,6)"))
Real lesson
Never rely on automatic datatype mapping. Always define decimal precision explicitly.
Issue 2: DELTA MERGE unresolved expression
Error
Cannot resolve row_updated_at in UPDATE clause
Why it happens
Using UPDATE SET * assumes source and target schemas are identical. If the target table has additional columns not present in the source, the merge fails.
Fix
Use dynamic column matching instead of updating all columns blindly.
Only update columns that exist in both source and target.
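A minimal sketch of that pattern using the Delta Lake Python API (delta-spark); the `entity_id` key and `target_path` names are illustrative, not from the original pipeline:

```python
def common_update_set(source_cols, target_cols, key):
    """Build an explicit UPDATE SET map from the columns both sides share."""
    common = set(source_cols) & set(target_cols)
    return {c: f"s.{c}" for c in common if c != key}

def merge_common_columns(spark, target_path, source_df, key="entity_id"):
    """MERGE that never references a column missing on either side."""
    from delta.tables import DeltaTable  # delta-spark; imported lazily here
    target = DeltaTable.forPath(spark, target_path)
    updates = common_update_set(source_df.columns, target.toDF().columns, key)
    (target.alias("t")
           .merge(source_df.alias("s"), f"t.{key} = s.{key}")
           .whenMatchedUpdate(set=updates)
           .whenNotMatchedInsertAll()
           .execute())
```

Because the UPDATE SET map is built from the intersection, a target-only column like row_updated_at can no longer make the merge fail.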
Real lesson
Avoid UPDATE SET *. Always control your merge logic.
Issue 3: dbutils not defined in Fabric
Error
NameError: name 'dbutils' is not defined
Why it happens
Fabric does not support Databricks utilities.
Fix
Use mssparkutils instead:
from notebookutils import mssparkutils
mssparkutils.fs.ls(path)
Real lesson
Fabric is not Databricks. Utilities must be adapted accordingly.
Issue 4: String or VARCHAR overflow
Problem
Certain columns, such as Magento's product options or additional data fields, can contain very large text or JSON payloads that exceed Fabric's string size limits.
Fix
Truncate large columns:
from pyspark.sql.functions import col, substring

df = df.withColumn(col_name, substring(col(col_name), 1, 8000))
Real lesson
Identify large text columns early and handle them explicitly.
Issue 5: Duplicate records in incremental loads
Problem
Incremental loads often introduce multiple records for the same key due to updates.
Fix
Use a window function to retain only the latest record:
from pyspark.sql.functions import col, row_number
from pyspark.sql.window import Window

window = Window.partitionBy(key).orderBy(col("updated_at").desc())
df = (df.withColumn("rn", row_number().over(window))
        .filter(col("rn") == 1)
        .drop("rn"))
Real lesson
Always deduplicate incremental data using a reliable timestamp column.
Issue 6: Lookup activity not reading Lakehouse files
Problem
Lookup activity fails to read files stored in the Lakehouse.
Why it happens
Fabric uses a different path structure compared to traditional data lake storage.
Fix
Use correct paths such as:
Files/your-folder/your-file
Real lesson
Always validate file paths when working with Fabric storage.
Issue 7: Schema drift breaking pipelines
Problem
New columns appear in source data and cause pipeline failures.
Fix
Use dynamic schema handling and column intersection logic in transformations and merge operations.
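A minimal sketch of that intersection logic, as a pure column-name comparison (source_df and target_df in the usage comment are placeholders):

```python
def split_drift(source_cols, target_cols):
    """Separate columns the target already knows from newly drifted ones."""
    keep = [c for c in source_cols if c in target_cols]
    new = [c for c in source_cols if c not in target_cols]
    return keep, new

# Usage sketch:
# keep, new = split_drift(source_df.columns, target_df.columns)
# if new:
#     print(f"Schema drift detected; deferring new columns: {new}")
# source_df = source_df.select(keep)
```

The key design choice is that drift is logged and deferred rather than allowed to fail the run; adding the new columns to the target then becomes a deliberate, reviewed change.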
Real lesson
Schema drift is inevitable in production. Pipelines must be designed to handle it gracefully.
Best Practices for Production Fabric Pipelines
- Normalize datatypes early
- Avoid automatic schema assumptions
- Use controlled merge logic
- Handle large text columns proactively
- Deduplicate incremental data
- Monitor and log failures
Final Thoughts
Microsoft Fabric is a strong platform, but it is still evolving. The biggest challenge today is bridging the gap between simple demos and real production use cases.
The most important shift is moving from relying on default behavior to explicitly controlling schema, data types, and transformations.