Microsoft Fabric is a powerful platform for storing and analyzing data. Whether you’re new to Fabric or experienced, picking the right data store is crucial. This blog guides you through the decision-making process of selecting the optimal data store in Microsoft Fabric.
Explore key considerations for each option – Data Warehouse, Lakehouse, Power BI Datamart, and KQL Database – and gain practical insights from scenarios tailored to common data management needs.
Data Warehouse
- Aspects
- Data Volume: Unlimited
- Type of Data: Structured
- Primary Developer Persona: Data warehouse developer, SQL engineer
- Primary Developer Skill Set: SQL
- Data Organization: Databases, schemas, and tables
- Read Operations: T-SQL, Spark
- Write Operations: T-SQL, Spark
- Multi-Table Transactions: Yes
- Primary development interface: SQL scripts
- Security: Object-level (table, view, function, stored procedure), column-level, row-level
- Accessing Data Through Shortcuts: Yes, indirectly through the lakehouse.
- Ability to Serve as Shortcut Sources: Yes (tables)
- Cross-item Query Capability: Yes, you can query across both lakehouse and warehouse tables.
- Scenario
Ram, a proficient data analyst in a growing tech firm, is tasked with establishing a robust data infrastructure in Microsoft Fabric. Given the company’s historical reliance on SQL-based tools, Ram opts for a data warehouse to streamline data cleaning, modelling, and analysis.
This decision aligns seamlessly with the team’s strong background in SQL, fostering efficient T-SQL interactions. Ram consciously decides to utilize a data warehouse, ensuring a structured and familiar environment for the team. This choice not only facilitates proficient management and analysis of data but also accommodates Spark users within the organization.
The data warehouse becomes the cornerstone for scalable insights in the ever-evolving tech landscape.
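The cross-item query capability mentioned above can be sketched in T-SQL. Assuming a warehouse named SalesWarehouse and a lakehouse named SalesLakehouse in the same workspace (all item, table, and column names here are illustrative, not from the source), a single query can join tables across the two items using three-part naming:

```sql
-- Hypothetical item and table names, for illustration only.
-- Both items must live in the same Fabric workspace.
SELECT TOP 10
    p.ProductName,
    SUM(s.SalesAmount) AS TotalSales
FROM SalesWarehouse.dbo.FactSales AS s
JOIN SalesLakehouse.dbo.DimProduct AS p
    ON s.ProductKey = p.ProductKey
GROUP BY p.ProductName
ORDER BY TotalSales DESC;
```

Because the warehouse and the lakehouse SQL analytics endpoint share the same T-SQL surface, a team like Ram’s can reuse its existing SQL skills without learning a new query dialect.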
Lakehouse
- Aspects
- Data Volume: Unlimited
- Type of Data: Structured, Semi-structured and Unstructured
- Primary Developer Persona: Data engineer, Data scientist
- Primary Developer Skill Set: Spark (Scala, PySpark, Spark SQL, R)
- Data Organization: Based on folders and files, databases, and tables.
- Read Operations: T-SQL, Spark
- Write Operations: Spark (Scala, PySpark, Spark SQL, R)
- Multi-Table Transactions: No
- Primary development interface: Spark notebooks, Spark job definitions
- Security: Row-level and table-level (when using T-SQL); no specific measures for Spark.
- Accessing Data Through Shortcuts: Yes
- Ability to Serve as Shortcut Sources: Yes (files and tables)
- Cross-item Query Capability: Yes, you can query across lakehouse and warehouse tables, and extend queries across multiple lakehouses using Spark and shortcuts.
- Scenario
Sita, a skilled data engineer, faces the challenge of managing terabytes of data in Microsoft Fabric. With a team skilled in both PySpark and T-SQL, Sita chooses a lakehouse.
This decision allows for effective utilization of diverse skills, accommodating T-SQL experts for consumption and enabling seamless interaction with data for PySpark users. The lakehouse architecture proves to be the ideal solution, fostering collaboration and maximizing the team’s capabilities in Fabric’s dynamic data environment. Sita’s thoughtful choice aligns with the team’s skills, ensuring efficient data handling and insightful analysis.
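A typical first step in a scenario like Sita’s is landing raw files as a Delta table so that both Spark and T-SQL users can query it. As a minimal sketch (the file path and table name are hypothetical), a Spark SQL cell in a Fabric notebook might look like this:

```sql
-- Hypothetical file path and table name, for illustration only.
-- Run in a %%sql cell of a Fabric Spark notebook attached to the lakehouse.
CREATE TABLE IF NOT EXISTS sales_bronze
USING DELTA
AS
SELECT * FROM parquet.`Files/raw/sales`;
```

Once the table exists, PySpark users can work with it in notebooks while T-SQL users consume the same table through the lakehouse’s SQL analytics endpoint, which is exactly the dual-skill-set benefit the scenario describes.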
Power BI Datamart
- Aspects
- Data Volume: Up to 100 GB
- Type of Data: Structured
- Primary Developer Persona: Citizen developer
- Primary Developer Skill Set: No code, SQL
- Data Organization: Based on database, tables, queries.
- Read Operations: T-SQL, Spark, Power BI
- Write Operations: T-SQL, Dataflow
- Multi-Table Transactions: No
- Primary development interface: Power BI
- Security: Incorporated Row-Level Security (RLS) editor.
- Accessing Data Through Shortcuts: No
- Ability to Serve as Shortcut Sources: No
- Cross-item Query Capability: No
*A citizen developer is a non-professional, business-oriented individual who utilizes low-code or no-code platforms to create software applications, addressing specific business needs without extensive programming expertise.
- Scenario
Alex, a business analyst with expertise in Power BI, is tasked with delivering a comprehensive data report for a client presentation. Considering the project’s scope and the need for swift insights, Alex evaluates the options of a data warehouse or lakehouse, deeming them excessive for the current requirements. Acknowledging the self-service, no-code capabilities of Power BI and the manageable data volume, Alex strategically opts for Power BI as the go-to tool.
Collaborating with team members proficient in no-code and SQL analytical tools, Alex leverages the existing Power BI infrastructure within the organization. This decision enables a rapid development process, allowing for seamless execution of queries and dynamic report creation.
The choice of Power BI proves instrumental in delivering a visually appealing and insightful presentation to the client, showcasing the flexibility and effectiveness of the tool in meeting immediate data analysis needs.
Eventhouse (KQL Database)
- Aspects
- Data Volume: Unlimited
- Type of Data: Structured, Semi-structured, Unstructured
- Primary Developer Persona: Citizen Data scientist, Data engineer, Data scientist, SQL engineer
- Primary Developer Skill Set: No code, KQL, SQL
- Data Organization: Based on databases, schemas, and tables.
- Read Operations: KQL, T-SQL, Spark, Power BI
- Write Operations: KQL, Spark, connector ecosystem
- Multi-Table Transactions: Yes, available for multi-table ingestion.
- Primary development interface: KQL Query set, KQL Database
- Security: Row-level Security.
- Accessing Data Through Shortcuts: Yes
- Ability to Serve as Shortcut Sources: Yes
- Cross-item Query Capability: Yes, you can query across KQL databases, lakehouses, and warehouses using shortcuts.
- Advanced analytics: Native elements for time series analysis and comprehensive geospatial storage and query capabilities.
- Advanced formatting support: Full indexing support for free text and semi-structured data such as JSON.
- Ingestion latency: Supports both queued ingestion and streaming ingestion, the latter with only a few seconds of latency.
- Scenario
Aruna, a seasoned data architect, is entrusted with designing a scalable solution for a multinational company dealing with vast, diverse data. Evaluating advanced analytics needs, Aruna strategically opts for a KQL Database in Microsoft Fabric, chosen for its quick response times and proficiency in handling billions of rows of structured, semi-structured, and unstructured data.
Collaborating with a cross-functional team comprising data scientists, analysts, and developers, Aruna leverages the KQL Database’s scalability and native elements for time series analysis and geospatial functions. This decision facilitates the rapid execution of complex queries, enabling the team to compare current and previous periods, identify emerging issues, and perform detailed geospatial analytics.
The choice of a KQL Database proves instrumental in building a robust and high-performance data solution, showcasing its capabilities in handling diverse data formats and providing insights critical for the company’s strategic decision-making processes.
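The time series analysis Aruna relies on maps naturally onto KQL’s native operators. As a hedged sketch (the Telemetry table and its columns are hypothetical, not from the source), a query comparing hourly averages per region over the last week might look like this:

```kql
// Hypothetical table and column names, for illustration only.
Telemetry
| where Timestamp > ago(7d)
| make-series AvgTemp = avg(Temperature) default = 0.0
    on Timestamp step 1h by Region
| render timechart
```

The make-series operator bins the stream into a regular time series per region, which can then feed KQL’s built-in series functions for trend detection and anomaly spotting over billions of rows.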
Conclusion
Selecting the right data store in Microsoft Fabric is a strategic decision that hinges on factors such as data type, volume, and developer skills. Whether opting for a Data Warehouse, Lakehouse, Power BI Datamart, or KQL Database, organizations can tailor their choices to meet specific needs, ensuring efficient data management and insightful analytics in the dynamic realm of Microsoft Fabric.
Tags: Microsoft Fabric
