An in-depth look at Microsoft Dataverse (formerly the Common Data Service). This book looks at what Dataverse is and how you can manage data, tables and relationships.
The first thing we need to do in this book is clear up any confusion about the title. Originally this talk at Global Con 4 was titled Deep Dive in the Common Data Service, but as of 16th November 2020, Microsoft renamed the Common Data Service to Microsoft Dataverse. The new name is used throughout this book but if you have heard of Common Data Service previously rest assured that we are talking about one and the same thing.
In this ebook, we will examine what Microsoft Dataverse is, how it works and where it originated. It is useful to understand the origins of Dataverse as this helps understand some of the nuances that come up, and these nuances will be looked at along with the different data types available.
We will also explain entities and how relationships between these entities can be modelled. Finally, we will cover some critical design considerations you will come across when working with Dataverse.
What is the Dataverse?
Microsoft Dataverse is at the core of the Power platform. It easily structures a variety of business data to support interconnected business applications and processes.
As the diagram above shows, this means that Dataverse allows centralised, structured data to be available to all the different tools in the Power Platform.
What’s in the box
Microsoft Dataverse is much more than just a database, however. It is accessed by applications via its own API and the applications that access it can be Power Platform or Azure applications as well as a third-party desktop and mobile apps.
As you can see from the image above, the API includes a number of features, which makes Dataverse much more than a simple database:
- Security including authentication, authorization and auditing
- Logic such as workflows which can automate tasks, business rules, duplicate detection and calculated columns
- Data features such as data modelling, validation and reporting
- Data storage
- Integration features including webhooks, eventing and data export
The Origins of Dataverse
Microsoft Dataverse can be traced back to 2003 and Microsoft business Solutions CRM 1.0, which used SQL Server 2000 for its data storage.
In December 2005 Microsoft then released CRM 3.0 which used SQL Server 2000 or 2005 and was the first version that allowed data modelling and extensibility. This version is particularly important because some of the nuances that will be discussed later on can be traced back to the versions of the SQL server used.
December 2011 saw the release of CRM 2011 using SQL Server 2008 or 2012, which was a particularly momentous occasion as this is the first time when Dynamics was available as a cloud-based service as well as on-premise.
Yearly updates followed and then in October 2017 CRM 365 was released and this marked the biggest change so far. Microsoft continued to grow this online offering, but
it soon became clear that the architecture of the SQL server wasn’t fully suited to cloud-based CRM. For this reason, they moved to use SQL Azure, and this helped improve scalability and performance massively.
Around this time, the Common Data Service (CDS) version 1.0 was also released, which was a completely separate technology from everything above. In March 2018 version
CDS was born as Microsoft realised that CDS was a better solution for online CRM and Dynamic applications. Version 2.0 of CDS is the predecessor of Microsoft Dataverse.
Microsoft Dataverse… on Azure
Dataverse API endpoints use the compute, storage and eventing & extensibility layers of Azure:
- Azure compute is used for web services, to provide capacity for synchronization jobs, processing of sandbox code and reporting.
- In the storage layer SQL elastic pools are used to provide flexible, robust data storage. Also included is blob storage for files, CosmosDB databases for unstructured data such as logs and search which is used for indexing and searching records.
- The eventing & extensibility layer provides access to Azure functions, event hubs, service bus for integration and app services.
All of the above allow Microsoft Dataverse to provide an optimised high-performance service able to cope with thousands of users from around the world and millions of records.
Data modelling in Microsoft Dataverse
Modelling data in Dataverse is fairly simple, but there are some data constructs and limits that you need to be aware of.
In CDS the main place where you stored data were called Entities but in Dataverse this has now changed to Tables. CDS used the term fields for individual pieces of data but in Dataverse these are now known as columns.
The maximum number of custom entities/tables used to be 300, but this has now been increased to 1500. There is also a limit of 1024 columns per table or 8Kb on a single record, depending on the size of the columns. This should be more than enough though as if you came close to using this many columns in a single table you probably need to take a look at how you are structuring your data.
Basic Data Types
Single Line of Text
This is actually an nvarchar(n) column to store Unicode characters. You can specify the length (n) when defining the column. This length actually specifies the number of bytes that the column will use (2n+2). There can be some surprises with special characters with this data type because of the number of bytes they take up per character.
Multiple Lines of Text
This is an nvarchar(MAX) column that allows you to store up to 2GB of data or around one million characters.
This was previously known as Option Set in CDS. It is a 4-byte integer column that allows you to define a set of options that you can choose from. This can be a multiple-choice list if required.
This was previously known as Two Options. It is a binary column that can store a Boolean value, but this isn’t limited to Yes/No, it can be any two values that you choose.
The image data type is a varbinary column and is scaled to 144×144 jpeg. It is also limited to the attachment size that you have defined in your environment.
This is a special foreign key data type used to define relationships between tables.
Date and Time
The Dataverse Date and Time column is based on SQL DateTime and uses 8 bytes. It is inherited from the DateTime date type used in SQL 2000, which means that no dates prior to 1st January 1753 can be stored.
This is because of the change from the Julian calendar to the Gregorian calendar which resulted in 10 days being lost so date operations from before this date can be complicated. SQL 2000 took the decision to ignore dates before this date. This issue was fixed in SQL Server 2008 using the datetime2 data type, but Microsoft Dataverse inherits the issue from the old data type. However, the chances of this being a problem are quite remote.
Microsoft Dataverse is designed to cover global organisations, so all Date and Time columns allow you to specify different behaviours to account for different time zones and locations. There are three different behaviours:
- User local – this uses the time zone offset of the current user. The data is stored in coordinated universal time (UTC) along with the offset. This can cause confusion though, for example, if you store a date of birth with no time this will actually be stored with a time of 00:00 (midnight). If a user in a time zone that is an hour or more behind looks at this date, then they will see it as the day before as the time has moved back to 11pm the previous day.
- Time-Zone Independent – this was introduced to try and mitigate the issue mentioned above. It means that the time zone is ignored, and no offset is specified so the date and time displayed will always be the exact date and time entered locally at the time.
- Date Only – this stores just the date with no time
This is a 4-byte integer.
This can store up to 5 decimals and uses up to 8 bytes of data. It is not precise and shouldn’t be used in calculations or exact comparisons. This also means it cannot be used within a calculated column.
Unlike the Float data type, Decimal doesn’t use a floating-point decimal component and so is precise and can be used in calculations and comparisons. It can store up to 10 decimals and 13 bytes. Both the decimal and float data types can also accept numbers in scientific notation e.g. 1e-5 for 0.00001.
This is another precise data type, which can store up to 4 decimals and 8 bytes. When you create a new environment, you specify a base currency that applies to the whole environment. However, you can then specify a transaction currency per column which includes an exchange rate. This means that multiple currencies can be used within an organisation.
Status & Status Reason
Columns of this type cannot be created manually; they have been created automatically in any new tables you create. For most tables, the status is two choices – Active and Inactive but some tables such as Case have more than this.
Status Reason allows you to store a reason for the status that has been set. Rules can also be set on transitions between statuses so for example a case couldn’t be changed from resolved to on hold.
In Microsoft Dataverse there are three types of the table – standard, activity and virtual. Virtual tables are beyond the scope of this book and merit their own session to fully explain them. Essentially though they are tables that exist in Dataverse, the data within them is provided by external data sources.
When you create a new table, you choose an internal name and display name for the table. The internal name cannot be changed, but the display name can. You also specify a primary column, which must be a text column and is used in lookups and titles. If required, this primary column can be set to an auto-number so each new record will have an automatically generated unique identifier.
There is also a setting to enable attachments. If this is enabled, then an automatic relationship will be created between your new table and the attachments table.
When creating a table, you also have to define the ownership type. There are two main options for this – User or Team and Organisation:
If you choose User or Team, some extra columns will be added to the table to store the user or team details as well as the owner’s business unit. This then allows the use of the full range of security controls in Microsoft Dataverse such as hierarchies and granular permissions down to the record level.
The Organisation ownership type doesn’t add any owner columns and the security options are much simpler – users either have access to all records in the table or none of them.
Tip: this setting can’t be changed so if in doubt leave it on User or Team.
As the name suggests, this table is used to store activities against a record. Creating an activity table adds over 50 extra columns to a standard table. These include time dimensions such as schedule, start etc. and default columns such as subject, description and states and activity parties and regarding such as to, from etc.
There are two types of relationships in Microsoft Dataverse – one to many and many to many. A one to many relationships uses a lookup column, which is essentially a foreign key in another table. This lookup is actually made up of a few columns in the background though. These include Id, which is the GUID of the related record as well as the name and table type.
When many to many relationships are created, this uses a hidden table that has lookups to manage the many to many relationships.
When you create a relationship, you can also specify a behaviour type. This controls how actions performed on records on the ‘one’ side of the relationship propagate to the ‘many’ side.
For example, if you deleted a contact would you want to also automatically delete all the related activities for this contact? These behaviour types allow you to control whether these related records would be deleted in this situation.
5 Critical Design Considerations
New Table vs Common Data Model (CDM) Table
There will be times when you are faced with the decision of whether to create a new custom table or use one of the provided CDM tables. There are a couple of things to be aware of when you find yourself in this situation.
Firstly, avoid recreating out of the box (OOB) tables. If you don’t need all the columns in a table that’s fine as you don’t have to use them and can remove them from the forms if you need to. Recreating an OOB table can also mean you lose functionality as, for example, the Contact table has features that allow you to use a Microsoft Outlook add-in to manage e-mails from contacts. If you re-create your own Contact table, then you would lose this functionality.
Even more important than the first point is to avoid repurposing OOB tables. For example, an organisation may not need to use the Case table for its original purpose and may use it to manage an onboarding process. If another part of the organisation then needs to use the Case table at a later date for its original purpose then they will be missing OOB functionality as it has been customised.
Activity Table vs Standard Table
When deciding whether to create a custom activity table or a custom standard table, there are a few aspects to consider. Custom activity types will show everywhere that OOB activity types appear. Also, you can’t have granular control of the permissions for custom activities, only as activities in general. Basically, either user see all activities or no activities regardless of whether they are custom or OOB.
On the plus side, an activity table gives you special relationships using activity parties, which cannot be easily replicated on standard tables. Activity tables also allow the use of increased service functionality, such as time recording and service level agreement reporting.
Redundant relationships are relationships that don’t add any more information. In the example above, there are relationships between Insurance Policy Line Item, Insurance Policy and Contact. This allows a user to move from a line item to a related policy and then view the related customer.
The relationship at the bottom of the image between Insurance Policy Line Item and Contact is redundant because there is already a way of moving from the line item to the contact. This redundant relationship could be useful in some situations, for example, if you wanted to reference a contact from a view or workflow defined against the line item then you would need this direct relationship. However, it’s important to note that these redundant relationships may require extra maintenance or automation to ensure consistency between the different tables.
Many to Many Relationships
There are three options when you need to model many to many relationships. The first way to do this is to use connections. Connections don’t create actual relationships and allow you to connect rows in different tables and specify the roles of the connected rows. For example, you could connect records in a contact table and specify the role of each such as a sales agent and customer etc.
The second method is using a system with many to many relationships. This adds no extra information but just shows the relationship between the two tables:
An improvement on the above would be to use a custom many to many relationships, as this allows you to record custom attributes against the relationship:
A custom many to many relationships essentially creates an intermediate table and sets to one to many relationships between that table and the original tables.
Choice vs Lookup
There are advantages and disadvantages to using either a choice column or lookup column, which are summarised below:
- Transported with solutions
- Easy and simple
- Not good with many values
- Changes require publishing
- Option deletion loses data
- Better for static and not too long lists
- Requires a new relationship or table
- Flexible and easy to modify values
- Allows deactivation
- Need to plan for data migration
- Allows for filtering with views
- Better for long or changing lists