BI Powers – The AI who tagged me

Andre Melancia

We use and consume data on a daily basis in Power BI by accessing the usual data sources. But we could get so much more from that data by introducing some of the other services Microsoft offers, such as Cognitive Services. In this eBook companion to the session, we will look at how to build a basic dashboard and then call Azure services to enhance the data that we are pulling back. Some of the elements we look at can also be used with Excel, but we will be focusing on Power BI. The intended audience is business users who use Excel and Power BI daily.

Throughout this eBook we are going to build a dashboard that provides insight into the emails being received in a specified mailbox. Those emails could be feedback from customers, but rather than having to read them manually, we will use Azure Cognitive Services to analyse each subject line and categorise it as positive or negative. This could be particularly useful if you wanted to address the negative emails first.

You could also use some of the other connectors available within Power BI to connect to social media such as Facebook, and then perform cognitive analysis on your feeds and posts.

Before we get going, we first need to make sure that we have some key items in place.

Getting Started

Power BI Basics

Before we can do anything with Power BI, we need to download the Power BI Desktop application, which we can get from PowerBI.com. It is free to use as long as you have an Office 365 account, and in some cases, you can even use it without one. Power BI Desktop can also be downloaded in your native language.

Some of the elements which are covered in this eBook will only work from the Power BI desktop. This is a limitation around some of the connectors which we will be using, as some of them will collect and use personal data from you.

Once you have downloaded and installed Power BI Desktop, you can run the application.

From the home screen (the yellow screen), you can open your existing projects, or you can log into Office 365. If you have Office 365, it is a big advantage to log in, as you can then start to access the numerous resources available to you. I prefer to close this screen using the [x] in the top right corner of the yellow screen and then log in using the buttons on the ribbon.

Most projects, when you get started, will begin by clicking "Get Data"; this is the typical first step for every Power BI project. That is where we are going to start too, by using Power BI to get Office 365 information which is accessible to any Office 365 user, end-users as well as administrators.

Microsoft Azure Cognitive Services

To access Azure, the Microsoft cloud platform, you will need to go to portal.azure.com, where you can sign up for free. You will need a credit card to validate your account, but you will receive an amount of free credit, which can also be spent on Azure Cognitive Services, which we'll be using later on.

The key offering we are going to use is Cognitive Services, which provides several different capabilities such as facial and image recognition, as well as language processing services such as Text Analytics.

Within Text Analytics, we are able to automatically detect a number of pieces of information, such as the language, key phrases, and even the sentiment that the text is potentially expressing.

In order to use this with Power BI, you need to ensure that you are logged into Azure using the same domain account that you are going to be using for Power BI. The first thing we need to do is create the service that we are going to consume within Power BI.

Creating a new Text Analytics Resource

In order to create a new resource within Azure, you need to click on the green + on the left side of the screen, which will bring up the new resource screen. Start to type Text Analytics into the search box and you will see it appear in the suggestions below.

All we need to do with the Text Analytics creation is to provide it with some basic information such as:

  • Name – the name that this service will be known by
  • Subscription – the Azure subscription where this service is going to be created
  • Location – the data center where the service will be located, preferably the closest data center to you
  • Pricing Tier – the plan you wish to use for this. If you haven’t already created the free one, ensure that you select the F0 or F1 plan as this will not cost you anything to use
  • Resource Group – the grouping for the relevant resources associated with the service

Once it has been provisioned, you will see some key pieces of information, the most important parts being the endpoint and the API key that we will use to access the service.

We are now ready to begin working in Power BI, to prepare our data and then call our Text Analytics service.

Working with Data

In most tenancies, you will have Office 365 accounts as well as other accounts, holding emails, tasks, appointments, and so on. To start, we will select Get Data and then click More… so that we can see all of the data sources that we can potentially draw data from.

Selecting More… can sometimes take a little while, as Power BI retrieves all of the connectors available to us. When it does load, however, we will see the ability to connect to a large range of data sources, for example Excel, XML, JSON, or even an entire folder loaded directly into Power BI. For the purpose of this demo, we are going to use Exchange, which we can easily find by typing the name of the connector into the search box.

Be aware that with Exchange, there are two options. If we were to select the connector called "Microsoft Exchange", we would be preparing to connect to Exchange Server, i.e. the on-premises version. What we actually want to do is connect to "Microsoft Exchange Online", which will connect to the cloud version. We simply click on the option that we want and then click the Connect button.

When we are faced with the email lookup dialogue box, we can enter the email address of the Exchange account we wish to connect to. At this point, if you’ve never authenticated against it before, you will be prompted to enter a username and password to access the data.

When we have authenticated, we are faced with the Navigator screen, which helps us to select what data we want to work with, such as the Calendar, the Mail, Tasks, etc. If we were to select Calendar, then we would see the calendar data for that account.

For this exercise, we will concentrate on the Mail and the People data feeds from Exchange so that we can send emails in and then see how our data changes.

There are times when the data can be used in its source format; however, most of the time there will need to be an element of transformation to make the data usable within Power BI. Therefore, once we have selected our data, we need to select the Edit button to launch Power Query.

Viewing the Data

Once the Power Query window is open, we will see our data sources on the left-hand side of the screen, and our fields are shown in the center. There will be one common field between both the Mail and People data, which is the Folder Path. This will contain “\Contacts\” for the People Data source and “\Inbox\” for the Mail data.

For everything retrieved from Office 365, we will also have an Id field that contains a randomly generated unique ID. This is the ideal field for us to use as our key as we progress through our demos.

For all of the fields, we can select the field header and filter the data based on what we want to see. We can also scroll horizontally across the screen to review all of the fields which have been retrieved, as some of the data may prove to be useful.

There are some interesting types of data that are retrieved from this data source which you may not have encountered before.

Records

There are, however, some fields, such as EmailAddresses, which contain a Record rather than a simple textual value. A record is where multiple columns of data are stored within a single cell, very much like a sub-table where there could be additional rows and columns. When records are found within a column, there will be an expand button in the header of the column, which looks like arrows pointing left and right.

When we expand, we are prompted to select which columns we would like to display; in this case, we have up to three email addresses.

Once expanded, the columns are all added to the view as additional fields, so rather than just having the header EmailAddresses, we will instead see EmailAddresses.EmailAddress1, EmailAddresses.EmailAddress2, and so on.
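
If you check the Advanced Editor at this point, you will see that the expand button has written the step for you. A minimal sketch of what that generated M step tends to look like; the inner field names EmailAddress1 to EmailAddress3 are taken from the expand dialogue, and #"Previous Step" stands for whatever your preceding step is called:

  // Expand the EmailAddresses record column into separate fields
  #"Expanded EmailAddresses" = Table.ExpandRecordColumn(
      #"Previous Step", "EmailAddresses",
      {"EmailAddress1", "EmailAddress2", "EmailAddress3"},
      {"EmailAddresses.EmailAddress1", "EmailAddresses.EmailAddress2",
       "EmailAddresses.EmailAddress3"})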

Lists

There is another data structure demonstrated in this data set, called a List. A list is a number of rows of data within a single column. Therefore, when we try to expand it, we are given the option to expand the data to new rows or to extract the values as part of our transformation.

When we extract the values, we have the ability to define a delimiter for the data. We can either select one of the pre-defined characters such as a comma (,), or we can define our own.
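
Behind the scenes, Extract Values becomes a Table.TransformColumns step that joins each list with the chosen delimiter. A sketch, assuming a hypothetical list column called Categories and a comma as the delimiter:

  // Join each list's values into a single delimited text value
  #"Extracted Values" = Table.TransformColumns(
      #"Previous Step",
      {{"Categories", each Text.Combine(List.Transform(_, Text.From), ","),
        type text}})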

Tables

We also have a data source with a Table column, which is made up of a set of records and a set of lists, and which again can be expanded.
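
Table columns expand in much the same way as records, via Table.ExpandTableColumn; each row of the inner table becomes a new row in the outer one. A sketch, assuming a table column called Attachments whose inner column names (Name, Size) are purely illustrative:

  // Expand the inner table's columns, prefixing them for clarity
  #"Expanded Attachments" = Table.ExpandTableColumn(
      #"Previous Step", "Attachments",
      {"Name", "Size"},
      {"Attachments.Name", "Attachments.Size"})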

Cleaning the Contact Data

The first thing that we will want to do with the data is to remove some of the columns which aren't going to provide us with any value. The Folder Path column, as an example, can be removed, as all contacts are stored within the same folder.

To remove a column, you can simply right-click the column header and select Remove. The key thing to understand when doing this is that you are only removing the column from your view; you are not affecting the underlying data.

For the purpose of this demonstration, we can get rid of most of the columns that we don't want, such as MiddleName, JobTitle, OfficeLocation, etc.

As an alternative to removing all of the columns that you don’t want, you can reverse the process by selecting the columns you DO want, and then select Remove other columns.

This will leave just the columns that you are interested in, which going forward will be:

  • Surname
  • Given Name
  • CompanyName
  • EmailAddresses.EmailAddress1

At any point, you can click on the Advanced Editor to see exactly what is happening throughout your transformation in a code view which shows your actions written in a language called M.
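
For example, after the Remove other columns step above, the Advanced Editor will show something along these lines (step names will vary with your own query, and the column names must match your column headers exactly):

  // Keep only the columns we selected; everything else is dropped
  #"Removed Other Columns" = Table.SelectColumns(
      #"Previous Step",
      {"Surname", "GivenName", "CompanyName",
       "EmailAddresses.EmailAddress1"})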

Exporting to CSV

If you then wanted to export your cleaned data, for example to a CSV, you can click on the Transform tab at the top of the screen and select "Run R Script." R is a programming language that you can use to build functionality such as machine learning, or to perform other complex processing tasks.

The editor window will automatically create a dataset variable for you, containing all of the data still available after your applied steps. This can then be written to a CSV with just a single line of script:

write.table(dataset, file="E:/x.csv", sep=",", row.names=FALSE)

Using this line, we are telling the editor that we want to take the data stored within the dataset variable and write it to a file called x.csv on the E:\ drive. The sep parameter tells it how to delimit the data, which means that we could change it from a comma (,) to a pipe (|) if we wanted.

When we run the script by clicking OK, it will begin to process; however, before it gets underway, you will be prompted to review the privacy levels associated with the data, a protection mechanism intended to reduce the risk of a data breach.

If you wish, especially if you are only using the data privately on your own machine, you can choose to ignore the Privacy Level checks.

Once the privacy level has been selected, the data processing will continue, and the file containing the data will be created in the relevant folder.

If we open the file, we will see the data stored within it, with the first row comprising the column headers and the data separated by the relevant delimiter. We can then use the "Convert Text to Columns" wizard, found on the Data tab within Excel, to separate the data into the relevant columns.

Cleaning the Mail Data

We can then do exactly the same cleanup operation with the Mail data where we can remove a lot of the columns which are not going to add any value to what we are trying to do. Just like with People, there will be a number of fields that contain Lists, Records and Tables. For this demo, we are only interested in the Subject and the name of the Sender.

Applying Grouping

Now that we have our cleaned Mail and People data, we are going to query the data in a slightly different way. The first thing we need to do, so that we don't lose our initial query, is to duplicate the query by right-clicking Mail on the left-hand side and selecting Duplicate. We can then rename it; in this case, it will be renamed to "Mail_Spammers".

In this query, we will select the Sender name and then select Group By from the Transform tab. This will bring up the Group By window, which will ask us what we want to group by and whether we want any further information, such as a count of the rows in each group. Once we have configured the grouping, we can click OK.

If we then apply some basic sorting to the data, we are immediately able to see which senders have sent the largest number of emails.
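
In M, the grouping and sorting appear as two steps along these lines (a sketch assuming the Sender.Name column and a Count aggregation, as configured above):

  // Group by sender and count the emails in each group
  #"Grouped Rows" = Table.Group(
      #"Previous Step", {"Sender.Name"},
      {{"Count", each Table.RowCount(_), Int64.Type}}),
  // Sort so the most prolific senders appear first
  #"Sorted Rows" = Table.Sort(
      #"Grouped Rows", {{"Count", Order.Descending}})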

Once we have finished cleaning the data in Power Query, we can return to the Power BI canvas by clicking on the Home tab and selecting "Close and Apply."

When we are back in Power BI, there will be a banner across the top of the canvas stating that there are a number of pending changes in the queries which haven’t been applied. We can go ahead and click on Apply changes, and if you are faced with errors then review the list of issues that Power BI has highlighted. Be aware that if you see an error stating that access to your CSV is denied, then that is most likely because you still have it open within Excel. Close Excel, and then try to apply the changes again.

Building the Dashboard

Now that we have some data, we can start to build out our dashboard. We can begin by selecting fields from the Fields panel on the right-hand side of the screen, which will place a data table onto the canvas.

We can then apply some formatting changes to the visual to make the data easier to read, and we can also reorder the data to ensure that we have the largest values on top. To reorder the data, you can click on the ellipsis (…) at the bottom of the data table visual, select the Sort by field, and then Sort ascending/descending, depending on how you want the data ordered.

We can then copy and paste that data table and select a different visualization from the toolbox; for example, we could select a Pie Chart or a Donut Chart. When we select the chart, the data may not necessarily show correctly, so we need to make sure that the Count is placed within the Values property. You may also notice that it is still not displayed correctly, with all senders showing as the same size. This is likely because the Value is displaying the Count of Count, i.e. every sender is showing as 1. Instead, we need to click on the drop-down and change the Count to Sum, which will then represent the data correctly.

So far we have carried out a number of basic steps to consume some data from a data source and present it on a dashboard. We will now look at enhancing that data using our Cognitive Services.

Enhancing the Data with Cognitive Services

If we return to the Power Query Editor, we want to create a new data source, but this time we will select a blank query.

The first thing we will do is rename the query to SentimentFunction and then go to the Advanced Editor so that we can start writing our query in M. We are effectively creating a function which takes in a single parameter and then references the Cognitive Services resource we created earlier in Azure to process the text.

Notice that the API key, the endpoint, and the relevant JSON are all provided in here; the only variable part is the "text" parameter. We are also able to do some validation here, such as ensuring that the text is limited to 5,000 characters, which is the documented limit for the Cognitive Services APIs.
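
As a guide, the function follows this general shape. This is a sketch rather than the exact code from the session: the key and endpoint are placeholders for the values from your own resource, and it assumes the v2.0 sentiment endpoint of the Text Analytics API:

  (text as text) as number =>
  let
      // Replace with the key and endpoint shown on your own resource
      apikey    = "YOUR_API_KEY",
      endpoint  = "https://westeurope.api.cognitive.microsoft.com" &
                  "/text/analytics/v2.0/sentiment",
      // Trim the input to the documented 5,000-character limit
      jsontext  = Text.FromBinary(Json.FromValue(Text.Start(Text.Trim(text), 5000))),
      jsonbody  = "{ documents: [ { language: ""en"", id: ""0"", text: "
                  & jsontext & " } ] }",
      bytesbody = Text.ToBinary(jsonbody),
      headers   = [#"Ocp-Apim-Subscription-Key" = apikey],
      // Supplying Content makes Web.Contents send a POST request
      bytesresp = Web.Contents(endpoint, [Headers = headers, Content = bytesbody]),
      jsonresp  = Json.Document(bytesresp),
      // One score per document, between 0 (negative) and 1 (positive)
      sentiment = jsonresp[documents]{0}[score]
  in
      sentiment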

What we want back is the sentiment, or more specifically the sentiment score which will be a value between 0 and 1.

Once we are happy with the function, we can click Done, and then we can test it within the Power Query editor and see what we get. We can pass some basic text in as a parameter, as that's what our function is asking for.

We can then see the score which comes back with a value to determine how positive or negative the text is considered to be.

In this example, "it's bad" came back with a score of 1% (0.01), which shows that it is a negative sentiment.

As you run tests, they will be saved as queries; however, if you don't need to keep them, you should consider deleting them. We now know that the sentiment function works, which means that we can now use it, with the ideal data being the Mail query which we created earlier.

Enhancing Mail

In order to make it clear that our Mail query is now using the sentiment function, we could rename it to "Mail_Sentiment." Within the query, we have the Subject and the Sender.Name columns; however, we now need an additional column to store the sentiment score.

To add a new column, we need to go to the Add Column tab on the ribbon, and because we’re using a function, we need to select “Invoke Custom Function.”

Once we have selected the custom function, we are faced with the “Invoke Custom Function” screen which allows us to define what the column is going to be called, which function we are going to be using, and of course, the text that we are going to be passing in to be checked. In this case, we will use the Subject column.

When we select OK, the new column will be added to the data set, and each row will be passed to the Azure service to return the score.
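
Behind the scenes, this becomes a Table.AddColumn step that invokes the function once per row; a sketch assuming the names used here:

  // Call the sentiment function on each row's Subject
  #"Invoked Custom Function" = Table.AddColumn(
      #"Previous Step", "Score", each SentimentFunction([Subject]))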

Having the score is a good start, but we can potentially improve the display of the data further by categorising it.

Categorising the Data

Based on the score, it makes sense for us to categorise each email as either Positive, Neutral or Negative depending on the score which is returned from Azure. To do this, we need to add another column, however this time it will be a “Conditional column.”

When the Add Conditional Column window appears, again we just need to tell it some basic information, such as the name of the column and how the conditions will be applied. As an example, if the score comes back as greater than 60% (0.6), then we will categorise it as Positive. We can then define our boundary for Neutral; otherwise, it will be Negative.
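
In M, the conditional column becomes a Table.AddColumn step with a chain of if/else clauses. A sketch, assuming a Score column; the 0.4 lower boundary for Neutral is an illustrative assumption, as only the 0.6 Positive boundary is given above:

  // Categorise each row based on its sentiment score
  #"Added Conditional Column" = Table.AddColumn(
      #"Previous Step", "Sentiment",
      each if [Score] > 0.6 then "Positive"
           else if [Score] >= 0.4 then "Neutral"
           else "Negative")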

This information then gets applied to the data set.

Once we are happy, we can then close and apply the queries to Power BI and return to the editor.

Updating the Dashboard

Once we have returned to the Power BI editor, we need to apply the changes and ensure that we accept any of the privacy notices that are presented because we’re accessing Azure services. But once that is done, we can start to build out our visualizations in exactly the same way as we did before, for example, by adding the data table to the canvas along with any other visualizations which would be suitable for the data.
