It is easy to build a dataset from
Salesforce data in the
Analytic Cloud. It is also rather easy to create a data set from an external file, as long as you don’t have to change the metadata file. How to combine two sets of data together in the Analytics Cloud is less obvious. With a little bit of dataflow manipulation, you can create a new, composite dataset of both sources. Here is a guide to help you through the process.
External Data
First, you want to load your external data from a CSV (comma-separated value) file by creating a new dataset. I have created a
sample Sales Data file that contains sales by account broken down by product and time period. I have also included my Sales Data.json metadata file for reference.
In this case, I did not have to make any changes from the system generated mappings. It is always a good idea to check your date format and values, as this is a common source of error. Now select Create Dataset and your Sales Data is in the Analytics Cloud.
You can then start exploring the data by creating Lenses and Dashboards. What if you want to see this data linked with your Account details from Salesforce in order to analyze by state or account type?
Salesforce Data
Loading Salesforce data is a snap using the Dataset Builder user interface. In this example, I am only interested in pulling Account information, to keep it simple. I am going to add just the Account object and a select fields to my dataset (Account ID, Account Name, Account Type, Industry and Region).
Next, select Create Dataset, provide a Dataset Name and use Create to save. This will add the associate JSON entries to the dataflow for this dataset. The data will not be loaded until the next schedule run of the dataflow, currently set for once a day, or a manual run. Don’t start the dataflow yet, as we are going to have some fun first.
Dataflow JSON
We have avoided JSON manipulation so far, that that is about to change. You cannot combine external data with Salesforce data in the UI, but we can use previously loaded datasets in the dataflow. Start by downloading the dataflow json definition (SalesEdgeEltWorkflow.json) and open it in your preferred editor. I use JSON Editor, but would be willing to switch if you have suggestions on a better tool.
Here is my dataflow JSON that shows the Account load.
"101_Load_Account": {
"action": "sfdcDigest",
"parameters": {
"object": "Account",
"fields": [
{
"name": "Id"
},
{
"name": "Name"
},
{
"name": "Type"
},
{
"name": "Industry"
},
{
"name": "Region__c"
}
]
}
},
"102_Register_Account": {
"action": "sfdcRegister",
"parameters": {
"source": "101_Load_Account",
"alias": "Account",
"name": "Account"
}
}
For simplicity, I have removed other flows for this example. By default, the workflow steps are labeled with numbers. I have added additional text to the name to help distinguish. Your dataflow JSON may look different, what is important is to find the Account load from Salesforce.
This will be a sfdcDigest, with an Object of Account and the fields we select. My is named 101_Load_Account. sfdcDigest will pull the data from Salesforce. Next, find the step with sfdcRegister action that takes has the same source name as the Account load. Mine is named 102_Register_Account, which makes the dataset accessible in the Analytics Cloud based on the name supplied.
Now, we are going to add a couple of additional steps. First, we need to add an edgemart transformation. This allows us to load a previously created dataset into the dataflow.
"200_Sales_Data": {
"action": "edgemart",
"parameters": {
"alias": "Sales_Data"
}
}
If you don’t recall the alias name of your dataset, you can always use the lens.apexp page in your org for reference - https://<pod>.salesforce.com/analytics/wave/web/lens.apexp. My step is 200_Sales_Data and my alias was Sales_Data. Now, the Sales Data file we loaded earlier is available within our data flow.
With the external data available, we can use an augment transformation to combine it with the account data. This is my step 201_Combine_Sales_Account with an augment action. We are going to make the Sales Data our left object and Account the right, specifying the join condition via left key (Account from the Sales file) and right key (Name from the Account file).
All of the columns from the Sales Data data set are automatically included, and we need to specify the columns to include from the right (Account data set) as well as specify the the relationship prefix name to use, Sales in this case.
"201_Combine_Sales_Account": {
"action": "augment",
"parameters": {
"relationship": "Sales",
"left_key": [
"Account"
],
"right_key": [
"Name"
],
"left": "200_Sales_Data",
"right": "101_Load_Account",
"right_select": [
"Id",
"Name",
"Type",
"Industry",
"Region__c"
]
}
},
"202_Register_Account_Sales_Dataset": {
"action": "sfdcRegister",
"parameters": {
"source": "201_Combine_Sales_Account",
"alias": "SalesWithAccounts",
"name": "Sales With Accounts"
}
}
Save the dataflow json file and go back to Data Monitor. Before you upload, you may want to save a backup of the previous dataflow in case you want to revert to your original dataflow. Then, Upload the dataflow json file and start the data flow. Once the dataflow finishes without errors, you have a brand new dataset of external and Salesforce Data.
Final Touches
With our new dataset, we can put together lenses and dashboards of sales activity, over time by various account attributes such as account type, key status, etc.
With the ability to load edgemarts into dataflows, we are not limited to combining Salesforce and external data. We could also load two sets of external data, load both into the dataflow and combine them into one final dataset. A very powerful feature of Analytics Cloud.
External Data Generation Notes
The Account Names are specific to my environment and may not match yours. I created a simple
Sales Data Generator Excel Template that will generate random, sample data. Simply replace the Account Name on the Account tab. If you want to match on a different key to Salesforce, replace the values in Account Name with the desired data. Adding additional columns to the right of Account Name is fine.
The Dynamic tab uses a random number between one and the number of rows with a Row ID to find an Account. You can add more or fewer Accounts, just be sure to have Row ID values for valid rows. The same approach is used for Product.
The period is generated as a random date between a Start and End date specified on the Control Tab. To change the date range, simply enter the desired start and end date. Similarly, Price is selected between the Low and High Price on the control tab. Change the Control values for a different price range.
Finally, as the Dynamic tab changes on each change, you will want to create a static copy of the data for repeatability. Copy the columns from Dynamic and then head to the Sales tab. Perform a Paste Special of Values with Number Formatting. Now you can save the tab as Sales_Data.CSV and you are ready to go with your own custom data.