Sunday, May 3, 2015

How to Combine Salesforce and External Data in Analytics Cloud

It is easy to build a dataset from Salesforce data in the Analytics Cloud. It is also fairly easy to create a dataset from an external file, as long as you don’t have to change the metadata file. How to combine two sets of data in the Analytics Cloud is less obvious. With a little bit of dataflow manipulation, you can create a new, composite dataset from both sources. Here is a guide to help you through the process.

External Data

First, you want to load your external data from a CSV (comma-separated values) file by creating a new dataset. I have created a sample Sales Data file that contains sales by account, broken down by product and time period. I have also included my Sales Data.json metadata file for reference.

In this case, I did not have to make any changes to the system-generated mappings. It is always a good idea to check your date formats and values, as these are a common source of errors. Now select Create Dataset and your Sales Data is in the Analytics Cloud.
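As a point of reference, a metadata file for a CSV like this generally follows the Wave external data format. The sketch below is illustrative only; the field names, labels, and date format are my assumptions about the sample file, not a copy of the actual Sales Data.json.

```json
{
  "fileFormat": {
    "charsetName": "UTF-8",
    "fieldsDelimitedBy": ",",
    "numberOfLinesToIgnore": 1
  },
  "objects": [
    {
      "connector": "CSV",
      "fullyQualifiedName": "Sales_Data",
      "name": "Sales_Data",
      "label": "Sales Data",
      "fields": [
        { "fullyQualifiedName": "Account", "name": "Account", "label": "Account", "type": "Text" },
        { "fullyQualifiedName": "Product", "name": "Product", "label": "Product", "type": "Text" },
        { "fullyQualifiedName": "Period",  "name": "Period",  "label": "Period",  "type": "Date", "format": "MM/dd/yyyy" },
        { "fullyQualifiedName": "Amount",  "name": "Amount",  "label": "Amount",  "type": "Numeric", "precision": 10, "scale": 2, "defaultValue": "0" }
      ]
    }
  ]
}
```

The Date format entry is the piece to double-check: it must match the values in your file exactly, or the load will error out or misparse the dates.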

You can then start exploring the data by creating Lenses and Dashboards. What if you want to see this data linked with your Account details from Salesforce in order to analyze by state or account type?

 

Salesforce Data

Loading Salesforce data is a snap using the Dataset Builder user interface. In this example, I am only interested in pulling Account information, to keep it simple. I am going to add just the Account object and a few select fields to my dataset (Account ID, Account Name, Account Type, Industry and Region).

Next, select Create Dataset, provide a Dataset Name and use Create to save. This will add the associated JSON entries to the dataflow for this dataset. The data will not be loaded until the next scheduled run of the dataflow (currently set for once a day) or a manual run. Don’t start the dataflow yet, as we are going to have some fun first.

 

Dataflow JSON

We have avoided JSON manipulation so far, but that is about to change. You cannot combine external data with Salesforce data in the UI, but we can use previously loaded datasets in the dataflow. Start by downloading the dataflow JSON definition (SalesEdgeEltWorkflow.json) and opening it in your preferred editor. I use JSON Editor, but would be willing to switch if you have suggestions for a better tool.

Here is my dataflow JSON that shows the Account load.
  "101_Load_Account": {
    "action": "sfdcDigest",
    "parameters": {
      "object": "Account",
      "fields": [
        {
          "name": "Id"
        },
        {
          "name": "Name"
        },
        {
          "name": "Type"
        },
        {
          "name": "Industry"
        },
        {
          "name": "Region__c"
        }
      ]
    }
  },
  "102_Register_Account": {
    "action": "sfdcRegister",
    "parameters": {
      "source": "101_Load_Account",
      "alias": "Account",
      "name": "Account"
    }
  }

For simplicity, I have removed the other flows for this example. By default, the workflow steps are labeled with numbers; I have added additional text to each name to help distinguish the steps. Your dataflow JSON may look different; what is important is to find the Account load from Salesforce.

This will be an sfdcDigest step, with an object of Account and the fields we selected. Mine is named 101_Load_Account. sfdcDigest pulls the data from Salesforce. Next, find the step with an sfdcRegister action that has the Account load as its source. Mine is named 102_Register_Account; it makes the dataset accessible in the Analytics Cloud under the supplied name.

Now, we are going to add a couple of additional steps. First, we need to add an edgemart transformation. This allows us to load a previously created dataset into the dataflow.
  "200_Sales_Data": {
    "action": "edgemart",
    "parameters": {
      "alias": "Sales_Data"
    }
  }

If you don’t recall the alias name of your dataset, you can always use the lens.apexp page in your org for reference - https://<pod>.salesforce.com/analytics/wave/web/lens.apexp. My step is 200_Sales_Data and my alias is Sales_Data. Now the Sales Data file we loaded earlier is available within our dataflow.

With the external data available, we can use an augment transformation to combine it with the Account data. This is my step 201_Combine_Sales_Account, with an augment action. We are going to make the Sales Data our left object and Account the right, specifying the join condition via the left key (Account from the Sales file) and the right key (Name from the Account data).

All of the columns from the Sales Data dataset are automatically included. We need to specify the columns to include from the right (Account) dataset, as well as the relationship prefix name to use, Sales in this case.

  "201_Combine_Sales_Account": {
    "action": "augment",
    "parameters": {
      "relationship": "Sales",
      "left_key": [
        "Account"
      ],
      "right_key": [
        "Name"
      ],
      "left": "200_Sales_Data",
      "right": "101_Load_Account",
      "right_select": [
        "Id",
        "Name",
        "Type",
        "Industry",
        "Region__c"
      ]
    }
  },
  "202_Register_Account_Sales_Dataset": {
    "action": "sfdcRegister",
    "parameters": {
      "source": "201_Combine_Sales_Account",
      "alias": "SalesWithAccounts",
      "name": "Sales With Accounts"
    }
  }

Save the dataflow JSON file and go back to the Data Monitor. Before you upload, save a backup of the original dataflow in case you need to revert. Then upload the dataflow JSON file and start the dataflow. Once the dataflow finishes without errors, you have a brand new dataset of external and Salesforce data.
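For clarity, here is the relevant portion of the final dataflow with all five steps assembled, condensed from the snippets above. Note that the order of the steps in the file does not matter; the dataflow engine works out the execution order from the source references.

```json
{
  "101_Load_Account": {
    "action": "sfdcDigest",
    "parameters": {
      "object": "Account",
      "fields": [
        { "name": "Id" }, { "name": "Name" }, { "name": "Type" },
        { "name": "Industry" }, { "name": "Region__c" }
      ]
    }
  },
  "102_Register_Account": {
    "action": "sfdcRegister",
    "parameters": { "source": "101_Load_Account", "alias": "Account", "name": "Account" }
  },
  "200_Sales_Data": {
    "action": "edgemart",
    "parameters": { "alias": "Sales_Data" }
  },
  "201_Combine_Sales_Account": {
    "action": "augment",
    "parameters": {
      "relationship": "Sales",
      "left": "200_Sales_Data", "left_key": [ "Account" ],
      "right": "101_Load_Account", "right_key": [ "Name" ],
      "right_select": [ "Id", "Name", "Type", "Industry", "Region__c" ]
    }
  },
  "202_Register_Account_Sales_Dataset": {
    "action": "sfdcRegister",
    "parameters": {
      "source": "201_Combine_Sales_Account",
      "alias": "SalesWithAccounts",
      "name": "Sales With Accounts"
    }
  }
}
```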

 


Final Touches


With our new dataset, we can put together lenses and dashboards of sales activity over time, sliced by account attributes such as account type, key status, etc.


With the ability to load edgemarts into dataflows, we are not limited to combining Salesforce and external data. We could also take two sets of external data, load both into the dataflow, and combine them into one final dataset. A very powerful feature of the Analytics Cloud.

 


External Data Generation Notes


The Account Names are specific to my environment and may not match yours. I created a simple Sales Data Generator Excel Template that generates random sample data. Simply replace the Account Names on the Account tab. If you want to match Salesforce on a different key, replace the values in Account Name with the desired data. Adding additional columns to the right of Account Name is fine.

The Dynamic tab uses a random number between one and the number of rows with a Row ID to find an Account. You can add more or fewer Accounts; just be sure to have Row ID values for all valid rows. The same approach is used for Product.

The Period is generated as a random date between a Start and End date specified on the Control tab. To change the date range, simply enter the desired start and end dates. Similarly, Price is selected between the Low and High Price on the Control tab. Change the Control values for a different price range.

Finally, as the Dynamic tab recalculates on every change, you will want to create a static copy of the data for repeatability. Copy the columns from Dynamic, then head to the Sales tab and perform a Paste Special of Values with Number Formatting. Now you can save the tab as Sales_Data.CSV and you are ready to go with your own custom data.

23 comments:

  1. This information is impressive; I am inspired with your post writing style & how continuously you describe this topic. After reading your post, thanks for taking the time to discuss this, I feel happy about it and I love learning more about this topic.
    Regards,
    Salesforce training in Chennai

    ReplyDelete
    Replies
    1. Thanks for your comments, I appreciate the feedback. Let me know if there are additional topics you would find useful, as I am always on the lookout for ideas to share.

      Delete
  2. Hi Carl,
    Very informative post. I was exactly looking for this feature and found this. It helped me a lot. Thank you so much.
    Can you also please provide a post on how to include regional heat map (with the actual latitude longitude map) in Analytics Cloud?

    ReplyDelete
  3. From what I have heard, a choropleth map (a map where areas are shaded based on data) is on the roadmap for Summer '16 (safe harbor). Additional features like cluster maps would come in subsequent releases.

    ReplyDelete
  4. Hi Carl,

    I was actually looking for this and it helped me a lot .Thank you so much!!

    Is it possible in Salesforce Wave (Analytics Cloud) to default a filter via the URL,
    so that from an account I can create a custom button that redirects the user to a Wave dashboard and defaults the filters automatically?

    Thanks in Advance.

    ReplyDelete
    Replies
    1. Shruthi,

      It's even better. You can embed a Wave Analytics Dashboard in a Salesforce Page with page layout editor. Use the field mapping to filter https://help.salesforce.com/apex/HTViewHelpDoc?id=bi_dashboard_embed.htm&language=en_US

      If visualforce is more your style, it can be done here as well https://help.salesforce.com/apex/HTViewHelpDoc?id=bi_embed_vf.htm&language=en_US

      Be sure the dashboard is in a shared app. If it is in a private app, you will receive an error.

      Delete
  5. Just wanted to say thank you. Great post. The generosity of professionals like yourself genuinely adds to the greater good by fostering a culture of knowledge sharing and learning.

    ReplyDelete
  6. Hi Carl

    Is it possible to join two external data sets into one by using primary keys and foreign keys to link them? Ultimately I would like to upload 5 data sets using the CSV option that are all joined by a transaction ID #. I am assuming that for Wave Analytics to work and have drill-down capability, I need to join these data sets into one.

    Thank you in advance!!!

    Harry

    ReplyDelete
  7. Hi Harry,

    Yes, you can join two external data sources. First you would need to load all of the sets of data into Wave. Then you would modify the data flow JSON.

    Use the edgemart action to load one data set and join it to the second. Then take the third and join it to the result where you combined sets one and two. Repeat for sources four and five.
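    A rough sketch of that chaining is below. The dataset aliases, the Transaction_Id key, and the selected fields are all placeholders; substitute your own names.

```json
{
  "Load_Set1": { "action": "edgemart", "parameters": { "alias": "Set1" } },
  "Load_Set2": { "action": "edgemart", "parameters": { "alias": "Set2" } },
  "Load_Set3": { "action": "edgemart", "parameters": { "alias": "Set3" } },
  "Join_1_2": {
    "action": "augment",
    "parameters": {
      "relationship": "Set2",
      "left": "Load_Set1", "left_key": [ "Transaction_Id" ],
      "right": "Load_Set2", "right_key": [ "Transaction_Id" ],
      "right_select": [ "Some_Field" ]
    }
  },
  "Join_12_3": {
    "action": "augment",
    "parameters": {
      "relationship": "Set3",
      "left": "Join_1_2", "left_key": [ "Transaction_Id" ],
      "right": "Load_Set3", "right_key": [ "Transaction_Id" ],
      "right_select": [ "Other_Field" ]
    }
  },
  "Register_Combined": {
    "action": "sfdcRegister",
    "parameters": { "source": "Join_12_3", "alias": "Combined", "name": "Combined" }
  }
}
```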

    This would give you one set with all of the data joined together. You will need to watch the ongoing data loading process. The data flow runs on a schedule, but the csv uploads are manual.

    I don't know your exact use case, but you don't always have to have all of the data loaded into one data set. While it makes it easier to use the point and click tools in wave to perform an analysis, you can write SAQL on a dashboard to cogroup data from different data sets.

    I have written many SAQL statements that take data from two sets and cogroup on a common key. It is certainly another option you can investigate.

    Regards,
    Carl

    ReplyDelete
  8. Thanks Carl!!!

    This has been very helpful!!

    I was actually able to follow your instructions and join the data sets.

    Two more questions:

    1) You mention writing a SAQL statement instead....do you have a blog about that?

    2) You also mention something about registered rows vs. non-registered rows and how you can have a billion unregistered rows if you want (in a YouTube video). Can you elaborate on that a little? How do I unregister the rows and what impact does that have exactly?

    ReplyDelete
    Replies
    1. 1. I will work on a post. The cogroup statement allows you to group multiple data sets on the same dimension, which can be useful to join together data.

      2. When you load a dataset in a data flow, only the records in the register statements count towards the limit. So if you load data for 2 years of history from Salesforce, then filter to only the last 12 months, only the rows after the filter in the register statement count towards your limit. When you load a csv file, all the rows will count.
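      As a rough sketch of filtering before the register step (the object, fields, and filter value here are just placeholders; check the dataflow transformation reference for the exact filter syntax in your release):

```json
{
  "Extract_Opportunity": {
    "action": "sfdcDigest",
    "parameters": {
      "object": "Opportunity",
      "fields": [ { "name": "Id" }, { "name": "Amount" }, { "name": "FiscalYear" } ]
    }
  },
  "Filter_Current_Year": {
    "action": "filter",
    "parameters": {
      "source": "Extract_Opportunity",
      "filter": "FiscalYear:EQ:2016"
    }
  },
  "Register_Filtered": {
    "action": "sfdcRegister",
    "parameters": {
      "source": "Filter_Current_Year",
      "alias": "OppsCurrentYear",
      "name": "Opps Current Year"
    }
  }
}
```

      Only the rows that survive the filter reach the sfdcRegister step, so only those count toward the limit.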

      Delete
  9. Hi carl,

    I am working on a JSON binding for 3 different objects which don't have a direct relationship, and I am not able to bind all 3 together. Could you give me any insight into how to do so? I have 3 objects (BreakFix__Kav, Custom Object1, Break_Fix__ViewStat). BreakFix__Kav and Custom Object1 have a common field called "Article Number", but Break_Fix__ViewStat doesn't have such a field; rather, it holds the Article Id as "Parent Id". Could you please help me bind these 3 together?

    ReplyDelete
  10. Hi Amit, thanks for your question.

    To join multiple datasets for analysis in a dashboard, you will want to make use of the cogroup operator. Cogroup will match together records and group the results.

    In your case, you can do something like the following with SAQL
    q = load BreakFix__Kav;
    r = load Custom;
    s = load BreakFix__ViewStat;
    t = cogroup q by Article_Number, r by Article_Number, s by ParentId;
    t = foreach t generate q.Article_Number as 'Article_Number', sum(r.Some_Field) as 'sum_Some_Field', sum(s.Other_Field) as 'sum_Other_Field';

    It is fine that the column names are different, as cogroup looks at the values. You can also string together multiple datasets. Cogroup takes only the matches, unless you specify right, left or full. Also, be sure to reference the projection in the foreach statement fields.

    I still have a full post on cogroup in the works.

    ReplyDelete
    Replies
    1. Thank you Carl. It worked for me, and I saw your detailed post about cogroup now. It's nice.

      Delete
  11. Hi Carl,

    My apologies, I must have missed something...in what screen in wave analytics do you use cogroup operator/statement?

    Is it in the data flow (JSON) similar to how we join the data sets?

    ReplyDelete
  12. Hi Carl,
    Great stuff!
    I have a use case of creating a dataset by augmenting external data with 3 Salesforce objects that are not related, but the external data has ids that are set up as external ids on these 3 objects. External data is loaded nightly (append delta, not full refresh). Will your approach still work?

    Appreciate your help!

    ReplyDelete
  13. The cogroup goes in the SAQL/pigql statement in a dashboard's JSON. This approach is helpful when the data is in different data sets.

    ReplyDelete
  14. Is there a possibility of using Wave to consolidate data from 2-3 Org's and build reports (similar to forecasting) ? What is the best way to automate data extraction or do we need to manually import the data into Wave ?

    ReplyDelete
  15. Yes, Wave supports multi-org. However, the dataflow can only extract from one org currently. You would want an integration tool (Informatica, Dell Boomi, Jitterbit, etc.) to source the data from multiple locations and push it into Wave. You could do it with csv files, but it would be an entirely manual process.

    With a multi-org approach you can go Hub & Spoke, Analytics Hub or Salesforce-to-Salesforce. Each has its pros and cons.

    ReplyDelete
  16. Hi Carl, I have a CSV file that I need to upload per month. Is there an easy way to simply add to an existing data set in Wave?

    ReplyDelete
  17. Hi Harry, the short answer is yes, but not with the User Interface. There are append and upsert operations that will add data to a csv data set. I will work on a new post with the details.

    ReplyDelete
  18. Thank you Carl!!! As always your help is greatly appreciated!!!

    I will keep checking back for any updates.

    Cheers!

    Harry

    ReplyDelete
  19. Hot off the presses is a new post on the External Data API and using DatasetUtils to upsert, append and delete data - http://blog.canntechnology.com/2016/11/salesforce-wave-explained-external-data.html

    ReplyDelete