Recently, I was working to update datasets in Salesforce Analytics Cloud. Some of the fields in my dataset were changing to support new functionality.
As part of the changes, some of the dimensions and measures in the dataset were removed and new fields were added. While working with Wave is usually smooth sailing, this time I encountered rough waters.
The Situation
My new dataflow was rather simple. It used an edgemart to load a dataset that had been loaded externally, a computeExpression to calculate a few new columns (instead of loading them from the external source), and a register node to register the dataset.
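For reference, a dataflow along those lines looks roughly like the sketch below. The node names, the computed field, and its SAQL expression are placeholders standing in for my actual columns; the dataset alias matches the one that shows up in the error later in this post.

{
  "Load_Dataset": {
    "action": "edgemart",
    "parameters": {
      "alias": "Data"
    }
  },
  "Add_Columns": {
    "action": "computeExpression",
    "parameters": {
      "source": "Load_Dataset",
      "mergeWithSource": true,
      "computedFields": [
        {
          "name": "Region_Group",
          "label": "Region Group",
          "type": "Text",
          "saqlExpression": "case when 'Region' == \"EMEA\" then \"International\" else \"Domestic\" end"
        }
      ]
    }
  },
  "Register_Dataset": {
    "action": "sfdcRegister",
    "parameters": {
      "source": "Add_Columns",
      "alias": "Data",
      "name": "Data"
    }
  }
}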
After making the modification and uploading a new dataflow version, it was time to run it. Right away, I noticed something was off. Typically, the dataflow took only a few seconds to execute. However, this time it was still running after several minutes.
The Error
As I looked into the details of the execution, it seemed to be stuck on the register step. The edgemart and computeExpression both completed quickly. After a few more minutes, the dataflow failed with the following error:
- There was an error executing the Register node: Registration error: Dataset with display name Data and api name Data failed to register 1 files, EdgemartId: 0Fb410000000PKVCA2, EdgemartVersion: 0Fc41000000dfLTCAY (02K410000004tfOEAQ_03C41000000dOyEEAU)
The Investigation
To start, this seemed like an internal Wave error, not something with my dataflow. A quick search for the error message and a review of known issues did not turn up any matches. There was a known issue, previously addressed in Spring '16, where a dataset that had colors set in XMD 1.1 would fail to register.
While this did not apply to my situation, as I did not have colors set, I did have XMD specified on the dataset. As I was in a Winter '17 org, my dataset's XMD was in the 2.0 format.
From previously loading XMD through the REST Explorer in Workbench, I knew 2.0 was a little more particular than the 1.1 version. For example, it is no longer possible to upload XMD that references a dimension or measure that does not exist in the dataset. Instead, a JSON parse error is returned. With this knowledge, the XMD on the dataset seemed like a good place to investigate.
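To give a sense of the shape, a pared down XMD 2.0 document looks something like the following. The field names here are invented for illustration; the point is that in 2.0 each entry under dimensions or measures has to refer to a field that actually exists in the dataset, or the upload is rejected.

{
  "dataset": {},
  "dimensions": [
    {
      "field": "Region",
      "label": "Region"
    }
  ],
  "measures": [
    {
      "field": "Amount",
      "label": "Amount"
    }
  ]
}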
The Resolution
I headed over to Workbench to update my XMD. As I had planned to redo the XMD after the new dataset was loaded, it was easy to remove what I had. A simple empty set of XMD would remove my prior customizations. With this change in place, it was time to run the dataflow again. This time, the dataflow finished quickly and without error. My dataset was successfully loaded and ready to use.
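For anyone wanting to do the same, the reset amounted to a PUT of an essentially empty XMD document to the dataset's user XMD resource through the REST Explorer, something along the lines of /services/data/vXX.0/wave/datasets/<datasetId>/versions/<versionId>/xmds/user (check the exact path and version against the Wave REST API documentation for your org). The body was a bare-bones document like this:

{
  "dataset": {},
  "dimensions": [],
  "measures": []
}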
The takeaway from my experience is to review your existing XMD before removing fields from a dataset. You will need to remove references to those fields from the XMD, or the dataflow will fail with a registration error.
Hi Carl,
I need help recovering a snapshot dataset that was refreshed with the wrong data. Is it possible to recover the previous version? Or can I migrate the dataset from sandbox to production with the actual records?
I cannot export from sandbox and upload to production as it has 2 million records.
Thanks
Raj
Hi Raj,
First, let's make sure I understand your scenario. You have a Reporting Snapshot report in Salesforce that is generated on a regular basis (weekly or monthly). That report is loaded into Wave for visualization.
Are you set up to Trend Any Report in Wave, or is the custom object that holds the snapshot results part of the dataflow?
When you say the snapshot dataset was refreshed with the wrong data, does this mean the object in Salesforce is correct and it is only Wave that has the incorrect data? Or is the data wrong in Salesforce as well?
There are different options depending on your particular situation. Although I have not faced this situation before, here are my thoughts for your consideration.
Trend Any Report
- As this is stored as a dataset, you could remove the incorrect rows directly. You would need to export them from Wave, then use the ExternalData API with a Delete operation to remove those rows from the dataset. This does require a unique key set on the dataset to work.
- Alternatively, you could look at a dataflow that extracts the dataset, filters out the rows that are incorrect, and then registers the dataset (see the sketch after this list).
- In either case, you would not be able to recover the old data, as Trend Any Report does not create a reporting snapshot object in Salesforce to keep a copy of the data.
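If the dataflow route fits better, here is a rough sketch of it. The dataset alias, field name, and filter value are placeholders for your own; the filter expression keeps only the rows from the good load, and a saqlFilter could be used instead if the condition is more involved.

{
  "Extract_Snapshot": {
    "action": "edgemart",
    "parameters": {
      "alias": "Snapshot_Data"
    }
  },
  "Remove_Bad_Rows": {
    "action": "filter",
    "parameters": {
      "source": "Extract_Snapshot",
      "filter": "Load_Status:EQ:Good"
    }
  },
  "Register_Clean": {
    "action": "sfdcRegister",
    "parameters": {
      "source": "Remove_Bad_Rows",
      "alias": "Snapshot_Data",
      "name": "Snapshot Data"
    }
  }
}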
Reporting Snapshot Dataflow and Salesforce data okay
- This seems like the easiest case, as you can update the dataflow to pull the Reporting Snapshot object back into Wave, as sketched below. You may want to register to a separate dataset name to test it out first. Once it looks good, you can switch the dataflow back to the original dataset name and then remove the test dataset.
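A minimal sketch of that dataflow, assuming the snapshot results live in a custom object (the object and field names below are made up), could look like this, registering to a test alias first so the original dataset stays untouched while you verify:

{
  "Extract_Snapshot_Object": {
    "action": "sfdcDigest",
    "parameters": {
      "object": "Pipeline_Snapshot__c",
      "fields": [
        { "name": "Id" },
        { "name": "Snapshot_Date__c" },
        { "name": "Stage__c" },
        { "name": "Amount__c" }
      ]
    }
  },
  "Register_Test_Dataset": {
    "action": "sfdcRegister",
    "parameters": {
      "source": "Extract_Snapshot_Object",
      "alias": "Snapshot_Data_Test",
      "name": "Snapshot Data Test"
    }
  }
}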
Reporting Snapshot Dataflow and Salesforce data overwritten
- This gets trickier. If you have weekly exports enabled and they include this object, you would be able to recover the data and upload it back to Salesforce with the Bulk API. There would be a gap based on when the last export ran. You mentioned a sandbox; that could also be used to bring the data back to Salesforce (Bulk API for export and import). There would be a gap here as well. Once the data is back in Salesforce, the Wave scenario above would apply.
- You will need to keep large data volumes in mind when working with the data.
If the 2 million records are in the Wave sandbox, I have been able to use a high row limit on a data table (100,000 rows) and export to CSV. That would only be 20 files, and you might even be able to get a higher row limit. It does take a while for the screen to load and the CSV to export.