Learn how to deal with spreadsheets and the need to get their content materialized in a database, or in a BI tool in an easy and efficient way.
I don’t know how many times I used the title sentence: “There’s ALWAYS a spreadsheet…”. When giving a demo of Keboola Connection, often I’d start with Google Sheet as the data source of choice. Google Drive extractor is one of the most popular components in our platform - because, you guessed it, no matter what the project is, there’s ALWAYS a spreadsheet…
Even internally, we have 13 different Google Sheets that serve as secondary data sources. The use cases span from a simple list of our consultants and their current base office and time zone, through the finance department’s rolling forecast, with tons of things in between. Our customers have a plethora of other cases, with entering targets, custom hierarchies of products and enhancing various lists (locations, people etc.) with information not present in their database being the most frequent examples.
Spreadsheets are also a very useful stop-gap measure, when you need to implement a business process but it is not yet supported by the primary systems, such as SalesForce, where it should live eventually. Or, you’re still figuring out exactly how it should work, and want to iron out the kinks before committing to development, or even before having the software that will eventually harbour and supply this data. Earlier this week we needed to implement such a case - a method for our support to grant free credits to our Freemium customers. (If you are not familiar, in our free plan you get 300 minutes per month for free. Eventually, this will be implemented in SalesForce, but that takes time. So, spreadsheet it is, for the time being.
Now, we usually extract data from the spreadsheet (so, on schedule, an orchestration runs that pulls the data from Google Sheet). That works fine, but what about when you need that one table updated right now? Or what if you need the ability to trigger an update manually? Sure, I can go to the Keboola project and manually run the orchestration, but clearly not everyone with the privilege to add rows to that spreadsheet will also be a project admin. You kinda want to push the data. If only I could have a “run upload” button or something like that directly in the sheet… But wait a minute. Google Sheets have the scripts, starting an orchestration in Keboola is just an API call given the right token, so how hard can it be?
After quick Slack consultation with Fisa (our head of Professional Services and my partner in many, many hacks and crimes), not only was it confirmed to me that this is in fact pretty easy, I learned that (again) it’s not even such an original idea. Almost instantly, a script that our professional services guys Leo and David wrote materialized in the Slack thread. From that moment on, it was just a matter of copy-pasting, and about three minutes later:
Works like charm!
Given that you read this far I assume that you, too, deal with spreadsheets and the need to get their content materialized in a data warehouse, or in a BI tool. Perhaps even with some transformation or enrichment needing to be done along the way.
I’ll share the script below and will walk you through step-by-step process of how to get this done. Here are the steps.
In Extractors, select “Google Drive” and follow the on-screen prompts, authorizing and selecting your spreadsheet and tab(s).
If you need to do something with the data - may be as simple as some aggregations, or as complicated as you can imagine, the transformation engine is the right place to start. In the free account you can choose between SQL (running on Snowflake) and Python as your weapons of choice.
Select the writer according to your need. It may be Snowflake or another database (if you don’t have a destination database yet, the Snowflake writer will set up one for you), or it could be Salesforce or Mailchimp - not every data task is about BI and analytics.
By this point you set up three (or just two, in case you skipped the transformations step) components in Keboola Connection. An Orchestration will organize them together in a workflow - create new orchestration and in the “task list” section, select the components in the order in which you want them to run. Because this orchestration will be triggered by the Google Sheet script, you don’t need to set up a schedule for it - unless you want to, of course. You may want the workflow to run each day even if no one pushes the button. No problem with that.
During the next two steps we will be switching between Keboola Connection and your Google Sheet, so may be a good idea to have them open side by side. First, in the sheet go to “Tools” and “<> Script Editor” (it will open a new tab). Give your script a name and then copy the entirety of the following block and paste it in there. No editing is needed, so just click save. Go back to your spreadsheet tab and reload. A “Keboola” section will appear in your menu.
When it runs for the first time, the script asks you for the Orchestration URL and a token. This is where the copy-pasting between Keboola Connection and the Google Sheet will take place. Follow these steps:
That’s it. Google Sheet will save the entered URL and token as document properties and use them next time. If ever you need to change the settings (maybe you’ll build a different orchestration or you will want the token refreshed), just choose the “Reset Settings” item from the Keboola menu and everything will go back to fresh start.