Tuesday, March 31, 2015

Danger! Don't delete that data source!

Here's a common occurrence (at least for me): I'm working merrily along, having a terrific time with my data, creating all sort of good and valuable stuff. The information and insights are pouring out of Tableau. It's an almost zen state where I'm in the flow and things are humming along with the celestial choir. All is good.

After a while I notice that there's a lot of stuff in my workbook that doesn't really need to be there. Experiments, trials, that sort of thing. I've connected to a few fair data sets along the way, some of which aren't really necessary. Some Tableau gardening it in order.

Among the pruning and trimming there's a data connection that's no longer needed, for whatever reason. So I, thinking it's occupying space that could use freeing up, go ahead and ask Tableau to close it.

Close it, Dan-o.

However, unbeknownst to me the data connection is in use by one or more worksheets, so closing it would leave it or them with nothing to display. This would not be good.

Um, Boss?

Fortunately, Tableau recognizes the danger and is more than happy to help me avoid this calamity.

So it steps in and provides this very nice message, letting me know that closing the data connection will clear all the worksheets that access it. But Tableau, understanding that I may well want this to happen, offers to go ahead and close it.

Hey... just a second there.

Something's missing.

I really should make sure that these worksheets that will be cleared by closing the data connection aren't important enough to keep. So to hop to it and give them a look see.

Vaiter! Come taste the soup.

Tableau's nice enough to tell us that there are workbooks in peril, but not so nice as to tell us which ones they are. For a product devoted to helping access and and understand data this is an unfortunate oversight. (but this horse is well enough beaten by now)

Wanted: a quick, simple way to identify the Worksheets.

It would be really handy if there were only some way to identify which Worksheets were using this data connection. One that didn't take a bunch of setup, that could have the answer lickety-split.

Well, as it happens, such a thing has just made its debut.
Simple to use—no muss, no fuss, just the facts' bare bones.

Introducing the DataSource—Worksheet showing tool.

OK, maybe it's not that good a name, but it works, and isn't that the important thing?

The tool is a mini-tool written in Ruby that takes advantage of the Ruby gem 'twb' for the heavy lifting of working with Tableau Workbooks. (the gem will be the topic of it's own blog posting very soon)

Intended to be fast and simple, the tool is a small Ruby script that's run from the command line. I'm working with Windows but think there should be no trouble running it on your Mac. It's a command line app because that's where I do a lot of my work, and putting a nice, pretty, functional UI on it would take time and effort beyond its utility to me.

The tool in action—finding the Worksheets that access the "Sample - Coffee Chain (Access)" data connection.

Here's the Windows command environment showing that the Workbook is in the local directory. The tool can be pointed at another directory, but working local is a good idea.
The simple case.

The tool is invoked as the Ruby script it is with the command:

ruby dsinsheets.rb

In this case the tool is in the local directory, but it can be invoked from wherever it lives. There are three prompts from the tool:

  • In directory? (.):
    the directory to look for Workbooks in (default to 'here')
  • TWB(s) Named? (*):
    Which Workbooks to look in?
    in this case we're interested in our workbook, whose name begins with 'multi*'
    if nothing is put here, the tool will look in all the Workbooks in the directory
  • Data Sources? (*):
    Which data connections to look for?
    the default is for all of them, which is what we see here.
Eureka!

As is clearly shown here our old pal, the "Sample - Coffee Chain (Access)" data connection is used by two worksheets:

  • "Coffee Product Lines", and
  • "Sheet 9"

The tool.

The tool is a simple Ruby script, the code for which is below. It's intentionally unsophisticated, intended to be a sharp, precise tool that does one job with a minimum of fuss, with which it's easy to get the desired results.

Like many small tools it takes some handling to get a feel for. Written in Ruby, it should be very portable, and run well wherever needed. There is some setup required: Ruby must be installed, of course, and the 'twb' and 'nokogiri' gems must be installed—this is everyday stuff and there should be someone nearby who can help if it's outside your skill set as of now.

One potential wrinkle: the tool uses regular expressions to look for data connection names, so if you're interesting in seeing only specific ones you'll need to observer regex conventions when specifying them. Practice makes for easier, more comfortable use.

The tool's code.


#  dsinsheets.rb - Copyright (C) 2014, 2015  Chris Gerrard
#
#  This program is free software: you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation, either version 3 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  See  for more information.

require 'nokogiri'
require 'twb'

def init
  puts  "\n\n\tLooking for Worksheets related to Data Sources.\n\n"
  print "\n\tIn directory? (.): "
  input = STDIN.gets.chomp
  dir   = if input == '' then '' else input + "/" end
  # -
  print "\n\tTWB(s) Named? (*): "
  input = STDIN.gets.chomp
  twb   = if input == '' then '*.twb' else input end
  # -
  print "\n\tData Sources? (*): "
  input = STDIN.gets.chomp
  $ds   = if input == '' then '.*' else input end
  # -
  puts "\n\n"
  # puts "\n\tLooking for Data Source(s) matching #{$ds}\/ in '#{path}' Workbooks\n\n\n"
  dir + twb
end

dataSourceSheets = {}

def processTWB twbWithDir
  return unless twbWithDir =~ /.twb$/
  puts "\t#{twbWithDir}\n\t=============================="
  twb = Twb::Workbook.new twbWithDir
  $datasources = {}
  twb.worksheets.each do |ws|
    ws.datasources.each do |ds|
      if ds.uiname =~ /#{$ds}/i then loadSourceSheet(ds.uiname, ws.name) end
    end
  end
  $datasources.each do |dsn, sheets|
    puts "\n\t -- #{dsn}\n\t  |"
    sheets.each { |sheet| puts "\t  |-- #{sheet}" }
    puts "\n"
  end
end

def loadSourceSheet ds, sheet
  if   $datasources[ds].nil? then $datasources[ds] = [] end
  $datasources[ds].push sheet
end


path = init
Dir.glob(path) {|twb| processTWB twb }

puts "\n\n\tThat's all, Folks.\n\n"

1 comment:

  1. LOVE IT - Very well written. Thanks Chris.

    ReplyDelete