Tuesday, March 31, 2015

Tableau and the Golden Tool Rule

The Golden Tool Rule

A golden tool is one that makes doing something useful simple, straightforward, and easy.

Golden tools are delights to use. Wielding one establishes a connection with the material one's working with, whether it's a piece of wood being shaped for a custom cabinet, vegetables being diced for the stew pot, or data that's being tossed and turned, flipped, filtered and pulled together into representations that make sense.

Tableau is, at its core, a golden tool. It makes the basic data analytical operations simple, straightforward, and easy. Connecting to data is as simple as dragging and dropping a data-containing file onto or into it. Want to see what's in this dimension? Double-click on it. Interested in the sum of Profit? Double-click on it and Tableau shows it to you in the most effective visual form for the current context. Clicking, double-clicking, and dragging and dropping the data fields (and other objects) causes wonderful things to happen — Tableau understands these actions, this language of visual data analysis, and knows how to present the things you've asked it to in ways that make sense.

Original gold.

Years ago I spent almost a decade working for the company that invented the business data analytical software product category. With FOCUS – our company's product – it was possible to express the basic data analytical operations in a clear, concise, human-oriented language that anyone could pick up and get started with. FOCUS was a golden tool, in its core and in its time, and its ability to help people forge a close connection with their data made many, many of them much happier, and vastly more productive, than they'd been before.

In their bones...

Tableau and FOCUS are strongly analogous. Each was a quantum step forward from the other tools of its time, making the basic data analytical operations simple, straightforward, and easy. Tableau did this by providing a visual syntax and grammar oriented around organizing data elements as first-order user interface objects that represent the data and the analytical operations. FOCUS accomplished this by providing a simple language that used English words to implement the same structure and operations.

To illustrate the heart of Tableau and FOCUS, we'll assume that Snow White's Dwarf friends have been keeping track of the number of gems they've mined, and that each of the seven works in a Department.

We want to know a simple, basic thing: how many gems in total were mined by the Dwarves working in each Department?

Finding the sum of Gems mined per Department with Tableau,
and with FOCUS.

Creating the analytic above is a simple two-action process:

  1. Move "Dept" to "Rows"
    by dragging it as shown, double-clicking it in the data window, or dragging it to the left-side zone of an empty sheet.
  2. Add "Gems" to the viz
    by double-clicking it in the data window, dragging it to the center zone of an empty sheet, or dragging it to the Text button on the Marks card.

Order of Operations Matters.

One of the things that confuses people new to Tableau is that doing things in a different order gets different results.

For example, the illustration above shows Dept being put on the Rows shelf first, followed by Gems being added to the viz. If the order is changed, with Gems first and Dept second, the visualization will be different. It's left to you, dear reader, to give it a go for yourself and see what happens – this little, seemingly innocuous exercise reveals one of the subtle, deep mysteries of Tableau. Understanding it unlocks many Tableau doors.

This bit of FOCUS:

 
      TABLE FILE Dwarves
        SUM Gems
        BY Dept
      END
 

provided at the interactive prompt, or run from a file, creates this report:


      Dept    Gems
      ======= ====
      Medical   34
      Mgmt      35
      Ops      518

Almost seems too simple, doesn't it?

Eight simple, straightforward words to generate the analysis. Even better, the two analytical statements need not be in any particular order:

 
      TABLE FILE Dwarves
        BY Dept
        SUM Gems
      END
 

Swapping the order of the BY and SUM statements makes no difference.

FOCUS was in this sense non-procedural, making it even easier to get results fast because people didn't need to know what order to put the statements in.



The first gold rush.

It's very difficult for those who weren't there to understand how golden FOCUS was in its day.

Introduced in 1975, FOCUS was astonishing. Instead of waiting for their COBOL developers to program and deliver reports, people could now use a simple, English-like language to get reports from their business data immediately. As a FOCUS consultant I was most often able to meet with my clients in the morning and have reports ready for review that afternoon.

FOCUS was the most successful 4GL, the premier product for conducting business data analysis, used by organizations across the globe to help get information out of data into the minds of the people who needed it.

The ability to access and analyze information with minimal intervention by or support from an organization's data processing group (IT's predecessor) changed the world. Business people could make decisions based on their data rather than just relying on intuition. FOCUS was used across industries and in the public sector. Life was good. As a FOCUS consultant, and a product manager, with Information Builders, Inc. (IBI) – FOCUS' vendor – I was able to help make a material difference in our clients' use of their data. In the mid-1980s IBI was one of the world's largest independent software companies, with revenues in the hundreds of millions of dollars, many, many loyal customers, and legions of devoted FOCUS users.

And then things changed.

FOCUS was conceived in the mainframe world. It thrived in that world, where CRTs and line printers were the human-computer interface, where hierarchical databases were common. Its beauty and grace were of that world. But the world of business computing changed, evolved into a world where FOCUS' mainframe roots were out of step with the emerging models of how people interacted with their computers.

Different models of human-computer interaction emerged, replacing the terminal, character-based, block mode mainframe interaction where applications drove the conversation. Minicomputers introduced per-character interaction, allowing finer granularity; every keystroke the user typed could be examined as it was typed. Micro and personal computers took this further, inverting the Human-Computer relationship, allowing for different application models. Then GUIs showed up, providing entirely new possibility horizons for creating software tools that support the person who's trying to accomplish their work.

The world was full of promise from the mid-80s into the 90s. There was a vibrant environment of innovation within which clever people were trying to figure out how best to take advantage of the new ways of computing to build the next generation of golden tools. GUI PC applications were becoming well established. Business applications were evolving at a rapid pace, notably word processors and spreadsheets. At IBI we were working across all the platforms – PCs, Unix, VAX, even mainframes – on technologies and designs to create the next-generation tools that would surface the simplicity, elegance, and expressiveness of FOCUS' data analysis language using the modern Human-Computer interfaces. During this period I worked first in the Micro Product division, then in the Unix division, and with others across the divisions to create great new tools. In the Unix division we created an object-oriented programming language and platform and used it to build a new GUI-based, network-aware FOCUS that surfaced the basic data analytical operations as top-level UI elements. Other divisions in IBI were working on similar projects, each group creating new and wonderful stuff. At the same time other companies were working on and releasing post-mainframe reporting software products.

In the early 90s the decision was made to shut down the different divisions' projects and adopt the Microsoft Windows-based approach that eventually became WebFOCUS. It was sad. An entire generation of people left the company.

Meanwhile, things were happening, forces were marshaling, that led to the near-extinction of simple, straightforward data analysis as a viable ambition.

The business data analysis dark age descended.

For many years things were bad. The environments had changed—the reasons for it are many, and beyond the scope of this post. We, who prided ourselves on our ability to access and understand data, and to help our clients and customers do the same, had to watch helplessly as the giants ran amok, vying with one another to create ever-larger and more monstrous mountains, continents even, of data with little regard for the need to actually understand it at all scales. Consolidation before analysis, in pursuit of the mythical single version of the truth as the fulcrum about which all business data analysis pivoted, became the unquestioned paradigm. Billions of dollars were spent and wasted with little to show for it. Life wasn't good. Unless you were profiting from Big BI platform sales and implementation consulting dollars.

And then, an opportunity.

In 2006 I was working for a major city government building web applications (hey, one needs to eat) when I was asked to review the ongoing citywide data warehouse project. It had been going on for a long time, eaten through tons 'o money, and had exactly two users, both of whom were part of the group creating it.

Seemed simple enough: all I needed to do was understand the data as it entered and moved through the system. There were many data feeds being slurped up into staging databases; there were ODSs, ETL processes' inputs and outputs, an EDW, and a Big BI platform, with some ancillary bits. And nobody knew what was going on, or could provide any real transparency into it. It was an almost perfect situation. All I needed was a way to access and understand the data. All the data, wherever it lived.

But how? I needed a good data analysis tool, one that could do the job with a minimum of fuss and bother, that would let me do my work and stay out of the way. I wanted–needed–a tool with FOCUS' analytical simplicity and elegance, but in a modern form, ideally an interactive UI with direct-action support for basic data analytical operations.

So I started surveying the landscape, looking for something that would fill the bill. I tried out all the tools I could find. The most promising of the bunch were, in alphabetical order: Advizor, QlikView, Spotfire, and Tableau. They all had their strengths; each of them was an excellent tool within its design space. But were any of them created for the purpose I needed – making it as simple as possible to access and analyze the data I needed to understand? Anything extraneous to this, any extra 'benefits', was of no interest to me, and in fact violated the GTPD (Golden Tool Primary Directive): anything that doesn't provide direct benefit to the specific, immediate need is not only of no value, it's a drag and a detriment to getting the real job done. (And it's amazing how many BI technology companies have failed to, and continue to fail to, recognize this simple truth – but that's a topic for other times.)

Eureka! A nugget!

Only one of the tools was designed specifically to make data analysis a simple, straightforward non-technical activity that was approachable, welcoming, and truly easy to use. Tableau was far and away the best tool for the job I needed to do. And in this space it's still the best tool that's come to market.

I love Tableau for the bright light it brought to a dim, drab world. Right out of the box I could see and understand data. What's in this field? How many records are there? How much of this is in here? What's the relationship between this and that? How many permits of each kind have been issued? (it was a city's operational data, remember) It was a great, great thing then, and for these purposes it remains a great product.

The second gold rush.

For the first few years Tableau was my personal tool, one I used in my work, for my own purposes. For a time I had a steady stream of work using Tableau to rescue traditional Big BI projects that had gone off the rails, helping bring clarity to the entire endeavor. Instead of relying on technical people using SQL query tools to try to make sense out of tables of data, Tableau let us see the data as the business information it really was, improving the quality and velocity of the work.

It took a few years for it to catch on—people are naturally conservative, particularly those with a vested interest who feel threatened. But as Tableau became used by more and more people it helped them individually, and it demonstrated that there really is a market, a demand, for highly effective tools that let people understand the data that matters to them with a minimum of fuss.

Life was good again.

Tableau, and the people who created, supported, championed, and used it to good effect, richly deserve the credit for the good done. Now the door is open, the horizons are expanded so far they're almost out of sight.

But...

Tableau is a golden nugget, a shiny, impressive nugget. Which, to stretch the metaphor, was invaluable when there wasn't any other gold to be had.

But it's only a nugget.

I've mentioned Tableau's core. This is the area where Tableau got it right: providing fundamentally direct and easy-to-use mechanisms implementing the basic data analytical operations. In this space there's not much room between how good Tableau is and how good it's possible to be. So, what are these basic operations? Simply put, they are the things one does to organize, sort, filter, and aggregate data so that it can be observed and assessed in order to understand it. They are, briefly (with a rough sketch of them following the list):

  • Choosing which fields to see – e.g. Profit, Region, and Department
  • Organizing the fields – e.g. Profit for the individual Departments by Region
  • Filtering – choosing which data to see – e.g. only the West and South Regions; only records with Profit > 12
  • Deciding which aggregation to use – Tableau assumes SUM() as the default
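
For the non-Tableau-minded, here's a rough sketch of those same four operations in plain Ruby, applied to a tiny made-up dataset. The field names and values are invented purely for illustration; the point is only to show what choosing, organizing, filtering, and aggregating amount to.

# A tiny, made-up dataset standing in for real business data.
rows = [
  { region: 'West',  dept: 'Ops',     profit: 20 },
  { region: 'South', dept: 'Ops',     profit: 15 },
  { region: 'West',  dept: 'Medical', profit: 11 },
  { region: 'East',  dept: 'Ops',     profit: 40 }
]

# Filtering: only the West and South Regions, and only records with Profit > 12.
kept = rows.select { |r| ['West', 'South'].include?(r[:region]) && r[:profit] > 12 }

# Choosing and organizing the fields, plus aggregating:
# sum Profit for each Department within each Region (SUM being the default aggregation).
totals = Hash.new(0)
kept.each { |r| totals[[r[:region], r[:dept]]] += r[:profit] }

totals.each { |(region, dept), profit| puts "#{region}  #{dept}  #{profit}" }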

In this basic data analytical space – which formed the great majority of the product when it was introduced, and when I started using it – Tableau is golden; it makes doing these things about as simple and easy as it can be, and on top of that it provides high quality visualizations of the aggregated values in context, in both the choice of visualization type and its rendering. Gold doesn't tarnish, and Tableau's luster here hasn't faded.

But this space isn't the whole of it. There's a lot more to the totality of data analysis than the initial data analytical space, and beyond the initial space there are many places and ways in which Tableau isn't as good as it could be. This blog covers some of the areas where Tableau falls short; there are many, many more that I encounter every day. Some of them are just annoying, like the horrible formatting system. Some are application architecture aspects, like the layout and organization of the workspace, where the data, dashboard, and formatting panes all share the same space, making for a lot of time- and energy-wasting opening and closing. Others are structural, like the leveraging of reference lines to implement pseudo-bullet graphs, which are crude and cartoonish compared to what they could be. The list is very long, and Tableau doesn't seem to be spending any energy fixing what should be better.

Viewed broadly, Tableau is a golden nugget embedded in a matrix of cruft and bolted-on, awkward, barnacled machinery that gets much more in one's way than out of it. Worse yet, it's largely undocumented—but for the immensely impressive work of people in the Tableau community who've spent vast amounts of time chipping away at it, we'd be largely lost in an impenetrable forest.

He's not handsome, but he sure can hunt.

You may at this point be thinking: why on earth is this guy still using Tableau, if he's so unhappy with it?

I'm glad you asked. It's because, as much as I wish Tableau was better in all the ways I know it could and should be, it's still the best tool ever invented for basic data analysis. Bar none.

But for how long? Tableau's opened up the door and shown the world that data isn't just something that lives in corporate closets, mines, or dungeons. People are ready for and receptive to the idea that they should be able to access and analyze their information easily and simply. The horizons are expanding and the world is primed.

Prospecting.

Now that there's a bona fide demand for simple, easy, straightforward data analysis, the question is:

Where will the next golden tool come from?

Just as Tableau appeared and ushered in a new age, there will be a tool that embraces the principles of easy, simple, straightforward use leading directly to extremely high quality analytical outcomes. One that employs these principles in the basic data analytical space, but expands the operational sphere out and beyond Tableau's ease of use horizons. This new tool will be the next quantum leap forward in data analysis. I'm looking forward to it.

The blueprints for the next golden tool, identifying what it needs to be and do, and how, are already out there, if one knows where and how to look. The only real question is: who's going to build it?

 

Danger! Don't delete that data source!

Here's a common occurrence (at least for me): I'm working merrily along, having a terrific time with my data, creating all sorts of good and valuable stuff. The information and insights are pouring out of Tableau. It's an almost zen state where I'm in the flow and things are humming along with the celestial choir. All is good.

After a while I notice that there's a lot of stuff in my workbook that doesn't really need to be there. Experiments, trials, that sort of thing. I've connected to a fair few data sets along the way, some of which aren't really necessary. Some Tableau gardening is in order.

Among the pruning and trimming there's a data connection that's no longer needed, for whatever reason. So I, thinking it's occupying space that could use freeing up, go ahead and ask Tableau to close it.

Close it, Dan-o.

However, unbeknownst to me the data connection is in use by one or more worksheets, so closing it would leave it or them with nothing to display. This would not be good.

Um, Boss?

Fortunately, Tableau recognizes the danger and is more than happy to help me avoid this calamity.

So it steps in and provides this very nice message, letting me know that closing the data connection will clear all the worksheets that access it. But Tableau, understanding that I may well want this to happen, offers to go ahead and close it.

Hey... just a second there.

Something's missing.

I really should make sure that the worksheets that will be cleared by closing the data connection aren't important enough to keep. So it's time to hop to it and give them a look-see.

Vaiter! Come taste the soup.

Tableau's nice enough to tell us that there are worksheets in peril, but not so nice as to tell us which ones they are. For a product devoted to helping access and understand data this is an unfortunate oversight. (but this horse is well enough beaten by now)

Wanted: a quick, simple way to identify the Worksheets.

It would be really handy if only there were some way to identify which Worksheets were using this data connection. One that didn't take a bunch of setup, that could have the answer lickety-split.

Well, as it happens, such a thing has just made its debut.
Simple to use—no muss, no fuss, just the facts' bare bones.

Introducing the DataSource—Worksheet showing tool.

OK, maybe it's not that good a name, but it works, and isn't that the important thing?

The tool is a mini-tool written in Ruby that takes advantage of the Ruby gem 'twb' for the heavy lifting of working with Tableau Workbooks. (the gem will be the topic of its own blog posting very soon)
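
To give a sense of what 'twb' does for the tool, here's a minimal sketch of the handful of gem calls the script below relies on: opening a Workbook and walking its Worksheets and their data sources. The workbook file name is made up purely for illustration.

require 'twb'

# Open a Workbook (the file name is hypothetical) and, for each Worksheet,
# list the UI names of the data sources it uses.
workbook = Twb::Workbook.new 'someWorkbook.twb'
workbook.worksheets.each do |ws|
  puts ws.name
  ws.datasources.each { |ds| puts "  - #{ds.uiname}" }
end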

Intended to be fast and simple, the tool is a small Ruby script that's run from the command line. I'm working with Windows but think there should be no trouble running it on your Mac. It's a command line app because that's where I do a lot of my work, and putting a nice, pretty, functional UI on it would take time and effort beyond its utility to me.

The tool in action—finding the Worksheets that access the "Sample - Coffee Chain (Access)" data connection.

Here's the Windows command environment showing that the Workbook is in the local directory. The tool can be pointed at another directory, but working local is a good idea.

The simple case.

The tool is invoked as the Ruby script it is with the command:

ruby dsinsheets.rb

In this case the tool is in the local directory, but it can be invoked from wherever it lives. There are three prompts from the tool (a sketch of a full run follows the list):

  • In directory? (.):
    the directory to look for Workbooks in (defaults to 'here')
  • TWB(s) Named? (*):
    Which Workbooks to look in?
    in this case we're interested in our workbook, whose name begins with 'multi', hence the 'multi*' pattern
    if nothing is put here, the tool will look in all the Workbooks in the directory
  • Data Sources? (*):
    Which data connections to look for?
    the default is for all of them, which is what we see here.
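
Putting the prompts together, a run looks roughly like the following. This is a reconstructed sketch rather than a screenshot: the workbook file name is hypothetical, and the data source pattern 'Coffee' is entered here to narrow the search to the one connection we care about; the layout of the output follows the script's puts statements.

      ruby dsinsheets.rb

          Looking for Worksheets related to Data Sources.

          In directory? (.):
          TWB(s) Named? (*): multi*
          Data Sources? (*): Coffee

          multiSources.twb
          ==============================

           -- Sample - Coffee Chain (Access)
            |
            |-- Coffee Product Lines
            |-- Sheet 9

          That's all, Folks.
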
Eureka!

As is clearly shown here, our old pal the "Sample - Coffee Chain (Access)" data connection is used by two worksheets:

  • "Coffee Product Lines", and
  • "Sheet 9"

The tool.

The tool is a simple Ruby script, the code for which is below. It's intentionally unsophisticated: a sharp, precise tool that does one job with a minimum of fuss and makes it easy to get the desired results.

Like many small tools it takes some handling to get a feel for. Written in Ruby, it should be very portable, and run well wherever needed. There is some setup required: Ruby must be installed, of course, and the 'twb' and 'nokogiri' gems must be installed—this is everyday stuff, and there should be someone nearby who can help if it's outside your current skill set.
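
If the gems aren't already on your machine, installing them is typically a one-liner (assuming RubyGems is set up and you have the necessary permissions):

gem install twb nokogiri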

One potential wrinkle: the tool uses regular expressions to look for data connection names, so if you're interested in seeing only specific ones you'll need to observe regex conventions when specifying them. The match is partial and case-insensitive, so entering coffee is enough to find the "Sample - Coffee Chain (Access)" connection, while regex-special characters such as parentheses need to be escaped with a backslash. Practice makes for easier, more comfortable use.

The tool's code.


#  dsinsheets.rb - Copyright (C) 2014, 2015  Chris Gerrard
#
#  This program is free software: you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation, either version 3 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  See  for more information.

require 'nokogiri'
require 'twb'

# Prompt for the directory, workbook name pattern, and data source name pattern;
# sets $ds to the data source pattern and returns the glob path of the workbooks to examine.
def init
  puts  "\n\n\tLooking for Worksheets related to Data Sources.\n\n"
  print "\n\tIn directory? (.): "
  input = STDIN.gets.chomp
  dir   = if input == '' then '' else input + "/" end
  # -
  print "\n\tTWB(s) Named? (*): "
  input = STDIN.gets.chomp
  twb   = if input == '' then '*.twb' else input end
  # -
  print "\n\tData Sources? (*): "
  input = STDIN.gets.chomp
  $ds   = if input == '' then '.*' else input end
  # -
  puts "\n\n"
  # puts "\n\tLooking for Data Source(s) matching #{$ds}\/ in '#{path}' Workbooks\n\n\n"
  dir + twb
end

def processTWB twbWithDir
  # Only process Tableau workbook (.twb) files.
  return unless twbWithDir =~ /\.twb$/
  puts "\t#{twbWithDir}\n\t=============================="
  twb = Twb::Workbook.new twbWithDir
  # Map of data source UI name => list of worksheets that use it, built per workbook.
  $datasources = {}
  twb.worksheets.each do |ws|
    ws.datasources.each do |ds|
      # Record the worksheet when the data source's UI name matches the requested pattern.
      if ds.uiname =~ /#{$ds}/i then loadSourceSheet(ds.uiname, ws.name) end
    end
  end
  $datasources.each do |dsn, sheets|
    puts "\n\t -- #{dsn}\n\t  |"
    sheets.each { |sheet| puts "\t  |-- #{sheet}" }
    puts "\n"
  end
end

# Add the worksheet to the list of sheets recorded for the given data source name.
def loadSourceSheet ds, sheet
  if   $datasources[ds].nil? then $datasources[ds] = [] end
  $datasources[ds].push sheet
end


# Prompt for the parameters, then process every workbook matching the resulting glob pattern.
path = init
Dir.glob(path) {|twb| processTWB twb }

puts "\n\n\tThat's all, Folks.\n\n"

Thursday, March 5, 2015

Tableau Server License Level Capabilities

This Tableau Public viz identifies the capabilities associated with the Viewer and Interactor Tableau Server License Levels.

It recaps the information found in this Tableau Knowledge Base article.

Instead of the KB's static HTML table, the information is presented in a dynamic Tableau viz that encourages interaction, which leads to more engagement and better comprehension and retention.

I keep hoping that Tableau will start using Tableau to present information in its own online resources - doing so makes a lot of sense in the eating-one's-own-dog-food way. So far they don't seem to see the value in it.