6/15/2020
Bin There, Done That: A Discussion of Bins in Tableau
Creating bins in Tableau seems fairly straightforward: right-click on a Measure, choose Create -> Bin, and you’re done. However, there are some restrictions and pitfalls that are important to understand. In this post, I’ll explain these and offer a few solutions, including a hidden function in Tableau that I think you will find very helpful.
I’ll be referencing Jonathan Drummey a number of times in this blog post, but what else is new? More often than not, things in Tableau trace their way back to Jonathan Drummey and Joe Mako. There are several good posts on the subject of bins, as well as a great Tableau workbook.
First, let’s start with this blog post from Jonathan about bins.
One of the big restrictions with bins is that when you create a histogram using the bins, you aren’t able to add a reference line. Jonathan and Joe offer their own equations that allow you to create your own bins, using a calculated field as a bin, and therefore you can add reference lines to the histogram.
These calculations are good solutions, and even though this blog post is from 2014, they can still be used today.
Nick Hara wrote this excellent blog post called “I’ve Bin Everywhere” (January 2019), where he shows a number of examples and provides this great Tableau Workbook available for download.
Let’s explore some things about bins.
Be Careful of Floating Points
One of the pitfalls in using the default bins in Tableau occurs when binning decimal values. This is because of floating point issues when dealing with the arithmetic of the FLOOR() or CEILING() functions in Tableau. This issue is outlined in this knowledge base article by Tableau, Histograms Display Decimal Values in Incorrect Bins. In short, multiply the decimal values by 100 or 1,000 in a calculation to remove the decimal places, then bin that calculated field. Note that this is what Nick does to solve this issue in his workbook mentioned above.
For example, here is what the bins look like in Tableau when I take 100 numbers from 0.01 to 1.00, in increments of 0.01, and bin them with a bin size of 0.05. We would expect these to be fairly even bins, but they are not.
Now I’ll take that same field, multiply it by 100, and create a bin using that calculated field with a bin size of 5. When we do this, we get the bin distribution that we would expect to see.
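The same effect can be reproduced outside Tableau. The following is a minimal Python sketch, not Tableau's actual implementation: it assumes a bin index is computed as floor(value / bin size), and the function names are hypothetical. The round() call in the scaled version is an extra safeguard added for this sketch. In double precision, 0.15 / 0.05 evaluates to slightly less than 3, so the value 0.15 drops into the bin below the one we expect.

```python
import math
from collections import Counter

# 100 values from 0.01 to 1.00 in steps of 0.01, as in the example above.
values = [i / 100 for i in range(1, 101)]

def bin_index_float(x, size=0.05):
    """Bin directly on the decimal value (exposed to floating point error)."""
    return math.floor(x / size)

def bin_index_scaled(x, size=5, scale=100):
    """Scale to a whole number first, then bin with integer arithmetic.

    round() is a safeguard added in this sketch, not part of the Tableau fix.
    """
    return round(x * scale) // size

float_counts = Counter(bin_index_float(v) for v in values)
scaled_counts = Counter(bin_index_scaled(v) for v in values)

print("decimal bins:", sorted(float_counts.items()))
print("scaled bins: ", sorted(scaled_counts.items()))
```

With the scaled version, every full bin holds exactly five values; only the first bin (1 to 4) and the last bin (100) are smaller.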
Okay, so that’s pretty straightforward, but there are other issues that we can encounter. Let’s look at how these custom calculated fields for bins distribute the values in these bins.
Be Careful of Bin Distribution when Using Custom Calculations
Nick originally created his binning solution for something at work. For his blog post, he ported that solution over using the World Indicators data from Tableau Desktop, specifically the field Population Urban, which is a decimal number representing the percent of the population that is urban in each country over several years. This turns out to be a great use case for this discussion, because we encounter both issues: the floating point issue and issues with how the bins are distributed with these custom calculations.
Let’s step through this and work our way to a final solution. While this section may look long, very detailed and complicated, I promise that the final solution is pretty easy and straightforward (and we’ll learn a hidden function in Tableau too).
I modified Nick’s workbook to show the values that make up his histogram. Below is the histogram and the individual values in a stem and leaf plot (cut off at ~30 rows).
You may notice some issues with the binning right away. First, the first bin is labeled 0 to 5, but the values inside that first bin are actually between 5 and 10. Second, and more concerning, there are values in the second bin that should be grouped with the first bin. For example, the value 0.0910 should be grouped with the other values that are between 5 and 10.
Nick’s formula for his calculated field:
Calculated Field: Population Bin (Param)
Formula:
([Urban Bin]-( CEILING([Population Urban]*100)%[Urban Bin])) + CEILING([Population Urban]*100 ) - [Urban Bin]
Note - [Urban Bin] is a parameter that Nick created to change the bin size.
Let’s make two minor changes to Nick’s formula. First, we’ll change the CEILING function to FLOOR. Second, we’ll add 5 to the value of the bins to slide them over one bin value, which in this case simply means we remove the -[Urban Bin] from the equation.
Calculated Field: Population Bin (Param) REVISED
Formula:
([Urban Bin]-(FLOOR([Population Urban]*100)%[Urban Bin])) + FLOOR([Population Urban]*100)
Our revised formula creates these bins. This is actually correct and works as a final solution.
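To see why the two changes matter, here is a small Python sketch of both formulas. It assumes the input is already the percentage, that is [Population Urban] * 100, and that Python's % on non-negative integers behaves like Tableau's modulo; the function names are hypothetical.

```python
import math

def original_bin(pct, bin_size):
    """([Urban Bin] - (CEILING(x) % [Urban Bin])) + CEILING(x) - [Urban Bin]"""
    c = math.ceil(pct)
    return (bin_size - (c % bin_size)) + c - bin_size

def revised_bin(pct, bin_size):
    """([Urban Bin] - (FLOOR(x) % [Urban Bin])) + FLOOR(x)"""
    f = math.floor(pct)
    return (bin_size - (f % bin_size)) + f

# 7.3 and 9.1 both lie between 5 and 10, so they should share a bin.
for pct in (7.3, 9.1):
    print(pct, original_bin(pct, 5), revised_bin(pct, 5))
```

With a bin size of 5, the original formula puts 7.3 in the bin labeled 5 but 9.1 in the bin labeled 10, because CEILING(9.1) is already 10. The revised formula labels both with 10, the upper edge of the 5 to 10 range.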
However, if we compare these bins against the default Tableau bins, they won’t match. That’s because the Tableau default bins are encountering the floating point issue described earlier.
If we create a calculated field that multiplies the measure by 100 and create a bin using that new calculated field, it solves this issue and the bins match the REVISED calculation above.
The default bins work great, but unfortunately we can’t add a reference line to the default bins because they’re Discrete. We have come full circle. We’ve corrected the calculation formula to create a continuous bin and we’ve corrected the default bins. Both are correct now, but the continuous bin created by the calculated field is the only one that allows the reference line we’re looking for.
Using a Hidden Function in Tableau: SYS_NUMBIN()
I can’t take any credit for finding this hidden function. Back in June 2017, Jonathan Drummey came out to our office for some hands-on Tableau training with our small team of Tableau developers and data scientists. In one of our many conversations over those two days, Jonathan mentioned running across this function. As I recall, he was working in Tableau when he encountered some sort of issue and a weird error box popped up on him. He saw this function and decided to give it a try, and sure enough it worked in the calculation window. We talked briefly about how we might use it, but we moved on, time passed, and occasionally I revisit it. Well, as it turns out, it’s a great solution for this particular problem and is super simple to use.
The syntax is SYS_NUMBIN([Measure], [Bin Size]). In this case, it would be SYS_NUMBIN([Population Urban], 0.05), but as we’ve learned, that will be problematic. Also, this function creates integer bins starting at 0, then 1, and so on. So we’ll make a few minor adjustments. To solve the floating point issue, we’ll use the field that multiplies the Measure by 100.
To change the bins from [0,1,2…] to [5,10,15…] we’ll multiply the bins by the bin size and then add the bin size. For example, (0*5) + 5 = 5 for the first bin, (1*5) + 5 = 10 for the second bin, and so on. To match the bins that Tableau would create automatically, it would look like this: (SYS_NUMBIN([Measure], [Bin Size]) * [Bin Size]) + [Bin Size]. In this case, the new formula would be:
Calculated Field: Population Urban * 100 (SYS_NUMBIN)
Formula:
(SYS_NUMBIN([Population Urban * 100],[Urban Bin]) * [Urban Bin]) + [Urban Bin]
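The rescaling arithmetic can be checked with a short Python sketch. SYS_NUMBIN() is undocumented, so the stand-in below simply assumes it returns floor(value / bin size), which is consistent with the integer bins 0, 1, 2 described above. All function names here are hypothetical.

```python
import math

def sys_numbin_assumed(value, bin_size):
    """Stand-in for SYS_NUMBIN(): ASSUMED to be floor(value / bin_size)."""
    return math.floor(value / bin_size)

def bin_label(value, bin_size):
    """(SYS_NUMBIN(x, size) * size) + size, i.e. the upper edge of the bin."""
    return sys_numbin_assumed(value, bin_size) * bin_size + bin_size

def revised_bin(value, bin_size):
    """The REVISED custom formula from earlier, for comparison."""
    f = math.floor(value)
    return (bin_size - (f % bin_size)) + f

# Compare both approaches for every whole percentage from 0 to 100.
mismatches = [p for p in range(0, 101) if bin_label(p, 5) != revised_bin(p, 5)]
print("mismatches:", mismatches)
```

Under that assumption, both approaches assign the same label to every whole percentage, which is why the shorter expression can replace the longer one.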
In addition, we can use the SYS_NUMBIN() function to simplify some of the equations that Nick created. For example, he had a great example of a distribution with a tail and variable width. His formula looks like this:
Calculated Field: Health Bin Tail (Variable)
Formula:
IF [Health Exp/Capita]>=[Health Threshold] THEN [Health Threshold]
ELSE
IF [Health Bin]>=[Health Exp/Capita] THEN
(([Health Bin]*.1)-(CEILING([Health Exp/Capita])%([Health Bin]*.1)))
+ CEILING([Health Exp/Capita])-[Health Bin]*.1
ELSE ([Health Bin]-(CEILING([Health Exp/Capita])%[Health Bin]))
+ CEILING([Health Exp/Capita])-[Health Bin]
END
END
Using the SYS_NUMBIN() function we can simplify this to:
Calculated Field: Health Bin Tail (Variable) (SYS_NUMBIN)
Formula:
IF [Health Exp/Capita]>=[Health Threshold] THEN [Health Threshold]
ELSE
IF [Health Bin]>=[Health Exp/Capita]
THEN (SYS_NUMBIN([Health Exp/Capita],[Health Bin]*.1) * [Health Bin]*.1)
ELSE (SYS_NUMBIN([Health Exp/Capita],[Health Bin]) * [Health Bin])
END
END
This produces the exact same distribution:
Other Notes on SYS_NUMBIN()
Here are just a few other things we can do with continuous bins in Tableau.
Having a continuous bin was the goal in this particular case, because we wanted to add a reference line, but you could also add a reference band, distribution band or box plot to your binned histogram.
Because it’s a custom calculation, you can use the bin in a calculated field. For example, you could combine it with parameter actions, set actions or other calculations to assign color to the bins.
Easily set the width of the bins using other rules. This isn’t anything new. People have been doing this for years, but it’s just a little easier now. For example, the Freedman-Diaconis rule for determining bin width for a continuous variable is [2 * (IQR/n^(1/3))], where IQR is the interquartile range and n is the number of records. We can easily create a calculated field using the Freedman-Diaconis rule and then just drop it into the SYS_NUMBIN() function as the bin size.
Calculated Field: Freedman-Diaconis Bin Size
Formula:
{ 2 * ( (PERCENTILE([Population Urban * 100], 0.75) - PERCENTILE([Population Urban * 100], 0.25)) / POWER(COUNT([Number of Records]), 1/3) ) }
Calculated Field: Population Urban * 100 (SYS_NUMBIN with FD)
Formula:
(SYS_NUMBIN([Population Urban * 100],[Freedman-Diaconis Bin Size]) * [Freedman-Diaconis Bin Size]) + [Freedman-Diaconis Bin Size]
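For readers who want to sanity-check the bin width outside Tableau, here is a minimal Python sketch of the Freedman-Diaconis rule. It uses the standard library's statistics.quantiles with the "inclusive" method; Tableau's PERCENTILE() may interpolate differently, so treat small differences as expected. The function name is hypothetical.

```python
import statistics

def freedman_diaconis_width(values):
    """Bin width = 2 * IQR / n^(1/3)."""
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    n = len(values)
    return 2 * (q3 - q1) / n ** (1 / 3)

# Tiny worked example: eight records, so n^(1/3) is 2.
sample = [1, 2, 3, 4, 5, 6, 7, 8]
print(freedman_diaconis_width(sample))
```

For this sample, Q1 is 2.75 and Q3 is 6.25, so the IQR is 3.5 and the width is 2 * 3.5 / 2 = 3.5.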
There are a number of formulas that could be used to determine the width of the bins (see more examples here). Note - The formula that Tableau uses by default to determine the number of bins is Number of Bins = 3 + log2(n) * log(n), where n is the number of distinct rows in the table. The size of each bin is determined by dividing the difference between the smallest and the largest values by the number of bins.
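As a rough illustration of that default, here is a Python sketch of the two steps. The formula does not say which base log(n) uses; this sketch assumes base 10 and exposes it as a parameter so the assumption is easy to change. Whether Tableau rounds the result is also not stated, so no rounding is applied. The function names are hypothetical.

```python
import math

def default_bin_count(n, log_base=10):
    """Number of Bins = 3 + log2(n) * log(n); base-10 log is an ASSUMPTION."""
    return 3 + math.log2(n) * math.log(n, log_base)

def default_bin_size(min_value, max_value, n, log_base=10):
    """Bin size = (largest value - smallest value) / number of bins."""
    return (max_value - min_value) / default_bin_count(n, log_base)

# Example: 100 distinct rows with values ranging from 0 to 100.
print(default_bin_count(100))
print(default_bin_size(0, 100, 100))
```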
Note - It seems that SYS_NUMBIN(), even when used as a discrete dimension, doesn’t activate data densification like a typical bin would. This makes sense, because we’ve converted the pill type from a bin to a discrete dimension and the latter doesn’t activate data densification (read more about Data Densification Using Domain Completion and Domain Padding here).
None of these use cases are new, since we could use other equations (from Jonathan Drummey, Joe Mako, Nick Hara and others) to create continuous bins. However, I find the SYS_NUMBIN() function is much easier to remember, so I hope you find this hidden function and this information useful when creating bins in Tableau. Below is a Tableau Public Visualization with all of these examples that you can download and explore.