9/12/2014
My IronViz Contest Viz and Commentary
Let me start by saying, wow! Taking part in the 2014 Tableau IronViz competition was an incredible experience. The Tableau conference was amazing, and I was able to meet so many friends in person whom I only knew from social media platforms. The IronViz stadium was equally amazing. Five large screens, 1,000 people watching at full capacity for the room, and two amazing vizzers to compete against (and really nice guys too!). Lights, music, smoke, and blazingly fast viz action. I've included a commentary below if anyone is interested, but you're probably more interested in seeing my viz, so here it is (or at least what it would have been with my lost 3 minutes back, details below). Be sure to move the mouse around the roulette wheel!
First let me describe the data. We were using data extracted from Yelp: over 1 million rows and around 1/2 gigabyte of data. It took nearly 20-30 seconds just to load the data into Tableau. The description given to us was "data from Las Vegas, NV and Phoenix, AZ from Yelp from 2004 to 2014". It was a bit more complicated than that because the data itself was very messy.
Issues with information
There were more states in the data set than just NV and AZ. There were actually 18 states in the file, some of which are clearly wrong: AZ, CA, EDH, ELN, FIF, GA, KHL, MA, MLN, MN, NC, NTH, NV, NY, ON, SCB, WI, XGl. The first step in my data cleaning was to filter out everything except NV.
Dates: we were given data from 2004-2014. However, there were only a few points in 2004 and no consistent volume in 2005. The reviews on Yelp don't really pick up until 2007. For the purposes of my analysis I filtered from 2012-2014. This gave me the bulk of the reviews and a good size of data to work with that was most relevant (assuming more recent reviews are more useful).
There were misspellings of the restaurant names that caused major analysis issues. For example, "Zeffirino's" was listed as the only 5 star restaurant on the Las Vegas Strip (with 5 reviews). However, there is another record for "Zeffirino" in the database. That record has the same address, but a different geocoding and neighborhood (welcome to the real world of data analysis). The bigger issue was that this location had 191 reviews and only 3.5 stars. In other words, at first glance Zeffirino's was the only 5 star restaurant on the Strip. If you didn't examine the data closely, you would have come to the wrong conclusion. This was just one of many traps in the data. To resolve this, I filtered for the neighborhood equal to "The Strip" and added a filter for rating excluding 5, dropping the Zeffirino records completely.
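For anyone who wants to try the same cleaning steps outside of Tableau, here is a minimal Python sketch of the filters described above. The field names (`name`, `state`, `neighborhood`, `year`, `stars`) and the sample rows are assumptions made up for illustration, not the actual contest schema; only the Zeffirino star ratings come from the data described here. The name-variant check at the end is an extra idea and was not part of what I did on stage.

```python
import re
from collections import defaultdict

# Hypothetical sample rows; the real contest file used different (unknown) field names.
SAMPLE_ROWS = [
    {"name": "Zeffirino's", "state": "NV", "neighborhood": "The Strip", "year": 2013, "stars": 5.0},
    {"name": "Zeffirino", "state": "NV", "neighborhood": "Other", "year": 2013, "stars": 3.5},
    {"name": "Sample Grill", "state": "NV", "neighborhood": "The Strip", "year": 2013, "stars": 4.0},
    {"name": "Sample Diner", "state": "NY", "neighborhood": "", "year": 2013, "stars": 4.5},
    {"name": "Old Cafe", "state": "NV", "neighborhood": "The Strip", "year": 2006, "stars": 4.0},
]


def keep_nv_recent(rows, first_year=2012, last_year=2014):
    """Keep only Nevada rows inside the chosen year range."""
    return [
        r for r in rows
        if r["state"] == "NV" and first_year <= r["year"] <= last_year
    ]


def strip_without_five_stars(rows):
    """Keep rows on The Strip, excluding the suspicious 5-star rating."""
    return [
        r for r in rows
        if r["neighborhood"] == "The Strip" and r["stars"] != 5
    ]


def normalize_name(name):
    """Lowercase, drop a trailing possessive, and remove punctuation and spaces."""
    lowered = name.lower().replace("\u2019", "'")
    lowered = re.sub(r"'s\b", "", lowered)
    return re.sub(r"[^a-z0-9]+", "", lowered)


def find_name_variants(rows):
    """Group raw names by normalized form and return groups with more than one spelling."""
    groups = defaultdict(set)
    for r in rows:
        groups[normalize_name(r["name"])].add(r["name"])
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


cleaned = keep_nv_recent(SAMPLE_ROWS)
strip_rows = strip_without_five_stars(cleaned)
variants = find_name_variants(cleaned)

print([r["name"] for r in strip_rows])
print(variants)
```

Running the variant check before filtering on rating would have flagged the two Zeffirino spellings as the same place, which is a gentler fix than excluding every 5 star record.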
For those in attendance, you witnessed my Tableau crash, but the software didn't actually "crash". I was moving so fast in the beginning that I filtered for "NY" by mistake instead of "NV" on my very first filter. This was immediately apparent when I was working with the neighborhood field 2 minutes later, so I quickly went back to the filter to adjust it. This is where things went screwy. For some reason, I couldn't filter out NY and get NV back in. It should have been two clicks, but I tried a few times, clicking and unclicking, selecting all and deselecting, and it wasn't working correctly. Being 2 minutes into the viz, I was worried about trying to debug it (of course it worked perfectly in my hotel room later that night, so maybe I was just moving too fast or had made some other mistake along the way). I made the decision in the moment to just start over. In hindsight, I would have been better off using the wonderful unlimited "back" (undo) in Tableau because it would have saved me about 20 seconds loading the big workbook again.
After the "crash", a few people in the audience were yelling "save" during the competition, and even my great sous-vizzer, Michael Kovner, came over and said "Control S". The problem was that because of the size of the data, it took about 45 seconds to save anything. So each vizzer knew intimately well that there were a few things NOT to do, and trying to save the workbook during the competition was most definitely one to avoid. One option would have been to save the twb file without the data, but we weren't given the original data file to connect to; we were only given a twbx file to start with. At this point I was just trying to work as fast as possible. The other thing we all avoided was using the text field of the reviews. The text field was so big (the primary reason the file is 1/2 gig) that trying to do any calculations to parse that field, search for words, or use it without filtering it way down was just too costly on time.
Along those same lines, Ryan Sleeper (one of the judges) mentioned in his final comments that I could have used Tableau instead of Excel to create the data for the roulette wheel. He is completely correct. However, I chose Excel in this case because of speed. The time it was taking to do the extra calculations in Tableau for 60 records was too long. It was much easier to just copy and paste rows in Excel and then simply paste the finished data back into Tableau. A few people asked if the roulette wheel would spin. Michael and I actually discussed this and explored the idea of using a parameter to "spin the wheel" in some way. I wasn't happy with anything I explored, so we decided to leave it as a hover option, over both the wheel and the board. When you only have 20 minutes, 17 in my case after the crash, every second counts. So we had to make some design decisions on that one to make sure I could finish in 20 minutes.
Even with the 17 minutes I had, I was really close to where I wanted to be. There were some formatting issues and a few dashboard actions left to do, but overall I was nearly done. In the end John Mathis won the IronViz with a great visualization, "Reviewing the Reviewers". Congrats John!
Below is a side-by-side comparison. On the left side is where I was when the time clock ran out in the IronViz contest. The right side shows the final version of where I was heading with it in the last few minutes.
I would like to thank the community for all the support. There were lots of Tableau Zen Masters, many others from the great Tableau community, and a great group of family and friends that were cheering me on, tweeting, texting, calling, and emailing. I even had someone approach me out of the blue at the airport on the way home and say, "Are you the professor from the IronViz? I voted for you. You should have won." Thank you all for your kind words and encouragement! Hopefully I'll see you guys in Vegas next year! If you go to any of the restaurant picks, be sure to let me know how it was.
I hope you enjoy the viz. If you have any questions, feel free to email me at Jeff@DataPlusScience.com
Jeffrey A. Shaffer
Follow on Twitter @HighVizAbility
Edited by Breanne LaCamera 1/26/2015 and posted 2/17/2015