vopendata

Charlotte VMUG vOpenData Presentation Posted + New Stats

06.10.2013 by William Lam

Last week I had the privilege of attending the Carolina Users Summit, which, from what I hear, is one of the larger VMUGs in the US. I was asked to give a presentation on vOpenData, a recent community project that I collaborated on with Ben Thomas. The presentation goes into some background on how the project got started, takes a deeper look at how it works, and includes a live demo with some interesting stats that have not been shared before (CLTVMUG exclusive!).

I would like to thank everyone who attended the session and brought up such great questions. For those of you who missed it or could not attend, I have posted the presentation online and you can download it here. I hope everyone enjoyed it and that you will contribute your data as well as help spread the word about vOpenData! Both Ben and I have been quite swamped with work and changes lately, but hopefully we will have a nice update for everyone real soon!

Even if you know what vOpenData is, I think it is still worthwhile to check out the presentation, as it contains a bunch more stats that have never been shared before. Here is a sneak peek at two that I am sure both Duncan Epping and Frank Denneman would be quite proud of:

Clusters w/vSphere HA Configured:

Clusters w/vSphere DRS Configured:

Lastly, I would like to give a big thanks to Charlie Gautreaux for inviting me out to the Charlotte VMUG! I had a great time and met a lot of really cool people!

vOpenData Update & FAQs

04.19.2013 by William Lam

Ben Thomas and I launched the vOpenData project exactly one week ago and we really had no idea what to expect. We were completely blown away by the amount of interest and support from the awesome VMware community! In one week's time we had contributions from over 30 countries around the world and we are approaching 70,000 virtual machines! Below is a quick screenshot of some of the impressive statistics. This has been a huge community effort and we wanted to take a moment to thank everyone who has contributed, as well as all the bloggers who have written awesome articles to help us get the word out! Let's continue the momentum!

There have been a few requests to show the top 5 hardware vendors instead of the top 3. To be honest, we went with the top 3 to get a symmetrical-looking dashboard, and that is how all the tiles fit when we designed it. We will see if we can adjust a few things to accommodate this, but I do want to note that Ben is currently traveling for work and modifications to the site may take some time. This also applies to the Contributor's dashboard, for which we are hoping to have a prototype in the coming weeks, so please be patient as we work on that.

We have also received many questions and comments asking about specific statistics, the ability to filter, what information is collected, etc., and I wanted to put together a quick FAQ to help answer some of these questions for existing vOpenData users as well as new and prospective ones. If there are other things we can help clarify, feel free to leave a comment.

vOpenData FAQs

What is vOpenData?

Please take a look at these two blog articles for more details about vOpenData:

vOpenData: An Open Virtualization Community Database

vOpenData – Crunching Everyone’s Data For Fun And Knowledge

What information are you collecting?

You can find the complete list of anonymized data we are collecting in the Data FAQ page here. You can also view the raw contents of the zip file after executing the script. They are just plain CSV files and we highly encourage every user to take a look before submitting.

What is a vOpenData Infrastructure?

This is just a logical container/view for your data. Once the Contributor's dashboard is available, you will be able to view and filter by various properties within a given vOpenData Infrastructure. This also means that you can either map a single vCenter Server upload to a single vOpenData Infrastructure or you can have multiple vCenter Server uploads go to a single vOpenData Infrastructure. If you wish to have fine-grained filters, it is recommended that you have one vOpenData Infrastructure for each of your vCenter Servers.

I still do not see my data reflected in the public dashboard?

It can take some time for the data to be both processed and displayed after upload. Please be patient, as it can take up to an hour depending on the number of users and submissions.

What happens if my vCenter Server environment changes?

As your environment changes, you can periodically re-run the script and upload your data. If an object already exists in our database and it has changed, then we will update its record. If the object is new, then we will add a new record. If there are no changes, then no updates are made.
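To make the update behavior a bit more concrete, here is a minimal Python sketch of that kind of merge. This is not the actual vOpenData backend code: the in-memory dictionary, the `merge_upload` function and the `num_cpu` field are all hypothetical and only illustrate the add/update/skip logic described above, keyed on an object's UUID.

```python
from typing import Dict

Record = Dict[str, str]


def merge_upload(store: Dict[str, Record], upload: Dict[str, Record]) -> Dict[str, int]:
    """Merge an uploaded data set into the store, keyed by object UUID.

    Hypothetical illustration only: new UUIDs are added, changed records
    are overwritten, and identical records are left untouched.
    """
    counts = {"added": 0, "updated": 0, "unchanged": 0}
    for uuid, record in upload.items():
        existing = store.get(uuid)
        if existing is None:
            store[uuid] = dict(record)   # object not seen before: add it
            counts["added"] += 1
        elif existing != record:
            store[uuid] = dict(record)   # object seen before but changed: update it
            counts["updated"] += 1
        else:
            counts["unchanged"] += 1     # no changes: nothing to do
    return counts


store: Dict[str, Record] = {}
first = merge_upload(store, {"uuid-a": {"num_cpu": "2"}, "uuid-b": {"num_cpu": "4"}})
second = merge_upload(
    store,
    {"uuid-a": {"num_cpu": "2"}, "uuid-b": {"num_cpu": "8"}, "uuid-c": {"num_cpu": "1"}},
)
print(first)
print(second)
```

In this sketch the second upload adds one object, updates one and leaves one alone, which is why re-running the script against the same environment is safe.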

What browsers are supported for the vOpenData Dashboard?

We are using Dashing as the dashboard application for vOpenData. From their site, it looks like it has only been tested on Chrome, Safari 6+ and Firefox 15+. We have also heard from a few folks that IE9 and IE10 do not work with the dashboard, so please use one of the supported browsers.

Is vOpenData supported on mobile browsers?

Yes, it should work on all mobile devices including the iPad, iPhone, Nexus, etc.

Can you display statistic X?

The public dashboard shows just a tiny percentage of the information we have collected so far. We will provide more statistics as well as methods of filtering when the Contributor's dashboard is released. By looking at the data we currently collect in the Data FAQ page here, you can get a sense of the types of statistics we will be able to provide.

What is the difference between the vOpenData Public and Contributor's dashboards?

The public dashboard is just an aggregate of all the datasets submitted by the community and it only contains a very tiny percentage of the information we have collected. It is available to everyone and you do not need to sign up to view it. For those who have contributed, there will be a Contributor's dashboard, which will contain all the statistics we collect once it is released. You will also be able to apply various filters on the total aggregated community data as well as on your own vOpenData Infrastructures. As you can see, there is a huge benefit to those who submit their data, and once the Contributor's dashboard is released, you will be able to view all the data in a totally new manner.

Can I see a specific type of Infrastructure (e.g. server vs VDI)?

This will be available when the Contributor's dashboard is released.

Is the data submitted mainly lab environments?

Actually, more than 75% of the contributed data in the vOpenData database comes from non-lab environments. This includes Server, VDI, Cloud and Combination environment types. Hopefully we can continue to increase this number as we get more contributions from the community.

Why is Windows 2012 not part of the top 10 Operating Systems, or why is Cisco UCS not listed in the top 3 Hardware Vendors, etc.?

I was pretty surprised to see these types of questions. The results come entirely from the data that has been submitted by the community. If folks feel that this is not correct, then I highly encourage you to submit your infrastructures or talk to your customers/clients about submitting theirs, and let the data speak for itself.

vOpenData: An Open Virtualization Community Database

04.12.2013 by William Lam

Recently, I had the opportunity to help out with a very unique and cool project called vOpenData which was created by Ben Thomas (a former VMware GSS Technical Engineer). The idea for the project was sparked by a very simple tweet that came from Duncan Epping:

Ben wanted to help answer Duncan’s question, but more importantly he wanted to help answer a bigger set of questions: what are some of the common virtual infrastructure deployment configurations, averages and consolidation ratios? These questions cross the minds of everyday vSphere administrators, architects and consultants, yet it is quite difficult, if not nearly impossible, for them to answer these questions for anything outside of their own environment.

Ben reached out to me with his idea and asked if I could help develop a script to collect basic configuration information from a vSphere environment to help test out his idea. I was immediately intrigued with his idea and saw the huge potential value that Ben’s unique solution could bring to the virtualization community. The coolest thing about this project is that we were able to put together a working prototype within a week’s time!

Note: Also be sure to check out Ben's article vOpenData - Crunching Everyone's Data For Fun And Knowledge for his perspective on how he was able to quickly develop a prototype leveraging a PaaS solution.

What is vOpenData?

vOpenData is an open community project that grew from the question "What is the average VMDK size for deployed virtual machines?" We wanted to create an open community database that is purely driven by users submitting their virtual infrastructure configurations. By leveraging the powerful virtualization community and applying simple analytics, we are able to provide various trending statistics and data for virtualized environments. This is 100% community driven, the results will be available for everyone to view, and hopefully you will contribute to the overall dataset!
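As a rough illustration of the kind of "simple analytics" involved, the original question about average VMDK size boils down to averaging one column across submitted rows. The Python sketch below is only an assumption about how that could look: the `average_column` helper, the `disk_uuid` and `size_gb` column names and the sample rows are all made up for this example and do not reflect the real vOpenData schema or data.

```python
import csv
import io
from statistics import mean


def average_column(csv_text: str, column: str) -> float:
    """Return the mean of a numeric column in CSV text, skipping empty cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader if row.get(column)]
    if not values:
        raise ValueError(f"no values found in column {column!r}")
    return mean(values)


# Made-up sample rows; the column names are hypothetical.
sample = "disk_uuid,size_gb\nd1,40\nd2,60\nd3,80\n"
average_vmdk_gb = average_column(sample, "size_gb")
print(average_vmdk_gb)
```

The value of a community database is simply that the same calculation can run over many environments instead of just your own.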

What information do we collect?

We made an effort not to collect specific information, such as hostnames or even display names, that could be used to identify a particular organization. Instead, we are using UUIDs, which are automatically generated by the virtualization platform, to uniquely identify a particular object. This allows us to keep track of changes in our database when a new data set is uploaded from an existing environment. In addition, we are collecting various configuration data and you can find a complete list in the Data FAQs.

More info on the data we collect is here: Data FAQs

What will this data be used for?

We are planning on using this data to create some interesting statistics and data modeling for the community to use in capacity planning and analysis. Most of this data will be made available through a dashboard or reports and eventually through an API to be mixed into other applications.

What about privacy concerns?

Though the data that is collected is already anonymized and non-identifying, please ensure that you are abiding by the privacy policies of your organization when uploading this data. If you are concerned about the data, it is recommended that you audit the contents of the zip file, which are just CSV files, before uploading. We only ask that you do not modify the schema at all.
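If you would rather audit the zip file with a script than by hand, something along these lines could work. This is a minimal Python sketch and not part of the vOpenData tooling: the `summarize_zip` helper is hypothetical, and the `utf-8-sig` encoding is an assumption that may need adjusting depending on which collection script produced the file. It only reads the archive and never modifies it, so the schema stays intact.

```python
import csv
import io
import zipfile
from typing import Dict, List, Tuple


def summarize_zip(path: str) -> Dict[str, Tuple[List[str], int]]:
    """Map each CSV file in the archive to (header columns, data row count)."""
    summary: Dict[str, Tuple[List[str], int]] = {}
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # only CSV members are expected; skip anything else
            with archive.open(name) as raw:
                # Encoding is an assumption; utf-8-sig also tolerates a BOM.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                reader = csv.reader(text)
                header = next(reader, [])
                row_count = sum(1 for _ in reader)
            summary[name] = (header, row_count)
    return summary
```

For example, calling `summarize_zip("vopendata-stats.zip")` and printing the result would list every column name being submitted, which makes it easy to confirm that nothing like a hostname or display name is included.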

How do I get started?

Step 1 - Check out the sexy vOpenData Public Dashboard here to get a glimpse of some of the information you will find by submitting your configuration data.

Step 2 - Download either the PowerCLI or vSphere SDK for Perl script and run it against a vCenter Server. The script will produce a compressed zip file containing several CSV files. Instructions are available on the download page. You may rename the default file name vopendata-stats.zip to something else, as long as you do not modify the contents of the file.

Step 3 - Open a browser, go to http://www.vopendata.org and sign up for a new account.

Step 4 - Click on the “Infrastructures” tab at the upper left-hand corner. An Infrastructure is a logical view that can help you organize the data you have collected. You can associate a single vCenter Server with an infrastructure or you can combine multiple vCenter Server data sets into a single infrastructure. The choice is really up to you on how you would like to visualize your data and whether you would like to map that to the physical location of your virtual infrastructure.

Step 5 - Once you have created your Infrastructures, upload your data files to their respective Infrastructure. This may take some time, as the data processing is executed in the background and also depends on the number of users and uploads occurring at the moment. We ask that you please be patient and check back in a bit. You can refresh the page, which will let you know when the processing is complete.

Step 6 - After the data is uploaded to the system, a scheduled job performs the analytics and calculations in periodic batches. It can take 45 minutes to an hour before the results are reflected in the public dashboard. This is primarily governed by the single worker we have on the backend due to resource constraints. To view the results on the public dashboard, visit http://dash.vopendata.org

We hope you visit the vOpenData site regularly as the community uploads more and more data, and see how statistics are trending over time. We would also like to thank the following people who were part of our early alpha program and assisted with both testing and code contributions: Frederic Martin, Raphaël SCHITZ, Timo Sugliani and of course my Automation colleague Alan Renouf! If you would like to learn more about the vOpenData project, we have also submitted a session for VMworld 2013: 4976 - vOpenData - Crunching Everyone's Data For Fun And Knowledge. Be sure to vote for it!

You can follow @vopendata on Twitter for new updates and notifications, as well as Ben Thomas at @wazoo and William Lam at @lamw.

How can I help or contribute?

First and foremost, you can get involved by signing up for a free account and contributing your data to the open community database! We are also open to any suggestions and feedback, as they would be very valuable to us. Feel free to join the vOpenData VMTN Community Group to discuss further. We know that in this first release we are not going to be able to show everything, but we plan to show much more. Lastly, all the infrastructure that is used to provide the dashboard, the backend database and the processing is hosted and paid for out of our own pockets. If you have found this to be a useful resource and would like to contribute either with a donation or sponsorship to help us continue developing this project, please contact us at vopendata[at]gmail[dot]com.