
vOpenData: An Open Virtualization Community Database

04.12.2013 by William Lam // 11 Comments

Recently, I had the opportunity to help out with a unique and very cool project called vOpenData, which was created by Ben Thomas (a former VMware GSS Technical Engineer). The idea for the project was sparked by a very simple question that Duncan Epping asked in a tweet.

Ben wanted to help answer Duncan’s question, but more importantly he wanted to help answer a bigger set of questions: what are some of the common virtual infrastructure deployment configurations, averages and consolidation ratios? These questions cross the minds of everyday vSphere administrators, architects and consultants, yet it is difficult, if not nearly impossible, for them to answer these questions beyond their own environment.

Ben reached out to me with his idea and asked if I could help develop a script to collect basic configuration information from a vSphere environment to help test it out. I was immediately intrigued by the idea and saw the huge potential value that Ben’s solution could bring to the virtualization community. The coolest thing about this project is that we were able to put together a working prototype within a week’s time!

Note: Also be sure to check out Ben's article, vOpenData - Crunching everyone's data for fun and knowledge, for his perspective on how he was able to quickly develop a prototype leveraging a PaaS solution.

What is vOpenData?

vOpenData is an open community project that grew from the question "What is the average VMDK size for deployed virtual machines?" We wanted to create an open community database that is driven purely by users submitting their virtual infrastructure configurations. By leveraging the powerful virtualization community and applying simple analytics, we are able to provide various trending statistics and data for virtualized environments. This is 100% community driven, and the results will be available for everyone to view. Hopefully you will contribute to the overall dataset!

What information do we collect?

We made an effort not to collect specific information, such as hostnames or even display names, that could be used to identify a particular organization. Instead, we use the UUIDs that are automatically generated by the virtualization platform to uniquely identify a particular object. This allows us to keep track of changes in our database when a new data set is uploaded from an existing environment. In addition, we collect various configuration data, and you can find a complete list in the Data FAQs.

More info on the data we collect is here: Data FAQs
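
To illustrate why UUIDs are enough to track changes between uploads, here is a minimal Python sketch. It is not the actual vOpenData backend: the `VmRecord` fields and the `merge_upload` function are hypothetical names chosen for the example, and the real schema is described in the Data FAQs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VmRecord:
    """Hypothetical, simplified record; no hostname or display name is stored."""
    uuid: str        # platform-generated identifier
    num_cpu: int
    memory_mb: int


def merge_upload(store: dict[str, VmRecord], upload: list[VmRecord]) -> tuple[int, int]:
    """Upsert uploaded records keyed by UUID and return (added, updated) counts."""
    added = updated = 0
    for rec in upload:
        existing = store.get(rec.uuid)
        if existing is None:
            added += 1          # first time this object is seen
        elif existing != rec:
            updated += 1        # same object, configuration changed
        store[rec.uuid] = rec
    return added, updated
```

Because the key is the UUID, uploading a second data set from the same environment updates the existing entries instead of creating duplicates.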

What will this data be used for?

We are planning on using this data to create some interesting statistics and data modeling for the community to use in capacity planning and analysis. Most of this data will be made available through a dashboard or reports and eventually through an API to be mixed into other applications.

What about privacy concerns?

Though the data that is collected is already anonymized and non-identifying, please ensure that you abide by the privacy policies of your organization when uploading it. If you are concerned about the data, we recommend that you audit the contents of the zip file, which are just CSV files, before uploading. We only ask that you do not modify the schema in any way.
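
If you would like to audit the zip file without extracting it, a small read-only script is one way to do it. The following is a minimal Python sketch rather than part of the vOpenData tooling: the `audit_zip` function name is made up for this example, and the UTF-8 encoding is an assumption about how the CSV files were written.

```python
import csv
import io
import zipfile


def audit_zip(path):
    """Return {member_name: (header_row, data_row_count)} for each CSV in the zip.

    Members that are not CSV files map to None so they stand out.
    The archive is only read, never modified.
    """
    summary = {}
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".csv"):
                summary[name] = None
                continue
            with zf.open(name) as raw:
                # Assumption: files are UTF-8 (utf-8-sig also tolerates a BOM).
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                reader = csv.reader(text)
                header = next(reader, [])
                rows = sum(1 for _ in reader)
            summary[name] = (header, rows)
    return summary
```

For example, `audit_zip("vopendata-stats.zip")` lists every file in the archive along with its column names and row count, so you can confirm that no hostnames or display names are included before you upload.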

How do I get started?

Step 1 - Check out the sexy vOpenData Public Dashboard here to get a glimpse of some of the information you will find by submitting your configuration data.

Step 2 - Download either the PowerCLI or the vSphere SDK for Perl script and run it against a vCenter Server. The script produces a compressed zip file containing several CSV files. Instructions are available on the download page. You may rename the default file name, vopendata-stats.zip, to something else, as long as you do not modify the contents of the file.

Step 3 - Open a browser, go to http://www.vopendata.org and sign up for a new account.

Step 4 - Click on the “Infrastructures” tab at the upper left hand corner. An Infrastructure is a logical view that can help you organize the data you have collected. You can associate a single vCenter Server with an infrastructure or you can combine multiple vCenter Server data sets into a single infrastructure. The choice is really up to you on how you would like to visualize your data and whether you would like to map that to the physical location of your virtual infrastructure.

Step 5 - Once you have created your Infrastructures, upload your data files to their respective Infrastructure. This may take some time, as the data processing is executed in the background and also depends on the number of users and uploads occurring at the moment. We ask that you please be patient and check back in a bit; refreshing the page will let you know when the processing is complete.

Step 6 - After the data is uploaded to the system, a scheduled job performs the analytics and calculations in periodic batches. It can take 45 minutes to an hour before the results are reflected in the public dashboard; the delay is primarily governed by the single worker we have on the backend due to resource constraints. To view the results, visit the public dashboard at http://dash.vopendata.org

We hope you visit the vOpenData site regularly to see how the statistics trend over time as the community uploads more and more data. We would also like to thank the following people who were part of our early alpha program and assisted with both testing and code contributions: Frederic Martin, Raphaël SCHITZ, Timo Sugliani and of course my Automation colleague Alan Renouf! If you would like to learn more about the vOpenData project, we have also submitted a session for VMworld 2013: 4976 - vOpenData - Crunching Everyone's Data For Fun And Knowledge. Be sure to vote for it!

You can follow @vopendata on Twitter for new updates and notifications, as well as Ben Thomas at @wazoo and William Lam at @lamw.

How can I help or contribute?

First and foremost, you can get involved by signing up for a free account and contributing your data to the open community database! We are also open to any suggestions and feedback, as they would be very valuable to us. Feel free to join the vOpenData VMTN Community Group to discuss further. We know that in this first release we are not going to be able to show everything, but we plan to show much more. Lastly, all the infrastructure that provides the dashboard, the backend database and the processing is hosted and paid for out of our own pockets. If you have found this to be a useful resource and would like to contribute with a donation or sponsorship to help us continue developing this project, please contact us at vopendata[at]gmail[dot]com.


Categories // Uncategorized Tags // vopendata, vSphere

Comments

  1. Marco Broeken says

    04/12/2013 at 7:09 pm

    I can imagine that the guys over at CloudPhysics also have a lot of valuable data.

    Perhaps they are willing to share to opendata?

    • William Lam says

      04/12/2013 at 7:35 pm

      Completely agree. Would love to collaborate with them and see how we can further benefit the virtualization community as a whole!

  2. Ammesiah says

    04/12/2013 at 7:20 pm

    That's a great idea and an amazing job!

    Long live vOpenData!

    • William Lam says

      04/12/2013 at 7:36 pm

      Thanks Fredric! We couldn't have done it without you and Raphael! Hopefully this will be a useful tool for everyone.

  3. Michael Ryom says

    04/13/2013 at 8:46 pm

    Would love to see network added to the stack

    • William Lam says

      04/14/2013 at 3:52 pm

      Michael,

      Definitely. Networking is on our roadmap. Is there anything in particular that is a MUST see that would be helpful/useful?

  4. Iwan 'e1' Rahabok says

    04/14/2013 at 2:45 am

    What does the color mean? Can't figure it out. If they don't mean anything, then my suggestion is to have 3 colors:
    1 for Total. e.g. total number of LUNs in the opendata database.
    1 for Average.
    1 for Maximum. This is for showing how high people push it. So we know the highest or record.

    Thanks! great job!

    • William Lam says

      04/14/2013 at 3:55 pm

      Iwan,

      Yes, the tiles are color coded to represent the specific entity types.

      Baby blue = Infrastructure (this is the logical view and everything in that color represents data related to that)

      The same goes for light green = cluster, red = clusters, yellow = hosts, etc.

      Hopefully you'll help contribute more data too!

  5. Mohammed Raffic says

    04/14/2013 at 2:17 pm

    Thanks for your valuable posts

    http://www.vmwarearena.com/

  6. Anonymous says

    04/15/2013 at 4:48 pm

    at the moment I got the message at start:

    PowerCLI S:\VMware> .\getvOpenData.ps1
    The '<' operator is reserved for future use. + FullyQualifiedErrorId : RedirectionNotSupported

    • William Lam says

      04/15/2013 at 10:42 pm

      Hi,

      Make sure your download was not corrupted. Someone ran into this when we first launched, and it was due to a bad download. On the GitHub site, there is a link to a "zip" file from which you can download the scripts.


Author

William is Distinguished Platform Engineering Architect in the VMware Cloud Foundation (VCF) Division at Broadcom. His primary focus is helping customers and partners build, run and operate a modern Private Cloud using the VMware Cloud Foundation (VCF) platform.
