WilliamLam.com

vOpenData: An Open Virtualization Community Database

04.12.2013 by William Lam // 11 Comments

Recently, I had the opportunity to help out with a unique and cool project called vOpenData, which was created by Ben Thomas (a former VMware GSS Technical Engineer). The idea for the project was sparked by a very simple tweet that came from Duncan Epping:

Ben wanted to help answer Duncan’s question, but more importantly he wanted to help answer a bigger set of questions: what are some of the common virtual infrastructure deployment configurations, averages and consolidation ratios? These questions cross the minds of everyday vSphere administrators, architects and consultants, yet it is nearly impossible for them to answer these questions for anything beyond their own environment.

Ben reached out to me with his idea and asked if I could help develop a script to collect basic configuration information from a vSphere environment to help test it out. I was immediately intrigued by his idea and saw the huge potential value that Ben’s unique solution could bring to the virtualization community. The coolest thing about this project is that we were able to put together a working prototype within a week’s time!

Note: Also be sure to check out Ben's article, vOpenData - Crunching everyone's data for fun and knowledge, for his perspective on how he was able to quickly develop a prototype leveraging a PaaS solution.

What is vOpenData?

vOpenData is an open community project that grew from the question "What is the average VMDK size for deployed virtual machines?" We wanted to create an open community database that is purely driven by users submitting their virtual infrastructure configurations. Leveraging the powerful virtualization community and applying simple analytics, we are able to provide various trending statistics and data for virtualized environments. This is 100% community driven, and the results will be available for everyone to view. Hopefully you will contribute to the overall dataset!

What information do we collect?

We made an effort not to collect specific information, such as hostnames or even display names, that could be used to identify a particular organization. Instead, we use UUIDs, which are automatically generated by the virtualization platform, to uniquely identify each object. This allows us to keep track of changes in our database when a new data set is uploaded from an existing environment. In addition, we collect various configuration data; you can find a complete list in the Data FAQs.
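
To illustrate why a platform-generated UUID is enough to track changes across uploads, here is a minimal Python sketch. It is not the actual vOpenData backend; the function name `merge_upload` and the field names in the usage note are hypothetical and chosen only for illustration.

```python
def merge_upload(database, upload):
    """Merge a new upload into `database`, keyed by object UUID.

    Both arguments map a UUID string to a dict of configuration fields
    (the field names are hypothetical, not the real vOpenData schema).
    Returns (added, updated): UUIDs seen for the first time, and UUIDs
    whose stored configuration changed with this upload.
    """
    added, updated = [], []
    for uuid, config in upload.items():
        if uuid not in database:
            added.append(uuid)
        elif database[uuid] != config:
            updated.append(uuid)
        # Store a copy so later edits to `upload` do not leak into the database.
        database[uuid] = dict(config)
    return added, updated
```

Uploading `{"uuid-1": {"disk_gb": 40}}` twice, the second time with `disk_gb` set to 60, reports the object as added on the first call and as updated on the second, without ever needing a hostname or display name.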

More info on the data we collect is here: Data FAQs

What will this data be used for?

We are planning on using this data to create some interesting statistics and data modeling for the community to use in capacity planning and analysis. Most of this data will be made available through a dashboard or reports and eventually through an API to be mixed into other applications.

What about privacy concerns?

Though the data that is collected is already anonymized and non-identifying, please ensure that you abide by the privacy policies of your organization when uploading it. If you are concerned about the data, we recommend that you audit the contents of the zip file, which are just CSV files, before uploading. We only ask that you do not modify the schema at all.
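
One way to audit the archive without extracting or modifying it is a short read-only script. The Python sketch below is only an illustration: the function name `audit_zip` is hypothetical, and it assumes the CSV files are UTF-8 encoded, which may not hold for every export. It lists each CSV in the zip along with its header row and the first few data rows, and it stops if the archive contains anything other than CSV files.

```python
import csv
import io
import itertools
import zipfile


def audit_zip(source, preview_rows=3):
    """Return {member_name: (header, first_rows)} for each CSV in the archive.

    `source` may be a file path or a binary file-like object.
    The archive is opened read-only, so its contents are never changed.
    Raises ValueError if the archive holds anything other than CSV files.
    """
    report = {}
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            if not info.filename.lower().endswith(".csv"):
                raise ValueError(f"unexpected non-CSV member: {info.filename}")
            with archive.open(info) as raw:
                # Assumption: UTF-8 text; "utf-8-sig" also tolerates a leading BOM.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                rows = csv.reader(text)
                header = next(rows, [])
                preview = list(itertools.islice(rows, preview_rows))
            report[info.filename] = (header, preview)
    return report
```

Calling `audit_zip("vopendata-stats.zip")` and printing each header lets you confirm which columns are present before you upload.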

How do I get started?

Step 1 - Check out the sexy vOpenData Public Dashboard here to get a glimpse of some of the information you will find by submitting your configuration data.

Step 2 - Download either the PowerCLI or the vSphere SDK for Perl script and run it against a vCenter Server. The script produces a compressed zip file containing several CSV files. Instructions are available on the download page. You may rename the default file name vopendata-stats.zip to something else, as long as you do not modify the contents of the file.

Step 3 - Open a browser, go to http://www.vopendata.org and sign up for a new account.

Step 4 - Click on the “Infrastructures” tab in the upper left-hand corner. An Infrastructure is a logical view that can help you organize the data you have collected. You can associate a single vCenter Server with an Infrastructure, or you can combine multiple vCenter Server data sets into a single Infrastructure. The choice is really up to you on how you would like to visualize your data and whether you would like to map that to the physical location of your virtual infrastructure.

Step 5 - Once you have created your Infrastructures, upload your data files to their respective Infrastructure. This may take some time, as the data processing is executed in the background and also depends on the number of users and uploads occurring at the moment. We ask that you please be patient and check back in a bit; refreshing the page will let you know when the processing is complete.

Step 6 - After the data is uploaded to the system, a scheduled job performs the analytics and calculations in periodic batches. It can take 45 minutes to 1 hour before the results are reflected in the public dashboard; this is primarily governed by the single worker we have on the backend due to resource constraints. To view the results on the public dashboard, visit http://dash.vopendata.org

We hope you visit the vOpenData site regularly as the community uploads more and more data, and see how the statistics trend over time. We would also like to thank the following people who were part of our early alpha program and assisted with both testing and code contributions: Frederic Martin, Raphaël SCHITZ, Timo Sugliani and of course my Automation colleague Alan Renouf! If you would like to learn more about the vOpenData project, we have also submitted a session for VMworld 2013, 4976 - vOpenData - Crunching Everyone's Data For Fun And Knowledge. Be sure to vote for it!

You can follow @vopendata on Twitter for new updates and notifications, as well as Ben Thomas at @wazoo and William Lam at @lamw.

How can I help or contribute?

First and foremost, you can get involved by signing up for a free account and contributing your data to the open community database! We are also open to any suggestions and feedback, as they would be very valuable to us. Feel free to join the vOpenData VMTN Community Group to discuss further. We know that in this first release we are not going to be able to show everything, but we plan to show much more. Lastly, all the infrastructure used to provide the dashboard, the backend database and the processing is hosted and paid for out of our own pockets. If you have found this to be a useful resource and would like to contribute with either a donation or a sponsorship to help us continue developing this project, please contact us at vopendata[at]gmail[dot]com

Categories // Uncategorized Tags // vopendata, vSphere

Comments

  1. Marco Broeken says

    04/12/2013 at 7:09 pm

    I can imagine that the guys over at CloudPhysics also have a lot of valuable data.

    Perhaps they are willing to share to opendata?

    • William Lam says

      04/12/2013 at 7:35 pm

      Completely agree. Would love to collaborate with them and see how we can further benefit the virtualization community as a whole!

  2. Ammesiah says

    04/12/2013 at 7:20 pm

    That's a great idea and an amazing job!

    Long live vOpenData !

    • William Lam says

      04/12/2013 at 7:36 pm

      Thanks Frederic! We couldn't have done it without you and Raphael! Hopefully this will be a useful tool for everyone.

  3. Michael Ryom says

    04/13/2013 at 8:46 pm

    Would love to see network added to the stack

    • William Lam says

      04/14/2013 at 3:52 pm

      Michael,

      Definitely. Networking is on our roadmap. Is there anything in particular that is a MUST see that would be helpful/useful?

  4. Iwan 'e1' Rahabok says

    04/14/2013 at 2:45 am

    What does the color mean? Can't figure it out. If they don't mean anything, then my suggestion is to have 3 colors:
    1 for Total. e.g. total number of LUNs in the opendata database.
    1 for Average.
    1 for Maximum. This is for showing how high people push it. So we know the highest or record.

    Thanks! great job!

    • William Lam says

      04/14/2013 at 3:55 pm

      Iwan,

      Yes, the tiles are color coded to represent the specific entity types.

      Baby blue = Infrastructure (this is the logical view and everything in that color represents data related to that)

      The same goes for light green = cluster, red = clusters, yellow = hosts, etc.

      Hopefully you'll help contribute more data too!

  5. Mohammed Raffic says

    04/14/2013 at 2:17 pm

    Thanks for your valuable posts

    http://www.vmwarearena.com/

  6. Anonymous says

    04/15/2013 at 4:48 pm

    at the moment I got the message at start:

    PowerCLI S:\VMware> .\getvOpenData.ps1
    The '<' operator is reserved for future use. + FullyQualifiedErrorId : RedirectionNotSupported

    • William Lam says

      04/15/2013 at 10:42 pm

      Hi,

      Make sure your download was not corrupted. Someone ran into this when we first launched, and it was due to a bad download. On the GitHub site, there is a link to a "zip" file from which you can download the scripts.


Author

William Lam is a Senior Staff Solution Architect working in the VMware Cloud team within the Cloud Infrastructure Business Group (CIBG) at VMware. He focuses on Cloud Native technologies, Automation, Integration and Operation for the VMware Cloud based Software Defined Datacenters (SDDC)

Copyright WilliamLam.com © 2023