Archive for the ‘Virtualization’ Category
After upgrading to vCenter 5 running on Windows Server 2008 R2 you may begin seeing an ADWS event with ID 1209 logged in the Active Directory Web Services event log within the Windows Event Viewer. This event will be logged once per minute with the following text:
“Active Directory Web Services encountered an error while reading the settings for the specified Active Directory Lightweight Directory Services instance. Active Directory Web Services will retry this operation periodically. In the mean time, this instance will be ignored. Instance name: ADAM_VMwareVCMSDS”
This event by itself does not mean that your vCenter installation is malfunctioning. It is logged because the VMwareVCMSDS ADAM instance does not have a valid SSL port set in the registry. You can confirm this by browsing to the following registry location on your vCenter server:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\ADAM_VMwareVCMSDS\Parameters]
You should see that the Port SSL registry entry is either missing or empty. While you are in the registry, you can simply add it as a REG_DWORD value of 636 (decimal), or you can use the Microsoft-approved method below to set the port value.
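If you prefer to script the registry change, the sketch below builds a .reg file that adds the Port SSL value described above. It is a minimal illustration, assuming the key path and value name shown in this post; the helper names and the output file name adam_ssl_port.reg are hypothetical. Review the generated file and back up the registry before importing it (for example with reg import).

```python
# Minimal sketch: generate a .reg file that sets the ADAM "Port SSL" value.
# Key path and value name come from the post; file/helper names are hypothetical.

REG_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services"
    r"\ADAM_VMwareVCMSDS\Parameters"
)


def build_reg_file(port: int = 636, value_name: str = "Port SSL") -> str:
    """Return .reg file text that sets value_name to port as a REG_DWORD."""
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    # .reg files store REG_DWORD data as 8 hex digits: 636 -> 0000027c.
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{REG_KEY}]\r\n"
        f'"{value_name}"=dword:{port:08x}\r\n'
    )


def write_reg_file(path: str = "adam_ssl_port.reg", port: int = 636) -> None:
    """Write the file as UTF-16 LE with a BOM, the encoding regedit exports."""
    data = b"\xff\xfe" + build_reg_file(port).encode("utf-16-le")
    with open(path, "wb") as f:
        f.write(data)
```

Note that the value is written in hexadecimal: 636 decimal is 0x27C, which is why the DWORD reads 0000027c.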
Add ADAM_VMwareVCMSDS SSL Port Value
- Stop the vCenter ADAM instance from the Microsoft Services control panel. The service name is VMwareVCMSDS. Alternatively, run net stop vmwarevcmsds from an elevated command prompt.
- At a command prompt, type dsdbutil.
- At the dsdbutil prompt, type activate instance VMwareVCMSDS.
- Type SSL port 636.
- Type Quit.
- Type net start vmwarevcmsds.
Once the service starts, I highly suggest rebooting the vCenter server so that all the vCenter services, as well as the Microsoft Active Directory Web Services service, restart. Until you reboot, you may encounter a fairly nondescript error message when attempting to log on with the vSphere Client.
VMware has issued a KB for this particular issue:
Active Directory Web Services fails to read the settings for the specified Active Directory Lightweight Directory Services instance
Note: vCenter 5 introduced a new “Inventory” service that communicates over its own web services port and uses its own SSL certificate. vCenter 5 specific instructions will be noted below.
Step 1
To follow the process below, install the latest version of OpenSSL on Windows or Linux, or use the OpenSSL installation on a VMware vSphere Management Assistant (vMA) appliance.
Tip: If you install OpenSSL on Windows, you will need to set the environment variable OPENSSL_CONF to the full path of the default openssl.cfg file (typically c:\OpenSSL-Win32\bin\openssl.cfg). In the default scenario, type set OPENSSL_CONF=c:\openssl-win32\bin\openssl.cfg at the command prompt (this applies only to the current command prompt session).
You can confirm the environment variable is correct by simply typing set at the command prompt and looking for the OPENSSL_CONF line.
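The same check can be scripted. The sketch below is a hypothetical helper (not part of OpenSSL) that reports whether OPENSSL_CONF is set and points at an actual file rather than at a directory, which is an easy mistake to make.

```python
import os


def openssl_conf_problem(env=None):
    """Describe what is wrong with OPENSSL_CONF, or return None if it looks usable."""
    env = os.environ if env is None else env
    value = env.get("OPENSSL_CONF")
    if not value:
        return "OPENSSL_CONF is not set"
    if os.path.isdir(value):
        # The variable must name the openssl.cfg file itself, not its folder.
        return f"OPENSSL_CONF points to a directory, not a file: {value}"
    if not os.path.isfile(value):
        return f"OPENSSL_CONF points to a missing file: {value}"
    return None
```

Calling openssl_conf_problem() with no arguments checks the current process environment, so run it from the same command prompt session in which you used set.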
Step 2
The very first item you will need to create is the replacement certificate Private Key.
At a command-prompt type openssl genrsa 2048 > rui.key
This will create a file within your current working directory called “rui.key”—this is your private key.
Step 3
Next, use the private key to create a Certificate Signing Request (CSR). Your certificate administrator (or a public/commercial Certificate Authority) uses the CSR to issue the certificate containing your public key.
At the command prompt, type openssl req -new -key rui.key > rui.csr
You will be prompted for the following information:
- Country Name (2 letter code): US
- State or Province Name (full name): California
- Locality Name (eg City): San Francisco
- Organization Name (eg Company): DuckWorks
- Organizational Unit Name (eg Section): Information Technology
- Common Name (this is your fully qualified server name): vCenter.duckworks.com
- Email Address: <don’t enter one>
- A challenge password: <don’t enter one>
- An optional company name: <don’t enter one>
This creates a file called rui.csr in your current working directory; this is your Certificate Signing Request (CSR).
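To avoid answering the prompts interactively, openssl req also accepts the subject on the command line through its -subj option, which skips the subject prompts. The sketch below assembles that string from the example answers above and returns the command as an argument list. The helper names are hypothetical, and the email address is omitted, as in the walkthrough.

```python
def build_subject(fields: dict) -> str:
    """Format CSR answers as an OpenSSL -subj string such as /C=US/CN=host."""
    parts = []
    for key in ("C", "ST", "L", "O", "OU", "CN"):
        value = fields.get(key)
        if not value:
            continue  # unanswered fields are simply left out of the subject
        # '/' separates fields in -subj, so escape backslashes and slashes.
        escaped = value.replace("\\", "\\\\").replace("/", "\\/")
        parts.append(f"/{key}={escaped}")
    return "".join(parts)


def build_req_command(subject: str, key: str = "rui.key", out: str = "rui.csr") -> list:
    """Return the openssl req argument list (equivalent to redirecting to rui.csr)."""
    return ["openssl", "req", "-new", "-key", key, "-subj", subject, "-out", out]


# Example answers from the walkthrough above.
ANSWERS = {
    "C": "US",
    "ST": "California",
    "L": "San Francisco",
    "O": "DuckWorks",
    "OU": "Information Technology",
    "CN": "vCenter.duckworks.com",
}
```

On a machine with openssl on the PATH, subprocess.run(build_req_command(build_subject(ANSWERS)), check=True) would run it; passing a list avoids shell quoting problems with the spaces in "San Francisco".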
Step 4
Using Notepad or any text editor, open the rui.csr file you created in Step 3. Copy the text starting with (and including) -----BEGIN CERTIFICATE REQUEST----- and ending with (and including) -----END CERTIFICATE REQUEST-----.
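If you would rather not select the block by hand, it can be extracted programmatically. This is a minimal sketch (the function names are hypothetical); it also accepts the NEW CERTIFICATE REQUEST header variant that some tools write.

```python
import re

# Matches the PEM block including its BEGIN/END lines; some tools write
# "NEW CERTIFICATE REQUEST" instead of "CERTIFICATE REQUEST".
CSR_BLOCK = re.compile(
    r"-----BEGIN (?:NEW )?CERTIFICATE REQUEST-----"
    r".*?"
    r"-----END (?:NEW )?CERTIFICATE REQUEST-----",
    re.DOTALL,
)


def extract_csr(text: str) -> str:
    """Return the first CSR block found in text."""
    match = CSR_BLOCK.search(text)
    if match is None:
        raise ValueError("no certificate request block found")
    return match.group(0)


def read_csr(path: str = "rui.csr") -> str:
    """Read a CSR file and return only the PEM block to paste into the CA form."""
    with open(path, encoding="ascii") as f:
        return extract_csr(f.read())
```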

Step 5
You are now going to obtain the replacement certificate (your signed public key) from a Microsoft Certificate Authority.
- Browse to your Microsoft Certificate Authority website (usually https://<servername>/certsrv/). Note: Your Certificate Authority may not use https but may be accessible using http://<servername>/certsrv/.
- Select Request a Certificate.
- Select Advanced Certificate Request.
- Select Submit a certificate request by using a base-64-encoded CMC or PKCS #10 file, or submit a renewal request by using a base-64-encoded PKCS #7 file.
- Paste the contents copied in Step 4 into the text box (alternatively, browse to your CSR file).
- Select the Web Server Certificate Template.
- Select Submit.
- Select the Base 64 encoded option.
- Select Download certificate.
- Important: When saving the certificate, rename it to rui.crt (use the .crt file extension; do not leave .cer as the extension). For a Base64-encoded certificate, .cer and .crt are interchangeable, so renaming does not change the file's contents.
You can open the rui.crt file within Windows and it should look similar to the following:

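Because .cer and .crt are only interchangeable for Base64 (PEM) certificates, it can be worth confirming that the downloaded file really is Base64 and not binary DER before renaming it. The rough heuristic below assumes that a PEM file begins with a BEGIN CERTIFICATE line and that a DER-encoded X.509 certificate begins with the ASN.1 SEQUENCE tag byte 0x30; the function name is hypothetical.

```python
def certificate_encoding(data: bytes) -> str:
    """Classify certificate bytes as 'pem' (Base64), 'der' (binary) or 'unknown'."""
    if data.lstrip().startswith(b"-----BEGIN CERTIFICATE-----"):
        return "pem"
    # A DER-encoded X.509 certificate starts with an ASN.1 SEQUENCE tag (0x30).
    if data[:1] == b"\x30":
        return "der"
    return "unknown"


def check_file(path: str = "rui.crt") -> str:
    """Read a certificate file in binary mode and classify its encoding."""
    with open(path, "rb") as f:
        return certificate_encoding(f.read())
```

If check_file reports "der", go back and download the certificate with the Base 64 encoded option selected.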
Step 6
Create a PFX (PKCS #12) file containing the certificate and its private key.
- Copy the two files rui.key and rui.crt into a folder (it’s easier to keep the files together).
- At a command prompt, type openssl pkcs12 -export -in rui.crt -inkey rui.key -name rui -passout pass:testpassword -out rui.pfx (note: using “testpassword” is significant because it is the keystore password in the Tomcat server.xml file; you can use a different password to secure the PFX file, but you will then need to update server.xml to match).
- Copy all three files (rui.crt, rui.key, rui.pfx) to C:\ProgramData\VMware\VMware VirtualCenter\SSL\ on Windows Server 2008 (archive the existing SSL certificates first).
- From the command-prompt, type net stop vpxd (This will stop your core vCenter service so make sure you have a maintenance window).
- From the command-prompt, change your directory (CD) to the installation path of vCenter.
- From the command prompt, type vpxd -p (you will be prompted for the database password used by your ODBC connection; the password is re-encrypted using the new certificate).
- From the command-prompt, type net start vpxd
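Because the -passout password must match the keystore password Tomcat uses, a quick consistency check can save a failed restart. The sketch below assumes that vCenter's server.xml follows the standard Tomcat layout, with keystorePass as an attribute on Connector elements; the file's location is not covered in this post, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def keystore_passwords(server_xml: str) -> list:
    """Return the keystorePass value of every <Connector> that defines one."""
    root = ET.fromstring(server_xml)
    return [
        connector.get("keystorePass")
        for connector in root.iter("Connector")
        if connector.get("keystorePass") is not None
    ]


def pfx_password_matches(server_xml: str, pfx_password: str) -> bool:
    """True if some Connector sets keystorePass and every one equals pfx_password."""
    passwords = keystore_passwords(server_xml)
    return bool(passwords) and all(p == pfx_password for p in passwords)
```

Read server.xml into a string and call pfx_password_matches(text, "testpassword"), substituting whatever password you passed to -passout.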
Step 7
This step is for vCenter 5 only. Use the following steps to replace the certificate used by the vCenter 5 Inventory service. The process is simple because the Inventory service can reuse the certificate files generated above.
- Copy rui.key, rui.crt, and rui.pfx to your vCenter Inventory Service installation path (ex. C:\Program Files\VMware\Infrastructure\Inventory Service\SSL\)
- Restart the vCenter Inventory Service within the Windows Service Control panel (services.msc).
After following the above steps your vCenter server will now be using the new certificate for all web services. Additionally, you will no longer be presented with a certificate warning popup when using the vSphere Client if the certificate authority that issued the replacement certificate is trusted by your computer (in this specific case any domain joined computers will automatically trust all certificates issued by your internal enterprise Microsoft Certificate Authority).
I stumbled upon an issue with the latest release of vSphere ESXi 4.1 Update 1 (fully patched): changes made by following VMware KB 1032666 to switch ESXi's default password hashing from MD5 to something stronger, such as SHA-256 or SHA-512, do not persist. Some federal government agencies cannot use MD5 for password hashing because it is considered cryptographically broken (see Wikipedia: MD5). I tried two different “approved” ways to edit the system-auth PAM file. First, run chmod 644 system-auth so that the file is user-editable (or simply force the save with :wq! after editing). Second, run chmod +t system-auth before editing. Unfortunately, after a reboot the system-auth file reverts to its pre-edited version.
I opened an SR with VMware; they in turn opened a PR and shortly thereafter confirmed that the issue is not by design and is in fact a bug. VMware estimates that it will be resolved in the Update 2 release of vSphere ESXi 4.1. I haven’t had a chance to see whether the issue is present in vSphere ESXi 5.0.
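To see which scheme a host actually used when a password was last set, you can look at the password field in /etc/shadow. The sketch below assumes the host stores hashes in the standard crypt(3) format, where the prefix $1$ means MD5, $5$ means SHA-256 and $6$ means SHA-512. Existing hashes only change when a password is reset, and the function name is hypothetical.

```python
# crypt(3) prefixes commonly found in the password field of /etc/shadow
SCHEMES = {"$1$": "MD5", "$5$": "SHA-256", "$6$": "SHA-512"}


def hash_scheme(shadow_line: str) -> str:
    """Return the hashing scheme used in one /etc/shadow entry."""
    fields = shadow_line.split(":")
    if len(fields) < 2:
        raise ValueError("not a shadow entry")
    password = fields[1]
    for prefix, name in SCHEMES.items():
        if password.startswith(prefix):
            return name
    return "other"  # locked account, disabled login, or an unlisted scheme
```

If a password changed after the reboot still produces an MD5 entry, the system-auth edit did not persist.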
This document is the public draft release of the vSphere 4.1 Security Hardening Guide. This guide is an incremental update to the vSphere 4.0 Security Hardening Guide based on new and changed features of vSphere. Please provide your feedback in the comments section. This draft will remain posted for comments until approximately the end of February 2011.
Link to document: vSphere 4.1 Security Hardening Guide (draft)
Mattias Sundling over at Quest has put together a brief whitepaper titled “Maximizing VM Performance”. It is not intended to be a deep-dive but it covers the main VM performance considerations in plain terms.
Check it out here: Maximizing VM Performance
I find it a bit interesting, but not surprising, that VMware will soon release a native iPad application offering functionality similar to the vSphere Client for Windows. I wonder why VMware is devoting programming resources to the iPad when it has yet to bring a supported Linux version of vCenter to market. For many years, virtualization administrators in Linux/Unix shops have told VMware that vCenter, its database server, and the vSphere Client are the only Windows boxes they support, and that they would very much appreciate a fully functional vCenter and client for Linux. Back in 2008, VMware said it would release versions of vCenter running on Linux alongside the familiar Windows versions. Still waiting… (a limited CTP version does exist here).
I also wonder if VMware has given up on the vCenter Mobile Access (vCMA) product: it is still a community technology preview, there have been no major enhancements, and it has been a long time since it was introduced. The vCMA had the “cool” factor when it was released; I remember showing people how I could vMotion a VM from one ESX host to another from my BlackBerry. That cool factor faded to the point where I haven’t used the vCMA in over a year; it’s just too kludgy to get anything done. Is the iPad vSphere application the new vCMA, the vendor-specific application that will introduce the iPad into corporate virtualization environments? Will it take over the functionality of the vCMA?
Don’t get me wrong, I see tremendous possibilities for the iPad within the corporate environment. The VMware vSphere iPad application could be very useful to large organizations that have lots of ESX hosts. Imagine an administrator being able to evacuate and place an ESX host in maintenance mode while troubleshooting a hardware issue within the datacenter (or from Hawaii on business). I can even see the VMware vSphere iPad application allowing virtualization administrators to manage a significant portion of their daily workload away from the office.
In conclusion, it’s great that VMware is working on new innovative ways to enhance access to vCenter from various devices; however, if I had my way I would rather VMware spend more time doing the following (in this order):
- Enhance the vCenter product for Windows. When I say “enhance” I mean work on the fit-and-finish of the product. All too often I am presented with ambiguous error messages or stumble on a failed process, yet the event reporting within vCenter can’t seem to tell me what’s wrong.
- Work on the overall performance of the vCenter UI. There are reports all over the Internet of horrific performance in the vCenter user interface. I see it every day. Viewing inventory takes 10 seconds to load once the vCenter interface is visible (not counting the time to log on and load the plugins). Granted, performance is linked to hardware specifics, and one must build an appropriate server environment to support vCenter; however, I am talking about poor performance on vCenter servers running multiple new quad-core Xeon processors with 8GB+ of physical memory and a large dedicated physical database server back-end.
- Enable performance monitoring across all hosts from a single UI window. Since a DRS cluster is essentially a pool of CPU and memory resources–why are we still required to troubleshoot performance by analyzing single ESX servers (think esxtop)?
- Enhance command-line troubleshooting tools. For example, an esxtop command that has a global view of clusters and storage. Yes it’s great to see the read/write MBps to a specific VMFS LUN but I want to see the total across all hosts not just the localized view of a single ESX host.
- Stop developing new features that are only added to the growing list of VMware products including vCloud Director, vCloud Request Manager, Orchestrator, CapacityIQ, Site Recovery Manager, Lab Manager, and Configuration Manager. Put some of the features in vCenter for continued value-add. For example, why haven’t we seen simple Virtual Machine replication in vCenter?
- Finish and release a fully functional Linux vCenter server with associated Linux vSphere client.
- Create better-quality upgrade and patch bundles. Why do customers cross their fingers hoping everything will work as expected after upgrading vCenter or an ESX host? How many times have I seen an upgrade break vCenter (for example, certificates, web services, health monitoring)? Answer: many times.
- (last) Develop a mobile vSphere client.
One of the great new features of VMware vCenter 4.1 is Distributed Resource Scheduler (DRS) Groups. DRS Groups allow virtual machines to be separated and placed onto specific ESX/ESXi hosts within a DRS cluster. Using DRS Groups, limiting the hosts available to a virtual machine or group of virtual machines is simple. Why might you want to use DRS Groups? I can think of many scenarios where DRS Groups would be useful; however, I will discuss one specific example: vCenter Server placement.
It is probably safe to assume most VMware administrators have implemented vCenter Server as a virtual machine within a DRS/HA cluster. A virtual vCenter Server running within a DRS/HA cluster provides many high-availability and manageability benefits; however, it also poses a specific challenge that could not be solved until vCenter Server 4.1. If vCenter Server becomes unavailable, an administrator needs to connect the vSphere Client directly to the ESX/ESXi host where the vCenter virtual machine is running in order to manage it (i.e., open a console connection, restart the vCenter server, power it on, etc.). But because vCenter Server runs inside a DRS cluster, locating that specific host can be very time-consuming if the cluster contains many ESX/ESXi hosts. For example, if there are 12 ESX/ESXi hosts in a DRS cluster, the vCenter server could be running on any one of the 12. Can you imagine using the vSphere Client to connect to up to 12 hosts before locating vCenter Server? Can you afford to waste 20 minutes during an emergency trying to find it?
Using DRS Groups, vCenter Server can be restricted to a small number of ESX/ESXi hosts within a DRS cluster. For example, an administrator can designate three of the 12 hosts on which vCenter Server may run. If vCenter Server becomes unavailable, it is much easier to locate when you know it is primarily running on one of three hosts instead of one of 12.
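The benefit can be put into numbers with a little arithmetic. Assuming vCenter is equally likely to be on any candidate host and you connect to hosts one at a time without repeats, the expected number of connections is (n + 1) / 2, and the worst case is n. The function name is hypothetical.

```python
from fractions import Fraction


def expected_checks(candidate_hosts: int) -> Fraction:
    """Expected number of hosts to connect to before finding vCenter.

    Assumes vCenter is equally likely to be on any candidate host and that
    hosts are checked one at a time without repeats.
    """
    if candidate_hosts < 1:
        raise ValueError("need at least one candidate host")
    # Average position in a linear search: (1 + 2 + ... + n) / n = (n + 1) / 2
    return Fraction(candidate_hosts + 1, 2)
```

For the 12-host example this gives 6.5 connections on average (12 in the worst case); restricting vCenter to three hosts drops it to 2 on average (3 in the worst case).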
The following VMware KB article provides a starting point for you to further investigate DRS Groups: http://kb.vmware.com/kb/1022842.
I tend to recommend ESXi over ESX for several reasons. However, this week I was reminded of the shortfalls ESXi has yet to address. First, in response to a complaint about slow storage performance, I gathered metrics using various tools available in vSphere (i.e., performance graphs, esxtop, etc.). I was quickly reminded that vscsiStats, an indispensable storage troubleshooting tool, is not available in ESXi. Scott Drummonds over at Pivot Point (blog) has provided vscsiStats binaries out-of-band that can be installed on an ESXi server. The problem is that applying these binaries to ESXi is not supported by VMware, nor will VMware release security-related patches for them. There is no “supported” workaround for running vscsiStats in ESXi.
The second issue involved troubleshooting a vMotion-related problem with a virtual machine (or what appeared at the time to be a vMotion issue). The virtual machine would not vMotion regardless of what was tried. Even after confirming that no virtual devices were causing the problem, the only solution was to power off the virtual machine and then perform the migration. After the virtual machine was powered back on, I attempted to review its vmware.log file. Unfortunately, the only way to read a vmware.log file is directly from the console of the ESXi host running the virtual machine. Because SSH is not supported in ESXi (yes, it can be enabled), I was not able to read the vmware.log file remotely. There is no “supported” workaround to remotely view the vmware.log file when using ESXi.
These two issues alone can be deal-breakers for some.
UPDATE 1: VMware made huge steps towards closing the supportability and functionality gap with the ESXi 4.1 release. The two issues identified above have been mitigated as ESXi 4.1 allows supported command-line access locally and remotely via SSH. Additionally, I am happy to report the vscsiStats tool is now available and officially supported in ESXi 4.1 at /usr/lib/vmware/bin/vscsiStats. Great job VMware!!!
Are Virtual Machines free because I can run multiple isolated instances of an Operating System on a single physical server? I pay for a physical server only once; if I have already paid for it, why can’t I consider each Virtual Machine a no-cost server installation?
The uncontrolled creation of supposedly free virtual servers is called virtual machine sprawl; by many accounts it is a new epidemic in the datacenter, driven by the rapid adoption of Virtualization. If Virtual Machines are free, how do I pay for additional capacity once I use up all existing capacity? This article describes the costs associated with Virtual Machines and two strategies for calculating cost per Virtual Machine. This information can be used to recover the hardware and software costs associated with virtualization, and in reports and analyses of projected project costs.
In the mainframe days, Technology departments developed a method to recover technology costs by charging customers for the services provided. This method of cost recovery is called “chargeback”. Chargeback models are not perfect; in fact, chargeback is often a point of contention within organizations, depending on which side of the aisle you are on. IT managers, for example, are typically fond of the chargeback model because it prevents customers from demanding excessive services without first considering the cost impact. Business managers, however, see chargeback from the other side. The cost passed back to the customer often includes a profit margin, which business managers view negatively because they feel they are being asked to pay for “more” than the actual usage cost. Executive management typically falls in the middle. Executives like chargeback because independent accounting and performance metrics can be tracked back to individual business units and used as a scorecard. However, they struggle with chargeback because it is “one organization”: the money used for IT capital expenditures is often allocated using an Enterprise strategy (i.e., “Business Unit A” needs a new database server and the money will come from a general IT expenditures account).
Assuming you still want to pursue a chargeback model you will need to figure out how to accurately account for resource usage in the virtual environment that covers actual costs, is perceived by business customers as fair, and also allows enough (profit) to purchase additional capacity and replacement hardware and software in future years. There are two models of Chargeback that work fairly well to recover costs associated with providing Virtual Infrastructure services. I will call these “Simple” and “Complex”.
In the Simple model, the cost per Virtual Machine is calculated as the Virtual Infrastructure is built out. Total the costs associated with physical server hardware purchases, virtualization software licensing, Operating System licensing, storage, and any other costs of standing up a new virtual machine (i.e., maintenance and personnel costs), then divide by the number of Virtual Machines the Virtual Infrastructure is expected to support over its lifetime; the result is an estimated per Virtual Machine cost. This model is simple and easy to calculate but is largely unfair to customers because it doesn’t account for differences in actual resource usage. For example, Business Unit A uses a single-CPU Virtual Machine with 1GB of RAM and Business Unit B uses a four-CPU Virtual Machine with 8GB of RAM, yet both Business Units pay the same.
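The Simple model is a single division, sketched below. The cost categories mirror the paragraph above, but the dollar amounts and the expected VM count are hypothetical figures chosen only to illustrate the calculation.

```python
def simple_cost_per_vm(costs: dict, expected_vms: int) -> float:
    """Simple model: total build-out cost divided by the expected number of VMs."""
    if expected_vms <= 0:
        raise ValueError("expected_vms must be positive")
    return sum(costs.values()) / expected_vms


# Hypothetical lifetime costs for a Virtual Infrastructure build-out.
COSTS = {
    "server_hardware": 120_000,
    "virtualization_licensing": 40_000,
    "os_licensing": 30_000,
    "storage": 60_000,
    "maintenance_and_personnel": 50_000,
}
```

With these figures, simple_cost_per_vm(COSTS, 150) returns 2000.0: every VM is charged the same $2,000 regardless of its size, which is exactly the unfairness described above.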
In the Complex model, the cost per Virtual Machine is calculated using the Virtual Machine’s actual resource usage. This model is especially complex in a virtual environment because of the many different ways a Virtual Machine can be configured and deployed. Further, because Virtual Machines often move between physical servers as capacity requirements change, a simple ledger-style chargeback method cannot be used. To implement a fair and balanced chargeback using the Complex model, chargeback tools purpose-built for virtualization (such as VMware Chargeback and VKernel Chargeback) become a requirement. The Complex model first calculates the base cost per Virtual Machine using the Simple model described above. Actual CPU, memory and storage usage is then collected, priced, and added to that base cost. The result is a fair and accurate cost for each individual Virtual Machine.
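A minimal sketch of the Complex model's arithmetic: a Simple-model base charge plus metered CPU, memory and storage usage. The usage units (GHz-hours, GB-hours, GB-months) and the rates are hypothetical assumptions for illustration; real chargeback tools define their own metrics and rate cards.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    cpu_ghz_hours: float      # hypothetical metered CPU usage
    memory_gb_hours: float    # hypothetical metered memory usage
    storage_gb_months: float  # hypothetical allocated storage


@dataclass
class Rates:
    per_cpu_ghz_hour: float
    per_memory_gb_hour: float
    per_storage_gb_month: float


def complex_charge(base_cost: float, usage: Usage, rates: Rates) -> float:
    """Complex model: Simple-model base cost plus priced CPU, memory and storage."""
    metered = (
        usage.cpu_ghz_hours * rates.per_cpu_ghz_hour
        + usage.memory_gb_hours * rates.per_memory_gb_hour
        + usage.storage_gb_months * rates.per_storage_gb_month
    )
    return base_cost + metered
```

With the same base cost, a small VM like Business Unit A's and a large one like Business Unit B's now receive different charges, because the metered portion scales with what each VM actually consumed.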
Virtual Machines are not “free”. Each Virtual Machine has hardware costs, software costs, infrastructure costs, personnel costs and other hidden costs (HVAC, Electricity, Datacenter footprint) that are all factors to consider when creating a Virtual Machine.
This is Part 1 in a series of posts related to the definition and configuration of VMware Distributed Power Management.
The consolidation of physical servers into Virtual Machines provides significant hardware cost savings and also reduces power consumption throughout the datacenter. VMware Distributed Power Management (DPM) is yet another way to further increase return on investment by continuously analyzing the Virtual Infrastructure and consolidating Virtual Machines onto fewer hosts during times of low utilization. Hosts that are identified by DPM as underutilized are powered off. As capacity requirements increase, DPM powers on additional hosts to handle the load using either Wake-on-LAN (WOL), iLO, or through IPMI calls to a hardware Baseboard Management Controller (BMC).
How do you get started with DPM? First, you need the appropriate licensing that enables Distributed Resource Scheduler (DRS). This means you will need to license VMware vSphere Enterprise or VMware vSphere Enterprise Plus versions of ESX(i). Second, you will need to configure a VMware DRS cluster within vCenter made up of two or more ESX(i) 4.x hosts. Lastly, you will need to enable and configure the DPM feature on the DRS cluster.
The most difficult part of setting up DPM is selecting and configuring the “wake-up” solution. There are three options: WOL, iLO and IPMI. WOL has been around for many years and works fairly well, although it is broadcast-based. The physical NICs in the ESX(i) host must support WOL, and WOL usually needs to be enabled in the BIOS. WOL will not be able to wake up ESX(i) hosts if the vCenter server is not on the same subnet as the vMotion NICs unless you enable IP directed broadcasts. WOL works by sending a UDP “magic packet” to all hosts on a subnet with a payload that identifies which host should wake up. All the hosts on the subnet analyze the packet, and only the host the command was meant for powers on.
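The magic packet format itself is simple: six 0xFF bytes followed by the target MAC address repeated 16 times, 102 bytes in total. The sketch below builds and broadcasts one; it illustrates the general WOL mechanism described above and is not how vCenter or DPM implements it. The helper names, the default broadcast address and UDP port 9 are illustrative assumptions.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"invalid MAC address: {mac}")
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the packet over UDP; a directed broadcast address also works if routed."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

Every NIC on the segment receives the broadcast, but only a powered-off NIC that sees its own MAC repeated in the payload triggers a power-on, which matches the behavior described above.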
iLO and IPMI can be described together because they are very similar. iLO (Integrated Lights-out) and BMC IPMI (Intelligent Platform Management Interface) are out-of-band devices that allow an administrator or process to interact directly with a server at a hardware level. Because communication with these devices uses unicast they can be located on any subnet within your routed network. iLO devices typically contain more features than most basic IPMI Baseboard Management Controllers (ex. Console Redirection, Virtual Disks, etc.). Most iLO and IPMI solutions are configured in a similar way. At boot, you press a defined key combination to enter the configuration menu. You will need to configure the IP address, subnet, and gateway. Additionally, you may need to specify a VLAN if you are using trunking on your iLO or IPMI network interface. You will also need to define a username and password combination for access to the iLO or IPMI device. vCenter will require this information during the DPM configuration. Also, while you are configuring the device you will want to take note of the MAC address.
An often overlooked but important detail is that most BMC IPMI devices share a server’s internal integrated NICs. Typically, these integrated NICs are also used by your ESX(i) host for Service Console, vMotion or Virtual Machine traffic. This is a perfectly acceptable configuration except in the following circumstance. On Dell hardware, for instance, the BMC (the device that provides the IPMI interface) can only run at 100Mb when the server is powered off. It is common for the integrated NICs to operate at 1000Mb, and it is also very common for network administrators to force the speed and duplex of server switch ports to 1000Mb. While ESX(i) is up, the NICs run at their designated 1000Mb speed. When the server is powered down, the integrated NICs transition to a low-power 100Mb mode. If the switch port is forced to 1000Mb, the BMC NICs disconnect (a speed/duplex mismatch occurs) and you lose the ability to remotely administer the BMC. The solution is to set AUTO/AUTO on the switch ports used by the BMC NICs. The server will then negotiate 100Mb when powered off and 1000Mb when ESX(i) is up.
In Part 2 of this DPM series we will configure the cluster for DRS and DPM.