Management interfaces

20 09 2014

With all the additional functionality of virtualisation come added management requirements. From an infrastructure perspective alone, this has resulted in the introduction of a number of infrastructure management tools, referred to as the VMMM (virtual machine monitors’ management) in some academic work (van Cleeff, et al., 2009), across multiple vendors, including vSphere, XenCenter, System Center Configuration Manager, Eucalyptus, UVMM and many others. These tools allow administrators to mimic the physical interactions with machines that are no longer possible in virtual environments. These actions include creating new machines, adjusting resource allocation and general maintenance of the VMs.

The addition of management interfaces is not exclusive to the virtual machine portion; storage, backup and even the infrastructure that runs the virtual hardware (blade chassis) also introduce new management interfaces into the environment. These have the ability to remotely configure arrays, take entire copies of live machines and even turn off the physical hardware that the hypervisors run on.

While all are designed to improve manageability for administrators, these management interfaces introduce additional code and complexity into environments, creating new attack vectors. When exploited, these vectors can allow an attacker to remotely perform actions that would previously have been impossible in traditional systems. The types of attack that could be carried out, should an attacker gain access to any one of these management interfaces, have the potential to affect numerous machines and services simultaneously, even forcing administrators to invoke disaster recovery strategies.





Mitigation techniques for shared hardware

4 06 2014

The mitigation techniques for reducing the likelihood of shared hardware attacks are similar to the hypervisor techniques in terms of determining which machines have access to which host. To ensure isolation of hosts, the same measures described in an earlier post regarding hypervisors can be used, such as DRS groups and, in larger cloud environments, hardware-conscious options such as the ‘Dedicated VDC’ in VMware’s vCloud Datacenter. While host isolation is achievable using these methods, they only address physical component allocation at the host level, such as RAM, CPU, mezzanine cards/NICs etc. What they do not address, however, is the issue of shared storage and blade infrastructure.

Where ‘in-house’ storage is concerned, separate groups of arrays could be used to reduce the impact an attack could have, by limiting which systems it could affect. In the same way that DRS groups were created in the hypervisor demonstrations, machines could be grouped by security rating so that less secure machines are not placed on the same array as higher-value machines. This mitigates the risk of a vulnerability in a less secure VM threatening the performance of a group of higher-value targets. A consideration with this measure, however, is that it also creates one high-value target area that, if attacked, would affect all the core services. The security gained by segmenting disk arrays unfortunately has an adverse effect on resource efficiency, as the smaller the array, the higher the disk overhead (Shangle, 2012), (International Computer Concepts, 2012).

There are options within some VMMMs (certainly within vCenter) to evenly distribute and limit the number of IOPS a single machine is able to request. This can be set on a per-machine basis using the resource allocation section. Within the vCenter suite, ‘share values’ can also be assigned to individual machines to automatically limit disk allocation should disk latency reach a certain threshold. The latter option is not included in the core functionality of the vSphere suite and therefore comes at an additional licence cost. In the figure below, the limit has been set to 1000 IOPS for the ‘Public-Web’ virtual machine. While this option does stop one machine from overwhelming the entire storage, it can also unnecessarily restrict genuine requests from VMs experiencing a higher-than-normal workload.

Using the free IOPS limiting ability in vCenter

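The behaviour of such a hard cap can be sketched in a few lines. This is an illustration of the concept only, not the vCenter implementation; the 1000 IOPS figure mirrors the example above, while the request sizes are invented:

```python
# Hypothetical sketch of a hard per-VM IOPS cap, in the spirit of the
# 1000 IOPS limit set on the 'Public-Web' VM above. Figures illustrative.

def serviced_iops(requested: int, limit: int = 1000) -> int:
    """Return the IOPS actually serviced once the cap is applied."""
    return min(requested, limit)

# A malicious burst is clamped to the limit...
assert serviced_iops(5000) == 1000
# ...but so is a genuine spike from a well-behaved VM,
# which is the drawback noted above.
assert serviced_iops(1800) == 1000
# Normal workloads below the cap are unaffected.
assert serviced_iops(500) == 500
```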

 

VMware’s separately licensed ‘Storage I/O Control’ allows a share value to be associated with machines rather than a fixed resource threshold. A latency figure can be set on a LUN, and should that threshold be reached, vCenter will ensure that the machine with the highest share value gets the disk allocation it requires. To protect internally hosted environments, core servers would be given the highest share values while less important, more vulnerable machines would be allocated lower figures, ensuring that key functions of the business continue should this type of attack take place.
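The behaviour described can be sketched as a simple proportional-share model. This is an illustration of the concept, not VMware’s implementation; the share values, latency threshold and IOPS figures are invented:

```python
# Illustrative proportional-share I/O arbitration: below the latency
# threshold no throttling occurs; above it, each VM's slice of the
# array's IOPS is proportional to its share value.

def allocate_iops(shares: dict, total_iops: float, latency_ms: float,
                  threshold_ms: float = 30.0) -> dict:
    if latency_ms < threshold_ms:
        # No contention: every VM may request up to the array's capability.
        return {vm: total_iops for vm in shares}
    total_shares = sum(shares.values())
    return {vm: total_iops * s / total_shares for vm, s in shares.items()}

# A core server is given four times the shares of the exposed machines.
vms = {"core-db": 2000, "public-web": 500, "test-box": 500}
alloc = allocate_iops(vms, total_iops=6000, latency_ms=45.0)
# core-db holds two thirds of the shares, so it keeps two thirds of the IOPS.
```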

In circumstances where these options are not available in the VMMM, such as is the case with the standard Hyper-V manager, which “does not have any built in mechanism to dynamically or even statically control storage I/O”  (Berg, 2011), alternative solutions will be required.

Avoiding sharing is undoubtedly the simplest option when securing high-risk, mission-critical systems. The challenge becomes more complicated in public cloud environments. Avoiding storage contention due to noisy or malicious neighbours on a public cloud service is an issue that should be seriously considered before any cloud adoption takes place. Many large companies now use public cloud infrastructure to host their services. Amazon has seen impressive adoption rates, with online services including Netflix (Business Wire, 2010), Reddit (Berg, 2010), MySpace (High Scalability, 2010) and many others opting to migrate their entire business onto Amazon’s EC2/S3 infrastructure.

One example of how resource contention can be avoided within a public cloud comes from Netflix (Cockcroft, 2011), who extensively researched the inner workings of Amazon’s EBS (Elastic Block Store) so that they could best utilise the service and not be affected by neighbours’ disk requirements. (Cockcroft, 2011) notes that Amazon’s EBS volumes are allocated in sizes between 1 GB and 1 TB, and Netflix deemed that allocating volumes in 1 TB blocks to its servers, regardless of their actual storage requirements, avoids the likelihood of co-tenancy and, in turn, storage contention. Amazon makes sizing decisions like this more feasible because a large amount of information is available about the EC2 service, especially when compared to other providers. Access to this level of information should be a key consideration when planning any cloud migration.
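The arithmetic behind this tactic can be sketched as follows; the backing disk size used here is a purely hypothetical assumption for illustration, since the real backing hardware is not published:

```python
# Illustrative arithmetic behind Netflix's approach: if volumes range from
# 1 GB to 1 TB, requesting full 1 TB volumes means far fewer (ideally no)
# other tenants' volumes can be carved from the same backing disks.

def max_cotenants(backing_disk_gb: int, volume_gb: int) -> int:
    """Upper bound on the number of tenant volumes a backing disk can hold."""
    return backing_disk_gb // volume_gb

# On a hypothetical 2 TB backing disk:
assert max_cotenants(2048, 100) == 20   # many small volumes -> many neighbours
assert max_cotenants(2048, 1024) == 2   # full-size volumes -> at most one neighbour
```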

The threat of an exploit at the blade hardware layer is extremely difficult to mitigate, and cannot be addressed without taking unrealistic precautions that undermine the reasoning and benefits of a blade system altogether. While there may be scope within larger blade systems to use separate physical interconnect modules, ensuring that secure and insecure machines use different routes in and out of the enclosure, the backplane of the chassis remains completely shared among all hosts. As mentioned previously, hardware attacks at this layer of the system would most likely be DoS attacks rather than disclosure, and have yet to be demonstrated.

Shangle, R., 2012. Level 0,1,2,3,4,5,0/1. [Online]  Available at: http://it.toolbox.com/wiki/index.php/Level_0,1,2,3,4,5,0/1

Business Wire, 2010. Netflix Selects Amazon Web Services To Power Mission-Critical Technology Infrastructure. [Online]  Available at: http://www.thestreet.com/story/10749647/netflix-selects-amazon-web-services-to-power-mission-critical-technology-infrastructure.html

Cockcroft, A., 2011. Understanding and using Amazon EBS – Elastic Block Store. [Online] Available at: http://perfcap.blogspot.co.uk/2011/03/understanding-and-using-amazon-ebs.html

Berg, M. v. d., 2011. Storage I/O control for Hyper-V. [Online]  Available at: http://up2v.nl/2011/06/20/storage-io-control-for-hyper-v/





Attacking shared hardware used for virtualisation

5 05 2014

There are a number of conjectural and proven attacks that involve the exploitation of shared hardware. One of the more relevant attacks threatening virtual environments is the ability to degrade the performance of other machines by placing an unpredictable strain on the shared hardware. This could be achieved either by taking control of a virtual machine in the environment through an existing software exploit or, in the case of a cloud provider, simply by purchasing one. Amazon has multiple security measures in place to deal with inside attacks on its Amazon Web Services (AWS) platform, but with a Microsoft Windows instance costing as little as $0.115 per hour, the cost of entry for attackers is very low. While one moderately powered machine would not be able to affect the performance of numerous neighbouring clients on Amazon’s infrastructure, this low entry figure demonstrates how little it would cost to rent multiple instances for a clustered attack.
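A quick back-of-envelope calculation using the quoted $0.115/hour figure shows how cheap such a clustered attack could be; the instance count and duration below are arbitrary examples:

```python
# Back-of-envelope cost of renting a cluster of instances for an attack,
# using the $0.115/hour Windows instance price quoted above.

def rental_cost(instances: int, hours: float, rate_per_hour: float = 0.115) -> float:
    return instances * hours * rate_per_hour

# e.g. 20 instances hammering shared infrastructure for 24 hours
# costs only around $55:
cost = round(rental_cost(20, 24), 2)
assert cost == 55.2
```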

Although not primarily considered a security issue, resource contention is a major problem within virtual systems, especially in multi-tenant environments. The term “noisy neighbours” describes virtual instances that share the same host or storage as another machine and affect its performance. Problems caused by noisy neighbours or resource-intensive virtual machines are typically due either to a misconfiguration or simply to being unfortunate enough to be placed on the same hardware as other high-performance machines. Considered from a security perspective, however, if an attacker is able to place a number of machines on the same hardware as a competitor’s machine, they have the ability to degrade its performance. There has been prior research into determining the internal mappings of machines within large cloud infrastructures. In the paper “Hey, You, Get Off of My Cloud!” (Ristenpart, et al., 2009), the authors use the Amazon EC2 service as their environment to test the ability to map the internal location of machines, and discuss how this information can be used to construct machines that co-reside with specific targets. While the specific methods involved in determining the internal location of a machine in large cloud environments are outside the scope of this article, the paper describes how an accurate mapping can be achieved using “timestamp fingerprinting” and “cache-based detection”.

Studies often measure the impact that noisy neighbours have on co-residing tenants by analysing RAM, CPU or network usage. While these are relevant elements, one major drawback of measuring only these aspects is that disk activity, such as IOPS (input/output operations per second) onto shared storage, is not taken into consideration. This can be one of the more difficult elements to measure, as storage arrays can differ greatly in both size and performance, even within the same provider. Misbehaving disk activity can also be much more erratic, especially when compared to RAM usage, which tends to increase gradually rather than producing the spikes seen in IOPS.

One attack that would be possible using shared storage would be to use the mapping techniques discovered by (Ristenpart, et al., 2009) to place a group of machines on the same storage array or LUN as a target, before generating high I/O. If the activity generated was high enough, contention for disk access would be experienced by all machines using that storage and, as a result, machines would become noticeably slower due to the disk latency created. Amazon EC2 does not limit the amount of I/O a machine can use, as it is a chargeable resource billed to the owning customer’s account based on usage. These charges would obviously not be a problem for an attacker using a stolen credit card, for example. While there are a number of articles (Cockcroft, 2011) about the consequences of sharing storage with other busy or malfunctioning VMs, the author has not found any documentation that recognises deliberately crafted high IOPS as an attack. A demonstration of how this attack could be carried out is shown later in this section.

Attacks that use shared hardware as a vector can affect not only availability but all three aspects of the Confidentiality, Integrity, Availability (CIA) triad (Perrin, 2008). The confidentiality of machines on virtual systems should also be a consideration before adoption takes place. While some of the attacks that exploit the confidentiality and integrity portions of the CIA triad using shared hardware fall on the academic side of the spectrum rather than being active exploits, these concepts should at least be taken into consideration, especially by high-risk targets.

One example of how a shared CPU can be manipulated is the demonstration by (Phil, 2012) of how two machines running on the same host can communicate with each other without using any networking protocols. This type of attack is typically known as a side-channel attack and has been a known issue for a number of years (Page, 2003), (Osvik, et al., 2005). In this ‘virtualisation specific attack’ there are a number of prerequisites for success, including both virtual machines having the same number of processors and running on the VMware platform with unlimited CPU resources. Once all of the appropriate elements were in place, (Phil, 2012) was able to send data bits from one VM to another over the CPU by oversubscribing the hardware. While this attack may be extremely niche and inefficient, with transfer rates as slow as 0.5 bits/sec (depending on the noise of other machines on the host), it does show in principle how attacking virtual machines at this layer is possible.

An area that the author would be interested in investigating further (having been unable to find any existing research in the area) is the security implications of the shared hardware involved in blade environments. The most effective way to ensure the integrity of an environment is to adopt the ultra-cautious approach of disconnecting machines from the internet and any other connecting networks. This is known as an ‘air gap’, and is typically used to secure high-value targets such as SCADA (supervisory control and data acquisition) systems. Blade systems such as the PowerEdge M1000e offer “compelling operational benefits, such as improved cabling, rapid hardware provisioning, high compute density, energy-efficient design and increasing management automation”, and can provide enough resources to power an entire large organisation or business. Using VLANs, multiple networks can be hosted within one enclosure, including demilitarized zones (DMZ) and virtual desktop infrastructures (VDI). While research has been done into the sharing of components such as RAM and CPU, elements of the blade environment such as the chassis backplane and the connection fabric into the system pose an equal if not greater risk. If malicious software were able to infect the software that manages these physical elements of the system, it could potentially monitor and affect the integrity of information to and from any virtual machine or host.

As discussed earlier, when placed on the same storage array as a number of machines, an attacker may be able to affect the performance of other machines by requesting large amounts of disk I/O on the shared array. To demonstrate the plausibility of this attack, the author conducted a simulation with two attack machines and one target machine placed on the same storage array. A full description of the simulation will be posted separately. To demonstrate the disruption caused by this attack, the experiment uses the built-in monitoring tool ‘esxtop’. The figure indicated under the GAVG/cmd column is the one that best demonstrates the impact of the attack on the storage array. It identifies the “response time as it is perceived by the guest operating system” by adding the “average response time in milliseconds per command being sent to the device” (DAVG/cmd) to “the amount of time the command spends in the VMkernel” (KAVG/cmd).
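The relationship between these esxtop counters is simple addition, which can be expressed as a small helper; the sample values are illustrative, not taken from the experiment:

```python
# GAVG (guest-observed latency) = DAVG (device latency) + KAVG (time in
# the VMkernel), all in milliseconds, as described above.

def gavg(davg_ms: float, kavg_ms: float) -> float:
    return davg_ms + kavg_ms

# A device taking 45 ms per command plus 2.5 ms spent in the VMkernel
# appears to the guest OS as 47.5 ms of latency:
assert gavg(45.0, 2.5) == 47.5
```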

The simulation used three machines to demonstrate this process: two representing the controlled machines of the attacker and one the victim. Both attacking machines run a freely available Microsoft SQL I/O stress-testing/benchmarking utility named ‘SQLIO’. To simulate high I/O, the author initiated the utility using the parameters shown below (snippet).

sqlio -kW -s10 -frandom -o8 -b8 -LS -Fparam.txt

sqlio -kR -s360 -frandom -o8 -b8 -LS -Fparam.txt
…

The ‘-frandom’ parameter in the SQLIO utility generates random reads and writes rather than sequential ones, as random disk activity is known to be more intensive on storage devices (Kelkar, 2011). This caused the number of read operations on one of the attacking machines to rise to a consistent rate of 5323.41 commands per second, raising the GAVG from zero to 82.31 ms on the attacking machine and from zero to 47.41 ms on the victim machine. While these contention results fluctuated during the tests, the GAVG was consistently above 30 ms on both one of the attacking machines and the victim machine, as shown in Figure 1 and on the graph in Figure 2.


Figure 1 – statistics for each machine during the I/O tests

The average figures shown by the monitoring software also demonstrate the high latency experienced by each machine. To show the impact of the attack on machine response time (GAVG), Figure 2 plots the average GAVG reported by each VM before the script was run and for the following 5 minutes. Before the script was run the average GAVG was a constant 0 ms; once the script was initiated the figure increased, peaking at around 82 ms. The average response time for the victim machine over the 5-minute period was 46.77 ms, which is 36.77 ms above that recommended by VMware.


Figure 2 – Graph showing the average millisecond GAVG response time reported for each guest OS during the testing

This graph demonstrates that it is possible for an attacker whose machines are located on the same shared storage array as a target to adversely affect the performance of other machines through oversubscription of the hardware.

 

Sources:

Ristenpart, T., Tromer, E., Shacham, H. & Savage, S., 2009. Hey, You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds. [Online]  Available at: http://cseweb.ucsd.edu/~hovav/dist/cloudsec.pdf

Cockcroft, A., 2011. Understanding and using Amazon EBS – Elastic Block Store. [Online]  Available at: http://perfcap.blogspot.co.uk/2011/03/understanding-and-using-amazon-ebs.html

Perrin, C., 2008. The CIA Triad. [Online]
Available at: http://www.techrepublic.com/blog/security/the-cia-triad/488

Osvik, D. A., Shamir, A. & Tromer, E., 2005. Cache Attacks and Countermeasures: the Case of AES. Rehovot: Department of Computer Science and Applied Mathematics.

 





Introduction to shared hardware

10 04 2014

The ability to distribute resources across multiple physical machines has played a pivotal role in the growth of virtualisation today. The term ‘elastic computing’ is sometimes used to describe the nature of virtualised environments, as resources can be added and retracted depending on need. In traditional environments, high-performance CPUs were often required to accommodate fluctuations in processing requirements, yet average CPU utilisation is typically below 10%. The table below shows a VMware comparison of the hardware requirements and average CPU utilisation of three different industries against those of a virtual implementation, demonstrating how virtualisation can lower the number of physical servers while also improving elements of service and setup time.

Table demonstrating the benefits of virtualisation by comparing the typical computing requirement of three different industries (VMware, 2006)


The benefits of virtualisation extend beyond processing, however; storage is another area that benefits tremendously. Storage area networks (SANs) allow storage to be centrally managed in one location rather than as multiple RAID sets in individual chassis. Much as with CPUs, resources can be added to machines dynamically rather than having to be factored in during initial installation, reducing the total cost of ownership (TCO).

The ability to share resources also brings significant improvements in availability. As entire infrastructures can effectively run on one piece of hardware, such as a blade system, adding multiple layers of redundancy is a one-time process that benefits every machine residing in that system throughout its lifetime. Equipping traditional tower or rack servers with the level of redundancy seen in a blade system or datacentre would be highly impractical and costly.

However, with this new capacity to share resources across multiple machines comes a host of attack vectors that require planning and mitigation before adoption. Setting aside the obvious benefits of shared resources and observing from a security perspective, the idea of multiple machines sharing the same sticks of RAM, CPUs and hard disks seems likely to introduce some negative security implications.

As covered in an earlier article on hypervisors, machines residing on the same host can be vulnerable to attacks through misconfigurations and software exploits within the hypervisor, and this is no different for the hardware. Guests residing on the same host as other guests can potentially interfere with each other using the very hardware they both run on. All of these attacks and opportunities to interfere with host neighbours are especially pertinent when considering the implications in a public cloud.

In the same way that attackers can craft software exploits against the hypervisor, it is also possible to target behaviours in the hardware of virtual systems to exchange information and disrupt service. When describing hardware attacks, the author will not discuss the implications and methods of obtaining physical access to the hardware, as in the author’s opinion, once physical access to any hardware system is gained, its security is easily compromised. All of the attacks in the next article are achievable through remote access to the system.





Mitigation techniques for hypervisors

3 01 2014

There are both vendor-specific and vendor-agnostic network security mitigation techniques that can be used to prevent scanning of, and direct connection to, the hypervisor. These include IDS/IPS (intrusion detection/prevention systems) on the network portion, to detect and block known scanning patterns performed by common scanning tools, and vendor hardening options such as VMware’s Lockdown Mode (VMware, 2012), which turns off the ability to connect directly to the host from anything other than vCenter’s ‘vpxuser’ account. However, there are many known evasion techniques for bypassing IDS/IPS systems, and while direct communication may be rejected by Lockdown Mode, vulnerabilities in the hypervisor code may still leave other communication channels open to abuse; this applies to all hypervisors.

In the author’s opinion, hypervisors should be situated on their own network/subnet, with strict ACLs in place, making them directly inaccessible to anyone outside a limited scope of technical staff. Few systems in traditional networks require this kind of isolation from users, as most require at least some level of user interaction for services. While this measure should be sufficient to eliminate attacks performed directly on the hypervisors, additional layers of defence such as IDS/IPS and hypervisor-specific hardening mechanisms such as VMware’s Lockdown Mode should be used in parallel, to further ensure the integrity of the host.

Unfortunately these network layers of protection do not address the issue of VM escape attacks, as mentioned in the last post. To address this, technicians should apply the same security mentality to the hypervisor layer that has previously been applied to the network layer. Depending on the size and nature of a virtualisation platform, layers of defence should be applied to this new portion of infrastructure. One of the layers should be able to mitigate a worst-case scenario, where an attacker targets less secure, more accessible VMs on the infrastructure in order to attack higher-value VMs running on the same hypervisor. To do this (again depending on the size and risk model of the environment), clusters could be configured to divide virtual machines into categories based on their security rating and group them accordingly. While beneficial from a security perspective, this reduces the total ROI that virtualisation offers: the larger the resource pools (in terms of the number of shared hardware elements), the more efficient the use of hardware and the lower the overheads. This cost should be weighed against the likelihood and impact of this kind of attack when factoring in the increased expense of grouping machines by security rating.

One method of eliminating the threat of machines running on the same hypervisor, even in an automated resource allocation environment such as VMware’s DRS (Distributed Resource Scheduler) feature, is to utilise the groups and rules available in the VMMM. A specific example is the latest version of VMware’s vCenter (version 5), which provides options to collate groups/clusters of machines and set preferences for which hypervisor they are situated on. Although this feature is intended for grouping and separating machines from a resource and availability perspective, the author considers that it could also be used to secure machines against many hypervisor attacks, as well as other attacks, which will be discussed in a later post.

In the example below, the author demonstrates a method for ensuring that machines with differing security classes are not located on the same physical host in automated distributed resource environments, using the resource rules in the VMware vCenter suite. This is done by assigning virtual machines to groups based on the security rating the organisation considers them to have. This method of using machine groups for security purposes in DRS environments is one that the author has not seen documented or discussed elsewhere.
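The intent of these rules can be modelled in a few lines of code. The sketch below is illustrative only: the VM names are invented, and it models the placement constraints themselves rather than any vCenter API.

```python
# Toy model of DRS-style placement rules: secure VMs *must* run on ESX1,
# insecure VMs *must not*. VM names are hypothetical examples.

SECURE = {"dc01", "sql01"}
INSECURE = {"public-web", "test-box"}

def placement_ok(placement: dict) -> bool:
    """placement maps VM name -> host; True if both rules hold."""
    return all(
        (host == "ESX1") if vm in SECURE else
        (host != "ESX1") if vm in INSECURE else True
        for vm, host in placement.items()
    )

# Secure machines isolated on ESX1, insecure machines elsewhere: allowed.
assert placement_ok({"dc01": "ESX1", "sql01": "ESX1", "public-web": "ESX2"})
# An insecure VM co-residing with a secure one on ESX1: rejected.
assert not placement_ok({"dc01": "ESX1", "public-web": "ESX1"})
```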

These rules could be scaled depending on the size or risk index determined by the organization. There is also scope within these rules to balance the resource/security overhead by specifying that a machine should not run on a certain host rather than must not. It should also be noted that VMware’s vCloud Datacenter offers a “Dedicated VDC” option, which provides “physically separate hardware – ideal for meeting security or regulatory requirements, where physically sharing isn’t an option” (Lodge, 2010).

Using vCenter groups for segmentation

In this example, the author has set up a simple scenario demonstrating how the rules in VMware’s vCenter suite can be used to prevent a group of machines considered insecure from running on the same hypervisor/host as another group considered secure.

Using DRS groups, two groups are created - ‘Secure-servers’ and ‘Insecure’. Machines are associated with the appropriate group based on their service etc


A rule is created specifying that all servers in the 'Secure-Server' group must run on ESX1


Another rule is created specifying that all machines in the 'Insecure' group must not run on ESX1



Lodge, M., 2010. Getting rid of noisy neighbors: Enterprise class cloud performance and predictability. [Online]
Available at: http://blogs.vmware.com/rethinkit/2010/09/getting-rid-of-noisy-cloud-neighbors.html





Attacking the hypervisor

27 11 2013

The hypervisor has the disadvantage of being attackable in one of two ways: from the network layer, or from a guest running on that hypervisor. The default behaviour of a hypervisor on a network is to respond to connections over standard TCP/IP, much the same as other desktop machines, devices and infrastructure. This results in the hypervisor being locatable on the network and consequently susceptible to traditional network enumeration tools such as Nmap (nmap.org, 2012) and Nessus (Tenable, 2012). While enumeration tools are primarily used as a discovery mechanism, they are often able to extract further information about a system by analysing characteristics and information returned by the host. An example of this technique using the most widely used enumeration tool, Nmap, is the ‘-O’ switch, which compares the host’s packet responses against a large database of software fingerprints. Once this extra information has been identified, additional approaches can be used to cross-examine the host further and identify attributes such as patch levels and service packs. Depending on the software found, the attacker is then able to determine the appropriate CVEs (Common Vulnerabilities and Exposures) the host may be vulnerable to. Once vulnerabilities have been identified, the attacker can exploit the system and insert a payload to further control the host and maintain access. Current examples of software that can be used to exploit systems and insert malicious payloads are Metasploit (Metasploit, 2012) and CORE Impact (Core Security, 2012). This method of enumeration and exploitation is identical to that used against traditional clients and will already be familiar to the security staff responsible for scanning them.
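The final step, matching fingerprinted software against known vulnerabilities, can be illustrated with a toy sketch; the product name and CVE identifiers below are invented placeholders, not real advisory data:

```python
# Hypothetical sketch of the enumeration-to-CVE step: once a scanner has
# fingerprinted a product and version, the attacker looks the pair up
# against known CVEs. All entries here are invented placeholders.

KNOWN_VULNS = {
    ("ExampleHypervisor", "4.0"): ["CVE-0000-0001"],
    ("ExampleHypervisor", "4.1"): [],  # a patched build
}

def applicable_cves(product: str, version: str) -> list:
    """Return the known CVEs for a fingerprinted product/version pair."""
    return KNOWN_VULNS.get((product, version), [])

assert applicable_cves("ExampleHypervisor", "4.0") == ["CVE-0000-0001"]
assert applicable_cves("ExampleHypervisor", "4.1") == []
```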

It is the second method, attacking the hypervisor from a guest virtual machine, that is far more dangerous and an unfamiliar concept, especially for companies invested in cloud computing or hosting servers in large datacentres.

The term virtual machine (VM) escape refers to breaking out of an isolated VM in order to execute malicious code on the host. A number of vulnerabilities in both Type 1 and Type 2 hypervisors demonstrate this concept (CVE-2009-1244, CVE-2011-1751, CVE-2012-0217 (Xen, 2012), CVE-2012-3288). While escape from a Type 2 hypervisor remains a threat, the implications of breaking out of a guest running on an enterprise Type 1 hypervisor such as ESXi or Xen would be much greater, due to the environments in which they are typically deployed. In traditional networks, security is often achieved through segmentation into physical or virtual networks. Segmentation is still applicable within virtual environments; however, it only offers protection at the network layer, not against this new layer of 'guest-to-host' exploitation. While this might sound like an unlikely threat, features of the VMMM such as VMware's DRS (Distributed Resource Scheduler) mean that the placement of machines across hypervisors is often determined by the management server rather than by a human, unless administrators create specific placement rules. This dynamic movement of virtual machines can result in an unpatched, publicly addressable server being hosted on the same hypervisor/hardware as domain controllers and other high-value target machines. This threat is a genuine cause for concern in mid to large networks hosting tens to hundreds of machines within the same infrastructure, separated only by VLANs. The implications and likelihood of such an attack increase greatly in multi-tenant public cloud infrastructures.
The topic of how attackers could rent hosted machines on public clouds in order to attack other machines will be covered at a later date, but in the author's opinion this could become a genuine threat that needs to be considered during a company's risk analysis process.

There are a number of methods of assessing the security of virtual environments; one of the tools recently developed to assist in their evaluation is the VASTO project (Virtualization Assessment Toolkit) (Criscione, et al., 2012). VASTO is essentially a collection of Metasploit modules written to query and attack virtual environments, primarily the VMware platform. The modules are added to the Metasploit project to leverage an already established and robust framework.

As highlighted earlier, hypervisors are often located on the same subnet as the rest of the servers and, in some cases, the clients. This means that if an attacker gains access to a network that can communicate with the hypervisor, whether through placement or incorrectly configured ACLs (Access Control Lists), the hypervisor can be attacked directly. The following example shows three simple methods that can be used to locate and query a hypervisor in order to retrieve important information such as its version, its build number and the vulnerabilities it is susceptible to.

For this demonstration, the author is using a laptop wired into the 192.168.20.0 network in a test environment. As shown in Figure 1, the author uses an Nmap command with the '-sV' switch to scan the entire subnet and return a list of live hosts and associated services. The scan correctly identifies both of the ESXi servers located on the network.

Figure 1 - Section of results from an Nmap scan "nmap -sV -T 192.168.20.0/24"


As shown in Figure 1, Nmap returns results showing that ESXi is installed on two IP addresses on the 192.168.20.0 subnet, with several open ports. While Nmap does identify the product and version correctly on this occasion, it is not always completely reliable in returning the exact version of the software running on the host. There are a number of methods for doing this, including VASTO, Nessus and OpenVAS. Using the "vmware_version" module found in VASTO (shown in Figure 2), we are able to detect the exact version of the host, including the build number.
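Tools of this kind commonly identify a VMware host without authenticating by reading the banner returned on connect, for instance from the VMware authentication daemon that conventionally listens on TCP 902. The parsing step can be sketched in Python as below; the sample banner format and the regular expression are assumptions based on the common '220 VMware Authentication Daemon Version x.y' form, not an exhaustive grammar:

```python
import re
from typing import Optional

# Illustrative: extract the advertised version from a VMware authd-style
# banner. The pattern assumes the common "220 ... Version x.y" layout.
BANNER_RE = re.compile(r"VMware Authentication Daemon Version (\d+\.\d+)")

def parse_authd_banner(banner: str) -> Optional[str]:
    """Return the advertised daemon version, or None if not recognised."""
    match = BANNER_RE.search(banner)
    return match.group(1) if match else None
```

A real fingerprinting module would combine several such probes (banners, TLS certificates, HTTP responses) to narrow the result down to an exact build number.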

Figure 2 - Section of results from VASTO vmware_version scan


This now gives the attacker the information needed to locate existing exploits against this version, or even to develop new exploits, depending on the value of the target. Shown in Figure 3 is a screenshot of a Nessus report generated after a scan against the IP address of the ESXi host. Nessus is an automated scanning and vulnerability assessment tool that fingerprints the host against numerous plugins in order to detect exploits the host is vulnerable to. While the full report highlighted a number of vulnerabilities on the host, Figure 3 shows that this particular host is vulnerable to one plugin tested, covering three CVEs (CVE-2012-2448, CVE-2012-2449, CVE-2012-2450).

The area being addressed in this demonstration is not the number of exploits and the risks associated with them, but rather the ability to identify the hypervisor and attack it directly. The quantity and complexity of the attacks involved in exploiting Type 1 hypervisors is currently much greater than for Type 2 implementations. But as with any technology, as popularity grows and new features are added, the likelihood increases that easily acquired, automated attacks will exist.

Figure 3 - Section of Nessus report highlighting highly rated vulnerabilities


VMware greatly increased the security of their hypervisor by replacing their ESX product with the new lightweight (smaller code footprint) ESXi, which removed the service console from the architecture (VMware, 2012). However, there are still elements within the hypervisor that continue to threaten its security. One is that ESXi is, by default, configured to be accessible through a browser. Clients with port 80 and 443 access to the host can reach the hypervisor directly through a browser and even use the host to download the vSphere management client. While this may be convenient, it is the author's opinion that this 'out of the box' configuration lacks the fundamental security posture that should be taken with such a high-value target. An attacker with the vSphere client can manage the hypervisor directly once a username and password have been provided. It should also be noted that all ESXi servers are, by default, administered via the 'root' account, meaning that the only unknown credential required to manage the host is the root password, which could be brute-forced. Furthermore, the VASTO suite includes a customised brute-forcing tool called "vmware_login", which automates dictionary and brute-force login attempts. In addition to these vectors, there are also pertinent existing network security issues, such as MITM (man-in-the-middle) attacks, which could expose these credentials.
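A dictionary attack of the kind performed by a tool such as "vmware_login" amounts to iterating a wordlist against a single known username ('root'). A minimal sketch of that loop in Python is shown below; the authentication call is stubbed out as a hypothetical try_login parameter rather than any real VMware or VASTO API:

```python
from typing import Callable, List, Optional

def dictionary_attack(try_login: Callable[[str, str], bool],
                      username: str,
                      wordlist: List[str]) -> Optional[str]:
    """Return the first password accepted by try_login, or None.

    try_login is a hypothetical stand-in for whatever authentication
    call the target exposes (e.g. an HTTPS login to the hypervisor).
    """
    for password in wordlist:
        if try_login(username, password):
            return password
    return None
```

The simplicity of this loop is precisely why a fixed, well-known administrative account is dangerous: the attacker's search space collapses to passwords alone, and defences must come from lockout policies, strong credentials and network-level access control.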

To demonstrate the prevalence of exposed hypervisors, the author used the online search tool Shodan (SHODAN, 2012) to search the internet for exposed 'ESX' hosts. In Figure 4 we can see that Shodan returned 749 results fitting that description.

Figure 4 - Results of a Shodan search for host containing the term "esx"


While not all of the hosts returned by Shodan are active, a large number are still current and allow remote connections to be made over the internet. Shown in Figure 5 is a valid connection, through a web browser, to one of the returned addresses, showing an ESXi 5 host.

Figure 5 - Connection to the IP address of one of the hosts found by Shodan






Introduction to the hypervisor

22 11 2013

The hypervisor is arguably one of the more misunderstood concepts in virtualisation for technical professionals familiar with traditional methods of computing, as it can often be viewed as simply another operating system. While the hypervisor may be a form of operating system, the impact of a successful exploit against it cannot be likened to that against a traditional network operating system. The most obvious element that distinguishes a hypervisor is the far-reaching implications that vulnerabilities in it could have upon the entire system. There are numerous vendor hypervisor implementations, each with differing levels of associated vulnerabilities. In this post a definition of typical hypervisor implementations is given to establish a baseline of understanding before moving on to the attacks.

In its simplest form, a hypervisor is a piece of code that controls the flow of instructions between guest operating systems and the physical hardware. The hypervisor emulates the physical characteristics of the actual machine, such as the processor, RAM and network cards, and presents a homogeneous environment to all of the guests. There are two types of hypervisor: 'Native' (also known as 'Bare Metal' or Type 1) and 'Hosted' (also known as Type 2).

Native hypervisors are installed directly onto the hardware, as a traditional operating system would be; there are also hypervisor implementations that come preinstalled in the host's ROM. Native hypervisors benefit from direct access to the underlying physical hardware of the host, resulting in improved performance. As they are not full operating systems, they also have the benefit of a smaller attack surface and are therefore considered more secure.

Hosted hypervisors are installed onto an existing operating system, e.g. Windows or Linux, which is responsible for communication between the hardware and the hypervisor. This type of hypervisor is less efficient in terms of performance and security, and is typically used on desktops rather than servers.

While most common Type 1 hypervisors are a fraction of the size of a typical desktop operating system such as Windows 7, the code is still an additional layer of software added to the total attack surface of the machine. This underlying dependence of all hosted machines on the hypervisor is one of the most contested factors in virtualisation security, i.e. the ability to compromise the hypervisor and use it to 'escape' to other machines hosted on that software.

When considering size and attack surface area, the security of a hypervisor is comparable to that of a standard operating system. Systems such as OpenBSD and TrustedBSD allow a greater level of customisation and the ability to greatly reduce the features available for a particular task. This reduced default functionality often correlates directly with improved security. Hypervisor security and efficiency have typically been judged by the number of source lines of code (SLOC) used. The core functionality of a hypervisor is to translate and schedule the flow of instructions from the guest to the hardware; anything beyond this could be described as a non-essential feature.

One example of a security-focused hypervisor is IBM's 'sHype' implementation for the Xen hypervisor. The total code of this project is claimed to be around 2,600 lines in length. By contrast, the ESXi and KVM hypervisors were reported to be around 200,000 SLOC in 2010 (Steinberg & Kauer, 2010, p. 3), with indications that this could rise, further increasing the attack surface.