Attacking additional virtualisation infrastructure

10 02 2015

I should start by stating that almost all documentation regarding live migration, High Availability (HA) and Fault Tolerance configurations states that these networks should be placed on isolated networks that are not shared with, or accessible by, unauthorised individuals. In an article, (Siebert, 2011) specifically advises to:

“Keep management and storage traffic, for instance, on a physically isolated network that’s away from the regular VM network traffic”

However, this does not diminish the argument: this information is still being moved around inside the environment, and should these guidelines be overlooked or the networks breached, the consequences can be hard to appreciate for staff familiar only with traditional methods.

Live migration traffic

Live migration can be used to move machines between hosts located in the same blade chassis, across campuses and even across continents, with minimal downtime (Travostino, et al., 2006). Performing a live migration from one host to another is a common feature found in many of the larger VMMMs such as Xen and VMware. In the VMware environment, moving a machine across hosts can be a manual or an automated process when combined with other features such as DRS and DPM (Distributed Power Management). While the DRS feature is discussed in numerous earlier posts, DPM enables vCenter to automate the movement of machines off hosts during quiet periods of usage so that those hosts can be powered down, thus reducing power consumption. It is my personal opinion that live migration is one of the more intriguing introductions of virtualisation, because the information held in RAM is so valuable.

Using the test environment I will demonstrate what is possible when access to the vMotion network is obtained. For this example a vMotion has been initiated specifying that VM1 on ESX1 is to be migrated – only the host portion is required to move, not the datastore. The VM will be moved to the only other host in the cluster, ESX2. To understand how this process is vulnerable to attack I will first break down the steps involved in this operation (Kutz, 2007):

  1. The request is made specifying the VM and which host it will be moved to.
  2. All of VM1's RAM is copied over the vMotion network to ESX2; any changes made during this time on ESX1 are written to a memory bitmap on ESX1.
  3. VM1 is quiesced on ESX1 and the memory contained in the bitmap is copied over the vMotion network to ESX2.
  4. VM1 is started on ESX2 and requests to access the VM are directed to the new instance on ESX2.
  5. The remaining memory is copied from ESX1 while memory is being read from and written to VM1 on ESX1.
  6. Once successful, VM1 is unregistered on ESX1 and the task is complete.

Figure 1 shows a basic visual representation of the process in the test environment. For demonstration purposes and ease of displaying results, I have a separate vMotion network that uses a hub to connect the two machines. This allows me to sniff traffic without having to perform an additional attack such as a MiTM or configure port mirroring (SPAN, RSPAN, etc.).

Figure 1 – Basic visual representation of the process of a machine being migrated to another host in the test environment

In the virtual machine that will be moved, I have written information into a text file but not saved it to disk, so that we can be sure it is located only in the machine's memory. In this example the information represents a secure document containing a username and password. The text file and its content are shown in Figure 2.

Figure 2 – An unsaved text file that has been written on the running virtual machine

I now initiate a vMotion request and start a packet capture on the vMotion network using Wireshark. The network card is configured in promiscuous mode to allow it to capture all of the packets being transmitted on the network. After the vMotion process has finished and the capture has been stopped, I am able to follow the TCP stream of the communication that took place between the two hosts and view the information in a searchable form. As shown in Figure 3, the information transmitted in the document is viewable in clear text, although separated by some punctuation.

Figure 3 – A TCP stream of communication between the two hosts showing the notepad content
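To give a sense of how little effort this requires once the capture is obtained, the following minimal Python sketch searches the raw TCP payloads of a saved capture for a known plaintext fragment. It uses the scapy library; the capture file name ('vmotion.pcap') and the search string are illustrative assumptions, and in practice a string may be split across memory pages and therefore across packets, so shorter fragments may need to be searched for.

# Minimal sketch: search a saved vMotion capture for a known plaintext string.
# The file name and search string below are illustrative assumptions.
from scapy.all import rdpcap, Raw, TCP

CAPTURE_FILE = "vmotion.pcap"   # hypothetical capture taken on the vMotion network
NEEDLE = b"Password"            # fragment of the unsaved notepad content

packets = rdpcap(CAPTURE_FILE)
for index, pkt in enumerate(packets):
    # vMotion traffic is carried over TCP (port 8000 by default on ESXi),
    # so only TCP payloads are inspected here.
    if pkt.haslayer(TCP) and pkt.haslayer(Raw):
        payload = bytes(pkt[Raw].load)
        if NEEDLE in payload:
            offset = payload.find(NEEDLE)
            print(f"packet {index}: match at payload offset {offset}")
            print(payload[max(0, offset - 32):offset + 64])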

Although this example used a notepad document to demonstrate the ability to sniff data, there are much greater risks to consider when addressing passive snooping attacks on migration traffic. Microsoft stores LM authentication hashes for active sessions in memory in all current versions of Windows (Pilkington, 2012). As a result, should a passive snooping attack take place while a machine is locked or logged on, the LM hash will also be visible in the capture. Windows LM hashes can be reversed back into the user's original password using online services such as (OS – Objectif Securite, 2012) and other hash-cracking tools. Finally, and possibly of most concern, is the accessibility of encryption keys. Even when full disk encryption is used to secure data, the keys (after initial user input) are cached in memory by the operating system. On traditional hardware this is considered the safest place, as volatile memory is erased quickly once power is removed, and exploiting it usually requires physical access to the machine unless it is already infected. With live migration, however, it is now possible to sniff the encryption key directly from the migration traffic.

Live migration traffic is not only vulnerable to sniffing attacks. In a paper written in 2007, (Oberheide, et al., 2007) describe a number of attacks that are possible against live migration traffic. In the paper the researchers discuss three 'classes' of threat that can be used against live migration environments:

  • Control plane
  • Data plane
  • Migration module

Control plane attacks

Control plane attacks target the mechanisms employed to initiate and manage migrations within the infrastructure manager. While no active attacks against the control plane were demonstrated in the paper, the theory behind them is still relevant and should be considered in a risk analysis exercise. The examples that (Oberheide, et al., 2007) highlight demonstrate how successful attacks on the control plane could result in:

  • The migration of machines onto illegitimate hosts that are owned by the attacker.
  • Mass migration of a large number of machines, thus overloading the network and causing disruption to service.
  • Manipulating resource management for hosts in a DRS-style environment, so that workloads are not evenly distributed and individual resources become overwhelmed.

Data plane attacks

Data plane attacks are threats that take place on the networks over which the migrations travel. The passive snooping attack demonstrated previously is one of the attacks briefly mentioned in the paper. The second attack in the data plane class is described as 'active manipulation'. As the name suggests, active manipulation is when data is changed during the migration of the machine. Although (Oberheide, et al., 2007) introduce their custom tool Xensploit for performing this attack, it is essentially a collection of existing attacks collated into one tool for ease of use. The attack uses MiTM techniques to intercept traffic between the two hosts and manipulate sections of the traffic (the RAM) in transit. In their example, (Oberheide, et al., 2007, p. 4) showed how the attack could be used to establish an SSH (Secure Shell) session to a machine configured to only allow connections from authorised sources: using Xensploit they manipulated the object code of the sshd process in transit to add their key as an authorised source, allowing them to SSH to the new instance once the migration had completed.
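Xensploit itself is not reproduced here, but the core step behind active manipulation is easy to illustrate. The Python fragment below is purely illustrative (the byte patterns and function are hypothetical, and a real attack would additionally have to sit inline between the hosts, reassemble the stream and recalculate TCP checksums before forwarding); it simply shows the find-and-replace operation performed on the memory stream as it passes through the attacker.

# Illustrative only: the find-and-replace step at the heart of active
# manipulation of migration traffic. The byte patterns are hypothetical.
ORIGINAL = b"ssh-rsa AAAAB3...victim-authorised-key"
REPLACEMENT = b"ssh-rsa AAAAB3...attacker-key".ljust(len(ORIGINAL))

def patch_memory_stream(chunk: bytes) -> bytes:
    # A same-length replacement keeps TCP segment sizes unchanged, so the
    # MiTM layer only needs to recalculate checksums before forwarding.
    if ORIGINAL in chunk:
        return chunk.replace(ORIGINAL, REPLACEMENT)
    return chunk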

Migration module attacks

Lastly, although these attacks are included in the paper, I feel this class falls under one of my previous topics, the hypervisor, rather than under live migration. The paper briefly covers attacks that exploit vulnerabilities in the migration module – the part of the VMM (virtual machine monitor) that implements migration.

Combining attacks

Gaining access to any of these attack classes should be considered high risk by security staff, although with access to just one of the vectors it may still not be possible for an attacker to target a specific machine, depending on its physical location. If a combination of the attacks were available, the attacker would be able to leverage them to gain better control of the environment. Data plane attacks are a case in point: they are only useful if the target machine is migrated during the period of capture. In environments where features such as DRS and DPM are not in use, machines can stay fixed to a host for a considerable period of time. If the attacker were able to mount an attack at the control plane, they could trigger the migration of the necessary machines so that the attack could then take place at the data plane, as sketched below.
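To illustrate how simple the control plane step is once suitable credentials are obtained, the sketch below uses the pyVmomi library to request a host-only vMotion of a named VM through vCenter. All hostnames, credentials and the VM name are placeholders, and the exact call signature may vary between pyVmomi versions; this is a sketch of the legitimate API call an administrator (or an attacker holding a compromised account) would make, not a packaged exploit.

# Sketch: trigger a host-only vMotion via the vSphere API (pyVmomi).
# All names and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()   # lab only: skip certificate validation
si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=context)
try:
    content = si.RetrieveContent()
    vms = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in vms.view if v.name == "VM1")
    hosts = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    target = next(h for h in hosts.view if h.name == "esx2.example.local")

    # Host-only migration: the datastore stays where it is, as in the earlier example.
    task = vm.MigrateVM_Task(host=target,
                             priority=vim.VirtualMachine.MovePriority.defaultPriority)
    print("Migration requested, task state:", task.info.state)
finally:
    Disconnect(si)

Once the target machine is in flight, the data plane attack (sniffing or active manipulation) can be carried out as described above.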

Shortly after the paper was released, VMware (Wu, 2008) commented on this attack:

“Although impressive, this work by no means represents any new security risk in the datacentre… Rather, it a reminder of how an already-compromised network, if left unchecked, could be used to stage additional severe attacks in any environment, virtual or physical.”

While I can appreciate what (Wu, 2008) is saying, I must disagree that these types of attack are comparable to those on physical environments. Data in transit on physical systems does not tend to include such sensitive information, and it is better understood in traditional systems that unsafe protocols such as email and FTP should not be used to transfer confidential information. While the previous example required physical access to the vMotion network, it is also possible to access this data remotely through misconfiguration, access to the management interface or manipulation of the virtual infrastructure.

Wu, W., 2008. VMware Security & Compliance Blog. [Online] Available at: http://blogs.vmware.com/security/2008/02/keeping-your-vm.html

Siebert, E., 2011. Five VMware security breaches that should never happen. [Online] Available at: http://searchvmware.techtarget.com/tip/Five-VMware-security-breaches-that-should-never-happen

Travostino, F., Daspit, P. & Gommans, L., 2006. Seamless live migration of virtual machines over the MAN/WAN. Future Generation Computer Systems – IGrid 2005: The global lambda integrated facility, 22(8), pp. 901-907.

Kutz, A., 2007. How to obtain, configure and use VMotion and how VMotion works. [Online] Available at: http://searchvmware.techtarget.com/tip/How-to-obtain-configure-and-use-VMotion-and-how-VMotion-works

Pilkington, M., 2012. Protecting Privileged Domain Accounts: LM Hashes — The Good, the Bad, and the Ugly. [Online] Available at: http://computer-forensics.sans.org/blog/2012/02/29/protecting-privileged-domain-accounts-lm-hashes-the-good-the-bad-and-the-ugly

Oberheide, J., Cooke, E. & Jahanian, F., 2007. Empirical Exploitation of Live Virtual Machine Migration. [Online] Available at: http://www.eecs.umich.edu/techreports/cse/2007/CSE-TR-539-07.pdf





Additional virtualisation infrastructure

27 11 2014

Virtual environments introduce a host of new features designed to increase the availability and manageability of computer systems. These features include the ability to move live virtual machines across physical hosts with little to no disruption to service, and even to shift entire workloads automatically and power off unnecessary physical hosts to reduce power consumption.

These features rely heavily on networks to transmit information between physical hosts. An example of a feature that depends on one of these new networks is VMware's vMotion. vMotion allows virtual machines to be moved across hosts in an infrastructure, allowing hosts to be taken down for maintenance or patching. This is done by transmitting a snapshot of the VM's RAM across a network to the receiving host. Features like vMotion mean that systems benefit from extremely high uptime at relatively low cost compared to their traditional counterparts. Although vMotion is a VMware product, the concept of live migration is the same across numerous implementations, including Xen's implementation, named 'XenMotion'.

The information now transmitted over these networks as a result of these features poses serious security considerations that have no equivalent in traditional systems. The movement of RAM outside the chassis of a server is something that was simply not an issue before these features existed. It is now possible (in a poorly designed implementation) for entire portions of RAM to be sent unencrypted over user-accessible networks.

The new networks introduced into virtual infrastructures are not exclusively reserved for HA features, however. The centralisation of storage has also introduced fast, multipath storage networks that connect the hosts to the datastores. The information that traverses these networks is the same as would previously have been transmitted over an internal SCSI or SATA connection in a traditional server. Centralised storage also offers increased capability compared to its predecessor; one example is described by (Meth, et al., 2003):

“In a storage area network, it is possible to perform LAN-free and server-free backup operations that copy data from a storage device directly to another storage device without transferring the data across the general-purpose network and the servers.  In other words, data are sent across the dedicated storage area network directly between the source and destination storage devices.”

The storage area network is an element of the virtual infrastructure that is often left unsecured, as it is frequently configured by separate groups of specialists who are not as security conscious as the networking teams (Lewis, 2002). While SAN security is a very pertinent concern when discussing virtual environments, I will not be detailing attacks and mitigation techniques in this blog because of the level of familiarity required and the difference in technologies compared to regular networking. There are, however, numerous pre-existing guides to SAN security that should be consulted before introducing the technology into any environment (BROCADE, 2007), (Haron, 2002), (Majstor, 2004).

Microsoft, 2012. The OSI Model's Seven Layers Defined and Functions Explained. [Online] Available at: http://support.microsoft.com/kb/103884?wa=wsignin1.0

Lewis, M., 2002. Unsecure SANs invitation for hackers. [Online] Available at: http://searchstorage.techtarget.com/news/812240/Unsecure-SANs-invitation-for-hackers

BROCADE, 2007. The Growing Need for Security in Storage Area Networks. [Online] Available at: http://www.hds.com/assets/pdf/white-paper-for-security-in-storage-area-networks.pdf

Haron, M., 2002. Is Your Storage Area Network Secure? An Overview of Storage Area Network from Security Perspective. [Online] Available at: http://www.sans.org/reading_room/whitepapers/storage/storage-area-network-secure-overview-storage-area-network-security-perspective_516

Majstor, F., 2004. Storage Area Networks Security Protocols and Mechanisms. [Online] Available at: http://www.employees.org/~franjo/papers/SAN_Security_WP_v1.pdf





Mitigation techniques for management interfaces

13 11 2014

In VMware's hardening guide, a number of mitigation techniques are offered that can be used to further secure vCenter from exploitation. The majority of the recommended measures around securing vCenter are operational in nature rather than reconfiguration of settings within VMware. Six of the options concerning vCenter alone describe measures that should be taken on the network to reduce the likelihood of MiTM attacks. As in my earlier post on hypervisors and throughout this blog, isolating these networks from any user-reachable subnet is the advised approach for all management interfaces. If an attacker is unable to directly query the target, they cannot directly exploit it without first exploiting another system or finding a flaw in the software controlling the ACLs. Although the segmentation approach will thwart the majority of conventional attacks, it is the author's opinion that securing such high-value interfaces with only usernames and passwords is inadequate and that further authentication measures are necessary.
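One simple way to verify that this segmentation is actually in place is to test, from a machine on a user-reachable subnet, whether the management ports answer at all. The Python sketch below attempts TCP connections to a few common management ports; the addresses are placeholders and the port list is only indicative.

# Quick segmentation check, run from a user-accessible subnet: any successful
# connection means a management interface is reachable from a network it
# should be isolated from. Addresses and ports below are placeholders.
import socket

MANAGEMENT_TARGETS = {
    "vcenter.example.local": [443],            # vCenter web client / API
    "esx1.example.local": [22, 443, 902],      # ESXi SSH, HTTPS, host agent
    "ilo1.example.local": [443],               # hardware management (e.g. HP iLO)
}

for host, ports in MANAGEMENT_TARGETS.items():
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=2):
                print(f"REACHABLE: {host}:{port} - segmentation may be missing")
        except OSError:
            print(f"blocked or unreachable: {host}:{port}")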

In an article regarding storage security, (Schulz, et al., 2005) commented:

“The strength of any authentication mechanism is based on the quality of the implementation and the strength of credentials. If the credentials are weak, or if authentication data is exposed due to faulty implementation, the mechanism itself can and will be defeated”

While not native to the vSphere product, it is possible to use third-party solutions such as HyTrust (HyTrust, 2012) to require two-factor authentication for access to the interface. It is also possible to enable two-factor authentication on other admin interfaces such as HP's iLO. This adopts the defence-in-depth model, ensuring that the integrity of a system is not dependent on only one element.

When two-factor authentication is not available, standard network protection measures consistent with traditional aspects of network security should be followed, such as strong passwords and account lockout policies. It is also advisable to apply the principle of least privilege when creating the accounts that will have access to the management interfaces, to reduce the impact of an attack. Management interfaces offer differing levels of customisation, from the less granular options of the Cisco UCS to the highly configurable VMware vCenter. When configuring a user account in VMware's vCenter, an administrator can granularly allow or deny individual actions on individual machines managed in that environment; an example of the granularity of vCenter's permissions can be seen in Figure 1. Restricting accounts to only the areas of the interface that are needed will reduce the total impact on the environment should one of those accounts be compromised.

Figure 1 – A small percentage of the options available when configuring permissions in vCenter
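As a starting point for reviewing whether least privilege is actually being applied, the short pyVmomi sketch below lists every role defined in vCenter together with the number of privileges it grants, so that over-privileged custom roles stand out. Connection details are placeholders.

# Sketch: enumerate vCenter roles and the privileges each grants, as a first
# step in a least-privilege review. Connection details are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect

context = ssl._create_unverified_context()   # lab only
si = SmartConnect(host="vcenter.example.local", user="auditor@vsphere.local",
                  pwd="password", sslContext=context)
try:
    auth = si.RetrieveContent().authorizationManager
    for role in auth.roleList:
        print(f"{role.name} (roleId {role.roleId}): {len(role.privilege)} privileges")
        for priv in role.privilege[:5]:   # print a small sample of privilege IDs
            print("   ", priv)
finally:
    Disconnect(si)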

Schulz, G. et al., 2005. Virtualization Journal. [Online] Available at: http://virtualization.sys-con.com/node/48056

HyTrust, 2012. Two factors are better than one. [Online] Available at: http://www.hytrust.com/solutions/security/two-factor





Post exploitation on management interfaces

7 11 2014

Once an attacker successfully gains admin access to a particular element of the management infrastructure, what is possible? In traditional networks, the compromise of an 'Enterprise Admin' or 'root' account would be a worst-case scenario. Using such an account, an attacker could log on to servers, delete accounts, data and policies, and affect services running on those servers. However, without pre-defined scripting, all of these actions would need to be performed manually on each server, which of course has time implications. The attack would be effective, but once identified, action could be taken on uncompromised servers, and machines not connected to the network by a domain trust would remain unaffected.

Now consider a fully virtualised environment that is fully managed by a single management interface (which, from the author’s experience, is not uncommon). Should an attacker obtain admin access to a management interface, then the implications are much greater.

Data centre management system (vCenter, SCOM)

If an attacker were to gain admin access to the management software that controls the environment, the implications from a defensive point of view are devastating. From inside the console they would be able to delete configurations, virtual switches and even whole machines, from both a management and a disk perspective. A common practice in large environments is to create a disaster recovery environment in another location of the campus or business, so that in the event of a loss of service in one location the mirror environment can be used. It is not uncommon for both locations to be managed by the same management software, as this simplifies replication and failover. While this may increase availability and functionality, it also means that gaining access to the one system could potentially allow an attacker to erase an entire site, including any recovery environment.

While it may not have the same initial impact, there are a number of other actions an attacker could complete once access to a management interface is gained. Misconfiguration of the way the hardware performs could go unnoticed for a long period of time, yet still cause huge disruption to services and be harder and more time consuming to identify. There are also persistent attacks that could be configured for snooping, such as placing virtual switches into promiscuous mode in order to perform undetected information gathering. It is also possible from the interface to copy full machines offline and boot them up using consumer tools such as VMware Player (VMware, 2012) for analysis and offline attacks (Siebert, 2011).
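Because switching a port group into promiscuous mode is exactly this kind of quiet, persistent change, it is worth checking for periodically. The pyVmomi sketch below (connection details are placeholders) walks the standard vSwitch port groups on every host and reports any that allow promiscuous mode.

# Sketch: report standard port groups that allow promiscuous mode, a setting
# an attacker might enable for persistent snooping. Placeholders throughout.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()   # lab only
si = SmartConnect(host="vcenter.example.local", user="auditor@vsphere.local",
                  pwd="password", sslContext=context)
try:
    content = si.RetrieveContent()
    hosts = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    for host in hosts.view:
        for pg in host.config.network.portgroup:
            security = pg.spec.policy.security
            if security is not None and security.allowPromiscuous:
                print(f"{host.name}: port group '{pg.spec.name}' allows promiscuous mode")
finally:
    Disconnect(si)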

Storage interfaces

While there are only so many actions that can be performed after gaining access to the interfaces that manage storage, they are equally disruptive. Entire disk arrays can be remotely wiped, resulting in the loss of multiple servers and all data held on that storage in a single action. Disks could also be misconfigured to affect the performance of every machine operating on that array.

Hardware management (iLO, DRAC, RSA, On-board Administrator)

With access to the hardware management interfaces an attacker would be able to perform physical actions on the infrastructure remotely. These interfaces control the hardware on which the hypervisors run, the interconnect fabric, and the power and cooling on blade systems. Although most of the actions available here are reversible, they still present a single targetable area with the potential to severely affect service for a considerable period of time.


Siebert, E., 2011. Five VMware security breaches that should never happen. [Online] Available at: http://searchvmware.techtarget.com/tip/Five-VMware-security-breaches-that-should-never-happen





Management interfaces

20 09 2014

With all the additional functionality of virtualisation come added management requirements. From an infrastructure perspective alone, this has resulted in the introduction of a number of infrastructure management tools, referred to in some academic work as the VMMM (virtual machine monitors' management) (van Cleeff, et al., 2009), across multiple vendors, including vSphere, XenCenter, System Center Configuration Manager, Eucalyptus, UVMM and many others. These tools allow administrators to mimic the physical interactions with machines that are no longer possible in virtual environments. These actions include creating new machines, adjusting resource allocation and general maintenance of the VMs.

The addition of management interfaces is not exclusive to the virtual machine portion: storage, backup and even the infrastructure that runs the virtual hardware (blade chassis) also introduce new management interfaces into the environment. These have the ability to remotely configure arrays, take entire copies of live machines and even turn off the physical hardware that the hypervisors run on.

While all are designed to improve manageability for administrators, these management interfaces introduce additional code and complexity into environments, creating new attack vectors. When exploited, these vectors can lead to an attacker successfully performing actions remotely that would previously have been impossible to achieve in traditional systems. The types of attack that could be completed, should an attacker gain access to any one of these management interfaces, have the potential to affect numerous machines and services simultaneously, even forcing administrators to invoke disaster recovery strategies.





Attacking shared hardware used for virtualisation

5 05 2014

There are a number of conjectural and proven attacks that involve the exploitation of shared hardware. One of the more relevant attacks threatening virtual environments is the ability to degrade the performance of other machines by placing an unpredictable strain on the shared hardware. This is possible either by taking control of a virtual machine in the environment through an existing software exploit or, in the case of a cloud provider, simply by purchasing one. Amazon has multiple security measures in place to deal with insider attacks on its Amazon Web Services (AWS) platform, but with a Microsoft Windows instance costing as little as $0.115 per hour, the cost of entry for attackers is very low. While one moderately powered machine would not be able to affect the performance of numerous neighbouring clients on Amazon's infrastructure, this low entry figure demonstrates how little it would cost to rent multiple instances for a clustered attack.

Although not primarily considered a security issue, resource contention is a major issue within virtual systems, especially when operating in multi-tenant environments. The term "noisy neighbours" is used to describe virtual instances sharing the same host or storage as another machine and affecting its performance. Problems caused by noisy neighbours or resource-intensive virtual machines are typically due either to misconfiguration or simply to being unfortunate enough to be placed on the same hardware as other high-performance machines. From a security perspective, however, if an attacker is able to place a number of machines on the same shared hardware as a competitor's machine, they have the ability to degrade its performance. There has been prior research into determining the internal mapping of machines within large cloud infrastructures. In the paper entitled "Hey, You, Get Off of My Cloud!" (Ristenpart, et al., 2009), the authors use the Amazon EC2 service as their environment to test the ability to map the internal location of machines and discuss how this information can be used to construct machines that co-reside with specific targets. While the specific methods involved are out of scope for this article, the paper describes how an accurate mapping can be achieved using "timestamp fingerprinting" and "cache-based detection".

Studies often measure the impact that noisy neighbours cause on co-residing tenants by analysing RAM, CPU or network usage. While these are relevant elements, one major drawback of measuring only these aspects is that disk activity, such as the IOPS (Input/Output Operations Per Second) placed on shared storage, is not taken into consideration. This can be one of the more difficult elements to measure, as storage arrays can differ greatly in both size and performance, even within the same provider. Misbehaving disk activity can also be much more erratic, especially when compared to RAM usage, which tends to increase gradually rather than produce the spikes seen in IOPS.

One attack that would be possible using shared storage would be to use the mapping techniques discovered by (Ristenpart, et al., 2009) to place a group of machines on the same storage array or LUN as a target before generating high I/O. If the activity generated were high enough, contention for disk access would be experienced by all machines using that storage and, as a result, machines would become noticeably slower due to the disk latency created. Amazon EC2 does not limit the amount of I/O a machine can use, as it is a chargeable resource billed to the owning customer's account based on usage. These charges would obviously not be a problem for an attacker using a stolen credit card, for example. While there are a number of articles (Cockcroft, 2011) about the consequences of sharing storage with busy or malfunctioning VMs, the author has not found any documentation that recognises deliberately crafted IOPS as an attack. A demonstration of how this attack could be carried out is shown later in this section.

Attacks that use shared hardware as a vector are capable of affecting not only availability but all three aspects of the Confidentiality, Integrity, Availability (CIA) triad (Perrin, 2008). The confidentiality of machines on virtual systems should also be a consideration before adoption takes place. While some of the attacks against the confidentiality and integrity portions of the CIA triad using shared hardware fall more on the academic side of the spectrum than among active exploits, these concepts should at least be taken into consideration, especially by high-risk targets.

One example of how a shared CPU can be manipulated is (Phil, 2012)'s demonstration of how two machines running on the same host can communicate with each other without using any networking protocols. This type of attack is typically known as a side-channel attack and has been a known issue for a number of years (Page, 2003), (Osvik, et al., 2005). In (Phil, 2012)'s 'virtualisation specific attack' there are a number of pre-requisites for the attack to be successful, including both virtual machines having the same number of processors and running on the VMware platform with unlimited CPU resources. However, once all of the appropriate elements were in place, (Phil, 2012) was able to send data bits from one VM to another over the CPU by oversubscribing the hardware. While the attack is extremely niche and inefficient, with transfer rates as slow as 0.5 bits/sec (depending on the noise of other machines on the host), it does show in principle how attacking virtual machines at this layer is possible.
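The mechanics behind such a channel can be illustrated in a much simplified form without a hypervisor at all: a sender signals bits by either saturating the CPU or staying idle during fixed time slots, while a receiver infers each bit from how much work it manages to complete in the same slot. The Python toy below shows the principle only; it is not a reproduction of (Phil, 2012)'s attack, the slot length and threshold are arbitrary, and slot synchronisation between the two sides is left out entirely.

# Toy illustration of a CPU-contention covert channel. Run sender() and
# receiver() at the same time on the same oversubscribed CPU: the sender burns
# CPU for a '1' bit and sleeps for a '0'; the receiver counts how much work it
# completes per slot and infers the bit. All values are arbitrary.
import time

SLOT = 0.5            # seconds per transmitted bit
MESSAGE = "1011001"   # bits to send (illustrative)

def sender(bits: str = MESSAGE) -> None:
    for bit in bits:
        end = time.time() + SLOT
        if bit == "1":
            while time.time() < end:
                pass              # saturate the shared CPU for this slot
        else:
            time.sleep(SLOT)      # leave the CPU idle for this slot

def receiver(nbits: int = len(MESSAGE)) -> str:
    # Calibrate: how many loop iterations fit into one uncontended slot.
    baseline, end = 0, time.time() + SLOT
    while time.time() < end:
        baseline += 1
    bits = ""
    for _ in range(nbits):
        count, end = 0, time.time() + SLOT
        while time.time() < end:
            count += 1
        # Heavy contention (far fewer iterations than the baseline) reads as '1'.
        bits += "1" if count < baseline * 0.6 else "0"
    return bits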

An area the author would be interested in investigating further (having been unable to find any existing research in the area) is the security implications of the shared hardware involved in blade environments. The most effective way to ensure the integrity of an environment is to adopt the ultra-cautious approach of disconnecting machines from the internet and any other connecting networks. This is known as an 'air gap', and is typically used to secure high-value environments such as SCADA (supervisory control and data acquisition) systems. Blade systems such as the PowerEdge M1000e offer "compelling operational benefits, such as improved cabling, rapid hardware provisioning, high compute density, energy-efficient design and increasing management automation", and can provide enough resources to power an entire large organisation or business on their own. Using VLANs, multiple networks can be hosted within the one enclosure, including Demilitarised Zones (DMZ) and Virtual Desktop Infrastructures (VDI). While research has been done into the sharing of components such as RAM and CPU, elements of the blade environment such as the chassis backplane and the connection fabric into the system pose an equal if not greater risk. If malicious software were able to infect the software that manages these physical elements of the system, it could potentially monitor and affect the integrity of information to and from any virtual machine or host.

As discussed earlier, when placed on the same storage array as a target, an attacker may be able to affect the performance of other machines by requesting large amounts of disk I/O on the shared array. To demonstrate the plausibility of this attack the author conducted a simulation with two attack machines and one target machine placed on the same storage array. I will post a full description of the simulation separately. To demonstrate the disruption caused by this attack, the experiment uses the built-in monitoring tool 'esxtop'. The figure under the GAVG/cmd column best demonstrates the impact of the attack on the storage array: it identifies the "response time as it is perceived by the guest operating system" by adding the "average response time in milliseconds per command being sent to the device" (DAVG/cmd) to "the amount of time the command spends in the VMkernel" (KAVG/cmd). For example, a DAVG/cmd of 40 ms and a KAVG/cmd of 7 ms give a GAVG/cmd of 47 ms.

The simulation used three machines, two representing machines controlled by the attacker and one the victim machine. Both of the attacking machines run "SQLIO", a freely available Microsoft SQL Server I/O stress-testing and benchmarking utility. To simulate high I/O the author launches the utility using a snippet of the parameters shown below.

“sqlio -kW -s10 -frandom -o8 -b8 -LS -Fparam.txt

sqlio -kR -s360 -frandom -o8 -b8 -LS -Fparam.txt…”

The '-frandom' parameter in the SQLIO utility generates random rather than sequential reads and writes, as random disk activity is known to be more intensive on storage devices (Kelkar, 2011). This caused the number of read operations on one of the attacking machines to rise to a consistent rate of 5323.41 commands per second, pushing the GAVG from zero to 82.31 ms on the attacking machine and from zero to 47.41 ms on the victim machine. While these contention results fluctuated during the tests, the GAVG stayed consistently above 30 ms on both one of the attacking machines and the victim machine, as shown in Figure 1 and on the graph in Figure 2.

Figure 1 – Statistics for each machine during the I/O tests

The average figures shown by the monitoring software also demonstrate the high latency experienced by each machine. To show the impact the attack has on machine response time (GAVG), Figure 2 plots the average GAVG reported by each VM before the script was run and for the following five minutes. Before the script was run the average GAVG was a constant 0 ms; once the script was initiated this figure increased, peaking at around 82 ms. The average response time for the victim machine throughout the five-minute period was 46.77 ms, which is 36.77 ms above that recommended by VMware.

Figure 2 – Graph showing the average millisecond GAVG response time reported for each guest OS during the testing

This graph demonstrates that it is possible for an attacker with machines located on the same shared storage array as their target to adversely affect the performance of other machines through oversubscription of the hardware.


Sources:

Ristenpart, T., Tromer, E., Shacham, H. & Savage, S., 2009. Hey, You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds. [Online]  Available at: http://cseweb.ucsd.edu/~hovav/dist/cloudsec.pdf

Cockcroft, A., 2011. Understanding and using Amazon EBS – Elastic Block Store. [Online]  Available at: http://perfcap.blogspot.co.uk/2011/03/understanding-and-using-amazon-ebs.html

Perrin, C., 2008. The CIA Triad. [Online]
Available at: http://www.techrepublic.com/blog/security/the-cia-triad/488

Osvik, D. A., Shamir, A. & Tromer, E., 2005. Cache Attacks and Countermeasures: the Case of AES. Rehovot: Department of Computer Science and Applied Mathematics.






Introduction to Shared hardware

10 04 2014

The ability to distribute resources across multiple physical machines has played a pivotal role in the growth of virtualisation today. The term 'elastic computing' is sometimes used to describe the nature of virtualised environments, as resources can be added and retracted depending on need. In traditional environments, high-performance CPUs were often required to accommodate fluctuations in processing requirements, yet average CPU utilisation is typically below 10%. The table below is a VMware comparison of the hardware requirements and average CPU utilisation for three different industries against those of a virtual implementation; it demonstrates how virtualisation can lower the number of physical servers while also improving elements of service and setup time.

Table demonstrating the benefits of virtualisation by comparing the typical computing requirement of three different industries (VMware, 2006)

The benefits of virtualisation extend beyond processing, however; storage is another area that benefits tremendously. Storage area networks (SANs) are used to manage storage centrally in one location rather than managing multiple RAID sets in individual chassis. Much like CPU resources, storage can be added to machines dynamically rather than requiring all resources to be factored in during initial installation, thus reducing the total cost of ownership (TCO).

The ability to share resources also results in exponential improvements in availability. As entire infrastructures can effectively run on one piece of hardware, such as with blade systems, adding multiple layers of redundancy is a one-time process that benefits all machines residing in that system throughout the lifetime of all the servers within the infrastructure. Equipping traditional tower or rack servers with the level of redundancy seen in a blade system or datacentre would be highly impractical and costly.

However, with this new capacity to share resources across multiple machines comes a host of attack vectors that require planning and mitigation methods to be in place before adoption. Setting aside the obvious benefits offered by shared resources and observing from a security perspective, the idea of multiple machines sharing the same sticks of RAM, CPUs and hard disks seems likely to introduce some negative security implications.

As covered in an earlier article on hypervisors, machines hosted on the same host can be vulnerable to attacks through misconfigurations and software exploits within the hypervisor, and this is no different with the hardware. Guests residing on the same host as other guests can potentially interfere with each other using the very hardware they both run on. All of these attacks and opportunities to interfere with host neighbours are especially pertinent when considering the implications of a public cloud.

In the same way that attackers can craft exploits against the hypervisor software, it is also possible to target behaviours in the hardware of virtual systems to exchange information and disrupt service. When describing hardware attacks, the author will not discuss the implications of, or methods for, obtaining physical access to the hardware, as in the author's opinion once physical access to any hardware system is gained its security is easily compromised. All of the attacks in the next article are achievable through remote access to the system.