* Xen hypervisor external denial of service vulnerability?
@ 2011-02-08 12:22 Pim van Riezen
From: Pim van Riezen @ 2011-02-08 12:22 UTC
  To: xen-devel

Good day,

In a scenario where we saw several dom0 nodes fall down due to a sustained SYN flood to a network range, we have been investigating issues with Xen under high network load. The results so far seem to be not so pretty. We recreated a lab setup that can reproduce the scenario with some reliability, although it takes a bit of trial-and-error to get crashes out of it.

SETUP:
2x Dell R710
	- 4x 6core AMD Opteron 6174
	- 128GB memory
	- Broadcom BCM5709
	- LSI SAS2008 rev.02
	- Emulex Saturn-X FC adapter
	- CentOS 5.5 w/ gitco Xen 4.0.1

1x NexSan SATABeast FC raid
1x Brocade FC switch
5x Flood sources (Dell R210)

The dom0 machines are loaded with 50 PV images, each coupled to an LVM partition on FC, half of which are set to start compiling a kernel in rc.local. There are also 2 HVM images on both machines doing the same.

Networking for all guests is configured in the bridging setup, attached to a specific vlan that arrives tagged at the Dom0. So vifs end up in xenbr86, née xenbr0.86.

Grub conf for the dom0s:

	kernel /xen.gz-4.0.1 dom0_mem=2048M max_cstate=0 cpuidle=off
	module /vmlinuz-2.6.18-194.11.4.el5xen ro root=LABEL=/ elevator=deadline xencons=tty

The flooding is always done to either the entire IP range the guests live in (in case of SYN floods) or a sub-range of about 50 IPs (in case of UDP floods), with random source addresses.

ISSUE:
When the pps rate gets into insane territory (gigabit link saturated or near-saturated), the machine seems to start losing track of interrupts. Depending on the severity, this leads to CPU soft lockups on random cores. Under more dire circumstances, other hardware attached to the PCI bus starts timing out, making the kernel lose track of storage. Usually the SAS controller is the first to go, but I've also seen timeouts on the FC controller.
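
For a sense of scale, the packet rate of a saturated gigabit link can be estimated from standard Ethernet framing overhead. The sketch below is back-of-the-envelope arithmetic, not a measurement from the lab setup, and the name max_pps is made up for illustration.

```python
# Upper bound on packets per second for a saturated Ethernet link.
# Constants are standard Ethernet framing values; nothing here was
# measured on the machines described in this mail.
LINK_BPS = 1_000_000_000   # 1 Gbit/s
MIN_FRAME = 64             # minimum Ethernet frame in bytes, including FCS
PREAMBLE_SFD = 8           # preamble + start-of-frame delimiter, bytes
INTERFRAME_GAP = 12        # mandatory idle time between frames, in byte times


def max_pps(frame_bytes: int, link_bps: int = LINK_BPS) -> int:
    """Return the maximum frames per second for a given frame size."""
    # Frames shorter than the Ethernet minimum are padded on the wire.
    frame_bytes = max(frame_bytes, MIN_FRAME)
    wire_bits = (frame_bytes + PREAMBLE_SFD + INTERFRAME_GAP) * 8
    return link_bps // wire_bits


print(max_pps(64))    # minimum-size frames
print(max_pps(1518))  # full-size frames
```

With minimum-size frames the bound is roughly 1.49 million packets per second, against roughly 81 thousand for full-size frames, so a flood of tiny packets produces far more per-packet work than ordinary bulk traffic at the same bandwidth.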

THINGS TRIED:
1. Raising the Broadcom RX ring from 255 to 3000. No noticeable effects.
2. Downgrading to Xen 3.4.3. No effect.
3. Different Dell BIOS versions. No effect.
4. Lowering number of guests -> effects get less serious. Not a serious option.
5. Using ipt_LIMIT in the FORWARD chain set to 10000/s -> effects get less serious when dealing with TCP SYN attacks. No effect when dealing with 28-byte UDP attacks.
6. Disabling HPET as per http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html with cpuidle=0 and disabling irqbalance -> effects get less serious.
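
To illustrate what the rate limit in item 5 does and does not bound: the netfilter limit match behaves like a token bucket. The sketch below is a simplified user-space model, not the kernel code; the class and function names are made up, and the burst of 5 is an assumption (it is the default for --limit-burst, the actual setting is not stated above).

```python
class TokenBucket:
    """Simplified token bucket: refills at rate_per_s, holds at most burst tokens."""

    def __init__(self, rate_per_s: float, burst: int):
        self.rate = rate_per_s
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # timestamp of the previous packet

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last packet, capped at burst.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def passed_in_one_second(offered_pps: int, rate: float = 10_000, burst: int = 5) -> int:
    """Count packets that pass when offered_pps evenly spaced packets arrive in 1 s."""
    bucket = TokenBucket(rate, burst)
    return sum(bucket.allow(i / offered_pps) for i in range(offered_pps))


print(passed_in_one_second(5_000))    # below the limit: everything passes
print(passed_in_one_second(100_000))  # above the limit: capped near rate + burst
```

The limiter only caps what is forwarded on to the guests. Every offered packet still has to be received and classified in dom0 before it can be dropped.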

The changes in 6 stop the machine from completely crapping itself, but I still get soft lockups, although they seem to be limited to one of these two paths:

 [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
 [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
 [<ffffffff8027458e>] smp_call_function_many+0x38/0x4c
 [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
 [<ffffffff80274688>] smp_call_function+0x4e/0x5e
 [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
 [<ffffffff8028fdd7>] on_each_cpu+0x10/0x2a
 [<ffffffff802d7428>] kill_bdev+0x1b/0x30
 [<ffffffff802d7a47>] __blkdev_put+0x4f/0x169
 [<ffffffff80213492>] __fput+0xd3/0x1bd
 [<ffffffff802243cb>] filp_close+0x5c/0x64
 [<ffffffff8021e5d0>] sys_close+0x88/0xbd
 [<ffffffff802602f9>] tracesys+0xab/0xb6

and

 [<ffffffff8026f4f3>] raw_safe_halt+0x84/0xa8
 [<ffffffff8026ca88>] xen_idle+0x38/0x4a
 [<ffffffff8024af6c>] cpu_idle+0x97/0xba
 [<ffffffff8064eb0f>] start_kernel+0x21f/0x224
 [<ffffffff8064e1e5>] _sinittext+0x1e5/0x1eb
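
When sifting through a longer log, traces like the two above can be reduced to their call paths with a small parser. This is a minimal sketch that assumes the " [<address>] symbol+0xoffset/0xsize" line format shown here; the function names are hypothetical.

```python
import re

# Matches lines such as " [<ffffffff8026ca88>] xen_idle+0x38/0x4a".
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\S+?)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(trace: str):
    """Return (symbol, offset, size) tuples for every frame line in the trace."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group(2), int(match.group(3), 16), int(match.group(4), 16)))
    return frames


def call_path(trace: str):
    """Return the symbols in order of appearance, dropping repeated entries."""
    path = []
    for symbol, _offset, _size in parse_frames(trace):
        if symbol not in path:
            path.append(symbol)
    return path


idle_trace = """
 [<ffffffff8026f4f3>] raw_safe_halt+0x84/0xa8
 [<ffffffff8026ca88>] xen_idle+0x38/0x4a
 [<ffffffff8024af6c>] cpu_idle+0x97/0xba
 [<ffffffff8064eb0f>] start_kernel+0x21f/0x224
 [<ffffffff8064e1e5>] _sinittext+0x1e5/0x1eb
"""
print(call_path(idle_trace))
```

Dropping repeated symbols collapses the recurring invalidate_bh_lru entries in the first trace, which makes it easier to group lockups by path.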

In some scenarios, an application running on the dom0 that relies on pthread_cond_timedwait seems to be hanging in all its threads on that specific call. This may be related to some timing going wonky during the attack, not sure.
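
For context on why timing could matter: pthread_cond_timedwait takes an absolute deadline, which by default is measured against the system clock (CLOCK_REALTIME), so a clock that stalls or jumps can delay the timeout. One way to check whether timed waits return on schedule is a small probe like the sketch below. It uses Python's threading.Condition as a stand-in, not the application's own code, and the function name is hypothetical.

```python
import threading
import time


def timed_wait_overshoot(timeout_s: float = 0.1) -> float:
    """Wait on a condition nobody signals; return seconds past the requested timeout."""
    cond = threading.Condition()
    start = time.monotonic()
    with cond:
        # No other thread notifies, so this should return once the timeout expires.
        cond.wait(timeout=timeout_s)
    return time.monotonic() - start - timeout_s


if __name__ == "__main__":
    for _ in range(5):
        print(f"overshoot: {timed_wait_overshoot(0.1) * 1000:.1f} ms")
```

On a healthy machine the overshoot should stay within a few milliseconds. Running the probe in a loop during a flood would show whether timeouts start firing late or not at all.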

Is there anything more we can try?

Cheers,
Pim van Riezen

