Re: xenconsoled CPU denial of service problem

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Anthony Liguori <aliguori@us.ibm.com>
To: "Daniel P. Berrange" <berrange@redhat.com>
Cc: xen-devel@lists.xensource.com
Subject: Re: xenconsoled CPU denial of service problem
Date: Wed, 04 Oct 2006 11:49:56 -0500	[thread overview]
Message-ID: <4523E634.7050901@us.ibm.com> (raw)
In-Reply-To: <20061004135026.GC474@redhat.com>

Considering that today in Xen we have a default buffer size, it seems 
considerably easier to me to just get rid of xenconsoled completely and 
expand the domU-kernel ring queue to be the actual size of what we're 
buffering today.

This eliminates all of these problems and gets rid of a dom0 daemon. 
Plus, the domU gets taxed for the buffer memory instead of dom0.

We would then change xenconsole to read the buffer directly.

How does this sound for 3.0.4?

Regards,

Anthony Liguori


Daniel P. Berrange wrote:
> Hi Keir,
> 
> Any feedback on this rate limiting patch for DomU consoles ?
> 
> Regards,
> Dan.
> 
> On Mon, Sep 11, 2006 at 03:19:05PM +0100, Daniel P. Berrange wrote:
>> On Wed, Aug 30, 2006 at 07:13:30PM +0100, Keir Fraser wrote:
>>> On 29/8/06 4:59 pm, "Daniel P. Berrange" <berrange@redhat.com> wrote:
>>>
>>>> The patch sets
>>>>   - data size 5 kb
>>>>   - period 200 ms
>>>>   - delay 200 ms
>>> A few comments:
>>>  * I think the 'delay' parameter is not really useful. Think of this simply
>>> as a simple credit-based scheduler that replenishes credit every 'period'.
>>> So in this case you get 5kB every 200ms. If the domU offers more data in a
>>> period, it must wait until its credit is replenished at the end of the
>>> current 200ms period.
>>>  * I'm not sure bytes of data is the right thing to limit here. The main
>>> thing that hoses domain0 is the repeated rescheduling of the console daemon,
>>> and that is fundamentally caused by event-channel notifications. So it might
>>> make sense to rate limit the number of times we read the event-channel port
>>> from xc_evtchn_pending -- e.g., no more than 10 times a second (should be
>>> plenty). This has a few advantages: 1. Looking just at data transferred
>>> doesn't stop a malicious domain from hosing you with no-op event-channel
>>> notifications; 2. This is a fundamental metric that can be measured and
>>> rate-limited on all backend interfaces, so perhaps we can come up with some
>>> common library of code that we apply to all backends/daemons.
>> I've re-worked the patch based on this principle of "n events allowed
>> in each time-slice", setting n=30 & the time-slice = 200ms. The code
>> was actually much simpler than my previous patch so its definitely a
>> winning strategy.  Testing by running
>>
>>  'while /bin/true ; do echo t > /proc/sysrq-trigger; done'
>>
>> ..in one of the guest VMs on a  2.2 GHz Opteron, shows no significant CPU
>> utilization attributed to xenconsoled. I've not examined whether this code
>> can be put into a common library - it was simple enough to integrate in
>> the xenconsoled event loop
>>
>>> It may turn out we need to rate limit on data *as well*, if it turns out
>>> that sinking many kilobytes of data a second is prohibitively expensive, but
>>> I doubt this will happen. For a start, the strict limiting of notifications
>>> will encourage data to get queued up and improve batching of console data,
>>> and batches of data should be processed quite efficiently. This same
>>> argument extends to other backends (e.g., batching of requests in xenstored,
>>> netback, blkback, etc).
>> Based on initial testing it doesn't look like the data rate itself was causing
>> any significant overhead, once the event channel port reads were limited.
>>
>>
>>  Signed-off by: Daniel P. Berrange <berrange@redhat.com>
>>
>> Regards,
>> Dan.
>> -- 
>> |=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
>> |=-           Perl modules: http://search.cpan.org/~danberr/              -=|
>> |=-               Projects: http://freshmeat.net/~danielpb/               -=|
>> |=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 
> 
>> diff -r bfd00b317815 tools/console/daemon/io.c
>> --- a/tools/console/daemon/io.c	Mon Sep 11 01:55:03 2006 +0100
>> +++ b/tools/console/daemon/io.c	Mon Sep 11 10:09:32 2006 -0400
>> @@ -37,12 +37,18 @@
>>  #include <termios.h>
>>  #include <stdarg.h>
>>  #include <sys/mman.h>
>> +#include <sys/time.h>
>>  
>>  #define MAX(a, b) (((a) > (b)) ? (a) : (b))
>>  #define MIN(a, b) (((a) < (b)) ? (a) : (b))
>>  
>>  /* Each 10 bits takes ~ 3 digits, plus one, plus one for nul terminator. */
>>  #define MAX_STRLEN(x) ((sizeof(x) * CHAR_BIT + CHAR_BIT-1) / 10 * 3 + 2)
>> +
>> +/* How many events are allowed in each time period */
>> +#define RATE_LIMIT_ALLOWANCE 30
>> +/* Duration of each time period in ms */
>> +#define RATE_LIMIT_PERIOD 200
>>  
>>  struct buffer
>>  {
>> @@ -65,6 +71,8 @@ struct domain
>>  	evtchn_port_t local_port;
>>  	int xce_handle;
>>  	struct xencons_interface *interface;
>> +	int event_count;
>> +	long long next_period;
>>  };
>>  
>>  static struct domain *dom_head;
>> @@ -306,6 +314,13 @@ static struct domain *create_domain(int 
>>  {
>>  	struct domain *dom;
>>  	char *s;
>> +	struct timeval tv;
>> +
>> +	if (gettimeofday(&tv, NULL) < 0) {
>> +		dolog(LOG_ERR, "Cannot get time of day %s:%s:L%d",
>> +		      __FILE__, __FUNCTION__, __LINE__);
>> +		return NULL;
>> +	}
>>  
>>  	dom = (struct domain *)malloc(sizeof(struct domain));
>>  	if (dom == NULL) {
>> @@ -330,6 +345,8 @@ static struct domain *create_domain(int 
>>  	dom->buffer.size = 0;
>>  	dom->buffer.capacity = 0;
>>  	dom->buffer.max_capacity = 0;
>> +	dom->event_count = 0;
>> +	dom->next_period = (tv.tv_sec * 1000) + (tv.tv_usec / 1000) + RATE_LIMIT_PERIOD;
>>  	dom->next = NULL;
>>  
>>  	dom->ring_ref = -1;
>> @@ -512,9 +529,12 @@ static void handle_ring_read(struct doma
>>  	if ((port = xc_evtchn_pending(dom->xce_handle)) == -1)
>>  		return;
>>  
>> +	dom->event_count++;
>> +
>>  	buffer_append(dom);
>>  
>> -	(void)xc_evtchn_unmask(dom->xce_handle, port);
>> +	if (dom->event_count < RATE_LIMIT_ALLOWANCE)
>> +		(void)xc_evtchn_unmask(dom->xce_handle, port);
>>  }
>>  
>>  static void handle_xs(void)
>> @@ -549,6 +569,9 @@ void handle_io(void)
>>  	do {
>>  		struct domain *d, *n;
>>  		int max_fd = -1;
>> +		struct timeval timeout;
>> +		struct timeval tv;
>> +		long long now, next_timeout = 0;
>>  
>>  		FD_ZERO(&readfds);
>>  		FD_ZERO(&writefds);
>> @@ -556,8 +579,33 @@ void handle_io(void)
>>  		FD_SET(xs_fileno(xs), &readfds);
>>  		max_fd = MAX(xs_fileno(xs), max_fd);
>>  
>> +		if (gettimeofday(&tv, NULL) < 0)
>> +			return;
>> +		now = (tv.tv_sec * 1000) + (tv.tv_usec / 1000);
>> +
>> +		/* Re-calculate any event counter allowances & unblock
>> +		   domains with new allowance */
>>  		for (d = dom_head; d; d = d->next) {
>> -			if (d->xce_handle != -1) {
>> +			/* Add 5ms of fuzz since select() often returns
>> +			   a couple of ms sooner than requested. Without
>> +			   the fuzz we typically do an extra spin in select()
>> +			   with a 1/2 ms timeout every other iteration */
>> +			if ((now+5) > d->next_period) {
>> +				d->next_period = now + RATE_LIMIT_PERIOD;
>> +				if (d->event_count >= RATE_LIMIT_ALLOWANCE) {
>> +					(void)xc_evtchn_unmask(d->xce_handle, d->local_port);
>> +				}
>> +				d->event_count = 0;
>> +			}
>> +		}
>> +
>> +		for (d = dom_head; d; d = d->next) {
>> +			if (d->event_count >= RATE_LIMIT_ALLOWANCE) {
>> +				/* Determine if we're going to be the next time slice to expire */
>> +				if (!next_timeout ||
>> +				    d->next_period < next_timeout)
>> +					next_timeout = d->next_period;
>> +			} else if (d->xce_handle != -1) {
>>  				int evtchn_fd = xc_evtchn_fd(d->xce_handle);
>>  				FD_SET(evtchn_fd, &readfds);
>>  				max_fd = MAX(evtchn_fd, max_fd);
>> @@ -573,16 +621,29 @@ void handle_io(void)
>>  			}
>>  		}
>>  
>> -		ret = select(max_fd + 1, &readfds, &writefds, 0, NULL);
>> +		/* If any domain has been rate limited, we need to work
>> +		   out what timeout to supply to select */
>> +		if (next_timeout) {
>> +			long long duration = (next_timeout - now);
>> +			/* Shouldn't happen, but sanity check force greater than 0 */
>> +			if (duration <= 0)
>> +				duration = 1;
>> +			timeout.tv_sec = duration / 1000;
>> +			timeout.tv_usec = (duration - (timeout.tv_sec * 1000)) * 1000;
>> +		}
>> +
>> +		ret = select(max_fd + 1, &readfds, &writefds, 0, next_timeout ? &timeout : NULL);
>>  
>>  		if (FD_ISSET(xs_fileno(xs), &readfds))
>>  			handle_xs();
>>  
>>  		for (d = dom_head; d; d = n) {
>>  			n = d->next;
>> -			if (d->xce_handle != -1 &&
>> -			    FD_ISSET(xc_evtchn_fd(d->xce_handle), &readfds))
>> -				handle_ring_read(d);
>> +			if (d->event_count < RATE_LIMIT_ALLOWANCE) {
>> +				if (d->xce_handle != -1 &&
>> +				    FD_ISSET(xc_evtchn_fd(d->xce_handle), &readfds))
>> +					handle_ring_read(d);
>> +			}
>>  
>>  			if (d->tty_fd != -1 && FD_ISSET(d->tty_fd, &readfds))
>>  				handle_tty_read(d);
> 
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel
> 
>

next prev parent reply	other threads:[~2006-10-04 16:49 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-08-28 18:02 xenconsoled CPU denial of service problem Daniel P. Berrange
2006-08-28 20:57 ` Keir Fraser
2006-08-29 15:59   ` Daniel P. Berrange
2006-08-30 18:13     ` Keir Fraser
2006-09-11 14:19       ` Daniel P. Berrange
2006-10-04 13:50         ` Daniel P. Berrange
2006-10-04 14:19           ` Keir Fraser
2006-10-04 16:49           ` Anthony Liguori [this message]
2006-10-04 17:19             ` Daniel P. Berrange
2006-10-04 17:42               ` Keir Fraser
2006-10-04 17:52               ` Anthony Liguori
2006-10-04 18:10                 ` Daniel P. Berrange
2006-10-04 18:19                   ` Anthony Liguori
2006-08-30 18:13     ` Keir Fraser

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4523E634.7050901@us.ibm.com \
    --to=aliguori@us.ibm.com \
    --cc=berrange@redhat.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.