From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Daniel P. Berrange" Subject: Re: xenconsoled CPU denial of service problem Date: Mon, 11 Sep 2006 15:19:05 +0100 Message-ID: <20060911141905.GB26811@redhat.com> References: <20060829155900.GC970@redhat.com> Reply-To: "Daniel P. Berrange" Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="X1bOJ3K7DJ5YkBrT" Return-path: Content-Disposition: inline In-Reply-To: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: Keir Fraser Cc: xen-devel@lists.xensource.com List-Id: xen-devel@lists.xenproject.org --X1bOJ3K7DJ5YkBrT Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Wed, Aug 30, 2006 at 07:13:30PM +0100, Keir Fraser wrote: > On 29/8/06 4:59 pm, "Daniel P. Berrange" wrote: > > > The patch sets > > - data size 5 kb > > - period 200 ms > > - delay 200 ms > > A few comments: > * I think the 'delay' parameter is not really useful. Think of this simply > as a simple credit-based scheduler that replenishes credit every 'period'. > So in this case you get 5kB every 200ms. If the domU offers more data in a > period, it must wait until its credit is replenished at the end of the > current 200ms period. > * I'm not sure bytes of data is the right thing to limit here. The main > thing that hoses domain0 is the repeated rescheduling of the console daemon, > and that is fundamentally caused by event-channel notifications. So it might > make sense to rate limit the number of times we read the event-channel port > from xc_evtchn_pending -- e.g., no more than 10 times a second (should be > plenty). This has a few advantages: 1. Looking just at data transferred > doesn't stop a malicious domain from hosing you with no-op event-channel > notifications; 2. This is a fundamental metric that can be measured and > rate-limited on all backend interfaces, so perhaps we can come up with some > common library of code that we apply to all backends/daemons. I've re-worked the patch based on this principle of "n events allowed in each time-slice", setting n=30 & the time-slice = 200ms. The code was actually much simpler than my previous patch so its definitely a winning strategy. Testing by running 'while /bin/true ; do echo t > /proc/sysrq-trigger; done' ..in one of the guest VMs on a 2.2 GHz Opteron, shows no significant CPU utilization attributed to xenconsoled. I've not examined whether this code can be put into a common library - it was simple enough to integrate in the xenconsoled event loop > It may turn out we need to rate limit on data *as well*, if it turns out > that sinking many kilobytes of data a second is prohibitively expensive, but > I doubt this will happen. For a start, the strict limiting of notifications > will encourage data to get queued up and improve batching of console data, > and batches of data should be processed quite efficiently. This same > argument extends to other backends (e.g., batching of requests in xenstored, > netback, blkback, etc). Based on initial testing it doesn't look like the data rate itself was causing any significant overhead, once the event channel port reads were limited. Signed-off by: Daniel P. Berrange Regards, Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=| --X1bOJ3K7DJ5YkBrT Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="xen-console-ratelimit-4.patch" diff -r bfd00b317815 tools/console/daemon/io.c --- a/tools/console/daemon/io.c Mon Sep 11 01:55:03 2006 +0100 +++ b/tools/console/daemon/io.c Mon Sep 11 10:09:32 2006 -0400 @@ -37,12 +37,18 @@ #include #include #include +#include #define MAX(a, b) (((a) > (b)) ? (a) : (b)) #define MIN(a, b) (((a) < (b)) ? (a) : (b)) /* Each 10 bits takes ~ 3 digits, plus one, plus one for nul terminator. */ #define MAX_STRLEN(x) ((sizeof(x) * CHAR_BIT + CHAR_BIT-1) / 10 * 3 + 2) + +/* How many events are allowed in each time period */ +#define RATE_LIMIT_ALLOWANCE 30 +/* Duration of each time period in ms */ +#define RATE_LIMIT_PERIOD 200 struct buffer { @@ -65,6 +71,8 @@ struct domain evtchn_port_t local_port; int xce_handle; struct xencons_interface *interface; + int event_count; + long long next_period; }; static struct domain *dom_head; @@ -306,6 +314,13 @@ static struct domain *create_domain(int { struct domain *dom; char *s; + struct timeval tv; + + if (gettimeofday(&tv, NULL) < 0) { + dolog(LOG_ERR, "Cannot get time of day %s:%s:L%d", + __FILE__, __FUNCTION__, __LINE__); + return NULL; + } dom = (struct domain *)malloc(sizeof(struct domain)); if (dom == NULL) { @@ -330,6 +345,8 @@ static struct domain *create_domain(int dom->buffer.size = 0; dom->buffer.capacity = 0; dom->buffer.max_capacity = 0; + dom->event_count = 0; + dom->next_period = (tv.tv_sec * 1000) + (tv.tv_usec / 1000) + RATE_LIMIT_PERIOD; dom->next = NULL; dom->ring_ref = -1; @@ -512,9 +529,12 @@ static void handle_ring_read(struct doma if ((port = xc_evtchn_pending(dom->xce_handle)) == -1) return; + dom->event_count++; + buffer_append(dom); - (void)xc_evtchn_unmask(dom->xce_handle, port); + if (dom->event_count < RATE_LIMIT_ALLOWANCE) + (void)xc_evtchn_unmask(dom->xce_handle, port); } static void handle_xs(void) @@ -549,6 +569,9 @@ void handle_io(void) do { struct domain *d, *n; int max_fd = -1; + struct timeval timeout; + struct timeval tv; + long long now, next_timeout = 0; FD_ZERO(&readfds); FD_ZERO(&writefds); @@ -556,8 +579,33 @@ void handle_io(void) FD_SET(xs_fileno(xs), &readfds); max_fd = MAX(xs_fileno(xs), max_fd); + if (gettimeofday(&tv, NULL) < 0) + return; + now = (tv.tv_sec * 1000) + (tv.tv_usec / 1000); + + /* Re-calculate any event counter allowances & unblock + domains with new allowance */ for (d = dom_head; d; d = d->next) { - if (d->xce_handle != -1) { + /* Add 5ms of fuzz since select() often returns + a couple of ms sooner than requested. Without + the fuzz we typically do an extra spin in select() + with a 1/2 ms timeout every other iteration */ + if ((now+5) > d->next_period) { + d->next_period = now + RATE_LIMIT_PERIOD; + if (d->event_count >= RATE_LIMIT_ALLOWANCE) { + (void)xc_evtchn_unmask(d->xce_handle, d->local_port); + } + d->event_count = 0; + } + } + + for (d = dom_head; d; d = d->next) { + if (d->event_count >= RATE_LIMIT_ALLOWANCE) { + /* Determine if we're going to be the next time slice to expire */ + if (!next_timeout || + d->next_period < next_timeout) + next_timeout = d->next_period; + } else if (d->xce_handle != -1) { int evtchn_fd = xc_evtchn_fd(d->xce_handle); FD_SET(evtchn_fd, &readfds); max_fd = MAX(evtchn_fd, max_fd); @@ -573,16 +621,29 @@ void handle_io(void) } } - ret = select(max_fd + 1, &readfds, &writefds, 0, NULL); + /* If any domain has been rate limited, we need to work + out what timeout to supply to select */ + if (next_timeout) { + long long duration = (next_timeout - now); + /* Shouldn't happen, but sanity check force greater than 0 */ + if (duration <= 0) + duration = 1; + timeout.tv_sec = duration / 1000; + timeout.tv_usec = (duration - (timeout.tv_sec * 1000)) * 1000; + } + + ret = select(max_fd + 1, &readfds, &writefds, 0, next_timeout ? &timeout : NULL); if (FD_ISSET(xs_fileno(xs), &readfds)) handle_xs(); for (d = dom_head; d; d = n) { n = d->next; - if (d->xce_handle != -1 && - FD_ISSET(xc_evtchn_fd(d->xce_handle), &readfds)) - handle_ring_read(d); + if (d->event_count < RATE_LIMIT_ALLOWANCE) { + if (d->xce_handle != -1 && + FD_ISSET(xc_evtchn_fd(d->xce_handle), &readfds)) + handle_ring_read(d); + } if (d->tty_fd != -1 && FD_ISSET(d->tty_fd, &readfds)) handle_tty_read(d); --X1bOJ3K7DJ5YkBrT Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel --X1bOJ3K7DJ5YkBrT--