From mboxrd@z Thu Jan 1 00:00:00 1970 From: Don Dutile Subject: Re: [PATCH] stack overflow during pv-guest restore Date: Fri, 01 Feb 2008 09:32:56 -0500 Message-ID: <47A32D98.8040701@redhat.com> References: <47A22A15.8060403@redhat.com> <47A2F567.76E4.0078.0@novell.com> Reply-To: ddutile@redhat.com Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <47A2F567.76E4.0078.0@novell.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: Jan Beulich Cc: xen-devel@lists.xensource.com List-Id: xen-devel@lists.xenproject.org Jan (& Keir) That's what I did the first time (used a simple static) I generated a fix & tested it, because a lock is taken out (long) before __cpu_up() is invoked, to ensure only one cpu is in that code path anyhow. *But* as I explained below, a modified version of cpu_initialize_context() was already upstream and it went with a kzalloc-kfree implementation, so maybe you want to ask Jeremy why he didn't go with a simple static as well. So, the bottom line: the fix tries to follow upstream... typically that's what we do (or change upstream, so all of us can follow it). - Don Jan Beulich wrote: > I think this can be fixed with a much smaller change: Since the only caller > of cpu_initialize_context() is __cpu_up(), and since __cpu_up() invocation > is serialized, the variable in question could simply be static. > > Jan > >>>> Don Dutile 31.01.08 21:05 >>> > > When secondary cpus are initialized during an i386 pv-guest restore > (due to save/restore or live migration), and the guest has > a load that generates a fair number of interrupts (e.g., parallel kernel make), > a stack overflow can occur because cpu_initialize_context() has > a 2800 byte structure it declares on its stack. linux-i386 has 4K stacks, by default. > Using 2800 bytes out of 4K by a single function in a call list isn't nice; > add the beginning of interrupt handling at just the right point, before the > switch to the interrupt stack is made... and... the scale tips... > > Simple fix: malloc & free structure as needed. > > Would fail save/restore testing of an i386-guest running a parallel, kernel make after > 50->100 save/restores; with the fix, haven't seen it fail after 650 save/restores. > > Note: this is a basic port of this function in Jeremy Fitzharinge's Xen implementation of pv-ops > in upstream Linux, part of 15/32 patch, Xen SMP guest support. > > Original patch done on rhel5; did a simple diff & merge to xen-unstable's version > of smpboot.c to generate the attached patch, so it cleanly applies; but > haven't built/run/tested the xen-unstable version. > > Signed-off-by: Donald Dutile > > >