From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mark Lord Subject: Re: [PATCH 1/11] Add generic helpers for arch IPI function calls Date: Thu, 24 Apr 2008 08:44:16 -0400 Message-ID: <481080A0.9050804@rtr.ca> References: <1208851058-8500-1-git-send-email-jens.axboe@oracle.com> <1208851058-8500-2-git-send-email-jens.axboe@oracle.com> <480E70ED.3030701@rtr.ca> <20080423072432.GX12774@kernel.dk> <480F3CBC.60305@rtr.ca> <20080423135110.GO12774@kernel.dk> <480F4BD9.8090003@rtr.ca> <20080424105908.GW12774@kernel.dk> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20080424105908.GW12774-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org> Sender: linux-arch-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: To: Jens Axboe Cc: linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, npiggin-l3A5Bk7waGM@public.gmane.org, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, pavel-+ZI9xUNit7I@public.gmane.org, "Rafael J. Wysocki" Jens Axboe wrote: > On Wed, Apr 23 2008, Mark Lord wrote: >> Jens Axboe wrote: >>> On Wed, Apr 23 2008, Mark Lord wrote: >>> .. >>>> The second bug, is that for the halt case at least, >>>> nobody waits for the other CPU to actually halt >>>> before continuing.. so we sometimes enter the shutdown >>>> code while other CPUs are still active. >>>> >>>> This causes some machines to hang at shutdown, >>>> unless CPU_HOTPLUG is configured and takes them offline >>>> before we get here. >>> I'm guessing there's a reason it doesn't pass '1' as the last argument, >>> because that would fix that issue? >> .. >> >> Undoubtedly -- perhaps the called CPU halts, and therefore cannot reply. :) > > Uhm yes, I guess stop_this_cpu() does exactly what the name implies :-) > >> But some kind of pre-halt ack, perhaps plus a short delay by the caller >> after receipt of the ack, would probably suffice to kill that bug. >> >> But I really haven't studied this code enough to know, >> other than that it historically has been a sticky area >> to poke around in. > > Something like this will close the window to right up until the point > where the other CPUs have 'almost' called halt(). > > diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c > index 5398385..94ec9bf 100644 > --- a/arch/x86/kernel/smp.c > +++ b/arch/x86/kernel/smp.c > @@ -155,8 +155,9 @@ static void stop_this_cpu(void *dummy) > /* > * Remove this CPU: > */ > - cpu_clear(smp_processor_id(), cpu_online_map); > disable_local_APIC(); > + cpu_clear(smp_processor_id(), cpu_online_map); > + smp_wmb(); > if (hlt_works(smp_processor_id())) > for (;;) halt(); > for (;;); > @@ -175,6 +176,12 @@ static void native_smp_send_stop(void) > > local_irq_save(flags); > smp_call_function(stop_this_cpu, NULL, 0, 0); > + > + while (cpus_weight(cpu_online_map) > 1) { > + cpu_relax(); > + smp_rmb(); > + } > + > disable_local_APIC(); > local_irq_restore(flags); > } .. Yup, that looks like it oughta work consistently. Now we just need to hear from some of the folks who have danced around this code in the past. (added Pavel & Rafael to Cc:). From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from rtr.ca ([76.10.145.34]:1221 "EHLO mail.rtr.ca" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755914AbYDXMoR (ORCPT ); Thu, 24 Apr 2008 08:44:17 -0400 Message-ID: <481080A0.9050804@rtr.ca> Date: Thu, 24 Apr 2008 08:44:16 -0400 From: Mark Lord MIME-Version: 1.0 Subject: Re: [PATCH 1/11] Add generic helpers for arch IPI function calls References: <1208851058-8500-1-git-send-email-jens.axboe@oracle.com> <1208851058-8500-2-git-send-email-jens.axboe@oracle.com> <480E70ED.3030701@rtr.ca> <20080423072432.GX12774@kernel.dk> <480F3CBC.60305@rtr.ca> <20080423135110.GO12774@kernel.dk> <480F4BD9.8090003@rtr.ca> <20080424105908.GW12774@kernel.dk> In-Reply-To: <20080424105908.GW12774@kernel.dk> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-arch-owner@vger.kernel.org List-ID: To: Jens Axboe Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, npiggin@suse.de, torvalds@linux-foundation.org, pavel@ucw.cz, "Rafael J. Wysocki" Message-ID: <20080424124416.pwuZdSsrHN-g6Fi6199XG1ZiBeOUsY4fVrnCmqWljTM@z> Jens Axboe wrote: > On Wed, Apr 23 2008, Mark Lord wrote: >> Jens Axboe wrote: >>> On Wed, Apr 23 2008, Mark Lord wrote: >>> .. >>>> The second bug, is that for the halt case at least, >>>> nobody waits for the other CPU to actually halt >>>> before continuing.. so we sometimes enter the shutdown >>>> code while other CPUs are still active. >>>> >>>> This causes some machines to hang at shutdown, >>>> unless CPU_HOTPLUG is configured and takes them offline >>>> before we get here. >>> I'm guessing there's a reason it doesn't pass '1' as the last argument, >>> because that would fix that issue? >> .. >> >> Undoubtedly -- perhaps the called CPU halts, and therefore cannot reply. :) > > Uhm yes, I guess stop_this_cpu() does exactly what the name implies :-) > >> But some kind of pre-halt ack, perhaps plus a short delay by the caller >> after receipt of the ack, would probably suffice to kill that bug. >> >> But I really haven't studied this code enough to know, >> other than that it historically has been a sticky area >> to poke around in. > > Something like this will close the window to right up until the point > where the other CPUs have 'almost' called halt(). > > diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c > index 5398385..94ec9bf 100644 > --- a/arch/x86/kernel/smp.c > +++ b/arch/x86/kernel/smp.c > @@ -155,8 +155,9 @@ static void stop_this_cpu(void *dummy) > /* > * Remove this CPU: > */ > - cpu_clear(smp_processor_id(), cpu_online_map); > disable_local_APIC(); > + cpu_clear(smp_processor_id(), cpu_online_map); > + smp_wmb(); > if (hlt_works(smp_processor_id())) > for (;;) halt(); > for (;;); > @@ -175,6 +176,12 @@ static void native_smp_send_stop(void) > > local_irq_save(flags); > smp_call_function(stop_this_cpu, NULL, 0, 0); > + > + while (cpus_weight(cpu_online_map) > 1) { > + cpu_relax(); > + smp_rmb(); > + } > + > disable_local_APIC(); > local_irq_restore(flags); > } .. Yup, that looks like it oughta work consistently. Now we just need to hear from some of the folks who have danced around this code in the past. (added Pavel & Rafael to Cc:).