Date: Sat, 11 Nov 2017 17:00:02 +1100
From: Nicholas Piggin
To: Michael Ellerman
Cc: linuxppc-dev@lists.ozlabs.org, Vasant Hegde
Subject: Re: [PATCH v2 1/3] powerpc/powernv: Always stop secondaries before reboot/shutdown
Message-ID: <20171111170002.0fcbe5c7@roar.ozlabs.ibm.com>
In-Reply-To: <87tvy2o0gf.fsf@concordia.ellerman.id.au>
References: <20171023080507.21974-1-npiggin@gmail.com> <20171023080507.21974-2-npiggin@gmail.com> <87tvy2o0gf.fsf@concordia.ellerman.id.au>

On Fri, 10 Nov 2017 22:08:32 +1100
Michael Ellerman wrote:

> Nicholas Piggin writes:
>
> > Currently powernv reboot and shutdown requests just leave secondaries
> > to do their own things. This is undesirable because they can trigger
> > any number of watchdogs while waiting for reboot, but also we don't
> > know what else they might be doing, or they might be stuck somewhere
> > causing trouble.
> >
> > The opal scheduled flash update code already ran into watchdog problems
> > due to flashing taking a long time, but it's possible for regular
> > reboots to trigger problems too (this is with watchdog_thresh set to 1,
> > but I have seen it with watchdog_thresh at the default value once too):
> >
> > reboot: Restarting system
> > [ 360.038896709,5] OPAL: Reboot request...
> > Watchdog CPU:0 Hard LOCKUP
> > Watchdog CPU:44 detected Hard LOCKUP other CPUS:16
> > Watchdog CPU:16 Hard LOCKUP
> > watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0]
> >
> > So remove the special case for flash update, and unconditionally do
> > smp_send_stop before rebooting.
> >
> > Return the CPUs to Linux stop loops rather than OPAL. The reason for
> > this is that the path to firmware is longer, and the CPUs may have
> > been interrupted from firmware, which may cause problems to re-enter
> > it. It's better to put them into a simple spin loop to maximize the
> > chance of a successful reboot.
>
> I always assumed we had to send the CPUs back to OPAL for the flashing
> procedure. Is it OK to leave them in Linux?

According to the comment and changelog in
2196c6f1ed66eef23df3b478cfe71661ae83726e, it was added just to keep
secondaries from going silly.

Vasant, can you remember details?

Thanks,
Nick