From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757692AbcK3UvH (ORCPT ); Wed, 30 Nov 2016 15:51:07 -0500 Received: from mail.linuxfoundation.org ([140.211.169.12]:54158 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755389AbcK3Uu6 (ORCPT ); Wed, 30 Nov 2016 15:50:58 -0500 Date: Wed, 30 Nov 2016 12:51:13 -0800 From: Andrew Morton To: Cc: , , , , , , , , Ingo Molnar , Peter Zijlstra Subject: Re: [PATCH] printk: Fix spinlock deadlock in printk reenty Message-Id: <20161130125113.a5f520aa5e660514c423683e@linux-foundation.org> In-Reply-To: <1480490119-63559-1-git-send-email-linyongting@huawei.com> References: <1480490119-63559-1-git-send-email-linyongting@huawei.com> X-Mailer: Sylpheed 3.4.1 (GTK+ 2.24.23; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 30 Nov 2016 15:15:19 +0800 wrote: > From: Jinling Ke > > when Oops in printk, printk will call zap_locks() to reinitialize > spinlock to prevent deadlock. In arm, arm64, x86 or other > architecture smp cpu, race condition will occur in printk spinlock > logbuf_lock and then it will result other cpu that is waiting printk > spinlock in deadlock(in function raw_spin_lock). Because the cpus > deadlock, you can see the error printk log: > > "SMP: failed to stop secondary CPUs" > > In arm, arm64, x86 or other architecture, spinlock variable > is divided into 2 parts, for example they are 'owner' and 'next' in arm. > When get a spinlock, the 'next' part will add 1 and wait 'next' being > equal to 'owner'. However, at this moment, the 'next' part is local > variable, but 'owner' part value is get from global variable logbuf_lock. > However,raw_spin_lock_init(&logbuf_lock) will set 'owner' part and > 'next' part to zero, the result is that cpu deadlock in function > raw_spin_lock( while loop in function arch_spin_lock ). > > struct of arm spinlock > union { > u32 slock; > struct __raw_tickets { > u16 owner; > u16 next; > } tickets; > }; > } arch_spinlock_t; > static inline void arch_spin_lock(arch_spinlock_t *lock) > {... > <--- At the moment, other cpu call zap_locks()->spin_lock_init(), > <--- set the 'owner' part to zero, but lockval.tickets.next is a > <--- local variable > while (lockval.tickets.next != lockval.tickets.owner) { > lockval.tickets.owner = ACCESS_ONCE(lock->tickets.owner); > } > ... > } > > The solution is that In function zap_locks(), replace > raw_spin_lock_init(&logbuf_lock) with raw_spin_unlock(&logbuf_lock), > to let spin_lock stay in unlocked. > > ... > > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -1603,7 +1603,7 @@ static void zap_locks(void) > > debug_locks_off(); > /* If a crash is occurring, make sure we can't deadlock */ > - raw_spin_lock_init(&logbuf_lock); > + raw_spin_unlock(&logbuf_lock); > /* And make sure that we print immediately */ > sema_init(&console_sem, 1); OK, so it's a race between raw_spin_lock() and raw_spin_lock_init()? I wonder if there's a more general way of preventing this, within raw_spin_lock_init()? Of course, printk is special and the situation is unlikely to occur elsewhere. I guess the raw_spin_unlock() is OK - lockdep would have warned about unlock-of-unlocked-lock but we did a debug_locks_off() to prevent that.