From mboxrd@z Thu Jan 1 00:00:00 1970 From: Marco Elver Subject: Re: [PATCH v4 00/10] Add Kernel Concurrency Sanitizer (KCSAN) Date: Wed, 20 Nov 2019 16:54:48 +0100 Message-ID: <20191120155448.GA21320@google.com> References: <20191114180303.66955-1-elver@google.com> <1574194379.9585.10.camel@lca.pw> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Return-path: Content-Disposition: inline In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org To: Qian Cai Cc: LKMM Maintainers -- Akira Yokosawa , Alan Stern , Alexander Potapenko , Andrea Parri , Andrey Konovalov , Andy Lutomirski , Ard Biesheuvel , Arnd Bergmann , Boqun Feng , Borislav Petkov , Daniel Axtens , Daniel Lustig , Dave Hansen , David Howells , Dmitry Vyukov , "H. Peter Anvin" , Ingo Molnar , Jade Alglave , Joel Fernandes , Jonathan Corbet List-Id: linux-arch.vger.kernel.org On Tue, 19 Nov 2019, Marco Elver wrote: > On Tue, 19 Nov 2019 at 21:13, Qian Cai wrote: > > > > On Thu, 2019-11-14 at 19:02 +0100, 'Marco Elver' via kasan-dev wrote: > > > This is the patch-series for the Kernel Concurrency Sanitizer (KCSAN). > > > KCSAN is a sampling watchpoint-based *data race detector*. More details > > > are included in **Documentation/dev-tools/kcsan.rst**. This patch-series > > > only enables KCSAN for x86, but we expect adding support for other > > > architectures is relatively straightforward (we are aware of > > > experimental ARM64 and POWER support). > > > > This does not allow the system to boot. Just hang forever at the end. > > > > https://cailca.github.io/files/dmesg.txt > > > > the config (dselect KASAN and select KCSAN with default options): > > > > https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config > > Thanks! That config enables lots of other debug code. I could > reproduce the hang. It's related to CONFIG_PROVE_LOCKING etc. > > The problem is definitely not the fact that kcsan_setup_watchpoint > disables interrupts (tested by removing that code). Although lockdep > still complains here, and looking at the code in kcsan/core.c, I just > can't see how local_irq_restore cannot be called before returning (in > the stacktrace you provided, there is no kcsan function), and > interrupts should always be re-enabled. (Interrupts are only disabled > during delay in kcsan_setup_watchpoint.) > > What I also notice is that this happens when the console starts > getting spammed with data-race reports (presumably because some extra > debug code has lots of data races according to KCSAN). > > My guess is that some of the extra debug logic enabled in that config > is incompatible with KCSAN. However, so far I cannot tell where > exactly the problem is. For now the work-around would be not using > KCSAN with these extra debug options. I will investigate more, but > nothing obviously wrong stands out.. It seems that due to spinlock_debug.c containing data races, the console gets spammed with reports. However, it's also possible to encounter deadlock, e.g. printk lock -> spinlock_debug -> KCSAN detects data race -> kcsan_print_report() -> printk lock -> deadlock. So the best thing is to fix the data races in spinlock_debug. I will send a patch separately for you to test. The issue that lockdep still reports inconsistency in IRQ flags tracing I cannot yet say what the problem is. It seems that lockdep IRQ flags tracing may have an issue with KCSAN for numerous reasons: let's say lockdep and IRQ flags tracing code is instrumented, which then calls into KCSAN, which disables/enables interrupts, but due to tracing calls back into lockdep code. In other words, there may be some recursion which corrupts hardirqs_enabled. Thanks, -- Marco From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-f66.google.com ([209.85.128.66]:34396 "EHLO mail-wm1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728646AbfKTPy6 (ORCPT ); Wed, 20 Nov 2019 10:54:58 -0500 Received: by mail-wm1-f66.google.com with SMTP id j18so5460289wmk.1 for ; Wed, 20 Nov 2019 07:54:56 -0800 (PST) Date: Wed, 20 Nov 2019 16:54:48 +0100 From: Marco Elver Subject: Re: [PATCH v4 00/10] Add Kernel Concurrency Sanitizer (KCSAN) Message-ID: <20191120155448.GA21320@google.com> References: <20191114180303.66955-1-elver@google.com> <1574194379.9585.10.camel@lca.pw> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: linux-arch-owner@vger.kernel.org List-ID: To: Qian Cai Cc: LKMM Maintainers -- Akira Yokosawa , Alan Stern , Alexander Potapenko , Andrea Parri , Andrey Konovalov , Andy Lutomirski , Ard Biesheuvel , Arnd Bergmann , Boqun Feng , Borislav Petkov , Daniel Axtens , Daniel Lustig , Dave Hansen , David Howells , Dmitry Vyukov , "H. Peter Anvin" , Ingo Molnar , Jade Alglave , Joel Fernandes , Jonathan Corbet , Josh Poimboeuf , Luc Maranget , Mark Rutland , Nicholas Piggin , "Paul E. McKenney" , Peter Zijlstra , Thomas Gleixner , Will Deacon , Eric Dumazet , kasan-dev , linux-arch , "open list:DOCUMENTATION" , linux-efi@vger.kernel.org, Linux Kbuild mailing list , LKML , Linux Memory Management List , the arch/x86 maintainers Message-ID: <20191120155448.q424UbJxug2VrJcQsxEYKxgKHECD-0xbtlr00KOhYUc@z> On Tue, 19 Nov 2019, Marco Elver wrote: > On Tue, 19 Nov 2019 at 21:13, Qian Cai wrote: > > > > On Thu, 2019-11-14 at 19:02 +0100, 'Marco Elver' via kasan-dev wrote: > > > This is the patch-series for the Kernel Concurrency Sanitizer (KCSAN). > > > KCSAN is a sampling watchpoint-based *data race detector*. More details > > > are included in **Documentation/dev-tools/kcsan.rst**. This patch-series > > > only enables KCSAN for x86, but we expect adding support for other > > > architectures is relatively straightforward (we are aware of > > > experimental ARM64 and POWER support). > > > > This does not allow the system to boot. Just hang forever at the end. > > > > https://cailca.github.io/files/dmesg.txt > > > > the config (dselect KASAN and select KCSAN with default options): > > > > https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config > > Thanks! That config enables lots of other debug code. I could > reproduce the hang. It's related to CONFIG_PROVE_LOCKING etc. > > The problem is definitely not the fact that kcsan_setup_watchpoint > disables interrupts (tested by removing that code). Although lockdep > still complains here, and looking at the code in kcsan/core.c, I just > can't see how local_irq_restore cannot be called before returning (in > the stacktrace you provided, there is no kcsan function), and > interrupts should always be re-enabled. (Interrupts are only disabled > during delay in kcsan_setup_watchpoint.) > > What I also notice is that this happens when the console starts > getting spammed with data-race reports (presumably because some extra > debug code has lots of data races according to KCSAN). > > My guess is that some of the extra debug logic enabled in that config > is incompatible with KCSAN. However, so far I cannot tell where > exactly the problem is. For now the work-around would be not using > KCSAN with these extra debug options. I will investigate more, but > nothing obviously wrong stands out.. It seems that due to spinlock_debug.c containing data races, the console gets spammed with reports. However, it's also possible to encounter deadlock, e.g. printk lock -> spinlock_debug -> KCSAN detects data race -> kcsan_print_report() -> printk lock -> deadlock. So the best thing is to fix the data races in spinlock_debug. I will send a patch separately for you to test. The issue that lockdep still reports inconsistency in IRQ flags tracing I cannot yet say what the problem is. It seems that lockdep IRQ flags tracing may have an issue with KCSAN for numerous reasons: let's say lockdep and IRQ flags tracing code is instrumented, which then calls into KCSAN, which disables/enables interrupts, but due to tracing calls back into lockdep code. In other words, there may be some recursion which corrupts hardirqs_enabled. Thanks, -- Marco