From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8ED0BC169C4 for ; Wed, 6 Feb 2019 03:23:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3F82B206BA for ; Wed, 6 Feb 2019 03:23:04 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="CfKl0B15" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727306AbfBFDVO (ORCPT ); Tue, 5 Feb 2019 22:21:14 -0500 Received: from mail-yw1-f67.google.com ([209.85.161.67]:35727 "EHLO mail-yw1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725981AbfBFDVO (ORCPT ); Tue, 5 Feb 2019 22:21:14 -0500 Received: by mail-yw1-f67.google.com with SMTP id s204so348876ywg.2 for ; Tue, 05 Feb 2019 19:21:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=7sFlT7dZ1BopWZJ8KdnBo5UzhZxEbJNtMVdiIN10nE4=; b=CfKl0B151lf7CPRgAPb62YouxIXsZOp1YkGj9PlVL6RxCoeJ/rhla4tDXuTNDiNcPV rm5oKzp9GD4DMGz0OmtRnE9scxVAPHVg/BtYsj1PEV2sZybzMFDIt45RJfeAzxL+IYHQ nSHEkYVLNpBp++17WmguvD69bitKJ1PLsfx0DNoE1tANTzs4y1z0syJq0sAIYSQJLhjc Af9BOH2Gp1EZqFOS4W2DzXFNRgsE4esds67pvNNMuUOc1IK+0+XS5g8MLya8PGA70Qop tKNu0mT2GNu1HiN9TTvEFH+V1mf8V+BVaxAsuG3MLxghD2eATwXL1ndQPk7m8T5s5i9r xFiA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=7sFlT7dZ1BopWZJ8KdnBo5UzhZxEbJNtMVdiIN10nE4=; b=R5zOGofQwfhOkgiNHybLfex2FHuZaGDG9WBdlOJ9rsZ/Z9Litj9CxB3iRxlWYWCZOB qlAbxJW2U0sxGj5Q2Lb1+dRhdnSQbMOLSnFwwCIj9L/l+WchP68toMsJF3ZwnDN9+B+k QVzcZYnO3wNazuRI5UIAi93FHgQLDuuwJDeBfbJd06i8HglHOSnqTl5h3HkWqqphIH// rZibI3DzuzcPROUerkVbLX+YJt+Lg4uCL29/x9bd1pgEUQ2DMnR9f/Ns1RUESTdKPVn5 pp1QX4h1ftXBPtZjE6V46mhoiOTG9SHVT7MT5WawJp6m1O0z7Nen/MR3iMsbIKBp/8Wx zCSw== X-Gm-Message-State: AHQUAuagAmiZBz6e+qCPsc6eiMfk+VzIq1Ouy8ROMj3f7N2ZCk4A21ip LXJ/t60F8+yig7XKoPYdcLs= X-Google-Smtp-Source: AHgI3IbjHoU7f/QX0BSgglyXkHzapthG92uzDGxM0HCnk1X94W/q5YUSiHDFSnAcJqSPfSCgnnZtEw== X-Received: by 2002:a0d:ea8e:: with SMTP id t136mr6980703ywe.376.1549423272847; Tue, 05 Feb 2019 19:21:12 -0800 (PST) Received: from [192.168.86.235] (c-73-241-150-70.hsd1.ca.comcast.net. [73.241.150.70]) by smtp.gmail.com with ESMTPSA id k62sm4278912ywk.84.2019.02.05.19.21.09 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 05 Feb 2019 19:21:10 -0800 (PST) Subject: Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap To: Waiman Long , Alexei Starovoitov Cc: Peter Zijlstra , Alexei Starovoitov , davem@davemloft.net, daniel@iogearbox.net, edumazet@google.com, jannh@google.com, netdev@vger.kernel.org, kernel-team@fb.com References: <20190130040458.2544340-1-ast@kernel.org> <20190130040458.2544340-3-ast@kernel.org> <20190130101530.GE2278@hirez.programming.kicks-ass.net> <20190130193040.3hmh6g7efk5z3g2j@ast-mbp.dhcp.thefacebook.com> <9246a76f-e8d4-66fc-d901-374169c3e709@redhat.com> <20190130201047.hscaxd7434hrznxx@ast-mbp.dhcp.thefacebook.com> <20190131020150.zbys2r6me7em2jnl@ast-mbp.dhcp.thefacebook.com> From: Eric Dumazet Message-ID: <510d3bb7-5ec0-e281-aabc-a3d379475d71@gmail.com> Date: Tue, 5 Feb 2019 19:21:08 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On 01/30/2019 06:48 PM, Waiman Long wrote: > On 01/30/2019 09:01 PM, Alexei Starovoitov wrote: >> On Wed, Jan 30, 2019 at 04:32:12PM -0500, Waiman Long wrote: >>> On 01/30/2019 04:11 PM, Waiman Long wrote: >>>> On 01/30/2019 03:10 PM, Alexei Starovoitov wrote: >>>>> On Wed, Jan 30, 2019 at 02:42:23PM -0500, Waiman Long wrote: >>>>>> On 01/30/2019 02:30 PM, Alexei Starovoitov wrote: >>>>>>> On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote: >>>>>>>> On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote: >>>>>>>>> Lockdep warns about false positive: >>>>>>>> This is not a false positive, and you probably also need to use >>>>>>>> down_read_non_owner() to match this up_read_non_owner(). >>>>>>>> >>>>>>>> {up,down}_read() and {up,down}_read_non_owner() are not only different >>>>>>>> in the lockdep annotation; there is also optimistic spin stuff that >>>>>>>> relies on 'owner' tracking. >>>>>>> Can you point out in the code the spin bit? >>>>>>> As far as I can see sem->owner is debug only feature. >>>>>>> All owner checks are done under CONFIG_DEBUG_RWSEMS. >>>>>> No, sem->owner is mainly for performing optimistic spinning which is a >>>>>> performance feature to make rwsem writer-lock performs similar to mutex. >>>>>> The debugging part is just an add-on. It is not the reason for the >>>>>> presence of sem->owner. >>>>> I see. Got it. >>>>> >>>>>>> Also there is no down_read_trylock_non_owner() at the moment. >>>>>>> We can argue about it for -next, but I'd rather silence lockdep >>>>>>> with this patch today. >>>>>>> >>>>>> We can add down_read_trylock_non_owner() if there is a need for it. It >>>>>> should be easy to do. >>>>> Yes, but looking through the code it's not clear to me that it's safe >>>>> to mix non_owner() versions with regular. >>>>> bpf/stackmap.c does down_read_trylock + up_read. >>>>> If we add new down_read_trylock_non_owner that set the owner to >>>>> NULL | RWSEM_* bits is this safe with conccurent read/write >>>>> that do regular versions? >>>>> rwsem_can_spin_on_owner() does: >>>>> if (owner) { >>>>> ret = is_rwsem_owner_spinnable(owner) && >>>>> owner_on_cpu(owner); >>>>> that looks correct. >>>>> For a second I thought there could be fault here due to non_owner. >>>>> But there could be other places where it's assumed that owner >>>>> is never null? >>>> The content of owner is not the cause of the lockdep warning. The >>>> lockdep code assumes that the task that acquires the lock will release >>>> it some time later. That is not the case when you need to acquire the >>>> lock by one task and released by another. In this case, you have to use >>>> the non_owner version of down/up_read which disable the lockdep >>>> acquire/release tracking. That will be the only difference between the >>>> two set of APIs. >>>> >>>> Cheers, >>>> Longman >>> BTW, you may want to do something like that to make sure that the lock >>> ownership is probably tracked. >>> >>> -Longman >>> >>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c >>> index d43b145..79eef9d 100644 >>> --- a/kernel/bpf/stackmap.c >>> +++ b/kernel/bpf/stackmap.c >>> @@ -338,6 +338,13 @@ static void stack_map_get_build_id_offset(struct >>> bpf_stack_ >>>         } else { >>>                 work->sem = ¤t->mm->mmap_sem; >>>                 irq_work_queue(&work->irq_work); >>> + >>> +               /* >>> +                * The irq_work will release the mmap_sem with >>> +                * up_read_non_owner(). The rwsem_release() is called >>> +                * here to release the lock from lockdep's perspective. >>> +                */ >>> +               rwsem_release(¤t->mm->mmap_sem.dep_map, 1, _RET_IP_); >> are you saying the above is enough coupled with up_read_non_owner? >> Looking at how amdgpu is using this api... I think they just use non_owner >> version when doing things from different task. >> So I don't think pairing non_owner with non_owner is strictly necessary. >> It seems fine to use down_read_trylock() with above rwsem_release() hack >> plus up_read_non_owner(). >> Am I missing something? >> > The statement above is to clear the state for the lockdep so that it > knows that the task no longer owns the lock. Without doing that, there > is a possibility of some other kind of incorrect lockdep warning may be > produced because the code will still think the task hold a read-lock on > the mmap_sem. It is also possible no warning will be reported. > > The bpf code is kind of special that it acquires the mmap_sem. Later on, > it either releases it itself (non-NMI) or request irq_work to release it > (NMI). So either, you use the _non_owner versions for both acquire and > release or fake the release like the code segment above. > > Peter, please chime in if you have other suggestion. > Hi guys What are the plans to address the issue then ? Latest net tree exhibits this trace : Thanks [ 546.116982] ===================================== [ 546.121688] WARNING: bad unlock balance detected! [ 546.126393] 5.0.0-dbg-DEV #559 Not tainted [ 546.130491] ------------------------------------- [ 546.135196] taskset/15409 is trying to release lock (&mm->mmap_sem) at: [ 546.141844] [] do_up_read+0x16/0x30 [ 546.146919] but there are no more locks to release! [ 546.151790] other info that might help us debug this: [ 546.158325] 1 lock held by taskset/15409: [ 546.162325] #0: 00000000683db857 (&sig->cred_guard_mutex){+.+.}, at: __do_execve_file.isra.38+0x13e/0xbc0 [ 546.171978] stack backtrace: [ 546.176327] CPU: 0 PID: 15409 Comm: taskset Not tainted 5.0.0-dbg-DEV #559 [ 546.183214] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.54.0 06/07/2018 [ 546.190182] Call Trace: [ 546.192633] [ 546.194672] dump_stack+0x67/0x95 [ 546.198006] ? do_up_read+0x16/0x30 [ 546.201491] print_unlock_imbalance_bug.part.33+0xd0/0xd7 [ 546.206914] ? do_up_read+0x16/0x30 [ 546.210406] lock_release+0x213/0x2d0 [ 546.214088] up_read+0x20/0xa0 [ 546.217138] do_up_read+0x16/0x30 [ 546.220457] irq_work_run_list+0x4a/0x70 [ 546.224408] irq_work_run+0x18/0x40 [ 546.227911] smp_irq_work_interrupt+0x54/0x1d0 [ 546.232362] irq_work_interrupt+0xf/0x20 [ 546.236280] [ 546.238396] RIP: 0010:release_pages+0x69/0x650 [ 546.242839] Code: 01 4c 8b 2f 4c 8d 67 08 48 8d 44 f7 08 45 31 f6 48 89 04 24 eb 04 49 83 c4 08 48 8b 05 20 fe 31 01 49 39 c5 74 26 4d 8b 7d 08 <4d> 8d 47 ff 41 83 e7 01 4d 0f 44 c5 41 8b 40 34 4d 89 c7 85 c0 0f [ 546.261615] RSP: 0018:ffffac604732bb00 EFLAGS: 00000287 ORIG_RAX: ffffffffffffff09 [ 546.269190] RAX: ffffe4c9c0ca8000 RBX: 0000000000000020 RCX: 0000000000000006 [ 546.276352] RDX: 0000000000000006 RSI: ffff8cf7facb6aa8 RDI: ffff8cf7facb6240 [ 546.283482] RBP: ffffac604732bb70 R08: ffffe4c9c0ba3d80 R09: 0000000000000000 [ 546.290613] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8cf7db3292d8 [ 546.297752] R13: ffffe4c9c0ba3dc0 R14: 0000000000000000 R15: ffffe4c9c0ba3d88 [ 546.304907] free_pages_and_swap_cache+0xe6/0x100 [ 546.309618] tlb_flush_mmu_free+0x36/0x60 [ 546.313625] arch_tlb_finish_mmu+0x28/0x1e0 [ 546.317801] tlb_finish_mmu+0x23/0x30 [ 546.321459] exit_mmap+0xc9/0x1f0 [ 546.324787] ? __mutex_unlock_slowpath+0x11/0x2e0 [ 546.329502] mmput+0x62/0x130 [ 546.332481] flush_old_exec+0x60e/0x810 [ 546.336314] load_elf_binary+0x830/0x18c0 [ 546.340323] ? search_binary_handler+0x8a/0x200 [ 546.344848] search_binary_handler+0x99/0x200 [ 546.349221] __do_execve_file.isra.38+0x76d/0xbc0 [ 546.353918] __x64_sys_execve+0x39/0x50 [ 546.357757] do_syscall_64+0x5a/0x460 [ 546.361440] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 546.366489] RIP: 0033:0x7fd599e3c067 [ 546.370069] Code: Bad RIP value. [ 546.373306] RSP: 002b:00007ffcf7092e58 EFLAGS: 00000202 ORIG_RAX: 000000000000003b [ 546.380880] RAX: ffffffffffffffda RBX: 00007ffcf7093058 RCX: 00007fd599e3c067 [ 546.388010] RDX: 00007ffcf7093070 RSI: 00007ffcf7093058 RDI: 00007ffcf70937c0 [ 546.395132] RBP: 00007ffcf7092ec0 R08: 00000000ffffffff R09: 0000000001b1ae40 [ 546.402265] R10: 00007ffcf7092c80 R11: 0000000000000202 R12: 00007ffcf70937c0 [ 546.409385] R13: 00007ffcf7093070 R14: 0000000000000000 R15: 00007ffcf7093048