From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8541DC282D7 for ; Thu, 31 Jan 2019 02:48:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 499AD2184D for ; Thu, 31 Jan 2019 02:48:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726431AbfAaCsc convert rfc822-to-8bit (ORCPT ); Wed, 30 Jan 2019 21:48:32 -0500 Received: from mx1.redhat.com ([209.132.183.28]:42066 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725535AbfAaCsb (ORCPT ); Wed, 30 Jan 2019 21:48:31 -0500 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 60714A0484; Thu, 31 Jan 2019 02:48:30 +0000 (UTC) Received: from llong.remote.csb (ovpn-120-78.rdu2.redhat.com [10.10.120.78]) by smtp.corp.redhat.com (Postfix) with ESMTP id 395355D97A; Thu, 31 Jan 2019 02:48:28 +0000 (UTC) Subject: Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap To: Alexei Starovoitov Cc: Peter Zijlstra , Alexei Starovoitov , davem@davemloft.net, daniel@iogearbox.net, edumazet@google.com, jannh@google.com, netdev@vger.kernel.org, kernel-team@fb.com References: <20190130040458.2544340-1-ast@kernel.org> <20190130040458.2544340-3-ast@kernel.org> <20190130101530.GE2278@hirez.programming.kicks-ass.net> <20190130193040.3hmh6g7efk5z3g2j@ast-mbp.dhcp.thefacebook.com> <9246a76f-e8d4-66fc-d901-374169c3e709@redhat.com> <20190130201047.hscaxd7434hrznxx@ast-mbp.dhcp.thefacebook.com> <20190131020150.zbys2r6me7em2jnl@ast-mbp.dhcp.thefacebook.com> From: Waiman Long Openpgp: preference=signencrypt Autocrypt: addr=longman@redhat.com; prefer-encrypt=mutual; keydata= xsFNBFgsZGsBEAC3l/RVYISY3M0SznCZOv8aWc/bsAgif1H8h0WPDrHnwt1jfFTB26EzhRea XQKAJiZbjnTotxXq1JVaWxJcNJL7crruYeFdv7WUJqJzFgHnNM/upZuGsDIJHyqBHWK5X9ZO jRyfqV/i3Ll7VIZobcRLbTfEJgyLTAHn2Ipcpt8mRg2cck2sC9+RMi45Epweu7pKjfrF8JUY r71uif2ThpN8vGpn+FKbERFt4hW2dV/3awVckxxHXNrQYIB3I/G6mUdEZ9yrVrAfLw5M3fVU CRnC6fbroC6/ztD40lyTQWbCqGERVEwHFYYoxrcGa8AzMXN9CN7bleHmKZrGxDFWbg4877zX 0YaLRypme4K0ULbnNVRQcSZ9UalTvAzjpyWnlnXCLnFjzhV7qsjozloLTkZjyHimSc3yllH7 VvP/lGHnqUk7xDymgRHNNn0wWPuOpR97J/r7V1mSMZlni/FVTQTRu87aQRYu3nKhcNJ47TGY evz/U0ltaZEU41t7WGBnC7RlxYtdXziEn5fC8b1JfqiP0OJVQfdIMVIbEw1turVouTovUA39 Qqa6Pd1oYTw+Bdm1tkx7di73qB3x4pJoC8ZRfEmPqSpmu42sijWSBUgYJwsziTW2SBi4hRjU h/Tm0NuU1/R1bgv/EzoXjgOM4ZlSu6Pv7ICpELdWSrvkXJIuIwARAQABzR9Mb25nbWFuIExv bmcgPGxsb25nQHJlZGhhdC5jb20+wsF/BBMBAgApBQJYLGRrAhsjBQkJZgGABwsJCAcDAgEG FQgCCQoLBBYCAwECHgECF4AACgkQbjBXZE7vHeYwBA//ZYxi4I/4KVrqc6oodVfwPnOVxvyY oKZGPXZXAa3swtPGmRFc8kGyIMZpVTqGJYGD9ZDezxpWIkVQDnKM9zw/qGarUVKzElGHcuFN ddtwX64yxDhA+3Og8MTy8+8ZucM4oNsbM9Dx171bFnHjWSka8o6qhK5siBAf9WXcPNogUk4S fMNYKxexcUayv750GK5E8RouG0DrjtIMYVJwu+p3X1bRHHDoieVfE1i380YydPd7mXa7FrRl 7unTlrxUyJSiBc83HgKCdFC8+ggmRVisbs+1clMsK++ehz08dmGlbQD8Fv2VK5KR2+QXYLU0 rRQjXk/gJ8wcMasuUcywnj8dqqO3kIS1EfshrfR/xCNSREcv2fwHvfJjprpoE9tiL1qP7Jrq 4tUYazErOEQJcE8Qm3fioh40w8YrGGYEGNA4do/jaHXm1iB9rShXE2jnmy3ttdAh3M8W2OMK 4B/Rlr+Awr2NlVdvEF7iL70kO+aZeOu20Lq6mx4Kvq/WyjZg8g+vYGCExZ7sd8xpncBSl7b3 99AIyT55HaJjrs5F3Rl8dAklaDyzXviwcxs+gSYvRCr6AMzevmfWbAILN9i1ZkfbnqVdpaag QmWlmPuKzqKhJP+OMYSgYnpd/vu5FBbc+eXpuhydKqtUVOWjtp5hAERNnSpD87i1TilshFQm TFxHDzbOwU0EWCxkawEQALAcdzzKsZbcdSi1kgjfce9AMjyxkkZxcGc6Rhwvt78d66qIFK9D Y9wfcZBpuFY/AcKEqjTo4FZ5LCa7/dXNwOXOdB1Jfp54OFUqiYUJFymFKInHQYlmoES9EJEU yy+2ipzy5yGbLh3ZqAXyZCTmUKBU7oz/waN7ynEP0S0DqdWgJnpEiFjFN4/ovf9uveUnjzB6 lzd0BDckLU4dL7aqe2ROIHyG3zaBMuPo66pN3njEr7IcyAL6aK/IyRrwLXoxLMQW7YQmFPSw drATP3WO0x8UGaXlGMVcaeUBMJlqTyN4Swr2BbqBcEGAMPjFCm6MjAPv68h5hEoB9zvIg+fq M1/Gs4D8H8kUjOEOYtmVQ5RZQschPJle95BzNwE3Y48ZH5zewgU7ByVJKSgJ9HDhwX8Ryuia 79r86qZeFjXOUXZjjWdFDKl5vaiRbNWCpuSG1R1Tm8o/rd2NZ6l8LgcK9UcpWorrPknbE/pm MUeZ2d3ss5G5Vbb0bYVFRtYQiCCfHAQHO6uNtA9IztkuMpMRQDUiDoApHwYUY5Dqasu4ZDJk bZ8lC6qc2NXauOWMDw43z9He7k6LnYm/evcD+0+YebxNsorEiWDgIW8Q/E+h6RMS9kW3Rv1N qd2nFfiC8+p9I/KLcbV33tMhF1+dOgyiL4bcYeR351pnyXBPA66ldNWvABEBAAHCwWUEGAEC AA8FAlgsZGsCGwwFCQlmAYAACgkQbjBXZE7vHeYxSQ/+PnnPrOkKHDHQew8Pq9w2RAOO8gMg 9Ty4L54CsTf21Mqc6GXj6LN3WbQta7CVA0bKeq0+WnmsZ9jkTNh8lJp0/RnZkSUsDT9Tza9r GB0svZnBJMFJgSMfmwa3cBttCh+vqDV3ZIVSG54nPmGfUQMFPlDHccjWIvTvyY3a9SLeamaR jOGye8MQAlAD40fTWK2no6L1b8abGtziTkNh68zfu3wjQkXk4kA4zHroE61PpS3oMD4AyI9L 7A4Zv0Cvs2MhYQ4Qbbmafr+NOhzuunm5CoaRi+762+c508TqgRqH8W1htZCzab0pXHRfywtv 0P+BMT7vN2uMBdhr8c0b/hoGqBTenOmFt71tAyyGcPgI3f7DUxy+cv3GzenWjrvf3uFpxYx4 yFQkUcu06wa61nCdxXU/BWFItryAGGdh2fFXnIYP8NZfdA+zmpymJXDQeMsAEHS0BLTVQ3+M 7W5Ak8p9V+bFMtteBgoM23bskH6mgOAw6Cj/USW4cAJ8b++9zE0/4Bv4iaY5bcsL+h7TqQBH Lk1eByJeVooUa/mqa2UdVJalc8B9NrAnLiyRsg72Nurwzvknv7anSgIkL+doXDaG21DgCYTD wGA5uquIgb8p3/ENgYpDPrsZ72CxVC2NEJjJwwnRBStjJOGQX4lV1uhN1XsZjBbRHdKF2W9g weim8xU= Organization: Red Hat Message-ID: Date: Wed, 30 Jan 2019 21:48:27 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <20190131020150.zbys2r6me7em2jnl@ast-mbp.dhcp.thefacebook.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8BIT Content-Language: en-US X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Thu, 31 Jan 2019 02:48:30 +0000 (UTC) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On 01/30/2019 09:01 PM, Alexei Starovoitov wrote: > On Wed, Jan 30, 2019 at 04:32:12PM -0500, Waiman Long wrote: >> On 01/30/2019 04:11 PM, Waiman Long wrote: >>> On 01/30/2019 03:10 PM, Alexei Starovoitov wrote: >>>> On Wed, Jan 30, 2019 at 02:42:23PM -0500, Waiman Long wrote: >>>>> On 01/30/2019 02:30 PM, Alexei Starovoitov wrote: >>>>>> On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote: >>>>>>> On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote: >>>>>>>> Lockdep warns about false positive: >>>>>>> This is not a false positive, and you probably also need to use >>>>>>> down_read_non_owner() to match this up_read_non_owner(). >>>>>>> >>>>>>> {up,down}_read() and {up,down}_read_non_owner() are not only different >>>>>>> in the lockdep annotation; there is also optimistic spin stuff that >>>>>>> relies on 'owner' tracking. >>>>>> Can you point out in the code the spin bit? >>>>>> As far as I can see sem->owner is debug only feature. >>>>>> All owner checks are done under CONFIG_DEBUG_RWSEMS. >>>>> No, sem->owner is mainly for performing optimistic spinning which is a >>>>> performance feature to make rwsem writer-lock performs similar to mutex. >>>>> The debugging part is just an add-on. It is not the reason for the >>>>> presence of sem->owner. >>>> I see. Got it. >>>> >>>>>> Also there is no down_read_trylock_non_owner() at the moment. >>>>>> We can argue about it for -next, but I'd rather silence lockdep >>>>>> with this patch today. >>>>>> >>>>> We can add down_read_trylock_non_owner() if there is a need for it. It >>>>> should be easy to do. >>>> Yes, but looking through the code it's not clear to me that it's safe >>>> to mix non_owner() versions with regular. >>>> bpf/stackmap.c does down_read_trylock + up_read. >>>> If we add new down_read_trylock_non_owner that set the owner to >>>> NULL | RWSEM_* bits is this safe with conccurent read/write >>>> that do regular versions? >>>> rwsem_can_spin_on_owner() does: >>>> if (owner) { >>>> ret = is_rwsem_owner_spinnable(owner) && >>>> owner_on_cpu(owner); >>>> that looks correct. >>>> For a second I thought there could be fault here due to non_owner. >>>> But there could be other places where it's assumed that owner >>>> is never null? >>> The content of owner is not the cause of the lockdep warning. The >>> lockdep code assumes that the task that acquires the lock will release >>> it some time later. That is not the case when you need to acquire the >>> lock by one task and released by another. In this case, you have to use >>> the non_owner version of down/up_read which disable the lockdep >>> acquire/release tracking. That will be the only difference between the >>> two set of APIs. >>> >>> Cheers, >>> Longman >> BTW, you may want to do something like that to make sure that the lock >> ownership is probably tracked. >> >> -Longman >> >> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c >> index d43b145..79eef9d 100644 >> --- a/kernel/bpf/stackmap.c >> +++ b/kernel/bpf/stackmap.c >> @@ -338,6 +338,13 @@ static void stack_map_get_build_id_offset(struct >> bpf_stack_ >>         } else { >>                 work->sem = ¤t->mm->mmap_sem; >>                 irq_work_queue(&work->irq_work); >> + >> +               /* >> +                * The irq_work will release the mmap_sem with >> +                * up_read_non_owner(). The rwsem_release() is called >> +                * here to release the lock from lockdep's perspective. >> +                */ >> +               rwsem_release(¤t->mm->mmap_sem.dep_map, 1, _RET_IP_); > are you saying the above is enough coupled with up_read_non_owner? > Looking at how amdgpu is using this api... I think they just use non_owner > version when doing things from different task. > So I don't think pairing non_owner with non_owner is strictly necessary. > It seems fine to use down_read_trylock() with above rwsem_release() hack > plus up_read_non_owner(). > Am I missing something? > The statement above is to clear the state for the lockdep so that it knows that the task no longer owns the lock. Without doing that, there is a possibility of some other kind of incorrect lockdep warning may be produced because the code will still think the task hold a read-lock on the mmap_sem. It is also possible no warning will be reported. The bpf code is kind of special that it acquires the mmap_sem. Later on, it either releases it itself (non-NMI) or request irq_work to release it (NMI). So either, you use the _non_owner versions for both acquire and release or fake the release like the code segment above. Peter, please chime in if you have other suggestion. Cheers, Longman