From: Jesper Dangaard Brouer
Subject: Re: [patch 031/147] mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg
To: David Hildenbrand, Vlastimil Babka, Jesper Dangaard Brouer, Andrew Morton,
 bigeasy@linutronix.de, cl@linux.com, efault@gmx.de, iamjoonsoo.kim@lge.com,
 jannh@google.com, linux-mm@kvack.org, mgorman@techsingularity.net,
 mm-commits@vger.kernel.org, penberg@kernel.org, quic_qiancai@quicinc.com,
 rientjes@google.com, tglx@linutronix.de, torvalds@linux-foundation.org
Cc: brouer@redhat.com
References: <20210908025436.dvsgeCXAh%akpm@linux-foundation.org>
 <1a8ecf24-dca4-54f2-cdbf-9135b856b773@redhat.com>
 <6524bba5-f737-3ab4-ee90-d6c70bac04f7@redhat.com>
In-Reply-To: <6524bba5-f737-3ab4-ee90-d6c70bac04f7@redhat.com>
Message-ID: <3a83989f-aa36-3aee-d92f-5ddc912d7fc5@redhat.com>
Date: Wed, 8 Sep 2021 19:14:36 +0200

On 08/09/2021 16.59, David Hildenbrand wrote:
> On 08.09.21 16:55, David Hildenbrand wrote:
>> On 08.09.21 15:58, Vlastimil Babka wrote:
>>> On 9/8/21 15:05, Jesper Dangaard Brouer wrote:
>>>>
>>>>
>>>> On 08/09/2021 04.54, Andrew Morton wrote:
>>>>> From: Vlastimil Babka
>>>>> Subject: mm, slub: protect put_cpu_partial() with disabled irqs
>>>>> instead of cmpxchg
>>>>>
>>>>> Jann Horn reported [1] the following theoretically possible race:
>>>>>
>>>>>     task A: put_cpu_partial() calls preempt_disable()
>>>>>     task A: oldpage = this_cpu_read(s->cpu_slab->partial)
>>>>>     interrupt: kfree() reaches unfreeze_partials() and discards the page
>>>>>     task B (on another CPU): reallocates page as page cache
>>>>>     task A: reads page->pages and page->pobjects,
>>>>>     which are actually halves of the pointer page->lru.prev
>>>>>     task B (on another CPU): frees page
>>>>>     interrupt: allocates page as SLUB page and places it on the
>>>>>     percpu partial list
>>>>>     task A: this_cpu_cmpxchg() succeeds
>>>>>
>>>>>     which would cause page->pages and page->pobjects to end up
>>>>>     containing halves of pointers that would then influence when
>>>>>     put_cpu_partial() happens and show up in root-only sysfs files.
>>>>>     Maybe that's acceptable, I don't know. But there should probably
>>>>>     at least be a comment for now to point out that we're reading
>>>>>     union fields of a page that might be in a completely different
>>>>>     state.
>>>>>
>>>>> Additionally, the this_cpu_cmpxchg() approach in put_cpu_partial()
>>>>> is only safe against s->cpu_slab->partial manipulation in
>>>>> ___slab_alloc() if the latter disables irqs, otherwise a
>>>>> __slab_free() in an irq handler could call put_cpu_partial() in the
>>>>> middle of ___slab_alloc() manipulating ->partial and corrupt it.
>>>>> This becomes an issue on RT after a local_lock is introduced in a
>>>>> later patch.  The fix means taking the local_lock also in
>>>>> put_cpu_partial() on RT.
>>>>>
>>>>> After debugging this issue, Mike Galbraith suggested [2] that to
>>>>> avoid different locking schemes on RT and !RT, we can just protect
>>>>> put_cpu_partial() with disabled irqs (to be converted to
>>>>> local_lock_irqsave() later) everywhere.  This should be acceptable
>>>>> as it's not a fast path, and moving the actual partial unfreezing
>>>>> outside of the irq disabled section makes it short, and with the
>>>>> retry loop gone the code can be also simplified.  In addition, the
>>>>> race reported by Jann should no longer be possible.
>>>>
>>>> Based on my microbench[0] measurement changing preempt_disable to
>>>> local_irq_save will cost us 11 cycles (TSC).  I'm not against the
>>>> change, I just want people to keep this in mind.
>>>
>>> OK, but this is not a fast path for every allocation/free, so it gets
>>> amortized. Also it eliminates a this_cpu_cmpxchg loop, and I'd expect
>>> cmpxchg to be expensive too?
>>>
>>>> On my E5-1650 v4 @ 3.60GHz:
>>>>    - preempt_disable(+enable)  cost: 11 cycles(tsc) 3.161 ns
>>>>    - local_irq_save (+restore) cost: 22 cycles(tsc) 6.331 ns
>>>>
>>>> Notice the non-save/restore variant is superfast:
>>>>    - local_irq_disable(+enable) cost: 6 cycles(tsc) 1.844 ns
>>>
>>> It actually surprises me that it's that cheap, and I would have
>>> expected changing the irq state to be the costly part, not the
>>> saving/restoring.
>>> Incidentally, would you know what's the cost of save+restore when the
>>> irqs are already disabled, so it's effectively a no-op?
>>
>> It surprises me as well. That would imply that protecting short RCU
>> sections using
>>
>> local_irq_disable
>> local_irq_enable
>>
>> instead of via
>>
>> preempt_disable
>> preempt_enable
>>
>> would actually be very beneficial.

Please don't draw this as a general conclusion.
As Linus has described in detail, the cost of IRQ disable/enable is very
micro-arch specific.

The preempt_disable/enable cost will likely be more stable/consistent
across micro-archs. Keep an eye on kernel config options when judging
preempt_disable/enable performance [1]

[1]
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c#L363-L367

>>
>> Are the numbers trustworthy? :)
>>
>
> .. and especially did the benchmark consider side effects of
> enabling/disabling interrupts (pipeline flushes etc ..)?
>

Of course not, this is a microbenchmark... they are by definition not
trustworthy :-P

-Jesper
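
P.S. for anyone who doesn't want to dig through [1]: below is a minimal,
self-contained sketch of this kind of cycle measurement as a kernel
module. It is only an illustration of the approach, not the actual
time_bench_sample.c code; it assumes x86_64 (where get_cycles() reads
the TSC), and the per-op numbers still include the loop overhead that a
real benchmark would subtract out. Load it with insmod and read the two
numbers from dmesg.

/*
 * bench_irq_cost.c - illustrative sketch only, NOT the benchmark in [1].
 *
 * Measures the average cost (in TSC cycles) of a
 * preempt_disable()/preempt_enable() pair vs. a
 * local_irq_save()/local_irq_restore() pair.
 *
 * Note: on a !CONFIG_PREEMPT kernel without preempt debugging,
 * preempt_disable()/enable() compile down to little more than
 * barrier()s, so the first number then mostly measures the loop itself;
 * this is exactly the config dependency mentioned above.
 */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/preempt.h>
#include <linux/irqflags.h>
#include <linux/timex.h>

#define LOOPS 10000000ULL

static int __init bench_irq_cost_init(void)
{
	unsigned long long i, delta;
	unsigned long flags;
	cycles_t start, stop;

	/* Cost of a preempt_disable()/preempt_enable() pair */
	start = get_cycles();
	for (i = 0; i < LOOPS; i++) {
		preempt_disable();
		preempt_enable();
	}
	stop = get_cycles();
	delta = stop - start;
	pr_info("preempt_disable+enable : %llu cycles(tsc) per op\n",
		delta / LOOPS);

	/* Cost of a local_irq_save()/local_irq_restore() pair */
	start = get_cycles();
	for (i = 0; i < LOOPS; i++) {
		local_irq_save(flags);
		local_irq_restore(flags);
	}
	stop = get_cycles();
	delta = stop - start;
	pr_info("local_irq_save+restore : %llu cycles(tsc) per op\n",
		delta / LOOPS);

	return 0;
}

static void __exit bench_irq_cost_exit(void)
{
}

module_init(bench_irq_cost_init);
module_exit(bench_irq_cost_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Sketch: preempt vs. irqsave cycle cost");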