Date: Fri, 8 Apr 2022 17:56:55 +0000
From: Sean Christopherson
To: Andy Lutomirski
Cc: Chao Peng, kvm list, Linux Kernel Mailing List, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, Linux API, qemu-devel@nongnu.org,
 Paolo Bonzini, Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 the arch/x86 maintainers, "H. Peter Anvin", Hugh Dickins, Jeff Layton,
 "J. Bruce Fields", Andrew Morton, Mike Rapoport, Steven Price,
 "Maciej S. Szmigiero", Vlastimil Babka, Vishal Annapurve, Yu Zhang, "Kirill A.
Shutemov" , "Nakajima, Jun" , Dave Hansen , Andi Kleen , David Hildenbrand Subject: Re: [PATCH v5 04/13] mm/shmem: Restrict MFD_INACCESSIBLE memory against RLIMIT_MEMLOCK Message-ID: References: <20220310140911.50924-1-chao.p.peng@linux.intel.com> <20220310140911.50924-5-chao.p.peng@linux.intel.com> <02e18c90-196e-409e-b2ac-822aceea8891@www.fastmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <02e18c90-196e-409e-b2ac-822aceea8891@www.fastmail.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Apr 07, 2022, Andy Lutomirski wrote: > > On Thu, Apr 7, 2022, at 9:05 AM, Sean Christopherson wrote: > > On Thu, Mar 10, 2022, Chao Peng wrote: > >> Since page migration / swapping is not supported yet, MFD_INACCESSIBLE > >> memory behave like longterm pinned pages and thus should be accounted to > >> mm->pinned_vm and be restricted by RLIMIT_MEMLOCK. > >> > >> Signed-off-by: Chao Peng > >> --- > >> mm/shmem.c | 25 ++++++++++++++++++++++++- > >> 1 file changed, 24 insertions(+), 1 deletion(-) > >> > >> diff --git a/mm/shmem.c b/mm/shmem.c > >> index 7b43e274c9a2..ae46fb96494b 100644 > >> --- a/mm/shmem.c > >> +++ b/mm/shmem.c > >> @@ -915,14 +915,17 @@ static void notify_fallocate(struct inode *inode, pgoff_t start, pgoff_t end) > >> static void notify_invalidate_page(struct inode *inode, struct folio *folio, > >> pgoff_t start, pgoff_t end) > >> { > >> -#ifdef CONFIG_MEMFILE_NOTIFIER > >> struct shmem_inode_info *info = SHMEM_I(inode); > >> > >> +#ifdef CONFIG_MEMFILE_NOTIFIER > >> start = max(start, folio->index); > >> end = min(end, folio->index + folio_nr_pages(folio)); > >> > >> memfile_notifier_invalidate(&info->memfile_notifiers, start, end); > >> #endif > >> + > >> + if (info->xflags & SHM_F_INACCESSIBLE) > >> + atomic64_sub(end - start, ¤t->mm->pinned_vm); > > > > As Vishal's to-be-posted selftest discovered, this is broken as current->mm > > may be NULL. Or it may be a completely different mm, e.g. AFAICT there's > > nothing that prevents a different process from punching hole in the shmem > > backing. > > > > How about just not charging the mm in the first place? There’s precedent: > ramfs and hugetlbfs (at least sometimes — I’ve lost track of the current > status). > > In any case, for an administrator to try to assemble the various rlimits into > a coherent policy is, and always has been, quite messy. ISTM cgroup limits, > which can actually add across processes usefully, are much better. > > So, aside from the fact that these fds aren’t in a filesystem and are thus > available by default, I’m not convinced that this accounting is useful or > necessary. > > Maybe we could just have some switch require to enable creation of private > memory in the first place, and anyone who flips that switch without > configuring cgroups is subject to DoS. I personally have no objection to that, and I'm 99% certain Google doesn't rely on RLIMIT_MEMLOCK.