Subject: Re: [PATCH v8 2/9] mmap: make mlock_future_check() global
To: Mike Rapoport
Cc: Andrew Morton, Alexander Viro, Andy Lutomirski, Arnd Bergmann,
 Borislav Petkov, Catalin Marinas, Christopher Lameter, Dan Williams,
 Dave Hansen, Elena Reshetova, "H. Peter Anvin", Ingo Molnar,
 James Bottomley, "Kirill A. Shutemov", Matthew Wilcox, Mark Rutland,
 Mike Rapoport, Michael Kerrisk, Palmer Dabbelt, Paul Walmsley,
 Peter Zijlstra, Rick Edgecombe, Shuah Khan, Thomas Gleixner,
 Tycho Andersen, Will Deacon, linux-api@vger.kernel.org,
 linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org, x86@kernel.org
References: <20201112190827.GP4758@kernel.org>
 <7A16CA44-782D-4ABA-8D93-76BDD0A90F94@redhat.com>
 <20201115082625.GT4758@kernel.org>
From: David Hildenbrand
Organization: Red Hat GmbH
Date: Tue, 17 Nov 2020 16:09:39 +0100
In-Reply-To: <20201115082625.GT4758@kernel.org>

On 15.11.20 09:26, Mike Rapoport wrote:
> On Thu, Nov 12, 2020 at 09:15:18PM +0100, David Hildenbrand wrote:
>>
>>> On 12.11.2020 at 20:08, Mike Rapoport wrote:
>>>
>>> On Thu, Nov 12, 2020 at 05:22:00PM +0100, David Hildenbrand wrote:
>>>>> On 10.11.20 19:06, Mike Rapoport wrote:
>>>>> On Tue, Nov 10, 2020 at 06:17:26PM +0100, David Hildenbrand wrote:
>>>>>> On 10.11.20 16:14, Mike Rapoport wrote:
>>>>>>> From: Mike Rapoport
>>>>>>>
>>>>>>> It will be used by the upcoming secret memory implementation.
>>>>>>>
>>>>>>> Signed-off-by: Mike Rapoport
>>>>>>> ---
>>>>>>>  mm/internal.h | 3 +++
>>>>>>>  mm/mmap.c     | 5 ++---
>>>>>>>  2 files changed, 5 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/mm/internal.h b/mm/internal.h
>>>>>>> index c43ccdddb0f6..ae146a260b14 100644
>>>>>>> --- a/mm/internal.h
>>>>>>> +++ b/mm/internal.h
>>>>>>> @@ -348,6 +348,9 @@ static inline void munlock_vma_pages_all(struct vm_area_struct *vma)
>>>>>>>  extern void mlock_vma_page(struct page *page);
>>>>>>>  extern unsigned int munlock_vma_page(struct page *page);
>>>>>>> +extern int mlock_future_check(struct mm_struct *mm, unsigned long flags,
>>>>>>> +                              unsigned long len);
>>>>>>> +
>>>>>>>  /*
>>>>>>>   * Clear the page's PageMlocked(). This can be useful in a situation where
>>>>>>>   * we want to unconditionally remove a page from the pagecache -- e.g.,
>>>>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>>>>> index 61f72b09d990..c481f088bd50 100644
>>>>>>> --- a/mm/mmap.c
>>>>>>> +++ b/mm/mmap.c
>>>>>>> @@ -1348,9 +1348,8 @@ static inline unsigned long round_hint_to_min(unsigned long hint)
>>>>>>>          return hint;
>>>>>>>  }
>>>>>>> -static inline int mlock_future_check(struct mm_struct *mm,
>>>>>>> -                                     unsigned long flags,
>>>>>>> -                                     unsigned long len)
>>>>>>> +int mlock_future_check(struct mm_struct *mm, unsigned long flags,
>>>>>>> +                       unsigned long len)
>>>>>>>  {
>>>>>>>          unsigned long locked, lock_limit;
>>>>>>>
>>>>>>
>>>>>> So, an interesting question is if you actually want to charge secretmem
>>>>>> pages against mlock now, or if you want a dedicated secretmem cgroup
>>>>>> controller instead?
>>>>>
>>>>> Well, with the current implementation there are three limits an
>>>>> administrator can use to control secretmem limits: mlock, memcg and
>>>>> kernel parameter.
>>>>>
>>>>> The kernel parameter puts a global upper limit for secretmem usage,
>>>>> memcg accounts all secretmem allocations, including the unused memory in
>>>>> large pages caching and mlock allows per task limit for secretmem
>>>>> mappings, well, like mlock does.
>>>>>
>>>>> I didn't consider a dedicated cgroup, as it seems we already have enough
>>>>> existing knobs and a new one would be unnecessary.
>>>>
>>>> To me it feels like the mlock() limit is a wrong fit for secretmem. But
>>>> maybe there are other cases of using the mlock() limit without actually
>>>> doing mlock() that I am not aware of (most probably :) )?
>>>
>>> Secretmem does not explicitly call mlock() but it does what mlock()
>>> does and a bit more. Citing mlock(2):
>>>
>>>   mlock(), mlock2(), and mlockall() lock part or all of the calling
>>>   process's virtual address space into RAM, preventing that memory from
>>>   being paged to the swap area.
>>>
>>> So, based on that secretmem pages are not swappable, I think that
>>> RLIMIT_MEMLOCK is appropriate here.
>>>
>>
>> The page explicitly lists mlock() system calls.
>
> Well, it's mlock() man page, isn't it? ;-) ;)
>
> My thinking was that since secretmem does what mlock() does wrt
> swappability, it should at least obey the same limit, i.e.
> RLIMIT_MEMLOCK.

Right, but at least currently, it behaves like any other CMA allocation
(IIRC they are all unmovable and, therefore, not swappable). In the
future, if pages would be movable (but not swappable), I guess it might
make more sense. I assume we never ever want to swap secretmem.

"man getrlimit" states for RLIMIT_MEMLOCK:

"This is the maximum number of bytes of memory that may be
locked into RAM. [...] This limit affects
mlock(2), mlockall(2), and the mmap(2) MAP_LOCKED operation.
Since Linux 2.6.9, it also affects the shmctl(2) SHM_LOCK
operation [...]"

So that place has to be updated as well I guess?
Otherwise this might come as a surprise for users.

>
>> E.g., we also don't
>> account for gigantic pages - which might be allocated from CMA and are
>> not swappable.
>
> Do you mean gigantic pages in hugetlbfs?

Yes

> It seems to me that hugetlbfs accounting is a completely different
> story.

I'd say it is right now comparable to secretmem - which is why I thought
similar accounting would make sense.

--
Thanks,

David / dhildenb
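
For readers following the accounting question, here is a minimal user-space
sketch of what charging a mapping against RLIMIT_MEMLOCK could look like. It
is only a model: the thread shows just the signature of mlock_future_check()
and its two locals (locked, lock_limit), so the comparison below, the helper
names memlock_charge_check() and current_memlock_limit(), and the
has_ipc_lock parameter are all assumptions made for illustration, not the
kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical model of an RLIMIT_MEMLOCK-style check (not the kernel's
 * implementation). Assumption: the pages already locked plus the pages of
 * the new request are compared with the limit, and a privileged caller
 * (modelled by has_ipc_lock) may exceed it.
 */
static int memlock_charge_check(unsigned long locked_pages,
                                unsigned long len,
                                unsigned long limit_bytes,
                                unsigned int page_shift,
                                bool has_ipc_lock)
{
        unsigned long locked = locked_pages + (len >> page_shift);
        unsigned long lock_limit = limit_bytes >> page_shift;

        if (locked > lock_limit && !has_ipc_lock)
                return -EAGAIN;
        return 0;
}

/* Read the caller's soft RLIMIT_MEMLOCK (in bytes) via getrlimit(2). */
static int current_memlock_limit(unsigned long *limit_bytes)
{
        struct rlimit rl;

        assert(limit_bytes != NULL);
        if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
                return -errno;
        /* RLIM_INFINITY simply becomes a very large byte count here. */
        *limit_bytes = (unsigned long)rl.rlim_cur;
        return 0;
}
```

With a 64 KiB limit and 4 KiB pages (page_shift of 12), a task in this model
can hold 16 locked pages; a request that would bring the total to 17 gets
-EAGAIN unless has_ipc_lock is set. A user who never called mlock() but
created secretmem mappings would run into the same error, which is the
surprise mentioned above.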