Date: Thu, 27 Jan 2022 12:57:46 +0100
From: David Hildenbrand
Organization: Red Hat
To: Mike Kravetz, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Michal Hocko, Naoya Horiguchi, Axel Rasmussen, Peter Xu, Andrea Arcangeli, Mina Almasry, Shuah Khan, Andrew Morton
Subject: Re: [RFC PATCH 0/3] Add hugetlb MADV_DONTNEED support
In-Reply-To: <20220113180308.15610-1-mike.kravetz@oracle.com>
References: <20220113180308.15610-1-mike.kravetz@oracle.com>

On 13.01.22 19:03, Mike Kravetz wrote:
> Userfaultfd selftests for hugetlb does not perform UFFD_EVENT_REMAP
> testing. However, mremap support was recently added in commit
> 550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed
> vma"). While attempting to enable mremap support in the test, it was
> discovered that the mremap test indirectly depends on MADV_DONTNEED.
>
> hugetlb does not support MADV_DONTNEED. However, the only thing
> preventing support is a check in can_madv_lru_vma(). Simply removing
> the check will enable support.
>
> This is sent as a RFC because there is no existing use case calling
> for hugetlb MADV_DONTNEED support except possibly the userfaultfd test.
> However, adding support makes sense as it is fairly trivial and brings
> hugetlb functionality more in line with 'normal' memory.
>

Just a note:

QEMU doesn't use huge anonymous memory directly (MAP_ANON |
MAP_HUGE...) but instead always goes either via hugetlbfs or via memfd.

For MAP_PRIVATE hugetlb mappings, fallocate(FALLOC_FL_PUNCH_HOLE) seems
to get the job done (IOW: also discards private anon pages). See the
comments in the QEMU code below. I remember that that is somewhat
inconsistent. For ordinary MAP_PRIVATE mapped files I remember that we
always need fallocate(FALLOC_FL_PUNCH_HOLE) +
madvise(QEMU_MADV_DONTNEED) to make sure

a) All file pages are removed
b) All private anon pages are removed

IIRC hugetlbfs really is different in that regard, but maybe other fs
behave similarly.

That's why QEMU was able to live for now without MADV_DONTNEED support
for hugetlbfs and most probably won't ever need it.

...
    /* The logic here is messy;
     *    madvise DONTNEED fails for hugepages
     *    fallocate works on hugepages and shmem
     *    shared anonymous memory requires madvise REMOVE
     */
    need_madvise = (rb->page_size == qemu_host_page_size);
    need_fallocate = rb->fd != -1;
    if (need_fallocate) {
        /* For a file, this causes the area of the file to be zero'd
         * if read, and for hugetlbfs also causes it to be unmapped
         * so a userfault will trigger.
         */
#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
        ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                        start, length);
        if (ret) {
            ret = -errno;
            error_report("ram_block_discard_range: Failed to fallocate "
                         "%s:%" PRIx64 " +%zx (%d)",
                         rb->idstr, start, length, ret);
            goto err;
        }
#else
        ret = -ENOSYS;
        error_report("ram_block_discard_range: fallocate not available/file"
                     "%s:%" PRIx64 " +%zx (%d)",
                     rb->idstr, start, length, ret);
        goto err;
#endif
    }
    if (need_madvise) {
        /* For normal RAM this causes it to be unmapped,
         * for shared memory it causes the local mapping to disappear
         * and to fall back on the file contents (which we just
         * fallocate'd away).
         */
#if defined(CONFIG_MADVISE)
        if (qemu_ram_is_shared(rb) && rb->fd < 0) {
            ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
        } else {
            ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
        }
        if (ret) {
            ret = -errno;
            error_report("ram_block_discard_range: Failed to discard range "
                         "%s:%" PRIx64 " +%zx (%d)",
                         rb->idstr, start, length, ret);
            goto err;
        }
#else
...

-- 
Thanks,

David / dhildenb
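
For illustration only, the two-step discard described above for an ordinary
MAP_PRIVATE file mapping (punch a hole to drop the file pages, then
MADV_DONTNEED to drop the private anon pages) could be sketched roughly as
below. This is not QEMU code: discard_private_file_range() and
demo_discard() are hypothetical names, the helper assumes the mapping
starts at file offset 0, and the demo uses a plain memfd (shmem-backed)
rather than hugetlbfs, so it says nothing about hugetlb behaviour.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from QEMU): discard [offset, offset + len) of a
 * MAP_PRIVATE mapping of fd. Assumes 'base' maps the file from offset 0.
 * Returns 0 on success or -errno on failure.
 */
int discard_private_file_range(int fd, char *base, off_t offset, size_t len)
{
    assert(len > 0);

    /* a) Remove the file pages backing the range. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, (off_t)len))
        return -errno;

    /* b) Remove the private anon (COW) pages of this mapping. */
    if (madvise(base + offset, len, MADV_DONTNEED))
        return -errno;

    return 0;
}

/*
 * Hypothetical demo. Returns 0 if both pages read back as zero after the
 * discard, 1 if memfd_create() or hole punching is not supported here,
 * and a negative value on any other failure.
 */
int demo_discard(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = 2 * (size_t)page;
    int rc;

    int fd = memfd_create("discard-demo", 0);
    if (fd < 0)
        return errno == ENOSYS ? 1 : -errno;

    /* Page 0 gets real file content. */
    if (ftruncate(fd, (off_t)len) || pwrite(fd, "F", 1, 0) != 1) {
        rc = errno ? -errno : -EIO;
        close(fd);
        return rc;
    }

    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        rc = -errno;
        close(fd);
        return rc;
    }

    /* Writing through the private mapping makes page 1 a private anon page. */
    map[page] = 'P';

    rc = discard_private_file_range(fd, map, 0, len);
    if (rc == -EOPNOTSUPP || rc == -ENOSYS)
        rc = 1;                       /* environment lacks support */
    else if (rc == 0 && (map[0] != 0 || map[page] != 0))
        rc = -EIO;                    /* some page survived the discard */

    munmap(map, len);
    close(fd);
    return rc;
}
```

Calling demo_discard() from a small main() and checking for a return value
of 0 is enough to try this out. Skipping either of the two calls in the
helper would be a way to see which kind of page each one is responsible
for on a given filesystem.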