From: David Hildenbrand <david@redhat.com>
To: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>,
linux-man@vger.kernel.org
Cc: Pankaj Gupta <pankaj.gupta@ionos.com>,
Alejandro Colomar <alx.manpages@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.com>,
Oscar Salvador <osalvador@suse.de>, Jann Horn <jannh@google.com>,
Mike Rapoport <rppt@kernel.org>,
Linux API <linux-api@vger.kernel.org>,
linux-mm@kvack.org
Subject: Re: [PATCH v2] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE
Date: Wed, 18 Aug 2021 10:35:24 +0200 [thread overview]
Message-ID: <70792f9c-ace1-6876-378b-5388f7948a60@redhat.com> (raw)
In-Reply-To: <d70ef542-051a-521f-807d-fa469b28e897@gmail.com>
On 17.08.21 23:42, Michael Kerrisk (man-pages) wrote:
> Hello David,
>
> Thank you for writing this! Could you please take
> a look at the comments below and revise?
Hi Michael,
thanks for your valuable input. Your feedback will certainly make this
easier to understand for people that are not heavily involved in MM work :)
[...]
>> man2/madvise.2 | 107 +++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 107 insertions(+)
>>
>> diff --git a/man2/madvise.2 b/man2/madvise.2
>> index f1f384c0c..f6cea9ad2 100644
>> --- a/man2/madvise.2
>> +++ b/man2/madvise.2
>> @@ -469,6 +469,72 @@ If a page is file-backed and dirty, it will be written back to the backing
>> storage.
>> The advice might be ignored for some pages in the range when it is not
>> applicable.
>> +.TP
>> +.BR MADV_POPULATE_READ " (since Linux 5.14)"
>> +Populate (prefault) page tables readable for the whole range without actually
>
> I have trouble to understand "Populate (prefault) page tables readable".
> Does it mean that it is just the page tables are being populated, and the
> PTEs are marked to indicate that the pages are readable? If yes, I
> think some rewording would help.
I actually tried phrasing it similar to our MAP_POPULATE documentation:
("Populate (prefault) page tables for a mapping.")
We will prefault all pages, faulting them in.
>
>> +reading memory.
>
> I don't understand "without actually reading memory"? Do you mean,
> "without actually faulting in the pages"; or something else?
"Populate (prefault) page tables readable, faulting in all pages in the
range just as if manually reading one byte of each page; however, avoid
the actual memory access that would have been performed after handling
the fault."
Does that make it clearer? (avoiding eventually touching the page at all
can be beneficial, especially when dealing with DAX memory where memory
access might be expensive)
>
>> +Depending on the underlying mapping,
>> +map the shared zeropage,
>> +preallocate memory or read the underlying file;
>> +files with holes might or might not preallocate blocks.
>> +Do not generate
>> +.B SIGBUS
>> +when populating fails,
>> +return an error instead.
>
> Better:
>
> [[
> If populating fails, a
> .B SIGBUS
> signal is not generated; instead, an error i returned.
> ]]
>
Sure, thanks.
>> +.IP
>> +If
>> +.B MADV_POPULATE_READ
>> +succeeds,
>> +all page tables have been populated (prefaulted) readable once.
>> +If
>> +.B MADV_POPULATE_READ
>> +fails,
>> +some page tables might have been populated.
>> +.IP
>> +.B MADV_POPULATE_READ
>> +cannot be applied to mappings without read permissions
>> +and special mappings,
>> +for example,
>> +marked with the kernel-internal
>
> s/marked/mappings marked/
>
>> +.B VM_PFNMAP
>> +and
>
> Just checking: should it be "and" or "or" here"?
>
> Looking at the EINVAL error below, I guess "or", and a better
> wording would be:
>
> [[
> ...for example, mappings marked with kernel-internal flags such as
> .B VMPPFNMAP
> or
> .BR BR_V_IO.
> ]]
Much better. Note that there might be more types of mappings that won't
work (e.g., initially also secretmem IIRC).
>
>> +.BR VM_IO .
>> +.IP
>> +Note that with
>> +.BR MADV_POPULATE_READ ,
>> +the process can be killed at any moment when the system runs out of memory.
>> +.TP
>> +.BR MADV_POPULATE_WRITE " (since Linux 5.14)"
>> +Populate (prefault) page tables writable for the whole range without actually
>
> I have trouble to understand "Populate (prefault) page tables writable".
> Does it mean that it is just the page tables are being populated, and the
> PTEs are marked to indicate that the pages are writable? If yes, I
> think some rewording would help.
>
>> +writing memory.
>
> I don't understand "without actually writing memory"? Do you mean,
> "without actually faulting in the pages"; or something else?
>
Similar to the other wording:
"Populate (prefault) page tables writable, faulting in all pages in the
range just as if manually writing one byte of each page; however, avoid
the actual memory access that would have been performed after handling
the fault."
>> +Depending on the underlying mapping,
>> +preallocate memory or read the underlying file;
>> +files with holes will preallocate blocks.
>> +Do not generate
>> +.B SIGBUS
>> +when populating fails,
>> +return an error instead.
>
> Better:
>
> [[
> If populating fails, a
> .B SIGBUS
> signal is not generated; instead, an error i returned.
> ]]
>
Ack.
> +.IP
>> +If
>> +.B MADV_POPULATE_WRITE
>> +succeeds,
>> +all page tables have been populated (prefaulted) writable once.
>> +If
>> +.B MADV_POPULATE_WRITE
>> +fails, some page tables might have been populated.
>> +.IP
>> +.B MADV_POPULATE_WRITE
>> +cannot be applied to mappings without write permissions
>> +and special mappings,
>> +for example,
>> +marked with the kernel-internal
>
> s/marked/mappings marked/
>
>> +.B VM_PFNMAP
>> +and
>
> Just checking: should it be "and" or "or" here"?
>
> Looking at the EINVAL error below, I guess "or", and a better
> wording would be:
>
> [[
> ...for example, mappings marked with kernel-internal flags such as
> .B VMPPFNMAP
> or
> .BR BR_V_IO.
> ]]
>
Ack.
>> +.BR VM_IO .
>> +.IP
>> +Note that with
>> +.BR MADV_POPULATE_WRITE ,
>> +the process can be killed at any moment when the system runs out of memory.
>> .SH RETURN VALUE
>> On success,
>> .BR madvise ()
>> @@ -490,6 +556,17 @@ A kernel resource was temporarily unavailable.
>> .B EBADF
>> The map exists, but the area maps something that isn't a file.
>> .TP
>> +.B EFAULT
>> +.I advice
>> +is
>> +.B MADV_POPULATE_READ
>> +or
>> +.BR MADV_POPULATE_WRITE ,
>> +and populating (prefaulting) page tables failed because a
>> +.B SIGBUS
>> +would have been generated on actual memory access and the reason is not a
>> +HW poisoned page.
>
> Maybe:
> s/.$/(see the description of MADV_HWPOISON in this page)./
> ?
>
Sure, we can add that. But note that MADV_HWPOISON is just one of many
ways to HWpoison a page.
>> +.TP
>> .B EINVAL
>> .I addr
>> is not page-aligned or
>> @@ -533,6 +610,18 @@ or
>> .BR VM_PFNMAP
>> ranges.
>> .TP
>> +.B EINVAL
>> +.I advice
>> +is
>> +.B MADV_POPULATE_READ
>> +or
>> +.BR MADV_POPULATE_WRITE ,
>> +but the specified address range includes ranges with insufficient permissions
>> +or incompatible mappings such as
>> +.B VM_IO
>> +or
>> +.BR VM_PFNMAP.
>
> s/.BR VM_PFNMAP./.BR VM_PFNMAP ./
>
Agreed.
>> +.TP
>> .B EIO
>> (for
>> .BR MADV_WILLNEED )
>> @@ -548,6 +637,15 @@ Not enough memory: paging in failed.
>> Addresses in the specified range are not currently
>> mapped, or are outside the address space of the process.
>> .TP
>> +.B ENOMEM
>> +.I advice
>> +is
>> +.B MADV_POPULATE_READ
>> +or
>> +.BR MADV_POPULATE_WRITE ,
>> +and populating (prefaulting) page tables failed because there was not enough
>> +memory.
>> +.TP
>> .B EPERM
>> .I advice
>> is
>> @@ -555,6 +653,15 @@ is
>> but the caller does not have the
>> .B CAP_SYS_ADMIN
>> capability.
>> +.TP
>> +.B EHWPOISON
>> +.I advice
>> +is
>> +.B MADV_POPULATE_READ
>> +or
>> +.BR MADV_POPULATE_WRITE ,
>> +and populating (prefaulting) page tables failed because a HW poisoned page
>> +was encountered.
>> .SH VERSIONS
>> Since Linux 3.18,
>> .\" commit d3ac21cacc24790eb45d735769f35753f5b56ceb
>
> Thanks,
>
Thanks a lot Michael!
--
Thanks,
David / dhildenb
next prev parent reply other threads:[~2021-08-18 8:35 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-16 8:19 [PATCH v2] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE David Hildenbrand
2021-08-17 21:42 ` Michael Kerrisk (man-pages)
2021-08-18 8:35 ` David Hildenbrand [this message]
2021-08-18 20:58 ` Michael Kerrisk (man-pages)
2021-08-19 18:38 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=70792f9c-ace1-6876-378b-5388f7948a60@redhat.com \
--to=david@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=alx.manpages@gmail.com \
--cc=jannh@google.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-man@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=mtk.manpages@gmail.com \
--cc=osalvador@suse.de \
--cc=pankaj.gupta@ionos.com \
--cc=rppt@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).