linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
To: David Hildenbrand <david@redhat.com>, linux-man@vger.kernel.org
Cc: mtk.manpages@gmail.com, Pankaj Gupta <pankaj.gupta@ionos.com>,
	Alejandro Colomar <alx.manpages@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>,
	Oscar Salvador <osalvador@suse.de>, Jann Horn <jannh@google.com>,
	Mike Rapoport <rppt@kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	linux-mm@kvack.org
Subject: Re: [PATCH v2] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE
Date: Wed, 18 Aug 2021 22:58:13 +0200	[thread overview]
Message-ID: <abec69cb-6db9-79ab-869e-907da573bfbc@gmail.com> (raw)
In-Reply-To: <70792f9c-ace1-6876-378b-5388f7948a60@redhat.com>

Hello David,

On 8/18/21 10:35 AM, David Hildenbrand wrote:
> On 17.08.21 23:42, Michael Kerrisk (man-pages) wrote:
>> Hello David,
>>
>> Thank you for writing this! Could you please take
>> a look at the comments below and revise?
> 
> Hi Michael,
> 
> thanks for your valuable input. Your feedback will certainly make this 
> easier to understand for people that are not heavily involved in MM work :)
> 
> [...]
> 
>>>   man2/madvise.2 | 107 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 107 insertions(+)
>>>
>>> diff --git a/man2/madvise.2 b/man2/madvise.2
>>> index f1f384c0c..f6cea9ad2 100644
>>> --- a/man2/madvise.2
>>> +++ b/man2/madvise.2
>>> @@ -469,6 +469,72 @@ If a page is file-backed and dirty, it will be written back to the backing
>>>   storage.
>>>   The advice might be ignored for some pages in the range when it is not
>>>   applicable.
>>> +.TP
>>> +.BR MADV_POPULATE_READ " (since Linux 5.14)"
>>> +Populate (prefault) page tables readable for the whole range without actually
>>
>> I have trouble to understand "Populate (prefault) page tables readable".
>> Does it mean that it is just the page tables are being populated, and the
>> PTEs are marked to indicate that the pages are readable? If yes, I
>> think some rewording would help.
> 
> I actually tried phrasing it similar to our MAP_POPULATE documentation:
> 	("Populate  (prefault)  page tables for a mapping.")

Yeah, well that description is a bit thin too :-}.

> We will prefault all pages, faulting them in.
>>
>>> +reading memory.
>>
>> I don't understand "without actually reading memory"? Do you mean,
>> "without actually  faulting in the pages"; or something else?
> 
> "Populate (prefault) page tables readable, faulting in all pages in the 
> range just as if manually reading one byte of each page; however, avoid 
> the actual memory access that would have been performed after handling 
> the fault."
> 
> Does that make it clearer? (avoiding eventually touching the page at all 
> can be beneficial, especially when dealing with DAX memory where memory 
> access might be expensive)

That text is much better. But, what's still not clear to me then is the
dfference between mmap(2) MAP_POPULATE, and MADV_POPULATE_READ and
MADV_POPULATE_WRITE. What is the differnece, and in what situations 
would one prefer one or the other approach? I think it would be helpful
if the manual page said something about these details.

>>> +Depending on the underlying mapping,
>>> +map the shared zeropage,
>>> +preallocate memory or read the underlying file;
>>> +files with holes might or might not preallocate blocks.
>>> +Do not generate
>>> +.B SIGBUS
>>> +when populating fails,
>>> +return an error instead.
>>
>> Better:
>>
>> [[
>> If populating fails, a
>> .B SIGBUS
>> signal is not generated; instead, an error i returned.
>> ]]
>>
> 
> Sure, thanks.
> 
>>> +.IP
>>> +If
>>> +.B MADV_POPULATE_READ
>>> +succeeds,
>>> +all page tables have been populated (prefaulted) readable once.
>>> +If
>>> +.B MADV_POPULATE_READ
>>> +fails,
>>> +some page tables might have been populated.
>>> +.IP
>>> +.B MADV_POPULATE_READ
>>> +cannot be applied to mappings without read permissions
>>> +and special mappings,
>>> +for example,
>>> +marked with the kernel-internal
>>
>> s/marked/mappings marked/
>>
>>> +.B VM_PFNMAP
>>> +and
>>
>> Just checking: should it be "and" or "or" here"?
>>
>> Looking at the EINVAL error below, I guess "or", and a better
>> wording would be:
>>
>> [[
>> ...for example, mappings marked with kernel-internal flags such as
>> .B VMPPFNMAP
>> or
>> .BR BR_V_IO.
>> ]]
> 
> Much better. Note that there might be more types of mappings that won't 
> work (e.g., initially also secretmem IIRC).

Ahh nice. Since there's about to be a memfd_secret() manual page, 
I suggest adding also "or secret memory regions created using
memfd_secret(2)".

>>> +.BR VM_IO .
>>> +.IP
>>> +Note that with
>>> +.BR MADV_POPULATE_READ ,
>>> +the process can be killed at any moment when the system runs out of memory.
>>> +.TP
>>> +.BR MADV_POPULATE_WRITE " (since Linux 5.14)"
>>> +Populate (prefault) page tables writable for the whole range without actually
>>
>> I have trouble to understand "Populate (prefault) page tables writable".
>> Does it mean that it is just the page tables are being populated, and the
>> PTEs are marked to indicate that the pages are writable? If yes, I
>> think some rewording would help.
>>
>>> +writing memory.
>>
>> I don't understand "without actually writing memory"? Do you mean,
>> "without actually  faulting in the pages"; or something else?
>>
> 
> Similar to the other wording:
> 
> "Populate (prefault) page tables writable, faulting in all pages in the 
> range just as if manually writing one byte of each page; however, avoid 
> the actual memory access that would have been performed after handling 
> the fault."

Much better, but see also my comments above re MADV_POPULATE_READ.

[...]

>>> +.B EFAULT
>>> +.I advice
>>> +is
>>> +.B MADV_POPULATE_READ
>>> +or
>>> +.BR MADV_POPULATE_WRITE ,
>>> +and populating (prefaulting) page tables failed because a
>>> +.B SIGBUS
>>> +would have been generated on actual memory access and the reason is not a
>>> +HW poisoned page.
>>
>> Maybe:
>> s/.$/(see the description of MADV_HWPOISON in this page)./
>> ?
>>
> 
> Sure, we can add that. But note that MADV_HWPOISON is just one of many 
> ways to HWpoison a page.

Then maybe something like:

"(HW poisoned pages can, for example, be created using the
MADV_HWPOISON flag described elsewhere in this page.)"

[...]

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


  reply	other threads:[~2021-08-18 20:58 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-16  8:19 [PATCH v2] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE David Hildenbrand
2021-08-17 21:42 ` Michael Kerrisk (man-pages)
2021-08-18  8:35   ` David Hildenbrand
2021-08-18 20:58     ` Michael Kerrisk (man-pages) [this message]
2021-08-19 18:38       ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=abec69cb-6db9-79ab-869e-907da573bfbc@gmail.com \
    --to=mtk.manpages@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=alx.manpages@gmail.com \
    --cc=david@redhat.com \
    --cc=jannh@google.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-man@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=osalvador@suse.de \
    --cc=pankaj.gupta@ionos.com \
    --cc=rppt@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).