From: David Hildenbrand <david@redhat.com>
To: Chih-En Lin <shiyn.lin@gmail.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	John Hubbard <jhubbard@nvidia.com>, Nadav Amit <namit@vmware.com>,
	Barry Song <baohua@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Yang Shi <shy828301@gmail.com>, Peter Xu <peterx@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Zach O'Keefe <zokeefe@google.com>,
	Yun Zhou <yun.zhou@windriver.com>,
	Hugh Dickins <hughd@google.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Yu Zhao <yuzhao@google.com>, Juergen Gross <jgross@suse.com>,
	Tong Tiangen <tongtiangen@huawei.com>,
	Liu Shixin <liushixin2@huawei.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Li kunyu <kunyu@nfschina.com>, Minchan Kim <minchan@kernel.org>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Gautam Menghani <gautammenghani201@gmail.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Mark Brown <broonie@kernel.org>, Will Deacon <will@kernel.org>,
	Vincenzo Frascino <Vincenzo.Frascino@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Andy Lutomirski <luto@kernel.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Fenghua Yu <fenghua.yu@intel.com>,
	Andrei Vagin <avagin@gmail.com>, Barret Rhoden <brho@google.com>,
	Michal Hocko <mhocko@suse.com>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>,
	Alexey Gladkov <legion@kernel.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
	linux-perf-users@vger.kernel.org,
	Dinglan Peng <peng301@purdue.edu>,
	Pedro Fonseca <pfonseca@purdue.edu>,
	Jim Huang <jserv@ccns.ncku.edu.tw>,
	Huichun Feng <foxhoundsk.tw@gmail.com>
Subject: Re: [PATCH v4 00/14] Introduce Copy-On-Write to Page Table
Date: Tue, 14 Feb 2023 18:59:50 +0100
Message-ID: <28f1e75a-a1fc-a172-3628-83575e387f9a@redhat.com>
In-Reply-To: <Y+vK3tXWHCgTC8qk@strix-laptop>

On 14.02.23 18:54, Chih-En Lin wrote:
>>>
>>>> (2) break_cow_pte() can fail, which means that we can fail some
>>>>       operations (possibly silently halfway through) now. For example,
>>>>       looking at your change_pte_range() change, I suspect it's wrong.
>>>
>>> Maybe I should add WARN_ON() and skip the failed COW PTE.
>>
>> One way or the other we'll have to handle it. WARN_ON() sounds wrong for
>> handling OOM situations (e.g., if only that cgroup is OOM).
> 
> Or we should do the same thing like you mentioned:
> "
> For example, __split_huge_pmd() is currently not able to report a
> failure. I assume that we could sleep in there. And if we're not able to
> allocate any memory in there (with sleeping), maybe the process should
> be zapped either way by the OOM killer.
> "
> 
> But instead of zapping the process, we just skip the failed COW PTE.
> I don't think the user will expect their process to be killed by
> changing the protection.

The process is consuming more memory than it is capable of consuming. 
The process most probably would have died earlier without the PTE 
optimization.

But yeah, it all gets tricky ...
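
To make the "handle it one way or the other" point concrete, here is a minimal userspace sketch of propagating the failure instead of using WARN_ON() and skipping. It is not code from the patch series: struct toy_chunk, toy_break_cow() and toy_change_range() are hypothetical names, and the allocation failure is simulated with a flag.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one page-table-sized chunk of a range. */
struct toy_chunk {
	bool shared;      /* page table is still shared (COW-ed) */
	bool alloc_fails; /* simulate an allocation failure while unsharing */
	int prot;         /* protection currently applied to the chunk */
};

/* Toy analogue of a fallible "break COW PTE" step. */
static int toy_break_cow(struct toy_chunk *c)
{
	if (!c->shared)
		return 0;
	if (c->alloc_fails)
		return -ENOMEM;
	c->shared = false;
	return 0;
}

/*
 * Apply new_prot to each chunk in order. On the first failure, stop and
 * return the error; *done tells the caller how many chunks were already
 * updated, so a partial failure is visible rather than silent.
 */
static int toy_change_range(struct toy_chunk *chunks, size_t n,
			    int new_prot, size_t *done)
{
	*done = 0;
	for (size_t i = 0; i < n; i++) {
		int err = toy_break_cow(&chunks[i]);

		if (err)
			return err;
		chunks[i].prot = new_prot;
		(*done)++;
	}
	return 0;
}
```

Whether the caller should then undo the chunks already changed, retry, or leave it to the OOM killer is exactly the open question above; the sketch only shows that the error has to reach someone who can decide.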

> 
>>>
>>>> (3) handle_cow_pte_fault() looks quite complicated and needs quite some
>>>>       double-checking: we temporarily clear the PMD, to reset it
>>>>       afterwards. I am not sure if that is correct. For example, what
>>>>       stops another page fault stumbling over that pmd_none() and
>>>>       allocating an empty page table? Maybe there are some locking details
>>>>       missing or they are very subtle such that we better document them. I
>>>>      recall that THP played quite some tricks to make such cases work ...
>>>
>>> I think that holding mmap_write_lock may be enough (I added
>>> mmap_assert_write_locked() in the fault function btw). But, I might
>>> be wrong. I will look at the THP stuff to see how they work. Thanks.
>>>
>>
>> Ehm, but page faults don't hold the mmap lock writable? And neither do
>> other callers, like MADV_DONTNEED or MADV_FREE.
>>
>> handle_pte_fault()->handle_pte_fault()->mmap_assert_write_locked() should
>> bail out.
>>
>> Either I am missing something or you didn't test with lockdep enabled :)
> 
> You're right. I thought I had enabled lockdep.
> And I don't know why I had it in my mind that the page fault takes the
> mmap lock for writing. The page fault holds the mmap lock for reading,
> not writing.
> ;-)
> 
> I should check/test all the locks again.
> Thanks.

Note that we have other ways of traversing page tables, notably the
rmap, which does not hold the mmap lock. Not sure if there are similar
issues when suddenly finding no page table where there logically should
be one, or when a page table gets replaced and modified while rmap code
still walks the shared copy. Hm.
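
The "no page table where there logically should be one" case, like the pmd_none() window from point (3), can be replayed as a single-threaded toy. This is a sketch under loud assumptions: none of these names exist in the kernel or the patch, no locking is modelled, and a NULL pointer stands in for pmd_none().

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_table { int id; };

/* A NULL table pointer models pmd_none(). */
struct toy_pmd { struct toy_table *table; };

/* Step 1 of a hypothetical COW handler: clear the entry, keep the old table. */
static struct toy_table *toy_cow_clear(struct toy_pmd *pmd)
{
	struct toy_table *old = pmd->table;

	pmd->table = NULL;
	return old;
}

/* A second walker that sees an empty entry and installs a fresh table. */
static bool toy_fault_populate(struct toy_pmd *pmd, struct toy_table *fresh)
{
	if (pmd->table)
		return false;
	pmd->table = fresh;
	return true;
}

/*
 * Step 2 of the hypothetical COW handler: put the old table back.
 * Returns false if somebody filled the entry in the meantime.
 */
static bool toy_cow_restore(struct toy_pmd *pmd, struct toy_table *old)
{
	if (pmd->table)
		return false;
	pmd->table = old;
	return true;
}
```

Calling toy_cow_clear(), then toy_fault_populate(), then toy_cow_restore() reproduces the bad interleaving: the restore finds the entry already occupied by an empty table. Something (a lock, or a marker entry that is not "none") has to close that window, and it has to cover walkers that do not take the mmap lock.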

-- 
Thanks,

David / dhildenb


