linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Chih-En Lin <shiyn.lin@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Christian Brauner <brauner@kernel.org>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	William Kucharski <william.kucharski@oracle.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Yunsheng Lin <linyunsheng@huawei.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Suren Baghdasaryan <surenb@google.com>,
	Colin Cross <ccross@google.com>, Feng Tang <feng.tang@intel.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Mike Rapoport <rppt@kernel.org>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Daniel Axtens <dja@axtens.net>,
	Jonathan Marek <jonathan@marek.ca>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Peter Xu <peterx@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andy Lutomirski <luto@kernel.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Fenghua Yu <fenghua.yu@intel.com>,
	linux-kernel@vger.kernel.org, Kaiyang Zhao <zhao776@purdue.edu>,
	Huichun Feng <foxhoundsk.tw@gmail.com>,
	Jim Huang <jserv.tw@gmail.com>
Subject: Re: [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table
Date: Sat, 21 May 2022 22:28:59 +0200	[thread overview]
Message-ID: <c931c9dc-c0ed-05f3-7364-a06088ca7754@redhat.com> (raw)
In-Reply-To: <20220521185004.GA1543057@strix-laptop>

On 21.05.22 20:50, Chih-En Lin wrote:
> On Sat, May 21, 2022 at 06:07:27PM +0200, David Hildenbrand wrote:
>> On 19.05.22 20:31, Chih-En Lin wrote:
>>> When creating the user process, it usually uses the Copy-On-Write (COW)
>>> mechanism to save the memory usage and the cost of time for copying.
>>> COW defers the work of copying private memory and shares it across the
>>> processes as read-only. If either process wants to write in these
>>> memories, it will page fault and copy the shared memory, so the process
>>> will now get its private memory right here, which is called break COW.
>>
>> Yes. Lately we've been dealing with advanced COW+GUP pinnings (which
>> resulted in PageAnonExclusive, which should hit upstream soon), and
>> hearing about COW of page tables (and wondering how it will interact
>> with the mapcount, refcount, PageAnonExclusive of anonymous pages) makes
>> me feel a bit uneasy :)
> 
> I saw the series patch of this and knew how complicated handling COW of
> the physical page was [1][2][3][4]. So the COW page table will tend to
> restrict the sharing only to the page table. This means any modification
> to the physical page will trigger the break COW of page table.
> 
> Presently implementation will only update the physical page information
> to the RSS of the owner process of COW PTE. Generally owner is the
> parent process. And the state of the page, like refcount and mapcount,
> will not change under the COW page table.
> 
> But if any situations will lead to the COW page table needs to consider
> the state of physical page, it might be fretful. ;-)

I haven't looked into the details of how GUP deals with these COW page
tables. But I suspect there might be problems with page pinning:
skipping copy_present_page() even for R/O pages is usually problematic
with R/O pinnings of pages. I might be just wrong.

> 
>>>
>>> Presently this kind of technology is only used as the mapping memory.
>>> It still needs to copy the entire page table from the parent.
>>> It might cost a lot of time and memory to copy each page table when the
>>> parent already has a lot of page tables allocated. For example, here is
>>> the state table for mapping the 1 GB memory of forking.
>>>
>>> 	    mmap before fork         mmap after fork
>>> MemTotal:       32746776 kB             32746776 kB
>>> MemFree:        31468152 kB             31463244 kB
>>> AnonPages:       1073836 kB              1073628 kB
>>> Mapped:            39520 kB                39992 kB
>>> PageTables:         3356 kB                 5432 kB
>>
>>
>> I'm missing the most important point: why do we care and why should we
>> care to make our COW/fork implementation even more complicated?
>>
>> Yes, we might save some page tables and we might reduce the fork() time,
>> however, which specific workload really benefits from this and why do we
>> really care about that workload? Without even hearing about an example
>> user in this cover letter (unless I missed it), I naturally wonder about
>> relevance in practice.
>>
>> I assume it really only matters if we fork() realtively large processes,
>> like databases for snapshotting. However, fork() is already a pretty
>> sever performance hit due to COW, and there are alternatives getting
>> developed as a replacement for such use cases (e.g., uffd-wp).
>>
>> I'm also missing a performance evaluation: I'd expect some simple
>> workloads that use fork() might be even slower after fork() with this
>> change.
>>
> 
> The paper mentioned a list of benchmarks of the time cost for On-Demand
> fork. For example, on Redis, the meantime of fork when taking the
> snapshot. Default fork() got 7.40 ms; On-demand Fork (COW PTE table) got
> 0.12 ms. But there are some other cases, like the Response latency
> distribution of Apache HTTP Server, are not have significant benefits
> from their On-demand fork.

Thanks. I expected that snapshotting would pop up and be one of the most
prominent users that could benefit. However, for that specific use case
I am convinced that uffd-wp is the better choice and fork() is just the
old way of doing it. having nothing better at hand. QEMU already
implements snapshotting of VMs that way and I remember that redis also
intended to implement support for uffd-wp. Not sure what happened with
that and if there is anything missing to make it work.

> 
> For the COW page table from this patch, I also take the perf to analyze
> the cost time. But it looks like not different from the default fork.

Interesting, thanks for sharing.

> 
> Here is the report, the mmap-sfork is COW page table version:
> 
>  Performance counter stats for './mmap-fork' (100 runs):
> 
>             373.92 msec task-clock                #    0.992 CPUs utilized            ( +-  0.09% )
>                  1      context-switches          #    2.656 /sec                     ( +-  6.03% )
>                  0      cpu-migrations            #    0.000 /sec
>                881      page-faults               #    2.340 K/sec                    ( +-  0.02% )
>      1,860,460,792      cycles                    #    4.941 GHz                      ( +-  0.08% )
>      1,451,024,912      instructions              #    0.78  insn per cycle           ( +-  0.00% )
>        310,129,843      branches                  #  823.559 M/sec                    ( +-  0.01% )
>          1,552,469      branch-misses             #    0.50% of all branches          ( +-  0.38% )
> 
>           0.377007 +- 0.000480 seconds time elapsed  ( +-  0.13% )
> 
>  Performance counter stats for './mmap-sfork' (100 runs):
> 
>             373.04 msec task-clock                #    0.992 CPUs utilized            ( +-  0.10% )
>                  1      context-switches          #    2.660 /sec                     ( +-  6.58% )
>                  0      cpu-migrations            #    0.000 /sec
>                877      page-faults               #    2.333 K/sec                    ( +-  0.08% )
>      1,851,843,683      cycles                    #    4.926 GHz                      ( +-  0.08% )
>      1,451,763,414      instructions              #    0.78  insn per cycle           ( +-  0.00% )
>        310,270,268      branches                  #  825.352 M/sec                    ( +-  0.01% )
>          1,649,486      branch-misses             #    0.53% of all branches          ( +-  0.49% )
> 
>           0.376095 +- 0.000478 seconds time elapsed  ( +-  0.13% )
> 
> So, the COW of the page table may reduce the time of forking. But it
> builds on the transfer of the copy work to other modified operations
> to the physical page.

Right.

> 
>> I have tons of questions regarding rmap, accounting, GUP, page table
>> walkers, OOM situations in page walkers, but at this point I am not
>> (yet) convinced that the added complexity is really worth it. So I'd
>> appreciate some additional information.
> 
> It seems like I have a lot of work to do. ;-)

Messing with page tables and COW is usually like opening a can of worms :)

-- 
Thanks,

David / dhildenb



  reply	other threads:[~2022-05-21 20:29 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-19 18:31 [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table Chih-En Lin
2022-05-19 18:31 ` [RFC PATCH 1/6] mm: Add a new mm flag for Copy-On-Write PTE table Chih-En Lin
2022-05-19 18:31 ` [RFC PATCH 2/6] mm: clone3: Add CLONE_COW_PGTABLE flag Chih-En Lin
2022-05-20 14:13   ` Christophe Leroy
2022-05-21  3:50     ` Chih-En Lin
2022-05-19 18:31 ` [RFC PATCH 3/6] mm, pgtable: Add ownership for the PTE table Chih-En Lin
2022-05-20 14:15   ` Christophe Leroy
2022-05-21  4:03     ` Chih-En Lin
2022-05-21  4:02   ` Matthew Wilcox
2022-05-21  5:01     ` Chih-En Lin
2022-05-19 18:31 ` [RFC PATCH 4/6] mm: Add COW PTE fallback function Chih-En Lin
2022-05-20 14:21   ` Christophe Leroy
2022-05-21  4:15     ` Chih-En Lin
2022-05-19 18:31 ` [RFC PATCH 5/6] mm, pgtable: Add the reference counter for COW PTE Chih-En Lin
2022-05-20 14:30   ` Christophe Leroy
2022-05-21  4:22     ` Chih-En Lin
2022-05-21  4:08   ` Matthew Wilcox
2022-05-21  5:10     ` Chih-En Lin
2022-05-19 18:31 ` [RFC PATCH 6/6] mm: Expand Copy-On-Write to PTE table Chih-En Lin
2022-05-20 14:49   ` Christophe Leroy
2022-05-21  4:38     ` Chih-En Lin
2022-05-21  8:59 ` [External] [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table Qi Zheng
2022-05-21 19:08   ` Chih-En Lin
2022-05-21 16:07 ` David Hildenbrand
2022-05-21 18:50   ` Chih-En Lin
2022-05-21 20:28     ` David Hildenbrand [this message]
2022-05-21 20:12   ` Matthew Wilcox
2022-05-21 20:22     ` David Hildenbrand
2022-05-21 22:19     ` Andy Lutomirski
2022-05-22  0:31       ` Matthew Wilcox
2022-05-22 15:20         ` Andy Lutomirski
2022-05-22 19:40           ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c931c9dc-c0ed-05f3-7364-a06088ca7754@redhat.com \
    --to=david@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=anshuman.khandual@arm.com \
    --cc=arnd@arndb.de \
    --cc=bigeasy@linutronix.de \
    --cc=brauner@kernel.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=ccross@google.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=dietmar.eggemann@arm.com \
    --cc=dja@axtens.net \
    --cc=ebiederm@xmission.com \
    --cc=feng.tang@intel.com \
    --cc=fenghua.yu@intel.com \
    --cc=foxhoundsk.tw@gmail.com \
    --cc=geert@linux-m68k.org \
    --cc=jhubbard@nvidia.com \
    --cc=jonathan@marek.ca \
    --cc=jserv.tw@gmail.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linyunsheng@huawei.com \
    --cc=luto@kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=rppt@kernel.org \
    --cc=shiyn.lin@gmail.com \
    --cc=surenb@google.com \
    --cc=tglx@linutronix.de \
    --cc=vbabka@suse.cz \
    --cc=vincent.guittot@linaro.org \
    --cc=william.kucharski@oracle.com \
    --cc=willy@infradead.org \
    --cc=zhao776@purdue.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).