From: David Hildenbrand <david@redhat.com>
To: Chih-En Lin <shiyn.lin@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Christian Brauner <brauner@kernel.org>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Vlastimil Babka <vbabka@suse.cz>,
William Kucharski <william.kucharski@oracle.com>,
John Hubbard <jhubbard@nvidia.com>,
Yunsheng Lin <linyunsheng@huawei.com>,
Arnd Bergmann <arnd@arndb.de>,
Suren Baghdasaryan <surenb@google.com>,
Colin Cross <ccross@google.com>, Feng Tang <feng.tang@intel.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Mike Rapoport <rppt@kernel.org>,
Geert Uytterhoeven <geert@linux-m68k.org>,
Anshuman Khandual <anshuman.khandual@arm.com>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
Daniel Axtens <dja@axtens.net>,
Jonathan Marek <jonathan@marek.ca>,
Christophe Leroy <christophe.leroy@csgroup.eu>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
Peter Xu <peterx@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Andy Lutomirski <luto@kernel.org>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Fenghua Yu <fenghua.yu@intel.com>,
linux-kernel@vger.kernel.org, Kaiyang Zhao <zhao776@purdue.edu>,
Huichun Feng <foxhoundsk.tw@gmail.com>,
Jim Huang <jserv.tw@gmail.com>
Subject: Re: [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table
Date: Sat, 21 May 2022 22:28:59 +0200
Message-ID: <c931c9dc-c0ed-05f3-7364-a06088ca7754@redhat.com>
In-Reply-To: <20220521185004.GA1543057@strix-laptop>
On 21.05.22 20:50, Chih-En Lin wrote:
> On Sat, May 21, 2022 at 06:07:27PM +0200, David Hildenbrand wrote:
>> On 19.05.22 20:31, Chih-En Lin wrote:
>>> When a user process is created, the kernel usually relies on the
>>> Copy-On-Write (COW) mechanism to reduce memory usage and the time spent
>>> copying. COW defers copying private memory and instead shares it between
>>> the processes as read-only. If either process writes to this memory, it
>>> takes a page fault and copies the shared page, so that process gets its
>>> own private copy at that point; this is called breaking COW.
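A minimal user-space sketch of the COW behavior described above, assuming Linux/POSIX fork(); the function name cow_fork_demo() is hypothetical. The page copy itself is invisible to user space, so the sketch only shows the observable effect: once the child writes, it has its own private copy and the parent's value is unchanged.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo of fork() COW semantics. Returns the parent's value
 * after the child has written, or -1 on error. *child_seen receives the
 * value the child observed after its own write.
 */
int cow_fork_demo(int *child_seen)
{
    int *data = malloc(sizeof(*data));
    if (!data)
        return -1;
    *data = 1;                  /* private memory, shared read-only after fork() */

    pid_t pid = fork();
    if (pid < 0) {
        free(data);
        return -1;
    }
    if (pid == 0) {
        *data = 2;              /* write fault: the child gets its own copy of the page */
        _exit(*data);           /* report the child's view through its exit status */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        free(data);
        return -1;
    }
    *child_seen = WEXITSTATUS(status);
    int parent_value = *data;   /* the parent's copy was never modified */
    free(data);
    return parent_value;
}
```

Calling cow_fork_demo(&seen) should return 1 (the parent's value) and set seen to 2 (the child's value after breaking COW).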
>>
>> Yes. Lately we've been dealing with advanced COW+GUP pinnings (which
>> resulted in PageAnonExclusive, which should hit upstream soon), and
>> hearing about COW of page tables (and wondering how it will interact
>> with the mapcount, refcount, PageAnonExclusive of anonymous pages) makes
>> me feel a bit uneasy :)
>
> I saw the patch series for this and realized how complicated handling
> COW of the physical page is [1][2][3][4]. So the COW page table
> restricts sharing to the page table only. This means any modification
> to the physical page will trigger breaking COW of the page table.
>
> The current implementation only accounts the physical pages to the RSS
> of the owner process of the COW PTE table. Generally, the owner is the
> parent process. The state of the page, such as its refcount and
> mapcount, does not change under the COW page table.
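To make the idea concrete, here is a hypothetical user-space model of sharing a PTE table at fork() and breaking COW on the table before its first modification. The struct and function names are made up for illustration; this is not the code from this series (the thread shows patches that add PTE table ownership and a COW PTE reference counter, but this model does not reproduce them). It assumes 512 entries per table, as on x86-64 with 4 KiB pages, and, matching the description above, it never touches any page's refcount or mapcount.

```c
#include <stdlib.h>
#include <string.h>

#define PTES_PER_TABLE 512          /* assumption: x86-64 with 4 KiB pages */

/* Hypothetical model of a shareable last-level page table. */
struct pte_table {
    unsigned long entry[PTES_PER_TABLE]; /* stand-ins for PTE values */
    int refcount;                        /* number of processes sharing this table */
    int owner;                           /* pid whose RSS the mapped pages are charged to */
};

/* fork() with COW page tables: share the table instead of copying it. */
struct pte_table *cow_pte_share(struct pte_table *t)
{
    t->refcount++;
    return t;
}

/*
 * Before a process modifies an entry, it needs an exclusive table.
 * If the table is shared, copy it ("break COW" on the page table).
 * Returns NULL if the copy cannot be allocated.
 */
struct pte_table *cow_pte_break(struct pte_table *t, int pid)
{
    if (t->refcount == 1) {
        t->owner = pid;             /* sole user: reuse the table and take ownership */
        return t;
    }
    struct pte_table *copy = malloc(sizeof(*copy));
    if (!copy)
        return NULL;                /* a kernel would have to handle this failure */
    memcpy(copy->entry, t->entry, sizeof(copy->entry));
    copy->refcount = 1;
    copy->owner = pid;
    t->refcount--;                  /* the remaining sharers keep the original */
    return copy;
}

/* Modify one entry, breaking COW on the table first if needed. */
int cow_pte_write(struct pte_table **tp, int pid, int idx, unsigned long val)
{
    struct pte_table *t = cow_pte_break(*tp, pid);
    if (!t)
        return -1;                  /* *tp is left untouched on failure */
    t->entry[idx] = val;
    *tp = t;
    return 0;
}
```

After cow_pte_share(), both processes point at one table with refcount 2; the first cow_pte_write() from either side allocates a private copy, and later writes by the last remaining sharer reuse the original table.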
>
> But if any situation requires the COW page table to consider the state
> of the physical page, it might get troublesome. ;-)
I haven't looked into the details of how GUP deals with these COW page
tables. But I suspect there might be problems with page pinning:
skipping copy_present_page() even for R/O pages is usually problematic
with R/O pinnings of pages. I might just be wrong.
>
>>>
>>> Currently, this technique is only applied to the mapped memory itself;
>>> fork() still needs to copy the entire page table from the parent. This
>>> can cost a lot of time and memory when the parent already has many page
>>> tables allocated. For example, here is the memory state before and after
>>> forking a process that maps 1 GB of memory:
>>>
>>>           mmap before fork   mmap after fork
>>> MemTotal:      32746776 kB       32746776 kB
>>> MemFree:       31468152 kB       31463244 kB
>>> AnonPages:      1073836 kB        1073628 kB
>>> Mapped:           39520 kB          39992 kB
>>> PageTables:        3356 kB           5432 kB
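The roughly 2 MB growth in PageTables (3356 kB to 5432 kB, i.e. 2076 kB) is about what copying the last-level tables for 1 GB costs. A back-of-the-envelope sketch, assuming x86-64 with 4 KiB pages, 8-byte PTEs, 512 entries per table, no THP, and ignoring upper-level tables; the helper names are hypothetical:

```c
#include <stdint.h>

/* Assumptions: x86-64, 4 KiB pages, 8-byte PTEs, one 4 KiB page per PTE table. */
#define PAGE_SIZE_BYTES 4096ULL
#define PTE_BYTES       8ULL
#define PTES_PER_TABLE  (PAGE_SIZE_BYTES / PTE_BYTES)   /* 512 */

/* Number of last-level (PTE) tables needed to map `bytes` of
 * contiguous, table-aligned memory with base pages. */
uint64_t pte_tables_needed(uint64_t bytes)
{
    uint64_t pages = (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
    return (pages + PTES_PER_TABLE - 1) / PTES_PER_TABLE;
}

/* Memory used by those tables, in kB as /proc/meminfo reports it. */
uint64_t pte_tables_kb(uint64_t bytes)
{
    return pte_tables_needed(bytes) * PAGE_SIZE_BYTES / 1024;
}
```

For 1 GiB this gives 262144 pages in 512 PTE tables, i.e. 2048 kB; the remaining ~28 kB of the observed increase presumably comes from upper-level tables and the child's other mappings.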
>>
>>
>> I'm missing the most important point: why do we care and why should we
>> care to make our COW/fork implementation even more complicated?
>>
>> Yes, we might save some page tables and we might reduce the fork() time;
>> however, which specific workload really benefits from this and why do we
>> really care about that workload? Without even hearing about an example
>> user in this cover letter (unless I missed it), I naturally wonder about
>> relevance in practice.
>>
>> I assume it really only matters if we fork() relatively large processes,
>> like databases for snapshotting. However, fork() is already a pretty
>> severe performance hit due to COW, and there are alternatives getting
>> developed as a replacement for such use cases (e.g., uffd-wp).
>>
>> I'm also missing a performance evaluation: I'd expect some simple
>> workloads that use fork() might be even slower after fork() with this
>> change.
>>
>
> The paper lists benchmarks of the time cost of On-Demand fork. For
> example, on Redis, it measures the mean fork time when taking a
> snapshot: the default fork() took 7.40 ms, while On-demand Fork (COW PTE
> table) took 0.12 ms. But in some other cases, such as the response
> latency distribution of the Apache HTTP Server, On-demand fork shows no
> significant benefit.
Thanks. I expected that snapshotting would pop up and be one of the most
prominent users that could benefit. However, for that specific use case
I am convinced that uffd-wp is the better choice and fork() is just the
old way of doing it, from when nothing better was at hand. QEMU already
implements snapshotting of VMs that way, and I remember that Redis also
intended to implement support for uffd-wp. I'm not sure what happened
with that, or whether anything is missing to make it work.
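For reference, a minimal sketch of arming uffd-wp over an anonymous range, assuming Linux 5.7+ headers and a kernel/sandbox that permits userfaultfd (many container seccomp profiles block it, and unprivileged use may need UFFD_USER_MODE_ONLY or the vm.unprivileged_userfaultfd sysctl). The function name uffd_wp_arm() is hypothetical, and the fault-handling loop a real snapshotter needs is only described in a comment.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 0       /* older headers lack this flag */
#endif

/*
 * Hypothetical helper: write-protect [addr, addr + len) via userfaultfd.
 * The range must be page-aligned anonymous memory; on many kernels it should
 * already be populated, otherwise there are no PTEs to write-protect.
 * Returns the userfaultfd on success, or -1 with errno set.
 *
 * A snapshotter would then read struct uffd_msg events from the fd in a
 * separate thread, save each faulting page, and clear protection on it with
 * UFFDIO_WRITEPROTECT (mode 0) so the writer can continue.
 */
int uffd_wp_arm(void *addr, size_t len)
{
    int uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | UFFD_USER_MODE_ONLY);
    if (uffd < 0)
        return -1;

    struct uffdio_api api = {
        .api = UFFD_API,
        .features = UFFD_FEATURE_PAGEFAULT_FLAG_WP,
    };
    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)addr, .len = len },
        .mode = UFFDIO_REGISTER_MODE_WP,
    };
    struct uffdio_writeprotect wp = {
        .range = { .start = (uintptr_t)addr, .len = len },
        .mode = UFFDIO_WRITEPROTECT_MODE_WP,
    };

    if (ioctl(uffd, UFFDIO_API, &api) < 0 ||
        ioctl(uffd, UFFDIO_REGISTER, &reg) < 0 ||
        ioctl(uffd, UFFDIO_WRITEPROTECT, &wp) < 0) {
        int saved = errno;          /* close() may clobber errno */
        close(uffd);
        errno = saved;
        return -1;
    }
    return uffd;
}
```

In that design, only pages that are written during the snapshot take a fault and get saved before being un-protected, without creating a second process.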
>
> For the COW page table from this patch, I also used perf to measure the
> time cost. But it does not look different from the default fork.
Interesting, thanks for sharing.
>
> Here is the report; mmap-sfork is the COW page table version:
>
> Performance counter stats for './mmap-fork' (100 runs):
>
>            373.92 msec task-clock        #    0.992 CPUs utilized       ( +-  0.09% )
>                 1      context-switches  #    2.656 /sec                ( +-  6.03% )
>                 0      cpu-migrations    #    0.000 /sec
>               881      page-faults       #    2.340 K/sec               ( +-  0.02% )
>     1,860,460,792      cycles            #    4.941 GHz                 ( +-  0.08% )
>     1,451,024,912      instructions      #     0.78 insn per cycle      ( +-  0.00% )
>       310,129,843      branches          #  823.559 M/sec               ( +-  0.01% )
>         1,552,469      branch-misses     #    0.50% of all branches     ( +-  0.38% )
>
>          0.377007 +- 0.000480 seconds time elapsed  ( +-  0.13% )
>
> Performance counter stats for './mmap-sfork' (100 runs):
>
>            373.04 msec task-clock        #    0.992 CPUs utilized       ( +-  0.10% )
>                 1      context-switches  #    2.660 /sec                ( +-  6.58% )
>                 0      cpu-migrations    #    0.000 /sec
>               877      page-faults       #    2.333 K/sec               ( +-  0.08% )
>     1,851,843,683      cycles            #    4.926 GHz                 ( +-  0.08% )
>     1,451,763,414      instructions      #     0.78 insn per cycle      ( +-  0.00% )
>       310,270,268      branches          #  825.352 M/sec               ( +-  0.01% )
>         1,649,486      branch-misses     #    0.53% of all branches     ( +-  0.49% )
>
>          0.376095 +- 0.000478 seconds time elapsed  ( +-  0.13% )
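One way to read the two runs above, as a sketch: it assumes the "+-" value perf stat prints for repeated runs is the standard error of the mean, and that the two sets of runs are independent and roughly normal. The function names are hypothetical.

```c
/*
 * Squared z-score for the difference of two means, given each mean's
 * standard error. Squared so that no sqrt()/libm is needed.
 */
double mean_diff_z2(double mean_a, double se_a, double mean_b, double se_b)
{
    double d = mean_a - mean_b;
    return (d * d) / (se_a * se_a + se_b * se_b);
}

/* Relative difference of b with respect to a (0.01 means 1 %). */
double rel_diff(double a, double b)
{
    return (a - b) / a;
}
```

With the elapsed times above, z^2 is about 1.81 (z about 1.35), below the 1.96^2 threshold of a two-sided 5 % test, and the means differ by about 0.24 %, consistent with fork time being essentially unchanged in this benchmark.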
>
> So, COW of the page table may reduce the time spent in fork(). But it
> does so by shifting the copying work to later operations that modify
> the physical pages.
Right.
>
>> I have tons of questions regarding rmap, accounting, GUP, page table
>> walkers, OOM situations in page walkers, but at this point I am not
>> (yet) convinced that the added complexity is really worth it. So I'd
>> appreciate some additional information.
>
> It seems like I have a lot of work to do. ;-)
Messing with page tables and COW is usually like opening a can of worms :)
--
Thanks,
David / dhildenb
Thread overview: 32+ messages
2022-05-19 18:31 [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table Chih-En Lin
2022-05-19 18:31 ` [RFC PATCH 1/6] mm: Add a new mm flag for Copy-On-Write PTE table Chih-En Lin
2022-05-19 18:31 ` [RFC PATCH 2/6] mm: clone3: Add CLONE_COW_PGTABLE flag Chih-En Lin
2022-05-20 14:13 ` Christophe Leroy
2022-05-21 3:50 ` Chih-En Lin
2022-05-19 18:31 ` [RFC PATCH 3/6] mm, pgtable: Add ownership for the PTE table Chih-En Lin
2022-05-20 14:15 ` Christophe Leroy
2022-05-21 4:03 ` Chih-En Lin
2022-05-21 4:02 ` Matthew Wilcox
2022-05-21 5:01 ` Chih-En Lin
2022-05-19 18:31 ` [RFC PATCH 4/6] mm: Add COW PTE fallback function Chih-En Lin
2022-05-20 14:21 ` Christophe Leroy
2022-05-21 4:15 ` Chih-En Lin
2022-05-19 18:31 ` [RFC PATCH 5/6] mm, pgtable: Add the reference counter for COW PTE Chih-En Lin
2022-05-20 14:30 ` Christophe Leroy
2022-05-21 4:22 ` Chih-En Lin
2022-05-21 4:08 ` Matthew Wilcox
2022-05-21 5:10 ` Chih-En Lin
2022-05-19 18:31 ` [RFC PATCH 6/6] mm: Expand Copy-On-Write to PTE table Chih-En Lin
2022-05-20 14:49 ` Christophe Leroy
2022-05-21 4:38 ` Chih-En Lin
2022-05-21 8:59 ` [External] [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table Qi Zheng
2022-05-21 19:08 ` Chih-En Lin
2022-05-21 16:07 ` David Hildenbrand
2022-05-21 18:50 ` Chih-En Lin
2022-05-21 20:28 ` David Hildenbrand [this message]
2022-05-21 20:12 ` Matthew Wilcox
2022-05-21 20:22 ` David Hildenbrand
2022-05-21 22:19 ` Andy Lutomirski
2022-05-22 0:31 ` Matthew Wilcox
2022-05-22 15:20 ` Andy Lutomirski
2022-05-22 19:40 ` Matthew Wilcox