* 答复: [PATCH] mm: Add RWH_RMAP_EXCLUDE flag to exclude files from rmap sharing
       [not found] ` <aeikm2e5Gh5reJ30@lucifer>
@ 2026-04-22 12:51   ` Yibin Liu
  2026-04-22 16:16     ` Lorenzo Stoakes
  0 siblings, 1 reply; 7+ messages in thread

From: Yibin Liu @ 2026-04-22 12:51 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Liam.Howlett@oracle.com,
    viro@zeniv.linux.org.uk, brauner@kernel.org, mjguzik@gmail.com,
    Jianyong Wu, Huangsj, Yuan Zhong, jack@suse.cz, jlayton@kernel.org,
    chuck.lever@oracle.com, alex.aring@gmail.com, vbabka@kernel.org,
    jannh@google.com, pfalcato@suse.de, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org

First of all, I am truly sorry for not using RFC.
Secondly, I omitted many maintainers because I wanted to “not disturb too
many people”, and I apologize deeply for that. I will fully follow these
two rules from now on.

As for this patch, indeed, as Matthew said, the truncate part is not
feasible. My original intention was to apply this to frequently used
library files like libc and ld. Contention on the i_mmap_rwsem lock
(which eventually turns into osq_lock) caused by these two files alone
reaches up to 70% in the “256-core execl” case, as observed from flame
graphs. Besides, no one performs truncate operations on libc and ld
anyway.

So I wanted to try skipping rmap for them. Since they are small, even if
they cannot be reclaimed or migrated, I assumed it would not cause much
trouble. Of course, this idea was totally wrong, and I will definitely
mark such insane proposals with RFC in the future.

These ideas are inspired by Mateusz’s work and thoughts
(https://lore.kernel.org/linux-mm/CAGudoHEfiOPJ2VGEV3fDT9cDsuoHB-wk8jg-k-EK6JhWgiHkWw@mail.gmail.com/),
so I specifically CC’d him to seek more opinions and insights.

Lastly, I sincerely apologize for the trouble I have caused the
community. I will strictly follow community conventions when sending
patches in the future.

> NAK obviously.
>
> I hate to keep saying this to people, but you've got no excuse at this
> stage - it's been a year or so since we added mm maintainers/reviewers
> and you're not sending this to the right people.
>
> How hard is doing:
>
> $ scripts/get_maintainer.pl --no-git fs/fcntl.c fs/open.c include/linux/fs.h \
>   include/uapi/linux/fcntl.h mm/mmap.c mm/vma.c
> Jeff Layton <jlayton@kernel.org> (maintainer:FILE LOCKING (flock() and fcntl()/lockf()))
> Chuck Lever <chuck.lever@oracle.com> (maintainer:FILE LOCKING (flock() and fcntl()/lockf()))
> Alexander Aring <alex.aring@gmail.com> (reviewer:FILE LOCKING (flock() and fcntl()/lockf()))
> Alexander Viro <viro@zeniv.linux.org.uk> (maintainer:FILESYSTEMS (VFS and infrastructure))
> Christian Brauner <brauner@kernel.org> (maintainer:FILESYSTEMS (VFS and infrastructure))
> Jan Kara <jack@suse.cz> (reviewer:FILESYSTEMS (VFS and infrastructure))
> Andrew Morton <akpm@linux-foundation.org> (maintainer:MEMORY MAPPING)
> "Liam R. Howlett" <Liam.Howlett@oracle.com> (maintainer:MEMORY MAPPING)
> Lorenzo Stoakes <ljs@kernel.org> (maintainer:MEMORY MAPPING)
> Vlastimil Babka <vbabka@kernel.org> (reviewer:MEMORY MAPPING)
> Jann Horn <jannh@google.com> (reviewer:MEMORY MAPPING)
> Pedro Falcato <pfalcato@suse.de> (reviewer:MEMORY MAPPING)
> linux-fsdevel@vger.kernel.org (open list:FILE LOCKING (flock() and fcntl()/lockf()))
> linux-kernel@vger.kernel.org (open list)
> linux-mm@kvack.org (open list:MEMORY MAPPING)
>
> ?
>
> You're sending an insane patch that breaks core mm and you can't even
> send it to the right people...
>
> (And yet Mateusz is somehow cc'd (he loves that :))
>
> This kind of craziness should be an RFC also as David said.
>
> Both of these things are just rude and not helpful wrt upstream.
>
> ... ...
> ... ...
>
> This idea is totally broken.
>
> If you want to contribute usefully, PLEASE drop this silly idea, come
> back with some NUMBERS about the contention you see, and let's have a
> sensible discussion about what we can do to address that?
>
> Also follow standard upstream kernel procedures - figure out who to
> email properly, RFC insane ideas, etc.
>
> Thanks, Lorenzo

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: 答复: [PATCH] mm: Add RWH_RMAP_EXCLUDE flag to exclude files from rmap sharing
  2026-04-22 12:51 ` 答复: [PATCH] mm: Add RWH_RMAP_EXCLUDE flag to exclude files from rmap sharing Yibin Liu
@ 2026-04-22 16:16   ` Lorenzo Stoakes
  2026-04-24  1:08     ` 答复: " Yibin Liu
  0 siblings, 1 reply; 7+ messages in thread

From: Lorenzo Stoakes @ 2026-04-22 16:16 UTC (permalink / raw)
To: Yibin Liu
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Liam.Howlett@oracle.com,
    viro@zeniv.linux.org.uk, brauner@kernel.org, mjguzik@gmail.com,
    Jianyong Wu, Huangsj, Yuan Zhong, jack@suse.cz, jlayton@kernel.org,
    chuck.lever@oracle.com, alex.aring@gmail.com, vbabka@kernel.org,
    jannh@google.com, pfalcato@suse.de, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org

On Wed, Apr 22, 2026 at 12:51:06PM +0000, Yibin Liu wrote:
> First of all, I am truly sorry for not using RFC.
> Secondly, I omitted many maintainers because I wanted to “not disturb
> too many people”, and I apologize deeply for that. I will fully follow
> these two rules from now on.
>
> As for this patch, indeed, as Matthew said, the truncate part is not
> feasible. My original intention was to apply this to frequently used
> library files like libc and ld. Contention on the i_mmap_rwsem lock
> (which eventually turns into osq_lock) caused by these two files alone
> reaches up to 70% in the “256-core execl” case, as observed from flame
> graphs. Besides, no one performs truncate operations on libc and ld
> anyway.

Interesting, it would be good to see these? And more details on the
scenario?

What workloads are contending on that exactly?

> So I wanted to try skipping rmap for them. Since they are small, even
> if they cannot be reclaimed or migrated, I assumed it would not cause
> much trouble. Of course, this idea was totally wrong, and I will
> definitely mark such insane proposals with RFC in the future.
>
> These ideas are inspired by Mateusz’s work and thoughts
> (https://lore.kernel.org/linux-mm/CAGudoHEfiOPJ2VGEV3fDT9cDsuoHB-wk8jg-k-EK6JhWgiHkWw@mail.gmail.com/),
> so I specifically CC’d him to seek more opinions and insights.

I think the best thing in general going forwards is to bring up these
issues in advance; we're more than happy to look into things and very
interested in issues with lock contention, latency, etc.

And that way you can discuss ideas you might have to tackle them up
front and we can give you early feedback, which should save time all
round and help get us to a good solution :)

Just send with a [DISCUSSION] preface and cc people you feel are
relevant (use MAINTAINERS to figure out e.g. maintainers of relevant
things, like rmap, mmap, etc.)

> Lastly, I sincerely apologize for the trouble I have caused the
> community. I will strictly follow community conventions when sending
> patches in the future.

It's no problem, better to be direct about this - it's more useful to
discuss rather than to jump to a solution without community
involvement, which might not work out / conflict with other stuff etc.

Thanks, Lorenzo

^ permalink raw reply	[flat|nested] 7+ messages in thread
* 答复: 答复: [PATCH] mm: Add RWH_RMAP_EXCLUDE flag to exclude files from rmap sharing
  2026-04-22 16:16 ` Lorenzo Stoakes
@ 2026-04-24  1:08   ` Yibin Liu
  2026-04-24  3:20     ` Matthew Wilcox
  0 siblings, 1 reply; 7+ messages in thread

From: Yibin Liu @ 2026-04-24 1:08 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Liam.Howlett@oracle.com,
    viro@zeniv.linux.org.uk, brauner@kernel.org, mjguzik@gmail.com,
    Jianyong Wu, Huangsj, Yuan Zhong, jack@suse.cz, jlayton@kernel.org,
    chuck.lever@oracle.com, alex.aring@gmail.com, vbabka@kernel.org,
    jannh@google.com, pfalcato@suse.de, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org

> On Wed, Apr 22, 2026 at 12:51:06PM +0000, Yibin Liu wrote:
> > First of all, I am truly sorry for not using RFC.
> > Secondly, I omitted many maintainers because I wanted to “not disturb
> > too many people”, and I apologize deeply for that. I will fully
> > follow these two rules from now on.
> >
> > As for this patch, indeed, as Matthew said, the truncate part is not
> > feasible. My original intention was to apply this to frequently used
> > library files like libc and ld. Contention on the i_mmap_rwsem lock
> > (which eventually turns into osq_lock) caused by these two files
> > alone reaches up to 70% in the “256-core execl” case, as observed
> > from flame graphs. Besides, no one performs truncate operations on
> > libc and ld anyway.
>
> Interesting, would be good to see these? And more details on the
> scenario?
>
> What workloads are contending that exactly?

Sure, happy to share them.

On an Intel Emerald Rapids server (112 cores), run the execl benchmark
from UnixBench with the command: ./Run -c 220 execl

Then perf top shows:

 91.53%  [kernel]  [k] osq_lock
  0.50%  [kernel]  [k] rwsem_spin_on_owner
  0.45%  perf      [.] queue_event
  0.42%  [kernel]  [k] vma_interval_tree_insert
  0.36%  [kernel]  [k] next_uptodate_folio
  0.25%  [kernel]  [k] __zap_vma_range

All the osq_lock overhead here comes from rwsem_optimistic_spin, which
has many call sources. The breakdown is roughly as follows:

 6.13% _dl_main --> mprotect --> ... --> __split_vma --> vma_prepare --> down_write(&mapping->i_mmap_rwsem)
 6.15% bprm_execve --> ... --> exit_mmap --> ... --> unlink_file_vma_batch_process --> down_write(&mapping->i_mmap_rwsem)
24.71% vma_link_file --> ... --> down_write(&mapping->i_mmap_rwsem)
24.82% mmap_region --> ... --> free_pgtables --> unlink_file_vma_batch_process --> down_write(&mapping->i_mmap_rwsem)
18.50% mmap_region --> ... --> __split_vma --> vma_prepare --> down_write(&mapping->i_mmap_rwsem)
12.44% _dl_map_object --> mprotect --> ... --> __split_vma --> vma_prepare --> down_write(&mapping->i_mmap_rwsem)

An AMD Zen5 (9755) performs pretty much the same way (tested with
./Run -c 250 execl).

> > So I wanted to try skipping rmap for them. Since they are small,
> > even if they cannot be reclaimed or migrated, I assumed it would not
> > cause much trouble. Of course, this idea was totally wrong, and I
> > will definitely mark such insane proposals with RFC in the future.
> >
> > These ideas are inspired by Mateusz’s work and thoughts
> > (https://lore.kernel.org/linux-mm/CAGudoHEfiOPJ2VGEV3fDT9cDsuoHB-wk8jg-k-EK6JhWgiHkWw@mail.gmail.com/),
> > so I specifically CC’d him to seek more opinions and insights.
>
> I think the best thing in general going forwards is to bring up these
> issues in advance, we're more than happy to look into things and very
> interested in issues with lock contention, latency, etc.
>
> And that way you can discuss ideas you might have to tackle up front
> and we can give you early feedback, which should save time all round
> and help get us to a good solution :)
>
> Just send with a [DISCUSSION] preface and cc people you feel are
> relevant (use MAINTAINERS to figure out e.g. maintainers of relevant
> things, like rmap, mmap, etc.)
>
> > Lastly, I sincerely apologize for the trouble I have caused the
> > community. I will strictly follow community conventions when sending
> > patches in the future.
>
> It's no problem, better to be direct about this - it's more useful to
> discuss rather than to jump to a solution without community
> involvement, which might not work out / conflict with other stuff etc.
>
> Thanks, Lorenzo

Thanks for the kind advice. I will start a discussion first with a
[DISCUSSION] tag and involve relevant maintainers for similar ideas in
the future.

Thanks, Yibin

^ permalink raw reply	[flat|nested] 7+ messages in thread
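[Editorial note: the per-call-path percentages in the message above are the kind of figure you get by aggregating folded perf stacks (perf script piped through a stackcollapse tool). The sketch below shows one way to do that aggregation; the sample stacks and the choice of keying on the outermost caller are illustrative only, not data from the thread.]

```python
from collections import Counter

def breakdown(folded_lines, leaf="osq_lock"):
    """Aggregate folded stacks ('frame;frame;...;leaf count' per line,
    as produced by stackcollapse-perf.pl) into per-call-path shares of
    the samples whose leaf frame is `leaf`."""
    totals = Counter()
    grand = 0
    for line in folded_lines:
        stack, _, count = line.rpartition(" ")
        n = int(count)
        frames = stack.split(";")
        if frames[-1] != leaf:
            continue  # not a sample spinning in the lock
        # Key each sample by its outermost caller for illustration;
        # a real analysis would key on the full chain down to down_write().
        totals[frames[0]] += n
        grand += n
    return {k: 100.0 * v / grand for k, v in totals.items()}

# Tiny synthetic sample shaped like the thread's data (counts invented):
sample = [
    "vma_link_file;rwsem_down_write_slowpath;osq_lock 25",
    "mmap_region;free_pgtables;unlink_file_vma_batch_process;osq_lock 25",
    "mmap_region;__split_vma;vma_prepare;osq_lock 18",
    "bprm_execve;exit_mmap;unlink_file_vma_batch_process;osq_lock 6",
    "next_uptodate_folio 3",  # not an osq_lock sample, ignored
]
shares = breakdown(sample)
print({k: round(v, 1) for k, v in sorted(shares.items())})
# {'bprm_execve': 8.1, 'mmap_region': 58.1, 'vma_link_file': 33.8}
```

In practice the folded input would come from `perf record -g` on the benchmark run followed by `perf script | stackcollapse-perf.pl`.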
* Re: 答复: 答复: [PATCH] mm: Add RWH_RMAP_EXCLUDE flag to exclude files from rmap sharing
  2026-04-24  1:08 ` 答复: " Yibin Liu
@ 2026-04-24  3:20   ` Matthew Wilcox
  2026-04-24  6:20     ` Mateusz Guzik
  0 siblings, 1 reply; 7+ messages in thread

From: Matthew Wilcox @ 2026-04-24 3:20 UTC (permalink / raw)
To: Yibin Liu
Cc: Lorenzo Stoakes, linux-mm@kvack.org, akpm@linux-foundation.org,
    Liam.Howlett@oracle.com, viro@zeniv.linux.org.uk, brauner@kernel.org,
    mjguzik@gmail.com, Jianyong Wu, Huangsj, Yuan Zhong, jack@suse.cz,
    jlayton@kernel.org, chuck.lever@oracle.com, alex.aring@gmail.com,
    vbabka@kernel.org, jannh@google.com, pfalcato@suse.de,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org

On Fri, Apr 24, 2026 at 01:08:35AM +0000, Yibin Liu wrote:
> On an Intel Emerald Rapids server (112 cores), run the execl benchmark
> from UnixBench with the command: ./Run -c 220 execl
> Then perf top shows:
>
> 91.53%  [kernel]  [k] osq_lock
>  0.50%  [kernel]  [k] rwsem_spin_on_owner

OK, but does this represent a realistic workload? It's pretty easy to
construct workloads that hammer on particular locks; the question is
whether it's a relevant performance bottleneck that customers care
about.

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: 答复: 答复: [PATCH] mm: Add RWH_RMAP_EXCLUDE flag to exclude files from rmap sharing
  2026-04-24  3:20 ` Matthew Wilcox
@ 2026-04-24  6:20   ` Mateusz Guzik
  2026-04-24  6:54     ` Mateusz Guzik
  0 siblings, 1 reply; 7+ messages in thread

From: Mateusz Guzik @ 2026-04-24 6:20 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Yibin Liu, Lorenzo Stoakes, linux-mm@kvack.org,
    akpm@linux-foundation.org, Liam.Howlett@oracle.com,
    viro@zeniv.linux.org.uk, brauner@kernel.org, Jianyong Wu, Huangsj,
    Yuan Zhong, jack@suse.cz, jlayton@kernel.org, chuck.lever@oracle.com,
    alex.aring@gmail.com, vbabka@kernel.org, jannh@google.com,
    pfalcato@suse.de, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org

On Fri, Apr 24, 2026 at 5:20 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Apr 24, 2026 at 01:08:35AM +0000, Yibin Liu wrote:
> > On an Intel Emerald Rapids server (112 cores), run the execl
> > benchmark from UnixBench with the command: ./Run -c 220 execl
> > Then perf top shows:
> >
> > 91.53%  [kernel]  [k] osq_lock
> >  0.50%  [kernel]  [k] rwsem_spin_on_owner
>
> OK, but does this represent a realistic workload? It's pretty easy to
> construct workloads that hammer on particular locks; the question is
> whether it's a relevant performance bottleneck that customers care
> about.

This is a genuine problem when doing large-scale package building.

I'll say upfront that I have extensive experience with this crap on
FreeBSD. I did not run it on Linux myself, but bear with me here: while
FreeBSD is no doubt a less scalable kernel, Linux has been demonstrated
to suffer from the same problems.

Say you have a box with a core count of 100 and get it to work building
up to 100 packages at a time. Further, even if you use some form of
separation from a file-system standpoint at the userspace level, you
still want to share the common binaries to reduce memory + cache
footprint, so you at least --bind them. Then you are susceptible to
contention issues, at least on paper.

Granted, building a pig like chromium scales great because it is
written in C++ and almost all of the time is spent in userspace, with
forks and execs of the compiler highly spread out in time, in turn
putting very little pressure on the locks.

However, the vast majority of packages are very tiny in comparison
(literally a few .c files) and this is where things go south, as they
engage in an exec frenzy, looking like a borderline microbenchmark. The
primary culprit is configure scripts, issuing an idiotic number of
back-to-back execs of short-lived processes (notably sed, but also
grep, rm and others). There is a lot of evil in makefiles as well.

I don't have numbers handy, but in the case of the FreeBSD ports tree
we are talking about over 10 000 ports which on their own take a few
seconds to build. Since these are largely single-threaded, if you have
package-building machinery which can saturate the box, you easily end
up with parallel builds matching your core count. And when they engage
in exec frenzy for the duration, you may as well be microbenchmarking
it.

A sufficiently pessimized workload is indistinguishable from a
microbenchmark, and this here is an example of one.

iow this is a real problem, but I don't have specific numbers for
Linux.

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: 答复: 答复: [PATCH] mm: Add RWH_RMAP_EXCLUDE flag to exclude files from rmap sharing
  2026-04-24  6:20 ` Mateusz Guzik
@ 2026-04-24  6:54   ` Mateusz Guzik
  0 siblings, 0 replies; 7+ messages in thread

From: Mateusz Guzik @ 2026-04-24 6:54 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Yibin Liu, Lorenzo Stoakes, linux-mm@kvack.org,
    akpm@linux-foundation.org, Liam.Howlett@oracle.com,
    viro@zeniv.linux.org.uk, brauner@kernel.org, Jianyong Wu, Huangsj,
    Yuan Zhong, jack@suse.cz, jlayton@kernel.org, chuck.lever@oracle.com,
    alex.aring@gmail.com, vbabka@kernel.org, jannh@google.com,
    pfalcato@suse.de, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org

On Fri, Apr 24, 2026 at 8:20 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On Fri, Apr 24, 2026 at 5:20 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > OK, but does this represent a realistic workload? It's pretty easy
> > to construct workloads that hammer on particular locks; the question
> > is whether it's a relevant performance bottleneck that customers
> > care about.
>
> This is a genuine problem when doing large-scale package building.
>
> Say you have a box with a core count of 100 and get it to work
> building up to 100 packages at a time. Even if you use some form of
> separation from a file-system standpoint at the userspace level, you
> still want to share the common binaries to reduce memory + cache
> footprint, so you at least --bind them. Then you are susceptible to
> contention issues, at least on paper.
>
> However, the vast majority of packages are very tiny (literally a few
> .c files) and this is where things go south, as they engage in an
> exec frenzy, looking like a borderline microbenchmark. The primary
> culprit is configure scripts, issuing an idiotic number of
> back-to-back execs of short-lived processes (notably sed, but also
> grep, rm and others). There is a lot of evil in makefiles as well.
>
> A sufficiently pessimized workload is indistinguishable from a
> microbenchmark, and this here is an example of one.
>
> iow this is a real problem, but I don't have specific numbers for
> Linux.

I had gcc handy on Linux, so I ran configure on it by hand. This is all
autotools-generated, so the general theme matches the small programs as
well.

I ended up with the following execs:

    285 /usr/bin/sed
     95 /usr/bin/rm
     88 /usr/bin/grep
     77 /usr/bin/cat
     30 /usr/bin/gcc
     27 /usr/bin/expr
     23 /usr/libexec/gcc/x86_64-linux-gnu/15/cc1
     21 /usr/bin/as
     10 /usr/bin/uname
      8 /usr/libexec/gcc/x86_64-linux-gnu/15/collect2
      8 /usr/bin/ld
      8 /bin/bash
      7 /usr/bin/mv
      7 /usr/bin/c++
      6 /usr/bin/mkdir
      6 /usr/bin/basename
      5 /usr/bin/ln
      5 /usr/bin/cmp
      4 /usr/bin/sort
      4 /usr/bin/dirname
      3 /usr/bin/rmdir
      3 /usr/bin/hostname
      3 /usr/bin/gawk
      3 /usr/bin/cc
      2 /usr/bin/strip
      2 /usr/bin/mktemp
      2 /usr/bin/ls
      2 /usr/bin/egrep
      2 /usr/bin/cp
      2 /usr/bin/ar
      2 /bin/sh
      1 /usr/lib/llvm-20/bin/clang
      1 /usr/bin/tr
      1 /usr/bin/touch
      1 /usr/bin/ranlib
      1 /usr/bin/install
      1 /usr/bin/diff
      1 /usr/bin/chmod
      1 /usr/bin/arch
      1 /home/mjg/repos/gcc/missing
      1 /bin/uname
      1 /bin/arch
      1 ./contrib/compare-debug
      1 ./conftest
      1 ./configure

It spent almost half of the runtime in the kernel, all while there was
no contention. So imagine all that to compile a few .c files & rinse &
repeat thousands of times in parallel on 100+ cores.

^ permalink raw reply	[flat|nested] 7+ messages in thread
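[Editorial note: a tally like the one in the message above can be produced by tracing a configure run with something like `strace -f -e trace=execve ./configure 2> execs.log` and counting the targets. A sketch of the counting step; the strace lines below are invented for illustration:]

```python
import re
from collections import Counter

def count_execs(strace_lines):
    """Count execve() targets from `strace -f -e trace=execve` output,
    producing (path, count) pairs sorted most-frequent-first."""
    pat = re.compile(r'execve\("([^"]+)"')
    hits = Counter()
    for line in strace_lines:
        m = pat.search(line)
        if m:
            hits[m.group(1)] += 1
    return hits.most_common()

# Synthetic strace excerpt (a real run feeds thousands of lines):
demo = [
    '12345 execve("/usr/bin/sed", ["sed", "-e", "s/x/y/"], 0x55... /* 40 vars */) = 0',
    '12346 execve("/usr/bin/sed", ["sed", "-n", "p"], 0x55... /* 40 vars */) = 0',
    '12347 execve("/usr/bin/grep", ["grep", "foo"], 0x55... /* 40 vars */) = 0',
    '12348 wait4(-1, NULL, 0, NULL) = 12347',  # not an exec, ignored
]
for path, n in count_execs(demo):
    print(f"{n:7d} {path}")
```

Each counted path corresponds to a short-lived process whose exec and exit both take i_mmap_rwsem for every file-backed mapping involved, which is why these tallies translate directly into lock pressure.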
[parent not found: <CAGudoHHki3gv-HXXMALePDoC+tmao4oWcYgCo9kXNDkEhW4E4g@mail.gmail.com>]
* 答复: [PATCH] mm: Add RWH_RMAP_EXCLUDE flag to exclude files from rmap sharing
       [not found] ` <CAGudoHHki3gv-HXXMALePDoC+tmao4oWcYgCo9kXNDkEhW4E4g@mail.gmail.com>
@ 2026-04-22 13:03   ` Yibin Liu
  0 siblings, 0 replies; 7+ messages in thread

From: Yibin Liu @ 2026-04-22 13:03 UTC (permalink / raw)
To: Mateusz Guzik
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Liam.Howlett@oracle.com,
    viro@zeniv.linux.org.uk, brauner@kernel.org, Jianyong Wu, Huangsj,
    Yuan Zhong, jack@suse.cz, jlayton@kernel.org, chuck.lever@oracle.com,
    alex.aring@gmail.com, vbabka@kernel.org, jannh@google.com,
    pfalcato@suse.de, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, Lorenzo Stoakes

> On Tue, Apr 21, 2026 at 4:11 AM Yibin Liu <liuyibin@hygon.cn> wrote:
> >
> > UnixBench execl/shellscript (dynamically linked binaries) at 64+
> > cores are bottlenecked on the i_mmap_rwsem semaphore due to heavy
> > vma insert/remove operations on the i_mmap tree, where libc.so.6 is
> > the most frequent, followed by ld-linux-x86-64.so.2 and the test
> > executable itself.
> >
> > This patch marks such files to skip rmap operations, avoiding
> > frequent interval tree insert/remove that cause i_mmap_rwsem lock
> > contention. The downside is these files can no longer be reclaimed
> > (along with compaction and ksm), but since they are small and
> > resident anyway, it's acceptable. When all mapping processes exit,
> > files can still be reclaimed normally.
> >
> > Performance testing shows ~80% improvement in UnixBench
> > execl/shellscript scores on Hygon 7490, AMD zen4 9754 and Intel
> > emerald rapids platforms.
>
> The other responders have been a little harsh and despite raising
> valid points I don't think they gave a proper review.
>
> The bigger picture is that the problematic rwsem is taken several
> times during the fork + exec + exit cycle. Normally you end up with 5
> distinct mappings per binary/so, each created with a separate lock
> acquire.
>
> Some time ago I patched exit to batch processing, leaving 1 acquire
> in that codepath. fork can and should be patched in a similar vein,
> but I don't know if unixbench runs it in this benchmark (i.e., real
> workloads certainly suffer from it, I don't know if this particular
> bench includes that aspect). This is on top of forking itself being
> avoidable should the kernel grow a better interface for executing
> binaries.

Thank you for your opinions and advice, I'll try this approach.

> This leaves us with mapping creation on exec. This problem is
> unfixable without the introduction of better APIs for userspace,
> which constitutes quite a challenge.
>
> The end result is the absolutely horrible case of multiple acquires
> of the same lock per iteration.
>
> One common idea for reducing contention boils down to shortening lock
> hold time. This has very limited effect in the face of the
> aforementioned multiple acquires and is at best a stop gap -- no
> matter what, the ceiling is dictated by the extra acquires and it is
> incredibly low.
>
> Your patch keeps the problematic acquire pattern intact and while the
> 80% win might sound encouraging, the end result is still severely
> underperforming even a state where the lock is taken once in total
> during exec.
>
> Besides that, the internally-visible side effect of non-functional
> rmap (and thus e.g. truncate) is pretty bad in its own right, but
> let's ignore it. The primary problem here is that the patch exposes a
> mechanism for userspace to dictate this in the first place. Even
> ignoring the question of who should be using it and when, the real
> solution to the problem would be confined to the kernel. Suppose this
> patch lands and such a solution is implemented later -- now the
> kernel is stuck having to support a now-useless (if not outright
> harmful) feature.

OK. I understand it now.

> What will fix the problem is sharding the state in some capacity,
> provided no unfixable stopgap shows up.
>
> Any other approach is putting small bandaids on it and can be a
> consideration only if decentralized locking is proven too
> problematic.
>
> Pedro apparently volunteered to do the work, so I think we can wait
> to see what he ends up cooking.
>
> I hope this helps.

^ permalink raw reply	[flat|nested] 7+ messages in thread
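[Editorial note: the arithmetic behind the batching point above is simple to model. With roughly 5 distinct mappings per binary/shared object, a per-VMA locking pattern pays five write acquisitions of i_mmap_rwsem where a batched path pays one. The toy user-space model below illustrates only that acquire-count difference; it is not kernel code, and the names are illustrative.]

```python
class FakeRwsem:
    """Stand-in for i_mmap_rwsem that only counts write acquisitions."""
    def __init__(self):
        self.write_acquires = 0

    def down_write(self):
        self.write_acquires += 1

    def up_write(self):
        pass

def unlink_one_by_one(sem, vmas):
    # Pre-batching pattern: one lock round trip per VMA unlinked.
    for _ in vmas:
        sem.down_write()
        sem.up_write()

def unlink_batched(sem, vmas):
    # Batched pattern: a single acquire covers every VMA of one file.
    sem.down_write()
    for _ in vmas:
        pass  # unlink from the interval tree under the single hold
    sem.up_write()

vmas = ["text", "rodata", "data", "relro", "bss"]  # ~5 mappings per DSO
per_vma, batched = FakeRwsem(), FakeRwsem()
unlink_one_by_one(per_vma, vmas)
unlink_batched(batched, vmas)
print(per_vma.write_acquires, batched.write_acquires)  # prints: 5 1
```

This is why shortening hold times has a low ceiling: a contended lock acquired five times per exec/exit still serializes five times, whereas batching removes four of the five serialization points outright.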
end of thread, other threads:[~2026-04-24 6:54 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20260421020932.3212532-1-liuyibin@hygon.cn>
[not found] ` <aeikm2e5Gh5reJ30@lucifer>
2026-04-22 12:51 ` 答复: [PATCH] mm: Add RWH_RMAP_EXCLUDE flag to exclude files from rmap sharing Yibin Liu
2026-04-22 16:16   ` Lorenzo Stoakes
2026-04-24  1:08     ` 答复: " Yibin Liu
2026-04-24  3:20       ` Matthew Wilcox
2026-04-24  6:20         ` Mateusz Guzik
2026-04-24  6:54           ` Mateusz Guzik
[not found] ` <CAGudoHHki3gv-HXXMALePDoC+tmao4oWcYgCo9kXNDkEhW4E4g@mail.gmail.com>
2026-04-22 13:03 ` Yibin Liu