* [RFC PATCH 0/1] fs/namespace: defer free_mount from namespace_unlock
@ 2023-01-19 20:55 Eric Chanudet
2023-01-19 20:55 ` [RFC PATCH 1/1] " Eric Chanudet
2023-01-19 21:14 ` [RFC PATCH 0/1] " Eric Chanudet
0 siblings, 2 replies; 4+ messages in thread
From: Eric Chanudet @ 2023-01-19 20:55 UTC (permalink / raw)
To: Alexander Viro
Cc: linux-fsdevel, linux-kernel, Alexander Larsson, Andrew Halaney,
Eric Chanudet
We noticed a significant slowdown when running containers on an
AArch64 system with the RT patch set, using the following test:
# mkdir -p "rootfs/bin"
# printf "int main(){return 0;}" | gcc -x c - -static -o rootfs/bin/sh
# crun spec
# perf stat -r 10 --table --null -- crun run test
Performance counter stats for 'crun run test' (10 runs):
# Table of individual measurements:
0.3902 (-0.1941) ##########
0.5791 (-0.0053) #
0.5785 (-0.0058) #
0.5891 (+0.0047) #
0.6682 (+0.0839) ###
0.5507 (-0.0337) ##
0.5888 (+0.0044) #
0.5797 (-0.0047) #
0.5977 (+0.0133) #
0.7217 (+0.1374) ####
# Final result:
0.5844 +- 0.0269 seconds time elapsed ( +- 4.60% )
Results from a 6.2 non-RT kernel on the same hardware, for comparison:
Performance counter stats for 'crun run test' (10 runs):
# Table of individual measurements:
0.1680 (+0.1375) #################
0.0074 (-0.0231) ###############################################################
0.0073 (-0.0232) ################################################################
0.0070 (-0.0235) ####################################################################
0.0072 (-0.0233) #################################################################
0.0794 (+0.0489) #############
0.0078 (-0.0227) ##########################################################
0.0070 (-0.0235) ###################################################################
0.0070 (-0.0235) ####################################################################
0.0068 (-0.0237) ######################################################################
# Final result:
0.0305 +- 0.0169 seconds time elapsed ( +- 55.33% )
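
The two "Final result" lines can be re-derived from the per-run tables. The sketch below assumes that perf stat reports the arithmetic mean followed by the standard error of the mean (sample standard deviation divided by sqrt(n)); under that assumption it lands on the same 0.5844 +- 0.0269 and 0.0305 +- 0.0169 figures. The helper names (sample_mean, std_error, newton_sqrt) are illustrative and not part of perf.

```c
#include <assert.h>
#include <stddef.h>

/* Per-run elapsed times in seconds, copied from the two tables above. */
static const double rt_runs[10] = {
	0.3902, 0.5791, 0.5785, 0.5891, 0.6682,
	0.5507, 0.5888, 0.5797, 0.5977, 0.7217,
};
static const double nonrt_runs[10] = {
	0.1680, 0.0074, 0.0073, 0.0070, 0.0072,
	0.0794, 0.0078, 0.0070, 0.0070, 0.0068,
};

/* Newton's method square root, used only to avoid linking against libm. */
static double newton_sqrt(double v)
{
	double r = v > 1.0 ? v : 1.0;

	if (v <= 0.0)
		return 0.0;
	for (int i = 0; i < 60; i++)
		r = 0.5 * (r + v / r);
	return r;
}

/* Arithmetic mean of n samples. */
static double sample_mean(const double *x, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += x[i];
	return sum / (double)n;
}

/*
 * Standard error of the mean: sample standard deviation (n - 1 in the
 * denominator) divided by sqrt(n). That perf stat uses this estimator is
 * an assumption; it happens to match the numbers reported above.
 */
static double std_error(const double *x, size_t n)
{
	double mean = sample_mean(x, n);
	double ss = 0.0;

	assert(n > 1);
	for (size_t i = 0; i < n; i++)
		ss += (x[i] - mean) * (x[i] - mean);
	return newton_sqrt(ss / (double)(n - 1)) / newton_sqrt((double)n);
}
```

Dividing the two means puts the RT run at about 19 times the non-RT one, although the non-RT average is itself dominated by two slow outlier runs.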
It looks like there is one bottleneck in:
-> do_umount
-> namespace_unlock
-> synchronize_rcu_expedited
With the following patch, namespace_unlock will queue up the resources
that need to be released and defer the operation through call_rcu to
return without waiting for the grace period.
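
As a rough userspace analogy of that deferral pattern (not kernel code, and not RCU): the sketch below embeds a callback head in a container, queues it, and returns immediately, leaving the actual release to a later flush. Everything here is a hypothetical stand-in: defer_call() plays the role of call_rcu(), run_deferred() plays the end of a grace period, and unlock_and_release() only mirrors the shape of the patched namespace_unlock(), including the synchronous fallback when the allocation fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rcu_head: one queued callback. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct cb_head *pending;	/* callbacks waiting for the "grace period" */
static int released;		/* number of resources actually released */

/* Stand-in for call_rcu(): queue the callback and return immediately. */
static void defer_call(struct cb_head *head, void (*func)(struct cb_head *))
{
	assert(func != NULL);
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Stand-in for the end of a grace period: run every queued callback. */
static void run_deferred(void)
{
	while (pending) {
		struct cb_head *head = pending;

		pending = head->next;
		head->func(head);
	}
}

/* Container carrying the work to do once the callback runs. */
struct delayed_release {
	struct cb_head cb;
	int nr_resources;
};

static void delayed_release_cb(struct cb_head *head)
{
	struct delayed_release *d =
		container_of(head, struct delayed_release, cb);

	released += d->nr_resources;
	free(d);
}

/*
 * Defer the release when the container can be allocated, otherwise
 * release synchronously. Returns 0 if deferred, 1 if done synchronously.
 */
static int unlock_and_release(int nr_resources)
{
	struct delayed_release *d = malloc(sizeof(*d));

	if (!d) {
		released += nr_resources;
		return 1;
	}
	d->nr_resources = nr_resources;
	defer_call(&d->cb, delayed_release_cb);
	return 0;
}
```

In this model, a caller of unlock_and_release() regains control while `released` is still unchanged; the resources only go away once run_deferred() is invoked. That is where the speedup comes from, and it also means the caller can observe the not-yet-released state after the call returns.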
Alexander Larsson (1):
fs/namespace: defer free_mount from namespace_unlock
fs/namespace.c | 42 +++++++++++++++++++++++++++++++++++-------
1 file changed, 35 insertions(+), 7 deletions(-)
--
2.39.0

* [RFC PATCH 1/1] fs/namespace: defer free_mount from namespace_unlock
2023-01-19 20:55 [RFC PATCH 0/1] fs/namespace: defer free_mount from namespace_unlock Eric Chanudet
@ 2023-01-19 20:55 ` Eric Chanudet
2023-01-19 22:03 ` Al Viro
2023-01-19 21:14 ` [RFC PATCH 0/1] " Eric Chanudet
1 sibling, 1 reply; 4+ messages in thread
From: Eric Chanudet @ 2023-01-19 20:55 UTC (permalink / raw)
To: Alexander Viro
Cc: linux-fsdevel, linux-kernel, Alexander Larsson, Andrew Halaney,
Eric Chanudet

From: Alexander Larsson <alexl@redhat.com>

Use call_rcu to defer releasing the umount'ed or detached filesystem
when calling namespace_unlock().

Calling synchronize_rcu_expedited() has a significant cost on RT kernels
that default to rcupdate.rcu_normal_after_boot=1.

For example, on a 6.2-rt1 kernel:
perf stat -r 10 --null --pre 'mount -t tmpfs tmpfs mnt' -- umount mnt
0.07464 +- 0.00396 seconds time elapsed ( +- 5.31% )

With this change applied:
perf stat -r 10 --null --pre 'mount -t tmpfs tmpfs mnt' -- umount mnt
0.00162604 +- 0.00000637 seconds time elapsed ( +- 0.39% )

Waiting for the grace period before completing the syscall does not seem
mandatory. The umount'ed struct mount entries are queued up for release
in a separate list and are no longer accessible to subsequent syscalls.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Eric Chanudet <echanude@redhat.com>
---
fs/namespace.c | 42 +++++++++++++++++++++++++++++++++++-------
1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index ab467ee58341..11d219a6e83c 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -44,6 +44,11 @@ static unsigned int m_hash_shift __read_mostly;
 static unsigned int mp_hash_mask __read_mostly;
 static unsigned int mp_hash_shift __read_mostly;
 
+struct mount_delayed_release {
+	struct rcu_head rcu;
+	struct hlist_head release_list;
+};
+
 static __initdata unsigned long mhash_entries;
 static int __init set_mhash_entries(char *str)
 {
@@ -1582,11 +1587,31 @@ int may_umount(struct vfsmount *mnt)
 
 EXPORT_SYMBOL(may_umount);
 
-static void namespace_unlock(void)
+static void free_mounts(struct hlist_head *mount_list)
 {
-	struct hlist_head head;
 	struct hlist_node *p;
 	struct mount *m;
+
+	hlist_for_each_entry_safe(m, p, mount_list, mnt_umount) {
+		hlist_del(&m->mnt_umount);
+		mntput(&m->mnt);
+	}
+}
+
+static void delayed_mount_release(struct rcu_head *head)
+{
+	struct mount_delayed_release *drelease =
+		container_of(head, struct mount_delayed_release, rcu);
+
+	free_mounts(&drelease->release_list);
+	kfree(drelease);
+}
+
+static void namespace_unlock(void)
+{
+	struct hlist_head head;
+	struct mount_delayed_release *drelease;
+
 	LIST_HEAD(list);
 
 	hlist_move_list(&unmounted, &head);
@@ -1599,12 +1624,15 @@ static void namespace_unlock(void)
 	if (likely(hlist_empty(&head)))
 		return;
 
-	synchronize_rcu_expedited();
-
-	hlist_for_each_entry_safe(m, p, &head, mnt_umount) {
-		hlist_del(&m->mnt_umount);
-		mntput(&m->mnt);
+	drelease = kmalloc(sizeof(*drelease), GFP_KERNEL);
+	if (unlikely(!drelease)) {
+		synchronize_rcu_expedited();
+		free_mounts(&head);
+		return;
 	}
+
+	hlist_move_list(&head, &drelease->release_list);
+	call_rcu(&drelease->rcu, delayed_mount_release);
 }
 
 static inline void namespace_lock(void)
--
2.39.0

* Re: [RFC PATCH 1/1] fs/namespace: defer free_mount from namespace_unlock
2023-01-19 20:55 ` [RFC PATCH 1/1] " Eric Chanudet
@ 2023-01-19 22:03 ` Al Viro
0 siblings, 0 replies; 4+ messages in thread
From: Al Viro @ 2023-01-19 22:03 UTC (permalink / raw)
To: Eric Chanudet
Cc: linux-fsdevel, linux-kernel, Alexander Larsson, Andrew Halaney

On Thu, Jan 19, 2023 at 03:55:21PM -0500, Eric Chanudet wrote:
> From: Alexander Larsson <alexl@redhat.com>
>
> Use call_rcu to defer releasing the umount'ed or detached filesystem
> when calling namespace_unlock().
>
> Calling synchronize_rcu_expedited() has a significant cost on RT kernels
> that default to rcupdate.rcu_normal_after_boot=1.
>
> For example, on a 6.2-rt1 kernel:
> perf stat -r 10 --null --pre 'mount -t tmpfs tmpfs mnt' -- umount mnt
> 0.07464 +- 0.00396 seconds time elapsed ( +- 5.31% )
>
> With this change applied:
> perf stat -r 10 --null --pre 'mount -t tmpfs tmpfs mnt' -- umount mnt
> 0.00162604 +- 0.00000637 seconds time elapsed ( +- 0.39% )
>
> Waiting for the grace period before completing the syscall does not seem
> mandatory. The umount'ed struct mount entries are queued up for release
> in a separate list and are no longer accessible to subsequent syscalls.

You *really* do not want to have umount(2) return without having
the filesystems shut down.

* Re: [RFC PATCH 0/1] fs/namespace: defer free_mount from namespace_unlock
2023-01-19 20:55 [RFC PATCH 0/1] fs/namespace: defer free_mount from namespace_unlock Eric Chanudet
2023-01-19 20:55 ` [RFC PATCH 1/1] " Eric Chanudet
@ 2023-01-19 21:14 ` Eric Chanudet
1 sibling, 0 replies; 4+ messages in thread
From: Eric Chanudet @ 2023-01-19 21:14 UTC (permalink / raw)
To: Alexander Viro
Cc: linux-fsdevel, linux-kernel, Alexander Larsson, Andrew Halaney

On Thu, Jan 19, 2023 at 03:55:20PM -0500, Eric Chanudet wrote:
> We noticed a significant slowdown when running containers on an
> AArch64 system with the RT patch set, using the following test:
> [...]
> With the following patch, namespace_unlock will queue up the resources
> that need to be released and defer the operation through call_rcu to
> return without waiting for the grace period.

I did not Cc linux-rt-users. Resending with them included; my apologies.

--
Eric Chanudet