* [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore
@ 2015-07-13 21:25 Oleg Nesterov
2015-07-13 21:25 ` [PATCH 1/4] change get_super_thawed() to use sb_start/end_write() Oleg Nesterov
` (5 more replies)
0 siblings, 6 replies; 32+ messages in thread
From: Oleg Nesterov @ 2015-07-13 21:25 UTC (permalink / raw)
To: Al Viro, Jan Kara, Linus Torvalds, Paul McKenney, Peter Zijlstra
Cc: Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo,
linux-kernel
Hello,
Al, Jan, could you comment? I mean the intent, the patches are
obviously not for inclusion yet.
We can remove everything from struct sb_writers except frozen
(which can become a boolean, it seems) and add the array of
percpu_rw_semaphore's instead.
__sb_start/end_write() can use percpu_down/up_read(), and
freeze/thaw_super() can use percpu_down/up_write().
Why:
- Firstly, __sb_start_write() looks simply buggy. I does
__sb_end_write() if it sees ->frozen, but if it migrates
to another CPU before percpu_counter_dec() sb_wait_write()
can wrongly succeed if there is another task which holds
the same "semaphore": sb_wait_write() can miss the result
of the previous percpu_counter_inc() but see the result
of this percpu_counter_dec().
- This code doesn't look simple. It would be better to rely
on the generic locking code.
- __sb_start_write() will be a little bit faster, but this
is minor.
Todo:
- __sb_start_write(wait => false) always fail.
Thivial, we already have percpu_down_read_trylock() just
this patch wasn't merged yet.
- sb_lockdep_release() and sb_lockdep_acquire() play with
percpu_rw_semaphore's internals.
Trivial, we need a couple of new helper in percpu-rwsem.c.
- Fix get_super_thawed(), it will spin if MS_RDONLY...
It is not clear to me what exactly should we do, but this
doesn't look hard. Perhaps it can just return if MS_RDONLY.
- Most probably I missed something else, and I do not need
how to test.
Finally. freeze_super() calls synchronize_sched_expedited() 3 times in
a row. This is bad and just stupid. But if we change percpu_rw_semaphore
to use rcu_sync (see https://lkml.org/lkml/2015/7/11/211) we can avoid
this and do synchronize_sched() only once. Just we need some more simple
changes in percpu-rwsem.c, so that all sb_writers->rw_sem[] semaphores
could use the single sb_writers->rss.
In this case destroy_super() needs some modifications too,
percpu_free_rwsem() will be might_sleep(). But this looks simple too.
Oleg.
fs/super.c | 147 +++++++++++++++++++--------------------------------
include/linux/fs.h | 14 +----
2 files changed, 58 insertions(+), 103 deletions(-)
^ permalink raw reply [flat|nested] 32+ messages in thread* [PATCH 1/4] change get_super_thawed() to use sb_start/end_write() 2015-07-13 21:25 [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore Oleg Nesterov @ 2015-07-13 21:25 ` Oleg Nesterov 2015-07-14 10:49 ` Jan Kara 2015-07-13 21:25 ` [PATCH 2/4] introduce sb_unlock_frozen() Oleg Nesterov ` (4 subsequent siblings) 5 siblings, 1 reply; 32+ messages in thread From: Oleg Nesterov @ 2015-07-13 21:25 UTC (permalink / raw) To: Al Viro, Jan Kara, Linus Torvalds, Paul McKenney, Peter Zijlstra Cc: Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel Preparation. get_super_thawed() can do sb_start/end_write() with the same effect, we are going to kill sb_writers->wait_unfrozen. --- fs/super.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/super.c b/fs/super.c index 928c20f..5ea0edd 100644 --- a/fs/super.c +++ b/fs/super.c @@ -621,8 +621,8 @@ struct super_block *get_super_thawed(struct block_device *bdev) if (!s || s->s_writers.frozen == SB_UNFROZEN) return s; up_read(&s->s_umount); - wait_event(s->s_writers.wait_unfrozen, - s->s_writers.frozen == SB_UNFROZEN); + sb_start_write(s); + sb_end_intwrite(s); put_super(s); } } -- 1.5.5.1 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] change get_super_thawed() to use sb_start/end_write() 2015-07-13 21:25 ` [PATCH 1/4] change get_super_thawed() to use sb_start/end_write() Oleg Nesterov @ 2015-07-14 10:49 ` Jan Kara 2015-07-14 13:38 ` Oleg Nesterov 0 siblings, 1 reply; 32+ messages in thread From: Jan Kara @ 2015-07-14 10:49 UTC (permalink / raw) To: Oleg Nesterov Cc: Al Viro, Jan Kara, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel On Mon 13-07-15 23:25:48, Oleg Nesterov wrote: > Preparation. get_super_thawed() can do sb_start/end_write() with > the same effect, we are going to kill sb_writers->wait_unfrozen. > --- > fs/super.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/fs/super.c b/fs/super.c > index 928c20f..5ea0edd 100644 > --- a/fs/super.c > +++ b/fs/super.c > @@ -621,8 +621,8 @@ struct super_block *get_super_thawed(struct block_device *bdev) > if (!s || s->s_writers.frozen == SB_UNFROZEN) > return s; > up_read(&s->s_umount); > - wait_event(s->s_writers.wait_unfrozen, > - s->s_writers.frozen == SB_UNFROZEN); > + sb_start_write(s); > + sb_end_intwrite(s); This is definitely buggy - you need to start and end freeze protection at the same level... Honza > put_super(s); > } > } > -- > 1.5.5.1 > -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] change get_super_thawed() to use sb_start/end_write() 2015-07-14 10:49 ` Jan Kara @ 2015-07-14 13:38 ` Oleg Nesterov 0 siblings, 0 replies; 32+ messages in thread From: Oleg Nesterov @ 2015-07-14 13:38 UTC (permalink / raw) To: Jan Kara Cc: Al Viro, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel On 07/14, Jan Kara wrote: > > > @@ -621,8 +621,8 @@ struct super_block *get_super_thawed(struct block_device *bdev) > > if (!s || s->s_writers.frozen == SB_UNFROZEN) > > return s; > > up_read(&s->s_umount); > > - wait_event(s->s_writers.wait_unfrozen, > > - s->s_writers.frozen == SB_UNFROZEN); > > + sb_start_write(s); > > + sb_end_intwrite(s); > > This is definitely buggy - you need to start and end freeze protection at > the same level... Of course ;) copy-and-paste bug. Thanks! Oleg. ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 2/4] introduce sb_unlock_frozen() 2015-07-13 21:25 [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore Oleg Nesterov 2015-07-13 21:25 ` [PATCH 1/4] change get_super_thawed() to use sb_start/end_write() Oleg Nesterov @ 2015-07-13 21:25 ` Oleg Nesterov 2015-07-13 21:25 ` [PATCH 3/4] introduce sb_lockdep_release() Oleg Nesterov ` (3 subsequent siblings) 5 siblings, 0 replies; 32+ messages in thread From: Oleg Nesterov @ 2015-07-13 21:25 UTC (permalink / raw) To: Al Viro, Jan Kara, Linus Torvalds, Paul McKenney, Peter Zijlstra Cc: Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel Preaparation. Add the trivial helper which does the wakeup. --- fs/super.c | 12 ++++++++---- 1 files changed, 8 insertions(+), 4 deletions(-) diff --git a/fs/super.c b/fs/super.c index 5ea0edd..c23bafc 100644 --- a/fs/super.c +++ b/fs/super.c @@ -1254,6 +1254,12 @@ static void sb_wait_write(struct super_block *sb, int level) } while (writers); } +static void sb_unlock_frozen(struct super_block *sb) +{ + smp_wmb(); + wake_up(&sb->s_writers.wait_unfrozen); +} + /** * freeze_super - lock the filesystem and force it into a consistent state * @sb: the super to lock @@ -1340,8 +1346,7 @@ int freeze_super(struct super_block *sb) printk(KERN_ERR "VFS:Filesystem freeze failed\n"); sb->s_writers.frozen = SB_UNFROZEN; - smp_wmb(); - wake_up(&sb->s_writers.wait_unfrozen); + sb_unlock_frozen(sb); deactivate_locked_super(sb); return ret; } @@ -1387,8 +1392,7 @@ int thaw_super(struct super_block *sb) out: sb->s_writers.frozen = SB_UNFROZEN; - smp_wmb(); - wake_up(&sb->s_writers.wait_unfrozen); + sb_unlock_frozen(sb); deactivate_locked_super(sb); return 0; -- 1.5.5.1 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 3/4] introduce sb_lockdep_release() 2015-07-13 21:25 [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore Oleg Nesterov 2015-07-13 21:25 ` [PATCH 1/4] change get_super_thawed() to use sb_start/end_write() Oleg Nesterov 2015-07-13 21:25 ` [PATCH 2/4] introduce sb_unlock_frozen() Oleg Nesterov @ 2015-07-13 21:25 ` Oleg Nesterov 2015-07-13 21:25 ` [PATCH 4/4] change sb_writers to use percpu_rw_semaphore Oleg Nesterov ` (2 subsequent siblings) 5 siblings, 0 replies; 32+ messages in thread From: Oleg Nesterov @ 2015-07-13 21:25 UTC (permalink / raw) To: Al Viro, Jan Kara, Linus Torvalds, Paul McKenney, Peter Zijlstra Cc: Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel Move the "fool the lockdep" code from sb_wait_write() into the new simple helper, sb_lockdep_release(). This makes sense in any case afaics, this way s_op->freeze_fs(sb) is called with these write locks held as it seen by lockdep. --- fs/super.c | 16 +++++++++++----- 1 files changed, 11 insertions(+), 5 deletions(-) diff --git a/fs/super.c b/fs/super.c index c23bafc..94303fc 100644 --- a/fs/super.c +++ b/fs/super.c @@ -1229,12 +1229,7 @@ static void sb_wait_write(struct super_block *sb, int level) { s64 writers; - /* - * We just cycle-through lockdep here so that it does not complain - * about returning with lock to userspace - */ rwsem_acquire(&sb->s_writers.lock_map[level-1], 0, 0, _THIS_IP_); - rwsem_release(&sb->s_writers.lock_map[level-1], 1, _THIS_IP_); do { DEFINE_WAIT(wait); @@ -1254,6 +1249,16 @@ static void sb_wait_write(struct super_block *sb, int level) } while (writers); } +/* Avoid the warning from lockdep_sys_exit() */ +static void sb_lockdep_release(struct super_block *sb) +{ + int level; + + for (level = 0; level < SB_FREEZE_LEVELS; ++level) { + rwsem_release(&sb->s_writers.lock_map[level-1], 1, _THIS_IP_); + } +} + static void sb_unlock_frozen(struct super_block *sb) { smp_wmb(); @@ -1356,6 +1361,7 @@ int freeze_super(struct super_block *sb) * sees write activity when frozen is set to SB_FREEZE_COMPLETE. */ sb->s_writers.frozen = SB_FREEZE_COMPLETE; + sb_lockdep_release(sb); up_write(&sb->s_umount); return 0; } -- 1.5.5.1 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 4/4] change sb_writers to use percpu_rw_semaphore 2015-07-13 21:25 [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore Oleg Nesterov ` (2 preceding siblings ...) 2015-07-13 21:25 ` [PATCH 3/4] introduce sb_lockdep_release() Oleg Nesterov @ 2015-07-13 21:25 ` Oleg Nesterov 2015-07-13 22:23 ` [PATCH RFC 0/4] " Dave Chinner 2015-07-14 10:48 ` Jan Kara 5 siblings, 0 replies; 32+ messages in thread From: Oleg Nesterov @ 2015-07-13 21:25 UTC (permalink / raw) To: Al Viro, Jan Kara, Linus Torvalds, Paul McKenney, Peter Zijlstra Cc: Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel __sb_start/end_write() can use percpu_down/up_read(). --- fs/super.c | 135 +++++++++++++++++----------------------------------- include/linux/fs.h | 14 +---- 2 files changed, 47 insertions(+), 102 deletions(-) diff --git a/fs/super.c b/fs/super.c index 94303fc..6e336b8 100644 --- a/fs/super.c +++ b/fs/super.c @@ -147,7 +147,7 @@ static void destroy_super(struct super_block *s) list_lru_destroy(&s->s_dentry_lru); list_lru_destroy(&s->s_inode_lru); for (i = 0; i < SB_FREEZE_LEVELS; i++) - percpu_counter_destroy(&s->s_writers.counter[i]); + percpu_free_rwsem(s->s_writers.rw_sem + i); security_sb_free(s); WARN_ON(!list_empty(&s->s_mounts)); kfree(s->s_subtype); @@ -178,14 +178,10 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags) goto fail; for (i = 0; i < SB_FREEZE_LEVELS; i++) { - if (percpu_counter_init(&s->s_writers.counter[i], 0, - GFP_KERNEL) < 0) - goto fail; - lockdep_init_map(&s->s_writers.lock_map[i], sb_writers_name[i], - &type->s_writers_key[i], 0); + __percpu_init_rwsem(&s->s_writers.rw_sem[i], + sb_writers_name[i], + &type->s_writers_key[i]); } - init_waitqueue_head(&s->s_writers.wait); - init_waitqueue_head(&s->s_writers.wait_unfrozen); s->s_bdi = &noop_backing_dev_info; s->s_flags = flags; INIT_HLIST_NODE(&s->s_instances); @@ -1146,43 +1142,10 @@ out: */ void __sb_end_write(struct super_block *sb, int level) { - percpu_counter_dec(&sb->s_writers.counter[level-1]); - /* - * Make sure s_writers are updated before we wake up waiters in - * freeze_super(). - */ - smp_mb(); - if (waitqueue_active(&sb->s_writers.wait)) - wake_up(&sb->s_writers.wait); - rwsem_release(&sb->s_writers.lock_map[level-1], 1, _RET_IP_); + percpu_up_read(sb->s_writers.rw_sem + level-1); } EXPORT_SYMBOL(__sb_end_write); -#ifdef CONFIG_LOCKDEP -/* - * We want lockdep to tell us about possible deadlocks with freezing but - * it's it bit tricky to properly instrument it. Getting a freeze protection - * works as getting a read lock but there are subtle problems. XFS for example - * gets freeze protection on internal level twice in some cases, which is OK - * only because we already hold a freeze protection also on higher level. Due - * to these cases we have to tell lockdep we are doing trylock when we - * already hold a freeze protection for a higher freeze level. - */ -static void acquire_freeze_lock(struct super_block *sb, int level, bool trylock, - unsigned long ip) -{ - int i; - - if (!trylock) { - for (i = 0; i < level - 1; i++) - if (lock_is_held(&sb->s_writers.lock_map[i])) { - trylock = true; - break; - } - } - rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, trylock, ip); -} -#endif /* * This is an internal function, please use sb_start_{write,pagefault,intwrite} @@ -1190,27 +1153,15 @@ static void acquire_freeze_lock(struct super_block *sb, int level, bool trylock, */ int __sb_start_write(struct super_block *sb, int level, bool wait) { -retry: - if (unlikely(sb->s_writers.frozen >= level)) { - if (!wait) - return 0; - wait_event(sb->s_writers.wait_unfrozen, - sb->s_writers.frozen < level); - } - -#ifdef CONFIG_LOCKDEP - acquire_freeze_lock(sb, level, !wait, _RET_IP_); -#endif - percpu_counter_inc(&sb->s_writers.counter[level-1]); /* - * Make sure counter is updated before we check for frozen. - * freeze_super() first sets frozen and then checks the counter. + * !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + * !!!!!!!!! percpu_down_read_trylock() wasn't merged yet !!!!!!!!!!!!! + * !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! */ - smp_mb(); - if (unlikely(sb->s_writers.frozen >= level)) { - __sb_end_write(sb, level); - goto retry; - } + if (!wait) + return 0; + + percpu_down_read(sb->s_writers.rw_sem + level-1); return 1; } EXPORT_SYMBOL(__sb_start_write); @@ -1227,42 +1178,46 @@ EXPORT_SYMBOL(__sb_start_write); */ static void sb_wait_write(struct super_block *sb, int level) { - s64 writers; - - rwsem_acquire(&sb->s_writers.lock_map[level-1], 0, 0, _THIS_IP_); - - do { - DEFINE_WAIT(wait); - - /* - * We use a barrier in prepare_to_wait() to separate setting - * of frozen and checking of the counter - */ - prepare_to_wait(&sb->s_writers.wait, &wait, - TASK_UNINTERRUPTIBLE); - - writers = percpu_counter_sum(&sb->s_writers.counter[level-1]); - if (writers) - schedule(); - - finish_wait(&sb->s_writers.wait, &wait); - } while (writers); + percpu_down_write(sb->s_writers.rw_sem + level-1); } +/* + * !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + * !!!!!!!!!!! Move this code into kernel/locking/percpu-rwsem.c !!!!!!!!!!!!!! + * !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + */ +#include "../kernel/locking/rwsem.h" /* Avoid the warning from lockdep_sys_exit() */ static void sb_lockdep_release(struct super_block *sb) { int level; + struct percpu_rw_semaphore *sem; + + for (level = 0; level < SB_FREEZE_LEVELS; ++level) { + sem = sb->s_writers.rw_sem + level; + rwsem_clear_owner(&sem->rw_sem); + rwsem_release(&sem->rw_sem.dep_map, 1, _RET_IP_); + } +} + +static void sb_lockdep_acquire(struct super_block *sb) +{ + int level; + struct percpu_rw_semaphore *sem; for (level = 0; level < SB_FREEZE_LEVELS; ++level) { - rwsem_release(&sb->s_writers.lock_map[level-1], 1, _THIS_IP_); + sem = sb->s_writers.rw_sem + level; + rwsem_set_owner(&sem->rw_sem); /* unneeded */ + rwsem_acquire(&sem->rw_sem.dep_map, 0, 1, _RET_IP_); } } static void sb_unlock_frozen(struct super_block *sb) { - smp_wmb(); - wake_up(&sb->s_writers.wait_unfrozen); + int level; + + for (level = 0; level < SB_FREEZE_LEVELS; ++level) + percpu_up_write(sb->s_writers.rw_sem + level); } /** @@ -1323,8 +1278,6 @@ int freeze_super(struct super_block *sb) /* From now on, no new normal writers can start */ sb->s_writers.frozen = SB_FREEZE_WRITE; - smp_wmb(); - /* Release s_umount to preserve sb_start_write -> s_umount ordering */ up_write(&sb->s_umount); @@ -1332,9 +1285,8 @@ int freeze_super(struct super_block *sb) /* Now we go and block page faults... */ down_write(&sb->s_umount); - sb->s_writers.frozen = SB_FREEZE_PAGEFAULT; - smp_wmb(); + sb->s_writers.frozen = SB_FREEZE_PAGEFAULT; sb_wait_write(sb, SB_FREEZE_PAGEFAULT); /* All writers are done so after syncing there won't be dirty data */ @@ -1342,7 +1294,6 @@ int freeze_super(struct super_block *sb) /* Now wait for internal filesystem counter */ sb->s_writers.frozen = SB_FREEZE_FS; - smp_wmb(); sb_wait_write(sb, SB_FREEZE_FS); if (sb->s_op->freeze_fs) { @@ -1350,8 +1301,8 @@ int freeze_super(struct super_block *sb) if (ret) { printk(KERN_ERR "VFS:Filesystem freeze failed\n"); - sb->s_writers.frozen = SB_UNFROZEN; sb_unlock_frozen(sb); + sb->s_writers.frozen = SB_UNFROZEN; deactivate_locked_super(sb); return ret; } @@ -1386,6 +1337,8 @@ int thaw_super(struct super_block *sb) if (sb->s_flags & MS_RDONLY) goto out; + sb_lockdep_acquire(sb); + if (sb->s_op->unfreeze_fs) { error = sb->s_op->unfreeze_fs(sb); if (error) { @@ -1396,9 +1349,9 @@ int thaw_super(struct super_block *sb) } } + sb_unlock_frozen(sb); out: sb->s_writers.frozen = SB_UNFROZEN; - sb_unlock_frozen(sb); deactivate_locked_super(sb); return 0; diff --git a/include/linux/fs.h b/include/linux/fs.h index 35ec87e..314e2d0 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1,7 +1,6 @@ #ifndef _LINUX_FS_H #define _LINUX_FS_H - #include <linux/linkage.h> #include <linux/wait.h> #include <linux/kdev_t.h> @@ -30,6 +29,7 @@ #include <linux/lockdep.h> #include <linux/percpu-rwsem.h> #include <linux/blk_types.h> +#include <linux/percpu-rwsem.h> #include <asm/byteorder.h> #include <uapi/linux/fs.h> @@ -1246,16 +1246,8 @@ enum { #define SB_FREEZE_LEVELS (SB_FREEZE_COMPLETE - 1) struct sb_writers { - /* Counters for counting writers at each level */ - struct percpu_counter counter[SB_FREEZE_LEVELS]; - wait_queue_head_t wait; /* queue for waiting for - writers / faults to finish */ - int frozen; /* Is sb frozen? */ - wait_queue_head_t wait_unfrozen; /* queue for waiting for - sb to be thawed */ -#ifdef CONFIG_DEBUG_LOCK_ALLOC - struct lockdep_map lock_map[SB_FREEZE_LEVELS]; -#endif + int frozen; /* Is sb frozen? */ + struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS]; }; struct super_block { -- 1.5.5.1 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-13 21:25 [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore Oleg Nesterov ` (3 preceding siblings ...) 2015-07-13 21:25 ` [PATCH 4/4] change sb_writers to use percpu_rw_semaphore Oleg Nesterov @ 2015-07-13 22:23 ` Dave Chinner 2015-07-13 22:42 ` Oleg Nesterov 2015-07-14 10:48 ` Jan Kara 5 siblings, 1 reply; 32+ messages in thread From: Dave Chinner @ 2015-07-13 22:23 UTC (permalink / raw) To: Oleg Nesterov Cc: Al Viro, Jan Kara, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel, linux-fsdevel [ Please cc linux-fsdevel@vger.kernel.org on filesystem infrastructure changes! ] On Mon, Jul 13, 2015 at 11:25:36PM +0200, Oleg Nesterov wrote: > Hello, > > Al, Jan, could you comment? I mean the intent, the patches are > obviously not for inclusion yet. > > We can remove everything from struct sb_writers except frozen > (which can become a boolean, it seems) and add the array of > percpu_rw_semaphore's instead. > > __sb_start/end_write() can use percpu_down/up_read(), and > freeze/thaw_super() can use percpu_down/up_write(). > > Why: > > - Firstly, __sb_start_write() looks simply buggy. I does > __sb_end_write() if it sees ->frozen, but if it migrates > to another CPU before percpu_counter_dec() sb_wait_write() > can wrongly succeed if there is another task which holds > the same "semaphore": sb_wait_write() can miss the result > of the previous percpu_counter_inc() but see the result > of this percpu_counter_dec(). > > - This code doesn't look simple. It would be better to rely > on the generic locking code. > > - __sb_start_write() will be a little bit faster, but this > is minor. > > Todo: > > - __sb_start_write(wait => false) always fail. > > Thivial, we already have percpu_down_read_trylock() just > this patch wasn't merged yet. > > - sb_lockdep_release() and sb_lockdep_acquire() play with > percpu_rw_semaphore's internals. > > Trivial, we need a couple of new helper in percpu-rwsem.c. - try compiling XFS, watch it break on freeze lockdep annotations > - Fix get_super_thawed(), it will spin if MS_RDONLY... > > It is not clear to me what exactly should we do, but this > doesn't look hard. Perhaps it can just return if MS_RDONLY. > > - Most probably I missed something else, and I do not need > how to test. xfstests has many freeze related stress tests. IIRC, generic/068 is the test that historically causes the most problems for freeze infrastructure changes. You'll also need to test at least ext4, XFS and btrfs, because they all stress the freeze code differently. Testing XFS, in particular, is a good idea because it has several custom freeze tests that aren't run on any other filesystem type. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-13 22:23 ` [PATCH RFC 0/4] " Dave Chinner @ 2015-07-13 22:42 ` Oleg Nesterov 2015-07-13 23:14 ` Dave Chinner 0 siblings, 1 reply; 32+ messages in thread From: Oleg Nesterov @ 2015-07-13 22:42 UTC (permalink / raw) To: Dave Chinner Cc: Al Viro, Jan Kara, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel, linux-fsdevel On 07/14, Dave Chinner wrote: > > [ Please cc linux-fsdevel@vger.kernel.org on filesystem > infrastructure changes! ] OK, will do. > On Mon, Jul 13, 2015 at 11:25:36PM +0200, Oleg Nesterov wrote: > > > > - sb_lockdep_release() and sb_lockdep_acquire() play with > > percpu_rw_semaphore's internals. > > > > Trivial, we need a couple of new helper in percpu-rwsem.c. > > - try compiling XFS, watch it break on freeze lockdep > annotations Thanks a lot! I see. Still trivial, xfs can use the same helpers rather the abuse lockdep directly. > > - Most probably I missed something else, and I do not need > > how to test. > > xfstests has many freeze related stress tests. IIRC, generic/068 is > the test that historically causes the most problems for freeze > infrastructure changes. You'll also need to test at least ext4, XFS > and btrfs, because they all stress the freeze code differently. > Testing XFS, in particular, is a good idea because it has several > custom freeze tests that aren't run on any other filesystem type. Thanks again. Do you see something fundamentally wrong with this change? Oleg. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-13 22:42 ` Oleg Nesterov @ 2015-07-13 23:14 ` Dave Chinner 0 siblings, 0 replies; 32+ messages in thread From: Dave Chinner @ 2015-07-13 23:14 UTC (permalink / raw) To: Oleg Nesterov Cc: Al Viro, Jan Kara, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel, linux-fsdevel On Tue, Jul 14, 2015 at 12:42:37AM +0200, Oleg Nesterov wrote: > On 07/14, Dave Chinner wrote: > > > > [ Please cc linux-fsdevel@vger.kernel.org on filesystem > > infrastructure changes! ] > > OK, will do. > > > On Mon, Jul 13, 2015 at 11:25:36PM +0200, Oleg Nesterov wrote: > > > > > > - sb_lockdep_release() and sb_lockdep_acquire() play with > > > percpu_rw_semaphore's internals. > > > > > > Trivial, we need a couple of new helper in percpu-rwsem.c. > > > > - try compiling XFS, watch it break on freeze lockdep > > annotations > > Thanks a lot! I see. Still trivial, xfs can use the same helpers > rather the abuse lockdep directly. > > > > - Most probably I missed something else, and I do not need > > > how to test. > > > > xfstests has many freeze related stress tests. IIRC, generic/068 is > > the test that historically causes the most problems for freeze > > infrastructure changes. You'll also need to test at least ext4, XFS > > and btrfs, because they all stress the freeze code differently. > > Testing XFS, in particular, is a good idea because it has several > > custom freeze tests that aren't run on any other filesystem type. > > Thanks again. > > Do you see something fundamentally wrong with this change? I haven't looked particularly closely at the implementation, just enough to get an idea of the semantics of the new infrasructure (I didn't know that per-cpu rwsems existed!). The freeze code is essentially a multi-level read-optimised read/write barrier and AFAICT the per-cpu rw-sem has those semantics. From that perspective I don't see any fundamental problems, but there may be details that I've missed.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-13 21:25 [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore Oleg Nesterov ` (4 preceding siblings ...) 2015-07-13 22:23 ` [PATCH RFC 0/4] " Dave Chinner @ 2015-07-14 10:48 ` Jan Kara 2015-07-14 13:37 ` Oleg Nesterov 5 siblings, 1 reply; 32+ messages in thread From: Jan Kara @ 2015-07-14 10:48 UTC (permalink / raw) To: Oleg Nesterov Cc: Al Viro, Jan Kara, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel, Dave Hansen Hello, On Mon 13-07-15 23:25:36, Oleg Nesterov wrote: > Al, Jan, could you comment? I mean the intent, the patches are > obviously not for inclusion yet. Thanks for the patches! Hum, what do people have with freeze protection these days? Noone cared about it for years and sudddently two patch sets within a month :) Anyway, have you seen the patch set from Dave Hansen? It is here: https://lkml.org/lkml/2015/6/24/682 He modifies the freezing primitives in a different way. AFAICS the resulting performance of the fast path should be about the same. So unless I'm missing something and there is a significant performance advantage to Dave's patches I'm all for using a generic primitive you suggest. Can you perhaps work with Dave on some common resolution? > We can remove everything from struct sb_writers except frozen > (which can become a boolean, it seems) and add the array of I think 'int' for 'frozen' needs to stay - we need to distinguish at least three states - unfrozen, in the process of being frozen, frozen. > percpu_rw_semaphore's instead. > > __sb_start/end_write() can use percpu_down/up_read(), and > freeze/thaw_super() can use percpu_down/up_write(). > > Why: > > - Firstly, __sb_start_write() looks simply buggy. I does > __sb_end_write() if it sees ->frozen, but if it migrates > to another CPU before percpu_counter_dec() sb_wait_write() > can wrongly succeed if there is another task which holds > the same "semaphore": sb_wait_write() can miss the result > of the previous percpu_counter_inc() but see the result > of this percpu_counter_dec(). Agreed, that's a bug. > - This code doesn't look simple. It would be better to rely > on the generic locking code. Definitely agreed, I just didn't find anything usable (fast enough) at the time... > - __sb_start_write() will be a little bit faster, but this > is minor. Actually Dave could measure the gain achieved by removing the barrier. It would be good to verify that your patches achieve a similar gain. > Todo: > > - __sb_start_write(wait => false) always fail. > > Thivial, we already have percpu_down_read_trylock() just > this patch wasn't merged yet. > > - sb_lockdep_release() and sb_lockdep_acquire() play with > percpu_rw_semaphore's internals. > > Trivial, we need a couple of new helper in percpu-rwsem.c. > > - Fix get_super_thawed(), it will spin if MS_RDONLY... > > It is not clear to me what exactly should we do, but this > doesn't look hard. Perhaps it can just return if MS_RDONLY. What's the exact problem here? > - Most probably I missed something else, and I do not need > how to test. > > Finally. freeze_super() calls synchronize_sched_expedited() 3 times in > a row. This is bad and just stupid. But if we change percpu_rw_semaphore > to use rcu_sync (see https://lkml.org/lkml/2015/7/11/211) we can avoid > this and do synchronize_sched() only once. Just we need some more simple > changes in percpu-rwsem.c, so that all sb_writers->rw_sem[] semaphores > could use the single sb_writers->rss. > > In this case destroy_super() needs some modifications too, > percpu_free_rwsem() will be might_sleep(). But this looks simple too. Otherwise the general conversion idea looks good to me. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-14 10:48 ` Jan Kara @ 2015-07-14 13:37 ` Oleg Nesterov 2015-07-14 21:17 ` Dave Hansen 0 siblings, 1 reply; 32+ messages in thread From: Oleg Nesterov @ 2015-07-14 13:37 UTC (permalink / raw) To: Jan Kara Cc: Al Viro, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel, Dave Hansen On 07/14, Jan Kara wrote: > > Hello, > > On Mon 13-07-15 23:25:36, Oleg Nesterov wrote: > > Al, Jan, could you comment? I mean the intent, the patches are > > obviously not for inclusion yet. > > Thanks for the patches! Hum, what do people have with freeze protection > these days? Noone cared about it for years and sudddently two patch sets > within a month :) Anyway, have you seen the patch set from Dave Hansen? > > It is here: https://lkml.org/lkml/2015/6/24/682 > He modifies the freezing primitives in a different way. AFAICS the > resulting performance of the fast path should be about the same. At first glance, 2-3 do something similar, yes... > So unless > I'm missing something and there is a significant performance advantage to > Dave's patches I'm all for using a generic primitive you suggest. I think percpu_rw_semaphore looks a bit better. And even a bit faster. And it will not block __sb_start_write() entirely while freeze_super() sleeps in synchronize_rcu(). freeze_super() should be faster too after rcu_sync changes, but this is not that important. But again, to me the main advantage is that we can use the generic primitives and remove this nontrivial code in fs/super.c. > Can you perhaps work with Dave on some common resolution? Dave, what do you think? Will you agree with percpu_rw_semaphore ? > > - __sb_start_write() will be a little bit faster, but this > > is minor. > > Actually Dave could measure the gain achieved by removing the barrier. It > would be good to verify that your patches achieve a similar gain. The fast path in percpu_down_read() is really fast, it does a plain LOAD plus __this_cpu_add() under preempt_disable(). I doubt this can be improved. The actual code is: static bool update_fast_ctr(struct percpu_rw_semaphore *brw, unsigned int val) { bool success = false; preempt_disable(); if (likely(!atomic_read(&brw->write_ctr))) { __this_cpu_add(*brw->fast_read_ctr, val); success = true; } preempt_enable(); return success; } void percpu_down_read(struct percpu_rw_semaphore *brw) { might_sleep(); if (likely(update_fast_ctr(brw, +1))) { rwsem_acquire_read(&brw->rw_sem.dep_map, 0, 0, _RET_IP_); return; } down_read(&brw->rw_sem); atomic_inc(&brw->slow_read_ctr); __up_read(&brw->rw_sem); } > > - Fix get_super_thawed(), it will spin if MS_RDONLY... > > > > It is not clear to me what exactly should we do, but this > > doesn't look hard. Perhaps it can just return if MS_RDONLY. > > What's the exact problem here? Note that freeze_super() does not do sb_wait_write() (which blocks __sb_start_write) if MS_RDONLY. This means that after 1/4 get_super_thawed() will spin until SB_UNFROZEN in this case, sb_start_write() won't block. But please forget, this is not the problem. I mean, afaics this is easy to fix, but the initial version should just keep ->wait_unfrozen specially for get_super_thawed(), then we can remove it in a separate patch. Thanks! Oleg. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-14 13:37 ` Oleg Nesterov @ 2015-07-14 21:17 ` Dave Hansen 2015-07-14 21:22 ` Oleg Nesterov 0 siblings, 1 reply; 32+ messages in thread From: Dave Hansen @ 2015-07-14 21:17 UTC (permalink / raw) To: Oleg Nesterov, Jan Kara Cc: Al Viro, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel On 07/14/2015 06:37 AM, Oleg Nesterov wrote: > On 07/14, Jan Kara wrote: >> So unless >> I'm missing something and there is a significant performance advantage to >> Dave's patches I'm all for using a generic primitive you suggest. > > I think percpu_rw_semaphore looks a bit better. And even a bit faster. > And it will not block __sb_start_write() entirely while freeze_super() > sleeps in synchronize_rcu(). That's true, but freeze_super() and the code blocked by it is a super-rare path compared with write(). > freeze_super() should be faster too after rcu_sync changes, but this > is not that important. > > But again, to me the main advantage is that we can use the generic > primitives and remove this nontrivial code in fs/super.c. > >> Can you perhaps work with Dave on some common resolution? > > Dave, what do you think? Will you agree with percpu_rw_semaphore ? Using my little write-1-byte test (under will-it-scale), your 4 patches improves the number of writes/sec by 12%. My 3 patches improve the number of writes/sec by 32%. My patches manage to get rid of the memory barriers entirely in the fast path. Your approach keeps the barriers. Test: https://www.sr71.net/~dave/intel/write1byte.c ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-14 21:17 ` Dave Hansen @ 2015-07-14 21:22 ` Oleg Nesterov 2015-07-14 21:41 ` Dave Hansen 0 siblings, 1 reply; 32+ messages in thread From: Oleg Nesterov @ 2015-07-14 21:22 UTC (permalink / raw) To: Dave Hansen Cc: Jan Kara, Al Viro, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel On 07/14, Dave Hansen wrote: > > On 07/14/2015 06:37 AM, Oleg Nesterov wrote: > > On 07/14, Jan Kara wrote: > >> So unless > >> I'm missing something and there is a significant performance advantage to > >> Dave's patches I'm all for using a generic primitive you suggest. > > > > I think percpu_rw_semaphore looks a bit better. And even a bit faster. > > And it will not block __sb_start_write() entirely while freeze_super() > > sleeps in synchronize_rcu(). > > That's true, but freeze_super() and the code blocked by it is a > super-rare path compared with write(). Yes, agreed, this is not that important too. > > freeze_super() should be faster too after rcu_sync changes, but this > > is not that important. > > > > But again, to me the main advantage is that we can use the generic > > primitives and remove this nontrivial code in fs/super.c. > > > >> Can you perhaps work with Dave on some common resolution? > > > > Dave, what do you think? Will you agree with percpu_rw_semaphore ? > > Using my little write-1-byte test (under will-it-scale), your 4 patches > improves the number of writes/sec by 12%. My 3 patches improve the > number of writes/sec by 32%. Thanks... I'll try to understand. Just in case, could you send me (offlist) these 3 patches? > My patches manage to get rid of the memory barriers entirely in the fast > path. Your approach keeps the barriers. Where? No, they do not keep the barriers. Oleg. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-14 21:22 ` Oleg Nesterov @ 2015-07-14 21:41 ` Dave Hansen 2015-07-15 6:47 ` Jan Kara 0 siblings, 1 reply; 32+ messages in thread From: Dave Hansen @ 2015-07-14 21:41 UTC (permalink / raw) To: Oleg Nesterov Cc: Jan Kara, Al Viro, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel On 07/14/2015 02:22 PM, Oleg Nesterov wrote: >> Using my little write-1-byte test (under will-it-scale), your 4 patches >> improves the number of writes/sec by 12%. My 3 patches improve the >> number of writes/sec by 32%. I looked at it again. I tested with this patch in addition to the ones modifying __sb_start/end_write(): https://lkml.org/lkml/2015/6/24/682 That is where the performance delta came from. Your patches (plus the fsnotify optimization) perform very similarly to my approach. Yours remove so much code that I think they are the preferable approach. They don't compile with lockdep on, btw. :) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-14 21:41 ` Dave Hansen @ 2015-07-15 6:47 ` Jan Kara 2015-07-15 18:19 ` Oleg Nesterov 0 siblings, 1 reply; 32+ messages in thread From: Jan Kara @ 2015-07-15 6:47 UTC (permalink / raw) To: Dave Hansen Cc: Oleg Nesterov, Jan Kara, Al Viro, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel On Tue 14-07-15 14:41:13, Dave Hansen wrote: > On 07/14/2015 02:22 PM, Oleg Nesterov wrote: > >> Using my little write-1-byte test (under will-it-scale), your 4 patches > >> improves the number of writes/sec by 12%. My 3 patches improve the > >> number of writes/sec by 32%. > > I looked at it again. I tested with this patch in addition to the ones > modifying __sb_start/end_write(): > > https://lkml.org/lkml/2015/6/24/682 > > That is where the performance delta came from. Your patches (plus the > fsnotify optimization) perform very similarly to my approach. > > Yours remove so much code that I think they are the preferable approach. > > They don't compile with lockdep on, btw. :) Great, thanks for hashing it out. So I'm also in favor of Oleg's approach as well. We just have to wait until he fixes the outstanding issues with his code. Dave, just send your fsnotify patch separately to AKPM - he usually merges fsnotify stuff. Thanks. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-15 6:47 ` Jan Kara @ 2015-07-15 18:19 ` Oleg Nesterov 2015-07-16 7:26 ` Jan Kara 0 siblings, 1 reply; 32+ messages in thread From: Oleg Nesterov @ 2015-07-15 18:19 UTC (permalink / raw) To: Jan Kara Cc: Dave Hansen, Al Viro, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel On 07/15, Jan Kara wrote: > > On Tue 14-07-15 14:41:13, Dave Hansen wrote: > > > > I looked at it again. I tested with this patch in addition to the ones > > modifying __sb_start/end_write(): > > > > https://lkml.org/lkml/2015/6/24/682 > > > > That is where the performance delta came from. Your patches (plus the > > fsnotify optimization) perform very similarly to my approach. > > > > Yours remove so much code that I think they are the preferable approach. > > > > They don't compile with lockdep on, btw. :) > > Great, thanks for hashing it out. Yes, thanks Dave! > So I'm also in favor of Oleg's approach > as well. We just have to wait until he fixes the outstanding issues with > his code. Yes. I'll try to make the working version, hopefully this week. But, > Dave, just send your fsnotify patch separately to AKPM - he > usually merges fsnotify stuff. Perhaps it makes to merge other 2 patches from Dave first? (those which change __sb_start/end_write to rely on RCU). Afaics these changes are straightforward and correct. Although I'd suggest to use preempt_disable() and synchronize_sched() instead. I will be happy to (try to) make this conversion on top of his changes. Because I do not want to delay the performance improvements and I do not know when exactly I'll send the next version: I need to finish the previous discussion about rcu_sync first. And the necessary changes in fs/super.c depend on whether percpu_rw_semaphore will have rcu_sync or not (not too much, only destroy_super() depends, but still). And of course, I am worried that I missed something and percpu_rw_semaphore can't work for some reason. The code in fs/super.c looks simple, but it seems that filesystems do the "strange" things with lockdep at least. Oleg. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-15 18:19 ` Oleg Nesterov @ 2015-07-16 7:26 ` Jan Kara 2015-07-16 7:30 ` Dave Hansen 2015-07-16 17:32 ` Oleg Nesterov 0 siblings, 2 replies; 32+ messages in thread From: Jan Kara @ 2015-07-16 7:26 UTC (permalink / raw) To: Oleg Nesterov Cc: Jan Kara, Dave Hansen, Al Viro, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel On Wed 15-07-15 20:19:20, Oleg Nesterov wrote: > On 07/15, Jan Kara wrote: > > So I'm also in favor of Oleg's approach > > as well. We just have to wait until he fixes the outstanding issues with > > his code. > > Yes. I'll try to make the working version, hopefully this week. > > But, > > > Dave, just send your fsnotify patch separately to AKPM - he > > usually merges fsnotify stuff. > > Perhaps it makes to merge other 2 patches from Dave first? (those which > change __sb_start/end_write to rely on RCU). Afaics these changes are > straightforward and correct. Although I'd suggest to use preempt_disable() > and synchronize_sched() instead. I will be happy to (try to) make this > conversion on top of his changes. > > Because I do not want to delay the performance improvements and I do not > know when exactly I'll send the next version: I need to finish the previous > discussion about rcu_sync first. And the necessary changes in fs/super.c > depend on whether percpu_rw_semaphore will have rcu_sync or not (not too > much, only destroy_super() depends, but still). > > And of course, I am worried that I missed something and percpu_rw_semaphore > can't work for some reason. The code in fs/super.c looks simple, but it > seems that filesystems do the "strange" things with lockdep at least. So Dave's patches would go in only in the next merge window anyway so we still have like two-three weeks to decide which patchset to take. If you think it will take you longer, then merging Dave's patches makes some sense although I personally don't think the issue is so important that we have to fix it ASAP and eventual delay of one more release would be OK for me. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-16 7:26 ` Jan Kara @ 2015-07-16 7:30 ` Dave Hansen 2015-07-16 8:55 ` Jan Kara 2015-07-16 17:32 ` Oleg Nesterov 1 sibling, 1 reply; 32+ messages in thread From: Dave Hansen @ 2015-07-16 7:30 UTC (permalink / raw) To: Jan Kara, Oleg Nesterov Cc: Al Viro, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel On 07/16/2015 12:26 AM, Jan Kara wrote: > So Dave's patches would go in only in the next merge window anyway so we > still have like two-three weeks to decide which patchset to take. If you > think it will take you longer, then merging Dave's patches makes some sense > although I personally don't think the issue is so important that we have to > fix it ASAP and eventual delay of one more release would be OK for me. I thought of one more thing. Oleg's patches are fast in the uncontended case. My testing of his code so far was all single-threaded. Will it slow down if we throw a bunch of threads at it an see contention? I don't think my RCU approach will. I can benchmark in the next few days on a larger machine. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-16 7:30 ` Dave Hansen @ 2015-07-16 8:55 ` Jan Kara 0 siblings, 0 replies; 32+ messages in thread From: Jan Kara @ 2015-07-16 8:55 UTC (permalink / raw) To: Dave Hansen Cc: Jan Kara, Oleg Nesterov, Al Viro, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel On Thu 16-07-15 00:30:27, Dave Hansen wrote: > On 07/16/2015 12:26 AM, Jan Kara wrote: > > So Dave's patches would go in only in the next merge window anyway so we > > still have like two-three weeks to decide which patchset to take. If you > > think it will take you longer, then merging Dave's patches makes some sense > > although I personally don't think the issue is so important that we have to > > fix it ASAP and eventual delay of one more release would be OK for me. > > I thought of one more thing. > > Oleg's patches are fast in the uncontended case. My testing of his code > so far was all single-threaded. Will it slow down if we throw a bunch > of threads at it an see contention? I don't think my RCU approach will. > > I can benchmark in the next few days on a larger machine. Well, his semaphore is percpu as well so I don't think there should be a difference (there may be some difference when freeze is in progress but that doesn't really matter). But it's always good to verify. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-16 7:26 ` Jan Kara 2015-07-16 7:30 ` Dave Hansen @ 2015-07-16 17:32 ` Oleg Nesterov 2015-07-17 1:27 ` Dave Chinner 1 sibling, 1 reply; 32+ messages in thread From: Oleg Nesterov @ 2015-07-16 17:32 UTC (permalink / raw) To: Jan Kara Cc: Dave Hansen, Al Viro, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel On 07/16, Jan Kara wrote: > > On Wed 15-07-15 20:19:20, Oleg Nesterov wrote: > > > > Perhaps it makes to merge other 2 patches from Dave first? (those which > > change __sb_start/end_write to rely on RCU). Afaics these changes are > > straightforward and correct. Although I'd suggest to use preempt_disable() > > and synchronize_sched() instead. I will be happy to (try to) make this > > conversion on top of his changes. > > > > Because I do not want to delay the performance improvements and I do not > > know when exactly I'll send the next version: I need to finish the previous > > discussion about rcu_sync first. And the necessary changes in fs/super.c > > depend on whether percpu_rw_semaphore will have rcu_sync or not (not too > > much, only destroy_super() depends, but still). > > > > And of course, I am worried that I missed something and percpu_rw_semaphore > > can't work for some reason. The code in fs/super.c looks simple, but it > > seems that filesystems do the "strange" things with lockdep at least. > > So Dave's patches would go in only in the next merge window anyway so we > still have like two-three weeks to decide which patchset to take. OK, good. > If you > think it will take you longer, Hopefully not. > then merging Dave's patches makes some sense > although I personally don't think the issue is so important that we have to > fix it ASAP and eventual delay of one more release would be OK for me. OK. I will try to do this in any case, I just wanted to say that I can equally do this on top of Dave's patches. To remind, I need to finish the discussion about percpu_rw_semaphore and rcu_sync, then I'll try to make v2. And. The biggest problem is lockdep. Everything else looks really simple although of course I could miss something. And not only because the filesystems abuse lockdep and thus we need some cleanups first. Unless I am totally confused fs/super.c needs some cleanups (and fixes) too, with or without this conversion. Say, acquire_freeze_lock() logic does do not look right: - wait_event(.frozen < level) without rwsem_acquire_read() is just wrong from lockdep perspective. If we are going to deadlock because the caller is buggy, lockdep can't warn us. - __sb_start_write() can race with thaw_super() + freeze_super(), and after "goto retry" the 2nd acquire_freeze_lock() is wrong. - The "tell lockdep we are doing trylock" hack doesn't look nice. I think this is correct, but this logic should be more explicit. Yes, the recursive read_lock() is fine if we hold the lock on higher level. But we do not need to fool lockdep. If we can not deadlock in this case (and I agree we can't) we should simply use wait == F consistently. Something like this (not even compiled tested): static int do_sb_start_write(struct super_block *sb, int level, bool wait, unsigned long ip) { if (wait) rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 0, ip); retry: if (unlikely(sb->s_writers.frozen >= level)) { if (!wait) return 0; wait_event(sb->s_writers.wait_unfrozen, sb->s_writers.frozen < level); } percpu_counter_inc(&sb->s_writers.counter[level-1]); /* * Make sure counter is updated before we check for frozen. * freeze_super() first sets frozen and then checks the counter. */ smp_mb(); if (unlikely(sb->s_writers.frozen >= level)) { __sb_end_write(sb, level); goto retry; } if (!wait) rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 1, ip); return 1; } /* * This is an internal function, please use sb_start_{write,pagefault,intwrite} * instead. */ int __sb_start_write(struct super_block *sb, int level, bool wait) { bool cantbelocked = false; int ret; #ifdef CONFIG_LOCKDEP /* * We want lockdep to tell us about possible deadlocks with freezing but * it's it bit tricky to properly instrument it. Getting a freeze protection * works as getting a read lock but there are subtle problems. XFS for example * gets freeze protection on internal level twice in some cases, which is OK * only because we already hold a freeze protection also on higher level. Due * to these cases we have to use wait == F (trylock mode) which must not fail. */ if (wait) { int i; for (i = 0; i < level - 1; i++) if (lock_is_held(&sb->s_writers.lock_map[i])) { cantbelocked = true; break; } } #endif ret = do_sb_start_write(sb, level, wait && !cantbelocked, _RET_IP_); WARN_ON(cantbelocked & !ret); return ret; } This should not generate the additional code if CONFIG_LOCKDEP=n and After this patch it will be trivial to convert __sb_start_write(), but we need some more cleanups. And perhaps I'll send some changes (like above) separately, because again, I think they make sense in any case. In short: I'll try to make v2 asap, but this is all I can promise ;) Oleg. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-16 17:32 ` Oleg Nesterov @ 2015-07-17 1:27 ` Dave Chinner 2015-07-17 17:31 ` Oleg Nesterov 0 siblings, 1 reply; 32+ messages in thread From: Dave Chinner @ 2015-07-17 1:27 UTC (permalink / raw) To: Oleg Nesterov Cc: Jan Kara, Dave Hansen, Al Viro, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel On Thu, Jul 16, 2015 at 07:32:56PM +0200, Oleg Nesterov wrote: > > #ifdef CONFIG_LOCKDEP > /* > * We want lockdep to tell us about possible deadlocks with freezing but > * it's it bit tricky to properly instrument it. Getting a freeze protection > * works as getting a read lock but there are subtle problems. XFS for example > * gets freeze protection on internal level twice in some cases, which is OK Sorry, I've missed something here - where is XFS nesting sb_start_intwrite() calls? XFS only has a call to sb_start_intwrite() in xfs_trans_alloc() and an open coded equivalent in xfs_iomap_write_unwritten(). However, we cannot create nested transaction contexts as doing so is *guaranteed to deadlock the journal*. (e.g. why we use PF_FSTRANS to trigger GFP_NOFS allocation as it prevents direct memory reclaim from causing nested transactions.) Hence if we taking multiple FREEZE_FS level references at a time then there is a bug that needs fixing.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-17 1:27 ` Dave Chinner @ 2015-07-17 17:31 ` Oleg Nesterov 2015-07-17 22:40 ` Dave Chinner 0 siblings, 1 reply; 32+ messages in thread From: Oleg Nesterov @ 2015-07-17 17:31 UTC (permalink / raw) To: Dave Chinner Cc: Jan Kara, Dave Hansen, Al Viro, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel On 07/17, Dave Chinner wrote: > > On Thu, Jul 16, 2015 at 07:32:56PM +0200, Oleg Nesterov wrote: > > > > #ifdef CONFIG_LOCKDEP > > /* > > * We want lockdep to tell us about possible deadlocks with freezing but > > * it's it bit tricky to properly instrument it. Getting a freeze protection > > * works as getting a read lock but there are subtle problems. XFS for example > > * gets freeze protection on internal level twice in some cases, which is OK > > Sorry, I've missed something here - where is XFS nesting > sb_start_intwrite() calls? Heh ;) I too tried to understand thi but failed. I was not surprized, I know nothing about fs/. Dave, I didn't write this comment. Please look at acquire_freeze_lock(). If we can remove this logic - great! but this needs a separate change. Oleg. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-17 17:31 ` Oleg Nesterov @ 2015-07-17 22:40 ` Dave Chinner 2015-07-20 8:26 ` Jan Kara 2015-07-20 16:23 ` Oleg Nesterov 0 siblings, 2 replies; 32+ messages in thread From: Dave Chinner @ 2015-07-17 22:40 UTC (permalink / raw) To: Oleg Nesterov Cc: Jan Kara, Dave Hansen, Al Viro, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel On Fri, Jul 17, 2015 at 07:31:17PM +0200, Oleg Nesterov wrote: > On 07/17, Dave Chinner wrote: > > > > On Thu, Jul 16, 2015 at 07:32:56PM +0200, Oleg Nesterov wrote: > > > > > > #ifdef CONFIG_LOCKDEP > > > /* > > > * We want lockdep to tell us about possible deadlocks with freezing but > > > * it's it bit tricky to properly instrument it. Getting a freeze protection > > > * works as getting a read lock but there are subtle problems. XFS for example > > > * gets freeze protection on internal level twice in some cases, which is OK > > > > Sorry, I've missed something here - where is XFS nesting > > sb_start_intwrite() calls? > > Heh ;) I too tried to understand thi but failed. I was not surprized, > I know nothing about fs/. > > Dave, I didn't write this comment. Please look at acquire_freeze_lock(). > If we can remove this logic - great! but this needs a separate change. Oh, I think I know what it was - when we duplicate a transaction for a rolling commit, we do it before committing the current transaction is committed. I *think* that used to take a second freeze reference, which only existed until the first transaction was committed. We do things a bit differently now - we hold a state flag on the transaction to indicate it needs to release the freeze reference when it is freed and we pass it to the new transaction so that the first transaction commit doesn't release it. So, yes, it may well be a stale comment now. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-17 22:40 ` Dave Chinner @ 2015-07-20 8:26 ` Jan Kara 2015-07-22 21:09 ` Oleg Nesterov 2015-07-20 16:23 ` Oleg Nesterov 1 sibling, 1 reply; 32+ messages in thread From: Jan Kara @ 2015-07-20 8:26 UTC (permalink / raw) To: Dave Chinner Cc: Oleg Nesterov, Jan Kara, Dave Hansen, Al Viro, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel On Sat 18-07-15 08:40:15, Dave Chinner wrote: > On Fri, Jul 17, 2015 at 07:31:17PM +0200, Oleg Nesterov wrote: > > On 07/17, Dave Chinner wrote: > > > > > > On Thu, Jul 16, 2015 at 07:32:56PM +0200, Oleg Nesterov wrote: > > > > > > > > #ifdef CONFIG_LOCKDEP > > > > /* > > > > * We want lockdep to tell us about possible deadlocks with freezing but > > > > * it's it bit tricky to properly instrument it. Getting a freeze protection > > > > * works as getting a read lock but there are subtle problems. XFS for example > > > > * gets freeze protection on internal level twice in some cases, which is OK > > > > > > Sorry, I've missed something here - where is XFS nesting > > > sb_start_intwrite() calls? > > > > Heh ;) I too tried to understand thi but failed. I was not surprized, > > I know nothing about fs/. > > > > Dave, I didn't write this comment. Please look at acquire_freeze_lock(). > > If we can remove this logic - great! but this needs a separate change. > > Oh, I think I know what it was - when we duplicate a transaction for > a rolling commit, we do it before committing the current transaction > is committed. I *think* that used to take a second freeze reference, > which only existed until the first transaction was committed. We do > things a bit differently now - we hold a state flag on the > transaction to indicate it needs to release the freeze reference > when it is freed and we pass it to the new transaction so that the > first transaction commit doesn't release it. > > So, yes, it may well be a stale comment now. Yeah, as far as I remember this was the reason why I added the comment. So Oleg, feel free to remove the special code and run xfstests with XFS and lockdep enabled to verify there are really no issues. Thanks! Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-20 8:26 ` Jan Kara @ 2015-07-22 21:09 ` Oleg Nesterov 0 siblings, 0 replies; 32+ messages in thread From: Oleg Nesterov @ 2015-07-22 21:09 UTC (permalink / raw) To: Jan Kara Cc: Dave Chinner, Dave Hansen, Al Viro, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel On 07/20, Jan Kara wrote: > > On Sat 18-07-15 08:40:15, Dave Chinner wrote: > > > > So, yes, it may well be a stale comment now. > > Yeah, as far as I remember this was the reason why I added the comment. So > Oleg, feel free to remove the special code and run xfstests with XFS and > lockdep enabled to verify there are really no issues. Thanks! OK, thanks, will try to do later. Please see another series I'll send in a few minutes. Oleg. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore 2015-07-17 22:40 ` Dave Chinner 2015-07-20 8:26 ` Jan Kara @ 2015-07-20 16:23 ` Oleg Nesterov 1 sibling, 0 replies; 32+ messages in thread From: Oleg Nesterov @ 2015-07-20 16:23 UTC (permalink / raw) To: Dave Chinner Cc: Jan Kara, Dave Hansen, Al Viro, Linus Torvalds, Paul McKenney, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso, Ingo Molnar, Tejun Heo, linux-kernel On 07/18, Dave Chinner wrote: > > On Fri, Jul 17, 2015 at 07:31:17PM +0200, Oleg Nesterov wrote: > > > > Dave, I didn't write this comment. Please look at acquire_freeze_lock(). > > If we can remove this logic - great! but this needs a separate change. > > Oh, I think I know what it was - when we duplicate a transaction for > a rolling commit, we do it before committing the current transaction > is committed. I *think* that used to take a second freeze reference, > which only existed until the first transaction was committed. We do > things a bit differently now - we hold a state flag on the > transaction to indicate it needs to release the freeze reference > when it is freed and we pass it to the new transaction so that the > first transaction commit doesn't release it. Just fyi, please do not assume I can understand the explanation above ;) > So, yes, it may well be a stale comment now. Perhaps. But this needs a separate change. Plus even if we remove this hack, this code has other problems with lockdep. I'll send the "lockdep" fixes/cleanup today, please review. Oleg. ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 0/4] change sb_writers to use percpu_rw_semaphore @ 2015-07-22 21:15 Oleg Nesterov 2015-07-22 21:15 ` [PATCH 4/4] " Oleg Nesterov 0 siblings, 1 reply; 32+ messages in thread From: Oleg Nesterov @ 2015-07-22 21:15 UTC (permalink / raw) To: Al Viro, Dave Chinner, Dave Hansen, Jan Kara Cc: Paul E. McKenney, Peter Zijlstra, linux-fsdevel, linux-kernel On top of "[PATCH 0/4] sb_write: lockdep fixes/cleanups" series. Now that it was reviewed (thanks Jan!), let me send the actual conversion. 1-2 add the simple percpu_rw_semaphore changes, this does not conflict with the pending rcu_sync changes. 3/4 is really ugly but please see the changelog, this is the temporary kludge to avoid the problems with other percpu-rwsem changes routed via another tree. 4/4 looks simple and straightforward after the previous series. Testing. Well, so far I only verified that ioctl(FIFREEZE) + ioctl(FITHAW) seems to wors "as expected" on my testing machine with ext3. So probably this needs more testing. Will try to do this later. And after that we can hopefully remove the "trylock" hack in __sb_start_write(), this series doesn't remove it. But. I will be travelling till the end of the next week, and I'm not sure I will have the internet access. So let me apologize in advance if (most probably) I won't be able to reply until I return. Please review. Oleg. fs/super.c | 134 +++++++++++++++-------------------------- include/linux/fs.h | 22 +++---- include/linux/percpu-rwsem.h | 20 ++++++ kernel/locking/percpu-rwsem.c | 13 ++++ 4 files changed, 89 insertions(+), 100 deletions(-) ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 4/4] change sb_writers to use percpu_rw_semaphore 2015-07-22 21:15 [PATCH " Oleg Nesterov @ 2015-07-22 21:15 ` Oleg Nesterov 2015-07-22 21:34 ` Oleg Nesterov ` (2 more replies) 0 siblings, 3 replies; 32+ messages in thread From: Oleg Nesterov @ 2015-07-22 21:15 UTC (permalink / raw) To: Al Viro, Dave Chinner, Dave Hansen, Jan Kara Cc: Paul E. McKenney, Peter Zijlstra, linux-fsdevel, linux-kernel We can remove everything from struct sb_writers except frozen and add the array of percpu_rw_semaphore's instead. This patch doesn't remove sb_writers->wait_unfrozen yet, we keep it for get_super_thawed(). We will probably remove it later. This change tries to address the following problems: - Firstly, __sb_start_write() looks simply buggy. It does __sb_end_write() if it sees ->frozen, but if it migrates to another CPU before percpu_counter_dec(), sb_wait_write() can wrongly succeed if there is another task which holds the same "semaphore": sb_wait_write() can miss the result of the previous percpu_counter_inc() but see the result of this percpu_counter_dec(). - As Dave Hansen reports, it is suboptimal. The trivial microbenchmark that writes to a tmpfs file in a loop runs 12% faster if we change this code to rely on RCU and kill the memory barriers. - This code doesn't look simple. It would be better to rely on the generic locking code. According to Dave, this change adds the same performance improvement. Perhaps we should also cleanup the usage of ->frozen. It would be better to set/clear (say) SB_FREEZE_WRITE with the corresponding write-lock held. Currently freeze_super() has to set SB_FREEZE_WRITE before sb_wait_write(SB_FREEZE_WRITE) to avoid the race with itself, we can add another state. The "From now on, no new normal writers can start" removed by this patch was not really correct. Note: with this change both freeze_super() and thaw_super() will do synchronize_sched_expedited() 3 times. This is just ugly. But: - This will be "fixed" by the rcu_sync changes we are going to merge. After that freeze_super()->percpu_down_write() will use synchronize_sched(), and thaw_super() won't use synchronize() at all. This doesn't need any changes in fs/super.c. - Once we merge rcu_sync changes, we can also change super.c so that all wb_write->rw_sem's will share the single ->rss in struct sb_writes, then freeze_super() will need only one synchronize_sched(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- fs/super.c | 113 ++++++++++++++-------------------------------------- include/linux/fs.h | 19 +++------ 2 files changed, 36 insertions(+), 96 deletions(-) diff --git a/fs/super.c b/fs/super.c index 886fddf..29b811b 100644 --- a/fs/super.c +++ b/fs/super.c @@ -142,7 +142,7 @@ static void destroy_super_work(struct work_struct *work) int i; for (i = 0; i < SB_FREEZE_LEVELS; i++) - percpu_counter_destroy(&s->s_writers.counter[i]); + percpu_free_rwsem(&s->s_writers.rw_sem[i]); kfree(s); } @@ -193,13 +193,11 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags) goto fail; for (i = 0; i < SB_FREEZE_LEVELS; i++) { - if (percpu_counter_init(&s->s_writers.counter[i], 0, - GFP_KERNEL) < 0) + if (__percpu_init_rwsem(&s->s_writers.rw_sem[i], + sb_writers_name[i], + &type->s_writers_key[i])) goto fail; - lockdep_init_map(&s->s_writers.lock_map[i], sb_writers_name[i], - &type->s_writers_key[i], 0); } - init_waitqueue_head(&s->s_writers.wait); init_waitqueue_head(&s->s_writers.wait_unfrozen); s->s_bdi = &noop_backing_dev_info; s->s_flags = flags; @@ -1161,47 +1159,10 @@ out: */ void __sb_end_write(struct super_block *sb, int level) { - percpu_counter_dec(&sb->s_writers.counter[level-1]); - /* - * Make sure s_writers are updated before we wake up waiters in - * freeze_super(). - */ - smp_mb(); - if (waitqueue_active(&sb->s_writers.wait)) - wake_up(&sb->s_writers.wait); - rwsem_release(&sb->s_writers.lock_map[level-1], 1, _RET_IP_); + percpu_up_read(sb->s_writers.rw_sem + level-1); } EXPORT_SYMBOL(__sb_end_write); -static int do_sb_start_write(struct super_block *sb, int level, bool wait, - unsigned long ip) -{ - if (wait) - rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 0, ip); -retry: - if (unlikely(sb->s_writers.frozen >= level)) { - if (!wait) - return 0; - wait_event(sb->s_writers.wait_unfrozen, - sb->s_writers.frozen < level); - } - - percpu_counter_inc(&sb->s_writers.counter[level-1]); - /* - * Make sure counter is updated before we check for frozen. - * freeze_super() first sets frozen and then checks the counter. - */ - smp_mb(); - if (unlikely(sb->s_writers.frozen >= level)) { - __sb_end_write(sb, level); - goto retry; - } - - if (!wait) - rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 1, ip); - return 1; -} - /* * This is an internal function, please use sb_start_{write,pagefault,intwrite} * instead. @@ -1209,7 +1170,7 @@ retry: int __sb_start_write(struct super_block *sb, int level, bool wait) { bool force_trylock = false; - int ret; + int ret = 1; #ifdef CONFIG_LOCKDEP /* @@ -1225,13 +1186,17 @@ int __sb_start_write(struct super_block *sb, int level, bool wait) int i; for (i = 0; i < level - 1; i++) - if (lock_is_held(&sb->s_writers.lock_map[i])) { + if (percpu_rwsem_is_held(sb->s_writers.rw_sem + i)) { force_trylock = true; break; } } #endif - ret = do_sb_start_write(sb, level, wait && !force_trylock, _RET_IP_); + if (wait && !force_trylock) + percpu_down_read(sb->s_writers.rw_sem + level-1); + else + ret = percpu_down_read_trylock(sb->s_writers.rw_sem + level-1); + WARN_ON(force_trylock & !ret); return ret; } @@ -1249,25 +1214,7 @@ EXPORT_SYMBOL(__sb_start_write); */ static void sb_wait_write(struct super_block *sb, int level) { - s64 writers; - - rwsem_acquire(&sb->s_writers.lock_map[level-1], 0, 0, _THIS_IP_); - - do { - DEFINE_WAIT(wait); - /* - * We use a barrier in prepare_to_wait() to separate setting - * of frozen and checking of the counter - */ - prepare_to_wait(&sb->s_writers.wait, &wait, - TASK_UNINTERRUPTIBLE); - - writers = percpu_counter_sum(&sb->s_writers.counter[level-1]); - if (writers) - schedule(); - - finish_wait(&sb->s_writers.wait, &wait); - } while (writers); + percpu_down_write(sb->s_writers.rw_sem + level-1); } static void sb_freeze_release(struct super_block *sb) @@ -1275,7 +1222,7 @@ static void sb_freeze_release(struct super_block *sb) int level; /* Avoid the warning from lockdep_sys_exit() */ for (level = 0; level < SB_FREEZE_LEVELS; ++level) - rwsem_release(sb->s_writers.lock_map + level, 1, _THIS_IP_); + percpu_rwsem_release(sb->s_writers.rw_sem + level, 0, _THIS_IP_); } static void sb_freeze_acquire(struct super_block *sb) @@ -1283,7 +1230,15 @@ static void sb_freeze_acquire(struct super_block *sb) int level; for (level = 0; level < SB_FREEZE_LEVELS; ++level) - rwsem_acquire(sb->s_writers.lock_map + level, 0, 1, _THIS_IP_); + percpu_rwsem_acquire(sb->s_writers.rw_sem + level, 0, _THIS_IP_); +} + +static void sb_freeze_unlock(struct super_block *sb) +{ + int level; + + for (level = 0; level < SB_FREEZE_LEVELS; ++level) + percpu_up_write(sb->s_writers.rw_sem + level); } /** @@ -1342,20 +1297,14 @@ int freeze_super(struct super_block *sb) return 0; } - /* From now on, no new normal writers can start */ sb->s_writers.frozen = SB_FREEZE_WRITE; - smp_wmb(); - /* Release s_umount to preserve sb_start_write -> s_umount ordering */ up_write(&sb->s_umount); - sb_wait_write(sb, SB_FREEZE_WRITE); + down_write(&sb->s_umount); /* Now we go and block page faults... */ - down_write(&sb->s_umount); sb->s_writers.frozen = SB_FREEZE_PAGEFAULT; - smp_wmb(); - sb_wait_write(sb, SB_FREEZE_PAGEFAULT); /* All writers are done so after syncing there won't be dirty data */ @@ -1363,7 +1312,6 @@ int freeze_super(struct super_block *sb) /* Now wait for internal filesystem counter */ sb->s_writers.frozen = SB_FREEZE_FS; - smp_wmb(); sb_wait_write(sb, SB_FREEZE_FS); if (sb->s_op->freeze_fs) { @@ -1372,9 +1320,8 @@ int freeze_super(struct super_block *sb) printk(KERN_ERR "VFS:Filesystem freeze failed\n"); sb->s_writers.frozen = SB_UNFROZEN; - smp_wmb(); + sb_freeze_unlock(sb); wake_up(&sb->s_writers.wait_unfrozen); - sb_freeze_release(sb); deactivate_locked_super(sb); return ret; } @@ -1406,8 +1353,10 @@ int thaw_super(struct super_block *sb) return -EINVAL; } - if (sb->s_flags & MS_RDONLY) + if (sb->s_flags & MS_RDONLY) { + sb->s_writers.frozen = SB_UNFROZEN; goto out; + } sb_freeze_acquire(sb); @@ -1422,13 +1371,11 @@ int thaw_super(struct super_block *sb) } } - sb_freeze_release(sb); -out: sb->s_writers.frozen = SB_UNFROZEN; - smp_wmb(); + sb_freeze_unlock(sb); +out: wake_up(&sb->s_writers.wait_unfrozen); deactivate_locked_super(sb); - return 0; } EXPORT_SYMBOL(thaw_super); diff --git a/include/linux/fs.h b/include/linux/fs.h index 6addccc..fb2cb4a 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1,7 +1,6 @@ #ifndef _LINUX_FS_H #define _LINUX_FS_H - #include <linux/linkage.h> #include <linux/wait.h> #include <linux/kdev_t.h> @@ -31,6 +30,7 @@ #include <linux/percpu-rwsem.h> #include <linux/blk_types.h> #include <linux/workqueue.h> +#include <linux/percpu-rwsem.h> #include <asm/byteorder.h> #include <uapi/linux/fs.h> @@ -1247,16 +1247,9 @@ enum { #define SB_FREEZE_LEVELS (SB_FREEZE_COMPLETE - 1) struct sb_writers { - /* Counters for counting writers at each level */ - struct percpu_counter counter[SB_FREEZE_LEVELS]; - wait_queue_head_t wait; /* queue for waiting for - writers / faults to finish */ - int frozen; /* Is sb frozen? */ - wait_queue_head_t wait_unfrozen; /* queue for waiting for - sb to be thawed */ -#ifdef CONFIG_DEBUG_LOCK_ALLOC - struct lockdep_map lock_map[SB_FREEZE_LEVELS]; -#endif + int frozen; /* Is sb frozen? */ + wait_queue_head_t wait_unfrozen; /* for get_super_thawed() */ + struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS]; }; struct super_block { @@ -1364,9 +1357,9 @@ void __sb_end_write(struct super_block *sb, int level); int __sb_start_write(struct super_block *sb, int level, bool wait); #define __sb_acquire_write(sb, lev) \ - rwsem_acquire_read(&(sb)->s_writers.lock_map[(lev)-1], 0, 1, _THIS_IP_) + percpu_rwsem_acquire(&(sb)->s_writers.rw_sem[(lev)-1], 1, _THIS_IP_) #define __sb_release_write(sb, lev) \ - rwsem_release(&(sb)->s_writers.lock_map[(lev)-1], 1, _THIS_IP_) + percpu_rwsem_release(&(sb)->s_writers.rw_sem[(lev)-1], 1, _THIS_IP_) /** * sb_end_write - drop write access to a superblock -- 1.5.5.1 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH 4/4] change sb_writers to use percpu_rw_semaphore 2015-07-22 21:15 ` [PATCH 4/4] " Oleg Nesterov @ 2015-07-22 21:34 ` Oleg Nesterov 2015-07-28 8:34 ` Jan Kara 2015-08-07 19:54 ` Oleg Nesterov 2 siblings, 0 replies; 32+ messages in thread From: Oleg Nesterov @ 2015-07-22 21:34 UTC (permalink / raw) To: Al Viro, Dave Chinner, Dave Hansen, Jan Kara Cc: Paul E. McKenney, Peter Zijlstra, linux-fsdevel, linux-kernel Sorry for noise, but let me say just in case... On 07/22, Oleg Nesterov wrote: > > Perhaps we should also cleanup the usage of ->frozen. It would be > better to set/clear (say) SB_FREEZE_WRITE with the corresponding > write-lock held. Currently freeze_super() has to set SB_FREEZE_WRITE > before sb_wait_write(SB_FREEZE_WRITE) to avoid the race with itself, "Currently" means "after this change". Before this change we obviously need to increment ->frozen before sb_wait_write(). > The "From now on, no new normal writers > can start" removed by this patch was not really correct. Yes, it was confusing even without this change. Oleg. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 4/4] change sb_writers to use percpu_rw_semaphore 2015-07-22 21:15 ` [PATCH 4/4] " Oleg Nesterov 2015-07-22 21:34 ` Oleg Nesterov @ 2015-07-28 8:34 ` Jan Kara 2015-08-03 17:30 ` Oleg Nesterov 2015-08-07 19:54 ` Oleg Nesterov 2 siblings, 1 reply; 32+ messages in thread From: Jan Kara @ 2015-07-28 8:34 UTC (permalink / raw) To: Oleg Nesterov Cc: Al Viro, Dave Chinner, Dave Hansen, Jan Kara, Paul E. McKenney, Peter Zijlstra, linux-fsdevel, linux-kernel On Wed 22-07-15 23:15:41, Oleg Nesterov wrote: > We can remove everything from struct sb_writers except frozen > and add the array of percpu_rw_semaphore's instead. > > This patch doesn't remove sb_writers->wait_unfrozen yet, we keep > it for get_super_thawed(). We will probably remove it later. Yeah, that would be a nice cleanup. > This change tries to address the following problems: > > - Firstly, __sb_start_write() looks simply buggy. It does > __sb_end_write() if it sees ->frozen, but if it migrates > to another CPU before percpu_counter_dec(), sb_wait_write() > can wrongly succeed if there is another task which holds > the same "semaphore": sb_wait_write() can miss the result > of the previous percpu_counter_inc() but see the result > of this percpu_counter_dec(). > > - As Dave Hansen reports, it is suboptimal. The trivial > microbenchmark that writes to a tmpfs file in a loop runs > 12% faster if we change this code to rely on RCU and kill > the memory barriers. > > - This code doesn't look simple. It would be better to rely > on the generic locking code. > > According to Dave, this change adds the same performance > improvement. > > Perhaps we should also cleanup the usage of ->frozen. It would be > better to set/clear (say) SB_FREEZE_WRITE with the corresponding > write-lock held. Currently freeze_super() has to set SB_FREEZE_WRITE > before sb_wait_write(SB_FREEZE_WRITE) to avoid the race with itself, > we can add another state. The "From now on, no new normal writers > can start" removed by this patch was not really correct. The patch looks good, just one question: Why wasn't the above comment really correct? Do you mean it wouldn't be correct after your changes? I agree with that. Also when you'd like to "cleanup the usage of ->frozen", you have to be careful no only about races with freeze_super() itself but also about races with remount (that's one of the reasons why we use s_umount for protecting modifications of ->frozen). So I'm not sure how much we can actually improve on code readability... Anyway, you can add: Reviewed-by: Jan Kara <jack@suse.com> Honza > Note: with this change both freeze_super() and thaw_super() will do > synchronize_sched_expedited() 3 times. This is just ugly. But: > > - This will be "fixed" by the rcu_sync changes we are going > to merge. After that freeze_super()->percpu_down_write() > will use synchronize_sched(), and thaw_super() won't use > synchronize() at all. > > This doesn't need any changes in fs/super.c. > > - Once we merge rcu_sync changes, we can also change super.c > so that all wb_write->rw_sem's will share the single ->rss > in struct sb_writes, then freeze_super() will need only one > synchronize_sched(). > > Signed-off-by: Oleg Nesterov <oleg@redhat.com> > --- > fs/super.c | 113 ++++++++++++++-------------------------------------- > include/linux/fs.h | 19 +++------ > 2 files changed, 36 insertions(+), 96 deletions(-) > > diff --git a/fs/super.c b/fs/super.c > index 886fddf..29b811b 100644 > --- a/fs/super.c > +++ b/fs/super.c > @@ -142,7 +142,7 @@ static void destroy_super_work(struct work_struct *work) > int i; > > for (i = 0; i < SB_FREEZE_LEVELS; i++) > - percpu_counter_destroy(&s->s_writers.counter[i]); > + percpu_free_rwsem(&s->s_writers.rw_sem[i]); > kfree(s); > } > > @@ -193,13 +193,11 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags) > goto fail; > > for (i = 0; i < SB_FREEZE_LEVELS; i++) { > - if (percpu_counter_init(&s->s_writers.counter[i], 0, > - GFP_KERNEL) < 0) > + if (__percpu_init_rwsem(&s->s_writers.rw_sem[i], > + sb_writers_name[i], > + &type->s_writers_key[i])) > goto fail; > - lockdep_init_map(&s->s_writers.lock_map[i], sb_writers_name[i], > - &type->s_writers_key[i], 0); > } > - init_waitqueue_head(&s->s_writers.wait); > init_waitqueue_head(&s->s_writers.wait_unfrozen); > s->s_bdi = &noop_backing_dev_info; > s->s_flags = flags; > @@ -1161,47 +1159,10 @@ out: > */ > void __sb_end_write(struct super_block *sb, int level) > { > - percpu_counter_dec(&sb->s_writers.counter[level-1]); > - /* > - * Make sure s_writers are updated before we wake up waiters in > - * freeze_super(). > - */ > - smp_mb(); > - if (waitqueue_active(&sb->s_writers.wait)) > - wake_up(&sb->s_writers.wait); > - rwsem_release(&sb->s_writers.lock_map[level-1], 1, _RET_IP_); > + percpu_up_read(sb->s_writers.rw_sem + level-1); > } > EXPORT_SYMBOL(__sb_end_write); > > -static int do_sb_start_write(struct super_block *sb, int level, bool wait, > - unsigned long ip) > -{ > - if (wait) > - rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 0, ip); > -retry: > - if (unlikely(sb->s_writers.frozen >= level)) { > - if (!wait) > - return 0; > - wait_event(sb->s_writers.wait_unfrozen, > - sb->s_writers.frozen < level); > - } > - > - percpu_counter_inc(&sb->s_writers.counter[level-1]); > - /* > - * Make sure counter is updated before we check for frozen. > - * freeze_super() first sets frozen and then checks the counter. > - */ > - smp_mb(); > - if (unlikely(sb->s_writers.frozen >= level)) { > - __sb_end_write(sb, level); > - goto retry; > - } > - > - if (!wait) > - rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 1, ip); > - return 1; > -} > - > /* > * This is an internal function, please use sb_start_{write,pagefault,intwrite} > * instead. > @@ -1209,7 +1170,7 @@ retry: > int __sb_start_write(struct super_block *sb, int level, bool wait) > { > bool force_trylock = false; > - int ret; > + int ret = 1; > > #ifdef CONFIG_LOCKDEP > /* > @@ -1225,13 +1186,17 @@ int __sb_start_write(struct super_block *sb, int level, bool wait) > int i; > > for (i = 0; i < level - 1; i++) > - if (lock_is_held(&sb->s_writers.lock_map[i])) { > + if (percpu_rwsem_is_held(sb->s_writers.rw_sem + i)) { > force_trylock = true; > break; > } > } > #endif > - ret = do_sb_start_write(sb, level, wait && !force_trylock, _RET_IP_); > + if (wait && !force_trylock) > + percpu_down_read(sb->s_writers.rw_sem + level-1); > + else > + ret = percpu_down_read_trylock(sb->s_writers.rw_sem + level-1); > + > WARN_ON(force_trylock & !ret); > return ret; > } > @@ -1249,25 +1214,7 @@ EXPORT_SYMBOL(__sb_start_write); > */ > static void sb_wait_write(struct super_block *sb, int level) > { > - s64 writers; > - > - rwsem_acquire(&sb->s_writers.lock_map[level-1], 0, 0, _THIS_IP_); > - > - do { > - DEFINE_WAIT(wait); > - /* > - * We use a barrier in prepare_to_wait() to separate setting > - * of frozen and checking of the counter > - */ > - prepare_to_wait(&sb->s_writers.wait, &wait, > - TASK_UNINTERRUPTIBLE); > - > - writers = percpu_counter_sum(&sb->s_writers.counter[level-1]); > - if (writers) > - schedule(); > - > - finish_wait(&sb->s_writers.wait, &wait); > - } while (writers); > + percpu_down_write(sb->s_writers.rw_sem + level-1); > } > > static void sb_freeze_release(struct super_block *sb) > @@ -1275,7 +1222,7 @@ static void sb_freeze_release(struct super_block *sb) > int level; > /* Avoid the warning from lockdep_sys_exit() */ > for (level = 0; level < SB_FREEZE_LEVELS; ++level) > - rwsem_release(sb->s_writers.lock_map + level, 1, _THIS_IP_); > + percpu_rwsem_release(sb->s_writers.rw_sem + level, 0, _THIS_IP_); > } > > static void sb_freeze_acquire(struct super_block *sb) > @@ -1283,7 +1230,15 @@ static void sb_freeze_acquire(struct super_block *sb) > int level; > > for (level = 0; level < SB_FREEZE_LEVELS; ++level) > - rwsem_acquire(sb->s_writers.lock_map + level, 0, 1, _THIS_IP_); > + percpu_rwsem_acquire(sb->s_writers.rw_sem + level, 0, _THIS_IP_); > +} > + > +static void sb_freeze_unlock(struct super_block *sb) > +{ > + int level; > + > + for (level = 0; level < SB_FREEZE_LEVELS; ++level) > + percpu_up_write(sb->s_writers.rw_sem + level); > } > > /** > @@ -1342,20 +1297,14 @@ int freeze_super(struct super_block *sb) > return 0; > } > > - /* From now on, no new normal writers can start */ > sb->s_writers.frozen = SB_FREEZE_WRITE; > - smp_wmb(); > - > /* Release s_umount to preserve sb_start_write -> s_umount ordering */ > up_write(&sb->s_umount); > - > sb_wait_write(sb, SB_FREEZE_WRITE); > + down_write(&sb->s_umount); > > /* Now we go and block page faults... */ > - down_write(&sb->s_umount); > sb->s_writers.frozen = SB_FREEZE_PAGEFAULT; > - smp_wmb(); > - > sb_wait_write(sb, SB_FREEZE_PAGEFAULT); > > /* All writers are done so after syncing there won't be dirty data */ > @@ -1363,7 +1312,6 @@ int freeze_super(struct super_block *sb) > > /* Now wait for internal filesystem counter */ > sb->s_writers.frozen = SB_FREEZE_FS; > - smp_wmb(); > sb_wait_write(sb, SB_FREEZE_FS); > > if (sb->s_op->freeze_fs) { > @@ -1372,9 +1320,8 @@ int freeze_super(struct super_block *sb) > printk(KERN_ERR > "VFS:Filesystem freeze failed\n"); > sb->s_writers.frozen = SB_UNFROZEN; > - smp_wmb(); > + sb_freeze_unlock(sb); > wake_up(&sb->s_writers.wait_unfrozen); > - sb_freeze_release(sb); > deactivate_locked_super(sb); > return ret; > } > @@ -1406,8 +1353,10 @@ int thaw_super(struct super_block *sb) > return -EINVAL; > } > > - if (sb->s_flags & MS_RDONLY) > + if (sb->s_flags & MS_RDONLY) { > + sb->s_writers.frozen = SB_UNFROZEN; > goto out; > + } > > sb_freeze_acquire(sb); > > @@ -1422,13 +1371,11 @@ int thaw_super(struct super_block *sb) > } > } > > - sb_freeze_release(sb); > -out: > sb->s_writers.frozen = SB_UNFROZEN; > - smp_wmb(); > + sb_freeze_unlock(sb); > +out: > wake_up(&sb->s_writers.wait_unfrozen); > deactivate_locked_super(sb); > - > return 0; > } > EXPORT_SYMBOL(thaw_super); > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 6addccc..fb2cb4a 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -1,7 +1,6 @@ > #ifndef _LINUX_FS_H > #define _LINUX_FS_H > > - > #include <linux/linkage.h> > #include <linux/wait.h> > #include <linux/kdev_t.h> > @@ -31,6 +30,7 @@ > #include <linux/percpu-rwsem.h> > #include <linux/blk_types.h> > #include <linux/workqueue.h> > +#include <linux/percpu-rwsem.h> > > #include <asm/byteorder.h> > #include <uapi/linux/fs.h> > @@ -1247,16 +1247,9 @@ enum { > #define SB_FREEZE_LEVELS (SB_FREEZE_COMPLETE - 1) > > struct sb_writers { > - /* Counters for counting writers at each level */ > - struct percpu_counter counter[SB_FREEZE_LEVELS]; > - wait_queue_head_t wait; /* queue for waiting for > - writers / faults to finish */ > - int frozen; /* Is sb frozen? */ > - wait_queue_head_t wait_unfrozen; /* queue for waiting for > - sb to be thawed */ > -#ifdef CONFIG_DEBUG_LOCK_ALLOC > - struct lockdep_map lock_map[SB_FREEZE_LEVELS]; > -#endif > + int frozen; /* Is sb frozen? */ > + wait_queue_head_t wait_unfrozen; /* for get_super_thawed() */ > + struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS]; > }; > > struct super_block { > @@ -1364,9 +1357,9 @@ void __sb_end_write(struct super_block *sb, int level); > int __sb_start_write(struct super_block *sb, int level, bool wait); > > #define __sb_acquire_write(sb, lev) \ > - rwsem_acquire_read(&(sb)->s_writers.lock_map[(lev)-1], 0, 1, _THIS_IP_) > + percpu_rwsem_acquire(&(sb)->s_writers.rw_sem[(lev)-1], 1, _THIS_IP_) > #define __sb_release_write(sb, lev) \ > - rwsem_release(&(sb)->s_writers.lock_map[(lev)-1], 1, _THIS_IP_) > + percpu_rwsem_release(&(sb)->s_writers.rw_sem[(lev)-1], 1, _THIS_IP_) > > /** > * sb_end_write - drop write access to a superblock > -- > 1.5.5.1 > -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 4/4] change sb_writers to use percpu_rw_semaphore 2015-07-28 8:34 ` Jan Kara @ 2015-08-03 17:30 ` Oleg Nesterov 0 siblings, 0 replies; 32+ messages in thread From: Oleg Nesterov @ 2015-08-03 17:30 UTC (permalink / raw) To: Jan Kara Cc: Al Viro, Dave Chinner, Dave Hansen, Paul E. McKenney, Peter Zijlstra, linux-fsdevel, linux-kernel Hi Jan, Thanks for your review and sorry for delay, I was on vacation. On 07/28, Jan Kara wrote: > > On Wed 22-07-15 23:15:41, Oleg Nesterov wrote: > > > > Perhaps we should also cleanup the usage of ->frozen. It would be > > better to set/clear (say) SB_FREEZE_WRITE with the corresponding > > write-lock held. Currently freeze_super() has to set SB_FREEZE_WRITE > > before sb_wait_write(SB_FREEZE_WRITE) to avoid the race with itself, > > we can add another state. The "From now on, no new normal writers > > can start" removed by this patch was not really correct. > > The patch looks good, just one question: Why wasn't the above comment > really correct? It is not that I think it was wrong, just not 100% accurate even before this change. "w_writers.frozen = SB_FREEZE_WRITE" itself can't guarantee that "no new normal writers can start". We do not know when other CPU's will see the result of this STORE. > Do you mean it wouldn't be correct after your changes? I > agree with that. Yes, yes, this was the actual reason to remove this comment. Sorry for confusion. > Also when you'd like to "cleanup the usage of ->frozen", you have to be > careful no only about races with freeze_super() itself but also about races > with remount (that's one of the reasons why we use s_umount for protecting > modifications of ->frozen). So I'm not sure how much we can actually > improve on code readability... Yes, me too. Probably I should simply remove this (confusing) part of the changelog. > Anyway, you can add: > > Reviewed-by: Jan Kara <jack@suse.com> Thanks! OK. Now I'll try to actually test this all. Hopefully this week. Oleg. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 4/4] change sb_writers to use percpu_rw_semaphore 2015-07-22 21:15 ` [PATCH 4/4] " Oleg Nesterov 2015-07-22 21:34 ` Oleg Nesterov 2015-07-28 8:34 ` Jan Kara @ 2015-08-07 19:54 ` Oleg Nesterov 2 siblings, 0 replies; 32+ messages in thread From: Oleg Nesterov @ 2015-08-07 19:54 UTC (permalink / raw) To: Al Viro, Dave Chinner, Dave Hansen, Jan Kara Cc: Paul E. McKenney, Peter Zijlstra, linux-fsdevel, linux-kernel On 07/22, Oleg Nesterov wrote: > > +static void sb_freeze_unlock(struct super_block *sb) > +{ > + int level; > + > + for (level = 0; level < SB_FREEZE_LEVELS; ++level) > + percpu_up_write(sb->s_writers.rw_sem + level); > } OK, this is not exactly right, see the fix below. Otherwise seems to work, but see another email I'll send in reply to 0/4. Oleg. --- a/fs/super.c +++ b/fs/super.c @@ -1237,7 +1237,7 @@ static void sb_freeze_unlock(struct super_block *sb) { int level; - for (level = 0; level < SB_FREEZE_LEVELS; ++level) + for (level = SB_FREEZE_LEVELS; --level >= 0; ) percpu_up_write(sb->s_writers.rw_sem + level); } ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2015-08-07 19:54 UTC | newest] Thread overview: 32+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-07-13 21:25 [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore Oleg Nesterov 2015-07-13 21:25 ` [PATCH 1/4] change get_super_thawed() to use sb_start/end_write() Oleg Nesterov 2015-07-14 10:49 ` Jan Kara 2015-07-14 13:38 ` Oleg Nesterov 2015-07-13 21:25 ` [PATCH 2/4] introduce sb_unlock_frozen() Oleg Nesterov 2015-07-13 21:25 ` [PATCH 3/4] introduce sb_lockdep_release() Oleg Nesterov 2015-07-13 21:25 ` [PATCH 4/4] change sb_writers to use percpu_rw_semaphore Oleg Nesterov 2015-07-13 22:23 ` [PATCH RFC 0/4] " Dave Chinner 2015-07-13 22:42 ` Oleg Nesterov 2015-07-13 23:14 ` Dave Chinner 2015-07-14 10:48 ` Jan Kara 2015-07-14 13:37 ` Oleg Nesterov 2015-07-14 21:17 ` Dave Hansen 2015-07-14 21:22 ` Oleg Nesterov 2015-07-14 21:41 ` Dave Hansen 2015-07-15 6:47 ` Jan Kara 2015-07-15 18:19 ` Oleg Nesterov 2015-07-16 7:26 ` Jan Kara 2015-07-16 7:30 ` Dave Hansen 2015-07-16 8:55 ` Jan Kara 2015-07-16 17:32 ` Oleg Nesterov 2015-07-17 1:27 ` Dave Chinner 2015-07-17 17:31 ` Oleg Nesterov 2015-07-17 22:40 ` Dave Chinner 2015-07-20 8:26 ` Jan Kara 2015-07-22 21:09 ` Oleg Nesterov 2015-07-20 16:23 ` Oleg Nesterov -- strict thread matches above, loose matches on Subject: below -- 2015-07-22 21:15 [PATCH " Oleg Nesterov 2015-07-22 21:15 ` [PATCH 4/4] " Oleg Nesterov 2015-07-22 21:34 ` Oleg Nesterov 2015-07-28 8:34 ` Jan Kara 2015-08-03 17:30 ` Oleg Nesterov 2015-08-07 19:54 ` Oleg Nesterov
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.