All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Jan Kara <jack@suse.cz>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Daniel Wagner <daniel.wagner@bmw-carit.de>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Ingo Molnar <mingo@redhat.com>, Tejun Heo <tj@kernel.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore
Date: Thu, 16 Jul 2015 19:32:56 +0200	[thread overview]
Message-ID: <20150716173256.GA17753@redhat.com> (raw)
In-Reply-To: <20150716072654.GE22847@quack.suse.cz>

On 07/16, Jan Kara wrote:
>
> On Wed 15-07-15 20:19:20, Oleg Nesterov wrote:
> >
> > Perhaps it makes to merge other 2 patches from Dave first? (those which
> > change __sb_start/end_write to rely on RCU). Afaics these changes are
> > straightforward and correct. Although I'd suggest to use preempt_disable()
> > and synchronize_sched() instead. I will be happy to (try to) make this
> > conversion on top of his changes.
> >
> > Because I do not want to delay the performance improvements and I do not
> > know when exactly I'll send the next version: I need to finish the previous
> > discussion about rcu_sync first. And the necessary changes in fs/super.c
> > depend on whether percpu_rw_semaphore will have rcu_sync or not (not too
> > much, only destroy_super() depends, but still).
> >
> > And of course, I am worried that I missed something and percpu_rw_semaphore
> > can't work for some reason. The code in fs/super.c looks simple, but it
> > seems that filesystems do the "strange" things with lockdep at least.
>
> So Dave's patches would go in only in the next merge window anyway so we
> still have like two-three weeks to decide which patchset to take.

OK, good.

> If you
> think it will take you longer,

Hopefully not.

> then merging Dave's patches makes some sense
> although I personally don't think the issue is so important that we have to
> fix it ASAP and eventual delay of one more release would be OK for me.

OK. I will try to do this in any case, I just wanted to say that I can
equally do this on top of Dave's patches.

To remind, I need to finish the discussion about percpu_rw_semaphore
and rcu_sync, then I'll try to make v2.

And. The biggest problem is lockdep. Everything else looks really simple
although of course I could miss something. And not only because the
filesystems abuse lockdep and thus we need some cleanups first. Unless
I am totally confused fs/super.c needs some cleanups (and fixes) too,
with or without this conversion. Say, acquire_freeze_lock() logic does
do not look right:

	- wait_event(.frozen < level) without rwsem_acquire_read() is
	  just wrong from lockdep perspective. If we are going to deadlock
	  because the caller is buggy, lockdep can't warn us.

	- __sb_start_write() can race with thaw_super() + freeze_super(),
	  and after "goto retry" the 2nd  acquire_freeze_lock() is wrong.

	- The "tell lockdep we are doing trylock" hack doesn't look nice.

	  I think this is correct, but this logic should be more explicit.
	  Yes, the recursive read_lock() is fine if we hold the lock on
	  higher level. But we do not need to fool lockdep. If we can not
	  deadlock in this case (and I agree we can't) we should simply use
	  wait == F consistently.

Something like this (not even compiled tested):

	static int
	do_sb_start_write(struct super_block *sb, int level, bool wait, unsigned long ip)
	{

		if (wait)
			rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 0, ip);
	retry:
		if (unlikely(sb->s_writers.frozen >= level)) {
			if (!wait)
				return 0;
			wait_event(sb->s_writers.wait_unfrozen,
				   sb->s_writers.frozen < level);
		}

		percpu_counter_inc(&sb->s_writers.counter[level-1]);
		/*
		 * Make sure counter is updated before we check for frozen.
		 * freeze_super() first sets frozen and then checks the counter.
		 */
		smp_mb();
		if (unlikely(sb->s_writers.frozen >= level)) {
			__sb_end_write(sb, level);
			goto retry;
		}

		if (!wait)
			rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 1, ip);
		return 1;
	}

	/*
	 * This is an internal function, please use sb_start_{write,pagefault,intwrite}
	 * instead.
	 */
	int __sb_start_write(struct super_block *sb, int level, bool wait)
	{
		bool cantbelocked = false;
		int ret;

	#ifdef CONFIG_LOCKDEP
		/*
		 * We want lockdep to tell us about possible deadlocks with freezing but
		 * it's it bit tricky to properly instrument it. Getting a freeze protection
		 * works as getting a read lock but there are subtle problems. XFS for example
		 * gets freeze protection on internal level twice in some cases, which is OK
		 * only because we already hold a freeze protection also on higher level. Due
		 * to these cases we have to use wait == F (trylock mode) which must not fail.
		 */
		if (wait) {
			int i;

			for (i = 0; i < level - 1; i++)
				if (lock_is_held(&sb->s_writers.lock_map[i])) {
					cantbelocked = true;
					break;
				}
		}
	#endif
		ret = do_sb_start_write(sb, level, wait && !cantbelocked, _RET_IP_);
		WARN_ON(cantbelocked & !ret);
		return ret;
	}

This should not generate the additional code if CONFIG_LOCKDEP=n and
After this patch it will be trivial to convert __sb_start_write(), but
we need some more cleanups. And perhaps I'll send some changes (like
above) separately, because again, I think they make sense in any case.

In short: I'll try to make v2 asap, but this is all I can promise ;)

Oleg.


  parent reply	other threads:[~2015-07-16 17:36 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-13 21:25 [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore Oleg Nesterov
2015-07-13 21:25 ` [PATCH 1/4] change get_super_thawed() to use sb_start/end_write() Oleg Nesterov
2015-07-14 10:49   ` Jan Kara
2015-07-14 13:38     ` Oleg Nesterov
2015-07-13 21:25 ` [PATCH 2/4] introduce sb_unlock_frozen() Oleg Nesterov
2015-07-13 21:25 ` [PATCH 3/4] introduce sb_lockdep_release() Oleg Nesterov
2015-07-13 21:25 ` [PATCH 4/4] change sb_writers to use percpu_rw_semaphore Oleg Nesterov
2015-07-13 22:23 ` [PATCH RFC 0/4] " Dave Chinner
2015-07-13 22:42   ` Oleg Nesterov
2015-07-13 23:14     ` Dave Chinner
2015-07-14 10:48 ` Jan Kara
2015-07-14 13:37   ` Oleg Nesterov
2015-07-14 21:17     ` Dave Hansen
2015-07-14 21:22       ` Oleg Nesterov
2015-07-14 21:41         ` Dave Hansen
2015-07-15  6:47           ` Jan Kara
2015-07-15 18:19             ` Oleg Nesterov
2015-07-16  7:26               ` Jan Kara
2015-07-16  7:30                 ` Dave Hansen
2015-07-16  8:55                   ` Jan Kara
2015-07-16 17:32                 ` Oleg Nesterov [this message]
2015-07-17  1:27                   ` Dave Chinner
2015-07-17 17:31                     ` Oleg Nesterov
2015-07-17 22:40                       ` Dave Chinner
2015-07-20  8:26                         ` Jan Kara
2015-07-22 21:09                           ` Oleg Nesterov
2015-07-20 16:23                         ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150716173256.GA17753@redhat.com \
    --to=oleg@redhat.com \
    --cc=daniel.wagner@bmw-carit.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=dave@stgolabs.net \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.