From: Andrea Parri <parri.andrea@gmail.com>
To: Nikolay Borisov <nborisov@suse.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
    "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
    mathieu.desnoyers@efficios.com,
    Peter Zijlstra <peterz@infradead.org>
Subject: Re: Reasoning about memory ordering
Date: Fri, 23 Feb 2018 18:31:58 +0100
Message-ID: <20180223173158.GA3723@andrea>
In-Reply-To: <0db16ef6-c805-b1f6-527f-8fec149e3df5@suse.com>
On Fri, Feb 23, 2018 at 02:30:22PM +0200, Nikolay Borisov wrote:
> Hello,
>
> I'm cc'ing a bunch of people I know are well-versed in
> the black arts of memory ordering!
>
> Currently in btrfs we have roughly the following sequence:
>
> T1:
>
>   i_size_write(inode, newsize);
>   set_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags);
>   smp_mb();
>
>   if (atomic_read(&inode->i_dio_count)) {
>           wait_queue_head_t *wq = bit_waitqueue(&inode->i_state, __I_DIO_WAKEUP);
>           DEFINE_WAIT_BIT(q, &inode->i_state, __I_DIO_WAKEUP);
>
>           do {
>                   prepare_to_wait(wq, &q.wq_entry, TASK_UNINTERRUPTIBLE);
>                   if (atomic_read(&inode->i_dio_count))
>                           schedule();
>           } while (atomic_read(&inode->i_dio_count));
>           finish_wait(wq, &q.wq_entry);
>   }
>
>   smp_mb__before_atomic();
>   clear_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags);
>
> T2:
>
>   atomic_inc(&inode->i_dio_count);
>   if (iov_iter_rw(iter) == READ) {
>           if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK, &BTRFS_I(inode)->runtime_flags)) {
>                   if (atomic_dec_and_test(&inode->i_dio_count))
>                           wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
>           }
>           if (offset >= i_size_read(inode))
>                   return;
>   }
>
> The semantics I'm after are:
>
> 1. If T1 goes to sleep, then T2 would see the
> BTRFS_INODE_READDIO_NEED_LOCK and hence will execute the
> atomic_dec_and_test and possibly wake up T1. This flag serves as a way
> to indicate to possibly multiple T2 (dio readers) that T1 is blocked
> and they should unblock it and resort to acquiring some locks (this is not
> visible in this excerpt of code for brevity). It's sort of a back-off
> mechanism.
I don't see how this could be guaranteed, even in a sequentially consistent
world (disclaimer: I'm certainly not familiar with btrfs). What is wrong with
the following interleaving?

    T1                              T2

                                    atomic_inc(i_dio_count)
                                    test_bit(NEED_LOCK, flags)  // unset
    set_bit(NEED_LOCK, flags)
    atomic_read(i_dio_count)  // >1
    --> go to sleep
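
The interleaving above can be replayed step by step in userspace. What
follows is only a minimal sketch using C11 atomics, not kernel code: the
names i_dio_count, need_lock, struct outcome and replay_interleaving() are
hypothetical stand-ins for the inode fields and bit operations in the quoted
excerpt, and everything runs on a single thread in the exact order shown, so
every access is trivially sequentially consistent.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the fields used in the quoted btrfs code. */
static atomic_int  i_dio_count;  /* models inode->i_dio_count */
static atomic_bool need_lock;    /* models the READDIO_NEED_LOCK bit */

struct outcome {
	bool t2_saw_flag;     /* what T2's test_bit() returned */
	bool t1_would_sleep;  /* T1 read a nonzero i_dio_count */
};

/*
 * Replay the four steps of the interleaving in program order.
 * All operations use the default memory_order_seq_cst.
 */
static struct outcome replay_interleaving(void)
{
	struct outcome o;

	atomic_store(&i_dio_count, 0);
	atomic_store(&need_lock, false);

	/* T2: atomic_inc(i_dio_count) */
	atomic_fetch_add(&i_dio_count, 1);

	/* T2: test_bit(NEED_LOCK, flags), which is still unset here */
	o.t2_saw_flag = atomic_load(&need_lock);

	/* T1: set_bit(NEED_LOCK, flags) */
	atomic_store(&need_lock, true);

	/* T1: atomic_read(i_dio_count) is nonzero, so T1 goes to sleep */
	o.t1_would_sleep = atomic_load(&i_dio_count) != 0;

	return o;
}
```

Calling replay_interleaving() yields t2_saw_flag == false together with
t1_would_sleep == true: T1 sleeps although this T2 never observed the flag.
Since the sketch uses only sequentially consistent operations, adding
barriers alone cannot exclude this outcome.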
Thanks,
Andrea
>
> 2. BTRFS_INODE_READDIO_NEED_LOCK bit must be set _before_ going to sleep
>
> 3. BTRFS_INODE_READDIO_NEED_LOCK must be cleared _after_ the thread has
> been woken up.
>
> 4. After T1 is woken up, it's possible that a new T2 comes and doesn't see
> the BTRFS_INODE_READDIO_NEED_LOCK flag set but this is fine, since the check
> for i_size should cause T2 to just return (it will also execute atomic_dec_and_test)
>
> Given this is the current state of the code (it's part of btrfs) I believe
> the following could/should be done:
>
> 1. The smp_mb after the set_bit in T1 could be removed, since there is
> already an implied full mb in prepare_to_wait. That is, if we go to sleep,
> then T2 is guaranteed to see the flag/i_size_write happening by virtue of
> the implied memory barrier in prepare_to_wait/schedule. But what if it doesn't
> go to sleep? I still would like the i_size_write to be visible to T2.
>
> 2. It should be possible to replace the bit clearing code in T1 with
> clear_bit_unlock (this was suggested by PeterZ on IRC).
>
> 3. I suspect there is a memory barrier in T2 that is missing. Perhaps
> there should be an smp_mb__before_atomic right before the test_bit so that
> it's ordered with the implied smp_mb in T1's prepare_to_wait.