From: Andrea Parri <parri.andrea@gmail.com>
To: Nikolay Borisov <nborisov@suse.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
mathieu.desnoyers@efficios.com,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: Reasoning about memory ordering
Date: Fri, 23 Feb 2018 18:31:58 +0100 [thread overview]
Message-ID: <20180223173158.GA3723@andrea> (raw)
In-Reply-To: <0db16ef6-c805-b1f6-527f-8fec149e3df5@suse.com>
On Fri, Feb 23, 2018 at 02:30:22PM +0200, Nikolay Borisov wrote:
> Hello,
>
> I'm cc'ing a bunch of people I know are well-versed in
> the black arts of memory ordering!
>
> Currently in btrfs we have roughly the following sequence:
>
> T1: T2:
> i_size_write(inode, newsize);
> set_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags); atomic_inc(&inode->i_dio_count);
> smp_mb(); if (iov_iter_rw(iter) == READ) {
> if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK, &BTRFS_I(inode)->runtime_flags)) {
> if (atomic_read(&inode->i_dio_count)) { if (atomic_dec_and_test(&inode->i_dio_count))
> wait_queue_head_t *wq = bit_waitqueue(&inode->i_state, __I_DIO_WAKEUP); wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
> DEFINE_WAIT_BIT(q, &inode->i_state, __I_DIO_WAKEUP); }
> if (offset >= i_size_read(inode))
> do { return;
> prepare_to_wait(wq, &q.wq_entry, TASK_UNINTERRUPTIBLE); }
> if (atomic_read(&inode->i_dio_count))
> schedule();
> } while (atomic_read(&inode->i_dio_count));
> finish_wait(wq, &q.wq_entry);
> }
>
> smp_mb__before_atomic();
> clear_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags);
>
> The semantics I'm after are:
>
> 1. If T1 goes to sleep, then T2 would see the
> BTRFS_INODE_READDIO_NEED_LOCK and hence will execute the
> atomic_dec_and_test and possibly wake up T1. This flag serves as a way
> to indicate to possibly multiple T2 (dio readers) that T1 is blocked
> and they should unblock it and resort to acquiring some locks (this is not
> visible in this excerpt of code for brevity). It's sort of a back-off
> mechanism.
I don't see how this could be guaranteed, even in a sequentially consistent
world (disclaimer: I'm certainly not familiar with btrfs): what is wrong in
T1 T2
atomic_inc(i_dio_count)
test_bit(NEED_LOCK, flags) // unset
set_bit(NEED_LOCK, flags)
atomic_read(i_dio_count) // >1
--> go to sleep
Thanks,
Andrea
>
> 2. BTRFS_INODE_READDIO_NEED_LOCK bit must be set _before_ going to sleep
>
> 3. BTRFS_INODE_READDIO_NEED_LOCK must be cleared _after_ the thread has
> been woken up.
>
> 4. After T1 is woken up, it's possible that a new T2 comes and doesn't see
> the BTRFS_INODE_READDIO_NEED_LOCK flag set but this is fine, since the check
> for i_size should cause T2 to just return (it will also execute atomic_dec_and_test)
>
> Given this is the current state of the code (it's part of btrfs) I believe
> the following could/should be done:
>
> 1. The smp_mb after the set_bit in T1 could be removed, since there is
> already an implied full mm in prepare_to_wait. That is if we go to sleep,
> then T2 is guaranteed to see the flag/i_size_write happening by merit of
> the implied memory barrier in prepare_to_wait/schedule. But what if it doesn't
> go to sleep? I still would like the i_size_write to be visible to T2
>
> 2. The bit clearing code in T1 should be possible to be replaced by
> clear_bit_unlock (this was suggested by PeterZ on IRC).
>
> 3. I suspect there is a memory barrier in T2 that is missing. Perhaps
> there should be an smp_mb__before_atomic right before the test_bit so that
> it's ordered with the implied smp_mb in T1's prepare_to_wait.
next prev parent reply other threads:[~2018-02-23 17:32 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-23 12:30 Reasoning about memory ordering Nikolay Borisov
2018-02-23 15:38 ` Alan Cox
2018-02-23 15:59 ` Nikolay Borisov
2018-02-23 17:31 ` Andrea Parri [this message]
2018-02-23 17:45 ` Nikolay Borisov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180223173158.GA3723@andrea \
--to=parri.andrea@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=nborisov@suse.com \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox