From: "jianchao.wang" <jianchao.w.wang@oracle.com>
To: Peter Zijlstra <peterz@infradead.org>,
Bart Van Assche <Bart.VanAssche@wdc.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"kernel-team@fb.com" <kernel-team@fb.com>,
"oleg@redhat.com" <oleg@redhat.com>, "hch@lst.de" <hch@lst.de>,
"axboe@kernel.dk" <axboe@kernel.dk>,
"osandov@fb.com" <osandov@fb.com>,
"tj@kernel.org" <tj@kernel.org>
Subject: Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme
Date: Fri, 15 Dec 2017 10:12:50 +0800
Message-ID: <007e5a56-83fb-23b0-64d9-4725f15c596d@oracle.com>
In-Reply-To: <20171214215404.GK3326@worktop>
On 12/15/2017 05:54 AM, Peter Zijlstra wrote:
> On Thu, Dec 14, 2017 at 09:42:48PM +0000, Bart Van Assche wrote:
>> On Thu, 2017-12-14 at 21:20 +0100, Peter Zijlstra wrote:
>>> On Thu, Dec 14, 2017 at 06:51:11PM +0000, Bart Van Assche wrote:
>>>> On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote:
>>>>> + write_seqcount_begin(&rq->gstate_seq);
>>>>> + blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
>>>>> + blk_add_timer(rq);
>>>>> + write_seqcount_end(&rq->gstate_seq);
>>>>
>>>> My understanding is that both write_seqcount_begin() and write_seqcount_end()
>>>> trigger a write memory barrier. Is a seqcount really faster than a spinlock?
>>>
>>> Yes, lots: no atomic operations and no waiting.
>>>
>>> The only constraint for write_seqcount_begin() is that there must not be
>>> any concurrency.
>>>
>>> But now that I look at this again, TJ, why can't the below happen?
>>>
>>>     write_seqcount_begin();
>>>     blk_mq_rq_update_state(rq, IN_FLIGHT);
>>>     blk_add_timer(rq);
>>>     <timer-irq>
>>>         read_seqcount_begin()
>>>             while (seq & 1)
>>>                 cpu_relax();
>>>         // live-lock
>>>     </timer-irq>
>>>     write_seqcount_end();
>>
>> Hello Peter,
>>
>> Some time ago the block layer was changed to handle timeouts in thread context
>> instead of interrupt context. See also commit 287922eb0b18 ("block: defer
>> timeouts to a workqueue").
>
> That only makes it a little better:
>
> Task-A                                  Worker
>
> write_seqcount_begin()
> blk_mq_rq_update_state(rq, IN_FLIGHT)
> blk_add_timer(rq)
>   <timer>
>     schedule_work()
>   </timer>
>   <context-switch to worker>
>                                         read_seqcount_begin()
>                                           while (seq & 1)
>                                             cpu_relax();
>
Hi Peter,

The current seqcount read side is as follows:
    do {
        start = read_seqcount_begin(&rq->gstate_seq);
        gstate = READ_ONCE(rq->gstate);
        deadline = rq->deadline;
    } while (read_seqcount_retry(&rq->gstate_seq, start));
read_seqcount_retry() doesn't check bit 0; it checks whether the value saved
by read_seqcount_begin() is equal to the current value of the seqcount.
Please refer to:
    static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
    {
        return unlikely(s->sequence != start);
    }
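To make the two halves of the read side concrete, here is a minimal user-space sketch in C11 atomics, assuming a simplified model: `model_seqcount`, `model_rq` and the `model_*` functions are hypothetical names, not the kernel's seqlock API, and every atomic uses the default sequentially consistent ordering instead of the kernel's smp_wmb()/smp_rmb() barriers. As in the kernel's `__read_seqcount_begin()`, the begin step waits while the count is odd; the retry step, like `__read_seqcount_retry()` above, only compares against the value saved at begin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of a seqcount (not the kernel API).
 * Default seq_cst atomics keep the sketch simple; the kernel uses
 * cheaper smp_wmb()/smp_rmb() barriers instead.
 */
typedef struct {
    atomic_uint sequence;   /* odd while a write section is open */
} model_seqcount;

/* Hypothetical stand-in for the request fields read in the loop above. */
typedef struct {
    model_seqcount gstate_seq;
    atomic_ulong gstate;
    atomic_ulong deadline;
} model_rq;

/* Writer side: writers must be serialized by the caller. */
static void model_write_begin(model_seqcount *s)
{
    unsigned old = atomic_fetch_add(&s->sequence, 1);
    assert(!(old & 1));     /* no other write section may be open */
}

static void model_write_end(model_seqcount *s)
{
    unsigned old = atomic_fetch_add(&s->sequence, 1);
    assert(old & 1);        /* must close a section opened by begin */
}

/* Like the kernel's __read_seqcount_begin(): wait until the count is even. */
static unsigned model_read_begin(model_seqcount *s)
{
    unsigned ret;

    while ((ret = atomic_load(&s->sequence)) & 1)
        ;                   /* the kernel calls cpu_relax() here */
    return ret;
}

/* Like __read_seqcount_retry(): only compares with the saved value. */
static bool model_read_retry(model_seqcount *s, unsigned start)
{
    return atomic_load(&s->sequence) != start;
}

/* The read loop quoted above, applied to the model request. */
static void model_read_state(model_rq *rq, unsigned long *gstate,
                             unsigned long *deadline)
{
    unsigned start;

    do {
        start = model_read_begin(&rq->gstate_seq);
        *gstate = atomic_load(&rq->gstate);
        *deadline = atomic_load(&rq->deadline);
    } while (model_read_retry(&rq->gstate_seq, start));
}
```

A caller runs the writer's begin/update/end sequence and then model_read_state(). In a single thread, calling model_read_begin() while a write section is still open never returns, because nothing is left to make the count even again; the retry check by itself never spins.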
Thanks
Jianchao
>
> Now normally this isn't fatal because Worker will simply spin its entire
> time slice away and we'll eventually schedule our Task-A back in, which
> will complete the seqcount and things will work.
>
> But if, for some reason, our Worker was to have RT priority higher than
> our Task-A we'd be up some creek without no paddles.
>
> We don't happen to have preemption or IRQs off here? That would fix
> things nicely.
>