* [PATCH] blk-wbt: fix indefinite background writeback sleep
@ 2018-06-22 19:26 Jens Axboe
2018-06-22 22:43 ` Ming Lei
0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2018-06-22 19:26 UTC (permalink / raw)
To: linux-block@vger.kernel.org
blk-wbt adds waiters to the tail of the waitqueue, and factors in the
task placement in its decision making on whether or not the current task
can proceed. This can cause issues for the lowest class of writers,
since they can get woken up, denied access, and then put back to sleep
at the end of the waitqueue.
Fix this so that we only utilize the tail add for the initial sleep, and
we don't factor in the wait queue placement after we've slept (and are
now using the head addition).
Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 4f89b28fa652..7beeabd05f4a 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -550,7 +550,7 @@ static inline bool may_queue(struct rq_wb *rwb, struct rq_wait *rqw,
* If the waitqueue is already active and we are not the next
* in line to be woken up, wait for our turn.
*/
- if (waitqueue_active(&rqw->wait) &&
+ if (wait && waitqueue_active(&rqw->wait) &&
rqw->wait.head.next != &wait->entry)
return false;
@@ -567,16 +567,27 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
__acquires(lock)
{
struct rq_wait *rqw = get_rq_wait(rwb, wb_acct);
+ struct wait_queue_entry *waitptr = NULL;
DEFINE_WAIT(wait);
- if (may_queue(rwb, rqw, &wait, rw))
+ if (may_queue(rwb, rqw, waitptr, rw))
return;
+ waitptr = &wait;
do {
- prepare_to_wait_exclusive(&rqw->wait, &wait,
+ /*
+ * Don't add ourselves to the wq tail if we've already
+ * slept. Otherwise we can penalize background writes
+ * indefinitely.
+ */
+ if (waitptr)
+ prepare_to_wait_exclusive(&rqw->wait, &wait,
+ TASK_UNINTERRUPTIBLE);
+ else
+ prepare_to_wait(&rqw->wait, &wait,
TASK_UNINTERRUPTIBLE);
- if (may_queue(rwb, rqw, &wait, rw))
+ if (may_queue(rwb, rqw, waitptr, rw))
break;
if (lock) {
@@ -585,6 +596,12 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
spin_lock_irq(lock);
} else
io_schedule();
+
+ /*
+ * After we've slept, we don't want to factor in wq head
+ * placement anymore for may_queue().
+ */
+ waitptr = NULL;
} while (1);
finish_wait(&rqw->wait, &wait);
--
Jens Axboe
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] blk-wbt: fix indefinite background writeback sleep
2018-06-22 19:26 [PATCH] blk-wbt: fix indefinite background writeback sleep Jens Axboe
@ 2018-06-22 22:43 ` Ming Lei
2018-06-22 22:51 ` Jens Axboe
0 siblings, 1 reply; 7+ messages in thread
From: Ming Lei @ 2018-06-22 22:43 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block@vger.kernel.org
On Fri, Jun 22, 2018 at 01:26:10PM -0600, Jens Axboe wrote:
> blk-wbt adds waiters to the tail of the waitqueue, and factors in the
> task placement in its decision making on whether or not the current task
> can proceed. This can cause issues for the lowest class of writers,
> since they can get woken up, denied access, and then put back to sleep
> at the end of the waitqueue.
>
> Fix this so that we only utilize the tail add for the initial sleep, and
> we don't factor in the wait queue placement after we've slept (and are
> now using the head addition).
>
> Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> index 4f89b28fa652..7beeabd05f4a 100644
> --- a/block/blk-wbt.c
> +++ b/block/blk-wbt.c
> @@ -550,7 +550,7 @@ static inline bool may_queue(struct rq_wb *rwb, struct rq_wait *rqw,
> * If the waitqueue is already active and we are not the next
> * in line to be woken up, wait for our turn.
> */
> - if (waitqueue_active(&rqw->wait) &&
> + if (wait && waitqueue_active(&rqw->wait) &&
> rqw->wait.head.next != &wait->entry)
> return false;
>
> @@ -567,16 +567,27 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
> __acquires(lock)
> {
> struct rq_wait *rqw = get_rq_wait(rwb, wb_acct);
> + struct wait_queue_entry *waitptr = NULL;
> DEFINE_WAIT(wait);
>
> - if (may_queue(rwb, rqw, &wait, rw))
> + if (may_queue(rwb, rqw, waitptr, rw))
> return;
>
> + waitptr = &wait;
> do {
> - prepare_to_wait_exclusive(&rqw->wait, &wait,
> + /*
> + * Don't add ourselves to the wq tail if we've already
> + * slept. Otherwise we can penalize background writes
> + * indefinitely.
> + */
I saw this indefinite wbt_wait() in systemd-journal when running
aio-stress read test, but just once, not figured out one approach
to reproduce it yet, just wondering if you have quick test case for
reproducing and verifying this issue.
> + if (waitptr)
> + prepare_to_wait_exclusive(&rqw->wait, &wait,
> + TASK_UNINTERRUPTIBLE);
> + else
> + prepare_to_wait(&rqw->wait, &wait,
> TASK_UNINTERRUPTIBLE);
Could you explain a bit why the 'wait_entry' order matters wrt. this
issue? Since other 'wait_entry' still may come at the head meantime to
the same wq before checking in may_queue().
Can we remove 'rqw->wait.head.next != &wait->entry' from may_queue()?
I guess that should be one optimization, but seems quite dangerous since
'rqw->wait.head.next' may point to one freed stack variable.
Thanks,
Ming
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] blk-wbt: fix indefinite background writeback sleep
2018-06-22 22:43 ` Ming Lei
@ 2018-06-22 22:51 ` Jens Axboe
2018-06-22 23:11 ` Ming Lei
0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2018-06-22 22:51 UTC (permalink / raw)
To: Ming Lei; +Cc: linux-block@vger.kernel.org
On 6/22/18 4:43 PM, Ming Lei wrote:
> On Fri, Jun 22, 2018 at 01:26:10PM -0600, Jens Axboe wrote:
>> blk-wbt adds waiters to the tail of the waitqueue, and factors in the
>> task placement in its decision making on whether or not the current task
>> can proceed. This can cause issues for the lowest class of writers,
>> since they can get woken up, denied access, and then put back to sleep
>> at the end of the waitqueue.
>>
>> Fix this so that we only utilize the tail add for the initial sleep, and
>> we don't factor in the wait queue placement after we've slept (and are
>> now using the head addition).
>>
>> Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
>> index 4f89b28fa652..7beeabd05f4a 100644
>> --- a/block/blk-wbt.c
>> +++ b/block/blk-wbt.c
>> @@ -550,7 +550,7 @@ static inline bool may_queue(struct rq_wb *rwb, struct rq_wait *rqw,
>> * If the waitqueue is already active and we are not the next
>> * in line to be woken up, wait for our turn.
>> */
>> - if (waitqueue_active(&rqw->wait) &&
>> + if (wait && waitqueue_active(&rqw->wait) &&
>> rqw->wait.head.next != &wait->entry)
>> return false;
>>
>> @@ -567,16 +567,27 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
>> __acquires(lock)
>> {
>> struct rq_wait *rqw = get_rq_wait(rwb, wb_acct);
>> + struct wait_queue_entry *waitptr = NULL;
>> DEFINE_WAIT(wait);
>>
>> - if (may_queue(rwb, rqw, &wait, rw))
>> + if (may_queue(rwb, rqw, waitptr, rw))
>> return;
>>
>> + waitptr = &wait;
>> do {
>> - prepare_to_wait_exclusive(&rqw->wait, &wait,
>> + /*
>> + * Don't add ourselves to the wq tail if we've already
>> + * slept. Otherwise we can penalize background writes
>> + * indefinitely.
>> + */
>
> I saw this indefinite wbt_wait() in systemd-journal when running
> aio-stress read test, but just once, not figured out one approach
> to reproduce it yet, just wondering if you have quick test case for
> reproducing and verifying this issue.
I've seen it in production, but I'm currently relying on someone else
to reproduce it synthetically. I'm just providing the patches for
testing.
>> + if (waitptr)
>> + prepare_to_wait_exclusive(&rqw->wait, &wait,
>> + TASK_UNINTERRUPTIBLE);
>> + else
>> + prepare_to_wait(&rqw->wait, &wait,
>> TASK_UNINTERRUPTIBLE);
>
> Could you explain a bit why the 'wait_entry' order matters wrt. this
> issue? Since other 'wait_entry' still may come at the head meantime to
> the same wq before checking in may_queue().
Let's say we have 10 tasks queued up. Each one gets added to the tail,
so when it's our turn, we've now reached the head. We fail to get a
queue token, so we go back to sleep. At that point we should add
back to the head, not the tail, for fairness purposes.
> Can we remove 'rqw->wait.head.next != &wait->entry' from may_queue()?
> I guess that should be one optimization, but seems quite dangerous since
> 'rqw->wait.head.next' may point to one freed stack variable.
I actually changed it somewhat, since the initial check wasn't great
still. See below. It doesn't matter if the contents are stale, we aren't
going to dereference them anyway.
commit e68d8e22a8b9712c47ead489394f75e9df5a02d1
Author: Jens Axboe <axboe@kernel.dk>
Date: Fri Jun 22 13:44:22 2018 -0600
blk-wbt: fix indefinite background writeback sleep
blk-wbt adds waiters to the tail of the waitqueue, and factors in the
task placement in its decision making on whether or not the current task
can proceed. This can cause issues for the lowest class of writers,
since they can get woken up, denied access, and then put back to sleep
at the end of the waitqueue.
Fix this so that we only utilize the tail add for the initial sleep, and
we don't factor in the wait queue placement after we've slept (and are
now using the head addition).
Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 4f89b28fa652..41607c0bd849 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -550,8 +550,7 @@ static inline bool may_queue(struct rq_wb *rwb, struct rq_wait *rqw,
* If the waitqueue is already active and we are not the next
* in line to be woken up, wait for our turn.
*/
- if (waitqueue_active(&rqw->wait) &&
- rqw->wait.head.next != &wait->entry)
+ if (wait && rqw->wait.head.next != &wait->entry)
return false;
return atomic_inc_below(&rqw->inflight, get_limit(rwb, rw));
@@ -567,16 +566,27 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
__acquires(lock)
{
struct rq_wait *rqw = get_rq_wait(rwb, wb_acct);
+ struct wait_queue_entry *waitptr = NULL;
DEFINE_WAIT(wait);
- if (may_queue(rwb, rqw, &wait, rw))
+ if (!waitqueue_active(&rqw->wait) && may_queue(rwb, rqw, waitptr, rw))
return;
+ waitptr = &wait;
do {
- prepare_to_wait_exclusive(&rqw->wait, &wait,
+ /*
+ * Don't add ourselves to the wq tail if we've already
+ * slept. Otherwise we can penalize background writes
+ * indefinitely.
+ */
+ if (waitptr)
+ prepare_to_wait_exclusive(&rqw->wait, &wait,
+ TASK_UNINTERRUPTIBLE);
+ else
+ prepare_to_wait(&rqw->wait, &wait,
TASK_UNINTERRUPTIBLE);
- if (may_queue(rwb, rqw, &wait, rw))
+ if (may_queue(rwb, rqw, waitptr, rw))
break;
if (lock) {
@@ -585,6 +595,12 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
spin_lock_irq(lock);
} else
io_schedule();
+
+ /*
+ * After we've slept, we don't want to factor in wq head
+ * placement anymore for may_queue().
+ */
+ waitptr = NULL;
} while (1);
finish_wait(&rqw->wait, &wait);
--
Jens Axboe
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] blk-wbt: fix indefinite background writeback sleep
2018-06-22 22:51 ` Jens Axboe
@ 2018-06-22 23:11 ` Ming Lei
2018-06-22 23:13 ` Jens Axboe
0 siblings, 1 reply; 7+ messages in thread
From: Ming Lei @ 2018-06-22 23:11 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block@vger.kernel.org
On Fri, Jun 22, 2018 at 04:51:26PM -0600, Jens Axboe wrote:
> On 6/22/18 4:43 PM, Ming Lei wrote:
> > On Fri, Jun 22, 2018 at 01:26:10PM -0600, Jens Axboe wrote:
> >> blk-wbt adds waiters to the tail of the waitqueue, and factors in the
> >> task placement in its decision making on whether or not the current task
> >> can proceed. This can cause issues for the lowest class of writers,
> >> since they can get woken up, denied access, and then put back to sleep
> >> at the end of the waitqueue.
> >>
> >> Fix this so that we only utilize the tail add for the initial sleep, and
> >> we don't factor in the wait queue placement after we've slept (and are
> >> now using the head addition).
> >>
> >> Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
> >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >>
> >> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> >> index 4f89b28fa652..7beeabd05f4a 100644
> >> --- a/block/blk-wbt.c
> >> +++ b/block/blk-wbt.c
> >> @@ -550,7 +550,7 @@ static inline bool may_queue(struct rq_wb *rwb, struct rq_wait *rqw,
> >> * If the waitqueue is already active and we are not the next
> >> * in line to be woken up, wait for our turn.
> >> */
> >> - if (waitqueue_active(&rqw->wait) &&
> >> + if (wait && waitqueue_active(&rqw->wait) &&
> >> rqw->wait.head.next != &wait->entry)
> >> return false;
> >>
> >> @@ -567,16 +567,27 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
> >> __acquires(lock)
> >> {
> >> struct rq_wait *rqw = get_rq_wait(rwb, wb_acct);
> >> + struct wait_queue_entry *waitptr = NULL;
> >> DEFINE_WAIT(wait);
> >>
> >> - if (may_queue(rwb, rqw, &wait, rw))
> >> + if (may_queue(rwb, rqw, waitptr, rw))
> >> return;
> >>
> >> + waitptr = &wait;
> >> do {
> >> - prepare_to_wait_exclusive(&rqw->wait, &wait,
> >> + /*
> >> + * Don't add ourselves to the wq tail if we've already
> >> + * slept. Otherwise we can penalize background writes
> >> + * indefinitely.
> >> + */
> >
> > I saw this indefinite wbt_wait() in systemd-journal when running
> > aio-stress read test, but just once, not figured out one approach
> > to reproduce it yet, just wondering if you have quick test case for
> > reproducing and verifying this issue.
>
> I've seen it in production, but I'm currently relying on someone else
> to reproduce it synthetically. I'm just providing the patches for
> testing.
>
> >> + if (waitptr)
> >> + prepare_to_wait_exclusive(&rqw->wait, &wait,
> >> + TASK_UNINTERRUPTIBLE);
> >> + else
> >> + prepare_to_wait(&rqw->wait, &wait,
> >> TASK_UNINTERRUPTIBLE);
> >
> > Could you explain a bit why the 'wait_entry' order matters wrt. this
> > issue? Since other 'wait_entry' still may come at the head meantime to
> > the same wq before checking in may_queue().
>
> Let's say we have 10 tasks queued up. Each one gets added to the tail,
> so when it's our turn, we've now reached the head. We fail to get a
> queue token, so we go back to sleep. At that point we should add
> back to the head, not the tail, for fairness purposes.
OK, it is reasonable to do it for fairness purpose, but seems we still
don't know how the wait forever in wbt_wait() is fixed by this way.
I guess the reason is in the check of 'rqw->wait.head.next != &wait->entry',
which is basically removed when the waiter is waken up.
Thanks,
Ming
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] blk-wbt: fix indefinite background writeback sleep
2018-06-22 23:11 ` Ming Lei
@ 2018-06-22 23:13 ` Jens Axboe
2018-06-23 1:37 ` Ming Lei
0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2018-06-22 23:13 UTC (permalink / raw)
To: Ming Lei; +Cc: linux-block@vger.kernel.org
On 6/22/18 5:11 PM, Ming Lei wrote:
> On Fri, Jun 22, 2018 at 04:51:26PM -0600, Jens Axboe wrote:
>> On 6/22/18 4:43 PM, Ming Lei wrote:
>>> On Fri, Jun 22, 2018 at 01:26:10PM -0600, Jens Axboe wrote:
>>>> blk-wbt adds waiters to the tail of the waitqueue, and factors in the
>>>> task placement in its decision making on whether or not the current task
>>>> can proceed. This can cause issues for the lowest class of writers,
>>>> since they can get woken up, denied access, and then put back to sleep
>>>> at the end of the waitqueue.
>>>>
>>>> Fix this so that we only utilize the tail add for the initial sleep, and
>>>> we don't factor in the wait queue placement after we've slept (and are
>>>> now using the head addition).
>>>>
>>>> Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>>
>>>> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
>>>> index 4f89b28fa652..7beeabd05f4a 100644
>>>> --- a/block/blk-wbt.c
>>>> +++ b/block/blk-wbt.c
>>>> @@ -550,7 +550,7 @@ static inline bool may_queue(struct rq_wb *rwb, struct rq_wait *rqw,
>>>> * If the waitqueue is already active and we are not the next
>>>> * in line to be woken up, wait for our turn.
>>>> */
>>>> - if (waitqueue_active(&rqw->wait) &&
>>>> + if (wait && waitqueue_active(&rqw->wait) &&
>>>> rqw->wait.head.next != &wait->entry)
>>>> return false;
>>>>
>>>> @@ -567,16 +567,27 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
>>>> __acquires(lock)
>>>> {
>>>> struct rq_wait *rqw = get_rq_wait(rwb, wb_acct);
>>>> + struct wait_queue_entry *waitptr = NULL;
>>>> DEFINE_WAIT(wait);
>>>>
>>>> - if (may_queue(rwb, rqw, &wait, rw))
>>>> + if (may_queue(rwb, rqw, waitptr, rw))
>>>> return;
>>>>
>>>> + waitptr = &wait;
>>>> do {
>>>> - prepare_to_wait_exclusive(&rqw->wait, &wait,
>>>> + /*
>>>> + * Don't add ourselves to the wq tail if we've already
>>>> + * slept. Otherwise we can penalize background writes
>>>> + * indefinitely.
>>>> + */
>>>
>>> I saw this indefinite wbt_wait() in systemd-journal when running
>>> aio-stress read test, but just once, not figured out one approach
>>> to reproduce it yet, just wondering if you have quick test case for
>>> reproducing and verifying this issue.
>>
>> I've seen it in production, but I'm currently relying on someone else
>> to reproduce it synthetically. I'm just providing the patches for
>> testing.
>>
>>>> + if (waitptr)
>>>> + prepare_to_wait_exclusive(&rqw->wait, &wait,
>>>> + TASK_UNINTERRUPTIBLE);
>>>> + else
>>>> + prepare_to_wait(&rqw->wait, &wait,
>>>> TASK_UNINTERRUPTIBLE);
>>>
>>> Could you explain a bit why the 'wait_entry' order matters wrt. this
>>> issue? Since other 'wait_entry' still may come at the head meantime to
>>> the same wq before checking in may_queue().
>>
>> Let's say we have 10 tasks queued up. Each one gets added to the tail,
>> so when it's our turn, we've now reached the head. We fail to get a
>> queue token, so we go back to sleep. At that point we should add
>> back to the head, not the tail, for fairness purposes.
>
> OK, it is reasonable to do it for fairness purpose, but seems we still
> don't know how the wait forever in wbt_wait() is fixed by this way.
It's not, it's just removing a class of unfairness. You should not go
to the back of the queue if you fail, you should remain near the top.
> I guess the reason is in the check of 'rqw->wait.head.next != &wait->entry',
> which is basically removed when the waiter is waken up.
Plus at that point it's useless, since we are the head of queue. The
check only makes sense if we tail add.
--
Jens Axboe
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] blk-wbt: fix indefinite background writeback sleep
2018-06-22 23:13 ` Jens Axboe
@ 2018-06-23 1:37 ` Ming Lei
2018-06-23 16:20 ` Jens Axboe
0 siblings, 1 reply; 7+ messages in thread
From: Ming Lei @ 2018-06-23 1:37 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block@vger.kernel.org
On Fri, Jun 22, 2018 at 05:13:31PM -0600, Jens Axboe wrote:
> On 6/22/18 5:11 PM, Ming Lei wrote:
> > On Fri, Jun 22, 2018 at 04:51:26PM -0600, Jens Axboe wrote:
> >> On 6/22/18 4:43 PM, Ming Lei wrote:
> >>> On Fri, Jun 22, 2018 at 01:26:10PM -0600, Jens Axboe wrote:
> >>>> blk-wbt adds waiters to the tail of the waitqueue, and factors in the
> >>>> task placement in its decision making on whether or not the current task
> >>>> can proceed. This can cause issues for the lowest class of writers,
> >>>> since they can get woken up, denied access, and then put back to sleep
> >>>> at the end of the waitqueue.
> >>>>
> >>>> Fix this so that we only utilize the tail add for the initial sleep, and
> >>>> we don't factor in the wait queue placement after we've slept (and are
> >>>> now using the head addition).
> >>>>
> >>>> Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
> >>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >>>>
> >>>> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> >>>> index 4f89b28fa652..7beeabd05f4a 100644
> >>>> --- a/block/blk-wbt.c
> >>>> +++ b/block/blk-wbt.c
> >>>> @@ -550,7 +550,7 @@ static inline bool may_queue(struct rq_wb *rwb, struct rq_wait *rqw,
> >>>> * If the waitqueue is already active and we are not the next
> >>>> * in line to be woken up, wait for our turn.
> >>>> */
> >>>> - if (waitqueue_active(&rqw->wait) &&
> >>>> + if (wait && waitqueue_active(&rqw->wait) &&
> >>>> rqw->wait.head.next != &wait->entry)
> >>>> return false;
> >>>>
> >>>> @@ -567,16 +567,27 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
> >>>> __acquires(lock)
> >>>> {
> >>>> struct rq_wait *rqw = get_rq_wait(rwb, wb_acct);
> >>>> + struct wait_queue_entry *waitptr = NULL;
> >>>> DEFINE_WAIT(wait);
> >>>>
> >>>> - if (may_queue(rwb, rqw, &wait, rw))
> >>>> + if (may_queue(rwb, rqw, waitptr, rw))
> >>>> return;
> >>>>
> >>>> + waitptr = &wait;
> >>>> do {
> >>>> - prepare_to_wait_exclusive(&rqw->wait, &wait,
> >>>> + /*
> >>>> + * Don't add ourselves to the wq tail if we've already
> >>>> + * slept. Otherwise we can penalize background writes
> >>>> + * indefinitely.
> >>>> + */
> >>>
> >>> I saw this indefinite wbt_wait() in systemd-journal when running
> >>> aio-stress read test, but just once, not figured out one approach
> >>> to reproduce it yet, just wondering if you have quick test case for
> >>> reproducing and verifying this issue.
> >>
> >> I've seen it in production, but I'm currently relying on someone else
> >> to reproduce it synthetically. I'm just providing the patches for
> >> testing.
> >>
> >>>> + if (waitptr)
> >>>> + prepare_to_wait_exclusive(&rqw->wait, &wait,
> >>>> + TASK_UNINTERRUPTIBLE);
> >>>> + else
> >>>> + prepare_to_wait(&rqw->wait, &wait,
> >>>> TASK_UNINTERRUPTIBLE);
> >>>
> >>> Could you explain a bit why the 'wait_entry' order matters wrt. this
> >>> issue? Since other 'wait_entry' still may come at the head meantime to
> >>> the same wq before checking in may_queue().
> >>
> >> Let's say we have 10 tasks queued up. Each one gets added to the tail,
> >> so when it's our turn, we've now reached the head. We fail to get a
> >> queue token, so we go back to sleep. At that point we should add
> >> back to the head, not the tail, for fairness purposes.
> >
> > OK, it is reasonable to do it for fairness purpose, but seems we still
> > don't know how the wait forever in wbt_wait() is fixed by this way.
>
> It's not, it's just removing a class of unfairness. You should not go
> to the back of the queue if you fail, you should remain near the top.
>
> > I guess the reason is in the check of 'rqw->wait.head.next != &wait->entry',
> > which is basically removed when the waiter is waken up.
>
> Plus at that point it's useless, since we are the head of queue. The
> check only makes sense if we tail add.
Seems not safe to run the check in case of tail add too:
- just during or after checking 'rqw->wait.head.next != &wait->entry', all
inflight requests in this wq are done
- may_queue() still returns false because 'rqw->wait.head.next != &wait->entry'
returns true, then __wbt_wait() may wait forever.
Thanks,
Ming
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] blk-wbt: fix indefinite background writeback sleep
2018-06-23 1:37 ` Ming Lei
@ 2018-06-23 16:20 ` Jens Axboe
0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2018-06-23 16:20 UTC (permalink / raw)
To: Ming Lei; +Cc: linux-block@vger.kernel.org
On 6/22/18 7:37 PM, Ming Lei wrote:
> On Fri, Jun 22, 2018 at 05:13:31PM -0600, Jens Axboe wrote:
>> On 6/22/18 5:11 PM, Ming Lei wrote:
>>> On Fri, Jun 22, 2018 at 04:51:26PM -0600, Jens Axboe wrote:
>>>> On 6/22/18 4:43 PM, Ming Lei wrote:
>>>>> On Fri, Jun 22, 2018 at 01:26:10PM -0600, Jens Axboe wrote:
>>>>>> blk-wbt adds waiters to the tail of the waitqueue, and factors in the
>>>>>> task placement in its decision making on whether or not the current task
>>>>>> can proceed. This can cause issues for the lowest class of writers,
>>>>>> since they can get woken up, denied access, and then put back to sleep
>>>>>> at the end of the waitqueue.
>>>>>>
>>>>>> Fix this so that we only utilize the tail add for the initial sleep, and
>>>>>> we don't factor in the wait queue placement after we've slept (and are
>>>>>> now using the head addition).
>>>>>>
>>>>>> Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>>>>
>>>>>> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
>>>>>> index 4f89b28fa652..7beeabd05f4a 100644
>>>>>> --- a/block/blk-wbt.c
>>>>>> +++ b/block/blk-wbt.c
>>>>>> @@ -550,7 +550,7 @@ static inline bool may_queue(struct rq_wb *rwb, struct rq_wait *rqw,
>>>>>> * If the waitqueue is already active and we are not the next
>>>>>> * in line to be woken up, wait for our turn.
>>>>>> */
>>>>>> - if (waitqueue_active(&rqw->wait) &&
>>>>>> + if (wait && waitqueue_active(&rqw->wait) &&
>>>>>> rqw->wait.head.next != &wait->entry)
>>>>>> return false;
>>>>>>
>>>>>> @@ -567,16 +567,27 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
>>>>>> __acquires(lock)
>>>>>> {
>>>>>> struct rq_wait *rqw = get_rq_wait(rwb, wb_acct);
>>>>>> + struct wait_queue_entry *waitptr = NULL;
>>>>>> DEFINE_WAIT(wait);
>>>>>>
>>>>>> - if (may_queue(rwb, rqw, &wait, rw))
>>>>>> + if (may_queue(rwb, rqw, waitptr, rw))
>>>>>> return;
>>>>>>
>>>>>> + waitptr = &wait;
>>>>>> do {
>>>>>> - prepare_to_wait_exclusive(&rqw->wait, &wait,
>>>>>> + /*
>>>>>> + * Don't add ourselves to the wq tail if we've already
>>>>>> + * slept. Otherwise we can penalize background writes
>>>>>> + * indefinitely.
>>>>>> + */
>>>>>
>>>>> I saw this indefinite wbt_wait() in systemd-journal when running
>>>>> aio-stress read test, but just once, not figured out one approach
>>>>> to reproduce it yet, just wondering if you have quick test case for
>>>>> reproducing and verifying this issue.
>>>>
>>>> I've seen it in production, but I'm currently relying on someone else
>>>> to reproduce it synthetically. I'm just providing the patches for
>>>> testing.
>>>>
>>>>>> + if (waitptr)
>>>>>> + prepare_to_wait_exclusive(&rqw->wait, &wait,
>>>>>> + TASK_UNINTERRUPTIBLE);
>>>>>> + else
>>>>>> + prepare_to_wait(&rqw->wait, &wait,
>>>>>> TASK_UNINTERRUPTIBLE);
>>>>>
>>>>> Could you explain a bit why the 'wait_entry' order matters wrt. this
>>>>> issue? Since other 'wait_entry' still may come at the head meantime to
>>>>> the same wq before checking in may_queue().
>>>>
>>>> Let's say we have 10 tasks queued up. Each one gets added to the tail,
>>>> so when it's our turn, we've now reached the head. We fail to get a
>>>> queue token, so we go back to sleep. At that point we should add
>>>> back to the head, not the tail, for fairness purposes.
>>>
>>> OK, it is reasonable to do it for fairness purpose, but seems we still
>>> don't know how the wait forever in wbt_wait() is fixed by this way.
>>
>> It's not, it's just removing a class of unfairness. You should not go
>> to the back of the queue if you fail, you should remain near the top.
>>
>>> I guess the reason is in the check of 'rqw->wait.head.next != &wait->entry',
>>> which is basically removed when the waiter is waken up.
>>
>> Plus at that point it's useless, since we are the head of queue. The
>> check only makes sense if we tail add.
>
> Seems not safe to run the check in case of tail add too:
>
> - just during or after checking 'rqw->wait.head.next != &wait->entry', all
> inflight requests in this wq are done
>
> - may_queue() still returns false because 'rqw->wait.head.next != &wait->entry'
> returns true, then __wbt_wait() may wait forever.
Yeah, that's a good point. I'll try and think about it for a bit and
rework it. Basically the logic should be that we queue behind others
that are waiting, to provide some fairness in handing out the tokens.
--
Jens Axboe
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-06-23 16:20 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-06-22 19:26 [PATCH] blk-wbt: fix indefinite background writeback sleep Jens Axboe
2018-06-22 22:43 ` Ming Lei
2018-06-22 22:51 ` Jens Axboe
2018-06-22 23:11 ` Ming Lei
2018-06-22 23:13 ` Jens Axboe
2018-06-23 1:37 ` Ming Lei
2018-06-23 16:20 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox