From: Jens Axboe <axboe@fb.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Dave Chinner <david@fromorbit.com>,
linux-ext4@vger.kernel.org, fstests@vger.kernel.org,
tarasov@vasily.name
Subject: Re: Test generic/299 stalling forever
Date: Mon, 24 Oct 2016 20:59:12 -0600
Message-ID: <34b2b4fe-052d-4d13-bc80-211707ea118e@fb.com>
In-Reply-To: <20161025025456.bbruxu4lg25773sl@thunk.org>
On 10/24/2016 08:54 PM, Theodore Ts'o wrote:
> On Mon, Oct 24, 2016 at 10:28:14AM -0600, Jens Axboe wrote:
>
>> How about the below? Bump the timeout to 5 min, 1 min is a little on the
>> short side, we want normal error handling to be out of the way before
>> that happens. And additionally, break out if we have been marked as
>> reaped/exited, so we avoid grabbing the stat mutex again.
>
> Yep, that works. I tried a test with just the second change:
>
>> + /*
>> +  * If we took too long to shut down, the main thread could
>> +  * already consider us reaped/exited. If that happens, break
>> +  * out and clean up.
>> +  */
>> + if (td->runstate >= TD_EXITED)
>> +         break;
>> +
>
> And that's sufficient to solve the problem.

Yes, it should be, so glad that it is!

> Increasing the timeout to 5 minute also would be a good idea, so we
> can let the worker threads exit cleanly so the reported stats will be
> completely accurate.

I made that separate change as well. If the job is stuck in the kernel
for some sync operation, we could feasibly be uninterruptible for
minutes. So 1 minute is too short in any case, and I'd rather just make
this check than send kill signals, since those won't fix the
uninterruptible problem.

> Thanks for your help in figuring out this long-standing problem!

It was easy based on all your info, since I could not reproduce. So
thanks for your help! Everything should be committed now, and I'll cut a
new release tomorrow so we can hopefully put this behind us.

--
Jens Axboe
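
For readers following the thread, here is a minimal sketch of the idea behind the quoted check: a worker loop tests its run state before touching shared statistics, and leaves the loop once the main thread has already marked it exited or reaped. This is an illustration only, not the actual fio code. The names `run_state`, `worker`, `worker_loop`, and the counter standing in for the stat mutex are all hypothetical, and the loop is single-threaded so that it stays deterministic.

```c
#include <stdbool.h>

/*
 * Hypothetical run states, ordered so that "exited or later" can be
 * tested with a single >= comparison, as in the quoted patch.
 */
enum run_state { ST_RUNNING, ST_FINISHING, ST_EXITED, ST_REAPED };

/* Hypothetical per-worker bookkeeping (not fio's thread_data). */
struct worker {
    enum run_state runstate;     /* would be set by the main thread */
    int pending_rounds;          /* iterations of work still to do */
    int stat_lock_acquisitions;  /* stands in for lock/update/unlock */
};

/*
 * Run the worker until it is out of work. Returns true if the loop was
 * left early because the worker was already considered exited/reaped.
 */
bool worker_loop(struct worker *w)
{
    while (w->pending_rounds > 0) {
        /* Break out before touching shared stats if we were reaped. */
        if (w->runstate >= ST_EXITED)
            return true;

        /* Placeholder for taking the stat mutex and updating stats. */
        w->stat_lock_acquisitions++;
        w->pending_rounds--;
    }
    return false;
}
```

In this sketch, a worker that is still running completes every round and touches the stats each time, while a worker already marked exited returns immediately without a single acquisition.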