public inbox for fstests@vger.kernel.org
From: Jens Axboe <axboe@fb.com>
To: Dave Chinner <david@fromorbit.com>, Theodore Ts'o <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org, fstests@vger.kernel.org, tarasov@vasily.name
Subject: Re: Test generic/299 stalling forever
Date: Wed, 12 Oct 2016 15:19:25 -0600
Message-ID: <f1c80b32-9593-48e0-05e9-4c8b97380b49@fb.com>
In-Reply-To: <20161012211407.GL23194@dastard>

On 10/12/2016 03:14 PM, Dave Chinner wrote:
> On Thu, Sep 29, 2016 at 12:37:22AM -0400, Theodore Ts'o wrote:
>> On Fri, Jun 19, 2015 at 09:34:30AM +1000, Dave Chinner wrote:
>>> On Thu, Jun 18, 2015 at 11:53:37AM -0400, Theodore Ts'o wrote:
>>>> I've been trying to figure out why generic/299 has occasionally been
>>>> stalling forever.  After taking a closer look, it appears the problem
>>>> is that the fio process is stalling in userspace.  Looking at the ps
>>>> listing, the fio process hasn't run in over six hours, and
>>>> attaching strace to the fio process shows it's stalled in a FUTEX_WAIT.
>>>>
>>>> Has anyone else seen this?  I'm using fio 2.2.6, and I have a feeling
>>>> that I started seeing this when I started using a newer version of
>>>> fio.  So I'm going to try rolling back to an older version of fio and
>>>> see if that causes the problem to go away.
>>>
>>> I'm running on fio 2.1.3 at the moment and I haven't seen any
>>> problems like this for months. Keep in mind that fio does tend to
>>> break in strange ways fairly regularly, so I'd suggest an
>>> upgrade/downgrade of fio as your first move.
>>
>> Out of curiosity, Dave, are you still using fio 2.1.3?  I had upgraded
>
> No.
>
> $ fio -v
> fio-2.1.11
> $
>
>> to the latest fio to fix other test breaks, and I'm still seeing the
>> occasional generic/299 test failure.  In fact, it's been happening
>> often enough on one of my test platforms[1] that I decided to really
>> dig down and investigate it, and all of the threads were blocking on
>> td->verify_cond in fio's verify.c.
>>
>> It bisected down to this commit:
>>
>> commit e5437a073e658e8154b9e87bab5c7b3b06ed4255
>> Author: Vasily Tarasov <tarasov@vasily.name>
>> Date:   Sun Nov 9 20:22:24 2014 -0700
>>
>>     Fix for a race when fio prints I/O statistics periodically
>>
>>     Below is the demonstration for the latest code in git:
>>     ...
>>
>> So generic/299 passes reliably with this commit's parent, and it fails
>> on this commit within a dozen tries or so.  The commit first landed in
>> fio 2.1.14, so it's consistent with Dave's report a year ago, when he
>> was still using fio 2.1.3.
>
> But I'm still not using a fio recent enough to hit this.

FWIW, this is the commit that fixes it:

commit 39d13e67ef1f4b327c68431f8daf033a03920117
Author: Jens Axboe <axboe@fb.com>
Date:   Fri Aug 26 14:39:30 2016 -0600

     backend: check if we need to update rusage stats, if stat_mutex is busy

fio 2.14 and newer should not have the problem, but earlier versions may,
depending on how old they are.
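
As a rough illustration of the version range discussed in this thread, the
sketch below classifies a `fio -v` string. The bounds are only what the
thread states (regression first released in 2.1.14, fix expected in 2.14)
and should be treated as approximate. The function and constant names are
hypothetical and are not part of fio or fstests.

```python
import re

# Bounds taken from this thread; treat them as approximate.
REGRESSION_FIRST = (2, 1, 14)  # first release with commit e5437a07 (per the bisect)
FIXED_IN = (2, 14)             # release said to contain commit 39d13e67


def parse_fio_version(text):
    """Parse `fio -v` output such as 'fio-2.1.11' into a tuple of ints.

    Any suffix after the dotted number (for example a git-describe tail)
    is ignored, so builds from git are classified by their base release.
    """
    match = re.match(r"\s*fio-(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"unrecognized fio version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def may_hit_stall(text):
    """Return True if the version lies in [2.1.14, 2.14), per the thread."""
    version = parse_fio_version(text)
    return REGRESSION_FIRST <= version < FIXED_IN
```

Passing the string printed by `fio -v` to `may_hit_stall` gives a quick
hint about whether an upgrade is worth trying before digging further. The
versions mentioned above behave as the thread suggests: 2.1.3 and 2.1.11
fall outside the range, while 2.2.6 falls inside it.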

-- 
Jens Axboe

