From: "Darrick J. Wong" <djwong@kernel.org>
To: Filipe Manana <fdmanana@kernel.org>
Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
Zorro Lang <zlang@redhat.com>,
"fstests@vger.kernel.org" <fstests@vger.kernel.org>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
Chuck Lever III <chuck.lever@oracle.com>,
"djwong@vger.kernel.org" <djwong@vger.kernel.org>,
"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>
Subject: Re: generic/650 makes v6.0-rc client unusable
Date: Wed, 9 Nov 2022 10:06:29 -0800
Message-ID: <Y2vsJc1CKuUNzGID@magnolia>
In-Reply-To: <CAL3q7H5eV9Sb1axmNgvcbG7UrgGTH3AovaibQuWMz44Jfo-8_w@mail.gmail.com>

On Wed, Nov 09, 2022 at 10:36:04AM +0000, Filipe Manana wrote:
> On Wed, Nov 9, 2022 at 4:22 AM Shinichiro Kawasaki
> <shinichiro.kawasaki@wdc.com> wrote:
> >
> > On Sep 04, 2022 / 21:15, Zorro Lang wrote:
> > > On Sat, Sep 03, 2022 at 06:43:29PM +0000, Chuck Lever III wrote:
> > > > While investigating some of the other issues that have been
> > > > reported lately, I've found that my v6.0-rc3 NFS/TCP client
> > > > goes off the rails often (but not always) during generic/650.
> > > >
> > > > This is the test that runs a workload while offlining and
> > > > onlining CPUs. My test client has 12 physical cores.
> > > >
> > > > The test appears to start normally, but then after a bit
> > > > the NFS server workload drops to zero and the NFS mount
> > > > disappears. I can't run programs (sudo, for example) on
> > > > the client. Can't log in, even on the console. The console
> > > > has a constant stream of "can't rotate log: Input/Output
> > > > error" type messages.
> >
> > I also observed this failure when I ran fstests using btrfs on my HDDs.
> > The failure reproduces almost always.
>
> I'm wondering what you get in dmesg. Any traces?
>
> I've excluded the test from my runs for over a year now, due to a
> crash that I reported to the mm and cpu hotplug people here:
>
> https://lore.kernel.org/linux-mm/CAL3q7H4AyrZ5erimDyO7mOVeppd5BeMw3CS=wGbzrMZrp56ktA@mail.gmail.com/
>
> Unfortunately I had no reply from anyone who works on or maintains
> those subsystems.
>
> It didn't happen very often, and I haven't tested again with recent kernels.

I've been testing with xfs/btrfs/ext4 nightly, and haven't seen any
problems with the last two. There's a very infrequent log accounting
problem that is probably a regression from Dave's recent round of log
refactorings, so once we're clear of the write race corruption problem,
I intend to inquire about that.

Granted, I also don't have hundreds-of-cpus machines to test this kind
of stuff, so I don't know how well hotplug mania fares on big iron.

I don't think it's valid to remove a test from the auto group because it
uncovers bugs. If test runner folks want to put it in their own exclude
lists for their own convenience, that's fine with me.
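
As a rough sketch of what such a local exclusion could look like (the
file name local.exclude is made up for illustration; -g selects a group
and -E reads an exclude file in the fstests check script, which is
assumed to be in the current directory):

```shell
# Sketch only: "local.exclude" is a hypothetical file name.
# One test per line; text after '#' is treated as a comment.
cat > local.exclude <<'EOF'
# CPU hotplug stress test, skipped on this machine only
generic/650
EOF

# Run the auto group minus the locally excluded tests.
# Guarded so the snippet is harmless outside an fstests checkout.
if [ -x ./check ]; then
    ./check -g auto -E local.exclude
fi
```

This keeps the test in the auto group for everyone else while letting a
single runner skip it.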

--D

> >
> > > >
> > > > I haven't looked further into this yet. Actually I'm not
> > > > quite sure where to start looking.
> > > >
> > > > I recently switched this client from a local /home to an
> > > > NFS-mounted one, and that's where the xfstests are built
> > > > and run from, fwiw.
> > >
> > > If most users complain about generic/650, I'd like to exclude g/650
> > > from the "auto" default run group. Any more points?
> >
> > +1. I wish to remove it from the "auto" group. Since I cannot log in to
> > the test machine after the failure, I suggest putting it in the
> > "dangerous" group.
> >
> > --
> > Shin'ichiro Kawasaki
Thread overview: 9+ messages
[not found] <3E21DFEA-8DF7-484B-8122-D578BFF7F9E0@oracle.com>
2022-09-04 13:15 ` generic/650 makes v6.0-rc client unusable Zorro Lang
2022-09-04 16:02 ` Chuck Lever III
2022-09-06 15:50 ` Chuck Lever III
2022-11-09 4:19 ` Shinichiro Kawasaki
2022-11-09 10:36 ` Filipe Manana
2022-11-09 18:06 ` Darrick J. Wong [this message]
2022-11-10 8:49 ` Shinichiro Kawasaki
2022-11-10 15:21 ` Theodore Ts'o
2022-11-10 8:46 ` Shinichiro Kawasaki