From: "Darrick J. Wong" <djwong@kernel.org>
To: Filipe Manana <fdmanana@kernel.org>
Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
Zorro Lang <zlang@redhat.com>,
"fstests@vger.kernel.org" <fstests@vger.kernel.org>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
Chuck Lever III <chuck.lever@oracle.com>,
"djwong@vger.kernel.org" <djwong@vger.kernel.org>,
"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>
Subject: Re: generic/650 makes v6.0-rc client unusable
Date: Wed, 9 Nov 2022 10:06:29 -0800
Message-ID: <Y2vsJc1CKuUNzGID@magnolia>
In-Reply-To: <CAL3q7H5eV9Sb1axmNgvcbG7UrgGTH3AovaibQuWMz44Jfo-8_w@mail.gmail.com>

On Wed, Nov 09, 2022 at 10:36:04AM +0000, Filipe Manana wrote:
> On Wed, Nov 9, 2022 at 4:22 AM Shinichiro Kawasaki
> <shinichiro.kawasaki@wdc.com> wrote:
> >
> > On Sep 04, 2022 / 21:15, Zorro Lang wrote:
> > > On Sat, Sep 03, 2022 at 06:43:29PM +0000, Chuck Lever III wrote:
> > > > While investigating some of the other issues that have been
> > > > reported lately, I've found that my v6.0-rc3 NFS/TCP client
> > > > goes off the rails often (but not always) during generic/650.
> > > >
> > > > This is the test that runs a workload while offlining and
> > > > onlining CPUs. My test client has 12 physical cores.
> > > >
> > > > The test appears to start normally, but then after a bit
> > > > the NFS server workload drops to zero and the NFS mount
> > > > disappears. I can't run programs (sudo, for example) on
> > > > the client. Can't log in, even on the console. The console
> > > > has a constant stream of "can't rotate log: Input/Output
> > > > error" type messages.
> >
> > I also observe this failure when I ran fstests using btrfs on my HDDs.
> > The failure is recreated almost always.
>
> I'm wondering what do you get in dmesg, any traces?
>
> I've excluded the test from my runs for over an year now, due to some
> crash that I reported
> to the mm and cpu hotplug people here:
>
> https://lore.kernel.org/linux-mm/CAL3q7H4AyrZ5erimDyO7mOVeppd5BeMw3CS=wGbzrMZrp56ktA@mail.gmail.com/
>
> Unfortunately I had no reply from anyone who works or maintains those
> subsystems.
>
> It didn't happen very often, and I haven't tested again with recent kernels.

I've been testing with xfs/btrfs/ext4 nightly, and haven't seen any
problems with the last two.  There's some very infrequent log accounting
problem that is probably a regression from Dave's recent round of log
refactorings, so once we're clear of the write race corruption problem,
I intend to inquire about that.

Granted, I also don't have hundreds-of-CPUs machines to test this kind of
stuff, so I don't know how well hotplug mania fares on big iron.

I don't think it's valid to remove a test from the auto group because it
uncovers bugs.  If test runner folks want to put it in their own exclude
lists for their own convenience, that's fine with me.

--D

> >
> > > >
> > > > I haven't looked further into this yet. Actually I'm not
> > > > quite sure where to start looking.
> > > >
> > > > I recently switched this client from a local /home to an
> > > > NFS-mounted one, and that's where the xfstests are built
> > > > and run from, fwiw.
> > >
> > > If most of users complain generic/650, I'd like to exclude g/650 from the
> > > "auto" default run group. Any more points?
> >
> > +1. I wish to remove it from the "auto" group. Since I can not login to the test
> > machine after the failure, I suggest to put it in the "dangerous" group.
> >
> > --
> > Shin'ichiro Kawasaki