From: Bruce Fields <bfields@fieldses.org>
To: Chuck Lever III <chuck.lever@oracle.com>
Cc: Jonathan Woithe <jwoithe@just42.net>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [Bug report] Recurring oops, 5.15.x, possibly during or soon after client mount
Date: Mon, 17 Jan 2022 10:50:19 -0500
Message-ID: <20220117155019.GD28708@fieldses.org>
In-Reply-To: <927EED04-840E-4DA6-B2B1-B604A7577B4E@oracle.com>

On Sat, Jan 15, 2022 at 07:46:06PM +0000, Chuck Lever III wrote:
>
> > On Jan 15, 2022, at 3:14 AM, Jonathan Woithe <jwoithe@just42.net> wrote:
> >
> > Hi Chuck
> >
> > Thanks for your response.
> >
> > On Fri, Jan 14, 2022 at 03:18:01PM +0000, Chuck Lever III wrote:
> >>> Recently we migrated an NFS server from a 32-bit environment running
> >>> kernel 4.14.128 to a 64-bit 5.15.x kernel. The NFS configuration remained
> >>> unchanged between the two systems.
> >>>
> >>> On two separate occasions since the upgrade (5 Jan under 5.15.10, 14 Jan
> >>> under 5.15.12) the kernel has oopsed at around the time that an NFS client
> >>> machine is turned on for the day. On both occasions the call trace was
> >>> essentially identical. The full oops sequence is at the end of this email.
> >>> The oops was not observed when running the 4.14.128 kernel.
> >>>
> >>> Is there anything more I can provide to help track down the cause of the
> >>> oops?
> >>
> >> A possible culprit is 7f024fcd5c97 ("Keep read and write fds with each
> >> nlm_file"), which was introduced in or around v5.15. You could try a
> >> simple test and back the server down to v5.14.y to see if the problem
> >> persists.
> >
> > I could do this, but only perhaps on Monday when I'm next on site. It may
> > take a while to get an answer though, since it seems we hit the fault only
> > around once every 2 weeks. Since it's a production server we are of course
> > limited in the things I can do.
> >
> > I *may* be able to set up another system as an NFS server and hit that with
> > repeated mount requests. That could help reduce the time we have to wait
> > for an answer.
>
> Given the callback information you provided, I believe that the problem
> is due to a client reboot, not a mount request. The callback shows the
> crash occurs while your server is processing an SM_NOTIFY request from
> one of your clients.
>
>
> > Is it worth considering a revert of 7f024fcd5c97? I guess it depends on how
> > many later patches depended on it.
>
> You can try reverting 7f024fcd5c97, but as I recall there are some
> subsequent changes that depend on that one.

Reverting it would break NLM locking on reexports. That is a new (and
imperfect) feature, so it matters less than avoiding this NULL
dereference if push comes to shove. But let's see if we can just fix
it.

--b.
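
For readers following the thread, the kind of NULL dereference under
discussion can be illustrated in user space. This is only a sketch under
assumptions: it supposes a per-file record that keeps separate read and
write opens (as the title of 7f024fcd5c97 suggests), either of which may
be absent, and cleanup code that runs when a client reboots. The names
lock_file, open_file, and release_client_locks are hypothetical and are
not lockd's actual structures or functions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum { SLOT_READ = 0, SLOT_WRITE = 1, SLOT_COUNT = 2 };

struct open_file {
    int locks_held;                     /* locks recorded against this open */
};

struct lock_file {
    struct open_file *slot[SLOT_COUNT]; /* either entry may be NULL */
};

/*
 * Release the locks on every open that actually exists and return how
 * many were released. Dereferencing lf->slot[i] without the NULL check
 * would crash whenever the file was opened in only one mode.
 */
int release_client_locks(struct lock_file *lf)
{
    int released = 0;

    for (int i = 0; i < SLOT_COUNT; i++) {
        struct open_file *of = lf->slot[i];

        if (of == NULL)                 /* no open in this mode: skip it */
            continue;
        released += of->locks_held;
        of->locks_held = 0;
    }
    return released;
}
```

With a record that holds only a read open (the write slot left NULL),
release_client_locks() skips the empty slot instead of faulting. Whether
the real fix takes this shape is not established by this message.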