linux-nfs.vger.kernel.org archive mirror
From: Bruce Fields <bfields@fieldses.org>
To: Chuck Lever III <chuck.lever@oracle.com>
Cc: Jonathan Woithe <jwoithe@just42.net>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [Bug report] Recurring oops, 5.15.x, possibly during or soon after client mount
Date: Mon, 17 Jan 2022 10:50:19 -0500
Message-ID: <20220117155019.GD28708@fieldses.org>
In-Reply-To: <927EED04-840E-4DA6-B2B1-B604A7577B4E@oracle.com>

On Sat, Jan 15, 2022 at 07:46:06PM +0000, Chuck Lever III wrote:
> 
> > On Jan 15, 2022, at 3:14 AM, Jonathan Woithe <jwoithe@just42.net> wrote:
> > 
> > Hi Chuck
> > 
> > Thanks for your response.
> > 
> > On Fri, Jan 14, 2022 at 03:18:01PM +0000, Chuck Lever III wrote:
> >>> Recently we migrated an NFS server from a 32-bit environment running 
> >>> kernel 4.14.128 to a 64-bit 5.15.x kernel.  The NFS configuration remained
> >>> unchanged between the two systems.
> >>> 
> >>> On two separate occasions since the upgrade (5 Jan under 5.15.10, 14 Jan
> >>> under 5.15.12) the kernel has oopsed at around the time that an NFS client
> >>> machine is turned on for the day.  On both occasions the call trace was
> >>> essentially identical.  The full oops sequence is at the end of this email. 
> >>> The oops was not observed when running the 4.14.128 kernel.
> >>> 
> >>> Is there anything more I can provide to help track down the cause of the
> >>> oops?
> >> 
> >> A possible culprit is 7f024fcd5c97 ("Keep read and write fds with each
> >> nlm_file"), which was introduced in or around v5.15.  You could try a
> >> simple test and back the server down to v5.14.y to see if the problem
> >> persists.
> > 
> > I could do this, but perhaps only on Monday when I'm next on site.  It may
> > take a while to get an answer, though, since we seem to hit the fault only
> > around once every two weeks.  Since it's a production server, we are of
> > course limited in what we can do.
> > 
> > I *may* be able to set up another system as an NFS server and hit that with
> > repeated mount requests.  That could help reduce the time we have to wait
> > for an answer.
> 
> Given the call trace you provided, I believe that the problem
> is due to a client reboot, not a mount request. The call trace shows the
> crash occurs while your server is processing an SM_NOTIFY request from
> one of your clients.
> 
> 
> > Is it worth considering a revert of 7f024fcd5c97?  I guess it depends on how
> > many later patches depended on it.
> 
> You can try reverting 7f024fcd5c97, but as I recall there are some
> subsequent changes that depend on that one.

NLM locking on re-exports would stop working.  That is a new (and
imperfect) feature, so it's less important than avoiding this NULL
dereference, if push came to shove.  But let's see if we can just fix
it first.

--b.
