From: Christian Herzog <herzog@phys.ethz.ch>
To: Bob Ciotti <bob.ciotti@gmail.com>
Cc: Chuck Lever III <chuck.lever@oracle.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	Bob Ciotti <bob.ciotti@nasa.gov>
Subject: Re: file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm
Date: Thu, 6 Apr 2023 19:26:15 +0200
Message-ID: <ZC8AtxuibSDwvK49@phys.ethz.ch>
In-Reply-To: <4F41FC87-908F-451F-8D2C-089CB7AB5919@gmail.com>

Dear Bob,

thanks a lot for your input.

> >>>> That was our first idea too, but we haven't found any indication that
> >>>> this is the case. The xfs file systems seem perfectly fine when all
> >>>> nfsds are in D state, and we can read from them and write to them. If
> >>>> xfs were to block nfs IO, this should affect other processes too, right?
> >>> It's possible that the NFSD threads are waiting on I/O to a particular filesystem block. XFS is not likely to block other activity in this case.
> >> ok good to know. So far we were under the impression that a file system would
> >> block as a whole.
> > 
> > XFS tries to operate in parallel as much as it can. Maybe other filesystems aren't as capable.
> > 
> > If the unresponsive block is part of a superblock or the journal (ie, shared metadata) I would expect XFS to become unresponsive. For I/O on blocks containing file data, it is likely to have more robust behavior.
> > 
> 
> Pretty sure we have seen a similar issue, never fully explained. From what I
> recall, the server gets into a low-memory state. At that point, efforts to
> coalesce writes are abandoned, and each write request is processed inline
> rather than scheduled; all nfsds then pile up in D. Writes continue to arrive
> at a rate higher than the server can keep up with. But the back-end store (a
> high-end NetApp RAID 6 with 240 drives, also with XFS) had very little load.
> We never fully explained it, but Chuck's point about a shared metadata block
> may be a good place to look, as may whether inline writes at low memory could
> have synergy. IIRC, we worked around it with releases and tunables like
> minfree kmem et al., which reduced but did not eliminate the problem. I'm
> away from reference material for a while, but I'll review and update if I
> find anything.

We'll certainly investigate this, but right now it's kinda hard to imagine,
since I've never seen the file server use more than ~10G of its 64G of RAM
(excluding page cache, of course). We're not even sure heavy writes trigger the
problem; in one case our monitoring hinted at a lot of reads leading up to the
freeze.
OTOH, if our issue could be resolved by throwing a bunch of RAM modules into the
server, all the better.
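
For reference, a minimal sketch of how a "RAM used, excluding page cache"
figure can be approximated from /proc/meminfo. This is an illustration only:
the function names are hypothetical, the sample numbers are made up (they are
not measurements from our server), and the formula (total minus free, buffers,
page cache and reclaimable slab) is an assumption that may differ from what a
given monitoring tool reports.

```python
# Made-up sample in /proc/meminfo format (values in kB); NOT real server data.
SAMPLE = """\
MemTotal:       65536000 kB
MemFree:         2000000 kB
Buffers:          500000 kB
Cached:         50000000 kB
SReclaimable:    3000000 kB
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict mapping field name to kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name: value" shape
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def used_excluding_cache_kb(info):
    """Approximate memory in use, not counting buffers, page cache or
    reclaimable slab. This formula is an assumption, see the note above."""
    cache = info.get("Cached", 0) + info.get("SReclaimable", 0)
    return info["MemTotal"] - info["MemFree"] - info.get("Buffers", 0) - cache


if __name__ == "__main__":
    # On a live Linux server one would pass open("/proc/meminfo").read()
    # instead of SAMPLE.
    used = used_excluding_cache_kb(parse_meminfo(SAMPLE))
    print(f"used excluding cache: {used / 1024 / 1024:.1f} GiB")
```

Logging this value over time alongside the nfsd thread states would show
whether the server really approaches a low-memory state before a freeze.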


thanks,
-Christian


