public inbox for linux-nfs@vger.kernel.org
From: Nix <nix@esperi.org.uk>
To: "bfields@fieldses.org" <bfields@fieldses.org>
Cc: Trond Myklebust <trondmy@hammerspace.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: steam-associated reproducible hard NFSv4.2 client hang (5.9, 5.10)
Date: Mon, 05 Apr 2021 12:48:16 +0100
Message-ID: <87wnthhrzz.fsf@esperi.org.uk>
In-Reply-To: <20210402192059.GA16427@fieldses.org> (bfields@fieldses.org's message of "Fri, 2 Apr 2021 15:20:59 -0400")

On 2 Apr 2021, bfields@fieldses.org said:
> Sorry, did you say whether nfsd threads or rpc.mountd are blocked?

... just about to switch into debugging this, but it does seem to me
that if nfsd threads or (especially) mountd on the server side were
blocked, I'd see misbehaviour with mounts from every client, not just a
few of them. This doesn't happen.
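
A minimal sketch of one way to check this directly on the server, assuming
Linux procfs and that "blocked" shows up as uninterruptible sleep (state D)
in /proc/<pid>/stat. The helper names parse_stat_line and blocked_tasks are
made up for illustration; a thread can also be stuck without being in state
D, so an empty result is not proof that nothing is wrong.

```python
import os

def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state

def blocked_tasks(names=("nfsd", "rpc.mountd"), proc="/proc"):
    """List (pid, comm) for tasks with one of the given names in state D."""
    blocked = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry unreadable while scanning
        if comm in names and state == "D":
            blocked.append((pid, comm))
    return blocked
```

Calling blocked_tasks() on the server while a client is hung would list any
nfsd or rpc.mountd tasks currently in state D.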

While this is going on, my firewall and other clients not engaging in
the problematic Steam-related activity can talk NFSv4 to the server
perfectly happily: indeed this is actually a problem when debugging
because I have to quiesce the bloody things as much as I can to stop
their RPC traffic flooding the log with irrelevant junk :)

Recovery from this consists only of rebooting the stuck client: the
server and all other clients don't need touching (indeed, I'm typing
this in an emacs on that server, and since it was last rebooted it's
been hit by a client experiencing this hang at least five times: the
mailserver also keeps its mailspool on that server, and no
problems there either).

(The server also has fairly silly amounts of RAM compared to the load
it's placed under. I'm not concerned about the possibility of rpc.mountd
getting swapped out. It just doesn't happen. Even things like git gc of
the entire Chromium git repo proceed without swapping.)


btw, the filesystem layout on this machine is, in part:

/dev/main/root             xfs      4294993920  738953092 3556040828  18% /
/dev/mapper/main-vms       xfs      1073231868  406045460  667186408  38% /vm
/dev/mapper/main-steam     ext4     1055852896   85367140  916781564   9% /pkg/non-free/steam
/dev/mapper/main-archive   xfs      3219652608 2761922796  457729812  86% /usr/archive
/dev/mapper/main-pete      xfs      2468405656 2216785448  251620208  90% /usr/archive/music/Pete
/dev/mapper/main-phones    xfs        52411388    4354092   48057296   9% /.nfs/nix/share/phones
/dev/mapper/main-unifi     xfs        10491804    1130228    9361576  11% /var/lib/unifi
/dev/mapper/oracrypt-plain 2147510784  144030636 2003480148   7% /home/oranix/oracle/private

... and you'll note that the exported fs I'm seeing hangs on is actually
*not* the $HOME on the root fs: it's /pkg/non-free/steam, which is ext4
purely because so many games on x86 still fail horribly when 64-bit
inodes are used, and ext4 can emit 32-bit inodes on biggish fses without
horrible performance consequences, unlike xfs.  The relevant import line:

loom:/pkg/non-free/steam/.steam /home/nix/.steam nfs defaults,_netdev

(so it is imported to a *subdirectory* of a directory which is a mounted
NFS export, and *that* one is exported from /). The hang also happens
when using nfusr as the NFS client for the .steam import, so whatever it
is isn't just down to the client...
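
To make the nesting concrete, here is a small sketch that splits the import
line above into its fstab fields and checks that its mount point sits below
another mount point. The parent mount point /home/nix is an assumption for
illustration (the message only says the parent directory is itself an NFS
mount), and parse_fstab_line and nested_under are hypothetical helper names.

```python
from pathlib import PurePosixPath

def parse_fstab_line(line):
    """Split one fstab entry into its first four whitespace-separated fields."""
    spec, mountpoint, fstype, options = line.split()[:4]
    return {
        "spec": spec,                   # server:/exported/path
        "mountpoint": mountpoint,       # where it is mounted on the client
        "fstype": fstype,
        "options": options.split(","),
    }

def nested_under(child, parent):
    """True if mount point `child` lies strictly below mount point `parent`."""
    return PurePosixPath(parent) in PurePosixPath(child).parents

entry = parse_fstab_line(
    "loom:/pkg/non-free/steam/.steam /home/nix/.steam nfs defaults,_netdev"
)
# "/home/nix" as the parent NFS mount point is assumed, not taken from the mail.
print(entry["mountpoint"], nested_under(entry["mountpoint"], "/home/nix"))
```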

The primary reason I'm using one big fs for almost everything on this
server build is, uh, NFSv4. My last machine had lots of little
filesystems, and the result somehow confused the NFS pseudoroot
construction process so badly that most of the things I tried to export
never appeared on NFSv4, only on v3: only those exports which *were* on
the root filesystem were ever available for NFSv4 mounting, so I was
stuck with v3 on that machine. At (IIRC) Chuck Lever's suggestion (many
years ago, so he probably won't remember) I varied things when I built a
new server and was happy to find that with a less baroque setup and a
bigger rootfs with more stuff on it, NFSv4 seemed perfectly happy and
the pseudoroot was populated fine.

OK, let's collect some logs so we're not reasoning in the absence of data
any more. Back soon! (I hope.)
