From: Chuck Lever <chuck.lever@oracle.com>
To: J David <j.david.lists@gmail.com>
Cc: Rick Macklem <rick.macklem@gmail.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: knfsd server bug when GETATTR follows READDIR
Date: Sat, 21 Dec 2024 12:34:40 -0500
Message-ID: <e4853faf-0836-4595-952b-69f71150bede@oracle.com>
In-Reply-To: <CABXB=RQn8qU5TZsWyBpWNaDQpaMPhdi4RYVJ0D1qJAWiFuBAHQ@mail.gmail.com>

On 12/20/24 9:16 PM, J David wrote:
> Hello,
> 
> On Tue, Dec 17, 2024 at 8:51 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>> If they can reproduce
>> this issue with an "in tree" file system contained in a recent upstream
>> Linux kernel, then we can take a look. (Or you and J. David can give it
>> a try).
> 
> Yes, I reproduced this behavior on ext4 with 6.11.5+bpo-amd64 from
> Debian backports on completely different hardware.
> 
> Then I set up another NFS server on Arch (running kernel 6.12.4), and
> reproduced the issue there as well.
> 
> Then, just to be sure, I went and found the instructions for building
> the Linux kernel from source, built and tested both 6.12.6 and
> 6.13-rc3 as downloaded directly from www.kernel.org, and the issue
> occurs with those as well.

Reproducing on v6.13-rc with ext4 is all that was necessary, thank you!


> Additionally, I have tested every combination of FreeBSD, Linux and
> OpenIndiana as client and server to confirm that FreeBSD client with
> Linux server is the only case where this problem occurs.

Interesting.


> Does this count as reproducing the issue with an "in tree" file system
> contained in a recent upstream Linux kernel? I'm asking sincerely; I'm
> so far out of my depth that I'm pretty sure there are sea monsters
> swimming around down there. So I can't rule out the possibility that
> I've done something wrong either in setup or testing.
> 
> During the course of this, I've gotten the reproduction down to
> extracting a 2k tar file and then running "du" on the resulting
> directory from the client. Doesn't matter if the file is untarred on
> the FreeBSD client, the server, or another client. The tar file
> contains a directory with a handful of random Javascript files from
> Drupal. As far as I can tell, it has something to do with the number,
> size, or names of the files. The Drupal project has three separate
> directories all structured like this with the same filenames, but the
> file contents vary. The issue occurs with all of them.
> 
> The Linux /etc/exports file is just:
> 
> /data 192.168.201.0/24(rw,sync)
> 
> (The production case also uses crossmnt and no_subtree_check, anonuid,
> and anongid, but I eliminated those one by one to make sure they
> weren't responsible.)
> 
> The corresponding fstab entry on the FreeBSD 14.2-RELEASE client is:
> 
> 192.168.201.200:/data /data nfs rw,tcp,nfsv4,minorversion=2 0 0

Out of curiosity, do you see the problem recur with nfsv3 or the other
NFSv4 minor versions?
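
A minimal sketch of the variants worth trying, assuming the FreeBSD
mount_nfs(8) option spellings (nfsv3, nfsv4, minorversion=N) and the
server address and paths from the fstab entry quoted above. The helper
name list_mount_variants is hypothetical; it only prints the commands
so they can be reviewed before being run one at a time:

```shell
#!/bin/sh
# Print one mount command per protocol variant to test.
# Server, export, and mount point come from the quoted fstab entry.
list_mount_variants() {
    for opts in nfsv3 \
                nfsv4,minorversion=0 \
                nfsv4,minorversion=1 \
                nfsv4,minorversion=2; do
        echo "mount -t nfs -o rw,tcp,$opts 192.168.201.200:/data /data"
    done
}

list_mount_variants
```

Unmount between runs, then repeat the same "du" test against the same
directory for each variant.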


> One additional thing I noticed that really blew my mind is that I can
> shutdown both the client and the server, wait, power them back on, and
> the issue is still there. So it's not something in RAM.  That prompted
> me to try "touch x" in the directory to create a new 0-length file.
> The issue then goes away. Then I can "rm x" and the issue comes back.
> By contrast, I can write megabytes from /dev/random into one of the
> files without affecting anything; the issue stays the same.
> 
> I then tried it with all empty files using the same filenames. The
> issue still occurred. Add or remove one file and the issue goes away.
> I then renamed one of the files to zz.js. Issue still occurs. Renamed
> it to zzz.js. Problem still occurs. Kept going until I got to
> zzzzzz.js and it worked.
> 
> Finally, I got it to the point where running this in an empty mounted
> directory will create the issue:
> 
> rm *.xx; for a in a b c d e f g h ; do for b in 1 2 3 4 5 6 7 ; do
> touch $a$b.xx ; done; done; for a in 1 2 3 4 5; do touch x$a-xx.xx;
> done; touch y0-xxxxxx.xx
> 
> and this will not:
> 
> rm *.xx; for a in a b c d e f g h ; do for b in 1 2 3 4 5 6 7 ; do
> touch $a$b.xx ; done; done; for a in 1 2 3 4 5; do touch x$a-xx.xx;
> done; touch y0-xxxxxxx.xx
> 
> (The difference being one extra x in the last filename.)
> 
> It works in the other direction as well. This causes the issue:
> 
> rm *.xx; for a in a b c d e f g h ; do for b in 1 2 3 4 5 6 7 ; do
> touch $a$b.xx ; done; done; for a in 1 2 3 4 5; do touch x$a-xx.xx;
> done; touch y0-xxx.xx
> 
> This does not:
> 
> rm *.xx; for a in a b c d e f g h ; do for b in 1 2 3 4 5 6 7 ; do
> touch $a$b.xx ; done; done; for a in 1 2 3 4 5; do touch x$a-xx.xx;
> done; touch y0-xx.xx
> 
> There's a four-character window involving the length of the filenames
> where 62 files in a directory causes this issue. There's a little more
> to it than that; it doesn't look like you can just create 61
> two-letter filenames and then one really long one and get the issue.
> 
> So I haven't found the specifics yet, but perhaps due to pure chance
> this directory structure is exactly right to provoke an incredibly
> obscure edge case?

Well, it's likely that this is a problem with READDIR, so file content
is not going to be an issue. The file name lengths are the problem.
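
Since only the names matter, the four reproducers above can be folded
into one parameterized sketch. The function names make_repro and
name_stats are hypothetical, not part of the original report. The
second function counts entries and total name bytes, as a rough proxy
for what a READDIR reply has to encode:

```shell
#!/bin/sh
# make_repro DIR NX: create the 62-file layout from the reproducers.
# NX is the number of 'x' characters in the last name (y0-<NX x's>.xx).
make_repro() {
    dir=$1
    nx=$2
    for a in a b c d e f g h; do
        for b in 1 2 3 4 5 6 7; do
            touch "$dir/$a$b.xx"
        done
    done
    for a in 1 2 3 4 5; do
        touch "$dir/x$a-xx.xx"
    done
    # Build the run of x's for the final file name.
    xs=""
    i=0
    while [ "$i" -lt "$nx" ]; do
        xs="${xs}x"
        i=$((i + 1))
    done
    touch "$dir/y0-$xs.xx"
}

# name_stats DIR: print "<entry count> <total bytes of all names>".
name_stats() {
    count=$(ls "$1" | wc -l | tr -d ' ')
    bytes=$(ls "$1" | tr -d '\n' | wc -c | tr -d ' ')
    echo "$count $bytes"
}
```

For example, "make_repro /data/test 6" in an empty directory on the
mount recreates the first failing case; per the report above, values 3
through 6 fail and 2 and 7 do not.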

Also, I'm wondering what the FreeBSD client's READDIR arguments are
(how much does it request, what maximum limits does it negotiate, and
so on). Rick?

Since this isn't reproducible (yet) with a Linux client, let's try
another set of network captures, and you can send these to me
privately.

1. Start the capture
2. Mount
3. Run one of the reproducers above
4. Unmount
5. Stop the capture

I'd like to see one with v6.13-rc3 and ext4 that works as expected, and
one with the same configuration that fails.
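
The capture procedure above could be scripted on the client roughly as
follows. This is a sketch under assumptions, not a required method: it
assumes tcpdump is available, that "mount /data" picks up the fstab
entry quoted earlier, and that the interface name, output file, and
reproducer command are supplied by the caller. The names run and
capture_repro are hypothetical. Setting DRY_RUN=1 prints the commands
instead of executing them.

```shell
#!/bin/sh
# run CMD...: execute CMD, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# capture_repro IFACE OUTFILE REPRO_CMD
capture_repro() {
    iface=$1
    out=$2
    repro=$3

    # Start the capture: full packets, NFS port only.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ tcpdump -i $iface -s 0 -w $out port 2049 &"
    else
        tcpdump -i "$iface" -s 0 -w "$out" port 2049 &
        cap_pid=$!
        sleep 2   # give tcpdump time to start listening
    fi

    run mount /data          # mount, using the fstab entry
    run sh -c "$repro"       # run the reproducer
    run umount /data         # unmount

    # Stop the capture and let tcpdump flush the file.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ kill <tcpdump pid>"
    else
        sleep 2
        kill "$cap_pid"
        wait "$cap_pid" 2>/dev/null
    fi
}
```

Run it as root once per case, for example
"capture_repro em0 fail.pcap 'du /data/test'" for the failing
directory and again with a different output file for the working one.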

-- 
Chuck Lever
