From: Jeff Layton <jlayton@kernel.org>
To: dai.ngo@oracle.com, Chuck Lever III <chuck.lever@oracle.com>
Cc: Anna Schumaker <anna@kernel.org>,
Trond Myklebust <trondmy@gmail.com>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
Neil Brown <neilb@suse.de>,
"Kornievskaia, Olga" <Olga.Kornievskaia@netapp.com>,
Tom Talpey <tom@talpey.com>
Subject: Re: xfstests results over NFS
Date: Thu, 17 Aug 2023 19:08:06 -0400
Message-ID: <cd592a05c13226c5e1fb4f390eb2473ba20024ad.camel@kernel.org>
In-Reply-To: <b535fccd-acd2-8fca-71ac-6aa17ee84708@oracle.com>
On Thu, 2023-08-17 at 15:59 -0700, dai.ngo@oracle.com wrote:
> On 8/17/23 3:23 PM, dai.ngo@oracle.com wrote:
> >
> > On 8/17/23 2:07 PM, Jeff Layton wrote:
> > > On Thu, 2023-08-17 at 13:15 -0400, Jeff Layton wrote:
> > > > On Thu, 2023-08-17 at 16:31 +0000, Chuck Lever III wrote:
> > > > > > On Aug 17, 2023, at 12:27 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > >
> > > > > > On Thu, 2023-08-17 at 11:17 -0400, Anna Schumaker wrote:
> > > > > > > On Thu, Aug 17, 2023 at 10:22 AM Jeff Layton <jlayton@kernel.org>
> > > > > > > wrote:
> > > > > > > > On Thu, 2023-08-17 at 14:04 +0000, Chuck Lever III wrote:
> > > > > > > > > > On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@kernel.org>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > I finally got my kdevops
> > > > > > > > > > (https://github.com/linux-kdevops/kdevops) test
> > > > > > > > > > rig working well enough to get some publishable results. To
> > > > > > > > > > run fstests,
> > > > > > > > > > kdevops will spin up a server and (in this case) 2 clients to run
> > > > > > > > > > xfstests' auto group. One client mounts with default options,
> > > > > > > > > > and the
> > > > > > > > > > other uses NFSv3.
> > > > > > > > > >
> > > > > > > > > > I tested 3 kernels:
> > > > > > > > > >
> > > > > > > > > > v6.4.0 (stock release)
> > > > > > > > > > 6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
> > > > > > > > > > 6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of
> > > > > > > > > > yesterday morning)
> > > > > > > > > >
> > > > > > > > > > Here are the results summary of all 3:
> > > > > > > > > >
> > > > > > > > > > KERNEL: 6.4.0
> > > > > > > > > > CPUS: 8
> > > > > > > > > >
> > > > > > > > > > nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
> > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/124
> > > > > > > > > > generic/193 generic/258 generic/294 generic/318 generic/319
> > > > > > > > > > generic/444 generic/528 generic/529
> > > > > > > > > > nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
> > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > > > > > > generic/187 generic/193 generic/294 generic/318 generic/319
> > > > > > > > > > generic/357 generic/444 generic/486 generic/513 generic/528
> > > > > > > > > > generic/529 generic/578 generic/675 generic/688
> > > > > > > > > > Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s
> > > > > > > > > >
> > > > > > > > > > KERNEL: 6.5.0-rc6-g4853c74bd7ab
> > > > > > > > > > CPUS: 8
> > > > > > > > > >
> > > > > > > > > > nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
> > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/258
> > > > > > > > > > generic/294 generic/318 generic/319 generic/444 generic/529
> > > > > > > > > > nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
> > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > > > > > > generic/187 generic/294 generic/318 generic/319 generic/357
> > > > > > > > > > generic/444 generic/486 generic/513 generic/529 generic/578
> > > > > > > > > > generic/675 generic/688
> > > > > > > > > > Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s
> > > > > > > > > >
> > > > > > > > > > KERNEL: 6.5.0-rc6-next-20230816-gef66bf8aeb91
> > > > > > > > > > CPUS: 8
> > > > > > > > > >
> > > > > > > > > > nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
> > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/258
> > > > > > > > > > generic/294 generic/318 generic/319 generic/444 generic/529
> > > > > > > > > > nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
> > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > > > > > > generic/187 generic/294 generic/318 generic/319 generic/357
> > > > > > > > > > generic/444 generic/486 generic/513 generic/529 generic/578
> > > > > > > > > > generic/675 generic/683 generic/684 generic/688
> > > > > > > > > > Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s
> > > > > > > As long as we're sharing results ... here is what I'm seeing with a
> > > > > > > 6.5-rc6 client & server:
> > > > > > >
> > > > > > > > anna@gouda ~ % xfstestsdb xunit list --results --runid 1741 --color=none
> > > > > > > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > > > > > | run  | device               | xunit   | hostname | pass | fail | skip | time  |
> > > > > > > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > > > > > | 1741 | server:/srv/xfs/test | tcp-3   | client   |  125 |    4 |  464 | 447 s |
> > > > > > > > | 1741 | server:/srv/xfs/test | tcp-4.0 | client   |  117 |   11 |  465 | 478 s |
> > > > > > > > | 1741 | server:/srv/xfs/test | tcp-4.1 | client   |  119 |   12 |  462 | 404 s |
> > > > > > > > | 1741 | server:/srv/xfs/test | tcp-4.2 | client   |  212 |   18 |  363 | 564 s |
> > > > > > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > > > >
> > > > > > >
> > > > > > > anna@gouda ~ % xfstestsdb show --failure 1741 --color=none
> > > > > > > +-------------+---------+---------+---------+---------+
> > > > > > > > | testcase    | tcp-3   | tcp-4.0 | tcp-4.1 | tcp-4.2 |
> > > > > > > > +-------------+---------+---------+---------+---------+
> > > > > > > > | generic/053 | passed  | failure | failure | failure |
> > > > > > > > | generic/099 | passed  | failure | failure | failure |
> > > > > > > > | generic/105 | passed  | failure | failure | failure |
> > > > > > > > | generic/140 | skipped | skipped | skipped | failure |
> > > > > > > > | generic/188 | skipped | skipped | skipped | failure |
> > > > > > > > | generic/258 | failure | passed  | passed  | failure |
> > > > > > > > | generic/294 | failure | failure | failure | failure |
> > > > > > > > | generic/318 | passed  | failure | failure | failure |
> > > > > > > > | generic/319 | passed  | failure | failure | failure |
> > > > > > > > | generic/357 | skipped | skipped | skipped | failure |
> > > > > > > > | generic/444 | failure | failure | failure | failure |
> > > > > > > > | generic/465 | passed  | failure | failure | failure |
> > > > > > > > | generic/513 | skipped | skipped | skipped | failure |
> > > > > > > > | generic/529 | passed  | failure | failure | failure |
> > > > > > > > | generic/604 | passed  | passed  | failure | passed  |
> > > > > > > > | generic/675 | skipped | skipped | skipped | failure |
> > > > > > > > | generic/688 | skipped | skipped | skipped | failure |
> > > > > > > > | generic/697 | passed  | failure | failure | failure |
> > > > > > > > | nfs/002     | failure | failure | failure | failure |
> > > > > > > +-------------+---------+---------+---------+---------+
> > > > > > >
> > > > > > >
> > > > > > > > > > With NFSv4.2, v6.4.0 has 2 extra failures that the current
> > > > > > > > > > mainline
> > > > > > > > > > kernel doesn't:
> > > > > > > > > >
> > > > > > > > > > generic/193 (some sort of setattr problem)
> > > > > > > > > > generic/528 (known problem with btime handling in client
> > > > > > > > > > that has been fixed)
> > > > > > > > > >
> > > > > > > > > > While I haven't investigated, I'm assuming the 193 bug is also
> > > > > > > > > > something
> > > > > > > > > > that has been fixed in recent kernels. There are also 3 other
> > > > > > > > > > NFSv3
> > > > > > > > > > tests that started passing since v6.4.0. I haven't looked into
> > > > > > > > > > those.
> > > > > > > > > >
> > > > > > > > > > With the linux-next kernel there are 2 new regressions:
> > > > > > > > > >
> > > > > > > > > > generic/683
> > > > > > > > > > generic/684
> > > > > > > > > >
> > > > > > > > > > Both of these look like problems with setuid/setgid stripping,
> > > > > > > > > > and still
> > > > > > > > > > need to be investigated. I have more verbose result info on
> > > > > > > > > > the test
> > > > > > > > > > failures if anyone is interested.
> > > > > > > Interesting that I'm not seeing the 683 & 684 failures. What type of
> > > > > > > filesystem is your server exporting?
> > > > > > >
> > > > > > btrfs
> > > > > >
> > > > > > You are testing linux-next? I need to go back and confirm these
> > > > > > results
> > > > > > too.
> > > > > > IMO linux-next is quite important: we keep hitting bugs that
> > > > > appear only after integration -- block and network changes in
> > > > > other trees especially can impact the NFS drivers.
> > > > >
> > > > Indeed, I suspect this is probably something from the vfs tree (though
> > > > we definitely need to confirm that). Today I'm testing:
> > > >
> > > > 6.5.0-rc6-next-20230817-g47762f086974
> > > >
> > > Nope, I was wrong. I ran a bisect and it landed here. I confirmed it by
> > > turning off leases on the nfs server and the test started passing. I
> > > probably won't have the cycles to chase this down further.
> > >
> > > The capture looks something like this:
> > >
> > > > OPEN (get a write delegation)
> > > WRITE
> > > CLOSE
> > > SETATTR (mode 06666)
> > >
> > > ...then presumably a task on the client opens the file again, but the
> > > setuid bits don't get stripped.
> > >
> > > I think either the client will need to strip these bits on a delegated
> > > open, or we'll need to recall write delegations from the client when it
> > > tries to do a SETATTR with a mode that could later end up needing to be
> > > stripped on a subsequent open:
> > >
> > > 66ce3e3b98a7a9e970ea463a7f7dc0575c0a244b is the first bad commit
> > > commit 66ce3e3b98a7a9e970ea463a7f7dc0575c0a244b
> > > Author: Dai Ngo <dai.ngo@oracle.com>
> > > Date: Thu Jun 29 18:52:40 2023 -0700
> > >
> > > NFSD: Enable write delegation support
> >
> > The SETATTR should cause the delegation to be recalled. However, I think
> > there is an optimization on server that skips the recall if the SETATTR
> > comes from the same client that has the delegation.
>
> The optimization on the server was done by this commit:
>
> 28df3d1539de nfsd: clients don't need to break their own delegations
>
> Perhaps we should allow this optimization for read delegation only?
>
> Or should the NFS client be responsible for handling the SETATTR and
> local OPEN on the file that has write delegation granted?
>
I think that setuid/setgid files are really a special case.
We already avoid giving out delegations on setuid/gid files. What we're
not doing currently is revoking the write delegation if the holder tries
to set a mode that involves a setuid/gid bit. If we add that, then that
should close the hole, I think.
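
To make the proposed rule concrete, here is a rough sketch of the decision as
I understand it. This is a toy model in Python, not nfsd code: the function
and parameter names are invented for this example, and it only covers the
SETATTR-of-mode case discussed in this thread.

```python
import stat

# Mode bits that make a file "special" for delegation purposes.
SUID_SGID = stat.S_ISUID | stat.S_ISGID


def must_recall_write_delegation(new_mode, requester_holds_delegation):
    """Return True if a SETATTR of new_mode should recall the write delegation.

    Hypothetical model only; the names do not correspond to nfsd identifiers.
    """
    if not requester_holds_delegation:
        # A SETATTR from a different client conflicts with the delegation.
        return True
    # Same-client optimization (28df3d1539de) with the proposed exception:
    # setting a setuid or setgid bit forces a recall anyway.
    return bool(new_mode & SUID_SGID)


# The SETATTR from the capture (mode 06666) would now trigger a recall,
# while an ordinary chmod from the delegation holder would not.
print(must_recall_write_delegation(0o6666, True))
print(must_recall_write_delegation(0o644, True))
```

With this rule the holder keeps the same-client optimization for ordinary
mode changes and only loses the delegation when it sets setuid/setgid.
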
--
Jeff Layton <jlayton@kernel.org>