linux-nfs.vger.kernel.org archive mirror
* xfstests results over NFS
@ 2023-08-17 11:21 Jeff Layton
  2023-08-17 14:04 ` Chuck Lever III
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Layton @ 2023-08-17 11:21 UTC (permalink / raw)
  To: Chuck Lever, Trond Myklebust, Anna Schumaker
  Cc: linux-nfs, NeilBrown, Kornievskaia, Olga, Dai Ngo, Tom Talpey

I finally got my kdevops (https://github.com/linux-kdevops/kdevops) test
rig working well enough to get some publishable results. To run fstests,
kdevops will spin up a server and (in this case) 2 clients to run
xfstests' auto group. One client mounts with default options, and the
other uses NFSv3.
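
In mount terms, the two client configurations boil down to something like
this (server and export names are invented for illustration; kdevops drives
the real configuration, and each mount runs on its own client VM):

```shell
# "nfs_default": no version option, so the client negotiates the highest
# version the server offers (NFSv4.2 with these kernels).
mount -t nfs server:/export/test /mnt/test

# "nfs_v3": explicitly pin NFSv3.
mount -t nfs -o vers=3 server:/export/test /mnt/test
```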

I tested 3 kernels:

v6.4.0 (stock release)
6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of yesterday morning)

Here are the results summary of all 3:

KERNEL:    6.4.0
CPUS:      8

nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
  Failures: generic/053 generic/099 generic/105 generic/124 
    generic/193 generic/258 generic/294 generic/318 generic/319 
    generic/444 generic/528 generic/529 
nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
  Failures: generic/053 generic/099 generic/105 generic/186 
    generic/187 generic/193 generic/294 generic/318 generic/319 
    generic/357 generic/444 generic/486 generic/513 generic/528 
    generic/529 generic/578 generic/675 generic/688 
Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s

KERNEL:    6.5.0-rc6-g4853c74bd7ab
CPUS:      8

nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
  Failures: generic/053 generic/099 generic/105 generic/258 
    generic/294 generic/318 generic/319 generic/444 generic/529 
nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
  Failures: generic/053 generic/099 generic/105 generic/186 
    generic/187 generic/294 generic/318 generic/319 generic/357 
    generic/444 generic/486 generic/513 generic/529 generic/578 
    generic/675 generic/688 
Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s

KERNEL:    6.5.0-rc6-next-20230816-gef66bf8aeb91
CPUS:      8

nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
  Failures: generic/053 generic/099 generic/105 generic/258 
    generic/294 generic/318 generic/319 generic/444 generic/529 
nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
  Failures: generic/053 generic/099 generic/105 generic/186 
    generic/187 generic/294 generic/318 generic/319 generic/357 
    generic/444 generic/486 generic/513 generic/529 generic/578 
    generic/675 generic/683 generic/684 generic/688 
Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s

With NFSv4.2, v6.4.0 has 2 extra failures that the current mainline
kernel doesn't:

    generic/193	(some sort of setattr problem)
    generic/528	(known problem with btime handling in client that has been fixed)

While I haven't investigated, I'm assuming the 193 bug is also something
that has been fixed in recent kernels. There are also 3 other NFSv3
tests that started passing since v6.4.0. I haven't looked into those.

With the linux-next kernel there are 2 new regressions:

    generic/683
    generic/684

Both of these look like problems with setuid/setgid stripping, and still
need to be investigated. I have more verbose result info on the test
failures if anyone is interested.
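
For anyone who wants a quick local feel for what setuid/setgid stripping
means here, this is a minimal sketch (not the actual generic/683/684 tests,
which do considerably more): on Linux, an unprivileged write clears the
setuid bit, and the setgid bit when group-execute is set, and over NFS the
client has to propagate the stripped mode to the server.

```shell
#!/bin/sh
# Minimal local reproduction sketch of setgid/setuid stripping on write.
set -e
d=$(mktemp -d)
f="$d/file"
touch "$f"
chmod 6755 "$f"                 # setuid + setgid + rwxr-xr-x
pre=$(stat -c '%a' "$f")
echo data >> "$f"               # an unprivileged write strips the setid bits
post=$(stat -c '%a' "$f")
echo "before=$pre after=$post"  # after=755 for unprivileged writers; a root
                                # writer (CAP_FSETID) keeps 6755
rm -rf "$d"
```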

Cheers,
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: xfstests results over NFS
  2023-08-17 11:21 xfstests results over NFS Jeff Layton
@ 2023-08-17 14:04 ` Chuck Lever III
  2023-08-17 14:22   ` Jeff Layton
  0 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever III @ 2023-08-17 14:04 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Trond Myklebust, Anna Schumaker, Linux NFS Mailing List,
	Neil Brown, Kornievskaia, Olga, Dai Ngo, Tom Talpey



> On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> I finally got my kdevops (https://github.com/linux-kdevops/kdevops) test
> rig working well enough to get some publishable results. To run fstests,
> kdevops will spin up a server and (in this case) 2 clients to run
> xfstests' auto group. One client mounts with default options, and the
> other uses NFSv3.
> 
> I tested 3 kernels:
> 
> v6.4.0 (stock release)
> 6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
> 6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of yesterday morning)
> 
> Here are the results summary of all 3:
> 
> KERNEL:    6.4.0
> CPUS:      8
> 
> nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
>  Failures: generic/053 generic/099 generic/105 generic/124 
>    generic/193 generic/258 generic/294 generic/318 generic/319 
>    generic/444 generic/528 generic/529 
> nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
>  Failures: generic/053 generic/099 generic/105 generic/186 
>    generic/187 generic/193 generic/294 generic/318 generic/319 
>    generic/357 generic/444 generic/486 generic/513 generic/528 
>    generic/529 generic/578 generic/675 generic/688 
> Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s
> 
> KERNEL:    6.5.0-rc6-g4853c74bd7ab
> CPUS:      8
> 
> nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
>  Failures: generic/053 generic/099 generic/105 generic/258 
>    generic/294 generic/318 generic/319 generic/444 generic/529 
> nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
>  Failures: generic/053 generic/099 generic/105 generic/186 
>    generic/187 generic/294 generic/318 generic/319 generic/357 
>    generic/444 generic/486 generic/513 generic/529 generic/578 
>    generic/675 generic/688 
> Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s
> 
> KERNEL:    6.5.0-rc6-next-20230816-gef66bf8aeb91
> CPUS:      8
> 
> nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
>  Failures: generic/053 generic/099 generic/105 generic/258 
>    generic/294 generic/318 generic/319 generic/444 generic/529 
> nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
>  Failures: generic/053 generic/099 generic/105 generic/186 
>    generic/187 generic/294 generic/318 generic/319 generic/357 
>    generic/444 generic/486 generic/513 generic/529 generic/578 
>    generic/675 generic/683 generic/684 generic/688 
> Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s
> 
> With NFSv4.2, v6.4.0 has 2 extra failures that the current mainline
> kernel doesn't:
> 
>    generic/193 (some sort of setattr problem)
>    generic/528 (known problem with btime handling in client that has been fixed)
> 
> While I haven't investigated, I'm assuming the 193 bug is also something
> that has been fixed in recent kernels. There are also 3 other NFSv3
> tests that started passing since v6.4.0. I haven't looked into those.
> 
> With the linux-next kernel there are 2 new regressions:
> 
>    generic/683
>    generic/684
> 
> Both of these look like problems with setuid/setgid stripping, and still
> need to be investigated. I have more verbose result info on the test
> failures if anyone is interested.

100% awesome sauce. Out of curiosity:

Does kdevops have a way of publishing (via an autonomous web site)
and archiving these results?

Does the "auto" group include tests that require a SCRATCH_DEV?


--
Chuck Lever




* Re: xfstests results over NFS
  2023-08-17 14:04 ` Chuck Lever III
@ 2023-08-17 14:22   ` Jeff Layton
  2023-08-17 15:17     ` Anna Schumaker
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Layton @ 2023-08-17 14:22 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Trond Myklebust, Anna Schumaker, Linux NFS Mailing List,
	Neil Brown, Kornievskaia, Olga, Dai Ngo, Tom Talpey

On Thu, 2023-08-17 at 14:04 +0000, Chuck Lever III wrote:
> 
> > On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > I finally got my kdevops (https://github.com/linux-kdevops/kdevops) test
> > rig working well enough to get some publishable results. To run fstests,
> > kdevops will spin up a server and (in this case) 2 clients to run
> > xfstests' auto group. One client mounts with default options, and the
> > other uses NFSv3.
> > 
> > I tested 3 kernels:
> > 
> > v6.4.0 (stock release)
> > 6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
> > 6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of yesterday morning)
> > 
> > Here are the results summary of all 3:
> > 
> > KERNEL:    6.4.0
> > CPUS:      8
> > 
> > nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
> >  Failures: generic/053 generic/099 generic/105 generic/124 
> >    generic/193 generic/258 generic/294 generic/318 generic/319 
> >    generic/444 generic/528 generic/529 
> > nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
> >  Failures: generic/053 generic/099 generic/105 generic/186 
> >    generic/187 generic/193 generic/294 generic/318 generic/319 
> >    generic/357 generic/444 generic/486 generic/513 generic/528 
> >    generic/529 generic/578 generic/675 generic/688 
> > Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s
> > 
> > KERNEL:    6.5.0-rc6-g4853c74bd7ab
> > CPUS:      8
> > 
> > nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
> >  Failures: generic/053 generic/099 generic/105 generic/258 
> >    generic/294 generic/318 generic/319 generic/444 generic/529 
> > nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
> >  Failures: generic/053 generic/099 generic/105 generic/186 
> >    generic/187 generic/294 generic/318 generic/319 generic/357 
> >    generic/444 generic/486 generic/513 generic/529 generic/578 
> >    generic/675 generic/688 
> > Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s
> > 
> > KERNEL:    6.5.0-rc6-next-20230816-gef66bf8aeb91
> > CPUS:      8
> > 
> > nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
> >  Failures: generic/053 generic/099 generic/105 generic/258 
> >    generic/294 generic/318 generic/319 generic/444 generic/529 
> > nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
> >  Failures: generic/053 generic/099 generic/105 generic/186 
> >    generic/187 generic/294 generic/318 generic/319 generic/357 
> >    generic/444 generic/486 generic/513 generic/529 generic/578 
> >    generic/675 generic/683 generic/684 generic/688 
> > Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s
> > 
> > With NFSv4.2, v6.4.0 has 2 extra failures that the current mainline
> > kernel doesn't:
> > 
> >    generic/193 (some sort of setattr problem)
> >    generic/528 (known problem with btime handling in client that has been fixed)
> > 
> > While I haven't investigated, I'm assuming the 193 bug is also something
> > that has been fixed in recent kernels. There are also 3 other NFSv3
> > tests that started passing since v6.4.0. I haven't looked into those.
> > 
> > With the linux-next kernel there are 2 new regressions:
> > 
> >    generic/683
> >    generic/684
> > 
> > Both of these look like problems with setuid/setgid stripping, and still
> > need to be investigated. I have more verbose result info on the test
> > failures if anyone is interested.
> 
> 100% awesome sauce. Out of curiosity:
> 
> Does kdevops have a way of publishing (via an autonomous web site)
> and archiving these results?
> 

There's nothing much prewritten for this. There is some support for
sending emails when you run a "ci" loop. I need to do more investigation
here.

Note that there has been some parallel effort toward CI in the SMB space
using buildbot. It may be worthwhile to consider combining efforts somehow.

> Does the "auto" group include tests that require a SCRATCH_DEV?
> 

Yes. The NFS server is configured with 2 exported filesystems, so I have it
mounting a directory under one as "test" and the other as "scratch".
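
For anyone setting up something similar, that arrangement maps onto an
fstests local.config roughly like this (hostnames and export paths are
invented for the example):

```shell
# Hypothetical local.config: a directory under each export serves as the
# test and scratch "device" respectively.
export FSTYP=nfs
export TEST_DEV=server:/export1/testdir
export TEST_DIR=/mnt/test
export SCRATCH_DEV=server:/export2/scratchdir
export SCRATCH_MNT=/mnt/scratch
```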
-- 
Jeff Layton <jlayton@kernel.org>


* Re: xfstests results over NFS
  2023-08-17 14:22   ` Jeff Layton
@ 2023-08-17 15:17     ` Anna Schumaker
  2023-08-17 16:27       ` Jeff Layton
  0 siblings, 1 reply; 16+ messages in thread
From: Anna Schumaker @ 2023-08-17 15:17 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Chuck Lever III, Trond Myklebust, Linux NFS Mailing List,
	Neil Brown, Kornievskaia, Olga, Dai Ngo, Tom Talpey

On Thu, Aug 17, 2023 at 10:22 AM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Thu, 2023-08-17 at 14:04 +0000, Chuck Lever III wrote:
> >
> > > On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > >
> > > I finally got my kdevops (https://github.com/linux-kdevops/kdevops) test
> > > rig working well enough to get some publishable results. To run fstests,
> > > kdevops will spin up a server and (in this case) 2 clients to run
> > > xfstests' auto group. One client mounts with default options, and the
> > > other uses NFSv3.
> > >
> > > I tested 3 kernels:
> > >
> > > v6.4.0 (stock release)
> > > 6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
> > > 6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of yesterday morning)
> > >
> > > Here are the results summary of all 3:
> > >
> > > KERNEL:    6.4.0
> > > CPUS:      8
> > >
> > > nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
> > >  Failures: generic/053 generic/099 generic/105 generic/124
> > >    generic/193 generic/258 generic/294 generic/318 generic/319
> > >    generic/444 generic/528 generic/529
> > > nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
> > >  Failures: generic/053 generic/099 generic/105 generic/186
> > >    generic/187 generic/193 generic/294 generic/318 generic/319
> > >    generic/357 generic/444 generic/486 generic/513 generic/528
> > >    generic/529 generic/578 generic/675 generic/688
> > > Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s
> > >
> > > KERNEL:    6.5.0-rc6-g4853c74bd7ab
> > > CPUS:      8
> > >
> > > nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
> > >  Failures: generic/053 generic/099 generic/105 generic/258
> > >    generic/294 generic/318 generic/319 generic/444 generic/529
> > > nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
> > >  Failures: generic/053 generic/099 generic/105 generic/186
> > >    generic/187 generic/294 generic/318 generic/319 generic/357
> > >    generic/444 generic/486 generic/513 generic/529 generic/578
> > >    generic/675 generic/688
> > > Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s
> > >
> > > KERNEL:    6.5.0-rc6-next-20230816-gef66bf8aeb91
> > > CPUS:      8
> > >
> > > nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
> > >  Failures: generic/053 generic/099 generic/105 generic/258
> > >    generic/294 generic/318 generic/319 generic/444 generic/529
> > > nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
> > >  Failures: generic/053 generic/099 generic/105 generic/186
> > >    generic/187 generic/294 generic/318 generic/319 generic/357
> > >    generic/444 generic/486 generic/513 generic/529 generic/578
> > >    generic/675 generic/683 generic/684 generic/688
> > > Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s

As long as we're sharing results ... here is what I'm seeing with a
6.5-rc6 client & server:

anna@gouda ~ % xfstestsdb xunit list --results --runid 1741 --color=none
+------+----------------------+---------+----------+------+------+------+-------+
|  run | device               | xunit   | hostname | pass | fail | skip |  time |
+------+----------------------+---------+----------+------+------+------+-------+
| 1741 | server:/srv/xfs/test | tcp-3   | client   |  125 |    4 |  464 | 447 s |
| 1741 | server:/srv/xfs/test | tcp-4.0 | client   |  117 |   11 |  465 | 478 s |
| 1741 | server:/srv/xfs/test | tcp-4.1 | client   |  119 |   12 |  462 | 404 s |
| 1741 | server:/srv/xfs/test | tcp-4.2 | client   |  212 |   18 |  363 | 564 s |
+------+----------------------+---------+----------+------+------+------+-------+

anna@gouda ~ % xfstestsdb show --failure 1741 --color=none
+-------------+---------+---------+---------+---------+
|    testcase | tcp-3   | tcp-4.0 | tcp-4.1 | tcp-4.2 |
+-------------+---------+---------+---------+---------+
| generic/053 | passed  | failure | failure | failure |
| generic/099 | passed  | failure | failure | failure |
| generic/105 | passed  | failure | failure | failure |
| generic/140 | skipped | skipped | skipped | failure |
| generic/188 | skipped | skipped | skipped | failure |
| generic/258 | failure | passed  | passed  | failure |
| generic/294 | failure | failure | failure | failure |
| generic/318 | passed  | failure | failure | failure |
| generic/319 | passed  | failure | failure | failure |
| generic/357 | skipped | skipped | skipped | failure |
| generic/444 | failure | failure | failure | failure |
| generic/465 | passed  | failure | failure | failure |
| generic/513 | skipped | skipped | skipped | failure |
| generic/529 | passed  | failure | failure | failure |
| generic/604 | passed  | passed  | failure | passed  |
| generic/675 | skipped | skipped | skipped | failure |
| generic/688 | skipped | skipped | skipped | failure |
| generic/697 | passed  | failure | failure | failure |
|     nfs/002 | failure | failure | failure | failure |
+-------------+---------+---------+---------+---------+


> > >
> > > With NFSv4.2, v6.4.0 has 2 extra failures that the current mainline
> > > kernel doesn't:
> > >
> > >    generic/193 (some sort of setattr problem)
> > >    generic/528 (known problem with btime handling in client that has been fixed)
> > >
> > > While I haven't investigated, I'm assuming the 193 bug is also something
> > > that has been fixed in recent kernels. There are also 3 other NFSv3
> > > tests that started passing since v6.4.0. I haven't looked into those.
> > >
> > > With the linux-next kernel there are 2 new regressions:
> > >
> > >    generic/683
> > >    generic/684
> > >
> > > Both of these look like problems with setuid/setgid stripping, and still
> > > need to be investigated. I have more verbose result info on the test
> > > failures if anyone is interested.

Interesting that I'm not seeing the 683 & 684 failures. What type of
filesystem is your server exporting?

> >
> > 100% awesome sauce. Out of curiosity:
> >
> > Does kdevops have a way of publishing (via an autonomous web site)
> > and archiving these results?
> >
>
> There's nothing much prewritten for this. There is some support for
> sending emails when you run a "ci" loop. I need to do more investigation
> here.

xfstests has an option to generate an xunit file, which can help here.
I use it with my own archiving tool to stick everything into a SQLite
database (https://git.nowheycreamery.com/anna/xfstestsdb).
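
For context, fstests can emit that report directly via check's -R option;
a typical invocation looks something like this (the exact results path
varies by configuration):

```shell
# Run the auto group and write a JUnit/xunit-style report alongside the
# usual result files.
./check -R xunit -g auto
# result.xml then lands under the results directory, e.g.
#   results/<hostname>/<section>/result.xml
# which xfstestsdb (or any JUnit-aware tool) can ingest.
```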

>
> Note that there has been some parallel effort toward CI in the SMB space
> using buildbot. It may be worthwhile to consider combining efforts somehow.

It might be nice to at least see what they're doing. If they have
something that works well, then setting up something similar might be
a good idea.

Anna

>
> > Does the "auto" group include tests that require a SCRATCH_DEV?
> >
>
> Yes. The NFS server is configured with 2 exported filesystems, so I have it
> mounting a directory under one as "test" and the other as "scratch".
> --
> Jeff Layton <jlayton@kernel.org>


* Re: xfstests results over NFS
  2023-08-17 15:17     ` Anna Schumaker
@ 2023-08-17 16:27       ` Jeff Layton
  2023-08-17 16:31         ` Chuck Lever III
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Layton @ 2023-08-17 16:27 UTC (permalink / raw)
  To: Anna Schumaker
  Cc: Chuck Lever III, Trond Myklebust, Linux NFS Mailing List,
	Neil Brown, Kornievskaia, Olga, Dai Ngo, Tom Talpey

On Thu, 2023-08-17 at 11:17 -0400, Anna Schumaker wrote:
> On Thu, Aug 17, 2023 at 10:22 AM Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Thu, 2023-08-17 at 14:04 +0000, Chuck Lever III wrote:
> > > 
> > > > On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > 
> > > > I finally got my kdevops (https://github.com/linux-kdevops/kdevops) test
> > > > rig working well enough to get some publishable results. To run fstests,
> > > > kdevops will spin up a server and (in this case) 2 clients to run
> > > > xfstests' auto group. One client mounts with default options, and the
> > > > other uses NFSv3.
> > > > 
> > > > I tested 3 kernels:
> > > > 
> > > > v6.4.0 (stock release)
> > > > 6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
> > > > 6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of yesterday morning)
> > > > 
> > > > Here are the results summary of all 3:
> > > > 
> > > > KERNEL:    6.4.0
> > > > CPUS:      8
> > > > 
> > > > nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
> > > >  Failures: generic/053 generic/099 generic/105 generic/124
> > > >    generic/193 generic/258 generic/294 generic/318 generic/319
> > > >    generic/444 generic/528 generic/529
> > > > nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
> > > >  Failures: generic/053 generic/099 generic/105 generic/186
> > > >    generic/187 generic/193 generic/294 generic/318 generic/319
> > > >    generic/357 generic/444 generic/486 generic/513 generic/528
> > > >    generic/529 generic/578 generic/675 generic/688
> > > > Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s
> > > > 
> > > > KERNEL:    6.5.0-rc6-g4853c74bd7ab
> > > > CPUS:      8
> > > > 
> > > > nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
> > > >  Failures: generic/053 generic/099 generic/105 generic/258
> > > >    generic/294 generic/318 generic/319 generic/444 generic/529
> > > > nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
> > > >  Failures: generic/053 generic/099 generic/105 generic/186
> > > >    generic/187 generic/294 generic/318 generic/319 generic/357
> > > >    generic/444 generic/486 generic/513 generic/529 generic/578
> > > >    generic/675 generic/688
> > > > Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s
> > > > 
> > > > KERNEL:    6.5.0-rc6-next-20230816-gef66bf8aeb91
> > > > CPUS:      8
> > > > 
> > > > nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
> > > >  Failures: generic/053 generic/099 generic/105 generic/258
> > > >    generic/294 generic/318 generic/319 generic/444 generic/529
> > > > nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
> > > >  Failures: generic/053 generic/099 generic/105 generic/186
> > > >    generic/187 generic/294 generic/318 generic/319 generic/357
> > > >    generic/444 generic/486 generic/513 generic/529 generic/578
> > > >    generic/675 generic/683 generic/684 generic/688
> > > > Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s
> 
> As long as we're sharing results ... here is what I'm seeing with a
> 6.5-rc6 client & server:
> 
> anna@gouda ~ % xfstestsdb xunit list --results --runid 1741 --color=none
> +------+----------------------+---------+----------+------+------+------+-------+
> |  run | device               | xunit   | hostname | pass | fail | skip |  time |
> +------+----------------------+---------+----------+------+------+------+-------+
> | 1741 | server:/srv/xfs/test | tcp-3   | client   |  125 |    4 |  464 | 447 s |
> | 1741 | server:/srv/xfs/test | tcp-4.0 | client   |  117 |   11 |  465 | 478 s |
> | 1741 | server:/srv/xfs/test | tcp-4.1 | client   |  119 |   12 |  462 | 404 s |
> | 1741 | server:/srv/xfs/test | tcp-4.2 | client   |  212 |   18 |  363 | 564 s |
> +------+----------------------+---------+----------+------+------+------+-------+
> 
> anna@gouda ~ % xfstestsdb show --failure 1741 --color=none
> +-------------+---------+---------+---------+---------+
> |    testcase | tcp-3   | tcp-4.0 | tcp-4.1 | tcp-4.2 |
> +-------------+---------+---------+---------+---------+
> | generic/053 | passed  | failure | failure | failure |
> | generic/099 | passed  | failure | failure | failure |
> | generic/105 | passed  | failure | failure | failure |
> | generic/140 | skipped | skipped | skipped | failure |
> | generic/188 | skipped | skipped | skipped | failure |
> | generic/258 | failure | passed  | passed  | failure |
> | generic/294 | failure | failure | failure | failure |
> | generic/318 | passed  | failure | failure | failure |
> | generic/319 | passed  | failure | failure | failure |
> | generic/357 | skipped | skipped | skipped | failure |
> | generic/444 | failure | failure | failure | failure |
> | generic/465 | passed  | failure | failure | failure |
> | generic/513 | skipped | skipped | skipped | failure |
> | generic/529 | passed  | failure | failure | failure |
> | generic/604 | passed  | passed  | failure | passed  |
> | generic/675 | skipped | skipped | skipped | failure |
> | generic/688 | skipped | skipped | skipped | failure |
> | generic/697 | passed  | failure | failure | failure |
> |     nfs/002 | failure | failure | failure | failure |
> +-------------+---------+---------+---------+---------+
> 
> 
> > > > 
> > > > With NFSv4.2, v6.4.0 has 2 extra failures that the current mainline
> > > > kernel doesn't:
> > > > 
> > > >    generic/193 (some sort of setattr problem)
> > > >    generic/528 (known problem with btime handling in client that has been fixed)
> > > > 
> > > > While I haven't investigated, I'm assuming the 193 bug is also something
> > > > that has been fixed in recent kernels. There are also 3 other NFSv3
> > > > tests that started passing since v6.4.0. I haven't looked into those.
> > > > 
> > > > With the linux-next kernel there are 2 new regressions:
> > > > 
> > > >    generic/683
> > > >    generic/684
> > > > 
> > > > Both of these look like problems with setuid/setgid stripping, and still
> > > > need to be investigated. I have more verbose result info on the test
> > > > failures if anyone is interested.
> 
> Interesting that I'm not seeing the 683 & 684 failures. What type of
> filesystem is your server exporting?
> 

btrfs

You are testing linux-next? I need to go back and confirm these results
too.

> > > 
> > > 100% awesome sauce. Out of curiosity:
> > > 
> > > Does kdevops have a way of publishing (via an autonomous web site)
> > > and archiving these results?
> > > 
> > 
> > There's nothing much prewritten for this. There is some support for
> > sending emails when you run a "ci" loop. I need to do more investigation
> > here.
> 
> xfstests has an option to generate an xunit file, which can help here.
> I use it with my own archiving tool to stick everything into a SQLite
> database (https://git.nowheycreamery.com/anna/xfstestsdb).
> 

Yeah, kdevops uses the xunit file to generate its results, AFAIU. TBH, a
lot of the automation surrounding how to collate and evaluate test
results is still something I need to look at more closely. It's not well
documented and is still under pretty heavy development.

> > 
> > Note that there has been some parallel effort toward CI in the SMB space
> > using buildbot. It may be worthwhile to consider combining efforts somehow.
> 
> It might be nice to at least see what they're doing. If they have
> something that works well, then setting up something similar might be
> a good idea.
> 

My gut feeling is that kdevops is geared more toward "maintainer wants to
see a set of results vs. particular kernels", whereas buildbot is geared
more toward automation and CI-type workloads. There are some CI-ish
automation bits in kdevops, but they don't seem to be as straightforward
as what buildbot has.
-- 
Jeff Layton <jlayton@kernel.org>


* Re: xfstests results over NFS
  2023-08-17 16:27       ` Jeff Layton
@ 2023-08-17 16:31         ` Chuck Lever III
  2023-08-17 17:15           ` Jeff Layton
  0 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever III @ 2023-08-17 16:31 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Anna Schumaker, Trond Myklebust, Linux NFS Mailing List,
	Neil Brown, Kornievskaia, Olga, Dai Ngo, Tom Talpey



> On Aug 17, 2023, at 12:27 PM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> On Thu, 2023-08-17 at 11:17 -0400, Anna Schumaker wrote:
>> On Thu, Aug 17, 2023 at 10:22 AM Jeff Layton <jlayton@kernel.org> wrote:
>>> 
>>> On Thu, 2023-08-17 at 14:04 +0000, Chuck Lever III wrote:
>>>> 
>>>>> On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>> 
>>>>> I finally got my kdevops (https://github.com/linux-kdevops/kdevops) test
>>>>> rig working well enough to get some publishable results. To run fstests,
>>>>> kdevops will spin up a server and (in this case) 2 clients to run
>>>>> xfstests' auto group. One client mounts with default options, and the
>>>>> other uses NFSv3.
>>>>> 
>>>>> I tested 3 kernels:
>>>>> 
>>>>> v6.4.0 (stock release)
>>>>> 6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
>>>>> 6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of yesterday morning)
>>>>> 
>>>>> Here are the results summary of all 3:
>>>>> 
>>>>> KERNEL:    6.4.0
>>>>> CPUS:      8
>>>>> 
>>>>> nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
>>>>> Failures: generic/053 generic/099 generic/105 generic/124
>>>>>   generic/193 generic/258 generic/294 generic/318 generic/319
>>>>>   generic/444 generic/528 generic/529
>>>>> nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>   generic/187 generic/193 generic/294 generic/318 generic/319
>>>>>   generic/357 generic/444 generic/486 generic/513 generic/528
>>>>>   generic/529 generic/578 generic/675 generic/688
>>>>> Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s
>>>>> 
>>>>> KERNEL:    6.5.0-rc6-g4853c74bd7ab
>>>>> CPUS:      8
>>>>> 
>>>>> nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
>>>>> Failures: generic/053 generic/099 generic/105 generic/258
>>>>>   generic/294 generic/318 generic/319 generic/444 generic/529
>>>>> nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>   generic/187 generic/294 generic/318 generic/319 generic/357
>>>>>   generic/444 generic/486 generic/513 generic/529 generic/578
>>>>>   generic/675 generic/688
>>>>> Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s
>>>>> 
>>>>> KERNEL:    6.5.0-rc6-next-20230816-gef66bf8aeb91
>>>>> CPUS:      8
>>>>> 
>>>>> nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
>>>>> Failures: generic/053 generic/099 generic/105 generic/258
>>>>>   generic/294 generic/318 generic/319 generic/444 generic/529
>>>>> nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>   generic/187 generic/294 generic/318 generic/319 generic/357
>>>>>   generic/444 generic/486 generic/513 generic/529 generic/578
>>>>>   generic/675 generic/683 generic/684 generic/688
>>>>> Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s
>> 
>> As long as we're sharing results ... here is what I'm seeing with a
>> 6.5-rc6 client & server:
>> 
>> anna@gouda ~ % xfstestsdb xunit list --results --runid 1741 --color=none
>> +------+----------------------+---------+----------+------+------+------+-------+
>> |  run | device               | xunit   | hostname | pass | fail | skip |  time |
>> +------+----------------------+---------+----------+------+------+------+-------+
>> | 1741 | server:/srv/xfs/test | tcp-3   | client   |  125 |    4 |  464 | 447 s |
>> | 1741 | server:/srv/xfs/test | tcp-4.0 | client   |  117 |   11 |  465 | 478 s |
>> | 1741 | server:/srv/xfs/test | tcp-4.1 | client   |  119 |   12 |  462 | 404 s |
>> | 1741 | server:/srv/xfs/test | tcp-4.2 | client   |  212 |   18 |  363 | 564 s |
>> +------+----------------------+---------+----------+------+------+------+-------+
>> 
>> anna@gouda ~ % xfstestsdb show --failure 1741 --color=none
>> +-------------+---------+---------+---------+---------+
>>>   testcase | tcp-3   | tcp-4.0 | tcp-4.1 | tcp-4.2 |
>> +-------------+---------+---------+---------+---------+
>>> generic/053 | passed  | failure | failure | failure |
>>> generic/099 | passed  | failure | failure | failure |
>>> generic/105 | passed  | failure | failure | failure |
>>> generic/140 | skipped | skipped | skipped | failure |
>>> generic/188 | skipped | skipped | skipped | failure |
>>> generic/258 | failure | passed  | passed  | failure |
>>> generic/294 | failure | failure | failure | failure |
>>> generic/318 | passed  | failure | failure | failure |
>>> generic/319 | passed  | failure | failure | failure |
>>> generic/357 | skipped | skipped | skipped | failure |
>>> generic/444 | failure | failure | failure | failure |
>>> generic/465 | passed  | failure | failure | failure |
>>> generic/513 | skipped | skipped | skipped | failure |
>>> generic/529 | passed  | failure | failure | failure |
>>> generic/604 | passed  | passed  | failure | passed  |
>>> generic/675 | skipped | skipped | skipped | failure |
>>> generic/688 | skipped | skipped | skipped | failure |
>>> generic/697 | passed  | failure | failure | failure |
>>>    nfs/002 | failure | failure | failure | failure |
>> +-------------+---------+---------+---------+---------+
>> 
>> 
>>>>> 
>>>>> With NFSv4.2, v6.4.0 has 2 extra failures that the current mainline
>>>>> kernel doesn't:
>>>>> 
>>>>>   generic/193 (some sort of setattr problem)
>>>>>   generic/528 (known problem with btime handling in client that has been fixed)
>>>>> 
>>>>> While I haven't investigated, I'm assuming the 193 bug is also something
>>>>> that has been fixed in recent kernels. There are also 3 other NFSv3
>>>>> tests that started passing since v6.4.0. I haven't looked into those.
>>>>> 
>>>>> With the linux-next kernel there are 2 new regressions:
>>>>> 
>>>>>   generic/683
>>>>>   generic/684
>>>>> 
>>>>> Both of these look like problems with setuid/setgid stripping, and still
>>>>> need to be investigated. I have more verbose result info on the test
>>>>> failures if anyone is interested.
>> 
>> Interesting that I'm not seeing the 683 & 684 failures. What type of
>> filesystem is your server exporting?
>> 
> 
> btrfs
> 
> You are testing linux-next? I need to go back and confirm these results
> too.

IMO linux-next is quite important: we keep hitting bugs that
appear only after integration -- block and network changes in
other trees especially can impact the NFS drivers.


--
Chuck Lever



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: xfstests results over NFS
  2023-08-17 16:31         ` Chuck Lever III
@ 2023-08-17 17:15           ` Jeff Layton
  2023-08-17 21:07             ` Jeff Layton
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Layton @ 2023-08-17 17:15 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Anna Schumaker, Trond Myklebust, Linux NFS Mailing List,
	Neil Brown, Kornievskaia, Olga, Dai Ngo, Tom Talpey

On Thu, 2023-08-17 at 16:31 +0000, Chuck Lever III wrote:
> 
> > On Aug 17, 2023, at 12:27 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Thu, 2023-08-17 at 11:17 -0400, Anna Schumaker wrote:
> > > On Thu, Aug 17, 2023 at 10:22 AM Jeff Layton <jlayton@kernel.org> wrote:
> > > > 
> > > > On Thu, 2023-08-17 at 14:04 +0000, Chuck Lever III wrote:
> > > > > 
> > > > > > On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > 
> > > > > > I finally got my kdevops (https://github.com/linux-kdevops/kdevops) test
> > > > > > rig working well enough to get some publishable results. To run fstests,
> > > > > > kdevops will spin up a server and (in this case) 2 clients to run
> > > > > > xfstests' auto group. One client mounts with default options, and the
> > > > > > other uses NFSv3.
> > > > > > 
> > > > > > I tested 3 kernels:
> > > > > > 
> > > > > > v6.4.0 (stock release)
> > > > > > 6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
> > > > > > 6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of yesterday morning)
> > > > > > 
> > > > > > Here are the results summary of all 3:
> > > > > > 
> > > > > > KERNEL:    6.4.0
> > > > > > CPUS:      8
> > > > > > 
> > > > > > nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
> > > > > > Failures: generic/053 generic/099 generic/105 generic/124
> > > > > >   generic/193 generic/258 generic/294 generic/318 generic/319
> > > > > >   generic/444 generic/528 generic/529
> > > > > > nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
> > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > >   generic/187 generic/193 generic/294 generic/318 generic/319
> > > > > >   generic/357 generic/444 generic/486 generic/513 generic/528
> > > > > >   generic/529 generic/578 generic/675 generic/688
> > > > > > Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s
> > > > > > 
> > > > > > KERNEL:    6.5.0-rc6-g4853c74bd7ab
> > > > > > CPUS:      8
> > > > > > 
> > > > > > nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
> > > > > > Failures: generic/053 generic/099 generic/105 generic/258
> > > > > >   generic/294 generic/318 generic/319 generic/444 generic/529
> > > > > > nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
> > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > >   generic/187 generic/294 generic/318 generic/319 generic/357
> > > > > >   generic/444 generic/486 generic/513 generic/529 generic/578
> > > > > >   generic/675 generic/688
> > > > > > Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s
> > > > > > 
> > > > > > KERNEL:    6.5.0-rc6-next-20230816-gef66bf8aeb91
> > > > > > CPUS:      8
> > > > > > 
> > > > > > nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
> > > > > > Failures: generic/053 generic/099 generic/105 generic/258
> > > > > >   generic/294 generic/318 generic/319 generic/444 generic/529
> > > > > > nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
> > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > >   generic/187 generic/294 generic/318 generic/319 generic/357
> > > > > >   generic/444 generic/486 generic/513 generic/529 generic/578
> > > > > >   generic/675 generic/683 generic/684 generic/688
> > > > > > Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s
> > > 
> > > As long as we're sharing results ... here is what I'm seeing with a
> > > 6.5-rc6 client & server:
> > > 
> > > anna@gouda ~ % xfstestsdb xunit list --results --runid 1741 --color=none
> > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > run | device               | xunit   | hostname | pass | fail |
> > > skip |  time |
> > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > 1741 | server:/srv/xfs/test | tcp-3   | client   |  125 |    4 |
> > > 464 | 447 s |
> > > > 1741 | server:/srv/xfs/test | tcp-4.0 | client   |  117 |   11 |
> > > 465 | 478 s |
> > > > 1741 | server:/srv/xfs/test | tcp-4.1 | client   |  119 |   12 |
> > > 462 | 404 s |
> > > > 1741 | server:/srv/xfs/test | tcp-4.2 | client   |  212 |   18 |
> > > 363 | 564 s |
> > > +------+----------------------+---------+----------+------+------+------+-------+
> > > 
> > > anna@gouda ~ % xfstestsdb show --failure 1741 --color=none
> > > +-------------+---------+---------+---------+---------+
> > > >   testcase | tcp-3   | tcp-4.0 | tcp-4.1 | tcp-4.2 |
> > > +-------------+---------+---------+---------+---------+
> > > > generic/053 | passed  | failure | failure | failure |
> > > > generic/099 | passed  | failure | failure | failure |
> > > > generic/105 | passed  | failure | failure | failure |
> > > > generic/140 | skipped | skipped | skipped | failure |
> > > > generic/188 | skipped | skipped | skipped | failure |
> > > > generic/258 | failure | passed  | passed  | failure |
> > > > generic/294 | failure | failure | failure | failure |
> > > > generic/318 | passed  | failure | failure | failure |
> > > > generic/319 | passed  | failure | failure | failure |
> > > > generic/357 | skipped | skipped | skipped | failure |
> > > > generic/444 | failure | failure | failure | failure |
> > > > generic/465 | passed  | failure | failure | failure |
> > > > generic/513 | skipped | skipped | skipped | failure |
> > > > generic/529 | passed  | failure | failure | failure |
> > > > generic/604 | passed  | passed  | failure | passed  |
> > > > generic/675 | skipped | skipped | skipped | failure |
> > > > generic/688 | skipped | skipped | skipped | failure |
> > > > generic/697 | passed  | failure | failure | failure |
> > > >    nfs/002 | failure | failure | failure | failure |
> > > +-------------+---------+---------+---------+---------+
> > > 
> > > 
> > > > > > 
> > > > > > With NFSv4.2, v6.4.0 has 2 extra failures that the current mainline
> > > > > > kernel doesn't:
> > > > > > 
> > > > > >   generic/193 (some sort of setattr problem)
> > > > > >   generic/528 (known problem with btime handling in client that has been fixed)
> > > > > > 
> > > > > > While I haven't investigated, I'm assuming the 193 bug is also something
> > > > > > that has been fixed in recent kernels. There are also 3 other NFSv3
> > > > > > tests that started passing since v6.4.0. I haven't looked into those.
> > > > > > 
> > > > > > With the linux-next kernel there are 2 new regressions:
> > > > > > 
> > > > > >   generic/683
> > > > > >   generic/684
> > > > > > 
> > > > > > Both of these look like problems with setuid/setgid stripping, and still
> > > > > > need to be investigated. I have more verbose result info on the test
> > > > > > failures if anyone is interested.
> > > 
> > > Interesting that I'm not seeing the 683 & 684 failures. What type of
> > > filesystem is your server exporting?
> > > 
> > 
> > btrfs
> > 
> > You are testing linux-next? I need to go back and confirm these results
> > too.
> 
> IMO linux-next is quite important: we keep hitting bugs that
> appear only after integration -- block and network changes in
> other trees especially can impact the NFS drivers.
> 

Indeed, I suspect this is probably something from the vfs tree (though
we definitely need to confirm that). Today I'm testing:

    6.5.0-rc6-next-20230817-g47762f086974

[vagrant@kdevops-nfs-default xfstests]$ sudo ./check generic/683
generic/684
SECTION       -- default
FSTYP         -- nfs
PLATFORM      -- Linux/x86_64 kdevops-nfs-default 6.5.0-rc6-next-20230817-g47762f086974 #37 SMP PREEMPT_DYNAMIC Thu Aug 17 10:17:27 EDT 2023
MKFS_OPTIONS  -- kdevops-nfsd:/export/1/fstests/kdevops-nfs-default
MOUNT_OPTIONS -- kdevops-nfsd:/export/1/fstests/kdevops-nfs-default /media/scratch

generic/683       - output mismatch (see /data/fstests-install/xfstests/results/kdevops-nfs-default/6.5.0-rc6-next-20230817-g47762f086974/default/generic/683.out.bad)
    --- tests/generic/683.out   2023-08-17 15:50:52.428385413 +0000
    +++ /data/fstests-install/xfstests/results/kdevops-nfs-default/6.5.0-rc6-next-20230817-g47762f086974/default/generic/683.out.bad    2023-08-17 17:10:05.017250750 +0000
    @@ -1,19 +1,19 @@
     QA output created by 683
     Test 1 - qa_user, non-exec file falloc
     6666 -rwSrwSrw- TEST_DIR/683/a
    -666 -rw-rw-rw- TEST_DIR/683/a
    +6666 -rwSrwSrw- TEST_DIR/683/a
     
     Test 2 - qa_user, group-exec file falloc
    ...
    (Run 'diff -u /data/fstests-install/xfstests/tests/generic/683.out /data/fstests-install/xfstests/results/kdevops-nfs-default/6.5.0-rc6-next-20230817-g47762f086974/default/generic/683.out.bad'  to see the entire diff)
generic/684       - output mismatch (see /data/fstests-install/xfstests/results/kdevops-nfs-default/6.5.0-rc6-next-20230817-g47762f086974/default/generic/684.out.bad)
    --- tests/generic/684.out   2023-08-17 15:50:52.456385413 +0000
    +++ /data/fstests-install/xfstests/results/kdevops-nfs-default/6.5.0-rc6-next-20230817-g47762f086974/default/generic/684.out.bad    2023-08-17 17:10:11.409250750 +0000
    @@ -1,19 +1,19 @@
     QA output created by 684
     Test 1 - qa_user, non-exec file fpunch
     6666 -rwSrwSrw- TEST_DIR/684/a
    -666 -rw-rw-rw- TEST_DIR/684/a
    +6666 -rwSrwSrw- TEST_DIR/684/a
     
     Test 2 - qa_user, group-exec file fpunch
    ...
    (Run 'diff -u /data/fstests-install/xfstests/tests/generic/684.out /data/fstests-install/xfstests/results/kdevops-nfs-default/6.5.0-rc6-next-20230817-g47762f086974/default/generic/684.out.bad'  to see the entire diff)
Ran: generic/683 generic/684
Failures: generic/683 generic/684
Failed 2 of 2 tests

SECTION       -- default
=========================
Ran: generic/683 generic/684
Failures: generic/683 generic/684
Failed 2 of 2 tests



* Re: xfstests results over NFS
  2023-08-17 17:15           ` Jeff Layton
@ 2023-08-17 21:07             ` Jeff Layton
  2023-08-17 22:23               ` dai.ngo
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Layton @ 2023-08-17 21:07 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Anna Schumaker, Trond Myklebust, Linux NFS Mailing List,
	Neil Brown, Kornievskaia, Olga, Dai Ngo, Tom Talpey

On Thu, 2023-08-17 at 13:15 -0400, Jeff Layton wrote:
> On Thu, 2023-08-17 at 16:31 +0000, Chuck Lever III wrote:
> > 
> > > On Aug 17, 2023, at 12:27 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > > 
> > > On Thu, 2023-08-17 at 11:17 -0400, Anna Schumaker wrote:
> > > > On Thu, Aug 17, 2023 at 10:22 AM Jeff Layton <jlayton@kernel.org> wrote:
> > > > > 
> > > > > On Thu, 2023-08-17 at 14:04 +0000, Chuck Lever III wrote:
> > > > > > 
> > > > > > > On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > > 
> > > > > > > I finally got my kdevops (https://github.com/linux-kdevops/kdevops) test
> > > > > > > rig working well enough to get some publishable results. To run fstests,
> > > > > > > kdevops will spin up a server and (in this case) 2 clients to run
> > > > > > > xfstests' auto group. One client mounts with default options, and the
> > > > > > > other uses NFSv3.
> > > > > > > 
> > > > > > > I tested 3 kernels:
> > > > > > > 
> > > > > > > v6.4.0 (stock release)
> > > > > > > 6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
> > > > > > > 6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of yesterday morning)
> > > > > > > 
> > > > > > > Here are the results summary of all 3:
> > > > > > > 
> > > > > > > KERNEL:    6.4.0
> > > > > > > CPUS:      8
> > > > > > > 
> > > > > > > nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
> > > > > > > Failures: generic/053 generic/099 generic/105 generic/124
> > > > > > >   generic/193 generic/258 generic/294 generic/318 generic/319
> > > > > > >   generic/444 generic/528 generic/529
> > > > > > > nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
> > > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > > >   generic/187 generic/193 generic/294 generic/318 generic/319
> > > > > > >   generic/357 generic/444 generic/486 generic/513 generic/528
> > > > > > >   generic/529 generic/578 generic/675 generic/688
> > > > > > > Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s
> > > > > > > 
> > > > > > > KERNEL:    6.5.0-rc6-g4853c74bd7ab
> > > > > > > CPUS:      8
> > > > > > > 
> > > > > > > nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
> > > > > > > Failures: generic/053 generic/099 generic/105 generic/258
> > > > > > >   generic/294 generic/318 generic/319 generic/444 generic/529
> > > > > > > nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
> > > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > > >   generic/187 generic/294 generic/318 generic/319 generic/357
> > > > > > >   generic/444 generic/486 generic/513 generic/529 generic/578
> > > > > > >   generic/675 generic/688
> > > > > > > Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s
> > > > > > > 
> > > > > > > KERNEL:    6.5.0-rc6-next-20230816-gef66bf8aeb91
> > > > > > > CPUS:      8
> > > > > > > 
> > > > > > > nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
> > > > > > > Failures: generic/053 generic/099 generic/105 generic/258
> > > > > > >   generic/294 generic/318 generic/319 generic/444 generic/529
> > > > > > > nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
> > > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > > >   generic/187 generic/294 generic/318 generic/319 generic/357
> > > > > > >   generic/444 generic/486 generic/513 generic/529 generic/578
> > > > > > >   generic/675 generic/683 generic/684 generic/688
> > > > > > > Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s
> > > > 
> > > > As long as we're sharing results ... here is what I'm seeing with a
> > > > 6.5-rc6 client & server:
> > > > 
> > > > anna@gouda ~ % xfstestsdb xunit list --results --runid 1741 --color=none
> > > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > > run | device               | xunit   | hostname | pass | fail |
> > > > skip |  time |
> > > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > > 1741 | server:/srv/xfs/test | tcp-3   | client   |  125 |    4 |
> > > > 464 | 447 s |
> > > > > 1741 | server:/srv/xfs/test | tcp-4.0 | client   |  117 |   11 |
> > > > 465 | 478 s |
> > > > > 1741 | server:/srv/xfs/test | tcp-4.1 | client   |  119 |   12 |
> > > > 462 | 404 s |
> > > > > 1741 | server:/srv/xfs/test | tcp-4.2 | client   |  212 |   18 |
> > > > 363 | 564 s |
> > > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > 
> > > > anna@gouda ~ % xfstestsdb show --failure 1741 --color=none
> > > > +-------------+---------+---------+---------+---------+
> > > > >   testcase | tcp-3   | tcp-4.0 | tcp-4.1 | tcp-4.2 |
> > > > +-------------+---------+---------+---------+---------+
> > > > > generic/053 | passed  | failure | failure | failure |
> > > > > generic/099 | passed  | failure | failure | failure |
> > > > > generic/105 | passed  | failure | failure | failure |
> > > > > generic/140 | skipped | skipped | skipped | failure |
> > > > > generic/188 | skipped | skipped | skipped | failure |
> > > > > generic/258 | failure | passed  | passed  | failure |
> > > > > generic/294 | failure | failure | failure | failure |
> > > > > generic/318 | passed  | failure | failure | failure |
> > > > > generic/319 | passed  | failure | failure | failure |
> > > > > generic/357 | skipped | skipped | skipped | failure |
> > > > > generic/444 | failure | failure | failure | failure |
> > > > > generic/465 | passed  | failure | failure | failure |
> > > > > generic/513 | skipped | skipped | skipped | failure |
> > > > > generic/529 | passed  | failure | failure | failure |
> > > > > generic/604 | passed  | passed  | failure | passed  |
> > > > > generic/675 | skipped | skipped | skipped | failure |
> > > > > generic/688 | skipped | skipped | skipped | failure |
> > > > > generic/697 | passed  | failure | failure | failure |
> > > > >    nfs/002 | failure | failure | failure | failure |
> > > > +-------------+---------+---------+---------+---------+
> > > > 
> > > > 
> > > > > > > 
> > > > > > > With NFSv4.2, v6.4.0 has 2 extra failures that the current mainline
> > > > > > > kernel doesn't:
> > > > > > > 
> > > > > > >   generic/193 (some sort of setattr problem)
> > > > > > >   generic/528 (known problem with btime handling in client that has been fixed)
> > > > > > > 
> > > > > > > While I haven't investigated, I'm assuming the 193 bug is also something
> > > > > > > that has been fixed in recent kernels. There are also 3 other NFSv3
> > > > > > > tests that started passing since v6.4.0. I haven't looked into those.
> > > > > > > 
> > > > > > > With the linux-next kernel there are 2 new regressions:
> > > > > > > 
> > > > > > >   generic/683
> > > > > > >   generic/684
> > > > > > > 
> > > > > > > Both of these look like problems with setuid/setgid stripping, and still
> > > > > > > need to be investigated. I have more verbose result info on the test
> > > > > > > failures if anyone is interested.
> > > > 
> > > > Interesting that I'm not seeing the 683 & 684 failures. What type of
> > > > filesystem is your server exporting?
> > > > 
> > > 
> > > btrfs
> > > 
> > > You are testing linux-next? I need to go back and confirm these results
> > > too.
> > 
> > IMO linux-next is quite important: we keep hitting bugs that
> > appear only after integration -- block and network changes in
> > other trees especially can impact the NFS drivers.
> > 
> 
> Indeed, I suspect this is probably something from the vfs tree (though
> we definitely need to confirm that). Today I'm testing:
> 
>     6.5.0-rc6-next-20230817-g47762f086974
> 

Nope, I was wrong. I ran a bisect, and it landed on the commit below. I
confirmed it by turning off leases on the NFS server, after which the
test started passing. I probably won't have the cycles to chase this
down further.

The capture looks something like this:

OPEN (get a write delegation)
WRITE
CLOSE
SETATTR (mode 06666)

...then presumably a task on the client opens the file again, but the
setuid bits don't get stripped.

I think either the client will need to strip these bits on a delegated
open, or we'll need to recall write delegations from the client when it
tries to do a SETATTR with a mode that could later end up needing to be
stripped on a subsequent open:

66ce3e3b98a7a9e970ea463a7f7dc0575c0a244b is the first bad commit
commit 66ce3e3b98a7a9e970ea463a7f7dc0575c0a244b
Author: Dai Ngo <dai.ngo@oracle.com>
Date:   Thu Jun 29 18:52:40 2023 -0700

    NFSD: Enable write delegation support



* Re: xfstests results over NFS
  2023-08-17 21:07             ` Jeff Layton
@ 2023-08-17 22:23               ` dai.ngo
  2023-08-17 22:59                 ` dai.ngo
  0 siblings, 1 reply; 16+ messages in thread
From: dai.ngo @ 2023-08-17 22:23 UTC (permalink / raw)
  To: Jeff Layton, Chuck Lever III
  Cc: Anna Schumaker, Trond Myklebust, Linux NFS Mailing List,
	Neil Brown, Kornievskaia, Olga, Tom Talpey


On 8/17/23 2:07 PM, Jeff Layton wrote:
> On Thu, 2023-08-17 at 13:15 -0400, Jeff Layton wrote:
>> On Thu, 2023-08-17 at 16:31 +0000, Chuck Lever III wrote:
>>>> On Aug 17, 2023, at 12:27 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>
>>>> On Thu, 2023-08-17 at 11:17 -0400, Anna Schumaker wrote:
>>>>> On Thu, Aug 17, 2023 at 10:22 AM Jeff Layton <jlayton@kernel.org> wrote:
>>>>>> On Thu, 2023-08-17 at 14:04 +0000, Chuck Lever III wrote:
>>>>>>>> On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>>>>
>>>>>>>> I finally got my kdevops (https://github.com/linux-kdevops/kdevops) test
>>>>>>>> rig working well enough to get some publishable results. To run fstests,
>>>>>>>> kdevops will spin up a server and (in this case) 2 clients to run
>>>>>>>> xfstests' auto group. One client mounts with default options, and the
>>>>>>>> other uses NFSv3.
>>>>>>>>
>>>>>>>> I tested 3 kernels:
>>>>>>>>
>>>>>>>> v6.4.0 (stock release)
>>>>>>>> 6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
>>>>>>>> 6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of yesterday morning)
>>>>>>>>
>>>>>>>> Here are the results summary of all 3:
>>>>>>>>
>>>>>>>> KERNEL:    6.4.0
>>>>>>>> CPUS:      8
>>>>>>>>
>>>>>>>> nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/124
>>>>>>>>    generic/193 generic/258 generic/294 generic/318 generic/319
>>>>>>>>    generic/444 generic/528 generic/529
>>>>>>>> nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>>>>    generic/187 generic/193 generic/294 generic/318 generic/319
>>>>>>>>    generic/357 generic/444 generic/486 generic/513 generic/528
>>>>>>>>    generic/529 generic/578 generic/675 generic/688
>>>>>>>> Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s
>>>>>>>>
>>>>>>>> KERNEL:    6.5.0-rc6-g4853c74bd7ab
>>>>>>>> CPUS:      8
>>>>>>>>
>>>>>>>> nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/258
>>>>>>>>    generic/294 generic/318 generic/319 generic/444 generic/529
>>>>>>>> nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>>>>    generic/187 generic/294 generic/318 generic/319 generic/357
>>>>>>>>    generic/444 generic/486 generic/513 generic/529 generic/578
>>>>>>>>    generic/675 generic/688
>>>>>>>> Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s
>>>>>>>>
>>>>>>>> KERNEL:    6.5.0-rc6-next-20230816-gef66bf8aeb91
>>>>>>>> CPUS:      8
>>>>>>>>
>>>>>>>> nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/258
>>>>>>>>    generic/294 generic/318 generic/319 generic/444 generic/529
>>>>>>>> nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>>>>    generic/187 generic/294 generic/318 generic/319 generic/357
>>>>>>>>    generic/444 generic/486 generic/513 generic/529 generic/578
>>>>>>>>    generic/675 generic/683 generic/684 generic/688
>>>>>>>> Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s
>>>>> As long as we're sharing results ... here is what I'm seeing with a
>>>>> 6.5-rc6 client & server:
>>>>>
>>>>> anna@gouda ~ % xfstestsdb xunit list --results --runid 1741 --color=none
>>>>> +------+----------------------+---------+----------+------+------+------+-------+
>>>>>> run | device               | xunit   | hostname | pass | fail |
>>>>> skip |  time |
>>>>> +------+----------------------+---------+----------+------+------+------+-------+
>>>>>> 1741 | server:/srv/xfs/test | tcp-3   | client   |  125 |    4 |
>>>>> 464 | 447 s |
>>>>>> 1741 | server:/srv/xfs/test | tcp-4.0 | client   |  117 |   11 |
>>>>> 465 | 478 s |
>>>>>> 1741 | server:/srv/xfs/test | tcp-4.1 | client   |  119 |   12 |
>>>>> 462 | 404 s |
>>>>>> 1741 | server:/srv/xfs/test | tcp-4.2 | client   |  212 |   18 |
>>>>> 363 | 564 s |
>>>>> +------+----------------------+---------+----------+------+------+------+-------+
>>>>>
>>>>> anna@gouda ~ % xfstestsdb show --failure 1741 --color=none
>>>>> +-------------+---------+---------+---------+---------+
>>>>>>    testcase | tcp-3   | tcp-4.0 | tcp-4.1 | tcp-4.2 |
>>>>> +-------------+---------+---------+---------+---------+
>>>>>> generic/053 | passed  | failure | failure | failure |
>>>>>> generic/099 | passed  | failure | failure | failure |
>>>>>> generic/105 | passed  | failure | failure | failure |
>>>>>> generic/140 | skipped | skipped | skipped | failure |
>>>>>> generic/188 | skipped | skipped | skipped | failure |
>>>>>> generic/258 | failure | passed  | passed  | failure |
>>>>>> generic/294 | failure | failure | failure | failure |
>>>>>> generic/318 | passed  | failure | failure | failure |
>>>>>> generic/319 | passed  | failure | failure | failure |
>>>>>> generic/357 | skipped | skipped | skipped | failure |
>>>>>> generic/444 | failure | failure | failure | failure |
>>>>>> generic/465 | passed  | failure | failure | failure |
>>>>>> generic/513 | skipped | skipped | skipped | failure |
>>>>>> generic/529 | passed  | failure | failure | failure |
>>>>>> generic/604 | passed  | passed  | failure | passed  |
>>>>>> generic/675 | skipped | skipped | skipped | failure |
>>>>>> generic/688 | skipped | skipped | skipped | failure |
>>>>>> generic/697 | passed  | failure | failure | failure |
>>>>>>     nfs/002 | failure | failure | failure | failure |
>>>>> +-------------+---------+---------+---------+---------+
>>>>>
>>>>>
>>>>>>>> With NFSv4.2, v6.4.0 has 2 extra failures that the current mainline
>>>>>>>> kernel doesn't:
>>>>>>>>
>>>>>>>>    generic/193 (some sort of setattr problem)
>>>>>>>>    generic/528 (known problem with btime handling in client that has been fixed)
>>>>>>>>
>>>>>>>> While I haven't investigated, I'm assuming the 193 bug is also something
>>>>>>>> that has been fixed in recent kernels. There are also 3 other NFSv3
>>>>>>>> tests that started passing since v6.4.0. I haven't looked into those.
>>>>>>>>
>>>>>>>> With the linux-next kernel there are 2 new regressions:
>>>>>>>>
>>>>>>>>    generic/683
>>>>>>>>    generic/684
>>>>>>>>
>>>>>>>> Both of these look like problems with setuid/setgid stripping, and still
>>>>>>>> need to be investigated. I have more verbose result info on the test
>>>>>>>> failures if anyone is interested.
>>>>> Interesting that I'm not seeing the 683 & 684 failures. What type of
>>>>> filesystem is your server exporting?
>>>>>
>>>> btrfs
>>>>
>>>> You are testing linux-next? I need to go back and confirm these results
>>>> too.
>>> IMO linux-next is quite important: we keep hitting bugs that
>>> appear only after integration -- block and network changes in
>>> other trees especially can impact the NFS drivers.
>>>
>> Indeed, I suspect this is probably something from the vfs tree (though
>> we definitely need to confirm that). Today I'm testing:
>>
>>      6.5.0-rc6-next-20230817-g47762f086974
>>
> Nope, I was wrong. I ran a bisect, and it landed on the commit below. I
> confirmed it by turning off leases on the NFS server, after which the
> test started passing. I probably won't have the cycles to chase this
> down further.
>
> The capture looks something like this:
>
> OPEN (get a write delegation)
> WRITE
> CLOSE
> SETATTR (mode 06666)
>
> ...then presumably a task on the client opens the file again, but the
> setuid bits don't get stripped.
>
> I think either the client will need to strip these bits on a delegated
> open, or we'll need to recall write delegations from the client when it
> tries to do a SETATTR with a mode that could later end up needing to be
> stripped on a subsequent open:
>
> 66ce3e3b98a7a9e970ea463a7f7dc0575c0a244b is the first bad commit
> commit 66ce3e3b98a7a9e970ea463a7f7dc0575c0a244b
> Author: Dai Ngo <dai.ngo@oracle.com>
> Date:   Thu Jun 29 18:52:40 2023 -0700
>
>      NFSD: Enable write delegation support

The SETATTR should cause the delegation to be recalled. However, I think
there is an optimization on the server that skips the recall when the
SETATTR comes from the same client that holds the delegation.

I'll take a look.

Thanks,
-Dai

>


* Re: xfstests results over NFS
  2023-08-17 22:23               ` dai.ngo
@ 2023-08-17 22:59                 ` dai.ngo
  2023-08-17 23:08                   ` Jeff Layton
  0 siblings, 1 reply; 16+ messages in thread
From: dai.ngo @ 2023-08-17 22:59 UTC (permalink / raw)
  To: Jeff Layton, Chuck Lever III
  Cc: Anna Schumaker, Trond Myklebust, Linux NFS Mailing List,
	Neil Brown, Kornievskaia, Olga, Tom Talpey


On 8/17/23 3:23 PM, dai.ngo@oracle.com wrote:
>
> On 8/17/23 2:07 PM, Jeff Layton wrote:
>> On Thu, 2023-08-17 at 13:15 -0400, Jeff Layton wrote:
>>> On Thu, 2023-08-17 at 16:31 +0000, Chuck Lever III wrote:
>>>>> On Aug 17, 2023, at 12:27 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>
>>>>> On Thu, 2023-08-17 at 11:17 -0400, Anna Schumaker wrote:
>>>>>> On Thu, Aug 17, 2023 at 10:22 AM Jeff Layton <jlayton@kernel.org> 
>>>>>> wrote:
>>>>>>> On Thu, 2023-08-17 at 14:04 +0000, Chuck Lever III wrote:
>>>>>>>>> On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@kernel.org> 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> I finally got my kdevops 
>>>>>>>>> (https://github.com/linux-kdevops/kdevops) test
>>>>>>>>> rig working well enough to get some publishable results. To 
>>>>>>>>> run fstests,
>>>>>>>>> kdevops will spin up a server and (in this case) 2 clients to run
>>>>>>>>> xfstests' auto group. One client mounts with default options, 
>>>>>>>>> and the
>>>>>>>>> other uses NFSv3.
>>>>>>>>>
>>>>>>>>> I tested 3 kernels:
>>>>>>>>>
>>>>>>>>> v6.4.0 (stock release)
>>>>>>>>> 6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
>>>>>>>>> 6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of 
>>>>>>>>> yesterday morning)
>>>>>>>>>
>>>>>>>>> Here are the results summary of all 3:
>>>>>>>>>
>>>>>>>>> KERNEL:    6.4.0
>>>>>>>>> CPUS:      8
>>>>>>>>>
>>>>>>>>> nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/124
>>>>>>>>>    generic/193 generic/258 generic/294 generic/318 generic/319
>>>>>>>>>    generic/444 generic/528 generic/529
>>>>>>>>> nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>>>>>    generic/187 generic/193 generic/294 generic/318 generic/319
>>>>>>>>>    generic/357 generic/444 generic/486 generic/513 generic/528
>>>>>>>>>    generic/529 generic/578 generic/675 generic/688
>>>>>>>>> Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s
>>>>>>>>>
>>>>>>>>> KERNEL:    6.5.0-rc6-g4853c74bd7ab
>>>>>>>>> CPUS:      8
>>>>>>>>>
>>>>>>>>> nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/258
>>>>>>>>>    generic/294 generic/318 generic/319 generic/444 generic/529
>>>>>>>>> nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>>>>>    generic/187 generic/294 generic/318 generic/319 generic/357
>>>>>>>>>    generic/444 generic/486 generic/513 generic/529 generic/578
>>>>>>>>>    generic/675 generic/688
>>>>>>>>> Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s
>>>>>>>>>
>>>>>>>>> KERNEL:    6.5.0-rc6-next-20230816-gef66bf8aeb91
>>>>>>>>> CPUS:      8
>>>>>>>>>
>>>>>>>>> nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/258
>>>>>>>>>    generic/294 generic/318 generic/319 generic/444 generic/529
>>>>>>>>> nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>>>>>    generic/187 generic/294 generic/318 generic/319 generic/357
>>>>>>>>>    generic/444 generic/486 generic/513 generic/529 generic/578
>>>>>>>>>    generic/675 generic/683 generic/684 generic/688
>>>>>>>>> Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s
>>>>>> As long as we're sharing results ... here is what I'm seeing with a
>>>>>> 6.5-rc6 client & server:
>>>>>>
>>>>>> anna@gouda ~ % xfstestsdb xunit list --results --runid 1741 
>>>>>> --color=none
>>>>>> +------+----------------------+---------+----------+------+------+------+-------+
>>>>>> | run  | device               | xunit   | hostname | pass | fail | skip |  time |
>>>>>> +------+----------------------+---------+----------+------+------+------+-------+
>>>>>> | 1741 | server:/srv/xfs/test | tcp-3   | client   |  125 |    4 |  464 | 447 s |
>>>>>> | 1741 | server:/srv/xfs/test | tcp-4.0 | client   |  117 |   11 |  465 | 478 s |
>>>>>> | 1741 | server:/srv/xfs/test | tcp-4.1 | client   |  119 |   12 |  462 | 404 s |
>>>>>> | 1741 | server:/srv/xfs/test | tcp-4.2 | client   |  212 |   18 |  363 | 564 s |
>>>>>> +------+----------------------+---------+----------+------+------+------+-------+
>>>>>>
>>>>>>
>>>>>> anna@gouda ~ % xfstestsdb show --failure 1741 --color=none
>>>>>> +-------------+---------+---------+---------+---------+
>>>>>> |    testcase | tcp-3   | tcp-4.0 | tcp-4.1 | tcp-4.2 |
>>>>>> +-------------+---------+---------+---------+---------+
>>>>>> | generic/053 | passed  | failure | failure | failure |
>>>>>> | generic/099 | passed  | failure | failure | failure |
>>>>>> | generic/105 | passed  | failure | failure | failure |
>>>>>> | generic/140 | skipped | skipped | skipped | failure |
>>>>>> | generic/188 | skipped | skipped | skipped | failure |
>>>>>> | generic/258 | failure | passed  | passed  | failure |
>>>>>> | generic/294 | failure | failure | failure | failure |
>>>>>> | generic/318 | passed  | failure | failure | failure |
>>>>>> | generic/319 | passed  | failure | failure | failure |
>>>>>> | generic/357 | skipped | skipped | skipped | failure |
>>>>>> | generic/444 | failure | failure | failure | failure |
>>>>>> | generic/465 | passed  | failure | failure | failure |
>>>>>> | generic/513 | skipped | skipped | skipped | failure |
>>>>>> | generic/529 | passed  | failure | failure | failure |
>>>>>> | generic/604 | passed  | passed  | failure | passed  |
>>>>>> | generic/675 | skipped | skipped | skipped | failure |
>>>>>> | generic/688 | skipped | skipped | skipped | failure |
>>>>>> | generic/697 | passed  | failure | failure | failure |
>>>>>> |     nfs/002 | failure | failure | failure | failure |
>>>>>> +-------------+---------+---------+---------+---------+
>>>>>>
>>>>>>
>>>>>>>>> With NFSv4.2, v6.4.0 has 2 extra failures that the current 
>>>>>>>>> mainline
>>>>>>>>> kernel doesn't:
>>>>>>>>>
>>>>>>>>>    generic/193 (some sort of setattr problem)
>>>>>>>>>    generic/528 (known problem with btime handling in client 
>>>>>>>>> that has been fixed)
>>>>>>>>>
>>>>>>>>> While I haven't investigated, I'm assuming the 193 bug is also 
>>>>>>>>> something
>>>>>>>>> that has been fixed in recent kernels. There are also 3 other 
>>>>>>>>> NFSv3
>>>>>>>>> tests that started passing since v6.4.0. I haven't looked into 
>>>>>>>>> those.
>>>>>>>>>
>>>>>>>>> With the linux-next kernel there are 2 new regressions:
>>>>>>>>>
>>>>>>>>>    generic/683
>>>>>>>>>    generic/684
>>>>>>>>>
>>>>>>>>> Both of these look like problems with setuid/setgid stripping, 
>>>>>>>>> and still
>>>>>>>>> need to be investigated. I have more verbose result info on 
>>>>>>>>> the test
>>>>>>>>> failures if anyone is interested.
>>>>>> Interesting that I'm not seeing the 683 & 684 failures. What type of
>>>>>> filesystem is your server exporting?
>>>>>>
>>>>> btrfs
>>>>>
>>>>> You are testing linux-next? I need to go back and confirm these 
>>>>> results
>>>>> too.
>>>> IMO linux-next is quite important: we keep hitting bugs that
>>>> appear only after integration -- block and network changes in
>>>> other trees especially can impact the NFS drivers.
>>>>
>>> Indeed, I suspect this is probably something from the vfs tree (though
>>> we definitely need to confirm that). Today I'm testing:
>>>
>>>      6.5.0-rc6-next-20230817-g47762f086974
>>>
>> Nope, I was wrong. I ran a bisect and it landed here. I confirmed it by
>> turning off leases on the nfs server and the test started passing. I
>> probably won't have the cycles to chase this down further.
>>
>> The capture looks something like this:
>>
>> OPEN (get a write delegation)
>> WRITE
>> CLOSE
>> SETATTR (mode 06666)
>>
>> ...then presumably a task on the client opens the file again, but the
>> setuid bits don't get stripped.
>>
>> I think either the client will need to strip these bits on a delegated
>> open, or we'll need to recall write delegations from the client when it
>> tries to do a SETATTR with a mode that could later end up needing to be
>> stripped on a subsequent open:
>>
>> 66ce3e3b98a7a9e970ea463a7f7dc0575c0a244b is the first bad commit
>> commit 66ce3e3b98a7a9e970ea463a7f7dc0575c0a244b
>> Author: Dai Ngo <dai.ngo@oracle.com>
>> Date:   Thu Jun 29 18:52:40 2023 -0700
>>
>>      NFSD: Enable write delegation support
>
> The SETATTR should cause the delegation to be recalled. However, I think
> there is an optimization on server that skips the recall if the SETATTR
> comes from the same client that has the delegation.

The optimization on the server was done by this commit:

28df3d1539de nfsd: clients don't need to break their own delegations

Perhaps we should allow this optimization for read delegations only?

Or should the NFS client be responsible for handling the SETATTR and
local OPEN on a file that has a write delegation granted?

-Dai


>
> I'll take a look.
>
> Thanks,
> -Dai
>
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: xfstests results over NFS
  2023-08-17 22:59                 ` dai.ngo
@ 2023-08-17 23:08                   ` Jeff Layton
  2023-08-17 23:28                     ` dai.ngo
  2023-08-22 16:07                     ` dai.ngo
  0 siblings, 2 replies; 16+ messages in thread
From: Jeff Layton @ 2023-08-17 23:08 UTC (permalink / raw)
  To: dai.ngo, Chuck Lever III
  Cc: Anna Schumaker, Trond Myklebust, Linux NFS Mailing List,
	Neil Brown, Kornievskaia, Olga, Tom Talpey

On Thu, 2023-08-17 at 15:59 -0700, dai.ngo@oracle.com wrote:
> On 8/17/23 3:23 PM, dai.ngo@oracle.com wrote:
> > 
> > On 8/17/23 2:07 PM, Jeff Layton wrote:
> > > On Thu, 2023-08-17 at 13:15 -0400, Jeff Layton wrote:
> > > > On Thu, 2023-08-17 at 16:31 +0000, Chuck Lever III wrote:
> > > > > > On Aug 17, 2023, at 12:27 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > 
> > > > > > On Thu, 2023-08-17 at 11:17 -0400, Anna Schumaker wrote:
> > > > > > > On Thu, Aug 17, 2023 at 10:22 AM Jeff Layton <jlayton@kernel.org> 
> > > > > > > wrote:
> > > > > > > > On Thu, 2023-08-17 at 14:04 +0000, Chuck Lever III wrote:
> > > > > > > > > > On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@kernel.org> 
> > > > > > > > > > wrote:
> > > > > > > > > > 
> > > > > > > > > > I finally got my kdevops 
> > > > > > > > > > (https://github.com/linux-kdevops/kdevops) test
> > > > > > > > > > rig working well enough to get some publishable results. To 
> > > > > > > > > > run fstests,
> > > > > > > > > > kdevops will spin up a server and (in this case) 2 clients to run
> > > > > > > > > > xfstests' auto group. One client mounts with default options, 
> > > > > > > > > > and the
> > > > > > > > > > other uses NFSv3.
> > > > > > > > > > 
> > > > > > > > > > I tested 3 kernels:
> > > > > > > > > > 
> > > > > > > > > > v6.4.0 (stock release)
> > > > > > > > > > 6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
> > > > > > > > > > 6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of 
> > > > > > > > > > yesterday morning)
> > > > > > > > > > 
> > > > > > > > > > Here are the results summary of all 3:
> > > > > > > > > > 
> > > > > > > > > > KERNEL:    6.4.0
> > > > > > > > > > CPUS:      8
> > > > > > > > > > 
> > > > > > > > > > nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
> > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/124
> > > > > > > > > >    generic/193 generic/258 generic/294 generic/318 generic/319
> > > > > > > > > >    generic/444 generic/528 generic/529
> > > > > > > > > > nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
> > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > > > > > >    generic/187 generic/193 generic/294 generic/318 generic/319
> > > > > > > > > >    generic/357 generic/444 generic/486 generic/513 generic/528
> > > > > > > > > >    generic/529 generic/578 generic/675 generic/688
> > > > > > > > > > Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s
> > > > > > > > > > 
> > > > > > > > > > KERNEL:    6.5.0-rc6-g4853c74bd7ab
> > > > > > > > > > CPUS:      8
> > > > > > > > > > 
> > > > > > > > > > nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
> > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/258
> > > > > > > > > >    generic/294 generic/318 generic/319 generic/444 generic/529
> > > > > > > > > > nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
> > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > > > > > >    generic/187 generic/294 generic/318 generic/319 generic/357
> > > > > > > > > >    generic/444 generic/486 generic/513 generic/529 generic/578
> > > > > > > > > >    generic/675 generic/688
> > > > > > > > > > Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s
> > > > > > > > > > 
> > > > > > > > > > KERNEL:    6.5.0-rc6-next-20230816-gef66bf8aeb91
> > > > > > > > > > CPUS:      8
> > > > > > > > > > 
> > > > > > > > > > nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
> > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/258
> > > > > > > > > >    generic/294 generic/318 generic/319 generic/444 generic/529
> > > > > > > > > > nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
> > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > > > > > >    generic/187 generic/294 generic/318 generic/319 generic/357
> > > > > > > > > >    generic/444 generic/486 generic/513 generic/529 generic/578
> > > > > > > > > >    generic/675 generic/683 generic/684 generic/688
> > > > > > > > > > Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s
> > > > > > > As long as we're sharing results ... here is what I'm seeing with a
> > > > > > > 6.5-rc6 client & server:
> > > > > > > 
> > > > > > > anna@gouda ~ % xfstestsdb xunit list --results --runid 1741 
> > > > > > > --color=none
> > > > > > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > > > > | run  | device               | xunit   | hostname | pass | fail | skip |  time |
> > > > > > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > > > > | 1741 | server:/srv/xfs/test | tcp-3   | client   |  125 |    4 |  464 | 447 s |
> > > > > > > | 1741 | server:/srv/xfs/test | tcp-4.0 | client   |  117 |   11 |  465 | 478 s |
> > > > > > > | 1741 | server:/srv/xfs/test | tcp-4.1 | client   |  119 |   12 |  462 | 404 s |
> > > > > > > | 1741 | server:/srv/xfs/test | tcp-4.2 | client   |  212 |   18 |  363 | 564 s |
> > > > > > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > > > > 
> > > > > > > 
> > > > > > > anna@gouda ~ % xfstestsdb show --failure 1741 --color=none
> > > > > > > +-------------+---------+---------+---------+---------+
> > > > > > > |    testcase | tcp-3   | tcp-4.0 | tcp-4.1 | tcp-4.2 |
> > > > > > > +-------------+---------+---------+---------+---------+
> > > > > > > | generic/053 | passed  | failure | failure | failure |
> > > > > > > | generic/099 | passed  | failure | failure | failure |
> > > > > > > | generic/105 | passed  | failure | failure | failure |
> > > > > > > | generic/140 | skipped | skipped | skipped | failure |
> > > > > > > | generic/188 | skipped | skipped | skipped | failure |
> > > > > > > | generic/258 | failure | passed  | passed  | failure |
> > > > > > > | generic/294 | failure | failure | failure | failure |
> > > > > > > | generic/318 | passed  | failure | failure | failure |
> > > > > > > | generic/319 | passed  | failure | failure | failure |
> > > > > > > | generic/357 | skipped | skipped | skipped | failure |
> > > > > > > | generic/444 | failure | failure | failure | failure |
> > > > > > > | generic/465 | passed  | failure | failure | failure |
> > > > > > > | generic/513 | skipped | skipped | skipped | failure |
> > > > > > > | generic/529 | passed  | failure | failure | failure |
> > > > > > > | generic/604 | passed  | passed  | failure | passed  |
> > > > > > > | generic/675 | skipped | skipped | skipped | failure |
> > > > > > > | generic/688 | skipped | skipped | skipped | failure |
> > > > > > > | generic/697 | passed  | failure | failure | failure |
> > > > > > > |     nfs/002 | failure | failure | failure | failure |
> > > > > > > +-------------+---------+---------+---------+---------+
> > > > > > > 
> > > > > > > 
> > > > > > > > > > With NFSv4.2, v6.4.0 has 2 extra failures that the current 
> > > > > > > > > > mainline
> > > > > > > > > > kernel doesn't:
> > > > > > > > > > 
> > > > > > > > > >    generic/193 (some sort of setattr problem)
> > > > > > > > > >    generic/528 (known problem with btime handling in client 
> > > > > > > > > > that has been fixed)
> > > > > > > > > > 
> > > > > > > > > > While I haven't investigated, I'm assuming the 193 bug is also 
> > > > > > > > > > something
> > > > > > > > > > that has been fixed in recent kernels. There are also 3 other 
> > > > > > > > > > NFSv3
> > > > > > > > > > tests that started passing since v6.4.0. I haven't looked into 
> > > > > > > > > > those.
> > > > > > > > > > 
> > > > > > > > > > With the linux-next kernel there are 2 new regressions:
> > > > > > > > > > 
> > > > > > > > > >    generic/683
> > > > > > > > > >    generic/684
> > > > > > > > > > 
> > > > > > > > > > Both of these look like problems with setuid/setgid stripping, 
> > > > > > > > > > and still
> > > > > > > > > > need to be investigated. I have more verbose result info on 
> > > > > > > > > > the test
> > > > > > > > > > failures if anyone is interested.
> > > > > > > Interesting that I'm not seeing the 683 & 684 failures. What type of
> > > > > > > filesystem is your server exporting?
> > > > > > > 
> > > > > > btrfs
> > > > > > 
> > > > > > You are testing linux-next? I need to go back and confirm these 
> > > > > > results
> > > > > > too.
> > > > > IMO linux-next is quite important: we keep hitting bugs that
> > > > > appear only after integration -- block and network changes in
> > > > > other trees especially can impact the NFS drivers.
> > > > > 
> > > > Indeed, I suspect this is probably something from the vfs tree (though
> > > > we definitely need to confirm that). Today I'm testing:
> > > > 
> > > >      6.5.0-rc6-next-20230817-g47762f086974
> > > > 
> > > Nope, I was wrong. I ran a bisect and it landed here. I confirmed it by
> > > turning off leases on the nfs server and the test started passing. I
> > > probably won't have the cycles to chase this down further.
> > > 
> > > The capture looks something like this:
> > > 
> > > OPEN (get a write delegation)
> > > WRITE
> > > CLOSE
> > > SETATTR (mode 06666)
> > > 
> > > ...then presumably a task on the client opens the file again, but the
> > > setuid bits don't get stripped.
> > > 
> > > I think either the client will need to strip these bits on a delegated
> > > open, or we'll need to recall write delegations from the client when it
> > > tries to do a SETATTR with a mode that could later end up needing to be
> > > stripped on a subsequent open:
> > > 
> > > 66ce3e3b98a7a9e970ea463a7f7dc0575c0a244b is the first bad commit
> > > commit 66ce3e3b98a7a9e970ea463a7f7dc0575c0a244b
> > > Author: Dai Ngo <dai.ngo@oracle.com>
> > > Date:   Thu Jun 29 18:52:40 2023 -0700
> > > 
> > >      NFSD: Enable write delegation support
> > 
> > The SETATTR should cause the delegation to be recalled. However, I think
> > there is an optimization on server that skips the recall if the SETATTR
> > comes from the same client that has the delegation.
> 
> The optimization on the server was done by this commit:
> 
> 28df3d1539de nfsd: clients don't need to break their own delegations
> 
> Perhaps we should allow this optimization for read delegation only?
> 
> Or should the NFS client be responsible for handling the SETATTR and
> and local OPEN on the file that has write delegation granted?
>

I think that setuid/setgid files are really a special case.

We already avoid giving out delegations on setuid/gid files. What we're
not doing currently is revoking the write delegation if the holder tries
to set a mode that involves a setuid/gid bit. If we add that, then that
should close the hole, I think.

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: xfstests results over NFS
  2023-08-17 23:08                   ` Jeff Layton
@ 2023-08-17 23:28                     ` dai.ngo
  2023-08-22 16:07                     ` dai.ngo
  1 sibling, 0 replies; 16+ messages in thread
From: dai.ngo @ 2023-08-17 23:28 UTC (permalink / raw)
  To: Jeff Layton, Chuck Lever III
  Cc: Anna Schumaker, Trond Myklebust, Linux NFS Mailing List,
	Neil Brown, Kornievskaia, Olga, Tom Talpey


On 8/17/23 4:08 PM, Jeff Layton wrote:
> On Thu, 2023-08-17 at 15:59 -0700, dai.ngo@oracle.com wrote:
>> On 8/17/23 3:23 PM, dai.ngo@oracle.com wrote:
>>> On 8/17/23 2:07 PM, Jeff Layton wrote:
>>>> On Thu, 2023-08-17 at 13:15 -0400, Jeff Layton wrote:
>>>>> On Thu, 2023-08-17 at 16:31 +0000, Chuck Lever III wrote:
>>>>>>> On Aug 17, 2023, at 12:27 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>>>
>>>>>>> On Thu, 2023-08-17 at 11:17 -0400, Anna Schumaker wrote:
>>>>>>>> On Thu, Aug 17, 2023 at 10:22 AM Jeff Layton <jlayton@kernel.org>
>>>>>>>> wrote:
>>>>>>>>> On Thu, 2023-08-17 at 14:04 +0000, Chuck Lever III wrote:
>>>>>>>>>>> On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@kernel.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I finally got my kdevops
>>>>>>>>>>> (https://github.com/linux-kdevops/kdevops) test
>>>>>>>>>>> rig working well enough to get some publishable results. To
>>>>>>>>>>> run fstests,
>>>>>>>>>>> kdevops will spin up a server and (in this case) 2 clients to run
>>>>>>>>>>> xfstests' auto group. One client mounts with default options,
>>>>>>>>>>> and the
>>>>>>>>>>> other uses NFSv3.
>>>>>>>>>>>
>>>>>>>>>>> I tested 3 kernels:
>>>>>>>>>>>
>>>>>>>>>>> v6.4.0 (stock release)
>>>>>>>>>>> 6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
>>>>>>>>>>> 6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of
>>>>>>>>>>> yesterday morning)
>>>>>>>>>>>
>>>>>>>>>>> Here are the results summary of all 3:
>>>>>>>>>>>
>>>>>>>>>>> KERNEL:    6.4.0
>>>>>>>>>>> CPUS:      8
>>>>>>>>>>>
>>>>>>>>>>> nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/124
>>>>>>>>>>>     generic/193 generic/258 generic/294 generic/318 generic/319
>>>>>>>>>>>     generic/444 generic/528 generic/529
>>>>>>>>>>> nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>>>>>>>     generic/187 generic/193 generic/294 generic/318 generic/319
>>>>>>>>>>>     generic/357 generic/444 generic/486 generic/513 generic/528
>>>>>>>>>>>     generic/529 generic/578 generic/675 generic/688
>>>>>>>>>>> Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s
>>>>>>>>>>>
>>>>>>>>>>> KERNEL:    6.5.0-rc6-g4853c74bd7ab
>>>>>>>>>>> CPUS:      8
>>>>>>>>>>>
>>>>>>>>>>> nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/258
>>>>>>>>>>>     generic/294 generic/318 generic/319 generic/444 generic/529
>>>>>>>>>>> nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>>>>>>>     generic/187 generic/294 generic/318 generic/319 generic/357
>>>>>>>>>>>     generic/444 generic/486 generic/513 generic/529 generic/578
>>>>>>>>>>>     generic/675 generic/688
>>>>>>>>>>> Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s
>>>>>>>>>>>
>>>>>>>>>>> KERNEL:    6.5.0-rc6-next-20230816-gef66bf8aeb91
>>>>>>>>>>> CPUS:      8
>>>>>>>>>>>
>>>>>>>>>>> nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/258
>>>>>>>>>>>     generic/294 generic/318 generic/319 generic/444 generic/529
>>>>>>>>>>> nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>>>>>>>     generic/187 generic/294 generic/318 generic/319 generic/357
>>>>>>>>>>>     generic/444 generic/486 generic/513 generic/529 generic/578
>>>>>>>>>>>     generic/675 generic/683 generic/684 generic/688
>>>>>>>>>>> Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s
>>>>>>>> As long as we're sharing results ... here is what I'm seeing with a
>>>>>>>> 6.5-rc6 client & server:
>>>>>>>>
>>>>>>>> anna@gouda ~ % xfstestsdb xunit list --results --runid 1741
>>>>>>>> --color=none
>>>>>>>> +------+----------------------+---------+----------+------+------+------+-------+
>>>>>>>> | run  | device               | xunit   | hostname | pass | fail | skip |  time |
>>>>>>>> +------+----------------------+---------+----------+------+------+------+-------+
>>>>>>>> | 1741 | server:/srv/xfs/test | tcp-3   | client   |  125 |    4 |  464 | 447 s |
>>>>>>>> | 1741 | server:/srv/xfs/test | tcp-4.0 | client   |  117 |   11 |  465 | 478 s |
>>>>>>>> | 1741 | server:/srv/xfs/test | tcp-4.1 | client   |  119 |   12 |  462 | 404 s |
>>>>>>>> | 1741 | server:/srv/xfs/test | tcp-4.2 | client   |  212 |   18 |  363 | 564 s |
>>>>>>>> +------+----------------------+---------+----------+------+------+------+-------+
>>>>>>>>
>>>>>>>>
>>>>>>>> anna@gouda ~ % xfstestsdb show --failure 1741 --color=none
>>>>>>>> +-------------+---------+---------+---------+---------+
>>>>>>>> |    testcase | tcp-3   | tcp-4.0 | tcp-4.1 | tcp-4.2 |
>>>>>>>> +-------------+---------+---------+---------+---------+
>>>>>>>> | generic/053 | passed  | failure | failure | failure |
>>>>>>>> | generic/099 | passed  | failure | failure | failure |
>>>>>>>> | generic/105 | passed  | failure | failure | failure |
>>>>>>>> | generic/140 | skipped | skipped | skipped | failure |
>>>>>>>> | generic/188 | skipped | skipped | skipped | failure |
>>>>>>>> | generic/258 | failure | passed  | passed  | failure |
>>>>>>>> | generic/294 | failure | failure | failure | failure |
>>>>>>>> | generic/318 | passed  | failure | failure | failure |
>>>>>>>> | generic/319 | passed  | failure | failure | failure |
>>>>>>>> | generic/357 | skipped | skipped | skipped | failure |
>>>>>>>> | generic/444 | failure | failure | failure | failure |
>>>>>>>> | generic/465 | passed  | failure | failure | failure |
>>>>>>>> | generic/513 | skipped | skipped | skipped | failure |
>>>>>>>> | generic/529 | passed  | failure | failure | failure |
>>>>>>>> | generic/604 | passed  | passed  | failure | passed  |
>>>>>>>> | generic/675 | skipped | skipped | skipped | failure |
>>>>>>>> | generic/688 | skipped | skipped | skipped | failure |
>>>>>>>> | generic/697 | passed  | failure | failure | failure |
>>>>>>>> |     nfs/002 | failure | failure | failure | failure |
>>>>>>>> +-------------+---------+---------+---------+---------+
>>>>>>>>
>>>>>>>>
>>>>>>>>>>> With NFSv4.2, v6.4.0 has 2 extra failures that the current
>>>>>>>>>>> mainline
>>>>>>>>>>> kernel doesn't:
>>>>>>>>>>>
>>>>>>>>>>>     generic/193 (some sort of setattr problem)
>>>>>>>>>>>     generic/528 (known problem with btime handling in client
>>>>>>>>>>> that has been fixed)
>>>>>>>>>>>
>>>>>>>>>>> While I haven't investigated, I'm assuming the 193 bug is also
>>>>>>>>>>> something
>>>>>>>>>>> that has been fixed in recent kernels. There are also 3 other
>>>>>>>>>>> NFSv3
>>>>>>>>>>> tests that started passing since v6.4.0. I haven't looked into
>>>>>>>>>>> those.
>>>>>>>>>>>
>>>>>>>>>>> With the linux-next kernel there are 2 new regressions:
>>>>>>>>>>>
>>>>>>>>>>>     generic/683
>>>>>>>>>>>     generic/684
>>>>>>>>>>>
>>>>>>>>>>> Both of these look like problems with setuid/setgid stripping,
>>>>>>>>>>> and still
>>>>>>>>>>> need to be investigated. I have more verbose result info on
>>>>>>>>>>> the test
>>>>>>>>>>> failures if anyone is interested.
>>>>>>>> Interesting that I'm not seeing the 683 & 684 failures. What type of
>>>>>>>> filesystem is your server exporting?
>>>>>>>>
>>>>>>> btrfs
>>>>>>>
>>>>>>> You are testing linux-next? I need to go back and confirm these
>>>>>>> results
>>>>>>> too.
>>>>>> IMO linux-next is quite important: we keep hitting bugs that
>>>>>> appear only after integration -- block and network changes in
>>>>>> other trees especially can impact the NFS drivers.
>>>>>>
>>>>> Indeed, I suspect this is probably something from the vfs tree (though
>>>>> we definitely need to confirm that). Today I'm testing:
>>>>>
>>>>>       6.5.0-rc6-next-20230817-g47762f086974
>>>>>
>>>> Nope, I was wrong. I ran a bisect and it landed here. I confirmed it by
>>>> turning off leases on the nfs server and the test started passing. I
>>>> probably won't have the cycles to chase this down further.
>>>>
>>>> The capture looks something like this:
>>>>
>>>> OPEN (get a write delegation)
>>>> WRITE
>>>> CLOSE
>>>> SETATTR (mode 06666)
>>>>
>>>> ...then presumably a task on the client opens the file again, but the
>>>> setuid bits don't get stripped.
>>>>
>>>> I think either the client will need to strip these bits on a delegated
>>>> open, or we'll need to recall write delegations from the client when it
>>>> tries to do a SETATTR with a mode that could later end up needing to be
>>>> stripped on a subsequent open:
>>>>
>>>> 66ce3e3b98a7a9e970ea463a7f7dc0575c0a244b is the first bad commit
>>>> commit 66ce3e3b98a7a9e970ea463a7f7dc0575c0a244b
>>>> Author: Dai Ngo <dai.ngo@oracle.com>
>>>> Date:   Thu Jun 29 18:52:40 2023 -0700
>>>>
>>>>       NFSD: Enable write delegation support
>>> The SETATTR should cause the delegation to be recalled. However, I think
>>> there is an optimization on server that skips the recall if the SETATTR
>>> comes from the same client that has the delegation.
>> The optimization on the server was done by this commit:
>>
>> 28df3d1539de nfsd: clients don't need to break their own delegations
>>
>> Perhaps we should allow this optimization for read delegation only?
>>
>> Or should the NFS client be responsible for handling the SETATTR and
>> local OPEN on a file that has a write delegation granted?
>>
> I think that setuid/setgid files are really a special case.
>
> We already avoid giving out delegations on setuid/gid files. What we're
> not doing currently is revoking the write delegation if the holder tries
> to set a mode that involves a setuid/gid bit. If we add that, then that
> should close the hole, I think.

This approach seems reasonable. I'll work on a patch to take care of this
condition.

Thanks Jeff,
-Dai

>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: xfstests results over NFS
  2023-08-17 23:08                   ` Jeff Layton
  2023-08-17 23:28                     ` dai.ngo
@ 2023-08-22 16:07                     ` dai.ngo
  2023-08-22 17:02                       ` Jeff Layton
  1 sibling, 1 reply; 16+ messages in thread
From: dai.ngo @ 2023-08-22 16:07 UTC (permalink / raw)
  To: Jeff Layton, Chuck Lever III
  Cc: Anna Schumaker, Trond Myklebust, Linux NFS Mailing List,
	Neil Brown, Kornievskaia, Olga, Tom Talpey


On 8/17/23 4:08 PM, Jeff Layton wrote:
> On Thu, 2023-08-17 at 15:59 -0700, dai.ngo@oracle.com wrote:
>> On 8/17/23 3:23 PM, dai.ngo@oracle.com wrote:
>>> On 8/17/23 2:07 PM, Jeff Layton wrote:
>>>> On Thu, 2023-08-17 at 13:15 -0400, Jeff Layton wrote:
>>>>> On Thu, 2023-08-17 at 16:31 +0000, Chuck Lever III wrote:
>>>>>>> On Aug 17, 2023, at 12:27 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>>>
>>>>>>> On Thu, 2023-08-17 at 11:17 -0400, Anna Schumaker wrote:
>>>>>>>> On Thu, Aug 17, 2023 at 10:22 AM Jeff Layton <jlayton@kernel.org>
>>>>>>>> wrote:
>>>>>>>>> On Thu, 2023-08-17 at 14:04 +0000, Chuck Lever III wrote:
>>>>>>>>>>> On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@kernel.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I finally got my kdevops
>>>>>>>>>>> (https://github.com/linux-kdevops/kdevops) test
>>>>>>>>>>> rig working well enough to get some publishable results. To
>>>>>>>>>>> run fstests,
>>>>>>>>>>> kdevops will spin up a server and (in this case) 2 clients to run
>>>>>>>>>>> xfstests' auto group. One client mounts with default options,
>>>>>>>>>>> and the
>>>>>>>>>>> other uses NFSv3.
>>>>>>>>>>>
>>>>>>>>>>> I tested 3 kernels:
>>>>>>>>>>>
>>>>>>>>>>> v6.4.0 (stock release)
>>>>>>>>>>> 6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
>>>>>>>>>>> 6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of
>>>>>>>>>>> yesterday morning)
>>>>>>>>>>>
>>>>>>>>>>> Here are the results summary of all 3:
>>>>>>>>>>>
>>>>>>>>>>> KERNEL:    6.4.0
>>>>>>>>>>> CPUS:      8
>>>>>>>>>>>
>>>>>>>>>>> nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/124
>>>>>>>>>>>     generic/193 generic/258 generic/294 generic/318 generic/319
>>>>>>>>>>>     generic/444 generic/528 generic/529
>>>>>>>>>>> nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>>>>>>>     generic/187 generic/193 generic/294 generic/318 generic/319
>>>>>>>>>>>     generic/357 generic/444 generic/486 generic/513 generic/528
>>>>>>>>>>>     generic/529 generic/578 generic/675 generic/688
>>>>>>>>>>> Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s
>>>>>>>>>>>
>>>>>>>>>>> KERNEL:    6.5.0-rc6-g4853c74bd7ab
>>>>>>>>>>> CPUS:      8
>>>>>>>>>>>
>>>>>>>>>>> nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/258
>>>>>>>>>>>     generic/294 generic/318 generic/319 generic/444 generic/529
>>>>>>>>>>> nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>>>>>>>     generic/187 generic/294 generic/318 generic/319 generic/357
>>>>>>>>>>>     generic/444 generic/486 generic/513 generic/529 generic/578
>>>>>>>>>>>     generic/675 generic/688
>>>>>>>>>>> Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s
>>>>>>>>>>>
>>>>>>>>>>> KERNEL:    6.5.0-rc6-next-20230816-gef66bf8aeb91
>>>>>>>>>>> CPUS:      8
>>>>>>>>>>>
>>>>>>>>>>> nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/258
>>>>>>>>>>>     generic/294 generic/318 generic/319 generic/444 generic/529
>>>>>>>>>>> nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>>>>>>>     generic/187 generic/294 generic/318 generic/319 generic/357
>>>>>>>>>>>     generic/444 generic/486 generic/513 generic/529 generic/578
>>>>>>>>>>>     generic/675 generic/683 generic/684 generic/688
>>>>>>>>>>> Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s
>>>>>>>> As long as we're sharing results ... here is what I'm seeing with a
>>>>>>>> 6.5-rc6 client & server:
>>>>>>>>
>>>>>>>> anna@gouda ~ % xfstestsdb xunit list --results --runid 1741
>>>>>>>> --color=none
>>>>>>>> +------+----------------------+---------+----------+------+------+------+-------+
>>>>>>>>
>>>>>>>>> run | device               | xunit   | hostname | pass | fail |
>>>>>>>> skip |  time |
>>>>>>>> +------+----------------------+---------+----------+------+------+------+-------+
>>>>>>>>
>>>>>>>>> 1741 | server:/srv/xfs/test | tcp-3   | client   |  125 |    4 |
>>>>>>>> 464 | 447 s |
>>>>>>>>> 1741 | server:/srv/xfs/test | tcp-4.0 | client   |  117 |   11 |
>>>>>>>> 465 | 478 s |
>>>>>>>>> 1741 | server:/srv/xfs/test | tcp-4.1 | client   |  119 |   12 |
>>>>>>>> 462 | 404 s |
>>>>>>>>> 1741 | server:/srv/xfs/test | tcp-4.2 | client   |  212 |   18 |
>>>>>>>> 363 | 564 s |
>>>>>>>> +------+----------------------+---------+----------+------+------+------+-------+
>>>>>>>>
>>>>>>>>
>>>>>>>> anna@gouda ~ % xfstestsdb show --failure 1741 --color=none
>>>>>>>> +-------------+---------+---------+---------+---------+
>>>>>>>>>     testcase | tcp-3   | tcp-4.0 | tcp-4.1 | tcp-4.2 |
>>>>>>>> +-------------+---------+---------+---------+---------+
>>>>>>>>> generic/053 | passed  | failure | failure | failure |
>>>>>>>>> generic/099 | passed  | failure | failure | failure |
>>>>>>>>> generic/105 | passed  | failure | failure | failure |
>>>>>>>>> generic/140 | skipped | skipped | skipped | failure |
>>>>>>>>> generic/188 | skipped | skipped | skipped | failure |
>>>>>>>>> generic/258 | failure | passed  | passed  | failure |
>>>>>>>>> generic/294 | failure | failure | failure | failure |
>>>>>>>>> generic/318 | passed  | failure | failure | failure |
>>>>>>>>> generic/319 | passed  | failure | failure | failure |
>>>>>>>>> generic/357 | skipped | skipped | skipped | failure |
>>>>>>>>> generic/444 | failure | failure | failure | failure |
>>>>>>>>> generic/465 | passed  | failure | failure | failure |
>>>>>>>>> generic/513 | skipped | skipped | skipped | failure |
>>>>>>>>> generic/529 | passed  | failure | failure | failure |
>>>>>>>>> generic/604 | passed  | passed  | failure | passed  |
>>>>>>>>> generic/675 | skipped | skipped | skipped | failure |
>>>>>>>>> generic/688 | skipped | skipped | skipped | failure |
>>>>>>>>> generic/697 | passed  | failure | failure | failure |
>>>>>>>>>      nfs/002 | failure | failure | failure | failure |
>>>>>>>> +-------------+---------+---------+---------+---------+
>>>>>>>>
>>>>>>>>
>>>>>>>>>>> With NFSv4.2, v6.4.0 has 2 extra failures that the current
>>>>>>>>>>> mainline
>>>>>>>>>>> kernel doesn't:
>>>>>>>>>>>
>>>>>>>>>>>     generic/193 (some sort of setattr problem)
>>>>>>>>>>>     generic/528 (known problem with btime handling in client
>>>>>>>>>>> that has been fixed)
>>>>>>>>>>>
>>>>>>>>>>> While I haven't investigated, I'm assuming the 193 bug is also
>>>>>>>>>>> something
>>>>>>>>>>> that has been fixed in recent kernels. There are also 3 other
>>>>>>>>>>> NFSv3
>>>>>>>>>>> tests that started passing since v6.4.0. I haven't looked into
>>>>>>>>>>> those.
>>>>>>>>>>>
>>>>>>>>>>> With the linux-next kernel there are 2 new regressions:
>>>>>>>>>>>
>>>>>>>>>>>     generic/683
>>>>>>>>>>>     generic/684
>>>>>>>>>>>
>>>>>>>>>>> Both of these look like problems with setuid/setgid stripping,
>>>>>>>>>>> and still
>>>>>>>>>>> need to be investigated. I have more verbose result info on
>>>>>>>>>>> the test
>>>>>>>>>>> failures if anyone is interested.
>>>>>>>> Interesting that I'm not seeing the 683 & 684 failures. What type of
>>>>>>>> filesystem is your server exporting?
>>>>>>>>
>>>>>>> btrfs
>>>>>>>
>>>>>>> You are testing linux-next? I need to go back and confirm these
>>>>>>> results
>>>>>>> too.
>>>>>> IMO linux-next is quite important : we keep hitting bugs that
>>>>>> appear only after integration -- block and network changes in
>>>>>> other trees especially can impact the NFS drivers.
>>>>>>
>>>>> Indeed, I suspect this is probably something from the vfs tree (though
>>>>> we definitely need to confirm that). Today I'm testing:
>>>>>
>>>>>       6.5.0-rc6-next-20230817-g47762f086974
>>>>>
>>>> Nope, I was wrong. I ran a bisect and it landed here. I confirmed it by
>>>> turning off leases on the nfs server and the test started passing. I
>>>> probably won't have the cycles to chase this down further.
>>>>
>>>> The capture looks something like this:
>>>>
>>>> OPEN (get a write delegation
>>>> WRITE
>>>> CLOSE
>>>> SETATTR (mode 06666)
>>>>
>>>> ...then presumably a task on the client opens the file again, but the
>>>> setuid bits don't get stripped.

OPEN (get a write delegation)
WRITE
CLOSE
SETATTR (mode 06666)

The client continues with:

(ALLOCATE,GETATTR)  <<===  this is when the server stripped the SUID and SGID bits
READDIR             ====>  file mode shows 0666  (SUID & SGID were stripped)
READDIR             ====>  file mode shows 0666  (SUID & SGID were stripped)
DELERETURN

Here is the stack trace of the ALLOCATE during which the SUID & SGID bits
were stripped:

**** start of notify_change, notice the i_mode bits, SUID & SGID were set:
[notify_change]: d_iname[a] ia_valid[0x1a00] ia_mode[0x0] i_mode[0x8db6] [nfsd:2409:Mon Aug 21 23:05:31 2023]
                         KILL[0] KILL_SUID[1] KILL_SGID[1]

**** end of notify_change, notice the i_mode bits, SUID & SGID were stripped:
[notify_change]: RET[0] d_iname[a] ia_valid[0x1a01] ia_mode[0x81b6] i_mode[0x81b6] [nfsd:2409:Mon Aug 21 23:05:31 2023]

**** stack trace showing the notify_change call came from ALLOCATE:
Returning from:  0xffffffffb726e764 : notify_change+0x4/0x500 [kernel]
Returning to  :  0xffffffffb726bf99 : __file_remove_privs+0x119/0x170 [kernel]
  0xffffffffb726cfad : file_modified_flags+0x4d/0x110 [kernel]
  0xffffffffc0a2330b : xfs_file_fallocate+0xfb/0x490 [xfs]
  0xffffffffb723e7d8 : vfs_fallocate+0x158/0x380 [kernel]
  0xffffffffc0ddc30a : nfsd4_vfs_fallocate+0x4a/0x70 [nfsd]
  0xffffffffc0def7f2 : nfsd4_allocate+0x72/0xc0 [nfsd]
  0xffffffffc0df2663 : nfsd4_proc_compound+0x3d3/0x730 [nfsd]
  0xffffffffc0dd633b : nfsd_dispatch+0xab/0x1d0 [nfsd]
  0xffffffffc0bda476 : svc_process_common+0x306/0x6e0 [sunrpc]
  0xffffffffc0bdb081 : svc_process+0x131/0x180 [sunrpc]
  0xffffffffc0dd4864 : nfsd+0x84/0xd0 [nfsd]
  0xffffffffb6f0bfd6 : kthread+0xe6/0x120 [kernel]
  0xffffffffb6e587d4 : ret_from_fork+0x34/0x50 [kernel]
  0xffffffffb6e03a3b : ret_from_fork_asm+0x1b/0x30 [kernel]

I think the problem here is that the client does not update the file
attributes after ALLOCATE. The GETATTR in the ALLOCATE compound does
not include the mode bits.

The READDIR replies show the test file's mode with the SUID & SGID bits
stripped (0666), but apparently these were not used to update the file
attributes.

The test passes when the server does not grant a write delegation because:

OPEN
WRITE
CLOSE
SETATTR (06666)
OPEN (CLAIM_FH, NOCREATE)
ALLOCATE        <<=== server clears SUID & SGID
GETATTR, CLOSE  <<=== GETATTR shows mode 0666; client updates the file attributes
READDIR
READDIR

As expected, if the server recalls the write delegation on a SETATTR
with SUID/SGID set, then the test passes. This is because the recall
forces the client to send the 2nd OPEN with CLAIM_FH, NOCREATE and then
the (GETATTR, CLOSE), which causes the client to update the file
attributes.

-Dai

P.S. I have the pcaps of the pass, fail, and fixed cases if anyone
wants to see them. I ran the tests with generic/683.

>>>>
>>>> I think either the client will need to strip these bits on a delegated
>>>> open, or we'll need to recall write delegations from the client when it
>>>> tries to do a SETATTR with a mode that could later end up needing to be
>>>> stripped on a subsequent open:
>>>>
>>>> 66ce3e3b98a7a9e970ea463a7f7dc0575c0a244b is the first bad commit
>>>> commit 66ce3e3b98a7a9e970ea463a7f7dc0575c0a244b
>>>> Author: Dai Ngo <dai.ngo@oracle.com>
>>>> Date:   Thu Jun 29 18:52:40 2023 -0700
>>>>
>>>>       NFSD: Enable write delegation support
>>> The SETATTR should cause the delegation to be recalled. However, I think
>>> there is an optimization on server that skips the recall if the SETATTR
>>> comes from the same client that has the delegation.
>> The optimization on the server was done by this commit:
>>
>> 28df3d1539de nfsd: clients don't need to break their own delegations
>>
>> Perhaps we should allow this optimization for read delegation only?
>>
>> Or should the NFS client be responsible for handling the SETATTR and
>> and local OPEN on the file that has write delegation granted?
>>
> I think that setuid/setgid files are really a special case.
>
> We already avoid giving out delegations on setuid/gid files. What we're
> not doing currently is revoking the write delegation if the holder tries
> to set a mode that involves a setuid/gid bit. If we add that, then that
> should close the hole, I think.
>

* Re: xfstests results over NFS
  2023-08-22 16:07                     ` dai.ngo
@ 2023-08-22 17:02                       ` Jeff Layton
  2023-08-22 19:51                         ` dai.ngo
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Layton @ 2023-08-22 17:02 UTC (permalink / raw)
  To: dai.ngo, Chuck Lever III
  Cc: Anna Schumaker, Trond Myklebust, Linux NFS Mailing List,
	Neil Brown, Kornievskaia, Olga, Tom Talpey

On Tue, 2023-08-22 at 09:07 -0700, dai.ngo@oracle.com wrote:
> On 8/17/23 4:08 PM, Jeff Layton wrote:
> > On Thu, 2023-08-17 at 15:59 -0700, dai.ngo@oracle.com wrote:
> > > On 8/17/23 3:23 PM, dai.ngo@oracle.com wrote:
> > > > On 8/17/23 2:07 PM, Jeff Layton wrote:
> > > > > On Thu, 2023-08-17 at 13:15 -0400, Jeff Layton wrote:
> > > > > > On Thu, 2023-08-17 at 16:31 +0000, Chuck Lever III wrote:
> > > > > > > > On Aug 17, 2023, at 12:27 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > > > 
> > > > > > > > On Thu, 2023-08-17 at 11:17 -0400, Anna Schumaker wrote:
> > > > > > > > > On Thu, Aug 17, 2023 at 10:22 AM Jeff Layton <jlayton@kernel.org>
> > > > > > > > > wrote:
> > > > > > > > > > On Thu, 2023-08-17 at 14:04 +0000, Chuck Lever III wrote:
> > > > > > > > > > > > On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@kernel.org>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > I finally got my kdevops
> > > > > > > > > > > > (https://github.com/linux-kdevops/kdevops) test
> > > > > > > > > > > > rig working well enough to get some publishable results. To
> > > > > > > > > > > > run fstests,
> > > > > > > > > > > > kdevops will spin up a server and (in this case) 2 clients to run
> > > > > > > > > > > > xfstests' auto group. One client mounts with default options,
> > > > > > > > > > > > and the
> > > > > > > > > > > > other uses NFSv3.
> > > > > > > > > > > > 
> > > > > > > > > > > > I tested 3 kernels:
> > > > > > > > > > > > 
> > > > > > > > > > > > v6.4.0 (stock release)
> > > > > > > > > > > > 6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
> > > > > > > > > > > > 6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of
> > > > > > > > > > > > yesterday morning)
> > > > > > > > > > > > 
> > > > > > > > > > > > Here are the results summary of all 3:
> > > > > > > > > > > > 
> > > > > > > > > > > > KERNEL:    6.4.0
> > > > > > > > > > > > CPUS:      8
> > > > > > > > > > > > 
> > > > > > > > > > > > nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
> > > > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/124
> > > > > > > > > > > >     generic/193 generic/258 generic/294 generic/318 generic/319
> > > > > > > > > > > >     generic/444 generic/528 generic/529
> > > > > > > > > > > > nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
> > > > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > > > > > > > >     generic/187 generic/193 generic/294 generic/318 generic/319
> > > > > > > > > > > >     generic/357 generic/444 generic/486 generic/513 generic/528
> > > > > > > > > > > >     generic/529 generic/578 generic/675 generic/688
> > > > > > > > > > > > Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s
> > > > > > > > > > > > 
> > > > > > > > > > > > KERNEL:    6.5.0-rc6-g4853c74bd7ab
> > > > > > > > > > > > CPUS:      8
> > > > > > > > > > > > 
> > > > > > > > > > > > nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
> > > > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/258
> > > > > > > > > > > >     generic/294 generic/318 generic/319 generic/444 generic/529
> > > > > > > > > > > > nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
> > > > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > > > > > > > >     generic/187 generic/294 generic/318 generic/319 generic/357
> > > > > > > > > > > >     generic/444 generic/486 generic/513 generic/529 generic/578
> > > > > > > > > > > >     generic/675 generic/688
> > > > > > > > > > > > Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s
> > > > > > > > > > > > 
> > > > > > > > > > > > KERNEL:    6.5.0-rc6-next-20230816-gef66bf8aeb91
> > > > > > > > > > > > CPUS:      8
> > > > > > > > > > > > 
> > > > > > > > > > > > nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
> > > > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/258
> > > > > > > > > > > >     generic/294 generic/318 generic/319 generic/444 generic/529
> > > > > > > > > > > > nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
> > > > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > > > > > > > >     generic/187 generic/294 generic/318 generic/319 generic/357
> > > > > > > > > > > >     generic/444 generic/486 generic/513 generic/529 generic/578
> > > > > > > > > > > >     generic/675 generic/683 generic/684 generic/688
> > > > > > > > > > > > Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s
> > > > > > > > > As long as we're sharing results ... here is what I'm seeing with a
> > > > > > > > > 6.5-rc6 client & server:
> > > > > > > > > 
> > > > > > > > > anna@gouda ~ % xfstestsdb xunit list --results --runid 1741
> > > > > > > > > --color=none
> > > > > > > > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > > > > > > 
> > > > > > > > > > run | device               | xunit   | hostname | pass | fail |
> > > > > > > > > skip |  time |
> > > > > > > > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > > > > > > 
> > > > > > > > > > 1741 | server:/srv/xfs/test | tcp-3   | client   |  125 |    4 |
> > > > > > > > > 464 | 447 s |
> > > > > > > > > > 1741 | server:/srv/xfs/test | tcp-4.0 | client   |  117 |   11 |
> > > > > > > > > 465 | 478 s |
> > > > > > > > > > 1741 | server:/srv/xfs/test | tcp-4.1 | client   |  119 |   12 |
> > > > > > > > > 462 | 404 s |
> > > > > > > > > > 1741 | server:/srv/xfs/test | tcp-4.2 | client   |  212 |   18 |
> > > > > > > > > 363 | 564 s |
> > > > > > > > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > anna@gouda ~ % xfstestsdb show --failure 1741 --color=none
> > > > > > > > > +-------------+---------+---------+---------+---------+
> > > > > > > > > >     testcase | tcp-3   | tcp-4.0 | tcp-4.1 | tcp-4.2 |
> > > > > > > > > +-------------+---------+---------+---------+---------+
> > > > > > > > > > generic/053 | passed  | failure | failure | failure |
> > > > > > > > > > generic/099 | passed  | failure | failure | failure |
> > > > > > > > > > generic/105 | passed  | failure | failure | failure |
> > > > > > > > > > generic/140 | skipped | skipped | skipped | failure |
> > > > > > > > > > generic/188 | skipped | skipped | skipped | failure |
> > > > > > > > > > generic/258 | failure | passed  | passed  | failure |
> > > > > > > > > > generic/294 | failure | failure | failure | failure |
> > > > > > > > > > generic/318 | passed  | failure | failure | failure |
> > > > > > > > > > generic/319 | passed  | failure | failure | failure |
> > > > > > > > > > generic/357 | skipped | skipped | skipped | failure |
> > > > > > > > > > generic/444 | failure | failure | failure | failure |
> > > > > > > > > > generic/465 | passed  | failure | failure | failure |
> > > > > > > > > > generic/513 | skipped | skipped | skipped | failure |
> > > > > > > > > > generic/529 | passed  | failure | failure | failure |
> > > > > > > > > > generic/604 | passed  | passed  | failure | passed  |
> > > > > > > > > > generic/675 | skipped | skipped | skipped | failure |
> > > > > > > > > > generic/688 | skipped | skipped | skipped | failure |
> > > > > > > > > > generic/697 | passed  | failure | failure | failure |
> > > > > > > > > >      nfs/002 | failure | failure | failure | failure |
> > > > > > > > > +-------------+---------+---------+---------+---------+
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > > > With NFSv4.2, v6.4.0 has 2 extra failures that the current
> > > > > > > > > > > > mainline
> > > > > > > > > > > > kernel doesn't:
> > > > > > > > > > > > 
> > > > > > > > > > > >     generic/193 (some sort of setattr problem)
> > > > > > > > > > > >     generic/528 (known problem with btime handling in client
> > > > > > > > > > > > that has been fixed)
> > > > > > > > > > > > 
> > > > > > > > > > > > While I haven't investigated, I'm assuming the 193 bug is also
> > > > > > > > > > > > something
> > > > > > > > > > > > that has been fixed in recent kernels. There are also 3 other
> > > > > > > > > > > > NFSv3
> > > > > > > > > > > > tests that started passing since v6.4.0. I haven't looked into
> > > > > > > > > > > > those.
> > > > > > > > > > > > 
> > > > > > > > > > > > With the linux-next kernel there are 2 new regressions:
> > > > > > > > > > > > 
> > > > > > > > > > > >     generic/683
> > > > > > > > > > > >     generic/684
> > > > > > > > > > > > 
> > > > > > > > > > > > Both of these look like problems with setuid/setgid stripping,
> > > > > > > > > > > > and still
> > > > > > > > > > > > need to be investigated. I have more verbose result info on
> > > > > > > > > > > > the test
> > > > > > > > > > > > failures if anyone is interested.
> > > > > > > > > Interesting that I'm not seeing the 683 & 684 failures. What type of
> > > > > > > > > filesystem is your server exporting?
> > > > > > > > > 
> > > > > > > > btrfs
> > > > > > > > 
> > > > > > > > You are testing linux-next? I need to go back and confirm these
> > > > > > > > results
> > > > > > > > too.
> > > > > > > IMO linux-next is quite important : we keep hitting bugs that
> > > > > > > appear only after integration -- block and network changes in
> > > > > > > other trees especially can impact the NFS drivers.
> > > > > > > 
> > > > > > Indeed, I suspect this is probably something from the vfs tree (though
> > > > > > we definitely need to confirm that). Today I'm testing:
> > > > > > 
> > > > > >       6.5.0-rc6-next-20230817-g47762f086974
> > > > > > 
> > > > > Nope, I was wrong. I ran a bisect and it landed here. I confirmed it by
> > > > > turning off leases on the nfs server and the test started passing. I
> > > > > probably won't have the cycles to chase this down further.
> > > > > 
> > > > > The capture looks something like this:
> > > > > 
> > > > > OPEN (get a write delegation
> > > > > WRITE
> > > > > CLOSE
> > > > > SETATTR (mode 06666)
> > > > > 
> > > > > ...then presumably a task on the client opens the file again, but the
> > > > > setuid bits don't get stripped.
> 
> OPEN (get a write delegation)
> WRITE
> CLOSE
> SETATTR (mode 06666)
> 
> The client continues with:
> 
> (ALLOCATE,GETATTR)  <<===  this is when the server stripped the SUID and SGID bit
> READDIR             ====>  file mode shows 0666  (SUID & SGID were stripped)
> READDIR             ====>  file mode shows 0666  (SUID & SGID were stripped)
> DELERETURN
> 
> Here is stack trace of ALLOCATE when the SUID & SGID were stripped:
> 
> **** start of notify_change, notice the i_mode bits, SUID & SGID were set:
> [notify_change]: d_iname[a] ia_valid[0x1a00] ia_mode[0x0] i_mode[0x8db6] [nfsd:2409:Mon Aug 21 23:05:31 2023]
>                          KILL[0] KILL_SUID[1] KILL_SGID[1]
> 
> **** end of notify_change, notice the i_mode bits, SUID & SGID were stripped:
> [notify_change]: RET[0] d_iname[a] ia_valid[0x1a01] ia_mode[0x81b6] i_mode[0x81b6] [nfsd:2409:Mon Aug 21 23:05:31 2023]
> 
> **** stack trace of notify_change comes from ALLOCATE:
> Returning from:  0xffffffffb726e764 : notify_change+0x4/0x500 [kernel]
> Returning to  :  0xffffffffb726bf99 : __file_remove_privs+0x119/0x170 [kernel]
>   0xffffffffb726cfad : file_modified_flags+0x4d/0x110 [kernel]
>   0xffffffffc0a2330b : xfs_file_fallocate+0xfb/0x490 [xfs]
>   0xffffffffb723e7d8 : vfs_fallocate+0x158/0x380 [kernel]
>   0xffffffffc0ddc30a : nfsd4_vfs_fallocate+0x4a/0x70 [nfsd]
>   0xffffffffc0def7f2 : nfsd4_allocate+0x72/0xc0 [nfsd]
>   0xffffffffc0df2663 : nfsd4_proc_compound+0x3d3/0x730 [nfsd]
>   0xffffffffc0dd633b : nfsd_dispatch+0xab/0x1d0 [nfsd]
>   0xffffffffc0bda476 : svc_process_common+0x306/0x6e0 [sunrpc]
>   0xffffffffc0bdb081 : svc_process+0x131/0x180 [sunrpc]
>   0xffffffffc0dd4864 : nfsd+0x84/0xd0 [nfsd]
>   0xffffffffb6f0bfd6 : kthread+0xe6/0x120 [kernel]
>   0xffffffffb6e587d4 : ret_from_fork+0x34/0x50 [kernel]
>   0xffffffffb6e03a3b : ret_from_fork_asm+0x1b/0x30 [kernel]
> 
> I think the problem here is that the client does not update the file
> attribute after ALLOCATE. The GETATTR in the ALLOCATE compound does
> not include the mode bits.
> 

Oh, interesting! Have you tried adding FATTR4_MODE to that GETATTR
call on the client? Does that also fix it?

> The READDIR replies show the test file's mode with the SUID & SGID bits
> stripped (0666), but apparently these were not used to update the file
> attributes.
> 
> The test passes when server does not grant write delegation because:
> 
> OPEN
> WRITE
> CLOSE
> SETATTR (06666)
> OPEN (CLAIM_FH, NOCREATE)
> ALLOCATE        <<=== server clear SUID & SGID
> GETATTR, CLOSE  <<=== GETATTR has mode bit as 0666, client updates file attribute
> READDIR
> READDIR
> 
> As expected, if the server recalls the write delegation when SETATTR
> with SUID/SGID set then the test passes. This is because it forces the
> client to send the 2nd OPEN with CLAIM_FH, NOCREATE and then the
> (GETATTR, CLOSE) which cause the client to update the file attribute.
> 

What's your sense of the best way to fix this? The stripping of mode
bits isn't covered by the NFSv4 spec, so this will ultimately come down
to a judgment call.
-- 
Jeff Layton <jlayton@kernel.org>

* Re: xfstests results over NFS
  2023-08-22 17:02                       ` Jeff Layton
@ 2023-08-22 19:51                         ` dai.ngo
  2023-08-22 23:15                           ` Jeff Layton
  0 siblings, 1 reply; 16+ messages in thread
From: dai.ngo @ 2023-08-22 19:51 UTC (permalink / raw)
  To: Jeff Layton, Chuck Lever III
  Cc: Anna Schumaker, Trond Myklebust, Linux NFS Mailing List,
	Neil Brown, Kornievskaia, Olga, Tom Talpey


On 8/22/23 10:02 AM, Jeff Layton wrote:
> On Tue, 2023-08-22 at 09:07 -0700, dai.ngo@oracle.com wrote:
>> On 8/17/23 4:08 PM, Jeff Layton wrote:
>>> On Thu, 2023-08-17 at 15:59 -0700, dai.ngo@oracle.com wrote:
>>>> On 8/17/23 3:23 PM, dai.ngo@oracle.com wrote:
>>>>> On 8/17/23 2:07 PM, Jeff Layton wrote:
>>>>>> On Thu, 2023-08-17 at 13:15 -0400, Jeff Layton wrote:
>>>>>>> On Thu, 2023-08-17 at 16:31 +0000, Chuck Lever III wrote:
>>>>>>>>> On Aug 17, 2023, at 12:27 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>>>>>
>>>>>>>>> On Thu, 2023-08-17 at 11:17 -0400, Anna Schumaker wrote:
>>>>>>>>>> On Thu, Aug 17, 2023 at 10:22 AM Jeff Layton <jlayton@kernel.org>
>>>>>>>>>> wrote:
>>>>>>>>>>> On Thu, 2023-08-17 at 14:04 +0000, Chuck Lever III wrote:
>>>>>>>>>>>>> On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@kernel.org>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I finally got my kdevops
>>>>>>>>>>>>> (https://github.com/linux-kdevops/kdevops) test
>>>>>>>>>>>>> rig working well enough to get some publishable results. To
>>>>>>>>>>>>> run fstests,
>>>>>>>>>>>>> kdevops will spin up a server and (in this case) 2 clients to run
>>>>>>>>>>>>> xfstests' auto group. One client mounts with default options,
>>>>>>>>>>>>> and the
>>>>>>>>>>>>> other uses NFSv3.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I tested 3 kernels:
>>>>>>>>>>>>>
>>>>>>>>>>>>> v6.4.0 (stock release)
>>>>>>>>>>>>> 6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
>>>>>>>>>>>>> 6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of
>>>>>>>>>>>>> yesterday morning)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here are the results summary of all 3:
>>>>>>>>>>>>>
>>>>>>>>>>>>> KERNEL:    6.4.0
>>>>>>>>>>>>> CPUS:      8
>>>>>>>>>>>>>
>>>>>>>>>>>>> nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
>>>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/124
>>>>>>>>>>>>>      generic/193 generic/258 generic/294 generic/318 generic/319
>>>>>>>>>>>>>      generic/444 generic/528 generic/529
>>>>>>>>>>>>> nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
>>>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>>>>>>>>>      generic/187 generic/193 generic/294 generic/318 generic/319
>>>>>>>>>>>>>      generic/357 generic/444 generic/486 generic/513 generic/528
>>>>>>>>>>>>>      generic/529 generic/578 generic/675 generic/688
>>>>>>>>>>>>> Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s
>>>>>>>>>>>>>
>>>>>>>>>>>>> KERNEL:    6.5.0-rc6-g4853c74bd7ab
>>>>>>>>>>>>> CPUS:      8
>>>>>>>>>>>>>
>>>>>>>>>>>>> nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
>>>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/258
>>>>>>>>>>>>>      generic/294 generic/318 generic/319 generic/444 generic/529
>>>>>>>>>>>>> nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
>>>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>>>>>>>>>      generic/187 generic/294 generic/318 generic/319 generic/357
>>>>>>>>>>>>>      generic/444 generic/486 generic/513 generic/529 generic/578
>>>>>>>>>>>>>      generic/675 generic/688
>>>>>>>>>>>>> Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s
>>>>>>>>>>>>>
>>>>>>>>>>>>> KERNEL:    6.5.0-rc6-next-20230816-gef66bf8aeb91
>>>>>>>>>>>>> CPUS:      8
>>>>>>>>>>>>>
>>>>>>>>>>>>> nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
>>>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/258
>>>>>>>>>>>>>      generic/294 generic/318 generic/319 generic/444 generic/529
>>>>>>>>>>>>> nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
>>>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>>>>>>>>>      generic/187 generic/294 generic/318 generic/319 generic/357
>>>>>>>>>>>>>      generic/444 generic/486 generic/513 generic/529 generic/578
>>>>>>>>>>>>>      generic/675 generic/683 generic/684 generic/688
>>>>>>>>>>>>> Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s
>>>>>>>>>> As long as we're sharing results ... here is what I'm seeing with a
>>>>>>>>>> 6.5-rc6 client & server:
>>>>>>>>>>
>>>>>>>>>> anna@gouda ~ % xfstestsdb xunit list --results --runid 1741
>>>>>>>>>> --color=none
>>>>>>>>>> +------+----------------------+---------+----------+------+------+------+-------+
>>>>>>>>>>
>>>>>>>>>>> run | device               | xunit   | hostname | pass | fail |
>>>>>>>>>> skip |  time |
>>>>>>>>>> +------+----------------------+---------+----------+------+------+------+-------+
>>>>>>>>>>
>>>>>>>>>>> 1741 | server:/srv/xfs/test | tcp-3   | client   |  125 |    4 |
>>>>>>>>>> 464 | 447 s |
>>>>>>>>>>> 1741 | server:/srv/xfs/test | tcp-4.0 | client   |  117 |   11 |
>>>>>>>>>> 465 | 478 s |
>>>>>>>>>>> 1741 | server:/srv/xfs/test | tcp-4.1 | client   |  119 |   12 |
>>>>>>>>>> 462 | 404 s |
>>>>>>>>>>> 1741 | server:/srv/xfs/test | tcp-4.2 | client   |  212 |   18 |
>>>>>>>>>> 363 | 564 s |
>>>>>>>>>> +------+----------------------+---------+----------+------+------+------+-------+
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> anna@gouda ~ % xfstestsdb show --failure 1741 --color=none
>>>>>>>>>> +-------------+---------+---------+---------+---------+
>>>>>>>>>>>      testcase | tcp-3   | tcp-4.0 | tcp-4.1 | tcp-4.2 |
>>>>>>>>>> +-------------+---------+---------+---------+---------+
>>>>>>>>>>> generic/053 | passed  | failure | failure | failure |
>>>>>>>>>>> generic/099 | passed  | failure | failure | failure |
>>>>>>>>>>> generic/105 | passed  | failure | failure | failure |
>>>>>>>>>>> generic/140 | skipped | skipped | skipped | failure |
>>>>>>>>>>> generic/188 | skipped | skipped | skipped | failure |
>>>>>>>>>>> generic/258 | failure | passed  | passed  | failure |
>>>>>>>>>>> generic/294 | failure | failure | failure | failure |
>>>>>>>>>>> generic/318 | passed  | failure | failure | failure |
>>>>>>>>>>> generic/319 | passed  | failure | failure | failure |
>>>>>>>>>>> generic/357 | skipped | skipped | skipped | failure |
>>>>>>>>>>> generic/444 | failure | failure | failure | failure |
>>>>>>>>>>> generic/465 | passed  | failure | failure | failure |
>>>>>>>>>>> generic/513 | skipped | skipped | skipped | failure |
>>>>>>>>>>> generic/529 | passed  | failure | failure | failure |
>>>>>>>>>>> generic/604 | passed  | passed  | failure | passed  |
>>>>>>>>>>> generic/675 | skipped | skipped | skipped | failure |
>>>>>>>>>>> generic/688 | skipped | skipped | skipped | failure |
>>>>>>>>>>> generic/697 | passed  | failure | failure | failure |
>>>>>>>>>>>       nfs/002 | failure | failure | failure | failure |
>>>>>>>>>> +-------------+---------+---------+---------+---------+
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> With NFSv4.2, v6.4.0 has 2 extra failures that the current
>>>>>>>>>>>>> mainline kernel doesn't have:
>>>>>>>>>>>>>
>>>>>>>>>>>>>      generic/193 (some sort of setattr problem)
>>>>>>>>>>>>>      generic/528 (known problem with btime handling in client
>>>>>>>>>>>>> that has been fixed)
>>>>>>>>>>>>>
>>>>>>>>>>>>> While I haven't investigated, I'm assuming the 193 bug is also
>>>>>>>>>>>>> something
>>>>>>>>>>>>> that has been fixed in recent kernels. There are also 3 other
>>>>>>>>>>>>> NFSv3
>>>>>>>>>>>>> tests that started passing since v6.4.0. I haven't looked into
>>>>>>>>>>>>> those.
>>>>>>>>>>>>>
>>>>>>>>>>>>> With the linux-next kernel there are 2 new regressions:
>>>>>>>>>>>>>
>>>>>>>>>>>>>      generic/683
>>>>>>>>>>>>>      generic/684
>>>>>>>>>>>>>
>>>>>>>>>>>>> Both of these look like problems with setuid/setgid stripping,
>>>>>>>>>>>>> and still
>>>>>>>>>>>>> need to be investigated. I have more verbose result info on
>>>>>>>>>>>>> the test
>>>>>>>>>>>>> failures if anyone is interested.
>>>>>>>>>> Interesting that I'm not seeing the 683 & 684 failures. What type of
>>>>>>>>>> filesystem is your server exporting?
>>>>>>>>>>
>>>>>>>>> btrfs
>>>>>>>>>
>>>>>>>>> You are testing linux-next? I need to go back and confirm these
>>>>>>>>> results
>>>>>>>>> too.
>>>>>>>> IMO linux-next is quite important: we keep hitting bugs that
>>>>>>>> appear only after integration -- block and network changes in
>>>>>>>> other trees especially can impact the NFS drivers.
>>>>>>>>
>>>>>>> Indeed, I suspect this is probably something from the vfs tree (though
>>>>>>> we definitely need to confirm that). Today I'm testing:
>>>>>>>
>>>>>>>        6.5.0-rc6-next-20230817-g47762f086974
>>>>>>>
>>>>>> Nope, I was wrong. I ran a bisect and it landed here. I confirmed it by
>>>>>> turning off leases on the nfs server and the test started passing. I
>>>>>> probably won't have the cycles to chase this down further.
>>>>>>
>>>>>> The capture looks something like this:
>>>>>>
>>>>>> OPEN (get a write delegation)
>>>>>> WRITE
>>>>>> CLOSE
>>>>>> SETATTR (mode 06666)
>>>>>>
>>>>>> ...then presumably a task on the client opens the file again, but the
>>>>>> setuid bits don't get stripped.
>> OPEN (get a write delegation)
>> WRITE
>> CLOSE
>> SETATTR (mode 06666)
>>
>> The client continues with:
>>
>> (ALLOCATE,GETATTR)  <<===  this is when the server stripped the SUID and SGID bits
>> READDIR             ====>  file mode shows 0666  (SUID & SGID were stripped)
>> READDIR             ====>  file mode shows 0666  (SUID & SGID were stripped)
>> DELERETURN
>>
>> Here is the stack trace of the ALLOCATE when the SUID & SGID were stripped:
>>
>> **** start of notify_change, notice the i_mode bits, SUID & SGID were set:
>> [notify_change]: d_iname[a] ia_valid[0x1a00] ia_mode[0x0] i_mode[0x8db6] [nfsd:2409:Mon Aug 21 23:05:31 2023]
>>                           KILL[0] KILL_SUID[1] KILL_SGID[1]
>>
>> **** end of notify_change, notice the i_mode bits, SUID & SGID were stripped:
>> [notify_change]: RET[0] d_iname[a] ia_valid[0x1a01] ia_mode[0x81b6] i_mode[0x81b6] [nfsd:2409:Mon Aug 21 23:05:31 2023]
>>
>> **** stack trace of notify_change comes from ALLOCATE:
>> Returning from:  0xffffffffb726e764 : notify_change+0x4/0x500 [kernel]
>> Returning to  :  0xffffffffb726bf99 : __file_remove_privs+0x119/0x170 [kernel]
>>    0xffffffffb726cfad : file_modified_flags+0x4d/0x110 [kernel]
>>    0xffffffffc0a2330b : xfs_file_fallocate+0xfb/0x490 [xfs]
>>    0xffffffffb723e7d8 : vfs_fallocate+0x158/0x380 [kernel]
>>    0xffffffffc0ddc30a : nfsd4_vfs_fallocate+0x4a/0x70 [nfsd]
>>    0xffffffffc0def7f2 : nfsd4_allocate+0x72/0xc0 [nfsd]
>>    0xffffffffc0df2663 : nfsd4_proc_compound+0x3d3/0x730 [nfsd]
>>    0xffffffffc0dd633b : nfsd_dispatch+0xab/0x1d0 [nfsd]
>>    0xffffffffc0bda476 : svc_process_common+0x306/0x6e0 [sunrpc]
>>    0xffffffffc0bdb081 : svc_process+0x131/0x180 [sunrpc]
>>    0xffffffffc0dd4864 : nfsd+0x84/0xd0 [nfsd]
>>    0xffffffffb6f0bfd6 : kthread+0xe6/0x120 [kernel]
>>    0xffffffffb6e587d4 : ret_from_fork+0x34/0x50 [kernel]
>>    0xffffffffb6e03a3b : ret_from_fork_asm+0x1b/0x30 [kernel]
>>
>> I think the problem here is that the client does not update the file
>> attribute after ALLOCATE. The GETATTR in the ALLOCATE compound does
>> not include the mode bits.
>>
> Oh, interesting! Have you tried adding the FATTR4_MODE to that GETATTR
> call on the client? Does it also fix this?

Yes, this is what I'm going to try next.

>
>> The READDIR replies show the test file's mode with the SUID & SGID bits
>> stripped (0666), but apparently these were not used to update the cached
>> file attributes.
>>
>> The test passes when the server does not grant a write delegation because:
>>
>> OPEN
>> WRITE
>> CLOSE
>> SETATTR (06666)
>> OPEN (CLAIM_FH, NOCREATE)
>> ALLOCATE        <<=== server clears SUID & SGID
>> GETATTR, CLOSE  <<=== GETATTR returns mode 0666, client updates the file attributes
>> READDIR
>> READDIR
>>
>> As expected, if the server recalls the write delegation on the SETATTR
>> that sets SUID/SGID, then the test passes. This is because it forces the
>> client to send the 2nd OPEN with CLAIM_FH, NOCREATE and then the
>> (GETATTR, CLOSE), which causes the client to update the file attributes.
>>
> What's your sense of the best way to fix this? The stripping of mode
> bits isn't covered by the NFSv4 spec, so this will ultimately come down
> to a judgment call.

Yes, I did not find anything regarding stripping of SUID/SGID in the NFSv4.2
spec. It's done by the 'fs' layer, and it has been there since April 2005 in
the big merge to Linux-2.6.12-rc2 done by Linus. So I think we should leave
it there.

The stripping makes some sense to me: if the file is being expanded
(to be written to) then it should not be an executable, therefore its
SUID/SGID should be stripped.
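To make that expected semantic concrete, here is a hedged local sketch (not
the xfstests code) of the generic/683-style check: set mode 06666, extend the
file with a write, then look at whether the set-id bits survived. Note the
caveats: a writer holding CAP_FSETID (e.g. root) is allowed to keep the bits,
and on Linux SGID is only cleared when the group-execute bit is also set.

```python
import os
import stat
import tempfile

# Illustrative reproduction on a local filesystem; names and flow are
# assumptions, not taken from xfstests generic/683.
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o6666)          # rw-rw-rw- plus SUID and SGID

with open(path, "a") as f:
    f.write("expand")           # size-changing write, analogous to the ALLOCATE above

mode = stat.S_IMODE(os.stat(path).st_mode)
# For an unprivileged writer, SUID should be gone here; SGID handling
# depends on the group-execute bit and kernel policy. A CAP_FSETID
# holder may legitimately keep both bits.
print(oct(mode))
os.remove(path)
```

The lower nine permission bits are untouched either way; only the set-id
bits are in question.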

-Dai


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: xfstests results over NFS
  2023-08-22 19:51                         ` dai.ngo
@ 2023-08-22 23:15                           ` Jeff Layton
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Layton @ 2023-08-22 23:15 UTC (permalink / raw)
  To: dai.ngo, Chuck Lever III
  Cc: Anna Schumaker, Trond Myklebust, Linux NFS Mailing List,
	Neil Brown, Kornievskaia, Olga, Tom Talpey

On Tue, 2023-08-22 at 12:51 -0700, dai.ngo@oracle.com wrote:
> On 8/22/23 10:02 AM, Jeff Layton wrote:
> > On Tue, 2023-08-22 at 09:07 -0700, dai.ngo@oracle.com wrote:
> > > On 8/17/23 4:08 PM, Jeff Layton wrote:
> > > > On Thu, 2023-08-17 at 15:59 -0700, dai.ngo@oracle.com wrote:
> > > > > On 8/17/23 3:23 PM, dai.ngo@oracle.com wrote:
> > > > > > On 8/17/23 2:07 PM, Jeff Layton wrote:
> > > > > > > On Thu, 2023-08-17 at 13:15 -0400, Jeff Layton wrote:
> > > > > > > > On Thu, 2023-08-17 at 16:31 +0000, Chuck Lever III wrote:
> > > > > > > > > > On Aug 17, 2023, at 12:27 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > > > > > 
> > > > > > > > > > On Thu, 2023-08-17 at 11:17 -0400, Anna Schumaker wrote:
> > > > > > > > > > > On Thu, Aug 17, 2023 at 10:22 AM Jeff Layton <jlayton@kernel.org>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > On Thu, 2023-08-17 at 14:04 +0000, Chuck Lever III wrote:
> > > > > > > > > > > > > > On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@kernel.org>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I finally got my kdevops
> > > > > > > > > > > > > > (https://github.com/linux-kdevops/kdevops) test
> > > > > > > > > > > > > > rig working well enough to get some publishable results. To
> > > > > > > > > > > > > > run fstests,
> > > > > > > > > > > > > > kdevops will spin up a server and (in this case) 2 clients to run
> > > > > > > > > > > > > > xfstests' auto group. One client mounts with default options,
> > > > > > > > > > > > > > and the
> > > > > > > > > > > > > > other uses NFSv3.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I tested 3 kernels:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > v6.4.0 (stock release)
> > > > > > > > > > > > > > 6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
> > > > > > > > > > > > > > 6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of
> > > > > > > > > > > > > > yesterday morning)
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Here are the results summary of all 3:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > KERNEL:    6.4.0
> > > > > > > > > > > > > > CPUS:      8
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
> > > > > > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/124
> > > > > > > > > > > > > >      generic/193 generic/258 generic/294 generic/318 generic/319
> > > > > > > > > > > > > >      generic/444 generic/528 generic/529
> > > > > > > > > > > > > > nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
> > > > > > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > > > > > > > > > >      generic/187 generic/193 generic/294 generic/318 generic/319
> > > > > > > > > > > > > >      generic/357 generic/444 generic/486 generic/513 generic/528
> > > > > > > > > > > > > >      generic/529 generic/578 generic/675 generic/688
> > > > > > > > > > > > > > Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > KERNEL:    6.5.0-rc6-g4853c74bd7ab
> > > > > > > > > > > > > > CPUS:      8
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
> > > > > > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/258
> > > > > > > > > > > > > >      generic/294 generic/318 generic/319 generic/444 generic/529
> > > > > > > > > > > > > > nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
> > > > > > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > > > > > > > > > >      generic/187 generic/294 generic/318 generic/319 generic/357
> > > > > > > > > > > > > >      generic/444 generic/486 generic/513 generic/529 generic/578
> > > > > > > > > > > > > >      generic/675 generic/688
> > > > > > > > > > > > > > Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > KERNEL:    6.5.0-rc6-next-20230816-gef66bf8aeb91
> > > > > > > > > > > > > > CPUS:      8
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
> > > > > > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/258
> > > > > > > > > > > > > >      generic/294 generic/318 generic/319 generic/444 generic/529
> > > > > > > > > > > > > > nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
> > > > > > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > > > > > > > > > >      generic/187 generic/294 generic/318 generic/319 generic/357
> > > > > > > > > > > > > >      generic/444 generic/486 generic/513 generic/529 generic/578
> > > > > > > > > > > > > >      generic/675 generic/683 generic/684 generic/688
> > > > > > > > > > > > > > Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s
> > > > > > > > > > > As long as we're sharing results ... here is what I'm seeing with a
> > > > > > > > > > > 6.5-rc6 client & server:
> > > > > > > > > > > 
> > > > > > > > > > > anna@gouda ~ % xfstestsdb xunit list --results --runid 1741
> > > > > > > > > > > --color=none
> > > > > > > > > > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > > > > > > > > 
> > > > > > > > > > > > run | device               | xunit   | hostname | pass | fail | skip |  time |
> > > > > > > > > > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > > > > > > > > > 1741 | server:/srv/xfs/test | tcp-3   | client   |  125 |    4 |  464 | 447 s |
> > > > > > > > > > > > 1741 | server:/srv/xfs/test | tcp-4.0 | client   |  117 |   11 |  465 | 478 s |
> > > > > > > > > > > > 1741 | server:/srv/xfs/test | tcp-4.1 | client   |  119 |   12 |  462 | 404 s |
> > > > > > > > > > > > 1741 | server:/srv/xfs/test | tcp-4.2 | client   |  212 |   18 |  363 | 564 s |
> > > > > > > > > > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > anna@gouda ~ % xfstestsdb show --failure 1741 --color=none
> > > > > > > > > > > +-------------+---------+---------+---------+---------+
> > > > > > > > > > > >      testcase | tcp-3   | tcp-4.0 | tcp-4.1 | tcp-4.2 |
> > > > > > > > > > > +-------------+---------+---------+---------+---------+
> > > > > > > > > > > > generic/053 | passed  | failure | failure | failure |
> > > > > > > > > > > > generic/099 | passed  | failure | failure | failure |
> > > > > > > > > > > > generic/105 | passed  | failure | failure | failure |
> > > > > > > > > > > > generic/140 | skipped | skipped | skipped | failure |
> > > > > > > > > > > > generic/188 | skipped | skipped | skipped | failure |
> > > > > > > > > > > > generic/258 | failure | passed  | passed  | failure |
> > > > > > > > > > > > generic/294 | failure | failure | failure | failure |
> > > > > > > > > > > > generic/318 | passed  | failure | failure | failure |
> > > > > > > > > > > > generic/319 | passed  | failure | failure | failure |
> > > > > > > > > > > > generic/357 | skipped | skipped | skipped | failure |
> > > > > > > > > > > > generic/444 | failure | failure | failure | failure |
> > > > > > > > > > > > generic/465 | passed  | failure | failure | failure |
> > > > > > > > > > > > generic/513 | skipped | skipped | skipped | failure |
> > > > > > > > > > > > generic/529 | passed  | failure | failure | failure |
> > > > > > > > > > > > generic/604 | passed  | passed  | failure | passed  |
> > > > > > > > > > > > generic/675 | skipped | skipped | skipped | failure |
> > > > > > > > > > > > generic/688 | skipped | skipped | skipped | failure |
> > > > > > > > > > > > generic/697 | passed  | failure | failure | failure |
> > > > > > > > > > > >       nfs/002 | failure | failure | failure | failure |
> > > > > > > > > > > +-------------+---------+---------+---------+---------+
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > > > > With NFSv4.2, v6.4.0 has 2 extra failures that the current
> > > > > > > > > > > > > > mainline kernel doesn't have:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >      generic/193 (some sort of setattr problem)
> > > > > > > > > > > > > >      generic/528 (known problem with btime handling in client
> > > > > > > > > > > > > > that has been fixed)
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > While I haven't investigated, I'm assuming the 193 bug is also
> > > > > > > > > > > > > > something
> > > > > > > > > > > > > > that has been fixed in recent kernels. There are also 3 other
> > > > > > > > > > > > > > NFSv3
> > > > > > > > > > > > > > tests that started passing since v6.4.0. I haven't looked into
> > > > > > > > > > > > > > those.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > With the linux-next kernel there are 2 new regressions:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >      generic/683
> > > > > > > > > > > > > >      generic/684
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Both of these look like problems with setuid/setgid stripping,
> > > > > > > > > > > > > > and still
> > > > > > > > > > > > > > need to be investigated. I have more verbose result info on
> > > > > > > > > > > > > > the test
> > > > > > > > > > > > > > failures if anyone is interested.
> > > > > > > > > > > Interesting that I'm not seeing the 683 & 684 failures. What type of
> > > > > > > > > > > filesystem is your server exporting?
> > > > > > > > > > > 
> > > > > > > > > > btrfs
> > > > > > > > > > 
> > > > > > > > > > You are testing linux-next? I need to go back and confirm these
> > > > > > > > > > results
> > > > > > > > > > too.
> > > > > > > > > IMO linux-next is quite important: we keep hitting bugs that
> > > > > > > > > appear only after integration -- block and network changes in
> > > > > > > > > other trees especially can impact the NFS drivers.
> > > > > > > > > 
> > > > > > > > Indeed, I suspect this is probably something from the vfs tree (though
> > > > > > > > we definitely need to confirm that). Today I'm testing:
> > > > > > > > 
> > > > > > > >        6.5.0-rc6-next-20230817-g47762f086974
> > > > > > > > 
> > > > > > > Nope, I was wrong. I ran a bisect and it landed here. I confirmed it by
> > > > > > > turning off leases on the nfs server and the test started passing. I
> > > > > > > probably won't have the cycles to chase this down further.
> > > > > > > 
> > > > > > > The capture looks something like this:
> > > > > > > 
> > > > > > > OPEN (get a write delegation)
> > > > > > > WRITE
> > > > > > > CLOSE
> > > > > > > SETATTR (mode 06666)
> > > > > > > 
> > > > > > > ...then presumably a task on the client opens the file again, but the
> > > > > > > setuid bits don't get stripped.
> > > OPEN (get a write delegation)
> > > WRITE
> > > CLOSE
> > > SETATTR (mode 06666)
> > > 
> > > The client continues with:
> > > 
> > > (ALLOCATE,GETATTR)  <<===  this is when the server stripped the SUID and SGID bits
> > > READDIR             ====>  file mode shows 0666  (SUID & SGID were stripped)
> > > READDIR             ====>  file mode shows 0666  (SUID & SGID were stripped)
> > > DELERETURN
> > > 
> > > Here is the stack trace of the ALLOCATE when the SUID & SGID were stripped:
> > > 
> > > **** start of notify_change, notice the i_mode bits, SUID & SGID were set:
> > > [notify_change]: d_iname[a] ia_valid[0x1a00] ia_mode[0x0] i_mode[0x8db6] [nfsd:2409:Mon Aug 21 23:05:31 2023]
> > >                           KILL[0] KILL_SUID[1] KILL_SGID[1]
> > > 
> > > **** end of notify_change, notice the i_mode bits, SUID & SGID were stripped:
> > > [notify_change]: RET[0] d_iname[a] ia_valid[0x1a01] ia_mode[0x81b6] i_mode[0x81b6] [nfsd:2409:Mon Aug 21 23:05:31 2023]
> > > 
> > > **** stack trace of notify_change comes from ALLOCATE:
> > > Returning from:  0xffffffffb726e764 : notify_change+0x4/0x500 [kernel]
> > > Returning to  :  0xffffffffb726bf99 : __file_remove_privs+0x119/0x170 [kernel]
> > >    0xffffffffb726cfad : file_modified_flags+0x4d/0x110 [kernel]
> > >    0xffffffffc0a2330b : xfs_file_fallocate+0xfb/0x490 [xfs]
> > >    0xffffffffb723e7d8 : vfs_fallocate+0x158/0x380 [kernel]
> > >    0xffffffffc0ddc30a : nfsd4_vfs_fallocate+0x4a/0x70 [nfsd]
> > >    0xffffffffc0def7f2 : nfsd4_allocate+0x72/0xc0 [nfsd]
> > >    0xffffffffc0df2663 : nfsd4_proc_compound+0x3d3/0x730 [nfsd]
> > >    0xffffffffc0dd633b : nfsd_dispatch+0xab/0x1d0 [nfsd]
> > >    0xffffffffc0bda476 : svc_process_common+0x306/0x6e0 [sunrpc]
> > >    0xffffffffc0bdb081 : svc_process+0x131/0x180 [sunrpc]
> > >    0xffffffffc0dd4864 : nfsd+0x84/0xd0 [nfsd]
> > >    0xffffffffb6f0bfd6 : kthread+0xe6/0x120 [kernel]
> > >    0xffffffffb6e587d4 : ret_from_fork+0x34/0x50 [kernel]
> > >    0xffffffffb6e03a3b : ret_from_fork_asm+0x1b/0x30 [kernel]
> > > 
> > > I think the problem here is that the client does not update the file
> > > attribute after ALLOCATE. The GETATTR in the ALLOCATE compound does
> > > not include the mode bits.
> > > 
> > Oh, interesting! Have you tried adding the FATTR4_MODE to that GETATTR
> > call on the client? Does it also fix this?
> 
> Yes, this is what I'm going to try next.
> 

Great. Keep us posted.

> > 
> > > The READDIR replies show the test file's mode with the SUID & SGID bits
> > > stripped (0666), but apparently these were not used to update the cached
> > > file attributes.
> > > 
> > > The test passes when the server does not grant a write delegation because:
> > > 
> > > OPEN
> > > WRITE
> > > CLOSE
> > > SETATTR (06666)
> > > OPEN (CLAIM_FH, NOCREATE)
> > > ALLOCATE        <<=== server clears SUID & SGID
> > > GETATTR, CLOSE  <<=== GETATTR returns mode 0666, client updates the file attributes
> > > READDIR
> > > READDIR
> > > 
> > > As expected, if the server recalls the write delegation on the SETATTR
> > > that sets SUID/SGID, then the test passes. This is because it forces the
> > > client to send the 2nd OPEN with CLAIM_FH, NOCREATE and then the
> > > (GETATTR, CLOSE), which causes the client to update the file attributes.
> > > 
> > What's your sense of the best way to fix this? The stripping of mode
> > bits isn't covered by the NFSv4 spec, so this will ultimately come down
> > to a judgment call.
> 
> Yes, I did not find anything regarding stripping of SUID/SGID in the NFSv4.2
> spec. It's done by the 'fs' layer, and it has been there since April 2005 in
> the big merge to Linux-2.6.12-rc2 done by Linus. So I think we should leave
> it there.
> 
> The stripping makes some sense to me: if the file is being expanded
> (to be written to) then it should not be an executable, therefore its
> SUID/SGID should be stripped.
> 

Right. The point is that POSIX requires setuid clearing, but the NFSv4
spec doesn't say anything about it. Ultimately, it's the server's
responsibility to actually clear the bits.

Having the client also fetch the mode does sound like the right thing to
do here. It should be cheap for most servers to provide anyway, given
that they will have the inode in-core.
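For reference, the attribute in question is FATTR4_MODE, attribute number 33
in RFC 7530, so requesting it means setting bit 1 of the second bitmap4 word
in the GETATTR request. A small illustrative sketch of that encoding (this is
my own sketch, not the kernel's implementation):

```python
def bitmap4_words(attr_numbers):
    """Encode NFSv4 attribute numbers into the on-the-wire bitmap4 word array."""
    words = []
    for attr in attr_numbers:
        word, bit = divmod(attr, 32)   # 32 attribute bits per bitmap word
        while len(words) <= word:
            words.append(0)
        words[word] |= 1 << bit
    return words

FATTR4_MODE = 33  # attribute number per RFC 7530

# mode lands in word 1, bit 1
print([hex(w) for w in bitmap4_words([FATTR4_MODE])])
```

So adding the mode to the ALLOCATE compound's GETATTR is just one more bit in
a bitmap the client already sends.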

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-08-22 23:15 UTC | newest]

Thread overview: 16+ messages
2023-08-17 11:21 xfstests results over NFS Jeff Layton
2023-08-17 14:04 ` Chuck Lever III
2023-08-17 14:22   ` Jeff Layton
2023-08-17 15:17     ` Anna Schumaker
2023-08-17 16:27       ` Jeff Layton
2023-08-17 16:31         ` Chuck Lever III
2023-08-17 17:15           ` Jeff Layton
2023-08-17 21:07             ` Jeff Layton
2023-08-17 22:23               ` dai.ngo
2023-08-17 22:59                 ` dai.ngo
2023-08-17 23:08                   ` Jeff Layton
2023-08-17 23:28                     ` dai.ngo
2023-08-22 16:07                     ` dai.ngo
2023-08-22 17:02                       ` Jeff Layton
2023-08-22 19:51                         ` dai.ngo
2023-08-22 23:15                           ` Jeff Layton
