From: dai.ngo@oracle.com
To: Jeff Layton <jlayton@kernel.org>,
Chuck Lever III <chuck.lever@oracle.com>
Cc: Anna Schumaker <anna@kernel.org>,
Trond Myklebust <trondmy@gmail.com>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
Neil Brown <neilb@suse.de>,
"Kornievskaia, Olga" <Olga.Kornievskaia@netapp.com>,
Tom Talpey <tom@talpey.com>
Subject: Re: xfstests results over NFS
Date: Thu, 17 Aug 2023 16:28:36 -0700
Message-ID: <90ea5539-0350-c137-30f6-cab87e47428c@oracle.com>
In-Reply-To: <cd592a05c13226c5e1fb4f390eb2473ba20024ad.camel@kernel.org>
On 8/17/23 4:08 PM, Jeff Layton wrote:
> On Thu, 2023-08-17 at 15:59 -0700, dai.ngo@oracle.com wrote:
>> On 8/17/23 3:23 PM, dai.ngo@oracle.com wrote:
>>> On 8/17/23 2:07 PM, Jeff Layton wrote:
>>>> On Thu, 2023-08-17 at 13:15 -0400, Jeff Layton wrote:
>>>>> On Thu, 2023-08-17 at 16:31 +0000, Chuck Lever III wrote:
>>>>>>> On Aug 17, 2023, at 12:27 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>>>
>>>>>>> On Thu, 2023-08-17 at 11:17 -0400, Anna Schumaker wrote:
>>>>>>>> On Thu, Aug 17, 2023 at 10:22 AM Jeff Layton <jlayton@kernel.org>
>>>>>>>> wrote:
>>>>>>>>> On Thu, 2023-08-17 at 14:04 +0000, Chuck Lever III wrote:
>>>>>>>>>>> On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@kernel.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I finally got my kdevops
>>>>>>>>>>> (https://github.com/linux-kdevops/kdevops) test
>>>>>>>>>>> rig working well enough to get some publishable results. To
>>>>>>>>>>> run fstests,
>>>>>>>>>>> kdevops will spin up a server and (in this case) 2 clients to run
>>>>>>>>>>> xfstests' auto group. One client mounts with default options,
>>>>>>>>>>> and the
>>>>>>>>>>> other uses NFSv3.
>>>>>>>>>>>
>>>>>>>>>>> I tested 3 kernels:
>>>>>>>>>>>
>>>>>>>>>>> v6.4.0 (stock release)
>>>>>>>>>>> 6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
>>>>>>>>>>> 6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of
>>>>>>>>>>> yesterday morning)
>>>>>>>>>>>
>>>>>>>>>>> Here is a summary of the results for all 3:
>>>>>>>>>>>
>>>>>>>>>>> KERNEL: 6.4.0
>>>>>>>>>>> CPUS: 8
>>>>>>>>>>>
>>>>>>>>>>> nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/124
>>>>>>>>>>> generic/193 generic/258 generic/294 generic/318 generic/319
>>>>>>>>>>> generic/444 generic/528 generic/529
>>>>>>>>>>> nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>>>>>>> generic/187 generic/193 generic/294 generic/318 generic/319
>>>>>>>>>>> generic/357 generic/444 generic/486 generic/513 generic/528
>>>>>>>>>>> generic/529 generic/578 generic/675 generic/688
>>>>>>>>>>> Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s
>>>>>>>>>>>
>>>>>>>>>>> KERNEL: 6.5.0-rc6-g4853c74bd7ab
>>>>>>>>>>> CPUS: 8
>>>>>>>>>>>
>>>>>>>>>>> nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/258
>>>>>>>>>>> generic/294 generic/318 generic/319 generic/444 generic/529
>>>>>>>>>>> nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>>>>>>> generic/187 generic/294 generic/318 generic/319 generic/357
>>>>>>>>>>> generic/444 generic/486 generic/513 generic/529 generic/578
>>>>>>>>>>> generic/675 generic/688
>>>>>>>>>>> Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s
>>>>>>>>>>>
>>>>>>>>>>> KERNEL: 6.5.0-rc6-next-20230816-gef66bf8aeb91
>>>>>>>>>>> CPUS: 8
>>>>>>>>>>>
>>>>>>>>>>> nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/258
>>>>>>>>>>> generic/294 generic/318 generic/319 generic/444 generic/529
>>>>>>>>>>> nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
>>>>>>>>>>> Failures: generic/053 generic/099 generic/105 generic/186
>>>>>>>>>>> generic/187 generic/294 generic/318 generic/319 generic/357
>>>>>>>>>>> generic/444 generic/486 generic/513 generic/529 generic/578
>>>>>>>>>>> generic/675 generic/683 generic/684 generic/688
>>>>>>>>>>> Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s
>>>>>>>> As long as we're sharing results ... here is what I'm seeing with a
>>>>>>>> 6.5-rc6 client & server:
>>>>>>>>
>>>>>>>> anna@gouda ~ % xfstestsdb xunit list --results --runid 1741 --color=none
>>>>>>>> +------+----------------------+---------+----------+------+------+------+-------+
>>>>>>>> | run  | device               | xunit   | hostname | pass | fail | skip | time  |
>>>>>>>> +------+----------------------+---------+----------+------+------+------+-------+
>>>>>>>> | 1741 | server:/srv/xfs/test | tcp-3   | client   |  125 |    4 |  464 | 447 s |
>>>>>>>> | 1741 | server:/srv/xfs/test | tcp-4.0 | client   |  117 |   11 |  465 | 478 s |
>>>>>>>> | 1741 | server:/srv/xfs/test | tcp-4.1 | client   |  119 |   12 |  462 | 404 s |
>>>>>>>> | 1741 | server:/srv/xfs/test | tcp-4.2 | client   |  212 |   18 |  363 | 564 s |
>>>>>>>> +------+----------------------+---------+----------+------+------+------+-------+
>>>>>>>>
>>>>>>>>
>>>>>>>> anna@gouda ~ % xfstestsdb show --failure 1741 --color=none
>>>>>>>> +-------------+---------+---------+---------+---------+
>>>>>>>> | testcase    | tcp-3   | tcp-4.0 | tcp-4.1 | tcp-4.2 |
>>>>>>>> +-------------+---------+---------+---------+---------+
>>>>>>>> | generic/053 | passed  | failure | failure | failure |
>>>>>>>> | generic/099 | passed  | failure | failure | failure |
>>>>>>>> | generic/105 | passed  | failure | failure | failure |
>>>>>>>> | generic/140 | skipped | skipped | skipped | failure |
>>>>>>>> | generic/188 | skipped | skipped | skipped | failure |
>>>>>>>> | generic/258 | failure | passed  | passed  | failure |
>>>>>>>> | generic/294 | failure | failure | failure | failure |
>>>>>>>> | generic/318 | passed  | failure | failure | failure |
>>>>>>>> | generic/319 | passed  | failure | failure | failure |
>>>>>>>> | generic/357 | skipped | skipped | skipped | failure |
>>>>>>>> | generic/444 | failure | failure | failure | failure |
>>>>>>>> | generic/465 | passed  | failure | failure | failure |
>>>>>>>> | generic/513 | skipped | skipped | skipped | failure |
>>>>>>>> | generic/529 | passed  | failure | failure | failure |
>>>>>>>> | generic/604 | passed  | passed  | failure | passed  |
>>>>>>>> | generic/675 | skipped | skipped | skipped | failure |
>>>>>>>> | generic/688 | skipped | skipped | skipped | failure |
>>>>>>>> | generic/697 | passed  | failure | failure | failure |
>>>>>>>> | nfs/002     | failure | failure | failure | failure |
>>>>>>>> +-------------+---------+---------+---------+---------+
>>>>>>>>
>>>>>>>>
>>>>>>>>>>> With NFSv4.2, v6.4.0 has 2 extra failures that the current
>>>>>>>>>>> mainline
>>>>>>>>>>> kernel doesn't:
>>>>>>>>>>>
>>>>>>>>>>> generic/193 (some sort of setattr problem)
>>>>>>>>>>> generic/528 (known problem with btime handling in client
>>>>>>>>>>> that has been fixed)
>>>>>>>>>>>
>>>>>>>>>>> While I haven't investigated, I'm assuming the 193 bug is also
>>>>>>>>>>> something
>>>>>>>>>>> that has been fixed in recent kernels. There are also 3 other
>>>>>>>>>>> NFSv3
>>>>>>>>>>> tests that started passing since v6.4.0. I haven't looked into
>>>>>>>>>>> those.
>>>>>>>>>>>
>>>>>>>>>>> With the linux-next kernel there are 2 new regressions:
>>>>>>>>>>>
>>>>>>>>>>> generic/683
>>>>>>>>>>> generic/684
>>>>>>>>>>>
>>>>>>>>>>> Both of these look like problems with setuid/setgid stripping,
>>>>>>>>>>> and still
>>>>>>>>>>> need to be investigated. I have more verbose result info on
>>>>>>>>>>> the test
>>>>>>>>>>> failures if anyone is interested.
>>>>>>>> Interesting that I'm not seeing the 683 & 684 failures. What type of
>>>>>>>> filesystem is your server exporting?
>>>>>>>>
>>>>>>> btrfs
>>>>>>>
>>>>>>> You are testing linux-next? I need to go back and confirm these
>>>>>>> results
>>>>>>> too.
>>>>>> IMO linux-next is quite important: we keep hitting bugs that
>>>>>> appear only after integration -- block and network changes in
>>>>>> other trees especially can impact the NFS drivers.
>>>>>>
>>>>> Indeed, I suspect this is probably something from the vfs tree (though
>>>>> we definitely need to confirm that). Today I'm testing:
>>>>>
>>>>> 6.5.0-rc6-next-20230817-g47762f086974
>>>>>
>>>> Nope, I was wrong. I ran a bisect and it landed here. I confirmed it by
>>>> turning off leases on the nfs server and the test started passing. I
>>>> probably won't have the cycles to chase this down further.
>>>>
>>>> The capture looks something like this:
>>>>
>>>> OPEN (get a write delegation)
>>>> WRITE
>>>> CLOSE
>>>> SETATTR (mode 06666)
>>>>
>>>> ...then presumably a task on the client opens the file again, but the
>>>> setuid bits don't get stripped.
>>>>
>>>> I think either the client will need to strip these bits on a delegated
>>>> open, or we'll need to recall write delegations from the client when it
>>>> tries to do a SETATTR with a mode that could later end up needing to be
>>>> stripped on a subsequent open:
>>>>
>>>> 66ce3e3b98a7a9e970ea463a7f7dc0575c0a244b is the first bad commit
>>>> commit 66ce3e3b98a7a9e970ea463a7f7dc0575c0a244b
>>>> Author: Dai Ngo <dai.ngo@oracle.com>
>>>> Date: Thu Jun 29 18:52:40 2023 -0700
>>>>
>>>> NFSD: Enable write delegation support
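
To make Jeff's first option concrete (strip the bits on the client when
the write is served out of a delegation and never reaches the server),
a minimal sketch could look like this. file_remove_privs() is the real
VFS helper; the NFS-side predicate and the call site are assumptions:

#include <linux/fs.h>

/*
 * Sketch only: when a buffered write is satisfied under a write
 * delegation, the server never sees a WRITE and cannot strip
 * setuid/setgid, so the client would have to do it locally.
 * file_remove_privs() is the real VFS helper (it clears
 * setuid/setgid/caps as a local write would, and expects the inode
 * lock to be held); nfs_have_write_delegation() is an illustrative
 * name, not an existing function.
 */
static int nfs_strip_privs_for_delegated_write(struct file *file)
{
        if (!nfs_have_write_delegation(file_inode(file)))
                return 0;       /* server strips on WRITE as usual */
        return file_remove_privs(file);
}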
>>> The SETATTR should cause the delegation to be recalled. However, I think
>>> there is an optimization on the server that skips the recall if the
>>> SETATTR comes from the same client that holds the delegation.
>> The optimization on the server was done by this commit:
>>
>> 28df3d1539de nfsd: clients don't need to break their own delegations
>>
>> Perhaps we should allow this optimization for read delegation only?
>>
>> Or should the NFS client be responsible for handling the SETATTR and
>> the local OPEN on a file that has a write delegation granted?
>>
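If we did restrict it to read delegations, the change would be confined
to the self-break test that commit added. A rough sketch, assuming the
hook keeps its current shape; the dl_type test is the new (hypothetical)
part and the same-client helper name is illustrative:

#include <linux/filelock.h>     /* struct file_lock */
#include "state.h"              /* struct nfs4_delegation (fs/nfsd) */

/*
 * Sketch: keep the "don't break your own delegation" optimization
 * for read delegations only, so a conflicting SETATTR always recalls
 * a write delegation, even one held by the sender.
 * nfsd_breaker_owns_lease() is the real nfsd hook from 28df3d1539de;
 * nfsd_deleg_held_by_current_client() stands in for its existing
 * same-client logic and is not a real symbol.
 */
static bool nfsd_breaker_owns_lease(struct file_lock *fl)
{
        struct nfs4_delegation *dl = fl->fl_owner;

        if (dl->dl_type == NFS4_OPEN_DELEGATE_WRITE)
                return false;   /* write delegations: always recall */

        return nfsd_deleg_held_by_current_client(dl);
}
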
> I think that setuid/setgid files are really a special case.
>
> We already avoid giving out delegations on setuid/gid files. What we're
> not doing currently is revoking the write delegation if the holder tries
> to set a mode that involves a setuid/gid bit. If we add that, then that
> should close the hole, I think.
This approach seems reasonable; I'll work on a patch to handle this
condition.
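
The condition to catch is roughly the following; struct iattr,
ATTR_MODE, and the mode macros are the real kernel ones, while the
predicate name is hypothetical. The recall itself would reuse the
existing delegation-break machinery:

#include <linux/fs.h>

/*
 * Sketch of the planned check: a SETATTR that sets S_ISUID or
 * S_ISGID must break a write delegation even when the holder sent
 * it, since a later delegated open on that client would otherwise
 * skip the stripping. Name is illustrative, not the final patch.
 */
static bool nfsd_setattr_needs_deleg_break(const struct iattr *iap)
{
        return (iap->ia_valid & ATTR_MODE) &&
               (iap->ia_mode & (S_ISUID | S_ISGID));
}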
Thanks Jeff,
-Dai
>