* Re: [PATCH] nfs: simplify and guarantee owner uniqueness.
[not found] ` <172652955677.17050.4744720185342907808@noble.neil.brown.name>
@ 2024-09-19 6:38 ` Jon Hunter
0 siblings, 0 replies; 3+ messages in thread
From: Jon Hunter @ 2024-09-19 6:38 UTC (permalink / raw)
To: NeilBrown, Steven Price
Cc: Trond Myklebust, Anna Schumaker, linux-nfs,
linux-tegra@vger.kernel.org
Hi Neil,
On 17/09/2024 00:32, NeilBrown wrote:
> On Tue, 17 Sep 2024, Steven Price wrote:
>>
>> Hi Neil,
>>
>> I'm seeing issues on a test board using an NFS root which I've bisected
>> to this commit in linux-next. The kernel spits out many errors of the form:
>>
>> [ 7.478995] NFS: v4 server <ip> returned a bad sequence-id error!
>> [ 7.599462] NFS: v4 server <ip> returned a bad sequence-id error!
>> [ 7.600570] NFS: v4 server <ip> returned a bad sequence-id error!
>> [ 7.615243] NFS: v4 server <ip> returned a bad sequence-id error!
>> [ 7.636756] NFS: v4 server <ip> returned a bad sequence-id error!
>> [ 7.644808] NFS: v4 server <ip> returned a bad sequence-id error!
>> [ 7.653605] NFS: v4 server <ip> returned a bad sequence-id error!
>> [ 7.692836] NFS: nfs4_reclaim_open_state: unhandled error -10026
>> [ 7.699573] NFSv4: state recovery failed for open file
>> arm-linux-gnueabihf/libgpg-error.so.0.29.0, error = -10026
>> [ 7.711055] NFSv4: state recovery failed for open file
>> arm-linux-gnueabihf/libgpg-error.so.0.29.0, error = -10026
>>
>> (with the filename obviously varying)
>>
>> The NFS server is a standard Debian 12 system.
>>
>> Any ideas?
>
> Not immediately. It appears that when the client opens a file during
> recovery, the server doesn't like the seqid that it uses...
>
> Recovery happens when the server restarts and when the client and server
> have been out of contact for an extended period of time (>90 seconds by
> default).
> Was either of those the case here? Which one?
I am seeing various failures on -next and bisect is also pointing to
this commit. Reverting it does fix these issues. On one board I also
observed ...
[ 12.674296] NFS: v4 server 192.168.99.1 returned a bad sequence-id error!
[ 12.780476] NFS: v4 server 192.168.99.1 returned a bad sequence-id error!
[ 12.829071] NFS: v4 server 192.168.99.1 returned a bad sequence-id error!
[ 12.971432] NFS: v4 server 192.168.99.1 returned a bad sequence-id error!
[ 13.102700] NFS: v4 server 192.168.99.1 returned a bad sequence-id error!
[ 13.171315] NFS: v4 server 192.168.99.1 returned a bad sequence-id error!
[ 13.216019] NFS: v4 server 192.168.99.1 returned a bad sequence-id error!
[ 13.273610] NFS: v4 server 192.168.99.1 returned a bad sequence-id error!
[ 13.298471] NFS: v4 server 192.168.99.1 returned a bad sequence-id error!
And on the same board I see ...
[ 16.496417] NFS: nfs4_reclaim_open_state: unhandled error -10026
[ 16.991736] NFS: nfs4_reclaim_open_state: unhandled error -10026
[ 17.106226] NFS: nfs4_reclaim_open_state: unhandled error -10026
Jon
--
nvpublic
* Re: [PATCH] nfs: simplify and guarantee owner uniqueness.
[not found] ` <172680136351.17050.10296437171546281772@noble.neil.brown.name>
@ 2024-09-22 12:56 ` Jon Hunter
2024-09-22 23:21 ` NeilBrown
0 siblings, 1 reply; 3+ messages in thread
From: Jon Hunter @ 2024-09-22 12:56 UTC (permalink / raw)
To: NeilBrown, Steven Price
Cc: Trond Myklebust, Anna Schumaker, linux-nfs,
linux-tegra@vger.kernel.org
Hi Neil,
On 20/09/2024 04:02, NeilBrown wrote:
> On Thu, 19 Sep 2024, Steven Price wrote:
>> On 19/09/2024 02:29, NeilBrown wrote:
>>> On Wed, 18 Sep 2024, Steven Price wrote:
>>>> Hi Neil,
>>>>
>>>> (Dropping the list/others due to the attachment)
>>>
>>> (re-adding others now - thanks for the attachment).
>>>
>>>>
>>>> Attached, this is booting a kernel compiled from 00fd839ca761 ("nfs:
>>>> simplify and guarantee owner uniqueness.") which uses an NFS root with a
>>>> Debian bullseye userspace.
>>>
>>> This shows that the owner_id was always different - or almost always.
>>> Once it repeated we got an error because the seqid kept increasing.
>>> This is because the xdr encoding is broken.
>>>
>>> Please apply this incremental patch and confirm that it works now.
>>
>> Thanks, I've tested the below and I don't see NFS errors any more.
>>
>> Tested-by: Steven Price <steven.price@arm.com>
>
> Thanks Steve.
>
> Anna: could you please squash this fix into the commit?
> Jon: could you please confirm that this fixes your problem too.
>
> Thanks,
> NeilBrown
>
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index 1aaf908acc5d..88bcbcba1381 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -1429,7 +1429,7 @@ static inline void encode_openhdr(struct xdr_stream *xdr, const struct nfs_opena
> *p++ = cpu_to_be32(28);
> p = xdr_encode_opaque_fixed(p, "open id:", 8);
> *p++ = cpu_to_be32(arg->server->s_dev);
> - xdr_encode_hyper(p, arg->id.uniquifier);
> + p = xdr_encode_hyper(p, arg->id.uniquifier);
> xdr_encode_hyper(p, arg->id.create_time);
> }
Works for me!
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Thanks
Jon
--
nvpublic
* Re: [PATCH] nfs: simplify and guarantee owner uniqueness.
2024-09-22 12:56 ` Jon Hunter
@ 2024-09-22 23:21 ` NeilBrown
0 siblings, 0 replies; 3+ messages in thread
From: NeilBrown @ 2024-09-22 23:21 UTC (permalink / raw)
To: Jon Hunter
Cc: Steven Price, Trond Myklebust, Anna Schumaker, linux-nfs,
linux-tegra@vger.kernel.org
On Sun, 22 Sep 2024, Jon Hunter wrote:
> Hi Neil,
>
> On 20/09/2024 04:02, NeilBrown wrote:
> > On Thu, 19 Sep 2024, Steven Price wrote:
> >> On 19/09/2024 02:29, NeilBrown wrote:
> >>> On Wed, 18 Sep 2024, Steven Price wrote:
> >>>> Hi Neil,
> >>>>
> >>>> (Dropping the list/others due to the attachment)
> >>>
> >>> (re-adding others now - thanks for the attachment).
> >>>
> >>>>
> >>>> Attached, this is booting a kernel compiled from 00fd839ca761 ("nfs:
> >>>> simplify and guarantee owner uniqueness.") which uses an NFS root with a
> >>>> Debian bullseye userspace.
> >>>
> >>> This shows that the owner_id was always different - or almost always.
> >>> Once it repeated we got an error because the seqid kept increasing.
> >>> This is because the xdr encoding is broken.
> >>>
> >>> Please apply this incremental patch and confirm that it works now.
> >>
> >> Thanks, I've tested the below and I don't see NFS errors any more.
> >>
> >> Tested-by: Steven Price <steven.price@arm.com>
> >
> > Thanks Steve.
> >
> > > Anna: could you please squash this fix into the commit?
> > Jon: could you please confirm that this fixes your problem too.
> >
> > Thanks,
> > NeilBrown
> >
> > diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> > index 1aaf908acc5d..88bcbcba1381 100644
> > --- a/fs/nfs/nfs4xdr.c
> > +++ b/fs/nfs/nfs4xdr.c
> > @@ -1429,7 +1429,7 @@ static inline void encode_openhdr(struct xdr_stream *xdr, const struct nfs_opena
> > *p++ = cpu_to_be32(28);
> > p = xdr_encode_opaque_fixed(p, "open id:", 8);
> > *p++ = cpu_to_be32(arg->server->s_dev);
> > - xdr_encode_hyper(p, arg->id.uniquifier);
> > + p = xdr_encode_hyper(p, arg->id.uniquifier);
> > xdr_encode_hyper(p, arg->id.create_time);
> > }
>
>
> Works for me!
Thanks Jon.
Anna has updated the patch so the fixed version is what will land
upstream.
NeilBrown
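
The one-line fix quoted above works because xdr_encode_hyper() returns the
pointer advanced past the eight bytes it wrote. If that return value is
dropped, the next call writes to the same position. Below is a minimal
userspace sketch of that effect, not the kernel code: encode_hyper(),
decode_hyper(), encode_owner_buggy() and encode_owner_fixed() are
hypothetical stand-ins written for illustration, and the buffer is zeroed
here, whereas in the kernel the trailing eight bytes would be whatever the
buffer already held.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's xdr_encode_hyper():
 * store a 64-bit value big-endian and return the pointer advanced by
 * two 32-bit words.
 */
static uint32_t *encode_hyper(uint32_t *p, uint64_t val)
{
	unsigned char *b = (unsigned char *)p;

	assert(p != NULL);
	for (int i = 0; i < 8; i++)
		b[i] = (unsigned char)(val >> (56 - 8 * i));
	return p + 2;
}

/* Read back a big-endian 64-bit value (illustration only). */
static uint64_t decode_hyper(const uint32_t *p)
{
	const unsigned char *b = (const unsigned char *)p;
	uint64_t val = 0;

	for (int i = 0; i < 8; i++)
		val = (val << 8) | b[i];
	return val;
}

/* Pre-fix pattern: the return value of the first call is discarded. */
static void encode_owner_buggy(uint32_t *buf, uint64_t uniquifier,
			       uint64_t create_time)
{
	uint32_t *p = buf;

	encode_hyper(p, uniquifier);
	encode_hyper(p, create_time);	/* overwrites the uniquifier */
}

/* Post-fix pattern: p is advanced before the second value is written. */
static void encode_owner_fixed(uint32_t *buf, uint64_t uniquifier,
			       uint64_t create_time)
{
	uint32_t *p = buf;

	p = encode_hyper(p, uniquifier);
	encode_hyper(p, create_time);
}
```

With a zeroed four-word buffer, the buggy variant leaves create_time in the
first slot and nothing in the second, so the uniquifier never reaches the
wire; the fixed variant stores both values back to back.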