public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] 9p: use kvzalloc for readdir buffer
@ 2026-04-15  5:45 Pierre Barre
  2026-04-15  9:01 ` David Laight
  0 siblings, 1 reply; 6+ messages in thread
From: Pierre Barre @ 2026-04-15  5:45 UTC (permalink / raw)
  To: Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet
  Cc: Christian Schoenebeck, v9fs, linux-kernel

The readdir buffer is sized to msize, so kzalloc() can fail under
fragmentation with a page allocation failure in v9fs_alloc_rdir_buf()
/ v9fs_dir_readdir_dotl().

The buffer is only a response sink and is never pack_sg_list()'d,
so kvzalloc() is safe for all transports, unlike the fcall buffers
fixed in e21d451a82f3.

Signed-off-by: Pierre Barre <pierre@barre.sh>
---
 fs/9p/vfs_dir.c | 2 +-
 net/9p/client.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/9p/vfs_dir.c b/fs/9p/vfs_dir.c
index af7f72abbb76..487c177aae38 100644
--- a/fs/9p/vfs_dir.c
+++ b/fs/9p/vfs_dir.c
@@ -70,7 +70,7 @@ static struct p9_rdir *v9fs_alloc_rdir_buf(struct file *filp, int buflen)
        struct p9_fid *fid = filp->private_data;

        if (!fid->rdir)
-               fid->rdir = kzalloc(sizeof(struct p9_rdir) + buflen, GFP_KERNEL);
+               fid->rdir = kvzalloc(sizeof(struct p9_rdir) + buflen, GFP_KERNEL);
        return fid->rdir;
 }

diff --git a/net/9p/client.c b/net/9p/client.c
index f60d1d041adb..6d9b9054841e 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -765,7 +765,7 @@ static void p9_fid_destroy(struct p9_fid *fid)
        spin_lock_irqsave(&clnt->lock, flags);
        idr_remove(&clnt->fids, fid->fid);
        spin_unlock_irqrestore(&clnt->lock, flags);
-       kfree(fid->rdir);
+       kvfree(fid->rdir);
        kfree(fid);
 }

--
2.51.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] 9p: use kvzalloc for readdir buffer
  2026-04-15  5:45 [PATCH] 9p: use kvzalloc for readdir buffer Pierre Barre
@ 2026-04-15  9:01 ` David Laight
  2026-04-15  9:27   ` Pierre Barre
  0 siblings, 1 reply; 6+ messages in thread
From: David Laight @ 2026-04-15  9:01 UTC (permalink / raw)
  To: Pierre Barre
  Cc: Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet,
	Christian Schoenebeck, v9fs, linux-kernel

On Wed, 15 Apr 2026 07:45:08 +0200
"Pierre Barre" <pierre@barre.sh> wrote:

> The readdir buffer is sized to msize, so kzalloc() can fail under
> fragmentation with a page allocation failure in v9fs_alloc_rdir_buf()
> / v9fs_dir_readdir_dotl().
> 
> The buffer is only a response sink and is never pack_sg_list()'d,
> so kvzalloc() is safe for all transports, unlike the fcall buffers
> fixed in e21d451a82f3.
> 
> Signed-off-by: Pierre Barre <pierre@barre.sh>
> ---
>  fs/9p/vfs_dir.c | 2 +-
>  net/9p/client.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/9p/vfs_dir.c b/fs/9p/vfs_dir.c
> index af7f72abbb76..487c177aae38 100644
> --- a/fs/9p/vfs_dir.c
> +++ b/fs/9p/vfs_dir.c
> @@ -70,7 +70,7 @@ static struct p9_rdir *v9fs_alloc_rdir_buf(struct file *filp, int buflen)
>         struct p9_fid *fid = filp->private_data;
> 
>         if (!fid->rdir)
> -               fid->rdir = kzalloc(sizeof(struct p9_rdir) + buflen, GFP_KERNEL);
> +               fid->rdir = kvzalloc(sizeof(struct p9_rdir) + buflen, GFP_KERNEL);

That code isn't really well thought out at all.
The default for buflen seems to be 128k + 24 - 24.
struct p9_rdir adds another 8 bytes, I'm not sure whether that doubles it to
128k or just adds an extra page (which might be 64k not 4k).
It also looks as though the user can specify an arbitrary buffers size (min 4k).
So who knows how big that (and other associated) buffers actually are.

The buffer seems to persist across readdir() system calls.
I can't see any code that might attempt to honour seekdir().
I hope there is a global lock that stops multiple threads issuing readdir()
on the same fd in parallel.

	David

>         return fid->rdir;
>  }
> 
> diff --git a/net/9p/client.c b/net/9p/client.c
> index f60d1d041adb..6d9b9054841e 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -765,7 +765,7 @@ static void p9_fid_destroy(struct p9_fid *fid)
>         spin_lock_irqsave(&clnt->lock, flags);
>         idr_remove(&clnt->fids, fid->fid);
>         spin_unlock_irqrestore(&clnt->lock, flags);
> -       kfree(fid->rdir);
> +       kvfree(fid->rdir);
>         kfree(fid);
>  }
> 
> --
> 2.51.0
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] 9p: use kvzalloc for readdir buffer
  2026-04-15  9:01 ` David Laight
@ 2026-04-15  9:27   ` Pierre Barre
  2026-04-15 10:36     ` David Laight
  0 siblings, 1 reply; 6+ messages in thread
From: Pierre Barre @ 2026-04-15  9:27 UTC (permalink / raw)
  To: David Laight
  Cc: Eric Van Hensbergen, Latchesar Ionkov, asmadeus,
	Christian Schoenebeck, v9fs, linux-kernel

Hi David,

Perhaps what you describe can explain what I was seeing there: https://lore.kernel.org/v9fs/496d10b9-40fe-4f81-8014-37497c37ff63@app.fastmail.com/

Thanks,
Pierre

On Wed, Apr 15, 2026, at 11:01, David Laight wrote:
> On Wed, 15 Apr 2026 07:45:08 +0200
> "Pierre Barre" <pierre@barre.sh> wrote:
>
>> The readdir buffer is sized to msize, so kzalloc() can fail under
>> fragmentation with a page allocation failure in v9fs_alloc_rdir_buf()
>> / v9fs_dir_readdir_dotl().
>> 
>> The buffer is only a response sink and is never pack_sg_list()'d,
>> so kvzalloc() is safe for all transports, unlike the fcall buffers
>> fixed in e21d451a82f3.
>> 
>> Signed-off-by: Pierre Barre <pierre@barre.sh>
>> ---
>>  fs/9p/vfs_dir.c | 2 +-
>>  net/9p/client.c | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/fs/9p/vfs_dir.c b/fs/9p/vfs_dir.c
>> index af7f72abbb76..487c177aae38 100644
>> --- a/fs/9p/vfs_dir.c
>> +++ b/fs/9p/vfs_dir.c
>> @@ -70,7 +70,7 @@ static struct p9_rdir *v9fs_alloc_rdir_buf(struct file *filp, int buflen)
>>         struct p9_fid *fid = filp->private_data;
>> 
>>         if (!fid->rdir)
>> -               fid->rdir = kzalloc(sizeof(struct p9_rdir) + buflen, GFP_KERNEL);
>> +               fid->rdir = kvzalloc(sizeof(struct p9_rdir) + buflen, GFP_KERNEL);
>
> That code isn't really well thought out at all.
> The default for buflen seems to be 128k + 24 - 24.
> struct p9_rdir adds another 8 bytes, I'm not sure whether that doubles it to
> 128k or just adds an extra page (which might be 64k not 4k).
> It also looks as though the user can specify an arbitrary buffers size (min 4k).
> So who knows how big that (and other associated) buffers actually are.
>
> The buffer seems to persist across readdir() system calls.
> I can't see any code that might attempt to honour seekdir().
> I hope there is a global lock that stops multiple threads issuing readdir()
> on the same fd in parallel.
>
> 	David
>
>>         return fid->rdir;
>>  }
>> 
>> diff --git a/net/9p/client.c b/net/9p/client.c
>> index f60d1d041adb..6d9b9054841e 100644
>> --- a/net/9p/client.c
>> +++ b/net/9p/client.c
>> @@ -765,7 +765,7 @@ static void p9_fid_destroy(struct p9_fid *fid)
>>         spin_lock_irqsave(&clnt->lock, flags);
>>         idr_remove(&clnt->fids, fid->fid);
>>         spin_unlock_irqrestore(&clnt->lock, flags);
>> -       kfree(fid->rdir);
>> +       kvfree(fid->rdir);
>>         kfree(fid);
>>  }
>> 
>> --
>> 2.51.0
>>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] 9p: use kvzalloc for readdir buffer
  2026-04-15  9:27   ` Pierre Barre
@ 2026-04-15 10:36     ` David Laight
  2026-04-16  2:31       ` Dominique Martinet
  0 siblings, 1 reply; 6+ messages in thread
From: David Laight @ 2026-04-15 10:36 UTC (permalink / raw)
  To: Pierre Barre
  Cc: Eric Van Hensbergen, Latchesar Ionkov, asmadeus,
	Christian Schoenebeck, v9fs, linux-kernel

On Wed, 15 Apr 2026 11:27:21 +0200
"Pierre Barre" <pierre@barre.sh> wrote:

> Hi David,
> 
> Perhaps what you describe can explain what I was seeing there:
> https://lore.kernel.org/v9fs/496d10b9-40fe-4f81-8014-37497c37ff63@app.fastmail.com/
(After seeking, getdents() returns stale cached entries instead of fetching from the new position.)

Absolutely.
But the fix probably isn't trivial.
The offset that you need for the seek isn't directly related to the number
of bytes copied to the user buffer - which is what I suspect ftell() (or
whatever gets used) returns.
I think there is some mechanism for arbitrary directory offsets; but IIRC that
requires the code put the 'file system offset for the next directory entry'
somewhere in the directory entry.
Such an offset would have to be one the remote system would understand.

A partial 'non-fix' would be to reject seeks to other than offset 0.

	David

> 
> Thanks,
> Pierre
> 
> On Wed, Apr 15, 2026, at 11:01, David Laight wrote:
> > On Wed, 15 Apr 2026 07:45:08 +0200
> > "Pierre Barre" <pierre@barre.sh> wrote:
> >  
> >> The readdir buffer is sized to msize, so kzalloc() can fail under
> >> fragmentation with a page allocation failure in v9fs_alloc_rdir_buf()
> >> / v9fs_dir_readdir_dotl().
> >> 
> >> The buffer is only a response sink and is never pack_sg_list()'d,
> >> so kvzalloc() is safe for all transports, unlike the fcall buffers
> >> fixed in e21d451a82f3.
> >> 
> >> Signed-off-by: Pierre Barre <pierre@barre.sh>
> >> ---
> >>  fs/9p/vfs_dir.c | 2 +-
> >>  net/9p/client.c | 2 +-
> >>  2 files changed, 2 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/fs/9p/vfs_dir.c b/fs/9p/vfs_dir.c
> >> index af7f72abbb76..487c177aae38 100644
> >> --- a/fs/9p/vfs_dir.c
> >> +++ b/fs/9p/vfs_dir.c
> >> @@ -70,7 +70,7 @@ static struct p9_rdir *v9fs_alloc_rdir_buf(struct file *filp, int buflen)
> >>         struct p9_fid *fid = filp->private_data;
> >> 
> >>         if (!fid->rdir)
> >> -               fid->rdir = kzalloc(sizeof(struct p9_rdir) + buflen, GFP_KERNEL);
> >> +               fid->rdir = kvzalloc(sizeof(struct p9_rdir) + buflen, GFP_KERNEL);  
> >
> > That code isn't really well thought out at all.
> > The default for buflen seems to be 128k + 24 - 24.
> > struct p9_rdir adds another 8 bytes, I'm not sure whether that doubles it to
> > 128k or just adds an extra page (which might be 64k not 4k).
> > It also looks as though the user can specify an arbitrary buffers size (min 4k).
> > So who knows how big that (and other associated) buffers actually are.
> >
> > The buffer seems to persist across readdir() system calls.
> > I can't see any code that might attempt to honour seekdir().
> > I hope there is a global lock that stops multiple threads issuing readdir()
> > on the same fd in parallel.
> >
> > 	David
> >  
> >>         return fid->rdir;
> >>  }
> >> 
> >> diff --git a/net/9p/client.c b/net/9p/client.c
> >> index f60d1d041adb..6d9b9054841e 100644
> >> --- a/net/9p/client.c
> >> +++ b/net/9p/client.c
> >> @@ -765,7 +765,7 @@ static void p9_fid_destroy(struct p9_fid *fid)
> >>         spin_lock_irqsave(&clnt->lock, flags);
> >>         idr_remove(&clnt->fids, fid->fid);
> >>         spin_unlock_irqrestore(&clnt->lock, flags);
> >> -       kfree(fid->rdir);
> >> +       kvfree(fid->rdir);
> >>         kfree(fid);
> >>  }
> >> 
> >> --
> >> 2.51.0
> >>  


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] 9p: use kvzalloc for readdir buffer
  2026-04-15 10:36     ` David Laight
@ 2026-04-16  2:31       ` Dominique Martinet
  2026-04-16  9:18         ` David Laight
  0 siblings, 1 reply; 6+ messages in thread
From: Dominique Martinet @ 2026-04-16  2:31 UTC (permalink / raw)
  To: David Laight
  Cc: Pierre Barre, Eric Van Hensbergen, Latchesar Ionkov,
	Christian Schoenebeck, v9fs, linux-kernel

David Laight wrote on Wed, Apr 15, 2026 at 11:36:25AM +0100:
> > Perhaps what you describe can explain what I was seeing there:
> > https://lore.kernel.org/v9fs/496d10b9-40fe-4f81-8014-37497c37ff63@app.fastmail.com/
> (After seeking, getdents() returns stale cached entries instead of fetching from the new position.)
> 
> Absolutely.
> But the fix probably isn't trivial.
> The offset that you need for the seek isn't directly related to the number
> of bytes copied to the user buffer - which is what I suspect ftell() (or
> whatever gets used) returns.
> I think there is some mechanism for arbitrary directory offsets; but IIRC that
> requires the code put the 'file system offset for the next directory entry'
> somewhere in the directory entry.
> Such an offset would have to be one the remote system would understand.
> 
> A partial 'non-fix' would be to reject seeks to other than offset 0.


Thank you both for working on this and reviewing -- this is all
historical code that hasn't seen much love.


9p Treaddir sends an offset on each call, so I think it'd be fine to
invalidate buffer/remember whatever the client set in a custom llseek
function and send this to the server on readdir call, but I honestly
didn't give this any more thought than the past 2 mintes (I'm totally
swamped and can't keep up/didn't even notice the bug report this
december, sorry :/)
I think some filesystems only allow seeks to 0 already? So given there
is a precendent this might be fine, but I don't see the harm in allowing
custom offsets: the server needs to be able to deal with junk offsets in
read requests anyway, so it's not a problem for me if userspace can set
something invalid and get itself stuck on EINVAL or whatever.


As for locking the vfs takes the file's f_lock for seek, but there
doesn't seem to be anything in the readdir path that would do that, so I
guess it probably would blow up with parallel readdirs on the same fd,
and could use improving...

-- 
Dominique Martinet | Asmadeus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] 9p: use kvzalloc for readdir buffer
  2026-04-16  2:31       ` Dominique Martinet
@ 2026-04-16  9:18         ` David Laight
  0 siblings, 0 replies; 6+ messages in thread
From: David Laight @ 2026-04-16  9:18 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Pierre Barre, Eric Van Hensbergen, Latchesar Ionkov,
	Christian Schoenebeck, v9fs, linux-kernel

On Thu, 16 Apr 2026 11:31:02 +0900
Dominique Martinet <asmadeus@codewreck.org> wrote:

> David Laight wrote on Wed, Apr 15, 2026 at 11:36:25AM +0100:
> > > Perhaps what you describe can explain what I was seeing there:
> > > https://lore.kernel.org/v9fs/496d10b9-40fe-4f81-8014-37497c37ff63@app.fastmail.com/  
> > (After seeking, getdents() returns stale cached entries instead of fetching from the new position.)
> > 
> > Absolutely.
> > But the fix probably isn't trivial.
> > The offset that you need for the seek isn't directly related to the number
> > of bytes copied to the user buffer - which is what I suspect ftell() (or
> > whatever gets used) returns.
> > I think there is some mechanism for arbitrary directory offsets; but IIRC that
> > requires the code put the 'file system offset for the next directory entry'
> > somewhere in the directory entry.
> > Such an offset would have to be one the remote system would understand.
> > 
> > A partial 'non-fix' would be to reject seeks to other than offset 0.  
> 
> 
> Thank you both for working on this and reviewing -- this is all
> historical code that hasn't seen much love.
> 
> 
> 9p Treaddir sends an offset on each call, so I think it'd be fine to
> invalidate buffer/remember whatever the client set in a custom llseek
> function and send this to the server on readdir call, but I honestly
> didn't give this any more thought than the past 2 mintes (I'm totally
> swamped and can't keep up/didn't even notice the bug report this
> december, sorry :/)

The problem is you need to get the correct offsets for the various calls.
So the directory entry sent to the user needs the correct offset the
server would use for the (next) directory entry.
If the latter aren't byte offsets into the buffer it probably impossible.

I have a vague recollection of some fs encoding 'directory block number'
and 'offset in directory block' into the 'file offset' for directory fd.

> I think some filesystems only allow seeks to 0 already? So given there
> is a precendent this might be fine, but I don't see the harm in allowing
> custom offsets: the server needs to be able to deal with junk offsets in
> read requests anyway, so it's not a problem for me if userspace can set
> something invalid and get itself stuck on EINVAL or whatever.
> 
> 
> As for locking the vfs takes the file's f_lock for seek, but there
> doesn't seem to be anything in the readdir path that would do that, so I
> guess it probably would blow up with parallel readdirs on the same fd,
> and could use improving...

There might be something, file->offset needs protecting for readdir()
the same as for read() (but not pread()).
At least once (for some filesystem on some unix variant) readdir() would
have been pretty much the same as read().

	David

> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-04-16  9:18 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-15  5:45 [PATCH] 9p: use kvzalloc for readdir buffer Pierre Barre
2026-04-15  9:01 ` David Laight
2026-04-15  9:27   ` Pierre Barre
2026-04-15 10:36     ` David Laight
2026-04-16  2:31       ` Dominique Martinet
2026-04-16  9:18         ` David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox