linux-fsdevel.vger.kernel.org archive mirror
From: Bernd Schubert <bernd.schubert@fastmail.fm>
To: Miklos Szeredi <miklos@szeredi.hu>, Jaco Kroon <jaco@uls.co.za>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Randy Dunlap <rdunlap@infradead.org>,
	Antonio SJ Musumeci <trapexit@spawn.link>
Subject: Re: [PATCH] fuse: enable larger read buffers for readdir [v2].
Date: Thu, 27 Jul 2023 21:16:53 +0200
Message-ID: <15fad0eb-b161-b87d-9964-e77a7193de48@fastmail.fm>
In-Reply-To: <CAJfpegvJ7FOS35yiKsTAzQh5Uf71FatU-kTJpXJtDPQbXeMgxA@mail.gmail.com>



On 7/27/23 17:35, Miklos Szeredi wrote:
> On Thu, 27 Jul 2023 at 10:13, Jaco Kroon <jaco@uls.co.za> wrote:
>>
>> This patch does not mess with the caching infrastructure like the
>> previous one, which we believe caused excessive CPU and broke directory
>> listings in some cases.
>>
>> This version only affects the uncached read, which then during parse adds an
>> entry at a time to the cached structures by way of copying, and as such,
>> we believe this should be sufficient.
>>
>> We're still seeing cases where getdents64 takes ~10s (this was already
>> the case without this patch; the difference now is that we get ~500
>> entries in that time rather than the 14-18 previously).  We believe
>> that this latency is introduced on the glusterfs side and is under
>> separate discussion with the glusterfs developers.
>>
>> This is still a compile-time option, but a working one compared to
>> previous patch.  For now this works, but it's not recommended for merge
>> (as per email discussion).
>>
>> This still uses alloc_pages rather than kvmalloc/kvfree.
>>
>> Signed-off-by: Jaco Kroon <jaco@uls.co.za>
>> ---
>>   fs/fuse/Kconfig   | 16 ++++++++++++++++
>>   fs/fuse/readdir.c | 18 ++++++++++++------
>>   2 files changed, 28 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
>> index 038ed0b9aaa5..0783f9ee5cd3 100644
>> --- a/fs/fuse/Kconfig
>> +++ b/fs/fuse/Kconfig
>> @@ -18,6 +18,22 @@ config FUSE_FS
>>            If you want to develop a userspace FS, or if you want to use
>>            a filesystem based on FUSE, answer Y or M.
>>
>> +config FUSE_READDIR_ORDER
>> +       int
>> +       range 0 5
>> +       default 5
>> +       help
>> +               readdir performance varies greatly depending on the size of the read.
>> +               Larger buffers result in larger reads, thus fewer reads and higher
>> +               performance in return.
>> +
>> +               You may want to reduce this value on seriously constrained memory
>> +               systems where 128KiB (assuming 4KiB pages) cache pages is not ideal.
>> +
>> +               This value represents the order of the number of pages to allocate (i.e.,
>> +               the shift value).  A value of 0 is thus 1 page (4KiB) where 5 is 32
>> +               pages (128KiB).
>> +
>>   config CUSE
>>          tristate "Character device in Userspace support"
>>          depends on FUSE_FS
>> diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
>> index dc603479b30e..47cea4d91228 100644
>> --- a/fs/fuse/readdir.c
>> +++ b/fs/fuse/readdir.c
>> @@ -13,6 +13,12 @@
>>   #include <linux/pagemap.h>
>>   #include <linux/highmem.h>
>>
>> +#define READDIR_PAGES_ORDER            CONFIG_FUSE_READDIR_ORDER
>> +#define READDIR_PAGES                  (1 << READDIR_PAGES_ORDER)
>> +#define READDIR_PAGES_SIZE             (PAGE_SIZE << READDIR_PAGES_ORDER)
>> +#define READDIR_PAGES_MASK             (READDIR_PAGES_SIZE - 1)
>> +#define READDIR_PAGES_SHIFT            (PAGE_SHIFT + READDIR_PAGES_ORDER)
>> +
>>   static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx)
>>   {
>>          struct fuse_conn *fc = get_fuse_conn(dir);
>> @@ -328,25 +334,25 @@ static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx)
>>          struct fuse_mount *fm = get_fuse_mount(inode);
>>          struct fuse_io_args ia = {};
>>          struct fuse_args_pages *ap = &ia.ap;
>> -       struct fuse_page_desc desc = { .length = PAGE_SIZE };
>> +       struct fuse_page_desc desc = { .length = READDIR_PAGES_SIZE };
> 
> Does this really work?  I would've thought we are relying on single
> page lengths somewhere.
> 
>>          u64 attr_version = 0;
>>          bool locked;
>>
>> -       page = alloc_page(GFP_KERNEL);
>> +       page = alloc_pages(GFP_KERNEL, READDIR_PAGES_ORDER);
>>          if (!page)
>>                  return -ENOMEM;
>>
>>          plus = fuse_use_readdirplus(inode, ctx);
>>          ap->args.out_pages = true;
>> -       ap->num_pages = 1;
>> +       ap->num_pages = READDIR_PAGES;
> 
> No.  This is the array length, which is 1.  This is the hack I guess,
> which makes the above trick work.
> 
> Better use kvmalloc, which might have a slightly worse performance
> than a large page, but definitely not worse than the current single
> page.
> 
> If we want to optimize the overhead of kvmalloc (and it's a big if)
> then the parse_dir*file() functions would need to be converted to
> using a page array instead of a plain kernel pointer, which would add
> some complexity for sure.

One simple possibility might be to handle pos=0 with a small,
single-page buffer and to switch to a larger buffer only once pos is
set. That way small directories don't pay the overhead of the large
allocation. Although, following your idea of tying it to the getdents
buffer size, this is something libc could already start with.
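
A rough user-space sketch of that sizing rule, just to make the idea
concrete. Everything here is an assumption for illustration:
readdir_buf_size(), the SKETCH_* constants and the user_count hint do
not exist in fs/fuse, and real code would use PAGE_SIZE and whatever
upper limit the final patch settles on.

```c
#include <stddef.h>

/* Assumed values for illustration only (4KiB pages, order 5 as in the
 * quoted Kconfig default). */
#define SKETCH_PAGE_SIZE     4096UL
#define SKETCH_READDIR_ORDER 5
#define SKETCH_READDIR_MAX   (SKETCH_PAGE_SIZE << SKETCH_READDIR_ORDER)

/*
 * Hypothetical helper: choose the buffer size for one READDIR request.
 *   pos        - directory offset of the request (0 means first read)
 *   user_count - size of the caller's getdents buffer, 0 if unknown
 */
size_t readdir_buf_size(long long pos, size_t user_count)
{
	size_t size;

	/* First read: stay with a single page, so small directories
	 * never pay for the large allocation. */
	if (pos == 0)
		return SKETCH_PAGE_SIZE;

	/* Later reads: follow the getdents buffer size when it is
	 * known, otherwise fall back to the maximum. */
	size = user_count ? user_count : SKETCH_READDIR_MAX;

	/* Clamp to [one page, maximum]. */
	if (size < SKETCH_PAGE_SIZE)
		size = SKETCH_PAGE_SIZE;
	if (size > SKETCH_READDIR_MAX)
		size = SKETCH_READDIR_MAX;

	/* Round up to a whole number of pages. */
	return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```

With these assumed values, a first read always gets 4KiB, while a
follow-up read with a 32KiB getdents buffer gets 32KiB, and one with no
hint gets the full 128KiB.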


Cheers,
Bernd
