From: Ira Weiny <ira.weiny@intel.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>, <reiserfs-devel@vger.kernel.org>,
<linux-fsdevel@vger.kernel.org>,
"Fabio M. De Francesco" <fmdefrancesco@gmail.com>
Subject: Re: [PATCH 5/8] reiserfs: Convert do_journal_end() to use kmap_local_folio()
Date: Tue, 20 Dec 2022 15:59:39 -0800
Message-ID: <Y6JMazsjbPRJ7oMM@iweiny-desk3>
In-Reply-To: <Y6IAUetp7nihz9Qu@casper.infradead.org>
On Tue, Dec 20, 2022 at 06:34:57PM +0000, Matthew Wilcox wrote:
> On Tue, Dec 20, 2022 at 08:58:52AM -0800, Ira Weiny wrote:
> > On Tue, Dec 20, 2022 at 12:18:01PM +0100, Jan Kara wrote:
> > > On Tue 20-12-22 09:35:43, Matthew Wilcox wrote:
> > > > But that doesn't solve the "What about fs block size > PAGE_SIZE"
> > > > problem that we also want to solve. Here's a concrete example:
> > > >
> > > >  static __u32 jbd2_checksum_data(__u32 crc32_sum, struct buffer_head *bh)
> > > >  {
> > > > -	struct page *page = bh->b_page;
> > > > +	struct folio *folio = bh->b_folio;
> > > >  	char *addr;
> > > >  	__u32 checksum;
> > > > 
> > > > -	addr = kmap_atomic(page);
> > > > -	checksum = crc32_be(crc32_sum,
> > > > -			(void *)(addr + offset_in_page(bh->b_data)), bh->b_size);
> > > > -	kunmap_atomic(addr);
> > > > +	BUG_ON(IS_ENABLED(CONFIG_HIGHMEM) && bh->b_size > PAGE_SIZE);
> > > > +
> > > > +	addr = kmap_local_folio(folio, offset_in_folio(folio, bh->b_data));
> > > > +	checksum = crc32_be(crc32_sum, addr, bh->b_size);
> > > > +	kunmap_local(addr);
> > > > 
> > > >  	return checksum;
> > > >  }
> > > >
> > > > I don't want to add a lot of complexity to handle the case of b_size >
> > > > PAGE_SIZE on a HIGHMEM machine since that's not going to benefit terribly
> > > > many people. I'd rather have the assertion that we don't support it.
> > > > But if there's a good higher-level abstraction I'm missing here ...
> > >
> > > Just out of curiosity: so far I was thinking a folio is a physically
> > > contiguous chunk of memory. And if it is, then it does not seem like huge
> > > overkill if kmap_local_folio() just maps the whole folio?
> >
> > Willy proposed that previously, but we could not come to a consensus on
> > how to do it.
> >
> > https://lore.kernel.org/all/Yv2VouJb2pNbP59m@iweiny-desk3/
> >
> > FWIW I still think increasing the number of entries to cover any foreseeable
> > need would be sufficient, because HIGHMEM does not need to be optimized.
> > Couldn't we hide the entry count behind some config option which is only set
> > if an FS needs a larger block size on a HIGHMEM system?
>
> "any foreseeable need"? I mean ... I'd like to support 2MB folios,
> even on HIGHMEM machines, and that's 512 entries. If we're doing
> memcpy_to_folio(), we know that's only one mapping, but still, 512
> entries is _a lot_ of address space to be reserving on a 32-bit machine.
I'm confused. A memcpy_to_folio() could loop, mapping pages as needed
depending on the amount of data to copy, or simply map/unmap one page per
iteration, so it never needs more than one entry at a time.

This seems like an argument for having a memcpy_to_folio() that hides such
nastiness on HIGHMEM from the user.
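Something like this untested sketch (memcpy_to_folio() doesn't exist yet, so
the name and signature here are my guesses):

	static void memcpy_to_folio(struct folio *folio, size_t offset,
				    const char *src, size_t len)
	{
		while (len) {
			/* Don't let one copy cross a page boundary. */
			size_t chunk = min_t(size_t, len,
					     PAGE_SIZE - offset_in_page(offset));
			char *dst = kmap_local_folio(folio, offset);

			memcpy(dst, src, chunk);
			kunmap_local(dst);

			src += chunk;
			offset += chunk;
			len -= chunk;
		}
		flush_dcache_folio(folio);
	}

Worst case on HIGHMEM we map and unmap folio_nr_pages() times, but we never
hold more than one kmap entry at once.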
> I don't know exactly what the address space layout is on x86-PAE or
> ARM-PAE these days, but as I recall, the low 3GB is user and the high
> 1GB is divided between LOWMEM and VMAP space; something like 800MB of
> LOWMEM and 200MB of vmap/kmap/PCI iomem/...
>
> Where I think we can absolutely get away with this reasoning is having
> a kmap_local_buffer(). It's perfectly reasonable to restrict fs block
> size to 64kB (after all, we've been limiting it to 4kB on x86 for thirty
> years), and having a __kmap_local_pfns(pfn, n, prot) doesn't seem like
> a terribly bad idea to me.
>
> So ... is this our path forward:
>
> - Introduce a complex memcpy_to/from_folio() in highmem.c that mirrors
> zero_user_segments()
> - Have a simple memcpy_to/from_folio() in highmem.h that mirrors
> zero_user_segments()
I'm confused again. What is the difference between the complex and simple
versions, other than out-of-line vs inline?
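My mental model (a guess at what you mean, not your actual proposal): the
highmem.h version is the !HIGHMEM case where the whole folio is directly
addressable, something like

	static inline void memcpy_to_folio(struct folio *folio, size_t offset,
					   const char *src, size_t len)
	{
		memcpy(folio_address(folio) + offset, src, len);
		flush_dcache_folio(folio);
	}

and the highmem.c version is the page-at-a-time loop for HIGHMEM, mirroring
how zero_user_segments() is split today. Is that the distinction?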
> - Convert __kmap_local_pfn_prot() to __kmap_local_pfns()
I'm not sure I follow the need for this, but I think you are speaking of
mapping multiple pages in a tight loop inside the preemption-disabled region?
Frankly, I think this is an over-optimization for HIGHMEM. Just loop calling
kmap_local_page() (either with or without an unmap on each iteration,
depending on the details).
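For the jbd2 example above, a looping version might look like this (untested
sketch; checksum_bh() is a made-up helper name):

	static __u32 checksum_bh(__u32 crc, struct buffer_head *bh)
	{
		struct folio *folio = bh->b_folio;
		size_t offset = offset_in_folio(folio, bh->b_data);
		size_t len = bh->b_size;

		while (len) {
			/* Map and checksum one page's worth at a time. */
			size_t chunk = min_t(size_t, len,
					     PAGE_SIZE - offset_in_page(offset));
			char *addr = kmap_local_folio(folio, offset);

			crc = crc32_be(crc, addr, chunk);
			kunmap_local(addr);

			offset += chunk;
			len -= chunk;
		}
		return crc;
	}

That handles b_size > PAGE_SIZE without reserving any extra kmap entries and
without needing __kmap_local_pfns().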
> - Add kmap_local_buffer() that can handle buffer_heads up to, say, 16x
> PAGE_SIZE
I really just don't know the details of the various file systems.[*] Is this
something which could be hidden behind Kconfig magic so that it just becomes
kmap_local_folio()?

My gut says that HIGHMEM systems don't need large-block-size filesystems. So
could large-block-size filesystems be limited to !HIGHMEM configs?
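Something like this hypothetical Kconfig sketch (the symbol name is
invented):

	config FS_LARGE_BLOCK_SIZE
		bool
		depends on !HIGHMEM

Filesystems wanting blocks larger than PAGE_SIZE would depend on that
symbol, so the HIGHMEM mapping problem never arises in the first place.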
Ira
[*] I only play a file system developer on TV. ;-)