* [PATCH 1/8] reiserfs: use b_folio instead of b_page in some obvious cases
2022-12-16 20:53 [PATCH 0/8] Convert reiserfs from b_page to b_folio Matthew Wilcox (Oracle)
@ 2022-12-16 20:53 ` Matthew Wilcox (Oracle)
2022-12-16 20:53 ` [PATCH 2/8] reiserfs: use kmap_local_folio() in _get_block_create_0() Matthew Wilcox (Oracle)
` (7 subsequent siblings)
8 siblings, 0 replies; 28+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-12-16 20:53 UTC (permalink / raw)
To: reiserfs-devel
Cc: Matthew Wilcox (Oracle), Jan Kara, linux-fsdevel,
Fabio M. De Francesco, Ira Weiny
These are checks against NULL, tests for uptodateness, comments and
printing the value.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
fs/reiserfs/inode.c | 12 ++++++------
fs/reiserfs/prints.c | 4 ++--
fs/reiserfs/tail_conversion.c | 9 +++++----
3 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index c7d1fa526dea..41c0a785e9ab 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -313,7 +313,7 @@ static int _get_block_create_0(struct inode *inode, sector_t block,
* associated with it that is yet to be written to disk.
*/
if ((args & GET_BLOCK_NO_HOLE)
- && !PageUptodate(bh_result->b_page)) {
+ && !folio_test_uptodate(bh_result->b_folio)) {
return -ENOENT;
}
return 0;
@@ -345,7 +345,7 @@ static int _get_block_create_0(struct inode *inode, sector_t block,
* yet to be written to disk.
*/
if ((args & GET_BLOCK_NO_HOLE)
- && !PageUptodate(bh_result->b_page)) {
+ && !folio_test_uptodate(bh_result->b_folio)) {
ret = -ENOENT;
}
@@ -376,7 +376,7 @@ static int _get_block_create_0(struct inode *inode, sector_t block,
* to date, we don't want read old data off disk. Set the up
* to date bit on the buffer instead and jump to the end
*/
- if (!bh_result->b_page || PageUptodate(bh_result->b_page)) {
+ if (!bh_result->b_folio || folio_test_uptodate(bh_result->b_folio)) {
set_buffer_uptodate(bh_result);
goto finished;
}
@@ -510,7 +510,7 @@ static int reiserfs_get_blocks_direct_io(struct inode *inode,
{
int ret;
- bh_result->b_page = NULL;
+ bh_result->b_folio = NULL;
/*
* We set the b_size before reiserfs_get_block call since it is
@@ -967,11 +967,11 @@ int reiserfs_get_block(struct inode *inode, sector_t block,
set_buffer_uptodate(unbh);
/*
- * unbh->b_page == NULL in case of DIRECT_IO request,
+ * unbh->b_folio == NULL in case of DIRECT_IO request,
* this means buffer will disappear shortly, so it
* should not be added to
*/
- if (unbh->b_page) {
+ if (unbh->b_folio) {
/*
* we've converted the tail, so we must
* flush unbh before the transaction commits
diff --git a/fs/reiserfs/prints.c b/fs/reiserfs/prints.c
index 84a194b77f19..c2efbc14d9c8 100644
--- a/fs/reiserfs/prints.c
+++ b/fs/reiserfs/prints.c
@@ -155,11 +155,11 @@ static int scnprintf_block_head(char *buf, size_t size, struct buffer_head *bh)
static int scnprintf_buffer_head(char *buf, size_t size, struct buffer_head *bh)
{
return scnprintf(buf, size,
- "dev %pg, size %zd, blocknr %llu, count %d, state 0x%lx, page %p, (%s, %s, %s)",
+ "dev %pg, size %zd, blocknr %llu, count %d, state 0x%lx, folio %p, (%s, %s, %s)",
bh->b_bdev, bh->b_size,
(unsigned long long)bh->b_blocknr,
atomic_read(&(bh->b_count)),
- bh->b_state, bh->b_page,
+ bh->b_state, bh->b_folio,
buffer_uptodate(bh) ? "UPTODATE" : "!UPTODATE",
buffer_dirty(bh) ? "DIRTY" : "CLEAN",
buffer_locked(bh) ? "LOCKED" : "UNLOCKED");
diff --git a/fs/reiserfs/tail_conversion.c b/fs/reiserfs/tail_conversion.c
index 2cec61af2a9e..a61bca73c45f 100644
--- a/fs/reiserfs/tail_conversion.c
+++ b/fs/reiserfs/tail_conversion.c
@@ -127,11 +127,11 @@ int direct2indirect(struct reiserfs_transaction_handle *th, struct inode *inode,
* we only send the unbh pointer if the buffer is not
* up to date. this avoids overwriting good data from
* writepage() with old data from the disk or buffer cache
- * Special case: unbh->b_page will be NULL if we are coming
+ * Special case: unbh->b_folio will be NULL if we are coming
* through DIRECT_IO handler here.
*/
- if (!unbh->b_page || buffer_uptodate(unbh)
- || PageUptodate(unbh->b_page)) {
+ if (!unbh->b_folio || buffer_uptodate(unbh)
+ || folio_test_uptodate(unbh->b_folio)) {
up_to_date_bh = NULL;
} else {
up_to_date_bh = unbh;
@@ -176,7 +176,8 @@ void reiserfs_unmap_buffer(struct buffer_head *bh)
* interested in removing it from per-sb j_dirty_buffers list, to avoid
* BUG() on attempt to write not mapped buffer
*/
- if ((!list_empty(&bh->b_assoc_buffers) || bh->b_private) && bh->b_page) {
+ if ((!list_empty(&bh->b_assoc_buffers) || bh->b_private) &&
+ bh->b_folio) {
struct inode *inode = bh->b_folio->mapping->host;
struct reiserfs_journal *j = SB_JOURNAL(inode->i_sb);
spin_lock(&j->j_dirty_buffers_lock);
--
2.35.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 2/8] reiserfs: use kmap_local_folio() in _get_block_create_0()
2022-12-16 20:53 [PATCH 0/8] Convert reiserfs from b_page to b_folio Matthew Wilcox (Oracle)
2022-12-16 20:53 ` [PATCH 1/8] reiserfs: use b_folio instead of b_page in some obvious cases Matthew Wilcox (Oracle)
@ 2022-12-16 20:53 ` Matthew Wilcox (Oracle)
2022-12-17 17:14 ` Ira Weiny
2022-12-16 20:53 ` [PATCH 3/8] reiserfs: Convert direct2indirect() to call folio_zero_range() Matthew Wilcox (Oracle)
` (6 subsequent siblings)
8 siblings, 1 reply; 28+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-12-16 20:53 UTC (permalink / raw)
To: reiserfs-devel
Cc: Matthew Wilcox (Oracle), Jan Kara, linux-fsdevel,
Fabio M. De Francesco, Ira Weiny
Switch from the deprecated kmap() to kmap_local_folio(). For the
kunmap_local(), I subtract off 'chars' to prevent the possibility that
p has wrapped into the next page.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
fs/reiserfs/inode.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 41c0a785e9ab..0ca2d439510a 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -390,8 +390,7 @@ static int _get_block_create_0(struct inode *inode, sector_t block,
* sure we need to. But, this means the item might move if
* kmap schedules
*/
- p = (char *)kmap(bh_result->b_page);
- p += offset;
+ p = kmap_local_folio(bh_result->b_folio, offset);
memset(p, 0, inode->i_sb->s_blocksize);
do {
if (!is_direct_le_ih(ih)) {
@@ -439,8 +438,8 @@ static int _get_block_create_0(struct inode *inode, sector_t block,
ih = tp_item_head(&path);
} while (1);
- flush_dcache_page(bh_result->b_page);
- kunmap(bh_result->b_page);
+ flush_dcache_folio(bh_result->b_folio);
+ kunmap_local(p - chars);
finished:
pathrelse(&path);
--
2.35.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 2/8] reiserfs: use kmap_local_folio() in _get_block_create_0()
2022-12-16 20:53 ` [PATCH 2/8] reiserfs: use kmap_local_folio() in _get_block_create_0() Matthew Wilcox (Oracle)
@ 2022-12-17 17:14 ` Ira Weiny
2022-12-17 19:07 ` Matthew Wilcox
0 siblings, 1 reply; 28+ messages in thread
From: Ira Weiny @ 2022-12-17 17:14 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: reiserfs-devel, Jan Kara, linux-fsdevel, Fabio M. De Francesco
On Fri, Dec 16, 2022 at 08:53:41PM +0000, Matthew Wilcox (Oracle) wrote:
> Switch from the deprecated kmap() to kmap_local_folio(). For the
> kunmap_local(), I subtract off 'chars' to prevent the possibility that
> p has wrapped into the next page.
Thanks for tackling this one. I think the conversion is mostly safe because I
don't see any reason the mapping is passed to another thread.
But comments like this make me leary:
"But, this means the item might move if kmap schedules"
What does that mean? That seems to imply there is something wrong with the
base code separate from the kmapping.
To the patch, I think subtracting chars might be an issue. If chars > offset
and the loop takes the first 'if (done) break;' path then p will end up
pointing at the previous page wouldn't it?
Perhaps it is just safer to store the base pointer in this case?
Ira
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> fs/reiserfs/inode.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
> index 41c0a785e9ab..0ca2d439510a 100644
> --- a/fs/reiserfs/inode.c
> +++ b/fs/reiserfs/inode.c
> @@ -390,8 +390,7 @@ static int _get_block_create_0(struct inode *inode, sector_t block,
> * sure we need to. But, this means the item might move if
> * kmap schedules
> */
> - p = (char *)kmap(bh_result->b_page);
> - p += offset;
> + p = kmap_local_folio(bh_result->b_folio, offset);
> memset(p, 0, inode->i_sb->s_blocksize);
> do {
> if (!is_direct_le_ih(ih)) {
> @@ -439,8 +438,8 @@ static int _get_block_create_0(struct inode *inode, sector_t block,
> ih = tp_item_head(&path);
> } while (1);
>
> - flush_dcache_page(bh_result->b_page);
> - kunmap(bh_result->b_page);
> + flush_dcache_folio(bh_result->b_folio);
> + kunmap_local(p - chars);
>
> finished:
> pathrelse(&path);
> --
> 2.35.1
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 2/8] reiserfs: use kmap_local_folio() in _get_block_create_0()
2022-12-17 17:14 ` Ira Weiny
@ 2022-12-17 19:07 ` Matthew Wilcox
2022-12-17 23:33 ` Ira Weiny
2022-12-19 10:42 ` Jan Kara
0 siblings, 2 replies; 28+ messages in thread
From: Matthew Wilcox @ 2022-12-17 19:07 UTC (permalink / raw)
To: Ira Weiny; +Cc: reiserfs-devel, Jan Kara, linux-fsdevel, Fabio M. De Francesco
On Sat, Dec 17, 2022 at 09:14:05AM -0800, Ira Weiny wrote:
> On Fri, Dec 16, 2022 at 08:53:41PM +0000, Matthew Wilcox (Oracle) wrote:
> > Switch from the deprecated kmap() to kmap_local_folio(). For the
> > kunmap_local(), I subtract off 'chars' to prevent the possibility that
> > p has wrapped into the next page.
>
> Thanks for tackling this one. I think the conversion is mostly safe because I
> don't see any reason the mapping is passed to another thread.
>
> But comments like this make me leary:
>
> "But, this means the item might move if kmap schedules"
>
> What does that mean? That seems to imply there is something wrong with the
> base code separate from the kmapping.
I should probably have deleted that comment. I'm pretty sure what it
refers to is that we don't hold a lock that prevents the item from
moving. When ReiserFS was written, we didn't have CONFIG_PREEMPT, so
if kmap() scheduled, that was a point at which the item could move.
I don't think I introduced any additional brokenness by converting
from kmap() to kmap_local(). Maybe I'm wrong and somebody who
understands ReiserFS can explain.
> To the patch, I think subtracting chars might be an issue. If chars > offset
> and the loop takes the first 'if (done) break;' path then p will end up
> pointing at the previous page wouldn't it?
I thought about that and managed to convince myself that chars was
always < offset. But now I'm not sure again. Easiest way to fix
this is actually to move the p += chars before the if (done) break;.
I also need to rev this patch because it assumes that b_folio is a
single page.
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 008855ddb365..be13ce7a38e1 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -295,7 +295,6 @@ static int _get_block_create_0(struct inode *inode, sector_t block,
int ret;
int result;
int done = 0;
- unsigned long offset;
/* prepare the key to look for the 'block'-th block of file */
make_cpu_key(&key, inode,
@@ -380,17 +379,16 @@ static int _get_block_create_0(struct inode *inode, sector_t block,
set_buffer_uptodate(bh_result);
goto finished;
}
- /* read file tail into part of page */
- offset = (cpu_key_k_offset(&key) - 1) & (PAGE_SIZE - 1);
copy_item_head(&tmp_ih, ih);
/*
* we only want to kmap if we are reading the tail into the page.
* this is not the common case, so we don't kmap until we are
- * sure we need to. But, this means the item might move if
- * kmap schedules
+ * sure we need to.
*/
- p = kmap_local_folio(bh_result->b_folio, offset);
+ p = kmap_local_folio(bh_result->b_folio,
+ offset_in_folio(bh_result->b_folio,
+ cpu_key_k_offset(&key) - 1));
memset(p, 0, inode->i_sb->s_blocksize);
do {
if (!is_direct_le_ih(ih)) {
@@ -413,12 +411,11 @@ static int _get_block_create_0(struct inode *inode, sector_t block,
chars = ih_item_len(ih) - path.pos_in_item;
}
memcpy(p, ih_item_body(bh, ih) + path.pos_in_item, chars);
+ p += chars;
if (done)
break;
- p += chars;
-
/*
* we done, if read direct item is not the last item of
* node FIXME: we could try to check right delimiting key
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 2/8] reiserfs: use kmap_local_folio() in _get_block_create_0()
2022-12-17 19:07 ` Matthew Wilcox
@ 2022-12-17 23:33 ` Ira Weiny
2022-12-19 10:42 ` Jan Kara
1 sibling, 0 replies; 28+ messages in thread
From: Ira Weiny @ 2022-12-17 23:33 UTC (permalink / raw)
To: Matthew Wilcox
Cc: reiserfs-devel, Jan Kara, linux-fsdevel, Fabio M. De Francesco
On Sat, Dec 17, 2022 at 07:07:12PM +0000, Matthew Wilcox wrote:
> On Sat, Dec 17, 2022 at 09:14:05AM -0800, Ira Weiny wrote:
> > On Fri, Dec 16, 2022 at 08:53:41PM +0000, Matthew Wilcox (Oracle) wrote:
> > > Switch from the deprecated kmap() to kmap_local_folio(). For the
> > > kunmap_local(), I subtract off 'chars' to prevent the possibility that
> > > p has wrapped into the next page.
> >
> > Thanks for tackling this one. I think the conversion is mostly safe because I
> > don't see any reason the mapping is passed to another thread.
> >
> > But comments like this make me leary:
> >
> > "But, this means the item might move if kmap schedules"
> >
> > What does that mean? That seems to imply there is something wrong with the
> > base code separate from the kmapping.
>
> I should probably have deleted that comment. I'm pretty sure what it
> refers to is that we don't hold a lock that prevents the item from
> moving. When ReiserFS was written, we didn't have CONFIG_PREEMPT, so
> if kmap() scheduled, that was a point at which the item could move.
> I don't think I introduced any additional brokenness by converting
> from kmap() to kmap_local().
Agreed. I only said mostly safe because of the question about chars below.
> Maybe I'm wrong and somebody who
> understands ReiserFS can explain.
Yep.
>
> > To the patch, I think subtracting chars might be an issue. If chars > offset
> > and the loop takes the first 'if (done) break;' path then p will end up
> > pointing at the previous page wouldn't it?
>
> I thought about that and managed to convince myself that chars was
> always < offset. But now I'm not sure again. Easiest way to fix
> this is actually to move the p += chars before the if (done) break;.
I thought about that too. So yea that is fine.
>
> I also need to rev this patch because it assumes that b_folio is a
> single page.
>
> diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
> index 008855ddb365..be13ce7a38e1 100644
> --- a/fs/reiserfs/inode.c
> +++ b/fs/reiserfs/inode.c
> @@ -295,7 +295,6 @@ static int _get_block_create_0(struct inode *inode, sector_t block,
> int ret;
> int result;
> int done = 0;
> - unsigned long offset;
>
> /* prepare the key to look for the 'block'-th block of file */
> make_cpu_key(&key, inode,
> @@ -380,17 +379,16 @@ static int _get_block_create_0(struct inode *inode, sector_t block,
> set_buffer_uptodate(bh_result);
> goto finished;
> }
> - /* read file tail into part of page */
> - offset = (cpu_key_k_offset(&key) - 1) & (PAGE_SIZE - 1);
> copy_item_head(&tmp_ih, ih);
>
> /*
> * we only want to kmap if we are reading the tail into the page.
> * this is not the common case, so we don't kmap until we are
> - * sure we need to. But, this means the item might move if
> - * kmap schedules
> + * sure we need to.
> */
> - p = kmap_local_folio(bh_result->b_folio, offset);
> + p = kmap_local_folio(bh_result->b_folio,
> + offset_in_folio(bh_result->b_folio,
> + cpu_key_k_offset(&key) - 1));
Yes good addition.
With these changes:
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> memset(p, 0, inode->i_sb->s_blocksize);
> do {
> if (!is_direct_le_ih(ih)) {
> @@ -413,12 +411,11 @@ static int _get_block_create_0(struct inode *inode, sector_t block,
> chars = ih_item_len(ih) - path.pos_in_item;
> }
> memcpy(p, ih_item_body(bh, ih) + path.pos_in_item, chars);
> + p += chars;
>
> if (done)
> break;
>
> - p += chars;
> -
> /*
> * we done, if read direct item is not the last item of
> * node FIXME: we could try to check right delimiting key
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 2/8] reiserfs: use kmap_local_folio() in _get_block_create_0()
2022-12-17 19:07 ` Matthew Wilcox
2022-12-17 23:33 ` Ira Weiny
@ 2022-12-19 10:42 ` Jan Kara
1 sibling, 0 replies; 28+ messages in thread
From: Jan Kara @ 2022-12-19 10:42 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Ira Weiny, reiserfs-devel, Jan Kara, linux-fsdevel,
Fabio M. De Francesco
On Sat 17-12-22 19:07:12, Matthew Wilcox wrote:
> On Sat, Dec 17, 2022 at 09:14:05AM -0800, Ira Weiny wrote:
> > On Fri, Dec 16, 2022 at 08:53:41PM +0000, Matthew Wilcox (Oracle) wrote:
> > > Switch from the deprecated kmap() to kmap_local_folio(). For the
> > > kunmap_local(), I subtract off 'chars' to prevent the possibility that
> > > p has wrapped into the next page.
> >
> > Thanks for tackling this one. I think the conversion is mostly safe because I
> > don't see any reason the mapping is passed to another thread.
> >
> > But comments like this make me leary:
> >
> > "But, this means the item might move if kmap schedules"
> >
> > What does that mean? That seems to imply there is something wrong with the
> > base code separate from the kmapping.
>
> I should probably have deleted that comment. I'm pretty sure what it
> refers to is that we don't hold a lock that prevents the item from
> moving. When ReiserFS was written, we didn't have CONFIG_PREEMPT, so
> if kmap() scheduled, that was a point at which the item could move.
> I don't think I introduced any additional brokenness by converting
> from kmap() to kmap_local(). Maybe I'm wrong and somebody who
> understands ReiserFS can explain.
I think you've got it mostly right. Reiserfs comes from the times of the
Big Kernel Lock which reiserfs was using to protect all its data
structures. And that lock was automagically released when we scheduled. So
when kmap scheduled, the lock preventing item from moving got dropped and
we could have a problem. These days BKL in reiserfs got replaced by the
reiserfs_write_lock() / reiserfs_write_unlock() magic which mostly behaves
as a standard mutex so this is not a concern anymore.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 3/8] reiserfs: Convert direct2indirect() to call folio_zero_range()
2022-12-16 20:53 [PATCH 0/8] Convert reiserfs from b_page to b_folio Matthew Wilcox (Oracle)
2022-12-16 20:53 ` [PATCH 1/8] reiserfs: use b_folio instead of b_page in some obvious cases Matthew Wilcox (Oracle)
2022-12-16 20:53 ` [PATCH 2/8] reiserfs: use kmap_local_folio() in _get_block_create_0() Matthew Wilcox (Oracle)
@ 2022-12-16 20:53 ` Matthew Wilcox (Oracle)
2022-12-17 21:08 ` Ira Weiny
2022-12-16 20:53 ` [PATCH 4/8] reiserfs: Convert reiserfs_delete_item() to use kmap_local_folio() Matthew Wilcox (Oracle)
` (5 subsequent siblings)
8 siblings, 1 reply; 28+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-12-16 20:53 UTC (permalink / raw)
To: reiserfs-devel
Cc: Matthew Wilcox (Oracle), Jan Kara, linux-fsdevel,
Fabio M. De Francesco, Ira Weiny
Remove this open-coded call to kmap()/memset()/kunmap() with the
higher-level abstraction folio_zero_range().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
fs/reiserfs/tail_conversion.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/reiserfs/tail_conversion.c b/fs/reiserfs/tail_conversion.c
index a61bca73c45f..ca36bb88b8b0 100644
--- a/fs/reiserfs/tail_conversion.c
+++ b/fs/reiserfs/tail_conversion.c
@@ -151,11 +151,11 @@ int direct2indirect(struct reiserfs_transaction_handle *th, struct inode *inode,
* out the unused part of the block (it was not up to date before)
*/
if (up_to_date_bh) {
- unsigned pgoff =
- (tail_offset + total_tail - 1) & (PAGE_SIZE - 1);
- char *kaddr = kmap_atomic(up_to_date_bh->b_page);
- memset(kaddr + pgoff, 0, blk_size - total_tail);
- kunmap_atomic(kaddr);
+ size_t start = offset_in_folio(up_to_date_bh->b_folio,
+ (tail_offset + total_tail - 1));
+
+ folio_zero_range(up_to_date_bh->b_folio, start,
+ blk_size - total_tail);
}
REISERFS_I(inode)->i_first_direct_byte = U32_MAX;
--
2.35.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 3/8] reiserfs: Convert direct2indirect() to call folio_zero_range()
2022-12-16 20:53 ` [PATCH 3/8] reiserfs: Convert direct2indirect() to call folio_zero_range() Matthew Wilcox (Oracle)
@ 2022-12-17 21:08 ` Ira Weiny
0 siblings, 0 replies; 28+ messages in thread
From: Ira Weiny @ 2022-12-17 21:08 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: reiserfs-devel, Jan Kara, linux-fsdevel, Fabio M. De Francesco
On Fri, Dec 16, 2022 at 08:53:42PM +0000, Matthew Wilcox (Oracle) wrote:
> Remove this open-coded call to kmap()/memset()/kunmap() with the
> higher-level abstraction folio_zero_range().
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
LGTM
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> ---
> fs/reiserfs/tail_conversion.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/fs/reiserfs/tail_conversion.c b/fs/reiserfs/tail_conversion.c
> index a61bca73c45f..ca36bb88b8b0 100644
> --- a/fs/reiserfs/tail_conversion.c
> +++ b/fs/reiserfs/tail_conversion.c
> @@ -151,11 +151,11 @@ int direct2indirect(struct reiserfs_transaction_handle *th, struct inode *inode,
> * out the unused part of the block (it was not up to date before)
> */
> if (up_to_date_bh) {
> - unsigned pgoff =
> - (tail_offset + total_tail - 1) & (PAGE_SIZE - 1);
> - char *kaddr = kmap_atomic(up_to_date_bh->b_page);
> - memset(kaddr + pgoff, 0, blk_size - total_tail);
> - kunmap_atomic(kaddr);
> + size_t start = offset_in_folio(up_to_date_bh->b_folio,
> + (tail_offset + total_tail - 1));
> +
> + folio_zero_range(up_to_date_bh->b_folio, start,
> + blk_size - total_tail);
> }
>
> REISERFS_I(inode)->i_first_direct_byte = U32_MAX;
> --
> 2.35.1
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 4/8] reiserfs: Convert reiserfs_delete_item() to use kmap_local_folio()
2022-12-16 20:53 [PATCH 0/8] Convert reiserfs from b_page to b_folio Matthew Wilcox (Oracle)
` (2 preceding siblings ...)
2022-12-16 20:53 ` [PATCH 3/8] reiserfs: Convert direct2indirect() to call folio_zero_range() Matthew Wilcox (Oracle)
@ 2022-12-16 20:53 ` Matthew Wilcox (Oracle)
2022-12-17 23:44 ` Ira Weiny
2022-12-16 20:53 ` [PATCH 5/8] reiserfs: Convert do_journal_end() " Matthew Wilcox (Oracle)
` (4 subsequent siblings)
8 siblings, 1 reply; 28+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-12-16 20:53 UTC (permalink / raw)
To: reiserfs-devel
Cc: Matthew Wilcox (Oracle), Jan Kara, linux-fsdevel,
Fabio M. De Francesco, Ira Weiny
kmap_atomic() is deprecated, and so is bh->b_page.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
fs/reiserfs/stree.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c
index 84c12a1947b2..309c61bd90e0 100644
--- a/fs/reiserfs/stree.c
+++ b/fs/reiserfs/stree.c
@@ -1359,12 +1359,13 @@ int reiserfs_delete_item(struct reiserfs_transaction_handle *th,
* -clm
*/
- data = kmap_atomic(un_bh->b_page);
- off = ((le_ih_k_offset(&s_ih) - 1) & (PAGE_SIZE - 1));
- memcpy(data + off,
+ off = offset_in_folio(un_bh->b_folio,
+ le_ih_k_offset(&s_ih) - 1);
+ data = kmap_local_folio(un_bh->b_folio, off);
+ memcpy(data,
ih_item_body(PATH_PLAST_BUFFER(path), &s_ih),
ret_value);
- kunmap_atomic(data);
+ kunmap_local(data);
}
/* Perform balancing after all resources have been collected at once. */
--
2.35.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 4/8] reiserfs: Convert reiserfs_delete_item() to use kmap_local_folio()
2022-12-16 20:53 ` [PATCH 4/8] reiserfs: Convert reiserfs_delete_item() to use kmap_local_folio() Matthew Wilcox (Oracle)
@ 2022-12-17 23:44 ` Ira Weiny
0 siblings, 0 replies; 28+ messages in thread
From: Ira Weiny @ 2022-12-17 23:44 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: reiserfs-devel, Jan Kara, linux-fsdevel, Fabio M. De Francesco
On Fri, Dec 16, 2022 at 08:53:43PM +0000, Matthew Wilcox (Oracle) wrote:
> kmap_atomic() is deprecated, and so is bh->b_page.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> ---
> fs/reiserfs/stree.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c
> index 84c12a1947b2..309c61bd90e0 100644
> --- a/fs/reiserfs/stree.c
> +++ b/fs/reiserfs/stree.c
> @@ -1359,12 +1359,13 @@ int reiserfs_delete_item(struct reiserfs_transaction_handle *th,
> * -clm
> */
>
> - data = kmap_atomic(un_bh->b_page);
> - off = ((le_ih_k_offset(&s_ih) - 1) & (PAGE_SIZE - 1));
> - memcpy(data + off,
> + off = offset_in_folio(un_bh->b_folio,
> + le_ih_k_offset(&s_ih) - 1);
> + data = kmap_local_folio(un_bh->b_folio, off);
> + memcpy(data,
> ih_item_body(PATH_PLAST_BUFFER(path), &s_ih),
> ret_value);
> - kunmap_atomic(data);
> + kunmap_local(data);
> }
>
> /* Perform balancing after all resources have been collected at once. */
> --
> 2.35.1
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 5/8] reiserfs: Convert do_journal_end() to use kmap_local_folio()
2022-12-16 20:53 [PATCH 0/8] Convert reiserfs from b_page to b_folio Matthew Wilcox (Oracle)
` (3 preceding siblings ...)
2022-12-16 20:53 ` [PATCH 4/8] reiserfs: Convert reiserfs_delete_item() to use kmap_local_folio() Matthew Wilcox (Oracle)
@ 2022-12-16 20:53 ` Matthew Wilcox (Oracle)
2022-12-17 23:52 ` Ira Weiny
2022-12-16 20:53 ` [PATCH 6/8] reiserfs: Convert map_block_for_writepage() " Matthew Wilcox (Oracle)
` (3 subsequent siblings)
8 siblings, 1 reply; 28+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-12-16 20:53 UTC (permalink / raw)
To: reiserfs-devel
Cc: Matthew Wilcox (Oracle), Jan Kara, linux-fsdevel,
Fabio M. De Francesco, Ira Weiny
Remove uses of kmap() and b_page. Also move the set_buffer_uptodate()
call to after the memcpy() so that the memory barrier in
set_buffer_uptodate() actually ensures that the old data can't be visible.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
fs/reiserfs/journal.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
index 9ce4ec296b74..faf2f09d82e1 100644
--- a/fs/reiserfs/journal.c
+++ b/fs/reiserfs/journal.c
@@ -4200,21 +4200,19 @@ static int do_journal_end(struct reiserfs_transaction_handle *th, int flags)
/* copy all the real blocks into log area. dirty log blocks */
if (buffer_journaled(cn->bh)) {
struct buffer_head *tmp_bh;
+ size_t offset = offset_in_folio(cn->bh->b_folio,
+ cn->bh->b_data);
char *addr;
- struct page *page;
tmp_bh =
journal_getblk(sb,
SB_ONDISK_JOURNAL_1st_BLOCK(sb) +
((cur_write_start +
jindex) %
SB_ONDISK_JOURNAL_SIZE(sb)));
+ addr = kmap_local_folio(cn->bh->b_folio, offset);
+ memcpy(tmp_bh->b_data, addr, cn->bh->b_size);
+ kunmap_local(addr);
set_buffer_uptodate(tmp_bh);
- page = cn->bh->b_page;
- addr = kmap(page);
- memcpy(tmp_bh->b_data,
- addr + offset_in_page(cn->bh->b_data),
- cn->bh->b_size);
- kunmap(page);
mark_buffer_dirty(tmp_bh);
jindex++;
set_buffer_journal_dirty(cn->bh);
--
2.35.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 5/8] reiserfs: Convert do_journal_end() to use kmap_local_folio()
2022-12-16 20:53 ` [PATCH 5/8] reiserfs: Convert do_journal_end() " Matthew Wilcox (Oracle)
@ 2022-12-17 23:52 ` Ira Weiny
2022-12-20 9:35 ` Matthew Wilcox
0 siblings, 1 reply; 28+ messages in thread
From: Ira Weiny @ 2022-12-17 23:52 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: reiserfs-devel, Jan Kara, linux-fsdevel, Fabio M. De Francesco
On Fri, Dec 16, 2022 at 08:53:44PM +0000, Matthew Wilcox (Oracle) wrote:
> Remove uses of kmap() and b_page. Also move the set_buffer_uptodate()
> call to after the memcpy() so that the memory barrier in
> set_buffer_uptodate() actually ensures that the old data can't be visible.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> fs/reiserfs/journal.c | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
> index 9ce4ec296b74..faf2f09d82e1 100644
> --- a/fs/reiserfs/journal.c
> +++ b/fs/reiserfs/journal.c
> @@ -4200,21 +4200,19 @@ static int do_journal_end(struct reiserfs_transaction_handle *th, int flags)
> /* copy all the real blocks into log area. dirty log blocks */
> if (buffer_journaled(cn->bh)) {
> struct buffer_head *tmp_bh;
> + size_t offset = offset_in_folio(cn->bh->b_folio,
> + cn->bh->b_data);
> char *addr;
> - struct page *page;
> tmp_bh =
> journal_getblk(sb,
> SB_ONDISK_JOURNAL_1st_BLOCK(sb) +
> ((cur_write_start +
> jindex) %
> SB_ONDISK_JOURNAL_SIZE(sb)));
> + addr = kmap_local_folio(cn->bh->b_folio, offset);
> + memcpy(tmp_bh->b_data, addr, cn->bh->b_size);
> + kunmap_local(addr);
I think we should have a memcpy_{to|from}_folio() like we do for the pages.
Did I miss this in the earlier patch?
Ira
> set_buffer_uptodate(tmp_bh);
> - page = cn->bh->b_page;
> - addr = kmap(page);
> - memcpy(tmp_bh->b_data,
> - addr + offset_in_page(cn->bh->b_data),
> - cn->bh->b_size);
> - kunmap(page);
> mark_buffer_dirty(tmp_bh);
> jindex++;
> set_buffer_journal_dirty(cn->bh);
> --
> 2.35.1
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 5/8] reiserfs: Convert do_journal_end() to use kmap_local_folio()
2022-12-17 23:52 ` Ira Weiny
@ 2022-12-20 9:35 ` Matthew Wilcox
2022-12-20 11:18 ` Jan Kara
0 siblings, 1 reply; 28+ messages in thread
From: Matthew Wilcox @ 2022-12-20 9:35 UTC (permalink / raw)
To: Ira Weiny; +Cc: reiserfs-devel, Jan Kara, linux-fsdevel, Fabio M. De Francesco
On Sat, Dec 17, 2022 at 03:52:50PM -0800, Ira Weiny wrote:
> > + addr = kmap_local_folio(cn->bh->b_folio, offset);
> > + memcpy(tmp_bh->b_data, addr, cn->bh->b_size);
> > + kunmap_local(addr);
>
> I think we should have a memcpy_{to|from}_folio() like we do for the pages.
>
> Did I miss this in the earlier patch?
I've been going back-and-forth on a memcpy_(to|from)_folio().
I'm generally in favour of higher-level abstractions; I just don't
know that this is the right one.
On the one hand, we have a memcpy_(to|from)_page() already and
the folio version leads to an obvious conversion.
On the other hand, it's massively overkill for all the places I've been
looking at in reiserfs, which generally just want to touch the contents of
a single buffer_head, which is block-sized. Obviously I have a sampling
bias since I'm looking for bh->b_page usages. But I was thinking about
a kmap_local_buffer() for these cases.
But that doesn't solve the "What about fs block size > PAGE_SIZE"
problem that we also want to solve. Here's a concrete example:
static __u32 jbd2_checksum_data(__u32 crc32_sum, struct buffer_head *bh)
{
- struct page *page = bh->b_page;
+ struct folio *folio = bh->b_folio;
char *addr;
__u32 checksum;
- addr = kmap_atomic(page);
- checksum = crc32_be(crc32_sum,
- (void *)(addr + offset_in_page(bh->b_data)), bh->b_size);
- kunmap_atomic(addr);
+ BUG_ON(IS_ENABLED(CONFIG_HIGHMEM) && bh->b_size > PAGE_SIZE);
+
+ addr = kmap_local_folio(folio, offset_in_folio(folio, bh->b_data));
+ checksum = crc32_be(crc32_sum, addr, bh->b_size);
+ kunmap_local(addr);
return checksum;
}
I don't want to add a lot of complexity to handle the case of b_size >
PAGE_SIZE on a HIGHMEM machine since that's not going to benefit terribly
many people. I'd rather have the assertion that we don't support it.
But if there's a good higher-level abstraction I'm missing here ...
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 5/8] reiserfs: Convert do_journal_end() to use kmap_local_folio()
2022-12-20 9:35 ` Matthew Wilcox
@ 2022-12-20 11:18 ` Jan Kara
2022-12-20 16:58 ` Ira Weiny
0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2022-12-20 11:18 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Ira Weiny, reiserfs-devel, Jan Kara, linux-fsdevel,
Fabio M. De Francesco
On Tue 20-12-22 09:35:43, Matthew Wilcox wrote:
> But that doesn't solve the "What about fs block size > PAGE_SIZE"
> problem that we also want to solve. Here's a concrete example:
>
> static __u32 jbd2_checksum_data(__u32 crc32_sum, struct buffer_head *bh)
> {
> - struct page *page = bh->b_page;
> + struct folio *folio = bh->b_folio;
> char *addr;
> __u32 checksum;
>
> - addr = kmap_atomic(page);
> - checksum = crc32_be(crc32_sum,
> - (void *)(addr + offset_in_page(bh->b_data)), bh->b_size);
> - kunmap_atomic(addr);
> + BUG_ON(IS_ENABLED(CONFIG_HIGHMEM) && bh->b_size > PAGE_SIZE);
> +
> + addr = kmap_local_folio(folio, offset_in_folio(folio, bh->b_data));
> + checksum = crc32_be(crc32_sum, addr, bh->b_size);
> + kunmap_local(addr);
>
> return checksum;
> }
>
> I don't want to add a lot of complexity to handle the case of b_size >
> PAGE_SIZE on a HIGHMEM machine since that's not going to benefit terribly
> many people. I'd rather have the assertion that we don't support it.
> But if there's a good higher-level abstraction I'm missing here ...
Just out of curiosity: So far I was thinking folio is physically contiguous
chunk of memory. And if it is, then it does not seem as a huge overkill if
kmap_local_folio() just maps the whole folio? Or are you concerned about
the overhead of finding big enough hole in the vmap area?
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 5/8] reiserfs: Convert do_journal_end() to use kmap_local_folio()
2022-12-20 11:18 ` Jan Kara
@ 2022-12-20 16:58 ` Ira Weiny
2022-12-20 18:34 ` Matthew Wilcox
0 siblings, 1 reply; 28+ messages in thread
From: Ira Weiny @ 2022-12-20 16:58 UTC (permalink / raw)
To: Jan Kara
Cc: Matthew Wilcox, reiserfs-devel, linux-fsdevel,
Fabio M. De Francesco
On Tue, Dec 20, 2022 at 12:18:01PM +0100, Jan Kara wrote:
> On Tue 20-12-22 09:35:43, Matthew Wilcox wrote:
> > But that doesn't solve the "What about fs block size > PAGE_SIZE"
> > problem that we also want to solve. Here's a concrete example:
> >
> > static __u32 jbd2_checksum_data(__u32 crc32_sum, struct buffer_head *bh)
> > {
> > - struct page *page = bh->b_page;
> > + struct folio *folio = bh->b_folio;
> > char *addr;
> > __u32 checksum;
> >
> > - addr = kmap_atomic(page);
> > - checksum = crc32_be(crc32_sum,
> > - (void *)(addr + offset_in_page(bh->b_data)), bh->b_size);
> > - kunmap_atomic(addr);
> > + BUG_ON(IS_ENABLED(CONFIG_HIGHMEM) && bh->b_size > PAGE_SIZE);
> > +
> > + addr = kmap_local_folio(folio, offset_in_folio(folio, bh->b_data));
> > + checksum = crc32_be(crc32_sum, addr, bh->b_size);
> > + kunmap_local(addr);
> >
> > return checksum;
> > }
> >
> > I don't want to add a lot of complexity to handle the case of b_size >
> > PAGE_SIZE on a HIGHMEM machine since that's not going to benefit terribly
> > many people. I'd rather have the assertion that we don't support it.
> > But if there's a good higher-level abstraction I'm missing here ...
>
> Just out of curiosity: So far I was thinking folio is physically contiguous
> chunk of memory. And if it is, then it does not seem as a huge overkill if
> kmap_local_folio() just maps the whole folio?
Willy proposed that previously but we could not come to a consensus on how to
do it.
https://lore.kernel.org/all/Yv2VouJb2pNbP59m@iweiny-desk3/
FWIW I still think increasing the entries to cover any foreseeable need would
be sufficient because HIGHMEM does not need to be optimized. Couldn't we hide
the entry count into some config option which is only set if a FS needs a
larger block size on a HIGHMEM system?
Ira
> Or are you concerned about
> the overhead of finding big enough hole in the vmap area?
>
> Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 5/8] reiserfs: Convert do_journal_end() to use kmap_local_folio()
2022-12-20 16:58 ` Ira Weiny
@ 2022-12-20 18:34 ` Matthew Wilcox
2022-12-20 23:59 ` Ira Weiny
0 siblings, 1 reply; 28+ messages in thread
From: Matthew Wilcox @ 2022-12-20 18:34 UTC (permalink / raw)
To: Ira Weiny; +Cc: Jan Kara, reiserfs-devel, linux-fsdevel, Fabio M. De Francesco
On Tue, Dec 20, 2022 at 08:58:52AM -0800, Ira Weiny wrote:
> On Tue, Dec 20, 2022 at 12:18:01PM +0100, Jan Kara wrote:
> > On Tue 20-12-22 09:35:43, Matthew Wilcox wrote:
> > > But that doesn't solve the "What about fs block size > PAGE_SIZE"
> > > problem that we also want to solve. Here's a concrete example:
> > >
> > > static __u32 jbd2_checksum_data(__u32 crc32_sum, struct buffer_head *bh)
> > > {
> > > - struct page *page = bh->b_page;
> > > + struct folio *folio = bh->b_folio;
> > > char *addr;
> > > __u32 checksum;
> > >
> > > - addr = kmap_atomic(page);
> > > - checksum = crc32_be(crc32_sum,
> > > - (void *)(addr + offset_in_page(bh->b_data)), bh->b_size);
> > > - kunmap_atomic(addr);
> > > + BUG_ON(IS_ENABLED(CONFIG_HIGHMEM) && bh->b_size > PAGE_SIZE);
> > > +
> > > + addr = kmap_local_folio(folio, offset_in_folio(folio, bh->b_data));
> > > + checksum = crc32_be(crc32_sum, addr, bh->b_size);
> > > + kunmap_local(addr);
> > >
> > > return checksum;
> > > }
> > >
> > > I don't want to add a lot of complexity to handle the case of b_size >
> > > PAGE_SIZE on a HIGHMEM machine since that's not going to benefit terribly
> > > many people. I'd rather have the assertion that we don't support it.
> > > But if there's a good higher-level abstraction I'm missing here ...
> >
> > Just out of curiosity: So far I was thinking folio is physically contiguous
> > chunk of memory. And if it is, then it does not seem as a huge overkill if
> > kmap_local_folio() just maps the whole folio?
>
> Willy proposed that previously but we could not come to a consensus on how to
> do it.
>
> https://lore.kernel.org/all/Yv2VouJb2pNbP59m@iweiny-desk3/
>
> FWIW I still think increasing the entries to cover any foreseeable need would
> be sufficient because HIGHMEM does not need to be optimized. Couldn't we hide
> the entry count into some config option which is only set if a FS needs a
> larger block size on a HIGHMEM system?
"any foreseeable need"? I mean ... I'd like to support 2MB folios,
even on HIGHMEM machines, and that's 512 entries. If we're doing
memcpy_to_folio(), we know that's only one mapping, but still, 512
entries is _a lot_ of address space to be reserving on a 32-bit machine.
I don't know exactly what the address space layout is on x86-PAE or
ARM-PAE these days, but as I recall, the low 3GB is user and the high
1GB is divided between LOWMEM and VMAP space; something like 800MB of
LOWMEM and 200MB of vmap/kmap/PCI iomem/...
Where I think we can absolutely get away with this reasoning is having
a kmap_local_buffer(). It's perfectly reasonable to restrict fs block
size to 64kB (after all, we've been limiting it to 4kB on x86 for thirty
years), and having a __kmap_local_pfns(pfn, n, prot) doesn't seem like
a terribly bad idea to me.
So ... is this our path forward:
- Introduce a complex memcpy_to/from_folio() in highmem.c that mirrors
zero_user_segments()
- Have a simple memcpy_to/from_folio() in highmem.h that mirrors
zero_user_segments()
- Convert __kmap_local_pfn_prot() to __kmap_local_pfns()
- Add kmap_local_buffer() that can handle buffer_heads up to, say, 16x
PAGE_SIZE
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 5/8] reiserfs: Convert do_journal_end() to use kmap_local_folio()
2022-12-20 18:34 ` Matthew Wilcox
@ 2022-12-20 23:59 ` Ira Weiny
2022-12-21 19:04 ` Matthew Wilcox
0 siblings, 1 reply; 28+ messages in thread
From: Ira Weiny @ 2022-12-20 23:59 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Jan Kara, reiserfs-devel, linux-fsdevel, Fabio M. De Francesco
On Tue, Dec 20, 2022 at 06:34:57PM +0000, Matthew Wilcox wrote:
> On Tue, Dec 20, 2022 at 08:58:52AM -0800, Ira Weiny wrote:
> > On Tue, Dec 20, 2022 at 12:18:01PM +0100, Jan Kara wrote:
> > > On Tue 20-12-22 09:35:43, Matthew Wilcox wrote:
> > > > But that doesn't solve the "What about fs block size > PAGE_SIZE"
> > > > problem that we also want to solve. Here's a concrete example:
> > > >
> > > > static __u32 jbd2_checksum_data(__u32 crc32_sum, struct buffer_head *bh)
> > > > {
> > > > - struct page *page = bh->b_page;
> > > > + struct folio *folio = bh->b_folio;
> > > > char *addr;
> > > > __u32 checksum;
> > > >
> > > > - addr = kmap_atomic(page);
> > > > - checksum = crc32_be(crc32_sum,
> > > > - (void *)(addr + offset_in_page(bh->b_data)), bh->b_size);
> > > > - kunmap_atomic(addr);
> > > > + BUG_ON(IS_ENABLED(CONFIG_HIGHMEM) && bh->b_size > PAGE_SIZE);
> > > > +
> > > > + addr = kmap_local_folio(folio, offset_in_folio(folio, bh->b_data));
> > > > + checksum = crc32_be(crc32_sum, addr, bh->b_size);
> > > > + kunmap_local(addr);
> > > >
> > > > return checksum;
> > > > }
> > > >
> > > > I don't want to add a lot of complexity to handle the case of b_size >
> > > > PAGE_SIZE on a HIGHMEM machine since that's not going to benefit terribly
> > > > many people. I'd rather have the assertion that we don't support it.
> > > > But if there's a good higher-level abstraction I'm missing here ...
> > >
> > > Just out of curiosity: So far I was thinking folio is physically contiguous
> > > chunk of memory. And if it is, then it does not seem as a huge overkill if
> > > kmap_local_folio() just maps the whole folio?
> >
> > Willy proposed that previously but we could not come to a consensus on how to
> > do it.
> >
> > https://lore.kernel.org/all/Yv2VouJb2pNbP59m@iweiny-desk3/
> >
> > FWIW I still think increasing the entries to cover any foreseeable need would
> > be sufficient because HIGHMEM does not need to be optimized. Couldn't we hide
> > the entry count into some config option which is only set if a FS needs a
> > larger block size on a HIGHMEM system?
>
> "any foreseeable need"? I mean ... I'd like to support 2MB folios,
> even on HIGHMEM machines, and that's 512 entries. If we're doing
> memcpy_to_folio(), we know that's only one mapping, but still, 512
> entries is _a lot_ of address space to be reserving on a 32-bit machine.
I'm confused. A memcpy_to_folio() could loop to map the pages as needed
depending on the amount of data to copy. Or just map/unmap in a loop.
This seems like an argument to have a memcpy_to_folio() to hide such nastiness
on HIGHMEM from the user.
> I don't know exactly what the address space layout is on x86-PAE or
> ARM-PAE these days, but as I recall, the low 3GB is user and the high
> 1GB is divided between LOWMEM and VMAP space; something like 800MB of
> LOWMEM and 200MB of vmap/kmap/PCI iomem/...
>
> Where I think we can absolutely get away with this reasoning is having
> a kmap_local_buffer(). It's perfectly reasonable to restrict fs block
> size to 64kB (after all, we've been limiting it to 4kB on x86 for thirty
> years), and having a __kmap_local_pfns(pfn, n, prot) doesn't seem like
> a terribly bad idea to me.
>
> So ... is this our path forward:
>
> - Introduce a complex memcpy_to/from_folio() in highmem.c that mirrors
> zero_user_segments()
> - Have a simple memcpy_to/from_folio() in highmem.h that mirrors
> zero_user_segments()
I'm confused again. What is the difference between the complex/simple other
than inline vs not?
> - Convert __kmap_local_pfn_prot() to __kmap_local_pfns()
I'm not sure I follow this need but I think you are speaking of having the
mapping of multiple pages in a tight loop in the preemption disabled region?
Frankly, I think this is an over optimization for HIGHMEM. Just loop calling
kmap_local_page() (either with or without an unmap depending on the details.)
> - Add kmap_local_buffer() that can handle buffer_heads up to, say, 16x
> PAGE_SIZE
I really just don't know the details of the various file systems.[*] Is this
something which could be hidden in Kconfig magic and just call this
kmap_local_folio()?
My gut says that HIGHMEM systems don't need large block size FS's. So could
large block size FS's be limited to !HIGHMEM configs?
Ira
[*] I only play a file system developer on TV. ;-)
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 5/8] reiserfs: Convert do_journal_end() to use kmap_local_folio()
2022-12-20 23:59 ` Ira Weiny
@ 2022-12-21 19:04 ` Matthew Wilcox
2022-12-22 10:37 ` Jan Kara
0 siblings, 1 reply; 28+ messages in thread
From: Matthew Wilcox @ 2022-12-21 19:04 UTC (permalink / raw)
To: Ira Weiny; +Cc: Jan Kara, reiserfs-devel, linux-fsdevel, Fabio M. De Francesco
On Tue, Dec 20, 2022 at 03:59:39PM -0800, Ira Weiny wrote:
> On Tue, Dec 20, 2022 at 06:34:57PM +0000, Matthew Wilcox wrote:
> > On Tue, Dec 20, 2022 at 08:58:52AM -0800, Ira Weiny wrote:
> > > On Tue, Dec 20, 2022 at 12:18:01PM +0100, Jan Kara wrote:
> > > > On Tue 20-12-22 09:35:43, Matthew Wilcox wrote:
> > > > > But that doesn't solve the "What about fs block size > PAGE_SIZE"
> > > > > problem that we also want to solve. Here's a concrete example:
> > > > >
> > > > > static __u32 jbd2_checksum_data(__u32 crc32_sum, struct buffer_head *bh)
> > > > > {
> > > > > - struct page *page = bh->b_page;
> > > > > + struct folio *folio = bh->b_folio;
> > > > > char *addr;
> > > > > __u32 checksum;
> > > > >
> > > > > - addr = kmap_atomic(page);
> > > > > - checksum = crc32_be(crc32_sum,
> > > > > - (void *)(addr + offset_in_page(bh->b_data)), bh->b_size);
> > > > > - kunmap_atomic(addr);
> > > > > + BUG_ON(IS_ENABLED(CONFIG_HIGHMEM) && bh->b_size > PAGE_SIZE);
> > > > > +
> > > > > + addr = kmap_local_folio(folio, offset_in_folio(folio, bh->b_data));
> > > > > + checksum = crc32_be(crc32_sum, addr, bh->b_size);
> > > > > + kunmap_local(addr);
> > > > >
> > > > > return checksum;
> > > > > }
> > > > >
> > > > > I don't want to add a lot of complexity to handle the case of b_size >
> > > > > PAGE_SIZE on a HIGHMEM machine since that's not going to benefit terribly
> > > > > many people. I'd rather have the assertion that we don't support it.
> > > > > But if there's a good higher-level abstraction I'm missing here ...
> > > >
> > > > Just out of curiosity: So far I was thinking folio is physically contiguous
> > > > chunk of memory. And if it is, then it does not seem as a huge overkill if
> > > > kmap_local_folio() just maps the whole folio?
> > >
> > > Willy proposed that previously but we could not come to a consensus on how to
> > > do it.
> > >
> > > https://lore.kernel.org/all/Yv2VouJb2pNbP59m@iweiny-desk3/
> > >
> > > FWIW I still think increasing the entries to cover any foreseeable need would
> > > be sufficient because HIGHMEM does not need to be optimized. Couldn't we hide
> > > the entry count into some config option which is only set if a FS needs a
> > > larger block size on a HIGHMEM system?
> >
> > "any foreseeable need"? I mean ... I'd like to support 2MB folios,
> > even on HIGHMEM machines, and that's 512 entries. If we're doing
> > memcpy_to_folio(), we know that's only one mapping, but still, 512
> > entries is _a lot_ of address space to be reserving on a 32-bit machine.
>
> I'm confused. A memcpy_to_folio() could loop to map the pages as needed
> depending on the amount of data to copy. Or just map/unmap in a loop.
>
> This seems like an argument to have a memcpy_to_folio() to hide such nastiness
> on HIGHMEM from the user.
I see that you are confused. What I'm not quite sure of is how I confused
you, so I'm just going to try again in different words.
Given the desire to support 2MB folios on x86/ARM PAE systems, we can't
have a kmap_local_entire_folio() because that would take up too much
address space.
But we can have a kmap_local_buffer() / kummap_local_buffer(). We can
restrict the maximum fs block size (== buffer->b-size) to a reasonably
small multiple of PAGE_SIZE, eg 16. That will let us kmap the entire
buffer, after making some of the changes described below.
That solves the jbd2_checksum_data() problem above, but isn't necessarily
the best solution for every filesystem "need to copy to a folio" problem.
So I think we do want memcpy_to/from_folio(), split out like the current
zero_user_segments().
I also think we want a copy_folio_from_iter_atomic(). Right now
iomap_write_iter() is a bit of a mess; it retrieves a multi-page folio
from the page cache multiple times instead of copying as much as it can
from userspace to the folio. There are some interesting issues to deal
with here, but putting it in iov_iter.c is better than hiding it in the
iomap code.
> > I don't know exactly what the address space layout is on x86-PAE or
> > ARM-PAE these days, but as I recall, the low 3GB is user and the high
> > 1GB is divided between LOWMEM and VMAP space; something like 800MB of
> > LOWMEM and 200MB of vmap/kmap/PCI iomem/...
> >
> > Where I think we can absolutely get away with this reasoning is having
> > a kmap_local_buffer(). It's perfectly reasonable to restrict fs block
> > size to 64kB (after all, we've been limiting it to 4kB on x86 for thirty
> > years), and having a __kmap_local_pfns(pfn, n, prot) doesn't seem like
> > a terribly bad idea to me.
> >
> > So ... is this our path forward:
> >
> > - Introduce a complex memcpy_to/from_folio() in highmem.c that mirrors
> > zero_user_segments()
> > - Have a simple memcpy_to/from_folio() in highmem.h that mirrors
> > zero_user_segments()
>
> I'm confused again. What is the difference between the complex/simple other
> than inline vs not?
>
> > - Convert __kmap_local_pfn_prot() to __kmap_local_pfns()
>
> I'm not sure I follow this need but I think you are speaking of having the
> mapping of multiple pages in a tight loop in the preemption disabled region?
>
> Frankly, I think this is an over optimization for HIGHMEM. Just loop calling
> kmap_local_page() (either with or without an unmap depending on the details.)
See the jbd2_checksum_data() example at the top, and design me a better
API that doesn't involve putting complexity into jbd2 ;-)
> > - Add kmap_local_buffer() that can handle buffer_heads up to, say, 16x
> > PAGE_SIZE
>
> I really just don't know the details of the various file systems.[*] Is this
> something which could be hidden in Kconfig magic and just call this
> kmap_local_folio()?
>
> My gut says that HIGHMEM systems don't need large block size FS's. So could
> large block size FS's be limited to !HIGHMEM configs?
They could, and that's the current approach, but it does seem plausible
that we could support HIGHMEM systems with fs-block-size > PAGE_SIZE
with only a little extra work.
> [*] I only play a file system developer on TV. ;-)
That's OK, I'm only pretending to be an MM developer. Keep quiet, and
I think we can get away with this.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 5/8] reiserfs: Convert do_journal_end() to use kmap_local_folio()
2022-12-21 19:04 ` Matthew Wilcox
@ 2022-12-22 10:37 ` Jan Kara
0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2022-12-22 10:37 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Ira Weiny, Jan Kara, reiserfs-devel, linux-fsdevel,
Fabio M. De Francesco
On Wed 21-12-22 19:04:02, Matthew Wilcox wrote:
> On Tue, Dec 20, 2022 at 03:59:39PM -0800, Ira Weiny wrote:
> > On Tue, Dec 20, 2022 at 06:34:57PM +0000, Matthew Wilcox wrote:
> > > On Tue, Dec 20, 2022 at 08:58:52AM -0800, Ira Weiny wrote:
> > > > On Tue, Dec 20, 2022 at 12:18:01PM +0100, Jan Kara wrote:
> > > > > On Tue 20-12-22 09:35:43, Matthew Wilcox wrote:
> > > > > > But that doesn't solve the "What about fs block size > PAGE_SIZE"
> > > > > > problem that we also want to solve. Here's a concrete example:
> > > > > >
> > > > > > static __u32 jbd2_checksum_data(__u32 crc32_sum, struct buffer_head *bh)
> > > > > > {
> > > > > > - struct page *page = bh->b_page;
> > > > > > + struct folio *folio = bh->b_folio;
> > > > > > char *addr;
> > > > > > __u32 checksum;
> > > > > >
> > > > > > - addr = kmap_atomic(page);
> > > > > > - checksum = crc32_be(crc32_sum,
> > > > > > - (void *)(addr + offset_in_page(bh->b_data)), bh->b_size);
> > > > > > - kunmap_atomic(addr);
> > > > > > + BUG_ON(IS_ENABLED(CONFIG_HIGHMEM) && bh->b_size > PAGE_SIZE);
> > > > > > +
> > > > > > + addr = kmap_local_folio(folio, offset_in_folio(folio, bh->b_data));
> > > > > > + checksum = crc32_be(crc32_sum, addr, bh->b_size);
> > > > > > + kunmap_local(addr);
> > > > > >
> > > > > > return checksum;
> > > > > > }
> > > > > >
> > > > > > I don't want to add a lot of complexity to handle the case of b_size >
> > > > > > PAGE_SIZE on a HIGHMEM machine since that's not going to benefit terribly
> > > > > > many people. I'd rather have the assertion that we don't support it.
> > > > > > But if there's a good higher-level abstraction I'm missing here ...
> > > > >
> > > > > Just out of curiosity: So far I was thinking folio is physically contiguous
> > > > > chunk of memory. And if it is, then it does not seem as a huge overkill if
> > > > > kmap_local_folio() just maps the whole folio?
> > > >
> > > > Willy proposed that previously but we could not come to a consensus on how to
> > > > do it.
> > > >
> > > > https://lore.kernel.org/all/Yv2VouJb2pNbP59m@iweiny-desk3/
> > > >
> > > > FWIW I still think increasing the entries to cover any foreseeable need would
> > > > be sufficient because HIGHMEM does not need to be optimized. Couldn't we hide
> > > > the entry count into some config option which is only set if a FS needs a
> > > > larger block size on a HIGHMEM system?
> > >
> > > "any foreseeable need"? I mean ... I'd like to support 2MB folios,
> > > even on HIGHMEM machines, and that's 512 entries. If we're doing
> > > memcpy_to_folio(), we know that's only one mapping, but still, 512
> > > entries is _a lot_ of address space to be reserving on a 32-bit machine.
> >
> > I'm confused. A memcpy_to_folio() could loop to map the pages as needed
> > depending on the amount of data to copy. Or just map/unmap in a loop.
> >
> > This seems like an argument to have a memcpy_to_folio() to hide such nastiness
> > on HIGHMEM from the user.
>
> I see that you are confused. What I'm not quite sure of is how I confused
> you, so I'm just going to try again in different words.
>
> Given the desire to support 2MB folios on x86/ARM PAE systems, we can't
> have a kmap_local_entire_folio() because that would take up too much
> address space.
Is that really a problem? I mean sure 2MB is noticeable in 32-bit address
space but these mappings are very shortlived due to their nature (and the
API kind of enforces that) so there'd hardly be more than a handful of them
existing in parallel on a system. Or is my expectation wrong?
But I agree the solution with memcpy_to/from_folio() works as well.
> > [*] I only play a file system developer on TV. ;-)
>
> That's OK, I'm only pretending to be an MM developer. Keep quiet, and
> I think we can get away with this.
"All the world's a stage, and all the men and women merely players." :)
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 6/8] reiserfs: Convert map_block_for_writepage() to use kmap_local_folio()
2022-12-16 20:53 [PATCH 0/8] Convert reiserfs from b_page to b_folio Matthew Wilcox (Oracle)
` (4 preceding siblings ...)
2022-12-16 20:53 ` [PATCH 5/8] reiserfs: Convert do_journal_end() " Matthew Wilcox (Oracle)
@ 2022-12-16 20:53 ` Matthew Wilcox (Oracle)
2022-12-18 0:02 ` Ira Weiny
2022-12-16 20:53 ` [PATCH 7/8] reiserfs: Convert convert_tail_for_hole() to use folios Matthew Wilcox (Oracle)
` (2 subsequent siblings)
8 siblings, 1 reply; 28+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-12-16 20:53 UTC (permalink / raw)
To: reiserfs-devel
Cc: Matthew Wilcox (Oracle), Jan Kara, linux-fsdevel,
Fabio M. De Francesco, Ira Weiny
Removes uses of kmap() and b_page.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
fs/reiserfs/inode.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 0ca2d439510a..b79848111957 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2360,6 +2360,7 @@ static int map_block_for_writepage(struct inode *inode,
struct item_head tmp_ih;
struct item_head *ih;
struct buffer_head *bh;
+ char *p;
__le32 *item;
struct cpu_key key;
INITIALIZE_PATH(path);
@@ -2382,7 +2383,8 @@ static int map_block_for_writepage(struct inode *inode,
return -EIO;
}
- kmap(bh_result->b_page);
+ p = kmap_local_folio(bh_result->b_folio,
+ offset_in_folio(bh_result->b_folio, byte_offset - 1));
start_over:
reiserfs_write_lock(inode->i_sb);
make_cpu_key(&key, inode, byte_offset, TYPE_ANY, 3);
@@ -2413,9 +2415,6 @@ static int map_block_for_writepage(struct inode *inode,
set_block_dev_mapped(bh_result,
get_block_num(item, pos_in_item), inode);
} else if (is_direct_le_ih(ih)) {
- char *p;
- p = page_address(bh_result->b_page);
- p += (byte_offset - 1) & (PAGE_SIZE - 1);
copy_size = ih_item_len(ih) - pos_in_item;
fs_gen = get_generation(inode->i_sb);
@@ -2491,7 +2490,7 @@ static int map_block_for_writepage(struct inode *inode,
}
}
}
- kunmap(bh_result->b_page);
+ kunmap_local(p);
if (!retval && buffer_mapped(bh_result) && bh_result->b_blocknr == 0) {
/*
--
2.35.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 6/8] reiserfs: Convert map_block_for_writepage() to use kmap_local_folio()
2022-12-16 20:53 ` [PATCH 6/8] reiserfs: Convert map_block_for_writepage() " Matthew Wilcox (Oracle)
@ 2022-12-18 0:02 ` Ira Weiny
0 siblings, 0 replies; 28+ messages in thread
From: Ira Weiny @ 2022-12-18 0:02 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: reiserfs-devel, Jan Kara, linux-fsdevel, Fabio M. De Francesco
On Fri, Dec 16, 2022 at 08:53:45PM +0000, Matthew Wilcox (Oracle) wrote:
> Removes uses of kmap() and b_page.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
LGTM
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> ---
> fs/reiserfs/inode.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
> index 0ca2d439510a..b79848111957 100644
> --- a/fs/reiserfs/inode.c
> +++ b/fs/reiserfs/inode.c
> @@ -2360,6 +2360,7 @@ static int map_block_for_writepage(struct inode *inode,
> struct item_head tmp_ih;
> struct item_head *ih;
> struct buffer_head *bh;
> + char *p;
> __le32 *item;
> struct cpu_key key;
> INITIALIZE_PATH(path);
> @@ -2382,7 +2383,8 @@ static int map_block_for_writepage(struct inode *inode,
> return -EIO;
> }
>
> - kmap(bh_result->b_page);
> + p = kmap_local_folio(bh_result->b_folio,
> + offset_in_folio(bh_result->b_folio, byte_offset - 1));
> start_over:
> reiserfs_write_lock(inode->i_sb);
> make_cpu_key(&key, inode, byte_offset, TYPE_ANY, 3);
> @@ -2413,9 +2415,6 @@ static int map_block_for_writepage(struct inode *inode,
> set_block_dev_mapped(bh_result,
> get_block_num(item, pos_in_item), inode);
> } else if (is_direct_le_ih(ih)) {
> - char *p;
> - p = page_address(bh_result->b_page);
> - p += (byte_offset - 1) & (PAGE_SIZE - 1);
> copy_size = ih_item_len(ih) - pos_in_item;
>
> fs_gen = get_generation(inode->i_sb);
> @@ -2491,7 +2490,7 @@ static int map_block_for_writepage(struct inode *inode,
> }
> }
> }
> - kunmap(bh_result->b_page);
> + kunmap_local(p);
>
> if (!retval && buffer_mapped(bh_result) && bh_result->b_blocknr == 0) {
> /*
> --
> 2.35.1
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 7/8] reiserfs: Convert convert_tail_for_hole() to use folios
2022-12-16 20:53 [PATCH 0/8] Convert reiserfs from b_page to b_folio Matthew Wilcox (Oracle)
` (5 preceding siblings ...)
2022-12-16 20:53 ` [PATCH 6/8] reiserfs: Convert map_block_for_writepage() " Matthew Wilcox (Oracle)
@ 2022-12-16 20:53 ` Matthew Wilcox (Oracle)
2022-12-16 20:53 ` [PATCH 8/8] reiserfs: Use flush_dcache_folio() in reiserfs_quota_write() Matthew Wilcox (Oracle)
2022-12-17 20:43 ` [PATCH 0/8] Convert reiserfs from b_page to b_folio Fabio M. De Francesco
8 siblings, 0 replies; 28+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-12-16 20:53 UTC (permalink / raw)
To: reiserfs-devel
Cc: Matthew Wilcox (Oracle), Jan Kara, linux-fsdevel,
Fabio M. De Francesco, Ira Weiny
It's not clear to me that reiserfs will ever support large folios, but
now this function will operate correctly if they are enabled.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
fs/reiserfs/inode.c | 45 +++++++++++++++++++++++----------------------
1 file changed, 23 insertions(+), 22 deletions(-)
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index b79848111957..008855ddb365 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -571,57 +571,58 @@ static int convert_tail_for_hole(struct inode *inode,
unsigned long index;
unsigned long tail_end;
unsigned long tail_start;
- struct page *tail_page;
- struct page *hole_page = bh_result->b_page;
+ struct folio *tail_folio;
+ struct folio *hole_folio = bh_result->b_folio;
int retval = 0;
if ((tail_offset & (bh_result->b_size - 1)) != 1)
return -EIO;
- /* always try to read until the end of the block */
- tail_start = tail_offset & (PAGE_SIZE - 1);
- tail_end = (tail_start | (bh_result->b_size - 1)) + 1;
-
index = tail_offset >> PAGE_SHIFT;
/*
- * hole_page can be zero in case of direct_io, we are sure
+ * hole_folio can be zero in case of direct_io, we are sure
* that we cannot get here if we write with O_DIRECT into tail page
*/
- if (!hole_page || index != hole_page->index) {
- tail_page = grab_cache_page(inode->i_mapping, index);
+ if (!hole_folio || !folio_contains(hole_folio, index)) {
+ tail_folio = __filemap_get_folio(inode->i_mapping, index,
+ FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
+ mapping_gfp_mask(inode->i_mapping));
retval = -ENOMEM;
- if (!tail_page) {
+ if (!tail_folio)
goto out;
- }
} else {
- tail_page = hole_page;
+ tail_folio = hole_folio;
}
+ /* always try to read until the end of the block */
+ tail_start = offset_in_folio(tail_folio, tail_offset);
+ tail_end = (tail_start | (bh_result->b_size - 1)) + 1;
+
/*
* we don't have to make sure the conversion did not happen while
- * we were locking the page because anyone that could convert
+ * we were locking the folio because anyone that could convert
* must first take i_mutex.
*
- * We must fix the tail page for writing because it might have buffers
+ * We must fix the tail folio for writing because it might have buffers
* that are mapped, but have a block number of 0. This indicates tail
- * data that has been read directly into the page, and
+ * data that has been read directly into the folio, and
* __block_write_begin won't trigger a get_block in this case.
*/
- fix_tail_page_for_writing(tail_page);
- retval = __reiserfs_write_begin(tail_page, tail_start,
+ fix_tail_page_for_writing(&tail_folio->page);
+ retval = __reiserfs_write_begin(&tail_folio->page, tail_start,
tail_end - tail_start);
if (retval)
goto unlock;
/* tail conversion might change the data in the page */
- flush_dcache_page(tail_page);
+ flush_dcache_folio(tail_folio);
- retval = reiserfs_commit_write(NULL, tail_page, tail_start, tail_end);
+ retval = reiserfs_commit_write(NULL, &tail_folio->page, tail_start, tail_end);
unlock:
- if (tail_page != hole_page) {
- unlock_page(tail_page);
- put_page(tail_page);
+ if (tail_folio != hole_folio) {
+ folio_unlock(tail_folio);
+ folio_put(tail_folio);
}
out:
return retval;
--
2.35.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 8/8] reiserfs: Use flush_dcache_folio() in reiserfs_quota_write()
2022-12-16 20:53 [PATCH 0/8] Convert reiserfs from b_page to b_folio Matthew Wilcox (Oracle)
` (6 preceding siblings ...)
2022-12-16 20:53 ` [PATCH 7/8] reiserfs: Convert convert_tail_for_hole() to use folios Matthew Wilcox (Oracle)
@ 2022-12-16 20:53 ` Matthew Wilcox (Oracle)
2022-12-17 20:43 ` [PATCH 0/8] Convert reiserfs from b_page to b_folio Fabio M. De Francesco
8 siblings, 0 replies; 28+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-12-16 20:53 UTC (permalink / raw)
To: reiserfs-devel
Cc: Matthew Wilcox (Oracle), Jan Kara, linux-fsdevel,
Fabio M. De Francesco, Ira Weiny
Remove a use of b_page.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
fs/reiserfs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
index 929acce6e731..ab9502381b40 100644
--- a/fs/reiserfs/super.c
+++ b/fs/reiserfs/super.c
@@ -2567,7 +2567,7 @@ static ssize_t reiserfs_quota_write(struct super_block *sb, int type,
}
lock_buffer(bh);
memcpy(bh->b_data + offset, data, tocopy);
- flush_dcache_page(bh->b_page);
+ flush_dcache_folio(bh->b_folio);
set_buffer_uptodate(bh);
unlock_buffer(bh);
reiserfs_write_lock(sb);
--
2.35.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 0/8] Convert reiserfs from b_page to b_folio
2022-12-16 20:53 [PATCH 0/8] Convert reiserfs from b_page to b_folio Matthew Wilcox (Oracle)
` (7 preceding siblings ...)
2022-12-16 20:53 ` [PATCH 8/8] reiserfs: Use flush_dcache_folio() in reiserfs_quota_write() Matthew Wilcox (Oracle)
@ 2022-12-17 20:43 ` Fabio M. De Francesco
2022-12-17 23:39 ` Ira Weiny
8 siblings, 1 reply; 28+ messages in thread
From: Fabio M. De Francesco @ 2022-12-17 20:43 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: reiserfs-devel, Jan Kara, linux-fsdevel, Ira Weiny
On venerdì 16 dicembre 2022 21:53:39 CET Matthew Wilcox (Oracle) wrote:
> These patches apply on top of
> https://lore.kernel.org/linux-fsdevel/20221215214402.3522366-1-willy@infradead
> .org/
>
> The non-trivial ones mostly revolve around uses of kmap()/kmap_atomic(),
> so review from the experts on those would be welcome.
I took a quick look at your conversions and they made me recall that months
ago you converted to kmap_local_folio() a previous conversion from kmap() to
kmap_local_page() in ext2_get_page(): commit 37ce0b319b287 ("ext2: Use a folio
in ext2_get_page()").
So I just saw kmap_local_folio again. Unfortunately, because of my
inexperience, I'm not able to see why we should prefer the use of this
function instead of kmap_local_page().
Can you please tell me why and when we should prefer kmap_local_folio() in
those cases too where kmap_local_page() can work properly? I'm asking because
these days I'm converting other *_get_page() from kmap() (including the series
to fs/ufs that I sent today).
> If these all look
> good to people, I can pass them off to Andrew for the 6.3 merge window.
>
> Running xfstests against reiserfs gives me 313/701 failures before and
> after this set of patches.
It has happened several times to me too. Some patches of mine have failures
from xfstests whose amounts and types don't change with or without my changes.
Several of them have already been merged. I guess that if they don't add
further failures everything is alright.
However, something is broken for sure... xfstests or the filesystems? :-/
Thanks,
Fabio
> I don't have a huge amount of confidence
> that we're really getting good coverage from that test run!
>
> Matthew Wilcox (Oracle) (8):
> reiserfs: use b_folio instead of b_page in some obvious cases
> reiserfs: use kmap_local_folio() in _get_block_create_0()
> reiserfs: Convert direct2indirect() to call folio_zero_range()
> reiserfs: Convert reiserfs_delete_item() to use kmap_local_folio()
> reiserfs: Convert do_journal_end() to use kmap_local_folio()
> reiserfs: Convert map_block_for_writepage() to use kmap_local_folio()
> reiserfs: Convert convert_tail_for_hole() to use folios
> reiserfs: Use flush_dcache_folio() in reiserfs_quota_write()
>
> fs/reiserfs/inode.c | 73 +++++++++++++++++------------------
> fs/reiserfs/journal.c | 12 +++---
> fs/reiserfs/prints.c | 4 +-
> fs/reiserfs/stree.c | 9 +++--
> fs/reiserfs/super.c | 2 +-
> fs/reiserfs/tail_conversion.c | 19 ++++-----
> 6 files changed, 59 insertions(+), 60 deletions(-)
>
> --
> 2.35.1
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 0/8] Convert reiserfs from b_page to b_folio
2022-12-17 20:43 ` [PATCH 0/8] Convert reiserfs from b_page to b_folio Fabio M. De Francesco
@ 2022-12-17 23:39 ` Ira Weiny
2022-12-18 8:09 ` Fabio M. De Francesco
0 siblings, 1 reply; 28+ messages in thread
From: Ira Weiny @ 2022-12-17 23:39 UTC (permalink / raw)
To: Fabio M. De Francesco
Cc: Matthew Wilcox (Oracle), reiserfs-devel, Jan Kara, linux-fsdevel
On Sat, Dec 17, 2022 at 09:43:11PM +0100, Fabio M. De Francesco wrote:
> On venerdì 16 dicembre 2022 21:53:39 CET Matthew Wilcox (Oracle) wrote:
> > These patches apply on top of
> > https://lore.kernel.org/linux-fsdevel/20221215214402.3522366-1-willy@infradead
> > .org/
> >
> > The non-trivial ones mostly revolve around uses of kmap()/kmap_atomic(),
> > so review from the experts on those would be welcome.
>
> I took a quick look at your conversions and they made me recall that months
> ago you converted to kmap_local_folio() a previous conversion from kmap() to
> kmap_local_page() in ext2_get_page(): commit 37ce0b319b287 ("ext2: Use a folio
> in ext2_get_page()").
>
> So I just saw kmap_local_folio again. Unfortunately, because of my
> inexperience, I'm not able to see why we should prefer the use of this
> function instead of kmap_local_page().
>
> Can you please tell me why and when we should prefer kmap_local_folio() in
> those cases too where kmap_local_page() can work properly? I'm asking because
> these days I'm converting other *_get_page() from kmap() (including the series
> to fs/ufs that I sent today).
Fabio kmap_local_folio() works on folios and handles determining which page in
the folio is the correct one to map.
AFAICT (from a quick grep) fs/ufs does not have folio support. I am sure
Mathew would appreciate converting fs/ufs to folios if you have the time and
want to figure it out.
Ira
>
> > If these all look
> > good to people, I can pass them off to Andrew for the 6.3 merge window.
> >
> > Running xfstests against reiserfs gives me 313/701 failures before and
> > after this set of patches.
>
> It has happened several times to me too. Some patches of mine have failures
> from xfstests whose amounts and types don't change with or without my changes.
>
> Several of them have already been merged. I guess that if they don't add
> further failures everything is alright.
>
> However, something is broken for sure... xfstests or the filesystems? :-/
>
> Thanks,
>
> Fabio
>
> > I don't have a huge amount of confidence
> > that we're really getting good coverage from that test run!
> >
> > Matthew Wilcox (Oracle) (8):
> > reiserfs: use b_folio instead of b_page in some obvious cases
> > reiserfs: use kmap_local_folio() in _get_block_create_0()
> > reiserfs: Convert direct2indirect() to call folio_zero_range()
> > reiserfs: Convert reiserfs_delete_item() to use kmap_local_folio()
> > reiserfs: Convert do_journal_end() to use kmap_local_folio()
> > reiserfs: Convert map_block_for_writepage() to use kmap_local_folio()
> > reiserfs: Convert convert_tail_for_hole() to use folios
> > reiserfs: Use flush_dcache_folio() in reiserfs_quota_write()
> >
> > fs/reiserfs/inode.c | 73 +++++++++++++++++------------------
> > fs/reiserfs/journal.c | 12 +++---
> > fs/reiserfs/prints.c | 4 +-
> > fs/reiserfs/stree.c | 9 +++--
> > fs/reiserfs/super.c | 2 +-
> > fs/reiserfs/tail_conversion.c | 19 ++++-----
> > 6 files changed, 59 insertions(+), 60 deletions(-)
> >
> > --
> > 2.35.1
>
>
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 0/8] Convert reiserfs from b_page to b_folio
2022-12-17 23:39 ` Ira Weiny
@ 2022-12-18 8:09 ` Fabio M. De Francesco
2022-12-18 17:59 ` Matthew Wilcox
0 siblings, 1 reply; 28+ messages in thread
From: Fabio M. De Francesco @ 2022-12-18 8:09 UTC (permalink / raw)
To: Ira Weiny, Matthew Wilcox (Oracle)
Cc: reiserfs-devel, Jan Kara, linux-fsdevel
On Sun, 18 Dec 2022, 00:40 Ira Weiny, <ira.weiny@intel.com> wrote:
On Sat, Dec 17, 2022 at 09:43:11PM +0100, Fabio M. De Francesco wrote:
> On venerdì 16 dicembre 2022 21:53:39 CET Matthew Wilcox (Oracle) wrote:
> > > These patches apply on top of
> > > https://lore.kernel.org/linux-fsdevel/20221215214402.3522366-1-willy@infradead
> > > .org/
> > >
> > > The non-trivial ones mostly revolve around uses of kmap()/kmap_atomic(),
> > > so review from the experts on those would be welcome.
> >
> > I took a quick look at your conversions and they made me recall that
> > months ago you converted to kmap_local_folio() a previous conversion from
> > kmap() to kmap_local_page() in ext2_get_page(): commit 37ce0b319b287
> > ("ext2: Use a folio in ext2_get_page()").
> >
> > So I just saw kmap_local_folio() again. Unfortunately, because of my
> > inexperience, I'm not able to understand on my own why we should prefer
> > the use of this function instead of kmap_local_page().
> >
> > Can you please tell me why and when we should prefer kmap_local_folio() in
> > those cases too where kmap_local_page() can work properly? I'm asking
> > because these days I'm converting other *_get_page() from kmap()
> > (including the series to fs/ufs that I sent today).
> Fabio kmap_local_folio() works on folios and handles determining which page
> in the folio is the correct one to map.
Ira, I understand that pages are parts of folios and that, for mapping, we
need to determine the right page to map. Correct?
I think that I was not able to ask my question clearly. Please, let me rework
with other words how I went to my question and reword the question itself...
It all started when months ago I saw a patch from Matthew about the conversion
from kmap_local_page() to kmap_local_folio() in ext2_get_page().
Here I wanted to comment on the xfstests failures but, when I read patch 2/8
of this series and saw kmap() converted to kmap_local_folio(), I thought to
also use this opportunity to ask about why and when kmap_local_folio() should
be preferred over kmap_local_page().
Obviously, I have nothing against these conversions. I would only like to
understand what are the reasons behind the preference for the folio function.
Mine is a general question about design, necessity, opportunity: what were the
reasons why, in the above-mentioned cases, the use of kmap_local_folio() has
been preferred over kmap_local_page()?
I saw that this series is about converting from b_page to b_folio, therefore
kmap_local_folio() is the obvious choice here.
But my mind went back again to ext2_get_page :-)
It looks to me that ext2_get_page() was working properly with
kmap_local_page() (since you made the conversion from kmap()). Therefore I
could not understand why it is preferred to call read_mapping_folio() to get a
folio and then map a page of that folio with kmap_local_folio().
I used to think that read_mapping_page() + kmap_local_page() was all we
needed. ATM I have not enough knowledge of VFS/filesystems to understand on my
own what we gain from the other way to local map pages.
I hope to having been clearer this time...
Can you and/or Matthew please say some words about this?
> AFAICT (from a quick grep) fs/ufs does not have folio support.
I was not specifically talking about the fs/ufs conversion. Other conversions
are in my queue (e.g., fs/sysv is next according to Al's suggestions, and in
January others will be added to the same queue).
> I am sure Mathew would appreciate converting fs/ufs to folios if you have
> the time and want to figure it out.
About a year ago Matthew provided me with precious help when I was converting
a Unisys driver from IDR to XArray, so I guess he would be helpful with this
task too :-)
I'd really like to work on converting fs/ufs to folios but you know that I'll
have enough time to work on other projects only starting by the end of
January.
AFAIK this task has mainly got to do with the conversions of the address space
operations (correct?). I know too little to be able to estimate how much time
it takes but I'm pretty sure it needs more than I currently can set aside.
Instead I could easily devolve the time it is needed for making the
memcpy_{to|from}_folio() helpers you talked about in a patch of this series,
unless you or Matthew prefer to do yourselves. Please let me know.
Thanks,
Fabio
> Ira
>
> > If these all look
> > good to people, I can pass them off to Andrew for the 6.3 merge window.
> >
> > Running xfstests against reiserfs gives me 313/701 failures before and
> > after this set of patches.
>
> It has happened several times to me too. Some patches of mine have failures
> from xfstests whose amounts and types don't change with or without my
changes.
>
> Several of them have already been merged. I guess that if they don't add
> further failures everything is alright.
>
> However, something is broken for sure... xfstests or the filesystems? :-/
>
> Thanks,
>
> Fabio
>
> > I don't have a huge amount of confidence
> > that we're really getting good coverage from that test run!
> >
> > Matthew Wilcox (Oracle) (8):
> > reiserfs: use b_folio instead of b_page in some obvious cases
> > reiserfs: use kmap_local_folio() in _get_block_create_0()
> > reiserfs: Convert direct2indirect() to call folio_zero_range()
> > reiserfs: Convert reiserfs_delete_item() to use kmap_local_folio()
> > reiserfs: Convert do_journal_end() to use kmap_local_folio()
> > reiserfs: Convert map_block_for_writepage() to use kmap_local_folio()
> > reiserfs: Convert convert_tail_for_hole() to use folios
> > reiserfs: Use flush_dcache_folio() in reiserfs_quota_write()
> >
> > fs/reiserfs/inode.c | 73 +++++++++++++++++------------------
> > fs/reiserfs/journal.c | 12 +++---
> > fs/reiserfs/prints.c | 4 +-
> > fs/reiserfs/stree.c | 9 +++--
> > fs/reiserfs/super.c | 2 +-
> > fs/reiserfs/tail_conversion.c | 19 ++++-----
> > 6 files changed, 59 insertions(+), 60 deletions(-)
> >
> > --
> > 2.35.1
>
>
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 0/8] Convert reiserfs from b_page to b_folio
2022-12-18 8:09 ` Fabio M. De Francesco
@ 2022-12-18 17:59 ` Matthew Wilcox
0 siblings, 0 replies; 28+ messages in thread
From: Matthew Wilcox @ 2022-12-18 17:59 UTC (permalink / raw)
To: Fabio M. De Francesco; +Cc: Ira Weiny, reiserfs-devel, Jan Kara, linux-fsdevel
On Sun, Dec 18, 2022 at 09:09:56AM +0100, Fabio M. De Francesco wrote:
> It all started when months ago I saw a patch from Matthew about the conversion
> from kmap_local_page() to kmap_local_folio() in ext2_get_page().
>
> Here I wanted to comment on the xfstests failures but, when I read patch 2/8
> of this series and saw kmap() converted to kmap_local_folio(), I thought to
> also use this opportunity to ask about why and when kmap_local_folio() should
> be preferred over kmap_local_page().
>
> Obviously, I have nothing against these conversions. I would only like to
> understand what are the reasons behind the preference for the folio function.
I should probably update this, but here's a good start on folio vs page:
https://lore.kernel.org/linux-fsdevel/YvV1KTyzZ+Jrtj9x@casper.infradead.org/
> Mine is a general question about design, necessity, opportunity: what were the
> reasons why, in the above-mentioned cases, the use of kmap_local_folio() has
> been preferred over kmap_local_page()?
>
> I saw that this series is about converting from b_page to b_folio, therefore
> kmap_local_folio() is the obvious choice here.
>
> But my mind went back again to ext2_get_page :-)
>
> It looks to me that ext2_get_page() was working properly with
> kmap_local_page() (since you made the conversion from kmap()). Therefore I
> could not understand why it is preferred to call read_mapping_folio() to get a
> folio and then map a page of that folio with kmap_local_folio().
>
> I used to think that read_mapping_page() + kmap_local_page() was all we
> needed. ATM I have not enough knowledge of VFS/filesystems to understand on my
> own what we gain from the other way to local map pages.
read_mapping_page() is scheduled for removal. All callers need to be
converted to read_mapping_folio() (or another variant if preferable).
I don't mind following along behind your conversions to kmap_local_page()
and converting them to kmap_local_folio(), but if I can go straight from
kmap() / kmap_atomic() to kmap_local_folio(), then I'll do that.
Eventually, kmap_local_page() will _probably_ disappear. ext2_get_page()
is really only partially converted, and you can see that by the way it
calls ext2_check_page() instead of ext2_check_folio(). I have a design
in place for ext2_check_folio() that handles mapping large folios,
but it isn't on the top of my pile right now.
> I'd really like to work on converting fs/ufs to folios but you know that I'll
> have enough time to work on other projects only starting by the end of
> January.
>
> AFAIK this task has mainly got to do with the conversions of the address space
> operations (correct?). I know too little to be able to estimate how much time
> it takes but I'm pretty sure it needs more than I currently can set aside.
>
> Instead I could easily devolve the time it is needed for making the
> memcpy_{to|from}_folio() helpers you talked about in a patch of this series,
> unless you or Matthew prefer to do yourselves. Please let me know.
I've been wondering about a memcpy_(to|from)_folio() helper too, but I
haven't read Ira's message about it yet. I'll comment over there.
^ permalink raw reply [flat|nested] 28+ messages in thread