* [PATCH v2 0/7] More buffer_head cleanups
@ 2023-11-09 21:06 Matthew Wilcox (Oracle)
2023-11-09 21:06 ` [PATCH v2 1/7] buffer: Return bool from grow_dev_folio() Matthew Wilcox (Oracle)
` (6 more replies)
0 siblings, 7 replies; 26+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-11-09 21:06 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox (Oracle), Hannes Reinecke, Luis Chamberlain,
Pankaj Raghav, linux-fsdevel
The first patch is a left-over from last cycle. The rest fix "obvious"
block size > PAGE_SIZE problems. I haven't tested with a large block
size setup (but I have done an ext4 xfstests run).
v2:
- Pick up R-b tags from Pankaj
- Use div_u64() instead of a raw divide. Thanks, i386.
- Add a patch for __block_write_begin_int(), spotted by Ryusuke Konishi
- That inspired the seventh patch which finally eliminates block_size_bits().
Matthew Wilcox (Oracle) (7):
buffer: Return bool from grow_dev_folio()
buffer: Calculate block number inside folio_init_buffers()
buffer: Fix grow_buffers() for block size > PAGE_SIZE
buffer: Cast block to loff_t before shifting it
buffer: Fix various functions for block size > PAGE_SIZE
buffer: Handle large folios in __block_write_begin_int()
buffer: Fix more functions for block size > PAGE_SIZE
fs/buffer.c | 130 +++++++++++++++++++++-------------------------------
1 file changed, 53 insertions(+), 77 deletions(-)
--
2.42.0
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH v2 1/7] buffer: Return bool from grow_dev_folio()
2023-11-09 21:06 [PATCH v2 0/7] More buffer_head cleanups Matthew Wilcox (Oracle)
@ 2023-11-09 21:06 ` Matthew Wilcox (Oracle)
2023-11-10 3:43 ` Ryusuke Konishi
2023-11-09 21:06 ` [PATCH v2 2/7] buffer: Calculate block number inside folio_init_buffers() Matthew Wilcox (Oracle)
` (5 subsequent siblings)
6 siblings, 1 reply; 26+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-11-09 21:06 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox (Oracle), Hannes Reinecke, Luis Chamberlain,
Pankaj Raghav, linux-fsdevel
Rename grow_dev_page() to grow_dev_folio() and make it return a bool.
Document what that bool means; it's more subtle than it first appears.
Also rename the 'failed' label to 'unlock' beacuse it's not exactly
'failed'. It just hasn't succeeded.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
fs/buffer.c | 50 +++++++++++++++++++++++++-------------------------
1 file changed, 25 insertions(+), 25 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 967f34b70aa8..8dad6c691e14 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1024,40 +1024,43 @@ static sector_t folio_init_buffers(struct folio *folio,
}
/*
- * Create the page-cache page that contains the requested block.
+ * Create the page-cache folio that contains the requested block.
*
* This is used purely for blockdev mappings.
+ *
+ * Returns false if we have a 'permanent' failure. Returns true if
+ * we succeeded, or the caller should retry.
*/
-static int
-grow_dev_page(struct block_device *bdev, sector_t block,
- pgoff_t index, int size, int sizebits, gfp_t gfp)
+static bool grow_dev_folio(struct block_device *bdev, sector_t block,
+ pgoff_t index, unsigned size, int sizebits, gfp_t gfp)
{
struct inode *inode = bdev->bd_inode;
struct folio *folio;
struct buffer_head *bh;
- sector_t end_block;
- int ret = 0;
+ sector_t end_block = 0;
folio = __filemap_get_folio(inode->i_mapping, index,
FGP_LOCK | FGP_ACCESSED | FGP_CREAT, gfp);
if (IS_ERR(folio))
- return PTR_ERR(folio);
+ return false;
bh = folio_buffers(folio);
if (bh) {
if (bh->b_size == size) {
end_block = folio_init_buffers(folio, bdev,
(sector_t)index << sizebits, size);
- goto done;
+ goto unlock;
}
+
+ /* Caller should retry if this call fails */
+ end_block = ~0ULL;
if (!try_to_free_buffers(folio))
- goto failed;
+ goto unlock;
}
- ret = -ENOMEM;
bh = folio_alloc_buffers(folio, size, gfp | __GFP_ACCOUNT);
if (!bh)
- goto failed;
+ goto unlock;
/*
* Link the folio to the buffers and initialise them. Take the
@@ -1069,20 +1072,19 @@ grow_dev_page(struct block_device *bdev, sector_t block,
end_block = folio_init_buffers(folio, bdev,
(sector_t)index << sizebits, size);
spin_unlock(&inode->i_mapping->private_lock);
-done:
- ret = (block < end_block) ? 1 : -ENXIO;
-failed:
+unlock:
folio_unlock(folio);
folio_put(folio);
- return ret;
+ return block < end_block;
}
/*
- * Create buffers for the specified block device block's page. If
- * that page was dirty, the buffers are set dirty also.
+ * Create buffers for the specified block device block's folio. If
+ * that folio was dirty, the buffers are set dirty also. Returns false
+ * if we've hit a permanent error.
*/
-static int
-grow_buffers(struct block_device *bdev, sector_t block, int size, gfp_t gfp)
+static bool grow_buffers(struct block_device *bdev, sector_t block,
+ unsigned size, gfp_t gfp)
{
pgoff_t index;
int sizebits;
@@ -1099,11 +1101,11 @@ grow_buffers(struct block_device *bdev, sector_t block, int size, gfp_t gfp)
"device %pg\n",
__func__, (unsigned long long)block,
bdev);
- return -EIO;
+ return false;
}
- /* Create a page with the proper size buffers.. */
- return grow_dev_page(bdev, block, index, size, sizebits, gfp);
+ /* Create a folio with the proper size buffers */
+ return grow_dev_folio(bdev, block, index, size, sizebits, gfp);
}
static struct buffer_head *
@@ -1124,14 +1126,12 @@ __getblk_slow(struct block_device *bdev, sector_t block,
for (;;) {
struct buffer_head *bh;
- int ret;
bh = __find_get_block(bdev, block, size);
if (bh)
return bh;
- ret = grow_buffers(bdev, block, size, gfp);
- if (ret < 0)
+ if (!grow_buffers(bdev, block, size, gfp))
return NULL;
}
}
--
2.42.0
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v2 2/7] buffer: Calculate block number inside folio_init_buffers()
2023-11-09 21:06 [PATCH v2 0/7] More buffer_head cleanups Matthew Wilcox (Oracle)
2023-11-09 21:06 ` [PATCH v2 1/7] buffer: Return bool from grow_dev_folio() Matthew Wilcox (Oracle)
@ 2023-11-09 21:06 ` Matthew Wilcox (Oracle)
2023-11-09 21:06 ` [PATCH v2 3/7] buffer: Fix grow_buffers() for block size > PAGE_SIZE Matthew Wilcox (Oracle)
` (4 subsequent siblings)
6 siblings, 0 replies; 26+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-11-09 21:06 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox (Oracle), Hannes Reinecke, Luis Chamberlain,
Pankaj Raghav, linux-fsdevel
The calculation of block from index doesn't work for devices with a block
size larger than PAGE_SIZE as we end up shifting by a negative number.
Instead, calculate the number of the first block from the folio's
position in the block device. We no longer need to pass sizebits to
grow_dev_folio().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
---
fs/buffer.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 8dad6c691e14..44e0c0b7f71f 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -995,11 +995,12 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
* Initialise the state of a blockdev folio's buffers.
*/
static sector_t folio_init_buffers(struct folio *folio,
- struct block_device *bdev, sector_t block, int size)
+ struct block_device *bdev, unsigned size)
{
struct buffer_head *head = folio_buffers(folio);
struct buffer_head *bh = head;
bool uptodate = folio_test_uptodate(folio);
+ sector_t block = div_u64(folio_pos(folio), size);
sector_t end_block = blkdev_max_block(bdev, size);
do {
@@ -1032,7 +1033,7 @@ static sector_t folio_init_buffers(struct folio *folio,
* we succeeded, or the caller should retry.
*/
static bool grow_dev_folio(struct block_device *bdev, sector_t block,
- pgoff_t index, unsigned size, int sizebits, gfp_t gfp)
+ pgoff_t index, unsigned size, gfp_t gfp)
{
struct inode *inode = bdev->bd_inode;
struct folio *folio;
@@ -1047,8 +1048,7 @@ static bool grow_dev_folio(struct block_device *bdev, sector_t block,
bh = folio_buffers(folio);
if (bh) {
if (bh->b_size == size) {
- end_block = folio_init_buffers(folio, bdev,
- (sector_t)index << sizebits, size);
+ end_block = folio_init_buffers(folio, bdev, size);
goto unlock;
}
@@ -1069,8 +1069,7 @@ static bool grow_dev_folio(struct block_device *bdev, sector_t block,
*/
spin_lock(&inode->i_mapping->private_lock);
link_dev_buffers(folio, bh);
- end_block = folio_init_buffers(folio, bdev,
- (sector_t)index << sizebits, size);
+ end_block = folio_init_buffers(folio, bdev, size);
spin_unlock(&inode->i_mapping->private_lock);
unlock:
folio_unlock(folio);
@@ -1105,7 +1104,7 @@ static bool grow_buffers(struct block_device *bdev, sector_t block,
}
/* Create a folio with the proper size buffers */
- return grow_dev_folio(bdev, block, index, size, sizebits, gfp);
+ return grow_dev_folio(bdev, block, index, size, gfp);
}
static struct buffer_head *
--
2.42.0
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v2 3/7] buffer: Fix grow_buffers() for block size > PAGE_SIZE
2023-11-09 21:06 [PATCH v2 0/7] More buffer_head cleanups Matthew Wilcox (Oracle)
2023-11-09 21:06 ` [PATCH v2 1/7] buffer: Return bool from grow_dev_folio() Matthew Wilcox (Oracle)
2023-11-09 21:06 ` [PATCH v2 2/7] buffer: Calculate block number inside folio_init_buffers() Matthew Wilcox (Oracle)
@ 2023-11-09 21:06 ` Matthew Wilcox (Oracle)
2023-11-10 6:37 ` Ryusuke Konishi
2023-11-12 4:52 ` kernel test robot
2023-11-09 21:06 ` [PATCH v2 4/7] buffer: Cast block to loff_t before shifting it Matthew Wilcox (Oracle)
` (3 subsequent siblings)
6 siblings, 2 replies; 26+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-11-09 21:06 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox (Oracle), Hannes Reinecke, Luis Chamberlain,
Pankaj Raghav, linux-fsdevel
We must not shift by a negative number so work in terms of a byte
offset to avoid the awkward shift left-or-right-depending-on-sign
option. This means we need to use check_mul_overflow() to ensure
that a large block number does not result in a wrap.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
fs/buffer.c | 17 ++++++-----------
1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 44e0c0b7f71f..9c3f49cf8d28 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1085,26 +1085,21 @@ static bool grow_dev_folio(struct block_device *bdev, sector_t block,
static bool grow_buffers(struct block_device *bdev, sector_t block,
unsigned size, gfp_t gfp)
{
- pgoff_t index;
- int sizebits;
-
- sizebits = PAGE_SHIFT - __ffs(size);
- index = block >> sizebits;
+ loff_t pos;
/*
- * Check for a block which wants to lie outside our maximum possible
- * pagecache index. (this comparison is done using sector_t types).
+ * Check for a block which lies outside our maximum possible
+ * pagecache index.
*/
- if (unlikely(index != block >> sizebits)) {
- printk(KERN_ERR "%s: requested out-of-range block %llu for "
- "device %pg\n",
+ if (check_mul_overflow(block, size, &pos) || pos > MAX_LFS_FILESIZE) {
+ printk(KERN_ERR "%s: requested out-of-range block %llu for device %pg\n",
__func__, (unsigned long long)block,
bdev);
return false;
}
/* Create a folio with the proper size buffers */
- return grow_dev_folio(bdev, block, index, size, gfp);
+ return grow_dev_folio(bdev, block, pos / PAGE_SIZE, size, gfp);
}
static struct buffer_head *
--
2.42.0
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v2 4/7] buffer: Cast block to loff_t before shifting it
2023-11-09 21:06 [PATCH v2 0/7] More buffer_head cleanups Matthew Wilcox (Oracle)
` (2 preceding siblings ...)
2023-11-09 21:06 ` [PATCH v2 3/7] buffer: Fix grow_buffers() for block size > PAGE_SIZE Matthew Wilcox (Oracle)
@ 2023-11-09 21:06 ` Matthew Wilcox (Oracle)
2023-11-09 21:06 ` [PATCH v2 5/7] buffer: Fix various functions for block size > PAGE_SIZE Matthew Wilcox (Oracle)
` (2 subsequent siblings)
6 siblings, 0 replies; 26+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-11-09 21:06 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox (Oracle), Hannes Reinecke, Luis Chamberlain,
Pankaj Raghav, linux-fsdevel
While sector_t is always defined as a u64 today, that hasn't always been
the case and it might not always be the same size as loff_t in the future.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
fs/buffer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 9c3f49cf8d28..ab345fd4da12 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2008,7 +2008,7 @@ static int
iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
const struct iomap *iomap)
{
- loff_t offset = block << inode->i_blkbits;
+ loff_t offset = (loff_t)block << inode->i_blkbits;
bh->b_bdev = iomap->bdev;
--
2.42.0
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v2 5/7] buffer: Fix various functions for block size > PAGE_SIZE
2023-11-09 21:06 [PATCH v2 0/7] More buffer_head cleanups Matthew Wilcox (Oracle)
` (3 preceding siblings ...)
2023-11-09 21:06 ` [PATCH v2 4/7] buffer: Cast block to loff_t before shifting it Matthew Wilcox (Oracle)
@ 2023-11-09 21:06 ` Matthew Wilcox (Oracle)
2023-11-10 7:48 ` Ryusuke Konishi
2023-11-09 21:06 ` [PATCH v2 6/7] buffer: Handle large folios in __block_write_begin_int() Matthew Wilcox (Oracle)
2023-11-09 21:06 ` [PATCH v2 7/7] buffer: Fix more functions for block size > PAGE_SIZE Matthew Wilcox (Oracle)
6 siblings, 1 reply; 26+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-11-09 21:06 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox (Oracle), Hannes Reinecke, Luis Chamberlain,
Pankaj Raghav, linux-fsdevel
If i_blkbits is larger than PAGE_SHIFT, we shift by a negative number,
which is undefined. It is safe to shift the block left as a block
device must be smaller than MAX_LFS_FILESIZE, which is guaranteed to
fit in loff_t.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
---
fs/buffer.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index ab345fd4da12..faf1916200c2 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -199,7 +199,7 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
int all_mapped = 1;
static DEFINE_RATELIMIT_STATE(last_warned, HZ, 1);
- index = block >> (PAGE_SHIFT - bd_inode->i_blkbits);
+ index = ((loff_t)block << bd_inode->i_blkbits) / PAGE_SIZE;
folio = __filemap_get_folio(bd_mapping, index, FGP_ACCESSED, 0);
if (IS_ERR(folio))
goto out;
@@ -1693,13 +1693,13 @@ void clean_bdev_aliases(struct block_device *bdev, sector_t block, sector_t len)
struct inode *bd_inode = bdev->bd_inode;
struct address_space *bd_mapping = bd_inode->i_mapping;
struct folio_batch fbatch;
- pgoff_t index = block >> (PAGE_SHIFT - bd_inode->i_blkbits);
+ pgoff_t index = ((loff_t)block << bd_inode->i_blkbits) / PAGE_SIZE;
pgoff_t end;
int i, count;
struct buffer_head *bh;
struct buffer_head *head;
- end = (block + len - 1) >> (PAGE_SHIFT - bd_inode->i_blkbits);
+ end = ((loff_t)(block + len - 1) << bd_inode->i_blkbits) / PAGE_SIZE;
folio_batch_init(&fbatch);
while (filemap_get_folios(bd_mapping, &index, end, &fbatch)) {
count = folio_batch_count(&fbatch);
@@ -2660,8 +2660,8 @@ int block_truncate_page(struct address_space *mapping,
return 0;
length = blocksize - length;
- iblock = (sector_t)index << (PAGE_SHIFT - inode->i_blkbits);
-
+ iblock = ((loff_t)index * PAGE_SIZE) >> inode->i_blkbits;
+
folio = filemap_grab_folio(mapping, index);
if (IS_ERR(folio))
return PTR_ERR(folio);
--
2.42.0
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v2 6/7] buffer: Handle large folios in __block_write_begin_int()
2023-11-09 21:06 [PATCH v2 0/7] More buffer_head cleanups Matthew Wilcox (Oracle)
` (4 preceding siblings ...)
2023-11-09 21:06 ` [PATCH v2 5/7] buffer: Fix various functions for block size > PAGE_SIZE Matthew Wilcox (Oracle)
@ 2023-11-09 21:06 ` Matthew Wilcox (Oracle)
2023-11-09 21:06 ` [PATCH v2 7/7] buffer: Fix more functions for block size > PAGE_SIZE Matthew Wilcox (Oracle)
6 siblings, 0 replies; 26+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-11-09 21:06 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox (Oracle), Hannes Reinecke, Luis Chamberlain,
Pankaj Raghav, linux-fsdevel, Ryusuke Konishi
When __block_write_begin_int() was converted to support folios, we
did not expect large folios to be passed to it. With the current
work to support large block size storage devices, this will no longer
be true so change the checks on 'from' and 'to' to be related to the
size of the folio instead of PAGE_SIZE. Also remove an assumption that
the block size is smaller than PAGE_SIZE.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
---
fs/buffer.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index faf1916200c2..ef444ab53a9b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2075,27 +2075,24 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
int __block_write_begin_int(struct folio *folio, loff_t pos, unsigned len,
get_block_t *get_block, const struct iomap *iomap)
{
- unsigned from = pos & (PAGE_SIZE - 1);
- unsigned to = from + len;
+ size_t from = offset_in_folio(folio, pos);
+ size_t to = from + len;
struct inode *inode = folio->mapping->host;
- unsigned block_start, block_end;
+ size_t block_start, block_end;
sector_t block;
int err = 0;
- unsigned blocksize, bbits;
+ size_t blocksize;
struct buffer_head *bh, *head, *wait[2], **wait_bh=wait;
BUG_ON(!folio_test_locked(folio));
- BUG_ON(from > PAGE_SIZE);
- BUG_ON(to > PAGE_SIZE);
+ BUG_ON(to > folio_size(folio));
BUG_ON(from > to);
head = folio_create_buffers(folio, inode, 0);
blocksize = head->b_size;
- bbits = block_size_bits(blocksize);
-
- block = (sector_t)folio->index << (PAGE_SHIFT - bbits);
+ block = div_u64(folio_pos(folio), blocksize);
- for(bh = head, block_start = 0; bh != head || !block_start;
+ for (bh = head, block_start = 0; bh != head || !block_start;
block++, block_start=block_end, bh = bh->b_this_page) {
block_end = block_start + blocksize;
if (block_end <= from || block_start >= to) {
--
2.42.0
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v2 7/7] buffer: Fix more functions for block size > PAGE_SIZE
2023-11-09 21:06 [PATCH v2 0/7] More buffer_head cleanups Matthew Wilcox (Oracle)
` (5 preceding siblings ...)
2023-11-09 21:06 ` [PATCH v2 6/7] buffer: Handle large folios in __block_write_begin_int() Matthew Wilcox (Oracle)
@ 2023-11-09 21:06 ` Matthew Wilcox (Oracle)
2023-11-10 4:50 ` Eric Biggers
6 siblings, 1 reply; 26+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-11-09 21:06 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox (Oracle), Hannes Reinecke, Luis Chamberlain,
Pankaj Raghav, linux-fsdevel
Both __block_write_full_folio() and block_read_full_folio() assumed
that block size <= PAGE_SIZE. Replace the shift with a divide, which
is probably cheaper than first calculating the shift. That lets us
remove block_size_bits() as these were the last callers.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
fs/buffer.c | 27 ++++++---------------------
1 file changed, 6 insertions(+), 21 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index ef444ab53a9b..4eb44ccdc6be 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1742,19 +1742,6 @@ void clean_bdev_aliases(struct block_device *bdev, sector_t block, sector_t len)
}
EXPORT_SYMBOL(clean_bdev_aliases);
-/*
- * Size is a power-of-two in the range 512..PAGE_SIZE,
- * and the case we care about most is PAGE_SIZE.
- *
- * So this *could* possibly be written with those
- * constraints in mind (relevant mostly if some
- * architecture has a slow bit-scan instruction)
- */
-static inline int block_size_bits(unsigned int blocksize)
-{
- return ilog2(blocksize);
-}
-
static struct buffer_head *folio_create_buffers(struct folio *folio,
struct inode *inode,
unsigned int b_state)
@@ -1807,7 +1794,7 @@ int __block_write_full_folio(struct inode *inode, struct folio *folio,
sector_t block;
sector_t last_block;
struct buffer_head *bh, *head;
- unsigned int blocksize, bbits;
+ size_t blocksize;
int nr_underway = 0;
blk_opf_t write_flags = wbc_to_write_flags(wbc);
@@ -1826,10 +1813,9 @@ int __block_write_full_folio(struct inode *inode, struct folio *folio,
bh = head;
blocksize = bh->b_size;
- bbits = block_size_bits(blocksize);
- block = (sector_t)folio->index << (PAGE_SHIFT - bbits);
- last_block = (i_size_read(inode) - 1) >> bbits;
+ block = div_u64(folio_pos(folio), blocksize);
+ last_block = div_u64(i_size_read(inode) - 1, blocksize);
/*
* Get all the dirty buffers mapped to disk addresses and
@@ -2355,7 +2341,7 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
struct inode *inode = folio->mapping->host;
sector_t iblock, lblock;
struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
- unsigned int blocksize, bbits;
+ size_t blocksize;
int nr, i;
int fully_mapped = 1;
bool page_error = false;
@@ -2369,10 +2355,9 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
head = folio_create_buffers(folio, inode, 0);
blocksize = head->b_size;
- bbits = block_size_bits(blocksize);
- iblock = (sector_t)folio->index << (PAGE_SHIFT - bbits);
- lblock = (limit+blocksize-1) >> bbits;
+ iblock = div_u64(folio_pos(folio), blocksize);
+ lblock = div_u64(limit + blocksize - 1, blocksize);
bh = head;
nr = 0;
i = 0;
--
2.42.0
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH v2 1/7] buffer: Return bool from grow_dev_folio()
2023-11-09 21:06 ` [PATCH v2 1/7] buffer: Return bool from grow_dev_folio() Matthew Wilcox (Oracle)
@ 2023-11-10 3:43 ` Ryusuke Konishi
2023-12-30 9:23 ` Matthew Wilcox
0 siblings, 1 reply; 26+ messages in thread
From: Ryusuke Konishi @ 2023-11-10 3:43 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: Andrew Morton, Hannes Reinecke, Luis Chamberlain, Pankaj Raghav,
linux-fsdevel
On Fri, Nov 10, 2023 at 6:07 AM Matthew Wilcox (Oracle) wrote:
>
> Rename grow_dev_page() to grow_dev_folio() and make it return a bool.
> Document what that bool means; it's more subtle than it first appears.
> Also rename the 'failed' label to 'unlock' beacuse it's not exactly
> 'failed'. It just hasn't succeeded.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> fs/buffer.c | 50 +++++++++++++++++++++++++-------------------------
> 1 file changed, 25 insertions(+), 25 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 967f34b70aa8..8dad6c691e14 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1024,40 +1024,43 @@ static sector_t folio_init_buffers(struct folio *folio,
> }
>
> /*
> - * Create the page-cache page that contains the requested block.
> + * Create the page-cache folio that contains the requested block.
> *
> * This is used purely for blockdev mappings.
> + *
> + * Returns false if we have a 'permanent' failure. Returns true if
> + * we succeeded, or the caller should retry.
> */
> -static int
> -grow_dev_page(struct block_device *bdev, sector_t block,
> - pgoff_t index, int size, int sizebits, gfp_t gfp)
> +static bool grow_dev_folio(struct block_device *bdev, sector_t block,
> + pgoff_t index, unsigned size, int sizebits, gfp_t gfp)
> {
> struct inode *inode = bdev->bd_inode;
> struct folio *folio;
> struct buffer_head *bh;
> - sector_t end_block;
> - int ret = 0;
> + sector_t end_block = 0;
>
> folio = __filemap_get_folio(inode->i_mapping, index,
> FGP_LOCK | FGP_ACCESSED | FGP_CREAT, gfp);
> if (IS_ERR(folio))
> - return PTR_ERR(folio);
> + return false;
>
> bh = folio_buffers(folio);
> if (bh) {
> if (bh->b_size == size) {
> end_block = folio_init_buffers(folio, bdev,
> (sector_t)index << sizebits, size);
> - goto done;
> + goto unlock;
> }
> +
> + /* Caller should retry if this call fails */
> + end_block = ~0ULL;
> if (!try_to_free_buffers(folio))
> - goto failed;
> + goto unlock;
> }
>
> - ret = -ENOMEM;
> bh = folio_alloc_buffers(folio, size, gfp | __GFP_ACCOUNT);
> if (!bh)
> - goto failed;
> + goto unlock;
Regarding this folio_alloc_buffers() error path,
If folio_buffers() was NULL, here end_block is 0, so this function
returns false (which means "have a permanent failure").
But, if folio_buffers() existed and they were freed with
try_to_free_buffers() because of bh->b_size != size, here end_block
has been set to ~0ULL, so it seems to return true ("succeeded").
Does this semantic change match your intent?
Otherwise, I think end_block should be set to 0 just before calling
folio_alloc_buffers().
Regards,
Ryusuke Konishi
>
> /*
> * Link the folio to the buffers and initialise them. Take the
> @@ -1069,20 +1072,19 @@ grow_dev_page(struct block_device *bdev, sector_t block,
> end_block = folio_init_buffers(folio, bdev,
> (sector_t)index << sizebits, size);
> spin_unlock(&inode->i_mapping->private_lock);
> -done:
> - ret = (block < end_block) ? 1 : -ENXIO;
> -failed:
> +unlock:
> folio_unlock(folio);
> folio_put(folio);
> - return ret;
> + return block < end_block;
> }
>
> /*
> - * Create buffers for the specified block device block's page. If
> - * that page was dirty, the buffers are set dirty also.
> + * Create buffers for the specified block device block's folio. If
> + * that folio was dirty, the buffers are set dirty also. Returns false
> + * if we've hit a permanent error.
> */
> -static int
> -grow_buffers(struct block_device *bdev, sector_t block, int size, gfp_t gfp)
> +static bool grow_buffers(struct block_device *bdev, sector_t block,
> + unsigned size, gfp_t gfp)
> {
> pgoff_t index;
> int sizebits;
> @@ -1099,11 +1101,11 @@ grow_buffers(struct block_device *bdev, sector_t block, int size, gfp_t gfp)
> "device %pg\n",
> __func__, (unsigned long long)block,
> bdev);
> - return -EIO;
> + return false;
> }
>
> - /* Create a page with the proper size buffers.. */
> - return grow_dev_page(bdev, block, index, size, sizebits, gfp);
> + /* Create a folio with the proper size buffers */
> + return grow_dev_folio(bdev, block, index, size, sizebits, gfp);
> }
>
> static struct buffer_head *
> @@ -1124,14 +1126,12 @@ __getblk_slow(struct block_device *bdev, sector_t block,
>
> for (;;) {
> struct buffer_head *bh;
> - int ret;
>
> bh = __find_get_block(bdev, block, size);
> if (bh)
> return bh;
>
> - ret = grow_buffers(bdev, block, size, gfp);
> - if (ret < 0)
> + if (!grow_buffers(bdev, block, size, gfp))
> return NULL;
> }
> }
> --
> 2.42.0
>
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2 7/7] buffer: Fix more functions for block size > PAGE_SIZE
2023-11-09 21:06 ` [PATCH v2 7/7] buffer: Fix more functions for block size > PAGE_SIZE Matthew Wilcox (Oracle)
@ 2023-11-10 4:50 ` Eric Biggers
2023-11-10 14:26 ` Matthew Wilcox
0 siblings, 1 reply; 26+ messages in thread
From: Eric Biggers @ 2023-11-10 4:50 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: Andrew Morton, Hannes Reinecke, Luis Chamberlain, Pankaj Raghav,
linux-fsdevel
On Thu, Nov 09, 2023 at 09:06:08PM +0000, Matthew Wilcox (Oracle) wrote:
> Replace the shift with a divide, which is probably cheaper than first
> calculating the shift.
No. The divs are almost certainly more expensive on most CPUs, especially when
doing two of them. On x86_64 for example, it will be two div instructions
instead of one bsr and two shr instructions. The two divs are much more costly.
The block size is always a power of 2; we should take advantage of that.
- Eric
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2 3/7] buffer: Fix grow_buffers() for block size > PAGE_SIZE
2023-11-09 21:06 ` [PATCH v2 3/7] buffer: Fix grow_buffers() for block size > PAGE_SIZE Matthew Wilcox (Oracle)
@ 2023-11-10 6:37 ` Ryusuke Konishi
2023-11-10 11:29 ` Ryusuke Konishi
2023-11-12 4:52 ` kernel test robot
1 sibling, 1 reply; 26+ messages in thread
From: Ryusuke Konishi @ 2023-11-10 6:37 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: Andrew Morton, Hannes Reinecke, Luis Chamberlain, Pankaj Raghav,
linux-fsdevel
On Fri, Nov 10, 2023 at 6:06 AM Matthew Wilcox (Oracle) wrote:
>
> We must not shift by a negative number so work in terms of a byte
> offset to avoid the awkward shift left-or-right-depending-on-sign
> option. This means we need to use check_mul_overflow() to ensure
> that a large block number does not result in a wrap.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> fs/buffer.c | 17 ++++++-----------
> 1 file changed, 6 insertions(+), 11 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 44e0c0b7f71f..9c3f49cf8d28 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1085,26 +1085,21 @@ static bool grow_dev_folio(struct block_device *bdev, sector_t block,
> static bool grow_buffers(struct block_device *bdev, sector_t block,
> unsigned size, gfp_t gfp)
> {
> - pgoff_t index;
> - int sizebits;
> -
> - sizebits = PAGE_SHIFT - __ffs(size);
> - index = block >> sizebits;
> + loff_t pos;
>
> /*
> - * Check for a block which wants to lie outside our maximum possible
> - * pagecache index. (this comparison is done using sector_t types).
> + * Check for a block which lies outside our maximum possible
> + * pagecache index.
> */
> - if (unlikely(index != block >> sizebits)) {
> - printk(KERN_ERR "%s: requested out-of-range block %llu for "
> - "device %pg\n",
> + if (check_mul_overflow(block, size, &pos) || pos > MAX_LFS_FILESIZE) {
> + printk(KERN_ERR "%s: requested out-of-range block %llu for device %pg\n",
> __func__, (unsigned long long)block,
> bdev);
> return false;
> }
>
> /* Create a folio with the proper size buffers */
> - return grow_dev_folio(bdev, block, index, size, gfp);
> + return grow_dev_folio(bdev, block, pos / PAGE_SIZE, size, gfp);
"pos" has a loff_t type (= long long type).
Was it okay to do C division directly on 32-bit architectures?
Regards,
Ryusuke Konishi
> }
>
> static struct buffer_head *
> --
> 2.42.0
>
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2 5/7] buffer: Fix various functions for block size > PAGE_SIZE
2023-11-09 21:06 ` [PATCH v2 5/7] buffer: Fix various functions for block size > PAGE_SIZE Matthew Wilcox (Oracle)
@ 2023-11-10 7:48 ` Ryusuke Konishi
2023-11-10 13:36 ` Matthew Wilcox
0 siblings, 1 reply; 26+ messages in thread
From: Ryusuke Konishi @ 2023-11-10 7:48 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: Andrew Morton, Hannes Reinecke, Luis Chamberlain, Pankaj Raghav,
linux-fsdevel
On Fri, Nov 10, 2023 at 6:06 AM Matthew Wilcox (Oracle) wrote:
>
> If i_blkbits is larger than PAGE_SHIFT, we shift by a negative number,
> which is undefined. It is safe to shift the block left as a block
> device must be smaller than MAX_LFS_FILESIZE, which is guaranteed to
> fit in loff_t.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
> ---
> fs/buffer.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index ab345fd4da12..faf1916200c2 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -199,7 +199,7 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
> int all_mapped = 1;
> static DEFINE_RATELIMIT_STATE(last_warned, HZ, 1);
>
> - index = block >> (PAGE_SHIFT - bd_inode->i_blkbits);
> + index = ((loff_t)block << bd_inode->i_blkbits) / PAGE_SIZE;
Multiple 64-bit divisions are used in this patch, but why not use two
stage shifts as shown below if there is no left shift overflow and the
sign bit will not be set ?
index = ((loff_t)block << bd_inode->i_blkbits) >> PAGE_SHIFT;
Regards,
Ryusuke Konishi
> folio = __filemap_get_folio(bd_mapping, index, FGP_ACCESSED, 0);
> if (IS_ERR(folio))
> goto out;
> @@ -1693,13 +1693,13 @@ void clean_bdev_aliases(struct block_device *bdev, sector_t block, sector_t len)
> struct inode *bd_inode = bdev->bd_inode;
> struct address_space *bd_mapping = bd_inode->i_mapping;
> struct folio_batch fbatch;
> - pgoff_t index = block >> (PAGE_SHIFT - bd_inode->i_blkbits);
> + pgoff_t index = ((loff_t)block << bd_inode->i_blkbits) / PAGE_SIZE;
> pgoff_t end;
> int i, count;
> struct buffer_head *bh;
> struct buffer_head *head;
>
> - end = (block + len - 1) >> (PAGE_SHIFT - bd_inode->i_blkbits);
> + end = ((loff_t)(block + len - 1) << bd_inode->i_blkbits) / PAGE_SIZE;
> folio_batch_init(&fbatch);
> while (filemap_get_folios(bd_mapping, &index, end, &fbatch)) {
> count = folio_batch_count(&fbatch);
> @@ -2660,8 +2660,8 @@ int block_truncate_page(struct address_space *mapping,
> return 0;
>
> length = blocksize - length;
> - iblock = (sector_t)index << (PAGE_SHIFT - inode->i_blkbits);
> -
> + iblock = ((loff_t)index * PAGE_SIZE) >> inode->i_blkbits;
> +
> folio = filemap_grab_folio(mapping, index);
> if (IS_ERR(folio))
> return PTR_ERR(folio);
> --
> 2.42.0
>
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2 3/7] buffer: Fix grow_buffers() for block size > PAGE_SIZE
2023-11-10 6:37 ` Ryusuke Konishi
@ 2023-11-10 11:29 ` Ryusuke Konishi
2023-11-10 13:23 ` Matthew Wilcox
0 siblings, 1 reply; 26+ messages in thread
From: Ryusuke Konishi @ 2023-11-10 11:29 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: Andrew Morton, Hannes Reinecke, Luis Chamberlain, Pankaj Raghav,
linux-fsdevel
On Fri, Nov 10, 2023 at 3:37 PM Ryusuke Konishi wrote:
>
> On Fri, Nov 10, 2023 at 6:06 AM Matthew Wilcox (Oracle) wrote:
> >
> > We must not shift by a negative number so work in terms of a byte
> > offset to avoid the awkward shift left-or-right-depending-on-sign
> > option. This means we need to use check_mul_overflow() to ensure
> > that a large block number does not result in a wrap.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> > fs/buffer.c | 17 ++++++-----------
> > 1 file changed, 6 insertions(+), 11 deletions(-)
> >
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index 44e0c0b7f71f..9c3f49cf8d28 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -1085,26 +1085,21 @@ static bool grow_dev_folio(struct block_device *bdev, sector_t block,
> > static bool grow_buffers(struct block_device *bdev, sector_t block,
> > unsigned size, gfp_t gfp)
> > {
> > - pgoff_t index;
> > - int sizebits;
> > -
> > - sizebits = PAGE_SHIFT - __ffs(size);
> > - index = block >> sizebits;
> > + loff_t pos;
> >
> > /*
> > - * Check for a block which wants to lie outside our maximum possible
> > - * pagecache index. (this comparison is done using sector_t types).
> > + * Check for a block which lies outside our maximum possible
> > + * pagecache index.
> > */
> > - if (unlikely(index != block >> sizebits)) {
> > - printk(KERN_ERR "%s: requested out-of-range block %llu for "
> > - "device %pg\n",
> > + if (check_mul_overflow(block, size, &pos) || pos > MAX_LFS_FILESIZE) {
> > + printk(KERN_ERR "%s: requested out-of-range block %llu for device %pg\n",
> > __func__, (unsigned long long)block,
> > bdev);
> > return false;
> > }
> >
> > /* Create a folio with the proper size buffers */
> > - return grow_dev_folio(bdev, block, index, size, gfp);
>
> > + return grow_dev_folio(bdev, block, pos / PAGE_SIZE, size, gfp);
>
> "pos" has a loff_t type (= long long type).
> Was it okay to do C division directly on 32-bit architectures?
>
> Regards,
> Ryusuke Konishi
Similar to the comment for patch 5/7, can we safely use the generally
less expensive shift operation "pos >> PAGE_SHIFT" here ?
Regards,
Ryusuke Konishi
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2 3/7] buffer: Fix grow_buffers() for block size > PAGE_SIZE
2023-11-10 11:29 ` Ryusuke Konishi
@ 2023-11-10 13:23 ` Matthew Wilcox
0 siblings, 0 replies; 26+ messages in thread
From: Matthew Wilcox @ 2023-11-10 13:23 UTC (permalink / raw)
To: Ryusuke Konishi
Cc: Andrew Morton, Hannes Reinecke, Luis Chamberlain, Pankaj Raghav,
linux-fsdevel
On Fri, Nov 10, 2023 at 08:29:41PM +0900, Ryusuke Konishi wrote:
> On Fri, Nov 10, 2023 at 3:37 PM Ryusuke Konishi wrote:
> > > + return grow_dev_folio(bdev, block, pos / PAGE_SIZE, size, gfp);
> >
> > "pos" has a loff_t type (= long long type).
> > Was it okay to do C division directly on 32-bit architectures?
>
> Similar to the comment for patch 5/7, can we safely use the generally
> less expensive shift operation "pos >> PAGE_SHIFT" here ?
If your compiler sees x / 4096 and doesn't optimise it to x >> 12,
you need a better compiler.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2 5/7] buffer: Fix various functions for block size > PAGE_SIZE
2023-11-10 7:48 ` Ryusuke Konishi
@ 2023-11-10 13:36 ` Matthew Wilcox
2023-11-10 15:23 ` Ryusuke Konishi
0 siblings, 1 reply; 26+ messages in thread
From: Matthew Wilcox @ 2023-11-10 13:36 UTC (permalink / raw)
To: Ryusuke Konishi
Cc: Andrew Morton, Hannes Reinecke, Luis Chamberlain, Pankaj Raghav,
linux-fsdevel
On Fri, Nov 10, 2023 at 04:48:02PM +0900, Ryusuke Konishi wrote:
> On Fri, Nov 10, 2023 at 6:06 AM Matthew Wilcox (Oracle) wrote:
> > +++ b/fs/buffer.c
> > @@ -199,7 +199,7 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
> > int all_mapped = 1;
> > static DEFINE_RATELIMIT_STATE(last_warned, HZ, 1);
> >
> > - index = block >> (PAGE_SHIFT - bd_inode->i_blkbits);
> > + index = ((loff_t)block << bd_inode->i_blkbits) / PAGE_SIZE;
>
> Multiple 64-bit divisions are used in this patch, but why not use two
> stage shifts as shown below if there is no left shift overflow and the
> sign bit will not be set ?
>
> index = ((loff_t)block << bd_inode->i_blkbits) >> PAGE_SHIFT;
Here's what the compiler turns that into:
3223: 49 8b 86 80 00 00 00 mov 0x80(%r14),%rax
322a: 4c 89 ee mov %r13,%rsi
322d: ba 01 00 00 00 mov $0x1,%edx
3232: 0f b6 88 c2 00 00 00 movzbl 0xc2(%rax),%ecx
^ this is where we load i_blkbits into ecx
3239: 48 89 45 d0 mov %rax,-0x30(%rbp)
323d: 4c 8b 60 30 mov 0x30(%rax),%r12
3241: 48 d3 e6 shl %cl,%rsi
^ shift left by %cl (the bottom byte of ecx)
3244: 31 c9 xor %ecx,%ecx
3246: 48 c1 ee 0c shr $0xc,%rsi
^ shift right by 12
324a: 4c 89 e7 mov %r12,%rdi
324d: e8 00 00 00 00 call 3252 <__find_get_block+0x82>
324e: R_X86_64_PLT32 __filemap_get_folio-0x4
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2 7/7] buffer: Fix more functions for block size > PAGE_SIZE
2023-11-10 4:50 ` Eric Biggers
@ 2023-11-10 14:26 ` Matthew Wilcox
2023-11-11 18:06 ` Eric Biggers
0 siblings, 1 reply; 26+ messages in thread
From: Matthew Wilcox @ 2023-11-10 14:26 UTC (permalink / raw)
To: Eric Biggers
Cc: Andrew Morton, Hannes Reinecke, Luis Chamberlain, Pankaj Raghav,
linux-fsdevel
On Thu, Nov 09, 2023 at 08:50:19PM -0800, Eric Biggers wrote:
> On Thu, Nov 09, 2023 at 09:06:08PM +0000, Matthew Wilcox (Oracle) wrote:
> > Replace the shift with a divide, which is probably cheaper than first
> > calculating the shift.
>
> No. The divs are almost certainly more expensive on most CPUs, especially when
> doing two of them. On x86_64 for example, it will be two div instructions
> instead of one bsr and two shr instructions. The two divs are much more costly.
> The block size is always a power of 2; we should take advantage of that.
I just looked it up and it's more expensive than I thought, but I don't
see a huge difference here.
DIV has a 23-cycle latency on Skylake (just to pick a relatively recent
CPU). But it has reciprocal throughput of 6, so two DIVs have latency
of 29, not 46. Unless the second DIV depends on the result of the first
(it doesn't).
BSF (the ffs instruction) has latency 3. SHRD has latency 4, but the
SHR is going to depend on the output of the BSF, so they necessarily
add to 7 cycles latency. SHR has reciprocal latency of 2, so
BSF,SHR,SHR will be 9 cycles.
So I've cost 20 cycles, which is about the cost of missing in the L1
cache and hitting in the L2 cache. That doesn't feel too bad to me?
Would you want to invest more engineering time in changing it?
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2 5/7] buffer: Fix various functions for block size > PAGE_SIZE
2023-11-10 13:36 ` Matthew Wilcox
@ 2023-11-10 15:23 ` Ryusuke Konishi
0 siblings, 0 replies; 26+ messages in thread
From: Ryusuke Konishi @ 2023-11-10 15:23 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, Hannes Reinecke, Luis Chamberlain, Pankaj Raghav,
linux-fsdevel
On Fri, Nov 10, 2023 at 10:36 PM Matthew Wilcox wrote:
>
> On Fri, Nov 10, 2023 at 04:48:02PM +0900, Ryusuke Konishi wrote:
> > On Fri, Nov 10, 2023 at 6:06 AM Matthew Wilcox (Oracle) wrote:
> > > +++ b/fs/buffer.c
> > > @@ -199,7 +199,7 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
> > > int all_mapped = 1;
> > > static DEFINE_RATELIMIT_STATE(last_warned, HZ, 1);
> > >
> > > - index = block >> (PAGE_SHIFT - bd_inode->i_blkbits);
> > > + index = ((loff_t)block << bd_inode->i_blkbits) / PAGE_SIZE;
> >
> > Multiple 64-bit divisions are used in this patch, but why not use two
> > stage shifts as shown below if there is no left shift overflow and the
> > sign bit will not be set ?
> >
> > index = ((loff_t)block << bd_inode->i_blkbits) >> PAGE_SHIFT;
>
> Here's what the compiler turns that into:
>
> 3223: 49 8b 86 80 00 00 00 mov 0x80(%r14),%rax
> 322a: 4c 89 ee mov %r13,%rsi
> 322d: ba 01 00 00 00 mov $0x1,%edx
> 3232: 0f b6 88 c2 00 00 00 movzbl 0xc2(%rax),%ecx
> ^ this is where we load i_blkbits into ecx
> 3239: 48 89 45 d0 mov %rax,-0x30(%rbp)
> 323d: 4c 8b 60 30 mov 0x30(%rax),%r12
> 3241: 48 d3 e6 shl %cl,%rsi
> ^ shift left by %cl (the bottom byte of ecx)
> 3244: 31 c9 xor %ecx,%ecx
> 3246: 48 c1 ee 0c shr $0xc,%rsi
> ^ shift right by 12
> 324a: 4c 89 e7 mov %r12,%rdi
> 324d: e8 00 00 00 00 call 3252 <__find_get_block+0x82>
> 324e: R_X86_64_PLT32 __filemap_get_folio-0x4
Ah, similarly in my environment, these divisions by constant are
optimized (and no link errors happen even on a 32-bit machine).
Sorry for taking up your time with unnecessary comments along with
that for patch 3/7.
Regards,
Ryusuke Konishi
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2 7/7] buffer: Fix more functions for block size > PAGE_SIZE
2023-11-10 14:26 ` Matthew Wilcox
@ 2023-11-11 18:06 ` Eric Biggers
2023-11-12 4:52 ` Matthew Wilcox
0 siblings, 1 reply; 26+ messages in thread
From: Eric Biggers @ 2023-11-11 18:06 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, Hannes Reinecke, Luis Chamberlain, Pankaj Raghav,
linux-fsdevel
On Fri, Nov 10, 2023 at 02:26:43PM +0000, Matthew Wilcox wrote:
> Would you want to invest more engineering time in changing it?
It seems logical to just keep the existing approach instead of spending time
trying to justify changing it to a less efficient one (divides).
- Eric
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2 3/7] buffer: Fix grow_buffers() for block size > PAGE_SIZE
2023-11-09 21:06 ` [PATCH v2 3/7] buffer: Fix grow_buffers() for block size > PAGE_SIZE Matthew Wilcox (Oracle)
2023-11-10 6:37 ` Ryusuke Konishi
@ 2023-11-12 4:52 ` kernel test robot
2023-11-13 17:10 ` Andrew Morton
1 sibling, 1 reply; 26+ messages in thread
From: kernel test robot @ 2023-11-12 4:52 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Andrew Morton
Cc: llvm, oe-kbuild-all, Linux Memory Management List,
Matthew Wilcox (Oracle), Hannes Reinecke, Luis Chamberlain,
Pankaj Raghav, linux-fsdevel
Hi Matthew,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master next-20231110]
[cannot apply to v6.6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/buffer-Return-bool-from-grow_dev_folio/20231110-051651
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20231109210608.2252323-4-willy%40infradead.org
patch subject: [PATCH v2 3/7] buffer: Fix grow_buffers() for block size > PAGE_SIZE
config: hexagon-comet_defconfig (https://download.01.org/0day-ci/archive/20231112/202311121240.AN8GbAbe-lkp@intel.com/config)
compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231112/202311121240.AN8GbAbe-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311121240.AN8GbAbe-lkp@intel.com/
All errors (new ones prefixed by >>):
>> ld.lld: error: undefined symbol: __muloti4
>>> referenced by buffer.c
>>> fs/buffer.o:(bdev_getblk) in archive vmlinux.a
>>> referenced by buffer.c
>>> fs/buffer.o:(bdev_getblk) in archive vmlinux.a
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2 7/7] buffer: Fix more functions for block size > PAGE_SIZE
2023-11-11 18:06 ` Eric Biggers
@ 2023-11-12 4:52 ` Matthew Wilcox
2023-11-12 6:00 ` Eric Biggers
0 siblings, 1 reply; 26+ messages in thread
From: Matthew Wilcox @ 2023-11-12 4:52 UTC (permalink / raw)
To: Eric Biggers
Cc: Andrew Morton, Hannes Reinecke, Luis Chamberlain, Pankaj Raghav,
linux-fsdevel
On Sat, Nov 11, 2023 at 10:06:13AM -0800, Eric Biggers wrote:
> On Fri, Nov 10, 2023 at 02:26:43PM +0000, Matthew Wilcox wrote:
> > Would you want to invest more engineering time in changing it?
>
> It seems logical to just keep the existing approach instead of spending time
> trying to justify changing it to a less efficient one (divides).
Except the existing approach doesn't work for block size > PAGE_SIZE
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2 7/7] buffer: Fix more functions for block size > PAGE_SIZE
2023-11-12 4:52 ` Matthew Wilcox
@ 2023-11-12 6:00 ` Eric Biggers
0 siblings, 0 replies; 26+ messages in thread
From: Eric Biggers @ 2023-11-12 6:00 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, Hannes Reinecke, Luis Chamberlain, Pankaj Raghav,
linux-fsdevel
On Sun, Nov 12, 2023 at 04:52:56AM +0000, Matthew Wilcox wrote:
> On Sat, Nov 11, 2023 at 10:06:13AM -0800, Eric Biggers wrote:
> > On Fri, Nov 10, 2023 at 02:26:43PM +0000, Matthew Wilcox wrote:
> > > Would you want to invest more engineering time in changing it?
> >
> > It seems logical to just keep the existing approach instead of spending time
> > trying to justify changing it to a less efficient one (divides).
>
> Except the existing approach doesn't work for block size > PAGE_SIZE
A shift does still work; the block size is still a power of 2, after all.
'(sector_t)folio->index << (PAGE_SHIFT - bbits)' just needs to be changed to
'folio_pos(folio) >> bbits'.
- Eric
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2 3/7] buffer: Fix grow_buffers() for block size > PAGE_SIZE
2023-11-12 4:52 ` kernel test robot
@ 2023-11-13 17:10 ` Andrew Morton
2023-11-13 17:20 ` Nathan Chancellor
0 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2023-11-13 17:10 UTC (permalink / raw)
To: kernel test robot
Cc: Matthew Wilcox (Oracle), llvm, oe-kbuild-all,
Linux Memory Management List, Hannes Reinecke, Luis Chamberlain,
Pankaj Raghav, linux-fsdevel
On Sun, 12 Nov 2023 12:52:00 +0800 kernel test robot <lkp@intel.com> wrote:
> Hi Matthew,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on akpm-mm/mm-everything]
> [also build test ERROR on linus/master next-20231110]
> [cannot apply to v6.6]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/buffer-Return-bool-from-grow_dev_folio/20231110-051651
> base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link: https://lore.kernel.org/r/20231109210608.2252323-4-willy%40infradead.org
> patch subject: [PATCH v2 3/7] buffer: Fix grow_buffers() for block size > PAGE_SIZE
> config: hexagon-comet_defconfig (https://download.01.org/0day-ci/archive/20231112/202311121240.AN8GbAbe-lkp@intel.com/config)
> compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231112/202311121240.AN8GbAbe-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202311121240.AN8GbAbe-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> >> ld.lld: error: undefined symbol: __muloti4
> >>> referenced by buffer.c
> >>> fs/buffer.o:(bdev_getblk) in archive vmlinux.a
> >>> referenced by buffer.c
> >>> fs/buffer.o:(bdev_getblk) in archive vmlinux.a
>
What a peculiar compiler.
I assume this fixes?
--- a/fs/buffer.c~buffer-fix-grow_buffers-for-block-size-page_size-fix
+++ a/fs/buffer.c
@@ -1099,7 +1099,7 @@ static bool grow_buffers(struct block_de
}
/* Create a folio with the proper size buffers */
- return grow_dev_folio(bdev, block, pos / PAGE_SIZE, size, gfp);
+ return grow_dev_folio(bdev, block, pos >> PAGE_SHIFT, size, gfp);
}
static struct buffer_head *
_
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2 3/7] buffer: Fix grow_buffers() for block size > PAGE_SIZE
2023-11-13 17:10 ` Andrew Morton
@ 2023-11-13 17:20 ` Nathan Chancellor
2023-11-15 11:14 ` Naresh Kamboju
0 siblings, 1 reply; 26+ messages in thread
From: Nathan Chancellor @ 2023-11-13 17:20 UTC (permalink / raw)
To: Andrew Morton
Cc: kernel test robot, Matthew Wilcox (Oracle), llvm, oe-kbuild-all,
Linux Memory Management List, Hannes Reinecke, Luis Chamberlain,
Pankaj Raghav, linux-fsdevel
On Mon, Nov 13, 2023 at 09:10:06AM -0800, Andrew Morton wrote:
> On Sun, 12 Nov 2023 12:52:00 +0800 kernel test robot <lkp@intel.com> wrote:
>
> > Hi Matthew,
> >
> > kernel test robot noticed the following build errors:
> >
> > [auto build test ERROR on akpm-mm/mm-everything]
> > [also build test ERROR on linus/master next-20231110]
> > [cannot apply to v6.6]
> > [If your patch is applied to the wrong git tree, kindly drop us a note.
> > And when submitting patch, we suggest to use '--base' as documented in
> > https://git-scm.com/docs/git-format-patch#_base_tree_information]
> >
> > url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/buffer-Return-bool-from-grow_dev_folio/20231110-051651
> > base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> > patch link: https://lore.kernel.org/r/20231109210608.2252323-4-willy%40infradead.org
> > patch subject: [PATCH v2 3/7] buffer: Fix grow_buffers() for block size > PAGE_SIZE
> > config: hexagon-comet_defconfig (https://download.01.org/0day-ci/archive/20231112/202311121240.AN8GbAbe-lkp@intel.com/config)
> > compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
> > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231112/202311121240.AN8GbAbe-lkp@intel.com/reproduce)
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Closes: https://lore.kernel.org/oe-kbuild-all/202311121240.AN8GbAbe-lkp@intel.com/
> >
> > All errors (new ones prefixed by >>):
> >
> > >> ld.lld: error: undefined symbol: __muloti4
> > >>> referenced by buffer.c
> > >>> fs/buffer.o:(bdev_getblk) in archive vmlinux.a
> > >>> referenced by buffer.c
> > >>> fs/buffer.o:(bdev_getblk) in archive vmlinux.a
> >
>
> What a peculiar compiler.
>
> I assume this fixes?
>
> --- a/fs/buffer.c~buffer-fix-grow_buffers-for-block-size-page_size-fix
> +++ a/fs/buffer.c
> @@ -1099,7 +1099,7 @@ static bool grow_buffers(struct block_de
> }
>
> /* Create a folio with the proper size buffers */
> - return grow_dev_folio(bdev, block, pos / PAGE_SIZE, size, gfp);
> + return grow_dev_folio(bdev, block, pos >> PAGE_SHIFT, size, gfp);
> }
>
> static struct buffer_head *
> _
>
>
No, this is not a division libcall. This seems to be related to the
types of the variables used in __builtin_mul_overflow() :/ for some odd
reason, clang generates a libcall when passing in an 'unsigned long
long' and 'unsigned int', which apparently has not been done before in
the kernel?
https://github.com/ClangBuiltLinux/linux/issues/1958
https://godbolt.org/z/csfGc6z6c
A cast would work around this but that could have other implications I
am not aware of (I've done little further investigation due to LPC):
diff --git a/fs/buffer.c b/fs/buffer.c
index 4eb44ccdc6be..d39934783743 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1091,7 +1091,7 @@ static bool grow_buffers(struct block_device *bdev, sector_t block,
* Check for a block which lies outside our maximum possible
* pagecache index.
*/
- if (check_mul_overflow(block, size, &pos) || pos > MAX_LFS_FILESIZE) {
+ if (check_mul_overflow(block, (u64)size, &pos) || pos > MAX_LFS_FILESIZE) {
printk(KERN_ERR "%s: requested out-of-range block %llu for device %pg\n",
__func__, (unsigned long long)block,
bdev);
Cheers,
Nathan
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH v2 3/7] buffer: Fix grow_buffers() for block size > PAGE_SIZE
2023-11-13 17:20 ` Nathan Chancellor
@ 2023-11-15 11:14 ` Naresh Kamboju
0 siblings, 0 replies; 26+ messages in thread
From: Naresh Kamboju @ 2023-11-15 11:14 UTC (permalink / raw)
To: Nathan Chancellor
Cc: Andrew Morton, kernel test robot, Matthew Wilcox (Oracle), llvm,
oe-kbuild-all, Linux Memory Management List, Hannes Reinecke,
Luis Chamberlain, Pankaj Raghav, linux-fsdevel
Hi Nathan,
On Mon, 13 Nov 2023 at 22:50, Nathan Chancellor <nathan@kernel.org> wrote:
>
> On Mon, Nov 13, 2023 at 09:10:06AM -0800, Andrew Morton wrote:
> > On Sun, 12 Nov 2023 12:52:00 +0800 kernel test robot <lkp@intel.com> wrote:
> >
> > > Hi Matthew,
> > >
> > > kernel test robot noticed the following build errors:
> > >
> > > [auto build test ERROR on akpm-mm/mm-everything]
> > > [also build test ERROR on linus/master next-20231110]
> > > [cannot apply to v6.6]
> > > [If your patch is applied to the wrong git tree, kindly drop us a note.
> > > And when submitting patch, we suggest to use '--base' as documented in
> > > https://git-scm.com/docs/git-format-patch#_base_tree_information]
> > >
> > > url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/buffer-Return-bool-from-grow_dev_folio/20231110-051651
> > > base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> > > patch link: https://lore.kernel.org/r/20231109210608.2252323-4-willy%40infradead.org
> > > patch subject: [PATCH v2 3/7] buffer: Fix grow_buffers() for block size > PAGE_SIZE
> > > config: hexagon-comet_defconfig (https://download.01.org/0day-ci/archive/20231112/202311121240.AN8GbAbe-lkp@intel.com/config)
> > > compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
> > > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231112/202311121240.AN8GbAbe-lkp@intel.com/reproduce)
> > >
> > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > the same patch/commit), kindly add following tags
> > > | Reported-by: kernel test robot <lkp@intel.com>
> > > | Closes: https://lore.kernel.org/oe-kbuild-all/202311121240.AN8GbAbe-lkp@intel.com/
KFT CI also have been noticing this build problem on Linux next.
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> > >
> > > All errors (new ones prefixed by >>):
> > >
> > > >> ld.lld: error: undefined symbol: __muloti4
> > > >>> referenced by buffer.c
> > > >>> fs/buffer.o:(bdev_getblk) in archive vmlinux.a
> > > >>> referenced by buffer.c
> > > >>> fs/buffer.o:(bdev_getblk) in archive vmlinux.a
> > >
> >
> > What a peculiar compiler.
> >
> > I assume this fixes?
> >
> > --- a/fs/buffer.c~buffer-fix-grow_buffers-for-block-size-page_size-fix
> > +++ a/fs/buffer.c
> > @@ -1099,7 +1099,7 @@ static bool grow_buffers(struct block_de
> > }
> >
> > /* Create a folio with the proper size buffers */
> > - return grow_dev_folio(bdev, block, pos / PAGE_SIZE, size, gfp);
> > + return grow_dev_folio(bdev, block, pos >> PAGE_SHIFT, size, gfp);
> > }
> >
> > static struct buffer_head *
> > _
> >
> >
>
> No, this is not a division libcall. This seems to be related to the
> types of the variables used in __builtin_mul_overflow() :/ for some odd
> reason, clang generates a libcall when passing in an 'unsigned long
> long' and 'unsigned int', which apparently has not been done before in
> the kernel?
>
> https://github.com/ClangBuiltLinux/linux/issues/1958
> https://godbolt.org/z/csfGc6z6c
>
> A cast would work around this but that could have other implications I
> am not aware of (I've done little further investigation due to LPC):
Thanks for providing this fix patch.
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 4eb44ccdc6be..d39934783743 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1091,7 +1091,7 @@ static bool grow_buffers(struct block_device *bdev, sector_t block,
> * Check for a block which lies outside our maximum possible
> * pagecache index.
> */
> - if (check_mul_overflow(block, size, &pos) || pos > MAX_LFS_FILESIZE) {
> + if (check_mul_overflow(block, (u64)size, &pos) || pos > MAX_LFS_FILESIZE) {
> printk(KERN_ERR "%s: requested out-of-range block %llu for device %pg\n",
> __func__, (unsigned long long)block,
> bdev);
>
> Cheers,
> Nathan
- Naresh
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2 1/7] buffer: Return bool from grow_dev_folio()
2023-11-10 3:43 ` Ryusuke Konishi
@ 2023-12-30 9:23 ` Matthew Wilcox
2023-12-30 12:22 ` Ryusuke Konishi
0 siblings, 1 reply; 26+ messages in thread
From: Matthew Wilcox @ 2023-12-30 9:23 UTC (permalink / raw)
To: Ryusuke Konishi
Cc: Andrew Morton, Hannes Reinecke, Luis Chamberlain, Pankaj Raghav,
linux-fsdevel
On Fri, Nov 10, 2023 at 12:43:43PM +0900, Ryusuke Konishi wrote:
> On Fri, Nov 10, 2023 at 6:07 AM Matthew Wilcox (Oracle) wrote:
> > + /* Caller should retry if this call fails */
> > + end_block = ~0ULL;
> > if (!try_to_free_buffers(folio))
> > - goto failed;
> > + goto unlock;
> > }
> >
>
> > - ret = -ENOMEM;
> > bh = folio_alloc_buffers(folio, size, gfp | __GFP_ACCOUNT);
> > if (!bh)
> > - goto failed;
> > + goto unlock;
>
> Regarding this folio_alloc_buffers() error path,
> If folio_buffers() was NULL, here end_block is 0, so this function
> returns false (which means "have a permanent failure").
>
> But, if folio_buffers() existed and they were freed with
> try_to_free_buffers() because of bh->b_size != size, here end_block
> has been set to ~0ULL, so it seems to return true ("succeeded").
>
> Does this semantic change match your intent?
>
> Otherwise, I think end_block should be set to 0 just before calling
> folio_alloc_buffers().
Thanks for the review, and sorry for taking so long to get back to you.
The change was unintentional (but memory allocation failure wth GFP_KERNEL
happens so rarely that our testing was never going to catch it)
I think I should just move the assignment to end_block inside the 'if'.
It's just more obvious what's going on. Andrew prodded me to be more
explicit about why memory allocation is a "permanent" failure, but
failing to free buffers is not.
I'll turn this into a proper patch submission later.
diff --git a/fs/buffer.c b/fs/buffer.c
index d5ce6b29c893..d3bcf601d3e5 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1028,8 +1028,8 @@ static sector_t folio_init_buffers(struct folio *folio,
*
* This is used purely for blockdev mappings.
*
- * Returns false if we have a 'permanent' failure. Returns true if
- * we succeeded, or the caller should retry.
+ * Returns false if we have a failure which cannot be cured by retrying
+ * without sleeping. Returns true if we succeeded, or the caller should retry.
*/
static bool grow_dev_folio(struct block_device *bdev, sector_t block,
pgoff_t index, unsigned size, gfp_t gfp)
@@ -1051,10 +1051,17 @@ static bool grow_dev_folio(struct block_device *bdev, sector_t block,
goto unlock;
}
- /* Caller should retry if this call fails */
- end_block = ~0ULL;
- if (!try_to_free_buffers(folio))
+ /*
+ * Retrying may succeed; for example the folio may finish
+ * writeback, or buffers may be cleaned. This should not
+ * happen very often; maybe we have old buffers attached to
+ * this blockdev's page cache and we're trying to change
+ * the block size?
+ */
+ if (!try_to_free_buffers(folio)) {
+ end_block = ~0ULL;
goto unlock;
+ }
}
bh = folio_alloc_buffers(folio, size, gfp | __GFP_ACCOUNT);
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH v2 1/7] buffer: Return bool from grow_dev_folio()
2023-12-30 9:23 ` Matthew Wilcox
@ 2023-12-30 12:22 ` Ryusuke Konishi
0 siblings, 0 replies; 26+ messages in thread
From: Ryusuke Konishi @ 2023-12-30 12:22 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, Hannes Reinecke, Luis Chamberlain, Pankaj Raghav,
linux-fsdevel
On Sat, Dec 30, 2023 at 6:23 PM Matthew Wilcox wrote:
>
> On Fri, Nov 10, 2023 at 12:43:43PM +0900, Ryusuke Konishi wrote:
> > On Fri, Nov 10, 2023 at 6:07 AM Matthew Wilcox (Oracle) wrote:
> > > + /* Caller should retry if this call fails */
> > > + end_block = ~0ULL;
> > > if (!try_to_free_buffers(folio))
> > > - goto failed;
> > > + goto unlock;
> > > }
> > >
> >
> > > - ret = -ENOMEM;
> > > bh = folio_alloc_buffers(folio, size, gfp | __GFP_ACCOUNT);
> > > if (!bh)
> > > - goto failed;
> > > + goto unlock;
> >
> > Regarding this folio_alloc_buffers() error path,
> > If folio_buffers() was NULL, here end_block is 0, so this function
> > returns false (which means "have a permanent failure").
> >
> > But, if folio_buffers() existed and they were freed with
> > try_to_free_buffers() because of bh->b_size != size, here end_block
> > has been set to ~0ULL, so it seems to return true ("succeeded").
> >
> > Does this semantic change match your intent?
> >
> > Otherwise, I think end_block should be set to 0 just before calling
> > folio_alloc_buffers().
>
> Thanks for the review, and sorry for taking so long to get back to you.
> The change was unintentional (but memory allocation failure wth GFP_KERNEL
> happens so rarely that our testing was never going to catch it)
>
> I think I should just move the assignment to end_block inside the 'if'.
> It's just more obvious what's going on.
Agree. I also think this fix is better.
Regards,
Ryusuke Konishi
> Andrew prodded me to be more
> explicit about why memory allocation is a "permanent" failure, but
> failing to free buffers is not.
>
> I'll turn this into a proper patch submission later.
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index d5ce6b29c893..d3bcf601d3e5 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1028,8 +1028,8 @@ static sector_t folio_init_buffers(struct folio *folio,
> *
> * This is used purely for blockdev mappings.
> *
> - * Returns false if we have a 'permanent' failure. Returns true if
> - * we succeeded, or the caller should retry.
> + * Returns false if we have a failure which cannot be cured by retrying
> + * without sleeping. Returns true if we succeeded, or the caller should retry.
> */
> static bool grow_dev_folio(struct block_device *bdev, sector_t block,
> pgoff_t index, unsigned size, gfp_t gfp)
> @@ -1051,10 +1051,17 @@ static bool grow_dev_folio(struct block_device *bdev, sector_t block,
> goto unlock;
> }
>
> - /* Caller should retry if this call fails */
> - end_block = ~0ULL;
> - if (!try_to_free_buffers(folio))
> + /*
> + * Retrying may succeed; for example the folio may finish
> + * writeback, or buffers may be cleaned. This should not
> + * happen very often; maybe we have old buffers attached to
> + * this blockdev's page cache and we're trying to change
> + * the block size?
> + */
> + if (!try_to_free_buffers(folio)) {
> + end_block = ~0ULL;
> goto unlock;
> + }
> }
>
> bh = folio_alloc_buffers(folio, size, gfp | __GFP_ACCOUNT);
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2023-12-30 12:22 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-09 21:06 [PATCH v2 0/7] More buffer_head cleanups Matthew Wilcox (Oracle)
2023-11-09 21:06 ` [PATCH v2 1/7] buffer: Return bool from grow_dev_folio() Matthew Wilcox (Oracle)
2023-11-10 3:43 ` Ryusuke Konishi
2023-12-30 9:23 ` Matthew Wilcox
2023-12-30 12:22 ` Ryusuke Konishi
2023-11-09 21:06 ` [PATCH v2 2/7] buffer: Calculate block number inside folio_init_buffers() Matthew Wilcox (Oracle)
2023-11-09 21:06 ` [PATCH v2 3/7] buffer: Fix grow_buffers() for block size > PAGE_SIZE Matthew Wilcox (Oracle)
2023-11-10 6:37 ` Ryusuke Konishi
2023-11-10 11:29 ` Ryusuke Konishi
2023-11-10 13:23 ` Matthew Wilcox
2023-11-12 4:52 ` kernel test robot
2023-11-13 17:10 ` Andrew Morton
2023-11-13 17:20 ` Nathan Chancellor
2023-11-15 11:14 ` Naresh Kamboju
2023-11-09 21:06 ` [PATCH v2 4/7] buffer: Cast block to loff_t before shifting it Matthew Wilcox (Oracle)
2023-11-09 21:06 ` [PATCH v2 5/7] buffer: Fix various functions for block size > PAGE_SIZE Matthew Wilcox (Oracle)
2023-11-10 7:48 ` Ryusuke Konishi
2023-11-10 13:36 ` Matthew Wilcox
2023-11-10 15:23 ` Ryusuke Konishi
2023-11-09 21:06 ` [PATCH v2 6/7] buffer: Handle large folios in __block_write_begin_int() Matthew Wilcox (Oracle)
2023-11-09 21:06 ` [PATCH v2 7/7] buffer: Fix more functions for block size > PAGE_SIZE Matthew Wilcox (Oracle)
2023-11-10 4:50 ` Eric Biggers
2023-11-10 14:26 ` Matthew Wilcox
2023-11-11 18:06 ` Eric Biggers
2023-11-12 4:52 ` Matthew Wilcox
2023-11-12 6:00 ` Eric Biggers
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).