From: Matthew Wilcox <willy@infradead.org>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: hare@suse.de, dave@stgolabs.net, david@fromorbit.com,
djwong@kernel.org, kbusch@kernel.org, john.g.garry@oracle.com,
hch@lst.de, ritesh.list@gmail.com, linux-fsdevel@vger.kernel.org,
linux-xfs@vger.kernel.org, linux-mm@kvack.org,
linux-block@vger.kernel.org, gost.dev@samsung.com,
p.raghav@samsung.com, da.gomez@samsung.com,
kernel@pankajraghav.com
Subject: Re: [PATCH 0/5] fs/buffer: strack reduction on async read
Date: Thu, 19 Dec 2024 03:51:34 +0000 [thread overview]
Message-ID: <Z2OYRkpRcUFIOFog@casper.infradead.org> (raw)
In-Reply-To: <Z2OEmALBGB8ARLlc@bombadil.infradead.org>
On Wed, Dec 18, 2024 at 06:27:36PM -0800, Luis Chamberlain wrote:
> On Wed, Dec 18, 2024 at 08:05:29PM +0000, Matthew Wilcox wrote:
> > On Tue, Dec 17, 2024 at 06:26:21PM -0800, Luis Chamberlain wrote:
> > > This splits up a minor enhancement from the bs > ps device support
> > > series into its own series for better review / focus / testing.
> > > This series just addresses the reducing the array size used and cleaning
> > > up the async read to be easier to read and maintain.
> >
> > How about this approach instead -- get rid of the batch entirely?
>
> Less is more! I wish it worked, but we end up with a null pointer on
> ext4/032 (and indeed this is the test that helped me find most bugs in
> what I was working on):
Yeah, I did no testing; just wanted to give people a different approach
to consider.
> [ 106.034851] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 106.046300] RIP: 0010:end_buffer_async_read_io+0x11/0x90
> [ 106.047819] Code: f2 ff 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 53 48 8b 47 10 48 89 fb 48 8b 40 18 <48> 8b 00 f6 40 0d 40 74 0d 0f b7 00 66 25 00 f0 66 3d 00 80 74 09
That decodes as:
5: 53 push %rbx
6: 48 8b 47 10 mov 0x10(%rdi),%rax
a: 48 89 fb mov %rdi,%rbx
d: 48 8b 40 18 mov 0x18(%rax),%rax
11:* 48 8b 00 mov (%rax),%rax <-- trapping instruction
14: f6 40 0d 40 testb $0x40,0xd(%rax)
6: bh->b_folio
d: b_folio->mapping
11: mapping->host
So folio->mapping is NULL.
Ah, I see the problem. end_buffer_async_read() uses the buffer_async_read
test to decide if all buffers on the page are uptodate or not. So both
having no batch (ie this patch) and having a batch which is smaller than
the number of buffers in the folio can lead to folio_end_read() being
called prematurely (ie we'll unlock the folio before finishing reading
every buffer in the folio).
Once the folio is unlocked, it can be truncated. That's a second-order
problem, but it's the one your test happened to hit.
This should fix the problem; we always have at least one BH held in
the submission path with the async_read flag set, so
end_buffer_async_read() will not end it prematurely.
By the way, do you have CONFIG_VM_DEBUG enabled in your testing?
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
in folio_end_read() should have tripped before hitting the race with
truncate.
diff --git a/fs/buffer.c b/fs/buffer.c
index cc8452f60251..fd2633e4a5d2 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2361,9 +2361,9 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
{
struct inode *inode = folio->mapping->host;
sector_t iblock, lblock;
- struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
+ struct buffer_head *bh, *head, *prev = NULL;
size_t blocksize;
- int nr, i;
+ int i;
int fully_mapped = 1;
bool page_error = false;
loff_t limit = i_size_read(inode);
@@ -2380,7 +2380,6 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
iblock = div_u64(folio_pos(folio), blocksize);
lblock = div_u64(limit + blocksize - 1, blocksize);
bh = head;
- nr = 0;
i = 0;
do {
@@ -2411,40 +2410,33 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
if (buffer_uptodate(bh))
continue;
}
- arr[nr++] = bh;
+
+ lock_buffer(bh);
+ if (buffer_uptodate(bh)) {
+ unlock_buffer(bh);
+ continue;
+ }
+
+ mark_buffer_async_read(bh);
+ if (prev)
+ submit_bh(REQ_OP_READ, prev);
+ prev = bh;
} while (i++, iblock++, (bh = bh->b_this_page) != head);
if (fully_mapped)
folio_set_mappedtodisk(folio);
- if (!nr) {
- /*
- * All buffers are uptodate or get_block() returned an
- * error when trying to map them - we can finish the read.
- */
- folio_end_read(folio, !page_error);
- return 0;
- }
-
- /* Stage two: lock the buffers */
- for (i = 0; i < nr; i++) {
- bh = arr[i];
- lock_buffer(bh);
- mark_buffer_async_read(bh);
- }
-
/*
- * Stage 3: start the IO. Check for uptodateness
- * inside the buffer lock in case another process reading
- * the underlying blockdev brought it uptodate (the sct fix).
+ * All buffers are uptodate or get_block() returned an error
+ * when trying to map them - we must finish the read because
+ * end_buffer_async_read() will never be called on any buffer
+ * in this folio.
*/
- for (i = 0; i < nr; i++) {
- bh = arr[i];
- if (buffer_uptodate(bh))
- end_buffer_async_read(bh, 1);
- else
- submit_bh(REQ_OP_READ, bh);
- }
+ if (prev)
+ submit_bh(REQ_OP_READ, prev);
+ else
+ folio_end_read(folio, !page_error);
+
return 0;
}
EXPORT_SYMBOL(block_read_full_folio);
next prev parent reply other threads:[~2024-12-19 3:51 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-18 2:26 [PATCH 0/5] fs/buffer: strack reduction on async read Luis Chamberlain
2024-12-18 2:26 ` [PATCH 1/5] fs/buffer: move async batch read code into a helper Luis Chamberlain
2024-12-18 2:26 ` [PATCH 2/5] fs/buffer: simplify block_read_full_folio() with bh_offset() Luis Chamberlain
2024-12-18 2:26 ` [PATCH 3/5] fs/buffer: add a for_each_bh() for block_read_full_folio() Luis Chamberlain
2024-12-18 19:20 ` Matthew Wilcox
2024-12-18 2:26 ` [PATCH 4/5] fs/buffer: add iteration support " Luis Chamberlain
2024-12-18 2:26 ` [PATCH 5/5] fs/buffer: reduce stack usage on bh_read_iter() Luis Chamberlain
2024-12-18 2:47 ` Luis Chamberlain
2024-12-18 20:05 ` [PATCH 0/5] fs/buffer: strack reduction on async read Matthew Wilcox
2024-12-19 2:27 ` Luis Chamberlain
2024-12-19 3:51 ` Matthew Wilcox [this message]
2024-12-30 17:30 ` Luis Chamberlain
2025-01-31 16:54 ` Luis Chamberlain
2025-01-31 22:01 ` Matthew Wilcox
2025-02-03 14:00 ` Luis Chamberlain
2024-12-19 6:28 ` Christoph Hellwig
2024-12-19 17:53 ` Luis Chamberlain
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z2OYRkpRcUFIOFog@casper.infradead.org \
--to=willy@infradead.org \
--cc=da.gomez@samsung.com \
--cc=dave@stgolabs.net \
--cc=david@fromorbit.com \
--cc=djwong@kernel.org \
--cc=gost.dev@samsung.com \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=john.g.garry@oracle.com \
--cc=kbusch@kernel.org \
--cc=kernel@pankajraghav.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-xfs@vger.kernel.org \
--cc=mcgrof@kernel.org \
--cc=p.raghav@samsung.com \
--cc=ritesh.list@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.