* kernel BUG at fs/ext4/inode.c:1914 - page_buffers()
@ 2022-11-23 14:48 Ivan Zahariev
  2022-12-05 17:27 ` Ivan Zahariev
  2022-12-05 21:10 ` Theodore Ts'o
  0 siblings, 2 replies; 20+ messages in thread
From: Ivan Zahariev @ 2022-11-23 14:48 UTC
  To: linux-ext4; +Cc: Theodore Ts'o, Greg Kroah-Hartman

Hello,

Since we moved to kernel 5.15 eight months ago, we have had a total of 
12 kernel panics across a fleet of 1000 KVM/QEMU machines. They all look 
like this:

     kernel BUG at fs/ext4/inode.c:1914

Switching from kernel 4.14 to 5.15 almost immediately triggered the 
problem. Therefore we are very confident that userland activity is more 
or less the same and is not the root cause. The kernel function which 
triggers the BUG is __ext4_journalled_writepage(). In 5.15 the code for 
__ext4_journalled_writepage() in "fs/ext4/inode.c" is the same as the 
current kernel "master". The line where the BUG is triggered is:

     struct buffer_head *page_bufs = page_buffers(page)

The definition of "page_buffers(page)" in "include/linux/buffer_head.h" 
hasn't changed since 4.14, so no difference here. This is where the 
actual "kernel BUG" event is triggered:

     /* If we *know* page->private refers to buffer_heads */
     #define page_buffers(page) \
         ({ \
             BUG_ON(!PagePrivate(page)); \
             ((struct buffer_head *)page_private(page)); \
         })
     #define page_has_buffers(page) PagePrivate(page)
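
To make the failure mode concrete, here is a minimal userspace sketch of
what the macro checks. It is an illustration only: "struct model_page",
"model_page_has_buffers()" and "model_page_buffers()" are hypothetical
stand-ins, not kernel definitions, and the sketch returns NULL at the
point where the kernel macro would hit BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the kernel structures. */
struct buffer_head {
    int placeholder;
};

struct model_page {
    bool has_private;            /* models the PG_private page flag */
    struct buffer_head *private; /* models page->private */
};

/* Models page_has_buffers(): only the flag is consulted. */
static bool model_page_has_buffers(const struct model_page *page)
{
    assert(page != NULL);
    return page->has_private;
}

/*
 * Models page_buffers(). The kernel macro calls BUG_ON() when the flag
 * is clear; this sketch returns NULL instead so the condition can be
 * observed without aborting.
 */
static struct buffer_head *model_page_buffers(const struct model_page *page)
{
    if (!model_page_has_buffers(page))
        return NULL; /* kernel: BUG_ON(!PagePrivate(page)) fires here */
    return page->private;
}
```

In this model, a page whose buffers were detached while it was unlocked
takes the NULL path, which corresponds to the BUG reported above.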

Initially I thought that the issue had already been discussed here: 
https://lore.kernel.org/all/Yg0m6IjcNmfaSokM@google.com/
But that seems to be a different (already solved) problem, for which 
Theodore Ts'o made a quick fix that simply reports the rare occurrence 
and continues. That commit is in 5.15 (and in the latest kernel), so it 
does not help in our case: 
https://github.com/torvalds/linux/commit/cc5095747edfb054ca2068d01af20be3fcc3634f

Back to the problem! 99% of the difference between 4.14 and the latest 
kernel for __ext4_journalled_writepage() in "fs/ext4/inode.c" comes from 
the following commit: 
https://github.com/torvalds/linux/commit/5c48a7df91499e371ef725895b2e2d21a126e227

Is it safe for us to revert this patch on the latest 5.15 kernel, so 
that we can confirm whether this resolves the issue for us?

Best regards.
--Ivan



* kernel BUG at fs/ext4/inode.c:1914 - page_buffers()
@ 2023-10-04  9:37 Mathieu Othacehe
  2023-10-04 10:10 ` Sasha Levin
  0 siblings, 1 reply; 20+ messages in thread
From: Mathieu Othacehe @ 2023-10-04  9:37 UTC
  To: stable; +Cc: jack, Marcus Hoffmann, tytso, famzah, gregkh, anton.reding

[-- Attachment #1: Type: text/plain, Size: 1786 bytes --]


Hello,

I have been experiencing this issue:
https://www.spinics.net/lists/linux-ext4/msg86259.html, on a 5.15
kernel.

This issue, caused by 5c48a7df9149 ("ext4: fix an use-after-free issue
about data=journal writeback mode"), affects ext4 users with
data=journal on all stable kernels.

Jan proposed a fix here
https://www.spinics.net/lists/linux-ext4/msg87054.html which solves the
situation for me.

This fix is not upstream because the data journaling support has since
been rewritten. As suggested by Jan, that means we could either
backport the following patches from upstream:

bd159398a2d2 ("jdb2: Don't refuse invalidation of already invalidated buffers")
d84c9ebdac1e ("ext4: Mark pages with journalled data dirty")
265e72efa99f ("ext4: Keep pages with journalled data dirty")
5e1bdea6391d ("ext4: Clear dirty bit from pages without data to write")
1f1a55f0bf06 ("ext4: Commit transaction before writing back pages in data=journal mode")
e360c6ed7274 ("ext4: Drop special handling of journalled data from ext4_sync_file()")
c000dfec7e88 ("ext4: Drop special handling of journalled data from extent shifting operations")
783ae448b7a2 ("ext4: Fix special handling of journalled data from extent zeroing")
56c2a0e3d90d ("ext4: Drop special handling of journalled data from ext4_evict_inode()")
7c375870fdc5 ("ext4: Drop special handling of journalled data from ext4_quota_on()")
951cafa6b80e ("ext4: Simplify handling of journalled data in ext4_bmap()")
ab382539adcb ("ext4: Update comment in mpage_prepare_extent_to_map()")
d0ab8368c175 ("Revert "ext4: Fix warnings when freezing filesystem with journaled data"")
1077b2d53ef5 ("ext4: fix fsync for non-directories")

Or apply the attached patch that Jan proposed. Do you think that would
be an option?

Thanks,

Mathieu


[-- Attachment #2: 0001-fix-ext4-journalled-crash.patch --]
[-- Type: text/x-patch, Size: 3228 bytes --]

From 17ec3d08a7878625c08ab37c45a8dc3c619db7fb Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@xxxxxxx>
Date: Thu, 12 Jan 2023 14:46:12 +0100
Subject: [PATCH] ext4: Fix crash in __ext4_journalled_writepage()

When __ext4_journalled_writepage() unlocks the page, there's nothing
that prevents another process from finding the page and reclaiming
buffers from it (because we have cleaned the page dirty bit and buffers
needn't have the dirty bit set). When that happens we crash in
__ext4_journalled_writepage() when trying to get the page buffers. Fix
the problem by redirtying the page before unlocking it (so that reclaim
and other users know the page isn't written yet) and by checking the
page is still dirty after reacquiring the page lock. This should also
make sure the page still has buffers.

Fixes: 5c48a7df9149 ("ext4: fix an use-after-free issue about data=journal writeback mode")
CC: stable@xxxxxxxxxxxxxxx
Signed-off-by: Jan Kara <jack@xxxxxxx>
---
 fs/ext4/inode.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 64a783f22105..b9f1fd05cec6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -138,7 +138,6 @@ static inline int ext4_begin_ordered_truncate(struct inode *inode,
 
 static void ext4_invalidatepage(struct page *page, unsigned int offset,
 				unsigned int length);
-static int __ext4_journalled_writepage(struct page *page, unsigned int len);
 static int ext4_meta_trans_blocks(struct inode *inode, int lblocks,
 				  int pextents);
 
@@ -1858,7 +1857,8 @@ int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
 	return 0;
 }
 
-static int __ext4_journalled_writepage(struct page *page,
+static int __ext4_journalled_writepage(struct writeback_control *wbc,
+				       struct page *page,
 				       unsigned int len)
 {
 	struct address_space *mapping = page->mapping;
@@ -1869,8 +1869,6 @@ static int __ext4_journalled_writepage(struct page *page,
 	struct buffer_head *inode_bh = NULL;
 	loff_t size;
 
-	ClearPageChecked(page);
-
 	if (inline_data) {
 		BUG_ON(page->index != 0);
 		BUG_ON(len > ext4_get_max_inline_size(inode));
@@ -1884,6 +1882,7 @@ static int __ext4_journalled_writepage(struct page *page,
 	 * out from under us.
 	 */
 	get_page(page);
+	redirty_page_for_writepage(wbc, page);
 	unlock_page(page);
 
 	handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE,
@@ -1897,8 +1896,10 @@ static int __ext4_journalled_writepage(struct page *page,
 
 	lock_page(page);
 	put_page(page);
+	ClearPageChecked(page);
 	size = i_size_read(inode);
-	if (page->mapping != mapping || page_offset(page) > size) {
+	if (page->mapping != mapping || page_offset(page) >= size ||
+	    !clear_page_dirty_for_io(page)) {
 		/* The page got truncated from under us */
 		ext4_journal_stop(handle);
 		ret = 0;
@@ -2055,7 +2056,7 @@ static int ext4_writepage(struct page *page,
 		 * It's mmapped pagecache.  Add buffers and journal it.  There
 		 * doesn't seem much point in redirtying the page here.
 		 */
-		return __ext4_journalled_writepage(page, len);
+		return __ext4_journalled_writepage(wbc, page, len);
 
 	ext4_io_submit_init(&io_submit, wbc);
 	io_submit.io_end = ext4_init_io_end(inode, GFP_NOFS);
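
The race described in the commit message, and the way the patch closes
it, can be modelled in plain userspace C. This is a simplified sketch,
not kernel code: "struct model_page", "model_reclaim()" and the two
"model_writepage_*()" functions are hypothetical names, locking is
reduced to comments, and reclaim is assumed to strip buffers only from
a page whose dirty bit is clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the page state; NOT a kernel structure. */
struct model_page {
    bool dirty;       /* models the page dirty bit */
    bool has_buffers; /* models attached buffer_heads (PagePrivate) */
};

enum model_result { MODEL_WRITTEN, MODEL_SKIPPED, MODEL_BUG };

/* Simplifying assumption: reclaim drops buffers only from clean pages. */
static void model_reclaim(struct model_page *p)
{
    assert(p != NULL);
    if (!p->dirty)
        p->has_buffers = false;
}

/* Pre-patch ordering: the page is unlocked while its dirty bit is clear. */
static enum model_result model_writepage_old(struct model_page *p,
                                             bool reclaim_races)
{
    p->dirty = false;         /* writeback already cleared the dirty bit */
    /* unlock_page(): the race window opens */
    if (reclaim_races)
        model_reclaim(p);
    /* lock_page(): the race window closes */
    if (!p->has_buffers)
        return MODEL_BUG;     /* page_buffers() would hit BUG_ON() */
    return MODEL_WRITTEN;
}

/* Patched ordering: redirty before unlock, recheck dirty after relock. */
static enum model_result model_writepage_fixed(struct model_page *p,
                                               bool reclaim_races)
{
    bool was_dirty;

    p->dirty = false;         /* writeback already cleared the dirty bit */
    p->dirty = true;          /* models redirty_page_for_writepage() */
    /* unlock_page(): the race window opens */
    if (reclaim_races)
        model_reclaim(p);
    /* lock_page(): the race window closes */
    was_dirty = p->dirty;     /* models clear_page_dirty_for_io() */
    p->dirty = false;
    if (!was_dirty)
        return MODEL_SKIPPED; /* someone else handled the page */
    if (!p->has_buffers)
        return MODEL_BUG;
    return MODEL_WRITTEN;
}
```

Under these assumptions, the old ordering reaches the BUG path whenever
reclaim runs inside the window, while the patched ordering keeps the
buffers attached because the page stays dirty until it is locked again.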

Thread overview: 20+ messages
2022-11-23 14:48 kernel BUG at fs/ext4/inode.c:1914 - page_buffers() Ivan Zahariev
2022-12-05 17:27 ` Ivan Zahariev
2022-12-05 21:10 ` Theodore Ts'o
2022-12-05 21:50   ` Ivan Zahariev
2023-01-12 15:07   ` Jan Kara
2023-03-15 11:27     ` Ivan Zahariev
2023-03-15 17:32       ` Jan Kara
2023-03-15 18:57         ` Theodore Ts'o
2023-05-11  9:21           ` Marcus Hoffmann
2023-05-12 12:19             ` Greg KH
2023-05-12 14:24               ` Marcus Hoffmann
2023-05-12 22:50                 ` Greg KH
2023-05-15  9:10                 ` Jan Kara
2023-09-20  9:40             ` Mathieu Othacehe
2023-09-22  9:54               ` Jan Kara
  -- strict thread matches above, loose matches on Subject: below --
2023-10-04  9:37 Mathieu Othacehe
2023-10-04 10:10 ` Sasha Levin
2023-10-05  3:54   ` Theodore Ts'o
2023-10-05  7:08   ` Mathieu Othacehe
2023-10-06  9:15     ` Jan Kara
