[Bug 195729] New: JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 1766784).There's a risk of filesystem corruption in case of system crash.

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [Bug 195729] New: JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 1766784).There's a risk of filesystem corruption in case of system crash.
@ 2017-05-12 13:31 bugzilla-daemon
  2017-05-12 14:53 ` [Bug 195729] " bugzilla-daemon
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: bugzilla-daemon @ 2017-05-12 13:31 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=195729

            Bug ID: 195729
           Summary: JBD2: Spotted dirty metadata buffer (dev = sda1,
                    blocknr = 1766784).There's a risk of filesystem
                    corruption in case of system crash.
           Product: File System
           Version: 2.5
    Kernel Version: 3.10.27
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: ext4
          Assignee: fs_ext4@kernel-bugs.osdl.org
          Reporter: jqiaoulk@gmail.com
        Regression: No

Hi:

I recently see the error message below when doing the following mount
option in hard drive, after a while, Kernel reports BUG_ON.

The standard kernel version is 3.10.27. This issue can been seen in
many other customer devices with this kernel version.

There are several things worthwhile to mention beforehand:

1. I don't believe its a hardware related issue since it could be
duplicated in our many devices.

2. I don't think it is a real Filesystem corruption since these box
never reboot or remount during the test.

3. Devices are under heavy memory pressure when this issue occur,

4. It is not straightforward and might involve tremendous work to
switch to the latest kernel version.

Mount Option:

/dev/sda1 on /mnt/ type ext4
(rw,nosuid,relatime,nodelalloc,journal_checksum,nobarrier,data=journal)

"

JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 1766784).
There's a risk of filesystem corruption in case of system crash.

...........................................................

kernel BUG at fs/jbd2/commit.c:1059!
[<801b0e10>] (jbd2_journal_commit_transaction+0x1304/0x177c) from
[<801b569c>] (kjournald2+0xa8/0x248)
[<801b569c>] (kjournald2+0xa8/0x248) from [<800414c8>] (kthread+0xa0/0xac)
[<800414c8>] (kthread+0xa0/0xac) from [<8000dc98>] (ret_from_fork+0x14/0x3c)

"

The code is listed below at Line 1059:

jbd2_journal_commit_transaction

{

         .....................................................

J_ASSERT_BH(bh, !buffer_dirty(bh));
/*
* The buffer on BJ_Forget list and not jbddirty means
* it has been freed by this transaction and hence it
* could not have been reallocated until this

}

So, I firstly get the call stack to understand why JBD2 reports
Spotted dirty metadata buffer. The dump stack is listed below:

4,173710,227778211328,-;[<80015cf0>] (unwind_backtrace+0x0/0x118) from
[<80011f98>] (show_stack+0x10/0x14)
4,173711,227778220057,-;[<80011f98>] (show_stack+0x10/0x14) from
[<80013b5c>] (ipi_backtrace+0x70/0xa8)
4,173712,227778228492,-;[<80013b5c>] (ipi_backtrace+0x70/0xa8) from
[<80013cf0>] (arch_trigger_all_cpu_backtrace+0x40/0xe0)
4,173713,227778238689,-;[<80013cf0>]
(arch_trigger_all_cpu_backtrace+0x40/0xe0) from [<801ac99c>]
(warn_dirty_buffer+0x2c/0x38)
4,173714,227778249227,-;[<801ac99c>] (warn_dirty_buffer+0x2c/0x38)
from [<801adcc4>] (__jbd2_journal_file_buffer+0xa8/0x1d0)
4,173715,227778259500,-;[<801adcc4>]
(__jbd2_journal_file_buffer+0xa8/0x1d0) from [<801ae8cc>]
(jbd2_journal_dirty_metadata+0x1ec/0x258)
4,173716,227778270828,-;[<801ae8cc>]
(jbd2_journal_dirty_metadata+0x1ec/0x258) from [<8018f680>]
(__ext4_handle_dirty_metadata+0xe0/0x190)
4,173717,227778282332,-;[<8018f680>]
(__ext4_handle_dirty_metadata+0xe0/0x190) from [<8016949c>]
(write_end_fn+0x50/0x7c)
4,173718,227778292340,-;[<8016949c>] (write_end_fn+0x50/0x7c) from
[<8016a860>] (ext4_walk_page_buffers+0x88/0xa0)
4,173719,227778301741,-;[<8016a860>]
(ext4_walk_page_buffers+0x88/0xa0) from [<8016ee8c>]
(ext4_journalled_write_end+0x160/0x308)
4,173720,227778312462,-;[<8016ee8c>]
(ext4_journalled_write_end+0x160/0x308) from [<800afd44>]
(generic_file_buffered_write+0x154/0x224)
4,173721,227778323775,-;[<800afd44>]
(generic_file_buffered_write+0x154/0x224) from [<800b146c>]
(__generic_file_aio_write+0x350/0x3a0)
4,173722,227778334999,-;[<800b146c>]
(__generic_file_aio_write+0x350/0x3a0) from [<800b1510>]
(generic_file_aio_write+0x54/0xb4)
4,173723,227778345615,-;[<800b1510>]
(generic_file_aio_write+0x54/0xb4) from [<80165478>]
(ext4_file_write+0x348/0x430)
4,173724,227778355453,-;[<80165478>] (ext4_file_write+0x348/0x430)
from [<800ef3ec>] (do_sync_readv_writev+0x74/0x98)
4,173725,227778365115,-;[<800ef3ec>] (do_sync_readv_writev+0x74/0x98)
from [<800f01e8>] (do_readv_writev+0xd4/0x200)
4,173726,227778374690,-;[<800f01e8>] (do_readv_writev+0xd4/0x200) from
[<800f03dc>] (vfs_writev+0x58/0x70)
4,173727,227778383399,-;[<800f03dc>] (vfs_writev+0x58/0x70) from
[<800f0494>] (SyS_writev+0x38/0x68)
4,173728,227778391586,-;[<800f0494>] (SyS_writev+0x38/0x68) from
[<8000dbc0>] (ret_fast_syscall+0x0/0x30)

Based on the comment of the JBD2 code in __jbd2_journal_file_buffer, I
quote "For metadata buffers, we track dirty bit in buffer_jbddirty
instead of buffer_dirty. We should not see a dirty bit set here
because we clear it in do_get_write_access but e.g.
tune2fs can modify the sb and set the dirty bit at any time so we try
to gracefully handle that."
it means jbd2 code shouldn't set this buffer_dirty anyway.

and Another comment in do_get_write_access() in transaction.c, it says
more, i quote " it is journaled, and we don't expect dirty buffers
in that state (the buffers should be marked JBD_Dirty instead.)  So
either the IO is being done under our own control and this is a bug,
or it's a third party IO such as dump(8) (which may leave the buffer
scheduled for read --- ie. locked but not dirty) or tune2fs (which may
actually havethe buffer dirtied, ugh.)
"

In our case, the dirty bit of buffer_head is changed between
ext4_write_begin() and ext4_journalled_write_end() as bleow:

"
ext4_write_begin() --> BH dirty bit is 0

ext4_journalled_write_end() --> BH dirty bit is changed to 1 !!! and
error is reported
"

Then I do an experiment change as below in ext4_journalled_write_end()
to see whether the original issue could be duplicated.
After almost a week test, the issue doesn't appear. Therefore, it
seems there are some race conditions that change the value
of BH dirty between ext4_write_begin() and
ext4_journalled_write_end(). (Note: do_journal_get_write_access has
already been
called at ext4_write_begin )

+               ret = ext4_walk_page_buffers(handle, page_buffers(page), from,
+                                            to,
&partial,do_journal_get_write_access);
+
                ret = ext4_walk_page_buffers(handle, page_buffers(page), from,
                                             to, &partial, write_end_fn);

After reviewing the code, it looks although
do_journal_get_write_access() imposes lock on buffer header, it
doesn't lock the whole routine between ext4_write_begin() and
ext4_journalled_write_end(). so i think the buffer head should be
locked at the time until ext4_journalled_write_end() is finished.
Initially, I imagine that I can do the following change but then I
realise a existing buffer lock has already been placed in
jbd2_journal_get_write_access, so making a big buffer lock will
quickly become a performance issue and
also involves a lot of changes.

+ lock all the buffer headers belonging to this page

ext4_write_begin()
.....................................................
 ext4_journalled_write_end()

+ unlock all the buffer headers belonging to this page

Therefore, I propose the following change for simplicity and performance:

--- /linux-3.10.27_old/fs/ext4/inode.c 2014-01-15 23:29:14.000000000 +0000
+++/linux-3.10.27_new/fs/ext4/inode.c 2017-05-11 22:47:58.986146486 +0100
@@ -1084,10 +1084,18 @@
  int ret;
  if (!buffer_mapped(bh) || buffer_freed(bh))
  return 0;
+
+ lock_buffer();
+ if (buffer_dirty(bh)) {
+     clear_buffer_dirty(bh);
+ }
+
  set_buffer_uptodate(bh);
  ret = ext4_handle_dirty_metadata(handle, NULL, bh);
  clear_buffer_meta(bh);
  clear_buffer_prio(bh);
+
+ unlock_buffer();
  return ret;
 }

Thanks
Jmqiao

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug 195729] JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 1766784).There's a risk of filesystem corruption in case of system crash.
  2017-05-12 13:31 [Bug 195729] New: JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 1766784).There's a risk of filesystem corruption in case of system crash bugzilla-daemon
@ 2017-05-12 14:53 ` bugzilla-daemon
  2017-05-12 14:53 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: bugzilla-daemon @ 2017-05-12 14:53 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=195729

--- Comment #1 from jqiaoulk@gmail.com ---
It tells us that clk_set_parent() will potentially goes to sleep under some
circumstances(in this case, to acquire the mutex lock).

Therefore,with regard to the Driver caller routine shown below, it must
incorrect runs in critical section, which possible call spin_lock() or other
related functions which disable the preemption. Since it is not in the IRQ
context, it is very likely spink lock is called in this failure case. Looking
into the Driver code, should give the clue.

[Fri May 05 05:26:09.992 2017] [40636.613352] [<7eea79ec>]
(vibe_os_clk_set_parent+0x34/0xb0 [vibe_os]) from [<7eec3cd0>]
(_ZN12CSTmClockLLA6EnableE29stm_clk_divider_output_name_t31stm_clk_divider_output_source_t31stm_clk_divider_output_divide_t+0xd4/0xf0
[stmcore_display_stiH407])
[Fri May 05 05:26:10.017 2017] [40636.635408] [<7eec3cd0>]
(_ZN12CSTmClockLLA6EnableE29stm_clk_divider_output_name_t31stm_clk_divider_output_source_t31stm_clk_divider_output_divide_t+0xd4/0xf0
[stmcore_display_stiH407]) from [<7eec8c2c>]
(_ZN16CSTmMainTVOutput15SetOutputFormatEj+0xc8/0x1b0 [stmcore_display_stiH407])
[Fri May 05 05:26:10.036 2017] [40636.660506] [<7eec8c2c>]
(_ZN16CSTmMainTVOutput15SetOutputFormatEj+0xc8/0x1b0 [stmcore_display_stiH407])
from [<7eebbd40>]
(_ZN16CSTmMasterOutput5StartEPK18stm_display_mode_s+0x1b8/0x1f8
[stmcore_display_stiH407])
[Fri May 05 05:26:10.069 2017] [40636.679522] [<7eebbd40>]
(_ZN16CSTmMasterOutput5StartEPK18stm_display_mode_s+0x1b8/0x1f8
[stmcore_display_stiH407]) from [<7eeba0ac>]
(stm_display_output_start+0x6c/0xb4 [stmcore_display_stiH407])
[Fri May 05 05:26:10.069 2017] [40636.697123] [<7eeba0ac>]
(stm_display_output_start+0x6c/0xb4 [stmcore_display_stiH407]) from
[<7f444650>] (dil_blender_output_start+0x8c/0x17c [stm_cdi_dvb])
[Fri May 05 05:26:10.079 2017] [40636.711321] [<7f444650>]
(dil_blender_output_start+0x8c/0x17c [stm_cdi_dvb]) from [<7f45717c>]
(cdi_hdmi_exit+0xb64/0xbc4 [stm_cdi_dvb])
[Fri May 05 05:26:10.092 2017] [40636.723660] [<7f45717c>]
(cdi_hdmi_exit+0xb64/0xbc4 [stm_cdi_dvb]) from [<7f4578a4>]
(dil_hdmi_set_video_code+0x4c/0x1d4 [stm_cdi_dvb])
[Fri May 05 05:26:10.104 2017] [40636.735912] [<7f4578a4>]
(dil_hdmi_set_video_code+0x4c/0x1d4 [stm_cdi_dvb]) from [<7f44ed74>]
(hdmi_release+0x316c/0x781c [stm_cdi_dvb])
[Fri May 05 05:26:10.151 2017] [40636.748251] [<7f44ed74>]
(hdmi_release+0x316c/0x781c [stm_cdi_dvb]) from [<7f4539a0>]
(hdmi_ioctl+0x57c/0x2d40 [stm_cdi_dvb])
[Fri May 05 05:26:10.152 2017] [40636.759631] [<7f4539a0>]
(hdmi_ioctl+0x57c/0x2d40 [stm_cdi_dvb]) from [<800fd4dc>] (vfs_ioctl+0x28/0x3c)
[Fri May 05 05:26:10.152 2017] [40636.769137] [<800fd4dc>]
(vfs_ioctl+0x28/0x3c) from [<800fdf48>] (do_vfs_ioctl+0x4b0/0x560)
[Fri May 05 05:26:10.152 2017] [40636.777484] [<800fdf48>]
(do_vfs_ioctl+0x4b0/0x560) from [<800fe02c>] (SyS_ioctl+0x34/0x60)
[Fri May 05 05:26:10.152 2017] [40636.785836] [<800fe02c>]
(SyS_ioctl+0x34/0x60) from [<8000dbc0>] (ret_fast_syscall+0x0/0x30)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug 195729] JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 1766784).There's a risk of filesystem corruption in case of system crash.
  2017-05-12 13:31 [Bug 195729] New: JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 1766784).There's a risk of filesystem corruption in case of system crash bugzilla-daemon
  2017-05-12 14:53 ` [Bug 195729] " bugzilla-daemon
@ 2017-05-12 14:53 ` bugzilla-daemon
  2017-05-12 14:54 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: bugzilla-daemon @ 2017-05-12 14:53 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=195729

--- Comment #2 from jqiaoulk@gmail.com ---
So briefly speaking, one following call must incorrect call spin_lock or other
functions that disable the preemption in this failure case:

For example:

Failure case:

stm_display_output_start()
spin_lock()
_ZN16CSTmMasterOutput5StartEPK18stm_display_mode_s  -------> We can't sleep
here since it is in the critical section !!!
spin_unlock()
Call Sequence:

(1) vibe_os_clk_set_parent

(2)
_ZN12CSTmClockLLA6EnableE29stm_clk_divider_output_name_t31stm_clk_divider_output_source_t31stm_clk_divider_output_divide_t

(3) _ZN16CSTmMainTVOutput15SetOutputFormatEj

(4) _ZN16CSTmMasterOutput5StartEPK18stm_display_mode_s

(5) stm_display_output_start()

(6) dil_blender_output_start

(7) cdi_hdmi_exit

(8) dil_hdmi_set_video_code

(9) hdmi_release

(10) hdmi_ioctl()

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug 195729] JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 1766784).There's a risk of filesystem corruption in case of system crash.
  2017-05-12 13:31 [Bug 195729] New: JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 1766784).There's a risk of filesystem corruption in case of system crash bugzilla-daemon
  2017-05-12 14:53 ` [Bug 195729] " bugzilla-daemon
  2017-05-12 14:53 ` bugzilla-daemon
@ 2017-05-12 14:54 ` bugzilla-daemon
  2017-05-12 14:54 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: bugzilla-daemon @ 2017-05-12 14:54 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=195729

--- Comment #3 from jqiaoulk@gmail.com ---
This is the root cause, which is pretty straightforward:

The spin lock is firstly acquired at the following function:

OutputResults CSTmMasterOutput::Start(const stm_display_mode_t* pModeLine) {
vibe_os_lock_resource(m_lock);

if(!this->SetOutputFormat(m_ulOutputFormat)) {
TRC;
goto stop_and_exit;
}

vibe_os_unlock_resource(m_lock);
}

and this->SetOutputFormat will eventually call vibe_os_clk_set_parent() as
below:

bool CSTmClockLLA::Enable {
vibe_os_clk_set_parent(output, source);
}

Based on the callstack, we know vibe_os_clk_set_parent will call
clk_set_parent().
clk_set_parent() is a basic kernel API that would potentially go to sleep,
based on the call stack in the log.

Therefore, CSTmMasterOutput::Start would potentially go to sleeps while holding
spin_Lock().

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug 195729] JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 1766784).There's a risk of filesystem corruption in case of system crash.
  2017-05-12 13:31 [Bug 195729] New: JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 1766784).There's a risk of filesystem corruption in case of system crash bugzilla-daemon
                   ` (2 preceding siblings ...)
  2017-05-12 14:54 ` bugzilla-daemon
@ 2017-05-12 14:54 ` bugzilla-daemon
  2017-05-12 14:58 ` bugzilla-daemon
  2017-05-12 14:59 ` bugzilla-daemon
  5 siblings, 0 replies; 7+ messages in thread
From: bugzilla-daemon @ 2017-05-12 14:54 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=195729

--- Comment #4 from jqiaoulk@gmail.com ---
ST needs to review the code and decide which is the best place to place
spin_lock or change to mutex.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug 195729] JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 1766784).There's a risk of filesystem corruption in case of system crash.
  2017-05-12 13:31 [Bug 195729] New: JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 1766784).There's a risk of filesystem corruption in case of system crash bugzilla-daemon
                   ` (3 preceding siblings ...)
  2017-05-12 14:54 ` bugzilla-daemon
@ 2017-05-12 14:58 ` bugzilla-daemon
  2017-05-12 14:59 ` bugzilla-daemon
  5 siblings, 0 replies; 7+ messages in thread
From: bugzilla-daemon @ 2017-05-12 14:58 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=195729

--- Comment #5 from jqiaoulk@gmail.com ---
Oh...Sorry, Comment post to wrong place, I will close this ticket and create a
new one...

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug 195729] JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 1766784).There's a risk of filesystem corruption in case of system crash.
  2017-05-12 13:31 [Bug 195729] New: JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 1766784).There's a risk of filesystem corruption in case of system crash bugzilla-daemon
                   ` (4 preceding siblings ...)
  2017-05-12 14:58 ` bugzilla-daemon
@ 2017-05-12 14:59 ` bugzilla-daemon
  5 siblings, 0 replies; 7+ messages in thread
From: bugzilla-daemon @ 2017-05-12 14:59 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=195729

jqiaoulk@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-05-12 14:59 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-05-12 13:31 [Bug 195729] New: JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 1766784).There's a risk of filesystem corruption in case of system crash bugzilla-daemon
2017-05-12 14:53 ` [Bug 195729] " bugzilla-daemon
2017-05-12 14:53 ` bugzilla-daemon
2017-05-12 14:54 ` bugzilla-daemon
2017-05-12 14:54 ` bugzilla-daemon
2017-05-12 14:58 ` bugzilla-daemon
2017-05-12 14:59 ` bugzilla-daemon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).