[Regression] 3.15 mmc related ext4 corruption with qemu-system-arm

linux-kernel.vger.kernel.org archive mirror

* [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm
@ 2014-06-12  5:35 John Stultz
  2014-06-12 12:09 ` Ulf Hansson
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: John Stultz @ 2014-06-12  5:35 UTC (permalink / raw)
  To: Ulf Hansson, Chris Ball, Peter Maydell
  Cc: Johan Rudholm, Russell King - ARM Linux, Theodore Ts'o, lkml

I've been seeing some ext4 corruption with recent kernels under qemu-system-arm.

After an unclean shutdown (terminating qemu), the issue seems to crop up
shortly after booting, about 50% of the time.

ext4/mmc related dmesg details are:
[    3.206809] mmci-pl18x mb:mmci: mmc0: PL181 manf 41 rev0 at
0x10005000 irq 41,42 (pio)
[    3.268316] mmc0: new SDHC card at address 4567
[    3.281963] mmcblk0: mmc0:4567 QEMU! 2.00 GiB
[    3.315699]  mmcblk0: p1 p2 p3 p4 < p5 p6 >
...
[   11.806169] EXT4-fs (mmcblk0p5): Ignoring removed nomblk_io_submit option
[   11.904714] EXT4-fs (mmcblk0p5): recovery complete
[   11.905854] EXT4-fs (mmcblk0p5): mounted filesystem with ordered
data mode. Opts: nomblk_io_submit,errors=panic
...
[   91.558824] EXT4-fs error (device mmcblk0p5):
ext4_mb_generate_buddy:756: group 1, 2252 clusters in bitmap, 2284 in
gd; block bitmap corrupt.
[   91.560641] Aborting journal on device mmcblk0p5-8.
[   91.562589] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5):
panic forced after error
[   91.562589]
[   91.563486] CPU: 0 PID: 1 Comm: init Not tainted 3.15.0-rc1 #560
[   91.564616] [<c00116e5>] (unwind_backtrace) from [<c000f3b1>]
(show_stack+0x11/0x14)
[   91.565154] [<c000f3b1>] (show_stack) from [<c04262b1>]
(dump_stack+0x59/0x7c)
[   91.565666] [<c04262b1>] (dump_stack) from [<c0423297>] (panic+0x67/0x178)
[   91.566147] [<c0423297>] (panic) from [<c0134bb1>]
(ext4_handle_error+0x69/0x74)
[   91.566659] [<c0134bb1>] (ext4_handle_error) from [<c0135437>]
(__ext4_grp_locked_error+0x6b/0x160)
[   91.567223] [<c0135437>] (__ext4_grp_locked_error) from
[<c0143079>] (ext4_mb_generate_buddy+0x1b1/0x29c)
[   91.567860] [<c0143079>] (ext4_mb_generate_buddy) from [<c01447e5>]
(ext4_mb_init_cache+0x219/0x4e0)
[   91.568473] [<c01447e5>] (ext4_mb_init_cache) from [<c0144b67>]
(ext4_mb_init_group+0xbb/0x138)
[   91.569021] [<c0144b67>] (ext4_mb_init_group) from [<c0144cd7>]
(ext4_mb_good_group+0xf3/0xfc)
[   91.569659] [<c0144cd7>] (ext4_mb_good_group) from [<c0145c8f>]
(ext4_mb_regular_allocator+0x153/0x2c4)
[   91.570250] [<c0145c8f>] (ext4_mb_regular_allocator) from
[<c0148095>] (ext4_mb_new_blocks+0x2fd/0x4e4)
[   91.570868] [<c0148095>] (ext4_mb_new_blocks) from [<c013f931>]
(ext4_ext_map_blocks+0x965/0x10bc)
[   91.571444] [<c013f931>] (ext4_ext_map_blocks) from [<c0122c8b>]
(ext4_map_blocks+0xfb/0x36c)
[   91.571992] [<c0122c8b>] (ext4_map_blocks) from [<c01263b1>]
(mpage_map_and_submit_extent+0x99/0x5f0)
[   91.572614] [<c01263b1>] (mpage_map_and_submit_extent) from
[<c0126bc1>] (ext4_writepages+0x2b9/0x4e8)
[   91.573201] [<c0126bc1>] (ext4_writepages) from [<c0094ae9>]
(do_writepages+0x19/0x28)
[   91.573709] [<c0094ae9>] (do_writepages) from [<c008c811>]
(__filemap_fdatawrite_range+0x3d/0x44)
[   91.574265] [<c008c811>] (__filemap_fdatawrite_range) from
[<c008c883>] (filemap_flush+0x23/0x28)
[   91.574854] [<c008c883>] (filemap_flush) from [<c012bf75>]
(ext4_rename+0x2f9/0x3e4)
[   91.575360] [<c012bf75>] (ext4_rename) from [<c00c3363>]
(vfs_rename+0x183/0x45c)
[   91.575911] [<c00c3363>] (vfs_rename) from [<c00c3867>]
(SyS_renameat2+0x22b/0x26c)
[   91.576460] [<c00c3867>] (SyS_renameat2) from [<c00c38df>]
(SyS_rename+0x1f/0x24)
[   91.576961] [<c00c38df>] (SyS_rename) from [<c000cd01>]
(ret_fast_syscall+0x1/0x5c)
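
For context on the "clusters in bitmap ... in gd" message above: it
reports that the number of free clusters counted from the group's block
bitmap disagrees with the free count stored in the group descriptor
("gd"). A minimal Python sketch of that kind of check follows. It is not
the kernel's code; the function names are hypothetical, and the bit order
(bit 0 of byte 0 is cluster 0, a set bit means "in use") is an assumption
made for illustration.

```python
def count_free_clusters(bitmap, clusters_in_group):
    """Count clear bits (free clusters) in a block bitmap.

    Assumes bit 0 of byte 0 describes cluster 0 and that a set bit
    means the cluster is allocated.
    """
    free = 0
    for i in range(clusters_in_group):
        if not (bitmap[i // 8] >> (i % 8)) & 1:
            free += 1
    return free


def check_group(bitmap, clusters_in_group, gd_free):
    """Return an error string if bitmap and descriptor disagree, else None."""
    free = count_free_clusters(bitmap, clusters_in_group)
    if free != gd_free:
        return "%d clusters in bitmap, %d in gd; block bitmap corrupt." % (
            free, gd_free)
    return None


# Tiny 16-cluster group: clusters 0-11 in use, clusters 12-15 free.
toy_bitmap = bytes([0xFF, 0x0F])
print(check_group(toy_bitmap, 16, 4))  # counts agree
print(check_group(toy_bitmap, 16, 8))  # descriptor claims more free clusters
```

In the log above the two counts differ by 32 clusters (2284 - 2252).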


Bisecting this points to: e7f3d22289e4307b3071cc18b1d8ecc6598c0be4
(mmc: mmci: Handle CMD irq before DATA irq), which I guess shouldn't
be surprising, as I saw problems with that patch earlier in the
3.15-rc cycle:
    https://lkml.org/lkml/2014/4/14/824

However, that discussion petered out (possibly my fault for not
following up) without settling whether it was an issue with the patch
or an issue with qemu.  Then the original issue disappeared for me, which
I figured was due to a fix upstream, but which I now guess coincided with
me updating my system and getting qemu v2.0 (whereas previously I was on
1.5).

$ qemu-system-arm -version
QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright
(c) 2003-2008 Fabrice Bellard

While the previous behavior was annoying and kept my emulated
environments from booting, this one, while a bit rarer and more subtle,
eats the disks, which is much more painful for my testing.

Unfortunately reverting the change (manually, as it doesn't revert
cleanly anymore) doesn't seem to completely avoid the issue, so the
bisection may have gone slightly astray (though it is interesting it
landed on the same commit I earlier had trouble with). So I'll
back-track and double check some of the last few "good" results to
validate I didn't just luck into 3 good boots accidentally. I'll also
review my revert in case I missed something subtle in doing it
manually.
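
To put a rough number on the "3 good boots" worry: a small Python sketch,
assuming each boot of a bad kernel shows the corruption independently
with a fixed probability (the "about 50%" figure above). Both the
independence assumption and the function names are illustrative only.

```python
import math


def chance_of_false_good(p_bad_boot, boots):
    """Probability that a bad kernel gets through `boots` boots cleanly."""
    return (1.0 - p_bad_boot) ** boots


def boots_needed(p_bad_boot, max_false_good):
    """Smallest number of clean boots that brings the chance of wrongly
    marking a bad kernel as good down to max_false_good or below."""
    return math.ceil(math.log(max_false_good) / math.log(1.0 - p_bad_boot))


print(chance_of_false_good(0.5, 3))  # 0.125
print(boots_needed(0.5, 0.01))       # 7
```

Under those assumptions, three clean boots still leave a 1-in-8 chance of
mis-marking a bad revision as good, while seven clean boots bring it
under 1%.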

Anyway, if there are any thoughts on how to better chase this down and
debug it, I'd appreciate them! I can also provide reproduction
instructions with a pre-built Linaro Android disk image and hand-built
kernel if anyone wants to debug this themselves.

thanks
-john


* Re: [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm
  2014-06-12  5:35 [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm John Stultz
@ 2014-06-12 12:09 ` Ulf Hansson
  2014-06-12 12:15   ` Peter Maydell
  2014-06-12 23:51 ` John Stultz
  2014-08-08 21:14 ` John Stultz
  2 siblings, 1 reply; 14+ messages in thread
From: Ulf Hansson @ 2014-06-12 12:09 UTC (permalink / raw)
  To: John Stultz
  Cc: Chris Ball, Peter Maydell, Johan Rudholm,
	Russell King - ARM Linux, Theodore Ts'o, lkml

On 12 June 2014 07:35, John Stultz <john.stultz@linaro.org> wrote:
> I've been seeing some ext4 corruption with recent kernels under qemu-system-arm.
>
> This issue seems to crop up after shutting down uncleanly (terminating
> qemu), shortly after booting about 50% of the time.
>
> ext4/mmc related dmesg details are:
> [    3.206809] mmci-pl18x mb:mmci: mmc0: PL181 manf 41 rev0 at
> 0x10005000 irq 41,42 (pio)
> [    3.268316] mmc0: new SDHC card at address 4567
> [    3.281963] mmcblk0: mmc0:4567 QEMU! 2.00 GiB
> [    3.315699]  mmcblk0: p1 p2 p3 p4 < p5 p6 >
> ...
> [   11.806169] EXT4-fs (mmcblk0p5): Ignoring removed nomblk_io_submit option
> [   11.904714] EXT4-fs (mmcblk0p5): recovery complete
> [   11.905854] EXT4-fs (mmcblk0p5): mounted filesystem with ordered
> data mode. Opts: nomblk_io_submit,errors=panic
> ...
> [   91.558824] EXT4-fs error (device mmcblk0p5):
> ext4_mb_generate_buddy:756: group 1, 2252 clusters in bitmap, 2284 in
> gd; block bitmap corrupt.
> [   91.560641] Aborting journal on device mmcblk0p5-8.
> [   91.562589] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5):
> panic forced after error
> [   91.562589]
> [   91.563486] CPU: 0 PID: 1 Comm: init Not tainted 3.15.0-rc1 #560
> [   91.564616] [<c00116e5>] (unwind_backtrace) from [<c000f3b1>]
> (show_stack+0x11/0x14)
> [   91.565154] [<c000f3b1>] (show_stack) from [<c04262b1>]
> (dump_stack+0x59/0x7c)
> [   91.565666] [<c04262b1>] (dump_stack) from [<c0423297>] (panic+0x67/0x178)
> [   91.566147] [<c0423297>] (panic) from [<c0134bb1>]
> (ext4_handle_error+0x69/0x74)
> [   91.566659] [<c0134bb1>] (ext4_handle_error) from [<c0135437>]
> (__ext4_grp_locked_error+0x6b/0x160)
> [   91.567223] [<c0135437>] (__ext4_grp_locked_error) from
> [<c0143079>] (ext4_mb_generate_buddy+0x1b1/0x29c)
> [   91.567860] [<c0143079>] (ext4_mb_generate_buddy) from [<c01447e5>]
> (ext4_mb_init_cache+0x219/0x4e0)
> [   91.568473] [<c01447e5>] (ext4_mb_init_cache) from [<c0144b67>]
> (ext4_mb_init_group+0xbb/0x138)
> [   91.569021] [<c0144b67>] (ext4_mb_init_group) from [<c0144cd7>]
> (ext4_mb_good_group+0xf3/0xfc)
> [   91.569659] [<c0144cd7>] (ext4_mb_good_group) from [<c0145c8f>]
> (ext4_mb_regular_allocator+0x153/0x2c4)
> [   91.570250] [<c0145c8f>] (ext4_mb_regular_allocator) from
> [<c0148095>] (ext4_mb_new_blocks+0x2fd/0x4e4)
> [   91.570868] [<c0148095>] (ext4_mb_new_blocks) from [<c013f931>]
> (ext4_ext_map_blocks+0x965/0x10bc)
> [   91.571444] [<c013f931>] (ext4_ext_map_blocks) from [<c0122c8b>]
> (ext4_map_blocks+0xfb/0x36c)
> [   91.571992] [<c0122c8b>] (ext4_map_blocks) from [<c01263b1>]
> (mpage_map_and_submit_extent+0x99/0x5f0)
> [   91.572614] [<c01263b1>] (mpage_map_and_submit_extent) from
> [<c0126bc1>] (ext4_writepages+0x2b9/0x4e8)
> [   91.573201] [<c0126bc1>] (ext4_writepages) from [<c0094ae9>]
> (do_writepages+0x19/0x28)
> [   91.573709] [<c0094ae9>] (do_writepages) from [<c008c811>]
> (__filemap_fdatawrite_range+0x3d/0x44)
> [   91.574265] [<c008c811>] (__filemap_fdatawrite_range) from
> [<c008c883>] (filemap_flush+0x23/0x28)
> [   91.574854] [<c008c883>] (filemap_flush) from [<c012bf75>]
> (ext4_rename+0x2f9/0x3e4)
> [   91.575360] [<c012bf75>] (ext4_rename) from [<c00c3363>]
> (vfs_rename+0x183/0x45c)
> [   91.575911] [<c00c3363>] (vfs_rename) from [<c00c3867>]
> (SyS_renameat2+0x22b/0x26c)
> [   91.576460] [<c00c3867>] (SyS_renameat2) from [<c00c38df>]
> (SyS_rename+0x1f/0x24)
> [   91.576961] [<c00c38df>] (SyS_rename) from [<c000cd01>]
> (ret_fast_syscall+0x1/0x5c)
>
>
> Bisecting this points to: e7f3d22289e4307b3071cc18b1d8ecc6598c0be4
> (mmc: mmci: Handle CMD irq before DATA irq). Which I guess shouldn't
> be surprising, as I saw problems with that patch earlier in the
> 3.15-rc cycle:
>     https://lkml.org/lkml/2014/4/14/824
>
> However that discussion petered out (possibly my fault for not
> following up) as to if it was an issue with the patch or a issue with
> qemu.  Then the original issue disappeared for me, which I figured was
> due to a fix upstream, but now I'm guessing coincided with me updating
> my system and getting qemu v2.0 (where as previously I was on 1.5).
>
> $ qemu-system-arm -version
> QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright
> (c) 2003-2008 Fabrice Bellard
>
> While the previous behavior was annoying and kept my emulated
> environments from booting, this while a bit more rare and subtle eats
> the disks, which is much more painful for my testing.
>
> Unfortunately reverting the change (manually, as it doesn't revert
> cleanly anymore) doesn't seem to completely avoid the issue, so the
> bisection may have gone slightly astray (though it is interesting it
> landed on the same commit I earlier had trouble with). So I'll
> back-track and double check some of the last few "good" results to
> validate I didn't just luck into 3 good boots accidentally. I'll also
> review my revert in case I missed something subtle in doing it
> manually.
>
> Anyway, if there is any thoughts on how to better chase this down and
> debug it, I'd appreciate it! I can also provide reproduction
> instructions with a pre-built Linaro android disk image and hand built
> kernel if anyone wants to debug this themselves.

According to your log, the primecell-periphid is 0x00041181, which
means mmci will be using the arm_variant.
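
As a sketch of how that ID relates to the "PL181 manf 41 rev0" line in
the log: assuming the usual AMBA PrimeCell layout (part number in bits
0-11, manufacturer in bits 12-19, revision in bits 20-23, configuration
in bits 24-31), the value decodes as below. The helper names are
hypothetical, and the mask used in the match example is only an
illustration of masked ID matching, not a copy of the driver's table.

```python
def decode_periphid(periphid):
    """Split a 32-bit PrimeCell peripheral ID into its fields."""
    return {
        "part": periphid & 0xFFF,
        "manf": (periphid >> 12) & 0xFF,
        "rev": (periphid >> 20) & 0xF,
        "config": (periphid >> 24) & 0xFF,
    }


def id_matches(periphid, entry_id, entry_mask):
    """Masked comparison: only the bits selected by the mask must agree."""
    return (periphid & entry_mask) == entry_id


fields = decode_periphid(0x00041181)
print("PL%03x manf %x rev%d" % (fields["part"], fields["manf"], fields["rev"]))
# prints: PL181 manf 41 rev0

# Illustrative mask that ignores the revision and configuration fields.
print(id_matches(0x00041181, 0x00041181, 0x000FFFFF))  # True
```

A variant chosen by such a masked match applies to every device whose
selected bits agree, which is why a distinct ID would be needed to tell
two otherwise identical models apart.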

1) A simple fix: for the arm_variant, go back to using the old behaviour.

2) A quite simple fix: invent a new primecell-periphid and a new
corresponding variant, and use the old behaviour for this variant. The
new primecell-periphid then needs to be provided through DT for the
QEMU dtb.

Is there one of the above solutions that you see as the preferred one?

Kind regards
Uffe


* Re: [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm
  2014-06-12 12:09 ` Ulf Hansson
@ 2014-06-12 12:15   ` Peter Maydell
  2014-06-13 11:35     ` Ulf Hansson
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Maydell @ 2014-06-12 12:15 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: John Stultz, Chris Ball, Johan Rudholm, Russell King - ARM Linux,
	Theodore Ts'o, lkml

On 12 June 2014 13:09, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> A simple fix; for the arm_variant, go back to use the old behaviour.
>
> A quite simple fix; Invent a new primecell-periphid and a new
> corresponding variant and use the old behaviour for this variant. The
> new primecell-periphid then needs to be provided through DT for the
> QEMU dtb.
>
> Is there any of the above solution you see as the preferred one?

Those both sound like workarounds, not fixes, to me. Somebody
needs to identify whether the bug here is in:
 * the kernel (unlikely, but possibly the kernel has a race
   condition that only gets triggered by QEMU's "operations
   that take time in h/w happen instantaneously in emulation"
   behaviour)
 * the QEMU model of the PL181
 * the QEMU model of the SD card
and then fix whichever of these is not conforming to the
specs/docs/etc.

Also, there's no such thing as a "QEMU dtb", at least for
most of our board models. QEMU models the actual hardware
(sometimes buggily or incompletely) and so should use the
exact same dtb you would use with the hardware.

thanks
-- PMM


* Re: [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm
  2014-06-12  5:35 [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm John Stultz
  2014-06-12 12:09 ` Ulf Hansson
@ 2014-06-12 23:51 ` John Stultz
  2014-06-13 12:28   ` Ulf Hansson
  2014-08-08 21:14 ` John Stultz
  2 siblings, 1 reply; 14+ messages in thread
From: John Stultz @ 2014-06-12 23:51 UTC (permalink / raw)
  To: Ulf Hansson, Chris Ball, Peter Maydell
  Cc: Johan Rudholm, Russell King - ARM Linux, Theodore Ts'o, lkml

On Wed, Jun 11, 2014 at 10:35 PM, John Stultz <john.stultz@linaro.org> wrote:
> Bisecting this points to: e7f3d22289e4307b3071cc18b1d8ecc6598c0be4
> (mmc: mmci: Handle CMD irq before DATA irq). Which I guess shouldn't
> be surprising, as I saw problems with that patch earlier in the
> 3.15-rc cycle:
>     https://lkml.org/lkml/2014/4/14/824
>
[...]
>
> Unfortunately reverting the change (manually, as it doesn't revert
> cleanly anymore) doesn't seem to completely avoid the issue, so the
> bisection may have gone slightly astray (though it is interesting it
> landed on the same commit I earlier had trouble with). So I'll
> back-track and double check some of the last few "good" results to
> validate I didn't just luck into 3 good boots accidentally. I'll also
> review my revert in case I missed something subtle in doing it
> manually.

So I'm getting some baffling results. I started going back over the
git bisect logs to see if I had mis-marked a revision as good due to
the issue just not reproducing.

However, despite many many reboots the last good commit in my branch
- bb5cba40dc7f079ea7ee3ae760b7c388b6eb5fc3 (mmc: block: Fixup busy
detection while...) doesn't ever show the issue. While the immediately
following commit which bisect found -
e7f3d22289e4307b3071cc18b1d8ecc6598c0be4 (mmc: mmci: Handle CMD irq
before DATA irq) always does.

The immensely frustrating part is that while backing that single change
out at its own commit sha always makes the issue go away, reverting that
change on top of v3.15 doesn't. The issue persists. Since it
doesn't revert cleanly, I also reverted a following patch that it
interacted with, 8d94b54d99ea968a9d188ca0e68793ebed601220 (mmc: mmci:
Enable support for busy detection....), to make sure I didn't miss some
dependency, and the issue *still* crops up. In fact, even backing out
everything in git diff bb5cba40dc7f079ea7ee3ae760b7c388b6eb5fc3..v3.15
drivers/mmc/ doesn't seem to resolve the issue.

So I'm really at a bit of a loss on what to do next. While it seems
that the "mmci: Handle CMD irq before DATA..." commit is problematic,
there also seems to be some other commit in v3.15 which results in the
same problematic behavior.  I may try to bisect again between the
first bad commit and v3.15, reverting the bad commit each time to see
if I can chase it down, but if anyone has better debugging tools here,
I'd greatly appreciate it.
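
The second bisection described above can be modelled roughly as follows.
This is only a Python sketch over a linear list of commits (real history
is a graph and git bisect does the actual search); the callback name
boot_once and both helper names are hypothetical. The callback stands
for "check out the commit, revert the known-bad change, boot once, and
report whether corruption was seen".

```python
def make_flaky_test(boot_once, boots):
    """Wrap a single-boot test so a commit counts as bad if any of
    `boots` attempts shows the corruption."""
    def is_bad(commit):
        return any(boot_once(commit) for _ in range(boots))
    return is_bad


def first_bad_commit(commits, is_bad):
    """Binary search for the first bad commit.

    `commits` is ordered oldest to newest; commits[0] is assumed good
    and commits[-1] is assumed bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


# Toy history: the (hypothetical) second culprit is commit "c5".
history = ["c%d" % i for i in range(10)]
toy_boot = lambda commit: history.index(commit) >= 5
print(first_bad_commit(history, make_flaky_test(toy_boot, boots=7)))  # c5
```

With an intermittent failure, the number of boots per step matters: too
few and a bad commit can be marked good, sending the search the wrong
way.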

Again, I'm happy to help interested folks get this reproducing on
their own machine for debugging.

thanks
-john


* Re: [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm
  2014-06-12 12:15   ` Peter Maydell
@ 2014-06-13 11:35     ` Ulf Hansson
  0 siblings, 0 replies; 14+ messages in thread
From: Ulf Hansson @ 2014-06-13 11:35 UTC (permalink / raw)
  To: Peter Maydell
  Cc: John Stultz, Chris Ball, Johan Rudholm, Russell King - ARM Linux,
	Theodore Ts'o, lkml

On 12 June 2014 14:15, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 12 June 2014 13:09, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> A simple fix; for the arm_variant, go back to use the old behaviour.
>>
>> A quite simple fix; Invent a new primecell-periphid and a new
>> corresponding variant and use the old behaviour for this variant. The
>> new primecell-periphid then needs to be provided through DT for the
>> QEMU dtb.
>>
>> Is there any of the above solution you see as the preferred one?
>
> Those both sound like workarounds, not fixes, to me. Somebody
> needs to identify whether the bug here is in:
>  * the kernel (unlikely, but possibly the kernel has a race
>    condition that only gets triggered by QEMU's "operations
>    that take time in h/w happen instantaneously in emulation"
>    behaviour)
>  * the QEMU model of the PL181
>  * the QEMU model of the SD card
> and then fix whichever of these is not conforming to the
> specs/docs/etc.

You are right. But...

Since we (or actually I) have broken the ARM model (it worked nicely
before), I just wanted to restore the behaviour as a quick fix.
I believe going into this in detail can take some more time,
especially if it's related to the ARM model, right!?

Kind regards
Uffe

>
> Also, there's no such thing as a "QEMU dtb", at least for
> most of our board models. QEMU models the actual hardware
> (sometimes buggily or incompletely) and so should use the
> exact same dtb you would use with the hardware.
>
> thanks
> -- PMM


* Re: [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm
  2014-06-12 23:51 ` John Stultz
@ 2014-06-13 12:28   ` Ulf Hansson
  2014-06-16  7:22     ` Jeff Chua
  0 siblings, 1 reply; 14+ messages in thread
From: Ulf Hansson @ 2014-06-13 12:28 UTC (permalink / raw)
  To: John Stultz
  Cc: Chris Ball, Peter Maydell, Johan Rudholm,
	Russell King - ARM Linux, Theodore Ts'o, lkml

On 13 June 2014 01:51, John Stultz <john.stultz@linaro.org> wrote:
> On Wed, Jun 11, 2014 at 10:35 PM, John Stultz <john.stultz@linaro.org> wrote:
>> Bisecting this points to: e7f3d22289e4307b3071cc18b1d8ecc6598c0be4
>> (mmc: mmci: Handle CMD irq before DATA irq). Which I guess shouldn't
>> be surprising, as I saw problems with that patch earlier in the
>> 3.15-rc cycle:
>>     https://lkml.org/lkml/2014/4/14/824
>>
> [...]
>>
>> Unfortunately reverting the change (manually, as it doesn't revert
>> cleanly anymore) doesn't seem to completely avoid the issue, so the
>> bisection may have gone slightly astray (though it is interesting it
>> landed on the same commit I earlier had trouble with). So I'll
>> back-track and double check some of the last few "good" results to
>> validate I didn't just luck into 3 good boots accidentally. I'll also
>> review my revert in case I missed something subtle in doing it
>> manually.
>
> So I'm getting some baffling results. I started going back over the
> git bisect logs to see if I had mis-marked a revision as good due to
> the issue just not reproducing.
>
> However, despite many many reboots the last good commit in my branch
> - bb5cba40dc7f079ea7ee3ae760b7c388b6eb5fc3 (mmc: block: Fixup busy
> detection while...) doesn't ever show the issue. While the immediately
> following commit which bisect found -
> e7f3d22289e4307b3071cc18b1d8ecc6598c0be4 (mmc: mmci: Handle CMD irq
> before DATA irq) always does.
>
> The immensely frustrating part is while backing that single change off
> from its commit sha always makes the issue go away, reverting that
> change from on top of v3.15 doesn't. The issue persists. Since it
> doesn't revert cleanly, I also reverted a following patch that it
> interacted with 8d94b54d99ea968a9d188ca0e68793ebed601220 (mmc: mmci:
> Enable support for busy detection....) to make sure I didn't miss some
> dependency and the issue *still* crops up. In fact, doing a git diff
> bb5cba40dc7f079ea7ee3ae760b7c388b6eb5fc3..v3.15 drivers/mmc/  doesn't
> seem to resolve the issue.
>
> So I'm really at a bit of a loss on what to do next. While it seems
> that the "mmci: Handle CMD irq before DATA..." commit is problematic,
> there also seems to be some other commit in v3.15 which results in the
> same problematic behavior.  I may try to bisect again between the
> first bad commit and v3.15, reverting the bad commit each time to see
> if I can chase it down, but if anyone has better debugging tools here,
> I'd greatly appreciate it.
>
> Again, I'm happy to help interested folks get this reproducing on
> their own machine for debugging.
>

Hi John,

I have quickly implemented my proposal 1). I am testing the patches on
real HW now, and will post them as soon as I can and keep you on cc.

I would also really appreciate it if you could help out by giving them a
quick try in your QEMU environment.

Kind regards
Uffe


* Re: [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm
  2014-06-13 12:28   ` Ulf Hansson
@ 2014-06-16  7:22     ` Jeff Chua
  2014-06-16 13:02       ` Ulf Hansson
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Chua @ 2014-06-16  7:22 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: John Stultz, Chris Ball, Peter Maydell, Johan Rudholm,
	Russell King - ARM Linux, Theodore Ts'o, lkml

On Fri, Jun 13, 2014 at 8:28 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 13 June 2014 01:51, John Stultz <john.stultz@linaro.org> wrote:
>> On Wed, Jun 11, 2014 at 10:35 PM, John Stultz <john.stultz@linaro.org> wrote:

> I have quickly implemented my proposal 1). I am testing them on real
> HW now, will post the patches as soon as I can and keep you on cc.
>
> I would also really appreciate if you could help out giving them a
> quick try for your QEMU environment.

Please cc me on the patch. I'm seeing my host's reiserfs corrupted with
qemu all over the place in linux-3.15.0 (linux-3.16-rc1). Pretty sure
it's qemu as it doesn't seem to happen if I don't run qemu.

Thanks,
Jeff


* Re: [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm
  2014-06-16  7:22     ` Jeff Chua
@ 2014-06-16 13:02       ` Ulf Hansson
  0 siblings, 0 replies; 14+ messages in thread
From: Ulf Hansson @ 2014-06-16 13:02 UTC (permalink / raw)
  To: Jeff Chua
  Cc: John Stultz, Chris Ball, Peter Maydell, Johan Rudholm,
	Russell King - ARM Linux, Theodore Ts'o, lkml

On 16 June 2014 09:22, Jeff Chua <jeff.chua.linux@gmail.com> wrote:
> On Fri, Jun 13, 2014 at 8:28 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> On 13 June 2014 01:51, John Stultz <john.stultz@linaro.org> wrote:
>>> On Wed, Jun 11, 2014 at 10:35 PM, John Stultz <john.stultz@linaro.org> wrote:
>
>> I have quickly implemented my proposal 1). I am testing them on real
>> HW now, will post the patches as soon as I can and keep you on cc.
>>
>> I would also really appreciate if you could help out giving them a
>> quick try for your QEMU environment.
>
> Please cc me the patch. I'm seeing my host's reiserfs corrupted with
> qemu all over the places in linux-3.15.0 (linux-3.16-rc1). Pretty sure
> it's qemu as it doesn't seem to happen if I don't run qemu.
>
> Thanks,
> Jeff

Hi Jeff,

I posted them before I noticed this, sorry. There are three patches for
the mmci driver, just search for "mmci".

Kind regards
Uffe


* Re: [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm
  2014-06-12  5:35 [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm John Stultz
  2014-06-12 12:09 ` Ulf Hansson
  2014-06-12 23:51 ` John Stultz
@ 2014-08-08 21:14 ` John Stultz
  2014-08-09  0:15   ` Kees Cook
  2 siblings, 1 reply; 14+ messages in thread
From: John Stultz @ 2014-08-08 21:14 UTC (permalink / raw)
  To: Ulf Hansson, Chris Ball, Peter Maydell
  Cc: Johan Rudholm, Russell King - ARM Linux, Theodore Ts'o, lkml,
	Kees Cook

On 06/11/2014 10:35 PM, John Stultz wrote:
> I've been seeing some ext4 corruption with recent kernels under qemu-system-arm.
>
> This issue seems to crop up after shutting down uncleanly (terminating
> qemu), shortly after booting about 50% of the time.
>
> ext4/mmc related dmesg details are:
> [    3.206809] mmci-pl18x mb:mmci: mmc0: PL181 manf 41 rev0 at
> 0x10005000 irq 41,42 (pio)
> [    3.268316] mmc0: new SDHC card at address 4567
> [    3.281963] mmcblk0: mmc0:4567 QEMU! 2.00 GiB
> [    3.315699]  mmcblk0: p1 p2 p3 p4 < p5 p6 >
> ...
> [   11.806169] EXT4-fs (mmcblk0p5): Ignoring removed nomblk_io_submit option
> [   11.904714] EXT4-fs (mmcblk0p5): recovery complete
> [   11.905854] EXT4-fs (mmcblk0p5): mounted filesystem with ordered
> data mode. Opts: nomblk_io_submit,errors=panic
> ...
> [   91.558824] EXT4-fs error (device mmcblk0p5):
> ext4_mb_generate_buddy:756: group 1, 2252 clusters in bitmap, 2284 in
> gd; block bitmap corrupt.
> [   91.560641] Aborting journal on device mmcblk0p5-8.
> [   91.562589] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5):
> panic forced after error
> [   91.562589]
> [   91.563486] CPU: 0 PID: 1 Comm: init Not tainted 3.15.0-rc1 #560
> [   91.564616] [<c00116e5>] (unwind_backtrace) from [<c000f3b1>]
> (show_stack+0x11/0x14)
> [   91.565154] [<c000f3b1>] (show_stack) from [<c04262b1>]
> (dump_stack+0x59/0x7c)
> [   91.565666] [<c04262b1>] (dump_stack) from [<c0423297>] (panic+0x67/0x178)
> [   91.566147] [<c0423297>] (panic) from [<c0134bb1>]
> (ext4_handle_error+0x69/0x74)
> [   91.566659] [<c0134bb1>] (ext4_handle_error) from [<c0135437>]
> (__ext4_grp_locked_error+0x6b/0x160)
> [   91.567223] [<c0135437>] (__ext4_grp_locked_error) from
> [<c0143079>] (ext4_mb_generate_buddy+0x1b1/0x29c)
> [   91.567860] [<c0143079>] (ext4_mb_generate_buddy) from [<c01447e5>]
> (ext4_mb_init_cache+0x219/0x4e0)
> [   91.568473] [<c01447e5>] (ext4_mb_init_cache) from [<c0144b67>]
> (ext4_mb_init_group+0xbb/0x138)
> [   91.569021] [<c0144b67>] (ext4_mb_init_group) from [<c0144cd7>]
> (ext4_mb_good_group+0xf3/0xfc)
> [   91.569659] [<c0144cd7>] (ext4_mb_good_group) from [<c0145c8f>]
> (ext4_mb_regular_allocator+0x153/0x2c4)
> [   91.570250] [<c0145c8f>] (ext4_mb_regular_allocator) from
> [<c0148095>] (ext4_mb_new_blocks+0x2fd/0x4e4)
> [   91.570868] [<c0148095>] (ext4_mb_new_blocks) from [<c013f931>]
> (ext4_ext_map_blocks+0x965/0x10bc)
> [   91.571444] [<c013f931>] (ext4_ext_map_blocks) from [<c0122c8b>]
> (ext4_map_blocks+0xfb/0x36c)
> [   91.571992] [<c0122c8b>] (ext4_map_blocks) from [<c01263b1>]
> (mpage_map_and_submit_extent+0x99/0x5f0)
> [   91.572614] [<c01263b1>] (mpage_map_and_submit_extent) from
> [<c0126bc1>] (ext4_writepages+0x2b9/0x4e8)
> [   91.573201] [<c0126bc1>] (ext4_writepages) from [<c0094ae9>]
> (do_writepages+0x19/0x28)
> [   91.573709] [<c0094ae9>] (do_writepages) from [<c008c811>]
> (__filemap_fdatawrite_range+0x3d/0x44)
> [   91.574265] [<c008c811>] (__filemap_fdatawrite_range) from
> [<c008c883>] (filemap_flush+0x23/0x28)
> [   91.574854] [<c008c883>] (filemap_flush) from [<c012bf75>]
> (ext4_rename+0x2f9/0x3e4)
> [   91.575360] [<c012bf75>] (ext4_rename) from [<c00c3363>]
> (vfs_rename+0x183/0x45c)
> [   91.575911] [<c00c3363>] (vfs_rename) from [<c00c3867>]
> (SyS_renameat2+0x22b/0x26c)
> [   91.576460] [<c00c3867>] (SyS_renameat2) from [<c00c38df>]
> (SyS_rename+0x1f/0x24)
> [   91.576961] [<c00c38df>] (SyS_rename) from [<c000cd01>]
> (ret_fast_syscall+0x1/0x5c)
>
>
> Bisecting this points to: e7f3d22289e4307b3071cc18b1d8ecc6598c0be4
> (mmc: mmci: Handle CMD irq before DATA irq). Which I guess shouldn't
> be surprising, as I saw problems with that patch earlier in the
> 3.15-rc cycle:
>     https://lkml.org/lkml/2014/4/14/824
>
> However that discussion petered out (possibly my fault for not
> following up) as to if it was an issue with the patch or a issue with
> qemu.  Then the original issue disappeared for me, which I figured was
> due to a fix upstream, but now I'm guessing coincided with me updating
> my system and getting qemu v2.0 (where as previously I was on 1.5).
>
> $ qemu-system-arm -version
> QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright
> (c) 2003-2008 Fabrice Bellard
>
> While the previous behavior was annoying and kept my emulated
> environments from booting, this while a bit more rare and subtle eats
> the disks, which is much more painful for my testing.
>
> Unfortunately reverting the change (manually, as it doesn't revert
> cleanly anymore) doesn't seem to completely avoid the issue, so the
> bisection may have gone slightly astray (though it is interesting it
> landed on the same commit I earlier had trouble with). So I'll
> back-track and double check some of the last few "good" results to
> validate I didn't just luck into 3 good boots accidentally. I'll also
> review my revert in case I missed something subtle in doing it
> manually.
>
> Anyway, if there is any thoughts on how to better chase this down and
> debug it, I'd appreciate it! I can also provide reproduction
> instructions with a pre-built Linaro android disk image and hand built
> kernel if anyone wants to debug this themselves.

So I just wanted to check if anyone else tried looking into this issue?
I'd be happy to share my qemu environment, config, etc.

I sank a couple of weeks into bisecting to try to narrow down the more
sporadic issue, but was unsuccessful past the initial commit above.
Since then I've been far too swamped to spend any more time on it. Even
so, it's a *major* pain for testing, but it seems like no one else really
cares?

thanks
-john



* Re: [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm
  2014-08-08 21:14 ` John Stultz
@ 2014-08-09  0:15   ` Kees Cook
  2014-08-09  0:17     ` John Stultz
  0 siblings, 1 reply; 14+ messages in thread
From: Kees Cook @ 2014-08-09  0:15 UTC (permalink / raw)
  To: John Stultz
  Cc: Ulf Hansson, Chris Ball, Peter Maydell, Johan Rudholm,
	Russell King - ARM Linux, Theodore Ts'o, lkml

On Fri, Aug 8, 2014 at 2:14 PM, John Stultz <john.stultz@linaro.org> wrote:
> On 06/11/2014 10:35 PM, John Stultz wrote:
>> I've been seeing some ext4 corruption with recent kernels under qemu-system-arm.
>>
>> This issue seems to crop up after shutting down uncleanly (terminating
>> qemu), shortly after booting about 50% of the time.
>>
>> ext4/mmc related dmesg details are:
>> [    3.206809] mmci-pl18x mb:mmci: mmc0: PL181 manf 41 rev0 at
>> 0x10005000 irq 41,42 (pio)
>> [    3.268316] mmc0: new SDHC card at address 4567
>> [    3.281963] mmcblk0: mmc0:4567 QEMU! 2.00 GiB
>> [    3.315699]  mmcblk0: p1 p2 p3 p4 < p5 p6 >
>> ...
>> [   11.806169] EXT4-fs (mmcblk0p5): Ignoring removed nomblk_io_submit option
>> [   11.904714] EXT4-fs (mmcblk0p5): recovery complete
>> [   11.905854] EXT4-fs (mmcblk0p5): mounted filesystem with ordered
>> data mode. Opts: nomblk_io_submit,errors=panic
>> ...
>> [   91.558824] EXT4-fs error (device mmcblk0p5):
>> ext4_mb_generate_buddy:756: group 1, 2252 clusters in bitmap, 2284 in
>> gd; block bitmap corrupt.
>> [   91.560641] Aborting journal on device mmcblk0p5-8.
>> [   91.562589] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5):
>> panic forced after error
>> [   91.562589]
>> [   91.563486] CPU: 0 PID: 1 Comm: init Not tainted 3.15.0-rc1 #560
>> [   91.564616] [<c00116e5>] (unwind_backtrace) from [<c000f3b1>]
>> (show_stack+0x11/0x14)
>> [   91.565154] [<c000f3b1>] (show_stack) from [<c04262b1>]
>> (dump_stack+0x59/0x7c)
>> [   91.565666] [<c04262b1>] (dump_stack) from [<c0423297>] (panic+0x67/0x178)
>> [   91.566147] [<c0423297>] (panic) from [<c0134bb1>]
>> (ext4_handle_error+0x69/0x74)
>> [   91.566659] [<c0134bb1>] (ext4_handle_error) from [<c0135437>]
>> (__ext4_grp_locked_error+0x6b/0x160)
>> [   91.567223] [<c0135437>] (__ext4_grp_locked_error) from
>> [<c0143079>] (ext4_mb_generate_buddy+0x1b1/0x29c)
>> [   91.567860] [<c0143079>] (ext4_mb_generate_buddy) from [<c01447e5>]
>> (ext4_mb_init_cache+0x219/0x4e0)
>> [   91.568473] [<c01447e5>] (ext4_mb_init_cache) from [<c0144b67>]
>> (ext4_mb_init_group+0xbb/0x138)
>> [   91.569021] [<c0144b67>] (ext4_mb_init_group) from [<c0144cd7>]
>> (ext4_mb_good_group+0xf3/0xfc)
>> [   91.569659] [<c0144cd7>] (ext4_mb_good_group) from [<c0145c8f>]
>> (ext4_mb_regular_allocator+0x153/0x2c4)
>> [   91.570250] [<c0145c8f>] (ext4_mb_regular_allocator) from
>> [<c0148095>] (ext4_mb_new_blocks+0x2fd/0x4e4)
>> [   91.570868] [<c0148095>] (ext4_mb_new_blocks) from [<c013f931>]
>> (ext4_ext_map_blocks+0x965/0x10bc)
>> [   91.571444] [<c013f931>] (ext4_ext_map_blocks) from [<c0122c8b>]
>> (ext4_map_blocks+0xfb/0x36c)
>> [   91.571992] [<c0122c8b>] (ext4_map_blocks) from [<c01263b1>]
>> (mpage_map_and_submit_extent+0x99/0x5f0)
>> [   91.572614] [<c01263b1>] (mpage_map_and_submit_extent) from
>> [<c0126bc1>] (ext4_writepages+0x2b9/0x4e8)
>> [   91.573201] [<c0126bc1>] (ext4_writepages) from [<c0094ae9>]
>> (do_writepages+0x19/0x28)
>> [   91.573709] [<c0094ae9>] (do_writepages) from [<c008c811>]
>> (__filemap_fdatawrite_range+0x3d/0x44)
>> [   91.574265] [<c008c811>] (__filemap_fdatawrite_range) from
>> [<c008c883>] (filemap_flush+0x23/0x28)
>> [   91.574854] [<c008c883>] (filemap_flush) from [<c012bf75>]
>> (ext4_rename+0x2f9/0x3e4)
>> [   91.575360] [<c012bf75>] (ext4_rename) from [<c00c3363>]
>> (vfs_rename+0x183/0x45c)
>> [   91.575911] [<c00c3363>] (vfs_rename) from [<c00c3867>]
>> (SyS_renameat2+0x22b/0x26c)
>> [   91.576460] [<c00c3867>] (SyS_renameat2) from [<c00c38df>]
>> (SyS_rename+0x1f/0x24)
>> [   91.576961] [<c00c38df>] (SyS_rename) from [<c000cd01>]
>> (ret_fast_syscall+0x1/0x5c)
>>
>>
>> Bisecting this points to: e7f3d22289e4307b3071cc18b1d8ecc6598c0be4
>> (mmc: mmci: Handle CMD irq before DATA irq). Which I guess shouldn't
>> be surprising, as I saw problems with that patch earlier in the
>> 3.15-rc cycle:
>>     https://lkml.org/lkml/2014/4/14/824
>>
>> However that discussion petered out (possibly my fault for not
>> following up) as to if it was an issue with the patch or an issue with
>> qemu.  Then the original issue disappeared for me, which I figured was
>> due to a fix upstream, but now I'm guessing coincided with me updating
>> my system and getting qemu v2.0 (whereas previously I was on 1.5).
>>
>> $ qemu-system-arm -version
>> QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright
>> (c) 2003-2008 Fabrice Bellard
>>
>> While the previous behavior was annoying and kept my emulated
>> environments from booting, this one, while a bit more rare and subtle, eats
>> the disks, which is much more painful for my testing.
>>
>> Unfortunately reverting the change (manually, as it doesn't revert
>> cleanly anymore) doesn't seem to completely avoid the issue, so the
>> bisection may have gone slightly astray (though it is interesting it
>> landed on the same commit I earlier had trouble with). So I'll
>> back-track and double check some of the last few "good" results to
>> validate I didn't just luck into 3 good boots accidentally. I'll also
>> review my revert in case I missed something subtle in doing it
>> manually.
>>
>> Anyway, if there are any thoughts on how to better chase this down and
>> debug it, I'd appreciate it! I can also provide reproduction
>> instructions with a pre-built Linaro android disk image and hand built
>> kernel if anyone wants to debug this themselves.
>
> So I just wanted to check if anyone else tried looking into this issue?
> I'd be happy to share my qemu environment, config, etc.
>
> I sunk a couple of weeks bisecting to try to narrow down the more
> sporadic issue, but was unsuccessful past the initial commit above.
> Since then I've been far too swamped to spend any more time on it. Even
> so, it's a *major* pain for testing but it seems like no one else really
> cares?

I'm in the same boat as far as poor bisection results. :(

However, I keep using the 3-patch mmci fix series from Ulf, and
haven't hit any trouble with them. Though perhaps I'm just getting
lucky?

http://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=arm/fix-mmci

-Kees

-- 
Kees Cook
Chrome OS Security


* Re: [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm
  2014-08-09  0:15   ` Kees Cook
@ 2014-08-09  0:17     ` John Stultz
  2014-08-09  0:32       ` Theodore Ts'o
  0 siblings, 1 reply; 14+ messages in thread
From: John Stultz @ 2014-08-09  0:17 UTC (permalink / raw)
  To: Kees Cook
  Cc: Ulf Hansson, Chris Ball, Peter Maydell, Johan Rudholm,
	Russell King - ARM Linux, Theodore Ts'o, lkml

On 08/08/2014 05:15 PM, Kees Cook wrote:
> On Fri, Aug 8, 2014 at 2:14 PM, John Stultz <john.stultz@linaro.org> wrote:
>> I sunk a couple of weeks bisecting to try to narrow down the more
>> sporadic issue, but was unsuccessful past the initial commit above.
>> Since then I've been far too swamped to spend any more time on it. Even
>> so, it's a *major* pain for testing but it seems like no one else really
>> cares?
> I'm in the same boat as far as poor bisection results. :(
>
> However, I keep using the 3-patch mmci fix series from Ulf, and
> haven't hit any trouble with them. Though perhaps I'm just getting
> lucky?
>
> http://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=arm/fix-mmci

I guess I'll give that another shot then.

thanks
-john



* Re: [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm
  2014-08-09  0:17     ` John Stultz
@ 2014-08-09  0:32       ` Theodore Ts'o
  2014-08-09  4:03         ` John Stultz
  0 siblings, 1 reply; 14+ messages in thread
From: Theodore Ts'o @ 2014-08-09  0:32 UTC (permalink / raw)
  To: John Stultz
  Cc: Kees Cook, Ulf Hansson, Chris Ball, Peter Maydell, Johan Rudholm,
	Russell King - ARM Linux, lkml

On Fri, Aug 08, 2014 at 05:17:54PM -0700, John Stultz wrote:
> On 08/08/2014 05:15 PM, Kees Cook wrote:
> > On Fri, Aug 8, 2014 at 2:14 PM, John Stultz <john.stultz@linaro.org> wrote:
> >> I sunk a couple of weeks bisecting to try to narrow down the more
> >> sporadic issue, but was unsuccessful past the initial commit above.
> >> Since then I've been far too swamped to spend any more time on it. Even
> >> so, it's a *major* pain for testing but it seems like no one else really
> >> cares?
> > I'm in the same boat as far as poor bisection results. :(
> >
> > However, I keep using the 3-patch mmci fix series from Ulf, and
> > haven't hit any trouble with them. Though perhaps I'm just getting
> > lucky?
> >
> > http://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=arm/fix-mmci
> 
> I guess I'll give that another shot 

There was an ext4 bug that might have caused this problem.  It was
fixed in v3.15.6 and v3.16-rc5.

commit f9ae9cf5d72b3926ca48ea60e15bdbb840f42372
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Fri Jul 11 13:55:40 2014 -0400

    ext4: revert commit which was causing fs corruption after journal replays
    
    Commit 007649375f6af2 ("ext4: initialize multi-block allocator before
    checking block descriptors") causes the block group descriptor's count
    of the number of free blocks to become inconsistent with the number of
    free blocks in the allocation bitmap.  This is a harmless form of fs
    corruption, but it causes the kernel to potentially remount the file
    system read-only, or to panic, depending on the file system's error
    behavior.
    
    Thanks to Eric Whitney for his tireless work to reproduce and to find
    the guilty commit.
    
    Fixes: 007649375f6af2 ("ext4: initialize multi-block allocator before checki
    
    Cc: stable@vger.kernel.org  # 3.15
    Reported-by: David Jander <david@protonic.nl>
    Reported-by: Matteo Croce <technoboy85@gmail.com>
    Tested-by: Eric Whitney <enwlinux@gmail.com>
    Suggested-by: Eric Whitney <enwlinux@gmail.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
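
For readers following along, the inconsistency described in the commit
message above can be pictured with a toy model. This is only a sketch, not
kernel code: the real check is in ext4_mb_generate_buddy() (see the
backtrace earlier in the thread), while the function names, the in-memory
bitmap representation and the little-endian bit order used below are
assumptions made for illustration.

```python
from typing import Optional


def count_free_clusters(bitmap: bytes, clusters_in_group: int) -> int:
    """Count clear bits (free clusters) among the first clusters_in_group bits.

    Assumption: bit i lives in byte i // 8 at bit position i % 8.
    """
    free = 0
    for i in range(clusters_in_group):
        if not (bitmap[i // 8] >> (i % 8)) & 1:  # clear bit means free
            free += 1
    return free


def check_group(group: int, bitmap: bytes, clusters_in_group: int,
                gd_free: int) -> Optional[str]:
    """Return an error string if the bitmap and descriptor counts disagree.

    gd_free stands in for the free count stored in the group descriptor.
    The message format mimics the dmesg line quoted in this thread.
    """
    bitmap_free = count_free_clusters(bitmap, clusters_in_group)
    if bitmap_free != gd_free:
        return (f"group {group}, {bitmap_free} clusters in bitmap, "
                f"{gd_free} in gd; block bitmap corrupt.")
    return None


# Hypothetical 16-cluster group: the low 4 bits of the first byte are set
# (in use), bits 4-7 are clear (free), and the second byte is fully in use.
toy_bitmap = bytes([0b00001111, 0xFF])
print(check_group(1, toy_bitmap, 16, 4))  # counts agree
print(check_group(1, toy_bitmap, 16, 6))  # counts disagree
```

In this model the bitmap itself is intact and only the stored count is
off, which matches the commit message's description of the corruption as
harmless but still enough to trip errors=panic.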

The bug wouldn't always trigger, which is probably why it gave you so
much trouble trying to do the bisect.
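
To put a rough number on that, here is a small sketch of how easily an
intermittent bug produces a false "good" during a bisect. It takes the
roughly 50% failure rate and the "3 good boots" mentioned earlier in the
thread as inputs, and assumes each boot fails independently, which is a
simplification. The function names are made up for this example.

```python
def prob_false_good(p_fail: float, boots: int) -> float:
    """Chance that a kernel with the bug survives `boots` boots in a row."""
    return (1.0 - p_fail) ** boots


def boots_needed(p_fail: float, max_false_good: float) -> int:
    """Smallest number of clean boots needed so that a buggy kernel is
    wrongly marked good with probability at most max_false_good."""
    if not (0.0 < p_fail <= 1.0) or not (0.0 < max_false_good < 1.0):
        raise ValueError("p_fail must be in (0, 1], max_false_good in (0, 1)")
    boots = 1
    while prob_false_good(p_fail, boots) > max_false_good:
        boots += 1
    return boots


print(prob_false_good(0.5, 3))   # chance of 3 lucky boots on a bad kernel
print(boots_needed(0.5, 0.01))   # boots per step for a 1% false-good rate
```

With those inputs, three clean boots still leave a 1-in-8 chance of
marking a bad commit as good at each step, and that error compounds over
the dozen or so steps of a typical bisect.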

Cheers,

					- Ted


* Re: [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm
  2014-08-09  0:32       ` Theodore Ts'o
@ 2014-08-09  4:03         ` John Stultz
  2014-08-11  8:04           ` Ulf Hansson
  0 siblings, 1 reply; 14+ messages in thread
From: John Stultz @ 2014-08-09  4:03 UTC (permalink / raw)
  To: Theodore Ts'o, John Stultz, Kees Cook, Ulf Hansson,
	Chris Ball, Peter Maydell, Johan Rudholm,
	Russell King - ARM Linux, lkml

On Fri, Aug 8, 2014 at 5:32 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Fri, Aug 08, 2014 at 05:17:54PM -0700, John Stultz wrote:
>> On 08/08/2014 05:15 PM, Kees Cook wrote:
>> > On Fri, Aug 8, 2014 at 2:14 PM, John Stultz <john.stultz@linaro.org> wrote:
>> >> I sunk a couple of weeks bisecting to try to narrow down the more
>> >> sporadic issue, but was unsuccessful past the initial commit above.
>> >> Since then I've been far too swamped to spend any more time on it. Even
>> >> so, it's a *major* pain for testing but it seems like no one else really
>> >> cares?
>> > I'm in the same boat as far as poor bisection results. :(
>> >
>> > However, I keep using the 3-patch mmci fix series from Ulf, and
>> > haven't hit any trouble with them. Though perhaps I'm just getting
>> > lucky?
>> >
>> > http://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=arm/fix-mmci
>>
>> I guess I'll give that another shot
>
> There was an ext4 bug that might have caused this problem.  It was
> fixed in v3.15.6 and v3.16-rc5.

Sweet! Many thanks to Eric for chasing that down (I spent a ton of
time with no results, so I can't imagine how much effort it took him)
and you for pointing it out.

So yes, so far I'm not seeing any filesystem panics w/ Linus' head +
Ulf's patches in Kees' tree above. That makes me *very* happy.

Ulf: Are you planning to push those upstream (and to -stable) soon?

thanks
-john


* Re: [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm
  2014-08-09  4:03         ` John Stultz
@ 2014-08-11  8:04           ` Ulf Hansson
  0 siblings, 0 replies; 14+ messages in thread
From: Ulf Hansson @ 2014-08-11  8:04 UTC (permalink / raw)
  To: John Stultz, Kees Cook
  Cc: Theodore Ts'o, Chris Ball, Peter Maydell, Johan Rudholm,
	Russell King - ARM Linux, lkml

On 9 August 2014 06:03, John Stultz <john.stultz@linaro.org> wrote:
> On Fri, Aug 8, 2014 at 5:32 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>> On Fri, Aug 08, 2014 at 05:17:54PM -0700, John Stultz wrote:
>>> On 08/08/2014 05:15 PM, Kees Cook wrote:
>>> > On Fri, Aug 8, 2014 at 2:14 PM, John Stultz <john.stultz@linaro.org> wrote:
>>> >> I sunk a couple of weeks bisecting to try to narrow down the more
>>> >> sporadic issue, but was unsuccessful past the initial commit above.
>>> >> Since then I've been far too swamped to spend any more time on it. Even
>>> >> so, it's a *major* pain for testing but it seems like no one else really
>>> >> cares?
>>> > I'm in the same boat as far as poor bisection results. :(
>>> >
>>> > However, I keep using the 3-patch mmci fix series from Ulf, and
>>> > haven't hit any trouble with them. Though perhaps I'm just getting
>>> > lucky?
>>> >
>>> > http://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=arm/fix-mmci
>>>
>>> I guess I'll give that another shot
>>
>> There was an ext4 bug that might have caused this problem.  It was
>> fixed in v3.15.6 and v3.16-rc5.
>
> Sweet! Many thanks to Eric for chasing that down (I spent a ton of
> time with no results, so I can't imagine how much effort it took him)
> and you for pointing it out.
>
> So yes, so far I'm not seeing any filesystem panics w/ Linus' head  +
> with Ulf's patches in Kees tree above. That makes me *very* happy.
>
> Ulf: Are you planning to push those upstream (and to -stable) soon?

Sorry for the delay!

I will apply them just now and add a stable tag to the commits!

Kind regards
Uffe

>
> thanks
> -john



Thread overview: 14+ messages
2014-06-12  5:35 [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm John Stultz
2014-06-12 12:09 ` Ulf Hansson
2014-06-12 12:15   ` Peter Maydell
2014-06-13 11:35     ` Ulf Hansson
2014-06-12 23:51 ` John Stultz
2014-06-13 12:28   ` Ulf Hansson
2014-06-16  7:22     ` Jeff Chua
2014-06-16 13:02       ` Ulf Hansson
2014-08-08 21:14 ` John Stultz
2014-08-09  0:15   ` Kees Cook
2014-08-09  0:17     ` John Stultz
2014-08-09  0:32       ` Theodore Ts'o
2014-08-09  4:03         ` John Stultz
2014-08-11  8:04           ` Ulf Hansson
