[PATCH V2 3/3] mmc: mmci: Reverse IRQ handling for the arm

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [PATCH V2 3/3] mmc: mmci: Reverse IRQ handling for the arm_variant
@ 2014-06-16  8:09 Ulf Hansson
  2014-06-16 18:58 ` John Stultz
  0 siblings, 1 reply; 11+ messages in thread
From: Ulf Hansson @ 2014-06-16  8:09 UTC (permalink / raw)
  To: linux-arm-kernel

Commit "mmc: mmci: Handle CMD irq before DATA irq", caused an issue
when using the ARM model of the PL181 and running QEMU.

The bug was reported for the following QEMU version:
$ qemu-system-arm -version
QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright
(c) 2003-2008 Fabrice Bellard

To resolve the problem, let's restore the old behavior were the DATA
irq is handled prior the CMD irq, but only for the arm_variant, which
the problem was reported for.

Reported-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/mmc/host/mmci.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
index 5d20bfba..e4d4707 100644
--- a/drivers/mmc/host/mmci.c
+++ b/drivers/mmc/host/mmci.c
@@ -74,6 +74,7 @@ static unsigned int fmax = 515633;
  * @pwrreg_nopower: bits in MMCIPOWER don't controls ext. power supply
  * @explicit_mclk_control: enable explicit mclk control in driver.
  * @qcom_fifo: enables qcom specific fifo pio read logic.
+ * @reversed_irq_handling: handle data irq before cmd irq.
  */
 struct variant_data {
 	unsigned int		clkreg;
@@ -97,6 +98,7 @@ struct variant_data {
 	bool			pwrreg_nopower;
 	bool			explicit_mclk_control;
 	bool			qcom_fifo;
+	bool			reversed_irq_handling;
 };
 
 static struct variant_data variant_arm = {
@@ -105,6 +107,7 @@ static struct variant_data variant_arm = {
 	.datalength_bits	= 16,
 	.pwrreg_powerup		= MCI_PWR_UP,
 	.f_max			= 100000000,
+	.reversed_irq_handling	= true,
 };
 
 static struct variant_data variant_arm_extended_fifo = {
@@ -1236,8 +1239,13 @@ static irqreturn_t mmci_irq(int irq, void *dev_id)
 
 		dev_dbg(mmc_dev(host->mmc), "irq0 (data+cmd) %08x\n", status);
 
-		mmci_cmd_irq(host, host->cmd, status);
-		mmci_data_irq(host, host->data, status);
+		if (host->variant->reversed_irq_handling) {
+			mmci_data_irq(host, host->data, status);
+			mmci_cmd_irq(host, host->cmd, status);
+		} else {
+			mmci_cmd_irq(host, host->cmd, status);
+			mmci_data_irq(host, host->data, status);
+		}
 
 		/* Don't poll for busy completion in irq context. */
 		if (host->busy_status)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH V2 3/3] mmc: mmci: Reverse IRQ handling for the arm_variant
  2014-06-16  8:09 [PATCH V2 3/3] mmc: mmci: Reverse IRQ handling for the arm_variant Ulf Hansson
@ 2014-06-16 18:58 ` John Stultz
  2014-06-16 21:20   ` Ulf Hansson
  0 siblings, 1 reply; 11+ messages in thread
From: John Stultz @ 2014-06-16 18:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 16, 2014 at 1:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> Commit "mmc: mmci: Handle CMD irq before DATA irq", caused an issue
> when using the ARM model of the PL181 and running QEMU.
>
> The bug was reported for the following QEMU version:
> $ qemu-system-arm -version
> QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright
> (c) 2003-2008 Fabrice Bellard
>
> To resolve the problem, let's restore the old behavior were the DATA
> irq is handled prior the CMD irq, but only for the arm_variant, which
> the problem was reported for.
>
> Reported-by: John Stultz <john.stultz@linaro.org>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Russell King <linux@arm.linux.org.uk>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

Hey Ulf,
  Thanks for sending these out (though I wasn't cc'ed on this 3'rd
patch). The first two apply for me, but this one doesn't. What should
I be basing this off of?

thanks
-john

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH V2 3/3] mmc: mmci: Reverse IRQ handling for the arm_variant
  2014-06-16 18:58 ` John Stultz
@ 2014-06-16 21:20   ` Ulf Hansson
  2014-06-16 22:41     ` John Stultz
  0 siblings, 1 reply; 11+ messages in thread
From: Ulf Hansson @ 2014-06-16 21:20 UTC (permalink / raw)
  To: linux-arm-kernel

On 16 June 2014 20:58, John Stultz <john.stultz@linaro.org> wrote:
> On Mon, Jun 16, 2014 at 1:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> Commit "mmc: mmci: Handle CMD irq before DATA irq", caused an issue
>> when using the ARM model of the PL181 and running QEMU.
>>
>> The bug was reported for the following QEMU version:
>> $ qemu-system-arm -version
>> QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright
>> (c) 2003-2008 Fabrice Bellard
>>
>> To resolve the problem, let's restore the old behavior were the DATA
>> irq is handled prior the CMD irq, but only for the arm_variant, which
>> the problem was reported for.
>>
>> Reported-by: John Stultz <john.stultz@linaro.org>
>> Cc: Peter Maydell <peter.maydell@linaro.org>
>> Cc: Russell King <linux@arm.linux.org.uk>
>> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
>
> Hey Ulf,
>   Thanks for sending these out (though I wasn't cc'ed on this 3'rd
> patch). The first two apply for me, but this one doesn't. What should
> I be basing this off of?

Hi John,

Thanks for helping out testing!

This patch based upon my latest mmc tree and the next branch. I tried
to apply it for 3.15, and I think you will be able resolve the
conflict - I should be quite trivial.

Patch 1 and 2 applies on 3.15 as well. So I guess testing this on 3.15
will provide us with answers if it solves our problem.

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH V2 3/3] mmc: mmci: Reverse IRQ handling for the arm_variant
  2014-06-16 21:20   ` Ulf Hansson
@ 2014-06-16 22:41     ` John Stultz
  2014-06-16 23:29       ` John Stultz
  0 siblings, 1 reply; 11+ messages in thread
From: John Stultz @ 2014-06-16 22:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> This patch based upon my latest mmc tree and the next branch. I tried
> to apply it for 3.15, and I think you will be able resolve the
> conflict - I should be quite trivial.

No worries. I just didn't want to waste time resolving it if it was
logically dependent on some other change.

I'll give it a shot and get back to you.

thanks
-john

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH V2 3/3] mmc: mmci: Reverse IRQ handling for the arm_variant
  2014-06-16 22:41     ` John Stultz
@ 2014-06-16 23:29       ` John Stultz
  2014-06-17  7:33         ` Ulf Hansson
  0 siblings, 1 reply; 11+ messages in thread
From: John Stultz @ 2014-06-16 23:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 16, 2014 at 3:41 PM, John Stultz <john.stultz@linaro.org> wrote:
> On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> This patch based upon my latest mmc tree and the next branch. I tried
>> to apply it for 3.15, and I think you will be able resolve the
>> conflict - I should be quite trivial.
>
> No worries. I just didn't want to waste time resolving it if it was
> logically dependent on some other change.
>
> I'll give it a shot and get back to you.

So unfortunately I'm still seeing trouble..

[   94.202843] EXT4-fs error (device mmcblk0p5):
ext4_mb_generate_buddy:756: group 1, 2303 clusters in bitmap, 2272 in
gd; block bitmap corrupt.
[   94.203873] Aborting journal on device mmcblk0p5-8.
[   94.206553] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5):
panic forced after error
[   94.206553]
[   94.207420] CPU: 0 PID: 1 Comm: init Not tainted
3.15.0-00002-g044f37a-dirty #589
[   94.208330] [<c0011725>] (unwind_backtrace) from [<c000f3f1>]
(show_stack+0x11/0x14)
[   94.208835] [<c000f3f1>] (show_stack) from [<c042d599>]
(dump_stack+0x59/0x7c)
[   94.209288] [<c042d599>] (dump_stack) from [<c042a57f>] (panic+0x67/0x178)
[   94.209724] [<c042a57f>] (panic) from [<c0135055>]
(ext4_handle_error+0x69/0x74)
[   94.210184] [<c0135055>] (ext4_handle_error) from [<c01358db>]
(__ext4_grp_locked_error+0x6b/0x160)
[   94.210747] [<c01358db>] (__ext4_grp_locked_error) from
[<c0143691>] (ext4_mb_generate_buddy+0x1b1/0x29c)
[   94.211392] [<c0143691>] (ext4_mb_generate_buddy) from [<c0144dfd>]
(ext4_mb_init_cache+0x219/0x4e0)
[   94.211959] [<c0144dfd>] (ext4_mb_init_cache) from [<c014517f>]
(ext4_mb_init_group+0xbb/0x13c)
[   94.213973] [<c014517f>] (ext4_mb_init_group) from [<c01452f3>]
(ext4_mb_good_group+0xf3/0xfc)
[   94.214873] [<c01452f3>] (ext4_mb_good_group) from [<c01462ab>]
(ext4_mb_regular_allocator+0x153/0x2c4)
[   94.215953] [<c01462ab>] (ext4_mb_regular_allocator) from
[<c01486b1>] (ext4_mb_new_blocks+0x2fd/0x4e4)
[   94.216939] [<c01486b1>] (ext4_mb_new_blocks) from [<c013fe41>]
(ext4_ext_map_blocks+0x965/0x10f0)
[   94.217694] [<c013fe41>] (ext4_ext_map_blocks) from [<c01230ff>]
(ext4_map_blocks+0xff/0x374)
[   94.219200] [<c0126839>] (mpage_map_and_submit_extent) from
[<c0127049>] (ext4_writepages+0x2b9/0x4e8)
[   94.219972] [<c0127049>] (ext4_writepages) from [<c0094e69>]
(do_writepages+0x19/0x28)
[   94.220648] [<c0094e69>] (do_writepages) from [<c008cbcd>]
(__filemap_fdatawrite_range+0x3d/0x44)
[   94.221391] [<c008cbcd>] (__filemap_fdatawrite_range) from
[<c008cc3f>] (filemap_flush+0x23/0x28)
[   94.222135] [<c008cc3f>] (filemap_flush) from [<c012c419>]
(ext4_rename+0x2f9/0x3e4)
[   94.222806] [<c012c419>] (ext4_rename) from [<c00c3707>]
(vfs_rename+0x183/0x45c)
[   94.223496] [<c00c3707>] (vfs_rename) from [<c00c3c0b>]
(SyS_renameat2+0x22b/0x26c)
[   94.224154] [<c00c3c0b>] (SyS_renameat2) from [<c00c3c83>]
(SyS_rename+0x1f/0x24)
[   94.224801] [<c00c3c83>] (SyS_rename) from [<c000cd41>]
(ret_fast_syscall+0x1/0x5c)


That said, this mirrors the behavior when I was reverting your change
by hand on-top of 3.15. While git bisect pointed to your patch and
reverting it from the commit seems to resolve the issue at that point,
there seems to be some other commit in the 3.14->3.15-rc1 interval
that is causing problems as well.

Are there any sort of debugging options for mmc that I can use to try
to better narrow down whats going wrong?

thanks
-john

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH V2 3/3] mmc: mmci: Reverse IRQ handling for the arm_variant
  2014-06-16 23:29       ` John Stultz
@ 2014-06-17  7:33         ` Ulf Hansson
  2014-06-17 13:57           ` John Stultz
  2014-06-27 20:37           ` Kees Cook
  0 siblings, 2 replies; 11+ messages in thread
From: Ulf Hansson @ 2014-06-17  7:33 UTC (permalink / raw)
  To: linux-arm-kernel

On 17 June 2014 01:29, John Stultz <john.stultz@linaro.org> wrote:
> On Mon, Jun 16, 2014 at 3:41 PM, John Stultz <john.stultz@linaro.org> wrote:
>> On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>>> This patch based upon my latest mmc tree and the next branch. I tried
>>> to apply it for 3.15, and I think you will be able resolve the
>>> conflict - I should be quite trivial.
>>
>> No worries. I just didn't want to waste time resolving it if it was
>> logically dependent on some other change.
>>
>> I'll give it a shot and get back to you.
>
> So unfortunately I'm still seeing trouble..
>
> [   94.202843] EXT4-fs error (device mmcblk0p5):
> ext4_mb_generate_buddy:756: group 1, 2303 clusters in bitmap, 2272 in
> gd; block bitmap corrupt.
> [   94.203873] Aborting journal on device mmcblk0p5-8.
> [   94.206553] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5):
> panic forced after error
> [   94.206553]
> [   94.207420] CPU: 0 PID: 1 Comm: init Not tainted
> 3.15.0-00002-g044f37a-dirty #589
> [   94.208330] [<c0011725>] (unwind_backtrace) from [<c000f3f1>]
> (show_stack+0x11/0x14)
> [   94.208835] [<c000f3f1>] (show_stack) from [<c042d599>]
> (dump_stack+0x59/0x7c)
> [   94.209288] [<c042d599>] (dump_stack) from [<c042a57f>] (panic+0x67/0x178)
> [   94.209724] [<c042a57f>] (panic) from [<c0135055>]
> (ext4_handle_error+0x69/0x74)
> [   94.210184] [<c0135055>] (ext4_handle_error) from [<c01358db>]
> (__ext4_grp_locked_error+0x6b/0x160)
> [   94.210747] [<c01358db>] (__ext4_grp_locked_error) from
> [<c0143691>] (ext4_mb_generate_buddy+0x1b1/0x29c)
> [   94.211392] [<c0143691>] (ext4_mb_generate_buddy) from [<c0144dfd>]
> (ext4_mb_init_cache+0x219/0x4e0)
> [   94.211959] [<c0144dfd>] (ext4_mb_init_cache) from [<c014517f>]
> (ext4_mb_init_group+0xbb/0x13c)
> [   94.213973] [<c014517f>] (ext4_mb_init_group) from [<c01452f3>]
> (ext4_mb_good_group+0xf3/0xfc)
> [   94.214873] [<c01452f3>] (ext4_mb_good_group) from [<c01462ab>]
> (ext4_mb_regular_allocator+0x153/0x2c4)
> [   94.215953] [<c01462ab>] (ext4_mb_regular_allocator) from
> [<c01486b1>] (ext4_mb_new_blocks+0x2fd/0x4e4)
> [   94.216939] [<c01486b1>] (ext4_mb_new_blocks) from [<c013fe41>]
> (ext4_ext_map_blocks+0x965/0x10f0)
> [   94.217694] [<c013fe41>] (ext4_ext_map_blocks) from [<c01230ff>]
> (ext4_map_blocks+0xff/0x374)
> [   94.219200] [<c0126839>] (mpage_map_and_submit_extent) from
> [<c0127049>] (ext4_writepages+0x2b9/0x4e8)
> [   94.219972] [<c0127049>] (ext4_writepages) from [<c0094e69>]
> (do_writepages+0x19/0x28)
> [   94.220648] [<c0094e69>] (do_writepages) from [<c008cbcd>]
> (__filemap_fdatawrite_range+0x3d/0x44)
> [   94.221391] [<c008cbcd>] (__filemap_fdatawrite_range) from
> [<c008cc3f>] (filemap_flush+0x23/0x28)
> [   94.222135] [<c008cc3f>] (filemap_flush) from [<c012c419>]
> (ext4_rename+0x2f9/0x3e4)
> [   94.222806] [<c012c419>] (ext4_rename) from [<c00c3707>]
> (vfs_rename+0x183/0x45c)
> [   94.223496] [<c00c3707>] (vfs_rename) from [<c00c3c0b>]
> (SyS_renameat2+0x22b/0x26c)
> [   94.224154] [<c00c3c0b>] (SyS_renameat2) from [<c00c3c83>]
> (SyS_rename+0x1f/0x24)
> [   94.224801] [<c00c3c83>] (SyS_rename) from [<c000cd41>]
> (ret_fast_syscall+0x1/0x5c)
>
>
> That said, this mirrors the behavior when I was reverting your change
> by hand on-top of 3.15. While git bisect pointed to your patch and
> reverting it from the commit seems to resolve the issue at that point,
> there seems to be some other commit in the 3.14->3.15-rc1 interval
> that is causing problems as well.
>
> Are there any sort of debugging options for mmc that I can use to try
> to better narrow down whats going wrong?

It seems like you want to debug the mmci host driver and unfortunate
the debug utilities available are only dev_dbg prints. I wouldn't be
surprised if the problem goes away when you enable them. :-)

I have some other locally stored debug patches for mmci, but those are
not re-based and I am not sure you want to deal with them as is.

I guess I need to set up the QEMU environment and run the tests
myself, unless we go for the revert path.
How do you perform the tests, is just a simple mounting/un-mounting
that triggers the problem?
Any specific things that I need to think of when running QEMU?

Kind regards
Uffe

>
> thanks
> -john

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH V2 3/3] mmc: mmci: Reverse IRQ handling for the arm_variant
  2014-06-17  7:33         ` Ulf Hansson
@ 2014-06-17 13:57           ` John Stultz
  2014-06-27 20:37           ` Kees Cook
  1 sibling, 0 replies; 11+ messages in thread
From: John Stultz @ 2014-06-17 13:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 17, 2014 at 12:33 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> It seems like you want to debug the mmci host driver and unfortunate
> the debug utilities available are only dev_dbg prints. I wouldn't be
> surprised if the problem goes away when you enable them. :-)
>
> I have some other locally stored debug patches for mmci, but those are
> not re-based and I am not sure you want to deal with them as is.
>
> I guess I need to set up the QEMU environment and run the tests
> myself, unless we go for the revert path.

Again, a simple revert *doesn't* seem to work. There's something else
in the 3.14->3.15-rc1 series that is also causing problems. Reverting
the patch only seems to work very close to the point that the commit
was made. I'm trying to redo the bisection reverting the first
problematic change each step to see if I can narrow the second problem
(but I've had some interruptions so I've not made much progress there
yet).

> How do you perform the tests, is just a simple mounting/un-mounting
> that triggers the problem?
> Any specific things that I need to think of when running QEMU?

Unfortunately its not quite so simple as mounting/unmounting.

My environment is the Linaro Android image:
http://releases.linaro.org/14.04/android/images/armv7-lsk/vexpress.img.bz2
(using the initrd in
http://releases.linaro.org/14.04/android/images/armv7-lsk/boot.tar.bz2).

My qemu line is:
QEMU_AUDIO_DRV=none qemu-system-arm -kernel zImage -initrd initrd -M
vexpress-a9 -cpu cortex-a9 -nographic -vnc :0 -m 1G -append
'root=/dev/mmcblk0p1 rw mem=1024M raid=noautodetect
console=ttyAMA0,38400n8 rootwait vmalloc=256MB devtmpfs.mount=0' -sd
vexpress-android.img -redir tcp:5555::5555

The first bootup is usually *very* slow (since it has to dex
everything), and I usually use the zImage kernel in the boot.tar.bz2
to let that finish so I can get a sane backup image. Connecting via
vnc to localhost:0 will let you know when its booted all the way and
finished dexing. I'd expect to wait ~5-10 minutes.

I then kill qemu, and make a backup copy of the disk image so future
boots are a bit faster.

And then booting again with my kernel zImage (config attached).

It most frequently triggers within ~90 seconds of boot, but not always
(I'll go ahead and kill qemu again if I don't see it). Usually I can
reproduce it within 4 boots. One potential clue here is that it
doesn't seem to trigger on the first boot w/ the affected kernel.

Every time I see corruption, I revert back to the backup dex'ed disk image.

thanks
-john
-------------- next part --------------
A non-text attachment was scrubbed...
Name: qemu.config
Type: application/octet-stream
Size: 62989 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140617/523bf999/attachment-0001.obj>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH V2 3/3] mmc: mmci: Reverse IRQ handling for the arm_variant
  2014-06-17  7:33         ` Ulf Hansson
  2014-06-17 13:57           ` John Stultz
@ 2014-06-27 20:37           ` Kees Cook
  2014-06-27 22:53             ` John Stultz
  1 sibling, 1 reply; 11+ messages in thread
From: Kees Cook @ 2014-06-27 20:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 17, 2014 at 12:33 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 17 June 2014 01:29, John Stultz <john.stultz@linaro.org> wrote:
>> On Mon, Jun 16, 2014 at 3:41 PM, John Stultz <john.stultz@linaro.org> wrote:
>>> On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>>>> This patch based upon my latest mmc tree and the next branch. I tried
>>>> to apply it for 3.15, and I think you will be able resolve the
>>>> conflict - I should be quite trivial.
>>>
>>> No worries. I just didn't want to waste time resolving it if it was
>>> logically dependent on some other change.
>>>
>>> I'll give it a shot and get back to you.
>>
>> So unfortunately I'm still seeing trouble..
>>
>> [   94.202843] EXT4-fs error (device mmcblk0p5):
>> ext4_mb_generate_buddy:756: group 1, 2303 clusters in bitmap, 2272 in
>> gd; block bitmap corrupt.
>> [   94.203873] Aborting journal on device mmcblk0p5-8.
>> [   94.206553] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5):
>> panic forced after error
>> [   94.206553]
>> [   94.207420] CPU: 0 PID: 1 Comm: init Not tainted
>> 3.15.0-00002-g044f37a-dirty #589
>> [   94.208330] [<c0011725>] (unwind_backtrace) from [<c000f3f1>]
>> (show_stack+0x11/0x14)
>> [   94.208835] [<c000f3f1>] (show_stack) from [<c042d599>]
>> (dump_stack+0x59/0x7c)
>> [   94.209288] [<c042d599>] (dump_stack) from [<c042a57f>] (panic+0x67/0x178)
>> [   94.209724] [<c042a57f>] (panic) from [<c0135055>]
>> (ext4_handle_error+0x69/0x74)
>> [   94.210184] [<c0135055>] (ext4_handle_error) from [<c01358db>]
>> (__ext4_grp_locked_error+0x6b/0x160)
>> [   94.210747] [<c01358db>] (__ext4_grp_locked_error) from
>> [<c0143691>] (ext4_mb_generate_buddy+0x1b1/0x29c)
>> [   94.211392] [<c0143691>] (ext4_mb_generate_buddy) from [<c0144dfd>]
>> (ext4_mb_init_cache+0x219/0x4e0)
>> [   94.211959] [<c0144dfd>] (ext4_mb_init_cache) from [<c014517f>]
>> (ext4_mb_init_group+0xbb/0x13c)
>> [   94.213973] [<c014517f>] (ext4_mb_init_group) from [<c01452f3>]
>> (ext4_mb_good_group+0xf3/0xfc)
>> [   94.214873] [<c01452f3>] (ext4_mb_good_group) from [<c01462ab>]
>> (ext4_mb_regular_allocator+0x153/0x2c4)
>> [   94.215953] [<c01462ab>] (ext4_mb_regular_allocator) from
>> [<c01486b1>] (ext4_mb_new_blocks+0x2fd/0x4e4)
>> [   94.216939] [<c01486b1>] (ext4_mb_new_blocks) from [<c013fe41>]
>> (ext4_ext_map_blocks+0x965/0x10f0)
>> [   94.217694] [<c013fe41>] (ext4_ext_map_blocks) from [<c01230ff>]
>> (ext4_map_blocks+0xff/0x374)
>> [   94.219200] [<c0126839>] (mpage_map_and_submit_extent) from
>> [<c0127049>] (ext4_writepages+0x2b9/0x4e8)
>> [   94.219972] [<c0127049>] (ext4_writepages) from [<c0094e69>]
>> (do_writepages+0x19/0x28)
>> [   94.220648] [<c0094e69>] (do_writepages) from [<c008cbcd>]
>> (__filemap_fdatawrite_range+0x3d/0x44)
>> [   94.221391] [<c008cbcd>] (__filemap_fdatawrite_range) from
>> [<c008cc3f>] (filemap_flush+0x23/0x28)
>> [   94.222135] [<c008cc3f>] (filemap_flush) from [<c012c419>]
>> (ext4_rename+0x2f9/0x3e4)
>> [   94.222806] [<c012c419>] (ext4_rename) from [<c00c3707>]
>> (vfs_rename+0x183/0x45c)
>> [   94.223496] [<c00c3707>] (vfs_rename) from [<c00c3c0b>]
>> (SyS_renameat2+0x22b/0x26c)
>> [   94.224154] [<c00c3c0b>] (SyS_renameat2) from [<c00c3c83>]
>> (SyS_rename+0x1f/0x24)
>> [   94.224801] [<c00c3c83>] (SyS_rename) from [<c000cd41>]
>> (ret_fast_syscall+0x1/0x5c)
>>
>>
>> That said, this mirrors the behavior when I was reverting your change
>> by hand on-top of 3.15. While git bisect pointed to your patch and
>> reverting it from the commit seems to resolve the issue at that point,
>> there seems to be some other commit in the 3.14->3.15-rc1 interval
>> that is causing problems as well.
>>
>> Are there any sort of debugging options for mmc that I can use to try
>> to better narrow down whats going wrong?
>
> It seems like you want to debug the mmci host driver and unfortunate
> the debug utilities available are only dev_dbg prints. I wouldn't be
> surprised if the problem goes away when you enable them. :-)
>
> I have some other locally stored debug patches for mmci, but those are
> not re-based and I am not sure you want to deal with them as is.
>
> I guess I need to set up the QEMU environment and run the tests
> myself, unless we go for the revert path.
> How do you perform the tests, is just a simple mounting/un-mounting
> that triggers the problem?
> Any specific things that I need to think of when running QEMU?

FWIW, I'm hitting this problem as well. For me, it is every time I try
to boot. Only reverting to 3.14 makes it go away, and this series
doesn't fix it for me either. :(

My only difference is that I don't run with an initrd:

qemu-system-arm -nographic -m 1024 -M vexpress-a15 -dtb
rtsm_ve-cortex_a15x4.dtb -kernel ~/src/linux/arch/arm/boot/zImage
-drive file=$HOME/image/arm/vda.qcow2,if=sd,format=qcow2 -append
"root=/dev/mmcblk0p1 console=ttyAMA0"

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH V2 3/3] mmc: mmci: Reverse IRQ handling for the arm_variant
  2014-06-27 20:37           ` Kees Cook
@ 2014-06-27 22:53             ` John Stultz
  2014-07-01 17:45               ` Kees Cook
  0 siblings, 1 reply; 11+ messages in thread
From: John Stultz @ 2014-06-27 22:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jun 27, 2014 at 1:37 PM, Kees Cook <keescook@chromium.org> wrote:
> On Tue, Jun 17, 2014 at 12:33 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> On 17 June 2014 01:29, John Stultz <john.stultz@linaro.org> wrote:
>>> On Mon, Jun 16, 2014 at 3:41 PM, John Stultz <john.stultz@linaro.org> wrote:
>>>> On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>>>>> This patch based upon my latest mmc tree and the next branch. I tried
>>>>> to apply it for 3.15, and I think you will be able resolve the
>>>>> conflict - I should be quite trivial.
>>>>
>>>> No worries. I just didn't want to waste time resolving it if it was
>>>> logically dependent on some other change.
>>>>
>>>> I'll give it a shot and get back to you.
>>>
>>> So unfortunately I'm still seeing trouble..
>>>
>>> [   94.202843] EXT4-fs error (device mmcblk0p5):
>>> ext4_mb_generate_buddy:756: group 1, 2303 clusters in bitmap, 2272 in
>>> gd; block bitmap corrupt.
>>> [   94.203873] Aborting journal on device mmcblk0p5-8.
>>> [   94.206553] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5):
>>> panic forced after error
>>> [   94.206553]
>>> [   94.207420] CPU: 0 PID: 1 Comm: init Not tainted
>>> 3.15.0-00002-g044f37a-dirty #589
>>> [   94.208330] [<c0011725>] (unwind_backtrace) from [<c000f3f1>]
>>> (show_stack+0x11/0x14)
>>> [   94.208835] [<c000f3f1>] (show_stack) from [<c042d599>]
>>> (dump_stack+0x59/0x7c)
>>> [   94.209288] [<c042d599>] (dump_stack) from [<c042a57f>] (panic+0x67/0x178)
>>> [   94.209724] [<c042a57f>] (panic) from [<c0135055>]
>>> (ext4_handle_error+0x69/0x74)
>>> [   94.210184] [<c0135055>] (ext4_handle_error) from [<c01358db>]
>>> (__ext4_grp_locked_error+0x6b/0x160)
>>> [   94.210747] [<c01358db>] (__ext4_grp_locked_error) from
>>> [<c0143691>] (ext4_mb_generate_buddy+0x1b1/0x29c)
>>> [   94.211392] [<c0143691>] (ext4_mb_generate_buddy) from [<c0144dfd>]
>>> (ext4_mb_init_cache+0x219/0x4e0)
>>> [   94.211959] [<c0144dfd>] (ext4_mb_init_cache) from [<c014517f>]
>>> (ext4_mb_init_group+0xbb/0x13c)
>>> [   94.213973] [<c014517f>] (ext4_mb_init_group) from [<c01452f3>]
>>> (ext4_mb_good_group+0xf3/0xfc)
>>> [   94.214873] [<c01452f3>] (ext4_mb_good_group) from [<c01462ab>]
>>> (ext4_mb_regular_allocator+0x153/0x2c4)
>>> [   94.215953] [<c01462ab>] (ext4_mb_regular_allocator) from
>>> [<c01486b1>] (ext4_mb_new_blocks+0x2fd/0x4e4)
>>> [   94.216939] [<c01486b1>] (ext4_mb_new_blocks) from [<c013fe41>]
>>> (ext4_ext_map_blocks+0x965/0x10f0)
>>> [   94.217694] [<c013fe41>] (ext4_ext_map_blocks) from [<c01230ff>]
>>> (ext4_map_blocks+0xff/0x374)
>>> [   94.219200] [<c0126839>] (mpage_map_and_submit_extent) from
>>> [<c0127049>] (ext4_writepages+0x2b9/0x4e8)
>>> [   94.219972] [<c0127049>] (ext4_writepages) from [<c0094e69>]
>>> (do_writepages+0x19/0x28)
>>> [   94.220648] [<c0094e69>] (do_writepages) from [<c008cbcd>]
>>> (__filemap_fdatawrite_range+0x3d/0x44)
>>> [   94.221391] [<c008cbcd>] (__filemap_fdatawrite_range) from
>>> [<c008cc3f>] (filemap_flush+0x23/0x28)
>>> [   94.222135] [<c008cc3f>] (filemap_flush) from [<c012c419>]
>>> (ext4_rename+0x2f9/0x3e4)
>>> [   94.222806] [<c012c419>] (ext4_rename) from [<c00c3707>]
>>> (vfs_rename+0x183/0x45c)
>>> [   94.223496] [<c00c3707>] (vfs_rename) from [<c00c3c0b>]
>>> (SyS_renameat2+0x22b/0x26c)
>>> [   94.224154] [<c00c3c0b>] (SyS_renameat2) from [<c00c3c83>]
>>> (SyS_rename+0x1f/0x24)
>>> [   94.224801] [<c00c3c83>] (SyS_rename) from [<c000cd41>]
>>> (ret_fast_syscall+0x1/0x5c)
>>>
>>>
>>> That said, this mirrors the behavior when I was reverting your change
>>> by hand on-top of 3.15. While git bisect pointed to your patch and
>>> reverting it from the commit seems to resolve the issue at that point,
>>> there seems to be some other commit in the 3.14->3.15-rc1 interval
>>> that is causing problems as well.
>>>
>>> Are there any sort of debugging options for mmc that I can use to try
>>> to better narrow down whats going wrong?
>>
>> It seems like you want to debug the mmci host driver and unfortunate
>> the debug utilities available are only dev_dbg prints. I wouldn't be
>> surprised if the problem goes away when you enable them. :-)
>>
>> I have some other locally stored debug patches for mmci, but those are
>> not re-based and I am not sure you want to deal with them as is.
>>
>> I guess I need to set up the QEMU environment and run the tests
>> myself, unless we go for the revert path.
>> How do you perform the tests, is just a simple mounting/un-mounting
>> that triggers the problem?
>> Any specific things that I need to think of when running QEMU?
>
> FWIW, I'm hitting this problem as well. For me, it is every time I try
> to boot. Only reverting to 3.14 makes it go away, and this series
> doesn't fix it for me either. :(
>
> My only difference is that I don't run with an initrd:
>
> qemu-system-arm -nographic -m 1024 -M vexpress-a15 -dtb
> rtsm_ve-cortex_a15x4.dtb -kernel ~/src/linux/arch/arm/boot/zImage
> -drive file=$HOME/image/arm/vda.qcow2,if=sd,format=qcow2 -append
> "root=/dev/mmcblk0p1 console=ttyAMA0"

I've been continuing to try to bisect this down with
8d94b54d99ea968a9d188ca0e68793ebed601220 and
e7f3d22289e4307b3071cc18b1d8ecc6598c0be4 reverted each step. It seems
like it pops up somewhere between 3.15-rc6 and 3.15-rc7, but the
bisection results are really inconsistent.  I suspect it actually
shows up earlier, its just its harder to trip the problem with the
patches reverted, so I'm marking good commits that are actually bad.

If you are seeing this on every bootup, it might be worth trying to do
the bisection with the two commits above reverted to see if you can
narrow it down any better?

thanks
-john

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH V2 3/3] mmc: mmci: Reverse IRQ handling for the arm_variant
  2014-06-27 22:53             ` John Stultz
@ 2014-07-01 17:45               ` Kees Cook
  2014-07-01 18:19                 ` John Stultz
  0 siblings, 1 reply; 11+ messages in thread
From: Kees Cook @ 2014-07-01 17:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jun 27, 2014 at 3:53 PM, John Stultz <john.stultz@linaro.org> wrote:
> On Fri, Jun 27, 2014 at 1:37 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Tue, Jun 17, 2014 at 12:33 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>>> On 17 June 2014 01:29, John Stultz <john.stultz@linaro.org> wrote:
>>>> On Mon, Jun 16, 2014 at 3:41 PM, John Stultz <john.stultz@linaro.org> wrote:
>>>>> On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>>>>>> This patch based upon my latest mmc tree and the next branch. I tried
>>>>>> to apply it for 3.15, and I think you will be able resolve the
>>>>>> conflict - I should be quite trivial.
>>>>>
>>>>> No worries. I just didn't want to waste time resolving it if it was
>>>>> logically dependent on some other change.
>>>>>
>>>>> I'll give it a shot and get back to you.
>>>>
>>>> So unfortunately I'm still seeing trouble..
>>>>
>>>> [   94.202843] EXT4-fs error (device mmcblk0p5):
>>>> ext4_mb_generate_buddy:756: group 1, 2303 clusters in bitmap, 2272 in
>>>> gd; block bitmap corrupt.
>>>> [   94.203873] Aborting journal on device mmcblk0p5-8.
>>>> [   94.206553] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5):
>>>> panic forced after error
>>>> [   94.206553]
>>>> [   94.207420] CPU: 0 PID: 1 Comm: init Not tainted
>>>> 3.15.0-00002-g044f37a-dirty #589
>>>> [   94.208330] [<c0011725>] (unwind_backtrace) from [<c000f3f1>]
>>>> (show_stack+0x11/0x14)
>>>> [   94.208835] [<c000f3f1>] (show_stack) from [<c042d599>]
>>>> (dump_stack+0x59/0x7c)
>>>> [   94.209288] [<c042d599>] (dump_stack) from [<c042a57f>] (panic+0x67/0x178)
>>>> [   94.209724] [<c042a57f>] (panic) from [<c0135055>]
>>>> (ext4_handle_error+0x69/0x74)
>>>> [   94.210184] [<c0135055>] (ext4_handle_error) from [<c01358db>]
>>>> (__ext4_grp_locked_error+0x6b/0x160)
>>>> [   94.210747] [<c01358db>] (__ext4_grp_locked_error) from
>>>> [<c0143691>] (ext4_mb_generate_buddy+0x1b1/0x29c)
>>>> [   94.211392] [<c0143691>] (ext4_mb_generate_buddy) from [<c0144dfd>]
>>>> (ext4_mb_init_cache+0x219/0x4e0)
>>>> [   94.211959] [<c0144dfd>] (ext4_mb_init_cache) from [<c014517f>]
>>>> (ext4_mb_init_group+0xbb/0x13c)
>>>> [   94.213973] [<c014517f>] (ext4_mb_init_group) from [<c01452f3>]
>>>> (ext4_mb_good_group+0xf3/0xfc)
>>>> [   94.214873] [<c01452f3>] (ext4_mb_good_group) from [<c01462ab>]
>>>> (ext4_mb_regular_allocator+0x153/0x2c4)
>>>> [   94.215953] [<c01462ab>] (ext4_mb_regular_allocator) from
>>>> [<c01486b1>] (ext4_mb_new_blocks+0x2fd/0x4e4)
>>>> [   94.216939] [<c01486b1>] (ext4_mb_new_blocks) from [<c013fe41>]
>>>> (ext4_ext_map_blocks+0x965/0x10f0)
>>>> [   94.217694] [<c013fe41>] (ext4_ext_map_blocks) from [<c01230ff>]
>>>> (ext4_map_blocks+0xff/0x374)
>>>> [   94.219200] [<c0126839>] (mpage_map_and_submit_extent) from
>>>> [<c0127049>] (ext4_writepages+0x2b9/0x4e8)
>>>> [   94.219972] [<c0127049>] (ext4_writepages) from [<c0094e69>]
>>>> (do_writepages+0x19/0x28)
>>>> [   94.220648] [<c0094e69>] (do_writepages) from [<c008cbcd>]
>>>> (__filemap_fdatawrite_range+0x3d/0x44)
>>>> [   94.221391] [<c008cbcd>] (__filemap_fdatawrite_range) from
>>>> [<c008cc3f>] (filemap_flush+0x23/0x28)
>>>> [   94.222135] [<c008cc3f>] (filemap_flush) from [<c012c419>]
>>>> (ext4_rename+0x2f9/0x3e4)
>>>> [   94.222806] [<c012c419>] (ext4_rename) from [<c00c3707>]
>>>> (vfs_rename+0x183/0x45c)
>>>> [   94.223496] [<c00c3707>] (vfs_rename) from [<c00c3c0b>]
>>>> (SyS_renameat2+0x22b/0x26c)
>>>> [   94.224154] [<c00c3c0b>] (SyS_renameat2) from [<c00c3c83>]
>>>> (SyS_rename+0x1f/0x24)
>>>> [   94.224801] [<c00c3c83>] (SyS_rename) from [<c000cd41>]
>>>> (ret_fast_syscall+0x1/0x5c)
>>>>
>>>>
>>>> That said, this mirrors the behavior when I was reverting your change
>>>> by hand on-top of 3.15. While git bisect pointed to your patch and
>>>> reverting it from the commit seems to resolve the issue at that point,
>>>> there seems to be some other commit in the 3.14->3.15-rc1 interval
>>>> that is causing problems as well.
>>>>
>>>> Are there any sort of debugging options for mmc that I can use to try
>>>> to better narrow down whats going wrong?
>>>
>>> It seems like you want to debug the mmci host driver and unfortunate
>>> the debug utilities available are only dev_dbg prints. I wouldn't be
>>> surprised if the problem goes away when you enable them. :-)
>>>
>>> I have some other locally stored debug patches for mmci, but those are
>>> not re-based and I am not sure you want to deal with them as is.
>>>
>>> I guess I need to set up the QEMU environment and run the tests
>>> myself, unless we go for the revert path.
>>> How do you perform the tests, is just a simple mounting/un-mounting
>>> that triggers the problem?
>>> Any specific things that I need to think of when running QEMU?
>>
>> FWIW, I'm hitting this problem as well. For me, it is every time I try
>> to boot. Only reverting to 3.14 makes it go away, and this series
>> doesn't fix it for me either. :(
>>
>> My only difference is that I don't run with an initrd:
>>
>> qemu-system-arm -nographic -m 1024 -M vexpress-a15 -dtb
>> rtsm_ve-cortex_a15x4.dtb -kernel ~/src/linux/arch/arm/boot/zImage
>> -drive file=$HOME/image/arm/vda.qcow2,if=sd,format=qcow2 -append
>> "root=/dev/mmcblk0p1 console=ttyAMA0"
>
> I've been continuing to try to bisect this down with
> 8d94b54d99ea968a9d188ca0e68793ebed601220 and
> e7f3d22289e4307b3071cc18b1d8ecc6598c0be4 reverted each step. It seems
> like it pops up somewhere between 3.15-rc6 and 3.15-rc7, but the
> bisection results are really inconsistent.  I suspect it actually
> shows up earlier, its just its harder to trip the problem with the
> patches reverted, so I'm marking good commits that are actually bad.
>
> If you are seeing this on every bootup, it might be worth trying to do
> the bisection with the two commits above reverted to see if you can
> narrow it down any better?

And now I can't reproduce it! I think I was being tricked by
filesystem corruption that spanned some of my test boots. I'm going to
start this over and try again.

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH V2 3/3] mmc: mmci: Reverse IRQ handling for the arm_variant
  2014-07-01 17:45               ` Kees Cook
@ 2014-07-01 18:19                 ` John Stultz
  0 siblings, 0 replies; 11+ messages in thread
From: John Stultz @ 2014-07-01 18:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 1, 2014 at 10:45 AM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Jun 27, 2014 at 3:53 PM, John Stultz <john.stultz@linaro.org> wrote:
>> On Fri, Jun 27, 2014 at 1:37 PM, Kees Cook <keescook@chromium.org> wrote:
>> I've been continuing to try to bisect this down with
>> 8d94b54d99ea968a9d188ca0e68793ebed601220 and
>> e7f3d22289e4307b3071cc18b1d8ecc6598c0be4 reverted each step. It seems
>> like it pops up somewhere between 3.15-rc6 and 3.15-rc7, but the
>> bisection results are really inconsistent.  I suspect it actually
>> shows up earlier, its just its harder to trip the problem with the
>> patches reverted, so I'm marking good commits that are actually bad.
>>
>> If you are seeing this on every bootup, it might be worth trying to do
>> the bisection with the two commits above reverted to see if you can
>> narrow it down any better?
>
> And now I can't reproduce it! I think I was being tricked by
> filesystem corruption that spanned some of my test boots. I'm going to
> start this over and try again.

Yea. :( I've been starting over and over on this quite a bit as well.
With those two patches reverted, it gets harder to reproduce reliably,
but I'm still seeing trobule.

Bisecting is still not pointing to anything obvious, my current
earliest bad commit right now is:
2aafe1a4d451866e3e7b476e2fa0813b69b313c1. I'm still working on it, but
its slow going.


thanks
-john

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2014-07-01 18:19 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-06-16  8:09 [PATCH V2 3/3] mmc: mmci: Reverse IRQ handling for the arm_variant Ulf Hansson
2014-06-16 18:58 ` John Stultz
2014-06-16 21:20   ` Ulf Hansson
2014-06-16 22:41     ` John Stultz
2014-06-16 23:29       ` John Stultz
2014-06-17  7:33         ` Ulf Hansson
2014-06-17 13:57           ` John Stultz
2014-06-27 20:37           ` Kees Cook
2014-06-27 22:53             ` John Stultz
2014-07-01 17:45               ` Kees Cook
2014-07-01 18:19                 ` John Stultz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).