linux-ide.vger.kernel.org archive mirror
* AHCI bug: a lockup in ahci_interrupt with fbs enabled pmp
From: Yu Liu @ 2013-06-06  6:33 UTC
  To: linux-ide

Hi all,

I hit a lockup while running an I/O test on disks connected to an
AHCI host through an FBS-enabled PMP board.

The cause of the lockup looks like this:
    ahci_interrupt()
        | spin_lock(&host->lock);                // get host->lock
        | ahci_port_intr()
            | ahci_error_intr()                  // status & PORT_IRQ_ERROR
                | ata_link_online()              // if fbs_enabled
                    | sata_scr_read()
                        | sata_pmp_scr_read()    // using pmp
                            | ata_exec_internal()
                                | ata_exec_internal_sg()
                                    | spin_lock_irqsave(ap->lock, flags);
Since ap->lock == &host->lock, the second acquisition spins on a lock
that the same CPU already holds, so the handler never returns.

Can someone confirm the issue? Did I miss anything?
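
The self-deadlock in the call chain above can be sketched in user space.
This is only an illustration under stated assumptions, not kernel code: a
POSIX error-checking mutex stands in for the spinlock so that the second
acquisition reports EDEADLK instead of spinning forever, and every
fake_* name is hypothetical.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_ERRORCHECK under strict -std= */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the host and port structures. */
struct fake_host { pthread_mutex_t lock; };
struct fake_port { pthread_mutex_t *lock; };

/* Set up an error-checking mutex and alias it, mirroring ap->lock == &host->lock. */
static int fake_init(struct fake_host *host, struct fake_port *ap)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&host->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    ap->lock = &host->lock;
    return rc;
}

/* Stand-in for the innermost call: it takes the port lock itself. */
static int fake_exec_internal(struct fake_port *ap)
{
    int rc = pthread_mutex_lock(ap->lock);

    if (rc)
        return rc;          /* EDEADLK if this thread already holds the lock */
    pthread_mutex_unlock(ap->lock);
    return 0;
}

/* Stand-in for the interrupt handler: holds the host lock across the call. */
static int fake_interrupt(struct fake_host *host, struct fake_port *ap)
{
    int rc;

    pthread_mutex_lock(&host->lock);
    rc = fake_exec_internal(ap);    /* re-acquires the same lock */
    pthread_mutex_unlock(&host->lock);
    return rc;
}
```

Calling fake_exec_internal() on its own succeeds, while calling it through
fake_interrupt() fails on the inner lock. A kernel spinlock has no such
error path, which is why the real handler hangs.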


My dump info is listed below:
---
RIP: 0010:[<ffffffff814c867f>]  [<ffffffff814c867f>] _spin_lock_irqsave+0x2f/0x40
RSP: 0000:ffff880001e43c38  EFLAGS: 00000097
RAX: 000000000000b685 RBX: ffff88007654de48 RCX: 000000000000b684
RDX: 0000000000000082 RSI: ffff880001e43d88 RDI: ffff88007930c158
RBP: ffff880001e43c38 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000001ff8 R11: 0000000000000246 R12: ffff88007654c000
R13: ffff88007654de48 R14: ffff88007654dce0 R15: 0000000000000000
FS:  00007fc12e0eb700(0000) GS:ffff880001e40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f3b55311cc1 CR3: 0000000078f7a000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process gzip (pid: 15126, threadinfo ffff880067a86000, task ffff880037d02a70)
Stack:
 ffff880001e43ce8 ffffffff813558f7 ffff880000000003 0000000000000000
<0> ffffffff81328537 ffff880001e43cc0 000000008134743b ffff880001e43cc0
<0> e4ffffff81262545 ffff880001e43d88 0000000000000000 ffff880001e43cc0
Call Trace:
 <IRQ>
 [<ffffffff813558f7>] ata_exec_internal_sg+0x67/0x570
 [<ffffffff81328537>] ? put_device+0x17/0x20
 [<ffffffff81355e79>] ata_exec_internal+0x79/0xb0
 [<ffffffff8134699f>] ? scsi_run_queue+0xcf/0x380
 [<ffffffff813406c0>] ? __scsi_put_command+0x60/0xa0
 [<ffffffff8136740f>] sata_pmp_read+0x7f/0xb0
 [<ffffffff8101adf5>] ? native_sched_clock+0x15/0x70
 [<ffffffff81367505>] sata_pmp_scr_read+0x35/0xb0
 [<ffffffff81353096>] sata_scr_read+0x26/0x60
 [<ffffffff81353708>] ata_phys_link_online+0x18/0x30
 [<ffffffff81353750>] ata_link_online+0x30/0x70
 [<ffffffffa066b964>] ahci_interrupt+0x684/0x790 [ahci]
 [<ffffffff810d7750>] handle_IRQ_event+0x60/0x170
 [<ffffffff81073983>] ? __do_softirq+0x113/0x1d0
 [<ffffffff810d9e46>] handle_edge_irq+0xc6/0x160
 [<ffffffff81015fb9>] handle_irq+0x49/0xa0
 [<ffffffff814cd62c>] do_IRQ+0x6c/0xf0
 [<ffffffff81013ad3>] ret_from_intr+0x0/0x11
 <EOI>


* Re: AHCI bug: a lockup in ahci_interrupt with fbs enabled pmp
From: Tejun Heo @ 2013-06-06 21:47 UTC
  To: Yu Liu; +Cc: linux-ide, Shane Huang

Cc'ing Shane.

On Thu, Jun 06, 2013 at 02:33:20PM +0800, Yu Liu wrote:
> Hi all,
> 
> I hit a lockup while running an I/O test on disks connected to an
> AHCI host through an FBS-enabled PMP board.
> 
> The cause of the lockup looks like this:
>     ahci_interrupt()
>         | spin_lock(&host->lock);                // get host->lock
>         | ahci_port_intr()
>             | ahci_error_intr()                  // status & PORT_IRQ_ERROR
>                 | ata_link_online()              // if fbs_enabled
>                     | sata_scr_read()
>                         | sata_pmp_scr_read()    // using pmp
>                             | ata_exec_internal()
>                                 | ata_exec_internal_sg()
>                                     | spin_lock_irqsave(ap->lock, flags);
> Since ap->lock == &host->lock, the second acquisition spins on a lock
> that the same CPU already holds, so the handler never returns.
> 
> Can someone confirm the issue? Did I miss anything?

Yeah, it's a bug.  ata_link_online() can't be called from interrupt
handlers.  Shane?  Can you please look into it?  What's the purpose of
ata_link_online() in ahci_error_intr()?

Thanks.

-- 
tejun


* RE: AHCI bug: a lockup in ahci_interrupt with fbs enabled pmp
From: Huang, Shane @ 2013-06-07  4:29 UTC
  To: Tejun Heo, Yu Liu; +Cc: linux-ide@vger.kernel.org, Huang, Shane

> Yeah, it's a bug.  ata_link_online() can't be called from interrupt
> handlers.  Shane?  Can you please look into it?  What's the purpose
> of ata_link_online() in ahci_error_intr()?

ata_link_online() was used to check that the PMP link is active.
Should it be replaced by ata_link_active()?


Thanks,
Shane




* Re: AHCI bug: a lockup in ahci_interrupt with fbs enabled pmp
From: Tejun Heo @ 2013-06-07 22:56 UTC
  To: Huang, Shane; +Cc: Yu Liu, linux-ide@vger.kernel.org

On Fri, Jun 07, 2013 at 04:29:47AM +0000, Huang, Shane wrote:
> > Yeah, it's a bug.  ata_link_online() can't be called from interrupt
> > handlers.  Shane?  Can you please look into it?  What's the purpose
> > of ata_link_online() in ahci_error_intr()?
> 
> ata_link_online() was used to check that the PMP link is active.
> Should it be replaced by ata_link_active()?

ata_link_active() asks whether there are commands in progress.  I
don't think that fits in there.  Can't it just bounce to EH for actual
error handling?  Why is the link online check necessary?
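
The "bounce to EH" idea can be sketched as follows. This is a user-space
illustration under stated assumptions, not libata code: the interrupt
path only records the error and marks error handling as pending while it
holds the lock, and a later context that is allowed to block performs
the slow query. All eh_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical port state shared by the interrupt path and the handler. */
struct eh_port {
    pthread_mutex_t lock;
    bool eh_pending;            /* set by the interrupt path */
    unsigned int err_status;    /* accumulated error bits */
    int slow_queries;           /* counts blocking queries performed */
};

/* Stand-in for a blocking query (e.g. a register read that needs the lock). */
static void eh_slow_query(struct eh_port *ap)
{
    pthread_mutex_lock(&ap->lock);
    ap->slow_queries++;
    pthread_mutex_unlock(&ap->lock);
}

/* Interrupt path: with the lock held, only record the error and flag it. */
static void eh_error_intr(struct eh_port *ap, unsigned int status)
{
    pthread_mutex_lock(&ap->lock);
    ap->err_status |= status;
    ap->eh_pending = true;
    pthread_mutex_unlock(&ap->lock);
}

/* Error handler: runs later, drops the lock before doing blocking work. */
static bool eh_error_handler(struct eh_port *ap)
{
    bool pending;

    pthread_mutex_lock(&ap->lock);
    pending = ap->eh_pending;
    ap->eh_pending = false;
    pthread_mutex_unlock(&ap->lock);

    if (pending)
        eh_slow_query(ap);      /* safe here: the lock is not held */
    return pending;
}
```

With this split, the path that holds the lock never calls anything that
takes the lock again, and the link state can be examined from the
handler instead.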

Thanks.

-- 
tejun


* RE: AHCI bug: a lockup in ahci_interrupt with fbs enabled pmp
From: Huang, Shane @ 2013-06-08  5:59 UTC
  To: Tejun Heo; +Cc: Yu Liu, linux-ide@vger.kernel.org, Huang, Shane

Tejun,

> ata_link_active() asks whether there are commands in progress.  I
> don't think that fits in there.  Can't it just bounce to EH for actual
> error handling?  Why is the link online check necessary?

I tried hard to recall why I put the ata_link_online() check there, and
finally found that you suggested it in 2009 when you reviewed v2 :-)

http://marc.info/?l=linux-ide&m=125170571525422&w=2

Should I submit a patch to remove the online check, or will you handle it?

Thanks,
Shane




* Re: AHCI bug: a lockup in ahci_interrupt with fbs enabled pmp
From: Tejun Heo @ 2013-06-08  6:01 UTC
  To: Huang, Shane; +Cc: Yu Liu, linux-ide@vger.kernel.org

Hello,

On Sat, Jun 08, 2013 at 05:59:27AM +0000, Huang, Shane wrote:
> > ata_link_active() asks whether there are commands in progress.  I
> > don't think that fits in there.  Can't it just bounce to EH for actual
> > error handling?  Why is the link online check necessary?
> 
> I tried hard to recall why I put the ata_link_online() check there, and
> finally found that you suggested it in 2009 when you reviewed v2 :-)
> 
> http://marc.info/?l=linux-ide&m=125170571525422&w=2

Heh, so it's my own stupidity. :)

> Should I submit a patch to remove the online check, or will you handle it?

It'd be great if you can submit a patch.

Thanks a lot!

-- 
tejun
