public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* Emulex lpfcdriver v8.0.2 available
@ 2004-05-26 23:19 Smart, James
  2004-06-01 18:52 ` Anton Blanchard
  0 siblings, 1 reply; 9+ messages in thread
From: Smart, James @ 2004-05-26 23:19 UTC (permalink / raw)
  To: 'Christoph Hellwig'; +Cc: 'linux-scsi@vger.kernel.org'

Christoph,

FYI - we've updated the image on SourceForge. In this drop, we've addressed
many of your comments, including those on moving the discovery tasklet to a
thread, and moving the timer functions inline. The full changelog is up on
SF.

Thanks.

-- James S

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Emulex lpfcdriver v8.0.2 available
  2004-05-26 23:19 Emulex lpfcdriver v8.0.2 available Smart, James
@ 2004-06-01 18:52 ` Anton Blanchard
  2004-06-01 19:01   ` 'Christoph Hellwig'
  0 siblings, 1 reply; 9+ messages in thread
From: Anton Blanchard @ 2004-06-01 18:52 UTC (permalink / raw)
  To: Smart, James
  Cc: 'Christoph Hellwig', 'linux-scsi@vger.kernel.org'


Hi James,

> FYI - we've updated the image on SourceForge. In this drop, we've addressed
> many of your comments, including those on moving the discovery tasklet to a
> thread, and moving the timer functions inline. The full changelog is up on
> SF.

We gave it a spin on a ppc64 box and managed to hit the following
backtrace when doing heavy writeout. I wonder if a number of these
atomic allocations (in lpfc_get_scsi_buf etc) should be using a mempool,
otherwise we could deadlock on writeout.

Anton

md0_raid5: page allocation failure. order:0, mode:0x20
Call Trace:
[c000000000078e80] .__alloc_pages+0x448/0x450 (unreliable)
[c00000000009be0c] .alloc_pages_current+0xc0/0xe4
[c000000000078edc] .__get_free_pages+0x54/0x1e0
[c00000000007df40] .kmem_getpages+0x48/0x210
[c00000000007e3f8] .cache_alloc_refill+0x2f0/0x67c
[c00000000007e8dc] .kmem_cache_alloc+0x74/0x78
[d0000000003bdd98] .lpfc_get_scsi_buf+0x40/0x2bc [lpfc]
[d0000000003be1c4] .lpfc_queuecommand+0xa0/0xac4 [lpfc]
[d00000000006e06c] .scsi_dispatch_cmd+0x22c/0x378 [scsi_mod]
[d000000000074b74] .scsi_request_fn+0x2e4/0x590 [scsi_mod]
[c00000000021b5c0] .blk_run_queue+0x60/0x90
[d000000000073dd8] .scsi_run_queue+0x134/0x2a8 [scsi_mod]
[d000000000075068] .scsi_end_request+0x12c/0x168 [scsi_mod]
[d000000000075220] .scsi_io_completion+0x17c/0x524 [scsi_mod]
[d00000000003a5fc] .sd_rw_intr+0xac/0x304 [sd_mod]
[d00000000006d8b4] .scsi_finish_command+0xd8/0x130 [scsi_mod]
[d00000000006e4e8] .scsi_softirq+0x178/0x188 [scsi_mod]
[c000000000053120] .__do_softirq+0xa8/0x16c
[c0000000000171c8] .call_do_softirq+0x14/0x24
[c000000000011ddc] .do_softirq+0x90/0xa0
[c000000000012b54] .do_IRQ+0xf8/0x118
[c00000000000b034] HardwareInterrupt_entry+0x14/0x18
--- Exception: 500 at .__make_request+0x428/0x810
    LR = .__make_request+0x3b8/0x810
[c0000000002197c4] .generic_make_request+0x138/0x21c
[d0000000004485d4] .handle_stripe+0x1160/0x13d8 [raid5]
[d000000000449f7c] .raid5d+0xe0/0x248 [raid5]
[c00000000028e28c] .md_thread+0x204/0x294
[c000000000017764] .kernel_thread+0x4c/0x68

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Emulex lpfcdriver v8.0.2 available
  2004-06-01 18:52 ` Anton Blanchard
@ 2004-06-01 19:01   ` 'Christoph Hellwig'
  0 siblings, 0 replies; 9+ messages in thread
From: 'Christoph Hellwig' @ 2004-06-01 19:01 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: Smart, James, 'linux-scsi@vger.kernel.org'

On Wed, Jun 02, 2004 at 04:52:00AM +1000, Anton Blanchard wrote:
> We gave it a spin on a ppc64 box and managed to hit the following
> backtrace when doing heavy writeout. I wonder if a number of these
> atomic allocations (in lpfc_get_scsi_buf etc) should be using a mempool,
> otherwise we could deadlock on writeout.

The ->queuecommand path has a few more problems.  The number of different
memory allocation is just too many, this needs some thinking into needing
less allocations.  Most allocations are happening under a spinlock and
with irqs disabled which is not exactly a good idea.  And the whole
I/O submission path is very badly structured and hard to follow.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Emulex lpfcdriver v8.0.2 available
@ 2004-06-02 13:03 Smart, James
  2004-06-08  7:19 ` Anton Blanchard
  0 siblings, 1 reply; 9+ messages in thread
From: Smart, James @ 2004-06-02 13:03 UTC (permalink / raw)
  To: 'Christoph Hellwig', Anton Blanchard
  Cc: 'linux-scsi@vger.kernel.org'


Anton,

Thanks for the feedback.

We've avoided (actually removed) more driver memory pools. The thought was
that it was unneeded, especially as kmem_alloc is ultimately mempool backed.

The real issue, as Christoph points out, is the allocations taking place
under spinlocks, thus requiring the atomic allocations. The driver has a
very ugly lock scheme due to past history, and one which we are hesitant to
rewrite right now (it will happen, but not in the short term).  

Anyway - there's a couple of things we're going to do...  The first is to
reorder the fastpath (queuecommand -> to the hardware) so that we can
perform the allocations outside of locks, thus loose the atomic requirement.
The number of allocations is not excessive, but there are a couple of spots
where separate control structures are always allocated which we can optimize
by embedding them in other structures.  Hopefully, this can make the code
drop due the end of the week.

-- James S


> -----Original Message-----
> From: 'Christoph Hellwig' [mailto:hch@infradead.org]
> Sent: Tuesday, June 01, 2004 3:01 PM
> To: Anton Blanchard
> Cc: Smart, James; 'linux-scsi@vger.kernel.org'
> Subject: Re: Emulex lpfcdriver v8.0.2 available
> 
> 
> On Wed, Jun 02, 2004 at 04:52:00AM +1000, Anton Blanchard wrote:
> > We gave it a spin on a ppc64 box and managed to hit the following
> > backtrace when doing heavy writeout. I wonder if a number of these
> > atomic allocations (in lpfc_get_scsi_buf etc) should be 
> using a mempool,
> > otherwise we could deadlock on writeout.
> 
> The ->queuecommand path has a few more problems.  The number 
> of different
> memory allocation is just too many, this needs some thinking 
> into needing
> less allocations.  Most allocations are happening under a spinlock and
> with irqs disabled which is not exactly a good idea.  And the whole
> I/O submission path is very badly structured and hard to follow.
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Emulex lpfcdriver v8.0.2 available
  2004-06-02 13:03 Smart, James
@ 2004-06-08  7:19 ` Anton Blanchard
  0 siblings, 0 replies; 9+ messages in thread
From: Anton Blanchard @ 2004-06-08  7:19 UTC (permalink / raw)
  To: Smart, James
  Cc: 'Christoph Hellwig', 'linux-scsi@vger.kernel.org'


Hi James,

We scored the following oops on the lpfcdriver-2.6-8.0.3 driver.
lpfc_delay_done hits a NULL pointer. It looks like clkData is not
initialised:

        list_del((struct list_head *)clkData);

I noticed you use casts all over the place in list functions, why not just
pass in &clkData->list?

Anton

NIP: C0000000002D2A40 XER: 0000000020000000 LR: C000000000060AE0
REGS: c00000000ffffb40 TRAP: 0300   Not tainted  (2.6.7-rc2)
MSR: 9000000000001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: 0000000000000000, DSISR: 0000000042000000
TASK: c00000000066e840[0] 'swapper' THREAD: c000000000500000 CPU: 0
GPR00: 0000000000200200 C00000000FFFFDC0 C000000000832EB0 C00000203873E838
GPR04: C00000203873E800 C0000000009EDF00 0000000000000000 C000001FFF976000
GPR08: 0000000000000000 0000000000100100 0000000000000000 9000000000009032
GPR12: C000000000687FC0 C000000000504000 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: C000000000503A00 000000C1000000D5 0000000000000000 0000000000000096
GPR28: C00000000FFFFEC0 C00000000FFFC000 C000000000693818 C000000000C6FF00
NIP [c0000000002d2a40] .lpfc_delay_done+0xa8/0xec
LR [c000000000060ae0] .run_timer_softirq+0x1bc/0x308
Call Trace:
[c000000000084374] .wb_timer_fn+0x20/0x6c (unreliable)
[c000000000060ae0] .run_timer_softirq+0x1bc/0x308
[c00000000005b774] .__do_softirq+0xa8/0x16c
[c000000000016b80] .call_do_softirq+0x14/0x24
[c00000000001102c] .do_softirq+0x90/0xa0
[c0000000000141a8] .timer_interrupt+0x3a4/0x47c
[c00000000000a2b4] Decrementer_common+0xb4/0x100
--- Exception: 901 at .default_idle+0x64/0xac
    LR = .default_idle+0xa4/0xac

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Emulex lpfcdriver v8.0.2 available
@ 2004-06-08 14:29 Smart, James
  2004-06-09  1:39 ` Anton Blanchard
  0 siblings, 1 reply; 9+ messages in thread
From: Smart, James @ 2004-06-08 14:29 UTC (permalink / raw)
  To: 'Anton Blanchard'
  Cc: 'Christoph Hellwig', 'linux-scsi@vger.kernel.org'

Anton,

Ouch!. The bug has already been fixed and will be in the next drop.  Also,
your comment is well taken.

-- james

> -----Original Message-----
> From: Anton Blanchard [mailto:anton@samba.org]
> Sent: Tuesday, June 08, 2004 3:20 AM
> To: Smart, James
> Cc: 'Christoph Hellwig'; 'linux-scsi@vger.kernel.org'
> Subject: Re: Emulex lpfcdriver v8.0.2 available
> 
> 
> 
> Hi James,
> 
> We scored the following oops on the lpfcdriver-2.6-8.0.3 driver.
> lpfc_delay_done hits a NULL pointer. It looks like clkData is not
> initialised:
> 
>         list_del((struct list_head *)clkData);
> 
> I noticed you use casts all over the place in list functions, 
> why not just
> pass in &clkData->list?
> 
> Anton
> 
> NIP: C0000000002D2A40 XER: 0000000020000000 LR: C000000000060AE0
> REGS: c00000000ffffb40 TRAP: 0300   Not tainted  (2.6.7-rc2)
> MSR: 9000000000001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
> DAR: 0000000000000000, DSISR: 0000000042000000
> TASK: c00000000066e840[0] 'swapper' THREAD: c000000000500000 CPU: 0
> GPR00: 0000000000200200 C00000000FFFFDC0 C000000000832EB0 
> C00000203873E838
> GPR04: C00000203873E800 C0000000009EDF00 0000000000000000 
> C000001FFF976000
> GPR08: 0000000000000000 0000000000100100 0000000000000000 
> 9000000000009032
> GPR12: C000000000687FC0 C000000000504000 0000000000000000 
> 0000000000000000
> GPR16: 0000000000000000 0000000000000000 0000000000000000 
> 0000000000000000
> GPR20: 0000000000000000 0000000000000000 0000000000000000 
> 0000000000000000
> GPR24: C000000000503A00 000000C1000000D5 0000000000000000 
> 0000000000000096
> GPR28: C00000000FFFFEC0 C00000000FFFC000 C000000000693818 
> C000000000C6FF00
> NIP [c0000000002d2a40] .lpfc_delay_done+0xa8/0xec
> LR [c000000000060ae0] .run_timer_softirq+0x1bc/0x308
> Call Trace:
> [c000000000084374] .wb_timer_fn+0x20/0x6c (unreliable)
> [c000000000060ae0] .run_timer_softirq+0x1bc/0x308
> [c00000000005b774] .__do_softirq+0xa8/0x16c
> [c000000000016b80] .call_do_softirq+0x14/0x24
> [c00000000001102c] .do_softirq+0x90/0xa0
> [c0000000000141a8] .timer_interrupt+0x3a4/0x47c
> [c00000000000a2b4] Decrementer_common+0xb4/0x100
> --- Exception: 901 at .default_idle+0x64/0xac
>     LR = .default_idle+0xa4/0xac
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Emulex lpfcdriver v8.0.2 available
@ 2004-06-08 14:54 Smart, James
  2004-06-08 17:03 ` 'Christoph Hellwig'
  0 siblings, 1 reply; 9+ messages in thread
From: Smart, James @ 2004-06-08 14:54 UTC (permalink / raw)
  To: Smart, James, 'Christoph Hellwig', Anton Blanchard
  Cc: 'linux-scsi@vger.kernel.org'

FYI - and status update.

A malloc or two was condense on the fast path. However, the reorg so alloc
was not under lock is not done yet. Should be in the next drop. In
queuecommand, we will: unload the host lock, perform all allocations and do
prep data, then take the driver lock, complete state checks and submit to
hardware. Should allow all the ATOMIC allocations to go back to normal for
the queuecommand sequence.

-- James


> -----Original Message-----
> From: Smart, James 
> Sent: Wednesday, June 02, 2004 9:03 AM
> To: 'Christoph Hellwig'; Anton Blanchard
> Cc: 'linux-scsi@vger.kernel.org'
> Subject: RE: Emulex lpfcdriver v8.0.2 available
> 
> 
> 
> Anton,
> 
> Thanks for the feedback.
> 
> We've avoided (actually removed) more driver memory pools. 
> The thought was
> that it was unneeded, especially as kmem_alloc is ultimately 
> mempool backed.
> 
> The real issue, as Christoph points out, is the allocations 
> taking place
> under spinlocks, thus requiring the atomic allocations. The 
> driver has a
> very ugly lock scheme due to past history, and one which we 
> are hesitant to
> rewrite right now (it will happen, but not in the short term).  
> 
> Anyway - there's a couple of things we're going to do...  The 
> first is to
> reorder the fastpath (queuecommand -> to the hardware) so that we can
> perform the allocations outside of locks, thus loose the 
> atomic requirement.
> The number of allocations is not excessive, but there are a 
> couple of spots
> where separate control structures are always allocated which 
> we can optimize
> by embedding them in other structures.  Hopefully, this can 
> make the code
> drop due the end of the week.
> 
> -- James S
> 
> 
> > -----Original Message-----
> > From: 'Christoph Hellwig' [mailto:hch@infradead.org]
> > Sent: Tuesday, June 01, 2004 3:01 PM
> > To: Anton Blanchard
> > Cc: Smart, James; 'linux-scsi@vger.kernel.org'
> > Subject: Re: Emulex lpfcdriver v8.0.2 available
> > 
> > 
> > On Wed, Jun 02, 2004 at 04:52:00AM +1000, Anton Blanchard wrote:
> > > We gave it a spin on a ppc64 box and managed to hit the following
> > > backtrace when doing heavy writeout. I wonder if a number of these
> > > atomic allocations (in lpfc_get_scsi_buf etc) should be 
> > using a mempool,
> > > otherwise we could deadlock on writeout.
> > 
> > The ->queuecommand path has a few more problems.  The number 
> > of different
> > memory allocation is just too many, this needs some thinking 
> > into needing
> > less allocations.  Most allocations are happening under a 
> spinlock and
> > with irqs disabled which is not exactly a good idea.  And the whole
> > I/O submission path is very badly structured and hard to follow.
> > 
> -
> To unsubscribe from this list: send the line "unsubscribe 
> linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Emulex lpfcdriver v8.0.2 available
  2004-06-08 14:54 Smart, James
@ 2004-06-08 17:03 ` 'Christoph Hellwig'
  0 siblings, 0 replies; 9+ messages in thread
From: 'Christoph Hellwig' @ 2004-06-08 17:03 UTC (permalink / raw)
  To: Smart, James; +Cc: Anton Blanchard, 'linux-scsi@vger.kernel.org'

On Tue, Jun 08, 2004 at 10:54:27AM -0400, Smart, James wrote:
> FYI - and status update.
> 
> A malloc or two was condense on the fast path. However, the reorg so alloc
> was not under lock is not done yet. Should be in the next drop. In
> queuecommand, we will: unload the host lock, perform all allocations and do
> prep data, then take the driver lock, complete state checks and submit to
> hardware. Should allow all the ATOMIC allocations to go back to normal for
> the queuecommand sequence.

Not quite normal, it still needs to be GFP_NOIO to avoid deadlocks.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Emulex lpfcdriver v8.0.2 available
  2004-06-08 14:29 Smart, James
@ 2004-06-09  1:39 ` Anton Blanchard
  0 siblings, 0 replies; 9+ messages in thread
From: Anton Blanchard @ 2004-06-09  1:39 UTC (permalink / raw)
  To: Smart, James
  Cc: 'Christoph Hellwig', 'linux-scsi@vger.kernel.org'

 
> Ouch!. The bug has already been fixed and will be in the next drop.  Also,
> your comment is well taken.

Thanks James, another thing I noticed was we printed a whole bunch of:

scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured

Looks like its due to the slave alloc call.

Anton

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2004-06-09  1:42 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-05-26 23:19 Emulex lpfcdriver v8.0.2 available Smart, James
2004-06-01 18:52 ` Anton Blanchard
2004-06-01 19:01   ` 'Christoph Hellwig'
  -- strict thread matches above, loose matches on Subject: below --
2004-06-02 13:03 Smart, James
2004-06-08  7:19 ` Anton Blanchard
2004-06-08 14:29 Smart, James
2004-06-09  1:39 ` Anton Blanchard
2004-06-08 14:54 Smart, James
2004-06-08 17:03 ` 'Christoph Hellwig'

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox