linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* ips.c broken since 2.6.23 on x86_64?
@ 2008-02-13 21:43 Tim Pepper
       [not found] ` <532ABFBDAAC3A34EB12EBA6CEC2838F439964688@ADPE2K703.adaptec.com>
  2008-02-14 11:48 ` FUJITA Tomonori
  0 siblings, 2 replies; 26+ messages in thread
From: Tim Pepper @ 2008-02-13 21:43 UTC (permalink / raw)
  To: linux-scsi; +Cc: FUJITA Tomonori, Mark_Salyzyn, ipslinux

We recently upgraded a production x86_64 machine with serveraid
cards to 2.6.24 and noted that /proc/scsi/scsi showed garbage for our
serveraid service processors.  sg_inq also returned garbage from the
service processors' sg devices.  After a few iterations I started seeing
meaninful stuff in the garbage.  Not sure if it was returning live memory
or just unzero'd.  Either way not good so we went back to a known good,
older kernel and tried to repro on a similar machine.  We got different,
but still bad results in terms of pointing at memory badness.

FWIW, the original machine had the following hardware:
    scsi0 : IBM PCI ServeRAID 7.12.05  Build 761 <ServeRAID 4H>
    scsi1 : IBM PCI ServeRAID 7.12.05  Build 761 <ServeRAID 4M>
and the repro's have been on a machine with just:
    scsi0 : IBM PCI ServeRAID 7.12.05  Build 761 <ServeRAID 4Mx>

On the repro machine I'm getting a hang on ips driver load with the following
logged:

Feb 13 13:16:08 ipstest kernel: [  915.236563] scsi3 : IBM PCI ServeRAID 7.12.05  Build 761 <ServeRAID 4Mx>
Feb 13 13:16:08 ipstest kernel: [  915.236839] Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
Feb 13 13:16:08 ipstest kernel: [  915.236863]  [check_addr+16/80] check_addr+0x10/0x50
Feb 13 13:16:08 ipstest kernel: [  915.237209] PGD 79fff067 PUD 7a898067 PMD 0
Feb 13 13:16:08 ipstest kernel: [  915.237341] Oops: 0000 [1] SMP
Feb 13 13:16:08 ipstest kernel: [  915.237463] CPU 1
Feb 13 13:16:08 ipstest kernel: [  915.239436] Modules linked in: ips aic94xx
Feb 13 13:16:08 ipstest kernel: [  915.239559] Pid: 5213, comm: scsi_scan_3 Not tainted 2.6.23-ips_as_module #3
Feb 13 13:16:08 ipstest kernel: [  915.239692] RIP: 0010:[check_addr+16/80]  [check_addr+16/80] check_addr+0x10/0x50
Feb 13 13:16:08 ipstest kernel: [  915.239932] RSP: 0018:ffff810076d87900  EFLAGS: 00010082
Feb 13 13:16:08 ipstest kernel: [  915.240059] RAX: 0000000000000000 RBX: ffff81007b636300 RCX: 0000000000000024
Feb 13 13:16:08 ipstest kernel: [  915.240196] RDX: 000000007b636b00 RSI: ffffffff8077cde0 RDI: ffffffff806c4ed5
Feb 13 13:16:08 ipstest kernel: [  915.240332] RBP: ffff810076d87900 R08: 0000000000000500 R09: 0000000000000000
Feb 13 13:16:08 ipstest kernel: [  915.240468] R10: ffff81007aa33b40 R11: 0000000000000060 R12: 0000000000000000
Feb 13 13:16:08 ipstest kernel: [  915.240605] R13: 0000000000000001 R14: ffffffff8077cde0 R15: ffff81007aa33a80
Feb 13 13:16:08 ipstest kernel: [  915.240741] FS:  0000000000000000(0000) GS:ffff810001039300(0000) knlGS:0000000000000000
Feb 13 13:16:08 ipstest kernel: [  915.240981] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb 13 13:16:08 ipstest kernel: [  915.241111] CR2: 0000000000000000 CR3: 0000000078a98000 CR4: 00000000000006e0
Feb 13 13:16:08 ipstest kernel: [  915.241248] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 13 13:16:08 ipstest kernel: [  915.241384] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 13 13:16:08 ipstest kernel: [  915.241520] Process scsi_scan_3 (pid: 5213, threadinfo ffff810076d86000, task ffff81007be26720)
Feb 13 13:16:08 ipstest kernel: [  915.241761] Stack:  ffff810076d87930 ffffffff802125c3 ffff81007aa33a80 ffff81007480cf50
Feb 13 13:16:08 ipstest kernel: [  915.242006]  0000000000000000 ffff81007ba38ca8 ffff810076d87940 ffffffff8046fb42
Feb 13 13:16:08 ipstest kernel: [  915.242248]  ffff810076d879c0 ffffffff8801c2ee ffff81007aa33af0 000000017aa33af0
Feb 13 13:16:08 ipstest kernel: [  915.242389] Call Trace:
Feb 13 13:16:08 ipstest kernel: [  915.242606]  [nommu_map_sg+115/144] nommu_map_sg+0x73/0x90
Feb 13 13:16:08 ipstest kernel: [  915.242736]  [scsi_dma_map+66/96] scsi_dma_map+0x42/0x60
Feb 13 13:16:08 ipstest kernel: [  915.242867]  [_end+124884230/2127548952] :ips:ips_next+0x33e/0xc00
Feb 13 13:16:08 ipstest kernel: [  915.242986]  [scsi_done+0/48] scsi_done+0x0/0x30
Feb 13 13:16:08 ipstest kernel: [  915.243114]  [_end+124896894/2127548952] :ips:ips_queue+0x106/0x1f0
Feb 13 13:16:08 ipstest kernel: [  915.243240]  [scsi_dispatch_cmd+498/784] scsi_dispatch_cmd+0x1f2/0x310
Feb 13 13:16:08 ipstest kernel: [  915.243370]  [scsi_request_fn+491/976] scsi_request_fn+0x1eb/0x3d0
Feb 13 13:16:08 ipstest kernel: [  915.243500]  [__generic_unplug_device+37/48] __generic_unplug_device+0x25/0x30
Feb 13 13:16:08 ipstest kernel: [  915.243630]  [blk_execute_rq_nowait+99/176] blk_execute_rq_nowait+0x63/0xb0
Feb 13 13:16:08 ipstest kernel: [  915.243761]  [blk_execute_rq+122/224] blk_execute_rq+0x7a/0xe0
Feb 13 13:16:08 ipstest kernel: [  915.243889]  [scsi_execute+240/288] scsi_execute+0xf0/0x120
Feb 13 13:16:08 ipstest kernel: [  915.244016]  [scsi_execute_req+134/240] scsi_execute_req+0x86/0xf0
Feb 13 13:16:08 ipstest kernel: [  915.244145]  [scsi_probe_and_add_lun+594/3472] scsi_probe_and_add_lun+0x252/0xd90
Feb 13 13:16:08 ipstest kernel: [  915.244279]  [sas_expander_match+27/160] sas_expander_match+0x1b/0xa0
Feb 13 13:16:08 ipstest kernel: [  915.244412]  [get_device+23/32] get_device+0x17/0x20
Feb 13 13:16:08 ipstest kernel: [  915.244534]  [__scsi_scan_target+220/1696] __scsi_scan_target+0xdc/0x6a0
Feb 13 13:16:08 ipstest kernel: [  915.244665]  [enqueue_entity+172/432] enqueue_entity+0xac/0x1b0
Feb 13 13:16:08 ipstest kernel: [  915.244793]  [update_curr_load+135/160] update_curr_load+0x87/0xa0
Feb 13 13:16:08 ipstest kernel: [  915.244923]  [__check_preempt_curr_fair+107/128] __check_preempt_curr_fair+0x6b/0x80
Feb 13 13:16:08 ipstest kernel: [  915.245057]  [update_curr+258/272] update_curr+0x102/0x110
Feb 13 13:16:08 ipstest kernel: [  915.245186]  [scsi_scan_channel+139/160] scsi_scan_channel+0x8b/0xa0
Feb 13 13:16:08 ipstest kernel: [  915.245315]  [scsi_scan_host_selected+158/352] scsi_scan_host_selected+0x9e/0x160
Feb 13 13:16:08 ipstest kernel: [  915.245447]  [do_scan_async+0/320] do_scan_async+0x0/0x140
Feb 13 13:16:08 ipstest kernel: [  915.245574]  [do_scsi_scan_host+126/128] do_scsi_scan_host+0x7e/0x80
Feb 13 13:16:08 ipstest kernel: [  915.245703]  [do_scan_async+23/320] do_scan_async+0x17/0x140
Feb 13 13:16:08 ipstest kernel: [  915.245832]  [do_scan_async+0/320] do_scan_async+0x0/0x140
Feb 13 13:16:08 ipstest kernel: [  915.245962]  [kthread+77/128] kthread+0x4d/0x80
Feb 13 13:16:08 ipstest kernel: [  915.246086]  [child_rip+10/18] child_rip+0xa/0x12
Feb 13 13:16:08 ipstest kernel: [  915.246209]  [kthread+0/128] kthread+0x0/0x80
Feb 13 13:16:08 ipstest kernel: [  915.246333]  [child_rip+0/18] child_rip+0x0/0x12
Feb 13 13:16:08 ipstest kernel: [  915.246457]
Feb 13 13:16:08 ipstest kernel: [  915.246564]
Feb 13 13:16:08 ipstest kernel: [  915.246565] Code: 4c 8b 00 48 8d 04 0a 4c 39 c0 76 2b b8 fe ff ff ff 31 f6 49
Feb 13 13:16:08 ipstest kernel: [  915.246933] RIP  [check_addr+16/80] check_addr+0x10/0x50
Feb 13 13:16:08 ipstest kernel: [  915.247062]  RSP <ffff810076d87900>
Feb 13 13:16:08 ipstest kernel: [  915.247181] CR2: 0000000000000000

I was able to narrow it down in as much as with this reverted the machine
seems to run fine:
    commit 2f4cf91cc0a1f32f75e1fa0a4d70a9bc7340a302
    [SCSI] ips: convert to use the data buffer accessors

Nothing looks overly suspicious in that patch per se, although based
on the list archives it looks like related changes caused other drivers
grief.  I've tried a variety of things to get a little more debug info,
but to no avail.  If anybody has any suggestions, I'd appreciate them!

This repro's 100% on driver load so it's relatively easy (unfortunately
no remote power or serial console available) to test patches...

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
       [not found] ` <532ABFBDAAC3A34EB12EBA6CEC2838F439964688@ADPE2K703.adaptec.com>
@ 2008-02-14  0:04   ` Tim Pepper
  2008-02-18 14:57     ` Salyzyn, Mark
  0 siblings, 1 reply; 26+ messages in thread
From: Tim Pepper @ 2008-02-14  0:04 UTC (permalink / raw)
  To: linux-scsi; +Cc: FUJITA Tomonori, Mark_Salyzyn

(replaced missing cc's including linux-scsi)

On Wed 13 Feb at 14:39:07 -0800 Mark_Salyzyn@adaptec.com said:
> 
> - Is the command path code even reaching the controller's processor inquiry
>   spoofing section?
> 
>                 if (scb->scsi_cmd->cmnd[0] == INQUIRY) {
>                 IPS_SCSI_INQ_DATA inquiry;
> 
>                 memset(&inquiry, 0,
>                sizeof (IPS_SCSI_INQ_DATA));
> 
>                 inquiry.DeviceType =
>                     IPS_SCSI_INQ_TYPE_PROCESSOR;

I just added printk's in ips_send_cmd()'s INQUIRY case just before
the above condition test and just within the conditional block.
Neither showed on the console.

> The target_id check may be flawed? It is not using the
> scmd/sdev accessors and could be the wrong value as a result. Wild goose
> since arrays are working correctly (?)

In the original case the arrays appeared to be working correctly although
we were worried.  In the repro case we don't actually have any disk
attached...Forgot to mention that in the original email.

> - There appears to be a missing 'break' statement for the START_STOP command
>   immediately preceding the TEST_UNIT_READY/INQUIRY cases. What are the side
>   effects of that? It appears to be innocuous.

No change noted with the missing break added.

> - ips_scmd_buf_write is used for array capacity and other fiddly bits and
>   must be functioning correctly (?), so I can NOT harm it's functionality
>   except to question if the kmap_atomic returned a non-null value, it's
>   return value apparently is not checked. It's failure could speak of
>   problem(s) elsewhere.

I've got a printk there and haven't seen any output from it.  Haven't seen
anything from any of the printk's I've tried actually before:

In the run above to check if ips_send_cmd() hits the condition you were
curious about...I did capture the following from printk's I added in
ips_allocatescbs():

pci_alloc_consistent returns ha->scbs    @ 0x18446604437762007040
pci_alloc_consistent returns ips_sg.list @ 0x18446604437762002944
pci_alloc_consistent returns ha->scbs    @ 0x18446604437698871296
pci_alloc_consistent returns ips_sg.list @ 0x18446604437756837888

I take that as ips_init_phase2() being called and presumable returning
SUCCESS.

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-13 21:43 ips.c broken since 2.6.23 on x86_64? Tim Pepper
       [not found] ` <532ABFBDAAC3A34EB12EBA6CEC2838F439964688@ADPE2K703.adaptec.com>
@ 2008-02-14 11:48 ` FUJITA Tomonori
  2008-02-14 23:55   ` Tim Pepper
  1 sibling, 1 reply; 26+ messages in thread
From: FUJITA Tomonori @ 2008-02-14 11:48 UTC (permalink / raw)
  To: lnxninja; +Cc: linux-scsi, fujita.tomonori, Mark_Salyzyn, ipslinux, tomof

On Wed, 13 Feb 2008 13:43:24 -0800
Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote:

> We recently upgraded a production x86_64 machine with serveraid
> cards to 2.6.24 and noted that /proc/scsi/scsi showed garbage for our
> serveraid service processors.  sg_inq also returned garbage from the
> service processors' sg devices.  After a few iterations I started seeing
> meaninful stuff in the garbage.  Not sure if it was returning live memory
> or just unzero'd.  Either way not good so we went back to a known good,
> older kernel and tried to repro on a similar machine.  We got different,
> but still bad results in terms of pointing at memory badness.
> 
> FWIW, the original machine had the following hardware:
>     scsi0 : IBM PCI ServeRAID 7.12.05  Build 761 <ServeRAID 4H>
>     scsi1 : IBM PCI ServeRAID 7.12.05  Build 761 <ServeRAID 4M>
> and the repro's have been on a machine with just:
>     scsi0 : IBM PCI ServeRAID 7.12.05  Build 761 <ServeRAID 4Mx>
> 
> On the repro machine I'm getting a hang on ips driver load with the following
> logged:
> 
> Feb 13 13:16:08 ipstest kernel: [  915.236563] scsi3 : IBM PCI ServeRAID 7.12.05  Build 761 <ServeRAID 4Mx>
> Feb 13 13:16:08 ipstest kernel: [  915.236839] Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
> Feb 13 13:16:08 ipstest kernel: [  915.236863]  [check_addr+16/80] check_addr+0x10/0x50
> Feb 13 13:16:08 ipstest kernel: [  915.237209] PGD 79fff067 PUD 7a898067 PMD 0
> Feb 13 13:16:08 ipstest kernel: [  915.237341] Oops: 0000 [1] SMP
> Feb 13 13:16:08 ipstest kernel: [  915.237463] CPU 1
> Feb 13 13:16:08 ipstest kernel: [  915.239436] Modules linked in: ips aic94xx
> Feb 13 13:16:08 ipstest kernel: [  915.239559] Pid: 5213, comm: scsi_scan_3 Not tainted 2.6.23-ips_as_module #3
> Feb 13 13:16:08 ipstest kernel: [  915.239692] RIP: 0010:[check_addr+16/80]  [check_addr+16/80] check_addr+0x10/0x50
> Feb 13 13:16:08 ipstest kernel: [  915.239932] RSP: 0018:ffff810076d87900  EFLAGS: 00010082
> Feb 13 13:16:08 ipstest kernel: [  915.240059] RAX: 0000000000000000 RBX: ffff81007b636300 RCX: 0000000000000024
> Feb 13 13:16:08 ipstest kernel: [  915.240196] RDX: 000000007b636b00 RSI: ffffffff8077cde0 RDI: ffffffff806c4ed5
> Feb 13 13:16:08 ipstest kernel: [  915.240332] RBP: ffff810076d87900 R08: 0000000000000500 R09: 0000000000000000
> Feb 13 13:16:08 ipstest kernel: [  915.240468] R10: ffff81007aa33b40 R11: 0000000000000060 R12: 0000000000000000
> Feb 13 13:16:08 ipstest kernel: [  915.240605] R13: 0000000000000001 R14: ffffffff8077cde0 R15: ffff81007aa33a80
> Feb 13 13:16:08 ipstest kernel: [  915.240741] FS:  0000000000000000(0000) GS:ffff810001039300(0000) knlGS:0000000000000000
> Feb 13 13:16:08 ipstest kernel: [  915.240981] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> Feb 13 13:16:08 ipstest kernel: [  915.241111] CR2: 0000000000000000 CR3: 0000000078a98000 CR4: 00000000000006e0
> Feb 13 13:16:08 ipstest kernel: [  915.241248] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Feb 13 13:16:08 ipstest kernel: [  915.241384] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Feb 13 13:16:08 ipstest kernel: [  915.241520] Process scsi_scan_3 (pid: 5213, threadinfo ffff810076d86000, task ffff81007be26720)
> Feb 13 13:16:08 ipstest kernel: [  915.241761] Stack:  ffff810076d87930 ffffffff802125c3 ffff81007aa33a80 ffff81007480cf50
> Feb 13 13:16:08 ipstest kernel: [  915.242006]  0000000000000000 ffff81007ba38ca8 ffff810076d87940 ffffffff8046fb42
> Feb 13 13:16:08 ipstest kernel: [  915.242248]  ffff810076d879c0 ffffffff8801c2ee ffff81007aa33af0 000000017aa33af0
> Feb 13 13:16:08 ipstest kernel: [  915.242389] Call Trace:
> Feb 13 13:16:08 ipstest kernel: [  915.242606]  [nommu_map_sg+115/144] nommu_map_sg+0x73/0x90
> Feb 13 13:16:08 ipstest kernel: [  915.242736]  [scsi_dma_map+66/96] scsi_dma_map+0x42/0x60
> Feb 13 13:16:08 ipstest kernel: [  915.242867]  [_end+124884230/2127548952] :ips:ips_next+0x33e/0xc00
> Feb 13 13:16:08 ipstest kernel: [  915.242986]  [scsi_done+0/48] scsi_done+0x0/0x30
> Feb 13 13:16:08 ipstest kernel: [  915.243114]  [_end+124896894/2127548952] :ips:ips_queue+0x106/0x1f0
> Feb 13 13:16:08 ipstest kernel: [  915.243240]  [scsi_dispatch_cmd+498/784] scsi_dispatch_cmd+0x1f2/0x310
> Feb 13 13:16:08 ipstest kernel: [  915.243370]  [scsi_request_fn+491/976] scsi_request_fn+0x1eb/0x3d0
> Feb 13 13:16:08 ipstest kernel: [  915.243500]  [__generic_unplug_device+37/48] __generic_unplug_device+0x25/0x30
> Feb 13 13:16:08 ipstest kernel: [  915.243630]  [blk_execute_rq_nowait+99/176] blk_execute_rq_nowait+0x63/0xb0
> Feb 13 13:16:08 ipstest kernel: [  915.243761]  [blk_execute_rq+122/224] blk_execute_rq+0x7a/0xe0
> Feb 13 13:16:08 ipstest kernel: [  915.243889]  [scsi_execute+240/288] scsi_execute+0xf0/0x120
> Feb 13 13:16:08 ipstest kernel: [  915.244016]  [scsi_execute_req+134/240] scsi_execute_req+0x86/0xf0
> Feb 13 13:16:08 ipstest kernel: [  915.244145]  [scsi_probe_and_add_lun+594/3472] scsi_probe_and_add_lun+0x252/0xd90
> Feb 13 13:16:08 ipstest kernel: [  915.244279]  [sas_expander_match+27/160] sas_expander_match+0x1b/0xa0
> Feb 13 13:16:08 ipstest kernel: [  915.244412]  [get_device+23/32] get_device+0x17/0x20
> Feb 13 13:16:08 ipstest kernel: [  915.244534]  [__scsi_scan_target+220/1696] __scsi_scan_target+0xdc/0x6a0
> Feb 13 13:16:08 ipstest kernel: [  915.244665]  [enqueue_entity+172/432] enqueue_entity+0xac/0x1b0
> Feb 13 13:16:08 ipstest kernel: [  915.244793]  [update_curr_load+135/160] update_curr_load+0x87/0xa0
> Feb 13 13:16:08 ipstest kernel: [  915.244923]  [__check_preempt_curr_fair+107/128] __check_preempt_curr_fair+0x6b/0x80
> Feb 13 13:16:08 ipstest kernel: [  915.245057]  [update_curr+258/272] update_curr+0x102/0x110
> Feb 13 13:16:08 ipstest kernel: [  915.245186]  [scsi_scan_channel+139/160] scsi_scan_channel+0x8b/0xa0
> Feb 13 13:16:08 ipstest kernel: [  915.245315]  [scsi_scan_host_selected+158/352] scsi_scan_host_selected+0x9e/0x160
> Feb 13 13:16:08 ipstest kernel: [  915.245447]  [do_scan_async+0/320] do_scan_async+0x0/0x140
> Feb 13 13:16:08 ipstest kernel: [  915.245574]  [do_scsi_scan_host+126/128] do_scsi_scan_host+0x7e/0x80
> Feb 13 13:16:08 ipstest kernel: [  915.245703]  [do_scan_async+23/320] do_scan_async+0x17/0x140
> Feb 13 13:16:08 ipstest kernel: [  915.245832]  [do_scan_async+0/320] do_scan_async+0x0/0x140
> Feb 13 13:16:08 ipstest kernel: [  915.245962]  [kthread+77/128] kthread+0x4d/0x80
> Feb 13 13:16:08 ipstest kernel: [  915.246086]  [child_rip+10/18] child_rip+0xa/0x12
> Feb 13 13:16:08 ipstest kernel: [  915.246209]  [kthread+0/128] kthread+0x0/0x80
> Feb 13 13:16:08 ipstest kernel: [  915.246333]  [child_rip+0/18] child_rip+0x0/0x12
> Feb 13 13:16:08 ipstest kernel: [  915.246457]
> Feb 13 13:16:08 ipstest kernel: [  915.246564]
> Feb 13 13:16:08 ipstest kernel: [  915.246565] Code: 4c 8b 00 48 8d 04 0a 4c 39 c0 76 2b b8 fe ff ff ff 31 f6 49
> Feb 13 13:16:08 ipstest kernel: [  915.246933] RIP  [check_addr+16/80] check_addr+0x10/0x50
> Feb 13 13:16:08 ipstest kernel: [  915.247062]  RSP <ffff810076d87900>
> Feb 13 13:16:08 ipstest kernel: [  915.247181] CR2: 0000000000000000
> 
> I was able to narrow it down in as much as with this reverted the machine
> seems to run fine:
>     commit 2f4cf91cc0a1f32f75e1fa0a4d70a9bc7340a302
>     [SCSI] ips: convert to use the data buffer accessors
> 
> Nothing looks overly suspicious in that patch per se, although based
> on the list archives it looks like related changes caused other drivers
> grief.  I've tried a variety of things to get a little more debug info,
> but to no avail.  If anybody has any suggestions, I'd appreciate them!

Really sorry about the bug.

I have a slight doubt on the breakup code though I'm not sure you hit
the code. Reverting only the breakup part works? The patch is against
2.6.24.


diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c
index 5c5a9b2..acabb19 100644
--- a/drivers/scsi/ips.c
+++ b/drivers/scsi/ips.c
@@ -3251,34 +3251,52 @@ ips_done(ips_ha_t * ha, ips_scb_t * scb)
 		 * the rest of the data and continue.
 		 */
 		if ((scb->breakup) || (scb->sg_break)) {
-                        struct scatterlist *sg;
-                        int i, sg_dma_index, ips_sg_index = 0;
-
 			/* we had a data breakup */
 			scb->data_len = 0;
 
-                        sg = scsi_sglist(scb->scsi_cmd);
-
-                        /* Spin forward to last dma chunk */
-                        sg_dma_index = scb->breakup;
-                        for (i = 0; i < scb->breakup; i++)
-                                sg = sg_next(sg);
-
-			/* Take care of possible partial on last chunk */
-                        ips_fill_scb_sg_single(ha,
-                                               sg_dma_address(sg),
-                                               scb, ips_sg_index++,
-                                               sg_dma_len(sg));
-
-                        for (; sg_dma_index < scsi_sg_count(scb->scsi_cmd);
-                             sg_dma_index++, sg = sg_next(sg)) {
-                                if (ips_fill_scb_sg_single
-                                    (ha,
-                                     sg_dma_address(sg),
-                                     scb, ips_sg_index++,
-                                     sg_dma_len(sg)) < 0)
-                                        break;
-                        }
+			if (scb->sg_count) {
+				/* S/G request */
+				struct scatterlist *sg;
+				int ips_sg_index = 0;
+				int sg_dma_index;
+
+				sg = scb->scsi_cmd->request_buffer;
+
+				/* Spin forward to last dma chunk */
+				sg_dma_index = scb->breakup;
+
+				/* Take care of possible partial on last chunk */
+				ips_fill_scb_sg_single(ha,
+						       sg_dma_address(&sg
+								      [sg_dma_index]),
+						       scb, ips_sg_index++,
+						       sg_dma_len(&sg
+								  [sg_dma_index]));
+
+				for (; sg_dma_index < scb->sg_count;
+				     sg_dma_index++) {
+					if (ips_fill_scb_sg_single
+					    (ha,
+					     sg_dma_address(&sg[sg_dma_index]),
+					     scb, ips_sg_index++,
+					     sg_dma_len(&sg[sg_dma_index])) < 0)
+						break;
+
+				}
+
+			} else {
+				/* Non S/G Request */
+				(void) ips_fill_scb_sg_single(ha,
+							      scb->
+							      data_busaddr +
+							      (scb->sg_break *
+							       ha->max_xfer),
+							      scb, 0,
+							      scb->scsi_cmd->
+							      request_bufflen -
+							      (scb->sg_break *
+							       ha->max_xfer));
+			}
 
 			scb->dcdb.transfer_length = scb->data_len;
 			scb->dcdb.cmd_attribute |=

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-14 11:48 ` FUJITA Tomonori
@ 2008-02-14 23:55   ` Tim Pepper
  2008-02-15  0:13     ` FUJITA Tomonori
  0 siblings, 1 reply; 26+ messages in thread
From: Tim Pepper @ 2008-02-14 23:55 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: linux-scsi, Mark_Salyzyn, tomof

On Thu 14 Feb at 20:48:38 +0900 fujita.tomonori@lab.ntt.co.jp said:
> 
> I have a slight doubt on the breakup code though I'm not sure you hit
> the code. Reverting only the breakup part works? The patch is against
> 2.6.24.

I've tested this revert you posted.  Essentially the same trace is logged
(see below).  The machine doesn't die though.  But /proc/scsi/scsi doesn't
show the hardware and I am getting an additional trace dumped now (see
futher below).

Feb 14 16:04:13 ipstest kernel: [ 1046.531180] ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 18 (level, low) -> IRQ 18
Feb 14 16:04:13 ipstest kernel: [ 1046.553420] scsi3 : IBM PCI ServeRAID 7.12.05  Build 761 <ServeRAID 4Mx>
Feb 14 16:04:13 ipstest kernel: [ 1046.554103] PGD 7cc1f067 PUD 7c4f6067 PMD 0
Feb 14 16:04:13 ipstest kernel: [ 1046.554348] CPU 0
Feb 14 16:04:13 ipstest kernel: [ 1046.554460] Modules linked in: ips aic94xx
Feb 14 16:04:13 ipstest kernel: [ 1046.554587] Pid: 4476, comm: scsi_scan_3 Not tainted 2.6.24-fujita-revert-breakup #1
Feb 14 16:04:13 ipstest kernel: [ 1046.554827] RIP: 0010:[check_addr+16/80]  [check_addr+16/80] check_addr+0x10/0x50
Feb 14 16:04:13 ipstest kernel: [ 1046.555079] RSP: 0018:ffff81007b0f78f0  EFLAGS: 00010082
Feb 14 16:04:13 ipstest kernel: [ 1046.555208] RAX: 0000000000000000 RBX: ffff81007b92a300 RCX: 0000000000000024
Feb 14 16:04:13 ipstest kernel: [ 1046.555350] RDX: 000000007b92a600 RSI: ffffffff807a7340 RDI: ffffffff806e7f43
Feb 14 16:04:13 ipstest kernel: [ 1046.555494] RBP: ffff81007b0f78f0 R08: 0000000000000500 R09: 0000000000000000
Feb 14 16:04:13 ipstest kernel: [ 1046.555632] R10: ffff81007c463b38 R11: 0000000000000060 R12: 0000000000000000
Feb 14 16:04:13 ipstest kernel: [ 1046.555784] R13: 0000000000000001 R14: ffffffff807a7340 R15: ffff81007c463a80
Feb 14 16:04:13 ipstest kernel: [ 1046.555924] FS:  0000000000000000(0000) GS:ffffffff807d6000(0000) knlGS:0000000000000000
Feb 14 16:04:13 ipstest kernel: [ 1046.556173] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb 14 16:04:13 ipstest kernel: [ 1046.556306] CR2: 0000000000000000 CR3: 000000007cd02000 CR4: 00000000000006e0
Feb 14 16:04:13 ipstest kernel: [ 1046.556443] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 14 16:04:13 ipstest kernel: [ 1046.556582] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 14 16:04:13 ipstest kernel: [ 1046.556731] Process scsi_scan_3 (pid: 4476, threadinfo ffff81007b0f6000, task ffff81007b9295c0)
Feb 14 16:04:13 ipstest kernel: [ 1046.556972] Stack:  ffff81007b0f7920 ffffffff80213118 ffff81007c463a80 ffff81007b0fcf50
Feb 14 16:04:13 ipstest kernel: [ 1046.557221]  0000000000000000 ffff81007cd14450 ffff81007b0f7930 ffffffff8047a4c2
Feb 14 16:04:13 ipstest kernel: [ 1046.557463]  ffff81007b0f79b0 ffffffff8801c28e ffff81007c463ae8 000000017c463ae8
Feb 14 16:04:13 ipstest kernel: [ 1046.557603] Call Trace:
Feb 14 16:04:13 ipstest kernel: [ 1046.557830]  [nommu_map_sg+104/176] nommu_map_sg+0x68/0xb0
Feb 14 16:04:13 ipstest kernel: [ 1046.557949]  [scsi_dma_map+66/96] scsi_dma_map+0x42/0x60
Feb 14 16:04:13 ipstest kernel: [ 1046.558083]  [_end+124740294/2127405112] :ips:ips_next+0x33e/0xc10
Feb 14 16:04:13 ipstest kernel: [ 1046.558205]  [scsi_done+0/48] scsi_done+0x0/0x30
Feb 14 16:04:13 ipstest kernel: [ 1046.558332]  [_end+124752974/2127405112] :ips:ips_queue+0x106/0x1f0
Feb 14 16:04:13 ipstest kernel: [ 1046.558459]  [scsi_dispatch_cmd+453/736] scsi_dispatch_cmd+0x1c5/0x2e0
Feb 14 16:04:13 ipstest kernel: [ 1046.558589]  [scsi_request_fn+491/976] scsi_request_fn+0x1eb/0x3d0
Feb 14 16:04:13 ipstest kernel: [ 1046.558725]  [__generic_unplug_device+37/48] __generic_unplug_device+0x25/0x30
Feb 14 16:04:13 ipstest kernel: [ 1046.558851]  [blk_execute_rq_nowait+99/176] blk_execute_rq_nowait+0x63/0xb0
Feb 14 16:04:13 ipstest kernel: [ 1046.558982]  [blk_execute_rq+122/224] blk_execute_rq+0x7a/0xe0
Feb 14 16:04:13 ipstest kernel: [ 1046.559110]  [scsi_execute+240/288] scsi_execute+0xf0/0x120
Feb 14 16:04:13 ipstest kernel: [ 1046.559237]  [scsi_execute_req+134/240] scsi_execute_req+0x86/0xf0
Feb 14 16:04:13 ipstest kernel: [ 1046.559366]  [scsi_probe_and_add_lun+594/3472] scsi_probe_and_add_lun+0x252/0xd90
Feb 14 16:04:13 ipstest kernel: [ 1046.559501]  [sas_expander_match+27/160] sas_expander_match+0x1b/0xa0
Feb 14 16:04:13 ipstest kernel: [ 1046.559633]  [get_device+23/32] get_device+0x17/0x20
Feb 14 16:04:13 ipstest kernel: [ 1046.559755]  [__scsi_scan_target+220/1680] __scsi_scan_target+0xdc/0x690
Feb 14 16:04:13 ipstest kernel: [ 1046.559887]  [enqueue_task_fair+32/64] enqueue_task_fair+0x20/0x40
Feb 14 16:04:13 ipstest kernel: [ 1046.560019]  [enqueue_task+19/48] enqueue_task+0x13/0x30
Feb 14 16:04:13 ipstest kernel: [ 1046.560141]  [update_curr+101/192] update_curr+0x65/0xc0
Feb 14 16:04:13 ipstest kernel: [ 1046.560268]  [scsi_scan_channel+139/160] scsi_scan_channel+0x8b/0xa0
Feb 14 16:04:13 ipstest kernel: [ 1046.560397]  [scsi_scan_host_selected+158/352] scsi_scan_host_selected+0x9e/0x160
Feb 14 16:04:13 ipstest kernel: [ 1046.560532]  [thread_return+61/1307] thread_return+0x3d/0x51b
Feb 14 16:04:13 ipstest kernel: [ 1046.560657]  [do_scan_async+0/384] do_scan_async+0x0/0x180
Feb 14 16:04:13 ipstest kernel: [ 1046.560784]  [do_scsi_scan_host+126/128] do_scsi_scan_host+0x7e/0x80
Feb 14 16:04:13 ipstest kernel: [ 1046.560913]  [do_scan_async+29/384] do_scan_async+0x1d/0x180
Feb 14 16:04:13 ipstest kernel: [ 1046.561041]  [do_scan_async+0/384] do_scan_async+0x0/0x180
Feb 14 16:04:13 ipstest kernel: [ 1046.561172]  [kthread+77/128] kthread+0x4d/0x80
Feb 14 16:04:13 ipstest kernel: [ 1046.561294]  [child_rip+10/18] child_rip+0xa/0x12
Feb 14 16:04:13 ipstest kernel: [ 1046.561417]  [kthread+0/128] kthread+0x0/0x80
Feb 14 16:04:13 ipstest kernel: [ 1046.561542]  [child_rip+0/18] child_rip+0x0/0x12
Feb 14 16:04:13 ipstest kernel: [ 1046.561666]
Feb 14 16:04:13 ipstest kernel: [ 1046.561772]
Feb 14 16:04:13 ipstest kernel: [ 1046.561773] Code: 4c 8b 00 48 8d 04 0a 4c 39 c0 76 2b b8 fe ff ff ff 31 f6 49
Feb 14 16:04:13 ipstest kernel: [ 1046.562268]  RSP <ffff81007b0f78f0>
Feb 14 16:04:13 ipstest kernel: [ 1046.562853] ---[ end trace 3d5dd58222a8e80a ]---
Feb 14 16:04:28 ipstest kernel: [ 1061.857443] CPU 0:
Feb 14 16:04:28 ipstest kernel: [ 1061.857673] Modules linked in: ips aic94xx
Feb 14 16:04:28 ipstest kernel: [ 1061.857996] Pid: 0, comm: swapper Tainted: G      D 2.6.24-fujita-revert-breakup #1
Feb 14 16:04:28 ipstest kernel: [ 1061.858291] RIP: 0010:[_spin_lock_irqsave+22/48]  [_spin_lock_irqsave+22/48] _spin_lock_irqsave+0x16/0x30
Feb 14 16:04:28 ipstest kernel: [ 1061.858647] RSP: 0018:ffffffff8087fe60  EFLAGS: 00000286
Feb 14 16:04:28 ipstest kernel: [ 1061.858825] RAX: 0000000000000282 RBX: ffffffff8087fe60 RCX: ffff81007c463ae8
Feb 14 16:04:28 ipstest kernel: [ 1061.859012] RDX: ffff81007cd14000 RSI: 0000000000000001 RDI: ffff81007cd14050
Feb 14 16:04:28 ipstest kernel: [ 1061.859198] RBP: ffffffff8087fde0 R08: 000000000213c81f R09: 0000000000000010
Feb 14 16:04:28 ipstest kernel: [ 1061.859385] R10: 0000000000000005 R11: 0000000000000246 R12: ffffffff8020ced6
Feb 14 16:04:28 ipstest kernel: [ 1061.859572] R13: ffffffff8087fde0 R14: ffff81007c463a80 R15: ffff81007cd14000
Feb 14 16:04:28 ipstest kernel: [ 1061.859763] FS:  0000000000000000(0000) GS:ffffffff807d6000(0000) knlGS:0000000000000000
Feb 14 16:04:28 ipstest kernel: [ 1061.860058] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb 14 16:04:28 ipstest kernel: [ 1061.860240] CR2: 0000000000611ec0 CR3: 000000007c466000 CR4: 00000000000006e0
Feb 14 16:04:28 ipstest kernel: [ 1061.860425] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 14 16:04:28 ipstest kernel: [ 1061.860613] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 14 16:04:28 ipstest kernel: [ 1061.860821]
Feb 14 16:04:28 ipstest kernel: [ 1061.860822] Call Trace:
Feb 14 16:04:28 ipstest kernel: [ 1061.861141]  <IRQ>  [scsi_eh_scmd_add+62/208] scsi_eh_scmd_add+0x3e/0xd0
Feb 14 16:04:28 ipstest kernel: [ 1061.861390]  [scsi_times_out+0/192] scsi_times_out+0x0/0xc0
Feb 14 16:04:28 ipstest kernel: [ 1061.861573]  [scsi_times_out+79/192] scsi_times_out+0x4f/0xc0
Feb 14 16:04:28 ipstest kernel: [ 1061.861758]  [run_timer_softirq+359/464] run_timer_softirq+0x167/0x1d0
Feb 14 16:04:28 ipstest kernel: [ 1061.861942]  [__do_softirq+116/240] __do_softirq+0x74/0xf0
Feb 14 16:04:28 ipstest kernel: [ 1061.862129]  [profile_tick+94/160] profile_tick+0x5e/0xa0
Feb 14 16:04:28 ipstest kernel: [ 1061.862313]  [call_softirq+28/48] call_softirq+0x1c/0x30
Feb 14 16:04:28 ipstest kernel: [ 1061.862495]  [do_softirq+61/144] do_softirq+0x3d/0x90
Feb 14 16:04:28 ipstest kernel: [ 1061.862676]  [irq_exit+69/80] irq_exit+0x45/0x50
Feb 14 16:04:28 ipstest kernel: [ 1061.862857]  [smp_apic_timer_interrupt+74/112] smp_apic_timer_interrupt+0x4a/0x70
Feb 14 16:04:28 ipstest kernel: [ 1061.863041]  [mwait_idle+0/80] mwait_idle+0x0/0x50
Feb 14 16:04:28 ipstest kernel: [ 1061.863223]  [apic_timer_interrupt+102/112] apic_timer_interrupt+0x66/0x70
Feb 14 16:04:28 ipstest kernel: [ 1061.863403]  <EOI>  [mwait_idle+70/80] mwait_idle+0x46/0x50
Feb 14 16:04:28 ipstest kernel: [ 1061.863677]  [enter_idle+34/48] enter_idle+0x22/0x30
Feb 14 16:04:28 ipstest kernel: [ 1061.863857]  [cpu_idle+115/160] cpu_idle+0x73/0xa0
Feb 14 16:04:28 ipstest kernel: [ 1061.864039]  [rest_init+90/96] rest_init+0x5a/0x60
Feb 14 16:04:28 ipstest kernel: [ 1061.864225]  [start_kernel+618/752] start_kernel+0x26a/0x2f0
Feb 14 16:04:28 ipstest kernel: [ 1061.864406]  [x86_64_start_kernel+298/320] _sinittext+0x12a/0x140
Feb 14 16:04:28 ipstest kernel: [ 1061.864585]


That second trace then is repeatedly dumping every 12seconds without end.

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-14 23:55   ` Tim Pepper
@ 2008-02-15  0:13     ` FUJITA Tomonori
  2008-02-15  1:16       ` Tim Pepper
  0 siblings, 1 reply; 26+ messages in thread
From: FUJITA Tomonori @ 2008-02-15  0:13 UTC (permalink / raw)
  To: lnxninja; +Cc: fujita.tomonori, linux-scsi, Mark_Salyzyn, tomof

On Thu, 14 Feb 2008 15:55:49 -0800
Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote:

> On Thu 14 Feb at 20:48:38 +0900 fujita.tomonori@lab.ntt.co.jp said:
> > 
> > I have a slight doubt on the breakup code though I'm not sure you hit
> > the code. Reverting only the breakup part works? The patch is against
> > 2.6.24.
> 
> I've tested this revert you posted.  Essentially the same trace is logged
> (see below).  The machine doesn't die though.  But /proc/scsi/scsi doesn't
> show the hardware and I am getting an additional trace dumped now (see
> futher below).

Thanks. So we surely have a bug in the non-breakup part.

I've just found one bug. Can you try this patch against 2.6.24?

Thanks,

diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c
index 5c5a9b2..d8f5eb5 100644
--- a/drivers/scsi/ips.c
+++ b/drivers/scsi/ips.c
@@ -1575,12 +1575,12 @@ ips_make_passthru(ips_ha_t *ha, struct scsi_cmnd *SC, ips_scb_t *scb, int intr)
 	ips_passthru_t *pt;
 	int length = 0;
 	int i, ret;
-        struct scatterlist *sg = scsi_sglist(SC);
+        struct scatterlist *sg;
 
 	METHOD_TRACE("ips_make_passthru", 1);
 
         scsi_for_each_sg(SC, sg, scsi_sg_count(SC), i)
-                length += sg[i].length;
+                length += sg->length;
 
 	if (length < sizeof (ips_passthru_t)) {
 		/* wrong size */

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-15  0:13     ` FUJITA Tomonori
@ 2008-02-15  1:16       ` Tim Pepper
  2008-02-15 16:09         ` FUJITA Tomonori
  0 siblings, 1 reply; 26+ messages in thread
From: Tim Pepper @ 2008-02-15  1:16 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: linux-scsi, Mark_Salyzyn, tomof

On Fri 15 Feb at 09:13:16 +0900 fujita.tomonori@lab.ntt.co.jp said:
> 
> Thanks. So we surely have a bug in the non-breakup part.
> 
> I've just found one bug. Can you try this patch against 2.6.24?

Tested and unfortunately no change.  Behaves same as the breakup-revert patch.

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-15  1:16       ` Tim Pepper
@ 2008-02-15 16:09         ` FUJITA Tomonori
  2008-02-15 22:50           ` Tim Pepper
  0 siblings, 1 reply; 26+ messages in thread
From: FUJITA Tomonori @ 2008-02-15 16:09 UTC (permalink / raw)
  To: lnxninja; +Cc: fujita.tomonori, linux-scsi, Mark_Salyzyn, tomof

On Thu, 14 Feb 2008 17:16:36 -0800
Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote:

> On Fri 15 Feb at 09:13:16 +0900 fujita.tomonori@lab.ntt.co.jp said:
> > 
> > Thanks. So we surely have a bug in the non-breakup part.
> > 
> > I've just found one bug. Can you try this patch against 2.6.24?
> 
> Tested and unfortunately no change.  Behaves same as the breakup-revert patch.

Thanks and sorry about that.

Now I don't have any idea. I split the patch. Can you please apply
them in a step-by-step way and tell me which patch breaks the driver?

There are nine patches against 2.6.24:

http://www.kernel.org/pub/linux/kernel/people/tomo/ips/

The first one is just reverting the data buffer accessors
conversion. It would be nice if we could just revert it but we
can't. These changes are necessary to compile the driver against post
2.6.24.

Really sorry about the troubles,

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-15 16:09         ` FUJITA Tomonori
@ 2008-02-15 22:50           ` Tim Pepper
  2008-02-16  0:41             ` FUJITA Tomonori
  0 siblings, 1 reply; 26+ messages in thread
From: Tim Pepper @ 2008-02-15 22:50 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: fujita.tomonori, linux-scsi, Mark_Salyzyn

On Sat 16 Feb at 01:09:43 +0900 tomof@acm.org said:
> 
> The first one is just reverting the data buffer accessors
> conversion. It would be nice if we could just revert it but we
> can't. These changes are necessary to compile the driver against post
> 2.6.24.

Fujita-san,

Unfortunately (and not too surprisingly given what we've tried so far) with
only the first of your series reverted the driver is working fine for me
again.

I saw (eg: replies to http://lkml.org/lkml/2007/5/11/132) some possibly
similar sounding issues with other drivers.  Could there be some memory
uninitialised?  I did try changing all the ips.c kmalloc's to kzalloc's,
but that didn't help.  Also that thread ties into pci gart.  The machines
we've been using are liable to getting pci calgary although given my
.config has:
    CONFIG_GART_IOMMU=y
    CONFIG_CALGARY_IOMMU=y
    # CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT is not set
and that when booting this mainline I don't see any Calgary related
messages like I get from eg: Ubuntu's 2.6.22-14-server...I'm probably not
actually running the calgary iommu code in these repros.

Anyway, I greatly appreciate your efforts so far in trying to find what
could be wrong here!

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-15 22:50           ` Tim Pepper
@ 2008-02-16  0:41             ` FUJITA Tomonori
  2008-02-17 12:52               ` Boaz Harrosh
  2008-02-17 21:15               ` Tim Pepper
  0 siblings, 2 replies; 26+ messages in thread
From: FUJITA Tomonori @ 2008-02-16  0:41 UTC (permalink / raw)
  To: lnxninja; +Cc: tomof, fujita.tomonori, linux-scsi, Mark_Salyzyn

On Fri, 15 Feb 2008 14:50:57 -0800
Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote:

> On Sat 16 Feb at 01:09:43 +0900 tomof@acm.org said:
> > 
> > The first one is just reverting the data buffer accessors
> > conversion. It would be nice if we could just revert it but we
> > can't. These changes are necessary to compile the driver against post
> > 2.6.24.
> 
> Fujita-san,
> 
> Unfortunately (and not too surprisingly given what we've tried so far) with
> only the first of your series reverted the driver is working fine for me
> again.

Do you mean that you applied only the following two patches against
2.6.24, and then it doesn't work?

0001-ips-revert-the-changes-for-the-data-buffer-accessor.patch
0002-ips-kill-the-map_single-path-in-ips_scmd_buf_write.patch

If so, the second patch is broken. Did you saw BUG_ON message (I added
some BUG_ON to the patch)?


> I saw (eg: replies to http://lkml.org/lkml/2007/5/11/132) some possibly
> similar sounding issues with other drivers.  Could there be some memory
> uninitialised?  I did try changing all the ips.c kmalloc's to kzalloc's,
> but that didn't help.  Also that thread ties into pci gart.  The machines
> we've been using are liable to getting pci calgary although given my
> .config has:
>     CONFIG_GART_IOMMU=y
>     CONFIG_CALGARY_IOMMU=y
>     # CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT is not set
> and that when booting this mainline I don't see any Calgary related
> messages like I get from eg: Ubuntu's 2.6.22-14-server...I'm probably not
> actually running the calgary iommu code in these repros.

Yes, probabaly, your machine doesn't use any IOMMU hardware
(nommu_map_sg function was in your crash log).


> Anyway, I greatly appreciate your efforts so far in trying to find what
> could be wrong here!

Really sorry about the troubles and thanks for testing.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-16  0:41             ` FUJITA Tomonori
@ 2008-02-17 12:52               ` Boaz Harrosh
  2008-02-17 13:09                 ` Boaz Harrosh
  2008-02-17 21:15               ` Tim Pepper
  1 sibling, 1 reply; 26+ messages in thread
From: Boaz Harrosh @ 2008-02-17 12:52 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: lnxninja, fujita.tomonori, linux-scsi, Mark_Salyzyn

On Sat, Feb 16 2008 at 2:41 +0200, FUJITA Tomonori <tomof@acm.org> wrote:
> On Fri, 15 Feb 2008 14:50:57 -0800
> Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote:
> 
>> On Sat 16 Feb at 01:09:43 +0900 tomof@acm.org said:
>>> The first one is just reverting the data buffer accessors
>>> conversion. It would be nice if we could just revert it but we
>>> can't. These changes are necessary to compile the driver against post
>>> 2.6.24.
>> Fujita-san,
>>
>> Unfortunately (and not too surprisingly given what we've tried so far) with
>> only the first of your series reverted the driver is working fine for me
>> again.
> 
> Do you mean that you applied only the following two patches against
> 2.6.24, and then it doesn't work?
> 
> 0001-ips-revert-the-changes-for-the-data-buffer-accessor.patch
> 0002-ips-kill-the-map_single-path-in-ips_scmd_buf_write.patch
> 
> If so, the second patch is broken. Did you saw BUG_ON message (I added
> some BUG_ON to the patch)?
> 
> 
>> I saw (eg: replies to http://lkml.org/lkml/2007/5/11/132) some possibly
>> similar sounding issues with other drivers.  Could there be some memory
>> uninitialised?  I did try changing all the ips.c kmalloc's to kzalloc's,
>> but that didn't help.  Also that thread ties into pci gart.  The machines
>> we've been using are liable to getting pci calgary although given my
>> .config has:
>>     CONFIG_GART_IOMMU=y
>>     CONFIG_CALGARY_IOMMU=y
>>     # CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT is not set
>> and that when booting this mainline I don't see any Calgary related
>> messages like I get from eg: Ubuntu's 2.6.22-14-server...I'm probably not
>> actually running the calgary iommu code in these repros.
> 
> Yes, probabaly, your machine doesn't use any IOMMU hardware
> (nommu_map_sg function was in your crash log).
> 
> 
>> Anyway, I greatly appreciate your efforts so far in trying to find what
>> could be wrong here!
> 
> Really sorry about the troubles and thanks for testing.
> -
Tomo hi
It looks like the same bug we had with USB's isd200 and protocol.c. An overflow
of a data buffer bigger then then the sglist. There 2 it was in the INQUIRY command.

You just need to also check for sg != NULL in the for() loop.

Tim Please test below patch. It's ontop of 2.6.24 but should also apply to
2.6.25-rcx

Boaz
-- 
>From ec20bea25c9fe2400378b19c128b15fef3c7cbb6 Mon Sep 17 00:00:00 2001
From: Boaz Harrosh <bharrosh@panasas.com>
Date: Sun, 17 Feb 2008 14:50:25 +0200
Subject: [PATCH] ips: Avoid overflow in writing scsi command data

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 drivers/scsi/ips.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c
index 5c5a9b2..1d12253 100644
--- a/drivers/scsi/ips.c
+++ b/drivers/scsi/ips.c
@@ -3517,7 +3517,7 @@ ips_scmd_buf_write(struct scsi_cmnd *scmd, void *data, unsigned int count)
         struct scatterlist *sg = scsi_sglist(scmd);
 
         for (i = 0, xfer_cnt = 0;
-             (i < scsi_sg_count(scmd)) && (xfer_cnt < count); i++) {
+             (i < scsi_sg_count(scmd)) && (xfer_cnt < count) && sg; i++) {
                 min_cnt = min(count - xfer_cnt, sg[i].length);
 
                 /* kmap_atomic() ensures addressability of the data buffer.*/
-- 
1.5.3.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-17 12:52               ` Boaz Harrosh
@ 2008-02-17 13:09                 ` Boaz Harrosh
  2008-02-17 21:16                   ` Tim Pepper
  0 siblings, 1 reply; 26+ messages in thread
From: Boaz Harrosh @ 2008-02-17 13:09 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: lnxninja, fujita.tomonori, linux-scsi, Mark_Salyzyn

On Sun, Feb 17 2008 at 14:52 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote:
> On Sat, Feb 16 2008 at 2:41 +0200, FUJITA Tomonori <tomof@acm.org> wrote:
>> On Fri, 15 Feb 2008 14:50:57 -0800
>> Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote:
>>
>>> On Sat 16 Feb at 01:09:43 +0900 tomof@acm.org said:
>>>> The first one is just reverting the data buffer accessors
>>>> conversion. It would be nice if we could just revert it but we
>>>> can't. These changes are necessary to compile the driver against post
>>>> 2.6.24.
>>> Fujita-san,
>>>
>>> Unfortunately (and not too surprisingly given what we've tried so far) with
>>> only the first of your series reverted the driver is working fine for me
>>> again.
>> Do you mean that you applied only the following two patches against
>> 2.6.24, and then it doesn't work?
>>
>> 0001-ips-revert-the-changes-for-the-data-buffer-accessor.patch
>> 0002-ips-kill-the-map_single-path-in-ips_scmd_buf_write.patch
>>
>> If so, the second patch is broken. Did you saw BUG_ON message (I added
>> some BUG_ON to the patch)?
>>
>>
>>> I saw (eg: replies to http://lkml.org/lkml/2007/5/11/132) some possibly
>>> similar sounding issues with other drivers.  Could there be some memory
>>> uninitialised?  I did try changing all the ips.c kmalloc's to kzalloc's,
>>> but that didn't help.  Also that thread ties into pci gart.  The machines
>>> we've been using are liable to getting pci calgary although given my
>>> .config has:
>>>     CONFIG_GART_IOMMU=y
>>>     CONFIG_CALGARY_IOMMU=y
>>>     # CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT is not set
>>> and that when booting this mainline I don't see any Calgary related
>>> messages like I get from eg: Ubuntu's 2.6.22-14-server...I'm probably not
>>> actually running the calgary iommu code in these repros.
>> Yes, probabaly, your machine doesn't use any IOMMU hardware
>> (nommu_map_sg function was in your crash log).
>>
>>
>>> Anyway, I greatly appreciate your efforts so far in trying to find what
>>> could be wrong here!
>> Really sorry about the troubles and thanks for testing.
>> -
> Tomo hi
> It looks like the same bug we had with USB's isd200 and protocol.c. An overflow
> of a data buffer bigger then then the sglist. There 2 it was in the INQUIRY command.
> 
> You just need to also check for sg != NULL in the for() loop.
> 
> Tim Please test below patch. It's ontop of 2.6.24 but should also apply to
> 2.6.25-rcx
> 
> Boaz
> -- 
> From ec20bea25c9fe2400378b19c128b15fef3c7cbb6 Mon Sep 17 00:00:00 2001
> From: Boaz Harrosh <bharrosh@panasas.com>
> Date: Sun, 17 Feb 2008 14:50:25 +0200
> Subject: [PATCH] ips: Avoid overflow in writing scsi command data
> 
> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
> ---
>  drivers/scsi/ips.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c
> index 5c5a9b2..1d12253 100644
> --- a/drivers/scsi/ips.c
> +++ b/drivers/scsi/ips.c
> @@ -3517,7 +3517,7 @@ ips_scmd_buf_write(struct scsi_cmnd *scmd, void *data, unsigned int count)

Disregard that patch it is totally wrong. That's not it. Sorry

Boaz


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-16  0:41             ` FUJITA Tomonori
  2008-02-17 12:52               ` Boaz Harrosh
@ 2008-02-17 21:15               ` Tim Pepper
  2008-02-17 22:31                 ` FUJITA Tomonori
  1 sibling, 1 reply; 26+ messages in thread
From: Tim Pepper @ 2008-02-17 21:15 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: fujita.tomonori, linux-scsi, Mark_Salyzyn

On Sat 16 Feb at 09:41:48 +0900 tomof@acm.org said:
> 
> Do you mean that you applied only the following two patches against
> 2.6.24, and then it doesn't work?
> 
> 0001-ips-revert-the-changes-for-the-data-buffer-accessor.patch

I only applies the 0001 patch and the driver appeared to function fine.

> If so, the second patch is broken. Did you saw BUG_ON message (I added
> some BUG_ON to the patch)?

I did not apply the 0002 patch.

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-17 13:09                 ` Boaz Harrosh
@ 2008-02-17 21:16                   ` Tim Pepper
  0 siblings, 0 replies; 26+ messages in thread
From: Tim Pepper @ 2008-02-17 21:16 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: FUJITA Tomonori, lnxninja, fujita.tomonori, linux-scsi,
	Mark_Salyzyn

On Sun 17 Feb at 15:09:06 +0200 bharrosh@panasas.com said:
> > 
> > You just need to also check for sg != NULL in the for() loop.
> > 
> > Tim Please test below patch. It's ontop of 2.6.24 but should also apply to
> > 2.6.25-rcx


> Disregard that patch it is totally wrong. That's not it. Sorry
> 
> Boaz
> 

You sure?  I was about to give it a quick run...but I guess I wont then. :)

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-17 21:15               ` Tim Pepper
@ 2008-02-17 22:31                 ` FUJITA Tomonori
  2008-02-17 23:29                   ` Tim Pepper
  2008-02-17 23:37                   ` Tim Pepper
  0 siblings, 2 replies; 26+ messages in thread
From: FUJITA Tomonori @ 2008-02-17 22:31 UTC (permalink / raw)
  To: lnxninja; +Cc: tomof, fujita.tomonori, linux-scsi, Mark_Salyzyn

On Sun, 17 Feb 2008 13:15:40 -0800
Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote:

> On Sat 16 Feb at 09:41:48 +0900 tomof@acm.org said:
> > 
> > Do you mean that you applied only the following two patches against
> > 2.6.24, and then it doesn't work?
> > 
> > 0001-ips-revert-the-changes-for-the-data-buffer-accessor.patch
> 
> I only applies the 0001 patch and the driver appeared to function fine.

I see, thanks. It would be nice if we can push only the 0001 patch to
mainline, however as I explained, we can't unfortunately. The
0002-0009 patches are necessary for post 2.6.24.


> > If so, the second patch is broken. Did you saw BUG_ON message (I added
> > some BUG_ON to the patch)?
> 
> I did not apply the 0002 patch.

Can you apply the 0001 and 0002 against 2.6.24 and see how it works?
If it works well, then please apply the 0001, 0002 and 0003.

Please the patches in a step-by-step way and tell me which patch
breaks the driver.

Thanks,

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-17 22:31                 ` FUJITA Tomonori
@ 2008-02-17 23:29                   ` Tim Pepper
  2008-02-17 23:37                   ` Tim Pepper
  1 sibling, 0 replies; 26+ messages in thread
From: Tim Pepper @ 2008-02-17 23:29 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: fujita.tomonori, linux-scsi, Mark_Salyzyn

On Mon 18 Feb at 07:31:56 +0900 tomof@acm.org said:
> 
> Can you apply the 0001 and 0002 against 2.6.24 and see how it works?
> If it works well, then please apply the 0001, 0002 and 0003.
> 
> Please the patches in a step-by-step way and tell me which patch
> breaks the driver.

I wasn't thinking Friday...we already expected 0001 to fix it, so the
hope is one of the others breaks it and that is useful info.  I'll apply
and test the rest of the series.  Sorry!

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-17 22:31                 ` FUJITA Tomonori
  2008-02-17 23:29                   ` Tim Pepper
@ 2008-02-17 23:37                   ` Tim Pepper
  2008-02-18 13:32                     ` FUJITA Tomonori
  1 sibling, 1 reply; 26+ messages in thread
From: Tim Pepper @ 2008-02-17 23:37 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: fujita.tomonori, linux-scsi, Mark_Salyzyn

On Mon 19 Feb at 07:31:56 +0900 tomof@acm.org said:
> 
> Can you apply the 0001 and 0002 against 2.6.24 and see how it works?
> If it works well, then please apply the 0001, 0002 and 0003.

Fujita-san,

I've started through the patches in order, cumulatively and after applying
0005 things break.  I wont be able to test anything else until tomorrow
when I can phycisally reset the machine...

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-17 23:37                   ` Tim Pepper
@ 2008-02-18 13:32                     ` FUJITA Tomonori
  2008-02-18 23:30                       ` Tim Pepper
  2008-02-19  8:02                       ` FUJITA Tomonori
  0 siblings, 2 replies; 26+ messages in thread
From: FUJITA Tomonori @ 2008-02-18 13:32 UTC (permalink / raw)
  To: lnxninja; +Cc: tomof, fujita.tomonori, linux-scsi, Mark_Salyzyn

On Sun, 17 Feb 2008 15:37:02 -0800
Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote:

> On Mon 19 Feb at 07:31:56 +0900 tomof@acm.org said:
> > 
> > Can you apply the 0001 and 0002 against 2.6.24 and see how it works?
> > If it works well, then please apply the 0001, 0002 and 0003.
> 
> Fujita-san,
> 
> I've started through the patches in order, cumulatively and after applying
> 0005 things break.  I wont be able to test anything else until tomorrow
> when I can phycisally reset the machine...

Great, thanks a lot!

Can you apply this patch after the 0005 patch and see how it works? If
it works, then please continue to test 0006, 0007 ...


diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c
index 05bb6ea..39cdd68 100644
--- a/drivers/scsi/ips.c
+++ b/drivers/scsi/ips.c
@@ -6906,7 +6906,7 @@ ips_register_scsi(int index)
 	sh->max_channel = ha->nbus - 1;
 	sh->can_queue = ha->max_cmds - 1;
 
-	scsi_add_host(sh, NULL);
+	scsi_add_host(sh, &ha->pcidev->dev);
 	scsi_scan_host(sh);
 
 	return 0;
-- 
1.5.3.7


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* RE: ips.c broken since 2.6.23 on x86_64?
  2008-02-14  0:04   ` Tim Pepper
@ 2008-02-18 14:57     ` Salyzyn, Mark
  2008-02-18 23:34       ` Tim Pepper
  0 siblings, 1 reply; 26+ messages in thread
From: Salyzyn, Mark @ 2008-02-18 14:57 UTC (permalink / raw)
  To: Tim Pepper, linux-scsi@vger.kernel.org; +Cc: FUJITA Tomonori

The path needs to be triggered, it is the path to handle spoofing of the Adapter's inquiry.

You need more printk instrumentation to determine *why* it is not reaching that code path. What is the result of scb->scsi_cmd. scb->bus, ips_is_passthru(scb->scsi_cmd)?

The sg breakup issue may need to be looked at, but keep in mind the driver is going down a path that was not intended.

Sincerely -- Mark Salyzyn

> -----Original Message-----
> From: Tim Pepper [mailto:lnxninja@linux.vnet.ibm.com]
> Sent: Wednesday, February 13, 2008 7:04 PM
> To: linux-scsi@vger.kernel.org
> Cc: FUJITA Tomonori; Salyzyn, Mark
> Subject: Re: ips.c broken since 2.6.23 on x86_64?
>
> (replaced missing cc's including linux-scsi)
>
> On Wed 13 Feb at 14:39:07 -0800 Mark_Salyzyn@adaptec.com said:
> >
> > - Is the command path code even reaching the controller's
> processor inquiry
> >   spoofing section?
> >
> >                 if (scb->scsi_cmd->cmnd[0] == INQUIRY) {
> >                 IPS_SCSI_INQ_DATA inquiry;
> >
> >                 memset(&inquiry, 0,
> >                sizeof (IPS_SCSI_INQ_DATA));
> >
> >                 inquiry.DeviceType =
> >                     IPS_SCSI_INQ_TYPE_PROCESSOR;
>
> I just added printk's in ips_send_cmd()'s INQUIRY case just before
> the above condition test and just within the conditional block.
> Neither showed on the console.
>
> > The target_id check may be flawed? It is not using the
> > scmd/sdev accessors and could be the wrong value as a
> result. Wild goose
> > since arrays are working correctly (?)
>
> In the original case the arrays appeared to be working
> correctly although
> we were worried.  In the repro case we don't actually have any disk
> attached...Forgot to mention that in the original email.
>
> > - There appears to be a missing 'break' statement for the
> START_STOP command
> >   immediately preceding the TEST_UNIT_READY/INQUIRY cases.
> What are the side
> >   effects of that? It appears to be innocuous.
>
> No change noted with the missing break added.
>
> > - ips_scmd_buf_write is used for array capacity and other
> fiddly bits and
> >   must be functioning correctly (?), so I can NOT harm it's
> functionality
> >   except to question if the kmap_atomic returned a non-null
> value, it's
> >   return value apparently is not checked. It's failure
> could speak of
> >   problem(s) elsewhere.
>
> I've got a printk there and haven't seen any output from it.
> Haven't seen
> anything from any of the printk's I've tried actually before:
>
> In the run above to check if ips_send_cmd() hits the
> condition you were
> curious about...I did capture the following from printk's I added in
> ips_allocatescbs():
>
> pci_alloc_consistent returns ha->scbs    @ 0x18446604437762007040
> pci_alloc_consistent returns ips_sg.list @ 0x18446604437762002944
> pci_alloc_consistent returns ha->scbs    @ 0x18446604437698871296
> pci_alloc_consistent returns ips_sg.list @ 0x18446604437756837888
>
> I take that as ips_init_phase2() being called and presumable returning
> SUCCESS.
>
> --
> Tim Pepper  <lnxninja@linux.vnet.ibm.com>
> IBM Linux Technology Center
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-18 13:32                     ` FUJITA Tomonori
@ 2008-02-18 23:30                       ` Tim Pepper
  2008-02-19  0:11                         ` FUJITA Tomonori
  2008-02-19  8:02                       ` FUJITA Tomonori
  1 sibling, 1 reply; 26+ messages in thread
From: Tim Pepper @ 2008-02-18 23:30 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: lnxninja, fujita.tomonori, linux-scsi, Mark_Salyzyn

On Mon 18 Feb at 22:32:46 +0900 tomof@acm.org said:
> 
> diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c
> index 05bb6ea..39cdd68 100644
> --- a/drivers/scsi/ips.c
> +++ b/drivers/scsi/ips.c
> @@ -6906,7 +6906,7 @@ ips_register_scsi(int index)
>  	sh->max_channel = ha->nbus - 1;
>  	sh->can_queue = ha->max_cmds - 1;
> 
> -	scsi_add_host(sh, NULL);
> +	scsi_add_host(sh, &ha->pcidev->dev);
>  	scsi_scan_host(sh);
> 
>  	return 0;

Fujita-san,

This applies and runs well on top of your 0005 patch!  The rest of the
patches also then apply in order and run successfully.

Just to confirm, I applied the above alone to a clean 2.6.24 and things
again build and run successfully.  For completeness I also reproduced
the problem against 2.6.23.16 and verified the above patch fixes on that
kernel version as well.

Assuming this patch is accepted for 2.6.25, please also queue it for
the 2.6.23/24 stable trees.

Thank you very much for your help in tracking this issue down!

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-18 14:57     ` Salyzyn, Mark
@ 2008-02-18 23:34       ` Tim Pepper
  0 siblings, 0 replies; 26+ messages in thread
From: Tim Pepper @ 2008-02-18 23:34 UTC (permalink / raw)
  To: Salyzyn, Mark; +Cc: Tim Pepper, linux-scsi@vger.kernel.org, FUJITA Tomonori

On Mon 18 Feb at 06:57:14 -0800 Mark_Salyzyn@adaptec.com said:
> The path needs to be triggered, it is the path to handle spoofing
> of the Adapter's inquiry.
> 
> You need more printk instrumentation to determine *why* it is not
> reaching that code path. What is the result of scb->scsi_cmd.
> scb->bus, ips_is_passthru(scb->scsi_cmd)?
> 
> The sg breakup issue may need to be looked at, but keep in mind
> the driver is going down a path that was not intended.

Mark,

Fujita Tomonori has noted the following:

-       scsi_add_host(sh, NULL);
+       scsi_add_host(sh, &ha->pcidev->dev);

which appears to fix things for me.

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-18 23:30                       ` Tim Pepper
@ 2008-02-19  0:11                         ` FUJITA Tomonori
  2008-02-19  3:48                           ` Tim Pepper
  0 siblings, 1 reply; 26+ messages in thread
From: FUJITA Tomonori @ 2008-02-19  0:11 UTC (permalink / raw)
  To: lnxninja; +Cc: tomof, fujita.tomonori, linux-scsi, Mark_Salyzyn

On Mon, 18 Feb 2008 15:30:58 -0800
Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote:

> On Mon 18 Feb at 22:32:46 +0900 tomof@acm.org said:
> > 
> > diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c
> > index 05bb6ea..39cdd68 100644
> > --- a/drivers/scsi/ips.c
> > +++ b/drivers/scsi/ips.c
> > @@ -6906,7 +6906,7 @@ ips_register_scsi(int index)
> >  	sh->max_channel = ha->nbus - 1;
> >  	sh->can_queue = ha->max_cmds - 1;
> > 
> > -	scsi_add_host(sh, NULL);
> > +	scsi_add_host(sh, &ha->pcidev->dev);
> >  	scsi_scan_host(sh);
> > 
> >  	return 0;
> 
> Fujita-san,
> 
> This applies and runs well on top of your 0005 patch!  The rest of the
> patches also then apply in order and run successfully.

Great, thanks a lot!


> Just to confirm, I applied the above alone to a clean 2.6.24 and things
> again build and run successfully.  For completeness I also reproduced
> the problem against 2.6.23.16 and verified the above patch fixes on that
> kernel version as well.

Nice. There is another bug on 2.6.24 but we rarely hit this so 2.6.24
works most of the time:

http://marc.info/?l=linux-scsi&m=120303487528875&w=2


> Assuming this patch is accepted for 2.6.25, please also queue it for
> the 2.6.23/24 stable trees.

Yes, I will take care about it.

Can you please help me just once more? 2.6.25-rc2 fixed this bug in a
bit different way by chance. Please test 2.6.25-rc2 with the attached
patch to make sure that ips in 2.6.25 works well.


> Thank you very much for your help in tracking this issue down!

No problem. I should have fixed it long time ago. Really sorry about
it.


diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c
index bb152fb..429592a 100644
--- a/drivers/scsi/ips.c
+++ b/drivers/scsi/ips.c
@@ -1576,7 +1576,7 @@ ips_make_passthru(ips_ha_t *ha, struct scsi_cmnd *SC, ips_scb_t *scb, int intr)
 	METHOD_TRACE("ips_make_passthru", 1);
 
         scsi_for_each_sg(SC, sg, scsi_sg_count(SC), i)
-                length += sg[i].length;
+                length += sg->length;
 
 	if (length < sizeof (ips_passthru_t)) {
 		/* wrong size */
@@ -3510,15 +3510,16 @@ ips_scmd_buf_write(struct scsi_cmnd *scmd, void *data, unsigned int count)
         struct scatterlist *sg = scsi_sglist(scmd);
 
         for (i = 0, xfer_cnt = 0;
-             (i < scsi_sg_count(scmd)) && (xfer_cnt < count); i++) {
-                min_cnt = min(count - xfer_cnt, sg[i].length);
+             (i < scsi_sg_count(scmd)) && (xfer_cnt < count);
+             i++, sg = sg_next(sg)) {
+                min_cnt = min(count - xfer_cnt, sg->length);
 
                 /* kmap_atomic() ensures addressability of the data buffer.*/
                 /* local_irq_save() protects the KM_IRQ0 address slot.     */
                 local_irq_save(flags);
-                buffer = kmap_atomic(sg_page(&sg[i]), KM_IRQ0) + sg[i].offset;
+                buffer = kmap_atomic(sg_page(sg), KM_IRQ0) + sg->offset;
                 memcpy(buffer, &cdata[xfer_cnt], min_cnt);
-                kunmap_atomic(buffer - sg[i].offset, KM_IRQ0);
+                kunmap_atomic(buffer - sg->offset, KM_IRQ0);
                 local_irq_restore(flags);
 
                 xfer_cnt += min_cnt;
@@ -3543,15 +3544,16 @@ ips_scmd_buf_read(struct scsi_cmnd *scmd, void *data, unsigned int count)
         struct scatterlist *sg = scsi_sglist(scmd);
 
         for (i = 0, xfer_cnt = 0;
-             (i < scsi_sg_count(scmd)) && (xfer_cnt < count); i++) {
-                min_cnt = min(count - xfer_cnt, sg[i].length);
+             (i < scsi_sg_count(scmd)) && (xfer_cnt < count);
+             i++, sg = sg_next(sg)) {
+                min_cnt = min(count - xfer_cnt, sg->length);
 
                 /* kmap_atomic() ensures addressability of the data buffer.*/
                 /* local_irq_save() protects the KM_IRQ0 address slot.     */
                 local_irq_save(flags);
-                buffer = kmap_atomic(sg_page(&sg[i]), KM_IRQ0) + sg[i].offset;
+                buffer = kmap_atomic(sg_page(sg), KM_IRQ0) + sg->offset;
                 memcpy(&cdata[xfer_cnt], buffer, min_cnt);
-                kunmap_atomic(buffer - sg[i].offset, KM_IRQ0);
+                kunmap_atomic(buffer - sg->offset, KM_IRQ0);
                 local_irq_restore(flags);
 
                 xfer_cnt += min_cnt;

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-19  0:11                         ` FUJITA Tomonori
@ 2008-02-19  3:48                           ` Tim Pepper
  2008-02-19  8:22                             ` FUJITA Tomonori
  0 siblings, 1 reply; 26+ messages in thread
From: Tim Pepper @ 2008-02-19  3:48 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: tomof, linux-scsi, Mark_Salyzyn

On Feb 18, 2008 4:11 PM, FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:
> Can you please help me just once more? 2.6.25-rc2 fixed this bug in a
> bit different way by chance. Please test 2.6.25-rc2 with the attached
> patch to make sure that ips in 2.6.25 works well.

Confirmed...the patch below against 2.6.25-rc2 also works for me.

Thank you again Fujita-san!

>
> diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c
> index bb152fb..429592a 100644
> --- a/drivers/scsi/ips.c
> +++ b/drivers/scsi/ips.c
> @@ -1576,7 +1576,7 @@ ips_make_passthru(ips_ha_t *ha, struct scsi_cmnd *SC, ips_scb_t *scb, int intr)
>         METHOD_TRACE("ips_make_passthru", 1);
>
>          scsi_for_each_sg(SC, sg, scsi_sg_count(SC), i)
> -                length += sg[i].length;
> +                length += sg->length;
>
>         if (length < sizeof (ips_passthru_t)) {
>                 /* wrong size */
> @@ -3510,15 +3510,16 @@ ips_scmd_buf_write(struct scsi_cmnd *scmd, void *data, unsigned int count)
>          struct scatterlist *sg = scsi_sglist(scmd);
>
>          for (i = 0, xfer_cnt = 0;
> -             (i < scsi_sg_count(scmd)) && (xfer_cnt < count); i++) {
> -                min_cnt = min(count - xfer_cnt, sg[i].length);
> +             (i < scsi_sg_count(scmd)) && (xfer_cnt < count);
> +             i++, sg = sg_next(sg)) {
> +                min_cnt = min(count - xfer_cnt, sg->length);
>
>                  /* kmap_atomic() ensures addressability of the data buffer.*/
>                  /* local_irq_save() protects the KM_IRQ0 address slot.     */
>                  local_irq_save(flags);
> -                buffer = kmap_atomic(sg_page(&sg[i]), KM_IRQ0) + sg[i].offset;
> +                buffer = kmap_atomic(sg_page(sg), KM_IRQ0) + sg->offset;
>                  memcpy(buffer, &cdata[xfer_cnt], min_cnt);
> -                kunmap_atomic(buffer - sg[i].offset, KM_IRQ0);
> +                kunmap_atomic(buffer - sg->offset, KM_IRQ0);
>                  local_irq_restore(flags);
>
>                  xfer_cnt += min_cnt;
> @@ -3543,15 +3544,16 @@ ips_scmd_buf_read(struct scsi_cmnd *scmd, void *data, unsigned int count)
>          struct scatterlist *sg = scsi_sglist(scmd);
>
>          for (i = 0, xfer_cnt = 0;
> -             (i < scsi_sg_count(scmd)) && (xfer_cnt < count); i++) {
> -                min_cnt = min(count - xfer_cnt, sg[i].length);
> +             (i < scsi_sg_count(scmd)) && (xfer_cnt < count);
> +             i++, sg = sg_next(sg)) {
> +                min_cnt = min(count - xfer_cnt, sg->length);
>
>                  /* kmap_atomic() ensures addressability of the data buffer.*/
>                  /* local_irq_save() protects the KM_IRQ0 address slot.     */
>                  local_irq_save(flags);
> -                buffer = kmap_atomic(sg_page(&sg[i]), KM_IRQ0) + sg[i].offset;
> +                buffer = kmap_atomic(sg_page(sg), KM_IRQ0) + sg->offset;
>                  memcpy(&cdata[xfer_cnt], buffer, min_cnt);
> -                kunmap_atomic(buffer - sg[i].offset, KM_IRQ0);
> +                kunmap_atomic(buffer - sg->offset, KM_IRQ0);
>                  local_irq_restore(flags);
>
>                  xfer_cnt += min_cnt;

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-18 13:32                     ` FUJITA Tomonori
  2008-02-18 23:30                       ` Tim Pepper
@ 2008-02-19  8:02                       ` FUJITA Tomonori
  2008-02-19 16:06                         ` James Bottomley
  1 sibling, 1 reply; 26+ messages in thread
From: FUJITA Tomonori @ 2008-02-19  8:02 UTC (permalink / raw)
  To: James.Bottomley; +Cc: lnxninja, fujita.tomonori, linux-scsi, Mark_Salyzyn

ips did scsi_add_host(sh, NULL) so scsi_dma_map uses
shost_gendev.parent that isn't initialized properly, then the kernel
crashes. 2.6.23 and 2.6.24 have this bug.

We can fix this by calling scsi_add_host with pdev->dev, in the
standard way (like the following way) but this bug was fixed in the
current Linus tree by:

commit 2551a13e61d3c3df6c2da6de5a3ece78e6d67111
Author: Jeff Garzik <jeff@garzik.org>
Date:   Thu Dec 13 16:14:10 2007 -0800

[SCSI] ips: handle scsi_add_host() failure, and other err cleanups


James, the legitimate way to fix stable trees is sending this commit
(not sending a patch that was not committed upstream)?


On Mon, 18 Feb 2008 22:32:46 +0900
FUJITA Tomonori <tomof@acm.org> wrote:

> On Sun, 17 Feb 2008 15:37:02 -0800
> Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote:
> 
> > On Mon 19 Feb at 07:31:56 +0900 tomof@acm.org said:
> > > 
> > > Can you apply the 0001 and 0002 against 2.6.24 and see how it works?
> > > If it works well, then please apply the 0001, 0002 and 0003.
> > 
> > Fujita-san,
> > 
> > I've started through the patches in order, cumulatively and after applying
> > 0005 things break.  I wont be able to test anything else until tomorrow
> > when I can phycisally reset the machine...
> 
> Great, thanks a lot!
> 
> Can you apply this patch after the 0005 patch and see how it works? If
> it works, then please continue to test 0006, 0007 ...
> 
> 
> diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c
> index 05bb6ea..39cdd68 100644
> --- a/drivers/scsi/ips.c
> +++ b/drivers/scsi/ips.c
> @@ -6906,7 +6906,7 @@ ips_register_scsi(int index)
>  	sh->max_channel = ha->nbus - 1;
>  	sh->can_queue = ha->max_cmds - 1;
>  
> -	scsi_add_host(sh, NULL);
> +	scsi_add_host(sh, &ha->pcidev->dev);
>  	scsi_scan_host(sh);
>  
>  	return 0;
> -- 
> 1.5.3.7
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-19  3:48                           ` Tim Pepper
@ 2008-02-19  8:22                             ` FUJITA Tomonori
  0 siblings, 0 replies; 26+ messages in thread
From: FUJITA Tomonori @ 2008-02-19  8:22 UTC (permalink / raw)
  To: lnxninja; +Cc: fujita.tomonori, tomof, linux-scsi, Mark_Salyzyn

On Mon, 18 Feb 2008 19:48:49 -0800
"Tim Pepper" <lnxninja@linux.vnet.ibm.com> wrote:

> On Feb 18, 2008 4:11 PM, FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:
> > Can you please help me just once more? 2.6.25-rc2 fixed this bug in a
> > bit different way by chance. Please test 2.6.25-rc2 with the attached
> > patch to make sure that ips in 2.6.25 works well.
> 
> Confirmed...the patch below against 2.6.25-rc2 also works for me.

Thanks a lot!

I'll make sure that the ips driver in 2.6.23-stable, 2.6.24-stable
(and 2.6.25) will work well.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-19  8:02                       ` FUJITA Tomonori
@ 2008-02-19 16:06                         ` James Bottomley
  2008-02-19 20:29                           ` FUJITA Tomonori
  0 siblings, 1 reply; 26+ messages in thread
From: James Bottomley @ 2008-02-19 16:06 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: lnxninja, fujita.tomonori, linux-scsi, Mark_Salyzyn, Jeff Garzik

On Tue, 2008-02-19 at 17:02 +0900, FUJITA Tomonori wrote:
> ips did scsi_add_host(sh, NULL) so scsi_dma_map uses
> shost_gendev.parent that isn't initialized properly, then the kernel
> crashes. 2.6.23 and 2.6.24 have this bug.
> 
> We can fix this by calling scsi_add_host with pdev->dev, in the
> standard way (like the following way) but this bug was fixed in the
> current Linus tree by:
> 
> commit 2551a13e61d3c3df6c2da6de5a3ece78e6d67111
> Author: Jeff Garzik <jeff@garzik.org>
> Date:   Thu Dec 13 16:14:10 2007 -0800
> 
> [SCSI] ips: handle scsi_add_host() failure, and other err cleanups
> 
> 
> James, the legitimate way to fix stable trees is sending this commit
> (not sending a patch that was not committed upstream)?

Well, the upstream patch doesn't look so bad as a stable candidate to my
eye.  Just because it's an unintended bugfix doesn't automatically
invalidate it.

The reason stable likes backports of existing upstream patches is
because they've supposedly been well tested in upstream.  Although that
doesn't apply in this case because the other bug rather prevented
testing, the principle is still sound.

So, would there be any problems simply backporting this?

James



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ips.c broken since 2.6.23 on x86_64?
  2008-02-19 16:06                         ` James Bottomley
@ 2008-02-19 20:29                           ` FUJITA Tomonori
  0 siblings, 0 replies; 26+ messages in thread
From: FUJITA Tomonori @ 2008-02-19 20:29 UTC (permalink / raw)
  To: James.Bottomley
  Cc: tomof, lnxninja, fujita.tomonori, linux-scsi, Mark_Salyzyn, jeff

On Tue, 19 Feb 2008 10:06:39 -0600
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> On Tue, 2008-02-19 at 17:02 +0900, FUJITA Tomonori wrote:
> > ips did scsi_add_host(sh, NULL) so scsi_dma_map uses
> > shost_gendev.parent that isn't initialized properly, then the kernel
> > crashes. 2.6.23 and 2.6.24 have this bug.
> > 
> > We can fix this by calling scsi_add_host with pdev->dev, in the
> > standard way (like the following way) but this bug was fixed in the
> > current Linus tree by:
> > 
> > commit 2551a13e61d3c3df6c2da6de5a3ece78e6d67111
> > Author: Jeff Garzik <jeff@garzik.org>
> > Date:   Thu Dec 13 16:14:10 2007 -0800
> > 
> > [SCSI] ips: handle scsi_add_host() failure, and other err cleanups
> > 
> > 
> > James, the legitimate way to fix stable trees is sending this commit
> > (not sending a patch that was not committed upstream)?
> 
> Well, the upstream patch doesn't look so bad as a stable candidate to my
> eye.  Just because it's an unintended bugfix doesn't automatically
> invalidate it.
> 
> The reason stable likes backports of existing upstream patches is
> because they've supposedly been well tested in upstream.  Although that
> doesn't apply in this case because the other bug rather prevented
> testing, the principle is still sound.
> 
> So, would there be any problems simply backporting this?

Thanks, I see. There is no problem. Please backport this.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2008-02-19 20:31 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-13 21:43 ips.c broken since 2.6.23 on x86_64? Tim Pepper
     [not found] ` <532ABFBDAAC3A34EB12EBA6CEC2838F439964688@ADPE2K703.adaptec.com>
2008-02-14  0:04   ` Tim Pepper
2008-02-18 14:57     ` Salyzyn, Mark
2008-02-18 23:34       ` Tim Pepper
2008-02-14 11:48 ` FUJITA Tomonori
2008-02-14 23:55   ` Tim Pepper
2008-02-15  0:13     ` FUJITA Tomonori
2008-02-15  1:16       ` Tim Pepper
2008-02-15 16:09         ` FUJITA Tomonori
2008-02-15 22:50           ` Tim Pepper
2008-02-16  0:41             ` FUJITA Tomonori
2008-02-17 12:52               ` Boaz Harrosh
2008-02-17 13:09                 ` Boaz Harrosh
2008-02-17 21:16                   ` Tim Pepper
2008-02-17 21:15               ` Tim Pepper
2008-02-17 22:31                 ` FUJITA Tomonori
2008-02-17 23:29                   ` Tim Pepper
2008-02-17 23:37                   ` Tim Pepper
2008-02-18 13:32                     ` FUJITA Tomonori
2008-02-18 23:30                       ` Tim Pepper
2008-02-19  0:11                         ` FUJITA Tomonori
2008-02-19  3:48                           ` Tim Pepper
2008-02-19  8:22                             ` FUJITA Tomonori
2008-02-19  8:02                       ` FUJITA Tomonori
2008-02-19 16:06                         ` James Bottomley
2008-02-19 20:29                           ` FUJITA Tomonori

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).