* ips.c broken since 2.6.23 on x86_64?
@ 2008-02-13 21:43 Tim Pepper
[not found] ` <532ABFBDAAC3A34EB12EBA6CEC2838F439964688@ADPE2K703.adaptec.com>
2008-02-14 11:48 ` FUJITA Tomonori
0 siblings, 2 replies; 26+ messages in thread
From: Tim Pepper @ 2008-02-13 21:43 UTC (permalink / raw)
To: linux-scsi; +Cc: FUJITA Tomonori, Mark_Salyzyn, ipslinux
We recently upgraded a production x86_64 machine with serveraid
cards to 2.6.24 and noted that /proc/scsi/scsi showed garbage for our
serveraid service processors. sg_inq also returned garbage from the
service processors' sg devices. After a few iterations I started seeing
meaninful stuff in the garbage. Not sure if it was returning live memory
or just unzero'd. Either way not good so we went back to a known good,
older kernel and tried to repro on a similar machine. We got different,
but still bad results in terms of pointing at memory badness.
FWIW, the original machine had the following hardware:
scsi0 : IBM PCI ServeRAID 7.12.05 Build 761 <ServeRAID 4H>
scsi1 : IBM PCI ServeRAID 7.12.05 Build 761 <ServeRAID 4M>
and the repro's have been on a machine with just:
scsi0 : IBM PCI ServeRAID 7.12.05 Build 761 <ServeRAID 4Mx>
On the repro machine I'm getting a hang on ips driver load with the following
logged:
Feb 13 13:16:08 ipstest kernel: [ 915.236563] scsi3 : IBM PCI ServeRAID 7.12.05 Build 761 <ServeRAID 4Mx>
Feb 13 13:16:08 ipstest kernel: [ 915.236839] Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
Feb 13 13:16:08 ipstest kernel: [ 915.236863] [check_addr+16/80] check_addr+0x10/0x50
Feb 13 13:16:08 ipstest kernel: [ 915.237209] PGD 79fff067 PUD 7a898067 PMD 0
Feb 13 13:16:08 ipstest kernel: [ 915.237341] Oops: 0000 [1] SMP
Feb 13 13:16:08 ipstest kernel: [ 915.237463] CPU 1
Feb 13 13:16:08 ipstest kernel: [ 915.239436] Modules linked in: ips aic94xx
Feb 13 13:16:08 ipstest kernel: [ 915.239559] Pid: 5213, comm: scsi_scan_3 Not tainted 2.6.23-ips_as_module #3
Feb 13 13:16:08 ipstest kernel: [ 915.239692] RIP: 0010:[check_addr+16/80] [check_addr+16/80] check_addr+0x10/0x50
Feb 13 13:16:08 ipstest kernel: [ 915.239932] RSP: 0018:ffff810076d87900 EFLAGS: 00010082
Feb 13 13:16:08 ipstest kernel: [ 915.240059] RAX: 0000000000000000 RBX: ffff81007b636300 RCX: 0000000000000024
Feb 13 13:16:08 ipstest kernel: [ 915.240196] RDX: 000000007b636b00 RSI: ffffffff8077cde0 RDI: ffffffff806c4ed5
Feb 13 13:16:08 ipstest kernel: [ 915.240332] RBP: ffff810076d87900 R08: 0000000000000500 R09: 0000000000000000
Feb 13 13:16:08 ipstest kernel: [ 915.240468] R10: ffff81007aa33b40 R11: 0000000000000060 R12: 0000000000000000
Feb 13 13:16:08 ipstest kernel: [ 915.240605] R13: 0000000000000001 R14: ffffffff8077cde0 R15: ffff81007aa33a80
Feb 13 13:16:08 ipstest kernel: [ 915.240741] FS: 0000000000000000(0000) GS:ffff810001039300(0000) knlGS:0000000000000000
Feb 13 13:16:08 ipstest kernel: [ 915.240981] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb 13 13:16:08 ipstest kernel: [ 915.241111] CR2: 0000000000000000 CR3: 0000000078a98000 CR4: 00000000000006e0
Feb 13 13:16:08 ipstest kernel: [ 915.241248] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 13 13:16:08 ipstest kernel: [ 915.241384] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 13 13:16:08 ipstest kernel: [ 915.241520] Process scsi_scan_3 (pid: 5213, threadinfo ffff810076d86000, task ffff81007be26720)
Feb 13 13:16:08 ipstest kernel: [ 915.241761] Stack: ffff810076d87930 ffffffff802125c3 ffff81007aa33a80 ffff81007480cf50
Feb 13 13:16:08 ipstest kernel: [ 915.242006] 0000000000000000 ffff81007ba38ca8 ffff810076d87940 ffffffff8046fb42
Feb 13 13:16:08 ipstest kernel: [ 915.242248] ffff810076d879c0 ffffffff8801c2ee ffff81007aa33af0 000000017aa33af0
Feb 13 13:16:08 ipstest kernel: [ 915.242389] Call Trace:
Feb 13 13:16:08 ipstest kernel: [ 915.242606] [nommu_map_sg+115/144] nommu_map_sg+0x73/0x90
Feb 13 13:16:08 ipstest kernel: [ 915.242736] [scsi_dma_map+66/96] scsi_dma_map+0x42/0x60
Feb 13 13:16:08 ipstest kernel: [ 915.242867] [_end+124884230/2127548952] :ips:ips_next+0x33e/0xc00
Feb 13 13:16:08 ipstest kernel: [ 915.242986] [scsi_done+0/48] scsi_done+0x0/0x30
Feb 13 13:16:08 ipstest kernel: [ 915.243114] [_end+124896894/2127548952] :ips:ips_queue+0x106/0x1f0
Feb 13 13:16:08 ipstest kernel: [ 915.243240] [scsi_dispatch_cmd+498/784] scsi_dispatch_cmd+0x1f2/0x310
Feb 13 13:16:08 ipstest kernel: [ 915.243370] [scsi_request_fn+491/976] scsi_request_fn+0x1eb/0x3d0
Feb 13 13:16:08 ipstest kernel: [ 915.243500] [__generic_unplug_device+37/48] __generic_unplug_device+0x25/0x30
Feb 13 13:16:08 ipstest kernel: [ 915.243630] [blk_execute_rq_nowait+99/176] blk_execute_rq_nowait+0x63/0xb0
Feb 13 13:16:08 ipstest kernel: [ 915.243761] [blk_execute_rq+122/224] blk_execute_rq+0x7a/0xe0
Feb 13 13:16:08 ipstest kernel: [ 915.243889] [scsi_execute+240/288] scsi_execute+0xf0/0x120
Feb 13 13:16:08 ipstest kernel: [ 915.244016] [scsi_execute_req+134/240] scsi_execute_req+0x86/0xf0
Feb 13 13:16:08 ipstest kernel: [ 915.244145] [scsi_probe_and_add_lun+594/3472] scsi_probe_and_add_lun+0x252/0xd90
Feb 13 13:16:08 ipstest kernel: [ 915.244279] [sas_expander_match+27/160] sas_expander_match+0x1b/0xa0
Feb 13 13:16:08 ipstest kernel: [ 915.244412] [get_device+23/32] get_device+0x17/0x20
Feb 13 13:16:08 ipstest kernel: [ 915.244534] [__scsi_scan_target+220/1696] __scsi_scan_target+0xdc/0x6a0
Feb 13 13:16:08 ipstest kernel: [ 915.244665] [enqueue_entity+172/432] enqueue_entity+0xac/0x1b0
Feb 13 13:16:08 ipstest kernel: [ 915.244793] [update_curr_load+135/160] update_curr_load+0x87/0xa0
Feb 13 13:16:08 ipstest kernel: [ 915.244923] [__check_preempt_curr_fair+107/128] __check_preempt_curr_fair+0x6b/0x80
Feb 13 13:16:08 ipstest kernel: [ 915.245057] [update_curr+258/272] update_curr+0x102/0x110
Feb 13 13:16:08 ipstest kernel: [ 915.245186] [scsi_scan_channel+139/160] scsi_scan_channel+0x8b/0xa0
Feb 13 13:16:08 ipstest kernel: [ 915.245315] [scsi_scan_host_selected+158/352] scsi_scan_host_selected+0x9e/0x160
Feb 13 13:16:08 ipstest kernel: [ 915.245447] [do_scan_async+0/320] do_scan_async+0x0/0x140
Feb 13 13:16:08 ipstest kernel: [ 915.245574] [do_scsi_scan_host+126/128] do_scsi_scan_host+0x7e/0x80
Feb 13 13:16:08 ipstest kernel: [ 915.245703] [do_scan_async+23/320] do_scan_async+0x17/0x140
Feb 13 13:16:08 ipstest kernel: [ 915.245832] [do_scan_async+0/320] do_scan_async+0x0/0x140
Feb 13 13:16:08 ipstest kernel: [ 915.245962] [kthread+77/128] kthread+0x4d/0x80
Feb 13 13:16:08 ipstest kernel: [ 915.246086] [child_rip+10/18] child_rip+0xa/0x12
Feb 13 13:16:08 ipstest kernel: [ 915.246209] [kthread+0/128] kthread+0x0/0x80
Feb 13 13:16:08 ipstest kernel: [ 915.246333] [child_rip+0/18] child_rip+0x0/0x12
Feb 13 13:16:08 ipstest kernel: [ 915.246457]
Feb 13 13:16:08 ipstest kernel: [ 915.246564]
Feb 13 13:16:08 ipstest kernel: [ 915.246565] Code: 4c 8b 00 48 8d 04 0a 4c 39 c0 76 2b b8 fe ff ff ff 31 f6 49
Feb 13 13:16:08 ipstest kernel: [ 915.246933] RIP [check_addr+16/80] check_addr+0x10/0x50
Feb 13 13:16:08 ipstest kernel: [ 915.247062] RSP <ffff810076d87900>
Feb 13 13:16:08 ipstest kernel: [ 915.247181] CR2: 0000000000000000
I was able to narrow it down in as much as with this reverted the machine
seems to run fine:
commit 2f4cf91cc0a1f32f75e1fa0a4d70a9bc7340a302
[SCSI] ips: convert to use the data buffer accessors
Nothing looks overly suspicious in that patch per se, although based
on the list archives it looks like related changes caused other drivers
grief. I've tried a variety of things to get a little more debug info,
but to no avail. If anybody has any suggestions, I'd appreciate them!
This repro's 100% on driver load so it's relatively easy (unfortunately
no remote power or serial console available) to test patches...
--
Tim Pepper <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 26+ messages in thread[parent not found: <532ABFBDAAC3A34EB12EBA6CEC2838F439964688@ADPE2K703.adaptec.com>]
* Re: ips.c broken since 2.6.23 on x86_64? [not found] ` <532ABFBDAAC3A34EB12EBA6CEC2838F439964688@ADPE2K703.adaptec.com> @ 2008-02-14 0:04 ` Tim Pepper 2008-02-18 14:57 ` Salyzyn, Mark 0 siblings, 1 reply; 26+ messages in thread From: Tim Pepper @ 2008-02-14 0:04 UTC (permalink / raw) To: linux-scsi; +Cc: FUJITA Tomonori, Mark_Salyzyn (replaced missing cc's including linux-scsi) On Wed 13 Feb at 14:39:07 -0800 Mark_Salyzyn@adaptec.com said: > > - Is the command path code even reaching the controller's processor inquiry > spoofing section? > > if (scb->scsi_cmd->cmnd[0] == INQUIRY) { > IPS_SCSI_INQ_DATA inquiry; > > memset(&inquiry, 0, > sizeof (IPS_SCSI_INQ_DATA)); > > inquiry.DeviceType = > IPS_SCSI_INQ_TYPE_PROCESSOR; I just added printk's in ips_send_cmd()'s INQUIRY case just before the above condition test and just within the conditional block. Neither showed on the console. > The target_id check may be flawed? It is not using the > scmd/sdev accessors and could be the wrong value as a result. Wild goose > since arrays are working correctly (?) In the original case the arrays appeared to be working correctly although we were worried. In the repro case we don't actually have any disk attached...Forgot to mention that in the original email. > - There appears to be a missing 'break' statement for the START_STOP command > immediately preceding the TEST_UNIT_READY/INQUIRY cases. What are the side > effects of that? It appears to be innocuous. No change noted with the missing break added. > - ips_scmd_buf_write is used for array capacity and other fiddly bits and > must be functioning correctly (?), so I can NOT harm it's functionality > except to question if the kmap_atomic returned a non-null value, it's > return value apparently is not checked. It's failure could speak of > problem(s) elsewhere. I've got a printk there and haven't seen any output from it. Haven't seen anything from any of the printk's I've tried actually before: In the run above to check if ips_send_cmd() hits the condition you were curious about...I did capture the following from printk's I added in ips_allocatescbs(): pci_alloc_consistent returns ha->scbs @ 0x18446604437762007040 pci_alloc_consistent returns ips_sg.list @ 0x18446604437762002944 pci_alloc_consistent returns ha->scbs @ 0x18446604437698871296 pci_alloc_consistent returns ips_sg.list @ 0x18446604437756837888 I take that as ips_init_phase2() being called and presumable returning SUCCESS. -- Tim Pepper <lnxninja@linux.vnet.ibm.com> IBM Linux Technology Center ^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: ips.c broken since 2.6.23 on x86_64? 2008-02-14 0:04 ` Tim Pepper @ 2008-02-18 14:57 ` Salyzyn, Mark 2008-02-18 23:34 ` Tim Pepper 0 siblings, 1 reply; 26+ messages in thread From: Salyzyn, Mark @ 2008-02-18 14:57 UTC (permalink / raw) To: Tim Pepper, linux-scsi@vger.kernel.org; +Cc: FUJITA Tomonori The path needs to be triggered, it is the path to handle spoofing of the Adapter's inquiry. You need more printk instrumentation to determine *why* it is not reaching that code path. What is the result of scb->scsi_cmd. scb->bus, ips_is_passthru(scb->scsi_cmd)? The sg breakup issue may need to be looked at, but keep in mind the driver is going down a path that was not intended. Sincerely -- Mark Salyzyn > -----Original Message----- > From: Tim Pepper [mailto:lnxninja@linux.vnet.ibm.com] > Sent: Wednesday, February 13, 2008 7:04 PM > To: linux-scsi@vger.kernel.org > Cc: FUJITA Tomonori; Salyzyn, Mark > Subject: Re: ips.c broken since 2.6.23 on x86_64? > > (replaced missing cc's including linux-scsi) > > On Wed 13 Feb at 14:39:07 -0800 Mark_Salyzyn@adaptec.com said: > > > > - Is the command path code even reaching the controller's > processor inquiry > > spoofing section? > > > > if (scb->scsi_cmd->cmnd[0] == INQUIRY) { > > IPS_SCSI_INQ_DATA inquiry; > > > > memset(&inquiry, 0, > > sizeof (IPS_SCSI_INQ_DATA)); > > > > inquiry.DeviceType = > > IPS_SCSI_INQ_TYPE_PROCESSOR; > > I just added printk's in ips_send_cmd()'s INQUIRY case just before > the above condition test and just within the conditional block. > Neither showed on the console. > > > The target_id check may be flawed? It is not using the > > scmd/sdev accessors and could be the wrong value as a > result. Wild goose > > since arrays are working correctly (?) > > In the original case the arrays appeared to be working > correctly although > we were worried. In the repro case we don't actually have any disk > attached...Forgot to mention that in the original email. > > > - There appears to be a missing 'break' statement for the > START_STOP command > > immediately preceding the TEST_UNIT_READY/INQUIRY cases. > What are the side > > effects of that? It appears to be innocuous. > > No change noted with the missing break added. > > > - ips_scmd_buf_write is used for array capacity and other > fiddly bits and > > must be functioning correctly (?), so I can NOT harm it's > functionality > > except to question if the kmap_atomic returned a non-null > value, it's > > return value apparently is not checked. It's failure > could speak of > > problem(s) elsewhere. > > I've got a printk there and haven't seen any output from it. > Haven't seen > anything from any of the printk's I've tried actually before: > > In the run above to check if ips_send_cmd() hits the > condition you were > curious about...I did capture the following from printk's I added in > ips_allocatescbs(): > > pci_alloc_consistent returns ha->scbs @ 0x18446604437762007040 > pci_alloc_consistent returns ips_sg.list @ 0x18446604437762002944 > pci_alloc_consistent returns ha->scbs @ 0x18446604437698871296 > pci_alloc_consistent returns ips_sg.list @ 0x18446604437756837888 > > I take that as ips_init_phase2() being called and presumable returning > SUCCESS. > > -- > Tim Pepper <lnxninja@linux.vnet.ibm.com> > IBM Linux Technology Center > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-18 14:57 ` Salyzyn, Mark @ 2008-02-18 23:34 ` Tim Pepper 0 siblings, 0 replies; 26+ messages in thread From: Tim Pepper @ 2008-02-18 23:34 UTC (permalink / raw) To: Salyzyn, Mark; +Cc: Tim Pepper, linux-scsi@vger.kernel.org, FUJITA Tomonori On Mon 18 Feb at 06:57:14 -0800 Mark_Salyzyn@adaptec.com said: > The path needs to be triggered, it is the path to handle spoofing > of the Adapter's inquiry. > > You need more printk instrumentation to determine *why* it is not > reaching that code path. What is the result of scb->scsi_cmd. > scb->bus, ips_is_passthru(scb->scsi_cmd)? > > The sg breakup issue may need to be looked at, but keep in mind > the driver is going down a path that was not intended. Mark, Fujita Tomonori has noted the following: - scsi_add_host(sh, NULL); + scsi_add_host(sh, &ha->pcidev->dev); which appears to fix things for me. -- Tim Pepper <lnxninja@linux.vnet.ibm.com> IBM Linux Technology Center ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-13 21:43 ips.c broken since 2.6.23 on x86_64? Tim Pepper [not found] ` <532ABFBDAAC3A34EB12EBA6CEC2838F439964688@ADPE2K703.adaptec.com> @ 2008-02-14 11:48 ` FUJITA Tomonori 2008-02-14 23:55 ` Tim Pepper 1 sibling, 1 reply; 26+ messages in thread From: FUJITA Tomonori @ 2008-02-14 11:48 UTC (permalink / raw) To: lnxninja; +Cc: linux-scsi, fujita.tomonori, Mark_Salyzyn, ipslinux, tomof On Wed, 13 Feb 2008 13:43:24 -0800 Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote: > We recently upgraded a production x86_64 machine with serveraid > cards to 2.6.24 and noted that /proc/scsi/scsi showed garbage for our > serveraid service processors. sg_inq also returned garbage from the > service processors' sg devices. After a few iterations I started seeing > meaninful stuff in the garbage. Not sure if it was returning live memory > or just unzero'd. Either way not good so we went back to a known good, > older kernel and tried to repro on a similar machine. We got different, > but still bad results in terms of pointing at memory badness. > > FWIW, the original machine had the following hardware: > scsi0 : IBM PCI ServeRAID 7.12.05 Build 761 <ServeRAID 4H> > scsi1 : IBM PCI ServeRAID 7.12.05 Build 761 <ServeRAID 4M> > and the repro's have been on a machine with just: > scsi0 : IBM PCI ServeRAID 7.12.05 Build 761 <ServeRAID 4Mx> > > On the repro machine I'm getting a hang on ips driver load with the following > logged: > > Feb 13 13:16:08 ipstest kernel: [ 915.236563] scsi3 : IBM PCI ServeRAID 7.12.05 Build 761 <ServeRAID 4Mx> > Feb 13 13:16:08 ipstest kernel: [ 915.236839] Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: > Feb 13 13:16:08 ipstest kernel: [ 915.236863] [check_addr+16/80] check_addr+0x10/0x50 > Feb 13 13:16:08 ipstest kernel: [ 915.237209] PGD 79fff067 PUD 7a898067 PMD 0 > Feb 13 13:16:08 ipstest kernel: [ 915.237341] Oops: 0000 [1] SMP > Feb 13 13:16:08 ipstest kernel: [ 915.237463] CPU 1 > Feb 13 13:16:08 ipstest kernel: [ 915.239436] Modules linked in: ips aic94xx > Feb 13 13:16:08 ipstest kernel: [ 915.239559] Pid: 5213, comm: scsi_scan_3 Not tainted 2.6.23-ips_as_module #3 > Feb 13 13:16:08 ipstest kernel: [ 915.239692] RIP: 0010:[check_addr+16/80] [check_addr+16/80] check_addr+0x10/0x50 > Feb 13 13:16:08 ipstest kernel: [ 915.239932] RSP: 0018:ffff810076d87900 EFLAGS: 00010082 > Feb 13 13:16:08 ipstest kernel: [ 915.240059] RAX: 0000000000000000 RBX: ffff81007b636300 RCX: 0000000000000024 > Feb 13 13:16:08 ipstest kernel: [ 915.240196] RDX: 000000007b636b00 RSI: ffffffff8077cde0 RDI: ffffffff806c4ed5 > Feb 13 13:16:08 ipstest kernel: [ 915.240332] RBP: ffff810076d87900 R08: 0000000000000500 R09: 0000000000000000 > Feb 13 13:16:08 ipstest kernel: [ 915.240468] R10: ffff81007aa33b40 R11: 0000000000000060 R12: 0000000000000000 > Feb 13 13:16:08 ipstest kernel: [ 915.240605] R13: 0000000000000001 R14: ffffffff8077cde0 R15: ffff81007aa33a80 > Feb 13 13:16:08 ipstest kernel: [ 915.240741] FS: 0000000000000000(0000) GS:ffff810001039300(0000) knlGS:0000000000000000 > Feb 13 13:16:08 ipstest kernel: [ 915.240981] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > Feb 13 13:16:08 ipstest kernel: [ 915.241111] CR2: 0000000000000000 CR3: 0000000078a98000 CR4: 00000000000006e0 > Feb 13 13:16:08 ipstest kernel: [ 915.241248] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > Feb 13 13:16:08 ipstest kernel: [ 915.241384] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Feb 13 13:16:08 ipstest kernel: [ 915.241520] Process scsi_scan_3 (pid: 5213, threadinfo ffff810076d86000, task ffff81007be26720) > Feb 13 13:16:08 ipstest kernel: [ 915.241761] Stack: ffff810076d87930 ffffffff802125c3 ffff81007aa33a80 ffff81007480cf50 > Feb 13 13:16:08 ipstest kernel: [ 915.242006] 0000000000000000 ffff81007ba38ca8 ffff810076d87940 ffffffff8046fb42 > Feb 13 13:16:08 ipstest kernel: [ 915.242248] ffff810076d879c0 ffffffff8801c2ee ffff81007aa33af0 000000017aa33af0 > Feb 13 13:16:08 ipstest kernel: [ 915.242389] Call Trace: > Feb 13 13:16:08 ipstest kernel: [ 915.242606] [nommu_map_sg+115/144] nommu_map_sg+0x73/0x90 > Feb 13 13:16:08 ipstest kernel: [ 915.242736] [scsi_dma_map+66/96] scsi_dma_map+0x42/0x60 > Feb 13 13:16:08 ipstest kernel: [ 915.242867] [_end+124884230/2127548952] :ips:ips_next+0x33e/0xc00 > Feb 13 13:16:08 ipstest kernel: [ 915.242986] [scsi_done+0/48] scsi_done+0x0/0x30 > Feb 13 13:16:08 ipstest kernel: [ 915.243114] [_end+124896894/2127548952] :ips:ips_queue+0x106/0x1f0 > Feb 13 13:16:08 ipstest kernel: [ 915.243240] [scsi_dispatch_cmd+498/784] scsi_dispatch_cmd+0x1f2/0x310 > Feb 13 13:16:08 ipstest kernel: [ 915.243370] [scsi_request_fn+491/976] scsi_request_fn+0x1eb/0x3d0 > Feb 13 13:16:08 ipstest kernel: [ 915.243500] [__generic_unplug_device+37/48] __generic_unplug_device+0x25/0x30 > Feb 13 13:16:08 ipstest kernel: [ 915.243630] [blk_execute_rq_nowait+99/176] blk_execute_rq_nowait+0x63/0xb0 > Feb 13 13:16:08 ipstest kernel: [ 915.243761] [blk_execute_rq+122/224] blk_execute_rq+0x7a/0xe0 > Feb 13 13:16:08 ipstest kernel: [ 915.243889] [scsi_execute+240/288] scsi_execute+0xf0/0x120 > Feb 13 13:16:08 ipstest kernel: [ 915.244016] [scsi_execute_req+134/240] scsi_execute_req+0x86/0xf0 > Feb 13 13:16:08 ipstest kernel: [ 915.244145] [scsi_probe_and_add_lun+594/3472] scsi_probe_and_add_lun+0x252/0xd90 > Feb 13 13:16:08 ipstest kernel: [ 915.244279] [sas_expander_match+27/160] sas_expander_match+0x1b/0xa0 > Feb 13 13:16:08 ipstest kernel: [ 915.244412] [get_device+23/32] get_device+0x17/0x20 > Feb 13 13:16:08 ipstest kernel: [ 915.244534] [__scsi_scan_target+220/1696] __scsi_scan_target+0xdc/0x6a0 > Feb 13 13:16:08 ipstest kernel: [ 915.244665] [enqueue_entity+172/432] enqueue_entity+0xac/0x1b0 > Feb 13 13:16:08 ipstest kernel: [ 915.244793] [update_curr_load+135/160] update_curr_load+0x87/0xa0 > Feb 13 13:16:08 ipstest kernel: [ 915.244923] [__check_preempt_curr_fair+107/128] __check_preempt_curr_fair+0x6b/0x80 > Feb 13 13:16:08 ipstest kernel: [ 915.245057] [update_curr+258/272] update_curr+0x102/0x110 > Feb 13 13:16:08 ipstest kernel: [ 915.245186] [scsi_scan_channel+139/160] scsi_scan_channel+0x8b/0xa0 > Feb 13 13:16:08 ipstest kernel: [ 915.245315] [scsi_scan_host_selected+158/352] scsi_scan_host_selected+0x9e/0x160 > Feb 13 13:16:08 ipstest kernel: [ 915.245447] [do_scan_async+0/320] do_scan_async+0x0/0x140 > Feb 13 13:16:08 ipstest kernel: [ 915.245574] [do_scsi_scan_host+126/128] do_scsi_scan_host+0x7e/0x80 > Feb 13 13:16:08 ipstest kernel: [ 915.245703] [do_scan_async+23/320] do_scan_async+0x17/0x140 > Feb 13 13:16:08 ipstest kernel: [ 915.245832] [do_scan_async+0/320] do_scan_async+0x0/0x140 > Feb 13 13:16:08 ipstest kernel: [ 915.245962] [kthread+77/128] kthread+0x4d/0x80 > Feb 13 13:16:08 ipstest kernel: [ 915.246086] [child_rip+10/18] child_rip+0xa/0x12 > Feb 13 13:16:08 ipstest kernel: [ 915.246209] [kthread+0/128] kthread+0x0/0x80 > Feb 13 13:16:08 ipstest kernel: [ 915.246333] [child_rip+0/18] child_rip+0x0/0x12 > Feb 13 13:16:08 ipstest kernel: [ 915.246457] > Feb 13 13:16:08 ipstest kernel: [ 915.246564] > Feb 13 13:16:08 ipstest kernel: [ 915.246565] Code: 4c 8b 00 48 8d 04 0a 4c 39 c0 76 2b b8 fe ff ff ff 31 f6 49 > Feb 13 13:16:08 ipstest kernel: [ 915.246933] RIP [check_addr+16/80] check_addr+0x10/0x50 > Feb 13 13:16:08 ipstest kernel: [ 915.247062] RSP <ffff810076d87900> > Feb 13 13:16:08 ipstest kernel: [ 915.247181] CR2: 0000000000000000 > > I was able to narrow it down in as much as with this reverted the machine > seems to run fine: > commit 2f4cf91cc0a1f32f75e1fa0a4d70a9bc7340a302 > [SCSI] ips: convert to use the data buffer accessors > > Nothing looks overly suspicious in that patch per se, although based > on the list archives it looks like related changes caused other drivers > grief. I've tried a variety of things to get a little more debug info, > but to no avail. If anybody has any suggestions, I'd appreciate them! Really sorry about the bug. I have a slight doubt on the breakup code though I'm not sure you hit the code. Reverting only the breakup part works? The patch is against 2.6.24. diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c index 5c5a9b2..acabb19 100644 --- a/drivers/scsi/ips.c +++ b/drivers/scsi/ips.c @@ -3251,34 +3251,52 @@ ips_done(ips_ha_t * ha, ips_scb_t * scb) * the rest of the data and continue. */ if ((scb->breakup) || (scb->sg_break)) { - struct scatterlist *sg; - int i, sg_dma_index, ips_sg_index = 0; - /* we had a data breakup */ scb->data_len = 0; - sg = scsi_sglist(scb->scsi_cmd); - - /* Spin forward to last dma chunk */ - sg_dma_index = scb->breakup; - for (i = 0; i < scb->breakup; i++) - sg = sg_next(sg); - - /* Take care of possible partial on last chunk */ - ips_fill_scb_sg_single(ha, - sg_dma_address(sg), - scb, ips_sg_index++, - sg_dma_len(sg)); - - for (; sg_dma_index < scsi_sg_count(scb->scsi_cmd); - sg_dma_index++, sg = sg_next(sg)) { - if (ips_fill_scb_sg_single - (ha, - sg_dma_address(sg), - scb, ips_sg_index++, - sg_dma_len(sg)) < 0) - break; - } + if (scb->sg_count) { + /* S/G request */ + struct scatterlist *sg; + int ips_sg_index = 0; + int sg_dma_index; + + sg = scb->scsi_cmd->request_buffer; + + /* Spin forward to last dma chunk */ + sg_dma_index = scb->breakup; + + /* Take care of possible partial on last chunk */ + ips_fill_scb_sg_single(ha, + sg_dma_address(&sg + [sg_dma_index]), + scb, ips_sg_index++, + sg_dma_len(&sg + [sg_dma_index])); + + for (; sg_dma_index < scb->sg_count; + sg_dma_index++) { + if (ips_fill_scb_sg_single + (ha, + sg_dma_address(&sg[sg_dma_index]), + scb, ips_sg_index++, + sg_dma_len(&sg[sg_dma_index])) < 0) + break; + + } + + } else { + /* Non S/G Request */ + (void) ips_fill_scb_sg_single(ha, + scb-> + data_busaddr + + (scb->sg_break * + ha->max_xfer), + scb, 0, + scb->scsi_cmd-> + request_bufflen - + (scb->sg_break * + ha->max_xfer)); + } scb->dcdb.transfer_length = scb->data_len; scb->dcdb.cmd_attribute |= ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-14 11:48 ` FUJITA Tomonori @ 2008-02-14 23:55 ` Tim Pepper 2008-02-15 0:13 ` FUJITA Tomonori 0 siblings, 1 reply; 26+ messages in thread From: Tim Pepper @ 2008-02-14 23:55 UTC (permalink / raw) To: FUJITA Tomonori; +Cc: linux-scsi, Mark_Salyzyn, tomof On Thu 14 Feb at 20:48:38 +0900 fujita.tomonori@lab.ntt.co.jp said: > > I have a slight doubt on the breakup code though I'm not sure you hit > the code. Reverting only the breakup part works? The patch is against > 2.6.24. I've tested this revert you posted. Essentially the same trace is logged (see below). The machine doesn't die though. But /proc/scsi/scsi doesn't show the hardware and I am getting an additional trace dumped now (see futher below). Feb 14 16:04:13 ipstest kernel: [ 1046.531180] ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 18 (level, low) -> IRQ 18 Feb 14 16:04:13 ipstest kernel: [ 1046.553420] scsi3 : IBM PCI ServeRAID 7.12.05 Build 761 <ServeRAID 4Mx> Feb 14 16:04:13 ipstest kernel: [ 1046.554103] PGD 7cc1f067 PUD 7c4f6067 PMD 0 Feb 14 16:04:13 ipstest kernel: [ 1046.554348] CPU 0 Feb 14 16:04:13 ipstest kernel: [ 1046.554460] Modules linked in: ips aic94xx Feb 14 16:04:13 ipstest kernel: [ 1046.554587] Pid: 4476, comm: scsi_scan_3 Not tainted 2.6.24-fujita-revert-breakup #1 Feb 14 16:04:13 ipstest kernel: [ 1046.554827] RIP: 0010:[check_addr+16/80] [check_addr+16/80] check_addr+0x10/0x50 Feb 14 16:04:13 ipstest kernel: [ 1046.555079] RSP: 0018:ffff81007b0f78f0 EFLAGS: 00010082 Feb 14 16:04:13 ipstest kernel: [ 1046.555208] RAX: 0000000000000000 RBX: ffff81007b92a300 RCX: 0000000000000024 Feb 14 16:04:13 ipstest kernel: [ 1046.555350] RDX: 000000007b92a600 RSI: ffffffff807a7340 RDI: ffffffff806e7f43 Feb 14 16:04:13 ipstest kernel: [ 1046.555494] RBP: ffff81007b0f78f0 R08: 0000000000000500 R09: 0000000000000000 Feb 14 16:04:13 ipstest kernel: [ 1046.555632] R10: ffff81007c463b38 R11: 0000000000000060 R12: 0000000000000000 Feb 14 16:04:13 ipstest kernel: [ 1046.555784] R13: 0000000000000001 R14: ffffffff807a7340 R15: ffff81007c463a80 Feb 14 16:04:13 ipstest kernel: [ 1046.555924] FS: 0000000000000000(0000) GS:ffffffff807d6000(0000) knlGS:0000000000000000 Feb 14 16:04:13 ipstest kernel: [ 1046.556173] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Feb 14 16:04:13 ipstest kernel: [ 1046.556306] CR2: 0000000000000000 CR3: 000000007cd02000 CR4: 00000000000006e0 Feb 14 16:04:13 ipstest kernel: [ 1046.556443] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Feb 14 16:04:13 ipstest kernel: [ 1046.556582] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Feb 14 16:04:13 ipstest kernel: [ 1046.556731] Process scsi_scan_3 (pid: 4476, threadinfo ffff81007b0f6000, task ffff81007b9295c0) Feb 14 16:04:13 ipstest kernel: [ 1046.556972] Stack: ffff81007b0f7920 ffffffff80213118 ffff81007c463a80 ffff81007b0fcf50 Feb 14 16:04:13 ipstest kernel: [ 1046.557221] 0000000000000000 ffff81007cd14450 ffff81007b0f7930 ffffffff8047a4c2 Feb 14 16:04:13 ipstest kernel: [ 1046.557463] ffff81007b0f79b0 ffffffff8801c28e ffff81007c463ae8 000000017c463ae8 Feb 14 16:04:13 ipstest kernel: [ 1046.557603] Call Trace: Feb 14 16:04:13 ipstest kernel: [ 1046.557830] [nommu_map_sg+104/176] nommu_map_sg+0x68/0xb0 Feb 14 16:04:13 ipstest kernel: [ 1046.557949] [scsi_dma_map+66/96] scsi_dma_map+0x42/0x60 Feb 14 16:04:13 ipstest kernel: [ 1046.558083] [_end+124740294/2127405112] :ips:ips_next+0x33e/0xc10 Feb 14 16:04:13 ipstest kernel: [ 1046.558205] [scsi_done+0/48] scsi_done+0x0/0x30 Feb 14 16:04:13 ipstest kernel: [ 1046.558332] [_end+124752974/2127405112] :ips:ips_queue+0x106/0x1f0 Feb 14 16:04:13 ipstest kernel: [ 1046.558459] [scsi_dispatch_cmd+453/736] scsi_dispatch_cmd+0x1c5/0x2e0 Feb 14 16:04:13 ipstest kernel: [ 1046.558589] [scsi_request_fn+491/976] scsi_request_fn+0x1eb/0x3d0 Feb 14 16:04:13 ipstest kernel: [ 1046.558725] [__generic_unplug_device+37/48] __generic_unplug_device+0x25/0x30 Feb 14 16:04:13 ipstest kernel: [ 1046.558851] [blk_execute_rq_nowait+99/176] blk_execute_rq_nowait+0x63/0xb0 Feb 14 16:04:13 ipstest kernel: [ 1046.558982] [blk_execute_rq+122/224] blk_execute_rq+0x7a/0xe0 Feb 14 16:04:13 ipstest kernel: [ 1046.559110] [scsi_execute+240/288] scsi_execute+0xf0/0x120 Feb 14 16:04:13 ipstest kernel: [ 1046.559237] [scsi_execute_req+134/240] scsi_execute_req+0x86/0xf0 Feb 14 16:04:13 ipstest kernel: [ 1046.559366] [scsi_probe_and_add_lun+594/3472] scsi_probe_and_add_lun+0x252/0xd90 Feb 14 16:04:13 ipstest kernel: [ 1046.559501] [sas_expander_match+27/160] sas_expander_match+0x1b/0xa0 Feb 14 16:04:13 ipstest kernel: [ 1046.559633] [get_device+23/32] get_device+0x17/0x20 Feb 14 16:04:13 ipstest kernel: [ 1046.559755] [__scsi_scan_target+220/1680] __scsi_scan_target+0xdc/0x690 Feb 14 16:04:13 ipstest kernel: [ 1046.559887] [enqueue_task_fair+32/64] enqueue_task_fair+0x20/0x40 Feb 14 16:04:13 ipstest kernel: [ 1046.560019] [enqueue_task+19/48] enqueue_task+0x13/0x30 Feb 14 16:04:13 ipstest kernel: [ 1046.560141] [update_curr+101/192] update_curr+0x65/0xc0 Feb 14 16:04:13 ipstest kernel: [ 1046.560268] [scsi_scan_channel+139/160] scsi_scan_channel+0x8b/0xa0 Feb 14 16:04:13 ipstest kernel: [ 1046.560397] [scsi_scan_host_selected+158/352] scsi_scan_host_selected+0x9e/0x160 Feb 14 16:04:13 ipstest kernel: [ 1046.560532] [thread_return+61/1307] thread_return+0x3d/0x51b Feb 14 16:04:13 ipstest kernel: [ 1046.560657] [do_scan_async+0/384] do_scan_async+0x0/0x180 Feb 14 16:04:13 ipstest kernel: [ 1046.560784] [do_scsi_scan_host+126/128] do_scsi_scan_host+0x7e/0x80 Feb 14 16:04:13 ipstest kernel: [ 1046.560913] [do_scan_async+29/384] do_scan_async+0x1d/0x180 Feb 14 16:04:13 ipstest kernel: [ 1046.561041] [do_scan_async+0/384] do_scan_async+0x0/0x180 Feb 14 16:04:13 ipstest kernel: [ 1046.561172] [kthread+77/128] kthread+0x4d/0x80 Feb 14 16:04:13 ipstest kernel: [ 1046.561294] [child_rip+10/18] child_rip+0xa/0x12 Feb 14 16:04:13 ipstest kernel: [ 1046.561417] [kthread+0/128] kthread+0x0/0x80 Feb 14 16:04:13 ipstest kernel: [ 1046.561542] [child_rip+0/18] child_rip+0x0/0x12 Feb 14 16:04:13 ipstest kernel: [ 1046.561666] Feb 14 16:04:13 ipstest kernel: [ 1046.561772] Feb 14 16:04:13 ipstest kernel: [ 1046.561773] Code: 4c 8b 00 48 8d 04 0a 4c 39 c0 76 2b b8 fe ff ff ff 31 f6 49 Feb 14 16:04:13 ipstest kernel: [ 1046.562268] RSP <ffff81007b0f78f0> Feb 14 16:04:13 ipstest kernel: [ 1046.562853] ---[ end trace 3d5dd58222a8e80a ]--- Feb 14 16:04:28 ipstest kernel: [ 1061.857443] CPU 0: Feb 14 16:04:28 ipstest kernel: [ 1061.857673] Modules linked in: ips aic94xx Feb 14 16:04:28 ipstest kernel: [ 1061.857996] Pid: 0, comm: swapper Tainted: G D 2.6.24-fujita-revert-breakup #1 Feb 14 16:04:28 ipstest kernel: [ 1061.858291] RIP: 0010:[_spin_lock_irqsave+22/48] [_spin_lock_irqsave+22/48] _spin_lock_irqsave+0x16/0x30 Feb 14 16:04:28 ipstest kernel: [ 1061.858647] RSP: 0018:ffffffff8087fe60 EFLAGS: 00000286 Feb 14 16:04:28 ipstest kernel: [ 1061.858825] RAX: 0000000000000282 RBX: ffffffff8087fe60 RCX: ffff81007c463ae8 Feb 14 16:04:28 ipstest kernel: [ 1061.859012] RDX: ffff81007cd14000 RSI: 0000000000000001 RDI: ffff81007cd14050 Feb 14 16:04:28 ipstest kernel: [ 1061.859198] RBP: ffffffff8087fde0 R08: 000000000213c81f R09: 0000000000000010 Feb 14 16:04:28 ipstest kernel: [ 1061.859385] R10: 0000000000000005 R11: 0000000000000246 R12: ffffffff8020ced6 Feb 14 16:04:28 ipstest kernel: [ 1061.859572] R13: ffffffff8087fde0 R14: ffff81007c463a80 R15: ffff81007cd14000 Feb 14 16:04:28 ipstest kernel: [ 1061.859763] FS: 0000000000000000(0000) GS:ffffffff807d6000(0000) knlGS:0000000000000000 Feb 14 16:04:28 ipstest kernel: [ 1061.860058] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Feb 14 16:04:28 ipstest kernel: [ 1061.860240] CR2: 0000000000611ec0 CR3: 000000007c466000 CR4: 00000000000006e0 Feb 14 16:04:28 ipstest kernel: [ 1061.860425] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Feb 14 16:04:28 ipstest kernel: [ 1061.860613] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Feb 14 16:04:28 ipstest kernel: [ 1061.860821] Feb 14 16:04:28 ipstest kernel: [ 1061.860822] Call Trace: Feb 14 16:04:28 ipstest kernel: [ 1061.861141] <IRQ> [scsi_eh_scmd_add+62/208] scsi_eh_scmd_add+0x3e/0xd0 Feb 14 16:04:28 ipstest kernel: [ 1061.861390] [scsi_times_out+0/192] scsi_times_out+0x0/0xc0 Feb 14 16:04:28 ipstest kernel: [ 1061.861573] [scsi_times_out+79/192] scsi_times_out+0x4f/0xc0 Feb 14 16:04:28 ipstest kernel: [ 1061.861758] [run_timer_softirq+359/464] run_timer_softirq+0x167/0x1d0 Feb 14 16:04:28 ipstest kernel: [ 1061.861942] [__do_softirq+116/240] __do_softirq+0x74/0xf0 Feb 14 16:04:28 ipstest kernel: [ 1061.862129] [profile_tick+94/160] profile_tick+0x5e/0xa0 Feb 14 16:04:28 ipstest kernel: [ 1061.862313] [call_softirq+28/48] call_softirq+0x1c/0x30 Feb 14 16:04:28 ipstest kernel: [ 1061.862495] [do_softirq+61/144] do_softirq+0x3d/0x90 Feb 14 16:04:28 ipstest kernel: [ 1061.862676] [irq_exit+69/80] irq_exit+0x45/0x50 Feb 14 16:04:28 ipstest kernel: [ 1061.862857] [smp_apic_timer_interrupt+74/112] smp_apic_timer_interrupt+0x4a/0x70 Feb 14 16:04:28 ipstest kernel: [ 1061.863041] [mwait_idle+0/80] mwait_idle+0x0/0x50 Feb 14 16:04:28 ipstest kernel: [ 1061.863223] [apic_timer_interrupt+102/112] apic_timer_interrupt+0x66/0x70 Feb 14 16:04:28 ipstest kernel: [ 1061.863403] <EOI> [mwait_idle+70/80] mwait_idle+0x46/0x50 Feb 14 16:04:28 ipstest kernel: [ 1061.863677] [enter_idle+34/48] enter_idle+0x22/0x30 Feb 14 16:04:28 ipstest kernel: [ 1061.863857] [cpu_idle+115/160] cpu_idle+0x73/0xa0 Feb 14 16:04:28 ipstest kernel: [ 1061.864039] [rest_init+90/96] rest_init+0x5a/0x60 Feb 14 16:04:28 ipstest kernel: [ 1061.864225] [start_kernel+618/752] start_kernel+0x26a/0x2f0 Feb 14 16:04:28 ipstest kernel: [ 1061.864406] [x86_64_start_kernel+298/320] _sinittext+0x12a/0x140 Feb 14 16:04:28 ipstest kernel: [ 1061.864585] That second trace then is repeatedly dumping every 12seconds without end. -- Tim Pepper <lnxninja@linux.vnet.ibm.com> IBM Linux Technology Center ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-14 23:55 ` Tim Pepper @ 2008-02-15 0:13 ` FUJITA Tomonori 2008-02-15 1:16 ` Tim Pepper 0 siblings, 1 reply; 26+ messages in thread From: FUJITA Tomonori @ 2008-02-15 0:13 UTC (permalink / raw) To: lnxninja; +Cc: fujita.tomonori, linux-scsi, Mark_Salyzyn, tomof On Thu, 14 Feb 2008 15:55:49 -0800 Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote: > On Thu 14 Feb at 20:48:38 +0900 fujita.tomonori@lab.ntt.co.jp said: > > > > I have a slight doubt on the breakup code though I'm not sure you hit > > the code. Reverting only the breakup part works? The patch is against > > 2.6.24. > > I've tested this revert you posted. Essentially the same trace is logged > (see below). The machine doesn't die though. But /proc/scsi/scsi doesn't > show the hardware and I am getting an additional trace dumped now (see > futher below). Thanks. So we surely have a bug in the non-breakup part. I've just found one bug. Can you try this patch against 2.6.24? Thanks, diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c index 5c5a9b2..d8f5eb5 100644 --- a/drivers/scsi/ips.c +++ b/drivers/scsi/ips.c @@ -1575,12 +1575,12 @@ ips_make_passthru(ips_ha_t *ha, struct scsi_cmnd *SC, ips_scb_t *scb, int intr) ips_passthru_t *pt; int length = 0; int i, ret; - struct scatterlist *sg = scsi_sglist(SC); + struct scatterlist *sg; METHOD_TRACE("ips_make_passthru", 1); scsi_for_each_sg(SC, sg, scsi_sg_count(SC), i) - length += sg[i].length; + length += sg->length; if (length < sizeof (ips_passthru_t)) { /* wrong size */ ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-15 0:13 ` FUJITA Tomonori @ 2008-02-15 1:16 ` Tim Pepper 2008-02-15 16:09 ` FUJITA Tomonori 0 siblings, 1 reply; 26+ messages in thread From: Tim Pepper @ 2008-02-15 1:16 UTC (permalink / raw) To: FUJITA Tomonori; +Cc: linux-scsi, Mark_Salyzyn, tomof On Fri 15 Feb at 09:13:16 +0900 fujita.tomonori@lab.ntt.co.jp said: > > Thanks. So we surely have a bug in the non-breakup part. > > I've just found one bug. Can you try this patch against 2.6.24? Tested and unfortunately no change. Behaves same as the breakup-revert patch. -- Tim Pepper <lnxninja@linux.vnet.ibm.com> IBM Linux Technology Center ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-15 1:16 ` Tim Pepper @ 2008-02-15 16:09 ` FUJITA Tomonori 2008-02-15 22:50 ` Tim Pepper 0 siblings, 1 reply; 26+ messages in thread From: FUJITA Tomonori @ 2008-02-15 16:09 UTC (permalink / raw) To: lnxninja; +Cc: fujita.tomonori, linux-scsi, Mark_Salyzyn, tomof On Thu, 14 Feb 2008 17:16:36 -0800 Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote: > On Fri 15 Feb at 09:13:16 +0900 fujita.tomonori@lab.ntt.co.jp said: > > > > Thanks. So we surely have a bug in the non-breakup part. > > > > I've just found one bug. Can you try this patch against 2.6.24? > > Tested and unfortunately no change. Behaves same as the breakup-revert patch. Thanks and sorry about that. Now I don't have any idea. I split the patch. Can you please apply them in a step-by-step way and tell me which patch breaks the driver? There are nine patches against 2.6.24: http://www.kernel.org/pub/linux/kernel/people/tomo/ips/ The first one is just reverting the data buffer accessors conversion. It would be nice if we could just revert it but we can't. These changes are necessary to compile the driver against post 2.6.24. Really sorry about the troubles, ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-15 16:09 ` FUJITA Tomonori @ 2008-02-15 22:50 ` Tim Pepper 2008-02-16 0:41 ` FUJITA Tomonori 0 siblings, 1 reply; 26+ messages in thread From: Tim Pepper @ 2008-02-15 22:50 UTC (permalink / raw) To: FUJITA Tomonori; +Cc: fujita.tomonori, linux-scsi, Mark_Salyzyn On Sat 16 Feb at 01:09:43 +0900 tomof@acm.org said: > > The first one is just reverting the data buffer accessors > conversion. It would be nice if we could just revert it but we > can't. These changes are necessary to compile the driver against post > 2.6.24. Fujita-san, Unfortunately (and not too surprisingly given what we've tried so far) with only the first of your series reverted the driver is working fine for me again. I saw (eg: replies to http://lkml.org/lkml/2007/5/11/132) some possibly similar sounding issues with other drivers. Could there be some memory uninitialised? I did try changing all the ips.c kmalloc's to kzalloc's, but that didn't help. Also that thread ties into pci gart. The machines we've been using are liable to getting pci calgary although given my .config has: CONFIG_GART_IOMMU=y CONFIG_CALGARY_IOMMU=y # CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT is not set and that when booting this mainline I don't see any Calgary related messages like I get from eg: Ubuntu's 2.6.22-14-server...I'm probably not actually running the calgary iommu code in these repros. Anyway, I greatly appreciate your efforts so far in trying to find what could be wrong here! -- Tim Pepper <lnxninja@linux.vnet.ibm.com> IBM Linux Technology Center ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-15 22:50 ` Tim Pepper @ 2008-02-16 0:41 ` FUJITA Tomonori 2008-02-17 12:52 ` Boaz Harrosh 2008-02-17 21:15 ` Tim Pepper 0 siblings, 2 replies; 26+ messages in thread From: FUJITA Tomonori @ 2008-02-16 0:41 UTC (permalink / raw) To: lnxninja; +Cc: tomof, fujita.tomonori, linux-scsi, Mark_Salyzyn On Fri, 15 Feb 2008 14:50:57 -0800 Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote: > On Sat 16 Feb at 01:09:43 +0900 tomof@acm.org said: > > > > The first one is just reverting the data buffer accessors > > conversion. It would be nice if we could just revert it but we > > can't. These changes are necessary to compile the driver against post > > 2.6.24. > > Fujita-san, > > Unfortunately (and not too surprisingly given what we've tried so far) with > only the first of your series reverted the driver is working fine for me > again. Do you mean that you applied only the following two patches against 2.6.24, and then it doesn't work? 0001-ips-revert-the-changes-for-the-data-buffer-accessor.patch 0002-ips-kill-the-map_single-path-in-ips_scmd_buf_write.patch If so, the second patch is broken. Did you saw BUG_ON message (I added some BUG_ON to the patch)? > I saw (eg: replies to http://lkml.org/lkml/2007/5/11/132) some possibly > similar sounding issues with other drivers. Could there be some memory > uninitialised? I did try changing all the ips.c kmalloc's to kzalloc's, > but that didn't help. Also that thread ties into pci gart. The machines > we've been using are liable to getting pci calgary although given my > .config has: > CONFIG_GART_IOMMU=y > CONFIG_CALGARY_IOMMU=y > # CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT is not set > and that when booting this mainline I don't see any Calgary related > messages like I get from eg: Ubuntu's 2.6.22-14-server...I'm probably not > actually running the calgary iommu code in these repros. Yes, probabaly, your machine doesn't use any IOMMU hardware (nommu_map_sg function was in your crash log). > Anyway, I greatly appreciate your efforts so far in trying to find what > could be wrong here! Really sorry about the troubles and thanks for testing. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-16 0:41 ` FUJITA Tomonori @ 2008-02-17 12:52 ` Boaz Harrosh 2008-02-17 13:09 ` Boaz Harrosh 2008-02-17 21:15 ` Tim Pepper 1 sibling, 1 reply; 26+ messages in thread From: Boaz Harrosh @ 2008-02-17 12:52 UTC (permalink / raw) To: FUJITA Tomonori; +Cc: lnxninja, fujita.tomonori, linux-scsi, Mark_Salyzyn On Sat, Feb 16 2008 at 2:41 +0200, FUJITA Tomonori <tomof@acm.org> wrote: > On Fri, 15 Feb 2008 14:50:57 -0800 > Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote: > >> On Sat 16 Feb at 01:09:43 +0900 tomof@acm.org said: >>> The first one is just reverting the data buffer accessors >>> conversion. It would be nice if we could just revert it but we >>> can't. These changes are necessary to compile the driver against post >>> 2.6.24. >> Fujita-san, >> >> Unfortunately (and not too surprisingly given what we've tried so far) with >> only the first of your series reverted the driver is working fine for me >> again. > > Do you mean that you applied only the following two patches against > 2.6.24, and then it doesn't work? > > 0001-ips-revert-the-changes-for-the-data-buffer-accessor.patch > 0002-ips-kill-the-map_single-path-in-ips_scmd_buf_write.patch > > If so, the second patch is broken. Did you saw BUG_ON message (I added > some BUG_ON to the patch)? > > >> I saw (eg: replies to http://lkml.org/lkml/2007/5/11/132) some possibly >> similar sounding issues with other drivers. Could there be some memory >> uninitialised? I did try changing all the ips.c kmalloc's to kzalloc's, >> but that didn't help. Also that thread ties into pci gart. The machines >> we've been using are liable to getting pci calgary although given my >> .config has: >> CONFIG_GART_IOMMU=y >> CONFIG_CALGARY_IOMMU=y >> # CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT is not set >> and that when booting this mainline I don't see any Calgary related >> messages like I get from eg: Ubuntu's 2.6.22-14-server...I'm probably not >> actually running the calgary iommu code in these repros. > > Yes, probabaly, your machine doesn't use any IOMMU hardware > (nommu_map_sg function was in your crash log). > > >> Anyway, I greatly appreciate your efforts so far in trying to find what >> could be wrong here! > > Really sorry about the troubles and thanks for testing. > - Tomo hi It looks like the same bug we had with USB's isd200 and protocol.c. An overflow of a data buffer bigger then then the sglist. There 2 it was in the INQUIRY command. You just need to also check for sg != NULL in the for() loop. Tim Please test below patch. It's ontop of 2.6.24 but should also apply to 2.6.25-rcx Boaz -- >From ec20bea25c9fe2400378b19c128b15fef3c7cbb6 Mon Sep 17 00:00:00 2001 From: Boaz Harrosh <bharrosh@panasas.com> Date: Sun, 17 Feb 2008 14:50:25 +0200 Subject: [PATCH] ips: Avoid overflow in writing scsi command data Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> --- drivers/scsi/ips.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c index 5c5a9b2..1d12253 100644 --- a/drivers/scsi/ips.c +++ b/drivers/scsi/ips.c @@ -3517,7 +3517,7 @@ ips_scmd_buf_write(struct scsi_cmnd *scmd, void *data, unsigned int count) struct scatterlist *sg = scsi_sglist(scmd); for (i = 0, xfer_cnt = 0; - (i < scsi_sg_count(scmd)) && (xfer_cnt < count); i++) { + (i < scsi_sg_count(scmd)) && (xfer_cnt < count) && sg; i++) { min_cnt = min(count - xfer_cnt, sg[i].length); /* kmap_atomic() ensures addressability of the data buffer.*/ -- 1.5.3.3 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-17 12:52 ` Boaz Harrosh @ 2008-02-17 13:09 ` Boaz Harrosh 2008-02-17 21:16 ` Tim Pepper 0 siblings, 1 reply; 26+ messages in thread From: Boaz Harrosh @ 2008-02-17 13:09 UTC (permalink / raw) To: FUJITA Tomonori; +Cc: lnxninja, fujita.tomonori, linux-scsi, Mark_Salyzyn On Sun, Feb 17 2008 at 14:52 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote: > On Sat, Feb 16 2008 at 2:41 +0200, FUJITA Tomonori <tomof@acm.org> wrote: >> On Fri, 15 Feb 2008 14:50:57 -0800 >> Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote: >> >>> On Sat 16 Feb at 01:09:43 +0900 tomof@acm.org said: >>>> The first one is just reverting the data buffer accessors >>>> conversion. It would be nice if we could just revert it but we >>>> can't. These changes are necessary to compile the driver against post >>>> 2.6.24. >>> Fujita-san, >>> >>> Unfortunately (and not too surprisingly given what we've tried so far) with >>> only the first of your series reverted the driver is working fine for me >>> again. >> Do you mean that you applied only the following two patches against >> 2.6.24, and then it doesn't work? >> >> 0001-ips-revert-the-changes-for-the-data-buffer-accessor.patch >> 0002-ips-kill-the-map_single-path-in-ips_scmd_buf_write.patch >> >> If so, the second patch is broken. Did you saw BUG_ON message (I added >> some BUG_ON to the patch)? >> >> >>> I saw (eg: replies to http://lkml.org/lkml/2007/5/11/132) some possibly >>> similar sounding issues with other drivers. Could there be some memory >>> uninitialised? I did try changing all the ips.c kmalloc's to kzalloc's, >>> but that didn't help. Also that thread ties into pci gart. The machines >>> we've been using are liable to getting pci calgary although given my >>> .config has: >>> CONFIG_GART_IOMMU=y >>> CONFIG_CALGARY_IOMMU=y >>> # CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT is not set >>> and that when booting this mainline I don't see any Calgary related >>> messages like I get from eg: Ubuntu's 2.6.22-14-server...I'm probably not >>> actually running the calgary iommu code in these repros. >> Yes, probabaly, your machine doesn't use any IOMMU hardware >> (nommu_map_sg function was in your crash log). >> >> >>> Anyway, I greatly appreciate your efforts so far in trying to find what >>> could be wrong here! >> Really sorry about the troubles and thanks for testing. >> - > Tomo hi > It looks like the same bug we had with USB's isd200 and protocol.c. An overflow > of a data buffer bigger then then the sglist. There 2 it was in the INQUIRY command. > > You just need to also check for sg != NULL in the for() loop. > > Tim Please test below patch. It's ontop of 2.6.24 but should also apply to > 2.6.25-rcx > > Boaz > -- > From ec20bea25c9fe2400378b19c128b15fef3c7cbb6 Mon Sep 17 00:00:00 2001 > From: Boaz Harrosh <bharrosh@panasas.com> > Date: Sun, 17 Feb 2008 14:50:25 +0200 > Subject: [PATCH] ips: Avoid overflow in writing scsi command data > > Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> > --- > drivers/scsi/ips.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c > index 5c5a9b2..1d12253 100644 > --- a/drivers/scsi/ips.c > +++ b/drivers/scsi/ips.c > @@ -3517,7 +3517,7 @@ ips_scmd_buf_write(struct scsi_cmnd *scmd, void *data, unsigned int count) Disregard that patch it is totally wrong. That's not it. Sorry Boaz ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-17 13:09 ` Boaz Harrosh @ 2008-02-17 21:16 ` Tim Pepper 0 siblings, 0 replies; 26+ messages in thread From: Tim Pepper @ 2008-02-17 21:16 UTC (permalink / raw) To: Boaz Harrosh Cc: FUJITA Tomonori, lnxninja, fujita.tomonori, linux-scsi, Mark_Salyzyn On Sun 17 Feb at 15:09:06 +0200 bharrosh@panasas.com said: > > > > You just need to also check for sg != NULL in the for() loop. > > > > Tim Please test below patch. It's ontop of 2.6.24 but should also apply to > > 2.6.25-rcx > Disregard that patch it is totally wrong. That's not it. Sorry > > Boaz > You sure? I was about to give it a quick run...but I guess I wont then. :) -- Tim Pepper <lnxninja@linux.vnet.ibm.com> IBM Linux Technology Center ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-16 0:41 ` FUJITA Tomonori 2008-02-17 12:52 ` Boaz Harrosh @ 2008-02-17 21:15 ` Tim Pepper 2008-02-17 22:31 ` FUJITA Tomonori 1 sibling, 1 reply; 26+ messages in thread From: Tim Pepper @ 2008-02-17 21:15 UTC (permalink / raw) To: FUJITA Tomonori; +Cc: fujita.tomonori, linux-scsi, Mark_Salyzyn On Sat 16 Feb at 09:41:48 +0900 tomof@acm.org said: > > Do you mean that you applied only the following two patches against > 2.6.24, and then it doesn't work? > > 0001-ips-revert-the-changes-for-the-data-buffer-accessor.patch I only applies the 0001 patch and the driver appeared to function fine. > If so, the second patch is broken. Did you saw BUG_ON message (I added > some BUG_ON to the patch)? I did not apply the 0002 patch. -- Tim Pepper <lnxninja@linux.vnet.ibm.com> IBM Linux Technology Center ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-17 21:15 ` Tim Pepper @ 2008-02-17 22:31 ` FUJITA Tomonori 2008-02-17 23:29 ` Tim Pepper 2008-02-17 23:37 ` Tim Pepper 0 siblings, 2 replies; 26+ messages in thread From: FUJITA Tomonori @ 2008-02-17 22:31 UTC (permalink / raw) To: lnxninja; +Cc: tomof, fujita.tomonori, linux-scsi, Mark_Salyzyn On Sun, 17 Feb 2008 13:15:40 -0800 Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote: > On Sat 16 Feb at 09:41:48 +0900 tomof@acm.org said: > > > > Do you mean that you applied only the following two patches against > > 2.6.24, and then it doesn't work? > > > > 0001-ips-revert-the-changes-for-the-data-buffer-accessor.patch > > I only applies the 0001 patch and the driver appeared to function fine. I see, thanks. It would be nice if we can push only the 0001 patch to mainline, however as I explained, we can't unfortunately. The 0002-0009 patches are necessary for post 2.6.24. > > If so, the second patch is broken. Did you saw BUG_ON message (I added > > some BUG_ON to the patch)? > > I did not apply the 0002 patch. Can you apply the 0001 and 0002 against 2.6.24 and see how it works? If it works well, then please apply the 0001, 0002 and 0003. Please the patches in a step-by-step way and tell me which patch breaks the driver. Thanks, ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-17 22:31 ` FUJITA Tomonori @ 2008-02-17 23:29 ` Tim Pepper 2008-02-17 23:37 ` Tim Pepper 1 sibling, 0 replies; 26+ messages in thread From: Tim Pepper @ 2008-02-17 23:29 UTC (permalink / raw) To: FUJITA Tomonori; +Cc: fujita.tomonori, linux-scsi, Mark_Salyzyn On Mon 18 Feb at 07:31:56 +0900 tomof@acm.org said: > > Can you apply the 0001 and 0002 against 2.6.24 and see how it works? > If it works well, then please apply the 0001, 0002 and 0003. > > Please the patches in a step-by-step way and tell me which patch > breaks the driver. I wasn't thinking Friday...we already expected 0001 to fix it, so the hope is one of the others breaks it and that is useful info. I'll apply and test the rest of the series. Sorry! -- Tim Pepper <lnxninja@linux.vnet.ibm.com> IBM Linux Technology Center ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-17 22:31 ` FUJITA Tomonori 2008-02-17 23:29 ` Tim Pepper @ 2008-02-17 23:37 ` Tim Pepper 2008-02-18 13:32 ` FUJITA Tomonori 1 sibling, 1 reply; 26+ messages in thread From: Tim Pepper @ 2008-02-17 23:37 UTC (permalink / raw) To: FUJITA Tomonori; +Cc: fujita.tomonori, linux-scsi, Mark_Salyzyn On Mon 19 Feb at 07:31:56 +0900 tomof@acm.org said: > > Can you apply the 0001 and 0002 against 2.6.24 and see how it works? > If it works well, then please apply the 0001, 0002 and 0003. Fujita-san, I've started through the patches in order, cumulatively and after applying 0005 things break. I wont be able to test anything else until tomorrow when I can phycisally reset the machine... -- Tim Pepper <lnxninja@linux.vnet.ibm.com> IBM Linux Technology Center ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-17 23:37 ` Tim Pepper @ 2008-02-18 13:32 ` FUJITA Tomonori 2008-02-18 23:30 ` Tim Pepper 2008-02-19 8:02 ` FUJITA Tomonori 0 siblings, 2 replies; 26+ messages in thread From: FUJITA Tomonori @ 2008-02-18 13:32 UTC (permalink / raw) To: lnxninja; +Cc: tomof, fujita.tomonori, linux-scsi, Mark_Salyzyn On Sun, 17 Feb 2008 15:37:02 -0800 Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote: > On Mon 19 Feb at 07:31:56 +0900 tomof@acm.org said: > > > > Can you apply the 0001 and 0002 against 2.6.24 and see how it works? > > If it works well, then please apply the 0001, 0002 and 0003. > > Fujita-san, > > I've started through the patches in order, cumulatively and after applying > 0005 things break. I wont be able to test anything else until tomorrow > when I can phycisally reset the machine... Great, thanks a lot! Can you apply this patch after the 0005 patch and see how it works? If it works, then please continue to test 0006, 0007 ... diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c index 05bb6ea..39cdd68 100644 --- a/drivers/scsi/ips.c +++ b/drivers/scsi/ips.c @@ -6906,7 +6906,7 @@ ips_register_scsi(int index) sh->max_channel = ha->nbus - 1; sh->can_queue = ha->max_cmds - 1; - scsi_add_host(sh, NULL); + scsi_add_host(sh, &ha->pcidev->dev); scsi_scan_host(sh); return 0; -- 1.5.3.7 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-18 13:32 ` FUJITA Tomonori @ 2008-02-18 23:30 ` Tim Pepper 2008-02-19 0:11 ` FUJITA Tomonori 2008-02-19 8:02 ` FUJITA Tomonori 1 sibling, 1 reply; 26+ messages in thread From: Tim Pepper @ 2008-02-18 23:30 UTC (permalink / raw) To: FUJITA Tomonori; +Cc: lnxninja, fujita.tomonori, linux-scsi, Mark_Salyzyn On Mon 18 Feb at 22:32:46 +0900 tomof@acm.org said: > > diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c > index 05bb6ea..39cdd68 100644 > --- a/drivers/scsi/ips.c > +++ b/drivers/scsi/ips.c > @@ -6906,7 +6906,7 @@ ips_register_scsi(int index) > sh->max_channel = ha->nbus - 1; > sh->can_queue = ha->max_cmds - 1; > > - scsi_add_host(sh, NULL); > + scsi_add_host(sh, &ha->pcidev->dev); > scsi_scan_host(sh); > > return 0; Fujita-san, This applies and runs well on top of your 0005 patch! The rest of the patches also then apply in order and run successfully. Just to confirm, I applied the above alone to a clean 2.6.24 and things again build and run successfully. For completeness I also reproduced the problem against 2.6.23.16 and verified the above patch fixes on that kernel version as well. Assuming this patch is accepted for 2.6.25, please also queue it for the 2.6.23/24 stable trees. Thank you very much for your help in tracking this issue down! -- Tim Pepper <lnxninja@linux.vnet.ibm.com> IBM Linux Technology Center ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-18 23:30 ` Tim Pepper @ 2008-02-19 0:11 ` FUJITA Tomonori 2008-02-19 3:48 ` Tim Pepper 0 siblings, 1 reply; 26+ messages in thread From: FUJITA Tomonori @ 2008-02-19 0:11 UTC (permalink / raw) To: lnxninja; +Cc: tomof, fujita.tomonori, linux-scsi, Mark_Salyzyn On Mon, 18 Feb 2008 15:30:58 -0800 Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote: > On Mon 18 Feb at 22:32:46 +0900 tomof@acm.org said: > > > > diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c > > index 05bb6ea..39cdd68 100644 > > --- a/drivers/scsi/ips.c > > +++ b/drivers/scsi/ips.c > > @@ -6906,7 +6906,7 @@ ips_register_scsi(int index) > > sh->max_channel = ha->nbus - 1; > > sh->can_queue = ha->max_cmds - 1; > > > > - scsi_add_host(sh, NULL); > > + scsi_add_host(sh, &ha->pcidev->dev); > > scsi_scan_host(sh); > > > > return 0; > > Fujita-san, > > This applies and runs well on top of your 0005 patch! The rest of the > patches also then apply in order and run successfully. Great, thanks a lot! > Just to confirm, I applied the above alone to a clean 2.6.24 and things > again build and run successfully. For completeness I also reproduced > the problem against 2.6.23.16 and verified the above patch fixes on that > kernel version as well. Nice. There is another bug on 2.6.24 but we rarely hit this so 2.6.24 works most of the time: http://marc.info/?l=linux-scsi&m=120303487528875&w=2 > Assuming this patch is accepted for 2.6.25, please also queue it for > the 2.6.23/24 stable trees. Yes, I will take care about it. Can you please help me just once more? 2.6.25-rc2 fixed this bug in a bit different way by chance. Please test 2.6.25-rc2 with the attached patch to make sure that ips in 2.6.25 works well. > Thank you very much for your help in tracking this issue down! No problem. I should have fixed it long time ago. Really sorry about it. diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c index bb152fb..429592a 100644 --- a/drivers/scsi/ips.c +++ b/drivers/scsi/ips.c @@ -1576,7 +1576,7 @@ ips_make_passthru(ips_ha_t *ha, struct scsi_cmnd *SC, ips_scb_t *scb, int intr) METHOD_TRACE("ips_make_passthru", 1); scsi_for_each_sg(SC, sg, scsi_sg_count(SC), i) - length += sg[i].length; + length += sg->length; if (length < sizeof (ips_passthru_t)) { /* wrong size */ @@ -3510,15 +3510,16 @@ ips_scmd_buf_write(struct scsi_cmnd *scmd, void *data, unsigned int count) struct scatterlist *sg = scsi_sglist(scmd); for (i = 0, xfer_cnt = 0; - (i < scsi_sg_count(scmd)) && (xfer_cnt < count); i++) { - min_cnt = min(count - xfer_cnt, sg[i].length); + (i < scsi_sg_count(scmd)) && (xfer_cnt < count); + i++, sg = sg_next(sg)) { + min_cnt = min(count - xfer_cnt, sg->length); /* kmap_atomic() ensures addressability of the data buffer.*/ /* local_irq_save() protects the KM_IRQ0 address slot. */ local_irq_save(flags); - buffer = kmap_atomic(sg_page(&sg[i]), KM_IRQ0) + sg[i].offset; + buffer = kmap_atomic(sg_page(sg), KM_IRQ0) + sg->offset; memcpy(buffer, &cdata[xfer_cnt], min_cnt); - kunmap_atomic(buffer - sg[i].offset, KM_IRQ0); + kunmap_atomic(buffer - sg->offset, KM_IRQ0); local_irq_restore(flags); xfer_cnt += min_cnt; @@ -3543,15 +3544,16 @@ ips_scmd_buf_read(struct scsi_cmnd *scmd, void *data, unsigned int count) struct scatterlist *sg = scsi_sglist(scmd); for (i = 0, xfer_cnt = 0; - (i < scsi_sg_count(scmd)) && (xfer_cnt < count); i++) { - min_cnt = min(count - xfer_cnt, sg[i].length); + (i < scsi_sg_count(scmd)) && (xfer_cnt < count); + i++, sg = sg_next(sg)) { + min_cnt = min(count - xfer_cnt, sg->length); /* kmap_atomic() ensures addressability of the data buffer.*/ /* local_irq_save() protects the KM_IRQ0 address slot. */ local_irq_save(flags); - buffer = kmap_atomic(sg_page(&sg[i]), KM_IRQ0) + sg[i].offset; + buffer = kmap_atomic(sg_page(sg), KM_IRQ0) + sg->offset; memcpy(&cdata[xfer_cnt], buffer, min_cnt); - kunmap_atomic(buffer - sg[i].offset, KM_IRQ0); + kunmap_atomic(buffer - sg->offset, KM_IRQ0); local_irq_restore(flags); xfer_cnt += min_cnt; ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-19 0:11 ` FUJITA Tomonori @ 2008-02-19 3:48 ` Tim Pepper 2008-02-19 8:22 ` FUJITA Tomonori 0 siblings, 1 reply; 26+ messages in thread From: Tim Pepper @ 2008-02-19 3:48 UTC (permalink / raw) To: FUJITA Tomonori; +Cc: tomof, linux-scsi, Mark_Salyzyn On Feb 18, 2008 4:11 PM, FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote: > Can you please help me just once more? 2.6.25-rc2 fixed this bug in a > bit different way by chance. Please test 2.6.25-rc2 with the attached > patch to make sure that ips in 2.6.25 works well. Confirmed...the patch below against 2.6.25-rc2 also works for me. Thank you again Fujita-san! > > diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c > index bb152fb..429592a 100644 > --- a/drivers/scsi/ips.c > +++ b/drivers/scsi/ips.c > @@ -1576,7 +1576,7 @@ ips_make_passthru(ips_ha_t *ha, struct scsi_cmnd *SC, ips_scb_t *scb, int intr) > METHOD_TRACE("ips_make_passthru", 1); > > scsi_for_each_sg(SC, sg, scsi_sg_count(SC), i) > - length += sg[i].length; > + length += sg->length; > > if (length < sizeof (ips_passthru_t)) { > /* wrong size */ > @@ -3510,15 +3510,16 @@ ips_scmd_buf_write(struct scsi_cmnd *scmd, void *data, unsigned int count) > struct scatterlist *sg = scsi_sglist(scmd); > > for (i = 0, xfer_cnt = 0; > - (i < scsi_sg_count(scmd)) && (xfer_cnt < count); i++) { > - min_cnt = min(count - xfer_cnt, sg[i].length); > + (i < scsi_sg_count(scmd)) && (xfer_cnt < count); > + i++, sg = sg_next(sg)) { > + min_cnt = min(count - xfer_cnt, sg->length); > > /* kmap_atomic() ensures addressability of the data buffer.*/ > /* local_irq_save() protects the KM_IRQ0 address slot. */ > local_irq_save(flags); > - buffer = kmap_atomic(sg_page(&sg[i]), KM_IRQ0) + sg[i].offset; > + buffer = kmap_atomic(sg_page(sg), KM_IRQ0) + sg->offset; > memcpy(buffer, &cdata[xfer_cnt], min_cnt); > - kunmap_atomic(buffer - sg[i].offset, KM_IRQ0); > + kunmap_atomic(buffer - sg->offset, KM_IRQ0); > local_irq_restore(flags); > > xfer_cnt += min_cnt; > @@ -3543,15 +3544,16 @@ ips_scmd_buf_read(struct scsi_cmnd *scmd, void *data, unsigned int count) > struct scatterlist *sg = scsi_sglist(scmd); > > for (i = 0, xfer_cnt = 0; > - (i < scsi_sg_count(scmd)) && (xfer_cnt < count); i++) { > - min_cnt = min(count - xfer_cnt, sg[i].length); > + (i < scsi_sg_count(scmd)) && (xfer_cnt < count); > + i++, sg = sg_next(sg)) { > + min_cnt = min(count - xfer_cnt, sg->length); > > /* kmap_atomic() ensures addressability of the data buffer.*/ > /* local_irq_save() protects the KM_IRQ0 address slot. */ > local_irq_save(flags); > - buffer = kmap_atomic(sg_page(&sg[i]), KM_IRQ0) + sg[i].offset; > + buffer = kmap_atomic(sg_page(sg), KM_IRQ0) + sg->offset; > memcpy(&cdata[xfer_cnt], buffer, min_cnt); > - kunmap_atomic(buffer - sg[i].offset, KM_IRQ0); > + kunmap_atomic(buffer - sg->offset, KM_IRQ0); > local_irq_restore(flags); > > xfer_cnt += min_cnt; ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-19 3:48 ` Tim Pepper @ 2008-02-19 8:22 ` FUJITA Tomonori 0 siblings, 0 replies; 26+ messages in thread From: FUJITA Tomonori @ 2008-02-19 8:22 UTC (permalink / raw) To: lnxninja; +Cc: fujita.tomonori, tomof, linux-scsi, Mark_Salyzyn On Mon, 18 Feb 2008 19:48:49 -0800 "Tim Pepper" <lnxninja@linux.vnet.ibm.com> wrote: > On Feb 18, 2008 4:11 PM, FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote: > > Can you please help me just once more? 2.6.25-rc2 fixed this bug in a > > bit different way by chance. Please test 2.6.25-rc2 with the attached > > patch to make sure that ips in 2.6.25 works well. > > Confirmed...the patch below against 2.6.25-rc2 also works for me. Thanks a lot! I'll make sure that the ips driver in 2.6.23-stable, 2.6.24-stable (and 2.6.25) will work well. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-18 13:32 ` FUJITA Tomonori 2008-02-18 23:30 ` Tim Pepper @ 2008-02-19 8:02 ` FUJITA Tomonori 2008-02-19 16:06 ` James Bottomley 1 sibling, 1 reply; 26+ messages in thread From: FUJITA Tomonori @ 2008-02-19 8:02 UTC (permalink / raw) To: James.Bottomley; +Cc: lnxninja, fujita.tomonori, linux-scsi, Mark_Salyzyn ips did scsi_add_host(sh, NULL) so scsi_dma_map uses shost_gendev.parent that isn't initialized properly, then the kernel crashes. 2.6.23 and 2.6.24 have this bug. We can fix this by calling scsi_add_host with pdev->dev, in the standard way (like the following way) but this bug was fixed in the current Linus tree by: commit 2551a13e61d3c3df6c2da6de5a3ece78e6d67111 Author: Jeff Garzik <jeff@garzik.org> Date: Thu Dec 13 16:14:10 2007 -0800 [SCSI] ips: handle scsi_add_host() failure, and other err cleanups James, the legitimate way to fix stable trees is sending this commit (not sending a patch that was not committed upstream)? On Mon, 18 Feb 2008 22:32:46 +0900 FUJITA Tomonori <tomof@acm.org> wrote: > On Sun, 17 Feb 2008 15:37:02 -0800 > Tim Pepper <lnxninja@linux.vnet.ibm.com> wrote: > > > On Mon 19 Feb at 07:31:56 +0900 tomof@acm.org said: > > > > > > Can you apply the 0001 and 0002 against 2.6.24 and see how it works? > > > If it works well, then please apply the 0001, 0002 and 0003. > > > > Fujita-san, > > > > I've started through the patches in order, cumulatively and after applying > > 0005 things break. I wont be able to test anything else until tomorrow > > when I can phycisally reset the machine... > > Great, thanks a lot! > > Can you apply this patch after the 0005 patch and see how it works? If > it works, then please continue to test 0006, 0007 ... > > > diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c > index 05bb6ea..39cdd68 100644 > --- a/drivers/scsi/ips.c > +++ b/drivers/scsi/ips.c > @@ -6906,7 +6906,7 @@ ips_register_scsi(int index) > sh->max_channel = ha->nbus - 1; > sh->can_queue = ha->max_cmds - 1; > > - scsi_add_host(sh, NULL); > + scsi_add_host(sh, &ha->pcidev->dev); > scsi_scan_host(sh); > > return 0; > -- > 1.5.3.7 > > - > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-19 8:02 ` FUJITA Tomonori @ 2008-02-19 16:06 ` James Bottomley 2008-02-19 20:29 ` FUJITA Tomonori 0 siblings, 1 reply; 26+ messages in thread From: James Bottomley @ 2008-02-19 16:06 UTC (permalink / raw) To: FUJITA Tomonori Cc: lnxninja, fujita.tomonori, linux-scsi, Mark_Salyzyn, Jeff Garzik On Tue, 2008-02-19 at 17:02 +0900, FUJITA Tomonori wrote: > ips did scsi_add_host(sh, NULL) so scsi_dma_map uses > shost_gendev.parent that isn't initialized properly, then the kernel > crashes. 2.6.23 and 2.6.24 have this bug. > > We can fix this by calling scsi_add_host with pdev->dev, in the > standard way (like the following way) but this bug was fixed in the > current Linus tree by: > > commit 2551a13e61d3c3df6c2da6de5a3ece78e6d67111 > Author: Jeff Garzik <jeff@garzik.org> > Date: Thu Dec 13 16:14:10 2007 -0800 > > [SCSI] ips: handle scsi_add_host() failure, and other err cleanups > > > James, the legitimate way to fix stable trees is sending this commit > (not sending a patch that was not committed upstream)? Well, the upstream patch doesn't look so bad as a stable candidate to my eye. Just because it's an unintended bugfix doesn't automatically invalidate it. The reason stable likes backports of existing upstream patches is because they've supposedly been well tested in upstream. Although that doesn't apply in this case because the other bug rather prevented testing, the principle is still sound. So, would there be any problems simply backporting this? James ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: ips.c broken since 2.6.23 on x86_64? 2008-02-19 16:06 ` James Bottomley @ 2008-02-19 20:29 ` FUJITA Tomonori 0 siblings, 0 replies; 26+ messages in thread From: FUJITA Tomonori @ 2008-02-19 20:29 UTC (permalink / raw) To: James.Bottomley Cc: tomof, lnxninja, fujita.tomonori, linux-scsi, Mark_Salyzyn, jeff On Tue, 19 Feb 2008 10:06:39 -0600 James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > On Tue, 2008-02-19 at 17:02 +0900, FUJITA Tomonori wrote: > > ips did scsi_add_host(sh, NULL) so scsi_dma_map uses > > shost_gendev.parent that isn't initialized properly, then the kernel > > crashes. 2.6.23 and 2.6.24 have this bug. > > > > We can fix this by calling scsi_add_host with pdev->dev, in the > > standard way (like the following way) but this bug was fixed in the > > current Linus tree by: > > > > commit 2551a13e61d3c3df6c2da6de5a3ece78e6d67111 > > Author: Jeff Garzik <jeff@garzik.org> > > Date: Thu Dec 13 16:14:10 2007 -0800 > > > > [SCSI] ips: handle scsi_add_host() failure, and other err cleanups > > > > > > James, the legitimate way to fix stable trees is sending this commit > > (not sending a patch that was not committed upstream)? > > Well, the upstream patch doesn't look so bad as a stable candidate to my > eye. Just because it's an unintended bugfix doesn't automatically > invalidate it. > > The reason stable likes backports of existing upstream patches is > because they've supposedly been well tested in upstream. Although that > doesn't apply in this case because the other bug rather prevented > testing, the principle is still sound. > > So, would there be any problems simply backporting this? Thanks, I see. There is no problem. Please backport this. ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2008-02-19 20:31 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-13 21:43 ips.c broken since 2.6.23 on x86_64? Tim Pepper
[not found] ` <532ABFBDAAC3A34EB12EBA6CEC2838F439964688@ADPE2K703.adaptec.com>
2008-02-14 0:04 ` Tim Pepper
2008-02-18 14:57 ` Salyzyn, Mark
2008-02-18 23:34 ` Tim Pepper
2008-02-14 11:48 ` FUJITA Tomonori
2008-02-14 23:55 ` Tim Pepper
2008-02-15 0:13 ` FUJITA Tomonori
2008-02-15 1:16 ` Tim Pepper
2008-02-15 16:09 ` FUJITA Tomonori
2008-02-15 22:50 ` Tim Pepper
2008-02-16 0:41 ` FUJITA Tomonori
2008-02-17 12:52 ` Boaz Harrosh
2008-02-17 13:09 ` Boaz Harrosh
2008-02-17 21:16 ` Tim Pepper
2008-02-17 21:15 ` Tim Pepper
2008-02-17 22:31 ` FUJITA Tomonori
2008-02-17 23:29 ` Tim Pepper
2008-02-17 23:37 ` Tim Pepper
2008-02-18 13:32 ` FUJITA Tomonori
2008-02-18 23:30 ` Tim Pepper
2008-02-19 0:11 ` FUJITA Tomonori
2008-02-19 3:48 ` Tim Pepper
2008-02-19 8:22 ` FUJITA Tomonori
2008-02-19 8:02 ` FUJITA Tomonori
2008-02-19 16:06 ` James Bottomley
2008-02-19 20:29 ` FUJITA Tomonori
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.