* Re: 2.6.23-mm1
From: Andrew Morton @ 2007-10-12 8:37 UTC
To: Torsten Kaiser; +Cc: linux-kernel, linux-ide
On Fri, 12 Oct 2007 10:31:42 +0200 "Torsten Kaiser" <just.for.lkml@googlemail.com> wrote:
> On 10/12/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Fri, 12 Oct 2007 14:03:28 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >
> > > On Thu, 11 Oct 2007 21:31:26 -0700
> > > Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > >
> > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23/2.6.23-mm1/
> > > >
> > > > - I've been largely avoiding applying anything since rc8-mm2 in an attempt
> > > > to stabilise things for the 2.6.23 merge.
> > > >
> > > In a RHEL5/x86_64 environment:
> > >
> > > ==
> > > [kamezawa@hannibal ref-2.6.23-mm1]$ make menuconfig
> > > Makefile:456: /home/kamezawa/ref-2.6.23-mm1/arch//Makefile: No such file or directory
> > > make: *** No rule to make target `/home/kamezawa/ref-2.6.23-mm1/arch//Makefile'. Stop.
> > > ==
> > >
> > > $(ARCH) cannot be detected automatically...
> >
> > So you need to set $ARCH by hand? I always do that so I didn't notice this.
>
> After setting ARCH by hand, it built and booted OK for me.
OK.
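For context on the `arch//Makefile` error: the doubled slash shows that the architecture component of the included path came out empty. When ARCH is not given, the 2.6-era top-level Makefile normally derives it from `uname -m` through a chain of sed substitutions (`ARCH ?= $(SUBARCH)`). The Python sketch below models that derivation; the substitution list is a partial recollection of the 2.6-era rules, not a copy of the -mm1 Makefile, and `derive_subarch`, `resolve_arch` and `arch_makefile` are hypothetical names. Passing ARCH explicitly (for example `make ARCH=x86_64 menuconfig`) sidesteps the detection entirely, which is the workaround used in this thread.

```python
import re

# Partial, illustrative list of the sed substitutions the 2.6-era top-level
# Makefile applies to `uname -m` to compute SUBARCH (assumed, not from -mm1).
SUBARCH_RULES = [
    (r"i.86", "i386"),
    (r"sun4u", "sparc64"),
    (r"arm.*", "arm"),
    (r"sa110", "arm"),
    (r"s390x", "s390"),
    (r"parisc64", "parisc"),
    (r"ppc.*", "powerpc"),
    (r"mips.*", "mips"),
]


def derive_subarch(machine: str) -> str:
    """Apply each substitution once, in order, like `sed -e ... -e ...`."""
    for pattern, replacement in SUBARCH_RULES:
        machine = re.sub(pattern, replacement, machine, count=1)
    return machine


def resolve_arch(make_vars: dict, machine: str) -> str:
    """Model `ARCH ?= $(SUBARCH)`: an ARCH that is already defined wins,
    even if it is empty; otherwise fall back to the derived SUBARCH."""
    if "ARCH" in make_vars:
        return make_vars["ARCH"]
    return derive_subarch(machine)


def arch_makefile(arch: str) -> str:
    """Relative path of the per-arch Makefile; an empty arch yields arch//Makefile."""
    return f"arch/{arch}/Makefile"


if __name__ == "__main__":
    print(resolve_arch({}, "i686"))
    print(resolve_arch({"ARCH": "x86_64"}, "i686"))
    print(arch_makefile(""))
```

Note the `?=` semantics: an ARCH that is defined but empty is kept as-is, which in this model is one way to end up with `arch//Makefile`.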
> But I did add the patch from http://lkml.org/lkml/2007/10/11/48 as my
> personal hotfix.
I think Jeff has that in hand?
> Two things I noted in my logs:
> [ 16.040000] NET: Registered protocol family 1
> [ 16.050000] NET: Registered protocol family 17
> [ 16.060000] NET: Registered protocol family 15
> [ 16.080000] sysctl table check failed: /sunrpc/transports .7249.14 Missing strategy
> [ 16.100000] sysctl table check failed: /sunrpc/transports .7249.14 Unknown sysctl binary path
> [ 16.130000] RPC: Registered udp transport module.
> [ 16.140000] RPC: Registered tcp transport module.
> ... but NFSv4 still works.
Yeah, Bruce will be dropping the relevant patch - when it comes back it
should use CTL_UNNUMBERED.
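The two sysctl log lines are two separate complaints about the same new entry, whose binary path is logged as `.7249.14`. A simplified Python model of what those checks appear to test, and of why an unnumbered entry would avoid both, is sketched below. It is an illustration only: the real checks live in the kernel's sysctl table-checking code and are more involved. `SysctlEntry`, `check_entry`, `known_paths` and the `CTL_UNNUMBERED` sentinel are hypothetical stand-ins; the sentinel is deliberately not the kernel macro's numeric value.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical sentinel standing in for the kernel's CTL_UNNUMBERED macro.
CTL_UNNUMBERED = None


@dataclass
class SysctlEntry:
    procname: str
    ctl_name: Optional[int]              # binary sysctl number, or CTL_UNNUMBERED
    strategy: Optional[Callable] = None  # handler for the binary sysctl(2) path


def check_entry(binary_path: Tuple[int, ...], entry: SysctlEntry,
                known_paths: Set[Tuple[int, ...]]) -> List[str]:
    """Return the complaints this simplified table check would log for entry."""
    if entry.ctl_name is CTL_UNNUMBERED:
        # No binary number: the binary-interface checks do not apply.
        return []
    errors = []
    if entry.strategy is None:
        errors.append("Missing strategy")
    if binary_path not in known_paths:
        errors.append("Unknown sysctl binary path")
    return errors


if __name__ == "__main__":
    numbered = SysctlEntry("transports", ctl_name=14)
    print(check_entry((7249, 14), numbered, known_paths=set()))
    unnumbered = SysctlEntry("transports", ctl_name=CTL_UNNUMBERED)
    print(check_entry((7249,), unnumbered, known_paths=set()))
```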
> Oct 12 10:23:03 treogen smartd[6091]: Device: /dev/sdc, not found in smartd database.
hm.
> Oct 12 10:23:03 treogen [ 105.990000] WARNING: at drivers/ata/libata-core.c:5752 ata_qc_issue()
Let's cc linux-ide.
> Oct 12 10:23:03 treogen [ 105.990000]
> Oct 12 10:23:03 treogen [ 105.990000] Call Trace:
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff804442ef>] ata_qc_issue+0x47f/0x540
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff80432e60>] scsi_done+0x0/0x20
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff80449c80>] ata_scsi_flush_xlat+0x0/0x30
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff8044a6ea>] ata_scsi_translate+0xfa/0x180
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff80432e60>] scsi_done+0x0/0x20
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff8044d84d>] ata_scsi_queuecmd+0x12d/0x210
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff804333d0>] scsi_dispatch_cmd+0x150/0x250
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff804391f1>] scsi_request_fn+0x1f1/0x360
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff8039b827>] elv_insert+0x167/0x250
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff803a0ac2>] __make_request+0xe2/0x670
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff8039d560>] generic_make_request+0x1d0/0x3c0
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff802bc1b9>] bio_alloc_bioset+0xb9/0x140
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff802bc061>] __bio_clone+0x91/0xc0
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff8039d7b6>] submit_bio+0x66/0xf0
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff804cc06e>] write_page+0x16e/0x2c0
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff80231b01>] dequeue_task_fair+0x51/0xb0
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff804c482d>] md_update_sb+0x18d/0x320
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff804caa10>] md_thread+0x0/0x100
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff804c9065>] md_check_recovery+0x1f5/0x550
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff804caa10>] md_thread+0x0/0x100
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff804bf1d3>] raid5d+0x23/0x490
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff8023eb12>] try_to_del_timer_sync+0x52/0x60
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff805b0057>] schedule_timeout+0x67/0xd0
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff8023e740>] process_timeout+0x0/0x10
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff805b004a>] schedule_timeout+0x5a/0xd0
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff804caa10>] md_thread+0x0/0x100
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff804caa40>] md_thread+0x30/0x100
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff8024a710>] autoremove_wake_function+0x0/0x30
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff804caa10>] md_thread+0x0/0x100
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff8024a32b>] kthread+0x4b/0x80
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff8020c9d8>] child_rip+0xa/0x12
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff8024a2e0>] kthread+0x0/0x80
> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff8020c9ce>] child_rip+0x0/0x12
> Oct 12 10:23:03 treogen [ 105.990000]
> Oct 12 10:23:13 treogen [ 115.940000] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> Oct 12 10:23:13 treogen [ 115.940000] ata3.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0
> Oct 12 10:23:13 treogen [ 115.940000] res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> Oct 12 10:23:13 treogen [ 115.940000] ata3.00: status: { DRDY }
> Oct 12 10:23:14 treogen [ 116.270000] ata3: soft resetting link
> Oct 12 10:23:14 treogen [ 116.430000] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Oct 12 10:23:14 treogen [ 116.740000] ata3.00: configured for UDMA/133
> Oct 12 10:23:14 treogen [ 116.740000] ata3: EH complete
> Oct 12 10:23:14 treogen [ 116.740000] WARNING: at drivers/ata/libata-core.c:5752 ata_qc_issue()
> Oct 12 10:23:14 treogen [ 116.740000]
> Oct 12 10:23:14 treogen [ 116.740000] Call Trace:
> Oct 12 10:23:14 treogen [ 116.740000] [<ffffffff804442ef>] ata_qc_issue+0x47f/0x540
> Oct 12 10:23:14 treogen [ 116.740000] [<ffffffff80432e60>] scsi_done+0x0/0x20
> Oct 12 10:23:14 treogen [ 116.740000] [<ffffffff80449c80>] ata_scsi_flush_xlat+0x0/0x30
> Oct 12 10:23:14 treogen [ 116.740000] [<ffffffff8044a6ea>] ata_scsi_translate+0xfa/0x180
> Oct 12 10:23:14 treogen [ 116.740000] [<ffffffff80432e60>] scsi_done+0x0/0x20
> Oct 12 10:23:14 treogen [ 116.740000] [<ffffffff8044d84d>] ata_scsi_queuecmd+0x12d/0x210
> Oct 12 10:23:14 treogen [ 116.740000] [<ffffffff804333d0>] scsi_dispatch_cmd+0x150/0x250
> Oct 12 10:23:14 treogen [ 116.740000] [<ffffffff804391f1>] scsi_request_fn+0x1f1/0x360
> Oct 12 10:23:14 treogen [ 116.740000] [<ffffffff80436b80>] scsi_error_handler+0x0/0x310
> Oct 12 10:23:14 treogen [ 116.740000] [<ffffffff8039fe73>] blk_run_queue+0x43/0x80
> Oct 12 10:23:14 treogen [ 116.740000] [<ffffffff80438659>] scsi_run_host_queues+0x19/0x40
> Oct 12 10:23:14 treogen [ 116.740000] [<ffffffff80436d54>] scsi_error_handler+0x1d4/0x310
> Oct 12 10:23:14 treogen [ 116.740000] [<ffffffff80436b80>] scsi_error_handler+0x0/0x310
> Oct 12 10:23:14 treogen [ 116.740000] [<ffffffff8024a32b>] kthread+0x4b/0x80
> Oct 12 10:23:14 treogen [ 116.740000] [<ffffffff8020c9d8>] child_rip+0xa/0x12
> Oct 12 10:23:14 treogen [ 116.740000] [<ffffffff8024a2e0>] kthread+0x0/0x80
> Oct 12 10:23:14 treogen [ 116.740000] [<ffffffff8020c9ce>] child_rip+0x0/0x12
> Oct 12 10:23:14 treogen [ 116.740000]
> Oct 12 10:23:14 treogen [ 116.770000] sd 2:0:0:0: [sdc] 625142448 512-byte hardware sectors (320073 MB)
> Oct 12 10:23:14 treogen [ 116.770000] sd 2:0:0:0: [sdc] Write Protect is off
> Oct 12 10:23:14 treogen [ 116.770000] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> Oct 12 10:23:14 treogen [ 116.770000] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Oct 12 10:23:24 treogen [ 126.740000] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> Oct 12 10:23:24 treogen [ 126.740000] ata3.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0
> Oct 12 10:23:24 treogen [ 126.740000] res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> Oct 12 10:23:24 treogen [ 126.740000] ata3.00: status: { DRDY }
> Oct 12 10:23:24 treogen [ 127.070000] ata3: soft resetting link
> Oct 12 10:23:25 treogen [ 127.230000] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Oct 12 10:23:25 treogen [ 127.370000] ata3.00: configured for UDMA/133
> Oct 12 10:23:25 treogen [ 127.370000] ata3: EH complete
> Oct 12 10:23:25 treogen [ 127.370000] sd 2:0:0:0: [sdc] 625142448 512-byte hardware sectors (320073 MB)
> Oct 12 10:23:25 treogen [ 127.370000] sd 2:0:0:0: [sdc] Write Protect is off
> Oct 12 10:23:25 treogen [ 127.370000] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> Oct 12 10:23:25 treogen [ 127.370000] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Oct 12 10:23:25 treogen smartd[6091]: Device: /dev/sdc, is SMART capable. Adding to "monitor" list.
> ... but I can still access the filesystem and the RAID device on that drive.
> (sdc is a MAXTOR STM332082 3.AA SATA drive on an MCP55 using sata_nv with
> SWNCQ activated)
>
> Torsten
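The `exception` lines in the report above pack several bit masks into hex. A small Python decoder for the error mask and the EH action mask is sketched below, assuming the bit assignments of libata's `AC_ERR_*` and `ATA_EH_*` constants as used in 2.6.2x kernels; only the low bits are listed, and the exact values should be checked against `include/linux/libata.h` for the tree in question. `decode_bits` and `parse_exception` are hypothetical helpers. Under these assumptions, the per-command `Emask 0x4` is a timeout and `action 0x2` requests a soft reset, which is consistent with the "soft resetting link" line that follows each exception.

```python
import re

# Assumed bit layout of libata's AC_ERR_* completion-error mask (low bits only).
AC_ERR = [
    (1 << 0, "AC_ERR_DEV"),
    (1 << 1, "AC_ERR_HSM"),
    (1 << 2, "AC_ERR_TIMEOUT"),
    (1 << 3, "AC_ERR_MEDIA"),
    (1 << 4, "AC_ERR_ATA_BUS"),
    (1 << 5, "AC_ERR_HOST_BUS"),
    (1 << 6, "AC_ERR_SYSTEM"),
    (1 << 7, "AC_ERR_INVALID"),
    (1 << 8, "AC_ERR_OTHER"),
]

# Assumed bit layout of the EH action mask (revalidate/reset bits only).
ATA_EH_ACTION = [
    (1 << 0, "REVALIDATE"),
    (1 << 1, "SOFTRESET"),
    (1 << 2, "HARDRESET"),
]

EXCEPTION_RE = re.compile(
    r"exception Emask (0x[0-9a-f]+) SAct (0x[0-9a-f]+) "
    r"SErr (0x[0-9a-f]+) action (0x[0-9a-f]+)( frozen)?"
)


def decode_bits(value, table):
    """Name each set bit and report any bits the table does not cover."""
    names = [name for bit, name in table if value & bit]
    known = 0
    for bit, _ in table:
        known |= bit
    if value & ~known:
        names.append(f"unknown(0x{value & ~known:x})")
    return names


def parse_exception(line):
    """Decode one (unwrapped) libata EH 'exception' log line."""
    m = EXCEPTION_RE.search(line)
    if m is None:
        raise ValueError("not a libata EH exception line")
    emask, sact, serr, action = (int(g, 16) for g in m.groups()[:4])
    return {
        "emask": decode_bits(emask, AC_ERR),
        "sact": sact,
        "serr": serr,
        "action": decode_bits(action, ATA_EH_ACTION),
        "frozen": m.group(5) is not None,
    }


if __name__ == "__main__":
    print(parse_exception(
        "ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen"))
    print(decode_bits(0x4, AC_ERR))  # the per-command "Emask 0x4 (timeout)"
```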
* Re: 2.6.23-mm1
From: Torsten Kaiser @ 2007-10-12 12:46 UTC
To: Andrew Morton; +Cc: linux-kernel, linux-ide
On 10/12/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Fri, 12 Oct 2007 10:31:42 +0200 "Torsten Kaiser" <just.for.lkml@googlemail.com> wrote:
> > But I did add the patch from http://lkml.org/lkml/2007/10/11/48 as my
> > personal hotfix.
>
> I think Jeff has that in hand?
I would rather expect Jens, as the breakage in ata_sg_is_last comes in
through the sglist patches from the block git tree.
That comment was more to say that this patch does not blow up. ;)
> > Oct 12 10:23:03 treogen smartd[6091]: Device: /dev/sdc, not found in smartd database.
>
> hm.
smartd has always said that; I never thought it would matter.
It also says this about the other two identical drives that are
connected via the SiI 3132 instead of the MCP55. And until now smartd
worked with this drive, logging temperature changes into
/var/log/messages.
Hm: even with the warnings below, it still does that:
Oct 12 10:53:25 treogen smartd[6095]: Device: /dev/sdc, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 57 to 58
Oct 12 11:23:26 treogen smartd[6095]: Device: /dev/sdc, SMART Usage Attribute: 190 Temperature_Celsius changed from 51 to 50
Oct 12 11:23:26 treogen smartd[6095]: Device: /dev/sdc, SMART Usage Attribute: 194 Temperature_Celsius changed from 49 to 50
Oct 12 13:23:25 treogen smartd[6095]: Device: /dev/sdc, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 58 to 57
But I have not seen any new WARNINGs...
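These smartd lines follow a fixed pattern, so lining them up against the libata WARNINGs over time only needs a small parser. A Python sketch, assuming exactly the message format shown in the log above; `parse_smart_change` and `SmartChange` are hypothetical helpers, not part of smartmontools.

```python
import re
from typing import NamedTuple, Optional

# Matches the attribute-change lines smartd wrote to syslog in this thread.
SMART_CHANGE_RE = re.compile(
    r"Device: (?P<device>[^,\s]+), SMART Usage Attribute: "
    r"(?P<attr_id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)


class SmartChange(NamedTuple):
    device: str
    attr_id: int
    name: str
    old: int
    new: int


def parse_smart_change(line: str) -> Optional[SmartChange]:
    """Return the attribute change recorded in a smartd syslog line, or None."""
    m = SMART_CHANGE_RE.search(line)
    if m is None:
        return None
    return SmartChange(m["device"], int(m["attr_id"]), m["name"],
                       int(m["old"]), int(m["new"]))


if __name__ == "__main__":
    line = ("Oct 12 11:23:26 treogen smartd[6095]: Device: /dev/sdc, SMART Usage "
            "Attribute: 194 Temperature_Celsius changed from 49 to 50")
    print(parse_smart_change(line))
```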
Torsten
* Re: 2.6.23-mm1
From: Torsten Kaiser @ 2007-10-13 8:01 UTC
To: Andrew Morton; +Cc: linux-kernel, linux-ide
On 10/12/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Fri, 12 Oct 2007 10:31:42 +0200 "Torsten Kaiser" <just.for.lkml@googlemail.com> wrote:
> > Oct 12 10:23:03 treogen smartd[6091]: Device: /dev/sdc, not found in smartd database.
>
> hm.
>
> > Oct 12 10:23:03 treogen [ 105.990000] WARNING: at drivers/ata/libata-core.c:5752 ata_qc_issue()
>
> Let's cc linux-ide.
>
On the next boot no WARNINGs showed up.
On the third boot with 2.6.23-mm1 the drive failed completely:
First I got this WARNING:
Oct 13 07:46:48 treogen smartd[6081]: Device: /dev/sdc, opened
Oct 13 07:46:48 treogen [ 99.850000] WARNING: at drivers/ata/libata-core.c:5761 ata_qc_issue()
Oct 13 07:46:48 treogen [ 99.850000]
Oct 13 07:46:48 treogen [ 99.850000] Call Trace:
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8044431a>] ata_qc_issue+0x4aa/0x540
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff80432e60>] scsi_done+0x0/0x20
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8044ce30>] ata_scsi_pass_thru+0x0/0x2c0
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8044a6ea>] ata_scsi_translate+0xfa/0x180
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff80432e60>] scsi_done+0x0/0x20
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8044d84d>] ata_scsi_queuecmd+0x12d/0x210
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff804333d0>] scsi_dispatch_cmd+0x150/0x250
Oct 13 07:46:48 treogen smartd[6081]: Device: /dev/sdc, not found in smartd database.
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff804391f1>] scsi_request_fn+0x1f1/0x360
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8039f362>] blk_execute_rq_nowait+0x62/0xb0
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8039f446>] blk_execute_rq+0x96/0x110
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8039f5b1>] get_request_wait+0x21/0x1a0
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8022c8ea>] __wake_up_common+0x5a/0x90
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff80438e14>] scsi_execute+0xe4/0x120
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8044cb14>] ata_cmd_ioctl+0x124/0x270
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8044cd67>] ata_scsi_ioctl+0x107/0x1d0
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8043424c>] scsi_ioctl+0xbc/0x330
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff803a14f3>] blkdev_driver_ioctl+0x93/0xa0
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff803a1766>] blkdev_ioctl+0x266/0x7c0
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8022c8ea>] __wake_up_common+0x5a/0x90
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8022c8ea>] __wake_up_common+0x5a/0x90
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8022d543>] __wake_up+0x43/0x70
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff802babba>] invalidate_inode_buffers+0x2a/0x100
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8024a5e0>] bit_waitqueue+0x10/0xd0
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff802bd08b>] block_ioctl+0x1b/0x30
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff802a08bf>] do_ioctl+0x2f/0xa0
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff802a0b50>] vfs_ioctl+0x220/0x2d0
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff802a0c91>] sys_ioctl+0x91/0xb0
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8020bbbe>] system_call+0x7e/0x83
Oct 13 07:46:48 treogen [ 99.850000]
Oct 13 07:46:48 treogen [ 99.850000] ata3: EH in SWNCQ mode,QC:qc_active 0x3 sactive 0x1
Oct 13 07:46:48 treogen [ 99.850000] ata3: SWNCQ:qc_active 0x1 defer_bits 0x0 last_issue_tag 0x0
Oct 13 07:46:48 treogen [ 99.850000] dhfis 0x1 dmafis 0x0 sdbfis 0x0
Oct 13 07:46:48 treogen [ 99.850000] ata3: ATA_REG 0x51 ERR_REG 0x4
Oct 13 07:46:48 treogen [ 99.850000] ata3: tag : dhfis dmafis sdbfis sacitve
Oct 13 07:46:48 treogen [ 99.850000] ata3: tag 0x0: 1 0 0 1
Oct 13 07:46:48 treogen [ 99.850000] ata3.00: exception Emask 0x1 SAct 0x1 SErr 0x0 action 0x6 frozen
Oct 13 07:46:48 treogen [ 99.850000] ata3.00: Ata error. fis:0x41
Oct 13 07:46:48 treogen [ 99.850000] ata3.00: cmd 60/30:00:d1:6b:db/00:00:18:00:00/40 tag 0 cdb 0x0 data 24576 in
Oct 13 07:46:48 treogen [ 99.850000] res 51/04:00:01:4f:c2/04:00:d1:6b:db/00 Emask 0x1 (device error)
Oct 13 07:46:48 treogen [ 99.850000] ata3.00: status: { DRDY ERR }
Oct 13 07:46:48 treogen [ 99.850000] ata3.00: error: { ABRT }
Oct 13 07:46:48 treogen [ 99.850000] ata3.00: cmd b0/d8:00:01:4f:c2/00:00:00:00:00/00 tag 1 cdb 0x0 data 0
Oct 13 07:46:48 treogen [ 99.850000] res 51/04:00:01:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
Oct 13 07:46:48 treogen [ 99.850000] ata3.00: status: { DRDY ERR }
Oct 13 07:46:48 treogen [ 99.850000] ata3.00: error: { ABRT }
Oct 13 07:46:48 treogen [ 99.850000] ata3: hard resetting link
Oct 13 07:46:49 treogen [ 100.360000] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 13 07:46:49 treogen [ 100.510000] ata3.00: configured for UDMA/133
Oct 13 07:46:49 treogen [ 100.510000] ata3: EH complete
Then the other two WARNINGs (drivers/ata/libata-core.c:5752) appeared again.
After that the drive was inaccessible.
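The `cmd`/`res` lines in the log above are raw ATA taskfiles. Assuming libata's EH dump order of command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, with status and error in the first two bytes of `res`, the Python sketch below decodes them. Under that assumption, tag 0 is a 48-sector NCQ read (READ FPDMA QUEUED, 24576 bytes with 512-byte sectors, matching `data 24576 in`), tag 1 is a non-queued SMART ENABLE OPERATIONS, and both completed with DRDY and ERR set and ABRT in the error register. The opcode tables cover only commands seen in this thread, the status/error lists are a subset chosen to mirror libata's `status: { ... }` and `error: { ... }` lines, and all helper names are hypothetical.

```python
import re

FIELDS = ("command", "feature", "nsect", "lbal", "lbam", "lbah",
          "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah",
          "device")

# Only the opcodes that appear in this thread.
ATA_COMMANDS = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED",
                0xB0: "SMART", 0xE7: "FLUSH CACHE", 0xEA: "FLUSH CACHE EXT"}
SMART_FEATURES = {0xD8: "ENABLE OPERATIONS", 0xDA: "RETURN STATUS"}

# Subset of status/error register bits, in the order libata's lines show them.
STATUS_BITS = [(0x80, "BUSY"), (0x40, "DRDY"), (0x20, "DF"),
               (0x08, "DRQ"), (0x01, "ERR")]
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"), (0x04, "ABRT")]


def parse_taskfile(text):
    """Split 'cc/ff:nn:ll:mm:hh/..:..:..:..:../dd' into named byte fields."""
    values = [int(b, 16) for b in re.split(r"[/:]", text.strip())]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} bytes, got {len(values)}")
    return dict(zip(FIELDS, values))


def lba48(tf):
    """Assemble a 48-bit LBA from the low and high-order (hob) bytes."""
    return (tf["hob_lbah"] << 40 | tf["hob_lbam"] << 32 | tf["hob_lbal"] << 24
            | tf["lbah"] << 16 | tf["lbam"] << 8 | tf["lbal"])


def describe_cmd(tf, sector_size=512):
    op = tf["command"]
    name = ATA_COMMANDS.get(op, f"opcode 0x{op:02x}")
    if op in (0x60, 0x61):
        # FPDMA: sector count lives in the feature bytes (0 means 65536),
        # and the NCQ tag sits in bits 7:3 of the sector-count register.
        sectors = (tf["hob_feature"] << 8 | tf["feature"]) or 65536
        return {"name": name, "ncq": True, "tag": tf["nsect"] >> 3,
                "sectors": sectors, "bytes": sectors * sector_size,
                "lba": lba48(tf)}
    if op == 0xB0:
        name += " " + SMART_FEATURES.get(tf["feature"], f"0x{tf['feature']:02x}")
    return {"name": name, "ncq": False}


def bit_names(value, table):
    return [name for bit, name in table if value & bit]


if __name__ == "__main__":
    print(describe_cmd(parse_taskfile("60/30:00:d1:6b:db/00:00:18:00:00/40")))
    print(describe_cmd(parse_taskfile("b0/d8:00:01:4f:c2/00:00:00:00:00/00")))
    res = parse_taskfile("51/04:00:01:4f:c2/04:00:d1:6b:db/00")
    print(bit_names(res["command"], STATUS_BITS),
          bit_names(res["feature"], ERROR_BITS))
```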
The last known "good" kernel for this problem is probably
2.6.23-rc8-mm1. That version only had the sata_sil24 bug
(ata_sg_is_last). I booted 2.6.23-rc8-mm2 only once, and that
attempt did not complete bootup, but I could neither see the
complete OOPS nor save it. As I was still trying to track down the
other bug, I did not investigate further.
Torsten
* Re: 2.6.23-mm1
2007-10-13 8:01 ` 2.6.23-mm1 Torsten Kaiser
@ 2007-10-13 10:55 ` Jeff Garzik
2007-10-13 12:03 ` 2.6.23-mm1 Torsten Kaiser
0 siblings, 1 reply; 13+ messages in thread
From: Jeff Garzik @ 2007-10-13 10:55 UTC (permalink / raw)
To: Torsten Kaiser
Cc: Andrew Morton, linux-kernel, linux-ide, Kuan Luo, Peer Chen
Torsten Kaiser wrote:
> On 10/12/07, Andrew Morton <akpm@linux-foundation.org> wrote:
>> On Fri, 12 Oct 2007 10:31:42 +0200 "Torsten Kaiser" <just.for.lkml@googlemail.com> wrote:
>>> Oct 12 10:23:03 treogen smartd[6091]: Device: /dev/sdc, not found in
>>> smartd database.
>> hm.
>>
>>> Oct 12 10:23:03 treogen [ 105.990000] WARNING: at
>>> drivers/ata/libata-core.c:5752 ata_qc_issue()
>> Let's cc linux-ide.
>>
>>> Oct 12 10:23:03 treogen [ 105.990000]
>>> Oct 12 10:23:03 treogen [ 105.990000] Call Trace:
>>> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff804442ef>]
>>> ata_qc_issue+0x47f/0x540
>>> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff80432e60>] scsi_done+0x0/0x20
>>> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff80449c80>]
>>> ata_scsi_flush_xlat+0x0/0x30
> Oct 13 07:46:48 treogen [ 99.850000]
> Oct 13 07:46:48 treogen [ 99.850000] ata3: EH in SWNCQ
> mode,QC:qc_active 0x3 sactive 0x1
> Oct 13 07:46:48 treogen [ 99.850000] ata3: SWNCQ:qc_active 0x1
> defer_bits 0x0 last_issue_tag 0x0
The WARNING indicates that there is a SWNCQ bug in sata_nv. Given that
the problem appears when SYNCHRONIZE CACHE is being issued, I would
guess that sata_nv is not properly handling non-queued commands.
NVIDIA CC'd.
This is a patch from libata-dev.git#nv-swncq (via #ALL).
Jeff
* Re: 2.6.23-mm1
2007-10-13 10:55 ` 2.6.23-mm1 Jeff Garzik
@ 2007-10-13 12:03 ` Torsten Kaiser
2007-10-13 12:19 ` 2.6.23-mm1 Jeff Garzik
0 siblings, 1 reply; 13+ messages in thread
From: Torsten Kaiser @ 2007-10-13 12:03 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Andrew Morton, linux-kernel, linux-ide, Kuan Luo, Peer Chen
On 10/13/07, Jeff Garzik <jeff@garzik.org> wrote:
> Torsten Kaiser wrote:
> > On 10/12/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> >> On Fri, 12 Oct 2007 10:31:42 +0200 "Torsten Kaiser" <just.for.lkml@googlemail.com> wrote:
> >>> Oct 12 10:23:03 treogen smartd[6091]: Device: /dev/sdc, not found in
> >>> smartd database.
> >> hm.
> >>
> >>> Oct 12 10:23:03 treogen [ 105.990000] WARNING: at
> >>> drivers/ata/libata-core.c:5752 ata_qc_issue()
> >> Let's cc linux-ide.
> >>
> >>> Oct 12 10:23:03 treogen [ 105.990000]
> >>> Oct 12 10:23:03 treogen [ 105.990000] Call Trace:
> >>> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff804442ef>]
> >>> ata_qc_issue+0x47f/0x540
> >>> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff80432e60>] scsi_done+0x0/0x20
> >>> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff80449c80>]
> >>> ata_scsi_flush_xlat+0x0/0x30
>
> > Oct 13 07:46:48 treogen [ 99.850000]
> > Oct 13 07:46:48 treogen [ 99.850000] ata3: EH in SWNCQ
> > mode,QC:qc_active 0x3 sactive 0x1
> > Oct 13 07:46:48 treogen [ 99.850000] ata3: SWNCQ:qc_active 0x1
> > defer_bits 0x0 last_issue_tag 0x0
>
> The WARNING indicates that there is a SWNCQ bug in sata_nv. Given that
> the problem appears when SYNCHRONIZE CACHE is being issued, I would
I can't follow you on SYNCHRONIZE CACHE.
The only commands written to the syslog in the errors were
0x60==ATA_CMD_FPDMA_READ and 0xB0 (which is not in
include/linux/ata.h, but ATA-6 says that it is SMART-related. That
makes sense, as smartd is failing).
> guess that sata_nv is not properly handling non-queued commands.
But that still seems correct, as I would not expect SMART
commands to get queued. (That's just a guess, as I did not try to find the
code that makes this distinction.)
> This is a patch from libata-dev.git#nv-swncq (via #ALL).
Comparing sata_nv.c from 2.6.23-rc8-mm1 and 2.6.23-mm1, I see two
changes that look suspicious:
http://git.kernel.org/?p=linux/kernel/git/jgarzik/libata-dev.git;a=commitdiff;h=31cc23b34913bc173680bdc87af79e551bf8cc0d
The comment says: "ahci and sata_sil24 are converted to use ata_std_qc_defer()."
But the patch also adds ".qc_defer = ata_std_qc_defer," to sata_nv.c
The second change is the removal of the 'lock' spinlock from sata_nv.c
that was used in nv_swncq_qc_issue and nv_swncq_host_interrupt.
Should I try to revert one or both of these changes?
Torsten
* Re: 2.6.23-mm1
2007-10-13 12:03 ` 2.6.23-mm1 Torsten Kaiser
@ 2007-10-13 12:19 ` Jeff Garzik
2007-10-13 14:32 ` 2.6.23-mm1 Torsten Kaiser
0 siblings, 1 reply; 13+ messages in thread
From: Jeff Garzik @ 2007-10-13 12:19 UTC (permalink / raw)
To: Torsten Kaiser
Cc: Andrew Morton, linux-kernel, linux-ide, Kuan Luo, Peer Chen
Torsten Kaiser wrote:
> On 10/13/07, Jeff Garzik <jeff@garzik.org> wrote:
>> Torsten Kaiser wrote:
>>> On 10/12/07, Andrew Morton <akpm@linux-foundation.org> wrote:
>>>> On Fri, 12 Oct 2007 10:31:42 +0200 "Torsten Kaiser" <just.for.lkml@googlemail.com> wrote:
>>>>> Oct 12 10:23:03 treogen smartd[6091]: Device: /dev/sdc, not found in
>>>>> smartd database.
>>>> hm.
>>>>
>>>>> Oct 12 10:23:03 treogen [ 105.990000] WARNING: at
>>>>> drivers/ata/libata-core.c:5752 ata_qc_issue()
>>>> Let's cc linux-ide.
>>>>
>>>>> Oct 12 10:23:03 treogen [ 105.990000]
>>>>> Oct 12 10:23:03 treogen [ 105.990000] Call Trace:
>>>>> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff804442ef>]
>>>>> ata_qc_issue+0x47f/0x540
>>>>> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff80432e60>] scsi_done+0x0/0x20
>>>>> Oct 12 10:23:03 treogen [ 105.990000] [<ffffffff80449c80>]
>>>>> ata_scsi_flush_xlat+0x0/0x30
>>> Oct 13 07:46:48 treogen [ 99.850000]
>>> Oct 13 07:46:48 treogen [ 99.850000] ata3: EH in SWNCQ
>>> mode,QC:qc_active 0x3 sactive 0x1
>>> Oct 13 07:46:48 treogen [ 99.850000] ata3: SWNCQ:qc_active 0x1
>>> defer_bits 0x0 last_issue_tag 0x0
>> The WARNING indicates that there is a SWNCQ bug in sata_nv. Given that
>> the problem appears when SYNCHRONIZE CACHE is being issued, I would
>
> I can't follow you on SYNCHRONIZE CACHE.
> The only commands written to the syslog in the errors were
> 0x60==ATA_CMD_FPDMA_READ and 0xB0 (which is not in
> include/linux/ata.h, but ATA-6 says that it is SMART-related. That
> makes sense, as smartd is failing).
In the traceback you have "ata_scsi_flush_xlat", which is the function
that translates a SCSI sync-cache command into an ATA flush-cache command.
The "WARNING: at drivers/ata/libata-core.c:5752 ata_qc_issue()" also
guides us to the code comment
/* Make sure only one non-NCQ command is outstanding. The
* check is skipped for old EH because it reuses active qc to
* request ATAPI sense.
*/
which is a check related to NCQ->off and off->NCQ edge cases.
So those are the two bits of information I found interesting.
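To make the invariant behind that comment concrete, here is a minimal user-space model. It is a simplified sketch, not the libata code: struct link_model, model_qc_issue(), model_qc_complete() and the warning counter are hypothetical stand-ins for struct ata_link, ata_qc_issue() and its WARN_ON() checks, and it assumes new-style EH (so the ATAPI-sense exception in the comment does not apply). The scenario it encodes mirrors the quoted log: NCQ tag 0 still outstanding (sactive 0x1) when a non-NCQ command is issued as tag 1 (qc_active 0x3).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of one link's command state
 * (a stand-in for struct ata_link, not the real structure). */
#define TAG_NONE (-1)
#define MAX_TAGS 32

struct link_model {
	int      active_tag; /* the single outstanding non-NCQ tag, or TAG_NONE */
	uint32_t sactive;    /* bitmap of outstanding NCQ tags */
	int      warnings;   /* places where the kernel would WARN_ON() */
};

void link_model_init(struct link_model *l)
{
	l->active_tag = TAG_NONE;
	l->sactive = 0;
	l->warnings = 0;
}

/* Issue one command and count invariant violations instead of warning. */
void model_qc_issue(struct link_model *l, int tag, bool ncq)
{
	uint32_t bit;

	assert(tag >= 0 && tag < MAX_TAGS);
	bit = UINT32_C(1) << tag;

	/* Only one non-NCQ command may be outstanding at any time. */
	if (l->active_tag != TAG_NONE)
		l->warnings++;

	if (ncq) {
		if (l->sactive & bit)   /* tag already in flight */
			l->warnings++;
		l->sactive |= bit;
	} else {
		if (l->sactive)         /* non-NCQ while NCQ tags are active */
			l->warnings++;
		l->active_tag = tag;
	}
}

/* Complete a command, clearing whichever state it occupied. */
void model_qc_complete(struct link_model *l, int tag, bool ncq)
{
	assert(tag >= 0 && tag < MAX_TAGS);
	if (ncq)
		l->sactive &= ~(UINT32_C(1) << tag);
	else if (l->active_tag == tag)
		l->active_tag = TAG_NONE;
}
```

In this model, completing tag 0 before issuing the non-NCQ command leaves the counter untouched; that ordering is what a defer hook is meant to enforce.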
>> guess that sata_nv is not properly handling non-queued commands.
>
> But that still seems correct, as I would not expect SMART
> commands to get queued. (That's just a guess, as I did not try to find the
> code that makes this distinction.)
>
>> This is a patch from libata-dev.git#nv-swncq (via #ALL).
>
> Comparing sata_nv.c from 2.6.23-rc8-mm1 and 2.6.23-mm1, I see two
> changes that look suspicious:
>
> http://git.kernel.org/?p=linux/kernel/git/jgarzik/libata-dev.git;a=commitdiff;h=31cc23b34913bc173680bdc87af79e551bf8cc0d
>
> The comment says: "ahci and sata_sil24 are converted to use ata_std_qc_defer()."
> But the patch also adds ".qc_defer = ata_std_qc_defer," to sata_nv.c
>
> The second change is the removal of the 'lock' spinlock from sata_nv.c
> that was used in nv_swncq_qc_issue and nv_swncq_host_interrupt.
>
> Should I try to revert one or both of these changes?
If you are git-capable, IMO the next steps in problem elimination should be
* download latest linux-2.6.git (currently
752097cec53eea111d087c545179b421e2bde98a)
* build and test linux-2.6.git, to establish a new baseline
* download latest libata-dev.git#nv-swncq (currently
3cb664c2d319a4fde5028c3c5dab6221fe70bd2d)
* build and test, with sata_nv module option swncq=0
* build and test, with sata_nv module option swncq=1
That will get -mm out of the picture, use the same baseline kernel for
all three tests (nv-swncq is based off of
752097cec53eea111d087c545179b421e2bde98a) and narrow things down to the
precise changes that went upstream (or are on the 'nv-swncq' branch,
waiting to go upstream).
My gut feeling is that there is a lingering bug in sata_nv SWNCQ somewhere.
Jeff
* Re: 2.6.23-mm1
2007-10-13 12:19 ` 2.6.23-mm1 Jeff Garzik
@ 2007-10-13 14:32 ` Torsten Kaiser
2007-10-13 14:40 ` 2.6.23-mm1 Torsten Kaiser
0 siblings, 1 reply; 13+ messages in thread
From: Torsten Kaiser @ 2007-10-13 14:32 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Andrew Morton, linux-kernel, linux-ide, Kuan Luo, Peer Chen
On 10/13/07, Jeff Garzik <jeff@garzik.org> wrote:
> Torsten Kaiser wrote:
> > On 10/13/07, Jeff Garzik <jeff@garzik.org> wrote:
> >> Torsten Kaiser wrote:
> > I can't follow you on SYNCHRONIZE CACHE.
> > The only commands written to the syslog in the errors were
> > 0x60==ATA_CMD_FPDMA_READ and 0xB0 (which is not in
> > include/linux/ata.h, but ATA-6 says that it is SMART-related. That
> > makes sense, as smartd is failing).
>
> In the traceback you have "ata_scsi_flush_xlat", which is the function
> that translates a SCSI sync-cache command into an ATA flush-cache command.
Aha. That makes sense.
But on the second error, where the drive was kicked out completely,
none of the three traces contained ata_scsi_flush_xlat.
First WARNING:
Oct 13 07:46:48 treogen [ 99.850000] Call Trace:
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8044431a>] ata_qc_issue+0x4aa/0x540
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff80432e60>] scsi_done+0x0/0x20
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8044ce30>] ata_scsi_pass_thru+0x0/0x2c0
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff8044a6ea>] ata_scsi_translate+0xfa/0x180
Oct 13 07:46:48 treogen [ 99.850000] [<ffffffff80432e60>] scsi_done+0x0/0x20
...
Second+Third:
Oct 13 07:46:49 treogen [ 100.510000] [<ffffffff804442ef>] ata_qc_issue+0x47f/0x540
Oct 13 07:46:49 treogen [ 100.510000] [<ffffffff80432e60>] scsi_done+0x0/0x20
Oct 13 07:46:49 treogen [ 100.510000] [<ffffffff80432e60>] scsi_done+0x0/0x20
Oct 13 07:46:49 treogen [ 100.510000] [<ffffffff8044a440>] ata_scsi_rw_xlat+0x0/0x1b0
Oct 13 07:46:49 treogen [ 100.510000] [<ffffffff8044a6ea>] ata_scsi_translate+0xfa/0x180
Oct 13 07:46:49 treogen [ 100.510000] [<ffffffff80432e60>] scsi_done+0x0/0x20
...
So the commands that trigger the WARNINGs seem to be only later collateral damage.
> The "WARNING: at drivers/ata/libata-core.c:5752 ata_qc_issue()" also
> guides us to the code comment
>
> /* Make sure only one non-NCQ command is outstanding. The
> * check is skipped for old EH because it reuses active qc to
> * request ATAPI sense.
> */
>
> which is a check related to NCQ->off and off->NCQ edge cases.
>
> So those are the two bits of information I found interesting.
I very much agree with this. But rather than the 'normal' edges around
the cache flushes, I would blame the SMART commands from smartd
that trigger the switch.
Both errors happened during the startup of smartd.
> >> guess that sata_nv is not properly handling non-queued commands.
> >
> > But that still seems correct, as I would not expect SMART
> > commands to get queued. (That's just a guess, as I did not try to find the
> > code that makes this distinction.)
> >
> >> This is a patch from libata-dev.git#nv-swncq (via #ALL).
> >
> > Comparing sata_nv.c from 2.6.23-rc8-mm1 and 2.6.23-mm1, I see two
> > changes that look suspicious:
> >
> > http://git.kernel.org/?p=linux/kernel/git/jgarzik/libata-dev.git;a=commitdiff;h=31cc23b34913bc173680bdc87af79e551bf8cc0d
> >
> > The comment says: "ahci and sata_sil24 are converted to use ata_std_qc_defer()."
> > But the patch also adds ".qc_defer = ata_std_qc_defer," to sata_nv.c
Looking more at this patch, I think the code change is correct and
only the comment is missing sata_nv. (Only ahci, sil24 and nv seem to
use NCQ and so need the logic from qc_defer.)
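The deferral rule such a hook has to implement can be sketched in plain C. This is a hypothetical model, not the libata source: struct link_state, model_std_qc_defer() and the enum values stand in for struct ata_link, ata_std_qc_defer() and its return codes (understood to be 0 or ATA_DEFER_LINK). The rule encoded here is an assumption about the helper's intent: NCQ commands may stack with other NCQ commands, while a non-NCQ command needs an idle link.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_NONE (-1)

/* Hypothetical return codes standing in for 0 / ATA_DEFER_LINK. */
enum defer_result { ISSUE_NOW = 0, DEFER_LINK = 1 };

/* Simplified stand-in for the per-link state libata tracks. */
struct link_state {
	int      active_tag; /* outstanding non-NCQ command, or TAG_NONE */
	uint32_t sactive;    /* bitmap of outstanding NCQ tags */
};

/* NCQ commands may stack with other NCQ commands but must wait for a
 * non-NCQ command; a non-NCQ command needs a completely idle link. */
enum defer_result model_std_qc_defer(const struct link_state *l, bool ncq)
{
	bool non_ncq_busy;

	assert(l != NULL);
	non_ncq_busy = (l->active_tag != TAG_NONE);

	if (ncq)
		return non_ncq_busy ? DEFER_LINK : ISSUE_NOW;
	return (non_ncq_busy || l->sactive != 0) ? DEFER_LINK : ISSUE_NOW;
}
```

With one NCQ read in flight (sactive 0x1), another NCQ read is issued immediately, but a SMART command is told to wait.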
> > The second change is the removal of the 'lock' spinlock from sata_nv.c
> > that was used in nv_swncq_qc_issue and nv_swncq_host_interrupt.
> >
> > Should I try to revert one or both of these changes?
>
> If you are git-capable, IMO the next steps in problem elimination should be
... I should really take the time to install this, but I don't think git
will help in this particular case, because:
> * download latest linux-2.6.git (currently
> 752097cec53eea111d087c545179b421e2bde98a)
> * build and test linux-2.6.git, to establish a new baseline
2.6.23-rc8-mm1 worked.
> * download latest libata-dev.git#nv-swncq (currently
> 3cb664c2d319a4fde5028c3c5dab6221fe70bd2d)
That commit (3cb664c2d319a4fde5028c3c5dab6221fe70bd2d) seems to be the
only commit relevant to swncq, as it adds the feature in one go, without
any intermediate steps that could be bisected.
> * build and test, with sata_nv module option swncq=0
> * build and test, with sata_nv module option swncq=1
I will try this. I currently have sata_nv.swncq=1 on my kernel
command line, so it's trivial to change.
But as only 2 out of 3 boots failed, I think I have hit another heisenbug.
> My gut feeling is that there is a lingering bug in sata_nv SWNCQ somewhere.
Older versions of SWNCQ already worked for me, so I don't think it's a
general problem.
And as the symptoms would fit nicely with a race condition when
manipulating the NCQ state, the removal of the lock protecting the
private sata_nv defer_queue between 2.6.23-rc8-mm1 and 2.6.23-mm1
looks like the prime suspect.
So I am now booting with and without swncq, and if swncq=0 works, I will try
to add the lock back...
Torsten
* Re: 2.6.23-mm1
2007-10-13 14:32 ` 2.6.23-mm1 Torsten Kaiser
@ 2007-10-13 14:40 ` Torsten Kaiser
2007-10-13 15:13 ` 2.6.23-mm1 Torsten Kaiser
0 siblings, 1 reply; 13+ messages in thread
From: Torsten Kaiser @ 2007-10-13 14:40 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Andrew Morton, linux-kernel, linux-ide, Kuan Luo, Peer Chen
On 10/13/07, Torsten Kaiser <just.for.lkml@googlemail.com> wrote:
> On 10/13/07, Jeff Garzik <jeff@garzik.org> wrote:
> > Torsten Kaiser wrote:
> > > Comparing sata_nv.c from 2.6.23-rc8-mm1 and 2.6.23-mm1, I see two
> > > changes that look suspicious:
> > >
> > > http://git.kernel.org/?p=linux/kernel/git/jgarzik/libata-dev.git;a=commitdiff;h=31cc23b34913bc173680bdc87af79e551bf8cc0d
> > >
> > > The comment says: "ahci and sata_sil24 are converted to use ata_std_qc_defer()."
> > > But the patch also adds ".qc_defer = ata_std_qc_defer," to sata_nv.c
>
> Looking more at this patch, I think the code change is correct and
> only the comment is missing sata_nv. (Only ahci, sil24 and nv seem to
> use NCQ and so need the logic from qc_defer.)
Wait!
I think I found the bug: it's an evil interaction between the above
patch and the swncq patch that is applied later.
The qc_defer patch removes the old ata_scmd_need_defer, which was always
called for all drivers, replaces it with ata_std_qc_defer, and
adds that as ->qc_defer to all drivers that support NCQ *at that
point*.
Then the swncq patch adds a new NCQ-capable driver, but nobody
added the qc_defer op to the ops structure it introduces. So swncq
will never defer any commands, and the first command that would need to
be deferred (the SMART commands) blows up if another
command is still in flight.
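The failure mode boils down to an optional function pointer. In this hypothetical sketch (port_ops, std_defer() and dispatch() are illustrative names, not libata's), the issue path consults the defer hook only when the driver sets it, so an ops table without the hook never defers and a non-NCQ command goes straight out while NCQ tags are still active. It models the behaviour described above under that assumption; it does not reproduce struct ata_port_operations or the real SCSI translation path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_NONE (-1)

struct link_state {
	int      active_tag; /* outstanding non-NCQ command, or TAG_NONE */
	uint32_t sactive;    /* bitmap of outstanding NCQ tags */
};

/* Hypothetical ops table: only the optional defer hook is modelled. */
struct port_ops {
	/* Returns nonzero if the command must wait; NULL means "never defer". */
	int (*qc_defer)(const struct link_state *l, bool ncq);
};

/* Standard rule: a non-NCQ command needs an idle link. */
int std_defer(const struct link_state *l, bool ncq)
{
	bool busy = (l->active_tag != TAG_NONE);

	return ncq ? busy : (busy || l->sactive != 0);
}

/* Returns true if the command was issued, false if it was deferred.
 * The hook is consulted only if the driver provides one. */
bool dispatch(const struct port_ops *ops, const struct link_state *l, bool ncq)
{
	assert(ops != NULL && l != NULL);
	if (ops->qc_defer && ops->qc_defer(l, ncq))
		return false;   /* requeue and retry later */
	return true;        /* sent straight to the hardware */
}
```

With an NCQ tag in flight, dispatch() defers a non-NCQ command only when .qc_defer is set; leaving the member NULL issues it immediately, which in the model is exactly the overlap the WARN_ON in ata_qc_issue() catches.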
I will add only the qc_defer and try this...
Torsten
* Re: 2.6.23-mm1
2007-10-13 14:40 ` 2.6.23-mm1 Torsten Kaiser
@ 2007-10-13 15:13 ` Torsten Kaiser
2007-10-13 17:48 ` 2.6.23-mm1 Jeff Garzik
0 siblings, 1 reply; 13+ messages in thread
From: Torsten Kaiser @ 2007-10-13 15:13 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Andrew Morton, linux-kernel, linux-ide, Kuan Luo, Peer Chen
On 10/13/07, Torsten Kaiser <just.for.lkml@googlemail.com> wrote:
> Wait!
>
> I think I found the bug: it's an evil interaction between the above
> patch and the swncq patch that is applied later.
> The qc_defer patch removes the old ata_scmd_need_defer, which was always
> called for all drivers, replaces it with ata_std_qc_defer, and
> adds that as ->qc_defer to all drivers that support NCQ *at that
> point*.
> Then the swncq patch adds a new NCQ-capable driver, but nobody
> added the qc_defer op to the ops structure it introduces. So swncq
> will never defer any commands, and the first command that would need to
> be deferred (the SMART commands) blows up if another
> command is still in flight.
>
> I will add only the qc_defer and try this...
3 boots, all worked. So I'm very sure that was the bug, but I will now
do a little load testing...
The only strange thing about 2.6.23-mm1 is that it takes ~4 seconds
longer to boot.
2.6.23-rc8-mm1:
[ 3.720000] scsi0 : sata_sil24
[ 3.730000] scsi1 : sata_sil24
[ 3.740000] ata1: SATA max UDMA/100 irq 17
[ 3.750000] ata2: SATA max UDMA/100 irq 17
[ 4.110000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 4.160000] ata1.00: ATA-7: MAXTOR STM3320820AS, 3.AAE, max UDMA/133
[ 4.180000] ata1.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 4.240000] ata1.00: configured for UDMA/100
[ 4.600000] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 4.660000] ata2.00: ATA-7: MAXTOR STM3320820AS, 3.AAE, max UDMA/133
[ 4.680000] ata2.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 4.730000] ata2.00: configured for UDMA/100
2.6.23-mm1:
[ 3.650000] scsi0 : sata_sil24
[ 3.660000] scsi1 : sata_sil24
[ 3.660000] ata1: SATA max UDMA/100 host m128@0xefeffc00 port 0xefef8000 irq 17
[ 3.690000] ata2: SATA max UDMA/100 host m128@0xefeffc00 port 0xefefa000 irq 17
[ 5.930000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[ 5.980000] ata1.00: ATA-7: MAXTOR STM3320820AS, 3.AAE, max UDMA/133
[ 6.000000] ata1.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 6.060000] ata1.00: configured for UDMA/100
[ 8.290000] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[ 8.340000] ata2.00: ATA-7: MAXTOR STM3320820AS, 3.AAE, max UDMA/133
[ 8.360000] ata2.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 8.420000] ata2.00: configured for UDMA/100
Torsten
* Re: 2.6.23-mm1
2007-10-13 15:13 ` 2.6.23-mm1 Torsten Kaiser
@ 2007-10-13 17:48 ` Jeff Garzik
2007-10-13 18:05 ` 2.6.23-mm1 Torsten Kaiser
0 siblings, 1 reply; 13+ messages in thread
From: Jeff Garzik @ 2007-10-13 17:48 UTC (permalink / raw)
To: Torsten Kaiser
Cc: Andrew Morton, linux-kernel, linux-ide, Kuan Luo, Peer Chen
[-- Attachment #1: Type: text/plain, Size: 1122 bytes --]
Torsten Kaiser wrote:
> On 10/13/07, Torsten Kaiser <just.for.lkml@googlemail.com> wrote:
>> Wait!
>>
>> I think I found the bug: it's an evil interaction between the above
>> patch and the swncq patch that is applied later.
>> The qc_defer patch removes the old ata_scmd_need_defer, which was always
>> called for all drivers, replaces it with ata_std_qc_defer, and
>> adds that as ->qc_defer to all drivers that support NCQ *at that
>> point*.
>> Then the swncq patch adds a new NCQ-capable driver, but nobody
>> added the qc_defer op to the ops structure it introduces. So swncq
>> will never defer any commands, and the first command that would need to
>> be deferred (the SMART commands) blows up if another
>> command is still in flight.
>>
>> I will add only the qc_defer and try this...
>
> 3 boots, all worked. So I'm very sure that was the bug, but I will now
> do a little load testing...
>
> The only strange thing about 2.6.23-mm1 is that it takes ~4 seconds
> longer to boot.
So, you basically applied the attached patch?
Yeah, absence of qc_defer for an NCQ-capable chip would do it.
Jeff
[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 457 bytes --]
diff --git a/drivers/ata/sata_nv.c b/drivers/ata/sata_nv.c
index cf5c85e..240a892 100644
--- a/drivers/ata/sata_nv.c
+++ b/drivers/ata/sata_nv.c
@@ -554,6 +554,7 @@ static const struct ata_port_operations nv_swncq_ops = {
.bmdma_start = ata_bmdma_start,
.bmdma_stop = ata_bmdma_stop,
.bmdma_status = ata_bmdma_status,
+ .qc_defer = ata_std_qc_defer,
.qc_prep = nv_swncq_qc_prep,
.qc_issue = nv_swncq_qc_issue,
.freeze = nv_mcp55_freeze,
* Re: 2.6.23-mm1
2007-10-13 17:48 ` 2.6.23-mm1 Jeff Garzik
@ 2007-10-13 18:05 ` Torsten Kaiser
2007-10-13 18:18 ` 2.6.23-mm1 Andrew Morton
2007-10-13 18:41 ` 2.6.23-mm1 Jeff Garzik
0 siblings, 2 replies; 13+ messages in thread
From: Torsten Kaiser @ 2007-10-13 18:05 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Andrew Morton, linux-kernel, linux-ide, Kuan Luo, Peer Chen
On 10/13/07, Jeff Garzik <jeff@garzik.org> wrote:
> Torsten Kaiser wrote:
> > 3 boots, all worked. So I'm very sure that was the bug, but I will now
> > do a little load testing...
> >
> > The only strange thing about 2.6.23-mm1 is that it takes ~4 seconds
> > longer to boot.
>
> So, you basically applied the attached patch?
>
> Yeah, absence of qc_defer for an NCQ-capable chip would do it.
Yes. The system seems to work correctly now.
The only thing I noticed during load testing (updating Gentoo ==
compiling and installing) was that there seems to be a memory leak.
After ~2h, 2.5 GB of my 4 GB were gone. But there were too many things
going on to pinpoint it... (NFSv4 over eth1394?)
> diff --git a/drivers/ata/sata_nv.c b/drivers/ata/sata_nv.c
> index cf5c85e..240a892 100644
> --- a/drivers/ata/sata_nv.c
> +++ b/drivers/ata/sata_nv.c
> @@ -554,6 +554,7 @@ static const struct ata_port_operations nv_swncq_ops = {
> .bmdma_start = ata_bmdma_start,
> .bmdma_stop = ata_bmdma_stop,
> .bmdma_status = ata_bmdma_status,
> + .qc_defer = ata_std_qc_defer,
> .qc_prep = nv_swncq_qc_prep,
> .qc_issue = nv_swncq_qc_issue,
> .freeze = nv_mcp55_freeze,
>
>
* Re: 2.6.23-mm1
2007-10-13 18:05 ` 2.6.23-mm1 Torsten Kaiser
@ 2007-10-13 18:18 ` Andrew Morton
2007-10-13 18:41 ` 2.6.23-mm1 Jeff Garzik
1 sibling, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2007-10-13 18:18 UTC (permalink / raw)
To: Torsten Kaiser; +Cc: Jeff Garzik, linux-kernel, linux-ide, Kuan Luo, Peer Chen
On Sat, 13 Oct 2007 20:05:19 +0200 "Torsten Kaiser" <just.for.lkml@googlemail.com> wrote:
> The only thing I noticed during load testing (updating Gentoo ==
> compiling and installing) was that there seems to be a memory leak.
> After ~2h, 2.5 GB of my 4 GB were gone. But there were too many things
> going on to pinpoint it... (NFSv4 over eth1394?)
Please send /proc/meminfo and /proc/slabinfo after the leak has been
happening for a while.
Sometimes `echo m > /proc/sysrq-trigger ; dmesg -s 1000000' will
provide useful info.
The page-owner code can pinpoint a leak source. See
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23/2.6.23-mm1/broken-out/page-owner-tracking-leak-detector.patch
Alternatively, enable CONFIG_DEBUG_SLAB_LEAK and check /proc/slab_allocators.
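To watch the leak develop between snapshots, a small helper that pulls one field out of the text of /proc/meminfo may be handy. This is an illustrative sketch: meminfo_field_kb() is a hypothetical name, and it assumes the usual "Name:   value kB" line layout of that file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the value (in kB) of a /proc/meminfo field such as "MemFree",
 * given the file's text, or -1 if the field is not present. */
long meminfo_field_kb(const char *text, const char *field)
{
	const char *line = text;
	size_t len;

	assert(text != NULL && field != NULL);
	len = strlen(field);

	while (*line != '\0') {
		long kb;

		/* Match "Field:" exactly, so "Mem" does not match "MemFree:". */
		if (strncmp(line, field, len) == 0 && line[len] == ':' &&
		    sscanf(line + len + 1, "%ld", &kb) == 1)
			return kb;

		line = strchr(line, '\n');
		if (line == NULL)
			break;
		line++; /* step past the newline to the next line */
	}
	return -1;
}
```

One way to use it: read /proc/meminfo into a buffer (for example with fopen()/fread()) every few minutes during the load test and log meminfo_field_kb(buf, "MemFree"); a steady decline that is not matched by growth in Cached or Buffers may be worth following up in /proc/slabinfo.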
* Re: 2.6.23-mm1
2007-10-13 18:05 ` 2.6.23-mm1 Torsten Kaiser
2007-10-13 18:18 ` 2.6.23-mm1 Andrew Morton
@ 2007-10-13 18:41 ` Jeff Garzik
1 sibling, 0 replies; 13+ messages in thread
From: Jeff Garzik @ 2007-10-13 18:41 UTC (permalink / raw)
To: Torsten Kaiser
Cc: Andrew Morton, linux-kernel, linux-ide, Kuan Luo, Peer Chen
Torsten Kaiser wrote:
> On 10/13/07, Jeff Garzik <jeff@garzik.org> wrote:
>> Torsten Kaiser wrote:
>>> 3 boots, all worked. So I'm very sure that was the bug, but I will now
>>> do a little load testing...
>>>
>>> The only strange thing about 2.6.23-mm1 is that it takes ~4 seconds
>>> longer to boot.
>> So, you basically applied the attached patch?
>>
>> Yeah, absence of qc_defer for an NCQ-capable chip would do it.
>
> Yes. The system seems to work correctly now.
Thanks for helping track this down. Fix pushed out to libata-dev.git.
Jeff