JMicron - hard resetting link

linux-ide.vger.kernel.org archive mirror

* JMicron - hard resetting link
@ 2008-02-12  9:48 Gabor FUNK
  2008-02-12 13:05 ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Gabor FUNK @ 2008-02-12  9:48 UTC
  To: IDE/ATA development list

Hi list,

I seem to have hit a bug with the JMicron controller on a Gigabyte
GA-N680SLI-DQ6 motherboard.
http://www.gigabyte.com.tw/Support/Motherboard/BIOS_Model.aspx?ProductID=2460
Kernel is 2.6.24.
10 on-board SATA connectors, 2+4*JMicron 20360/20363 + 4*nVidia MCP55
2*200GB disks (System - SW RAID1) on the JMicron controller and
8*500GB disks (Data - SW RAID6) - 4 on the JMicron, 4 on the nVidia controller.

Under heavy load the JMicron controller gets exceptions, then eventually
"hard resetting link", on all 4 disks/connectors, one after another.
This of course "kills" the RAID.

Excerpt from syslog:
Feb  9 16:16:32 storage1 kernel: ata2.00: exception Emask 0x0 SAct 0x3ffff SErr 0x0 action 0x2 frozen
Feb  9 16:16:32 storage1 kernel: ata1.00: exception Emask 0x0 SAct 0x1fffff SErr 0x0 action 0x2 frozen
Feb  9 16:16:32 storage1 kernel: ata1.00: cmd 61/08:00:73:12:d9/00:00:23:00:00/40 tag 0 ncq 4096 out
Feb  9 16:16:32 storage1 kernel:          res 40/00:80:c3:7c:d3/00:01:23:00:00/40 Emask 0x4 (timeout)
Feb  9 16:16:32 storage1 kernel: ata1.00: status: { DRDY }
...
Feb  9 16:16:32 storage1 kernel: ata1.00: cmd 61/80:a0:c3:1f:d9/00:00:23:00:00/40 tag 20 ncq 65536 out
Feb  9 16:16:32 storage1 kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb  9 16:16:32 storage1 kernel: ata1.00: status: { DRDY }
Feb  9 16:16:32 storage1 kernel: ata1: hard resetting link
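For reading the `cmd` lines in the excerpt: libata prints the command taskfile as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device. Opcode 0x61 is WRITE FPDMA QUEUED (an NCQ write), whose sector count sits in the feature bytes and whose queue tag sits in bits 7:3 of nsect. The SAct value in the "exception" line is a bitmap with one bit per NCQ tag still in flight. A minimal Python sketch, assuming that field order; the helper names are illustrative, not libata functions:

```python
# Field order libata uses when it prints "cmd xx/xx:...": assumed from the
# kernel's error-handler output format, not stated in this thread.
FIELDS = ("command", "feature", "nsect", "lbal", "lbam", "lbah",
          "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah",
          "device")

# Only the two NCQ (FPDMA) opcodes are named; anything else is shown raw.
NCQ_OPCODES = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}


def parse_taskfile(dump):
    """Split 'cc/ff:nn:ll:mm:hh/ff:nn:ll:mm:hh/dd' into named byte fields."""
    raw = dump.replace("/", ":").split(":")
    if len(raw) != len(FIELDS):
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    return dict(zip(FIELDS, (int(b, 16) for b in raw)))


def decode_fpdma(dump, sector_size=512):
    """Decode an NCQ command: opcode, queue tag, starting LBA and length."""
    tf = parse_taskfile(dump)
    # FPDMA commands carry the sector count in the FEATURE registers
    # (0 encodes 65536) and the NCQ tag in bits 7:3 of SECTOR COUNT.
    sectors = ((tf["hob_feature"] << 8) | tf["feature"]) or 65536
    lba = ((tf["hob_lbah"] << 40) | (tf["hob_lbam"] << 32) | (tf["hob_lbal"] << 24)
           | (tf["lbah"] << 16) | (tf["lbam"] << 8) | tf["lbal"])
    return {
        "op": NCQ_OPCODES.get(tf["command"], "0x%02x" % tf["command"]),
        "tag": tf["nsect"] >> 3,
        "lba": lba,
        "bytes": sectors * sector_size,
    }


def outstanding_tags(sact):
    """Return the NCQ tags whose bit is set in an SAct bitmap."""
    return [tag for tag in range(32) if (sact >> tag) & 1]


if __name__ == "__main__":
    first = decode_fpdma("61/08:00:73:12:d9/00:00:23:00:00/40")
    last = decode_fpdma("61/80:a0:c3:1f:d9/00:00:23:00:00/40")
    print(first["op"], first["tag"], first["bytes"])  # WRITE FPDMA QUEUED 0 4096
    print(last["tag"], last["bytes"])                 # 20 65536
    print(len(outstanding_tags(0x1fffff)))            # 21 (ata1)
```

The decoded tags and lengths agree with the `tag`/`ncq` fields libata printed, and the SAct values mean 21 commands (ata1) and 18 commands (ata2) were outstanding when the ports froze.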

I didn't dare to post all the attachments, so the following can be
downloaded from http://www.huweb.hu/maques/tmp/jmicron:
    full dmesg
    lspci -nn
    syslog - from the error start

I'm lost.
Has anyone seen such a thing? What could it be? Hardware (MB, chipset, BIOS),
kernel (driver) or what?
Any suggestions? A kernel version to try, throw away the hardware, or shoot
myself in the head?

Thanks,
Gabor 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: JMicron - hard resetting link
  2008-02-12  9:48 JMicron - hard resetting link Gabor FUNK
@ 2008-02-12 13:05 ` Tejun Heo
  2008-02-12 14:38   ` Gabor FUNK
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2008-02-12 13:05 UTC
  To: Gabor FUNK; +Cc: IDE/ATA development list

Gabor FUNK wrote:
> Hi list,
> 
> I seem to have hit a bug with the JMicron controller on a Gigabyte
> GA-N680SLI-DQ6 motherboard.
> http://www.gigabyte.com.tw/Support/Motherboard/BIOS_Model.aspx?ProductID=2460
> 
> Kernel is 2.6.24.
> 10 on-board SATA connectors, 2+4*JMicron 20360/20363 + 4*nVidia MCP55
> 2*200GB disks (System - SW RAID1) on the JMicron controller and
> 8*500GB disks (Data - SW RAID6) - 4 on the JMicron, 4 on the nVidia controller.
> 
> Under heavy load the JMicron controller gets exceptions, then eventually
> "hard resetting link", on all 4 disks/connectors, one after another.
> This of course "kills" the RAID.

It shouldn't kill the RAID.  Hmmm... The log is truncated.  Can you
please post the full kernel log spanning from boot to array death?

> I'm lost.
> Has anyone seen such a thing? What could it be? Hardware (MB, chipset, BIOS),
> kernel (driver) or what?
> Any suggestions? A kernel version to try, throw away the hardware, or shoot
> myself in the head?

One of the common causes of this kind of problem is bad power, and PSUs
that are rated for high wattage aren't always good enough.  Prepare
another power supply (a popular cheap $15 one should do) such that it
can be powered up by itself.

  http://modtown.co.uk/mt/article2.php?id=psumod

Move half of the drives to the new PSU and see whether the problem goes
away.

Thanks.

-- 
tejun


* Re: JMicron - hard resetting link
  2008-02-12 13:05 ` Tejun Heo
@ 2008-02-12 14:38   ` Gabor FUNK
  2008-02-12 14:52     ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Gabor FUNK @ 2008-02-12 14:38 UTC
  To: Tejun Heo; +Cc: IDE/ATA development list

>> I seem to have hit a bug with the JMicron controller on a Gigabyte
>> GA-N680SLI-DQ6 motherboard.
>> http://www.gigabyte.com.tw/Support/Motherboard/BIOS_Model.aspx?ProductID=2460
>>
>> Kernel is 2.6.24.
>> 10 on-board SATA connectors, 2+4*JMicron 20360/20363 + 4*nVidia MCP55
>> 2*200GB disks (System - SW RAID1) on the JMicron controller and
>> 8*500GB disks (Data - SW RAID6) - 4 on the JMicron, 4 on the nVidia controller.
>>
>> Under heavy load the JMicron controller gets exceptions, then eventually
>> "hard resetting link", on all 4 disks/connectors, one after another.
>> This of course "kills" the RAID.
>
> It shouldn't kill the RAID.  Hmmm... The log is truncated.  Can you
> please post the full kernel log spanning from boot to array death?

The RAID "dies" because the controller dies, and then it loses 4 disks out of 8...
Actually, the last time, the server had been up and running for 2 months.
Then, when it failed the 1st time, I did some tests and it ran for
3 days, including building the RAID and heavy test file copies.
The full log, from the 1st relevant error message until the death of
the array, is here:
http://www.huweb.hu/maques/tmp/jmicron/syslog
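The arithmetic behind "it loses 4 disks out of 8": an md RAID6 array keeps two parity blocks per stripe, so it survives the loss of any 2 members, while RAID1 survives as long as one mirror remains. A minimal Python sketch, assuming a dead controller drops all of its disks at once; the member-per-controller counts come from this thread, and the 4-controller split is purely hypothetical:

```python
def max_failures(level, n_disks):
    """Members an md array can lose (no hot spares) and keep running."""
    if level == "raid1":
        return n_disks - 1      # each mirror holds a full copy
    if level == "raid5":
        return 1                # one parity block per stripe
    if level == "raid6":
        return 2                # two independent parity blocks per stripe
    raise ValueError("unknown level: %s" % level)


def survives_controller_loss(level, per_controller):
    """per_controller maps controller name -> member disks attached to it.

    Assumes a dead controller takes all of its disks with it at once.
    """
    n_disks = sum(per_controller.values())
    worst_case = max(per_controller.values())
    return worst_case <= max_failures(level, n_disks)


if __name__ == "__main__":
    # Layouts described in the thread.
    print(survives_controller_loss("raid6", {"JMicron": 4, "MCP55": 4}))  # False
    print(survives_controller_loss("raid1", {"JMicron": 2}))              # False
    # Hypothetical: 8 members spread 2 per controller over 4 controllers.
    print(survives_controller_loss("raid6", {"JMicron": 2, "MCP55": 2,
                                             "card A": 2, "card B": 2}))  # True
```

With 4 of the 8 RAID6 members behind one controller, a single controller failure exceeds the 2-disk margin; spreading members so that no controller holds more than 2 would keep the array degraded but running.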

> Move half of the drives to the new PSU and see whether the problem goes
> away.

This is a new server, with a 650W Chieftec GPS650AB PSU in it.
AFAIK a hard disk consumes only around 10W, but I will still try using
more than one PSU.
The main problem is that I can't immediately see whether it helps or not.
Even if it works without this problem for a week, I can't be sure it
still will in 2 months...
Because of this, and because I believe that this problem is related to the HW
(motherboard, chipset), I'd rather just throw away the MB and use
another one with two extra 4-port SATA cards.

Thanks,
Gabor 


* Re: JMicron - hard resetting link
  2008-02-12 14:38   ` Gabor FUNK
@ 2008-02-12 14:52     ` Tejun Heo
  2008-02-12 17:27       ` Gabor FUNK
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2008-02-12 14:52 UTC
  To: Gabor FUNK; +Cc: IDE/ATA development list

Gabor FUNK wrote:
>> It shouldn't kill the RAID.  Hmmm... The log is truncated.  Can you
>> please post the full kernel log spanning from boot to array death?
> 
> The RAID "dies" because the controller dies, and then it loses 4 disks out of 8...
> Actually, the last time, the server had been up and running for 2 months.
> Then, when it failed the 1st time, I did some tests and it ran for
> 3 days, including building the RAID and heavy test file copies.
> The full log, from the 1st relevant error message until the death of
> the array, is here:
> http://www.huweb.hu/maques/tmp/jmicron/syslog

What I said was that timeouts occurring due to transmission errors
should be recoverable.  It seems like IRQ delivery didn't work, probably
due to a screaming IRQ.  I need to see the messages before the first
relevant error message.  It's always a good idea to post the full kernel
log from boot until failure.  Things that don't seem relevant are often
relevant.

>> Move half of the drives to the new PSU and see whether the problem goes
>> away.
> 
> This is a new server, with a 650W Chieftec GPS650AB PSU in it.
> AFAIK a hard disk consumes only around 10W, but I will still try using
> more than one PSU.

I recently tracked down IO problems in a server product line from a
major (really, one of the top three) vendor to a malfunctioning PSU, so
don't trust the labeling too much.

> The main problem is that I can't immediately see whether it helps or not.
> Even if it works without this problem for a week, I can't be sure it
> still will in 2 months...
> Because of this, and because I believe that this problem is related to the HW
> (motherboard, chipset), I'd rather just throw away the MB and use
> another one with two extra 4-port SATA cards.

So far, no problem of this kind has been tracked down to the MB or
the controller, while 90% of hardware problems have turned out to be
power related.

Thanks.

-- 
tejun


* Re: JMicron - hard resetting link
  2008-02-12 14:52     ` Tejun Heo
@ 2008-02-12 17:27       ` Gabor FUNK
  2008-02-12 23:50         ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Gabor FUNK @ 2008-02-12 17:27 UTC
  To: Tejun Heo; +Cc: IDE/ATA development list

> What I said was that timeouts occurring due to transmission errors
> should be recoverable.  It seems like IRQ delivery didn't work, probably
> due to a screaming IRQ.  I need to see the messages before the first
> relevant error message.  It's always a good idea to post the full kernel
> log from boot until failure.  Things that don't seem relevant are often
> relevant.
Naturally. Full kern.log, including the boot:
http://www.huweb.hu/maques/tmp/jmicron/kern.log
(no edits; there really are only those 2 lines between Feb 6 and Feb 9's
1st exception)

Previously the machine ran kernel 2.6.23.9, and back then I noticed the
following in syslog:
Feb  6 19:10:19 storage1 kernel: ata4: D2H reg with I during NCQ, this message won't be printed again
Feb  6 19:10:20 storage1 kernel: ata1: D2H reg with I during NCQ, this message won't be printed again
Feb  6 19:10:20 storage1 kernel: ata2: D2H reg with I during NCQ, this message won't be printed again
Feb  6 19:10:21 storage1 kernel: ata3: D2H reg with I during NCQ, this message won't be printed again

I googled and saw that there were some fixes related to this (maybe by
you), so that's why we hoped that 2.6.24 would fix this. And indeed the
above error messages were gone, but...

> So far, no problem of this kind has been tracked down to the MB or
> the controller, while 90% of hardware problems have turned out to be
> power related.
I'll put a brand new, probably different-brand PSU in the case and connect
the MB and the 4 disks of the problematic controller to it, and connect the
2 system disks and the other 4 disks to the current one (or even another one).

Meanwhile, I'd welcome any suggestion as to why a controller reset
causes a "fatal error"...
BTW, the drives were accessible after the array broke (by the time I got there).

Thanks,
Gabor 



* Re: JMicron - hard resetting link
  2008-02-12 17:27       ` Gabor FUNK
@ 2008-02-12 23:50         ` Tejun Heo
  2008-02-14 23:02           ` Gabor FUNK
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2008-02-12 23:50 UTC
  To: Gabor FUNK; +Cc: IDE/ATA development list

Hello,

Gabor FUNK wrote:
>> What I said was that timeouts occurring due to transmission errors
>> should be recoverable.  It seems like IRQ delivery didn't work, probably
>> due to a screaming IRQ.  I need to see the messages before the first
>> relevant error message.  It's always a good idea to post the full kernel
>> log from boot until failure.  Things that don't seem relevant are often
>> relevant.
> Naturally. Full kern.log, including the boot:
> http://www.huweb.hu/maques/tmp/jmicron/kern.log
> (no edits; there really are only those 2 lines between Feb 6 and Feb 9's
> 1st exception)

Hmmm... Indeed.  This is the first time this failure mode has been reported.

> Previously the machine ran kernel 2.6.23.9, and back then I noticed the
> following in syslog:
> Feb  6 19:10:19 storage1 kernel: ata4: D2H reg with I during NCQ, this message won't be printed again
> Feb  6 19:10:20 storage1 kernel: ata1: D2H reg with I during NCQ, this message won't be printed again
> Feb  6 19:10:20 storage1 kernel: ata2: D2H reg with I during NCQ, this message won't be printed again
> Feb  6 19:10:21 storage1 kernel: ata3: D2H reg with I during NCQ, this message won't be printed again
> 
> I googled and saw that there were some fixes related to this (maybe by
> you), so that's why we hoped that 2.6.24 would fix this. And indeed the
> above error messages were gone, but...

Yeap, those are gone.

>> So far, no problem of this kind has been tracked down to the MB or
>> the controller, while 90% of hardware problems have turned out to be
>> power related.
> I'll put a brand new, probably different-brand PSU in the case and connect
> the MB and the 4 disks of the problematic controller to it, and connect the
> 2 system disks and the other 4 disks to the current one (or even another one).

Yeap, please keep me posted.

> Meanwhile, I'd welcome any suggestion as to why a controller reset
> causes a "fatal error"...
> BTW, the drives were accessible after the array broke (by the time I got there).

What do you mean by 'drives were accessible'?  Were the /dev/sdX nodes
accessible?

-- 
tejun


* Re: JMicron - hard resetting link
  2008-02-12 23:50         ` Tejun Heo
@ 2008-02-14 23:02           ` Gabor FUNK
  2008-02-14 23:32             ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Gabor FUNK @ 2008-02-14 23:02 UTC
  To: Tejun Heo; +Cc: IDE/ATA development list

To be honest, I didn't believe that doing anything with the PSU
would make a difference.
However, seemingly it did.
I have also updated the BIOS, but I guess this has little
to do with it.
So a different-brand PSU was additionally installed, and this
one got the motherboard and the 4 disks that were failing.
The "old" PSU got the second set of 4 HDDs and the 2 other system
HDDs.
The test was started yesterday (Feb 13) at about 16:30 CET, including
building the array and file copies. Today (Feb 14) at about 20:22 the
problem appeared, but it seemingly "moved" with the PSU to the
other bunch of 4 disks (on the nVidia controller); more precisely, only
2 of them (the array is still operational).

Feb 14 20:22:32 storage1 kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Feb 14 20:22:32 storage1 kernel: ata10.00: cmd c8/00:00:c3:d5:3b/00:00:00:00:00/e2 tag 0 dma 131072 in
Feb 14 20:22:32 storage1 kernel:          res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 14 20:22:32 storage1 kernel: ata10.00: status: { DRDY }
Feb 14 20:22:32 storage1 kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Feb 14 20:22:32 storage1 kernel: ata9.00: cmd c8/00:00:c3:d5:3b/00:00:00:00:00/e2 tag 0 dma 131072 in
Feb 14 20:22:32 storage1 kernel:          res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 14 20:22:32 storage1 kernel: ata9.00: status: { DRDY }
Feb 14 20:22:33 storage1 kernel: ata10: soft resetting link
Feb 14 20:22:33 storage1 kernel: ata9: soft resetting link
Feb 14 20:22:33 storage1 kernel: ata10: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 14 20:22:33 storage1 kernel: ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 14 20:23:03 storage1 kernel: ata9.00: qc timeout (cmd 0x27)
Feb 14 20:23:03 storage1 kernel: ata9.00: failed to read native max address (err_mask=0x4)
Feb 14 20:23:03 storage1 kernel: ata9.00: HPA support seems broken, will skip HPA handling
Feb 14 20:23:03 storage1 kernel: ata9.00: revalidation failed (errno=-5)
Feb 14 20:23:03 storage1 kernel: ata9: failed to recover some devices, retrying in 5 secs
Feb 14 20:23:03 storage1 kernel: ata10.00: qc timeout (cmd 0x27)
Feb 14 20:23:03 storage1 kernel: ata10.00: failed to read native max address (err_mask=0x4)
Feb 14 20:23:03 storage1 kernel: ata10.00: HPA support seems broken, will skip HPA handling
Feb 14 20:23:03 storage1 kernel: ata10.00: revalidation failed (errno=-5)
Feb 14 20:23:03 storage1 kernel: ata10: failed to recover some devices, retrying in 5 secs
Feb 14 20:23:08 storage1 kernel: ata9: hard resetting link
Feb 14 20:23:08 storage1 kernel: ata10: hard resetting link
...
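Two register dumps in this log can be decoded by hand. In "SStatus 123 SControl 300" both values are hex and split into DET (bits 3:0), SPD (bits 7:4) and IPM (bits 11:8) as defined by the SATA specification. The failing command c8/... is READ DMA, a 28-bit non-NCQ command in which LBA bits 27:24 live in the low nibble of the device byte and a sector count of 0 means 256. (The `cmd 0x27` that timed out during revalidation is READ NATIVE MAX ADDRESS EXT, which is why the next line reports a failed native max address read.) A minimal Python sketch; the lookup tables cover only the common encodings, and the helper names are illustrative:

```python
def nibbles(reg):
    """Split an SStatus/SControl value into its DET, SPD and IPM fields."""
    return reg & 0xF, (reg >> 4) & 0xF, (reg >> 8) & 0xF


SSTATUS_DET = {0x0: "no device", 0x1: "device, no phy comm",
               0x3: "device, phy comm established", 0x4: "phy offline"}
SSTATUS_SPD = {0x0: "no speed negotiated", 0x1: "1.5 Gbps", 0x2: "3.0 Gbps"}
SSTATUS_IPM = {0x0: "not present", 0x1: "active", 0x2: "partial", 0x6: "slumber"}
SCONTROL_SPD = {0x0: "no speed limit", 0x1: "limit 1.5 Gbps", 0x2: "limit 3.0 Gbps"}
SCONTROL_IPM = {0x0: "no restriction", 0x1: "partial disabled",
                0x2: "slumber disabled", 0x3: "partial+slumber disabled"}


def describe_sstatus(reg):
    det, spd, ipm = nibbles(reg)
    return (SSTATUS_DET.get(det, "?"), SSTATUS_SPD.get(spd, "?"),
            SSTATUS_IPM.get(ipm, "?"))


def describe_scontrol(reg):
    _det, spd, ipm = nibbles(reg)
    return SCONTROL_SPD.get(spd, "?"), SCONTROL_IPM.get(ipm, "?")


def decode_lba28(dump, sector_size=512):
    """Decode a 28-bit (non-NCQ) command from libata's 'cmd' taskfile dump."""
    b = [int(x, 16) for x in dump.replace("/", ":").split(":")]
    # Assumed order: command/feature:nsect:lbal:lbam:lbah/<5 hob bytes>/device
    command, nsect, lbal, lbam, lbah, device = b[0], b[2], b[3], b[4], b[5], b[11]
    lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    sectors = nsect or 256   # in LBA28 a count of 0 means 256 sectors
    return command, lba, sectors * sector_size


if __name__ == "__main__":
    print(describe_sstatus(0x123))   # ('device, phy comm established', '3.0 Gbps', 'active')
    print(describe_scontrol(0x300))  # ('no speed limit', 'partial+slumber disabled')
    print(decode_lba28("c8/00:00:c3:d5:3b/00:00:00:00:00/e2"))  # (200, 37475779, 131072)
```

So the links themselves came back at 3.0 Gbps and active after the soft reset, while the command that timed out on both ata9 and ata10 was the same 128 KiB read at LBA 37475779 (0x23bd5c3).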

Full kern.log is at:
http://www.huweb.hu/maques/tmp/jmicron/kern0214.log

So it seems that there is definitely something wrong with the "old" PSU.

Also, I tried to mount the failed drives, without success.

Thought I'd let you know.
Now I will try with only the one "new" PSU to see what happens...

G.




* Re: JMicron - hard resetting link
  2008-02-14 23:02           ` Gabor FUNK
@ 2008-02-14 23:32             ` Tejun Heo
  2008-02-21 21:45               ` Gabor FUNK
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2008-02-14 23:32 UTC
  To: Gabor FUNK; +Cc: IDE/ATA development list

Gabor FUNK wrote:
> To be honest, I didn't believe that doing anything with the PSU
> would make a difference.
> However, seemingly it did.
> I have also updated the BIOS, but I guess this has little
> to do with it.

I too am amazed at the number of PSU problems getting reported here.  It
seems most hardware problems turn out to be power related.

> So a different-brand PSU was additionally installed, and this
> one got the motherboard and the 4 disks that were failing.
> The "old" PSU got the second set of 4 HDDs and the 2 other system
> HDDs.
> The test was started yesterday (Feb 13) at about 16:30 CET, including
> building the array and file copies. Today (Feb 14) at about 20:22 the
> problem appeared, but it seemingly "moved" with the PSU to the
> other bunch of 4 disks (on the nVidia controller); more precisely, only
> 2 of them (the array is still operational).

Hmmm..

> So it seems that there is definitely something wrong with the "old" PSU.
> 
> Also, I tried to mount the failed drives, without success.
> 
> Thought I'd let you know.
> Now I will try with only the one "new" PSU to see what happens...

Yeah, please keep us posted.  Thanks.

-- 
tejun


* Re: JMicron - hard resetting link
  2008-02-14 23:32             ` Tejun Heo
@ 2008-02-21 21:45               ` Gabor FUNK
  2008-02-22  2:03                 ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Gabor FUNK @ 2008-02-21 21:45 UTC
  To: Tejun Heo; +Cc: IDE/ATA development list

>> So a different-brand PSU was additionally installed, and this
>> one got the motherboard and the 4 disks that were failing.
>> The "old" PSU got the second set of 4 HDDs and the 2 other system
>> HDDs.
>> The test was started yesterday (Feb 13) at about 16:30 CET, including
>> building the array and file copies. Today (Feb 14) at about 20:22 the
>> problem appeared, but it seemingly "moved" with the PSU to the
>> other bunch of 4 disks (on the nVidia controller); more precisely, only
>> 2 of them (the array is still operational).
> 
> Hmmm..
> 
>> So it seems that there is definitely something wrong with the "old" PSU.
>> 
>> Also, I tried to mount the failed drives, without success.
>> 
>> Thought I'd let you know.
>> Now I will try with only the one "new" PSU to see what happens...
> 
> Yeah, please keep us posted.  Thanks.

To sum it up:
- 1st, the 4 disks on the JMicron controller failed with 1 [Chieftec] PSU
- then it failed with 2 PSUs too, but this time the Chieftec was only
   connected to the other 4 disks, on the nVidia controller. The MB
   and the other disks were on the other, non-Chieftec [650W] PSU.
- Then I started the tests with only this second PSU, and it ran
   for about 6 days under heavy testing and array rebuilding, and
   guess what: it failed again.

Full kernel log at:
http://www.huweb.hu/maques/tmp/jmicron/kern0221.log

Since it is not a "switch on and see" problem, I'm not in a very good
position, so unless someone has a really great idea or observation,
I seriously have to consider replacing the MB and probably adding
some extra SATA controllers.

Thanks, G.


* Re: JMicron - hard resetting link
  2008-02-21 21:45               ` Gabor FUNK
@ 2008-02-22  2:03                 ` Tejun Heo
  2008-02-24  9:04                   ` Gabor FUNK
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2008-02-22  2:03 UTC
  To: Gabor FUNK; +Cc: IDE/ATA development list

Hello,

Gabor FUNK wrote:
> To sum it up:
> - 1st, the 4 disks on the JMicron controller failed with 1 [Chieftec] PSU
> - then it failed with 2 PSUs too, but this time the Chieftec was only
>   connected to the other 4 disks, on the nVidia controller. The MB
>   and the other disks were on the other, non-Chieftec [650W] PSU.
> - Then I started the tests with only this second PSU, and it ran
>   for about 6 days under heavy testing and array rebuilding, and
>   guess what: it failed again.

Eeekk..

> Full kernel log at:
> http://www.huweb.hu/maques/tmp/jmicron/kern0221.log
> 
> Since it is not a "switch on and see" problem, I'm not in a very good
> position, so unless someone has a really great idea or observation,
> I seriously have to consider replacing the MB and probably adding
> some extra SATA controllers.

If you can still do some testing, what happens if you unplug power to
the failed drive and replug it while the system is still running?  Does
a hotplug event get triggered and does the drive get recognized again?
If so, does unplugging and replugging the SATA controller only (w/o
powering down the drive) achieve the same thing?
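For the replug test, one low-tech way to tell whether the kernel re-created the disk is to compare the sd* entries under /sys/block before and after the replug. A minimal Python sketch; it only observes sysfs and does not trigger any rescan, and `sd_disks`/`wait_for_change` are hypothetical helpers, not part of any tool:

```python
import os
import time


def sd_disks(sysfs_block="/sys/block"):
    """Return the set of sd* disks the kernel currently exposes in sysfs."""
    return {name for name in os.listdir(sysfs_block) if name.startswith("sd")}


def wait_for_change(before, sysfs_block="/sys/block", timeout=60.0, poll=1.0):
    """Poll sysfs until the sd* set differs from `before`.

    Returns (disappeared, appeared); both are empty if nothing changed
    before the timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        now = sd_disks(sysfs_block)
        if now != before:
            return before - now, now - before
        time.sleep(poll)
    return set(), set()
```

Usage: take `before = sd_disks()` with the drive unplugged, replug it, then call `wait_for_change(before)`; a new sdX in the second returned set would suggest the hotplug event fired and the drive was recognized again.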

Thanks.

-- 
tejun


* Re: JMicron - hard resetting link
  2008-02-22  2:03                 ` Tejun Heo
@ 2008-02-24  9:04                   ` Gabor FUNK
  0 siblings, 0 replies; 11+ messages in thread
From: Gabor FUNK @ 2008-02-24  9:04 UTC
  To: Tejun Heo; +Cc: IDE/ATA development list

>> Since it is not a "switch on and see" problem, I'm not in a very good
>> position, so unless someone has a really great idea or observation,
>> I seriously have to consider replacing the MB and probably adding
>> some extra SATA controllers.
> 
> If you can still do some testing, what happens if you unplug power to
> the failed drive and replug it while the system is still running?  Does
> a hotplug event get triggered and does the drive get recognized again?
> If so, does unplugging and replugging the SATA controller only (w/o
> powering down the drive) achieve the same thing?

I doubt I will start another week of testing without any major changes.
I didn't try drive hotplug, and as for the controllers, they are all
on-board ones...

I guess a new (and different) motherboard and SATA controller cards will
be the next things to change, because I strongly believe that the PSUs
are fine and should be sufficient for the system.

G.



Thread overview: 11+ messages
2008-02-12  9:48 JMicron - hard resetting link Gabor FUNK
2008-02-12 13:05 ` Tejun Heo
2008-02-12 14:38   ` Gabor FUNK
2008-02-12 14:52     ` Tejun Heo
2008-02-12 17:27       ` Gabor FUNK
2008-02-12 23:50         ` Tejun Heo
2008-02-14 23:02           ` Gabor FUNK
2008-02-14 23:32             ` Tejun Heo
2008-02-21 21:45               ` Gabor FUNK
2008-02-22  2:03                 ` Tejun Heo
2008-02-24  9:04                   ` Gabor FUNK
