linux-raid.vger.kernel.org archive mirror
From: Matt Darcy <kernel-lists@projecthugo.co.uk>
To: Matt Darcy <kernel-lists@projecthugo.co.uk>
Cc: Sebastian Kuzminsky <seb@highlab.com>,
	Jeff Garzik <jgarzik@pobox.com>,
	linux-ide@vger.kernel.org, linux-raid@vger.kernel.org
Subject: Re: [git patch] 2.6.x libata fix more information DEBUG INFO !!!
Date: Fri, 13 Jan 2006 21:06:10 +0000
Message-ID: <43C81642.3030309@projecthugo.co.uk>
In-Reply-To: <43C804AA.1050303@projecthugo.co.uk>

Matt Darcy wrote:

> Sebastian Kuzminsky wrote:
>
>> Matt Darcy <kernel-lists@projecthugo.co.uk> wrote:
>>  
>>
>>> It's almost as if there is an "IO leak", which is the only way I can
>>> think of to describe it. The card/system performs quite well with
>>> individual disks, but as soon as they are put into a RAID 5
>>> configuration using any number of disks, the creation of the
>>> array appears to be fine until around 20-30% through the assembly;
>>> then the speed of the array creation plummets and the machine hangs.
>>>   
>>
>>
>> You have 7x250G disks in Raid-5, so that's 6x250G or 1.5T total space.
>> In the beginning of raid recovery, when the system is good, you're
>> getting 12M/s.  It slows then dies after 25% to 40% of completion.
>>
>> 6x250G is 1536000M, at 12M/s that's about 35 hours.  You tested the
>> disks individually (without Raid) for ~12 hours, which is about 34%
>> of 35 hours.  So it's possible you'd see the same slowdown & hang
>> if you tested the individual disks longer.
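The back-of-envelope estimate above can be reproduced in a few lines of Python. This is only a sketch under the same assumptions the estimate makes: 1G = 1024M, one disk's worth of capacity goes to RAID-5 parity, and the 12M/s rate stays constant over the whole usable space. The variable names are illustrative.

```python
# Reproduces the rebuild-time estimate, assuming 1G = 1024M and a constant
# 12M/s rate applied to the full usable capacity of the array.
DISKS = 7
DISK_SIZE_G = 250
RATE_M_PER_S = 12
SOAK_TEST_HOURS = 12  # how long the disks were tested individually

data_disks = DISKS - 1                          # RAID-5 spends one disk on parity
usable_m = data_disks * DISK_SIZE_G * 1024      # 6 x 250G expressed in M
rebuild_hours = usable_m / RATE_M_PER_S / 3600  # seconds -> hours
soak_fraction = SOAK_TEST_HOURS / rebuild_hours # share of the rebuild time covered

print(f"usable space: {usable_m} M")
print(f"rebuild time: {rebuild_hours:.1f} h")
print(f"individual-disk test covered {soak_fraction:.0%} of that time")
```

Changing RATE_M_PER_S or DISK_SIZE_G shows how sensitive the "same point in time" argument is to the assumed rate.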
>>
>> You're having these problems on a Marvell controller with 2.6.15 and the
>> in-kernel sata_mv driver, right?  I've got a very similar system with
>> unexplained hard hangs too.  On my system the individual disks seem to
>> work fine, Raid-6 of the disks seems to work fine, LVM of the disks seems
>> to work fine, but LVM of a Raid-6 of the disks hangs.
>>
>> One weird thing I've discovered is that if I enable all the kernel
>> debugging options, the system is perfectly stable, and all the debug
>> tests report no warnings or errors to the logs.  It seems like a race
>> condition somewhere; I suspect the interaction of Raid-6 and LVM,
>> but it could be anywhere, I suppose.  I've attached the .config of
>> the production (non-debug) kernel that hangs, and the diff to the debug
>> kernel that works.
>>
>>  
>>
>
>
> Just to clarify a few things:
>
> Using the 2.6.15 kernel, I can assemble and use the RAID 5 array
> without a problem; however, using it with LVM2 causes it to hang
> exactly as you described.
>
> When I first started working through this problem, I used some of
> the -mm patches with the 2.6.15-rc kernels, which made a good
> difference: I could build and use the array, even with LVM2, for a
> period of time. However, there were a few quirky bugs: it couldn't
> maintain the array's stability, and on certain occasions, if I
> rebooted the box, most of the disks would be marked as unusable and
> the array would refuse to start until it was rebuilt. To progress
> this further, I started using the libata git branch, which again made
> things a "little" better, until the last 2 git versions, where I have
> this problem with the RAID array failing to build.
>
> From the results I have, I have a gut feeling that this is a driver
> issue, simply because of the different results I get with the
> different kernels.
>
> I've been given some good ideas today (the last mail from Mark Haln
> has some good suggestions), so all I can do is run the tests Mark
> suggested and report back the results to move this forward.
> Although Mark's tests seem to point to hardware issues such as heat
> and vibration, I still believe this lies at the software driver level,
> but it's worth running the tests to see what additional data I can get
> and to prove or disprove Mark's suggestions.
>
> I shall report back later
>
> thanks,
>
> Matt

OK,

I've reverted to 2.6.15-rc5-mm3, which was my "good" kernel.

I started to rebuild my 3+1 spare RAID 5 array (a smaller test array),
and it hung about 50% of the way through. However, with this kernel I
did get debug output, included below (I'll snip it in future mails).

I'm going to try the same test again with the latest git kernel to see 
what happens.
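To see exactly where a rebuild slows down before the machine hangs, one option is to sample the progress line in /proc/mdstat at regular intervals and flag the point where the speed collapses. The Python below is a minimal sketch, assuming the usual "resync = N% (...) finish=... speed=NK/sec" progress format (check it against your own /proc/mdstat); parse_progress, find_slowdown and the 25% threshold are hypothetical choices, not part of mdadm or the kernel.

```python
import re

# Matches md's progress line for a resync or a rebuild onto a spare, e.g.
# (illustrative values):
#   [====>....]  recovery = 21.6% (52748544/244195840) finish=262.9min speed=12129K/sec
PROGRESS_RE = re.compile(
    r"(resync|recovery)\s*=\s*([\d.]+)%.*?speed=(\d+)K/sec"
)


def parse_progress(mdstat_text):
    """Return (kind, percent_done, speed_kib_per_s) for each array being rebuilt."""
    return [
        (m.group(1), float(m.group(2)), int(m.group(3)))
        for m in PROGRESS_RE.finditer(mdstat_text)
    ]


def find_slowdown(samples, drop_ratio=0.25):
    """Return the first (percent_done, speed) sample whose speed fell below
    drop_ratio times the first sample's speed, or None if it never did.

    samples is a list of (percent_done, speed_kib_per_s) tuples collected
    over time, e.g. by reading /proc/mdstat once a minute.
    """
    if not samples:
        return None
    baseline = samples[0][1]
    for percent, speed in samples:
        if speed < baseline * drop_ratio:
            return percent, speed
    return None
```

For example, calling parse_progress(open("/proc/mdstat").read()) once a minute and logging the (percent, speed) pairs to another machine would preserve the record even if this box hangs; find_slowdown then reports roughly where the rebuild started to stall.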


Message from syslogd@berger at Fri Jan 13 20:59:05 2006 ...
berger kernel:  [<c041321f>] mv_channel_reset+0xff/0x120
berger kernel:  [<c041327a>] mv_stop_and_reset+0x3a/0x60
berger kernel:  [<c04128ab>] mv_host_intr+0x13b/0x180
berger kernel:  [<c041298d>] mv_interrupt+0x9d/0x130
berger kernel:  [<c014080d>] handle_IRQ_event+0x3d/0x70
berger kernel:  [<c01408b6>] __do_IRQ+0x76/0x100
berger kernel:  [<c0105089>] do_IRQ+0x19/0x30
berger kernel:  [<c01033ee>] common_interrupt+0x1a/0x20
berger kernel:  [<c041316c>] mv_channel_reset+0x4c/0x120
berger kernel:  [<c0512aeb>] schedule+0x31b/0x6a0
berger kernel:  [<c03fb2b0>] scsi_error_handler+0x0/0xb0
berger kernel:  [<c041327a>] mv_stop_and_reset+0x3a/0x60
berger kernel:  [<c041374f>] mv_eng_timeout+0x6f/0xb0
berger kernel:  [<c040b4b7>] ata_scsi_error+0x17/0x30
berger kernel:  [<c03fb33b>] scsi_error_handler+0x8b/0xb0
berger kernel:  [<c01325e6>] kthread+0xb6/0xc0
berger kernel:  [<c0132530>] kthread+0x0/0xc0
berger kernel:  [<c0101389>] kernel_thread_helper+0x5/0xc

berger kernel:  [<c041365e>] __mv_phy_reset+0x3be/0x420
berger kernel:  [<c041321f>] mv_channel_reset+0xff/0x120
berger kernel:  [<c041328a>] mv_stop_and_reset+0x4a/0x60
berger kernel:  [<c04128ab>] mv_host_intr+0x13b/0x180
berger kernel:  [<c041298d>] mv_interrupt+0x9d/0x130
berger kernel:  [<c014080d>] handle_IRQ_event+0x3d/0x70
berger kernel:  [<c01408b6>] __do_IRQ+0x76/0x100
berger kernel:  [<c0105089>] do_IRQ+0x19/0x30
berger kernel:  [<c01033ee>] common_interrupt+0x1a/0x20
berger kernel:  [<c041316c>] mv_channel_reset+0x4c/0x120
berger kernel:  [<c0512aeb>] schedule+0x31b/0x6a0
berger kernel:  [<c03fb2b0>] scsi_error_handler+0x0/0xb0
berger kernel:  [<c041327a>] mv_stop_and_reset+0x3a/0x60
berger kernel:  [<c041374f>] mv_eng_timeout+0x6f/0xb0
berger kernel:  [<c040b4b7>] ata_scsi_error+0x17/0x30
berger kernel:  [<c03fb33b>] scsi_error_handler+0x8b/0xb0
berger kernel:  [<c01325e6>] kthread+0xb6/0xc0
berger kernel:  [<c0132530>] kthread+0x0/0xc0
berger kernel:  [<c0101389>] kernel_thread_helper+0x5/0xc
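Read from the bottom up, the first trace above appears to show the SCSI error-handler thread (scsi_error_handler -> ata_scsi_error -> mv_eng_timeout -> mv_stop_and_reset) already inside mv_channel_reset when an interrupt arrives, after which mv_interrupt -> mv_host_intr calls mv_stop_and_reset and mv_channel_reset a second time. The Python sketch below pulls the symbol+offset pairs out of such lines so the chain is easier to follow. The helper names are hypothetical, and dropping +0x0 frames is only a heuristic: on kernels built without frame pointers the dump can include stale stack words (the schedule entry may be one as well), so treat the output as a reading aid rather than a verified call graph.

```python
import re

# One frame per line, as printed in the log:
#   berger kernel:  [<c041321f>] mv_channel_reset+0xff/0x120
# i.e. [<address>] symbol+offset/size.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+([A-Za-z0-9_.]+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return (symbol, offset) pairs in the order the kernel printed them
    (innermost frame first)."""
    return [(m.group(2), int(m.group(3), 16)) for m in FRAME_RE.finditer(log_text)]


def likely_callers(frames):
    """Heuristic: drop +0x0 entries, which are usually stale stack words
    (e.g. a function pointer) rather than return addresses."""
    return [(sym, off) for sym, off in frames if off != 0]


def call_chain(frames):
    """Render the frames outermost-first, i.e. reversed from the log order."""
    return " -> ".join(f"{sym}+{off:#x}" for sym, off in reversed(frames))
```

Feeding only the first trace (everything up to the first kernel_thread_helper line) through call_chain(likely_callers(parse_frames(text))) gives one line running from kernel_thread_helper down to the innermost mv_channel_reset+0xff.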



Thread overview: 9+ messages
2006-01-12 10:01   ` [git patch] 2.6.x libata fix Matt Darcy
2006-01-12 10:57     ` PFC
2006-01-13  9:26       ` Matt Darcy
2006-01-12 11:46     ` Matt Darcy
2006-01-13 11:26       ` [git patch] 2.6.x libata fix more information (sata_mv problems continued) Matt Darcy
2006-01-13 11:42         ` Jens Axboe
2006-01-13 17:14         ` Sebastian Kuzminsky
2006-01-13 19:51           ` Matt Darcy
2006-01-13 21:06             ` Matt Darcy [this message]
