From: Jeff Garzik <jeff@garzik.org>
To: Torsten Kaiser <just.for.lkml@googlemail.com>
Cc: Tejun Heo <htejun@gmail.com>,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org
Subject: Re: sata_sil24 broken since 2.6.23-rc4-mm1
Date: Thu, 27 Sep 2007 02:24:51 -0400
Message-ID: <46FB4CB3.3090004@garzik.org>
In-Reply-To: <64bb37e0709262314x1b0100d8lfe34327db6b9bec8@mail.gmail.com>

Torsten Kaiser wrote:
> On 9/27/07, Tejun Heo <htejun@gmail.com> wrote:
>> Tejun Heo wrote:
>>> Torsten Kaiser wrote:
>>>> Comparing the drivers/ata directory from rc3-mm1 and rc4-mm1, the
>>>> following change looked the most suspicious to me:
>>>> http://git.kernel.org/?p=linux/kernel/git/jgarzik/libata-dev.git;a=blobdiff;f=drivers/ata/sata_sil24.c;h=3dcb223117be9739ee04d70b6bfc776a4b839a3f;hp=e0cd31aa8002350add53ba6ff07493e503275244;hb=020bc1bd8d369a77bd9379cd9763ac0057651753;hpb=8d4bdf8087e682df98bdb856f6ad451bf6d597e7
>>>>
>>>> The fact that sata_sil24.c did not change again after rc4-mm1 also
>>>> matches when the error started occurring.
>>>>
>>>> To confirm my theory I replaced the sata_sil24.c from rc8-mm1 with
>>>> the version from rc3-mm1.
>>>> I was able to boot the resulting kernel successfully 5 times, without
>>>> the error happening again.
>>> Thanks a lot for chasing down the problem.  The changed code is in the
>>> address initialization path, and it's weird that it causes intermittent
>>> failures rather than consistent ones.
>>>
>>> Anyways, does the attached patch fix the problem?
> 
> I'm starting to *really* hate that bug.
> My analysis was wrong: I booted the modified 2.6.23-rc8-mm1 this
> morning, and that failed too. (Same error messages as -rc7-mm1 from
> the first mail in this thread.)
> So it's not that change that causes the breakage.
> 
> And I'm not really finding a good pattern for which boots fail and
> which work. It seems to fail only if I completely power off the system
> for several hours (using the physical switch on the back of the
> power supply, not the normal soft-off).
> 
> Before one of the five boots I tried yesterday, I also powered the
> system completely off that way, but leaving it off for only ~10..20
> seconds did not seem to trigger the bug.
> 
> But I still think that is not a hardware failure, as the -rc3-mm1
> kernel never showed that error, even when I used it several times
> after the first -rc4-mm1 failures.
> 
>> If not, can you add printk of iomap[SIL24_PORT_BAR], offset, initialized
>> cmd_addr and scr_addr in the loop and see whether anything is different
>> between when the driver works and fails.
> 
> Should I do this anyway?
> 
> I compared the dmesg from good and bad boots with -rc7-mm1 but could
> not see any difference, so do you think that these additional
> diagnostics could show a difference?
> Or could you suggest any other debugging options I should try?

Since it's a reproducible problem, I think it's easiest to get you
straight to git-bisect.  In this case, that would be:

1. Clone branch "upstream" of 
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git

2. Test.  If the bug persists, you have narrowed down the problem to the
-mm changes from the SATA developers, which are to be sent for 2.6.24.
If the problem does not persist, then it's a problem added in the -mm
patchset alone, which carries few ATA patches outside of libata-dev.git.

3. If the problem is in libata-dev.git#upstream (likely), you can now 
use git-bisect to find the specific commit that causes the problems. 
Read the git-bisect man page for full details, but the basics are

	a) start with a known good point (v2.6.22? v2.6.23?) and a
	known bad point (HEAD, aka the most recent commit in
	libata-dev.git#upstream)

	b) build and boot kernels, marking each as known-good or
	known-bad.

	c) This process will systematically narrow down the problem
	to a single git commit.
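
To illustrate why steps a) through c) converge quickly, here is a
minimal sketch of the idea behind bisection, not git's actual
implementation.  It assumes a simplified linear history (real git
history is a graph), and the names first_bad_commit, is_bad and the
"c0".."c100" commit labels are hypothetical, made up for this example.
In practice is_bad stands for "build this commit, boot it, and see
whether the error appears".

```python
from typing import Callable, Sequence, Tuple


def first_bad_commit(
    commits: Sequence[str], is_bad: Callable[[str], bool]
) -> Tuple[str, int]:
    """Binary-search a linear history for the first bad commit.

    Assumes commits are ordered oldest to newest, commits[0] is known
    good and commits[-1] is known bad.  Returns the first bad commit
    and the number of build-and-boot tests that were needed.
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid    # culprit is at mid or earlier
        else:
            good = mid   # culprit is after mid
    return commits[bad], tests


# Hypothetical history of 101 commits where "c37" introduced the bug.
commits = [f"c{i}" for i in range(101)]
culprit, tests = first_bad_commit(commits, lambda c: int(c[1:]) >= 37)
print(culprit, tests)
```

Each test halves the remaining range, so the number of kernels to
build and boot grows roughly with log2 of the number of commits.  The
search relies on every good/bad verdict being correct: with a failure
that only shows up after the machine has been powered off for a long
time, a commit should be marked good only after a boot under the
conditions that are known to trigger the error.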

Regards,

	Jeff




