All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
To: Andrey Gusev <a.gusev1980@mail.ru>
Cc: linux-ide@vger.kernel.org, petkovbb@gmail.com, linuxppc-dev@ozlabs.org
Subject: Re: Delay on intialization ide subsystem(most likely)
Date: Mon, 8 Jun 2009 22:20:41 +0200	[thread overview]
Message-ID: <200906082220.41763.bzolnier@gmail.com> (raw)
In-Reply-To: <20090530144643.76cb456e@power-debian>

On Saturday 30 May 2009 12:46:43 Andrey Gusev wrote:
> On Wed, 20 May 2009 17:56:14 +0200
> Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> wrote:
> 
> > On Friday 15 May 2009 22:40:07 Andrey Gusev wrote:
> > > On Wed, 13 May 2009 20:46:33 +0200
> > > Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> wrote:
> > > 
> > > > On Wednesday 13 May 2009 19:11:23 Andrey Gusev wrote:
> > > > > On Wed, 13 May 2009 15:28:26 +0200
> > > > > Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> wrote:
> > > > > 
> > > > > > On Tuesday 12 May 2009 21:50:24 Andrey Gusev wrote:
> > > > > > > On Mon, 27 Apr 2009 23:21:48 +0200
> > > > > > > Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> wrote:
> > > > > > > 
> > > > > > > > On Monday 27 April 2009 22:36:45 Andrey Gusev wrote:
> > > > > > > > > On Sat, 25 Apr 2009 16:48:38 +0200
> > > > > > > > > Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> wrote:
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Hi,
> > > > > > > > > > 
> > > > > > > > > > On Saturday 25 April 2009 15:02:03 Andrey Gusev wrote:
> > > > > > > > > > > Hello!
> > > > > > > > > > > 
> > > > > > > > > > > I have tested linux-2.6.30-rc3 on my system and find
> > > > > > > > > > > some problems. One of them is delaying on
> > > > > > > > > > > initialization IDE subsystem. I don't have this
> > > > > > > > > > > problem on 2.6.29.1. The difference is looked on
> > > > > > > > > > > log of dmesg.
> > > > > > > > > > 
> > > > > > > > > > Unfortunately this doesn't give us any hint about the
> > > > > > > > > > root cause of the bug so please try narrowing the
> > > > > > > > > > problem down to the specific change using git-bisect
> > > > > > > > > > (sorry, there were 212 drivers/ide/ commits during
> > > > > > > > > > v2.6.29..v2.6.30-rc3 and much much more
> > > > > > > > > > non-drivers/ide/ ones).
> > > > > > > > > > 
> > > > > > > > > > Thanks,
> > > > > > > > > > Bart
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Hello!
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > The full result of bisect is:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > git bisect start
> > > > > > > > > # good: [8e0ee43bc2c3e19db56a4adaa9a9b04ce885cd84] Linux
> > > > > > > > > 2.6.29 git bisect good
> > > > > > > > > 8e0ee43bc2c3e19db56a4adaa9a9b04ce885cd84 # bad:
> > > > > > > > > [091069740304c979f957ceacec39c461d0192158] Linux
> > > > > > > > > 2.6.30-rc3 git bisect bad
> > > > > > > > > 091069740304c979f957ceacec39c461d0192158 # good:
> > > > > > > > > [40f07111be99b71c1e8d40c13cdc38445add787f] V4L/DVB
> > > > > > > > > (11166): pvrusb2: Implement status fetching from
> > > > > > > > > sub-devices git bisect good
> > > > > > > > > 40f07111be99b71c1e8d40c13cdc38445add787f # good:
> > > > > > > > > [ba0e1ebb7ea0616eebc29d2077355bacea62a9d8] Staging:
> > > > > > > > > sxg: slicoss: Specify the license for Sahara SXG and
> > > > > > > > > Slicoss drivers git bisect good
> > > > > > > > > ba0e1ebb7ea0616eebc29d2077355bacea62a9d8
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > git bisect start 'drivers/ide/'
> > > > > > > > 
> > > > > > > > Please note that limiting search space to drivers/ide/
> > > > > > > > may not give reliable results in case problem was
> > > > > > > > introduced by some other kernel area.
> > > > > > > > 
> > > > > > > > > # good: [ba0e1ebb7ea0616eebc29d2077355bacea62a9d8]
> > > > > > > > > Staging: sxg: slicoss: Specify the license for Sahara
> > > > > > > > > SXG and Slicoss drivers git bisect good
> > > > > > > > > ba0e1ebb7ea0616eebc29d2077355bacea62a9d8 # bad:
> > > > > > > > > [091069740304c979f957ceacec39c461d0192158] Linux
> > > > > > > > > 2.6.30-rc3 git bisect bad
> > > > > > > > > 091069740304c979f957ceacec39c461d0192158 # good:
> > > > > > > > > [e01f251fd09fa7cb3d352eac7de17bb5d5bd1f9d] ide-cd:
> > > > > > > > > convert cdrom_decode_status() to use switch statements
> > > > > > > > > git bisect good
> > > > > > > > > e01f251fd09fa7cb3d352eac7de17bb5d5bd1f9d # good:
> > > > > > > > > [3153c26b54230d025c6d536e8d3015def4524906] ide:
> > > > > > > > > refactor tf_read() method git bisect good
> > > > > > > > > 3153c26b54230d025c6d536e8d3015def4524906 # good:
> > > > > > > > > [c018f1ee5cf81e58b93d9e93a2ee39cad13dc1ac] hpt366: fix
> > > > > > > > > HPT370 DMA timeouts git bisect good
> > > > > > > > > c018f1ee5cf81e58b93d9e93a2ee39cad13dc1ac # bad:
> > > > > > > > > [d5f840bf74c09ca5a31e518c9d984999926b5f44] ide: Remove
> > > > > > > > > void casts git bisect bad
> > > > > > > > > d5f840bf74c09ca5a31e518c9d984999926b5f44 # bad:
> > > > > > > > > [59c8d04f5ee97ea46da854e9adbbaa45d988c39d] hpt366: use
> > > > > > > > > ATA_DMA_* constants git bisect bad
> > > > > > > > > 59c8d04f5ee97ea46da854e9adbbaa45d988c39d
> > > > > > > > 
> > > > > > > > Uhh.. something went wrong during bisect.
> > > > > > > > 
> > > > > > > > "hpt366: use ATA_DMA_* constants" cannot be a first bad
> > > > > > > > commit because hpt366 is not even used on this system.
> > > > > > > > 
> > > > > > > > Could it be that the delay doesn't happen on every boot
> > > > > > > > for "bad" kernels?
> > > > > > > > 
> > > > > > > > Also, is 2.6.30-rc1 okay?
> > > > > > > > 
> > > > > > > > Thanks,
> > > > > > > > Bart
> > > > > > > > 
> > > > > > > 
> > > > > > > Hello all!
> > > > > > > 
> > > > > > > I continue to find reason of bug. I made more testing with
> > > > > > > bisect and got result:
> > > > > > > 
> > > > > > > git bisect start
> > > > > > > # bad: [c018f1ee5cf81e58b93d9e93a2ee39cad13dc1ac] hpt366:
> > > > > > > fix HPT370 DMA timeouts git bisect bad
> > > > > > >  # good:
> > > > > > > [fb4252e59452c18b88af014a2c4ee697bbf8cbc6] at91_ide: turn on
> > > > > > > PIO 6 support git bisect good
> > > > > > > fb4252e59452c18b88af014a2c4ee697bbf8cbc6 # good:
> > > > > > > [2e1c63b7ed36532b68f0eddd6a184d7ba1013b89] Merge branch
> > > > > > > 'for-rc1/xen/core' of
> > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen git
> > > > > > > bisect good 2e1c63b7ed36532b68f0eddd6a184d7ba1013b89 # bad:
> > > > > > > [cd97824994042b809493807ea644ba26c0c23290] Merge
> > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
> > > > > > > git bisect bad cd97824994042b809493807ea644ba26c0c23290 #
> > > > > > > bad: [a2c252ebdeaab28c9b400570594d576dae295958] Merge
> > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
> > > > > > > git bisect bad a2c252ebdeaab28c9b400570594d576dae295958 #
> > > > > > > good: [b897e6fbc49dd84b2634bca664344d503b907ce9] Merge
> > > > > > > branch 'drm-intel-next' of
> > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel
> > > > > > > git bisect good b897e6fbc49dd84b2634bca664344d503b907ce9 #
> > > > > > > good: [dfbc4752eab33e66f113f9daa2effbe241cd661d] brd:
> > > > > > > support barriers git bisect good
> > > > > > > dfbc4752eab33e66f113f9daa2effbe241cd661d # good:
> > > > > > > [a23c218bd36e11120daf18e00a91d5dc20e288e6] Merge branch
> > > > > > > 'merge' of
> > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
> > > > > > > git bisect good a23c218bd36e11120daf18e00a91d5dc20e288e6 #
> > > > > > > good: [23da64b4714812b66ecf010e7dfb3ed1bf2eda69] Merge
> > > > > > > branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
> > > > > > > git bisect good 23da64b4714812b66ecf010e7dfb3ed1bf2eda69 #
> > > > > > > good: [a228df6339e0d385b8149c860d81b6007f5e9c81] GFS2: Move
> > > > > > > umount flush rwsem git bisect good
> > > > > > > a228df6339e0d385b8149c860d81b6007f5e9c81 # skip:
> > > > > > > [1328df725239804ae30fc7257c1a3185e679b517] GFS2: Use
> > > > > > > DEFINE_SPINLOCK git bisect skip
> > > > > > > 1328df725239804ae30fc7257c1a3185e679b517 # good:
> > > > > > > [10d2198805d7faa2b193485446ff6b1de42c9b78] GFS2: cleanup
> > > > > > > file_operations mess git bisect good
> > > > > > > 10d2198805d7faa2b193485446ff6b1de42c9b78
> > > > > > > 
> > > > > > > As I understand, I found bad commit, but it includes 5
> > > > > > > commits. I checked them and they are good. So, I did git
> > > > > > > checkout a2c252ebdeaab28c9b400570594d576dae295958 and test
> > > > > > > one more time this commit. I found that bag is unstable.
> > > > > > > The boot can be bad or good on this commit. I compared
> > > > > > > dmesg of 'bad' and 'good' booting, but it is equal till
> > > > > > > delay. 
> > > > > > 
> > > > > > Thanks for doing it.
> > > > > > 
> > > > > > > Any suggestions?
> > > > > > 
> > > > > > Hard to tell...
> > > > > > 
> > > > > > I went through all commits in-between
> > > > > > 
> > > > > > 	fb4252e59452c18b88af014a2c4ee697bbf8cbc6
> > > > > > 
> > > > > > and
> > > > > > 
> > > > > > 	a2c252ebdeaab28c9b400570594d576dae295958
> > > > > > 
> > > > > > and there are no obvious candidates..
> > > > > > 
> > > > > > Could you please refresh my memory and tell me whether
> > > > > > 2.6.30-rc2 was OK?
> > > > > > 
> > > > > 
> > > > > It was ok, but I don't sure now. I tested only one boot, but
> > > > > this problem is not stable. I am rechecking it.
> > > > 
> > > > Ok.
> > > > 
> > > > Please also recheck first 'good' commit if 2.6.30-rc2 turns out
> > > > to be 'bad'.
> > > 
> > > My last testing is very interesting. I check boot of each kernel at
> > > least 10 times. I checked 2.6.30-rc2, it is good. Rest results:
> > > 
> > > git bisect start
> > > # good: [0882e8dd3aad33eca41696d463bb896e6c8817eb] Linux 2.6.30-rc2
> > > git bisect good 0882e8dd3aad33eca41696d463bb896e6c8817eb
> > > # bad: [a2c252ebdeaab28c9b400570594d576dae295958] Merge
> > > git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
> > > git bisect bad a2c252ebdeaab28c9b400570594d576dae295958 # good:
> > > [c2572f2b4ffc27ba79211aceee3bef53a59bb5cd] brd: fix cacheflushing
> > > git bisect good c2572f2b4ffc27ba79211aceee3bef53a59bb5cd # bad:
> > > [b71a0c296cee4debaf446760fbd29ead1587a7ac] powerpc: pseries/dtl.c
> > > should include asm/firmware.h git bisect bad
> > > b71a0c296cee4debaf446760fbd29ead1587a7ac
> > > 
> > > Last commit had delay only on 10 time, but it ripped my system, I
> > > can't read any place on my hard drive. I can't mount any other
> > > device on it. Fortunately, I have tmpfs and I mounted second hard
> > > disk on it place. After my computer didn't turn on. It looks like
> > > dead drive or controller, but after on/off and replace disks, it
> > > has alived. Below is log of this bad boot.
> > 
> > I think that is an old problem which for some reasons gets triggered
> > more easily in newer kernels or a hardware issue (or just combination
> > of both).
> 
> Could it be RCU (it is experimental implementation in this configuratin)
> related? I have booted 2.6.30-rc6, sometimes it has long delay (about 66
> seconds) after: "[    0.000000] Experimental hierarchical RCU implementation."
> Currently I have it enabled on 2.6.29.4 and don't have any problems. 
> I made photos of boots and some faults:
> http://img17.imageshack.us/img17/149/dscn4403b.jpg
> http://img21.imageshack.us/img21/254/dscn4407v.jpg
> http://img21.imageshack.us/img21/4919/dscn4413.jpg
> http://img14.imageshack.us/img14/1230/dscn4414y.jpg
> http://img13.imageshack.us/img13/878/dscn4419c.jpg
> It is not very good photo, but it is difficult to shoot monitor.
> It looks like same delay.

Indeed.  The delay problem is a more generic kernel/hardware issue.

> > 
> > [...]
> > 
> > > > [ BTW the above bisection points that the problem was introduced
> > > > outside of drivers/ide or that it was introduced earlier that we'd
> > > > initially thought ]
> > > > 
> > > > > I have added second hard drive and got new issue. May be this
> > > > > log (dmesg) can tell you something. It is on first known 'bad
> > > > > commit'. 2.6.29.2 can't properly
> > > > 
> > > > It tells us that there is some IRQ routing problem... seems like a
> > > > platform or ide-pmac specific problem.  Does some earlier kernel
> > > > work OK with this configuration?
> > > 
> > > I have Debian's 2.6.26, it has same problem. Don't take attention
> > > to time, I caught another bug on Aureal Vortex2, this driver kills
> > > kernel completely and system time too. I am lucky on bugs on this
> > > machine.
> > 
> > [...]
> > 
> > > [477194869.958131] hdb: QUANTUM FIREBALLP LM20.5, ATA DISK drive
> > 
> > [...]
> > 
> > > [477194895.065957] ide-pmac lost interrupt, dma status: 8480
> > > [477194895.068535] hdb: lost interrupt
> > > [477194895.070952] hdb: dma_intr: status=0x58 { DriveReady
> > > SeekComplete DataRequest } [477194895.073444] ide: failed opcode
> > > was: unknown [477194895.076053] hda: DMA disabled
> > > [477194895.078504] hdb: DMA disabled
> > > [477194895.209948] ide0: reset: success
> > > [477194895.430619]  hdb1 hdb2 < hdb5 hdb6 hdb7 hdb8 >
> > 
> > This drive is one of the "quirky" drives which has special
> > workarounds in some host drivers...
> > 
> > OK, lets try something else.  I went through IDE code and fixed
> > outstanding issues which may be related to these problems + mixed-in
> > pending bugfixes.
> 
> There is dmesg of 2.6.30-rc6 with patch, I could logged in only once, 
> 2 boots couldn't give me such ability.

Thanks for testing.  Unfortunately none of fixes helped.. :/

> [   70.287747] sysfs: cannot create duplicate filename '/class/ide_port/ide1'
> [   70.287841] ------------[ cut here ]------------
> [   70.287879] Badness at fs/sysfs/dir.c:487

This is caused by small mistake in one of fixes, follow-up fixup:

diff -u b/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
--- b/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -708,6 +708,8 @@
 		goto out;
 	} else if (rc == -EBUSY)
 		printk(KERN_ERR "%s: not ready before the probe\n", hwif->name);
+	else
+		rc = -ENODEV;
 
 	/*
 	 * Second drive should only exist if first drive was found,


> [   70.584122]  hdb:<3>ide-pmac lost interrupt, dma status: 8480

DMA status indicates that DMA transfer is still active according to
the controller.  This one is really a platform/hardware specific issue.

  reply	other threads:[~2009-06-08 20:23 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-04-25 13:02 Delay on intialization ide subsystem(most likely) Andrey Gusev
2009-04-25 14:48 ` Bartlomiej Zolnierkiewicz
2009-04-26 22:20   ` Andrey Gusev
2009-04-27 20:36   ` Andrey Gusev
2009-04-27 21:21     ` Bartlomiej Zolnierkiewicz
2009-04-28  8:26       ` Re[2]: " Андрей Гусев
2009-04-29 21:05       ` Andrey Gusev
2009-05-12 19:50       ` Andrey Gusev
2009-05-13 13:28         ` Bartlomiej Zolnierkiewicz
2009-05-13 17:11           ` Andrey Gusev
2009-05-13 18:46             ` Bartlomiej Zolnierkiewicz
2009-05-15 20:40               ` Andrey Gusev
2009-05-20 15:56                 ` Bartlomiej Zolnierkiewicz
2009-05-30 10:46                   ` Andrey Gusev
2009-06-08 20:20                     ` Bartlomiej Zolnierkiewicz [this message]
2009-06-08 23:26                       ` Benjamin Herrenschmidt
2009-06-08 23:26                         ` Benjamin Herrenschmidt
2009-06-10 11:44                         ` Bartlomiej Zolnierkiewicz
2009-06-10 11:44                           ` Bartlomiej Zolnierkiewicz
2009-07-05 11:17                           ` Andrey Gusev
2009-07-05 11:17                             ` Andrey Gusev
2009-07-06 15:04                             ` Bartlomiej Zolnierkiewicz
2009-07-06 15:04                               ` Bartlomiej Zolnierkiewicz
2009-07-06 23:23                               ` Benjamin Herrenschmidt
2009-07-07 21:18                                 ` Andrey Gusev
2009-07-07 21:18                                   ` Andrey Gusev
2009-07-08  1:12                                   ` Benjamin Herrenschmidt
2009-07-08  1:12                                     ` Benjamin Herrenschmidt
2009-07-13 19:27                                     ` Andrey Gusev
2009-07-13 19:27                                       ` Andrey Gusev
2009-06-10 20:38                       ` Andrey Gusev
2009-06-10 20:38                         ` Andrey Gusev
2009-06-10 21:46                         ` Benjamin Herrenschmidt
2009-06-10 21:46                           ` Benjamin Herrenschmidt
2009-06-10 21:57                           ` Andrey Gusev
2009-06-10 21:57                             ` Andrey Gusev
2009-06-11  0:44                             ` Benjamin Herrenschmidt
2009-06-20 18:12                               ` Andrey Gusev
2009-06-20 18:12                                 ` Andrey Gusev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200906082220.41763.bzolnier@gmail.com \
    --to=bzolnier@gmail.com \
    --cc=a.gusev1980@mail.ru \
    --cc=linux-ide@vger.kernel.org \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=petkovbb@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.