linux-ide.vger.kernel.org archive mirror
* [PROBLEM]: hdparm strange behaviour for 2.6.21 and later
@ 2007-06-16 16:58 Thanos Kyritsis
  2007-06-18 20:01 ` Mark Lord
  0 siblings, 1 reply; 6+ messages in thread
From: Thanos Kyritsis @ 2007-06-16 16:58 UTC
  To: linux-ide; +Cc: Bartlomiej Zolnierkiewicz

Hello,

starting with kernel 2.6.21 and up to kernel 2.6.22-rc4, I'm having the 
following problem:

/etc/rc.d/rc.local contains the following:
/usr/sbin/hdparm -q -d1 -q -u1 -q -c1 -q -k1 /dev/hda
/usr/sbin/hdparm -q -d1 -q -u1 -q -c1 -q -k1 /dev/hdb
/usr/sbin/hdparm -q -d1 -q -u1 -q -c1 -q -k1 /dev/hdc
/usr/sbin/hdparm -q -d1 -q -u1 -q -c1 -q -k1 /dev/hdd

(I'm using Slackware, no Debian-style automated hdparm.conf is running 
during bootup, that's why these are in rc.local)

The above seems to somehow lock up the boot procedure just at the point 
where rc.local gets executed, so the system never reaches the login prompt.
All drivers (kernelspace) and system daemons (userspace) before rc.local 
load normally, and there are no strange messages on the console or in 
the system logs. Because I cannot log in, I cannot trace it any further. 
I believe the kernel is still running because the machine responds to 
ICMP pings over ethernet, but since the login prompt is not up, the 
already running sshd/telnetd do not provide any help.

The strange thing is that if I remove all the quiet options (-q) from the 
above commands, everything works as it should. Furthermore, if I 
comment them out of rc.local, then boot, log in, and execute them by 
hand (with -q), again everything works as it should. The lockup only happens 
if I run 2 or more hdparm commands; if I leave only one hdparm command 
(it doesn't matter which one) in rc.local (with -q), it works.

This does not happen with kernels up to 2.6.20.14, and I have been using 
the same hdparm options for over a year, during which the hardware hasn't 
changed at all. 

Speaking of hardware:
Pentium 4 HT, ICH5 IDE Controller, running on SMP/HT kernel 
(ticks enabled @ 1000 Hz, PREEMPT/low-latency is on, 
CONFIG_BLK_DEV_IDEDMA=y).
hda and hdb are Hard drives.
hdc and hdd are DVD drives (hdc is a recorder).


Can this be regarded as a kernel bug at all ? Can I do something to properly 
debug it and help you out ?

I posted it here because I couldn't help noticing the following inside .21's Changelog:

commit 8799620400b0b1a4729d8be828b5bfb3d2a8db1a
Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date:   Mon Mar 26 23:03:19 2007 +0200

    ide: fix locking for manual DMA enable/disable ("hdparm -d")
    
    Since hwif->ide_dma_check and hwif->ide_dma_on never queue any commands
    (ide_config_drive_speed() sets transfer mode using polling and has no error
    recovery) we are safe with setting hwgroup->busy for the time while DMA
    setting for a drive is changed (so it won't race against I/O commands in fly).
    
    I audited briefly all ->ide_dma_check/->ide_dma_on/->tuneproc/->speedproc
    implementations and they all look OK wrt to this change.
    
    This patch finally allowed me to close kernel bugzilla bug #8169
    (once again thanks to Patrick Horn for reporting the issue & testing patches).
    
    Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
    Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
    Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>


-- 
Thanos Kyritsis <djart at linux.gr>

- What's your ONE purpose in life ?
- To explode, of course! ;-)


* Re: [PROBLEM]: hdparm strange behaviour for 2.6.21 and later
  2007-06-16 16:58 [PROBLEM]: hdparm strange behaviour for 2.6.21 and later Thanos Kyritsis
@ 2007-06-18 20:01 ` Mark Lord
  2007-06-20 15:07   ` Thanos Kyritsis
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Lord @ 2007-06-18 20:01 UTC
  To: Thanos Kyritsis; +Cc: linux-ide, Bartlomiej Zolnierkiewicz

Thanos Kyritsis wrote:
> Hello,
> 
> starting with kernel 2.6.21 and up to kernel 2.6.22-rc4, I'm having the 
> following problem:
> 
> /etc/rc.d/rc.local contains the following:
> /usr/sbin/hdparm -q -d1 -q -u1 -q -c1 -q -k1 /dev/hda
> /usr/sbin/hdparm -q -d1 -q -u1 -q -c1 -q -k1 /dev/hdb
> /usr/sbin/hdparm -q -d1 -q -u1 -q -c1 -q -k1 /dev/hdc
> /usr/sbin/hdparm -q -d1 -q -u1 -q -c1 -q -k1 /dev/hdd
> 
> (I'm using Slackware, no Debian-style automated hdparm.conf is running 
> during bootup, that's why these are in rc.local)
> 
> The above seems to somehow lock up the boot procedure just at the point 
> where rc.local gets executed, so the system never reaches the login prompt.
> All drivers (kernelspace) and system daemons (userspace) before rc.local 
> load normally, and there are no strange messages on the console or in 
> the system logs. Because I cannot log in, I cannot trace it any further. 
> I believe the kernel is still running because the machine responds to 
> ICMP pings over ethernet, but since the login prompt is not up, the 
> already running sshd/telnetd do not provide any help.
> 
> The strange thing is that if I remove all the quiet options (-q) from the 
> above commands, everything works as it should. Furthermore, if I 
> comment them out of rc.local, then boot, log in, and execute them by 
> hand (with -q), again everything works as it should. The lockup only happens 
> if I run 2 or more hdparm commands; if I leave only one hdparm command 
> (it doesn't matter which one) in rc.local (with -q), it works.

Sounds like a (kernel) timing issue.
The "-q" option gets rid of some intermediary printf's,
and nothing else.  So with -q, the ioctl() calls happen
much closer together in time.  Without -q, the intermediary
printf's likely cause a resched, giving the kernel more time
to complete anything left over from the earlier call.
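
To make the timing argument concrete, here is a minimal userspace sketch in C of the kind of back-to-back ioctl() sequence that -d1 -u1 -c1 -k1 amounts to. The HDIO_SET_* request codes come from <linux/hdreg.h>; the helper name apply_settings(), the message text, and the order of the calls are assumptions for illustration and are not taken from hdparm's source.

```c
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/hdreg.h>   /* HDIO_SET_* request codes */

/*
 * Issue the four settings from "hdparm -d1 -u1 -c1 -k1" one after another.
 * Returns the number of ioctl() calls that failed.
 * apply_settings() is a hypothetical helper, not hdparm's real code.
 */
int apply_settings(int fd, int quiet)
{
    static const struct {
        unsigned long request;
        const char *label;
    } steps[] = {
        { HDIO_SET_DMA,          "using_dma (-d1)"     },
        { HDIO_SET_UNMASKINTR,   "unmaskirq (-u1)"     },
        { HDIO_SET_32BIT,        "IO_support (-c1)"    },
        { HDIO_SET_KEEPSETTINGS, "keep_settings (-k1)" },
    };
    int failures = 0;

    for (size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
        /* Without -q there is console output between the ioctl() calls. */
        if (!quiet)
            printf(" setting %s to 1\n", steps[i].label);
        if (ioctl(fd, steps[i].request, 1UL) != 0)
            failures++;
    }
    return failures;
}
```

With quiet set, nothing happens between consecutive ioctl() calls; without it, each call is preceded by a printf(), which is the only difference attributed to -q above. Run against a real device such as /dev/hda this needs root and changes drive settings, so treat it as an illustration rather than a tool.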

????

Any difference with a modern version of hdparm?

-ml


* Re: [PROBLEM]: hdparm strange behaviour for 2.6.21 and later
  2007-06-18 20:01 ` Mark Lord
@ 2007-06-20 15:07   ` Thanos Kyritsis
  2007-06-23 18:28     ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 6+ messages in thread
From: Thanos Kyritsis @ 2007-06-20 15:07 UTC
  To: Mark Lord; +Cc: linux-ide, Bartlomiej Zolnierkiewicz

On Monday 18 June 2007, Mark Lord wrote:
> Thanos Kyritsis wrote:
[snip]
> > /etc/rc.d/rc.local contains the following:
> > /usr/sbin/hdparm -q -d1 -q -u1 -q -c1 -q -k1 /dev/hda
> > /usr/sbin/hdparm -q -d1 -q -u1 -q -c1 -q -k1 /dev/hdb
[snip]

> Sounds like a (kernel) timing issue.
> The "-q" option gets rid of some intermediary printf's,
> and nothing else.  So with -q, the ioctl() calls happen
> much closer together in time.  Without -q, the intermediary
> printf's likely cause a resched, giving the kernel more time
> to complete anything left over from the earlier call.
>
> ????
>
> Any difference with a modern version of hdparm?

The same issue happens when using hdparm 7.4 as well as 7.5.


> -ml



-- 
Thanos Kyritsis <djart at linux.gr>

- What's your ONE purpose in life ?
- To explode, of course! ;-)


* Re: [PROBLEM]: hdparm strange behaviour for 2.6.21 and later
  2007-06-20 15:07   ` Thanos Kyritsis
@ 2007-06-23 18:28     ` Bartlomiej Zolnierkiewicz
  2007-06-24 17:47       ` Thanos Kyritsis
  0 siblings, 1 reply; 6+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2007-06-23 18:28 UTC
  To: Thanos Kyritsis; +Cc: Mark Lord, linux-ide


Hi,

On Wednesday 20 June 2007, Thanos Kyritsis wrote:
> On Monday 18 June 2007, Mark Lord wrote:
> > Thanos Kyritsis wrote:
> [snip]
> > > /etc/rc.d/rc.local contains the following:
> > > /usr/sbin/hdparm -q -d1 -q -u1 -q -c1 -q -k1 /dev/hda
> > > /usr/sbin/hdparm -q -d1 -q -u1 -q -c1 -q -k1 /dev/hdb
> [snip]
> 
> > Sounds like a (kernel) timing issue.
> > The "-q" option gets rid of some intermediary printf's,
> > and nothing else.  So with -q, the ioctl() calls happen
> > much closer together in time.  Without -q, the intermediary
> > printf's likely cause a resched, giving the kernel more time
> > to complete anything left over from the earlier call.

It could be that some of the assumptions I made when
fixing the DMA tuning locking were wrong...

> > ????
> >
> > Any difference with a modern version of hdparm?
> 
> The same issue happens when using hdparm 7.4 as well as 7.5.

Adding a couple of printk-s to ide.c::set_using_dma() and
ide.c::ide_spin_wait_hwgroup() will for sure help in debugging
it further.

Also could you try running a UP kernel without PREEMPT and see
if it makes a difference?

Thanks,
Bart


* Re: [PROBLEM]: hdparm strange behaviour for 2.6.21 and later
  2007-06-23 18:28     ` Bartlomiej Zolnierkiewicz
@ 2007-06-24 17:47       ` Thanos Kyritsis
  2007-06-27 19:46         ` PREEMPT bug? (was: Re: [PROBLEM]: hdparm strange behaviour for 2.6.21 and later) Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 6+ messages in thread
From: Thanos Kyritsis @ 2007-06-24 17:47 UTC
  To: Bartlomiej Zolnierkiewicz; +Cc: Mark Lord, linux-ide

On Saturday 23 June 2007, Bartlomiej Zolnierkiewicz wrote:
> Hi,

Hello, thanks for answering :-D


> Also could you try running a UP kernel without PREEMPT and see
> if it makes a difference?

Yes, it did make a difference. I've just tried both UP and SMP kernels 
(22-rc5) *without* PREEMPT. Neither of these locked!

I also tried UP with PREEMPT and it locks.

> Adding a couple of printk-s to ide.c::set_using_dma() and
> ide.c::ide_spin_wait_hwgroup() will for sure help in debugging
> it further.

I just tried that as well. I placed printk-s when entering and exiting 
these 2 functions and when they use spin_lock_irq() and 
spin_unlock_irq(), and then watched the output:

I didn't see anything odd. For every hdparm execution, set_using_dma() 
is called, it successfully calls ide_spin_wait_hwgroup(), then 
set_using_dma() finishes.
After set_using_dma() finishes, ide_spin_wait_hwgroup() is called 
once more and finishes.

This pattern happens always, no matter if the kernel will eventually 
lock (preempt) or not lock (no preempt), with absolutely no 
differences. 

entering ide.c::set_using_dma()
|
  entering ide.c::ide_spin_wait_hwgroup()
    ide.c::ide_spin_wait_hwgroup: LOCK
  exiting ide.c::ide_spin_wait_hwgroup()
|
ide.c::set_using_dma: set ->busy flag, unlock and let it ride
ide.c::set_using_dma: UNLOCK
ide.c::set_using_dma: lock, clear ->busy flag and unlock before leaving
ide.c::set_using_dma: LOCK
ide.c::set_using_dma: UNLOCK
|
exiting ide.c::set_using_dma()

entering ide.c::ide_spin_wait_hwgroup()
ide.c::ide_spin_wait_hwgroup: LOCK
exiting ide.c::ide_spin_wait_hwgroup()

The above gets printed twice (one for the hda hdparm and one for hdb).

But I noticed something extra. 

Sometimes, the ide_spin_wait_hwgroup() that runs either before the 1st 
set_using_dma() or between the 1st and the 2nd (1st for hda, 2nd for 
hdb) (*but never the one called last*) is waiting A LOT inside the busy 
loop (while (hwgroup->busy)).

I think PREEMPT kernels always produce a lot of this while loop output, 
then print the above pattern, then lock up.

Non-PREEMPT kernels don't always produce the huge while loop output, 
only sometimes, but they never lock up.
I don't know if this is at all relevant to the problem. Perhaps it's 
normal that during some bootups the IDE device group is busy 
while during others it's not, right ?
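
The handshake visible in the trace above can be modelled in userspace. The following is a rough sketch, not the kernel code: struct hwgroup_model, wait_not_busy() and model_set_using_dma() are hypothetical names, a C11 atomic_flag stands in for ide_lock, and a spin count stands in for whatever time-based limit the real ide_spin_wait_hwgroup() uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the hwgroup state touched in the trace. */
struct hwgroup_model {
    atomic_flag lock;   /* plays the role of ide_lock */
    bool busy;          /* plays the role of hwgroup->busy */
};

void model_lock(struct hwgroup_model *g)
{
    while (atomic_flag_test_and_set(&g->lock))
        ;   /* spin until the flag was previously clear */
}

void model_unlock(struct hwgroup_model *g)
{
    atomic_flag_clear(&g->lock);
}

/*
 * Wait for busy to clear. Returns 0 with the lock held, or -1 with the
 * lock released after more than max_spins polls found busy still set.
 */
int wait_not_busy(struct hwgroup_model *g, unsigned long max_spins)
{
    unsigned long spins = 0;

    model_lock(g);
    while (g->busy) {
        model_unlock(g);            /* give the current owner a chance */
        if (++spins > max_spins)
            return -1;              /* modelled timeout */
        model_lock(g);
    }
    return 0;
}

int model_set_using_dma(struct hwgroup_model *g, unsigned long max_spins)
{
    if (wait_not_busy(g, max_spins) != 0)
        return -1;

    g->busy = true;                 /* set ->busy flag, unlock and let it ride */
    model_unlock(g);

    /* the DMA setting itself would be changed here, outside the lock */

    model_lock(g);                  /* lock, clear ->busy flag and unlock */
    g->busy = false;
    model_unlock(g);
    return 0;
}
```

In this single-threaded form, a caller that arrives while busy is set simply spins until the limit is reached. Nothing here reproduces the PREEMPT lockup; the sketch only names the steps that the printk output walks through.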


However, since all functions exit properly, should I try to place 
printk-s in other functions as well ? 

(I kind of need and appreciate guidance in order to help you, because 
I've never been in the kernel hacking business before :) )

> Thanks,
> Bart


-- 
Thanos Kyritsis <djart at linux.gr>

- What's your ONE purpose in life ?
- To explode, of course! ;-)


* PREEMPT bug? (was: Re: [PROBLEM]: hdparm strange behaviour for 2.6.21 and later)
  2007-06-24 17:47       ` Thanos Kyritsis
@ 2007-06-27 19:46         ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 6+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2007-06-27 19:46 UTC
  To: Thanos Kyritsis; +Cc: Mark Lord, linux-ide, linux-kernel


Hi,

On Sunday 24 June 2007, Thanos Kyritsis wrote:
> On Saturday 23 June 2007, Bartlomiej Zolnierkiewicz wrote:
> > Hi,
> 
> Hello, thanks for answering :-D
> 
> 
> > Also could you try running a UP kernel without PREEMPT and see
> > if it makes a difference?
> 
> Yes, it did make a difference. I've just tried both UP and SMP kernels 
> (22-rc5) *without* PREEMPT. Neither of these locked!

Thanks for testing.

IIRC an SMP kernel without PREEMPT should also fail (or not?);
it could be that we are hitting some generic PREEMPT bug.

Cc:ed linux-kernel@ in hope that some PREEMPT guru lends us a hand.

[ original thread is here:
 http://www.mail-archive.com/linux-ide@vger.kernel.org/msg07380.html ]

> I also tried UP with PREEMPT and it locks.
> 
> > Adding a couple of printk-s to ide.c::set_using_dma() and
> > ide.c::ide_spin_wait_hwgroup() will for sure help in debugging
> > it further.
> 
> I just tried that as well. I placed printk-s when entering and exiting 
> these 2 functions and when they use spin_lock_irq() and 
> spin_unlock_irq(), and then watched the output:
> 
> I didn't see anything odd. For every hdparm execution, set_using_dma() 
> is called, it successfully calls ide_spin_wait_hwgroup(), then 
> set_using_dma() finishes.
> After set_using_dma() finishes, ide_spin_wait_hwgroup() is called 
> once more and finishes.
> 
> This pattern happens always, no matter if the kernel will eventually 
> lock (preempt) or not lock (no preempt), with absolutely no 
> differences. 
> 
> entering ide.c::set_using_dma()
> |
>   entering ide.c::ide_spin_wait_hwgroup()
>     ide.c::ide_spin_wait_hwgroup: LOCK
>   exiting ide.c::ide_spin_wait_hwgroup()
> |
> ide.c::set_using_dma: set ->busy flag, unlock and let it ride
> ide.c::set_using_dma: UNLOCK
> ide.c::set_using_dma: lock, clear ->busy flag and unlock before leaving
> ide.c::set_using_dma: LOCK
> ide.c::set_using_dma: UNLOCK
> |
> exiting ide.c::set_using_dma()
> 
> entering ide.c::ide_spin_wait_hwgroup()
> ide.c::ide_spin_wait_hwgroup: LOCK
> exiting ide.c::ide_spin_wait_hwgroup()
> 
> The above gets printed twice (one for the hda hdparm and one for hdb).
> 
> But I noticed something extra. 
> 
> Sometimes, the ide_spin_wait_hwgroup() that runs either before the 1st 
> set_using_dma() or between the 1st and the 2nd (1st for hda, 2nd for 
> hdb) (*but never the one called last*) is waiting A LOT inside the busy 
> loop (while (hwgroup->busy)).
> 
> I think PREEMPT kernels always produce a lot of this while loop output, 
> then print the above pattern, then lock up.
> 
> Non-PREEMPT kernels don't always produce the huge while loop output, 
> only sometimes, but they never lock up.
> I don't know if this is at all relevant to the problem. Perhaps it's 
> normal that during some bootups the IDE device group is busy 
> while during others it's not, right ?

Yes, this is expected behavior (especially when you mix SMP in)
unless of course ide_spin_wait_hwgroup() fails (times out).

> However, since all functions exit properly, should I try to place 
> printk-s in other functions as well ? 

ide_do_request() and the hwgroup->busy flag would be the next places to
look, but this could produce *a*lot* of output (a serial or net console
would be required to capture the log for the lockup case).

> (I kind of need and appreciate guidance in order to help you, because 
> I've never been in the kernel hacking business before :) )

No problem, happy hacking. :)

Thanks,
Bart
