public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* PROBLEM: CONFIG_NO_HZ could cause software timeouts
@ 2009-09-03 10:50 Norbert van Bolhuis
  2009-09-03 11:22 ` Corrado Zoccolo
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Norbert van Bolhuis @ 2009-09-03 10:50 UTC (permalink / raw)
  To: linux-kernel


The problem occurs when e.g. drivers use time_after(jiffes, timeout).

CONFIG_NO_HZ could make jiffies advance by more than 1.
This is done by: tick_nohz_update_jiffies->tick_do_update_jiffies64->do_timer

If drivers use a timeout value of jiffies+1,
"time_after(jiffies, timeout)" will be true after 1 interrupt
(given that it advances jiffies by at least 2).

This is exactly what happens in cfi_cmdset_0002.c:do_write_buffer
for our case (Powerpc MPC8313, linux-2.6.28, CONFIG_HZ=250, CONFIG_NO_HZ=y).

do_write_buffer does the following:
  unsigned long uWriteTimeout = ( HZ / 1000 ) + 1;
  ...
  timeo = jiffies + uWriteTimeout;
  ...
  for (;;) {
   ...
   if (time_after(jiffies, timeo) && !chip_ready(map, adr))
    break;
   if (chip_ready(map, adr)) {
    xip_enable(map, chip, adr);
    goto op_done;
   }
   UDELAY(map, chip, adr, 1);
  }
  /* software timeout */
  ret = -EIO;
opdone:
  ...

I've seen a few software timeouts after the for-loop
looped only 13 times (= 13 us delay, i.s.o. the expected 1 ms). Typically
our NOR flash (S29GL01GP) may need upto ~ 200 us to be ready.

disabling CONFIG_NO_HZ fixes the problem.
replacing time_after by a for-loop counter to loop max 1000 times
also fixes the problem.

the latest kernel seems to have the same problem.

do I miss something here or is this a known problem of CONFIG_NO_HZ ?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PROBLEM: CONFIG_NO_HZ could cause software timeouts
  2009-09-03 10:50 PROBLEM: CONFIG_NO_HZ could cause software timeouts Norbert van Bolhuis
@ 2009-09-03 11:22 ` Corrado Zoccolo
  2009-09-03 11:59   ` Norbert van Bolhuis
  2009-09-05 18:19 ` Marcin Slusarz
  2009-09-06  5:27 ` Pavel Machek
  2 siblings, 1 reply; 7+ messages in thread
From: Corrado Zoccolo @ 2009-09-03 11:22 UTC (permalink / raw)
  To: Norbert van Bolhuis; +Cc: linux-kernel

Is XIP enabled? It seems that the xip_udelay is not very accurate, and
may be the cause of what you are seeing.

On Thu, Sep 3, 2009 at 12:50 PM, Norbert van
Bolhuis<nvbolhuis@aimvalley.nl> wrote:
>
> The problem occurs when e.g. drivers use time_after(jiffes, timeout).
>
> CONFIG_NO_HZ could make jiffies advance by more than 1.
> This is done by:
> tick_nohz_update_jiffies->tick_do_update_jiffies64->do_timer
>
> If drivers use a timeout value of jiffies+1,
> "time_after(jiffies, timeout)" will be true after 1 interrupt
> (given that it advances jiffies by at least 2).
>
> This is exactly what happens in cfi_cmdset_0002.c:do_write_buffer
> for our case (Powerpc MPC8313, linux-2.6.28, CONFIG_HZ=250, CONFIG_NO_HZ=y).
>
> do_write_buffer does the following:
>  unsigned long uWriteTimeout = ( HZ / 1000 ) + 1;
>  ...
>  timeo = jiffies + uWriteTimeout;
>  ...
>  for (;;) {
>  ...
>  if (time_after(jiffies, timeo) && !chip_ready(map, adr))
>   break;
>  if (chip_ready(map, adr)) {
>   xip_enable(map, chip, adr);
>   goto op_done;
>  }
>  UDELAY(map, chip, adr, 1);
>  }
>  /* software timeout */
>  ret = -EIO;
> opdone:
>  ...
>
> I've seen a few software timeouts after the for-loop
> looped only 13 times (= 13 us delay, i.s.o. the expected 1 ms). Typically
> our NOR flash (S29GL01GP) may need upto ~ 200 us to be ready.
>
> disabling CONFIG_NO_HZ fixes the problem.
> replacing time_after by a for-loop counter to loop max 1000 times
> also fixes the problem.
>
> the latest kernel seems to have the same problem.
>
> do I miss something here or is this a known problem of CONFIG_NO_HZ ?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>



-- 
__________________________________________________________________________

dott. Corrado Zoccolo                          mailto:czoccolo@gmail.com
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
                               Tales of Power - C. Castaneda

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PROBLEM: CONFIG_NO_HZ could cause software timeouts
  2009-09-03 11:22 ` Corrado Zoccolo
@ 2009-09-03 11:59   ` Norbert van Bolhuis
  0 siblings, 0 replies; 7+ messages in thread
From: Norbert van Bolhuis @ 2009-09-03 11:59 UTC (permalink / raw)
  To: Corrado Zoccolo; +Cc: linux-kernel

Corrado Zoccolo wrote:
> Is XIP enabled? It seems that the xip_udelay is not very accurate, and
> may be the cause of what you are seeing.
> 

no it isn't. CONFIG_MTD_XIP has been disabled.
nor did someone supsended the write (chip->state remains FL_WRITING).

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PROBLEM: CONFIG_NO_HZ could cause software timeouts
  2009-09-03 10:50 PROBLEM: CONFIG_NO_HZ could cause software timeouts Norbert van Bolhuis
  2009-09-03 11:22 ` Corrado Zoccolo
@ 2009-09-05 18:19 ` Marcin Slusarz
  2009-09-06  5:58   ` Pavel Machek
  2009-09-06  5:27 ` Pavel Machek
  2 siblings, 1 reply; 7+ messages in thread
From: Marcin Slusarz @ 2009-09-05 18:19 UTC (permalink / raw)
  To: Norbert van Bolhuis; +Cc: linux-kernel

Norbert van Bolhuis wrote:
> 
> The problem occurs when e.g. drivers use time_after(jiffes, timeout).
> 
> CONFIG_NO_HZ could make jiffies advance by more than 1.
> This is done by:
> tick_nohz_update_jiffies->tick_do_update_jiffies64->do_timer
> 
> If drivers use a timeout value of jiffies+1,
> "time_after(jiffies, timeout)" will be true after 1 interrupt
> (given that it advances jiffies by at least 2).
> 
> This is exactly what happens in cfi_cmdset_0002.c:do_write_buffer
> for our case (Powerpc MPC8313, linux-2.6.28, CONFIG_HZ=250,
> CONFIG_NO_HZ=y).
> 
> do_write_buffer does the following:
>  unsigned long uWriteTimeout = ( HZ / 1000 ) + 1;
>  ...
>  timeo = jiffies + uWriteTimeout;
>  ...
>  for (;;) {
>   ...
>   if (time_after(jiffies, timeo) && !chip_ready(map, adr))
>    break;
>   if (chip_ready(map, adr)) {
>    xip_enable(map, chip, adr);
>    goto op_done;
>   }
>   UDELAY(map, chip, adr, 1);
>  }
>  /* software timeout */
>  ret = -EIO;
> opdone:
>  ...
> 
> I've seen a few software timeouts after the for-loop
> looped only 13 times (= 13 us delay, i.s.o. the expected 1 ms). Typically

Are you sure? UDELAY may call schedule(), which can return to this thread
after much longer time than 13us...


Marcin

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PROBLEM: CONFIG_NO_HZ could cause software timeouts
  2009-09-03 10:50 PROBLEM: CONFIG_NO_HZ could cause software timeouts Norbert van Bolhuis
  2009-09-03 11:22 ` Corrado Zoccolo
  2009-09-05 18:19 ` Marcin Slusarz
@ 2009-09-06  5:27 ` Pavel Machek
  2 siblings, 0 replies; 7+ messages in thread
From: Pavel Machek @ 2009-09-06  5:27 UTC (permalink / raw)
  To: Norbert van Bolhuis; +Cc: linux-kernel, Thomas Gleixner

On Thu 2009-09-03 12:50:12, Norbert van Bolhuis wrote:
>
> The problem occurs when e.g. drivers use time_after(jiffes, timeout).
>
> CONFIG_NO_HZ could make jiffies advance by more than 1.
> This is done by: tick_nohz_update_jiffies->tick_do_update_jiffies64->do_timer

jiffies really should be up-to-date as long as you are running
anything else than idle loop. If not, that's a bug ... and nasty one.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PROBLEM: CONFIG_NO_HZ could cause software timeouts
  2009-09-05 18:19 ` Marcin Slusarz
@ 2009-09-06  5:58   ` Pavel Machek
  2009-09-06 10:11     ` Marcin Slusarz
  0 siblings, 1 reply; 7+ messages in thread
From: Pavel Machek @ 2009-09-06  5:58 UTC (permalink / raw)
  To: Marcin Slusarz; +Cc: Norbert van Bolhuis, linux-kernel

On Sat 2009-09-05 20:19:46, Marcin Slusarz wrote:
> Norbert van Bolhuis wrote:
> > 
> > The problem occurs when e.g. drivers use time_after(jiffes, timeout).
> > 
> > CONFIG_NO_HZ could make jiffies advance by more than 1.
> > This is done by:
> > tick_nohz_update_jiffies->tick_do_update_jiffies64->do_timer
> > 
> > If drivers use a timeout value of jiffies+1,
> > "time_after(jiffies, timeout)" will be true after 1 interrupt
> > (given that it advances jiffies by at least 2).
> > 
> > This is exactly what happens in cfi_cmdset_0002.c:do_write_buffer
> > for our case (Powerpc MPC8313, linux-2.6.28, CONFIG_HZ=250,
> > CONFIG_NO_HZ=y).
> > 
> > do_write_buffer does the following:
> >  unsigned long uWriteTimeout = ( HZ / 1000 ) + 1;
> >  ...
> >  timeo = jiffies + uWriteTimeout;
> >  ...
> >  for (;;) {
> >   ...
> >   if (time_after(jiffies, timeo) && !chip_ready(map, adr))
> >    break;
> >   if (chip_ready(map, adr)) {
> >    xip_enable(map, chip, adr);
> >    goto op_done;
> >   }
> >   UDELAY(map, chip, adr, 1);
> >  }
> >  /* software timeout */
> >  ret = -EIO;
> > opdone:
> >  ...
> > 
> > I've seen a few software timeouts after the for-loop
> > looped only 13 times (= 13 us delay, i.s.o. the expected 1 ms). Typically
> 
> Are you sure? UDELAY may call schedule(), which can return to this thread
> after much longer time than 13us...

Too long wait is expected, but AFAICS he's complaining about too short
delay and that's a hard bug.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PROBLEM: CONFIG_NO_HZ could cause software timeouts
  2009-09-06  5:58   ` Pavel Machek
@ 2009-09-06 10:11     ` Marcin Slusarz
  0 siblings, 0 replies; 7+ messages in thread
From: Marcin Slusarz @ 2009-09-06 10:11 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Norbert van Bolhuis, linux-kernel

Pavel Machek wrote:
> On Sat 2009-09-05 20:19:46, Marcin Slusarz wrote:
>> Norbert van Bolhuis wrote:
>>> The problem occurs when e.g. drivers use time_after(jiffes, timeout).
>>>
>>> CONFIG_NO_HZ could make jiffies advance by more than 1.
>>> This is done by:
>>> tick_nohz_update_jiffies->tick_do_update_jiffies64->do_timer
>>>
>>> If drivers use a timeout value of jiffies+1,
>>> "time_after(jiffies, timeout)" will be true after 1 interrupt
>>> (given that it advances jiffies by at least 2).
>>>
>>> This is exactly what happens in cfi_cmdset_0002.c:do_write_buffer
>>> for our case (Powerpc MPC8313, linux-2.6.28, CONFIG_HZ=250,
>>> CONFIG_NO_HZ=y).
>>>
>>> do_write_buffer does the following:
>>>  unsigned long uWriteTimeout = ( HZ / 1000 ) + 1;
>>>  ...
>>>  timeo = jiffies + uWriteTimeout;
>>>  ...
>>>  for (;;) {
>>>   ...
>>>   if (time_after(jiffies, timeo) && !chip_ready(map, adr))
>>>    break;
>>>   if (chip_ready(map, adr)) {
>>>    xip_enable(map, chip, adr);
>>>    goto op_done;
>>>   }
>>>   UDELAY(map, chip, adr, 1);
>>>  }
>>>  /* software timeout */
>>>  ret = -EIO;
>>> opdone:
>>>  ...
>>>
>>> I've seen a few software timeouts after the for-loop
>>> looped only 13 times (= 13 us delay, i.s.o. the expected 1 ms). Typically
>> Are you sure? UDELAY may call schedule(), which can return to this thread
>> after much longer time than 13us...
> 
> Too long wait is expected, but AFAICS he's complaining about too short
> delay and that's a hard bug.

Yeah, I know. But conclusion is a bit fishy - 13 iterations don't necessarily mean 13us.
Bug might be elsewhere.

Marcin


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-09-06 10:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-09-03 10:50 PROBLEM: CONFIG_NO_HZ could cause software timeouts Norbert van Bolhuis
2009-09-03 11:22 ` Corrado Zoccolo
2009-09-03 11:59   ` Norbert van Bolhuis
2009-09-05 18:19 ` Marcin Slusarz
2009-09-06  5:58   ` Pavel Machek
2009-09-06 10:11     ` Marcin Slusarz
2009-09-06  5:27 ` Pavel Machek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox