* is minimum udelay() not respected in preemptible SMP kernel-2.6.23? @ 2007-11-07 17:21 Marin Mitov 2007-11-07 20:30 ` Andrew Morton 0 siblings, 1 reply; 13+ messages in thread From: Marin Mitov @ 2007-11-07 17:21 UTC (permalink / raw) To: linux-kernel Hi all, I have written a linux device driver for a frame grabber I use in my every day experimental work. In my device driver I have to write to a MMIO register, wait for a while (using udelay(65)) for data being written to an internal register (i2c?) and test a flag (in another MMIO register) if the operation has completed. (The hardware guarantees that the operation has completed in less than 65 usec). If the flag is not reset I write a message via printk. After switching to the kernel-2.6.23 (compiled as PREEMPTIBLE SMP i686) (AMD dual core) I see this message in dmesg output sometime. Testing with rdtscll() before and after udelay(65) shows the expected delay of 65 usec (after dividing by CPU frequency) when all is OK, but gives a big value (in the tenths msec range) when the error message shows itself in dmesg. Bracketing udelay(65) by: local_irq_disable(); udelay(65); local_irq_enable(); as well as by preempt_disable(); udelay(65); preempt_enable(); leads to message disappearing. I believe the hardware is working correctly, so if the flag is not reset I think udelay(65) returns prematurely (the flag clears some time latter) And it does not matter if I use udelay(65) or udelay(100). What could be the reason for such a behavior? Is this a bug in udelay() due to preemption? (udelay() being preempted and migrated to another processor) All my previous kernels used were SMP (but not PREEMPTIBLE) My kernel is compiled with: CONFIG_PREEMPT=y CONFIG_IRQBALANCE=y CONFIG_HPET_TIMER=y And I have this line in dmesg: Time: acpi_pm clocksource has been installed. Switched to high resolution mode on CPU 0 Switched to high resolution mode on CPU 1 The south bridge is: VIA VT8237 (Asus A8V Delux) Thank you in advance for your help in understanding where the problem is coming from. Best regards. Marin Mitov ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: is minimum udelay() not respected in preemptible SMP kernel-2.6.23? 2007-11-07 17:21 is minimum udelay() not respected in preemptible SMP kernel-2.6.23? Marin Mitov @ 2007-11-07 20:30 ` Andrew Morton 2007-11-07 23:10 ` Andi Kleen ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Andrew Morton @ 2007-11-07 20:30 UTC (permalink / raw) To: Marin Mitov; +Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Andi Kleen > On Wed, 7 Nov 2007 19:21:52 +0200 Marin Mitov <mitov@issp.bas.bg> wrote: > Hi all, > > I have written a linux device driver for a frame grabber I use in my > every day experimental work. > > In my device driver I have to write to a MMIO register, wait for a while > (using udelay(65)) for data being written to an internal register (i2c?) > and test a flag (in another MMIO register) if the operation has completed. > (The hardware guarantees that the operation has completed in less than > 65 usec). If the flag is not reset I write a message via printk. > After switching to the kernel-2.6.23 (compiled as PREEMPTIBLE SMP i686) > (AMD dual core) I see this message in dmesg output sometime. > > Testing with rdtscll() before and after udelay(65) shows the expected > delay of 65 usec (after dividing by CPU frequency) when all is OK, but > gives a big value (in the tenths msec range) when the error message > shows itself in dmesg. > > Bracketing udelay(65) by: > > local_irq_disable(); > udelay(65); > local_irq_enable(); > > as well as by > > preempt_disable(); > udelay(65); > preempt_enable(); > > leads to message disappearing. > > I believe the hardware is working correctly, so if the flag is not reset > I think udelay(65) returns prematurely (the flag clears some time latter) > And it does not matter if I use udelay(65) or udelay(100). > > What could be the reason for such a behavior? > Is this a bug in udelay() due to preemption? > (udelay() being preempted and migrated to another processor) > > All my previous kernels used were SMP (but not PREEMPTIBLE) > > My kernel is compiled with: > CONFIG_PREEMPT=y > CONFIG_IRQBALANCE=y > CONFIG_HPET_TIMER=y > > And I have this line in dmesg: > Time: acpi_pm clocksource has been installed. > Switched to high resolution mode on CPU 0 > Switched to high resolution mode on CPU 1 > > The south bridge is: VIA VT8237 (Asus A8V Delux) > > Thank you in advance for your help in understanding where > the problem is coming from. > Ow. Yes, from my reading delay_tsc() can return early (or after heat-death-of-the-universe) if the TSCs are offset and if preemption migrates the calling task between CPUs. I suppose a lameo fix would be to disable preemption in delay_tsc(). ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: is minimum udelay() not respected in preemptible SMP kernel-2.6.23? 2007-11-07 20:30 ` Andrew Morton @ 2007-11-07 23:10 ` Andi Kleen 2007-11-08 0:20 ` Matt Mackall 2007-11-08 11:24 ` Marin Mitov 2 siblings, 0 replies; 13+ messages in thread From: Andi Kleen @ 2007-11-07 23:10 UTC (permalink / raw) To: Andrew Morton; +Cc: Marin Mitov, linux-kernel, Thomas Gleixner, Ingo Molnar > Ow. Yes, from my reading delay_tsc() can return early (or after > heat-death-of-the-universe) if the TSCs are offset and if preemption > migrates the calling task between CPUs. > > I suppose a lameo fix would be to disable preemption in delay_tsc(). Yes. Can't think of any better reasonable fix. -Andi ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: is minimum udelay() not respected in preemptible SMP kernel-2.6.23? 2007-11-07 20:30 ` Andrew Morton 2007-11-07 23:10 ` Andi Kleen @ 2007-11-08 0:20 ` Matt Mackall 2007-11-08 0:31 ` Andi Kleen 2007-11-08 9:11 ` Peter Zijlstra 2007-11-08 11:24 ` Marin Mitov 2 siblings, 2 replies; 13+ messages in thread From: Matt Mackall @ 2007-11-08 0:20 UTC (permalink / raw) To: Andrew Morton Cc: Marin Mitov, linux-kernel, Thomas Gleixner, Ingo Molnar, Andi Kleen On Wed, Nov 07, 2007 at 12:30:45PM -0800, Andrew Morton wrote: > Ow. Yes, from my reading delay_tsc() can return early (or after > heat-death-of-the-universe) if the TSCs are offset and if preemption > migrates the calling task between CPUs. > > I suppose a lameo fix would be to disable preemption in delay_tsc(). preempt_disable is lousy documentation here. This and other cases (lots of per_cpu users, IIRC) actually want a migrate_disable() which is a proper subset. We can simply implement migrate_disable() as preempt_disable() for now and come back later and implement a proper migrate_disable() that still allows preemption (and thus avoids the latency). -- Mathematics is the supreme nostalgia of our time. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: is minimum udelay() not respected in preemptible SMP kernel-2.6.23? 2007-11-08 0:20 ` Matt Mackall @ 2007-11-08 0:31 ` Andi Kleen 2007-11-08 1:03 ` Matt Mackall 2007-11-08 11:46 ` Avi Kivity 2007-11-08 9:11 ` Peter Zijlstra 1 sibling, 2 replies; 13+ messages in thread From: Andi Kleen @ 2007-11-08 0:31 UTC (permalink / raw) To: Matt Mackall Cc: Andrew Morton, Marin Mitov, linux-kernel, Thomas Gleixner, Ingo Molnar On Thursday 08 November 2007 01:20, Matt Mackall wrote: > On Wed, Nov 07, 2007 at 12:30:45PM -0800, Andrew Morton wrote: > > Ow. Yes, from my reading delay_tsc() can return early (or after > > heat-death-of-the-universe) if the TSCs are offset and if preemption > > migrates the calling task between CPUs. > > > > I suppose a lameo fix would be to disable preemption in delay_tsc(). > > preempt_disable is lousy documentation here. This and other cases > (lots of per_cpu users, IIRC) actually want a migrate_disable() which > is a proper subset. We can simply implement migrate_disable() as > preempt_disable() for now and come back later and implement a proper > migrate_disable() that still allows preemption (and thus avoids the > latency). We could actually do this right now. migrate_disable() can be just changing the cpu affinity of the current thread to current cpu and then restoring it afterwards. That should even work from interrupt context. get_cpu() etc. could be changed to use this then too. -Andi ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: is minimum udelay() not respected in preemptible SMP kernel-2.6.23? 2007-11-08 0:31 ` Andi Kleen @ 2007-11-08 1:03 ` Matt Mackall 2007-11-08 1:20 ` Andi Kleen 2007-11-08 11:46 ` Avi Kivity 1 sibling, 1 reply; 13+ messages in thread From: Matt Mackall @ 2007-11-08 1:03 UTC (permalink / raw) To: Andi Kleen Cc: Andrew Morton, Marin Mitov, linux-kernel, Thomas Gleixner, Ingo Molnar On Thu, Nov 08, 2007 at 01:31:00AM +0100, Andi Kleen wrote: > On Thursday 08 November 2007 01:20, Matt Mackall wrote: > > On Wed, Nov 07, 2007 at 12:30:45PM -0800, Andrew Morton wrote: > > > Ow. Yes, from my reading delay_tsc() can return early (or after > > > heat-death-of-the-universe) if the TSCs are offset and if preemption > > > migrates the calling task between CPUs. > > > > > > I suppose a lameo fix would be to disable preemption in delay_tsc(). > > > > preempt_disable is lousy documentation here. This and other cases > > (lots of per_cpu users, IIRC) actually want a migrate_disable() which > > is a proper subset. We can simply implement migrate_disable() as > > preempt_disable() for now and come back later and implement a proper > > migrate_disable() that still allows preemption (and thus avoids the > > latency). > > We could actually do this right now. migrate_disable() can be just changing > the cpu affinity of the current thread to current cpu and then restoring it > afterwards. That should even work from interrupt context. Yes, that's one way. But we need somewhere to stash the old flags. Expanding the task struct sucks. Jamming another bit in the preempt count sucks. But I think we'd be best off stashing a single bit somewhere and checking it at migrate time (relatively infrequent) rather than copying and zeroing out a potentially enormous affinity mask every time we disable migration (often, and in fast paths). Perhaps adding TASK_PINNED to the task state flags would do it? > get_cpu() etc. could be changed to use this then too. Some users of get_cpu might be relying on it to avoid actual preemption. In other words, we should have introduced a migrate_disable() when we first discovered the preempt/per_cpu conflict. -- Mathematics is the supreme nostalgia of our time. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: is minimum udelay() not respected in preemptible SMP kernel-2.6.23? 2007-11-08 1:03 ` Matt Mackall @ 2007-11-08 1:20 ` Andi Kleen 2007-11-08 2:44 ` Matt Mackall 0 siblings, 1 reply; 13+ messages in thread From: Andi Kleen @ 2007-11-08 1:20 UTC (permalink / raw) To: Matt Mackall Cc: Andrew Morton, Marin Mitov, linux-kernel, Thomas Gleixner, Ingo Molnar > But I think we'd be best off stashing a single bit somewhere and > checking it at migrate time (relatively infrequent) rather than > copying and zeroing out a potentially enormous affinity mask every > time we disable migration (often, and in fast paths). Perhaps adding > TASK_PINNED to the task state flags would do it? It would need to be a count to be able to nest it. > > get_cpu() etc. could be changed to use this then too. > > Some users of get_cpu might be relying on it to avoid actual > preemption. In other words, we should have introduced a > migrate_disable() when we first discovered the preempt/per_cpu > conflict. Ok perhaps it would make sense to migrate it step by step :- define a replacement for get_cpu and migrate over as users are getting audited and eventually deprecate old one. -Andi ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: is minimum udelay() not respected in preemptible SMP kernel-2.6.23? 2007-11-08 1:20 ` Andi Kleen @ 2007-11-08 2:44 ` Matt Mackall 0 siblings, 0 replies; 13+ messages in thread From: Matt Mackall @ 2007-11-08 2:44 UTC (permalink / raw) To: Andi Kleen Cc: Andrew Morton, Marin Mitov, linux-kernel, Thomas Gleixner, Ingo Molnar On Thu, Nov 08, 2007 at 02:20:58AM +0100, Andi Kleen wrote: > > > But I think we'd be best off stashing a single bit somewhere and > > checking it at migrate time (relatively infrequent) rather than > > copying and zeroing out a potentially enormous affinity mask every > > time we disable migration (often, and in fast paths). Perhaps adding > > TASK_PINNED to the task state flags would do it? > > It would need to be a count to be able to nest it. Ahh, right. Suppose that means fattening the task struct until someone comes up with something more clever. -- Mathematics is the supreme nostalgia of our time. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: is minimum udelay() not respected in preemptible SMP kernel-2.6.23? 2007-11-08 0:31 ` Andi Kleen 2007-11-08 1:03 ` Matt Mackall @ 2007-11-08 11:46 ` Avi Kivity 2007-11-08 15:10 ` Matt Mackall 1 sibling, 1 reply; 13+ messages in thread From: Avi Kivity @ 2007-11-08 11:46 UTC (permalink / raw) To: Andi Kleen Cc: Matt Mackall, Andrew Morton, Marin Mitov, linux-kernel, Thomas Gleixner, Ingo Molnar Andi Kleen wrote: > On Thursday 08 November 2007 01:20, Matt Mackall wrote: > >> On Wed, Nov 07, 2007 at 12:30:45PM -0800, Andrew Morton wrote: >> >>> Ow. Yes, from my reading delay_tsc() can return early (or after >>> heat-death-of-the-universe) if the TSCs are offset and if preemption >>> migrates the calling task between CPUs. >>> >>> I suppose a lameo fix would be to disable preemption in delay_tsc(). >>> >> preempt_disable is lousy documentation here. This and other cases >> (lots of per_cpu users, IIRC) actually want a migrate_disable() which >> is a proper subset. We can simply implement migrate_disable() as >> preempt_disable() for now and come back later and implement a proper >> migrate_disable() that still allows preemption (and thus avoids the >> latency). >> > > We could actually do this right now. migrate_disable() can be just changing > the cpu affinity of the current thread to current cpu and then restoring it > afterwards. That should even work from interrupt context. > > get_cpu() etc. could be changed to use this then too. > > What if some other thread calls sched_setaffinity() on the migrate_disable()d cpu? we'd need to detect this to avoid migrate_enable() stomping on sched_setaffinity()'s work. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: is minimum udelay() not respected in preemptible SMP kernel-2.6.23? 2007-11-08 11:46 ` Avi Kivity @ 2007-11-08 15:10 ` Matt Mackall 0 siblings, 0 replies; 13+ messages in thread From: Matt Mackall @ 2007-11-08 15:10 UTC (permalink / raw) To: Avi Kivity Cc: Andi Kleen, Andrew Morton, Marin Mitov, linux-kernel, Thomas Gleixner, Ingo Molnar On Thu, Nov 08, 2007 at 01:46:48PM +0200, Avi Kivity wrote: > Andi Kleen wrote: > >On Thursday 08 November 2007 01:20, Matt Mackall wrote: > > > >>On Wed, Nov 07, 2007 at 12:30:45PM -0800, Andrew Morton wrote: > >> > >>>Ow. Yes, from my reading delay_tsc() can return early (or after > >>>heat-death-of-the-universe) if the TSCs are offset and if preemption > >>>migrates the calling task between CPUs. > >>> > >>>I suppose a lameo fix would be to disable preemption in delay_tsc(). > >>> > >>preempt_disable is lousy documentation here. This and other cases > >>(lots of per_cpu users, IIRC) actually want a migrate_disable() which > >>is a proper subset. We can simply implement migrate_disable() as > >>preempt_disable() for now and come back later and implement a proper > >>migrate_disable() that still allows preemption (and thus avoids the > >>latency). > >> > > > >We could actually do this right now. migrate_disable() can be just changing > >the cpu affinity of the current thread to current cpu and then restoring > >it afterwards. That should even work from interrupt context. > > > >get_cpu() etc. could be changed to use this then too. > > > > > > What if some other thread calls sched_setaffinity() on the > migrate_disable()d cpu? we'd need to detect this to avoid > migrate_enable() stomping on sched_setaffinity()'s work. Yep, there are a bunch of problems with actually touching the affinity mask. -- Mathematics is the supreme nostalgia of our time. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: is minimum udelay() not respected in preemptible SMP kernel-2.6.23? 2007-11-08 0:20 ` Matt Mackall 2007-11-08 0:31 ` Andi Kleen @ 2007-11-08 9:11 ` Peter Zijlstra 2007-11-08 15:43 ` Matt Mackall 1 sibling, 1 reply; 13+ messages in thread From: Peter Zijlstra @ 2007-11-08 9:11 UTC (permalink / raw) To: Matt Mackall Cc: Andrew Morton, Marin Mitov, linux-kernel, Thomas Gleixner, Ingo Molnar, Andi Kleen On Wed, 2007-11-07 at 18:20 -0600, Matt Mackall wrote: > This and other cases > (lots of per_cpu users, IIRC) actually want a migrate_disable() which > is a proper subset. The disadvantage of migrate_disable() is that it complicates the load-balancer but more importantly, that it does bring a form of latencies with it that are hard to measure. Using preempt_disable() for these current per-cpu users basically forces them to keep it short. Which is a GOOD (tm) thing. If we go overboard with this migrate_disable() stuff we can end up with a very hard to analyse system that sporadically does weird stuff. So, please, don't start that again. Also see: http://lkml.org/lkml/2007/7/23/338 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: is minimum udelay() not respected in preemptible SMP kernel-2.6.23? 2007-11-08 9:11 ` Peter Zijlstra @ 2007-11-08 15:43 ` Matt Mackall 0 siblings, 0 replies; 13+ messages in thread From: Matt Mackall @ 2007-11-08 15:43 UTC (permalink / raw) To: Peter Zijlstra Cc: Andrew Morton, Marin Mitov, linux-kernel, Thomas Gleixner, Ingo Molnar, Andi Kleen On Thu, Nov 08, 2007 at 10:11:21AM +0100, Peter Zijlstra wrote: > On Wed, 2007-11-07 at 18:20 -0600, Matt Mackall wrote: > > > This and other cases > > (lots of per_cpu users, IIRC) actually want a migrate_disable() which > > is a proper subset. > > The disadvantage of migrate_disable() is that it complicates the > load-balancer Hmm, I'm surprised it's more than a one-liner. Something like if (cannot_migrate(task)) continue; But I'm certainly not the expert here. Perhaps this one-liner introduces the "unplannable O(N) overhead" Ingo mentions in the email you referenced. > but more importantly, that it does bring a form of latencies with it > that are hard to measure. Using preempt_disable() for these current > per-cpu users basically forces them to keep it short. Ok, so maybe we've got implementation issues to tackle. But still: a) preempt_disable is inherently misdocumentation (preemption is not the real problem) and b) preemption is a bigger hammer than needed. At the very least, we should introduce migrate_disable as an alias for preempt_disable so we can document intent. At any rate, in the current context, we're talking about actually disabling preempt in a *delay loop*. Let's find a fix for that. > Also see: > http://lkml.org/lkml/2007/7/23/338 Heh, I'd completely forgotten about this thread and thought I was having an original idea! Must have never seen Ingo's followup. Ingo's complaint about it being "a per-task BKL" is interesting. I've occassionally wondered if it makes sense to make some of these primitives take a "documentation parameter" - a pointer to an object that does nothing more than indicate the object of interest. -- Mathematics is the supreme nostalgia of our time. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: is minimum udelay() not respected in preemptible SMP kernel-2.6.23? 2007-11-07 20:30 ` Andrew Morton 2007-11-07 23:10 ` Andi Kleen 2007-11-08 0:20 ` Matt Mackall @ 2007-11-08 11:24 ` Marin Mitov 2 siblings, 0 replies; 13+ messages in thread From: Marin Mitov @ 2007-11-08 11:24 UTC (permalink / raw) To: linux-kernel Hi all, Thanks to all of you answering to my post. On 7.11.2007, Andrew Morton wrote: > Ow. Yes, from my reading delay_tsc() can return early (or after > heat-death-of-the-universe) if the TSCs are offset and if preemption > migrates the calling task between CPUs. > > I suppose a lameo fix would be to disable preemption in delay_tsc(). I have seen the problem in delay_tsc(), but I was pusled by these lines in my dmesg: > Time: acpi_pm clocksource has been installed. > Switched to high resolution mode on CPU 0 > Switched to high resolution mode on CPU 1 I thought (sort of) acpi_pm (but not tsc) is used in udelay(). The same delay_tsc() is used in both arches: i386 & x86_64 (and as I see from the proposed patches in -mm, also for the new x86 arch). Should the patch be applied for all of them? Quite similar function ia64_itc_udelay() is used in IA64, but one find a coment before it: /* * Generic udelay assumes that if preemption is allowed and the thread * migrates to another CPU, that the ITC values are synchronized across * all CPUs. */ Are they really synchronized or similar patch: preempt_disable/enable() should be applied? What about all other arches? Thanks. Marin Mitov ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2007-11-08 15:45 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-11-07 17:21 is minimum udelay() not respected in preemptible SMP kernel-2.6.23? Marin Mitov 2007-11-07 20:30 ` Andrew Morton 2007-11-07 23:10 ` Andi Kleen 2007-11-08 0:20 ` Matt Mackall 2007-11-08 0:31 ` Andi Kleen 2007-11-08 1:03 ` Matt Mackall 2007-11-08 1:20 ` Andi Kleen 2007-11-08 2:44 ` Matt Mackall 2007-11-08 11:46 ` Avi Kivity 2007-11-08 15:10 ` Matt Mackall 2007-11-08 9:11 ` Peter Zijlstra 2007-11-08 15:43 ` Matt Mackall 2007-11-08 11:24 ` Marin Mitov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox