* [RFC] Dynamic Tick and Deferrable Timer Support
@ 2009-01-14 20:03 Hunter, Jon
  2009-01-15  6:16 ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Hunter, Jon @ 2009-01-14 20:03 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org

Hello All,

I have been working to maximise the kernel sleep time on an embedded device by utilising the dynamic tick and deferrable timer features.

During the course of this work I found that although timers were configured as deferrable, only timers for time interval tv1 were actually being deferred. Reviewing the deferrable timer patch [1], it does appear that the code is written to only defer timers for interval tv1. Therefore, I wanted to ask if this is intentional or not?

I have applied the below patch to defer all deferrable timers regardless of interval and so far it is working on the embedded device. I wanted to share this in case this could be something that could be applied to the mainline.

Please excuse any foolish mistakes I may have made here as this is my first post to your mailing list. Any feedback you could offer would be appreciated.

Cheers
Jon

[1] Deferrable timer patch: http://marc.info/?l=linux-kernel&m=117512286417320&w=2

Signed-off-by: Jon Hunter <jon-hunter@ti.com>
---
 kernel/timer.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index dee3f64..76a3ac6 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -930,6 +930,9 @@ cascade:
 		index = slot = timer_jiffies & TVN_MASK;
 		do {
 			list_for_each_entry(nte, varp->vec + slot, entry) {
+				if (tbase_get_deferrable(nte->base))
+					continue;
+
 				found = 1;
 				if (time_before(nte->expires, expires))
 					expires = nte->expires;
-- 
1.5.6.3

^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [RFC] Dynamic Tick and Deferrable Timer Support 2009-01-14 20:03 [RFC] Dynamic Tick and Deferrable Timer Support Hunter, Jon @ 2009-01-15 6:16 ` Andrew Morton 2009-01-26 18:23 ` Hunter, Jon 0 siblings, 1 reply; 17+ messages in thread From: Andrew Morton @ 2009-01-15 6:16 UTC (permalink / raw) To: Hunter, Jon; +Cc: linux-kernel@vger.kernel.org, Venkatesh Pallipadi On Wed, 14 Jan 2009 14:03:09 -0600 "Hunter, Jon" <jon-hunter@ti.com> wrote: > Hello All, > > I have been working to maximise the kernel sleep time on an embedded device by utilising the dynamic tick and deferrable timer features. > > During the course of this work I found that although timers were configured as deferrable, only timers for time interval tv1 were actually being deferred. Reviewing the deferrable timer patch [1], it does appear that the code is written to only defer timers for interval tv1. Therefore, I wanted to ask if this is intentional or not? > > I have applied the below patch to defer all deferrable timers regardless of interval and so far it is working on the embedded device. I wanted to share this in case this could be something that could be applied to the mainline. > > Please excuse any foolish mistakes I may have made here as this is my first post to your mailing list. Any feedback you could offer would be appreciated. 
> > Cheers > Jon > > [1] Deferrable timer patch: http://marc.info/?l=linux-kernel&m=117512286417320&w=2 > > > Signed-off-by: Jon Hunter <jon-hunter@ti.com> > --- > kernel/timer.c | 3 +++ > 1 files changed, 3 insertions(+), 0 deletions(-) > > diff --git a/kernel/timer.c b/kernel/timer.c > index dee3f64..76a3ac6 100644 > --- a/kernel/timer.c > +++ b/kernel/timer.c > @@ -930,6 +930,9 @@ cascade: > index = slot = timer_jiffies & TVN_MASK; > do { > list_for_each_entry(nte, varp->vec + slot, entry) { > + if (tbase_get_deferrable(nte->base)) > + continue; > + > found = 1; > if (time_before(nte->expires, expires)) > expires = nte->expires; Venki, could you please take a look? ^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [RFC] Dynamic Tick and Deferrable Timer Support
  2009-01-15  6:16 ` Andrew Morton
@ 2009-01-26 18:23   ` Hunter, Jon
  2009-01-26 19:48     ` Pallipadi, Venkatesh
  0 siblings, 1 reply; 17+ messages in thread
From: Hunter, Jon @ 2009-01-26 18:23 UTC (permalink / raw)
  To: Andrew Morton, Venkatesh Pallipadi; +Cc: linux-kernel@vger.kernel.org

Andrew Morton <mailto:akpm@linux-foundation.org> wrote on Thursday, January 15, 2009 12:16 AM:

> Venki, could you please take a look?

Andrew, Venki, if it helps here are some more details.

The function "__next_timer_interrupt()" (in file kernel/timer.c) consists of two main loops. The first loop (shown below) looks for timer events in tv1.

898 	/* Look for timer events in tv1. */
899 	index = slot = timer_jiffies & TVR_MASK;
900 	do {
901 		list_for_each_entry(nte, base->tv1.vec + slot, entry) {
902 			if (tbase_get_deferrable(nte->base))
903 				continue;
904
905 			found = 1;
906 			expires = nte->expires;
907 			/* Look at the cascade bucket(s)? */
908 			if (!index || slot < index)
909 				goto cascade;
910 			return expires;
911 		}
912 		slot = (slot + 1) & TVR_MASK;
913 	} while (slot != index);

You can see from the above code snippet that if a timer event is found for tv1, then it performs a test to see if this timer event is deferrable (lines 902 and 903).

If no timer events are found for tv1, then the code enters a second loop, a for-loop with a nested do-while loop, that looks for timer events in tv2-tv5.

927 	for (array = 0; array < 4; array++) {
928 		struct tvec *varp = varray[array];
929
930 		index = slot = timer_jiffies & TVN_MASK;
931 		do {
932 			list_for_each_entry(nte, varp->vec + slot, entry) {
933 				found = 1;
934 				if (time_before(nte->expires, expires))
935 					expires = nte->expires;
936 			}
937 			/*
938 			 * Do we still search for the first timer or are
939 			 * we looking up the cascade buckets ?
940 			 */
941 			if (found) {
942 				/* Look at the cascade bucket(s)? */
943 				if (!index || slot < index)
944 					break;
945 				return expires;
946 			}
947 			slot = (slot + 1) & TVN_MASK;
948 		} while (slot != index);
949
950 		if (index)
951 			timer_jiffies += TVN_SIZE - index;
952 		timer_jiffies >>= TVN_BITS;
953 	}

In the above code, you will see that if a timer event is found there is no test to see if this timer event is deferrable. The patch that I proposed simply adds two lines of code that perform the same test as seen in the tv1 loop to check if a timer is deferrable for tv2-tv5. In other words, add lines 902 and 903 in between lines 932 and 933 in the function __next_timer_interrupt().

Cheers
Jon

^ permalink raw reply [flat|nested] 17+ messages in thread
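The effect of the proposed two-line check can be illustrated outside the kernel. The following is only a minimal user-space sketch, not the kernel implementation: `struct model_timer`, `model_time_before()` and `model_next_expiry()` are hypothetical names invented here, the timer wheel is flattened into a plain array, and the deferrable state is modelled as a boolean field rather than read through `tbase_get_deferrable()`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending timer; illustration only. */
struct model_timer {
	unsigned long expires;	/* expiry time in jiffies */
	bool deferrable;	/* models tbase_get_deferrable(nte->base) */
};

/* True if a is earlier than b, tolerating counter wraparound. */
static bool model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Return the earliest expiry among the non-deferrable timers.
 * If every timer is deferrable (or the array is empty), the
 * caller-supplied upper bound is returned unchanged.
 */
static unsigned long model_next_expiry(const struct model_timer *timers,
				       size_t n, unsigned long upper_bound)
{
	unsigned long expires = upper_bound;
	size_t i;

	for (i = 0; i < n; i++) {
		if (timers[i].deferrable)
			continue;	/* the check the patch adds for tv2-tv5 */
		if (model_time_before(timers[i].expires, expires))
			expires = timers[i].expires;
	}
	return expires;
}
```

With timers expiring at 100 (deferrable), 500, 300 and 50 (deferrable) and an upper bound of 1000, the sketch reports 300 as the next wakeup; if only the two deferrable timers are present it reports the upper bound, which is the "all timers deferrable" case raised later in the thread.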
* RE: [RFC] Dynamic Tick and Deferrable Timer Support
  2009-01-26 18:23   ` Hunter, Jon
@ 2009-01-26 19:48     ` Pallipadi, Venkatesh
  2009-01-26 21:41       ` Hunter, Jon
  0 siblings, 1 reply; 17+ messages in thread
From: Pallipadi, Venkatesh @ 2009-01-26 19:48 UTC (permalink / raw)
  To: Hunter, Jon, Andrew Morton; +Cc: linux-kernel@vger.kernel.org

>-----Original Message-----
>From: Hunter, Jon [mailto:jon-hunter@ti.com]
>Sent: Monday, January 26, 2009 10:24 AM
>To: Andrew Morton; Pallipadi, Venkatesh
>Cc: linux-kernel@vger.kernel.org
>Subject: RE: [RFC] Dynamic Tick and Deferrable Timer Support
>
>Andrew Morton <mailto:akpm@linux-foundation.org> wrote on
>Thursday, January 15, 2009 12:16 AM:
>
>> Venki, could you please take a look?
>
>Andrew, Venki, if it helps here are some more details.
>

Jon,

I looked at your patch earlier, but I was concerned about a few things and wanted to spend some more time on it. So, I did not reply earlier.

The potential issues I see:
- May be a bit theoretical, as this may not happen in reality. But, with your change, if all the timers happen to be deferrable, the timer wheel never advances and none of the timers expire. Not sure whether we need to handle this cleanly somehow or assume that not all the timers will be deferrable.
- Another similar case is when we have more deferrable timers in the system. If we do not cascade timers from the timer wheel, we may end up spending more time in the higher order timer wheel looking through all the timers, as they are at a higher timer granularity, instead of on the lower order timer wheel which will have timers sorted at a lower granularity.

I am not sure whether any of these issues will be a problem in the real world or not. But, I think they are something we should be careful about.

Thanks,
Venki

^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [RFC] Dynamic Tick and Deferrable Timer Support 2009-01-26 19:48 ` Pallipadi, Venkatesh @ 2009-01-26 21:41 ` Hunter, Jon 2009-01-27 18:36 ` Pallipadi, Venkatesh 0 siblings, 1 reply; 17+ messages in thread From: Hunter, Jon @ 2009-01-26 21:41 UTC (permalink / raw) To: Pallipadi, Venkatesh, Andrew Morton; +Cc: linux-kernel@vger.kernel.org Pallipadi, Venkatesh <mailto:venkatesh.pallipadi@intel.com> wrote on Monday, January 26, 2009 1:48 PM: > I looked at your patch earlier, but I was concerned about few > things and wanted to spend some more time on it. So, I did > not reply earlier. No problem, I was not sure how clear my original email was :-) > The potential issues I see: > - May be a bit theoritcal, as this may not happen in reality. > But, with your change, if all the timers happen to be > defrrable, timer wheel never advances and none of the timers > expire. Not sure whether we need to handle this cleanly > somehow or assume that not all the timers will be deferrable. So my understanding is, and please correct me if I am wrong, but as long as there is a timer interrupt then the timer wheel will advanced and all deferred timer functions will get executed. If that is the case then we should always be guaranteed a timer interrupt due to the implementation of the dynamic tick. The dynamic tick defines a maximum sleep period, max_delta_ns, which is a member of the "clock_event_device" structure. This governs the maximum time you could be asleep/idle for. Currently, the variable, "max_delta_ns", is defined as a 32-bit type (long) and for most architectures, if not all, this is configured by calling function "clockevent_delta2ns()". The maximum value that "max_delta_ns" can be assigned by calling clockevent_delta2ns(), is LONG_MAX (0x7fffffff). In nanoseconds the value 0x7fffffff equates to ~2.15 seconds. Hence, the maximum sleep time is ~2.15 seconds and at a minimum we should have at least 1 timer interrupt every ~2.15 seconds. 
Do you think that this would be sufficient? I am actually thinking about proposing another idea to increase the dynamic range of max_delta_ns so we could sleep for longer than ~2.15 seconds. > - Another similar case is when we have more of deferrable > timers in the system, if we do not cascade timers from the > timer wheel, we may end up spending more time in the higher > order timer wheel looking through all the timers, as they are > at a higher timer granularity, instead of on the lower order > timer wheel which will have timers sorted at a lower granularity. Good point. I don't like the thought of spending a lot of time searching through timers. However, on the other hand you could argue that the current implementation of categorising the timer events is designed to make this as efficient as possible. So would this be a bad thing? Anyway, can you confirm that the deferrable timer patch was implemented only to defer timers in the tv1 group? > I am not sure whether any of these issues will be a problem > in real world or not. But, I think they are something we > should be careful about. Completely agree. I have been playing around with this on my setup, but the last thing I would want to do is introduce a bug. Hence, thanks for spending some time to discuss this. Cheers Jon ^ permalink raw reply [flat|nested] 17+ messages in thread
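The ~2.15 second figure quoted above follows directly from the arithmetic. As a minimal sketch, assuming `long` is 32 bits wide (as on a 32-bit target; on a 64-bit build LONG_MAX is far larger), the cap works out as below. The names `MODEL_NSEC_PER_SEC`, `model_max_delta_ns_32` and `model_ns_to_seconds()` are made up for this illustration and are not kernel symbols.

```c
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000LL

/* LONG_MAX on a platform with a 32-bit long: 0x7fffffff. */
static const int64_t model_max_delta_ns_32 = INT32_MAX;

/* Convert a nanosecond count to seconds. */
static double model_ns_to_seconds(int64_t ns)
{
	return (double)ns / (double)MODEL_NSEC_PER_SEC;
}
```

Calling `model_ns_to_seconds(model_max_delta_ns_32)` gives 2.147483647, i.e. 0x7fffffff nanoseconds is a little under 2.15 seconds.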
* RE: [RFC] Dynamic Tick and Deferrable Timer Support 2009-01-26 21:41 ` Hunter, Jon @ 2009-01-27 18:36 ` Pallipadi, Venkatesh 2009-01-27 18:45 ` Pallipadi, Venkatesh 2009-04-08 19:20 ` Hunter, Jon 0 siblings, 2 replies; 17+ messages in thread From: Pallipadi, Venkatesh @ 2009-01-27 18:36 UTC (permalink / raw) To: Hunter, Jon; +Cc: Andrew Morton, linux-kernel@vger.kernel.org, Thomas Gleixner On Mon, 2009-01-26 at 13:41 -0800, Hunter, Jon wrote: > Pallipadi, Venkatesh <mailto:venkatesh.pallipadi@intel.com> wrote on Monday, January 26, 2009 1:48 PM: > > > I looked at your patch earlier, but I was concerned about few > > things and wanted to spend some more time on it. So, I did > > not reply earlier. > > No problem, I was not sure how clear my original email was :-) > > > The potential issues I see: > > - May be a bit theoritcal, as this may not happen in reality. > > But, with your change, if all the timers happen to be > > defrrable, timer wheel never advances and none of the timers > > expire. Not sure whether we need to handle this cleanly > > somehow or assume that not all the timers will be deferrable. > > So my understanding is, and please correct me if I am wrong, but as long as there is a timer interrupt then the timer wheel will advanced and all deferred timer functions will get executed. If that is the case then we should always be guaranteed a timer interrupt due to the implementation of the dynamic tick. > > The dynamic tick defines a maximum sleep period, max_delta_ns, which is a member of the "clock_event_device" structure. This governs the maximum time you could be asleep/idle for. Currently, the variable, "max_delta_ns", is defined as a 32-bit type (long) and for most architectures, if not all, this is configured by calling function "clockevent_delta2ns()". The maximum value that "max_delta_ns" can be assigned by calling clockevent_delta2ns(), is LONG_MAX (0x7fffffff). In nanoseconds the value 0x7fffffff equates to ~2.15 seconds. 
Hence, the maximum sleep time is ~2.15 seconds and at a minimum we should have at least 1 timer interrupt every ~2.15 seconds. > > Do you think that this would be sufficient? Agreed. timer interrupt with its max_delta will avoid the situation I was thinking above. > I am actually thinking about proposing another idea to increase the dynamic range of max_delta_ns to we could sleep for longer than ~2.15 seconds. max_delta would depend on the timer in the platform. With HPET this should be much larger than 2.15 secs. > > - Another similar case is when we have more of deferrable > > timers in the system, if we do not cascade timers from the > > timer wheel, we may end up spending more time in the higher > > order timer wheel looking through all the timers, as they are > > at a higher timer granularity, instead of on the lower order > > timer wheel which will have timers sorted at a lower granularity. > > Good point. I don't like the thought spending a lot of time searching through timers. However, on the other hand you could debate that the current implementation of categorising the timer events is designed to make this efficient as possible. So would this be a bad thing? > > Anyway, you do confirm that the deferrable timer patch was implemented only to defer timers in the tv1 group? > Yes. The initial deferrable timer was done only for tv1 group on purpose. To keep things simpler and that would catch most of the "small" timers and avoid them. > > I am not sure whether any of these issues will be a problem > > in real world or not. But, I think they are something we > > should be careful about. > > Completely, agree. I have been playing around with this on my setup, but the last thing I would want to do is introduce a bug. Hence, thanks for spending sometime to discuss this. Ok. Thinking about it a bit more, I think we can push this patch along. Thomas/Andrew, can one of you pick up this patch.. 
Thanks, Venki Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> ^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [RFC] Dynamic Tick and Deferrable Timer Support 2009-01-27 18:36 ` Pallipadi, Venkatesh @ 2009-01-27 18:45 ` Pallipadi, Venkatesh 2009-01-29 16:29 ` Jon Hunter 2009-04-08 19:20 ` Hunter, Jon 1 sibling, 1 reply; 17+ messages in thread From: Pallipadi, Venkatesh @ 2009-01-27 18:45 UTC (permalink / raw) To: Pallipadi, Venkatesh, Hunter, Jon Cc: Andrew Morton, linux-kernel@vger.kernel.org, Thomas Gleixner Oops. Sending with correct address this time. >-----Original Message----- >From: linux-kernel-owner@vger.kernel.org >[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of >Pallipadi, Venkatesh >Sent: Tuesday, January 27, 2009 10:36 AM >To: Hunter, Jon >Cc: Andrew Morton; linux-kernel@vger.kernel.org; Thomas Gleixner >Subject: RE: [RFC] Dynamic Tick and Deferrable Timer Support > >On Mon, 2009-01-26 at 13:41 -0800, Hunter, Jon wrote: >> Pallipadi, Venkatesh <mailto:venkatesh.pallipadi@intel.com> >wrote on Monday, January 26, 2009 1:48 PM: >> >> > I looked at your patch earlier, but I was concerned about few >> > things and wanted to spend some more time on it. So, I did >> > not reply earlier. >> >> No problem, I was not sure how clear my original email was :-) >> >> > The potential issues I see: >> > - May be a bit theoritcal, as this may not happen in reality. >> > But, with your change, if all the timers happen to be >> > defrrable, timer wheel never advances and none of the timers >> > expire. Not sure whether we need to handle this cleanly >> > somehow or assume that not all the timers will be deferrable. >> >> So my understanding is, and please correct me if I am wrong, >but as long as there is a timer interrupt then the timer wheel >will advanced and all deferred timer functions will get >executed. 
If that is the case then we should always be >guaranteed a timer interrupt due to the implementation of the >dynamic tick. >> >> The dynamic tick defines a maximum sleep period, >max_delta_ns, which is a member of the "clock_event_device" >structure. This governs the maximum time you could be >asleep/idle for. Currently, the variable, "max_delta_ns", is >defined as a 32-bit type (long) and for most architectures, if >not all, this is configured by calling function >"clockevent_delta2ns()". The maximum value that "max_delta_ns" >can be assigned by calling clockevent_delta2ns(), is LONG_MAX >(0x7fffffff). In nanoseconds the value 0x7fffffff equates to >~2.15 seconds. Hence, the maximum sleep time is ~2.15 seconds >and at a minimum we should have at least 1 timer interrupt >every ~2.15 seconds. >> >> Do you think that this would be sufficient? > >Agreed. timer interrupt with its max_delta will avoid the situation I >was thinking above. > >> I am actually thinking about proposing another idea to >increase the dynamic range of max_delta_ns to we could sleep >for longer than ~2.15 seconds. > >max_delta would depend on the timer in the platform. With HPET this >should be much larger than 2.15 secs. > >> > - Another similar case is when we have more of deferrable >> > timers in the system, if we do not cascade timers from the >> > timer wheel, we may end up spending more time in the higher >> > order timer wheel looking through all the timers, as they are >> > at a higher timer granularity, instead of on the lower order >> > timer wheel which will have timers sorted at a lower granularity. >> >> Good point. I don't like the thought spending a lot of time >searching through timers. However, on the other hand you could >debate that the current implementation of categorising the >timer events is designed to make this efficient as possible. >So would this be a bad thing? 
>> >> Anyway, you do confirm that the deferrable timer patch was >implemented only to defer timers in the tv1 group? >> > >Yes. The initial deferrable timer was done only for tv1 group on >purpose. To keep things simpler and that would catch most of >the "small" >timers and avoid them. > >> > I am not sure whether any of these issues will be a problem >> > in real world or not. But, I think they are something we >> > should be careful about. >> >> Completely, agree. I have been playing around with this on >my setup, but the last thing I would want to do is introduce a >bug. Hence, thanks for spending sometime to discuss this. > >Ok. Thinking about it a bit more, I think we can push this >patch along. >Thomas/Andrew, can one of you pick up this patch.. > >Thanks, >Venki > >Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> > >-- >To unsubscribe from this list: send the line "unsubscribe >linux-kernel" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html >Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] Dynamic Tick and Deferrable Timer Support
  2009-01-27 18:45           ` Pallipadi, Venkatesh
@ 2009-01-29 16:29             ` Jon Hunter
  2009-01-29 17:36               ` john stultz
  0 siblings, 1 reply; 17+ messages in thread
From: Jon Hunter @ 2009-01-29 16:29 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Andrew Morton, linux-kernel@vger.kernel.org, Thomas Gleixner

Pallipadi, Venkatesh wrote:
> max_delta would depend on the timer in the platform. With HPET this
> should be much larger than 2.15 secs.

So I agree that the HPET hardware in newer devices would itself allow longer sleep periods. However, this is not the problem I was raising. The problem is that the dynamic tick uses a 32-bit variable, max_delta_ns, to define the max sleep time of a device in nanoseconds. The maximum value that this variable can be assigned is LONG_MAX or 0x7fffffff nanoseconds (see function clockevent_delta2ns). The value 0x7fffffff nanoseconds equates to ~2.15 seconds.

Hence, without increasing the dynamic range of max_delta_ns (i.e. making this a 64-bit integer) or changing the base of this variable from nanoseconds to milliseconds, I don't see how the device will ever sleep for longer than ~2.15 seconds.

I have spent several weeks trying to suppress kernel timers using the deferred timers and lengthen the sleep time. I am now able to get the device to sleep for minutes but I found that max_delta_ns is a limiting factor. I will be surprised if you can sleep for longer than ~2.15 seconds with the current implementation.

Let me know if this makes sense.

> Ok. Thinking about it a bit more, I think we can push this
> patch along.
> Thomas/Andrew, can one of you pick up this patch..

Great thanks.

Jon

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] Dynamic Tick and Deferrable Timer Support 2009-01-29 16:29 ` Jon Hunter @ 2009-01-29 17:36 ` john stultz 2009-01-30 19:04 ` Jon Hunter 2009-02-07 9:20 ` Pavel Machek 0 siblings, 2 replies; 17+ messages in thread From: john stultz @ 2009-01-29 17:36 UTC (permalink / raw) To: Jon Hunter Cc: Pallipadi, Venkatesh, Andrew Morton, linux-kernel@vger.kernel.org, Thomas Gleixner On Thu, Jan 29, 2009 at 8:29 AM, Jon Hunter <jon-hunter@ti.com> wrote: > Pallipadi, Venkatesh wrote: > I have spent several weeks trying to suppress kernel timers using the > deferred timers and lengthen the sleep time. I am now able to get the device > to sleep for minutes but I found that max_delta_ns is a limiting factor. I > will be surprised if you can sleep for longer than ~2.15 seconds with the > current implementation. As an aside, there are some further hardware limitations in the timekeeping core that limit the amount of time the hardware can sleep. For instance, the acpi_pm clocksource wraps every 2.5 seconds or so, so we have to wake up periodically to sample it to avoid wrapping issues. Just to be able to deal with all the different hardware out there, the timekeeping core expects to wake up twice a second to do this sampling. It may be possible to push this out if you are using other clocksources (HPET/TSC), but if sleeps for longer then a second are a needed feature, we probably will need some infrastructure in the timekeeping core that can be queried to make sure its safe. thanks -john ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] Dynamic Tick and Deferrable Timer Support 2009-01-29 17:36 ` john stultz @ 2009-01-30 19:04 ` Jon Hunter 2009-01-30 20:29 ` john stultz 2009-02-07 9:20 ` Pavel Machek 1 sibling, 1 reply; 17+ messages in thread From: Jon Hunter @ 2009-01-30 19:04 UTC (permalink / raw) To: john stultz Cc: Pallipadi, Venkatesh, Andrew Morton, linux-kernel@vger.kernel.org, Thomas Gleixner john stultz wrote: > As an aside, there are some further hardware limitations in the > timekeeping core that limit the amount of time the hardware can sleep. > For instance, the acpi_pm clocksource wraps every 2.5 seconds or so, > so we have to wake up periodically to sample it to avoid wrapping > issues. > > Just to be able to deal with all the different hardware out there, the > timekeeping core expects to wake up twice a second to do this > sampling. It may be possible to push this out if you are using other > clocksources (HPET/TSC), but if sleeps for longer then a second are a > needed feature, we probably will need some infrastructure in the > timekeeping core that can be queried to make sure its safe. The variable "max_delta_ns" is used by the dynamic tick to govern the maximum time a given device can sleep. Hence, this variable should be configured as necessary for the device you are using. Therefore, if your device has a timer that will wrap every 2.5 seconds, then for this device the "max_delta_ns" should be configure so that it does not exceed this time. So what I was proposing is that for devices that have timers that would allow you to sleep beyond ~2.15 seconds (current max imposed by the clockevent_delta2ns function), why not increase the dynamic range (make this a 64-bit variable) or base (ie. from nanoseconds to milliseconds) to permit longer sleep times for devices that can support them? This should not have any negative impact on devices that cannot support such long sleep times. So far I have not encountered any issues with doing this. 
Let me know if this does or does not address your concerns. Cheers Jon ^ permalink raw reply [flat|nested] 17+ messages in thread
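To put the two options in the message above side by side, here is a small sketch of the range each would give. It is only arithmetic under stated assumptions: a signed 32-bit value counting milliseconds versus a signed 64-bit value counting nanoseconds. The helper names `model_days_32bit_ms()` and `model_years_64bit_ns()` are invented for this illustration, and a year is taken as 365.25 days.

```c
#include <stdint.h>

/* Range of a signed 32-bit counter holding milliseconds, in days. */
static double model_days_32bit_ms(void)
{
	return (double)INT32_MAX / 1e3 / 86400.0;
}

/* Range of a signed 64-bit counter holding nanoseconds, in years. */
static double model_years_64bit_ns(void)
{
	return (double)INT64_MAX / 1e9 / 86400.0 / 365.25;
}
```

Either change lifts the cap far beyond the ~2.15 seconds of a 32-bit nanosecond value: roughly 24.9 days for a 32-bit millisecond base and roughly 292 years for a 64-bit nanosecond value. In both cases the limit would then come from the hardware rather than from the variable's type.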
* Re: [RFC] Dynamic Tick and Deferrable Timer Support
  2009-01-30 19:04                 ` Jon Hunter
@ 2009-01-30 20:29                   ` john stultz
  2009-02-07  9:20                     ` Pavel Machek
  0 siblings, 1 reply; 17+ messages in thread
From: john stultz @ 2009-01-30 20:29 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Pallipadi, Venkatesh, Andrew Morton, linux-kernel@vger.kernel.org, Thomas Gleixner

On Fri, 2009-01-30 at 13:04 -0600, Jon Hunter wrote:
> john stultz wrote:
> > As an aside, there are some further hardware limitations in the
> > timekeeping core that limit the amount of time the hardware can sleep.
> > For instance, the acpi_pm clocksource wraps every 2.5 seconds or so,
> > so we have to wake up periodically to sample it to avoid wrapping
> > issues.
> >
> > Just to be able to deal with all the different hardware out there, the
> > timekeeping core expects to wake up twice a second to do this
> > sampling. It may be possible to push this out if you are using other
> > clocksources (HPET/TSC), but if sleeps for longer then a second are a
> > needed feature, we probably will need some infrastructure in the
> > timekeeping core that can be queried to make sure its safe.
>
> The variable "max_delta_ns" is used by the dynamic tick to govern the
> maximum time a given device can sleep. Hence, this variable should be
> configured as necessary for the device you are using. Therefore, if your
> device has a timer that will wrap every 2.5 seconds, then for this
> device the "max_delta_ns" should be configure so that it does not exceed
> this time.
>
> So what I was proposing is that for devices that have timers that would
> allow you to sleep beyond ~2.15 seconds (current max imposed by the
> clockevent_delta2ns function), why not increase the dynamic range (make
> this a 64-bit variable) or base (ie. from nanoseconds to milliseconds)
> to permit longer sleep times for devices that can support them? This
> should not have any negative impact on devices that cannot support such
> long sleep times.

No objection to max_delta_ns being increased, but whatever code manages it will probably need to query the timekeeping core in some fashion to make sure the timer hardware max isn't larger than the clocksource hardware max. I've provided a rough sketch of what the timekeeping code would probably look like below.

> So far I have not encountered any issues with doing this. Let me know if
> this does or does not address your concerns.

There may be some other issues here, such as NTP over-correction issues (for instance: ntp trying to correct for a 1us offset over the next second, but ending up applying it for 10 seconds) if we defer for a really long time. But at that point, we might as well suspend to ram, like the OLPC does.

thanks
-john

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 900f1b6..2cf9ebd 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -265,6 +265,22 @@ int timekeeping_valid_for_hres(void)
 	return ret;
 }
 
+
+u64 timekeeping_max_deferment(void)
+{
+	u64 max_nsecs;
+	unsigned long seq;
+	do {
+		seq = read_seqbegin(&xtime_lock);
+
+		max_nsecs = cyc2ns(clock, clock->mask);
+
+	} while (read_seqretry(&xtime_lock, seq));
+
+	return max_nsecs; /* XXX maybe reduce by some amount to be safe? */
+}
+
+
 /**
  * read_persistent_clock - Return time in seconds from the persistent clock.
  *

^ permalink raw reply related [flat|nested] 17+ messages in thread
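One way the query described above might be consumed is sketched here, purely as an assumption-laden illustration and not as kernel code. The function `model_max_sleep_ns()` is hypothetical; it takes the event timer's programmable maximum and the clocksource's wrap interval as plain nanosecond values, and the one-eighth safety margin is an arbitrary choice standing in for the "reduce by some amount to be safe" remark in the sketch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: choose the longest sleep that both the event
 * timer and the clocksource tolerate.
 *
 * clockevent_max_ns  - longest interval the event timer can be programmed for
 * clocksource_max_ns - interval after which the clocksource counter wraps
 */
static uint64_t model_max_sleep_ns(uint64_t clockevent_max_ns,
				   uint64_t clocksource_max_ns)
{
	/* Stay 1/8 below the wrap point (margin chosen arbitrarily here). */
	uint64_t safe_ns = clocksource_max_ns - (clocksource_max_ns >> 3);

	return clockevent_max_ns < safe_ns ? clockevent_max_ns : safe_ns;
}
```

For example, an event timer that could be programmed 10 seconds ahead, paired with a clocksource that wraps after 2.5 seconds, would be limited to 2187500000 ns (about 2.19 seconds) by this sketch, while a 1 second event timer limit would pass through unchanged.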
* Re: [RFC] Dynamic Tick and Deferrable Timer Support 2009-01-30 20:29 ` john stultz @ 2009-02-07 9:20 ` Pavel Machek 0 siblings, 0 replies; 17+ messages in thread From: Pavel Machek @ 2009-02-07 9:20 UTC (permalink / raw) To: john stultz Cc: Jon Hunter, Pallipadi, Venkatesh, Andrew Morton, linux-kernel@vger.kernel.org, Thomas Gleixner > > So what I was proposing is that for devices that have timers that would > > allow you to sleep beyond ~2.15 seconds (current max imposed by the > > clockevent_delta2ns function), why not increase the dynamic range (make > > this a 64-bit variable) or base (ie. from nanoseconds to milliseconds) > > to permit longer sleep times for devices that can support them? This > > should not have any negative impact on devices that cannot support such > > long sleep times. > > No objection to max_delta_ns being increased, but whatever code manages > it will probably need to query the timekeeping core in some fashion to > make sure the timer hardware max isn't larger then the clocksource > hardware max. I've provided a rough sketch at what the timekeeping code > would probably look like below. > > > So far I have not encountered any issues with doing this. Let me know if > > this does or does not address your concerns. > > There may be some other issues here, such as NTP over-correction issues > (for instance: ntp trying to correct for a 1us offset over the next > second, but ends up applying it for 10 seconds) if we defer for a really > long time. But at that point, we might as well suspend to ram, like the > OLPC does. Well, there's still some way to go before auto-sleep is possible. android can do that, afaict, but on pc it is quite far away. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] Dynamic Tick and Deferrable Timer Support 2009-01-29 17:36 ` john stultz 2009-01-30 19:04 ` Jon Hunter @ 2009-02-07 9:20 ` Pavel Machek 2009-02-09 19:10 ` John Stultz 1 sibling, 1 reply; 17+ messages in thread From: Pavel Machek @ 2009-02-07 9:20 UTC (permalink / raw) To: john stultz Cc: Jon Hunter, Pallipadi, Venkatesh, Andrew Morton, linux-kernel@vger.kernel.org, Thomas Gleixner On Thu 2009-01-29 09:36:00, john stultz wrote: > On Thu, Jan 29, 2009 at 8:29 AM, Jon Hunter <jon-hunter@ti.com> wrote: > > Pallipadi, Venkatesh wrote: > > I have spent several weeks trying to suppress kernel timers using the > > deferred timers and lengthen the sleep time. I am now able to get the device > > to sleep for minutes but I found that max_delta_ns is a limiting factor. I > > will be surprised if you can sleep for longer than ~2.15 seconds with the > > current implementation. > > As an aside, there are some further hardware limitations in the > timekeeping core that limit the amount of time the hardware can sleep. > For instance, the acpi_pm clocksource wraps every 2.5 seconds or so, > so we have to wake up periodically to sample it to avoid wrapping > issues. > > Just to be able to deal with all the different hardware out there, the > timekeeping core expects to wake up twice a second to do this > sampling. It may be possible to push this out if you are using other That's strange... I think I seen less than 2 wakeups per second on powertop...? (thinkpad x60, nothing exotic). Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] Dynamic Tick and Deferrable Timer Support
  2009-02-07  9:20 ` Pavel Machek
@ 2009-02-09 19:10 ` John Stultz
From: John Stultz @ 2009-02-09 19:10 UTC
To: Pavel Machek
Cc: Jon Hunter, Pallipadi, Venkatesh, Andrew Morton, linux-kernel@vger.kernel.org, Thomas Gleixner

On Sat, 2009-02-07 at 10:20 +0100, Pavel Machek wrote:
> On Thu 2009-01-29 09:36:00, john stultz wrote:
> > On Thu, Jan 29, 2009 at 8:29 AM, Jon Hunter <jon-hunter@ti.com> wrote:
> > > Pallipadi, Venkatesh wrote:
> > > I have spent several weeks trying to suppress kernel timers using the
> > > deferred timers and lengthen the sleep time. I am now able to get the device
> > > to sleep for minutes but I found that max_delta_ns is a limiting factor. I
> > > will be surprised if you can sleep for longer than ~2.15 seconds with the
> > > current implementation.
> >
> > As an aside, there are some further hardware limitations in the
> > timekeeping core that limit the amount of time the hardware can sleep.
> > For instance, the acpi_pm clocksource wraps every 2.5 seconds or so,
> > so we have to wake up periodically to sample it to avoid wrapping
> > issues.
> >
> > Just to be able to deal with all the different hardware out there, the
> > timekeeping core expects to wake up twice a second to do this
> > sampling. It may be possible to push this out if you are using other
>
> That's strange... I think I've seen less than 2 wakeups per second on
> powertop...? (thinkpad x60, nothing exotic).

Yea, I don't think there is an interface that the timekeeping code
communicates that through. Probably a good idea to get that established
before folks try to push further than a second and end up with trouble
on hardware with short clocksources.

thanks
-john
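The "short clocksource" concern in this exchange comes down to simple arithmetic: a free-running counter of a given width and frequency wraps after a fixed interval, and it has to be sampled before then. The following is a minimal sketch of that arithmetic. The function name `counter_wrap_ns()` and its parameters are hypothetical, and nothing here is taken from the kernel's timekeeping code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): time in nanoseconds before a
 * free-running counter of 'bits' width, clocked at 'freq_hz', wraps
 * around to zero.
 */
uint64_t counter_wrap_ns(unsigned int bits, uint64_t freq_hz)
{
    /* Limiting bits to 32 keeps 2^bits * 1e9 inside 64-bit range. */
    assert(bits >= 1 && bits <= 32);
    assert(freq_hz > 0);

    /* Number of counts before wrap, converted to nanoseconds. */
    return ((uint64_t)1 << bits) * 1000000000ULL / freq_hz;
}
```

For a made-up 24-bit counter clocked at 1 MHz this gives 16,777,216,000 ns, roughly 16.8 seconds between wraps. Any sleep longer than the wrap interval would need some other way to account for the elapsed time.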
* RE: [RFC] Dynamic Tick and Deferrable Timer Support
  2009-01-27 18:36 ` Pallipadi, Venkatesh
  2009-01-27 18:45 ` Pallipadi, Venkatesh
@ 2009-04-08 19:20 ` Hunter, Jon
  2009-04-08 22:52 ` Andrew Morton
From: Hunter, Jon @ 2009-04-08 19:20 UTC
To: Pallipadi, Venkatesh
Cc: Andrew Morton, linux-kernel@vger.kernel.org, Thomas Gleixner

> -----Original Message-----
> From: Pallipadi, Venkatesh [mailto:venkatesh.pallipadi@intel.com]
> Sent: Tuesday, January 27, 2009 12:36 PM
> To: Hunter, Jon
> Cc: Andrew Morton; linux-kernel@vger.kernel.org; Thomas Gleixner
> Subject: RE: [RFC] Dynamic Tick and Deferrable Timer Support
>
> On Mon, 2009-01-26 at 13:41 -0800, Hunter, Jon wrote:
> > Pallipadi, Venkatesh <mailto:venkatesh.pallipadi@intel.com> wrote on
> > Monday, January 26, 2009 1:48 PM:
> >
> > > I looked at your patch earlier, but I was concerned about a few
> > > things and wanted to spend some more time on it. So, I did
> > > not reply earlier.
> >
> > No problem, I was not sure how clear my original email was :-)
> >
> > > The potential issues I see:
> > > - May be a bit theoretical, as this may not happen in reality.
> > > But, with your change, if all the timers happen to be
> > > deferrable, timer wheel never advances and none of the timers
> > > expire. Not sure whether we need to handle this cleanly
> > > somehow or assume that not all the timers will be deferrable.
> >
> > So my understanding is, and please correct me if I am wrong, but as long
> > as there is a timer interrupt then the timer wheel will advance and all
> > deferred timer functions will get executed. If that is the case then we
> > should always be guaranteed a timer interrupt due to the implementation of
> > the dynamic tick.
> >
> > The dynamic tick defines a maximum sleep period, max_delta_ns, which is
> > a member of the "clock_event_device" structure. This governs the maximum
> > time you could be asleep/idle for. Currently, the variable,
> > "max_delta_ns", is defined as a 32-bit type (long) and for most
> > architectures, if not all, this is configured by calling function
> > "clockevent_delta2ns()". The maximum value that "max_delta_ns" can be

Pallipadi, Venkatesh wrote:
> Ok. Thinking about it a bit more, I think we can push this patch along.
> Thomas/Andrew, can one of you pick up this patch..
>
> Thanks,
> Venki
>
> Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>

Hi Andrew, Thomas,

Sorry to respond to this old thread, however, I wanted to see if you had
any feedback on this patch. Let me know if you would like me to re-post.

Cheers
Jon
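The "~2.15 seconds" figure quoted throughout the thread is consistent with a nanosecond count held in a signed 32-bit long, whose largest value is 2^31 - 1 = 2,147,483,647 ns. The sketch below illustrates that ceiling. It is a stand-alone model under that assumption, not the kernel's `clockevent_delta2ns()`, and `clamp_delta_ns_32()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model (not the kernel implementation): clamp a requested
 * sleep length in nanoseconds to what fits in a signed 32-bit value.
 */
uint64_t clamp_delta_ns_32(uint64_t delta_ns)
{
    const uint64_t max_ns = (uint64_t)INT32_MAX;   /* 2^31 - 1 ns */

    /* Sanity check on the constant: about 2.147 seconds. */
    assert(max_ns == 2147483647ULL);

    return delta_ns > max_ns ? max_ns : delta_ns;
}
```

Under this model a request for a 60 second sleep comes back as 2,147,483,647 ns, about 2.15 seconds, while a 1 ms request passes through unchanged. Widening the field to 64 bits, as proposed in the thread, would remove this particular ceiling.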
* Re: [RFC] Dynamic Tick and Deferrable Timer Support
  2009-04-08 19:20 ` Hunter, Jon
@ 2009-04-08 22:52 ` Andrew Morton
  2009-04-09 15:02 ` Jon Hunter
From: Andrew Morton @ 2009-04-08 22:52 UTC
To: Hunter, Jon
Cc: Pallipadi, Venkatesh, linux-kernel@vger.kernel.org, Thomas Gleixner

On Wed, 8 Apr 2009 14:20:54 -0500 "Hunter, Jon" <jon-hunter@ti.com> wrote:

> > -----Original Message-----
> > From: Pallipadi, Venkatesh [mailto:venkatesh.pallipadi@intel.com]
> > Sent: Tuesday, January 27, 2009 12:36 PM

prehistory!

> Let me know if you would like me to re-post.

That would be best.
* Re: [RFC] Dynamic Tick and Deferrable Timer Support
  2009-04-08 22:52 ` Andrew Morton
@ 2009-04-09 15:02 ` Jon Hunter
From: Jon Hunter @ 2009-04-09 15:02 UTC
To: Andrew Morton
Cc: Pallipadi, Venkatesh, linux-kernel@vger.kernel.org, Thomas Gleixner

Andrew Morton wrote:
>> Let me know if you would like me to re-post.
>
> That would be best.

Thanks. I re-posted the patch under a more meaningful title [1]. I also
asked Venki to ack this one too for consistency.

Cheers
Jon

[1] [PATCH] Allow deferrable timers for intervals tv2-tv5 to be deferred
http://marc.info/?l=linux-kernel&m=123928900703751&w=2
Thread overview: 17+ messages (thread ends 2009-04-09 15:04 UTC)
2009-01-14 20:03 [RFC] Dynamic Tick and Deferrable Timer Support Hunter, Jon
2009-01-15  6:16 ` Andrew Morton
2009-01-26 18:23 ` Hunter, Jon
2009-01-26 19:48 ` Pallipadi, Venkatesh
2009-01-26 21:41 ` Hunter, Jon
2009-01-27 18:36 ` Pallipadi, Venkatesh
2009-01-27 18:45 ` Pallipadi, Venkatesh
2009-01-29 16:29 ` Jon Hunter
2009-01-29 17:36 ` john stultz
2009-01-30 19:04 ` Jon Hunter
2009-01-30 20:29 ` john stultz
2009-02-07  9:20 ` Pavel Machek
2009-02-07  9:20 ` Pavel Machek
2009-02-09 19:10 ` John Stultz
2009-04-08 19:20 ` Hunter, Jon
2009-04-08 22:52 ` Andrew Morton
2009-04-09 15:02 ` Jon Hunter