Runaway cron task on 2.5.63/4 bk?

public inbox for linux-kernel@vger.kernel.org

* Runaway cron task on 2.5.63/4 bk?
@ 2003-03-09  7:30 Kevin Brosius
  2003-03-09  8:08 ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Kevin Brosius @ 2003-03-09  7:30 UTC
  To: kernel

Second attempt to send this after not seeing it post after about a day. 
Anyone else have kernel posting problems?

I started seeing the cron task run away, using 100% CPU continuously on a
single CPU, with 2.5.63+bk and now with 2.5.64 (for about two weeks now).
No other apps/tasks seem to be affected, as far as I've noticed.  It seems
to take upwards of 8 hours of running the kernel for this to occur.

top shows:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  594 root      25   0  1428  620  1364 R    49.9  0.1 195:23 cron

(This is a dual processor Athlon, so CPU0 is at 100% at the moment.) 
This is repeatable.  After leaving the box running overnight, or all day,
the cron process is at 100% again after several hours.  This does
not occur in prior 2.5 kernels, or in 2.4.19.

Any idea what's causing this?  What additional info on the process would
be helpful?  kernel .config file at
http://kevb.net/files/linux2564_config

-- 
Kevin


* Re: Runaway cron task on 2.5.63/4 bk?
  2003-03-09  7:30 Runaway cron task on 2.5.63/4 bk? Kevin Brosius
@ 2003-03-09  8:08 ` Andrew Morton
  2003-03-09  8:17   ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2003-03-09  8:08 UTC
  To: Kevin Brosius; +Cc: linux-kernel

Kevin Brosius <cobra@compuserve.com> wrote:
>
> Second attempt to send this after not seeing it post after about a day. 
> Anyone else have kernel posting problems?
> 
> I started seeing the cron task runaway, using 100% CPU continuously on a
> single CPU with
> 2.5.63+bk and now with 2.5.64 (about two weeks now.)  No other
> apps/tasks seem to be affected, that I've noticed.  It seems to take
> upwards of 8 hours running the kernel for this to occur.
> 
> top shows:
> 
>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
>   594 root      25   0  1428  620  1364 R    49.9  0.1 195:23 cron
> 

Yes I've seen this four times over maybe three weeks.  Three times on dual
CPU, once on a different UP machine.

In all cases, crond is stuck in a loop calling nanosleep with a tv_sec value
of a bit over 4,000,000 and a tv_nsec value of zero.  nanosleep keeps
returning EINVAL immediately.

I'm not sure why crond is trying to sleep for so long.  Maybe it has set an
alarm.

errr, OK.  This returns -EINVAL:

#include <stdio.h>
#include <time.h>

int main(void)
{
	struct timespec req;
	struct timespec rem;
	int ret;

	req.tv_sec = 5000000;
	req.tv_nsec = 0;

	ret = nanosleep(&req, &rem);
	if (ret)
		perror("nanosleep");
	return ret ? 1 : 0;
}

I shall take a look....



* Re: Runaway cron task on 2.5.63/4 bk?
  2003-03-09  8:08 ` Andrew Morton
@ 2003-03-09  8:17   ` Andrew Morton
  2003-03-09 16:28     ` [PATCH] " Todd Mokros
  2003-03-10 19:42     ` george anzinger
  0 siblings, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2003-03-09  8:17 UTC
  To: cobra, linux-kernel, george anzinger

Andrew Morton <akpm@digeo.com> wrote:
>
> errr, OK.  This returns -EINVAL:
> 
> #include <time.h>
> 
> main()
> {
> 	struct timespec req;
> 	struct timespec rem;
> 	int ret;
> 
> 	req.tv_sec = 5000000;
> 	req.tv_nsec = 0;
> 
> 	ret = nanosleep(&req, &rem);
> 	if (ret)
> 		perror("nanosleep");
> }
> 

OK, I give up.

			/*
			 * This is a considered response, not exactly in
			 * line with the standard (in fact it is silent on
			 * possible overflows).  We assume such a large 
			 * value is ALMOST always a programming error and
			 * try not to compound it by setting a really dumb
			 * value.
			 */
			return -EINVAL;

George, RH7.3 and RH8.0 cron daemons are triggering this (trying to sleep
for 4,500,000 seconds) and it causes them to go into a busy loop.

I think we need to just sleep for as long as we can and return an
appropriate partial result.



* [PATCH] Re: Runaway cron task on 2.5.63/4 bk?
  2003-03-09  8:17   ` Andrew Morton
@ 2003-03-09 16:28     ` Todd Mokros
  2003-03-10 19:42     ` george anzinger
  1 sibling, 0 replies; 12+ messages in thread
From: Todd Mokros @ 2003-03-09 16:28 UTC
  To: Andrew Morton; +Cc: cobra, linux-kernel, george anzinger

On Sun, 2003-03-09 at 03:17, Andrew Morton wrote:
> Andrew Morton <akpm@digeo.com> wrote:
> >
> > errr, OK.  This returns -EINVAL:
> > 
> > #include <time.h>
> > 
> > main()
> > {
> > 	struct timespec req;
> > 	struct timespec rem;
> > 	int ret;
> > 
> > 	req.tv_sec = 5000000;
> > 	req.tv_nsec = 0;
> > 
> > 	ret = nanosleep(&req, &rem);
> > 	if (ret)
> > 		perror("nanosleep");
> > }
> > 
> 
> OK, I give up.
> 
> 			/*
> 			 * This is a considered response, not exactly in
> 			 * line with the standard (in fact it is silent on
> 			 * possible overflows).  We assume such a large 
> 			 * value is ALMOST always a programming error and
> 			 * try not to compound it by setting a really dumb
> 			 * value.
> 			 */
> 			return -EINVAL;
> 
> George, RH7.3 and RH8.0 cron daemons are triggering this (trying to sleep
> for 4,500,000 seconds) and it causes them to go into a busy loop.
> 
> I think we need to just sleep for as long as we can and return an
> appropriate partial result.

Cron really isn't at fault.  I saw sleep(52) return 4500000, which cron
just passed into another sleep call.
The problem is a bug in do_clock_nanosleep.  If it gets interrupted by a
signal, then when it calculates the amount of time left it doesn't check
whether jiffies has advanced past the expire time, so it can pass a negative
value to jiffies_to_timespec.  That results in values around 4,500,000,
i.e. ((unsigned int)-1)/HZ, which end up as sleep's return value.  The
following trivial patch appears to have fixed the problem on my system.
Hopefully this isn't wrapped.


--- 2.5-merge/kernel/posix-timers.c	Sun Mar  9 08:49:11 2003
+++ 2.5-snapshot/kernel/posix-timers.c	Sun Mar  9 08:49:11 2003
@@ -1282,6 +1282,9 @@
 		if (abs)
 			return -ERESTARTNOHAND;
 
+		if (time_after_eq(jiffies_f, new_timer.expires))
+			return 0;
+
 		jiffies_to_timespec(new_timer.expires - jiffies_f, tsave);
 
 		while (tsave->tv_nsec < 0) {
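
The wraparound described above can be reproduced in user space.  The
following is a minimal sketch, not the kernel code: it assumes a 32-bit
tick counter and HZ = 1000, and the names SKETCH_HZ, remaining_secs_buggy,
remaining_secs_fixed and sketch_time_after_eq are made up for illustration.
The fixed variant mirrors the idea of the time_after_eq() check in the
patch above.

```c
#include <stdint.h>

/* Assumed tick rate for this sketch (hypothetical constant). */
#define SKETCH_HZ 1000u

/*
 * Buggy variant: the unsigned subtraction wraps around when "now" has
 * already passed "expires", producing a huge remaining time.
 */
static uint32_t remaining_secs_buggy(uint32_t expires, uint32_t now)
{
	return (expires - now) / SKETCH_HZ;
}

/* Wrap-safe "a is at or after b" test on a 32-bit tick counter. */
static int sketch_time_after_eq(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* Fixed variant: report zero remaining time once the expiry has passed. */
static uint32_t remaining_secs_fixed(uint32_t expires, uint32_t now)
{
	if (sketch_time_after_eq(now, expires))
		return 0;
	return (expires - now) / SKETCH_HZ;
}
```

With expires = 1000 and now = 1001, the buggy helper yields 4294967
seconds, the same order of magnitude as the bogus sleep() return value
reported in this thread, while the fixed helper yields 0.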



-- 
Todd Mokros <tmokros@neo.rr.com>


* Re: Runaway cron task on 2.5.63/4 bk?
  2003-03-09  8:17   ` Andrew Morton
  2003-03-09 16:28     ` [PATCH] " Todd Mokros
@ 2003-03-10 19:42     ` george anzinger
  2003-03-10 19:49       ` Linus Torvalds
  1 sibling, 1 reply; 12+ messages in thread
From: george anzinger @ 2003-03-10 19:42 UTC
  To: Andrew Morton; +Cc: cobra, linux-kernel, Linus Torvalds

Andrew Morton wrote:
> Andrew Morton <akpm@digeo.com> wrote:
> 
>>errr, OK.  This returns -EINVAL:
>>
>>#include <time.h>
>>
>>main()
>>{
>>	struct timespec req;
>>	struct timespec rem;
>>	int ret;
>>
>>	req.tv_sec = 5000000;
>>	req.tv_nsec = 0;
>>
>>	ret = nanosleep(&req, &rem);
>>	if (ret)
>>		perror("nanosleep");
>>}
>>
> 
> 
> OK, I give up.
> 
> 			/*
> 			 * This is a considered response, not exactly in
> 			 * line with the standard (in fact it is silent on
> 			 * possible overflows).  We assume such a large 
> 			 * value is ALMOST always a programming error and
> 			 * try not to compound it by setting a really dumb
> 			 * value.
> 			 */
> 			return -EINVAL;
> 
> George, RH7.3 and RH8.0 cron daemons are triggering this (trying to sleep
> for 4,500,000 seconds) and it causes them to go into a busy loop.
> 
> I think we need to just sleep for as long as we can and return an
> appropriate partial result.
> 
> 
Linus has fixed the problem cron showed up, so.

Let's consider this one on its own merits.  What SHOULD sleep do when 
asked to sleep for MAX_INT number of jiffies or more, i.e. when 
jiffies overflows?  My notion, above, is that it is clearly an error. 
I suppose as HZ gets bigger, this argument will carry less weight, 
but, still:

We have, I think, three choices:
1.) Error out as it does now,
2.) Sleep for MAX_INT and return ?????
3.) Sleep for MAX_INT and then sleep some more until the actual time 
is reached.

2.) Requires, if we are to return other than OK, some way to flag that 
the error happened.

3.) Likewise, requires more bits in the timer.  If we went to a 64-bit 
expire count, we could do the "right" thing, however it adds an int to 
the size of the timer_struct.

So, folks, what is the _right_ thing to do here?

-g

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml


* Re: Runaway cron task on 2.5.63/4 bk?
  2003-03-10 19:42     ` george anzinger
@ 2003-03-10 19:49       ` Linus Torvalds
  2003-03-10 22:21         ` george anzinger
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2003-03-10 19:49 UTC
  To: george anzinger; +Cc: Andrew Morton, cobra, linux-kernel


On Mon, 10 Mar 2003, george anzinger wrote:
> 
> Lets consider this one on its own merits.  What SHOULD sleep do when 
> asked to sleep for MAX_INT number of jiffies or more, i.e. when 
> jiffies overflows?  My notion, above, it that it is clearly an error. 

My suggestion (in order of preference):
 - sleep the max amount, and then restart as if a signal had happened
 - sleep the max amount (old behaviour)
 - consider it an error (new behaviour)

In this case the error case actually helped find the other unrelated bug, 
so in this case the error actually _helped_ us. However, that was only 
"help" from a kernel perspective, from a user perspective I definitely 
think that it makes no sense to have "sleep(largenum)" return -EINVAL.

And in the end it's the user that matters.

		Linus



* Re: Runaway cron task on 2.5.63/4 bk?
  2003-03-10 19:49       ` Linus Torvalds
@ 2003-03-10 22:21         ` george anzinger
  2003-03-10 22:29           ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: george anzinger @ 2003-03-10 22:21 UTC
  To: Linus Torvalds; +Cc: Andrew Morton, cobra, linux-kernel

Linus Torvalds wrote:
> On Mon, 10 Mar 2003, george anzinger wrote:
> 
>>Lets consider this one on its own merits.  What SHOULD sleep do when 
>>asked to sleep for MAX_INT number of jiffies or more, i.e. when 
>>jiffies overflows?  My notion, above, it that it is clearly an error. 
> 
> 
> My suggestion (in order of preference):
>  - sleep the max amount, and then restart as if a signal had happened

I think this will require a 64-bit expire in the timer_struct 
(actually it would not be treated as such, but the struct would still 
need the added bits).  Is this ok?

I will look at the problem in detail and see if there might be another 
way without the need of the added bits.

>  - sleep the max amount (old behavior)
>  - consider it an error (new behavior)
> 
> In this case the error case actually helped find the other unrelated bug, 
> so in this case the error actually _helped_ us. However, that was only 
> "help" from a kernel perspective, from a user perspective I definitely 
> think that it makes no sense to have "sleep(largenum)" return -EINVAL.
> 
> And in the end it's the user that matters.
> 
Hm...  I changed it to what it is to make it easier to track down 
problems in the test code... and this was user code.  My thinking was 
that such large values are clear errors, and having the code "hang" in 
the sleep just hides the problem.  But then, I NEVER make a system 
call without checking for errors....  And, I was making a LOT of sleep 
calls and wanted to know which one(s) were wrong.

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml



* Re: Runaway cron task on 2.5.63/4 bk?
  2003-03-10 22:21         ` george anzinger
@ 2003-03-10 22:29           ` Andrew Morton
  2003-03-10 22:46             ` george anzinger
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2003-03-10 22:29 UTC
  To: george anzinger; +Cc: torvalds, cobra, linux-kernel

george anzinger <george@mvista.com> wrote:
>
> Linus Torvalds wrote:
> > On Mon, 10 Mar 2003, george anzinger wrote:
> > 
> >>Lets consider this one on its own merits.  What SHOULD sleep do when 
> >>asked to sleep for MAX_INT number of jiffies or more, i.e. when 
> >>jiffies overflows?  My notion, above, it that it is clearly an error. 
> > 
> > 
> > My suggestion (in order of preference):
> >  - sleep the max amount, and then restart as if a signal had happened
> 
> I think this will require a 64-bit expire in the timer_struct 
> (actually it would not be treated as such, but the struct would still 
> need the added bits).  Is this ok?
> 
> I will look at the problem in detail and see if there might be another 
> way without the need of the added bits.

Is it not possible to just sit in a loop, sleeping for 0x7fffffff jiffies
on each iteration?  (Until the final partial bit of course)
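
The loop suggested here can be sketched in plain C.  This is only an
illustration under stated assumptions: the 64-bit target and the current
time are both expressed in ticks, the clock is simulated by jumping
straight to each expiry, and SKETCH_MAX_CHUNK, next_chunk and
chunks_needed are hypothetical names, not kernel symbols.

```c
#include <stdint.h>

/* Largest interval armed in a single timer (hypothetical constant). */
#define SKETCH_MAX_CHUNK 0x7fffffffULL

/*
 * Number of ticks to arm for the next timer: a full chunk while the
 * target is far away, the final partial bit at the end, 0 when done.
 */
static uint32_t next_chunk(uint64_t target, uint64_t now)
{
	uint64_t left;

	if (now >= target)
		return 0;
	left = target - now;
	if (left > SKETCH_MAX_CHUNK)
		return (uint32_t)SKETCH_MAX_CHUNK;
	return (uint32_t)left;
}

/*
 * Count how many timers get armed before the target is reached,
 * pretending that each one runs to completion.
 */
static unsigned int chunks_needed(uint64_t target, uint64_t now)
{
	unsigned int n = 0;
	uint32_t step;

	while ((step = next_chunk(target, now)) != 0) {
		now += step;
		n++;
	}
	return n;
}
```

For a target of 5,000,000,000 ticks starting from 0, this arms two full
chunks followed by one partial chunk.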

> Hm...  I changed it to what it is to make it easier to track down 
> problems in the test code... and this was user code.  My thinking was 
> that such large values are clear errors, and having the code "hang" in 
> the sleep just hides the problem.  But then, I NEVER make a system 
> call without checking for errors....  And, I was making a LOT of sleep 
> calls and wanted to know which one(s) were wrong.

If an app wants to sleep forever, calling

	while (1)
		sleep(MAX_INT);

seems like a reasonable approach.  I'd expect quite a lot of applications
would be doing that.



* Re: Runaway cron task on 2.5.63/4 bk?
  2003-03-10 22:29           ` Andrew Morton
@ 2003-03-10 22:46             ` george anzinger
  0 siblings, 0 replies; 12+ messages in thread
From: george anzinger @ 2003-03-10 22:46 UTC
  To: Andrew Morton; +Cc: torvalds, cobra, linux-kernel

Andrew Morton wrote:
> george anzinger <george@mvista.com> wrote:
> 
>>Linus Torvalds wrote:
>>
>>>On Mon, 10 Mar 2003, george anzinger wrote:
>>>
>>>
>>>>Lets consider this one on its own merits.  What SHOULD sleep do when 
>>>>asked to sleep for MAX_INT number of jiffies or more, i.e. when 
>>>>jiffies overflows?  My notion, above, it that it is clearly an error. 
>>>
>>>
>>>My suggestion (in order of preference):
>>> - sleep the max amount, and then restart as if a signal had happened
>>
>>I think this will require a 64-bit expire in the timer_struct 
>>(actually it would not be treated as such, but the struct would still 
>>need the added bits).  Is this ok?
>>
>>I will look at the problem in detail and see if there might be another 
>>way without the need of the added bits.
> 
> 
> Is it not possible to just sit in a loop, sleeping for 0x7fffffff jiffies
> on each iteration?  (Until the final partial bit of course)

Seems reasonable.  I will have a look.

-g
> 
> 
>>Hm...  I changed it to what it is to make it easier to track down 
>>problems in the test code... and this was user code.  My thinking was 
>>that such large values are clear errors, and having the code "hang" in 
>>the sleep just hides the problem.  But then, I NEVER make a system 
>>call without checking for errors....  And, I was making a LOT of sleep 
>>calls and wanted to know which one(s) were wrong.
> 
> 
> If an app wants to sleep forever, calling
> 
> 	while (1)
> 		sleep(MAX_INT);
> 
> seems like a reasonable approach.  I'd expect quite a lot of applications
> would be doing that.
> 

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml



* Re: Runaway cron task on 2.5.63/4 bk?
@ 2003-03-10 23:33 Linus Torvalds
  2003-03-12  3:45 ` [PATCH] " george anzinger
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2003-03-10 23:33 UTC
  To: Felipe Alfaro Solana; +Cc: akpm, george, cobra, linux-kernel


On Tue, 11 Mar 2003, Felipe Alfaro Solana wrote:
>  
> why not sleep(0)? 

I think a much more likely (and correct) usage for big sleep values is 
more something like this:

	do_with_timeout(xxx, int timeout)
	{
		struct timespec ts;

		... set up some async event ..
		ts.tv_nsec = 0;
		ts.tv_sec = timeout;
		while (nanosleep(&ts, &ts)) {
			if (async event happened)
				return happy;
		}
		.. tear down the async event if it didn't happen ..
	}

and here the natural thing to do in user space is to just make the "no 
timeout" case be a huge value.

At which point it is a _bug_ in the kernel if we return early with some 
random error code.
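
A compilable take on the pattern above might look like the following.  It
is a sketch under assumptions, not anything posted in this thread: the
"async event" is modelled as SIGUSR1 setting a flag, and the names
event_happened, on_event and wait_with_timeout are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>

/* Set by the signal handler when the (assumed) async event arrives. */
static volatile sig_atomic_t event_happened;

static void on_event(int signo)
{
	(void)signo;
	event_happened = 1;
}

/*
 * Sleep for up to timeout_sec seconds, waking early if SIGUSR1 arrives.
 * Returns 1 if the event happened, 0 on timeout, -1 on error.
 */
static int wait_with_timeout(time_t timeout_sec)
{
	struct sigaction sa;
	struct timespec ts;

	sa.sa_handler = on_event;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;

	ts.tv_sec = timeout_sec;
	ts.tv_nsec = 0;
	/* On interruption nanosleep() writes the remaining time back to ts. */
	while (nanosleep(&ts, &ts) != 0) {
		if (errno != EINTR)
			return -1;	/* e.g. EINVAL for a rejected value */
		if (event_happened)
			return 1;
	}
	return event_happened ? 1 : 0;
}
```

A caller that wants "no timeout" would simply pass a very large
timeout_sec, which is exactly the case where an early -EINVAL from the
kernel turns into an unexpected error return here.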

		Linus



* [PATCH] Re: Runaway cron task on 2.5.63/4 bk?
  2003-03-10 23:33 Linus Torvalds
@ 2003-03-12  3:45 ` george anzinger
  2003-03-12  4:57   ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: george anzinger @ 2003-03-12  3:45 UTC
  To: Linus Torvalds; +Cc: Felipe Alfaro Solana, akpm, cobra, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1497 bytes --]

Ok, here is what I have.  I changed nanosleep to use a local 64-bit
value for the target expire time in jiffies.  At most MAX-INT/2-1
will be put in the timer at any one time.  It loops till the target
time is met or exceeded.  The changes affect (clock)nanosleep only and
not timers (they still error out for large values).
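
Because the restart block only has unsigned long slots, the attached
patch carries the 64-bit target across a restart in two of them (arg2
and arg3).  A minimal user-space sketch of that round trip, with the
hypothetical names sketch_restart, save_target and load_target standing
in for the kernel structures:

```c
#include <stdint.h>

/* Stand-in for the two restart_block slots used by the patch. */
struct sketch_restart {
	unsigned long arg2;	/* low 32 bits of the target */
	unsigned long arg3;	/* high 32 bits of the target */
};

/* Split a 64-bit target time into two 32-bit halves. */
static void save_target(struct sketch_restart *rb, uint64_t target)
{
	rb->arg2 = (unsigned long)(target & 0xffffffffULL);
	rb->arg3 = (unsigned long)(target >> 32);
}

/* Recombine the halves into the original 64-bit target. */
static uint64_t load_target(const struct sketch_restart *rb)
{
	return ((uint64_t)rb->arg3 << 32) + rb->arg2;
}
```

Masking the low half explicitly keeps the round trip correct whether
unsigned long is 32 or 64 bits wide.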

I now use the simple u64=(long long) a * b for the mpy so I have 
dropped the sc_math.h stuff (I will bring that round again :).

What do you think?

Oh, the code passes the tests I have, but I have not tried to test for
very large sleep times.
-g


Linus Torvalds wrote:
> On Tue, 11 Mar 2003, Felipe Alfaro Solana wrote:
> 
>> 
>>why not sleep(0)? 
> 
> 
> I think a much more likely (and correct) usage for big sleep values is 
> more something like this:
> 
> 	do_with_timeout(xxx, int timeout)
> 	{
> 		struct timespec ts;
> 
> 		... set up some async event ..
> 		ts.tv_nsec = 0;
> 		ts.tv_sec = timeout;
> 		while (nanosleep(&ts, &ts)) {
> 			if (async event happened)
> 				return happy;
> 		}
> 		.. tear down the async event if it didn't happen ..
> 	}
> 
> and here the natural thing to do in user space is to just make the "no 
> timeout" case be a huge value.
> 
> At which point it is a _bug_ in the kernel if we return early with some 
> random error code.
> 
> 		Linus
> 
> 

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml


[-- Attachment #2: hrtimers-large-2.5.64-1.1.patch --]
[-- Type: text/plain, Size: 4976 bytes --]

diff -urP -I '\$Id:.*Exp \$' -X /usr/src/patch.exclude linux-2.5.64-kb/include/linux/thread_info.h linux/include/linux/thread_info.h
--- linux-2.5.64-kb/include/linux/thread_info.h	2002-12-11 06:25:32.000000000 -0800
+++ linux/include/linux/thread_info.h	2003-03-10 16:39:52.000000000 -0800
@@ -12,7 +12,7 @@
  */
 struct restart_block {
 	long (*fn)(struct restart_block *);
-	unsigned long arg0, arg1, arg2;
+	unsigned long arg0, arg1, arg2, arg3;
 };
 
 extern long do_no_restart_syscall(struct restart_block *parm);
diff -urP -I '\$Id:.*Exp \$' -X /usr/src/patch.exclude linux-2.5.64-kb/kernel/posix-timers.c linux/kernel/posix-timers.c
--- linux-2.5.64-kb/kernel/posix-timers.c	2003-03-05 15:10:40.000000000 -0800
+++ linux/kernel/posix-timers.c	2003-03-11 16:51:39.000000000 -0800
@@ -183,7 +183,7 @@
 __initcall(init_posix_timers);
 
 static inline int
-tstojiffie(struct timespec *tp, int res, unsigned long *jiff)
+tstojiffie(struct timespec *tp, int res, u64 *jiff)
 {
 	unsigned long sec = tp->tv_sec;
 	long nsec = tp->tv_nsec + res - 1;
@@ -203,7 +203,7 @@
 	 * below.  Here it is enough to just discard the high order
 	 * bits.  
 	 */
-	*jiff = HZ * sec;
+	*jiff = (u64)sec * HZ;
 	/*
 	 * Do the res thing. (Don't forget the add in the declaration of nsec) 
 	 */
@@ -221,9 +221,12 @@
 static void
 tstotimer(struct itimerspec *time, struct k_itimer *timer)
 {
+	u64 result;
 	int res = posix_clocks[timer->it_clock].res;
-	tstojiffie(&time->it_value, res, &timer->it_timer.expires);
-	tstojiffie(&time->it_interval, res, &timer->it_incr);
+	tstojiffie(&time->it_value, res, &result);
+	timer->it_timer.expires = (unsigned long)result;
+	tstojiffie(&time->it_interval, res, &result);
+	timer->it_incr = (unsigned long)result;
 }
 
 static void
@@ -1020,6 +1023,9 @@
  * Note also that the while loop assures that the sub_jiff_offset
  * will be less than a jiffie, thus no need to normalize the result.
  * Well, not really, if called with ints off :(
+
+ * HELP, this code should make an attempt at resolution beyond the 
+ * jiffie.  Trouble is this is "arch" dependent...
  */
 
 int
@@ -1208,6 +1214,7 @@
 	struct timespec t;
 	struct timer_list new_timer;
 	struct abs_struct abs_struct = { .list = { .next = 0 } };
+	u64 rq_time = 0;
 	int abs;
 	int rtn = 0;
 	int active;
@@ -1226,11 +1233,12 @@
 		 * time and continue.
 		 */
 		restart_block->fn = do_no_restart_syscall;
-		if (!restart_block->arg2)
-			return -EINTR;
 
-		new_timer.expires = restart_block->arg2;
-		if (time_before(new_timer.expires, jiffies))
+		rq_time = restart_block->arg3;
+		rq_time = (rq_time << 32) + restart_block->arg2;
+		if (!rq_time)
+			return -EINTR;
+		if (rq_time <= get_jiffies_64())
 			return 0;
 	}
 
@@ -1243,37 +1251,37 @@
 	}
 	do {
 		t = *tsave;
-		if ((abs || !new_timer.expires) &&
-		    !(rtn = adjust_abs_time(&posix_clocks[which_clock],
-					    &t, abs))) {
-			/*
-			 * On error, we don't set up the timer so
-			 * we don't arm the timer so
-			 * del_timer_sync() will return 0, thus
-			 * active is zero... and so it goes.
-			 */
+		if (abs || !rq_time){
+			adjust_abs_time(&posix_clocks[which_clock], &t, abs);
 
-			tstojiffie(&t,
-				   posix_clocks[which_clock].res,
-				   &new_timer.expires);
+			tstojiffie(&t, posix_clocks[which_clock].res, &rq_time);
 		}
-		if (new_timer.expires) {
-			current->state = TASK_INTERRUPTIBLE;
-			add_timer(&new_timer);
-
-			schedule();
+#if (BITS_PER_LONG < 64)
+		if ((rq_time - get_jiffies_64()) > MAX_JIFFY_OFFSET){
+			new_timer.expires = MAX_JIFFY_OFFSET;
+		}else
+#endif
+		{
+			new_timer.expires = (long)rq_time;
 		}
+		current->state = TASK_INTERRUPTIBLE;
+		add_timer(&new_timer);
+
+		schedule();
 	}
-	while ((active = del_timer_sync(&new_timer)) &&
+	while ((active = del_timer_sync(&new_timer) || 
+		rq_time > get_jiffies_64()) &&
 	       !test_thread_flag(TIF_SIGPENDING));
 
+
 	if (abs_struct.list.next) {
 		spin_lock_irq(&nanosleep_abs_list_lock);
 		list_del(&abs_struct.list);
 		spin_unlock_irq(&nanosleep_abs_list_lock);
 	}
 	if (active) {
-		unsigned long jiffies_f = jiffies;
+		s64 left;
+		unsigned long rmd;
 
 		/*
 		 * Always restart abs calls from scratch to pick up any
@@ -1282,20 +1290,19 @@
 		if (abs)
 			return -ERESTARTNOHAND;
 
-		jiffies_to_timespec(new_timer.expires - jiffies_f, tsave);
+		left = rq_time - get_jiffies_64();
+		if (left < 0)
+			return 0;
+
+		tsave->tv_sec = div_long_long_rem(left, HZ, &rmd);
+		tsave->tv_nsec = rmd * (NSEC_PER_SEC / HZ);
 
-		while (tsave->tv_nsec < 0) {
-			tsave->tv_nsec += NSEC_PER_SEC;
-			tsave->tv_sec--;
-		}
-		if (tsave->tv_sec < 0) {
-			tsave->tv_sec = 0;
-			tsave->tv_nsec = 1;
-		}
 		restart_block->fn = clock_nanosleep_restart;
 		restart_block->arg0 = which_clock;
 		restart_block->arg1 = (unsigned long)tsave;
-		restart_block->arg2 = new_timer.expires;
+		restart_block->arg2 = rq_time & 0xffffffffLL;
+		restart_block->arg3 = rq_time >> 32;
+
 		return -ERESTART_RESTARTBLOCK;
 	}
 


* Re: [PATCH] Re: Runaway cron task on 2.5.63/4 bk?
  2003-03-12  3:45 ` [PATCH] " george anzinger
@ 2003-03-12  4:57   ` Andrew Morton
  2003-03-12 10:09     ` george anzinger
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2003-03-12  4:57 UTC
  To: george anzinger; +Cc: torvalds, felipe_alfaro, cobra, linux-kernel

george anzinger <george@mvista.com> wrote:
>
> Ok, here is what I have.  I changed nano sleep to use a local 64-bit
> value for the target expire time in jiffies.  As much as MAX-INT/2-1
> will be put in the timer at any one time. It loops till the target
> time is met or exceeded.  The changes affect (clock)nanosleep only and
> not timers (they still error out for large values).

Seems sane.

> I now use the simple u64=(long long) a * b for the mpy so I have 
> dropped the sc_math.h stuff (I will bring that round again :).

Resistance shall be unflagging!

> What do you think?

Sorry, but this little bit:

	while ((active = del_timer_sync(&new_timer) || 
		rq_time > get_jiffies_64()) &&
 	       !test_thread_flag(TIF_SIGPENDING));
 

 	if (abs_struct.list.next) {
 		spin_lock_irq(&nanosleep_abs_list_lock);
 		list_del(&abs_struct.list);
 		spin_unlock_irq(&nanosleep_abs_list_lock);
 	}
 	if (active) {

should be dragged out and mercifully shot.  Is it possible to make that while
loop a little clearer?

The abs_list exactly duplicates the kernel's existing waitqueue
functionality.  You can use prepare_to_wait()/finish_wait() there.

posix_timers_id, posix_clocks[], nanosleep_abs_list_lock and
nanosleep_abs_list should be static to posix-timers.c.


* Re: [PATCH] Re: Runaway cron task on 2.5.63/4 bk?
  2003-03-12  4:57   ` Andrew Morton
@ 2003-03-12 10:09     ` george anzinger
  0 siblings, 0 replies; 12+ messages in thread
From: george anzinger @ 2003-03-12 10:09 UTC
  To: Andrew Morton; +Cc: torvalds, felipe_alfaro, cobra, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1677 bytes --]

Andrew Morton wrote:
> george anzinger <george@mvista.com> wrote:
> 
>>Ok, here is what I have.  I changed nano sleep to use a local 64-bit
>>value for the target expire time in jiffies.  As much as MAX-INT/2-1
>>will be put in the timer at any one time. It loops till the target
>>time is met or exceeded.  The changes affect (clock)nanosleep only and
>>not timers (they still error out for large values).
> 
> 
> Seem sane.
> 
> 
>>I now use the simple u64=(long long) a * b for the mpy so I have 
>>dropped the sc_math.h stuff (I will bring that round again :).
> 
> 
> Resistance shall be unflagging!
> 
> 
>>What do you think?
> 
> 
> Sorry, but this little bit:
> 
> 	while ((active = del_timer_sync(&new_timer) || 
> 		rq_time > get_jiffies_64()) &&
>  	       !test_thread_flag(TIF_SIGPENDING));
>  
> 
>  	if (abs_struct.list.next) {
>  		spin_lock_irq(&nanosleep_abs_list_lock);
>  		list_del(&abs_struct.list);
>  		spin_unlock_irq(&nanosleep_abs_list_lock);
>  	}
>  	if (active) {
> 
> should be dragged out and mercifully shot.  Is it possible to make that while
> loop a little clearer?

I hung it!  It was less of a mess to clean up :)
> 
> The abs_list exactly duplicates the kernel's existing waitqueue
> functionality.  You can use prepare_to_wait()/finish_wait() there.

Well, almost.  Wants to mess with the state, but, try the attached.
> 
> posix_timers_id, posix_clocks[], nanosleep_abs_list_lock and
> nanosleep_abs_list should be static to posix-timers.c.

And a few more :)

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml

[-- Attachment #2: hrtimers-large-2.5.64-1.2.patch --]
[-- Type: text/plain, Size: 7736 bytes --]

diff -urP -I '\$Id:.*Exp \$' -X /usr/src/patch.exclude linux-2.5.64-kb/include/linux/thread_info.h linux/include/linux/thread_info.h
--- linux-2.5.64-kb/include/linux/thread_info.h	2002-12-11 06:25:32.000000000 -0800
+++ linux/include/linux/thread_info.h	2003-03-10 16:39:52.000000000 -0800
@@ -12,7 +12,7 @@
  */
 struct restart_block {
 	long (*fn)(struct restart_block *);
-	unsigned long arg0, arg1, arg2;
+	unsigned long arg0, arg1, arg2, arg3;
 };
 
 extern long do_no_restart_syscall(struct restart_block *parm);
diff -urP -I '\$Id:.*Exp \$' -X /usr/src/patch.exclude linux-2.5.64-kb/kernel/posix-timers.c linux/kernel/posix-timers.c
--- linux-2.5.64-kb/kernel/posix-timers.c	2003-03-12 01:57:56.000000000 -0800
+++ linux/kernel/posix-timers.c	2003-03-12 02:04:31.000000000 -0800
@@ -9,7 +9,6 @@
 /* These are all the functions necessary to implement 
  * POSIX clocks & timers
  */
-
 #include <linux/mm.h>
 #include <linux/smp_lock.h>
 #include <linux/interrupt.h>
@@ -23,6 +22,7 @@
 #include <linux/compiler.h>
 #include <linux/idr.h>
 #include <linux/posix-timers.h>
+#include <linux/wait.h>
 
 #ifndef div_long_long_rem
 #include <asm/div64.h>
@@ -56,8 +56,8 @@
    * Lets keep our timers in a slab cache :-)
  */
 static kmem_cache_t *posix_timers_cache;
-struct idr posix_timers_id;
-spinlock_t idr_lock = SPIN_LOCK_UNLOCKED;
+static struct idr posix_timers_id;
+static spinlock_t idr_lock = SPIN_LOCK_UNLOCKED;
 
 /*
  * Just because the timer is not in the timer list does NOT mean it is
@@ -130,7 +130,7 @@
  *	    which we beg off on and pass to do_sys_settimeofday().
  */
 
-struct k_clock posix_clocks[MAX_CLOCKS];
+static struct k_clock posix_clocks[MAX_CLOCKS];
 
 #define if_clock_do(clock_fun, alt_fun,parms)	(! clock_fun)? alt_fun parms :\
 							      clock_fun parms
@@ -183,7 +183,7 @@
 __initcall(init_posix_timers);
 
 static inline int
-tstojiffie(struct timespec *tp, int res, unsigned long *jiff)
+tstojiffie(struct timespec *tp, int res, u64 *jiff)
 {
 	unsigned long sec = tp->tv_sec;
 	long nsec = tp->tv_nsec + res - 1;
@@ -203,7 +203,7 @@
 	 * below.  Here it is enough to just discard the high order
 	 * bits.  
 	 */
-	*jiff = HZ * sec;
+	*jiff = (u64)sec * HZ;
 	/*
 	 * Do the res thing. (Don't forget the add in the declaration of nsec) 
 	 */
@@ -221,9 +221,12 @@
 static void
 tstotimer(struct itimerspec *time, struct k_itimer *timer)
 {
+	u64 result;
 	int res = posix_clocks[timer->it_clock].res;
-	tstojiffie(&time->it_value, res, &timer->it_timer.expires);
-	tstojiffie(&time->it_interval, res, &timer->it_incr);
+	tstojiffie(&time->it_value, res, &result);
+	timer->it_timer.expires = (unsigned long)result;
+	tstojiffie(&time->it_interval, res, &result);
+	timer->it_incr = (unsigned long)result;
 }
 
 static void
@@ -1020,6 +1023,9 @@
  * Note also that the while loop assures that the sub_jiff_offset
  * will be less than a jiffie, thus no need to normalize the result.
  * Well, not really, if called with ints off :(
+
+ * HELP, this code should make an attempt at resolution beyond the 
+ * jiffie.  Trouble is this is "arch" dependent...
  */
 
 int
@@ -1127,26 +1133,14 @@
  * holds (or has held for it) a write_lock_irq( xtime_lock) and is 
  * called from the timer bh code.  Thus we need the irq save locks.
  */
-spinlock_t nanosleep_abs_list_lock = SPIN_LOCK_UNLOCKED;
 
-struct list_head nanosleep_abs_list = LIST_HEAD_INIT(nanosleep_abs_list);
+static DECLARE_WAIT_QUEUE_HEAD(nanosleep_abs_wqueue);
 
-struct abs_struct {
-	struct list_head list;
-	struct task_struct *t;
-};
 
 void
 clock_was_set(void)
 {
-	struct list_head *pos;
-	unsigned long flags;
-
-	spin_lock_irqsave(&nanosleep_abs_list_lock, flags);
-	list_for_each(pos, &nanosleep_abs_list) {
-		wake_up_process(list_entry(pos, struct abs_struct, list)->t);
-	}
-	spin_unlock_irqrestore(&nanosleep_abs_list_lock, flags);
+	wake_up_all(&nanosleep_abs_wqueue);
 }
 
 long clock_nanosleep_restart(struct restart_block *restart_block);
@@ -1201,19 +1195,19 @@
 	return ret;
 
 }
-
 long
 do_clock_nanosleep(clockid_t which_clock, int flags, struct timespec *tsave)
 {
 	struct timespec t;
 	struct timer_list new_timer;
-	struct abs_struct abs_struct = { .list = { .next = 0 } };
+	DECLARE_WAITQUEUE(abs_wqueue, current);
+	u64 rq_time = 0;
+	s64 left;
 	int abs;
-	int rtn = 0;
-	int active;
 	struct restart_block *restart_block =
 	    &current_thread_info()->restart_block;
 
+	abs_wqueue.flags = 0;
 	init_timer(&new_timer);
 	new_timer.expires = 0;
 	new_timer.data = (unsigned long) current;
@@ -1226,54 +1220,50 @@
 		 * time and continue.
 		 */
 		restart_block->fn = do_no_restart_syscall;
-		if (!restart_block->arg2)
-			return -EINTR;
 
-		new_timer.expires = restart_block->arg2;
-		if (time_before(new_timer.expires, jiffies))
+		rq_time = restart_block->arg3;
+		rq_time = (rq_time << 32) + restart_block->arg2;
+		if (!rq_time)
+			return -EINTR;
+		if (rq_time <= get_jiffies_64())
 			return 0;
 	}
 
 	if (abs && (posix_clocks[which_clock].clock_get !=
 		    posix_clocks[CLOCK_MONOTONIC].clock_get)) {
-		spin_lock_irq(&nanosleep_abs_list_lock);
-		list_add(&abs_struct.list, &nanosleep_abs_list);
-		abs_struct.t = current;
-		spin_unlock_irq(&nanosleep_abs_list_lock);
+		add_wait_queue(&nanosleep_abs_wqueue, &abs_wqueue);
 	}
 	do {
 		t = *tsave;
-		if ((abs || !new_timer.expires) &&
-		    !(rtn = adjust_abs_time(&posix_clocks[which_clock],
-					    &t, abs))) {
-			/*
-			 * On error, we don't set up the timer so
-			 * we don't arm the timer so
-			 * del_timer_sync() will return 0, thus
-			 * active is zero... and so it goes.
-			 */
+		if (abs || !rq_time){
+			adjust_abs_time(&posix_clocks[which_clock], &t, abs);
 
-			tstojiffie(&t,
-				   posix_clocks[which_clock].res,
-				   &new_timer.expires);
+			tstojiffie(&t, posix_clocks[which_clock].res, &rq_time);
 		}
-		if (new_timer.expires) {
-			current->state = TASK_INTERRUPTIBLE;
-			add_timer(&new_timer);
-
-			schedule();
+#if (BITS_PER_LONG < 64)
+		if ((rq_time - get_jiffies_64()) > MAX_JIFFY_OFFSET){
+			new_timer.expires = MAX_JIFFY_OFFSET;
+		}else
+#endif
+		{
+			new_timer.expires = (long)rq_time;
 		}
-	}
-	while ((active = del_timer_sync(&new_timer)) &&
-	       !test_thread_flag(TIF_SIGPENDING));
+		current->state = TASK_INTERRUPTIBLE;
+		add_timer(&new_timer);
+
+		schedule();
 
-	if (abs_struct.list.next) {
-		spin_lock_irq(&nanosleep_abs_list_lock);
-		list_del(&abs_struct.list);
-		spin_unlock_irq(&nanosleep_abs_list_lock);
+		del_timer_sync(&new_timer);
+		left = rq_time - get_jiffies_64();
 	}
-	if (active) {
-		long jiffies_left;
+	while ( (left > 0)  &&
+		!test_thread_flag(TIF_SIGPENDING));
+
+	if( abs_wqueue.task_list.next)
+		finish_wait(&nanosleep_abs_wqueue, &abs_wqueue);
+
+	if (left > 0) {
+		unsigned long rmd;
 
 		/*
 		 * Always restart abs calls from scratch to pick up any
@@ -1282,29 +1272,19 @@
 		if (abs)
 			return -ERESTARTNOHAND;
 
-		jiffies_left = new_timer.expires - jiffies;
-
-		if (jiffies_left < 0)
-			return 0;
-
-		jiffies_to_timespec(jiffies_left, tsave);
+		tsave->tv_sec = div_long_long_rem(left, HZ, &rmd);
+		tsave->tv_nsec = rmd * (NSEC_PER_SEC / HZ);
 
-		while (tsave->tv_nsec < 0) {
-			tsave->tv_nsec += NSEC_PER_SEC;
-			tsave->tv_sec--;
-		}
-		if (tsave->tv_sec < 0) {
-			tsave->tv_sec = 0;
-			tsave->tv_nsec = 1;
-		}
 		restart_block->fn = clock_nanosleep_restart;
 		restart_block->arg0 = which_clock;
 		restart_block->arg1 = (unsigned long)tsave;
-		restart_block->arg2 = new_timer.expires;
+		restart_block->arg2 = rq_time & 0xffffffffLL;
+		restart_block->arg3 = rq_time >> 32;
+
 		return -ERESTART_RESTARTBLOCK;
 	}
 
-	return rtn;
+	return 0;
 }
 /*
  * This will restart either clock_nanosleep or clock_nanosleep
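
For anyone reading along, two pieces of arithmetic in the patch above can be tried out in user space. This is only a sketch, not the kernel code: the HZ value of 1000 and the helper names (sec_to_jiffies_32, sec_to_jiffies_64, save_expiry, load_expiry, jiffies_left) are made up for illustration, and uint32_t stands in for a 32-bit unsigned long. It shows why "*jiff = (u64)sec * HZ" matters on a 32-bit box, and how a 64-bit expiry can be carried across a restart in two 32-bit slots the way arg2/arg3 are used.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tick rate, for illustration only. */
#define HZ 1000u

/* Old form on a 32-bit long: the product wraps modulo 2^32. */
static uint32_t sec_to_jiffies_32(uint32_t sec)
{
	return HZ * sec;
}

/* Patched form: widen to 64 bits before multiplying. */
static uint64_t sec_to_jiffies_64(uint32_t sec)
{
	return (uint64_t)sec * HZ;
}

/* Split a 64-bit expiry into two 32-bit slots (cf. arg2 and arg3). */
static void save_expiry(uint64_t rq_time, uint32_t *lo, uint32_t *hi)
{
	*lo = (uint32_t)(rq_time & 0xffffffffULL);
	*hi = (uint32_t)(rq_time >> 32);
}

/* Rebuild the 64-bit expiry from the two slots. */
static uint64_t load_expiry(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) + lo;
}

/* Signed distance to the expiry; zero or negative means it has passed. */
static int64_t jiffies_left(uint64_t rq_time, uint64_t now)
{
	return (int64_t)(rq_time - now);
}

/* Small self-check tying the pieces together. */
static void demo_selfcheck(void)
{
	uint32_t lo, hi;
	uint64_t rq_time = sec_to_jiffies_64(5000000u);

	save_expiry(rq_time, &lo, &hi);
	assert(load_expiry(lo, hi) == rq_time);
	assert(jiffies_left(rq_time, rq_time + 1) < 0);
}
```

With these assumed values, a 5,000,000 second sleep needs 5,000,000,000 jiffies, which does not fit in 32 bits, so the old form silently yields a much earlier expiry. The signed "left" value is what lets the loop tell "expired" from "still waiting" without depending on whether the timer was still pending.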


end of thread, other threads:[~2003-03-12  9:59 UTC | newest]

Thread overview: 12+ messages
2003-03-09  7:30 Runaway cron task on 2.5.63/4 bk? Kevin Brosius
2003-03-09  8:08 ` Andrew Morton
2003-03-09  8:17   ` Andrew Morton
2003-03-09 16:28     ` [PATCH] " Todd Mokros
2003-03-10 19:42     ` george anzinger
2003-03-10 19:49       ` Linus Torvalds
2003-03-10 22:21         ` george anzinger
2003-03-10 22:29           ` Andrew Morton
2003-03-10 22:46             ` george anzinger
  -- strict thread matches above, loose matches on Subject: below --
2003-03-10 23:33 Linus Torvalds
2003-03-12  3:45 ` [PATCH] " george anzinger
2003-03-12  4:57   ` Andrew Morton
2003-03-12 10:09     ` george anzinger
