public inbox for linux-kernel@vger.kernel.org
* Re: [PATCH] improving O(1)-J9 in heavily threaded situations
@ 2002-02-04 12:32 Ed Tomlinson
  2002-02-04 12:33 ` Arjan van de Ven
  0 siblings, 1 reply; 30+ messages in thread
From: Ed Tomlinson @ 2002-02-04 12:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: arjanv

Arjan van de Ven wrote:

> Ingo Molnar wrote:
>> 
>> On Sun, 3 Feb 2002, Ed Tomlinson wrote:
>> 
>> > One point that seems to get missed is that a group of java threads,
>> > posix threads or sometimes forked processes combine to make an
>> > application. [...]
>> 
>> yes - but what makes them an application is not really the fact that they
>> share the VM (i can very much imagine thread-based login servers where
>> different users use different threads - a single application as well?),
>> but the intention of the application designer, which is hard to guess.
> 
> sharing the same Thread Group ID would be a very obvious quantity to
> check,
> and would very much show the indication of the application author.

I tried this.  It looks like not all (many?) apps actually use it.

Ed

^ permalink raw reply	[flat|nested] 30+ messages in thread
[parent not found: <Pine.LNX.4.33.0202040627001.22583-100000@localhost.localdomain>]
* Re: [PATCH] improving O(1)-J9 in heavily threaded situations
@ 2002-02-03  6:00 Ed Tomlinson
  0 siblings, 0 replies; 30+ messages in thread
From: Ed Tomlinson @ 2002-02-03  6:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

Found a bug in the patch:

                if (!TASK_INTERACTIVE(p) || EXPIRED_STARVING(rq)) {
                        if (!rq->expired_timestamp)
                                rq->expired_timestamp = jiffies;
                        enqueue_task(p, rq->expired);
>>>                     goto out;
                } else
                        enqueue_task(p, rq->active);
        }

        /*
         * Here we stop a single interactive task or a group of tasks sharing
         * a mm_struct (like java threads) from monopolizing the cpu.
         */
        if (p->mm) {
>>>		if (p->mm->mm_hogstamp[smp_processor_id()] != rq->switch_timestamp) {

I added a 'goto out;' after the enqueue_task(p, rq->expired); I suspect bad
things can happen if we later dequeue from the wrong array.  I also changed
the test 'p->mm->mm_hogstamp[smp_processor_id()] < rq->switch_timestamp' from
< to !=, as it is safer when jiffies rolls over.

Ed Tomlinson

On February 2, 2002 10:50 am, Ed Tomlinson wrote:
> The following patch improves the performance of O(1) when under load
> and threaded programs are running.  For this patch, threaded is taken
> to mean processes sharing their vm (p->mm).  While this type of
> programming is usually not optimal, it is very common (e.g. Java).
>
> Under load, with threaded tasks running niced and a cpu bound task at
> normal priority, we quickly notice the load average shooting up.  What
> is happening is that all the threads are trying to run.  Due to the nature
> of O(1), all the processes in the active array must run to get on to the
> expired array.  This quickly leads to the higher priority cpu bound
> task starving.  Running the threaded application at nice +19 can help
> but does not solve the base issue.  On UP, I have observed loads of
> between 20 and 40 when this is happening.
>
> I tried various approaches to correcting this.  Neither reducing the
> timeslices nor playing with the effective prio of the threaded tasks
> helped much.  What does seem quite effective is to monitor the total
> time _all_ the tasks sharing a vm use.  Once a threshold is exceeded
> we move any tasks in the same group directly to the expired array,
> temporarily increasing their effective prio.
>
> With this change the cpu bound tasks can get over 80% of the cpu.
>
> A couple of comments.  The patch applies its logic to all processes.
> This limits the number of times interactive tasks can requeue.  With
> this in place we can probably drop the expired_timestamp logic.
>
> I see no reason that some of the tunables should not become part of
> the standard scheduler, provided they do not add significant
> overhead.  The THREAD_PENALTY tunable is only used in the tick
> processing logic (and not in the common path) - keeping it tunable
> should not introduce measurable overhead.
>
> Patch against 2.4.17pre7
>
> Comments and feedback welcome,
> Ed Tomlinson
>
> --- linux/include/linux/sched.h.orig	Fri Feb  1 23:19:44 2002
> +++ linux/include/linux/sched.h	Fri Feb  1 23:16:34 2002
> @@ -211,6 +211,8 @@
>  	pgd_t * pgd;
>  	atomic_t mm_users;			/* How many users with user space? */
>  	atomic_t mm_count;			/* How many references to "struct mm_struct" (users count as 1) */
> +	unsigned long mm_hogstamp[NR_CPUS];	/* timestamp to reset the hogslice */
> +	unsigned long mm_hogslice[NR_CPUS];	/* when this reaches 0, task using this mm are hogs */
>  	int map_count;				/* number of VMAs */
>  	struct rw_semaphore mmap_sem;
>  	spinlock_t page_table_lock;		/* Protects task page tables and mm->rss */
> @@ -486,6 +488,7 @@
>  #define INTERACTIVE_DELTA       3
>  #define MAX_SLEEP_AVG           (2*HZ)
>  #define STARVATION_LIMIT        (2*HZ)
> +#define THREAD_PENALTY		8
>
>  #define USER_PRIO(p)		((p)-MAX_RT_PRIO)
>  #define MAX_USER_PRIO		(USER_PRIO(MAX_PRIO))
> --- linux/kernel/sched.c.orig	Fri Feb  1 23:23:50 2002
> +++ linux/kernel/sched.c	Fri Feb  1 23:12:49 2002
> @@ -41,7 +41,7 @@
>   */
>  struct runqueue {
>  	spinlock_t lock;
> -	unsigned long nr_running, nr_switches, expired_timestamp;
> +	unsigned long nr_running, nr_switches, expired_timestamp, switch_timestamp;
>  	task_t *curr, *idle;
>  	prio_array_t *active, *expired, arrays[2];
>  	int prev_nr_running[NR_CPUS];
> @@ -614,6 +614,28 @@
> 		} else
>  			enqueue_task(p, rq->active);
>  	}
> +
> +	/*
> +	 * Here we stop a single interactive task or a group of tasks sharing
> +	 * a mm_struct (like java threads) from monopolizing the cpu.
> +	 */
> +	if (p->mm) {
> +		if (p->mm->mm_hogstamp[smp_processor_id()] < rq->switch_timestamp) {
> +			p->mm->mm_hogstamp[smp_processor_id()] = rq->switch_timestamp;
> +			p->mm->mm_hogslice[smp_processor_id()] =
> +				NICE_TO_TIMESLICE(p->__nice) * THREAD_PENALTY;
> +		}
> +		if (!p->mm->mm_hogslice[smp_processor_id()]) {
> +			dequeue_task(p, rq->active);
> +			p->need_resched = 1;
> +			p->prio--;
> +        		if (p->prio < MAX_RT_PRIO)
> +                		p->prio = MAX_RT_PRIO;
> +			enqueue_task(p, rq->expired);
> +		} else
> +			p->mm->mm_hogslice[smp_processor_id()]--;
> +	}
> +
>  out:
>  #if CONFIG_SMP
>  	if (!(now % BUSY_REBALANCE_TICK))
> @@ -676,6 +698,7 @@
>  		rq->expired = array;
>  		array = rq->active;
>  		rq->expired_timestamp = 0;
> +		rq->switch_timestamp = jiffies;
>  	}
>
>  	idx = sched_find_first_bit(array->bitmap);

* [PATCH] improving O(1)-J9 in heavily threaded situations
@ 2002-02-02 15:50 Ed Tomlinson
  2002-02-03 11:32 ` Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Ed Tomlinson @ 2002-02-02 15:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

The following patch improves the performance of O(1) when under load 
and threaded programs are running.  For this patch, threaded is taken
to mean processes sharing their vm (p->mm).  While this type of
programming is usually not optimal, it is very common (e.g. Java).

Under load, with threaded tasks running niced and a cpu bound task at
normal priority, we quickly notice the load average shooting up.  What
is happening is that all the threads are trying to run.  Due to the nature
of O(1), all the processes in the active array must run to get on to the
expired array.  This quickly leads to the higher priority cpu bound
task starving.  Running the threaded application at nice +19 can help
but does not solve the base issue.  On UP, I have observed loads of
between 20 and 40 when this is happening.

I tried various approaches to correcting this.  Neither reducing the
timeslices nor playing with the effective prio of the threaded tasks
helped much.  What does seem quite effective is to monitor the total
time _all_ the tasks sharing a vm use.  Once a threshold is exceeded
we move any tasks in the same group directly to the expired array,
temporarily increasing their effective prio.

With this change the cpu bound tasks can get over 80% of the cpu.

A couple of comments.  The patch applies its logic to all processes.
This limits the number of times interactive tasks can requeue.  With
this in place we can probably drop the expired_timestamp logic.

I see no reason that some of the tunables should not become part of
the standard scheduler, provided they do not add significant
overhead.  The THREAD_PENALTY tunable is only used in the tick
processing logic (and not in the common path) - keeping it tunable
should not introduce measurable overhead.

Patch against 2.4.17pre7

Comments and feedback welcome,
Ed Tomlinson

--- linux/include/linux/sched.h.orig	Fri Feb  1 23:19:44 2002
+++ linux/include/linux/sched.h	Fri Feb  1 23:16:34 2002
@@ -211,6 +211,8 @@
 	pgd_t * pgd;
 	atomic_t mm_users;			/* How many users with user space? */
 	atomic_t mm_count;			/* How many references to "struct mm_struct" (users count as 1) */
+	unsigned long mm_hogstamp[NR_CPUS];	/* timestamp to reset the hogslice */
+	unsigned long mm_hogslice[NR_CPUS];	/* when this reaches 0, task using this mm are hogs */
 	int map_count;				/* number of VMAs */
 	struct rw_semaphore mmap_sem;
 	spinlock_t page_table_lock;		/* Protects task page tables and mm->rss */
@@ -486,6 +488,7 @@
 #define INTERACTIVE_DELTA       3
 #define MAX_SLEEP_AVG           (2*HZ)
 #define STARVATION_LIMIT        (2*HZ)
+#define THREAD_PENALTY		8
 
 #define USER_PRIO(p)		((p)-MAX_RT_PRIO)
 #define MAX_USER_PRIO		(USER_PRIO(MAX_PRIO))
--- linux/kernel/sched.c.orig	Fri Feb  1 23:23:50 2002
+++ linux/kernel/sched.c	Fri Feb  1 23:12:49 2002
@@ -41,7 +41,7 @@
  */
 struct runqueue {
 	spinlock_t lock;
-	unsigned long nr_running, nr_switches, expired_timestamp;
+	unsigned long nr_running, nr_switches, expired_timestamp, switch_timestamp;
 	task_t *curr, *idle;
 	prio_array_t *active, *expired, arrays[2];
 	int prev_nr_running[NR_CPUS];
@@ -614,6 +614,28 @@
		} else
 			enqueue_task(p, rq->active);
 	}
+ 
+	/*
+	 * Here we stop a single interactive task or a group of tasks sharing
+	 * a mm_struct (like java threads) from monopolizing the cpu.
+	 */
+	if (p->mm) {
+		if (p->mm->mm_hogstamp[smp_processor_id()] < rq->switch_timestamp) {
+			p->mm->mm_hogstamp[smp_processor_id()] = rq->switch_timestamp;
+			p->mm->mm_hogslice[smp_processor_id()] =
+				NICE_TO_TIMESLICE(p->__nice) * THREAD_PENALTY;
+		}
+		if (!p->mm->mm_hogslice[smp_processor_id()]) {
+			dequeue_task(p, rq->active);
+			p->need_resched = 1;
+			p->prio--;
+        		if (p->prio < MAX_RT_PRIO)
+                		p->prio = MAX_RT_PRIO;
+			enqueue_task(p, rq->expired);
+		} else
+			p->mm->mm_hogslice[smp_processor_id()]--;
+	}
+		
 out:
 #if CONFIG_SMP
 	if (!(now % BUSY_REBALANCE_TICK))
@@ -676,6 +698,7 @@
 		rq->expired = array;
 		array = rq->active;
 		rq->expired_timestamp = 0;
+		rq->switch_timestamp = jiffies;
 	}
 
 	idx = sched_find_first_bit(array->bitmap);




end of thread, other threads:[~2002-02-10 15:02 UTC | newest]

Thread overview: 30+ messages
2002-02-04 12:32 [PATCH] improving O(1)-J9 in heavily threaded situations Ed Tomlinson
2002-02-04 12:33 ` Arjan van de Ven
     [not found] <Pine.LNX.4.33.0202040627001.22583-100000@localhost.localdomain>
2002-02-04  4:40 ` Ed Tomlinson
2002-02-04 12:30   ` Ingo Molnar
2002-02-04 12:36   ` Ingo Molnar
2002-02-04 10:47     ` Arjan van de Ven
2002-02-04 14:09       ` Ingo Molnar
2002-02-04 12:18     ` Ed Tomlinson
2002-02-04 20:01   ` Jussi Laako
2002-02-04 22:06     ` Ingo Molnar
2002-02-04 22:56       ` Jussi Laako
2002-02-05  0:56         ` Ingo Molnar
2002-02-04 23:32           ` Jussi Laako
2002-02-05  1:34             ` Ingo Molnar
2002-02-04 22:24   ` Bill Davidsen
  -- strict thread matches above, loose matches on Subject: below --
2002-02-03  6:00 Ed Tomlinson
2002-02-02 15:50 Ed Tomlinson
2002-02-03 11:32 ` Ingo Molnar
2002-02-03 15:46   ` Ed Tomlinson
2002-02-04  0:46     ` Ingo Molnar
2002-02-04 19:52       ` Jussi Laako
2002-02-05  0:59         ` Ingo Molnar
2002-02-04 23:36           ` Jussi Laako
2002-02-05  1:37             ` Ingo Molnar
2002-02-06  0:43               ` Jussi Laako
2002-02-06 12:37                 ` Ingo Molnar
2002-02-07  1:10                   ` Jussi Laako
2002-02-07 22:28                 ` Ingo Molnar
2002-02-10 15:00                   ` Jussi Laako
2002-02-05  1:41             ` Ingo Molnar
