public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [0/3] ARTiS, an asymmetric real-time scheduler
@ 2005-05-03 14:05 Eric Piel
  2005-05-03 14:07 ` [1/3] " Eric Piel
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Eric Piel @ 2005-05-03 14:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Philippe Marquet, Christophe Osuna, Julien Soula,
	Jean-Luc Dekeyser, paul.mckenney

Hello,

I'd like to present "ARTiS", a research project that we developed at the
LIFL, within the HYADES project. The goal was to provide a system which
can be hard real-time and at the same time permit high-performance
computing.

The main idea of our approach is to introduce asymmetry into an SMP
computer. While the multiple processors give us HPC, the asymmetry
provides real-time capabilities. In order to guarantee that the RT
tasks can always be scheduled with a low latency, we forbid _all_
other tasks from taking a lock or disabling preemption on some
specified CPUs. Basically, there is a set of "RT CPUs" and a set of
"Non-RT CPUs". When a normal task executing on an RT CPU happens to
call either inc_preempt_count() or __local_irq_save(), it is
automatically migrated to a Non-RT CPU, where the syscall is
continued. Still, the RT CPUs can execute computation from any task.
For a more detailed description, please have a look at the research
papers available at http://www.lifl.fr/west/artis .

We currently have an implementation for kernel 2.6.11, available on
the x86 and IA-64 architectures. It's a proof of concept with rough
edges; even after the code clean-up, I guess some people will scream
when looking at the code, as it's not the most beautiful ever written.
Nevertheless, it works. When ARTiS is activated, your system shouldn't
crash more often than usual, and tasks scheduled FIFO with priority 99
will get much better latencies than before.

The modifications are mostly located in the scheduler. The migration
from an RT CPU to a Non-RT CPU is done asynchronously (there is a
special queue which can be read and written without locks). There is
also a large part which tries to enhance the load balancer with
respect to RT tasks and to the asymmetry. Following this email come
the core patch as well as the patches for the architecture-dependent
parts (which are rather small).

It is worth noting that even though ARTiS was designed for SMP
systems, it also works on an SMT processor (although for now it seems
better not to select the SMT additions, to avoid crashes). It has been
tested on a HyperThreaded Pentium 4. ARTiS can thus be seen as a firm
(or hard, if you have hopes) real-time patch that works with any task
written for Linux.

The latency measurements conducted on a quad-processor IA-64 show
scheduling latencies always under 105µs, instead of 1200µs for a
preemptible kernel. On a Pentium 4 HT, latencies are always under
40µs instead of 14000µs. The measurement tool (realfeel-pfm) that we
used is available for both the x86 and IA-64 architectures. On the
latter there is an additional measure of the "kernel latency" which is
quite interesting too.

In addition, a load-balancer evaluator, lbµ, is also available. It's
completely architecture-independent, but don't expect too much from
it: it just lets you run the same load on the system repeatedly and
collect some statistics. Apart from that, everything needs to be done
by the user.

If you want to try it, have a look first at the documentation in
Documentation/artis.txt . There is also information on the research
project webpage http://www.lifl.fr/west/artis , and you can follow
the development at this page: https://gna.org/projects/artis .

We hope that some people will find this project interesting. Any
remark or comment is very welcome,
Eric Piel



^ permalink raw reply	[flat|nested] 4+ messages in thread

* [1/3] ARTiS, an asymmetric real-time scheduler
  2005-05-03 14:05 [0/3] ARTiS, an asymmetric real-time scheduler Eric Piel
@ 2005-05-03 14:07 ` Eric Piel
  2005-05-03 14:09 ` [2/3] ARTiS, an asymmetric real-time scheduler - x86 Eric Piel
  2005-05-03 14:10 ` [3/3] ARTiS, an asymmetric real-time scheduler - IA-64 Eric Piel
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Piel @ 2005-05-03 14:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Philippe Marquet, Christophe Osuna, Julien Soula,
	Jean-Luc Dekeyser, paul.mckenney

[-- Attachment #1: Type: text/plain, Size: 27 bytes --]

Here is the core of ARTiS.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: artis-2.6.11-20050502-noarch.patch --]
[-- Type: text/x-patch; name="artis-2.6.11-20050502-noarch.patch", Size: 85926 bytes --]

diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/Documentation/artis.txt 2.6.11-artis-cvs/Documentation/artis.txt
--- 2.6.11-pfm/Documentation/artis.txt	1970-01-01 01:00:00.000000000 +0100
+++ 2.6.11-artis-cvs/Documentation/artis.txt	2005-05-02 14:13:18.000000000 +0200
@@ -0,0 +1,243 @@
+ARTiS - an Asymmetric Real-Time Scheduler
+http://www.lifl.fr/west/artis/
+https://gna.org/projects/artis/
+
+Eric Piel <Eric.Piel@lifl.fr>
+Christophe Osuna <Christophe.Osuna@lifl.fr>
+Julien Soula <Julien.Soula@lifl.fr>
+Philippe Marquet <Philippe.Marquet@lifl.fr>
+
+Université des Sciences et Technologies de Lille
+Laboratoire d'Informatique Fondamentale de Lille
+Last update: May 02, 2005
+------------------------------------------------------------------------------
+
+
+Overview
+--------
+
+ARTiS is a patch to the Linux scheduler aimed at real-time SMP
+systems. By migrating tasks from real-time CPUs to non-real-time ones
+it can achieve very low latencies. On a 4-way IA-64 we target user
+latencies always under 300µs.
+
+Detailed descriptions of the implementation and measurements are available as
+research articles at: http://www.lifl.fr/west/artis .
+
+Installation
+------------
+
+ARTiS is provided as a set of Linux kernel patches. They apply against
+the vanilla Linux Kernel, available at http://www.kernel.org . Some
+versions might need to be applied against the Bull version of the
+Linux kernel. Use the command:
+ % patch -p1 < ../the-name-of-the.patch 
+
+ARTiS was designed to run on IA-64 and x86 SMP systems. You can
+compile it into the kernel by selecting:
+
+ [X] File systems -> Pseudo filesystems -> /proc file system support
+ [X] Processor types and features -> Preemptible Kernel
+ [X] Processor types and features ->   Compile the kernel with ARTiS
+ support
+
+Optionally you can select:
+
+ [X] Processor types and features ->     Compile the kernel with ARTiS
+ debugging support
+ [X] Processor types and features ->     Compile the kernel with ARTiS
+ accounting support
+
+You also need to disable the energy saving functions that could put the CPUs
+into a state with a long transition latency. For example, on IA-64 the
+pal_halt optimisation must be disabled; this can be done by appending "nohalt"
+to the kernel command line (in elilo.conf).
+
+System administration
+---------------------
+
+Before you can benefit from the ARTiS enhancements, you will have to
+boot a kernel including the patches on an SMP computer.
+
+You need to choose an appropriate CPU for the real-time program and
+set affinities accordingly via the /proc file system.
+
+The ARTiS interface provides three files:
+/proc/artis/active : contains 0 or 1 corresponding to the deactivation
+  or activation of ARTiS. It is writable.
+/proc/artis/maskrt : contains a hexadecimal number corresponding to the RT
+  CPU mask. You can change it only when ARTiS is disabled.
+/proc/artis/cpustat : gives various statistics about ARTiS, it is read-only.
+
+EXAMPLE:
+
+This is an x86 system with 4 processors. We will use the RTC interrupt
+for our real-time program, which will run on the 4th processor.
+
+   % cat /proc/interrupts
+              CPU0       CPU1       CPU2       CPU3
+     0:   16320035         14         10         11    IO-APIC-edge  timer
+     1:          8          1          0          0    IO-APIC-edge  i8042
+     2:          0          0          0          0          XT-PIC  cascade
+     8:          2          0          1    2459330    IO-APIC-edge  rtc
+    19:    9930157    1643458    1603777          0   IO-APIC-level  eth0
+    20:        133          0          1          0   IO-APIC-level  aic7xxx
+    21:         29          0          1          0   IO-APIC-level  aic7xxx
+    22:     134403          0          1          0   IO-APIC-level  aic7xxx
+   NMI:          0          0          0          0
+   LOC:   16318912   16318911   16318910   16318835
+   ERR:          0
+   MIS:          0
+   
+We need to redirect all the interrupts except the RTC one to the first
+3 processors (which means 7 as a CPU mask):
+
+   % for i in 1 2 19 20 21 22 ; do echo 7 > /proc/irq/$i/smp_affinity ; done
+
+Alternatively, you can use the change_all_irqs.sh script available on
+the ARTiS webpage in the affinity-utils package. Warning: on IA-64, we
+have noticed that some IRQs may refuse a change of mask. In that case,
+select an RT CPU other than the one where those IRQs are delivered.
+
+Here we want to keep IRQ 8 (RTC) for the real-time process on the 4th CPU:
+
+   % echo 8 > /proc/irq/8/smp_affinity
+
+We want the 4th processor to be set as real-time:
+
+   % echo 8 > /proc/artis/maskrt
+
+And of course we must activate ARTiS :-)
+
+   % echo 1 > /proc/artis/active
+
+The active state of ARTiS as well as the maskrt properties can also be
+set on the command line (in elilo.conf) via respectively
+"artis_active" and "artis_maskrt=".
+
+IMPORTANT NOTE (bug workaround): As of 20050502, there seems to be a
+latency problem on SMPs. This bug appears both on x86 and IA-64, even
+when the computer is purely idle and without ARTiS. The problem: there
+is always one (and only one) CPU where the latencies are much higher
+than anywhere else (10x or 100x higher). This CPU seems to be the
+same one that receives all the IRQs at boot, so you can detect it by
+looking at /proc/interrupts. Set this CPU as an NRT CPU, that's all.
+
+
+ARTiS /proc interface
+---------------------
+
+ARTiS activation is now performed after boot, via the /proc
+interface.
+
+The following enables ARTiS
+
+   % echo 1 > /proc/artis/active
+
+while the following disables ARTiS
+
+   % echo 0 > /proc/artis/active
+
+
+The file /proc/artis/cpustat contains one line of statistics per CPU
+(a maximum of 8 CPUs). Every line contains in turn four columns with
+the number of migrations initiated on this CPU and some timings useful
+for debugging.
+
+
+The file /proc/artis/maskrt identifies the RT CPUs.
+
+You can specify which CPUs are real-time by writing into this file
+the mask of the CPUs.
+
+For instance the following will set 4th processor as real-time:
+
+   % echo 8 > /proc/artis/maskrt
+
+The following will set 2nd, 3rd and 4th CPUs as real-time:
+
+   % echo e > /proc/artis/maskrt
+
+
+
+
+Real-Time programming with ARTiS
+--------------------------------
+
+You will need to add a few instructions to a program to make it
+real-time and to bind it to a CPU. Fortunately this is not a
+difficult task, and if your application is POSIX compliant, most of
+this is probably already done. A function like the following can be
+used for this:
+
+   #include <string.h>
+   #include <sched.h>
+
+   /* Set the current process to real-time priority.
+    * Return 0 on success, -1 on failure */
+   int set_realtime_priority(void)
+   {
+       struct sched_param schp;
+
+       memset(&schp, 0, sizeof(schp));
+       schp.sched_priority = sched_get_priority_max(SCHED_FIFO);
+
+       return sched_setscheduler(0, SCHED_FIFO, &schp);
+   }
+
+Setting the affinity of a process might not be correctly handled by
+glibc, so you may have to make the syscall directly. The following
+might help:
+
+   #include <unistd.h>
+
+   #if defined(__ia64__)
+   #define SYS_sched_setaffinity 1231
+   #define SYS_sched_getaffinity 1232
+   #elif defined(__i386__)
+   #define SYS_sched_setaffinity 241
+   #define SYS_sched_getaffinity 242
+   #else
+   #error Architecture not supported (only ia64 and i386 are)
+   #endif
+   
+   #define sched_setaffinity(arg1, arg2, arg3) \
+           syscall(SYS_sched_setaffinity, arg1, arg2, arg3)
+   #define sched_getaffinity(arg1, arg2, arg3) \
+           syscall(SYS_sched_getaffinity, arg1, arg2, arg3)
+   
+   /* Set the affinity of the current process.
+    * Return 0 on success, -1 on failure */
+   int set_affinity(unsigned long mask)
+   {
+       return sched_setaffinity(0, sizeof(mask), &mask);
+   }
+
+With respect to the example above, our program will need to make the
+following calls somewhere:
+
+   {
+       set_realtime_priority();
+       set_affinity(8);
+   }
+
+Due to the nature of ARTiS, the only constraint compared to a usual
+POSIX RT application is the order of these two function calls. The
+priority must always be set *before* the affinity (otherwise, the
+affinity cannot be guaranteed).
+
+Always keep in mind that if you want your tasks to obtain firm
+real-time latencies, they will have to be RT0. Such tasks have the
+maximum RT FIFO priority and are bound to one and only one RT CPU.
+The guarantee can be ensured only for one RT0 task per CPU. More RT0
+tasks per CPU are possible, but it is your job to check that they can
+all be executed at the same time and always respect their deadlines.
+
+In case you don't want to (or cannot) recompile your application to
+fit these specific requirements of ARTiS, you can still change it to RT0
+from the outside. schedtool is probably what you need, cf http://freequaos.host.sk/schedtool/.
+
+Another way to do this is to use the functions provided by libartis, a
+small package available on the ARTiS webpage. If you need a more
+elaborate example, you can have a look at realfeel-pfm, also available
+at the ARTiS webpage.
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/Documentation/kernel-parameters.txt 2.6.11-artis-cvs/Documentation/kernel-parameters.txt
--- 2.6.11-pfm/Documentation/kernel-parameters.txt	2005-03-02 08:38:34.000000000 +0100
+++ 2.6.11-artis-cvs/Documentation/kernel-parameters.txt	2005-03-25 19:47:46.000000000 +0100
@@ -220,6 +220,12 @@ running once the system is up.
 	arcrimi=	[HW,NET] ARCnet - "RIM I" (entirely mem-mapped) cards
 			Format: <io>,<irq>,<nodeID>
 
+	artis_active    [KNL] Activates ARTiS
+			See also Documentation/artis.txt
+
+	artis_maskrt=   [KNL] Set ARTiS real-time CPUs mask
+			See also Documentation/artis.txt
+
 	ataflop=	[HW,M68k]
 
 	atarimouse=	[HW,MOUSE] Atari Mouse
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/fs/proc/artis.c 2.6.11-artis-cvs/fs/proc/artis.c
--- 2.6.11-pfm/fs/proc/artis.c	1970-01-01 01:00:00.000000000 +0100
+++ 2.6.11-artis-cvs/fs/proc/artis.c	2005-05-02 14:22:14.000000000 +0200
@@ -0,0 +1,254 @@
+#include <linux/config.h>
+#include <linux/errno.h>
+#include <linux/proc_fs.h>
+#include <asm/uaccess.h>
+#include <linux/ctype.h>
+#include <linux/kernel_stat.h>
+#include <linux/artis.h>
+#include <linux/sched.h>
+
+
+/* 
+ * tools functions 
+ */
+
+#define INT_10_SIZE (sizeof(int)*8/3+2/* 1 for the rounding and 1 for the sign */)
+#define LONG_10_SIZE (sizeof(long)*8/3+2/* 1 for the rounding and 1 for the sign */)
+
+int
+artis_copy_from_user_ltrim(const char *ubuf, int ulen, char *kbuf, int klen) {
+	int kl = 0, ul = 0;
+	kbuf[0] = '\0';
+	
+	while ((ul < ulen) && (kl < klen)) {
+		int l, i;
+		if (ulen-ul > klen-kl)
+			l = klen-kl;
+		else
+			l = ulen-ul;
+		copy_from_user(kbuf+kl, ubuf+ul, l);
+		ul += l;
+		kl += l;
+		kbuf[kl] = '\0';
+		for (i = 0; isspace(kbuf[i]); i++)
+			;
+		if (i) {
+			int j;
+			for (j = i; (kbuf[j-i] = kbuf[j]); j++)
+				;
+			kl -= i;
+		}
+	}
+	return ul;
+}
+
+#ifdef CONFIG_ARTIS
+
+static struct proc_dir_entry *proc_artis = NULL;
+
+/*
+ * functions for /proc/artis
+ */
+
+int 
+artis_proc_read_active(char* page, char** start, off_t off, int count, int* eof, void* data) {
+	int len = 0, r;
+
+	/* 0 or neg if inactive, number is number of task in RT-Q */
+	if (artis_info.active)
+		r = 1+atomic_read(&(artis_info.size));
+	else
+		r = -atomic_read(&(artis_info.size));
+	len = snprintf(page+off, count, "%d\n", r);
+	if (len > count)
+		return -EFAULT;
+	return len;
+}
+
+int 
+artis_proc_write_active(struct file *file, const char *buffer, unsigned long count, void *data) {
+	int len;
+#define MAXBUF INT_10_SIZE+100
+	char kbuf[MAXBUF+1];
+	char *endp;
+	int val;
+
+	len = artis_copy_from_user_ltrim(buffer, count, kbuf, MAXBUF);
+	val = simple_strtol(kbuf, &endp, 0);
+	if (*endp && !isspace(*endp))
+		return -EINVAL;
+
+	spin_lock(&artis_info.lock);
+	artis_info.active = (val > 0);
+	spin_unlock(&artis_info.lock);
+
+	return len;
+#undef MAXBUF
+}
+
+int 
+artis_proc_read_maskrt(char* page, char** start, off_t off, int count, int* eof, void* data) {
+	int len;
+
+	spin_lock(&artis_info.lock);
+	
+	len = cpumask_scnprintf(page, count - 2, artis_info.cpus_rt);
+	
+	if (count - len < 2) {
+		len = -EFAULT;
+		goto ret;
+	}
+	len += sprintf(page + len, "\n");
+ret:
+	spin_unlock(&artis_info.lock);
+	return len;
+}
+
+/* only works for less than 8*sizeof(long) CPUs */
+int 
+artis_proc_write_maskrt(struct file *file, const char *buffer, unsigned long count, void *data) {
+	int len = count, err;
+	cpumask_t mask;
+
+	spin_lock(&artis_info.lock);
+	if (artis_info.active) {
+		len = -EPERM;
+		goto ret ;
+	}
+
+	err = cpumask_parse(buffer, count, mask);
+	if (err) {
+		len = -EINVAL;
+		goto ret;
+	}
+
+	/* flush pending migrations */
+	/* XXX OK, this active wait is not very pretty, but this should not
+	 * occur frequently, so who cares? (not me, obviously) */
+	while (atomic_read(&artis_info.size))
+		;
+
+	if (artis_reinit(mask)<0)
+		len = -EINVAL;
+ret:
+	spin_unlock(&artis_info.lock);
+	return len;
+#undef MAXBUF
+}
+
+
+#ifdef CONFIG_ARTIS_STAT
+int
+artis_proc_read_cpustat(char* page, char** start, off_t off, int count, int* eof, void* data) { 
+	int len, i;
+	artis_per_cpu_info_t *artis_cpu = NULL;
+
+	spin_lock(&artis_info.lock);
+	len = 0;
+	for(i = 0; i < NR_CPUS; i++) {
+		artis_cpu = cpu_artis(i);
+		len += snprintf(page+off+len, count-len,
+			"cpu%d %lu %lu %lu %lu %lu\n",
+			i,
+			artis_cpu->nb_migration,
+			artis_cpu->migration_delta[0],
+			artis_cpu->migration_delta[1],
+			artis_cpu->migration_delta[2],
+			kstat_cpu(i).cpustat.rt);
+		if (len > count) {
+			len = -EFAULT;
+			goto ret;
+		}
+	}
+ret:
+	spin_unlock(&artis_info.lock);
+	return len;
+}
+#endif
+
+/*
+ * Create /proc/artis struct
+ */
+int
+artis_proc_init(void) {
+	struct proc_dir_entry *proc_active, *proc_maskrt;
+#ifdef CONFIG_ARTIS_STAT
+	struct proc_dir_entry *proc_cpustat;
+#endif
+	if (!(proc_artis = proc_mkdir("artis", NULL))) {
+		printk("ARTIS error: create /proc/artis\n");
+		goto err;
+	}
+	if (!(proc_active = create_proc_entry("active", 0644, proc_artis))) {
+		printk("ARTIS error: create /proc/artis/active\n");
+		goto err1;
+	}
+	proc_active->read_proc = artis_proc_read_active;
+	proc_active->write_proc = artis_proc_write_active;
+	if (!(proc_maskrt = create_proc_entry("maskrt", 0644, proc_artis))) {
+		printk("ARTIS error: create /proc/artis/mask\n");
+		goto err2;
+	}
+	proc_maskrt->read_proc = artis_proc_read_maskrt;
+	proc_maskrt->write_proc = artis_proc_write_maskrt;
+#ifdef CONFIG_ARTIS_STAT
+	if (!(proc_cpustat = create_proc_read_entry("cpustat", 0444, proc_artis, artis_proc_read_cpustat, NULL))) {
+		printk("ARTIS error: create /proc/artis/cpustat\n");
+		goto err3;
+	}
+#endif
+	return 0;
+
+#ifdef CONFIG_ARTIS_STAT
+err3:
+	remove_proc_entry("cpustat", proc_artis);
+#endif
+	remove_proc_entry("maskrt", proc_artis);
+err2:
+	remove_proc_entry("active", proc_artis);
+err1:
+	remove_proc_entry("artis", NULL);
+err:
+	return -1;
+}
+
+int
+artis_proc_reset(void) {
+	remove_proc_entry("active", proc_artis);
+	remove_proc_entry("maskrt", proc_artis);
+	remove_proc_entry("cpustat", proc_artis);
+	remove_proc_entry("artis", NULL);
+	return 0;
+}
+
+/*
+ * functions for the /proc/<pid>/artis
+ */
+
+int 
+proc_pid_artis(struct task_struct *task, char *buffer)
+{
+	artis_task_status_t *artis_task;
+	artis_per_cpu_info_t *artis_cpu = NULL;
+	int len;
+
+	artis_migration_disable();
+	preempt_disable();
+	artis_task = &task->artis_status;
+	artis_cpu = cpu_artis(task->thread_info->cpu);
+	len = sprintf(buffer, 
+			"%lu %lx %lu",
+			artis_task->nb_migration, 
+			artis_flag(task),
+			artis_cpu->nb_migration
+			);
+	len += sprintf(buffer+len, " %llu %llu",
+		        artis_task->last_attempt,
+		        artis_task->attempt_period);
+	len += sprintf(buffer+len, "\n");
+	artis_migration_enable();
+	preempt_enable();
+	return len;
+}
+
+#endif
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/fs/proc/base.c 2.6.11-artis-cvs/fs/proc/base.c
--- 2.6.11-pfm/fs/proc/base.c	2005-03-02 08:38:12.000000000 +0100
+++ 2.6.11-artis-cvs/fs/proc/base.c	2005-03-25 19:47:46.000000000 +0100
@@ -32,6 +32,7 @@
 #include <linux/mount.h>
 #include <linux/security.h>
 #include <linux/ptrace.h>
+#include <linux/artis.h>
 #include "internal.h"
 
 /*
@@ -74,6 +75,9 @@ enum pid_directory_inos {
 #ifdef CONFIG_AUDITSYSCALL
 	PROC_TGID_LOGINUID,
 #endif
+#ifdef CONFIG_ARTIS
+	PROC_TGID_ARTIS,
+#endif
 	PROC_TGID_FD_DIR,
 	PROC_TGID_OOM_SCORE,
 	PROC_TGID_OOM_ADJUST,
@@ -105,6 +109,9 @@ enum pid_directory_inos {
 #ifdef CONFIG_AUDITSYSCALL
 	PROC_TID_LOGINUID,
 #endif
+#ifdef CONFIG_ARTIS
+	PROC_TID_ARTIS,
+#endif
 	PROC_TID_FD_DIR = 0x8000,	/* 0x8000-0xffff */
 	PROC_TID_OOM_SCORE,
 	PROC_TID_OOM_ADJUST,
@@ -148,6 +155,9 @@ static struct pid_entry tgid_base_stuff[
 #ifdef CONFIG_AUDITSYSCALL
 	E(PROC_TGID_LOGINUID, "loginuid", S_IFREG|S_IWUSR|S_IRUGO),
 #endif
+#ifdef CONFIG_ARTIS
+	E(PROC_TGID_ARTIS,     "artis",   S_IFREG|S_IRUGO),
+#endif
 	{0,0,NULL,0}
 };
 static struct pid_entry tid_base_stuff[] = {
@@ -178,6 +188,9 @@ static struct pid_entry tid_base_stuff[]
 #ifdef CONFIG_AUDITSYSCALL
 	E(PROC_TID_LOGINUID, "loginuid", S_IFREG|S_IWUSR|S_IRUGO),
 #endif
+#ifdef CONFIG_ARTIS
+	E(PROC_TID_ARTIS,     "artis",   S_IFREG|S_IRUGO),
+#endif
 	{0,0,NULL,0}
 };
 
@@ -1476,6 +1489,13 @@ static struct dentry *proc_pident_lookup
 			ei->op.proc_read = proc_pid_wchan;
 			break;
 #endif
+#ifdef CONFIG_ARTIS
+		case PROC_TID_ARTIS:
+		case PROC_TGID_ARTIS:
+			inode->i_fop = &proc_info_file_operations;
+			ei->op.proc_read = proc_pid_artis;
+			break;
+#endif
 #ifdef CONFIG_SCHEDSTATS
 		case PROC_TID_SCHEDSTAT:
 		case PROC_TGID_SCHEDSTAT:
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/fs/proc/Makefile 2.6.11-artis-cvs/fs/proc/Makefile
--- 2.6.11-pfm/fs/proc/Makefile	2005-03-02 08:37:47.000000000 +0100
+++ 2.6.11-artis-cvs/fs/proc/Makefile	2005-03-25 19:47:46.000000000 +0100
@@ -8,7 +8,7 @@ proc-y			:= nommu.o task_nommu.o
 proc-$(CONFIG_MMU)	:= mmu.o task_mmu.o
 
 proc-y       += inode.o root.o base.o generic.o array.o \
-		kmsg.o proc_tty.o proc_misc.o
+		kmsg.o proc_tty.o proc_misc.o artis.o
 
 proc-$(CONFIG_PROC_KCORE)	+= kcore.o
 proc-$(CONFIG_PROC_DEVICETREE)	+= proc_devtree.o
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/include/linux/artis.h 2.6.11-artis-cvs/include/linux/artis.h
--- 2.6.11-pfm/include/linux/artis.h	1970-01-01 01:00:00.000000000 +0100
+++ 2.6.11-artis-cvs/include/linux/artis.h	2005-04-22 18:08:19.000000000 +0200
@@ -0,0 +1,273 @@
+/*
+ * ARTiS support
+ *
+ */
+
+#ifndef _ARTIS_ARTIS_H
+#define	_ARTIS_ARTIS_H
+
+#include <linux/config.h>
+#include <linux/spinlock.h>
+#include <linux/cpumask.h>
+#include <linux/percpu.h>
+#include <linux/init.h>
+
+#include <linux/artis-macros.h>
+
+#ifdef CONFIG_ARTIS
+
+#if defined(CONFIG_ARTIS_STAT) && defined(CONFIG_LTT)
+extern int artis_ltt_req1;
+extern int artis_ltt_req2;
+extern int artis_ltt_complete;
+extern int artis_ltt_fetch;
+#endif
+
+/* There is a migration queue between each RT and NRT CPU. The queue used to
+ * be a FIFO, but since the introduction of the optimized
+ * kernel/sched.c:artis_valois_pull_all() it is no longer the case: the
+ * very last task appened in the migration queue will be the first pulled
+ * from the migration queue. Still, this is not a big issue.
+ */
+
+
+/* Node of the migration queue */
+typedef struct artis_fifo_node {
+  struct task_struct *task;
+  struct artis_fifo_node *next;
+} artis_fifo_node_t;
+
+#define ARTIS_FIFO_NODE_INIT {    \
+        .task = NULL,        \
+        .next = NULL,        \
+}
+#define SET_ARTIS_FIFO_NODE_INIT(v) v = ARTIS_FIFO_NODE_INIT,
+
+extern kmem_cache_t *artis_fifo_node_cachep;
+#define alloc_artis_fifo_node(task) \
+	kmem_cache_alloc(artis_fifo_node_cachep, GFP_KERNEL)
+#define free_artis_fifo_node(node) \
+	kmem_cache_free(artis_fifo_node_cachep, (node))
+
+#ifdef CONFIG_ARTIS_DEBUG
+#define ARTIS_BT_SIZE 7
+typedef struct artis_stack {
+	unsigned long time;
+	void *bt[ARTIS_BT_SIZE];
+} artis_stack_t;
+
+#define ARTIS_STACK_INIT {	\
+	.time = 0,		\
+}
+#define SET_ARTIS_STACK_INIT(v) v = ARTIS_STACK_INIT,
+#endif
+
+typedef struct artis_local_task_status {
+	struct task_struct *next;	/* next task in the linked list */
+	cpumask_t cpus_allowed_bak;	/* backup of task cpus_allowed */
+	long cpu;			/* cpu from */
+#ifdef CONFIG_ARTIS_STAT
+        /* some timings of code of migration:
+         * _ migration_delta[0]: artis_request_for_migration() call
+         * _ migration_delta[1]: before the schedule() that will initiate the migration
+         * _ migration_delta[2]: artis_complete_migration() call
+         * _ migration_delta[3]: before the schedule() on the NRT CPU
+         */
+	unsigned long migration_delta[4];
+#endif
+#ifdef CONFIG_ARTIS_DEBUG
+	unsigned long complete_by_at;	/* time when being completed */
+	struct task_struct *complete_by; /* task that is doing the complete (not the completed task) */
+	artis_stack_t request;		/* info on the task doing the request */
+#endif
+} artis_local_task_status_t; 
+
+#if defined(CONFIG_ARTIS_STAT) && defined(CONFIG_ARTIS_DEBUG)
+#define ARTIS_LOCAL_TASK_STATUS_INIT {	\
+	.migration_delta = {0,0,0,0},	\
+	.complete_by_at = 0,		\
+	.complete_by = NULL,		\
+	ARTIS_STACK_INIT,		\
+}
+
+#elif defined(CONFIG_ARTIS_STAT)
+#define ARTIS_LOCAL_TASK_STATUS_INIT {	\
+	.migration_delta = {0,0,0,0},	\
+}
+
+#elif defined(CONFIG_ARTIS_DEBUG)
+#define ARTIS_LOCAL_TASK_STATUS_INIT {	\
+	.complete_by_at = 0,		\
+	.complete_by = NULL,		\
+	ARTIS_STACK_INIT,		\
+}
+
+#else
+#define ARTIS_LOCAL_TASK_STATUS_INIT {	\
+}
+#endif
+
+#define SET_ARTIS_LOCAL_TASK_STATUS_INIT(v) v = ARTIS_LOCAL_TASK_STATUS_INIT,
+
+/* Structure for ARTiS migration. This is added to task_struct. */
+typedef struct artis_task_status
+{
+	unsigned long flag;		/* step of migration */
+	unsigned long nb_migration;
+	int migration_count;		/* ARTiS lock depth */
+	artis_fifo_node_t *fifo_node;   /* node of the migration queue containing the task */
+	unsigned long long last_attempt;	/* last attempt to migrate for the task */
+	unsigned long long attempt_period;	/* time-weighted average period between migration attempts */
+	artis_local_task_status_t *local_status;
+#ifdef CONFIG_ARTIS_DEBUG
+	unsigned long complete_to_at;	/* time when completing */
+	struct task_struct *complete_to;/* task which is being completed */
+	artis_stack_t complete;		/* info on the task doing the complete (not the completed task) */
+#endif
+} artis_task_status_t;
+
+#ifdef CONFIG_ARTIS_DEBUG
+#define ARTIS_TASK_STATUS_INIT {	\
+	.flag = 0,			\
+	.nb_migration = 0,		\
+	.migration_count = 0,		\
+        .fifo_node = NULL,              \
+	.local_status = NULL,		\
+	.complete_to_at = 0,		\
+	.complete_to = NULL,		\
+      ARTIS_STACK_INIT,			\
+}
+#else
+
+#define ARTIS_TASK_STATUS_INIT {	\
+	.flag = 0,			\
+	.nb_migration = 0,		\
+	.migration_count = 0,		\
+	.last_attempt = 0,		\
+	.attempt_period = 0,		\
+	.fifo_node = NULL,              \
+	.local_status = NULL,		\
+}
+#endif
+
+#define SET_ARTIS_TASK_STATUS_INIT(v) v = ARTIS_TASK_STATUS_INIT,
+
+/* 
+ * Migration queue
+ * There will be one between each RT and NRT processor, or if
+ * load-balancing is activated, there are 2 between each pair of processors.
+ */
+typedef struct artis_migration_queue
+{
+	atomic_t size;				/* size of the linked list */
+	artis_fifo_node_t *head, *tail;		/* ends of the list */
+} artis_migration_queue_t;
+
+#define ARTIS_MIGRATION_QUEUE_INIT { 		\
+	.size = ATOMIC_INIT(0),		       	\
+	.head = NULL,                           \
+	.tail = NULL,                           \
+}
+#define SET_ARTIS_MIGRATION_QUEUE_INIT(v) v = ARTIS_MIGRATION_QUEUE_INIT,
+
+typedef struct artis_per_cpu_info
+{
+	artis_migration_queue_t ***queues; /* pointer to shared [RT][NRT] queues matrix */
+	unsigned int cpu_type;		/* ARTIS_RT_CPU or ARTIS_NRT_CPU */
+	unsigned long nb_migration;	/* number of migrations initiated on this (RT) CPU */
+	atomic_t fetch_size;		/* number of tasks waiting for a fetch on this (NRT) CPU */
+#ifdef CONFIG_ARTIS_STAT
+	/* Biggest timings on this CPU: 
+	 * _ migration_delta[0]: time between artis_request_for_migration()
+	 * 	and schedule()
+	 * _ migration_delta[1]: time between schedule() and
+	 * 	artis_complete_migration()
+	 * _ migration_delta[2]: time between artis_complete_migration() and
+	 * 	schedule()
+	 */
+	unsigned long migration_delta[3];
+#endif
+} artis_per_cpu_info_t;
+
+#if defined(CONFIG_ARTIS_STAT) && defined(CONFIG_ARTIS_DEBUG)
+#define ARTIS_PER_CPU_INFO_INIT {	\
+	.queues = NULL, 		\
+	.cpu_type = 0,			\
+	.nb_migration = 0,		\
+	.fetch_size = ATOMIC_INIT(0),	\
+	.migration_delta = {0,0,0},	\
+}
+
+#elif defined(CONFIG_ARTIS_STAT)
+#define ARTIS_PER_CPU_INFO_INIT {	\
+	.queues = NULL, 		\
+	.cpu_type = 0,			\
+	.nb_migration = 0,		\
+	.fetch_size = ATOMIC_INIT(0),	\
+	.migration_delta = {0,0,0},	\
+}
+
+#elif defined(CONFIG_ARTIS_DEBUG)
+#define ARTIS_PER_CPU_INFO_INIT {	\
+	.queues = NULL, 		\
+	.cpu_type = 0,			\
+	.nb_migration = 0,		\
+	.fetch_size = ATOMIC_INIT(0),	\
+}
+
+#else
+#define ARTIS_PER_CPU_INFO_INIT {	\
+	.queues = NULL, 		\
+	.cpu_type = 0,			\
+	.nb_migration = 0,		\
+	.fetch_size = ATOMIC_INIT(0),	\
+}
+
+#endif
+
+#define SET_ARTIS_PER_CPU_INFO_INIT(v) v = ARTIS_PER_CPU_INFO_INIT,
+DECLARE_PER_CPU(artis_per_cpu_info_t, artis_percpu);
+
+typedef struct artis_info {
+	int active;		/* is ARTiS active? */
+	cpumask_t cpus_rt;	/* mask of RT CPUs */
+        cpumask_t cpus_nrt;	/* mask of NRT CPUs */
+	atomic_t size;		/* total number of tasks in all the fifo's */
+	spinlock_t lock;
+} artis_info_t;
+
+#define ARTIS_INFO_INIT {		\
+	.active = 0,			\
+	.cpus_rt = CPU_MASK_NONE,	\
+	.cpus_nrt = CPU_MASK_ALL,	\
+	.size = ATOMIC_INIT(0),		\
+	.lock = SPIN_LOCK_UNLOCKED,	\
+}
+#define SET_ARTIS_INFO_INIT(v) v = ARTIS_INFO_INIT,
+
+extern artis_info_t artis_info;
+
+void artis_request_for_migration(void);
+int artis_complete_migration(struct task_struct *task);
+void artis_fetch_from_migration(int);
+void artis_check_dummyfifo(int rt);
+int proc_pid_artis(struct task_struct *task, char *buffer);
+int artis_proc_init(void);
+int artis_proc_reset(void);
+int artis_reinit(cpumask_t);
+
+#ifdef CONFIG_ARTIS_DEBUG
+void artis_put_trace(void **, struct task_struct *, unsigned long *);
+#endif
+
+#else  /* CONFIG_ARTIS */
+
+#define SET_ARTIS_LOCAL_TASK_STATUS_INIT(v)
+#define SET_ARTIS_STACK_INIT(v)
+#define SET_ARTIS_TASK_STATUS_INIT(v)
+#define SET_ARTIS_MIGRATION_QUEUE_INIT(v)
+#define SET_ARTIS_PER_CPU_INFO_INIT(v)
+
+#endif /* CONFIG_ARTIS */
+
+#endif
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/include/linux/artis-macros.h 2.6.11-artis-cvs/include/linux/artis-macros.h
--- 2.6.11-pfm/include/linux/artis-macros.h	1970-01-01 01:00:00.000000000 +0100
+++ 2.6.11-artis-cvs/include/linux/artis-macros.h	2005-05-02 14:22:14.000000000 +0200
@@ -0,0 +1,136 @@
+/*
+ * ARTiS macros support
+ *
+ *
+ */
+#ifndef _ARTIS_ARTIS_MACRO_H
+#define	_ARTIS_ARTIS_MACRO_H
+
+#define ARTIS_RT_CPU  1
+#define ARTIS_NRT_CPU 0
+
+#define ARTIS_DEBUG_SIZE 512
+
+#define ARTIS_STATUS_REQUEST1	(0x01) // beginning of request function
+#define ARTIS_STATUS_REQUEST2	(0x02) // just after the de-activate
+#define ARTIS_STATUS_COMPLETE1	(0x04) // just before the enqueueing
+#define ARTIS_STATUS_COMPLETE2	(0x08) // just after the enqueueing
+#define ARTIS_STATUS_FETCH	(0x10) // after the dequeueing
+#define ARTIS_STATUS_DUMMY1	(0x20) // in dummy, before the printk
+#define ARTIS_STATUS_DUMMY2	(0x40) // in dummy, after the printk
+#define ARTIS_STATUS_LB		(0x80) // a task is being load-balanced
+
+#define cpu_artis(cpu)          (&per_cpu(artis_percpu, (cpu)))
+#define this_artis()            (&__get_cpu_var(artis_percpu))
+
+#define artis_is_this_cpu_rt() (artis_is_cpu_rt(smp_processor_id()))
+#define artis_is_this_cpu_nrt() (artis_is_cpu_nrt(smp_processor_id()))
+#define artis_test_this_migrating() (artis_test_migrating(current))
+#define artis_this_flag() (artis_flag(current))
+#define artis_this_local() (artis_local(current))
+
+#ifdef CONFIG_ARTIS
+
+#ifdef CONFIG_ARTIS_DEBUG
+#define ARTIS_BUG(c, x...) 	\
+do { 				\
+	if (unlikely(c)) 	\
+		artis_bug(__LINE__, __FILE__, 0UL, ##x); \
+} while (0)
+void artis_bug(int, char *, unsigned long, ...);
+#else
+#define ARTIS_BUG(c, x...) do { } while (0)
+#endif
+
+void artis_try_to_migrate(void);
+void function_artis_migration_disable(void);
+void function_artis_migration_enable(void);
+
+#define artis_is_cpu_rt(id) ((cpu_artis(id))->cpu_type == ARTIS_RT_CPU)
+#define artis_is_cpu_nrt(id) ((cpu_artis(id))->cpu_type == ARTIS_NRT_CPU)
+#define artis_test_migrating(p) (artis_flag(p) & ARTIS_STATUS_REQUEST2)
+#define artis_flag(p) ((p)->artis_status.flag)
+#define artis_local(p) ((p)->artis_status.local_status)
+#define artis_migration_count() ((current)->artis_status.migration_count)
+
+#define artis_isrt0() ((current)->policy==SCHED_FIFO && \
+                      (current)->prio==0)
+
+/* preempt_disable() triggers artis_try_to_migrate(), so this forces one attempt */
+#define artis_force_migration() 	\
+do { 					\
+	preempt_disable(); 		\
+	preempt_enable_no_resched(); 	\
+} while(0)
+
+/*
+ * migration not possible if:
+ * 	idle or
+ * 	migration off or
+ * 	preempt already off or
+ * 	RT0 or
+ * 	in interrupt handler (hard or soft) or
+ * 	irqs already off (shouldn't happen, since we should have migrated earlier, but it does happen!) or
+ * 	CPU is not RT
+ */
+
+/*
+ * These tests should be done before the migration stats because they are
+ * independent of the state of the task.
+ */
+#define _artis_ismigrable_nostat()	\
+	(artis_info.active > 0 &&	\
+	preempt_count() <= 1 &&		\
+	!in_interrupt() &&		\
+	!artis_migration_count() &&	\
+	!irqs_disabled() &&		\
+	current->pid > 1	)
+/*
+ * These tests should be done after the stat computation because the task is
+ * endangering the RT properties (we just allow it anyway).
+ */
+#define _artis_ismigrable_stat()	\
+	(artis_is_this_cpu_rt() &&	\
+	!artis_isrt0()		)
+
+#define inc_artis_migration_count() 	\
+do { 					\
+	artis_migration_count()++; 	\
+} while (0)
+
+#define dec_artis_migration_count() 	\
+do { 					\
+	artis_migration_count()--; 	\
+} while (0)
+
+#define artis_migration_disable() 	\
+do { 					\
+	inc_artis_migration_count(); 	\
+	barrier(); 			\
+} while (0)
+
+#define artis_migration_enable() 	\
+do { 					\
+	barrier(); 			\
+	dec_artis_migration_count(); 	\
+} while (0)
+
+#else
+
+#define ARTIS_BUG(c, x...) do { } while (0)
+
+#define artis_is_cpu_rt(id) (0)
+#define artis_is_cpu_nrt(id) (0)
+#define artis_isrt0() (0)
+#define artis_test_migrating(p) (0)
+#define artis_force_migration() do { } while(0)
+#define _artis_ismigrable() (0)
+#define _artis_ismigrable_nostat() (0)
+#define _artis_ismigrable_stat() (0)
+#define artis_migration_disable() do { } while(0)
+#define artis_migration_enable() do { } while(0)
+
+#endif
+
+#endif
+
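
The _artis_ismigrable_nostat() test above can be sketched as a plain function, with the per-task state reduced to a struct (all names here are illustrative, not the kernel's):

```c
/* Plain-C sketch of the _artis_ismigrable_nostat() checks. */
struct task_model {
	int pid;
	int preempt_count;      /* 1 == only our own inc_preempt_count() */
	int in_interrupt;
	int irqs_disabled;
	int migration_count;    /* artis_migration_disable() nesting depth */
};

static int artis_active = 1;

static int ismigrable_nostat(const struct task_model *t)
{
	return artis_active > 0 &&
	       t->preempt_count <= 1 &&  /* no earlier preempt_disable() */
	       !t->in_interrupt &&
	       !t->migration_count &&
	       !t->irqs_disabled &&
	       t->pid > 1;               /* never migrate idle or init */
}
```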
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/include/linux/init_task.h 2.6.11-artis-cvs/include/linux/init_task.h
--- 2.6.11-pfm/include/linux/init_task.h	2005-03-02 08:37:48.000000000 +0100
+++ 2.6.11-artis-cvs/include/linux/init_task.h	2005-03-25 19:47:46.000000000 +0100
@@ -112,6 +112,7 @@ extern struct group_info init_groups;
 	.proc_lock	= SPIN_LOCK_UNLOCKED,				\
 	.switch_lock	= SPIN_LOCK_UNLOCKED,				\
 	.journal_info	= NULL,						\
+	SET_ARTIS_TASK_STATUS_INIT(.artis_status)			\
 }
 
 
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/include/linux/kernel_stat.h 2.6.11-artis-cvs/include/linux/kernel_stat.h
--- 2.6.11-pfm/include/linux/kernel_stat.h	2005-03-02 08:38:26.000000000 +0100
+++ 2.6.11-artis-cvs/include/linux/kernel_stat.h	2005-04-22 18:33:41.000000000 +0200
@@ -7,6 +7,7 @@
 #include <linux/threads.h>
 #include <linux/percpu.h>
 #include <asm/cputime.h>
+#include <linux/artis.h>
 
 /*
  * 'kernel_stat.h' contains the definitions needed for doing
@@ -23,6 +24,9 @@ struct cpu_usage_stat {
 	cputime64_t idle;
 	cputime64_t iowait;
 	cputime64_t steal;
+#ifdef CONFIG_ARTIS
+	u64 rt;
+#endif
 };
 
 struct kernel_stat {
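
The `rt` field added above feeds get_rt_ratio() in the kernel/sched.c part of this patch. The formula can be modelled standalone; the constants are copied from the patch, the wrapper function is ours:

```c
/* Standalone model of get_rt_ratio(); constants copied from the patch. */
#define FSHIFT_RT	15
#define FIXED_RT_1	(1 << FSHIFT_RT)   /* 1.0 as fixed-point */
#define EXP_RT		32703              /* e^(-1/500) as fixed-point */
#define MAX_RT		(FIXED_RT_1 - (FIXED_RT_1 / (FIXED_RT_1 - EXP_RT)))

/* rt is the smoothed fraction of recent ticks spent in RT tasks,
 * in fixed point between 0 and MAX_RT. */
static unsigned int rt_ratio_model(unsigned long long rt)
{
	unsigned long long nrt = MAX_RT - rt;

	if (nrt == 0)
		nrt = 1;    /* only RT tasks ran: avoid division by zero */
	return (unsigned int)((nrt + rt) / nrt);
}
```

So a CPU that spent no time in RT tasks gets ratio 1 (its load is counted as-is), and the ratio grows as RT tasks monopolize the CPU, inflating its apparent load for the balancer.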
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/include/linux/preempt.h 2.6.11-artis-cvs/include/linux/preempt.h
--- 2.6.11-pfm/include/linux/preempt.h	2005-03-02 08:37:50.000000000 +0100
+++ 2.6.11-artis-cvs/include/linux/preempt.h	2005-03-25 19:47:46.000000000 +0100
@@ -8,6 +8,7 @@
 
 #include <linux/config.h>
 #include <linux/linkage.h>
+#include <linux/artis-macros.h>
 
 #ifdef CONFIG_DEBUG_PREEMPT
   extern void fastcall add_preempt_count(int val);
@@ -17,7 +18,15 @@
 # define sub_preempt_count(val)	do { preempt_count() -= (val); } while (0)
 #endif
 
+#ifdef CONFIG_ARTIS
+#define inc_preempt_count() 	\
+do { 				\
+	add_preempt_count(1);	\
+	artis_try_to_migrate();	\
+} while (0)
+#else
 #define inc_preempt_count() add_preempt_count(1)
+#endif
 #define dec_preempt_count() sub_preempt_count(1)
 
 #define preempt_count()	(current_thread_info()->preempt_count)
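
The hooked inc_preempt_count() above is the heart of the mechanism: bumping the count doubles as a migration checkpoint. A userspace model of that flow, where migration_attempts stands in for artis_try_to_migrate() (all names are illustrative):

```c
/* Userspace model of the inc_preempt_count() migration hook. */
static int model_preempt_count;
static int migration_attempts;

static void model_try_to_migrate(void)
{
	/* as in _artis_ismigrable_nostat(): only the outermost
	 * preempt_disable() is a migration point */
	if (model_preempt_count == 1)
		migration_attempts++;
}

static void model_inc_preempt_count(void)
{
	model_preempt_count++;
	model_try_to_migrate();
}

static void model_dec_preempt_count(void)
{
	model_preempt_count--;
}
```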
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/include/linux/sched.h 2.6.11-artis-cvs/include/linux/sched.h
--- 2.6.11-pfm/include/linux/sched.h	2005-03-02 08:37:48.000000000 +0100
+++ 2.6.11-artis-cvs/include/linux/sched.h	2005-04-26 01:21:49.000000000 +0200
@@ -33,6 +33,8 @@
 #include <linux/percpu.h>
 #include <linux/topology.h>
 
+#include <linux/artis.h>
+
 struct exec_domain;
 
 /*
@@ -685,6 +687,11 @@ struct task_struct {
   	struct mempolicy *mempolicy;
 	short il_next;
 #endif
+#ifdef CONFIG_ARTIS
+	artis_task_status_t artis_status;
+	unsigned long nr_rt;
+	int prev_cpu_load_rt[NR_CPUS];
+#endif
 };
 
 static inline pid_t process_group(struct task_struct *tsk)
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/init/main.c 2.6.11-artis-cvs/init/main.c
--- 2.6.11-pfm/init/main.c	2005-03-02 08:37:49.000000000 +0100
+++ 2.6.11-artis-cvs/init/main.c	2005-03-25 19:47:47.000000000 +0100
@@ -46,6 +46,7 @@
 #include <linux/rmap.h>
 #include <linux/mempolicy.h>
 #include <linux/key.h>
+#include <linux/artis.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -106,6 +107,10 @@ extern void tc_init(void);
 enum system_states system_state;
 EXPORT_SYMBOL(system_state);
 
+#ifdef CONFIG_ARTIS
+extern void artis_init(void);
+#endif
+
 /*
  * Boot command-line arguments
  */
@@ -510,6 +515,11 @@ asmlinkage void __init start_kernel(void
 
 	acpi_early_init(); /* before LAPIC and SMP init */
 
+#ifdef CONFIG_ARTIS
+	artis_init();
+#endif
+
+
 	/* Do the rest non-__init'ed, we're now alive */
 	rest_init();
 }
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/kernel/fork.c 2.6.11-artis-cvs/kernel/fork.c
--- 2.6.11-pfm/kernel/fork.c	2005-03-02 08:37:48.000000000 +0100
+++ 2.6.11-artis-cvs/kernel/fork.c	2005-03-25 19:47:47.000000000 +0100
@@ -40,6 +40,7 @@
 #include <linux/profile.h>
 #include <linux/rmap.h>
 #include <linux/acct.h>
+#include <linux/artis.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -78,10 +79,16 @@ int nr_processes(void)
 # define free_task_struct(tsk)	kmem_cache_free(task_struct_cachep, (tsk))
 static kmem_cache_t *task_struct_cachep;
 #endif
+#ifdef CONFIG_ARTIS
+kmem_cache_t *artis_fifo_node_cachep = NULL;
+#endif
 
 void free_task(struct task_struct *tsk)
 {
 	free_thread_info(tsk->thread_info);
+#ifdef CONFIG_ARTIS
+	free_artis_fifo_node(tsk->artis_status.fifo_node);
+#endif
 	free_task_struct(tsk);
 }
 EXPORT_SYMBOL(free_task);
@@ -113,6 +120,14 @@ void __init fork_init(unsigned long memp
 		kmem_cache_create("task_struct", sizeof(struct task_struct),
 			ARCH_MIN_TASKALIGN, SLAB_PANIC, NULL, NULL);
 #endif
+#ifdef CONFIG_ARTIS
+	artis_fifo_node_cachep =
+	  kmem_cache_create("artis_fifo_node",
+			    sizeof(artis_fifo_node_t),0,
+			    SLAB_MUST_HWCACHE_ALIGN, NULL, NULL);
+	if (!artis_fifo_node_cachep)
+	  panic("fork_init(): cannot create artis_fifo_node_t SLAB cache");
+#endif
 
 	/*
 	 * The default maximum number of threads is set to a safe
@@ -135,6 +150,9 @@ static struct task_struct *dup_task_stru
 {
 	struct task_struct *tsk;
 	struct thread_info *ti;
+#ifdef CONFIG_ARTIS
+	artis_fifo_node_t *an, def_an=ARTIS_FIFO_NODE_INIT;
+#endif
 
 	prepare_to_copy(orig);
 
@@ -148,11 +166,23 @@ static struct task_struct *dup_task_stru
 		return NULL;
 	}
 
-	*ti = *orig->thread_info;
 	*tsk = *orig;
+#ifdef CONFIG_ARTIS
+	an = alloc_artis_fifo_node(tsk);
+	if (!an) {
+		free_task_struct(tsk);
+		free_thread_info(ti);
+		return NULL;
+	}
+
+	*an = def_an;
+	an->task = tsk;
+	tsk->artis_status.fifo_node = an;
+#endif
+	*ti = *orig->thread_info;
 	tsk->thread_info = ti;
 	ti->task = tsk;
-
+
 	/* One for us, one for whoever does the "release_task()" (usually parent) */
 	atomic_set(&tsk->usage,2);
 	return tsk;
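
The kernel/sched.c changes that follow maintain fixed-point exponential averages (CALC_PERIOD, CALC_RT). A minimal standalone model of that arithmetic; the constants are copied from the patch, the driver loop is ours:

```c
/* One step of the patch's fixed-point exponential average:
 * load = load * a + n * (1 - a), with a = EXP_RT/FIXED_RT_1. */
#define FSHIFT_RT	15
#define FIXED_RT_1	(1ULL << FSHIFT_RT)  /* 1.0 as fixed-point */
#define EXP_RT		32703ULL             /* e^(-1/500) as fixed-point */

static void calc_rt_model(unsigned long long *load, unsigned long long n)
{
	*load *= EXP_RT;
	*load += n * (FIXED_RT_1 - EXP_RT);
	*load >>= FSHIFT_RT;
}
```

Feeding in FIXED_RT_1 (a pure-RT tick) pulls the average toward 1.0, feeding in 0 decays it, with a time constant of about 500 samples; truncation keeps it strictly below FIXED_RT_1.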
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/kernel/sched.c 2.6.11-artis-cvs/kernel/sched.c
--- 2.6.11-pfm/kernel/sched.c	2005-03-02 08:38:19.000000000 +0100
+++ 2.6.11-artis-cvs/kernel/sched.c	2005-04-24 01:09:22.000000000 +0200
@@ -46,9 +46,25 @@
 #include <linux/syscalls.h>
 #include <linux/times.h>
 #include <asm/tlb.h>
+#include <linux/artis.h>
 
 #include <asm/unistd.h>
 
+#ifdef CONFIG_ARTIS
+#include <linux/proc_fs.h>
+#include <linux/ctype.h>
+#if defined(__ia64__)
+#define artis_get_time(x) ((x)=ia64_get_itc())
+#elif defined(__i386__)
+#define artis_get_time(x) rdtscl(x)
+#else
+#define artis_get_time(x) ((x)=jiffies)
+#endif
+#define KERN_ARTIS_LEVEL ""
+#define ARTIS_PRINTK noprintk
+static int noprintk(const char *s, ...) { return 0; }
+#endif
+
 /*
  * Convert user-nice values [ -20 ... 0 ... 19 ]
  * to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
@@ -234,6 +250,11 @@ struct runqueue {
 	task_t *migration_thread;
 	struct list_head migration_queue;
 #endif
+#ifdef CONFIG_ARTIS
+	unsigned long nr_rt;
+	int prev_cpu_load_rt[NR_CPUS];
+#endif
+
 
 #ifdef CONFIG_SCHEDSTATS
 	/* latency stats */
@@ -288,6 +309,12 @@ static DEFINE_PER_CPU(struct runqueue, r
 #define task_rq(p)		cpu_rq(task_cpu(p))
 #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
 
+/* ARTiS migration */
+#ifdef CONFIG_ARTIS
+DEFINE_PER_CPU(artis_per_cpu_info_t, artis_percpu);
+EXPORT_PER_CPU_SYMBOL(artis_percpu);
+#endif
+
 /*
  * Default context-switch locking:
  */
@@ -296,6 +323,12 @@ static DEFINE_PER_CPU(struct runqueue, r
 # define finish_arch_switch(rq, next)	spin_unlock_irq(&(rq)->lock)
 # define task_running(rq, p)		((rq)->curr == (p))
 #endif
+#ifdef CONFIG_ARTIS
+#ifndef artis_complete_arch
+#define artis_complete_arch(rq, task)  do { } while (0)
+#define artis_finish_complete_arch(rq, task)  do { } while (0)
+#endif
+#endif /* CONFIG_ARTIS */
 
 /*
  * task_rq_lock - lock the runqueue a given task resides on and disable
@@ -628,6 +661,22 @@ static int effective_prio(task_t *p)
 	return prio;
 }
 
+#ifdef CONFIG_ARTIS
+static inline void increase_nr_rt(task_t *p, runqueue_t *rq)
+{
+	if (unlikely(rt_task(p)))
+		rq->nr_rt++;
+}
+static inline void decrease_nr_rt(task_t *p, runqueue_t *rq)
+{
+	if (unlikely(rt_task(p)))
+		rq->nr_rt--;
+}
+#else
+static inline void increase_nr_rt(task_t *p, runqueue_t *rq){}
+static inline void decrease_nr_rt(task_t *p, runqueue_t *rq){}
+#endif /* CONFIG_ARTIS */
+
 /*
  * __activate_task - move a task to the runqueue.
  */
@@ -635,6 +684,7 @@ static inline void __activate_task(task_
 {
 	enqueue_task(p, rq->active);
 	rq->nr_running++;
+	increase_nr_rt(p, rq);
 }
 
 /*
@@ -644,6 +694,7 @@ static inline void __activate_idle_task(
 {
 	enqueue_task_head(p, rq->active);
 	rq->nr_running++;
+	increase_nr_rt(p, rq); //XXX probably not needed as the idle task is never RT
 }
 
 static void recalc_task_prio(task_t *p, unsigned long long now)
@@ -761,7 +812,11 @@ static void activate_task(task_t *p, run
  */
 static void deactivate_task(struct task_struct *p, runqueue_t *rq)
 {
+	ARTIS_BUG(artis_flag(p) & ARTIS_STATUS_FETCH, p, rq);
 	rq->nr_running--;
+	decrease_nr_rt(p, rq);
+	if (p->state == TASK_UNINTERRUPTIBLE)
+		rq->nr_uninterruptible++;
 	dequeue_task(p, p->array);
 	p->array = NULL;
 }
@@ -917,13 +972,106 @@ static inline unsigned long source_load(
 	return min(rq->cpu_load, load_now);
 }
 
+#ifdef CONFIG_ARTIS
+
+/* periods are expressed in nsec (like sched_clock()) */
+#define FSHIFT_P	11		/* nr of bits of precision */
+#define FIXED_P_1	(1<<FSHIFT_P)	/* 1.0 as fixed-point */
+#define EXP_PERIOD	2028		/* e^(-1/100) as fixed-point (~100-attempt decay) */
+
+#define CALC_PERIOD(period,n) do {		\
+	*(period) *= EXP_PERIOD;		\
+	*(period) += (n)*(FIXED_P_1-EXP_PERIOD);\
+	*(period) >>= FSHIFT_P; } while (0)
+
+/* period considered too short to be worth migrating to an RT CPU before the next migration attempt */
+#define PERIOD_BEFORE_NMA	(2 * NSEC_PER_SEC/HZ)	/* two scheduling ticks - empirical, could be a bit more */
+/*
+ * Period after the predicted next migration attempt long enough to signify the forecast was wrong.
+ * It's expressed as a multiplier of the current period.
+ */
+#define MULTIPLY_AFTER_NMA	(2)		/* that should be enough */
+
+/* 
+ * update the statistics about the migration attempts by saving the 
+ * new timestamp of the attempt and re-computing the average period.
+ * Warning: this function must be lock-free
+ */
+void artis_migration_stat(artis_task_status_t *status)
+{
+	unsigned long long old = status->last_attempt;
+	unsigned long long current_period;
+
+	status->last_attempt = sched_clock();
+	/* XXX necessary? shouldn't it just be set at init? */
+	if (unlikely(old == 0)) /* on the first migration we just initialise the value */
+		return;
+
+	current_period = (status->last_attempt - old) * FIXED_P_1;
+	
+	CALC_PERIOD(&status->attempt_period, current_period);
+}
+
+#define FSHIFT_RT	15		/* nr of bits of precision */
+#define FIXED_RT_1	(1<<FSHIFT_RT)	/* 1.0 as fixed-point */
+#define EXP_RT		32703		/* e^(-1/500) as fixed-point (~500-tick decay) */
+#define MAX_RT		(FIXED_RT_1 - (FIXED_RT_1/(FIXED_RT_1 - EXP_RT))) /* the maximum reachable value */
+
+#define CALC_RT(load,n) do {			\
+	*(load) *= EXP_RT;			\
+	*(load) += (n)*(FIXED_RT_1-EXP_RT);	\
+	*(load) >>= FSHIFT_RT; } while (0)
+/*
+ * update the statistics about the RT task CPU consumption and the time
+ * they do not consume. We always insert 1 or 0 into the mean, so it should
+ * always stay between 0 and 1.
+ */
+void artis_rt_stat(task_t *p, struct cpu_usage_stat *cpustat)
+{
+	int rt_ticks = rt_task(p) ? FIXED_RT_1 : 0;
+
+	CALC_RT(&cpustat->rt, rt_ticks);
+}
+
+/*
+ * It is called the RT ratio but actually it is the inverse of the CPU time NOT
+ * consumed by RT tasks.
+ * Returns 1, the minimum, when no RT process has run.
+ * The rt stat is capped at MAX_RT to avoid division by 0;
+ * when only RT tasks are running it returns MAX_RT+1.
+ */ 
+unsigned int get_rt_ratio(int cpu)
+{
+	unsigned int nrt;
+	struct cpu_usage_stat *cpustat = &(kstat_cpu(cpu)).cpustat;
+
+	if (!artis_info.active)
+		return 1;
+	
+	nrt = (MAX_RT - cpustat->rt) ?: 1;
+	return (unsigned int)(nrt + cpustat->rt)/nrt;
+}
+
+/* return the number of active RT tasks on a runqueue */
+static inline unsigned int get_nr_rt(runqueue_t *rq)
+{
+	return (artis_info.active) ? rq->nr_rt : 0;
+}
+#else /* CONFIG_ARTIS */
+#define get_rt_ratio(cpu) 1
+#define get_nr_rt(cpu) 0
+#define artis_rt_stat(p, cpustat) 
+#define artis_migration_stat(status) 
+#endif /* CONFIG_ARTIS */
+
 /*
  * Return a high guess at the load of a migration-target cpu
  */
 static inline unsigned long target_load(int cpu)
 {
 	runqueue_t *rq = cpu_rq(cpu);
-	unsigned long load_now = rq->nr_running * SCHED_LOAD_SCALE;
+	unsigned long load_now = (rq->nr_running - get_nr_rt(rq))
+	       				* get_rt_ratio(cpu) * SCHED_LOAD_SCALE;
 
 	return max(rq->cpu_load, load_now);
 }
@@ -938,6 +1086,11 @@ static inline unsigned long target_load(
  *
  * Returns the CPU we should wake onto.
  */
+
+/* XXX on ARTiS, it's probably a bad idea to wake up a task on an RT CPU
+ * because waking the task is very likely to take a lock,
+ * so a migration to an NRT CPU would follow.
+ */
 #if defined(ARCH_HAS_SCHED_WAKE_IDLE)
 static int wake_idle(int cpu, task_t *p)
 {
@@ -1166,6 +1319,10 @@ void fastcall sched_fork(task_t *p)
 	 */
 	p->thread_info->preempt_count = 1;
 #endif
+#ifdef CONFIG_ARTIS
+	/* Reset the migration counter for the newly created process */
+	p->artis_status.nb_migration = 0;
+#endif
 	/*
 	 * Share the timeslice between parent and child, thus the
 	 * total amount of pending timeslices in the system doesn't change,
@@ -1241,6 +1398,7 @@ void fastcall wake_up_new_task(task_t * 
 				p->array = current->array;
 				p->array->nr_active++;
 				rq->nr_running++;
+				increase_nr_rt(p, rq);
 			}
 			set_need_resched();
 		} else
@@ -1344,6 +1502,14 @@ static void finish_task_switch(task_t *p
 	 *		Manfred Spraul <manfred@colorfullife.com>
 	 */
 	prev_task_flags = prev->flags;
+#ifdef CONFIG_ARTIS
+	/* 
+	 * We complete the migration here because the next task can resume from
+	 * different places: schedule(), schedule_tail()...
+	 */
+	if (artis_test_migrating(prev))
+		artis_complete_migration(prev);
+#endif
 	finish_arch_switch(rq, prev);
 	if (mm)
 		mmdrop(mm);
@@ -1358,10 +1524,13 @@ static void finish_task_switch(task_t *p
 asmlinkage void schedule_tail(task_t *prev)
 	__releases(rq->lock)
 {
+	/* ARTiS: this is part of the scheduler, migration is forbidden */
+	artis_migration_disable();
 	finish_task_switch(prev);
 
 	if (current->set_child_tid)
 		put_user(current->pid, current->set_child_tid);
+	artis_migration_enable();
 }
 
 /*
@@ -1621,6 +1790,46 @@ out:
 	put_cpu();
 }
 
+#ifdef CONFIG_ARTIS
+
+static void artis_valois_push(artis_migration_queue_t *queue, task_t *task);
+/*
+ * push_task - move a task from the local runqueue to a remote runqueue.
+ * Only the local runqueue needs locking, which is useful for ARTiS.
+ */
+static inline
+void push_task(runqueue_t *src_rq, prio_array_t *src_array, task_t *p,
+	       int dest_cpu)
+{
+	artis_per_cpu_info_t *artis_cpu = cpu_artis(dest_cpu);
+	artis_migration_queue_t *queue = artis_cpu->queues[smp_processor_id()][dest_cpu];
+
+	/*
+	 * compared to pull_task(), we fully deactivate the task because
+	 * valois_pull() will fully reactivate it.
+	 */
+	deactivate_task(p, src_rq);
+	
+	//XXX we should have a clean valois_push() available to every part of the kernel
+	get_task_struct(p);
+	/*
+	 * we do it the opposite way from the ARTiS migration:
+	 * the local flag stays NULL.
+	 * artis_fetch_from_migration() will notice this and avoid changing the
+	 * flag or giving the struct back. That's necessary because we can't
+	 * keep the local variable available.
+	 */
+	ARTIS_BUG(artis_local(p), artis_local(p));
+	set_task_cpu(p, dest_cpu);
+	artis_flag(p) = ARTIS_STATUS_LB;
+
+	atomic_inc(&(cpu_artis(dest_cpu)->fetch_size));
+	atomic_inc(&(artis_info.size));
+	artis_valois_push(queue, p);
+}
+
+#endif
+
 /*
  * pull_task - move a task from a remote runqueue to the local runqueue.
  * Both runqueues must be locked.
@@ -1631,8 +1840,10 @@ void pull_task(runqueue_t *src_rq, prio_
 {
 	dequeue_task(p, src_array);
 	src_rq->nr_running--;
+	decrease_nr_rt(p, src_rq);
 	set_task_cpu(p, this_cpu);
 	this_rq->nr_running++;
+	increase_nr_rt(p, this_rq);
 	enqueue_task(p, this_array);
 	p->timestamp = (p->timestamp - src_rq->timestamp_last_tick)
 				+ this_rq->timestamp_last_tick;
@@ -1674,9 +1885,118 @@ int can_migrate_task(task_t *p, runqueue
 
 	if (task_hot(p, rq->timestamp_last_tick, sd))
 			return 0;
+#ifdef CONFIG_ARTIS
+	/* 4) ARTIS-migrating */
+	if (artis_flag(p))
+		return 0;
+	/* 5) likely to migrate soon if going to a RT CPU */
+	/* this_cpu is actually the _destination_ CPU */
+	if ((artis_info.active > 0) && artis_is_cpu_rt(this_cpu)) {
+		artis_task_status_t *status = &p->artis_status;
+		unsigned long long next_attempt = status->last_attempt + status->attempt_period;
+		unsigned long long now = sched_clock();
+
+		if ((now > (next_attempt - PERIOD_BEFORE_NMA)) && 
+		    (now > (next_attempt + status->attempt_period * MULTIPLY_AFTER_NMA)))
+			return 0;
+	}
+#endif
 	return 1;
 }
 
+
+#ifdef CONFIG_ARTIS
+
+/*
+ * move_tasks_push pushes up to max_nr_move tasks to the destination CPU.
+ * Used by ARTiS to avoid taking the destination CPU's lock.
+ *
+ * called with only the local runqueue locked
+ */
+static int move_tasks_push(runqueue_t *this_rq, int this_cpu, int idlest_cpu,
+		      unsigned long max_nr_move, struct sched_domain *sd,
+		      int only_rt)
+{
+	prio_array_t *array;
+	struct list_head *head, *curr;
+	int idx, pushed = 0, did_modulo, start;
+	task_t *tmp;
+
+	if (max_nr_move <= 0 || cpu_rq(this_cpu)->nr_running <= 1)
+		goto out;
+
+	/*
+	 * We first consider expired tasks. Those will likely not be
+	 * executed in the near future, and they are most likely to
+	 * be cache-cold, thus switching CPUs has the least effect
+	 * on them.
+	 */
+	if (this_rq->expired->nr_active)
+		array = this_rq->expired;
+	else
+		array = this_rq->active;
+
+	/* when pushing from RT to NRT, we move the normal Linux tasks first and keep the RT tasks */
+	if (artis_is_cpu_rt(this_cpu) && artis_is_cpu_nrt(idlest_cpu))
+		start = MAX_RT_PRIO;
+	else
+		start = MAX_PRIO;
+
+new_array:
+	did_modulo = 0;
+	idx = start;
+skip_bitmap:
+	/* special trick to be able to start anywhere in the mask */
+	if (!did_modulo && (idx >= MAX_PRIO)) {
+		idx = 0;
+		did_modulo = 1;
+	}
+
+	if (!idx)
+		idx = sched_find_first_bit(array->bitmap);
+	else
+		idx = find_next_bit(array->bitmap, MAX_PRIO, idx);
+	if ((idx >= MAX_PRIO) || (only_rt && (idx >= MAX_RT_PRIO))) {
+		if (array == this_rq->expired && this_rq->active->nr_active) {
+			array = this_rq->active;
+			goto new_array;
+		}
+		goto out;
+	}
+
+	head = array->queue + idx;
+	curr = head->prev;
+skip_queue:
+	tmp = list_entry(curr, task_t, run_list);
+
+	curr = curr->prev;
+
+	if (!can_migrate_task(tmp, this_rq, idlest_cpu, sd, 0)) {
+		if (curr != head)
+			goto skip_queue;
+		idx++;
+		goto skip_bitmap;
+	}
+
+	push_task(this_rq, array, tmp, idlest_cpu);
+	pushed++;
+
+	/* We only want to steal up to the prescribed number of tasks. */
+	if (pushed < max_nr_move) {
+		if (curr != head)
+			goto skip_queue;
+		idx++;
+		goto skip_bitmap;
+	}
+out:
+	if (pushed) {
+		set_tsk_need_resched(cpu_curr(idlest_cpu));
+		smp_send_reschedule(idlest_cpu);
+	}
+	return pushed;
+}
+#endif
+
 /*
  * move_tasks tries to move up to max_nr_move tasks from busiest to this_rq,
  * as part of a balancing operation within "domain". Returns the number of
@@ -1763,6 +2083,152 @@ out:
 	return pulled;
 }
 
+#ifdef CONFIG_ARTIS
+/*
+ * find_idlest_group does the opposite of find_busiest_group;
+ * used only by ARTiS, for the active (push) policy.
+ * An idle parameter is useless here: if we are idle we have nothing to move;
+ * in fact we should never even be called when idle.
+ */
+static struct sched_group *
+find_idlest_group(struct sched_domain *sd, int this_cpu,
+		   unsigned long *imbalance, int only_rt)
+{
+	struct sched_group *idlest = NULL, *this = NULL, *group = sd->groups;
+	unsigned long min_load, avg_load, total_load, this_load, total_pwr;
+
+	min_load = ULONG_MAX;
+	this_load = total_load = total_pwr = 0;
+
+	do {
+		cpumask_t tmp;
+		unsigned long load;
+		int local_group;
+		int i, nr_cpus = 0;
+
+		local_group = cpu_isset(this_cpu, group->cpumask);
+
+		/* Tally up the load of all CPUs in the group */
+		avg_load = 0;
+		cpus_and(tmp, group->cpumask, cpu_online_map);
+		if (only_rt)
+			cpus_and(tmp, tmp, artis_info.cpus_nrt);
+
+		if (unlikely(cpus_empty(tmp)))
+			goto nextgroup;
+
+		for_each_cpu_mask(i, tmp) {
+			/* Bias balancing toward cpus of our domain */
+			if (local_group)
+				load = source_load(i);
+			else
+				load = target_load(i);
+
+			nr_cpus++;
+			avg_load += load;
+		}
+
+		if (!nr_cpus)
+			goto nextgroup;
+
+		total_load += avg_load;
+		total_pwr += group->cpu_power;
+
+		/* Adjust by relative CPU power of the group */
+		avg_load = (avg_load * SCHED_LOAD_SCALE) / group->cpu_power;
+
+		if (local_group) {
+			this_load = avg_load;
+			this = group;
+			goto nextgroup;
+		} else if (avg_load < min_load) {
+			min_load = avg_load;
+			idlest = group;
+		}
+nextgroup:
+		group = group->next;
+	} while (group != sd->groups);
+
+	if (!idlest || this_load <= min_load)
+		goto out_balanced;
+	
+	avg_load = (SCHED_LOAD_SCALE * total_load) / total_pwr;
+
+	if (this_load <= avg_load ||
+			100*this_load <= sd->imbalance_pct*min_load)
+		goto out_balanced;
+
+	/*
+	 * We're trying to get all the cpus to the average_load, so we don't
+	 * want to push ourselves above the average load, nor do we wish to
+	 * reduce the max loaded cpu below the average load, as either of these
+	 * actions would just result in more rebalancing later, and ping-pong
+	 * tasks around. Thus we look for the minimum possible imbalance.
+	 * Negative imbalances (*we* are more loaded than anyone else) will
+	 * be counted as no imbalance for these purposes -- we can't fix that
+	 * by pulling tasks to us.  Be careful of negative numbers as they'll
+	 * appear as very large values with unsigned longs.
+	 */
+	*imbalance = min(avg_load - min_load, this_load - avg_load);
+
+	/* How much load to actually move to equalise the imbalance */
+	*imbalance = (*imbalance * min(idlest->cpu_power, this->cpu_power))
+				/ SCHED_LOAD_SCALE;
+
+	if (*imbalance < SCHED_LOAD_SCALE - 1) {
+		unsigned long pwr_now = 0, pwr_move = 0;
+		unsigned long tmp;
+
+		if (this_load - min_load >= SCHED_LOAD_SCALE*2) {
+			*imbalance = 1;
+			return idlest;
+		}
+
+		/*
+		 * OK, we don't have enough imbalance to justify moving tasks,
+		 * however we may be able to increase total CPU power used by
+		 * moving them.
+		 */
+
+		pwr_now += idlest->cpu_power*min(SCHED_LOAD_SCALE, min_load);
+		pwr_now += this->cpu_power*min(SCHED_LOAD_SCALE, this_load);
+		pwr_now /= SCHED_LOAD_SCALE;
+
+		/* Amount of load we'd subtract */
+		tmp = SCHED_LOAD_SCALE*SCHED_LOAD_SCALE/this->cpu_power;
+		if (this_load > tmp)
+			pwr_move += this->cpu_power*min(SCHED_LOAD_SCALE,
+							tmp - this_load);
+
+		/* Amount of load we'd add */
+		tmp = SCHED_LOAD_SCALE*SCHED_LOAD_SCALE/idlest->cpu_power;
+		if (this_load < tmp)
+			tmp = this_load;
+		pwr_move += idlest->cpu_power*min(SCHED_LOAD_SCALE, min_load + tmp);
+		pwr_move /= SCHED_LOAD_SCALE;
+
+		/* Move if we gain another 8th of a CPU worth of throughput */
+		if (pwr_move < pwr_now + SCHED_LOAD_SCALE / 8)
+			goto out_balanced;
+
+		*imbalance = 1;
+		return idlest;
+	}
+
+	/* Get rid of the scaling factor, rounding down as we divide */
+	*imbalance = (*imbalance + 1) / SCHED_LOAD_SCALE;
+
+	return idlest;
+
+out_balanced:
+	/* maybe we should check whether the idlest CPU is idle; in that
+	 * case we could move one task. */
+
+	*imbalance = 0;
+	return NULL;
+}
+#endif
+
 /*
  * find_busiest_group finds and returns the busiest CPU group within the
  * domain. It calculates and returns the number of tasks which should be
@@ -1781,13 +2247,26 @@ find_busiest_group(struct sched_domain *
 		unsigned long load;
 		int local_group;
 		int i, nr_cpus = 0;
+		cpumask_t tmp = group->cpumask;
 
-		local_group = cpu_isset(this_cpu, group->cpumask);
+		local_group = cpu_isset(this_cpu, tmp);
+
+#ifdef CONFIG_ARTIS
+		/* we forbid balancing from the RT CPUs because, first, it
+		 * implies locking the RT runqueue and, second, the
+		 * ARTiS migration already does a lot of "balancing" from RT CPUs.
+		 *
+		 * However the local group is not filtered, since it is
+		 * used to compute the local reference and so must be included.
+		 */
+		if ((artis_info.active > 0) && !local_group)
+			cpus_and(tmp, tmp, artis_info.cpus_nrt);
+#endif
 
 		/* Tally up the load of all CPUs in the group */
 		avg_load = 0;
 
-		for_each_cpu_mask(i, group->cpumask) {
+		for_each_cpu_mask(i, tmp) {
 			/* Bias balancing toward cpus of our domain */
 			if (local_group)
 				load = target_load(i);
@@ -1901,6 +2380,37 @@ out_balanced:
 	return NULL;
 }
 
+
+#ifdef CONFIG_ARTIS
+/*
+ * find_idlest_queue
+ * does the opposite of find_busiest_queue; used by ARTiS.
+ * Instead of returning the runqueue we return the CPU (-1 if none).
+ */
+static int find_idlest_queue(struct sched_group *group, int only_rt)
+{
+	cpumask_t tmp;
+	unsigned long load, min_load = ULONG_MAX;
+	int i, idlest = -1;
+
+	cpus_and(tmp, group->cpumask, cpu_online_map);
+	if (only_rt)
+		cpus_and(tmp, tmp, artis_info.cpus_nrt);
+
+	for_each_cpu_mask(i, tmp) {
+		load = target_load(i);
+
+		if (load < min_load) {
+			min_load = load;
+			idlest = i;
+		}
+	}
+
+	return idlest;
+}
+
+#endif
+
 /*
  * find_busiest_queue - find the busiest runqueue among the cpus in group.
  */
@@ -1909,8 +2419,13 @@ static runqueue_t *find_busiest_queue(st
 	unsigned long load, max_load = 0;
 	runqueue_t *busiest = NULL;
 	int i;
+	cpumask_t tmp = group->cpumask;
 
-	for_each_cpu_mask(i, group->cpumask) {
+#ifdef CONFIG_ARTIS
+	if (artis_info.active > 0)
+		cpus_and(tmp, tmp, artis_info.cpus_nrt);
+#endif
+	for_each_cpu_mask(i, tmp) {
 		load = source_load(i);
 
 		if (load > max_load) {
@@ -1922,6 +2437,81 @@ static runqueue_t *find_busiest_queue(st
 	return busiest;
 }
 
+#ifdef CONFIG_ARTIS
+
+/*
+ * Same as the load_balance function but with "active" policy.
+ * This is good for ARTiS because it doesn't need to lock the destination
+ * CPU. However it also means we never use the "passive" (pull) policy.
+ *
+ * Called with this_rq unlocked.
+ */
+
+static int load_balance_push(int this_cpu, runqueue_t *this_rq,
+			struct sched_domain *sd, int only_rt)
+{
+	struct sched_group *group;
+	unsigned long imbalance;
+	int nr_moved, idlest;
+	
+	spin_lock(&this_rq->lock);
+
+	group = find_idlest_group(sd, this_cpu, &imbalance, only_rt);
+	if (!group)
+		goto out_balanced;
+	
+	idlest = find_idlest_queue(group, only_rt);
+	if (idlest == -1)
+		goto out_balanced;
+	
+	/*
+	 * This should be "impossible", but since load
+	 * balancing is inherently racy and statistical,
+	 * it could happen in theory.
+	 */
+	if (unlikely(idlest == this_cpu)) {
+		WARN_ON(1);
+		goto out_balanced;
+	}
+
+	nr_moved = 0;
+	if (cpu_rq(this_cpu)->nr_running > 1) {
+		/*
+		 * There should be a single function which moves the task and
+		 * manages the flags and all the other small things, shared with
+		 * artis_complete_migration(). But the point here is that we
+		 * don't want to send the resched flag too often.
+		 */
+		nr_moved = move_tasks_push(this_rq, this_cpu, idlest,
+						imbalance, sd, only_rt);
+		
+	}
+	spin_unlock(&this_rq->lock);
+
+	/*
+	 * In the normal load_balance() we would force an "active" LB if
+	 * the passive one hasn't worked. Here we already do the "active" one,
+	 * so we don't re-force it; we don't force a "passive" one
+	 * either, because there is no easy way to do that without locking...
+	 */
+	sd->nr_balance_failed = 0;
+
+	/* We were unbalanced, so reset the balancing interval */
+	sd->balance_interval = sd->min_interval;
+
+	return nr_moved;
+
+out_balanced:
+	spin_unlock(&this_rq->lock);
+
+	/* tune up the balancing interval */
+	if (sd->balance_interval < sd->max_interval)
+		sd->balance_interval *= 2;
+
+	return 0;
+}
+
+#endif
 /*
  * Check this_cpu to ensure it is balanced within domain. Attempt to move
  * tasks if there is an imbalance.
@@ -2042,6 +2632,16 @@ static int load_balance_newidle(int this
 	unsigned long imbalance;
 	int nr_moved = 0;
 
+#ifdef CONFIG_ARTIS
+	/*
+	 * Don't balance when idle if the CPU can only do push,
+	 * i.e. if the CPU is an RT CPU.
+	 */
+	if ((artis_info.active > 0) && (artis_is_cpu_rt(this_cpu)))
+		goto out;
+#endif
+
 	schedstat_inc(sd, lb_cnt[NEWLY_IDLE]);
 	group = find_busiest_group(sd, this_cpu, &imbalance, NEWLY_IDLE);
 	if (!group) {
@@ -2172,7 +2772,8 @@ static void rebalance_tick(int this_cpu,
 
 	/* Update our load */
 	old_load = this_rq->cpu_load;
-	this_load = this_rq->nr_running * SCHED_LOAD_SCALE;
+	this_load = (this_rq->nr_running - get_nr_rt(this_rq))
+					* get_rt_ratio(this_cpu) * SCHED_LOAD_SCALE;
 	/*
 	 * Round up the averaging division if load is increasing. This
 	 * prevents us from getting stuck on 9 if the load is 10, for
@@ -2189,6 +2790,32 @@ static void rebalance_tick(int this_cpu,
 			continue;
 
 		interval = sd->balance_interval;
+
+#ifdef CONFIG_ARTIS
+		unsigned long interval_rt = ULONG_MAX;
+
+		if (artis_info.active > 0) {
+			/*
+			 * Don't balance when idle if the CPU can only do
+			 * push (i.e. if the CPU is an RT CPU).
+			 */
+			if ((idle == SCHED_IDLE) && (artis_is_cpu_rt(this_cpu)))
+				return;
+			/*
+			 * On NRT CPUs we run two load-balancing passes:
+			 * - a frequent one that only sends RT tasks back
+			 *   to RT CPUs
+			 * - the usual one
+			 */
+			if (artis_is_cpu_nrt(this_cpu)) {
+				/* scale ms to jiffies */
+				interval_rt = msecs_to_jiffies(interval / 4);
+				if (unlikely(!interval_rt))
+					interval_rt = 1;
+			}
+		}
+
+#endif
+
 		if (idle != SCHED_IDLE)
 			interval *= sd->busy_factor;
 
@@ -2198,12 +2825,27 @@ static void rebalance_tick(int this_cpu,
 			interval = 1;
 
 		if (j - sd->last_balance >= interval) {
+#ifdef CONFIG_ARTIS
+			if (artis_info.active > 0)
+				load_balance_push(this_cpu, this_rq, sd, 0);	
+			else
+#endif
 			if (load_balance(this_cpu, this_rq, sd, idle)) {
 				/* We've pulled tasks over so no longer idle */
 				idle = NOT_IDLE;
 			}
 			sd->last_balance += interval;
 		}
+#ifdef CONFIG_ARTIS
+		/*
+		 * This might not be called exactly as often as expected
+		 * because some ticks might be missed, but that's not a
+		 * problem: in general it works, and it keeps the code simple.
+		 */
+		else if ((interval_rt != ULONG_MAX) &&
+			 (((j - sd->last_balance) % interval_rt) == 0))
+			load_balance_push(this_cpu, this_rq, sd, 1);
+#endif
 	}
 }
 #else
@@ -2344,6 +2986,7 @@ void account_user_time(struct task_struc
 		cpustat->nice = cputime64_add(cpustat->nice, tmp);
 	else
 		cpustat->user = cputime64_add(cpustat->user, tmp);
+	artis_rt_stat(p, cpustat);
 }
 
 /*
@@ -2379,6 +3022,7 @@ void account_system_time(struct task_str
 		cpustat->iowait = cputime64_add(cpustat->iowait, tmp);
 	else
 		cpustat->idle = cputime64_add(cpustat->idle, tmp);
+	artis_rt_stat(p, cpustat);
 }
 
 /*
@@ -2677,16 +3321,42 @@ asmlinkage void __sched schedule(void)
 	 */
 	if (likely(!current->exit_state)) {
 		if (unlikely(in_atomic())) {
+#ifdef CONFIG_ARTIS
+			printk(KERN_ERR "bad: scheduling while atomic! (pid=%d/%d, preempt=%d, kernel_lock=%d)\n",
+					current->pid, current->thread_info->cpu, 
+					preempt_count(), current->lock_depth);
+#else
 			printk(KERN_ERR "scheduling while atomic: "
 				"%s/0x%08x/%d\n",
 				current->comm, preempt_count(), current->pid);
+#endif
 			dump_stack();
 		}
 	}
 	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
 
 need_resched:
+	artis_migration_disable();
 	preempt_disable();
+
+#ifdef CONFIG_ARTIS
+	if ((artis_info.active > 0) &&
+	       (smp_processor_id() == first_cpu(artis_info.cpus_nrt))) {
+		int i;
+		for_each_cpu_mask(i, artis_info.cpus_rt)
+			artis_check_dummyfifo(i);
+	}
+#endif
+
+#ifdef CONFIG_ARTIS_DEBUG
+	if (current->pid > 200 &&
+	    (artis_migration_count() > 2 ||
+	     (artis_migration_count() > 1 &&
+	      artis_this_flag() == ARTIS_STATUS_REQUEST1))) {
+		ARTIS_PRINTK(KERN_ARTIS_LEVEL "ARTIS debug: In schedule (p=%d/%d) migration=%d\n",
+				current->pid, smp_processor_id(), 
+				artis_migration_count());
+	}
+#endif
+
 	prev = current;
 	release_kernel_lock(prev);
 need_resched_nonpreemptible:
@@ -2716,14 +3386,47 @@ need_resched_nonpreemptible:
 
 	spin_lock_irq(&rq->lock);
 
+#ifdef CONFIG_ARTIS
+	if (atomic_read(&(this_artis()->fetch_size))) {
+		int i;
+		for_each_cpu(i)
+			artis_fetch_from_migration(i);
+	}
+#endif /* CONFIG_ARTIS */
+
 	if (unlikely(prev->flags & PF_DEAD))
 		prev->state = EXIT_DEAD;
 
 	switch_count = &prev->nivcsw;
-	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
+#ifdef CONFIG_ARTIS
+	/*
+	 * If the ARTiS flag is set, we know where we come from and
+	 * (maybe!) where we go, so we force the deactivation of the task.
+	 */
+	if (unlikely(artis_flag(prev) & (ARTIS_STATUS_REQUEST1|ARTIS_STATUS_DUMMY1|ARTIS_STATUS_DUMMY2)) &&
+			!(preempt_count() & PREEMPT_ACTIVE)) {
+		ARTIS_BUG(!prev->array, prev);
+		artis_flag(prev) = ARTIS_STATUS_REQUEST2;
+		deactivate_task(prev, rq);
+	} else
+#endif
+	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE) 
+#ifdef CONFIG_ARTIS
+	/*
+	 * We don't want to be deactivated before ending
+	 * the request function.
+	 *
+	 * This can happen if the migrated task has been
+	 * chosen but the need_resched flag implies restarting
+	 * schedule() (cf. label need_resched:).
+	 */
+		&& !(artis_flag(prev) & ARTIS_STATUS_FETCH)
+#endif
+		) {
 		switch_count = &prev->nvcsw;
 		if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
-				unlikely(signal_pending(prev))))
+			     unlikely(signal_pending(prev))))
 			prev->state = TASK_RUNNING;
 		else {
 			if (prev->state == TASK_UNINTERRUPTIBLE)
@@ -2731,10 +3434,21 @@ need_resched_nonpreemptible:
 			deactivate_task(prev, rq);
 		}
 	}
+	ARTIS_BUG(artis_test_migrating(prev) && prev->array, prev);
 
 	cpu = smp_processor_id();
 	if (unlikely(!rq->nr_running)) {
 go_idle:
+#ifdef CONFIG_ARTIS
+		/*
+		 * On the migration flag, we skip load balancing for 2 reasons:
+		 *  - we want to migrate quickly
+		 *  - load_balance() can call double_lock_balance(), which can
+		 *    release the runqueue lock, so a wake-up could have
+		 *    re-activated the task
+		 */
+		if (likely(!artis_test_migrating(prev)))
+#endif
 		idle_balance(cpu, rq);
 		if (!rq->nr_running) {
 			next = rq->idle;
@@ -2779,6 +3493,7 @@ go_idle:
 	idx = sched_find_first_bit(array->bitmap);
 	queue = array->queue + idx;
 	next = list_entry(queue->next, task_t, run_list);
+	ARTIS_BUG(artis_test_migrating(prev) && prev==next, prev);
 
 	if (!rt_task(next) && next->activated > 0) {
 		unsigned long long delta = now - next->timestamp;
@@ -2798,6 +3513,7 @@ switch_tasks:
 	prefetch(next);
 	clear_tsk_need_resched(prev);
 	rcu_qsctr_inc(task_cpu(prev));
+	ARTIS_BUG(artis_test_migrating(prev) && prev->array, prev);
 
 	prev->sleep_avg -= run_time;
 	if ((long)prev->sleep_avg <= 0)
@@ -2806,6 +3522,8 @@ switch_tasks:
 
 	sched_info_switch(prev, next);
 	if (likely(prev != next)) {
+		ARTIS_BUG(artis_test_migrating(prev) && prev->array, prev);
+
 		next->timestamp = now;
 		rq->nr_switches++;
 		rq->curr = next;
@@ -2815,14 +3533,19 @@ switch_tasks:
 		prev = context_switch(rq, prev, next);
 		barrier();
 
+		ARTIS_BUG(artis_test_migrating(current), artis_flag(current));
+
 		finish_task_switch(prev);
-	} else
+	} else {
+		ARTIS_BUG(artis_test_migrating(prev), prev);
 		spin_unlock_irq(&rq->lock);
+	}
 
 	prev = current;
 	if (unlikely(reacquire_kernel_lock(prev) < 0))
 		goto need_resched_nonpreemptible;
 	preempt_enable_no_resched();
+	artis_migration_enable();
 	if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
 		goto need_resched;
 }
@@ -3486,6 +4209,620 @@ static int do_sched_setscheduler(pid_t p
 	return retval;
 }
 
+#ifdef CONFIG_ARTIS
+
+void
+artis_bug(int line, char *file, unsigned long time, ...) {
+	artis_migration_disable();
+	artis_get_time(time);
+	printk("ARTIS BUG (%s:%d@%ld) KDB_ENTER (%d/%d)=%x\n",
+		file, line, time, (int)current->pid, (int)smp_processor_id(),
+		(int)artis_this_flag());
+	artis_migration_enable();
+}
+EXPORT_SYMBOL(artis_bug);
+
+/*
+ * Main function of ARTiS: when a task endangers the RT properties of the
+ * CPU, this function is called and checks whether the task is allowed to
+ * stay. If not, it is migrated.
+ * On entry, preemption is already disabled.
+ */
+void artis_try_to_migrate(void) {
+	if (!_artis_ismigrable_nostat())
+		return;
+
+	artis_migration_stat(&current->artis_status);
+
+	if (!_artis_ismigrable_stat())
+		return;
+
+	/* we will come back from this function on an NRT CPU */
+	artis_request_for_migration();
+}
+EXPORT_SYMBOL(artis_try_to_migrate);
+
+void
+function_artis_migration_disable(void) {
+	artis_migration_disable();
+}
+EXPORT_SYMBOL(function_artis_migration_disable);
+
+void
+function_artis_migration_enable(void) {
+	artis_migration_enable();
+}
+EXPORT_SYMBOL(function_artis_migration_enable);
+
+
+/**
+ * artis_valois_push - append a task to a migration queue
+ * @queue: the migration queue
+ * @task: the task
+ */
+static void
+artis_valois_push(artis_migration_queue_t *queue, task_t *task)
+{
+	artis_fifo_node_t *node, *otail;
+
+	node = task->artis_status.fifo_node;
+	ARTIS_BUG(node->task != task, task, node, node->task);
+	ARTIS_BUG(node->next, task, node, node->next);
+	otail = queue->tail;
+	ARTIS_BUG(!otail, task, queue);
+	atomic_inc(&(queue->size));
+	otail->next = node;
+	queue->tail = node;
+}
+
+
+/**
+ * artis_valois_pull_all - pull all the tasks from a migration queue
+ * @queue: the migration queue
+ *
+ * Returns the last task in the migration queue, or NULL if the queue
+ * is empty.
+ *
+ * The migration queue is supposed to be a FIFO, but ordering does not seem
+ * to matter much, so we return a linked list of the tasks in the queue
+ * with the last element at the beginning.
+ */
+static task_t *
+artis_valois_pull_all(artis_migration_queue_t *queue)
+{
+	artis_fifo_node_t *node, *dummy;
+	task_t *task;
+	int i;
+
+	dummy = queue->head;
+	ARTIS_BUG(!dummy, queue);
+	ARTIS_BUG(dummy->task, queue, dummy, dummy->task);
+	if (!dummy->next)
+		return NULL;
+	for (i = 1, node = dummy; node->next->next; node = node->next, i++)
+		;
+	task = node->next->task;
+	node->next->task = NULL;
+	dummy->task = task;
+	task->artis_status.fifo_node = dummy;
+	queue->head = node->next;
+	node->next = NULL;
+	atomic_sub(i, &(queue->size));
+	return task;
+}
+
+
+/**
+ * artis_valois_next - next task in the list retrieved from the migration queue
+ * @task: current task
+ * @reset_next: set the next node to NULL?
+ *
+ * Returns the task in the next node, or NULL if there is no next node.
+ */
+static inline task_t *
+artis_valois_next(task_t *task, int reset_next)
+{
+	artis_fifo_node_t *node;
+	task_t *rtask;
+
+	node = task->artis_status.fifo_node;
+	ARTIS_BUG(!node, task);
+	if (!node->next)
+		return NULL;
+	rtask = node->next->task;
+	ARTIS_BUG(!rtask, task, node, node->next);
+	if (reset_next)
+		node->next = NULL;
+	return rtask;
+}
+
+#define VALOIS_NEXT(t) artis_valois_next((t),1)
+#define VALOIS_NEXT_NOBREAK(t) artis_valois_next((t),0)
+
+
+/**
+ * artis_request_for_migration - prepare a task for migration
+ *
+ * This function is called when executing preempt_disable().
+ *
+ * We mark ourselves with the ARTiS flag, then call the scheduler to be
+ * dropped from the run-queue. The rest of the action takes place in
+ * finish_task_switch(), after the context switch.
+ */
+void
+artis_request_for_migration(void) {
+	/* Preemption is already disabled */
+	artis_local_task_status_t artis_local_task_status = ARTIS_LOCAL_TASK_STATUS_INIT;
+	struct thread_info *ti;
+	artis_per_cpu_info_t *artis_cpu;
+
+	artis_migration_disable();
+	get_task_struct(current);
+	ARTIS_BUG(artis_this_local(), artis_this_local());
+	artis_this_local() = &artis_local_task_status;
+#ifdef CONFIG_ARTIS_STAT
+	artis_get_time(artis_this_local()->migration_delta[0]);
+#endif
+	ARTIS_BUG(irqs_disabled(), 0);
+	ti = current_thread_info();
+	ARTIS_BUG(ti->preempt_count != 1, ti, ti->preempt_count);
+	ARTIS_BUG(!current->array, current->array);
+	ARTIS_BUG(artis_test_this_migrating() || artis_this_flag(), artis_this_flag());
+	artis_cpu = this_artis();
+#ifdef CONFIG_ARTIS_DEBUG
+	artis_get_time(artis_this_local()->request.time);
+	artis_put_trace(artis_this_local()->request.bt, current, 0);
+#endif
+
+	artis_this_flag() = ARTIS_STATUS_REQUEST1;
+
+	/* Even if there is a wake_up() on this task, force it to stay 
+	 * on this CPU */
+	artis_this_local()->cpus_allowed_bak = current->cpus_allowed;
+	current->cpus_allowed = cpumask_of_cpu(smp_processor_id());
+#ifdef CONFIG_ARTIS_DEBUG
+	/* debug : save the cpu value */
+	artis_this_local()->cpu = smp_processor_id();
+#endif
+#ifdef CONFIG_ARTIS_STAT
+	artis_get_time(artis_this_local()->migration_delta[1]);
+#endif
+	/*
+	 * Now that artis flag is set and affinity has been changed, we 
+	 * can re-enable preemption
+	 */
+	preempt_enable();
+	/*
+	 * Loop in case the dummy task has been re-activated unwillingly.
+	 */
+	do {
+		schedule();
+		ARTIS_BUG(artis_this_flag() & ~(ARTIS_STATUS_FETCH|ARTIS_STATUS_DUMMY1|ARTIS_STATUS_DUMMY2), artis_this_flag());
+	} while(artis_this_flag() & (ARTIS_STATUS_DUMMY1|ARTIS_STATUS_DUMMY2));
+	ARTIS_BUG(smp_processor_id() == artis_this_local()->cpu, artis_this_local()->cpu);
+	/* restore original mask and ensure that the current cpu is in */
+	cpu_set(smp_processor_id(), artis_this_local()->cpus_allowed_bak);
+	current->cpus_allowed = artis_this_local()->cpus_allowed_bak;
+
+	artis_this_local() = NULL;
+	/* We release the flag to allow migration again, but before that we
+	 * disable preemption, as it was at the call.
+	 */
+	preempt_disable();
+	artis_this_flag() = 0;
+	artis_migration_enable();
+}
+
+EXPORT_SYMBOL(artis_request_for_migration);
+
+/**
+ * artis_complete_migration - push a migrating task onto a migration queue
+ * @task: the task
+ *
+ * This function is called from finish_task_switch(). It moves the given
+ * task to an NRT processor.
+ *
+ * Several threads can enter this function.
+ */
+int 
+artis_complete_migration(task_t *task) {
+	/*
+	 * We come from finish_task_switch(), so we know that:
+	 * - preemption is disabled
+	 * - IRQs are disabled
+	 * - ARTiS migration is disabled
+	 * - the runqueue is locked (except on IA-64)
+	 */
+	artis_per_cpu_info_t *artis_cpu;
+	artis_migration_queue_t *queue;
+	runqueue_t *rq;
+	int nrt_processor;
+	cpumask_t cpuchoice;
+
+#ifdef CONFIG_ARTIS_STAT
+	artis_get_time(artis_local(task)->migration_delta[2]);
+#endif
+	rq = this_rq();
+	artis_complete_arch(rq, task);
+	/* now we have the runqueue lock */
+	/* and the task must not be in the runqueue */
+	ARTIS_BUG(task->array, task);
+	/* debug: flag ARTIS must be set */
+	ARTIS_BUG(!artis_test_migrating(task) || (artis_flag(task) != ARTIS_STATUS_REQUEST2), task);
+	artis_flag(task) = ARTIS_STATUS_COMPLETE1;
+	artis_cpu = this_artis();
+	/* debug: are we on the same cpu */
+	ARTIS_BUG(smp_processor_id() != artis_local(task)->cpu, task);
+
+#if defined(CONFIG_ARTIS_DEBUG)
+	artis_get_time(current->artis_status.complete.time);
+	artis_put_trace(current->artis_status.complete.bt, current, 0);
+	current->artis_status.complete_to = task;
+	artis_get_time(current->artis_status.complete_to_at);
+	artis_local(task)->complete_by = current;
+	artis_get_time(artis_local(task)->complete_by_at);
+#endif
+	/*
+	 * Which NRT processor should we choose? For the time being,
+	 * we pick the first one that respects the task's affinity.
+	 */
+	cpus_and(cpuchoice, artis_local(task)->cpus_allowed_bak, artis_info.cpus_nrt);
+	nrt_processor = any_online_cpu(cpuchoice);
+
+	/*
+	 * Parking the task in the dummy queue would be unacceptable for a
+	 * kthread, so we prefer to force the migration onto any NRT CPU
+	 * and change the affinity of the task.
+	 */
+	if (nrt_processor == NR_CPUS)
+		nrt_processor = first_cpu(artis_info.cpus_nrt);
+
+	queue = artis_cpu->queues[smp_processor_id()][nrt_processor];
+
+#ifdef CONFIG_ARTIS_DEBUG
+	ARTIS_PRINTK(KERN_ARTIS_LEVEL "ARTIS debug: enter artis_complete_migration (p=%d) cpu %d(%d) -> %d with (%p)\n", task->pid, task->thread_info->cpu, smp_processor_id(), nrt_processor, task->array);
+#endif
+
+	/* now we want the re-activation to take place on the NRT CPU */
+	task->cpus_allowed = cpumask_of_cpu(nrt_processor);
+	set_task_cpu(task, nrt_processor);
+	artis_flag(task) = (nrt_processor<NR_CPUS ? ARTIS_STATUS_COMPLETE2 : ARTIS_STATUS_DUMMY1);
+
+	atomic_inc(&(cpu_artis(nrt_processor)->fetch_size));
+	atomic_inc(&(artis_info.size));
+	artis_valois_push(queue, task);
+	
+	artis_finish_complete_arch(rq, task);
+	/* 
+	 * Now that the lock has been released, we force a schedule on
+	 * the NRT CPU. It doesn't matter if the current task changes just
+	 * before we set the flag since that means the cpu has been 
+	 * re-scheduled.
+	 */
+	set_tsk_need_resched(cpu_curr(nrt_processor));
+	smp_send_reschedule(nrt_processor);
+
+#ifdef CONFIG_ARTIS_STAT
+	artis_get_time(artis_local(task)->migration_delta[3]);
+#endif
+#ifdef CONFIG_ARTIS_STAT
+	{ unsigned long t, i, delta;
+		artis_get_time(t);
+		if (!(artis_cpu->nb_migration & 0x3ff)) { 
+			for(i=0; i<3; i++) 
+				artis_cpu->migration_delta[i] = 0; 
+		}
+		for(i=0; i<3; i++) { 
+			delta = artis_local(task)->migration_delta[i+1] -
+			       	artis_local(task)->migration_delta[i]; 
+			if (delta > artis_cpu->migration_delta[i]) 
+				artis_cpu->migration_delta[i] = delta;
+		}
+	}
+#endif
+	artis_cpu->nb_migration++;
+
+	return (nrt_processor<NR_CPUS);
+}
+
+EXPORT_SYMBOL(artis_complete_migration);
+
+/**
+ * artis_fetch_from_migration - fetch all the tasks being moved from an RT CPU
+ * @cpu_orig: the processor the tasks come from
+ */
+void artis_fetch_from_migration(int cpu_orig)
+{
+	/* This function is called from schedule(), so we know that:
+	 * - preemption is disabled
+	 * - IRQs are disabled
+	 * - ARTiS migration is disabled
+	 * - the runqueue is locked
+	 */
+	runqueue_t *rq = this_rq();
+	artis_migration_queue_t *queue = this_artis()->queues[cpu_orig][smp_processor_id()];
+	task_t *p;
+	int size;
+
+	/* there is one queue per RT/NRT processor pair */
+	for (p=artis_valois_pull_all(queue), size=0; p; p=VALOIS_NEXT(p)) {
+		/*
+		 * A wake-up may have reactivated the task, but since we
+		 * are in schedule(), it cannot get the CPU before the
+		 * fetch has run, so the local_status is still available.
+		 * If the task comes from load-balancing, local_status is
+		 * NULL and we avoid doing checks on it.
+		 */
+		if (artis_local(p))
+			ARTIS_BUG((artis_flag(p) != ARTIS_STATUS_COMPLETE2), p);
+		if (!(artis_flag(p) & ARTIS_STATUS_LB))
+			 artis_flag(p) = ARTIS_STATUS_FETCH;
+		/*
+		 * The task could have been re-activated by a wake-up,
+		 * but this code is part of the scheduler, so it runs
+		 * before the task can be scheduled.
+		 */
+		if (!p->array)
+			activate_task(p, rq, 1);
+		ARTIS_BUG(!p->array, p);
+		p->artis_status.nb_migration++;
+		if (artis_flag(p) & ARTIS_STATUS_LB)
+			artis_flag(p) = 0;
+		put_task_struct(p);
+		size++;
+	}
+	atomic_sub(size, &(this_artis()->fetch_size));
+	atomic_sub(size, &(artis_info.size));
+}
+
+EXPORT_SYMBOL(artis_fetch_from_migration);
+
+/*
+ * We do the printk here because doing it in artis_complete_migration()
+ * causes a freeze (maybe because IRQs are disabled there).
+ *
+ * We are in schedule() and we know that:
+ *  - ARTiS migration is disabled
+ *  - preemption is disabled
+ */
+void 
+artis_check_dummyfifo(int rt) {
+	artis_migration_queue_t *queue = this_artis()->queues[rt][NR_CPUS];
+	task_t *p;
+	
+	ARTIS_BUG(!queue->head, queue);
+	if (!queue->head->next)
+		return;
+	for(p=queue->head->next->task; p; p=VALOIS_NEXT_NOBREAK(p)) {
+		if (artis_flag(p) & ARTIS_STATUS_DUMMY1) {
+			printk("ARTIS Error: no NRT CPU found for pid=%d from CPU %d; using dummy queue\n", (int)p->pid, rt);
+			artis_flag(p) = ARTIS_STATUS_DUMMY2;
+		}
+	}
+}
+
+artis_info_t artis_info = ARTIS_INFO_INIT;
+
+/* CPU j is an NRT CPU or the dummy CPU */
+#define isNRT(j,nrt_mask) ((j)==NR_CPUS || cpu_isset((j),(nrt_mask)))
+
+
+static cpumask_t artis_default_maskrt;
+static int artis_boot_active;
+
+/**
+ * setup_artis_maskrt
+ * @s: RT CPUs mask
+ */
+static int setup_artis_maskrt (char *s)
+{
+	cpumask_t mask;
+	int err;
+	
+	if (!s || !(*s))
+		return 0;
+
+	err = cpumask_parse(s, strlen(s), mask);
+	if (err)
+		return err;
+	
+	/* Is there a cleaner way to copy a mask into another? */
+	cpus_and(artis_default_maskrt, mask, mask);
+
+	return 1;
+}
+
+__setup("artis_maskrt=", setup_artis_maskrt);
+
+
+/**
+ * setup_artis_active
+ * @s: unused
+ *
+ * Activates ARTiS at boot time.
+ */
+static int setup_artis_active (char *s)
+{
+	artis_boot_active = 1;
+
+	return 1;
+}
+
+__setup("artis_active", setup_artis_active);
+
+/**
+ * artis_init - initializes the ARTiS subsystem
+ *
+ * This function is called at boot time.
+ */
+void __init artis_init(void)
+{
+	artis_per_cpu_info_t *artis_cpu=NULL;
+	artis_per_cpu_info_t artis_cpu_init=ARTIS_PER_CPU_INFO_INIT;
+	artis_migration_queue_t ***queues_matrix;
+	artis_migration_queue_t queue_init=ARTIS_MIGRATION_QUEUE_INIT;
+	artis_fifo_node_t an_init = ARTIS_FIFO_NODE_INIT;
+	int i, j;
+
+	cpus_clear(artis_info.cpus_rt);
+	cpus_clear(artis_info.cpus_nrt);
+	/* initialization of the shared queue matrix */
+	queues_matrix = (artis_migration_queue_t ***) kmalloc(NR_CPUS * sizeof(artis_migration_queue_t **), GFP_KERNEL);
+	if (!queues_matrix)
+		goto nomem1;
+
+	memset(queues_matrix, 0, NR_CPUS * sizeof(artis_migration_queue_t **));
+	printk("ARTIS: initialization of the queue matrix (%p)\n", queues_matrix);
+
+	/* initialization of per-CPU structures */
+	/* on X86 it seems for_each_cpu doesn't work yet */
+	for(i = 0; i < NR_CPUS; i++) {
+		artis_cpu = cpu_artis(i);
+		*artis_cpu = artis_cpu_init;
+		artis_cpu->queues = queues_matrix;
+		if (cpu_isset(i, artis_default_maskrt)) {
+			/* rt cpu. Init queues matrix */
+			printk("ARTIS CPU%d: configured as Real-Time (%p)\n", i, artis_cpu);
+			artis_cpu->cpu_type = ARTIS_RT_CPU;
+			cpu_set(i, artis_info.cpus_rt);
+		} else {
+			printk("ARTIS CPU%d: configured as Non Real-Time (%p)\n", i, artis_cpu);
+			artis_cpu->cpu_type = ARTIS_NRT_CPU;
+			cpu_set(i, artis_info.cpus_nrt);
+		}
+	}
+
+	/* init the queues between RT and NRT CPUs;
+	 * the access order is RT first, then NRT: [RT][NRT]
+	 */
+	for (i = 0; i < NR_CPUS; i++) {
+		queues_matrix[i] = (artis_migration_queue_t **) kmalloc((NR_CPUS+1) * sizeof(artis_migration_queue_t *), GFP_KERNEL);
+		if (!queues_matrix[i])
+			goto nomem2;
+
+		memset(queues_matrix[i], 0, (NR_CPUS+1) * sizeof(artis_migration_queue_t *));
+		for(j = 0; j <= NR_CPUS; j++) {
+			artis_migration_queue_t *queue;
+			artis_fifo_node_t *an;
+
+			/* alloc the dummy node of the FIFO */
+			an = alloc_artis_fifo_node(tsk);
+			*an = an_init;
+
+			artis_cpu->queues[i][j] = (artis_migration_queue_t *) kmalloc(sizeof(artis_migration_queue_t), GFP_KERNEL);
+			if (!artis_cpu->queues[i][j])
+				goto nomem2;
+
+			queue = artis_cpu->queues[i][j];
+			*queue = queue_init;
+			queue->head = queue->tail = an;
+			if (j < NR_CPUS) {
+				printk("ARTIS CPU%d: initialization of the queue for CPU%d (%p+%ld)\n",
+						i, j, queue, (long)sizeof(artis_migration_queue_t));
+			} else {
+				printk("ARTIS CPU%d: initialization of the dummy queue (%p+%ld)\n",
+						i, queue, (long)sizeof(artis_migration_queue_t));
+			}
+		}
+	}
+
+	printk("ARTIS debug: NR_CPUS=%d PREEMPT_ACTIVE=%lx PREEMPT_MASK=%lx HARDIRQ_MASK=%lx SOFTIRQ_MASK=%lx\n",
+			(int)NR_CPUS,
+			(unsigned long)PREEMPT_ACTIVE,
+			(unsigned long)PREEMPT_MASK,
+			(unsigned long)HARDIRQ_MASK,
+			(unsigned long)SOFTIRQ_MASK
+		);
+	
+	/* Create /proc/artis struct */
+	artis_proc_init();
+
+	/* activate ARTiS if artis_active was specified on command line */
+	artis_info.active = artis_boot_active;
+
+	printk("ARTiS successfully initialized...\n");
+
+	return;
+
+nomem2:
+	/* Not enough memory (at boot time?!), free up everything */
+	for (i = 0; i < NR_CPUS; i++) {
+		if (queues_matrix[i]) {
+			for(j = 0; j <= NR_CPUS; j++)
+				if (artis_cpu->queues[i][j])
+					kfree(artis_cpu->queues[i][j]);
+			kfree(queues_matrix[i]);
+		}
+	}
+
+	kfree(queues_matrix);
+
+nomem1:
+	printk("ARTiS initialization error\n");
+}
+
+
+
+/**
+ * artis_reinit - reinitializes the ARTiS subsystem
+ * @rt_cpu_mask: real-time CPUs mask
+ *
+ * Returns 0 on success, -1 on failure.
+ *
+ * This function is called when the RT mask is changed via
+ * /proc/artis/maskrt.
+ */
+int
+artis_reinit(cpumask_t rt_cpu_mask)
+{
+	artis_per_cpu_info_t *artis_cpu=NULL;
+	artis_migration_queue_t ***queues_matrix;
+	cpumask_t old_rt_mask, old_nrt_mask, tmp;
+	int i;
+
+	/* check that the new config has at least one online NRT CPU */
+	cpus_complement(tmp, rt_cpu_mask);
+	cpus_and(tmp, tmp, cpu_online_map);
+	if (cpus_empty(tmp)) {
+		printk("ARTIS Error: re-initialization without any online NRT CPU (%ld)\n", cpus_addr(rt_cpu_mask)[0]);
+		return -1;
+	}
+
+	old_rt_mask = artis_info.cpus_rt;
+	old_nrt_mask = artis_info.cpus_nrt;
+	queues_matrix = this_artis()->queues;
+	printk("ARTIS: re-initialization of the configuration (%p)\n", queues_matrix);
+
+	cpus_clear(artis_info.cpus_rt);
+	cpus_clear(artis_info.cpus_nrt);
+
+	for_each_cpu(i) {
+		artis_cpu = cpu_artis(i);
+		if (cpu_isset(i, rt_cpu_mask)) {
+			/* rt cpu. Init queues matrix */
+			if (cpu_isset(i, old_rt_mask))
+				printk("ARTIS CPU%d: stays Real-Time (%p)\n", i, artis_cpu);
+			else
+				printk("ARTIS CPU%d: reconfigured as Real-Time (%p)\n", i, artis_cpu);
+			artis_cpu->cpu_type = ARTIS_RT_CPU;
+			cpu_set(i, artis_info.cpus_rt);
+		} else {
+			if (cpu_isset(i, old_nrt_mask))
+				printk("ARTIS CPU%d: stays Non Real-Time (%p)\n", i, artis_cpu);
+			else
+				printk("ARTIS CPU%d: reconfigured as Non Real-Time (%p)\n", i, artis_cpu);
+			artis_cpu->cpu_type = ARTIS_NRT_CPU;
+			cpu_set(i, artis_info.cpus_nrt);
+		}
+	}
+
+	return 0;
+}
+
+
+#endif /* CONFIG_ARTIS */
+
+
 /**
  * sys_sched_setscheduler - set/change the scheduler policy and RT priority
  * @pid: the pid in question.


* [2/3] ARTiS, an asymmetric real-time scheduler - x86
  2005-05-03 14:05 [0/3] ARTiS, an asymmetric real-time scheduler Eric Piel
  2005-05-03 14:07 ` [1/3] " Eric Piel
@ 2005-05-03 14:09 ` Eric Piel
  2005-05-03 14:10 ` [3/3] ARTiS, an asymmetric real-time scheduler - IA-64 Eric Piel
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Piel @ 2005-05-03 14:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Philippe Marquet, Christophe Osuna, Julien Soula,
	Jean-Luc Dekeyser, paul.mckenney

[-- Attachment #1: Type: text/plain, Size: 58 bytes --]

Here is the architecture-dependent part of ARTiS for x86.

[-- Attachment #2: artis-2.6.11-20050502-i386.patch --]
[-- Type: text/x-patch, Size: 7374 bytes --]

diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/arch/i386/Kconfig 2.6.11-artis-cvs/arch/i386/Kconfig
--- 2.6.11-pfm/arch/i386/Kconfig	2005-03-02 08:37:49.000000000 +0100
+++ 2.6.11-artis-cvs/arch/i386/Kconfig	2005-04-22 18:08:19.000000000 +0200
@@ -528,6 +528,32 @@ config PREEMPT_BKL
 	  Say Y here if you are building a kernel for a desktop system.
 	  Say N if you are unsure.
 
+config ARTIS
+	bool "Compile the kernel with ARTiS support (EXPERIMENTAL)"
+	depends on PREEMPT && SMP
+	help
+		ARTiS is an "Asymmetric Real-Time Scheduler". When activated,
+		real-time tasks are assured low latencies. It works by
+		automatically migrating normal tasks away from CPUs running
+		real-time tasks whenever they would disable preemption or
+		IRQs. Obviously, an SMP system is required (SMT works too).
+		Note that it is not activated by default; you need to enable
+		it at boot time. For more information, see Documentation/artis.txt.
+
+config ARTIS_DEBUG
+	bool "Compile the kernel with ARTiS debugging support (EXPERIMENTAL)"
+	depends on ARTIS
+	help
+		Activate debugging code in ARTiS. You probably don't want this,
+		unless you want to hack on ARTiS.
+
+config ARTIS_STAT
+	bool "Compile the kernel with ARTiS accounting support (EXPERIMENTAL)"
+	depends on ARTIS
+	help
+		Activate statistics about ARTiS, available in /proc. There is no (or
+		very little) performance penalty, so you can safely say "yes" here.
+
 config X86_UP_APIC
 	bool "Local APIC support on uniprocessors" if !SMP
 	depends on !(X86_VISWS || X86_VOYAGER)
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/arch/i386/kernel/entry.S 2.6.11-artis-cvs/arch/i386/kernel/entry.S
--- 2.6.11-pfm/arch/i386/kernel/entry.S	2005-03-02 08:37:51.000000000 +0100
+++ 2.6.11-artis-cvs/arch/i386/kernel/entry.S	2005-04-26 01:21:48.000000000 +0200
@@ -361,13 +361,25 @@ common_interrupt:
 	call do_IRQ
 	jmp ret_from_intr
 
+#ifdef CONFIG_ARTIS
 #define BUILD_INTERRUPT(name, nr)	\
 ENTRY(name)				\
 	pushl $nr-256;			\
 	SAVE_ALL			\
+	call function_artis_migration_disable; \
 	movl %esp,%eax;			\
 	call smp_/**/name;		\
+	call function_artis_migration_enable; \
 	jmp ret_from_intr;
+#else
+#define BUILD_INTERRUPT(name, nr)	\
+ENTRY(name)				\
+	pushl $nr-256;			\
+	SAVE_ALL			\
+	movl %esp,%eax;			\
+	call smp_/**/name;	\
+	jmp ret_from_intr;
+#endif
 
 /* The include is where all of the SMP etc. interrupts come from */
 #include "entry_arch.h"
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/arch/i386/kernel/traps.c 2.6.11-artis-cvs/arch/i386/kernel/traps.c
--- 2.6.11-pfm/arch/i386/kernel/traps.c	2005-03-02 08:37:49.000000000 +0100
+++ 2.6.11-artis-cvs/arch/i386/kernel/traps.c	2005-03-25 19:47:46.000000000 +0100
@@ -95,6 +95,31 @@ static int kstack_depth_to_print = 24;
 struct notifier_block *i386die_chain;
 static DEFINE_SPINLOCK(die_notifier_lock);
 
+#ifdef CONFIG_ARTIS_DEBUG
+void artis_put_trace(void **bt, struct task_struct *task, unsigned long *stack)
+{
+	int i, artis_skip_bt = 0;
+	unsigned long addr;
+
+	if (!stack)
+		stack = (unsigned long *)&stack;
+	memset(bt, 0, ARTIS_BT_SIZE * sizeof(void *));
+	for (i = artis_skip_bt; i > 0 && !kstack_end(stack); ) {
+		addr = *stack++;
+		if (kernel_text_address(addr))
+			i--;
+	}
+	for (i = ARTIS_BT_SIZE; i > 0 && !kstack_end(stack); ) {
+		addr = *stack++;
+		if (kernel_text_address(addr)) {
+			i--;
+			bt[i] = (void *)addr;
+		}
+	}
+}
+#endif
+
+
 int register_die_notifier(struct notifier_block *nb)
 {
 	int err = 0;
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/include/asm-i386/bug.h 2.6.11-artis-cvs/include/asm-i386/bug.h
--- 2.6.11-pfm/include/asm-i386/bug.h	2005-03-02 08:37:49.000000000 +0100
+++ 2.6.11-artis-cvs/include/asm-i386/bug.h	2005-03-25 19:47:46.000000000 +0100
@@ -2,6 +2,7 @@
 #define _I386_BUG_H
 
 #include <linux/config.h>
+#include <linux/artis-macros.h>
 
 /*
  * Tell the user there is some problem.
@@ -10,13 +11,19 @@
  */
 
 #ifdef CONFIG_DEBUG_BUGVERBOSE
-#define BUG()				\
+#define _old_BUG()				\
  __asm__ __volatile__(	"ud2\n"		\
 			"\t.word %c0\n"	\
 			"\t.long %c1\n"	\
 			 : : "i" (__LINE__), "i" (__FILE__))
 #else
-#define BUG() __asm__ __volatile__("ud2\n")
+#define _old_BUG() __asm__ __volatile__("ud2\n")
+#endif
+
+#if defined(CONFIG_ARTIS_DEBUG)
+#define BUG() ARTIS_BUG(1,0)
+#else
+#define BUG() _old_BUG()
 #endif
 
 #define HAVE_ARCH_BUG
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/include/asm-i386/system.h 2.6.11-artis-cvs/include/asm-i386/system.h
--- 2.6.11-pfm/include/asm-i386/system.h	2005-03-02 08:37:30.000000000 +0100
+++ 2.6.11-artis-cvs/include/asm-i386/system.h	2005-03-25 19:47:46.000000000 +0100
@@ -441,9 +441,13 @@ struct alt_instr { 
 #define set_wmb(var, value) do { var = value; wmb(); } while (0)
 
 /* interrupt control.. */
-#define local_save_flags(x)	do { typecheck(unsigned long,x); __asm__ __volatile__("pushfl ; popl %0":"=g" (x): /* no input */); } while (0)
-#define local_irq_restore(x) 	do { typecheck(unsigned long,x); __asm__ __volatile__("pushl %0 ; popfl": /* no output */ :"g" (x):"memory", "cc"); } while (0)
-#define local_irq_disable() 	__asm__ __volatile__("cli": : :"memory")
+#define local_save_flags(x)		\
+do { 					\
+	typecheck(unsigned long,x); 	\
+	__asm__ __volatile__("pushfl ; popl %0":"=g" (x): /* no input */); \
+} while (0)
+
+#define _raw_local_irq_disable() __asm__ __volatile__("cli": : :"memory")
 #define local_irq_enable()	__asm__ __volatile__("sti": : :"memory")
 /* used in the idle loop; sti takes one instruction cycle to complete */
 #define safe_halt()		__asm__ __volatile__("sti; hlt": : :"memory")
@@ -456,7 +460,48 @@ struct alt_instr { 
 })
 
 /* For spinlocks etc */
-#define local_irq_save(x)	__asm__ __volatile__("pushfl ; popl %0 ; cli":"=g" (x): /* no input */ :"memory")
+#define _raw_local_irq_save(x) __asm__ __volatile__("pushfl ; popl %0 ; cli":"=g" (x): /* no input */ :"memory")
+
+
+#ifdef CONFIG_ARTIS
+
+/* ARTiS: force migration on IRQ disable. We reuse the "preempt_disable"
+ * workaround to do it because the function "artis_try_to_migrate" assumes
+ * that preemption is already disabled. */
+#define local_irq_restore(x)   		\
+do { 					\
+	typecheck(unsigned long,x); 	\
+	if (!((x) & (1<<9))) { 		\
+		artis_force_migration();\
+	}; 				\
+	__asm__ __volatile__("pushl %0 ; popfl": /* no output */ :"g" (x):"memory", "cc");  \
+} while (0)
+
+#define local_irq_disable()    		\
+do {  					\
+	artis_force_migration();	\
+	_raw_local_irq_disable(); 	\
+} while (0)
+
+#define local_irq_save(x)      		\
+do { 					\
+	artis_force_migration();	\
+	_raw_local_irq_save(x); 	\
+} while(0)
+
+#else
+
+#define local_irq_restore(x)      	\
+do { 					\
+	typecheck(unsigned long,x); 	\
+	__asm__ __volatile__("pushl %0 ; popfl": /* no output */ :"g" (x):"memory", "cc");  				\
+} while (0)
+
+#define local_irq_disable() _raw_local_irq_disable()
+#define local_irq_save(x) _raw_local_irq_save(x)
+
+#endif
+
 
 /*
  * disable hlt during certain critical i/o operations


* [3/3] ARTiS, an asymmetric real-time scheduler - IA-64
  2005-05-03 14:05 [0/3] ARTiS, an asymmetric real-time scheduler Eric Piel
  2005-05-03 14:07 ` [1/3] " Eric Piel
  2005-05-03 14:09 ` [2/3] ARTiS, an asymmetric real-time scheduler - x86 Eric Piel
@ 2005-05-03 14:10 ` Eric Piel
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Piel @ 2005-05-03 14:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: Philippe Marquet, Christophe Osuna, Julien Soula,
	Jean-Luc Dekeyser, paul.mckenney

[-- Attachment #1: Type: text/plain, Size: 72 bytes --]

Here is the architecture-dependent part of ARTiS for IA-64.

[-- Attachment #2: artis-2.6.11-20050502-ia64.patch --]
[-- Type: text/x-patch, Size: 6294 bytes --]

diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/arch/ia64/Kconfig 2.6.11-artis-cvs/arch/ia64/Kconfig
--- 2.6.11-pfm/arch/ia64/Kconfig	2005-03-02 08:38:26.000000000 +0100
+++ 2.6.11-artis-cvs/arch/ia64/Kconfig	2005-04-22 18:08:19.000000000 +0200
@@ -46,6 +46,32 @@ config GENERIC_IOMAP
 	bool
 	default y
 
+config ARTIS
+	bool "Compile the kernel with ARTiS support (EXPERIMENTAL)"
+	depends on PREEMPT && SMP
+	help
+		ARTiS is an "Asymmetric Real-Time Scheduler". When activated,
+		real-time tasks are assured of low latencies. It works by
+		automatically migrating normal tasks off the CPUs running
+		real-time tasks whenever they would disable preemption or
+		IRQs. Obviously, an SMP system is required (SMT works too).
+		Note that it is not activated by default; you need to select
+		it at boot time. For more information, see Documentation/artis.txt.
+
+config ARTIS_DEBUG
+       bool "Compile the kernel with ARTiS debugging support (EXPERIMENTAL)"
+       depends on ARTIS
+       help
+		Activate debugging code in ARTiS. You probably don't want this
+		unless you want to hack on ARTiS.
+
+config ARTIS_STAT
+       bool "Compile the kernel with ARTiS accounting support (EXPERIMENTAL)"
+       depends on ARTIS
+       help
+		Activate statistics about ARTiS, available in /proc. There is no (or
+		very little) performance penalty, so you can safely say "yes" here.
+
 choice
 	prompt "System type"
 	default IA64_GENERIC
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/arch/ia64/kernel/process.c 2.6.11-artis-cvs/arch/ia64/kernel/process.c
--- 2.6.11-pfm/arch/ia64/kernel/process.c	2005-03-02 08:38:08.000000000 +0100
+++ 2.6.11-artis-cvs/arch/ia64/kernel/process.c	2005-03-25 19:47:46.000000000 +0100
@@ -229,6 +229,43 @@ static inline void play_dead(void)
 }
 #endif /* CONFIG_HOTPLUG_CPU */
 
+#ifdef CONFIG_ARTIS_DEBUG
+#include <linux/artis.h>
+void
+artis_ia64_put_trace(struct unw_frame_info *info, void *arg)
+{
+	void **bt = (void **)arg;
+	int i, artis_skip_bt, r_unw;
+	unsigned long ip, sp, bsp;
+
+	memset(bt, 0, ARTIS_BT_SIZE*sizeof(void *));
+	r_unw=0; 
+	artis_skip_bt=0;
+	for(i=artis_skip_bt-1;
+			i>=0 && r_unw>=0; 
+			i--, r_unw=unw_unwind(info)) {
+		unw_get_ip(info, &ip);
+		if (ip == 0)
+			break;
+		unw_get_sp(info, &sp);
+		unw_get_bsp(info, &bsp);
+	}
+	for(i=ARTIS_BT_SIZE-1;
+			i>=0 && r_unw>=0; 
+			i--, r_unw=unw_unwind(info)) {
+		unw_get_ip(info, &ip);
+		if (ip == 0)
+			break;
+		unw_get_sp(info, &sp);
+		unw_get_bsp(info, &bsp);
+		bt[i] = (void *)ip;
+	}
+}
+void
+artis_put_trace(void **bt, struct task_struct *task, unsigned long *stack) {
+		unw_init_running(artis_ia64_put_trace, (void *)bt);
+}
+#endif
 
 void cpu_idle_wait(void)
 {
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/include/asm-ia64/bug.h 2.6.11-artis-cvs/include/asm-ia64/bug.h
--- 2.6.11-pfm/include/asm-ia64/bug.h	2005-03-02 08:38:34.000000000 +0100
+++ 2.6.11-artis-cvs/include/asm-ia64/bug.h	2005-03-25 19:47:46.000000000 +0100
@@ -1,12 +1,20 @@
 #ifndef _ASM_IA64_BUG_H
 #define _ASM_IA64_BUG_H
 
+#include <linux/artis-macros.h>
+
 #if (__GNUC__ > 3) || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1)
 # define ia64_abort()	__builtin_trap()
 #else
 # define ia64_abort()	(*(volatile int *) 0 = 0)
 #endif
-#define BUG() do { printk("kernel BUG at %s:%d!\n", __FILE__, __LINE__); ia64_abort(); } while (0)
+#define _old_BUG() do { printk("kernel BUG at %s:%d!\n", __FILE__, __LINE__); ia64_abort(); } while (0)
+
+#if defined(CONFIG_ARTIS_DEBUG)
+#define BUG() do { ARTIS_BUG(1,0); } while (0)
+#else
+#define BUG() _old_BUG()
+#endif
 
 /* should this BUG should be made generic? */
 #define HAVE_ARCH_BUG
diff -urpN -X /export/src/patches/dontdiff -X /export/src/patches/dontdiff2 2.6.11-pfm/include/asm-ia64/system.h 2.6.11-artis-cvs/include/asm-ia64/system.h
--- 2.6.11-pfm/include/asm-ia64/system.h	2005-03-02 08:38:07.000000000 +0100
+++ 2.6.11-artis-cvs/include/asm-ia64/system.h	2005-03-25 19:47:46.000000000 +0100
@@ -121,7 +121,7 @@ extern struct ia64_boot_param {
  *   write a floating-point register right before reading the PSR
  *   and that writes to PSR.mfl
  */
-#define __local_irq_save(x)			\
+#define _raw__local_irq_save(x)			\
 do {						\
 	ia64_stop();				\
 	(x) = ia64_getreg(_IA64_REG_PSR);	\
@@ -129,13 +129,47 @@ do {						\
 	ia64_rsm(IA64_PSR_I);			\
 } while (0)
 
-#define __local_irq_disable()			\
+#define _raw__local_irq_disable()		\
 do {						\
 	ia64_stop();				\
 	ia64_rsm(IA64_PSR_I);			\
 } while (0)
 
-#define __local_irq_restore(x)	ia64_intrin_local_irq_restore((x) & IA64_PSR_I)
+#define _raw__local_irq_restore(x)	ia64_intrin_local_irq_restore((x) & IA64_PSR_I)
+
+#ifdef CONFIG_ARTIS
+#include <linux/artis-macros.h>
+
+/* Override all functions "dangerous" for real-time: migrate the task
+ * to a non-real-time CPU */
+
+#define __local_irq_save(x) 			\
+do { 						\
+	artis_force_migration(); 		\
+	_raw__local_irq_save(x); 		\
+} while (0)
+
+#define __local_irq_disable() 			\
+do { 						\
+	artis_force_migration(); 		\
+	_raw__local_irq_disable(); 		\
+} while (0)
+
+#define __local_irq_restore(x) 			\
+do { 						\
+	if (!((x) & IA64_PSR_I)) 		\
+		artis_force_migration(); 	\
+	_raw__local_irq_restore(x); 		\
+} while (0)
+
+#else
+
+#define __local_irq_save(x) 	_raw__local_irq_save(x)
+#define __local_irq_disable() 	_raw__local_irq_disable()
+#define __local_irq_restore(x) 	_raw__local_irq_restore(x)
+
+#endif
+
 
 #ifdef CONFIG_IA64_DEBUG_IRQ
 
@@ -282,6 +316,20 @@ do {						\
 #define finish_arch_switch(rq, prev)	spin_unlock_irq(&(prev)->switch_lock)
 #define task_running(rq, p) 		((rq)->curr == (p) || spin_is_locked(&(p)->switch_lock))
 
+#ifdef CONFIG_ARTIS
+/* On IA-64, the end of the scheduler releases the runqueue lock,
+ * so a wake-up may have re-activated the task; if so, we must
+ * deactivate it again.
+ */
+#define artis_complete_arch(rq, task) 		\
+do { 						\
+	spin_lock(&(rq)->lock); 		\
+	if ((task)->array) 			\
+		deactivate_task((task),(rq)); 	\
+} while(0)
+#define artis_finish_complete_arch(rq, task) spin_unlock(&(rq)->lock)
+#endif
+
 #define ia64_platform_is(x) (strcmp(x, platform_name) == 0)
 
 void cpu_idle_wait(void);

