* [patch] softirq performance fixes, cleanups, 2.4.10.
@ 2001-09-26 16:44 Ingo Molnar
2001-09-26 17:48 ` Mike Kravetz
` (3 more replies)
0 siblings, 4 replies; 44+ messages in thread
From: Ingo Molnar @ 2001-09-26 16:44 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, Alan Cox, Alexey Kuznetsov, Ben LaHaise,
Andrea Arcangeli
[-- Attachment #1: Type: TEXT/PLAIN, Size: 4179 bytes --]
The Linux softirq code still has a number of performance and latency
problems as of 2.4.10.
One issue is that there are still places in the kernel that disable/enable
softirq processing but do not restart softirqs. This creates softirq
processing latencies, which can show up e.g. as 'stuttering' packet
processing. Longer latencies between hard-interrupt and soft-interrupt
processing also decrease caching efficiency - if e.g. a socket buffer was
touched in a network driver, it might have been dropped from the cache by
the time the skb is processed by its softirq handler.
Another problem is increased scheduling and softirq handling overhead due
to ksoftirqd, and the related performance degradation in high-speed network
environments. (Performance drops of more than 10% were reported with
certain gigabit cards.) Under various multi-process networking loads,
ksoftirqd is very active.
The attached softirq-2.4.10-A5 patch solves these two main problems and
also cleans up softirq.c.
Main changes in softirq handling:
- softirq handling can now be restarted N times within do_softirq(), if a
softirq gets reactivated while it's being handled.
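The bounded-restart idea can be sketched as a toy model (plain C, not kernel code; all names here are illustrative, and the real do_softirq() of course runs actual handlers with irqs and bh accounting):

```c
#include <assert.h>

/* Toy model of the bounded restart loop: a handler may re-raise its
 * softirq while running; we loop at most MAX_RESTART times, then give
 * up and report that ksoftirqd should take over the remaining work. */
#define MAX_RESTART 10

static unsigned pending;            /* bitmask of raised softirqs */
static int reraise_budget;          /* how often the handler re-raises itself */

static void handler(int nr)
{
    if (reraise_budget-- > 0)
        pending |= 1u << nr;        /* softirq reactivated while being handled */
}

/* Returns 1 if work was left over (i.e. ksoftirqd would be woken). */
int do_softirq_model(void)
{
    int max_restart = MAX_RESTART;

    do {
        unsigned p = pending;
        pending = 0;
        for (int nr = 0; nr < 32; nr++)
            if (p & (1u << nr))
                handler(nr);
    } while (pending && --max_restart);

    return pending != 0;
}
```

A softirq that re-raises itself a few times is fully handled inline; one that floods indefinitely hits the restart limit and is mitigated to the thread, which is exactly the latency-vs-fairness balance the patch comment describes.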
- implemented a new scheduler mechanism, 'unwakeup()', to undo ksoftirqd
wakeups if softirqs happen to be fully handled before ksoftirqd runs.
(unwakeup() does not touch the runqueue lock if the task in question is
already running.)
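A minimal sketch of what unwakeup() does, as a standalone toy model (not kernel code - no locking, and the field names only loosely mirror the 2.4 task_struct):

```c
#include <assert.h>

/* Toy model of unwakeup(): a wakeup puts a task on the runqueue; the
 * unwakeup takes it off again only if the scheduler has not started
 * running it yet, so a spurious ksoftirqd wakeup costs no context
 * switch.  In the kernel this runs under runqueue_lock with irqs off. */
enum { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct task {
    int state;
    int has_cpu;        /* currently executing on a CPU */
    int on_runqueue;
};

static void wakeup_process(struct task *p)
{
    p->state = TASK_RUNNING;
    p->on_runqueue = 1;
}

static void unwakeup_process(struct task *p, int state)
{
    /* undo the wakeup only if the task has not been picked up yet */
    if (!p->has_cpu && p->on_runqueue) {
        p->on_runqueue = 0;
        p->state = state;
    }
}
```

The point of the has_cpu check is visible even in the toy: once the task is running somewhere, the "undo" silently does nothing and the wakeup simply stands.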
- cpu_raise_softirq() used to wake up ksoftirqd instead of handling
softirqs immediately. All softirq users now use __cpu_raise_softirq()
and must call rerun_softirqs() after the softirq-atomic section has
finished.
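The raise/rerun split can be illustrated with a toy model (plain C, not the kernel macros; the counters stand in for local_irq_count()/local_bh_count()):

```c
#include <assert.h>

/* Toy model of the __cpu_raise_softirq() + rerun_softirqs() split:
 * raising only sets a pending bit inside the irq-atomic section; the
 * caller triggers processing afterwards, and only when not nested
 * inside hard-irq or bh context (there the irq-exit path restarts
 * softirqs anyway). */
static unsigned softirq_pending;
static int irq_count, bh_count;
static int handled;                 /* number of softirqs processed */

static void cpu_raise_softirq_model(int nr)
{
    softirq_pending |= 1u << nr;    /* caller must have irqs disabled */
}

static void do_softirq_model(void)
{
    while (softirq_pending) {
        softirq_pending &= softirq_pending - 1;  /* clear lowest set bit */
        handled++;
    }
}

static void rerun_softirqs_model(void)
{
    if (!(irq_count | bh_count))
        do_softirq_model();
}
```

Raising from process context gets the softirq handled immediately via the rerun; raising from (simulated) irq context leaves the bit pending for the irq-exit path, which is the cache-friendly behavior the patch is after.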
None of these changes alters tasklet or bottom-half semantics.
The HTTP load situation I tested shows the following changes in scheduling
frequency:
context switches per second
(measured over a period of 10 seconds,
repeated 10 times and averaged.)
2.4.10-vanilla: 39299
2.4.10-softirq-A6: 35552
A 10.5% reduction in context switching. HTTP performance increased by 2%,
but the system still had idle time left. Kernels with the softirq-A6 patch
applied show almost no ksoftirqd activity, while vanilla 2.4.10 shows
frequent ksoftirqd activation.
Other fixes/cleanups to softirq.c:
- removed 'mask' handling from do_softirq() - it's unnecessary due to the
restarts. This further simplifies the code.
- tasklet_hi_schedule() and tasklet_schedule() now rerun softirqs
instead of just kicking ksoftirqd.
- removed raise_softirq() and cpu_raise_softirq() and unexported them;
they are no longer used by any other code.
- simplified argument passing between spawn_ksoftirqd() and ksoftirqd();
passing an argument by pointer and waiting for the ksoftirqd tasks to
start up is unnecessary.
- it's unnecessary to spin calling schedule() in ksoftirqd() startup,
waiting for the process to migrate - calling schedule() once is enough,
the scheduler will not run the task on the wrong CPU.
- '[ksoftirqd_CPU0]' is confusing on UP systems, changed it to
'[ksoftirqd]' instead.
- simplified ksoftirqd()'s loop, it's both shorter and faster by a few
instructions now.
- __netif_schedule() now uses __cpu_raise_softirq() instead of
cpu_raise_softirq() [which did not restart softirq handling, it only
woke up ksoftirqd].
- dev_kfree_skb_irq() ditto. (Although this function is mostly called
from IRQ contexts, where softirq restarts are not possible, the IRQ
code will restart them nevertheless on IRQ exit.)
- the generic definition of __cpu_raise_softirq() used to override any
low-level definitions done in asm/softirq.h. It's now conditional, so
the architecture definitions are actually used.
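The conditional-override pattern in question is plain preprocessor logic; a self-contained sketch (names are illustrative, not the actual kernel headers):

```c
#include <assert.h>

/* Sketch of the #ifndef override pattern: the generic macro is defined
 * only if the "architecture header" has not already provided its own
 * optimized version, so including asm/softirq.h first wins. */
static unsigned long pending_bits;

/* what an architecture header might provide: */
#define __cpu_raise_softirq_arch(nr) (pending_bits |= 1UL << (nr))
#define __cpu_raise_softirq __cpu_raise_softirq_arch

/* generic fallback - compiled in only when the arch defined nothing: */
#ifndef __cpu_raise_softirq
#define __cpu_raise_softirq(nr) \
    do { pending_bits |= 1UL << (nr); } while (0)
#endif
```

With the guard in place the arch version is used; without it, the generic definition would silently shadow the arch one, which is the bug being fixed. (As the follow-up discussion notes, the kernel later went the other way and kept only the generic definition.)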
I've tested the patch on both UP and SMP systems and saw no problems at
all. The changes decrease the size of the softirq object code by ~8%.
Network packet handling appears to be smoother (this is subjective and
hard to measure). Ben, does this patch fix gigabit performance in your
test, or is something else still going on as well?
(The patch also applies cleanly to the -ac tree.)
Ingo
[-- Attachment #2: Type: TEXT/PLAIN, Size: 10815 bytes --]
--- linux/kernel/ksyms.c.orig Wed Sep 26 17:04:40 2001
+++ linux/kernel/ksyms.c Wed Sep 26 17:04:48 2001
@@ -538,8 +538,6 @@
EXPORT_SYMBOL(tasklet_kill);
EXPORT_SYMBOL(__run_task_queue);
EXPORT_SYMBOL(do_softirq);
-EXPORT_SYMBOL(raise_softirq);
-EXPORT_SYMBOL(cpu_raise_softirq);
EXPORT_SYMBOL(__tasklet_schedule);
EXPORT_SYMBOL(__tasklet_hi_schedule);
--- linux/kernel/sched.c.orig Wed Sep 26 17:04:40 2001
+++ linux/kernel/sched.c Wed Sep 26 17:04:48 2001
@@ -366,6 +366,28 @@
}
/**
+ * unwakeup - undo wakeup if possible.
+ * @p: task
+ * @state: new task state
+ *
+ * Undo a previous wakeup of the specified task - if the process
+ * is not running already. The main interface to be used is
+ * unwakeup_process(), it will do a lockless test whether the task
+ * is on the runqueue.
+ */
+void __unwakeup_process(struct task_struct * p, long state)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&runqueue_lock, flags);
+ if (!p->has_cpu && (p != current) && task_on_runqueue(p)) {
+ del_from_runqueue(p);
+ p->state = state;
+ }
+ spin_unlock_irqrestore(&runqueue_lock, flags);
+}
+
+/**
* schedule_timeout - sleep until timeout
* @timeout: timeout value in jiffies
*
--- linux/kernel/softirq.c.orig Wed Sep 26 17:04:40 2001
+++ linux/kernel/softirq.c Wed Sep 26 17:45:00 2001
@@ -58,12 +58,35 @@
wake_up_process(tsk);
}
+/*
+ * If a softirqs were fully handled after ksoftirqd was woken
+ * up then try to undo the wakeup.
+ */
+static inline void unwakeup_softirqd(unsigned cpu)
+{
+ struct task_struct * tsk = ksoftirqd_task(cpu);
+
+ if (tsk)
+ unwakeup_process(tsk, TASK_INTERRUPTIBLE);
+}
+
+/*
+ * We restart softirq processing MAX_SOFTIRQ_RESTART times,
+ * and we fall back to softirqd after that.
+ *
+ * This number has been established via experimentation.
+ * The two things to balance is latency against fairness -
+ * we want to handle softirqs as soon as possible, but they
+ * should not be able to lock up the box.
+ */
+#define MAX_SOFTIRQ_RESTART 10
+
asmlinkage void do_softirq()
{
+ int max_restart = MAX_SOFTIRQ_RESTART;
int cpu = smp_processor_id();
- __u32 pending;
+ __u32 pending, mask;
long flags;
- __u32 mask;
if (in_interrupt())
return;
@@ -95,55 +118,37 @@
local_irq_disable();
pending = softirq_pending(cpu);
- if (pending & mask) {
- mask &= ~pending;
+ if (pending && --max_restart)
goto restart;
- }
__local_bh_enable();
if (pending)
+ /*
+ * In the normal case ksoftirqd is rarely activated,
+ * increased scheduling hurts performance.
+ * It's a safety measure: if external load starts
+ * to flood the system with softirqs then we
+ * will mitigate softirq work to the softirq thread.
+ */
wakeup_softirqd(cpu);
+ else
+ /*
+ * All softirqs are handled - undo any possible
+ * wakeup of softirqd. This reduces context switch
+ * overhead.
+ */
+ unwakeup_softirqd(cpu);
}
local_irq_restore(flags);
}
-/*
- * This function must run with irq disabled!
- */
-inline void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
-{
- __cpu_raise_softirq(cpu, nr);
-
- /*
- * If we're in an interrupt or bh, we're done
- * (this also catches bh-disabled code). We will
- * actually run the softirq once we return from
- * the irq or bh.
- *
- * Otherwise we wake up ksoftirqd to make sure we
- * schedule the softirq soon.
- */
- if (!(local_irq_count(cpu) | local_bh_count(cpu)))
- wakeup_softirqd(cpu);
-}
-
-void raise_softirq(unsigned int nr)
-{
- long flags;
-
- local_irq_save(flags);
- cpu_raise_softirq(smp_processor_id(), nr);
- local_irq_restore(flags);
-}
-
void open_softirq(int nr, void (*action)(struct softirq_action*), void *data)
{
softirq_vec[nr].data = data;
softirq_vec[nr].action = action;
}
-
/* Tasklets */
struct tasklet_head tasklet_vec[NR_CPUS] __cacheline_aligned;
@@ -157,8 +162,9 @@
local_irq_save(flags);
t->next = tasklet_vec[cpu].list;
tasklet_vec[cpu].list = t;
- cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
+ __cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
local_irq_restore(flags);
+ rerun_softirqs(cpu);
}
void __tasklet_hi_schedule(struct tasklet_struct *t)
@@ -169,8 +175,9 @@
local_irq_save(flags);
t->next = tasklet_hi_vec[cpu].list;
tasklet_hi_vec[cpu].list = t;
- cpu_raise_softirq(cpu, HI_SOFTIRQ);
+ __cpu_raise_softirq(cpu, HI_SOFTIRQ);
local_irq_restore(flags);
+ rerun_softirqs(cpu);
}
static void tasklet_action(struct softirq_action *a)
@@ -241,7 +248,6 @@
}
}
-
void tasklet_init(struct tasklet_struct *t,
void (*func)(unsigned long), unsigned long data)
{
@@ -268,8 +274,6 @@
clear_bit(TASKLET_STATE_SCHED, &t->state);
}
-
-
/* Old style BHs */
static void (*bh_base[32])(void);
@@ -325,7 +329,7 @@
{
int i;
- for (i=0; i<32; i++)
+ for (i = 0; i < 32; i++)
tasklet_init(bh_task_vec+i, bh_action, i);
open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
@@ -361,56 +365,52 @@
static int ksoftirqd(void * __bind_cpu)
{
- int bind_cpu = *(int *) __bind_cpu;
- int cpu = cpu_logical_map(bind_cpu);
+ int cpu = cpu_logical_map((int)__bind_cpu);
daemonize();
- current->nice = 19;
+
sigfillset(¤t->blocked);
/* Migrate to the right CPU */
- current->cpus_allowed = 1UL << cpu;
- while (smp_processor_id() != cpu)
- schedule();
+ current->cpus_allowed = 1 << cpu;
- sprintf(current->comm, "ksoftirqd_CPU%d", bind_cpu);
+#if CONFIG_SMP
+ sprintf(current->comm, "ksoftirqd CPU%d", cpu);
+#else
+ sprintf(current->comm, "ksoftirqd");
+#endif
+ current->nice = 19;
+ schedule();
__set_current_state(TASK_INTERRUPTIBLE);
- mb();
-
ksoftirqd_task(cpu) = current;
for (;;) {
- if (!softirq_pending(cpu))
- schedule();
-
- __set_current_state(TASK_RUNNING);
-
- while (softirq_pending(cpu)) {
+back:
+ do {
do_softirq();
if (current->need_resched)
- schedule();
- }
-
+ goto preempt;
+ } while (softirq_pending(cpu));
+ schedule();
__set_current_state(TASK_INTERRUPTIBLE);
}
+
+preempt:
+ __set_current_state(TASK_RUNNING);
+ schedule();
+ __set_current_state(TASK_INTERRUPTIBLE);
+ goto back;
}
static __init int spawn_ksoftirqd(void)
{
int cpu;
- for (cpu = 0; cpu < smp_num_cpus; cpu++) {
- if (kernel_thread(ksoftirqd, (void *) &cpu,
+ for (cpu = 0; cpu < smp_num_cpus; cpu++)
+ if (kernel_thread(ksoftirqd, (void *) cpu,
CLONE_FS | CLONE_FILES | CLONE_SIGNAL) < 0)
- printk("spawn_ksoftirqd() failed for cpu %d\n", cpu);
- else {
- while (!ksoftirqd_task(cpu_logical_map(cpu))) {
- current->policy |= SCHED_YIELD;
- schedule();
- }
- }
- }
+ BUG();
return 0;
}
--- linux/include/linux/netdevice.h.orig Wed Sep 26 17:04:36 2001
+++ linux/include/linux/netdevice.h Wed Sep 26 17:08:20 2001
@@ -486,8 +486,9 @@
local_irq_save(flags);
dev->next_sched = softnet_data[cpu].output_queue;
softnet_data[cpu].output_queue = dev;
- cpu_raise_softirq(cpu, NET_TX_SOFTIRQ);
+ __cpu_raise_softirq(cpu, NET_TX_SOFTIRQ);
local_irq_restore(flags);
+ rerun_softirqs(cpu);
}
}
@@ -535,8 +536,9 @@
local_irq_save(flags);
skb->next = softnet_data[cpu].completion_queue;
softnet_data[cpu].completion_queue = skb;
- cpu_raise_softirq(cpu, NET_TX_SOFTIRQ);
+ __cpu_raise_softirq(cpu, NET_TX_SOFTIRQ);
local_irq_restore(flags);
+ rerun_softirqs(cpu);
}
}
--- linux/include/linux/interrupt.h.orig Wed Sep 26 17:04:40 2001
+++ linux/include/linux/interrupt.h Wed Sep 26 17:45:23 2001
@@ -74,9 +74,16 @@
asmlinkage void do_softirq(void);
extern void open_softirq(int nr, void (*action)(struct softirq_action*), void *data);
extern void softirq_init(void);
-#define __cpu_raise_softirq(cpu, nr) do { softirq_pending(cpu) |= 1UL << (nr); } while (0)
-extern void FASTCALL(cpu_raise_softirq(unsigned int cpu, unsigned int nr));
-extern void FASTCALL(raise_softirq(unsigned int nr));
+#ifndef __cpu_raise_softirq
+#define __cpu_raise_softirq(cpu, nr) \
+ do { softirq_pending(cpu) |= 1UL << (nr); } while (0)
+#endif
+
+#define rerun_softirqs(cpu) \
+do { \
+ if (!(local_irq_count(cpu) | local_bh_count(cpu))) \
+ do_softirq(); \
+} while (0);
--- linux/include/linux/sched.h.orig Wed Sep 26 17:04:40 2001
+++ linux/include/linux/sched.h Wed Sep 26 17:08:16 2001
@@ -556,6 +556,7 @@
extern void FASTCALL(__wake_up(wait_queue_head_t *q, unsigned int mode, int nr));
extern void FASTCALL(__wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr));
+extern void FASTCALL(__unwakeup_process(struct task_struct * p, long state));
extern void FASTCALL(sleep_on(wait_queue_head_t *q));
extern long FASTCALL(sleep_on_timeout(wait_queue_head_t *q,
signed long timeout));
@@ -574,6 +575,13 @@
#define wake_up_interruptible_all(x) __wake_up((x),TASK_INTERRUPTIBLE, 0)
#define wake_up_interruptible_sync(x) __wake_up_sync((x),TASK_INTERRUPTIBLE, 1)
#define wake_up_interruptible_sync_nr(x) __wake_up_sync((x),TASK_INTERRUPTIBLE, nr)
+
+#define unwakeup_process(tsk,state) \
+do { \
+ if (task_on_runqueue(tsk)) \
+ __unwakeup_process(tsk,state); \
+} while (0)
+
asmlinkage long sys_wait4(pid_t pid,unsigned int * stat_addr, int options, struct rusage * ru);
extern int in_group_p(gid_t);
--- linux/include/asm-i386/softirq.h.orig Wed Sep 26 17:04:40 2001
+++ linux/include/asm-i386/softirq.h Wed Sep 26 17:08:16 2001
@@ -45,4 +45,9 @@
/* no registers clobbered */ ); \
} while (0)
+
+/* It's using __set_bit() because it only needs to be IRQ-atomic. */
+
+#define __cpu_raise_softirq(cpu, nr) __set_bit(nr, &softirq_pending(cpu))
+
#endif /* __ASM_SOFTIRQ_H */
--- linux/net/core/dev.c.orig Wed Sep 26 17:04:41 2001
+++ linux/net/core/dev.c Wed Sep 26 17:04:48 2001
@@ -1218,8 +1218,9 @@
dev_hold(skb->dev);
__skb_queue_tail(&queue->input_pkt_queue,skb);
/* Runs from irqs or BH's, no need to wake BH */
- cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
+ __cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
local_irq_restore(flags);
+ rerun_softirqs(this_cpu);
#ifndef OFFLINE_SAMPLE
get_sample_stats(this_cpu);
#endif
@@ -1529,8 +1530,9 @@
local_irq_disable();
netdev_rx_stat[this_cpu].time_squeeze++;
/* This already runs in BH context, no need to wake up BH's */
- cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
+ __cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
local_irq_enable();
+ rerun_softirqs(this_cpu);
NET_PROFILE_LEAVE(softnet_process);
return;
^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
  2001-09-26 16:44 [patch] softirq performance fixes, cleanups, 2.4.10 Ingo Molnar
@ 2001-09-26 17:48 ` Mike Kravetz
  2001-09-26 18:48   ` Ingo Molnar
  2001-09-26 18:55 ` Russell King
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 44+ messages in thread
From: Mike Kravetz @ 2001-09-26 17:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Alan Cox, Alexey Kuznetsov,
	Ben LaHaise, Andrea Arcangeli

On Wed, Sep 26, 2001 at 06:44:03PM +0200, Ingo Molnar wrote:
> +void __unwakeup_process(struct task_struct * p, long state)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&runqueue_lock, flags);
> +	if (!p->has_cpu && (p != current) && task_on_runqueue(p)) {
> +		del_from_runqueue(p);
> +		p->state = state;
> +	}
> +	spin_unlock_irqrestore(&runqueue_lock, flags);
> +}

Is it really possible for a task to be 'current' without having
'has_cpu' set?  If so, then don't you need to compare p to 'current'
on all CPUs since 'current' is CPU specific?

-- 
Mike Kravetz                                 kravetz@us.ibm.com
IBM Linux Technology Center
(we're not at Sequent anymore)

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
  2001-09-26 17:48 ` Mike Kravetz
@ 2001-09-26 18:48   ` Ingo Molnar
  0 siblings, 0 replies; 44+ messages in thread
From: Ingo Molnar @ 2001-09-26 18:48 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Linus Torvalds, linux-kernel, Alan Cox, Alexey Kuznetsov,
	Ben LaHaise, Andrea Arcangeli

On Wed, 26 Sep 2001, Mike Kravetz wrote:

> > +	if (!p->has_cpu && (p != current) && task_on_runqueue(p)) {

> Is it really possible for a task to be 'current' without having
> 'has_cpu' set?  If so, then don't you need to compare p to 'current'
> on all CPUs since 'current' is CPU specific?

you are correct that the check is redundant on SMP, but ->has_cpu is not
set on UP kernels, so the 'current' check is needed there. I did not want
to introduce yet another #if CONFIG_SMP ifdef.

	Ingo

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
  2001-09-26 16:44 [patch] softirq performance fixes, cleanups, 2.4.10 Ingo Molnar
  2001-09-26 17:48 ` Mike Kravetz
@ 2001-09-26 18:55 ` Russell King
  2001-09-26 19:14   ` Ingo Molnar
  2001-09-27 23:31 ` Andrea Arcangeli
  2001-09-28 16:18 ` [patch] softirq performance fixes, cleanups, 2.4.10 kuznet
  3 siblings, 1 reply; 44+ messages in thread
From: Russell King @ 2001-09-26 18:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Alan Cox, Alexey Kuznetsov,
	Ben LaHaise, Andrea Arcangeli

On Wed, Sep 26, 2001 at 06:44:03PM +0200, Ingo Molnar wrote:
> - the generic definition of __cpu_raise_softirq() used to override
>   any lowlevel definitions done in asm/softirq.h. It's now conditional so
>   the architecture definitions should actually be used.

Ingo,

The generic definition is the one to use - we used to allow
__cpu_raise_softirq from outside IRQ context, but all RISC architectures
ended up with code as follows:

	restore irqs
	save + disable irqs
	set bit
	restore irqs

In the latest kernels, you will notice that __cpu_raise_softirq is always
within an IRQ protected region (thanks to Andrea for that work), so there
is no reason for it to protected itself from interrupts.  Hence the
comment above cpu_raise_softirq:

 * This function must run with irq disabled!

--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
  2001-09-26 18:55 ` Russell King
@ 2001-09-26 19:14   ` Ingo Molnar
  0 siblings, 0 replies; 44+ messages in thread
From: Ingo Molnar @ 2001-09-26 19:14 UTC (permalink / raw)
  To: Russell King
  Cc: Linus Torvalds, linux-kernel, Alan Cox, Alexey Kuznetsov,
	Ben LaHaise, Andrea Arcangeli

[-- Attachment #1: Type: TEXT/PLAIN, Size: 271 bytes --]

On Wed, 26 Sep 2001, Russell King wrote:

> The generic definition is the one to use [...]

right, i forgot, thanks for pointing it out. New patch attached that
removes all the old definitions from other architectures and makes the
generic version unconditional.

	Ingo

[-- Attachment #2: Type: TEXT/PLAIN, Size: 11338 bytes --]

--- linux/kernel/ksyms.c.orig	Wed Sep 26 17:04:40 2001
+++ linux/kernel/ksyms.c	Wed Sep 26 17:04:48 2001
@@ -538,8 +538,6 @@
 EXPORT_SYMBOL(tasklet_kill);
 EXPORT_SYMBOL(__run_task_queue);
 EXPORT_SYMBOL(do_softirq);
-EXPORT_SYMBOL(raise_softirq);
-EXPORT_SYMBOL(cpu_raise_softirq);
 EXPORT_SYMBOL(__tasklet_schedule);
 EXPORT_SYMBOL(__tasklet_hi_schedule);
--- linux/kernel/sched.c.orig	Wed Sep 26 17:04:40 2001
+++ linux/kernel/sched.c	Wed Sep 26 17:04:48 2001
@@ -366,6 +366,28 @@
 }
 
 /**
+ * unwakeup - undo wakeup if possible.
+ * @p: task
+ * @state: new task state
+ *
+ * Undo a previous wakeup of the specified task - if the process
+ * is not running already. The main interface to be used is
+ * unwakeup_process(), it will do a lockless test whether the task
+ * is on the runqueue.
+ */
+void __unwakeup_process(struct task_struct * p, long state)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&runqueue_lock, flags);
+	if (!p->has_cpu && (p != current) && task_on_runqueue(p)) {
+		del_from_runqueue(p);
+		p->state = state;
+	}
+	spin_unlock_irqrestore(&runqueue_lock, flags);
+}
+
+/**
  * schedule_timeout - sleep until timeout
  * @timeout: timeout value in jiffies
  *
--- linux/kernel/softirq.c.orig	Wed Sep 26 17:04:40 2001
+++ linux/kernel/softirq.c	Wed Sep 26 17:45:00 2001
@@ -58,12 +58,35 @@
 	wake_up_process(tsk);
 }
 
+/*
+ * If a softirqs were fully handled after ksoftirqd was woken
+ * up then try to undo the wakeup.
+ */
+static inline void unwakeup_softirqd(unsigned cpu)
+{
+	struct task_struct * tsk = ksoftirqd_task(cpu);
+
+	if (tsk)
+		unwakeup_process(tsk, TASK_INTERRUPTIBLE);
+}
+
+/*
+ * We restart softirq processing MAX_SOFTIRQ_RESTART times,
+ * and we fall back to softirqd after that.
+ *
+ * This number has been established via experimentation.
+ * The two things to balance is latency against fairness -
+ * we want to handle softirqs as soon as possible, but they
+ * should not be able to lock up the box.
+ */
+#define MAX_SOFTIRQ_RESTART 10
+
 asmlinkage void do_softirq()
 {
+	int max_restart = MAX_SOFTIRQ_RESTART;
 	int cpu = smp_processor_id();
-	__u32 pending;
+	__u32 pending, mask;
 	long flags;
-	__u32 mask;
 
 	if (in_interrupt())
 		return;
@@ -95,55 +118,37 @@
 		local_irq_disable();
 
 		pending = softirq_pending(cpu);
-		if (pending & mask) {
-			mask &= ~pending;
+		if (pending && --max_restart)
 			goto restart;
-		}
 		__local_bh_enable();
 
 		if (pending)
+			/*
+			 * In the normal case ksoftirqd is rarely activated,
+			 * increased scheduling hurts performance.
+			 * It's a safety measure: if external load starts
+			 * to flood the system with softirqs then we
+			 * will mitigate softirq work to the softirq thread.
+			 */
 			wakeup_softirqd(cpu);
+		else
+			/*
+			 * All softirqs are handled - undo any possible
+			 * wakeup of softirqd. This reduces context switch
+			 * overhead.
+			 */
+			unwakeup_softirqd(cpu);
 	}
 
 	local_irq_restore(flags);
 }
 
-/*
- * This function must run with irq disabled!
- */
-inline void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
-{
-	__cpu_raise_softirq(cpu, nr);
-
-	/*
-	 * If we're in an interrupt or bh, we're done
-	 * (this also catches bh-disabled code). We will
-	 * actually run the softirq once we return from
-	 * the irq or bh.
-	 *
-	 * Otherwise we wake up ksoftirqd to make sure we
-	 * schedule the softirq soon.
-	 */
-	if (!(local_irq_count(cpu) | local_bh_count(cpu)))
-		wakeup_softirqd(cpu);
-}
-
-void raise_softirq(unsigned int nr)
-{
-	long flags;
-
-	local_irq_save(flags);
-	cpu_raise_softirq(smp_processor_id(), nr);
-	local_irq_restore(flags);
-}
-
 void open_softirq(int nr, void (*action)(struct softirq_action*), void *data)
 {
 	softirq_vec[nr].data = data;
 	softirq_vec[nr].action = action;
 }
 
-
 /* Tasklets */
 
 struct tasklet_head tasklet_vec[NR_CPUS] __cacheline_aligned;
@@ -157,8 +162,9 @@
 		local_irq_save(flags);
 		t->next = tasklet_vec[cpu].list;
 		tasklet_vec[cpu].list = t;
-		cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
+		__cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
 		local_irq_restore(flags);
+		rerun_softirqs(cpu);
 	}
 }
 
@@ -169,8 +175,9 @@
 		local_irq_save(flags);
 		t->next = tasklet_hi_vec[cpu].list;
 		tasklet_hi_vec[cpu].list = t;
-		cpu_raise_softirq(cpu, HI_SOFTIRQ);
+		__cpu_raise_softirq(cpu, HI_SOFTIRQ);
 		local_irq_restore(flags);
+		rerun_softirqs(cpu);
 	}
 }
 
@@ -241,7 +248,6 @@
 	}
 }
 
-
 void tasklet_init(struct tasklet_struct *t,
 		  void (*func)(unsigned long), unsigned long data)
 {
@@ -268,8 +274,6 @@
 	clear_bit(TASKLET_STATE_SCHED, &t->state);
 }
 
-
-
 /* Old style BHs */
 
 static void (*bh_base[32])(void);
@@ -325,7 +329,7 @@
 {
 	int i;
 
-	for (i=0; i<32; i++)
+	for (i = 0; i < 32; i++)
 		tasklet_init(bh_task_vec+i, bh_action, i);
 
 	open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
@@ -361,56 +365,52 @@
 
 static int ksoftirqd(void * __bind_cpu)
 {
-	int bind_cpu = *(int *) __bind_cpu;
-	int cpu = cpu_logical_map(bind_cpu);
+	int cpu = cpu_logical_map((int)__bind_cpu);
 
 	daemonize();
-	current->nice = 19;
+
 	sigfillset(&current->blocked);
 
 	/* Migrate to the right CPU */
-	current->cpus_allowed = 1UL << cpu;
-	while (smp_processor_id() != cpu)
-		schedule();
+	current->cpus_allowed = 1 << cpu;
 
-	sprintf(current->comm, "ksoftirqd_CPU%d", bind_cpu);
+#if CONFIG_SMP
+	sprintf(current->comm, "ksoftirqd CPU%d", cpu);
+#else
+	sprintf(current->comm, "ksoftirqd");
+#endif
+	current->nice = 19;
+	schedule();
 
 	__set_current_state(TASK_INTERRUPTIBLE);
-	mb();
-
 	ksoftirqd_task(cpu) = current;
 
 	for (;;) {
-		if (!softirq_pending(cpu))
-			schedule();
-
-		__set_current_state(TASK_RUNNING);
-
-		while (softirq_pending(cpu)) {
+back:
+		do {
 			do_softirq();
 			if (current->need_resched)
-				schedule();
-		}
-
+				goto preempt;
+		} while (softirq_pending(cpu));
+		schedule();
 		__set_current_state(TASK_INTERRUPTIBLE);
 	}
+
+preempt:
+	__set_current_state(TASK_RUNNING);
+	schedule();
+	__set_current_state(TASK_INTERRUPTIBLE);
+	goto back;
 }
 
 static __init int spawn_ksoftirqd(void)
{
 	int cpu;
 
-	for (cpu = 0; cpu < smp_num_cpus; cpu++) {
-		if (kernel_thread(ksoftirqd, (void *) &cpu,
+	for (cpu = 0; cpu < smp_num_cpus; cpu++)
+		if (kernel_thread(ksoftirqd, (void *) cpu,
 				  CLONE_FS | CLONE_FILES | CLONE_SIGNAL) < 0)
-			printk("spawn_ksoftirqd() failed for cpu %d\n", cpu);
-		else {
-			while (!ksoftirqd_task(cpu_logical_map(cpu))) {
-				current->policy |= SCHED_YIELD;
-				schedule();
-			}
-		}
-	}
+			BUG();
 
 	return 0;
 }
--- linux/include/linux/netdevice.h.orig	Wed Sep 26 17:04:36 2001
+++ linux/include/linux/netdevice.h	Wed Sep 26 17:08:20 2001
@@ -486,8 +486,9 @@
 		local_irq_save(flags);
 		dev->next_sched = softnet_data[cpu].output_queue;
 		softnet_data[cpu].output_queue = dev;
-		cpu_raise_softirq(cpu, NET_TX_SOFTIRQ);
+		__cpu_raise_softirq(cpu, NET_TX_SOFTIRQ);
 		local_irq_restore(flags);
+		rerun_softirqs(cpu);
 	}
 }
 
@@ -535,8 +536,9 @@
 		local_irq_save(flags);
 		skb->next = softnet_data[cpu].completion_queue;
 		softnet_data[cpu].completion_queue = skb;
-		cpu_raise_softirq(cpu, NET_TX_SOFTIRQ);
+		__cpu_raise_softirq(cpu, NET_TX_SOFTIRQ);
 		local_irq_restore(flags);
+		rerun_softirqs(cpu);
 	}
 }
--- linux/include/linux/interrupt.h.orig	Wed Sep 26 17:04:40 2001
+++ linux/include/linux/interrupt.h	Wed Sep 26 20:57:12 2001
@@ -74,9 +74,14 @@
 asmlinkage void do_softirq(void);
 extern void open_softirq(int nr, void (*action)(struct softirq_action*), void *data);
 extern void softirq_init(void);
-#define __cpu_raise_softirq(cpu, nr) do { softirq_pending(cpu) |= 1UL << (nr); } while (0)
-extern void FASTCALL(cpu_raise_softirq(unsigned int cpu, unsigned int nr));
-extern void FASTCALL(raise_softirq(unsigned int nr));
+#define __cpu_raise_softirq(cpu, nr) \
+		do { softirq_pending(cpu) |= 1UL << (nr); } while (0)
+
+#define rerun_softirqs(cpu)					\
+do {								\
+	if (!(local_irq_count(cpu) | local_bh_count(cpu)))	\
+		do_softirq();					\
+} while (0);
--- linux/include/linux/sched.h.orig	Wed Sep 26 17:04:40 2001
+++ linux/include/linux/sched.h	Wed Sep 26 17:08:16 2001
@@ -556,6 +556,7 @@
 
 extern void FASTCALL(__wake_up(wait_queue_head_t *q, unsigned int mode, int nr));
 extern void FASTCALL(__wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr));
+extern void FASTCALL(__unwakeup_process(struct task_struct * p, long state));
 extern void FASTCALL(sleep_on(wait_queue_head_t *q));
 extern long FASTCALL(sleep_on_timeout(wait_queue_head_t *q,
 				      signed long timeout));
@@ -574,6 +575,13 @@
 #define wake_up_interruptible_all(x)	__wake_up((x),TASK_INTERRUPTIBLE, 0)
 #define wake_up_interruptible_sync(x)	__wake_up_sync((x),TASK_INTERRUPTIBLE, 1)
 #define wake_up_interruptible_sync_nr(x) __wake_up_sync((x),TASK_INTERRUPTIBLE, nr)
+
+#define unwakeup_process(tsk,state)				\
+do {								\
+	if (task_on_runqueue(tsk))				\
+		__unwakeup_process(tsk,state);			\
+} while (0)
+
 asmlinkage long sys_wait4(pid_t pid,unsigned int * stat_addr, int options, struct rusage * ru);
 
 extern int in_group_p(gid_t);
--- linux/include/asm-mips/softirq.h.orig	Wed Sep 26 20:58:00 2001
+++ linux/include/asm-mips/softirq.h	Wed Sep 26 20:58:07 2001
@@ -40,6 +40,4 @@
 
 #define in_softirq() (local_bh_count(smp_processor_id()) != 0)
 
-#define __cpu_raise_softirq(cpu, nr)	set_bit(nr, &softirq_pending(cpu))
-
 #endif /* _ASM_SOFTIRQ_H */
--- linux/include/asm-mips64/softirq.h.orig	Wed Sep 26 20:58:20 2001
+++ linux/include/asm-mips64/softirq.h	Wed Sep 26 20:58:26 2001
@@ -39,19 +39,4 @@
 
 #define in_softirq() (local_bh_count(smp_processor_id()) != 0)
 
-extern inline void __cpu_raise_softirq(int cpu, int nr)
-{
-	unsigned int *m = (unsigned int *) &softirq_pending(cpu);
-	unsigned int temp;
-
-	__asm__ __volatile__(
-		"1:\tll\t%0, %1\t\t\t# __cpu_raise_softirq\n\t"
-		"or\t%0, %2\n\t"
-		"sc\t%0, %1\n\t"
-		"beqz\t%0, 1b"
-		: "=&r" (temp), "=m" (*m)
-		: "ir" (1UL << nr), "m" (*m)
-		: "memory");
-}
-
 #endif /* _ASM_SOFTIRQ_H */
--- linux/net/core/dev.c.orig	Wed Sep 26 17:04:41 2001
+++ linux/net/core/dev.c	Wed Sep 26 17:04:48 2001
@@ -1218,8 +1218,9 @@
 			dev_hold(skb->dev);
 			__skb_queue_tail(&queue->input_pkt_queue,skb);
 			/* Runs from irqs or BH's, no need to wake BH */
-			cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
+			__cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
 			local_irq_restore(flags);
+			rerun_softirqs(this_cpu);
 #ifndef OFFLINE_SAMPLE
 			get_sample_stats(this_cpu);
 #endif
@@ -1529,8 +1530,9 @@
 	local_irq_disable();
 	netdev_rx_stat[this_cpu].time_squeeze++;
 	/* This already runs in BH context, no need to wake up BH's */
-	cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
+	__cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
 	local_irq_enable();
+	rerun_softirqs(this_cpu);
 
 	NET_PROFILE_LEAVE(softnet_process);
 	return;

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
  2001-09-26 16:44 [patch] softirq performance fixes, cleanups, 2.4.10 Ingo Molnar
  2001-09-26 17:48 ` Mike Kravetz
  2001-09-26 18:55 ` Russell King
@ 2001-09-27 23:31 ` Andrea Arcangeli
  2001-09-28  3:20   ` Andrea Arcangeli
  2001-09-28  7:18   ` [patch] softirq-2.4.10-B2 Ingo Molnar
  2001-09-28 16:18 ` [patch] softirq performance fixes, cleanups, 2.4.10 kuznet
  3 siblings, 2 replies; 44+ messages in thread
From: Andrea Arcangeli @ 2001-09-27 23:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Alan Cox, Alexey Kuznetsov, Ben LaHaise

On Wed, Sep 26, 2001 at 06:44:03PM +0200, Ingo Molnar wrote:

some comment after reading your softirq-2.4.10-A7.

> - softirq handling can now be restarted N times within do_softirq(), if a
>   softirq gets reactivated while it's being handled.

is this really necessary after introducing the unwakeup logic? What do
you get if you allow at max 1 softirq pass as before?

> - '[ksoftirqd_CPU0]' is confusing on UP systems, changed it to
>   '[ksoftirqd]' instead.

"confusing" for you maybe, not for me, but I don't care about this one
anyways :).

> - simplified ksoftirqd()'s loop, it's both shorter and faster by a few
>   instructions now.

only detail: ksoftirqd can show up as sleeping from /proc while it's
runnable but I don't think it's a problem and saving the state clobbering
is probably more sensible.

no other obvious issue, except I preferred to wait each ksoftirqd to
startup succesfully to be strictier and I'd also put an assert after the
schedule() to verify ksoftirqd is running in the right cpu.

Andrea

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
  2001-09-27 23:31 ` Andrea Arcangeli
@ 2001-09-28  3:20   ` Andrea Arcangeli
  2001-09-28  7:34     ` Ingo Molnar
  1 sibling, 1 reply; 44+ messages in thread
From: Andrea Arcangeli @ 2001-09-28  3:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Alan Cox, Alexey Kuznetsov, Ben LaHaise

On Fri, Sep 28, 2001 at 01:31:06AM +0200, Andrea Arcangeli wrote:
> On Wed, Sep 26, 2001 at 06:44:03PM +0200, Ingo Molnar wrote:
> 
> some comment after reading your softirq-2.4.10-A7.
> 
> > - softirq handling can now be restarted N times within do_softirq(), if a
> >   softirq gets reactivated while it's being handled.
> 
> is this really necessary after introducing the unwakeup logic? What do
> you get if you allow at max 1 softirq pass as before?

I'm just curious, what are the numbers of your A7 patch compared with
this one?

diff -urN 2.4.10/include/asm-mips/softirq.h softirq/include/asm-mips/softirq.h
--- 2.4.10/include/asm-mips/softirq.h	Sun Sep 23 21:11:41 2001
+++ softirq/include/asm-mips/softirq.h	Fri Sep 28 05:18:45 2001
@@ -40,6 +40,4 @@
 
 #define in_softirq() (local_bh_count(smp_processor_id()) != 0)
 
-#define __cpu_raise_softirq(cpu, nr)	set_bit(nr, &softirq_pending(cpu))
-
 #endif /* _ASM_SOFTIRQ_H */
diff -urN 2.4.10/include/asm-mips64/softirq.h softirq/include/asm-mips64/softirq.h
--- 2.4.10/include/asm-mips64/softirq.h	Sun Sep 23 21:11:41 2001
+++ softirq/include/asm-mips64/softirq.h	Fri Sep 28 05:18:45 2001
@@ -39,19 +39,4 @@
 
 #define in_softirq() (local_bh_count(smp_processor_id()) != 0)
 
-extern inline void __cpu_raise_softirq(int cpu, int nr)
-{
-	unsigned int *m = (unsigned int *) &softirq_pending(cpu);
-	unsigned int temp;
-
-	__asm__ __volatile__(
-		"1:\tll\t%0, %1\t\t\t# __cpu_raise_softirq\n\t"
-		"or\t%0, %2\n\t"
-		"sc\t%0, %1\n\t"
-		"beqz\t%0, 1b"
-		: "=&r" (temp), "=m" (*m)
-		: "ir" (1UL << nr), "m" (*m)
-		: "memory");
-}
-
 #endif /* _ASM_SOFTIRQ_H */
diff -urN 2.4.10/include/linux/interrupt.h softirq/include/linux/interrupt.h
--- 2.4.10/include/linux/interrupt.h	Fri Sep 28 04:08:52 2001
+++ softirq/include/linux/interrupt.h	Fri Sep 28 05:18:45 2001
@@ -74,9 +74,14 @@
 asmlinkage void do_softirq(void);
 extern void open_softirq(int nr, void (*action)(struct softirq_action*), void *data);
 extern void softirq_init(void);
-#define __cpu_raise_softirq(cpu, nr) do { softirq_pending(cpu) |= 1UL << (nr); } while (0)
-extern void FASTCALL(cpu_raise_softirq(unsigned int cpu, unsigned int nr));
-extern void FASTCALL(raise_softirq(unsigned int nr));
+#define __cpu_raise_softirq(cpu, nr) \
+		do { softirq_pending(cpu) |= 1UL << (nr); } while (0)
+
+#define rerun_softirqs(cpu)					\
+do {								\
+	if (!(local_irq_count(cpu) | local_bh_count(cpu)))	\
+		do_softirq();					\
+} while (0);
diff -urN 2.4.10/include/linux/netdevice.h softirq/include/linux/netdevice.h
--- 2.4.10/include/linux/netdevice.h	Fri Sep 28 04:09:37 2001
+++ softirq/include/linux/netdevice.h	Fri Sep 28 05:18:45 2001
@@ -486,8 +486,9 @@
 		local_irq_save(flags);
 		dev->next_sched = softnet_data[cpu].output_queue;
 		softnet_data[cpu].output_queue = dev;
-		cpu_raise_softirq(cpu, NET_TX_SOFTIRQ);
+		__cpu_raise_softirq(cpu, NET_TX_SOFTIRQ);
 		local_irq_restore(flags);
+		rerun_softirqs(cpu);
 	}
 }
 
@@ -535,8 +536,9 @@
 		local_irq_save(flags);
 		skb->next = softnet_data[cpu].completion_queue;
 		softnet_data[cpu].completion_queue = skb;
-		cpu_raise_softirq(cpu, NET_TX_SOFTIRQ);
+		__cpu_raise_softirq(cpu, NET_TX_SOFTIRQ);
 		local_irq_restore(flags);
+		rerun_softirqs(cpu);
 	}
 }
diff -urN 2.4.10/include/linux/sched.h softirq/include/linux/sched.h
--- 2.4.10/include/linux/sched.h	Sun Sep 23 21:11:42 2001
+++ softirq/include/linux/sched.h	Fri Sep 28 05:18:45 2001
@@ -556,6 +556,7 @@
 
 extern void FASTCALL(__wake_up(wait_queue_head_t *q, unsigned int mode, int nr));
 extern void FASTCALL(__wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr));
+extern void FASTCALL(__unwakeup_process(struct task_struct * p, long state));
 extern void FASTCALL(sleep_on(wait_queue_head_t *q));
 extern long FASTCALL(sleep_on_timeout(wait_queue_head_t *q,
 				      signed long timeout));
@@ -574,6 +575,13 @@
 #define wake_up_interruptible_all(x)	__wake_up((x),TASK_INTERRUPTIBLE, 0)
 #define wake_up_interruptible_sync(x)	__wake_up_sync((x),TASK_INTERRUPTIBLE, 1)
 #define wake_up_interruptible_sync_nr(x) __wake_up_sync((x),TASK_INTERRUPTIBLE, nr)
+
+#define unwakeup_process(tsk,state)				\
+do {								\
+	if (task_on_runqueue(tsk))				\
+		__unwakeup_process(tsk,state);			\
+} while (0)
+
 asmlinkage long sys_wait4(pid_t pid,unsigned int * stat_addr, int options, struct rusage * ru);
 
 extern int in_group_p(gid_t);
diff -urN 2.4.10/kernel/ksyms.c softirq/kernel/ksyms.c
--- 2.4.10/kernel/ksyms.c	Sun Sep 23 21:11:43 2001
+++ softirq/kernel/ksyms.c	Fri Sep 28 05:18:45 2001
@@ -538,8 +538,6 @@
 EXPORT_SYMBOL(tasklet_kill);
 EXPORT_SYMBOL(__run_task_queue);
 EXPORT_SYMBOL(do_softirq);
-EXPORT_SYMBOL(raise_softirq);
-EXPORT_SYMBOL(cpu_raise_softirq);
 EXPORT_SYMBOL(__tasklet_schedule);
 EXPORT_SYMBOL(__tasklet_hi_schedule);
diff -urN 2.4.10/kernel/sched.c softirq/kernel/sched.c
--- 2.4.10/kernel/sched.c	Sun Sep 23 21:11:43 2001
+++ softirq/kernel/sched.c	Fri Sep 28 05:18:46 2001
@@ -366,6 +366,28 @@
 }
 
 /**
+ * unwakeup - undo wakeup if possible.
+ * @p: task
+ * @state: new task state
+ *
+ * Undo a previous wakeup of the specified task - if the process
+ * is not running already. The main interface to be used is
+ * unwakeup_process(), it will do a lockless test whether the task
+ * is on the runqueue.
+ */ +void __unwakeup_process(struct task_struct * p, long state) +{ + unsigned long flags; + + spin_lock_irqsave(&runqueue_lock, flags); + if (!p->has_cpu && (p != current) && task_on_runqueue(p)) { + del_from_runqueue(p); + p->state = state; + } + spin_unlock_irqrestore(&runqueue_lock, flags); +} + +/** * schedule_timeout - sleep until timeout * @timeout: timeout value in jiffies * diff -urN 2.4.10/kernel/softirq.c softirq/kernel/softirq.c --- 2.4.10/kernel/softirq.c Sun Sep 23 21:11:43 2001 +++ softirq/kernel/softirq.c Fri Sep 28 05:18:47 2001 @@ -58,12 +58,23 @@ wake_up_process(tsk); } +/* + * If a softirqs were fully handled after ksoftirqd was woken + * up then try to undo the wakeup. + */ +static inline void unwakeup_softirqd(unsigned cpu) +{ + struct task_struct * tsk = ksoftirqd_task(cpu); + + if (tsk) + unwakeup_process(tsk, TASK_INTERRUPTIBLE); +} + asmlinkage void do_softirq() { int cpu = smp_processor_id(); - __u32 pending; + __u32 pending, mask; long flags; - __u32 mask; if (in_interrupt()) return; @@ -102,48 +113,32 @@ __local_bh_enable(); if (pending) + /* + * In the normal case ksoftirqd is rarely activated, + * increased scheduling hurts performance. + * It's a safety measure: if external load starts + * to flood the system with softirqs then we + * will mitigate softirq work to the softirq thread. + */ wakeup_softirqd(cpu); + else + /* + * All softirqs are handled - undo any possible + * wakeup of softirqd. This reduces context switch + * overhead. + */ + unwakeup_softirqd(cpu); } local_irq_restore(flags); } -/* - * This function must run with irq disabled! - */ -inline void cpu_raise_softirq(unsigned int cpu, unsigned int nr) -{ - __cpu_raise_softirq(cpu, nr); - - /* - * If we're in an interrupt or bh, we're done - * (this also catches bh-disabled code). We will - * actually run the softirq once we return from - * the irq or bh. - * - * Otherwise we wake up ksoftirqd to make sure we - * schedule the softirq soon. 
- */ - if (!(local_irq_count(cpu) | local_bh_count(cpu))) - wakeup_softirqd(cpu); -} - -void raise_softirq(unsigned int nr) -{ - long flags; - - local_irq_save(flags); - cpu_raise_softirq(smp_processor_id(), nr); - local_irq_restore(flags); -} - void open_softirq(int nr, void (*action)(struct softirq_action*), void *data) { softirq_vec[nr].data = data; softirq_vec[nr].action = action; } - /* Tasklets */ struct tasklet_head tasklet_vec[NR_CPUS] __cacheline_aligned; @@ -157,8 +152,9 @@ local_irq_save(flags); t->next = tasklet_vec[cpu].list; tasklet_vec[cpu].list = t; - cpu_raise_softirq(cpu, TASKLET_SOFTIRQ); + __cpu_raise_softirq(cpu, TASKLET_SOFTIRQ); local_irq_restore(flags); + rerun_softirqs(cpu); } void __tasklet_hi_schedule(struct tasklet_struct *t) @@ -169,8 +165,9 @@ local_irq_save(flags); t->next = tasklet_hi_vec[cpu].list; tasklet_hi_vec[cpu].list = t; - cpu_raise_softirq(cpu, HI_SOFTIRQ); + __cpu_raise_softirq(cpu, HI_SOFTIRQ); local_irq_restore(flags); + rerun_softirqs(cpu); } static void tasklet_action(struct softirq_action *a) @@ -241,7 +238,6 @@ } } - void tasklet_init(struct tasklet_struct *t, void (*func)(unsigned long), unsigned long data) { @@ -268,8 +264,6 @@ clear_bit(TASKLET_STATE_SCHED, &t->state); } - - /* Old style BHs */ static void (*bh_base[32])(void); @@ -325,7 +319,7 @@ { int i; - for (i=0; i<32; i++) + for (i = 0; i < 32; i++) tasklet_init(bh_task_vec+i, bh_action, i); open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL); @@ -361,39 +355,45 @@ static int ksoftirqd(void * __bind_cpu) { - int bind_cpu = *(int *) __bind_cpu; - int cpu = cpu_logical_map(bind_cpu); + int cpu = cpu_logical_map((int)__bind_cpu); daemonize(); - current->nice = 19; + sigfillset(¤t->blocked); /* Migrate to the right CPU */ current->cpus_allowed = 1UL << cpu; - while (smp_processor_id() != cpu) - schedule(); - sprintf(current->comm, "ksoftirqd_CPU%d", bind_cpu); +#if CONFIG_SMP + sprintf(current->comm, "ksoftirqd CPU%d", cpu); +#else + sprintf(current->comm, 
"ksoftirqd"); +#endif + current->nice = 19; + schedule(); + if (smp_processor_id() != cpu) + BUG(); __set_current_state(TASK_INTERRUPTIBLE); mb(); - ksoftirqd_task(cpu) = current; for (;;) { - if (!softirq_pending(cpu)) - schedule(); - - __set_current_state(TASK_RUNNING); - - while (softirq_pending(cpu)) { + schedule(); + do { do_softirq(); if (current->need_resched) - schedule(); - } - + goto preempt; + back: + ; + } while (softirq_pending(cpu)); __set_current_state(TASK_INTERRUPTIBLE); } + +preempt: + __set_current_state(TASK_RUNNING); + schedule(); + goto back; } static __init int spawn_ksoftirqd(void) @@ -401,7 +401,7 @@ int cpu; for (cpu = 0; cpu < smp_num_cpus; cpu++) { - if (kernel_thread(ksoftirqd, (void *) &cpu, + if (kernel_thread(ksoftirqd, (void *) cpu, CLONE_FS | CLONE_FILES | CLONE_SIGNAL) < 0) printk("spawn_ksoftirqd() failed for cpu %d\n", cpu); else { diff -urN 2.4.10/net/core/dev.c softirq/net/core/dev.c --- 2.4.10/net/core/dev.c Sun Sep 23 21:11:43 2001 +++ softirq/net/core/dev.c Fri Sep 28 05:18:46 2001 @@ -1218,8 +1218,9 @@ dev_hold(skb->dev); __skb_queue_tail(&queue->input_pkt_queue,skb); /* Runs from irqs or BH's, no need to wake BH */ - cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ); + __cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ); local_irq_restore(flags); + rerun_softirqs(this_cpu); #ifndef OFFLINE_SAMPLE get_sample_stats(this_cpu); #endif @@ -1529,8 +1530,9 @@ local_irq_disable(); netdev_rx_stat[this_cpu].time_squeeze++; /* This already runs in BH context, no need to wake up BH's */ - cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ); + __cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ); local_irq_enable(); + rerun_softirqs(this_cpu); NET_PROFILE_LEAVE(softnet_process); return; Andrea ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
  2001-09-28  3:20 ` Andrea Arcangeli
@ 2001-09-28  7:34 ` Ingo Molnar
  2001-09-28 15:17   ` Andrea Arcangeli
  0 siblings, 1 reply; 44+ messages in thread

From: Ingo Molnar @ 2001-09-28 7:34 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Linus Torvalds, linux-kernel, Alan Cox, Alexey Kuznetsov, Ben LaHaise

On Fri, 28 Sep 2001, Andrea Arcangeli wrote:

> I'm just curious, what are the numbers of your A7 patch compared with
> this one?

well, a quick glance shows that this is mostly the -A7 patch with the
single-loop omitted, right? I tested it before (the unwakeup mechanism is
orthogonal to the looping concept), and while unwakeup alone helps
somewhat, its effect cannot be compared to the effects of looping. Please
check my previous mail for details, i tried a large set of other
combinations as well.

frankly, i don't understand what your problem is with the looping concept.
It's actually simpler than the mask-bitmap version, and it ensures
fine-grained handling and low latencies of softirq handlers. We loop in
the softirq handlers themselves already. Ideally we'd like to loop until
all work is done - but we exit on some limits, and it's generally a good
idea not to let the handlers themselves loop for too long a time (to get
good scheduling latencies). And we can increase/decrease
MAX_SOFTIRQ_RESTART if it's ever shown to be excessive/insufficient. (We
could even runtime-tune it - but i thought that would be over-designing
for a nonexistent problem.)

(i had an interim version of the patch which had a sysctl-tunable
MAX_SOFTIRQ_RESTART - this is where the value of '10' comes from. A value
of '1' means a single loop. Just to make sure, i also tested the 'mask'
version, which is a bit more than just a single loop in the current patch
- but for tx/rx purposes it's almost equivalent to a single loop.)

Ingo
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
  2001-09-28  7:34 ` Ingo Molnar
@ 2001-09-28 15:17 ` Andrea Arcangeli
  0 siblings, 0 replies; 44+ messages in thread

From: Andrea Arcangeli @ 2001-09-28 15:17 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, linux-kernel, Alan Cox, Alexey Kuznetsov, Ben LaHaise

On Fri, Sep 28, 2001 at 09:34:52AM +0200, Ingo Molnar wrote:

> frankly, i don't understand what your problem is with the looping concept.

There is no problem, it's just that I am _curious_ about the "difference"
in "numbers", not words. I'm not claiming there is no difference, of
course; obviously there is a difference, I expect that.

Andrea
* [patch] softirq-2.4.10-B2
  2001-09-27 23:31 ` Andrea Arcangeli
  2001-09-28  3:20   ` Andrea Arcangeli
@ 2001-09-28  7:18 ` Ingo Molnar
  2001-09-28 15:58   ` Andrea Arcangeli
  2001-09-28 18:36   ` Simon Kirby
  1 sibling, 2 replies; 44+ messages in thread

From: Ingo Molnar @ 2001-09-28 7:18 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Linus Torvalds, linux-kernel, Alan Cox, Alexey Kuznetsov, Ben LaHaise

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5426 bytes --]

On Fri, 28 Sep 2001, Andrea Arcangeli wrote:

> some comment after reading your softirq-2.4.10-A7.
>
> > - softirq handling can now be restarted N times within do_softirq(), if a
> >   softirq gets reactivated while it's being handled.
>
> is this really necessary after introducing the unwakeup logic? What do
> you get if you allow at max 1 softirq pass as before?

yes, of course it's necessary. The reason is really simple: softirqs have
a natural ordering, and if we are handling softirq #2 while softirq #1
gets reactivated, nothing will process softirq #1 if we do only a single
loop. I explained this in full detail during my previous softirq patch a
few months ago. The unwakeup logic only reverts the wakeup of ksoftirqd,
it will not magically process pending softirqs! I explained all these
effects in detail when i wrote the first softirq-looping patch, and when
you originally came up with ksoftirqd.

(Even with just a single softirq activated, if we are just on the way out
of handling softirqs (the first and last time) at the syscall level, and a
hardirq comes in that activates this softirq, then there is nothing that
will process the softirq: 1) the hardirq's own handling of softirqs is
inhibited due to the softirq-atomic section within do_softirq(); 2) this
loop will not process the new work since it's on the way out.)

basically the problem is that there is a big 'gap' between the activation
of softirqs, and the time when ksoftirqd starts running.
There are a number of mechanisms within the networking stack that are
quite timing-sensitive. And generally, if there is work A and work B that
are related, and we've executed work A (the hardware interrupt), then it's
almost always the best idea to execute work B as soon as possible. Any
'delaying' of work B should only be done for non-performance reasons: eg.
fairness in this case. Now it's MAX_SOFTIRQ_RESTART that balances
performance against fairness. (in most kernel subsystems we almost always
give preference to performance over fairness - without ignoring fairness
of course.)

there is also another bad side-effect of ksoftirqd: if it's relatively
inactive for some time then it will 'collect' current->counter scheduler
ticks, basically boosting its priority way above that of the intended
->nice = 19. It will then often 'suck' softirq handling to itself, due to
its more aggressive scheduling position. To combat this effect, i've
modified ksoftirqd to do:

	if (current->counter > 1)
		current->counter = 1;

(this is a tiny bit racy wrt. the timer interrupt, but it's harmless.)

the current form of softirqs was designed by Alexey and David for the
purposes of high-performance networking, as part of the 'softnet' effort.
Networking remains the biggest user of softirqs: while there are a few
cases of high-frequency tasklet uses, generally it's the network stack's
TX_SOFTIRQ and RX_SOFTIRQ workload that we care about most - and tasklets.
(see the tasklet fixes in the patch.) With TX-completion-IRQ capable
cards, a constant and separate TX and RX softirq workload can be added.

especially under high loads, the work done in the 'later' net-softirq,
NET_RX_SOFTIRQ, can mount up, and thus the amount of pending work within
NET_TX_SOFTIRQ can mount up.
Furthermore, there is a mechanism within both the tx and rx softirqs that
can break out of softirq handling before all work has been handled: if a
jiffy (10 msecs) has passed, or if we have processed more than
netdev_max_backlog (default: 300) packets.

there are a number of other options i experimented with:

- handling softirqs in schedule(), before runqueue_lock is taken, in a
  softirq- and irq-atomic way, unless ->need_resched is set. This was done
  in earlier kernels, and might be a good idea to do again => especially
  with unwakeup(). The downside is extra cost within schedule().

- tuning the amount of work within the tx/rx handlers, both increasing and
  decreasing the number of packets. Decreasing the amount of work has the
  effect of decreasing the latency of processing RX-triggered TX events
  (such as ACKs), and generally handling TX/RX events more smoothly, but
  it also has the effect of increasing the cache footprint.

- exchanging the order of the tx and rx softirqs.

- using jiffies within do_softirq() to make sure it does not execute for
  more than 10-20 msecs.

- feeding back a 'work left' integer through the ->action functions to
  do_softirq(), which can then decide which softirq to restart (basically
  a mini softirq scheduler).

this latter one looked pretty powerful, because it provides more
information to the generic layer - but it's something i think might be too
intrusive for 2.4. For now, the simplest and most effective method of all
was the looping.

- i've done one more refinement to the current patch: do_softirq() now
  checks current->need_resched and it will break out of softirq processing
  if it's 1. Note that do_softirq() is a rare function which *must not* do
  '!current->need_resched': poll_idle() uses need_resched == -1 as a
  special value. (but normally irq-level code does not check
  ->need_resched, so this is a special case.) This lets irqs that hit the
  idle-poll task do normal softirq processing - and not break out after
  one loop.
i've attached the softirq-2.4.10-B2 that has your TASK_RUNNING suggestion, Oleg's fixes and this change included. Ingo [-- Attachment #2: Type: TEXT/PLAIN, Size: 11556 bytes --] --- linux/kernel/ksyms.c.orig Wed Sep 26 17:04:40 2001 +++ linux/kernel/ksyms.c Wed Sep 26 17:04:48 2001 @@ -538,8 +538,6 @@ EXPORT_SYMBOL(tasklet_kill); EXPORT_SYMBOL(__run_task_queue); EXPORT_SYMBOL(do_softirq); -EXPORT_SYMBOL(raise_softirq); -EXPORT_SYMBOL(cpu_raise_softirq); EXPORT_SYMBOL(__tasklet_schedule); EXPORT_SYMBOL(__tasklet_hi_schedule); --- linux/kernel/sched.c.orig Wed Sep 26 17:04:40 2001 +++ linux/kernel/sched.c Wed Sep 26 17:04:48 2001 @@ -366,6 +366,28 @@ } /** + * unwakeup - undo wakeup if possible. + * @p: task + * @state: new task state + * + * Undo a previous wakeup of the specified task - if the process + * is not running already. The main interface to be used is + * unwakeup_process(), it will do a lockless test whether the task + * is on the runqueue. + */ +void __unwakeup_process(struct task_struct * p, long state) +{ + unsigned long flags; + + spin_lock_irqsave(&runqueue_lock, flags); + if (!p->has_cpu && (p != current) && task_on_runqueue(p)) { + del_from_runqueue(p); + p->state = state; + } + spin_unlock_irqrestore(&runqueue_lock, flags); +} + +/** * schedule_timeout - sleep until timeout * @timeout: timeout value in jiffies * --- linux/kernel/softirq.c.orig Wed Sep 26 17:04:40 2001 +++ linux/kernel/softirq.c Fri Sep 28 08:56:08 2001 @@ -58,12 +58,35 @@ wake_up_process(tsk); } +/* + * If a softirqs were fully handled after ksoftirqd was woken + * up then try to undo the wakeup. + */ +static inline void unwakeup_softirqd(unsigned cpu) +{ + struct task_struct * tsk = ksoftirqd_task(cpu); + + if (tsk) + unwakeup_process(tsk, TASK_INTERRUPTIBLE); +} + +/* + * We restart softirq processing MAX_SOFTIRQ_RESTART times, + * and we fall back to softirqd after that. + * + * This number has been established via experimentation. 
+ * The two things to balance is latency against fairness - + * we want to handle softirqs as soon as possible, but they + * should not be able to lock up the box. + */ +#define MAX_SOFTIRQ_RESTART 10 + asmlinkage void do_softirq() { + int max_restart = MAX_SOFTIRQ_RESTART; int cpu = smp_processor_id(); __u32 pending; long flags; - __u32 mask; if (in_interrupt()) return; @@ -75,7 +98,6 @@ if (pending) { struct softirq_action *h; - mask = ~pending; local_bh_disable(); restart: /* Reset the pending bitmask before enabling irqs */ @@ -95,55 +117,37 @@ local_irq_disable(); pending = softirq_pending(cpu); - if (pending & mask) { - mask &= ~pending; + if (pending && --max_restart && (current->need_resched != 1)) goto restart; - } __local_bh_enable(); if (pending) + /* + * In the normal case ksoftirqd is rarely activated, + * increased scheduling hurts performance. + * It's a safety measure: if external load starts + * to flood the system with softirqs then we + * will mitigate softirq work to the softirq thread. + */ wakeup_softirqd(cpu); + else + /* + * All softirqs are handled - undo any possible + * wakeup of softirqd. This reduces context switch + * overhead. + */ + unwakeup_softirqd(cpu); } local_irq_restore(flags); } -/* - * This function must run with irq disabled! - */ -inline void cpu_raise_softirq(unsigned int cpu, unsigned int nr) -{ - __cpu_raise_softirq(cpu, nr); - - /* - * If we're in an interrupt or bh, we're done - * (this also catches bh-disabled code). We will - * actually run the softirq once we return from - * the irq or bh. - * - * Otherwise we wake up ksoftirqd to make sure we - * schedule the softirq soon. 
- */ - if (!(local_irq_count(cpu) | local_bh_count(cpu))) - wakeup_softirqd(cpu); -} - -void raise_softirq(unsigned int nr) -{ - long flags; - - local_irq_save(flags); - cpu_raise_softirq(smp_processor_id(), nr); - local_irq_restore(flags); -} - void open_softirq(int nr, void (*action)(struct softirq_action*), void *data) { softirq_vec[nr].data = data; softirq_vec[nr].action = action; } - /* Tasklets */ struct tasklet_head tasklet_vec[NR_CPUS] __cacheline_aligned; @@ -157,8 +161,9 @@ local_irq_save(flags); t->next = tasklet_vec[cpu].list; tasklet_vec[cpu].list = t; - cpu_raise_softirq(cpu, TASKLET_SOFTIRQ); + __cpu_raise_softirq(cpu, TASKLET_SOFTIRQ); local_irq_restore(flags); + rerun_softirqs(cpu); } void __tasklet_hi_schedule(struct tasklet_struct *t) @@ -169,8 +174,9 @@ local_irq_save(flags); t->next = tasklet_hi_vec[cpu].list; tasklet_hi_vec[cpu].list = t; - cpu_raise_softirq(cpu, HI_SOFTIRQ); + __cpu_raise_softirq(cpu, HI_SOFTIRQ); local_irq_restore(flags); + rerun_softirqs(cpu); } static void tasklet_action(struct softirq_action *a) @@ -241,7 +247,6 @@ } } - void tasklet_init(struct tasklet_struct *t, void (*func)(unsigned long), unsigned long data) { @@ -268,8 +273,6 @@ clear_bit(TASKLET_STATE_SCHED, &t->state); } - - /* Old style BHs */ static void (*bh_base[32])(void); @@ -325,7 +328,7 @@ { int i; - for (i=0; i<32; i++) + for (i = 0; i < 32; i++) tasklet_init(bh_task_vec+i, bh_action, i); open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL); @@ -361,38 +364,42 @@ static int ksoftirqd(void * __bind_cpu) { - int bind_cpu = *(int *) __bind_cpu; - int cpu = cpu_logical_map(bind_cpu); + int cpu = cpu_logical_map((int)__bind_cpu); daemonize(); - current->nice = 19; + sigfillset(¤t->blocked); /* Migrate to the right CPU */ - current->cpus_allowed = 1UL << cpu; - while (smp_processor_id() != cpu) - schedule(); + current->cpus_allowed = 1 << cpu; - sprintf(current->comm, "ksoftirqd_CPU%d", bind_cpu); - - __set_current_state(TASK_INTERRUPTIBLE); - mb(); +#if 
CONFIG_SMP + sprintf(current->comm, "ksoftirqd CPU%d", cpu); +#else + sprintf(current->comm, "ksoftirqd"); +#endif + current->nice = 19; + schedule(); ksoftirqd_task(cpu) = current; for (;;) { - if (!softirq_pending(cpu)) - schedule(); - - __set_current_state(TASK_RUNNING); - while (softirq_pending(cpu)) { do_softirq(); if (current->need_resched) - schedule(); + goto preempt; } __set_current_state(TASK_INTERRUPTIBLE); + /* This has to be here to make the test IRQ-correct. */ + barrier(); + if (!softirq_pending(cpu)) { +preempt: + if (current->counter > 1) + current->counter = 1; + schedule(); + } + __set_current_state(TASK_RUNNING); } } @@ -400,17 +407,10 @@ { int cpu; - for (cpu = 0; cpu < smp_num_cpus; cpu++) { - if (kernel_thread(ksoftirqd, (void *) &cpu, + for (cpu = 0; cpu < smp_num_cpus; cpu++) + if (kernel_thread(ksoftirqd, (void *) cpu, CLONE_FS | CLONE_FILES | CLONE_SIGNAL) < 0) - printk("spawn_ksoftirqd() failed for cpu %d\n", cpu); - else { - while (!ksoftirqd_task(cpu_logical_map(cpu))) { - current->policy |= SCHED_YIELD; - schedule(); - } - } - } + BUG(); return 0; } --- linux/include/linux/netdevice.h.orig Wed Sep 26 17:04:36 2001 +++ linux/include/linux/netdevice.h Fri Sep 28 07:44:01 2001 @@ -486,8 +486,9 @@ local_irq_save(flags); dev->next_sched = softnet_data[cpu].output_queue; softnet_data[cpu].output_queue = dev; - cpu_raise_softirq(cpu, NET_TX_SOFTIRQ); + __cpu_raise_softirq(cpu, NET_TX_SOFTIRQ); local_irq_restore(flags); + rerun_softirqs(cpu); } } @@ -535,8 +536,9 @@ local_irq_save(flags); skb->next = softnet_data[cpu].completion_queue; softnet_data[cpu].completion_queue = skb; - cpu_raise_softirq(cpu, NET_TX_SOFTIRQ); + __cpu_raise_softirq(cpu, NET_TX_SOFTIRQ); local_irq_restore(flags); + rerun_softirqs(cpu); } } --- linux/include/linux/interrupt.h.orig Wed Sep 26 17:04:40 2001 +++ linux/include/linux/interrupt.h Fri Sep 28 07:44:01 2001 @@ -74,9 +74,14 @@ asmlinkage void do_softirq(void); extern void open_softirq(int nr, void 
(*action)(struct softirq_action*), void *data); extern void softirq_init(void); -#define __cpu_raise_softirq(cpu, nr) do { softirq_pending(cpu) |= 1UL << (nr); } while (0) -extern void FASTCALL(cpu_raise_softirq(unsigned int cpu, unsigned int nr)); -extern void FASTCALL(raise_softirq(unsigned int nr)); +#define __cpu_raise_softirq(cpu, nr) \ + do { softirq_pending(cpu) |= 1UL << (nr); } while (0) + +#define rerun_softirqs(cpu) \ +do { \ + if (!(local_irq_count(cpu) | local_bh_count(cpu))) \ + do_softirq(); \ +} while (0); --- linux/include/linux/sched.h.orig Wed Sep 26 17:04:40 2001 +++ linux/include/linux/sched.h Fri Sep 28 07:44:01 2001 @@ -556,6 +556,7 @@ extern void FASTCALL(__wake_up(wait_queue_head_t *q, unsigned int mode, int nr)); extern void FASTCALL(__wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr)); +extern void FASTCALL(__unwakeup_process(struct task_struct * p, long state)); extern void FASTCALL(sleep_on(wait_queue_head_t *q)); extern long FASTCALL(sleep_on_timeout(wait_queue_head_t *q, signed long timeout)); @@ -574,6 +575,13 @@ #define wake_up_interruptible_all(x) __wake_up((x),TASK_INTERRUPTIBLE, 0) #define wake_up_interruptible_sync(x) __wake_up_sync((x),TASK_INTERRUPTIBLE, 1) #define wake_up_interruptible_sync_nr(x) __wake_up_sync((x),TASK_INTERRUPTIBLE, nr) + +#define unwakeup_process(tsk,state) \ +do { \ + if (task_on_runqueue(tsk)) \ + __unwakeup_process(tsk,state); \ +} while (0) + asmlinkage long sys_wait4(pid_t pid,unsigned int * stat_addr, int options, struct rusage * ru); extern int in_group_p(gid_t); --- linux/include/asm-mips/softirq.h.orig Wed Sep 26 20:58:00 2001 +++ linux/include/asm-mips/softirq.h Wed Sep 26 20:58:07 2001 @@ -40,6 +40,4 @@ #define in_softirq() (local_bh_count(smp_processor_id()) != 0) -#define __cpu_raise_softirq(cpu, nr) set_bit(nr, &softirq_pending(cpu)) - #endif /* _ASM_SOFTIRQ_H */ --- linux/include/asm-mips64/softirq.h.orig Wed Sep 26 20:58:20 2001 +++ linux/include/asm-mips64/softirq.h Wed Sep 26 
20:58:26 2001 @@ -39,19 +39,4 @@ #define in_softirq() (local_bh_count(smp_processor_id()) != 0) -extern inline void __cpu_raise_softirq(int cpu, int nr) -{ - unsigned int *m = (unsigned int *) &softirq_pending(cpu); - unsigned int temp; - - __asm__ __volatile__( - "1:\tll\t%0, %1\t\t\t# __cpu_raise_softirq\n\t" - "or\t%0, %2\n\t" - "sc\t%0, %1\n\t" - "beqz\t%0, 1b" - : "=&r" (temp), "=m" (*m) - : "ir" (1UL << nr), "m" (*m) - : "memory"); -} - #endif /* _ASM_SOFTIRQ_H */ --- linux/net/core/dev.c.orig Wed Sep 26 17:04:41 2001 +++ linux/net/core/dev.c Wed Sep 26 17:04:48 2001 @@ -1218,8 +1218,9 @@ dev_hold(skb->dev); __skb_queue_tail(&queue->input_pkt_queue,skb); /* Runs from irqs or BH's, no need to wake BH */ - cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ); + __cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ); local_irq_restore(flags); + rerun_softirqs(this_cpu); #ifndef OFFLINE_SAMPLE get_sample_stats(this_cpu); #endif @@ -1529,8 +1530,9 @@ local_irq_disable(); netdev_rx_stat[this_cpu].time_squeeze++; /* This already runs in BH context, no need to wake up BH's */ - cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ); + __cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ); local_irq_enable(); + rerun_softirqs(this_cpu); NET_PROFILE_LEAVE(softnet_process); return; ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [patch] softirq-2.4.10-B2
  2001-09-28  7:18 ` [patch] softirq-2.4.10-B2 Ingo Molnar
@ 2001-09-28 15:58 ` Andrea Arcangeli
  0 siblings, 0 replies; 44+ messages in thread

From: Andrea Arcangeli @ 2001-09-28 15:58 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, linux-kernel, Alan Cox, Alexey Kuznetsov, Ben LaHaise

On Fri, Sep 28, 2001 at 09:18:17AM +0200, Ingo Molnar wrote:

> i've attached the softirq-2.4.10-B2 that has your TASK_RUNNING
> suggestion, Oleg's fixes and this change included.

please include this safety check too:

--- ./kernel/softirq.c.~1~	Fri Sep 28 17:42:07 2001
+++ ./kernel/softirq.c	Fri Sep 28 17:46:32 2001
@@ -381,6 +381,8 @@
 
 	current->nice = 19;
 	schedule();
+	if (smp_processor_id() != cpu)
+		BUG();
 	ksoftirqd_task(cpu) = current;
 
 	for (;;) {

Andrea
* Re: [patch] softirq-2.4.10-B2
  2001-09-28  7:18 ` [patch] softirq-2.4.10-B2 Ingo Molnar
  2001-09-28 15:58 ` Andrea Arcangeli
@ 2001-09-28 18:36 ` Simon Kirby
  2001-09-28 18:47   ` Ingo Molnar
  1 sibling, 1 reply; 44+ messages in thread

From: Simon Kirby @ 2001-09-28 18:36 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Andrea Arcangeli, linux-kernel, Alexey Kuznetsov

On Fri, Sep 28, 2001 at 09:18:17AM +0200, Ingo Molnar wrote:

> basically the problem is that there is a big 'gap' between the activation
> of softirqs, and the time when ksoftirqd starts running. There are a
> number of mechanisms within the networking stack that are quite
> timing-sensitive. And generally, if there is work A and work B that are
> related, and we've executed work A (the hardware interrupt), then it's
> almost always the best idea to execute work B as soon as possible. Any
> 'delaying' of work B should only be done for non-performance reasons:
> eg. fairness in this case. Now it's MAX_SOFTIRQ_RESTART that balances
> performance against fairness. (in most kernel subsystems we almost always
> give preference to performance over fairness - without ignoring fairness
> of course.)

This may be somewhat related to something I've been thinking about lately:
the fact that most machines will choke under a simple UDP while(1)
sendto() attack (eg: around 100Mbit wire speed with lots of small
packets). It seems that what's happening is that a new packet comes in
immediately after the last interrupt is serviced (because the packets are
so small), so the CPU has no time to execute anything else and goes right
back into the interrupt handler. This starves userspace CPU time.

Would your changes affect this at all? It would be nice if this could
somehow be batched so that a Linux router could be capable of handling
large amounts of traffic with small packets, and wouldn't choke when lots
of small packets (most commonly from a TCP SYN bomb) go through it.
Actually, I just tested my while(1) sendto() UDP spamming program on an
Acenic card and noticed that it seems to do some sort of batching /
grouping, because the number of interrupts reported by vmstat is only
8,000 while it's easily 75,000 on other eepro100 boxes. Interesting...

Simon-

[  Stormix Technologies Inc.  ][  NetNation Communications Inc.  ]
[       sim@stormix.com       ][        sim@netnation.com        ]
[  Opinions expressed are not necessarily those of my employers. ]
* Re: [patch] softirq-2.4.10-B2
  2001-09-28 18:36 ` Simon Kirby
@ 2001-09-28 18:47 ` Ingo Molnar
  2001-09-28 19:31   ` kuznet
  0 siblings, 1 reply; 44+ messages in thread

From: Ingo Molnar @ 2001-09-28 18:47 UTC (permalink / raw)
To: Simon Kirby; +Cc: Andrea Arcangeli, linux-kernel, Alexey Kuznetsov

On Fri, 28 Sep 2001, Simon Kirby wrote:

> Actually, I just tested my while(1) sendto() UDP spamming program on
> an Acenic card and noticed that it seems to do some sort of batching /
> grouping, because the number of interrupts reported by vmstat is only
> 8,000 while it's easily 75,000 on other eepro100 boxes.
> Interesting...

yes, acenic.c (and the card itself) supports irq-rate limitation and
work-coalescing. Almost all gigabit cards do. Check out the tx_coal_tick,
rx_coal_tick, max_tx_desc, max_rx_desc tunables that can be set at insmod
time. Note that this does not decrease the amount of work generated, but
it will reduce the irq-processing overhead significantly.

The eepro100 card does not provide such ways - there the only way to stop
future interrupts from happening is to stop the receiver, or to not refill
the rx-skb queue. (or to disable the interrupt, which is a pretty crude
and inaccurate method.)

Ingo
* Re: [patch] softirq-2.4.10-B2
2001-09-28 18:47 ` Ingo Molnar
@ 2001-09-28 19:31 ` kuznet
0 siblings, 0 replies; 44+ messages in thread
From: kuznet @ 2001-09-28 19:31 UTC (permalink / raw)
To: mingo; +Cc: sim, andrea, linux-kernel

Hello!

> The eepro100 card does not provide such ways

It does, of course. Only it is not documented. Intel's driver does
mitigate.

Alexey

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
2001-09-26 16:44 [patch] softirq performance fixes, cleanups, 2.4.10 Ingo Molnar
` (2 preceding siblings ...)
2001-09-27 23:31 ` Andrea Arcangeli
@ 2001-09-28 16:18 ` kuznet
2001-09-28 16:31 ` Ingo Molnar
` (3 more replies)
3 siblings, 4 replies; 44+ messages in thread
From: kuznet @ 2001-09-28 16:18 UTC (permalink / raw)
To: mingo; +Cc: torvalds, linux-kernel, alan, bcrl, andrea

Hello!

> - removed 'mask' handling from do_softirq() - it's unnecessary due to the
>   restarts. this further simplifies the code.

Ingo, but this means that only the first softirq is handled.
"mask" implements round-robin, and this is necessary.

> - '[ksoftirqd_CPU0]' is confusing on UP systems, changed it to
>   '[ksoftirqd]' instead.

It is useless to argue about preferences, but a universal naming scheme
looks less confusing yet. :-)

Generally, I dislike this patch. It is utterly ugly.

Also, you did not assure me that you interpret the problem correctly.
netif_rx() is __insensitive__ to latencies due to blocked softirq
restarts. It stops spinning only when it becomes a true cpu hog.
And scheduling ksoftirqd is the only variant here.

Most likely, your problem will disappear when you renice ksoftirqd
to a higher priority, f.e. equal to the priority of tux threads.

Alexey

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
2001-09-28 16:18 ` [patch] softirq performance fixes, cleanups, 2.4.10 kuznet
@ 2001-09-28 16:31 ` Ingo Molnar
2001-09-28 17:04 ` kuznet
2001-09-28 16:32 ` Andrea Arcangeli
` (2 subsequent siblings)
3 siblings, 1 reply; 44+ messages in thread
From: Ingo Molnar @ 2001-09-28 16:31 UTC (permalink / raw)
To: kuznet; +Cc: torvalds, linux-kernel, alan, bcrl, andrea

On Fri, 28 Sep 2001 kuznet@ms2.inr.ac.ru wrote:

> > - removed 'mask' handling from do_softirq() - it's unnecessary due to the
> >   restarts. this further simplifies the code.
>
> Ingo, but this means that only the first softirq is handled.
> "mask" implements round-robin, and this is necessary.

it does not, please read the code again. We iterate over all active
bits in the 'pending' bitmask. All softirqs that are active are
handled. This inner loop is repeated up to MAX_SOFTIRQ_RESTART times,
if any softirq got reactivated meanwhile.

> Also, you did not assure me that you interpret the problem correctly.
> netif_rx() is __insensitive__ to latencies due to blocked softirq
> restarts. It stops spinning only when it becomes a true cpu hog. [...]

(i suspect you meant net_rx_action().) net_rx_action() stops spinning
if 1) a jiffy has passed, or 2) 300 packets have been processed. [the
jiffy test is not completely accurate, as it can break out of the loop
after a short amount of time, but that is a minor issue.]

there is nothing sacred about the old method of processing
NET_RX_SOFTIRQ, then NET_TX_SOFTIRQ, then breaking out of do_softirq()
(the mechanism was a bit more complex than that, but in insignificant
ways). It's just as arbitrary as 10 loops - with the difference that
10 loops perform better.

> And scheduling ksoftirqd is the only variant here.

scheduling ksoftirqd is the only variant once we have decided that
softirqs are a CPU hog and we want to make progress in non-irq
contexts. A single loop over softirqs is just as arbitrary as 10
loops, this all is a matter of balancing.

> Most likely, your problem will disappear when you renice ksoftirqd
> to a higher priority, f.e. equal to the priority of tux threads.

(no. see the previous mails.)

	Ingo

^ permalink raw reply [flat|nested] 44+ messages in thread
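To make the disputed semantics concrete, here is a small user-space model of the restart logic described above. This is a sketch, not the kernel code: the handler table, the RX-reactivation knob and do_softirq_model() are invented for illustration; only the MAX_SOFTIRQ_RESTART = 10 limit and the four-softirq layout come from the thread.

```c
/* Hypothetical user-space model of the patched do_softirq() loop. */

#define MAX_SOFTIRQ_RESTART 10
#define NR_SOFTIRQS 4             /* HI, NET_TX, NET_RX, TASKLET in 2.4 */

static unsigned int pending;      /* per-CPU pending bitmask (model) */
static int handled[NR_SOFTIRQS];  /* how often each handler ran */
static int reactivate_rx_times;   /* test knob: RX re-raises itself */

static void handle(int nr)
{
    handled[nr]++;
    /* model a handler that re-raises its own softirq, as a flood of
       incoming packets would via cpu_raise_softirq() */
    if (nr == 2 && reactivate_rx_times > 0) {
        reactivate_rx_times--;
        pending |= 1u << 2;
    }
}

/* returns 1 if all pending work was finished, 0 if we gave up and
   would have to wake ksoftirqd */
static int do_softirq_model(void)
{
    int restarts = MAX_SOFTIRQ_RESTART;

    while (pending) {
        unsigned int mask = pending;
        pending = 0;              /* hardirqs would refill this */
        for (int nr = 0; nr < NR_SOFTIRQS; nr++)
            if (mask & (1u << nr))
                handle(nr);
        if (--restarts == 0)
            break;                /* fairness: defer the rest */
    }
    return pending == 0;
}
```

A softirq that keeps re-raising itself is serviced for up to 10 rounds and then left pending for ksoftirqd - exactly the performance/fairness trade-off being argued.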
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
2001-09-28 16:31 ` Ingo Molnar
@ 2001-09-28 17:04 ` kuznet
2001-09-28 17:21 ` Rik van Riel
2001-09-28 17:31 ` Ingo Molnar
0 siblings, 2 replies; 44+ messages in thread
From: kuznet @ 2001-09-28 17:04 UTC (permalink / raw)
To: mingo; +Cc: torvalds, linux-kernel, alan, bcrl, andrea

Hello!

> it does not, please read the code again. We iterate over all active
> bits in the 'pending' bitmask.

OK.

> net_rx_action() stops spinning if 1) a jiffy has passed 2) 300 packets
> have been processed. [the jiffy test is not completely accurate

It is. The break is forced not earlier than after one jiffy, but it
may loop for up to 2 jiffies.

> there is nothing sacred about the old method of processing
> NET_RX_SOFTIRQ, then NET_TX_SOFTIRQ, then breaking out of do_softirq()
> (the mechanism was a bit more complex than that, but in insignificant
> ways). It's just as arbitrary as 10 loops - with the difference that
> 10 loops perform better.

Do not cheat yourself. 10 is more than 1 by an order of magnitude.
I would eat this argument if the limit was 2. :-)

> > Most likely, your problem will disappear when you renice ksoftirqd
> > to a higher priority, f.e. equal to the priority of tux threads.
>
> (no. see the previous mails.)

But I do not know what exactly to look for there. Please, explain who
exactly obtains an advantage of looping.

net_rx_action()? Do you see drops in backlog? Actually, I could see
sense in your approach if you said something of the sort: "I see drops
in backlog, I want to increase it to 3000, but it breaks latency, so
let it remain 300, but instead I will make 10 loops". That would be a
sane justification at least.

net_tx_action()? It does not look critical.

Alexey

^ permalink raw reply [flat|nested] 44+ messages in thread
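The two exit conditions under discussion - the 300-packet budget and the one-to-two-jiffy time limit - can be modeled in user space like this. The function, the simulated clock and the packets-per-jiffy knob are all illustrative, not kernel code:

```c
/* Model of net_rx_action()'s exit conditions as described above:
   a packet budget (300 in 2.4) and a "jiffies - start > 1" time
   check, which fires somewhere between 1 and 2 jiffies of work. */

#define BUDGET 300

/* process up to 'queued' packets; the simulated clock advances one
   tick every 'pkts_per_jiffy' packets.  Returns packets processed. */
static int net_rx_action_model(int queued, int pkts_per_jiffy)
{
    int start = 0, jiffies_now = 0, done = 0;

    while (queued > 0) {
        queued--;
        done++;
        if (done % pkts_per_jiffy == 0)
            jiffies_now++;              /* time passes */
        if (done >= BUDGET)
            break;                      /* packet budget exhausted */
        if (jiffies_now - start > 1)
            break;                      /* between 1 and 2 jiffies */
    }
    return done;
}
```

With fast processing the 300-packet budget fires first; with slow processing the time check fires only after the clock has advanced past one full jiffy, which is Alexey's point about the test being accurate.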
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
2001-09-28 17:04 ` kuznet
@ 2001-09-28 17:21 ` Rik van Riel
2001-09-28 17:31 ` Andrea Arcangeli
` (2 more replies)
2001-09-28 17:31 ` Ingo Molnar
1 sibling, 3 replies; 44+ messages in thread
From: Rik van Riel @ 2001-09-28 17:21 UTC (permalink / raw)
To: kuznet; +Cc: mingo, torvalds, linux-kernel, alan, bcrl, andrea

On Fri, 28 Sep 2001 kuznet@ms2.inr.ac.ru wrote:

> Please, explain who exactly obtains an advantage of looping.
> net_rx_action()? Do you see drops in backlog?
>
> net_tx_action()? It does not look critical.

Then how would you explain the 10% speed difference Ben and others
have seen with gigabit ethernet ?

regards,

Rik
--
IA64: a worthy successor to the i860.

http://www.surriel.com/		http://www.conectiva.com/
http://distro.conectiva.com/

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
2001-09-28 17:21 ` Rik van Riel
@ 2001-09-28 17:31 ` Andrea Arcangeli
2001-09-28 17:41 ` kuznet
2001-09-28 18:39 ` Josh MacDonald
2 siblings, 0 replies; 44+ messages in thread
From: Andrea Arcangeli @ 2001-09-28 17:31 UTC (permalink / raw)
To: Rik van Riel; +Cc: kuznet, mingo, torvalds, linux-kernel, alan, bcrl

On Fri, Sep 28, 2001 at 02:21:07PM -0300, Rik van Riel wrote:

> Then how would you explain the 10% speed difference
> Ben and others have seen with gigabit ethernet ?

partly because of the unwakeup logic. I've no problem with it; this is
why I asked to measure how much of the 10% improvement came from the
unwakeup/ksoftirqd-deschedule logic - I was just curious about that.

Andrea

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
2001-09-28 17:21 ` Rik van Riel
2001-09-28 17:31 ` Andrea Arcangeli
@ 2001-09-28 17:41 ` kuznet
2001-09-28 17:46 ` Ingo Molnar
2001-09-28 18:39 ` Josh MacDonald
2 siblings, 1 reply; 44+ messages in thread
From: kuznet @ 2001-09-28 17:41 UTC (permalink / raw)
To: Rik van Riel; +Cc: mingo, torvalds, linux-kernel, alan, bcrl, andrea

Hello!

> Then how would you explain the 10% speed difference
> Ben and others have seen with gigabit ethernet ?

Huh... kill all the softirqs (and, even better, the scheduler too) and
you will see even more effect. :-)

More seriously, the question is not why these 10% appear, but rather
why they _disappeared_ in 2.4.7.

Alexey

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
2001-09-28 17:41 ` kuznet
@ 2001-09-28 17:46 ` Ingo Molnar
0 siblings, 0 replies; 44+ messages in thread
From: Ingo Molnar @ 2001-09-28 17:46 UTC (permalink / raw)
To: Alexey Kuznetsov
Cc: Rik van Riel, Linus Torvalds, linux-kernel, Alan Cox, bcrl,
	Andrea Arcangeli

On Fri, 28 Sep 2001 kuznet@ms2.inr.ac.ru wrote:

> More seriously, the question is not why these 10% appear, but rather
> why they _disappeared_ in 2.4.7.

the effects i saw (Ben is not around unfortunately, so i can neither
confirm nor deny whether there is any correlation between the effects
Ben saw and the effects i saw) were due to ksoftirqd's generic
tendency to increase the latency between the issuing of work and the
completion of it. Increasing ksoftirqd's priority (in fact, setting
current->counter = 2 every time schedule() is called :-) does not fix
this fundamental property => it still causes 'work generators'
(processes) to use more CPU time than 'work completion' (irqs,
softirqs). Furthermore, it also increases the latency between
hardirqs and softirqs.

so my patch makes sure that work is completed as soon as possible -
but i've also kept an exit door open. [which you might find
insufficient, but that i think is up to individual tuning anyway - i
think i'm generally running faster machines than you, so our
perspectives are slightly different.]

[processes are not always work generators, and irqs/softirqs do not
always do work completion, but i think generally they fit nicely into
these two categories. Eg. the TCP stack often generates new work from
softirqs, but IMO this does not change the fundamental equation.]

	Ingo

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
2001-09-28 17:21 ` Rik van Riel
2001-09-28 17:31 ` Andrea Arcangeli
2001-09-28 17:41 ` kuznet
@ 2001-09-28 18:39 ` Josh MacDonald
2 siblings, 0 replies; 44+ messages in thread
From: Josh MacDonald @ 2001-09-28 18:39 UTC (permalink / raw)
To: linux-kernel

Quoting Rik van Riel (riel@conectiva.com.br):

> On Fri, 28 Sep 2001 kuznet@ms2.inr.ac.ru wrote:
>
> > Please, explain who exactly obtains an advantage of looping.
> > net_rx_action()? Do you see drops in backlog?
> >
> > net_tx_action()? It does not look critical.
>
> Then how would you explain the 10% speed difference
> Ben and others have seen with gigabit ethernet ?

Could this possibly be due to I-cache improvements? If the same
interrupt handling code is being run 10 times at once, you should
expect an improvement of that kind.

-josh

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
2001-09-28 17:04 ` kuznet
2001-09-28 17:21 ` Rik van Riel
@ 2001-09-28 17:31 ` Ingo Molnar
2001-09-28 17:56 ` kuznet
1 sibling, 1 reply; 44+ messages in thread
From: Ingo Molnar @ 2001-09-28 17:31 UTC (permalink / raw)
To: kuznet; +Cc: torvalds, linux-kernel, alan, bcrl, andrea

On Fri, 28 Sep 2001 kuznet@ms2.inr.ac.ru wrote:

> > have been processed. [the jiffy test is not completely accurate
>
> It is. The break is forced not earlier than after one jiffy, but it
> may loop for up to 2 jiffies.

(you are right. it does jiffies - start_time > 1. good.)

> > there is nothing sacred about the old method of processing
> > NET_RX_SOFTIRQ, then NET_TX_SOFTIRQ, then breaking out of
> > do_softirq() (the mechanism was a bit more complex than that, but
> > in insignificant ways). It's just as arbitrary as 10 loops - with
> > the difference that 10 loops perform better.
>
> Do not cheat yourself. 10 is more than 1 by an order of magnitude.
> I would eat this argument if the limit was 2. :-)

what are you trying to argue - that it's incorrect to re-process
softirqs if they got activated during the previous pass? I gave a
number of fundamental reasons why it's a good idea to loop; tell me
one reason not to do so, on a 'healthy' system.

(Let's assume that a loop of 10 still ensures basic safety in
situations where external load overloads the system. I will tell you
that not even a loop of 1 ensures safety, unless one tweaks the
softirq handlers themselves (netdev_max_backlog) to do less and less
work per interrupt. Is your point to make the softirq loop
sysctl-tunable? I can sure show you slow enough systems which can be
overloaded via hardirqs alone.)

> Please, explain who exactly obtains an advantage of looping.

I think the technically most reasonable approach is to process
softirqs *as soon as possible*, without context switches, for obvious
performance reasons. They are called 'interrupts' after all. Or do we
want to delegate hardware interrupts to process contexts as well?

interrupts, in most cases, are 'replies to a request'. It's a pair of
code: the first part issues the work, then comes the completion of
work. Processes are usually the entities that 'issue' work,
interrupts/softirqs are the ones that complete it. It's conceptually
the right thing to complete work that was started. Softirq work is
something that has to be executed sooner or later, in a healthy
system. I prefer 'sooner' to 'later'. I prefer delaying current
contexts instead of buffering work - executing current contexts will
likely only increase the amount of work that has to be done in the
future.

the only reason why the loop is not endless is reliability: weaker
systems, which can be overloaded over external interfaces, need some
protection against endless softirq loops. I measured 10 to be
something pretty close to a loop of 10000. i prefer my system to go
near its peak performance, and i do not want to suffer from any
'safety measure' until it's absolutely necessary. and if it's
absolutely necessary i prefer to design the safety measure in a way
that lets fast enough systems still go fast. Not the other way around.

please point out flaws in this thinking.

	Ingo

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
2001-09-28 17:31 ` Ingo Molnar
@ 2001-09-28 17:56 ` kuznet
2001-09-28 18:28 ` Ingo Molnar
2001-09-28 18:51 ` Benjamin LaHaise
0 siblings, 2 replies; 44+ messages in thread
From: kuznet @ 2001-09-28 17:56 UTC (permalink / raw)
To: mingo; +Cc: torvalds, linux-kernel, alan, bcrl, andrea

Hello!

> what are you trying to argue - that it's incorrect to re-process
> softirqs if they got activated during the previous pass? I gave a
> number of fundamental reasons

I did not find them in your mails.

> (Let's assume that a loop of 10 still ensures basic safety in
> situations where external load overloads the system.

It does not, evidently. And 1 does not. But 10 is 10 times worse.

> Is your point to make the softirq loop sysctl-tunable?

No. My point is to make it correct eventually. If you repeat such
attempts to "improve" it every third 2.4.x, it will remain broken
forever.

> show you slow enough systems which can be overloaded via hardirqs
> alone.)

This problem has been solved ages ago. The only remaining question is
how to do this more nicely.

> I think the technically most reasonable approach is to process
> softirqs *as soon as possible*,
...
> please point out flaws in this thinking.

They are processed as soon as possible. Ingo, I told you that
net_rx_action() is a small do_softirq(), restarting not 10, but no
less than 300 times in a row. If this logic is wrong, you should
explain why it is wrong. Looping a dozen times may look like a cure,
but for me it still looks rather like steroids.

Alexey

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
2001-09-28 17:56 ` kuznet
@ 2001-09-28 18:28 ` Ingo Molnar
2001-09-28 19:23 ` kuznet
2001-09-28 19:39 ` kuznet
1 sibling, 2 replies; 44+ messages in thread
From: Ingo Molnar @ 2001-09-28 18:28 UTC (permalink / raw)
To: Alexey Kuznetsov
Cc: Linus Torvalds, linux-kernel, Alan Cox, bcrl, Andrea Arcangeli

On Fri, 28 Sep 2001 kuznet@ms2.inr.ac.ru wrote:

> > Is your point to make the softirq loop sysctl-tunable?
>
> No. My point is to make it correct eventually.

there is no 'technically correct' way but to process all softirqs
immediately. Ie. an infinite loop. The only reason why we are doing
less than that is to be nice to systems that can be overloaded.

> > show you slow enough systems which can be overloaded via hardirqs
> > alone.)
>
> This problem has been solved ages ago. The only remaining question is
> how to do this more nicely.

this problem has not been solved. hardirqs have properties, unlike
softirqs, that make it harder to overload them.

> > I think the technically most reasonable approach is to process
> > softirqs *as soon as possible*,
> ...
> > please point out flaws in this thinking.
>
> They are processed as soon as possible.

they are *not*. The old code deliberately ignores newly pending
softirqs.

> Ingo, I told you that net_rx_action() is a small do_softirq(),
> restarting not 10, but no less than 300 times in a row.

net_rx_action is 'small' in code, but not small in terms of
cachemisses and other, unavoidable overhead. It goes over device-side
skbs, which can and will take several microseconds per packet.

> If this logic is wrong, you should explain why it is wrong. Looping
> a dozen times may look like a cure, but for me it still looks rather
> like steroids.

the fundamental issue here that makes looping inevitable is the
following one. (not that looping itself is anything bad.) We enable
hardirqs during the processing of softirqs. We also guarantee
exclusivity of softirq processing. We also have multiple 'queues of
work'. Ie. if a hardirq adds some new work while we were off doing
other work, then we have no choice but to look at this other work as
well. Ie. we have to loop. q.e.d.

in fact even the current code is a mini-loop of ~1.5 iterations, due
to the mask code. (which has the effect of processing 'a little bit
more work'.)

looping can of course be avoided - but only by breaking any of the
above assumptions. My patch took the 2.4 semantics of softirqs more or
less as cast into stone - for obvious reasons. Some more radical
options:

- do not use split softirqs, use one global (but per-cpu) 'work to be
  done' queue, that lists multiple things. The downside: no 'priority'
  between softirqs.

(sidenote: i do think that priority of softirqs is overdesign and a
fad, or, in the best case, implemented incorrectly. It has no
practical effect in any workload i saw. *If* a higher-priority work
queue is higher priority, then we should abort softirq processing
immediately if a higher-priority softirq is activated. Ie. we should
break out of net_rx_action as soon as a higher priority softirq, ie.
any of the SOFTIRQ_HI or SOFTIRQ_TX softirqs, is activated. We don't
do that - in fact we didn't even go to the higher priority active
softirq within the inner loop. So i consider the softirq priority
arguments to be pulled here by their hair. softirq priorities never
existed in any realistic form, for any serious softirq load.)

so this is option #1: get rid of the distinction between kinds of
'work', and just do a global (as in type, but per-CPU) work-FIFO. This
concept works really well in a number of other design areas within the
Linux kernel.

advantage: no fancy priorities. We process work as it comes. More work
means a straightforward loop over all work that needs to be done. We
can break out of this loop very easily and in a fine-grained way: eg.
if current->need_resched is set, or if we spend more than 1-2 jiffies
processing softirqs, then we break out of the loop. 'Measuring'
softirq-related load becomes much easier as well - and if it ever
becomes a problem we can throttle it artificially. (but normal systems
would never see this throttling.)

disadvantage: there is some cost to building the 'context' of
processing an event. The current method of looping over pending events
that are sorted by event type (HI, TX, RX, LO) has the advantage of
preserving this 'context'. OTOH, most of the event processing cost
right now is not related to building of context, but related to all
sorts of cachemisses like touching device-DMA-ed headers the first
time, or touching long-untouched skbs and sockets.

unless you have strong arguments against this approach, i will start
coding this. It's a pretty intrusive change, because all current
softirq users will have to agree on a generic event format + callback
that can be queued. This has to be embedded into skbs and bh-handling
structs. What do you think?

	Ingo

^ permalink raw reply [flat|nested] 44+ messages in thread
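A minimal user-space sketch of the single work-FIFO idea: one queue of events, each carrying its own callback, drained strictly in arrival order. All names here are hypothetical - this illustrates the concept, not a proposed kernel API:

```c
/* Sketch of the per-CPU 'work to be done' FIFO proposed above.
   Every event carries its own callback; no per-type priority. */

struct work_event {
    void (*fn)(void *data);
    void *data;
    struct work_event *next;
};

struct work_fifo {
    struct work_event *head, *tail;
};

/* what cpu_raise_softirq() would become: append if not yet queued */
static void fifo_raise(struct work_fifo *q, struct work_event *ev)
{
    ev->next = 0;
    if (q->tail)
        q->tail->next = ev;
    else
        q->head = ev;
    q->tail = ev;
}

/* drain in FIFO order; a real implementation would also break out on
   current->need_resched or after 1-2 jiffies, as discussed above */
static int fifo_drain(struct work_fifo *q)
{
    int n = 0;
    while (q->head) {
        struct work_event *ev = q->head;
        q->head = ev->next;
        if (!q->head)
            q->tail = 0;
        ev->fn(ev->data);
        n++;
    }
    return n;
}

/* demo payload: append a tag character to a log to observe ordering */
static char fifo_log[16];
static int  fifo_log_len;
static void log_tag(void *data)
{
    fifo_log[fifo_log_len++] = *(const char *)data;
}
```

The design choice being debated is visible here: arrival order replaces the HI/TX/RX/LO type ordering, so there is no 'priority', only FIFO fairness.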
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
2001-09-28 18:28 ` Ingo Molnar
@ 2001-09-28 19:23 ` kuznet
2001-09-28 19:48 ` Ingo Molnar
2001-09-30 9:01 ` Ingo Molnar
1 sibling, 2 replies; 44+ messages in thread
From: kuznet @ 2001-09-28 19:23 UTC (permalink / raw)
To: mingo; +Cc: torvalds, linux-kernel, alan, bcrl, andrea

Hello!

> this problem has not been solved. hardirqs have properties, unlike
> softirqs, that make it harder to overload them.

What properties do you mean? I see the only property: they are "hard",
and hence they should be harder to tame. :-) :-)

[ No matter. It is out of topic, but it was you who brought this up.
  Did you ever hear about "interrupt mitigation", "hw flowcontrol",
  "polling" eventually? All these schemes work. ]

> they are *not*. The old code deliberately ignores newly pending
> softirqs.

I tell you that net_rx_action() already LOOPs so much that it is silly
to loop even more. You multiplied an infinity by 10 and dare to state
that this infinity became larger. I am sorry, it is not.

> net_rx_action is 'small' in code, but not small in terms of
> cachemisses and other, unavoidable overhead. It goes over device-side
> skbs, which can and will take several microseconds per packet.

Big news for me. :-) And how is it related to the issue?

> > If this logic is wrong, you should explain why it is wrong. Looping
> > a dozen times may look like a cure, but for me it still looks
> > rather like steroids.
>
> the fundamental issue here that makes looping inevitable is the
> following one. (not that looping itself is anything bad.) We enable
> hardirqs during the processing of softirqs. We also guarantee
> exclusivity of softirq processing. We also have multiple 'queues of
> work'. Ie. if a hardirq adds some new work while we were off doing
> other work, then we have no choice but to look at this other work as
> well. Ie. we have to loop. q.e.d.

If an irq adds new work, it is processed immediately without any new
hacks. Look into net_rx_action().

> - do not use split softirqs, use one global (but per-cpu) 'work to be
>   done' queue, that lists multiple things. The downside: no
>   'priority' between softirqs.
>
> (sidenote: i do think that priority of softirqs is overdesign

Where did you find priority there? As soon as you see something is
enumerated, it still does not mean that low numbers have some
priority.

Argh... I see what you mean. It is not priority, it is a bug
introduced in 2.4.7. (By you? :-)) It was not present before; all the
softirqs were processed each round. Well, just restore the old
net_rx_action() from 2.4.6.

OK, now I understand your point. You mean that net_rx_action() is not
started immediately when an irq arrives while net_tx_action() or a
timer or something is running. Yes, it is really bad. But it still
does not allow you to restart net_rx_action() 10 times in a row.

> unless you have strong arguments against this approach, i will start
> coding this.

I am sorry, I do not find anything new in this approach. Seems, all
that you propose is to replace a single bit in a single word with some
struct. It will make sense when we have more than one real softirq.
:-)

> queued. This has to be embedded into skbs and bh-handling structs.
> What do you think?

I think that you should change the subject to discuss this. :-) It is
not related to the problem.

Alexey

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
2001-09-28 19:23 ` kuznet
@ 2001-09-28 19:48 ` Ingo Molnar
2001-09-29 16:35 ` kuznet
0 siblings, 1 reply; 44+ messages in thread
From: Ingo Molnar @ 2001-09-28 19:48 UTC (permalink / raw)
To: kuznet; +Cc: torvalds, linux-kernel, alan, bcrl, andrea

On Fri, 28 Sep 2001 kuznet@ms2.inr.ac.ru wrote:

> > this problem has not been solved. hardirqs have properties, unlike
> > softirqs, that make it harder to overload them.
>
> What properties do you mean? I see the only property: they are
> "hard", and hence they should be harder to tame. :-) :-)

(they can be disabled at their source more easily. softirqs keep
getting new work queued via hardirqs, so it's harder to 'tame' them
other than dropping them or delaying them - both of which can cause
additional problems.)

> [ No matter. It is out of topic, but it was you who brought this up.
>   Did you ever hear about "interrupt mitigation", "hw flowcontrol",
>   "polling" eventually? All these schemes work. ]

(sure it works. but Linux boxes still effectively lock up if bombarded
with UDP packets - ie. the problem has not been solved. But this is a
different topic - i think i understand that you meant 'has been
solved' as 'solved theoretically'. What i meant is that it is not
implemented in Linux for a number of cases yet, and that Linux
provides no automatic guarantee against such overloads. I think we
both agree and i was just too short and too inaccurate in specifying
what i meant to say. Issue closed?)

> > they are *not*. The old code deliberately ignores newly pending
> > softirqs.
>
> I tell you that net_rx_action() already LOOPs so much that it is
> silly to loop even more. You multiplied an infinity by 10 and dare
> to state that this infinity became larger. I am sorry, it is not.

no matter how much rx_action is looping, it does not help if it's tx
work that is queueing up meanwhile and we ignore that tx work after
having finished rx processing. How many times do i have to loop over
this simple packet of a fact to receive an ACK? :-)

> > net_rx_action is 'small' in code, but not small in terms of
> > cachemisses and other, unavoidable overhead. It goes over
> > device-side skbs, which can and will take several microseconds per
> > packet.
>
> Big news for me. :-) And how is it related to the issue?

it's very much related, because it opens up a real window for other
softirqs to be re-activated - which is the basic situation my patch
solves.

> > the fundamental issue here that makes looping inevitable is the
> > following one. (not that looping itself is anything bad.) We enable
> > hardirqs during the processing of softirqs. We also guarantee
> > exclusivity of softirq processing. We also have multiple 'queues of
> > work'. Ie. if a hardirq adds some new work while we were off doing
> > other work, then we have no choice but to look at this other work
> > as well. Ie. we have to loop. q.e.d.
>
> If an irq adds new work, it is processed immediately without any new
> hacks. Look into net_rx_action().

it's not processed if we are on our way out, eg. processing the LO
softirq. Or tx work does not get processed if we are in rx processing.

> > - do not use split softirqs, use one global (but per-cpu) 'work to
> >   be done' queue, that lists multiple things. The downside: no
> >   'priority' between softirqs.
> >
> > (sidenote: i do think that priority of softirqs is overdesign
>
> Where did you find priority there? [...]

i found no functional concept of priority there. But i had to react to
the following, largely bogus claim by Andrea:

----------->
> What you are missing is a property provided by the old method.
>
> We have the NET_RX_SOFTIRQ that floods very heavily, so far so good.
>
> Then we have HI_SOFTIRQ, incidentally HI_SOFTIRQ from irq wants to be
> executed with very low latency, with your patch it _can_ be postponed
> to ksoftirqd processing just because there's the NET_RX_SOFTIRQ cpu
> hog in background. With the old method it was guaranteed that the
> HI_SOFTIRQ was executed with very low latency within the irq, no
> matter of the NET_RX_SOFTIRQ flood.
<-----------

i was wrong to attribute this claim to you - sorry! I have a hard time
multitasking between your and Andrea's arguments, so sometimes the
work request and the completion interrupt get mixed up :-)

> As soon as you see something is enumerated, it still does not mean
> that low numbers have some priority.

yep.

> OK, now I understand your point. You mean that net_rx_action() is not
> started immediately when an irq arrives while net_tx_action() or a
> timer or something is running. Yes, it is really bad.

good :)

	Ingo

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
2001-09-28 19:48 ` Ingo Molnar
@ 2001-09-29 16:35 ` kuznet
2001-09-30 9:37 ` Kai Henningsen
0 siblings, 1 reply; 44+ messages in thread
From: kuznet @ 2001-09-29 16:35 UTC (permalink / raw)
To: mingo; +Cc: torvalds, linux-kernel, alan, bcrl, andrea

Hello!

[ Well, Ingo, you may go directly to the end of the mail, snipping
  historical comments. :-) ]

> (sure it works. but Linux boxes still effectively lock up if
> bombarded with UDP packets - ie. the problem has not been solved.
...
> different topic - i think i understand that you meant 'has been
> solved' as 'solved theoretically'.

For me it is solved in practice. I cannot cause a hardirq lockup
either with tulip or with acenic. If you mean eepro100, you may fix it
by implementing f.e. hw flowcontrol, which should be easy with any
hardware. BTW I cannot cause a softirq lockup with current kernels
either.

> too inaccurate in specifying what i meant to say. Issue closed?)

Yes. I do not know why you started to talk about this. "I do this
nasty, because another nasty exists" is never a good argument. Though
I also use it sometimes. :-)

> i found no functional concept of priority there. But i had to react
> to the following, largely bogus claim by Andrea:

Andrea speaks not about priority, in fact. He speaks rather about
fairness. Essentially, talking in your terms, we have a small amount
of events, encoded in bits of a mask, and instead of queueing them in
strict order, we used to process all of them each round. In fact, it
is equivalent to a fifo with small reordering (as long as the amount
of events is small). :-)

The two-letter "HI" was my invention, and I bet I did not mean that it
is high priority. Actually, it is just a random name. It is difficult
to say why I selected this name: combined from "higher than BH", sparc
assembly, which I had to hack at that time, and... yes, _at that_ time
it was important that it is processed before NET_*, because timers and
BH_IMMEDIATE produce net softirqs. It is not important any more.

> good :)

Listen, let us try to make this right yet.

Essentially, if we invent some real condition when the softirq loop
must be stopped, we win. Now it is "stop after all events pending at
start are processed". OK, it is wrong. You propose: "10 rounds". It is
even worse, because it is against plain logic. (pardon :-)). If we
invent the right condition, we may improve things really
significantly, almost without changes to the rest. (F.e. killing the
internal restart in net_rx_action() would be a great win; it is in the
same line as your event queue, by the way. The ideal scheme must not
fail even if each net_rx_action() processes a single packet, leaving
the rest queued. Each skb is an event. :-))

I am sure taming is very easy, if we strain our brains a bit.
Essentially, your 10 must be replaced with some simple time
characteristic. Seems, it is possible to do this correctly, f.e. using
a small deadline scheduler with two participants: softirq and all the
rest. Policy sort of: softirqs may execute for 1/HZ, if they insist on
this, but if they force some processes to starve, they lose the
low-latency guarantee, but get latency _not_ worse than 10msec and do
not eat more than some share (half) of the cpu in this case.

At least for UP I know how to do this exactly, forming a trivial
deadline scheduler with two participants: run_queue and softirq
pending. With SMP this can be the same... only run_queue is shared,
and the dumb extension has bad effects, sort of throttling softirqs on
cpu0 when a process is really scheduled on cpu0. No doubt, it should
be easy.

Alexey

^ permalink raw reply [flat|nested] 44+ messages in thread
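Alexey's two-party deadline idea can be modeled as a toy tick allocator: softirq work runs freely when nothing else is runnable, but once a process competes, ticks alternate, which bounds softirq latency to one tick (10msec at HZ=100) and its CPU share to half. This is entirely illustrative - no such scheduler existed in 2.4:

```c
/* Toy model of the deadline policy sketched above: softirq work may
   run for up to one tick; if it would starve a runnable process, it
   is throttled to at most half the CPU.  All names are invented. */

/* returns how many of 'total_ticks' ticks softirq work gets, assuming
   it always has pending work and 'process_runnable' says whether a
   process is competing for the CPU */
static int softirq_share_model(int total_ticks, int process_runnable)
{
    int softirq_ticks = 0, t;
    int softirq_turn = 1;          /* softirq starts with low latency */

    for (t = 0; t < total_ticks; t++) {
        if (!process_runnable) {
            softirq_ticks++;       /* no starvation possible: run freely */
            continue;
        }
        if (softirq_turn)
            softirq_ticks++;       /* softirq gets its 1/HZ slice */
        /* alternate: the process gets the next tick, bounding softirq
           to half the CPU and its latency to a single tick */
        softirq_turn = !softirq_turn;
    }
    return softirq_ticks;
}
```

The point of the model is the shape of the guarantee, not the numbers: an unloaded box keeps the 'process softirqs immediately' behavior Ingo wants, while an overloaded one degrades to a fixed share instead of an arbitrary restart count.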
* Re: [patch] softirq performance fixes, cleanups, 2.4.10. 2001-09-29 16:35 ` kuznet @ 2001-09-30 9:37 ` Kai Henningsen 0 siblings, 0 replies; 44+ messages in thread From: Kai Henningsen @ 2001-09-30 9:37 UTC (permalink / raw) To: kuznet; +Cc: linux-kernel, mingo kuznet@ms2.inr.ac.ru wrote on 29.09.01 in <200109291635.UAA17377@ms2.inr.ac.ru>: > Essentially, if we invent some real condition when softirq loop must be > stopped, we win. Now it is "stop after all events pending at start are > processed". OK, it is wrong. You propose: "10 rounds". It is even worse > because it is against plain logic. (pardon :-)). The way I understand it from this thread, the condition is *not* "10 rounds"; it is "no more new softirqs are pending when we want to leave" - except it does a sort of emergency abort if 10 rounds were not enough to reach that condition. MfG Kai ^ permalink raw reply [flat|nested] 44+ messages in thread
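Kai's reading - "no new softirqs pending" as the normal exit, ten rounds only as an emergency abort - matches the MAX_SOFTIRQ_RESTART loop in Ingo's patch. A minimal userspace model of that control flow (simulated; `work` stands in for how many times hardware reactivates the softirq, and `*deferred` marks the case where the real code would wake ksoftirqd):

```c
#include <assert.h>

#define MAX_SOFTIRQ_RESTART 10

/* Returns the number of processing rounds executed. */
static int model_do_softirq(int work, int *deferred)
{
	int max_restart = MAX_SOFTIRQ_RESTART;
	int rounds = 0;
	int pending = (work-- > 0);

	*deferred = 0;
	while (pending) {
		rounds++;			/* handle everything pending now */
		pending = (work-- > 0);		/* new work arrived meanwhile? */
		if (pending && --max_restart == 0) {
			*deferred = 1;		/* emergency abort after 10 rounds */
			break;
		}
	}
	return rounds;
}
```

With a small amount of reactivation the loop exits through the normal "nothing pending" condition; only a sustained flood trips the restart limit.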
* Re: [patch] softirq performance fixes, cleanups, 2.4.10. 2001-09-28 19:23 ` kuznet 2001-09-28 19:48 ` Ingo Molnar @ 2001-09-30 9:01 ` Ingo Molnar 1 sibling, 0 replies; 44+ messages in thread From: Ingo Molnar @ 2001-09-30 9:01 UTC (permalink / raw) To: Alexey Kuznetsov Cc: Linus Torvalds, linux-kernel, Alan Cox, bcrl, Andrea Arcangeli On Fri, 28 Sep 2001 kuznet@ms2.inr.ac.ru wrote: > Argh... I see, what you mean. It is not priority, it is bug introduced > in 2.4.7. (By you? :-)) [...] not by me. the 2.4.6 softirq changes/cleanups (softirq_pending cleanups, restart softirqs in various places, etc.) are the ones from me (the loop until done stuff). 2.4.7 contains the ksoftirqd related changes/cleanups. Ingo ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10. 2001-09-28 18:28 ` Ingo Molnar 2001-09-28 19:23 ` kuznet @ 2001-09-28 19:39 ` kuznet 2001-09-28 20:03 ` Ingo Molnar 1 sibling, 1 reply; 44+ messages in thread From: kuznet @ 2001-09-28 19:39 UTC (permalink / raw) To: mingo; +Cc: torvalds, linux-kernel, alan, bcrl, andrea Hello!

> unless you have strong arguments against this approach, i will start
> coding this. It's a pretty intrusive change, because all current softirq
> users will have to agree on a generic event format + callback that can be
> queued. This has to be embedded into skbs and bh-handling structs. What do
> you think?

Why skbs? Anyway, a scheme sort of:

	do_softirq()
	{
		start = jiffies;

		while (dequeue()) {
			if (jiffies - start > 1)
				goto wakeup_ksoftirq;
			process();
		}
	}

with raise_softirq() enqueuing the event at the tail, if it is not already queued, is nice, not intrusive and very easy. Only "skbs" scares me. :-)

Alexey ^ permalink raw reply [flat|nested] 44+ messages in thread
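The "enqueue at the tail, if it is not already queued" half of Alexey's scheme amounts to an intrusive FIFO with idempotent insertion. A minimal userspace sketch (struct `ev` and the `model_*` names are invented; unlike the real thing, this has no irq-safe locking):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event node: raise appends to the tail only if the
 * event is not already on the queue. */
struct ev {
	struct ev *next;
	int queued;
};

static struct ev *head, *tail;

static void model_raise(struct ev *e)
{
	if (e->queued)
		return;			/* already pending: nothing to do */
	e->queued = 1;
	e->next = NULL;
	if (tail)
		tail->next = e;
	else
		head = e;
	tail = e;
}

static struct ev *model_dequeue(void)
{
	struct ev *e = head;

	if (!e)
		return NULL;
	head = e->next;
	if (!head)
		tail = NULL;
	e->queued = 0;
	return e;
}
```

The `queued` flag is what keeps a re-raised event from being queued twice, so the pending work is bounded by the number of distinct event sources.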
* Re: [patch] softirq performance fixes, cleanups, 2.4.10. 2001-09-28 19:39 ` kuznet @ 2001-09-28 20:03 ` Ingo Molnar 0 siblings, 0 replies; 44+ messages in thread From: Ingo Molnar @ 2001-09-28 20:03 UTC (permalink / raw) To: Alexey Kuznetsov Cc: Linus Torvalds, linux-kernel, Alan Cox, bcrl, Andrea Arcangeli On Fri, 28 Sep 2001 kuznet@ms2.inr.ac.ru wrote: > Why skbs? because i did not mean to queue 'softirq bits'. the current bitmap is perfectly okay and equivalent to any queued scheme. (i'd say queueing those bits is pretty stupid and only adds overhead. It might make sense once there are more softirqs, but that can wait.) what i really meant to queue is something much more fine-grained: to queue softirq *events*. The ones that get queued by netif_rx into softnet_data[this_cpu]'s input_pkt_queue. We basically have the skb as the 'event context', and we have a number of queues that are used by hardirqs to queue work towards softirq processing. do you see the whole scope of this approach? It has a number of disadvantages (some of which i outlined in the previous mail), but it also has a number of (i think important) advantages. It's also in fact simpler, once implemented. but this means that some of the queueing, which is softnet_data & skb pointers now, has to be generalized. netif_rx would use this generic interface to push work towards softirq processing, and the generic softirq engine would use a callback in the event data structure to do the actual processing of the event. eg. a fastroute packet could set its event handler to be a function that does dev_queue_xmit. Other packets could set their event handlers based on their ->dev. The current demultiplexing done in net_rx_action could be done in a more natural way, and eg. fastrouting would not add runtime overhead. another effect: i think it might be useful to preserve micro-ordering of rx and tx events.
It's certainly the best way to get the most out of existing cache footprint, but the downside is that it has a higher 'environment' cache footprint, due to rx and tx functions being called at high frequencies and being intermixed randomly. > Only "skbs" scares me. :-) well :) Ingo ^ permalink raw reply [flat|nested] 44+ messages in thread
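A rough userspace sketch of the generic-event idea Ingo outlines above: each queued unit of work carries its own callback, so the dispatch loop needs no per-subsystem demultiplexing. The `struct sevent` / `dispatch_events` names are invented; in the actual proposal the event would be embedded in the skb, and the handler could be e.g. a function doing dev_queue_xmit for a fastroute packet:

```c
#include <assert.h>
#include <stddef.h>

struct sevent {
	struct sevent *next;
	void (*handler)(struct sevent *);	/* per-event callback */
};

static int rx_handled, tx_handled;

static void rx_handler(struct sevent *e) { (void)e; rx_handled++; }
static void tx_handler(struct sevent *e) { (void)e; tx_handled++; }

/* Drain the list in order - this is what preserves the
 * micro-ordering of rx and tx events Ingo mentions. */
static void dispatch_events(struct sevent *list)
{
	while (list) {
		struct sevent *e = list;

		list = e->next;
		e->handler(e);		/* demultiplex via the event itself */
	}
}
```

The design trade-off discussed in the thread is visible here: the dispatcher is trivial and ordering falls out for free, but every event pays an indirect call, and rx/tx handlers get intermixed in the instruction cache.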
* Re: [patch] softirq performance fixes, cleanups, 2.4.10. 2001-09-28 17:56 ` kuznet 2001-09-28 18:28 ` Ingo Molnar @ 2001-09-28 18:51 ` Benjamin LaHaise 1 sibling, 0 replies; 44+ messages in thread From: Benjamin LaHaise @ 2001-09-28 18:51 UTC (permalink / raw) To: kuznet; +Cc: mingo, torvalds, linux-kernel, alan, andrea On Fri, Sep 28, 2001 at 09:56:24PM +0400, kuznet@ms2.inr.ac.ru wrote: > > (Lets assume that a loop of 10 still ensures basic safety in situations > > where external load overloads the system. > > It does not, evidently. And 1 does not. But 10 is 10 times worse. 10 is not 10 times worse. The effect is much more subtle than that when you factor in the amount of work done as well as the cache state. > Ingo, I told net_rx_action() is small do_softirq() restarting not 10, > but not less than 300 times in row. The problem comes from net_rx_action doing less than the allowable amount of work if another rx interrupt comes in while softirqs were being run -- if the rx action is not repeated, the new packets are not acknowledged until tens of ms later, which is a *lot* of data in the future. Running the logic via ksoftirqd alone results in a significant cache hit just for the context switch (our stacks and task structs all map to the *SAME* L1 cache lines on many processors), a cost that softirqs run in-context do not incur. A blanket ban on repeats generates more overhead than it saves: it increases the amount of work that needs to be done during overload. -ben ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10. 2001-09-28 16:18 ` [patch] softirq performance fixes, cleanups, 2.4.10 kuznet 2001-09-28 16:31 ` Ingo Molnar @ 2001-09-28 16:32 ` Andrea Arcangeli 2001-09-28 16:46 ` Ingo Molnar 2001-09-28 16:35 ` [patch] softirq-2.4.10-B3 Ingo Molnar 2001-09-29 11:03 ` [patch] softirq performance fixes, cleanups, 2.4.10 Rusty Russell 3 siblings, 1 reply; 44+ messages in thread From: Andrea Arcangeli @ 2001-09-28 16:32 UTC (permalink / raw) To: kuznet; +Cc: mingo, torvalds, linux-kernel, alan, bcrl On Fri, Sep 28, 2001 at 08:18:20PM +0400, kuznet@ms2.inr.ac.ru wrote: > Hello! > > > - removed 'mask' handling from do_softirq() - it's unnecessary due to the > > restarts. this further simplifies the code. > > Ingo, but this means that only the first softirq is handled. > "mask" implements round-robin and this is necessary. he's allowing the loop to repeat more than once to hide it. To do the "mask" with repetition correctly we'd need a per-softirq counter, not just a bitmask, so it wouldn't be handy to allocate on the stack, but it's nothing unfixable. However I also preferred the previous behaviour; I think it was much nicer for general-purpose (non-specweb99, non-gigabit) scenarios. > > - '[ksoftirqd_CPU0]' is confusing on UP systems, changed it to > > '[ksoftirqd]' instead. > > It is useless to argue about preferences, but universal naming scheme > looks as less confusing yet. :-) Agreed. > Generally, I dislike this patch. It is utterly ugly. I also dislike it overall, but I can see why it improves performance, and the deschedule thing makes sense for the flooding case. I would be very comfortable merging only the deschedule part, and this is why I asked Ingo if he could measure the difference. Andrea ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10. 2001-09-28 16:32 ` Andrea Arcangeli @ 2001-09-28 16:46 ` Ingo Molnar 2001-09-28 16:58 ` Andrea Arcangeli 0 siblings, 1 reply; 44+ messages in thread From: Ingo Molnar @ 2001-09-28 16:46 UTC (permalink / raw) To: Andrea Arcangeli Cc: Alexey Kuznetsov, Linus Torvalds, linux-kernel, Alan Cox, bcrl On Fri, 28 Sep 2001, Andrea Arcangeli wrote: > he's allowing the loop to repeat more than once to hide it, [...] it's not done to 'hide' it in any way. I removed the mask method because it's redundant under the new scheme. Softirqs *cannot* be handled 'fairly': there is always going to be one that is called sooner and one that is called later, and even with the old code, their position within the bitmask decides which one goes first. > to do the "mask" with repetition correctly we'd need a per-softirq > counter, not just a bitmask so it wouldn't be handy to allocate on the > stack, but it's nothing unfixable. with a repetition count of 10 there is no difference, and no reason to add the mask mechanism again. (In fact it will be statistically more fair, because we 'interleave' the softirqs in a fair pattern.) to make it easier to compare, here are verbal descriptions of the two mechanisms. (for those who have *cough* trouble understanding the patch.) the old code's "mask method": it starts processing softirqs in strict order. It will reprocess any softirq (in strict order again) that got reactivated meanwhile, but it will only restart for softirqs that were not processed yet during this run. it's not round-robin, and it's not fair in any way; there is a strict priority between softirqs, with the additional twist of processing softirqs that got activated meanwhile. This concept is completely meaningless: it neither tries to process all pending softirqs, nor does it try to implement some simple policy, like 'process all softirqs that are active *now* and *once*'.
the new method does a simple loop (in priority order, first HI, then TX, then RX and LO) over all currently active softirqs, then it does a (limited) loop over all of them again if some softirq got activated while these were processed. Ingo ^ permalink raw reply [flat|nested] 44+ messages in thread
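For concreteness, the old "mask method" described above can be captured in a small bitmask simulation. This is illustrative only: `newly` models softirqs raised by interrupts during processing, `selfraise` models handlers that re-activate themselves once, and all names are invented:

```c
#include <assert.h>

/* Handlers run in strict bit order; a restart happens only for
 * softirqs that were not yet part of this invocation (the mask).
 * Returns the total number of handler invocations. */
static int old_mask_method(unsigned int pending, unsigned int newly,
			   unsigned int selfraise)
{
	unsigned int mask = ~pending;
	int calls = 0;

	for (;;) {
		unsigned int bit, ran = pending;

		pending = 0;
		for (bit = 1; bit && ran; bit <<= 1)
			if (ran & bit) {
				calls++;		/* strict priority order */
				ran &= ~bit;
				if (selfraise & bit) {
					pending |= bit;	/* re-raised itself */
					selfraise &= ~bit;
				}
			}
		pending |= newly;	/* softirqs raised by irqs meanwhile */
		newly = 0;
		if (!(pending & mask))
			break;		/* nothing *new* pending: defer rest */
		mask &= ~pending;
	}
	return calls;
}
```

The model exhibits the behaviour both descriptions agree on: a softirq that only re-raises itself is never restarted (it is already excluded by the mask), but a genuinely new softirq triggers a restart, and that restart also reprocesses any re-raised ones.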
* Re: [patch] softirq performance fixes, cleanups, 2.4.10. 2001-09-28 16:46 ` Ingo Molnar @ 2001-09-28 16:58 ` Andrea Arcangeli 0 siblings, 0 replies; 44+ messages in thread From: Andrea Arcangeli @ 2001-09-28 16:58 UTC (permalink / raw) To: Ingo Molnar Cc: Alexey Kuznetsov, Linus Torvalds, linux-kernel, Alan Cox, bcrl On Fri, Sep 28, 2001 at 06:46:16PM +0200, Ingo Molnar wrote: > > On Fri, 28 Sep 2001, Andrea Arcangeli wrote: > > > he's allowing the loop to repeat more than once to hide it, [...] > > it's not done to 'hide' it in any way. I removed the mask method because > it's redundant under the new scheme. What you are missing is a property provided by the old method. We have NET_RX_SOFTIRQ flooding very heavily; so far so good. Then we have HI_SOFTIRQ, and HI_SOFTIRQ raised from an irq wants to be executed with very low latency. With your patch it _can_ be postponed to ksoftirqd processing just because the NET_RX_SOFTIRQ cpu hog is running in the background. With the old method it was guaranteed that HI_SOFTIRQ was executed with very low latency within the irq, regardless of the NET_RX_SOFTIRQ flood. So there _is_ a difference, and the multiple-loop approach will tend to "hide" that difference. Andrea ^ permalink raw reply [flat|nested] 44+ messages in thread
* [patch] softirq-2.4.10-B3 2001-09-28 16:18 ` [patch] softirq performance fixes, cleanups, 2.4.10 kuznet 2001-09-28 16:31 ` Ingo Molnar 2001-09-28 16:32 ` Andrea Arcangeli @ 2001-09-28 16:35 ` Ingo Molnar 2001-09-29 0:40 ` J . A . Magallon 2001-09-29 11:03 ` [patch] softirq performance fixes, cleanups, 2.4.10 Rusty Russell 3 siblings, 1 reply; 44+ messages in thread From: Ingo Molnar @ 2001-09-28 16:35 UTC (permalink / raw) To: Alexey Kuznetsov Cc: Linus Torvalds, linux-kernel, Alan Cox, bcrl, Andrea Arcangeli [-- Attachment #1: Type: TEXT/PLAIN, Size: 418 bytes --] On Fri, 28 Sep 2001 kuznet@ms2.inr.ac.ru wrote: > > - '[ksoftirqd_CPU0]' is confusing on UP systems, changed it to > > '[ksoftirqd]' instead. > > It is useless to argue about preferences, but universal naming scheme > looks as less confusing yet. :-) since you are the second one to complain, i'm convinced :) reverted. i've also added back the BUG() check for smp_processor_id() == cpu. -B3 attached. Ingo [-- Attachment #2: Type: TEXT/PLAIN, Size: 28319 bytes --] --- linux/kernel/ksyms.c.orig Wed Sep 26 17:04:40 2001 +++ linux/kernel/ksyms.c Wed Sep 26 17:04:48 2001 @@ -538,8 +538,6 @@ EXPORT_SYMBOL(tasklet_kill); EXPORT_SYMBOL(__run_task_queue); EXPORT_SYMBOL(do_softirq); -EXPORT_SYMBOL(raise_softirq); -EXPORT_SYMBOL(cpu_raise_softirq); EXPORT_SYMBOL(__tasklet_schedule); EXPORT_SYMBOL(__tasklet_hi_schedule); --- linux/kernel/sched.c.orig Wed Sep 26 17:04:40 2001 +++ linux/kernel/sched.c Wed Sep 26 17:04:48 2001 @@ -366,6 +366,28 @@ } /** + * unwakeup - undo wakeup if possible. + * @p: task + * @state: new task state + * + * Undo a previous wakeup of the specified task - if the process + * is not running already. The main interface to be used is + * unwakeup_process(), it will do a lockless test whether the task + * is on the runqueue. 
+ */ +void __unwakeup_process(struct task_struct * p, long state) +{ + unsigned long flags; + + spin_lock_irqsave(&runqueue_lock, flags); + if (!p->has_cpu && (p != current) && task_on_runqueue(p)) { + del_from_runqueue(p); + p->state = state; + } + spin_unlock_irqrestore(&runqueue_lock, flags); +} + +/** * schedule_timeout - sleep until timeout * @timeout: timeout value in jiffies * --- linux/kernel/softirq.c.orig Wed Sep 26 17:04:40 2001 +++ linux/kernel/softirq.c Fri Sep 28 18:16:21 2001 @@ -58,12 +58,35 @@ wake_up_process(tsk); } +/* + * If a softirqs were fully handled after ksoftirqd was woken + * up then try to undo the wakeup. + */ +static inline void unwakeup_softirqd(unsigned cpu) +{ + struct task_struct * tsk = ksoftirqd_task(cpu); + + if (tsk) + unwakeup_process(tsk, TASK_INTERRUPTIBLE); +} + +/* + * We restart softirq processing MAX_SOFTIRQ_RESTART times, + * and we fall back to softirqd after that. + * + * This number has been established via experimentation. + * The two things to balance is latency against fairness - + * we want to handle softirqs as soon as possible, but they + * should not be able to lock up the box. + */ +#define MAX_SOFTIRQ_RESTART 10 + asmlinkage void do_softirq() { + int max_restart = MAX_SOFTIRQ_RESTART; int cpu = smp_processor_id(); __u32 pending; long flags; - __u32 mask; if (in_interrupt()) return; @@ -75,7 +98,6 @@ if (pending) { struct softirq_action *h; - mask = ~pending; local_bh_disable(); restart: /* Reset the pending bitmask before enabling irqs */ @@ -95,55 +117,37 @@ local_irq_disable(); pending = softirq_pending(cpu); - if (pending & mask) { - mask &= ~pending; + if (pending && --max_restart && (current->need_resched != 1)) goto restart; - } __local_bh_enable(); if (pending) + /* + * In the normal case ksoftirqd is rarely activated, + * increased scheduling hurts performance. 
+ * It's a safety measure: if external load starts + * to flood the system with softirqs then we + * will mitigate softirq work to the softirq thread. + */ wakeup_softirqd(cpu); + else + /* + * All softirqs are handled - undo any possible + * wakeup of softirqd. This reduces context switch + * overhead. + */ + unwakeup_softirqd(cpu); } local_irq_restore(flags); } -/* - * This function must run with irq disabled! - */ -inline void cpu_raise_softirq(unsigned int cpu, unsigned int nr) -{ - __cpu_raise_softirq(cpu, nr); - - /* - * If we're in an interrupt or bh, we're done - * (this also catches bh-disabled code). We will - * actually run the softirq once we return from - * the irq or bh. - * - * Otherwise we wake up ksoftirqd to make sure we - * schedule the softirq soon. - */ - if (!(local_irq_count(cpu) | local_bh_count(cpu))) - wakeup_softirqd(cpu); -} - -void raise_softirq(unsigned int nr) -{ - long flags; - - local_irq_save(flags); - cpu_raise_softirq(smp_processor_id(), nr); - local_irq_restore(flags); -} - void open_softirq(int nr, void (*action)(struct softirq_action*), void *data) { softirq_vec[nr].data = data; softirq_vec[nr].action = action; } - /* Tasklets */ struct tasklet_head tasklet_vec[NR_CPUS] __cacheline_aligned; @@ -157,8 +161,9 @@ local_irq_save(flags); t->next = tasklet_vec[cpu].list; tasklet_vec[cpu].list = t; - cpu_raise_softirq(cpu, TASKLET_SOFTIRQ); + __cpu_raise_softirq(cpu, TASKLET_SOFTIRQ); local_irq_restore(flags); + rerun_softirqs(cpu); } void __tasklet_hi_schedule(struct tasklet_struct *t) @@ -169,8 +174,9 @@ local_irq_save(flags); t->next = tasklet_hi_vec[cpu].list; tasklet_hi_vec[cpu].list = t; - cpu_raise_softirq(cpu, HI_SOFTIRQ); + __cpu_raise_softirq(cpu, HI_SOFTIRQ); local_irq_restore(flags); + rerun_softirqs(cpu); } static void tasklet_action(struct softirq_action *a) @@ -241,7 +247,6 @@ } } - void tasklet_init(struct tasklet_struct *t, void (*func)(unsigned long), unsigned long data) { @@ -268,8 +273,6 @@ 
clear_bit(TASKLET_STATE_SCHED, &t->state); } - - /* Old style BHs */ static void (*bh_base[32])(void); @@ -325,7 +328,7 @@ { int i; - for (i=0; i<32; i++) + for (i = 0; i < 32; i++) tasklet_init(bh_task_vec+i, bh_action, i); open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL); @@ -361,38 +364,41 @@ static int ksoftirqd(void * __bind_cpu) { - int bind_cpu = *(int *) __bind_cpu; - int cpu = cpu_logical_map(bind_cpu); + int cpu = cpu_logical_map((int)__bind_cpu); daemonize(); - current->nice = 19; + sigfillset(¤t->blocked); /* Migrate to the right CPU */ - current->cpus_allowed = 1UL << cpu; - while (smp_processor_id() != cpu) - schedule(); - - sprintf(current->comm, "ksoftirqd_CPU%d", bind_cpu); + current->cpus_allowed = 1 << cpu; - __set_current_state(TASK_INTERRUPTIBLE); - mb(); + sprintf(current->comm, "ksoftirqd CPU%d", cpu); + current->nice = 19; + schedule(); ksoftirqd_task(cpu) = current; - for (;;) { - if (!softirq_pending(cpu)) - schedule(); - - __set_current_state(TASK_RUNNING); + if (smp_processor_id() != cpu) + BUG(); + for (;;) { while (softirq_pending(cpu)) { do_softirq(); if (current->need_resched) - schedule(); + goto preempt; } __set_current_state(TASK_INTERRUPTIBLE); + /* This has to be here to make the test IRQ-correct. 
*/ + barrier(); + if (!softirq_pending(cpu)) { +preempt: + if (current->counter > 1) + current->counter = 1; + schedule(); + } + __set_current_state(TASK_RUNNING); } } @@ -400,17 +406,10 @@ { int cpu; - for (cpu = 0; cpu < smp_num_cpus; cpu++) { - if (kernel_thread(ksoftirqd, (void *) &cpu, + for (cpu = 0; cpu < smp_num_cpus; cpu++) + if (kernel_thread(ksoftirqd, (void *) cpu, CLONE_FS | CLONE_FILES | CLONE_SIGNAL) < 0) - printk("spawn_ksoftirqd() failed for cpu %d\n", cpu); - else { - while (!ksoftirqd_task(cpu_logical_map(cpu))) { - current->policy |= SCHED_YIELD; - schedule(); - } - } - } + BUG(); return 0; } --- linux/include/linux/netdevice.h.orig Wed Sep 26 17:04:36 2001 +++ linux/include/linux/netdevice.h Fri Sep 28 07:44:01 2001 @@ -486,8 +486,9 @@ local_irq_save(flags); dev->next_sched = softnet_data[cpu].output_queue; softnet_data[cpu].output_queue = dev; - cpu_raise_softirq(cpu, NET_TX_SOFTIRQ); + __cpu_raise_softirq(cpu, NET_TX_SOFTIRQ); local_irq_restore(flags); + rerun_softirqs(cpu); } } @@ -535,8 +536,9 @@ local_irq_save(flags); skb->next = softnet_data[cpu].completion_queue; softnet_data[cpu].completion_queue = skb; - cpu_raise_softirq(cpu, NET_TX_SOFTIRQ); + __cpu_raise_softirq(cpu, NET_TX_SOFTIRQ); local_irq_restore(flags); + rerun_softirqs(cpu); } } --- linux/include/linux/interrupt.h.orig Wed Sep 26 17:04:40 2001 +++ linux/include/linux/interrupt.h Fri Sep 28 07:44:01 2001 @@ -74,9 +74,14 @@ asmlinkage void do_softirq(void); extern void open_softirq(int nr, void (*action)(struct softirq_action*), void *data); extern void softirq_init(void); -#define __cpu_raise_softirq(cpu, nr) do { softirq_pending(cpu) |= 1UL << (nr); } while (0) -extern void FASTCALL(cpu_raise_softirq(unsigned int cpu, unsigned int nr)); -extern void FASTCALL(raise_softirq(unsigned int nr)); +#define __cpu_raise_softirq(cpu, nr) \ + do { softirq_pending(cpu) |= 1UL << (nr); } while (0) + +#define rerun_softirqs(cpu) \ +do { \ + if (!(local_irq_count(cpu) | 
local_bh_count(cpu))) \ + do_softirq(); \ +} while (0); --- linux/include/linux/sched.h.orig Wed Sep 26 17:04:40 2001 +++ linux/include/linux/sched.h Fri Sep 28 07:44:01 2001 @@ -556,6 +556,7 @@ extern void FASTCALL(__wake_up(wait_queue_head_t *q, unsigned int mode, int nr)); extern void FASTCALL(__wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr)); +extern void FASTCALL(__unwakeup_process(struct task_struct * p, long state)); extern void FASTCALL(sleep_on(wait_queue_head_t *q)); extern long FASTCALL(sleep_on_timeout(wait_queue_head_t *q, signed long timeout)); @@ -574,6 +575,13 @@ #define wake_up_interruptible_all(x) __wake_up((x),TASK_INTERRUPTIBLE, 0) #define wake_up_interruptible_sync(x) __wake_up_sync((x),TASK_INTERRUPTIBLE, 1) #define wake_up_interruptible_sync_nr(x) __wake_up_sync((x),TASK_INTERRUPTIBLE, nr) + +#define unwakeup_process(tsk,state) \ +do { \ + if (task_on_runqueue(tsk)) \ + __unwakeup_process(tsk,state); \ +} while (0) + asmlinkage long sys_wait4(pid_t pid,unsigned int * stat_addr, int options, struct rusage * ru); extern int in_group_p(gid_t); --- linux/include/asm-mips/softirq.h.orig Wed Sep 26 20:58:00 2001 +++ linux/include/asm-mips/softirq.h Wed Sep 26 20:58:07 2001 @@ -40,6 +40,4 @@ #define in_softirq() (local_bh_count(smp_processor_id()) != 0) -#define __cpu_raise_softirq(cpu, nr) set_bit(nr, &softirq_pending(cpu)) - #endif /* _ASM_SOFTIRQ_H */ --- linux/include/asm-mips64/softirq.h.orig Wed Sep 26 20:58:20 2001 +++ linux/include/asm-mips64/softirq.h Wed Sep 26 20:58:26 2001 @@ -39,19 +39,4 @@ #define in_softirq() (local_bh_count(smp_processor_id()) != 0) -extern inline void __cpu_raise_softirq(int cpu, int nr) -{ - unsigned int *m = (unsigned int *) &softirq_pending(cpu); - unsigned int temp; - - __asm__ __volatile__( - "1:\tll\t%0, %1\t\t\t# __cpu_raise_softirq\n\t" - "or\t%0, %2\n\t" - "sc\t%0, %1\n\t" - "beqz\t%0, 1b" - : "=&r" (temp), "=m" (*m) - : "ir" (1UL << nr), "m" (*m) - : "memory"); -} - #endif /* _ASM_SOFTIRQ_H 
*/ --- linux/net/core/dev.c.orig Wed Sep 26 17:04:41 2001 +++ linux/net/core/dev.c Wed Sep 26 17:04:48 2001 @@ -1218,8 +1218,9 @@ dev_hold(skb->dev); __skb_queue_tail(&queue->input_pkt_queue,skb); /* Runs from irqs or BH's, no need to wake BH */ - cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ); + __cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ); local_irq_restore(flags); + rerun_softirqs(this_cpu); #ifndef OFFLINE_SAMPLE get_sample_stats(this_cpu); #endif @@ -1529,8 +1530,9 @@ local_irq_disable(); netdev_rx_stat[this_cpu].time_squeeze++; /* This already runs in BH context, no need to wake up BH's */ - cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ); + __cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ); local_irq_enable(); + rerun_softirqs(this_cpu); NET_PROFILE_LEAVE(softnet_process); return; --- linux/linux/include/linux/netdevice.h.orig Wed Sep 26 21:14:55 2001 +++ linux/linux/include/linux/netdevice.h Thu Sep 27 07:51:00 2001 @@ -390,6 +390,8 @@ unsigned char *haddr); int (*neigh_setup)(struct net_device *dev, struct neigh_parms *); int (*accept_fastpath)(struct net_device *, struct dst_entry*); +#define HAVE_POLL_CONTROLLER + void (*poll_controller)(struct net_device *dev); /* open/release and usage marking */ struct module *owner; --- linux/linux/drivers/net/tulip/tulip_core.c.orig Wed Sep 26 21:14:58 2001 +++ linux/linux/drivers/net/tulip/tulip_core.c Fri Sep 28 09:23:22 2001 @@ -243,6 +243,7 @@ static struct net_device_stats *tulip_get_stats(struct net_device *dev); static int private_ioctl(struct net_device *dev, struct ifreq *rq, int cmd); static void set_rx_mode(struct net_device *dev); +static void poll_tulip(struct net_device *dev); @@ -1671,6 +1672,9 @@ dev->get_stats = tulip_get_stats; dev->do_ioctl = private_ioctl; dev->set_multicast_list = set_rx_mode; +#ifdef HAVE_POLL_CONTROLLER + dev->poll_controller = &poll_tulip; +#endif if (register_netdev(dev)) goto err_out_free_ring; @@ -1839,6 +1843,24 @@ /* pci_power_off (pdev, -1); */ } + + +#ifdef HAVE_POLL_CONTROLLER 
+ +/* + * Polling 'interrupt' - used by things like netconsole to send skbs + * without having to re-enable interrupts. It's not called while + * the interrupt routine is executing. + */ + +static void poll_tulip (struct net_device *dev) +{ + disable_irq(dev->irq); + tulip_interrupt (dev->irq, dev, NULL); + enable_irq(dev->irq); +} + +#endif static struct pci_driver tulip_driver = { --- linux/linux/drivers/net/eepro100.c.orig Wed Sep 26 21:14:58 2001 +++ linux/linux/drivers/net/eepro100.c Thu Sep 27 08:11:41 2001 @@ -542,6 +542,7 @@ static int speedo_rx(struct net_device *dev); static void speedo_tx_buffer_gc(struct net_device *dev); static void speedo_interrupt(int irq, void *dev_instance, struct pt_regs *regs); +static void poll_speedo (struct net_device *dev); static int speedo_close(struct net_device *dev); static struct net_device_stats *speedo_get_stats(struct net_device *dev); static int speedo_ioctl(struct net_device *dev, struct ifreq *rq, int cmd); @@ -837,6 +838,9 @@ dev->get_stats = &speedo_get_stats; dev->set_multicast_list = &set_rx_mode; dev->do_ioctl = &speedo_ioctl; +#ifdef HAVE_POLL_CONTROLLER + dev->poll_controller = &poll_speedo; +#endif return 0; } @@ -1594,6 +1598,23 @@ clear_bit(0, (void*)&sp->in_interrupt); return; } + +#ifdef HAVE_POLL_CONTROLLER + +/* + * Polling 'interrupt' - used by things like netconsole to send skbs + * without having to re-enable interrupts. It's not called while + * the interrupt routine is executing. 
+ */ + +static void poll_speedo (struct net_device *dev) +{ + disable_irq(dev->irq); + speedo_interrupt (dev->irq, dev, NULL); + enable_irq(dev->irq); +} + +#endif static inline struct RxFD *speedo_rx_alloc(struct net_device *dev, int entry) { --- linux/linux/drivers/net/netconsole.c.orig Wed Sep 26 21:15:15 2001 +++ linux/linux/drivers/net/netconsole.c Thu Sep 27 16:12:10 2001 @@ -0,0 +1,331 @@ +/* + * linux/drivers/net/netconsole.c + * + * Copyright (C) 2001 Ingo Molnar <mingo@redhat.com> + * + * This file contains the implementation of an IRQ-safe, crash-safe + * kernel console implementation that outputs kernel messages to the + * network. + * + * Modification history: + * + * 2001-09-17 started by Ingo Molnar. + */ + +/**************************************************************** + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
+ * + ****************************************************************/ + +#include <net/tcp.h> +#include <net/udp.h> +#include <linux/mm.h> +#include <linux/tty.h> +#include <linux/init.h> +#include <linux/module.h> +#include <asm/unaligned.h> +#include <linux/console.h> +#include <linux/smp_lock.h> +#include <linux/netdevice.h> +#include <linux/tty_driver.h> +#include <linux/etherdevice.h> + +static struct net_device *netconsole_dev; +static u16 source_port, target_port; +static u32 source_ip, target_ip; +static unsigned char daddr[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff} ; + +#define NETCONSOLE_VERSION 0x01 +#define HEADER_LEN 5 + +#define MAX_UDP_CHUNK 1460 +#define MAX_PRINT_CHUNK (MAX_UDP_CHUNK-HEADER_LEN) + +/* + * We maintain a small pool of fully-sized skbs, + * to make sure the message gets out even in + * extreme OOM situations. + */ +#define MAX_NETCONSOLE_SKBS 32 + +static spinlock_t netconsole_lock = SPIN_LOCK_UNLOCKED; +static int nr_netconsole_skbs; +static struct sk_buff *netconsole_skbs; + +#define MAX_SKB_SIZE \ + (MAX_UDP_CHUNK + sizeof(struct udphdr) + \ + sizeof(struct iphdr) + sizeof(struct ethhdr)) + +static void __refill_netconsole_skbs(void) +{ + struct sk_buff *skb; + unsigned long flags; + + spin_lock_irqsave(&netconsole_lock, flags); + while (nr_netconsole_skbs < MAX_NETCONSOLE_SKBS) { + skb = alloc_skb(MAX_SKB_SIZE, GFP_ATOMIC); + if (!skb) + break; + if (netconsole_skbs) + skb->next = netconsole_skbs; + else + skb->next = NULL; + netconsole_skbs = skb; + nr_netconsole_skbs++; + } + spin_unlock_irqrestore(&netconsole_lock, flags); +} + +static struct sk_buff * get_netconsole_skb(void) +{ + struct sk_buff *skb; + + unsigned long flags; + + spin_lock_irqsave(&netconsole_lock, flags); + skb = netconsole_skbs; + if (skb) + netconsole_skbs = skb->next; + skb->next = NULL; + nr_netconsole_skbs--; + spin_unlock_irqrestore(&netconsole_lock, flags); + + return skb; +} + +static spinlock_t sequence_lock = SPIN_LOCK_UNLOCKED; +static unsigned 
int offset;
+
+static void send_netconsole_skb(struct net_device *dev, const char *msg, unsigned int msg_len)
+{
+	int total_len, eth_len, ip_len, udp_len;
+	unsigned long flags;
+	struct sk_buff *skb;
+	struct udphdr *udph;
+	struct iphdr *iph;
+	struct ethhdr *eth;
+
+	udp_len = msg_len + HEADER_LEN + sizeof(*udph);
+	ip_len = eth_len = udp_len + sizeof(*iph);
+	total_len = eth_len + ETH_HLEN;
+
+	if (nr_netconsole_skbs < MAX_NETCONSOLE_SKBS)
+		__refill_netconsole_skbs();
+
+	skb = alloc_skb(total_len, GFP_ATOMIC);
+	if (!skb) {
+		skb = get_netconsole_skb();
+		if (!skb)
+			/* tough! */
+			return;
+	}
+
+	atomic_set(&skb->users, 1);
+	skb_reserve(skb, total_len - msg_len - HEADER_LEN);
+	skb->data[0] = NETCONSOLE_VERSION;
+
+	spin_lock_irqsave(&sequence_lock, flags);
+	put_unaligned(htonl(offset), (u32 *) (skb->data + 1));
+	offset += msg_len;
+	spin_unlock_irqrestore(&sequence_lock, flags);
+
+	memcpy(skb->data + HEADER_LEN, msg, msg_len);
+	skb->len += msg_len + HEADER_LEN;
+
+	udph = (struct udphdr *) skb_push(skb, sizeof(*udph));
+	udph->source = source_port;
+	udph->dest = target_port;
+	udph->len = htons(udp_len);
+	udph->check = 0;
+
+	iph = (struct iphdr *)skb_push(skb, sizeof(*iph));
+
+	iph->version = 4;
+	iph->ihl = 5;
+	iph->tos = 0;
+	iph->tot_len = htons(ip_len);
+	iph->id = 0;
+	iph->frag_off = 0;
+	iph->ttl = 64;
+	iph->protocol = IPPROTO_UDP;
+	iph->check = 0;
+	iph->saddr = source_ip;
+	iph->daddr = target_ip;
+	iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
+
+	eth = (struct ethhdr *) skb_push(skb, ETH_HLEN);
+
+	eth->h_proto = htons(ETH_P_IP);
+	memcpy(eth->h_source, dev->dev_addr, dev->addr_len);
+	memcpy(eth->h_dest, daddr, dev->addr_len);
+
+repeat:
+	spin_lock(&dev->xmit_lock);
+	dev->xmit_lock_owner = smp_processor_id();
+
+	if (netif_queue_stopped(dev)) {
+		dev->xmit_lock_owner = -1;
+		spin_unlock(&dev->xmit_lock);
+
+		dev->poll_controller(dev);
+		goto repeat;
+	}
+
+	dev->hard_start_xmit(skb, dev);
+
+	dev->xmit_lock_owner = -1;
+	spin_unlock(&dev->xmit_lock);
+}
+
+static void write_netconsole_msg(struct console *con, const char *msg, unsigned int msg_len)
+{
+	int len, left;
+	struct net_device *dev;
+
+	dev = netconsole_dev;
+	if (!dev)
+		return;
+
+	if (dev->poll_controller && netif_running(dev)) {
+		unsigned long flags;
+
+		__save_flags(flags);
+		__cli();
+		left = msg_len;
+repeat:
+		if (left > MAX_PRINT_CHUNK)
+			len = MAX_PRINT_CHUNK;
+		else
+			len = left;
+		send_netconsole_skb(dev, msg, len);
+		msg += len;
+		left -= len;
+		if (left)
+			goto repeat;
+		__restore_flags(flags);
+	}
+}
+
+static char *dev;
+static int target_eth_byte0 = 255;
+static int target_eth_byte1 = 255;
+static int target_eth_byte2 = 255;
+static int target_eth_byte3 = 255;
+static int target_eth_byte4 = 255;
+static int target_eth_byte5 = 255;
+
+MODULE_PARM(target_ip, "i");
+MODULE_PARM(target_eth_byte0, "i");
+MODULE_PARM(target_eth_byte1, "i");
+MODULE_PARM(target_eth_byte2, "i");
+MODULE_PARM(target_eth_byte3, "i");
+MODULE_PARM(target_eth_byte4, "i");
+MODULE_PARM(target_eth_byte5, "i");
+MODULE_PARM(source_port, "h");
+MODULE_PARM(target_port, "h");
+MODULE_PARM(dev, "s");
+
+static struct console netconsole =
+	{ flags: CON_ENABLED, write: write_netconsole_msg };
+
+static int init_netconsole(void)
+{
+	struct net_device *ndev = NULL;
+	struct in_device *in_dev;
+
+	printk(KERN_INFO "netconsole: using network device <%s>\n", dev);
+	// this will be valid once the device goes up.
+	if (dev)
+		ndev = dev_get_by_name(dev);
+	if (!ndev) {
+		printk(KERN_ERR "netconsole: network device %s does not exist, aborting.\n", dev);
+		return -1;
+	}
+	if (!ndev->poll_controller) {
+		printk(KERN_ERR "netconsole: %s's network driver does not implement netlogging yet, aborting.\n", dev);
+		return -1;
+	}
+	in_dev = in_dev_get(ndev);
+	if (!in_dev) {
+		printk(KERN_ERR "netconsole: network device %s is not an IP protocol device, aborting.\n", dev);
+		return -1;
+	}
+	source_ip = ntohl(in_dev->ifa_list->ifa_local);
+	if (!source_ip) {
+		printk(KERN_ERR "netconsole: network device %s has no local address, aborting.\n", dev);
+		return -1;
+	}
+#define IP(x) ((char *)&source_ip)[x]
+	printk(KERN_INFO "netconsole: using source IP %i.%i.%i.%i\n",
+		IP(3), IP(2), IP(1), IP(0));
+#undef IP
+	source_ip = htonl(source_ip);
+	if (!target_ip) {
+		printk(KERN_ERR "netconsole: target_ip parameter not specified, aborting.\n");
+		return -1;
+	}
+#define IP(x) ((char *)&target_ip)[x]
+	printk(KERN_INFO "netconsole: using target IP %i.%i.%i.%i\n",
+		IP(3), IP(2), IP(1), IP(0));
+#undef IP
+	target_ip = htonl(target_ip);
+	if (!source_port) {
+		printk(KERN_ERR "netconsole: source_port parameter not specified, aborting.\n");
+		return -1;
+	}
+	printk(KERN_INFO "netconsole: using source UDP port: %i\n", source_port);
+	source_port = htons(source_port);
+	if (!target_port) {
+		printk(KERN_ERR "netconsole: target_port parameter not specified, aborting.\n");
+		return -1;
+	}
+	printk(KERN_INFO "netconsole: using target UDP port: %i\n", target_port);
+	target_port = htons(target_port);
+
+	daddr[0] = target_eth_byte0;
+	daddr[1] = target_eth_byte1;
+	daddr[2] = target_eth_byte2;
+	daddr[3] = target_eth_byte3;
+	daddr[4] = target_eth_byte4;
+	daddr[5] = target_eth_byte5;
+
+	if ((daddr[0] & daddr[1] & daddr[2] & daddr[3] & daddr[4] & daddr[5]) == 255)
+		printk(KERN_INFO "netconsole: using broadcast ethernet frames to send packets.\n");
+	else
+		printk(KERN_INFO "netconsole: using target ethernet address %02x:%02x:%02x:%02x:%02x:%02x.\n", daddr[0], daddr[1], daddr[2], daddr[3], daddr[4], daddr[5]);
+
+	netconsole_dev = ndev;
+#define STARTUP_MSG "[...network console startup...]\n"
+	write_netconsole_msg(NULL, STARTUP_MSG, strlen(STARTUP_MSG));
+
+	register_console(&netconsole);
+	printk(KERN_INFO "netconsole: network logging started up successfully!\n");
+	return 0;
+}
+
+static void cleanup_netconsole(void)
+{
+	printk(KERN_INFO "netconsole: network logging shut down.\n");
+	unregister_console(&netconsole);
+
+#define SHUTDOWN_MSG "[...network console shutdown...]\n"
+	write_netconsole_msg(NULL, SHUTDOWN_MSG, strlen(SHUTDOWN_MSG));
+	netconsole_dev = NULL;
+}
+
+module_init(init_netconsole);
+module_exit(cleanup_netconsole);
+
+int dummy = MAX_SKB_SIZE;
--- linux/linux/drivers/net/Makefile.orig	Wed Sep 26 21:14:58 2001
+++ linux/linux/drivers/net/Makefile	Wed Sep 26 21:15:15 2001
@@ -216,6 +216,8 @@
   obj-y += ../acorn/net/acorn-net.o
 endif
 
+obj-$(CONFIG_NETCONSOLE) += netconsole.o
+
 #
 # HIPPI adapters
 #
--- linux/linux/drivers/net/Config.in.orig	Wed Sep 26 21:14:58 2001
+++ linux/linux/drivers/net/Config.in	Wed Sep 26 21:15:15 2001
@@ -248,6 +248,8 @@
    dep_tristate '  SysKonnect FDDI PCI support' CONFIG_SKFP $CONFIG_PCI
 fi
 
+dep_tristate 'Network logging support' CONFIG_NETCONSOLE
+
 if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
    if [ "$CONFIG_INET" = "y" ]; then
       bool 'HIPPI driver support (EXPERIMENTAL)' CONFIG_HIPPI
--- linux/linux/Documentation/networking/netlogging.txt.orig	Wed Sep 26 21:15:15 2001
+++ linux/linux/Documentation/networking/netlogging.txt	Wed Sep 26 21:15:15 2001
@@ -0,0 +1,46 @@
+
+started by Ingo Molnar <mingo@redhat.com>, 2001.09.17
+
+the network console can be configured over the module interface,
+by specifying the dev=, target_ip=, source_port=, target_port=
+module parameters. Sample use:
+
+insmod netconsole dev=eth0 target_ip=0x0a000701 \
+	source_port=6666 target_port=6666
+
+IP addresses must be specified in hexadecimal numbers. The
+above example uses target host IP 10.0.7.1.
+
+if the module is loaded then all kernel messages are sent to the
+target host via UDP packets. The remote host should run the
+client-side 'netconsole' daemon to display & log the messages.
+
+WARNING: the default setting uses the broadcast ethernet address
+to send packets, which can cause increased load on other systems
+on the same ethernet segment. If this is not desired, then the
+target ethernet address can be specified via the target_eth_byte0,
+target_eth_byte1 ... byte5 module parameters. Eg.:
+
+insmod netconsole dev=eth0 target_ip=0x0a000701 \
+	source_port=6666 target_port=6666 \
+	target_eth_byte0=0x00 \
+	target_eth_byte1=0x02 \
+	target_eth_byte2=0xA5 \
+	target_eth_byte3=0x13 \
+	target_eth_byte4=0x51 \
+	target_eth_byte5=0xDD
+
+will instruct the netconsole code to use target ethernet address
+00:02:A5:13:51:DD.
+
+NOTE: the network device (eth0 in the above case) can run any kind
+of other network traffic, netconsole is not intrusive. Netconsole
+might cause slight delays in other traffic if the volume of kernel
+messages is high, but should have no other impact.
+
+netconsole was designed to be as instantaneous as possible, to
+enable the logging of even the most critical kernel bugs. It works
+from IRQ contexts as well, and does not enable interrupts while
+sending packets. Due to these unique needs, configuration can not
+be more automatic, and some fundamental limitations will remain:
+only IP networks, UDP packets and ethernet devices are supported.
--- linux/linux/Documentation/Configure.help.orig	Wed Sep 26 21:14:55 2001
+++ linux/linux/Documentation/Configure.help	Wed Sep 26 21:15:15 2001
@@ -9997,6 +9997,11 @@
   say M here and read Documentation/modules.txt. This is recommended.
   The module will be called skfp.o.
 
+Network logging support
+CONFIG_NETCONSOLE
+  If you want to log kernel messages over the network, then say
+  "M" here. See Documentation/networking/netlogging.txt for details.
+
 HIgh Performance Parallel Interface support (EXPERIMENTAL)
 CONFIG_HIPPI
   HIgh Performance Parallel Interface (HIPPI) is a 800Mbit/sec and

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [patch] softirq-2.4.10-B3
  2001-09-28 16:35 ` [patch] softirq-2.4.10-B3 Ingo Molnar
@ 2001-09-29  0:40 ` J . A . Magallon
  0 siblings, 0 replies; 44+ messages in thread
From: J . A . Magallon @ 2001-09-29 0:40 UTC (permalink / raw)
To: mingo
Cc: Alexey Kuznetsov, Linus Torvalds, linux-kernel, Alan Cox, bcrl,
	Andrea Arcangeli

On 20010928 Ingo Molnar wrote:
>
>On Fri, 28 Sep 2001 kuznet@ms2.inr.ac.ru wrote:
>
>> > - '[ksoftirqd_CPU0]' is confusing on UP systems, changed it to
>> >   '[ksoftirqd]' instead.
>>
>> It is useless to argue about preferences, but universal naming scheme
>> looks as less confusing yet. :-)
>
>since you are the second one to complain, i'm convinced :) reverted.
>
>i've also added back the BUG() check for smp_processor_id() == cpu.
>
>-B3 attached.
>

Beware: netconsole slipped into the patch, and makes 'make xconfig' fail:

...
./tkparse < ../arch/i386/config.in >> kconfig.tk
drivers/net/Config.in: 251: can't handle dep_bool/dep_mbool/dep_tristate condition
make[1]: *** [kconfig.tk] Error 1
make[1]: Leaving directory `/usr/src/linux-2.4.10-bw/scripts'
make: *** [xconfig] Error 2

--
J.A. Magallon                           #  Let the source be with you...
mailto:jamagallon@able.es
Mandrake Linux release 8.2 (Cooker) for i586
Linux werewolf 2.4.10-bw #1 SMP Thu Sep 27 00:32:53 CEST 2001 i686

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
  2001-09-28 16:18 ` [patch] softirq performance fixes, cleanups, 2.4.10 kuznet
    ` (2 preceding siblings ...)
  2001-09-28 16:35 ` [patch] softirq-2.4.10-B3 Ingo Molnar
@ 2001-09-29 11:03 ` Rusty Russell
  3 siblings, 0 replies; 44+ messages in thread
From: Rusty Russell @ 2001-09-29 11:03 UTC (permalink / raw)
To: kuznet; +Cc: mingo, torvalds, linux-kernel, alan, bcrl, andrea

On Fri, 28 Sep 2001 20:18:20 +0400 (MSK DST) kuznet@ms2.inr.ac.ru wrote:
> Also, you did not assure me that you interpret problem correctly.
> netif_rx() is __insensitive__ to latencies due to blocked softirq restarts.
> It stops spinning only when it becomes true cpu hog. And scheduling ksoftirqd
> is the only variant here.

Hi Ingo et al,

Attached is the approach I sent to l-k earlier this week: if a timer
tick goes off, back off to ksoftirqd. This should, statistically, do
the right thing, even in the case of smart softirqs like netif_rx.

Please consider,
Rusty.
--- linux-pmac/kernel/softirq.c	Sun Sep  9 15:11:37 2001
+++ working-pmac-ksoftirq/kernel/softirq.c	Mon Sep 24 09:44:07 2001
@@ -63,11 +63,12 @@
 	int cpu = smp_processor_id();
 	__u32 pending;
 	long flags;
-	__u32 mask;
+	long start;
 
 	if (in_interrupt())
 		return;
 
+	start = jiffies;
 	local_irq_save(flags);
 
 	pending = softirq_pending(cpu);
@@ -75,32 +76,32 @@
 	if (pending) {
 		struct softirq_action *h;
 
-		mask = ~pending;
 		local_bh_disable();
-restart:
-		/* Reset the pending bitmask before enabling irqs */
-		softirq_pending(cpu) = 0;
+		do {
+			/* Reset the pending bitmask before enabling irqs */
+			softirq_pending(cpu) = 0;
 
-		local_irq_enable();
+			local_irq_enable();
 
-		h = softirq_vec;
+			h = softirq_vec;
 
-		do {
-			if (pending & 1)
-				h->action(h);
-			h++;
-			pending >>= 1;
-		} while (pending);
-
-		local_irq_disable();
-
-		pending = softirq_pending(cpu);
-		if (pending & mask) {
-			mask &= ~pending;
-			goto restart;
-		}
+			do {
+				if (pending & 1)
+					h->action(h);
+				h++;
+				pending >>= 1;
+			} while (pending);
+
+			local_irq_disable();
+
+			pending = softirq_pending(cpu);
+
+			/* Don't spin here forever... */
+		} while (pending && start == jiffies);
 
 		__local_bh_enable();
+	/* If a timer tick went off, assume we're overloaded,
+	   and kick in ksoftirqd */
 	if (pending)
 		wakeup_softirqd(cpu);
 	}

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
@ 2001-09-27 23:29 Oleg Nesterov
2001-09-28 0:03 ` Andrea Arcangeli
2001-09-28 6:57 ` Ingo Molnar
0 siblings, 2 replies; 44+ messages in thread
From: Oleg Nesterov @ 2001-09-27 23:29 UTC (permalink / raw)
To: linux-kernel, mingo
Hello.
I am trying to understand the basics of softirq handling.
It seems to me that ksoftirqd()'s loop can be cleaned up a bit
with the following (untested) patch on top of 2.4.10-softirq-A7.
It also removes the 'mask' variable in do_softirq().
Oleg
--- 2.4.10-softirq-A7/kernel/softirq.c.orig Thu Sep 27 22:31:06 2001
+++ 2.4.10-softirq-A7/kernel/softirq.c Thu Sep 27 22:54:37 2001
@@ -85,7 +85,7 @@
{
int max_restart = MAX_SOFTIRQ_RESTART;
int cpu = smp_processor_id();
- __u32 pending, mask;
+ __u32 pending;
long flags;
if (in_interrupt())
@@ -98,7 +98,6 @@
if (pending) {
struct softirq_action *h;
- mask = ~pending;
local_bh_disable();
restart:
/* Reset the pending bitmask before enabling irqs */
@@ -381,26 +380,22 @@
#endif
current->nice = 19;
- schedule();
- __set_current_state(TASK_INTERRUPTIBLE);
ksoftirqd_task(cpu) = current;
for (;;) {
-back:
+ schedule();
+ __set_current_state(TASK_INTERRUPTIBLE);
+
do {
do_softirq();
if (current->need_resched)
goto preempt;
} while (softirq_pending(cpu));
- schedule();
- __set_current_state(TASK_INTERRUPTIBLE);
- }
+ continue;
preempt:
- __set_current_state(TASK_RUNNING);
- schedule();
- __set_current_state(TASK_INTERRUPTIBLE);
- goto back;
+ __set_current_state(TASK_RUNNING);
+ }
}
static __init int spawn_ksoftirqd(void)
^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
  2001-09-27 23:29 Oleg Nesterov
@ 2001-09-28  0:03 ` Andrea Arcangeli
  2001-09-28  6:57 ` Ingo Molnar
  1 sibling, 0 replies; 44+ messages in thread
From: Andrea Arcangeli @ 2001-09-28 0:03 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: linux-kernel, mingo

On Fri, Sep 28, 2001 at 03:29:13AM +0400, Oleg Nesterov wrote:
> --- 2.4.10-softirq-A7/kernel/softirq.c.orig	Thu Sep 27 22:31:06 2001
> +++ 2.4.10-softirq-A7/kernel/softirq.c	Thu Sep 27 22:54:37 2001
> @@ -85,7 +85,7 @@
>  {
>  	int max_restart = MAX_SOFTIRQ_RESTART;
>  	int cpu = smp_processor_id();
> -	__u32 pending, mask;
> +	__u32 pending;
>  	long flags;
> 
>  	if (in_interrupt())
> @@ -98,7 +98,6 @@
>  	if (pending) {
>  		struct softirq_action *h;
> 
> -		mask = ~pending;
>  		local_bh_disable();
>  restart:
>  		/* Reset the pending bitmask before enabling irqs */

correct.

> @@ -381,26 +380,22 @@
>  #endif
> 
>  	current->nice = 19;
> -	schedule();
> -	__set_current_state(TASK_INTERRUPTIBLE);

buggy (check cpus_allowed).

> 	for (;;) {
> -back:
> +		schedule();
> +		__set_current_state(TASK_INTERRUPTIBLE);
> +
> 		do {
> 			do_softirq();
> 			if (current->need_resched)
> 				goto preempt;
> 		} while (softirq_pending(cpu));
> -		schedule();
> -		__set_current_state(TASK_INTERRUPTIBLE);
> -	}
> 
> +		continue;
> preempt:
> -		__set_current_state(TASK_RUNNING);
> -		schedule();
> -		__set_current_state(TASK_INTERRUPTIBLE);
> -		goto back;
> +		__set_current_state(TASK_RUNNING);
> +	}
> }

you dropped Ingo's optimization (but you resurrected the stricter /proc
statistics).

Andrea

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
  2001-09-27 23:29 Oleg Nesterov
  2001-09-28  0:03 ` Andrea Arcangeli
@ 2001-09-28  6:57 ` Ingo Molnar
  1 sibling, 0 replies; 44+ messages in thread
From: Ingo Molnar @ 2001-09-28 6:57 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: linux-kernel

On Fri, 28 Sep 2001, Oleg Nesterov wrote:

> I am trying to understand the basics of softirq handling.
>
> It seems to me that ksoftirqd()'s loop can be cleaned up a bit with
> the following (untested) patch on top of 2.4.10-softirq-A7.

thanks - the patch looks mostly correct.

> It also removes the 'mask' variable in do_softirq().

(right, applied.)

> -	schedule();
> -	__set_current_state(TASK_INTERRUPTIBLE);

we need a single schedule to migrate the thread to the right CPU. Btw,
it's not a big offense to run on the wrong CPU: do_softirq() will run
correctly. But there won't be a ksoftirqd executing on the 'right' CPU
(for this short amount of time until it schedules once), and we might
miss a single instance of softirq processing. So it's cleaner to do a
schedule() that guarantees that we migrate to the target CPU.

> +		__set_current_state(TASK_RUNNING);
> +	}

well, this basically reverts it to the original code and removes the
optimization - but i don't have any strong feelings either way. I think
Andrea is right and it's more correct to set it TASK_RUNNING, for
task-statistics reasons. Reverted it to the old code.

and in fact while doing that i found another bug in the old code:
TASK_INTERRUPTIBLE was set in a non-IRQ-atomic way, so theoretically it
could be possible that the compiler reorders the setting of
TASK_INTERRUPTIBLE and the test for softirq_pending(), and if an IRQ
gets in between then we could miss processing softirqs for 1/HZ again.
Added an optimization barrier(). (SMP consistency is not needed, since
we are per-CPU.)

	Ingo

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
@ 2001-09-28  2:50 Oleg Nesterov
  2001-09-28  7:56 ` Ingo Molnar
  0 siblings, 1 reply; 44+ messages in thread
From: Oleg Nesterov @ 2001-09-28 2:50 UTC (permalink / raw)
To: andrea, linux-kernel

Hello.

Thanks a lot for your response.

On Fri, Sep 28, 2001 at 04:03:39 +0400, Andrea Arcangeli wrote:
> > @@ -381,26 +380,22 @@
> >  #endif
> > 
> >  	current->nice = 19;
> > -	schedule();
> > -	__set_current_state(TASK_INTERRUPTIBLE);
>
> buggy (check cpus_allowed).

Why? The next three lines are:

	for (;;) {
		schedule();
		__set_current_state(TASK_INTERRUPTIBLE);

And unless I misunderstand schedule()'s still_running path:

	if (!(prev->cpus_allowed & (1UL << this_cpu)))
		goto still_running_back;

Ingo's patch was buggy as well.

> you dropped Ingo's optimization (but you resurrected the stricter /proc
> statistics).

Again, I can't understand. The new loop

	for (;;) {
		schedule();
		__set_current_state(TASK_INTERRUPTIBLE);

		do {
			do_softirq();
			if (current->need_resched)
				goto preempt;
		} while (softirq_pending(cpu));

		continue;
	preempt:
		__set_current_state(TASK_RUNNING);
	}

seems to be _equivalent_ to Ingo's ... What am I missing?

I apologize in advance, if it is something obvious.

Oleg

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: [patch] softirq performance fixes, cleanups, 2.4.10.
  2001-09-28  2:50 Oleg Nesterov
@ 2001-09-28  7:56 ` Ingo Molnar
  0 siblings, 0 replies; 44+ messages in thread
From: Ingo Molnar @ 2001-09-28 7:56 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: andrea, linux-kernel

On Fri, 28 Sep 2001, Oleg Nesterov wrote:

> > >  	current->nice = 19;
> > > -	schedule();
> > > -	__set_current_state(TASK_INTERRUPTIBLE);

your change is indeed correct. (the history of this is that the
schedule() was not always at the top of the loop - eg. it's not at the
top of it in the current -B2 patch.)

	Ingo

^ permalink raw reply	[flat|nested] 44+ messages in thread
end of thread, other threads:[~2001-09-30 13:21 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-09-26 16:44 [patch] softirq performance fixes, cleanups, 2.4.10 Ingo Molnar
2001-09-26 17:48 ` Mike Kravetz
2001-09-26 18:48   ` Ingo Molnar
2001-09-26 18:55 ` Russell King
2001-09-26 19:14   ` Ingo Molnar
2001-09-27 23:31 ` Andrea Arcangeli
2001-09-28  3:20   ` Andrea Arcangeli
2001-09-28  7:34   ` Ingo Molnar
2001-09-28 15:17     ` Andrea Arcangeli
2001-09-28  7:18 ` [patch] softirq-2.4.10-B2 Ingo Molnar
2001-09-28 15:58   ` Andrea Arcangeli
2001-09-28 18:36   ` Simon Kirby
2001-09-28 18:47     ` Ingo Molnar
2001-09-28 19:31     ` kuznet
2001-09-28 16:18 ` [patch] softirq performance fixes, cleanups, 2.4.10 kuznet
2001-09-28 16:31   ` Ingo Molnar
2001-09-28 17:04     ` kuznet
2001-09-28 17:21       ` Rik van Riel
2001-09-28 17:31         ` Andrea Arcangeli
2001-09-28 17:41         ` kuznet
2001-09-28 17:46           ` Ingo Molnar
2001-09-28 18:39           ` Josh MacDonald
2001-09-28 17:31       ` Ingo Molnar
2001-09-28 17:56         ` kuznet
2001-09-28 18:28           ` Ingo Molnar
2001-09-28 19:23             ` kuznet
2001-09-28 19:48               ` Ingo Molnar
2001-09-29 16:35                 ` kuznet
2001-09-30  9:37                   ` Kai Henningsen
2001-09-30  9:01                 ` Ingo Molnar
2001-09-28 19:39             ` kuznet
2001-09-28 20:03               ` Ingo Molnar
2001-09-28 18:51       ` Benjamin LaHaise
2001-09-28 16:32 ` Andrea Arcangeli
2001-09-28 16:46   ` Ingo Molnar
2001-09-28 16:58     ` Andrea Arcangeli
2001-09-28 16:35 ` [patch] softirq-2.4.10-B3 Ingo Molnar
2001-09-29  0:40   ` J . A . Magallon
2001-09-29 11:03 ` [patch] softirq performance fixes, cleanups, 2.4.10 Rusty Russell

-- strict thread matches above, loose matches on Subject: below --
2001-09-27 23:29 Oleg Nesterov
2001-09-28  0:03 ` Andrea Arcangeli
2001-09-28  6:57 ` Ingo Molnar
2001-09-28  2:50 Oleg Nesterov
2001-09-28  7:56 ` Ingo Molnar