From: Ingo Molnar
To: Andrew Morton
Cc: rlrevell@joe-job.com, wli@holomorphy.com, lenar@vision.ee,
	linux-kernel@vger.kernel.org
Subject: Re: [patch] voluntary-preempt-2.6.8-rc2-J3
Date: Mon, 26 Jul 2004 22:36:34 +0200
Message-ID: <20040726203634.GA26096@elte.hu>
In-Reply-To: <20040726125750.5e467cfd.akpm@osdl.org>

* Andrew Morton wrote:

> The bigger this thing gets, the more worried I get. Sometime this is
> going to need to be split up into individual fixes, and they need to
> be based upon an overall approach which we haven't yet settled on.

I will do that splitup. Right now I'm simply mapping how widespread the
problem is and what type of fixes we need. The situation isn't all that
bad, but we might need an (optional) mechanism to make softirqs
synchronous. All of this stuff is nicely modular, and I'll do a splitup
post-2.6.8 (I don't think we want to disturb 2.6.8 with any of this).

> In particular your whole approach (with voluntary_need_resched())
> doesn't work on SMP.

(For the record, voluntary_need_resched() == need_resched() - I'm only
keeping a distinction between the two to be able to track progress and
regressions between the vanilla and modified kernels.)

need_resched() indeed doesn't do a lock-break for SMP purposes.

> The approach I'm using is to unconditionally drop locks on every Nth
> pass around the loop to allow another CPU to grab the lock, do some
> work, drop the lock, then be preempted. eg:
>
> @@ -773,6 +774,12 @@ int get_user_pages(struct task_struct *t
> 	struct page *map = NULL;
> 	int lookup_write = write;
>
> +	if ((++nr_pages & 63) == 0) {
> +		spin_unlock(&mm->page_table_lock);
> +		cpu_relax();
> +		spin_lock(&mm->page_table_lock);
> +	}
> +

Guaranteeing latencies on SMP is hard, because the latency of a CPU
might depend on the latency of a task on another CPU - and that other
CPU isn't notified of the rescheduling request.

One alternative technique to yours would be to notify _all_ CPUs that a
high-prio RT task is ready to run (via a broadcast need-resched). That
way the UP latency-break techniques map to SMP 1:1. Non-RT tasks don't
get this benefit, which is a difference to the UP situation, but I
don't think it would be appropriate to use the UP behavior for them,
due to the overhead of broadcasting.
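To make the broadcast idea concrete, here is a rough sketch of what
such a helper could look like inside kernel/sched.c. This is only an
illustration, not part of any posted patch: broadcast_need_resched() is
a made-up name, and it assumes the existing sched.c internals
(runqueue_t, cpu_rq(), rq->curr, resched_task()):

	/*
	 * Sketch: ask every online CPU to reschedule, so that each
	 * CPU's next need_resched() check - and therefore each
	 * UP-style lock break - fires. resched_task() sets
	 * TIF_NEED_RESCHED on the CPU's current task and sends a
	 * reschedule IPI if that task is running on another CPU.
	 */
	static void broadcast_need_resched(void)
	{
		unsigned long flags;
		runqueue_t *rq;
		int cpu;

		for_each_online_cpu(cpu) {
			rq = cpu_rq(cpu);
			spin_lock_irqsave(&rq->lock, flags);
			resched_task(rq->curr);
			spin_unlock_irqrestore(&rq->lock, flags);
		}
	}

The wakeup path of a high-prio RT task would call this instead of only
resched-ing the single target CPU, as it does today.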
A combination of the two techniques could be used too: a global 'break
locks from now on' flag which gets set if an (RT?) task wants to
reschedule. Normally this flag would be zero and its cacheline would be
clean and shared between all CPUs, causing no overhead. Once a task
wants to reschedule, the flag would be increased by 1 and would stay
elevated until that task has been scheduled. This would remove the
ugliness factor too - your sample code would look like this:

> +	if (need_lock_break()) {
> +		spin_unlock(&mm->page_table_lock);
> +		cpu_relax();
> +		spin_lock(&mm->page_table_lock);
> +	}
> +

or, as a shortcut:

> +	cond_lock_break(&mm->page_table_lock);

(A sketch of how these two helpers could look follows below.)

There are two problems with the unconditional lock-break:

- I'm not sure cpu_relax() guarantees that another CPU can grab the
  lock. It all depends on whether the lock cacheline can bounce over to
  the other CPU faster than cpu_relax() finishes and this CPU
  re-acquires the lock.

- The latencies will still be higher than on UP, in a hard-to-predict
  way: every time the codepath encounters a spinlock it has to take
  before it can exit for a reschedule, that lock's latency gets
  re-added. Also, the now-scheduled high-prio task will encounter this
  latency every time it acquires a spinlock - while on UP it has the
  CPU to itself, with no interaction from other CPUs.

	Ingo
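For illustration, a minimal sketch of how the two helpers referenced
above could look. need_lock_break() and cond_lock_break() are
hypothetical names, not part of any posted patch; only standard kernel
primitives (atomic_t, cpu_relax(), spinlocks) are assumed:

	/*
	 * Global 'break locks from now on' counter: nonzero while
	 * some (RT?) task is waiting to be scheduled. The wakeup
	 * path would atomic_inc() it, and the schedule path would
	 * atomic_dec() it once that task has got a CPU.
	 */
	atomic_t lock_break_count = ATOMIC_INIT(0);

	static inline int need_lock_break(void)
	{
		/* common case: one read of a clean, shared cacheline */
		return atomic_read(&lock_break_count) != 0;
	}

	/*
	 * Drop and re-take the lock only if someone asked for a
	 * lock-break, giving another CPU a chance to grab it.
	 */
	static inline void cond_lock_break(spinlock_t *lock)
	{
		if (need_lock_break()) {
			spin_unlock(lock);
			cpu_relax();
			spin_lock(lock);
		}
	}

This keeps the fastpath overhead at a single memory read and hides the
unlock/relax/relock sequence behind one line at each call site.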