From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S261904AbUL0Pch (ORCPT ); Mon, 27 Dec 2004 10:32:37 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S261908AbUL0Pch (ORCPT ); Mon, 27 Dec 2004 10:32:37 -0500 Received: from bgm-24-94-57-164.stny.rr.com ([24.94.57.164]:36787 "EHLO localhost.localdomain") by vger.kernel.org with ESMTP id S261904AbUL0PcU (ORCPT ); Mon, 27 Dec 2004 10:32:20 -0500 Subject: Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15) From: Steven Rostedt To: Esben Nielsen Cc: Ingo Molnar , Rui Nuno Capela , "K.R. Foley" , Fernando Lopez-Lezcano , Mark Johnson , Amit Shah , Karsten Wiese , Bill Huey , Adam Heath , emann@mrv.com, Gunther Persoons , LKML , Florian Schmidt , Lee Revell , Shane Shrybman , Thomas Gleixner , Michal Schmidt In-Reply-To: References: Content-Type: text/plain Date: Mon, 27 Dec 2004 10:27:52 -0500 Message-Id: <1104161272.20042.96.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.0.3 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 2004-12-27 at 15:35 +0100, Esben Nielsen wrote: > I haven't seen much traffic on real-time preemption lately. Is it due > to Christmas or lost interest? > I think they are on vacation :-) > I noticed that you changed rw-locks to behave quite diferently under > real-time preemption: They basicly works like normal locks now. I.e. there > can only be one reader task within each region. This can can however lock > the region recursively. I wanted to start looking at fixing that because > it ought to hurt scalability quite a bit - and even on UP create a few > unneeded task-switchs. However, the more I think about it the bigger the > problem: > > First, let me describe how I see a read-write lock. It has 3 states: > a) unlocked > b) locked by n readers > c) locked by 1 writer > There can either be 1 writer within the protected region or n>=0 > readers within the region. When a writer wants to take the lock, > calling down_write(), it has to wait until the read count is 0. When a > reader wants to take the lock, calling down_read(), he has only to wait > until the the writer is done - there is no need to wait for the other > readers. > > Now in a real-time system down_X() ought to have a deterministic > blocking time. It should be easy to make down_read() deterministic: If > there is a writer let it inherit the calling readers priority. > However, down_write() is hard to make deterministic. Even if we assume > that the lock not only keeps track of the number of readers but keeps a > list of all the reader threads within the region it can traverse the list > and boost the priority of all those threads. If there is n readers when > down_write() is called the blocking time would be > O(ceil(n/#cpus)) > time - which is unbounded as n is not known! > > Having a rw-lock with deterministic down_read() but non-deterministic > down_write() would be very usefull in a lot of cases. The characteritic is > that the data structure being protected is relative static, is going > to be used by a lot of RT readers and the updates doesn't have to be done > with any real-time requirements. > However, there is no way to know in general which locks in the kernel can > be allowed to work like that and which can't. A good compromise would be > limit the number of readers in a lock by the number of cpu's on the > system. That would make the system scale over several CPUs without hitting > unneeded congestions on read-locks and still have a determnistic > down_write(). > Why just limit to the number of CPUs, but make a configurable limit. I would say the default may be 2*CPUs. Reason being is that once you limit the number of readers, you just bound the down_write. Even if number of readers allowed is 100, the down_write is now bound to O(ceil(n/#cpus)) as you said, but now n is known. Make a CONFIG_ALLOWED_READERS or something to that affect, and it would be easy to see what is a good optimal configuration (assuming you have the proper tests). > down_write() shall then do the following: Boost the priority of all the > active readers to the priority of the caller. This will in turn distribute > the readers over the cpu's of the system assuming no higher priority RT > tasks are running. All the reader tasks will then run to up_read() in > time O(1) as they can all run in parellel - assuming there is no ugly > nested locking ofcourse! > down_read() should first check if there is a writer. If there is > boost it and wait. If there isn't but there isn't room for another reader > boost one of the readers such it will run to up_read(). > > An extra bonus of not having the number of readers bounded: The various > structures needed for making the list of readers can be allocated once. > There is no need to call kmalloc() from within down_read() to get a list > element for the lock's list of readers. > > I don't know wether I have time for coding this soon. Under all > circumstances I do not have a SMP system so I can't really test it if I > get time to code it :-( > I have two SMP machines that I can test on, unfortunately, they both have NVIDIA cards, so I cant use them with X, unless I go back to the default driver. Which I would do, but I really like the 3d graphics ;-) -- Steve > Esben > > > > On Tue, 14 Dec 2004, Ingo Molnar wrote: > > > > > * Rui Nuno Capela wrote: > > > > > Isn't this tightly related to mkinitrd sometimes hanging while on > > > mount -o loop, that I've been reporting a couple of times before? It > > > used to hang on any other time I do a new kernel install, but latetly > > > it seems to be OK (RT-V0.9.32-19 and -20). > > > > yeah, i've added Thomas Gleixner's earlier semaphore->completion > > conversion to the loop device, to -19 or -18. > > > > Ingo > > - > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ > > > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Steven Rostedt Senior Engineer Kihon Technologies