* RT and XFS
@ 2005-07-12 23:01 Daniel Walker
2005-07-12 23:39 ` William Weston
2005-07-13 0:25 ` Nathan Scott
0 siblings, 2 replies; 22+ messages in thread
From: Daniel Walker @ 2005-07-12 23:01 UTC (permalink / raw)
To: mingo; +Cc: linux-kernel
Is there something so odd about the XFS locking, that it can't use the
rt_lock ?
--- linux.orig/fs/xfs/linux-2.6/mrlock.h
+++ linux/fs/xfs/linux-2.6/mrlock.h
@@ -37,12 +37,12 @@
enum { MR_NONE, MR_ACCESS, MR_UPDATE };
typedef struct {
- struct rw_semaphore mr_lock;
- int mr_writer;
+ struct compat_rw_semaphore mr_lock;
+ int mr_writer;
} mrlock_t;
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RT and XFS
2005-07-12 23:01 RT and XFS Daniel Walker
@ 2005-07-12 23:39 ` William Weston
2005-07-13 0:25 ` Nathan Scott
1 sibling, 0 replies; 22+ messages in thread
From: William Weston @ 2005-07-12 23:39 UTC (permalink / raw)
To: Daniel Walker; +Cc: mingo, linux-kernel

On Tue, 12 Jul 2005, Daniel Walker wrote:
> Is there something so odd about the XFS locking, that it can't use the
> rt_lock ?
>
>
> --- linux.orig/fs/xfs/linux-2.6/mrlock.h
> +++ linux/fs/xfs/linux-2.6/mrlock.h
> @@ -37,12 +37,12 @@
> enum { MR_NONE, MR_ACCESS, MR_UPDATE };
>
> typedef struct {
> - struct rw_semaphore mr_lock;
> - int mr_writer;
> + struct compat_rw_semaphore mr_lock;
> + int mr_writer;
> } mrlock_t;

BTW, what's the difference between rw_semaphore and compat_rw_semaphore?
Or between semaphore and compat_semaphore?

I ran into a similar issue (needing compat_semaphore) with the IVTV
drivers. The following is a portion of my patch to get IVTV running under
RT (the other portions are just compile-time semantics):

--- ivtv-0.2.0-rc3k.orig/driver/msp3400.c 2004-11-19 08:21:04.000000000 -0800
+++ ivtv-0.2.0-rc3k/driver/msp3400.c 2005-06-22 17:26:24.000000000 -0700
@@ -115,7 +115,7 @@
struct task_struct *thread;
wait_queue_head_t wq;
- struct semaphore *notify;
+ struct compat_semaphore *notify;
int active,restart,rmmod;
int watch_stereo;

--ww
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RT and XFS
2005-07-12 23:01 RT and XFS Daniel Walker
2005-07-12 23:39 ` William Weston
@ 2005-07-13 0:25 ` Nathan Scott
2005-07-13 0:41 ` Daniel Walker
2005-07-13 6:47 ` Ingo Molnar
1 sibling, 2 replies; 22+ messages in thread
From: Nathan Scott @ 2005-07-13 0:25 UTC (permalink / raw)
To: mingo, Daniel Walker; +Cc: linux-kernel

On Tue, Jul 12, 2005 at 04:01:32PM -0700, Daniel Walker wrote:
>
> Is there something so odd about the XFS locking, that it can't use the
> rt_lock ?

Not that I know of - XFS does use the downgrade_write interface,
whose use isn't overly common in the rest of the kernel... maybe
that has caused some confusion, dunno.

> --- linux.orig/fs/xfs/linux-2.6/mrlock.h
> +++ linux/fs/xfs/linux-2.6/mrlock.h
> @@ -37,12 +37,12 @@
> enum { MR_NONE, MR_ACCESS, MR_UPDATE };
>
> typedef struct {
> - struct rw_semaphore mr_lock;
> - int mr_writer;
> + struct compat_rw_semaphore mr_lock;
> + int mr_writer;
> } mrlock_t;

The XFS code is also written such that it just releases a mrlock
without tracking whether it had it for access/update in the end
(end lock state is not necessarily how it started out, since it
may have downgraded the lock at some point, or it may not have).
Its a non-trivial change to track that state within XFS itself,
so the above mr_writer field in XFS's mrlock wrapper tracks that
state alongside the rw_semaphore. It would prefer to be getting
that out of the rw_semaphore itself, alot, but there's not any
mechanism for doing so (its not a particularly nice API change
either, really, for the generic locking code). I guess that may
have been another reason for the above change in the RT patch, I
don't know all the details there.

cheers.

--
Nathan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RT and XFS
2005-07-13 0:25 ` Nathan Scott
@ 2005-07-13 0:41 ` Daniel Walker
2005-07-13 0:37 ` Nathan Scott
2005-07-13 6:47 ` Ingo Molnar
1 sibling, 1 reply; 22+ messages in thread
From: Daniel Walker @ 2005-07-13 0:41 UTC (permalink / raw)
To: Nathan Scott; +Cc: mingo, linux-kernel

On Wed, 2005-07-13 at 10:25 +1000, Nathan Scott wrote:
> On Tue, Jul 12, 2005 at 04:01:32PM -0700, Daniel Walker wrote:
> >
> > Is there something so odd about the XFS locking, that it can't use the
> > rt_lock ?
>
> Not that I know of - XFS does use the downgrade_write interface,
> whose use isn't overly common in the rest of the kernel... maybe
> that has caused some confusion, dunno.

Current RT doesn't implement downgrade_write() , but it's trivial to add
it.

> > --- linux.orig/fs/xfs/linux-2.6/mrlock.h
> > +++ linux/fs/xfs/linux-2.6/mrlock.h
> > @@ -37,12 +37,12 @@
> > enum { MR_NONE, MR_ACCESS, MR_UPDATE };
> >
> > typedef struct {
> > - struct rw_semaphore mr_lock;
> > - int mr_writer;
> > + struct compat_rw_semaphore mr_lock;
> > + int mr_writer;
> > } mrlock_t;
>
> The XFS code is also written such that it just releases a mrlock
> without tracking whether it had it for access/update in the end
> (end lock state is not necessarily how it started out, since it
> may have downgraded the lock at some point, or it may not have).
> Its a non-trivial change to track that state within XFS itself,
> so the above mr_writer field in XFS's mrlock wrapper tracks that
> state alongside the rw_semaphore. It would prefer to be getting
> that out of the rw_semaphore itself, alot, but there's not any
> mechanism for doing so (its not a particularly nice API change
> either, really, for the generic locking code). I guess that may
> have been another reason for the above change in the RT patch, I
> don't know all the details there.

So it calls up_read if it has a read lock ? Or up_write if it has a
write lock? I suppose it would be broken if it didn't though.

Daniel
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RT and XFS
2005-07-13 0:41 ` Daniel Walker
@ 2005-07-13 0:37 ` Nathan Scott
0 siblings, 0 replies; 22+ messages in thread
From: Nathan Scott @ 2005-07-13 0:37 UTC (permalink / raw)
To: Daniel Walker; +Cc: mingo, linux-kernel

On Tue, Jul 12, 2005 at 05:41:43PM -0700, Daniel Walker wrote:
> On Wed, 2005-07-13 at 10:25 +1000, Nathan Scott wrote:
> > On Tue, Jul 12, 2005 at 04:01:32PM -0700, Daniel Walker wrote:
> > >
> > > Is there something so odd about the XFS locking, that it can't use the
> > > rt_lock ?
> >
> > Not that I know of - XFS does use the downgrade_write interface,
> > whose use isn't overly common in the rest of the kernel... maybe
> > that has caused some confusion, dunno.
>
> Current RT doesn't implement downgrade_write() , but it's trivial to add
> it.

Ah, thats probably it then.

> So it calls up_read if it has a read lock ? Or up_write if it has a
> write lock? I suppose it would be broken if it didn't though.

Thats correct.

cheers.

--
Nathan
^ permalink raw reply [flat|nested] 22+ messages in thread
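Nathan's description of the mrlock wrapper (an mr_writer flag kept next to the rw_semaphore so that a single unlock call can pick up_read or up_write, even after a downgrade) can be sketched in userspace roughly as follows. This is only an illustration and not the XFS source: `struct rwsem` and its helpers are a hypothetical pthread-based stand-in for the kernel's rw_semaphore, and the `mr*` function names are assumed for the example.

```c
#include <pthread.h>

/* Hypothetical userspace stand-in for the kernel's rw_semaphore. */
struct rwsem {
	pthread_mutex_t mu;
	pthread_cond_t cv;
	int readers;	/* number of active readers */
	int writer;	/* 1 while a writer holds the lock */
};

void rwsem_init(struct rwsem *s)
{
	pthread_mutex_init(&s->mu, NULL);
	pthread_cond_init(&s->cv, NULL);
	s->readers = 0;
	s->writer = 0;
}

void down_read(struct rwsem *s)
{
	pthread_mutex_lock(&s->mu);
	while (s->writer)
		pthread_cond_wait(&s->cv, &s->mu);
	s->readers++;
	pthread_mutex_unlock(&s->mu);
}

void down_write(struct rwsem *s)
{
	pthread_mutex_lock(&s->mu);
	while (s->writer || s->readers)
		pthread_cond_wait(&s->cv, &s->mu);
	s->writer = 1;
	pthread_mutex_unlock(&s->mu);
}

void up_read(struct rwsem *s)
{
	pthread_mutex_lock(&s->mu);
	if (--s->readers == 0)
		pthread_cond_broadcast(&s->cv);
	pthread_mutex_unlock(&s->mu);
}

void up_write(struct rwsem *s)
{
	pthread_mutex_lock(&s->mu);
	s->writer = 0;
	pthread_cond_broadcast(&s->cv);
	pthread_mutex_unlock(&s->mu);
}

/* Turn a held write lock into a read lock without ever dropping it. */
void downgrade_write(struct rwsem *s)
{
	pthread_mutex_lock(&s->mu);
	s->writer = 0;
	s->readers++;
	pthread_cond_broadcast(&s->cv);	/* waiting readers may now enter */
	pthread_mutex_unlock(&s->mu);
}

/* Wrapper in the spirit of the thread: lock plus a "held for update" flag. */
typedef struct {
	struct rwsem mr_lock;
	int mr_writer;
} mrlock_t;

void mrinit(mrlock_t *m)   { rwsem_init(&m->mr_lock); m->mr_writer = 0; }
void mraccess(mrlock_t *m) { down_read(&m->mr_lock); }
void mrupdate(mrlock_t *m) { down_write(&m->mr_lock); m->mr_writer = 1; }
void mrdemote(mrlock_t *m) { m->mr_writer = 0; downgrade_write(&m->mr_lock); }

/* The caller does not say how it holds the lock; the flag decides. */
void mrunlock(mrlock_t *m)
{
	if (m->mr_writer) {
		m->mr_writer = 0;
		up_write(&m->mr_lock);
	} else {
		up_read(&m->mr_lock);
	}
}
```

The flag is only written while the write lock is held, so readers never race on it; that is what lets the wrapper keep the state outside the semaphore.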
* Re: RT and XFS
2005-07-13 0:25 ` Nathan Scott
2005-07-13 0:41 ` Daniel Walker
@ 2005-07-13 6:47 ` Ingo Molnar
2005-07-13 16:45 ` Daniel Walker
1 sibling, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2005-07-13 6:47 UTC (permalink / raw)
To: Nathan Scott; +Cc: Daniel Walker, linux-kernel

* Nathan Scott <nathans@sgi.com> wrote:

> On Tue, Jul 12, 2005 at 04:01:32PM -0700, Daniel Walker wrote:
> >
> > Is there something so odd about the XFS locking, that it can't use the
> > rt_lock ?
>
> Not that I know of - XFS does use the downgrade_write interface, whose
> use isn't overly common in the rest of the kernel... maybe that has
> caused some confusion, dunno.

downgrade_write() wasnt the main problem - the main problem was that for
PREEMPT_RT i implemented 'strict' semaphores, which are not identical to
vanilla kernel semaphores. The thing that seemed to impact XFS the most
is the 'acquirer thread has to release the lock' rule of strict
semaphores. Both the XFS logging code and the XFS IO completion code
seems to release locks in a different context from where the acquire
happened. It's of course valid upstream behavior, but without these
extra rules it's hard to do sane priority inheritance. (who do you boost
if you dont really know who 'owns' the lock?) It might make sense to
introduce some sort of sem_pass_to(new_owner) interface? For now i
introduced a compat type, which lets those semaphores fall back to the
vanilla implementation.

	Ingo
^ permalink raw reply [flat|nested] 22+ messages in thread
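The 'acquirer thread has to release the lock' rule can be illustrated outside the kernel with POSIX primitives. The sketch below is an analogy under stated assumptions, not the PREEMPT_RT implementation: an error-checking pthread mutex stands in for an owner-tracked 'strict' semaphore, a POSIX `sem_t` stands in for the ownerless vanilla/compat semaphore, and `cross_thread_release` is a made-up helper name. It assumes a Linux/glibc style POSIX threads environment.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static pthread_mutex_t strict_lock;	/* owner-tracked, like a 'strict' semaphore */
static sem_t plain_sem;			/* ownerless, like a vanilla/compat semaphore */

/* Runs in a second thread that never acquired either lock. */
static void *release_from_other_thread(void *arg)
{
	int *rc = arg;

	rc[0] = pthread_mutex_unlock(&strict_lock);	/* refused: caller is not the owner */
	rc[1] = sem_post(&plain_sem);			/* accepted: a semaphore has no owner */
	return NULL;
}

/* Returns 0 on success and stores the two release results in rc[0] and rc[1]. */
int cross_thread_release(int rc[2])
{
	pthread_mutexattr_t attr;
	pthread_t other;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&strict_lock, &attr);
	pthread_mutexattr_destroy(&attr);
	sem_init(&plain_sem, 0, 1);

	/* The calling thread acquires both. */
	pthread_mutex_lock(&strict_lock);
	sem_wait(&plain_sem);

	if (pthread_create(&other, NULL, release_from_other_thread, rc) != 0)
		return -1;
	pthread_join(other, NULL);

	/* Only the acquirer can drop the owner-tracked lock. */
	pthread_mutex_unlock(&strict_lock);
	pthread_mutex_destroy(&strict_lock);
	sem_destroy(&plain_sem);
	return 0;
}
```

Because the owner-tracked lock always knows which thread holds it, there is a well-defined task to boost; the plain semaphore gives a priority-inheritance scheme nothing to work with.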
* Re: RT and XFS
2005-07-13 6:47 ` Ingo Molnar
@ 2005-07-13 16:45 ` Daniel Walker
2005-07-14 0:22 ` Nathan Scott
0 siblings, 1 reply; 22+ messages in thread
From: Daniel Walker @ 2005-07-13 16:45 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Nathan Scott, linux-kernel

On Wed, 2005-07-13 at 08:47 +0200, Ingo Molnar wrote:
>
> downgrade_write() wasnt the main problem - the main problem was that for
> PREEMPT_RT i implemented 'strict' semaphores, which are not identical to
> vanilla kernel semaphores. The thing that seemed to impact XFS the most
> is the 'acquirer thread has to release the lock' rule of strict
> semaphores. Both the XFS logging code and the XFS IO completion code
> seems to release locks in a different context from where the acquire
> happened. It's of course valid upstream behavior, but without these
> extra rules it's hard to do sane priority inheritance. (who do you boost
> if you dont really know who 'owns' the lock?) It might make sense to
> introduce some sort of sem_pass_to(new_owner) interface? For now i
> introduced a compat type, which lets those semaphores fall back to the
> vanilla implementation.

There's a lot of code like this in there .. I've seen some that down()
in process contex, and up() in interrupt contex which is weird .. But
those aren't major features, just little drivers. XFS is pretty major
feature.

Nathan, does XFS need this property or could we convert it to
synchronize the locking (with ease?)?

Daniel
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RT and XFS
2005-07-13 16:45 ` Daniel Walker
@ 2005-07-14 0:22 ` Nathan Scott
2005-07-14 3:50 ` Dave Chinner
0 siblings, 1 reply; 22+ messages in thread
From: Nathan Scott @ 2005-07-14 0:22 UTC (permalink / raw)
To: Daniel Walker, Ingo Molnar, Steve Lord; +Cc: linux-kernel, linux-xfs

Hi there,

On Wed, Jul 13, 2005 at 09:45:58AM -0700, Daniel Walker wrote:
> On Wed, 2005-07-13 at 08:47 +0200, Ingo Molnar wrote:
> >
> > downgrade_write() wasnt the main problem - the main problem was that for
> > PREEMPT_RT i implemented 'strict' semaphores, which are not identical to
> > vanilla kernel semaphores. The thing that seemed to impact XFS the most
> > is the 'acquirer thread has to release the lock' rule of strict
> > semaphores. Both the XFS logging code and the XFS IO completion code
> > seems to release locks in a different context from where the acquire
> > happened. It's of course valid upstream behavior, but without these
> > extra rules it's hard to do sane priority inheritance. (who do you boost
> > if you dont really know who 'owns' the lock?) It might make sense to
> > introduce some sort of sem_pass_to(new_owner) interface? For now i
> > introduced a compat type, which lets those semaphores fall back to the
> > vanilla implementation.

Hmm, I'm not aware of anywhere in XFS where we do that. From talking
to some colleagues here, they're claiming that we can't be doing that
since it'd trip an assert in the IRIX mrlock code.

> There's a lot of code like this in there .. I've seen some that down()
> in process contex, and up() in interrupt contex which is weird .. But
> those aren't major features, just little drivers. XFS is pretty major
> feature.
>
> Nathan, does XFS need this property or could we convert it to
> synchronize the locking (with ease?)?

I'm not yet sure in what situations we are doing this, so can't
really say.

It'd be interesting to see an implementation of the downgrade_write
functionality and then a specific case where the above locking
behaviour happens ... and I'd then be able to say how tricky that
would be to resolve.

Steve, are you aware of situations where we unlock in a different
thread to where we acquired the lock? It'd surprise me, as we're
holding these things for as short a time as possible - afaict the
transactions always ilock, copy delta to iclog, pin, and unlock,
no? (all in the same thread). I can't see the iolock being used
in this way anywhere either... you?

cheers.

--
Nathan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RT and XFS
2005-07-14 0:22 ` Nathan Scott
@ 2005-07-14 3:50 ` Dave Chinner
2005-07-14 4:10 ` Daniel Walker
0 siblings, 1 reply; 22+ messages in thread
From: Dave Chinner @ 2005-07-14 3:50 UTC (permalink / raw)
To: Nathan Scott
Cc: Daniel Walker, Ingo Molnar, Steve Lord, linux-kernel, linux-xfs

On Thu, Jul 14, 2005 at 10:22:46AM +1000, Nathan Scott wrote:
> Hi there,
>
> On Wed, Jul 13, 2005 at 09:45:58AM -0700, Daniel Walker wrote:
> > On Wed, 2005-07-13 at 08:47 +0200, Ingo Molnar wrote:
> > >
> > > downgrade_write() wasnt the main problem - the main problem was that for
> > > PREEMPT_RT i implemented 'strict' semaphores, which are not identical to
> > > vanilla kernel semaphores. The thing that seemed to impact XFS the most
> > > is the 'acquirer thread has to release the lock' rule of strict
> > > semaphores. Both the XFS logging code and the XFS IO completion code
> > > seems to release locks in a different context from where the acquire
> > > happened. It's of course valid upstream behavior, but without these
> > > extra rules it's hard to do sane priority inheritance. (who do you boost
> > > if you dont really know who 'owns' the lock?) It might make sense to
> > > introduce some sort of sem_pass_to(new_owner) interface? For now i
> > > introduced a compat type, which lets those semaphores fall back to the
> > > vanilla implementation.
>
> Hmm, I'm not aware of anywhere in XFS where we do that. From talking
> to some colleagues here, they're claiming that we can't be doing that
> since it'd trip an assert in the IRIX mrlock code.

Now that I've read the thread, I see it's not mrlocks that is the
issue with unlocking in a different context - it's semaphores.

All the pagebuf synchronisation is done with a semaphore because
it's held across the I/O and it's _most definitely_ released in a
different context when doing async I/O. Just about all metadata I/O
is async because once the transaction has been logged to disk we
don't need to write these buffers out synchronously. Not to mention
the log I/O completion unlocks the buffers in a transaction in a
different context as well.

The whole point of using a semaphore in the pagebuf is because there
is no tracking of who "owns" the lock so we can actually release it
in a different context. Semaphores were invented for this purpose,
and we use them in the way they were intended. ;)

Realistically, I seriously doubt the need for any sort of rt changes
to these semaphores. They can be held for indeterminant periods of
time potentially across multiple disk I/Os (e.g. when held locked in
a transaction that requires more metadata to be read in from disk to
make progress). Hence there is no really no point in making them RT
aware because if you end up waiting on one of them you can forget
about pretty much any RT guarantee that you've ever given....

Cheers,

Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RT and XFS
2005-07-14 3:50 ` Dave Chinner
@ 2005-07-14 4:10 ` Daniel Walker
[not found] ` <20050714052347.GA18813@elte.hu>
2005-07-15 10:23 ` Ingo Molnar
0 siblings, 2 replies; 22+ messages in thread
From: Daniel Walker @ 2005-07-14 4:10 UTC (permalink / raw)
To: Dave Chinner
Cc: Nathan Scott, Ingo Molnar, Steve Lord, linux-kernel, linux-xfs

On Thu, 2005-07-14 at 13:50 +1000, Dave Chinner wrote:
> Now that I've read the thread, I see it's not mrlocks that is the
> issue with unlocking in a different context - it's semaphores.
>
> All the pagebuf synchronisation is done with a semaphore because
> it's held across the I/O and it's _most definitely_ released in a
> different context when doing async I/O. Just about all metadata I/O
> is async because once the transaction has been logged to disk we
> don't need to write these buffers out synchronously. Not to mention
> the log I/O completion unlocks the buffers in a transaction in a
> different context as well.
>
> The whole point of using a semaphore in the pagebuf is because there
> is no tracking of who "owns" the lock so we can actually release it
> in a different context. Semaphores were invented for this purpose,
> and we use them in the way they were intended. ;)

Where is the that semaphore spec, is that posix ? There is a new
construct called "complete" that is good for this type of stuff too. No
owner needed , just something running, and something waiting till it
completes.

> Realistically, I seriously doubt the need for any sort of rt changes
> to these semaphores. They can be held for indeterminant periods of
> time potentially across multiple disk I/Os (e.g. when held locked in
> a transaction that requires more metadata to be read in from disk to
> make progress). Hence there is no really no point in making them RT
> aware because if you end up waiting on one of them you can forget
> about pretty much any RT guarantee that you've ever given....

PI is always good, cause it allows the tracking of what is high
priority , and what is not .

Daniel
^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <20050714052347.GA18813@elte.hu>]
* Re: RT and XFS
[not found] ` <20050714052347.GA18813@elte.hu>
@ 2005-07-14 15:56 ` Daniel Walker
2005-07-14 16:08 ` Christoph Hellwig
2005-07-14 16:08 ` Christoph Hellwig
0 siblings, 2 replies; 22+ messages in thread
From: Daniel Walker @ 2005-07-14 15:56 UTC (permalink / raw)
To: Ingo Molnar
Cc: Dave Chinner, greg, Nathan Scott, Steve Lord, linux-kernel, linux-xfs

On Thu, 2005-07-14 at 07:23 +0200, Ingo Molnar wrote:
> * Daniel Walker <dwalker@mvista.com> wrote:
>
> > > The whole point of using a semaphore in the pagebuf is because there
> > > is no tracking of who "owns" the lock so we can actually release it
> > > in a different context. Semaphores were invented for this purpose,
> > > and we use them in the way they were intended. ;)
> >
> > Where is the that semaphore spec, is that posix ? There is a new
> > construct called "complete" that is good for this type of stuff too.
> > No owner needed , just something running, and something waiting till
> > it completes.
>
> wrt. posix, we dont really care about that for kernel-internal
> primitives like struct semaphore. So whether it's posix or not has no
> relevance.

This reminds me of Documentation/stable_api_nonsense.txt . That no one
should really be dependent on a particular kernel API doing a particular
thing. The kernel is play dough for the kernel hacker (as it should be),
including kernel semaphores.

So we can change whatever we want, and make no excuses, as long as we
fix the rest of the kernel to work with our change. That seems pretty
sensible , because Linux should be an evolution.

Daniel
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RT and XFS
2005-07-14 15:56 ` Daniel Walker
@ 2005-07-14 16:08 ` Christoph Hellwig
2005-07-18 12:10 ` Esben Nielsen
2005-07-14 16:08 ` Christoph Hellwig
1 sibling, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2005-07-14 16:08 UTC (permalink / raw)
To: Daniel Walker
Cc: Ingo Molnar, Dave Chinner, greg, Nathan Scott, Steve Lord, linux-kernel, linux-xfs

On Thu, Jul 14, 2005 at 08:56:58AM -0700, Daniel Walker wrote:
> This reminds me of Documentation/stable_api_nonsense.txt . That no one
> should really be dependent on a particular kernel API doing a particular
> thing. The kernel is play dough for the kernel hacker (as it should be),
> including kernel semaphores.
>
> So we can change whatever we want, and make no excuses, as long as we
> fix the rest of the kernel to work with our change. That seems pretty
> sensible , because Linux should be an evolution.

Daniel, get a fucking clue. Read some CS 101 literature on what a semaphore
is defined to be. If you want PI singing dancing blinking christmas tree
locking primites call them a mutex, but not a semaphore.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RT and XFS
2005-07-14 16:08 ` Christoph Hellwig
@ 2005-07-18 12:10 ` Esben Nielsen
2005-07-19 3:26 ` Bill Huey
0 siblings, 1 reply; 22+ messages in thread
From: Esben Nielsen @ 2005-07-18 12:10 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Daniel Walker, Ingo Molnar, Dave Chinner, greg, Nathan Scott, Steve Lord, linux-kernel, linux-xfs

On Thu, 14 Jul 2005, Christoph Hellwig wrote:
> On Thu, Jul 14, 2005 at 08:56:58AM -0700, Daniel Walker wrote:
> > This reminds me of Documentation/stable_api_nonsense.txt . That no one
> > should really be dependent on a particular kernel API doing a particular
> > thing. The kernel is play dough for the kernel hacker (as it should be),
> > including kernel semaphores.
> >
> > So we can change whatever we want, and make no excuses, as long as we
> > fix the rest of the kernel to work with our change. That seems pretty
> > sensible , because Linux should be an evolution.
>
> Daniel, get a fucking clue. Read some CS 101 literature on what a semaphore
> is defined to be. If you want PI singing dancing blinking christmas tree
> locking primites call them a mutex, but not a semaphore.
>

As a matter of fact I just finished what corresponds to your "CS 101" (I
study CS in spare time while having a full time job coding RT stuff):

At the one lecture I attended they talked about semaphores. They taught
students to use binary semaphores for locking. Based on real-life
experience (and the Pathfinder story), I complained and told them they
ought to teach the students to use a mutex instead. They had no clue:
"It is the same thing," they said. Yes, a mutex can be implemented just as
a binary semaphore but the semantics of it is different.

In RT the difference is very important and even without RT it is a good
idea to maintain the difference for readability and deadlock detection.
If you later on want to optimize the semaphore for what it is used for it
is also good to have maintained that information.

It is a bit like discarding the type information from your programs. You
want to keep the type information even though the compiler ends up
producing the same code.

The kernel developers clearly have followed the same lectures and used
plain binary semaphores, sometimes calling them mutex, sometimes semaphore.

I believe that the semaphore ought to be removed. Either use a mutex or a
completion. By far the most code uses a semaphore either for signalling
- i.e. as a completion - or for critical sections - i.e. as a mutex. If
code mixes the usage it is most likely very hard to read....

Unfortunately, one of the goals of the preempt-rt branch is to avoid
altering too much code. Therefore the type semaphore can't be removed
there. Therefore the name still lingers ... :-(

Esben
^ permalink raw reply [flat|nested] 22+ messages in thread
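The "signalling" half of Esben's mutex-or-completion split can be sketched in userspace as follows. The function names mirror the kernel's init_completion(), complete() and wait_for_completion(), but the bodies here are a hypothetical pthread reimplementation for illustration only; `struct job`, `worker` and `run_job` are made-up names.

```c
#include <pthread.h>

/* Userspace analogue of a completion: no owner, one side signals, one waits. */
struct completion {
	pthread_mutex_t mu;
	pthread_cond_t cv;
	int done;
};

void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->mu, NULL);
	pthread_cond_init(&c->cv, NULL);
	c->done = 0;
}

/* Called by whoever finishes the work, in whatever context that is. */
void complete(struct completion *c)
{
	pthread_mutex_lock(&c->mu);
	c->done = 1;
	pthread_cond_broadcast(&c->cv);
	pthread_mutex_unlock(&c->mu);
}

/* Called by the side that needs the work to be finished. */
void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->mu);
	while (!c->done)
		pthread_cond_wait(&c->cv, &c->mu);
	pthread_mutex_unlock(&c->mu);
}

struct job {
	struct completion finished;
	int result;
};

/* Stand-in for an I/O completion path running in another context. */
static void *worker(void *arg)
{
	struct job *j = arg;

	j->result = 42;
	complete(&j->finished);
	return NULL;
}

/* Submit the job, wait for it, and return its result (-1 on thread error). */
int run_job(void)
{
	struct job j;
	pthread_t t;

	init_completion(&j.finished);
	j.result = 0;
	if (pthread_create(&t, NULL, worker, &j) != 0)
		return -1;
	wait_for_completion(&j.finished);
	pthread_join(t, NULL);
	return j.result;
}
```

Nothing here is ever "held", so there is no owner to boost and no release-by-acquirer rule to violate, which is why a completion fits the cross-context case better than a lock does.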
* Re: RT and XFS
2005-07-18 12:10 ` Esben Nielsen
@ 2005-07-19 3:26 ` Bill Huey
2005-07-19 12:34 ` Ingo Molnar
0 siblings, 1 reply; 22+ messages in thread
From: Bill Huey @ 2005-07-19 3:26 UTC (permalink / raw)
To: Esben Nielsen
Cc: Christoph Hellwig, Daniel Walker, Ingo Molnar, Dave Chinner, greg, Nathan Scott, Steve Lord, linux-kernel, linux-xfs

On Mon, Jul 18, 2005 at 02:10:31PM +0200, Esben Nielsen wrote:
> Unfortunately, one of the goals of the preempt-rt branch is to avoid
> altering too much code. Therefore the type semaphore can't be removed
> there. Therefore the name still lingers ... :-(

This is where you failed. You assumed that that person making the
comment, Christopher, in the first place didn't have his head up his
ass in the first place and was open to your end of the discussion.

bill
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RT and XFS
2005-07-19 3:26 ` Bill Huey
@ 2005-07-19 12:34 ` Ingo Molnar
2005-07-19 13:27 ` Christoph Hellwig
0 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2005-07-19 12:34 UTC (permalink / raw)
To: Bill Huey
Cc: Esben Nielsen, Christoph Hellwig, Daniel Walker, Dave Chinner, greg, Nathan Scott, Steve Lord, linux-kernel, linux-xfs

* Bill Huey <bhuey@lnxw.com> wrote:

> On Mon, Jul 18, 2005 at 02:10:31PM +0200, Esben Nielsen wrote:
> > Unfortunately, one of the goals of the preempt-rt branch is to avoid
> > altering too much code. Therefore the type semaphore can't be removed
> > there. Therefore the name still lingers ... :-(
>
> This is where you failed. You assumed that that person making the
> comment, Christopher, in the first place didn't have his head up his
> ass in the first place and was open to your end of the discussion.

please take me off the Cc: list for such kind of replies. Christoph is
very much entitled to his opinion, which i happen to mostly share in
this case: we should not be bothering upstream with requirements unique
to PREEMPT_RT. PREEMPT_RT restricts struct semaphore to be a mutex, and
that doesnt make it a classic semaphore anymore. We had no other choice
but it's still somewhat unclean in that regard.

(I do disagree with Christoph on another point: i do think we eventually
want to change the standard semaphore type in a similar fashion upstream
as well - but that probably has to come with a s/struct semaphore/struct
mutex/ change as well.)

	Ingo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RT and XFS
2005-07-19 12:34 ` Ingo Molnar
@ 2005-07-19 13:27 ` Christoph Hellwig
2005-07-19 13:50 ` Ingo Molnar
0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2005-07-19 13:27 UTC (permalink / raw)
To: Ingo Molnar
Cc: Bill Huey, Esben Nielsen, Christoph Hellwig, Daniel Walker, Dave Chinner, greg, Nathan Scott, Steve Lord, linux-kernel, linux-xfs

On Tue, Jul 19, 2005 at 02:34:57PM +0200, Ingo Molnar wrote:
> (I do disagree with Christoph on another point: i do think we eventually
> want to change the standard semaphore type in a similar fashion upstream
> as well - but that probably has to come with a s/struct semaphore/struct
> mutex/ change as well.)

Actually having a mutex_t in mainline would be a good idea even without
preempt rt, to document better what kind of locking we expect.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RT and XFS
2005-07-19 13:27 ` Christoph Hellwig
@ 2005-07-19 13:50 ` Ingo Molnar
0 siblings, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2005-07-19 13:50 UTC (permalink / raw)
To: Christoph Hellwig, Bill Huey, Esben Nielsen, Daniel Walker, Dave Chinner, greg, Nathan Scott, Steve Lord, linux-kernel, linux-xfs

* Christoph Hellwig <hch@infradead.org> wrote:

> On Tue, Jul 19, 2005 at 02:34:57PM +0200, Ingo Molnar wrote:
> > (I do disagree with Christoph on another point: i do think we eventually
> > want to change the standard semaphore type in a similar fashion upstream
> > as well - but that probably has to come with a s/struct semaphore/struct
> > mutex/ change as well.)
>
> Actually having a mutex_t in mainline would be a good idea even
> without preempt rt, to document better what kind of locking we expect.

cool! I'll cook up a patch for that. Right now these are the numbers:
there are 526 uses of struct semaphore in 2.6.12. In the -RT tree i had
to change 23 of them to be compat_semaphore - i.e. 23 uses were
definitely non-mutex. (We sure have missed some cases - but it would be
fair to say that the expected number of cases is less than 50, and that
we've mapped the most common ones already. That makes it a 90%/10%
splitup: more than 90% of all struct semaphore use is pure mutex.)

Of the remaining <10% cases, the majority is of the type of completions,
and there are a handful of (<10) cases of 'counted semaphore' uses:
semaphores with a count larger than 1. (e.g. ACPI uses it to count
resources, some audio code too - but it's very rare) Btw., that's the
only 'true' (in terms of CS) semaphore use.

	Ingo
^ permalink raw reply [flat|nested] 22+ messages in thread
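The rare 'counted semaphore' case Ingo mentions (a count larger than 1 used to track a pool of resources) looks roughly like this with a POSIX semaphore. It is a single-threaded sketch assuming a Linux style `sem_t`; `claim_all_slots` is a hypothetical helper name and not taken from ACPI or any driver.

```c
#include <errno.h>
#include <semaphore.h>

/*
 * Claim slots from a pool until it runs dry.
 * Returns how many claims were granted, or -1 on an unexpected error.
 */
int claim_all_slots(unsigned int slots)
{
	sem_t pool;
	int granted = 0;
	int exhausted;
	int i;

	/* The initial count is the number of free resources. */
	if (sem_init(&pool, 0, slots) != 0)
		return -1;

	/* Each successful trywait takes one resource without blocking. */
	while (sem_trywait(&pool) == 0)
		granted++;
	exhausted = (errno == EAGAIN);	/* count reached zero */

	/* Hand every resource back; any thread could do this part. */
	for (i = 0; i < granted; i++)
		sem_post(&pool);
	sem_destroy(&pool);

	return exhausted ? granted : -1;
}
```

With a count above 1 there can be several holders at once, so neither mutex semantics nor a single owner applies; this is the usage that cannot be mechanically converted.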
* Re: RT and XFS
2005-07-14 15:56 ` Daniel Walker
2005-07-14 16:08 ` Christoph Hellwig
@ 2005-07-14 16:08 ` Christoph Hellwig
1 sibling, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2005-07-14 16:08 UTC (permalink / raw)
To: Daniel Walker
Cc: Ingo Molnar, Dave Chinner, greg, Nathan Scott, Steve Lord, linux-kernel, linux-xfs

On Thu, Jul 14, 2005 at 08:56:58AM -0700, Daniel Walker wrote:
> On Thu, 2005-07-14 at 07:23 +0200, Ingo Molnar wrote:
> > * Daniel Walker <dwalker@mvista.com> wrote:
> >
> > > > The whole point of using a semaphore in the pagebuf is because there
> > > > is no tracking of who "owns" the lock so we can actually release it
> > > > in a different context. Semaphores were invented for this purpose,
> > > > and we use them in the way they were intended. ;)
> > >
> > > Where is the that semaphore spec, is that posix ? There is a new
> > > construct called "complete" that is good for this type of stuff too.
> > > No owner needed , just something running, and something waiting till
> > > it completes.
> >
> > wrt. posix, we dont really care about that for kernel-internal
> > primitives like struct semaphore. So whether it's posix or not has no
> > relevance.
>
> This reminds me of Documentation/stable_api_nonsense.txt . That no one
> should really be dependent on a particular kernel API doing a particular
> thing. The kernel is play dough for the kernel hacker (as it should be),
> including kernel semaphores.
>
> So we can change whatever we want, and make no excuses, as long as we
> fix the rest of the kernel to work with our change. That seems pretty
> sensible , because Linux should be an evolution.
>
> Daniel
---end quoted text---
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RT and XFS
2005-07-14 4:10 ` Daniel Walker
[not found] ` <20050714052347.GA18813@elte.hu>
@ 2005-07-15 10:23 ` Ingo Molnar
2005-07-15 16:16 ` Daniel Walker
1 sibling, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2005-07-15 10:23 UTC (permalink / raw)
To: Daniel Walker
Cc: Dave Chinner, Nathan Scott, Steve Lord, linux-kernel, linux-xfs, Christoph Hellwig

* Daniel Walker <dwalker@mvista.com> wrote:

> PI is always good, cause it allows the tracking of what is high
> priority , and what is not .

that's just plain wrong. PI might be good if one cares about priorities
and worst-case latencies, but most of the time the kernel is plain good
enough and we dont care. PI can also be pretty expensive. So in no way,
shape or form can PI be "always good".

	Ingo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RT and XFS
2005-07-15 10:23 ` Ingo Molnar
@ 2005-07-15 16:16 ` Daniel Walker
2005-07-18 11:33 ` Esben Nielsen
2005-07-19 3:31 ` Bill Huey
0 siblings, 2 replies; 22+ messages in thread
From: Daniel Walker @ 2005-07-15 16:16 UTC (permalink / raw)
To: Ingo Molnar
Cc: Dave Chinner, Nathan Scott, Steve Lord, linux-kernel, linux-xfs, Christoph Hellwig

On Fri, 2005-07-15 at 12:23 +0200, Ingo Molnar wrote:
> * Daniel Walker <dwalker@mvista.com> wrote:
>
> > PI is always good, cause it allows the tracking of what is high
> > priority , and what is not .
>
> that's just plain wrong. PI might be good if one cares about priorities
> and worst-case latencies, but most of the time the kernel is plain good
> enough and we dont care. PI can also be pretty expensive. So in no way,
> shape or form can PI be "always good".

I don't agree with that. But of course I'm always speaking from a real
time perspective . PI is expensive , but it won't always be. However, no
one is forcing PI on anyone, even if I think it's good ..

Daniel
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RT and XFS
2005-07-15 16:16 ` Daniel Walker
@ 2005-07-18 11:33 ` Esben Nielsen
2005-07-19 3:31 ` Bill Huey
1 sibling, 0 replies; 22+ messages in thread
From: Esben Nielsen @ 2005-07-18 11:33 UTC (permalink / raw)
To: Daniel Walker
Cc: Ingo Molnar, Dave Chinner, Nathan Scott, Steve Lord, linux-kernel, linux-xfs, Christoph Hellwig

On Fri, 15 Jul 2005, Daniel Walker wrote:

> On Fri, 2005-07-15 at 12:23 +0200, Ingo Molnar wrote:
> > * Daniel Walker <dwalker@mvista.com> wrote:
> >
> > > PI is always good, cause it allows the tracking of what is high
> > > priority , and what is not .
> >
> > that's just plain wrong. PI might be good if one cares about priorities
> > and worst-case latencies, but most of the time the kernel is plain good
> > enough and we dont care. PI can also be pretty expensive. So in no way,
> > shape or form can PI be "always good".
>
> I don't agree with that. But of course I'm always speaking from a real
> time perspective . PI is expensive , but it won't always be. However, no
> one is forcing PI on anyone, even if I think it's good ..
>

Is PI needed?
If you use a mutex to protect a critical area you are destroying the
strict meaning of priorities if the mutex doesn't have PI: Priority
inversion can effectively make the high priority task low priority in
that situation and postpone its execution indefinitely.
For RT applications that is clearly unacceptable.

One can argue that for non-RT tasks priorities aren't supposed to be as
rigid as for RT tasks anyway, so it doesn't matter so much. But as I
read the comments in sched.c, a nice -20 task has to preempt any nice 0
task no matter how much of a CPU hog it is. If it happens to share a
critical section with a nice +19 task, priority inversion will
occasionally destroy that property.

If we disregard the costs of PI, PI is thus a good thing.

But how expensive is PI?

Of course there is an overhead in doing the calculations. Ingo's
implementation can be optimized quite a bit once things are settled, but
it will always be many times more expensive than a raw spin-lock. But is
it much more expensive than a plain binary semaphore?

If there is no congestion on a mutex the PI code will not be called at
all. On UP, the only occasion where congestion can occur is when a low
priority task is preempted by a higher priority task while it holds the
mutex. So let us look at the expensive part, where the high priority
task tries to grab the mutex:

With PI: The owner has to be boosted, an immediate task switch has to
take place, the owner runs to the unlock operation and is set back down
in priority, whereafter there is a task switch again to the high
priority task.

Without PI: The owner waits and there is a task switch to some thread
which might not be the owner but often is. When the owner eventually
unlocks the mutex it will be followed by a task switch, because
congestion can only occur when the task trying to get the mutex
preempts, and thus has higher priority than, the owner.

The number of task switches is thus the same with and without PI!

And then there is the cache issue: When other tasks get scheduled in the
priority inversion case, the data being protected can be flushed from
the cache while they are running. With PI the CPU continues to work with
the same data, and most often in the same code module. I.e. there is a
higher chance that the instruction and data caches contain the right
data.

Thus in the end it all depends on how cheaply the PI calculations can be
made.

Esben

> Daniel
>

^ permalink raw reply [flat|nested] 22+ messages in thread
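As a rough illustration of the boosting Esben describes, here is a toy
userspace model in C. It is only a sketch under stated assumptions, not
the RT patch's implementation: struct task, struct pi_mutex,
pi_mutex_lock(), pi_mutex_unlock() and pick_next() are hypothetical names
made up for this example. Priorities follow the nice convention (lower
number wins), a task is assumed to hold at most one such lock, and the
handover of the lock to the waiter on unlock is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task record; lower number = higher priority, as with nice. */
struct task {
    int base_prio;  /* priority the task was assigned */
    int eff_prio;   /* priority the scheduler currently uses */
};

/* Hypothetical mutex that remembers its owner so a waiter can boost it. */
struct pi_mutex {
    struct task *owner;  /* NULL when unlocked */
};

/*
 * Try to take the lock. Returns 1 if the caller now owns it, 0 if the
 * caller has to block. On contention the owner inherits the caller's
 * priority whenever that is better than the owner's current one.
 */
int pi_mutex_lock(struct pi_mutex *m, struct task *t)
{
    if (m->owner == NULL) {
        m->owner = t;
        return 1;
    }
    if (t->eff_prio < m->owner->eff_prio)
        m->owner->eff_prio = t->eff_prio;  /* the PI boost */
    return 0;
}

/*
 * Release the lock and drop the owner back to its base priority.
 * Assumption: the owner holds no other PI lock, so restoring the base
 * priority is enough. Waking the waiter is left out of this model.
 */
void pi_mutex_unlock(struct pi_mutex *m)
{
    assert(m->owner != NULL);
    m->owner->eff_prio = m->owner->base_prio;
    m->owner = NULL;
}

/* Pick the runnable task with the best effective priority (NULL if none). */
struct task *pick_next(struct task **runnable, size_t n)
{
    struct task *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (best == NULL || runnable[i]->eff_prio < best->eff_prio)
            best = runnable[i];
    return best;
}
```

With a nice +19 owner, a nice 0 CPU hog and a nice -20 waiter, the
boosted owner is what pick_next() returns while the lock is held, so the
hog cannot keep the owner (and hence the waiter) off the CPU. Without the
boost line in pi_mutex_lock(), the same call would pick the hog, which is
the inversion case from the mail above.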
* Re: RT and XFS
2005-07-15 16:16 ` Daniel Walker
2005-07-18 11:33 ` Esben Nielsen
@ 2005-07-19 3:31 ` Bill Huey
1 sibling, 0 replies; 22+ messages in thread
From: Bill Huey @ 2005-07-19 3:31 UTC (permalink / raw)
To: Daniel Walker
Cc: Ingo Molnar, Dave Chinner, Nathan Scott, Steve Lord, linux-kernel, linux-xfs, Christoph Hellwig

On Fri, Jul 15, 2005 at 09:16:55AM -0700, Daniel Walker wrote:
> I don't agree with that. But of course I'm always speaking from a real
> time perspective . PI is expensive , but it won't always be. However, no
> one is forcing PI on anyone, even if I think it's good ..

It depends on the kind of PI and the specific circumstances. In the
general kernel, it's really to be avoided at all costs since it's
masking a general contention problem at those places. In a formally
provable worst case system using priority ceiling emulation and stuff,
PI is really valuable. How a system like the Linux kernel fits into that
is a totally different story. General purpose kernels using general
purpose facilities don't.

That's how I see it.

bill

^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2005-07-19 13:51 UTC | newest]
Thread overview: 22+ messages
2005-07-12 23:01 RT and XFS Daniel Walker
2005-07-12 23:39 ` William Weston
2005-07-13 0:25 ` Nathan Scott
2005-07-13 0:41 ` Daniel Walker
2005-07-13 0:37 ` Nathan Scott
2005-07-13 6:47 ` Ingo Molnar
2005-07-13 16:45 ` Daniel Walker
2005-07-14 0:22 ` Nathan Scott
2005-07-14 3:50 ` Dave Chinner
2005-07-14 4:10 ` Daniel Walker
[not found] ` <20050714052347.GA18813@elte.hu>
2005-07-14 15:56 ` Daniel Walker
2005-07-14 16:08 ` Christoph Hellwig
2005-07-18 12:10 ` Esben Nielsen
2005-07-19 3:26 ` Bill Huey
2005-07-19 12:34 ` Ingo Molnar
2005-07-19 13:27 ` Christoph Hellwig
2005-07-19 13:50 ` Ingo Molnar
2005-07-14 16:08 ` Christoph Hellwig
2005-07-15 10:23 ` Ingo Molnar
2005-07-15 16:16 ` Daniel Walker
2005-07-18 11:33 ` Esben Nielsen
2005-07-19 3:31 ` Bill Huey