RT and XFS


* RT and XFS
@ 2005-07-12 23:01 Daniel Walker
  2005-07-12 23:39 ` William Weston
  2005-07-13  0:25 ` Nathan Scott
  0 siblings, 2 replies; 22+ messages in thread
From: Daniel Walker @ 2005-07-12 23:01 UTC (permalink / raw)
  To: mingo; +Cc: linux-kernel


Is there something so odd about the XFS locking that it can't use the
rt_lock?


--- linux.orig/fs/xfs/linux-2.6/mrlock.h
+++ linux/fs/xfs/linux-2.6/mrlock.h
@@ -37,12 +37,12 @@
 enum { MR_NONE, MR_ACCESS, MR_UPDATE };
 
 typedef struct {
-	struct rw_semaphore	mr_lock;
-	int			mr_writer;
+	struct compat_rw_semaphore	mr_lock;
+	int				mr_writer;
 } mrlock_t;
 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RT and XFS
  2005-07-12 23:01 RT and XFS Daniel Walker
@ 2005-07-12 23:39 ` William Weston
  2005-07-13  0:25 ` Nathan Scott
  1 sibling, 0 replies; 22+ messages in thread
From: William Weston @ 2005-07-12 23:39 UTC (permalink / raw)
  To: Daniel Walker; +Cc: mingo, linux-kernel

On Tue, 12 Jul 2005, Daniel Walker wrote:

> Is there something so odd about the XFS locking, that it can't use the
> rt_lock ?
> 
> 
> --- linux.orig/fs/xfs/linux-2.6/mrlock.h
> +++ linux/fs/xfs/linux-2.6/mrlock.h
> @@ -37,12 +37,12 @@
>  enum { MR_NONE, MR_ACCESS, MR_UPDATE };
>  
>  typedef struct {
> -	struct rw_semaphore	mr_lock;
> -	int			mr_writer;
> +	struct compat_rw_semaphore	mr_lock;
> +	int				mr_writer;
>  } mrlock_t;

BTW, what's the difference between rw_semaphore and compat_rw_semaphore?  
Or between semaphore and compat_semaphore?  I ran into a similar issue
(needing compat_semaphore) with the IVTV drivers.  The following is a
portion of my patch to get IVTV running under RT (the other portions are
just compile-time semantics):

--- ivtv-0.2.0-rc3k.orig/driver/msp3400.c       2004-11-19 08:21:04.000000000 -0800
+++ ivtv-0.2.0-rc3k/driver/msp3400.c    2005-06-22 17:26:24.000000000 -0700
@@ -115,7 +115,7 @@
 	struct task_struct  *thread;
 	wait_queue_head_t    wq;
 
-	struct semaphore    *notify;
+	struct compat_semaphore *notify;
 	int                  active,restart,rmmod;
 
 	int                  watch_stereo;

--ww

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RT and XFS
  2005-07-12 23:01 RT and XFS Daniel Walker
  2005-07-12 23:39 ` William Weston
@ 2005-07-13  0:25 ` Nathan Scott
  2005-07-13  0:41   ` Daniel Walker
  2005-07-13  6:47   ` Ingo Molnar
  1 sibling, 2 replies; 22+ messages in thread
From: Nathan Scott @ 2005-07-13  0:25 UTC (permalink / raw)
  To: mingo, Daniel Walker; +Cc: linux-kernel

On Tue, Jul 12, 2005 at 04:01:32PM -0700, Daniel Walker wrote:
> 
> Is there something so odd about the XFS locking, that it can't use the
> rt_lock ?

Not that I know of - XFS does use the downgrade_write interface,
whose use isn't overly common in the rest of the kernel... maybe
that has caused some confusion, dunno.

> --- linux.orig/fs/xfs/linux-2.6/mrlock.h
> +++ linux/fs/xfs/linux-2.6/mrlock.h
> @@ -37,12 +37,12 @@
>  enum { MR_NONE, MR_ACCESS, MR_UPDATE };
>  
>  typedef struct {
> -	struct rw_semaphore	mr_lock;
> -	int			mr_writer;
> +	struct compat_rw_semaphore	mr_lock;
> +	int				mr_writer;
>  } mrlock_t;

The XFS code is also written such that it just releases a mrlock
without tracking whether it held it for access or update in the end
(the end lock state is not necessarily how it started out, since it
may have downgraded the lock at some point, or it may not have).
It's a non-trivial change to track that state within XFS itself,
so the above mr_writer field in XFS's mrlock wrapper tracks that
state alongside the rw_semaphore.  XFS would much prefer to get
that out of the rw_semaphore itself, but there isn't any
mechanism for doing so (it's not a particularly nice API change
either, really, for the generic locking code).  I guess that may
have been another reason for the above change in the RT patch; I
don't know all the details there.

cheers.

-- 
Nathan

^ permalink raw reply	[flat|nested] 22+ messages in thread
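
The wrapper Nathan describes above can be sketched in userspace C. This is only a toy model under stated assumptions: the toy_ names are hypothetical, the "rw_semaphore" here is a pair of counters that never blocks, and nothing below is the actual XFS or kernel code. It shows why a writer flag kept next to the lock lets a single unlock call pick the right release path even after a downgrade.

```c
#include <stdbool.h>

/* Toy stand-in for an rw_semaphore: bookkeeping only, no blocking. */
struct toy_rwsem {
	int  readers;
	bool writer;
};

static inline void toy_down_read(struct toy_rwsem *s)  { s->readers++; }
static inline void toy_up_read(struct toy_rwsem *s)    { s->readers--; }
static inline void toy_down_write(struct toy_rwsem *s) { s->writer = true; }
static inline void toy_up_write(struct toy_rwsem *s)   { s->writer = false; }

/* Writer becomes a reader without ever dropping the lock. */
static inline void toy_downgrade_write(struct toy_rwsem *s)
{
	s->writer = false;
	s->readers++;
}

/* Hypothetical wrapper: the lock plus a flag recording "held for update". */
typedef struct {
	struct toy_rwsem mr_lock;
	int              mr_writer;
} toy_mrlock_t;

static inline void toy_mraccess(toy_mrlock_t *m)
{
	toy_down_read(&m->mr_lock);
}

static inline void toy_mrupdate(toy_mrlock_t *m)
{
	toy_down_write(&m->mr_lock);
	m->mr_writer = 1;
}

static inline void toy_mrdemote(toy_mrlock_t *m)
{
	m->mr_writer = 0;
	toy_downgrade_write(&m->mr_lock);
}

/* The caller does not say how it holds the lock; the flag decides. */
static inline void toy_mrunlock(toy_mrlock_t *m)
{
	if (m->mr_writer) {
		m->mr_writer = 0;
		toy_up_write(&m->mr_lock);
	} else {
		toy_up_read(&m->mr_lock);
	}
}
```

A caller can take the lock for update, demote it, and later call toy_mrunlock() without remembering which state it ended in.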

* Re: RT and XFS
  2005-07-13  0:25 ` Nathan Scott
@ 2005-07-13  0:41   ` Daniel Walker
  2005-07-13  0:37     ` Nathan Scott
  2005-07-13  6:47   ` Ingo Molnar
  1 sibling, 1 reply; 22+ messages in thread
From: Daniel Walker @ 2005-07-13  0:41 UTC (permalink / raw)
  To: Nathan Scott; +Cc: mingo, linux-kernel

On Wed, 2005-07-13 at 10:25 +1000, Nathan Scott wrote:
> On Tue, Jul 12, 2005 at 04:01:32PM -0700, Daniel Walker wrote:
> > 
> > Is there something so odd about the XFS locking, that it can't use the
> > rt_lock ?
> 
> Not that I know of - XFS does use the downgrade_write interface,
> whose use isn't overly common in the rest of the kernel... maybe
> that has caused some confusion, dunno.

Current RT doesn't implement downgrade_write(), but it's trivial to add
it.

> > --- linux.orig/fs/xfs/linux-2.6/mrlock.h
> > +++ linux/fs/xfs/linux-2.6/mrlock.h
> > @@ -37,12 +37,12 @@
> >  enum { MR_NONE, MR_ACCESS, MR_UPDATE };
> >  
> >  typedef struct {
> > -	struct rw_semaphore	mr_lock;
> > -	int			mr_writer;
> > +	struct compat_rw_semaphore	mr_lock;
> > +	int				mr_writer;
> >  } mrlock_t;
> 
> The XFS code is also written such that it just releases a mrlock
> without tracking whether it had it for access/update in the end
> (end lock state is not necessarily how it started out, since it
> may have downgraded the lock at some point, or it may not have).
> Its a non-trivial change to track that state within XFS itself,
> so the above mr_writer field in XFS's mrlock wrapper tracks that
> state alongside the rw_semaphore.  It would prefer to be getting
> that out of the rw_semaphore itself, alot, but there's not any
> mechanism for doing so (its not a particularly nice API change
> either, really, for the generic locking code).  I guess that may
> have been another reason for the above change in the RT patch, I
> don't know all the details there.

So it calls up_read if it has a read lock, or up_write if it has a
write lock? I suppose it would be broken if it didn't, though.

Daniel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RT and XFS
  2005-07-13  0:41   ` Daniel Walker
@ 2005-07-13  0:37     ` Nathan Scott
  0 siblings, 0 replies; 22+ messages in thread
From: Nathan Scott @ 2005-07-13  0:37 UTC (permalink / raw)
  To: Daniel Walker; +Cc: mingo, linux-kernel

On Tue, Jul 12, 2005 at 05:41:43PM -0700, Daniel Walker wrote:
> On Wed, 2005-07-13 at 10:25 +1000, Nathan Scott wrote:
> > On Tue, Jul 12, 2005 at 04:01:32PM -0700, Daniel Walker wrote:
> > > 
> > > Is there something so odd about the XFS locking, that it can't use the
> > > rt_lock ?
> > 
> > Not that I know of - XFS does use the downgrade_write interface,
> > whose use isn't overly common in the rest of the kernel... maybe
> > that has caused some confusion, dunno.
> 
> Current RT doesn't implement downgrade_write() , but it's trivial to add
> it.

Ah, that's probably it then.

> So it calls up_read if it has a read lock ? Or up_write if it has a
> write lock? I suppose it would be broken if it didn't though.

That's correct.

cheers.

-- 
Nathan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RT and XFS
  2005-07-13  0:25 ` Nathan Scott
  2005-07-13  0:41   ` Daniel Walker
@ 2005-07-13  6:47   ` Ingo Molnar
  2005-07-13 16:45     ` Daniel Walker
  1 sibling, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2005-07-13  6:47 UTC (permalink / raw)
  To: Nathan Scott; +Cc: Daniel Walker, linux-kernel

* Nathan Scott <nathans@sgi.com> wrote:

> On Tue, Jul 12, 2005 at 04:01:32PM -0700, Daniel Walker wrote:
> > 
> > Is there something so odd about the XFS locking, that it can't use the
> > rt_lock ?
> 
> Not that I know of - XFS does use the downgrade_write interface, whose 
> use isn't overly common in the rest of the kernel... maybe that has 
> caused some confusion, dunno.

downgrade_write() wasn't the main problem - the main problem was that for
PREEMPT_RT I implemented 'strict' semaphores, which are not identical to
vanilla kernel semaphores. The thing that seemed to impact XFS the most
is the 'acquirer thread has to release the lock' rule of strict
semaphores.  Both the XFS logging code and the XFS IO completion code
seem to release locks in a different context from where the acquire
happened. It's of course valid upstream behavior, but without these
extra rules it's hard to do sane priority inheritance. (Who do you boost
if you don't really know who 'owns' the lock?) It might make sense to
introduce some sort of sem_pass_to(new_owner) interface. For now I
introduced a compat type, which lets those semaphores fall back to the
vanilla implementation.

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread
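
The 'acquirer has to release' rule and the sem_pass_to(new_owner) idea from the message above can be illustrated with a small model. Everything here is an assumption made for illustration: tasks are plain integers, the toy_ functions are hypothetical, and a real strict semaphore would block and boost the owner instead of returning false. The point is only that recording an owner is what makes "who do you boost?" answerable, and that an explicit hand-off keeps that record valid when another context must do the release.

```c
#include <stdbool.h>

/* Hypothetical model: tasks are positive integers, 0 means "unowned". */
struct toy_strict_sem {
	int owner;
};

/* Acquire: succeeds only when free, and records the acquirer as owner. */
static inline bool toy_strict_down(struct toy_strict_sem *s, int task)
{
	if (task == 0 || s->owner != 0)
		return false;	/* a real lock would sleep here */
	s->owner = task;
	return true;
}

/* Release: refused unless the caller is the recorded owner. */
static inline bool toy_strict_up(struct toy_strict_sem *s, int task)
{
	if (task == 0 || s->owner != task)
		return false;
	s->owner = 0;
	return true;
}

/* Hand-off in the spirit of the suggested sem_pass_to(new_owner). */
static inline bool toy_sem_pass_to(struct toy_strict_sem *s, int task,
				   int new_owner)
{
	if (task == 0 || new_owner == 0 || s->owner != task)
		return false;
	s->owner = new_owner;
	return true;
}
```

With this model, a release from a task that never acquired the lock fails, while the same release succeeds once ownership has been passed to that task.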

* Re: RT and XFS
  2005-07-13  6:47   ` Ingo Molnar
@ 2005-07-13 16:45     ` Daniel Walker
  2005-07-14  0:22       ` Nathan Scott
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Walker @ 2005-07-13 16:45 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Nathan Scott, linux-kernel

On Wed, 2005-07-13 at 08:47 +0200, Ingo Molnar wrote:
> 
> downgrade_write() wasnt the main problem - the main problem was that for 
> PREEMPT_RT i implemented 'strict' semaphores, which are not identical to 
> vanilla kernel semaphores. The thing that seemed to impact XFS the most 
> is the 'acquirer thread has to release the lock' rule of strict 
> semaphores.  Both the XFS logging code and the XFS IO completion code 
> seems to release locks in a different context from where the acquire 
> happened. It's of course valid upstream behavior, but without these 
> extra rules it's hard to do sane priority inheritance. (who do you boost 
> if you dont really know who 'owns' the lock?) It might make sense to 
> introduce some sort of sem_pass_to(new_owner) interface? For now i 
> introduced a compat type, which lets those semaphores fall back to the 
> vanilla implementation.



There's a lot of code like this in there. I've seen some that down()
in process context and up() in interrupt context, which is weird. But
those aren't major features, just little drivers. XFS is a pretty major
feature.

Nathan, does XFS need this property, or could we convert it to
synchronize the locking (with ease)?

Daniel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RT and XFS
  2005-07-13 16:45     ` Daniel Walker
@ 2005-07-14  0:22       ` Nathan Scott
  2005-07-14  3:50         ` Dave Chinner
  0 siblings, 1 reply; 22+ messages in thread
From: Nathan Scott @ 2005-07-14  0:22 UTC (permalink / raw)
  To: Daniel Walker, Ingo Molnar, Steve Lord; +Cc: linux-kernel, linux-xfs

Hi there,

On Wed, Jul 13, 2005 at 09:45:58AM -0700, Daniel Walker wrote:
> On Wed, 2005-07-13 at 08:47 +0200, Ingo Molnar wrote:
> > 
> > downgrade_write() wasnt the main problem - the main problem was that for 
> > PREEMPT_RT i implemented 'strict' semaphores, which are not identical to 
> > vanilla kernel semaphores. The thing that seemed to impact XFS the most 
> > is the 'acquirer thread has to release the lock' rule of strict 
> > semaphores.  Both the XFS logging code and the XFS IO completion code 
> > seems to release locks in a different context from where the acquire 
> > happened. It's of course valid upstream behavior, but without these 
> > extra rules it's hard to do sane priority inheritance. (who do you boost 
> > if you dont really know who 'owns' the lock?) It might make sense to 
> > introduce some sort of sem_pass_to(new_owner) interface? For now i 
> > introduced a compat type, which lets those semaphores fall back to the 
> > vanilla implementation.

Hmm, I'm not aware of anywhere in XFS where we do that.  From talking
to some colleagues here, they're claiming that we can't be doing that
since it'd trip an assert in the IRIX mrlock code.

> There's a lot of code like this in there .. I've seen some that down()
> in process contex, and up() in interrupt contex which is weird .. But
> those aren't major features, just little drivers. XFS is pretty major
> feature.
> 
> Nathan, does XFS need this property or could we convert it to
> synchronize the locking (with ease?)?

I'm not yet sure in what situations we are doing this, so can't
really say.  It'd be interesting to see an implementation of the
downgrade_write functionality and then a specific case where the
above locking behaviour happens ... and I'd then be able to say
how tricky that would be to resolve.

Steve, are you aware of situations where we unlock in a different 
thread to where we acquired the lock?  It'd surprise me, as we're
holding these things for as short a time as possible - afaict the
transactions always ilock, copy delta to iclog, pin, and unlock,
no?  (all in the same thread).  I can't see the iolock being used
in this way anywhere either... you?

cheers.

-- 
Nathan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RT and XFS
  2005-07-14  0:22       ` Nathan Scott
@ 2005-07-14  3:50         ` Dave Chinner
  2005-07-14  4:10           ` Daniel Walker
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Chinner @ 2005-07-14  3:50 UTC (permalink / raw)
  To: Nathan Scott
  Cc: Daniel Walker, Ingo Molnar, Steve Lord, linux-kernel, linux-xfs

On Thu, Jul 14, 2005 at 10:22:46AM +1000, Nathan Scott wrote:
> Hi there,
> 
> On Wed, Jul 13, 2005 at 09:45:58AM -0700, Daniel Walker wrote:
> > On Wed, 2005-07-13 at 08:47 +0200, Ingo Molnar wrote:
> > > 
> > > downgrade_write() wasnt the main problem - the main problem was that for 
> > > PREEMPT_RT i implemented 'strict' semaphores, which are not identical to 
> > > vanilla kernel semaphores. The thing that seemed to impact XFS the most 
> > > is the 'acquirer thread has to release the lock' rule of strict 
> > > semaphores. Both the XFS logging code and the XFS IO completion code 
> > > seems to release locks in a different context from where the acquire 
> > > happened. It's of course valid upstream behavior, but without these 
> > > extra rules it's hard to do sane priority inheritance. (who do you boost 
> > > if you dont really know who 'owns' the lock?) It might make sense to 
> > > introduce some sort of sem_pass_to(new_owner) interface? For now i 
> > > introduced a compat type, which lets those semaphores fall back to the 
> > > vanilla implementation.
> 
> Hmm, I'm not aware of anywhere in XFS where we do that.  From talking
> to some colleagues here, they're claiming that we can't be doing that
> since it'd trip an assert in the IRIX mrlock code.

Now that I've read the thread, I see it's not mrlocks that are the
issue with unlocking in a different context - it's semaphores.

All the pagebuf synchronisation is done with a semaphore because
it's held across the I/O and it's _most definitely_ released in a
different context when doing async I/O. Just about all metadata I/O
is async because once the transaction has been logged to disk we
don't need to write these buffers out synchronously. Not to mention
the log I/O completion unlocks the buffers in a transaction in a
different context as well.

The whole point of using a semaphore in the pagebuf is because there
is no tracking of who "owns" the lock so we can actually release it
in a different context. Semaphores were invented for this purpose,
and we use them in the way they were intended. ;)

Realistically, I seriously doubt the need for any sort of rt changes
to these semaphores. They can be held for indeterminate periods of
time potentially across multiple disk I/Os (e.g. when held locked in
a transaction that requires more metadata to be read in from disk to
make progress).  Hence there is really no point in making them RT
aware because if you end up waiting on one of them you can forget
about pretty much any RT guarantee that you've ever given....

Cheers,

Dave.
-- 
Dave Chinner
R&D Software Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 22+ messages in thread
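
The pattern Dave describes above, a buffer lock taken by the submitter and dropped by whoever finishes the I/O, can be sketched in userspace with POSIX semaphores and threads. This is an illustration under assumptions, not pagebuf code: toy_buf, toy_io_done and toy_async_write are hypothetical names, and a second thread stands in for the I/O completion context.

```c
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical buffer: a binary semaphore acting as the buffer lock. */
struct toy_buf {
	sem_t lock;
	int   data_written;
};

/* Stand-in for I/O completion: it never acquired the lock itself. */
static void *toy_io_done(void *arg)
{
	struct toy_buf *bp = arg;

	bp->data_written = 1;
	sem_post(&bp->lock);	/* release from a different context */
	return NULL;
}

/* Returns 0 if the "I/O" completed and released the lock, -1 otherwise. */
static inline int toy_async_write(void)
{
	struct toy_buf buf;
	pthread_t io;
	int ok;

	buf.data_written = 0;
	if (sem_init(&buf.lock, 0, 1) != 0)
		return -1;

	sem_wait(&buf.lock);	/* submitter locks the buffer */
	if (pthread_create(&io, NULL, toy_io_done, &buf) != 0) {
		sem_destroy(&buf.lock);
		return -1;
	}

	/* Taking the lock again only succeeds once completion released it. */
	sem_wait(&buf.lock);
	pthread_join(io, NULL);
	ok = buf.data_written;

	sem_post(&buf.lock);
	sem_destroy(&buf.lock);
	return ok ? 0 : -1;
}
```

A semaphore permits this because it records no owner. An owner-checked lock would have to reject the release in toy_io_done(), which is the conflict this thread is about.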

* Re: RT and XFS
  2005-07-14  3:50         ` Dave Chinner
@ 2005-07-14  4:10           ` Daniel Walker
       [not found]             ` <20050714052347.GA18813@elte.hu>
  2005-07-15 10:23             ` Ingo Molnar
  0 siblings, 2 replies; 22+ messages in thread
From: Daniel Walker @ 2005-07-14  4:10 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Nathan Scott, Ingo Molnar, Steve Lord, linux-kernel, linux-xfs

On Thu, 2005-07-14 at 13:50 +1000, Dave Chinner wrote:
> Now that I've read the thread, I see it's not mrlocks that is the
> issue with unlocking in a different context - it's semaphores.
> 
> All the pagebuf synchronisation is done with a semaphore because
> it's held across the I/O and it's _most definitely_ released in a
> different context when doing async I/O. Just about all metadata I/O
> is async because once the transaction has been logged to disk we
> don't need to write these buffers out synchronously. Not to mention
> the log I/O completion unlocks the buffers in a transaction in a
> different context as well.
> 
> The whole point of using a semaphore in the pagebuf is because there
> is no tracking of who "owns" the lock so we can actually release it
> in a different context. Semaphores were invented for this purpose,
> and we use them in the way they were intended. ;)

Where is that semaphore spec? Is that POSIX?  There is a newer
construct called a "completion" that is good for this type of stuff too. No
owner needed, just something running and something waiting till it
completes.

> Realistically, I seriously doubt the need for any sort of rt changes
> to these semaphores. They can be held for indeterminant periods of
> time potentially across multiple disk I/Os (e.g. when held locked in
> a transaction that requires more metadata to be read in from disk to
> make progress).  Hence there is no really no point in making them RT
> aware because if you end up waiting on one of them you can forget
> about pretty much any RT guarantee that you've ever given....

PI is always good, because it allows the tracking of what is high
priority and what is not.

Daniel


^ permalink raw reply	[flat|nested] 22+ messages in thread
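
The "something running, something waiting till it completes" construct Daniel mentions above can be modelled in userspace with a pthread mutex and condition variable. This is a hedged sketch, not the kernel implementation: the toy_ names are hypothetical and only loosely mirror the idea of a completion, and the done flag is a simple boolean instead of a counter.

```c
#include <pthread.h>

/* Hypothetical userspace model of a completion: a flag plus a wait queue. */
struct toy_completion {
	pthread_mutex_t lock;
	pthread_cond_t  wait;
	int             done;
};

static inline void toy_init_completion(struct toy_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->wait, NULL);
	c->done = 0;
}

/* Called by the running side: mark finished and wake the waiter. */
static inline void toy_complete(struct toy_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 1;
	pthread_cond_signal(&c->wait);
	pthread_mutex_unlock(&c->lock);
}

/* Called by the waiting side: sleep until the flag is set. */
static inline void toy_wait_for_completion(struct toy_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->wait, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

struct toy_job {
	struct toy_completion done;
	int                   result;
};

static void *toy_worker(void *arg)
{
	struct toy_job *job = arg;

	job->result = 42;		/* the work itself */
	toy_complete(&job->done);
	return NULL;
}

/* Starts a worker, waits for it to signal, returns its result or -1. */
static inline int toy_run_job(void)
{
	struct toy_job job;
	pthread_t worker;

	job.result = 0;
	toy_init_completion(&job.done);
	if (pthread_create(&worker, NULL, toy_worker, &job) != 0)
		return -1;

	toy_wait_for_completion(&job.done);
	pthread_join(worker, NULL);

	pthread_cond_destroy(&job.done.wait);
	pthread_mutex_destroy(&job.done.lock);
	return job.result;
}
```

No owner is tracked anywhere: one side signals, the other waits, which is why this shape fits signalling uses better than a lock does.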

[parent not found: <20050714052347.GA18813@elte.hu>]

* Re: RT and XFS
       [not found]             ` <20050714052347.GA18813@elte.hu>
@ 2005-07-14 15:56               ` Daniel Walker
  2005-07-14 16:08                 ` Christoph Hellwig
  2005-07-14 16:08                 ` Christoph Hellwig
  0 siblings, 2 replies; 22+ messages in thread
From: Daniel Walker @ 2005-07-14 15:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dave Chinner, greg, Nathan Scott, Steve Lord, linux-kernel,
	linux-xfs

On Thu, 2005-07-14 at 07:23 +0200, Ingo Molnar wrote:
> * Daniel Walker <dwalker@mvista.com> wrote:
> 
> > > The whole point of using a semaphore in the pagebuf is because there
> > > is no tracking of who "owns" the lock so we can actually release it
> > > in a different context. Semaphores were invented for this purpose,
> > > and we use them in the way they were intended. ;)
> > 
> > Where is the that semaphore spec, is that posix ?  There is a new 
> > construct called "complete" that is good for this type of stuff too. 
> > No owner needed , just something running, and something waiting till 
> > it completes.
> 
> wrt. posix, we dont really care about that for kernel-internal 
> primitives like struct semaphore. So whether it's posix or not has no 
> relevance.

This reminds me of Documentation/stable_api_nonsense.txt: no one
should really be dependent on a particular kernel API doing a particular
thing. The kernel is play dough for the kernel hacker (as it should be),
including kernel semaphores.

So we can change whatever we want, and make no excuses, as long as we
fix the rest of the kernel to work with our change. That seems pretty
sensible, because Linux should be an evolution.

Daniel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RT and XFS
  2005-07-14 15:56               ` Daniel Walker
@ 2005-07-14 16:08                 ` Christoph Hellwig
  2005-07-18 12:10                   ` Esben Nielsen
  2005-07-14 16:08                 ` Christoph Hellwig
  1 sibling, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2005-07-14 16:08 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Ingo Molnar, Dave Chinner, greg, Nathan Scott, Steve Lord,
	linux-kernel, linux-xfs

On Thu, Jul 14, 2005 at 08:56:58AM -0700, Daniel Walker wrote:
> This reminds me of Documentation/stable_api_nonsense.txt . That no one
> should really be dependent on a particular kernel API doing a particular
> thing. The kernel is play dough for the kernel hacker (as it should be),
> including kernel semaphores.
> 
> So we can change whatever we want, and make no excuses, as long as we
> fix the rest of the kernel to work with our change. That seems pretty
> sensible , because Linux should be an evolution. 

Daniel, get a fucking clue.  Read some CS 101 literature on what a semaphore
is defined to be.  If you want PI singing dancing blinking christmas tree
locking primitives, call them a mutex, but not a semaphore.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RT and XFS
  2005-07-14 16:08                 ` Christoph Hellwig
@ 2005-07-18 12:10                   ` Esben Nielsen
  2005-07-19  3:26                     ` Bill Huey
  0 siblings, 1 reply; 22+ messages in thread
From: Esben Nielsen @ 2005-07-18 12:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Daniel Walker, Ingo Molnar, Dave Chinner, greg, Nathan Scott,
	Steve Lord, linux-kernel, linux-xfs

On Thu, 14 Jul 2005, Christoph Hellwig wrote:

> On Thu, Jul 14, 2005 at 08:56:58AM -0700, Daniel Walker wrote:
> > This reminds me of Documentation/stable_api_nonsense.txt . That no one
> > should really be dependent on a particular kernel API doing a particular
> > thing. The kernel is play dough for the kernel hacker (as it should be),
> > including kernel semaphores.
> > 
> > So we can change whatever we want, and make no excuses, as long as we
> > fix the rest of the kernel to work with our change. That seems pretty
> > sensible , because Linux should be an evolution. 
> 
> Daniel, get a fucking clue.  Read some CS 101 literature on what a semaphore
> is defined to be.  If you want PI singing dancing blinking christmas tree
> locking primites call them a mutex, but not a semaphore.
>

As a matter of fact I just finished what corresponds to your "CS 101" (I
study CS in my spare time while having a full-time job coding RT stuff).
At the one lecture I attended they talked about semaphores. They taught
students to use binary semaphores for locking. Based on real-life
experience (and the Pathfinder story), I complained and told
them they ought to teach the students to use a mutex instead. They had no
clue: "It is the same thing," they said. Yes, a mutex can be implemented
just as a binary semaphore, but its semantics are different. In RT the
difference is very important, and even without RT it is a good idea to
maintain the difference for readability and deadlock detection. If you
later on want to optimize the semaphore for what it is used for, it is also
good to have maintained that information. It is a bit like discarding
the type information from your programs. You want to keep the type information
even though the compiler ends up producing the same code.

The kernel developers clearly have followed the same lectures and used
plain binary semaphores, sometimes calling them mutexes and sometimes
semaphores. I believe that the semaphore ought to be removed. Either use a
mutex or a completion. By far the most code uses a semaphore either for
signalling - i.e. as a completion - or for critical sections - i.e. as a
mutex. If code mixes the usages it is most likely very hard to read....

Unfortunately, one of the goals of the preempt-rt branch is to avoid
altering too much code. Therefore the type semaphore can't be removed
there. Therefore the name still lingers ... :-(

Esben

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RT and XFS
  2005-07-18 12:10                   ` Esben Nielsen
@ 2005-07-19  3:26                     ` Bill Huey
  2005-07-19 12:34                       ` Ingo Molnar
  0 siblings, 1 reply; 22+ messages in thread
From: Bill Huey @ 2005-07-19  3:26 UTC (permalink / raw)
  To: Esben Nielsen
  Cc: Christoph Hellwig, Daniel Walker, Ingo Molnar, Dave Chinner, greg,
	Nathan Scott, Steve Lord, linux-kernel, linux-xfs

On Mon, Jul 18, 2005 at 02:10:31PM +0200, Esben Nielsen wrote:
> Unfortunately, one of the goals of the preempt-rt branch is to avoid
> altering too much code. Therefore the type semaphore can't be removed
> there. Therefore the name still lingers ... :-(

This is where you failed. You assumed that the person making the comment,
Christoph, didn't have his head up his ass in the first place and was
open to your end of the discussion.

bill


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RT and XFS
  2005-07-19  3:26                     ` Bill Huey
@ 2005-07-19 12:34                       ` Ingo Molnar
  2005-07-19 13:27                         ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2005-07-19 12:34 UTC (permalink / raw)
  To: Bill Huey
  Cc: Esben Nielsen, Christoph Hellwig, Daniel Walker, Dave Chinner,
	greg, Nathan Scott, Steve Lord, linux-kernel, linux-xfs


* Bill Huey <bhuey@lnxw.com> wrote:

> On Mon, Jul 18, 2005 at 02:10:31PM +0200, Esben Nielsen wrote:
> > Unfortunately, one of the goals of the preempt-rt branch is to avoid
> > altering too much code. Therefore the type semaphore can't be removed
> > there. Therefore the name still lingers ... :-(
> 
> This is where you failed. You assumed that that person making the 
> comment, Christopher, in the first place didn't have his head up his 
> ass in the first place and was open to your end of the discussion.

Please take me off the Cc: list for this kind of reply. Christoph is
very much entitled to his opinion, which I happen to mostly share in
this case: we should not be bothering upstream with requirements unique
to PREEMPT_RT. PREEMPT_RT restricts struct semaphore to be a mutex, and
that doesn't make it a classic semaphore anymore. We had no other choice,
but it's still somewhat unclean in that regard.

(I do disagree with Christoph on another point: I do think we eventually
want to change the standard semaphore type in a similar fashion upstream
as well - but that probably has to come with a s/struct semaphore/struct
mutex/ change as well.)

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RT and XFS
  2005-07-19 12:34                       ` Ingo Molnar
@ 2005-07-19 13:27                         ` Christoph Hellwig
  2005-07-19 13:50                           ` Ingo Molnar
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2005-07-19 13:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Bill Huey, Esben Nielsen, Christoph Hellwig, Daniel Walker,
	Dave Chinner, greg, Nathan Scott, Steve Lord, linux-kernel,
	linux-xfs

On Tue, Jul 19, 2005 at 02:34:57PM +0200, Ingo Molnar wrote:
> (I do disagree with Christoph on another point: i do think we eventually 
> want to change the standard semaphore type in a similar fashion upstream 
> as well - but that probably has to come with a s/struct semaphore/struct 
> mutex/ change as well.)

Actually having a mutex_t in mainline would be a good idea even without
preempt rt, to document better what kind of locking we expect.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RT and XFS
  2005-07-19 13:27                         ` Christoph Hellwig
@ 2005-07-19 13:50                           ` Ingo Molnar
  0 siblings, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2005-07-19 13:50 UTC (permalink / raw)
  To: Christoph Hellwig, Bill Huey, Esben Nielsen, Daniel Walker,
	Dave Chinner, greg, Nathan Scott, Steve Lord, linux-kernel,
	linux-xfs

* Christoph Hellwig <hch@infradead.org> wrote:

> On Tue, Jul 19, 2005 at 02:34:57PM +0200, Ingo Molnar wrote:
> > (I do disagree with Christoph on another point: i do think we eventually 
> > want to change the standard semaphore type in a similar fashion upstream 
> > as well - but that probably has to come with a s/struct semaphore/struct 
> > mutex/ change as well.)
> 
> Actually having a mutex_t in mainline would be a good idea even 
> without preempt rt, to document better what kind of locking we expect.

Cool! I'll cook up a patch for that. Right now these are the numbers:
there are 526 uses of struct semaphore in 2.6.12. In the -RT tree I had
to change 23 of them to be compat_semaphore - i.e. 23 uses were
definitely non-mutex.

(We have surely missed some cases - but it would be fair to say that the
expected number of cases is less than 50, and that we've mapped the most
common ones already. That makes it a 90%/10% split: more than 90% of
all struct semaphore use is pure mutex.)

Of the remaining <10% of cases, the majority are of the completion type,
and there are a handful (<10) of 'counted semaphore' uses:
semaphores with a count larger than 1. (E.g. ACPI uses it to count
resources, some audio code too - but it's very rare.) Btw., that's the
only 'true' (in terms of CS) semaphore use.

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RT and XFS
  2005-07-14 15:56               ` Daniel Walker
  2005-07-14 16:08                 ` Christoph Hellwig
@ 2005-07-14 16:08                 ` Christoph Hellwig
  1 sibling, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2005-07-14 16:08 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Ingo Molnar, Dave Chinner, greg, Nathan Scott, Steve Lord,
	linux-kernel, linux-xfs

On Thu, Jul 14, 2005 at 08:56:58AM -0700, Daniel Walker wrote:
> On Thu, 2005-07-14 at 07:23 +0200, Ingo Molnar wrote:
> > * Daniel Walker <dwalker@mvista.com> wrote:
> > 
> > > > The whole point of using a semaphore in the pagebuf is because there
> > > > is no tracking of who "owns" the lock so we can actually release it
> > > > in a different context. Semaphores were invented for this purpose,
> > > > and we use them in the way they were intended. ;)
> > > 
> > > Where is the that semaphore spec, is that posix ?  There is a new 
> > > construct called "complete" that is good for this type of stuff too. 
> > > No owner needed , just something running, and something waiting till 
> > > it completes.
> > 
> > wrt. posix, we dont really care about that for kernel-internal 
> > primitives like struct semaphore. So whether it's posix or not has no 
> > relevance.
> 
> This reminds me of Documentation/stable_api_nonsense.txt . That no one
> should really be dependent on a particular kernel API doing a particular
> thing. The kernel is play dough for the kernel hacker (as it should be),
> including kernel semaphores.
> 
> So we can change whatever we want, and make no excuses, as long as we
> fix the rest of the kernel to work with our change. That seems pretty
> sensible , because Linux should be an evolution. 
> 
> Daniel
> 
---end quoted text---

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RT and XFS
  2005-07-14  4:10           ` Daniel Walker
       [not found]             ` <20050714052347.GA18813@elte.hu>
@ 2005-07-15 10:23             ` Ingo Molnar
  2005-07-15 16:16               ` Daniel Walker
  1 sibling, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2005-07-15 10:23 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Dave Chinner, Nathan Scott, Steve Lord, linux-kernel, linux-xfs,
	Christoph Hellwig


* Daniel Walker <dwalker@mvista.com> wrote:

> PI is always good, because it allows the tracking of what is high
> priority, and what is not.

that's just plain wrong. PI might be good if one cares about priorities 
and worst-case latencies, but most of the time the kernel is plain good 
enough and we dont care. PI can also be pretty expensive. So in no way, 
shape or form can PI be "always good".

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RT and XFS
  2005-07-15 10:23             ` Ingo Molnar
@ 2005-07-15 16:16               ` Daniel Walker
  2005-07-18 11:33                 ` Esben Nielsen
  2005-07-19  3:31                 ` Bill Huey
  0 siblings, 2 replies; 22+ messages in thread
From: Daniel Walker @ 2005-07-15 16:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dave Chinner, Nathan Scott, Steve Lord, linux-kernel, linux-xfs,
	Christoph Hellwig

On Fri, 2005-07-15 at 12:23 +0200, Ingo Molnar wrote:
> * Daniel Walker <dwalker@mvista.com> wrote:
> 
> > > PI is always good, because it allows the tracking of what is high
> > > priority, and what is not.
> 
> that's just plain wrong. PI might be good if one cares about priorities 
> and worst-case latencies, but most of the time the kernel is plain good 
> enough and we dont care. PI can also be pretty expensive. So in no way, 
> shape or form can PI be "always good".

I don't agree with that. But of course I'm always speaking from a
real-time perspective. PI is expensive, but it won't always be. However,
no one is forcing PI on anyone, even if I think it's good.

Daniel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RT and XFS
  2005-07-15 16:16               ` Daniel Walker
@ 2005-07-18 11:33                 ` Esben Nielsen
  2005-07-19  3:31                 ` Bill Huey
  1 sibling, 0 replies; 22+ messages in thread
From: Esben Nielsen @ 2005-07-18 11:33 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Ingo Molnar, Dave Chinner, Nathan Scott, Steve Lord, linux-kernel,
	linux-xfs, Christoph Hellwig

On Fri, 15 Jul 2005, Daniel Walker wrote:

> On Fri, 2005-07-15 at 12:23 +0200, Ingo Molnar wrote:
> > * Daniel Walker <dwalker@mvista.com> wrote:
> > 
> > > > PI is always good, because it allows the tracking of what is high
> > > > priority, and what is not.
> > 
> > that's just plain wrong. PI might be good if one cares about priorities 
> > and worst-case latencies, but most of the time the kernel is plain good 
> > enough and we dont care. PI can also be pretty expensive. So in no way, 
> > shape or form can PI be "always good".
> 
> I don't agree with that. But of course I'm always speaking from a
> real-time perspective. PI is expensive, but it won't always be. However,
> no one is forcing PI on anyone, even if I think it's good.
> 

Is PI needed? If you use a mutex to protect a critical section, you
destroy the strict meaning of priorities if the mutex doesn't have PI:
priority inversion can effectively make the high-priority task low
priority in that situation and postpone its execution indefinitely.
For RT applications that is clearly unacceptable.

One can argue that for non-RT tasks priorities aren't supposed to be as
rigid as for RT tasks anyway, so it doesn't matter so much.
But as I read the comments in sched.c, a nice -20 task has to preempt any
nice 0 task no matter how much of a CPU hog it is. If it happens to share a
critical section with a nice +19 task, priority inversion will
occasionally destroy that property. If we disregard the costs of PI, PI is
thus a good thing.

But how expensive is PI? Of course there is an overhead in doing
the calculations. Ingo's implementation can be optimized quite a bit once
things are settled, but it will always be many times more expensive than a
raw spin-lock. But is it much more expensive than a plain binary
semaphore?

If there is no contention on a mutex, the PI code will not be called at all.
On UP, the only occasion where contention can occur is when a low-priority
task is preempted by a higher-priority task while it holds the
mutex. So let us look at the expensive part, where the high-priority task
tries to grab the mutex:

With PI: The owner has to be boosted, an immediate task switch has to
take place, the owner runs to the unlock operation and is set back down in
priority, whereafter there is a task switch again to the high-priority
task.

Without PI: The owner waits and there is a task switch to some thread
which might not be the owner but often is. When the owner eventually
unlocks the mutex, it will be followed by a task switch, because contention
can only occur when the task trying to get the mutex preempts the owner
and thus has higher priority than the owner.

The number of task switches is thus the same with and without PI!

And then there is the cache issue: When other tasks get scheduled in the
priority inversion case, the data being protected can be flushed from the
cache while they are running. With PI the CPU continues to work with the
same data - and most often in the same code module. I.e. there is a higher
chance that the instruction and data caches contain the right data.

Thus in the end it all depends on how cheaply the PI calculations can be
made.

Esben

> Daniel
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RT and XFS
  2005-07-15 16:16               ` Daniel Walker
  2005-07-18 11:33                 ` Esben Nielsen
@ 2005-07-19  3:31                 ` Bill Huey
  1 sibling, 0 replies; 22+ messages in thread
From: Bill Huey @ 2005-07-19  3:31 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Ingo Molnar, Dave Chinner, Nathan Scott, Steve Lord, linux-kernel,
	linux-xfs, Christoph Hellwig

On Fri, Jul 15, 2005 at 09:16:55AM -0700, Daniel Walker wrote:
> I don't agree with that. But of course I'm always speaking from a
> real-time perspective. PI is expensive, but it won't always be. However,
> no one is forcing PI on anyone, even if I think it's good.

It depends on the kind of PI and on the specific circumstances. In the general
kernel, it's really to be avoided at all costs since it's masking a general
contention problem at those places. In a formally provable worst-case system
using priority ceiling emulation and stuff, PI is really valuable. How a system
like the Linux kernel fits into that is a totally different story. General
purpose kernels using general purpose facilities don't.

That's how I see it.

bill

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2005-07-19 13:51 UTC | newest]

Thread overview: 22+ messages
2005-07-12 23:01 RT and XFS Daniel Walker
2005-07-12 23:39 ` William Weston
2005-07-13  0:25 ` Nathan Scott
2005-07-13  0:41   ` Daniel Walker
2005-07-13  0:37     ` Nathan Scott
2005-07-13  6:47   ` Ingo Molnar
2005-07-13 16:45     ` Daniel Walker
2005-07-14  0:22       ` Nathan Scott
2005-07-14  3:50         ` Dave Chinner
2005-07-14  4:10           ` Daniel Walker
     [not found]             ` <20050714052347.GA18813@elte.hu>
2005-07-14 15:56               ` Daniel Walker
2005-07-14 16:08                 ` Christoph Hellwig
2005-07-18 12:10                   ` Esben Nielsen
2005-07-19  3:26                     ` Bill Huey
2005-07-19 12:34                       ` Ingo Molnar
2005-07-19 13:27                         ` Christoph Hellwig
2005-07-19 13:50                           ` Ingo Molnar
2005-07-14 16:08                 ` Christoph Hellwig
2005-07-15 10:23             ` Ingo Molnar
2005-07-15 16:16               ` Daniel Walker
2005-07-18 11:33                 ` Esben Nielsen
2005-07-19  3:31                 ` Bill Huey
