Re: semaphore and mutex in current Linux kernel (3.2.2)

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: "Chen, Dennis (SRDC SW)" <Dennis1.Chen@amd.com>
Cc: Clemens Ladisch <clemens@ladisch.de>,
	Ingo Molnar <mingo@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: semaphore and mutex in current Linux kernel (3.2.2)
Date: Fri, 13 Apr 2012 11:43:50 -0700	[thread overview]
Message-ID: <20120413184350.GD2402@linux.vnet.ibm.com> (raw)
In-Reply-To: <491D6B4EAD0A714894D8AD22F4BDE043A33FD3@SCYBEXDAG02.amd.com>

On Fri, Apr 13, 2012 at 02:15:25PM +0000, Chen, Dennis (SRDC SW) wrote:
> On Thu, Apr 12, 2012 at 11:18 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > On Thu, Apr 12, 2012 at 09:42:36AM +0000, Chen, Dennis (SRDC SW) wrote:
> >> I think mutex_lock is not the same as spin_lock, so it's not reasonable to add the time restriction
> >> for the lock user...though maybe most owners of the lock release it very quickly in real life
> >
> > Yep -- to do otherwise is usually bad for both performance and scalability.
> >
> >> The preemptive has been disabled at the beginning of the mutex lock slow path, so I guess above 2 and 3
> >> are impossible...
> >
> > Almost.  The guy in mutex_spin_on_owner() might be interrupted at just
> > the wrong times, which would have the same effect as being preempted.
> > But in both cases, the probability is really low, so it can still be
> > safely ignored.
> >
> >> I have a test to get the jiffies lasting when someone call set_need_resched() to break the loop:
> >>     ...
> >>     i0 = jiffies;
> >>     rcu_read_lock();
> >>     while(1){
> >>         if (need_resched())
> >>             break;
> >>         cpu_relax();
> >>     }
> >>     rcu_read_unlock();
> >>     i1 = jiffies;
> >>     printk("i0 = %ld, i1 = %ld, HZ=%d\n", i0, i1, HZ);
> >>     ...
> >> The result is, in the case of HZ=1000, i1-i0 will be in [1,...,5] range randomly. So if we exit the while loop
> >> on need_resched(), that means the optimization of mutex comparing semaphore doesn't success: mutex spins about
> >> several jiffies before goto sleep or get the lock finally,right?
> >
> > But this is quite different than mutex_spin_on_owner().  You are doing
> > "while(1)" and mutex_spin_on_owner() is doing "while (owner_running(...))".
> > This means that mutex_spin_on_owner() will typically stop spinning as
> > soon as the owner blocks or releases the mutex.  In contrast, the only
> > way your loop can exit is if need_resched() returns true.
> >
> > Therefore, your results are irrelevant to mutex_spin_on_owner().  If you
> > want to show that mutex_spin_on_owner() has a problem, you should
> > instrument mutex_spin_on_owner() and run your instrumented version in
> > the kernel.
> 
> I know that the test is different from mutex_spin_on_owner(), the reason I used while(1) not the actual code instead is
> that I want to know how many jiffies will be elapsed when break the loop on need_resched in case of owner running long enough.
> If you want to see the test result on mutex_spin_on_owner(), I have the following test case (based on my earlier message
> in this thread, you can check if for some background context...):
> 
> 1. <in a kernel module> in the xxx_read() function, I have the follow code pieces:
>     /* idle loop for 5s between the 1st App get the lock and release it
>      Unsigned long j = jiffies + 5 * HZ;
>     /* identify ourself in the mutex_spin_on_owner function to void tons of printk message
>      * The first App will get the mutex lock in fast path, it will not enter into slow path, so it's easy
>      * to identify the 2nd App by tricky pid
>      */
>     current->pid |= (1 << 31);
>     mutex_lock(&c_mutex);
>     while(time_before(jiffies, j)){
>         cpu_relax();
>     }
>     mutex_unlock(&c_mutex);
>     current->pid &= ~(1 << 31);
> 
> 2. <in the kernel source code> I make some changes in mutex_spin_on_owner as below:
>     int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
> {
>         unsigned long j0, j1; /*add*/
>         if (!sched_feat(OWNER_SPIN))
>                 return 0;
> 
>         rcu_read_lock();
>         j0 = jiffies; /*add*/
>         while (owner_running(lock, owner)) {
>                 if (need_resched())
>                         break;
> 
>                 arch_mutex_cpu_relax();
>         }
>         j1 = jiffies;
>         rcu_read_unlock();
>         if(current->pid & (1 << 31))  /*add*/
>                 printk("j1-j0=%ld\n", j1-j0); /*add*/
>         /*
>          * We break out the loop above on need_resched() and when the
>          * owner changed, which is a sign for heavy contention. Return
>          * success only when lock->owner is NULL.
>          */
>         return lock->owner == NULL;
> }
> 
> 3. run the Apps in 2 different CPUs (0 and 2):
>    #taskset 0x00000001 ./main
>    #taskset 0x00000004 ./main
>    
>   The App running on cpu0 will get the lock directly, and it will in idle loop state for 5s owning the lock,
>   So App on cpu2 will enter into slow path and call mutex_spin_on_owner(), thus we can get the delta jiffies
>   When the loop break on need_resched(). The test result as below(HZ=1000, I run the test about 6 rounds):
>   j1-j0=19    -- NO.1
>   j1-j0=512   -- NO.2
>   j1-j0=316   -- NO.3
>   j1-j0=538   -- NO.4
>   j1-j0=331   -- NO.5
>   j1-j0=996   -- NO.6
> 
> obviously, the mutex will waste lots of CPU resource comparing semaphore in this case.
> I know the base rock of mutex performance optimization is "if the lock owner is running, it is likely to
> release the lock soon", I don't know if there are some statistics data to support this rational, in other words,
> what does make us get this conclusion?
> 
> >> If lucky enough, maybe in 99% case we break the loop by owner_running(lock, owner), meaning the lock owner release
> >> the lock in a very quick speed (can't be measured by jiffies).
> >
> > That is in fact the intent of the design.
> >
> > In theory, possible problems with the current code include:
> >
> > 1.      Unnecessarily blocking lower-priority tasks.
> >
> > 2.      Consuming energy spinning when the CPU might otherwise
> >        go idle and thus into a low-power state.
> >
> > But you need to measure the actual code running a realistic workload
> > to get a fair diagnosis of any problems that might be present.
> 
> Actually, as I mentioned earlier in this thread, I have no intention to care about the code design itself, I am just curious
> about the PERFORMANCE OPTIMIZATION of mutex, not others. Now I am writing some simple benchmark code which used to measure 
> the real performance of the 2 primitives with realistic workload, hopefully I can finish it in this weekend, I am not familiar
> with the user space codes :-)

Actual measurements of the real code showing the benefits of any change
you might propose would indeed be helpful.  ;-)

							Thanx, Paul

next prev parent reply	other threads:[~2012-04-13 18:44 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-01  9:56 semaphore and mutex in current Linux kernel (3.2.2) Chen, Dennis (SRDC SW)
2012-04-01 12:19 ` Ingo Molnar
2012-04-02 15:28   ` Chen, Dennis (SRDC SW)
2012-04-03  7:52     ` Ingo Molnar
2012-04-05  8:37       ` Chen, Dennis (SRDC SW)
2012-04-05 14:15         ` Clemens Ladisch
2012-04-06  9:45           ` Chen, Dennis (SRDC SW)
2012-04-06 10:10             ` Clemens Ladisch
2012-04-06 17:47               ` Chen, Dennis (SRDC SW)
2012-04-09 18:45                 ` Paul E. McKenney
2012-04-11  5:04                   ` Chen, Dennis (SRDC SW)
2012-04-11 17:30                     ` Paul E. McKenney
2012-04-12  9:42                       ` Chen, Dennis (SRDC SW)
2012-04-12 15:18                         ` Paul E. McKenney
2012-04-13 14:15                           ` Chen, Dennis (SRDC SW)
2012-04-13 18:43                             ` Paul E. McKenney [this message]
2012-04-16  8:33                               ` [PATCH 0/2] tools perf: Add a new benchmark tool for semaphore/mutex Chen, Dennis (SRDC SW)
2012-04-16  9:24                                 ` Ingo Molnar
2012-04-16 14:10                                   ` Chen, Dennis (SRDC SW)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120413184350.GD2402@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=Dennis1.Chen@amd.com \
    --cc=clemens@ladisch.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.