From: "Török Edwin" <edwin@clamav.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	aCaB <acab@clamav.net>, David Howells <dhowells@redhat.com>,
	Nick Piggin <npiggin@suse.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: Mutex vs semaphores scheduler bug
Date: Mon, 12 Oct 2009 18:37:17 +0300
Message-ID: <4AD34D2D.7050808@clamav.net>
In-Reply-To: <1255359207.10420.31.camel@twins>

On 2009-10-12 17:53, Peter Zijlstra wrote:
> On Sat, 2009-10-10 at 17:57 +0300, Török Edwin wrote:
>> If a semaphore (such as mmap_sem) is heavily congested, then using a
>> userspace mutex makes the program faster.
>>
>> For example, using a mutex around *anonymous* mmaps speeds it up
>> significantly (~80% on this microbenchmark, ~15% on real applications).
>> Such workarounds shouldn't be necessary for userspace applications; the
>> kernel should use the most efficient lock implementation by default.
> 
> Should, yes, does, no.
> 
>> However, when using a mutex, the number of context switches is SMALLER
>> by 40-60%.
> 
> That matches the problem, see below.
> 
>> I think it's a bug in the scheduler; it schedules the mutex case much
>> better.
> 
> It's not, the scheduler doesn't know about mutexes/futexes/rwsems.
> 
>> Maybe because userspace also spins a bit before actually calling
>> futex().
> 
> Nope, if we would ever spin, it would be in the kernel after calling
> FUTEX_LOCK (which currently doesn't exist). glibc shouldn't do any
> spinning on its own (if it does, I have yet another reason to try and
> supplant the glibc futex code).

I think glibc doesn't spin by default; I was misled by the huge number of
cases in pthread_mutex_lock.c. The default path does this:

__lll_lock_wait:
	cfi_startproc
	pushq	%r10
	cfi_adjust_cfa_offset(8)
	pushq	%rdx
	cfi_adjust_cfa_offset(8)
	cfi_offset(%r10, -16)
	cfi_offset(%rdx, -24)
	xorq	%r10, %r10	/* No timeout.  */
	movl	$2, %edx
	LOAD_FUTEX_WAIT (%esi)

	cmpl	%edx, %eax	/* NB:	 %edx == 2 */
	jne	2f

1:	movl	$SYS_futex, %eax
	syscall

2:	movl	%edx, %eax
	xchgl	%eax, (%rdi)	/* NB:	 lock is implied */

	testl	%eax, %eax
	jnz	1b

	popq	%rdx
	cfi_adjust_cfa_offset(-8)
	cfi_restore(%rdx)
	popq	%r10
	cfi_adjust_cfa_offset(-8)
	cfi_restore(%r10)
	retq
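As a reading aid for the assembly above, here is a minimal C sketch of the same slow path, together with the uncontended fast path and the unlock it pairs with. It assumes the classic three-state futex lock word (0 = unlocked, 1 = locked, 2 = locked with possible waiters) and process-private futexes; the sketch_* names are hypothetical and this is not glibc's actual source. The `seen` argument stands in for %eax, which appears to hold the lock value returned by the caller's failed cmpxchg.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical names. Assumed lock word states:
 *   0 = unlocked, 1 = locked with no waiters, 2 = locked, waiters possible. */

static long futex_wait(int *addr, int expected)
{
    /* The kernel sleeps only if *addr still equals expected; no timeout. */
    return syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected, NULL, NULL, 0);
}

static long futex_wake(int *addr, int nwake)
{
    return syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, nwake, NULL, NULL, 0);
}

/* Slow path, mirroring __lll_lock_wait; `seen` plays the role of %eax. */
static void sketch_lock_wait(int *lock, int seen)
{
    if (seen == 2)                 /* already contended: sleep first (label 1) */
        futex_wait(lock, 2);
    /* Label 2: mark contended via xchg; an old value of 0 means we own it. */
    while (__atomic_exchange_n(lock, 2, __ATOMIC_ACQUIRE) != 0)
        futex_wait(lock, 2);       /* back to label 1: sleep, then retry */
}

/* Fast path: 0 -> 1 entirely in userspace, no syscall when uncontended. */
static void sketch_lock(int *lock)
{
    int expected = 0;
    if (!__atomic_compare_exchange_n(lock, &expected, 1, 0,
                                     __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        sketch_lock_wait(lock, expected);  /* expected now holds the value seen */
}

/* Unlock: free the lock word, and wake one sleeper if it was contended. */
static void sketch_unlock(int *lock)
{
    if (__atomic_exchange_n(lock, 0, __ATOMIC_RELEASE) == 2)
        futex_wake(lock, 1);
}
```

Note that the unlock path stores 0 and wakes one waiter without reserving the lock for it, so a thread that is already running can take the lock through the fast path before the woken waiter is scheduled. That is the kind of lock stealing rwsem does not allow.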


> 
>> I think its important to optimize the mmap_sem semaphore
> 
> It is.
> 
> The problem appears to be that rwsem doesn't allow lock-stealing

OK, sorry for mistaking the lack of lock-stealing for a scheduler bug.

>, and
> very strictly maintains FIFO order on contention. This results in extra
> schedules and reduced performance as you noticed.
> 
> What happens is that when we release a contended rwsem we assign it to
> the next waiter. If, before that waiter gets run, another (running) task
> comes along and tries to acquire the lock, that task gets put to sleep,
> even though it could possibly acquire the lock (and the woken waiter would
> detect failure and go back to sleep).
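To make the handoff-versus-stealing difference concrete, here is a deliberately simplified, single-threaded toy model. The struct and function names are hypothetical and do not correspond to lib/rwsem.c or its data structures; it only models the ordering described above: a release, then a running task trying to acquire, then the woken waiter finally running.

```c
#include <stdbool.h>

/* Toy model (hypothetical; not lib/rwsem.c): one lock, at most one waiter. */
struct toy_lock {
    int owner;   /* task id holding the lock, or -1 if free */
    int waiter;  /* task id of the sleeping waiter, or -1 if none */
};

/* Strict FIFO handoff: release assigns ownership to the queued waiter,
   which has been woken but is not running yet. */
static void release_handoff(struct toy_lock *l)
{
    l->owner = l->waiter;
    l->waiter = -1;
}

/* Stealing-friendly release: just free the lock and wake the waiter;
   ownership is not reserved for it, so it stays queued and must retry. */
static void release_steal(struct toy_lock *l)
{
    l->owner = -1;
}

/* A running task tries to take the lock. Returns false if it would have
   to go to sleep, i.e. incur an extra schedule. */
static bool try_acquire(struct toy_lock *l, int task)
{
    if (l->owner != -1)
        return false;
    l->owner = task;
    return true;
}

/* The woken waiter finally runs. Under handoff it already owns the lock;
   under stealing it re-checks and goes back to sleep if it lost the race. */
static bool waiter_runs(struct toy_lock *l, int waiter)
{
    if (l->owner == waiter)
        return true;
    return try_acquire(l, waiter);
}
```

With handoff, task 2 (running) is forced to sleep while task 1 (not yet running) holds the lock; with stealing, task 2 keeps running and task 1 simply goes back to sleep, which is the trade-off described above.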

The reason I initially thought it was a scheduler bug is that it seemed to
have something to do with wakeups: threads were sleeping too long waiting
for the lock.
But I think the scheduler can't give preference to tasks that would be
able to acquire a semaphore they were sleeping on, because that would throw
fair scheduling off balance, right?

> 
> So what I think we need to do is have a look at all this lib/rwsem.c
> slowpath code and hack in lock stealing.
> 
> 
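Until rwsem gains lock stealing, the userspace workaround from the original report (a mutex around anonymous mmap() calls) remains an option. A minimal sketch, assuming POSIX threads on Linux; the lock and wrapper names are hypothetical, and only mmap() is serialized here because the report does not say whether munmap() was wrapped as well.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical process-wide lock serializing anonymous mappings. */
static pthread_mutex_t anon_mmap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Create a private, read/write anonymous mapping while holding the userspace
   mutex, so contending threads wait on the futex-based mutex and only one at
   a time enters mmap(). Returns MAP_FAILED on error, like mmap(). */
static void *serialized_anon_mmap(size_t len)
{
    pthread_mutex_lock(&anon_mmap_lock);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    pthread_mutex_unlock(&anon_mmap_lock);
    return p;
}
```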

Best regards,
--Edwin

Thread overview: 6+ messages
2009-10-10 14:57 Mutex vs semaphores scheduler bug Török Edwin
2009-10-12 14:53 ` Peter Zijlstra
2009-10-12 15:37   ` Török Edwin [this message]
2009-10-15 23:44   ` David Howells
2009-10-17 15:32     ` Peter Zijlstra
2009-10-20 19:02       ` Török Edwin
