public inbox for linux-kernel@vger.kernel.org
From: Eric Dumazet <dada1@cosmosbay.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Ulrich Drepper <drepper@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Jones <davej@redhat.com>, Ingo Molnar <mingo@elte.hu>,
	Andi Kleen <ak@suse.de>,
	Ravikiran G Thirumalai <kiran@scalex86.org>,
	"Shai Fultheim (Shai@scalex86.org)" <shai@scalex86.org>,
	pravin b shelar <pravin.shelar@calsoftinc.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] FUTEX : new PRIVATE futexes
Date: Fri, 06 Apr 2007 07:53:08 +0200
Message-ID: <4615E044.6080205@cosmosbay.com>
In-Reply-To: <4615A009.808@yahoo.com.au>

Nick Piggin wrote:
> Hi Eric,
> 
> Thanks for doing this... It's looking good, I just have some minor
> comments:

Hi Nick, thanks for reviewing.

> 
> Eric Dumazet wrote:
>>   */
>> -int get_futex_key(void __user *uaddr, union futex_key *key)
>> +int get_futex_key(void __user *uaddr, union futex_key *key,
>> +    struct rw_semaphore *shared)
> 
> Can we pass in something other than the rw_semaphore here? Seeing as
> it only actually gets used as a flag, it might be nicer just to pass
> a 0 or 1? And all through the call stack...
> 
> Did the whole thing just turn out neater when you passed the rwsem?
> We always know to use current->mm->mmap_sem, so it doesn't seem like
> a boolean flag would hurt?

That's a good question.

Computing &current->mm->mmap_sem once is a win in itself, because accessing
current is not cheap.
It also starts walking part of the pointer chain early, well before the value
is used. That is a free prefetch() equivalent: if current->mm is not in the
CPU cache, the CPU won't stall, because the next instructions don't depend
on it.

This means fewer CPU stalls when current->mm is not in the CPU cache. That's
difficult to benchmark, but you can trust me.

A flag means:

if (flag)
     up_read(&current->mm->mmap_sem);

This generates rather poor code, whereas

if (ptr)
    up_read(ptr);

generates *much* better code.

So this is a cleanup and a runtime optimization.
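To make the two calling conventions concrete, here is a user-space mock of both snippets. struct rw_semaphore, up_read(), struct mm_struct and current below are simplified stand-ins, not the kernel's definitions, and release_key_flag()/release_key_ptr() are hypothetical names. The mock only shows the shape of the code; the code-generation difference itself has to be checked by reading the compiler output.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for kernel types (assumptions, not the real ones). */
struct rw_semaphore { int readers; };
struct mm_struct { struct rw_semaphore mmap_sem; };
struct task_struct { struct mm_struct *mm; };

static struct mm_struct fake_mm;
static struct task_struct fake_task = { &fake_mm };

/* Hypothetical stand-in for the kernel's per-CPU 'current' lookup. */
static struct task_struct *get_current(void) { return &fake_task; }
#define current get_current()

static void up_read(struct rw_semaphore *sem)
{
    assert(sem->readers > 0); /* releasing an unheld lock would be a bug */
    sem->readers--;
}

/* Flag style: the callee re-derives &current->mm->mmap_sem at the point of use. */
static void release_key_flag(int shared)
{
    if (shared)
        up_read(&current->mm->mmap_sem);
}

/* Pointer style: the caller computed the pointer once; NULL means private. */
static void release_key_ptr(struct rw_semaphore *shared)
{
    if (shared)
        up_read(shared);
}
```

A shared-futex caller would compute &current->mm->mmap_sem once and pass it down the call stack; a private-futex caller passes NULL.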

I did a similar optimization in commit 163da958ba5282cbf85e8b3dc08e4f51f8b01c5e

I invite you to check it:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=163da958ba5282cbf85e8b3dc08e4f51f8b01c5e



> 
>>  {
>>      unsigned long address = (unsigned long)uaddr;
>>      struct mm_struct *mm = current->mm;
>> @@ -218,6 +224,22 @@ int get_futex_key(void __user *uaddr, un
>>      address -= key->both.offset;
>>  
>>      /*
>> +     * PROCESS_PRIVATE futexes are fast.
>> +     * As the mm cannot disappear under us and the 'key' only needs
>> +     * virtual address, we dont even have to find the underlying vma.
>> +     * Note : We do have to check 'address' is a valid user address,
>> +     *        but access_ok() should be faster than find_vma()
>> +     * Note : At this point, address points to the start of page,
>> +     *        not the real futex address, this is ok.
>> +     */
>> +    if (!shared) {
>> +        if (!access_ok(VERIFY_WRITE, address, sizeof(int)))
>> +            return -EFAULT;
> 
> Shouldn't that be sizeof(long) to handle 64 bit futexes? Or strictly, it
> should depend on the size of the operation. Maybe the access_ok check
> should go outside get_futex_key?

If you check again, you'll see that address points to the start of the PAGE,
not the real u32/u64 futex address, so this checks the PAGE. We can use char,
short, int, long, or even char[PAGE_SIZE], as long as we know a futex cannot
span two pages.
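A small sketch of the page arithmetic behind this: rounding down to the page start mirrors "address -= key->both.offset", and a naturally aligned u32 or u64 always fits inside one page because the page size is a multiple of its size. The 4096-byte page size and the helper names are assumptions for illustration only.

```c
#include <assert.h>

/* Assumed page size for illustration; the kernel uses PAGE_SIZE. */
#define DEMO_PAGE_SIZE 4096UL

/* Round down to the start of the page containing 'address'. */
static unsigned long page_start(unsigned long address)
{
    return address & ~(DEMO_PAGE_SIZE - 1);
}

/* Nonzero if an object of 'size' bytes at 'address' stays within one page. */
static int fits_in_one_page(unsigned long address, unsigned long size)
{
    assert(size > 0 && size <= DEMO_PAGE_SIZE);
    return (address & (DEMO_PAGE_SIZE - 1)) + size <= DEMO_PAGE_SIZE;
}
```

An aligned 8-byte futex at the last slot of a page (offset 0xff8) still fits, while a misaligned 4-byte value at offset 0xffe would cross into the next page.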


>>       */
>>      key->shared.inode = vma->vm_file->f_path.dentry->d_inode;
>> -    key->both.offset++; /* Bit 0 of offset indicates inode-based key. */
>> +    key->both.offset += FUT_OFF_INODE; /* inode-based key. */
>>      if (likely(!(vma->vm_flags & VM_NONLINEAR))) {
>>          key->shared.pgoff = (((address - vma->vm_start) >> PAGE_SHIFT)
>>                       + vma->vm_pgoff);
> 
> I like |= for adding flags, it seems less ambiguous. But I guess that's
> a matter of opinion. Hugh seems to like +=, and I can't argue with him
> about style issues ;)


The previous code did offset++, which means offset += 1.
I didn't want to hurt Hugh :)
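Why += and |= are interchangeable here: assuming the futex address is at least 4-byte aligned, bit 0 of the page offset is always clear, so adding the flag and OR-ing it in give the same value. DEMO_FUT_OFF_INODE = 1 is assumed from the old "offset++ / bit 0" comment; the helper names are hypothetical.

```c
#include <assert.h>

/* Assumed value: bit 0 marks an inode-based key, matching the old offset++. */
#define DEMO_FUT_OFF_INODE 1UL

/* '+=' style, as in the patch; only correct while bit 0 is still clear. */
static unsigned long mark_inode_add(unsigned long offset)
{
    assert((offset & DEMO_FUT_OFF_INODE) == 0);
    return offset + DEMO_FUT_OFF_INODE;
}

/* '|=' style, as suggested; also safe if the bit were already set. */
static unsigned long mark_inode_or(unsigned long offset)
{
    return offset | DEMO_FUT_OFF_INODE;
}
```

The difference only shows up if the bit is already set: += would carry into bit 1, while |= is idempotent.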

>>  EXPORT_SYMBOL_GPL(drop_futex_key_refs);
> 
> I wonder if it would be worthwhile inlining and likely()ing the
> private fastpath? Might make it pretty compact... I guess that's
> something to worry about after glibc gets support.

Yes, in a future patch, in about one year :)
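For illustration only, an inlined private fast path could look roughly like this: a static inline wrapper handles the private case under likely() and calls an out-of-line function for shared futexes. struct demo_key, demo_get_futex_key() and demo_get_shared_key() are hypothetical simplifications, not the real union futex_key or get_futex_key(). likely() is defined the same way as the kernel's, using the GCC/Clang __builtin_expect builtin.

```c
#include <assert.h>

/* Same shape as the kernel's likely(); GCC/Clang builtin. */
#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical, simplified key; the real union futex_key has more fields. */
struct demo_key {
    unsigned long address;
    int inode_based;
};

static int slow_calls; /* counts how often the out-of-line path runs */

/* Hypothetical stand-in for the shared path (vma lookup, inode key, ...). */
static __attribute__((noinline)) int
demo_get_shared_key(unsigned long address, struct demo_key *key)
{
    slow_calls++;
    key->address = address;
    key->inode_based = 1;
    return 0;
}

/* Inline wrapper: the private case is expected to be the common one. */
static inline int demo_get_futex_key(unsigned long address, int shared,
                                     struct demo_key *key)
{
    if (likely(!shared)) {
        key->address = address;
        key->inode_based = 0;
        return 0;
    }
    return demo_get_shared_key(address, key);
}
```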

>> +
>> +    if (!(vma = find_vma(mm, address)) ||
>> +        vma->vm_start > address || !(vma->vm_flags & VM_WRITE))
>> +        ret = -EFAULT;
>> +
>> +    else
>> +        switch (handle_mm_fault(mm, vma, address, 1)) {
>> +        case VM_FAULT_MINOR:
>> +            current->min_flt++;
>> +            break;
>> +        case VM_FAULT_MAJOR:
>> +            current->maj_flt++;
>> +            break;
>> +        default:
>> +            ret = -EFAULT;
>> +        }
>> +    if (!shared)
>> +        up_read(&mm->mmap_sem);
>> +    return ret;
>>  }
>>  
>>  /*
> 
> You've got an extra space after the if (maybe for clarity?). In this
> situation I prefer putting braces around both the if and the else, and
> if you get rid of that blank line, it doesn't cost you anything more ;)

Oh well...
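For comparison, this is the quoted fault-handling hunk rewritten with braces on both the if and the else, as suggested. To keep it self-contained, the vma and task types are simplified hypothetical stand-ins, the result of handle_mm_fault() is passed in as a parameter instead of being computed, and the mmap_sem handling is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for VM_FAULT_MINOR / VM_FAULT_MAJOR / errors. */
enum demo_fault { DEMO_FAULT_MINOR, DEMO_FAULT_MAJOR, DEMO_FAULT_ERROR };

struct demo_vma { unsigned long vm_start; int writable; };
struct demo_task { unsigned long min_flt, maj_flt; };

/* Same control flow as the quoted hunk, with braces on both branches. */
static int demo_handle_fault(struct demo_task *tsk, const struct demo_vma *vma,
                             unsigned long address, enum demo_fault result)
{
    int ret = 0;

    if (!vma || vma->vm_start > address || !vma->writable) {
        ret = -EFAULT;
    } else {
        switch (result) {
        case DEMO_FAULT_MINOR:
            tsk->min_flt++;
            break;
        case DEMO_FAULT_MAJOR:
            tsk->maj_flt++;
            break;
        default:
            ret = -EFAULT;
        }
    }
    return ret;
}
```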

> 
>> @@ -1598,6 +1656,8 @@ static int futex_wait(unsigned long __us
>>          restart->arg1 = val;
>>          restart->arg2 = (unsigned long)abs_time;
>>          restart->arg3 = (unsigned long)futex64;
>> +        if (shared)
>> +            restart->arg3 |= 2;
> 
> Could you make this into a proper flags argument and use #define 
> CONSTANTs for it?

Yes, but I'm not sure it will improve readability.
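For reference, a flags version of restart->arg3 might look like the sketch below. The constant and helper names are hypothetical. The bit values are chosen to match what the quoted hunk already stores: futex64 (0 or 1) in bit 0, and 2 OR-ed in for shared futexes.

```c
#include <assert.h>

/* Hypothetical names; values match the hunk (futex64 in bit 0, shared = 2). */
#define DEMO_RESTART_FUTEX64 0x01UL
#define DEMO_RESTART_SHARED  0x02UL

static unsigned long demo_pack_restart_flags(int futex64, int shared)
{
    unsigned long flags = 0;

    if (futex64)
        flags |= DEMO_RESTART_FUTEX64;
    if (shared)
        flags |= DEMO_RESTART_SHARED;
    return flags;
}

static int demo_restart_is_futex64(unsigned long flags)
{
    return (flags & DEMO_RESTART_FUTEX64) != 0;
}

static int demo_restart_is_shared(unsigned long flags)
{
    return (flags & DEMO_RESTART_SHARED) != 0;
}
```

The restart path would then test named bits instead of comparing against a bare 2.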

> 
>> @@ -2377,23 +2455,24 @@ sys_futex64(u64 __user *uaddr, int op, u
>>      struct timespec ts;
>>      ktime_t t, *tp = NULL;
>>      u64 val2 = 0;
>> +    int opm = op & FUTEX_CMD_MASK;
> 
> What's opm stand for?

I guess 'm' stands for 'mask' or 'masked'?
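A sketch of what opm holds: the operation code with the private flag masked off, while the flag itself decides between shared and private keys. The constants carry a DEMO_ prefix and their values are assumptions (FUTEX_WAIT = 0, FUTEX_WAKE = 1 and FUTEX_PRIVATE_FLAG = 128 are the values in mainline <linux/futex.h>); the mask here only clears the private flag.

```c
#include <assert.h>

/* Assumed values, matching mainline <linux/futex.h> for these names. */
#define DEMO_FUTEX_WAIT         0
#define DEMO_FUTEX_WAKE         1
#define DEMO_FUTEX_PRIVATE_FLAG 128
/* Simplified mask: clears only the private flag. */
#define DEMO_FUTEX_CMD_MASK     (~DEMO_FUTEX_PRIVATE_FLAG)

/* "opm": the command with the private flag masked off. */
static int demo_futex_cmd(int op)
{
    return op & DEMO_FUTEX_CMD_MASK;
}

/* A futex is treated as shared unless the caller set the private flag. */
static int demo_futex_is_shared(int op)
{
    return !(op & DEMO_FUTEX_PRIVATE_FLAG);
}
```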

Thank you
