From: Nick Piggin <npiggin@suse.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Hugh Dickins <hugh@veritas.com>,
Jared Hulbert <jaredeh@gmail.com>,
Carsten Otte <cotte@de.ibm.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
linux-arch@vger.kernel.org
Subject: Re: [rfc][patch 2/2] mm: introduce optional pte_special pte bit
Date: Wed, 16 Jan 2008 04:38:00 +0100 [thread overview]
Message-ID: <20080116033800.GA8890@wotan.suse.de> (raw)
In-Reply-To: <alpine.LFD.1.00.0801130846000.2806@woody.linux-foundation.org>
On Sun, Jan 13, 2008 at 08:50:23AM -0800, Linus Torvalds wrote:
>
>
> On Sun, 13 Jan 2008, Nick Piggin wrote:
> >
> > OK, you have to read to the 3rd paragraph.
>
> That one doesn't actually say what the point is either. It just claims
> that there *is* a point that isn't actually explained or mentioned
> further.
OK, right after reading it again I concede it is unclear and doesn't
actually make the point.
My biggest concen is that s390 basically are thinking about doing
something like this:
if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
if (vma->vm_flags & VM_MIXEDMAP) {
#ifdef s390
if (pte_special(pte))
return NULL;
#else
if (!pfn_valid(pfn))
return NULL;
#endif
goto out;
} else {
unsigned long off = (addr-vma->vm_start) >> PAGE_SHIFT;
if (pfn == vma->vm_pgoff + off)
return NULL;
if (!is_cow_mapping(vma->vm_flags))
return NULL;
}
}
Of course it is nicely wrapped in arch functions, but that is the
basic logic. Now what happens is that already gives us 2 paths through
that function which needs to be understood by developers, but it also
IMO is more confusing because it mixes the 2 schemes (vma based vs pte
based).
Basically what I want to do is make the pte_special part usable by
all architectures, and bring it out of the vma tests. I think if anything
it actually makes the code clearer to a novice reader because they can
quite easily see the intention of the more complex vma scheme, by
understanding what the pte scheme does (althogh the comments are pretty
good at doing that anyway). My point is just that I don't think it is
too horrible. It isn't like the poor developer who has to understand
the vma scheme will have much problem with the pte scheme...
> > Well the immediate improvement from this actual patch is just that
> > it gives better and smaller code for vm_normal_page (even if you
> > discount the debug checks in the existing code).
>
> It does no such thing, except for the slow-path that doesn't matter.
Well it does do such a thing.
Here is the old one (with the sanity check ifdefed away). 88 bytes or so,
somewhat sparse in icache for the fastpath, which also has a short forward
jump:
vm_normal_page:
movabsq $70368744177663, %rax #, tmp66
andq %rax, %rdx # tmp66, pfn
movq 40(%rdi), %rax # <variable>.vm_flags, D.23528
shrq $12, %rdx #, pfn
testl $268436480, %eax #, D.23528
je .L14 #,
testl $268435456, %eax #, D.23528
je .L16 #,
cmpq end_pfn(%rip), %rdx # end_pfn, pfn
jb .L14 #,
jmp .L18 #
.L16:
subq 8(%rdi), %rsi # <variable>.vm_start, addr
shrq $12, %rsi #, addr
addq 136(%rdi), %rsi # <variable>.vm_pgoff, addr
cmpq %rsi, %rdx # addr, pfn
je .L18 #,
andl $40, %eax #, tmp72
cmpl $32, %eax #, tmp72
jne .L18 #,
.L14:
imulq $56, %rdx, %rax #, pfn, D.23534
addq mem_map(%rip), %rax # mem_map, D.23534
ret
.L18:
xorl %eax, %eax # D.23534
ret
The new one looks like this. It has no taken branches in the fastpath so
is dense in icache, is less than half the size, and works out of registers.
vm_normal_page:
xorl %eax, %eax # D.23526
testb $2, %dh #, pte$pte
jne .L16 #,
movabsq $70368744177663, %rax #, tmp66
andq %rax, %rdx # tmp66, pte$pte
shrq $12, %rdx #, pte$pte
imulq $56, %rdx, %rax #, pte$pte, D.23526
addq mem_map(%rip), %rax # mem_map, D.23526
.L16:
ret
I think it is slightly better code even for the fastpath.
> So it may optimize the slow-path (PFNMAP/MIXMAP), but the real path stays
> exactly the same from a code generation standpoint (just checking a
> different bit, afaik).
Well, possibly MIXMAP will become an important path too if you have a lot
of use of these XIP filesystems.
> If those pte_special bits are required for unexplained lockless
> get_user_pages, is this going to get EVEN WORSE when s390 (again) cannot
> do it?
Ah, my raising of the lockless get_user_pages was just to demonstrate
that some other architectures (like x86) would also like to have a
pte_special bit in their pte. So it isn't like we'd be using a pte bit
_simply_ to make this little improvement to vm_normal_page(), just that
the path isn't going to be an s390 only thing because we may get other
architectures with a pte_special bit for other good reasons.
lockless get_user_pages won't touch vm_normal_page, or even the existing
get_user_pages. Architectures that don't support it would just fall
through to regular get_user_pages...
next prev parent reply other threads:[~2008-01-16 3:38 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-01-13 3:08 [rfc] changes to user memory mapping scheme Nick Piggin
2008-01-13 3:09 ` [rfc][patch 1/2] mm: introduce VM_MIXEDMAP Nick Piggin
2008-01-13 3:10 ` [rfc][patch 2/2] mm: introduce optional pte_special pte bit Nick Piggin
2008-01-13 3:41 ` Linus Torvalds
2008-01-13 4:39 ` Nick Piggin
2008-01-13 4:45 ` Linus Torvalds
2008-01-13 5:06 ` Nick Piggin
2008-01-13 16:50 ` Linus Torvalds
2008-01-13 20:46 ` Martin Schwidefsky
2008-01-14 21:04 ` Jared Hulbert
2008-01-15 9:18 ` Carsten Otte
2008-01-16 3:38 ` Nick Piggin [this message]
2008-01-16 4:04 ` Linus Torvalds
2008-01-16 4:37 ` Nick Piggin
2008-01-16 4:48 ` Linus Torvalds
2008-01-16 4:51 ` David Miller
2008-01-16 5:23 ` Linus Torvalds
2008-01-16 5:48 ` Nick Piggin
2008-01-16 9:52 ` Martin Schwidefsky
2008-01-16 5:17 ` Nick Piggin
2008-01-16 10:52 ` Catalin Marinas
2008-01-16 18:18 ` Russell King
2008-01-16 17:21 ` Linus Torvalds
2008-01-16 17:14 ` David Howells
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080116033800.GA8890@wotan.suse.de \
--to=npiggin@suse.de \
--cc=cotte@de.ibm.com \
--cc=heiko.carstens@de.ibm.com \
--cc=hugh@veritas.com \
--cc=jaredeh@gmail.com \
--cc=linux-arch@vger.kernel.org \
--cc=schwidefsky@de.ibm.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).