* [PATCH] Inconsistent mmap()/mremap() flags
@ 2007-09-28 5:46 Thayne Harbaugh
2007-10-01 11:13 ` Andi Kleen
0 siblings, 1 reply; 9+ messages in thread
From: Thayne Harbaugh @ 2007-09-28 5:46 UTC (permalink / raw)
To: linux-kernel; +Cc: ak, linux-mm, discuss
The x86_64 mmap() accepts the MAP_32BIT flag to request 32-bit clean
addresses. It seems to me that for consistency x86_64 mremap() should
also accept this (or an equivalent) flag.
Here is a trivial and untested patch as a basis for discussion:
--- linux-source-2.6.22/mm/mremap.c.orig 2007-09-27 23:02:13.000000000 -0600
+++ linux-source-2.6.22/mm/mremap.c 2007-09-27 23:07:29.000000000 -0600
@@ -23,6 +23,11 @@
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
 
+/* MAP_32BIT possibly defined in asm/mman.h */
+#ifndef MAP_32BIT
+#define MAP_32BIT 0
+#endif
+
 static pmd_t *get_old_pmd(struct mm_struct *mm, unsigned long addr)
 {
 	pgd_t *pgd;
@@ -255,7 +259,7 @@
 	unsigned long ret = -EINVAL;
 	unsigned long charged = 0;
 
-	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
+	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MAP_32BIT))
 		goto out;
 
 	if (addr & ~PAGE_MASK)
@@ -388,6 +392,9 @@
 			if (vma->vm_flags & VM_MAYSHARE)
 				map_flags |= MAP_SHARED;
 
+			if (flags & MAP_32BIT)
+				map_flags |= MAP_32BIT;
+
 			new_addr = get_unmapped_area(vma->vm_file, 0, new_len,
 						vma->vm_pgoff, map_flags);
 			ret = new_addr;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH] Inconsistent mmap()/mremap() flags
2007-09-28 5:46 [PATCH] Inconsistent mmap()/mremap() flags Thayne Harbaugh
@ 2007-10-01 11:13 ` Andi Kleen
2007-10-02 2:57 ` Thayne Harbaugh
0 siblings, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2007-10-01 11:13 UTC (permalink / raw)
To: thayne; +Cc: linux-kernel, linux-mm, discuss
> @@ -388,6 +392,9 @@
> if (vma->vm_flags & VM_MAYSHARE)
> map_flags |= MAP_SHARED;
>
> + if (flags & MAP_32BIT)
> + map_flags |= MAP_32BIT;
> +
> new_addr = get_unmapped_area(vma->vm_file, 0, new_len,
> vma->vm_pgoff, map_flags);
> ret = new_addr;
That's not enough -- you would also need to fail the mremap when the result
is > 2GB (MAP_32BIT is actually a MAP_31BIT)
But that would be ugly to implement without a new architecture wrapper
or better changing arch_get_unmapped_area()
It might be better to just not bother. MAP_32BIT is a kind of hack anyways
that at least for mmap can be easily emulated in user space anyways.
Given for mremap() it is not that easy because there is no "hint" argument
without MREMAP_FIXED; but unless someone really needs it I would prefer
to not propagate the hack. If it's really needed it's probably better
to implement a start search hint for mremap()
-Andi
* Re: [PATCH] Inconsistent mmap()/mremap() flags
2007-10-01 11:13 ` Andi Kleen
@ 2007-10-02 2:57 ` Thayne Harbaugh
2007-10-02 5:15 ` Andi Kleen
0 siblings, 1 reply; 9+ messages in thread
From: Thayne Harbaugh @ 2007-10-02 2:57 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, linux-mm, discuss
On Mon, 2007-10-01 at 13:13 +0200, Andi Kleen wrote:
> > @@ -388,6 +392,9 @@
> > if (vma->vm_flags & VM_MAYSHARE)
> > map_flags |= MAP_SHARED;
> >
> > + if (flags & MAP_32BIT)
> > + map_flags |= MAP_32BIT;
> > +
> > new_addr = get_unmapped_area(vma->vm_file, 0, new_len,
> > vma->vm_pgoff, map_flags);
> > ret = new_addr;
>
> That's not enough -- you would also need to fail the mremap when the result
> is > 2GB (MAP_32BIT is actually a MAP_31BIT)
Yeah, after I sent the email I realized that it was a bit more involved.
As far as the 32/31 bit, it just depends on the perspective. I can see
that 32 bits are needed to represent all possible return values from
mmap() - possible address and error value of -1. From that perspective
I think that MAP_32BIT is appropriate.
> But that would be ugly to implement without a new architecture wrapper
> or better changing arch_get_unmapped_area()
>
> It might be better to just not bother. MAP_32BIT is a kind of hack anyways
> that at least for mmap can be easily emulated in user space anyways.
Care to give me some hints as to how that would be easily emulated in
user space? That might be a better solution for the case I want to
solve.
> Given for mremap() it is not that easy because there is no "hint" argument
> without MREMAP_FIXED; but unless someone really needs it I would prefer
> to not propagate the hack. If it's really needed it's probably better
> to implement a start search hint for mremap()
It came up for user-mode Qemu for the case of emulating 32bit archs on
x86_64 using mmap. At the moment it calls mmap with MAP_32BIT and then
uses the returned address directly in the emulator. Without MAP_32BIT
there's the possibility of having an address that would be too large to
pass to what a 32bit arch would expect. Since the MAP_32BIT flag solves
the problem for mmap() I was expecting something similar for mremap() -
unfortunately the MAP_32BIT feature is consistent throughout.
Thoughts?
* Re: [PATCH] Inconsistent mmap()/mremap() flags
2007-10-02 2:57 ` Thayne Harbaugh
@ 2007-10-02 5:15 ` Andi Kleen
2007-10-02 7:06 ` Thayne Harbaugh
0 siblings, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2007-10-02 5:15 UTC (permalink / raw)
To: Thayne Harbaugh; +Cc: Andi Kleen, linux-kernel, linux-mm, discuss
On Mon, Oct 01, 2007 at 08:57:10PM -0600, Thayne Harbaugh wrote:
> Yeah, after I sent the email I realized that it was a bit more involved.
> As far as the 32/31 bit, it just depends on the perspective. I can see
> that 32 bits are needed to represent all possible return values from
> mmap() - possible address and error value of -1. From that perspective
> I think that MAP_32BIT is appropriate.
Your perspective seems quite narrow. Only using 2GB instead of 4GB
is a major functional difference.
Negative error values are used in all system calls, so it would
hardly seem necessary to encode the use of the 32nd bit for that
in the option name.
> > But that would be ugly to implement without a new architecture wrapper
> > or better changing arch_get_unmapped_area()
> >
> > It might be better to just not bother. MAP_32BIT is a kind of hack anyways
> > that at least for mmap can be easily emulated in user space anyways.
>
> Care to give me some hints as to how that would be easily emulated in
> user space? That might be a better solution for the case I want to
> solve.
For mmap you can emulate it by passing a low hint != 0 (e.g. getpagesize())
in address but without MAP_FIXED and checking if the result is not beyond
your range.
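A minimal sketch of this emulation in C, assuming Linux and an anonymous private mapping. The helper name `mmap_below` and its `hint` and `limit` parameters are illustrative, not an existing API. The hint is taken from the caller because systems that set vm.mmap_min_addr above one page may not honour a hint as low as getpagesize().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` anonymous bytes that end at or below
 * `limit`. Returns the mapping, or MAP_FAILED if mmap() failed or the
 * kernel placed the mapping beyond the requested range. */
void *mmap_below(void *hint, size_t len, uint64_t limit)
{
    assert(len > 0);

    /* The hint is non-zero and MAP_FIXED is not set, so the kernel is
     * free to choose a different address; the result must be checked. */
    void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return MAP_FAILED;

    if ((uint64_t)(uintptr_t)p + len > limit) {
        /* Landed too high: give the range back and report failure. */
        munmap(p, len);
        return MAP_FAILED;
    }
    return p;
}
```

For a 32-bit guest the caller would pass a 4GB limit, for example `mmap_below((void *)0x10000000, len, UINT64_C(1) << 32)`, and try another hint or report an error when MAP_FAILED comes back.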
>
> > Given for mremap() it is not that easy because there is no "hint" argument
> > without MREMAP_FIXED; but unless someone really needs it I would prefer
> > to not propagate the hack. If it's really needed it's probably better
> > to implement a start search hint for mremap()
>
> It came up for user-mode Qemu for the case of emulating 32bit archs on
> x86_64 using mmap. At the moment it calls mmap with MAP_32BIT and then
That would limit the 32bit architectures to 2GB; but their real limit
is 4GB. Losing half of the address space definitely would make users unhappy
(e.g. at least normal Linux kernels wouldn't run at all)
The reason it's only 2GB is that the flag was added to support the small
code model of x86-64, which is limited to 2GB (31bit). Yes it's misnamed.
But it's not used for the 32bit compat code.
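The 2GB bound can be observed from user space. The sketch below assumes x86-64 Linux with glibc, where MAP_32BIT is visible under _GNU_SOURCE; the function name is hypothetical, and on other architectures it only reports that the flag is unavailable.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Returns 1 if a MAP_32BIT mapping of `len` bytes ends at or below 2GB,
 * 0 if it does not, and -1 if MAP_32BIT is unavailable or mmap() fails. */
int map_32bit_stays_below_2g(size_t len)
{
    assert(len > 0);
#ifdef MAP_32BIT
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* 2GB is 1 << 31, i.e. a 31-bit address range. */
    int below = (uint64_t)(uintptr_t)p + len <= (UINT64_C(1) << 31);
    munmap(p, len);
    return below;
#else
    return -1; /* flag not defined on this architecture */
#endif
}
```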
> uses the returned address directly in the emulator. Without MAP_32BIT
> there's the possibility of having an address that would be too large to
> pass to what a 32bit arch would expect. Since the MAP_32BIT flag solves
> the problem for mmap()
It doesn't really for that case
> I was expecting something similar for mremap() -
> unfortunately the MAP_32BIT feature is consistent throughout.
I guess you mean inconsistent.
Does qemu actually need mremap()? It would surprise me because
a lot of other OSes don't implement it.
-Andi
* Re: [PATCH] Inconsistent mmap()/mremap() flags
2007-10-02 5:15 ` Andi Kleen
@ 2007-10-02 7:06 ` Thayne Harbaugh
2007-10-02 12:19 ` Hugh Dickins
0 siblings, 1 reply; 9+ messages in thread
From: Thayne Harbaugh @ 2007-10-02 7:06 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, linux-mm, discuss
On Tue, 2007-10-02 at 07:15 +0200, Andi Kleen wrote:
> On Mon, Oct 01, 2007 at 08:57:10PM -0600, Thayne Harbaugh wrote:
> For mmap you can emulate it by passing a low hint != 0 (e.g. getpagesize())
> in address but without MAP_FIXED and checking if the result is not beyond
> your range.
Cool. That's a much better solution for multiple reasons: as you
mention, MAP_32BIT covers only 2GB, and it is also available only on x86_64.
> > > Given for mremap() it is not that easy because there is no "hint" argument
> > > without MREMAP_FIXED; but unless someone really needs it I would prefer
> > > to not propagate the hack. If it's really needed it's probably better
> > > to implement a start search hint for mremap()
> >
> > It came up for user-mode Qemu for the case of emulating 32bit archs on
> > x86_64 using mmap. At the moment it calls mmap with MAP_32BIT and then
>
> That would limit the 32bit architectures to 2GB; but their real limit
> is 4GB. Losing half of the address space definitely would make users unhappy
> (e.g. at least normal Linux kernels wouldn't run at all)
Keeping a kernel happy isn't necessary since it's user-space emulation
rather than full emulation. It is, however, useful to have 4GB rather
than 2GB.
> Does qemu actually need mremap()? It would surprise me because
> a lot of other OSes don't implement it.
Qemu has two modes: full hardware emulation and user-mode emulation.
User-mode emulation translates the user-mode code and then remaps the
system calls directly into the native kernel (that way all the kernel
and all the I/O runs natively and faster). As far as mremap(), I'm
trying to get a 32bit arm mremap() emulated syscall mapped onto a 64bit
x86_64 mremap().
* Re: [PATCH] Inconsistent mmap()/mremap() flags
2007-10-02 7:06 ` Thayne Harbaugh
@ 2007-10-02 12:19 ` Hugh Dickins
2007-10-02 13:45 ` [discuss] " Andi Kleen
0 siblings, 1 reply; 9+ messages in thread
From: Hugh Dickins @ 2007-10-02 12:19 UTC (permalink / raw)
To: Thayne Harbaugh; +Cc: Andi Kleen, linux-kernel, linux-mm, discuss
On Tue, 2 Oct 2007, Thayne Harbaugh wrote:
> On Tue, 2007-10-02 at 07:15 +0200, Andi Kleen wrote:
>
> > For mmap you can emulate it by passing a low hint != 0 (e.g. getpagesize())
> > in address but without MAP_FIXED and checking if the result is not beyond
> > your range.
>
> Cool. That's a much better solution for multiple reasons: as you
> mention, MAP_32BIT covers only 2GB, and it is also available only on x86_64.
>
> > > > Given for mremap() it is not that easy because there is no "hint" argument
> > > > without MREMAP_FIXED; but unless someone really needs it I would prefer
> > > > to not propagate the hack. If it's really needed it's probably better
> > > > to implement a start search hint for mremap()
I think you can do it already, without us complicating mremap further
with such a start search hint.
First call mmap with a low hint address, the new size you'll be wanting
from the mremap, PROT_NONE, MAP_ANONYMOUS, -1, 0. Then call mremap with
old address, old size, new size, MREMAP_MAYMOVE|MREMAP_FIXED, and new
address as returned by the preparatory mmap.
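The two-stage procedure above could look roughly like this in C. It is a sketch under assumptions: `mremap_below`, `hint` and `limit` are made-up names, the range check against `limit` is added for the 32-bit use case, and MAP_PRIVATE is added to the flags because mmap() requires either MAP_PRIVATE or MAP_SHARED.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: resize the mapping at old_addr to new_len bytes and
 * move it to an address that ends at or below `limit`.
 * Returns the new address, or MAP_FAILED (the old mapping is then intact). */
void *mremap_below(void *old_addr, size_t old_len, size_t new_len,
                   void *hint, uint64_t limit)
{
    assert(new_len > 0);

    /* Stage 1: reserve a destination of the final size near the hint. */
    void *dst = mmap(hint, new_len, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (dst == MAP_FAILED)
        return MAP_FAILED;
    if ((uint64_t)(uintptr_t)dst + new_len > limit) {
        munmap(dst, new_len);
        return MAP_FAILED;
    }

    /* Stage 2: move the old mapping onto the reservation. MREMAP_FIXED
     * replaces whatever is mapped at dst, here the PROT_NONE placeholder. */
    void *p = mremap(old_addr, old_len, new_len,
                     MREMAP_MAYMOVE | MREMAP_FIXED, dst);
    if (p == MAP_FAILED)
        munmap(dst, new_len); /* drop the unused reservation */
    return p;
}
```

Because the reservation exists while the old mapping is still in place, the two ranges cannot overlap, which MREMAP_FIXED requires.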
Hugh
* Re: [discuss] [PATCH] Inconsistent mmap()/mremap() flags
2007-10-02 12:19 ` Hugh Dickins
@ 2007-10-02 13:45 ` Andi Kleen
2007-10-02 14:16 ` Hugh Dickins
0 siblings, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2007-10-02 13:45 UTC (permalink / raw)
To: discuss; +Cc: Hugh Dickins, Thayne Harbaugh, linux-mm, linux-kernel
> First call mmap with a low hint address, the new size you'll be wanting
> from the mremap, PROT_NONE, MAP_ANONYMOUS, -1, 0. Then call mremap with
> old address, old size, new size, MREMAP_MAYMOVE|MREMAP_FIXED, and new
> address as returned by the preparatory mmap.
That's racy unfortunately in a multithreaded process. They would need to loop.
-Andi
* Re: [discuss] [PATCH] Inconsistent mmap()/mremap() flags
2007-10-02 13:45 ` [discuss] " Andi Kleen
@ 2007-10-02 14:16 ` Hugh Dickins
2007-10-02 15:21 ` Thayne Harbaugh
0 siblings, 1 reply; 9+ messages in thread
From: Hugh Dickins @ 2007-10-02 14:16 UTC (permalink / raw)
To: Andi Kleen; +Cc: discuss, Thayne Harbaugh, linux-mm, linux-kernel
On Tue, 2 Oct 2007, Andi Kleen wrote:
>
> > First call mmap with a low hint address, the new size you'll be wanting
> > from the mremap, PROT_NONE, MAP_ANONYMOUS, -1, 0. Then call mremap with
> > old address, old size, new size, MREMAP_MAYMOVE|MREMAP_FIXED, and new
> > address as returned by the preparatory mmap.
>
> That's racy unfortunately in a multithreaded process. They would need to loop.
Perhaps. Though I don't see what your loop would be doing;
and the mapping established by the first thread would only
be vulnerable to another thread if that were really set on
interfering (an un-FIXED mmap by another thread will keep
away from the area assigned to the first).
Certainly a two-stage procedure has to be weaker than one stage,
but it is just how MAP_FIXED is normally used (isn't it?): first
stake out an arena for all that's needed without MAP_FIXED, then
fit into it the actual mappings required using MAP_FIXED. Blind
use of MAP_FIXED is always in danger of unmapping something vital.
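As an illustration of that pattern, here is a small C sketch, assuming Linux anonymous mappings. The names `arena_reserve` and `arena_map` are hypothetical. The arena is staked out without MAP_FIXED, and MAP_FIXED is then used only at offsets that are known to lie inside it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Stage 1: stake out an inaccessible arena; the kernel picks the address. */
void *arena_reserve(size_t arena_len)
{
    assert(arena_len > 0);
    return mmap(NULL, arena_len, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Stage 2: place a usable mapping at a page-aligned offset inside the arena.
 * The bounds check keeps MAP_FIXED from replacing anything outside it. */
void *arena_map(void *arena, size_t arena_len, size_t off, size_t len)
{
    if (off > arena_len || len > arena_len - off)
        return MAP_FAILED;
    return mmap((char *)arena + off, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
}
```

A request that would extend past the end of the arena is refused before mmap() is called, which is the safeguard a blind MAP_FIXED lacks.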
But whether the two-stage procedure is good enough for Thayne's
purpose, he'll have to judge for himself.
Hugh
* Re: [discuss] [PATCH] Inconsistent mmap()/mremap() flags
2007-10-02 14:16 ` Hugh Dickins
@ 2007-10-02 15:21 ` Thayne Harbaugh
0 siblings, 0 replies; 9+ messages in thread
From: Thayne Harbaugh @ 2007-10-02 15:21 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Andi Kleen, discuss, linux-mm, linux-kernel
On Tue, 2007-10-02 at 15:16 +0100, Hugh Dickins wrote:
> On Tue, 2 Oct 2007, Andi Kleen wrote:
> >
> > > First call mmap with a low hint address, the new size you'll be wanting
> > > from the mremap, PROT_NONE, MAP_ANONYMOUS, -1, 0. Then call mremap with
> > > old address, old size, new size, MREMAP_MAYMOVE|MREMAP_FIXED, and new
> > > address as returned by the preparatory mmap.
> >
> > That's racy unfortunately in a multithreaded process. They would need to loop.
>
> Perhaps. Though I don't see what your loop would be doing;
> and the mapping established by the first thread would only
> be vulnerable to another thread if that were really set on
> interfering (an un-FIXED mmap by another thread will keep
> away from the area assigned to the first).
>
> Certainly a two-stage procedure has to be weaker than one stage,
> but it is just how MAP_FIXED is normally used (isn't it?): first
> stake out an arena for all that's needed without MAP_FIXED, then
> fit into it the actual mappings required using MAP_FIXED. Blind
> use of MAP_FIXED is always in danger of unmapping something vital.
>
> But whether the two-stage procedure is good enough for Thayne's
> purpose, he'll have to judge for himself.
I think my eyes have been opened enough so that I can get things to work
- it's certainly better in many respects than using MAP_32BIT with its
many limitations.
Thank you.