* mremap() use is racy
@ 2005-08-23 19:53 Ulrich Drepper
2005-08-23 20:45 ` Hugh Dickins
0 siblings, 1 reply; 8+ messages in thread
From: Ulrich Drepper @ 2005-08-23 19:53 UTC (permalink / raw)
To: Linux Kernel, Linus Torvalds
[-- Attachment #1: Type: text/plain, Size: 1613 bytes --]
Not the mremap() implementation itself, so don't worry.
If mremap() is to be used without the MREMAP_MAYMOVE flag the call will
only succeed of the address space after the block which is to be
remapped is empty. This is rarely the case since there are many users
of mmap and memory is allocated consecutively in many cases.
So what programs have to do is to make sure ahead of time that the
mremap() call can succeed. The best way to do this is using an
anonymous, unused, unusable mapping. Code like this:
p = mmap(NULL, 65536, PROT_NONE, MAP_PRIVATE|MAP_ANON, -1, 0);
mmap(p, 16384, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
Then when the mapping has to be extended one should be able to use mremap():
mremap(p, 16384, 32768, 0);
But this is not possible since mremap() respects the anonymous mapping.
So one has to use
munmap((char*)p + 16384, 16384);
before the mremap() call. But this is where the race comes in. Some
other thread might allocate these blocks before the mremap() call can do it.
One possible solution would be to add a flag to mremap() which allows
mremap() to steal memory. In general that would be too dangerous but we
could limit it to private, anonymous mappings which have no access
permissions (i.e., PROT_NONE with MAP_PRIVATE and MAP_ANON). One
explicitly has to allocate such blocks, they don't appear naturally.
And the program in any case knows about the address space layout.
So, how about adding MREMAP_MAPOVERNONE or so?
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 251 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: mremap() use is racy
2005-08-23 19:53 mremap() use is racy Ulrich Drepper
@ 2005-08-23 20:45 ` Hugh Dickins
2005-08-23 20:56 ` Ulrich Drepper
0 siblings, 1 reply; 8+ messages in thread
From: Hugh Dickins @ 2005-08-23 20:45 UTC (permalink / raw)
To: Ulrich Drepper; +Cc: Linux Kernel, Linus Torvalds
On Tue, 23 Aug 2005, Ulrich Drepper wrote:
>
> One possible solution would be to add a flag to mremap() which allows
> mremap() to steal memory. In general that would be too dangerous but we
> could limit it to private, anonymous mappings which have no access
> permissions (i.e., PROT_NONE with MAP_PRIVATE and MAP_ANON). One
> explicitly has to allocate such blocks, they don't appear naturally.
> And the program in any case knows about the address space layout.
>
> So, how about adding MREMAP_MAPOVERNONE or so?
If the app can plan ahead as you're proposing, why doesn't it just
mmap the maximum it might need, mprotect PROT_NONE the end it doesn't
need yet, then progressively re-mprotect parts to make them accessible
as needed?
I'm missing what mremap gives you here that mprotect doesn't. Though
I do see that it would be nice not to be forced into mremap moving
all the time, because of other maps blocking you off: nice perhaps
to know what region of the layout is least likely to be so affected.
Hugh
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: mremap() use is racy
2005-08-23 20:45 ` Hugh Dickins
@ 2005-08-23 20:56 ` Ulrich Drepper
2005-08-23 21:28 ` Linus Torvalds
0 siblings, 1 reply; 8+ messages in thread
From: Ulrich Drepper @ 2005-08-23 20:56 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Linux Kernel, Linus Torvalds
[-- Attachment #1: Type: text/plain, Size: 1305 bytes --]
Hugh Dickins wrote:
> If the app can plan ahead as you're proposing, why doesn't it just
> mmap the maximum it might need, mprotect PROT_NONE the end it doesn't
> need yet, then progressively re-mprotect parts to make them accessible
> as needed?
Because the underlying file isn't larger than the initial mapping. In
the one case I'm working on now the file can grow over time. More data
is added at the end but the mapping cannot move in the address space.
Using mmap with a too-large size for the underlying file and then hoping
that future file growth is magically handled when those pages are
accessed is not valid.
> I'm missing what mremap gives you here that mprotect doesn't. Though
> I do see that it would be nice not to be forced into mremap moving
> all the time, because of other maps blocking you off: nice perhaps
> to know what region of the layout is least likely to be so affected.
Just accept here that moving is not an option. If remap cannot be used
then a complete new mmap() with adjusted length is needed. That is
unnecessarily expensive. It is the reason why there is mremap(). But
mremap() with MREMAP_MAYMOVE is unreliable as it is implemented today.
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 251 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: mremap() use is racy
2005-08-23 20:56 ` Ulrich Drepper
@ 2005-08-23 21:28 ` Linus Torvalds
2005-08-23 22:08 ` Ulrich Drepper
0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2005-08-23 21:28 UTC (permalink / raw)
To: Ulrich Drepper; +Cc: Hugh Dickins, Linux Kernel
On Tue, 23 Aug 2005, Ulrich Drepper wrote:
>
> Using mmap with a too-large size for the underlying file and then hoping
> that future file growth is magically handled when those pages are
> accessed is not valid.
Actually, it should be pretty much as valid as using mremap - ie it works
on Linux.
Especially if you use MAP_SHARED, you don't even need to mprotect
anything: you'll get a nice SIGBUS if you ever try to access past the last
page that maps the file.
I think that works correctly for any half-way modern kernel - anything
that has mremap() should do the right thing (I think older kernels would
map zero pages past the end of the file mapping, and then if you touched
the page first, you'd lose the coherency).
Linus
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: mremap() use is racy
2005-08-23 21:28 ` Linus Torvalds
@ 2005-08-23 22:08 ` Ulrich Drepper
2005-08-23 23:46 ` Hugh Dickins
2005-08-24 0:08 ` Linus Torvalds
0 siblings, 2 replies; 8+ messages in thread
From: Ulrich Drepper @ 2005-08-23 22:08 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Hugh Dickins, Linux Kernel
[-- Attachment #1: Type: text/plain, Size: 781 bytes --]
Linus Torvalds wrote:
> Actually, it should be pretty much as valid as using mremap - ie it works
> on Linux.
>
> Especially if you use MAP_SHARED, you don't even need to mprotect
> anything: you'll get a nice SIGBUS if you ever try to access past the last
> page that maps the file.
If you guarantee this (and test for this) it's fine with me. The POSIX
spec explicitly leaves this undefined and requiring to use mremap()
would be a nice way to work around this without allowing the
introduction of undefined behavior into programs. I probably would
prefer to use mremap() since this makes it clear what should happen but
I can live with using the too-large mapping.
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 251 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: mremap() use is racy
2005-08-23 22:08 ` Ulrich Drepper
@ 2005-08-23 23:46 ` Hugh Dickins
2005-08-24 0:08 ` Linus Torvalds
1 sibling, 0 replies; 8+ messages in thread
From: Hugh Dickins @ 2005-08-23 23:46 UTC (permalink / raw)
To: Ulrich Drepper; +Cc: Linus Torvalds, Linux Kernel
On Tue, 23 Aug 2005, Ulrich Drepper wrote:
> Linus Torvalds wrote:
> > Actually, it should be pretty much as valid as using mremap - ie it works
> > on Linux.
> >
> > Especially if you use MAP_SHARED, you don't even need to mprotect
> > anything: you'll get a nice SIGBUS if you ever try to access past the last
> > page that maps the file.
>
> If you guarantee this (and test for this) it's fine with me. The POSIX
> spec explicitly leaves this undefined and requiring to use mremap()
> would be a nice way to work around this without allowing the
> introduction of undefined behavior into programs. I probably would
> prefer to use mremap() since this makes it clear what should happen but
> I can live with using the too-large mapping.
I don't have spec to hand to quote, but at least 10 years ago the SIGBUS
beyond end of file was required behaviour, on both MAP_SHARED and more
surprisingly MAP_PRIVATE mappings of a file. Well, you don't get SIGBUS
on any part of the page that covers the end of the file: perhaps what
you read as undefined in the spec is addressing that tail page issue.
We guarantee it: it's a kernel bug if it doesn't work.
Hugh
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: mremap() use is racy
2005-08-23 22:08 ` Ulrich Drepper
2005-08-23 23:46 ` Hugh Dickins
@ 2005-08-24 0:08 ` Linus Torvalds
2005-08-24 12:00 ` Hugh Dickins
1 sibling, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2005-08-24 0:08 UTC (permalink / raw)
To: Ulrich Drepper; +Cc: Hugh Dickins, Linux Kernel
On Tue, 23 Aug 2005, Ulrich Drepper wrote:
>
> Linus Torvalds wrote:
> >
> > Especially if you use MAP_SHARED, you don't even need to mprotect
> > anything: you'll get a nice SIGBUS if you ever try to access past the last
> > page that maps the file.
>
> If you guarantee this (and test for this) it's fine with me.
It's how the kernel _should_ work, but very few apps seem to depend on it,
so no guarantees. I looked over the code, and I think we've lost the
SIGBUS thing.
Basically, if you don't ever access past the end, you should always be ok.
If you access past the end, at least some _really_ old kernel versions
will zero-fill the page with an anonymous page and lose coherency, while
more modern kernels should _either_ cause a SIGBUS or alternatively at
least have a coherent zero-filled page.
Linus
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: mremap() use is racy
2005-08-24 0:08 ` Linus Torvalds
@ 2005-08-24 12:00 ` Hugh Dickins
0 siblings, 0 replies; 8+ messages in thread
From: Hugh Dickins @ 2005-08-24 12:00 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Ulrich Drepper, Linux Kernel
On Tue, 23 Aug 2005, Linus Torvalds wrote:
> On Tue, 23 Aug 2005, Ulrich Drepper wrote:
> > Linus Torvalds wrote:
> > >
> > > Especially if you use MAP_SHARED, you don't even need to mprotect
> > > anything: you'll get a nice SIGBUS if you ever try to access past
> > > the last page that maps the file.
MAP_PRIVATE also - even if you earlier wrote private data there before
the file got truncated down.
> > If you guarantee this (and test for this) it's fine with me.
Since you're not guaranteeing it, shall I?
The Open Posix Testsuite tests for it: though I haven't run that up,
and its conformance/interfaces/mmap/coverage.txt notes it failed with
glibc-2.3 on Linux-2.6.0-test2.
I have just tested mmap SIGBUS beyond EOF on 2.6.13-rc7,
i386 and x86_64, works correctly as expected.
A quick browse through the others shows all the MMU architectures
appearing to support it: delivering SIGBUS signal if VM_FAULT_SIGBUS
returned by handle_mm_fault.
Except for PA-RISC, which delivers SIGSEGV instead. I imagine that's
wrong, but safer to leave unchanged now - I won't guarantee that one.
> It's how the kernel _should_ work, but very few apps seem to depend on it,
> so no guarantees. I looked over the code, and I think we've lost the
> SIGBUS thing.
It would be easier to follow the route from ->nopage observing offset
beyond EOF through to delivery of the SIGBUS if filemap_nopage were
to say NOPAGE_SIGBUS, rather than the NULL that's defined to be.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
--- 2.6.13-rc7/mm/filemap.c 2005-08-24 11:13:41.000000000 +0100
+++ linux/mm/filemap.c 2005-08-24 12:35:33.000000000 +0100
@@ -1281,7 +1281,7 @@ outside_data_content:
* accessible..
*/
if (area->vm_mm == current->mm)
- return NULL;
+ return NOPAGE_SIGBUS;
/* Fall through to the non-read-ahead case */
no_cached_page:
/*
@@ -1306,7 +1306,7 @@ no_cached_page:
*/
if (error == -ENOMEM)
return NOPAGE_OOM;
- return NULL;
+ return NOPAGE_SIGBUS;
page_not_uptodate:
if (!did_readaround) {
@@ -1366,7 +1366,7 @@ page_not_uptodate:
* mm layer so, possibly freeing the page cache page first.
*/
page_cache_release(page);
- return NULL;
+ return NOPAGE_SIGBUS;
}
EXPORT_SYMBOL(filemap_nopage);
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2005-08-24 11:58 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-08-23 19:53 mremap() use is racy Ulrich Drepper
2005-08-23 20:45 ` Hugh Dickins
2005-08-23 20:56 ` Ulrich Drepper
2005-08-23 21:28 ` Linus Torvalds
2005-08-23 22:08 ` Ulrich Drepper
2005-08-23 23:46 ` Hugh Dickins
2005-08-24 0:08 ` Linus Torvalds
2005-08-24 12:00 ` Hugh Dickins
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).