* [patch][rfc] expandable anonymous shared mappings
@ 2004-06-16 18:12 Stas Sergeev
2004-06-18 20:46 ` Hugh Dickins
0 siblings, 1 reply; 14+ messages in thread
From: Stas Sergeev @ 2004-06-16 18:12 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 873 bytes --]
Hello.
It seems right now there is no way to
expand the anonymous shared mappings
(or am I missing something?)
This makes this mechanism practically
useless for many tasks, it otherwise could
suit very well, I beleive.
The attached patch implements a simple
nopage vm op for anonymous shared mappings,
which allows to expand them with mremap().
Attached is also a test-case which crashes
with bus error without the patch and works
properly with the patch (it tries to expand
the mapping, but without the patch it is
not possible).
The patch is against 2.6.7-rc3-mm2, but
should work with any 2.6.6 I beleive.
I would be glad to hear any comments on that.
I really would like to use the anonymous shared
mappings rather than those heavy-weight
alternatives like shm_open() and such, when
appropriate, but without the mremap() support,
this looks quite impossible to me.
[-- Attachment #2: anon_nopg.diff --]
[-- Type: text/x-patch, Size: 1786 bytes --]
--- linux/mm/shmem.c 2004-06-15 09:51:54.000000000 +0400
+++ linux/mm/shmem.c 2004-06-15 17:03:04.000000000 +0400
@@ -169,6 +169,7 @@
static struct inode_operations shmem_inode_operations;
static struct inode_operations shmem_dir_inode_operations;
static struct vm_operations_struct shmem_vm_ops;
+static struct vm_operations_struct shmem_zero_vm_ops;
static struct backing_dev_info shmem_backing_dev_info = {
.ra_pages = 0, /* No readahead */
@@ -1095,6 +1096,27 @@
return page;
}
+struct page *shmem_zero_nopage(struct vm_area_struct *vma, unsigned long address, int *type)
+{
+ struct inode *inode = vma->vm_file->f_dentry->d_inode;
+ loff_t vm_size = address + PAGE_SIZE - vma->vm_start;
+ loff_t i_size = i_size_read(inode);
+
+ if (i_size < vm_size) {
+ int error;
+ error = shmem_acct_size(SHMEM_I(inode)->flags, vm_size - i_size);
+ if (error)
+ return NOPAGE_SIGBUS;
+ down(&inode->i_sem);
+ error = vmtruncate(inode, vm_size);
+ up(&inode->i_sem);
+ if (error)
+ return NOPAGE_SIGBUS;
+ }
+
+ return shmem_nopage(vma, address, type);
+}
+
static int shmem_populate(struct vm_area_struct *vma,
unsigned long addr, unsigned long len,
pgprot_t prot, unsigned long pgoff, int nonblock)
@@ -1970,6 +1992,15 @@
#endif
};
+static struct vm_operations_struct shmem_zero_vm_ops = {
+ .nopage = shmem_zero_nopage,
+ .populate = shmem_populate,
+#ifdef CONFIG_NUMA
+ .set_policy = shmem_set_policy,
+ .get_policy = shmem_get_policy,
+#endif
+};
+
static struct super_block *shmem_get_sb(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data)
{
@@ -2101,7 +2132,7 @@
if (vma->vm_file)
fput(vma->vm_file);
vma->vm_file = file;
- vma->vm_ops = &shmem_vm_ops;
+ vma->vm_ops = &shmem_zero_vm_ops;
return 0;
}
[-- Attachment #3: shar_grow.c --]
[-- Type: text/x-csrc, Size: 998 bytes --]
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#define NPAG 200
int main(int argc, char *argv[])
{
char *src, *dst, *ptr, buf[255];
if ((src = mmap(0, getpagesize(), PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS, -1, 0)) == MAP_FAILED) {
perror("mmap()");
return 1;
}
printf("mapped to %p\n", src);
if ((dst = mremap(src, getpagesize(), NPAG * getpagesize(),
MREMAP_MAYMOVE)) == MAP_FAILED) {
perror("mremap()");
return 1;
}
sprintf(buf, "cat /proc/%i/maps", getpid());
system(buf);
fflush(stdout);
ptr = dst + (NPAG - 5) * getpagesize();
printf("trying %p (%p)...\n", ptr, ptr-dst);
*ptr = 20;
printf("%#x, works!\n", *ptr);
ptr = dst + (NPAG - 3) * getpagesize();
printf("trying %p (%p)...\n", ptr, ptr-dst);
*ptr = 20;
printf("%#x, works!\n", *ptr);
return 0;
}
[-- Attachment #4: Type: text/plain, Size: 82 bytes --]
Scanned by evaluation version of Dr.Web antivirus Daemon
http://drweb.ru/unix/
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [patch][rfc] expandable anonymous shared mappings 2004-06-16 18:12 [patch][rfc] expandable anonymous shared mappings Stas Sergeev @ 2004-06-18 20:46 ` Hugh Dickins 2004-06-19 9:07 ` Stas Sergeev 0 siblings, 1 reply; 14+ messages in thread From: Hugh Dickins @ 2004-06-18 20:46 UTC (permalink / raw) To: Stas Sergeev; +Cc: Andrew Morton, linux-kernel On Wed, 16 Jun 2004, Stas Sergeev wrote: > > It seems right now there is no way to > expand the anonymous shared mappings > (or am I missing something?) You're right, not missing something. > This makes this mechanism practically > useless for many tasks, it otherwise could > suit very well, I beleive. You may well be right, though I've no idea of those tasks myself. Your test case doesn't fork at all: didn't need to to demonstrate the behaviour you object to, but it doesn't give us any hints of the use for the change you suggest - shared anonymous only becomes interesting when you fork. But I do rather like your idea. At first I was sceptical, thinking the same could be achieved in other ways. But once forked, it's too late to extend with further shared anonymous areas, they won't be shared with your children by then. You can use shared maps of actual or tmpfs files, or SysV shm, but both are more cumbersome. Your point, which I now accept, is that we've got this odd thing of shared anonymous mappings, and this odd thing of mremap extending maps, can we please put them together in a way that's natural and useful, rather than in a way that's useless. (And Linux is free to specify mremap as it wishes, so long as reasonable compatibility is retained - and I agree we don't need to reproduce every SIGBUS of yesteryear. I would feel uncomfortable about letting mremap extend SysV shm objects, but that doesn't happen with your patch.) > The attached patch implements a simple > nopage vm op for anonymous shared mappings, > which allows to expand them with mremap(). I've appended a slightly different patch below: much like yours, but I've a few issues with your shmem_zero_nopage implementation and so reworked it e.g. it's scary to see an i_sem taken within mmap_sem (though I don't think there's really any possibility of deadlock in this case, given the special nature of this object); we actually want to avoid vmtruncate's RLIMIT_FSIZE check (it's an implementation detail that we're using something like a file to back the shared anonymous mapping, the relevant rlimit is RLIMIT_AS already checked by mremap); must be careful about two nopages extending at the same time; might as well extend object to end of vma in one go rather than fault by fault. Is this patch good for you? But... I've a couple of reservations, which make me reluctant to push this patch further without encouragement from others. One reservation: is there any security aspect to such a change? Previously the parent (or ancestor) determined the maximum size of the anonymous shared object, now children can extend it (and it will all remain allocated until the very last mapping of any part of it is unmapped). I don't see an actual problem with that, just see that it leads me into an area where I'm a little uneasy (might we need a further MAP_ flag to allow such extension?), and had better rely on the judgements of others. Other reservation: it's nicely only a few lines of code, but it really is a bit of a hack to be extending the object in the nopage, with SIGBUS on failure. Isn't the right approach (more trouble than it's worth?) to extend vm_operations_struct with an mremap method (perhaps NULL .mremap would be equivalent to VM_DONTEXPAND, and replace that flag in due course)? Which in the shmem_zero case would extend the underlying object (or fail with a proper mremap failure), and in other cases do nothing but allow the extension to succeed. My own inclination is towards the quick hack in this instance (add the method only when more uses appear), but again I'd defer to the judgements of others. Comments? Hugh --- 2.6.7/mm/shmem.c 2004-06-16 06:20:39.000000000 +0100 +++ linux/mm/shmem.c 2004-06-17 20:26:26.567507056 +0100 @@ -169,6 +169,7 @@ static struct file_operations shmem_file static struct inode_operations shmem_inode_operations; static struct inode_operations shmem_dir_inode_operations; static struct vm_operations_struct shmem_vm_ops; +static struct vm_operations_struct shmem_zero_vm_ops; static struct backing_dev_info shmem_backing_dev_info = { .ra_pages = 0, /* No readahead */ @@ -1076,7 +1077,8 @@ failed: return error; } -struct page *shmem_nopage(struct vm_area_struct *vma, unsigned long address, int *type) +struct page *shmem_nopage(struct vm_area_struct *vma, + unsigned long address, int *type) { struct inode *inode = vma->vm_file->f_dentry->d_inode; struct page *page = NULL; @@ -1095,6 +1097,36 @@ struct page *shmem_nopage(struct vm_area return page; } +struct page *shmem_zero_nopage(struct vm_area_struct *vma, + unsigned long address, int *type) +{ + struct page *page; + + page = shmem_nopage(vma, address, type); + if (unlikely(page == NOPAGE_SIGBUS)) { + struct inode *inode = vma->vm_file->f_dentry->d_inode; + struct shmem_inode_info *info = SHMEM_I(inode); + loff_t vm_size, i_size; + + /* + * If mremap has been used to extend the vma, + * now extend the underlying object to match it. + */ + vm_size = (vma->vm_end - vma->vm_start) + + ((loff_t)vma->vm_pgoff << PAGE_SHIFT); + spin_lock(&info->lock); + i_size = i_size_read(inode); + if (i_size < vm_size && vm_size <= SHMEM_MAX_BYTES && + !shmem_acct_size(info->flags, vm_size - i_size)) { + i_size_write(inode, vm_size); + inode->i_ctime = inode->i_mtime = CURRENT_TIME; + } + spin_unlock(&info->lock); + page = shmem_nopage(vma, address, type); + } + return page; +} + static int shmem_populate(struct vm_area_struct *vma, unsigned long addr, unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock) @@ -1976,6 +2008,15 @@ static struct vm_operations_struct shmem #endif }; +static struct vm_operations_struct shmem_zero_vm_ops = { + .nopage = shmem_zero_nopage, + .populate = shmem_populate, +#ifdef CONFIG_NUMA + .set_policy = shmem_set_policy, + .get_policy = shmem_get_policy, +#endif +}; + static struct super_block *shmem_get_sb(struct file_system_type *fs_type, int flags, const char *dev_name, void *data) { @@ -2098,7 +2139,8 @@ put_memory: int shmem_zero_setup(struct vm_area_struct *vma) { struct file *file; - loff_t size = vma->vm_end - vma->vm_start; + loff_t size = (vma->vm_end - vma->vm_start) + + ((loff_t)vma->vm_pgoff << PAGE_SHIFT); file = shmem_file_setup("dev/zero", size, vma->vm_flags); if (IS_ERR(file)) @@ -2107,7 +2149,7 @@ int shmem_zero_setup(struct vm_area_stru if (vma->vm_file) fput(vma->vm_file); vma->vm_file = file; - vma->vm_ops = &shmem_vm_ops; + vma->vm_ops = &shmem_zero_vm_ops; return 0; } ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch][rfc] expandable anonymous shared mappings 2004-06-18 20:46 ` Hugh Dickins @ 2004-06-19 9:07 ` Stas Sergeev 2004-06-20 0:20 ` Hugh Dickins 0 siblings, 1 reply; 14+ messages in thread From: Stas Sergeev @ 2004-06-19 9:07 UTC (permalink / raw) To: Hugh Dickins; +Cc: Andrew Morton, linux-kernel [-- Attachment #1: Type: text/plain, Size: 5658 bytes --] Hi, Hugh Dickins wrote: >> It seems right now there is no way to >> expand the anonymous shared mappings >> (or am I missing something?) > You're right, not missing something. Good. Thanks for taking a look. >> This makes this mechanism practically >> useless for many tasks, it otherwise could >> suit very well, I beleive. > You may well be right, though I've no idea of those tasks myself. I'll try to explain what I wanted to achieve. > Your test case doesn't fork at all: didn't need to to demonstrate > the behaviour you object to, but it doesn't give us any hints of > the use for the change you suggest - Why I didn't demonstrate its use is because I actually have an impression the current behaviour is simply not what someone may expect after reading "man mmap" and "man mremap". Nothing in that docs says that it is not possible to expand that mapping with mremap(), and I expected it to be expandable the same way the private anonymous mapping is. So I just assumed it is a bug, not a design decision. So I demanstrated not too much, just whatever I considered a bogus behaveour. > shared anonymous only becomes > interesting when you fork. I disagree with this. The way I am using it, may look horrible (so that was another reason I haven't demonstrated it), but yes, I do use it without the fork(). Here is how am I using it, roughly: --- pool=mmap(0,pool_size,...,MAP_SHARED|MAP_ANONYMUS,...); area1=mremap2(pool, 0, area1_size, MREMAP_MAYMOVE|MREMAP_FIXED, area1_addr); area2=mremap2(pool, 0, area2_size, MREMAP_MAYMOVE|MREMAP_FIXED, area2_addr); assert(area1==area1_addr && area2==area2_addr); --- where mremap2() is a function which calls the mremap syscall directly, to make use of MREMAP_FIXED. Please don't beat me on that, that's just how I am (ab)using that stuff :) > But I do rather like your idea. At first I was sceptical, thinking > the same could be achieved in other ways. But once forked, it's too > late to extend with further shared anonymous areas, they won't be > shared with your children by then. You can use shared maps of actual > or tmpfs files, or SysV shm, but both are more cumbersome. Yes, that's the main point. And also I think this makes the interface more consistent. > Your point, which I now accept, is that we've got this odd thing of > shared anonymous mappings, and this odd thing of mremap extending > maps, can we please put them together in a way that's natural and > useful, rather than in a way that's useless. Yes, exactly! This says everything I had to say in my initial post myself, but haven't done. > I've appended a slightly different patch below: much like yours, > but I've a few issues with your shmem_zero_nopage implementation [...] > we actually want to avoid vmtruncate's RLIMIT_FSIZE check (it's Yes, I expected that calling vmtruncate() can be a bad idea vrt RLIMIT_FSIZE check, esp. when it sends the SIGXFSZ and then my patch returns NOPAGE_SIGBUS. Your patch is certainly better. > Is this patch good for you? Yes, the patch works. But I still think it can be cleared up a bit. Esp. I think we can avoid calling shmem_nopage() twice as we know for sure (or do we?) that it will fail when i_size<vm_size. I merged my patch and your patch, I think whatever I get, is a little cleaner. It is attached. > One reservation: is there any security aspect to such a change? > Previously the parent (or ancestor) determined the maximum size > of the anonymous shared object, now children can extend it (and > it will all remain allocated until the very last mapping of any > part of it is unmapped). I don't see an actual problem with that, > just see that it leads me into an area where I'm a little uneasy > (might we need a further MAP_ flag to allow such extension?), I think (and I may be completely wrong) that we need the flag to *disallow* such extension. Something like MAP_DONTEXPAND which will set the VM_DONTEXPAND for vma. Something more generic, not making the anon shared mapping a special case. > Other reservation: it's nicely only a few lines of code, but > it really is a bit of a hack to be extending the object in the > nopage, with SIGBUS on failure. I agree with that completely. But that seems to be the design choise to me, so I hadn't any idea how to get around this, when making the patch (truncating inside the mremap() itself doesn't really work with the current design I think). > Isn't the right approach (more > trouble than it's worth?) to extend vm_operations_struct with > an mremap method (perhaps NULL .mremap would be equivalent to > VM_DONTEXPAND, and replace that flag in due course)? Which in That sounds like a very good idea to me. I was always wondering if it is really the right thing to not have the .mremap handler at all. And IIRC right now the missing nopage handler just causes the zero-page to be mapped in, which already made me confused a couple of times (for example I tried to expand the /dev/mem mapping and was wondering why nothing works). Instead, making a missing mremap handler to be equivalent of VM_DONTEXPAND, so that mremap() to just fail right away, sound very reasonable to me. But I am not pretending to understand the linux mm code, so I may be wrong in everything I am saying here. As a starting point I am attaching a small patch for VM_DONTEXPAND. I think it is implemented wrongly right now. It should allow the things like mremap(addr, 0, old_size, MREMAP_MAYMOVE); but it doesn't. I think it is a bug, and if we want to have whatever you suggest, we need this patch. What do you think? Thanks for your comments again. They were extremely usefull for me in both educational and practical purposes. [-- Attachment #2: anon_shm1.diff --] [-- Type: text/x-patch, Size: 2359 bytes --] --- 2.6.7/mm/shmem.c 2004-06-16 06:20:39.000000000 +0100 +++ linux/mm/shmem.c 2004-06-17 20:26:26.567507056 +0100 @@ -169,6 +169,7 @@ static struct inode_operations shmem_inode_operations; static struct inode_operations shmem_dir_inode_operations; static struct vm_operations_struct shmem_vm_ops; +static struct vm_operations_struct shmem_zero_vm_ops; static struct backing_dev_info shmem_backing_dev_info = { .ra_pages = 0, /* No readahead */ @@ -1095,6 +1096,33 @@ return page; } +struct page *shmem_zero_nopage(struct vm_area_struct *vma, unsigned long address, int *type) +{ + struct inode *inode = vma->vm_file->f_dentry->d_inode; + loff_t vm_size = (vma->vm_end - vma->vm_start) + + ((loff_t)vma->vm_pgoff << PAGE_SHIFT); + loff_t i_size = i_size_read(inode); + + if (unlikely(i_size < vm_size)) { + struct shmem_inode_info *info = SHMEM_I(inode); + if (shmem_acct_size(info->flags, vm_size - i_size)) + return NOPAGE_OOM; + /* + * If mremap has been used to extend the vma, + * now extend the underlying object to match it. + */ + if (vm_size <= SHMEM_MAX_BYTES) { + spin_lock(&info->lock); + i_size_write(inode, vm_size); + inode->i_ctime = inode->i_mtime = CURRENT_TIME; + spin_unlock(&info->lock); + } else + return NOPAGE_SIGBUS; + } + + return shmem_nopage(vma, address, type); +} + static int shmem_populate(struct vm_area_struct *vma, unsigned long addr, unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock) @@ -1970,6 +1998,15 @@ #endif }; +static struct vm_operations_struct shmem_zero_vm_ops = { + .nopage = shmem_zero_nopage, + .populate = shmem_populate, +#ifdef CONFIG_NUMA + .set_policy = shmem_set_policy, + .get_policy = shmem_get_policy, +#endif +}; + static struct super_block *shmem_get_sb(struct file_system_type *fs_type, int flags, const char *dev_name, void *data) { @@ -2092,7 +2129,8 @@ int shmem_zero_setup(struct vm_area_struct *vma) { struct file *file; - loff_t size = vma->vm_end - vma->vm_start; + loff_t size = (vma->vm_end - vma->vm_start) + + ((loff_t)vma->vm_pgoff << PAGE_SHIFT); file = shmem_file_setup("dev/zero", size, vma->vm_flags); if (IS_ERR(file)) @@ -2101,7 +2139,7 @@ if (vma->vm_file) fput(vma->vm_file); vma->vm_file = file; - vma->vm_ops = &shmem_vm_ops; + vma->vm_ops = &shmem_zero_vm_ops; return 0; } [-- Attachment #3: mremap.diff --] [-- Type: text/x-patch, Size: 357 bytes --] --- linux-2.6.6/mm/mremap.c.old 2004-06-14 19:48:36.000000000 +0400 +++ linux-2.6.6/mm/mremap.c 2004-06-19 11:54:28.508681472 +0400 @@ -320,7 +320,7 @@ if (old_len > vma->vm_end - addr) goto out; if (vma->vm_flags & VM_DONTEXPAND) { - if (new_len > old_len) + if (new_len > vma->vm_end - addr) goto out; } if (vma->vm_flags & VM_LOCKED) { [-- Attachment #4: Type: text/plain, Size: 82 bytes --] Scanned by evaluation version of Dr.Web antivirus Daemon http://drweb.ru/unix/ ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch][rfc] expandable anonymous shared mappings 2004-06-19 9:07 ` Stas Sergeev @ 2004-06-20 0:20 ` Hugh Dickins 2004-06-20 9:46 ` Stas Sergeev 0 siblings, 1 reply; 14+ messages in thread From: Hugh Dickins @ 2004-06-20 0:20 UTC (permalink / raw) To: Stas Sergeev; +Cc: Andrew Morton, Christoph Rohland, linux-kernel On Sat, 19 Jun 2004, Stas Sergeev wrote: > Hugh Dickins wrote: > > shared anonymous only becomes interesting when you fork. > I disagree with this. The way I am using it may look horrible, > but yes, I do use it without the fork(). Then I think you have no reason to use MAP_SHARED: use MAP_PRIVATE and you should get the behaviour you require, without kernel change. Shared anonymous is peculiar: although mapping is anonymous (nothing shared with unrelated mms), modifications are shared between parent and children. It's half-way between anonymous and file-backed. We agree that it might be nice to let the object used to support that be extended if mremap extends the mapping. But it might instead just be needless feature creep. Sorry, your case does not persuade me yet. Hugh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch][rfc] expandable anonymous shared mappings 2004-06-20 0:20 ` Hugh Dickins @ 2004-06-20 9:46 ` Stas Sergeev 2004-06-23 11:23 ` Christoph Rohland 2004-06-28 14:22 ` Hugh Dickins 0 siblings, 2 replies; 14+ messages in thread From: Stas Sergeev @ 2004-06-20 9:46 UTC (permalink / raw) To: Hugh Dickins; +Cc: Andrew Morton, Christoph Rohland, linux-kernel Hi. Hugh Dickins wrote: >> I disagree with this. The way I am using it may look horrible, >> but yes, I do use it without the fork(). > Then I think you have no reason to use MAP_SHARED: use MAP_PRIVATE > and you should get the behaviour you require, without kernel change. Hugh, I think you misunderstood me because once again I wrote something, said nothing - happens to me sometimes. The trick is that I am setting the old_len arg of mremap() to 0. This means that the new mapping is created while the old one is *not* being destroyed. So I get multiple virtual memory areas referencing the same shared memory region, lets call them "aliases". You propose to share the same backing-store across the multiple processes. Instead, I am sharing it across the multiple memory areas of a single process. I can't use MAP_PRIVATE for that, really. Then I want to expand my initial pool of shared mem while preserving the already created "aliases", and there seems to be no way of doing that: creating the larger pool and mremap'ing old one to the beginning of the new one, leaves the VMAs unmerged; creating the larger pool and mamcpy() the content of the old one to it, doesn't preserve the already created "aliases". The only thing I can do, is to expand the initial pool with mremap(), which doesn't work. So I have to resort to the more heavyweight things like shm_open(), while otherwise the expandable anonymous shared mapping can suit very well for my needs. > Shared anonymous is peculiar: although mapping is anonymous (nothing > shared with unrelated mms), modifications are shared between parent > and children. It's half-way between anonymous and file-backed. > We agree that it might be nice to let the object used to support that > be extended if mremap extends the mapping. But it might instead just > be needless feature creep. But then I beleive the entire idea of anonymous shared mapping is also a crap. But since it is already there, I would like to have it fully functional, so that I can avoid the things like shm_open() when possible. I just don't see the reason of keeping something partially implemented. > Sorry, your case does not persuade me yet. Well, I beleive perhaps you missed the fact that I was setting the old_len to 0 in mremap(), which doesn't work as I want to, when you use MAP_PRIVATE. You'll probably call it a horrible hack, but here's where that technique comes from: http://www.ussg.iu.edu/hypermail/linux/kernel/0401.1/1351.html And since it comes from here, I beleive it must be fully supported. ^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [patch][rfc] expandable anonymous shared mappings 2004-06-20 9:46 ` Stas Sergeev @ 2004-06-23 11:23 ` Christoph Rohland 2004-06-23 18:45 ` Stas Sergeev 2004-06-28 14:22 ` Hugh Dickins 1 sibling, 1 reply; 14+ messages in thread From: Christoph Rohland @ 2004-06-23 11:23 UTC (permalink / raw) To: 'Stas Sergeev', 'Hugh Dickins' Cc: 'Andrew Morton', linux-kernel Hi Stas, Hugh, > The trick is that I am setting the old_len arg > of mremap() to 0. This means that the new mapping > is created while the old one is *not* being > destroyed. So I get multiple virtual memory > areas referencing the same shared memory region, > lets call them "aliases". I would propose you use the posix shm API for what you want to do and leave anonymous shared memory as a special case of this like it is... Keep it simple and stupid! Just my 2cts Christoph ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch][rfc] expandable anonymous shared mappings 2004-06-23 11:23 ` Christoph Rohland @ 2004-06-23 18:45 ` Stas Sergeev 2004-06-28 14:25 ` Christoph Rohland 0 siblings, 1 reply; 14+ messages in thread From: Stas Sergeev @ 2004-06-23 18:45 UTC (permalink / raw) To: Christoph Rohland Cc: 'Hugh Dickins', 'Andrew Morton', linux-kernel Hi Christoph. Christoph Rohland wrote: >> The trick is that I am setting the old_len arg >> of mremap() to 0. This means that the new mapping >> is created while the old one is *not* being >> destroyed. So I get multiple virtual memory >> areas referencing the same shared memory region, >> lets call them "aliases". > I would propose you use the posix shm API for what you want to do and leave > anonymous shared memory as a special case of this like it is... > Keep it simple and stupid! Yes, I can use the posix shm API for that. But no, this will not be simple and stupid. Anon shared mapping - that's the thing that could really be simple and stupid, if you ask me. Posix shm API have some disadvantages in my eyes: 1. If you need many small SHM objects, in order to manage them vrt resizing, you'll have to track the addresses to the corresponding FDs, so that you can ftruncate(). Managing multiple FDs can be avoided by using the custom allocator on a single large pool. But in both ways I have to duplicate some functionality which is already in the kernel, but that I just can't use effectively. With the anon mapping I was hoping to avoid this duplication. The fact that some functionality cannot be used in an effective way, also doesn't appeal to me. 2. posix shm API (if we are talking about shm_open() and friends) is, AFAIK, implemented in librt. This, in turn, will get the libpthread.so to be linked to your program. I have problems with libpthreads. For ages it has the severe bugs, for example it was not possible to use sigaltstack() in the programs which are linked with libpthreads. And I am using the sigaltstack()... It is fixed in the very recent glibc, but upgrading glibc can be sometimes a bit problematic. But no, my prog won't die without the expandable anon shared mapping thing. I can resort to the other matters. I can (as I do right now) to just open a tmp file in /dev/shm and use the custom allocator. Yes, that works, but I wanted to keep it "simple and stupid". The anon shared mapping looked like the good candidate to try out. And then I thought it may be nice if the kernel to provide this functionality. If you suggest me to use the other mechanisms, you probably imply that doing such a things in the kernel is not a good idea. If this functionality is considered bad or useless by the kernel developers, then of course it shouldn't be integrated and I'll use other things. But really it looks like a bug to me, as both the file-backed shared, and the anon-private mappings are expandable. I also accept the Hugh's reservations and I see integrating this may be problematic. But since he also said he liked the idea itself, I'll wait for a while before dropping the use of the anon shared mapping in my prog. ^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [patch][rfc] expandable anonymous shared mappings 2004-06-23 18:45 ` Stas Sergeev @ 2004-06-28 14:25 ` Christoph Rohland 0 siblings, 0 replies; 14+ messages in thread From: Christoph Rohland @ 2004-06-28 14:25 UTC (permalink / raw) To: 'Stas Sergeev' Cc: 'Hugh Dickins', 'Andrew Morton', linux-kernel Hi Stas, > But no, my prog won't die without the expandable > anon shared mapping thing. I can resort to the other > matters. I can (as I do right now) to just open a > tmp file in /dev/shm and use the custom allocator. BTW that's what librt does: look into /dev/shm scheck if it is tmpfs and do an open... > Yes, that works, but I wanted to keep it "simple > and stupid". The anon shared mapping looked like > the good candidate to try out. Yes, I understand. For your use case anon shared mem with the small addition is perfect. > And then I thought it may be nice if the kernel > to provide this functionality. For me the question is: The functionality you propose is very specific for your problem. The kernel should provide general solutions. So is there a bigger demand for this? Greetings Christoph ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch][rfc] expandable anonymous shared mappings 2004-06-20 9:46 ` Stas Sergeev 2004-06-23 11:23 ` Christoph Rohland @ 2004-06-28 14:22 ` Hugh Dickins 2004-06-28 18:48 ` Stas Sergeev 1 sibling, 1 reply; 14+ messages in thread From: Hugh Dickins @ 2004-06-28 14:22 UTC (permalink / raw) To: Stas Sergeev; +Cc: Andrew Morton, Christoph Rohland, linux-kernel I hope my lightning responses don't catch you unawares ;) On Sun, 20 Jun 2004, Stas Sergeev wrote: > Hugh Dickins wrote: > >> I disagree with this. The way I am using it may look horrible, > >> but yes, I do use it without the fork(). > > Then I think you have no reason to use MAP_SHARED: use MAP_PRIVATE > > and you should get the behaviour you require, without kernel change. > Hugh, I think you misunderstood me because > once again I wrote something, said nothing - > happens to me sometimes. > The trick is that I am setting the old_len arg > of mremap() to 0. Yes, I did indeed overlook that, sorry. You're quite right that you need MAP_SHARED for this use, and my dismissal was wrong. Thanks for forcing me to understand at last how mremap old_len 0 behaves! >From time to time in the last week I've reconsidered your feature (faulting on mremap-expanded area of shared anonymous object expands the object itself instead of SIGBUS): I still _like_ it, but cannot shake off my unease with it. I like the few lines of code in shmem.c, and I don't mind that it's a bit of a hack that ought really to be done at mremap time - I'm happier with the fewness of lines as you have it. But I'm still uneasy how children (not in your usage) could expand the object without the parent knowing, all the object remaining in use until all mms have unmapped all portions. Plus I've realized we've no complementary interface to shrink the object (we cannot redefine the behaviour of mremap with new_len less than old_len). Both those could be dealt with, by adding a new MAP_flag and a new MREMAP_flag - but just for you? I'm wary of a feature without its complement; and I'm wary of adding rather odd neat new features which just _might_ cause difficulty down the line. mremap is itself a rather odd neat feature, and shared anonymous is another rather odd neat feature. You're only trying to get them to play better together, but extending a file (at fault time or within mremap) is something new, which could be troublesome to go on supporting if we change other things around it (odd cases often end up being a nuisance when it comes to lock ordering, for example). > and there seems to be no way of doing that: > creating the larger pool and mremap'ing old > one to the beginning of the new one, leaves the > VMAs unmerged; creating the larger pool and > mamcpy() the content of the old one to it, > doesn't preserve the already created "aliases". > The only thing I can do, is to expand the > initial pool with mremap(), which doesn't work. > So I have to resort to the more heavyweight > things like shm_open(), while otherwise the > expandable anonymous shared mapping can suit > very well for my needs. I don't see how shm helps you, that's fixed-size objects too, isn't it? Can't you use a tmpfs file? Create it, unlink it, mmap a page of it, ftruncate and mremap together whenever you need? That's then using standard interfaces without any kernel change - provided CONFIG_TMPFS=y. > > Shared anonymous is peculiar: although mapping is anonymous (nothing > > shared with unrelated mms), modifications are shared between parent > > and children. It's half-way between anonymous and file-backed. > > We agree that it might be nice to let the object used to support that > > be extended if mremap extends the mapping. But it might instead just > > be needless feature creep. > But then I beleive the entire idea of anonymous > shared mapping is also a crap. But since it is > already there, I would like to have it fully > functional, so that I can avoid the things like > shm_open() when possible. I just don't see the > reason of keeping something partially implemented. Shared anonymous is fully implemented; you have a natural and useful extension to its behaviour with mremap, I agree it seems reasonable, but I think we're safer not to make the change. > > Sorry, your case does not persuade me yet. > Well, I beleive perhaps you missed the fact > that I was setting the old_len to 0 in mremap(), > which doesn't work as I want to, when you use > MAP_PRIVATE. > You'll probably call it a horrible hack, but > here's where that technique comes from: > http://www.ussg.iu.edu/hypermail/linux/kernel/0401.1/1351.html > And since it comes from here, I beleive it > must be fully supported. Linus was arguing for back-compatibility, that we must continue to support old_len 0: he wasn't asking for new features here. I don't oppose your patch - except in the details, please stick with my version if you continue with it. Your second version of shmem_zero_nopage remained incorrect (accounting for a size change without holding the lock against another making a change at the same time); and in removing the double shmem_nopage you were optimizing for the very rare (newly allowed) case, at the expense of every normal use (which go through the size tests twice with yours). If you want to persuade Andrew or Linus directly with the patch, that's fine by me, but my own opinion is to let caution rule. Hugh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch][rfc] expandable anonymous shared mappings 2004-06-28 14:22 ` Hugh Dickins @ 2004-06-28 18:48 ` Stas Sergeev 2004-06-28 21:44 ` Hugh Dickins 0 siblings, 1 reply; 14+ messages in thread From: Stas Sergeev @ 2004-06-28 18:48 UTC (permalink / raw) To: Hugh Dickins; +Cc: Andrew Morton, Christoph Rohland, linux-kernel Hi Hugh. Hugh Dickins wrote: > I like the few lines of code in shmem.c, and I don't mind that it's > a bit of a hack that ought really to be done at mremap time - > I'm happier with the fewness of lines as you have it. But you also mentioned the per-vma mremap handler. Have you considered it an overkill after all? > Plus I've realized > we've no complementary interface to shrink the object (we cannot > redefine the behaviour of mremap with new_len less than old_len). What I was thinking about, is that this inode truncation mess should be resolved by the kernel completely internally, and the app should see only the anonymous mapping. From that point of view I think there is no such a problem. Program will shrink the mapping with the usual mremap(), inode will not be trunceted, but so what? The mapping will be shrunk, the virtual address space will be freed, so what is to worry about? I might be missing something. > Both those could be dealt with, by adding a new MAP_flag > and a new MREMAP_flag - but just for you? But I think the per-vma mremap() handler, as you suggested before, can solve everything, isn't it? And I beleive this will not be usefull only for me then, while the new flags may indeed be usefull for me only (which is not good). > You're only trying to get them to play better together, but > extending a file (at fault time or within mremap) is something > new, which could be troublesome to go on supporting if we change > other things around it I agree with this completely, but my point is that we are already deeply in that mess anyway. In particular, I see basically the same things in shmem_zero_setup()/ shmem_file_setup(). It is absolutely similar as far as I can tell, so I don't agree that I proposed really something new to it. The only difference I can see is that it is being invoked not by the fault handler... >> So I have to resort to the more heavyweight >> things like shm_open(), while otherwise the >> expandable anonymous shared mapping can suit >> very well for my needs. > I don't see how shm helps you, that's fixed-size objects too, isn't it? AFAIK - no, but actually I haven't tried, used a custom allocator on a single large object instead. AFAIK you can expand the posix shm objects with mremap() perfectly well, you just need to ftruncate() first. No? If no, I may reconsider the sanity of this my proposal entirely. I thought mremap() works to expand any kind of shared mapping, that's why I thought the inability to expand one particular type of shared mapping is a misbehavoir. How about this: http://www.ussg.iu.edu/hypermail/linux/kernel/9603.2/0692.html > Can't you use a tmpfs file? Create it, unlink it, mmap a page of it, > ftruncate and mremap together whenever you need? That's then using > standard interfaces without any kernel change - provided CONFIG_TMPFS=y. The problem is that I need multiple independantly shared allocations. I can either use multiple tmp files and track their mapping addresses to their descriptors by hands if I want to ftruncate(), or I have to use the custom memory allocator on a single pool - thats what I do now, but I wanted to make it simpler. And the shared anon mapping could make it simpler for me I beleive. But I am not insisting in case you think it is really not worth an efforts to do in the kernel. I just want to point out the potential advantages because I beleive if it is implemented, it can be usefull not only for me. >> shm_open() when possible. I just don't see the >> reason of keeping something partially implemented. > Shared anonymous is fully implemented; you have a natural > and useful extension to its behaviour with mremap I thought any kinds of mapping should be expandable with mremap(), esp. the shared ones. This impression comes to me from the above URL. From that point of view this is not an extension, but rather the unimplemented feature. If it is only an extension, then perhaps it is really not worth the troubles. > Linus was arguing for back-compatibility, that we must continue > to support old_len 0: he wasn't asking for new features here. I feel an inconsistency here. Consider you mmap'ed 10 pages and duplicated 1 page with mremap(,0,). Then you get this area of one page, which you actually *can* expand with mremap(). But as soon as you expanded it more than to original 10 pages - SIGBUS. And not immediately, but rather when you get around to access it, i.e. in a completely unpredictable place:( And what's worse, is that mremap() perfectly succeeds, so you don't even expect the failure. What's even worse, is that your /proc/self/maps will show that expanded region as being present and one will never expect the SIGBUS after all this. If at least /proc/self/maps to reveal the truth, or mremap() to fail - then yes, the program might be able to handle the failures. But everything suggests that the mapping was expanded, but you just give the SIGBUS in your face at one point. For some reasons this looks extremely insane and unreliable to me. My program is rather complex and it runs the "foreign" code sometimes (say, third-party plugins), so I have some difficulties to control everything by hands, when I can't get the reliable result from the kernel. > I don't oppose your patch - except in the details, please stick > with my version if you continue with it. Sure, no problems. > If you want to persuade Andrew or Linus directly with the patch, No, I won't of course, and besides, this may not be even remotely possible:) > that's fine by me, but my own opinion is to let caution rule. OK, I tried my best to make up my point. I was not trying to convince you that I need it, rather I was trying point out that without this patch, mremap() behaves not the way it is expected to behave, furthermore, it behaves unreliably and there is no way to find out whether it is really succeeded or not. If you still feel it is only a nice extension to the otherwise perfectly functional interface, then there might be no point to include it only for me. My main point was to avoid the mremap's unreliability. There are cases where nopage handler is missing. You get the zero-page mapped in, in these cases. This is bad, but at least you don't get a SIGBUS. Here we have a completely different thing - nopage handler is present, but doesn't really work, and you get the really unexpected and unpredictable results. I wanted to make them both expected and predictable. With this in mind, I would even feel safer if this functionality is not available at all, rather then letting it go the way it is now. Btw, this small patch about VM_DONTEXPAND in my previous mail, what do you think about that one? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch][rfc] expandable anonymous shared mappings 2004-06-28 18:48 ` Stas Sergeev @ 2004-06-28 21:44 ` Hugh Dickins 2004-06-29 17:06 ` Stas Sergeev 0 siblings, 1 reply; 14+ messages in thread From: Hugh Dickins @ 2004-06-28 21:44 UTC (permalink / raw) To: Stas Sergeev; +Cc: Andrew Morton, Christoph Rohland, linux-kernel On Mon, 28 Jun 2004, Stas Sergeev wrote: > Hugh Dickins wrote: > > I like the few lines of code in shmem.c, and I don't mind that it's > > a bit of a hack that ought really to be done at mremap time - > > I'm happier with the fewness of lines as you have it. > But you also mentioned the per-vma mremap handler. > Have you considered it an overkill after all? Yes, that's what I meant: on balance, in the absence of further uses for a new .mremap vm_operations method, I do consider it overkill, and prefer your little addition of shmem_zero_nopage. > > Plus I've realized > > we've no complementary interface to shrink the object (we cannot > > redefine the behaviour of mremap with new_len less than old_len). > What I was thinking about, is that this inode truncation > mess should be resolved by the kernel completely internally, > and the app should see only the anonymous mapping. From > that point of view I think there is no such a problem. > Program will shrink the mapping with the usual mremap(), > inode will not be trunceted, but so what? The mapping > will be shrunk, the virtual address space will be freed, > so what is to worry about? I might be missing something. The underlying "object" is not shrunk, /proc/meminfo Committed_AS is not shrunk, /proc/sys/vm/overcommit_memory 2 will still consider the memory in reserved, it'll still be taking up memory+swap - and must be, in case an mremap expand follows to make it accessible again. None of which is an issue within a single mm, it's all normal. And someone with a more reliable security instinct than I may be able to assure us that it's not an issue within a group of mms, forked from each other. But to me it does look like a new situation, that before the change the parent of all the forked anonymous shared mappings imposed the upper limit on their memory usage, but with the change any child could change that irreversibly (irreversibly until all have unmapped all or exited). Danger of memory going missing mysteriously (perhaps equivalent to existing danger, but I've not been convinced of that). > > Both those could be dealt with, by adding a new MAP_flag > > and a new MREMAP_flag - but just for you? > But I think the per-vma mremap() handler, as you > suggested before, can solve everything, isn't it? > And I beleive this will not be usefull only for me > then, while the new flags may indeed be usefull for > me only (which is not good). Either way, without someone else speaking up, it does seem just for you. > > You're only trying to get them to play better together, but > > extending a file (at fault time or within mremap) is something > > new, which could be troublesome to go on supporting if we change > > other things around it > I agree with this completely, but my point is that > we are already deeply in that mess anyway. In particular, > I see basically the same things in shmem_zero_setup()/ > shmem_file_setup(). It is absolutely similar as far > as I can tell, so I don't agree that I proposed really > something new to it. The only difference I can see is that > it is being invoked not by the fault handler... Shared anonymous is (and has to be) implemented quite differently from private anonymous. You rightly resent that the implementation is imposing a surprising limitation on the user-visible semantics (surprising compared with private; expected compared with shared file, but then we hit that there's no ftruncate to extend anon). I sympathise. But I've never heard that complained of before, you're exaggerating to say we're deeply in a mess here. I'd like to change it, but pathetic caution tells me to beware, the change might cause trouble down the line. > >> So I have to resort to the more heavyweight > >> things like shm_open(), while otherwise the > >> expandable anonymous shared mapping can suit > >> very well for my needs. > > I don't see how shm helps you, that's fixed-size objects too, isn't it? > AFAIK - no, but actually I haven't tried, used a > custom allocator on a single large object instead. > AFAIK you can expand the posix shm objects with > mremap() perfectly well, you just need to > ftruncate() first. No? If no, I may reconsider Sorry, I was talking SysV shm there. POSIX shm, as I understand it, simply works with the tmpfs mounted at /dev/shm: I never think of POSIX shm, just of tmpfs. Yes, you can use ftruncate on an fd created with shm_open. > the sanity of this my proposal entirely. I thought > mremap() works to expand any kind of shared mapping, > that's why I thought the inability to expand one > particular type of shared mapping is a misbehavoir. > How about this: > http://www.ussg.iu.edu/hypermail/linux/kernel/9603.2/0692.html Hey, (no offence!) are you a lawyer? You're very good at looking up past history to support your case. But I'm afraid this does not actually add support to your case (certainly doesn't subtract from it either) - it's about using mremap to track extensions to a file extended by other means. > > Can't you use a tmpfs file? Create it, unlink it, mmap a page of it, > > ftruncate and mremap together whenever you need? That's then using > > standard interfaces without any kernel change - provided CONFIG_TMPFS=y. > The problem is that I need multiple independantly > shared allocations. I can either use multiple tmp > files and track their mapping addresses to their > descriptors by hands if I want to ftruncate(), or > I have to use the custom memory allocator on a single > pool - thats what I do now, but I wanted to make it > simpler. And the shared anon mapping could make it > simpler for me I beleive. Yes, it would make it simpler, but not significantly simpler. You could ask instead for an mtruncate system call (similarly to save having to keep track of fds with addresses), but the need does not seem compelling. > But I am not insisting in case you think it is really > not worth an efforts to do in the kernel. I just > want to point out the potential advantages because > I beleive if it is implemented, it can be usefull > not only for me. Nobody else has spoken up yet. > >> shm_open() when possible. I just don't see the > >> reason of keeping something partially implemented. > > Shared anonymous is fully implemented; you have a natural > > and useful extension to its behaviour with mremap > I thought any kinds of mapping should be expandable > with mremap(), esp. the shared ones. This impression > comes to me from the above URL. From that point of > view this is not an extension, but rather the > unimplemented feature. If it is only an extension, > then perhaps it is really not worth the troubles. It's an extension, a natural extension in principle (seen from the private anon side), but in implementation less natural. > > Linus was arguing for back-compatibility, that we must continue > > to support old_len 0: he wasn't asking for new features here. > I feel an inconsistency here. Consider you mmap'ed > 10 pages and duplicated 1 page with mremap(,0,). [snip surprise at SIGBUS beyond end of shared object] We're just repeating our acknowledged positions. > Btw, this small patch about VM_DONTEXPAND in my > previous mail, what do you think about that one? This one? I overlooked it before: it seems to be a slight redefinition of VM_DONTEXPAND, and I don't see what bug it fixes. --- linux-2.6.6/mm/mremap.c.old 2004-06-14 19:48:36.000000000 +0400 +++ linux-2.6.6/mm/mremap.c 2004-06-19 11:54:28.508681472 +0400 @@ -320,7 +320,7 @@ if (old_len > vma->vm_end - addr) goto out; if (vma->vm_flags & VM_DONTEXPAND) { - if (new_len > old_len) + if (new_len > vma->vm_end - addr) goto out; } if (vma->vm_flags & VM_LOCKED) { Hugh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch][rfc] expandable anonymous shared mappings 2004-06-28 21:44 ` Hugh Dickins @ 2004-06-29 17:06 ` Stas Sergeev 2004-06-29 17:41 ` Hugh Dickins 0 siblings, 1 reply; 14+ messages in thread From: Stas Sergeev @ 2004-06-29 17:06 UTC (permalink / raw) To: Hugh Dickins; +Cc: Andrew Morton, Christoph Rohland, linux-kernel Hi Hugh. Hugh Dickins wrote: > But I've never heard that complained of before, > you're exaggerating to say we're deeply in a mess here. I'd What I wanted to say is only that shmem_file_setup() does exactly the same thing - creates and truncates the file itself. But this is no longer important as I finally realized your points. > Sorry, I was talking SysV shm there. Excellent point btw! I verified that, mremap() on the SysV shm gives me the same nice SIGBUS. I feel much better right now:) I was wrong assuming that expanding the shared mapping with mremap() should be valid in any case, SysV shm is an obvious example, and it resembles the anon-shm the way they both do not have an FD accessible. So yes, I finally realized that my proposal is nothing more than an extension to the otherwise functional interface, and should not be traded as a bugfix. And yes, the inability to shrink it back is also not friendly. As for not allowing children to expand, I'd vote for MAP_DONTEXPAND flag, but this already have nothing to do with my original proposal, so no need to worry about it. Thanks for you time and explanations, after all they worked out right:) >> http://www.ussg.iu.edu/hypermail/linux/kernel/9603.2/0692.html > up past history to support your case. But I'm afraid this does > not actually add support to your case (certainly doesn't subtract > from it either) - it's about using mremap to track extensions > to a file extended by other means. Well, what I meant pointing to this URL, was this: --- In most cases (at least for the malloc case) this will be a anonymous mapping, but it's by no means an error to extend any mapping you have created. --- Either I read it the completely wrong way, or it is no longer valid, but in the light of the above, it seems like indeed this doesn't add the support to my case. > Yes, it would make it simpler, but not significantly simpler. You are right. > You could ask instead for an mtruncate system call (similarly Good idea, why not? :) > [snip surprise at SIGBUS beyond end of shared object] But you snipped the most important part:) Now I understand, however, the problems I had with mremap(), have nothing special to do with the anon-shm, it just seems to be the usual thing with that syscall. It is full of surprises, it will map a zero-page to you, it will give you a SIGBUS, it will do everything but not what you really want from it:) I always wanted to have the more reliable mremap(). Not the one that can expand everything in the world, but the one that returns the value you can rely upon, as all the other syscalls do. But the way I wanted to achieve it (by implementing the "proper" nopage handler for anon-shm) is definitely not the right one. That's why I was very happy when you proposed the .mremap handlers per vma, but since this have nothing to do with the current subject, I should probably go complaining about mremap() elsewhere. Thank you and Christoph for the very usefull discussion. It was usefull maybe again only for me, but I hope you weren't bored with it too much either. Thanks! ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch][rfc] expandable anonymous shared mappings 2004-06-29 17:06 ` Stas Sergeev @ 2004-06-29 17:41 ` Hugh Dickins 2004-06-29 19:03 ` Stas Sergeev 0 siblings, 1 reply; 14+ messages in thread From: Hugh Dickins @ 2004-06-29 17:41 UTC (permalink / raw) To: Stas Sergeev; +Cc: Andrew Morton, Christoph Rohland, linux-kernel On Tue, 29 Jun 2004, Stas Sergeev wrote: > Now I understand, however, the problems I had with > mremap(), have nothing special to do with the anon-shm, > it just seems to be the usual thing with that syscall. > It is full of surprises, it will map a zero-page to you, > it will give you a SIGBUS, it will do everything > but not what you really want from it:) I always > wanted to have the more reliable mremap(). Not the > one that can expand everything in the world, but > the one that returns the value you can rely upon, > as all the other syscalls do. >From this attack on poor little mremap(), I think perhaps there's something else you didn't realize, that I'd assumed you did realize. If you have a file of size, say, 2 pages; and you mmap 3 pages of it (from offset 0 for simplicity); then you try to access the third page of your mapping... you get SIGBUS. That's so whether it's a shared or a private mapping. It's even so if it's a private mapping, you write your own data into the second page, you truncate the file to 1 page, you try to access your own data in the second page again. That's how mmap is specified to behave, not just on Linux. So mremap() is entirely consistent to allow you to extend a mapping beyond the end of the object, such that you'll get SIGBUS if you try to access the end of your mapping. The deficiency with shared anonymous is that there's no way to expand or shrink the underlying object to match the mapping, whereas you can ftruncate a real file. Hugh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch][rfc] expandable anonymous shared mappings 2004-06-29 17:41 ` Hugh Dickins @ 2004-06-29 19:03 ` Stas Sergeev 0 siblings, 0 replies; 14+ messages in thread From: Stas Sergeev @ 2004-06-29 19:03 UTC (permalink / raw) To: Hugh Dickins; +Cc: Andrew Morton, Christoph Rohland, linux-kernel Hi. Hugh Dickins wrote: >> It is full of surprises, it will map a zero-page to you, >> it will give you a SIGBUS, it will do everything > From this attack on poor little mremap(), I think perhaps there's > something else you didn't realize, that I'd assumed you did realize. OK, the last time I was messing around the mremap, it was mapping the anonymous pages to me when I was trying to expand the /dev/mem mapping (missing nopage handler apparently, having the failure could be much better in that case I think). > So mremap() is entirely consistent to allow you to extend a mapping > beyond the end of the object, such that you'll get SIGBUS if you > try to access the end of your mapping. Yes, as for SIGBUS - now I see why it works that way (not necessary accept its correctness, but what you said, sounds perfectly valid to me, so the complete retirement on my side:) > The deficiency with shared anonymous is that there's no way to expand > or shrink the underlying object to match the mapping, whereas you can > ftruncate a real file. ... and/or expand the anonymous private mapping without the truncation (nothing to truncate). So yes, after all, it was specific to the anon-shm, which is a special case. And introducing the per-vma mremap handlers to fix only this special case (by doing the truncation immediately) is an overkill and is not required by anyone except myself. And this: --- In most cases (at least for the malloc case) this will be a anonymous mapping, but it's by no means an error to extend any mapping you have created. --- may just mean that it is never an error, but you have to deal with the consequences yourself if you expanded only the mapping and not the object (except for the case of anon-shm, where you can't deal with the consequences anyhow since you can't expand the object). Have I got everything properly this time, or failed the homework agan? :) ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2004-06-29 19:03 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-06-16 18:12 [patch][rfc] expandable anonymous shared mappings Stas Sergeev 2004-06-18 20:46 ` Hugh Dickins 2004-06-19 9:07 ` Stas Sergeev 2004-06-20 0:20 ` Hugh Dickins 2004-06-20 9:46 ` Stas Sergeev 2004-06-23 11:23 ` Christoph Rohland 2004-06-23 18:45 ` Stas Sergeev 2004-06-28 14:25 ` Christoph Rohland 2004-06-28 14:22 ` Hugh Dickins 2004-06-28 18:48 ` Stas Sergeev 2004-06-28 21:44 ` Hugh Dickins 2004-06-29 17:06 ` Stas Sergeev 2004-06-29 17:41 ` Hugh Dickins 2004-06-29 19:03 ` Stas Sergeev
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox