All of lore.kernel.org
 help / color / mirror / Atom feed
From: William Tambe <tambewilliam@gmail.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: Stas Sergeev <stsp@aknet.ru>,
	"Rohland, Hans-Christoph" <hans-christoph.rohland@sap.com>,
	linux-kernel@vger.kernel.org
Subject: Re: Concerning a post that you made about expandable anonymous shared mappings
Date: Mon, 09 Jul 2007 20:50:23 -0500	[thread overview]
Message-ID: <4692E5DF.7080304@gmail.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0707031606370.16965@blonde.wat.veritas.com>


Hugh Dickins wrote:
> On Mon, 2 Jul 2007, Stas Sergeev wrote:
>> Hugh Dickins wrote:
>>> You've answered your own question: we did not make the change Stas
>>> suggested, IIRC because I remained a little uneasy with that change
>>> in behaviour, and nobody else spoke up for it.
>> IIRC your argument, that made sense to me,
>> was that with such an approach, you can only
>> expand the backing-store, but never shrink.
>> I agree this is a problem from some point of
>> view. I have no idea whether it is important
>> or not, but it definitely _looks_ not very good.
> 
> You were gracious enough to accept my arguments back then, but after
> mulling this over overnight, I've come to think I was just too timid
> back then, and gave too much weight to the issue of there being no
> shrink, and to the issue that child might expand the object without
> parent knowing (that surplus remaining allocated until both exited).
> 
> I've come right around to your original view, Stas, and William's:
> if that mmap creates such an object, then the expanding mremap really
> ought to be useful, and allow the underlying object to be expanded.
> The shared anonymous object is already anomalous: expanding it on
> fault makes it more consistent with its own nature, not less.
> 
>>>>         //why does this failed. I am well in the interval [4096, 8192]
>>>>         *(unsigned int *)(ptr + 4096 + 8)= 10;
>>>> }
>> Well, this generates a bus error, while, for
>> example, doing the same trick with having a
>> /dev/mem as a backing-store, simply maps the
>> "enlarged" space with the anonymous memory,
>> and so does not generate a SIGBUS (not producing
>> a desired effect either, of course).
>> Why do we have it both ways? Shouldn't they
>> (i.e. /dev/zero and /dev/mem mapping) behave
>> the same after expanding with mremap?
> 
> They've very different: mapping /dev/mem maps an existing object;
> whereas mapping /dev/zero is a convention by which a new object
> is created - and we can't afford the memory to make it of infinite
> extent, so have limited it to the mapped size.
> 
>>> I haven't given it any thought since then:
>> I was thinking about it a bit, and I imagined
>> we need something like
>>    int mopen(void *addr, size_t len, unsigned int flags);
>> which will give you an fd for the memory area,
>> which you can then ftruncate() and mmap() (and
>> mremap).
> 
> Ah, if we added an mopen(), there's no end to the discussions
> we could have about what it can do.  It may be a great idea:
> but it's really not needed to solve this particular little
> problem.  As last time around, you were suggesting .mremap
> callouts; but I much prefer your original shmem_zero_nopage,
> which is a solution of the scale appropriate to the problem.
> 
>> Such a thing could solve that mremap() problem
>> that me and William Tambe have.
> 
> If you're thinking of going that way, for this problem it
> would be better just to stick with POSIX shm_open, as Christoph
> suggested before, and you suggest to William in other mail.
> 
> But I now accept your view, that the shared anonymous object
> is not behaving consistently in response to mremap, and would
> like to put in a patch for that.  This isn't a particularly good
> time for such a patch - it's unclear right now whether 2.6.23 will
> have shmem_nopage like 2.6.22 or shmem_fault like 2.6.22-rc-mm;
> but we can easily adjust to whichever.
> 
> Here's a patch against 2.6.22-rc7: would you, Stas, put your
> Signed-off-by into this, and accept authorship - although I'm
> sending this back to you, it's very much your idea, and only
> trivially modified from your three-year-old patch by me.  If
> you're agreeable, I can then forward it or its shmem_zero_fault
> equivalent to Andrew when we see which way 2.6.23 is going.
> 
> [I'll fill in the patch comment later]
> 
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
> ----
> 
>  mm/shmem.c |   41 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 40 insertions(+), 1 deletion(-)
> 
> --- 2.6.22-rc7/mm/shmem.c	2007-06-17 10:51:02.000000000 +0100
> +++ linux/mm/shmem.c	2007-07-03 15:35:32.000000000 +0100
> @@ -1320,6 +1320,36 @@ static struct page *shmem_nopage(struct 
>  	return page;
>  }
>  
> +struct page *shmem_zero_nopage(struct vm_area_struct *vma,
> +				unsigned long address, int *type)
> +{
> +	struct page *page;
> +
> +	page = shmem_nopage(vma, address, type);
> +	if (unlikely(page == NOPAGE_SIGBUS)) {
> +		struct inode *inode = vma->vm_file->f_dentry->d_inode;
> +		struct shmem_inode_info *info = SHMEM_I(inode);
> +		loff_t vm_size, i_size;
> +
> +		/*
> +		 * If mremap has been used to extend the vma,
> +		 * now extend the underlying object to include this page.
> +		 */
> +		vm_size = (PAGE_ALIGN(address) - vma->vm_start) +
> +				((loff_t)vma->vm_pgoff << PAGE_SHIFT);
> +		spin_lock(&info->lock);
> +		i_size = i_size_read(inode);
> +		if (i_size < vm_size && vm_size <= SHMEM_MAX_BYTES &&
> +		    !shmem_acct_size(info->flags, vm_size - i_size)) {
> +			i_size_write(inode, vm_size);
> +			inode->i_ctime = inode->i_mtime = CURRENT_TIME;
> +		}
> +		spin_unlock(&info->lock);
> +		page = shmem_nopage(vma, address, type);
> +	}
> +	return page;
> +}
> +
>  static int shmem_populate(struct vm_area_struct *vma,
>  	unsigned long addr, unsigned long len,
>  	pgprot_t prot, unsigned long pgoff, int nonblock)
> @@ -2471,6 +2501,14 @@ static struct vm_operations_struct shmem
>  #endif
>  };
>  
> +static struct vm_operations_struct shmem_zero_vm_ops = {
> +	.nopage		= shmem_zero_nopage,
> +	.populate	= shmem_populate,
> +#ifdef CONFIG_NUMA
> +	.set_policy     = shmem_set_policy,
> +	.get_policy     = shmem_get_policy,
> +#endif
> +};
>  
>  static int shmem_get_sb(struct file_system_type *fs_type,
>  	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
> @@ -2599,6 +2637,7 @@ int shmem_zero_setup(struct vm_area_stru
>  	if (vma->vm_file)
>  		fput(vma->vm_file);
>  	vma->vm_file = file;
> -	vma->vm_ops = &shmem_vm_ops;
> +	vma->vm_ops = &shmem_zero_vm_ops;
> +	vma->vm_pgoff = 0;
>  	return 0;
>  }
> 

Will this patch be added to stable versions of the linux kernel?
Please let me know.

Sincerely,
William Tambe

  parent reply	other threads:[~2007-07-10  1:53 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-06-29 19:52 Concerning a post that you made about expandable anonymous shared mappings William Tambe
2007-07-02 15:33 ` Hugh Dickins
2007-07-02 17:35   ` William Tambe
2007-07-02 18:21     ` Stas Sergeev
2007-07-02 18:10   ` Stas Sergeev
2007-07-03 15:48     ` Hugh Dickins
2007-07-03 18:29       ` Stas Sergeev
2007-07-10  1:50       ` William Tambe [this message]
2007-07-10 20:10         ` Hugh Dickins
2007-07-10 20:55           ` William Tambe
2007-07-11  4:12             ` Stas Sergeev
2007-07-12  6:35               ` William Tambe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4692E5DF.7080304@gmail.com \
    --to=tambewilliam@gmail.com \
    --cc=hans-christoph.rohland@sap.com \
    --cc=hugh@veritas.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stsp@aknet.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.