public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: William Tambe <tambewilliam@gmail.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: Stas Sergeev <stsp@aknet.ru>,
	"Rohland, Hans-Christoph" <hans-christoph.rohland@sap.com>,
	linux-kernel@vger.kernel.org
Subject: Re: Concerning a post that you made about expandable anonymous shared mappings
Date: Mon, 09 Jul 2007 20:50:23 -0500	[thread overview]
Message-ID: <4692E5DF.7080304@gmail.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0707031606370.16965@blonde.wat.veritas.com>


Hugh Dickins wrote:
> On Mon, 2 Jul 2007, Stas Sergeev wrote:
>> Hugh Dickins wrote:
>>> You've answered your own question: we did not make the change Stas
>>> suggested, IIRC because I remained a little uneasy with that change
>>> in behaviour, and nobody else spoke up for it.
>> IIRC your argument, that made sense to me,
>> was that with such an approach, you can only
>> expand the backing-store, but never shrink.
>> I agree this is a problem from some point of
>> view. I have no idea whether it is important
>> or not, but it definitely _looks_ not very good.
> 
> You were gracious enough to accept my arguments back then, but after
> mulling this over overnight, I've come to think I was just too timid
> back then, and gave too much weight to the issue of there being no
> shrink, and to the issue that child might expand the object without
> parent knowing (that surplus remaining allocated until both exited).
> 
> I've come right around to your original view, Stas, and William's:
> if that mmap creates such an object, then the expanding mremap really
> ought to be useful, and allow the underlying object to be expanded.
> The shared anonymous object is already anomalous: expanding it on
> fault makes it more consistent with its own nature, not less.
> 
>>>>         //why does this failed. I am well in the interval [4096, 8192]
>>>>         *(unsigned int *)(ptr + 4096 + 8)= 10;
>>>> }
>> Well, this generates a bus error, while, for
>> example, doing the same trick with having a
>> /dev/mem as a backing-store, simply maps the
>> "enlarged" space with the anonymous memory,
>> and so does not generate a SIGBUS (not producing
>> a desired effect either, of course).
>> Why do we have it both ways? Shouldn't they
>> (i.e. /dev/zero and /dev/mem mapping) behave
>> the same after expanding with mremap?
> 
> They've very different: mapping /dev/mem maps an existing object;
> whereas mapping /dev/zero is a convention by which a new object
> is created - and we can't afford the memory to make it of infinite
> extent, so have limited it to the mapped size.
> 
>>> I haven't given it any thought since then:
>> I was thinking about it a bit, and I imagined
>> we need something like
>>    int mopen(void *addr, size_t len, unsigned int flags);
>> which will give you an fd for the memory area,
>> which you can then ftruncate() and mmap() (and
>> mremap).
> 
> Ah, if we added an mopen(), there's no end to the discussions
> we could have about what it can do.  It may be a great idea:
> but it's really not needed to solve this particular little
> problem.  As last time around, you were suggesting .mremap
> callouts; but I much prefer your original shmem_zero_nopage,
> which is a solution of the scale appropriate to the problem.
> 
>> Such a thing could solve that mremap() problem
>> that me and William Tambe have.
> 
> If you're thinking of going that way, for this problem it
> would be better just to stick with POSIX shm_open, as Christoph
> suggested before, and you suggest to William in other mail.
> 
> But I now accept your view, that the shared anonymous object
> is not behaving consistently in response to mremap, and would
> like to put in a patch for that.  This isn't a particularly good
> time for such a patch - it's unclear right now whether 2.6.23 will
> have shmem_nopage like 2.6.22 or shmem_fault like 2.6.22-rc-mm;
> but we can easily adjust to whichever.
> 
> Here's a patch against 2.6.22-rc7: would you, Stas, put your
> Signed-off-by into this, and accept authorship - although I'm
> sending this back to you, it's very much your idea, and only
> trivially modified from your three-year-old patch by me.  If
> you're agreeable, I can then forward it or its shmem_zero_fault
> equivalent to Andrew when we see which way 2.6.23 is going.
> 
> [I'll fill in the patch comment later]
> 
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
> ----
> 
>  mm/shmem.c |   41 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 40 insertions(+), 1 deletion(-)
> 
> --- 2.6.22-rc7/mm/shmem.c	2007-06-17 10:51:02.000000000 +0100
> +++ linux/mm/shmem.c	2007-07-03 15:35:32.000000000 +0100
> @@ -1320,6 +1320,36 @@ static struct page *shmem_nopage(struct 
>  	return page;
>  }
>  
> +struct page *shmem_zero_nopage(struct vm_area_struct *vma,
> +				unsigned long address, int *type)
> +{
> +	struct page *page;
> +
> +	page = shmem_nopage(vma, address, type);
> +	if (unlikely(page == NOPAGE_SIGBUS)) {
> +		struct inode *inode = vma->vm_file->f_dentry->d_inode;
> +		struct shmem_inode_info *info = SHMEM_I(inode);
> +		loff_t vm_size, i_size;
> +
> +		/*
> +		 * If mremap has been used to extend the vma,
> +		 * now extend the underlying object to include this page.
> +		 */
> +		vm_size = (PAGE_ALIGN(address) - vma->vm_start) +
> +				((loff_t)vma->vm_pgoff << PAGE_SHIFT);
> +		spin_lock(&info->lock);
> +		i_size = i_size_read(inode);
> +		if (i_size < vm_size && vm_size <= SHMEM_MAX_BYTES &&
> +		    !shmem_acct_size(info->flags, vm_size - i_size)) {
> +			i_size_write(inode, vm_size);
> +			inode->i_ctime = inode->i_mtime = CURRENT_TIME;
> +		}
> +		spin_unlock(&info->lock);
> +		page = shmem_nopage(vma, address, type);
> +	}
> +	return page;
> +}
> +
>  static int shmem_populate(struct vm_area_struct *vma,
>  	unsigned long addr, unsigned long len,
>  	pgprot_t prot, unsigned long pgoff, int nonblock)
> @@ -2471,6 +2501,14 @@ static struct vm_operations_struct shmem
>  #endif
>  };
>  
> +static struct vm_operations_struct shmem_zero_vm_ops = {
> +	.nopage		= shmem_zero_nopage,
> +	.populate	= shmem_populate,
> +#ifdef CONFIG_NUMA
> +	.set_policy     = shmem_set_policy,
> +	.get_policy     = shmem_get_policy,
> +#endif
> +};
>  
>  static int shmem_get_sb(struct file_system_type *fs_type,
>  	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
> @@ -2599,6 +2637,7 @@ int shmem_zero_setup(struct vm_area_stru
>  	if (vma->vm_file)
>  		fput(vma->vm_file);
>  	vma->vm_file = file;
> -	vma->vm_ops = &shmem_vm_ops;
> +	vma->vm_ops = &shmem_zero_vm_ops;
> +	vma->vm_pgoff = 0;
>  	return 0;
>  }
> 

Will this patch be added to stable versions of the linux kernel?
Please let me know.

Sincerely,
William Tambe

  parent reply	other threads:[~2007-07-10  1:53 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-06-29 19:52 Concerning a post that you made about expandable anonymous shared mappings William Tambe
2007-07-02 15:33 ` Hugh Dickins
2007-07-02 17:35   ` William Tambe
2007-07-02 18:21     ` Stas Sergeev
2007-07-02 18:10   ` Stas Sergeev
2007-07-03 15:48     ` Hugh Dickins
2007-07-03 18:29       ` Stas Sergeev
2007-07-10  1:50       ` William Tambe [this message]
2007-07-10 20:10         ` Hugh Dickins
2007-07-10 20:55           ` William Tambe
2007-07-11  4:12             ` Stas Sergeev
2007-07-12  6:35               ` William Tambe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4692E5DF.7080304@gmail.com \
    --to=tambewilliam@gmail.com \
    --cc=hans-christoph.rohland@sap.com \
    --cc=hugh@veritas.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stsp@aknet.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox