From: Badari Pulavarty <pbadari@us.ibm.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: Jeff Dike <jdike@addtoit.com>, linux-mm <linux-mm@kvack.org>,
dvhltc@us.ibm.com
Subject: Re: [RFC][PATCH] OVERCOMMIT_ALWAYS extension
Date: Mon, 24 Oct 2005 13:24:08 -0700 [thread overview]
Message-ID: <1130185448.6831.20.camel@localhost.localdomain> (raw)
In-Reply-To: <Pine.LNX.4.61.0510242027001.6509@goblin.wat.veritas.com>
On Mon, 2005-10-24 at 21:04 +0100, Hugh Dickins wrote:
> On Thu, 20 Oct 2005, Badari Pulavarty wrote:
> >
> > Changes from previous:
> >
> > 1) madvise(DISCARD) - zaps the range and discards the pages. So, no
> > need to call madvise(DONTNEED) before.
> >
> > 2) I added truncate_inode_pages2_range() to just discard only the
> > range of pages - not the whole file.
> >
> > Hugh, when you get a chance could you review this instead ?
>
> I haven't had time to go through it thoroughly, and will have no time
> the next couple of days, but here are some remarks.
>
> --- linux-2.6.14-rc3/include/asm-alpha/mman.h 2005-09-30 14:17:35.000000000 -0700
> +++ linux-2.6.14-rc3.db2/include/asm-alpha/mman.h 2005-10-20 10:52:37.000000000 -0700
> @@ -42,6 +42,7 @@
> #define MADV_WILLNEED 3 /* will need these pages */
> #define MADV_SPACEAVAIL 5 /* ensure resources are available */
> #define MADV_DONTNEED 6 /* don't need these pages */
> +#define MADV_DISCARD 7 /* discard pages right now */
>
> Throughout the patch there's lots of spaces where there should be tabs.
> But I'm glad you've put a space after the "#define" here, unlike in that
> MADV_SPACEAVAIL higher up! Not so glad at your spaces to the right of it.
Sorry about that. I was working with Darren's old patch and didn't
bother cleaning up the white spaces.
>
> Are we free to define MADV_DISCARD, coming after the others, in each of
> the architectures? In general, I think mman.h reflects definitions made
> by native Operating Systems of the architectures in question, and they
> might have added a few since.
I looked at all architectures. No matter what their header file says,
none of them actually implemented anything other than the standard ones
(documented in the manpages).
> --- linux-2.6.14-rc3/include/linux/mm.h 2005-09-30 14:17:35.000000000 -0700
> +++ linux-2.6.14-rc3.db2/include/linux/mm.h 2005-10-20 13:41:57.000000000 -0700
> @@ -865,6 +865,7 @@ extern unsigned long do_brk(unsigned lon
> /* filemap.c */
> extern unsigned long page_unuse(struct page *);
> extern void truncate_inode_pages(struct address_space *, loff_t);
> +extern void truncate_inode_pages2_range(struct address_space *, loff_t, loff_t);
>
> Personally, I have an aversion to sticking a "2" in there. I know you're
> just following the convention established by invalidate_inode_pages2, but..
>
> Hold on, -mm already contains reiser4-truncate_inode_pages_range.patch,
> you should be working with that. Doesn't it do just what you need,
> even without a "2" :-?
Yes. Thats exactly what I did also. One less thing to worry for me :)
>
> --- linux-2.6.14-rc3/mm/madvise.c 2005-09-30 14:17:35.000000000 -0700
> +++ linux-2.6.14-rc3.db2/mm/madvise.c 2005-10-20 13:37:41.000000000 -0700
> @@ -137,6 +137,40 @@ static long madvise_dontneed(struct vm_a
> return 0;
> }
>
> +static long madvise_discard(struct vm_area_struct * vma,
> + struct vm_area_struct ** prev,
> + unsigned long start, unsigned long end)
> +{
> ....
> + error = madvise_dontneed(vma, prev, start, end);
> + if (error)
> + return error;
> +
> + /* looks good, try and rip it out of page cache */
> + printk("%s: trying to rip shm vma (%p) inode from page cache\n", __FUNCTION__, vma);
> + offset = (loff_t)(start - vma->vm_start);
> + endoff = (loff_t)(end - vma->vm_start);
> + printk("call truncate_inode_pages(%p, %x %x)\n", mapping,
> + (unsigned int)offset, (unsigned int)endoff);
> + down(&mapping->host->i_sem);
> + truncate_inode_pages2_range(mapping, offset, endoff);
> + up(&mapping->host->i_sem);
> + return 0;
> +}
>
> Hmm. I don't think it's consistent to zap the pages from a single mm,
> then remove them from the page cache, while leaving the pages mapped into
> other mms. Just what would those pages then be? they're not file pages,
> they're not anonymous pages, such pages have given trouble in the past.
>
> I think you'll need to follow vmtruncate much more closely - and the
> unmap_mapping_range code already allows for a range, shouldn't need
> much change - going through all the vmas before truncating the range.
>
> Which makes it feel more like sys_fpunch than an madvise.
>
> You of course need write access to the underlying file, is that checked?
>
> What should it be doing to anonymous COWed pages? Not clear whether
> it should be following truncate in discarding those too, or not.
You are right. What we have here is a kludge - pointed out by Andrea
also (in a private e-mail). He recommended that I should look at
doing "real" MADV_TRUNCATE and add filesystem hooks to make it
sure its not limited to only "shmfs".
I am re-doing it again. I am scared to touch this part of VM code,
thats why I was trying to get away with smallest possible thing.
I guess its time to sit and do it for real.
Thanks,
Badari
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
prev parent reply other threads:[~2005-10-24 20:24 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-10-17 17:30 [RFC] OVERCOMMIT_ALWAYS extension Badari Pulavarty
2005-10-17 18:13 ` Hugh Dickins
2005-10-17 18:25 ` Hugh Dickins
2005-10-17 23:14 ` Badari Pulavarty
2005-10-18 16:05 ` [RFC][PATCH] " Badari Pulavarty
2005-10-19 17:56 ` Hugh Dickins
2005-10-19 18:32 ` Jeff Dike
2005-10-19 21:21 ` Badari Pulavarty
2005-10-19 22:38 ` Jeff Dike
2005-10-19 18:50 ` Badari Pulavarty
2005-10-19 19:12 ` Darren Hart
2005-10-19 20:10 ` Hugh Dickins
2005-10-19 20:47 ` Jeff Dike
2005-10-20 15:11 ` Badari Pulavarty
2005-10-20 17:27 ` Jeff Dike
2005-10-20 22:37 ` Badari Pulavarty
2005-10-24 20:04 ` Hugh Dickins
2005-10-24 20:22 ` Darren Hart
2005-10-24 20:24 ` Badari Pulavarty [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1130185448.6831.20.camel@localhost.localdomain \
--to=pbadari@us.ibm.com \
--cc=dvhltc@us.ibm.com \
--cc=hugh@veritas.com \
--cc=jdike@addtoit.com \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.