linux-mm.kvack.org archive mirror
From: Andrew Morton <akpm@osdl.org>
To: Andrea Arcangeli <andrea@suse.de>
Cc: pbadari@us.ibm.com, ak@suse.de, hugh@veritas.com,
	jdike@addtoit.com, dvhltc@us.ibm.com, linux-mm@kvack.org
Subject: Re: [RFC] madvise(MADV_TRUNCATE)
Date: Thu, 27 Oct 2005 15:23:40 -0700
Message-ID: <20051027152340.5e3ae2c6.akpm@osdl.org>
In-Reply-To: <20051027213721.GX5091@opteron.random>

Andrea Arcangeli <andrea@suse.de> wrote:
>
> On Thu, Oct 27, 2005 at 01:50:58PM -0700, Andrew Morton wrote:
> > This is what I'm asking about.  What's the requirement?  What's the
> > application?  What's the workload?  What's the testcase?  All that old
> > stuff.  This should have been the very, very first thing which Badari
> > presented to us.
> 
> I mentioned the reason we need that feature at the end of the last email.

It's slowly becoming clearer ;)

> > If we do it this way then we should do it for other filesystems.  And then
> 
> Why do you think so? Even O_DIRECT and the acl were not supported by all
> the fs immediately, what's wrong with that? This is normal procedure as
> far as I can tell. If -ENOSYS is returned, it means the app should
> fall back to some other way to do the truncate by hand (depending on the
> app, bzero could work or some other app can be ok with doing nothing at
> all if -ENOSYS is returned).

But in the case of O_DIRECT and acls we had a plan, from day one, to extend
the capability to many (ideally all) filesystems.

We have no such plan for holepunching!

Maybe we _should_ have such a plan, but we've never discussed it.

If we _do_ have such a plan (or might in the future) then what would the
API look like?  I think sys_holepunch(fd, start, len), so we should start
out with that.

If we don't have such a plan, and we don't think that we ever will have
such a plan, then what should the API look like?

Using madvise is very weird, because people will ask "why do I need to mmap
my file before I can stick a hole in it?"

None of the other madvise operations call into the filesystem in this manner.

A broad question is: is this capability an MM operation or a filesystem
operation?  truncate, for example, is a filesystem operation which
sometimes has MM side-effects.  madvise is an mm operation and with this
patch, it gains FS side-effects, only they're really, really significant
ones.

So I'm struggling to work out where all this is headed, and how we should
think about it all.

> > we should do it for files which _aren't_ mmapped.  And then we should do it
> > on a finer-than-PAGE_SIZE granularity.
> 
> I agree with this. I also suggested doing all of it, not just the mmap
> interface.

Right.  Sometime, maybe.  There's been _some_ demand for holepunching, but
it's been fairly minor and is probably a distraction from this immediate
and specific customer requirement.

> However the only thing they care about is the mmap interface,
> and this is why this is coming first. Also note, my MADV_TRUNCATE is by
> coincidence needed by IBM too, the testcase I was trying to improve was
> not an IBM workload, I learnt about the IBM effort only a few days ago.
> But others happen to need it for the very same reason (no, not Oracle,
> but Oracle would benefit from it too of course).
> 
> > IOW: we're unlikely to implement MADV_TRUNCATE for anything other than
> > tmpfs, in which case MADV_TRUNCATE will remain a tmpfs specific hack, no?
> 
> In 2.6 yes. But in the future it's an API we can extend to work on more
> fs with well defined semantics.

Right.  And in the future I think it would be designed as a generalisation
of sys_ftruncate().

> What's the benefit in having MADV_DISCARD that works on tmpfs, and then
> some day in the future to add a MADV_TRUNCATE that works on other fs too?
> 
> The retval of MADV_TRUNCATE will still be an error in both cases for
> older kernels. So we may go for the more generic API in the first place
> IMHO.
> 
> The less MADV_MESS there is the better and the more explicit the name is
> the better too.
> 
> > Or to swap it out.
> 
> Ok, the whole point is to release the swap. This stuff has been
> completely in swap for ages, nobody touched it for ages, but it's bad for
> performance and for swap fragmentation if after a peak of load 16G
> remains always in swap when in fact the app could release all the
> swap after the load went down (if only it could use MADV_TRUNCATE).

ah-hah.

hm.   Tossing ideas out here:

- Implement the internal infrastructure as you have it

- View it as a filesystem operation which has MM side-effects.

- Initially access it via sys_ipc()  (or madvise, I guess.  Both are a bit odd)

- Later access it via sys_[hole]punch()

Alternatively, access it via sys_[hole]punch() immediately, but I'm not
sure that userspace can get access to the shm area's fd?



