From: Nick Piggin <npiggin@suse.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Peter Zijlstra <peterz@infradead.org>,
Frederic Weisbecker <fweisbec@gmail.com>,
Pierre Tardy <tardyp@gmail.com>, Ingo Molnar <mingo@elte.hu>,
Arnaldo Carvalho de Melo <acme@redhat.com>,
Tom Zanussi <tzanussi@gmail.com>,
Paul Mackerras <paulus@samba.org>,
linux-kernel@vger.kernel.org, arjan@infradead.org,
ziga.mahkovec@gmail.com, davem <davem@davemloft.net>,
linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Christoph Lameter <cl@linux-foundation.org>,
Tejun Heo <tj@kernel.org>, Jens Axboe <jens.axboe@oracle.com>
Subject: Re: Unexpected splice "always copy" behavior observed
Date: Wed, 19 May 2010 16:31:16 +1000
Message-ID: <20100519063116.GR2516@laptop>
In-Reply-To: <alpine.LFD.2.00.1005180918300.4195@i5.linux-foundation.org>
On Tue, May 18, 2010 at 09:25:05AM -0700, Linus Torvalds wrote:
>
>
> On Tue, 18 May 2010, Steven Rostedt wrote:
> >
> > Hopefully we can find a way to avoid the copy to file. But the splice
> > code was created to avoid the copy to and from userspace, it did not
> > guarantee no copy within the kernel itself.
>
> Well, we always _wanted_ to splice directly to a file, but it's just not
> been done properly. It's not entirely trivial, since you need to worry
> about preexisting pages and generally just do the right thing wrt the
> filesystem.
>
> And no, it should NOT use migration code. I suspect you could do something
> fairly simple like:
I was thinking it could possibly reuse some of the migration code for
swapping filesystem state to the new page. But I agree it gets hairy and
it is probably better to just insert new pages.
>
> - get the inode semaphore.
> - check if the splice is a pure "extend size" operation for that page
> - if so, just create the page cache entry and mark it dirty
> - otherwise, fall back to copying.
>
> because the "extend file" case is the easiest one, and is likely the only
> one that matters in practice (if you are overwriting an existing file,
> things get _way_ hairier, and why the hell would anybody expect that to be
> fast anyway?)
>
> But somebody needs to write the code..
We could attempt to invalidate the existing pagecache and then try to
install the new page. The filesystem code still needs a look over to
ensure that error handling works properly, and that it does not make
incorrect assumptions about the contents of the page being passed in.
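
To make the decision order concrete, here is a rough userspace toy model
of it: take the "pure extend" fast path first, then try to invalidate a
clean existing page and install the new one, and only fall back to
copying when neither works. Every name in it (toy_page, toy_file,
toy_splice_page, the busy flag) is made up for illustration; none of it
is kernel API, and locking, filesystem state and error handling are left
out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_MAX_PAGES 16

/* Toy stand-ins only; these are not kernel types. */
struct toy_page {
    unsigned char data[TOY_PAGE_SIZE];
    bool dirty;
    bool busy;   /* models a page that cannot be invalidated right now */
};

struct toy_file {
    struct toy_page *cache[TOY_MAX_PAGES]; /* models the pagecache */
    size_t size;                           /* models i_size, in bytes */
};

enum toy_result { TOY_MOVED, TOY_COPIED, TOY_ERROR };

/* Put src into the cache slot, mark it dirty, grow the file if needed. */
static void toy_install(struct toy_file *f, size_t index, struct toy_page *src)
{
    size_t end = (index + 1) * TOY_PAGE_SIZE;

    f->cache[index] = src;
    src->dirty = true;
    if (end > f->size)
        f->size = end;
}

enum toy_result toy_splice_page(struct toy_file *f, size_t index,
                                struct toy_page *src)
{
    struct toy_page *old;

    assert(f != NULL && src != NULL);
    if (index >= TOY_MAX_PAGES)
        return TOY_ERROR;

    /* A real implementation would hold the inode lock from here on. */
    old = f->cache[index];

    /* Pure "extend size" case: no existing data lives in this page. */
    if (index * TOY_PAGE_SIZE >= f->size) {
        toy_install(f, index, src);
        return TOY_MOVED;
    }

    /* Overwrite case: drop a clean, idle page and install the new one. */
    if (old == NULL || (!old->dirty && !old->busy)) {
        f->cache[index] = NULL;   /* the "invalidate" step */
        toy_install(f, index, src);
        return TOY_MOVED;
    }

    /* Otherwise fall back to copying into the existing page. */
    memcpy(old->data, src->data, TOY_PAGE_SIZE);
    old->dirty = true;
    return TOY_COPIED;
}
```

Calling toy_splice_page() on an empty toy_file at index 0 takes the
extend path; calling it again on the same index while the first page is
still dirty takes the copy path.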
This still isn't ideal because we drop the filesystem state (e.g. buffer
heads) on a page which, by definition, will need to be written out soon.
But something smarter could be added if it turns out to be important.
Big if, because I don't like adding complex code without having a
really good reason. I do like having the splice flag there, though.
The more the app can tell the kernel the better. Hopefully people use
it and we can get a better idea of whether these fancy optimisations
will be worth it.
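
For reference, this is roughly what passing the flag looks like from the
application side: a minimal sketch that pushes a small buffer through a
pipe and splices it into a file with SPLICE_F_MOVE. The helper name
splice_buf_to_file and the 4096-byte limit are assumptions made for the
example. Note that SPLICE_F_MOVE is only a hint, so the kernel may still
copy, and the write() into the pipe here is itself a copy from
userspace.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes splice() and SPLICE_F_MOVE in <fcntl.h> */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send len bytes from buf to out_fd through a pipe.
 * Returns 0 on success, -1 on error with errno set.
 * len is capped at 4096 so the single write() to the empty pipe cannot
 * block or be split.
 */
int splice_buf_to_file(int out_fd, const void *buf, size_t len)
{
    int p[2];
    int ret = -1;
    size_t left = len;
    ssize_t n;

    if (len > 4096) {
        errno = EINVAL;
        return -1;
    }
    if (pipe(p) < 0)
        return -1;

    /* This step copies from userspace into the pipe buffers. */
    n = write(p[1], buf, len);
    if (n < 0 || (size_t)n != len)
        goto out;

    /* Ask the kernel to move pages from the pipe into the file if it can. */
    while (left > 0) {
        n = splice(p[0], NULL, out_fd, NULL, left, SPLICE_F_MOVE);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto out;
        }
        if (n == 0)
            break;              /* pipe drained early */
        left -= (size_t)n;
    }
    if (left == 0)
        ret = 0;
out:
    close(p[0]);
    close(p[1]);
    return ret;
}
```

The output descriptor must not be opened with O_APPEND, since splice()
rejects that with EINVAL.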