From: Linus Torvalds <torvalds@linux-foundation.org>
To: Jamie Lokier <jamie@shareable.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	jens.axboe@oracle.com, akpm@linux-foundation.org,
	nickpiggin@yahoo.com.au, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch v3] splice: fix race with page invalidation
Date: Thu, 31 Jul 2008 11:54:56 -0700 (PDT)
Message-ID: <alpine.LFD.1.10.0807311142510.3277@nehalem.linux-foundation.org>
In-Reply-To: <20080731172111.GA23644@shareable.org>



On Thu, 31 Jul 2008, Jamie Lokier wrote:
> 
> But did you miss the bit where you DON'T COPY ANYTHING EVER*?  COW is
> able to provide _correctness_ for the rare corner cases which you're not
> optimising for.  You don't actually copy more than 0.0% (*approx).

The thing is, just even _marking_ things COW is the expensive part. If we 
have to walk page tables - we're screwed.

> The cost of COW is TLB flushes*.  But for splice, there ARE NO TLB
> FLUSHES because such files are not mapped writable!

For splice, there are also no flags to set, no extra tracking costs, etc 
etc.

But yes, we could make splice (from a file) do something like

 - just fall back to copy if the page is already mapped (page->mapcount 
   gives us that)

 - set a bit ("splicemapped") when we splice it in, and increment 
   page->mapcount for each splice copy.

 - if a "splicemapped" page is ever mmap'ed or written to (either through 
   write or truncate), we COW it then (and actually move the page cache 
   page - it would be a "woc": a reverse cow, not a normal one).

 - do all of this with page lock held, to make sure that there are no 
   writers or new mappers happening.

So it's probably doable. 

(We could have a separate "splicecount", and actually allow non-writable 
mappings, but I suspect we cannot afford the space in the "struct page" 
for a whole new count).
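
To make the four steps above concrete, here is a rough user-space sketch in C. 
It is only a toy model under stated assumptions, not kernel code: "struct 
toy_page", "struct toy_cache_slot", toy_splice() and toy_prepare_modify() are 
hypothetical names invented for illustration, a plain bool stands in for the 
page lock, and an int stands in for page->mapcount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical stand-in for a page cache page; this is NOT struct page. */
struct toy_page {
    int mapcount;       /* models page->mapcount: mmaps, or splice refs */
    bool splicemapped;  /* the proposed "splicemapped" bit */
    bool locked;        /* models the page lock */
    unsigned char data[TOY_PAGE_SIZE];
};

/* Hypothetical page cache slot: which page currently backs a file offset. */
struct toy_cache_slot {
    struct toy_page *page;
};

enum splice_result { SPLICE_ZERO_COPY, SPLICE_FALLBACK_COPY };

static void toy_lock(struct toy_page *pg)
{
    assert(!pg->locked);
    pg->locked = true;
}

static void toy_unlock(struct toy_page *pg)
{
    assert(pg->locked);
    pg->locked = false;
}

/* Splice from the cache: share the page if nobody has it mmap'ed. */
static enum splice_result toy_splice(struct toy_cache_slot *slot)
{
    struct toy_page *pg = slot->page;
    enum splice_result res;

    toy_lock(pg);
    if (pg->mapcount > 0 && !pg->splicemapped) {
        /* Already mapped by someone else: just fall back to a copy. */
        res = SPLICE_FALLBACK_COPY;
    } else {
        /* Mark the page and count one reference per splice copy. */
        pg->splicemapped = true;
        pg->mapcount++;
        res = SPLICE_ZERO_COPY;
    }
    toy_unlock(pg);
    return res;
}

/*
 * Call before a write, truncate or mmap touches the cached page.
 * If the page is splicemapped, do the reverse COW: the cache slot moves
 * to a fresh copy and the splice holders keep the old, unchanged page.
 * Returns the page the caller may modify, or NULL if allocation failed.
 */
static struct toy_page *toy_prepare_modify(struct toy_cache_slot *slot)
{
    struct toy_page *old = slot->page;
    struct toy_page *fresh;

    toy_lock(old);
    if (!old->splicemapped) {
        toy_unlock(old);
        return old;
    }
    fresh = calloc(1, sizeof(*fresh));
    if (fresh) {
        memcpy(fresh->data, old->data, TOY_PAGE_SIZE);
        slot->page = fresh;
    }
    toy_unlock(old);
    return fresh;
}
```

A caller would splice with toy_splice() and route every write, truncate or 
mmap through toy_prepare_modify() first. The sketch leaves out everything 
that makes the real thing hard: dropping the splice references, freeing the 
old page once the last one goes away, and real locking.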

> You're missing the real point of network splice().
> 
> It's not just for speed.
> 
> It's for sharing data.  Your TCP buffers can share data, when the same
> big lump is in flight to lots of clients.  Think static file / web /
> FTP server, the kind with 80% of hits to 0.01% of the files, roughly
> the same size as your RAM.

Maybe. Does it really show up as a big thing?

			Linus
