public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Ben Myers <bpm@sgi.com>, Alexander Viro <viro@zeniv.linux.org.uk>,
	Dave Jones <davej@redhat.com>,
	xfs@oss.sgi.com
Subject: Re: splice vs execve lockdep trace.
Date: Wed, 17 Jul 2013 14:06:16 +1000	[thread overview]
Message-ID: <20130717040616.GI11674@dastard> (raw)
In-Reply-To: <CA+55aFwHMQd-VDeTDh-gm3jyj+5+FSoAHOeU47mwU-mKtEj9RQ@mail.gmail.com>

On Tue, Jul 16, 2013 at 02:02:12PM -0700, Linus Torvalds wrote:
> On Tue, Jul 16, 2013 at 1:43 PM, Dave Chinner <david@fromorbit.com> wrote:
> >
> > Yes - IO is serialised based on the ip->i_iolock, not i_mutex. We
> > don't use i_mutex for many things IO related, and so internal
> > locking is needed to serialise against stuff like truncate, hole
> > punching, etc, that are run through non-vfs interfaces.
> 
> Umm. But the page IO isn't serialized by i_mutext *either*. You don't
> hold it across page faults. In fact you don't even take it at all
> across page faults.

Right, and that's one of the biggest problems page based IO has - we
can't serialise it against other IO and other page cache
manipulation functions like hole punching. What happens when a
splice read or mmap page fault races with a hole punch? You get
stale data being left in the page cache because we can't serialise
the page read with the page cache invalidation and underlying extent
removal.

Indeed, why do you think we've been talking about VFS-level IO range
locking for the past year or more, and had a discussion session at
LSF/MM this year on it? i.e. this:

http://lwn.net/Articles/548939/

So forget about this "we don't need no steenkin' IO serialisation"
concept - it's fundamentally broken.

FWIW, hole punching in XFS takes the i_iolock in exclusive
mode, and hence serialises correctly against splice. IOWs, there is
a whole class of splice read data corruption race conditions that
XFS is not susceptible to but....

> *Every* other local filesystem uses generic_file_splice_read() with
> just a single
> 
>      .splice_read = generic_file_splice_read,

... and so they all are broken in a nasty, subtle way....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2013-07-17  4:06 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20130716015305.GB30569@redhat.com>
     [not found] ` <CA+55aFyLbqJp0-=7=HOF9sKGOHwsa7A7-V76b8tbsnra8Z2=-w@mail.gmail.com>
     [not found]   ` <20130716023847.GA31481@redhat.com>
2013-07-16  3:25     ` splice vs execve lockdep trace Linus Torvalds
2013-07-16  3:28       ` Dave Jones
2013-07-16  5:31       ` Al Viro
2013-07-16  6:03       ` Dave Chinner
2013-07-16  6:16         ` Al Viro
2013-07-16  6:41           ` Dave Chinner
2013-07-16  6:50           ` Dave Chinner
2013-07-16 19:33         ` Ben Myers
2013-07-16 20:18           ` Linus Torvalds
2013-07-16 20:43             ` Dave Chinner
2013-07-16 21:02               ` Linus Torvalds
2013-07-17  4:06                 ` Dave Chinner [this message]
2013-07-17  4:54                   ` Linus Torvalds
2013-07-17  5:51                     ` Dave Chinner
2013-07-17 16:03                       ` Linus Torvalds
2013-07-17 23:40                         ` Ben Myers
2013-07-18  0:17                           ` Linus Torvalds
2013-07-18  3:42                             ` Dave Chinner
2013-07-18 21:16                               ` Ben Myers
2013-07-18 22:21                                 ` Ben Myers
2013-07-18 22:49                                   ` Dave Chinner
2013-07-18  3:17                         ` Dave Chinner
2013-07-16 13:59       ` Vince Weaver
2013-07-16 15:02         ` Dave Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130717040616.GI11674@dastard \
    --to=david@fromorbit.com \
    --cc=bpm@sgi.com \
    --cc=davej@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox