All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Ben Myers <bpm@sgi.com>, Alexander Viro <viro@zeniv.linux.org.uk>,
	Dave Jones <davej@redhat.com>,
	xfs@oss.sgi.com
Subject: Re: splice vs execve lockdep trace.
Date: Wed, 17 Jul 2013 06:43:35 +1000	[thread overview]
Message-ID: <20130716204335.GH11674@dastard> (raw)
In-Reply-To: <CA+55aFzTBUKStdZu1GhKoiYc2knybhiaUFr2By98QYew_STE=A@mail.gmail.com>

On Tue, Jul 16, 2013 at 01:18:06PM -0700, Linus Torvalds wrote:
> On Tue, Jul 16, 2013 at 12:33 PM, Ben Myers <bpm@sgi.com> wrote:
> >> >
> >> > And looking more at that, I'm actually starting to think this is an
> >> > XFS locking problem. XFS really should not call back to splice while
> >> > holding the inode lock.
> 
> .. that was misleading, normally "inode lock" would be i_lock, but
> here I meant the XFS-specific i_iolock.
> 
> But you clearly picked up on it:
> 
> > CPU0                            CPU1                            CPU2
> > ----                            ----                            ----
> > lock(&sig->cred_guard_mutex);
> >                                 lock(&pipe->mutex/1);
> >                                                                 lock(&(&ip->io_lock)->mr_lock);
> > lock(&(&ip->io_lock)->mr_lock);
> >                                 lock(&sig->cred_guard_mutex);
> >                                                                 lock(&pipe->mutex/1);
> 
> Yup.
> 
> > I agree that fixing this in XFS seems to be the most reasonable plan,
> > and Dave's approach looks ok to me.  Seems like patch 1 should go
> > through Al's tree, but we'll also need to commit it to the xfs tree
> > prerequisite to patch 2.
> 
> Btw, is there some reason why XFS couldn't just use
> generic_file_splice_read() directly?

Yes - IO is serialised based on the ip->i_iolock, not i_mutex. We
don't use i_mutex for many things IO related, and so internal
locking is needed to serialise against stuff like truncate, hole
punching, etc, that are run through non-vfs interfaces.

> And splice has mmap semantics - the whole point of splice is about
> moving pages around, after all - so I *think* your current
> "xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);" is actually over-serialization.

No, that's just taking the ip->i_iolock in shared mode - that's less
serialisation than holding i_mutex as it allow parallel read
operations but still locks out concurrent buffered writes to the
file (i.e. posix atomic write vs read requirements)

> The pages will be shared by the pipe buffers anyway, so splicing by
> definition doesn't imply full data serialization (because the reading
> of the data itself will happen much later).
> 
> So the per-page serialization done by readpage() should already be
> sufficient, no?
> 
> I dunno. Maybe there's something I'm missing. XFS does seem to wrap
> all the other generic functions in that lock too, but since mmap() etc
> clearly have to be able to get things one page at a time (without any
> wrapping at higher layers), I'm just wondering whether splice_read
> might not be able to avoid it.

Read isn't the problem - it's write that's the deadlock issue...

Cheers,

Dave.
> 
>                      Linus
> 

-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

WARNING: multiple messages have this Message-ID (diff)
From: Dave Chinner <david@fromorbit.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ben Myers <bpm@sgi.com>, Peter Zijlstra <peterz@infradead.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Dave Jones <davej@redhat.com>,
	xfs@oss.sgi.com
Subject: Re: splice vs execve lockdep trace.
Date: Wed, 17 Jul 2013 06:43:35 +1000	[thread overview]
Message-ID: <20130716204335.GH11674@dastard> (raw)
In-Reply-To: <CA+55aFzTBUKStdZu1GhKoiYc2knybhiaUFr2By98QYew_STE=A@mail.gmail.com>

On Tue, Jul 16, 2013 at 01:18:06PM -0700, Linus Torvalds wrote:
> On Tue, Jul 16, 2013 at 12:33 PM, Ben Myers <bpm@sgi.com> wrote:
> >> >
> >> > And looking more at that, I'm actually starting to think this is an
> >> > XFS locking problem. XFS really should not call back to splice while
> >> > holding the inode lock.
> 
> .. that was misleading, normally "inode lock" would be i_lock, but
> here I meant the XFS-specific i_iolock.
> 
> But you clearly picked up on it:
> 
> > CPU0                            CPU1                            CPU2
> > ----                            ----                            ----
> > lock(&sig->cred_guard_mutex);
> >                                 lock(&pipe->mutex/1);
> >                                                                 lock(&(&ip->io_lock)->mr_lock);
> > lock(&(&ip->io_lock)->mr_lock);
> >                                 lock(&sig->cred_guard_mutex);
> >                                                                 lock(&pipe->mutex/1);
> 
> Yup.
> 
> > I agree that fixing this in XFS seems to be the most reasonable plan,
> > and Dave's approach looks ok to me.  Seems like patch 1 should go
> > through Al's tree, but we'll also need to commit it to the xfs tree
> > prerequisite to patch 2.
> 
> Btw, is there some reason why XFS couldn't just use
> generic_file_splice_read() directly?

Yes - IO is serialised based on the ip->i_iolock, not i_mutex. We
don't use i_mutex for many things IO related, and so internal
locking is needed to serialise against stuff like truncate, hole
punching, etc, that are run through non-vfs interfaces.

> And splice has mmap semantics - the whole point of splice is about
> moving pages around, after all - so I *think* your current
> "xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);" is actually over-serialization.

No, that's just taking the ip->i_iolock in shared mode - that's less
serialisation than holding i_mutex as it allow parallel read
operations but still locks out concurrent buffered writes to the
file (i.e. posix atomic write vs read requirements)

> The pages will be shared by the pipe buffers anyway, so splicing by
> definition doesn't imply full data serialization (because the reading
> of the data itself will happen much later).
> 
> So the per-page serialization done by readpage() should already be
> sufficient, no?
> 
> I dunno. Maybe there's something I'm missing. XFS does seem to wrap
> all the other generic functions in that lock too, but since mmap() etc
> clearly have to be able to get things one page at a time (without any
> wrapping at higher layers), I'm just wondering whether splice_read
> might not be able to avoid it.

Read isn't the problem - it's write that's the deadlock issue...

Cheers,

Dave.
> 
>                      Linus
> 

-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2013-07-16 20:43 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-16  1:53 splice vs execve lockdep trace Dave Jones
2013-07-16  2:32 ` Linus Torvalds
2013-07-16  2:38   ` Dave Jones
2013-07-16  3:25     ` Linus Torvalds
2013-07-16  3:25       ` Linus Torvalds
2013-07-16  3:28       ` Dave Jones
2013-07-16  3:28         ` Dave Jones
2013-07-16  5:31       ` Al Viro
2013-07-16  5:31         ` Al Viro
2013-07-16  6:03       ` Dave Chinner
2013-07-16  6:03         ` Dave Chinner
2013-07-16  6:16         ` Al Viro
2013-07-16  6:16           ` Al Viro
2013-07-16  6:41           ` Dave Chinner
2013-07-16  6:41             ` Dave Chinner
2013-07-16  6:50           ` Dave Chinner
2013-07-16  6:50             ` Dave Chinner
2013-07-16 19:33         ` Ben Myers
2013-07-16 19:33           ` Ben Myers
2013-07-16 20:18           ` Linus Torvalds
2013-07-16 20:18             ` Linus Torvalds
2013-07-16 20:43             ` Dave Chinner [this message]
2013-07-16 20:43               ` Dave Chinner
2013-07-16 21:02               ` Linus Torvalds
2013-07-16 21:02                 ` Linus Torvalds
2013-07-17  4:06                 ` Dave Chinner
2013-07-17  4:06                   ` Dave Chinner
2013-07-17  4:54                   ` Linus Torvalds
2013-07-17  4:54                     ` Linus Torvalds
2013-07-17  5:51                     ` Dave Chinner
2013-07-17  5:51                       ` Dave Chinner
2013-07-17 16:03                       ` Linus Torvalds
2013-07-17 16:03                         ` Linus Torvalds
2013-07-17 23:40                         ` Ben Myers
2013-07-17 23:40                           ` Ben Myers
2013-07-18  0:17                           ` Linus Torvalds
2013-07-18  0:17                             ` Linus Torvalds
2013-07-18  3:42                             ` Dave Chinner
2013-07-18  3:42                               ` Dave Chinner
2013-07-18 21:16                               ` Ben Myers
2013-07-18 21:16                                 ` Ben Myers
2013-07-18 22:21                                 ` Ben Myers
2013-07-18 22:21                                   ` Ben Myers
2013-07-18 22:49                                   ` Dave Chinner
2013-07-18 22:49                                     ` Dave Chinner
2013-07-18  3:17                         ` Dave Chinner
2013-07-18  3:17                           ` Dave Chinner
2013-07-16 13:59       ` Vince Weaver
2013-07-16 13:59         ` Vince Weaver
2013-07-16 15:02         ` Dave Jones
2013-07-16 15:02           ` Dave Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130716204335.GH11674@dastard \
    --to=david@fromorbit.com \
    --cc=bpm@sgi.com \
    --cc=davej@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.