splice/tee bugs?

public inbox for linux-kernel@vger.kernel.org

* splice/tee bugs?
@ 2006-07-07  7:07 Michael Kerrisk
  2006-07-07 11:07 ` Andrew Morton
  0 siblings, 1 reply; 34+ messages in thread
From: Michael Kerrisk @ 2006-07-07  7:07 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, michael.kerrisk

Hello Jens,

While editing and extending your draft man pages for
tee(), splice(), and vmsplice(), I've been testing the
splice()/tee() calls using a modified version of
the program you provided in the tee.2 manual page.

The most notable differences between my program and yours
are:

* I print some debugging info to stderr.

* I don't pass SPLICE_F_NONBLOCK to tee().

I'm running this on kernel 2.6.17, using the following 
command line:

$ ls *.c  | ktee r  | wc

On different runs I see:

a) No output from ls through the pipeline:

tee returned 0
      0       0       0

b) Very many instances of EAGAIN followed by expected results:

...
EAGAIN
EAGAIN
EAGAIN
EAGAIN
EAGAIN
EAGAIN
tee returned 19
splice returned 19
tee returned 0
      2       2      19

In some of these cases the elapsed time to run the command line
is 1 or 2 seconds (instead of the more typical
0.05 seconds).

c) Occasionally the command line just hangs, producing no output.
   In this case I can't kill it with ^C or ^\.  This is a 
   hard-to-reproduce behaviour on my (x86) system, but I have 
   seen it several times by now.

Assuming I'm not messing up with my test method, some 
observations:

Result a) seems to be occurring because tee() returns 0 if its
in_fd is not yet "ready" to deliver data.  Shouldn't tee()
be blocking in this case?  And shouldn't 0 be returned
only for EOF on the input file descriptor?

If I uncomment the usleep() line in the program, this behavior 
does not occur--the program always just produces the expected
output:

tee returned 19
splice returned 19
tee returned 0
      2       2      19
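
One way to get the effect of the usleep() without guessing a delay is to
wait in poll() until the input pipe is readable before calling tee().
What follows is a minimal sketch, not part of ktee.c: wait_for_input()
and wait_for_stdin() are hypothetical helper names, and the sketch
assumes that poll() reports POLLIN on the pipe once the upstream writer
has produced data.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until fd has data to read.
 * Returns 1 if data is available, 0 on timeout or when only a
 * hangup/error condition was reported (no data), -1 if poll() failed.
 * A negative timeout_ms waits indefinitely.
 */
static inline int wait_for_input(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n;

    assert(fd >= 0);

    n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;              /* errno set by poll() */
    if (n == 0)
        return 0;               /* timed out, nothing to read yet */

    return (pfd.revents & POLLIN) ? 1 : 0;
}

/* Convenience wrapper for the descriptor that ktee.c tees from. */
static inline int wait_for_stdin(int timeout_ms)
{
    return wait_for_input(STDIN_FILENO, timeout_ms);
}
```

In ktee.c, a call such as wait_for_stdin(-1) could take the place of the
commented-out usleep(); a return value of 0 would then mean the writer
went away without producing data.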

For behaviour b) -- why does tee() give EAGAIN when
I haven't specified SPLICE_F_NONBLOCK?  (This is a 
philosophical question; I can see that there are code paths
that lead to EAGAIN without SPLICE_F_NONBLOCK, but that
seems confusing behaviour for userland.)
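
For behaviour b), a caller that wants to tolerate EAGAIN without
spinning could bound the number of retries and pause between them.
This is only a sketch under stated assumptions: tee_retry() and its
max_tries parameter are hypothetical, the 10 ms pause is arbitrary, and
the syscall number comes from SYS_tee in <sys/syscall.h> instead of
being hard-coded as in ktee.c.

```c
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: call tee() without SPLICE_F_NONBLOCK and retry
 * a bounded number of times if the kernel still reports EAGAIN.
 * Returns the byte count from tee(), or -1 with errno set.
 */
static inline ssize_t tee_retry(int fd_in, int fd_out, size_t len,
                                int max_tries)
{
    const struct timespec pause = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    long n;
    int attempt;

    assert(max_tries > 0);

    for (attempt = 0; attempt < max_tries; attempt++) {
        n = syscall(SYS_tee, fd_in, fd_out, len, 0U);
        if (n >= 0 || errno != EAGAIN)
            return n;               /* success, EOF, or a real error */
        nanosleep(&pause, NULL);    /* back off instead of busy-looping */
    }

    errno = EAGAIN;                 /* retries exhausted */
    return -1;
}
```

The loop in ktee.c could call tee_retry(STDIN_FILENO, STDOUT_FILENO,
INT_MAX, 100) in place of the bare tee() call and drop its own EAGAIN
branch.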

Behaviour c) hints at a bug in tee().

Your thoughts?

Cheers,

Michael

====

/* ktee.c */

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <assert.h>
#include <errno.h>
#include <limits.h>

#if defined(__i386__)
#define __NR_splice     313
#define __NR_tee        315
#else
#error unsupported arch
#endif

#define SPLICE_F_MOVE   (0x01)  /* move pages instead of copying */
#define SPLICE_F_NONBLOCK (0x02) /* don't block on the pipe splicing (but */
                                 /* we may still block on the fd we splice */
                                 /* from/to, of course */
#define SPLICE_F_MORE   (0x04)  /* expect more data */
#define SPLICE_F_GIFT   (0x08)  /* pages passed in are a gift */

static inline int splice(int fdin, loff_t *off_in, int fdout,
                         loff_t *off_out, size_t len, unsigned int flags)
{
    return syscall(__NR_splice, fdin, off_in, fdout, off_out, len, flags);
}

static inline int tee(int fdin, int fdout, size_t len, unsigned int flags)
{
    return syscall(__NR_tee, fdin, fdout, len, flags);
}

int
main(int argc, char *argv[])
{
    int fd;
    int len, slen;

    assert(argc == 2);

    fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }

    //usleep(100000);
    do {
        /*
         * tee stdin to stdout.
         */
        len = tee(STDIN_FILENO, STDOUT_FILENO,
                  INT_MAX, 0);

        if (len < 0) {
            if (errno == EAGAIN) {
                fprintf(stderr, "EAGAIN\n");
                continue;
            }
            perror("tee");
            exit(EXIT_FAILURE);
        }
        fprintf(stderr, "tee returned %ld\n",  (long) len);
        if (len == 0)
            break;

        /*
         * Consume stdin by splicing it to a file.
         */
        while (len > 0) {
            slen = splice(STDIN_FILENO, NULL, fd, NULL,
                          len, SPLICE_F_MOVE);
            if (slen < 0) {
                perror("splice");
                break;
            }
            fprintf(stderr, "splice returned %ld\n", (long) slen);
            len -= slen;
        }
    } while (1);

    close(fd);
    exit(EXIT_SUCCESS);
}

-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 

Want to help with man page maintenance?  
Grab the latest tarball at
ftp://ftp.win.tue.nl/pub/linux-local/manpages/, 
read the HOWTOHELP file and grep the source 
files for 'FIXME'.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: splice/tee bugs?
  2006-07-07  7:07 splice/tee bugs? Michael Kerrisk
@ 2006-07-07 11:07 ` Andrew Morton
  2006-07-07 11:42   ` Michael Kerrisk
  2006-07-07 16:13   ` Luiz Fernando N. Capitulino
  0 siblings, 2 replies; 34+ messages in thread
From: Andrew Morton @ 2006-07-07 11:07 UTC (permalink / raw)
  To: Michael Kerrisk; +Cc: axboe, linux-kernel, michael.kerrisk

On Fri, 07 Jul 2006 09:07:03 +0200
"Michael Kerrisk" <mtk-manpages@gmx.net> wrote:

> c) Occasionally the command line just hangs, producing no output.
>    In this case I can't kill it with ^C or ^\.  This is a 
>    hard-to-reproduce behaviour on my (x86) system, but I have 
>    seen it several times by now.

aka local DoS.  Please capture sysrq-T output next time.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: splice/tee bugs?
  2006-07-07 11:07 ` Andrew Morton
@ 2006-07-07 11:42   ` Michael Kerrisk
  2006-07-07 12:03     ` Jens Axboe
  2006-07-07 16:13   ` Luiz Fernando N. Capitulino
  1 sibling, 1 reply; 34+ messages in thread
From: Michael Kerrisk @ 2006-07-07 11:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: michael.kerrisk, linux-kernel, axboe

> > c) Occasionally the command line just hangs, producing no output.
> >    In this case I can't kill it with ^C or ^\.  This is a 
> >    hard-to-reproduce behaviour on my (x86) system, but I have 
> >    seen it several times by now.
> 
> aka local DoS.  Please capture sysrq-T output next time.

I don't have sysrq configured in the kernel that I'm testing at 
the moment (I'll build again with sysrq), but have just got 
the error again.  For what it's worth, "ps l" says:

F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
0  1000  7099   630  16   0      0     0 -      D+   pts/30     0:00 [ktee]

Cheers,

Michael
-- 
Michael Kerrisk

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: splice/tee bugs?
  2006-07-07 11:42   ` Michael Kerrisk
@ 2006-07-07 12:03     ` Jens Axboe
  2006-07-07 12:28       ` Jens Axboe
  0 siblings, 1 reply; 34+ messages in thread
From: Jens Axboe @ 2006-07-07 12:03 UTC (permalink / raw)
  To: Michael Kerrisk; +Cc: Andrew Morton, michael.kerrisk, linux-kernel

On Fri, Jul 07 2006, Michael Kerrisk wrote:
> > > c) Occasionally the command line just hangs, producing no output.
> > >    In this case I can't kill it with ^C or ^\.  This is a 
> > >    hard-to-reproduce behaviour on my (x86) system, but I have 
> > >    seen it several times by now.
> > 
> > aka local DoS.  Please capture sysrq-T output next time.
> 
> I don't have sysrq configured in the kernel that I'm testing at 
> the moment (I'll build again with sysrq), but have just got 
> the error again.  For what it's worth, "ps l" says:
> 
> F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
> 0  1000  7099   630  16   0      0     0 -      D+   pts/30     0:00 [ktee]

Try ps -eo cmd,wchan, it should give you a little more at least. But
sysrq-t is the best, of course.

I'll see about reproducing locally.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: splice/tee bugs?
  2006-07-07 12:03     ` Jens Axboe
@ 2006-07-07 12:28       ` Jens Axboe
  2006-07-07 12:31         ` Michael Kerrisk
  0 siblings, 1 reply; 34+ messages in thread
From: Jens Axboe @ 2006-07-07 12:28 UTC (permalink / raw)
  To: Michael Kerrisk; +Cc: Andrew Morton, michael.kerrisk, linux-kernel

On Fri, Jul 07 2006, Jens Axboe wrote:
> On Fri, Jul 07 2006, Michael Kerrisk wrote:
> > > > c) Occasionally the command line just hangs, producing no output.
> > > >    In this case I can't kill it with ^C or ^\.  This is a 
> > > >    hard-to-reproduce behaviour on my (x86) system, but I have 
> > > >    seen it several times by now.
> > > 
> > > aka local DoS.  Please capture sysrq-T output next time.
> > 
> > I don't have sysrq configured in the kernel that I'm testing at 
> > the moment (I'll build again with sysrq), but have just got 
> > the error again.  For what it's worth, "ps l" says:
> > 
> > F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
> > 0  1000  7099   630  16   0      0     0 -      D+   pts/30     0:00 [ktee]
> 
> Try ps -eo cmd,wchan, it should give you a little more at least. But
> sysrq-t is the best, of course.
> 
> I'll see about reproducing locally.

With your modified ktee, I can reproduce it here. Here's the ktee and wc
output:

ktee2         D 00000002     0 10027   3182         10028       (L-TLB)
       f5cd7da0 00000002 f5cd7d8c 00000002 f5cd7d48 c0148c1e f5cd7d58 c18a5914
       c03edc80 c19a9f50 00000007 00000000 c1ff1ab0 b5d39d0a 0000003a 067a9ddd
       c1ff1bc0 c19aa720 00000000 00000000 06d4480b 00000000 c0474880 c0474880
Call Trace:
 [<c0389114>] __mutex_lock_slowpath+0x95/0x236
 [<c03892d1>] mutex_lock+0x1c/0x1f
 [<c016cff7>] pipe_read_fasync+0x24/0x57
 [<c016d2d4>] pipe_read_release+0x12/0x23
 [<c01623c7>] __fput+0x53/0x141
 [<c016250e>] fput+0x19/0x1c
 [<c015fc84>] filp_close+0x41/0x67
 [<c0121c1a>] put_files_struct+0xa6/0xb8
 [<c0122d06>] do_exit+0x124/0x8dd
 [<c01045a7>] do_trap+0x0/0x9e
 [<c01179d9>] do_page_fault+0x274/0x586
 [<c0103b6d>] error_code+0x39/0x40
 [<c0103039>] sysenter_past_esp+0x56/0x79

wc            D C1DB7F74     0 10028   3182               10027 (NOTLB)
       c1db7ec8 00000002 c1db7eb4 c1db7f74 00000246 00000101 00000001 00000000
       00000003 c1db7f68 00000007 00000001 f6351ab0 af26d8c3 0000003a 0011c727
       f6351bc0 c19b2720 00000002 00000044 001ce734 00000000 c0474880 c0474880
Call Trace:
 [<c0389114>] __mutex_lock_slowpath+0x95/0x236
 [<c03892d1>] mutex_lock+0x1c/0x1f
 [<c016d910>] pipe_readv+0x54/0x3a9
 [<c016dc84>] pipe_read+0x1f/0x21
 [<c0161bbf>] vfs_read+0x85/0xf6
 [<c0162048>] sys_read+0x3d/0x64
 [<c0103039>] sysenter_past_esp+0x56/0x79

I'll dig around.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: splice/tee bugs?
  2006-07-07 12:28       ` Jens Axboe
@ 2006-07-07 12:31         ` Michael Kerrisk
  2006-07-07 12:41           ` Jens Axboe
  0 siblings, 1 reply; 34+ messages in thread
From: Michael Kerrisk @ 2006-07-07 12:31 UTC (permalink / raw)
  To: Jens Axboe, mtk-manpages; +Cc: linux-kernel, akpm

Jens Axboe wrote:

> > > > >    In this case I can't kill it with ^C or ^\.  This is a 
> > > > >    hard-to-reproduce behaviour on my (x86) system, but I have 
> > > > >    seen it several times by now.
> > > > 
> > > > aka local DoS.  Please capture sysrq-T output next time.
[...]
> > I'll see about reproducing locally.
> 
> With your modified ktee, I can reproduce it here. Here's the ktee and wc
> output:

Good; thanks.

By the way, what about points a) and b) in my original mail
in this thread?

Cheers,

Michael

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: splice/tee bugs?
  2006-07-07 12:31         ` Michael Kerrisk
@ 2006-07-07 12:41           ` Jens Axboe
  2006-07-07 13:12             ` Jens Axboe
  0 siblings, 1 reply; 34+ messages in thread
From: Jens Axboe @ 2006-07-07 12:41 UTC (permalink / raw)
  To: Michael Kerrisk; +Cc: mtk-manpages, linux-kernel, akpm

On Fri, Jul 07 2006, Michael Kerrisk wrote:
> Jens Axboe wrote:
> 
> > > > > >    In this case I can't kill it with ^C or ^\.  This is a 
> > > > > >    hard-to-reproduce behaviour on my (x86) system, but I have 
> > > > > >    seen it several times by now.
> > > > > 
> > > > > aka local DoS.  Please capture sysrq-T output next time.
> [...]
> > > I'll see about reproducing locally.
> > 
> > With your modified ktee, I can reproduce it here. Here's the ktee and wc
> > output:
> 
> Good; thanks.
> 
> By the way, what about points a) and b) in my original mail
> in this thread?

I'll look at them after this.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: splice/tee bugs?
  2006-07-07 12:41           ` Jens Axboe
@ 2006-07-07 13:12             ` Jens Axboe
  2006-07-07 13:14               ` Jens Axboe
  2006-07-07 14:05               ` Michael Kerrisk
  0 siblings, 2 replies; 34+ messages in thread
From: Jens Axboe @ 2006-07-07 13:12 UTC (permalink / raw)
  To: Michael Kerrisk; +Cc: mtk-manpages, linux-kernel, akpm

On Fri, Jul 07 2006, Jens Axboe wrote:
> On Fri, Jul 07 2006, Michael Kerrisk wrote:
> > Jens Axboe wrote:
> > 
> > > > > > >    In this case I can't kill it with ^C or ^\.  This is a 
> > > > > > >    hard-to-reproduce behaviour on my (x86) system, but I have 
> > > > > > >    seen it several times by now.
> > > > > > 
> > > > > > aka local DoS.  Please capture sysrq-T output next time.
> > [...]
> > > > I'll see about reproducing locally.
> > > 
> > > With your modified ktee, I can reproduce it here. Here's the ktee and wc
> > > output:
> > 
> > Good; thanks.
> > 
> > By the way, what about points a) and b) in my original mail
> > in this thread?
> 
> I'll look at them after this.

I _think_ it was due to a bad check for ipipe->nrbufs, can you see if
this works for you? It also changes some other things:

- instead of returning EAGAIN on nothing tee'd because of the possible
  deadlock problem, release/regrab the ipipe/opipe mutex if we have to.
  This makes sys_tee block for that case if SPLICE_F_NONBLOCK isn't set.

- Check that ipipe and opipe differ to avoid possible deadlock if user
  gives the same pipe.

You can still see 0 results without SPLICE_F_NONBLOCK set, if we have no
writers for instance. This is expected, not much we can do about that as
we cannot block for that condition.
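
The ABBA deadlock mentioned above arises when two tasks take the same
pair of locks in opposite orders. The usual remedy is to impose one
global order on the pair. The following is a userspace analogy with
pthread mutexes, not the kernel code: lock_pair(), unlock_pair() and
first_in_order() are hypothetical names, and ordering by address is an
assumption made for the illustration. The sketch also refuses a pair
made of the same lock twice, analogous to the ipipe != opipe check.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Return whichever of the two mutexes comes first in the global order. */
static inline pthread_mutex_t *first_in_order(pthread_mutex_t *a,
                                              pthread_mutex_t *b)
{
    return ((uintptr_t) a < (uintptr_t) b) ? a : b;
}

/*
 * Lock both mutexes, lowest address first, so that every caller takes
 * them in the same order no matter how the arguments are passed.
 * Returns 0 on success, -1 if both arguments name the same mutex.
 */
static inline int lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_t *first, *second;

    assert(a != NULL && b != NULL);
    if (a == b)
        return -1;              /* locking it twice would self-deadlock */

    first = first_in_order(a, b);
    second = (first == a) ? b : a;

    pthread_mutex_lock(first);
    pthread_mutex_lock(second);
    return 0;
}

/* Release both mutexes; the unlock order does not affect correctness. */
static inline void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}
```

With this scheme, one thread calling lock_pair(&x, &y) and another
calling lock_pair(&y, &x) both acquire the lower-addressed mutex first,
so neither can hold one lock while waiting for the other in reverse.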

diff --git a/fs/splice.c b/fs/splice.c
index 05fd278..de323df 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1316,7 +1316,7 @@ static int link_pipe(struct pipe_inode_i
 	struct pipe_buffer *ibuf, *obuf;
 	int ret, do_wakeup, i, ipipe_first;
 
-	ret = do_wakeup = ipipe_first = 0;
+	i = ret = do_wakeup = ipipe_first = 0;
 
 	/*
 	 * Potential ABBA deadlock, work around it by ordering lock
@@ -1332,14 +1332,14 @@ static int link_pipe(struct pipe_inode_i
 		mutex_lock(&ipipe->inode->i_mutex);
 	}
 
-	for (i = 0;; i++) {
+	do {
 		if (!opipe->readers) {
 			send_sig(SIGPIPE, current, 0);
 			if (!ret)
 				ret = -EPIPE;
 			break;
 		}
-		if (ipipe->nrbufs - i) {
+		if (i < ipipe->nrbufs) {
 			ibuf = ipipe->bufs + ((ipipe->curbuf + i) & (PIPE_BUFFERS - 1));
 
 			/*
@@ -1370,6 +1370,7 @@ static int link_pipe(struct pipe_inode_i
 				do_wakeup = 1;
 				ret += obuf->len;
 				len -= obuf->len;
+				i++;
 
 				if (!len)
 					break;
@@ -1379,11 +1380,9 @@ static int link_pipe(struct pipe_inode_i
 
 			/*
 			 * We have input available, but no output room.
-			 * If we already copied data, return that. If we
-			 * need to drop the opipe lock, it must be ordered
-			 * last to avoid deadlocks.
+			 * If we already copied data, return that.
 			 */
-			if ((flags & SPLICE_F_NONBLOCK) || !ipipe_first) {
+			if (flags & SPLICE_F_NONBLOCK) {
 				if (!ret)
 					ret = -EAGAIN;
 				break;
@@ -1400,10 +1399,22 @@ static int link_pipe(struct pipe_inode_i
 				kill_fasync(&opipe->fasync_readers, SIGIO, POLL_IN);
 				do_wakeup = 0;
 			}
+	
+			/*
+			 * To avoid ABBA deadlocks, we need to drop the ipipe
+			 * lock before dropping/grabbing the opipe lock in
+			 * pipe_wait().
+			 */
+			if (!ipipe_first)
+				mutex_unlock(&ipipe->inode->i_mutex);
 
 			opipe->waiting_writers++;
 			pipe_wait(opipe);
 			opipe->waiting_writers--;
+
+			if (!ipipe_first)
+				mutex_lock(&ipipe->inode->i_mutex);
+
 			continue;
 		}
 
@@ -1417,12 +1428,7 @@ static int link_pipe(struct pipe_inode_i
 			if (ret)
 				break;
 		}
-		/*
-		 * pipe_wait() drops the ipipe mutex. To avoid deadlocks
-		 * with another process, we can only safely do that if
-		 * the ipipe lock is ordered last.
-		 */
-		if ((flags & SPLICE_F_NONBLOCK) || ipipe_first) {
+		if (flags & SPLICE_F_NONBLOCK) {
 			if (!ret)
 				ret = -EAGAIN;
 			break;
@@ -1437,7 +1443,18 @@ static int link_pipe(struct pipe_inode_i
 			wake_up_interruptible_sync(&ipipe->wait);
 		kill_fasync(&ipipe->fasync_writers, SIGIO, POLL_OUT);
 
+		/*
+		 * To avoid ABBA deadlocks, we need to drop the ipipe
+		 * lock before dropping/grabbing the opipe lock in
+		 * pipe_wait().
+		 */
+		if (ipipe_first)
+			mutex_unlock(&opipe->inode->i_mutex);
+
 		pipe_wait(ipipe);
+
+		if (ipipe_first)
+			mutex_lock(&opipe->inode->i_mutex);
 	}
 
 	mutex_unlock(&ipipe->inode->i_mutex);
@@ -1468,7 +1485,7 @@ static long do_tee(struct file *in, stru
 	/*
 	 * Link ipipe to the two output pipes, consuming as we go along.
 	 */
-	if (ipipe && opipe)
+	if (ipipe && opipe && ipipe != opipe)
 		return link_pipe(ipipe, opipe, len, flags);
 
 	return -EINVAL;

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: splice/tee bugs?
  2006-07-07 13:12             ` Jens Axboe
@ 2006-07-07 13:14               ` Jens Axboe
  2006-07-07 13:21                 ` Arjan van de Ven
  2006-07-07 14:05               ` Michael Kerrisk
  1 sibling, 1 reply; 34+ messages in thread
From: Jens Axboe @ 2006-07-07 13:14 UTC (permalink / raw)
  To: Michael Kerrisk; +Cc: mtk-manpages, linux-kernel, akpm, Ingo Molnar

On Fri, Jul 07 2006, Jens Axboe wrote:
> On Fri, Jul 07 2006, Jens Axboe wrote:
> > On Fri, Jul 07 2006, Michael Kerrisk wrote:
> > > Jens Axboe wrote:
> > > 
> > > > > > > >    In this case I can't kill it with ^C or ^\.  This is a 
> > > > > > > >    hard-to-reproduce behaviour on my (x86) system, but I have 
> > > > > > > >    seen it several times by now.
> > > > > > > 
> > > > > > > aka local DoS.  Please capture sysrq-T output next time.
> > > [...]
> > > > > I'll see about reproducing locally.
> > > > 
> > > > With your modified ktee, I can reproduce it here. Here's the ktee and wc
> > > > output:
> > > 
> > > Good; thanks.
> > > 
> > > By the way, what about points a) and b) in my original mail
> > > in this thread?
> > 
> > I'll look at them after this.
> 
> I _think_ it was due to a bad check for ipipe->nrbufs, can you see if
> this works for you? It also changes some other things:
> 
> - instead of returning EAGAIN on nothing tee'd because of the possible
>   deadlock problem, release/regrab the ipipe/opipe mutex if we have to.
>   This makes sys_tee block for that case if SPLICE_F_NONBLOCK isn't set.
> 
> - Check that ipipe and opipe differ to avoid possible deadlock if user
>   gives the same pipe.
> 
> You can still see 0 results without SPLICE_F_NONBLOCK set, if we have no
> writers for instance. This is expected, not much we can do about that as
> we cannot block for that condition.

BTW, I'm seeing an odd lockdep message on the first invocation of the
test:

=============================================
[ INFO: possible recursive locking detected ]
---------------------------------------------
ktee2/6208 is trying to acquire lock:
 (&inode->i_mutex){--..}, at: [<c03922c6>] mutex_lock+0x1c/0x1f

but task is already holding lock:
 (&inode->i_mutex){--..}, at: [<c03922c6>] mutex_lock+0x1c/0x1f

other info that might help us debug this:
1 lock held by ktee2/6208:
 #0:  (&inode->i_mutex){--..}, at: [<c03922c6>] mutex_lock+0x1c/0x1f

stack backtrace:
 [<c01041ab>] show_trace+0x12/0x14
 [<c0104874>] dump_stack+0x19/0x1b
 [<c01399b6>] __lock_acquire+0x645/0xc77
 [<c013a32a>] lock_acquire+0x5d/0x79
 [<c0392082>] __mutex_lock_slowpath+0x6e/0x296
 [<c03922c6>] mutex_lock+0x1c/0x1f
 [<c018d37f>] sys_tee+0x292/0x4a4
 [<c0103075>] sysenter_past_esp+0x56/0x8d

I cannot see where this could be happening, Ingo is this valid?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: splice/tee bugs?
  2006-07-07 13:14               ` Jens Axboe
@ 2006-07-07 13:21                 ` Arjan van de Ven
  2006-07-07 13:26                   ` Jens Axboe
  0 siblings, 1 reply; 34+ messages in thread
From: Arjan van de Ven @ 2006-07-07 13:21 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Michael Kerrisk, mtk-manpages, linux-kernel, akpm, Ingo Molnar


> I cannot see where this could be happening, Ingo is this valid?

maybe the test found a way to exit the kernel previously while holding
the lock ?

that would be highly lethal in any scenario.. lockdep would just be the
messenger here


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: splice/tee bugs?
  2006-07-07 13:21                 ` Arjan van de Ven
@ 2006-07-07 13:26                   ` Jens Axboe
  2006-07-07 13:54                     ` Paulo Marques
  0 siblings, 1 reply; 34+ messages in thread
From: Jens Axboe @ 2006-07-07 13:26 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Michael Kerrisk, mtk-manpages, linux-kernel, akpm, Ingo Molnar

On Fri, Jul 07 2006, Arjan van de Ven wrote:
> 
> > I cannot see where this could be happening, Ingo is this valid?
> 
> maybe the test found a way to exit the kernel previously while holding
> the lock ?

I don't see how that could happen. The function in question is
fs/splice.c:link_pipe(). There are no returns in that function, it
always just breaks out and unlocks the two mutexes again.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: splice/tee bugs?
  2006-07-07 13:26                   ` Jens Axboe
@ 2006-07-07 13:54                     ` Paulo Marques
  2006-07-07 14:02                       ` Jens Axboe
  0 siblings, 1 reply; 34+ messages in thread
From: Paulo Marques @ 2006-07-07 13:54 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Arjan van de Ven, Michael Kerrisk, mtk-manpages, linux-kernel,
	akpm, Ingo Molnar

Jens Axboe wrote:
> On Fri, Jul 07 2006, Arjan van de Ven wrote:
>>> I cannot see where this could be happening, Ingo is this valid?
>> maybe the test found a way to exit the kernel previously while holding
>> the lock ?
> 
> I don't see how that could happen. The function in question is
> fs/splice.c:link_pipe(). There are no returns in that function, it
> always just breaks out and unlocks the two mutexes again.

AFAICS, in the case that you don't release any lock before entering 
pipe_wait (because of the lock ordering), pipe_wait just releases one of 
the locks and then schedules with the other lock still held.

BTW, the comment over the second pipe_wait was copy+pasted and is 
reversed ;)

-- 
Paulo Marques - www.grupopie.com

Pointy-Haired Boss: I don't see anything that could stand in our way.
            Dilbert: Sanity? Reality? The laws of physics?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: splice/tee bugs?
  2006-07-07 13:54                     ` Paulo Marques
@ 2006-07-07 14:02                       ` Jens Axboe
  0 siblings, 0 replies; 34+ messages in thread
From: Jens Axboe @ 2006-07-07 14:02 UTC (permalink / raw)
  To: Paulo Marques
  Cc: Arjan van de Ven, Michael Kerrisk, mtk-manpages, linux-kernel,
	akpm, Ingo Molnar

On Fri, Jul 07 2006, Paulo Marques wrote:
> Jens Axboe wrote:
> >On Fri, Jul 07 2006, Arjan van de Ven wrote:
> >>>I cannot see where this could be happening, Ingo is this valid?
> >>maybe the test found a way to exit the kernel previously while holding
> >>the lock ?
> >
> >I don't see how that could happen. The function in question is
> >fs/splice.c:link_pipe(). There are no returns in that function, it
> >always just breaks out and unlocks the two mutexes again.
> 
> AFAICS, in the case that you don't release any lock before entering 
> pipe_wait (because of the lock ordering), pipe_wait just releases one of 
> the locks and then schedules with the other lock still held.

That should not violate the lock ordering, though. I'm testing an easier
fix now, basically always grabbing the ipipe mutex first and never
blocking on the input pipe. Makes sense too, we will attempt to dupe the
contents of that pipe from when sys_tee() was invoked. We cannot
reliably have the pipe changing too much in progress anyway.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: splice/tee bugs?
  2006-07-07 13:12             ` Jens Axboe
  2006-07-07 13:14               ` Jens Axboe
@ 2006-07-07 14:05               ` Michael Kerrisk
  2006-07-07 14:08                 ` Jens Axboe
  1 sibling, 1 reply; 34+ messages in thread
From: Michael Kerrisk @ 2006-07-07 14:05 UTC (permalink / raw)
  To: Jens Axboe, michael.kerrisk; +Cc: akpm, linux-kernel

Jens, the patch does not compile...

CC      fs/splice.o
fs/splice.c: In function 'link_pipe':
fs/splice.c:1448: error: expected 'while' before 'mutex_unlock'
make[1]: *** [fs/splice.o] Error 1
make: *** [fs] Error 2

-- 
Michael Kerrisk

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: splice/tee bugs?
  2006-07-07 14:05               ` Michael Kerrisk
@ 2006-07-07 14:08                 ` Jens Axboe
  0 siblings, 0 replies; 34+ messages in thread
From: Jens Axboe @ 2006-07-07 14:08 UTC (permalink / raw)
  To: Michael Kerrisk; +Cc: michael.kerrisk, akpm, linux-kernel

On Fri, Jul 07 2006, Michael Kerrisk wrote:
> Jens, the patch does not compile...
> 
> CC      fs/splice.o
> fs/splice.c: In function 'link_pipe':
> fs/splice.c:1448: error: expected 'while' before 'mutex_unlock'
> make[1]: *** [fs/splice.o] Error 1
> make: *** [fs] Error 2

Woops, missing an ending while (0); I'll send out a new one, I need to
be happy with it first...
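
A C detail that matters when a for (;;) loop is turned into a do/while:
inside a do/while body, continue jumps to the loop condition, so with
while (0) it ends the loop and with while (1) it restarts it. The
posted patch keeps a continue after pipe_wait(opipe), and the thread
does not show which form the corrected patch uses. The sketch below
only illustrates the language rule; the function names are hypothetical.

```c
#include <assert.h>

/* Body runs once: continue jumps to the `while (0)` test, which fails. */
static inline int passes_do_while_0(void)
{
    int passes = 0;

    do {
        passes++;
        if (passes == 1)
            continue;
    } while (0);

    return passes;
}

/* Body runs twice: continue jumps to the `while (1)` test and restarts. */
static inline int passes_do_while_1(void)
{
    int passes = 0;

    do {
        passes++;
        if (passes == 1)
            continue;
        break;
    } while (1);

    return passes;
}

/* Self-check of the two behaviours described above. */
static inline void check_loop_forms(void)
{
    assert(passes_do_while_0() == 1);
    assert(passes_do_while_1() == 2);
}
```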

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: splice/tee bugs?
  2006-07-07 11:07 ` Andrew Morton
  2006-07-07 11:42   ` Michael Kerrisk
@ 2006-07-07 16:13   ` Luiz Fernando N. Capitulino
  2006-07-07 21:43     ` Luiz Fernando N. Capitulino
  2006-07-08  6:41     ` Jens Axboe
  1 sibling, 2 replies; 34+ messages in thread
From: Luiz Fernando N. Capitulino @ 2006-07-07 16:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michael Kerrisk, axboe, linux-kernel, michael.kerrisk, vendor-sec

On Fri, 7 Jul 2006 04:07:49 -0700
Andrew Morton <akpm@osdl.org> wrote:

| On Fri, 07 Jul 2006 09:07:03 +0200
| "Michael Kerrisk" <mtk-manpages@gmx.net> wrote:
| 
| > c) Occasionally the command line just hangs, producing no output.
| >    In this case I can't kill it with ^C or ^\.  This is a 
| >    hard-to-reproduce behaviour on my (x86) system, but I have 
| >    seen it several times by now.
| 
| aka local DoS.  Please capture sysrq-T output next time.

 If I run lots of them in parallel, I get the following OOPs in a few
seconds:

Jul  7 13:04:52 doriath kernel: [  105.041722] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018
Jul  7 13:04:52 doriath kernel: [  105.048885]  printing eip:
Jul  7 13:04:52 doriath kernel: [  105.056095] c01790c7
Jul  7 13:04:52 doriath kernel: [  105.056097] *pde = 00000000
Jul  7 13:04:52 doriath kernel: [  105.063516] Oops: 0000 [#1]
Jul  7 13:04:52 doriath kernel: [  105.071116] Modules linked in: ipv6 capability commoncap snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq via_rhine mii snd_pcm_oss snd_mixer_oss af_packet snd_via82xx gameport snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore rfcomm l2cap bluetooth ide_cd cdrom binfmt_misc loop sata_via libata scsi_mod video thermal processor fan container button battery asus_acpi ac amd64_agp agpgart ehci_hcd uhci_hcd usbcore xfs
Jul  7 13:04:52 doriath kernel: [  105.129492] CPU:    0
Jul  7 13:04:52 doriath kernel: [  105.129494] EIP:    0060:[sys_tee+371/924]    Not tainted VLI
Jul  7 13:04:52 doriath kernel: [  105.129494] EIP:    0060:[<c01790c7>]    Not tainted VLI
Jul  7 13:04:52 doriath kernel: [  105.129495] EFLAGS: 00010293   (2.6.18-rc1 #8) 
Jul  7 13:04:52 doriath kernel: [  105.170966] EIP is at sys_tee+0x173/0x39c
Jul  7 13:04:52 doriath kernel: [  105.185414] eax: d62bfa00   ebx: 00000000   ecx: 00000000   edx: d62bfa98
Jul  7 13:04:52 doriath kernel: [  105.200731] esi: d7434800   edi: d62bfa98   ebp: d5d5cfb4   esp: d5d5cf84
Jul  7 13:04:52 doriath kernel: [  105.216341] ds: 007b   es: 007b   ss: 0068
Jul  7 13:04:52 doriath kernel: [  105.232017] Process ktee (pid: 12605, ti=d5d5c000 task=d9cce0b0 task.ti=d5d5c000)
Jul  7 13:04:52 doriath kernel: [  105.233023] Stack: d5eede40 00000000 d827ac00 00000002 00000000 d62bfa00 00000000 00000000 
Jul  7 13:04:52 doriath kernel: [  105.250147]        00000000 00000000 00000000 b7f72920 d5d5c000 c0102b7d 00000000 00000001 
Jul  7 13:04:52 doriath kernel: [  105.267904]        7fffffff 00000000 b7f72920 bf8f37b8 0000013b 0000007b 0000007b 0000013b 
Jul  7 13:04:52 doriath kernel: [  105.286091] Call Trace:
Jul  7 13:04:52 doriath kernel: [  105.321546]  [show_stack_log_lvl+140/151] show_stack_log_lvl+0x8c/0x97
Jul  7 13:04:52 doriath kernel: [  105.321546]  [<c010422c>] show_stack_log_lvl+0x8c/0x97
Jul  7 13:04:52 doriath kernel: [  105.340519]  [show_registers+292/401] show_registers+0x124/0x191
Jul  7 13:04:52 doriath kernel: [  105.340519]  [<c0104397>] show_registers+0x124/0x191
Jul  7 13:04:52 doriath kernel: [  105.359642]  [die+332/617] die+0x14c/0x269
Jul  7 13:04:53 doriath kernel: [  105.359642]  [<c0104550>] die+0x14c/0x269
Jul  7 13:04:53 doriath kernel: [  105.378978]  [do_page_fault+1091/1310] do_page_fault+0x443/0x51e
Jul  7 13:04:53 doriath kernel: [  105.378978]  [<c02a6521>] do_page_fault+0x443/0x51e
Jul  7 13:04:53 doriath kernel: [  105.398696]  [error_code+57/64] error_code+0x39/0x40
Jul  7 13:04:53 doriath kernel: [  105.398696]  [<c0103d49>] error_code+0x39/0x40
Jul  7 13:04:53 doriath kernel: [  105.418612]  [sysenter_past_esp+86/121] sysenter_past_esp+0x56/0x79
Jul  7 13:04:54 doriath kernel: [  105.418612]  [<c0102b7d>] sysenter_past_esp+0x56/0x79
Jul  7 13:04:54 doriath kernel: [  105.438935] Code: 00 00 00 89 d0 8b 55 e4 03 42 6c 83 e0 0f 6b c0 14 8d 7c 10 70 8b 46 68 89 45 e0 83 f8 0f 77 5c 8b 4f 0c 8b 5e 6c 89 fa 8b 45 e4 <ff> 51 18 03 5d e0 83 e3 0f 89 fa 6b db 14 b9 14 00 00 00 8d 5c 
Jul  7 13:04:54 doriath kernel: [  105.506704] EIP: [sys_tee+371/924] sys_tee+0x173/0x39c SS:ESP 0068:d5d5cf84
Jul  7 13:04:54 doriath kernel: [  105.506704] EIP: [<c01790c7>] sys_tee+0x173/0x39c SS:ESP 0068:d5d5cf84

-- 
Luiz Fernando N. Capitulino

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: splice/tee bugs?
  2006-07-07 16:13   ` Luiz Fernando N. Capitulino
@ 2006-07-07 21:43     ` Luiz Fernando N. Capitulino
  2006-07-08  6:41     ` Jens Axboe
  1 sibling, 0 replies; 34+ messages in thread
From: Luiz Fernando N. Capitulino @ 2006-07-07 21:43 UTC (permalink / raw)
  To: Luiz Fernando N. Capitulino
  Cc: Andrew Morton, Michael Kerrisk, axboe, linux-kernel,
	michael.kerrisk, vendor-sec

On Fri, 7 Jul 2006 13:13:10 -0300
"Luiz Fernando N. Capitulino" <lcapitulino@mandriva.com.br> wrote:

| On Fri, 7 Jul 2006 04:07:49 -0700
| Andrew Morton <akpm@osdl.org> wrote:
| 
| | On Fri, 07 Jul 2006 09:07:03 +0200
| | "Michael Kerrisk" <mtk-manpages@gmx.net> wrote:
| | 
| | > c) Occasionally the command line just hangs, producing no output.
| | >    In this case I can't kill it with ^C or ^\.  This is a 
| | >    hard-to-reproduce behaviour on my (x86) system, but I have 
| | >    seen it several times by now.
| | 
| | aka local DoS.  Please capture sysrq-T output next time.
| 
|  If I run lots of them in parallel, I get the following OOPs in a few
| seconds:
| 
| Jul  7 13:04:52 doriath kernel: [  105.041722] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018
| Jul  7 13:04:52 doriath kernel: [  105.048885]  printing eip:
| Jul  7 13:04:52 doriath kernel: [  105.056095] c01790c7
| Jul  7 13:04:52 doriath kernel: [  105.056097] *pde = 00000000
| Jul  7 13:04:52 doriath kernel: [  105.063516] Oops: 0000 [#1]
| Jul  7 13:04:52 doriath kernel: [  105.071116] Modules linked in: ipv6 capability commoncap snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq via_rhine mii snd_pcm_oss snd_mixer_oss af_packet snd_via82xx gameport snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore rfcomm l2cap bluetooth ide_cd cdrom binfmt_misc loop sata_via libata scsi_mod video thermal processor fan container button battery asus_acpi ac amd64_agp agpgart ehci_hcd uhci_hcd usbcore xfs
| Jul  7 13:04:52 doriath kernel: [  105.129492] CPU:    0
| Jul  7 13:04:52 doriath kernel: [  105.129494] EIP:    0060:[sys_tee+371/924]    Not tainted VLI
| Jul  7 13:04:52 doriath kernel: [  105.129494] EIP:    0060:[<c01790c7>]    Not tainted VLI
| Jul  7 13:04:52 doriath kernel: [  105.129495] EFLAGS: 00010293   (2.6.18-rc1 #8) 
| Jul  7 13:04:52 doriath kernel: [  105.170966] EIP is at sys_tee+0x173/0x39c
| Jul  7 13:04:52 doriath kernel: [  105.185414] eax: d62bfa00   ebx: 00000000   ecx: 00000000   edx: d62bfa98
| Jul  7 13:04:52 doriath kernel: [  105.200731] esi: d7434800   edi: d62bfa98   ebp: d5d5cfb4   esp: d5d5cf84
| Jul  7 13:04:52 doriath kernel: [  105.216341] ds: 007b   es: 007b   ss: 0068
| Jul  7 13:04:52 doriath kernel: [  105.232017] Process ktee (pid: 12605, ti=d5d5c000 task=d9cce0b0 task.ti=d5d5c000)
| Jul  7 13:04:52 doriath kernel: [  105.233023] Stack: d5eede40 00000000 d827ac00 00000002 00000000 d62bfa00 00000000 00000000 
| Jul  7 13:04:52 doriath kernel: [  105.250147]        00000000 00000000 00000000 b7f72920 d5d5c000 c0102b7d 00000000 00000001 
| Jul  7 13:04:52 doriath kernel: [  105.267904]        7fffffff 00000000 b7f72920 bf8f37b8 0000013b 0000007b 0000007b 0000013b 
| Jul  7 13:04:52 doriath kernel: [  105.286091] Call Trace:
| Jul  7 13:04:52 doriath kernel: [  105.321546]  [show_stack_log_lvl+140/151] show_stack_log_lvl+0x8c/0x97
| Jul  7 13:04:52 doriath kernel: [  105.321546]  [<c010422c>] show_stack_log_lvl+0x8c/0x97
| Jul  7 13:04:52 doriath kernel: [  105.340519]  [show_registers+292/401] show_registers+0x124/0x191
| Jul  7 13:04:52 doriath kernel: [  105.340519]  [<c0104397>] show_registers+0x124/0x191
| Jul  7 13:04:52 doriath kernel: [  105.359642]  [die+332/617] die+0x14c/0x269
| Jul  7 13:04:53 doriath kernel: [  105.359642]  [<c0104550>] die+0x14c/0x269
| Jul  7 13:04:53 doriath kernel: [  105.378978]  [do_page_fault+1091/1310] do_page_fault+0x443/0x51e
| Jul  7 13:04:53 doriath kernel: [  105.378978]  [<c02a6521>] do_page_fault+0x443/0x51e
| Jul  7 13:04:53 doriath kernel: [  105.398696]  [error_code+57/64] error_code+0x39/0x40
| Jul  7 13:04:53 doriath kernel: [  105.398696]  [<c0103d49>] error_code+0x39/0x40
| Jul  7 13:04:53 doriath kernel: [  105.418612]  [sysenter_past_esp+86/121] sysenter_past_esp+0x56/0x79
| Jul  7 13:04:54 doriath kernel: [  105.418612]  [<c0102b7d>] sysenter_past_esp+0x56/0x79
| Jul  7 13:04:54 doriath kernel: [  105.438935] Code: 00 00 00 89 d0 8b 55 e4 03 42 6c 83 e0 0f 6b c0 14 8d 7c 10 70 8b 46 68 89 45 e0 83 f8 0f 77 5c 8b 4f 0c 8b 5e 6c 89 fa 8b 45 e4 <ff> 51 18 03 5d e0 83 e3 0f 89 fa 6b db 14 b9 14 00 00 00 8d 5c 
| Jul  7 13:04:54 doriath kernel: [  105.506704] EIP: [sys_tee+371/924] sys_tee+0x173/0x39c SS:ESP 0068:d5d5cf84
| Jul  7 13:04:54 doriath kernel: [  105.506704] EIP: [<c01790c7>] sys_tee+0x173/0x39c SS:ESP 0068:d5d5cf84
| 

 Reproducible with 2.6.17.4. Can we get a CVE number for this?

-- 
Luiz Fernando N. Capitulino

* Re: splice/tee bugs?
@ 2006-07-08  5:33 Chuck Ebbert
  0 siblings, 0 replies; 34+ messages in thread
From: Chuck Ebbert @ 2006-07-08  5:33 UTC (permalink / raw)
  To: Luiz Fernando N. Capitulino
  Cc: vendor-sec, Michael Kerrisk, linux-kernel, Jens Axboe,
	Andrew Morton

In-Reply-To: <20060707131310.0e382585@doriath.conectiva>

On Fri, 7 Jul 2006 13:13:10 -0300, Luiz Fernando N. Capitulino wrote:

> | > c) Occasionally the command line just hangs, producing no output.
> | >    In this case I can't kill it with ^C or ^\.  This is a 
> | >    hard-to-reproduce behaviour on my (x86) system, but I have 
> | >    seen it several times by now.
> | 
> | aka local DoS.  Please capture sysrq-T output next time.
> 
>  If I run lots of them in parallel, I get the following OOPs in a few
> seconds:
> 
> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018
>  printing eip:
> c01790c7
> *pde = 00000000
> Oops: 0000 [#1]
> Modules linked in: ipv6 capability commoncap snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq via_rhine mii snd_pcm_oss snd_mixer_oss af_packet snd_via82xx gameport snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd_page_alloc snd_mpu401_uart sn
> CPU:    0
> EIP:    0060:[sys_tee+371/924]    Not tainted VLI
> EIP:    0060:[<c01790c7>]    Not tainted VLI
> EFLAGS: 00010293   (2.6.18-rc1 #8) 
> EIP is at sys_tee+0x173/0x39c
> eax: d62bfa00   ebx: 00000000   ecx: 00000000   edx: d62bfa98
> esi: d7434800   edi: d62bfa98   ebp: d5d5cfb4   esp: d5d5cf84
> ds: 007b   es: 007b   ss: 0068
> Process ktee (pid: 12605, ti=d5d5c000 task=d9cce0b0 task.ti=d5d5c000)
> Stack: d5eede40 00000000 d827ac00 00000002 00000000 d62bfa00 00000000 00000000 
>        00000000 00000000 00000000 b7f72920 d5d5c000 c0102b7d 00000000 00000001 
>        7fffffff 00000000 b7f72920 bf8f37b8 0000013b 0000007b 0000007b 0000013b 
> Call Trace:
>  [<c010422c>] show_stack_log_lvl+0x8c/0x97
>  [<c0104397>] show_registers+0x124/0x191
>  [<c0104550>] die+0x14c/0x269
>  [<c02a6521>] do_page_fault+0x443/0x51e
>  [<c0103d49>] error_code+0x39/0x40
>  [<c0102b7d>] sysenter_past_esp+0x56/0x79
> Code: 00 00 00 89 d0 8b 55 e4 03 42 6c 83 e0 0f 6b c0 14 8d 7c 10 70 8b 46 68 89 45 e0 83 f8 0f 77 5c 8b 4f 0c 8b 5e 6c 89 fa 8b 45 e4 <ff> 51 18 03 5d e0 83 e3 0f 89 fa 6b db 14 b9 14 00 00 00 8d 5c 


ibuf->ops is NULL in the code below (fs/splice.c line 1355 in 2.6.18-rc1):


static int link_pipe(struct pipe_inode_info *ipipe,
                     struct pipe_inode_info *opipe,
                     size_t len, unsigned int flags)
{
        struct pipe_buffer *ibuf, *obuf;
        int ret, do_wakeup, i, ipipe_first;

        ret = do_wakeup = ipipe_first = 0;

        /*
         * Potential ABBA deadlock, work around it by ordering lock
         * grabbing by inode address. Otherwise two different processes
         * could deadlock (one doing tee from A -> B, the other from B -> A).
         */
        if (ipipe->inode < opipe->inode) {
                ipipe_first = 1;
                mutex_lock(&ipipe->inode->i_mutex);
                mutex_lock(&opipe->inode->i_mutex);
        } else {
                mutex_lock(&opipe->inode->i_mutex);
                mutex_lock(&ipipe->inode->i_mutex);
        }

        for (i = 0;; i++) {
                if (!opipe->readers) {
                        send_sig(SIGPIPE, current, 0);
                        if (!ret)
                                ret = -EPIPE;
                        break;
                }
                if (ipipe->nrbufs - i) {
                        ibuf = ipipe->bufs + ((ipipe->curbuf + i) & (PIPE_BUFFERS - 1));

                        /*
                         * If we have room, fill this buffer
                         */
                        if (opipe->nrbufs < PIPE_BUFFERS) {
                                int nbuf = (opipe->curbuf + opipe->nrbufs) & (PIPE_BUFFERS - 1);

                                /*
                                 * Get a reference to this pipe buffer,
                                 * so we can copy the contents over.
                                 */
========>                       ibuf->ops->get(ipipe, ibuf);

                                obuf = opipe->bufs + nbuf;
                                *obuf = *ibuf;
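
A minimal userspace sketch of how the marked line could end up running
on an unused slot, assuming the "i vs nrbufs fix" mentioned elsewhere in
this thread refers to the loop test: "ipipe->nrbufs - i" is nonzero while
buffers remain, but it is also nonzero once i has moved past nrbufs,
whereas a plain comparison stops in both cases. The helper names are
hypothetical and this is not kernel code.

```c
#include <stdio.h>

#define PIPE_BUFFERS 16	/* pipe ring size in the 2.6.17/2.6.18 code discussed here */

/* Old-style test: true whenever nrbufs - i is nonzero, negative included. */
static int old_has_input(int nrbufs, int i)
{
	return (nrbufs - i) != 0;
}

/* Comparison-style test: true only while i indexes a filled buffer. */
static int new_has_input(int nrbufs, int i)
{
	return i < nrbufs;
}

/* Ring slot visited on iteration i, mirroring the ibuf computation. */
static int ring_slot(int curbuf, int i)
{
	return (curbuf + i) & (PIPE_BUFFERS - 1);
}

#ifdef TEE_DEMO_MAIN
int main(void)
{
	int nrbufs = 2, curbuf = 15, i;

	/* Walk one step past the filled buffers and compare both tests. */
	for (i = 0; i <= 3; i++)
		printf("i=%d slot=%d old=%d new=%d\n", i, ring_slot(curbuf, i),
		       old_has_input(nrbufs, i), new_has_input(nrbufs, i));
	return 0;
}
#endif
```

Built with -DTEE_DEMO_MAIN, the i=3 row is the interesting one: the old
test still reports input although only two buffers are filled, so the
loop would go on to touch a slot whose ops pointer was never set.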

-- 
Chuck
 "You can't read a newspaper if you can't read."  --George W. Bush

* Re: splice/tee bugs?
  2006-07-07 16:13   ` Luiz Fernando N. Capitulino
  2006-07-07 21:43     ` Luiz Fernando N. Capitulino
@ 2006-07-08  6:41     ` Jens Axboe
  2006-07-08 21:09       ` Luiz Fernando N. Capitulino
  1 sibling, 1 reply; 34+ messages in thread
From: Jens Axboe @ 2006-07-08  6:41 UTC (permalink / raw)
  To: Luiz Fernando N. Capitulino
  Cc: Andrew Morton, Michael Kerrisk, linux-kernel, michael.kerrisk,
	vendor-sec

On Fri, Jul 07 2006, Luiz Fernando N. Capitulino wrote:
> On Fri, 7 Jul 2006 04:07:49 -0700
> Andrew Morton <akpm@osdl.org> wrote:
> 
> | On Fri, 07 Jul 2006 09:07:03 +0200
> | "Michael Kerrisk" <mtk-manpages@gmx.net> wrote:
> | 
> | > c) Occasionally the command line just hangs, producing no output.
> | >    In this case I can't kill it with ^C or ^\.  This is a 
> | >    hard-to-reproduce behaviour on my (x86) system, but I have 
> | >    seen it several times by now.
> | 
> | aka local DoS.  Please capture sysrq-T output next time.
> 
>  If I run lots of them in parallel, I get the following OOPs in a few
> seconds:

With the patch posted? You need the i vs nrbufs fix.

-- 
Jens Axboe


* Re: splice/tee bugs?
  2006-07-08  6:41     ` Jens Axboe
@ 2006-07-08 21:09       ` Luiz Fernando N. Capitulino
  2006-07-09 10:36         ` Jens Axboe
  0 siblings, 1 reply; 34+ messages in thread
From: Luiz Fernando N. Capitulino @ 2006-07-08 21:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Andrew Morton, Michael Kerrisk, linux-kernel, michael.kerrisk,
	vendor-sec


 Hi Jens,

On Sat, 8 Jul 2006 08:41:32 +0200
Jens Axboe <axboe@suse.de> wrote:

| On Fri, Jul 07 2006, Luiz Fernando N. Capitulino wrote:
| > On Fri, 7 Jul 2006 04:07:49 -0700
| > Andrew Morton <akpm@osdl.org> wrote:
| > 
| > | On Fri, 07 Jul 2006 09:07:03 +0200
| > | "Michael Kerrisk" <mtk-manpages@gmx.net> wrote:
| > | 
| > | > c) Occasionally the command line just hangs, producing no output.
| > | >    In this case I can't kill it with ^C or ^\.  This is a 
| > | >    hard-to-reproduce behaviour on my (x86) system, but I have 
| > | >    seen it several times by now.
| > | 
| > | aka local DoS.  Please capture sysrq-T output next time.
| > 
| >  If I run lots of them in parallel, I get the following OOPs in a few
| > seconds:
| 
| With the patch posted? You need the i vs nrbufs fix.

 Yes, it fixes the problem. I didn't try it before because I thought
you were going to double check it [1].

 Is it suitable for -stable then?

[1] http://lkml.org/lkml/2006/7/7/158

-- 
Luiz Fernando N. Capitulino

* Re: splice/tee bugs?
  2006-07-08 21:09       ` Luiz Fernando N. Capitulino
@ 2006-07-09 10:36         ` Jens Axboe
  2006-07-09 11:16           ` Jens Axboe
  0 siblings, 1 reply; 34+ messages in thread
From: Jens Axboe @ 2006-07-09 10:36 UTC (permalink / raw)
  To: Luiz Fernando N. Capitulino
  Cc: Andrew Morton, Michael Kerrisk, linux-kernel, michael.kerrisk,
	vendor-sec

On Sat, Jul 08 2006, Luiz Fernando N. Capitulino wrote:
> 
>  Hi Jens,
> 
> On Sat, 8 Jul 2006 08:41:32 +0200
> Jens Axboe <axboe@suse.de> wrote:
> 
> | On Fri, Jul 07 2006, Luiz Fernando N. Capitulino wrote:
> | > On Fri, 7 Jul 2006 04:07:49 -0700
> | > Andrew Morton <akpm@osdl.org> wrote:
> | > 
> | > | On Fri, 07 Jul 2006 09:07:03 +0200
> | > | "Michael Kerrisk" <mtk-manpages@gmx.net> wrote:
> | > | 
> | > | > c) Occasionally the command line just hangs, producing no output.
> | > | >    In this case I can't kill it with ^C or ^\.  This is a 
> | > | >    hard-to-reproduce behaviour on my (x86) system, but I have 
> | > | >    seen it several times by now.
> | > | 
> | > | aka local DoS.  Please capture sysrq-T output next time.
> | > 
> | >  If I run lots of them in parallel, I get the following OOPs in a few
> | > seconds:
> | 
> | With the patch posted? You need the i vs nrbufs fix.
> 
>  Yes, it fixes the problem. I didn't try it before because I thought
> you were going to double check it [1].

Yeah the patch needs reworking, however the isolated i vs nrbufs fix is
safe enough on its own. I'll post a full patch for inclusion, I'm afraid
I won't be able to fully test it enough for submitting it until tomorrow
though.

-- 
Jens Axboe


* Re: splice/tee bugs?
  2006-07-09 10:36         ` Jens Axboe
@ 2006-07-09 11:16           ` Jens Axboe
  2006-07-09 16:47             ` Luiz Fernando N. Capitulino
  0 siblings, 1 reply; 34+ messages in thread
From: Jens Axboe @ 2006-07-09 11:16 UTC (permalink / raw)
  To: Luiz Fernando N. Capitulino
  Cc: Andrew Morton, Michael Kerrisk, linux-kernel, michael.kerrisk,
	vendor-sec

On Sun, Jul 09 2006, Jens Axboe wrote:
> On Sat, Jul 08 2006, Luiz Fernando N. Capitulino wrote:
> > 
> >  Hi Jens,
> > 
> > On Sat, 8 Jul 2006 08:41:32 +0200
> > Jens Axboe <axboe@suse.de> wrote:
> > 
> > | On Fri, Jul 07 2006, Luiz Fernando N. Capitulino wrote:
> > | > On Fri, 7 Jul 2006 04:07:49 -0700
> > | > Andrew Morton <akpm@osdl.org> wrote:
> > | > 
> > | > | On Fri, 07 Jul 2006 09:07:03 +0200
> > | > | "Michael Kerrisk" <mtk-manpages@gmx.net> wrote:
> > | > | 
> > | > | > c) Occasionally the command line just hangs, producing no output.
> > | > | >    In this case I can't kill it with ^C or ^\.  This is a 
> > | > | >    hard-to-reproduce behaviour on my (x86) system, but I have 
> > | > | >    seen it several times by now.
> > | > | 
> > | > | aka local DoS.  Please capture sysrq-T output next time.
> > | > 
> > | >  If I run lots of them in parallel, I get the following OOPs in a few
> > | > seconds:
> > | 
> > | With the patch posted? You need the i vs nrbufs fix.
> > 
> >  Yes, it fixes the problem. I didn't try it before because I thought
> > you were going to double check it [1].
> 
> Yeah the patch needs reworking, however the isolated i vs nrbufs fix is
> safe enough on its own. I'll post a full patch for inclusion, I'm afraid
> I won't be able to fully test it enough for submitting it until tomorrow
> though.

Something like this. Testing would be appreciated! Michael, can you
repeat your testing as well? Thanks.

diff --git a/fs/splice.c b/fs/splice.c
index 05fd278..ecf72bc 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1307,6 +1307,69 @@ asmlinkage long sys_splice(int fd_in, lo
 }
 
 /*
+ * Make sure there's data to read. Wait for input if we can, otherwise
+ * return an appropriate error.
+ */
+static int link_ipipe_prep(struct pipe_inode_info *pipe, unsigned int flags)
+{
+	int ret = 0;
+
+	mutex_lock(&pipe->inode->i_mutex);
+
+	while (!pipe->nrbufs) {
+		if (signal_pending(current)) {
+			ret = -ERESTARTSYS;
+			break;
+		}
+		if (!pipe->writers)
+			break;
+		if (!pipe->waiting_writers) {
+			if (flags & SPLICE_F_NONBLOCK) {
+				ret = -EAGAIN;
+				break;
+			}
+		}
+		pipe_wait(pipe);
+	}
+	
+	mutex_unlock(&pipe->inode->i_mutex);
+	return ret;
+}
+
+/*
+ * Make sure there's writeable room. Wait for room if we can, otherwise
+ * return an appropriate error.
+ */
+static int link_opipe_prep(struct pipe_inode_info *pipe, unsigned int flags)
+{
+	int ret = 0;
+
+	mutex_lock(&pipe->inode->i_mutex);
+
+	while (pipe->nrbufs >= PIPE_BUFFERS) {
+		if (!pipe->readers) {
+			send_sig(SIGPIPE, current, 0);
+			ret = -EPIPE;
+			break;
+		}
+		if (flags & SPLICE_F_NONBLOCK) {
+			ret = -EAGAIN;
+			break;
+		}
+		if (signal_pending(current)) {
+			ret = -ERESTARTSYS;
+			break;
+		}
+		pipe->waiting_writers++;
+		pipe_wait(pipe);
+		pipe->waiting_writers--;
+	}
+
+	mutex_unlock(&pipe->inode->i_mutex);
+	return ret;
+}
+
+/*
  * Link contents of ipipe to opipe.
  */
 static int link_pipe(struct pipe_inode_info *ipipe,
@@ -1314,9 +1377,9 @@ static int link_pipe(struct pipe_inode_i
 		     size_t len, unsigned int flags)
 {
 	struct pipe_buffer *ibuf, *obuf;
-	int ret, do_wakeup, i, ipipe_first;
+	int ret, i, nbuf;
 
-	ret = do_wakeup = ipipe_first = 0;
+	i = ret = 0;
 
 	/*
 	 * Potential ABBA deadlock, work around it by ordering lock
@@ -1324,131 +1387,65 @@ static int link_pipe(struct pipe_inode_i
 	 * could deadlock (one doing tee from A -> B, the other from B -> A).
 	 */
 	if (ipipe->inode < opipe->inode) {
-		ipipe_first = 1;
-		mutex_lock(&ipipe->inode->i_mutex);
-		mutex_lock(&opipe->inode->i_mutex);
+		mutex_lock_nested(&ipipe->inode->i_mutex, I_MUTEX_PARENT);
+		mutex_lock_nested(&opipe->inode->i_mutex, I_MUTEX_CHILD);
 	} else {
-		mutex_lock(&opipe->inode->i_mutex);
-		mutex_lock(&ipipe->inode->i_mutex);
+		mutex_lock_nested(&opipe->inode->i_mutex, I_MUTEX_PARENT);
+		mutex_lock_nested(&ipipe->inode->i_mutex, I_MUTEX_CHILD);
 	}
 
-	for (i = 0;; i++) {
+	do {
 		if (!opipe->readers) {
 			send_sig(SIGPIPE, current, 0);
 			if (!ret)
 				ret = -EPIPE;
 			break;
 		}
-		if (ipipe->nrbufs - i) {
-			ibuf = ipipe->bufs + ((ipipe->curbuf + i) & (PIPE_BUFFERS - 1));
-
-			/*
-			 * If we have room, fill this buffer
-			 */
-			if (opipe->nrbufs < PIPE_BUFFERS) {
-				int nbuf = (opipe->curbuf + opipe->nrbufs) & (PIPE_BUFFERS - 1);
 
-				/*
-				 * Get a reference to this pipe buffer,
-				 * so we can copy the contents over.
-				 */
-				ibuf->ops->get(ipipe, ibuf);
-
-				obuf = opipe->bufs + nbuf;
-				*obuf = *ibuf;
-
-				/*
-				 * Don't inherit the gift flag, we need to
-				 * prevent multiple steals of this page.
-				 */
-				obuf->flags &= ~PIPE_BUF_FLAG_GIFT;
-
-				if (obuf->len > len)
-					obuf->len = len;
-
-				opipe->nrbufs++;
-				do_wakeup = 1;
-				ret += obuf->len;
-				len -= obuf->len;
-
-				if (!len)
-					break;
-				if (opipe->nrbufs < PIPE_BUFFERS)
-					continue;
-			}
-
-			/*
-			 * We have input available, but no output room.
-			 * If we already copied data, return that. If we
-			 * need to drop the opipe lock, it must be ordered
-			 * last to avoid deadlocks.
-			 */
-			if ((flags & SPLICE_F_NONBLOCK) || !ipipe_first) {
-				if (!ret)
-					ret = -EAGAIN;
-				break;
-			}
-			if (signal_pending(current)) {
-				if (!ret)
-					ret = -ERESTARTSYS;
-				break;
-			}
-			if (do_wakeup) {
-				smp_mb();
-				if (waitqueue_active(&opipe->wait))
-					wake_up_interruptible(&opipe->wait);
-				kill_fasync(&opipe->fasync_readers, SIGIO, POLL_IN);
-				do_wakeup = 0;
-			}
+		/*
+		 * If we have iterated all input buffers or ran out of
+		 * output room, break.
+		 */
+		if (i >= ipipe->nrbufs || opipe->nrbufs >= PIPE_BUFFERS)
+			break;
 
-			opipe->waiting_writers++;
-			pipe_wait(opipe);
-			opipe->waiting_writers--;
-			continue;
-		}
+		ibuf = ipipe->bufs + ((ipipe->curbuf + i) & (PIPE_BUFFERS - 1));
+		nbuf = (opipe->curbuf + opipe->nrbufs) & (PIPE_BUFFERS - 1);
 
 		/*
-		 * No input buffers, do the usual checks for available
-		 * writers and blocking and wait if necessary
+		 * Get a reference to this pipe buffer,
+		 * so we can copy the contents over.
 		 */
-		if (!ipipe->writers)
-			break;
-		if (!ipipe->waiting_writers) {
-			if (ret)
-				break;
-		}
+		ibuf->ops->get(ipipe, ibuf);
+
+		obuf = opipe->bufs + nbuf;
+		*obuf = *ibuf;
+
 		/*
-		 * pipe_wait() drops the ipipe mutex. To avoid deadlocks
-		 * with another process, we can only safely do that if
-		 * the ipipe lock is ordered last.
+		 * Don't inherit the gift flag, we need to
+		 * prevent multiple steals of this page.
 		 */
-		if ((flags & SPLICE_F_NONBLOCK) || ipipe_first) {
-			if (!ret)
-				ret = -EAGAIN;
-			break;
-		}
-		if (signal_pending(current)) {
-			if (!ret)
-				ret = -ERESTARTSYS;
-			break;
-		}
+		obuf->flags &= ~PIPE_BUF_FLAG_GIFT;
 
-		if (waitqueue_active(&ipipe->wait))
-			wake_up_interruptible_sync(&ipipe->wait);
-		kill_fasync(&ipipe->fasync_writers, SIGIO, POLL_OUT);
+		if (obuf->len > len)
+			obuf->len = len;
 
-		pipe_wait(ipipe);
-	}
+		opipe->nrbufs++;
+		ret += obuf->len;
+		len -= obuf->len;
+		i++;
+	} while (len);
 
 	mutex_unlock(&ipipe->inode->i_mutex);
 	mutex_unlock(&opipe->inode->i_mutex);
 
-	if (do_wakeup) {
+	if (ret) {
 		smp_mb();
 		if (waitqueue_active(&opipe->wait))
 			wake_up_interruptible(&opipe->wait);
 		kill_fasync(&opipe->fasync_readers, SIGIO, POLL_IN);
-	}
+	} else if (flags & SPLICE_F_NONBLOCK)
+		ret = -EAGAIN;
 
 	return ret;
 }
@@ -1464,12 +1461,23 @@ static long do_tee(struct file *in, stru
 {
 	struct pipe_inode_info *ipipe = in->f_dentry->d_inode->i_pipe;
 	struct pipe_inode_info *opipe = out->f_dentry->d_inode->i_pipe;
+	int ret;
 
 	/*
-	 * Link ipipe to the two output pipes, consuming as we go along.
+	 * Duplicate the contents of ipipe to opipe without actually
+	 * copying the data.
 	 */
-	if (ipipe && opipe)
+	if (ipipe && opipe && ipipe != opipe) {
+		ret = link_ipipe_prep(ipipe, flags);
+		if (unlikely(ret))
+			return ret;
+
+		ret = link_opipe_prep(opipe, flags);
+		if (unlikely(ret))
+			return ret;
+
 		return link_pipe(ipipe, opipe, len, flags);
+	}
 
 	return -EINVAL;
 }
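
From the caller's side, the behaviour this patch aims for can be
exercised with a small wrapper like the one below. It is only a sketch,
assuming a Linux system whose libc exposes tee() and SPLICE_F_NONBLOCK
through <fcntl.h> under _GNU_SOURCE; tee_once() is a hypothetical helper
name, not part of any API.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Duplicate up to len bytes from the pipe in_fd into the pipe out_fd
 * without consuming the input. Retries if interrupted by a signal;
 * otherwise returns tee()'s result unchanged (-1 with errno set on error).
 */
static ssize_t tee_once(int in_fd, int out_fd, size_t len, unsigned int flags)
{
	ssize_t n;

	do {
		n = tee(in_fd, out_fd, len, flags);
	} while (n < 0 && errno == EINTR);

	return n;
}
```

Going by the patch above, a caller passing flags of 0 should see a
return of 0 only when the input pipe is empty and has no writers left,
while a caller passing SPLICE_F_NONBLOCK should get -1 with errno set to
EAGAIN whenever nothing could be duplicated.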

-- 
Jens Axboe


* Re: splice/tee bugs?
  2006-07-09 11:16           ` Jens Axboe
@ 2006-07-09 16:47             ` Luiz Fernando N. Capitulino
  2006-07-09 17:57               ` Jens Axboe
  0 siblings, 1 reply; 34+ messages in thread
From: Luiz Fernando N. Capitulino @ 2006-07-09 16:47 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Andrew Morton, Michael Kerrisk, linux-kernel, michael.kerrisk,
	vendor-sec

On Sun, 9 Jul 2006 13:16:29 +0200
Jens Axboe <axboe@suse.de> wrote:

| On Sun, Jul 09 2006, Jens Axboe wrote:
| > On Sat, Jul 08 2006, Luiz Fernando N. Capitulino wrote:
| > > 
| > >  Hi Jens,
| > > 
| > > On Sat, 8 Jul 2006 08:41:32 +0200
| > > Jens Axboe <axboe@suse.de> wrote:
| > > 
| > > | On Fri, Jul 07 2006, Luiz Fernando N. Capitulino wrote:
| > > | > On Fri, 7 Jul 2006 04:07:49 -0700
| > > | > Andrew Morton <akpm@osdl.org> wrote:
| > > | > 
| > > | > | On Fri, 07 Jul 2006 09:07:03 +0200
| > > | > | "Michael Kerrisk" <mtk-manpages@gmx.net> wrote:
| > > | > | 
| > > | > | > c) Occasionally the command line just hangs, producing no output.
| > > | > | >    In this case I can't kill it with ^C or ^\.  This is a 
| > > | > | >    hard-to-reproduce behaviour on my (x86) system, but I have 
| > > | > | >    seen it several times by now.
| > > | > | 
| > > | > | aka local DoS.  Please capture sysrq-T output next time.
| > > | > 
| > > | >  If I run lots of them in parallel, I get the following OOPs in a few
| > > | > seconds:
| > > | 
| > > | With the patch posted? You need the i vs nrbufs fix.
| > > 
| > >  Yes, it fixes the problem. I didn't try it before because I thought
| > > you were going to double check it [1].
| > 
| > Yeah the patch needs reworking, however the isolated i vs nrbufs fix is
| > safe enough on its own. I'll post a full patch for inclusion, I'm afraid
| > I won't be able to fully test it enough for submitting it until tomorrow
| > though.
| 
| Something like this, testing would be appreciated! Michael, can you
| repeat your testing as well? Thanks.

 Yeah, it fixes the problem for 2.6.18-rc1.

 But it doesn't compile for 2.6.17.4:

  CC      fs/splice.o
fs/splice.c: In function `link_pipe':
fs/splice.c:1378: warning: implicit declaration of function `mutex_lock_nested'
fs/splice.c:1378: error: `I_MUTEX_PARENT' undeclared (first use in this function)
fs/splice.c:1378: error: (Each undeclared identifier is reported only once
fs/splice.c:1378: error: for each function it appears in.)
fs/splice.c:1379: error: `I_MUTEX_CHILD' undeclared (first use in this function)
make[1]: ** [fs/splice.o] Error 1
make: ** [fs] Error 2

 Should we use the first patch for it? It does work too.

-- 
Luiz Fernando N. Capitulino

* Re: splice/tee bugs?
  2006-07-09 16:47             ` Luiz Fernando N. Capitulino
@ 2006-07-09 17:57               ` Jens Axboe
  2006-07-10  6:25                 ` Michael Kerrisk
  0 siblings, 1 reply; 34+ messages in thread
From: Jens Axboe @ 2006-07-09 17:57 UTC (permalink / raw)
  To: Luiz Fernando N. Capitulino
  Cc: Andrew Morton, Michael Kerrisk, linux-kernel, michael.kerrisk,
	vendor-sec

On Sun, Jul 09 2006, Luiz Fernando N. Capitulino wrote:
> On Sun, 9 Jul 2006 13:16:29 +0200
> Jens Axboe <axboe@suse.de> wrote:
> 
> | On Sun, Jul 09 2006, Jens Axboe wrote:
> | > On Sat, Jul 08 2006, Luiz Fernando N. Capitulino wrote:
> | > > 
> | > >  Hi Jens,
> | > > 
> | > > On Sat, 8 Jul 2006 08:41:32 +0200
> | > > Jens Axboe <axboe@suse.de> wrote:
> | > > 
> | > > | On Fri, Jul 07 2006, Luiz Fernando N. Capitulino wrote:
> | > > | > On Fri, 7 Jul 2006 04:07:49 -0700
> | > > | > Andrew Morton <akpm@osdl.org> wrote:
> | > > | > 
> | > > | > | On Fri, 07 Jul 2006 09:07:03 +0200
> | > > | > | "Michael Kerrisk" <mtk-manpages@gmx.net> wrote:
> | > > | > | 
> | > > | > | > c) Occasionally the command line just hangs, producing no output.
> | > > | > | >    In this case I can't kill it with ^C or ^\.  This is a 
> | > > | > | >    hard-to-reproduce behaviour on my (x86) system, but I have 
> | > > | > | >    seen it several times by now.
> | > > | > | 
> | > > | > | aka local DoS.  Please capture sysrq-T output next time.
> | > > | > 
> | > > | >  If I run lots of them in parallel, I get the following OOPs in a few
> | > > | > seconds:
> | > > | 
> | > > | With the patch posted? You need the i vs nrbufs fix.
> | > > 
> | > >  Yes, it fixes the problem. I didn't try it before because I thought
> | > > you were going to double check it [1].
> | > 
> | > Yeah the patch needs reworking, however the isolated i vs nrbufs fix is
> | > safe enough on its own. I'll post a full patch for inclusion, I'm afraid
> | > I won't be able to fully test it enough for submitting it until tomorrow
> | > though.
> | 
> | Something like this, testing would be appreciated! Michael, can you
> | repeat your testing as well? Thanks.
> 
>  Yeah, it fixes the problem for 2.6.18-rc1.
> 
>  But doesn't compile for 2.6.17.4:
> 
>   CC      fs/splice.o
> fs/splice.c: In function `link_pipe':
> fs/splice.c:1378: warning: implicit declaration of function `mutex_lock_nested'
> fs/splice.c:1378: error: `I_MUTEX_PARENT' undeclared (first use in this function)
> fs/splice.c:1378: error: (Each undeclared identifier is reported only once
> fs/splice.c:1378: error: for each function it appears in.)
> fs/splice.c:1379: error: `I_MUTEX_CHILD' undeclared (first use in this function)
> make[1]: ** [fs/splice.o] Error 1
> make: ** [fs] Error 2
> 
>  Should we use the first patch for it? It does work too.

No, I'll rebase the patch for 2.6.17.x - basically you just need to
change the two mutex_lock_nested() to mutex_lock() and that is it. But
first I'd like Michael to retest as well (and more importantly, I'll do
some testing myself too).
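
The address ordering that both variants of the patch keep can be
sketched in userspace as follows. This is an illustration with pthread
mutexes and a hypothetical struct fake_pipe, not the kernel's i_mutex
code. As far as I understand it, the _nested calls only add a
lock-validator annotation on top of the same ordering, so treat that
reading as an assumption.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a pipe that carries its own lock. */
struct fake_pipe {
	pthread_mutex_t lock;
	int nrbufs;
};

/* Return whichever pipe has the lower address; that one is locked first. */
static struct fake_pipe *first_to_lock(struct fake_pipe *a, struct fake_pipe *b)
{
	return ((uintptr_t)a < (uintptr_t)b) ? a : b;
}

/*
 * Lock two distinct pipes in address order, whatever order the caller
 * names them in. The caller must ensure a != b.
 */
static void lock_pair(struct fake_pipe *a, struct fake_pipe *b)
{
	struct fake_pipe *first = first_to_lock(a, b);
	struct fake_pipe *second = (first == a) ? b : a;

	pthread_mutex_lock(&first->lock);
	pthread_mutex_lock(&second->lock);
}

/* Release both locks; the unlock order does not affect deadlock safety. */
static void unlock_pair(struct fake_pipe *a, struct fake_pipe *b)
{
	pthread_mutex_unlock(&a->lock);
	pthread_mutex_unlock(&b->lock);
}
```

A thread doing the equivalent of tee from A to B and another going from
B to A both take the lower-address lock first, so neither can hold one
lock while waiting for the other in the opposite order.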

-- 
Jens Axboe


* Re: splice/tee bugs?
  2006-07-09 17:57               ` Jens Axboe
@ 2006-07-10  6:25                 ` Michael Kerrisk
  2006-07-10  6:43                   ` Jens Axboe
  0 siblings, 1 reply; 34+ messages in thread
From: Michael Kerrisk @ 2006-07-10  6:25 UTC (permalink / raw)
  To: Jens Axboe, lcapitulino; +Cc: vendor-sec, linux-kernel, mtk-manpages, akpm

> >  Should we use the first patch for it? It does work too.
> 
> No, I'll rebase the patch for 2.6.17.x - basically you just need to
> change the two mutex_lock_nested() to mutex_lock() and that is it. But
> first I'd like Michael to retest as well (and more importantly, I'll do
> some testing myself too).

Jens,

Could you post a 2.6.17 patch, please?

Cheers,

Michael

* Re: splice/tee bugs?
  2006-07-10  6:25                 ` Michael Kerrisk
@ 2006-07-10  6:43                   ` Jens Axboe
  2006-07-10  8:09                     ` Michael Kerrisk
  0 siblings, 1 reply; 34+ messages in thread
From: Jens Axboe @ 2006-07-10  6:43 UTC (permalink / raw)
  To: Michael Kerrisk; +Cc: lcapitulino, vendor-sec, linux-kernel, mtk-manpages, akpm

On Mon, Jul 10 2006, Michael Kerrisk wrote:
> > >  Should we use the first patch for it? It does work too.
> > 
> > No, I'll rebase the patch for 2.6.17.x - basically you just need to
> > change the two mutex_lock_nested() to mutex_lock() and that is it. But
> > first I'd like Michael to retest as well (and more importantly, I'll do
> > some testing myself too).
> 
> Jens,
> 
> Could you post a 2.6.17 patch please.

Here's a 2.6.17.x version.

--- linux-2.6.17/fs/splice.c~	2006-07-10 08:42:55.000000000 +0200
+++ linux-2.6.17/fs/splice.c	2006-07-10 08:43:20.000000000 +0200
@@ -1295,6 +1295,85 @@
 }
 
 /*
+ * Make sure there's data to read. Wait for input if we can, otherwise
+ * return an appropriate error.
+ */
+static int link_ipipe_prep(struct pipe_inode_info *pipe, unsigned int flags)
+{
+	int ret;
+
+	/*
+	 * Check ->nrbufs without the inode lock first. This function
+	 * is speculative anyways, so missing one is ok.
+	 */
+	if (pipe->nrbufs)
+		return 0;
+
+	ret = 0;
+	mutex_lock(&pipe->inode->i_mutex);
+
+	while (!pipe->nrbufs) {
+		if (signal_pending(current)) {
+			ret = -ERESTARTSYS;
+			break;
+		}
+		if (!pipe->writers)
+			break;
+		if (!pipe->waiting_writers) {
+			if (flags & SPLICE_F_NONBLOCK) {
+				ret = -EAGAIN;
+				break;
+			}
+		}
+		pipe_wait(pipe);
+	}
+
+	mutex_unlock(&pipe->inode->i_mutex);
+	return ret;
+}
+
+/*
+ * Make sure there's writeable room. Wait for room if we can, otherwise
+ * return an appropriate error.
+ */
+static int link_opipe_prep(struct pipe_inode_info *pipe, unsigned int flags)
+{
+	int ret;
+
+	/*
+	 * Check ->nrbufs without the inode lock first. This function
+	 * is speculative anyways, so missing one is ok.
+	 */
+	if (pipe->nrbufs < PIPE_BUFFERS)
+		return 0;
+
+	ret = 0;
+	mutex_lock(&pipe->inode->i_mutex);
+
+	while (pipe->nrbufs >= PIPE_BUFFERS) {
+		if (!pipe->readers) {
+			send_sig(SIGPIPE, current, 0);
+			ret = -EPIPE;
+			break;
+		}
+		if (flags & SPLICE_F_NONBLOCK) {
+			ret = -EAGAIN;
+			break;
+		}
+		if (signal_pending(current)) {
+			ret = -ERESTARTSYS;
+			break;
+		}
+		pipe->waiting_writers++;
+		pipe_wait(pipe);
+		pipe->waiting_writers--;
+	}
+
+	mutex_unlock(&pipe->inode->i_mutex);
+	return ret;
+}
+
+/*
  * Link contents of ipipe to opipe.
  */
 static int link_pipe(struct pipe_inode_info *ipipe,
@@ -1302,9 +1381,9 @@
 		     size_t len, unsigned int flags)
 {
 	struct pipe_buffer *ibuf, *obuf;
-	int ret, do_wakeup, i, ipipe_first;
+	int ret, i, nbuf;
 
-	ret = do_wakeup = ipipe_first = 0;
+	i = ret = 0;
 
 	/*
 	 * Potential ABBA deadlock, work around it by ordering lock
@@ -1312,7 +1391,6 @@
 	 * could deadlock (one doing tee from A -> B, the other from B -> A).
 	 */
 	if (ipipe->inode < opipe->inode) {
-		ipipe_first = 1;
 		mutex_lock(&ipipe->inode->i_mutex);
 		mutex_lock(&opipe->inode->i_mutex);
 	} else {
@@ -1320,123 +1398,58 @@
 		mutex_lock(&ipipe->inode->i_mutex);
 	}
 
-	for (i = 0;; i++) {
+	do {
 		if (!opipe->readers) {
 			send_sig(SIGPIPE, current, 0);
 			if (!ret)
 				ret = -EPIPE;
 			break;
 		}
-		if (ipipe->nrbufs - i) {
-			ibuf = ipipe->bufs + ((ipipe->curbuf + i) & (PIPE_BUFFERS - 1));
-
-			/*
-			 * If we have room, fill this buffer
-			 */
-			if (opipe->nrbufs < PIPE_BUFFERS) {
-				int nbuf = (opipe->curbuf + opipe->nrbufs) & (PIPE_BUFFERS - 1);
-
-				/*
-				 * Get a reference to this pipe buffer,
-				 * so we can copy the contents over.
-				 */
-				ibuf->ops->get(ipipe, ibuf);
-
-				obuf = opipe->bufs + nbuf;
-				*obuf = *ibuf;
-
-				/*
-				 * Don't inherit the gift flag, we need to
-				 * prevent multiple steals of this page.
-				 */
-				obuf->flags &= ~PIPE_BUF_FLAG_GIFT;
-
-				if (obuf->len > len)
-					obuf->len = len;
-
-				opipe->nrbufs++;
-				do_wakeup = 1;
-				ret += obuf->len;
-				len -= obuf->len;
-
-				if (!len)
-					break;
-				if (opipe->nrbufs < PIPE_BUFFERS)
-					continue;
-			}
 
-			/*
-			 * We have input available, but no output room.
-			 * If we already copied data, return that. If we
-			 * need to drop the opipe lock, it must be ordered
-			 * last to avoid deadlocks.
-			 */
-			if ((flags & SPLICE_F_NONBLOCK) || !ipipe_first) {
-				if (!ret)
-					ret = -EAGAIN;
-				break;
-			}
-			if (signal_pending(current)) {
-				if (!ret)
-					ret = -ERESTARTSYS;
-				break;
-			}
-			if (do_wakeup) {
-				smp_mb();
-				if (waitqueue_active(&opipe->wait))
-					wake_up_interruptible(&opipe->wait);
-				kill_fasync(&opipe->fasync_readers, SIGIO, POLL_IN);
-				do_wakeup = 0;
-			}
+		/*
+		 * If we have iterated all input buffers or ran out of
+		 * output room, break.
+		 */
+		if (i >= ipipe->nrbufs || opipe->nrbufs >= PIPE_BUFFERS)
+			break;
 
-			opipe->waiting_writers++;
-			pipe_wait(opipe);
-			opipe->waiting_writers--;
-			continue;
-		}
+		ibuf = ipipe->bufs + ((ipipe->curbuf + i) & (PIPE_BUFFERS - 1));
+		nbuf = (opipe->curbuf + opipe->nrbufs) & (PIPE_BUFFERS - 1);
 
 		/*
-		 * No input buffers, do the usual checks for available
-		 * writers and blocking and wait if necessary
+		 * Get a reference to this pipe buffer,
+		 * so we can copy the contents over.
 		 */
-		if (!ipipe->writers)
-			break;
-		if (!ipipe->waiting_writers) {
-			if (ret)
-				break;
-		}
+		ibuf->ops->get(ipipe, ibuf);
+
+		obuf = opipe->bufs + nbuf;
+		*obuf = *ibuf;
+
 		/*
-		 * pipe_wait() drops the ipipe mutex. To avoid deadlocks
-		 * with another process, we can only safely do that if
-		 * the ipipe lock is ordered last.
+		 * Don't inherit the gift flag, we need to
+		 * prevent multiple steals of this page.
 		 */
-		if ((flags & SPLICE_F_NONBLOCK) || ipipe_first) {
-			if (!ret)
-				ret = -EAGAIN;
-			break;
-		}
-		if (signal_pending(current)) {
-			if (!ret)
-				ret = -ERESTARTSYS;
-			break;
-		}
+		obuf->flags &= ~PIPE_BUF_FLAG_GIFT;
 
-		if (waitqueue_active(&ipipe->wait))
-			wake_up_interruptible_sync(&ipipe->wait);
-		kill_fasync(&ipipe->fasync_writers, SIGIO, POLL_OUT);
+		if (obuf->len > len)
+			obuf->len = len;
 
-		pipe_wait(ipipe);
-	}
+		opipe->nrbufs++;
+		ret += obuf->len;
+		len -= obuf->len;
+		i++;
+	} while (len);
 
 	mutex_unlock(&ipipe->inode->i_mutex);
 	mutex_unlock(&opipe->inode->i_mutex);
 
-	if (do_wakeup) {
+	if (ret) {
 		smp_mb();
 		if (waitqueue_active(&opipe->wait))
 			wake_up_interruptible(&opipe->wait);
 		kill_fasync(&opipe->fasync_readers, SIGIO, POLL_IN);
-	}
+	} else if (flags & SPLICE_F_NONBLOCK)
+		ret = -EAGAIN;
 
 	return ret;
 }
@@ -1452,12 +1465,23 @@
 {
 	struct pipe_inode_info *ipipe = in->f_dentry->d_inode->i_pipe;
 	struct pipe_inode_info *opipe = out->f_dentry->d_inode->i_pipe;
+	int ret;
 
 	/*
-	 * Link ipipe to the two output pipes, consuming as we go along.
+	 * Duplicate the contents of ipipe to opipe without actually
+	 * copying the data.
 	 */
-	if (ipipe && opipe)
+	if (ipipe && opipe && ipipe != opipe) {
+		ret = link_ipipe_prep(ipipe, flags);
+		if (unlikely(ret))
+			return ret;
+
+		ret = link_opipe_prep(opipe, flags);
+		if (unlikely(ret))
+			return ret;
+
 		return link_pipe(ipipe, opipe, len, flags);
+	}
 
 	return -EINVAL;
 }
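
For readers following the index arithmetic in the reworked link_pipe()
loop: a pipe's buffers form a ring of PIPE_BUFFERS slots, and
"(curbuf + i) & (PIPE_BUFFERS - 1)" wraps an index around that ring,
which only works because the slot count is a power of two. The block
below is a userspace model of that loop, not kernel code. struct
toy_pipe and toy_link() are hypothetical names; locking, reference
counting and the readers/SIGPIPE check are left out, and only buffer
lengths are tracked.

```c
#include <assert.h>
#include <stddef.h>

#define PIPE_BUFFERS 16	/* must be a power of two for the mask to work */

/* Toy stand-in for struct pipe_inode_info: only buffer lengths are kept. */
struct toy_pipe {
	unsigned int curbuf;		/* index of the first occupied slot */
	unsigned int nrbufs;		/* number of occupied slots */
	size_t len[PIPE_BUFFERS];	/* bytes held in each slot */
};

/*
 * Model of the non-waiting loop: duplicate up to 'len' bytes worth of
 * buffers from ipipe into opipe without consuming ipipe. Returns the
 * number of bytes linked, which may be 0.
 */
static size_t toy_link(const struct toy_pipe *ipipe, struct toy_pipe *opipe,
		       size_t len)
{
	size_t ret = 0;
	unsigned int i = 0;

	/* The patch rejects ipipe == opipe before link_pipe() is reached. */
	assert(ipipe != opipe);

	while (len) {
		unsigned int ibuf, nbuf;

		/* All input buffers visited, or no output room: stop. */
		if (i >= ipipe->nrbufs || opipe->nrbufs >= PIPE_BUFFERS)
			break;

		ibuf = (ipipe->curbuf + i) & (PIPE_BUFFERS - 1);
		nbuf = (opipe->curbuf + opipe->nrbufs) & (PIPE_BUFFERS - 1);

		/* "Copy" the buffer, clipped to the bytes still wanted. */
		opipe->len[nbuf] = ipipe->len[ibuf];
		if (opipe->len[nbuf] > len)
			opipe->len[nbuf] = len;

		opipe->nrbufs++;
		ret += opipe->len[nbuf];
		len -= opipe->len[nbuf];
		i++;
	}

	return ret;
}
```

Because the loop never sleeps, the only outcomes are "linked some bytes"
and "linked nothing"; any waiting has to happen beforehand, which is the
job of the two _prep helpers in the patch.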

-- 
Jens Axboe



* Re: splice/tee bugs?
  2006-07-10  6:43                   ` Jens Axboe
@ 2006-07-10  8:09                     ` Michael Kerrisk
  2006-07-10  8:24                       ` Jens Axboe
  0 siblings, 1 reply; 34+ messages in thread
From: Michael Kerrisk @ 2006-07-10  8:09 UTC (permalink / raw)
  To: Jens Axboe, michael.kerrisk; +Cc: akpm, linux-kernel, vendor-sec, lcapitulino

> > Could you post a 2.6.17 patch please.
> 
> Here's a 2.6.17.x version.

Jens,

Thanks.  I applied your patch against 2.6.17(.0), and did some
testing using my modified version of your test program, using 
the same command line: ls *.c | ktee r | wc, and also running 
several instances of the program in parallel using the 
command line:

find . | ktee r | wc

which in my test directory produces this output:

tee returned 65536
splice returned 65536
tee returned 65536
splice returned 65536
tee returned 53248
splice returned 53248
tee returned 57344
splice returned 57344
tee returned 7245
splice returned 7245
tee returned 0
   6212    6213  248909

Things look good so far: runs produce the results I expect, and 
no OOPSes (which Luiz Fernando reported when running multiple
instances in parallel, but I didn't see myself because I didn't
try doing that with vanilla 2.6.17) and no command-line hangs.

I'll quote my original mail in this thread, with a few questions,
and then note one lingering strange behaviour.

> The most notable differences between my program and yours
> are:
>
> * I print some debugging info to stderr.
>
> * I don't pass SPLICE_F_NONBLOCK to tee().
[...]
> On different runs I see:
>
> a) No output from ls through the pipeline:
>
> tee returned 0
>       0       0       0

I am no longer seeing results like this. So am I correct in 
understanding that tee() should only return 0 on EOF?

And is the same true of splice()?  (There is no statement 
about 0 returns from splice() in your draft manual page.)

> b) Very many instances of EAGAIN followed by expected results:
>
> ...
> EAGAIN
> EAGAIN
> EAGAIN
> EAGAIN
> EAGAIN
> EAGAIN
> tee returned 19
> splice returned 19
> tee returned 0
>       2       2      19
[...]

I no longer see results like this.  From another of your mails
in this thread, I gather that intended behaviour is that EAGAIN
will only occur if SPLICE_F_NONBLOCK has been set, right?

> c) Occasionally the command line just hangs, producing no output.
>    In this case I can't kill it with ^C or ^\.  This is a
>    hard-to-reproduce behaviour on my (x86) system, but I have
>    seen it several times by now.

I no longer see this behaviour (at least so far, after quite a
bit of testing).

One slight strangeness.  Most of the time, the 
"find . | ktee r | wc" command line takes about 0.1 seconds to 
execute, but about 1 time in 5 on my x86 system, it takes about 
1.5 to 2 seconds to execute.  Any ideas about what's happening 
there?

Cheers,

Michael
-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 

Want to help with man page maintenance?  
Grab the latest tarball at
ftp://ftp.win.tue.nl/pub/linux-local/manpages/, 
read the HOWTOHELP file and grep the source 
files for 'FIXME'.


* Re: splice/tee bugs?
  2006-07-10  8:09                     ` Michael Kerrisk
@ 2006-07-10  8:24                       ` Jens Axboe
  2006-07-10  8:40                         ` Michael Kerrisk
  0 siblings, 1 reply; 34+ messages in thread
From: Jens Axboe @ 2006-07-10  8:24 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: michael.kerrisk, akpm, linux-kernel, vendor-sec, lcapitulino

On Mon, Jul 10 2006, Michael Kerrisk wrote:
> > > Could you post a 2.6.17 patch please.
> > 
> > Here's a 2.6.17.x version.
> 
> Jens,
> 
> Thanks.  I applied your patch against 2.6.17(.0), and did some
> testing using my modified version of your test program, using 
> the same command line: ls *.c | ktee r | wc, and also running 
> several instances of the program in parallel using the 
> command line:
> 
> find . | ktee r | wc
> 
> which in my test directory produces this output:
> 
> tee returned 65536
> splice returned 65536
> tee returned 65536
> splice returned 65536
> tee returned 53248
> splice returned 53248
> tee returned 57344
> splice returned 57344
> tee returned 7245
> splice returned 7245
> tee returned 0
>    6212    6213  248909
> 
> Things look good so far: runs produce the results I expect, and 
> no OOPSes (which Luiz Fernando reported when running multiple
> instances in parallel, but I didn't see myself because I didn't
> try doing that with vanilla 2.6.17) and no command-line hangs.

So far, so good.

> > The most notable differences between my program and yours
> > are:
> >
> > * I print some debugging info to stderr.
> >
> > * I don't pass SPLICE_F_NONBLOCK to tee().
> [...]
> > On different runs I see:
> >
> > a) No output from ls through the pipeline:
> >
> > tee returned 0
> >       0       0       0
> 
> I am no longer seeing results like this. So am I correct in 
> understanding that tee() should only return 0 on EOF?

tee() can still return 0 without SPLICE_F_NONBLOCK being set, if the
pipes are changed in between the _prep calls and link_pipe(). There's
really nothing we can do about that. There's no EOF condition for
link_pipe(), as it purely operates on pipes. A 0 return means that we
had no data to splice and could not wait for data, either because it
would be a locking violation or because it simply doesn't make sense to
wait (eg no writers attached to the pipe). It will only return EAGAIN
for a non-blocking tee() now though.
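
From the caller's side, the return values described above could be
handled along these lines. This is a sketch under assumptions, not the
test program from this thread: tee_once() and enum tee_outcome are
hypothetical names, and retrying on EINTR is a choice made here for the
case where a signal interrupts the wait.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical classification of a single tee() attempt. */
enum tee_outcome {
	TEE_LINKED,		/* > 0: that many bytes were duplicated */
	TEE_NOTHING,		/* 0: no data, and waiting made no sense */
	TEE_WOULD_BLOCK,	/* EAGAIN: expected only with SPLICE_F_NONBLOCK */
	TEE_FAILED		/* any other error, details in errno */
};

static enum tee_outcome tee_once(int in_fd, int out_fd, size_t len,
				 unsigned int flags, size_t *linked)
{
	ssize_t n;

	assert(len > 0);
	*linked = 0;

	/* Restart the call if a signal interrupted it. */
	do {
		n = tee(in_fd, out_fd, len, flags);
	} while (n < 0 && errno == EINTR);

	if (n > 0) {
		*linked = (size_t)n;
		return TEE_LINKED;
	}
	if (n == 0)
		return TEE_NOTHING;
	if (errno == EAGAIN)
		return TEE_WOULD_BLOCK;
	return TEE_FAILED;
}
```

A caller cannot tell from TEE_NOTHING alone whether the writers are gone
or the pipes changed between the _prep calls and link_pipe(), so a
program that treats 0 as end of input is relying on the first case.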

> And is the same true of splice()?  (There is no statement 
> about 0 returns from splice() in your draft manual page.)

Same holds true for splice. We can still return 0 even for a blocking
splice if there's no data to splice from the pipe and no writers
attached. This is identical to how pipes behave.
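
The parallel with plain pipe reads can be checked from userspace with
something like the following. It is a minimal sketch: writerless_pipe(),
read_from_writerless() and splice_from_writerless() are hypothetical
helpers, and out_fd is assumed to be a descriptor the running kernel can
splice into.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Create a pipe and close its write end right away, leaving a read end
 * with no data and no writers. Returns -1 if pipe() fails.
 */
static int writerless_pipe(void)
{
	int fds[2];

	if (pipe(fds) < 0)
		return -1;
	close(fds[1]);
	return fds[0];
}

/* read() on such a pipe returns 0 at once instead of blocking. */
static ssize_t read_from_writerless(void)
{
	char byte;
	ssize_t n;
	int fd = writerless_pipe();

	if (fd < 0)
		return -1;
	n = read(fd, &byte, 1);
	close(fd);
	return n;
}

/* A blocking splice() (flags == 0) from such a pipe should do the same. */
static ssize_t splice_from_writerless(int out_fd)
{
	ssize_t n;
	int fd;

	assert(out_fd >= 0);
	fd = writerless_pipe();
	if (fd < 0)
		return -1;
	n = splice(fd, NULL, out_fd, NULL, 4096, 0);
	close(fd);
	return n;
}
```

Both helpers are expected to return 0 without sleeping, since there is
nobody left who could ever put data into the pipe.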

> > b) Very many instances of EAGAIN followed by expected results:
> >
> > ...
> > EAGAIN
> > EAGAIN
> > EAGAIN
> > EAGAIN
> > EAGAIN
> > EAGAIN
> > tee returned 19
> > splice returned 19
> > tee returned 0
> >       2       2      19
> [...]
> 
> I no longer see results like this.  From another of your mails
> in this thread, I gather that intended behaviour is that EAGAIN
> will only occur if SPLICE_F_NONBLOCK has been set, right?

Correct.

> > c) Occasionally the command line just hangs, producing no output.
> >    In this case I can't kill it with ^C or ^\.  This is a
> >    hard-to-reproduce behaviour on my (x86) system, but I have
> >    seen it several times by now.
> 
> I no longer see this behaviour (at least so far, after quite a
> bit of testing).

Good, it should be fixed with the blocking removal from link_pipe().

> One slight strangeness.  Most of the time, the 
> "find . | ktee r | wc" command line takes about 0.1 seconds to 
> execute, but about 1 time in 5 on my x86 system, it takes about 
> 1.5 to 2 seconds to execute.  Any ideas about what's happening 
> there?

That is pretty odd. Any chance you can do a quick sysrq-t and see where
find/ktee/wc is stuck when this happens? You should not be seeing that,
naturally, I'll see if I can reproduce that here. How much data does
find . return in your example?

-- 
Jens Axboe



* Re: splice/tee bugs?
  2006-07-10  8:24                       ` Jens Axboe
@ 2006-07-10  8:40                         ` Michael Kerrisk
  2006-07-10  8:46                           ` Jens Axboe
  0 siblings, 1 reply; 34+ messages in thread
From: Michael Kerrisk @ 2006-07-10  8:40 UTC (permalink / raw)
  To: Jens Axboe; +Cc: lcapitulino, vendor-sec, linux-kernel, akpm, michael.kerrisk

Jens,

> > Thanks.  I applied your patch against 2.6.17(.0), and did some
> > testing using my modified version of your test program, using 
> > the same command line: ls *.c | ktee r | wc, and also running 
> > several instances of the program in parallel using the 
> > command line:
> > 
> > find . | ktee r | wc
> > 
> > which in my test directory produces this output:
> > 
> > tee returned 65536
> > splice returned 65536
> > tee returned 65536
> > splice returned 65536
> > tee returned 53248
> > splice returned 53248
> > tee returned 57344
> > splice returned 57344
> > tee returned 7245
> > splice returned 7245
> > tee returned 0
> >    6212    6213  248909
> > 
> > Things look good so far: runs produce the results I expect, and 
> > no OOPSes (which Luiz Fernando reported when running multiple
> > instances in parallel, but I didn't see myself because I didn't
> > try doing that with vanilla 2.6.17) and no command-line hangs.
> 
> So far, so good.
> 
> > > The most notable differences between my program and yours
> > > are:
> > >
> > > * I print some debugging info to stderr.
> > >
> > > * I don't pass SPLICE_F_NONBLOCK to tee().
> > [...]
> > > On different runs I see:
> > >
> > > a) No output from ls through the pipeline:
> > >
> > > tee returned 0
> > >       0       0       0
> > 
> > I am no longer seeing results like this. So am I correct in 
> > understanding that tee() should only return 0 on EOF?
> 
> tee() can still return 0 without SPLICE_F_NONBLOCK being set, if the
> pipes are changed in between the _prep calls and link_pipe(). There's
> really nothing we can do about that. There's no EOF condition for
> link_pipe(), as it purely operates on pipes. A 0 return means that we
> had no data to splice and could not wait for data, either because it
> would be a locking violation or because it simply doesn't make sense to
> wait (eg no writers attached to the pipe). It will only return EAGAIN
> for a non-blocking tee() now though.

Okay.

[...]

> > > c) Occasionally the command line just hangs, producing no output.
> > >    In this case I can't kill it with ^C or ^\.  This is a
> > >    hard-to-reproduce behaviour on my (x86) system, but I have
> > >    seen it several times by now.
> > 
> > I no longer see this behaviour (at least so far, after quite a
> > bit of testing).
> 
> Good, it should be fixed with the blocking removal from link_pipe().
> 
> > One slight strangeness.  Most of the time, the 
> > "find . | ktee r | wc" command line takes about 0.1 seconds to 
> > execute, but about 1 time in 5 on my x86 system, it takes about 
> > 1.5 to 2 seconds to execute.  Any ideas about what's happening 
> > there?
> 
> That is pretty odd. Any chance you can do a quick sysrq-t and see where
> find/ktee/wc is stuck when this happens? You should not be seeing that,
> naturally, I'll see if I can reproduce that here. How much data does
> find . return in your example?

See the start of this message.

One sysrq-t output is below.

Cheers,

Michael


find          D B9099C00     0 14170   4167         14171       (NOTLB)
   ca279d04 00118054 00000008 b9099c00 003d0ca2 e7b0b9f8 00000009 d307f688
   d307f580 c0459dc0 c1507620 b9099c00 003d0ca2 00000000 00000000 00118054
   00000001 00001000 c015010e e647f5ac e647f5b8 00000046 00000000 00000000
Call Trace:
 <c015010e> __getblk+0x1d/0x225
 <c03e1e7c> io_schedule+0x26/0x30
 <c0150874> sync_buffer+0x37/0x3a
 <c03e25fd> __wait_on_bit+0x33/0x59
 <c015083d> sync_buffer+0x0/0x3a
 <c03e2695> out_of_line_wait_on_bit+0x72/0x7a
 <c015083d> sync_buffer+0x0/0x3a
 <c0127dd5> wake_bit_function+0x0/0x34
 <c019e39b> search_by_key+0x133/0xd91
 <c0110a1c> do_page_fault+0x0/0x532
 <c0189e2d> search_by_entry_key+0x20/0x22f
 <c015dc76> filldir64+0x8e/0xc3
 <c019e214> pathrelse+0x1b/0x2f
 <c019388c> reiserfs_readdir+0x3e3/0x3fb
 <c0193895> reiserfs_readdir+0x3ec/0x3fb
 <c018d6e5> reiserfs_update_sd_size+0x67/0x24c
 <c01a55c9> journal_begin+0x9c/0xdc
 <c0196cb0> reiserfs_dirty_inode+0x5a/0x76
 <c016bacc> __mark_inode_dirty+0x2d/0x15e
 <c015dd9a> vfs_readdir+0x58/0x6f
 <c015de14> sys_getdents64+0x63/0xa8
 <c015dbe8> filldir64+0x0/0xc3
 <c010287f> syscall_call+0x7/0xb
ktee          S FCFCA100     0 14171   4167         14172 14170 (NOTLB)
   ccad7f40 d4a13668 d4a13668 fcfca100 003d0ca2 e75553a0 00000009 d0fd8648
   d0fd8540 e7a9c030 c1507620 fcfca100 003d0ca2 00000000 00000000 cde230cc
   e75552dc 00000000 00000202 ccad7f68 00000000 e4930a00 00000000 ccad7f68
Call Trace:
 <c0158a0d> pipe_wait+0x6b/0x8c
 <c0132e9d> audit_syscall_entry+0x104/0x12b
 <c0127d9b> autoremove_wake_function+0x0/0x3a
 <c017026b> sys_tee+0x149/0x3af
 <c010287f> syscall_call+0x7/0xb
wc            S B8CC9300     0 14172   4167               14171 (NOTLB)
   d012dec0 e70be2ac e70be304 b8cc9300 003d0ca2 0000011b 00000009 d07d6178
   d07d6070 d4bbea50 c1507620 b8cc9300 003d0ca2 00000000 00000000 c045f800
   00000000 00000246 00000202 d012dee8 00000000 d1f5ca00 bfa2a386 d012dee8
Call Trace:
 <c0158a0d> pipe_wait+0x6b/0x8c
 <c0127d9b> autoremove_wake_function+0x0/0x3a
 <c01590fb> pipe_readv+0x2c9/0x339
 <c013f60a> __handle_mm_fault+0x27c/0x73c
 <c013f771> __handle_mm_fault+0x3e3/0x73c
 <c0159191> pipe_read+0x26/0x2a
 <c014ea2b> vfs_read+0x87/0x11b
 <c014ee76> sys_read+0x3b/0x64
 <c010287f> syscall_call+0x7/0xb





* Re: splice/tee bugs?
  2006-07-10  8:40                         ` Michael Kerrisk
@ 2006-07-10  8:46                           ` Jens Axboe
  2006-07-10  8:50                             ` Michael Kerrisk
  2006-07-10  8:50                             ` Jens Axboe
  0 siblings, 2 replies; 34+ messages in thread
From: Jens Axboe @ 2006-07-10  8:46 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: lcapitulino, vendor-sec, linux-kernel, akpm, michael.kerrisk

On Mon, Jul 10 2006, Michael Kerrisk wrote:
> > > One slight strangeness.  Most of the time, the 
> > > "find . | ktee r | wc" command line takes about 0.1 seconds to 
> > > execute, but about 1 time in 5 on my x86 system, it takes about 
> > > 1.5 to 2 seconds to execute.  Any ideas about what's happening 
> > > there?
> > 
> > That is pretty odd. Any chance you can do a quick sysrq-t and see where
> > find/ktee/wc is stuck when this happens? You should not be seeing that,
> > naturally, I'll see if I can reproduce that here. How much data does
> > find . return in your example?
> 
> See the start of this message.
> 
> One sysrq-t output is below.
> 
> Cheers,
> 
> Michael
> 
> 
> find          D B9099C00     0 14170   4167         14171       (NOTLB)
>    ca279d04 00118054 00000008 b9099c00 003d0ca2 e7b0b9f8 00000009 d307f688
>    d307f580 c0459dc0 c1507620 b9099c00 003d0ca2 00000000 00000000 00118054
>    00000001 00001000 c015010e e647f5ac e647f5b8 00000046 00000000 00000000
> Call Trace:
>  <c015010e> __getblk+0x1d/0x225
>  <c03e1e7c> io_schedule+0x26/0x30
>  <c0150874> sync_buffer+0x37/0x3a
>  <c03e25fd> __wait_on_bit+0x33/0x59
>  <c015083d> sync_buffer+0x0/0x3a
>  <c03e2695> out_of_line_wait_on_bit+0x72/0x7a
>  <c015083d> sync_buffer+0x0/0x3a
>  <c0127dd5> wake_bit_function+0x0/0x34
>  <c019e39b> search_by_key+0x133/0xd91
>  <c0110a1c> do_page_fault+0x0/0x532
>  <c0189e2d> search_by_entry_key+0x20/0x22f
>  <c015dc76> filldir64+0x8e/0xc3
>  <c019e214> pathrelse+0x1b/0x2f
>  <c019388c> reiserfs_readdir+0x3e3/0x3fb
>  <c0193895> reiserfs_readdir+0x3ec/0x3fb
>  <c018d6e5> reiserfs_update_sd_size+0x67/0x24c
>  <c01a55c9> journal_begin+0x9c/0xdc
>  <c0196cb0> reiserfs_dirty_inode+0x5a/0x76
>  <c016bacc> __mark_inode_dirty+0x2d/0x15e
>  <c015dd9a> vfs_readdir+0x58/0x6f
>  <c015de14> sys_getdents64+0x63/0xa8
>  <c015dbe8> filldir64+0x0/0xc3
>  <c010287f> syscall_call+0x7/0xb

So it's find being stuck, this doesn't look tee/splice related at all.
Can you reproduce the same thing just by doing the find . > /dev/null?

-- 
Jens Axboe



* Re: splice/tee bugs?
  2006-07-10  8:46                           ` Jens Axboe
@ 2006-07-10  8:50                             ` Michael Kerrisk
  2006-07-10  9:06                               ` Jens Axboe
  2006-07-10  8:50                             ` Jens Axboe
  1 sibling, 1 reply; 34+ messages in thread
From: Michael Kerrisk @ 2006-07-10  8:50 UTC (permalink / raw)
  To: Jens Axboe; +Cc: michael.kerrisk, akpm, linux-kernel, vendor-sec, lcapitulino

> So it's find being stuck, this doesn't look tee/splice related at all.
> Can you reproduce the same thing just by doing the find . > /dev/null?

Hmm -- yes, I can.  So you are right, it's unrelated to splice.
Not sure what's going on...

Thanks,

Michael


* Re: splice/tee bugs?
  2006-07-10  8:46                           ` Jens Axboe
  2006-07-10  8:50                             ` Michael Kerrisk
@ 2006-07-10  8:50                             ` Jens Axboe
  1 sibling, 0 replies; 34+ messages in thread
From: Jens Axboe @ 2006-07-10  8:50 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: lcapitulino, vendor-sec, linux-kernel, akpm, michael.kerrisk

On Mon, Jul 10 2006, Jens Axboe wrote:
> On Mon, Jul 10 2006, Michael Kerrisk wrote:
> > > > One slight strangeness.  Most of the time, the 
> > > > "find . | ktee r | wc" command line takes about 0.1 seconds to 
> > > > execute, but about 1 time in 5 on my x86 system, it takes about 
> > > > 1.5 to 2 seconds to execute.  Any ideas about what's happening 
> > > > there?
> > > 
> > > That is pretty odd. Any chance you can do a quick sysrq-t and see where
> > > find/ktee/wc is stuck when this happens? You should not be seeing that,
> > > naturally, I'll see if I can reproduce that here. How much data does
> > > find . return in your example?
> > 
> > See the start of this message.
> > 
> > One sysrq-t output is below.
> > 
> > Cheers,
> > 
> > Michael
> > 
> > 
> > find          D B9099C00     0 14170   4167         14171       (NOTLB)
> >    ca279d04 00118054 00000008 b9099c00 003d0ca2 e7b0b9f8 00000009 d307f688
> >    d307f580 c0459dc0 c1507620 b9099c00 003d0ca2 00000000 00000000 00118054
> >    00000001 00001000 c015010e e647f5ac e647f5b8 00000046 00000000 00000000
> > Call Trace:
> >  <c015010e> __getblk+0x1d/0x225
> >  <c03e1e7c> io_schedule+0x26/0x30
> >  <c0150874> sync_buffer+0x37/0x3a
> >  <c03e25fd> __wait_on_bit+0x33/0x59
> >  <c015083d> sync_buffer+0x0/0x3a
> >  <c03e2695> out_of_line_wait_on_bit+0x72/0x7a
> >  <c015083d> sync_buffer+0x0/0x3a
> >  <c0127dd5> wake_bit_function+0x0/0x34
> >  <c019e39b> search_by_key+0x133/0xd91
> >  <c0110a1c> do_page_fault+0x0/0x532
> >  <c0189e2d> search_by_entry_key+0x20/0x22f
> >  <c015dc76> filldir64+0x8e/0xc3
> >  <c019e214> pathrelse+0x1b/0x2f
> >  <c019388c> reiserfs_readdir+0x3e3/0x3fb
> >  <c0193895> reiserfs_readdir+0x3ec/0x3fb
> >  <c018d6e5> reiserfs_update_sd_size+0x67/0x24c
> >  <c01a55c9> journal_begin+0x9c/0xdc
> >  <c0196cb0> reiserfs_dirty_inode+0x5a/0x76
> >  <c016bacc> __mark_inode_dirty+0x2d/0x15e
> >  <c015dd9a> vfs_readdir+0x58/0x6f
> >  <c015de14> sys_getdents64+0x63/0xa8
> >  <c015dbe8> filldir64+0x0/0xc3
> >  <c010287f> syscall_call+0x7/0xb
> 
> So it's find being stuck, this doesn't look tee/splice related at all.
> Can you reproduce the same thing just by doing the find . > /dev/null?

I think you found an unrelated bug, I can reproduce the same thing with
just find . > /dev/null here:

centera:/data1 # time find . > /dev/null

real    0m0.206s
user    0m0.009s
sys     0m0.196s
centera:/data1 # time find . > /dev/null

real    0m0.205s
user    0m0.008s
sys     0m0.198s
centera:/data1 # time find . > /dev/null

real    0m0.205s
user    0m0.012s
sys     0m0.194s
centera:/data1 # time find . > /dev/null

real    0m0.836s
user    0m0.011s
sys     0m0.194s

It's pretty close to 0.2 seconds most of the time, sometimes find takes
more than 1 second to complete though. Even nice'ing find to -20
reproduces the same thing.

-- 
Jens Axboe



* Re: splice/tee bugs?
  2006-07-10  8:50                             ` Michael Kerrisk
@ 2006-07-10  9:06                               ` Jens Axboe
  2006-07-10  9:08                                 ` Michael Kerrisk
  0 siblings, 1 reply; 34+ messages in thread
From: Jens Axboe @ 2006-07-10  9:06 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: michael.kerrisk, akpm, linux-kernel, vendor-sec, lcapitulino

On Mon, Jul 10 2006, Michael Kerrisk wrote:
> > So it's find being stuck, this doesn't look tee/splice related at all.
> > Can you reproduce the same thing just by doing the find . > /dev/null?
> 
> Hmm -- yes, I can.  So you are right, it's unrelated to splice.
> Not sure what's going on...

Hmm duh, the one-in-five-runs should have rung a bell (and I should have
read the trace more carefully) - it's the access times being synced to
disk. If you mount with noatime,nodiratime you should get consistent
times across runs.

So nothing unexpected there.

-- 
Jens Axboe



* Re: splice/tee bugs?
  2006-07-10  9:06                               ` Jens Axboe
@ 2006-07-10  9:08                                 ` Michael Kerrisk
  0 siblings, 0 replies; 34+ messages in thread
From: Michael Kerrisk @ 2006-07-10  9:08 UTC (permalink / raw)
  To: Jens Axboe; +Cc: lcapitulino, vendor-sec, linux-kernel, akpm, michael.kerrisk

> Hmm duh, the one-in-five-runs should have rung a bell (and I should have
> read the trace more carefully) - it's the access times being synced to
> disk. If you mount with noatime,nodiratime you should get consistent
> times across runs.
> 
> So nothing unexpected there.

Thanks.  I was wondering if it was something like this.

Cheers,

Michael


Thread overview: 34+ messages
2006-07-07  7:07 splice/tee bugs? Michael Kerrisk
2006-07-07 11:07 ` Andrew Morton
2006-07-07 11:42   ` Michael Kerrisk
2006-07-07 12:03     ` Jens Axboe
2006-07-07 12:28       ` Jens Axboe
2006-07-07 12:31         ` Michael Kerrisk
2006-07-07 12:41           ` Jens Axboe
2006-07-07 13:12             ` Jens Axboe
2006-07-07 13:14               ` Jens Axboe
2006-07-07 13:21                 ` Arjan van de Ven
2006-07-07 13:26                   ` Jens Axboe
2006-07-07 13:54                     ` Paulo Marques
2006-07-07 14:02                       ` Jens Axboe
2006-07-07 14:05               ` Michael Kerrisk
2006-07-07 14:08                 ` Jens Axboe
2006-07-07 16:13   ` Luiz Fernando N. Capitulino
2006-07-07 21:43     ` Luiz Fernando N. Capitulino
2006-07-08  6:41     ` Jens Axboe
2006-07-08 21:09       ` Luiz Fernando N. Capitulino
2006-07-09 10:36         ` Jens Axboe
2006-07-09 11:16           ` Jens Axboe
2006-07-09 16:47             ` Luiz Fernando N. Capitulino
2006-07-09 17:57               ` Jens Axboe
2006-07-10  6:25                 ` Michael Kerrisk
2006-07-10  6:43                   ` Jens Axboe
2006-07-10  8:09                     ` Michael Kerrisk
2006-07-10  8:24                       ` Jens Axboe
2006-07-10  8:40                         ` Michael Kerrisk
2006-07-10  8:46                           ` Jens Axboe
2006-07-10  8:50                             ` Michael Kerrisk
2006-07-10  9:06                               ` Jens Axboe
2006-07-10  9:08                                 ` Michael Kerrisk
2006-07-10  8:50                             ` Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2006-07-08  5:33 Chuck Ebbert
