* Pathological case identified from contest
From: Con Kolivas
Date: 2002-10-17  2:13
To: Andrew Morton
Cc: linux kernel mailing list

I found a pathological case in 2.5 while running contest with process_load
recently, after checking the results, which showed a bad result for
2.5.43-mm1:

2.5.43-mm1     101.38   72%    42   31%
2.5.43-mm1     102.90   75%    34   28%
2.5.43-mm1     504.12   14%   603   85%
2.5.43-mm1      96.73   77%    34   26%

This was very strange, so I looked into it further.

The default for process_load is this command:

    process_load --processes $nproc --recordsize 8192 --injections 2

where $nproc = 4 * num_cpus.

When I changed recordsize to 16384, many of the 2.5 kernels started
exhibiting the same behaviour.  While the machine was apparently still alive
and would respond to my request to abort, the kernel compile would all but
stop while process_load just continued, without allowing anything to happen
from kernel compilation for up to 5 minutes at a time.  This doesn't happen
with any 2.4 kernels.

I'm not sure what to make of this.  Suggestions?

Con
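[Editor's sketch: the $nproc figure above is plain arithmetic on the CPU
count.  A hypothetical C helper — not contest's actual code, which derives
this in its own scripts — might look like this; `_SC_NPROCESSORS_ONLN` is
the usual Linux/glibc way to query online CPUs.]

```c
#include <unistd.h>

/* contest runs process_load with $nproc = 4 * num_cpus.
 * Hypothetical helper, not taken from contest itself.
 * Falls back to one CPU if sysconf() cannot report a count. */
long default_nproc(void)
{
	long cpus = sysconf(_SC_NPROCESSORS_ONLN);

	return (cpus > 0 ? cpus : 1) * 4;
}
```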
* Re: Pathological case identified from contest
From: Andrew Morton
Date: 2002-10-17  2:49
To: Con Kolivas
Cc: linux kernel mailing list

Con Kolivas wrote:
> When I changed recordsize to 16384, many of the 2.5 kernels started
> exhibiting the same behaviour.  While the machine was apparently still
> alive and would respond to my request to abort, the kernel compile would
> all but stop while process_load just continued, without allowing anything
> to happen from kernel compilation for up to 5 minutes at a time.  This
> doesn't happen with any 2.4 kernels.

Well, it doesn't happen on my test machine (UP or SMP).  I tried various
recordsizes.  It's probably related to HZ, memory bandwidth and the precise
timing at which things happen.

The test describes itself thusly:

 * This test generates a load which simulates a process-loaded system.
 *
 * The test creates a ring of processes, each connected to its predecessor
 * and successor by a pipe.  After the ring is created, the parent process
 * injects some dummy data records into the ring and then joins.  The
 * processes pass the data records around the ring until they are killed.

It'll be starvation in the CPU scheduler, I expect.  For some reason the
ring of piping processes is just never giving a timeslice to anything else.
Or maybe something to do with the exceptional wakeup strategy which pipes
use.

Don't know, sorry.  One for the kernel/*.c guys.
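[Editor's sketch: the ring topology that the test comment describes can be
illustrated in plain C.  This is an illustrative single-process sketch only,
not the process_load source — the real test forks one process per ring node,
so every hop through the ring costs a pipe wakeup and a context switch,
which is exactly what exercises the scheduler.]

```c
#include <unistd.h>

/* Single-process illustration of process_load's topology: RING_SIZE pipes
 * arranged in a ring; a record is injected into pipe 0 and relayed hop by
 * hop around the ring.  Returns the total bytes moved, or -1 on error. */
#define RING_SIZE 4

long ring_pass_once(const char *record, size_t len)
{
	int ring[RING_SIZE][2];
	char buf[4096];
	long moved = 0;
	int i;

	if (len > sizeof(buf))
		return -1;
	for (i = 0; i < RING_SIZE; i++)
		if (pipe(ring[i]) < 0)
			return -1;

	/* Inject the record, then relay it through every hop. */
	if (write(ring[0][1], record, len) != (ssize_t)len)
		return -1;
	for (i = 0; i < RING_SIZE; i++) {
		int next = (i + 1) % RING_SIZE;
		ssize_t n = read(ring[i][0], buf, len);

		if (n != (ssize_t)len)
			return -1;
		if (i != RING_SIZE - 1 && write(ring[next][1], buf, n) != n)
			return -1;
		moved += n;
	}
	for (i = 0; i < RING_SIZE; i++) {
		close(ring[i][0]);
		close(ring[i][1]);
	}
	return moved;
}
```

With a 6-byte record and a 4-node ring, the record crosses 4 pipes, so 24
bytes move in total; at recordsize 16384 each hop is two pages of
copy_from_user/copy_to_user traffic, matching the profiles later in the
thread.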
* Re: Pathological case identified from contest
From: Con Kolivas
Date: 2002-10-17  4:26
To: Andrew Morton
Cc: linux kernel mailing list, William Lee Irwin

Quoting Andrew Morton <akpm@digeo.com>:
> It'll be starvation in the CPU scheduler, I expect.  For some reason the
> ring of piping processes is just never giving a timeslice to anything
> else.  Or maybe something to do with the exceptional wakeup strategy
> which pipes use.

OK, well, I've done some profiling as suggested by wli, and it shows pretty
much what I find in the results - it gets stuck while doing process_load and
never moves on.

recordsize 8192 kernel profile:

c01223ac    76997    4.48583   do_anonymous_page
c0188694   135835    7.91373   __generic_copy_from_user
c0188610   345071   20.1038    __generic_copy_to_user
c0105298   801429   46.6911    poll_idle

system profile:

00000000   160258    5.03854   (no symbol)               /lib/i686/libc-2.2.5.so
c0188610   345071   10.8491    __generic_copy_to_user    /home/con/kernel/linux-2.5.43/vmlinux
c0105298   801429   25.1971    poll_idle                 /home/con/kernel/linux-2.5.43/vmlinux
00000000  1132668   35.6113    (no symbol)               /usr/lib/gcc-lib/i686-pc-linux-gnu/2.95.3/cc1

A normal run, consistent with doing kernel compilation most of the time.

recordsize 16384 kernel profile:

c0111ef4   403545    4.3407    do_schedule
c0105298   558704    6.00965   poll_idle
c0188694  2571995   27.6655    __generic_copy_from_user
c0188610  4489796   48.2941    __generic_copy_to_user

system profile:

c0111ef4   403545    4.24896   do_schedule               /home/con/kernel/linux-2.5.43/vmlinux
c0105298   558704    5.88264   poll_idle                 /home/con/kernel/linux-2.5.43/vmlinux
c0188694  2571995   27.0807    __generic_copy_from_user  /home/con/kernel/linux-2.5.43/vmlinux
c0188610  4489796   47.2734    __generic_copy_to_user    /home/con/kernel/linux-2.5.43/vmlinux

I had to abort the run with recordsize 16384, but you can see it's just
stuck in process_load, copying data between forked processes.

Can someone else on lkml decipher why it gets stuck here?

Con
* Re: Pathological case identified from contest
From: Con Kolivas
Date: 2002-10-17  7:16
To: linux kernel mailing list
Cc: Andrew Morton

Quoting Con Kolivas <conman@kolivas.net>:
> I had to abort the run with recordsize 16384, but you can see it's just
> stuck in process_load, copying data between forked processes.
>
> Can someone else on lkml decipher why it gets stuck here?

Well, this has become more common with 2.5.43-mm2.  I had to abort the
process_load run 3 times when benchmarking it.  Going back to other kernels
and trying them, it didn't happen, so I don't think it's my hardware failing
or something like that.

Con
* Re: Pathological case identified from contest
From: Andrew Morton
Date: 2002-10-17  7:35
To: Con Kolivas
Cc: linux kernel mailing list

Con Kolivas wrote:
> Well, this has become more common with 2.5.43-mm2.  I had to abort the
> process_load run 3 times when benchmarking it.  Going back to other
> kernels and trying them, it didn't happen, so I don't think it's my
> hardware failing or something like that.

No, it's a bug in either the pipe code or the CPU scheduler, I'd say.

You could try backing out to the 2.5.40 pipe implementation; not sure if
that would tell us much though.  Here's a backout patch:

 fs/pipe.c                 |  323 +++++++++++++++++++++++++---------------------
 include/linux/pipe_fs_i.h |    2
 2 files changed, 183 insertions(+), 142 deletions(-)

--- 2.5.43/include/linux/pipe_fs_i.h~pipe-backout	Thu Oct 17 00:28:13 2002
+++ 2.5.43-akpm/include/linux/pipe_fs_i.h	Thu Oct 17 00:28:25 2002
@@ -9,6 +9,7 @@ struct pipe_inode_info {
 	unsigned int start;
 	unsigned int readers;
 	unsigned int writers;
+	unsigned int waiting_readers;
 	unsigned int waiting_writers;
 	unsigned int r_counter;
 	unsigned int w_counter;
@@ -27,6 +28,7 @@ struct pipe_inode_info {
 #define PIPE_LEN(inode)		((inode).i_pipe->len)
 #define PIPE_READERS(inode)	((inode).i_pipe->readers)
 #define PIPE_WRITERS(inode)	((inode).i_pipe->writers)
+#define PIPE_WAITING_READERS(inode)	((inode).i_pipe->waiting_readers)
 #define PIPE_WAITING_WRITERS(inode)	((inode).i_pipe->waiting_writers)
 #define PIPE_RCOUNTER(inode)	((inode).i_pipe->r_counter)
 #define PIPE_WCOUNTER(inode)	((inode).i_pipe->w_counter)
--- 2.5.43/fs/pipe.c~pipe-backout	Thu Oct 17 00:28:16 2002
+++ 2.5.43-akpm/fs/pipe.c	Thu Oct 17 00:28:20 2002
@@ -25,9 +25,6 @@
  *
  * FIFOs and Pipes now generate SIGIO for both readers and writers.
  * -- Jeremy Elson <jelson@circlemud.org> 2001-08-16
- *
- * pipe_read & write cleanup
- * -- Manfred Spraul <manfred@colorfullife.com> 2002-05-09
  */
 
 /* Drop the inode semaphore and wait for a pipe event, atomically */
@@ -47,81 +44,97 @@
 static ssize_t
 pipe_read(struct file *filp, char *buf, size_t count, loff_t *ppos)
 {
 	struct inode *inode = filp->f_dentry->d_inode;
-	int do_wakeup;
-	ssize_t ret;
+	ssize_t size, read, ret;
 
-	/* pread is not allowed on pipes. */
-	if (unlikely(ppos != &filp->f_pos))
-		return -ESPIPE;
-
-	/* Null read succeeds. */
-	if (unlikely(count == 0))
-		return 0;
+	/* Seeks are not allowed on pipes. */
+	ret = -ESPIPE;
+	read = 0;
+	if (ppos != &filp->f_pos)
+		goto out_nolock;
 
-	do_wakeup = 0;
+	/* Always return 0 on null read. */
 	ret = 0;
-	down(PIPE_SEM(*inode));
-	for (;;) {
-		int size = PIPE_LEN(*inode);
-		if (size) {
-			char *pipebuf = PIPE_BASE(*inode) + PIPE_START(*inode);
-			ssize_t chars = PIPE_MAX_RCHUNK(*inode);
-
-			if (chars > count)
-				chars = count;
-			if (chars > size)
-				chars = size;
-
-			if (copy_to_user(buf, pipebuf, chars)) {
-				if (!ret) ret = -EFAULT;
-				break;
-			}
-			ret += chars;
+	if (count == 0)
+		goto out_nolock;
 
-			PIPE_START(*inode) += chars;
-			PIPE_START(*inode) &= (PIPE_SIZE - 1);
-			PIPE_LEN(*inode) -= chars;
-			count -= chars;
-			buf += chars;
-			do_wakeup = 1;
-		}
-		if (!count)
-			break;	/* common path: read succeeded */
-		if (PIPE_LEN(*inode))	/* test for cyclic buffers */
-			continue;
+	/* Get the pipe semaphore */
+	ret = -ERESTARTSYS;
+	if (down_interruptible(PIPE_SEM(*inode)))
+		goto out_nolock;
+
+	if (PIPE_EMPTY(*inode)) {
+do_more_read:
+		ret = 0;
 		if (!PIPE_WRITERS(*inode))
-			break;
-		if (!PIPE_WAITING_WRITERS(*inode)) {
-			/* syscall merging: Usually we must not sleep
-			 * if O_NONBLOCK is set, or if we got some data.
-			 * But if a writer sleeps in kernel space, then
-			 * we can wait for that data without violating POSIX.
-			 */
-			if (ret)
-				break;
-			if (filp->f_flags & O_NONBLOCK) {
-				ret = -EAGAIN;
+			goto out;
+
+		ret = -EAGAIN;
+		if (filp->f_flags & O_NONBLOCK)
+			goto out;
+
+		for (;;) {
+			PIPE_WAITING_READERS(*inode)++;
+			pipe_wait(inode);
+			PIPE_WAITING_READERS(*inode)--;
+			ret = -ERESTARTSYS;
+			if (signal_pending(current))
+				goto out;
+			ret = 0;
+			if (!PIPE_EMPTY(*inode))
 				break;
-			}
+			if (!PIPE_WRITERS(*inode))
+				goto out;
 		}
-		if (signal_pending(current)) {
-			if (!ret) ret = -ERESTARTSYS;
-			break;
-		}
-		if (do_wakeup) {
-			wake_up_interruptible(PIPE_WAIT(*inode));
-			kill_fasync(PIPE_FASYNC_WRITERS(*inode), SIGIO, POLL_OUT);
-		}
-		pipe_wait(inode);
 	}
-	up(PIPE_SEM(*inode));
-	/* Signal writers asynchronously that there is more room. */
-	if (do_wakeup) {
+
+	/* Read what data is available. */
+	ret = -EFAULT;
+	while (count > 0 && (size = PIPE_LEN(*inode))) {
+		char *pipebuf = PIPE_BASE(*inode) + PIPE_START(*inode);
+		ssize_t chars = PIPE_MAX_RCHUNK(*inode);
+
+		if (chars > count)
+			chars = count;
+		if (chars > size)
+			chars = size;
+
+		if (copy_to_user(buf, pipebuf, chars))
+			goto out;
+
+		read += chars;
+		PIPE_START(*inode) += chars;
+		PIPE_START(*inode) &= (PIPE_SIZE - 1);
+		PIPE_LEN(*inode) -= chars;
+		count -= chars;
+		buf += chars;
+	}
+
+	/* Cache behaviour optimization */
+	if (!PIPE_LEN(*inode))
+		PIPE_START(*inode) = 0;
+
+	if (count && PIPE_WAITING_WRITERS(*inode) && !(filp->f_flags & O_NONBLOCK)) {
+		/*
+		 * We know that we are going to sleep: signal
+		 * writers synchronously that there is more
+		 * room.
		 */
 		wake_up_interruptible_sync(PIPE_WAIT(*inode));
-		kill_fasync(PIPE_FASYNC_WRITERS(*inode), SIGIO, POLL_OUT);
+		kill_fasync(PIPE_FASYNC_WRITERS(*inode), SIGIO, POLL_OUT);
+		if (!PIPE_EMPTY(*inode))
+			BUG();
+		goto do_more_read;
 	}
-	if (ret > 0)
-		UPDATE_ATIME(inode);
+	/* Signal writers asynchronously that there is more room. */
+	wake_up_interruptible(PIPE_WAIT(*inode));
+	kill_fasync(PIPE_FASYNC_WRITERS(*inode), SIGIO, POLL_OUT);
+
+	ret = read;
+out:
+	up(PIPE_SEM(*inode));
+out_nolock:
+	if (read)
+		ret = read;
 	return ret;
 }
@@ -129,90 +142,116 @@
 static ssize_t
 pipe_write(struct file *filp, const char *buf, size_t count, loff_t *ppos)
 {
 	struct inode *inode = filp->f_dentry->d_inode;
-	ssize_t ret;
-	size_t min;
-	int do_wakeup;
-
-	/* pwrite is not allowed on pipes. */
-	if (unlikely(ppos != &filp->f_pos))
-		return -ESPIPE;
-
-	/* Null write succeeds. */
-	if (unlikely(count == 0))
-		return 0;
+	ssize_t free, written, ret;
 
-	do_wakeup = 0;
+	/* Seeks are not allowed on pipes. */
+	ret = -ESPIPE;
+	written = 0;
+	if (ppos != &filp->f_pos)
+		goto out_nolock;
+
+	/* Null write succeeds. */
 	ret = 0;
-	min = count;
-	if (min > PIPE_BUF)
-		min = 1;
-	down(PIPE_SEM(*inode));
-	for (;;) {
-		int free;
-		if (!PIPE_READERS(*inode)) {
-			send_sig(SIGPIPE, current, 0);
-			if (!ret) ret = -EPIPE;
-			break;
+	if (count == 0)
+		goto out_nolock;
+
+	ret = -ERESTARTSYS;
+	if (down_interruptible(PIPE_SEM(*inode)))
+		goto out_nolock;
+
+	/* No readers yields SIGPIPE. */
+	if (!PIPE_READERS(*inode))
+		goto sigpipe;
+
+	/* If count <= PIPE_BUF, we have to make it atomic. */
+	free = (count <= PIPE_BUF ? count : 1);
+
+	/* Wait, or check for, available space. */
+	if (filp->f_flags & O_NONBLOCK) {
+		ret = -EAGAIN;
+		if (PIPE_FREE(*inode) < free)
+			goto out;
+	} else {
+		while (PIPE_FREE(*inode) < free) {
+			PIPE_WAITING_WRITERS(*inode)++;
+			pipe_wait(inode);
+			PIPE_WAITING_WRITERS(*inode)--;
+			ret = -ERESTARTSYS;
+			if (signal_pending(current))
+				goto out;
+
+			if (!PIPE_READERS(*inode))
+				goto sigpipe;
 		}
-		free = PIPE_FREE(*inode);
-		if (free >= min) {
-			/* transfer data */
-			ssize_t chars = PIPE_MAX_WCHUNK(*inode);
-			char *pipebuf = PIPE_BASE(*inode) + PIPE_END(*inode);
-			/* Always wakeup, even if the copy fails. Otherwise
-			 * we lock up (O_NONBLOCK-)readers that sleep due to
-			 * syscall merging.
-			 */
-			do_wakeup = 1;
+	}
+
+	/* Copy into available space. */
+	ret = -EFAULT;
+	while (count > 0) {
+		int space;
+		char *pipebuf = PIPE_BASE(*inode) + PIPE_END(*inode);
+		ssize_t chars = PIPE_MAX_WCHUNK(*inode);
+
+		if ((space = PIPE_FREE(*inode)) != 0) {
 			if (chars > count)
 				chars = count;
-			if (chars > free)
-				chars = free;
+			if (chars > space)
+				chars = space;
 
-			if (copy_from_user(pipebuf, buf, chars)) {
-				if (!ret) ret = -EFAULT;
-				break;
-			}
+			if (copy_from_user(pipebuf, buf, chars))
+				goto out;
 
-			ret += chars;
+			written += chars;
 			PIPE_LEN(*inode) += chars;
 			count -= chars;
 			buf += chars;
-		}
-		if (!count)
-			break;
-		if (PIPE_FREE(*inode) && ret) {
-			/* handle cyclic data buffers */
-			min = 1;
+			space = PIPE_FREE(*inode);
 			continue;
 		}
-		if (filp->f_flags & O_NONBLOCK) {
-			if (!ret) ret = -EAGAIN;
-			break;
-		}
-		if (signal_pending(current)) {
-			if (!ret) ret = -ERESTARTSYS;
+
+		ret = written;
+		if (filp->f_flags & O_NONBLOCK)
 			break;
-		}
-		if (do_wakeup) {
+
+		do {
+			/*
+			 * Synchronous wake-up: it knows that this process
+			 * is going to give up this CPU, so it doesn't have
+			 * to do idle reschedules.
+			 */
 			wake_up_interruptible_sync(PIPE_WAIT(*inode));
-			kill_fasync(PIPE_FASYNC_READERS(*inode), SIGIO, POLL_IN);
-			do_wakeup = 0;
-		}
-		PIPE_WAITING_WRITERS(*inode)++;
-		pipe_wait(inode);
-		PIPE_WAITING_WRITERS(*inode)--;
+			kill_fasync(PIPE_FASYNC_READERS(*inode), SIGIO, POLL_IN);
+			PIPE_WAITING_WRITERS(*inode)++;
+			pipe_wait(inode);
+			PIPE_WAITING_WRITERS(*inode)--;
+			if (signal_pending(current))
+				goto out;
+			if (!PIPE_READERS(*inode))
+				goto sigpipe;
+		} while (!PIPE_FREE(*inode));
+		ret = -EFAULT;
 	}
+
+	/* Signal readers asynchronously that there is more data. */
+	wake_up_interruptible(PIPE_WAIT(*inode));
+	kill_fasync(PIPE_FASYNC_READERS(*inode), SIGIO, POLL_IN);
+
+	inode->i_ctime = inode->i_mtime = CURRENT_TIME;
+	mark_inode_dirty(inode);
+
+out:
 	up(PIPE_SEM(*inode));
-	if (do_wakeup) {
-		wake_up_interruptible(PIPE_WAIT(*inode));
-		kill_fasync(PIPE_FASYNC_READERS(*inode), SIGIO, POLL_IN);
-	}
-	if (ret > 0) {
-		inode->i_ctime = inode->i_mtime = CURRENT_TIME;
-		mark_inode_dirty(inode);
-	}
+out_nolock:
+	if (written)
+		ret = written;
 	return ret;
+
+sigpipe:
+	if (written)
+		goto out;
+	up(PIPE_SEM(*inode));
+	send_sig(SIGPIPE, current, 0);
+	return -EPIPE;
 }
 
 static ssize_t
@@ -412,7 +451,7 @@ struct file_operations read_fifo_fops =
 	.ioctl		= pipe_ioctl,
 	.open		= pipe_read_open,
 	.release	= pipe_read_release,
-	.fasync		= pipe_read_fasync,
+	.fasync	= pipe_read_fasync,
 };
 
 struct file_operations write_fifo_fops = {
@@ -423,7 +462,7 @@ struct file_operations write_fifo_fops =
 	.ioctl		= pipe_ioctl,
 	.open		= pipe_write_open,
 	.release	= pipe_write_release,
-	.fasync		= pipe_write_fasync,
+	.fasync	= pipe_write_fasync,
 };
 
 struct file_operations rdwr_fifo_fops = {
@@ -434,7 +473,7 @@ struct file_operations rdwr_fifo_fops =
 	.ioctl		= pipe_ioctl,
 	.open		= pipe_rdwr_open,
 	.release	= pipe_rdwr_release,
-	.fasync		= pipe_rdwr_fasync,
+	.fasync	= pipe_rdwr_fasync,
 };
 
 struct file_operations read_pipe_fops = {
@@ -445,7 +484,7 @@ struct file_operations read_pipe_fops =
 	.ioctl		= pipe_ioctl,
 	.open		= pipe_read_open,
 	.release	= pipe_read_release,
-	.fasync		= pipe_read_fasync,
+	.fasync	= pipe_read_fasync,
 };
 
 struct file_operations write_pipe_fops = {
@@ -456,7 +495,7 @@ struct file_operations write_pipe_fops =
 	.ioctl		= pipe_ioctl,
 	.open		= pipe_write_open,
 	.release	= pipe_write_release,
-	.fasync		= pipe_write_fasync,
+	.fasync	= pipe_write_fasync,
 };
 
 struct file_operations rdwr_pipe_fops = {
@@ -467,7 +506,7 @@ struct file_operations rdwr_pipe_fops =
 	.ioctl		= pipe_ioctl,
 	.open		= pipe_rdwr_open,
 	.release	= pipe_rdwr_release,
-	.fasync		= pipe_rdwr_fasync,
+	.fasync	= pipe_rdwr_fasync,
 };
 
 struct inode* pipe_new(struct inode* inode)
@@ -486,7 +525,7 @@ struct inode* pipe_new(struct inode* ino
 	PIPE_BASE(*inode) = (char*) page;
 	PIPE_START(*inode) = PIPE_LEN(*inode) = 0;
 	PIPE_READERS(*inode) = PIPE_WRITERS(*inode) = 0;
-	PIPE_WAITING_WRITERS(*inode) = 0;
+	PIPE_WAITING_READERS(*inode) = PIPE_WAITING_WRITERS(*inode) = 0;
 	PIPE_RCOUNTER(*inode) = PIPE_WCOUNTER(*inode) = 1;
 	*PIPE_FASYNC_READERS(*inode) = *PIPE_FASYNC_WRITERS(*inode) = NULL;
* Re: Pathological case identified from contest
From: Rik van Riel
Date: 2002-10-17 17:15
To: Andrew Morton
Cc: Con Kolivas, linux kernel mailing list, Ingo Molnar

On Thu, 17 Oct 2002, Andrew Morton wrote:
> No, it's a bug in either the pipe code or the CPU scheduler, I'd say.

The scheduler definitely seems to have a big error here.

Say we're doing a pipe test with 3 processes.  On an idle system this would
result in each process getting 33% of the CPU and spending the other 66% of
the time sleeping.  Of course, this makes all 3 processes "interactive",
since they sleep twice as much as they run.

Now introduce 1 CPU hog in competition with this load.  This CPU hog never
sleeps, so it is quickly marked as a CPU hog and will end up on the expired
array after one timeslice (150 ms?).  The pipe processes will have the CPU
all to themselves until STARVATION_LIMIT (2 seconds) time has passed.

This means that the CPU hog will get 7.5% of the CPU, while the pipe
processes get the other 92.5% of the CPU, or 30.8% each - almost 4 times as
much as the CPU hog.

This could either mean the sleep average isn't a useful way to measure CPU
priority, or the way we measure the sleep average needs to be changed
somewhat...

kind regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/		http://distro.conectiva.com/
Current spamtrap:  october@surriel.com
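[Editor's sketch: Rik's percentages follow directly from his two assumed
figures — a 150 ms timeslice and a 2 s STARVATION_LIMIT.  A minimal check
of the arithmetic, with those values treated as assumptions rather than
measurements:]

```c
/* The hog runs one timeslice per starvation window; the pipe
 * processes split the remainder evenly.  Pure arithmetic,
 * using Rik's assumed 150 ms / 2000 ms figures as inputs. */
double hog_share(double timeslice_ms, double starvation_ms)
{
	return timeslice_ms / starvation_ms;	/* 150/2000 = 7.5% */
}

double pipe_share_each(double timeslice_ms, double starvation_ms,
		       int nr_pipe_procs)
{
	/* 92.5% split 3 ways = 30.8% each */
	return (1.0 - hog_share(timeslice_ms, starvation_ms)) / nr_pipe_procs;
}
```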
* Re: Pathological case identified from contest 2002-10-17 7:35 ` Andrew Morton 2002-10-17 17:15 ` Rik van Riel @ 2002-10-20 2:59 ` Con Kolivas 2002-10-20 3:05 ` Andrew Morton 1 sibling, 1 reply; 9+ messages in thread From: Con Kolivas @ 2002-10-20 2:59 UTC (permalink / raw) To: Andrew Morton; +Cc: Rik van Riel, linux kernel mailing list -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 17 Oct 2002 05:35 pm, you wrote: > Con Kolivas wrote: > > ... > > Well this has become more common with 2.5.43-mm2. I had to abort the > > process_load run 3 times when benchmarking it. Going back to other > > kernels and trying them it didnt happen so I dont think its my hardware > > failing or something like that. > > No, it's a bug in either the pipe code or the CPU scheduler I'd say. > > You could try backing out to the 2.5.40 pipe implementation; not sure if > that would tell us much though. I massaged the patch a little for it to apply and it _is_ the offending code. Backing out the pipe changes fixed the problem. I was unable to reproduce the holdup I was seeing with process_load even at higher data sizes. Now what? Con -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.7 (GNU/Linux) iD8DBQE9shwEF6dfvkL3i1gRAkQgAJ9J3uKeQ5AT3vCPPbGKgk0xuW4V1gCfXBJ3 93vaP5XLpT/WRGAqcVOxVkU= =OluT -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Pathological case identified from contest
From: Andrew Morton
Date: 2002-10-20  3:05
To: conman
Cc: Rik van Riel, linux kernel mailing list

Con Kolivas wrote:
> I massaged the patch a little for it to apply, and it _is_ the offending
> code.  Backing out the pipe changes fixed the problem.  I was unable to
> reproduce the holdup I was seeing with process_load, even at higher data
> sizes.  Now what?

Try Manfred's pipe fix, I guess?

--- 2.5/fs/pipe.c	Sat Oct 19 11:40:14 2002
+++ build-2.5/fs/pipe.c	Sat Oct 19 19:44:04 2002
@@ -109,7 +109,7 @@
 			break;
 		}
 		if (do_wakeup) {
-			wake_up_interruptible(PIPE_WAIT(*inode));
+			wake_up_interruptible_sync(PIPE_WAIT(*inode));
 			kill_fasync(PIPE_FASYNC_WRITERS(*inode), SIGIO, POLL_OUT);
 		}
 		pipe_wait(inode);
@@ -117,7 +117,7 @@
 	up(PIPE_SEM(*inode));
 	/* Signal writers asynchronously that there is more room. */
 	if (do_wakeup) {
-		wake_up_interruptible_sync(PIPE_WAIT(*inode));
+		wake_up_interruptible(PIPE_WAIT(*inode));
 		kill_fasync(PIPE_FASYNC_WRITERS(*inode), SIGIO, POLL_OUT);
 	}
 	if (ret > 0)
* Re: Pathological case identified from contest
From: Con Kolivas
Date: 2002-10-20  6:27
To: Andrew Morton
Cc: Rik van Riel, linux kernel mailing list

On Sun, 20 Oct 2002 01:05 pm, Andrew Morton wrote:
> Try Manfred's pipe fix, I guess?

Well, *that* makes sense.  Tried it, and it fixed it.  Thank you.

Cheers,
Con
Thread overview: 9+ messages (newest: 2002-10-20  6:27 UTC)

2002-10-17  2:13 Pathological case identified from contest  Con Kolivas
2002-10-17  2:49 ` Andrew Morton
2002-10-17  4:26   ` Con Kolivas
2002-10-17  7:16     ` Con Kolivas
2002-10-17  7:35       ` Andrew Morton
2002-10-17 17:15         ` Rik van Riel
2002-10-20  2:59         ` Con Kolivas
2002-10-20  3:05           ` Andrew Morton
2002-10-20  6:27             ` Con Kolivas