* [0/3] filtered wakeups respun
@ 2004-05-05  6:06 William Lee Irwin III
  2004-05-05  6:08 ` William Lee Irwin III
  0 siblings, 1 reply; 6+ messages in thread
From: William Lee Irwin III @ 2004-05-05  6:06 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel

[1/3]: filtered wakeups
	filter wakeups by the page being woken up for
[2/3]: filtered buffers
	filter wakeups by the bh being woken up for
[3/3]: wakeone
	restore wake-one semantics to bitlocking for pages and bh's

Same machine/etc. as before, except this time, ext3 instead of ext2.
ext3 shows noise-level differences in raw throughputs with large
reductions in cpu overhead, mostly on the read side.

ext2 results differ from these in that sequential writes also gain a 23%
boost in cpu efficiency (throughput scaled by %cpu), almost entirely due
to wake-one semantics. The tests take long enough to run that I've not
redone the ext2 results on a precisely-matching codebase. From the extant
ext2 results:

$ cat ~/tmp/virgin_mm.log/tiotest.log  
Tiotest results for 512 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write       16384 MBs | 1118.1 s |  14.654 MB/s |   1.6 %  | 280.9 % |
| Random Write 2000 MBs |  336.2 s |   5.950 MB/s |   0.8 %  |  20.4 % |
| Read        16384 MBs | 1717.1 s |   9.542 MB/s |   1.4 %  |  31.8 % |
| Random Read  2000 MBs |  465.2 s |   4.300 MB/s |   1.1 %  |  36.1 % |
`----------------------------------------------------------------------'
$ cat ~/tmp/filtered_wakeup.log/tiotest.log                   
Tiotest results for 512 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write       16384 MBs | 1099.5 s |  14.901 MB/s |   2.2 %  | 279.3 % |
| Random Write 2000 MBs |  333.8 s |   5.991 MB/s |   1.0 %  |  14.9 % |
| Read        16384 MBs | 1706.3 s |   9.602 MB/s |   1.4 %  |  19.1 % |
| Random Read  2000 MBs |  460.3 s |   4.345 MB/s |   1.1 %  |  14.8 % |
`----------------------------------------------------------------------'
$ cat ~/tmp/wakeone.log/tiotest.log                          
Tiotest results for 512 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write       16384 MBs | 1073.8 s |  15.258 MB/s |   1.5 %  | 237.3 % |
| Random Write 2000 MBs |  336.9 s |   5.937 MB/s |   0.9 %  |  15.2 % |
| Read        16384 MBs | 1703.0 s |   9.621 MB/s |   1.3 %  |  18.8 % |
| Random Read  2000 MBs |  458.6 s |   4.361 MB/s |   1.0 %  |  14.9 % |
`----------------------------------------------------------------------'

/home/wli/tmp/virgin_mm.log/tiotest.log:
Write:            5.1873MB/cpusec
Random Write:    28.0660MB/cpusec
Read:            28.7410MB/cpusec
Random Read:     11.5591MB/cpusec
/home/wli/tmp/filtered_wakeup.log/tiotest.log:
Write:            5.2934MB/cpusec
Random Write:    37.6792MB/cpusec
Read:            46.8390MB/cpusec
Random Read:     27.3270MB/cpusec
/home/wli/tmp/wakeone.log/tiotest.log:
Write:            6.3894MB/cpusec
Random Write:    36.8758MB/cpusec
Read:            47.8657MB/cpusec
Random Read:     27.4277MB/cpusec

The wakeone implementation used for the ext2 run(s) above was somewhat
less refined than the current one in that it didn't implement wake-one
semantics for lock_buffer() and committed a major stupidity in waking
more waiters than necessary in its wake_up_filtered().

One should also note that specific complaints about random read
performance are going around, and this nearly triples ext3's cpu
efficiency on random reads, i.e. it takes ext3 from 10.3MB/cpusec to
28.5MB/cpusec.
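
For reference, the MB/cpusec figures quoted in this mail appear to be the
tiotest rate divided by the total cpu fraction (Usr CPU plus Sys CPU, as
a fraction of one cpu). A minimal sketch of that arithmetic follows; the
function name mb_per_cpusec() is hypothetical and is not part of tiotest
or of these patches.

```c
#include <assert.h>

/*
 * Throughput scaled by cpu consumption.
 * rate_mb_s: the "Rate" column in MB/s.
 * usr_pct, sys_pct: the "Usr CPU" and "Sys CPU" columns, in percent.
 * Hypothetical helper for illustration only.
 */
static double mb_per_cpusec(double rate_mb_s, double usr_pct, double sys_pct)
{
	double cpu_fraction = (usr_pct + sys_pct) / 100.0;

	assert(cpu_fraction > 0.0);	/* a zero cpu fraction has no meaningful ratio */
	return rate_mb_s / cpu_fraction;
}
```

Feeding it the ext3 random read rows (4.463 MB/s at 1.2% + 42.2% before,
4.509 MB/s at 1.1% + 14.7% after) gives roughly 10.28 and 28.54
MB/cpusec, in line with the figures above.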


ext3 results:
before:
Tiotest results for 512 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write       16384 MBs |  926.5 s |  17.683 MB/s |   1.9 %  | 161.3 % |
| Random Write 2000 MBs |  333.5 s |   5.998 MB/s |   0.9 %  |  21.0 % |
| Read        16384 MBs | 1634.0 s |  10.027 MB/s |   1.5 %  |  28.4 % |
| Random Read  2000 MBs |  448.1 s |   4.463 MB/s |   1.2 %  |  42.2 % |
`----------------------------------------------------------------------'

Throughput scaled by cpu consumption:
Write:           10.8352MB/cpusec
Random Write:    27.3881MB/cpusec
Read:            33.5351MB/cpusec
Random Read:     10.2834MB/cpusec

top 10 cpu consumers:
 15328 finish_task_switch                        79.8333
 10149 __wake_up                                158.5781
  9859 generic_file_aio_write_nolock              4.3393
  8836 file_read_actor                           39.4464
  7601 __do_softirq                              26.3924
  3114 kmem_cache_free                           24.3281
  2810 __find_get_block                           9.7569
  2727 prepare_to_wait                           21.3047
  2464 kmem_cache_alloc                          19.2500
  1675 tl0_linux32                               52.3438

top 10 scheduler callers:
8827430 wait_on_page_bit                         30650.7986
327735 __wait_on_buffer                         1463.1027
209926 __handle_preemption                      13120.3750
138613 worker_thread                            254.8033
 35838 generic_file_aio_write_nolock             15.7738
 32265 __lock_page                              112.0312
 16281 pipe_wait                                127.1953
  9538 do_exit                                    9.3145
  7622 shrink_list                                4.1067
  6816 compat_sys_nanosleep                      17.7500

after:
Tiotest results for 512 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write       16384 MBs |  926.7 s |  17.680 MB/s |   1.9 %  | 140.4 % |
| Random Write 2000 MBs |  334.4 s |   5.981 MB/s |   0.9 %  |  19.7 % |
| Read        16384 MBs | 1649.8 s |   9.931 MB/s |   1.3 %  |  19.0 % |
| Random Read  2000 MBs |  443.6 s |   4.509 MB/s |   1.1 %  |  14.7 % |
`----------------------------------------------------------------------'

Throughput scaled by cpu consumption:
Write:           12.4245MB/cpusec
Random Write:    29.0340MB/cpusec
Read:            48.9212MB/cpusec
Random Read:     28.5380MB/cpusec

top 10 cpu consumers:
  9751 generic_file_aio_write_nolock              4.2918
  9116 file_read_actor                           40.6964
  7419 __do_softirq                              25.7604
  5217 finish_task_switch                        27.1719
  3482 __find_get_block                          12.0903
  2725 kmem_cache_free                           21.2891
  2669 wake_up_filtered                          13.9010
  2543 kmem_cache_alloc                          19.8672
  1629 find_get_page                             16.9688
  1613 tl0_linux32                               50.4062

top 10 scheduler callers:
2402700 wait_on_page_bit                         6825.8523
198357 __handle_preemption                      12397.3125
179318 worker_thread                            329.6287
 18343 generic_file_aio_write_nolock              8.0735
 15687 pipe_wait                                122.5547
  9306 do_exit                                    9.0879
  7531 __lock_buffer                             39.2240
  6814 compat_sys_nanosleep                      17.7448
  6716 kswapd                                    29.9821
  5429 sys_wait4                                  9.4253


-- wli


* Re: [0/3] filtered wakeups respun
  2004-05-05  6:06 [0/3] filtered wakeups respun William Lee Irwin III
@ 2004-05-05  6:08 ` William Lee Irwin III
  2004-05-05  6:11   ` [2/3] filtered buffer_head wakeups William Lee Irwin III
  0 siblings, 1 reply; 6+ messages in thread
From: William Lee Irwin III @ 2004-05-05  6:08 UTC (permalink / raw)
  To: akpm, linux-kernel

On Tue, May 04, 2004 at 11:06:12PM -0700, William Lee Irwin III wrote:
> Same machine/etc. as before, except this time, ext3 instead of ext2.
> ext3 shows noise-level differences in raw throughputs with large
> reductions in cpu overhead, mostly on the read side.

Precisely the same filtered wakeups for pages as before. Drop in a
fresh wakeup primitive that uses a key to discriminate between the
waiters for different objects on a hashed waitqueue, and make the
page waiting functions use it.
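
To illustrate the idea outside the kernel, here is a minimal userspace
sketch of key-based filtering on a shared wait list. It is a model only:
struct keyed_waiter and wake_keyed() are hypothetical names, there is no
locking, and "waking" is just setting a flag, whereas the patch below
calls the waiter's wake function under the waitqueue lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one sleeper on a hashed (shared) wait list. */
struct keyed_waiter {
	const void *key;		/* object this waiter is sleeping on */
	int woken;			/* set to 1 once a wakeup reaches it */
	struct keyed_waiter *next;	/* next waiter in the same hash bucket */
};

/*
 * Wake only the waiters whose key matches, leaving waiters for other
 * objects that hashed to the same bucket asleep.
 * Returns the number of waiters woken.
 */
static int wake_keyed(struct keyed_waiter *head, const void *key)
{
	struct keyed_waiter *w;
	int n = 0;

	for (w = head; w != NULL; w = w->next) {
		if (w->key != key)
			continue;	/* different object, same bucket */
		w->woken = 1;
		n++;
	}
	return n;
}
```

With two waiters on one page and one waiter on another page sharing the
list, a wakeup keyed on the first page touches only its two waiters; an
unfiltered wake-all would have woken all three.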


-- wli


Index: wake-2.6.6-rc3-mm1/include/linux/wait.h
===================================================================
--- wake-2.6.6-rc3-mm1.orig/include/linux/wait.h	2004-04-03 19:37:07.000000000 -0800
+++ wake-2.6.6-rc3-mm1/include/linux/wait.h	2004-05-04 13:16:00.000000000 -0700
@@ -28,6 +28,11 @@
 	struct list_head task_list;
 };
 
+struct filtered_wait_queue {
+	void *key;
+	wait_queue_t wait;
+};
+
 struct __wait_queue_head {
 	spinlock_t lock;
 	struct list_head task_list;
@@ -104,6 +109,7 @@
 	list_del(&old->task_list);
 }
 
+void FASTCALL(wake_up_filtered(wait_queue_head_t *, void *));
 extern void FASTCALL(__wake_up(wait_queue_head_t *q, unsigned int mode, int nr));
 extern void FASTCALL(__wake_up_locked(wait_queue_head_t *q, unsigned int mode));
 extern void FASTCALL(__wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr));
@@ -257,6 +263,16 @@
 		wait->func = autoremove_wake_function;			\
 		INIT_LIST_HEAD(&wait->task_list);			\
 	} while (0)
+
+#define DEFINE_FILTERED_WAIT(name, p)					\
+	struct filtered_wait_queue name = {				\
+		.key	= p,						\
+		.wait	=	{					\
+			.task	= current,				\
+			.func	= autoremove_wake_function,		\
+			.task_list = LIST_HEAD_INIT(name.wait.task_list),\
+		},							\
+	}
 	
 #endif /* __KERNEL__ */
 
Index: wake-2.6.6-rc3-mm1/kernel/sched.c
===================================================================
--- wake-2.6.6-rc3-mm1.orig/kernel/sched.c	2004-04-30 15:06:49.000000000 -0700
+++ wake-2.6.6-rc3-mm1/kernel/sched.c	2004-05-04 13:16:00.000000000 -0700
@@ -2518,6 +2518,19 @@
 	}
 }
 
+void fastcall wake_up_filtered(wait_queue_head_t *q, void *key)
+{
+	unsigned long flags;
+	unsigned int mode = TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE;
+	struct filtered_wait_queue *wait, *save;
+	spin_lock_irqsave(&q->lock, flags);
+	list_for_each_entry_safe(wait, save, &q->task_list, wait.task_list) {
+		if (wait->key == key)
+			wait->wait.func(&wait->wait, mode, 0);
+	}
+	spin_unlock_irqrestore(&q->lock, flags);
+}
+
 /**
  * __wake_up - wake up threads blocked on a waitqueue.
  * @q: the waitqueue
Index: wake-2.6.6-rc3-mm1/mm/filemap.c
===================================================================
--- wake-2.6.6-rc3-mm1.orig/mm/filemap.c	2004-04-30 15:06:49.000000000 -0700
+++ wake-2.6.6-rc3-mm1/mm/filemap.c	2004-05-04 13:16:00.000000000 -0700
@@ -307,16 +307,16 @@
 void fastcall wait_on_page_bit(struct page *page, int bit_nr)
 {
 	wait_queue_head_t *waitqueue = page_waitqueue(page);
-	DEFINE_WAIT(wait);
+	DEFINE_FILTERED_WAIT(wait, page);
 
 	do {
-		prepare_to_wait(waitqueue, &wait, TASK_UNINTERRUPTIBLE);
+		prepare_to_wait(waitqueue, &wait.wait, TASK_UNINTERRUPTIBLE);
 		if (test_bit(bit_nr, &page->flags)) {
 			sync_page(page);
 			io_schedule();
 		}
 	} while (test_bit(bit_nr, &page->flags));
-	finish_wait(waitqueue, &wait);
+	finish_wait(waitqueue, &wait.wait);
 }
 
 EXPORT_SYMBOL(wait_on_page_bit);
@@ -344,7 +344,7 @@
 		BUG();
 	smp_mb__after_clear_bit(); 
 	if (waitqueue_active(waitqueue))
-		wake_up_all(waitqueue);
+		wake_up_filtered(waitqueue, page);
 }
 
 EXPORT_SYMBOL(unlock_page);
@@ -363,7 +363,7 @@
 		smp_mb__after_clear_bit();
 	}
 	if (waitqueue_active(waitqueue))
-		wake_up_all(waitqueue);
+		wake_up_filtered(waitqueue, page);
 }
 
 EXPORT_SYMBOL(end_page_writeback);
@@ -379,16 +379,16 @@
 void fastcall __lock_page(struct page *page)
 {
 	wait_queue_head_t *wqh = page_waitqueue(page);
-	DEFINE_WAIT(wait);
+	DEFINE_FILTERED_WAIT(wait, page);
 
 	while (TestSetPageLocked(page)) {
-		prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
+		prepare_to_wait(wqh, &wait.wait, TASK_UNINTERRUPTIBLE);
 		if (PageLocked(page)) {
 			sync_page(page);
 			io_schedule();
 		}
 	}
-	finish_wait(wqh, &wait);
+	finish_wait(wqh, &wait.wait);
 }
 
 EXPORT_SYMBOL(__lock_page);


* [2/3] filtered buffer_head wakeups
  2004-05-05  6:08 ` William Lee Irwin III
@ 2004-05-05  6:11   ` William Lee Irwin III
  2004-05-05  6:16     ` [3/3] wake-one PG_locked/BH_Lock semantics William Lee Irwin III
  0 siblings, 1 reply; 6+ messages in thread
From: William Lee Irwin III @ 2004-05-05  6:11 UTC (permalink / raw)
  To: akpm, linux-kernel

On Tue, May 04, 2004 at 11:08:49PM -0700, William Lee Irwin III wrote:
> Precisely the same filtered wakeups for pages as before. Drop in a
> fresh wakeup primitive that uses a key to discriminate between the
> waiters for different objects on a hashed waitqueue, and make the
> page waiting functions use it.

Now, make bh's use the new wakeup primitive also. This carries a bugfix
vs. the prior version: the autoremove waitqueue wakeup function and the
autoremove API usage in __wait_event_filtered() are now made to match.


-- wli

Index: wake-2.6.6-rc3-mm1/fs/buffer.c
===================================================================
--- wake-2.6.6-rc3-mm1.orig/fs/buffer.c	2004-04-30 15:06:46.000000000 -0700
+++ wake-2.6.6-rc3-mm1/fs/buffer.c	2004-05-04 13:16:16.000000000 -0700
@@ -74,7 +74,7 @@
 
 	smp_mb();
 	if (waitqueue_active(wq))
-		wake_up_all(wq);
+		wake_up_filtered(wq, bh);
 }
 EXPORT_SYMBOL(wake_up_buffer);
 
@@ -93,10 +93,10 @@
 void __wait_on_buffer(struct buffer_head * bh)
 {
 	wait_queue_head_t *wqh = bh_waitq_head(bh);
-	DEFINE_WAIT(wait);
+	DEFINE_FILTERED_WAIT(wait, bh);
 
 	do {
-		prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
+		prepare_to_wait(wqh, &wait.wait, TASK_UNINTERRUPTIBLE);
 		if (buffer_locked(bh)) {
 			struct block_device *bd;
 			smp_mb();
@@ -106,7 +106,7 @@
 			io_schedule();
 		}
 	} while (buffer_locked(bh));
-	finish_wait(wqh, &wait);
+	finish_wait(wqh, &wait.wait);
 }
 
 static void
Index: wake-2.6.6-rc3-mm1/fs/jbd/transaction.c
===================================================================
--- wake-2.6.6-rc3-mm1.orig/fs/jbd/transaction.c	2004-04-30 15:06:46.000000000 -0700
+++ wake-2.6.6-rc3-mm1/fs/jbd/transaction.c	2004-05-04 13:16:16.000000000 -0700
@@ -638,7 +638,7 @@
 			jbd_unlock_bh_state(bh);
 			/* commit wakes up all shadow buffers after IO */
 			wqh = bh_waitq_head(jh2bh(jh));
-			wait_event(*wqh, (jh->b_jlist != BJ_Shadow));
+			wait_event_filtered(*wqh, jh2bh(jh), (jh->b_jlist != BJ_Shadow));
 			goto repeat;
 		}
 
Index: wake-2.6.6-rc3-mm1/include/linux/wait.h
===================================================================
--- wake-2.6.6-rc3-mm1.orig/include/linux/wait.h	2004-05-04 13:16:00.000000000 -0700
+++ wake-2.6.6-rc3-mm1/include/linux/wait.h	2004-05-04 13:16:27.000000000 -0700
@@ -146,7 +146,6 @@
 		break;							\
 	__wait_event(wq, condition);					\
 } while (0)
-
 #define __wait_event_interruptible(wq, condition, ret)			\
 do {									\
 	wait_queue_t __wait;						\
@@ -273,7 +272,28 @@
 			.task_list = LIST_HEAD_INIT(name.wait.task_list),\
 		},							\
 	}
-	
+
+#define __wait_event_filtered(wq, key, condition) 			\
+do {									\
+	DEFINE_FILTERED_WAIT(__wait, key);				\
+	wait_queue_head_t *__wqh = &(wq);				\
+	wait_queue_t *__wqe = &__wait.wait;				\
+	for (;;) {							\
+		prepare_to_wait(__wqh, __wqe, TASK_UNINTERRUPTIBLE);	\
+		if (condition)						\
+			break;						\
+		schedule();						\
+	}								\
+	finish_wait(__wqh, __wqe);					\
+} while (0)
+
+
+#define wait_event_filtered(wq, key, condition)				\
+do {									\
+	if (!(condition))						\
+		__wait_event_filtered(wq, key, condition);		\
+} while (0)
+
 #endif /* __KERNEL__ */
 
 #endif


* [3/3] wake-one PG_locked/BH_Lock semantics
  2004-05-05  6:11   ` [2/3] filtered buffer_head wakeups William Lee Irwin III
@ 2004-05-05  6:16     ` William Lee Irwin III
  2004-05-05  6:42       ` Michael J. Cohen
  0 siblings, 1 reply; 6+ messages in thread
From: William Lee Irwin III @ 2004-05-05  6:16 UTC (permalink / raw)
  To: akpm, linux-kernel

On Tue, May 04, 2004 at 11:11:21PM -0700, William Lee Irwin III wrote:
> Now, make bh's use the new wakeup primitive also. This has the bugfix
> vs. the prior version that autoremoved waitqueue wakeup functions are
> made to match autoremove API usage in __wait_event_filtered().

This is still grossly inefficient in that it's only necessary to wake
one waiter when the waiter promises to eventually issue another wakeup,
e.g. when it releases the bit on the page. So here, wake-one semantics
are implemented for those cases, using the WQ_FLAG_EXCLUSIVE flag in
the waitqueue and the surrounding APIs, e.g. prepare_to_wait_exclusive().

I took the small liberty of adding list_for_each_entry_reverse_safe()
to list.h as it generally makes sense, and gives the opportunity for
fair FIFO wakeups wrapped up in a neat API.
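
As a rough userspace model of the wake-one rule (not the kernel code):
matching waiters are woken in the order the caller lists them, and the
walk stops after the first matching waiter that is marked exclusive.
The names struct excl_waiter, WAITER_EXCLUSIVE and wake_keyed_one() are
hypothetical; the sketch has no locking, makes no claim about queue
ordering in the kernel, and assumes every wake attempt succeeds.

```c
#include <assert.h>
#include <stddef.h>

#define WAITER_EXCLUSIVE 0x01	/* hypothetical stand-in for WQ_FLAG_EXCLUSIVE */

struct excl_waiter {
	const void *key;	/* object this waiter is sleeping on */
	unsigned int flags;	/* WAITER_EXCLUSIVE or 0 */
	int woken;		/* set to 1 once a wakeup reaches it */
};

/*
 * Wake waiters whose key matches, in array order. Non-exclusive waiters
 * are all woken as they are encountered; the walk ends after the first
 * exclusive waiter woken, since that waiter is expected to issue the
 * next wakeup itself when it releases the lock.
 * Returns the number of waiters woken.
 */
static int wake_keyed_one(struct excl_waiter *waiters, size_t count,
			  const void *key)
{
	size_t i;
	int n = 0;

	for (i = 0; i < count; i++) {
		struct excl_waiter *w = &waiters[i];

		if (w->key != key)
			continue;	/* waiter for another object */
		w->woken = 1;
		n++;
		if (w->flags & WAITER_EXCLUSIVE)
			break;		/* one lock taker is enough */
	}
	return n;
}
```

Given one non-exclusive and two exclusive waiters on the same key, in
that order, a single call wakes the non-exclusive waiter and the first
exclusive one, and leaves the second exclusive waiter asleep until the
next unlock.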


-- wli

Index: wake-2.6.6-rc3-mm1/fs/buffer.c
===================================================================
--- wake-2.6.6-rc3-mm1.orig/fs/buffer.c	2004-05-04 13:16:16.000000000 -0700
+++ wake-2.6.6-rc3-mm1/fs/buffer.c	2004-05-04 15:19:49.000000000 -0700
@@ -78,6 +78,30 @@
 }
 EXPORT_SYMBOL(wake_up_buffer);
 
+static void sync_buffer(struct buffer_head *bh)
+{
+	struct block_device *bd;
+	smp_mb();
+	bd = bh->b_bdev;
+	if (bd)
+		blk_run_address_space(bd->bd_inode->i_mapping);
+}
+
+void fastcall __lock_buffer(struct buffer_head *bh)
+{
+	wait_queue_head_t *wqh = bh_waitq_head(bh);
+	DEFINE_FILTERED_WAIT(wait, bh);
+	do {
+		prepare_to_wait_exclusive(wqh, &wait.wait, TASK_UNINTERRUPTIBLE);
+		if (buffer_locked(bh)) {
+			sync_buffer(bh);
+			io_schedule();
+		}
+	} while (test_set_buffer_locked(bh));
+	finish_wait(wqh, &wait.wait);
+}
+EXPORT_SYMBOL(__lock_buffer);
+
 void fastcall unlock_buffer(struct buffer_head *bh)
 {
 	clear_buffer_locked(bh);
@@ -98,11 +122,7 @@
 	do {
 		prepare_to_wait(wqh, &wait.wait, TASK_UNINTERRUPTIBLE);
 		if (buffer_locked(bh)) {
-			struct block_device *bd;
-			smp_mb();
-			bd = bh->b_bdev;
-			if (bd)
-				blk_run_address_space(bd->bd_inode->i_mapping);
+			sync_buffer(bh);
 			io_schedule();
 		}
 	} while (buffer_locked(bh));
Index: wake-2.6.6-rc3-mm1/include/linux/list.h
===================================================================
--- wake-2.6.6-rc3-mm1.orig/include/linux/list.h	2004-04-30 15:06:48.000000000 -0700
+++ wake-2.6.6-rc3-mm1/include/linux/list.h	2004-05-04 13:16:53.000000000 -0700
@@ -413,6 +413,19 @@
 	     pos = n, n = list_entry(n->member.next, typeof(*n), member))
 
 /**
+ * list_for_each_entry_reverse_safe - iterate over list of given type safe against removal of list entry backward
+ * @pos:	the type * to use as a loop counter.
+ * @n:		another type * to use as temporary storage
+ * @head:	the head for your list.
+ * @member:	the name of the list_struct within the struct.
+ */
+#define list_for_each_entry_reverse_safe(pos, n, head, member)		\
+	for (pos = list_entry((head)->prev, typeof(*pos), member),	\
+		n = list_entry(pos->member.prev, typeof(*pos), member);	\
+	     &pos->member != (head); 					\
+	     pos = n, n = list_entry(n->member.prev, typeof(*n), member))
+
+/**
  * list_for_each_rcu	-	iterate over an rcu-protected list
  * @pos:	the &struct list_head to use as a loop counter.
  * @head:	the head for your list.
Index: wake-2.6.6-rc3-mm1/include/linux/buffer_head.h
===================================================================
--- wake-2.6.6-rc3-mm1.orig/include/linux/buffer_head.h	2004-04-30 15:05:52.000000000 -0700
+++ wake-2.6.6-rc3-mm1/include/linux/buffer_head.h	2004-05-04 15:13:37.000000000 -0700
@@ -170,6 +170,7 @@
 struct buffer_head *alloc_buffer_head(int gfp_flags);
 void free_buffer_head(struct buffer_head * bh);
 void FASTCALL(unlock_buffer(struct buffer_head *bh));
+void FASTCALL(__lock_buffer(struct buffer_head *bh));
 void ll_rw_block(int, int, struct buffer_head * bh[]);
 void sync_dirty_buffer(struct buffer_head *bh);
 void submit_bh(int, struct buffer_head *);
@@ -279,8 +280,8 @@
 
 static inline void lock_buffer(struct buffer_head *bh)
 {
-	while (test_set_buffer_locked(bh))
-		__wait_on_buffer(bh);
+	if (test_set_buffer_locked(bh))
+		__lock_buffer(bh);
 }
 
 #endif /* _LINUX_BUFFER_HEAD_H */
Index: wake-2.6.6-rc3-mm1/kernel/sched.c
===================================================================
--- wake-2.6.6-rc3-mm1.orig/kernel/sched.c	2004-05-04 13:16:00.000000000 -0700
+++ wake-2.6.6-rc3-mm1/kernel/sched.c	2004-05-04 18:27:39.000000000 -0700
@@ -2524,9 +2524,14 @@
 	unsigned int mode = TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE;
 	struct filtered_wait_queue *wait, *save;
 	spin_lock_irqsave(&q->lock, flags);
-	list_for_each_entry_safe(wait, save, &q->task_list, wait.task_list) {
-		if (wait->key == key)
-			wait->wait.func(&wait->wait, mode, 0);
+	list_for_each_entry_reverse_safe(wait, save, &q->task_list, wait.task_list) {
+		int exclusive = wait->wait.flags & WQ_FLAG_EXCLUSIVE;
+		if (wait->key != key)
+			continue;
+		else if (!wait->wait.func(&wait->wait, mode, 0))
+			continue;
+		else if (exclusive)
+			break;
 	}
 	spin_unlock_irqrestore(&q->lock, flags);
 }
Index: wake-2.6.6-rc3-mm1/mm/filemap.c
===================================================================
--- wake-2.6.6-rc3-mm1.orig/mm/filemap.c	2004-05-04 13:16:00.000000000 -0700
+++ wake-2.6.6-rc3-mm1/mm/filemap.c	2004-05-04 15:24:01.000000000 -0700
@@ -297,17 +297,23 @@
  * at a cost of "thundering herd" phenomena during rare hash
  * collisions.
  */
-static wait_queue_head_t *page_waitqueue(struct page *page)
+static wait_queue_head_t *page_waitqueue(struct page *page, int bit)
 {
 	const struct zone *zone = page_zone(page);
+	unsigned long key = (unsigned long)page + bit;
+	return &zone->wait_table[hash_long(key, zone->wait_table_bits)];
+}
 
-	return &zone->wait_table[hash_ptr(page, zone->wait_table_bits)];
+#define PAGE_KEY_SHIFT	(BITS_PER_LONG - (BITS_PER_LONG == 32 ? 5 : 6))
+static void *page_key(struct page *page, unsigned long bit)
+{
+	return (void *)(page_to_pfn(page) | bit << PAGE_KEY_SHIFT);
 }
 
 void fastcall wait_on_page_bit(struct page *page, int bit_nr)
 {
-	wait_queue_head_t *waitqueue = page_waitqueue(page);
-	DEFINE_FILTERED_WAIT(wait, page);
+	wait_queue_head_t *waitqueue = page_waitqueue(page, bit_nr);
+	DEFINE_FILTERED_WAIT(wait, page_key(page, bit_nr));
 
 	do {
 		prepare_to_wait(waitqueue, &wait.wait, TASK_UNINTERRUPTIBLE);
@@ -338,13 +344,13 @@
  */
 void fastcall unlock_page(struct page *page)
 {
-	wait_queue_head_t *waitqueue = page_waitqueue(page);
+	wait_queue_head_t *waitqueue = page_waitqueue(page, PG_locked);
 	smp_mb__before_clear_bit();
 	if (!TestClearPageLocked(page))
 		BUG();
 	smp_mb__after_clear_bit(); 
 	if (waitqueue_active(waitqueue))
-		wake_up_filtered(waitqueue, page);
+		wake_up_filtered(waitqueue, page_key(page, PG_locked));
 }
 
 EXPORT_SYMBOL(unlock_page);
@@ -355,7 +361,7 @@
  */
 void end_page_writeback(struct page *page)
 {
-	wait_queue_head_t *waitqueue = page_waitqueue(page);
+	wait_queue_head_t *waitqueue = page_waitqueue(page, PG_writeback);
 
 	if (!TestClearPageReclaim(page) || rotate_reclaimable_page(page)) {
 		if (!test_clear_page_writeback(page))
@@ -363,7 +369,7 @@
 		smp_mb__after_clear_bit();
 	}
 	if (waitqueue_active(waitqueue))
-		wake_up_filtered(waitqueue, page);
+		wake_up_filtered(waitqueue, page_key(page, PG_writeback));
 }
 
 EXPORT_SYMBOL(end_page_writeback);
@@ -378,11 +384,11 @@
  */
 void fastcall __lock_page(struct page *page)
 {
-	wait_queue_head_t *wqh = page_waitqueue(page);
-	DEFINE_FILTERED_WAIT(wait, page);
+	wait_queue_head_t *wqh = page_waitqueue(page, PG_locked);
+	DEFINE_FILTERED_WAIT(wait, page_key(page, PG_locked));
 
 	while (TestSetPageLocked(page)) {
-		prepare_to_wait(wqh, &wait.wait, TASK_UNINTERRUPTIBLE);
+		prepare_to_wait_exclusive(wqh, &wait.wait, TASK_UNINTERRUPTIBLE);
 		if (PageLocked(page)) {
 			sync_page(page);
 			io_schedule();


* Re: [3/3] wake-one PG_locked/BH_Lock semantics
  2004-05-05  6:16     ` [3/3] wake-one PG_locked/BH_Lock semantics William Lee Irwin III
@ 2004-05-05  6:42       ` Michael J. Cohen
  2004-05-05  9:29         ` Michael J. Cohen
  0 siblings, 1 reply; 6+ messages in thread
From: Michael J. Cohen @ 2004-05-05  6:42 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: akpm, linux-kernel

tiobench numbers will be coming shortly from my test box, but this one
at least feels slightly better than mainline.

------
Michael

On Wed, 2004-05-05 at 02:16, William Lee Irwin III wrote:
> On Tue, May 04, 2004 at 11:11:21PM -0700, William Lee Irwin III wrote:
> > Now, make bh's use the new wakeup primitive also. This has the bugfix
> > vs. the prior version that autoremoved waitqueue wakeup functions are
> > made to match autoremove API usage in __wait_event_filtered().
> 
> This is still grossly inefficient in that it's only necessary to wake
> one waiter when the waiter promises to eventually issue another wakeup
> e.g. when it releases the bit on the page. So here, wake-one semantics
> are implemented for those cases, using the WQ_FLAG_EXCLUSIVE flag in
> the waitqueue and the surrounding API's e.g. prepare_to_wait_exclusive().
> 
> I took the small liberty of adding list_for_each_entry_reverse_safe()
> to list.h as it generally makes sense, and gives the opportunity for
> fair FIFO wakeups wrapped up in a neat API.
> 
> 
> -- wli
> 
> Index: wake-2.6.6-rc3-mm1/fs/buffer.c
> ===================================================================
> --- wake-2.6.6-rc3-mm1.orig/fs/buffer.c	2004-05-04 13:16:16.000000000 -0700
> +++ wake-2.6.6-rc3-mm1/fs/buffer.c	2004-05-04 15:19:49.000000000 -0700
> @@ -78,6 +78,30 @@
>  }
>  EXPORT_SYMBOL(wake_up_buffer);
>  
> +static void sync_buffer(struct buffer_head *bh)
> +{
> +	struct block_device *bd;
> +	smp_mb();
> +	bd = bh->b_bdev;
> +	if (bd)
> +		blk_run_address_space(bd->bd_inode->i_mapping);
> +}
> +
> +void fastcall __lock_buffer(struct buffer_head *bh)
> +{
> +	wait_queue_head_t *wqh = bh_waitq_head(bh);
> +	DEFINE_FILTERED_WAIT(wait, bh);
> +	do {
> +		prepare_to_wait_exclusive(wqh, &wait.wait, TASK_UNINTERRUPTIBLE);
> +		if (buffer_locked(bh)) {
> +			sync_buffer(bh);
> +			io_schedule();
> +		}
> +	} while (test_set_buffer_locked(bh));
> +	finish_wait(wqh, &wait.wait);
> +}
> +EXPORT_SYMBOL(__lock_buffer);
> +
>  void fastcall unlock_buffer(struct buffer_head *bh)
>  {
>  	clear_buffer_locked(bh);
> @@ -98,11 +122,7 @@
>  	do {
>  		prepare_to_wait(wqh, &wait.wait, TASK_UNINTERRUPTIBLE);
>  		if (buffer_locked(bh)) {
> -			struct block_device *bd;
> -			smp_mb();
> -			bd = bh->b_bdev;
> -			if (bd)
> -				blk_run_address_space(bd->bd_inode->i_mapping);
> +			sync_buffer(bh);
>  			io_schedule();
>  		}
>  	} while (buffer_locked(bh));
> Index: wake-2.6.6-rc3-mm1/include/linux/list.h
> ===================================================================
> --- wake-2.6.6-rc3-mm1.orig/include/linux/list.h	2004-04-30 15:06:48.000000000 -0700
> +++ wake-2.6.6-rc3-mm1/include/linux/list.h	2004-05-04 13:16:53.000000000 -0700
> @@ -413,6 +413,19 @@
>  	     pos = n, n = list_entry(n->member.next, typeof(*n), member))
>  
>  /**
> + * list_for_each_entry_reverse_safe - iterate over list of given type safe against removal of list entry backward
> + * @pos:	the type * to use as a loop counter.
> + * @n:		another type * to use as temporary storage
> + * @head:	the head for your list.
> + * @member:	the name of the list_struct within the struct.
> + */
> +#define list_for_each_entry_reverse_safe(pos, n, head, member)		\
> +	for (pos = list_entry((head)->prev, typeof(*pos), member),	\
> +		n = list_entry(pos->member.prev, typeof(*pos), member);	\
> +	     &pos->member != (head); 					\
> +	     pos = n, n = list_entry(n->member.prev, typeof(*n), member))
> +
> +/**
>   * list_for_each_rcu	-	iterate over an rcu-protected list
>   * @pos:	the &struct list_head to use as a loop counter.
>   * @head:	the head for your list.
> Index: wake-2.6.6-rc3-mm1/include/linux/buffer_head.h
> ===================================================================
> --- wake-2.6.6-rc3-mm1.orig/include/linux/buffer_head.h	2004-04-30 15:05:52.000000000 -0700
> +++ wake-2.6.6-rc3-mm1/include/linux/buffer_head.h	2004-05-04 15:13:37.000000000 -0700
> @@ -170,6 +170,7 @@
>  struct buffer_head *alloc_buffer_head(int gfp_flags);
>  void free_buffer_head(struct buffer_head * bh);
>  void FASTCALL(unlock_buffer(struct buffer_head *bh));
> +void FASTCALL(__lock_buffer(struct buffer_head *bh));
>  void ll_rw_block(int, int, struct buffer_head * bh[]);
>  void sync_dirty_buffer(struct buffer_head *bh);
>  void submit_bh(int, struct buffer_head *);
> @@ -279,8 +280,8 @@
>  
>  static inline void lock_buffer(struct buffer_head *bh)
>  {
> -	while (test_set_buffer_locked(bh))
> -		__wait_on_buffer(bh);
> +	if (test_set_buffer_locked(bh))
> +		__lock_buffer(bh);
>  }
>  
>  #endif /* _LINUX_BUFFER_HEAD_H */
> Index: wake-2.6.6-rc3-mm1/kernel/sched.c
> ===================================================================
> --- wake-2.6.6-rc3-mm1.orig/kernel/sched.c	2004-05-04 13:16:00.000000000 -0700
> +++ wake-2.6.6-rc3-mm1/kernel/sched.c	2004-05-04 18:27:39.000000000 -0700
> @@ -2524,9 +2524,14 @@
>  	unsigned int mode = TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE;
>  	struct filtered_wait_queue *wait, *save;
>  	spin_lock_irqsave(&q->lock, flags);
> -	list_for_each_entry_safe(wait, save, &q->task_list, wait.task_list) {
> -		if (wait->key == key)
> -			wait->wait.func(&wait->wait, mode, 0);
> +	list_for_each_entry_reverse_safe(wait, save, &q->task_list, wait.task_list) {
> +		int exclusive = wait->wait.flags & WQ_FLAG_EXCLUSIVE;
> +		if (wait->key != key)
> +			continue;
> +		else if (!wait->wait.func(&wait->wait, mode, 0))
> +			continue;
> +		else if (exclusive)
> +			break;
>  	}
>  	spin_unlock_irqrestore(&q->lock, flags);
>  }
> Index: wake-2.6.6-rc3-mm1/mm/filemap.c
> ===================================================================
> --- wake-2.6.6-rc3-mm1.orig/mm/filemap.c	2004-05-04 13:16:00.000000000 -0700
> +++ wake-2.6.6-rc3-mm1/mm/filemap.c	2004-05-04 15:24:01.000000000 -0700
> @@ -297,17 +297,23 @@
>   * at a cost of "thundering herd" phenomena during rare hash
>   * collisions.
>   */
> -static wait_queue_head_t *page_waitqueue(struct page *page)
> +static wait_queue_head_t *page_waitqueue(struct page *page, int bit)
>  {
>  	const struct zone *zone = page_zone(page);
> +	unsigned long key = (unsigned long)page + bit;
> +	return &zone->wait_table[hash_long(key, zone->wait_table_bits)];
> +}
>  
> -	return &zone->wait_table[hash_ptr(page, zone->wait_table_bits)];
> +#define PAGE_KEY_SHIFT	(BITS_PER_LONG - (BITS_PER_LONG == 32 ? 5 : 6))
> +static void *page_key(struct page *page, unsigned long bit)
> +{
> +	return (void *)(page_to_pfn(page) | bit << PAGE_KEY_SHIFT);
>  }
>  
>  void fastcall wait_on_page_bit(struct page *page, int bit_nr)
>  {
> -	wait_queue_head_t *waitqueue = page_waitqueue(page);
> -	DEFINE_FILTERED_WAIT(wait, page);
> +	wait_queue_head_t *waitqueue = page_waitqueue(page, bit_nr);
> +	DEFINE_FILTERED_WAIT(wait, page_key(page, bit_nr));
>  
>  	do {
>  		prepare_to_wait(waitqueue, &wait.wait, TASK_UNINTERRUPTIBLE);
> @@ -338,13 +344,13 @@
>   */
>  void fastcall unlock_page(struct page *page)
>  {
> -	wait_queue_head_t *waitqueue = page_waitqueue(page);
> +	wait_queue_head_t *waitqueue = page_waitqueue(page, PG_locked);
>  	smp_mb__before_clear_bit();
>  	if (!TestClearPageLocked(page))
>  		BUG();
>  	smp_mb__after_clear_bit(); 
>  	if (waitqueue_active(waitqueue))
> -		wake_up_filtered(waitqueue, page);
> +		wake_up_filtered(waitqueue, page_key(page, PG_locked));
>  }
>  
>  EXPORT_SYMBOL(unlock_page);
> @@ -355,7 +361,7 @@
>   */
>  void end_page_writeback(struct page *page)
>  {
> -	wait_queue_head_t *waitqueue = page_waitqueue(page);
> +	wait_queue_head_t *waitqueue = page_waitqueue(page, PG_writeback);
>  
>  	if (!TestClearPageReclaim(page) || rotate_reclaimable_page(page)) {
>  		if (!test_clear_page_writeback(page))
> @@ -363,7 +369,7 @@
>  		smp_mb__after_clear_bit();
>  	}
>  	if (waitqueue_active(waitqueue))
> -		wake_up_filtered(waitqueue, page);
> +		wake_up_filtered(waitqueue, page_key(page, PG_writeback));
>  }
>  
>  EXPORT_SYMBOL(end_page_writeback);
> @@ -378,11 +384,11 @@
>   */
>  void fastcall __lock_page(struct page *page)
>  {
> -	wait_queue_head_t *wqh = page_waitqueue(page);
> -	DEFINE_FILTERED_WAIT(wait, page);
> +	wait_queue_head_t *wqh = page_waitqueue(page, PG_locked);
> +	DEFINE_FILTERED_WAIT(wait, page_key(page, PG_locked));
>  
>  	while (TestSetPageLocked(page)) {
> -		prepare_to_wait(wqh, &wait.wait, TASK_UNINTERRUPTIBLE);
> +		prepare_to_wait_exclusive(wqh, &wait.wait, TASK_UNINTERRUPTIBLE);
>  		if (PageLocked(page)) {
>  			sync_page(page);
>  			io_schedule();


* Re: [3/3] wake-one PG_locked/BH_Lock semantics
  2004-05-05  6:42       ` Michael J. Cohen
@ 2004-05-05  9:29         ` Michael J. Cohen
  0 siblings, 0 replies; 6+ messages in thread
From: Michael J. Cohen @ 2004-05-05  9:29 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 120 bytes --]

Here are the tiobench results, run several times. There was no significant
deviation for -mm1, so it's not considered a problem.

[-- Attachment #2: tiobench-p4m-2.6.6-rc3-mm1 --]
[-- Type: text/plain, Size: 3422 bytes --]

No size specified, using 1022 MB

Unit information
================
File size = megabytes
Blk Size  = bytes
Rate      = megabytes per second
CPU%      = percentage of CPU used during the test
Latency   = milliseconds
Lat%      = percent of requests that took longer than X seconds
CPU Eff   = Rate divided by CPU% - throughput per cpu load

Sequential Reads
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.6-rc3-mm1                 1022  4096    1   27.59 8.909%     0.138      313.59   0.00000  0.00000   310
2.6.6-rc3-mm1                 1022  4096    2   26.30 8.676%     0.293      392.22   0.00000  0.00000   303
2.6.6-rc3-mm1                 1022  4096    4   24.55 8.094%     0.625      525.17   0.00000  0.00000   303
2.6.6-rc3-mm1                 1022  4096    8   24.34 7.966%     1.224     1035.81   0.00000  0.00000   306

Random Reads
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.6-rc3-mm1                 1022  4096    1    0.52 0.429%     7.444       51.22   0.00000  0.00000   122
2.6.6-rc3-mm1                 1022  4096    2    0.54 0.255%    14.238      212.31   0.00000  0.00000   211
2.6.6-rc3-mm1                 1022  4096    4    0.59 0.332%    25.666      234.95   0.00000  0.00000   178
2.6.6-rc3-mm1                 1022  4096    8    0.60 0.334%    45.477      413.88   0.00000  0.00000   180

Sequential Writes
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.6-rc3-mm1                 1022  4096    1   24.51 18.11%     0.146     8561.11   0.00038  0.00000   135
2.6.6-rc3-mm1                 1022  4096    2   26.12 19.45%     0.253     5324.12   0.00038  0.00000   134
2.6.6-rc3-mm1                 1022  4096    4   20.58 15.03%     0.613    10596.65   0.00153  0.00077   137
2.6.6-rc3-mm1                 1022  4096    8   20.87 15.14%     1.143    10301.78   0.02345  0.00077   138

Random Writes
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.6-rc3-mm1                 1022  4096    1    0.90 0.473%     0.013        0.37   0.00000  0.00000   191
2.6.6-rc3-mm1                 1022  4096    2    0.85 0.418%     0.093      175.91   0.00000  0.00000   203
2.6.6-rc3-mm1                 1022  4096    4    0.84 0.444%     0.020       25.46   0.00000  0.00000   188
2.6.6-rc3-mm1                 1022  4096    8    0.81 0.444%     0.013        0.37   0.00000  0.00000   182

[-- Attachment #3: tiobench-p4m-2.6.6-rc3-mm1-waitone --]
[-- Type: text/plain, Size: 3422 bytes --]

No size specified, using 1022 MB

Unit information
================
File size = megabytes
Blk Size  = bytes
Rate      = megabytes per second
CPU%      = percentage of CPU used during the test
Latency   = milliseconds
Lat%      = percent of requests that took longer than X seconds
CPU Eff   = Rate divided by CPU% - throughput per cpu load

Sequential Reads
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.6-rc3-mm1                 1022  4096    1   25.94 11.71%     0.146      370.36   0.00000  0.00000   221
2.6.6-rc3-mm1                 1022  4096    2   24.96 11.34%     0.304      507.07   0.00000  0.00000   220
2.6.6-rc3-mm1                 1022  4096    4   24.35 10.74%     0.626      776.48   0.00000  0.00000   227
2.6.6-rc3-mm1                 1022  4096    8   24.50 10.80%     1.225     1037.49   0.00000  0.00000   227

Random Reads
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.6-rc3-mm1                 1022  4096    1    0.58 0.579%     6.761      108.38   0.00000  0.00000   100
2.6.6-rc3-mm1                 1022  4096    2    0.66 0.446%    11.394      134.13   0.00000  0.00000   147
2.6.6-rc3-mm1                 1022  4096    4    0.64 0.414%    23.910      267.87   0.00000  0.00000   155
2.6.6-rc3-mm1                 1022  4096    8    0.74 0.500%    39.739      379.63   0.00000  0.00000   147

Sequential Writes
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.6-rc3-mm1                 1022  4096    1   25.92 28.26%     0.119      789.56   0.00000  0.00000    92
2.6.6-rc3-mm1                 1022  4096    2   25.08 24.29%     0.236     7824.13   0.00076  0.00000   103
2.6.6-rc3-mm1                 1022  4096    4   23.12 24.59%     0.532     8510.70   0.00038  0.00000    94
2.6.6-rc3-mm1                 1022  4096    8   22.88 21.04%     1.018     7272.02   0.01922  0.00000   109

Random Writes
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.6-rc3-mm1                 1022  4096    1    0.87 0.620%     0.023       19.71   0.00000  0.00000   140
2.6.6-rc3-mm1                 1022  4096    2    0.87 0.632%     0.048      118.52   0.00000  0.00000   137
2.6.6-rc3-mm1                 1022  4096    4    0.84 0.557%     0.020       11.00   0.00000  0.00000   150
2.6.6-rc3-mm1                 1022  4096    8    0.81 0.626%     0.018        0.45   0.00000  0.00000   129

[-- Attachment #4: tiobench-p4m-2.6.6-rc3-mm1-waitone-2 --]
[-- Type: text/plain, Size: 3422 bytes --]

No size specified, using 1022 MB

Unit information
================
File size = megabytes
Blk Size  = bytes
Rate      = megabytes per second
CPU%      = percentage of CPU used during the test
Latency   = milliseconds
Lat%      = percent of requests that took longer than X seconds
CPU Eff   = Rate divided by CPU% - throughput per cpu load

Sequential Reads
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.6-rc3-mm1                 1022  4096    1   27.70 8.979%     0.138      337.13   0.00000  0.00000   309
2.6.6-rc3-mm1                 1022  4096    2   26.03 8.543%     0.295      271.51   0.00000  0.00000   305
2.6.6-rc3-mm1                 1022  4096    4   23.63 7.704%     0.643      533.84   0.00000  0.00000   307
2.6.6-rc3-mm1                 1022  4096    8   22.98 7.555%     1.295     1157.90   0.00000  0.00000   304

Random Reads
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.6-rc3-mm1                 1022  4096    1    0.48 0.359%     8.204       58.92   0.00000  0.00000   132
2.6.6-rc3-mm1                 1022  4096    2    0.51 0.259%    14.726      208.67   0.00000  0.00000   198
2.6.6-rc3-mm1                 1022  4096    4    0.52 0.303%    29.583      256.53   0.00000  0.00000   172
2.6.6-rc3-mm1                 1022  4096    8    0.59 0.283%    49.995      347.57   0.00000  0.00000   208

Sequential Writes
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.6-rc3-mm1                 1022  4096    1   23.53 36.09%     0.135     8438.51   0.00038  0.00000    65
2.6.6-rc3-mm1                 1022  4096    2   26.13 19.06%     0.235     1934.21   0.00000  0.00000   137
2.6.6-rc3-mm1                 1022  4096    4   23.62 17.09%     0.539     9043.92   0.00115  0.00000   138
2.6.6-rc3-mm1                 1022  4096    8   24.08 17.34%     0.985     7376.00   0.01499  0.00000   139

Random Writes
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.6-rc3-mm1                 1022  4096    1    0.90 0.465%     0.014        0.26   0.00000  0.00000   193
2.6.6-rc3-mm1                 1022  4096    2    0.87 0.468%     0.014        0.32   0.00000  0.00000   186
2.6.6-rc3-mm1                 1022  4096    4    0.84 0.438%     0.013        0.09   0.00000  0.00000   191
2.6.6-rc3-mm1                 1022  4096    8    0.84 0.460%     0.014        1.46   0.00000  0.00000   182

[-- Attachment #5: Type: text/plain, Size: 3422 bytes --]

No size specified, using 1022 MB

Unit information
================
File size = megabytes
Blk Size  = bytes
Rate      = megabytes per second
CPU%      = percentage of CPU used during the test
Latency   = milliseconds
Lat%      = percent of requests that took longer than X seconds
CPU Eff   = Rate divided by CPU% - throughput per cpu load

Sequential Reads
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.6-rc3-mm1                 1022  4096    1   27.00 8.584%     0.142      346.75   0.00000  0.00000   315
2.6.6-rc3-mm1                 1022  4096    2   25.95 8.042%     0.295      413.18   0.00000  0.00000   323
2.6.6-rc3-mm1                 1022  4096    4   23.97 7.407%     0.639      701.70   0.00000  0.00000   324
2.6.6-rc3-mm1                 1022  4096    8   24.52 7.811%     1.218     1026.37   0.00000  0.00000   314

Random Reads
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.6-rc3-mm1                 1022  4096    1    0.64 0.374%     6.134      132.43   0.00000  0.00000   170
2.6.6-rc3-mm1                 1022  4096    2    0.67 0.262%    11.186      176.23   0.00000  0.00000   256
2.6.6-rc3-mm1                 1022  4096    4    0.67 0.289%    20.849      343.38   0.00000  0.00000   230
2.6.6-rc3-mm1                 1022  4096    8    0.67 0.264%    45.265      396.09   0.00000  0.00000   252

Sequential Writes
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.6-rc3-mm1                 1022  4096    1   24.72 18.37%     0.149     9206.91   0.00038  0.00000   135
2.6.6-rc3-mm1                 1022  4096    2   24.82 18.16%     0.234     2451.44   0.00038  0.00000   137
2.6.6-rc3-mm1                 1022  4096    4   23.66 17.31%     0.586     8381.46   0.00153  0.00000   137
2.6.6-rc3-mm1                 1022  4096    8   23.45 16.87%     0.942     7985.45   0.01423  0.00000   139

Random Writes
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.6-rc3-mm1                 1022  4096    1    0.91 0.470%     0.013        0.27   0.00000  0.00000   193
2.6.6-rc3-mm1                 1022  4096    2    0.83 0.402%     0.013        0.27   0.00000  0.00000   206
2.6.6-rc3-mm1                 1022  4096    4    0.85 0.417%     0.013        0.29   0.00000  0.00000   203
2.6.6-rc3-mm1                 1022  4096    8    0.81 0.388%     0.013        0.27   0.00000  0.00000   208

Thread overview: 6+ messages
2004-05-05  6:06 [0/3] filtered wakeups respun William Lee Irwin III
2004-05-05  6:08 ` William Lee Irwin III
2004-05-05  6:11   ` [2/3] filtered buffer_head wakeups William Lee Irwin III
2004-05-05  6:16     ` [3/3] wake-one PG_locked/BH_Lock semantics William Lee Irwin III
2004-05-05  6:42       ` Michael J. Cohen
2004-05-05  9:29         ` Michael J. Cohen
