public inbox for linux-kernel@vger.kernel.org
* 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
@ 2001-12-21 14:11 rwhron
  2001-12-21 14:46 ` Jens Axboe
  0 siblings, 1 reply; 21+ messages in thread
From: rwhron @ 2001-12-21 14:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: axboe

While running "dbench 32" on 2.5.2-pre1:

I noticed the test was taking much longer than usual,
and I could not do a new "login".  

vmstat 8 looked like this:

r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs us  sy  id
0 34  1      0 222504  12248 736088   0   0     0     0  103    59 0   0 100
1 34  1      0 222504  12248 736088   0   0     0     0  100    56 0   0 100
0 34  1      0 222504  12248 736088   0   0     0     0  103    59 0   0 100

<sysrq Sync Umount> did not print their "done" messages.
The "b" and "w" columns went up though:

r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs us  sy  id
0 37  3      0 222456 12280 736092   0   0     0     0  222   269   0   0 100

There was no Oops.
2.5.1-dj3 completed dbench normally.

Configs between the 2 kernels:
diff 2.5.2-pre1 2.5.1-dj3
> CONFIG_IP_NF_QUEUE=m

2.5.1-pre1[01] and 2.5.1-final did not exhibit this behavior.

Hardware:
1333 Athlon
1GB RAM

CONFIG_HIGHMEM4G=y
CONFIG_HIGHMEM=y
-- 
Randy Hron



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-21 14:11 2.5.2-pre1 dbench 32 hangs in vmstat "b" state rwhron
@ 2001-12-21 14:46 ` Jens Axboe
  2001-12-21 16:43   ` rwhron
  2001-12-21 23:55   ` rwhron
  0 siblings, 2 replies; 21+ messages in thread
From: Jens Axboe @ 2001-12-21 14:46 UTC (permalink / raw)
  To: rwhron; +Cc: linux-kernel

On Fri, Dec 21 2001, rwhron@earthlink.net wrote:
> While running "dbench 32" on 2.5.2-pre1:
> 
> I noticed the test was taking much longer than usual,
> and I could not do a new "login".  
> 
> vmstat 8 looked like this:

You neglected to mention what disk I/O system you are using? IDE or
SCSI, and if the latter what host adapter?

-- 
Jens Axboe



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-21 14:46 ` Jens Axboe
@ 2001-12-21 16:43   ` rwhron
  2001-12-21 17:01     ` Jens Axboe
  2001-12-21 23:55   ` rwhron
  1 sibling, 1 reply; 21+ messages in thread
From: rwhron @ 2001-12-21 16:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: rwhron, linux-kernel

On Fri, Dec 21, 2001 at 03:46:54PM +0100, Jens Axboe wrote:
> You neglected to mention what disk I/O system you are using? IDE or
> SCSI, and if the latter what host adapter?
> 
> -- 
> Jens Axboe

Sorry about that.  It's an IDE drive.

00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 03)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
00:0f.0 Multimedia audio controller: C-Media Electronics Inc CM8738 (rev 10)
01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 04)

CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECD=m
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_IDEDMA_AUTO=y
CONFIG_BLK_DEV_IDE_MODES=y

-- 
Randy Hron



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-21 16:43   ` rwhron
@ 2001-12-21 17:01     ` Jens Axboe
  2001-12-21 18:47       ` rwhron
  0 siblings, 1 reply; 21+ messages in thread
From: Jens Axboe @ 2001-12-21 17:01 UTC (permalink / raw)
  To: rwhron; +Cc: Jens Axboe, linux-kernel

On Fri, Dec 21 2001, rwhron@earthlink.net wrote:
> On Fri, Dec 21, 2001 at 03:46:54PM +0100, Jens Axboe wrote:
> > You neglected to mention what disk I/O system you are using? IDE or
> > SCSI, and if the latter what host adapter?
> > 
> > -- 
> > Jens Axboe
> 
> Sorry about that.  It's an IDE drive.
> 
> 00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 03)
> 00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
> 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
> 00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
> 00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
> 00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
> 00:0f.0 Multimedia audio controller: C-Media Electronics Inc CM8738 (rev 10)
> 01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 04)
> 
> CONFIG_IDE=y
> CONFIG_BLK_DEV_IDE=y
> CONFIG_BLK_DEV_IDEDISK=y
> CONFIG_IDEDISK_MULTI_MODE=y
> CONFIG_BLK_DEV_IDECD=m
> CONFIG_BLK_DEV_IDEPCI=y
> CONFIG_BLK_DEV_IDEDMA_PCI=y
> CONFIG_IDEDMA_PCI_AUTO=y
> CONFIG_BLK_DEV_IDEDMA=y
> CONFIG_IDEDMA_AUTO=y
> CONFIG_BLK_DEV_IDE_MODES=y

Thanks -- could you also try and do sysrq-t back traces when it seems
stuck?

Does a non-highmem kernel run ok?

-- 
Jens Axboe



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-21 17:01     ` Jens Axboe
@ 2001-12-21 18:47       ` rwhron
  2001-12-21 22:19         ` Jens Axboe
  0 siblings, 1 reply; 21+ messages in thread
From: rwhron @ 2001-12-21 18:47 UTC (permalink / raw)
  To: Jens Axboe; +Cc: rwhron, linux-kernel

On Fri, Dec 21, 2001 at 06:01:56PM +0100, Jens Axboe wrote:
> Thanks -- could you also try and do sysrq-t back traces when it seems
> stuck?
> 
> Does a non-highmem kernel run ok?
> 
> -- 
> Jens Axboe

I recompiled with highmem turned off.  
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set

I run a script that executes dbench 32, then dbench 128.

dbench 32 completed this time.
dbench 128 hung similar to dbench 32 in the previous message.
I don't have the vmstat output captured, but "b" was 128,
bi and bo were 0, and idle was 100.

I couldn't save a stack trace because /bin/ed would not open a file:
ed printed nothing, not even the usual message that the file does not
exist, and "w" would not save.  The vmstat "b" column went up by 2
after I started ed and tried another console login.

	--

Before running dbench, I normally create a small loopback reiserfs
filesystem.  This worked okay the first time I did it (with highmem).

After recompiling without highmem, I ran my "build_rootfs" script
to create a small uml root fs, and got an Oops.  The same script
was fine on 2.5.1-pre[5-9] and 2.5.1-pre1[01].  (you fixed 
something like this in the patches between 2.5.1-pre3 and pre4.)

I rebooted after each Oops, so the dbench runs above were done
after a fresh boot.

invalid operand: 0000
CPU:    0
EIP:    0010:[<c012fbf0>]    Not tainted
EFLAGS: 00010287
eax: 00000070   ebx: 00000700   ecx: c02a45dc   edx: 00038001
esi: 00000000   edi: 00000000   ebp: f4a5a000   esp: f4a8fe38
ds: 0018   es: 0018   ss: 0018
Process mkreiserfs (pid: 135, stackpage=f4a8f000)
Stack: 00000700 00000000 00000000 f4a5a000 c023896c 00000246 f7ef1740 00000000 
00000000 fac4a887 00038001 00000070 f4a8fe98 00000700 00000000 c02a45dc 
f7ef1740 00000000 00000001 00000030 00000000 00000000 c018a4a0 c02a45dc 
Call Trace: [<fac4a887>] [<c018a4a0>] [<c018a54c>] [<c018a5f6>] [<c01340f0>] 
[<c012c923>] [<c0136aff>] [<c0136a60>] [<c0126ab5>] [<c0126ee5>] [<c0126e00>] 
[<c0131ae6>] [<c01086eb>] 
Code: 0f 0b 8b 35 04 59 29 c0 c7 44 24 18 70 00 00 00 89 74 24 14 

>>EIP; c012fbf0 <create_bounce+40/250>   <=====
Trace; fac4a886 <END_OF_CODE+207b8/????>
Trace; c018a4a0 <generic_make_request+170/190>
Trace; c018a54c <submit_bio+4c/60>
Trace; c018a5f6 <submit_bh+96/a0>
Trace; c01340f0 <block_read_full_page+1a0/1c0>
Trace; c012c922 <__alloc_pages+32/170>
Trace; c0136afe <blkdev_readpage+e/20>
Trace; c0136a60 <blkdev_get_block+0/40>
Trace; c0126ab4 <do_generic_file_read+274/3f0>
Trace; c0126ee4 <generic_file_read+84/140>
Trace; c0126e00 <file_read_actor+0/60>
Trace; c0131ae6 <sys_read+96/d0>
Trace; c01086ea <system_call+32/38>
Code;  c012fbf0 <create_bounce+40/250>
00000000 <_EIP>:
Code;  c012fbf0 <create_bounce+40/250>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c012fbf2 <create_bounce+42/250>
   2:   8b 35 04 59 29 c0         mov    0xc0295904,%esi
Code;  c012fbf8 <create_bounce+48/250>
   8:   c7 44 24 18 70 00 00      movl   $0x70,0x18(%esp,1)
Code;  c012fbfe <create_bounce+4e/250>
   f:   00 
Code;  c012fc00 <create_bounce+50/250>
  10:   89 74 24 14               mov    %esi,0x14(%esp,1)


I rebooted, and tried to create the loopback reiserfs again and
got:

invalid operand: 0000
CPU:    0
EIP:    0010:[<c012fbf0>]    Not tainted
EFLAGS: 00010287
eax: 00000070   ebx: 00000700   ecx: c02a45dc   edx: 00038001
esi: 00000000   edi: 00000000   ebp: f4d0e000   esp: f4c31e38
ds: 0018   es: 0018   ss: 0018
Process mkreiserfs (pid: 118, stackpage=f4c31000)
Stack: 00000700 00000000 00000000 f4d0e000 f4c4c2c0 00000246 f7ef1900 00000000 
00000000 fac28887 00038001 00000070 f4c31e98 00000700 00000000 c02a45dc 
f7ef1900 00000000 00000001 00000030 00000000 00000000 c018a4a0 c02a45dc 
Call Trace: [<fac28887>] [<c018a4a0>] [<c018a54c>] [<c018a5f6>] [<c01340f0>] 
[<c012c923>] [<c0136aff>] [<c0136a60>] [<c0126ab5>] [<c0126ee5>] [<c0126e00>] 
[<c0131ae6>] [<c01086eb>] 
Code: 0f 0b 8b 35 04 59 29 c0 c7 44 24 18 70 00 00 00 89 74 24 14 

>>EIP; c012fbf0 <create_bounce+40/250>   <=====
Trace; fac28886 <[loop]loop_make_request+96/200>
Trace; c018a4a0 <generic_make_request+170/190>
Trace; c018a54c <submit_bio+4c/60>
Trace; c018a5f6 <submit_bh+96/a0>
Trace; c01340f0 <block_read_full_page+1a0/1c0>
Trace; c012c922 <__alloc_pages+32/170>
Trace; c0136afe <blkdev_readpage+e/20>
Trace; c0136a60 <blkdev_get_block+0/40>
Trace; c0126ab4 <do_generic_file_read+274/3f0>
Trace; c0126ee4 <generic_file_read+84/140>
Trace; c0126e00 <file_read_actor+0/60>
Trace; c0131ae6 <sys_read+96/d0>
Trace; c01086ea <system_call+32/38>
Code;  c012fbf0 <create_bounce+40/250>
00000000 <_EIP>:
Code;  c012fbf0 <create_bounce+40/250>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c012fbf2 <create_bounce+42/250>
   2:   8b 35 04 59 29 c0         mov    0xc0295904,%esi
Code;  c012fbf8 <create_bounce+48/250>
   8:   c7 44 24 18 70 00 00      movl   $0x70,0x18(%esp,1)
Code;  c012fbfe <create_bounce+4e/250>
   f:   00 
Code;  c012fc00 <create_bounce+50/250>
  10:   89 74 24 14               mov    %esi,0x14(%esp,1)

-- 
Randy Hron



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-21 18:47       ` rwhron
@ 2001-12-21 22:19         ` Jens Axboe
  0 siblings, 0 replies; 21+ messages in thread
From: Jens Axboe @ 2001-12-21 22:19 UTC (permalink / raw)
  To: rwhron; +Cc: linux-kernel

On Fri, Dec 21 2001, rwhron@earthlink.net wrote:
> On Fri, Dec 21, 2001 at 06:01:56PM +0100, Jens Axboe wrote:
> > Thanks -- could you also try and do sysrq-t back traces when it seems
> > stuck?
> > 
> > Does a non-highmem kernel run ok?
> > 
> > -- 
> > Jens Axboe
> 
> I recompiled with highmem turned off.  
> # CONFIG_HIGHMEM4G is not set
> # CONFIG_HIGHMEM64G is not set
> 
> I run a script that executes dbench 32, then dbench 128.

Ok, please try something for me. In drivers/block/elevator.c, comment
out this block:

	if (q->last_merge) {
		__rq = list_entry_rq(q->last_merge);
		BUG_ON(__rq->flags & REQ_STARTED);

		if ((ret = elv_try_merge(__rq, bio))) {
			*req = __rq;
			return ret;
		}
	}

(just #if 0 the entire thing) -- the one inside elevator_linus_merge()
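
For readers following along: as the name q->last_merge suggests, the
block being disabled is a one-entry hint that retries a merge against
the most recently merged request before the queue is scanned.  Below
is a minimal userspace sketch of that idea in C, not the kernel code.
The toy_* types and names are hypothetical stand-ins for struct
request, struct bio and the ELEVATOR_* return codes.  The adjacency
tests mirror those in elv_try_merge(), except that the front-merge test
is written as an addition so the unsigned arithmetic cannot wrap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct request / struct bio. */
struct toy_request { unsigned long sector; unsigned long nr_sectors; };
struct toy_bio     { unsigned long sector; unsigned long nr_sectors; };

enum toy_merge { TOY_NO_MERGE = 0, TOY_FRONT_MERGE, TOY_BACK_MERGE };

/*
 * Back merge:  the bio starts exactly where the request ends.
 * Front merge: the bio ends exactly where the request starts.
 */
static enum toy_merge toy_try_merge(const struct toy_request *rq,
				    const struct toy_bio *bio)
{
	assert(rq != NULL && bio != NULL);

	if (rq->sector + rq->nr_sectors == bio->sector)
		return TOY_BACK_MERGE;
	if (bio->sector + bio->nr_sectors == rq->sector)
		return TOY_FRONT_MERGE;
	return TOY_NO_MERGE;
}

/*
 * One-entry hint: try the remembered request first.  On a hit, *out is
 * set to that request; on a miss the caller would scan the whole queue.
 */
static enum toy_merge toy_merge_with_hint(struct toy_request *last_merge,
					  const struct toy_bio *bio,
					  struct toy_request **out)
{
	enum toy_merge ret = TOY_NO_MERGE;

	assert(bio != NULL && out != NULL);

	if (last_merge) {
		ret = toy_try_merge(last_merge, bio);
		if (ret != TOY_NO_MERGE)
			*out = last_merge;
	}
	return ret;
}
```

With the hint compiled out (the #if 0 above), every bio takes the
full scan path, so a bad hint can no longer influence the result.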

Loop back highmem issue is different, I'll take a look at that later.
I'll be pretty unresponsive over christmas, though.

-- 
Jens Axboe



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-21 14:46 ` Jens Axboe
  2001-12-21 16:43   ` rwhron
@ 2001-12-21 23:55   ` rwhron
  2001-12-24 14:03     ` Jens Axboe
  1 sibling, 1 reply; 21+ messages in thread
From: rwhron @ 2001-12-21 23:55 UTC (permalink / raw)
  To: Jens Axboe; +Cc: rwhron, linux-kernel

> Ok, please try something for me. In drivers/block/elevator.c, comment
> out this block:

After commenting out the block of code, make clean, etc., I rebooted and
ran the dbench 32, 128 script.  It completed dbench 32 again, but dbench
128 hung again.  I could quit some tools, but df and ps wouldn't return
and didn't respond to <ctrl c>.

> Loop back highmem issue is different, I'll take a look at that later.
> I'll be pretty unresponsive over christmas, though.
> 
> Jens Axboe

Enjoy the holidays!

-- 
Randy Hron



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-21 23:55   ` rwhron
@ 2001-12-24 14:03     ` Jens Axboe
  2001-12-24 16:59       ` rwhron
  0 siblings, 1 reply; 21+ messages in thread
From: Jens Axboe @ 2001-12-24 14:03 UTC (permalink / raw)
  To: rwhron; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 574 bytes --]

On Fri, Dec 21 2001, rwhron@earthlink.net wrote:
> > Ok, please try something for me. In drivers/block/elevator.c, comment
> > out this block:
> 
> After commenting out the block of code, make clean, etc., I rebooted and
> ran the dbench 32, 128 script.  It completed dbench 32 again, but dbench
> 128 hung again.  I could quit some tools, but df and ps wouldn't return
> and didn't respond to <ctrl c>.

What IDE controller are you using? The two other reports so far have
been with VIA, maybe that's a clue.

Anyways, could you please reproduce with this applied?

-- 
Jens Axboe


[-- Attachment #2: bio-252p1-2 --]
[-- Type: text/plain, Size: 6933 bytes --]

diff -ur -X exclude /opt/kernel/linux-2.5.2-pre1/drivers/block/elevator.c linux/drivers/block/elevator.c
--- /opt/kernel/linux-2.5.2-pre1/drivers/block/elevator.c	Sun Dec 23 17:11:54 2001
+++ linux/drivers/block/elevator.c	Sun Dec 23 15:53:07 2001
@@ -124,21 +124,21 @@
 inline int elv_try_merge(struct request *__rq, struct bio *bio)
 {
 	unsigned int count = bio_sectors(bio);
-
-	if (!elv_rq_merge_ok(__rq, bio))
-		return ELEVATOR_NO_MERGE;
+	int ret = ELEVATOR_NO_MERGE;
 
 	/*
 	 * we can merge and sequence is ok, check if it's possible
 	 */
-	if (__rq->sector + __rq->nr_sectors == bio->bi_sector) {
-		return ELEVATOR_BACK_MERGE;
-	} else if (__rq->sector - count == bio->bi_sector) {
-		__rq->elevator_sequence -= count;
-		return ELEVATOR_FRONT_MERGE;
+	if (elv_rq_merge_ok(__rq, bio)) {
+		if (__rq->sector + __rq->nr_sectors == bio->bi_sector) {
+			ret = ELEVATOR_BACK_MERGE;
+		} else if (__rq->sector - count == bio->bi_sector) {
+			__rq->elevator_sequence -= count;
+			ret = ELEVATOR_FRONT_MERGE;
+		}
 	}
 
-	return ELEVATOR_NO_MERGE;
+	return ret;
 }
 
 int elevator_linus_merge(request_queue_t *q, struct request **req,
@@ -172,15 +172,17 @@
 		 */
 		if (__rq->elevator_sequence-- <= 0)
 			break;
+
 		if (__rq->flags & (REQ_BARRIER | REQ_STARTED))
 			break;
 		if (!(__rq->flags & REQ_CMD))
 			continue;
-		if (__rq->elevator_sequence < 0)
-			break;
 
 		if (!*req && bio_rq_in_between(bio, __rq, &q->queue_head))
 			*req = __rq;
+
+		if (__rq->elevator_sequence < bio_sectors(bio))
+			break;
 
 		if ((ret = elv_try_merge(__rq, bio))) {
 			*req = __rq;
diff -ur -X exclude /opt/kernel/linux-2.5.2-pre1/drivers/block/ll_rw_blk.c linux/drivers/block/ll_rw_blk.c
--- /opt/kernel/linux-2.5.2-pre1/drivers/block/ll_rw_blk.c	Sun Dec 23 17:11:54 2001
+++ linux/drivers/block/ll_rw_blk.c	Mon Dec 24 14:50:46 2001
@@ -155,6 +155,11 @@
 	blk_queue_max_sectors(q, MAX_SECTORS);
 	blk_queue_hardsect_size(q, 512);
 
+	/*
+	 * by default assume old behaviour and bounce for any highmem page
+	 */
+	blk_queue_bounce_limit(q, BLK_BOUNCE_HIGH);
+
 	init_waitqueue_head(&q->queue_wait);
 }
 
@@ -603,9 +608,6 @@
 		return 0;
 
 	/* Merge is OK... */
-	if (q->last_merge == &next->queuelist)
-		q->last_merge = NULL;
-
 	req->nr_phys_segments = total_phys_segments;
 	req->nr_hw_segments = total_hw_segments;
 	return 1;
@@ -812,12 +814,8 @@
 	q->plug_tq.data		= q;
 	q->queue_flags		= (1 << QUEUE_FLAG_CLUSTER);
 	q->queue_lock		= lock;
+	q->last_merge		= NULL;
 	
-	/*
-	 * by default assume old behaviour and bounce for any highmem page
-	 */
-	blk_queue_bounce_limit(q, BLK_BOUNCE_HIGH);
-
 	blk_queue_segment_boundary(q, 0xffffffff);
 
 	blk_queue_make_request(q, __make_request);
@@ -886,6 +884,12 @@
 	if (!rq && (gfp_mask & __GFP_WAIT))
 		rq = get_request_wait(q, rw);
 
+	if (rq) {
+		rq->flags = 0;
+		rq->buffer = NULL;
+		rq->bio = rq->biotail = NULL;
+		rq->waiting = NULL;
+	}
 	return rq;
 }
 
@@ -953,10 +977,15 @@
 	/*
 	 * debug stuff...
 	 */
-	if (insert_here == &q->queue_head) {
-		struct request *__rq = __elv_next_request(q);
+	if (insert_here->next != &q->queue_head) {
+		struct request *__rq = list_entry_rq(insert_here->next);
 
+#if 0
 		BUG_ON(__rq && (__rq->flags & REQ_STARTED));
+#else
+		if (__rq->flags & REQ_STARTED)
+			printk("add_request: irk, next is started\n");
+#endif
 	}
 
 	/*
@@ -972,11 +1001,15 @@
 void blkdev_release_request(struct request *req)
 {
 	struct request_list *rl = req->rl;
+	request_queue_t *q = req->q;
 
 	req->rq_status = RQ_INACTIVE;
 	req->q = NULL;
 	req->rl = NULL;
 
+	if (q && q->last_merge == &req->queuelist)
+		q->last_merge = NULL;
+
 	/*
 	 * Request may not have originated from ll_rw_blk. if not,
 	 * it didn't come out of our reserved rq pools
@@ -1571,21 +1604,23 @@
 
 inline void blk_recalc_rq_sectors(struct request *rq, int nsect)
 {
-	rq->hard_sector += nsect;
-	rq->hard_nr_sectors -= nsect;
-	rq->sector = rq->hard_sector;
-	rq->nr_sectors = rq->hard_nr_sectors;
+	if (rq->flags & REQ_CMD) {
+		rq->hard_sector += nsect;
+		rq->hard_nr_sectors -= nsect;
+		rq->sector = rq->hard_sector;
+		rq->nr_sectors = rq->hard_nr_sectors;
 
-	rq->current_nr_sectors = bio_iovec(rq->bio)->bv_len >> 9;
-	rq->hard_cur_sectors = rq->current_nr_sectors;
+		rq->current_nr_sectors = bio_iovec(rq->bio)->bv_len >> 9;
+		rq->hard_cur_sectors = rq->current_nr_sectors;
 
-	/*
-	 * if total number of sectors is less than the first segment
-	 * size, something has gone terribly wrong
-	 */
-	if (rq->nr_sectors < rq->current_nr_sectors) {
-		printk("blk: request botched\n");
-		rq->nr_sectors = rq->current_nr_sectors;
+		/*
+		 * if total number of sectors is less than the first segment
+		 * size, something has gone terribly wrong
+		 */
+		if (rq->nr_sectors < rq->current_nr_sectors) {
+			printk("blk: request botched\n");
+			rq->nr_sectors = rq->current_nr_sectors;
+		}
 	}
 }
 
diff -ur -X exclude /opt/kernel/linux-2.5.2-pre1/include/linux/blkdev.h linux/include/linux/blkdev.h
--- /opt/kernel/linux-2.5.2-pre1/include/linux/blkdev.h	Sun Dec 23 17:11:55 2001
+++ linux/include/linux/blkdev.h	Sun Dec 23 17:15:02 2001
@@ -196,8 +196,7 @@
 #define RQ_SCSI_DISCONNECTING	0xffe0
 
 #define QUEUE_FLAG_PLUGGED	0	/* queue is plugged */
-#define QUEUE_FLAG_NOSPLIT	1	/* can process bio over several goes */
-#define QUEUE_FLAG_CLUSTER	2	/* cluster several segments into 1 */
+#define QUEUE_FLAG_CLUSTER	1	/* cluster several segments into 1 */
 
 #define blk_queue_plugged(q)	test_bit(QUEUE_FLAG_PLUGGED, &(q)->queue_flags)
 #define blk_mark_plugged(q)	set_bit(QUEUE_FLAG_PLUGGED, &(q)->queue_flags)
diff -ur -X exclude /opt/kernel/linux-2.5.2-pre1/mm/highmem.c linux/mm/highmem.c
--- /opt/kernel/linux-2.5.2-pre1/mm/highmem.c	Sun Dec 23 17:11:56 2001
+++ linux/mm/highmem.c	Mon Dec 24 13:59:21 2001
@@ -25,7 +25,9 @@
 
 static void *page_pool_alloc(int gfp_mask, void *data)
 {
-	return alloc_page(gfp_mask);
+	int gfp = gfp_mask | (int) data;
+
+	return alloc_page(gfp);
 }
 
 static void page_pool_free(void *page, void *data)
@@ -252,7 +254,7 @@
 	if (isa_page_pool)
 		return 0;
 
-	isa_page_pool = mempool_create(ISA_POOL_SIZE, page_pool_alloc, page_pool_free, NULL);
+	isa_page_pool = mempool_create(ISA_POOL_SIZE, page_pool_alloc, page_pool_free, (void *) __GFP_DMA);
 	if (!isa_page_pool)
 		BUG();
 
@@ -272,7 +274,7 @@
 	int i;
 
 	__bio_for_each_segment(tovec, to, i, 0) {
-		fromvec = &from->bi_io_vec[i];
+		fromvec = from->bi_io_vec + i;
 
 		/*
 		 * not bounced
@@ -301,7 +303,7 @@
 	 * free up bounce indirect pages used
 	 */
 	__bio_for_each_segment(bvec, bio, i, 0) {
-		org_vec = &bio_orig->bi_io_vec[i];
+		org_vec = bio_orig->bi_io_vec + i;
 		if (bvec->bv_page == org_vec->bv_page)
 			continue;
 
@@ -394,7 +397,7 @@
 		if (!bio)
 			bio = bio_alloc(bio_gfp, (*bio_orig)->bi_vcnt);
 
-		to = &bio->bi_io_vec[i];
+		to = bio->bi_io_vec + i;
 
 		to->bv_page = mempool_alloc(pool, gfp);
 		to->bv_len = from->bv_len;
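
One change in the patch above moves the clearing of q->last_merge into
blkdev_release_request(), so the hint cannot keep pointing at a request
that has already been released.  A minimal userspace sketch of that
pattern in C follows.  The toy_* structures and functions are
hypothetical and much simpler than the kernel's (the real hint stores a
pointer to the request's queuelist, not to the request itself).

```c
#include <assert.h>
#include <stddef.h>

struct toy_queue;

/* Hypothetical, simplified request: only what the sketch needs. */
struct toy_request {
	struct toy_queue *q;	/* owning queue, NULL once released */
	int active;
};

struct toy_queue {
	struct toy_request *last_merge;	/* one-entry merge hint */
};

/* Remember rq as the most recent merge target on q. */
static void toy_note_merge(struct toy_queue *q, struct toy_request *rq)
{
	assert(q != NULL && rq != NULL);
	rq->q = q;
	rq->active = 1;
	q->last_merge = rq;
}

/* Release rq and drop the hint if it still refers to rq. */
static void toy_release_request(struct toy_request *rq)
{
	struct toy_queue *q;

	assert(rq != NULL);
	q = rq->q;		/* save before the field is cleared */
	rq->active = 0;
	rq->q = NULL;

	if (q && q->last_merge == rq)
		q->last_merge = NULL;
}
```

The queue pointer is read before rq->q is cleared, matching the order
used in the patched blkdev_release_request().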


* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-24 14:03     ` Jens Axboe
@ 2001-12-24 16:59       ` rwhron
  2001-12-24 17:02         ` Jens Axboe
  0 siblings, 1 reply; 21+ messages in thread
From: rwhron @ 2001-12-24 16:59 UTC (permalink / raw)
  To: Jens Axboe; +Cc: rwhron, linux-kernel

On Mon, Dec 24, 2001 at 03:03:37PM +0100, Jens Axboe wrote:
> On Fri, Dec 21 2001, rwhron@earthlink.net wrote:
> What IDE controller are you using? The two other reports so far have
> been with VIA, maybe that's a clue.

I do have one of the perhaps buggier VIA chipsets.  

00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 03)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
00:0f.0 Multimedia audio controller: C-Media Electronics Inc CM8738 (rev 10)
01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 04)

00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
	Subsystem: VIA Technologies, Inc. Bus Master IDE
	Flags: bus master, medium devsel, latency 32
	I/O ports at d000 [size=16]
	Capabilities: <available only to root>

It's been reliable for a long time, but it wouldn't compile an Athlon 
optimized kernel until 2.4.1x.  (Kernel would Oops at boot time unless 
compiled with CONFIG_M586=y)

It was reliable when not optimized for Athlon.

> Anyways, could you please reproduce with this applied?
> 
> -- 
> Jens Axboe

With the patch, it still hangs on this system.  I recompiled with
CONFIG_NOHIGHMEM=y and CONFIG_M586=y, but that ended up with all processes 
in "b" state during dbench 32 too.

I tried unpatched 2.5.2-pre1 on a k6-2.  dbench 32 hung similarly with 
32 in "b", bo and bi = 0, and id = 100.  That machine is ill now and can't
find "init" when booting, boot single, or boot init=/bin/bash.

-- 
Randy Hron



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-24 16:59       ` rwhron
@ 2001-12-24 17:02         ` Jens Axboe
  2001-12-24 22:14           ` rwhron
  2001-12-27 19:07           ` rwhron
  0 siblings, 2 replies; 21+ messages in thread
From: Jens Axboe @ 2001-12-24 17:02 UTC (permalink / raw)
  To: rwhron; +Cc: linux-kernel

On Mon, Dec 24 2001, rwhron@earthlink.net wrote:
> On Mon, Dec 24, 2001 at 03:03:37PM +0100, Jens Axboe wrote:
> > On Fri, Dec 21 2001, rwhron@earthlink.net wrote:
> > What IDE controller are you using? The two other reports so far have
> > been with VIA, maybe that's a clue.
> 
> I do have one of the perhaps buggier VIA chipsets.  
> 
> 00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 03)
> 00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
> 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
> 00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
> 00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
> 00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
> 00:0f.0 Multimedia audio controller: C-Media Electronics Inc CM8738 (rev 10)
> 01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 04)
> 
> 00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
> 	Subsystem: VIA Technologies, Inc. Bus Master IDE
> 	Flags: bus master, medium devsel, latency 32
> 	I/O ports at d000 [size=16]
> 	Capabilities: <available only to root>
> 
> It's been reliable for a long time, but it wouldn't compile an Athlon 
> optimized kernel until 2.4.1x.  (Kernel would Oops at boot time unless 
> compiled with CONFIG_M586=y)

Ok noted

> > Anyways, could you please reproduce with this applied?
> > 
> > -- 
> > Jens Axboe
> 
> With the patch, it still hangs on this system.  I recompiled with
> CONFIG_NOHIGHMEM=y and CONFIG_M586=y, but that ended up with all processes 
> in "b" state during dbench 32 too.

I would suspect that, do you get any kernel messages?

> I tried unpatched 2.5.2-pre1 on a k6-2.  dbench 32 hung similarly with 
> 32 in "b", bo and bi = 0, and id = 100.  That machine is ill now and can't
> find "init" when booting, boot single, or boot init=/bin/bash.

Please send ps -eo cmd,wchan info for a hung machine.

-- 
Jens Axboe



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-24 17:02         ` Jens Axboe
@ 2001-12-24 22:14           ` rwhron
  2001-12-27 19:07           ` rwhron
  1 sibling, 0 replies; 21+ messages in thread
From: rwhron @ 2001-12-24 22:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: rwhron, linux-kernel

On Mon, Dec 24, 2001 at 06:02:44PM +0100, Jens Axboe wrote:
> 
> I would suspect that, do you get any kernel messages?

When the machine gets in this state, it won't save any files,
so kern.log doesn't have anything after the initial boot message.

> Please send ps -eo cmd,wchan info for a hung machine.
> 
> -- 
> Jens Axboe

Strangely (to me anyway), when dbench 32 hangs the machine,
ps will not print anything.  vmstat will continue its 8-second
cycle though.

-- 
Randy Hron



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-24 17:02         ` Jens Axboe
  2001-12-24 22:14           ` rwhron
@ 2001-12-27 19:07           ` rwhron
  2001-12-28 11:40             ` Jens Axboe
  1 sibling, 1 reply; 21+ messages in thread
From: rwhron @ 2001-12-27 19:07 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

On Mon, Dec 24, 2001 at 06:02:44PM +0100, Jens Axboe wrote:
> > I tried unpatched 2.5.2-pre1 on a k6-2.  dbench 32 hung similarly with 
> > 32 in "b", bo and bi = 0, and id = 100.  That machine is ill now and can't
> > find "init" when booting, boot single, or boot init=/bin/bash.
> 
> Please send ps -eo cmd,wchan info for a hung machine.
> 
> -- 
> Jens Axboe
> 

I rebuilt the reiserfs that dbench writes to.
Here is ps -eo cmd,wchan on the k6-2 running 2.5.2-pre2:

CMD              WCHAN
init             do_select
[keventd]        context_thread
[ksoftirqd_CPU0] ksoftirqd
[kswapd]         kswapd
[bdflush]        bdflush
[kupdated]       get_request_wait
[kreiserfsd]     get_request_wait
/usr/sbin/syslog get_request_wait
/usr/sbin/klogd  do_syslog
[eth0]           rtl8139_thread
/usr/sbin/iplog  do_select
/usr/sbin/iplog  do_poll
/usr/sbin/iplog  get_request_wait
/usr/sbin/iplog  do_select
/usr/sbin/iplog  wait_for_packet
/usr/sbin/sshd   do_select
/sbin/agetty tty read_chan
/bin/login --    down
/usr/sbin/sshd   do_select
-bash            wait4
-su              wait4
/usr/sbin/sshd   do_select
-bash            wait4
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/dbench 32      get_request_wait
/usr/sbin/sshd   do_select
/usr/sbin/sshd   get_request_wait
ed /tmp/ls       get_request_wait
ps -eo cmd,wchan -
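
The listing above is easier to read as a tally of how many tasks sleep
in a given wait channel.  A small C sketch, assuming the wait channel is
the last whitespace-separated field of each line, as it is in this ps
output.  The helper names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the last whitespace-separated field of line equals name. */
static int last_field_is(const char *line, const char *name)
{
	size_t end, start, len;

	assert(line != NULL && name != NULL);

	/* Skip trailing whitespace (including a newline, if present). */
	end = strlen(line);
	while (end > 0 && isspace((unsigned char)line[end - 1]))
		end--;

	/* Walk back to the start of the last field. */
	start = end;
	while (start > 0 && !isspace((unsigned char)line[start - 1]))
		start--;

	len = end - start;
	return len == strlen(name) && strncmp(line + start, name, len) == 0;
}

/* Count how many of the n lines report the given wait channel. */
static size_t count_waiting(const char **lines, size_t n, const char *wchan)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (last_field_is(lines[i], wchan))
			hits++;
	return hits;
}
```

Calling count_waiting() with "get_request_wait" on the lines of the
listing gives the number of tasks waiting for a free request.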


vmstat 3
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1 37  2      0  25464   3224 333252   0   0    13   371  107    33   0   4  96
 0 37  2      0  25460   3224 333252   0   0     0     0  102     6   0   0 100
 0 37  2      0  25460   3224 333252   0   0     0     0  101     7   0   0 100


I rebooted and ran dbench 32 on a new ext2 filesystem.  dbench runs okay for about
30 seconds.  Towards the end of the vmstat output below, I try to ssh in, the "b"
column goes up, but I don't get a bash prompt.

mountain:~$ vmstat 10
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0      0 346236  20012   6316   0   0   793    67  174   164   3   8  90
 0 32  0      0 182364  21396 162428   0   0    79  3492  136   109   2  26  72
21 11  0      0 163904  21532 180264   0   0     0 11683  209    97   0  11  89
 0 32  0      0  32416  23224 306540   0   0     5  6375  226   108   1  27  72
 0 32  1      0  22552  23392 315972   0   0     3  9807  206    98   0   8  92
 0 32  2    132   4584   7128 349660   0   0    13  2905  192   204   2  29  69
 0 32  2    132   4580   7128 349660   0   0     0     0  101    44   0   0 100
 0 32  2    132   4580   7128 349660   0   0     0     0  100    45   0   0 100
 0 32  2    132   4580   7128 349660   0   0     0     0  100    44   0   0 100
 0 32  2    132   4580   7128 349660   0   0     0     0  100    44   0   0 100
 0 32  2    132   4580   7128 349660   0   0     0     0  100    44   0   0 100
 0 32  2    132   4580   7128 349660   0   0     0     0  100    44   0   0 100
 0 32  2    132   4580   7128 349660   0   0     0     0  101    45   0   0 100
 0 35  2    132   4156   7128 349672   0   0     1     1  104    52   1   0  99
 0 35  2    132   4156   7128 349672   0   0     0     0  100    44   0   0 100
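
The stall signature in these samples (tasks in "b", bi and bo at 0, id
at 100) can be checked mechanically.  A minimal C sketch, assuming the
16-column vmstat layout shown above (newer vmstat versions print
different columns).  The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* The four columns the check needs. */
struct vm_sample { long b, bi, bo, id; };

/*
 * Parse one data line laid out as
 *   r b w swpd free buff cache si so bi bo in cs us sy id
 * Returns 1 on success, 0 for headers or malformed lines.
 */
static int parse_vmstat_line(const char *line, struct vm_sample *out)
{
	long f[16];
	int n;

	assert(line != NULL && out != NULL);

	n = sscanf(line,
		   "%ld %ld %ld %ld %ld %ld %ld %ld "
		   "%ld %ld %ld %ld %ld %ld %ld %ld",
		   &f[0], &f[1], &f[2], &f[3], &f[4], &f[5], &f[6], &f[7],
		   &f[8], &f[9], &f[10], &f[11], &f[12], &f[13], &f[14],
		   &f[15]);
	if (n != 16)
		return 0;

	out->b  = f[1];
	out->bi = f[9];
	out->bo = f[10];
	out->id = f[15];
	return 1;
}

/* Blocked tasks, no block I/O in either direction, CPU fully idle. */
static int looks_stalled(const struct vm_sample *s)
{
	assert(s != NULL);
	return s->b > 0 && s->bi == 0 && s->bo == 0 && s->id == 100;
}
```

A single matching sample proves little.  It is the run of identical
samples, as in the output above, that points to a hang.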

Below is software, hardware, and kernel configs:

Linux (none) 2.5.2-pre2 #1 Thu Dec 27 12:32:39 EST 2001 i586 unknown

Gnu C                  2.95.3
Gnu make               3.79.1
binutils               2.11.2
util-linux             2.11n
mount                  2.11n
modutils               2.4.11
e2fsprogs              1.25
reiserfsprogs          3.x.0k-pre14
PPP                    2.4.1
Linux C Library        2.2.4
Dynamic linker (ldd)   2.2.4
Procps                 2.0.7
Net-tools              1.60
Kbd                    1.06
Sh-utils               2.0
Modules Loaded


This machine has a VIA chipset.  No proprietary drivers.
384 MB RAM.
Root filesystem on /dev/hdc2  # not the usual /dev/hda

00:00.0 Host bridge: VIA Technologies, Inc. VT82C598 [Apollo MVP3] (rev 04)
00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo MVP3/Pro133x AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C586/A/B PCI-to-ISA [Apollo VP] (rev 47)
00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:07.3 Host bridge: VIA Technologies, Inc. VT82C586B ACPI (rev 10)
00:13.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
01:00.0 VGA compatible controller: nVidia Corporation Vanta [NV6] (rev 15)

2.4.18-pre1 (and other 2.4.17* kernels) run dbench 32 and 128 okay on this system.
This is the config difference:

diff 2.5.2-pre2 2.4.18-pre1
> CONFIG_NETLINK_DEV=y
< CONFIG_RAMFS=y


# 2.5.2-pre2 config
CONFIG_X86=y
CONFIG_ISA=y
CONFIG_UID16=y
CONFIG_EXPERIMENTAL=y
CONFIG_MODULES=y
CONFIG_KMOD=y
CONFIG_MK6=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_TSC=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_NOHIGHMEM=y
CONFIG_MTRR=y
CONFIG_NET=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
CONFIG_SYSVIPC=y
CONFIG_SYSCTL=y
CONFIG_KCORE_ELF=y
CONFIG_BINFMT_ELF=y
CONFIG_PM=y
CONFIG_APM=m
CONFIG_APM_DO_ENABLE=y
CONFIG_BLK_DEV_FD=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=4096
CONFIG_BLK_DEV_INITRD=y
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_NETFILTER=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_NF_CONNTRACK=y
CONFIG_IP_NF_FTP=m
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_MATCH_LIMIT=y
CONFIG_IP_NF_MATCH_MULTIPORT=m
CONFIG_IP_NF_MATCH_STATE=y
CONFIG_IP_NF_FILTER=y
CONFIG_IP_NF_NAT=y
CONFIG_IP_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=y
CONFIG_IP_NF_NAT_FTP=m
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECD=m
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_BLK_DEV_ADMA=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_BLK_DEV_VIA82CXXX=y
CONFIG_IDEDMA_AUTO=y
CONFIG_BLK_DEV_IDE_MODES=y
CONFIG_NETDEVICES=y
CONFIG_NET_ETHERNET=y
CONFIG_NET_PCI=y
CONFIG_8139TOO=y
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_SERIAL=y
CONFIG_SERIAL_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_UNIX98_PTY_COUNT=64
CONFIG_MOUSE=m
CONFIG_PSMOUSE=y
CONFIG_REISERFS_FS=y
CONFIG_EXT3_FS=y
CONFIG_JBD=y
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_RAMFS=y
CONFIG_ISO9660_FS=m
CONFIG_NTFS_FS=m
CONFIG_PROC_FS=y
CONFIG_DEVPTS_FS=y
CONFIG_EXT2_FS=y
CONFIG_CODA_FS=m
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
CONFIG_SUNRPC=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=m
CONFIG_VGA_CONSOLE=y
CONFIG_VIDEO_SELECT=y
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y

-- 
Randy Hron



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-27 19:07           ` rwhron
@ 2001-12-28 11:40             ` Jens Axboe
  2001-12-28 14:14               ` rwhron
  0 siblings, 1 reply; 21+ messages in thread
From: Jens Axboe @ 2001-12-28 11:40 UTC (permalink / raw)
  To: rwhron; +Cc: linux-kernel

On Thu, Dec 27 2001, rwhron@earthlink.net wrote:
> On Mon, Dec 24, 2001 at 06:02:44PM +0100, Jens Axboe wrote:
> > > I tried unpatched 2.5.2-pre1 on a k6-2.  dbench 32 hung similarly with 
> > > 32 in "b", bo and bi = 0, and id = 100.  That machine is ill now and can't
> > > find "init" when booting, boot single, or boot init=/bin/bash.
> > 
> > Please send ps -eo cmd,wchan info for a hung machine.
> > 
> > -- 
> > Jens Axboe
> > 
> 
> I rebuilt the reiserfs that dbench writes to.
> Here is ps -eo cmd,wchan on the k6-2 running 2.5.2-pre2:

Ah this is interesting, all stuck in get_request_wait. I cannot
reproduce your problem here whatever I do, no reiser though.

-- 
Jens Axboe



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-28 11:40             ` Jens Axboe
@ 2001-12-28 14:14               ` rwhron
  2001-12-28 14:30                 ` Jens Axboe
  0 siblings, 1 reply; 21+ messages in thread
From: rwhron @ 2001-12-28 14:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: rwhron, linux-kernel

On Fri, Dec 28, 2001 at 12:40:37PM +0100, Jens Axboe wrote:
> > I rebuilt the reiserfs that dbench writes to.
> > Here is ps -eo cmd,wchan on the k6-2 running 2.5.2-pre2:
> 
> Ah this is interesting, all stuck in get_request_wait. I cannot
> reproduce your problem here whatever I do, no reiser though.
> 
> -- 
> Jens Axboe

That's good news.  It's probably something with my configuration
or hardware.  I saw the livelock on both ext2 and reiserfs.

I removed these options from the config and rebuilt 2.5.2-pre2:
CONFIG_PM=y
CONFIG_APM=m
CONFIG_APM_DO_ENABLE=y
CONFIG_NTFS_FS=m
CONFIG_CODA_FS=m
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
CONFIG_SUNRPC=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_VIDEO_SELECT=y

The initial dbench on ext2 completed for 32 processes but 128 didn't:

vmstat 8
  procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0 128  1    132  14796  22136 314916   0   0     0  5467  272   122   1  13  86
 0 128  1    636   3968  21844 328132   0   9     1  1338  132   125   1  18  81
 0 128  1    636   3964  21844 328132   0   0     0     0  101    44   0   0 100
 0 128  1    636   3964  21844 328132   0   0     0     0  101    45   0   0 100
 0 128  1    636   3964  21844 328132   0   0     0     0  101    45   0   0 100

ps -eo cmd,wchan | uniq
CMD              WCHAN
init             pollwait
[keventd]        context_thread
[ksoftirqd_CPU0] ksoftirqd
[kswapd]         refill_inactive
[bdflush]        try_to_free_buffers
[kupdated]       init_private_file
[kreiserfsd]     reiserfs_get_block
/usr/sbin/syslog pollwait
/usr/sbin/klogd  do_syslog
[eth0]           timer_do_blank_screen
/usr/sbin/iplog  pollwait
/usr/sbin/iplog  select
/usr/sbin/iplog  rt_sigsuspend
/usr/sbin/iplog  pollwait
/usr/sbin/iplog  netdev_ethtool_ioctl
/usr/sbin/sshd   pollwait
/sbin/agetty tty is_internal
/bin/login --    write_chan
/usr/sbin/sshd   pollwait
-bash            wait4
/usr/sbin/sshd   pollwait
-bash            wait4
/bin/bash ./chk  wait4
/dbench 128     wait4
/dbench 128     down
/dbench 128     write_chan
/dbench 128     init_private_file
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     add_to_page_cache_unique
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
/dbench 128     write_chan
/dbench 128     down
ps -eo cmd,wchan -
uniq             do_execve
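
Since uniq only collapses adjacent duplicate lines, the listing above still alternates between down and write_chan. A tally per wait channel may be easier to read. The following is a minimal Python sketch, not anything used in this thread: the function name tally_wchan is hypothetical, the sample is only an excerpt of the listing above, and it assumes the wait channel is the last whitespace-separated field on each line.

```python
from collections import Counter

def tally_wchan(ps_output: str) -> Counter:
    """Count processes per wait channel in `ps -eo cmd,wchan` output."""
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip blank lines and the "CMD  WCHAN" header row.
        if not fields or fields[-1] == "WCHAN":
            continue
        counts[fields[-1]] += 1
    return counts

# Excerpt of the listing above (not the complete output).
sample = """\
CMD              WCHAN
[kswapd]         refill_inactive
/dbench 128     wait4
/dbench 128     down
/dbench 128     write_chan
/dbench 128     init_private_file
/dbench 128     write_chan
/dbench 128     down
"""

for wchan, n in tally_wchan(sample).most_common():
    print(f"{n:4d} {wchan}")
```

For meaningful counts this would need the unfiltered ps output, because the uniq pass has already dropped repeated adjacent lines.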

I stripped down the config a little more by removing these:

CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=4096
CONFIG_BLK_DEV_INITRD=y
CONFIG_IP_NF_CONNTRACK=y
CONFIG_IP_NF_FTP=m
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_MATCH_LIMIT=y
CONFIG_IP_NF_MATCH_MULTIPORT=m
CONFIG_IP_NF_MATCH_STATE=y
CONFIG_IP_NF_FILTER=y
CONFIG_IP_NF_NAT=y
CONFIG_IP_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=y
CONFIG_IP_NF_NAT_FTP=m
CONFIG_BLK_DEV_IDECD=m
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_ISO9660_FS=m
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=m

With the stripped config, I built 2.5.2-pre3.  It panic'd
with the stripped config.  2.5.2-pre3 panic'd yesterday
on this machine's normal config too.

Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
8139too Fast Ethernet driver 0.9.22
PCI: Found IRQ 11 for device 00:13.0
IRQ routing conflict for 00:13.0, have irq 9, want irq 11
eth0: RealTek RTL8139 Fast Ethernet at 0xd8800000, 00:50:bf:25:68:f3, IRQ 9
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 4096 buckets, 32Kbytes
TCP: Hash tables configured (established 32768 bind 32768)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
Kernel panic: Out of memory and no killable processes...

I haven't noticed any reports of this panic on 2.5.2-pre3.

Back to 2.5.2-pre2, I removed these:
CONFIG_BLK_DEV_LOOP=m
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_NETFILTER=y
CONFIG_BLK_DEV_VIA82CXXX=y
CONFIG_BLK_DEV_IDE_MODES=y

dbench 32 locked up again.

I re-ran dbench 32, 128 with 2.4.17rc2aa2 on this machine and 
it worked fine.  I'll try 2.5.1 on this machine (2.5.1 was 
okay on another machine).  

--
Randy Hron



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-28 14:14               ` rwhron
@ 2001-12-28 14:30                 ` Jens Axboe
  2001-12-28 17:49                   ` rwhron
                                     ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Jens Axboe @ 2001-12-28 14:30 UTC (permalink / raw)
  To: rwhron; +Cc: linux-kernel

On Fri, Dec 28 2001, rwhron@earthlink.net wrote:
> On Fri, Dec 28, 2001 at 12:40:37PM +0100, Jens Axboe wrote:
> > > I rebuilt the reiserfs that dbench writes to.
> > > Here is ps -eo cmd,wchan on the k6-2 running 2.5.2-pre2:
> > 
> > Ah this is interesting, all stuck in get_request_wait. I cannot
> > reproduce your problem here whatever I do, no reiser though.
> > 
> > -- 
> > Jens Axboe
> 
> That's good news.  It's probably something with my configuration
> or hardware.  I saw the livelock on both ext2 and reiserfs.

Thanks for an excellent report. I can't quite see what the problem
should be yet, especially since the problems seem to start with
2.5.2-pre1 which doesn't really have a lot of interesting changes. I'll
keep looking, though. Could you do sysrq-t for a livelocked system?

The livelocks in this mail appear different than the previous ones.
Could you try running without swap?

> With the stripped config, I built 2.5.2-pre3.  It panic'd
> with the stripped config.  2.5.2-pre3 panic'd yesterday
> on this machine's normal config too.
> 
> Floppy drive(s): fd0 is 1.44M
> FDC 0 is a post-1991 82077
> 8139too Fast Ethernet driver 0.9.22
> PCI: Found IRQ 11 for device 00:13.0
> IRQ routing conflict for 00:13.0, have irq 9, want irq 11
> eth0: RealTek RTL8139 Fast Ethernet at 0xd8800000, 00:50:bf:25:68:f3, IRQ 9
> NET4: Linux TCP/IP 1.0 for NET4.0
> IP Protocols: ICMP, UDP, TCP
> IP: routing cache hash table of 4096 buckets, 32Kbytes
> TCP: Hash tables configured (established 32768 bind 32768)
> NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
> Kernel panic: Out of memory and no killable processes...
> 
> I haven't noticed any reports of this panic on 2.5.2-pre3.

Someone else did report a similar case. Very strange, doesn't look bio
related at all. What's the entire boot message for a 2.5.2-pre3 boot
attempt like the above?

> I re-ran dbench 32, 128 with 2.4.17rc2aa2 on this machine and 
> it worked fine.  I'll try 2.5.1 on this machine (2.5.1 was 
> okay on another machine).  

2.5.1 vs 2.5.2-preX is much more interesting.

(btw, attached patch should fix your highmem oops)

--- /opt/kernel/linux-2.5.2-pre3/include/linux/blkdev.h	Fri Dec 28 11:43:04 2001
+++ include/linux/blkdev.h	Fri Dec 28 15:25:36 2001
@@ -228,8 +228,8 @@
  * BLK_BOUNCE_ANY	: don't bounce anything
  * BLK_BOUNCE_ISA	: bounce pages above ISA DMA boundary
  */
-#define BLK_BOUNCE_HIGH		((blk_max_low_pfn + 1) << PAGE_SHIFT)
-#define BLK_BOUNCE_ANY		((blk_max_pfn + 1) << PAGE_SHIFT)
+#define BLK_BOUNCE_HIGH		(blk_max_low_pfn << PAGE_SHIFT)
+#define BLK_BOUNCE_ANY		(blk_max_pfn << PAGE_SHIFT)
 #define BLK_BOUNCE_ISA		(ISA_DMA_THRESHOLD)
 
 extern int init_emergency_isa_pool(void);
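
The patch above changes each bounce limit by exactly one page. Below is a minimal Python sketch of that arithmetic only; it does not explain why the change fixes the oops. The pfn value is hypothetical (896 MiB of low memory is assumed purely for illustration and is not stated in this thread), and Python integers do not wrap the way a 32-bit unsigned long does in the kernel.

```python
PAGE_SHIFT = 12  # 4 KiB pages on i386

def bounce_limit_old(max_pfn: int) -> int:
    """Limit as computed before the patch: (pfn + 1) << PAGE_SHIFT."""
    return (max_pfn + 1) << PAGE_SHIFT

def bounce_limit_new(max_pfn: int) -> int:
    """Limit as computed after the patch: pfn << PAGE_SHIFT."""
    return max_pfn << PAGE_SHIFT

# Hypothetical value: 896 MiB of low memory expressed in 4 KiB page frames.
assumed_max_low_pfn = (896 * 1024 * 1024) >> PAGE_SHIFT

old = bounce_limit_old(assumed_max_low_pfn)
new = bounce_limit_new(assumed_max_low_pfn)
print(hex(old), hex(new), old - new)
```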

-- 
Jens Axboe



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-28 14:30                 ` Jens Axboe
@ 2001-12-28 17:49                   ` rwhron
  2001-12-28 19:29                   ` rwhron
  2001-12-29  6:42                   ` rwhron
  2 siblings, 0 replies; 21+ messages in thread
From: rwhron @ 2001-12-28 17:49 UTC (permalink / raw)
  To: Jens Axboe; +Cc: rwhron, linux-kernel

On Fri, Dec 28, 2001 at 03:30:22PM +0100, Jens Axboe wrote:
> Thanks for an excellent report. I can't quite see what the problem
> should be yet, especially since the problems seem to start with
> 2.5.2-pre1 which doesn't really have a lot of interesting changes. I'll
> keep looking, though. Could you do sysrq-t for a livelocked system?

I don't know how to do sysrq-t via serial console.  If I put a monitor
and keyboard on the box, syslogd is blocked when the livelock occurs,
and I haven't figured out a workaround yet.

2.5.1 runs dbench 32, 128, by the way.

> The livelocks in this mail appear different than the previous ones.
> Could you try running without swap?

Here is without swap on 2.5.2-pre2.

vmstat 8
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0 350756  19484   5464   0   0     0     0  100    41   0   0 100
 0  0  0      0 350756  19484   5464   0   0     0     0  100    41   0   0 100
 3 29  0      0 344668  19588   8464   0   0    29     0  108    70   1   1  98
 0 32  1      0 184264  20824 162556   0   0    32  9123 1085    59   3  86  11
21 11  3      0 181748  20864 164916   0   0     1 10500 1503    20   1  83  16
 0 32  1      0 148560  21272 196764   0   0     4  4838  893    52   2  47  51
 6 26  2      0 106532  21804 237140   0   0     2  5590  836    62   2  35  64
 0 32  2      0   4448   5380 353332   0   0    11    44  253   120   2  26  73
 0 32  2      0   4448   5380 353332   0   0     0     0  101    41   0   0 100
 0 32  2      0   4448   5380 353332   0   0     0     0  101    41   0   0 100
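
The stall shows up in vmstat as rows with blocked tasks ("b" above zero), no block I/O (bi and bo both zero) and a fully idle CPU. A rough Python sketch for flagging such rows follows. The function name stalled_rows is hypothetical, and the column positions assume the 16-column layout printed above (r b w swpd free buff cache si so bi bo in cs us sy id); other procps versions print different columns.

```python
def stalled_rows(vmstat_output: str):
    """Return vmstat rows with blocked tasks, zero bi/bo, and id == 100."""
    stalled = []
    for line in vmstat_output.splitlines():
        fields = line.split()
        # Data rows have exactly 16 integer columns; headers are skipped.
        if len(fields) != 16 or not all(f.isdigit() for f in fields):
            continue
        row = [int(f) for f in fields]
        b, bi, bo, idle = row[1], row[9], row[10], row[15]
        if b > 0 and bi == 0 and bo == 0 and idle == 100:
            stalled.append(row)
    return stalled

# Rows copied from the vmstat output above.
sample = """\
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0 350756  19484   5464   0   0     0     0  100    41   0   0 100
 0 32  2      0   4448   5380 353332   0   0    11    44  253   120   2  26  73
 0 32  2      0   4448   5380 353332   0   0     0     0  101    41   0   0 100
 0 32  2      0   4448   5380 353332   0   0     0     0  101    41   0   0 100
"""

for row in stalled_rows(sample):
    print("stalled interval: b=%d free=%d cache=%d" % (row[1], row[4], row[6]))
```

An idle system with nothing blocked (the first data row) is deliberately not flagged, since "b" is zero there.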

ps -eo cmd,wchan
CMD              WCHAN
init             do_select
[keventd]        context_thread
[ksoftirqd_CPU0] ksoftirqd
[kswapd]         kswapd
[bdflush]        wait_on_buffer
[kupdated]       wait_on_buffer
[kreiserfsd]     reiserfs_journal_commit_thread
/usr/sbin/syslog do_select
/usr/sbin/klogd  do_syslog
[eth0]           rtl8139_thread
/usr/sbin/sshd   do_select
/sbin/agetty tty read_chan
/sbin/agetty -h  read_chan
/usr/sbin/sshd   do_select
-bash            wait4
/usr/sbin/sshd   -
-bash            wait4
/bin/bash ./chk  wait4
/dbench 32      wait4
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      wait_on_buffer
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
/dbench 32      down
ps -eo cmd,wchan -


> > Kernel panic: Out of memory and no killable processes...
> 
> Someone else did report a similar case. Very strange, doesn't look bio
> related at all. What's the entire boot message for a 2.5.2-pre3 boot
> attempt like the above?

I first rebuilt 2.5.2-pre3 (after mrproper) using the config that worked
for 2.5.1, and noticed some depmod errors during the build:

if [ -r System.map ]; then /sbin/depmod -ae -F System.map  2.5.2-pre3; fi
depmod: *** Unresolved symbols in /lib/modules/2.5.2-pre3/kernel/fs/nfs/nfs.o
depmod:         seq_escape
depmod:         seq_printf
make[1]: Entering directory `/usr/src/linux/arch/i386/boot'
sh -x ./install.sh 2.5.2-pre3 bzImage /usr/src/linux/System.map "/boot"

So I removed initrd, loopback, nfs, coda, ntfs, dosfs, vfat, and rebuilt
with mrproper.  

Here is the boot message and panic:

LILO 22.1 boot:
Loading lfs.............
Linux version 2.5.2-pre3 (root@mountain) (gcc version 2.95.3 20010315 (release)) #1 Fri Dec 28 12:33:00 EST 2001
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 0000000018000000 (usable)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
On node 0 totalpages: 98304
zone(0): 4096 pages.
zone(1): 94208 pages.
zone(2): 0 pages.
Kernel command line: BOOT_IMAGE=lfs ro root=1602 console=ttyS1,38400n8
Initializing CPU#0
Detected 501.155 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 999.42 BogoMIPS
Memory: 385036k/393216k available (962k kernel code, 7796k reserved, 243k data, 200k init, 0k highmem)
Dentry-cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
Buffer-cache hash table entries: 32768 (order: 5, 131072 bytes)
Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
CPU: L1 I Cache: 32K (32 bytes/line), D cache 32K (32 bytes/line)
CPU: AMD-K6(tm) 3D processor stepping 0c
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: AMD K6
PCI: PCI BIOS revision 2.10 entry at 0xfb3c0, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Using IRQ router VIA [1106/0586] at 00:07.0
Activating ISA DMA hang workarounds.
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Starting kswapd
BIO: pool of 256 setup, 14Kb (56 bytes/bio)
biovec: init pool 0, 1 entries, 12 bytes
biovec: init pool 1, 4 entries, 48 bytes
biovec: init pool 2, 16 entries, 192 bytes
biovec: init pool 3, 64 entries, 768 bytes
biovec: init pool 4, 128 entries, 1536 bytes
biovec: init pool 5, 256 entries, 3072 bytes
Journalled Block Device driver loaded
Detected PS/2 Mouse Port.
pty: 256 Unix98 ptys configured
keyboard: Timeout - AT keyboard not present?(ed)
keyboard: Timeout - AT keyboard not present?(f4)
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
block: 256 slots per queue, batch=32
Uniform Multi-Platform E-IDE driver Revision: 6.32
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller on PCI slot 00:07.1
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt82c586b (rev 47) IDE UDMA33 controller on pci00:07.1
    ide0: BM-DMA at 0xe000-0xe007, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xe008-0xe00f, BIOS settings: hdc:DMA, hdd:DMA
hda: Maxtor 51536U3, ATA DISK drive
hdb: ATAPI CDROM, ATAPI CD/DVD-ROM drive
hdc: Maxtor 52049U4, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
blk: queue c028dcc4, I/O limit 4095Mb (mask 0xffffffff)
hda: 30015216 sectors (15368 MB) w/2048KiB Cache, CHS=1868/255/63, UDMA(33)
blk: queue c028e054, I/O limit 4095Mb (mask 0xffffffff)
hdc: 40020624 sectors (20491 MB) w/2048KiB Cache, CHS=39703/16/63, UDMA(33)
Partition check:
 hda: hda1 hda2 hda3 < hda5 hda6 hda7 >
 hdc: hdc1 hdc2 hdc3 < hdc5 >
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
8139too Fast Ethernet driver 0.9.22
PCI: Found IRQ 11 for device 00:13.0
IRQ routing conflict for 00:13.0, have irq 9, want irq 11
eth0: RealTek RTL8139 Fast Ethernet at 0xd8800000, 00:50:bf:25:68:f3, IRQ 9
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 4096 buckets, 32Kbytes
TCP: Hash tables configured (established 32768 bind 32768)
ip_conntrack (3072 buckets, 24576 max)
ip_tables: (c)2000 Netfilter core team
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
Kernel panic: Out of memory and no killable processes...
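
As a cross-check of the memory figures in the boot log above, the top of the usable BIOS-e820 range, the totalpages count and the zone sizes can be compared. This is only an arithmetic sketch in Python using numbers copied from the log; the 4 KiB page size is the usual i386 value and is assumed here rather than printed in the log.

```python
PAGE_SHIFT = 12  # assumed 4 KiB pages (i386)

top_of_ram = 0x18000000         # end of the last usable BIOS-e820 range
total_pages = 98304             # "On node 0 totalpages"
zone_pages = [4096, 94208, 0]   # zone(0), zone(1), zone(2)
reported_total_kib = 393216     # second figure in the "Memory:" line

pages_from_e820 = top_of_ram >> PAGE_SHIFT
kib_from_e820 = top_of_ram // 1024

print(pages_from_e820 == total_pages)       # e820 top matches totalpages
print(sum(zone_pages) == total_pages)       # zones add up to totalpages
print(kib_from_e820 == reported_total_kib)  # matches the "Memory:" total
print(kib_from_e820 // 1024, "MiB")
```

The figures are self-consistent with the 384 MB of RAM listed for this machine, which suggests (without proving it) that the panic is not caused by a mis-detected memory map.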


> > I re-ran dbench 32, 128 with 2.4.17rc2aa2 on this machine and 
>
> 2.5.1 vs 2.5.2-preX is much more interesting.

2.5.1 finishes dbench 32, 128 on this machine.  :)
Throughput 21.6466 MB/sec (NB=27.0582 MB/sec  216.466 MBit/sec)  32 procs
Throughput 5.91991 MB/sec (NB=7.39989 MB/sec  59.1991 MBit/sec)  128 procs
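
The parenthesised figures in the two throughput lines above are consistent with fixed multiples of the MB/sec value (1.25 for NB and 10 for MBit/sec). That is an observation from these two lines, not a statement about how dbench computes them. A small Python check, with a hypothetical helper name:

```python
def derived_rates(mb_per_sec: float):
    """Return (NB MB/sec, MBit/sec) using the multiples observed above."""
    return mb_per_sec * 1.25, mb_per_sec * 10

# procs -> (MB/sec, NB MB/sec, MBit/sec), copied from the lines above.
reported = {
    32: (21.6466, 27.0582, 216.466),
    128: (5.91991, 7.39989, 59.1991),
}

for procs, (mb, nb, mbit) in reported.items():
    calc_nb, calc_mbit = derived_rates(mb)
    print(procs, abs(calc_nb - nb) < 1e-3, abs(calc_mbit - mbit) < 1e-3)

# Throughput at 128 processes relative to 32 processes.
ratio = reported[128][0] / reported[32][0]
print(round(ratio, 2))
```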


> (btw, attached patch should fix your highmem oops)
> 
> -- 
> Jens Axboe

I'm going to hold off testing on my highmem box for a while.

BTW, the original "cannot find init" after 2.5.1-pre1 was because
I had an invalid "root=" entry in lilo.conf for the kernels 
other than current and "old".  

-- 
Randy Hron



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-28 14:30                 ` Jens Axboe
  2001-12-28 17:49                   ` rwhron
@ 2001-12-28 19:29                   ` rwhron
  2001-12-29  6:42                   ` rwhron
  2 siblings, 0 replies; 21+ messages in thread
From: rwhron @ 2001-12-28 19:29 UTC (permalink / raw)
  To: Jens Axboe; +Cc: rwhron, linux-kernel

On Fri, Dec 28, 2001 at 03:30:22PM +0100, Jens Axboe wrote:
> keep looking, though. Could you do sysrq-t for a livelocked system?
> -- 
> Jens Axboe

Using a tip from Russell King:

This is while running dbench 32 on an ext2 filesystem.

SysRq : Show State

                         free                        sibling
  task             PC    stack   pid father child younger older
init          S C177FF24  4608     1      0    43       3       (NOTLB)
Call Trace: [<c011159a>] [<c01114dc>] [<c01398d4>] [<c0139c82>] [<c01337d6>]
   [<c01085b3>]
keventd       S 00010000  6596     2      1             7       (L-TLB)
Call Trace: [<c011e245>] [<c0106efc>]
ksoftirqd_CPU S C1770000  6412     3      0             4     1 (L-TLB)
Call Trace: [<c01179b2>] [<c0106efc>]
kswapd        S C176E000  6652     4      0             5     3 (L-TLB)
Call Trace: [<c01282c6>] [<c0106efc>]
bdflush       S 00000286  6568     5      0             6     4 (L-TLB)
Call Trace: [<c0111b29>] [<c0130b53>] [<c0106efc>]
kupdated      D 00000048  5860     6      0                   5 (L-TLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012f265>] [<c015bd08>] [<c015bd95>]
   [<c013dd35>] [<c01309bd>] [<c0130c45>] [<c0106efc>]
kreiserfsd    S D7D1BFB4  6148     7      1            25     2 (L-TLB)
Call Trace: [<c011159a>] [<c01114dc>] [<c0111b7e>] [<c0177257>] [<c0106efc>]
syslogd       D 00000048  4788    25      1            27     7 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c01665b8>]
   [<c0124c35>] [<c012db02>] [<c012479c>] [<c012dc0f>] [<c01085b3>]
klogd         S 7FFFFFFF  2656    27      1            32    25 (NOTLB)
Call Trace: [<c011153f>] [<c01dd4ad>] [<c01ddd37>] [<c01aed94>] [<c01aef9f>]
   [<c012d91a>] [<c01085b3>]
eth0          S D7945F98  2656    32      1            41    27 (L-TLB)
Call Trace: [<c011159a>] [<c01114dc>] [<c0111b7e>] [<c01a0d7e>] [<c0106efc>]
sshd          S 7FFFFFFF  4788    41      1    52      42    32 (NOTLB)
Call Trace: [<c011153f>] [<c01af15d>] [<c01398d4>] [<c0139c82>] [<c01085b3>]
agetty        S 7FFFFFFF  4364    42      1            43    41 (NOTLB)
Call Trace: [<c011153f>] [<c018350d>] [<c017f786>] [<c012d855>] [<c01085b3>]
agetty        S 7FFFFFFF     0    43      1                  42 (NOTLB)
Call Trace: [<c011153f>] [<c018350d>] [<c017f786>] [<c012d855>] [<c01085b3>]
sshd          S 7FFFFFFF  5484    45     41    46      52       (NOTLB)
Call Trace: [<c011153f>] [<c01398d4>] [<c0139c82>] [<c01085b3>]
bash          S 00000000  4580    46     45    59               (NOTLB)
Call Trace: [<c01169ee>] [<c01085b3>]
sshd          S 7FFFFFFF  1568    52     41    53            45 (NOTLB)
Call Trace: [<c0183b6f>] [<c011153f>] [<c01398d4>] [<c0139c82>] [<c01085b3>]
bash          S 00000000  2656    53     52    58               (NOTLB)
Call Trace: [<c01169ee>] [<c01085b3>]
vmstat        S D72B5F8C   644    58     53                     (NOTLB)
Call Trace: [<c011159a>] [<c01114dc>] [<c011a959>] [<c01085b3>]
chk           S 00000000  5284    59     46    60               (NOTLB)
Call Trace: [<c01169ee>] [<c01085b3>]
dbench        S 00000000  5208    60     59    93               (NOTLB)
Call Trace: [<c01169ee>] [<c01085b3>]
dbench        D 00000048  5692    62     60            63       (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D D7744244  5532    63     60            64    62 (NOTLB)
Call Trace: [<c01073ed>] [<c0107538>] [<c01e3473>] [<c0124c35>] [<c0124c8d>]
   [<c015a7de>] [<c0159e43>] [<c012e304>] [<c012d48b>] [<c012d4d7>] [<c01085b3>]
dbench        D 00000048  5684    64     60            65    63 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D 00000048  5624    65     60            66    64 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D 00000048  5700    66     60            67    65 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D D7744244  5660    67     60            68    66 (NOTLB)
Call Trace: [<c01073ed>] [<c0107538>] [<c01e34b9>] [<c0159886>] [<c015c03c>]
   [<c013655d>] [<c01366ca>] [<c012d07a>] [<c012d3b7>] [<c01085b3>]
dbench        D 00000048  5688    68     60            69    67 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D D7744244  5532    69     60            70    68 (NOTLB)
Call Trace: [<c01073ed>] [<c0107538>] [<c01e3473>] [<c0124c35>] [<c0124c8d>]
   [<c015a7de>] [<c0159e43>] [<c012e304>] [<c012d48b>] [<c012d4d7>] [<c01085b3>]
dbench        D 00000048  5780    70     60            71    69 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D 00000048  5756    71     60            72    70 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D 00000048  5692    72     60            73    71 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D D7744244  5692    73     60            74    72 (NOTLB)
Call Trace: [<c01073ed>] [<c0107538>] [<c01e3473>] [<c0124c35>] [<c0124c8d>]
   [<c015a7de>] [<c0159e43>] [<c012e304>] [<c012d48b>] [<c012d4d7>] [<c01085b3>]
dbench        D D7744244  5612    74     60            75    73 (NOTLB)
Call Trace: [<c01073ed>] [<c0107538>] [<c01e3473>] [<c0124c35>] [<c0124c8d>]
   [<c015a7de>] [<c0159e43>] [<c012e304>] [<c012d48b>] [<c012d4d7>] [<c01085b3>]
dbench        D 00000048  5740    75     60            76    74 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D 00000048  5600    76     60            77    75 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D 00000048  5448    77     60            78    76 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D 00000048  5692    78     60            79    77 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D 00000048  5640    79     60            80    78 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012f265>] [<c0159085>] [<c015a85c>]
   [<c015aa70>] [<c015ae09>] [<c012f8f2>] [<c012fee1>] [<c015ac38>] [<c015aff6>]
   [<c015ac38>] [<c0124bed>] [<c012d91a>] [<c01085b3>]
dbench        D 00000048  5692    80     60            81    79 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D 00000048  5468    81     60            82    80 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D 00000048  5412    82     60            83    81 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D 00000048  5400    83     60            84    82 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D 00000048  5700    84     60            85    83 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D 00000048  5692    85     60            86    84 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D 00000048  5336    86     60            87    85 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D D7744244  5628    87     60            88    86 (NOTLB)
Call Trace: [<c01073ed>] [<c0107538>] [<c01e34b9>] [<c0159886>] [<c015c35d>]
   [<c0136da4>] [<c0136e65>] [<c01085b3>]
dbench        D 00000048  5484    88     60            89    87 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D D7744244  5740    89     60            90    88 (NOTLB)
Call Trace: [<c01073ed>] [<c0107538>] [<c01e3473>] [<c0124c35>] [<c0124c8d>]
   [<c015a7de>] [<c0159e43>] [<c012e304>] [<c012d48b>] [<c012d4d7>] [<c01085b3>]
dbench        D 00000048  5420    90     60            91    89 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D 00000048  5652    91     60            92    90 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D 00000048  5660    92     60            93    91 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]
dbench        D 00000048  5592    93     60                  92 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
   [<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
   [<c012d91a>] [<c01085b3>]


  PID CMD              WCHAN
    1 init             do_select
    2 [keventd]        context_thread
    3 [ksoftirqd_CPU0] ksoftirqd
    4 [kswapd]         kswapd
    5 [bdflush]        bdflush
    6 [kupdated]       get_request_wait
    7 [kreiserfsd]     reiserfs_journal_commit_thread
   25 /usr/sbin/syslog get_request_wait
   27 /usr/sbin/klogd  unix_wait_for_peer
   32 [eth0]           rtl8139_thread
   41 /usr/sbin/sshd   do_select
   42 /sbin/agetty tty read_chan
   43 /sbin/agetty -h  read_chan
   45 /usr/sbin/sshd   do_select
   46 -bash            wait4
   52 /usr/sbin/sshd   do_select
   53 -bash            wait4
   59 /bin/bash ./chk  wait4
   60 ./dbench 32      wait4
   62 ./dbench 32      get_request_wait
   63 ./dbench 32      down
   64 ./dbench 32      get_request_wait
   65 ./dbench 32      get_request_wait
   66 ./dbench 32      get_request_wait
   67 ./dbench 32      down
   68 ./dbench 32      get_request_wait
   69 ./dbench 32      down
   70 ./dbench 32      get_request_wait
   71 ./dbench 32      get_request_wait
   72 ./dbench 32      get_request_wait
   73 ./dbench 32      down
   74 ./dbench 32      down
   75 ./dbench 32      get_request_wait
   76 ./dbench 32      get_request_wait
   77 ./dbench 32      get_request_wait
   78 ./dbench 32      get_request_wait
   79 ./dbench 32      get_request_wait
   80 ./dbench 32      get_request_wait
   81 ./dbench 32      get_request_wait
   82 ./dbench 32      get_request_wait
   83 ./dbench 32      get_request_wait
   84 ./dbench 32      get_request_wait
   85 ./dbench 32      get_request_wait
   86 ./dbench 32      get_request_wait
   87 ./dbench 32      down
   88 ./dbench 32      get_request_wait
   89 ./dbench 32      down
   90 ./dbench 32      get_request_wait
   91 ./dbench 32      get_request_wait
   92 ./dbench 32      get_request_wait
   93 ./dbench 32      get_request_wait
   97 ps -eo pid,cmd,w -


SysRq : Show Regs

Pid: 0, comm:              swapper
EIP: 0010:[<c0106c03>] CPU: 0 EFLAGS: 00000246    Not tainted
EAX: 00000000 EBX: c0220000 ECX: d7d1a270 EDX: d7d1a270
ESI: c0106be0 EDI: ffffe000 EBP: 0008e000 DS: 0018 ES: 0018
CR0: 8005003b CR2: 080cc00c CR3: 17d02000 CR4: 00000090
Call Trace: [<c0106c67>] [<c0105000>] [<c0105027>]

SysRq : Show Memory
Mem-info:
Free pages:       83640kB (     0kB HighMem)
Zone:DMA freepages: 14632kB min:   128kB low:   256kB high:   384kB
Zone:Normal freepages: 69008kB min:  1020kB low:  2040kB high:  3060kB
Zone:HighMem freepages:     0kB min:     0kB low:     0kB high:     0kB
( Active: 1576, inactive: 69884, free: 20910 )
4*4kB 3*8kB 2*16kB 3*32kB 4*64kB 3*128kB 2*256kB 0*512kB 1*1024kB 6*2048kB = 14632kB)
10*4kB 3*8kB 1*16kB 2*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 33*2048kB = 69008kB)
= 0kB)
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap:       136512kB
98304 pages of RAM
0 pages of HIGHMEM
1980 reserved pages
75748 pages shared
0 pages swap cached
0 pages in page table cache
Buffer memory:     4252kB


-- 
Randy Hron



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-28 14:30                 ` Jens Axboe
  2001-12-28 17:49                   ` rwhron
  2001-12-28 19:29                   ` rwhron
@ 2001-12-29  6:42                   ` rwhron
  2001-12-29 17:33                     ` Jens Axboe
  2 siblings, 1 reply; 21+ messages in thread
From: rwhron @ 2001-12-29  6:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: viro, linux-kernel

> > Kernel panic: Out of memory and no killable processes...
> 
> Someone else did report a similar case. Very strange, doesn't look bio

Al Viro posted a fix:
http://marc.theaimsgroup.com/?l=linux-kernel&m=100959128922157&w=2

I used Al's patch and 2.5.2-pre3 boots with reiserfs root_fs
and no panic.

Below is the trace on 2.5.2-pre3 after dbench 32 livelocked.

                         free                        sibling
  task             PC    stack   pid father child younger older
init          S C177FF24  4592     1      0    45       3       (NOTLB)
Call Trace: [<c01115d9>] [<c0111500>] [<c0139d54>] [<c013a102>] [<c0133c46>]
   [<c01085b3>]
keventd       S 00010000  6580     2      1             7       (L-TLB)
Call Trace: [<c011e3f5>] [<c0106ef0>]
ksoftirqd_CPU S C1770000  6396     3      0             4     1 (L-TLB)
Call Trace: [<c0117b12>] [<c0106ef0>]
kswapd        S C176E000  6636     4      0             5     3 (L-TLB)
Call Trace: [<c0128716>] [<c0106ef0>]
bdflush       S 00000286  6552     5      0             6     4 (L-TLB)
Call Trace: [<c0111c69>] [<c0130fb3>] [<c0106ef0>]
kupdated      D C176BFAC  5864     6      0                   5 (L-TLB)
Call Trace: [<c012e96a>] [<c012eb2b>] [<c0131023>] [<c0106ef0>]
kreiserfsd    S D68E9FB4  6156     7      1            25     2 (L-TLB)
Call Trace: [<c01115d9>] [<c0111500>] [<c0111cbe>] [<c0177717>] [<c0106ef0>]
syslogd       D 00000048  4772    25      1            27     7 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0166aa8>]
   [<c0125085>] [<c012df62>] [<c0124bec>] [<c012e06f>] [<c01085b3>]
klogd         S 7FFFFFFF  4772    27      1            32    25 (NOTLB)
Call Trace: [<c011157b>] [<c01e6c4d>] [<c01e74e7>] [<c01b0d77>] [<c01b0f87>]
   [<c012dd7a>] [<c01085b3>]
eth0          S D646FF98     0    32      1            37    27 (L-TLB)
Call Trace: [<c01115d9>] [<c0111500>] [<c0111cbe>] [<c01a125e>] [<c0106ef0>]
iplog         S 7FFFFFFF  5304    37      1    38      43    32 (NOTLB)
Call Trace: [<c011157b>] [<c0139bd1>] [<c0139d54>] [<c013a102>] [<c01085b3>]
iplog         S D616DF28   188    38     37    41               (NOTLB)
Call Trace: [<c01115d9>] [<c0111500>] [<c013a37c>] [<c013a57d>] [<c011191c>]
   [<c01085b3>]
iplog         S D6169FB0  5684    39     38            40       (NOTLB)
Call Trace: [<c0107767>] [<c01085b3>]
iplog         S D6165F24  6280    40     38            41    39 (NOTLB)
Call Trace: [<c01115d9>] [<c0111500>] [<c0139d54>] [<c013a102>] [<c01085b3>]
iplog         S 7FFFFFFF  5656    41     38                  40 (NOTLB)
Call Trace: [<c011157b>] [<c01bed35>] [<c01b51e2>] [<c01b52fe>] [<c01e960f>]
   [<c01b0dd5>] [<c01b1b47>] [<c011b314>] [<c011b550>] [<c011bc78>] [<c01b6c4b>]
   [<c01b2267>] [<c01085b3>]
sshd          S 7FFFFFFF  4772    43      1    55      44    37 (NOTLB)
Call Trace: [<c011157b>] [<c01b113d>] [<c0139d54>] [<c013a102>] [<c01085b3>]
agetty        S 7FFFFFFF  4468    44      1            45    43 (NOTLB)
Call Trace: [<c011157b>] [<c0183a0d>] [<c017fc76>] [<c012dcb5>] [<c01085b3>]
agetty        S 7FFFFFFF     0    45      1                  44 (NOTLB)
Call Trace: [<c011157b>] [<c0183a0d>] [<c017fc76>] [<c012dcb5>] [<c01085b3>]
sshd          S 7FFFFFFF   548    47     43    48      55       (NOTLB)
Call Trace: [<c011157b>] [<c0139d54>] [<c013a102>] [<c01085b3>]
bash          S 00000000  4564    48     47    62               (NOTLB)
Call Trace: [<c0116b4e>] [<c01085b3>]
sshd          S 7FFFFFFF     0    55     43    56            47 (NOTLB)
Call Trace: [<c011157b>] [<c0139d54>] [<c013a102>] [<c01085b3>]
bash          S 7FFFFFFF  2640    56     55                     (NOTLB)
Call Trace: [<c011157b>] [<c0183a0d>] [<c017fc76>] [<c012dcb5>] [<c01085b3>]
chk           S 00000000     0    62     48    63               (NOTLB)
Call Trace: [<c0116b4e>] [<c01085b3>]
dbench        S 00000000  5192    63     62    96               (NOTLB)
Call Trace: [<c0116b4e>] [<c01085b3>]
dbench        D 00000048  5372    65     63            66       (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5620    66     63            67    65 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c015974b>]
   [<c015a1fd>] [<c015c847>] [<c0137224>] [<c01372e5>] [<c01085b3>]
dbench        D 00000000  5620    67     63            68    66 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5728    68     63            69    67 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5608    69     63            70    68 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000000  5948    70     63            71    69 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5572    71     63            72    70 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5264    72     63            73    71 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5464    73     63            74    72 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5728    74     63            75    73 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012f6e5>] [<c015b32a>] [<c012f974>]
   [<c012fb2d>] [<c012fd51>] [<c0130341>] [<c015b0d8>] [<c015b496>] [<c015b0d8>]
   [<c012503d>] [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5528    75     63            76    74 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5676    76     63            77    75 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5332    77     63            78    76 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5584    78     63            79    77 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000000  5644    79     63            80    78 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea8e>] [<c012f974>] [<c012fb73>] [<c012fd6e>] [<c012f745>]
   [<c012f629>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>] [<c012dd7a>]
   [<c01085b3>]
dbench        D 00000048  5620    80     63            81    79 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5600    81     63            82    80 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000000  5620    82     63            83    81 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5604    83     63            84    82 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5620    84     63            85    83 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5632    85     63            86    84 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5676    86     63            87    85 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5676    87     63            88    86 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5620    88     63            89    87 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5620    89     63            90    88 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012f6e5>] [<c015b32a>] [<c012f974>]
   [<c012fb2d>] [<c012fd51>] [<c0130341>] [<c015b0d8>] [<c015b496>] [<c015b0d8>]
   [<c012503d>] [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5728    90     63            91    89 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5628    91     63            92    90 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000000  5676    92     63            93    91 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
dbench        D 00000000  5948    93     63            94    92 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000000  5692    94     63            95    93 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000000  5488    95     63            96    94 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]
dbench        D 00000048  5372    96     63                  95 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
   [<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
   [<c012dd7a>] [<c01085b3>]

SysRq : Show Regs

Pid: 0, comm:              swapper
EIP: 0010:[<c0106c03>] CPU: 0 EFLAGS: 00000246    Not tainted
EAX: 00000000 EBX: c022e000 ECX: d68e8280 EDX: d68e8280
ESI: c0106be0 EDI: ffffe000 EBP: 0008e000 DS: 0018 ES: 0018
CR0: 8005003b CR2: 40014000 CR3: 16177000 CR4: 00000090
Call Trace: [<c0106c59>] [<c0105000>] [<c0105027>]

SysRq : Show Memory
Mem-info:
Free pages:       95596kB (     0kB HighMem)
Zone:DMA freepages: 14572kB min:   128kB low:   256kB high:   384kB
Zone:Normal freepages: 81024kB min:  1020kB low:  2040kB high:  3060kB
Zone:HighMem freepages:     0kB min:     0kB low:     0kB high:     0kB
( Active: 1427, inactive: 67074, free: 23899 )
3*4kB 2*8kB 1*16kB 2*32kB 2*64kB 4*128kB 2*256kB 0*512kB 1*1024kB 6*2048kB = 14572kB)
10*4kB 3*8kB 4*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 39*2048kB = 81024kB)
= 0kB)
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap:       136512kB
98304 pages of RAM
0 pages of HIGHMEM
1995 reserved pages
72913 pages shared
0 pages swap cached
0 pages in page table cache
Buffer memory:    23796kB

mountain:~/dbench$ ps -eo pid,cmd,wchan
  PID CMD              WCHAN
    1 init             do_select
    2 [keventd]        context_thread
    3 [ksoftirqd_CPU0] ksoftirqd
    4 [kswapd]         kswapd
    5 [bdflush]        bdflush
    6 [kupdated]       wait_on_buffer
    7 [kreiserfsd]     reiserfs_journal_commit_thread
   25 /usr/sbin/syslog get_request_wait
   27 /usr/sbin/klogd  unix_wait_for_peer
   32 [eth0]           rtl8139_thread
   37 /usr/sbin/iplog  do_select
   38 /usr/sbin/iplog  do_poll
   39 /usr/sbin/iplog  rt_sigsuspend
   40 /usr/sbin/iplog  do_select
   41 /usr/sbin/iplog  wait_for_packet
   43 /usr/sbin/sshd   do_select
   44 /sbin/agetty tty read_chan
   45 /sbin/agetty -h  read_chan
   47 /usr/sbin/sshd   -
   48 -bash            wait4
   55 /usr/sbin/sshd   do_select
   56 -bash            read_chan
   65 ./dbench 32      get_request_wait
   66 ./dbench 32      get_request_wait
   67 ./dbench 32      get_request_wait
   68 ./dbench 32      get_request_wait
   69 ./dbench 32      get_request_wait
   70 ./dbench 32      get_request_wait
   71 ./dbench 32      get_request_wait
   72 ./dbench 32      get_request_wait
   73 ./dbench 32      get_request_wait
   74 ./dbench 32      get_request_wait
   75 ./dbench 32      get_request_wait
   76 ./dbench 32      get_request_wait
   77 ./dbench 32      get_request_wait
   78 ./dbench 32      get_request_wait
   79 ./dbench 32      get_request_wait
   80 ./dbench 32      get_request_wait
   81 ./dbench 32      get_request_wait
   82 ./dbench 32      get_request_wait
   83 ./dbench 32      get_request_wait
   84 ./dbench 32      get_request_wait
   85 ./dbench 32      get_request_wait
   86 ./dbench 32      get_request_wait
   87 ./dbench 32      get_request_wait
   88 ./dbench 32      get_request_wait
   89 ./dbench 32      get_request_wait
   90 ./dbench 32      get_request_wait
   91 ./dbench 32      get_request_wait
   92 ./dbench 32      get_request_wait
   93 ./dbench 32      get_request_wait
   94 ./dbench 32      get_request_wait
   95 ./dbench 32      get_request_wait
   96 ./dbench 32      get_request_wait
   97 ps -eo pid,cmd,w -

dbench was run on the ext2 filesystem.

mountain:~/dbench$ df -kT .
Filesystem    Type   1k-blocks      Used Available Use% Mounted on
/dev/hda6     ext2     4032092    249208   3578060   7% /home


-- 
Randy Hron



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-29  6:42                   ` rwhron
@ 2001-12-29 17:33                     ` Jens Axboe
  2001-12-29 17:48                       ` Jens Axboe
  0 siblings, 1 reply; 21+ messages in thread
From: Jens Axboe @ 2001-12-29 17:33 UTC (permalink / raw)
  To: rwhron; +Cc: viro, linux-kernel

On Sat, Dec 29 2001, rwhron@earthlink.net wrote:
> > > Kernel panic: Out of memory and no killable processes...
> > 
> > Someone else did report a similar case. Very strange, doesn't look bio
> 
> Al Viro posted a fix:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=100959128922157&w=2
> 
> I used Al's patch and 2.5.2-pre3 boots with reiserfs root_fs
> and no panic.
> 
> Below is the trace on 2.5.2-pre3 after dbench 32 livelocked.

Thanks, could you try with this patch? It's not a fix (haven't found the
bug yet), but I think we are looking at list corruption so please check
if this patch at least alters when it hangs etc.

--- /opt/kernel/linux-2.5.2-pre3/drivers/block/elevator.c	Sat Dec 29 12:17:53 2001
+++ drivers/block/elevator.c	Sat Dec 29 12:30:20 2001
@@ -142,7 +142,7 @@
 int elevator_linus_merge(request_queue_t *q, struct request **req,
 			 struct bio *bio)
 {
-	struct list_head *entry;
+	struct list_head *entry, *head = &q->queue_head;
 	struct request *__rq;
 	int ret;
 
@@ -160,17 +160,22 @@
 		}
 	}
 
+	if ((__rq = __elv_next_request(q)))
+		if (__rq->flags & REQ_STARTED)
+			head = head->next;
+
 	entry = &q->queue_head;
 	ret = ELEVATOR_NO_MERGE;
-	while ((entry = entry->prev) != &q->queue_head) {
+	while ((entry = entry->prev) != head) {
 		__rq = list_entry_rq(entry);
 
+		if (__rq->flags & (REQ_BARRIER | REQ_STARTED))
+			break;
+
 		/*
 		 * simply "aging" of requests in queue
 		 */
 		if (__rq->elevator_sequence-- <= 0)
-			break;
-		if (__rq->flags & (REQ_BARRIER | REQ_STARTED))
 			break;
 		if (!(__rq->flags & REQ_CMD))
 			continue;
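
For readers who want to see the scan order in isolation, here is a
minimal userspace sketch of the backward walk that the patch above sets
up. It is not the kernel code: the struct names, the flag values and the
demo_scan() helper are all made up for illustration, and the request
aging and the actual merge test are left out.

```c
#include <assert.h>

/* Hypothetical flag values for this sketch; they are not the kernel's. */
enum { RQ_STARTED = 1 << 0, RQ_BARRIER = 1 << 1 };

/* Circular doubly linked list with a sentinel head, like list_head. */
struct node { struct node *prev, *next; };

struct req {
	struct node link;	/* first member, so a node pointer casts back to req */
	unsigned flags;
};

static void queue_init(struct node *head)
{
	head->prev = head->next = head;
}

static void queue_add_tail(struct node *head, struct req *rq)
{
	assert(head && rq);
	rq->link.prev = head->prev;
	rq->link.next = head;
	head->prev->next = &rq->link;
	head->prev = &rq->link;
}

/*
 * Walk from the tail towards the head and count the requests that a
 * merge scan would be allowed to look at.
 */
static int scan_candidates(struct node *head)
{
	struct node *stop = head;
	struct node *entry = head;
	int seen = 0;

	/* If the first request is already started, stop one node early. */
	if (head->next != head &&
	    (((struct req *)head->next)->flags & RQ_STARTED))
		stop = head->next;

	while ((entry = entry->prev) != stop) {
		struct req *rq = (struct req *)entry;

		/* Never look past a barrier or a request the driver owns. */
		if (rq->flags & (RQ_BARRIER | RQ_STARTED))
			break;
		seen++;
	}
	return seen;
}

/* Queue of three requests: started head, a middle one, a plain tail. */
int demo_scan(unsigned middle_flags)
{
	struct node head;
	struct req first = { .flags = RQ_STARTED };
	struct req middle = { .flags = middle_flags };
	struct req last = { .flags = 0 };

	queue_init(&head);
	queue_add_tail(&head, &first);
	queue_add_tail(&head, &middle);
	queue_add_tail(&head, &last);
	return scan_candidates(&head);
}
```

In this sketch demo_scan(0) looks at two requests and never touches the
started one at the head, while demo_scan(RQ_BARRIER) stops after one
because the barrier ends the walk.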

-- 
Jens Axboe



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-29 17:33                     ` Jens Axboe
@ 2001-12-29 17:48                       ` Jens Axboe
  2001-12-29 19:43                         ` rwhron
  0 siblings, 1 reply; 21+ messages in thread
From: Jens Axboe @ 2001-12-29 17:48 UTC (permalink / raw)
  To: rwhron; +Cc: viro, linux-kernel

On Sat, Dec 29 2001, Jens Axboe wrote:
> On Sat, Dec 29 2001, rwhron@earthlink.net wrote:
> > > > Kernel panic: Out of memory and no killable processes...
> > > 
> > > Someone else did report a similar case. Very strange, doesn't look bio
> > 
> > Al Viro posted a fix:
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=100959128922157&w=2
> > 
> > I used Al's patch and 2.5.2-pre3 boots with reiserfs root_fs
> > and no panic.
> > 
> > Below is the trace on 2.5.2-pre3 after dbench 32 livelocked.
> 
> Thanks, could you try with this patch? It's not a fix (haven't found the
> bug yet), but I think we are looking at list corruption so please check
> if this patch at least alters when it hangs etc.

Ah, I think I got it -- it appears to be down to not rechecking for an
empty queue after a potential queue_lock drop (busy I/O, no requests left,
so get_request returns NULL; we drop the lock and run get_request_wait).
This explains the get_request_wait deadlock, compiling right now...

--- /opt/kernel/linux-2.5.2-pre3/drivers/block/ll_rw_blk.c	Sat Dec 29 12:17:53 2001
+++ drivers/block/ll_rw_blk.c	Sat Dec 29 12:45:04 2001
@@ -881,7 +881,9 @@
 
 	BUG_ON(rw != READ && rw != WRITE);
 
+	spin_lock_irq(q->queue_lock);
 	rq = get_request(q, rw);
+	spin_unlock_irq(q->queue_lock);
 
 	if (!rq && (gfp_mask & __GFP_WAIT))
 		rq = get_request_wait(q, rw);
@@ -1081,7 +1083,7 @@
 {
 	struct request *req, *freereq = NULL;
 	int el_ret, latency = 0, rw, nr_sectors, cur_nr_sectors, barrier;
-	struct list_head *insert_here = &q->queue_head;
+	struct list_head *insert_here;
 	elevator_t *elevator = &q->elevator;
 	sector_t sector;
 
@@ -1103,15 +1105,14 @@
 	barrier = test_bit(BIO_RW_BARRIER, &bio->bi_rw);
 
 	spin_lock_irq(q->queue_lock);
+again:
+	req = NULL;
+	insert_here = q->queue_head.prev;
 
 	if (blk_queue_empty(q) || barrier) {
 		blk_plug_device(q);
 		goto get_rq;
 	}
-
-again:
-	req = NULL;
-	insert_here = q->queue_head.prev;
 
 	el_ret = elevator->elevator_merge_fn(q, &req, bio);
 	switch (el_ret) {
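
One reading of the explanation above, as a single-threaded userspace
sketch: any state that was checked under the lock has to be checked
again once the lock has been dropped and retaken. Everything here is
hypothetical (struct queue, wait_for_free_request(), demo_plugged());
the "lock" is just a flag so the drop is visible, and the sleep is
simulated by the device draining the queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy queue; field names are assumptions made for this sketch. */
struct queue {
	bool locked;
	int nr_queued;		/* requests waiting for the device */
	int free_requests;	/* request structs available to allocate */
	bool plugged;		/* set when we queue onto an empty queue */
};

static void q_lock(struct queue *q)   { assert(!q->locked); q->locked = true; }
static void q_unlock(struct queue *q) { assert(q->locked);  q->locked = false; }

/*
 * Stand-in for sleeping until a request is free. The lock is dropped,
 * and while we "sleep" the device finishes everything that was queued.
 * The caller comes back holding the lock and one request.
 */
static void wait_for_free_request(struct queue *q)
{
	q_unlock(q);
	q->nr_queued = 0;
	q_lock(q);
}

/* recheck selects where we resume after the wait. */
static void submit(struct queue *q, bool recheck)
{
	bool have_request = false;

	q_lock(q);
recheck_empty:
	if (q->nr_queued == 0) {
		q->plugged = true;
		goto get_rq;
	}
try_merge:
	/* A merge attempt against the queued requests would go here. */
get_rq:
	if (!have_request) {
		if (q->free_requests == 0) {
			wait_for_free_request(q);
			have_request = true;
			if (recheck)
				goto recheck_empty;	/* state may have changed */
			goto try_merge;			/* stale: trusts the old check */
		}
		q->free_requests--;
	}
	q->nr_queued++;
	q_unlock(q);
}

/* Start with a busy queue and no free requests, then submit once. */
bool demo_plugged(bool recheck)
{
	struct queue q = { false, 3, 0, false };

	submit(&q, recheck);
	assert(q.nr_queued == 1);
	return q.plugged;
}
```

In this model demo_plugged(false) leaves one request on an unplugged
queue, so nothing would ever start it, while demo_plugged(true) sees
that the queue went empty and plugs it. Moving the again: label above
the blk_queue_empty() check in the patch corresponds to the recheck
path here.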

-- 
Jens Axboe



* Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state
  2001-12-29 17:48                       ` Jens Axboe
@ 2001-12-29 19:43                         ` rwhron
  0 siblings, 0 replies; 21+ messages in thread
From: rwhron @ 2001-12-29 19:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: rwhron, viro, linux-kernel

On Sat, Dec 29, 2001 at 06:48:37PM +0100, Jens Axboe wrote:
> Ah, I think I got it -- it appears to be down to not rechecking for an
> empty queue after a potential queue_lock drop (busy I/O, no requests left,
> so get_request returns NULL; we drop the lock and run get_request_wait).
> This explains the get_request_wait deadlock, compiling right now...
> 
> -- 
> Jens Axboe

Two thumbs up!!  With your ll_rw_blk.c and elevator.c patches,
2.5.2-pre3 completes both dbench 32 and dbench 128.

I'm running a more complete battery of tests and will let you know
if there are any unusual results.

Thanks!

-- 
Randy Hron



end of thread, other threads:[~2001-12-29 19:40 UTC | newest]

Thread overview: 21+ messages
2001-12-21 14:11 2.5.2-pre1 dbench 32 hangs in vmstat "b" state rwhron
2001-12-21 14:46 ` Jens Axboe
2001-12-21 16:43   ` rwhron
2001-12-21 17:01     ` Jens Axboe
2001-12-21 18:47       ` rwhron
2001-12-21 22:19         ` Jens Axboe
2001-12-21 23:55   ` rwhron
2001-12-24 14:03     ` Jens Axboe
2001-12-24 16:59       ` rwhron
2001-12-24 17:02         ` Jens Axboe
2001-12-24 22:14           ` rwhron
2001-12-27 19:07           ` rwhron
2001-12-28 11:40             ` Jens Axboe
2001-12-28 14:14               ` rwhron
2001-12-28 14:30                 ` Jens Axboe
2001-12-28 17:49                   ` rwhron
2001-12-28 19:29                   ` rwhron
2001-12-29  6:42                   ` rwhron
2001-12-29 17:33                     ` Jens Axboe
2001-12-29 17:48                       ` Jens Axboe
2001-12-29 19:43                         ` rwhron
