public inbox for linux-kernel@vger.kernel.org
* Kernel 2.6.6 & 2.6.7 sometime hang after much I/O
@ 2004-06-20  9:41 Matthias Schniedermeyer
  2004-06-20 10:29 ` Nick Piggin
  0 siblings, 1 reply; 8+ messages in thread
From: Matthias Schniedermeyer @ 2004-06-20  9:41 UTC
  To: linux-kernel

Hi



First: kernels <= 2.6.5 don't have this problem. After 2.6.6 showed this
behaviour a few times, I downgraded to 2.6.5, expecting it to be fixed
in 2.6.7, but 2.6.7 shows the same behaviour.

The I/O I do is splitting some large files (>2GB) into smaller files (<= 2GB).
Sometimes the process that does this just hangs (I have such a hanging
process right now); top currently shows up to 90% I/O wait.

SOME of my "konsole"s (xterm) then hang too, but others don't (like the
one where I am typing this email). Starting new "konsole"s sometimes
works, sometimes not.
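
While the machine is in this state, it can help to list which processes are stuck in uninterruptible sleep (state D), since those are the ones blocked on I/O. A minimal sketch, assuming a ps that accepts "-eo pid,stat,comm"; the function name filter_d_state is made up for illustration:

```shell
# filter_d_state: reads "PID STAT COMMAND" lines on stdin and prints the
# PID and command of every process whose state starts with D
# (uninterruptible sleep, usually waiting for I/O).
filter_d_state() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# Typical use while the hang is in progress:
#   ps -eo pid,stat,comm | filter_d_state
```

A list that keeps growing over time would match the "more and more programs freeze" symptom.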

System is:
Distribution: Debian SID.
2x P3 933 MHz, 3 GB RAM, ServerWorks HE-SL chipset
"System" HDD is SCSI, connected via Symbios 53c1010 (Dual U160)
"Data" HDD(s) (where the split process does its work) are connected to a
Highpoint RocketRAID 1540 (HPT-374 chipset)
Filesystem is XFS for the data HDD(s) and Reiserfs for the system HDD.

If other info is needed I will provide it.





See you

-- 
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.




* Re: Kernel 2.6.6 & 2.6.7 sometime hang after much I/O
  2004-06-20  9:41 Kernel 2.6.6 & 2.6.7 sometime hang after much I/O Matthias Schniedermeyer
@ 2004-06-20 10:29 ` Nick Piggin
  2004-06-20 11:59   ` Matthias Schniedermeyer
  0 siblings, 1 reply; 8+ messages in thread
From: Nick Piggin @ 2004-06-20 10:29 UTC
  To: Matthias Schniedermeyer; +Cc: linux-kernel

Matthias Schniedermeyer wrote:
> Hi
> 
> 
> 
> First. Kernels <= 2.6.5 don't have this problem. After 2.6.6 show this
> behaviour sometimes i downgraded to 2.6.5 as i thought that it would be
> fixed in 2.6.7, but 2.6.7 also show this behaviour.
> 
> The I/O i do is split some large files (>2GB) into smaller files <= 2GB.
> Sometimes the process that does this just hangs (currently i have such a
> hangung process), top currently shows up to 90% I/O-Wait.
> 
> SOME of my "konsole"s(xterm) hang then too, but others don't (like this
> where i type this email) starting new "konsole"s sometimes work, sometimes
> not.
> 
> System is:
> Distribution: Debian SID.
> 2xP3-933Mhz, 3GB-RAM, Serverworks HE-SL-Chipset
> "System"-HDD is SCSI connected via Symbios-53c1010 (Dual U160)
> "Data"-HDD(s)(where the split-process does it's work) is connected to a
> Highpoint RocketRAID 1540 (HPT-374 Chipset)
> Filesystem is XFS for the Data-HDD(s) and Reiserfs for the system-HDD.
> 
> If other info is needed i will provide them.
> 

When the process has hung, press Alt + SysRq + T to get a task
trace. Run

	dmesg -s 1000000 > tmp

and send us tmp. You'd better send your .config and dmesg too.

Thanks
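
Where the keyboard combination is inconvenient (for example on a remote login), the same task dump can usually be requested by writing "t" to /proc/sysrq-trigger. The following is only a sketch: it assumes a kernel built with Magic SysRq support that provides /proc/sysrq-trigger, it must run as root, and the function name capture_task_trace and its arguments are made up for illustration.

```shell
# capture_task_trace OUT [TRIGGER]
#   OUT      file that receives the kernel log, including the task dump
#   TRIGGER  SysRq trigger file (assumed default: /proc/sysrq-trigger)
capture_task_trace() {
    out=$1
    trigger=${2:-/proc/sysrq-trigger}

    # Writing "t" asks the kernel for the same dump as Alt + SysRq + T.
    echo t > "$trigger" || return 1

    # Same dmesg invocation as in the mail above, with a large buffer so
    # the dump is not truncated.
    dmesg -s 1000000 > "$out"
}

# Usage (as root, while the process is hung):
#   capture_task_trace tmp
```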


* Re: Kernel 2.6.6 & 2.6.7 sometime hang after much I/O
  2004-06-20 10:29 ` Nick Piggin
@ 2004-06-20 11:59   ` Matthias Schniedermeyer
  2004-06-20 13:05     ` Nick Piggin
  0 siblings, 1 reply; 8+ messages in thread
From: Matthias Schniedermeyer @ 2004-06-20 11:59 UTC
  To: Nick Piggin; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2173 bytes --]

On Sun, Jun 20, 2004 at 08:29:20PM +1000, Nick Piggin wrote:
> Matthias Schniedermeyer wrote:
> >
> >
> >First. Kernels <= 2.6.5 don't have this problem. After 2.6.6 show this
> >behaviour sometimes i downgraded to 2.6.5 as i thought that it would be
> >fixed in 2.6.7, but 2.6.7 also show this behaviour.
> >
> >The I/O i do is split some large files (>2GB) into smaller files <= 2GB.
> >Sometimes the process that does this just hangs (currently i have such a
> >hangung process), top currently shows up to 90% I/O-Wait.
> >
> >SOME of my "konsole"s(xterm) hang then too, but others don't (like this
> >where i type this email) starting new "konsole"s sometimes work, sometimes
> >not.
> >
> >System is:
> >Distribution: Debian SID.
> >2xP3-933Mhz, 3GB-RAM, Serverworks HE-SL-Chipset
> >"System"-HDD is SCSI connected via Symbios-53c1010 (Dual U160)
> >"Data"-HDD(s)(where the split-process does it's work) is connected to a
> >Highpoint RocketRAID 1540 (HPT-374 Chipset)
> >Filesystem is XFS for the Data-HDD(s) and Reiserfs for the system-HDD.
> >
> >If other info is needed i will provide them.
> >
> 
> When the process has hung, press Alt + SysRq + T to get a task
> trace. Run
> 
> 	dmesg -s 1000000 > tmp
> 
> and send us tmp. You'd better send your .config and dmesg too.

Here we go.

Addendum: After some time more and more konsoles froze, up to the point
where I had to kill X (CTRL-ALT-Backspace). When I couldn't even log in
at the console anymore, I rebooted (into 2.6.5). Then I recompiled 2.6.7
with SysRq support and tried to reproduce the hang without X. After 3
runs I "gave up" and started X. There I was lucky and the process
('cut-movie.pl') froze at the first try. Then I killed X and did the
above on the console.

As the system is currently unusable enough to warrant a reboot, I will
reboot into 2.6.5 after this mail, but I can always boot 2.6.7 again if
you need more input.




See you



[-- Attachment #2: config.gz --]
[-- Type: application/x-gunzip, Size: 6192 bytes --]

[-- Attachment #3: dmesg.gz --]
[-- Type: application/x-gunzip, Size: 6038 bytes --]


* Re: Kernel 2.6.6 & 2.6.7 sometime hang after much I/O
  2004-06-20 11:59   ` Matthias Schniedermeyer
@ 2004-06-20 13:05     ` Nick Piggin
  2004-06-20 14:17       ` Matthias Schniedermeyer
  0 siblings, 1 reply; 8+ messages in thread
From: Nick Piggin @ 2004-06-20 13:05 UTC
  To: Matthias Schniedermeyer; +Cc: linux-kernel, Jens Axboe

Matthias Schniedermeyer wrote:

> Here we go.
> 
> Addendum: After some time more and more konsole froze. Up to the point
> where i (had to) kill(ed) X(CTRL-ALT-Backspace) and after i couldn't
> even log in at the console anymore i rebooted (into 2.6.5). Then i
> recompiled 2.6.7 with SYSRQ-support and tried to reproduce the hanging
> without X. After 3 runs i "gave up" and started X. Here i had luck and
> the process ('cut-movie.pl') froze at first try. Then i killed X and did
> the above on the console.
> 
> As the system is currently unsuable enough to reboot, i will reboot in
> 2.6.5 after this mail, but i can always reboot into 2.6.7 if you need
> more input.
> 
> 

The attached trace was with 2.6.7, right? Can you reproduce the hang,
then, as root, do:

	echo 1024 > /sys/block/sda/queue/nr_requests

Replace sda with whatever devices your hung processes were
doing IO to. Do things start up again?
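
To make it harder to hit the wrong device, the write can be wrapped so that it reports the old and new queue depth. A minimal sketch, not a recommended setting: the sysfs path is the one given above, while the function name set_nr_requests and the optional third argument (an alternative sysfs root, only there so the sketch can be tried on a scratch directory) are assumptions.

```shell
# set_nr_requests DEV VALUE [SYSFS_ROOT]
set_nr_requests() {
    dev=$1
    value=$2
    root=${3:-/sys}    # overridable only so the sketch can be tried safely
    knob="$root/block/$dev/queue/nr_requests"

    if [ ! -w "$knob" ]; then
        echo "cannot write $knob (wrong device name, or not root?)" >&2
        return 1
    fi

    old=$(cat "$knob")
    echo "$value" > "$knob" || return 1
    # Read the value back so the change is confirmed.
    echo "$dev: nr_requests $old -> $(cat "$knob")"
}

# Usage (as root):
#   set_nr_requests sda 1024
```

Printing the previous value also records what to restore afterwards.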


Interesting parts of dmesg.gz...

syslogd       D C2828BE0     0  1432      1          1435  1134 (NOTLB)
f7817ce4 00000086 f7242290 c2828be0 00000000 00000000 f78584c0 c16f1780
        c16f2d40 c16f1760 c04fca38 00000000 c0123dc0 f7817cf8 e2083440 c2828be0
        00000cbb ffe58e20 000000a2 f7242440 00063f67 f7817cf8 f7817d54 f7817d28
Call Trace:
  [<c0123dc0>] del_timer_sync+0x40/0x150
  [<c03d756c>] schedule_timeout+0x6c/0xc0
  [<c01247f0>] process_timeout+0x0/0x10
  [<c0119f80>] autoremove_wake_function+0x0/0x60
  [<c03d745b>] io_schedule_timeout+0x2b/0xd0
  [<c029b00f>] blk_congestion_wait+0x7f/0xa0
  [<c0119f80>] autoremove_wake_function+0x0/0x60
  [<c0119f80>] autoremove_wake_function+0x0/0x60
  [<c013a933>] get_dirty_limits+0x13/0xd0
  [<c013aada>] balance_dirty_pages+0xea/0x150
  [<c0199182>] reiserfs_file_write+0x652/0x7e0
  [<c03d6d9e>] schedule+0x2ee/0x5f0
  [<c036eb9c>] sockfd_lookup+0x1c/0x80
  [<c03704e2>] sys_recvfrom+0x102/0x120
  [<c03755ab>] datagram_poll+0x2b/0xca
  [<c0163e14>] poll_freewait+0x44/0x50
  [<c026d9f3>] copy_from_user+0x53/0x80
  [<c0150ffa>] do_readv_writev+0x19a/0x280
  [<c0198b30>] reiserfs_file_write+0x0/0x7e0
  [<c01511a8>] vfs_writev+0x58/0x70
  [<c0151272>] sys_writev+0x42/0x70
  [<c0105edf>] syscall_call+0x7/0xb

...

tee           D C2828BE0     0  2174   2172                     (NOTLB)
e6567d4c 00000086 00000000 c2828be0 c2829540 c04fca38 f727c2c0 c15c007b
        c16b007b ffffff00 c04fca38 00000000 c0123dc0 e6567d60 00000000 c2828be0
        000001b4 ffc6f6e4 000000a2 f725fa60 00063f65 e6567d60 e6567dbc e6567d90
Call Trace:
  [<c0123dc0>] del_timer_sync+0x40/0x150
  [<c03d756c>] schedule_timeout+0x6c/0xc0
  [<c01247f0>] process_timeout+0x0/0x10
  [<c0119f80>] autoremove_wake_function+0x0/0x60
  [<c03d745b>] io_schedule_timeout+0x2b/0xd0
  [<c029b00f>] blk_congestion_wait+0x7f/0xa0
  [<c0119f80>] autoremove_wake_function+0x0/0x60
  [<c0119f80>] autoremove_wake_function+0x0/0x60
  [<c013a933>] get_dirty_limits+0x13/0xd0
  [<c013aada>] balance_dirty_pages+0xea/0x150
  [<c0199182>] reiserfs_file_write+0x652/0x7e0
  [<c015d463>] pipe_wait+0xa3/0xc0
  [<c016b409>] update_atime+0xd9/0xe0
  [<c015d6d3>] pipe_readv+0x253/0x2d0
  [<c015d788>] pipe_read+0x38/0x40
  [<c0150bc8>] vfs_write+0xb8/0x130
  [<c0150cf2>] sys_write+0x42/0x70
  [<c0105edf>] syscall_call+0x7/0xb

login         D C2828BE0     0  2175      1          2238  2172 (NOTLB)
e57a5d4c 00000082 f7217360 c2828be0 c2829540 00000000 f738f980 c16f2b60
        c1054d00 c2830be0 c04fca38 00000000 c0123dc0 e57a5d60 f7242290 c2828be0
        00000ee1 ffe58165 000000a2 f7217510 00063f67 e57a5d60 e57a5dbc e57a5d90
Call Trace:
  [<c0123dc0>] del_timer_sync+0x40/0x150
  [<c03d756c>] schedule_timeout+0x6c/0xc0
  [<c01247f0>] process_timeout+0x0/0x10
  [<c0119f80>] autoremove_wake_function+0x0/0x60
  [<c03d745b>] io_schedule_timeout+0x2b/0xd0
  [<c029b00f>] blk_congestion_wait+0x7f/0xa0
  [<c0119f80>] autoremove_wake_function+0x0/0x60
  [<c0119f80>] autoremove_wake_function+0x0/0x60
  [<c013a933>] get_dirty_limits+0x13/0xd0
  [<c013aada>] balance_dirty_pages+0xea/0x150
  [<c0199182>] reiserfs_file_write+0x652/0x7e0
  [<c01457b0>] handle_mm_fault+0xf0/0x160
  [<c0114c1c>] do_page_fault+0x13c/0x561
  [<c0150bc8>] vfs_write+0xb8/0x130
  [<c0150cf2>] sys_write+0x42/0x70
  [<c0105edf>] syscall_call+0x7/0xb

kdeinit       D C2828BE0     0  2238      1          3266  2175 (NOTLB)
dd999d4c 00200086 00000000 c2828be0 c2829540 00000000 f7858900 c16f1760
        c16f70e0 c15cf260 c04fca38 00000000 c0123dc0 dd999d60 00000000 c2828be0
        000002ec ffa884e8 000000a2 c2b57330 00063f63 dd999d60 dd999dbc dd999d90
Call Trace:
  [<c0123dc0>] del_timer_sync+0x40/0x150
  [<c03d756c>] schedule_timeout+0x6c/0xc0
  [<c01247f0>] process_timeout+0x0/0x10
  [<c0119f80>] autoremove_wake_function+0x0/0x60
  [<c03d745b>] io_schedule_timeout+0x2b/0xd0
  [<c029b00f>] blk_congestion_wait+0x7f/0xa0
  [<c0119f80>] autoremove_wake_function+0x0/0x60
  [<c0119f80>] autoremove_wake_function+0x0/0x60
  [<c013a933>] get_dirty_limits+0x13/0xd0
  [<c013aada>] balance_dirty_pages+0xea/0x150
  [<c0199182>] reiserfs_file_write+0x652/0x7e0
  [<c0145234>] do_anonymous_page+0x154/0x1a0
  [<c01457b0>] handle_mm_fault+0xf0/0x160
  [<c0114c1c>] do_page_fault+0x13c/0x561
  [<c0147297>] do_mmap_pgoff+0x3c7/0x6d0
  [<c0150bc8>] vfs_write+0xb8/0x130
  [<c0150cf2>] sys_write+0x42/0x70
  [<c0105edf>] syscall_call+0x7/0xb

cut-movie.pl  D C2828BE0     0  3266      1                2238 (NOTLB)
ea2c9c20 00200086 00000000 c2828be0 c2829540 c043f42c f02ba6c0 00000000
        c0108289 00000000 c04fca38 00000000 c0123dc0 ea2c9c34 00000000 c2828be0
        00000a95 ffe598b5 000000a2 e20835f0 00063f67 ea2c9c34 ea2c9c90 ea2c9c64
Call Trace:
  [<c0108289>] handle_IRQ_event+0x49/0x80
  [<c0123dc0>] del_timer_sync+0x40/0x150
  [<c03d756c>] schedule_timeout+0x6c/0xc0
  [<c01247f0>] process_timeout+0x0/0x10
  [<c0119f80>] autoremove_wake_function+0x0/0x60
  [<c03d745b>] io_schedule_timeout+0x2b/0xd0
  [<c029b00f>] blk_congestion_wait+0x7f/0xa0
  [<c0119f80>] autoremove_wake_function+0x0/0x60
  [<c0119f80>] autoremove_wake_function+0x0/0x60
  [<c013a933>] get_dirty_limits+0x13/0xd0
  [<c013aada>] balance_dirty_pages+0xea/0x150
  [<c0154d9d>] generic_commit_write+0x7d/0xa0
  [<c01374d1>] generic_file_aio_write_nolock+0x4d1/0xb90
  [<c023a194>] xfs_log_move_tail+0x24/0x180
  [<c024a416>] xfs_trans_unlocked_item+0x56/0x60
  [<c0260067>] xfs_write+0x287/0x8a0
  [<c010c599>] timer_interrupt+0x59/0x120
  [<c025b6fe>] linvfs_write+0xbe/0x130
  [<c0150ad9>] do_sync_write+0x89/0xc0
  [<c01246e0>] do_timer+0xc0/0xd0
  [<c0117efd>] scheduler_tick+0x11d/0x4c0
  [<c0120075>] __do_softirq+0xb5/0xc0
  [<c0117efd>] scheduler_tick+0x11d/0x4c0
  [<c0150bc8>] vfs_write+0xb8/0x130
  [<c0120075>] __do_softirq+0xb5/0xc0
  [<c0150cf2>] sys_write+0x42/0x70
  [<c0105edf>] syscall_call+0x7/0xb
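
All five tasks quoted above are in state D and share the same part of the stack: balance_dirty_pages calling blk_congestion_wait. For a full SysRq-T dump, a small filter can list every task in that situation. This is a sketch under assumptions: it relies on the dump layout shown above (task name in the first column, one-letter state in the second, a hex value in the third) and on task names without spaces; the function name congested_tasks is made up.

```shell
# congested_tasks FILE...
# Prints the name of every task that is in state D and has
# blk_congestion_wait somewhere in its call trace.
congested_tasks() {
    awk '
        # Task header line, e.g. "syslogd  D C2828BE0  0  1432 ..."
        $2 ~ /^[A-Z]$/ && $3 ~ /^[0-9A-Fa-f]+$/ {
            task = $1; state = $2; seen = 0
            next
        }
        # First matching frame of a D-state task: report the task once.
        /blk_congestion_wait/ && state == "D" && !seen {
            print task
            seen = 1
        }
    ' "$@"
}

# Usage, with "tmp" being the file produced by dmesg -s 1000000 > tmp:
#   congested_tasks tmp
```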



* Re: Kernel 2.6.6 & 2.6.7 sometime hang after much I/O
  2004-06-20 13:05     ` Nick Piggin
@ 2004-06-20 14:17       ` Matthias Schniedermeyer
  2004-06-20 14:19         ` Jens Axboe
  2004-06-20 14:38         ` Matthias Schniedermeyer
  0 siblings, 2 replies; 8+ messages in thread
From: Matthias Schniedermeyer @ 2004-06-20 14:17 UTC
  To: Nick Piggin; +Cc: linux-kernel, Jens Axboe

On Sun, Jun 20, 2004 at 11:05:23PM +1000, Nick Piggin wrote:
> Matthias Schniedermeyer wrote:
> 
> >Here we go.
> >
> >Addendum: After some time more and more konsole froze. Up to the point
> >where i (had to) kill(ed) X(CTRL-ALT-Backspace) and after i couldn't
> >even log in at the console anymore i rebooted (into 2.6.5). Then i
> >recompiled 2.6.7 with SYSRQ-support and tried to reproduce the hanging
> >without X. After 3 runs i "gave up" and started X. Here i had luck and
> >the process ('cut-movie.pl') froze at first try. Then i killed X and did
> >the above on the console.
> >
> >As the system is currently unsuable enough to reboot, i will reboot in
> >2.6.5 after this mail, but i can always reboot into 2.6.7 if you need
> >more input.
> >
> >
> 
> The attached trace was with 2.6.7, right?

Yes.

> Can you reproduce the hang, then, as root, do:
> 
> 	echo 1024 > /sys/block/sda/queue/nr_requests
> 
> Replace sda with whatever devices your hung processes were
> doing IO to. Do things start up again?

1 try (with X) with unchanged nr_requests. (I was stupid enough to issue the
command on the wrong HDD :-) )
(AFAIR I had the same situation with 2.6.6: sometimes the hang didn't happen.)

6 tries (with X) with nr_requests=1024 and no hang.

1 try with nr_requests back at 128, and now it hangs.
Changing to nr_requests=1024 now doesn't seem to change anything; my
konsoles start to freeze.


I don't know if it is relevant, but the bytes transferred are always roughly
3000-3400 MB (1500-1700 MB read & 1500-1700 MB written). The program
reads 100 MB, then writes 100 MB, then issues "sync"; the hangs always
happened after about 15-17 "rounds".
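
The numbers are consistent with each other: 15-17 rounds of 100 MB read plus 100 MB written give 3000-3400 MB in total. For anyone who wants to generate a similar load without the original Perl script, the pattern can be approximated with dd. This is only a sketch and not cut-movie.pl itself: the function name split_rounds, its arguments and the part.N output names are made up, and dd interleaves reads and writes per block instead of reading a whole chunk first.

```shell
# split_rounds SRC OUTDIR [CHUNK_MB] [ROUNDS]
# Copies ROUNDS consecutive chunks of CHUNK_MB megabytes from SRC into
# OUTDIR/part.0, part.1, ... and calls sync after every chunk.
split_rounds() {
    src=$1
    outdir=$2
    chunk_mb=${3:-100}    # 100 MB per round, as described above
    rounds=${4:-17}       # the hangs were seen after 15-17 rounds

    i=0
    while [ "$i" -lt "$rounds" ]; do
        dd if="$src" of="$outdir/part.$i" bs=1048576 \
           skip=$((i * chunk_mb)) count="$chunk_mb" 2>/dev/null || return 1
        sync
        i=$((i + 1))
    done
}

# Usage, with a hypothetical source file and target directory:
#   split_rounds /data/big-input.bin /data/out
```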





See you




* Re: Kernel 2.6.6 & 2.6.7 sometime hang after much I/O
  2004-06-20 14:17       ` Matthias Schniedermeyer
@ 2004-06-20 14:19         ` Jens Axboe
  2004-06-20 14:43           ` Matthias Schniedermeyer
  2004-06-20 14:38         ` Matthias Schniedermeyer
  1 sibling, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2004-06-20 14:19 UTC
  To: Matthias Schniedermeyer; +Cc: Nick Piggin, linux-kernel

On Sun, Jun 20 2004, Matthias Schniedermeyer wrote:
> On Sun, Jun 20, 2004 at 11:05:23PM +1000, Nick Piggin wrote:
> > Matthias Schniedermeyer wrote:
> > 
> > >Here we go.
> > >
> > >Addendum: After some time more and more konsole froze. Up to the point
> > >where i (had to) kill(ed) X(CTRL-ALT-Backspace) and after i couldn't
> > >even log in at the console anymore i rebooted (into 2.6.5). Then i
> > >recompiled 2.6.7 with SYSRQ-support and tried to reproduce the hanging
> > >without X. After 3 runs i "gave up" and started X. Here i had luck and
> > >the process ('cut-movie.pl') froze at first try. Then i killed X and did
> > >the above on the console.
> > >
> > >As the system is currently unsuable enough to reboot, i will reboot in
> > >2.6.5 after this mail, but i can always reboot into 2.6.7 if you need
> > >more input.
> > >
> > >
> > 
> > The attached trace was with 2.6.7, right?
> 
> Yes.
> 
> > Can you reproduce the hang, then, as root, do:
> > 
> > 	echo 1024 > /sys/block/sda/queue/nr_requests
> > 
> > Replace sda with whatever devices your hung processes were
> > doing IO to. Do things start up again?
> 
> 1 try (with X) with unchanged nr_requests. (I was stupid enough to issues the
> command on the wrong HDD :-) )
> (AFAIR i had the same situation with 2.6.6, sometimes the hang didn't happen)
> 
> 6 tries (with X) with nr_requests=1024 and no hang.
> 
> 1 try with nr_requests back to 128 and now it hangs.
> now changing to nr_request=1024 doesn't seem to change anyting, my
> konsoles start to freeze.
> 
> 
> Don't know if it is relevant but the bytes transfered are always rougly
> around 3000-3400MB (1500-1700 MB read & 1500-1700 MB write. The program
> reads 100MB, then writes 100MB, then issues "sync", the hangs happend
> always about every after 15-17 "rounds")

(missed the initial report) - what io hardware are you using?

-- 
Jens Axboe



* Re: Kernel 2.6.6 & 2.6.7 sometime hang after much I/O
  2004-06-20 14:17       ` Matthias Schniedermeyer
  2004-06-20 14:19         ` Jens Axboe
@ 2004-06-20 14:38         ` Matthias Schniedermeyer
  1 sibling, 0 replies; 8+ messages in thread
From: Matthias Schniedermeyer @ 2004-06-20 14:38 UTC
  To: Nick Piggin; +Cc: linux-kernel, Jens Axboe

On Sun, Jun 20, 2004 at 04:17:34PM +0200, Matthias Schniedermeyer wrote:
> On Sun, Jun 20, 2004 at 11:05:23PM +1000, Nick Piggin wrote:
> > Matthias Schniedermeyer wrote:
> > 
> > >Here we go.
> > >
> > >Addendum: After some time more and more konsole froze. Up to the point
> > >where i (had to) kill(ed) X(CTRL-ALT-Backspace) and after i couldn't
> > >even log in at the console anymore i rebooted (into 2.6.5). Then i
> > >recompiled 2.6.7 with SYSRQ-support and tried to reproduce the hanging
> > >without X. After 3 runs i "gave up" and started X. Here i had luck and
> > >the process ('cut-movie.pl') froze at first try. Then i killed X and did
> > >the above on the console.
> > >
> > >As the system is currently unsuable enough to reboot, i will reboot in
> > >2.6.5 after this mail, but i can always reboot into 2.6.7 if you need
> > >more input.
> > >
> > >
> > 
> > The attached trace was with 2.6.7, right?
> 
> Yes.
> 
> > Can you reproduce the hang, then, as root, do:
> > 
> > 	echo 1024 > /sys/block/sda/queue/nr_requests
> > 
> > Replace sda with whatever devices your hung processes were
> > doing IO to. Do things start up again?
> 
> 1 try (with X) with unchanged nr_requests. (I was stupid enough to issues the
> command on the wrong HDD :-) )
> (AFAIR i had the same situation with 2.6.6, sometimes the hang didn't happen)
> 
> 6 tries (with X) with nr_requests=1024 and no hang.
> 
> 1 try with nr_requests back to 128 and now it hangs.
> now changing to nr_request=1024 doesn't seem to change anyting, my
> konsoles start to freeze.
> 
> 
> Don't know if it is relevant but the bytes transfered are always rougly
> around 3000-3400MB (1500-1700 MB read & 1500-1700 MB write. The program
> reads 100MB, then writes 100MB, then issues "sync", the hangs happend
> always about every after 15-17 "rounds")

After a fresh boot I made another attempt with nr_requests=1024.
This time it froze on the third try, in the same round as the previous
hang (the 17th round).





See you




* Re: Kernel 2.6.6 & 2.6.7 sometime hang after much I/O
  2004-06-20 14:19         ` Jens Axboe
@ 2004-06-20 14:43           ` Matthias Schniedermeyer
  0 siblings, 0 replies; 8+ messages in thread
From: Matthias Schniedermeyer @ 2004-06-20 14:43 UTC
  To: Jens Axboe; +Cc: Nick Piggin, linux-kernel

On Sun, Jun 20, 2004 at 04:19:39PM +0200, Jens Axboe wrote:
> On Sun, Jun 20 2004, Matthias Schniedermeyer wrote:
> > On Sun, Jun 20, 2004 at 11:05:23PM +1000, Nick Piggin wrote:
> > > Matthias Schniedermeyer wrote:
> > > 
> > > >Here we go.
> > > >
> > > >Addendum: After some time more and more konsole froze. Up to the point
> > > >where i (had to) kill(ed) X(CTRL-ALT-Backspace) and after i couldn't
> > > >even log in at the console anymore i rebooted (into 2.6.5). Then i
> > > >recompiled 2.6.7 with SYSRQ-support and tried to reproduce the hanging
> > > >without X. After 3 runs i "gave up" and started X. Here i had luck and
> > > >the process ('cut-movie.pl') froze at first try. Then i killed X and did
> > > >the above on the console.
> > > >
> > > >As the system is currently unsuable enough to reboot, i will reboot in
> > > >2.6.5 after this mail, but i can always reboot into 2.6.7 if you need
> > > >more input.
> > > >
> > > >
> > > 
> > > The attached trace was with 2.6.7, right?
> > 
> > Yes.
> > 
> > > Can you reproduce the hang, then, as root, do:
> > > 
> > > 	echo 1024 > /sys/block/sda/queue/nr_requests
> > > 
> > > Replace sda with whatever devices your hung processes were
> > > doing IO to. Do things start up again?
> > 
> > 1 try (with X) with unchanged nr_requests. (I was stupid enough to issues the
> > command on the wrong HDD :-) )
> > (AFAIR i had the same situation with 2.6.6, sometimes the hang didn't happen)
> > 
> > 6 tries (with X) with nr_requests=1024 and no hang.
> > 
> > 1 try with nr_requests back to 128 and now it hangs.
> > now changing to nr_request=1024 doesn't seem to change anyting, my
> > konsoles start to freeze.
> > 
> > 
> > Don't know if it is relevant but the bytes transfered are always rougly
> > around 3000-3400MB (1500-1700 MB read & 1500-1700 MB write. The program
> > reads 100MB, then writes 100MB, then issues "sync", the hangs happend
> > always about every after 15-17 "rounds")
> 
> (missed the initial report) - what io hardware are you using?

The data HDD is connected via a Highpoint RocketRAID 1540 (HPT-374
chipset). The cabling goes through two S-ATA <-> P-ATA adapters: the
RocketRAID has its adapters onboard and the HDD has another one.

My system HDD is a SCSI one, connected via a Symbios 53c1010 (Dual U160).
As I can't even start new programs, and running programs freeze one
after the other although none of them does ANY I/O on the data HDD, I
would suspect the Symbios more than the Highpoint.



See you



