* RAID-1 and Reiser4 issue: umount hangs
@ 2006-05-27 22:14 Arend Freije
2006-05-27 22:46 ` Neil Brown
0 siblings, 1 reply; 15+ messages in thread
From: Arend Freije @ 2006-05-27 22:14 UTC (permalink / raw)
To: linux-kernel
Hi,
I'm using Reiser4 for my filesystems on disk (/dev/sda), and it works
just fine. Recently I bought a second disk (/dev/sdb) for RAID-1
mirroring. With mdadm I created a degraded raid-1 array on /dev/md/0,
devices missing,/dev/sdb1. After that I created a Reiser4 filesystem on
/dev/md/0 and mounted it at /mnt. Then I copied the data from /dev/sda1
to /mnt.
All goes well until I umount /mnt: umount simply hangs without any
error. Syslog doesn't report any error. In /proc/mdstat, the array
remains "active sync". Shutting down linux fails because the umount is
still waiting and seems to block other umounts. The umount process
cannot be killed, even by root. A hard reset is the only way to get
my system functioning normally, but without the raid-1 of course.
The problem seems to emerge only with the combination of RAID and
Reiser4. I've created an ext-2 filesystem on /dev/md/0, and after that
mount ; cp -ax ; umount works without a problem, and the hanging umount
re-appears when using Reiser4 again.
My questions:
- how can I find the cause of the hanging umount?
- how can I fix it?
A few details of my linux-box:
Gentoo Linux, 2.6.16 kernel with Reiser4-for-2.6.16-2.patch.gz
i686 AMD Athlon(tm) XP 2400+
DC4300 SATA-II controller (Silicon Image 3124, libata + sata_sil24 drivers)
2 x Samsung SP2504C hard disk
Thanx in advance,
Arend
* Re: RAID-1 and Reiser4 issue: umount hangs
2006-05-27 22:14 RAID-1 and Reiser4 issue: umount hangs Arend Freije
@ 2006-05-27 22:46 ` Neil Brown
2006-05-28 9:45 ` Arend Freije
0 siblings, 1 reply; 15+ messages in thread
From: Neil Brown @ 2006-05-27 22:46 UTC (permalink / raw)
To: Arend Freije; +Cc: linux-kernel
On Sunday May 28, afreije@inn.nl wrote:
>
> My questions:
> - how can I find the cause of the hanging umount?
# echo t > /proc/sysrq-trigger
and look at the resulting kernel messages, particularly for the
unmount process. If they don't make sense to you, post them.
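A minimal sketch of how that dump can be narrowed down, assuming the 2.6-era SysRq-T layout seen later in this thread (task name at the start of the line, then a one-letter state, then the PC or "running", free stack, PID and parent PID) and dmesg output without timestamp prefixes. `dstate_tasks` is a hypothetical helper name; it lists the tasks in uninterruptible sleep ("D"), which is where a hung umount shows up.

```shell
# Hypothetical helper: list tasks in uninterruptible sleep ("D") from a
# SysRq-T dump read on stdin. Assumes header lines of the form
#   umount        D C011B591     0  7588   7200   (NOTLB)
# i.e. name, state, PC (or "running"), free stack, PID, parent PID.
dstate_tasks() {
    # Skip call-trace lines ("[<addr>] ..."); hex stack words and the
    # "Call Trace:" line never have a lone "D" as their second field.
    awk '$1 !~ /^\[</ && $2 == "D" { print $1, "pid=" $5 }'
}
```

After `echo t > /proc/sysrq-trigger`, running `dmesg | dstate_tasks` on the umount dump posted below would print `umount pid=7588`.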
> - how can I fix it?
Understand the problems, change the code :-)
Probably we can manage it together...
NeilBrown
* Re: RAID-1 and Reiser4 issue: umount hangs
2006-05-27 22:46 ` Neil Brown
@ 2006-05-28 9:45 ` Arend Freije
2006-05-28 23:49 ` Neil Brown
0 siblings, 1 reply; 15+ messages in thread
From: Arend Freije @ 2006-05-28 9:45 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-kernel
Neil Brown wrote:
> # echo t > /proc/sysrq-trigger
>
> and look at the resulting kernel messages, particularly for the
> unmount process. If they don't make sense to you, post them.
>
>
Tnx for your response. After recompiling the kernel with magic_sysrq
enabled, rebooting, and repeating umount with the reiser4 md device, the
following trace contains a reiser4-entry:
> syslog-ng R running 0 7581 1 7197 (NOTLB)
> umount D C011B591 0 7588 7200 (NOTLB)
> f6643c98 00000000 c1808320 c011b591 f7db5ad0 f4d18c00 003d092a 00000000
> 00000000 f7db5ad0 c1808320 00000000 f4d18c00 003d092a f6b33540 c1808320
> 00000000 f4d18c00 003d092a f6b33540 f6b33668 c1808320 00000000 f6643cfc
> Call Trace:
> [<c011b591>] __wake_up_common+0x41/0x70
> [<c0318346>] io_schedule+0x26/0x30
> [<c01469fb>] sync_page+0x4b/0x60
> [<c03184d5>] __wait_on_bit+0x45/0x70
> [<c01469b0>] sync_page+0x0/0x60
> [<c014726d>] wait_on_page_bit+0xad/0xc0
> [<c0136a30>] wake_bit_function+0x0/0x60
> [<c0136a30>] wake_bit_function+0x0/0x60
> [<c01ac2f9>] jwait_io+0x59/0x80
> [<c01c1763>] update_journal_header+0x83/0xb0
> [<c01c272a>] commit_tx+0xca/0x110
> [<c01c29a1>] reiser4_write_logs+0x141/0x1e0
> [<c01b9d91>] commit_current_atom+0x171/0x2c0
> [<c01baabf>] try_commit_txnh+0x13f/0x1e0
> [<c01bab94>] commit_txnh+0x34/0xd0
> [<c01b91bc>] txn_end+0x2c/0x30
> [<c01b91d0>] txn_restart+0x10/0x30
> [<c01b920a>] txn_restart_current+0x1a/0x20
> [<c01b9f1f>] force_commit_atom+0x3f/0x70
> [<c01ba03a>] txnmgr_force_commit_all+0xea/0x130
> [<c01fcb0e>] release_format40+0x7e/0x150
> [<c01b5ea8>] init_context+0x58/0x80
> [<c01c8b89>] reiser4_put_super+0x89/0xf0
> [<c01857ed>] invalidate_inodes+0x5d/0x80
> [<c0170fb6>] generic_shutdown_super+0x156/0x160
> [<c0171b2d>] kill_block_super+0x2d/0x50
> [<c0170d90>] deactivate_super+0x60/0x80
> [<c0188e1f>] sys_umount+0x3f/0x90
> [<c01171b0>] do_page_fault+0x1c0/0x5a8
> [<c0159bc1>] sys_munmap+0x51/0x80
> [<c0188e87>] sys_oldumount+0x17/0x20
> [<c010306b>] sysenter_past_esp+0x54/0x75
Freely interpreting this trace, I'd say that the umount causes the
reiser4 filesystem to do an extra commit, and sync_page seems to be
waiting for I/O.
Are there other traces of interest?
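To check that reading against the trace itself, a small filter can reduce a `Call Trace:` to bare function names. This is a sketch built on a heuristic assumption: on i386 kernels built without CONFIG_FRAME_POINTER, the trace lists every kernel-text address found on the stack, and an entry at offset `+0x0` (such as `sync_page+0x0/0x60` or `wake_bit_function+0x0/0x60` above) is most likely a function pointer stored on the stack (a callback argument or a wait-queue entry) rather than a return address, so the filter drops those. Entries with non-zero offsets can still be stale leftovers (`do_page_fault` and `sys_munmap` in the umount trace are probable examples), so the output is a reading aid, not a reconstructed call chain. `trace_funcs` is a hypothetical name.

```shell
# Hypothetical helper: print one function name per "[<addr>] name+off/len"
# line read on stdin, innermost entry first (the order of the dump).
# Heuristic: entries at offset +0x0 are unlikely to be return addresses,
# so they are treated as function pointers left on the stack and skipped.
trace_funcs() {
    awk 'index($0, "[<") {
        for (i = 1; i <= NF; i++) {
            # The symbol field looks like name+0xOFFSET/0xSIZE.
            if ($i ~ /^[A-Za-z0-9_.]+\+0x[0-9a-f]+\/0x[0-9a-f]+$/) {
                split($i, part, "[+/]")
                if (part[2] != "0x0") print part[1]
                break
            }
        }
    }'
}
```

Fed the umount trace, it keeps the chain from `sys_umount` through `reiser4_put_super`, `txnmgr_force_commit_all` and `jwait_io` down to `wait_on_page_bit` and `io_schedule`, which is consistent with a forced commit blocked on I/O.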
* Re: RAID-1 and Reiser4 issue: umount hangs
2006-05-28 9:45 ` Arend Freije
@ 2006-05-28 23:49 ` Neil Brown
2006-05-31 9:18 ` Arend Freije
0 siblings, 1 reply; 15+ messages in thread
From: Neil Brown @ 2006-05-28 23:49 UTC (permalink / raw)
To: Arend Freije; +Cc: linux-kernel
On Sunday May 28, afreije@inn.nl wrote:
> Neil Brown wrote:
> > # echo t > /proc/sysrq-trigger
> >
> > and look at the resulting kernel messages, particularly for the
> > unmount process. If they don't make sense to you, post them.
> >
> >
> Tnx for your response. After recompiling the kernel with magic_sysrq
> enabled, rebooting, and repeating umount with the reiser4 md device, the
> following trace contains a reiser4-entry:
This looks very much like a reiser4 problem rather than a raid
problem, or at least you will need someone very familiar with reiser4
to understand what is going on here.
I recommend you post this information to
reiserfs-dev@namesys.com
(and cc: to linux-kernel too if you like) to make sure someone
knowledgeable sees it.
I suspect reiser4 has some kernel threads that it uses. They might
have 'obvious' names like
ent:....
ktxnmgrd
from looking at the source. Including the stack trace of these and
anything else with 'reiser4' in the trace would probably be helpful.
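A minimal sketch for pulling those entries out of a full SysRq-T dump, assuming the same header layout as the umount entry in this thread (task name at the start of the line, followed by a one-letter state). `task_blocks` is a hypothetical helper; it prints every complete task entry (header, stack words and call trace) whose text matches an extended regular expression anywhere. The thread names are only guesses from the source, so the pattern in the usage line is an assumption too.

```shell
# Hypothetical helper: print each complete SysRq-T task entry read on stdin
# whose text matches the extended regex given as $1.
# An entry starts at a header line: name, then a one-letter state.
task_blocks() {
    TASK_RE=$1 awk '
        function flush() {
            if (blk != "" && blk ~ ENVIRON["TASK_RE"]) printf "%s", blk
            blk = ""
        }
        # New header: not a trace line, second field is a lone state letter.
        $1 !~ /^\[</ && $2 ~ /^[RSDTZX]$/ { flush() }
        { blk = blk $0 "\n" }
        END { flush() }
    '
}
```

For example, `dmesg | task_blocks 'ktxnmgrd|^ent:|reiser4'` selects entries whose name starts with `ktxnmgrd` or `ent:`, plus any entry whose trace mentions a `reiser4` symbol. Because each entry begins with the task name, `^` anchors to the name.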
NeilBrown
>
> > syslog-ng R running 0 7581 1 7197 (NOTLB)
> > umount D C011B591 0 7588 7200 (NOTLB)
> > f6643c98 00000000 c1808320 c011b591 f7db5ad0 f4d18c00 003d092a 00000000
> > 00000000 f7db5ad0 c1808320 00000000 f4d18c00 003d092a f6b33540 c1808320
> > 00000000 f4d18c00 003d092a f6b33540 f6b33668 c1808320 00000000 f6643cfc
> > Call Trace:
> > [<c011b591>] __wake_up_common+0x41/0x70
> > [<c0318346>] io_schedule+0x26/0x30
> > [<c01469fb>] sync_page+0x4b/0x60
> > [<c03184d5>] __wait_on_bit+0x45/0x70
> > [<c01469b0>] sync_page+0x0/0x60
> > [<c014726d>] wait_on_page_bit+0xad/0xc0
> > [<c0136a30>] wake_bit_function+0x0/0x60
> > [<c0136a30>] wake_bit_function+0x0/0x60
> > [<c01ac2f9>] jwait_io+0x59/0x80
> > [<c01c1763>] update_journal_header+0x83/0xb0
> > [<c01c272a>] commit_tx+0xca/0x110
> > [<c01c29a1>] reiser4_write_logs+0x141/0x1e0
> > [<c01b9d91>] commit_current_atom+0x171/0x2c0
> > [<c01baabf>] try_commit_txnh+0x13f/0x1e0
> > [<c01bab94>] commit_txnh+0x34/0xd0
> > [<c01b91bc>] txn_end+0x2c/0x30
> > [<c01b91d0>] txn_restart+0x10/0x30
> > [<c01b920a>] txn_restart_current+0x1a/0x20
> > [<c01b9f1f>] force_commit_atom+0x3f/0x70
> > [<c01ba03a>] txnmgr_force_commit_all+0xea/0x130
> > [<c01fcb0e>] release_format40+0x7e/0x150
> > [<c01b5ea8>] init_context+0x58/0x80
> > [<c01c8b89>] reiser4_put_super+0x89/0xf0
> > [<c01857ed>] invalidate_inodes+0x5d/0x80
> > [<c0170fb6>] generic_shutdown_super+0x156/0x160
> > [<c0171b2d>] kill_block_super+0x2d/0x50
> > [<c0170d90>] deactivate_super+0x60/0x80
> > [<c0188e1f>] sys_umount+0x3f/0x90
> > [<c01171b0>] do_page_fault+0x1c0/0x5a8
> > [<c0159bc1>] sys_munmap+0x51/0x80
> > [<c0188e87>] sys_oldumount+0x17/0x20
> > [<c010306b>] sysenter_past_esp+0x54/0x75
>
> Freely interpreting this trace, I'd say that the umount causes the
> reiser4 filesystem to do an extra commit, and sync_page seems to be
> waiting for I/O.
> Are there other traces of interest?
>
* Re: RAID-1 and Reiser4 issue: umount hangs
2006-05-28 23:49 ` Neil Brown
@ 2006-05-31 9:18 ` Arend Freije
2006-05-31 9:36 ` Jens Axboe
0 siblings, 1 reply; 15+ messages in thread
From: Arend Freije @ 2006-05-31 9:18 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-kernel
Neil Brown wrote:
> On Sunday May 28, Arend Freije wrote:
>
>> Neil Brown wrote:
>>
>>> # echo t > /proc/sysrq-trigger
>>>
>>> and look at the resulting kernel messages, particularly for the
>>> unmount process. If they don't make sense to you, post them.
>>>
>> Tnx for your response. After recompiling the kernel with magic_sysrq
>> enabled, rebooting, and repeating umount with the reiser4 md device, the
>> following trace contains a reiser4-entry:
>>
>
> This looks very much like a reiser4 problem rather than a raid
> problem, or at least you will need someone very familiar with reiser4
> to understand what is going on here.
>
> I recommend you post this information to
> reiserfs-dev@namesys.com
>
That would be reiserfs-list@namesys.com
> (and cc: to linux-kernel too if you like) to make sure someone
> knowledgeable sees it.
>
> I suspect reiser4 has some kernel threads that it uses. They might
> have 'obvious' names like
> ent:....
> ktxnmgrd
> from looking at the source. Including the stack trace of these and
> anything else with 'reiser4' in the trace would probably be helpful.
>
> NeilBrown
>
>
Alexander Zarochentsev wrote:
>>> Would it work better with the "no_write_barrier" mount option?
>>>
>> It would indeed. What's the purpose of this option?
>>
>
> It disables write barrier support in reiser4, which may be buggy. The
> bug is not necessarily in the reiser4 code; it could also be in the
> elevator, the md device or the disk driver code.
Alexander,
You're probably right. I found several posts on the linux-kernel list
involving problems with write barrier support in combination with SATA
and ext3. So I tried:
# mkfs -t ext3 /dev/md/0
# mount -o barrier=1 /dev/md/0 /mnt
# cp -a $src /mnt
# umount /mnt
And indeed, umount hangs now as well.
So it seems to be a linux-kernel issue after all...
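Because the hung umount can neither be killed nor waited for, a scripted version of this test has to look at the process state instead. A minimal sketch, assuming the Linux /proc/PID/stat format (PID, command name in parentheses, then a one-letter state): `stat_state` extracts the state letter, and `umount_hung` (hypothetical, needs root) starts umount in the background and reports whether it is still in uninterruptible sleep ("D") after a grace period. The 30-second default is an arbitrary choice; a long D state is only a hint of a hang, since a large flush can legitimately take a while.

```shell
# Print the state letter from a /proc/PID/stat line read on stdin.
# The command name is in parentheses and may itself contain ") ",
# so everything up to the LAST ") " is stripped first.
stat_state() {
    IFS= read -r line || return 1
    rest=${line##*') '}
    printf '%s\n' "${rest%% *}"
}

# Hypothetical helper (needs root): return success if "umount $1" is still
# in D state after ${2:-30} seconds. A D-state umount cannot be killed, so
# it is simply left running.
umount_hung() {
    umount "$1" &
    pid=$!
    sleep "${2:-30}"
    [ -r "/proc/$pid/stat" ] || return 1     # umount already finished
    [ "$(stat_state < "/proc/$pid/stat")" = D ]
}
```

In the ext3 test above, `umount_hung /mnt` would take the place of the plain `umount /mnt` step.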
The call traces of interest in the ext3 case seem to be:
> kjournald D 00000000 0 6790 1 6745 (L-TLB)
> f104ddcc 00000000 c1808320 00000000 00000000 f76cb2b0 c011b591 dfcc3a90
> 5b666c00 003d092c 00000000 00000000 dfcc3a90 c1808320 00000000 c1808320
> 00000000 5b666c00 003d092c f7c77a90 f7c77bb8 c1808320 00000000 f104de38
> Call Trace:
> [<c011b591>] __wake_up_common+0x41/0x70
> [<c0315766>] io_schedule+0x26/0x30
> [<c016af13>] sync_buffer+0x53/0x60
> [<c03158f5>] __wait_on_bit+0x45/0x70
> [<c016aec0>] sync_buffer+0x0/0x60
> [<c016aec0>] sync_buffer+0x0/0x60
> [<c03159bd>] out_of_line_wait_on_bit+0x9d/0xc0
> [<c0136a30>] wake_bit_function+0x0/0x60
> [<c0136a30>] wake_bit_function+0x0/0x60
> [<c016ee0e>] sync_dirty_buffer+0x5e/0xd0
> [<f9d57055>] journal_write_commit_record+0x95/0x140 [jbd]
> [<f9d5cd18>] journal_free_journal_head+0x18/0x20 [jbd]
> [<f9d5d266>] journal_put_journal_head+0x96/0x100 [jbd]
> [<f9d56207>] journal_unfile_buffer+0x67/0xb0 [jbd]
> [<f9d57ca7>] journal_commit_transaction+0xba7/0x13a0 [jbd]
> [<c0314f85>] preempt_schedule+0x45/0x70
> [<c012ab24>] lock_timer_base+0x24/0x50
> [<c012ad91>] try_to_del_timer_sync+0x51/0x60
> [<f9d5aa45>] kjournald+0xa5/0x220 [jbd]
> [<c01369d0>] autoremove_wake_function+0x0/0x60
> [<c01369d0>] autoremove_wake_function+0x0/0x60
> [<c0102fa2>] ret_from_fork+0x6/0x14
> [<f9d5a990>] commit_timeout+0x0/0x10 [jbd]
> [<f9d5a9a0>] kjournald+0x0/0x220 [jbd]
> [<c0101225>] kernel_thread_helper+0x5/0x10
and
> umount D C014761A 0 6806 6779 (NOTLB)
> f0d0fe60 00000000 c1808320 c014761a f05efc9c f0d0fe6c 00000000 0000000e
> f548d48c f548d484 00000000 00000001 f0d0fe60 c011b600 f548d484 c1808320
> 0007a120 f80c8400 003d0930 dfcd1a70 dfcd1b98 f548d400 f548d464 f0d0fe88
> Call Trace:
> [<c014761a>] find_get_pages+0x5a/0x70
> [<c011b600>] __wake_up+0x40/0x60
> [<f9d5ad35>] journal_kill_thread+0xc5/0x110 [jbd]
> [<c01369d0>] autoremove_wake_function+0x0/0x60
> [<c01369d0>] autoremove_wake_function+0x0/0x60
> [<f9eb2510>] ext3_clear_inode+0x20/0x40 [ext3]
> [<f9d5c181>] journal_destroy+0x11/0x1e0 [jbd]
> [<c018567d>] dispose_list+0xcd/0xf0
> [<f9eb21f9>] ext3_put_super+0x29/0x1b0 [ext3]
> [<c01857ad>] invalidate_inodes+0x5d/0x80
> [<c0170f76>] generic_shutdown_super+0x156/0x160
> [<c0171aed>] kill_block_super+0x2d/0x50
Kind Regards,
Arend Freije
* Re: RAID-1 and Reiser4 issue: umount hangs
2006-05-31 9:18 ` Arend Freije
@ 2006-05-31 9:36 ` Jens Axboe
2006-06-01 5:54 ` Arend Freije
2006-06-01 18:52 ` Arend Freije
0 siblings, 2 replies; 15+ messages in thread
From: Jens Axboe @ 2006-05-31 9:36 UTC (permalink / raw)
To: Arend Freije; +Cc: Neil Brown, linux-kernel
On Wed, May 31 2006, Arend Freije wrote:
> You're probably right. I found several posts on the linux-kernel list
> involving problems with write barrier support in combination with SATA
> and ext3. So I tried:
>
> # mkfs -t ext3 /dev/md/0
> # mount -o barrier=1 /dev/md/0 /mnt
> # cp -a $src /mnt
> # umount /mnt
>
> And indeed, umount hangs now as well.
> So it seems to be a linux-kernel issue after all...
Can you please try 2.6.17-rc5?
--
Jens Axboe
* Re: RAID-1 and Reiser4 issue: umount hangs
2006-05-31 9:36 ` Jens Axboe
@ 2006-06-01 5:54 ` Arend Freije
2006-06-01 18:52 ` Arend Freije
1 sibling, 0 replies; 15+ messages in thread
From: Arend Freije @ 2006-06-01 5:54 UTC (permalink / raw)
To: Jens Axboe, Alexander Zarochentsev
Cc: Neil Brown, linux-kernel, reiserfs-list
Jens Axboe wrote:
> On Wed, May 31 2006, Arend Freije wrote:
>
>> You're probably right. I found several posts on the linux-kernel list
>> involving problems with write barrier support in combination with SATA
>> and ext3. So I tried:
>>
>> # mkfs -t ext3 /dev/md/0
>> # mount -o barrier=1 /dev/md/0 /mnt
>> # cp -a $src /mnt
>> # umount /mnt
>>
>> And indeed, umount hangs now as well.
>> So it seems to be a linux-kernel issue after all...
>>
>
> Can you please try 2.6.17-rc5?
>
>
Great! 2.6.17_rc5 fixed this issue, for Reiser4 as well as for ext3 (with
barrier=1).
Many Thanks!
Arend
* Re: RAID-1 and Reiser4 issue: umount hangs
2006-05-31 9:36 ` Jens Axboe
2006-06-01 5:54 ` Arend Freije
@ 2006-06-01 18:52 ` Arend Freije
1 sibling, 0 replies; 15+ messages in thread
From: Arend Freije @ 2006-06-01 18:52 UTC (permalink / raw)
Cc: linux-kernel
Jens Axboe wrote:
> On Wed, May 31 2006, Arend Freije wrote:
>
>> You're probably right. I found several posts on the linux-kernel list
>> involving problems with write barrier support in combination with SATA
>> and ext3. So I tried:
>>
>> # mkfs -t ext3 /dev/md/0
>> # mount -o barrier=1 /dev/md/0 /mnt
>> # cp -a $src /mnt
>> # umount /mnt
>>
>> And indeed, umount hangs now as well.
>> So it seems to be a linux-kernel issue after all...
>>
>
> Can you please try 2.6.17-rc5?
>
>
Great! 2.6.17_rc5 fixed this issue, for Reiser4 as well as for ext3 (with
barrier=1).
Many Thanks!
Arend
* RAID-1 and Reiser4 issue: umount hangs
@ 2006-05-29 20:17 Arend Freije
2006-05-30 11:53 ` Alexander Zarochentsev
2006-05-30 15:31 ` Vladimir V. Saveliev
0 siblings, 2 replies; 15+ messages in thread
From: Arend Freije @ 2006-05-29 20:17 UTC (permalink / raw)
To: reiserfs-list
Hi,
I'm using Reiser4 for my filesystems on disk (/dev/sda), and it works
just fine. Recently I bought a second disk (/dev/sdb) for RAID-1
mirroring. With mdadm I created a degraded raid-1 array on /dev/md/0,
devices missing,/dev/sdb1. After that I created a Reiser4 filesystem on
/dev/md/0 and mounted it at /mnt. Then I copied the data from /dev/sda1
to /mnt.
All goes well until I umount /mnt: umount simply hangs without any
error. Syslog doesn't report any error. In /proc/mdstat, the array
remains "active sync". Shutting down linux fails because the umount is
still waiting and seems to block other umounts. The umount process
cannot be killed, even by root. A hard reset is the only way to get
my system functioning normally, but without the raid-1 of course.
The problem seems to emerge only with the combination of RAID and
Reiser4. I've created an ext-2 filesystem on /dev/md/0, and after that
mount ; cp -ax ; umount works without a problem, and the hanging umount
re-appears when using Reiser4 again.
My questions:
- how can I find the cause of the hanging umount?
- how can I fix it?
A few details of my linux-box:
Gentoo Linux, 2.6.16 kernel with Reiser4-for-2.6.16-2.patch.gz
i686 AMD Athlon(tm) XP 2400+
DC4300 SATA-II controller (Silicon Image 3124, libata + sata_sil24 drivers)
2 x Samsung SP2504C hard disk
I've posted this issue to the linux-kernel mailing list, but Neil Brown
wrote:
> This looks very much like a reiser4 problem rather than a raid
> problem, or at least you will need someone very familiar with reiser4
> to understand what is going on here.
>
So here's my post.
The call trace of the hanging umount is the following:
> syslog-ng R running 0 7581 1 7197 (NOTLB)
> umount D C011B591 0 7588 7200 (NOTLB)
> f6643c98 00000000 c1808320 c011b591 f7db5ad0 f4d18c00 003d092a 00000000
> 00000000 f7db5ad0 c1808320 00000000 f4d18c00 003d092a f6b33540 c1808320
> 00000000 f4d18c00 003d092a f6b33540 f6b33668 c1808320 00000000 f6643cfc
> Call Trace:
> [<c011b591>] __wake_up_common+0x41/0x70
> [<c0318346>] io_schedule+0x26/0x30
> [<c01469fb>] sync_page+0x4b/0x60
> [<c03184d5>] __wait_on_bit+0x45/0x70
> [<c01469b0>] sync_page+0x0/0x60
> [<c014726d>] wait_on_page_bit+0xad/0xc0
> [<c0136a30>] wake_bit_function+0x0/0x60
> [<c0136a30>] wake_bit_function+0x0/0x60
> [<c01ac2f9>] jwait_io+0x59/0x80
> [<c01c1763>] update_journal_header+0x83/0xb0
> [<c01c272a>] commit_tx+0xca/0x110
> [<c01c29a1>] reiser4_write_logs+0x141/0x1e0
> [<c01b9d91>] commit_current_atom+0x171/0x2c0
> [<c01baabf>] try_commit_txnh+0x13f/0x1e0
> [<c01bab94>] commit_txnh+0x34/0xd0
> [<c01b91bc>] txn_end+0x2c/0x30
> [<c01b91d0>] txn_restart+0x10/0x30
> [<c01b920a>] txn_restart_current+0x1a/0x20
> [<c01b9f1f>] force_commit_atom+0x3f/0x70
> [<c01ba03a>] txnmgr_force_commit_all+0xea/0x130
> [<c01fcb0e>] release_format40+0x7e/0x150
> [<c01b5ea8>] init_context+0x58/0x80
> [<c01c8b89>] reiser4_put_super+0x89/0xf0
> [<c01857ed>] invalidate_inodes+0x5d/0x80
> [<c0170fb6>] generic_shutdown_super+0x156/0x160
> [<c0171b2d>] kill_block_super+0x2d/0x50
> [<c0170d90>] deactivate_super+0x60/0x80
> [<c0188e1f>] sys_umount+0x3f/0x90
> [<c01171b0>] do_page_fault+0x1c0/0x5a8
> [<c0159bc1>] sys_munmap+0x51/0x80
> [<c0188e87>] sys_oldumount+0x17/0x20
> [<c010306b>] sysenter_past_esp+0x54/0x75
Thanx in advance,
Arend
* Re: RAID-1 and Reiser4 issue: umount hangs
2006-05-29 20:17 Arend Freije
@ 2006-05-30 11:53 ` Alexander Zarochentsev
2006-05-30 19:20 ` Arend Freije
2006-05-30 15:31 ` Vladimir V. Saveliev
1 sibling, 1 reply; 15+ messages in thread
From: Alexander Zarochentsev @ 2006-05-30 11:53 UTC (permalink / raw)
To: reiserfs-list; +Cc: Arend Freije
Hello,
On Tuesday 30 May 2006 00:17, Arend Freije wrote:
> Hi,
>
> I'm using Reiser4 for my filesystems on disk (/dev/sda), and it
> works just fine. Recently I bought a second disk (/dev/sdb) for
> RAID-1 mirroring. With mdadm I created a degraded raid-1 array on
> /dev/md/0, devices missing,/dev/sdb1. After that I created a Reiser4
> filesystem on /dev/md/0 and mounted it at /mnt. Then I copied the
> data from /dev/sda1 to /mnt.
Would it work better with the "no_write_barrier" mount option?
> All goes well until I umount /mnt: umount simply hangs without any
> error. Syslog doesn't report any error. In /proc/mdstat, the array
> remains "active sync". Shutting down linux fails because the umount
> is still waiting and seems to block other umounts. The umount process
> cannot be killed, even by root. A hard reset is the only way to
> get my system functioning normally, but without the raid-1 of course.
> The problem seems to emerge only with the combination of RAID and
> Reiser4. I've created an ext-2 filesystem on /dev/md/0, and after
> that mount ; cp -ax ; umount works without a problem, and the
> hanging umount re-appears when using Reiser4 again.
>
> My questions:
> - how can I find the cause of the hanging umount?
> - how can I fix it?
>
> A few details of my linux-box:
>
> Gentoo Linux, 2.6.16 kernel with Reiser4-for-2.6.16-2.patch.gz
> i686 AMD Athlon(tm) XP 2400+
> DC4300 SATA-II controller (Silicon Image 3124, libata + sata_sil24 drivers)
> 2 x Samsung SP2504C hard disk
>
> I've posted this issue to the linux-kernel mailing list, but Neil Brown
> wrote:
> > This looks very much like a reiser4 problem rather than a raid
> > problem, or at least you will need someone very familiar with
> > reiser4 to understand what is going on here.
>
> So here's my post.
>
> The call trace of the hanging umount is the following:
> > syslog-ng R running 0 7581 1 7197 (NOTLB)
> > umount D C011B591 0 7588 7200 (NOTLB)
> > f6643c98 00000000 c1808320 c011b591 f7db5ad0 f4d18c00 003d092a 00000000
> > 00000000 f7db5ad0 c1808320 00000000 f4d18c00 003d092a f6b33540 c1808320
> > 00000000 f4d18c00 003d092a f6b33540 f6b33668 c1808320 00000000 f6643cfc
> > Call Trace:
> > [<c011b591>] __wake_up_common+0x41/0x70
> > [<c0318346>] io_schedule+0x26/0x30
> > [<c01469fb>] sync_page+0x4b/0x60
> > [<c03184d5>] __wait_on_bit+0x45/0x70
> > [<c01469b0>] sync_page+0x0/0x60
> > [<c014726d>] wait_on_page_bit+0xad/0xc0
> > [<c0136a30>] wake_bit_function+0x0/0x60
> > [<c0136a30>] wake_bit_function+0x0/0x60
> > [<c01ac2f9>] jwait_io+0x59/0x80
> > [<c01c1763>] update_journal_header+0x83/0xb0
> > [<c01c272a>] commit_tx+0xca/0x110
> > [<c01c29a1>] reiser4_write_logs+0x141/0x1e0
> > [<c01b9d91>] commit_current_atom+0x171/0x2c0
> > [<c01baabf>] try_commit_txnh+0x13f/0x1e0
> > [<c01bab94>] commit_txnh+0x34/0xd0
> > [<c01b91bc>] txn_end+0x2c/0x30
> > [<c01b91d0>] txn_restart+0x10/0x30
> > [<c01b920a>] txn_restart_current+0x1a/0x20
> > [<c01b9f1f>] force_commit_atom+0x3f/0x70
> > [<c01ba03a>] txnmgr_force_commit_all+0xea/0x130
> > [<c01fcb0e>] release_format40+0x7e/0x150
> > [<c01b5ea8>] init_context+0x58/0x80
> > [<c01c8b89>] reiser4_put_super+0x89/0xf0
> > [<c01857ed>] invalidate_inodes+0x5d/0x80
> > [<c0170fb6>] generic_shutdown_super+0x156/0x160
> > [<c0171b2d>] kill_block_super+0x2d/0x50
> > [<c0170d90>] deactivate_super+0x60/0x80
> > [<c0188e1f>] sys_umount+0x3f/0x90
> > [<c01171b0>] do_page_fault+0x1c0/0x5a8
> > [<c0159bc1>] sys_munmap+0x51/0x80
> > [<c0188e87>] sys_oldumount+0x17/0x20
> > [<c010306b>] sysenter_past_esp+0x54/0x75
>
> Thanx in advance,
>
> Arend
--
Alex.
* Re: RAID-1 and Reiser4 issue: umount hangs
2006-05-30 11:53 ` Alexander Zarochentsev
@ 2006-05-30 19:20 ` Arend Freije
2006-05-31 7:13 ` Alexander Zarochentsev
0 siblings, 1 reply; 15+ messages in thread
From: Arend Freije @ 2006-05-30 19:20 UTC (permalink / raw)
To: Alexander Zarochentsev; +Cc: reiserfs-list
Alexander Zarochentsev wrote:
> Hello,
>
> On Tuesday 30 May 2006 00:17, Arend Freije wrote:
>
>> Hi,
>>
>> I'm using Reiser4 for my filesystems on disk (/dev/sda), and it
>> works just fine. Recently I bought a second disk (/dev/sdb) for
>> RAID-1 mirroring. With mdadm I created a degraded raid-1 array on
>> /dev/md/0, devices missing,/dev/sdb1. After that I created a Reiser4
>> filesystem on /dev/md/0 and mounted it at /mnt. Then I copied the
>> data from /dev/sda1 to /mnt.
>>
>
> Would it work better with the "no_write_barrier" mount option?
>
It would indeed. What's the purpose of this option?
Kind Regards,
Arend
* Re: RAID-1 and Reiser4 issue: umount hangs
2006-05-30 19:20 ` Arend Freije
@ 2006-05-31 7:13 ` Alexander Zarochentsev
2006-05-31 9:06 ` Arend Freije
0 siblings, 1 reply; 15+ messages in thread
From: Alexander Zarochentsev @ 2006-05-31 7:13 UTC (permalink / raw)
To: reiserfs-list; +Cc: Arend Freije
On Tuesday 30 May 2006 23:20, Arend Freije wrote:
> Alexander Zarochentsev wrote:
> > Hello,
> >
> > On Tuesday 30 May 2006 00:17, Arend Freije wrote:
> >> Hi,
> >>
> >> I'm using Reiser4 for my filesystems on disk (/dev/sda), and it
> >> works just fine. Recently I bought a second disk (/dev/sdb) for
> >> RAID-1 mirroring. With mdadm I created a degraded raid-1 array on
> >> /dev/md/0, devices missing,/dev/sdb1. After that I created a
> >> Reiser4 filesystem on /dev/md/0 and mounted it at /mnt. Then I
> >> copied the data from /dev/sda1 to /mnt.
> >
> > Would it work better with the "no_write_barrier" mount option?
>
> It would indeed. What's the purpose of this option?
It disables write barrier support in reiser4, which may be buggy. The
bug is not necessarily in the reiser4 code; it could also be in the
elevator, the md device or the disk driver code.
> Kind Regards,
>
> Arend
--
Alex.
* Re: RAID-1 and Reiser4 issue: umount hangs
2006-05-31 7:13 ` Alexander Zarochentsev
@ 2006-05-31 9:06 ` Arend Freije
2006-05-31 10:41 ` PFC
0 siblings, 1 reply; 15+ messages in thread
From: Arend Freije @ 2006-05-31 9:06 UTC (permalink / raw)
To: Alexander Zarochentsev
Cc: reiserfs-list, E.Gryaznova, Vladimir Saveliev, Hans Reiser
Alexander Zarochentsev wrote:
> On Tuesday 30 May 2006 23:20, Arend Freije wrote:
>
>> Alexander Zarochentsev wrote:
>>
>>> Hello,
>>>
>>> On Tuesday 30 May 2006 00:17, Arend Freije wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm using Reiser4 for my filesystems on disk (/dev/sda), and it
>>>> works just fine. Recently I bought a second disk (/dev/sdb) for
>>>> RAID-1 mirroring. With mdadm I created a degraded raid-1 array on
>>>> /dev/md/0, devices missing,/dev/sdb1. After that I created a
>>>> Reiser4 filesystem on /dev/md/0 and mounted it at /mnt. Then I
>>>> copied the data from /dev/sda1 to /mnt.
>>>>
>>> Would it work better with the "no_write_barrier" mount option?
>>>
>> It would indeed. What's the purpose of this option?
>>
>
> It disables write barrier support in reiser4, which may be buggy. The
> bug is not necessarily in the reiser4 code; it could also be in the
> elevator, the md device or the disk driver code.
>
Alexander,
You're probably right. I found several posts on the linux-kernel list
involving problems with write barrier support in combination with SATA
and ext3. So I tried:
# mkfs -t ext3 /dev/md/0
# mount -o barrier=1 /dev/md/0 /mnt
# cp -a $src /mnt
# umount /mnt
And indeed, umount hangs now as well.
So it seems to be a linux-kernel issue after all...
If write barrier support is a known problem, wouldn't it be better to
disable it by default in the Reiser4 driver, as is done in the ext3 driver?
Kind Regards,
Arend Freije
* Re: RAID-1 and Reiser4 issue: umount hangs
2006-05-29 20:17 Arend Freije
2006-05-30 11:53 ` Alexander Zarochentsev
@ 2006-05-30 15:31 ` Vladimir V. Saveliev
1 sibling, 0 replies; 15+ messages in thread
From: Vladimir V. Saveliev @ 2006-05-30 15:31 UTC (permalink / raw)
To: Arend Freije; +Cc: reiserfs-list
Hello
On Mon, 2006-05-29 at 22:17 +0200, Arend Freije wrote:
> Hi,
>
> I'm using Reiser4 for my filesystems on disk (/dev/sda), and it works
> just fine. Recently I bought a second disk (/dev/sdb) for RAID-1
> mirroring. With mdadm I created a degraded raid-1 array on /dev/md/0,
> devices missing,/dev/sdb1. After that I created a Reiser4 filesystem on
> /dev/md/0 and mounted it at /mnt. Then I copied the data from /dev/sda1
> to /mnt.
> All goes well until I umount /mnt: umount simply hangs without any
> error. Syslog doesn't report any error. In /proc/mdstat, the array
> remains "active sync". Shutting down linux fails because the umount is
> still waiting and seems to block other umounts. The umount process
> cannot be killed, even by root. A hard reset is the only way to get
> my system functioning normally, but without the raid-1 of course.
> The problem seems to emerge only with the combination of RAID and
> Reiser4. I've created an ext-2 filesystem on /dev/md/0, and after that
> mount ; cp -ax ; umount works without a problem, and the hanging umount
> re-appears when using Reiser4 again.
>
> My questions:
> - how can I find the cause of the hanging umount?
> - how can I fix it?
>
> A few details of my linux-box:
>
> Gentoo Linux, 2.6.16 kernel with Reiser4-for-2.6.16-2.patch.gz
Would you please check whether
ftp://ftp.namesys.com/pub/reiser4-for-2.6/2.6.16/reiser4-for-2.6.16-4.patch.gz
has the same problem?
> i686 AMD Athlon(tm) XP 2400+
> DC4300 SATA-II controller (Silicon Image 3124, libata + sata_sil24 drivers)
> 2 x Samsung SP2504C hard disk
>
> I've posted this issue to the linux-kernel mailing list, but Neil Brown
> wrote:
>
> > This looks very much like a reiser4 problem rather than a raid
> > problem, or at least you will need someone very familiar with reiser4
> > to understand what is going on here.
> >
>
> So here's my post.
> The call trace of the hanging umount is the following:
>
> > syslog-ng R running 0 7581 1 7197 (NOTLB)
> > umount D C011B591 0 7588 7200 (NOTLB)
> > f6643c98 00000000 c1808320 c011b591 f7db5ad0 f4d18c00 003d092a 00000000
> > 00000000 f7db5ad0 c1808320 00000000 f4d18c00 003d092a f6b33540 c1808320
> > 00000000 f4d18c00 003d092a f6b33540 f6b33668 c1808320 00000000 f6643cfc
> > Call Trace:
> > [<c011b591>] __wake_up_common+0x41/0x70
> > [<c0318346>] io_schedule+0x26/0x30
> > [<c01469fb>] sync_page+0x4b/0x60
> > [<c03184d5>] __wait_on_bit+0x45/0x70
> > [<c01469b0>] sync_page+0x0/0x60
> > [<c014726d>] wait_on_page_bit+0xad/0xc0
> > [<c0136a30>] wake_bit_function+0x0/0x60
> > [<c0136a30>] wake_bit_function+0x0/0x60
> > [<c01ac2f9>] jwait_io+0x59/0x80
> > [<c01c1763>] update_journal_header+0x83/0xb0
> > [<c01c272a>] commit_tx+0xca/0x110
> > [<c01c29a1>] reiser4_write_logs+0x141/0x1e0
> > [<c01b9d91>] commit_current_atom+0x171/0x2c0
> > [<c01baabf>] try_commit_txnh+0x13f/0x1e0
> > [<c01bab94>] commit_txnh+0x34/0xd0
> > [<c01b91bc>] txn_end+0x2c/0x30
> > [<c01b91d0>] txn_restart+0x10/0x30
> > [<c01b920a>] txn_restart_current+0x1a/0x20
> > [<c01b9f1f>] force_commit_atom+0x3f/0x70
> > [<c01ba03a>] txnmgr_force_commit_all+0xea/0x130
> > [<c01fcb0e>] release_format40+0x7e/0x150
> > [<c01b5ea8>] init_context+0x58/0x80
> > [<c01c8b89>] reiser4_put_super+0x89/0xf0
> > [<c01857ed>] invalidate_inodes+0x5d/0x80
> > [<c0170fb6>] generic_shutdown_super+0x156/0x160
> > [<c0171b2d>] kill_block_super+0x2d/0x50
> > [<c0170d90>] deactivate_super+0x60/0x80
> > [<c0188e1f>] sys_umount+0x3f/0x90
> > [<c01171b0>] do_page_fault+0x1c0/0x5a8
> > [<c0159bc1>] sys_munmap+0x51/0x80
> > [<c0188e87>] sys_oldumount+0x17/0x20
> > [<c010306b>] sysenter_past_esp+0x54/0x75
>
>
> Thanx in advance,
>
> Arend
Thread overview: 15+ messages
2006-05-27 22:14 RAID-1 and Reiser4 issue: umount hangs Arend Freije
2006-05-27 22:46 ` Neil Brown
2006-05-28 9:45 ` Arend Freije
2006-05-28 23:49 ` Neil Brown
2006-05-31 9:18 ` Arend Freije
2006-05-31 9:36 ` Jens Axboe
2006-06-01 5:54 ` Arend Freije
2006-06-01 18:52 ` Arend Freije
-- strict thread matches above, loose matches on Subject: below --
2006-05-29 20:17 Arend Freije
2006-05-30 11:53 ` Alexander Zarochentsev
2006-05-30 19:20 ` Arend Freije
2006-05-31 7:13 ` Alexander Zarochentsev
2006-05-31 9:06 ` Arend Freije
2006-05-31 10:41 ` PFC
2006-05-30 15:31 ` Vladimir V. Saveliev