* / is no longer Reiser4 :(
@ 2005-11-19 15:15 John Gilmore
  2005-11-21 17:57 ` Hans Reiser
  0 siblings, 1 reply; 7+ messages in thread
From: John Gilmore @ 2005-11-19 15:15 UTC (permalink / raw)
  To: reiserfs-list

Following Hans's comment about the deleterious effects of 6% fragmentation, I 
attempted a manual defrag of my hard disk.

While restoring the .tar file, I had nothing better to do than watch it. And a 
good thing too! It got a recurring oops: about every other minute or so, it 
would stop with a long kernel message that mostly scrolled off of the 
screen... I thought those were supposed to show up in a log file somewhere 
if possible, but I can't find it. And it should have been possible, as the 
computer continued to run just fine.

These oopses caused some sort of data corruption - root wouldn't boot properly 
afterwards. So I reformatted as ext3 and untarred my root again. That worked 
fine, so I know it wasn't corruption of the tar file.

I took a photograph, and I'll try to type in some of it. Just looking at the 
names of the procedures, it looks like memory pressure made reiser4 flush, 
and then some of the lower level functions tried to allocate memory and 
failed. But since I don't have the top of the oops message, I can't tell.

Wait - I could've stopped the scrolling with ^S, scrolled back with Shift+PageUp, 
and photographed the whole thing! Aaaargghh....

Well, I'm not redoing it right now, I need to be getting to bed.

I may try it again later - but then maybe I'll update to 2.6.14-mm2 with the patch 
from Namesys first...

Here's the (tail end of the) oops message, sans addresses and offsets because 
I'm feeling lazy and I'm in a hurry:

mempool_alloc+0x3a/0xe0
__split_bio+0x128/0x190
in_drive_list
dm_request
generic_make_request
submit_bio
do_IRQ
reiser4_clear_page_dirty
write_jnodes_to_disk_extent
write_jnode_list
write_fq
flush_current_atom
flush_some_atom
writeout
reiser4_sync_inodes
writeback_inodes
background_writeout
pdflush
__pdflush
pdflush
background_writeout
kthread
kthread
kernel_thread_helper

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: / is no longer Reiser4 :(
  2005-11-19 15:15 / is no longer Reiser4 :( John Gilmore
@ 2005-11-21 17:57 ` Hans Reiser
  2005-11-21 19:15   ` Alexander Zarochentsev
  0 siblings, 1 reply; 7+ messages in thread
From: Hans Reiser @ 2005-11-21 17:57 UTC (permalink / raw)
  To: Alexander Zarochentcev; +Cc: John Gilmore, reiserfs-list

zam, please look into this.

Hans




* Re: / is no longer Reiser4 :(
  2005-11-21 17:57 ` Hans Reiser
@ 2005-11-21 19:15   ` Alexander Zarochentsev
  2005-11-21 19:23     ` Jake Maciejewski
  2005-11-21 23:17     ` Hans Reiser
  0 siblings, 2 replies; 7+ messages in thread
From: Alexander Zarochentsev @ 2005-11-21 19:15 UTC (permalink / raw)
  To: Hans Reiser; +Cc: John Gilmore, reiserfs-list

Hi

On Monday 21 November 2005 20:57, Hans Reiser wrote:
> zam, please look into this.

>
> Hans
>
> John Gilmore wrote:
> >Following Hans's comment about the deleterious effects of 6% fragmentation,
> > I attempted a manual defrag of my hard disk.
> >
> >While restoring the .tar file, I had nothing better to do than watch it.
> > And a good thing too! It got a recurring oops: about every other minute
> > or so, it would stop with a long kernel message that mostly scrolled off
> > of the screen... I thought those were supposed to show up in a log file
> > somewhere if possible, but I can't find it. And it should have been
> > possible, as the computer continued to run just fine.
> >
> >These oopses caused some sort of data corruption - root wouldn't boot

One bug responsible for fs corruption was fixed recently.
The fix is in 2.6.14-mm2 already.

> > properly afterwards. So I reformatted as ext3 and untarred my root again.
> > That worked fine, so I know it wasn't corruption of the tar file.

-- 
Alex.


* Re: / is no longer Reiser4 :(
  2005-11-21 19:15   ` Alexander Zarochentsev
@ 2005-11-21 19:23     ` Jake Maciejewski
  2005-11-21 19:56       ` Alexander Zarochentsev
  2005-11-21 23:17     ` Hans Reiser
  1 sibling, 1 reply; 7+ messages in thread
From: Jake Maciejewski @ 2005-11-21 19:23 UTC (permalink / raw)
  To: Alexander Zarochentsev; +Cc: Hans Reiser, John Gilmore, reiserfs-list

On Mon, 2005-11-21 at 22:15 +0300, Alexander Zarochentsev wrote:
> Hi
> 
> On Monday 21 November 2005 20:57, Hans Reiser wrote:
> > zam, please look into this.
> 
> >
> > Hans
> >
> > John Gilmore wrote:
> > >Following Hans's comment about the deleterious effects of 6% fragmentation,
> > > I attempted a manual defrag of my hard disk.
> > >
> > >While restoring the .tar file, I had nothing better to do than watch it.
> > > And a good thing too! It got a recurring oops: about every other minute
> > > or so, it would stop with a long kernel message that mostly scrolled off
> > > of the screen... I thought those were supposed to show up in a log file
> > > somewhere if possible, but I can't find it. And it should have been
> > > possible, as the computer continued to run just fine.
> > >
> > >These oopses caused some sort of data corruption - root wouldn't boot
> 
> One bug responsible for fs corruption was fixed recently.
> The fix is in 2.6.14-mm2 already.

Can we get a fix for vanilla? I haven't had problems yet, but I don't
want to run mm unless absolutely necessary, and lately I've lost
confidence in the "apply mm patches to vanilla and hope it works"
approach.

> > > properly afterwards. So I reformatted as ext3 and untarred my root again.
> > > That worked fine, so I know it wasn't corruption of the tar file.
-- 
Jake Maciejewski <maciejej@msoe.edu>



* Re: / is no longer Reiser4 :(
  2005-11-21 19:23     ` Jake Maciejewski
@ 2005-11-21 19:56       ` Alexander Zarochentsev
  0 siblings, 0 replies; 7+ messages in thread
From: Alexander Zarochentsev @ 2005-11-21 19:56 UTC (permalink / raw)
  To: Jake Maciejewski; +Cc: Hans Reiser, John Gilmore, reiserfs-list

On Monday 21 November 2005 22:23, Jake Maciejewski wrote:
> On Mon, 2005-11-21 at 22:15 +0300, Alexander Zarochentsev wrote:
> > Hi
> >
> > On Monday 21 November 2005 20:57, Hans Reiser wrote:
> > > zam, please look into this.
> > >
> > >
> > > Hans
> > >
> > > John Gilmore wrote:
> > > >Following Hans's comment about the deleterious effects of 6%
> > > > fragmentation, I attempted a manual defrag of my hard disk.
> > > >
> > > >While restoring the .tar file, I had nothing better to do than watch
> > > > it. And a good thing too! It got a recurring oops: about every other
> > > > minute or so, it would stop with a long kernel message that mostly
> > > > scrolled off of the screen... I thought those were supposed to show
> > > > up in a log file somewhere if possible, but I can't find it. And it
> > > > should have been possible, as the computer continued to run just
> > > > fine.
> > > >
> > > >These oopses caused some sort of data corruption - root wouldn't boot
> >
> > One bug responsible for fs corruption was fixed recently.
> > The fix is in 2.6.14-mm2 already.
>
> Can we get a fix for vanilla? I haven't had problems yet, but I don't
> want to run mm unless absolutely necessary, and lately I've lost
> confidence in the "apply mm patches to vanilla and hope it works"
> approach.

reiser4-for-2.6.14-1.patch.gz contains the fix as well.

The initial fix was:

--- a/as_ops.c
+++ b/as_ops.c
@@ -229,7 +229,7 @@ int reiser4_invalidatepage(struct page *
        node = jprivate(page);
        spin_lock_jnode(node);
        if (!JF_ISSET(node, JNODE_DIRTY) && !JF_ISSET(node, JNODE_FLUSH_QUEUED) &&
-           !JF_ISSET(node, JNODE_WRITEBACK)) {
+           !JF_ISSET(node, JNODE_WRITEBACK) && !JF_ISSET(node, JNODE_OVRWR)) {
                /* there is not need to capture */
                jref(node);
                JF_SET(node, JNODE_HEARD_BANSHEE);

Our git repo shows that the bug was introduced on 16 August.

>
> > > > properly afterwards. So I reformatted as ext3 and untarred my root
> > > > again. That worked fine, so I know it wasn't corruption of the tar
> > > > file.

-- 
Alex.


* Re: / is no longer Reiser4 :(
  2005-11-21 19:15   ` Alexander Zarochentsev
  2005-11-21 19:23     ` Jake Maciejewski
@ 2005-11-21 23:17     ` Hans Reiser
  2005-11-23  9:34       ` John Gilmore
  1 sibling, 1 reply; 7+ messages in thread
From: Hans Reiser @ 2005-11-21 23:17 UTC (permalink / raw)
  To: Alexander Zarochentsev; +Cc: John Gilmore, reiserfs-list

Alexander Zarochentsev wrote:

>
>One bug responsible for fs corruption was fixed recently.
>The fix is in 2.6.14-mm2 already.
>
>  
>
Then send an email titled something like "Data corruption bug was fixed,
be sure to upgrade!" to our list.


* Re: / is no longer Reiser4 :(
  2005-11-21 23:17     ` Hans Reiser
@ 2005-11-23  9:34       ` John Gilmore
  0 siblings, 0 replies; 7+ messages in thread
From: John Gilmore @ 2005-11-23  9:34 UTC (permalink / raw)
  To: reiserfs-list

On Monday 21 November 2005 23:17, Hans Reiser wrote:
> Alexander Zarochentsev wrote:
> >One bug responsible for fs corruption was fixed recently.
> >The fix is in 2.6.14-mm2 already.
>
> Then send an email titled something like "Data corruption bug was fixed,
> be sure to upgrade!" to our list.

I tried it again and got the complete oops text. It's a "soft lockup detected 
on CPU#0" message, which leads me to believe that it's a side effect of the 
sync taking a long time. I've got 1.5 gigs of memory and a very slow hard 
disk. hdparm -tT gives ~4.5 MB/s, or up to 8 MB/s if I have everything turned 
on that I can. I can't enable DMA, because hdparm refuses to do so, and I 
haven't figured out which parameters to pass to which modules to make it so 
that I can. It's also possible that my hardware is buggy, and the driver 
knows that and is refusing to enable DMA so as not to corrupt data.

I've got 2.6.14-mm2 with the latest reiser4 patch, but it's giving me loads of 
garbage like:
*** Warning: "plugin_set_compression" [fs/reiser4/plugin/compress/compress_plugins.ko] undefined!

I think that maybe the source was corrupted in the restore process (I'll have 
to do it again, later).

I'm moving on Friday/Saturday, and I don't have arrangements for internet 
access at the new digs yet, so if you've got questions, ask them now...


BUG: soft lockup detected on CPU#0

Pid: 4582, comm:              pdflush
EIP: 0060:[<f892da3b>] CPU: 0
EIP is at ide_pio_sector+0xcb/0x120 [ide_core]
 EFLAGS: 00000282    Not tainted  (2.6.14-mm1)
EAX: ec5bc000 EBX: eb531000 ECX: 00000000 EDX: 000001f0
ESI: 00000004 EDI: f893f120 EBP: 00000282 DS: 007b ES: 007b
CR0: 8005003b CR2: 0812b008 CR3: 298b8000 CR4: 000006d0
 [<f892dadd>] ide_pio_multi+0x4d/0x70 [ide_core]
 [<f892de61>] task_out_intr+0x101/0x140 [ide_core]
 [<f892836d>] ide_intr+0x7d/0x180 [ide_core]
 [<f892dd60>] task_out_intr+0x0/0x140 [ide_core]
 [<c013c22d>] handle_IRQ_event+0x3d/0x70
 [<c013c2c3>] __do_IRQ+0x63/0xc0
 [<c0105379>] do_IRQ+0x19/0x30
 [<c0103b1a>] common_interrupt+0x1a/0x20
 [<c013d6ca>] unlock_page+0xa/0x30
 [<f8a4c130>] write_jnodes_to_disk_extent+0x1b0/0x2c0 [reiser4]
 [<f8a4c4c9>] write_jnode_list+0xa9/0x110 [reiser4]
 [<f8a51483>] write_fq+0x53/0x70 [reiser4]
 [<f9a47d19>] write_prepped_nodes+0x39/0x40 [reiser4]
 [<f8a48f0c>] squeeze_right_twig+0x10c/0x160 [reiser4]
 [<f8a49156>] squeeze_right_twig_and_advance_coord+0x26/0x80 [reiser4]
 [<f8a49a84>] handle_pos_end_of_twig+0xd4/0x290 [reiser4]
 [<f8a810d5>] item_length_by_coord+0x15/0x20 [reiser4]
 [<f8a49f28>] squalloc+0x28/0x60 [reiser4]
 [<f8a4829f>] jnode_flush+0x2cf/0x340 [reiser4]
 [<f8a48519>] flush_current_atom+0xf9/0x250 [reiser4]
 [<f8a457cf>] flush_some_atom+0xaf/0x2c0 [reiser4]
 [<f8a565a4>] writeout+0x124/0x200 [reiser4]
 [<f8a52c04>] reiser4_sync_inodes+0x64/0xf0 [reiser4]
 [<c0184b0d>] writeback_inodes+0x4d/0xb0
 [<c0144118>] background_writeout+0x98/0xe0
 [<c0144cb0>] pdflush+0x0/0x30
 [<c0144c0d>] __pdflush+0xbd/0x160
 [<c0144cd6>] pdflush+0x26/0x30
 [<c0144080>] background_writeout+0x0/0xe0
 [<c0144080>] background_writeout+0x0/0xe0
 [<c012f1e6>] kthread+0xb6/0xc0
 [<c012f130>] kthread+0x0/0xc0
 [<c0101369>] kernel_thread_helper+0x5/0xc



Thread overview: 7+ messages
2005-11-19 15:15 / is no longer Reiser4 :( John Gilmore
2005-11-21 17:57 ` Hans Reiser
2005-11-21 19:15   ` Alexander Zarochentsev
2005-11-21 19:23     ` Jake Maciejewski
2005-11-21 19:56       ` Alexander Zarochentsev
2005-11-21 23:17     ` Hans Reiser
2005-11-23  9:34       ` John Gilmore
