From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Zarochentsev
Subject: Re: / is no longer Reiser4 :(
Date: Mon, 21 Nov 2005 22:56:36 +0300
Message-ID: <200511212256.36359.zam@namesys.com>
References: <200511191515.48570.jgilmore@glycou.com> <200511212215.47467.zam@namesys.com> <1132601006.16002.9.camel@gentoo>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path:
list-help:
list-unsubscribe:
list-post:
Errors-To: flx@namesys.com
In-Reply-To: <1132601006.16002.9.camel@gentoo>
Content-Disposition: inline
List-Id:
Content-Type: text/plain; charset="us-ascii"
To: Jake Maciejewski
Cc: Hans Reiser, John Gilmore, reiserfs-list@namesys.com

On Monday 21 November 2005 22:23, Jake Maciejewski wrote:
> On Mon, 2005-11-21 at 22:15 +0300, Alexander Zarochentsev wrote:
> > Hi
> >
> > On Monday 21 November 2005 20:57, Hans Reiser wrote:
> > > zam, please look into this.
> > >
> > >
> > > Hans
> > >
> > > John Gilmore wrote:
> > > >Following Hans's comment about the deleterious effects of 6%
> > > > fragmentation, I attempted a manual defrag of my hard disk.
> > > >
> > > >While restoring the .tar file, I had nothing better to do than watch
> > > > it. And a good thing too! It got a recurring oops. About every other
> > > > minute or so, it would stop with a long kernel message that mostly
> > > > scrolled off of the screen... I thought those were supposed to show
> > > > up in a log file somewhere if possible, but I can't find it. And it
> > > > should have been possible, as the computer continued to run just
> > > > fine.
> > > >
> > > >These oopses caused some sort of data corruption - root wouldn't boot
> >
> > one bug responsible for fs corruption was fixed recently.
> > the fix is in 2.6.14-mm2 already.
>
> Can we get a fix for vanilla? I haven't had problems yet, but I don't
> want to run mm unless absolutely necessary, and lately I've lost
> confidence in the "apply mm patches to vanilla and hope it works"
> approach.

reiser4-for-2.6.14-1.patch.gz contains the fix as well. The initial fix was:

--- a/as_ops.c
+++ b/as_ops.c
@@ -229,7 +229,7 @@ int reiser4_invalidatepage(struct page *
 	node = jprivate(page);
 	spin_lock_jnode(node);
 	if (!JF_ISSET(node, JNODE_DIRTY) && !JF_ISSET(node, JNODE_FLUSH_QUEUED) &&
-	    !JF_ISSET(node, JNODE_WRITEBACK)) {
+	    !JF_ISSET(node, JNODE_WRITEBACK) && !JF_ISSET(node, JNODE_OVRWR)) {
 		/* there is not need to capture */
 		jref(node);
 		JF_SET(node, JNODE_HEARD_BANSHEE);

Our git repo shows that the bug was introduced on 16 August.

>
> > > > properly afterwards. So I reformatted as ext3 and untarred my root
> > > > again. That worked fine, so I know it wasn't corruption of the tar
> > > > file.
> > > >
> > > >I took a photograph, and I'll try to type in some of it. Just looking
> > > > at the names of the procedures, it looks like memory pressure made
> > > > reiser4 flush, and then some of the lower level functions tried to
> > > > allocate memory and failed. But since I don't have the top of the
> > > > oops message, I can't tell.
> > > >
> > > >Wait - I could've stopped the scrolling with ^S, scrolled back with
> > > > ^pageup, and photoed the whole thing! Aaaargghh....
> > > >
> > > >Well, I'm not redoing it right now, I need to be getting to bed.
> > > >
> > > >I may try it again later - but then maybe I'll update to 2.6.14-mm2
> > > > with patch from namesys first...
> > > >
> > > >Here's the (tail end of the) oops message, sans addresses and offsets
> > > > because I'm feeling lazy and I'm in a hurry:
> > > >
> > > >mempool_alloc+0x3a/0xe0
> > > >__split_bio+0x128/0x190
> > > >in_drive_list
> > > >dm_request
> > > >generic_make_request
> > > >submit_bio
> > > >do_IRQ
> > > >reiser4_clear_page_dirty
> > > >write_jnodes_to_disk_extent
> > > >write_jnode_list
> > > >write_fq
> > > >flush_current_atom
> > > >flush_some_atom
> > > >writeout
> > > >reiser4_sync_inodes
> > > >writeback_inodes
> > > >background_writeout
> > > >pdflush
> > > >__pdflush
> > > >pdflush
> > > >background_writeout
> > > >kthread
> > > >kthread
> > > >kernel_thread_helper

-- 
Alex.