public inbox for linux-kernel@vger.kernel.org
From: John Wendel <jwendel10@comcast.net>
To: Eric Sandeen <esandeen@redhat.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>, Jan Kara <jack@suse.cz>,
	Eric Sandeen <sandeen@sandeen.net>, Dave Jones <davej@redhat.com>,
	Andrew Morton <akpm@osdl.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: 2.6.18 ext3 panic.
Date: Wed, 11 Oct 2006 21:34:13 -0700
Message-ID: <452DC5C5.3040507@comcast.net>
In-Reply-To: <452DAA26.6080200@redhat.com>

Eric Sandeen wrote:
> Badari Pulavarty wrote:
>
>> Here is what I think is happening..
>>
>> journal_unmap_buffer() - cleaned the buffer, since it's outside EOF, but
>> it's a part of the same page. So it remained on the page->buffers
>> list. (at this time it's not part of any transaction).
>>
>> Then, ordered_commit_write() called journal_dirty_data() and we added
>> all these buffers to BJ_SyncData list. (at this time buffer is clean -
>> not dirty).
>>
>> Now msync() called __set_page_dirty_buffers() and dirtied *all* the
>> buffers attached to this page.
>>
>> journal_submit_data_buffers() got around to this buffer and tried to
>> submit the buffer...
>
> This seems about right, but one thing bothers me in the traces; it 
> seems like there is some locking that is missing.  In
> http://people.redhat.com/esandeen/traces/eric_ext3_oops1.txt
> for example, it looks like journal_dirty_data gets started, but then 
> the buffer_head is acted on by journal_unmap_buffer, which decides 
> this buffer is part of the running transaction, past EOF, and clears 
> mapped, dirty, etc.  Then journal_dirty_data picks up again, decides 
> that the buffer is not on the right list (now BJ_None) and puts it 
> back on BJ_SyncData.  Then it gets picked up by 
> journal_submit_data_buffers and submitted, and oops.
>
> Talking with Stephen, it seemed like the page lock should synchronize 
> these threads, but I've found that we can get to journal_dirty_data 
> acting on the buffer heads w/o having the page locked...
>
> I'm still digging, and, er, grasping at straws here... Am I off base?
>
> -Eric
>
>
>> Andrew is right - the only option for us is to check the filesize in the
>> write out path and skip the buffers beyond EOF.
>>
>> Thanks,
>> Badari
>>
Here's another data point for your consideration. I've been seeing this
error since I started running 2.6.18. I assumed it was hardware, so I've
tried 3 different disks (a PATA and 2 SATA drives) with VIA and Promise
controllers; the error has occurred on all of them. I see the error
infrequently, always when downloading lots of small files from Usenet
and building, copying and deleting large (200 - 300 MB) files. I haven't
ever had an oops/panic, just this error. When I run fsck, I always see a
single message that "deleted inode nnn has zero dtime". I hope this will
be useful.

Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5): ext3_free_blocks_sb: bit already cleared for block 4740550
Oct 11 20:37:32 Godzilla kernel: Aborting journal on device hda5.
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_free_blocks_sb: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_free_blocks_sb: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_reserve_inode_write: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_truncate: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_reserve_inode_write: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_orphan_del: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_reserve_inode_write: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_delete_inode: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: __journal_remove_journal_head: freeing b_committed_data
Oct 11 20:37:32 Godzilla kernel: __journal_remove_journal_head: freeing b_committed_data
Oct 11 20:37:32 Godzilla kernel: ext3_abort called.
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5): ext3_journal_start_sb: Detected aborted journal
Oct 11 20:37:32 Godzilla kernel: Remounting filesystem read-only
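
For reference, the check suggested in the quoted text (look at the file
size in the write-out path and skip buffers beyond EOF) might look
roughly like the userspace sketch below. This is only an illustration,
not the actual ext3/JBD code and not a proposed patch: the function
names block_beyond_eof() and blocks_to_submit(), and the idea of
passing the page index, page size and file size as plain integers, are
assumptions made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of "skip buffers beyond EOF".
 * None of these names exist in the kernel; they are for illustration only.
 */

/* True if the block at `offset_in_page` of page `page_index` starts at or
 * past the end of the file, i.e. it holds no valid file data. */
static bool block_beyond_eof(uint64_t page_index, uint32_t page_size,
                             uint32_t offset_in_page, uint64_t file_size)
{
    uint64_t block_start = page_index * page_size + offset_in_page;
    return block_start >= file_size;
}

/* Count the blocks of one page that a write-out path would still submit
 * if it skipped every block lying entirely beyond EOF. */
static unsigned blocks_to_submit(uint64_t page_index, uint32_t page_size,
                                 uint32_t block_size, uint64_t file_size)
{
    assert(block_size > 0 && page_size % block_size == 0);

    unsigned n = 0;
    for (uint32_t off = 0; off < page_size; off += block_size) {
        if (!block_beyond_eof(page_index, page_size, off, file_size))
            n++;   /* block overlaps file data: would be written */
        /* else: block is past EOF and would be skipped */
    }
    return n;
}
```

With a 4096-byte page, 1024-byte blocks and a 1500-byte file, only the
first two blocks of page 0 overlap file data, so the other two are the
ones that would be skipped.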

