linux-ext4.vger.kernel.org archive mirror
* [PATCH 0/6] jbd2: Checkpointing fix and cleanups
@ 2012-01-11  0:31 Jan Kara
From: Jan Kara @ 2012-01-11  0:31 UTC
  To: Ted Tso; +Cc: linux-ext4


  Hello,

  For some time I have been chasing occasional reports of filesystem
corruption, most often showing up as 'bit already cleared' errors. I have
been seeing such reports for several years at a rate of about 1 - 2 per year.
At first I attributed them to memory errors (and some of those reports might
indeed be due to HW problems) but some of them probably are not. Recently I
got one such report and the user was nice enough to get me an e2image of the
corrupted filesystem, from which it was more or less obvious that during a
crash we lost writes to some blocks (a bitmap was among them).

I think the problem is due to a missing cache flush in the checkpointing code
(see patch 1 for details). I've tweaked Chris Mason's barrier-test IO
scheduler to be evil and reorder requests in just the right way, and indeed I
was able to trigger the fs corruption after a crash.
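
To illustrate the kind of failure a missing flush can cause, here is a
minimal userspace sketch. It is not the jbd2 code and not the barrier-test
scheduler: the Disk class, the block names and the single-transaction journal
are all hypothetical simplifications. It assumes a device with a volatile
write cache that may persist cached writes in any order before a crash.

```python
class Disk:
    """Toy block device with a volatile write cache (hypothetical model)."""

    def __init__(self):
        self.stable = {}  # contents that survive a crash
        self.cache = {}   # writes acknowledged but not yet durable

    def write(self, block, data):
        self.cache[block] = data

    def flush(self):
        # A cache flush makes every previously acknowledged write durable.
        self.stable.update(self.cache)
        self.cache.clear()

    def crash(self, persisted):
        # Adversarial reordering: only the cached blocks named in
        # `persisted` reached the media before power was lost.
        for block in persisted:
            if block in self.cache:
                self.stable[block] = self.cache[block]
        self.cache.clear()


def checkpoint(disk, flush_before_tail_update):
    # Checkpointing writes the metadata block to its final location ...
    disk.write("bitmap", "new")
    if flush_before_tail_update:
        disk.flush()
    # ... and then advances the journal tail past transaction 1, so
    # recovery will no longer replay that transaction.
    disk.write("journal_tail", 2)


def recover(disk):
    # Assumption: transaction 1 in the journal holds the new bitmap and
    # is replayed only if the on-disk tail still points at it.
    if disk.stable["journal_tail"] == 1:
        disk.stable["bitmap"] = "new"
    return disk.stable["bitmap"]


def run(flush_before_tail_update):
    disk = Disk()
    disk.stable = {"bitmap": "old", "journal_tail": 1}
    checkpoint(disk, flush_before_tail_update)
    # The tail update reaches the media, the cached bitmap write does not.
    disk.crash(persisted=["journal_tail"])
    return recover(disk)
```

In this model run(False) leaves the stale bitmap on disk after recovery,
because the tail moved but the checkpointed block never became durable, while
run(True) keeps the new bitmap. Where exactly the flush belongs in the real
code is what patch 1 is about; the sketch only shows why some flush between
the two writes is needed.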

While inspecting the checkpointing code, I also found several things that
deserve a cleanup, so patches 2-5 are the result of that. Finally, patch 6 is
a possible speedup - we can use the barriers that happen during transaction
commits to push the journal tail safely. The observable speedup is debatable
since jbd2_cleanup_journal_tail() is called rather rarely (for a
metadata-heavy load I saw about one jbd2_cleanup_journal_tail() call per 200
commits), so the cost of the additional cache flush will likely be in the
noise. But the patch is simple enough, so I'm sending it for others to judge
whether it makes sense or not.

Review is highly welcome.

								Honza


Thread overview: 12+ messages
2012-01-11  0:31 [PATCH 0/6] jbd2: Checkpointing fix and cleanups Jan Kara
2012-01-11  0:31 ` [PATCH 1/6] jbd2: Issue cache flush after checkpointing even with internal journal Jan Kara
2012-01-11 12:49   ` Jan Kara
2012-02-09  3:05     ` Ted Ts'o
2012-02-09  5:26       ` Theodore Tso
2012-02-10 13:58         ` Jan Kara
2012-02-10 13:55       ` Jan Kara
2012-01-11  0:31 ` [PATCH 2/6] jbd2: Fix BH_JWrite setting in checkpointing code Jan Kara
2012-01-11  0:31 ` [PATCH 3/6] jbd2: __jbd2_journal_temp_unlink_buffer() is static Jan Kara
2012-01-11  0:31 ` [PATCH 4/6] jbd2: Remove always true condition in __journal_try_to_free_buffer() Jan Kara
2012-01-11  0:31 ` [PATCH 5/6] jbd2: Remove bh_state lock from checkpointing code Jan Kara
2012-01-11  0:31 ` [PATCH 6/6] jbd2: Cleanup journal tail after transaction commit Jan Kara
