* ocfs2 inconsistent when updating journal superblock failed
@ 2015-06-02  7:47 Joseph Qi
  2015-06-03  2:40 ` [Ocfs2-devel] " Junxiao Bi
  2015-06-04 11:26 ` Joseph Qi
  0 siblings, 2 replies; 7+ messages in thread
From: Joseph Qi @ 2015-06-02  7:47 UTC (permalink / raw)
  To: ocfs2-devel@oss.oracle.com, linux-ext4
Hi all,

If jbd2 fails to update the journal superblock because the iSCSI link is
down, ocfs2 may become inconsistent.

kernel version: 3.0.93
dmesg:
JBD2: I/O error detected when updating journal superblock for
dm-41-36.

Case description:
Node 1 was doing the checkpoint of the global bitmap.
ocfs2_commit_thread
  ocfs2_commit_cache
    jbd2_journal_flush
      jbd2_cleanup_journal_tail
        jbd2_journal_update_superblock
          sync_dirty_buffer
            submit_bh  *failed*
Since the error was ignored, jbd2_journal_flush returned 0.
ocfs2_commit_cache therefore treated the flush as successful, incremented
the trans id and woke the downconvert thread.
Node 2 could then get the lock, because the checkpoint appeared to have
completed (in fact, the bitmap on disk had been updated but the journal
superblock had not). Node 2 then updated the global bitmap as normal.
After a while, node 2 found node 1 down and began the journal recovery.
As a result, the new update by node 2 was overwritten and the filesystem
became inconsistent.

I'm not sure whether ext4 has the same problem (can it be deployed on a LUN?).
But for ocfs2, I don't think the error can be ignored.
Any ideas about this?

Thanks,
Joseph
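
The sequence in the case description can be replayed with a small toy model. This is only a sketch under simplifying assumptions: the `Journal` class, the `simulate` function and the dictionary standing in for the shared disk are hypothetical and do not correspond to real jbd2 or ocfs2 structures. It shows how a swallowed superblock write failure lets recovery overwrite the update made by node 2.

```python
# Toy model of the reported sequence. All class and function names are
# hypothetical; none of this is real jbd2 or ocfs2 code.

class Journal:
    """Node 1's journal: logged updates plus the on-disk replay start."""

    def __init__(self):
        self.log = []                # logged bitmap updates, oldest first
        self.sb_start = 0            # on-disk superblock: first entry to replay
        self.sb_write_fails = False  # set to True to model the I/O error

    def flush(self, disk):
        # Checkpoint: write every logged update to its final location.
        for value in self.log[self.sb_start:]:
            disk["bitmap"] = value
        # Advance the replay start in the journal superblock.
        if not self.sb_write_fails:
            self.sb_start = len(self.log)
        return 0                     # the failure is swallowed, as in the report

    def replay(self, disk):
        # Recovery starts from whatever the on-disk superblock says.
        for value in self.log[self.sb_start:]:
            disk["bitmap"] = value


def simulate():
    disk = {"bitmap": "initial"}          # shared storage seen by both nodes
    node1_journal = Journal()
    node1_journal.log.append("node1-update")
    node1_journal.sb_write_fails = True   # link drops before the superblock write
    if node1_journal.flush(disk) == 0:
        # Node 1 reports the checkpoint as done, so node 2 gets the lock
        # and writes its own update.
        disk["bitmap"] = "node2-update"
    # Node 1 goes down; node 2 recovers node 1's journal.
    node1_journal.replay(disk)
    return disk["bitmap"]


print(simulate())   # prints "node1-update": node 2's update has been lost
```

In this model the only thing that goes wrong is the return value of `flush`; everything after it follows from node 2 trusting that the checkpoint completed.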
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Ocfs2-devel] ocfs2 inconsistent when updating journal superblock failed
  2015-06-02  7:47 ocfs2 inconsistent when updating journal superblock failed Joseph Qi
@ 2015-06-03  2:40 ` Junxiao Bi
  2015-06-03  3:52   ` Joseph Qi
  2015-06-04 11:26 ` Joseph Qi
  1 sibling, 1 reply; 7+ messages in thread
From: Junxiao Bi @ 2015-06-03  2:40 UTC (permalink / raw)
  To: Joseph Qi, ocfs2-devel@oss.oracle.com, linux-ext4
Hi Joseph,

On 06/02/2015 03:47 PM, Joseph Qi wrote:
> Node 2 could then get the lock, because the checkpoint appeared to have
> completed (in fact, the bitmap on disk had been updated but the journal
> superblock had not). Node 2 then updated the global bitmap as normal.
> After a while, node 2 found node 1 down and began the journal recovery.
> As a result, the new update by node 2 was overwritten and the filesystem
> became inconsistent.

If this is the case, it seems to be a generic issue. Assume a two-node
cluster: node 1 updates the global bitmap, and the transaction for this
update has been written into node 1's journal. Then node 2 updates the
global bitmap. After that, node 1 crashes, and node 2 replays node 1's
journal and overwrites the global bitmap with the old one. Am I missing
something?

Thanks,
Junxiao.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Ocfs2-devel] ocfs2 inconsistent when updating journal superblock failed
  2015-06-03  2:40 ` [Ocfs2-devel] " Junxiao Bi
@ 2015-06-03  3:52   ` Joseph Qi
  2015-06-03  6:58     ` Junxiao Bi
  0 siblings, 1 reply; 7+ messages in thread
From: Joseph Qi @ 2015-06-03  3:52 UTC (permalink / raw)
  To: Junxiao Bi; +Cc: ocfs2-devel@oss.oracle.com, linux-ext4
Hi Junxiao,

On 2015/6/3 10:40, Junxiao Bi wrote:
> If this is the case, it seems to be a generic issue. Assume a two-node
> cluster: node 1 updates the global bitmap, and the transaction for this
> update has been written into node 1's journal. Then node 2 updates the
> global bitmap. After that, node 1 crashes, and node 2 replays node 1's
> journal and overwrites the global bitmap with the old one. Am I missing
> something?

In the normal case, node 2 can update the global bitmap only after it has
got the lock, and this ensures that node 1 has already done the
checkpoint.
For the case described above, one condition is that the two updates are
on the same gd. Right after the journal data has been flushed, updating
the journal superblock fails, which means sb_start still points to the
old log block number.
The journal replay during recovery will then write the old update again.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: ocfs2 inconsistent when updating journal superblock failed
  2015-06-03  3:52   ` Joseph Qi
@ 2015-06-03  6:58     ` Junxiao Bi
  2015-06-03  7:27       ` [Ocfs2-devel] " Joseph Qi
  0 siblings, 1 reply; 7+ messages in thread
From: Junxiao Bi @ 2015-06-03  6:58 UTC (permalink / raw)
  To: Joseph Qi; +Cc: linux-ext4, ocfs2-devel@oss.oracle.com
Hi Joseph,

On 06/03/2015 11:52 AM, Joseph Qi wrote:
> In the normal case, node 2 can update the global bitmap only after it has
> got the lock, and this ensures that node 1 has already done the
> checkpoint.

Yes, you are right.

> For the case described above, one condition is that the two updates are
> on the same gd. Right after the journal data has been flushed, updating
> the journal superblock fails, which means sb_start still points to the
> old log block number.
> The journal replay during recovery will then write the old update again.

Right. This also seems to be an issue for ext4. In
__jbd2_update_log_tail(), the journal starting block and seq id in
memory are updated even if updating the journal superblock on disk
fails. If the starting blocks are reused and a power-down happens, the
journal replay will corrupt the fs. I think we should return the error.

Thanks,
Junxiao.
^ permalink raw reply [flat|nested] 7+ messages in thread
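
The divergence described in the message above, where the in-memory tail advances while the on-disk journal superblock keeps the old starting block, can be illustrated with a toy circular log. This is a sketch only: `ToyJournal`, its fields and `demo` are hypothetical and are not the jbd2 implementation. It shows only that the block the stale superblock points at no longer holds the transaction the superblock expects; what real recovery does in that situation is not modelled here.

```python
# Toy circular journal. Field and method names are hypothetical and only
# loosely inspired by the discussion; this is not the jbd2 implementation.

class ToyJournal:
    def __init__(self, nblocks):
        self.blocks = [None] * nblocks          # each slot holds a transaction id
        self.head = 0                           # next slot to write
        self.free = nblocks                     # free slots, in-memory view
        self.mem_tail = {"start": 0, "seq": 1}  # in-memory tail
        self.disk_sb = {"start": 0, "seq": 1}   # what recovery would read
        self.sb_write_ok = True

    def log(self, tid):
        if self.free == 0:
            raise RuntimeError("journal full")
        self.blocks[self.head] = tid
        self.head = (self.head + 1) % len(self.blocks)
        self.free -= 1

    def update_log_tail(self, start, seq, freed):
        if self.sb_write_ok:
            self.disk_sb = {"start": start, "seq": seq}
        # The in-memory state advances whether or not the write succeeded.
        self.mem_tail = {"start": start, "seq": seq}
        self.free += freed


def demo():
    j = ToyJournal(4)
    for tid in (1, 2, 3, 4):
        j.log(tid)                 # the journal is now full
    j.sb_write_ok = False          # the superblock write will fail silently
    j.update_log_tail(start=2, seq=3, freed=2)  # transactions 1 and 2 checkpointed
    j.log(5)                       # reuses slot 0
    j.log(6)                       # reuses slot 1
    return j.disk_sb, j.blocks[j.disk_sb["start"]]


sb, found = demo()
print(sb, found)   # the superblock still expects seq 1 at slot 0, which now holds 5
```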
* Re: [Ocfs2-devel] ocfs2 inconsistent when updating journal superblock failed
  2015-06-03  6:58     ` Junxiao Bi
@ 2015-06-03  7:27       ` Joseph Qi
  2015-06-03  7:38         ` Junxiao Bi
  0 siblings, 1 reply; 7+ messages in thread
From: Joseph Qi @ 2015-06-03  7:27 UTC (permalink / raw)
  To: Junxiao Bi; +Cc: ocfs2-devel@oss.oracle.com, linux-ext4
On 2015/6/3 14:58, Junxiao Bi wrote:
> Right. This also seems to be an issue for ext4. In
> __jbd2_update_log_tail(), the journal starting block and seq id in
> memory are updated even if updating the journal superblock on disk
> fails. If the starting blocks are reused and a power-down happens, the
> journal replay will corrupt the fs. I think we should return the error.

For ext4, since it is on a local disk, I'm not sure whether ext4 can keep
running after updating the journal superblock fails.
If it is only a power-down, I don't think it will cause the same issue.
During the next mount, it just rewrites the data, without corruption.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Ocfs2-devel] ocfs2 inconsistent when updating journal superblock failed
  2015-06-03  7:27       ` [Ocfs2-devel] " Joseph Qi
@ 2015-06-03  7:38         ` Junxiao Bi
  0 siblings, 0 replies; 7+ messages in thread
From: Junxiao Bi @ 2015-06-03  7:38 UTC (permalink / raw)
  To: Joseph Qi; +Cc: ocfs2-devel@oss.oracle.com, linux-ext4
On 06/03/2015 03:27 PM, Joseph Qi wrote:
> For ext4, since it is on a local disk, I'm not sure whether ext4 can keep
> running after updating the journal superblock fails.
> If it is only a power-down, I don't think it will cause the same issue.
> During the next mount, it just rewrites the data, without corruption.

Besides a power-down, the old starting block recorded in the on-disk
journal superblock may have been overwritten. Could journal replay
starting from this overwritten block fail?

Thanks,
Junxiao.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Ocfs2-devel] ocfs2 inconsistent when updating journal superblock failed
  2015-06-02  7:47 ocfs2 inconsistent when updating journal superblock failed Joseph Qi
  2015-06-03  2:40 ` [Ocfs2-devel] " Junxiao Bi
@ 2015-06-04 11:26 ` Joseph Qi
  1 sibling, 0 replies; 7+ messages in thread
From: Joseph Qi @ 2015-06-04 11:26 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: ocfs2-devel@oss.oracle.com, linux-ext4
Hi Ted,

I have gone through the latest jbd2 code. Although some functions have
been refactored, the error is still ignored when updating the superblock
fails.
I want to return the error to the caller, so that ocfs2_commit_cache
fails without incrementing the trans id and thus prevents the other node
from doing the update. Only after it has recovered the failed node can it
proceed with the update.
But this may affect some flows in jbd2. Could you please give your input
on how to fix this issue?

On 2015/6/2 15:47, Joseph Qi wrote:
> If jbd2 fails to update the journal superblock because the iSCSI link is
> down, ocfs2 may become inconsistent.
^ permalink raw reply [flat|nested] 7+ messages in thread
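
The proposal in the last message, returning the error so that the commit path neither increments the trans id nor wakes the downconvert thread, can be sketched as follows. This is a hedged illustration, not a patch: `Node`, `journal_flush`, `commit_cache` and the `propagate` flag are hypothetical stand-ins for the functions named in the thread, and the real control flow in jbd2 and ocfs2 is more involved.

```python
# Sketch of the proposed error propagation. All names are hypothetical
# stand-ins; this is not the actual jbd2 or ocfs2 code.

EIO = 5


class Node:
    def __init__(self):
        self.trans_id = 0
        self.downconvert_woken = False


def journal_flush(sb_write_ok, propagate):
    """Return 0 on success, or -EIO if the failure is propagated."""
    if not sb_write_ok:
        # Behaviour reported in the thread: the error is dropped (propagate=False).
        # Proposed behaviour: hand it back to the caller (propagate=True).
        return -EIO if propagate else 0
    return 0


def commit_cache(node, sb_write_ok, propagate):
    status = journal_flush(sb_write_ok, propagate)
    if status < 0:
        # Checkpoint not known to be complete: keep the trans id unchanged
        # and do not wake the downconvert thread, so the lock is not released.
        return status
    node.trans_id += 1
    node.downconvert_woken = True
    return 0


reported = Node()
commit_cache(reported, sb_write_ok=False, propagate=False)
print(reported.trans_id, reported.downconvert_woken)   # 1 True

proposed = Node()
print(commit_cache(proposed, sb_write_ok=False, propagate=True))  # -5
print(proposed.trans_id, proposed.downconvert_woken)   # 0 False
```

With the error propagated, the other node cannot obtain the lock on the strength of a checkpoint that did not fully complete, which is the property the message asks for.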