* [Ocfs2-devel] How can ecc be corrected?
From: Goldwyn Rodrigues @ 2011-06-17 15:55 UTC
To: ocfs2-devel

Hi,

I am not able to understand the use of metaecc or the ECC in the
metadata. All the metadata contain the ecc to check if the data
written to the block is sane, but what happens in case the ecc does
not match? All it does is fail in case it does not match. There does
not seem to be a way to correct it.

fsck simply fails in ocfs2_read_inode, (or in some cases such as
superblock inode (2) does not even check) if the ecc does not match.
What is the best way to correct ecc errors? I understand that an
incorrect ECC means the data might be corrupt, but what if we want to
recover? Or is it not meant to be corrected at all?

Regards,

--
Goldwyn

* [Ocfs2-devel] How can ecc be corrected?
From: Sunil Mushran @ 2011-06-17 16:53 UTC
To: ocfs2-devel

On 06/17/2011 08:55 AM, Goldwyn Rodrigues wrote:
> I am not able to understand the use of metaecc or the ECC in the
> metadata. All the metadata contain the ecc to check if the data
> written to the block is sane, but what happens in case the ecc does
> not match? All it does is fail in case it does not match. There does
> not seem to be a way to correct it.
>
> fsck simply fails in ocfs2_read_inode, (or in some cases such as
> superblock inode (2) does not even check) if the ecc does not match.
> What is the best way to correct ecc errors? I understand that an
> incorrect ECC means the data might be corrupt, but what if we want to
> recover? Or is it not meant to be corrected at all?

I think originally our thought was that bad checksum means bad block.
But we are wiser now. As in, while that works in the fs, we could do a
better job in the tools. And that's the reason it is not yet enabled
by default.

If you have ideas, do share.

* [Ocfs2-devel] How can ecc be corrected?
From: Goldwyn Rodrigues @ 2011-06-17 18:50 UTC
To: ocfs2-devel

On Fri, Jun 17, 2011 at 11:53 AM, Sunil Mushran
<sunil.mushran@oracle.com> wrote:
> On 06/17/2011 08:55 AM, Goldwyn Rodrigues wrote:
>> I am not able to understand the use of metaecc or the ECC in the
>> metadata. All the metadata contain the ecc to check if the data
>> written to the block is sane, but what happens in case the ecc does
>> not match? All it does is fail in case it does not match. There does
>> not seem to be a way to correct it.
>>
>> fsck simply fails in ocfs2_read_inode, (or in some cases such as
>> superblock inode (2) does not even check) if the ecc does not match.
>> What is the best way to correct ecc errors? I understand that an
>> incorrect ECC means the data might be corrupt, but what if we want to
>> recover? Or is it not meant to be corrected at all?
>
> I think originally our thought was that bad checksum means bad block. But
> we are wiser now. As in, while that works in the fs, we could do a better
> job in the tools. And that's the reason it is not yet enabled by default.

So, what is the plan in the future? Do you intend to put it as a
default option or let things be as is?

In any case, I agree we should modify tools to correct the filesystem
(fsck) if the filesystem fails due to metaecc error or else we could
end up having an unusable filesystem. It sure is a good debugging tool
for development purposes though.

> If you have ideas, do share.

No ideas as such. I raised this question because a customer was facing
this issue with the superblock and no way to fix it. Fortunately, he
can still use the filesystem. It is debugfs.ocfs2 which is failing. I
guess I will have to work on a patch to fix this.

--
Goldwyn

* [Ocfs2-devel] How can ecc be corrected?
From: Sunil Mushran @ 2011-06-17 19:14 UTC
To: ocfs2-devel

On 06/17/2011 11:50 AM, Goldwyn Rodrigues wrote:
> On Fri, Jun 17, 2011 at 11:53 AM, Sunil Mushran
> <sunil.mushran@oracle.com> wrote:
>> On 06/17/2011 08:55 AM, Goldwyn Rodrigues wrote:
>>> I am not able to understand the use of metaecc or the ECC in the
>>> metadata. All the metadata contain the ecc to check if the data
>>> written to the block is sane, but what happens in case the ecc does
>>> not match? All it does is fail in case it does not match. There does
>>> not seem to be a way to correct it.
>>>
>>> fsck simply fails in ocfs2_read_inode, (or in some cases such as
>>> superblock inode (2) does not even check) if the ecc does not match.
>>> What is the best way to correct ecc errors? I understand that an
>>> incorrect ECC means the data might be corrupt, but what if we want to
>>> recover? Or is it not meant to be corrected at all?
>> I think originally our thought was that bad checksum means bad block. But
>> we are wiser now. As in, while that works in the fs, we could do a better
>> job in the tools. And that's the reason it is not yet enabled by default.
>>
> So, what is the plan in the future? Do you intend to put it as a
> default option or let things be as is?
>
> In any case, I agree we should modify tools to correct the filesystem
> (fsck) if the filesystem fails due to metaecc error or else we could
> end up having an unusable filesystem. It sure is a good debugging tool
> for development purposes though.

Oh absolutely it will be made a default. But we have to address this
shortcoming first.

>> If you have ideas, do share.
> No ideas as such. I raised this question because a customer was facing
> this issue with the superblock and no way to fix it. Fortunately, he
> can still use the filesystem. It is debugfs.ocfs2 which is failing. I
> guess I will have to work on a patch to fix this.

So I remember we had a bug in tunefs that changed the superblock
without recomputing the checksum. It has been fixed since.

How can he still use the fs?

One solution is to disable it... manually. And then re-enable it using
the latest tunefs.

* [Ocfs2-devel] How can ecc be corrected?
From: Joel Becker @ 2011-06-17 23:16 UTC
To: ocfs2-devel

On Fri, Jun 17, 2011 at 12:14:36PM -0700, Sunil Mushran wrote:
> >> If you have ideas, do share.
> > No ideas as such. I raised this question because a customer was facing
> > this issue with the superblock and no way to fix it. Fortunately, he
> > can still use the filesystem. It is debugfs.ocfs2 which is failing. I
> > guess I will have to work on a patch to fix this.
>
> So I remember we had a bug in tunefs that changed the superblock
> without recomputing the checksum. It has been fixed since.
>
> How can he still use the fs?
>
> One solution is to disable it... manually. And then re-enable it using
> the latest tunefs.

	I thought we were going to patch fsck.ocfs2 to run in an
ignore-metaecc mode?

Joel

--

"Hey mister if you're gonna walk on water,
 Could you drop a line my way?"

			http://www.jlbec.org/
			jlbec at evilplan.org

* [Ocfs2-devel] How can ecc be corrected?
From: Sunil Mushran @ 2011-06-20 16:22 UTC
To: ocfs2-devel

On 06/17/2011 04:16 PM, Joel Becker wrote:
> On Fri, Jun 17, 2011 at 12:14:36PM -0700, Sunil Mushran wrote:
>>>> If you have ideas, do share.
>>> No ideas as such. I raised this question because a customer was facing
>>> this issue with the superblock and no way to fix it. Fortunately, he
>>> can still use the filesystem. It is debugfs.ocfs2 which is failing. I
>>> guess I will have to work on a patch to fix this.
>> So I remember we had a bug in tunefs that changed the superblock
>> without recomputing the checksum. It has been fixed since.
>>
>> How can he still use the fs?
>>
>> One solution is to disable it... manually. And then re-enable it using
>> the latest tunefs.
> I thought we were going to patch fsck.ocfs2 to run in an
> ignore-metaecc mode?

Oh I did not know we had decided on that. Though that appears to be the
best solution. fsck and debugfs always run in ignore-metaecc mode. fsck
will need fixup code for that.

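The approach discussed above (read metadata while ignoring a checksum mismatch, then have fsck rewrite a fresh checksum) can be sketched roughly as follows. This is a toy Python illustration only, not the fsck.ocfs2 or debugfs.ocfs2 code: the names `make_block`, `read_block`, `fixup_checksum`, and `ChecksumError` are hypothetical, a dict stands in for an on-disk block, and a plain CRC32 stands in for the real block check. A real fixup would presumably validate the block contents by other means before trusting them.

```python
import zlib


class ChecksumError(Exception):
    """Raised when a block's stored checksum does not match its data."""


def make_block(data: bytes) -> dict:
    """Build a toy 'block': payload plus a stored CRC32 of that payload."""
    return {"data": data, "crc": zlib.crc32(data)}


def read_block(block: dict, ignore_checksum: bool = False) -> bytes:
    """Return the payload; fail on a checksum mismatch unless told to ignore it."""
    if not ignore_checksum and zlib.crc32(block["data"]) != block["crc"]:
        raise ChecksumError("FAILED CHECKSUM")
    return block["data"]


def fixup_checksum(block: dict) -> bool:
    """Recompute the stored checksum. Returns True if it was stale and rewritten."""
    fresh = zlib.crc32(block["data"])
    if block["crc"] == fresh:
        return False
    block["crc"] = fresh
    return True


# Simulate a tool that edits the block without recomputing the checksum.
sb = make_block(b"label=old")
sb["data"] = b"label=new"

# A strict reader refuses the block; an ignore-checksum reader can still
# get at the contents, and a fixup pass makes strict reads work again.
contents = read_block(sb, ignore_checksum=True)
fixup_checksum(sb)
assert read_block(sb) == contents
```

The point of the split is that the strict path stays the default, while only the repair tools opt in to reading past a bad checksum.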
* [Ocfs2-devel] How can ecc be corrected?
From: Goldwyn Rodrigues @ 2011-06-20 17:34 UTC
To: ocfs2-devel

On Mon, Jun 20, 2011 at 11:22 AM, Sunil Mushran
<sunil.mushran@oracle.com> wrote:
> On 06/17/2011 04:16 PM, Joel Becker wrote:
>> On Fri, Jun 17, 2011 at 12:14:36PM -0700, Sunil Mushran wrote:
>>>>> If you have ideas, do share.
>>>>
>>>> No ideas as such. I raised this question because a customer was facing
>>>> this issue with the superblock and no way to fix it. Fortunately, he
>>>> can still use the filesystem. It is debugfs.ocfs2 which is failing. I
>>>> guess I will have to work on a patch to fix this.
>>>
>>> So I remember we had a bug in tunefs that changed the superblock
>>> without recomputing the checksum. It has been fixed since.
>>>
>>> How can he still use the fs?
>>>
>>> One solution is to disable it... manually. And then re-enable it using
>>> the latest tunefs.
>>
>> I thought we were going to patch fsck.ocfs2 to run in an
>> ignore-metaecc mode?
>
> Oh I did not know we had decided on that. Though that appears to be the
> best solution. fsck and debugfs always run in ignore-metaecc mode. fsck
> will need fixup code for that.

Cool. I have sent a set of 3 patches to the tools mailing list. Let me
know if it works for you.

--
Goldwyn

* [Ocfs2-devel] How can ecc be corrected?
From: Goldwyn Rodrigues @ 2011-06-19 4:13 UTC
To: ocfs2-devel

On Fri, Jun 17, 2011 at 2:14 PM, Sunil Mushran
<sunil.mushran@oracle.com> wrote:
> On 06/17/2011 11:50 AM, Goldwyn Rodrigues wrote:
>> On Fri, Jun 17, 2011 at 11:53 AM, Sunil Mushran
>> <sunil.mushran@oracle.com> wrote:
>>> On 06/17/2011 08:55 AM, Goldwyn Rodrigues wrote:
>>>> I am not able to understand the use of metaecc or the ECC in the
>>>> metadata. All the metadata contain the ecc to check if the data
>>>> written to the block is sane, but what happens in case the ecc does
>>>> not match? All it does is fail in case it does not match. There does
>>>> not seem to be a way to correct it.
>>>>
>>>> fsck simply fails in ocfs2_read_inode, (or in some cases such as
>>>> superblock inode (2) does not even check) if the ecc does not match.

Oh, I was wrong about this. I patched fswreck to mess_up the superblock
ECC values real bad, and neither mount nor fsck worked. But an error
within correctable limits will be ignored and block_check will remain
the same. At this state, there is no way to revive the fs. Like Joel
mentioned, we need to ignore-metaecc for fsck to correct it.

>>>> What is the best way to correct ecc errors? I understand that an
>>>> incorrect ECC means the data might be corrupt, but what if we want to
>>>> recover? Or is it not meant to be corrected at all?
>>>
>>> I think originally our thought was that bad checksum means bad block. But
>>> we are wiser now. As in, while that works in the fs, we could do a
>>> better job in the tools. And that's the reason it is not yet enabled by
>>> default.
>>>
>> So, what is the plan in the future? Do you intend to put it as a
>> default option or let things be as is?
>>
>> In any case, I agree we should modify tools to correct the filesystem
>> (fsck) if the filesystem fails due to metaecc error or else we could
>> end up having an unusable filesystem. It sure is a good debugging tool
>> for development purposes though.
>
> Oh absolutely it will be made a default. But we have to address this
> shortcoming first.
>
>>> If you have ideas, do share.
>>
>> No ideas as such. I raised this question because a customer was facing
>> this issue with the superblock and no way to fix it. Fortunately, he
>> can still use the filesystem. It is debugfs.ocfs2 which is failing. I
>> guess I will have to work on a patch to fix this.
>
> So I remember we had a bug in tunefs that changed the superblock
> without recomputing the checksum. It has been fixed since.
>
> How can he still use the fs?

I suppose it is still in the correctable limits. By failing I meant a
"stat" output in debugfs gives a "FAILED CHECKSUM" error.

On reading more I found we are not writing the superblock anywhere in
kernel module and perhaps the reason the block_check values remain
unchanged. PCMIIW.

This brings me to the next question: Why don't we use mnt_count? The
fact that it is distributed makes life complicated, but still...

> One solution is to disable it... manually. And then re-enable it using
> the latest tunefs.

--
Goldwyn

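On the "correctable limits" point above: a block check that pairs a CRC with an error-locating code can quietly repair a single flipped bit in memory, while heavier damage fails outright. The Python sketch below is a toy illustration of that general idea, not the OCFS2 implementation. The names `position_xor`, `compute_check`, and `validate` are hypothetical, the locating code is a simple XOR of set-bit positions rather than whatever the filesystem actually uses, and the sketch assumes the stored check values themselves are intact.

```python
import zlib


def position_xor(data) -> int:
    """XOR together the 1-based positions of every set bit in data."""
    code = 0
    for byte_index, byte in enumerate(data):
        for bit in range(8):
            if byte & (1 << bit):
                code ^= byte_index * 8 + bit + 1
    return code


def compute_check(data: bytes) -> tuple:
    """Return (crc, ecc) for a block, as a writer would store them."""
    return zlib.crc32(data), position_xor(data)


def validate(data: bytearray, stored_crc: int, stored_ecc: int) -> str:
    """Check a block in place; try a single-bit repair if the CRC fails."""
    if zlib.crc32(data) == stored_crc:
        return "ok"
    # For a single flipped bit, stored XOR recomputed is that bit's position.
    syndrome = position_xor(data) ^ stored_ecc
    if 0 < syndrome <= len(data) * 8:
        pos = syndrome - 1
        data[pos // 8] ^= 1 << (pos % 8)
        if zlib.crc32(data) == stored_crc:
            return "corrected"
        data[pos // 8] ^= 1 << (pos % 8)  # repair did not help, undo it
    return "failed"


original = bytearray(b"ocfs2 metadata block")
crc, ecc = compute_check(bytes(original))

block = bytearray(original)
block[3] ^= 0x10                      # one flipped bit: repairable
status_single = validate(block, crc, ecc)

block[0] ^= 0x01                      # two flipped bits: beyond this scheme
block[5] ^= 0x04
status_double = validate(block, crc, ecc)
```

Note that the repair here happens only on the in-memory copy; the stored `crc` and `ecc` are never rewritten, which mirrors the observation that the check values can stay unchanged while the filesystem remains usable.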
* [Ocfs2-devel] How can ecc be corrected?
From: Sunil Mushran @ 2011-06-20 16:32 UTC
To: ocfs2-devel

On 06/18/2011 09:13 PM, Goldwyn Rodrigues wrote:
> I suppose it is still in the correctable limits. By failing I meant a
> "stat" output in debugfs gives a "FAILED CHECKSUM" error.
>
> On reading more I found we are not writing the superblock anywhere in
> kernel module and perhaps the reason the block_check values remain
> unchanged. PCMIIW.
>
> This brings me to the next question: Why don't we use mnt_count? The
> fact that it is distributed makes life complicated, but still...

Yeah.. Mark had added the failed checksum check in debugfs. Without that
we were running blind. Hard to compute it in the head. ;)

mnt count was originally added in extN to force fsck after N mounts.
That has never worked for us because fsck is an offline process. And it
could take time. It is prudent to let users control when it's run.
FWIW, extN has also changed its default behaviour to ignore mnt count.