* [PATCH] ext3: fix message in ext3_remount for rw-remount case @ 2011-08-01 4:54 Toshiyuki Okajima 2011-08-01 8:45 ` Jan Kara 0 siblings, 1 reply; 11+ messages in thread From: Toshiyuki Okajima @ 2011-08-01 4:54 UTC (permalink / raw) To: jack, akpm, adilger.kernel; +Cc: linux-ext4 If there are some inodes in orphan list while a filesystem is being read-only mounted, we should recommend that pepole umount and then mount it when they try to remount with read-write. But the current message/comment recommends that they umount and then remount it. ext3_remount: /* * If we have an unprocessed orphan list hanging * around from a previously readonly bdev mount, * require a full umount/remount for now. ^^^^^^^^^^^^^^ */ if (es->s_last_orphan) { printk(KERN_WARNING "EXT3-fs: %s: couldn't " "remount RDWR because of unprocessed " "orphan inode list. Please " "umount/remount instead.\n", ^^^^^^^^^^^^^^ sb->s_id); Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> --- fs/ext3/super.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/ext3/super.c b/fs/ext3/super.c index 7beb69a..d3df0d4 100644 --- a/fs/ext3/super.c +++ b/fs/ext3/super.c @@ -2669,13 +2669,13 @@ static int ext3_remount (struct super_block * sb, int * flags, char * data) /* * If we have an unprocessed orphan list hanging * around from a previously readonly bdev mount, - * require a full umount/remount for now. + * require a full umount/mount for now. */ if (es->s_last_orphan) { ext3_msg(sb, KERN_WARNING, "warning: couldn't " "remount RDWR because of unprocessed " "orphan inode list. Please " - "umount/remount instead."); + "umount/mount instead."); err = -EINVAL; goto restore_opts; } -- 1.5.5.6 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] ext3: fix message in ext3_remount for rw-remount case 2011-08-01 4:54 [PATCH] ext3: fix message in ext3_remount for rw-remount case Toshiyuki Okajima @ 2011-08-01 8:45 ` Jan Kara 2011-08-01 9:45 ` Toshiyuki Okajima 0 siblings, 1 reply; 11+ messages in thread From: Jan Kara @ 2011-08-01 8:45 UTC (permalink / raw) To: Toshiyuki Okajima; +Cc: jack, akpm, adilger.kernel, linux-ext4 On Mon 01-08-11 13:54:51, Toshiyuki Okajima wrote: > If there are some inodes in orphan list while a filesystem is being > read-only mounted, we should recommend that pepole umount and then > mount it when they try to remount with read-write. But the current > message/comment recommends that they umount and then remount it. > > ext3_remount: > /* > * If we have an unprocessed orphan list hanging > * around from a previously readonly bdev mount, > * require a full umount/remount for now. > ^^^^^^^^^^^^^^ > */ > if (es->s_last_orphan) { > printk(KERN_WARNING "EXT3-fs: %s: couldn't " > "remount RDWR because of unprocessed " > "orphan inode list. Please " > "umount/remount instead.\n", > ^^^^^^^^^^^^^^ > sb->s_id); OK, so how about using "umount & mount"? The '/' is what would confuse me the most... BTW, I guess you didn't really see this message in practice, did you? Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ext3: fix message in ext3_remount for rw-remount case 2011-08-01 8:45 ` Jan Kara @ 2011-08-01 9:45 ` Toshiyuki Okajima 2011-08-01 9:57 ` Jan Kara 0 siblings, 1 reply; 11+ messages in thread From: Toshiyuki Okajima @ 2011-08-01 9:45 UTC (permalink / raw) To: Jan Kara; +Cc: akpm, adilger.kernel, linux-ext4 Hi. (2011/08/01 17:45), Jan Kara wrote: > On Mon 01-08-11 13:54:51, Toshiyuki Okajima wrote: >> If there are some inodes in orphan list while a filesystem is being >> read-only mounted, we should recommend that pepole umount and then >> mount it when they try to remount with read-write. But the current >> message/comment recommends that they umount and then remount it. >> >> ext3_remount: >> /* >> * If we have an unprocessed orphan list hanging >> * around from a previously readonly bdev mount, >> * require a full umount/remount for now. >> ^^^^^^^^^^^^^^ >> */ >> if (es->s_last_orphan) { >> printk(KERN_WARNING "EXT3-fs: %s: couldn't " >> "remount RDWR because of unprocessed " >> "orphan inode list. Please " >> "umount/remount instead.\n", >> ^^^^^^^^^^^^^^ >> sb->s_id); > OK, so how about using "umount& mount"? The '/' is what would confuse me OK. I modify it like your comment. umount/mount => umount & mount > the most... BTW, I guess you didn't really see this message in practice, did > you? No. I have seen this message in practice while quotacheck command was repeatedly executed per an hour. Thanks, Toshiyuki Okajima ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ext3: fix message in ext3_remount for rw-remount case 2011-08-01 9:45 ` Toshiyuki Okajima @ 2011-08-01 9:57 ` Jan Kara 2011-08-02 9:14 ` Toshiyuki Okajima 0 siblings, 1 reply; 11+ messages in thread From: Jan Kara @ 2011-08-01 9:57 UTC (permalink / raw) To: Toshiyuki Okajima; +Cc: Jan Kara, akpm, adilger.kernel, linux-ext4 On Mon 01-08-11 18:45:58, Toshiyuki Okajima wrote: > (2011/08/01 17:45), Jan Kara wrote: > >On Mon 01-08-11 13:54:51, Toshiyuki Okajima wrote: > >>If there are some inodes in orphan list while a filesystem is being > >>read-only mounted, we should recommend that pepole umount and then > >>mount it when they try to remount with read-write. But the current > >>message/comment recommends that they umount and then remount it. > >> > >>ext3_remount: > >> /* > >> * If we have an unprocessed orphan list hanging > >> * around from a previously readonly bdev mount, > >> * require a full umount/remount for now. > >> ^^^^^^^^^^^^^^ > >> */ > >> if (es->s_last_orphan) { > >> printk(KERN_WARNING "EXT3-fs: %s: couldn't " > >> "remount RDWR because of unprocessed " > >> "orphan inode list. Please " > >> "umount/remount instead.\n", > >> ^^^^^^^^^^^^^^ > >> sb->s_id); > > > OK, so how about using "umount& mount"? The '/' is what would confuse me > OK. I modify it like your comment. > > umount/mount => umount & mount Thanks. > >the most... BTW, I guess you didn't really see this message in practice, did > >you? > No. > I have seen this message in practice while quotacheck command was repeatedly > executed per an hour. Interesting. Are you able to reproduce this? Quotacheck does remount read-only + remount read-write but you cannot really remount the filesystem read-only when it has orphan inodes and so you should not see those when you remount read-write again. Possibly there's race between remounting and unlinking... Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ext3: fix message in ext3_remount for rw-remount case 2011-08-01 9:57 ` Jan Kara @ 2011-08-02 9:14 ` Toshiyuki Okajima 2011-08-03 2:42 ` Toshiyuki Okajima 0 siblings, 1 reply; 11+ messages in thread From: Toshiyuki Okajima @ 2011-08-02 9:14 UTC (permalink / raw) To: Jan Kara; +Cc: akpm, adilger.kernel, linux-ext4 Hi. (2011/08/01 18:57), Jan Kara wrote: > On Mon 01-08-11 18:45:58, Toshiyuki Okajima wrote: >> (2011/08/01 17:45), Jan Kara wrote: >>> On Mon 01-08-11 13:54:51, Toshiyuki Okajima wrote: >>>> If there are some inodes in orphan list while a filesystem is being >>>> read-only mounted, we should recommend that pepole umount and then >>>> mount it when they try to remount with read-write. But the current >>>> message/comment recommends that they umount and then remount it. >>>> >>>> ext3_remount: >>>> /* >>>> * If we have an unprocessed orphan list hanging >>>> * around from a previously readonly bdev mount, >>>> * require a full umount/remount for now. >>>> ^^^^^^^^^^^^^^ >>>> */ >>>> if (es->s_last_orphan) { >>>> printk(KERN_WARNING "EXT3-fs: %s: couldn't " >>>> "remount RDWR because of unprocessed " >>>> "orphan inode list. Please " >>>> "umount/remount instead.\n", >>>> ^^^^^^^^^^^^^^ >>>> sb->s_id); >> >>> OK, so how about using "umount& mount"? The '/' is what would confuse me >> OK. I modify it like your comment. >> >> umount/mount => umount& mount > Thanks. > >>> the most... BTW, I guess you didn't really see this message in practice, did >>> you? >> No. >> I have seen this message in practice while quotacheck command was repeatedly >> executed per an hour. > Interesting. Are you able to reproduce this? Quotacheck does remount > read-only + remount read-write but you cannot really remount the filesystem > read-only when it has orphan inodes and so you should not see those when > you remount read-write again. Possibly there's race between remounting and > unlinking... Yes. I can reproduce it. However, it is not frequently reproduced by using the original procedure (qutacheck per an hour). So, I made a reproducer. It is: [go.sh] #!/bin/sh dd if=/dev/zero of=./img bs=1k count=1 seek=100k > /dev/null 2>&1 /sbin/mkfs.ext3 -Fq ./img /sbin/tune2fs -c 0 ./img mkdir -p mnt LOOP=10000 for ((i=0; i<LOOP; i++)); do mount -o loop ./img ./mnt sh ./writer.sh ./mnt & PID=$! j=0 while [ 1 ]; do sleep 1 # remount if ((j%2 == 0)); then mount -o loop,remount,ro ./mnt else mount -o loop,remount,rw ./mnt fi tail -n 1 /var/log/messages| grep "remount RDWR" if [ $? -eq 0 ]; then break fi j=$((j+1)) done kill -9 $PID umount ./mnt /sbin/e2fsck -pf ./img done exit [writer.sh] #!/bin/sh i=0 path=$1 num=0 stride=64 while [ 1 ]; do for ((j=0;j<stride;j++)); do num=$((i*stride + j)) dd if=/dev/zero of=${path}/file${num} bs=8k count=1 > /dev/null 2>&1 & done for ((j=0;j<stride;j++)); do num=$((i*stride + j)) rm -f ${path}/file${num} & done wait i=$((i+1)) done # vi go.sh # vi writer.sh # sh go.sh (It might be reproduced after a while...) Thanks, Toshiyuki Okajima ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ext3: fix message in ext3_remount for rw-remount case 2011-08-02 9:14 ` Toshiyuki Okajima @ 2011-08-03 2:42 ` Toshiyuki Okajima 2011-08-03 9:57 ` Jan Kara 0 siblings, 1 reply; 11+ messages in thread From: Toshiyuki Okajima @ 2011-08-03 2:42 UTC (permalink / raw) To: Jan Kara; +Cc: akpm, adilger.kernel, linux-ext4 Hi. (2011/08/02 18:14), Toshiyuki Okajima wrote: > Hi. > > (2011/08/01 18:57), Jan Kara wrote: >> On Mon 01-08-11 18:45:58, Toshiyuki Okajima wrote: >>> (2011/08/01 17:45), Jan Kara wrote: >>>> On Mon 01-08-11 13:54:51, Toshiyuki Okajima wrote: >>>>> If there are some inodes in orphan list while a filesystem is being >>>>> read-only mounted, we should recommend that pepole umount and then >>>>> mount it when they try to remount with read-write. But the current >>>>> message/comment recommends that they umount and then remount it. <SNIP> >>>> the most... BTW, I guess you didn't really see this message in practice, did >>>> you? >>> No. >>> I have seen this message in practice while quotacheck command was repeatedly >>> executed per an hour. >> Interesting. Are you able to reproduce this? Quotacheck does remount >> read-only + remount read-write but you cannot really remount the filesystem >> read-only when it has orphan inodes and so you should not see those when >> you remount read-write again. Possibly there's race between remounting and >> unlinking... > Yes. I can reproduce it. However, it is not frequently reproduced > by using the original procedure (qutacheck per an hour). So, I made a > reproducer. To tell the truth, I think the race creates the message: ----------------------------------------------------------------------- EXT3-fs: <dev>: couldn't remount RDWR because of unprocessed orphan inode list. Please umount/remount instead. ----------------------------------------------------------------------- which hides a serious problem. By using my reproducer, I found that it can show another message that is not the above mentioned message: ----------------------------------------------------------------------- EXT3-fs error (device <dev>) in start_transaction: Readonly filesystem ----------------------------------------------------------------------- After I examined the code path which message could display, I found it can display if the following steps are satisfied: [[CASE 1]] ( 1) [process A] do_unlinkat ( 2) [process B] do_remount_sb(, RDONLY, ) ( 3) [process A] vfs_unlink ( 4) [process A] ext3_unlink ( 5) [process A] ext3_journal_start ( 6) [process B] fs_may_remount_ro (=> return 0) ( 7) [process A] inode->i_nlink-- (i_nlink=0) ( 8) [process A] ext3_orphan_add ( 9) [process A] ext3_journal_stop (10) [process A] dput (11) [process A] iput (12) [process A] ext3_evict_inode (13) [process B] ext3_remount (14) [process A] start_transaction (15) [process B] sb->s_flags |= MS_RDONLY (16) [process B] ext3_mark_recovery_complete (17) [process A] start_this_handle (new transaction is created) (18) [process A] ext3_truncate (19) [process A] start_transaction (failed => this message is displayed) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (20) [process A] ext3_orphan_del (21) [process A] ext3_journal_stop * "Process A" deletes a file successfully(21). However the file data is left because (18) fails. **Furthermore, new transaction can be created after ext3_mark_recovery_complete finishes.** [[CASE2]] ( 1) [process A] do_unlinkat ( 2) [process B] do_remount_sb(, RDONLY, ) ( 3) [process A] vfs_unlink ( 4) [process A] ext3_unlink ( 5) [process A] ext3_journal_start ( 6) [process B] fs_may_remount_ro (=> return 0) ( 7) [process A] inode->i_nlink-- (i_nlink=0) ( 8) [process A] ext3_orphan_add ( 9) [process A] ext3_journal_stop (10) [process A] dput (11) [process A] iput (12) [process A] ext3_evict_inode (13) [process B] ext3_remount (14) [process A] start_transaction (15) [process B] sb->s_flags |= MS_RDONLY (17) [process A] start_this_handle (new transaction is created) (18) [process A] ext3_truncate (19) [process A] start_transaction (failed => this message is displayed) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (20) [process A] ext3_orphan_del (21) [process A] ext3_journal_stop (22) [process B] ext3_mark_recovery_complete * "Process A" deletes a file successfully(21). However the file data is left because (18) fails. This transaction can finish before ext3_mark_recovery_complete finishes. I will try to fix this problem not to do with fs-error. Please comment about the fix if I have created one. Thanks, Toshiyuki Okajima -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ext3: fix message in ext3_remount for rw-remount case 2011-08-03 2:42 ` Toshiyuki Okajima @ 2011-08-03 9:57 ` Jan Kara 2011-08-03 13:25 ` Toshiyuki Okajima 0 siblings, 1 reply; 11+ messages in thread From: Jan Kara @ 2011-08-03 9:57 UTC (permalink / raw) To: Toshiyuki Okajima; +Cc: Jan Kara, akpm, adilger.kernel, linux-ext4 Hello, On Wed 03-08-11 11:42:03, Toshiyuki Okajima wrote: > >(2011/08/01 18:57), Jan Kara wrote: > >>On Mon 01-08-11 18:45:58, Toshiyuki Okajima wrote: > >>>(2011/08/01 17:45), Jan Kara wrote: > >>>>On Mon 01-08-11 13:54:51, Toshiyuki Okajima wrote: > >>>>>If there are some inodes in orphan list while a filesystem is being > >>>>>read-only mounted, we should recommend that pepole umount and then > >>>>>mount it when they try to remount with read-write. But the current > >>>>>message/comment recommends that they umount and then remount it. > <SNIP> > >>>>the most... BTW, I guess you didn't really see this message in practice, did > >>>>you? > >>>No. > >>>I have seen this message in practice while quotacheck command was repeatedly > >>>executed per an hour. > >>Interesting. Are you able to reproduce this? Quotacheck does remount > >>read-only + remount read-write but you cannot really remount the filesystem > >>read-only when it has orphan inodes and so you should not see those when > >>you remount read-write again. Possibly there's race between remounting and > >>unlinking... > >Yes. I can reproduce it. However, it is not frequently reproduced > >by using the original procedure (qutacheck per an hour). So, I made a > >reproducer. > To tell the truth, I think the race creates the message: > ----------------------------------------------------------------------- > EXT3-fs: <dev>: couldn't remount RDWR because of > unprocessed orphan inode list. Please umount/remount instead. > ----------------------------------------------------------------------- > which hides a serious problem. I've inquired about this at linux-fsdevel (I think you were in CC unless I forgot). It's a race in VFS remount code as you properly analyzed below. People are working on fixing it but it's not trivial. Filesystem is really a wrong place to fix such problem. If there is a trivial fix for ext3 to workaround the issue, I can take it but I'm not willing to push anything complex - effort should better be spent working on a generic fix. Honza > By using my reproducer, I found that it can show another message that > is not the above mentioned message: > ----------------------------------------------------------------------- > EXT3-fs error (device <dev>) in start_transaction: Readonly filesystem > ----------------------------------------------------------------------- > After I examined the code path which message could display, I found > it can display if the following steps are satisfied: > > [[CASE 1]] > ( 1) [process A] do_unlinkat > ( 2) [process B] do_remount_sb(, RDONLY, ) > ( 3) [process A] vfs_unlink > ( 4) [process A] ext3_unlink > ( 5) [process A] ext3_journal_start > ( 6) [process B] fs_may_remount_ro (=> return 0) > ( 7) [process A] inode->i_nlink-- (i_nlink=0) > ( 8) [process A] ext3_orphan_add > ( 9) [process A] ext3_journal_stop > (10) [process A] dput > (11) [process A] iput > (12) [process A] ext3_evict_inode > (13) [process B] ext3_remount > (14) [process A] start_transaction > (15) [process B] sb->s_flags |= MS_RDONLY > (16) [process B] ext3_mark_recovery_complete > (17) [process A] start_this_handle (new transaction is created) > (18) [process A] ext3_truncate > (19) [process A] start_transaction (failed => this message is displayed) > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > (20) [process A] ext3_orphan_del > (21) [process A] ext3_journal_stop > > * "Process A" deletes a file successfully(21). However the file data is left > because (18) fails. **Furthermore, new transaction can be created after > ext3_mark_recovery_complete finishes.** > > [[CASE2]] > ( 1) [process A] do_unlinkat > ( 2) [process B] do_remount_sb(, RDONLY, ) > ( 3) [process A] vfs_unlink > ( 4) [process A] ext3_unlink > ( 5) [process A] ext3_journal_start > ( 6) [process B] fs_may_remount_ro (=> return 0) > ( 7) [process A] inode->i_nlink-- (i_nlink=0) > ( 8) [process A] ext3_orphan_add > ( 9) [process A] ext3_journal_stop > (10) [process A] dput > (11) [process A] iput > (12) [process A] ext3_evict_inode > (13) [process B] ext3_remount > (14) [process A] start_transaction > (15) [process B] sb->s_flags |= MS_RDONLY > (17) [process A] start_this_handle (new transaction is created) > (18) [process A] ext3_truncate > (19) [process A] start_transaction (failed => this message is displayed) > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > (20) [process A] ext3_orphan_del > (21) [process A] ext3_journal_stop > (22) [process B] ext3_mark_recovery_complete > > * "Process A" deletes a file successfully(21). However the file data is left > because (18) fails. This transaction can finish before > ext3_mark_recovery_complete finishes. > > I will try to fix this problem not to do with fs-error. > Please comment about the fix if I have created one. > > Thanks, > Toshiyuki Okajima > -- Jan Kara <jack@suse.cz> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ext3: fix message in ext3_remount for rw-remount case 2011-08-03 9:57 ` Jan Kara @ 2011-08-03 13:25 ` Toshiyuki Okajima 2011-08-03 16:25 ` Jan Kara 0 siblings, 1 reply; 11+ messages in thread From: Toshiyuki Okajima @ 2011-08-03 13:25 UTC (permalink / raw) To: Jan Kara; +Cc: akpm, adilger.kernel, linux-ext4 Hi. On Wed, 3 Aug 2011 11:57:54 +0200 Jan Kara <jack@suse.cz> wrote: > Hello, > > On Wed 03-08-11 11:42:03, Toshiyuki Okajima wrote: > > >(2011/08/01 18:57), Jan Kara wrote: > > >>On Mon 01-08-11 18:45:58, Toshiyuki Okajima wrote: > > >>>(2011/08/01 17:45), Jan Kara wrote: > > >>>>On Mon 01-08-11 13:54:51, Toshiyuki Okajima wrote: > > >>>>>If there are some inodes in orphan list while a filesystem is being > > >>>>>read-only mounted, we should recommend that pepole umount and then > > >>>>>mount it when they try to remount with read-write. But the current > > >>>>>message/comment recommends that they umount and then remount it. > > <SNIP> > > >>>>the most... BTW, I guess you didn't really see this message in practice, did > > >>>>you? > > >>>No. > > >>>I have seen this message in practice while quotacheck command was repeatedly > > >>>executed per an hour. > > >>Interesting. Are you able to reproduce this? Quotacheck does remount > > >>read-only + remount read-write but you cannot really remount the filesystem > > >>read-only when it has orphan inodes and so you should not see those when > > >>you remount read-write again. Possibly there's race between remounting and > > >>unlinking... > > >Yes. I can reproduce it. However, it is not frequently reproduced > > >by using the original procedure (qutacheck per an hour). So, I made a > > >reproducer. > > To tell the truth, I think the race creates the message: > > ----------------------------------------------------------------------- > > EXT3-fs: <dev>: couldn't remount RDWR because of > > unprocessed orphan inode list. Please umount/remount instead. > > ----------------------------------------------------------------------- > > which hides a serious problem. > I've inquired about this at linux-fsdevel (I think you were in CC unless > I forgot). It's a race in VFS remount code as you properly analyzed below. > People are working on fixing it but it's not trivial. Filesystem is really > a wrong place to fix such problem. If there is a trivial fix for ext3 to > workaround the issue, I can take it but I'm not willing to push anything > complex - effort should better be spent working on a generic fix. I also think read-only remount race in VFS layer should be fixed. However, I think this race depends on ext3/ext4 filesystem implementation. (Orphan inode list) So, we should modify ext3/ext4(jbd/jbd2) to fix it. > > Honza > > > By using my reproducer, I found that it can show another message that > > is not the above mentioned message: > > ----------------------------------------------------------------------- > > EXT3-fs error (device <dev>) in start_transaction: Readonly filesystem > > ----------------------------------------------------------------------- > > After I examined the code path which message could display, I found > > it can display if the following steps are satisfied: <SNIP> > > I will try to fix this problem not to do with fs-error. > > Please comment about the fix if I have created one. I created a sample patch to fix above EXT3-fs error. But it is tricky, I think. :-) Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> --- fs/ext3/inode.c | 27 ++++++++++++++++++++++----- fs/ext3/super.c | 15 ++++++++++++++- fs/jbd/transaction.c | 1 + include/linux/ext3_fs.h | 3 +++ include/linux/jbd.h | 2 ++ 5 files changed, 42 insertions(+), 6 deletions(-) diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c index 04da6ac..abfa500 100644 --- a/fs/ext3/inode.c +++ b/fs/ext3/inode.c @@ -201,6 +201,7 @@ void ext3_evict_inode (struct inode *inode) struct ext3_block_alloc_info *rsv; handle_t *handle; int want_delete = 0; + int err = 0; trace_ext3_evict_inode(inode); if (!inode->i_nlink && !is_bad_inode(inode)) { @@ -246,6 +247,7 @@ void ext3_evict_inode (struct inode *inode) handle = start_transaction(inode); if (IS_ERR(handle)) { +skip_trans: /* * If we're going to skip the normal cleanup, we still need to * make sure that the in-core orphan linked list is properly @@ -258,8 +260,13 @@ void ext3_evict_inode (struct inode *inode) if (IS_SYNC(inode)) handle->h_sync = 1; inode->i_size = 0; - if (inode->i_blocks) - ext3_truncate(inode); + if (inode->i_blocks) { + err = __ext3_truncate(inode); + if (err == -EROFS) { + ext3_journal_stop(handle); + goto skip_trans; + } + } /* * Kill off the orphan record created when the inode lost the last * link. Note that ext3_orphan_del() has to be able to cope with the @@ -2511,7 +2518,7 @@ int ext3_can_truncate(struct inode *inode) * that's fine - as long as they are linked from the inode, the post-crash * ext3_truncate() run will find them and release them. */ -void ext3_truncate(struct inode *inode) +int __ext3_truncate(struct inode *inode) { handle_t *handle; struct ext3_inode_info *ei = EXT3_I(inode); @@ -2524,6 +2531,7 @@ void ext3_truncate(struct inode *inode) int n; long last_block; unsigned blocksize = inode->i_sb->s_blocksize; + int err = 0; trace_ext3_truncate_enter(inode); @@ -2534,8 +2542,11 @@ void ext3_truncate(struct inode *inode) ext3_set_inode_state(inode, EXT3_STATE_FLUSH_ON_CLOSE); handle = start_transaction(inode); - if (IS_ERR(handle)) + if (IS_ERR(handle)) { + if (PTR_ERR(handle) == -EROFS) + err = -EROFS; goto out_notrans; + } last_block = (inode->i_size + blocksize-1) >> EXT3_BLOCK_SIZE_BITS(inode->i_sb); @@ -2654,7 +2665,7 @@ out_stop: ext3_journal_stop(handle); trace_ext3_truncate_exit(inode); - return; + return 0; out_notrans: /* * Delete the inode from orphan list so that it doesn't stay there @@ -2663,6 +2674,12 @@ out_notrans: if (inode->i_nlink) ext3_orphan_del(NULL, inode); trace_ext3_truncate_exit(inode); + return err; +} + +void ext3_truncate(struct inode *inode) +{ + __ext3_truncate(inode); } static ext3_fsblk_t ext3_get_inode_block(struct super_block *sb, diff --git a/fs/ext3/super.c b/fs/ext3/super.c index 7beb69a..22ffb04 100644 --- a/fs/ext3/super.c +++ b/fs/ext3/super.c @@ -270,6 +270,10 @@ void __ext3_std_error (struct super_block * sb, const char * function, if (errno == -EROFS && journal_current_handle() == NULL && (sb->s_flags & MS_RDONLY)) return; + smp_rmb(); + if ((sb->s_flags & MS_RDONLY) && errno == -EROFS && + (EXT3_SB(sb)->s_mount_state & EXT3_ROMOUNT)) + return; errstr = ext3_decode_error(sb, errno, nbuf); ext3_msg(sb, KERN_CRIT, "error in %s: %s", function, errstr); @@ -2642,7 +2646,12 @@ static int ext3_remount (struct super_block * sb, int * flags, char * data) * First of all, the unconditional stuff we have to do * to disable replay of the journal when we next remount */ + spin_lock(&sbi->s_journal->j_state_lock); + sbi->s_journal->j_flags |= JFS_ROMOUNT; + spin_unlock(&sbi->s_journal->j_state_lock); + sbi->s_mount_state |= EXT3_ROMOUNT; sb->s_flags |= MS_RDONLY; + smp_wmb(); /* * OK, test if we are remounting a valid rw partition @@ -2651,7 +2660,8 @@ static int ext3_remount (struct super_block * sb, int * flags, char * data) */ if (!(es->s_state & cpu_to_le16(EXT3_VALID_FS)) && (sbi->s_mount_state & EXT3_VALID_FS)) - es->s_state = cpu_to_le16(sbi->s_mount_state); + es->s_state = + cpu_to_le16(sbi->s_mount_state & ~EXT3_ROMOUNT); ext3_mark_recovery_complete(sb, es); } else { @@ -2686,6 +2696,9 @@ static int ext3_remount (struct super_block * sb, int * flags, char * data) * been changed by e2fsck since we originally mounted * the partition.) */ + spin_lock(&sbi->s_journal->j_state_lock); + sbi->s_journal->j_flags &= ~JFS_ROMOUNT; + spin_unlock(&sbi->s_journal->j_state_lock); ext3_clear_journal_err(sb, es); sbi->s_mount_state = le16_to_cpu(es->s_state); if ((err = ext3_group_extend(sb, es, n_blocks_count))) diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c index 7e59c6e..26139a3 100644 --- a/fs/jbd/transaction.c +++ b/fs/jbd/transaction.c @@ -118,6 +118,7 @@ repeat: spin_lock(&journal->j_state_lock); repeat_locked: if (is_journal_aborted(journal) || + journal->j_flags & JFS_ROMOUNT || (journal->j_errno != 0 && !(journal->j_flags & JFS_ACK_ERR))) { spin_unlock(&journal->j_state_lock); ret = -EROFS; diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h index 67a803a..f02ed52 100644 --- a/include/linux/ext3_fs.h +++ b/include/linux/ext3_fs.h @@ -369,6 +369,8 @@ struct ext3_inode { #define EXT3_VALID_FS 0x0001 /* Unmounted cleanly */ #define EXT3_ERROR_FS 0x0002 /* Errors detected */ #define EXT3_ORPHAN_FS 0x0004 /* Orphans being recovered */ +#define EXT3_ROMOUNT 0x0008 /* FS is ro-mounted + * (Internal use only) */ /* * Misc. filesystem flags @@ -912,6 +914,7 @@ extern void ext3_dirty_inode(struct inode *, int); extern int ext3_change_inode_journal_flag(struct inode *, int); extern int ext3_get_inode_loc(struct inode *, struct ext3_iloc *); extern int ext3_can_truncate(struct inode *inode); +extern int __ext3_truncate(struct inode *inode); extern void ext3_truncate(struct inode *inode); extern void ext3_set_inode_flags(struct inode *); extern void ext3_get_inode_flags(struct ext3_inode_info *); diff --git a/include/linux/jbd.h b/include/linux/jbd.h index e6a5e34..eae4b62 100644 --- a/include/linux/jbd.h +++ b/include/linux/jbd.h @@ -831,6 +831,8 @@ struct journal_s #define JFS_ABORT_ON_SYNCDATA_ERR 0x040 /* Abort the journal on file * data write error in ordered * mode */ +#define JFS_ROMOUNT 0x080 /* Prevent new transaction creating + * while ro-mounting. */ /* * Function declarations for the journaling transaction and buffer -- 1.5.5.6 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] ext3: fix message in ext3_remount for rw-remount case 2011-08-03 13:25 ` Toshiyuki Okajima @ 2011-08-03 16:25 ` Jan Kara 2011-08-08 4:27 ` Toshiyuki Okajima 0 siblings, 1 reply; 11+ messages in thread From: Jan Kara @ 2011-08-03 16:25 UTC (permalink / raw) To: Toshiyuki Okajima; +Cc: Jan Kara, akpm, adilger.kernel, linux-ext4 Hello, On Wed 03-08-11 22:25:48, Toshiyuki Okajima wrote: > On Wed, 3 Aug 2011 11:57:54 +0200 > Jan Kara <jack@suse.cz> wrote: > > On Wed 03-08-11 11:42:03, Toshiyuki Okajima wrote: > > > >(2011/08/01 18:57), Jan Kara wrote: > > > >>On Mon 01-08-11 18:45:58, Toshiyuki Okajima wrote: > > > >>>(2011/08/01 17:45), Jan Kara wrote: > > > >>>>On Mon 01-08-11 13:54:51, Toshiyuki Okajima wrote: > > > >>>>>If there are some inodes in orphan list while a filesystem is being > > > >>>>>read-only mounted, we should recommend that pepole umount and then > > > >>>>>mount it when they try to remount with read-write. But the current > > > >>>>>message/comment recommends that they umount and then remount it. > > > <SNIP> > > > >>>>the most... BTW, I guess you didn't really see this message in practice, did > > > >>>>you? > > > >>>No. > > > >>>I have seen this message in practice while quotacheck command was repeatedly > > > >>>executed per an hour. > > > >>Interesting. Are you able to reproduce this? Quotacheck does remount > > > >>read-only + remount read-write but you cannot really remount the filesystem > > > >>read-only when it has orphan inodes and so you should not see those when > > > >>you remount read-write again. Possibly there's race between remounting and > > > >>unlinking... > > > >Yes. I can reproduce it. However, it is not frequently reproduced > > > >by using the original procedure (qutacheck per an hour). So, I made a > > > >reproducer. > > > To tell the truth, I think the race creates the message: > > > ----------------------------------------------------------------------- > > > EXT3-fs: <dev>: couldn't remount RDWR because of > > > unprocessed orphan inode list. Please umount/remount instead. > > > ----------------------------------------------------------------------- > > > which hides a serious problem. > > I've inquired about this at linux-fsdevel (I think you were in CC unless > > I forgot). It's a race in VFS remount code as you properly analyzed below. > > People are working on fixing it but it's not trivial. Filesystem is really > > a wrong place to fix such problem. If there is a trivial fix for ext3 to > > workaround the issue, I can take it but I'm not willing to push anything > > complex - effort should better be spent working on a generic fix. > I also think read-only remount race in VFS layer should be fixed. > However, I think this race depends on ext3/ext4 filesystem > implementation. (Orphan inode list) > So, we should modify ext3/ext4(jbd/jbd2) to fix it. Umm, I don't understand here. If VFS makes sure that there are no files open for writing, no unfinished operations changing the filesystem (e.g. unlink), and no open-but-unlinked files, what remains for ext3 to check? Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ext3: fix message in ext3_remount for rw-remount case 2011-08-03 16:25 ` Jan Kara @ 2011-08-08 4:27 ` Toshiyuki Okajima 2011-08-08 12:48 ` Jan Kara 0 siblings, 1 reply; 11+ messages in thread From: Toshiyuki Okajima @ 2011-08-08 4:27 UTC (permalink / raw) To: Jan Kara; +Cc: akpm, adilger.kernel, linux-ext4 Hi. I'm sorry for my late response. I took vacations till yesterday. (2011/08/04 1:25), Jan Kara wrote: > Hello, > > On Wed 03-08-11 22:25:48, Toshiyuki Okajima wrote: >> On Wed, 3 Aug 2011 11:57:54 +0200 >> Jan Kara<jack@suse.cz> wrote: >>> On Wed 03-08-11 11:42:03, Toshiyuki Okajima wrote: >>>>> (2011/08/01 18:57), Jan Kara wrote: >>>>>> On Mon 01-08-11 18:45:58, Toshiyuki Okajima wrote: >>>>>>> (2011/08/01 17:45), Jan Kara wrote: >>>>>>>> On Mon 01-08-11 13:54:51, Toshiyuki Okajima wrote: >>>>>>>>> If there are some inodes in orphan list while a filesystem is being >>>>>>>>> read-only mounted, we should recommend that pepole umount and then >>>>>>>>> mount it when they try to remount with read-write. But the current >>>>>>>>> message/comment recommends that they umount and then remount it. >>>> <SNIP> >>>>>>>> the most... BTW, I guess you didn't really see this message in practice, did >>>>>>>> you? >>>>>>> No. >>>>>>> I have seen this message in practice while quotacheck command was repeatedly >>>>>>> executed per an hour. >>>>>> Interesting. Are you able to reproduce this? Quotacheck does remount >>>>>> read-only + remount read-write but you cannot really remount the filesystem >>>>>> read-only when it has orphan inodes and so you should not see those when >>>>>> you remount read-write again. Possibly there's race between remounting and >>>>>> unlinking... >>>>> Yes. I can reproduce it. However, it is not frequently reproduced >>>>> by using the original procedure (qutacheck per an hour). So, I made a >>>>> reproducer. >>>> To tell the truth, I think the race creates the message: >>>> ----------------------------------------------------------------------- >>>> EXT3-fs:<dev>: couldn't remount RDWR because of >>>> unprocessed orphan inode list. Please umount/remount instead. >>>> ----------------------------------------------------------------------- >>>> which hides a serious problem. >>> I've inquired about this at linux-fsdevel (I think you were in CC unless >>> I forgot). It's a race in VFS remount code as you properly analyzed below. >>> People are working on fixing it but it's not trivial. Filesystem is really >>> a wrong place to fix such problem. If there is a trivial fix for ext3 to >>> workaround the issue, I can take it but I'm not willing to push anything >>> complex - effort should better be spent working on a generic fix. >> I also think read-only remount race in VFS layer should be fixed. >> However, I think this race depends on ext3/ext4 filesystem >> implementation. (Orphan inode list) >> So, we should modify ext3/ext4(jbd/jbd2) to fix it. > Umm, I don't understand here. If VFS makes sure that there are no After I saw the following messages, I thought we must fix EXT3-fs error at first. So, I created the fix patch. (1) kernel: EXT3-fs: <dev>: couldn't remount RDWR because of unprocessed orphan inode list. Please umount/remount instead. (2) kernel: EXT3-fs error (device <dev>) in start_transaction: Readonly filesystem I wasn't aware that by fixing the race between "ro-remount" and "unlink", that EXT3-fs error can be also fixed then. > files open for writing, no unfinished operations changing the filesystem (e.g. > unlink), and no open-but-unlinked files, what remains for ext3 to check? OK. Now, I also think we need not modify ext3 to fix these problems. If we can prevent to add an inode into the orphan list (to start unlinking) while ro-remounting, we can also prevent (1) and (2). However, new mechanism to confirm whether "no open-but-unlinked" files exist while ro-remounting is required, isn't it? Thanks, Toshiyuki Okajima -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ext3: fix message in ext3_remount for rw-remount case 2011-08-08 4:27 ` Toshiyuki Okajima @ 2011-08-08 12:48 ` Jan Kara 0 siblings, 0 replies; 11+ messages in thread From: Jan Kara @ 2011-08-08 12:48 UTC (permalink / raw) To: Toshiyuki Okajima; +Cc: Jan Kara, akpm, adilger.kernel, linux-ext4 Hello, On Mon 08-08-11 13:27:12, Toshiyuki Okajima wrote: > >On Wed 03-08-11 22:25:48, Toshiyuki Okajima wrote: > >>On Wed, 3 Aug 2011 11:57:54 +0200 > >>Jan Kara<jack@suse.cz> wrote: > >>>>To tell the truth, I think the race creates the message: > >>>>----------------------------------------------------------------------- > >>>> EXT3-fs:<dev>: couldn't remount RDWR because of > >>>> unprocessed orphan inode list. Please umount/remount instead. > >>>>----------------------------------------------------------------------- > >>>>which hides a serious problem. > >>> I've inquired about this at linux-fsdevel (I think you were in CC unless > >>>I forgot). It's a race in VFS remount code as you properly analyzed below. > >>>People are working on fixing it but it's not trivial. Filesystem is really > >>>a wrong place to fix such problem. If there is a trivial fix for ext3 to > >>>workaround the issue, I can take it but I'm not willing to push anything > >>>complex - effort should better be spent working on a generic fix. > >>I also think read-only remount race in VFS layer should be fixed. > >>However, I think this race depends on ext3/ext4 filesystem > >>implementation. (Orphan inode list) > >>So, we should modify ext3/ext4(jbd/jbd2) to fix it. > > > Umm, I don't understand here. If VFS makes sure that there are no > > After I saw the following messages, I thought we must fix EXT3-fs error > at first. So, I created the fix patch. > > (1) kernel: EXT3-fs: <dev>: couldn't remount RDWR because of > unprocessed orphan inode list. Please umount/remount instead. > (2) kernel: EXT3-fs error (device <dev>) in start_transaction: Readonly filesystem > > I wasn't aware that by fixing the race between "ro-remount" and "unlink", > that EXT3-fs error can be also fixed then. > > >files open for writing, no unfinished operations changing the filesystem (e.g. > >unlink), and no open-but-unlinked files, what remains for ext3 to check? > OK. > Now, I also think we need not modify ext3 to fix these problems. > If we can prevent to add an inode into the orphan list (to start unlinking) > while ro-remounting, we can also prevent (1) and (2). > > However, new mechanism to confirm whether "no open-but-unlinked" files > exist while ro-remounting is required, isn't it? Yes. Actually, VFS already tracks open-but-unlinked files but the check & remount pair is not atomic so that needs to be fixed and it is not that simple. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2011-08-08 12:48 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-08-01 4:54 [PATCH] ext3: fix message in ext3_remount for rw-remount case Toshiyuki Okajima 2011-08-01 8:45 ` Jan Kara 2011-08-01 9:45 ` Toshiyuki Okajima 2011-08-01 9:57 ` Jan Kara 2011-08-02 9:14 ` Toshiyuki Okajima 2011-08-03 2:42 ` Toshiyuki Okajima 2011-08-03 9:57 ` Jan Kara 2011-08-03 13:25 ` Toshiyuki Okajima 2011-08-03 16:25 ` Jan Kara 2011-08-08 4:27 ` Toshiyuki Okajima 2011-08-08 12:48 ` Jan Kara
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).