From: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: Christian Kujau <lists@nerdbynature.de>, Jan Kara <jack@suse.cz>,
	Eric Sandeen <sandeen@redhat.com>,
	mszeredi@suse.cz, Al Viro <viro@ZenIV.linux.org.uk>
Subject: Re: EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list
Date: Thu, 06 Oct 2011 19:12:33 +0900
Message-ID: <4E8D7F11.8050309@jp.fujitsu.com>
In-Reply-To: <alpine.DEB.2.01.1110051823380.8000@trent.utfs.org>

(2011/10/06 10:34), Christian Kujau wrote:
> On Wed, 5 Oct 2011 at 20:03, Jan Kara wrote:
>>> With Miklos' patches applied to -rc5, this happened again just now :-(
>>>
>> Thanks for careful testing! Hmm, since you are able to reproduce on ppc
>> but not on x86 there might be some memory ordering bug in Miklos' patches
>> or it's simply because of different timing. Miklos, care to debug this
>> further?
> 
> Just to be clear: I'm still not entirely sure how to reproduce this at 
> will. I *assumed* that the daily remount-rw-and-ro-again routine left 
> some inodes in limbo and eventually led to those "unprocessed orphan 
> inodes". With that in mind I tried to reproduce this with the help of a 
> test-script (test-remount.sh, [0]) - but the message did not occur while 
> the script was running.
> 
> I've run the script again today on the said powerpc machine on a 
> loop-mounted 500MB ext4 partition. But even after 100 iterations no
> such message occurred.
> 
> So maybe it's caused by something else or my test-script just doesn't get 
> the scenario right and there's something subtle to this whole 
> remounting-business I haven't figured out yet, leading to those orphan 
> inodes.
> 
> I'm at 3.1.0-rc9 now and will wait until the errors occur again.
> 
> Christian.
> 
> [0] nerdbynature.de/bits/3.1-rc4/ext4/

With Miklos' patches applied to -rc8, I could trigger the message
"Couldn't remount RDWR because of unprocessed orphan inode list"
on my x86_64 machine with my reproducer.

This happens because the actual removal starts outside the range between
mnt_want_write() and mnt_drop_write(), even though do_unlinkat() and
do_rmdir() call mnt_want_write() and mnt_drop_write() to prevent the
filesystem from being remounted read-only.
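
The window described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: struct toy_mount and the toy_*() helpers are hypothetical names standing in for the writer count that mnt_want_write() and mnt_drop_write() maintain, and the model assumes the read-only remount check consults nothing but that count.

```c
#include <stdbool.h>

/* Toy stand-in for a mount; NOT the kernel's struct vfsmount. */
struct toy_mount {
	int writers;          /* raised by toy_want_write(), lowered by toy_drop_write() */
	int pending_removals; /* inodes unlinked whose final iput() has not run yet */
};

static void toy_want_write(struct toy_mount *m) { m->writers++; }
static void toy_drop_write(struct toy_mount *m) { m->writers--; }

/* Assumption: the remount check only looks at the writer count. */
static bool toy_may_remount_ro(const struct toy_mount *m)
{
	return m->writers == 0;
}

/*
 * Models the unlink path up to mnt_drop_write(): the last link is gone,
 * but the inode is still waiting for the iput() that actually removes it.
 */
static void toy_unlink_before_iput(struct toy_mount *m)
{
	toy_want_write(m);
	m->pending_removals++;  /* the unlink step: link count reaches 0 */
	toy_drop_write(m);      /* write access is released here ... */
	/* ... while the actual removal (iput) has not happened yet */
}
```

After toy_unlink_before_iput() returns, toy_may_remount_ro() reports true although pending_removals is still 1, which is the state in which a read-only remount can leave an unprocessed orphan behind.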

My reproducer is as follows:
-----------------------------------------------------------------------------
[1] go.sh
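#!/bin/bash
# bash is required: the for ((...)) loops below are not POSIX sh.

# Create a sparse image of about 1 GB, format it as ext4, loop-mount it.
dd if=/dev/zero of=/tmp/img bs=1k count=1 seek=1000k > /dev/null 2>&1
/sbin/mkfs.ext4 -Fq /tmp/img
mount -o loop /tmp/img /mnt
# Create and delete files in the background while remounting.
./writer.sh /mnt &
LOOP=1000000000
# Alternate between read-only and read-write remounts, once per second.
for ((i=0; i<LOOP; i++));
do
	echo "[$i]"
	if ((i%2 == 0));
	then
		mount -o ro,remount,loop /mnt
	else
		mount -o rw,remount,loop /mnt
	fi
	sleep 1
done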

[2] writer.sh
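#!/bin/bash
# bash is required: the for ((...)) loops below are not POSIX sh.

dir=$1
for ((i=0;i<10000000;i++));
do
	# Start 64 writers in parallel, each creating one 8 KB file.
	for ((j=0;j<64;j++));
	do
		filename="$dir/file$((i*64 + j))"
		dd if=/dev/zero of="$filename" bs=1k count=8 > /dev/null 2>&1 &
	done
	# Remove the same 64 files in parallel, racing with the writers.
	for ((j=0;j<64;j++));
	do
		filename="$dir/file$((i*64 + j))"
		rm -f "$filename" > /dev/null 2>&1 &
	done
	wait
	# Every 100 rounds, clean up files that were created after their rm ran.
	if ((i%100 == 0 && i > 0));
	then
		rm -f "$dir"/file*
	fi
done
exit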

[step to run]
# ./go.sh
-----------------------------------------------------------------------------

Therefore, we need a mechanism to prevent a filesystem from being remounted
read-only until the actual removal finishes.

------------------------------------------------------------------------
[example fix]
 do_unlinkat() {
   ...
   mnt_want_write()
   vfs_unlink()
   if (inode && inode->i_nlink == 0) {              // added
      atomic_inc(&inode->i_sb->s_unlink_count);     // added
      inode->i_deleting++;                          // added
   }                                                // added
   mnt_drop_write()
   ...
   iput() // usually, the actual removal starts here
   ...
 }

destroy_inode() {
  ...
  if (inode->i_deleting)
    atomic_dec(&inode->i_sb->s_unlink_count);
  ...
}

do_remount_sb() {
  ...
  else if (!fs_may_remount_ro(sb) || atomic_read(&sb->s_unlink_count))
     return -EBUSY;
  ...
}
------------------------------------------------------------------------
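
The counting scheme in the example fix can be tried out in user space. The following is a minimal sketch under stated assumptions, not kernel code: struct toy_sb, struct toy_inode and the toy_*() functions are hypothetical stand-ins, and C11 atomics replace the kernel's atomic_t. Unlike the pseudocode above, the per-inode marker is a boolean rather than a counter, so each inode contributes to the count at most once.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy superblock: counts inodes whose removal has been scheduled. */
struct toy_sb {
	atomic_int unlink_count;   /* plays the role of s_unlink_count */
};

/* Toy inode: only the fields the scheme needs. */
struct toy_inode {
	struct toy_sb *sb;
	int nlink;
	bool deleting;             /* plays the role of i_deleting */
};

/* Corresponds to the block added after vfs_unlink() in the example fix. */
static void toy_mark_deleting(struct toy_inode *inode)
{
	if (inode->nlink == 0 && !inode->deleting) {
		atomic_fetch_add(&inode->sb->unlink_count, 1);
		inode->deleting = true;
	}
}

/* Corresponds to the check added to destroy_inode(). */
static void toy_destroy_inode(struct toy_inode *inode)
{
	if (inode->deleting) {
		atomic_fetch_sub(&inode->sb->unlink_count, 1);
		inode->deleting = false;
	}
}

/* Corresponds to the extra condition in do_remount_sb(). */
static int toy_remount_ro(struct toy_sb *sb)
{
	if (atomic_load(&sb->unlink_count) != 0)
		return -EBUSY;     /* a removal is still in flight */
	return 0;
}
```

With one inode marked by toy_mark_deleting(), toy_remount_ro() returns -EBUSY until toy_destroy_inode() has run for that inode, after which it returns 0.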

In addition, my reproducer triggers the following message:
"EXT4-fs (xxx): ext4_da_writepages: jbd2_start: xxx pages, ino xx: err -30"

Here err -30 is -EROFS ("Read-only file system"). This is because, with the
delayed allocation feature, ext4_remount() cannot guarantee that all ext4
filesystem data has been written out: ext4_da_writepages() fails once
ext4_remount() has set MS_RDONLY in sb->s_flags.

Therefore, we must write all delayed allocation buffers out before 
ext4_remount() sets MS_RDONLY in sb->s_flags.

------------------------------------------------------------------------
[example fix] // This requires Miklos' patches. 

ext4_remount() {
  ...
  if (*flags & MS_RDONLY) {
      err = dquot_suspend(sb, -1);
      if (err < 0) 
         goto restore_opts;

      sync_filesystem(sb);  // write all delayed buffers out
      sb->s_flags |= MS_RDONLY;
  ...
}      
------------------------------------------------------------------------
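
The ordering argument can be checked with a toy user-space model. This is a sketch only, not ext4 code: struct toy_fs and the toy_*() functions are hypothetical, writeback is modelled as a synchronous call, and the model assumes writeback of delayed pages is refused with -EROFS once the read-only flag is set. Unlike the pseudocode above, the sketch also propagates a writeback error, which is a choice made here for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy filesystem state; NOT the kernel's struct super_block. */
struct toy_fs {
	bool read_only;     /* plays the role of MS_RDONLY in s_flags */
	int delayed_pages;  /* pages whose block allocation is still delayed */
};

/* Writing delayed pages needs a writable filesystem. */
static int toy_writeback(struct toy_fs *fs)
{
	if (fs->delayed_pages == 0)
		return 0;
	if (fs->read_only)
		return -EROFS;      /* the "err -30" case */
	fs->delayed_pages = 0;
	return 0;
}

/* Problematic order: set the flag first, write back afterwards. */
static int toy_remount_ro_flag_first(struct toy_fs *fs)
{
	fs->read_only = true;
	return toy_writeback(fs);
}

/* Proposed order: flush delayed pages, then set the flag. */
static int toy_remount_ro_sync_first(struct toy_fs *fs)
{
	int err = toy_writeback(fs);

	if (err < 0)
		return err;
	fs->read_only = true;
	return 0;
}
```

Starting from a writable toy_fs with pending delayed pages, toy_remount_ro_flag_first() fails with -EROFS and leaves the pages unwritten, while toy_remount_ro_sync_first() succeeds and leaves none behind.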

Best Regards,
Toshiyuki Okajima

