All of lore.kernel.org
 help / color / mirror / Atom feed
From: Akira Fujita <a-fujita@rs.jp.nec.com>
To: Theodore Tso <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [RFC][PATCH 2/3] ext4: Exchange the blocks between two inodes
Date: Tue, 03 Mar 2009 17:36:05 +0900	[thread overview]
Message-ID: <49ACEBF5.8020407@rs.jp.nec.com> (raw)
In-Reply-To: <20090204152029.GE14762@mit.edu>

Hi Ted,
Thank you for comment. :-)

Theodore Tso wrote:
>> +
>> +	up_write(&EXT4_I(org_inode)->i_data_sem);
>> +	ret = a_ops->write_begin(o_filp, mapping, offs, data_len, w_flags,
>> +								&page, &fsdata);
>> +	down_write(&EXT4_I(org_inode)->i_data_sem);
>>     
>
> This is going to be a problem.  Once we release i_data_sem, there is
> the possibility that other processes which might be running and
> accessing the file at the same time that the defragger is running
> could be blocked waiting for i_data_sem to be released.  The moment it
> gets released, they will grab the lock then start to modify extent
> tree --- either allocating new blocks to it, or worse, truncating or
> unlinking the target inode.
>
> This is going to be a mess to fix, since Linux doesn't have recursive
> locking primitives.  We do take i_mutex, which will protect us against
> truncates, but it won't protect against a write() system call.  Also,
> if there inode has delayed allocation blocks pending, those could get
> written out by the page cleaner, and i_mutex won't protect us against
> changes to i_data_sem, either.
>
>   
As you said, we take i_mutex at the start of ext4_defrag()
and hold it until the end of this function,
so orig file is protected against truncates and
it never be shrunk during defrag.

On the other hand, semaphore is released/taken around write_begin()
in ext4_defrag_partial() every page, so it does not protect orig file
against a write() system call from other process.  
So that defrag result (fragmentation) might not be best,
but data corruption does not occur at least.
So I think it is not a serious problem.

As above, it is not necessary to lock the whole of ext4_defrag()
with semaphore, we should just lock only a necessary point.
Therefore defrag V1's lock seems have unneeded lock points.
I will change lock point and semaphore type in the next version.

Do I overlook something?

Regards,
Akira Fujita


> We could add special-case kludgery to wrap around all of the places
> that takes or release the i_data_sem so that we get recursive locking
> support --- but that would be very ugly indeed.
>
> I'm not sure what's the best way to deal with this; maybe if we sleep
> on it someone will come up with a better suggestion --- but it's
> something that we have to figure out.
>
> 	  					- Ted
>
>   



  reply	other threads:[~2009-03-03 23:27 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-30  6:13 [RFC][PATCH 2/3] ext4: Exchange the blocks between two inodes Akira Fujita
2009-02-04 15:20 ` Theodore Tso
2009-03-03  8:36   ` Akira Fujita [this message]
  -- strict thread matches above, loose matches on Subject: below --
2009-01-30  6:13 Akira Fujita

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49ACEBF5.8020407@rs.jp.nec.com \
    --to=a-fujita@rs.jp.nec.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.