linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tao Ma <tm@tao.ma>
To: Ted Ts'o <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH] ext4: Check io list state and avoid an unnecessary mutex_lock in ext4_end_io_work.
Date: Sun, 30 Oct 2011 15:50:25 +0800	[thread overview]
Message-ID: <4EAD01C1.1060002@tao.ma> (raw)
In-Reply-To: <20111029205740.GB16825@thunk.org>

Hi Ted,
On 10/30/2011 04:57 AM, Ted Ts'o wrote:
> On Wed, Sep 14, 2011 at 04:33:02PM +0800, Tao Ma wrote:
>> From: Tao Ma <boyu.mt@taobao.com>
>>
>> When we finish the end io work in ext4_flush_completed_IO, we take
>> the io work away from the list, but don't free it. Then in the workqueue,
>> we can check the list state and then avoid the extra work if it is also
>> done. It is good, but we check list state in ext4_end_io_nolock with i_mutex held
>> instead of the spin_lock in other places. This is wrong.
> 
> Hi Tao,
> 
> Thanks for finding and pointing out this bug in ext4_end_io_nolock().
> Unfortunately your proposed fix doesn't take into account that there
> are other callers of ext4_end_io_nolock() besides ext4_end_io_work(),
> and so it's not sufficient to move the test from former function to
> the latter.
sorry, but I thought I had considered this case.
There are 2 callers. One is ext4_end_io_work(which has the bug I pointed
out), the other is ext4_flush_complete_IO which has already done the
check before calling ext4_end_io_nolock. And that's the reason why I
move the check from ext4_end_io_nolock to ext4_end_io_work. So for the
ext4_flush_complete_IO case, your new patch will spin_lock twice for the
checking. Do I miss something here?

Thanks
Tao
> 
> Attached please find the patch which I am planning to check into the
> ext4 tree to address this bug which you have pointed out.
> 
> Regards,
> 
> 						- Ted
> 
> commit c0e36d8410bfad4db4edefeb4175f85a5d216c8d
> Author: Theodore Ts'o <tytso@mit.edu>
> Date:   Sat Oct 29 16:57:19 2011 -0400
> 
>     ext4: use spinlock before checking io->list in ext4_io_end_nolock()
>     
>     In ext4_end_io_nolock(), io->list is checked to see if it is empty
>     without taking the ei->i_completed_io_lock spinlock.  This violates
>     the locking protocol, and can cause very hard to debug failures.
>     
>     Also optimize ext4_end_io_work() so that if ext4_end_io_nolock() is
>     not going to do any work, don't try getting the i_mutex and possibly
>     requeuing the end_io request if the trylock doesn't succeed in grabbing
>     the mutex.
>     
>     Thanks to Tao Ma <boyu.mt@taobao.com> for spotting the error and
>     providing an initial fix to address the problem.
>     
>     Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
>     Signed-off-by: Tao Ma <boyu.mt@taobao.com>
> 
> diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
> index 92f38ee..0af5607 100644
> --- a/fs/ext4/page-io.c
> +++ b/fs/ext4/page-io.c
> @@ -90,21 +90,27 @@ void ext4_free_io_end(ext4_io_end_t *io)
>   */
>  int ext4_end_io_nolock(ext4_io_end_t *io)
>  {
> -	struct inode *inode = io->inode;
> -	loff_t offset = io->offset;
> -	ssize_t size = io->size;
> -	wait_queue_head_t *wq;
> -	int ret = 0;
> +	unsigned long		flags;
> +	struct inode		*inode = io->inode;
> +	loff_t			offset = io->offset;
> +	ssize_t			size = io->size;
> +	struct ext4_inode_info	*ei = EXT4_I(inode);
> +	wait_queue_head_t	*wq;
> +	int			ret;
>  
>  	ext4_debug("ext4_end_io_nolock: io 0x%p from inode %lu,list->next 0x%p,"
>  		   "list->prev 0x%p\n",
>  		   io, inode->i_ino, io->list.next, io->list.prev);
>  
> -	if (list_empty(&io->list))
> -		return ret;
> +	spin_lock_irqsave(&ei->i_completed_io_lock, flags);
> +	if (list_empty(&io->list)) {
> +		spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
> +		return 0;
> +	}
> +	spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
>  
>  	if (!(io->flag & EXT4_IO_END_UNWRITTEN))
> -		return ret;
> +		return 0;
>  
>  	ret = ext4_convert_unwritten_extents(inode, offset, size);
>  	if (ret < 0) {
> @@ -142,6 +148,16 @@ static void ext4_end_io_work(struct work_struct *work)
>  	unsigned long		flags;
>  	int			ret;
>  
> +	if (!(io->flag & EXT4_IO_END_UNWRITTEN))
> +		goto free_io_end;
> +
> +	spin_lock_irqsave(&ei->i_completed_io_lock, flags);
> +	if (list_empty(&io->list)) {
> +		spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
> +		goto free_io_end;
> +	}
> +	spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
> +
>  	if (!mutex_trylock(&inode->i_mutex)) {
>  		/*
>  		 * Requeue the work instead of waiting so that the work
> @@ -170,6 +186,7 @@ static void ext4_end_io_work(struct work_struct *work)
>  		list_del_init(&io->list);
>  	spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
>  	mutex_unlock(&inode->i_mutex);
> +free_io_end:
>  	ext4_free_io_end(io);
>  }
>  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


  reply	other threads:[~2011-10-30  7:50 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-09-14  8:33 [PATCH] ext4: Check io list state and avoid an unnecessary mutex_lock in ext4_end_io_work Tao Ma
2011-10-29 20:57 ` Ted Ts'o
2011-10-30  7:50   ` Tao Ma [this message]
2011-10-31 15:02     ` Ted Ts'o
2011-10-31 15:02       ` [PATCH 1/3] ext4: Use correct locking for ext4_end_io_nolock() Theodore Ts'o
2011-10-31 15:02         ` [PATCH 2/3] ext4: remove unnecessary call to waitqueue_active() Theodore Ts'o
2011-10-31 15:02         ` [PATCH 3/3] ext4: optimize locking for end_io extent conversion Theodore Ts'o
2011-11-01  2:51           ` Tao Ma
2011-11-01 10:30             ` Theodore Tso
2011-10-31 15:28         ` [PATCH 1/3] ext4: Use correct locking for ext4_end_io_nolock() Tao Ma

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4EAD01C1.1060002@tao.ma \
    --to=tm@tao.ma \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).