Re: [PATCH] zram: fix umount-reset_store-mount race condition

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Minchan Kim <minchan@kernel.org>
To: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Jerome Marchand <jmarchan@redhat.com>,
	Ganesh Mahendran <opensource.ganesh@gmail.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] zram: fix umount-reset_store-mount race condition
Date: Tue, 3 Feb 2015 12:01:28 +0900	[thread overview]
Message-ID: <20150203030127.GA1541@blaptop> (raw)
In-Reply-To: <1422886120-16534-1-git-send-email-sergey.senozhatsky@gmail.com>

Hello Sergey,

On Mon, Feb 02, 2015 at 11:08:40PM +0900, Sergey Senozhatsky wrote:
> Ganesh Mahendran was the first one who proposed to use bdev->bd_mutex
> to avoid ->bd_holders race condition:
> 
>         CPU0                            CPU1
> umount /* zram->init_done is true */
> reset_store()
> bdev->bd_holders == 0                   mount
> ...                                     zram_make_request()
> zram_reset_device()
> 
> However, his solution required some considerable amount of code movement,
> which we can avoid.
> 
> Apart from using bdev->bd_mutex in reset_store(), this patch also
> simplifies zram_reset_device().
> 
> zram_reset_device() has a bool parameter reset_capacity which tells
> it whether disk capacity and itself disk should be reset. There are
> two zram_reset_device() callers:
> -- zram_exit() passes reset_capacity=false
> -- reset_store() passes reset_capacity=true
> 
> So we can move reset_capacity-sensitive work out of zram_reset_device()
> and perform it unconditionally in reset_store(). This also lets us drop
> reset_capacity parameter from zram_reset_device() and pass zram pointer
> only.
> 
> Reported-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

Looks good to me but as I told you, we need to check processes open
the block device as !FMODE_EXCL.
I can make following warning with below simple script.
In addition, I added msleep(2000) below set_capacity(zram->disk, 0)
after applying your patch to make window huge(Kudos to Ganesh!)

#!/bin/bash

echo $((60<<30)) > /sys/block/zram0/disksize
# Check your kernel turns on CONFIG_SCHED_AUTOGROUP=y
# I have no idea why it doesn't happen without setsid and CONFIG_SCHED_AUTOGROUP=y
# Please let me know it if someone can know the reason.
setsid dd if=/dev/zero of=/dev/zram0 &
sleep 1
setsid echo 1 > /sys/block/zram0/reset

[   30.683449] zram0: detected capacity change from 0 to 64424509440
[   33.736869] Buffer I/O error on dev zram0, logical block 180823, lost async page write
[   33.738814] Buffer I/O error on dev zram0, logical block 180824, lost async page write
[   33.740654] Buffer I/O error on dev zram0, logical block 180825, lost async page write
[   33.742551] Buffer I/O error on dev zram0, logical block 180826, lost async page write
[   33.744153] Buffer I/O error on dev zram0, logical block 180827, lost async page write
[   33.745807] Buffer I/O error on dev zram0, logical block 180828, lost async page write
[   33.747419] Buffer I/O error on dev zram0, logical block 180829, lost async page write
[   33.749060] Buffer I/O error on dev zram0, logical block 180830, lost async page write
[   33.750687] Buffer I/O error on dev zram0, logical block 180831, lost async page write
[   33.752286] Buffer I/O error on dev zram0, logical block 180832, lost async page write
[   33.811590] ------------[ cut here ]------------
[   33.812038] WARNING: CPU: 11 PID: 1996 at fs/block_dev.c:57 __blkdev_put+0x1d7/0x210()
[   33.812817] Modules linked in: 
[   33.813142] CPU: 11 PID: 1996 Comm: dd Not tainted 3.19.0-rc6-next-20150202+ #1125
[   33.813837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[   33.814525]  ffffffff81801a2d ffff880061e77db8 ffffffff815b848e 0000000000000001
[   33.815196]  0000000000000000 ffff880061e77df8 ffffffff8104de2a 0000000000000000
[   33.815867]  ffff88005da287f0 ffff88005da28680 ffff88005da28770 ffff88005da28698
[   33.816536] Call Trace:
[   33.816817]  [<ffffffff815b848e>] dump_stack+0x45/0x57
[   33.817304]  [<ffffffff8104de2a>] warn_slowpath_common+0x8a/0xc0
[   33.817829]  [<ffffffff8104df1a>] warn_slowpath_null+0x1a/0x20
[   33.818331]  [<ffffffff811b60b7>] __blkdev_put+0x1d7/0x210
[   33.818797]  [<ffffffff811b69c0>] blkdev_put+0x50/0x130
[   33.819244]  [<ffffffff811b6b55>] blkdev_close+0x25/0x30
[   33.819723]  [<ffffffff8118079f>] __fput+0xdf/0x1e0
[   33.820140]  [<ffffffff811808ee>] ____fput+0xe/0x10
[   33.820576]  [<ffffffff81068e07>] task_work_run+0xa7/0xe0
[   33.821151]  [<ffffffff81002b89>] do_notify_resume+0x49/0x60
[   33.821721]  [<ffffffff815bf09d>] int_signal+0x12/0x17
[   33.822228] ---[ end trace 274fbbc5664827d2 ]---

The warning comes from bdev_write_node in blkdev_put path.

tatic void bdev_write_inode(struct inode *inode)
{
        spin_lock(&inode->i_lock);
        while (inode->i_state & I_DIRTY) {
                spin_unlock(&inode->i_lock);
                WARN_ON_ONCE(write_inode_now(inode, true)); <========= here.
                spin_lock(&inode->i_lock);
        }    
        spin_unlock(&inode->i_lock);
}

The reason is dd process encounters I/O fails due to sudden block device
disappear so in filemap_check_errors in __writeback_single_inode returns
-EIO.

If we checks bd_openners instead of bd_holders, we could address the
problem. When I see the brd, it already have used it rather than
bd_holders so although I'm not a expert of block layer, it seems
to be better.

> ---
>  drivers/block/zram/zram_drv.c | 23 +++++++++--------------
>  1 file changed, 9 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index aa5a4c5..a32069f 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -717,7 +717,7 @@ static void zram_bio_discard(struct zram *zram, u32 index,
>  	}
>  }
>  
> -static void zram_reset_device(struct zram *zram, bool reset_capacity)
> +static void zram_reset_device(struct zram *zram)
>  {
>  	down_write(&zram->init_lock);
>  
> @@ -736,18 +736,7 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  	memset(&zram->stats, 0, sizeof(zram->stats));
>  
>  	zram->disksize = 0;
> -	if (reset_capacity)
> -		set_capacity(zram->disk, 0);
> -
>  	up_write(&zram->init_lock);
> -
> -	/*
> -	 * Revalidate disk out of the init_lock to avoid lockdep splat.
> -	 * It's okay because disk's capacity is protected by init_lock
> -	 * so that revalidate_disk always sees up-to-date capacity.
> -	 */
> -	if (reset_capacity)
> -		revalidate_disk(zram->disk);
>  }
>  
>  static ssize_t disksize_store(struct device *dev,
> @@ -820,6 +809,7 @@ static ssize_t reset_store(struct device *dev,
>  	if (!bdev)
>  		return -ENOMEM;
>  
> +	mutex_lock(&bdev->bd_mutex);
>  	/* Do not reset an active device! */
>  	if (bdev->bd_holders) {
>  		ret = -EBUSY;
> @@ -837,12 +827,17 @@ static ssize_t reset_store(struct device *dev,
>  
>  	/* Make sure all pending I/O is finished */
>  	fsync_bdev(bdev);
> +	zram_reset_device(zram);
> +	set_capacity(zram->disk, 0);
> +
> +	mutex_unlock(&bdev->bd_mutex);
> +	revalidate_disk(zram->disk);
>  	bdput(bdev);
>  
> -	zram_reset_device(zram, true);
>  	return len;
>  
>  out:
> +	mutex_unlock(&bdev->bd_mutex);
>  	bdput(bdev);
>  	return ret;
>  }
> @@ -1188,7 +1183,7 @@ static void __exit zram_exit(void)
>  		 * Shouldn't access zram->disk after destroy_device
>  		 * because destroy_device already released zram->disk.
>  		 */
> -		zram_reset_device(zram, false);
> +		zram_reset_device(zram);
>  	}
>  
>  	unregister_blkdev(zram_major, "zram");
> -- 
> 2.3.0.rc2
> 

-- 
Kind regards,
Minchan Kim

next prev parent reply	other threads:[~2015-02-03  3:01 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-02 14:08 [PATCH] zram: fix umount-reset_store-mount race condition Sergey Senozhatsky
2015-02-03  3:01 ` Minchan Kim [this message]
2015-02-03 14:05 ` Ganesh Mahendran
2015-02-03 14:15   ` Sergey Senozhatsky
2015-02-03 14:52     ` Sergey Senozhatsky
2015-02-03 15:06       ` Sergey Senozhatsky
2015-02-03 15:39         ` Minchan Kim
2015-02-03 16:20           ` Sergey Senozhatsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150203030127.GA1541@blaptop \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=jmarchan@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=opensource.ganesh@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.