All of lore.kernel.org
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Brian Geffon <bgeffon@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Nitin Gupta <ngupta@vflare.org>,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	linux-kernel@vger.kernel.org,
	Suleiman Souhlal <suleiman@google.com>,
	Rom Lemarchand <romlem@google.com>,
	linux-mm@kvack.org
Subject: Re: [RESEND RFC] zram: Allow rw_page when page isn't written back.
Date: Fri, 23 Sep 2022 19:31:30 +0000	[thread overview]
Message-ID: <Yy4JkpZ/SnXtrVRf@google.com> (raw)
In-Reply-To: <20220908125037.1119114-1-bgeffon@google.com>

On Thu, Sep 08, 2022 at 08:50:37AM -0400, Brian Geffon wrote:
> Today when a zram device has a backing device we change the ops to
> a new set which does not expose a rw_page method. This prevents the
> upper layers from trying to issue a synchronous rw. This has the
> downside that we penalize every rw even when it could possibly

Do you mean addiontal bio alloc/free?
Please specify something more detail.

> still be performed as a synchronous rw. By the very nature of

Even though zram go though the block layer in the case, it's still
synchronous operation against on in-memory compressed data. Only
asynchrnous IO happens for the data in backing device.

> zram all writes are synchronous so it's unfortunate to have to
> accept this limitation.
> 
> This change will always expose a rw_page function and if the page
> has been written back it will return -EOPNOTSUPP which will force the
> upper layers to try again with bio.

Sounds a good idea.

> 
> To safely allow a synchronous read to proceed for pages which have not
> yet written back we introduce a new flag ZRAM_NO_WB. On the first
> synchronous read if the page is not written back we will set the
> ZRAM_NO_WB flag. This flag, which is never cleared, prevents writeback
> from ever happening to that page.

Why do we need a addtional flag?
Why couldn't we do?

1. expose the rw_page all the time.
2. If the page was written back, just return an error in rw_page to make
upper layer retry it with bio.

> 
> This approach works because in the case of zram as a swap backing device
> the page is going to be removed from zram shortly thereafter so
> preventing writeback is fine. However, if zram is being used as a
> generic block device then this might prevent writeback of the page.
> 
> This proposal is still very much RFC, feedback would be appreciated.
> 
> Signed-off-by: Brian Geffon <bgeffon@google.com>
> ---
>  drivers/block/zram/zram_drv.c | 68 +++++++++++++++++++++--------------
>  drivers/block/zram/zram_drv.h |  1 +
>  2 files changed, 43 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 92cb929a45b7..22b69e8b6042 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -52,9 +52,6 @@ static unsigned int num_devices = 1;
>  static size_t huge_class_size;
>  
>  static const struct block_device_operations zram_devops;
> -#ifdef CONFIG_ZRAM_WRITEBACK
> -static const struct block_device_operations zram_wb_devops;
> -#endif
>  
>  static void zram_free_page(struct zram *zram, size_t index);
>  static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
> @@ -309,7 +306,8 @@ static void mark_idle(struct zram *zram, ktime_t cutoff)
>  		 */
>  		zram_slot_lock(zram, index);
>  		if (zram_allocated(zram, index) &&
> -				!zram_test_flag(zram, index, ZRAM_UNDER_WB)) {
> +				!zram_test_flag(zram, index, ZRAM_UNDER_WB) &&
> +				!zram_test_flag(zram, index, ZRAM_NO_WB)) {
>  #ifdef CONFIG_ZRAM_MEMORY_TRACKING
>  			is_idle = !cutoff || ktime_after(cutoff, zram->table[index].ac_time);
>  #endif
> @@ -439,7 +437,6 @@ static void reset_bdev(struct zram *zram)
>  	filp_close(zram->backing_dev, NULL);
>  	zram->backing_dev = NULL;
>  	zram->bdev = NULL;
> -	zram->disk->fops = &zram_devops;
>  	kvfree(zram->bitmap);
>  	zram->bitmap = NULL;
>  }
> @@ -543,17 +540,6 @@ static ssize_t backing_dev_store(struct device *dev,
>  	zram->backing_dev = backing_dev;
>  	zram->bitmap = bitmap;
>  	zram->nr_pages = nr_pages;
> -	/*
> -	 * With writeback feature, zram does asynchronous IO so it's no longer
> -	 * synchronous device so let's remove synchronous io flag. Othewise,
> -	 * upper layer(e.g., swap) could wait IO completion rather than
> -	 * (submit and return), which will cause system sluggish.
> -	 * Furthermore, when the IO function returns(e.g., swap_readpage),
> -	 * upper layer expects IO was done so it could deallocate the page
> -	 * freely but in fact, IO is going on so finally could cause
> -	 * use-after-free when the IO is really done.
> -	 */
> -	zram->disk->fops = &zram_wb_devops;
>  	up_write(&zram->init_lock);
>  
>  	pr_info("setup backing device %s\n", file_name);
> @@ -722,7 +708,8 @@ static ssize_t writeback_store(struct device *dev,
>  
>  		if (zram_test_flag(zram, index, ZRAM_WB) ||
>  				zram_test_flag(zram, index, ZRAM_SAME) ||
> -				zram_test_flag(zram, index, ZRAM_UNDER_WB))
> +				zram_test_flag(zram, index, ZRAM_UNDER_WB) ||
> +				zram_test_flag(zram, index, ZRAM_NO_WB))
>  			goto next;
>  
>  		if (mode & IDLE_WRITEBACK &&
> @@ -1226,6 +1213,10 @@ static void zram_free_page(struct zram *zram, size_t index)
>  		goto out;
>  	}
>  
> +	if (zram_test_flag(zram, index, ZRAM_NO_WB)) {
> +		zram_clear_flag(zram, index, ZRAM_NO_WB);
> +	}
> +
>  	/*
>  	 * No memory is allocated for same element filled pages.
>  	 * Simply clear same page flag.
> @@ -1654,6 +1645,40 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
>  	index = sector >> SECTORS_PER_PAGE_SHIFT;
>  	offset = (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
>  
> +#ifdef CONFIG_ZRAM_WRITEBACK
> +	/*
> +	 * With writeback feature, zram does asynchronous IO so it's no longer
> +	 * synchronous device so let's remove synchronous io flag. Othewise,
> +	 * upper layer(e.g., swap) could wait IO completion rather than
> +	 * (submit and return), which will cause system sluggish.
> +	 * Furthermore, when the IO function returns(e.g., swap_readpage),
> +	 * upper layer expects IO was done so it could deallocate the page
> +	 * freely but in fact, IO is going on so finally could cause
> +	 * use-after-free when the IO is really done.
> +	 *
> +	 * If the page is not currently written back then we may proceed to
> +	 * read the page synchronously, otherwise, we must fail with
> +	 * -EOPNOTSUPP to force the upper layers to use a normal bio.
> +	 */
> +	zram_slot_lock(zram, index);
> +	if (zram_test_flag(zram, index, ZRAM_WB) ||
> +			zram_test_flag(zram, index, ZRAM_UNDER_WB)) {
> +		zram_slot_unlock(zram, index);
> +		/* We cannot proceed with synchronous read */
> +		return -EOPNOTSUPP;
> +	}
> +
> +	/*
> +	 * Don't allow the page to be written back while we read it,
> +	 * this flag is never cleared. It shouldn't be a problem that
> +	 * we don't clear this flag because in the case of swap this
> +	 * page will be removed shortly after this read anyway.
> +	 */
> +	if (op == REQ_OP_READ)
> +		zram_set_flag(zram, index, ZRAM_NO_WB);
> +	zram_slot_unlock(zram, index);
> +#endif
> +
>  	bv.bv_page = page;
>  	bv.bv_len = PAGE_SIZE;
>  	bv.bv_offset = 0;
> @@ -1827,15 +1852,6 @@ static const struct block_device_operations zram_devops = {
>  	.owner = THIS_MODULE
>  };
>  
> -#ifdef CONFIG_ZRAM_WRITEBACK
> -static const struct block_device_operations zram_wb_devops = {
> -	.open = zram_open,
> -	.submit_bio = zram_submit_bio,
> -	.swap_slot_free_notify = zram_slot_free_notify,
> -	.owner = THIS_MODULE
> -};
> -#endif
> -
>  static DEVICE_ATTR_WO(compact);
>  static DEVICE_ATTR_RW(disksize);
>  static DEVICE_ATTR_RO(initstate);
> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> index 158c91e54850..20e4c6a579e0 100644
> --- a/drivers/block/zram/zram_drv.h
> +++ b/drivers/block/zram/zram_drv.h
> @@ -50,6 +50,7 @@ enum zram_pageflags {
>  	ZRAM_UNDER_WB,	/* page is under writeback */
>  	ZRAM_HUGE,	/* Incompressible page */
>  	ZRAM_IDLE,	/* not accessed page since last idle marking */
> +	ZRAM_NO_WB,	/* Do not allow page to be written back */
>  
>  	__NR_ZRAM_PAGEFLAGS,
>  };
> -- 
> 2.37.2.789.g6183377224-goog
> 
> 


  parent reply	other threads:[~2022-09-23 19:31 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-08 12:50 [RESEND RFC] zram: Allow rw_page when page isn't written back Brian Geffon
2022-09-09  8:30 ` Sergey Senozhatsky
2022-09-12  4:37   ` Sergey Senozhatsky
2022-09-12  6:07     ` Sergey Senozhatsky
2022-09-23 19:31 ` Minchan Kim [this message]
2022-09-30 19:33   ` Brian Geffon
2022-09-30 19:52   ` [PATCH] zram: Always expose rw_page Brian Geffon
2022-10-03  2:59     ` Sergey Senozhatsky
2022-10-03 14:46       ` Brian Geffon
2022-10-03 14:48   ` [PATCH v2] " Brian Geffon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yy4JkpZ/SnXtrVRf@google.com \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=bgeffon@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ngupta@vflare.org \
    --cc=romlem@google.com \
    --cc=senozhatsky@chromium.org \
    --cc=suleiman@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.