linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>,
	Jerome Marchand <jmarchan@redhat.com>,
	driverdev-devel@linuxdriverproject.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] staging: zram: fix handle_pending_slot_free() and zram_reset_device() race
Date: Tue, 17 Sep 2013 20:24:45 +0300	[thread overview]
Message-ID: <20130917172445.GA2289@swordfish> (raw)
In-Reply-To: <20130916000217.GA18616@bbox>


Hello,

On (09/16/13 09:02), Minchan Kim wrote:
> Hello Sergey,
> 
> Sorry for really slow response. I was really busy by internal works
> and Thanks for pointing the BUG, Dan, Jerome and Sergey.
> I read your threads roughly so I may miss something. If so, sorry
> for that. Anyway I will put my opinion.
> 
> On Wed, Sep 11, 2013 at 02:12:50AM +0300, Sergey Senozhatsky wrote:
> > Dan Carpenter noted that handle_pending_slot_free() is racy with
> > zram_reset_device(). Take write init_lock in zram_slot_free(), thus
> 
> Right but "init_lock" is what I really want to remove.
> Yes. It's just read-side lock so most of time it doesn't hurt us but it
> makes code very complicated and deadlock prone so I'd like to replace
> it with RCU. Yeah, It's off topic but just let me put my opinion in
> future direction.
> 
> Abought the bug, how about moving flush_work below down_write(init_lock)?
> zram_make_request is already closed by init_lock and we have a rule about
> lock ordering as following so I don't see any problem.
> 
>   init_lock
>     zram->lock
> 
> > preventing any concurrent zram_slot_free(), zram_bvec_rw() or
> > zram_reset_device(). This also allows to safely check zram->init_done
> > in handle_pending_slot_free().
> > 
> > Initial intention was to minimze number of handle_pending_slot_free()
> > call from zram_bvec_rw(), which were slowing down READ requests due to
> > slot_free_lock spin lock. Jerome Marchand suggested to remove
> > handle_pending_slot_free() from zram_bvec_rw().
> > 
> > Link: https://lkml.org/lkml/2013/9/9/172
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > 
> > ---
> > 
> >  drivers/staging/zram/zram_drv.c | 13 +++++--------
> >  1 file changed, 5 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> > index 91d94b5..7a2d4de 100644
> > --- a/drivers/staging/zram/zram_drv.c
> > +++ b/drivers/staging/zram/zram_drv.c
> > @@ -521,7 +521,8 @@ static void handle_pending_slot_free(struct zram *zram)
> >  	while (zram->slot_free_rq) {
> >  		free_rq = zram->slot_free_rq;
> >  		zram->slot_free_rq = free_rq->next;
> > -		zram_free_page(zram, free_rq->index);
> > +		if (zram->init_done)
> > +			zram_free_page(zram, free_rq->index);
> >  		kfree(free_rq);
> >  	}
> >  	spin_unlock(&zram->slot_free_lock);
> > @@ -534,16 +535,13 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
> >  
> >  	if (rw == READ) {
> >  		down_read(&zram->lock);
> > -		handle_pending_slot_free(zram);
> 
> Read side is okay but actually I have a nitpick.
> If someone poll a block in zram-swap device, he would see a block
> has zero value suddenly although there was no I/O.(I don't want to argue
> it's sane user or not, anyway) it never happens on real block device and
> it never happens on zram-block device. Only it can happen zram-swap device.
> And such behavior was there since we introduced swap_slot_free_notify.
> (off-topic: I'd like to remove it because it makes tight coupling between
> zram and swap and obviously, it was layering violation function)
> so now, I don't have strong objection. 
> 
> The idea is to remove swap_slot_free_notify is to use frontswap when
> user want to use zram as swap so zram can be notified when the block
> lose the owner but still we should solve the mutex problem in notify
> handler.
> 
> 
> >  		ret = zram_bvec_read(zram, bvec, index, offset, bio);
> >  		up_read(&zram->lock);
> >  	} else {
> >  		down_write(&zram->lock);
> > -		handle_pending_slot_free(zram);
> 
> Why did you remove this in write-side?
> We can't expect when the work will trigger. It means the work could remove
> valid block under the us.
> 


not sure I understand how.
zram_slot_free() takes down_write(&zram->init_lock) and zram_make_request() takes
down_read(&zram->init_lock), thus zram_slot_free() can not concurrently work with
any RW requests. RW requests are under read() lock and zram_slot_free() is under
write() lock.

> >  		ret = zram_bvec_write(zram, bvec, index, offset);
> >  		up_write(&zram->lock);
> >  	}
> > -
> >  	return ret;
> >  }
> >  
> > @@ -750,12 +748,11 @@ error:
> >  
> >  static void zram_slot_free(struct work_struct *work)
> >  {
> > -	struct zram *zram;
> > +	struct zram *zram = container_of(work, struct zram, free_work);
> >  
> > -	zram = container_of(work, struct zram, free_work);
> > -	down_write(&zram->lock);
> > +	down_write(&zram->init_lock);
> 
> I don't like this.
> Primary problem is we should handle it as atomic so that we should use
> spinlock instead of mutex. Yeah, /me kicks his ass. From the beginning,
> I should solve this problem as that way.
> 
> The simple solution popped from my mind is that
> 
> 
> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> index 91d94b5..b23bf0e 100644
> --- a/drivers/staging/zram/zram_drv.c
> +++ b/drivers/staging/zram/zram_drv.c
> @@ -534,11 +534,14 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
>  
>  	if (rw == READ) {
>  		down_read(&zram->lock);
> -		handle_pending_slot_free(zram);
>  		ret = zram_bvec_read(zram, bvec, index, offset, bio);
>  		up_read(&zram->lock);
>  	} else {
>  		down_write(&zram->lock);
> +		/*
> +		 * We should free pending slot. Otherwise it would
> +		 * free valid blocks under the us.
> +		 */
>  		handle_pending_slot_free(zram);
>  		ret = zram_bvec_write(zram, bvec, index, offset);
>  		up_write(&zram->lock);
> @@ -552,7 +555,6 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  	size_t index;
>  	struct zram_meta *meta;
>  
> -	flush_work(&zram->free_work);
>  
>  	down_write(&zram->init_lock);
>  	if (!zram->init_done) {
> @@ -560,6 +562,7 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  		return;
>  	}
>  
> +	flush_work(&zram->free_work);
>  	meta = zram->meta;
>  	zram->init_done = 0;

this one looks ok to me.

	-ss

>  But more ideal way I am thinking now is 
> 
> 1) replace init_lock with RCU lock
> 2) introduce new meta atmoic lock instead of zram->mutex, which is very coarse-grained.
> 3) use atmoic lock in notify handler.
> 
> -- 
> Kind regards,
> Minchan Kim
> 

  reply	other threads:[~2013-09-17 17:26 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-06 15:12 [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2) Sergey Senozhatsky
2013-09-09 12:33 ` Dan Carpenter
2013-09-09 12:49   ` Sergey Senozhatsky
2013-09-09 13:21     ` Dan Carpenter
2013-09-09 13:46       ` Jerome Marchand
2013-09-09 16:10         ` Jerome Marchand
2013-09-10 14:34           ` Sergey Senozhatsky
2013-09-10 14:58             ` Dan Carpenter
2013-09-10 15:15               ` Greg Kroah-Hartman
2013-09-10 23:12               ` [PATCH 1/2] staging: zram: fix handle_pending_slot_free() and zram_reset_device() race Sergey Senozhatsky
2013-09-12 22:12                 ` Greg KH
2013-09-13  9:17                   ` Sergey Senozhatsky
2013-09-16  0:02                 ` Minchan Kim
2013-09-17 17:24                   ` Sergey Senozhatsky [this message]
2013-09-23  4:24                     ` Minchan Kim
2013-09-23  8:42                       ` Sergey Senozhatsky
2013-09-10 23:19               ` [PATCH 2/2] staging: zram: remove init_done from zram struct (v3) Sergey Senozhatsky
2013-09-10 23:27             ` [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2) Sergey Senozhatsky
2013-09-09 14:42       ` Sergey Senozhatsky
2013-09-09 14:52         ` Dan Carpenter
2013-09-09 15:09           ` Sergey Senozhatsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130917172445.GA2289@swordfish \
    --to=sergey.senozhatsky@gmail.com \
    --cc=dan.carpenter@oracle.com \
    --cc=driverdev-devel@linuxdriverproject.org \
    --cc=jmarchan@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=minchan@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).