* [PATCH]dm-mpath: fix for race condition between multipath_dtr and pg_init_done.
@ 2013-10-17 17:31 Merla, ShivaKrishna
2013-10-17 18:53 ` Mike Snitzer
0 siblings, 1 reply; 10+ messages in thread
From: Merla, ShivaKrishna @ 2013-10-17 17:31 UTC (permalink / raw)
To: dm-devel@redhat.com; +Cc: snitzer@redhat.com, agk@redhat.com
While multipath_dtr is running, we should prevent any further path activation
work from being queued. We saw a kernel panic in the following sequence: after
pg_init_done() decrements pg_init_in_progress to 0, the
wait_for_pg_init_completion call assumes there are no more pending path
management commands. But if pg_init_done has set pg_init_required because of
retriable mode_select errors, process_queued_ios() will queue the path
activation work again. If the free_multipath call has completed by the time the
activate_path work runs, the kernel panics on accessing multipath members.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
RIP: 0010:[<ffffffffa003db1b>] [<ffffffffa003db1b>] activate_path+0x1b/0x30 [dm_multipath]
[<ffffffff81090ac0>] worker_thread+0x170/0x2a0
[<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
This patch will fix the issue by preventing any further path activations when
multipath structures are being freed during table suspend/reload.
Signed-off-by: Shiva Krishna Merla <shivakrishna.merla@netapp.com>
Reviewed-by: Krishnasamy Somasundaram <somasundaram.krishnasamy@netapp.com>
Tested-by: Speagle Andy <Andy.Speagle@netapp.com>
---
--- a/drivers/md/dm-mpath.c 2013-01-29 10:12:10.000000000 -0600
+++ b/drivers/md/dm-mpath.c 2013-10-17 09:23:21.896062928 -0500
@@ -73,6 +73,7 @@ struct multipath {
wait_queue_head_t pg_init_wait; /* Wait for pg_init completion */
+ unsigned dtr_in_progress; /* multipath destroy in progress */
unsigned pg_init_required; /* pg_init needs calling? */
unsigned pg_init_in_progress; /* Only one pg_init allowed at once */
unsigned pg_init_delay_retry; /* Delay pg_init retry? */
@@ -498,7 +499,8 @@ static void process_queued_ios(struct wo
(!pgpath && !m->queue_if_no_path))
must_queue = 0;
- if (m->pg_init_required && !m->pg_init_in_progress && pgpath)
+ if (m->pg_init_required && !m->pg_init_in_progress && pgpath &&
+ !m->dtr_in_progress)
__pg_init_all_paths(m);
spin_unlock_irqrestore(&m->lock, flags);
@@ -952,6 +954,7 @@ static void multipath_dtr(struct dm_targ
{
struct multipath *m = ti->private;
+ m->dtr_in_progress = 1;
flush_multipath_work(m);
free_multipath(m);
}
@@ -1164,7 +1167,7 @@ static int pg_init_limit_reached(struct
spin_lock_irqsave(&m->lock, flags);
- if (m->pg_init_count <= m->pg_init_retries)
+ if (m->pg_init_count <= m->pg_init_retries && !m->dtr_in_progress)
m->pg_init_required = 1;
else
limit_reached = 1;
--
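
For readers who want to experiment outside the kernel, the gate that the patch adds to process_queued_ios() can be modelled in plain userspace C. This is only a sketch under stated assumptions: struct mpath_model, model_process_queued_ios() and the activations_queued counter are hypothetical names invented for illustration, and the body of the branch is a simplified stand-in for __pg_init_all_paths(), not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the struct multipath fields the patch touches. */
struct mpath_model {
	unsigned dtr_in_progress;     /* set once teardown has started */
	unsigned pg_init_required;    /* a path activation has been requested */
	unsigned pg_init_in_progress; /* activations currently outstanding */
	unsigned activations_queued;  /* model-only counter of queued activate_path work */
};

/*
 * Models the patched condition in process_queued_ios(): activation work
 * is queued only if it is required, none is outstanding, a path exists,
 * and the target is not being destroyed.
 */
static bool model_process_queued_ios(struct mpath_model *m, bool have_pgpath)
{
	assert(m);

	if (m->pg_init_required && !m->pg_init_in_progress && have_pgpath &&
	    !m->dtr_in_progress) {
		/* Simplified stand-in for __pg_init_all_paths(). */
		m->pg_init_required = 0;
		m->pg_init_in_progress++;
		m->activations_queued++;
		return true;
	}
	return false;
}
```

With dtr_in_progress clear, a pending pg_init_required results in one queued activation; with it set, nothing is queued, which is the behaviour the patch is after.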
* Re: [PATCH]dm-mpath: fix for race condition between multipath_dtr and pg_init_done.
2013-10-17 17:31 [PATCH]dm-mpath: fix for race condition between multipath_dtr and pg_init_done Merla, ShivaKrishna
@ 2013-10-17 18:53 ` Mike Snitzer
2013-10-17 21:47 ` Hannes Reinecke
0 siblings, 1 reply; 10+ messages in thread
From: Mike Snitzer @ 2013-10-17 18:53 UTC (permalink / raw)
To: Merla, ShivaKrishna; +Cc: dm-devel@redhat.com, Mikulas Patocka, agk@redhat.com
Thanks for reporting this. Much appreciated. More comments below.
On Thu, Oct 17 2013 at 1:31pm -0400,
Merla, ShivaKrishna <ShivaKrishna.Merla@netapp.com> wrote:
> Whenever multipath_dtr is happening, we should prevent queueing any further path
> activation work. There was a kernel panic where after pg_init_done() decrements
> pg_init_in_progress to 0, wait_for_pg_init_completion call assumes there are no
> more pending path management commands. But if pg_init_required is set by
> pg_init_done call due to retriable mode_select errors , then process_queued_ios()
> will again queue the path activation work. If free_multipath call has been
> completed by the time activate_path work is called, kernel panic was seen on
> accessing multipath members.
Your locking looks suspect to me; see my comment inline below multipath_dtr.
But shouldn't we just train multipath_wait_for_pg_init_completion() to
look at m->pg_init_required? Have it wait for both pg_init_required and
pg_init_in_progress to be zero? We'd also have to audit that
pg_init_required cannot be set while pg_init_in_progress.
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
> RIP: 0010:[<ffffffffa003db1b>] [<ffffffffa003db1b>] activate_path+0x1b/0x30 [dm_multipath]
> [<ffffffff81090ac0>] worker_thread+0x170/0x2a0
> [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
>
> This patch will fix the issue by preventing any further path activations when
> multipath structures are being freed during table suspend/reload.
>
> Signed-off-by: Shiva Krishna Merla<shivakrishna.merla@netapp.com>
> Reviewed-by: Krishnasamy Somasundaram<somasundaram.krishnasamy@netapp.com>
> Tested-by: Speagle Andy<Andy.Speagle@netapp.com>
>
> ---
> --- a/drivers/md/dm-mpath.c 2013-01-29 10:12:10.000000000 -0600
> +++ b/drivers/md/dm-mpath.c 2013-10-17 09:23:21.896062928 -0500
> @@ -73,6 +73,7 @@ struct multipath {
>
> wait_queue_head_t pg_init_wait; /* Wait for pg_init completion */
>
> + unsigned dtr_in_progress; /* multipath destroy in progress */
> unsigned pg_init_required; /* pg_init needs calling? */
> unsigned pg_init_in_progress; /* Only one pg_init allowed at once */
> unsigned pg_init_delay_retry; /* Delay pg_init retry? */
> @@ -498,7 +499,8 @@ static void process_queued_ios(struct wo
> (!pgpath && !m->queue_if_no_path))
> must_queue = 0;
>
> - if (m->pg_init_required && !m->pg_init_in_progress && pgpath)
> + if (m->pg_init_required && !m->pg_init_in_progress && pgpath &&
> + !m->dtr_in_progress)
> __pg_init_all_paths(m);
>
> spin_unlock_irqrestore(&m->lock, flags);
> @@ -952,6 +954,7 @@ static void multipath_dtr(struct dm_targ
> {
> struct multipath *m = ti->private;
>
> + m->dtr_in_progress = 1;
> flush_multipath_work(m);
> free_multipath(m);
> }
Don't we need synchronization in multipath_dtr? Otherwise isn't there
still potential for a narrow race when checking m->dtr_in_progress from
process_queued_ios() or pg_init_limit_reached()?
Anyway, my concerns should be moot if multipath_wait_for_pg_init_completion()
is updated to look at pg_init_required. But I could be missing
something.
Mikulas (or Hannes or NEC guys), would welcome your take on this.
> @@ -1164,7 +1167,7 @@ static int pg_init_limit_reached(struct
>
> spin_lock_irqsave(&m->lock, flags);
>
> - if (m->pg_init_count <= m->pg_init_retries)
> + if (m->pg_init_count <= m->pg_init_retries && !m->dtr_in_progress)
> m->pg_init_required = 1;
> else
> limit_reached = 1;
> --
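
The alternative raised in this reply, having multipath_wait_for_pg_init_completion() wait for both flags, comes down to a change in the wait predicate. The following is a userspace sketch only: struct pg_init_state and both model_* functions are hypothetical names, and the "current" predicate reflects the behaviour as described in this thread rather than a copy of the driver source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the two flags discussed in the thread. */
struct pg_init_state {
	unsigned pg_init_required;
	unsigned pg_init_in_progress;
};

/* Behaviour as described in the thread: only pg_init_in_progress is consulted. */
static bool model_wait_done_current(const struct pg_init_state *s)
{
	assert(s);
	return s->pg_init_in_progress == 0;
}

/* Suggested behaviour: the wait ends only when both flags are zero. */
static bool model_wait_done_suggested(const struct pg_init_state *s)
{
	assert(s);
	return s->pg_init_in_progress == 0 && s->pg_init_required == 0;
}
```

The state of interest is pg_init_in_progress == 0 with pg_init_required == 1, which is the window right after a retriable failure: the first predicate reports completion there, the second does not.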
* Re: [PATCH]dm-mpath: fix for race condition between multipath_dtr and pg_init_done.
2013-10-17 21:47 ` Hannes Reinecke
@ 2013-10-17 21:10 ` Mike Snitzer
2013-10-17 22:03 ` Merla, ShivaKrishna
0 siblings, 1 reply; 10+ messages in thread
From: Mike Snitzer @ 2013-10-17 21:10 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Mikulas Patocka, dm-devel@redhat.com, Merla, ShivaKrishna,
agk@redhat.com
On Thu, Oct 17 2013 at 5:47pm -0400,
Hannes Reinecke <hare@suse.de> wrote:
> On 10/17/2013 08:53 PM, Mike Snitzer wrote:
> >Thanks for reporting this. Much appreciated. More comments below.
> >
> >On Thu, Oct 17 2013 at 1:31pm -0400,
> >Merla, ShivaKrishna <ShivaKrishna.Merla@netapp.com> wrote:
> >
> >>Whenever multipath_dtr is happening, we should prevent queueing any further path
> >>activation work. There was a kernel panic where after pg_init_done() decrements
> >>pg_init_in_progress to 0, wait_for_pg_init_completion call assumes there are no
> >>more pending path management commands. But if pg_init_required is set by
> >>pg_init_done call due to retriable mode_select errors , then process_queued_ios()
> >>will again queue the path activation work. If free_multipath call has been
> >>completed by the time activate_path work is called, kernel panic was seen on
> >>accessing multipath members.
> >
> >Your locking looks suspect to me, see comment inlined below multipath_dtr
> >
> >But shouldn't we just train multipath_wait_for_pg_init_completion() to
> >look at m->pg_init_required? Have it wait for both pg_init_required and
> >pg_init_in_progress to be zero? We'd also have to audit that
> >pg_init_required cannot be set while pg_init_in_progress.
> >
> Hmm.
>
> We _could_ try to resolve it by pushing I/O back onto the request queue
> (cf my earlier post 'requeue I/O during pg_init').
>
> I was hoping to excite some comments with that, but seems to be my
> fate nowadays to send out patches with no reply.
patchwork caught it:
https://patchwork.kernel.org/patch/2969111/
I've just been distracted with other stuff the past week; but I'll be
looking closer at this issue (and your earlier patch) shortly and we'll
get a fix queued for 3.13.
> Anyway, maybe this will be giving it some more attention.
> It definitely would avoid this problem, by virtue of not having to
> queue I/O internally during pg_init, so we could easily tear down
> the queue.
Sounds good.
* Re: [PATCH]dm-mpath: fix for race condition between multipath_dtr and pg_init_done.
2013-10-17 18:53 ` Mike Snitzer
@ 2013-10-17 21:47 ` Hannes Reinecke
2013-10-17 21:10 ` Mike Snitzer
0 siblings, 1 reply; 10+ messages in thread
From: Hannes Reinecke @ 2013-10-17 21:47 UTC (permalink / raw)
To: Mike Snitzer, Merla, ShivaKrishna
Cc: dm-devel@redhat.com, Mikulas Patocka, agk@redhat.com
On 10/17/2013 08:53 PM, Mike Snitzer wrote:
> Thanks for reporting this. Much appreciated. More comments below.
>
> On Thu, Oct 17 2013 at 1:31pm -0400,
> Merla, ShivaKrishna <ShivaKrishna.Merla@netapp.com> wrote:
>
>> Whenever multipath_dtr is happening, we should prevent queueing any further path
>> activation work. There was a kernel panic where after pg_init_done() decrements
>> pg_init_in_progress to 0, wait_for_pg_init_completion call assumes there are no
>> more pending path management commands. But if pg_init_required is set by
>> pg_init_done call due to retriable mode_select errors , then process_queued_ios()
>> will again queue the path activation work. If free_multipath call has been
>> completed by the time activate_path work is called, kernel panic was seen on
>> accessing multipath members.
>
> Your locking looks suspect to me, see comment inlined below multipath_dtr
>
> But shouldn't we just train multipath_wait_for_pg_init_completion() to
> look at m->pg_init_required? Have it wait for both pg_init_required and
> pg_init_in_progress to be zero? We'd also have to audit that
> pg_init_required cannot be set while pg_init_in_progress.
>
Hmm.
We _could_ try to resolve it by pushing I/O back onto the request queue
(cf my earlier post 'requeue I/O during pg_init').
I was hoping to excite some comments with that, but seems to be my fate
nowadays to send out patches with no reply.
Anyway, maybe this will be giving it some more attention.
It definitely would avoid this problem, by virtue of not having to queue
I/O internally during pg_init, so we could easily tear down the queue.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
* Re: [PATCH]dm-mpath: fix for race condition between multipath_dtr and pg_init_done.
2013-10-17 21:10 ` Mike Snitzer
@ 2013-10-17 22:03 ` Merla, ShivaKrishna
2013-10-30 0:57 ` Mike Snitzer
2013-11-01 18:21 ` Mikulas Patocka
0 siblings, 2 replies; 10+ messages in thread
From: Merla, ShivaKrishna @ 2013-10-17 22:03 UTC (permalink / raw)
To: Mike Snitzer, Hannes Reinecke
Cc: dm-devel@redhat.com, Mikulas Patocka, agk@redhat.com
> From: Mike Snitzer [mailto:snitzer@redhat.com]
> Sent: Thursday, October 17, 2013 4:10 PM
> To: Hannes Reinecke
> Cc: Merla, ShivaKrishna; dm-devel@redhat.com; agk@redhat.com; Mikulas
> Patocka
> Subject: Re: [PATCH]dm-mpath: fix for race condition between
> multipath_dtr and pg_init_done.
>
> On Thu, Oct 17 2013 at 5:47pm -0400,
> Hannes Reinecke <hare@suse.de> wrote:
>
> > On 10/17/2013 08:53 PM, Mike Snitzer wrote:
> > >Thanks for reporting this. Much appreciated. More comments below.
> > >
> > >On Thu, Oct 17 2013 at 1:31pm -0400,
> > >Merla, ShivaKrishna <ShivaKrishna.Merla@netapp.com> wrote:
> > >
> > >>Whenever multipath_dtr is happening, we should prevent queueing any
> > >>further path activation work. There was a kernel panic where after
> > >>pg_init_done() decrements pg_init_in_progress to 0,
> > >>wait_for_pg_init_completion call assumes there are no more pending
> > >>path management commands. But if pg_init_required is set by
> > >>pg_init_done call due to retriable mode_select errors , then
> > >>process_queued_ios() will again queue the path activation work. If
> > >>free_multipath call has been completed by the time activate_path work
> > >>is called, kernel panic was seen on accessing multipath members.
> > >
> > >Your locking looks suspect to me, see comment inlined below
> > >multipath_dtr
> > >
> > >But shouldn't we just train multipath_wait_for_pg_init_completion() to
> > >look at m->pg_init_required? Have it wait for both pg_init_required and
> > >pg_init_in_progress to be zero? We'd also have to audit that
> > >pg_init_required cannot be set while pg_init_in_progress.
> > >
> > Hmm.
> >
> > We _could_ try to resolve it by pushing I/O back onto the request queue
> > (cf my earlier post 'requeue I/O during pg_init').
> >
> > I was hoping to excite some comments with that, but seems to be my
> > fate nowadays to send out patches with no reply.
>
> patchwork caught it:
> https://patchwork.kernel.org/patch/2969111/
>
> I've just been distracted with other stuff the past week; but I'll be
> looking closer at this issue (and your earlier patch) shortly and we'll
> get a fix queued for 3.13.
>
> > Anyway, maybe this will be giving it some more attention.
> > It definitely would avoid this problem, by virtue of not having to
> > queue I/O internally during pg_init, so we could easily tear down
> > the queue.
>
> Sounds good.
Thanks for your comments. I agree we should take the lock while setting
dtr_in_progress; I think I overlooked it, as it's handled in
process_queued_ios as well.

We looked into handling this in wait_for_pg_init_completion(), but checking
pg_init_required there will not help either (unless we also prevent
pg_init_required from being set while pg_init_in_progress is set). Because of
SCSI_DH_RETRY on mode_select, pg_init_done will set pg_init_required, since
under normal circumstances the activation needs to be retried. The situation
is completely different when the multipath target is being destroyed. I didn't
see any pending_ios in our test while this was happening; only path activations
were held up, because the controller was returning 5/91/36 CCs. In this
condition, one of the pg_init_required or pg_init_in_progress flags is set at
all times.

Hannes' patch will take care of preventing I/O from being queued while
pg_init_in_progress is set, but activation commands that are already running
will not return until the controller returns SUCCESS on mode_select.
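
The retry behaviour described in this message can be traced with a small userspace model. It is a sketch under assumptions: struct retry_model and the model_* helpers are hypothetical, the controller is assumed to answer every activation with SCSI_DH_RETRY, and only the flag updates discussed in this thread are modelled (pg_init_limit_reached() sets pg_init_required before pg_init_done() drops pg_init_in_progress).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the counters and flags involved in pg_init retries. */
struct retry_model {
	unsigned dtr_in_progress;
	unsigned pg_init_required;
	unsigned pg_init_in_progress;
	unsigned pg_init_count;
	unsigned pg_init_retries;
};

/* Mirrors the patched check: true means no further retry is scheduled. */
static bool model_pg_init_limit_reached(struct retry_model *m)
{
	if (m->pg_init_count <= m->pg_init_retries && !m->dtr_in_progress) {
		m->pg_init_required = 1;
		return false;
	}
	return true;
}

/* Start one activation round, as process_queued_ios() would. */
static void model_start_pg_init(struct retry_model *m)
{
	assert(m->pg_init_required && !m->pg_init_in_progress);
	m->pg_init_count++;
	m->pg_init_required = 0;
	m->pg_init_in_progress = 1;
}

/* Completion of an activation that the controller answered with a retry. */
static void model_pg_init_done_retry(struct retry_model *m)
{
	(void)model_pg_init_limit_reached(m);
	m->pg_init_in_progress--;
}

/* Run rounds until both flags are clear; returns the number of attempts. */
static unsigned model_run_until_idle(struct retry_model *m)
{
	unsigned attempts = 0;

	while (m->pg_init_required) {
		model_start_pg_init(m);
		attempts++;
		model_pg_init_done_retry(m);
		/* Between rounds at least one flag stays set until the limit is hit. */
	}
	return attempts;
}
```

In this model, with pg_init_retries set to 3 and no teardown, the flags only both clear after four attempts. With dtr_in_progress set before the first completion, they clear after a single attempt, because the retry is never requested.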
* Re: [PATCH]dm-mpath: fix for race condition between multipath_dtr and pg_init_done.
2013-10-17 22:03 ` Merla, ShivaKrishna
@ 2013-10-30 0:57 ` Mike Snitzer
2013-10-30 13:32 ` Merla, ShivaKrishna
2013-11-01 18:21 ` Mikulas Patocka
1 sibling, 1 reply; 10+ messages in thread
From: Mike Snitzer @ 2013-10-30 0:57 UTC (permalink / raw)
To: Merla, ShivaKrishna; +Cc: dm-devel@redhat.com, Mikulas Patocka, agk@redhat.com
On Thu, Oct 17 2013 at 6:03pm -0400,
Merla, ShivaKrishna <ShivaKrishna.Merla@netapp.com> wrote:
> > From: Mike Snitzer [mailto:snitzer@redhat.com]
> > Sent: Thursday, October 17, 2013 4:10 PM
> > To: Hannes Reinecke
> > Cc: Merla, ShivaKrishna; dm-devel@redhat.com; agk@redhat.com; Mikulas
> > Patocka
> > Subject: Re: [PATCH]dm-mpath: fix for race condition between
> > multipath_dtr and pg_init_done.
> >
> > On Thu, Oct 17 2013 at 5:47pm -0400,
> > Hannes Reinecke <hare@suse.de> wrote:
> >
> > > On 10/17/2013 08:53 PM, Mike Snitzer wrote:
> > > >Thanks for reporting this. Much appreciated. More comments below.
> > > >
> > > >On Thu, Oct 17 2013 at 1:31pm -0400,
> > > >Merla, ShivaKrishna <ShivaKrishna.Merla@netapp.com> wrote:
> > > >
> > > >>Whenever multipath_dtr is happening, we should prevent queueing any
> > > >>further path activation work. There was a kernel panic where after
> > > >>pg_init_done() decrements pg_init_in_progress to 0,
> > > >>wait_for_pg_init_completion call assumes there are no more pending
> > > >>path management commands. But if pg_init_required is set by
> > > >>pg_init_done call due to retriable mode_select errors , then
> > > >>process_queued_ios() will again queue the path activation work. If
> > > >>free_multipath call has been completed by the time activate_path work
> > > >>is called, kernel panic was seen on accessing multipath members.
> > > >
> > > >Your locking looks suspect to me, see comment inlined below
> > > >multipath_dtr
> > > >
> > > >But shouldn't we just train multipath_wait_for_pg_init_completion() to
> > > >look at m->pg_init_required? Have it wait for both pg_init_required and
> > > >pg_init_in_progress to be zero? We'd also have to audit that
> > > >pg_init_required cannot be set while pg_init_in_progress.
> > > >
> > > Hmm.
> > >
> > > We _could_ try to resolve it by pushing I/O back onto the request queue
> > > (cf my earlier post 'requeue I/O during pg_init').
> > >
> > > I was hoping to excite some comments with that, but seems to be my
> > > fate nowadays to send out patches with no reply.
> >
> > patchwork caught it:
> > https://patchwork.kernel.org/patch/2969111/
> >
> > I've just been distracted with other stuff the past week; but I'll be
> > looking closer at this issue (and your earlier patch) shortly and we'll
> > get a fix queued for 3.13.
> >
> > > Anyway, maybe this will be giving it some more attention.
> > > It definitely would avoid this problem, by virtue of not having to
> > > queue I/O internally during pg_init, so we could easily tear down
> > > the queue.
> >
> > Sounds good.
>
> Thanks for your comments. I agree we should lock while setting dtr_in_progress, I think I overlooked it as its handled in process_queued_ios as well.
> We looked into handling this in wait_for_pg_init_completion() but checking for pg_init_required here will not help as well ( until we prevent setting pg_init_required while pg_init_in_progress is set ).
> Here due to SCSI_DH_RETRY on mode_select, pg_init_done will set the pg_init_required as activation needs to be retried under normal
> circumstances. But it completely differs when multipath target is being destroyed. Apparently I didn't see any pending_ios in our test
> while this is happening. Just path activations are held up since controller was returning 5/91/36 CC's. With this condition either one of pg_init_required or pg_init_in_progress flags are set all the time.
> Hannes patch will take care of preventing queueing of IO's when pg_init_in_progress is set, but currently running activation commands will not return until controller returns SUCCESS on mode_select.
So... do you have an updated patch with proper locking that takes
Hannes' patch into consideration?
I'll be reviewing Hannes' patch more closely (for v3.13) tomorrow, so if you'd
like your issue resolved in the near term I'd appreciate us getting some
closure on the proposed solutions.
Thanks,
Mike
* Re: [PATCH]dm-mpath: fix for race condition between multipath_dtr and pg_init_done.
2013-10-30 0:57 ` Mike Snitzer
@ 2013-10-30 13:32 ` Merla, ShivaKrishna
2013-10-30 13:42 ` Mike Snitzer
0 siblings, 1 reply; 10+ messages in thread
From: Merla, ShivaKrishna @ 2013-10-30 13:32 UTC (permalink / raw)
To: Mike Snitzer; +Cc: dm-devel@redhat.com, Mikulas Patocka, agk@redhat.com
> -----Original Message-----
> From: Mike Snitzer [mailto:snitzer@redhat.com]
> Sent: Tuesday, October 29, 2013 7:57 PM
> To: Merla, ShivaKrishna
> Cc: Hannes Reinecke; dm-devel@redhat.com; agk@redhat.com; Mikulas
> Patocka
> Subject: Re: [PATCH]dm-mpath: fix for race condition between
> multipath_dtr and pg_init_done.
>
> On Thu, Oct 17 2013 at 6:03pm -0400,
> Merla, ShivaKrishna <ShivaKrishna.Merla@netapp.com> wrote:
>
> > > From: Mike Snitzer [mailto:snitzer@redhat.com]
> > > Sent: Thursday, October 17, 2013 4:10 PM
> > > To: Hannes Reinecke
> > > > Cc: Merla, ShivaKrishna; dm-devel@redhat.com; agk@redhat.com;
> > > > Mikulas Patocka
> > > Subject: Re: [PATCH]dm-mpath: fix for race condition between
> > > multipath_dtr and pg_init_done.
> > >
> > > On Thu, Oct 17 2013 at 5:47pm -0400,
> > > Hannes Reinecke <hare@suse.de> wrote:
> > >
> > > > On 10/17/2013 08:53 PM, Mike Snitzer wrote:
> > > > >Thanks for reporting this. Much appreciated. More comments below.
> > > > >
> > > > >On Thu, Oct 17 2013 at 1:31pm -0400,
> > > > >Merla, ShivaKrishna <ShivaKrishna.Merla@netapp.com> wrote:
> > > > >
> > > > >>Whenever multipath_dtr is happening, we should prevent queueing any
> > > > >>further path activation work. There was a kernel panic where after
> > > > >>pg_init_done() decrements pg_init_in_progress to 0,
> > > > >>wait_for_pg_init_completion call assumes there are no more pending
> > > > >>path management commands. But if pg_init_required is set by
> > > > >>pg_init_done call due to retriable mode_select errors , then
> > > > >>process_queued_ios() will again queue the path activation work. If
> > > > >>free_multipath call has been completed by the time activate_path
> > > > >>work is called, kernel panic was seen on accessing multipath members.
> > > > >
> > > > >Your locking looks suspect to me, see comment inlined below
> > > > >multipath_dtr
> > > > >
> > > > >But shouldn't we just train multipath_wait_for_pg_init_completion()
> > > > >to look at m->pg_init_required? Have it wait for both
> > > > >pg_init_required and pg_init_in_progress to be zero? We'd also have
> > > > >to audit that pg_init_required cannot be set while
> > > > >pg_init_in_progress.
> > > > >
> > > > Hmm.
> > > >
> > > > We _could_ try to resolve it by pushing I/O back onto the request
> > > > queue (cf my earlier post 'requeue I/O during pg_init').
> > > >
> > > > I was hoping to excite some comments with that, but seems to be my
> > > > fate nowadays to send out patches with no reply.
> > >
> > > patchwork caught it:
> > > https://patchwork.kernel.org/patch/2969111/
> > >
> > > I've just been distracted with other stuff the past week; but I'll be
> > > looking closer at this issue (and your earlier patch) shortly and we'll
> > > get a fix queued for 3.13.
> > >
> > > > Anyway, maybe this will be giving it some more attention.
> > > > It definitely would avoid this problem, by virtue of not having to
> > > > queue I/O internally during pg_init, so we could easily tear down
> > > > the queue.
> > >
> > > Sounds good.
> >
> > Thanks for your comments. I agree we should lock while setting
> > dtr_in_progress, I think I overlooked it as its handled in
> > process_queued_ios as well.
> > We looked into handling this in wait_for_pg_init_completion() but
> > checking for pg_init_required here will not help as well ( until we
> > prevent setting pg_init_required while pg_init_in_progress is set ).
> > Here due to SCSI_DH_RETRY on mode_select, pg_init_done will set the
> > pg_init_required as activation needs to be retried under normal
> > circumstances. But it completely differs when multipath target is being
> > destroyed. Apparently I didn't see any pending_ios in our test
> > while this is happening. Just path activations are held up since
> > controller was returning 5/91/36 CC's. With this condition either one
> > of pg_init_required or pg_init_in_progress flags are set all the time.
> > Hannes patch will take care of preventing queueing of IO's when
> > pg_init_in_progress is set, but currently running activation commands
> > will not return until controller returns SUCCESS on mode_select.
>
> So... do you have an updated patch with proper locking that takes
> Hannes' patch into consideration?
>
> I'll be reviewing hannes patch closer (for v3.13) tomorrow so if you'd
> like your issue resolved in the near-term I'd appreciate us getting some
> closer on proposed solutions.
>
> Thanks,
> Mike
I have re-submitted my patch with locking in multipath_dtr. I revisited Hannes'
patch and I think we still have this issue when the multipath target is
destroyed while path activation work is queued. This is the scenario where
scsi_dh_rdac returns SCSI_DH_RETRY on mode_select due to 5/91/36 CCs.
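
The resubmitted patch itself is not part of this thread, so the following is only a guess at the shape of "locking in multipath_dtr", written as a userspace model. A pthread mutex stands in for the spinlock m->lock, and struct locked_model, model_begin_dtr() and model_request_retry() are hypothetical names; the point is simply that the flag is written and read under the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: the mutex plays the role of the spinlock m->lock. */
struct locked_model {
	pthread_mutex_t lock;
	unsigned dtr_in_progress;
	unsigned pg_init_required;
};

/* Teardown side: publish the flag while holding the lock. */
static void model_begin_dtr(struct locked_model *m)
{
	assert(m);
	pthread_mutex_lock(&m->lock);
	m->dtr_in_progress = 1;
	pthread_mutex_unlock(&m->lock);
}

/*
 * Retry side: request another pg_init only if teardown has not started.
 * The check and the update happen in one critical section, so the flag
 * cannot change between them.
 */
static bool model_request_retry(struct locked_model *m)
{
	bool scheduled = false;

	assert(m);
	pthread_mutex_lock(&m->lock);
	if (!m->dtr_in_progress) {
		m->pg_init_required = 1;
		scheduled = true;
	}
	pthread_mutex_unlock(&m->lock);
	return scheduled;
}
```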
* Re: [PATCH]dm-mpath: fix for race condition between multipath_dtr and pg_init_done.
2013-10-30 13:32 ` Merla, ShivaKrishna
@ 2013-10-30 13:42 ` Mike Snitzer
2013-10-30 13:44 ` Merla, ShivaKrishna
0 siblings, 1 reply; 10+ messages in thread
From: Mike Snitzer @ 2013-10-30 13:42 UTC (permalink / raw)
To: Merla, ShivaKrishna; +Cc: dm-devel@redhat.com, Mikulas Patocka, agk@redhat.com
On Wed, Oct 30 2013 at 9:32am -0400,
Merla, ShivaKrishna <ShivaKrishna.Merla@netapp.com> wrote:
>
>
> > -----Original Message-----
> > From: Mike Snitzer [mailto:snitzer@redhat.com]
> > Sent: Tuesday, October 29, 2013 7:57 PM
> > To: Merla, ShivaKrishna
> > Cc: Hannes Reinecke; dm-devel@redhat.com; agk@redhat.com; Mikulas
> > Patocka
> > Subject: Re: [PATCH]dm-mpath: fix for race condition between
> > multipath_dtr and pg_init_done.
...
> >
> > So... do you have an updated patch with proper locking that takes
> > Hannes' patch into consideration?
> >
> > I'll be reviewing hannes patch closer (for v3.13) tomorrow so if you'd
> > like your issue resolved in the near-term I'd appreciate us getting some
> > closer on proposed solutions.
> >
> > Thanks,
> > Mike
>
> I have re-submitted my patch with locking in multipath_dtr. I
> revisited Hannes patch and I think we still have this issue when
> multipath is destroyed and path activation work is queued. This is
> with the scenario of scsi_dh_rdac returning SCSI_DH_RETRY on
> mode_select due to 5/91/36 CC's.
OK, I'll work toward including both patches.
* Re: [PATCH]dm-mpath: fix for race condition between multipath_dtr and pg_init_done.
2013-10-30 13:42 ` Mike Snitzer
@ 2013-10-30 13:44 ` Merla, ShivaKrishna
0 siblings, 0 replies; 10+ messages in thread
From: Merla, ShivaKrishna @ 2013-10-30 13:44 UTC (permalink / raw)
To: Mike Snitzer; +Cc: dm-devel@redhat.com, Mikulas Patocka, agk@redhat.com
> -----Original Message-----
> From: Mike Snitzer [mailto:snitzer@redhat.com]
> Sent: Wednesday, October 30, 2013 8:42 AM
> To: Merla, ShivaKrishna
> Cc: Hannes Reinecke; dm-devel@redhat.com; agk@redhat.com; Mikulas
> Patocka
> Subject: Re: [PATCH]dm-mpath: fix for race condition between
> multipath_dtr and pg_init_done.
>
> On Wed, Oct 30 2013 at 9:32am -0400,
> Merla, ShivaKrishna <ShivaKrishna.Merla@netapp.com> wrote:
>
> >
> >
> > > -----Original Message-----
> > > From: Mike Snitzer [mailto:snitzer@redhat.com]
> > > Sent: Tuesday, October 29, 2013 7:57 PM
> > > To: Merla, ShivaKrishna
> > > Cc: Hannes Reinecke; dm-devel@redhat.com; agk@redhat.com; Mikulas
> > > Patocka
> > > Subject: Re: [PATCH]dm-mpath: fix for race condition between
> > > multipath_dtr and pg_init_done.
> ...
> > >
> > > So... do you have an updated patch with proper locking that takes
> > > Hannes' patch into consideration?
> > >
> > > I'll be reviewing hannes patch closer (for v3.13) tomorrow so if you'd
> > > like your issue resolved in the near-term I'd appreciate us getting some
> > > closer on proposed solutions.
> > >
> > > Thanks,
> > > Mike
> >
> > I have re-submitted my patch with locking in multipath_dtr. I
> > revisited Hannes patch and I think we still have this issue when
> > multipath is destroyed and path activation work is queued. This is
> > with the scenario of scsi_dh_rdac returning SCSI_DH_RETRY on
> > mode_select due to 5/91/36 CC's.
>
> OK, I'll work toward including both patches.
Thanks Mike.
* Re: [PATCH]dm-mpath: fix for race condition between multipath_dtr and pg_init_done.
2013-10-17 22:03 ` Merla, ShivaKrishna
2013-10-30 0:57 ` Mike Snitzer
@ 2013-11-01 18:21 ` Mikulas Patocka
1 sibling, 0 replies; 10+ messages in thread
From: Mikulas Patocka @ 2013-11-01 18:21 UTC (permalink / raw)
To: Merla, ShivaKrishna; +Cc: dm-devel@redhat.com, agk@redhat.com, Mike Snitzer
> Thanks for your comments. I agree we should lock while setting
> dtr_in_progress,
You can take the spinlock whenever you read or write the variable.
Or, without locking, you can use memory barriers: put smp_mb() after you
write dtr_in_progress and before you read it, and use
ACCESS_ONCE(m->dtr_in_progress) in the if statement where you read
dtr_in_progress.
Mikulas
> I think I overlooked it as its handled in process_queued_ios as well. We
> looked into handling this in wait_for_pg_init_completion() but checking
> for pg_init_required here will not help as well ( until we prevent
> setting pg_init_required while pg_init_in_progress is set ). Here due to
> SCSI_DH_RETRY on mode_select, pg_init_done will set the pg_init_required
> as activation needs to be retried under normal circumstances. But it
> completely differs when multipath target is being destroyed.
> Apparently I didn't see any pending_ios in our test while this is
> happening. Just path activations are held up since controller was
> returning 5/91/36 CC's. With this condition either one of
> pg_init_required or pg_init_in_progress flags are set all the time.
> Hannes patch will take care of preventing queueing of IO's when
> pg_init_in_progress is set, but currently running activation commands
> will not return until controller returns SUCCESS on mode_select.
>
>
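
The lock-free variant suggested above (a barrier after the write and before the read, plus a single non-cached read) has a rough userspace analogue in C11 atomics. This is a sketch, not kernel code: atomic_thread_fence(memory_order_seq_cst) is used here as an approximate stand-in for smp_mb(), a relaxed atomic load stands in for ACCESS_ONCE(), and struct barrier_model and the model_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the flag plus a counter of queued activations. */
struct barrier_model {
	atomic_uint dtr_in_progress;
	unsigned activations_queued;
};

/* Writer: store the flag, then a full fence (analogous to write + smp_mb()). */
static void model_mark_dtr(struct barrier_model *m)
{
	assert(m);
	atomic_store_explicit(&m->dtr_in_progress, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}

/* Reader: full fence, then one read of the flag (analogous to smp_mb() + ACCESS_ONCE()). */
static bool model_try_queue_activation(struct barrier_model *m)
{
	assert(m);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&m->dtr_in_progress, memory_order_relaxed))
		return false;
	m->activations_queued++;
	return true;
}
```

Note that barriers of this kind only order when the flag becomes visible. Work that was queued before the flag was set would presumably still need to be drained, which in the patch is the job of flush_multipath_work().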