All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH 0/3] dm-raid1: fix deadlock at suspend after suspend was interrupted
@ 2010-01-19 20:40 Takahiro Yasui
  2010-01-20  2:47 ` Kiyoshi Ueda
  0 siblings, 1 reply; 5+ messages in thread
From: Takahiro Yasui @ 2010-01-19 20:40 UTC (permalink / raw)
  To: device-mapper development

Hi,

This is a patch set to fix deadlock on suspending of mirror device.


ISSUE
=====

Suspend procedure on a dm-mirror device could cause deadlock on recovery_count
semaphore.

When mirror_presuspend is called, recovery_count semaphore is acquired in
dm_rh_stop_recovery() to stop recovery routine, but when an signal is caught
in dm_wait_for_completion() or an error occurred in in dm_suspend(),
the suspend process is interrupted without releasing recovery_count semaphore
of a mirror device. This means that another suspend is executed, and then
the suspend process gets stuck at dm_rh_stop_recovery().

When suspend procedure is interrupted, the device should work properly since
the status of the device is not "suspended."


SOLUTION
========

Introduce a target handler, cancel_presuspend, to cancel status changes
done by a target specific presuspend handler.


PATCH SET
=========
    1/3: dm: introduce cancel_presuspend framework
    2/3: dm-raid1: add cancel_presuspend function
    3/3: dm-delay: add cancel_presuspend function


I appreciate your comments.

Thanks,
-- 
Takahiro Yasui
Hitachi Computer Products (America), Inc.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC][PATCH 0/3] dm-raid1: fix deadlock at suspend after suspend was interrupted
  2010-01-19 20:40 [RFC][PATCH 0/3] dm-raid1: fix deadlock at suspend after suspend was interrupted Takahiro Yasui
@ 2010-01-20  2:47 ` Kiyoshi Ueda
  2010-01-20 22:58   ` Takahiro Yasui
  0 siblings, 1 reply; 5+ messages in thread
From: Kiyoshi Ueda @ 2010-01-20  2:47 UTC (permalink / raw)
  To: device-mapper development

Hi Yasui-san,

On 01/20/2010 05:40 AM +0900, Takahiro Yasui wrote:
> Hi,
> 
> This is a patch set to fix deadlock on suspending of mirror device.
> 
> 
> ISSUE
> =====
> 
> Suspend procedure on a dm-mirror device could cause deadlock on recovery_count
> semaphore.
> 
> When mirror_presuspend is called, recovery_count semaphore is acquired in
> dm_rh_stop_recovery() to stop recovery routine, but when an signal is caught
> in dm_wait_for_completion() or an error occurred in in dm_suspend(),
> the suspend process is interrupted without releasing recovery_count semaphore
> of a mirror device. This means that another suspend is executed, and then
> the suspend process gets stuck at dm_rh_stop_recovery().
> 
> When suspend procedure is interrupted, the device should work properly since
> the status of the device is not "suspended."
> 
> 
> SOLUTION
> ========
> 
> Introduce a target handler, cancel_presuspend, to cancel status changes
> done by a target specific presuspend handler.

How about using ->resume as a cancelling method?
Though you have to audit existing targets' ->resume handler,
I think it's better idea than adding another target handler
just for this purpose.

And in your dm-raid1 patch, cancelling log's presuspend which is used
by dm-log-userspace is missed.
So it seems that dm-raid1 can use ->resume to cancel presuspend.

Thanks,
Kiyoshi Ueda

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC][PATCH 0/3] dm-raid1: fix deadlock at suspend after suspend was interrupted
  2010-01-20  2:47 ` Kiyoshi Ueda
@ 2010-01-20 22:58   ` Takahiro Yasui
  2010-01-21  9:20     ` Kiyoshi Ueda
  0 siblings, 1 reply; 5+ messages in thread
From: Takahiro Yasui @ 2010-01-20 22:58 UTC (permalink / raw)
  To: k-ueda; +Cc: device-mapper development

Hi Ueda-san,

Kiyoshi Ueda wrote:
> On 01/20/2010 05:40 AM +0900, Takahiro Yasui wrote:
>> Hi,
>>
>> This is a patch set to fix deadlock on suspending of mirror device.
>>
>>
>> ISSUE
>> =====
>>
>> Suspend procedure on a dm-mirror device could cause deadlock on recovery_count
>> semaphore.
>>
>> When mirror_presuspend is called, recovery_count semaphore is acquired in
>> dm_rh_stop_recovery() to stop recovery routine, but when an signal is caught
>> in dm_wait_for_completion() or an error occurred in in dm_suspend(),
>> the suspend process is interrupted without releasing recovery_count semaphore
>> of a mirror device. This means that another suspend is executed, and then
>> the suspend process gets stuck at dm_rh_stop_recovery().
>>
>> When suspend procedure is interrupted, the device should work properly since
>> the status of the device is not "suspended."
>>
>>
>> SOLUTION
>> ========
>>
>> Introduce a target handler, cancel_presuspend, to cancel status changes
>> done by a target specific presuspend handler.
> 
> How about using ->resume as a cancelling method?
> Though you have to audit existing targets' ->resume handler,
> I think it's better idea than adding another target handler
> just for this purpose.

A resume method contains a whole resume procedure, but when suspend is
interrupted, postsuspend handler is not processed. So the requirements
are to restore state changes done by presuspend handler. If a whole
resume procedure is executed, at least, dm-log will have a problem.

mirror log is flushed in postsuspend handler and log disk might contain
stale data at the moment when suspend is interrupted. If resume handler
is used instead of cancel_presuspend handler, log data on memory will be
overwritten by stale data on disk.

I'm afraid that we need to modify each target's resume handler so that
they work properly even after processing presuspend handler but before
postsuspend handler.

Please let me know if there is some oversight.

> And in your dm-raid1 patch, cancelling log's presuspend which is used
> by dm-log-userspace is missed.

Thank you for telling this. Yes, userspace target should be also handled.
I will fix it.

Thanks,
Taka

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC][PATCH 0/3] dm-raid1: fix deadlock at suspend after suspend was interrupted
  2010-01-20 22:58   ` Takahiro Yasui
@ 2010-01-21  9:20     ` Kiyoshi Ueda
  2010-01-22  6:16       ` Takahiro Yasui
  0 siblings, 1 reply; 5+ messages in thread
From: Kiyoshi Ueda @ 2010-01-21  9:20 UTC (permalink / raw)
  To: device-mapper development

Hi Yasui-san,

On 01/21/2010 07:58 AM +0900, Takahiro Yasui wrote:
> Kiyoshi Ueda wrote:
>> On 01/20/2010 05:40 AM +0900, Takahiro Yasui wrote:
>>> Introduce a target handler, cancel_presuspend, to cancel status changes
>>> done by a target specific presuspend handler.
>>
>> How about using ->resume as a cancelling method?
>> Though you have to audit existing targets' ->resume handler,
>> I think it's better idea than adding another target handler
>> just for this purpose.
>
<snip>
> 
> I'm afraid that we need to modify each target's resume handler so that
> they work properly even after processing presuspend handler but before
> postsuspend handler.
> 
> Please let me know if there is some oversight.

There is no oversight.
Perhaps I should have said 'audit (and modify if necessary)'.

If auditing and modifying all targets are difficult, I don't object
your approach which adds ->cancel_presuspend.
But ->cancel_presuspend should be a subset of ->resume and it should
make some code duplication.  If ->resume can always restore target state
correctly, it may avoid such duplication.

Althrough I'm not sure the following idea is really reasonable,
I think it should be worth to consider about it:
  When ->postsuspend is called and the device is really suspended,
  DMF_SUSPENDED flag is set in md->flags.
  So targets' ->resume handler can use it to check ->postsuspend
  had been called or not.

Thanks,
Kiyoshi Ueda

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC][PATCH 0/3] dm-raid1: fix deadlock at suspend after suspend was interrupted
  2010-01-21  9:20     ` Kiyoshi Ueda
@ 2010-01-22  6:16       ` Takahiro Yasui
  0 siblings, 0 replies; 5+ messages in thread
From: Takahiro Yasui @ 2010-01-22  6:16 UTC (permalink / raw)
  To: k-ueda; +Cc: device-mapper development

Hi Ueda-san,

Kiyoshi Ueda wrote:
> On 01/21/2010 07:58 AM +0900, Takahiro Yasui wrote:
>> Kiyoshi Ueda wrote:
>>> On 01/20/2010 05:40 AM +0900, Takahiro Yasui wrote:
>>>> Introduce a target handler, cancel_presuspend, to cancel status changes
>>>> done by a target specific presuspend handler.
>>> How about using ->resume as a cancelling method?
>>> Though you have to audit existing targets' ->resume handler,
>>> I think it's better idea than adding another target handler
>>> just for this purpose.
> <snip>
>> I'm afraid that we need to modify each target's resume handler so that
>> they work properly even after processing presuspend handler but before
>> postsuspend handler.
>>
>> Please let me know if there is some oversight.
> 
> There is no oversight.
> Perhaps I should have said 'audit (and modify if necessary)'.
> 
> If auditing and modifying all targets are difficult, I don't object
> your approach which adds ->cancel_presuspend.
> But ->cancel_presuspend should be a subset of ->resume and it should
> make some code duplication.  If ->resume can always restore target state
> correctly, it may avoid such duplication.
> 
> Althrough I'm not sure the following idea is really reasonable,
> I think it should be worth to consider about it:
>   When ->postsuspend is called and the device is really suspended,
>   DMF_SUSPENDED flag is set in md->flags.
>   So targets' ->resume handler can use it to check ->postsuspend
>   had been called or not.

Hmm. It looks good idea.

By exporting dm_suspended(), targets running in kernel space, can get
the current status and choose actions to cancel state changes done
in presuspend(). One execption is dm-log-userspace. It calls user space
functions. So other solution is needed.

I will try to update this patch set according to your suggestions.

Thanks,
Taka

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-01-22  6:16 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-01-19 20:40 [RFC][PATCH 0/3] dm-raid1: fix deadlock at suspend after suspend was interrupted Takahiro Yasui
2010-01-20  2:47 ` Kiyoshi Ueda
2010-01-20 22:58   ` Takahiro Yasui
2010-01-21  9:20     ` Kiyoshi Ueda
2010-01-22  6:16       ` Takahiro Yasui

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.