* [bug report] rbd unmap hangs after pausing and unpausing I/O
@ 2025-09-23 10:38 Raphael Zimmer
2025-09-23 17:42 ` Viacheslav Dubeyko
2025-09-23 18:33 ` Ilya Dryomov
0 siblings, 2 replies; 6+ messages in thread
From: Raphael Zimmer @ 2025-09-23 10:38 UTC
To: Ilya Dryomov, Xiubo Li; +Cc: ceph-devel
Hello,
I encountered an error with the kernel Ceph client (specifically using
an RBD device) when pausing I/O on the cluster by setting and unsetting
pauserd and pausewr flags. An error was seen with two different setups,
which I believe is due to the same problem.
1) When pausing and later unpausing I/O on the cluster, everything seems
to work as expected until trying to unmap an RBD device from the kernel.
In this case, the rbd unmap command hangs and also can't be killed. To
get back to a normally working state, a system reboot is needed. This
behavior was observed on different systems (Debian 12 and 13) and could
also be reproduced with an installation of the mainline kernel (v6.17-rc6).
Steps to reproduce:
- Connect kernel client to RBD device (rbd map)
- Pause I/O on cluster (ceph osd pause)
- Wait some time (3 minutes should be enough)
- Unpause I/O on cluster
- Try to unmap RBD device on client
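For completeness, here is the same sequence as a small shell sketch (the pool/image names and the exact sleep length are placeholders from my setup, not required values; it assumes admin access to both the client and the cluster):

```shell
#!/bin/sh
# Reproducer sketch for the rbd unmap hang after pause/unpause.
# POOL and IMAGE are placeholders; adjust them for your cluster.
set -x

POOL=rbd
IMAGE=test-image

# 1) Map the image on the client (prints the device, e.g. /dev/rbd0).
DEV=$(rbd map "$POOL/$IMAGE")

# 2) Pause all I/O on the cluster (sets the pauserd and pausewr flags).
ceph osd pause

# 3) Wait long enough for the client's watch to error out
#    (about three minutes was enough in my tests).
sleep 180

# 4) Unpause I/O again (clears both flags).
ceph osd unpause

# 5) This is where the hang occurs: rbd unmap never returns
#    and cannot be killed.
rbd unmap "$DEV"
```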
2) When using an application that internally uses the kernel Ceph client
code, I observed the following behavior:
Pausing I/O leads to a watch error after some time (same as with failing
OSDs or e.g. when pool quota is reached). In rbd_watch_errcb
(drivers/block/rbd.c), the watch_dwork gets scheduled, which leads to a
call of rbd_reregister_watch -> __rbd_register_watch -> ceph_osdc_watch
(net/ceph/osd_client.c) -> linger_reg_commit_wait ->
wait_for_completion_killable. At this point, it waits without any
timeout for the completion. The normal behavior is to wait until the
causing condition is resolved and then return. With pausing and
unpausing I/O, wait_for_completion_killable does not return even after
unpausing because no call to complete or complete_all happens. I would
guess that on unpausing some call is missing so that committing the
linger request never completes.
From what I am seeing, it seems like this missing completion in the
second case is also the cause of the hanging rbd unmap with the
unmodified kernel.
Best regards,
Raphael
* Re: [bug report] rbd unmap hangs after pausing and unpausing I/O
2025-09-23 10:38 [bug report] rbd unmap hangs after pausing and unpausing I/O Raphael Zimmer
@ 2025-09-23 17:42 ` Viacheslav Dubeyko
2025-09-24 11:51 ` Raphael Zimmer
2025-09-23 18:33 ` Ilya Dryomov
1 sibling, 1 reply; 6+ messages in thread
From: Viacheslav Dubeyko @ 2025-09-23 17:42 UTC
To: raphael.zimmer@tu-ilmenau.de, idryomov@gmail.com, Xiubo Li
Cc: ceph-devel@vger.kernel.org
Hi Raphael,
On Tue, 2025-09-23 at 12:38 +0200, Raphael Zimmer wrote:
> Hello,
>
> I encountered an error with the kernel Ceph client (specifically using
> an RBD device) when pausing I/O on the cluster by setting and unsetting
> pauserd and pausewr flags. An error was seen with two different setups,
> which I believe is due to the same problem.
>
Thanks a lot for the report. Could you please create the ticket in a tracker
system [1]?
> 1) When pausing and later unpausing I/O on the cluster, everything seems
> to work as expected until trying to unmap an RBD device from the kernel.
> In this case, the rbd unmap command hangs and also can't be killed. To
> get back to a normally working state, a system reboot is needed. This
> behavior was observed on different systems (Debian 12 and 13) and could
> also be reproduced with an installation of the mainline kernel (v6.17-rc6).
>
> Steps to reproduce:
> - Connect kernel client to RBD device (rbd map)
> - Pause I/O on cluster (ceph osd pause)
> - Wait some time (3 minutes should be enough)
> - Unpause I/O on cluster
> - Try to unmap RBD device on client
>
Do you have a script? Could you please share the sequence of commands that you
used on the command line to reproduce the issue?
Have you created any folders/files before pausing/unpausing the I/O requests on
the cluster?
How have you initiated the I/O operations before pausing the I/O requests on
the cluster?
Have you observed any warnings, call traces, or crashes from the CephFS kernel
client in the system log when the rbd unmap command hangs (usually, the kernel
complains if something hangs for a significant amount of time)?
Thanks,
Slava.
>
> 2) When using an application that internally uses the kernel Ceph client
> code, I observed the following behavior:
>
> Pausing I/O leads to a watch error after some time (same as with failing
> OSDs or e.g. when pool quota is reached). In rbd_watch_errcb
> (drivers/block/rbd.c), the watch_dwork gets scheduled, which leads to a
> call of rbd_reregister_watch -> __rbd_register_watch -> ceph_osdc_watch
> (net/ceph/osd_client.c) -> linger_reg_commit_wait ->
> wait_for_completion_killable. At this point, it waits without any
> timeout for the completion. The normal behavior is to wait until the
> causing condition is resolved and then return. With pausing and
> unpausing I/O, wait_for_completion_killable does not return even after
> unpausing because no call to complete or complete_all happens. I would
> guess that on unpausing some call is missing so that committing the
> linger request never completes.
>
> From what I am seeing, it seems like this missing completion in the
> second case is also the cause of the hanging rbd unmap with the
> unmodified kernel.
>
>
> Best regards,
>
> Raphael
[1] https://tracker.ceph.com
* Re: [bug report] rbd unmap hangs after pausing and unpausing I/O
2025-09-23 17:42 ` Viacheslav Dubeyko
@ 2025-09-24 11:51 ` Raphael Zimmer
2025-09-24 17:49 ` Viacheslav Dubeyko
0 siblings, 1 reply; 6+ messages in thread
From: Raphael Zimmer @ 2025-09-24 11:51 UTC
To: Viacheslav Dubeyko, idryomov@gmail.com, Xiubo Li
Cc: ceph-devel@vger.kernel.org
On 23.09.25 19:42, Viacheslav Dubeyko wrote:
> Hi Raphael,
>
> On Tue, 2025-09-23 at 12:38 +0200, Raphael Zimmer wrote:
>> Hello,
>>
>> I encountered an error with the kernel Ceph client (specifically using
>> an RBD device) when pausing I/O on the cluster by setting and unsetting
>> pauserd and pausewr flags. An error was seen with two different setups,
>> which I believe is due to the same problem.
>>
>
> Thanks a lot for the report. Could you please create the ticket in a tracker
> system [1]?
>
>> 1) When pausing and later unpausing I/O on the cluster, everything seems
>> to work as expected until trying to unmap an RBD device from the kernel.
>> In this case, the rbd unmap command hangs and also can't be killed. To
>> get back to a normally working state, a system reboot is needed. This
>> behavior was observed on different systems (Debian 12 and 13) and could
>> also be reproduced with an installation of the mainline kernel (v6.17-rc6).
>>
>> Steps to reproduce:
>> - Connect kernel client to RBD device (rbd map)
>> - Pause I/O on cluster (ceph osd pause)
>> - Wait some time (3 minutes should be enough)
>> - Unpause I/O on cluster
>> - Try to unmap RBD device on client
>>
>
> Do you have a script? Could you please share the sequence of commands that you
> used on the command line to reproduce the issue?
>
> Have you created any folders/files before pausing/unpausing the I/O requests on
> the cluster?
> How have you initiated the I/O operations before pausing the I/O requests on
> the cluster?
> Have you observed any warnings, call traces, or crashes from the CephFS kernel
> client in the system log when the rbd unmap command hangs (usually, the kernel
> complains if something hangs for a significant amount of time)?
>
> Thanks,
> Slava.
>
Hi Slava,
I haven't used CephFS, only an RBD image.
The behavior is completely independent of whether I initiate any I/O
operations on the RBD image or not. You can reproduce the behavior by
following the exact steps from above:
- rbd map <image-name> for an arbitrary image on the client host
- ceph osd pause on the cluster
- after some time: ceph osd unpause on the cluster
- rbd unmap <image-name> on the client host
You don't need to do anything else in between.
Since Ilya has already identified the issue and will attempt to fix it,
do you still want me to create the ticket in the tracker system?
Best regards,
Raphael
>>
>> 2) When using an application that internally uses the kernel Ceph client
>> code, I observed the following behavior:
>>
>> Pausing I/O leads to a watch error after some time (same as with failing
>> OSDs or e.g. when pool quota is reached). In rbd_watch_errcb
>> (drivers/block/rbd.c), the watch_dwork gets scheduled, which leads to a
>> call of rbd_reregister_watch -> __rbd_register_watch -> ceph_osdc_watch
>> (net/ceph/osd_client.c) -> linger_reg_commit_wait ->
>> wait_for_completion_killable. At this point, it waits without any
>> timeout for the completion. The normal behavior is to wait until the
>> causing condition is resolved and then return. With pausing and
>> unpausing I/O, wait_for_completion_killable does not return even after
>> unpausing because no call to complete or complete_all happens. I would
>> guess that on unpausing some call is missing so that committing the
>> linger request never completes.
>>
>> From what I am seeing, it seems like this missing completion in the
>> second case is also the cause of the hanging rbd unmap with the
>> unmodified kernel.
>>
>>
>> Best regards,
>>
>> Raphael
>
> [1] https://tracker.ceph.com
* RE: [bug report] rbd unmap hangs after pausing and unpausing I/O
2025-09-24 11:51 ` Raphael Zimmer
@ 2025-09-24 17:49 ` Viacheslav Dubeyko
0 siblings, 0 replies; 6+ messages in thread
From: Viacheslav Dubeyko @ 2025-09-24 17:49 UTC
To: raphael.zimmer@tu-ilmenau.de, idryomov@gmail.com, Xiubo Li
Cc: ceph-devel@vger.kernel.org
On Wed, 2025-09-24 at 13:51 +0200, Raphael Zimmer wrote:
> On 23.09.25 19:42, Viacheslav Dubeyko wrote:
> > Hi Raphael,
> >
> > On Tue, 2025-09-23 at 12:38 +0200, Raphael Zimmer wrote:
> > > Hello,
> > >
> > > I encountered an error with the kernel Ceph client (specifically using
> > > an RBD device) when pausing I/O on the cluster by setting and unsetting
> > > pauserd and pausewr flags. An error was seen with two different setups,
> > > which I believe is due to the same problem.
> > >
> >
> > Thanks a lot for the report. Could you please create the ticket in a tracker
> > system [1]?
> >
> > > 1) When pausing and later unpausing I/O on the cluster, everything seems
> > > to work as expected until trying to unmap an RBD device from the kernel.
> > > In this case, the rbd unmap command hangs and also can't be killed. To
> > > get back to a normally working state, a system reboot is needed. This
> > > behavior was observed on different systems (Debian 12 and 13) and could
> > > also be reproduced with an installation of the mainline kernel (v6.17-rc6).
> > >
> > > Steps to reproduce:
> > > - Connect kernel client to RBD device (rbd map)
> > > - Pause I/O on cluster (ceph osd pause)
> > > - Wait some time (3 minutes should be enough)
> > > - Unpause I/O on cluster
> > > - Try to unmap RBD device on client
> > >
> >
> > Do you have a script? Could you please share the sequence of commands that you
> > used on the command line to reproduce the issue?
> >
> > Have you created any folders/files before pausing/unpausing the I/O requests on
> > the cluster?
> > How have you initiated the I/O operations before pausing the I/O requests on
> > the cluster?
> > Have you observed any warnings, call traces, or crashes from the CephFS kernel
> > client in the system log when the rbd unmap command hangs (usually, the kernel
> > complains if something hangs for a significant amount of time)?
> >
> > Thanks,
> > Slava.
> >
>
> Hi Slava,
>
> I haven't used CephFS. Only an RBD image.
> The behavior is completely independent of whether I initiate any I/O
> operations on the RBD image or not. You can reproduce the behavior by
> following the exact steps from above:
> - rbd map <image-name> for an arbitrary image on the client host
> - ceph osd pause on the cluster
> - after some time: ceph osd unpause on the cluster
> - rbd unmap <image-name> on the client host
> You don't need to do anything else in between.
>
> Since Ilya has already identified the issue and will attempt to fix it,
> do you still want me to create the ticket in the tracker system?
>
Hi Raphael,
I believe it would be better to create the ticket anyway, because we need to
gather all known issues in the tracker system. We could receive a similar
report with the same issue in the future, and it would be great to have a
ticket with a clear explanation of the symptoms and the reproduction path. You
can assign the ticket directly to Ilya; he will be able to close it after the
fix.
Thanks,
Slava.
* Re: [bug report] rbd unmap hangs after pausing and unpausing I/O
2025-09-23 10:38 [bug report] rbd unmap hangs after pausing and unpausing I/O Raphael Zimmer
2025-09-23 17:42 ` Viacheslav Dubeyko
@ 2025-09-23 18:33 ` Ilya Dryomov
2025-09-24 12:05 ` Raphael Zimmer
1 sibling, 1 reply; 6+ messages in thread
From: Ilya Dryomov @ 2025-09-23 18:33 UTC
To: Raphael Zimmer; +Cc: Xiubo Li, ceph-devel
On Tue, Sep 23, 2025 at 12:38 PM Raphael Zimmer
<raphael.zimmer@tu-ilmenau.de> wrote:
>
> Hello,
>
> I encountered an error with the kernel Ceph client (specifically using
> an RBD device) when pausing I/O on the cluster by setting and unsetting
> pauserd and pausewr flags. An error was seen with two different setups,
> which I believe is due to the same problem.
Hi Raphael,
What is your use case for applying pauserd and pausewr? I'm curious
because it's not something that I have seen used in normal operation
and most Ceph users probably aren't even aware of these flags.
>
> 1) When pausing and later unpausing I/O on the cluster, everything seems
> to work as expected until trying to unmap an RBD device from the kernel.
> In this case, the rbd unmap command hangs and also can't be killed. To
> get back to a normally working state, a system reboot is needed. This
> behavior was observed on different systems (Debian 12 and 13) and could
> also be reproduced with an installation of the mainline kernel (v6.17-rc6).
>
> Steps to reproduce:
> - Connect kernel client to RBD device (rbd map)
> - Pause I/O on cluster (ceph osd pause)
> - Wait some time (3 minutes should be enough)
> - Unpause I/O on cluster
> - Try to unmap RBD device on client
>
>
> 2) When using an application that internally uses the kernel Ceph client
> code, I observed the following behavior:
>
> Pausing I/O leads to a watch error after some time (same as with failing
> OSDs or e.g. when pool quota is reached). In rbd_watch_errcb
> (drivers/block/rbd.c), the watch_dwork gets scheduled, which leads to a
> call of rbd_reregister_watch -> __rbd_register_watch -> ceph_osdc_watch
> (net/ceph/osd_client.c) -> linger_reg_commit_wait ->
> wait_for_completion_killable. At this point, it waits without any
> timeout for the completion. The normal behavior is to wait until the
> causing condition is resolved and then return. With pausing and
> unpausing I/O, wait_for_completion_killable does not return even after
> unpausing because no call to complete or complete_all happens. I would
> guess that on unpausing some call is missing so that committing the
> linger request never completes.
>
> From what I am seeing, it seems like this missing completion in the
> second case is also the cause of the hanging rbd unmap with the
> unmodified kernel.
You are pretty close ;) The completion is indeed missing, but it's
more of a side effect than the root cause. The root cause is that the
watch request doesn't get resubmitted on paused -> unpaused transitions
like it happens on e.g. full -> no-longer-full transitions -- the logic
around forming need_resend_linger list isn't quite right. I'll try to
put together a fix in the coming days.
Thanks,
Ilya
* Re: [bug report] rbd unmap hangs after pausing and unpausing I/O
2025-09-23 18:33 ` Ilya Dryomov
@ 2025-09-24 12:05 ` Raphael Zimmer
0 siblings, 0 replies; 6+ messages in thread
From: Raphael Zimmer @ 2025-09-24 12:05 UTC
To: Ilya Dryomov; +Cc: Xiubo Li, ceph-devel
On 23.09.25 20:33, Ilya Dryomov wrote:
> On Tue, Sep 23, 2025 at 12:38 PM Raphael Zimmer
> <raphael.zimmer@tu-ilmenau.de> wrote:
>>
>> Hello,
>>
>> I encountered an error with the kernel Ceph client (specifically using
>> an RBD device) when pausing I/O on the cluster by setting and unsetting
>> pauserd and pausewr flags. An error was seen with two different setups,
>> which I believe is due to the same problem.
>
> Hi Raphael,
>
> What is your use case for applying pauserd and pausewr? I'm curious
> because it's not something that I have seen used in normal operation
> and most Ceph users probably aren't even aware of these flags.
>
Hi Ilya,
I was doing some robustness tests with an application and a debug cluster,
trying out various cluster operations/configurations, when I discovered the
bug. But I heard from a colleague about a real use case where the same issue
was also observed: they used the flags in a data center to temporarily stop
all I/O during major cluster maintenance.
>>
>> 1) When pausing and later unpausing I/O on the cluster, everything seems
>> to work as expected until trying to unmap an RBD device from the kernel.
>> In this case, the rbd unmap command hangs and also can't be killed. To
>> get back to a normally working state, a system reboot is needed. This
>> behavior was observed on different systems (Debian 12 and 13) and could
>> also be reproduced with an installation of the mainline kernel (v6.17-rc6).
>>
>> Steps to reproduce:
>> - Connect kernel client to RBD device (rbd map)
>> - Pause I/O on cluster (ceph osd pause)
>> - Wait some time (3 minutes should be enough)
>> - Unpause I/O on cluster
>> - Try to unmap RBD device on client
>>
>>
>> 2) When using an application that internally uses the kernel Ceph client
>> code, I observed the following behavior:
>>
>> Pausing I/O leads to a watch error after some time (same as with failing
>> OSDs or e.g. when pool quota is reached). In rbd_watch_errcb
>> (drivers/block/rbd.c), the watch_dwork gets scheduled, which leads to a
>> call of rbd_reregister_watch -> __rbd_register_watch -> ceph_osdc_watch
>> (net/ceph/osd_client.c) -> linger_reg_commit_wait ->
>> wait_for_completion_killable. At this point, it waits without any
>> timeout for the completion. The normal behavior is to wait until the
>> causing condition is resolved and then return. With pausing and
>> unpausing I/O, wait_for_completion_killable does not return even after
>> unpausing because no call to complete or complete_all happens. I would
>> guess that on unpausing some call is missing so that committing the
>> linger request never completes.
>>
>> From what I am seeing, it seems like this missing completion in the
>> second case is also the cause of the hanging rbd unmap with the
>> unmodified kernel.
>
> You are pretty close ;) The completion is indeed missing, but it's
> more of a side effect than the root cause. The root cause is that the
> watch request doesn't get resubmitted on paused -> unpaused transitions
> like it happens on e.g. full -> no-longer-full transitions -- the logic
> around forming need_resend_linger list isn't quite right. I'll try to
> put together a fix in the coming days.
>
> Thanks,
>
> Ilya
Thanks for looking into it. That's pretty much what I thought was going on.
Best regards,
Raphael