ceph-devel.vger.kernel.org archive mirror
From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
To: "raphael.zimmer@tu-ilmenau.de" <raphael.zimmer@tu-ilmenau.de>,
	"idryomov@gmail.com" <idryomov@gmail.com>,
	Xiubo Li <xiubli@redhat.com>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re:  [bug report] rbd unmap hangs after pausing and unpausing I/O
Date: Tue, 23 Sep 2025 17:42:24 +0000
Message-ID: <b94f8a4b7d70a9fd3603a1cfcb6a708cf6bd44b9.camel@ibm.com>
In-Reply-To: <36681e9d-fde6-4c5d-bf35-db9d85865900@tu-ilmenau.de>

Hi Raphael,

On Tue, 2025-09-23 at 12:38 +0200, Raphael Zimmer wrote:
> Hello,
> 
> I encountered an error with the kernel Ceph client (specifically using 
> an RBD device) when pausing I/O on the cluster by setting and unsetting 
> pauserd and pausewr flags. An error was seen with two different setups, 
> which I believe is due to the same problem.
> 

Thanks a lot for the report. Could you please create a ticket in the tracker
system [1]?

> 1) When pausing and later unpausing I/O on the cluster, everything seems 
> to work as expected until trying to unmap an RBD device from the kernel. 
> In this case, the rbd unmap command hangs and also can't be killed. To 
> get back to a normally working state, a system reboot is needed. This 
> behavior was observed on different systems (Debian 12 and 13) and could 
> also be reproduced with an installation of the mainline kernel (v6.17-rc6).
> 
> Steps to reproduce:
> - Connect kernel client to RBD device (rbd map)
> - Pause I/O on cluster (ceph osd pause)
> - Wait some time (3 minutes should be enough)
> - Unpause I/O on cluster
> - Try to unmap RBD device on client
> 

Do you have a script? Could you please share the exact sequence of commands that
you used on the command line to reproduce the issue?
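
For reference, the quoted steps might map to commands roughly like the sketch
below. This is only an assumption about the sequence, not the reporter's actual
script: the image name rbd/test-image, the IMAGE, WAIT_SECS and DRY_RUN
variables, and the run/reproduce helpers are hypothetical. By default it only
prints the commands, because ceph osd pause stops I/O for the whole cluster.

```shell
#!/bin/sh
# Hypothetical reproduction sketch for the quoted steps.
IMAGE="${IMAGE:-rbd/test-image}"   # assumed pool/image spec, adjust as needed
WAIT_SECS="${WAIT_SECS:-180}"      # "3 minutes should be enough"
DRY_RUN="${DRY_RUN:-1}"            # 1: only print commands, 0: execute them

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

reproduce() {
    run rbd map "$IMAGE"       # connect the kernel client to the RBD device
    run ceph osd pause         # sets the pauserd and pausewr flags
    run sleep "$WAIT_SECS"     # wait some time while I/O is paused
    run ceph osd unpause       # clears the flags again
    run rbd unmap "$IMAGE"     # the step that is reported to hang
}

reproduce
```

Running it with DRY_RUN=0 as root on a test cluster would execute the commands
for real. Any I/O issued between the map and pause steps is not covered here.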

Did you create any folders/files before pausing and unpausing the I/O requests
on the cluster?
How did you initiate the I/O operations before pausing the I/O requests on the
cluster?
Have you observed any warnings, call traces, or crashes from the Ceph kernel
client in the system log when the rbd unmap command hangs (usually, the kernel
complains if something hangs for a significant amount of time)?
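
One possible way to collect this information while rbd unmap is stuck is
sketched below. The helper names (dump_stacks, hung_task_lines, collect) and
the --collect argument are hypothetical. The sketch assumes root access, a
kernel that exposes /proc/<pid>/stack, and a mounted debugfs so that the osdc
file of the kernel client is readable.

```shell
#!/bin/sh
# Sketch for gathering hang diagnostics; run as root while rbd unmap hangs.

# Print the kernel stack of every process whose command name equals $1.
dump_stacks() {
    for pid in $(pgrep -x "$1"); do
        echo "== pid $pid =="
        cat "/proc/$pid/stack" 2>/dev/null
    done
}

# Keep only hung-task warnings from a kernel log read on stdin.
hung_task_lines() {
    grep -E 'blocked for more than [0-9]+ seconds'
}

collect() {
    dmesg | hung_task_lines
    dump_stacks rbd
    # In-flight and linger requests of the kernel client, if debugfs is mounted.
    cat /sys/kernel/debug/ceph/*/osdc 2>/dev/null
}

# Only gather data when explicitly asked to.
if [ "${1:-}" = "--collect" ]; then
    collect
fi
```

The output of collect, if any, could be attached to the tracker ticket.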

Thanks,
Slava.

> 
> 2) When using an application that internally uses the kernel Ceph client 
> code, I observed the following behavior:
> 
> Pausing I/O leads to a watch error after some time (same as with failing 
> OSDs or e.g. when pool quota is reached). In rbd_watch_errcb 
> (drivers/block/rbd.c), the watch_dwork gets scheduled, which leads to a 
> call of rbd_reregister_watch -> __rbd_register_watch -> ceph_osdc_watch 
> (net/ceph/osd_client.c) -> linger_reg_commit_wait -> 
> wait_for_completion_killable. At this point, it waits without any 
> timeout for the completion. The normal behavior is to wait until the 
> causing condition is resolved and then return. With pausing and 
> unpausing I/O, wait_for_completion_killable does not return even after 
> unpausing because no call to complete or complete_all happens. I would 
> guess that on unpausing some call is missing so that committing the 
> linger request never completes.
> 
> From what I am seeing, it seems like this missing completion in the 
> second case is also the cause of the hanging rbd unmap with the 
> unmodified kernel.
> 
> 
> Best regards,
> 
> Raphael

[1] https://tracker.ceph.com
