[PATCH] nbd: fix partial sending

* [PATCH] nbd: fix partial sending
@ 2024-10-17 11:36 Ming Lei
  2024-10-17 15:13 ` Bart Van Assche
  2024-10-17 15:47 ` Kevin Wolf
  0 siblings, 2 replies; 6+ messages in thread
From: Ming Lei @ 2024-10-17 11:36 UTC
  To: Jens Axboe, linux-block
  Cc: josef, nbd, eblake, Ming Lei, vincent.chen, Leon Schuermann,
	Bart Van Assche, Kevin Wolf

The nbd driver sends the request header and payload with multiple calls
to sock_sendmsg(), so partial sends can't be avoided. However, the nbd
driver returns BLK_STS_RESOURCE to the block core in this situation. This
causes a problem: request->tag may change in the next run of
nbd_queue_rq(), but the original tag has already been sent as part of the
header cookie. This confuses the nbd driver's reply handling, since the
real request can no longer be retrieved with the obsolete tag.

Fix it by retrying the send directly. This is reasonable and safe, since
nothing can make progress while the current hw queue (socket) has a
pending request, and it also avoids an unnecessary requeue.

Cc: vincent.chen@sifive.com
Cc: Leon Schuermann <leon@is.currently.online>
Cc: Bart Van Assche <bvanassche@acm.org>
Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
Kevin,
	Please test this version, thanks!
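
The tag problem described in the commit message can be pictured with a
toy userspace model. This is only a sketch, not driver code: the names
toy_request, inflight, toy_make_cookie and toy_lookup are made up for
illustration, and the cookie layout (a generation value in the upper 32
bits, the tag in the lower 32 bits) is an assumption of the model.

```c
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 8

struct toy_request {
	int id;
};

/* Slot i holds the request that currently owns tag i, or NULL. */
static struct toy_request *inflight[QUEUE_DEPTH];

/* Hypothetical cookie layout: generation in the high half, tag in the low half. */
static uint64_t toy_make_cookie(uint32_t generation, uint32_t tag)
{
	return ((uint64_t)generation << 32) | tag;
}

/* Reply path of the model: recover the tag from the cookie, then the request. */
static struct toy_request *toy_lookup(uint64_t cookie)
{
	uint32_t tag = (uint32_t)cookie;

	return tag < QUEUE_DEPTH ? inflight[tag] : NULL;
}

/*
 * Model a partial send followed by a requeue. Returns 1 if a reply that
 * carries the cookie from the partially sent header still finds the
 * request, 0 otherwise.
 */
static int toy_reply_matches_after_requeue(uint32_t old_tag, uint32_t new_tag)
{
	static struct toy_request rq = { .id = 42 };
	uint64_t wire_cookie;
	int i;

	if (old_tag >= QUEUE_DEPTH || new_tag >= QUEUE_DEPTH)
		return 0;

	for (i = 0; i < QUEUE_DEPTH; i++)
		inflight[i] = NULL;

	/* First dispatch: the header, including the cookie, reaches the wire. */
	inflight[old_tag] = &rq;
	wire_cookie = toy_make_cookie(1, old_tag);

	/* Requeue: the old tag is released, the next dispatch gets new_tag. */
	inflight[old_tag] = NULL;
	inflight[new_tag] = &rq;

	return toy_lookup(wire_cookie) == &rq;
}
```

In this model toy_reply_matches_after_requeue(3, 3) finds the request,
while toy_reply_matches_after_requeue(3, 5) does not, which is the
mismatch the patch avoids by never requeueing a partially sent request.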

 drivers/block/nbd.c | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index b852050d8a96..ef84071041e3 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -701,8 +701,9 @@ static blk_status_t nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd,
 			if (sent) {
 				nsock->pending = req;
 				nsock->sent = sent;
+			} else {
+				set_bit(NBD_CMD_REQUEUED, &cmd->flags);
 			}
-			set_bit(NBD_CMD_REQUEUED, &cmd->flags);
 			return BLK_STS_RESOURCE;
 		}
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
@@ -743,7 +744,6 @@ static blk_status_t nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd,
 					 */
 					nsock->pending = req;
 					nsock->sent = sent;
-					set_bit(NBD_CMD_REQUEUED, &cmd->flags);
 					return BLK_STS_RESOURCE;
 				}
 				dev_err(disk_to_dev(nbd->disk),
@@ -778,6 +778,35 @@ static blk_status_t nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd,
 	return BLK_STS_OK;
 }
 
+/*
+ * Send pending nbd request and payload, part of them have been sent
+ * already, so we have to send them all with current request, and can't
+ * return BLK_STS_RESOURCE, otherwise request tag may be changed in next
+ * retry
+ */
+static blk_status_t nbd_send_pending_cmd(struct nbd_device *nbd,
+		struct nbd_cmd *cmd)
+{
+	struct request *req = blk_mq_rq_from_pdu(cmd);
+	unsigned long deadline = READ_ONCE(req->deadline);
+	unsigned int wait_ms = 2;
+	blk_status_t res;
+
+	WARN_ON_ONCE(test_bit(NBD_CMD_REQUEUED, &cmd->flags));
+
+	while (true) {
+		res = nbd_send_cmd(nbd, cmd, cmd->index);
+		if (res != BLK_STS_RESOURCE)
+			return res;
+		if (READ_ONCE(jiffies) + msecs_to_jiffies(wait_ms) >= deadline)
+			break;
+		msleep(wait_ms);
+		wait_ms *= 2;
+	}
+
+	return BLK_STS_IOERR;
+}
+
 static int nbd_read_reply(struct nbd_device *nbd, struct socket *sock,
 			  struct nbd_reply *reply)
 {
@@ -1111,6 +1140,8 @@ static blk_status_t nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 		goto out;
 	}
 	ret = nbd_send_cmd(nbd, cmd, index);
+	if (ret == BLK_STS_RESOURCE && nsock->pending == req)
+		ret = nbd_send_pending_cmd(nbd, cmd);
 out:
 	mutex_unlock(&nsock->tx_lock);
 	nbd_config_put(nbd);
-- 
2.44.0


* Re: [PATCH] nbd: fix partial sending
  2024-10-17 11:36 [PATCH] nbd: fix partial sending Ming Lei
@ 2024-10-17 15:13 ` Bart Van Assche
  2024-10-17 15:22   ` Jens Axboe
  2024-10-17 15:47 ` Kevin Wolf
  1 sibling, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2024-10-17 15:13 UTC
  To: Ming Lei, Jens Axboe, linux-block
  Cc: josef, nbd, eblake, vincent.chen, Leon Schuermann, Kevin Wolf

On 10/17/24 4:36 AM, Ming Lei wrote:
> +static blk_status_t nbd_send_pending_cmd(struct nbd_device *nbd,
> +		struct nbd_cmd *cmd)
> +{
> +	struct request *req = blk_mq_rq_from_pdu(cmd);
> +	unsigned long deadline = READ_ONCE(req->deadline);
> +	unsigned int wait_ms = 2;
> +	blk_status_t res;
> +
> +	WARN_ON_ONCE(test_bit(NBD_CMD_REQUEUED, &cmd->flags));
> +
> +	while (true) {
> +		res = nbd_send_cmd(nbd, cmd, cmd->index);
> +		if (res != BLK_STS_RESOURCE)
> +			return res;
> +		if (READ_ONCE(jiffies) + msecs_to_jiffies(wait_ms) >= deadline)
> +			break;
> +		msleep(wait_ms);
> +		wait_ms *= 2;
> +	}

I think that there are better solutions to wait until more data
can be sent, e.g. by using the kernel equivalent of the C library
function select().

Thanks,

Bart.


* Re: [PATCH] nbd: fix partial sending
  2024-10-17 15:13 ` Bart Van Assche
@ 2024-10-17 15:22   ` Jens Axboe
  2024-10-17 15:42     ` Ming Lei
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2024-10-17 15:22 UTC
  To: Bart Van Assche, Ming Lei, linux-block
  Cc: josef, nbd, eblake, vincent.chen, Leon Schuermann, Kevin Wolf

On 10/17/24 9:13 AM, Bart Van Assche wrote:
> On 10/17/24 4:36 AM, Ming Lei wrote:
>> +static blk_status_t nbd_send_pending_cmd(struct nbd_device *nbd,
>> +        struct nbd_cmd *cmd)
>> +{
>> +    struct request *req = blk_mq_rq_from_pdu(cmd);
>> +    unsigned long deadline = READ_ONCE(req->deadline);
>> +    unsigned int wait_ms = 2;
>> +    blk_status_t res;
>> +
>> +    WARN_ON_ONCE(test_bit(NBD_CMD_REQUEUED, &cmd->flags));
>> +
>> +    while (true) {
>> +        res = nbd_send_cmd(nbd, cmd, cmd->index);
>> +        if (res != BLK_STS_RESOURCE)
>> +            return res;
>> +        if (READ_ONCE(jiffies) + msecs_to_jiffies(wait_ms) >= deadline)
>> +            break;
>> +        msleep(wait_ms);
>> +        wait_ms *= 2;
>> +    }
> 
> I think that there are better solutions to wait until more data
> can be sent, e.g. by using the kernel equivalent of the C library
> function select().

It's vfs_poll() - but I don't think that'd be worth it here, the nbd
driver sets BLK_MQ_F_BLOCKING anyway. Using a poll trigger for this
would be a lot more complicated, and need quite a bit of support code.
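
For readers less familiar with the select()-style approach under
discussion, here is a userspace analogue. It is only a sketch of the
general technique (wait for writability instead of sleeping for a fixed
time); it is not how a vfs_poll()-based version of the nbd driver would
look, and send_all_poll is a made-up name.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Send exactly len bytes on fd without blocking inside send(). When the
 * socket buffer is full, wait for POLLOUT for up to timeout_ms.
 * Returns 0 on success, -1 on error or timeout.
 */
static int send_all_poll(int fd, const char *buf, size_t len, int timeout_ms)
{
	size_t sent = 0;

	while (sent < len) {
		ssize_t n = send(fd, buf + sent, len - sent,
				 MSG_DONTWAIT | MSG_NOSIGNAL);

		if (n > 0) {
			sent += (size_t)n;	/* partial send: keep going */
			continue;
		}
		if (n < 0 && errno == EINTR)
			continue;		/* interrupted: retry the send */
		if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
			struct pollfd pfd = { .fd = fd, .events = POLLOUT };
			int r = poll(&pfd, 1, timeout_ms);

			if (r > 0 || (r < 0 && errno == EINTR))
				continue;	/* writable or interrupted: retry */
			return -1;		/* timeout or poll failure */
		}
		return -1;			/* hard error */
	}
	return 0;
}
```

The caller gets woken as soon as the socket can take more data, at the
cost of the extra wait machinery, which is the trade-off weighed in the
reply above.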

-- 
Jens Axboe

* Re: [PATCH] nbd: fix partial sending
  2024-10-17 15:22   ` Jens Axboe
@ 2024-10-17 15:42     ` Ming Lei
  0 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2024-10-17 15:42 UTC
  To: Jens Axboe
  Cc: Bart Van Assche, linux-block, josef, nbd, eblake, vincent.chen,
	Leon Schuermann, Kevin Wolf

On Thu, Oct 17, 2024 at 09:22:22AM -0600, Jens Axboe wrote:
> On 10/17/24 9:13 AM, Bart Van Assche wrote:
> > On 10/17/24 4:36 AM, Ming Lei wrote:
> >> +static blk_status_t nbd_send_pending_cmd(struct nbd_device *nbd,
> >> +        struct nbd_cmd *cmd)
> >> +{
> >> +    struct request *req = blk_mq_rq_from_pdu(cmd);
> >> +    unsigned long deadline = READ_ONCE(req->deadline);
> >> +    unsigned int wait_ms = 2;
> >> +    blk_status_t res;
> >> +
> >> +    WARN_ON_ONCE(test_bit(NBD_CMD_REQUEUED, &cmd->flags));
> >> +
> >> +    while (true) {
> >> +        res = nbd_send_cmd(nbd, cmd, cmd->index);
> >> +        if (res != BLK_STS_RESOURCE)
> >> +            return res;
> >> +        if (READ_ONCE(jiffies) + msecs_to_jiffies(wait_ms) >= deadline)
> >> +            break;
> >> +        msleep(wait_ms);
> >> +        wait_ms *= 2;
> >> +    }
> > 
> > I think that there are better solutions to wait until more data
> > can be sent, e.g. by using the kernel equivalent of the C library
> > function select().
> 
> It's vfs_poll() - but I don't think that'd be worth it here, the nbd
> driver sets BLK_MQ_F_BLOCKING anyway. Using a poll trigger for this
> would be a lot more complicated, and need quite a bit of support code.

Agreed.

This is an unlikely event, so vfs_poll() isn't worth it here.

Retrying with an exponential backoff wait should work just fine.
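
To get a feel for the numbers, the retry loop from the patch can be
modelled in userspace. This is an idealized sketch: it assumes every
send attempt returns BLK_STS_RESOURCE, that attempts take no time, and
that msleep() sleeps exactly as long as requested. backoff_attempts is a
made-up helper, and the time budget is expressed in milliseconds rather
than jiffies.

```c
#include <stddef.h>

/*
 * Model of the backoff loop: start at 2 ms, double after each sleep, and
 * give up once the next sleep would reach the deadline. Returns the
 * number of send attempts made; if slept_ms is non-NULL it receives the
 * total time spent sleeping.
 */
static unsigned int backoff_attempts(unsigned long budget_ms,
				     unsigned long *slept_ms)
{
	unsigned long now = 0;		/* elapsed time in the model */
	unsigned long wait_ms = 2;
	unsigned int attempts = 0;

	for (;;) {
		attempts++;		/* stands in for nbd_send_cmd() */
		if (now + wait_ms >= budget_ms)
			break;		/* next sleep would hit the deadline */
		now += wait_ms;		/* stands in for msleep(wait_ms) */
		wait_ms *= 2;
	}

	if (slept_ms)
		*slept_ms = now;
	return attempts;
}
```

With an assumed budget of 30000 ms, the model makes 14 attempts and
gives up after sleeping 16382 ms in total, because the next wait
(16384 ms) would cross the deadline. The real loop can behave
differently, since sends take time and msleep() may sleep longer than
asked.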


Thanks,
Ming


* Re: [PATCH] nbd: fix partial sending
  2024-10-17 11:36 [PATCH] nbd: fix partial sending Ming Lei
  2024-10-17 15:13 ` Bart Van Assche
@ 2024-10-17 15:47 ` Kevin Wolf
  2024-10-18  0:33   ` Ming Lei
  1 sibling, 1 reply; 6+ messages in thread
From: Kevin Wolf @ 2024-10-17 15:47 UTC
  To: Ming Lei
  Cc: Jens Axboe, linux-block, josef, nbd, eblake, vincent.chen,
	Leon Schuermann, Bart Van Assche

On 17.10.2024 at 13:36, Ming Lei wrote:
> The nbd driver sends the request header and payload with multiple calls
> to sock_sendmsg(), so partial sends can't be avoided. However, the nbd
> driver returns BLK_STS_RESOURCE to the block core in this situation. This
> causes a problem: request->tag may change in the next run of
> nbd_queue_rq(), but the original tag has already been sent as part of the
> header cookie. This confuses the nbd driver's reply handling, since the
> real request can no longer be retrieved with the obsolete tag.
> 
> Fix it by retrying the send directly. This is reasonable and safe, since
> nothing can make progress while the current hw queue (socket) has a
> pending request, and it also avoids an unnecessary requeue.
> 
> Cc: vincent.chen@sifive.com
> Cc: Leon Schuermann <leon@is.currently.online>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Reported-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> Kevin,
> 	Please test this version, thanks!

The NBD errors seem to go away with this.

I'm not sure about side effects, though. Isn't the idea behind EINTR
that you return to userspace to let it handle a signal? Looping in the
kernel doesn't quite achieve this, so do we delay/prevent signal
delivery with this? On the other hand, if it were completely prevented,
then this should become an infinite loop, which it didn't in my test.

>  drivers/block/nbd.c | 35 +++++++++++++++++++++++++++++++++--
>  1 file changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index b852050d8a96..ef84071041e3 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -701,8 +701,9 @@ static blk_status_t nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd,
>  			if (sent) {
>  				nsock->pending = req;
>  				nsock->sent = sent;
> +			} else {
> +				set_bit(NBD_CMD_REQUEUED, &cmd->flags);
>  			}
> -			set_bit(NBD_CMD_REQUEUED, &cmd->flags);
>  			return BLK_STS_RESOURCE;
>  		}
>  		dev_err_ratelimited(disk_to_dev(nbd->disk),
> @@ -743,7 +744,6 @@ static blk_status_t nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd,
>  					 */
>  					nsock->pending = req;
>  					nsock->sent = sent;
> -					set_bit(NBD_CMD_REQUEUED, &cmd->flags);
>  					return BLK_STS_RESOURCE;
>  				}
>  				dev_err(disk_to_dev(nbd->disk),
> @@ -778,6 +778,35 @@ static blk_status_t nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd,
>  	return BLK_STS_OK;
>  }
>  
> +/*
> + * Send pending nbd request and payload, part of them have been sent
> + * already, so we have to send them all with current request, and can't
> + * return BLK_STS_RESOURCE, otherwise request tag may be changed in next
> + * retry
> + */
> +static blk_status_t nbd_send_pending_cmd(struct nbd_device *nbd,
> +		struct nbd_cmd *cmd)
> +{
> +	struct request *req = blk_mq_rq_from_pdu(cmd);
> +	unsigned long deadline = READ_ONCE(req->deadline);
> +	unsigned int wait_ms = 2;
> +	blk_status_t res;
> +
> +	WARN_ON_ONCE(test_bit(NBD_CMD_REQUEUED, &cmd->flags));
> +
> +	while (true) {
> +		res = nbd_send_cmd(nbd, cmd, cmd->index);
> +		if (res != BLK_STS_RESOURCE)
> +			return res;
> +		if (READ_ONCE(jiffies) + msecs_to_jiffies(wait_ms) >= deadline)
> +			break;
> +		msleep(wait_ms);
> +		wait_ms *= 2;
> +	}
> +
> +	return BLK_STS_IOERR;
> +}
> +
>  static int nbd_read_reply(struct nbd_device *nbd, struct socket *sock,
>  			  struct nbd_reply *reply)
>  {
> @@ -1111,6 +1140,8 @@ static blk_status_t nbd_handle_cmd(struct nbd_cmd *cmd, int index)
>  		goto out;
>  	}
>  	ret = nbd_send_cmd(nbd, cmd, index);
> +	if (ret == BLK_STS_RESOURCE && nsock->pending == req)
> +		ret = nbd_send_pending_cmd(nbd, cmd);

Is there a reason to call nbd_send_cmd() outside of the new loop first
instead of going to the loop directly? It's always better to only have
a single code path.

>  out:
>  	mutex_unlock(&nsock->tx_lock);
>  	nbd_config_put(nbd);

Kevin


* Re: [PATCH] nbd: fix partial sending
  2024-10-17 15:47 ` Kevin Wolf
@ 2024-10-18  0:33   ` Ming Lei
  0 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2024-10-18  0:33 UTC
  To: Kevin Wolf
  Cc: Jens Axboe, linux-block, josef, nbd, eblake, vincent.chen,
	Leon Schuermann, Bart Van Assche

On Thu, Oct 17, 2024 at 05:47:53PM +0200, Kevin Wolf wrote:
> On 17.10.2024 at 13:36, Ming Lei wrote:
> > The nbd driver sends the request header and payload with multiple calls
> > to sock_sendmsg(), so partial sends can't be avoided. However, the nbd
> > driver returns BLK_STS_RESOURCE to the block core in this situation. This
> > causes a problem: request->tag may change in the next run of
> > nbd_queue_rq(), but the original tag has already been sent as part of the
> > header cookie. This confuses the nbd driver's reply handling, since the
> > real request can no longer be retrieved with the obsolete tag.
> > 
> > Fix it by retrying the send directly. This is reasonable and safe, since
> > nothing can make progress while the current hw queue (socket) has a
> > pending request, and it also avoids an unnecessary requeue.
> > 
> > Cc: vincent.chen@sifive.com
> > Cc: Leon Schuermann <leon@is.currently.online>
> > Cc: Bart Van Assche <bvanassche@acm.org>
> > Reported-by: Kevin Wolf <kwolf@redhat.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > Kevin,
> > 	Please test this version, thanks!
> 
> The NBD errors seem to go away with this.
> 
> I'm not sure about side effects, though. Isn't the idea behind EINTR
> that you return to userspace to let it handle a signal? Looping in the

Well, the retry can be done in a work function, so userspace gets a
chance to handle the signal.

> kernel doesn't quite achieve this, so do we delay/prevent signal
> delivery with this? On the other hand, if it were completely prevented,
> then this should become an infinite loop, which it didn't in my test.

If the retry can't succeed before the request's deadline, the request fails.

> 
> >  drivers/block/nbd.c | 35 +++++++++++++++++++++++++++++++++--
> >  1 file changed, 33 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> > index b852050d8a96..ef84071041e3 100644
> > --- a/drivers/block/nbd.c
> > +++ b/drivers/block/nbd.c
> > @@ -701,8 +701,9 @@ static blk_status_t nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd,
> >  			if (sent) {
> >  				nsock->pending = req;
> >  				nsock->sent = sent;
> > +			} else {
> > +				set_bit(NBD_CMD_REQUEUED, &cmd->flags);
> >  			}
> > -			set_bit(NBD_CMD_REQUEUED, &cmd->flags);
> >  			return BLK_STS_RESOURCE;
> >  		}
> >  		dev_err_ratelimited(disk_to_dev(nbd->disk),
> > @@ -743,7 +744,6 @@ static blk_status_t nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd,
> >  					 */
> >  					nsock->pending = req;
> >  					nsock->sent = sent;
> > -					set_bit(NBD_CMD_REQUEUED, &cmd->flags);
> >  					return BLK_STS_RESOURCE;
> >  				}
> >  				dev_err(disk_to_dev(nbd->disk),
> > @@ -778,6 +778,35 @@ static blk_status_t nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd,
> >  	return BLK_STS_OK;
> >  }
> >  
> > +/*
> > + * Send pending nbd request and payload, part of them have been sent
> > + * already, so we have to send them all with current request, and can't
> > + * return BLK_STS_RESOURCE, otherwise request tag may be changed in next
> > + * retry
> > + */
> > +static blk_status_t nbd_send_pending_cmd(struct nbd_device *nbd,
> > +		struct nbd_cmd *cmd)
> > +{
> > +	struct request *req = blk_mq_rq_from_pdu(cmd);
> > +	unsigned long deadline = READ_ONCE(req->deadline);
> > +	unsigned int wait_ms = 2;
> > +	blk_status_t res;
> > +
> > +	WARN_ON_ONCE(test_bit(NBD_CMD_REQUEUED, &cmd->flags));
> > +
> > +	while (true) {
> > +		res = nbd_send_cmd(nbd, cmd, cmd->index);
> > +		if (res != BLK_STS_RESOURCE)
> > +			return res;
> > +		if (READ_ONCE(jiffies) + msecs_to_jiffies(wait_ms) >= deadline)
> > +			break;
> > +		msleep(wait_ms);
> > +		wait_ms *= 2;
> > +	}
> > +
> > +	return BLK_STS_IOERR;
> > +}
> > +
> >  static int nbd_read_reply(struct nbd_device *nbd, struct socket *sock,
> >  			  struct nbd_reply *reply)
> >  {
> > @@ -1111,6 +1140,8 @@ static blk_status_t nbd_handle_cmd(struct nbd_cmd *cmd, int index)
> >  		goto out;
> >  	}
> >  	ret = nbd_send_cmd(nbd, cmd, index);
> > +	if (ret == BLK_STS_RESOURCE && nsock->pending == req)
> > +		ret = nbd_send_pending_cmd(nbd, cmd);
> 
> Is there a reason to call nbd_send_cmd() outside of the new loop first
> instead of going to the loop directly? It's always better to only have
> a single code path.

IMO, it is better to add a new cold code path for handling the unusual
pending request; nbd_send_cmd() is already too complicated to maintain.


Thanks,
Ming

