Linux-NVME Archive on lore.kernel.org
From: Sagi Grimberg <sagi@grimberg.me>
To: Ofir Gal <ofir.gal@volumez.com>,
	davem@davemloft.net, linux-block@vger.kernel.org,
	linux-nvme@lists.infradead.org, netdev@vger.kernel.org,
	ceph-devel@vger.kernel.org
Cc: dhowells@redhat.com, edumazet@google.com, pabeni@redhat.com,
	kbusch@kernel.org, axboe@kernel.dk, hch@lst.de,
	philipp.reisner@linbit.com, lars.ellenberg@linbit.com,
	christoph.boehmwalder@linbit.com, idryomov@gmail.com,
	xiubli@redhat.com
Subject: Re: [PATCH 0/4] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages
Date: Thu, 30 May 2024 20:58:06 +0300
Message-ID: <d6b2c19b-c2a6-400c-bbf1-bf0469138777@grimberg.me>
In-Reply-To: <20240530132629.4180932-1-ofir.gal@volumez.com>

Hey Ofir,

On 30/05/2024 16:26, Ofir Gal wrote:
> skb_splice_from_iter() warns on !sendpage_ok() which results in nvme-tcp
> data transfer failure. This warning leads to hanging IO.
>
> nvme-tcp uses sendpage_ok() to check the first page of an iterator in
> order to disable MSG_SPLICE_PAGES. The iterator can represent a list of
> contiguous pages.
>
> When MSG_SPLICE_PAGES is enabled skb_splice_from_iter() is being used,
> it requires all pages in the iterator to be sendable.
> skb_splice_from_iter() checks each page with sendpage_ok().
>
> nvme_tcp_try_send_data() might allow MSG_SPLICE_PAGES when the first
> page is sendable, but the following ones are not. skb_splice_from_iter()
> will attempt to send all the pages in the iterator. When reaching an
> unsendable page the IO will hang.

Interesting. Do you know where this buffer came from? I find it strange
that we get a bvec with a contiguous segment which consists of non-slab
originated pages together with slab originated pages... it is surprising
to see a mix of the two.

I'm wondering if this is something that happened before David's
splice_pages changes. Maybe before that, with multipage bvecs? Anyway,
it is strange; I have never seen that.

David, it is strange that nvme-tcp is setting a single contiguous
element bvec, but it is broken up into PAGE_SIZE increments in
skb_splice_from_iter()...
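
To illustrate the point about PAGE_SIZE increments, here is a minimal
userspace sketch of how one contiguous (offset, len) segment ends up as
several per-page chunks. It is not the kernel's skb_splice_from_iter()
code; mock_split_by_page(), mock_chunk_fn and the MOCK_* constants are
hypothetical names, and a 4 KiB page size is assumed.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages. The kernel uses PAGE_SHIFT / PAGE_SIZE. */
#define MOCK_PAGE_SHIFT 12
#define MOCK_PAGE_SIZE  ((size_t)1 << MOCK_PAGE_SHIFT)

/*
 * Hypothetical callback: called once per chunk with the index of the page
 * inside the contiguous run, the offset within that page, and the chunk
 * length in bytes.
 */
typedef void (*mock_chunk_fn)(size_t page_idx, size_t page_off,
			      size_t chunk_len, void *ctx);

/*
 * Walk a contiguous segment described by (offset, len) and split it at
 * page boundaries. Returns the number of chunks produced. fn may be NULL
 * when only the count is of interest.
 */
static size_t mock_split_by_page(size_t offset, size_t len,
				 mock_chunk_fn fn, void *ctx)
{
	size_t chunks = 0;

	while (len > 0) {
		size_t page_idx = offset >> MOCK_PAGE_SHIFT;
		size_t page_off = offset & (MOCK_PAGE_SIZE - 1);
		size_t chunk = MOCK_PAGE_SIZE - page_off;

		/* The last chunk may end before the page boundary. */
		if (chunk > len)
			chunk = len;

		if (fn)
			fn(page_idx, page_off, chunk, ctx);

		offset += chunk;
		len -= chunk;
		chunks++;
	}

	return chunks;
}
```

With this model, a single 8 KiB element that starts 100 bytes into its
first page is seen as three chunks, so a per-page check runs on three
different pages even though the caller set up one bvec element.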

>
> The patch introduces a helper, sendpages_ok(), which returns true if all
> the contiguous pages are sendable.
>
> Drivers who want to send contiguous pages with MSG_SPLICE_PAGES may use
> this helper to check whether the page list is OK. If the helper does not
> return true, the driver should remove MSG_SPLICE_PAGES flag.
>
>
> The bug is reproducible, in order to reproduce we need nvme-over-tcp
> controllers with optimal IO size bigger than PAGE_SIZE. Creating a raid
> with bitmap over those devices reproduces the bug.
>
> In order to simulate large optimal IO size you can use dm-stripe with a
> single device.
> Script to reproduce the issue on top of brd devices using dm-stripe is
> attached below.

This is a great candidate for blktests. It would be very beneficial to
have it added there.
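
For readers following the thread, the idea behind the proposed helper
(check every page of the contiguous run, and drop MSG_SPLICE_PAGES if any
page fails) can be sketched in userspace roughly as below. This is not
the code from the patch series. struct mock_page, mock_sendpage_ok(),
mock_sendpages_ok(), mock_pick_flags() and MOCK_MSG_SPLICE_PAGES are
hypothetical stand-ins, and the per-page test only models the "not a slab
page and has a reference" condition that sendpage_ok() checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. The kernel uses PAGE_SHIFT / PAGE_SIZE. */
#define MOCK_PAGE_SHIFT 12
#define MOCK_PAGE_SIZE  ((size_t)1 << MOCK_PAGE_SHIFT)

/* Hypothetical flag bit; not the real MSG_SPLICE_PAGES value. */
#define MOCK_MSG_SPLICE_PAGES 0x1u

/* Hypothetical stand-in for the kernel's struct page. */
struct mock_page {
	bool is_slab;  /* models PageSlab(page) */
	int refcount;  /* models page_count(page) */
};

/* Models the single-page check: not slab-backed and still referenced. */
static bool mock_sendpage_ok(const struct mock_page *p)
{
	return !p->is_slab && p->refcount >= 1;
}

/*
 * Check every page touched by the byte range [offset, offset + len) of a
 * contiguous run that starts at pages[0]. An empty range is trivially ok.
 */
static bool mock_sendpages_ok(const struct mock_page *pages, size_t len,
			      size_t offset)
{
	size_t first, last, i;

	assert(pages != NULL);
	if (len == 0)
		return true;

	first = offset >> MOCK_PAGE_SHIFT;
	last = (offset + len - 1) >> MOCK_PAGE_SHIFT;

	for (i = first; i <= last; i++) {
		if (!mock_sendpage_ok(&pages[i]))
			return false;
	}

	return true;
}

/* Driver-side decision: keep the splice flag only if every page is ok. */
static unsigned int mock_pick_flags(const struct mock_page *pages, size_t len,
				    size_t offset, unsigned int flags)
{
	if (!mock_sendpages_ok(pages, len, offset))
		flags &= ~MOCK_MSG_SPLICE_PAGES;

	return flags;
}
```

Checking only pages[0] would accept a run whose first page is fine and
whose later page is slab-backed, which is the case described in the cover
letter; the loop over the whole range is what closes that gap.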



Thread overview: 10+ messages
2024-05-30 13:26 [PATCH 0/4] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages Ofir Gal
2024-05-30 13:26 ` [PATCH 1/4] net: introduce helper sendpages_ok() Ofir Gal
2024-06-03  7:18   ` Hannes Reinecke
2024-06-03 11:47     ` Ofir Gal
2024-05-30 13:26 ` [PATCH 2/4] nvme-tcp: use sendpages_ok() instead of sendpage_ok() Ofir Gal
2024-06-03  7:22   ` Hannes Reinecke
2024-05-30 13:26 ` [PATCH 3/4] drbd: use sendpages_ok() to " Ofir Gal
2024-05-30 13:26 ` [PATCH 4/4] libceph: " Ofir Gal
2024-05-30 17:58 ` Sagi Grimberg [this message]
2024-06-03 10:32   ` [PATCH 0/4] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages Ofir Gal
