From: Ofir Gal <ofir.gal@volumez.com>
To: Sagi Grimberg <sagi@grimberg.me>,
davem@davemloft.net, linux-block@vger.kernel.org,
linux-nvme@lists.infradead.org, netdev@vger.kernel.org,
ceph-devel@vger.kernel.org
Cc: dhowells@redhat.com, edumazet@google.com, pabeni@redhat.com,
kbusch@kernel.org, axboe@kernel.dk, hch@lst.de,
philipp.reisner@linbit.com, lars.ellenberg@linbit.com,
christoph.boehmwalder@linbit.com, idryomov@gmail.com,
xiubli@redhat.com
Subject: Re: [PATCH 0/4] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages
Date: Mon, 3 Jun 2024 13:32:54 +0300
Message-ID: <bca1947a-4826-472d-a62c-5ca5ad724939@volumez.com>
In-Reply-To: <d6b2c19b-c2a6-400c-bbf1-bf0469138777@grimberg.me>
On 30/05/2024 20:58, Sagi Grimberg wrote:
> Hey Ofir,
>
> On 30/05/2024 16:26, Ofir Gal wrote:
>> skb_splice_from_iter() warns on !sendpage_ok() which results in nvme-tcp
>> data transfer failure. This warning leads to hanging IO.
>>
>> nvme-tcp uses sendpage_ok() to check the first page of an iterator in
>> order to disable MSG_SPLICE_PAGES. The iterator can represent a list of
>> contiguous pages.
>>
>> When MSG_SPLICE_PAGES is enabled skb_splice_from_iter() is being used,
>> it requires all pages in the iterator to be sendable.
>> skb_splice_from_iter() checks each page with sendpage_ok().
>>
>> nvme_tcp_try_send_data() might allow MSG_SPLICE_PAGES when the first
>> page is sendable, but the next ones are not. skb_splice_from_iter() will
>> attempt to send all the pages in the iterator. When reaching an
>> unsendable page the IO will hang.
>
> Interesting. Do you know where this buffer came from? I find it strange
> that we get a bvec with a contiguous segment which consists of non slab
> originated pages together with slab originated pages... it is surprising to see
> a mix of the two.

I find it strange as well. I haven't investigated the origin of the IO
yet. I suspect the first 2 pages are the superblocks of the raid
(mdp_superblock_1 and bitmap_super_s) and the rest of the IO is the
bitmap.

I have stumbled upon the same issue when running xfs_format (I couldn't
reproduce it from scratch). I suspect there are other cases that mix
slab pages and non-slab pages.

> I'm wondering if this is something that happened before David's splice_pages
> changes. Maybe before that with multipage bvecs? Anyways it is strange, never
> seen that.

I haven't bisected to the commit that caused the behavior, but I have
tested Ubuntu with a 6.2.0 kernel and the bug didn't occur (6.2.0 doesn't
contain David's splice_pages changes).

I'm not familiar with the "multipage bvecs" patch. Which patch are you
referring to?

> David, strange that nvme-tcp is setting a single contiguous element bvec but it
> is broken up into PAGE_SIZE increments in skb_splice_from_iter...
>
>>
>> The patch introduces a helper, sendpages_ok(), which returns true if all the
>> contiguous pages are sendable.
>>
>> Drivers who want to send contiguous pages with MSG_SPLICE_PAGES may use
>> this helper to check whether the page list is OK. If the helper does not
>> return true, the driver should remove MSG_SPLICE_PAGES flag.
>>
>>
>> The bug is reproducible. To reproduce it, we need nvme-over-tcp
>> controllers with an optimal IO size bigger than PAGE_SIZE. Creating a raid
>> with a bitmap over those devices reproduces the bug.
>>
>> In order to simulate large optimal IO size you can use dm-stripe with a
>> single device.
>> Script to reproduce the issue on top of brd devices using dm-stripe is
>> attached below.
>
> This is a great candidate for blktests. would be very beneficial to have it added there.

Good idea, will do!
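
For readers following the thread, here is a minimal userspace sketch of the
check discussed above. It is not the patch itself: struct page is a mock,
the sendpage_ok() model (non-slab page with a refcount of at least one) is an
assumption about the kernel helper, and DEMO_SPLICE_PAGES and
demo_pick_flags() are hypothetical names used only for illustration. The
sketch shows why checking only the first page can wrongly keep the splice
flag set when a later page in the same contiguous range is a slab page.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)

/* Hypothetical flag bit standing in for MSG_SPLICE_PAGES. */
#define DEMO_SPLICE_PAGES 0x1u

/* Userspace mock of the kernel's struct page (assumption). */
struct page {
	bool is_slab;   /* models PageSlab(page) */
	int  refcount;  /* models page_count(page) */
};

/* Model of sendpage_ok(): the page must not be slab-backed and must be
 * referenced at least once. */
static bool sendpage_ok(const struct page *page)
{
	return !page->is_slab && page->refcount >= 1;
}

/* Return true only if every page overlapped by the byte range
 * [offset, offset + len) is sendable. 'page' points at the first page
 * of a contiguous array; an empty range is trivially OK. */
static bool sendpages_ok(const struct page *page, size_t len, size_t offset)
{
	size_t first, last, i;

	if (len == 0)
		return true;

	first = offset >> PAGE_SHIFT;
	last = (offset + len - 1) >> PAGE_SHIFT;

	for (i = first; i <= last; i++) {
		if (!sendpage_ok(&page[i]))
			return false;
	}
	return true;
}

/* Hypothetical caller: clear the splice flag when any page in the
 * range is not sendable, so the data is copied instead. */
static unsigned int demo_pick_flags(const struct page *page, size_t len,
				    size_t offset, unsigned int flags)
{
	if (!sendpages_ok(page, len, offset))
		flags &= ~DEMO_SPLICE_PAGES;
	return flags;
}
```

With three contiguous pages where only the last one is slab-backed,
sendpage_ok() on the first page succeeds while sendpages_ok() over the whole
range fails, so demo_pick_flags() drops the splice flag for that request.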
Thread overview: 10+ messages
2024-05-30 13:26 [PATCH 0/4] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages Ofir Gal
2024-05-30 13:26 ` [PATCH 1/4] net: introduce helper sendpages_ok() Ofir Gal
2024-06-03 7:18 ` Hannes Reinecke
2024-06-03 11:47 ` Ofir Gal
2024-05-30 13:26 ` [PATCH 2/4] nvme-tcp: use sendpages_ok() instead of sendpage_ok() Ofir Gal
2024-06-03 7:22 ` Hannes Reinecke
2024-05-30 13:26 ` [PATCH 3/4] drbd: use sendpages_ok() to " Ofir Gal
2024-05-30 13:26 ` [PATCH 4/4] libceph: " Ofir Gal
2024-05-30 17:58 ` [PATCH 0/4] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages Sagi Grimberg
2024-06-03 10:32 ` Ofir Gal [this message]