From: Ofir Gal <ofir.gal@volumez.com>
To: Sagi Grimberg <sagi@grimberg.me>, Christoph Hellwig <hch@lst.de>
Cc: davem@davemloft.net, linux-block@vger.kernel.org,
linux-nvme@lists.infradead.org, netdev@vger.kernel.org,
ceph-devel@vger.kernel.org, dhowells@redhat.com,
edumazet@google.com, pabeni@redhat.com, kbusch@kernel.org,
axboe@kernel.dk, philipp.reisner@linbit.com,
lars.ellenberg@linbit.com, christoph.boehmwalder@linbit.com,
idryomov@gmail.com, xiubli@redhat.com
Subject: Re: [PATCH v2 1/4] net: introduce helper sendpages_ok()
Date: Thu, 6 Jun 2024 15:57:25 +0300 [thread overview]
Message-ID: <b13305d7-35c8-432e-bea1-616410a9da15@volumez.com> (raw)
In-Reply-To: <ef7ea4a8-c0e4-4fd9-9abb-42ae95090fc8@grimberg.me>
On 04/06/2024 16:01, Sagi Grimberg wrote:
>
>
> On 04/06/2024 11:24, Sagi Grimberg wrote:
>>
>>
>> On 04/06/2024 7:27, Christoph Hellwig wrote:
>>> On Tue, Jun 04, 2024 at 12:27:06AM +0300, Sagi Grimberg wrote:
>>>>>> I still don't understand how a page in the middle of a contiguous range ends
>>>>>> up coming from the slab while others don't.
>>>>> I haven't investigate the origin of the IO
>>>>> yet. I suspect the first 2 pages are the superblocks of the raid
>>>>> (mdp_superblock_1 and bitmap_super_s) and the rest of the IO is the bitmap.
>>>> Well, if these indeed are different origins and just *happen* to be a
>>>> mixture
>>>> of slab originated pages and non-slab pages combined together in a single
>>>> bio of a bvec entry,
>>>> I'd suspect that it would be more beneficial to split the bvec (essentially
>>>> not allow bio_add_page
>>>> to append the page to tail bvec depending on a queue limit (similar to how
>>>> we handle sg gaps).
I have investigated the origin of the IO. It's a bug in the md-bitmap
code. It's a single IO that __write_sb_page() sends, it rounds up the io
size to the optimal io size but doesn't check that the final size exceeds
the amount of pages it allocated.
The slab pages aren't allocated by the md-bitmap, they are pages that
happens to be after the allocated pages. I'm applying a patch to the md
subsystem asap.
I have added some logs to test the theory:
...
md: created bitmap (1 pages) for device md127
__write_sb_page before md_super_write. offset: 16, size: 262144. pfn: 0x53ee
### __write_sb_page before md_super_write. logging pages ###
pfn: 0x53ee, slab: 0
pfn: 0x53ef, slab: 1
pfn: 0x53f0, slab: 0
pfn: 0x53f1, slab: 0
pfn: 0x53f2, slab: 0
pfn: 0x53f3, slab: 1
...
nvme_tcp: sendpage_ok - pfn: 0x53ee, len: 262144, offset: 0
skbuff: before sendpage_ok() - pfn: 0x53ee
skbuff: before sendpage_ok() - pfn: 0x53ef
WARNING at net/core/skbuff.c:6848 skb_splice_from_iter+0x142/0x450
skbuff: !sendpage_ok - pfn: 0x53ef. is_slab: 1, page_count: 1
...
There is only 1 page allocated for bitmap but __write_sb_page() tries to
write 64 pages.
>>> So you want to add a PageSlab check to bvec_try_merge_page? That sounds
>>> fairly expensive..
>>>
>>
>> The check needs to happen somewhere apparently, and given that it will be gated by a queue flag
>> only request queues that actually needed will suffer, but they will suffer anyways...
> What I now see is that we will check PageSlab twice (bvec last index and append page)
> and skb_splice_from_iter checks it again... How many times check we check this :)
>
> Would be great if the network stack can just check it once and fallback to page copy...
Nevertheless I think a check in the network stack or when merging bio's
is still necessary.
next prev parent reply other threads:[~2024-06-06 12:57 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-30 14:24 [PATCH v2 0/4] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages Ofir Gal
2024-05-30 14:24 ` [PATCH v2 1/4] net: introduce helper sendpages_ok() Ofir Gal
2024-05-31 7:32 ` Christoph Hellwig
2024-05-31 8:51 ` Sagi Grimberg
2024-06-03 12:35 ` Ofir Gal
2024-06-03 21:27 ` Sagi Grimberg
2024-06-04 4:27 ` Christoph Hellwig
2024-06-04 8:24 ` Sagi Grimberg
2024-06-04 13:01 ` Sagi Grimberg
2024-06-06 12:57 ` Ofir Gal [this message]
2024-06-06 13:08 ` Christoph Hellwig
2024-06-06 13:18 ` Ofir Gal
2024-06-06 13:52 ` Christoph Hellwig
2024-06-06 15:42 ` Ofir Gal
2024-05-30 14:24 ` [PATCH v2 2/4] nvme-tcp: use sendpages_ok() instead of sendpage_ok() Ofir Gal
2024-05-31 7:32 ` Christoph Hellwig
2024-05-30 14:24 ` [PATCH v2 3/4] drbd: " Ofir Gal
2024-06-04 14:43 ` Christoph Böhmwalder
2024-05-30 14:24 ` [PATCH v2 4/4] libceph: " Ofir Gal
2024-05-31 7:32 ` [PATCH v2 0/4] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages Christoph Hellwig
2024-06-01 22:36 ` Jakub Kicinski
2024-06-04 4:30 ` Christoph Hellwig
2024-06-04 14:42 ` Jakub Kicinski
2024-06-05 7:27 ` Christoph Hellwig
2024-06-01 22:34 ` Jakub Kicinski
2024-06-02 7:48 ` Sagi Grimberg
2024-06-03 9:07 ` Hannes Reinecke
2024-06-03 12:46 ` Ofir Gal
2024-06-03 7:24 ` Hannes Reinecke
2024-06-03 12:49 ` Ofir Gal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b13305d7-35c8-432e-bea1-616410a9da15@volumez.com \
--to=ofir.gal@volumez.com \
--cc=axboe@kernel.dk \
--cc=ceph-devel@vger.kernel.org \
--cc=christoph.boehmwalder@linbit.com \
--cc=davem@davemloft.net \
--cc=dhowells@redhat.com \
--cc=edumazet@google.com \
--cc=hch@lst.de \
--cc=idryomov@gmail.com \
--cc=kbusch@kernel.org \
--cc=lars.ellenberg@linbit.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=philipp.reisner@linbit.com \
--cc=sagi@grimberg.me \
--cc=xiubli@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox