netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/3] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages
@ 2024-07-18  8:45 Ofir Gal
  2024-07-18  8:45 ` [PATCH v5 1/3] net: introduce helper sendpages_ok() Ofir Gal
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Ofir Gal @ 2024-07-18  8:45 UTC (permalink / raw)
  To: davem, linux-block, linux-nvme, netdev, ceph-devel
  Cc: dhowells, edumazet, kuba, pabeni, kbusch, axboe, hch, sagi,
	philipp.reisner, lars.ellenberg, christoph.boehmwalder, idryomov,
	xiubli

skb_splice_from_iter() warns on !sendpage_ok() which results in nvme-tcp
data transfer failure. This warning leads to hanging IO.

nvme-tcp using sendpage_ok() to check the first page of an iterator in
order to disable MSG_SPLICE_PAGES. The iterator can represent a list of
contiguous pages.

When MSG_SPLICE_PAGES is enabled skb_splice_from_iter() is being used,
it requires all pages in the iterator to be sendable.
skb_splice_from_iter() checks each page with sendpage_ok().

nvme_tcp_try_send_data() might allow MSG_SPLICE_PAGES when the first
page is sendable, but the next one are not. skb_splice_from_iter() will
attempt to send all the pages in the iterator. When reaching an
unsendable page the IO will hang.

The patch introduces a helper sendpages_ok(), it returns true if all the
continuous pages are sendable.

Drivers who want to send contiguous pages with MSG_SPLICE_PAGES may use
this helper to check whether the page list is OK. If the helper does not
return true, the driver should remove MSG_SPLICE_PAGES flag.

The root cause of the bug is a bug in md-bitmap, it sends a pages that
wasn't allocated for the bitmap. This cause the IO to be a mixture of
slab and non slab pages.
As Christoph Hellwig said in the v2, the issue can occur in similar
cases due to IO merges.


The bug is reproducible, in order to reproduce we need nvme-over-tcp
controllers with optimal IO size bigger than PAGE_SIZE. Creating a raid
with bitmap over those devices reproduces the bug.

In order to simulate large optimal IO size you can use dm-stripe with a
single device.
Test to reproduce the issue on top of brd devices using dm-stripe is
being added to blktests ("md: add regression test for "md/md-bitmap: fix
writing non bitmap pages").

The bug won't reproduce once "md/md-bitmap: fix writing non bitmap
pages" is merged, becuase it solve the root cause issue. A different
test can be done to reproduce the bug.


I have added 3 prints to test my theory. One in nvme_tcp_try_send_data()
and two others in skb_splice_from_iter() the first before sendpage_ok()
and the second on !sendpage_ok(), after the warning.
...
nvme_tcp: sendpage_ok, page: 0x654eccd7 (pfn: 120755), len: 262144, offset: 0
skbuff: before sendpage_ok - i: 0. page: 0x654eccd7 (pfn: 120755)
skbuff: before sendpage_ok - i: 1. page: 0x1666a4da (pfn: 120756)
skbuff: before sendpage_ok - i: 2. page: 0x54f9f140 (pfn: 120757)
WARNING: at net/core/skbuff.c:6848 skb_splice_from_iter+0x142/0x450
skbuff: !sendpage_ok - page: 0x54f9f140 (pfn: 120757). is_slab: 1, page_count: 1
...


stack trace:
...
WARNING: at net/core/skbuff.c:6848 skb_splice_from_iter+0x141/0x450
Workqueue: nvme_tcp_wq nvme_tcp_io_work
Call Trace:
 ? show_regs+0x6a/0x80
 ? skb_splice_from_iter+0x141/0x450
 ? __warn+0x8d/0x130
 ? skb_splice_from_iter+0x141/0x450
 ? report_bug+0x18c/0x1a0
 ? handle_bug+0x40/0x70
 ? exc_invalid_op+0x19/0x70
 ? asm_exc_invalid_op+0x1b/0x20
 ? skb_splice_from_iter+0x141/0x450
 tcp_sendmsg_locked+0x39e/0xee0
 ? _prb_read_valid+0x216/0x290
 tcp_sendmsg+0x2d/0x50
 inet_sendmsg+0x43/0x80
 sock_sendmsg+0x102/0x130
 ? vprintk_default+0x1d/0x30
 ? vprintk+0x3c/0x70
 ? _printk+0x58/0x80
 nvme_tcp_try_send_data+0x17d/0x530
 nvme_tcp_try_send+0x1b7/0x300
 nvme_tcp_io_work+0x3c/0xc0
 process_one_work+0x22e/0x420
 worker_thread+0x50/0x3f0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xd6/0x100
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x3c/0x60
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
...

---
Changelog:
v5, removed libceph patch as it not necessary
v4, move assigment to declaration at sendpages_ok(), add review tags
    from Sagi Grimberg
v3, removed the ROUND_DIV_UP as sagi suggested. add reviewed tags from
    Christoph Hellwig, Hannes Reinecke and Christoph Böhmwalder.
    Add explanation to the root cause issue in the cover letter.
v2, fix typo in patch subject

Ofir Gal (3):
  net: introduce helper sendpages_ok()
  nvme-tcp: use sendpages_ok() instead of sendpage_ok()
  drbd: use sendpages_ok() instead of sendpage_ok()

 drivers/block/drbd/drbd_main.c |  2 +-
 drivers/nvme/host/tcp.c        |  2 +-
 include/linux/net.h            | 19 +++++++++++++++++++
 3 files changed, 21 insertions(+), 2 deletions(-)

-- 
2.45.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v5 1/3] net: introduce helper sendpages_ok()
  2024-07-18  8:45 [PATCH v5 0/3] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages Ofir Gal
@ 2024-07-18  8:45 ` Ofir Gal
  2024-07-23  0:45   ` Jakub Kicinski
  2024-07-18  8:45 ` [PATCH v5 2/3] nvme-tcp: use sendpages_ok() instead of sendpage_ok() Ofir Gal
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Ofir Gal @ 2024-07-18  8:45 UTC (permalink / raw)
  To: davem, linux-block, linux-nvme, netdev, ceph-devel
  Cc: dhowells, edumazet, kuba, pabeni, kbusch, axboe, hch, sagi,
	philipp.reisner, lars.ellenberg, christoph.boehmwalder, idryomov,
	xiubli

Network drivers are using sendpage_ok() to check the first page of an
iterator in order to disable MSG_SPLICE_PAGES. The iterator can
represent list of contiguous pages.

When MSG_SPLICE_PAGES is enabled skb_splice_from_iter() is being used,
it requires all pages in the iterator to be sendable. Therefore it needs
to check that each page is sendable.

The patch introduces a helper sendpages_ok(), it returns true if all the
contiguous pages are sendable.

Drivers who want to send contiguous pages with MSG_SPLICE_PAGES may use
this helper to check whether the page list is OK. If the helper does not
return true, the driver should remove MSG_SPLICE_PAGES flag.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ofir Gal <ofir.gal@volumez.com>
---
 include/linux/net.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/include/linux/net.h b/include/linux/net.h
index 688320b79fcc..b75bc534c1b3 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -322,6 +322,25 @@ static inline bool sendpage_ok(struct page *page)
 	return !PageSlab(page) && page_count(page) >= 1;
 }
 
+/*
+ * Check sendpage_ok on contiguous pages.
+ */
+static inline bool sendpages_ok(struct page *page, size_t len, size_t offset)
+{
+	struct page *p = page + (offset >> PAGE_SHIFT);
+	size_t count = 0;
+
+	while (count < len) {
+		if (!sendpage_ok(p))
+			return false;
+
+		p++;
+		count += PAGE_SIZE;
+	}
+
+	return true;
+}
+
 int kernel_sendmsg(struct socket *sock, struct msghdr *msg, struct kvec *vec,
 		   size_t num, size_t len);
 int kernel_sendmsg_locked(struct sock *sk, struct msghdr *msg,
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 2/3] nvme-tcp: use sendpages_ok() instead of sendpage_ok()
  2024-07-18  8:45 [PATCH v5 0/3] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages Ofir Gal
  2024-07-18  8:45 ` [PATCH v5 1/3] net: introduce helper sendpages_ok() Ofir Gal
@ 2024-07-18  8:45 ` Ofir Gal
  2024-07-18  8:45 ` [PATCH v5 3/3] drbd: " Ofir Gal
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Ofir Gal @ 2024-07-18  8:45 UTC (permalink / raw)
  To: davem, linux-block, linux-nvme, netdev, ceph-devel
  Cc: dhowells, edumazet, kuba, pabeni, kbusch, axboe, hch, sagi,
	philipp.reisner, lars.ellenberg, christoph.boehmwalder, idryomov,
	xiubli, Hannes Reinecke

Currently nvme_tcp_try_send_data() use sendpage_ok() in order to disable
MSG_SPLICE_PAGES, it check the first page of the iterator, the iterator
may represent contiguous pages.

MSG_SPLICE_PAGES enables skb_splice_from_iter() which checks all the
pages it sends with sendpage_ok().

When nvme_tcp_try_send_data() sends an iterator that the first page is
sendable, but one of the other pages isn't skb_splice_from_iter() warns
and aborts the data transfer.

Using the new helper sendpages_ok() in order to disable MSG_SPLICE_PAGES
solves the issue.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ofir Gal <ofir.gal@volumez.com>
---
 drivers/nvme/host/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 8b5e4327fe83..9f0fd14cbcb7 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1051,7 +1051,7 @@ static int nvme_tcp_try_send_data(struct nvme_tcp_request *req)
 		else
 			msg.msg_flags |= MSG_MORE;
 
-		if (!sendpage_ok(page))
+		if (!sendpages_ok(page, len, offset))
 			msg.msg_flags &= ~MSG_SPLICE_PAGES;
 
 		bvec_set_page(&bvec, page, len, offset);
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 3/3] drbd: use sendpages_ok() instead of sendpage_ok()
  2024-07-18  8:45 [PATCH v5 0/3] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages Ofir Gal
  2024-07-18  8:45 ` [PATCH v5 1/3] net: introduce helper sendpages_ok() Ofir Gal
  2024-07-18  8:45 ` [PATCH v5 2/3] nvme-tcp: use sendpages_ok() instead of sendpage_ok() Ofir Gal
@ 2024-07-18  8:45 ` Ofir Gal
  2024-07-22 22:21 ` [PATCH v5 0/3] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages Jens Axboe
  2024-07-23 13:30 ` Jens Axboe
  4 siblings, 0 replies; 10+ messages in thread
From: Ofir Gal @ 2024-07-18  8:45 UTC (permalink / raw)
  To: davem, linux-block, linux-nvme, netdev, ceph-devel
  Cc: dhowells, edumazet, kuba, pabeni, kbusch, axboe, hch, sagi,
	philipp.reisner, lars.ellenberg, christoph.boehmwalder, idryomov,
	xiubli

Currently _drbd_send_page() use sendpage_ok() in order to enable
MSG_SPLICE_PAGES, it check the first page of the iterator, the iterator
may represent contiguous pages.

MSG_SPLICE_PAGES enables skb_splice_from_iter() which checks all the
pages it sends with sendpage_ok().

When _drbd_send_page() sends an iterator that the first page is
sendable, but one of the other pages isn't skb_splice_from_iter() warns
and aborts the data transfer.

Using the new helper sendpages_ok() in order to enable MSG_SPLICE_PAGES
solves the issue.

Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Signed-off-by: Ofir Gal <ofir.gal@volumez.com>
---
 drivers/block/drbd/drbd_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 113b441d4d36..a5dbbf6cce23 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1550,7 +1550,7 @@ static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *pa
 	 * put_page(); and would cause either a VM_BUG directly, or
 	 * __page_cache_release a page that would actually still be referenced
 	 * by someone, leading to some obscure delayed Oops somewhere else. */
-	if (!drbd_disable_sendpage && sendpage_ok(page))
+	if (!drbd_disable_sendpage && sendpages_ok(page, len, offset))
 		msg.msg_flags |= MSG_NOSIGNAL | MSG_SPLICE_PAGES;
 
 	drbd_update_congested(peer_device->connection);
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v5 0/3] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages
  2024-07-18  8:45 [PATCH v5 0/3] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages Ofir Gal
                   ` (2 preceding siblings ...)
  2024-07-18  8:45 ` [PATCH v5 3/3] drbd: " Ofir Gal
@ 2024-07-22 22:21 ` Jens Axboe
  2024-07-23  0:45   ` Jakub Kicinski
  2024-07-23 13:30 ` Jens Axboe
  4 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2024-07-22 22:21 UTC (permalink / raw)
  To: Ofir Gal, davem, linux-block, linux-nvme, netdev, ceph-devel
  Cc: dhowells, edumazet, kuba, pabeni, kbusch, hch, sagi,
	philipp.reisner, lars.ellenberg, christoph.boehmwalder, idryomov,
	xiubli

Hi,

Who's queuing up these patches? I can certainly do it, but would be nice
with an ack on the networking patch if so.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5 1/3] net: introduce helper sendpages_ok()
  2024-07-18  8:45 ` [PATCH v5 1/3] net: introduce helper sendpages_ok() Ofir Gal
@ 2024-07-23  0:45   ` Jakub Kicinski
  0 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2024-07-23  0:45 UTC (permalink / raw)
  To: Ofir Gal
  Cc: davem, linux-block, linux-nvme, netdev, ceph-devel, dhowells,
	edumazet, pabeni, kbusch, axboe, hch, sagi, philipp.reisner,
	lars.ellenberg, christoph.boehmwalder, idryomov, xiubli

On Thu, 18 Jul 2024 11:45:12 +0300 Ofir Gal wrote:
> Network drivers are using sendpage_ok() to check the first page of an
> iterator in order to disable MSG_SPLICE_PAGES. The iterator can
> represent list of contiguous pages.
> 
> When MSG_SPLICE_PAGES is enabled skb_splice_from_iter() is being used,
> it requires all pages in the iterator to be sendable. Therefore it needs
> to check that each page is sendable.
> 
> The patch introduces a helper sendpages_ok(), it returns true if all the
> contiguous pages are sendable.
> 
> Drivers who want to send contiguous pages with MSG_SPLICE_PAGES may use
> this helper to check whether the page list is OK. If the helper does not
> return true, the driver should remove MSG_SPLICE_PAGES flag.

Acked-by: Jakub Kicinski <kuba@kernel.org>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5 0/3] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages
  2024-07-22 22:21 ` [PATCH v5 0/3] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages Jens Axboe
@ 2024-07-23  0:45   ` Jakub Kicinski
  2024-07-23  0:46     ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2024-07-23  0:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ofir Gal, davem, linux-block, linux-nvme, netdev, ceph-devel,
	dhowells, edumazet, pabeni, kbusch, hch, sagi, philipp.reisner,
	lars.ellenberg, christoph.boehmwalder, idryomov, xiubli

On Mon, 22 Jul 2024 16:21:37 -0600 Jens Axboe wrote:
> Who's queuing up these patches? I can certainly do it, but would be nice
> with an ack on the networking patch if so.

For for it!

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5 0/3] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages
  2024-07-23  0:45   ` Jakub Kicinski
@ 2024-07-23  0:46     ` Jakub Kicinski
  0 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2024-07-23  0:46 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ofir Gal, davem, linux-block, linux-nvme, netdev, ceph-devel,
	dhowells, edumazet, pabeni, kbusch, hch, sagi, philipp.reisner,
	lars.ellenberg, christoph.boehmwalder, idryomov, xiubli

On Mon, 22 Jul 2024 17:45:48 -0700 Jakub Kicinski wrote:
> On Mon, 22 Jul 2024 16:21:37 -0600 Jens Axboe wrote:
> > Who's queuing up these patches? I can certainly do it, but would be nice
> > with an ack on the networking patch if so.  
> 
> For for it!

Go..

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5 0/3] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages
  2024-07-18  8:45 [PATCH v5 0/3] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages Ofir Gal
                   ` (3 preceding siblings ...)
  2024-07-22 22:21 ` [PATCH v5 0/3] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages Jens Axboe
@ 2024-07-23 13:30 ` Jens Axboe
  2024-07-23 17:51   ` Sagi Grimberg
  4 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2024-07-23 13:30 UTC (permalink / raw)
  To: davem, linux-block, linux-nvme, netdev, ceph-devel, Ofir Gal
  Cc: dhowells, edumazet, kuba, pabeni, kbusch, hch, sagi,
	philipp.reisner, lars.ellenberg, christoph.boehmwalder, idryomov,
	xiubli


On Thu, 18 Jul 2024 11:45:11 +0300, Ofir Gal wrote:
> skb_splice_from_iter() warns on !sendpage_ok() which results in nvme-tcp
> data transfer failure. This warning leads to hanging IO.
> 
> nvme-tcp using sendpage_ok() to check the first page of an iterator in
> order to disable MSG_SPLICE_PAGES. The iterator can represent a list of
> contiguous pages.
> 
> [...]

Applied, thanks!

[1/3] net: introduce helper sendpages_ok()
      commit: 80b272a6f50b2a76f7d2c71a5c097c56d103a9ed
[2/3] nvme-tcp: use sendpages_ok() instead of sendpage_ok()
      commit: 41669803e5001f674083c9c176a4749eb1abbe29
[3/3] drbd: use sendpages_ok() instead of sendpage_ok()
      commit: e601087934f178a9a9ae8f5a3938b4aa76379ea1

Best regards,
-- 
Jens Axboe




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5 0/3] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages
  2024-07-23 13:30 ` Jens Axboe
@ 2024-07-23 17:51   ` Sagi Grimberg
  0 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2024-07-23 17:51 UTC (permalink / raw)
  To: Jens Axboe, davem, linux-block, linux-nvme, netdev, ceph-devel,
	Ofir Gal
  Cc: dhowells, edumazet, kuba, pabeni, kbusch, hch, philipp.reisner,
	lars.ellenberg, christoph.boehmwalder, idryomov, xiubli



On 23/07/2024 16:30, Jens Axboe wrote:
> On Thu, 18 Jul 2024 11:45:11 +0300, Ofir Gal wrote:
>> skb_splice_from_iter() warns on !sendpage_ok() which results in nvme-tcp
>> data transfer failure. This warning leads to hanging IO.
>>
>> nvme-tcp using sendpage_ok() to check the first page of an iterator in
>> order to disable MSG_SPLICE_PAGES. The iterator can represent a list of
>> contiguous pages.
>>
>> [...]
> Applied, thanks!
>
> [1/3] net: introduce helper sendpages_ok()
>        commit: 80b272a6f50b2a76f7d2c71a5c097c56d103a9ed
> [2/3] nvme-tcp: use sendpages_ok() instead of sendpage_ok()
>        commit: 41669803e5001f674083c9c176a4749eb1abbe29
> [3/3] drbd: use sendpages_ok() instead of sendpage_ok()
>        commit: e601087934f178a9a9ae8f5a3938b4aa76379ea1
>
> Best regards,

Thanks Jens.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-07-23 17:51 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-18  8:45 [PATCH v5 0/3] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages Ofir Gal
2024-07-18  8:45 ` [PATCH v5 1/3] net: introduce helper sendpages_ok() Ofir Gal
2024-07-23  0:45   ` Jakub Kicinski
2024-07-18  8:45 ` [PATCH v5 2/3] nvme-tcp: use sendpages_ok() instead of sendpage_ok() Ofir Gal
2024-07-18  8:45 ` [PATCH v5 3/3] drbd: " Ofir Gal
2024-07-22 22:21 ` [PATCH v5 0/3] bugfix: Introduce sendpages_ok() to check sendpage_ok() on contiguous pages Jens Axboe
2024-07-23  0:45   ` Jakub Kicinski
2024-07-23  0:46     ` Jakub Kicinski
2024-07-23 13:30 ` Jens Axboe
2024-07-23 17:51   ` Sagi Grimberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).