From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alvaro Karsz <alvaro.karsz@solid-run.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"virtualization@lists.linux-foundation.org"
<virtualization@lists.linux-foundation.org>,
"edumazet@google.com" <edumazet@google.com>,
"kuba@kernel.org" <kuba@kernel.org>,
"pabeni@redhat.com" <pabeni@redhat.com>,
"davem@davemloft.net" <davem@davemloft.net>
Subject: Re: [PATCH net] virtio-net: reject small vring sizes
Date: Tue, 25 Apr 2023 08:31:54 -0400 [thread overview]
Message-ID: <20230425082150-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <AM0PR04MB4723CE2A9B8BFA7963A66A98D4649@AM0PR04MB4723.eurprd04.prod.outlook.com>
On Tue, Apr 25, 2023 at 09:41:35AM +0000, Alvaro Karsz wrote:
> > So, let's add some funky flags in virtio device to block out
> > features, have core compare these before and after,
> > detect change, reset and retry?
>
> In the virtnet case, we'll decide which features to block based on the ring size.
> 2 < ring < MAX_FRAGS + 2 -> BLOCK GRO + MRG_RXBUF
> ring < 2 -> BLOCK GRO + MRG_RXBUF + CTRL_VQ
why MRG_RXBUF? what does it matter?
> So we'll need a new virtio callback instead of flags.
> Furthermore, other virtio drivers may decide which features to block based on parameters different than ring size (I don't have a good example at the moment).
> So maybe we should leave it to the driver to handle (during probe), and offer a virtio core function to re-negotiate the features?
>
> In the solution I'm working on, I expose a new virtio core function that resets the device and renegotiates the received features.
> + A new virtio_config_ops callback peek_vqs_len to peek at the VQ lengths before calling find_vqs. (The callback must be called after the features negotiation)
>
> So, the flow is something like:
>
> * Super early in virtnet probe, we peek at the VQ lengths and decide if we are
> using small vrings, if so, we reset and renegotiate the features.
Using which APIs? What does peek_vqs_len do and why does it matter that
it is super early?
> * We continue normally and create the VQs.
> * We check if the created rings are small.
> If they are and some blocked features were negotiated anyway (may occur if
> the re-negotiation fails, or if the transport has no implementation for
> peek_vqs_len), we fail probe.
> If the ring is small and the features are ok, we mark the virtnet device as
> vring_small and fixup some variables.
>
>
> peek_vqs_len is needed because we must know the VQ length before calling init_vqs.
>
> During virtnet_find_vqs we check the following:
> vi->has_cvq
> vi->big_packets
> vi->mergeable_rx_bufs
>
> But these will change if the ring is small..
>
> (Of course, another solution will be to re-negotiate features after init_vqs, but this will make a big mess, tons of things to clean and reconfigure)
>
>
> The 2 < ring < MAX_FRAGS + 2 part is ready, I have tested a few cases and it is working.
>
> I'm considering splitting the effort into 2 series.
> A 2 < ring < MAX_FRAGS + 2 series, and a follow up series with the ring < 2 case.
>
> I'm also thinking about sending the first series as an RFC soon, so it will be more broadly tested.
>
> What do you think?
Lots of work spilling over to transports.
And I especially don't like that it slows down boot on good path.
I have the following idea:
- add a blocked features value in virtio_device
- before calling probe, core saves blocked features
- if probe fails, checks blocked features.
if any were added, reset, negotiate all features
except blocked ones and do the validate/probe dance again
This will mean mostly no changes to drivers: just check condition,
block feature and fail probe.
--
MST
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
WARNING: multiple messages have this Message-ID (diff)
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alvaro Karsz <alvaro.karsz@solid-run.com>
Cc: Jason Wang <jasowang@redhat.com>,
"davem@davemloft.net" <davem@davemloft.net>,
"edumazet@google.com" <edumazet@google.com>,
"kuba@kernel.org" <kuba@kernel.org>,
"pabeni@redhat.com" <pabeni@redhat.com>,
"virtualization@lists.linux-foundation.org"
<virtualization@lists.linux-foundation.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH net] virtio-net: reject small vring sizes
Date: Tue, 25 Apr 2023 08:31:54 -0400 [thread overview]
Message-ID: <20230425082150-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <AM0PR04MB4723CE2A9B8BFA7963A66A98D4649@AM0PR04MB4723.eurprd04.prod.outlook.com>
On Tue, Apr 25, 2023 at 09:41:35AM +0000, Alvaro Karsz wrote:
> > So, let's add some funky flags in virtio device to block out
> > features, have core compare these before and after,
> > detect change, reset and retry?
>
> In the virtnet case, we'll decide which features to block based on the ring size.
> 2 < ring < MAX_FRAGS + 2 -> BLOCK GRO + MRG_RXBUF
> ring < 2 -> BLOCK GRO + MRG_RXBUF + CTRL_VQ
why MRG_RXBUF? what does it matter?
> So we'll need a new virtio callback instead of flags.
> Furthermore, other virtio drivers may decide which features to block based on parameters different than ring size (I don't have a good example at the moment).
> So maybe we should leave it to the driver to handle (during probe), and offer a virtio core function to re-negotiate the features?
>
> In the solution I'm working on, I expose a new virtio core function that resets the device and renegotiates the received features.
> + A new virtio_config_ops callback peek_vqs_len to peek at the VQ lengths before calling find_vqs. (The callback must be called after the features negotiation)
>
> So, the flow is something like:
>
> * Super early in virtnet probe, we peek at the VQ lengths and decide if we are
> using small vrings, if so, we reset and renegotiate the features.
Using which APIs? What does peek_vqs_len do and why does it matter that
it is super early?
> * We continue normally and create the VQs.
> * We check if the created rings are small.
> If they are and some blocked features were negotiated anyway (may occur if
> the re-negotiation fails, or if the transport has no implementation for
> peek_vqs_len), we fail probe.
> If the ring is small and the features are ok, we mark the virtnet device as
> vring_small and fixup some variables.
>
>
> peek_vqs_len is needed because we must know the VQ length before calling init_vqs.
>
> During virtnet_find_vqs we check the following:
> vi->has_cvq
> vi->big_packets
> vi->mergeable_rx_bufs
>
> But these will change if the ring is small..
>
> (Of course, another solution will be to re-negotiate features after init_vqs, but this will make a big mess, tons of things to clean and reconfigure)
>
>
> The 2 < ring < MAX_FRAGS + 2 part is ready, I have tested a few cases and it is working.
>
> I'm considering splitting the effort into 2 series.
> A 2 < ring < MAX_FRAGS + 2 series, and a follow up series with the ring < 2 case.
>
> I'm also thinking about sending the first series as an RFC soon, so it will be more broadly tested.
>
> What do you think?
Lots of work spilling over to transports.
And I especially don't like that it slows down boot on good path.
I have the following idea:
- add a blocked features value in virtio_device
- before calling probe, core saves blocked features
- if probe fails, checks blocked features.
if any were added, reset, negotiate all features
except blocked ones and do the validate/probe dance again
This will mean mostly no changes to drivers: just check condition,
block feature and fail probe.
--
MST
next prev parent reply other threads:[~2023-04-25 12:32 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-16 7:46 [PATCH net] virtio-net: reject small vring sizes Alvaro Karsz
2023-04-16 7:46 ` Alvaro Karsz
2023-04-16 16:54 ` Alvaro Karsz
2023-04-16 16:54 ` Alvaro Karsz
2023-04-16 20:45 ` Michael S. Tsirkin
2023-04-16 20:45 ` Michael S. Tsirkin
2023-04-17 3:24 ` Jason Wang
2023-04-17 3:24 ` Jason Wang
2023-04-17 6:20 ` Michael S. Tsirkin
2023-04-17 6:20 ` Michael S. Tsirkin
2023-04-17 6:38 ` Alvaro Karsz
2023-04-17 6:38 ` Alvaro Karsz
2023-04-17 6:41 ` Michael S. Tsirkin
2023-04-17 6:41 ` Michael S. Tsirkin
2023-04-17 7:03 ` Alvaro Karsz
2023-04-17 7:03 ` Alvaro Karsz
2023-04-17 7:10 ` Michael S. Tsirkin
2023-04-17 7:10 ` Michael S. Tsirkin
2023-04-17 7:33 ` Alvaro Karsz
2023-04-17 7:33 ` Alvaro Karsz
2023-04-17 9:20 ` Michael S. Tsirkin
2023-04-17 9:20 ` Michael S. Tsirkin
2023-04-17 10:04 ` Alvaro Karsz
2023-04-17 10:04 ` Alvaro Karsz
2023-04-17 11:40 ` Michael S. Tsirkin
2023-04-17 11:40 ` Michael S. Tsirkin
2023-04-17 11:51 ` Alvaro Karsz
2023-04-17 11:51 ` Alvaro Karsz
2023-04-17 11:57 ` Michael S. Tsirkin
2023-04-17 11:57 ` Michael S. Tsirkin
2023-04-23 6:51 ` Alvaro Karsz
2023-04-23 6:51 ` Alvaro Karsz
2023-04-23 7:19 ` Michael S. Tsirkin
2023-04-23 7:19 ` Michael S. Tsirkin
2023-04-23 7:52 ` Alvaro Karsz
2023-04-23 7:52 ` Alvaro Karsz
2023-04-23 11:06 ` Michael S. Tsirkin
2023-04-23 11:06 ` Michael S. Tsirkin
2023-04-23 12:28 ` Alvaro Karsz
2023-04-23 12:28 ` Alvaro Karsz
2023-04-23 20:17 ` Michael S. Tsirkin
2023-04-23 20:17 ` Michael S. Tsirkin
2023-04-25 8:34 ` Michael S. Tsirkin
2023-04-25 8:34 ` Michael S. Tsirkin
2023-04-25 9:41 ` Alvaro Karsz
2023-04-25 9:41 ` Alvaro Karsz
2023-04-25 11:11 ` Alvaro Karsz
2023-04-25 11:11 ` Alvaro Karsz
2023-04-25 12:33 ` Michael S. Tsirkin
2023-04-25 12:33 ` Michael S. Tsirkin
2023-04-25 12:31 ` Michael S. Tsirkin [this message]
2023-04-25 12:31 ` Michael S. Tsirkin
2023-04-25 13:02 ` Alvaro Karsz
2023-04-25 13:02 ` Alvaro Karsz
2023-04-25 13:08 ` Michael S. Tsirkin
2023-04-25 13:08 ` Michael S. Tsirkin
2023-04-23 8:01 ` Alvaro Karsz
2023-04-23 8:01 ` Alvaro Karsz
2023-04-23 11:08 ` Michael S. Tsirkin
2023-04-23 11:08 ` Michael S. Tsirkin
2023-04-17 6:44 ` Xuan Zhuo
2023-04-17 6:44 ` Xuan Zhuo
2023-04-17 7:07 ` Alvaro Karsz
2023-04-17 7:07 ` Alvaro Karsz
2023-04-17 7:11 ` Michael S. Tsirkin
2023-04-17 7:11 ` Michael S. Tsirkin
2023-04-16 20:38 ` Michael S. Tsirkin
2023-04-16 20:38 ` Michael S. Tsirkin
2023-04-17 6:43 ` Alvaro Karsz
2023-04-17 6:43 ` Alvaro Karsz
2023-04-23 11:09 ` Michael S. Tsirkin
2023-04-23 11:09 ` Michael S. Tsirkin
2023-04-17 1:53 ` Xuan Zhuo
2023-04-17 1:53 ` Xuan Zhuo
2023-04-17 6:47 ` Alvaro Karsz
2023-04-17 6:47 ` Alvaro Karsz
2023-04-17 3:34 ` Xuan Zhuo
2023-04-17 3:34 ` Xuan Zhuo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230425082150-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=alvaro.karsz@solid-run.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=virtualization@lists.linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.