* [RFC PATCH 0/1] vmxnet3: Adjust maximum Rx ring buffer size
From: Aaron Tomlin @ 2025-01-05 21:30 UTC
To: ronak.doshi, andrew+netdev, davem, edumazet, kuba, pabeni
Cc: bcm-kernel-feedback-list, netdev, linux-kernel, atomlin

Hi Ronak, Paolo,

I managed to trigger the MAX_PAGE_ORDER warning in the context of function
__alloc_pages_noprof() with '/usr/sbin/ethtool --set-ring rx 4096 rx-mini
2048 [devname]' using the maximum supported Ring 0 and Rx ring buffer size.
Admittedly this was under the stock Linux kernel-4.18.0-477.27.1.el8_8
whereby CONFIG_CMA is not enabled. I think it does not make sense to
attempt a large memory allocation request for physically contiguous memory,
to hold the Rx Data ring that could exceed the maximum page-order supported
by the system.

I am not familiar with drivers/net/vmxnet3 related code.
Please let me know your thoughts.

Thank you.

Aaron Tomlin (1):
  vmxnet3: Adjust maximum Rx ring buffer size

 drivers/net/vmxnet3/vmxnet3_defs.h | 4 ++++
 1 file changed, 4 insertions(+)

--
2.47.1
* [RFC PATCH 1/1] vmxnet3: Adjust maximum Rx ring buffer size
From: Aaron Tomlin @ 2025-01-05 21:30 UTC
To: ronak.doshi, andrew+netdev, davem, edumazet, kuba, pabeni
Cc: bcm-kernel-feedback-list, netdev, linux-kernel, atomlin

In the context of vmxnet3_rq_create(), the Rx Data ring's size is
calculated by multiplying the size of Ring 0 by the size of the Rx ring
buffer. See __dma_direct_alloc_pages(). Now if CMA (Contiguous Memory
Allocator) is not available or the allocation attempt failed, the zone
buddy allocator is used to try to allocate physically contiguous memory.

The problem is, when the maximum supported Ring 0 and Rx ring buffer size
is selected, the page-order required to accommodate the new size of the
Rx Data ring is greater than the default MAX_PAGE_ORDER (10) i.e.
__get_order(4096 * 2048) == 11. Consequently, this request can trigger
the following warning condition in __alloc_pages_noprof():

    if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
        return NULL;

This patch ensures that the maximum Rx ring buffer size is reduced under
a Linux kernel without CMA (Contiguous Memory Allocator) support. There
is no point attempting a large memory allocation request that could
exceed the maximum page-order supported by the system.

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
 drivers/net/vmxnet3/vmxnet3_defs.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/vmxnet3/vmxnet3_defs.h b/drivers/net/vmxnet3/vmxnet3_defs.h
index 5c5148768039..cc71e697a5f3 100644
--- a/drivers/net/vmxnet3/vmxnet3_defs.h
+++ b/drivers/net/vmxnet3/vmxnet3_defs.h
@@ -466,7 +466,11 @@ union Vmxnet3_GenericDesc {
 #define VMXNET3_TXDATA_DESC_MIN_SIZE 128
 #define VMXNET3_TXDATA_DESC_MAX_SIZE 2048
 
+#if defined(CONFIG_DMA_CMA)
 #define VMXNET3_RXDATA_DESC_MAX_SIZE 2048
+#else
+#define VMXNET3_RXDATA_DESC_MAX_SIZE 1024
+#endif
 
 #define VMXNET3_TXTS_DESC_MAX_SIZE 256
 #define VMXNET3_RXTS_DESC_MAX_SIZE 256
--
2.47.1
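
The order arithmetic in the patch description can be checked with a small
user-space sketch. It assumes 4 KiB pages (a page shift of 12) and the
default MAX_PAGE_ORDER of 10 quoted above. The ex_* names are hypothetical
helpers written for this illustration, not kernel or driver functions;
ex_get_order() only mimics what the kernel's get_order() computes.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for illustration only: 4 KiB pages and the default
 * MAX_PAGE_ORDER of 10 mentioned in the patch description. */
#define EX_PAGE_SHIFT     12
#define EX_MAX_PAGE_ORDER 10

/* Smallest order such that (1 << order) pages cover `size` bytes.
 * Hypothetical user-space stand-in for the kernel's get_order(). */
static unsigned int ex_get_order(size_t size)
{
	unsigned int order = 0;
	size_t pages;

	assert(size > 0);
	/* Round the byte count up to whole pages. */
	pages = (size + ((size_t)1 << EX_PAGE_SHIFT) - 1) >> EX_PAGE_SHIFT;
	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/* Rx Data ring size as described in the patch:
 * Ring 0 entries multiplied by the Rx data buffer size. */
static size_t ex_rxdata_ring_bytes(size_t ring0_entries, size_t desc_size)
{
	return ring0_entries * desc_size;
}

/* Non-zero when the buddy allocator could satisfy the request at all. */
static int ex_fits_buddy_allocator(size_t bytes)
{
	return ex_get_order(bytes) <= EX_MAX_PAGE_ORDER;
}
```

Under these assumptions, ex_get_order(ex_rxdata_ring_bytes(4096, 2048))
evaluates to 11, one above the limit, while a buffer size of 1024 gives
order 10, which is consistent with the patch choosing 1024 as the reduced
maximum.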
* Re: [RFC PATCH 0/1] vmxnet3: Adjust maximum Rx ring buffer size
From: Jakub Kicinski @ 2025-01-06 23:47 UTC
To: Aaron Tomlin
Cc: ronak.doshi, andrew+netdev, davem, edumazet, pabeni,
    bcm-kernel-feedback-list, netdev, linux-kernel

On Sun, 5 Jan 2025 21:30:35 +0000 Aaron Tomlin wrote:
> I managed to trigger the MAX_PAGE_ORDER warning in the context of function
> __alloc_pages_noprof() with '/usr/sbin/ethtool --set-ring rx 4096 rx-mini
> 2048 [devname]' using the maximum supported Ring 0 and Rx ring buffer size.
> Admittedly this was under the stock Linux kernel-4.18.0-477.27.1.el8_8
> whereby CONFIG_CMA is not enabled. I think it does not make sense to
> attempt a large memory allocation request for physically contiguous memory,
> to hold the Rx Data ring that could exceed the maximum page-order supported
> by the system.

I think CMA should be a bit orthogonal to the warning.

Off the top of my head the usual way to solve the warning is to add
__GFP_NOWARN to the allocations which trigger it. And then handle
the error gracefully.
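
The pattern suggested here (suppress the allocator warning, then cope with
the failed allocation) can be modelled in user space. This is only a
sketch under stated assumptions: EX_NOWARN is a hypothetical stand-in for
__GFP_NOWARN, ex_alloc_contig() is a mock allocator that refuses requests
above an assumed 4 MiB limit (order 10 with 4 KiB pages), and
ex_rq_create() is an invented helper, not the driver's
vmxnet3_rq_create().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed limit: order-10 allocation with 4 KiB pages (4 MiB). */
#define EX_MAX_ALLOC_BYTES ((size_t)4 << 20)
/* Hypothetical stand-in for __GFP_NOWARN. */
#define EX_NOWARN 0x1u

struct ex_rx_queue {
	void *data_ring;
	bool data_ring_enabled;
};

/* Mock allocator: oversized requests fail, and warn unless EX_NOWARN
 * is passed by the caller. */
static void *ex_alloc_contig(size_t bytes, unsigned int flags)
{
	if (bytes > EX_MAX_ALLOC_BYTES) {
		if (!(flags & EX_NOWARN))
			fprintf(stderr, "WARNING: %zu bytes exceeds the limit\n",
				bytes);
		return NULL;
	}
	return malloc(bytes);
}

/* Try to allocate the Rx Data ring quietly; on failure, report it once
 * in plain terms and carry on with the feature disabled. */
static int ex_rq_create(struct ex_rx_queue *rq, size_t ring0_entries,
			size_t desc_size)
{
	assert(rq != NULL);
	rq->data_ring = ex_alloc_contig(ring0_entries * desc_size, EX_NOWARN);
	rq->data_ring_enabled = (rq->data_ring != NULL);
	if (!rq->data_ring_enabled)
		fprintf(stderr, "rx data ring will be disabled\n");
	return 0;
}

/* Release the ring, if any, and reset the queue state. */
static void ex_rq_destroy(struct ex_rx_queue *rq)
{
	free(rq->data_ring);
	rq->data_ring = NULL;
	rq->data_ring_enabled = false;
}
```

In this model, a request for 4096 entries of 2048 bytes fails silently at
the allocator and leaves the queue usable with the data ring disabled,
whereas 4096 entries of 1024 bytes succeeds.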
* Re: [RFC PATCH 0/1] vmxnet3: Adjust maximum Rx ring buffer size
From: Florian Fainelli @ 2025-01-06 23:51 UTC
To: Jakub Kicinski, Aaron Tomlin
Cc: ronak.doshi, andrew+netdev, davem, edumazet, pabeni,
    bcm-kernel-feedback-list, netdev, linux-kernel

On 1/6/25 15:47, 'Jakub Kicinski' via BCM-KERNEL-FEEDBACK-LIST,PDL wrote:
> On Sun, 5 Jan 2025 21:30:35 +0000 Aaron Tomlin wrote:
>> I managed to trigger the MAX_PAGE_ORDER warning in the context of function
>> __alloc_pages_noprof() with '/usr/sbin/ethtool --set-ring rx 4096 rx-mini
>> 2048 [devname]' using the maximum supported Ring 0 and Rx ring buffer size.
>> Admittedly this was under the stock Linux kernel-4.18.0-477.27.1.el8_8
>> whereby CONFIG_CMA is not enabled. I think it does not make sense to
>> attempt a large memory allocation request for physically contiguous memory,
>> to hold the Rx Data ring that could exceed the maximum page-order supported
>> by the system.
>
> I think CMA should be a bit orthogonal to the warning.
>
> Off the top of my head the usual way to solve the warning is to add
> __GFP_NOWARN to the allocations which trigger it. And then handle
> the error gracefully.

That IMHO should really be the default for any driver that calls
__netdev_alloc_skb() under the hood, we should not really have to
specify __GFP_NOWARN, rather if people want it, they should specify it.
--
Florian
* Re: [RFC PATCH 0/1] vmxnet3: Adjust maximum Rx ring buffer size
From: Jakub Kicinski @ 2025-01-07 0:57 UTC
To: Florian Fainelli
Cc: Aaron Tomlin, ronak.doshi, andrew+netdev, davem, edumazet, pabeni,
    bcm-kernel-feedback-list, netdev, linux-kernel

On Mon, 6 Jan 2025 15:51:10 -0800 Florian Fainelli wrote:
> On 1/6/25 15:47, 'Jakub Kicinski' via BCM-KERNEL-FEEDBACK-LIST,PDL wrote:
> > On Sun, 5 Jan 2025 21:30:35 +0000 Aaron Tomlin wrote:
> >> I managed to trigger the MAX_PAGE_ORDER warning in the context of function
> >> __alloc_pages_noprof() with '/usr/sbin/ethtool --set-ring rx 4096 rx-mini
> >> 2048 [devname]' using the maximum supported Ring 0 and Rx ring buffer size.
> >> Admittedly this was under the stock Linux kernel-4.18.0-477.27.1.el8_8
> >> whereby CONFIG_CMA is not enabled. I think it does not make sense to
> >> attempt a large memory allocation request for physically contiguous memory,
> >> to hold the Rx Data ring that could exceed the maximum page-order supported
> >> by the system.
> >
> > I think CMA should be a bit orthogonal to the warning.
> >
> > Off the top of my head the usual way to solve the warning is to add
> > __GFP_NOWARN to the allocations which trigger it. And then handle
> > the error gracefully.
>
> That IMHO should really be the default for any driver that calls
> __netdev_alloc_skb() under the hood, we should not really have to
> specify __GFP_NOWARN, rather if people want it, they should specify it.

True, although TBH I don't fully understand why this flag exists
in the first place. Is it just supposed to be catching programming
errors, or is it due to potential DoS implications of users triggering
large allocations?
* Re: [RFC PATCH 0/1] vmxnet3: Adjust maximum Rx ring buffer size
From: Aaron Tomlin @ 2025-01-07 22:55 UTC
To: Jakub Kicinski
Cc: Florian Fainelli, Aaron Tomlin, ronak.doshi, andrew+netdev, davem,
    edumazet, pabeni, bcm-kernel-feedback-list, netdev, linux-kernel

On Tue, 7 Jan 2025, Jakub Kicinski wrote:
> True, although TBH I don't fully understand why this flag exists
> in the first place. Is it just supposed to be catching programming
> errors, or is it due to potential DoS implications of users triggering
> large allocations?

Jakub,

I suspect that introducing __GFP_NOWARN would mask the issue, no?
I think the warning was useful. Otherwise it would be rather difficult to
establish precisely why the Rx Data ring was disabled. In this particular
case, if I understand correctly, the intended size of the Rx Data ring was
simply too large due to the size of the maximum supported Rx Data buffer.

--
Aaron Tomlin
* Re: [RFC PATCH 0/1] vmxnet3: Adjust maximum Rx ring buffer size
From: Jakub Kicinski @ 2025-01-07 23:46 UTC
To: Aaron Tomlin
Cc: Florian Fainelli, ronak.doshi, andrew+netdev, davem, edumazet, pabeni,
    bcm-kernel-feedback-list, netdev, linux-kernel

On Tue, 7 Jan 2025 22:55:38 +0000 (GMT) Aaron Tomlin wrote:
> On Tue, 7 Jan 2025, Jakub Kicinski wrote:
> > True, although TBH I don't fully understand why this flag exists
> > in the first place. Is it just supposed to be catching programming
> > errors, or is it due to potential DoS implications of users triggering
> > large allocations?
>
> Jakub,
>
> I suspect that introducing __GFP_NOWARN would mask the issue, no?
> I think the warning was useful. Otherwise it would be rather difficult to
> establish precisely why the Rx Data ring was disabled. In this particular
> case, if I understand correctly, the intended size of the Rx Data ring was
> simply too large due to the size of the maximum supported Rx Data buffer.

This is a bit of a weird driver. But we should distinguish the default
ring size, which yes, should not be too large, and max ring size which
can be large but user setting a large size risks the fact the
allocations will fail and device will not open.

This driver seems to read the default size from the hypervisor, is that
the value that is too large in your case? Maybe we should min() it with
something reasonable? The max allowed to be set via ethtool can remain
high IMO
* Re: [RFC PATCH 0/1] vmxnet3: Adjust maximum Rx ring buffer size
From: Ronak Doshi @ 2025-01-08 17:24 UTC
To: Jakub Kicinski
Cc: Aaron Tomlin, Florian Fainelli, andrew+netdev, davem, edumazet, pabeni,
    bcm-kernel-feedback-list, netdev, linux-kernel

On Tue, Jan 7, 2025 at 3:46 PM Jakub Kicinski <kuba@kernel.org> wrote:
> This driver seems to read the default size from the hypervisor, is that
> the value that is too large in your case?

The default should be 128 which is way less than max value.

Thanks,
Ronak
* Re: [RFC PATCH 0/1] vmxnet3: Adjust maximum Rx ring buffer size
From: Aaron Tomlin @ 2025-01-08 21:05 UTC
To: Jakub Kicinski
Cc: Aaron Tomlin, Florian Fainelli, ronak.doshi, andrew+netdev, davem,
    edumazet, pabeni, bcm-kernel-feedback-list, netdev, linux-kernel

On Tue, 7 Jan 2025, Jakub Kicinski wrote:
> This is a bit of a weird driver. But we should distinguish the default
> ring size, which yes, should not be too large, and max ring size which
> can be large but user setting a large size risks the fact the
> allocations will fail and device will not open.
>
> This driver seems to read the default size from the hypervisor, is that
> the value that is too large in your case? Maybe we should min() it with
> something reasonable? The max allowed to be set via ethtool can remain
> high IMO
>

See vmxnet3_get_ringparam(). If I understand correctly, since commit
50a5ce3e7116a ("vmxnet3: add receive data ring support"), if the specified
VMXNET3 adapter has support for the Rx Data ring feature then the maximum
Rx Data buffer size is reported as VMXNET3_RXDATA_DESC_MAX_SIZE (i.e. 2048)
by 'ethtool'. Furthermore, see vmxnet3_set_ringparam(). A user specified Rx
mini value cannot be more than VMXNET3_RXDATA_DESC_MAX_SIZE. Indeed the Rx
mini value in the context of VMXNET3 would be the size of the Rx Data ring
buffer. See the following excerpt from vmxnet3_set_ringparam(). As far as I
can tell, the Rx Data buffer cannot be more than
VMXNET3_RXDATA_DESC_MAX_SIZE:

    static int
    vmxnet3_set_ringparam(struct net_device *netdev,
                          struct ethtool_ringparam *param,
                          struct kernel_ethtool_ringparam *kernel_param,
                          struct netlink_ext_ack *extack)
    {
     :
            new_rxdata_desc_size =
                    (param->rx_mini_pending + VMXNET3_RXDATA_DESC_SIZE_MASK) &
                    ~VMXNET3_RXDATA_DESC_SIZE_MASK;
            new_rxdata_desc_size = min_t(u16, new_rxdata_desc_size,
                                         VMXNET3_RXDATA_DESC_MAX_SIZE);

Have I missed something?

--
Aaron Tomlin
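
The round-up-and-clamp step in that excerpt can be tried out in user
space. The sketch below is an illustration, not driver code: the maximum
of 2048 comes from the thread, but the mask value of 63 (64-byte
alignment) is an assumption, and ex_round_rxdata_desc_size() is a
hypothetical helper name. It clamps in 32-bit arithmetic before narrowing,
which differs slightly from the excerpt's min_t(u16, ...).

```c
#include <assert.h>
#include <stdint.h>

/* The maximum of 2048 is quoted in the thread. The mask of 63
 * (64-byte alignment) is an ASSUMPTION made for this illustration. */
#define EX_RXDATA_DESC_SIZE_MASK 63u
#define EX_RXDATA_DESC_MAX_SIZE  2048u

/* Round the requested rx-mini value up to the next multiple of
 * (mask + 1), then clamp it to the maximum buffer size. */
static uint16_t ex_round_rxdata_desc_size(uint32_t rx_mini_pending)
{
	uint32_t rounded;

	/* The add-and-mask trick only works for power-of-two alignment. */
	assert(((EX_RXDATA_DESC_SIZE_MASK + 1u) & EX_RXDATA_DESC_SIZE_MASK) == 0);

	rounded = (rx_mini_pending + EX_RXDATA_DESC_SIZE_MASK) &
		  ~EX_RXDATA_DESC_SIZE_MASK;
	if (rounded > EX_RXDATA_DESC_MAX_SIZE)
		rounded = EX_RXDATA_DESC_MAX_SIZE;
	return (uint16_t)rounded;
}
```

With these assumed constants a request of 100 is rounded up to 128, a
request of 2048 is left unchanged, and anything larger is capped at 2048,
so the resulting buffer size never exceeds the maximum.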
* Re: [RFC PATCH 0/1] vmxnet3: Adjust maximum Rx ring buffer size
From: Aaron Tomlin @ 2025-01-15 20:55 UTC
To: Jakub Kicinski
Cc: Florian Fainelli, ronak.doshi, andrew+netdev, davem, edumazet, pabeni,
    bcm-kernel-feedback-list, netdev, linux-kernel

On Wed, Jan 08, 2025 at 09:05:15PM +0000, Aaron Tomlin wrote:
> On Tue, 7 Jan 2025, Jakub Kicinski wrote:
> > This is a bit of a weird driver. But we should distinguish the default
> > ring size, which yes, should not be too large, and max ring size which
> > can be large but user setting a large size risks the fact the
> > allocations will fail and device will not open.
> >
> > This driver seems to read the default size from the hypervisor, is that
> > the value that is too large in your case? Maybe we should min() it with
> > something reasonable? The max allowed to be set via ethtool can remain
> > high IMO
> >
>
> See vmxnet3_get_ringparam(). If I understand correctly, since commit
> 50a5ce3e7116a ("vmxnet3: add receive data ring support"), if the specified
> VMXNET3 adapter has support for the Rx Data ring feature then the maximum
> Rx Data buffer size is reported as VMXNET3_RXDATA_DESC_MAX_SIZE (i.e. 2048)
> by 'ethtool'. Furthermore, see vmxnet3_set_ringparam(). A user specified Rx
> mini value cannot be more than VMXNET3_RXDATA_DESC_MAX_SIZE. Indeed the Rx
> mini value in the context of VMXNET3 would be the size of the Rx Data ring
> buffer. See the following excerpt from vmxnet3_set_ringparam(). As far as I
> can tell, the Rx Data buffer cannot be more than
> VMXNET3_RXDATA_DESC_MAX_SIZE:
>
>     static int
>     vmxnet3_set_ringparam(struct net_device *netdev,
>                           struct ethtool_ringparam *param,
>                           struct kernel_ethtool_ringparam *kernel_param,
>                           struct netlink_ext_ack *extack)
>     {
>      :
>             new_rxdata_desc_size =
>                     (param->rx_mini_pending + VMXNET3_RXDATA_DESC_SIZE_MASK) &
>                     ~VMXNET3_RXDATA_DESC_SIZE_MASK;
>             new_rxdata_desc_size = min_t(u16, new_rxdata_desc_size,
>                                          VMXNET3_RXDATA_DESC_MAX_SIZE);
>
>
> Have I missed something?

Any thoughts?

Kind regards,
--
Aaron Tomlin
* Re: [RFC PATCH 0/1] vmxnet3: Adjust maximum Rx ring buffer size
From: Florian Fainelli @ 2025-01-08 16:53 UTC
To: Jakub Kicinski, Florian Fainelli
Cc: Aaron Tomlin, ronak.doshi, andrew+netdev, davem, edumazet, pabeni,
    bcm-kernel-feedback-list, netdev, linux-kernel

On 1/6/25 16:57, Jakub Kicinski wrote:
> On Mon, 6 Jan 2025 15:51:10 -0800 Florian Fainelli wrote:
>> On 1/6/25 15:47, 'Jakub Kicinski' via BCM-KERNEL-FEEDBACK-LIST,PDL wrote:
>>> On Sun, 5 Jan 2025 21:30:35 +0000 Aaron Tomlin wrote:
>>>> I managed to trigger the MAX_PAGE_ORDER warning in the context of function
>>>> __alloc_pages_noprof() with '/usr/sbin/ethtool --set-ring rx 4096 rx-mini
>>>> 2048 [devname]' using the maximum supported Ring 0 and Rx ring buffer size.
>>>> Admittedly this was under the stock Linux kernel-4.18.0-477.27.1.el8_8
>>>> whereby CONFIG_CMA is not enabled. I think it does not make sense to
>>>> attempt a large memory allocation request for physically contiguous memory,
>>>> to hold the Rx Data ring that could exceed the maximum page-order supported
>>>> by the system.
>>>
>>> I think CMA should be a bit orthogonal to the warning.
>>>
>>> Off the top of my head the usual way to solve the warning is to add
>>> __GFP_NOWARN to the allocations which trigger it. And then handle
>>> the error gracefully.
>>
>> That IMHO should really be the default for any driver that calls
>> __netdev_alloc_skb() under the hood, we should not really have to
>> specify __GFP_NOWARN, rather if people want it, they should specify it.
>
> True, although TBH I don't fully understand why this flag exists
> in the first place. Is it just supposed to be catching programming
> errors, or is it due to potential DoS implications of users triggering
> large allocations?
>

There is some value IMHO in printing when allocations fail, where they
came from, their gfp_t flags and page order so you can track high order
offenders in hot paths (one of our Wi-Fi drivers was notorious for doing
that and having verbose out of memory dumps by default definitely helped).

Once you fix those however, hogging the system while dumping lines and
lines of information onto a slow console tends to be worse than the
recovery from out of memory itself. One could argue that triggering an
OOM plus dumping information can result in a DoS, so that should be
frowned upon...
--
Florian
Thread overview: 11+ messages
2025-01-05 21:30 [RFC PATCH 0/1] vmxnet3: Adjust maximum Rx ring buffer size Aaron Tomlin
2025-01-05 21:30 ` [RFC PATCH 1/1] " Aaron Tomlin
2025-01-06 23:47 ` [RFC PATCH 0/1] " Jakub Kicinski
2025-01-06 23:51   ` Florian Fainelli
2025-01-07  0:57     ` Jakub Kicinski
2025-01-07 22:55       ` Aaron Tomlin
2025-01-07 23:46         ` Jakub Kicinski
     [not found]           ` <CAP1Q3XQ_Fubke4=SYrFkaiJj0RHB99ehdMedMVDTFtRS6R_RCw@mail.gmail.com>
2025-01-08 17:24             ` Ronak Doshi
2025-01-08 21:05           ` Aaron Tomlin
2025-01-15 20:55             ` Aaron Tomlin
2025-01-08 16:53       ` Florian Fainelli