* [PATCH][net-next][v2] rtnetlink: instroduce vnlmsg_new and use it in rtnl_getlink
From: Li RongQing @ 2023-11-14 9:55 UTC
To: davem, edumazet, kuba, pabeni, Liam.Howlett, anjali.k.kulkarni,
leon, fw, shayagr, idosch, razor, linyunsheng, netdev
if a PF has 256 or more VFs, ip link command will allocate a order 3
memory or more, and maybe trigger OOM due to memory fragement,
the VFs needed memory size is computed in rtnl_vfinfo_size.
so instroduce vnlmsg_new which calls netlink_alloc_large_skb in which
vmalloc is used for large memory, to avoid the failure of allocating
memory
ip invoked oom-killer: gfp_mask=0xc2cc0(GFP_KERNEL|__GFP_NOWARN|\
__GFP_COMP|__GFP_NOMEMALLOC), order=3, oom_score_adj=0
CPU: 74 PID: 204414 Comm: ip Kdump: loaded Tainted: P OE
Call Trace:
dump_stack+0x57/0x6a
dump_header+0x4a/0x210
oom_kill_process+0xe4/0x140
out_of_memory+0x3e8/0x790
__alloc_pages_slowpath.constprop.116+0x953/0xc50
__alloc_pages_nodemask+0x2af/0x310
kmalloc_large_node+0x38/0xf0
__kmalloc_node_track_caller+0x417/0x4d0
__kmalloc_reserve.isra.61+0x2e/0x80
__alloc_skb+0x82/0x1c0
rtnl_getlink+0x24f/0x370
rtnetlink_rcv_msg+0x12c/0x350
netlink_rcv_skb+0x50/0x100
netlink_unicast+0x1b2/0x280
netlink_sendmsg+0x355/0x4a0
sock_sendmsg+0x5b/0x60
____sys_sendmsg+0x1ea/0x250
___sys_sendmsg+0x88/0xd0
__sys_sendmsg+0x5e/0xa0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f95a65a5b70
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
diff with v1: not move netlink_alloc_large_skb to skbuff.c
include/linux/netlink.h | 1 +
include/net/netlink.h | 17 +++++++++++++++++
net/core/rtnetlink.c | 2 +-
net/netlink/af_netlink.c | 2 +-
4 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 75d7de3..abe91ed 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -351,5 +351,6 @@ bool netlink_ns_capable(const struct sk_buff *skb,
struct user_namespace *ns, int cap);
bool netlink_capable(const struct sk_buff *skb, int cap);
bool netlink_net_capable(const struct sk_buff *skb, int cap);
+struct sk_buff *netlink_alloc_large_skb(unsigned int size, int broadcast);
#endif /* __LINUX_NETLINK_H */
diff --git a/include/net/netlink.h b/include/net/netlink.h
index 83bdf78..7d31217 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -1011,6 +1011,23 @@ static inline struct sk_buff *nlmsg_new(size_t payload, gfp_t flags)
}
/**
+ * vnlmsg_new - Allocate a new netlink message with non-contiguous
+ * physical memory
+ * @payload: size of the message payload
+ *
+ * Use NLMSG_DEFAULT_SIZE if the size of the payload isn't known
+ * and a good default is needed.
+ *
+ * The allocated skb is unable to have frag page for shinfo->frags*,
+ * as the NULL setting for skb->head in netlink_skb_destructor() will
+ * bypass most of the handling in skb_release_data()
+ */
+static inline struct sk_buff *vnlmsg_new(size_t payload)
+{
+ return netlink_alloc_large_skb(nlmsg_total_size(payload), 0);
+}
+
+/**
* nlmsg_end - Finalize a netlink message
* @skb: socket buffer the message is stored in
* @nlh: netlink message header
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e8431c6..bfae6bf 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3849,7 +3849,7 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
goto out;
err = -ENOBUFS;
- nskb = nlmsg_new(if_nlmsg_size(dev, ext_filter_mask), GFP_KERNEL);
+ nskb = vnlmsg_new(if_nlmsg_size(dev, ext_filter_mask));
if (nskb == NULL)
goto out;
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index eb086b0..17587f1 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1204,7 +1204,7 @@ struct sock *netlink_getsockbyfilp(struct file *filp)
return sock;
}
-static struct sk_buff *netlink_alloc_large_skb(unsigned int size,
+struct sk_buff *netlink_alloc_large_skb(unsigned int size,
int broadcast)
{
struct sk_buff *skb;
--
2.9.4
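The "order 3" in the commit message is the buddy allocator's unit of 2^order physically contiguous pages. The sketch below is a minimal userspace model of the kernel's get_order() arithmetic. It assumes 4 KiB pages, and the helper name and page-size constant are illustrative. It ignores the skb_shared_info overhead that alloc_skb() adds to the head, so a real request can land one order higher than the payload alone suggests. With those assumptions, any request above 16 KiB and up to 32 KiB needs an order-3 block, that is, eight contiguous pages. That is the kind of request memory fragmentation can make fail.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size: 4 KiB, the common x86-64 value. */
#define EXAMPLE_PAGE_SIZE 4096UL

/*
 * Userspace model of the kernel's get_order(): return the smallest order n
 * such that a block of (page_size << n) bytes can hold `size` bytes.
 * Only meaningful for sizes well below SIZE_MAX / 2 (no overflow handling).
 */
static unsigned int size_to_order(size_t size, size_t page_size)
{
	unsigned int order = 0;

	assert(size > 0 && page_size > 0);
	while ((page_size << order) < size)
		order++;
	return order;
}
```

For example, size_to_order(20000, EXAMPLE_PAGE_SIZE) is 3, because 16 KiB < 20000 bytes <= 32 KiB.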
* Re: [PATCH][net-next][v2] rtnetlink: instroduce vnlmsg_new and use it in rtnl_getlink
From: Yunsheng Lin @ 2023-11-14 11:31 UTC
To: Li RongQing, davem, edumazet, kuba, pabeni, Liam.Howlett,
anjali.k.kulkarni, leon, fw, shayagr, idosch, razor, netdev
On 2023/11/14 17:55, Li RongQing wrote:
> if a PF has 256 or more VFs, ip link command will allocate a order 3
> memory or more, and maybe trigger OOM due to memory fragement,
fragement -> fragment?
> the VFs needed memory size is computed in rtnl_vfinfo_size.
>
> so instroduce vnlmsg_new which calls netlink_alloc_large_skb in which
instroduce -> introduce?
> vmalloc is used for large memory, to avoid the failure of allocating
> memory
>
> [...]
> diff --git a/include/net/netlink.h b/include/net/netlink.h
> index 83bdf78..7d31217 100644
> --- a/include/net/netlink.h
> +++ b/include/net/netlink.h
> @@ -1011,6 +1011,23 @@ static inline struct sk_buff *nlmsg_new(size_t payload, gfp_t flags)
> }
>
> /**
> + * vnlmsg_new - Allocate a new netlink message with non-contiguous
> + * physical memory
> + * @payload: size of the message payload
> + *
> + * Use NLMSG_DEFAULT_SIZE if the size of the payload isn't known
> + * and a good default is needed.
> + *
> + * The allocated skb is unable to have frag page for shinfo->frags*,
> + * as the NULL setting for skb->head in netlink_skb_destructor() will
> + * bypass most of the handling in skb_release_data()
> + */
> +static inline struct sk_buff *vnlmsg_new(size_t payload)
> +{
> + return netlink_alloc_large_skb(nlmsg_total_size(payload), 0);
> +}
nlmsg_new() takes the parameters below; vnlmsg_new() has no gfp flags
parameter. Does it always assume GFP_KERNEL?
* @payload: size of the message payload
* @flags: the type of memory to allocate.
There are a lot of callers of nlmsg_new(); I am wondering how many
of the existing nlmsg_new() callers could be changed to use vnlmsg_new().
https://elixir.free-electrons.com/linux/v6.7-rc1/A/ident/nlmsg_new
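Both nlmsg_new() and the proposed vnlmsg_new() size the buffer with nlmsg_total_size(payload), so the @flags argument is the only interface difference between them. The sketch below is a minimal userspace model of that size computation: the netlink header plus the payload, rounded up to 4-byte alignment. The struct and macros are local copies, prefixed example_/EX_, that are assumed to match struct nlmsghdr, NLMSG_ALIGN and NLMSG_HDRLEN in <linux/netlink.h>. They are not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct nlmsghdr from <linux/netlink.h>. */
struct example_nlmsghdr {
	uint32_t nlmsg_len;
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};
_Static_assert(sizeof(struct example_nlmsghdr) == 16, "nlmsghdr is 16 bytes");

/* Mirrors NLMSG_ALIGNTO / NLMSG_ALIGN / NLMSG_HDRLEN (assumed equivalent). */
#define EX_NLMSG_ALIGNTO	4U
#define EX_NLMSG_ALIGN(len) \
	(((len) + EX_NLMSG_ALIGNTO - 1) & ~(size_t)(EX_NLMSG_ALIGNTO - 1))
#define EX_NLMSG_HDRLEN	EX_NLMSG_ALIGN(sizeof(struct example_nlmsghdr))

/* Model of nlmsg_total_size(): header plus payload, rounded up to 4 bytes. */
static size_t example_nlmsg_total_size(size_t payload)
{
	/* Guard against wrap-around in the additions below. */
	assert(payload <= SIZE_MAX - EX_NLMSG_HDRLEN - EX_NLMSG_ALIGNTO);
	return EX_NLMSG_ALIGN(EX_NLMSG_HDRLEN + payload);
}
```

An empty payload still costs the 16-byte header, and a 1-byte payload rounds up to 20 bytes.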
* RE: [PATCH][net-next][v2] rtnetlink: instroduce vnlmsg_new and use it in rtnl_getlink
From: Li,Rongqing @ 2023-11-14 12:02 UTC
To: Yunsheng Lin, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, Liam.Howlett@oracle.com,
anjali.k.kulkarni@oracle.com, leon@kernel.org, fw@strlen.de,
shayagr@amazon.com, idosch@nvidia.com, razor@blackwall.org,
netdev@vger.kernel.org
> -----Original Message-----
> From: Yunsheng Lin <linyunsheng@huawei.com>
> Sent: Tuesday, November 14, 2023 7:32 PM
> To: Li,Rongqing <lirongqing@baidu.com>; davem@davemloft.net;
> edumazet@google.com; kuba@kernel.org; pabeni@redhat.com;
> Liam.Howlett@oracle.com; anjali.k.kulkarni@oracle.com; leon@kernel.org;
> fw@strlen.de; shayagr@amazon.com; idosch@nvidia.com;
> razor@blackwall.org; netdev@vger.kernel.org
> Subject: Re: [PATCH][net-next][v2] rtnetlink: instroduce vnlmsg_new and use it
> in rtnl_getlink
>
> On 2023/11/14 17:55, Li RongQing wrote:
> > if a PF has 256 or more VFs, ip link command will allocate a order 3
> > memory or more, and maybe trigger OOM due to memory fragement,
>
> fragement -> fragment?
I will fix it
Thanks
>
> > the VFs needed memory size is computed in rtnl_vfinfo_size.
> >
> > so instroduce vnlmsg_new which calls netlink_alloc_large_skb in which
>
> instroduce -> introduce?
Thanks
>
> > vmalloc is used for large memory, to avoid the failure of allocating
> > memory
> > [...]
>
> nlmsg_new() takes the parameters below; vnlmsg_new() has no gfp flags
> parameter. Does it always assume GFP_KERNEL?
>
I think vnlmsg_new() is similar to vmalloc(): no flags argument is needed, and it always assumes GFP_KERNEL.
-Li
> * @payload: size of the message payload
> * @flags: the type of memory to allocate.
>
> There are a lot of callers of nlmsg_new(); I am wondering how many
> of the existing nlmsg_new() callers could be changed to use vnlmsg_new().
> https://elixir.free-electrons.com/linux/v6.7-rc1/A/ident/nlmsg_new
>
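On the GFP question, here is our reading of net/netlink/af_netlink.c around v6.7; it is worth re-checking against the tree. netlink_alloc_large_skb() calls alloc_skb(size, GFP_KERNEL) for broadcast messages and for sizes up to NLMSG_GOODSIZE. Otherwise it builds the skb around a vmalloc() buffer. Both paths may sleep, which is consistent with vnlmsg_new() taking no gfp argument, but it also means the helper is unsuitable for atomic context. The sketch below is a userspace model of that branch. The enum and function names are hypothetical, and the threshold is passed in as a parameter because NLMSG_GOODSIZE depends on PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>

enum example_alloc_path {
	EXAMPLE_ALLOC_CONTIGUOUS,	/* models alloc_skb(size, GFP_KERNEL) */
	EXAMPLE_ALLOC_VMALLOC,		/* models vmalloc() + __build_skb() */
};

/*
 * Model of the (assumed) branch in netlink_alloc_large_skb(): broadcast
 * skbs and anything up to good_size get a physically contiguous head;
 * only large unicast messages take the vmalloc() path.
 */
static enum example_alloc_path
example_pick_alloc_path(size_t size, int broadcast, size_t good_size)
{
	assert(good_size > 0);
	if (size <= good_size || broadcast)
		return EXAMPLE_ALLOC_CONTIGUOUS;
	return EXAMPLE_ALLOC_VMALLOC;
}
```

vnlmsg_new() passes broadcast = 0, so under this model a large rtnl_getlink() reply would take the vmalloc() path instead of the order-3 contiguous request shown in the trace.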
* Re: [PATCH][net-next][v2] rtnetlink: instroduce vnlmsg_new and use it in rtnl_getlink
From: Jakub Kicinski @ 2023-11-14 22:37 UTC
To: Li RongQing
Cc: davem, edumazet, pabeni, Liam.Howlett, anjali.k.kulkarni, leon,
fw, shayagr, idosch, razor, linyunsheng, netdev
On Tue, 14 Nov 2023 17:55:22 +0800 Li RongQing wrote:
> - nskb = nlmsg_new(if_nlmsg_size(dev, ext_filter_mask), GFP_KERNEL);
> + nskb = vnlmsg_new(if_nlmsg_size(dev, ext_filter_mask));
Why vnlmsg_new()? nlmsg_ is a prefix for netlink messages;
prefixes do not combine the way you're trying to make them.
Can you call it nlmsg_new_large() or similar?
> if (nskb == NULL)
> goto out;
>
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index eb086b0..17587f1 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -1204,7 +1204,7 @@ struct sock *netlink_getsockbyfilp(struct file *filp)
> return sock;
> }
>
> -static struct sk_buff *netlink_alloc_large_skb(unsigned int size,
> +struct sk_buff *netlink_alloc_large_skb(unsigned int size,
> int broadcast)
You need to fix the alignment of the continuation line.
Perhaps it now fits in 80 chars, so the line break is not needed?
* RE: [PATCH][net-next][v2] rtnetlink: instroduce vnlmsg_new and use it in rtnl_getlink
From: Li,Rongqing @ 2023-11-15 8:16 UTC
To: Jakub Kicinski
Cc: davem@davemloft.net, edumazet@google.com, pabeni@redhat.com,
Liam.Howlett@oracle.com, anjali.k.kulkarni@oracle.com,
leon@kernel.org, fw@strlen.de, shayagr@amazon.com,
idosch@nvidia.com, razor@blackwall.org, linyunsheng@huawei.com,
netdev@vger.kernel.org
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Wednesday, November 15, 2023 6:38 AM
> To: Li,Rongqing <lirongqing@baidu.com>
> Cc: davem@davemloft.net; edumazet@google.com; pabeni@redhat.com;
> Liam.Howlett@oracle.com; anjali.k.kulkarni@oracle.com; leon@kernel.org;
> fw@strlen.de; shayagr@amazon.com; idosch@nvidia.com;
> razor@blackwall.org; linyunsheng@huawei.com; netdev@vger.kernel.org
> Subject: Re: [PATCH][net-next][v2] rtnetlink: instroduce vnlmsg_new and use it
> in rtnl_getlink
>
> On Tue, 14 Nov 2023 17:55:22 +0800 Li RongQing wrote:
> > - nskb = nlmsg_new(if_nlmsg_size(dev, ext_filter_mask), GFP_KERNEL);
> > + nskb = vnlmsg_new(if_nlmsg_size(dev, ext_filter_mask));
>
> Why vnlmsg_new()? nlmsg_ is a prefix for netlink messages;
> prefixes do not combine the way you're trying to make them.
> Can you call it nlmsg_new_large() or similar?
I will rename it to nlmsg_new_large().
>
> > if (nskb == NULL)
>
> You need to fix the alignment of the continuation line.
> Perhaps it now fits in 80 chars, so the line break is not needed?
The line break is not needed.
Thanks
-Li
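Jakub's 80-column question can be checked mechanically. The sketch below is minimal and assumes kernel-style 8-column tab stops and ASCII-only input; the helper names are hypothetical. Placed at column 0, the joined prototype "struct sk_buff *netlink_alloc_large_skb(unsigned int size, int broadcast)" measures 73 columns, so it fits without a continuation line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Display width of a source line, expanding tabs to 8-column stops as the
 * kernel coding style assumes. Counts bytes, so it is only exact for ASCII.
 */
static size_t example_display_width(const char *line)
{
	size_t col = 0;

	assert(line != NULL);
	for (; *line != '\0'; line++) {
		if (*line == '\t')
			col = (col / 8 + 1) * 8;	/* jump to next tab stop */
		else
			col++;
	}
	return col;
}

/* True if the line fits within `limit` display columns. */
static bool example_fits(const char *line, size_t limit)
{
	return example_display_width(line) <= limit;
}
```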
^ permalink raw reply [flat|nested] 5+ messages in thread