* [RFC PATCH] packet: rework packet_pick_tx_queue() to use common code selection @ 2019-02-28 14:35 Paolo Abeni 2019-02-28 16:30 ` Willem de Bruijn 0 siblings, 1 reply; 4+ messages in thread From: Paolo Abeni @ 2019-02-28 14:35 UTC (permalink / raw) To: netdev Cc: Saeed Mahameed, Jeff Kirsher, Eric Dumazet, Willem de Bruijn, David S. Miller Currently packet_pick_tx_queue() is the only caller of ndo_select_queue() using a fallback argument other than __netdev_pick_tx. Leveraging rx queue, we can obtain a similar queue selection behavior using core helpers. After this change, ndo_select_queue() is always invoked with __netdev_pick_tx() as fallback. We can change ndo_select_queue() signature in a followup patch, dropping an indirect call per transmitted packet in some scenarios (e.g. TCP syn and XDP generic xmit) This changes slightly how af packet queue selection happens when PACKET_QDISC_BYPASS is set. It's now more similar to plan dev_queue_xmit() tacking in account both XPS and TC mapping. Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- Note: the main goal here is getting rid of the fallback() indirect call in the device drivers implementing ndo_select_queue(). We can obtain the same result with the INDIRECT_CALL() harness. Both ways we need to export __netdev_pick_tx() but here we avoid the need for additional branching. --- include/linux/netdevice.h | 2 ++ net/core/dev.c | 5 +++-- net/packet/af_packet.c | 9 +++++++-- 3 files changed, 12 insertions(+), 4 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index c10b60297d28..29a558bded82 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2152,6 +2152,8 @@ static inline void netdev_for_each_tx_queue(struct net_device *dev, &qdisc_xmit_lock_key); \ } +u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, + struct net_device *sb_dev); struct netdev_queue *netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, struct net_device *sb_dev); diff --git a/net/core/dev.c b/net/core/dev.c index 2b67f2aa59dd..004d1180671a 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3704,8 +3704,8 @@ u16 dev_pick_tx_cpu_id(struct net_device *dev, struct sk_buff *skb, } EXPORT_SYMBOL(dev_pick_tx_cpu_id); -static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, - struct net_device *sb_dev) +u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, + struct net_device *sb_dev) { struct sock *sk = skb->sk; int queue_index = sk_tx_queue_get(sk); @@ -3729,6 +3729,7 @@ static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, return queue_index; } +EXPORT_SYMBOL(__netdev_pick_tx); struct netdev_queue *netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 8376bc1c1508..1f7ae4a34d27 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -285,14 +285,19 @@ static u16 packet_pick_tx_queue(struct sk_buff *skb) { struct net_device *dev = skb->dev; const struct net_device_ops *ops = dev->netdev_ops; + int hint = __packet_pick_tx_queue(dev, skb, NULL); u16 queue_index; +#ifdef CONFIG_XPS + skb->sender_cpu = hint + 1; +#endif + skb_record_rx_queue(skb, hint); if (ops->ndo_select_queue) { queue_index = ops->ndo_select_queue(dev, skb, NULL, - __packet_pick_tx_queue); + __netdev_pick_tx); queue_index = netdev_cap_txqueue(dev, queue_index); } else { - queue_index = __packet_pick_tx_queue(dev, skb, NULL); + queue_index = __netdev_pick_tx(dev, skb, NULL); } return queue_index; -- 2.20.1 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [RFC PATCH] packet: rework packet_pick_tx_queue() to use common code selection 2019-02-28 14:35 [RFC PATCH] packet: rework packet_pick_tx_queue() to use common code selection Paolo Abeni @ 2019-02-28 16:30 ` Willem de Bruijn 2019-02-28 17:00 ` Paolo Abeni 0 siblings, 1 reply; 4+ messages in thread From: Willem de Bruijn @ 2019-02-28 16:30 UTC (permalink / raw) To: Paolo Abeni Cc: Network Development, Saeed Mahameed, Jeff Kirsher, Eric Dumazet, Willem de Bruijn, David S. Miller On Thu, Feb 28, 2019 at 10:04 AM Paolo Abeni <pabeni@redhat.com> wrote: > > Currently packet_pick_tx_queue() is the only caller of > ndo_select_queue() using a fallback argument other than > __netdev_pick_tx. > > Leveraging rx queue, we can obtain a similar queue selection > behavior using core helpers. After this change, ndo_select_queue() > is always invoked with __netdev_pick_tx() as fallback. > We can change ndo_select_queue() signature in a followup patch, > dropping an indirect call per transmitted packet in some scenarios > (e.g. TCP syn and XDP generic xmit) > > This changes slightly how af packet queue selection happens when > PACKET_QDISC_BYPASS is set. It's now more similar to plan dev_queue_xmit() > tacking in account both XPS and TC mapping. > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > --- > Note: the main goal here is getting rid of the fallback() indirect > call in the device drivers implementing ndo_select_queue(). We can > obtain the same result with the INDIRECT_CALL() harness. Both ways > we need to export __netdev_pick_tx() but here we avoid the need > for additional branching. > --- > include/linux/netdevice.h | 2 ++ > net/core/dev.c | 5 +++-- > net/packet/af_packet.c | 9 +++++++-- > 3 files changed, 12 insertions(+), 4 deletions(-) > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index c10b60297d28..29a558bded82 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -2152,6 +2152,8 @@ static inline void netdev_for_each_tx_queue(struct net_device *dev, > &qdisc_xmit_lock_key); \ > } > > +u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, > + struct net_device *sb_dev); > struct netdev_queue *netdev_pick_tx(struct net_device *dev, > struct sk_buff *skb, > struct net_device *sb_dev); > diff --git a/net/core/dev.c b/net/core/dev.c > index 2b67f2aa59dd..004d1180671a 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -3704,8 +3704,8 @@ u16 dev_pick_tx_cpu_id(struct net_device *dev, struct sk_buff *skb, > } > EXPORT_SYMBOL(dev_pick_tx_cpu_id); > > -static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, > - struct net_device *sb_dev) > +u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, > + struct net_device *sb_dev) > { > struct sock *sk = skb->sk; > int queue_index = sk_tx_queue_get(sk); > @@ -3729,6 +3729,7 @@ static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, > > return queue_index; > } > +EXPORT_SYMBOL(__netdev_pick_tx); > > struct netdev_queue *netdev_pick_tx(struct net_device *dev, > struct sk_buff *skb, > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > index 8376bc1c1508..1f7ae4a34d27 100644 > --- a/net/packet/af_packet.c > +++ b/net/packet/af_packet.c > @@ -285,14 +285,19 @@ static u16 packet_pick_tx_queue(struct sk_buff *skb) > { > struct net_device *dev = skb->dev; > const struct net_device_ops *ops = dev->netdev_ops; > + int hint = __packet_pick_tx_queue(dev, skb, NULL); > > u16 queue_index; > > +#ifdef CONFIG_XPS > + skb->sender_cpu = hint + 1; should this not be raw_smp_processor_id() without the modulo by tx queue count? also, __packet_pick_tx_queue is a one-line wrapper around dev_pick_tx_cpu_id. If simplifying the entire wrapper can be removed. > > > +#endif > + skb_record_rx_queue(skb, hint); how is this used in the transmit path? > if (ops->ndo_select_queue) { > queue_index = ops->ndo_select_queue(dev, skb, NULL, > - __packet_pick_tx_queue); > + __netdev_pick_tx); > queue_index = netdev_cap_txqueue(dev, queue_index); > } else { > - queue_index = __packet_pick_tx_queue(dev, skb, NULL); > + queue_index = __netdev_pick_tx(dev, skb, NULL); > } this now duplicates the core of netdev_pick_tx. I wonder if we can just call that instead of open coding. either by changing dev_direct_xmit to take a struct netdev_queue * (it only has two callers), or by moving the core into a separate netdev_pick_tx_core (as __netdev_pick_tx is already used). ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC PATCH] packet: rework packet_pick_tx_queue() to use common code selection 2019-02-28 16:30 ` Willem de Bruijn @ 2019-02-28 17:00 ` Paolo Abeni 2019-02-28 18:29 ` Willem de Bruijn 0 siblings, 1 reply; 4+ messages in thread From: Paolo Abeni @ 2019-02-28 17:00 UTC (permalink / raw) To: Willem de Bruijn Cc: Network Development, Saeed Mahameed, Jeff Kirsher, Eric Dumazet, Willem de Bruijn, David S. Miller Hi, Thank you for the feedback. On Thu, 2019-02-28 at 11:30 -0500, Willem de Bruijn wrote: > On Thu, Feb 28, 2019 at 10:04 AM Paolo Abeni <pabeni@redhat.com> wrote: > > Currently packet_pick_tx_queue() is the only caller of > > ndo_select_queue() using a fallback argument other than > > __netdev_pick_tx. > > > > Leveraging rx queue, we can obtain a similar queue selection > > behavior using core helpers. After this change, ndo_select_queue() > > is always invoked with __netdev_pick_tx() as fallback. > > We can change ndo_select_queue() signature in a followup patch, > > dropping an indirect call per transmitted packet in some scenarios > > (e.g. TCP syn and XDP generic xmit) > > > > This changes slightly how af packet queue selection happens when > > PACKET_QDISC_BYPASS is set. It's now more similar to plan dev_queue_xmit() > > tacking in account both XPS and TC mapping. > > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > --- > > Note: the main goal here is getting rid of the fallback() indirect > > call in the device drivers implementing ndo_select_queue(). We can > > obtain the same result with the INDIRECT_CALL() harness. Both ways > > we need to export __netdev_pick_tx() but here we avoid the need > > for additional branching. > > --- > > include/linux/netdevice.h | 2 ++ > > net/core/dev.c | 5 +++-- > > net/packet/af_packet.c | 9 +++++++-- > > 3 files changed, 12 insertions(+), 4 deletions(-) > > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > > index c10b60297d28..29a558bded82 100644 > > --- a/include/linux/netdevice.h > > +++ b/include/linux/netdevice.h > > @@ -2152,6 +2152,8 @@ static inline void netdev_for_each_tx_queue(struct net_device *dev, > > &qdisc_xmit_lock_key); \ > > } > > > > +u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, > > + struct net_device *sb_dev); > > struct netdev_queue *netdev_pick_tx(struct net_device *dev, > > struct sk_buff *skb, > > struct net_device *sb_dev); > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 2b67f2aa59dd..004d1180671a 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -3704,8 +3704,8 @@ u16 dev_pick_tx_cpu_id(struct net_device *dev, struct sk_buff *skb, > > } > > EXPORT_SYMBOL(dev_pick_tx_cpu_id); > > > > -static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, > > - struct net_device *sb_dev) > > +u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, > > + struct net_device *sb_dev) > > { > > struct sock *sk = skb->sk; > > int queue_index = sk_tx_queue_get(sk); > > @@ -3729,6 +3729,7 @@ static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, > > > > return queue_index; > > } > > +EXPORT_SYMBOL(__netdev_pick_tx); > > > > struct netdev_queue *netdev_pick_tx(struct net_device *dev, > > struct sk_buff *skb, > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > > index 8376bc1c1508..1f7ae4a34d27 100644 > > --- a/net/packet/af_packet.c > > +++ b/net/packet/af_packet.c > > @@ -285,14 +285,19 @@ static u16 packet_pick_tx_queue(struct sk_buff *skb) > > { > > struct net_device *dev = skb->dev; > > const struct net_device_ops *ops = dev->netdev_ops; > > + int hint = __packet_pick_tx_queue(dev, skb, NULL); > > > > u16 queue_index; > > > > +#ifdef CONFIG_XPS > > + skb->sender_cpu = hint + 1; > > should this not be raw_smp_processor_id() without the modulo by tx queue count? yes, here sender_cpu is intentionally 'misused' to keep a behavior more similar to the unpatched kernel. I'll drop the modulo in the next iteration. > also, __packet_pick_tx_queue is a one-line wrapper around > dev_pick_tx_cpu_id. If simplifying the entire wrapper can be removed. Yes, ideally that will go with a follow-up patch. dev_pick_tx_cpu_id() is going to loose an argument too, and I'll have to touch this chunk again. I thought that touching it once would reduce noise. > > > > > +#endif > > + skb_record_rx_queue(skb, hint); > > how is this used in the transmit path? __netdev_pick_tx() [no XPS conf in place] -> skb_tx_hash() -> skb_rx_queue_recorded() -> the packet is mapped to (), module tc mappings. > > if (ops->ndo_select_queue) { > > queue_index = ops->ndo_select_queue(dev, skb, NULL, > > - __packet_pick_tx_queue); > > + __netdev_pick_tx); > > queue_index = netdev_cap_txqueue(dev, queue_index); > > } else { > > - queue_index = __packet_pick_tx_queue(dev, skb, NULL); > > + queue_index = __netdev_pick_tx(dev, skb, NULL); > > } > > this now duplicates the core of netdev_pick_tx. I wonder if we can > just call that instead of open coding. > > either by changing to take a struct netdev_queue * (it > only has two callers), or by moving the core into a separate > netdev_pick_tx_core (as __netdev_pick_tx is already used). I tried to minimize the changes in this RFC. I'll go for netdev_pick_tx_core() in the next iteration, if there are no problem with an additional EXPORT_SYMBOL_GPL(). Thanks, Paolo ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC PATCH] packet: rework packet_pick_tx_queue() to use common code selection 2019-02-28 17:00 ` Paolo Abeni @ 2019-02-28 18:29 ` Willem de Bruijn 0 siblings, 0 replies; 4+ messages in thread From: Willem de Bruijn @ 2019-02-28 18:29 UTC (permalink / raw) To: Paolo Abeni Cc: Network Development, Saeed Mahameed, Jeff Kirsher, Eric Dumazet, Willem de Bruijn, David S. Miller > > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > > > index 8376bc1c1508..1f7ae4a34d27 100644 > > > --- a/net/packet/af_packet.c > > > +++ b/net/packet/af_packet.c > > > @@ -285,14 +285,19 @@ static u16 packet_pick_tx_queue(struct sk_buff *skb) > > > { > > > struct net_device *dev = skb->dev; > > > const struct net_device_ops *ops = dev->netdev_ops; > > > + int hint = __packet_pick_tx_queue(dev, skb, NULL); > > > > > > u16 queue_index; > > > > > > +#ifdef CONFIG_XPS > > > + skb->sender_cpu = hint + 1; > > > > should this not be raw_smp_processor_id() without the modulo by tx queue count? > > yes, here sender_cpu is intentionally 'misused' to keep a behavior more > similar to the unpatched kernel. I'll drop the modulo in the next > iteration. > > > also, __packet_pick_tx_queue is a one-line wrapper around > > dev_pick_tx_cpu_id. If simplifying the entire wrapper can be removed. > > Yes, ideally that will go with a follow-up patch. dev_pick_tx_cpu_id() > is going to loose an argument too, and I'll have to touch this chunk > again. I thought that touching it once would reduce noise. > > > > > > > > +#endif > > > + skb_record_rx_queue(skb, hint); > > > > how is this used in the transmit path? > > __netdev_pick_tx() [no XPS conf in place] -> skb_tx_hash() -> > skb_rx_queue_recorded() -> the packet is mapped to (), module tc > mappings. Ah right. > > > if (ops->ndo_select_queue) { > > > queue_index = ops->ndo_select_queue(dev, skb, NULL, > > > - __packet_pick_tx_queue); > > > + __netdev_pick_tx); > > > queue_index = netdev_cap_txqueue(dev, queue_index); > > > } else { > > > - queue_index = __packet_pick_tx_queue(dev, skb, NULL); > > > + queue_index = __netdev_pick_tx(dev, skb, NULL); > > > } > > > > this now duplicates the core of netdev_pick_tx. I wonder if we can > > just call that instead of open coding. > > > > either by changing to take a struct netdev_queue * (it > > only has two callers), or by moving the core into a separate > > netdev_pick_tx_core (as __netdev_pick_tx is already used). > > I tried to minimize the changes in this RFC. I'll go for > netdev_pick_tx_core() in the next iteration, if there are no problem > with an additional EXPORT_SYMBOL_GPL(). I think the original patch may have consciously added a trivial packet_pick_tx_queue to keep the transmit path as short as possible for this PACKET_QDISC_BYPASS mode. The more we deduplicate between that and normal path, the less the feature makes sense. That said, this particular change seems fine. Even more so if it allows a follow-up optimization that benefits the common case. Thanks for answering my questions. Acked-by: Willem de Bruijn <willemb@google.com> ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2019-02-28 18:29 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-02-28 14:35 [RFC PATCH] packet: rework packet_pick_tx_queue() to use common code selection Paolo Abeni 2019-02-28 16:30 ` Willem de Bruijn 2019-02-28 17:00 ` Paolo Abeni 2019-02-28 18:29 ` Willem de Bruijn
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).