All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: Jakub Kicinski <jakub.kicinski@netronome.com>,
	"Samudrala, Sridhar" <sridhar.samudrala@intel.com>
Cc: Jiri Benc <jbenc@redhat.com>, Jiri Pirko <jiri@resnulli.us>,
	netdev@vger.kernel.org, Thomas Graf <tgraf@suug.ch>,
	Roopa Prabhu <roopa@cumulusnetworks.com>,
	ogerlitz@mellanox.com, ast@kernel.org, daniel@iogearbox.net,
	simon.horman@netronome.com, Paolo Abeni <pabeni@redhat.com>,
	Pravin B Shelar <pshelar@nicira.com>,
	hannes@stressinduktion.org, kubakici@wp.pl
Subject: Re: [RFC] net: store port/representative id in metadata_dst
Date: Fri, 23 Sep 2016 13:25:10 -0700	[thread overview]
Message-ID: <57E58FA6.3050001@gmail.com> (raw)
In-Reply-To: <20160923211728.4c2f05ab@jkicinski-Precision-T1700>

On 16-09-23 01:17 PM, Jakub Kicinski wrote:
> On Fri, 23 Sep 2016 10:22:59 -0700, Samudrala, Sridhar wrote:
>> On 9/23/2016 8:29 AM, Jakub Kicinski wrote:
>>> On Fri, 23 Sep 2016 07:23:26 -0700, John Fastabend wrote:  
>>>> Yep, I like the idea in general. I had a slightly different approach in
>>>> mind though. If you look at __dev_queue_xmit() there is a void
>>>> accel_priv pointer (gather you found this based on your commit note).
>>>> My take was we could extend this a bit so it can be used by the VFR
>>>> devices and they could do a dev_queue_xmit_accel(). In this way there is
>>>> no need to touch /net/core/{filter, dst, ip_tunnel}.c etc. Maybe the
>>>> accel logic needs to be extended to push the priv pointer all the way
>>>> through the xmit routine of the target netdev though. This should look
>>>> a lot like the macvlan accelerated xmit device path without the
>>>> switching logic.
>>>>
>>>> Of course maybe the name would be extended to dev_queue_xmit_extended()
>>>> or something.
>>>>
>>>> So the flow on ingress would be,
>>>>
>>>>    1. pkt_received_by_PF_netdev
>>>>    2. PF_netdev reads some tag off packet/descriptor and sets correct
>>>>       skb->dev field. This is needed so stack "sees" packets from
>>>>       correct VF ports.
>>>>    3. packet passed up to stack.
>>>>
>>>> I guess it is a bit "zombie" like on the receive path because the packet
>>>> is never actually handled by VF netdev code per se and on egress can
>>>> traverse both the VFR and PF netdevs qdiscs. But on the other hand the
>>>> VFR netdevs and PF netdevs are all in the same driver. Plus using a
>>>> queue per VFR is a bit of a waste as its not needed and also hardware
>>>> may not have any mechanism to push VF traffic onto a rx queue.
>>>>
>>>> On egress,
>>>>
>>>>    1. VFR xmit is called
>>>>    2. VFR xmit calls dev_queue_xmit_accel() with some meta-data if needed
>>>>       for the lower netdev
>>>>    3. lower netdev sends out the packet.
>>>>
>>>> Again we don't need to waste any queues for each VFR and the VFR can be
>>>> a LLTX device. In this scheme I think you avoid much of the changes in
>>>> your patch and keep it all contained in the driver. Any thoughts?  
>>
>> The 'accel' parameter in dev_queue_xmit_accel() is currently only passed
>> to ndo_select_queue() via netdev_pick_tx() and is used to select the tx 
>> queue.
>> Also, it is not passed all the way to the driver specific xmit routine.  
>> Doesn't it require
>> changing all the driver xmit routines if we want to pass this parameter?
>>
>>> Goes without saying that you have a much better understanding of packet
>>> scheduling so please bear with me :)  My target model is that I have
>>> n_cpus x "n_tc/prio" queues on the PF and I want to transmit the
>>> fallback traffic over those same queues.  So no new HW queues are used
>>> for VFRs at all.  This is a reverse of macvlan offload which AFAICT has
>>> "bastard hw queues" which actually TX for a separate software device.
>>>
>>> My understanding was that I can rework this model to have software
>>> queues for VFRs (#sw queues == #PF queues + #VFRs) but no extra HW
>>> queues (#hw queues == #PF queues) but then when the driver sees a
>>> packet on sw-only VFR queue it has to pick one of the PF queues (which
>>> one?), lock PF software queue to own it, and only then can it
>>> transmit.  With the dst_metadata there is no need for extra locking or
>>> queue selection.  
>>
>> Yes.  The VFPR netdevs don't have any HW queues associated with them and 
>> we would like
>> to use the PF queues for the xmit.
>> I was also looking into some way of passing the port id via skb 
>> parameter to the
>> dev_queue_xmit() call so that the PF xmit routine can do a directed 
>> transmit to a specifc VF.
>> Is skb->cb an option to pass this info?
>> dst_metadata approach would work  too if it is acceptable.
> 
> I don't think we can trust skb->cb to be set to anything meaningful
> when the skb is received by the lower device. 
> 

Agreed. I wouldn't recommend using skb->cb. How about passing it through
dev_queue_xmit_accel() through to the driver?

If you pass the metadata through the dev_queue_xmit_accel() handle tx
queue  selection would work using normal mechanisms (xps, select_queue,
cls  hook, etc.). If you wanted to pick some specific queue based on
policy the policy could be loaded into one of those hooks.

.John

  reply	other threads:[~2016-09-23 20:25 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-22 19:26 [RFC] net: store port/representative id in metadata_dst Jakub Kicinski
2016-09-23  6:34 ` Jiri Pirko
2016-09-23  9:06   ` Jiri Benc
2016-09-23 12:55     ` Jakub Kicinski
2016-09-23 14:23       ` John Fastabend
2016-09-23 15:29         ` Jakub Kicinski
2016-09-23 17:22           ` Samudrala, Sridhar
2016-09-23 20:17             ` Jakub Kicinski
2016-09-23 20:25               ` John Fastabend [this message]
2016-09-23 20:45                 ` Jakub Kicinski
2016-09-23 21:20                   ` John Fastabend
2016-09-29 11:10                     ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57E58FA6.3050001@gmail.com \
    --to=john.fastabend@gmail.com \
    --cc=ast@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=hannes@stressinduktion.org \
    --cc=jakub.kicinski@netronome.com \
    --cc=jbenc@redhat.com \
    --cc=jiri@resnulli.us \
    --cc=kubakici@wp.pl \
    --cc=netdev@vger.kernel.org \
    --cc=ogerlitz@mellanox.com \
    --cc=pabeni@redhat.com \
    --cc=pshelar@nicira.com \
    --cc=roopa@cumulusnetworks.com \
    --cc=simon.horman@netronome.com \
    --cc=sridhar.samudrala@intel.com \
    --cc=tgraf@suug.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.