Re: [RFC 6/9] staging: dpaa2-switch: add .ndo_start_xmit() callback


From: Ioana Ciornei <ciorneiioana@gmail.com>
To: Andrew Lunn <andrew@lunn.ch>
Cc: Ioana Ciornei <ciorneiioana@gmail.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Ioana Ciornei <ioana.ciornei@nxp.com>
Subject: Re: [RFC 6/9] staging: dpaa2-switch: add .ndo_start_xmit() callback
Date: Thu, 5 Nov 2020 17:51:50 +0200
Message-ID: <20201105155150.qc44olbqyxihislh@skbuf>
In-Reply-To: <20201105134512.GJ933237@lunn.ch>

On Thu, Nov 05, 2020 at 02:45:12PM +0100, Andrew Lunn wrote:
> > > Where is the TX confirm which uses this stored pointer. I don't see it
> > > in this file.
> > > 
> > 
> > The Tx confirm - dpaa2_switch_tx_conf() - is added in patch 5/9.
> 
> Not so obvious. Could it be moved here?
> 

Sure, I'll move it here so that we have both Tx and Tx confirmation in
the same patch.

> > > It can be expensive to store pointer like this in buffers used for
> > > DMA.
> > 
> > Yes, it is. But the hardware does not give us any other indication that
> > a packet was actually sent so that we can move ahead with consuming the
> > initial skb.
> > 
> > > It has to be flushed out of the cache here as part of the
> > > send. Then the TX complete needs to invalidate and then read it back
> > > into the cache. Or you use coherent memory which is just slow.
> > > 
> > > It can be cheaper to keep a parallel ring in cacheable memory which
> > > never gets flushed.
> > 
> > I'm afraid I don't really understand your suggestion. In this parallel
> > ring I would keep the skb pointers of all frames which are in-flight?
> > Then, when a packet is received on the Tx confirmation queue I would
> > have to loop over the parallel ring and determine somehow which skb was
> > this packet initially associated to. Isn't this even more expensive?
> 
> I don't know this particular hardware, so i will talk in general
> terms. Generally, you have a transmit ring. You add new frames to be
> sent to the beginning of the ring, and you take off completed frames
> from the end of the ring. This is kept in 'expensive' memory, in that
> either it is coherent, or you need to do flushes/invalidates.
> 
> It is expected that the hardware keeps to ring order. It does not pick
> and choose which frames it sends, it does them in order. That means
> completion also happens in ring order. So the driver can keep a simple
> linear array the size of the ring, in cacheable memory, with pointers
> to the skbuf. And it just needs a counting index to know which one
> just completed.

I agree with all of the above in a general sense.
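
To make sure we are talking about the same thing, here is a minimal
userspace sketch of such a shadow ring. It assumes strict in-order
completion and a fixed, power-of-two ring size. The names (shadow_ring,
shadow_push, shadow_pop, RING_SIZE) are made up for illustration and are
not taken from any driver.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8	/* assumed: same size as the hardware ring, power of two */

/* Lives in ordinary cacheable memory, never touched by the device. */
struct shadow_ring {
	void *skb[RING_SIZE];
	unsigned int head;	/* next slot to fill on transmit */
	unsigned int tail;	/* next slot expected to complete */
};

/* Called on transmit, in the same order frames are handed to hardware. */
static int shadow_push(struct shadow_ring *r, void *skb)
{
	assert((RING_SIZE & (RING_SIZE - 1)) == 0);

	if (r->head - r->tail == RING_SIZE)
		return -1;	/* ring full */

	r->skb[r->head % RING_SIZE] = skb;
	r->head++;
	return 0;
}

/* Called on Tx completion: the oldest in-flight skb is the one that finished. */
static void *shadow_pop(struct shadow_ring *r)
{
	void *skb;

	if (r->head == r->tail)
		return NULL;	/* nothing in flight */

	skb = r->skb[r->tail % RING_SIZE];
	r->tail++;
	return skb;
}
```

The completion path only reads the counting index and one array slot,
so nothing needs to be flushed or invalidated for the lookup itself.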

> 
> Now, your hardware is more complex. You have one queue feeding
> multiple switch ports.

Not really. I have one Tx queue for each switch port and just one Tx
confirmation queue for all of them.

> Maybe it does not keep to ring order?

If the driver enqueues frames #1, #2, #3 in this exact order on a switch
port then the frames will arrive in the same order on the Tx
confirmation queue irrespective of any other traffic sent on other
switch ports.

> If you
> have one port running at 10M/Half, and another at 10G/Full, does it
> leave frames for the 10M/Half port in the ring when its egress queue is
> full? That is probably a bad idea, since the 10G/Full port could then
> starve for lack of free slots in the ring? So my guess would be, the
> frames get dropped. And so ring order is maintained.
> 
> If you are paranoid it could get out of sync, keep an array of tuples,
> address of the frame descriptor and the skbuf. If the fd address does
> not match what you expect, then do the linear search of the fd
> address, and increment a counter that something odd has happened.
> 
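
As I understand the tuple variant, it would look roughly like the sketch
below: check the expected slot first, and only fall back to a linear
search (and bump a counter) when the frame descriptor address does not
match. This is userspace C with hypothetical names (tx_slot, tuple_ring,
tuple_push, tuple_complete), and it assumes the fd address is unique
among in-flight frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRING_SIZE 8	/* assumed fixed, power-of-two ring size */

struct tx_slot {
	uint64_t fd_addr;	/* address of the frame descriptor buffer */
	void *skb;		/* NULL once the slot has been completed */
};

struct tuple_ring {
	struct tx_slot slot[TRING_SIZE];
	unsigned int head, tail;
	unsigned long out_of_order;	/* "something odd has happened" */
};

static int tuple_push(struct tuple_ring *r, uint64_t fd_addr, void *skb)
{
	struct tx_slot *s;

	assert(skb != NULL);
	if (r->head - r->tail == TRING_SIZE)
		return -1;	/* ring full */

	s = &r->slot[r->head++ % TRING_SIZE];
	s->fd_addr = fd_addr;
	s->skb = skb;
	return 0;
}

static void *tuple_complete(struct tuple_ring *r, uint64_t fd_addr)
{
	unsigned int i;
	void *skb;

	/* Skip slots that were already completed out of order. */
	while (r->tail != r->head && !r->slot[r->tail % TRING_SIZE].skb)
		r->tail++;

	/* First iteration is the expected slot; the rest is the fallback. */
	for (i = r->tail; i != r->head; i++) {
		struct tx_slot *s = &r->slot[i % TRING_SIZE];

		if (!s->skb || s->fd_addr != fd_addr)
			continue;

		skb = s->skb;
		s->skb = NULL;
		if (i == r->tail)
			r->tail++;
		else
			r->out_of_order++;
		return skb;
	}
	return NULL;	/* unknown frame descriptor */
}
```

In the in-order case this costs one comparison more than the plain
index; the search is only paid when the order is actually broken.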

The problem with this, I think, would be two Tx softirqs on two
different cores which both want to send a frame on the same switch port.
To update the shadow ring there would have to be some kind of locking on
access to it, which might negate any attempt to make this more
efficient.
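
A rough userspace sketch of what I mean, using a C11 atomic_flag as a
stand-in spinlock (all names here are hypothetical, and the hw[] array
only models the order in which frames reach the hardware queue). The
enqueue and the shadow-ring update have to sit in the same critical
section, otherwise two cores can record their skbs in a different order
than the one the hardware sees:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define LRING_SIZE 8	/* assumed capacity, no wrap-around in this sketch */

struct locked_ring {
	atomic_flag lock;
	void *hw[LRING_SIZE];		/* stand-in for hardware queue order */
	void *shadow[LRING_SIZE];	/* shadow copy in cacheable memory */
	unsigned int head;
};

static void ring_lock(struct locked_ring *r)
{
	while (atomic_flag_test_and_set_explicit(&r->lock,
						 memory_order_acquire))
		;	/* spin until the other core releases the lock */
}

static void ring_unlock(struct locked_ring *r)
{
	atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

static int locked_xmit(struct locked_ring *r, void *skb)
{
	int ret = -1;

	assert(skb != NULL);
	ring_lock(r);
	if (r->head < LRING_SIZE) {
		/* Both steps under one lock so the two orders cannot diverge. */
		r->hw[r->head] = skb;
		r->shadow[r->head] = skb;
		r->head++;
		ret = 0;
	}
	ring_unlock(r);
	return ret;
}
```

Every transmit on the port then serializes on that lock, which is the
cost I am worried about.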

This might not be a problem for the dpaa2-switch since it does not
enable NETIF_F_LLTX but it might be for dpaa2-eth.

Also, as the architecture is defined now, the driver does not really see
the Tx queues as being fixed-size, so it cannot infer a size for the
shadow copy.

I will have to dig a little bit more in this area to understand exactly
why the decision to use skb backpointers was made in the first place (I
am not really talking about the dpaa2-switch here, dpaa2-eth has the
same exact behavior and has been around for some time now).
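
For reference, the backpointer scheme under discussion boils down to
something like the userspace sketch below: the skb pointer is written at
the start of the Tx buffer, ahead of the frame data, and read back from
the buffer address when the confirmation arrives. The types and names
(fake_skb, fake_fd, TX_HEADROOM, tx_build_fd, tx_conf) are simplified
stand-ins, and the copy into a malloc'ed buffer is only there to keep
the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins, not the driver's real structures. */
struct fake_skb {
	size_t len;
	const void *data;
};

struct fake_fd {
	void *addr;	/* start of the buffer handed to hardware */
	size_t offset;	/* where the frame data begins */
	size_t len;
};

#define TX_HEADROOM 64	/* assumed software annotation area */

/* Tx: lay out the buffer as [skb backpointer | padding][frame data]. */
static int tx_build_fd(struct fake_skb *skb, struct fake_fd *fd)
{
	unsigned char *buf;

	assert(TX_HEADROOM >= sizeof(skb));
	buf = malloc(TX_HEADROOM + skb->len);
	if (!buf)
		return -1;

	memcpy(buf, &skb, sizeof(skb));		/* store the backpointer */
	memcpy(buf + TX_HEADROOM, skb->data, skb->len);

	fd->addr = buf;
	fd->offset = TX_HEADROOM;
	fd->len = skb->len;
	return 0;
}

/* Tx confirmation: recover the skb from the buffer the fd points to. */
static struct fake_skb *tx_conf(const struct fake_fd *fd)
{
	struct fake_skb *skb;

	memcpy(&skb, fd->addr, sizeof(skb));	/* read the backpointer */
	free(fd->addr);
	return skb;
}
```

The confirmation needs no lookup structure at all, but it does have to
read back from the buffer that was handed to the device, which is the
cost you pointed out.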

Ioana


