From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
To: Willy Tarreau <w@1wt.eu>
Cc: netdev@vger.kernel.org,
Gregory CLEMENT <gregory.clement@free-electrons.com>,
Eric Dumazet <edumazet@google.com>
Subject: Re: [PATCH] net: convert mvneta to build_skb()
Date: Fri, 5 Jul 2013 09:50:38 +0200 [thread overview]
Message-ID: <20130705095038.018de378@skate> (raw)
In-Reply-To: <20130705074330.GB25188@1wt.eu>
Dear Willy Tarreau,
On Fri, 5 Jul 2013 09:43:30 +0200, Willy Tarreau wrote:
> > Thanks Willy. Sorry for asking such a stupid question, but I'm not very
> > familiar with how this mechanism works. Can you explain why a single
> > 'frag_size' field per port is sufficient? My concern is that this
> > frag_size seems to be a per-packet information, but we have potentially
> > multiple packets being received, and multiple RX queues. Is one single
> > 'frag_size' per network interface sufficient?
>
> I had the exact same question when Eric sent me an experimental patch to
> do this on mv64xxx_eth a few months ago. Then I checked how the frag_size
> is computed. As you can see below, it does not depend on a per-packet size
> but on a per-port size which is in fact the MTU ("pkt_size" is the misleading
> part here) :
>
> skb_size = SKB_DATA_ALIGN(pp->pkt_size + MVNETA_MH_SIZE + NET_SKB_PAD) +
> SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>
> So if skb_size depends solely on pp (struct mvneta_port), then it makes
> sense to have frag_size stored at the same place.
>
> In practice we don't really need to store the frag_size in the struct, we
> just need to know if the data were allocated using netdev_alloc_frag()
> or kmalloc() to know how to free them. So a single bit would be enough,
> and I thought about just doing a +1 on the pointer when we need to free
> using kmalloc(). But that would add unneeded extra work in the fast path
> to fix the pointers. And since we need to pass frag_size to build_skb()
> it was a reasonable solution in my opinion.
Aah, okay makes sense. So now, the question that comes up is why this
skb_size calculation is done in every call of mvneta_rx_refill() ? It
should be computed once, at the same time pkt_size is calculated, and
stored in the mvneta_port structure? Then you just need to test whether
it is smaller or larger than PAGE_SIZE to decide whether to use
netdev_alloc_frag() vs. kmalloc().
I.e, I would turn all the:
if (pp->frag_size)
...
else
...
into:
if (pp->skb_size <= PAGE_SIZE)
...
else
...
Of course, you can always hide this test behind a small macro or inline
function, to make it something like:
if (mvneta_uses_small_skbs(pp))
...
else
...
A better name than mvneta_uses_small_skbs() would of course be useful.
> > For example, in mvneta_rx_refill(), you store the skb_size in
> > pp->frag_size, and then you later re-use it in mvneta_rxq_drop_pkts.
> > What guarantees you that mvneta_rx_refill() hasn't be called in the
> > mean time for a different packet, in a different rxq, for the same
> > network interface, and the value of pp->frag_size has been overridden?
>
> It's not a problem since the refill() applies pp->pkt_size which doesn't
> change between calls. It's only changed in mvneta_change_mtu() which
> first stops the device. So I think it's safe.
Yeah, sure, now that I see that pp->frag_size is constant for a given
MTU configuration, it looks safe. But I find the current code that
retests and reassigns pp->frag_size at every call of rxq_refill() to be
very confusing.
Thanks!
Thomas
--
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com
next prev parent reply other threads:[~2013-07-05 7:50 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-04 17:35 [PATCH] net: convert mvneta to build_skb() Willy Tarreau
2013-07-04 21:31 ` David Miller
2013-07-04 22:12 ` Willy Tarreau
2013-07-05 7:17 ` Thomas Petazzoni
2013-07-05 7:43 ` Willy Tarreau
2013-07-05 7:50 ` Thomas Petazzoni [this message]
2013-07-05 8:09 ` Willy Tarreau
2013-07-15 14:34 ` Thomas Petazzoni
2013-07-15 15:12 ` Willy Tarreau
2013-07-15 15:23 ` Thomas Petazzoni
2013-07-15 15:30 ` Willy Tarreau
2013-07-15 15:35 ` Thomas Petazzoni
2013-07-15 15:52 ` Florian Fainelli
2013-07-15 17:01 ` Willy Tarreau
2013-07-15 19:44 ` Thomas Petazzoni
2013-07-15 23:02 ` Florian Fainelli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130705095038.018de378@skate \
--to=thomas.petazzoni@free-electrons.com \
--cc=edumazet@google.com \
--cc=gregory.clement@free-electrons.com \
--cc=netdev@vger.kernel.org \
--cc=w@1wt.eu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.