From: Johann Baudy <johaahn@gmail.com>
To: "Lovich, Vitali" <vlovich@qualcomm.com>
Cc: David Miller <davem@davemloft.net>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [PATCH] Packet socket: mmapped IO: PACKET_TX_RING
Date: Fri, 31 Oct 2008 11:58:26 +0100
Message-ID: <1225450706.5301.94.camel@localhost>
Hi Vitali,
> There's no need to keep the index (packet_index). Just store the
> pointer directly (change it to void *) - saves an extra lookup.
Indeed, it will be faster. I'll do the change.
> Also, I don't think setting TP_STATUS_COPY is necessary since the user
> can't really do anything with that information. Simply leave it at
> TP_STATUS_USER & TP_STATUS_KERNEL.
This information is not meant for the user. Its purpose is to prevent the
kernel from sending a packet more than once. While holding the tx ring lock,
queued packets must be tagged as "kernel has already handled this packet" so
that they are not sent again on the next pass over the tx ring.
(This can happen if the device/queue is very slow or if the ring has only a
few frames.)
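To illustrate the double-send problem, here is a minimal userspace sketch of a ring scan. It is not the patch code: the FRAME_* states, struct tx_ring, and both functions are hypothetical stand-ins, and I am assuming that TP_STATUS_USER marks a frame the user wants sent and TP_STATUS_KERNEL marks an idle one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame states standing in for TP_STATUS_KERNEL,
 * TP_STATUS_USER and TP_STATUS_COPY as discussed in this thread. */
enum frame_status {
    FRAME_KERNEL, /* idle: never filled, or transmission completed */
    FRAME_USER,   /* filled by the user, waiting to be queued */
    FRAME_COPY    /* already queued to the device, not yet completed */
};

#define RING_FRAMES 4

struct tx_ring {
    enum frame_status status[RING_FRAMES];
};

/* One pass over the ring: queue every frame the user marked ready and tag
 * it FRAME_COPY so that a later pass skips it. Returns the number queued. */
static size_t tx_ring_pass(struct tx_ring *ring)
{
    size_t queued = 0;

    for (size_t i = 0; i < RING_FRAMES; i++) {
        if (ring->status[i] != FRAME_USER)
            continue;
        ring->status[i] = FRAME_COPY;
        queued++;
    }
    return queued;
}

/* Completion step (the skb destructor in the real patch): release the frame. */
static void tx_ring_complete(struct tx_ring *ring, size_t idx)
{
    assert(idx < RING_FRAMES);
    assert(ring->status[idx] == FRAME_COPY);
    ring->status[idx] = FRAME_KERNEL;
}
```

Without the intermediate state, a second pass that runs before the device has completed the first batch would see the same frames still marked as ready and queue them again.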
> The atomic_inc of pending_skb should happen after the skb was
> allocated - the atomic_dec in out_status can also be removed.
> out_status can be removed completely if you get rid of
> TP_STATUS_COPY. If you leave it, you still need the barriers as above
> after changing tp_status, or the user may not see the change.
I don't understand why "atomic_inc of pending_skb should happen after
the skb was allocated". This counter tracks the number of TX packets
queued, so the requirement is that we increment it before
dev_queue_xmit().
If the increment is performed after the skb allocation, atomic_dec() will
still be needed whenever tpacket_fill_skb() or dev_queue_xmit() fails.
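The accounting I have in mind can be sketched in userspace with C11 atomics. This is only a model under stated assumptions: send_one() and on_tx_complete() are hypothetical names, the two booleans stand in for tpacket_fill_skb() and dev_queue_xmit() succeeding, and the model does not try to say whether the destructor also runs on a failed transmit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pending_skb counter discussed above. */
static atomic_int pending_skb;

/* Hypothetical transmit step. build_ok models tpacket_fill_skb() succeeding,
 * xmit_ok models dev_queue_xmit() succeeding. */
static bool send_one(bool build_ok, bool xmit_ok)
{
    /* Count the packet as in flight before it can possibly complete. */
    atomic_fetch_add(&pending_skb, 1);

    if (!build_ok || !xmit_ok) {
        /* Error path: nothing stays queued, so undo the increment. */
        atomic_fetch_sub(&pending_skb, 1);
        return false;
    }
    return true;
}

/* Models the skb destructor: the packet has left the queue. */
static void on_tx_complete(void)
{
    int before = atomic_fetch_sub(&pending_skb, 1);

    assert(before > 0);
    (void)before; /* silence unused warning when NDEBUG is defined */
}
```

Wherever the increment is placed, every failure path after it needs a matching decrement, otherwise the counter never returns to zero.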
> Also, you've got a potential threading issue - you're not protecting
> the frame index behind the spinlock.
You are right. I think I will take the spinlock outside the do-while loop.
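As a rough userspace sketch of that idea, the lock below is taken once around the whole loop that advances the frame index. The kernel spinlock is replaced by a C11 atomic_flag spin loop, and struct tx_cursor and all function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical ring cursor: the frame index lives behind the lock. */
struct tx_cursor {
    atomic_flag lock;
    unsigned int head;        /* next frame index to hand out */
    unsigned int frame_count; /* number of frames in the ring */
};

static void cursor_lock(struct tx_cursor *c)
{
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
        ; /* spin until the flag is released */
}

static void cursor_unlock(struct tx_cursor *c)
{
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Claim `want` consecutive frame indexes into out[]. The lock is held
 * around the whole loop, so no other thread can interleave its own claims. */
static unsigned int tx_cursor_claim_batch(struct tx_cursor *c,
                                          unsigned int want,
                                          unsigned int *out)
{
    unsigned int n = 0;

    assert(c->frame_count > 0);
    cursor_lock(c);
    while (n < want) {
        out[n++] = c->head;
        c->head = (c->head + 1) % c->frame_count;
    }
    cursor_unlock(c);
    return n;
}
```

Holding the lock across the loop keeps the index consistent, at the cost of a longer critical section than locking around each single increment.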
> Also, when you set the status back to TP_STATUS_KERNEL in the
> destructor, you need to add the following barriers:
>
> __packet_set_status(po, ph, TP_STATUS_KERNEL);
> smp_mb(); // make sure the TP_STATUS_KERNEL was actually written to
> memory before this - couldn't this actually be just a smp_wmb?
> flush_dcache_page(virt_to_page(ph)); // on non-x86 architectures like
> ARM that have a moronic cache (i.e cache by virtual rather than
> physical address). on x86 this is a noop.
>
So, if my understanding of those memory barriers is correct, we should
have an smp_rmb() before reading the status and an smp_wmb() after writing
the status, in both the skb destructor and the send procedure.
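For reference, here is a userspace analogue using C11 fences. Note that it shows the usual publish/consume ordering (write data, write barrier, write status; read status, read barrier, read data), which is not necessarily the exact barrier placement proposed above. struct frame, the STATUS_* values, and both function names are hypothetical; the release and acquire fences only play the roles that smp_wmb() and smp_rmb() play in kernel code.

```c
#include <assert.h>
#include <stdatomic.h>

enum { STATUS_KERNEL = 0, STATUS_USER = 1 }; /* hypothetical values */

struct frame {
    _Atomic int status;
    int len; /* frame data written before the status flips */
};

/* Writer side: fill the frame, then publish the status. The release fence
 * keeps the data write from being reordered after the status write. */
static void frame_publish(struct frame *f, int len, int new_status)
{
    assert(len >= 0);
    f->len = len;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&f->status, new_status, memory_order_relaxed);
}

/* Reader side: check the status, then read the data. The acquire fence
 * keeps the data read from being reordered before the status read.
 * Returns the frame length, or -1 if the frame is not in wanted_status. */
static int frame_consume(struct frame *f, int wanted_status)
{
    if (atomic_load_explicit(&f->status, memory_order_relaxed) != wanted_status)
        return -1;
    atomic_thread_fence(memory_order_acquire);
    return f->len;
}
```

The flush_dcache_page() call from the quoted snippet has no userspace equivalent and is left out of this sketch.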
> Also, I think that I looked at your code while working on my version
> and there may have been some logical problems with the way you're
> building packets. I'll compare it within the next few days as I start
> cleaning up my code.
I've noticed that "data += dev->hard_header_len; to_write -=
dev->hard_header_len;" must be inside the (sock->type != SOCK_DGRAM)
condition.
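A small sketch of the intended logic, assuming that a SOCK_RAW frame starts with a link-layer header supplied by the user while a SOCK_DGRAM frame carries payload only. struct payload_view and frame_payload() are hypothetical helpers for illustration, not functions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h> /* SOCK_RAW, SOCK_DGRAM */

/* Hypothetical result type: where the payload starts and how long it is. */
struct payload_view {
    size_t offset; /* plays the role of advancing "data" */
    size_t len;    /* plays the role of "to_write" */
};

/* Skip the link-layer header only when the socket is not SOCK_DGRAM. */
static struct payload_view frame_payload(int sock_type, size_t frame_len,
                                         size_t hard_header_len)
{
    struct payload_view v = { 0, frame_len };

    if (sock_type != SOCK_DGRAM) {
        assert(frame_len >= hard_header_len);
        v.offset += hard_header_len;
        v.len -= hard_header_len;
    }
    return v;
}
```

For example, with a 14-byte Ethernet header and a 100-byte frame, a SOCK_RAW socket would skip 14 bytes and have 86 left to write, while a SOCK_DGRAM socket would keep all 100 bytes as payload.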
> As a benchmark on a 10G card (and not performing any optimizations
> like using syspart & dedicating a cpu for tx), I was able to hit 8.6
> GBits/s using a dedicated kernel thread for the transmission. With
> the dedicated CPU, I'm confident the line-rate will go up
> significantly.
>
> I'll try to test your changes within the next few days. TCPdump
> maxes out at around 1.5 GBits/s
>
> As for CPU usage, there's a noticeable advantage to traditional send.
> Using tcpreplay - there's about 80% CPU utilization when sending.
> Using the tx ring, there's maybe 10-15%.
>
On my side, I'm using a 1G device with a PPC405.
I've reached 107 MBytes/s with the TX ring, against 25 MBytes/s with a
standard raw packet socket and 107 MBytes/s with pktgen.
> I'm going to have latency numbers soon as well (i.e. how much jitter
> is introduced by the kernel).
>
Many thanks Vitali for your comments and help :)
--
Johann Baudy
johaahn@gmail.com