linux-man.vger.kernel.org archive mirror
From: Carsten Andrich <carsten.andrich-hs6bpBdVsEZfm0AUMx9V0g@public.gmane.org>
To: Willem de Bruijn <willemb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Daniel Borkmann
	<dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	"Michael Kerrisk (man-pages)"
	<mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Neil Horman <nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>,
	jbrouer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Subject: Re: Improving PACKET_{RX,TX}_RING documentation
Date: Thu, 22 May 2014 14:22:22 +0200
Message-ID: <d1afaaacb05c031517bb96211c098edf@localhost>
In-Reply-To: <CA+FuTSeWh_iQGqc-4usL7vr28OrkHTnBvHvXvVO=LcGsNRgtMA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Thanks for the feedback, Daniel and Willem.

Executive summary: We need a good concept for distributing (preferably
mutually exclusive) information among packet(7) and packet_mmap.txt.

Willem de Bruijn wrote:
>>     0. Perhaps a general writeup on how the RX/TX_RING works in Linux,
>>        it's layout, constraints etc. Btw, not sure if that's also
> 
> This would duplicate the contents of
> Documentation/networking/packet_mmap.txt? I would caution against
> having two sources of documentation that may become inconsistent over
> time. A detailed discussion could also become too long for a manual
> page: packet_mmap.txt is already 1067 lines (albeit about half in
> example code). If that document is confusing, a thorough edit of that
> would be very helpful, though.
> 
>>        included already, but the same mmap-technique exists also for
>>        netlink sockets.
> 
> See also Documentation/networking/netlink_mmap.txt . If the ring is a
> generic netlink feature (i.e., not specific to nfnetlink), then man 7
> netlink is the right place for user documentation (in as far as this
> is a user-oriented feature).

Some deduplication between netlink(7), packet(7), netlink_mmap.txt and
packet_mmap.txt is probably a good idea. However, this is much more than
I initially bargained for :)
I had a superficial look at the netlink documents, and most concepts
appear to be very much alike. The operational aspects are a notable
exception, though, since using the netlink header is a lot simpler
than using tpacket_hdr and its different versions.

IMHO user-space API documentation should reside in the man page and not
Documentation/, but I'd like to hear Michael's opinion on that. Maybe
it's a good idea to have at least a basic description in packet(7) and
reserve packet_mmap.txt for the more advanced topics?

>>>>       1. Increase detail of PACKET_{RX,TX}_RING socket options, including
>>>>          description of struct tpacket_hdr and anything else required to
>>>>          operate the ring.
> 
> If expanding the man page, then moving mmap into a separate section
> sounds good to me. If a man page is more user documentation than
> kernel Documentation/ then perhaps start by discussing the pros and
> cons of mmapped rings over recv and to help users decide whether to
> use the mmapped ring, or for instance batch with recvmmsg().

Actually I wanted to maintain the structure of the man page and
describe everything inside the appropriate "Socket options" sections.
However, adding a reference to recvmmsg() is a good idea as well, which
would justify creating a new section for "Advanced packet socket
techniques".
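
To make the recvmmsg() alternative concrete, here is a minimal sketch of
batched receive on an already opened socket. The names recv_batch, BATCH
and FRAME_MAX are made up for illustration and the sizes are arbitrary;
nothing here is taken from packet(7) or packet_mmap.txt.

```c
#define _GNU_SOURCE             /* recvmmsg() is a Linux-specific extension */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH     8             /* hypothetical batch size */
#define FRAME_MAX 2048          /* hypothetical per-packet buffer size */

/* Receive up to BATCH packets from fd with a single system call.
 * bufs provides the storage; lens[i] receives the length of packet i.
 * Returns the number of packets received, or -1 on error (errno set,
 * e.g. EAGAIN if nothing is queued). */
int recv_batch(int fd, char bufs[BATCH][FRAME_MAX], unsigned int lens[BATCH])
{
    struct mmsghdr msgs[BATCH];
    struct iovec iov[BATCH];

    assert(fd >= 0);
    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = FRAME_MAX;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* MSG_DONTWAIT: return what is already queued instead of
     * blocking until the whole batch has been filled. */
    int n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
    for (int i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;
    return n;
}
```

The same call should work on a packet socket; one system call then
returns up to BATCH frames, which amortizes the per-call overhead
without setting up a ring.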

>>>>       2. Move some details from other sockopts (e.g. PACKET_LOSS) into
>>>>          *_RING.
> 
> Yes, please move all ring-specific details into the new ring section.
> 
>>>>       3. Add fully functional example source code for simple
>>>>          PACKET_{RX,TX}_RING operation (initialization and operation).
>>>>          This may be as much as 3 different example programs if I
>>>>          incorporate [2] and [3] in an appropriate manner. It might be a
>>>>          good idea to add a non-*_RING example as well.
>>
>>
>> Yes, some examples for mmap RX, mmap TX, fanout, and perhaps TPACKET_V3
>> might be great.
>>
>>
>>>>       4. Add a warning about inferior _TX_RING performance [1] which I
>>>>          suffered from only recently in the measurements I made for my
>>>>          thesis on Linux 3.14.
> 
> I would describe such points in a positive manner (optimization) as
> opposed to a negative (inferior performance).

Using positive wording is always a good idea, but packet_mmap.txt
already tricked me into believing that PACKET_TX_RING should be faster
than plain sendto(). The user should be allowed to make an informed
decision, which requires the manpage to tell the (ugly) truth that
sendto() currently outperforms TX_RING.

> The optimization you refer to is to attach the tx-only packet socket
> to a protocol family that is never observed, so that no packets are
> looped back into the socket on receive. This is a great trick. There
> are probably others. Again, I believe that such details belong more in
> packet_mmap.txt than in the man page. But that is just one opinion, so
> I'll gladly defer to Michael and others on that point.

Since the tx-only optimization socket(*, *, 0) is not TX_RING-specific,
this should be in the manpage (IMHO). As I said above, packet_mmap.txt
may be a decent spot for advanced and {RX,TX}_RING-specific techniques.
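
For reference, a minimal sketch of the tx-only trick, assuming a raw
packet socket bound to a single interface. The helper names tx_only_addr
and open_tx_only_socket are hypothetical; the only point is that the
protocol argument (and sll_protocol) stays 0, so the socket is never
handed any received packets.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <net/if.h>

/* Build a link-layer address that binds a packet socket to one
 * interface without subscribing it to any protocol. */
struct sockaddr_ll tx_only_addr(unsigned int ifindex)
{
    struct sockaddr_ll sll;

    assert(ifindex > 0);
    memset(&sll, 0, sizeof(sll));
    sll.sll_family = AF_PACKET;
    sll.sll_protocol = 0;           /* 0: no packets are received */
    sll.sll_ifindex = (int)ifindex;
    return sll;
}

/* Open a transmit-only packet socket on ifname (hypothetical helper).
 * Returns the descriptor, or -1 with errno set. Creating the socket
 * requires CAP_NET_RAW. */
int open_tx_only_socket(const char *ifname)
{
    unsigned int ifindex = if_nametoindex(ifname);
    if (ifindex == 0)
        return -1;

    int fd = socket(AF_PACKET, SOCK_RAW, 0);    /* protocol 0 */
    if (fd < 0)
        return -1;

    struct sockaddr_ll sll = tx_only_addr(ifindex);
    if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
        int saved = errno;          /* keep bind()'s error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Frames can then be sent on the returned descriptor with plain
send()/sendto(); whether this helps in a given setup is something to
measure rather than assume.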

>> Can you elaborate? Jesper made recently a nice summary on using trafgen
>> which uses TX_RING internally:
>>
>>   http://netoptimizer.blogspot.ch/2014/04/trafgen-fast-packet-generator.html

For my thesis I implemented a programmable switch in user-space (for
better programmability and guaranteed API compatibility). Testing with
64-byte frames I reached a maximum frame rate of ~0.6 Mpps using
{RX,TX}_RING, but ~0.87 Mpps using plain recvfrom()/sendto(). A hybrid
RX_RING/sendto() approach with separate RX/TX threads yielded the same
frame rate as tests with pktgen (~0.89 Mpps). Fun fact: Open vSwitch
reached ~1 Mpps and thus surprisingly surpassed pktgen. This might be
worth investigating.

>> Absolutely, perhaps explaining differences from TPACKET_V1 -> V3 API and the
>> like.
> 
> That would be very interesting. The packet -> block batching mechanism
> likely was tested with small packet performance, but may have little
> benefit for larger packets. A discussion of the trade offs from a user
> point of view would be very interesting.

Actually I intended to deal only with TPACKET_V2 for now, since it is
simpler than TPACKET_V3 and can be used for RX and TX. TPACKET_V3 can be
added later on or could remain in packet_mmap.txt.
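
As a rough idea of what a basic TPACKET_V2 description would have to
cover, here is a sketch of the ring setup, assuming the constraints
stated in packet_mmap.txt (block size a multiple of the page size, frame
size a multiple of TPACKET_ALIGNMENT and larger than the header, frame
count equal to frames per block times block count). The helper names
ring_geometry and setup_rx_ring_v2 are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Fill *req for a ring of block_nr blocks of block_size bytes, split
 * into frames of frame_size bytes. Returns 0 on success, or -1 if the
 * sizes violate the constraints checked below. */
int ring_geometry(unsigned int block_size, unsigned int block_nr,
                  unsigned int frame_size, struct tpacket_req *req)
{
    long page = sysconf(_SC_PAGESIZE);

    assert(req != NULL);
    if (page <= 0 || block_nr == 0 || block_size == 0)
        return -1;
    if (block_size % (unsigned long)page != 0)
        return -1;                  /* blocks must be whole pages */
    if (frame_size <= TPACKET2_HDRLEN || frame_size > block_size)
        return -1;                  /* frame must hold header plus data */
    if (frame_size % TPACKET_ALIGNMENT != 0)
        return -1;

    memset(req, 0, sizeof(*req));
    req->tp_block_size = block_size;
    req->tp_block_nr = block_nr;
    req->tp_frame_size = frame_size;
    req->tp_frame_nr = (block_size / frame_size) * block_nr;
    return 0;
}

/* Select TPACKET_V2 and request an RX ring on packet socket fd.
 * Returns 0 on success, or -1 with errno set. */
int setup_rx_ring_v2(int fd, const struct tpacket_req *req)
{
    int version = TPACKET_V2;

    if (setsockopt(fd, SOL_PACKET, PACKET_VERSION,
                   &version, sizeof(version)) < 0)
        return -1;
    return setsockopt(fd, SOL_PACKET, PACKET_RX_RING, req, sizeof(*req));
}
```

After this, the ring still has to be mapped with mmap() using a length
of tp_block_size * tp_block_nr; the same struct tpacket_req can be
passed with PACKET_TX_RING for the transmit side.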


Cheers,
Carsten

