* Improving PACKET_{RX,TX}_RING documentation
@ 2014-05-17 13:13 Carsten Andrich
[not found] ` <1400332406.2395.35.camel-FQO4gtnRtnzkVFMGpb/cPg@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Carsten Andrich @ 2014-05-17 13:13 UTC (permalink / raw)
To: linux-man-u79uwXL29TY76Z2rM5mHXA
Cc: Daniel Borkmann, Willem de Bruijn, Neil Horman,
Michael Kerrisk-manpages
Hello again everyone,
roughly 3 weeks ago the aftermath of an actually minor patch to fix an
inaccuracy in packet.7's PACKET_TX_RING-related documentation led me to
offer improving the entire PACKET_{RX,TX}_RING-documentation.
Since I do happen to have most of my spare time back by now, I'd like to
tackle this effort before I change my mind :)
On 04/24/2014 12:21 PM, Michael Kerrisk (man-pages) wrote:
> I'd leave that plan largely to you. It sounds like Willem and
> Daniel are willing to help out.
I'd like to start with getting packet.7's documentation of
PACKET_{RX,TX}_RING into a shape that should allow most readers to
actually use it without consulting packet_mmap.txt. The latter can be
quite confusing for those unfamiliar with PACKET_{RX,TX}_RING.
I plan to do the following to packet.7:
1. Increase detail of PACKET_{RX,TX}_RING socket options, including
description of struct tpacket_hdr and anything else required to
operate the ring.
2. Move some details from other sockopts (e.g. PACKET_LOSS) into
*_RING.
3. Add fully functional example source code for simple
PACKET_{RX,TX}_RING operation (initialization and operation).
This may be as much as 3 different example programs if I
incorporate [2] and [3] in an appropriate manner. It might be a
good idea to add a non-*_RING example as well.
4. Add a warning about inferior _TX_RING performance [1] which I
suffered from only recently in the measurements I made for my
thesis on Linux 3.14.
5. Other minor changes that'll come up while taking care of 1 thru
4 :)
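As a rough illustration of item 1, here is a minimal sketch of the ring geometry that PACKET_RX_RING and PACKET_TX_RING take in struct tpacket_req. The checks mirror the constraints listed in packet_mmap.txt (block size a multiple of the page size, frame size larger than the header and a multiple of TPACKET_ALIGNMENT, frame count equal to frames per block times block count). The helper name fill_ring_req() and its parameters are made up for this example, and it is deliberately stricter than the kernel in that it always makes the block size an exact multiple of the frame size.

```c
#include <unistd.h>
#include <linux/if_packet.h>

/* Hypothetical helper (not part of any kernel or libc API): fill *req for a
 * TPACKET_V2 ring, or return -1 if the geometry violates a ring constraint. */
static int fill_ring_req(struct tpacket_req *req, unsigned int frame_size,
                         unsigned int frames_per_block, unsigned int block_nr)
{
    unsigned int page = (unsigned int)sysconf(_SC_PAGESIZE);
    unsigned int block_size = frame_size * frames_per_block;

    /* A frame must hold the V2 header and be TPACKET_ALIGNMENT-aligned. */
    if (frame_size <= TPACKET2_HDRLEN || frame_size % TPACKET_ALIGNMENT != 0)
        return -1;
    /* A block must be a non-empty multiple of the page size. */
    if (block_size == 0 || block_size % page != 0)
        return -1;
    if (block_nr == 0)
        return -1;

    req->tp_block_size = block_size;
    req->tp_block_nr   = block_nr;
    req->tp_frame_size = frame_size;
    /* The total frame count must match the per-block layout exactly. */
    req->tp_frame_nr   = frames_per_block * block_nr;
    return 0;
}
```

With 2048-byte frames and 4096-byte pages, fill_ring_req(&req, 2048, 2, 4) would describe four blocks of two frames each.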
Any suggestions regarding this rough course of action?
Cheers,
Carsten
[1]https://github.com/netsniff-ng/netsniff-ng/commit/c3602a9
[2]https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/Documentation/networking/packet_mmap.txt?id=66e56cd46b93ef407c60adcac62cf33b06119d50
[3]https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/Documentation/networking/packet_mmap.txt?id=7e11daa7c19ec319fa4b750fd249a18957f17797
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Improving PACKET_{RX,TX}_RING documentation
[not found] ` <1400332406.2395.35.camel-FQO4gtnRtnzkVFMGpb/cPg@public.gmane.org>
@ 2014-05-19 4:54 ` Michael Kerrisk (man-pages)
[not found] ` <53798E97.1000505-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-19 4:54 UTC (permalink / raw)
To: Carsten Andrich, linux-man-u79uwXL29TY76Z2rM5mHXA
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Daniel Borkmann,
Willem de Bruijn, Neil Horman
Hi Carsten,
On 05/17/2014 03:13 PM, Carsten Andrich wrote:
> Hello again everyone,
>
> roughly 3 weeks ago the aftermath of an actually minor patch to fix an
> inaccuracy in packet.7's PACKET_TX_RING-related documentation led me to
> offer improving the entire PACKET_{RX,TX}_RING-documentation.
> Since I do happen to have most of my spare time back by now, I'd like to
> tackle this effort before I change my mind :)
Thanks for following up!
> On 04/24/2014 12:21 PM, Michael Kerrisk (man-pages) wrote:
>> I'd leave that plan largely to you. It sounds like Willem and
>> Daniel are willing to help out.
>
> I'd like to start with getting packet.7's documentation of
> PACKET_{RX,TX}_RING into a shape that should allow most readers to
> actually use it without consulting packet_mmap.txt. The latter can be
> quite confusing for those unfamiliar with PACKET_{RX,TX}_RING.
>
> I plan to do the following to packet.7:
> 1. Increase detail of PACKET_{RX,TX}_RING socket options, including
> description of struct tpacket_hdr and anything else required to
> operate the ring.
> 2. Move some details from other sockopts (e.g. PACKET_LOSS) into
> *_RING.
> 3. Add fully functional example source code for simple
> PACKET_{RX,TX}_RING operation (initialization and operation).
> This may be as much as 3 different example programs if I
> incorporate [2] and [3] in an appropriate manner. It might be a
> good idea to add a non-*_RING example as well.
> 4. Add a warning about inferior _TX_RING performance [1] which I
> suffered from only recently in the measurements I made for my
> thesis on Linux 3.14.
> 5. Other minor changes that'll come up while taking care of 1 thru
> 4 :)
>
> Any suggestions regarding this rough course of action?
Well, I can't speak to the fine technical details, but the plan looks
rational to me. Perhaps Neil, Willem, or Daniel has a comment.
Just by the way, I suggest CCing netdeve-u79uwXL29TY76Z2rM5mHXA@public.gmane.org on all patches.
It may be that someone else also comments.
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
* Re: Improving PACKET_{RX,TX}_RING documentation
[not found] ` <53798E97.1000505-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-05-19 10:14 ` Daniel Borkmann
[not found] ` <5379D9A2.1070008-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Daniel Borkmann @ 2014-05-19 10:14 UTC (permalink / raw)
To: Michael Kerrisk (man-pages)
Cc: Carsten Andrich, linux-man-u79uwXL29TY76Z2rM5mHXA,
Willem de Bruijn, Neil Horman, jbrouer-H+wXaHxf7aLQT0dZR+AlfA
Hi Carsten,
On 05/19/2014 06:54 AM, Michael Kerrisk (man-pages) wrote:
> On 05/17/2014 03:13 PM, Carsten Andrich wrote:
>> Hello again everyone,
>>
>> roughly 3 weeks ago the aftermath of an actually minor patch to fix an
>> inaccuracy in packet.7's PACKET_TX_RING-related documentation led me to
>> offer improving the entire PACKET_{RX,TX}_RING-documentation.
>> Since I do happen to have most of my spare time back by now, I'd like to
>> tackle this effort before I change my mind :)
>
> Thanks for following up!
>
>> On 04/24/2014 12:21 PM, Michael Kerrisk (man-pages) wrote:
>>> I'd leave that plan largely to you. It sounds like Willem and
>>> Daniel are willing to help out.
>>
>> I'd like to start with getting packet.7's documentation of
>> PACKET_{RX,TX}_RING into a shape that should allow most readers to
>> actually use it without consulting packet_mmap.txt. The latter can be
>> quite confusing for those unfamiliar with PACKET_{RX,TX}_RING.
>>
>> I plan to do the following to packet.7:
0. Perhaps a general writeup on how the RX/TX_RING works in Linux,
its layout, constraints etc. Btw, not sure if that's also
included already, but the same mmap-technique exists also for
netlink sockets.
>> 1. Increase detail of PACKET_{RX,TX}_RING socket options, including
>> description of struct tpacket_hdr and anything else required to
>> operate the ring.
>> 2. Move some details from other sockopts (e.g. PACKET_LOSS) into
>> *_RING.
>> 3. Add fully functional example source code for simple
>> PACKET_{RX,TX}_RING operation (initialization and operation).
>> This may be as much as 3 different example programs if I
>> incorporate [2] and [3] in an appropriate manner. It might be a
>> good idea to add a non-*_RING example as well.
Yes, some examples for mmap RX, mmap TX, fanout, and perhaps TPACKET_V3
might be great.
>> 4. Add a warning about inferior _TX_RING performance [1] which I
>> suffered from only recently in the measurements I made for my
>> thesis on Linux 3.14.
Can you elaborate? Jesper recently made a nice summary on using trafgen,
which uses TX_RING internally:
http://netoptimizer.blogspot.ch/2014/04/trafgen-fast-packet-generator.html
>> 5. Other minor changes that'll come up while taking care of 1 thru
>> 4 :)
Absolutely, perhaps explaining differences from TPACKET_V1 -> V3 API and the like.
>> Any suggestions regarding this rough course of action?
>
> Well, I can't speak to the fine technical details, but the plan looks
> rational to me. Perhaps Neil, Willem, or Daniel has a comment.
>
> Just by the way, I suggest CCing netdeve-u79uwXL29TY76Z2rM5mHXA@public.gmane.org on all patches.
> It may be that someone else also comments.
>
> Cheers,
>
> Michael
* Re: Improving PACKET_{RX,TX}_RING documentation
[not found] ` <5379D9A2.1070008-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-05-19 15:05 ` Willem de Bruijn
[not found] ` <CA+FuTSeWh_iQGqc-4usL7vr28OrkHTnBvHvXvVO=LcGsNRgtMA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Willem de Bruijn @ 2014-05-19 15:05 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Michael Kerrisk (man-pages), Carsten Andrich,
linux-man-u79uwXL29TY76Z2rM5mHXA, Neil Horman,
jbrouer-H+wXaHxf7aLQT0dZR+AlfA
On Mon, May 19, 2014 at 6:14 AM, Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> Hi Carsten,
>
>
> On 05/19/2014 06:54 AM, Michael Kerrisk (man-pages) wrote:
>>
>> On 05/17/2014 03:13 PM, Carsten Andrich wrote:
>>>
>>> Hello again everyone,
>>>
>>> roughly 3 weeks ago the aftermath of an actually minor patch to fix an
>>> inaccuracy in packet.7's PACKET_TX_RING-related documentation led me to
>>> offer improving the entire PACKET_{RX,TX}_RING-documentation.
>>> Since I do happen to have most of my spare time back by now, I'd like to
>>> tackle this effort before I change my mind :)
>>
>>
>> Thanks for following up!
Indeed!
>>
>>> On 04/24/2014 12:21 PM, Michael Kerrisk (man-pages) wrote:
>>>>
>>>> I'd leave that plan largely to you. It sounds like Willem and
>>>> Daniel are willing to help out.
>>>
>>>
>>> I'd like to start with getting packet.7's documentation of
>>> PACKET_{RX,TX}_RING into a shape that should allow most readers to
>>> actually use it without consulting packet_mmap.txt. The latter can be
>>> quite confusing for those unfamiliar with PACKET_{RX,TX}_RING.
>>>
>>> I plan to do the following to packet.7:
>
>
> 0. Perhaps a general writeup on how the RX/TX_RING works in Linux,
> its layout, constraints etc. Btw, not sure if that's also
This would duplicate the contents of
Documentation/networking/packet_mmap.txt? I would caution against
having two sources of documentation that may become inconsistent over
time. A detailed discussion could also become too long for a manual
page: packet_mmap.txt is already 1067 lines (albeit about half in
example code). If that document is confusing, a thorough edit of that
would be very helpful, though.
> included already, but the same mmap-technique exists also for
> netlink sockets.
See also Documentation/networking/netlink_mmap.txt . If the ring is a
generic netlink feature (i.e., not specific to nfnetlink), then man 7
netlink is the right place for user documentation (in as far as this
is a user-oriented feature).
>>> 1. Increase detail of PACKET_{RX,TX}_RING socket options, including
>>> description of struct tpacket_hdr and anything else required to
>>> operate the ring.
If expanding the man page, then moving mmap into a separate section
sounds good to me. If a man page is more user documentation than
kernel Documentation/ then perhaps start by discussing the pros and
cons of mmapped rings over recv and to help users decide whether to
use the mmapped ring, or for instance batch with recvmmsg().
>>> 2. Move some details from other sockopts (e.g. PACKET_LOSS) into
>>> *_RING.
Yes, please move all ring-specific details into the new ring section.
>>> 3. Add fully functional example source code for simple
>>> PACKET_{RX,TX}_RING operation (initialization and operation).
>>> This may be as much as 3 different example programs if I
>>> incorporate [2] and [3] in an appropriate manner. It might be a
>>> good idea to add a non-*_RING example as well.
>
>
> Yes, some examples for mmap RX, mmap TX, fanout, and perhaps TPACKET_V3
> might be great.
>
>
>>> 4. Add a warning about inferior _TX_RING performance [1] which I
>>> suffered from only recently in the measurements I made for my
>>> thesis on Linux 3.14.
I would describe such points in a positive manner (optimization) as
opposed to a negative (inferior performance).
The optimization you refer to is to attach the tx-only packet socket
to a protocol family that is never observed, so that no packets are
looped back into the socket on receive. This is a great trick. There
are probably others. Again, I believe that such details belong more in
packet_mmap.txt than in the man page. But that is just one opinion, so
I'll gladly defer to Michael and others on that point.
>
>
> Can you elaborate? Jesper recently made a nice summary on using trafgen,
> which uses TX_RING internally:
>
> http://netoptimizer.blogspot.ch/2014/04/trafgen-fast-packet-generator.html
>
>
>>> 5. Other minor changes that'll come up while taking care of 1 thru
>>> 4 :)
>
>
> Absolutely, perhaps explaining differences from TPACKET_V1 -> V3 API and the
> like.
That would be very interesting. The packet -> block batching mechanism
likely was tested with small packet performance, but may have little
benefit for larger packets. A discussion of the trade offs from a user
point of view would be very interesting.
>>> Any suggestions regarding this rough course of action?
>>
>>
>> Well, I can't speak to the fine technical details, but the plan looks
>> rational to me. Perhaps Neil, Willem, or Daniel has a comment.
>>
>> Just by the way, I suggest CCing netdeve-u79uwXL29TY76Z2rM5mHXA@public.gmane.org on all patches.
>> It may be that someone else also comments.
>>
>> Cheers,
>>
>> Michael
* Re: Improving PACKET_{RX,TX}_RING documentation
[not found] ` <CA+FuTSeWh_iQGqc-4usL7vr28OrkHTnBvHvXvVO=LcGsNRgtMA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-05-19 16:01 ` Daniel Borkmann
2014-05-22 12:22 ` Carsten Andrich
1 sibling, 0 replies; 10+ messages in thread
From: Daniel Borkmann @ 2014-05-19 16:01 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Michael Kerrisk (man-pages), Carsten Andrich,
linux-man-u79uwXL29TY76Z2rM5mHXA, Neil Horman,
jbrouer-H+wXaHxf7aLQT0dZR+AlfA
On 05/19/2014 05:05 PM, Willem de Bruijn wrote:
...
> This would duplicate the contents of
> Documentation/networking/packet_mmap.txt? I would caution against
> having two sources of documentation that may become inconsistent over
> time. A detailed discussion could also become too long for a manual
> page: packet_mmap.txt is already 1067 lines (albeit about half in
> example code). If that document is confusing, a thorough edit of that
> would be very helpful, though.
Okay, agreed, that would also be a good idea.
* Re: Improving PACKET_{RX,TX}_RING documentation
[not found] ` <CA+FuTSeWh_iQGqc-4usL7vr28OrkHTnBvHvXvVO=LcGsNRgtMA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-05-19 16:01 ` Daniel Borkmann
@ 2014-05-22 12:22 ` Carsten Andrich
2014-05-22 13:13 ` Michael Kerrisk (man-pages)
` (2 more replies)
1 sibling, 3 replies; 10+ messages in thread
From: Carsten Andrich @ 2014-05-22 12:22 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Daniel Borkmann, Michael Kerrisk (man-pages),
linux-man-u79uwXL29TY76Z2rM5mHXA, Neil Horman,
jbrouer-H+wXaHxf7aLQT0dZR+AlfA
Thanks for the feedback, Daniel and Willem.
Executive summary: We need a good concept for distributing (preferably
mutually exclusive) information among packet(7) and packet_mmap.txt.
Willem de Bruijn wrote:
>> 0. Perhaps a general writeup on how the RX/TX_RING works in Linux,
>> its layout, constraints etc. Btw, not sure if that's also
>
> This would duplicate the contents of
> Documentation/networking/packet_mmap.txt? I would caution against
> having two sources of documentation that may become inconsistent over
> time. A detailed discussion could also become too long for a manual
> page: packet_mmap.txt is already 1067 lines (albeit about half in
> example code). If that document is confusing, a thorough edit of that
> would be very helpful, though.
>
>> included already, but the same mmap-technique exists also for
>> netlink sockets.
>
> See also Documentation/networking/netlink_mmap.txt . If the ring is a
> generic netlink feature (i.e., not specific to nfnetlink), then man 7
> netlink is the right place for user documentation (in as far as this
> is a user-oriented feature).
Some deduplication between netlink(7), packet(7), netlink_mmap.txt and
packet_mmap.txt is probably a good idea. However, this is much more than
I initially bargained for :)
I had a superficial look at the netlink documents, and most concepts
appear to be very much alike. The operational aspects make quite a large
exception, though, since the netlink header (usage) is a lot simpler
than tpacket_hdr including its different versions.
IMHO user-space API documentation should reside in the man page and not
Documentation/, but I'd like to hear Michael's opinion on that. Maybe
it's a good idea to have at least a basic description on packet(7) and
reserve packet_mmap.txt for the more advanced topics?
>>>> 1. Increase detail of PACKET_{RX,TX}_RING socket options, including
>>>> description of struct tpacket_hdr and anything else required to
>>>> operate the ring.
>
> If expanding the man page, then moving mmap into a separate section
> sounds good to me. If a man page is more user documentation than
> kernel Documentation/ then perhaps start by discussing the pros and
> cons of mmapped rings over recv and to help users decide whether to
> use the mmapped ring, or for instance batch with recvmmsg().
Actually I wanted to maintain the structure of the man page and
describe everything inside the appropriate "Socket options" sections.
However, adding a reference to recvmmsg() is a good idea as well, which
would justify creating a new section for "Advanced packet socket
techniques".
>>>> 2. Move some details from other sockopts (e.g. PACKET_LOSS) into
>>>> *_RING.
>
> Yes, please move all ring-specific details into the new ring section.
>
>>>> 3. Add fully functional example source code for simple
>>>> PACKET_{RX,TX}_RING operation (initialization and operation).
>>>> This may be as much as 3 different example programs if I
>>>> incorporate [2] and [3] in an appropriate manner. It might be a
>>>> good idea to add a non-*_RING example as well.
>>
>>
>> Yes, some examples for mmap RX, mmap TX, fanout, and perhaps TPACKET_V3
>> might be great.
>>
>>
>>>> 4. Add a warning about inferior _TX_RING performance [1] which I
>>>> suffered from only recently in the measurements I made for my
>>>> thesis on Linux 3.14.
>
> I would describe such points in a positive manner (optimization) as
> opposed to a negative (inferior performance).
Using positive wording is always a good idea, but packet_mmap.txt
already tricked me into believing that PACKET_TX_RING should be faster
than plain sendto(). The user should be allowed to make an informed
decision, which requires the manpage to tell the (ugly) truth that
sendto() currently outperforms TX_RING.
> The optimization you refer to is to attach the tx-only packet socket
> to a protocol family that is never observed, so that no packets are
> looped back into the socket on receive. This is a great trick. There
> are probably others. Again, I believe that such details belong more in
> packet_mmap.txt than in the man page. But that is just one opinion, so
> I'll gladly defer to Michael and others on that point.
Since the tx-only optimization socket(*, *, 0) is not TX_RING-specific,
this should be in the manpage (IMHO). As I said above, packet_mmap.txt
may be a decent spot for advanced and {RX,TX}_RING-specific techniques.
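To make the socket(*, *, 0) point concrete, here is a minimal sketch of opening a transmit-only packet socket. Passing protocol 0 instead of the usual htons(ETH_P_ALL) means the socket is not registered for any protocol, so no received frames are queued on it. The helper name open_tx_only_socket() is made up for this example; the call needs CAP_NET_RAW, and frames are then sent with sendto() and a struct sockaddr_ll naming the outgoing interface.

```c
#include <sys/socket.h>
#include <unistd.h>
#include <linux/if_packet.h>

/* Hypothetical helper: open a packet socket intended for transmission only.
 * Returns the file descriptor, or -1 with errno set by socket(2). */
static int open_tx_only_socket(void)
{
    /* Protocol 0: the socket receives nothing until it is bound to a
     * nonzero sll_protocol, so sent frames are not looped back into it. */
    return socket(AF_PACKET, SOCK_RAW, 0);
}
```

The caller is expected to close(fd) when done; a receiving socket would pass htons(ETH_P_ALL) or a specific protocol instead of 0.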
>> Can you elaborate? Jesper recently made a nice summary on using trafgen,
>> which uses TX_RING internally:
>>
>> http://netoptimizer.blogspot.ch/2014/04/trafgen-fast-packet-generator.html
For my thesis I implemented a programmable switch in user-space (for
better programmability and guaranteed API compatibility). Testing with
64-byte frames I reached a maximum frame rate of ~0.6 Mpps using
{RX,TX}_RING, but ~0.87 Mpps using plain recvfrom()/sendto(). A hybrid
RX_RING/sendto() approach with separate RX/TX threads yielded the same
frame rate as tests with pktgen (~0.89 Mpps). Fun fact: Open vSwitch
reached ~1 Mpps and thus surprisingly surpassed pktgen. This might be
worth investigating.
>> Absolutely, perhaps explaining differences from TPACKET_V1 -> V3 API and the
>> like.
>
> That would be very interesting. The packet -> block batching mechanism
> likely was tested with small packet performance, but may have little
> benefit for larger packets. A discussion of the trade offs from a user
> point of view would be very interesting.
Actually I intended to deal only with TPACKET_V2 for now, since it is
simpler than TPACKET_V3 and can be used for RX and TX. TPACKET_V3 can be
added later on or could remain in packet_mmap.txt.
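For reference, a minimal sketch of what TPACKET_V2 ring initialization looks like: select the header version, request the ring, then map it. The helper name setup_v2_rx_ring() is made up for this example, and error handling is reduced to returning NULL. The same sequence with PACKET_TX_RING would set up a transmit ring.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Hypothetical helper: configure a TPACKET_V2 RX ring on packet socket fd
 * and map it. Returns the mapping, or NULL on error (errno is set by the
 * failing call). */
static void *setup_v2_rx_ring(int fd, const struct tpacket_req *req)
{
    int version = TPACKET_V2;
    size_t len = (size_t)req->tp_block_size * req->tp_block_nr;
    void *ring;

    /* The header version has to be chosen before the ring is created. */
    if (setsockopt(fd, SOL_PACKET, PACKET_VERSION,
                   &version, sizeof(version)) < 0)
        return NULL;
    /* Ask the kernel to allocate the ring described by *req. */
    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, req, sizeof(*req)) < 0)
        return NULL;
    /* Map all blocks as one contiguous, shared region. */
    ring = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return ring == MAP_FAILED ? NULL : ring;
}
```

The mapping should later be released with munmap() using the same length, tp_block_size * tp_block_nr.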
Cheers,
Carsten
* Re: Improving PACKET_{RX,TX}_RING documentation
2014-05-22 12:22 ` Carsten Andrich
@ 2014-05-22 13:13 ` Michael Kerrisk (man-pages)
2014-05-22 13:37 ` Jesper Dangaard Brouer
2014-05-22 14:51 ` Willem de Bruijn
2 siblings, 0 replies; 10+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-22 13:13 UTC (permalink / raw)
To: Carsten Andrich
Cc: Willem de Bruijn, Daniel Borkmann, linux-man, Neil Horman,
jbrouer-H+wXaHxf7aLQT0dZR+AlfA
On Thu, May 22, 2014 at 2:22 PM, Carsten Andrich
<carsten.andrich-hs6bpBdVsEZfm0AUMx9V0g@public.gmane.org> wrote:
> Thanks for the feedback, Daniel and Willem.
>
> Executive summary: We need a good concept for distributing (preferably
> mutually exclusive) information among packet(7) and packet_mmap.txt.
>
> Willem de Bruijn wrote:
>>> 0. Perhaps a general writeup on how the RX/TX_RING works in Linux,
>>> its layout, constraints etc. Btw, not sure if that's also
>>
>> This would duplicate the contents of
>> Documentation/networking/packet_mmap.txt? I would caution against
>> having two sources of documentation that may become inconsistent over
>> time. A detailed discussion could also become too long for a manual
>> page: packet_mmap.txt is already 1067 lines (albeit about half in
>> example code). If that document is confusing, a thorough edit of that
>> would be very helpful, though.
>>
>>> included already, but the same mmap-technique exists also for
>>> netlink sockets.
>>
>> See also Documentation/networking/netlink_mmap.txt . If the ring is a
>> generic netlink feature (i.e., not specific to nfnetlink), then man 7
>> netlink is the right place for user documentation (in as far as this
>> is a user-oriented feature).
>
> Some deduplication between netlink(7), packet(7), netlink_mmap.txt and
> packet_mmap.txt is probably a good idea. However, this is much more than
> I initially bargained for :)
> I had a superficial look at the netlink documents, and most concepts
> appear to be very much alike. The operational aspects make quite a large
> exception, though, since the netlink header (usage) is a lot simpler
> than tpacket_hdr including its different versions.
>
> IMHO user-space API documentation should reside in the man page and not
> Documentation/, but I'd like to hear Michael's opinion on that. Maybe
> it's a good idea to have at least a basic description on packet(7) and
> reserve packet_mmap.txt for the more advanced topics?
Yes, user-space API documentation is best in the man pages, IMO. The
split you suggest seems okay to me: the essentials in packet(7) and
referring as needed for advanced stuff to Documentation/
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
* Re: Improving PACKET_{RX,TX}_RING documentation
2014-05-22 12:22 ` Carsten Andrich
2014-05-22 13:13 ` Michael Kerrisk (man-pages)
@ 2014-05-22 13:37 ` Jesper Dangaard Brouer
2014-05-22 14:51 ` Willem de Bruijn
2 siblings, 0 replies; 10+ messages in thread
From: Jesper Dangaard Brouer @ 2014-05-22 13:37 UTC (permalink / raw)
To: Carsten Andrich
Cc: Willem de Bruijn, Daniel Borkmann, Michael Kerrisk (man-pages),
linux-man-u79uwXL29TY76Z2rM5mHXA, Neil Horman
On Thu, 22 May 2014 14:22:22 +0200 Carsten Andrich <carsten.andrich-hs6bpBdVsEZfm0AUMx9V0g@public.gmane.org> wrote:
> Using positive wording is always a good idea, but packet_mmap.txt
> already tricked me into believing that PACKET_TX_RING should be faster
> than plain sendto(). The user should be allowed to make an informed
> decision, which requires the manpage to tell the (ugly) truth that
> sendto() currently outperforms TX_RING.
See:
https://github.com/netsniff-ng/netsniff-ng/commit/d21b30bd64fdf4e7358037aa2d6f0cea02c49b6e
With the most recent trafgen and small packets, the TX_RING version is faster than sendto().
> > The optimization you refer to is to attach the tx-only packet socket
> > to a protocol family that is never observed, so that no packets are
> > looped back into the socket on receive. This is a great trick. There
> > are probably others. Again, I believe that such details belong more in
> > packet_mmap.txt than in the man page. But that is just one opinion, so
> > I'll gladly defer to Michael and others on that point.
See commit, on how to avoid the packet_rcv() call:
https://github.com/netsniff-ng/netsniff-ng/commit/c3602a995b21e8133c7f
The setup differs a little between sendto() and TX_RING.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
* Re: Improving PACKET_{RX,TX}_RING documentation
2014-05-22 12:22 ` Carsten Andrich
2014-05-22 13:13 ` Michael Kerrisk (man-pages)
2014-05-22 13:37 ` Jesper Dangaard Brouer
@ 2014-05-22 14:51 ` Willem de Bruijn
[not found] ` <CA+FuTSfpORKtm_kdG+CycoPiq+Gxf58=nXqKApFEmR+xZs69_g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2 siblings, 1 reply; 10+ messages in thread
From: Willem de Bruijn @ 2014-05-22 14:51 UTC (permalink / raw)
To: Carsten Andrich
Cc: Daniel Borkmann, Michael Kerrisk (man-pages),
linux-man-u79uwXL29TY76Z2rM5mHXA, Neil Horman,
jbrouer-H+wXaHxf7aLQT0dZR+AlfA
On Thu, May 22, 2014 at 8:22 AM, Carsten Andrich
<carsten.andrich-hs6bpBdVsEZfm0AUMx9V0g@public.gmane.org> wrote:
> Thanks for the feedback, Daniel and Willem.
>
> Executive summary: We need a good concept for distributing (preferably
> mutually exclusive) information among packet(7) and packet_mmap.txt.
>
> Willem de Bruijn wrote:
>>> 0. Perhaps a general writeup on how the RX/TX_RING works in Linux,
>>> its layout, constraints etc. Btw, not sure if that's also
>>
>> This would duplicate the contents of
>> Documentation/networking/packet_mmap.txt? I would caution against
>> having two sources of documentation that may become inconsistent over
>> time. A detailed discussion could also become too long for a manual
>> page: packet_mmap.txt is already 1067 lines (albeit about half in
>> example code). If that document is confusing, a thorough edit of that
>> would be very helpful, though.
>>
>>> included already, but the same mmap-technique exists also for
>>> netlink sockets.
>>
>> See also Documentation/networking/netlink_mmap.txt . If the ring is a
>> generic netlink feature (i.e., not specific to nfnetlink), then man 7
>> netlink is the right place for user documentation (in as far as this
>> is a user-oriented feature).
>
> Some deduplication between netlink(7), packet(7), netlink_mmap.txt and
> packet_mmap.txt is probably a good idea. However, this is much more than
> I initially bargained for :)
Only do the bits that you enjoy. I certainly did not mean to imply that you
should do all this :) Just be aware of the consistency problem of
duplicating existing documentation.
> I had a superficial look at the netlink documents, and most concepts
> appear to be very much alike. The operational aspects make quite a large
> exception, though, since the netlink header (usage) is a lot simpler
> than tpacket_hdr including its different versions.
>
> IMHO user-space API documentation should reside in the man page and not
> Documentation/, but I'd like to hear Michael's opinion on that. Maybe
> it's a good idea to have at least a basic description on packet(7) and
> reserve packet_mmap.txt for the more advanced topics?
>
>>>>> 1. Increase detail of PACKET_{RX,TX}_RING socket options, including
>>>>> description of struct tpacket_hdr and anything else required to
>>>>> operate the ring.
>>
>> If expanding the man page, then moving mmap into a separate section
>> sounds good to me. If a man page is more user documentation than
>> kernel Documentation/ then perhaps start by discussing the pros and
>> cons of mmapped rings over recv and to help users decide whether to
>> use the mmapped ring, or for instance batch with recvmmsg().
>
> Actually I wanted to maintain the structure of the man page and
> describe everything inside the appropriate "Socket options" sections.
This may make the document unbalanced. Some options are only relevant
to the rings, and the ring setup itself is a large paragraph.
> However, adding a reference to recvmmsg() is a good idea as well, which
> would justify creating a new section for "Advanced packet socket
> techniques".
>
>>>>> 2. Move some details from other sockopts (e.g. PACKET_LOSS) into
>>>>> *_RING.
>>
>> Yes, please move all ring-specific details into the new ring section.
>>
>>>>> 3. Add fully functional example source code for simple
>>>>> PACKET_{RX,TX}_RING operation (initialization and operation).
>>>>> This may be as much as 3 different example programs if I
>>>>> incorporate [2] and [3] in an appropriate manner. It might be a
>>>>> good idea to add a non-*_RING example as well.
>>>
>>>
>>> Yes, some examples for mmap RX, mmap TX, fanout, and perhaps TPACKET_V3
>>> might be great.
>>>
>>>
>>>>> 4. Add a warning about inferior _TX_RING performance [1] which I
>>>>> suffered from only recently in the measurements I made for my
>>>>> thesis on Linux 3.14.
>>
>> I would describe such points in a positive manner (optimization) as
>> opposed to a negative (inferior performance).
>
> Using positive wording is always a good idea, but packet_mmap.txt
> already tricked me into believing that PACKET_TX_RING should be faster
> than plain sendto(). The user should be allowed to make an informed
> decision,
Indeed. The document should not contain any simple statements about
one option being faster than another, because this invariably depends on
workload details (packet size, rate, threading, ...).
Instead, it should just explain the technical details and their implications:
an mmapped ring reduces the number of system calls, as does
recvmmsg/sendmmsg. It does not necessarily reduce the number of
data copies (a common misconception). Etcetera.
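
To illustrate the system-call batching mentioned above, here is a
minimal sketch of receiving several datagrams with one recvmmsg() call.
The helper name recv_batch() and the constants BATCH and BUFSZ are
made up for this example; the same pattern should apply to a packet
socket, but the sketch only assumes a datagram socket descriptor.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* recvmmsg() is a Linux/GNU extension */
#endif
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH 8                /* hypothetical: max datagrams per call */
#define BUFSZ 2048             /* hypothetical: buffer size per datagram */

/*
 * Receive up to BATCH datagrams from fd with a single system call.
 * On success returns the number of datagrams received and stores the
 * length of datagram i in lens[i]; returns -1 on error.
 */
int recv_batch(int fd, char bufs[BATCH][BUFSZ], unsigned int lens[BATCH])
{
    struct mmsghdr msgs[BATCH];
    struct iovec iov[BATCH];
    int i, n;

    memset(msgs, 0, sizeof(msgs));
    for (i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = BUFSZ;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* MSG_DONTWAIT: take what is already queued, do not wait for a full batch */
    n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
    for (i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;
    return n;
}
```

Each datagram is still copied into a user buffer here; only the number
of kernel entries per datagram goes down.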
> which requires the manpage to tell the (ugly) truth that
> sendto() currently outperforms TX_RING.
I would not make such statements either way, then.
>
>> The optimization you refer to is to attach the tx-only packet socket
>> to a protocol family that is never observed, so that no packets are
>> looped back into the socket on receive. This is a great trick. There
>> are probably others. Again, I believe that such details belong more in
>> packet_mmap.txt than in the man page. But that is just one opinion, so
>> I'll gladly defer to Michael and others on that point.
>
> Since the tx-only optimization socket(*, *, 0) is not TX_RING-specific,
> this should be in the manpage (IMHO).
Agreed. I actually was unaware that 0 is even a correct value. As I said,
I used to use unpopular protocol filters to achieve the same.
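
A minimal sketch of the socket(*, *, 0) variant discussed above,
assuming a raw packet socket and CAP_NET_RAW. The function names
open_packet_socket() and bound_protocol() are invented for this
illustration; only socket(), getsockname() and struct sockaddr_ll are
existing interfaces.

```c
#include <assert.h>
#include <arpa/inet.h>         /* htons(), ntohs() */
#include <linux/if_ether.h>    /* ETH_P_ALL */
#include <linux/if_packet.h>   /* struct sockaddr_ll */
#include <sys/socket.h>

/*
 * Open a raw packet socket. With tx_only set, protocol 0 is passed so
 * the socket is not registered for any incoming protocol; otherwise it
 * receives all protocols. Returns the descriptor, or -1 with errno set
 * (for example when the caller lacks CAP_NET_RAW).
 */
int open_packet_socket(int tx_only)
{
    int protocol = tx_only ? 0 : htons(ETH_P_ALL);

    return socket(AF_PACKET, SOCK_RAW, protocol);
}

/*
 * Return the protocol (host byte order) the packet socket is currently
 * registered for, or -1 on error. A tx-only socket reports 0.
 */
int bound_protocol(int fd)
{
    struct sockaddr_ll sll;
    socklen_t len = sizeof(sll);

    if (getsockname(fd, (struct sockaddr *)&sll, &len) < 0)
        return -1;
    return ntohs(sll.sll_protocol);
}
```

A socket opened this way can still be used with sendto() or a TX ring;
a later bind() with a nonzero sll_protocol would start reception again.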
> As I said above, packet_mmap.txt
> may be a decent spot for advanced and {RX,TX}_RING-specific techniques.
>
>>> Can you elaborate? Jesper made recently a nice summary on using trafgen
>>> which uses TX_RING internally:
>>>
>>> http://netoptimizer.blogspot.ch/2014/04/trafgen-fast-packet-generator.html
>
> For my thesis I implemented a programmable switch in user-space (for
> better programmability and guaranteed API compatibility). Testing with
> 64-byte frames I reached a maximum frame rate of ~0.6 Mpps using
> {RX,TX}_RING, but ~0.87 Mpps using plain recvfrom()/sendto(). A hybrid
> RX_RING/sendto() approach with separate RX/TX threads yielded the same
> frame rate as tests with pktgen (~0.89 Mpps). Fun fact: Open vSwitch
> reached ~1 Mpps and thus surprisingly surpassed pktgen. This might be
> worth investigating.
>
>>> Absolutely, perhaps explaining differences from TPACKET_V1 -> V3 API and the
>>> like.
>>
>> That would be very interesting. The packet -> block batching mechanism
>> likely was tested with small packet performance, but may have little
>> benefit for larger packets. A discussion of the trade offs from a user
>> point of view would be very interesting.
>
> Actually I intended to deal only with TPACKET_V2 for now, since it is
> simpler than TPACKET_V3 and can be used for RX and TX. TPACKET_V3 can be
> added later on or could remain in packet_mmap.txt.
Sure, let's leave that.
Your plan sounds good to me, Carsten.
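
For reference, a rough sketch of what a TPACKET_V2 RX ring setup
involves, under the constraints the kernel checks for struct
tpacket_req. The helpers ring_geometry() and setup_rx_ring() are
hypothetical names for this example, and error handling is kept to a
minimum.

```c
#include <assert.h>
#include <linux/if_packet.h>   /* struct tpacket_req, TPACKET_V2, ... */
#include <sys/mman.h>
#include <sys/socket.h>        /* SOL_PACKET */
#include <unistd.h>            /* sysconf() */

/*
 * Fill *req for a ring of block_nr blocks of block_size bytes, holding
 * frames of frame_size bytes. Returns 0, or -1 if the values violate
 * the ring constraints.
 */
int ring_geometry(struct tpacket_req *req, unsigned int frame_size,
                  unsigned int block_size, unsigned int block_nr)
{
    long page = sysconf(_SC_PAGESIZE);

    /* blocks must be a nonzero multiple of the page size */
    if (page <= 0 || block_nr == 0 || block_size == 0 || block_size % page != 0)
        return -1;
    /* frames must hold the V2 header and be TPACKET_ALIGNMENT-aligned */
    if (frame_size < TPACKET2_HDRLEN || frame_size % TPACKET_ALIGNMENT != 0)
        return -1;
    /* at least one frame per block */
    if (frame_size > block_size)
        return -1;

    req->tp_block_size = block_size;
    req->tp_block_nr = block_nr;
    req->tp_frame_size = frame_size;
    /* frames never span blocks, so count whole frames per block */
    req->tp_frame_nr = (block_size / frame_size) * block_nr;
    return 0;
}

/*
 * Select TPACKET_V2, create the RX ring on packet socket fd and map it.
 * Returns the mapped ring or MAP_FAILED.
 */
void *setup_rx_ring(int fd, const struct tpacket_req *req)
{
    int version = TPACKET_V2;

    if (setsockopt(fd, SOL_PACKET, PACKET_VERSION, &version, sizeof(version)) < 0)
        return MAP_FAILED;
    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, req, sizeof(*req)) < 0)
        return MAP_FAILED;
    return mmap(NULL, (size_t)req->tp_block_size * req->tp_block_nr,
                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

The version has to be set before the ring is requested, since the
minimum frame size depends on the header layout in use.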
>
>
> Cheers,
> Carsten
>
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Improving PACKET_{RX,TX}_RING documentation
[not found] ` <CA+FuTSfpORKtm_kdG+CycoPiq+Gxf58=nXqKApFEmR+xZs69_g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-05-26 10:49 ` Carsten Andrich
0 siblings, 0 replies; 10+ messages in thread
From: Carsten Andrich @ 2014-05-26 10:49 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Daniel Borkmann, Michael Kerrisk (man-pages),
linux-man-u79uwXL29TY76Z2rM5mHXA, Neil Horman,
jbrouer-H+wXaHxf7aLQT0dZR+AlfA
Willem de Bruijn wrote:
>>> I would describe such points in a positive manner (optimization) as
>>> opposed to a negative (inferior performance).
>>
>> Using positive wording is always a good idea, but packet_mmap.txt
>> already tricked me into believing that PACKET_TX_RING should be faster
>> than plain sendto(). The user should be allowed to make an informed
>> decision,
>
> Indeed. The document should not contain any simple statements about
> one option being faster than another, because this invariably depends on
> workload details (packet size, rate, threading, ...).
>
> Instead, it should just explain the technical details and their implications:
> an mmapped ring reduces the number of system calls, as does
> recvmmsg/sendmmsg. It does not necessarily reduce the number of
> data copies (a common misconception). Etcetera.
>
>> which requires the manpage to tell the (ugly) truth that
>> sendto() currently outperforms TX_RING.
>
> I would not make such statements either way, then.
You're right. I'll just add a note regarding the necessity of careful
performance considerations/evaluations :)
Maybe, eventually, some of Jesper's findings regarding *_RING
performance should end up in packet_mmap.txt.
>>>> Absolutely, perhaps explaining differences from TPACKET_V1 -> V3 API and the
>>>> like.
>>>
>>> That would be very interesting. The packet -> block batching mechanism
>>> likely was tested with small packet performance, but may have little
>>> benefit for larger packets. A discussion of the trade offs from a user
>>> point of view would be very interesting.
>>
>> Actually I intended to deal only with TPACKET_V2 for now, since it is
>> simpler than TPACKET_V3 and can be used for RX and TX. TPACKET_V3 can be
>> added later on or could remain in packet_mmap.txt.
>
> Sure, let's leave that.
>
> Your plan sounds good to me, Carsten.
Okay, it might take me a few weeks to come up with a first draft.
Cheers,
Carsten
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-05-26 10:49 UTC | newest]
Thread overview: 10+ messages
2014-05-17 13:13 Improving PACKET_{RX,TX}_RING documentation Carsten Andrich
[not found] ` <1400332406.2395.35.camel-FQO4gtnRtnzkVFMGpb/cPg@public.gmane.org>
2014-05-19 4:54 ` Michael Kerrisk (man-pages)
[not found] ` <53798E97.1000505-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-05-19 10:14 ` Daniel Borkmann
[not found] ` <5379D9A2.1070008-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-05-19 15:05 ` Willem de Bruijn
[not found] ` <CA+FuTSeWh_iQGqc-4usL7vr28OrkHTnBvHvXvVO=LcGsNRgtMA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-05-19 16:01 ` Daniel Borkmann
2014-05-22 12:22 ` Carsten Andrich
2014-05-22 13:13 ` Michael Kerrisk (man-pages)
2014-05-22 13:37 ` Jesper Dangaard Brouer
2014-05-22 14:51 ` Willem de Bruijn
[not found] ` <CA+FuTSfpORKtm_kdG+CycoPiq+Gxf58=nXqKApFEmR+xZs69_g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-05-26 10:49 ` Carsten Andrich