From: Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Willem de Bruijn <willemb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
kaber-dcUjhNyLwpNeoWH0uzbU5w@public.gmane.org,
scott.a.mcmillan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
johann.baudy-1YmjpbiIw0bR7s880joybQ@public.gmane.org,
herbert-F6s6mLieUQo7FNHlEwC/lvQIK84fMopw@public.gmane.org
Subject: Re: [PATCH] man: packet.7: document fanout, ring and auxiliary options
Date: Sun, 21 Apr 2013 12:53:21 +0200
Message-ID: <5173C521.7050208@redhat.com>
In-Reply-To: <1364563798-20221-1-git-send-email-willemb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
On 03/29/2013 02:29 PM, Willem de Bruijn wrote:
> The packet socket manual page does not list all socket options.
I guess this is version 2 of the patch, right?
> This patch adds descriptions of the common packet socket options
> PACKET_AUXDATA, PACKET_FANOUT, PACKET_RX_RING, PACKET_STATISTICS,
> PACKET_TX_RING
>
> and the ring-specific options
> PACKET_LOSS, PACKET_RESERVE, PACKET_TIMESTAMP, PACKET_VERSION
>
> It does not yet add descriptions for
> PACKET_COPY_THRESH, PACKET_HDRLEN, PACKET_ORIGDEV,
> PACKET_TX_HAS_OFF, PACKET_TX_TIMESTAMP, PACKET_VNET_HDR
>
> It tries to balance being informative with exposing kernel detail
> that is unlikely to be used by most readers or that may change
> frequently. For implementation details, the manpage points to the
> documentation in kernel Documentation/networking. Let me know if
> options should be added or removed.
>
> Source: PACKET_FANOUT, PACKET_RX_RING and PACKET_VERSION are in
> /tools/testing/net/psock_fanout.c in the latest Linux kernel source
> tree. PACKET_STATISTICS was in the first version of that test.
> PACKET_TX_RING I have used elsewhere. The other options are based
> on reading kernel code.
>
> If you are on the CC: list, then you are the author of one of
> the commits referred to in this manpage. If you can, please
> check whether my description of your change is correct. Thanks.
>
> Signed-off-by: Willem de Bruijn <willemb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Acked-by: Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Content looks good to me; the two nitpicks below could be addressed in a tiny
follow-up patch.
Thanks for doing this, Willem!
> ---
> man7/packet.7 | 207 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 198 insertions(+), 9 deletions(-)
>
> diff --git a/man7/packet.7 b/man7/packet.7
> index 006f2ac..a84ebee 100644
> --- a/man7/packet.7
> +++ b/man7/packet.7
> @@ -177,17 +177,22 @@ and
> .I sll_ifindex
> are used.
> .SS Socket options
> +Packet socket options are configured by calling
> +.BR setsockopt (2)
> +with level
> +.BR SOL_PACKET .
> +.TP
> +.BR PACKET_ADD_MEMBERSHIP
> +.PD 0
> +.TP
> +.BR PACKET_DROP_MEMBERSHIP
> +.PD
> Packet sockets can be used to configure physical layer multicasting
> and promiscuous mode.
> -It works by calling
> -.BR setsockopt (2)
> -on a packet socket for
> -.B SOL_PACKET
> -and one of the options
> .B PACKET_ADD_MEMBERSHIP
> -to add a binding or
> +adds a binding and
> .B PACKET_DROP_MEMBERSHIP
> -to drop it.
> +drops it.
> They both expect a
> .B packet_mreq
> structure as argument:
> @@ -227,11 +232,195 @@ In addition the traditional ioctls
> .BR SIOCADDMULTI ,
> .B SIOCDELMULTI
> can be used for the same purpose.
> +.TP
> +.BR PACKET_AUXDATA " (since Linux 2.6.21)"
> +.\" commit 8dc4194474159660d7f37c495e3fc3f10d0db8cc
> +If this binary option is enabled, the packet socket passes a metadata
> +structure along with each packet in the
> +.BR recvmsg (2)
> +control field.
> +The structure can be read with
> +.BR cmsg (3).
> +It is defined as
> +
> +.in +4n
> +.nf
> +struct tpacket_auxdata {
> + __u32 tp_status;
> + __u32 tp_len; /* packet length */
> + __u32 tp_snaplen; /* captured length */
> + __u16 tp_mac;
> + __u16 tp_net;
> + __u16 tp_vlan_tci;
> + __u16 tp_padding;
> +};
> +.fi
> +.in
> +
> +.I tp_net
> +stores the offset to the network layer.
> +If the packet socket is of type
> +.BR SOCK_DGRAM ,
> +then
> +.I tp_mac
> +is the same.
> +If it is of type
> +.BR SOCK_RAW ,
> +then that field stores the offset to the link layer frame.
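To illustrate the cmsg(3) access described above, here is a minimal sketch in C. It assumes the caller has already enabled PACKET_AUXDATA with setsockopt(2) and filled a msghdr with recvmsg(2); the helper name find_auxdata() is hypothetical and not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>   /* struct tpacket_auxdata, PACKET_AUXDATA */

#ifndef SOL_PACKET
#define SOL_PACKET 263         /* fallback for older libc headers */
#endif

/*
 * Scan the control messages of a received msghdr for PACKET_AUXDATA.
 * Returns 1 and copies the structure into *out if found, 0 otherwise.
 * find_auxdata() is a hypothetical helper name.
 */
static int find_auxdata(struct msghdr *msg, struct tpacket_auxdata *out)
{
    struct cmsghdr *cmsg;

    assert(msg != NULL && out != NULL);

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_PACKET &&
            cmsg->cmsg_type == PACKET_AUXDATA &&
            cmsg->cmsg_len >= CMSG_LEN(sizeof(*out))) {
            /* Copy rather than cast: CMSG_DATA may be unaligned. */
            memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
            return 1;
        }
    }
    return 0;
}
```

The control buffer passed to recvmsg(2) should be at least CMSG_SPACE(sizeof(struct tpacket_auxdata)) bytes for the message to fit.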
> +.TP
> +.BR PACKET_FANOUT " (since Linux 3.1)"
> +.\" commit dc99f600698dcac69b8f56dda9a8a00d645c5ffc
> +To scale processing across threads, packet sockets can form a fanout
> +group.
> +In this mode, each matching packet is enqueued onto only one
> +socket in the group.
> +A socket joins a fanout group by calling
> +.BR setsockopt (2)
> +with level
> +.B SOL_PACKET
> +and option
> +.BR PACKET_FANOUT .
> +Each network namespace can have up to 65536 independent groups.
> +A socket selects a group by encoding the ID in the first 16 bits of
> +the integer option value.
> +The first packet socket to join a group implicitly creates it.
> +To successfully join an existing group, subsequent packet sockets
> +must have the same protocol, device settings and fanout mode and
> +flags (see below).
> +Packet sockets can leave a fanout group only by closing the socket.
> +The group is deleted when the last socket is closed.
> +
> +Fanout supports multiple algorithms to spread traffic between sockets.
> +The default mode,
> +.BR PACKET_FANOUT_HASH ,
> +sends packets from the same flow to the same socket to maintain
> +per-flow ordering.
> +For each packet, it chooses a socket by taking the packet flow hash
> +modulo the number of sockets in the group, where a flow hash is a hash
> +over network layer address and optional transport layer port fields.
> +The load balance mode
> +.BR PACKET_FANOUT_LB
> +implements a round-robin algorithm.
> +.BR PACKET_FANOUT_CPU
> +selects the socket based on the CPU that the packet arrived on.
> +
> +Fanout modes can take additional options.
> +IP fragmentation causes packets from the same flow to have different
> +flow hashes.
> +The flag
> +.BR PACKET_FANOUT_FLAG_DEFRAG ,
> +if set, causes packets to be defragmented before fanout is applied, to
> +preserve order even in this case.
> +Fanout mode and options are communicated in the second 16 bits of the
> +integer option value.
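The encoding of the option value can be sketched as follows. This reads "first 16 bits" as the low-order 16 bits (group ID) and "second 16 bits" as the high-order 16 bits (mode and flags), which is how the psock_fanout.c test mentioned in the commit message builds its argument; fanout_arg() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/if_packet.h>   /* PACKET_FANOUT_* constants */

/*
 * Build the integer option value for PACKET_FANOUT.
 * The group ID occupies the low 16 bits; the fanout mode and any
 * flags (e.g. PACKET_FANOUT_FLAG_DEFRAG) occupy the high 16 bits.
 * fanout_arg() is a hypothetical helper, not a kernel or libc API.
 */
static int fanout_arg(unsigned int group_id, unsigned int mode,
                      unsigned int flags)
{
    uint32_t type_and_flags;

    assert(group_id <= 0xffff);          /* at most 65536 groups */
    assert((mode | flags) <= 0xffff);

    type_and_flags = (uint32_t)(mode | flags);
    return (int)((uint32_t)group_id | (type_and_flags << 16));
}
```

A socket would then join group 42 in hash mode with defragmentation by passing `fanout_arg(42, PACKET_FANOUT_HASH, PACKET_FANOUT_FLAG_DEFRAG)` as the int option value to setsockopt(2) with level SOL_PACKET and option PACKET_FANOUT.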
> +.TP
> +.BR PACKET_LOSS " (with PACKET_TX_RING)"
> +If set, do not silently drop a packet on transmission error, but
> +return it with status set to
> +.BR TP_STATUS_WRONG_FORMAT .
> +.TP
> +.BR PACKET_RESERVE " (with PACKET_RX_RING)"
> +By default, a packet receive ring writes packets immediately following the
> +metadata structure and alignment padding.
> +This integer option reserves additional headroom.
> +.TP
> +.BR PACKET_RX_RING
> +Create a memory mapped ring buffer for asynchronous packet reception.
> +The packet socket reserves a contiguous region of application address
> +space, lays it out into an array of packet slots and copies packets
> +(up to
> +.IR tp_snaplen)
Just a nitpick: I think the ')' here should not be underlined. This could
probably be fixed in a follow-up patch.
> +into subsequent slots.
> +Each packet is preceded by a metadata structure similar to
> +.IR tpacket_auxdata .
> +Packet socket and application communicate the head and tail of the ring
> +through the
> +.I tp_status
> +field.
> +The packet socket owns all slots with status
> +.BR TP_STATUS_KERNEL .
> +After filling a slot, it changes the status of the slot to transfer
> +ownership to the application.
> +During normal operation, the new status is
> +.BR TP_STATUS_USER ,
> +to signal that a correctly received packet has been stored.
> +When the application has finished processing a packet, it transfers
> +ownership of the slot back to the socket by setting the status to
> +.BR TP_STATUS_KERNEL .
> +Packet sockets implement multiple variants of the packet ring.
> +The implementation details are described in
> +.IR Documentation/networking/packet_mmap.txt
> +in the Linux kernel source tree.
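The ownership handoff through tp_status can be sketched as a consumer loop. This assumes the TPACKET_V1 layout (struct tpacket_hdr at the start of each fixed-size slot) and operates on any suitably aligned memory region; a real ring would be the region returned by mmap(2) on the socket. drain_ring() is a hypothetical helper, and the memory barriers a production consumer needs are omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/if_packet.h>   /* struct tpacket_hdr, TP_STATUS_* */

/*
 * Consume every slot the socket has handed to the application,
 * starting at *head, and return each slot to the socket.
 * Returns the number of slots consumed. Hypothetical helper,
 * assuming TPACKET_V1 slots of frame_size bytes each.
 */
static unsigned int drain_ring(unsigned char *ring, unsigned int frame_size,
                               unsigned int frame_count, unsigned int *head)
{
    unsigned int consumed = 0;

    assert(ring != NULL && head != NULL && frame_count > 0);

    while (consumed < frame_count) {
        struct tpacket_hdr *hdr =
            (struct tpacket_hdr *)(ring + (size_t)*head * frame_size);

        if (!(hdr->tp_status & TP_STATUS_USER))
            break;                      /* slot still owned by the socket */

        /*
         * The frame starts at (unsigned char *)hdr + hdr->tp_mac and
         * hdr->tp_snaplen bytes of it are valid. Process it here.
         */

        hdr->tp_status = TP_STATUS_KERNEL;   /* hand the slot back */
        *head = (*head + 1) % frame_count;
        consumed++;
    }
    return consumed;
}
```

In practice the application would call poll(2) on the socket and run such a loop whenever the socket becomes readable.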
> +.TP
> +.BR PACKET_STATISTICS
> +Retrieve packet socket statistics in the form of a structure
> +
> +.in +4n
> +.nf
> +struct tpacket_stats {
> + __u32 tp_packets; /* total packet count */
> + __u32 tp_drops; /* dropped packet count */
> +};
> +.fi
> +.in
> +
> +Receiving statistics resets the internal counters.
> +The statistics structure differs when using a ring of variant
> +.BR TPACKET_V3 .
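Retrieving the counters could look like the sketch below. It assumes a socket that is not using a TPACKET_V3 ring, so that the two-field struct tpacket_stats applies; read_packet_stats() is a hypothetical wrapper name.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>   /* struct tpacket_stats, PACKET_STATISTICS */

#ifndef SOL_PACKET
#define SOL_PACKET 263         /* fallback for older libc headers */
#endif

/*
 * Fetch the packet and drop counters for a packet socket.
 * Returns 0 on success, -1 on error with errno set by getsockopt(2).
 * Because the read resets the kernel counters, callers that want
 * running totals must accumulate the returned values themselves.
 */
static int read_packet_stats(int fd, struct tpacket_stats *st)
{
    socklen_t len = sizeof(*st);

    assert(st != NULL);
    memset(st, 0, sizeof(*st));
    return getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, st, &len);
}
```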
> +.TP
> +.BR PACKET_TIMESTAMP " (with PACKET_RX_RING)"
> +.\" commit 614f60fa9d73a9e8fdff3df83381907fea7c5649
> +The packet receive ring always stores a timestamp in the metadata header.
> +By default, this is a software timestamp generated when the
> +packet is copied into the ring.
> +This integer option selects the type of timestamp.
> +Besides the default, it supports the two hardware formats described in
> +.IR Documentation/networking/timestamping.txt
> +in the Linux kernel source tree.
> +.TP
> +.BR PACKET_TX_RING " (since Linux 2.6.31)"
> +.\" commit 69e3c75f4d541a6eb151b3ef91f34033cb3ad6e1
> +Create a memory mapped ring buffer for packet transmission.
> +This option is similar to
> +.BR PACKET_RX_RING
> +and takes the same arguments.
> +The application writes packets into slots with status
> +.BR TP_STATUS_AVAILABLE
> +and schedules them for transmission by changing the status to
> +.BR TP_STATUS_SEND_REQUEST .
> +When packets are ready to be transmitted, the application calls
> +.BR send (2)
> +or a variant thereof.
> +The
> +.I buf
> +and
> +.I len
> +fields of this call are ignored.
> +If an address is passed using
> +.BR sendto (2)
> +or
> +.BR sendmsg (2) ,
> +then that overrides the socket default.
> +On successful transmission, the socket resets the slot to
> +.BR TP_STATUS_AVAILABLE .
> +It discards packets silently on error unless
> +.BR PACKET_LOSS
> +is set.
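The slot state transitions on the transmit side can be sketched as below. This assumes the TPACKET_V1 layout, with packet data placed at TPACKET_ALIGN(sizeof(struct tpacket_hdr)) from the start of the slot, which is my reading of packet_mmap.txt and should be checked against the kernel version in use. tx_slot_fill() is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>
#include <linux/if_packet.h>   /* struct tpacket_hdr, TP_STATUS_*, TPACKET_ALIGN */

/*
 * Copy one packet into a free transmit slot and mark it for sending.
 * Returns 0 on success, -1 if the slot is not available or the
 * packet does not fit. Assumes TPACKET_V1; the data offset used
 * here is an assumption taken from packet_mmap.txt.
 */
static int tx_slot_fill(struct tpacket_hdr *hdr, unsigned int frame_size,
                        const void *pkt, unsigned int len)
{
    const unsigned int off = TPACKET_ALIGN(sizeof(struct tpacket_hdr));

    assert(hdr != NULL && pkt != NULL);

    if (hdr->tp_status != TP_STATUS_AVAILABLE)
        return -1;                          /* slot not yet reclaimed */
    if (off > frame_size || len > frame_size - off)
        return -1;                          /* packet too large for slot */

    memcpy((unsigned char *)hdr + off, pkt, len);
    hdr->tp_len = len;
    hdr->tp_status = TP_STATUS_SEND_REQUEST;   /* schedule for transmission */
    return 0;
}
```

After filling one or more slots, the application would trigger transmission with a call such as `send(fd, NULL, 0, 0)`, since the buffer and length arguments are ignored.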
> +.TP
> +.BR PACKET_VERSION " (with PACKET_RX_RING)"
> +.\" commit bbd6ef87c544d88c30e4b762b1b61ef267a7d279
> +By default,
> +.BR PACKET_RX_RING
> +creates a packet receive ring of variant
> +.BR TPACKET_V1 .
> +To create another variant, configure the desired variant by setting this
> +integer option before creating the ring.
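The ordering requirement (version first, ring second) can be sketched with a small wrapper. select_ring_version() is a hypothetical name; the call itself is a plain setsockopt(2) with an int value.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/if_packet.h>   /* PACKET_VERSION, TPACKET_V1/V2/V3 */

#ifndef SOL_PACKET
#define SOL_PACKET 263         /* fallback for older libc headers */
#endif

/*
 * Select the ring variant for a packet socket. This must be called
 * before the PACKET_RX_RING setsockopt(2) call that creates the ring.
 * Returns 0 on success, -1 on error with errno set.
 */
static int select_ring_version(int fd, int version)
{
    assert(version == TPACKET_V1 || version == TPACKET_V2 ||
           version == TPACKET_V3);

    return setsockopt(fd, SOL_PACKET, PACKET_VERSION,
                      &version, sizeof(version));
}
```

A typical sequence would be `select_ring_version(fd, TPACKET_V2)`, then setsockopt(2) with PACKET_RX_RING, then mmap(2).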
> +
> .SS Ioctls
> .B SIOCGSTAMP
> can be used to receive the timestamp of the last received packet.
> Argument is a
> -.I struct timeval.
> +.I struct timeval .
Ditto for the '.' here.
> .\" FIXME Document SIOCGSTAMPNS
>
> In addition all standard ioctls defined in
> @@ -318,7 +507,7 @@ header to get a fully conforming packet.
> Incoming 802.3 packets are not multiplexed on the DSAP/SSAP protocol
> fields; instead they are supplied to the user as protocol
> .B ETH_P_802_2
> -with the LLC header prepended.
> +with the LLC header prefixed.
> It is thus not possible to bind to
> .BR ETH_P_802_3 ;
> bind to
>