From: Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Willem de Bruijn <willemb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Michael Kerrisk-manpages
<mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH man-pages] man: packet.7: document fanout, ring and auxiliary options
Date: Fri, 06 Dec 2013 17:14:15 +0100
Message-ID: <52A1F7D7.6040305@redhat.com>
In-Reply-To: <CA+FuTSdCfH_yum57ZWV9tw5cd0=DkWWR-OvnaUEkUf5O7JCQYg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On 12/06/2013 05:11 PM, Willem de Bruijn wrote:
>> [Very minor fixups. -dborkman]
>>
>> Signed-off-by: Willem de Bruijn <willemb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>> Acked-by: Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> ---
>> Just a resend of something that got lost in March this year.
>
> Thanks for dusting this off, Daniel!
>
> I spotted a few small issues. We also introduced a few new flags since
> the last revision. If we have to make changes anyway, may as well
> describe those, too. Let me know if you will resubmit or prefer me to
> do it.
>
> I did not test the output of my changes yet, btw.
Feel free to take this over and resubmit.
I just didn't want this effort to get lost somewhere.
Thanks, Willem!
>> +.I tp_net
>> +stores the offset to the network layer.
>> +If the packet socket is of type
>> +.BR SOCK_DGRAM ,
>> +then
>> +.I tp_mac
>> +is the same.
>> +If it is of type
>> +.BR SOCK_RAW ,
>> +then that field stores the offset to the link layer frame.
>
> This only applies to the metadata when passed in a packet ring frame
> and has to be moved there. The ring metadata structure is very similar
> to tpacket_auxdata (as mentioned below), but they differ in this
> regard: with recvmsg/auxdata the mac always starts at offset 0 for
> obvious reasons.
>
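To illustrate the ring-frame case, here is a minimal sketch of how an application might turn the two offsets into pointers. It assumes the version 1 frame header, struct tpacket_hdr from <linux/if_packet.h>, and that both offsets are counted from the start of the frame; the helper names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/if_packet.h>

/* Hypothetical helper: start of the link layer frame inside a ring slot.
 * Per the text above, on a SOCK_DGRAM socket tp_mac equals tp_net. */
const unsigned char *frame_link_layer(const struct tpacket_hdr *hdr)
{
    assert(hdr != NULL);
    return (const unsigned char *)hdr + hdr->tp_mac;
}

/* Hypothetical helper: start of the network layer header inside a ring slot. */
const unsigned char *frame_network_layer(const struct tpacket_hdr *hdr)
{
    assert(hdr != NULL);
    return (const unsigned char *)hdr + hdr->tp_net;
}
```

On a SOCK_RAW socket bound to an Ethernet device, the difference between the two pointers would normally be the link layer header length.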
>> +.TP
>> +.BR PACKET_FANOUT " (since Linux 3.1)"
>> +.\" commit dc99f600698dcac69b8f56dda9a8a00d645c5ffc
>> +To scale processing across threads, packet sockets can form a fanout
>> +group.
>> +In this mode, each matching packet is enqueued onto only one
>> +socket in the group.
>> +A socket joins a fanout group by calling
>> +.BR setsockopt (2)
>> +with level
>> +.B SOL_PACKET
>> +and option
>> +.BR PACKET_FANOUT .
>> +Each network namespace can have up to 65536 independent groups.
>> +A socket selects a group by encoding the ID in the first 16 bits of
>> +the integer option value.
>> +The first packet socket to join a group implicitly creates it.
>> +To successfully join an existing group, subsequent packet sockets
>> +must have the same protocol, device settings and fanout mode and
>> +flags (see below).
>> +Packet sockets can leave a fanout group only by closing the socket.
>> +The group is deleted when the last socket is closed.
>> +
>> +Fanout supports multiple algorithms to spread traffic between sockets.
>> +The default mode,
>> +.BR PACKET_FANOUT_HASH ,
>> +sends packets from the same flow to the same socket to maintain
>> +per-flow ordering.
>> +For each packet, it chooses a socket by taking the packet flow hash
>> +modulo the number of sockets in the group, where a flow hash is a hash
>> +over network layer address and optional transport layer port fields.
>> +The load balance mode
>> +.BR PACKET_FANOUT_LB
>> +implements a round-robin algorithm.
>> +.BR PACKET_FANOUT_CPU
>> +selects the socket based on the CPU that the packet arrived on.
>
> New options since the last patch:
>
> +.BR PACKET_FANOUT_ROLLOVER
> +processes all data on a single socket and moves to the next when one
> +becomes backlogged.
> +.BR PACKET_FANOUT_RND
> +selects the socket using a pseudo-random number generator.
>
>> +
>> +Fanout modes can take additional options.
>> +IP fragmentation causes packets from the same flow to have different
>> +flow hashes.
>> +The flag
>> +.BR PACKET_FANOUT_FLAG_DEFRAG ,
>> +if set, causes packets to be defragmented before fanout is applied, to
>> +preserve order even in this case.
>> +Fanout mode and options are communicated in the second 16 bits of the
>> +integer option value.
>
> +.BR PACKET_FANOUT_FLAG_ROLLOVER ,
> +if set, enables the rollover mechanism as a backup strategy. If the
> +original fanout algorithm selects a backlogged socket, roll over to the
> +next available one.
>
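A short example might help readers of the fanout text. The sketch below reads "first 16 bits" as the low 16 bits of the option value (group ID) and "second 16 bits" as the high 16 bits (mode plus flags). fanout_arg() and join_fanout() are names invented for this example, and fd is assumed to be an already opened and bound packet socket.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

static_assert(sizeof(int) == 4, "PACKET_FANOUT takes a 32-bit integer option value");

/* Pack the group ID into the low 16 bits and the fanout mode,
 * OR-ed with any PACKET_FANOUT_FLAG_* bits, into the high 16 bits. */
int fanout_arg(uint16_t group_id, uint16_t mode_and_flags)
{
    return (int)(((uint32_t)mode_and_flags << 16) | group_id);
}

/* Join (or implicitly create) a fanout group on a packet socket.
 * Returns the setsockopt(2) result: 0 on success, -1 with errno set. */
int join_fanout(int fd, uint16_t group_id, uint16_t mode_and_flags)
{
    int arg = fanout_arg(group_id, mode_and_flags);

    return setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg));
}
```

A call such as join_fanout(fd, 42, PACKET_FANOUT_HASH | PACKET_FANOUT_FLAG_DEFRAG) would then request hash mode with defragmentation for group 42; every socket joining that group has to pass the same mode and flags.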
>> +.TP
>> +.BR PACKET_LOSS " (with PACKET_TX_RING)"
>> +If set, do not silently drop a packet on transmission error, but
>> +return it with status set to
>> +.BR TP_STATUS_WRONG_FORMAT .
>> +.TP
>> +.BR PACKET_RESERVE " (with PACKET_RX_RING)"
>> +By default, a packet receive ring writes packets immediately following the
>> +metadata structure and alignment padding.
>> +This integer option reserves additional headroom.
>> +.TP
>> +.BR PACKET_RX_RING
>> +Create a memory mapped ring buffer for asynchronous packet reception.
>> +The packet socket reserves a contiguous region of application address
>> +space, lays it out into an array of packet slots and copies packets
>> +(up to
>> +.IR tp_snaplen
>> +) into subsequent slots.
>> +Each packet is preceded by a metadata structure similar to
>> +.IR tpacket_auxdata .
>
> This is where the mac discussion from above belongs.
>
>> +Packet socket and application communicate the head and tail of the ring
>> +through the
>> +.I tp_status
>> +field.
>> +The packet socket owns all slots with status
>> +.BR TP_STATUS_KERNEL .
>> +After filling a slot, it changes the status of the slot to transfer
>> +ownership to the application.
>> +During normal operation, the new status is
>> +.BR TP_STATUS_USER ,
>> +to signal that a correctly received packet has been stored.
>> +When the application has finished processing a packet, it transfers
>> +ownership of the slot back to the socket by setting the status to
>> +.BR TP_STATUS_KERNEL .
>> +Packet sockets implement multiple variants of the packet ring.
>> +The implementation details are described in
>> +.IR Documentation/networking/packet_mmap.txt
>> +in the Linux kernel source tree.
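The ownership handoff described here could be sketched as follows. This is only an illustration under assumptions: it uses the version 1 frame header (struct tpacket_hdr), it skips the mmap(2) setup and the poll(2) loop around it, and consume_frame(), frame_handler and add_length() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <linux/if_packet.h>

/* Hypothetical callback type: receives the captured bytes of one packet. */
typedef void (*frame_handler)(const unsigned char *data, unsigned int len, void *ctx);

/* Process one ring slot if the kernel has handed it to user space.
 * Returns 1 if a packet was consumed, 0 if the slot is still kernel-owned. */
int consume_frame(struct tpacket_hdr *hdr, frame_handler handle, void *ctx)
{
    assert(hdr != NULL && handle != NULL);

    /* A real reader would also want acquire semantics on this load. */
    if (!(hdr->tp_status & TP_STATUS_USER))
        return 0;

    handle((const unsigned char *)hdr + hdr->tp_mac, hdr->tp_snaplen, ctx);

    /* Finish all reads of the slot before giving it back to the socket. */
    atomic_thread_fence(memory_order_release);
    hdr->tp_status = TP_STATUS_KERNEL;
    return 1;
}

/* Example handler: adds the captured length to a running byte total. */
void add_length(const unsigned char *data, unsigned int len, void *ctx)
{
    (void)data;
    *(unsigned long *)ctx += len;
}
```

An application would call consume_frame() on the current slot, advance to the next slot when it returns 1, and wait in poll(2) when it returns 0.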
>> +.TP
>> +.BR PACKET_STATISTICS
>> +Retrieve packet socket statistics in the form of a structure
>> +
>> +.in +4n
>> +.nf
>> +struct tpacket_stats {
>> + __u32 tp_packets; /* total packet count */
>> + __u32 tp_drops; /* dropped packet count */
>
> these should apparently be
>
> + unsigned int tp_packets; /* total packet count */
> + unsigned int tp_drops; /* dropped packet count */
>
>> +};
>> +.fi
>> +.in
>> +
>
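For reference, reading the counters might look like the sketch below. It relies on struct tpacket_stats as declared in <linux/if_packet.h>, with the unsigned int fields mentioned above; read_packet_stats() is a name invented for this example and fd is assumed to be a packet socket.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* The structure is expected to hold exactly two unsigned int counters. */
static_assert(sizeof(struct tpacket_stats) == 2 * sizeof(unsigned int),
              "unexpected struct tpacket_stats layout");

/* Fetch the packet and drop counters of a packet socket.
 * Returns the getsockopt(2) result: 0 on success, -1 with errno set. */
int read_packet_stats(int fd, struct tpacket_stats *stats)
{
    socklen_t len = sizeof(*stats);

    return getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, stats, &len);
}
```

On success, stats->tp_packets and stats->tp_drops hold the total and dropped packet counts.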
> All the rest looked fine.