linux-man.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michael Kerrisk <mtk.manpages-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
To: Johann Baudy <johann.baudy-1YmjpbiIw0bR7s880joybQ@public.gmane.org>
Cc: linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH] AF_PACKET and packet mmap
Date: Fri, 31 Jul 2009 05:57:53 +0200	[thread overview]
Message-ID: <cfd18e0f0907302057q6836abaek80b4fab46e0f8fe5@mail.gmail.com> (raw)
In-Reply-To: <1248908658.6777.0.camel@bender>

Hi Johann.

On Thu, Jul 30, 2009 at 1:04 AM, Johann Baudy<johann.baudy-1YmjpbiIw0bR7s880joybQ@public.gmane.org> wrote:
> From: Johann Baudy <johann.baudy-1YmjpbiIw0bR7s880joybQ@public.gmane.org>
>
> Documentation of PACKET_RX_RING and PACKET_TX_RING socket options.
>
> Signed-off-by: Johann Baudy <johann.baudy-1YmjpbiIw0bR7s880joybQ@public.gmane.org>

(Please CC me on patches. Otherwise I can easily miss them.)

The patch looks useful. Could you tell me how you got the info? (It
would help me try to verify it.)

Also, what kernel version number did these options appear in?

Thanks,

Michael
> --
>
>  man7/packet.7 |  212 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 212 insertions(+), 0 deletions(-)
>
> diff --git a/man7/packet.7 b/man7/packet.7
> index 0b6c669..ec4973a 100644
> --- a/man7/packet.7
> +++ b/man7/packet.7
> @@ -222,6 +222,218 @@ In addition the traditional ioctls
>  .BR SIOCADDMULTI ,
>  .B SIOCDELMULTI
>  can be used for the same purpose.
> +
> +Packet sockets can also be used to have a direct access to network device
> +through configurable circular buffers mapped in user space.
> +They can be used to either send or receive packets.
> +
> +.B PACKET_TX_RING
> +enables and allocates a circular buffer for transmission process.
> +
> +.B PACKET_RX_RING
> +enables and allocates a circular buffer for capture process.
> +
> +They both expect a
> +.B packet_mreq
> +structure as argument:
> +
> +.in +4n
> +.nf
> +struct tpacket_req {
> +    unsigned int    tp_block_size;  /* Minimal size of contiguous block */
> +    unsigned int    tp_block_nr;    /* Number of blocks */
> +    unsigned int    tp_frame_size;  /* Size of frame */
> +    unsigned int    tp_frame_nr;    /* Total number of frames */
> +};
> +.fi
> +.in
> +
> +This structure establishes a circular buffer of unswappable memory.
> +Being mapped in the capture process allows reading the captured frames and
> +related meta-information like timestamps without requiring a system call.
> +Being mapped in the transmission process allows writing multiple packets that will be sent during
> +.BR send (2).
> +By using a shared buffer between the kernel and the user space also has
> +the benefit of minimizing packet copies.
> +
> +Frames are grouped in blocks. Each block is a physically contiguous
> +region of memory and holds
> +.B tp_block_size
> +/
> +.B tp_frame_size
> +frames.
> +
> +The total number of blocks is
> +.B tp_block_nr.
> +Note that
> +.B tp_frame_nr
> +is a redundant parameter because
> +
> +.in +4n
> +frames_per_block = tp_block_size/tp_frame_size
> +.in
> +
> +Indeed, packet_set_ring checks that the following condition is true
> +
> +.in +4n
> +frames_per_block * tp_block_nr == tp_frame_nr
> +.in
> +
> +A frame can be of any size with the only condition it can fit in a block. A block
> +can only hold an integer number of frames, or in other words, a frame cannot
> +be spawned across two blocks. Please refer to
> +.I networking/packet_mmap.txt
> +in kernel documentation for more details.
> +
> +Each frame contains a header followed by data.
> +Header is either a
> +.B struct tpacket_hdr
> +or
> +.B struct tpacket2_hdr
> +according to socket option
> +.B PACKET_VERSION
> +(which can be set to
> +.B TPACKET_V1
> +or
> +.B TPACKET_V2
> +respectively through
> +.BR setsockopt(2)
> +).
> +
> +With
> +.B TPACKET_V1:
> +
> +.in +4n
> +.nf
> +struct tpacket_hdr
> +{
> +    unsigned long      tp_status;
> +    unsigned int       tp_len;
> +    unsigned int       tp_snaplen;
> +    unsigned short     tp_mac;
> +    unsigned short     tp_net;
> +    unsigned int       tp_sec;
> +    unsigned int       tp_usec;
> +};
> +.fi
> +.in
> +
> +With
> +.B TPACKET_V2:
> +
> +.in +4n
> +.nf
> +struct tpacket2_hdr
> +{
> +    __u32 tp_status;
> +    __u32 tp_len;
> +    __u32 tp_snaplen;
> +    __u16 tp_mac;
> +    __u16 tp_net;
> +    __u32 tp_sec;
> +    __u32 tp_nsec;
> +    __u16 tp_vlan_tci;
> +};
> +.fi
> +.in
> +
> +.B tp_len
> +is the size of data received from network.
> +
> +.B tp_snaplen
> +is the size of data that follows the header.
> +
> +.B tp_mac
> +is the mac address offset (
> +.B PACKET_RX_RING
> +only).
> +
> +.B tp_net
> +is the network offset (
> +.B PACKET_RX_RING
> +only).
> +
> +.B tp_sec
> +,
> +.B tp_usec
> +is the timestamp of received packet (
> +.B PACKET_RX_RING
> +only).
> +
> +.B tp_status
> +is the status of current frame.
> +
> +For
> +.B PACKET_TX_RING ,
> +status can be
> +.B TP_STATUS_AVAILABLE
> +if the frame is available for new packet transmission;
> +.B TP_STATUS_SEND_REQUEST
> +if the frame is filled by user for transmission;
> +.B TP_STATUS_SENDING
> +if the frame is currently in transmission within the kernel;
> +.B TP_STATUS_WRONG_FORMAT
> +if the frame format is not properly formatted (This status will only be used if socket option
> +.B PACKET_LOSS
> +is set to 1).
> +
> +For
> +.B PACKET_RX_RING ,
> +a status equal to
> +.B TP_STATUS_KERNEL
> +indicates that the frame is available for kernel;
> +.B TP_STATUS_USER
> +indicates that kernel has received a packet (The frame is ready for user);
> +.B TP_STATUS_COPY
> +indicates that the frame (and associated meta information)
> +has been truncated because it's larger than
> +.B tp_frame_size
> +;
> +.B TP_STATUS_LOSING
> +indicates there were packet drops from last time
> +statistics where checked with
> +.BR getsockopt(2)
> +and the
> +.B PACKET_STATISTICS
> +option;
> +.B TP_STATUS_CSUMNOTREADY
> +is used for outgoing IP packets which it's checksum will be done in hardware.
> +
> +In order to use this shared memory, the user must call
> +.BR mmap (2)
> +function on packet socket. Then process depends on socket options:
> +
> +For
> +.B PACKET_TX_RING ,
> +the kernel initializes all frames to
> +.B TP_STATUS_AVAILABLE.
> +To send a packet, the user fills a data buffer of an available frame, sets tp_len to
> +current data buffer size and sets its status field to
> +.B TP_STATUS_SEND_REQUEST.
> +This can be done on multiple frames. Once the user is ready to transmit, it
> +calls
> +.BR send (2) .
> +Then all buffers with status equal to
> +.B TP_STATUS_SEND_REQUEST
> +are forwarded to the network device.
> +The kernel updates each status of sent frames with
> +.B TP_STATUS_SENDING
> +until the end of transfer.
> +At the end of each transfer, buffer status returns to
> +.B TP_STATUS_AVAILABLE.
> +
> +For
> +.B PACKET_RX_RING ,
> +the kernel initializes all frames to
> +.B TP_STATUS_KERNEL ,
> +when the kernel
> +receives a packet it puts in the buffer and updates the status with
> +at least the
> +.B TP_STATUS_USER
> +flag. Then the user can read the packet,
> +once the packet is read the user must zero the status field, so the kernel
> +can use again that frame buffer.
> +
>  .SS Ioctls
>  .B SIOCGSTAMP
>  can be used to receive the timestamp of the last received packet.
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-man" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Watch my Linux system programming book progress to publication!
http://blog.man7.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2009-07-31  3:57 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-07-29 23:04 [PATCH] AF_PACKET and packet mmap Johann Baudy
2009-07-31  3:57 ` Michael Kerrisk [this message]
     [not found]   ` <cfd18e0f0907302057q6836abaek80b4fab46e0f8fe5-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-08-20  6:52     ` Johann Baudy
     [not found]       ` <7e0dd21a0908192352r6b5df47fybd3d475ef6f16b4b-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-03-27  9:29         ` Johann Baudy
  -- strict thread matches above, loose matches on Subject: below --
2009-06-23 19:47 Johann Baudy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cfd18e0f0907302057q6836abaek80b4fab46e0f8fe5@mail.gmail.com \
    --to=mtk.manpages-gm/ye1e23mwn+bqq9rbeug@public.gmane.org \
    --cc=johann.baudy-1YmjpbiIw0bR7s880joybQ@public.gmane.org \
    --cc=linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).