netdev.vger.kernel.org archive mirror
From: Paul Chavent <paul.chavent@fnac.net>
To: "Ricardo Tubío" <rtpardavila@gmail.com>
Cc: netdev@vger.kernel.org
Subject: Re: Single socket with TX_RING and RX_RING
Date: Mon, 20 May 2013 22:50:29 +0200
Message-ID: <519A8C95.6090609@fnac.net>
In-Reply-To: <loom.20130515T152203-727@post.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 3388 bytes --]

On 05/15/2013 03:32 PM, Ricardo Tubío wrote:
> Daniel Borkmann <dborkman <at> redhat.com> writes:
>
>>
>> On 05/15/2013 02:53 PM, Ricardo Tubío wrote:
>>> Once I tell kernel to export the TX_RING through setsockopt() (see code
>>> below) I always get an error (EBUSY) if i try to tell kernel to export the
>>> RX_RING with the same socket descriptor. Therefore, I have to open an
>>> additional socket for the RX_RING and I require of two sockets when I though
>>> that I would only require of one socket for both TX and RX using mmap()ed
>>> memory.
>>>
>>> Do I need both sockets or am I doing something wrong?
>>
>> The second time you call init_ring() in your code e.g. with TX_RING, where
>> you have previously set it up for the RX_RING. The kernel will give you
>> -EBUSY because the packet socket is already mmap(2)'ed.
>>
>
> Ok, so if I make the following system calls:
>
> void *ring=NULL;
> setsockopt(socket_fd, SOL_PACKET, PACKET_RX_RING, p, LEN__TPACKET_REQ);
> ring = mmap(NULL, ring_len, ring_access_flags, MAP_SHARED, socket_fd, 0);
>
> Would I be permitted to use the ring map obtained both for RX and for TX? If
> so, for me it is confusing to use PACKET_RX_RING if I can also TX data
> through that ring...
>

Hello Ricardo.

I managed to use the same socket and a single mmap()ed area for both the RX_RING and the TX_RING. Here is some sample code:

/* open socket */
sock_fd = socket(PF_PACKET, socket_type, htons(socket_protocol));

/* socket tuning and init */
[...]

/* rings geometry */
rx_packet_req.tp_block_size = pagesize << order;
rx_packet_req.tp_block_nr = 1;
rx_packet_req.tp_frame_size = frame_size;
rx_packet_req.tp_frame_nr = (rx_packet_req.tp_block_size / rx_packet_req.tp_frame_size) * rx_packet_req.tp_block_nr;

tx_packet_req = rx_packet_req;

/* set packet version */
setsockopt(sock_fd, SOL_PACKET, PACKET_VERSION, &version, sizeof(version));

/* set RX ring option */
setsockopt(sock_fd, SOL_PACKET, PACKET_RX_RING, &rx_packet_req, sizeof(rx_packet_req));

/* set TX ring option */
setsockopt(sock_fd, SOL_PACKET, PACKET_TX_RING, &tx_packet_req, sizeof(tx_packet_req));

/* map rx + tx buffer to userspace : they are in this order */
mmap_size =
     rx_packet_req.tp_block_size * rx_packet_req.tp_block_nr +
     tx_packet_req.tp_block_size * tx_packet_req.tp_block_nr ;
mmap_base = mmap(0, mmap_size, PROT_READ|PROT_WRITE, MAP_SHARED, sock_fd, 0);

/* get rx and tx buffer description */
rx_buffer_size = rx_packet_req.tp_block_size * rx_packet_req.tp_block_nr;
rx_buffer_addr = mmap_base;
rx_buffer_idx  = 0;
rx_buffer_cnt  = rx_packet_req.tp_block_size * rx_packet_req.tp_block_nr / rx_packet_req.tp_frame_size;

tx_buffer_size = tx_packet_req.tp_block_size * tx_packet_req.tp_block_nr;
tx_buffer_addr = mmap_base + rx_buffer_size;
tx_buffer_idx  = 0;
tx_buffer_cnt  = tx_packet_req.tp_block_size * tx_packet_req.tp_block_nr / tx_packet_req.tp_frame_size;
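
The layout above boils down to plain offset arithmetic. Here is a minimal sketch of it, kept free of any socket call so that it compiles anywhere. struct ring_geometry and the four helpers are names made up for this illustration (they only mirror the tp_* fields of struct tpacket_req), and the sketch assumes that tp_frame_size divides tp_block_size, as in the snippet above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the struct tpacket_req fields used above. */
struct ring_geometry {
  unsigned block_size;  /* tp_block_size */
  unsigned block_nr;    /* tp_block_nr   */
  unsigned frame_size;  /* tp_frame_size */
  unsigned frame_nr;    /* tp_frame_nr   */
};

/* Bytes occupied by one ring inside the shared mapping. */
size_t ring_bytes(const struct ring_geometry *g)
{
  return (size_t)g->block_size * g->block_nr;
}

/* Byte offset of frame idx from the start of its own ring.
 * Frames never straddle a block, so locate the block first. */
size_t frame_offset(const struct ring_geometry *g, unsigned idx)
{
  assert(g->frame_size != 0 && g->block_size % g->frame_size == 0);
  assert(idx < g->frame_nr);
  unsigned per_block = g->block_size / g->frame_size;
  return (size_t)(idx / per_block) * g->block_size
       + (size_t)(idx % per_block) * g->frame_size;
}

/* Index of the frame that follows idx, wrapping at the end of the ring. */
unsigned next_frame(const struct ring_geometry *g, unsigned idx)
{
  return (idx + 1 < g->frame_nr) ? idx + 1 : 0;
}

/* Offset of the TX ring from mmap_base: the RX ring is mapped first. */
size_t tx_ring_offset(const struct ring_geometry *rx)
{
  return ring_bytes(rx);
}
```

With a 4096-byte page, order = 1 and frame_size = 2048, each ring holds 4 frames in 8192 bytes, the TX ring starts 8192 bytes after mmap_base, and TX frame 3 sits at 8192 + 6144 = 14336 bytes into the mapping.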


I have attached a complete (but certainly outdated) code sample to this mail.
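
The notes at the top of the attached ethernet.c give the payload offsets inside a TPACKET_V2 frame. Below is a rough sketch of that arithmetic (Linux only). The function names are invented for this example, and the RX values assume tp_reserve = 0 and a MAC header of at most 16 bytes, so real code should read tp_mac and tp_net from the frame header rather than rely on them.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/if_packet.h>  /* TPACKET2_HDRLEN, TPACKET_ALIGN, struct sockaddr_ll */

/* Note (5): on TX the user data starts right after the aligned
 * struct tpacket2_hdr, i.e. where struct sockaddr_ll sits on RX. */
size_t tx_data_offset(void)
{
  return TPACKET2_HDRLEN - sizeof(struct sockaddr_ll);
}

/* Note (6): expected value of tp_net on RX for a given tp_reserve. */
size_t rx_net_offset(unsigned reserve)
{
  return TPACKET_ALIGN(TPACKET2_HDRLEN) + 16 + reserve;
}

/* Note (1): with SOCK_RAW, tp_mac precedes tp_net by the MAC header
 * length; pass maclen = 0 for SOCK_DGRAM, where both are equal. */
size_t rx_mac_offset(unsigned reserve, unsigned maclen)
{
  assert(maclen <= 16);
  return rx_net_offset(reserve) - maclen;
}
```

These evaluate to 32 for TX and, for SOCK_RAW with a 14-byte Ethernet header on RX, tp_net = 80 and tp_mac = 66, which matches the figures quoted in the notes.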

I have also begun writing a kind of howto (in French) on packet mmap at this page: http://paul.chavent.free.fr/packet_mmap.html (this is a work in progress; I will add information on timestamping).

Regards.

Paul.


[-- Attachment #2: ethernet.c --]
[-- Type: text/plain, Size: 38648 bytes --]

/*
 * This module sends and receives Ethernet frames.
 * The type of ethernet frames must be specified at compile time :
 *  - use 8021Q or not
 *    - tpid and tci
 *  - ethertype
 *  - filtering or not
 *
 * See /usr/src/linux/Documentation/networking/packet_mmap.txt  for improvement
 *
 *
 * Notes on packet mmap
 *
 * For tx example see :
 *   http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap#Example
 * For rx example see :
 *   http://www.scaramanga.co.uk/code-fu/lincap.c
 *
 * (1) If we open the socket with SOCK_DGRAM, the tp_mac and the
 *     tp_net are the same (the mac header isn't provided by the
 *     user). Eg tp_mac=80 and tp_net=80. If we open the socket with
 *     SOCK_RAW, the tp_net = tp_mac + 14. Eg tp_mac=66 and tp_net=80.
 *     (see (6) for alignment)
 *
 * (2) The tx and rx are asymmetric. On tx we fill data at
 *       TPACKET2_HDRLEN - sizeof(struct sockaddr_ll)
 *     on rx we get data at (see (1)) 
 *       tp_mac 
 *     or 
 *       tp_net 
 *
 * (3) The mmaping is made only once for the two sides. The map gives
 *     rx before tx.
 * 
 * (4) The tp_len is the real len of the frame, the tp_snaplen is the
 *     len of the data in the ring buffer. If the tp_frame_size given
 *     in struct tpacket_req is too small for a frame, the data is
 *     truncated (tp_snaplen < tp_len) and, if the PACKET_COPY_THRESH
 *     sockopt is set, TP_STATUS_COPY is set in tp_status.
 *
 * (5) The minimum tp_frame_size for tx is the minimum size of the
 *     payload (including the mac header if SOCK_RAW is selected) plus :
 *       TPACKET2_HDRLEN - sizeof(struct sockaddr_ll)           = 32 
 *     The TPACKET2_HDRLEN - sizeof(struct sockaddr_ll) is always aligned
 *     to 16 bytes
 *
 *
 * (6) The minimum tp_frame_size for rx is the minimum size of the
 *     payload (including the mac header if SOCK_RAW is selected) plus :
 *       ALIGN_16(TPACKET2_HDRLEN) + 16 + tp_reserve (=0)       = 80 = tp_net 
 *     The tp_net will always be aligned to 16 bytes boundaries
 *
 *
 * RX FRAME STRUCTURE :
 *
 * Start (aligned to TPACKET_ALIGNMENT=16)   TPACKET_ALIGNMENT=16                                   TPACKET_ALIGNMENT=16
 * v                                         v                                                      v
 * |                                         |                             | tp_mac                 |tp_net
 * |  struct tpacket_hdr  ... pad            | struct sockaddr_ll ... gap  | min(16, maclen) = 16   |
 * |<--------------------------------------------------------------------->|<---------------------->|<----... 
 *                                tp_hdrlen = TPACKET2_HDRLEN                   if SOCK_RAW             user data
 *
 *
 * TX FRAME STRUCTURE :
 *
 * Start (aligned to TPACKET_ALIGNMENT=16)   TPACKET_ALIGNMENT=16
 * v                                         v
 * |                                         |
 * |  struct tpacket_hdr  ... pad            | struct sockaddr_ll ... gap
 * |<--------------------------------------------------------------------->| 
 *                                tp_hdrlen = TPACKET2_HDRLEN
 *                                           |<---- ... 
 *                                               user data
 *
 *
 * TODO / IMPROVEMENTS
 *  vlan 802Q
 *  timestamp
 *  filtering
 *  set the mtu according to the tp_frame_size or set tp_frame_size according
 *  to the mtu ?
 */

#undef  USE_FILTER
#define COOKED_PACKET
#undef  P_8021Q
#define PATCHED_PACKET

#define _GNU_SOURCE 

#include <assert.h>           /* assert */
#include <stdint.h>           /* uint8_t, uint16_t */
#include <stdio.h>            /* printf */
#include <stdlib.h>           /* calloc, free */
#include <string.h>           /* memcpy */
#include <errno.h>            /* errno, perror, etc */
#include <unistd.h>           /* close */
#include <sys/ioctl.h>        /* ioctl */
#include <arpa/inet.h>        /* htons, ntohs */
#include <poll.h>             /* poll */
#include <time.h>             /* struct timespec */
#include <sys/timerfd.h>      /* timerfd_create etc. */
#include <sys/mman.h>         /* mmap */
#include <sys/socket.h>       /* socket */
#include <net/if.h>           /* ifreq, ifconf */
#include <net/ethernet.h>     /* struct ether_header, ETH_ALEN, ... */
#include <linux/if_packet.h>  /* packet mmap*/
#if defined(USE_FILTER)
#include <linux/types.h>      /* attach filter */
#include <linux/filter.h>     /* attach filter */
#endif

#include "ethernet.h"
#if !defined(NDEBUG)
#include "debug.h"
#endif

#define MIN(x,y) ((x)<(y)?(x):(y))

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
 
static inline unsigned next_power_of_two(unsigned n)
{
  n--;
  n |= n >> 1;
  n |= n >> 2;
  n |= n >> 4;
  n |= n >> 8;
  n |= n >> 16;
  n++;
  return n;
}

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
 
static const uint8_t broadcast_addr[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
struct ethernet_s
{
#if !defined(NDEBUG)
  int                debug;
#endif

  int                timer_fd;

  int                sock_fd;

  struct sockaddr_ll local_addr;
  struct sockaddr_ll remote_addr;

  unsigned           mtu;

  struct tpacket_req rx_packet_req;
  struct tpacket_req tx_packet_req;

  void *             mmap_base;
  unsigned           mmap_size;

  unsigned           rx_buffer_size;
  void *             rx_buffer_addr;
  unsigned           rx_buffer_cnt;
  unsigned           rx_buffer_idx;
  unsigned           rx_buffer_payload_offset;
  unsigned           rx_buffer_payload_max_size;

  unsigned           tx_buffer_size;
  void *             tx_buffer_addr;
  unsigned           tx_buffer_cnt; 
  unsigned           tx_buffer_idx;
  unsigned           tx_buffer_payload_offset;
  unsigned           tx_buffer_payload_max_size;

  struct pollfd      pollfd[2];
};

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/

/* http://standards.ieee.org/develop/regauth/ethertype/eth.txt */
#define ETH_TYPE 0x88b5

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/

#if !defined(COOKED_PACKET)

static const int socket_type     = SOCK_DGRAM;
static const int socket_protocol = ETH_P_802_3;
static const int bind_protocol   = ETH_P_802_2; // man packet section Notes
static const int send_protocol   = ETH_TYPE;

#endif /* !defined(COOKED_PACKET) */

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/

#if defined(COOKED_PACKET) && !defined(P_8021Q)

static const int socket_type     = SOCK_RAW;
static const int socket_protocol = ETH_P_802_3;
static const int bind_protocol   = ETH_P_802_2; // man packet section Notes
static const int send_protocol   = ETH_TYPE;

struct ether_header_s
{
  uint8_t  dhost[ETH_ALEN];
  uint8_t  shost[ETH_ALEN];
  uint16_t type;
} __attribute__ ((__packed__));

typedef struct ether_header_s ether_header_t;

#endif /* defined(COOKED_PACKET) && !defined(P_8021Q) */

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/

#if defined(COOKED_PACKET) && defined(P_8021Q)

static const int socket_type     = SOCK_RAW;
static const int socket_protocol = ETH_P_ALL;
static const int bind_protocol   = ETH_P_ALL;
static const int send_protocol   = ETH_TYPE;

struct ether_header_s
{
  uint8_t   dhost[ETH_ALEN];
  uint8_t   shost[ETH_ALEN];
  uint16_t  tpid;
  uint16_t  tci;
  uint16_t  type;
} __attribute__ ((__packed__));

typedef struct ether_header_s ether_header_t;

#define E_8021Q_TPID 0x8100
#define E_8021Q_TCI  0xEFFE

#define E_8021Q_PCP 0x7     /* priority : highest -> better, from 0 to 7 */
#define E_8021Q_CFI 0
#define E_8021Q_VID 0xFFE   /* vlan id, from 0 (reserved) to 0xFFF (reserved) */

#endif /* defined(COOKED_PACKET) && defined(P_8021Q) */

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/

#if defined(USE_FILTER)

static struct sock_filter filt_prog_code[] =
{
#if defined(P_8021Q)
  /* load and check tpid */
  BPF_STMT(BPF_LD  | BPF_H   | BPF_ABS, 12),                /* Load tpid */
  BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,   E_8021Q_TPID, 1, 0),/* equal 8021Q_TPID */
  BPF_STMT(BPF_RET | BPF_K,             0),                 /* reject */
  /* load and check tci */
  BPF_STMT(BPF_LD  | BPF_H   | BPF_ABS, 14),               /* Load tci */
  BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,   E_8021Q_TCI, 1, 0),/* equal 8021Q_TCI */
  BPF_STMT(BPF_RET | BPF_K,             0),                /* reject */
#endif /* defined(P_8021Q) */
  BPF_STMT(BPF_LD  | BPF_H   | BPF_ABS, ETH_HDR_LEN - 2),  /* Load ether type */
  BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,   ETH_TYPE, 1, 0),   /* equal ETHER_TYPE */
  BPF_STMT(BPF_RET | BPF_K,             0),                /* reject */
  BPF_STMT(BPF_RET | BPF_K,             65535),            /* accept */
};

static struct sock_fprog filt_prog =
{
  sizeof(filt_prog_code) / sizeof(filt_prog_code[0]),
  filt_prog_code
};

#endif /* defined(USE_FILTER) */

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
#if !defined(NDEBUG)
static void ethernet_debug_frame(const void * base);
static void ethernet_debug_packet_req(const struct tpacket_req * rx_packet_req, const struct tpacket_req * tx_packet_req);
#endif

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
ethernet_t * ethernet_alloc()
{
  ethernet_t * itf = calloc(1, sizeof(*itf));
  if(itf)
    {
      itf->timer_fd = -1;
      itf->sock_fd = -1;
      itf->mmap_base = (void *)-1;
    }
  return itf;
}

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
void ethernet_free(ethernet_t *itf)
{
 /* check parameters */
  assert(itf);

  ethernet_close(itf);

  free(itf);
}

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
int ethernet_open(ethernet_t *itf, const char *itf_name)
{
  struct ifreq ifr;
  int err = 0;
  socklen_t errlen = sizeof(err);

  /* check parameters */
  assert(itf && itf_name);

  /* fill ifr name field (the memset keeps the name NUL-terminated) */
  memset(&ifr, 0, sizeof(ifr));
  strncpy(ifr.ifr_name, itf_name, sizeof(ifr.ifr_name) - 1);

  /* cleanup */
  ethernet_close(itf);

  /* setup timer fd */
  itf->timer_fd = timerfd_create(CLOCK_REALTIME, 0);
  if(itf->timer_fd < 0)
    {
      perror("timerfd_create failed");
      return -1;
    }

  /* open socket */
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "socket\n");
    }
#endif
  itf->sock_fd = socket(PF_PACKET, socket_type, htons(socket_protocol));
  if(itf->sock_fd < 0)
    {
      perror("socket failed");
      return -1;
    }
  
#if defined(USE_FILTER)
  /* attach filter */
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "setsockopt SO_ATTACH_FILTER\n");
    }
#endif
  if(setsockopt(itf->sock_fd, SOL_SOCKET, SO_ATTACH_FILTER, &filt_prog, sizeof(filt_prog)))
    {
      perror("setsockopt SO_ATTACH_FILTER failed");
      return -1;
    }
#endif /* defined(USE_FILTER) */

  /* set local addr */
  memset(&itf->local_addr, 0, sizeof(itf->local_addr));
  itf->local_addr.sll_family = AF_PACKET;
  itf->local_addr.sll_protocol = htons(bind_protocol);

  /* get itf index */
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "ioctl SIOCGIFINDEX\n");
    }
#endif
  if(ioctl(itf->sock_fd, SIOCGIFINDEX, &ifr) == -1)
    {
      perror("ioctl SIOCGIFINDEX failed");
      return -1;
    }
  itf->local_addr.sll_ifindex = ifr.ifr_ifindex;
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "if index %d\n", ifr.ifr_ifindex);
    }
#endif

  /* get own MAC address */
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "ioctl SIOCGIFHWADDR\n");
    }
#endif
  if(ioctl(itf->sock_fd, SIOCGIFHWADDR, &ifr) < 0)
    {
      perror("ioctl SIOCGIFHWADDR failed");
      return -1;
    }
  itf->local_addr.sll_halen = ETH_ALEN;
  memcpy(&itf->local_addr.sll_addr, ifr.ifr_hwaddr.sa_data, ETH_ALEN);
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "if mac addr %02x:%02x:%02x:%02x:%02x:%02x\n",
              itf->local_addr.sll_addr[0], itf->local_addr.sll_addr[1], 
              itf->local_addr.sll_addr[2], itf->local_addr.sll_addr[3],
              itf->local_addr.sll_addr[4], itf->local_addr.sll_addr[5]);
    }
#endif

  /* bind to eth */
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "bind\n");
    }
#endif
  if(bind(itf->sock_fd, (const void *)&itf->local_addr, sizeof(itf->local_addr)) == -1)
    {
      perror("bind failed");
      return -1;
    }

  /* any pending errors, e.g., network is down? */
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "getsockopt SO_ERROR\n");
    }
#endif
  if(getsockopt(itf->sock_fd, SOL_SOCKET, SO_ERROR, &err, &errlen) == -1)
    {
      perror("getsockopt SO_ERROR failed");
      return -1;
    }
  if(err > 0)
    {
      fprintf(stderr, "network is down ?\n");
      return -1;
    }

  /* set remote addr */
  itf->remote_addr = itf->local_addr;
  itf->remote_addr.sll_protocol = htons(send_protocol);
  memcpy(&itf->remote_addr.sll_addr, broadcast_addr, ETH_ALEN);

  /* get own MTU */
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "ioctl SIOCGIFMTU\n");
    }
#endif
  if (ioctl(itf->sock_fd, SIOCGIFMTU, &ifr) < 0)
    {
      perror("ioctl SIOCGIFMTU failed");
      return -1;
    }
  itf->mtu = ifr.ifr_mtu;
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "Mtu %u\n", itf->mtu);
    }
#endif

  /* prepare packet mmaping */
  const long pagesize = sysconf(_SC_PAGESIZE); /* assume 4096 */
  const unsigned order = 1;
  const unsigned frame_size = next_power_of_two(itf->mtu + 128); /* 128 is an arbitrary value */ 

  /* tp_block_size must be a multiple of PAGE_SIZE (here PAGE_SIZE << order) */
  itf->rx_packet_req.tp_block_size = pagesize << order; 
  /* tp_block_nr */
  itf->rx_packet_req.tp_block_nr = 1;
  /* tp_frame_size must be greater than TPACKET2_HDRLEN and a multiple 
   * of TPACKET_ALIGNMENT. It should also be a divisor of tp_block_size */
  itf->rx_packet_req.tp_frame_size = frame_size;
  /* tp_frame_nr */
  itf->rx_packet_req.tp_frame_nr = (itf->rx_packet_req.tp_block_size / itf->rx_packet_req.tp_frame_size) * itf->rx_packet_req.tp_block_nr;

  /* sanity checks */
  if(frame_size <= TPACKET2_HDRLEN)
    {
      fprintf(stderr, "frame_size (%u) must be greater than TPACKET2_HDRLEN (%u)\n", frame_size, (unsigned)TPACKET2_HDRLEN);
      return -1;
    }
  if((frame_size % TPACKET_ALIGNMENT) != 0)
    {
      fprintf(stderr, "frame_size (%u) must be a multiple of TPACKET_ALIGNMENT (%u)\n", frame_size, (unsigned)TPACKET_ALIGNMENT);
      return -1;
    }
  if((itf->rx_packet_req.tp_block_size % frame_size) != 0)
    {
      fprintf(stderr, "frame_size (%u) must be a divisor of tp_block_size (%u)\n", frame_size, itf->rx_packet_req.tp_block_size);
      return -1;
    }

  /* same settings for tx */
  itf->tx_packet_req = itf->rx_packet_req;

#if !defined(NDEBUG)
  if(itf->debug)
    {
      ethernet_debug_packet_req(&itf->rx_packet_req, &itf->tx_packet_req);
    }
#endif
  
  /* set packet version option */
  int version = TPACKET_V2;
  if(setsockopt(itf->sock_fd, SOL_PACKET, PACKET_VERSION, &version, sizeof(version)) < 0)
    {
      perror("setsockopt: PACKET_VERSION");
      return -1;
    }

  /* set RX ring option */
  if (setsockopt(itf->sock_fd, SOL_PACKET, PACKET_RX_RING, &itf->rx_packet_req, sizeof(itf->rx_packet_req)) < 0)
    {
      perror("setsockopt: PACKET_RX_RING");
      return -1;
    }
 
  /* set TX ring option*/
  if (setsockopt(itf->sock_fd, SOL_PACKET, PACKET_TX_RING, &itf->tx_packet_req, sizeof(itf->tx_packet_req)) < 0)
    {
      perror("setsockopt: PACKET_TX_RING");
      return -1;
    }

  /* map rx + tx buffer to userspace : they are in this order */
  itf->mmap_size = 
    itf->rx_packet_req.tp_block_size * itf->rx_packet_req.tp_block_nr +
    itf->tx_packet_req.tp_block_size * itf->tx_packet_req.tp_block_nr ;
  itf->mmap_base = mmap(0, itf->mmap_size, PROT_READ|PROT_WRITE, MAP_SHARED, itf->sock_fd, 0);
  if (itf->mmap_base == MAP_FAILED)
    {
      perror("mmap rx + tx buffer failed");
      return -1;
    }

  /* get rx and tx buffer description */
  itf->rx_buffer_size = itf->rx_packet_req.tp_block_size * itf->rx_packet_req.tp_block_nr;
  itf->rx_buffer_addr = itf->mmap_base;
  itf->rx_buffer_idx  = 0;
  itf->rx_buffer_cnt  = itf->rx_packet_req.tp_block_size * itf->rx_packet_req.tp_block_nr / itf->rx_packet_req.tp_frame_size;

  itf->tx_buffer_size = itf->tx_packet_req.tp_block_size * itf->tx_packet_req.tp_block_nr;
  itf->tx_buffer_addr = itf->mmap_base + itf->rx_buffer_size;
  itf->tx_buffer_idx  = 0;
  itf->tx_buffer_cnt  = itf->tx_packet_req.tp_block_size * itf->tx_packet_req.tp_block_nr / itf->tx_packet_req.tp_frame_size;

  /* 
   * Precompute payload offset and max size 
   * Warning : tx and rx are asymmetric
   */

  /*
   * - on rx we get data at tp_net (SOCK_DGRAM) and tp_mac if we need mac 
   *   header (SOCK_RAW) 
   *   the rx_buffer_payload_offset is the offset from the tp_net of the frame !
   *   For computing max size we consider the tp_net to be :
   *     TPACKET2_HDRLEN + 16 + reserve   (=80)
   *   or
   *     TPACKET2_HDRLEN + min(16, maclen) + reserve
   *   see src/linux/net/packet/af_packet.c tpacket_rcv  
   */
  itf->rx_buffer_payload_offset = TPACKET_ALIGN(TPACKET2_HDRLEN + MIN(sizeof(ether_header_t), 16)); // only used here, use tp_net elsewhere
  itf->rx_buffer_payload_max_size = itf->rx_packet_req.tp_frame_size - itf->rx_buffer_payload_offset;

  /*
   * - on tx we fill data at 
   *     TPACKET2_HDRLEN - sizeof(struct sockaddr_ll)
   *   or
   *     TPACKET2_HDRLEN + min(16, maclen)
   *   see src/linux/net/packet/af_packet.c tpacket_fill_skb  
   */
#if defined(PATCHED_PACKET)
  itf->tx_buffer_payload_offset = TPACKET_ALIGN(TPACKET2_HDRLEN + MIN(sizeof(ether_header_t), 16));
#else /* defined(PATCHED_PACKET) */
  itf->tx_buffer_payload_offset = (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll));
#endif /* defined(PATCHED_PACKET) */
  itf->tx_buffer_payload_max_size = itf->tx_packet_req.tp_frame_size - itf->tx_buffer_payload_offset;

#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "rx_buffer_payload_max_size %u\n", 
              itf->rx_buffer_payload_max_size);
      fprintf(stdout, "tx_buffer_payload_max_size %u\n", 
              itf->tx_buffer_payload_max_size);
    }
#endif

#if defined(COOKED_PACKET)
  /* for each packet we initialize the ethernet header */
  ether_header_t ether_header;
  memcpy(ether_header.dhost, &itf->remote_addr.sll_addr, sizeof(ether_header.dhost));
  memcpy(ether_header.shost, &itf->local_addr.sll_addr, sizeof(ether_header.shost));
#if defined(P_8021Q)
  ether_header.tpid = htons(E_8021Q_TPID);
  ether_header.tci  = htons(E_8021Q_TCI);
#endif /* defined(P_8021Q) */
  ether_header.type = htons(send_protocol);
  for(unsigned i = 0; i < itf->tx_buffer_cnt; i++)
    {
      void * base = itf->tx_buffer_addr + i * itf->tx_packet_req.tp_frame_size;
      memcpy(base + itf->tx_buffer_payload_offset - sizeof(ether_header_t), &ether_header, sizeof(ether_header_t));
    }
  
  /* override the setting of the tx data offset and size */

  /* apply the diffs */
  itf->rx_buffer_payload_max_size -= sizeof(ether_header);
  itf->tx_buffer_payload_max_size -= sizeof(ether_header);

#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "rx_buffer_payload_max_size %u\n", 
              itf->rx_buffer_payload_max_size);
      fprintf(stdout, "tx_buffer_payload_max_size %u\n", 
              itf->tx_buffer_payload_max_size);
    }
#endif

#endif /* defined(COOKED_PACKET) */

  /* threshold payload max size according to the mtu */
  if(itf->mtu < itf->rx_buffer_payload_max_size)
    {
      itf->rx_buffer_payload_max_size = itf->mtu;
    }
  if(itf->mtu < itf->tx_buffer_payload_max_size)
    {
      itf->tx_buffer_payload_max_size = itf->mtu;
    }

#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "rx_buffer_payload_max_size %u\n", itf->rx_buffer_payload_max_size);
      fprintf(stdout, "tx_buffer_payload_max_size %u\n", itf->tx_buffer_payload_max_size);
    }
#endif

  /* setup poll fd */

  itf->pollfd[0].fd      = itf->timer_fd;
  itf->pollfd[0].events  = POLLIN;
  itf->pollfd[0].revents = 0;

  itf->pollfd[1].fd      = itf->sock_fd;
  itf->pollfd[1].events  = POLLIN|POLLRDNORM|POLLERR;
  itf->pollfd[1].revents = 0;

  return 0;
}


/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
void ethernet_close(ethernet_t * itf)
{
  /* check parameters */
  assert(itf);

  /* */
  if(itf->mmap_base != (void *)-1)
    {
      munmap(itf->mmap_base, itf->mmap_size);
      itf->mmap_base = (void *)-1;
      itf->mmap_size = 0;
    }

  /* close socket */
  if(0 <= itf->sock_fd)
    {
#if !defined(NDEBUG)
      if(itf->debug)
        {
          fprintf(stdout, "close\n");
        }
#endif
      close(itf->sock_fd);
      itf->sock_fd = -1;
    }

  /* close timer */
  if(0 <= itf->timer_fd)
    {
      close(itf->timer_fd);
      itf->timer_fd = -1;
    }
}

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
void ethernet_purge(ethernet_t * itf)
{
  /* check parameters */
  assert(itf);

  /* get base address of the current rx frame */
  void * base = itf->rx_buffer_addr + itf->rx_buffer_idx * itf->rx_packet_req.tp_frame_size;
  volatile struct tpacket2_hdr * header = (struct tpacket2_hdr *)base;
  while(header->tp_status != TP_STATUS_KERNEL)
    {
      /* load the next rx frame index */
      if(itf->rx_buffer_idx < (itf->rx_buffer_cnt - 1))
        {
          itf->rx_buffer_idx ++;
        }
      else
        {
          itf->rx_buffer_idx = 0;
        }

      /* clear the status */
      header->tp_status = TP_STATUS_KERNEL;

      /* get base address of the current rx frame */
      base = itf->rx_buffer_addr + itf->rx_buffer_idx * itf->rx_packet_req.tp_frame_size;
      header = (struct tpacket2_hdr *)base;
    }
}

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
int ethernet_rx_request(ethernet_t * itf, ethernet_msg_t * msg)
{
  /* check parameters */
  assert(itf && msg);

#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "ethernet_rx_request\n");
    }
#endif

  if(msg->data || msg->data_len)
    {
      fprintf(stderr, "Rx message has to be released before being requested again.\n");
      return -1;
    }
 
  /* get base address of the current rx frame */
  void * base = itf->rx_buffer_addr + itf->rx_buffer_idx * itf->rx_packet_req.tp_frame_size;
  volatile struct tpacket2_hdr * header = (struct tpacket2_hdr *)base;

  /* check if we need to poll */
  if(header->tp_status == TP_STATUS_KERNEL)
    {
      int err;

      /* setup read timeout */
      struct itimerspec to = {{0,0}, msg->to};
      int flags = (msg->to_is_relative)?0:TFD_TIMER_ABSTIME;
      err = timerfd_settime(itf->timer_fd, flags, &to, NULL);
      if(err < 0)
        {
          perror("timerfd_settime failed");
          return -1;
        }
     
      /* poll input */
      itf->pollfd[0].revents = 0;
      itf->pollfd[1].revents = 0;
      err = ppoll(itf->pollfd, 2, NULL, NULL);
      if(err < 0)
        {
          perror("ppoll failed");
          fprintf(stderr, "revents = %hd %hd\n", itf->pollfd[0].revents, itf->pollfd[1].revents);
          return -1;
        }
#if !defined(NDEBUG)
      else if(err == 0)
        {
          fprintf(stderr, "ppoll timeout unexpected\n");
          return -1;
        }
#endif
      else if(itf->pollfd[0].revents == POLLIN)
        {
#if !defined(NDEBUG)
          if(itf->debug)
            {
              fprintf(stdout, "timerfd timeout\n");
            }
#endif
          return 0;
        }
#if !defined(NDEBUG)
      else if(!itf->pollfd[1].revents)
        {
          fprintf(stderr, "event on socket expected\n");
          return -1;
        }
#endif
      else if(itf->pollfd[1].revents & POLLERR)
        {
          fprintf(stderr, "error on socket poll\n");
          return -1;
        }
    }

#if !defined(NDEBUG)
  if(itf->debug)
    {
      ethernet_debug_frame(base);
    }
#endif

  /* so, here we have a frame ready to process */

  /* load the next rx frame index */
  if(itf->rx_buffer_idx < (itf->rx_buffer_cnt - 1))
    {
      itf->rx_buffer_idx ++;
    }
  else
    {
      itf->rx_buffer_idx = 0;
    }

  /* if the frame is good for reading */
  /* tp_status may carry extra flag bits (e.g. TP_STATUS_LOSING), so test the USER bit */
  if((header->tp_status & TP_STATUS_USER) && header->tp_snaplen)
    {
      /* give the caller the payload address and size */
      msg->data = base + header->tp_net;
      msg->data_len = header->tp_snaplen; 
#if defined(COOKED_PACKET) // hope that header->tp_net - sizeof(ether_header_t) == header->tp_mac
      assert((header->tp_net - sizeof(ether_header_t)) == header->tp_mac);
      msg->data_len -= sizeof(ether_header_t);
#endif
      return 0;
    }
  else
    {
      fprintf(stderr, "capture failed : revents %x, status %u, snap_len %u\n", itf->pollfd[1].revents, header->tp_status, header->tp_snaplen);
      header->tp_status = TP_STATUS_KERNEL;
      return -1;
    }
}

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
int ethernet_rx_release(ethernet_t * itf, ethernet_msg_t * msg)
{
  /* check parameters */
  assert(itf && msg);

#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "ethernet_rx_release\n");
    }
#endif

  if(!msg->data || !msg->data_len)
    {
      fprintf(stderr, "Rx buffer has to be requested before it is released.\n");
      return -1;
    }

  /* find the index of the frame associated to this data pointer */
  int i = (msg->data - itf->rx_buffer_addr) / itf->rx_packet_req.tp_frame_size;
  if((0 <= i) &&  ((unsigned)i < itf->rx_buffer_cnt))
    {
      void * base = itf->rx_buffer_addr + i * itf->rx_packet_req.tp_frame_size;
      volatile struct tpacket2_hdr * header = (struct tpacket2_hdr *)base;
      header->tp_status = TP_STATUS_KERNEL;
      msg->data = 0;
      msg->data_len = 0;
      return 0;
    }
  else
    {
      fprintf(stderr, "Rx release addr out of range (%p).\n", msg->data);
      return -1;
    }
}

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
int ethernet_tx_request(ethernet_t * itf, ethernet_msg_t * msg)
{
  /* check parameters (before itf is dereferenced below) */
  assert(itf && msg);

#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "ethernet_tx_request\n");
    }
#endif

  if(msg->data || msg->data_len)
    {
      fprintf(stderr, "Tx buffer has to be released before it is requested again.\n");
      return -1;
    }
 
  /* find the next available tx frame (spins until the kernel frees one) */
  void * base;
  volatile struct tpacket2_hdr * header;
  do
    {
      /* get base address of the current tx frame */
      base = itf->tx_buffer_addr + itf->tx_buffer_idx * itf->tx_packet_req.tp_frame_size;
      header = (struct tpacket2_hdr *)base;

      /* load the next tx frame index */
      if(itf->tx_buffer_idx < (itf->tx_buffer_cnt - 1))
        {
          itf->tx_buffer_idx ++;
        }
      else
        {
          itf->tx_buffer_idx = 0;
        }

    } while(header->tp_status != TP_STATUS_AVAILABLE);

  /* give the caller the payload address and size */
  msg->data = base + itf->tx_buffer_payload_offset;
  msg->data_len = itf->tx_buffer_payload_max_size;

#if !defined(NDEBUG)
  if(itf->debug)
    {
      ethernet_debug_frame(base);
    }
#endif

  return 0;
}

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
int ethernet_tx_release(ethernet_t * itf, ethernet_msg_t * msg)
{
  /* check parameters (before itf is dereferenced below) */
  assert(itf && msg);

#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "ethernet_tx_release\n");
    }
#endif

  if(!msg->data || !msg->data_len)
    {
      fprintf(stderr, "Tx buffer has to be requested before it is released.\n");
      return -1;
    }

  if(itf->tx_buffer_payload_max_size < msg->data_len)
    {
      fprintf(stderr, "Tx request cannot be greater than %d bytes (requested %d).\n", itf->tx_buffer_payload_max_size, msg->data_len);
      return -1;
    }
 
  /* an Ethernet payload is at least 46 bytes: zero-pad shorter ones */
  if(msg->data_len < 46)
    {
      memset(msg->data + msg->data_len, 0, 46 - msg->data_len);
      msg->data_len = 46;
    }

  /* find the index of the frame associated to this data pointer */
  int i = (msg->data - itf->tx_buffer_addr) / itf->tx_packet_req.tp_frame_size;
  if((i < 0) || (itf->tx_buffer_cnt <= (unsigned)i))
    {
      fprintf(stderr, "Tx release addr out of range (%p).\n", msg->data);
      return -1;
    }

  /* get base address of this tx frame */
  void * base = itf->tx_buffer_addr + i * itf->tx_packet_req.tp_frame_size;
  volatile struct tpacket2_hdr * header = (struct tpacket2_hdr *)base;

#if defined(PATCHED_PACKET)
  /* update packet offset */
  header->tp_net = itf->tx_buffer_payload_offset;
#endif /* defined(PATCHED_PACKET) */
  /* update packet len */
  header->tp_len = msg->data_len;
#if defined(COOKED_PACKET)
  header->tp_len += sizeof(ether_header_t);
#endif
  /* mark the frame as a send request (the sendto() below triggers xmit) */
  header->tp_status = TP_STATUS_SEND_REQUEST;

  /* ask the kernel to send data */
  ssize_t err;
  err = sendto(itf->sock_fd, NULL, 0, 0, (const struct sockaddr *)&itf->remote_addr, sizeof(itf->remote_addr));
  if(err < 0) 
    {
      perror("sendto failed");
      fprintf(stderr, "errno = %d\n", errno);
      return -1;
    }
  else if(err == 0 ) 
    {
      /* nothing to do */
      fprintf(stderr, "Kernel had nothing to send.\n");
      return -1;
    }

  /* reset the tp_len : optional */
  header->tp_len = 0;

  /* release the buffer */
  msg->data = 0;
  msg->data_len = 0;

  return 0;
}

/******************************************************************************
 * Set the debug mode                                                         *
 *****************************************************************************/
int ethernet_set_debug(ethernet_t * itf, int debug)
{
#if !defined(NDEBUG)
  /* check parameters */
  assert(itf);
 
  int old_debug = itf->debug;

  itf->debug = debug;
 
  return old_debug;
#else
  return 0;
#endif
}

/******************************************************************************
 * Get the local MAC address                                                  *
 *****************************************************************************/
void ethernet_fill_with_mac_addr(ethernet_t * itf, uint8_t * addr, unsigned addr_len)
{
  /* check parameters */
  assert(itf && addr);
 
  unsigned i;
  for(i = 0; (i < itf->local_addr.sll_halen)  && (i < addr_len); i++)
    {
      addr[i] = itf->local_addr.sll_addr[i];
    }
  for(; i < addr_len; i++)
    {
      addr[i] = 0;
    }
}

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
#if !defined(NDEBUG)
static void ethernet_debug_frame(const void * base)
{
  fprintf(stdout, "buffer base addr %p\n", base);

  const struct tpacket2_hdr * header = (const struct tpacket2_hdr *)base;
  fprintf(stdout, "tpacket2_header :\n");
  fprintf(stdout, " tp_status   : 0x%02x\n", header->tp_status);
  fprintf(stdout, " tp_len      : %u\n", header->tp_len);
  fprintf(stdout, " tp_snaplen  : %u\n", header->tp_snaplen);
  fprintf(stdout, " tp_mac      : %d\n", header->tp_mac);
  fprintf(stdout, " tp_net      : %d\n", header->tp_net);
  fprintf(stdout, " tp_sec      : %u\n", header->tp_sec);
  fprintf(stdout, " tp_nsec     : %u\n", header->tp_nsec);
  fprintf(stdout, " tp_vlan_tci : 0x%04x\n", header->tp_vlan_tci);

  const struct sockaddr_ll * sll = (const struct sockaddr_ll *)(base + TPACKET_ALIGN(sizeof(struct tpacket2_hdr)));
  fprintf(stdout, "sockaddr_ll :\n");
  fprintf(stdout, " sll_family   : 0x%02x\n", sll->sll_family);
  fprintf(stdout, " sll_protocol : 0x%04x\n", sll->sll_protocol);
  fprintf(stdout, " sll_ifindex  : %d\n", sll->sll_ifindex);
  fprintf(stdout, " sll_hatype   : %d\n", sll->sll_hatype);
  fprintf(stdout, " sll_pkttype  : %d\n", sll->sll_pkttype);
  fprintf(stdout, " sll_halen    : %d\n", sll->sll_halen);
  fprintf(stdout, " sll_addr[8]  : %02x:%02x:%02x:%02x:%02x:%02x\n", 
          sll->sll_addr[0], sll->sll_addr[1], sll->sll_addr[2],
          sll->sll_addr[3], sll->sll_addr[4], sll->sll_addr[5]);
}
#endif

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
#if !defined(NDEBUG)
static void ethernet_debug_packet_req(const struct tpacket_req * rx_packet_req, const struct tpacket_req * tx_packet_req)
{
  fprintf(stdout, "Pagesize = %ld\n", sysconf(_SC_PAGESIZE));
  fprintf(stdout, "TPACKET_ALIGNMENT = %d\n", TPACKET_ALIGNMENT);
  /* TPACKET2_HDRLEN and sizeof() are size_t, tpacket_req fields are unsigned int */
  fprintf(stdout, "TPACKET2_HDRLEN = %zu\n", (size_t)TPACKET2_HDRLEN);
  fprintf(stdout, "sizeof(struct sockaddr_ll) = %zu\n", sizeof(struct sockaddr_ll));
  fprintf(stdout, "Rx packet req :\n");
  fprintf(stdout, " tp_block_size = %u\n", rx_packet_req->tp_block_size);
  fprintf(stdout, " tp_block_nr   = %u\n", rx_packet_req->tp_block_nr);
  fprintf(stdout, " tp_frame_size = %u\n", rx_packet_req->tp_frame_size);
  fprintf(stdout, " tp_frame_nr   = %u\n", rx_packet_req->tp_frame_nr);
  fprintf(stdout, "Tx packet req :\n");
  fprintf(stdout, " tp_block_size = %u\n", tx_packet_req->tp_block_size);
  fprintf(stdout, " tp_block_nr   = %u\n", tx_packet_req->tp_block_nr);
  fprintf(stdout, " tp_frame_size = %u\n", tx_packet_req->tp_frame_size);
  fprintf(stdout, " tp_frame_nr   = %u\n", tx_packet_req->tp_frame_nr);
}
#endif
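
A note on the index arithmetic used by the request/release functions above: both rings are walked with a wrapping frame index, and the release functions recover the frame index from the payload pointer handed back by the caller. Here is a minimal standalone sketch of those two steps, with no kernel dependency. The names ring_t, ring_next and ring_index_of are hypothetical (they are not part of the attached code), and the sketch assumes the ring is one contiguous area of frame_cnt frames of frame_size bytes each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the ring fields kept in ethernet_t. */
typedef struct {
  uint8_t *addr;        /* start of the ring area (rx_buffer_addr / tx_buffer_addr) */
  unsigned frame_size;  /* bytes per frame (tp_frame_size) */
  unsigned frame_cnt;   /* number of frames (rx_buffer_cnt / tx_buffer_cnt) */
} ring_t;

/* Next frame index, wrapping to 0 after the last frame. */
static unsigned ring_next(const ring_t *r, unsigned idx)
{
  assert(r && r->frame_cnt > 0);
  return (idx + 1 < r->frame_cnt) ? idx + 1 : 0;
}

/* Index of the frame containing ptr, or -1 if ptr lies outside the ring. */
static long ring_index_of(const ring_t *r, const void *ptr)
{
  assert(r && r->frame_size > 0);
  uintptr_t start = (uintptr_t)r->addr;
  uintptr_t p = (uintptr_t)ptr;

  /* reject pointers before the ring explicitly, before dividing */
  if (p < start)
    return -1;

  size_t idx = (size_t)(p - start) / r->frame_size;
  return (idx < r->frame_cnt) ? (long)idx : -1;
}
```

The explicit "before the ring" test is a deliberate difference from the signed division in the release functions: C integer division truncates toward zero, so a pointer a few bytes below the ring base could yield index 0 and pass a (0 <= i) check. Whether that matters depends on how much the callers are trusted.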
