From: Paul Chavent <Paul.Chavent@onera.fr>
To: Daniel Borkmann <danborkmann@iogearbox.net>
Cc: davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: [PATCH net-next] packet mmap : allow the user to choose tx data offset.
Date: Fri, 19 Oct 2012 16:28:43 +0200 [thread overview]
Message-ID: <5081639B.5080102@onera.fr> (raw)
In-Reply-To: <CAD6jFURov5oB+ck33JzCYQoBo9mYsEM9c0rHz9gmcgdMKgmwCg@mail.gmail.com>
Hi Daniel.
I've just tested the limit cases.
With SOCK_DGRAM, with a too large tp_net :
payload offset (2003) out of range [32;2002]
With SOCK_DGRAM, with a too small tp_net :
payload offset (31) out of range [32;2002]
With SOCK_RAW, with a too large tp_mac :
payload offset (1989) out of range [32;1988]
With SOCK_RAW, with a too small tp_mac :
payload offset (31) out of range [32;1988]
On 10/19/2012 01:36 PM, Daniel Borkmann wrote:
> On Fri, Oct 19, 2012 at 9:21 AM, Paul Chavent <Paul.Chavent@onera.fr> wrote:
>> The tx data offset of packet mmap tx ring used to be :
>> (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll))
>>
>> The problem is that, with SOCK_RAW socket, the payload
>> (14 bytes after the begening of the user data) is misaligned.
>>
>> This patch allow to let the user give an offset for it's tx
>> data if he desires.
>>
>> Set sock option PACKET_TX_HAS_OFF to 1, then specify in each
>> frame of your tx ring tp_net for SOCK_DGRAM, or tp_mac for
>> SOCK_RAW.
>>
>> Signed-off-by: Paul Chavent <paul.chavent@onera.fr>
>> ---
>> Documentation/networking/packet_mmap.txt | 13 +++++++++
>> include/uapi/linux/if_packet.h | 1 +
>> net/packet/af_packet.c | 48 +++++++++++++++++++++++++++++++-
>> net/packet/internal.h | 1 +
>> 4 files changed, 62 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt
>> index 1c08a4b..c805e5f 100644
>> --- a/Documentation/networking/packet_mmap.txt
>> +++ b/Documentation/networking/packet_mmap.txt
>> @@ -163,6 +163,19 @@ As capture, each frame contains two parts:
>>
>> A complete tutorial is available at: http://wiki.gnu-log.net/
>>
>> +By default, the user should put data at :
>> + frame base + TPACKET_HDRLEN - sizeof(struct sockaddr_ll)
>
> TPACKET_HDRLEN is only for tpacket, version 1. Maybe you should mention that.
>
>> +So, whatever you choose for the socket mode (SOCK_DGRAM or SOCK_RAW),
>> +the begening of the user data will be at :
>
> Typo in "begening".
>
>> + frame base + TPACKET_ALIGN(sizeof(struct tpacket_hdr))
>
> See above, tpacket, version 1.
>
>> +If you whish to put user data at a custom offset from the begenning of
>
> Typo in "whish" and "begenning".
>
>> +the frame (for payload alignment with SOCK_RAW mode for instance) you
>> +can set tp_net (with SOCK_DGRAM) or tp_mac (with SOCK_RAW). In order
>> +to make this work it must be enabled previously with setsockopt()
>> +and the PACKET_TX_HAS_OFF option.
>> +
>> --------------------------------------------------------------------------------
>> + PACKET_MMAP settings
>> --------------------------------------------------------------------------------
>> diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
>> index f379929..f9a6037 100644
>> --- a/include/uapi/linux/if_packet.h
>> +++ b/include/uapi/linux/if_packet.h
>> @@ -50,6 +50,7 @@ struct sockaddr_ll {
>> #define PACKET_TX_TIMESTAMP 16
>> #define PACKET_TIMESTAMP 17
>> #define PACKET_FANOUT 18
>> +#define PACKET_TX_HAS_OFF 19
>>
>> #define PACKET_FANOUT_HASH 0
>> #define PACKET_FANOUT_LB 1
>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>> index 94060ed..b6df577 100644
>> --- a/net/packet/af_packet.c
>> +++ b/net/packet/af_packet.c
>> @@ -1881,7 +1881,37 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
>> skb_reserve(skb, hlen);
>> skb_reset_network_header(skb);
>>
>> - data = ph.raw + po->tp_hdrlen - sizeof(struct sockaddr_ll);
>> + if (po->tp_tx_has_off) {
>> + int off_min = po->tp_hdrlen - sizeof(struct sockaddr_ll);
>> + int off_max = po->tx_ring.frame_size - tp_len;
>
> I think here, the header length is missing as well. Have you tested
> this with min/max offsets?
>
> Maybe it is more reasonable to put off = off_min at the beginning and
> then add tp_mac to it. Thus, tp_mac can also be 0 with
> PACKET_TX_HAS_OFF.
>
>> + int off;
>
> For offsets, better use off_t, or here u32. Also, add a newline after
> variable declaration.
>
>> + if (sock->type != SOCK_DGRAM)
>
> Why not test for == SOCK_RAW? This makes it more readable.
>
>> + switch (po->tp_version) {
>> + case TPACKET_V2:
>> + off = ph.h2->tp_mac;
>> + break;
>> + default:
>> + off = ph.h1->tp_mac;
>
> TPACKET_V1 as default is wrong since there's also TPACKET_V3. What
> about TPACKET_V3 in general in your patch? You simply ignore it.
>
>> + break;
>> + }
>> + else
>> + switch (po->tp_version) {
>> + case TPACKET_V2:
>> + off = ph.h2->tp_net;
>> + break;
>> + default:
>> + off = ph.h1->tp_net;
>> + break;
>> + }
>
> Same as above. You should also put braces around the if-else
> construct. Sure, it's only one statement after that, but this spans
> across multiple lines and can make it error-prone in future changes.
>
>> + if (unlikely((off < off_min) || (off_max < off))) {
>> + pr_err("payload offset (%d) out of range [%d;%d]\n",
>> + off, off_min, off_max);
>> + return -EINVAL;
>> + }
>
> if you set off initially to off_min, you could drop the check for off < off_min.
>
>> + data = ph.raw + off;
>> + } else {
>> + data = ph.raw + po->tp_hdrlen - sizeof(struct sockaddr_ll);
>> + }
>> to_write = tp_len;
>>
>> if (sock->type == SOCK_DGRAM) {
>> @@ -3111,6 +3141,19 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv
>>
>> return fanout_add(sk, val & 0xffff, val >> 16);
>> }
>> + case PACKET_TX_HAS_OFF:
>> + {
>> + unsigned int val;
>> +
>> + if (optlen != sizeof(val))
>> + return -EINVAL;
>> + if (po->rx_ring.pg_vec || po->tx_ring.pg_vec)
>> + return -EBUSY;
>> + if (copy_from_user(&val, optval, sizeof(val)))
>> + return -EFAULT;
>> + po->tp_tx_has_off = !!val;
>> + return 0;
>> + }
>> default:
>> return -ENOPROTOOPT;
>> }
>> @@ -3202,6 +3245,9 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
>> ((u32)po->fanout->type << 16)) :
>> 0);
>> break;
>> + case PACKET_TX_HAS_OFF:
>> + val = po->tp_tx_has_off;
>> + break;
>> default:
>> return -ENOPROTOOPT;
>> }
>> diff --git a/net/packet/internal.h b/net/packet/internal.h
>> index 44945f6..169e60d 100644
>> --- a/net/packet/internal.h
>> +++ b/net/packet/internal.h
>> @@ -110,6 +110,7 @@ struct packet_sock {
>> unsigned int tp_reserve;
>> unsigned int tp_loss:1;
>> unsigned int tp_tstamp;
>> + unsigned int tp_tx_has_off:1;
>> struct packet_type prot_hook ____cacheline_aligned_in_smp;
>> };
>
prev parent reply other threads:[~2012-10-19 14:31 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-10-19 7:21 [PATCH net-next] packet mmap : allow the user to choose tx data offset Paul Chavent
2012-10-19 11:36 ` Daniel Borkmann
2012-10-19 13:43 ` Paul Chavent
2012-10-20 12:39 ` Daniel Borkmann
2012-10-20 17:13 ` pchavent
2012-10-20 17:35 ` Daniel Borkmann
2012-10-19 13:50 ` Paul Chavent
2012-10-19 14:28 ` Paul Chavent [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5081639B.5080102@onera.fr \
--to=paul.chavent@onera.fr \
--cc=danborkmann@iogearbox.net \
--cc=davem@davemloft.net \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox