* [PATCH] Packet mmap : allow the user to choose the offset of the tx payload.
@ 2012-10-05 13:10 Paul Chavent
2012-10-05 14:17 ` Daniel Borkmann
0 siblings, 1 reply; 7+ messages in thread
From: Paul Chavent @ 2012-10-05 13:10 UTC (permalink / raw)
To: davem, edumazet, daniel.borkmann, xemul, herbert, netdev,
johann.baudy, uaca
Cc: Paul Chavent
The tx offset of packet mmap tx ring used to be :
(TPACKET2_HDRLEN - sizeof(struct sockaddr_ll))
The problem is that depending on the usage of SOCK_DGRAM or
SOCK_RAW, the payload could be aligned or not.
This patch allow to let the user give an offset for it's tx
payload if he desires.
Signed-off-by: Paul Chavent <paul.chavent@onera.fr>
---
net/packet/af_packet.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 94060ed..5bf3100 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1851,7 +1851,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
struct tpacket2_hdr *h2;
void *raw;
} ph;
- int to_write, offset, len, tp_len, nr_frags, len_max;
+ int to_write, offset, len, tp_len, nr_frags, len_max, off;
struct socket *sock = po->sk.sk_socket;
struct page *page;
void *data;
@@ -1868,9 +1868,11 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
switch (po->tp_version) {
case TPACKET_V2:
tp_len = ph.h2->tp_len;
+ off = ph.h2->tp_net;
break;
default:
tp_len = ph.h1->tp_len;
+ off = ph.h1->tp_net;
break;
}
if (unlikely(tp_len > size_max)) {
@@ -1882,6 +1884,16 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
skb_reset_network_header(skb);
data = ph.raw + po->tp_hdrlen - sizeof(struct sockaddr_ll);
+ if (sock->type != SOCK_DGRAM)
+ off -= dev->hard_header_len;
+ if (likely((po->tp_hdrlen - sizeof(struct sockaddr_ll)) < off)) {
+ if (unlikely((off + tp_len) > size_max)) {
+ pr_err("packet size is too long (%d + %d > %d)\n",
+ off, tp_len, size_max);
+ return -EMSGSIZE;
+ }
+ data = ph.raw + off;
+ }
to_write = tp_len;
if (sock->type == SOCK_DGRAM) {
--
1.7.12.1
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH] Packet mmap : allow the user to choose the offset of the tx payload.
2012-10-05 13:10 [PATCH] Packet mmap : allow the user to choose the offset of the tx payload Paul Chavent
@ 2012-10-05 14:17 ` Daniel Borkmann
2012-10-05 19:21 ` pchavent
0 siblings, 1 reply; 7+ messages in thread
From: Daniel Borkmann @ 2012-10-05 14:17 UTC (permalink / raw)
To: Paul Chavent; +Cc: davem, edumazet, xemul, herbert, netdev, johann.baudy, uaca
On Fri, Oct 5, 2012 at 3:10 PM, Paul Chavent <Paul.Chavent@onera.fr> wrote:
> The tx offset of packet mmap tx ring used to be :
> (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll))
>
> The problem is that depending on the usage of SOCK_DGRAM or
> SOCK_RAW, the payload could be aligned or not.
>
> This patch allow to let the user give an offset for it's tx
> payload if he desires.
>
> Signed-off-by: Paul Chavent <paul.chavent@onera.fr>
Can you provide an example when it doesn't hit TPACKET_ALIGNMENT?
On the first look, could it be that your patch currently enforces the
use of your tp_net's offset for *all* TX_RING users, even if they
don't care about it? So in case off==0, you probably get a negative
offset in case of SOCK_RAW, thus it won't hit the second if-statement.
Sure, but this does not look intuitive in my opinion. Maybe, it's
better to only enter this path if the offset *is* used by someone.
Also, to my knowledge tp_net is currently only applied in receive
path. So, if for whatever reason people did not explicitly set tp_net
to 0, it might break their code, if there's a random offset in it,
right?
Best,
Daniel
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Packet mmap : allow the user to choose the offset of the tx payload.
2012-10-05 14:17 ` Daniel Borkmann
@ 2012-10-05 19:21 ` pchavent
2012-10-05 19:37 ` Daniel Borkmann
0 siblings, 1 reply; 7+ messages in thread
From: pchavent @ 2012-10-05 19:21 UTC (permalink / raw)
To: Daniel Borkmann
Cc: davem, edumazet, xemul, herbert, netdev, johann.baudy, uaca
Hi
On Fri, 5 Oct 2012 16:17:12 +0200, Daniel Borkmann wrote:
> On Fri, Oct 5, 2012 at 3:10 PM, Paul Chavent <Paul.Chavent@onera.fr>
> wrote:
>> The tx offset of packet mmap tx ring used to be :
>> (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll))
>>
>> The problem is that depending on the usage of SOCK_DGRAM or
>> SOCK_RAW, the payload could be aligned or not.
>>
>> This patch allow to let the user give an offset for it's tx
>> payload if he desires.
>>
>> Signed-off-by: Paul Chavent <paul.chavent@onera.fr>
>
> Can you provide an example when it doesn't hit TPACKET_ALIGNMENT?
When we use tx ring, the user have to write at (TPACKET_HDRLEN -
sizeof(struct sockaddr_ll))
This adress is aligned on TPACKET_ALIGNMENT since
TPACKET_HDRLEN = (TPACKET_ALIGN(sizeof(struct tpacket_hdr)) +
sizeof(struct sockaddr_ll))
When we use the tx ring with SOCK_RAW option, the mac header is aligned
on TPACKET_ALIGNMENT, but not the payload (14 bytes away).
>
> On the first look, could it be that your patch currently enforces the
> use of your tp_net's offset for *all* TX_RING users, even if they
> don't care about it? So in case off==0, you probably get a negative
> offset in case of SOCK_RAW, thus it won't hit the second
> if-statement.
> Sure, but this does not look intuitive in my opinion. Maybe, it's
> better to only enter this path if the offset *is* used by someone.
Yes, moreover i had a problem with the signed/unsigned comparison.
I've fixed all those problems for the next submission.
>
> Also, to my knowledge tp_net is currently only applied in receive
> path. So, if for whatever reason people did not explicitly set tp_net
> to 0, it might break their code, if there's a random offset in it,
> right?
I haven't found where all frames were initialized to
TP_STATUS_AVAILABLE, so i have supposed that the frames were initialized
to zero.
So your are right, if tp_net is not forced to zero before sending the
packet it could break the legacy code :(
>
> Best,
> Daniel
Thank for your review.
Paul.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Packet mmap : allow the user to choose the offset of the tx payload.
2012-10-05 19:21 ` pchavent
@ 2012-10-05 19:37 ` Daniel Borkmann
2012-10-06 7:43 ` pchavent
0 siblings, 1 reply; 7+ messages in thread
From: Daniel Borkmann @ 2012-10-05 19:37 UTC (permalink / raw)
To: pchavent; +Cc: davem, edumazet, xemul, herbert, netdev, johann.baudy, uaca
On Fri, Oct 5, 2012 at 9:21 PM, pchavent <Paul.Chavent@onera.fr> wrote:
> On Fri, 5 Oct 2012 16:17:12 +0200, Daniel Borkmann wrote:
>> On Fri, Oct 5, 2012 at 3:10 PM, Paul Chavent <Paul.Chavent@onera.fr>
>> wrote:
>>>
>>> The tx offset of packet mmap tx ring used to be :
>>> (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll))
>>>
>>> The problem is that depending on the usage of SOCK_DGRAM or
>>> SOCK_RAW, the payload could be aligned or not.
>>>
>>> This patch allow to let the user give an offset for it's tx
>>> payload if he desires.
>>>
>>> Signed-off-by: Paul Chavent <paul.chavent@onera.fr>
>>
>> Can you provide an example when it doesn't hit TPACKET_ALIGNMENT?
>
> When we use tx ring, the user have to write at (TPACKET_HDRLEN -
> sizeof(struct sockaddr_ll))
>
> This adress is aligned on TPACKET_ALIGNMENT since
> TPACKET_HDRLEN = (TPACKET_ALIGN(sizeof(struct tpacket_hdr)) + sizeof(struct
> sockaddr_ll))
>
> When we use the tx ring with SOCK_RAW option, the mac header is aligned on
> TPACKET_ALIGNMENT, but not the payload (14 bytes away).
Okay, I'm confused about your intentions, maybe I'm missing something.
The man-page of packet(7) clearly says:
The socket_type is either SOCK_RAW for raw packets *including* the
link level header or SOCK_DGRAM for cooked packets with the link
level header *removed*.
So this is perfectly intended behavior of PF_PACKET.
Cheers,
Daniel
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Packet mmap : allow the user to choose the offset of the tx payload.
2012-10-05 19:37 ` Daniel Borkmann
@ 2012-10-06 7:43 ` pchavent
2012-10-07 10:50 ` Daniel Borkmann
0 siblings, 1 reply; 7+ messages in thread
From: pchavent @ 2012-10-06 7:43 UTC (permalink / raw)
To: Daniel Borkmann
Cc: davem, edumazet, xemul, herbert, netdev, johann.baudy, uaca
On Fri, 5 Oct 2012 21:37:58 +0200, Daniel Borkmann wrote:
> On Fri, Oct 5, 2012 at 9:21 PM, pchavent <Paul.Chavent@onera.fr>
> wrote:
>> On Fri, 5 Oct 2012 16:17:12 +0200, Daniel Borkmann wrote:
>>> On Fri, Oct 5, 2012 at 3:10 PM, Paul Chavent
>>> <Paul.Chavent@onera.fr>
>>> wrote:
>>>>
>>>> The tx offset of packet mmap tx ring used to be :
>>>> (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll))
>>>>
>>>> The problem is that depending on the usage of SOCK_DGRAM or
>>>> SOCK_RAW, the payload could be aligned or not.
>>>>
>>>> This patch allow to let the user give an offset for it's tx
>>>> payload if he desires.
>>>>
>>>> Signed-off-by: Paul Chavent <paul.chavent@onera.fr>
>>>
>>> Can you provide an example when it doesn't hit TPACKET_ALIGNMENT?
>>
>> When we use tx ring, the user have to write at (TPACKET_HDRLEN -
>> sizeof(struct sockaddr_ll))
>>
>> This adress is aligned on TPACKET_ALIGNMENT since
>> TPACKET_HDRLEN = (TPACKET_ALIGN(sizeof(struct tpacket_hdr)) +
>> sizeof(struct
>> sockaddr_ll))
>>
>> When we use the tx ring with SOCK_RAW option, the mac header is
>> aligned on
>> TPACKET_ALIGNMENT, but not the payload (14 bytes away).
>
> Okay, I'm confused about your intentions, maybe I'm missing
> something.
> The man-page of packet(7) clearly says:
>
> The socket_type is either SOCK_RAW for raw packets *including* the
> link level header or SOCK_DGRAM for cooked packets with the link
> level header *removed*.
>
> So this is perfectly intended behavior of PF_PACKET.
>
> Cheers,
>
> Daniel
Yes, i also expect to be able to include the link level header when i
use SOCK_RAW.
My intention is to send a frame with this payload (for example) :
typedef struct
{
double ts;
uint64_t foo;
} test_t;
So i get a pointer to the raw packet :
void * raw_packet = frame_base + (TPACKET_HDRLEN - sizeof(struct
sockaddr_ll));
I cook the header :
memcpy(raw_packet + 0, dst_addr, sizeof(dst_addr));
memcpy(raw_packet + 6, src_addr, sizeof(src_addr));
memcpy(raw_packet + 12, type , sizeof(type));
Then i get a pointer to the beginning of payload :
test_t * payload = raw_packet + 14;
Here payload is at 58 bytes from the beginning of the frame.
Then i fill the payload :
payload->ts = 1.0;
payload->foo = 2;
...
These are misaligned accesses.
I don't care to fill the cooked header if it's misaligned, but i would
like to be able to fill the frame directly in the ring buffer being on
aligned boundaries.
It would allow to avoid to memcpy an instance of the payload
structure...
Regards.
Paul.
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] Packet mmap : allow the user to choose the offset of the tx payload.
2012-10-06 7:43 ` pchavent
@ 2012-10-07 10:50 ` Daniel Borkmann
2012-10-07 12:44 ` Daniel Borkmann
0 siblings, 1 reply; 7+ messages in thread
From: Daniel Borkmann @ 2012-10-07 10:50 UTC (permalink / raw)
To: pchavent; +Cc: davem, edumazet, xemul, herbert, netdev, johann.baudy, uaca
On Sat, Oct 6, 2012 at 9:43 AM, pchavent <Paul.Chavent@onera.fr> wrote:
> On Fri, 5 Oct 2012 21:37:58 +0200, Daniel Borkmann wrote:
>> On Fri, Oct 5, 2012 at 9:21 PM, pchavent <Paul.Chavent@onera.fr> wrote:
>>> On Fri, 5 Oct 2012 16:17:12 +0200, Daniel Borkmann wrote:
>>>> On Fri, Oct 5, 2012 at 3:10 PM, Paul Chavent <Paul.Chavent@onera.fr>
>>>> wrote:
>>>>>
>>>>>
>>>>> The tx offset of packet mmap tx ring used to be :
>>>>> (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll))
>>>>>
>>>>> The problem is that depending on the usage of SOCK_DGRAM or
>>>>> SOCK_RAW, the payload could be aligned or not.
>>>>>
>>>>> This patch allow to let the user give an offset for it's tx
>>>>> payload if he desires.
>>>>>
>>>>> Signed-off-by: Paul Chavent <paul.chavent@onera.fr>
>>>>
>>>>
>>>> Can you provide an example when it doesn't hit TPACKET_ALIGNMENT?
>>>
>>>
>>> When we use tx ring, the user have to write at (TPACKET_HDRLEN -
>>> sizeof(struct sockaddr_ll))
>>>
>>> This adress is aligned on TPACKET_ALIGNMENT since
>>> TPACKET_HDRLEN = (TPACKET_ALIGN(sizeof(struct tpacket_hdr)) +
>>> sizeof(struct
>>> sockaddr_ll))
>>>
>>> When we use the tx ring with SOCK_RAW option, the mac header is aligned
>>> on
>>> TPACKET_ALIGNMENT, but not the payload (14 bytes away).
>>
>>
>> Okay, I'm confused about your intentions, maybe I'm missing something.
>> The man-page of packet(7) clearly says:
>>
>> The socket_type is either SOCK_RAW for raw packets *including* the
>> link level header or SOCK_DGRAM for cooked packets with the link
>> level header *removed*.
>>
>> So this is perfectly intended behavior of PF_PACKET.
>>
>> Cheers,
>>
>> Daniel
>
>
> Yes, i also expect to be able to include the link level header when i use
> SOCK_RAW.
>
> My intention is to send a frame with this payload (for example) :
> typedef struct
> {
> double ts;
> uint64_t foo;
> } test_t;
>
> So i get a pointer to the raw packet :
> void * raw_packet = frame_base + (TPACKET_HDRLEN - sizeof(struct
> sockaddr_ll));
>
> I cook the header :
> memcpy(raw_packet + 0, dst_addr, sizeof(dst_addr));
> memcpy(raw_packet + 6, src_addr, sizeof(src_addr));
> memcpy(raw_packet + 12, type , sizeof(type));
>
> Then i get a pointer to the beginning of payload :
> test_t * payload = raw_packet + 14;
>
> Here payload is at 58 bytes from the beginning of the frame.
>
> Then i fill the payload :
> payload->ts = 1.0;
> payload->foo = 2;
> ...
>
> These are misaligned accesses.
>
> I don't care to fill the cooked header if it's misaligned, but i would like
> to be able to fill the frame directly in the ring buffer being on aligned
> boundaries.
Okay.
Maybe what you could do in a new version of your patch is to introduce
a TP_STATUS flag, e.g. TP_STATUS_SEND_HAS_OFF that you pass along
binary or'ed with the commonly used flags, and then you can fill
tp_mac resp. tp_net with offsets. By that, you won't break legacy
stuff.
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] Packet mmap : allow the user to choose the offset of the tx payload.
2012-10-07 10:50 ` Daniel Borkmann
@ 2012-10-07 12:44 ` Daniel Borkmann
0 siblings, 0 replies; 7+ messages in thread
From: Daniel Borkmann @ 2012-10-07 12:44 UTC (permalink / raw)
To: pchavent; +Cc: davem, edumazet, xemul, herbert, netdev, johann.baudy, uaca
On Sun, Oct 7, 2012 at 12:50 PM, Daniel Borkmann
<danborkmann@iogearbox.net> wrote:
> On Sat, Oct 6, 2012 at 9:43 AM, pchavent <Paul.Chavent@onera.fr> wrote:
>> On Fri, 5 Oct 2012 21:37:58 +0200, Daniel Borkmann wrote:
>>> On Fri, Oct 5, 2012 at 9:21 PM, pchavent <Paul.Chavent@onera.fr> wrote:
>>>> On Fri, 5 Oct 2012 16:17:12 +0200, Daniel Borkmann wrote:
>>>>> On Fri, Oct 5, 2012 at 3:10 PM, Paul Chavent <Paul.Chavent@onera.fr>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> The tx offset of packet mmap tx ring used to be :
>>>>>> (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll))
>>>>>>
>>>>>> The problem is that depending on the usage of SOCK_DGRAM or
>>>>>> SOCK_RAW, the payload could be aligned or not.
>>>>>>
>>>>>> This patch allow to let the user give an offset for it's tx
>>>>>> payload if he desires.
>>>>>>
>>>>>> Signed-off-by: Paul Chavent <paul.chavent@onera.fr>
>>>>>
>>>>>
>>>>> Can you provide an example when it doesn't hit TPACKET_ALIGNMENT?
>>>>
>>>>
>>>> When we use tx ring, the user have to write at (TPACKET_HDRLEN -
>>>> sizeof(struct sockaddr_ll))
>>>>
>>>> This adress is aligned on TPACKET_ALIGNMENT since
>>>> TPACKET_HDRLEN = (TPACKET_ALIGN(sizeof(struct tpacket_hdr)) +
>>>> sizeof(struct
>>>> sockaddr_ll))
>>>>
>>>> When we use the tx ring with SOCK_RAW option, the mac header is aligned
>>>> on
>>>> TPACKET_ALIGNMENT, but not the payload (14 bytes away).
>>>
>>>
>>> Okay, I'm confused about your intentions, maybe I'm missing something.
>>> The man-page of packet(7) clearly says:
>>>
>>> The socket_type is either SOCK_RAW for raw packets *including* the
>>> link level header or SOCK_DGRAM for cooked packets with the link
>>> level header *removed*.
>>>
>>> So this is perfectly intended behavior of PF_PACKET.
>>>
>>> Cheers,
>>>
>>> Daniel
>>
>>
>> Yes, i also expect to be able to include the link level header when i use
>> SOCK_RAW.
>>
>> My intention is to send a frame with this payload (for example) :
>> typedef struct
>> {
>> double ts;
>> uint64_t foo;
>> } test_t;
>>
>> So i get a pointer to the raw packet :
>> void * raw_packet = frame_base + (TPACKET_HDRLEN - sizeof(struct
>> sockaddr_ll));
>>
>> I cook the header :
>> memcpy(raw_packet + 0, dst_addr, sizeof(dst_addr));
>> memcpy(raw_packet + 6, src_addr, sizeof(src_addr));
>> memcpy(raw_packet + 12, type , sizeof(type));
>>
>> Then i get a pointer to the beginning of payload :
>> test_t * payload = raw_packet + 14;
>>
>> Here payload is at 58 bytes from the beginning of the frame.
>>
>> Then i fill the payload :
>> payload->ts = 1.0;
>> payload->foo = 2;
>> ...
>>
>> These are misaligned accesses.
>>
>> I don't care to fill the cooked header if it's misaligned, but i would like
>> to be able to fill the frame directly in the ring buffer being on aligned
>> boundaries.
Just a minor remark: speaking about your patch, in the case of
SOCK_RAW I think you rather might want to care whether your *mac
header* starts at an aligned offset or not, since the kernel doesn't
care about your particular payload, but about the frame as a whole
that it needs to process.
> Okay.
>
> Maybe what you could do in a new version of your patch is to introduce
> a TP_STATUS flag, e.g. TP_STATUS_SEND_HAS_OFF that you pass along
> binary or'ed with the commonly used flags, and then you can fill
> tp_mac resp. tp_net with offsets. By that, you won't break legacy
> stuff.
Also, note that you should wait with your submission until net-next is reopened.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2012-10-07 12:44 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-10-05 13:10 [PATCH] Packet mmap : allow the user to choose the offset of the tx payload Paul Chavent
2012-10-05 14:17 ` Daniel Borkmann
2012-10-05 19:21 ` pchavent
2012-10-05 19:37 ` Daniel Borkmann
2012-10-06 7:43 ` pchavent
2012-10-07 10:50 ` Daniel Borkmann
2012-10-07 12:44 ` Daniel Borkmann
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).