* [PATCH v3 net-next 0/2] TPACKET_V3 TX_RING support
@ 2017-01-03 14:31 Sowmini Varadhan
2017-01-03 14:31 ` [PATCH v3 net-next 1/2] af_packet: TX_RING support for TPACKET_V3 Sowmini Varadhan
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Sowmini Varadhan @ 2017-01-03 14:31 UTC (permalink / raw)
To: netdev, sowmini.varadhan; +Cc: daniel, willemb, davem
This patch series allows an application to use a single PF_PACKET
descriptor and leverage the best implementations of TX_RING
and RX_RING that exist today.
Patch 1 adds the kernel/Documentation changes for TX_RING
support and patch2 adds the associated test case in selftests.
Changes since v2: additional sanity checks for setsockopt
input for TX_RING/TPACKET_V3. Refactored psock_tpacket.c
test code to avoid code duplication from V2.
Sowmini Varadhan (2):
af_packet: TX_RING support for TPACKET_V3
tools: test case for TPACKET_V3/TX_RING support
Documentation/networking/packet_mmap.txt | 9 ++-
net/packet/af_packet.c | 39 +++++++++---
tools/testing/selftests/net/psock_tpacket.c | 91 ++++++++++++++++++++++-----
3 files changed, 111 insertions(+), 28 deletions(-)
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v3 net-next 1/2] af_packet: TX_RING support for TPACKET_V3
2017-01-03 14:31 [PATCH v3 net-next 0/2] TPACKET_V3 TX_RING support Sowmini Varadhan
@ 2017-01-03 14:31 ` Sowmini Varadhan
2017-01-03 15:57 ` Willem de Bruijn
2017-01-03 14:31 ` [PATCH v3 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support Sowmini Varadhan
2017-01-03 16:01 ` [PATCH v3 net-next 0/2] TPACKET_V3 TX_RING support David Miller
2 siblings, 1 reply; 6+ messages in thread
From: Sowmini Varadhan @ 2017-01-03 14:31 UTC (permalink / raw)
To: netdev, sowmini.varadhan; +Cc: daniel, willemb, davem
Although TPACKET_V3 Rx has some benefits over TPACKET_V2 Rx, *_v3
does not currently have TX_RING support. As a result an application
that wants the best perf for Tx and Rx (e.g. to handle request/response
transacations) ends up needing 2 sockets, one with *_v2 for Tx and
another with *_v3 for Rx.
This patch enables TPACKET_V2 compatible Tx features in TPACKET_V3
so that an application can use a single descriptor to get the benefits
of _v3 RX_RING and _v2 TX_RING. An application may do a block-send by
first filling up multiple frames in the Tx ring and then triggering a
transmit. This patch only support fixed size Tx frames for TPACKET_V3,
and requires that tp_next_offset must be zero.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
v2: sanity checks on tp_next_offset and corresponding Doc updates
as suggested by Willem de Bruijn
v3: additional sanity checking of input in packet_set_ring()
Documentation/networking/packet_mmap.txt | 9 +++++-
net/packet/af_packet.c | 39 +++++++++++++++++++++++-------
2 files changed, 37 insertions(+), 11 deletions(-)
diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt
index daa015a..f3b9e50 100644
--- a/Documentation/networking/packet_mmap.txt
+++ b/Documentation/networking/packet_mmap.txt
@@ -565,7 +565,7 @@ where 'tpacket_version' can be TPACKET_V1 (default), TPACKET_V2, TPACKET_V3.
(void *)hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr))
TPACKET_V2 --> TPACKET_V3:
- - Flexible buffer implementation:
+ - Flexible buffer implementation for RX_RING:
1. Blocks can be configured with non-static frame-size
2. Read/poll is at a block-level (as opposed to packet-level)
3. Added poll timeout to avoid indefinite user-space wait
@@ -574,7 +574,12 @@ where 'tpacket_version' can be TPACKET_V1 (default), TPACKET_V2, TPACKET_V3.
4.1 block::timeout
4.2 tpkt_hdr::sk_rxhash
- RX Hash data available in user space
- - Currently only RX_RING available
+ - TX_RING semantics are conceptually similar to TPACKET_V2;
+ use tpacket3_hdr instead of tpacket2_hdr, and TPACKET3_HDRLEN
+ instead of TPACKET2_HDRLEN. In the current implementation,
+ the tp_next_offset field in the tpacket3_hdr MUST be set to
+ zero, indicating that the ring does not hold variable sized frames.
+ Packets with non-zero values of tp_next_offset will be dropped.
-------------------------------------------------------------------------------
+ AF_PACKET fanout mode
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 89f2e8c..ee8be1c 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -409,6 +409,9 @@ static void __packet_set_status(struct packet_sock *po, void *frame, int status)
flush_dcache_page(pgv_to_page(&h.h2->tp_status));
break;
case TPACKET_V3:
+ h.h3->tp_status = status;
+ flush_dcache_page(pgv_to_page(&h.h3->tp_status));
+ break;
default:
WARN(1, "TPACKET version not supported.\n");
BUG();
@@ -432,6 +435,8 @@ static int __packet_get_status(struct packet_sock *po, void *frame)
flush_dcache_page(pgv_to_page(&h.h2->tp_status));
return h.h2->tp_status;
case TPACKET_V3:
+ flush_dcache_page(pgv_to_page(&h.h3->tp_status));
+ return h.h3->tp_status;
default:
WARN(1, "TPACKET version not supported.\n");
BUG();
@@ -2500,6 +2505,13 @@ static int tpacket_parse_header(struct packet_sock *po, void *frame,
ph.raw = frame;
switch (po->tp_version) {
+ case TPACKET_V3:
+ if (ph.h3->tp_next_offset != 0) {
+ pr_warn_once("variable sized slot not supported");
+ return -EINVAL;
+ }
+ tp_len = ph.h3->tp_len;
+ break;
case TPACKET_V2:
tp_len = ph.h2->tp_len;
break;
@@ -2519,6 +2531,9 @@ static int tpacket_parse_header(struct packet_sock *po, void *frame,
off_max = po->tx_ring.frame_size - tp_len;
if (po->sk.sk_type == SOCK_DGRAM) {
switch (po->tp_version) {
+ case TPACKET_V3:
+ off = ph.h3->tp_net;
+ break;
case TPACKET_V2:
off = ph.h2->tp_net;
break;
@@ -2528,6 +2543,9 @@ static int tpacket_parse_header(struct packet_sock *po, void *frame,
}
} else {
switch (po->tp_version) {
+ case TPACKET_V3:
+ off = ph.h3->tp_mac;
+ break;
case TPACKET_V2:
off = ph.h2->tp_mac;
break;
@@ -4116,11 +4134,6 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
struct tpacket_req *req = &req_u->req;
lock_sock(sk);
- /* Opening a Tx-ring is NOT supported in TPACKET_V3 */
- if (!closing && tx_ring && (po->tp_version > TPACKET_V2)) {
- net_warn_ratelimited("Tx-ring is not supported.\n");
- goto out;
- }
rb = tx_ring ? &po->tx_ring : &po->rx_ring;
rb_queue = tx_ring ? &sk->sk_write_queue : &sk->sk_receive_queue;
@@ -4180,11 +4193,19 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
goto out;
switch (po->tp_version) {
case TPACKET_V3:
- /* Transmit path is not supported. We checked
- * it above but just being paranoid
- */
- if (!tx_ring)
+ /* Block transmit is not supported yet */
+ if (!tx_ring) {
init_prb_bdqc(po, rb, pg_vec, req_u);
+ } else {
+ struct tpacket_req3 *req3 = &req_u->req3;
+
+ if (req3->tp_retire_blk_tov ||
+ req3->tp_sizeof_priv ||
+ req3->tp_feature_req_word) {
+ err = -EINVAL;
+ goto out;
+ }
+ }
break;
default:
break;
--
1.7.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v3 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support
2017-01-03 14:31 [PATCH v3 net-next 0/2] TPACKET_V3 TX_RING support Sowmini Varadhan
2017-01-03 14:31 ` [PATCH v3 net-next 1/2] af_packet: TX_RING support for TPACKET_V3 Sowmini Varadhan
@ 2017-01-03 14:31 ` Sowmini Varadhan
2017-01-03 15:57 ` Willem de Bruijn
2017-01-03 16:01 ` [PATCH v3 net-next 0/2] TPACKET_V3 TX_RING support David Miller
2 siblings, 1 reply; 6+ messages in thread
From: Sowmini Varadhan @ 2017-01-03 14:31 UTC (permalink / raw)
To: netdev, sowmini.varadhan; +Cc: daniel, willemb, davem
Add a test case and sample code for (TPACKET_V3, PACKET_TX_RING)
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
v2: Added test case.
v3: refactored code to have a single walk_tx() function that handles all three
TPACKET versions.
tools/testing/selftests/net/psock_tpacket.c | 91 ++++++++++++++++++++++-----
1 files changed, 74 insertions(+), 17 deletions(-)
diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c
index 24adf70..4a1bc64 100644
--- a/tools/testing/selftests/net/psock_tpacket.c
+++ b/tools/testing/selftests/net/psock_tpacket.c
@@ -311,20 +311,33 @@ static inline void __v2_tx_user_ready(struct tpacket2_hdr *hdr)
__sync_synchronize();
}
-static inline int __v1_v2_tx_kernel_ready(void *base, int version)
+static inline int __v3_tx_kernel_ready(struct tpacket3_hdr *hdr)
+{
+ return !(hdr->tp_status & (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING));
+}
+
+static inline void __v3_tx_user_ready(struct tpacket3_hdr *hdr)
+{
+ hdr->tp_status = TP_STATUS_SEND_REQUEST;
+ __sync_synchronize();
+}
+
+static inline int __tx_kernel_ready(void *base, int version)
{
switch (version) {
case TPACKET_V1:
return __v1_tx_kernel_ready(base);
case TPACKET_V2:
return __v2_tx_kernel_ready(base);
+ case TPACKET_V3:
+ return __v3_tx_kernel_ready(base);
default:
bug_on(1);
return 0;
}
}
-static inline void __v1_v2_tx_user_ready(void *base, int version)
+static inline void __tx_user_ready(void *base, int version)
{
switch (version) {
case TPACKET_V1:
@@ -333,6 +346,9 @@ static inline void __v1_v2_tx_user_ready(void *base, int version)
case TPACKET_V2:
__v2_tx_user_ready(base);
break;
+ case TPACKET_V3:
+ __v3_tx_user_ready(base);
+ break;
}
}
@@ -348,7 +364,22 @@ static void __v1_v2_set_packet_loss_discard(int sock)
}
}
-static void walk_v1_v2_tx(int sock, struct ring *ring)
+static inline void *get_next_frame(struct ring *ring, int n)
+{
+ uint8_t *f0 = ring->rd[0].iov_base;
+
+ switch (ring->version) {
+ case TPACKET_V1:
+ case TPACKET_V2:
+ return ring->rd[n].iov_base;
+ case TPACKET_V3:
+ return f0 + (n * ring->req3.tp_frame_size);
+ default:
+ bug_on(1);
+ }
+}
+
+static void walk_tx(int sock, struct ring *ring)
{
struct pollfd pfd;
int rcv_sock, ret;
@@ -360,9 +391,19 @@ static void walk_v1_v2_tx(int sock, struct ring *ring)
.sll_family = PF_PACKET,
.sll_halen = ETH_ALEN,
};
+ int nframes;
+
+ /* TPACKET_V{1,2} sets up the ring->rd* related variables based
+ * on frames (e.g., rd_num is tp_frame_nr) whereas V3 sets these
+ * up based on blocks (e.g, rd_num is tp_block_nr)
+ */
+ if (ring->version <= TPACKET_V2)
+ nframes = ring->rd_num;
+ else
+ nframes = ring->req3.tp_frame_nr;
bug_on(ring->type != PACKET_TX_RING);
- bug_on(ring->rd_num < NUM_PACKETS);
+ bug_on(nframes < NUM_PACKETS);
rcv_sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
if (rcv_sock == -1) {
@@ -388,10 +429,11 @@ static void walk_v1_v2_tx(int sock, struct ring *ring)
create_payload(packet, &packet_len);
while (total_packets > 0) {
- while (__v1_v2_tx_kernel_ready(ring->rd[frame_num].iov_base,
- ring->version) &&
+ void *next = get_next_frame(ring, frame_num);
+
+ while (__tx_kernel_ready(next, ring->version) &&
total_packets > 0) {
- ppd.raw = ring->rd[frame_num].iov_base;
+ ppd.raw = next;
switch (ring->version) {
case TPACKET_V1:
@@ -413,14 +455,27 @@ static void walk_v1_v2_tx(int sock, struct ring *ring)
packet_len);
total_bytes += ppd.v2->tp_h.tp_snaplen;
break;
+ case TPACKET_V3: {
+ struct tpacket3_hdr *tx = next;
+
+ tx->tp_snaplen = packet_len;
+ tx->tp_len = packet_len;
+ tx->tp_next_offset = 0;
+
+ memcpy((uint8_t *)tx + TPACKET3_HDRLEN -
+ sizeof(struct sockaddr_ll), packet,
+ packet_len);
+ total_bytes += tx->tp_snaplen;
+ break;
+ }
}
status_bar_update();
total_packets--;
- __v1_v2_tx_user_ready(ppd.raw, ring->version);
+ __tx_user_ready(next, ring->version);
- frame_num = (frame_num + 1) % ring->rd_num;
+ frame_num = (frame_num + 1) % nframes;
}
poll(&pfd, 1, 1);
@@ -460,7 +515,7 @@ static void walk_v1_v2(int sock, struct ring *ring)
if (ring->type == PACKET_RX_RING)
walk_v1_v2_rx(sock, ring);
else
- walk_v1_v2_tx(sock, ring);
+ walk_tx(sock, ring);
}
static uint64_t __v3_prev_block_seq_num = 0;
@@ -583,7 +638,7 @@ static void walk_v3(int sock, struct ring *ring)
if (ring->type == PACKET_RX_RING)
walk_v3_rx(sock, ring);
else
- bug_on(1);
+ walk_tx(sock, ring);
}
static void __v1_v2_fill(struct ring *ring, unsigned int blocks)
@@ -602,12 +657,13 @@ static void __v1_v2_fill(struct ring *ring, unsigned int blocks)
ring->flen = ring->req.tp_frame_size;
}
-static void __v3_fill(struct ring *ring, unsigned int blocks)
+static void __v3_fill(struct ring *ring, unsigned int blocks, int type)
{
- ring->req3.tp_retire_blk_tov = 64;
- ring->req3.tp_sizeof_priv = 0;
- ring->req3.tp_feature_req_word = TP_FT_REQ_FILL_RXHASH;
-
+ if (type == PACKET_RX_RING) {
+ ring->req3.tp_retire_blk_tov = 64;
+ ring->req3.tp_sizeof_priv = 0;
+ ring->req3.tp_feature_req_word = TP_FT_REQ_FILL_RXHASH;
+ }
ring->req3.tp_block_size = getpagesize() << 2;
ring->req3.tp_frame_size = TPACKET_ALIGNMENT << 7;
ring->req3.tp_block_nr = blocks;
@@ -641,7 +697,7 @@ static void setup_ring(int sock, struct ring *ring, int version, int type)
break;
case TPACKET_V3:
- __v3_fill(ring, blocks);
+ __v3_fill(ring, blocks, type);
ret = setsockopt(sock, SOL_PACKET, type, &ring->req3,
sizeof(ring->req3));
break;
@@ -796,6 +852,7 @@ int main(void)
ret |= test_tpacket(TPACKET_V2, PACKET_TX_RING);
ret |= test_tpacket(TPACKET_V3, PACKET_RX_RING);
+ ret |= test_tpacket(TPACKET_V3, PACKET_TX_RING);
if (ret)
return 1;
--
1.7.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v3 net-next 1/2] af_packet: TX_RING support for TPACKET_V3
2017-01-03 14:31 ` [PATCH v3 net-next 1/2] af_packet: TX_RING support for TPACKET_V3 Sowmini Varadhan
@ 2017-01-03 15:57 ` Willem de Bruijn
0 siblings, 0 replies; 6+ messages in thread
From: Willem de Bruijn @ 2017-01-03 15:57 UTC (permalink / raw)
To: Sowmini Varadhan
Cc: Network Development, Daniel Borkmann, Willem de Bruijn,
David Miller
On Tue, Jan 3, 2017 at 9:31 AM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> Although TPACKET_V3 Rx has some benefits over TPACKET_V2 Rx, *_v3
> does not currently have TX_RING support. As a result an application
> that wants the best perf for Tx and Rx (e.g. to handle request/response
> transacations) ends up needing 2 sockets, one with *_v2 for Tx and
> another with *_v3 for Rx.
>
> This patch enables TPACKET_V2 compatible Tx features in TPACKET_V3
> so that an application can use a single descriptor to get the benefits
> of _v3 RX_RING and _v2 TX_RING. An application may do a block-send by
> first filling up multiple frames in the Tx ring and then triggering a
> transmit. This patch only support fixed size Tx frames for TPACKET_V3,
> and requires that tp_next_offset must be zero.
>
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support
2017-01-03 14:31 ` [PATCH v3 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support Sowmini Varadhan
@ 2017-01-03 15:57 ` Willem de Bruijn
0 siblings, 0 replies; 6+ messages in thread
From: Willem de Bruijn @ 2017-01-03 15:57 UTC (permalink / raw)
To: Sowmini Varadhan
Cc: Network Development, Daniel Borkmann, Willem de Bruijn,
David Miller
On Tue, Jan 3, 2017 at 9:31 AM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> Add a test case and sample code for (TPACKET_V3, PACKET_TX_RING)
>
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Thanks!
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3 net-next 0/2] TPACKET_V3 TX_RING support
2017-01-03 14:31 [PATCH v3 net-next 0/2] TPACKET_V3 TX_RING support Sowmini Varadhan
2017-01-03 14:31 ` [PATCH v3 net-next 1/2] af_packet: TX_RING support for TPACKET_V3 Sowmini Varadhan
2017-01-03 14:31 ` [PATCH v3 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support Sowmini Varadhan
@ 2017-01-03 16:01 ` David Miller
2 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2017-01-03 16:01 UTC (permalink / raw)
To: sowmini.varadhan; +Cc: netdev, daniel, willemb
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Tue, 3 Jan 2017 06:31:46 -0800
> This patch series allows an application to use a single PF_PACKET
> descriptor and leverage the best implementations of TX_RING
> and RX_RING that exist today.
>
> Patch 1 adds the kernel/Documentation changes for TX_RING
> support and patch2 adds the associated test case in selftests.
>
> Changes since v2: additional sanity checks for setsockopt
> input for TX_RING/TPACKET_V3. Refactored psock_tpacket.c
> test code to avoid code duplication from V2.
Series applied, thanks.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-01-03 16:01 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-01-03 14:31 [PATCH v3 net-next 0/2] TPACKET_V3 TX_RING support Sowmini Varadhan
2017-01-03 14:31 ` [PATCH v3 net-next 1/2] af_packet: TX_RING support for TPACKET_V3 Sowmini Varadhan
2017-01-03 15:57 ` Willem de Bruijn
2017-01-03 14:31 ` [PATCH v3 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support Sowmini Varadhan
2017-01-03 15:57 ` Willem de Bruijn
2017-01-03 16:01 ` [PATCH v3 net-next 0/2] TPACKET_V3 TX_RING support David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).