netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 net-next 0/2] TPACKET_V3 TX_RING support
@ 2017-01-01 22:45 Sowmini Varadhan
  2017-01-01 22:45 ` [PATCH v2 net-next 1/2] af_packet: TX_RING support for TPACKET_V3 Sowmini Varadhan
  2017-01-01 22:45 ` [PATCH v2 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support Sowmini Varadhan
  0 siblings, 2 replies; 8+ messages in thread
From: Sowmini Varadhan @ 2017-01-01 22:45 UTC (permalink / raw)
  To: netdev, sowmini.varadhan; +Cc: daniel, willemb, davem

This patch series allows an application to use a single PF_PACKET
descriptor and leverage the best implementations of TX_RING
and RX_RING that exist today.

Updates in v2 are listed below:

- patch 1 (which builds on the earlier patch discussed at
  http://patchwork.ozlabs.org/patch/709840/) verifies that tp_next_offset
  is set to 0 for TPACKET_V3 in the tpacket_snd path, indicating that
  variable-sized frames are not in use. Applications that wish to do
  block-sends must fill up multiple frames in the Tx ring and then
  trigger the transmit. At the current time, only fixed-size frames are
  supported on TX_RING for TPACKET_V3.

- patch 2 in this series adds a test case and sample code for
  (TPACKET_V3, PACKET_TX_RING) in testing/selftests

Sowmini Varadhan (2):
  af_packet: TX_RING support for TPACKET_V3
  tools: test case for TPACKET_V3/TX_RING support

 Documentation/networking/packet_mmap.txt    |    9 ++-
 net/packet/af_packet.c                      |   27 +++++--
 tools/testing/selftests/net/psock_tpacket.c |  110 ++++++++++++++++++++++++++-
 3 files changed, 135 insertions(+), 11 deletions(-)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 net-next 1/2] af_packet: TX_RING support for TPACKET_V3
  2017-01-01 22:45 [PATCH v2 net-next 0/2] TPACKET_V3 TX_RING support Sowmini Varadhan
@ 2017-01-01 22:45 ` Sowmini Varadhan
  2017-01-02 22:57   ` Willem de Bruijn
  2017-01-01 22:45 ` [PATCH v2 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support Sowmini Varadhan
  1 sibling, 1 reply; 8+ messages in thread
From: Sowmini Varadhan @ 2017-01-01 22:45 UTC (permalink / raw)
  To: netdev, sowmini.varadhan; +Cc: daniel, willemb, davem

Although TPACKET_V3 Rx has some benefits over TPACKET_V2 Rx, *_v3
does not currently have TX_RING support. As a result an application
that wants the best perf for Tx and Rx (e.g. to handle request/response
transacations) ends up needing 2 sockets, one with *_v2 for Tx and
another with *_v3 for Rx.

This patch enables TPACKET_V2 compatible Tx features in TPACKET_V3
so that an application can use a single descriptor to get the benefits
of _v3 RX_RING and _v2 TX_RING. An application may do a block-send by
first filling up multiple frames in the Tx ring and then triggering a
transmit. This patch only support fixed size Tx frames for TPACKET_V3,
and requires that tp_next_offset must be zero.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
v2: sanity checks on tp_next_offset and corresponding Doc updates
    as suggested by Willem de Bruijn

 Documentation/networking/packet_mmap.txt |    9 +++++++--
 net/packet/af_packet.c                   |   27 +++++++++++++++++++--------
 2 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt
index daa015a..f3b9e50 100644
--- a/Documentation/networking/packet_mmap.txt
+++ b/Documentation/networking/packet_mmap.txt
@@ -565,7 +565,7 @@ where 'tpacket_version' can be TPACKET_V1 (default), TPACKET_V2, TPACKET_V3.
 		   (void *)hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr))
 
 TPACKET_V2 --> TPACKET_V3:
-	- Flexible buffer implementation:
+	- Flexible buffer implementation for RX_RING:
 		1. Blocks can be configured with non-static frame-size
 		2. Read/poll is at a block-level (as opposed to packet-level)
 		3. Added poll timeout to avoid indefinite user-space wait
@@ -574,7 +574,12 @@ where 'tpacket_version' can be TPACKET_V1 (default), TPACKET_V2, TPACKET_V3.
 			4.1 block::timeout
 			4.2 tpkt_hdr::sk_rxhash
 	- RX Hash data available in user space
-	- Currently only RX_RING available
+	- TX_RING semantics are conceptually similar to TPACKET_V2;
+	  use tpacket3_hdr instead of tpacket2_hdr, and TPACKET3_HDRLEN
+	  instead of TPACKET2_HDRLEN. In the current implementation,
+	  the tp_next_offset field in the tpacket3_hdr MUST be set to
+	  zero, indicating that the ring does not hold variable sized frames.
+	  Packets with non-zero values of tp_next_offset will be dropped.
 
 -------------------------------------------------------------------------------
 + AF_PACKET fanout mode
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 89f2e8c..513f52b 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -409,6 +409,9 @@ static void __packet_set_status(struct packet_sock *po, void *frame, int status)
 		flush_dcache_page(pgv_to_page(&h.h2->tp_status));
 		break;
 	case TPACKET_V3:
+		h.h3->tp_status = status;
+		flush_dcache_page(pgv_to_page(&h.h3->tp_status));
+		break;
 	default:
 		WARN(1, "TPACKET version not supported.\n");
 		BUG();
@@ -432,6 +435,8 @@ static int __packet_get_status(struct packet_sock *po, void *frame)
 		flush_dcache_page(pgv_to_page(&h.h2->tp_status));
 		return h.h2->tp_status;
 	case TPACKET_V3:
+		flush_dcache_page(pgv_to_page(&h.h3->tp_status));
+		return h.h3->tp_status;
 	default:
 		WARN(1, "TPACKET version not supported.\n");
 		BUG();
@@ -2500,6 +2505,13 @@ static int tpacket_parse_header(struct packet_sock *po, void *frame,
 	ph.raw = frame;
 
 	switch (po->tp_version) {
+	case TPACKET_V3:
+		if (ph.h3->tp_next_offset != 0) {
+			pr_warn_once("variable sized slot not supported");
+			return -EINVAL;
+		}
+		tp_len = ph.h3->tp_len;
+		break;
 	case TPACKET_V2:
 		tp_len = ph.h2->tp_len;
 		break;
@@ -2519,6 +2531,9 @@ static int tpacket_parse_header(struct packet_sock *po, void *frame,
 		off_max = po->tx_ring.frame_size - tp_len;
 		if (po->sk.sk_type == SOCK_DGRAM) {
 			switch (po->tp_version) {
+			case TPACKET_V3:
+				off = ph.h3->tp_net;
+				break;
 			case TPACKET_V2:
 				off = ph.h2->tp_net;
 				break;
@@ -2528,6 +2543,9 @@ static int tpacket_parse_header(struct packet_sock *po, void *frame,
 			}
 		} else {
 			switch (po->tp_version) {
+			case TPACKET_V3:
+				off = ph.h3->tp_mac;
+				break;
 			case TPACKET_V2:
 				off = ph.h2->tp_mac;
 				break;
@@ -4116,11 +4134,6 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
 	struct tpacket_req *req = &req_u->req;
 
 	lock_sock(sk);
-	/* Opening a Tx-ring is NOT supported in TPACKET_V3 */
-	if (!closing && tx_ring && (po->tp_version > TPACKET_V2)) {
-		net_warn_ratelimited("Tx-ring is not supported.\n");
-		goto out;
-	}
 
 	rb = tx_ring ? &po->tx_ring : &po->rx_ring;
 	rb_queue = tx_ring ? &sk->sk_write_queue : &sk->sk_receive_queue;
@@ -4180,9 +4193,7 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
 			goto out;
 		switch (po->tp_version) {
 		case TPACKET_V3:
-		/* Transmit path is not supported. We checked
-		 * it above but just being paranoid
-		 */
+			/* Block transmit is not supported yet */
 			if (!tx_ring)
 				init_prb_bdqc(po, rb, pg_vec, req_u);
 			break;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support
  2017-01-01 22:45 [PATCH v2 net-next 0/2] TPACKET_V3 TX_RING support Sowmini Varadhan
  2017-01-01 22:45 ` [PATCH v2 net-next 1/2] af_packet: TX_RING support for TPACKET_V3 Sowmini Varadhan
@ 2017-01-01 22:45 ` Sowmini Varadhan
  2017-01-02 22:31   ` Willem de Bruijn
  1 sibling, 1 reply; 8+ messages in thread
From: Sowmini Varadhan @ 2017-01-01 22:45 UTC (permalink / raw)
  To: netdev, sowmini.varadhan; +Cc: daniel, willemb, davem

Add a test case and sample code for (TPACKET_V3, PACKET_TX_RING)

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
v2: Added test case.

 tools/testing/selftests/net/psock_tpacket.c |  110 ++++++++++++++++++++++++++-
 1 files changed, 109 insertions(+), 1 deletions(-)

diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c
index 24adf70..f2012dc 100644
--- a/tools/testing/selftests/net/psock_tpacket.c
+++ b/tools/testing/selftests/net/psock_tpacket.c
@@ -311,6 +311,17 @@ static inline void __v2_tx_user_ready(struct tpacket2_hdr *hdr)
 	__sync_synchronize();
 }
 
+static inline int __v3_tx_kernel_ready(struct tpacket3_hdr *hdr)
+{
+	return !(hdr->tp_status & (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING));
+}
+
+static inline void __v3_tx_user_ready(struct tpacket3_hdr *hdr)
+{
+	hdr->tp_status = TP_STATUS_SEND_REQUEST;
+	__sync_synchronize();
+}
+
 static inline int __v1_v2_tx_kernel_ready(void *base, int version)
 {
 	switch (version) {
@@ -578,12 +589,108 @@ static void walk_v3_rx(int sock, struct ring *ring)
 	fprintf(stderr, " %u pkts (%u bytes)", NUM_PACKETS, total_bytes >> 1);
 }
 
+static inline void *
+get_v3_frame(struct ring *ring, int n)
+{
+	uint8_t *f0 = ring->rd[0].iov_base;
+
+	return f0 + (n * ring->req3.tp_frame_size);
+}
+
+static void walk_v3_tx(int sock, struct ring *ring)
+{
+	struct pollfd pfd;
+	int rcv_sock, ret;
+	size_t packet_len;
+	char packet[1024];
+	unsigned int frame_num = 0, got = 0;
+	struct sockaddr_ll ll = {
+		.sll_family = PF_PACKET,
+		.sll_halen = ETH_ALEN,
+	};
+
+	bug_on(ring->type != PACKET_TX_RING);
+	bug_on(ring->req3.tp_frame_nr < NUM_PACKETS);
+
+	rcv_sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+	if (rcv_sock == -1) {
+		perror("socket");
+		exit(1);
+	}
+
+	pair_udp_setfilter(rcv_sock);
+
+	ll.sll_ifindex = if_nametoindex("lo");
+	ret = bind(rcv_sock, (struct sockaddr *) &ll, sizeof(ll));
+	if (ret == -1) {
+		perror("bind");
+		exit(1);
+	}
+
+	memset(&pfd, 0, sizeof(pfd));
+	pfd.fd = sock;
+	pfd.events = POLLOUT | POLLERR;
+	pfd.revents = 0;
+
+	total_packets = NUM_PACKETS;
+	create_payload(packet, &packet_len);
+
+	while (total_packets > 0) {
+		struct tpacket3_hdr *tx = get_v3_frame(ring, frame_num);
+
+		while (__v3_tx_kernel_ready(tx) && total_packets > 0) {
+			tx->tp_snaplen = packet_len;
+			tx->tp_len = packet_len;
+			memcpy((uint8_t *) tx + TPACKET3_HDRLEN -
+			       sizeof(struct sockaddr_ll), packet, packet_len);
+			total_bytes += tx->tp_snaplen;
+
+			status_bar_update();
+			total_packets--;
+
+			__v3_tx_user_ready(tx);
+
+			frame_num = (frame_num + 1) % ring->req3.tp_frame_nr;
+		}
+
+		poll(&pfd, 1, 1);
+	}
+
+	bug_on(total_packets != 0);
+
+	ret = sendto(sock, NULL, 0, 0, NULL, 0);
+	if (ret == -1) {
+		perror("sendto");
+		exit(1);
+	}
+
+	while ((ret = recvfrom(rcv_sock, packet, sizeof(packet),
+			       0, NULL, NULL)) > 0 &&
+	       total_packets < NUM_PACKETS) {
+		got += ret;
+		test_payload(packet, ret);
+
+		status_bar_update();
+		total_packets++;
+	}
+
+	close(rcv_sock);
+
+	if (total_packets != NUM_PACKETS) {
+		fprintf(stderr, "walk_v%d_rx: received %u out of %u pkts\n",
+			ring->version, total_packets, NUM_PACKETS);
+		exit(1);
+	}
+
+	fprintf(stderr, " %u pkts (%u bytes)", NUM_PACKETS, got);
+}
+
 static void walk_v3(int sock, struct ring *ring)
 {
 	if (ring->type == PACKET_RX_RING)
 		walk_v3_rx(sock, ring);
 	else
-		bug_on(1);
+		walk_v3_tx(sock, ring);
 }
 
 static void __v1_v2_fill(struct ring *ring, unsigned int blocks)
@@ -796,6 +903,7 @@ int main(void)
 	ret |= test_tpacket(TPACKET_V2, PACKET_TX_RING);
 
 	ret |= test_tpacket(TPACKET_V3, PACKET_RX_RING);
+	ret |= test_tpacket(TPACKET_V3, PACKET_TX_RING);
 
 	if (ret)
 		return 1;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support
  2017-01-01 22:45 ` [PATCH v2 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support Sowmini Varadhan
@ 2017-01-02 22:31   ` Willem de Bruijn
  2017-01-02 23:02     ` Sowmini Varadhan
  0 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2017-01-02 22:31 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: Network Development, Daniel Borkmann, Willem de Bruijn,
	David Miller

On Sun, Jan 1, 2017 at 5:45 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> Add a test case and sample code for (TPACKET_V3, PACKET_TX_RING)

Thanks for adding this.

walk_v3_tx is almost identical to walk_v1_v2_tx. That function can
just be extended to add a v3 case where it already multiplexes between
v1 and v2.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next 1/2] af_packet: TX_RING support for TPACKET_V3
  2017-01-01 22:45 ` [PATCH v2 net-next 1/2] af_packet: TX_RING support for TPACKET_V3 Sowmini Varadhan
@ 2017-01-02 22:57   ` Willem de Bruijn
  2017-01-02 23:07     ` Sowmini Varadhan
  0 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2017-01-02 22:57 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: Network Development, Daniel Borkmann, Willem de Bruijn,
	David Miller

On Sun, Jan 1, 2017 at 5:45 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> Although TPACKET_V3 Rx has some benefits over TPACKET_V2 Rx, *_v3
> does not currently have TX_RING support. As a result an application
> that wants the best perf for Tx and Rx (e.g. to handle request/response
> transacations) ends up needing 2 sockets, one with *_v2 for Tx and
> another with *_v3 for Rx.
>
> This patch enables TPACKET_V2 compatible Tx features in TPACKET_V3
> so that an application can use a single descriptor to get the benefits
> of _v3 RX_RING and _v2 TX_RING. An application may do a block-send by
> first filling up multiple frames in the Tx ring and then triggering a
> transmit. This patch only support fixed size Tx frames for TPACKET_V3,
> and requires that tp_next_offset must be zero.
>
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> ---
> v2: sanity checks on tp_next_offset and corresponding Doc updates
>     as suggested by Willem de Bruijn
>
>  Documentation/networking/packet_mmap.txt |    9 +++++++--
>  net/packet/af_packet.c                   |   27 +++++++++++++++++++--------
>  2 files changed, 26 insertions(+), 10 deletions(-)

> @@ -4180,9 +4193,7 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
>                         goto out;
>                 switch (po->tp_version) {
>                 case TPACKET_V3:
> -               /* Transmit path is not supported. We checked
> -                * it above but just being paranoid
> -                */
> +                       /* Block transmit is not supported yet */
>                         if (!tx_ring)
>                                 init_prb_bdqc(po, rb, pg_vec, req_u);


One more point. We should validate the tpacket_req3 input and fail if
unsupported options are passed. Specifically, fail if any of
{tp_retire_blk_tov, tp_sizeof_priv, tp_feature_req_word} is non-zero.

Otherwise looks good to me.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support
  2017-01-02 22:31   ` Willem de Bruijn
@ 2017-01-02 23:02     ` Sowmini Varadhan
  2017-01-02 23:15       ` Willem de Bruijn
  0 siblings, 1 reply; 8+ messages in thread
From: Sowmini Varadhan @ 2017-01-02 23:02 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, Daniel Borkmann, Willem de Bruijn,
	David Miller

On (01/02/17 17:31), Willem de Bruijn wrote:
> 
> Thanks for adding this.
> 
> walk_v3_tx is almost identical to walk_v1_v2_tx. That function can
> just be extended to add a v3 case where it already multiplexes between
> v1 and v2.

I looked at that, but the sticky point is that v1/v2 sets up the
ring->rd* related variables based on frames (e.g., rd_num is tp_frame_nr)
whereas V3 sets these up based on blocks (e.g, rd_num is  tp_block_nr) 
so this impacts the core sending loop a bit.

I suppose we could change the walk_v2_v2_tx to be something like
	while (total_packets > 0) {
		if (ring->version) {
		    /* V3 send, that takes above difference into account */
		} else {
		    /* existing code */
		}
		/* status_bar_update(), user_ready  update frame_num */
	}

I can change it as above, if you think this would help.

--Sowmini

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next 1/2] af_packet: TX_RING support for TPACKET_V3
  2017-01-02 22:57   ` Willem de Bruijn
@ 2017-01-02 23:07     ` Sowmini Varadhan
  0 siblings, 0 replies; 8+ messages in thread
From: Sowmini Varadhan @ 2017-01-02 23:07 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, Daniel Borkmann, Willem de Bruijn,
	David Miller

On (01/02/17 17:57), Willem de Bruijn wrote:
> One more point. We should validate the tpacket_req3 input and fail if
> unsupported options are passed. Specifically, fail if any of
> {tp_retire_blk_tov, tp_sizeof_priv, tp_feature_req_word} is non-zero.
> 
> Otherwise looks good to me.

Ok, I'll send out v3 tomorrow, with the test case also updated
to share code with walk_v1_v2_tx as cleanly as possible. 

Thanks for the review feedback!

--Sowmini

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support
  2017-01-02 23:02     ` Sowmini Varadhan
@ 2017-01-02 23:15       ` Willem de Bruijn
  0 siblings, 0 replies; 8+ messages in thread
From: Willem de Bruijn @ 2017-01-02 23:15 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: Network Development, Daniel Borkmann, Willem de Bruijn,
	David Miller

On Mon, Jan 2, 2017 at 6:02 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (01/02/17 17:31), Willem de Bruijn wrote:
>>
>> Thanks for adding this.
>>
>> walk_v3_tx is almost identical to walk_v1_v2_tx. That function can
>> just be extended to add a v3 case where it already multiplexes between
>> v1 and v2.
>
> I looked at that, but the sticky point is that v1/v2 sets up the
> ring->rd* related variables based on frames (e.g., rd_num is tp_frame_nr)
> whereas V3 sets these up based on blocks (e.g, rd_num is  tp_block_nr)
> so this impacts the core sending loop a bit.

Good point. Yes, deduplicating the function will help make it crystal
clear where v3 differs from v2.

The patch already has __v3_tx_kernel_ready and __v3_tx_user_ready,
which can be plugged into the existing multiplexer functions
__v1_v2_tx_kernel_ready and __v2_v2_tx_user_ready multiplexer
(along with changing their names).

We'll indeed need a similar multiplexer function for calculating the next
frame to work around this rd_num issue, then.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-01-02 23:16 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-01-01 22:45 [PATCH v2 net-next 0/2] TPACKET_V3 TX_RING support Sowmini Varadhan
2017-01-01 22:45 ` [PATCH v2 net-next 1/2] af_packet: TX_RING support for TPACKET_V3 Sowmini Varadhan
2017-01-02 22:57   ` Willem de Bruijn
2017-01-02 23:07     ` Sowmini Varadhan
2017-01-01 22:45 ` [PATCH v2 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support Sowmini Varadhan
2017-01-02 22:31   ` Willem de Bruijn
2017-01-02 23:02     ` Sowmini Varadhan
2017-01-02 23:15       ` Willem de Bruijn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).