netdev.vger.kernel.org archive mirror
* ARM, AF_PACKET: caching problems on Marvell Kirkwood
@ 2011-04-08 13:06 Phil Sutter
  2011-05-05 14:11 ` Phil Sutter
  0 siblings, 1 reply; 19+ messages in thread
From: Phil Sutter @ 2011-04-08 13:06 UTC
  To: linux-arm-kernel; +Cc: netdev, ne

Dear lists,

I am experiencing severe caching issues using the TX_RING feature of
AF_PACKET on a Kirkwood-based system (i.e., OpenRD). This may well be a
bug in the CPU/SoC itself; at least the chip is a bit picky when the
preload data instruction (pld) is used in rather useless cases (but
that's a different story).

There is simple test code at the end of this email; it just prepares a
packet in the TX_RING and triggers its delivery once per second. The
symptom is that sporadically nothing goes out in one iteration, and two
packets go out in the following one.

It looks like the kernel doesn't get the changed value of tp_status in
time, although userspace sees the correct value. Note that moving the
sleep(1) from the end of the loop to just before calling sendto() fixes
the problem.
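
To make the symptom concrete, here is a minimal user-space toy model; it
is not kernel code. It assumes the kernel reads tp_status through its own,
possibly stale view of the ring, and every name in it (toy_ring, toy_sync,
toy_sendto, toy_iteration) is made up for illustration. As a further
simplification, status changes made by the "kernel" are visible to both
sides immediately.

```c
#include <string.h>

#define NFRAMES   4
#define FRAME_LEN 150	/* payload length used by the test program */

enum { ST_AVAILABLE = 0, ST_SEND_REQUEST = 1 };

struct toy_ring {
	int user_view[NFRAMES];		/* what userspace reads and writes */
	int kernel_view[NFRAMES];	/* what the kernel side currently sees */
	int kernel_head;		/* next frame the kernel will look at */
};

/* Make the kernel side see everything userspace has written so far. */
static void toy_sync(struct toy_ring *r)
{
	memcpy(r->kernel_view, r->user_view, sizeof(r->user_view));
}

/* Model of sendto(): send consecutive SEND_REQUEST frames as seen by
 * the kernel side and return the number of bytes sent. */
static int toy_sendto(struct toy_ring *r)
{
	int sent = 0;

	while (r->kernel_view[r->kernel_head] == ST_SEND_REQUEST) {
		r->kernel_view[r->kernel_head] = ST_AVAILABLE;
		r->user_view[r->kernel_head] = ST_AVAILABLE;
		r->kernel_head = (r->kernel_head + 1) % NFRAMES;
		sent += FRAME_LEN;
	}
	return sent;
}

/* One loop iteration of the test program: queue a frame, then call
 * sendto(). If stale is non-zero, the kernel side misses the update. */
static int toy_iteration(struct toy_ring *r, int frame, int stale)
{
	r->user_view[frame] = ST_SEND_REQUEST;
	if (!stale)
		toy_sync(r);
	return toy_sendto(r);
}
```

Starting from a zeroed toy_ring and queueing frames 0, 1 and 2 with stale
set to 0, 1 and 0, the model returns 150, 0 and 300 bytes: nothing goes
out in the second iteration, and two packets go out in the third.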

Another (more useful) workaround is to call flush_cache_all() at the
beginning of packet_sendmsg() in net/packet/af_packet.c. I was not able
to fix this with more specific flushing at that place. The call to
flush_dcache_page() from __packet_get_status() in the same source file
is presumably meant to do the trick, but somehow it doesn't.

Feedback of any kind is highly appreciated, of course!

Greetings, Phil

------------------[start of packet_mmap_test.c]--------------------
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>

#define PERROR_EXIT(rc, mesg) { \
	perror(mesg); \
	return rc; \
}

int main(void)
{
	uint32_t size;
	struct sockaddr_ll sa;
	struct ifreq ifr;
	int index;
	int tmp;
	int fd;
	struct tpacket_req packet_req;
	struct tpacket2_hdr * ps_header_start, *ps_header;

	if ((fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_ALL))) < 0)
		PERROR_EXIT(EXIT_FAILURE, "socket");

	/* retrieve eth0's interface index number */
	strncpy (ifr.ifr_name, "eth0", sizeof(ifr.ifr_name));
	if (ioctl(fd, SIOCGIFINDEX, &ifr) < 0)
		PERROR_EXIT(EXIT_FAILURE, "ioctl(SIOCGIFINDEX)");

	/* set sockaddr info */
	memset(&sa, 0, sizeof(sa));
	sa.sll_family = AF_PACKET;
	sa.sll_protocol = htons(ETH_P_ALL);
	sa.sll_ifindex = ifr.ifr_ifindex;

	/* bind port */
	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		PERROR_EXIT(EXIT_FAILURE, "bind()");

	tmp = TPACKET_V2;
	if (setsockopt(fd, SOL_PACKET, PACKET_VERSION, &tmp, sizeof(tmp)) < 0)
		PERROR_EXIT(EXIT_FAILURE, "setsockopt(PACKET_VERSION)");

	/* set packet loss option */
	tmp = 1;
	if (setsockopt(fd, SOL_PACKET, PACKET_LOSS, &tmp, sizeof(tmp)) < 0)
		PERROR_EXIT(EXIT_FAILURE, "setsockopt(PACKET_LOSS)");

	/* prepare Tx ring request */
	packet_req.tp_block_size = 1024 * 8;
	packet_req.tp_frame_size = 1024 * 8;
	packet_req.tp_block_nr = 1024;
	packet_req.tp_frame_nr = 1024;

	/* send TX ring request */
	if (setsockopt(fd, SOL_PACKET, PACKET_TX_RING,
	               &packet_req, sizeof(packet_req)) < 0)
		PERROR_EXIT(EXIT_FAILURE, "setsockopt: PACKET_TX_RING");

	/* calculate memory to mmap in the kernel */
	size = packet_req.tp_block_size * packet_req.tp_block_nr;

	/* mmap Tx ring buffers memory */
	ps_header_start = mmap(0, size,
			PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	if (ps_header_start == MAP_FAILED)
		PERROR_EXIT(EXIT_FAILURE, "mmap");

	/* fill peer sockaddr for SOCK_DGRAM */
	sa.sll_family = AF_PACKET;
	sa.sll_protocol = htons(ETH_P_IP);
	sa.sll_ifindex = ifr.ifr_ifindex;
	sa.sll_halen = ETH_ALEN;
	memset(&sa.sll_addr, 0xff, ETH_ALEN);

	ps_header = ps_header_start;
	while (1) {
		int sendlen, j;

		/* payload offset within a TPACKET_V2 frame */
		char *data = (char *)ps_header + TPACKET2_HDRLEN
		              - sizeof(struct sockaddr_ll);

		switch (*(volatile uint32_t *)&ps_header->tp_status)
		{
		case TP_STATUS_AVAILABLE:
			memset(data, 0x23, 150);
			break;

		case TP_STATUS_WRONG_FORMAT:
			printf("An error has occurred during transfer\n");
			exit(EXIT_FAILURE);
			break;

		default:
			printf("Buffer is not available, aborting\n");
			exit(1);
			break;
		}
		ps_header->tp_len = 150;
		ps_header->tp_status = TP_STATUS_SEND_REQUEST;

		sendlen = sendto(fd, NULL, 0, 0,
				(struct sockaddr *)&sa, sizeof(sa));
		if (sendlen < 0)
			perror("sendto");
		else if (sendlen == 0)
			printf("sendto(): nothing sent!\n");
		else
			printf("sendto(): sent %d bytes out\n", sendlen);

#define ST_IS(x) (*(volatile uint32_t *)&ps_header->tp_status == (x))
		printf("tp_status after sending: %s\n",
				ST_IS(TP_STATUS_AVAILABLE) ? "AVAILABLE" :
				ST_IS(TP_STATUS_SEND_REQUEST) ? "SEND_REQUEST" :
				ST_IS(TP_STATUS_WRONG_FORMAT) ? "WRONG_FORMAT" :
				"unknown");
#undef ST_IS

		/* advance in bytes; size is a byte count, not a header count */
		ps_header = (void *)((char *)ps_header + packet_req.tp_frame_size);
		if ((char *)ps_header >= (char *)ps_header_start + size)
			ps_header = ps_header_start;

		sleep(1);
	}
	return 0;
}
--------------------[end of packet_mmap_test.c]--------------------
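
The ring walk in the listing above can also be written with a frame index
instead of a moving header pointer, which keeps all address arithmetic in
bytes. This is only a sketch; ring_frame() and ring_next() are
hypothetical helper names, not part of any kernel or libc API.

```c
#include <stddef.h>

/* Address of frame idx in a ring starting at base. The cast to char *
 * makes the offset a byte offset, whatever the header struct size is. */
static void *ring_frame(void *base, size_t frame_size, size_t idx)
{
	return (char *)base + idx * frame_size;
}

/* Index of the frame following idx, wrapping at the end of the ring. */
static size_t ring_next(size_t idx, size_t frame_nr)
{
	return (idx + 1) % frame_nr;
}
```

With the values from the test program (tp_frame_size = 8192, tp_frame_nr =
1024), the loop would keep a size_t index, fetch the header with
ring_frame(ps_header_start, 8192, idx) and advance with
idx = ring_next(idx, 1024).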

* Re: ARM, AF_PACKET: caching problems on Marvell Kirkwood
  2011-04-08 13:06 ARM, AF_PACKET: caching problems on Marvell Kirkwood Phil Sutter
@ 2011-05-05 14:11 ` Phil Sutter
  2011-05-05 14:56   ` Eric Dumazet
                     ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Phil Sutter @ 2011-05-05 14:11 UTC
  To: linux-arm-kernel, netdev, ne
  Cc: Johann Baudy, Lennert Buytenhek, Nicolas Pitre

Hi,

Has nobody but me experienced this bug? Can anyone reproduce the
described behaviour on their Kirkwood-based (or even generic ARM)
machine? I am still not sure whether this is a problem of just my CPU or
common among Kirkwood/VIPT/ARM machines.

My workaround looks like this:
| diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
| index b5362e9..0672f50 100644
| --- a/net/packet/af_packet.c
| +++ b/net/packet/af_packet.c
| @@ -1298,10 +1298,13 @@ static int packet_sendmsg(struct kiocb *iocb, struct socket *sock,
|  {
|         struct sock *sk = sock->sk;
|         struct packet_sock *po = pkt_sk(sk);
| -       if (po->tx_ring.pg_vec)
| -               return tpacket_snd(po, msg);
| -       else
| -               return packet_snd(sock, msg, len);
| +       int rc;
| +
| +       flush_cache_all();
| +       rc = po->tx_ring.pg_vec ? tpacket_snd(po, msg) :
| +                       packet_snd(sock, msg, len);
| +       flush_cache_all();
| +       return rc;
|  }
|  
|  /*

Greetings, Phil

(Full-quoting here because I've added the TX ring author and the Kirkwood
maintainers to Cc.)


* Re: ARM, AF_PACKET: caching problems on Marvell Kirkwood
  2011-05-05 14:11 ` Phil Sutter
@ 2011-05-05 14:56   ` Eric Dumazet
  2011-05-06 16:12     ` Phil Sutter
  2011-05-05 19:46   ` Andrew Lunn
  2011-09-02 11:08   ` [PATCH] af_packet: flush complete kernel cache in packet_sendmsg Phil Sutter
  2 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2011-05-05 14:56 UTC
  To: Phil Sutter
  Cc: linux-arm-kernel, netdev, ne, Johann Baudy, Lennert Buytenhek,
	Nicolas Pitre

On Thursday, 5 May 2011 at 16:11 +0200, Phil Sutter wrote:
> Hi,
> 
> Hasn't anyone experienced this bug but me? Can anyone reproduce the
> described behaviour on his Kirkwood-based (or even generic ARM) machine?
> I am still not sure if this is a problem of just my CPU or common
> amongst Kirkwood/VIPT/ARM machines.
> 
> My workaround looks like this:
> | diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> | index b5362e9..0672f50 100644
> | --- a/net/packet/af_packet.c
> | +++ b/net/packet/af_packet.c
> | @@ -1298,10 +1298,13 @@ static int packet_sendmsg(struct kiocb *iocb, struct socket *sock,
> |  {
> |         struct sock *sk = sock->sk;
> |         struct packet_sock *po = pkt_sk(sk);
> | -       if (po->tx_ring.pg_vec)
> | -               return tpacket_snd(po, msg);
> | -       else
> | -               return packet_snd(sock, msg, len);
> | +       int rc;
> | +
> | +       flush_cache_all();
> | +       rc = po->tx_ring.pg_vec ? tpacket_snd(po, msg) :
> | +                       packet_snd(sock, msg, len);
> | +       flush_cache_all();
> | +       return rc;
> |  }
> |  
> |  /*
> 
> Greetings, Phil
> 
> (Full-quoting here because I've added the TX ring author and the Kirkwood
> maintainers to Cc.)
> 

Hi Phil

I assume you use latest linux-2.6 or net-next-2.6 ?

Could you try to force vmalloc() use ?

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b5362e9..0b5a89c 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2383,7 +2383,7 @@ static inline char *alloc_one_pg_vec_page(unsigned long order)
 	gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP |
 			  __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY;
 
-	buffer = (char *) __get_free_pages(gfp_flags, order);
+	buffer = NULL;
 
 	if (buffer)
 		return buffer;



* Re: ARM, AF_PACKET: caching problems on Marvell Kirkwood
  2011-05-05 14:11 ` Phil Sutter
  2011-05-05 14:56   ` Eric Dumazet
@ 2011-05-05 19:46   ` Andrew Lunn
  2011-05-06 16:17     ` Phil Sutter
  2011-09-02 11:08   ` [PATCH] af_packet: flush complete kernel cache in packet_sendmsg Phil Sutter
  2 siblings, 1 reply; 19+ messages in thread
From: Andrew Lunn @ 2011-05-05 19:46 UTC
  To: linux-arm-kernel, netdev, ne, Johann Baudy, Lennert Buytenhek,
	Nicolas Pitre

On Thu, May 05, 2011 at 04:11:07PM +0200, Phil Sutter wrote:
> Hi,
> 
> Hasn't anyone experienced this bug but me? Can anyone reproduce the
> described behaviour on his Kirkwood-based (or even generic ARM) machine?
> I am still not sure if this is a problem of just my CPU or common
> amongst Kirkwood/VIPT/ARM machines.

Hi Phil

I can reproduce it on a Kirkwood:

[    0.000000] CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053977

     Andrew

* Re: ARM, AF_PACKET: caching problems on Marvell Kirkwood
  2011-05-05 14:56   ` Eric Dumazet
@ 2011-05-06 16:12     ` Phil Sutter
  0 siblings, 0 replies; 19+ messages in thread
From: Phil Sutter @ 2011-05-06 16:12 UTC
  To: Eric Dumazet
  Cc: linux-arm-kernel, netdev, ne, Johann Baudy, Lennert Buytenhek,
	Nicolas Pitre

Hi,

On Thu, May 05, 2011 at 04:56:02PM +0200, Eric Dumazet wrote:
> I assume you use latest linux-2.6 or net-next-2.6 ?

Well, initially we noticed the problem on 2.6.34.7, but I verified it
against both 2.6.37 and linux-2.6 from three days ago.

> Could you try to force vmalloc() use ?
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index b5362e9..0b5a89c 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2383,7 +2383,7 @@ static inline char *alloc_one_pg_vec_page(unsigned long order)
>  	gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP |
>  			  __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY;
>  
> -	buffer = (char *) __get_free_pages(gfp_flags, order);
> +	buffer = NULL;
>  
>  	if (buffer)
>  		return buffer;

Thanks for the hint. I tried that, but the problem persists.

Greetings, Phil

* Re: ARM, AF_PACKET: caching problems on Marvell Kirkwood
  2011-05-05 19:46   ` Andrew Lunn
@ 2011-05-06 16:17     ` Phil Sutter
  2011-05-09  8:59       ` Phil Sutter
  2011-05-25 10:32       ` Phil Sutter
  0 siblings, 2 replies; 19+ messages in thread
From: Phil Sutter @ 2011-05-06 16:17 UTC
  To: Andrew Lunn
  Cc: linux-arm-kernel, netdev, ne, Johann Baudy, Lennert Buytenhek,
	Nicolas Pitre

Hi,

On Thu, May 05, 2011 at 09:46:01PM +0200, Andrew Lunn wrote:
> I can reproduce it on a Kirkwood:
> 
> [    0.000000] CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053977

Thanks for the information. Seems like we have the same CPU:

| [    0.000000] CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053177
| [    0.000000] CPU: VIVT data cache, VIVT instruction cache

and it's actually VIVT, not VIPT as I wrote in an earlier mail.

Greetings, Phil

* Re: ARM, AF_PACKET: caching problems on Marvell Kirkwood
  2011-05-06 16:17     ` Phil Sutter
@ 2011-05-09  8:59       ` Phil Sutter
  2011-05-25 10:32       ` Phil Sutter
  1 sibling, 0 replies; 19+ messages in thread
From: Phil Sutter @ 2011-05-09  8:59 UTC
  To: linux-arm-kernel, netdev, ne, Johann Baudy, Lennert Buytenhek,
	Nicolas Pitre

Hi,

On Fri, May 06, 2011 at 06:17:53PM +0200, Phil Sutter wrote:
> On Thu, May 05, 2011 at 09:46:01PM +0200, Andrew Lunn wrote:
> > I can reproduce it on a Kirkwood:
> > 
> > [    0.000000] CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053977
> 
> Thanks for the information. Seems like we have the same CPU:
> 
> | [    0.000000] CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053177
> | [    0.000000] CPU: VIVT data cache, VIVT instruction cache
> 
> and it's actually VIVT, not VIPT as I wrote in an earlier mail.

Interesting news, a friend of mine can reproduce the problem on a
Foxboard (FOXG20):

| root@localhost:/root # dmesg |head -4
| [    0.000000] Linux version 2.6.37 (wbx@chrom) (gcc version 4.5.2 (GCC) ) #1 Sat May 7 19:37:30 CEST 2011
| [    0.000000] CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00053177
| [    0.000000] CPU: VIVT data cache, VIVT instruction cache
| [    0.000000] Machine: Acme Systems FOXG20
| root@localhost:/root # uname -a
| Linux localhost 2.6.37 #1 Sat May 7 19:37:30 CEST 2011 armv5tejl GNU/Linux
| root@localhost:/root # /mmap
| sendto(): nothing sent!
| tp_status after sending: SEND_REQUEST
| sendto(): nothing sent!
| tp_status after sending: SEND_REQUEST
| sendto(): sent 450 bytes out
| tp_status after sending: AVAILABLE
| sendto(): sent 150 bytes out
| tp_status after sending: AVAILABLE
| sendto(): nothing sent!
| tp_status after sending: SEND_REQUEST
| sendto(): nothing sent!
| tp_status after sending: SEND_REQUEST
| sendto(): sent 300 bytes out
| tp_status after sending: SEND_REQUEST
| sendto(): nothing sent!
| tp_status after sending: SEND_REQUEST
| sendto(): sent 300 bytes out
| tp_status after sending: SEND_REQUEST
| sendto(): sent 150 bytes out
| tp_status after sending: SEND_REQUEST
| sendto(): sent 150 bytes out
| tp_status after sending: SEND_REQUEST
| sendto(): sent 300 bytes out
| tp_status after sending: AVAILABLE
| sendto(): nothing sent!
| tp_status after sending: SEND_REQUEST
| sendto(): sent 150 bytes out
| tp_status after sending: SEND_REQUEST

Greetings, Phil

* Re: ARM, AF_PACKET: caching problems on Marvell Kirkwood
  2011-05-06 16:17     ` Phil Sutter
  2011-05-09  8:59       ` Phil Sutter
@ 2011-05-25 10:32       ` Phil Sutter
  1 sibling, 0 replies; 19+ messages in thread
From: Phil Sutter @ 2011-05-25 10:32 UTC
  To: linux-arm-kernel, netdev, ne, Johann Baudy, Lennert Buytenhek,
	Nicolas Pitre

Hi,

On Fri, May 06, 2011 at 06:17:53PM +0200, Phil Sutter wrote:
> On Thu, May 05, 2011 at 09:46:01PM +0200, Andrew Lunn wrote:
> > I can reproduce it on a Kirkwood:
> > 
> > [    0.000000] CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053977
> 
> Thanks for the information. Seems like we have the same CPU:
> 
> | [    0.000000] CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053177
> | [    0.000000] CPU: VIVT data cache, VIVT instruction cache
> 
> and it's actually VIVT, not VIPT as I wrote in an earlier mail.

I have been looking at flush_dcache_page(), which is supposed to do the
trick when caches need to be flushed/dropped for the mmapped memory.
Interestingly, there seems to be no page mapping: page_mapping(page)
returns NULL, and therefore __flush_dcache_aliases(mapping, page) is
never called.

Could that be the culprit here? I guess there should always be a mapping
present for mmapped pages, right?
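
Why a missing alias flush would matter on a VIVT cache can be shown with
a toy model. This is a user-space sketch, not kernel code: it assumes a
write-back cache in which the user and the kernel mapping of the same
word each have their own cache line, and all names (toy_word, toy_read,
toy_write, toy_flush) are invented for this example.

```c
#include <stdint.h>

enum { USER_ALIAS = 0, KERNEL_ALIAS = 1 };

struct toy_line {
	uint32_t data;
	int valid;
	int dirty;
};

struct toy_word {
	uint32_t ram;			/* the word in physical memory */
	struct toy_line alias[2];	/* one cache line per virtual alias */
};

/* Read through one alias; on a miss the line is filled from RAM. */
static uint32_t toy_read(struct toy_word *w, int a)
{
	struct toy_line *l = &w->alias[a];

	if (!l->valid) {
		l->data = w->ram;
		l->valid = 1;
		l->dirty = 0;
	}
	return l->data;
}

/* Write through one alias; the data stays in that alias' cache line. */
static void toy_write(struct toy_word *w, int a, uint32_t v)
{
	struct toy_line *l = &w->alias[a];

	l->data = v;
	l->valid = 1;
	l->dirty = 1;
}

/* Write back (if dirty) and invalidate the line of one alias only. */
static void toy_flush(struct toy_word *w, int a)
{
	struct toy_line *l = &w->alias[a];

	if (l->valid && l->dirty)
		w->ram = l->data;
	l->valid = 0;
	l->dirty = 0;
}
```

In this model, after the kernel alias has read the word once and the user
alias then writes a new status, the kernel alias keeps reading the old
value. Flushing only the kernel alias does not help, because the new value
is still sitting in the user alias' dirty line; the kernel side sees it
only once both aliases have been flushed.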

Greetings, Phil

* [PATCH] af_packet: flush complete kernel cache in packet_sendmsg
  2011-05-05 14:11 ` Phil Sutter
  2011-05-05 14:56   ` Eric Dumazet
  2011-05-05 19:46   ` Andrew Lunn
@ 2011-09-02 11:08   ` Phil Sutter
  2011-09-02 13:46     ` Ben Hutchings
       [not found]     ` <D3F292ADF945FB49B35E96C94C2061B90A239361@nsmail.netscout.com>
  2 siblings, 2 replies; 19+ messages in thread
From: Phil Sutter @ 2011-09-02 11:08 UTC
  To: linux-arm-kernel; +Cc: netdev, Russell King, David S. Miller

This flushes the cache before and after accessing the mmapped packet
buffer. It seems like the call to flush_dcache_page from inside
__packet_get_status is not enough on Kirkwood (or ARM in general).
---
I know this is far from an optimal solution, but it's in fact the only working
one I found. And it shouldn't interfere with unaffected target systems. So
anyone relying on a working TX_RING on Kirkwood may refer to this patch. Any
ARM/cache/Marvell/Kirkwood experts out there feel free to improve this.
---
 net/packet/af_packet.c |   24 ++++++++++++++++++++----
 1 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 243946d..d7b5c2e 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -87,6 +87,14 @@
 #include <net/inet_common.h>
 #endif
 
+/* whether we need additional cacheflushing between user- and kernel-space */
+#ifdef CONFIG_ARCH_KIRKWOOD
+#  define ENABLE_CACHEPROB_WORKAROUND
+#  define kw_extra_cache_flush()	flush_cache_all()
+#else
+#  define kw_extra_cache_flush()	/* nothing */
+#endif
+
 /*
    Assumptions:
    - if device has no dev->hard_header routine, it adds and removes ll header
@@ -1239,10 +1247,13 @@ static int packet_sendmsg(struct kiocb *iocb, struct socket *sock,
 {
 	struct sock *sk = sock->sk;
 	struct packet_sock *po = pkt_sk(sk);
-	if (po->tx_ring.pg_vec)
-		return tpacket_snd(po, msg);
-	else
-		return packet_snd(sock, msg, len);
+	int rc;
+
+	kw_extra_cache_flush();
+	rc = po->tx_ring.pg_vec ? tpacket_snd(po, msg) :
+			packet_snd(sock, msg, len);
+	kw_extra_cache_flush();
+	return rc;
 }
 
 /*
@@ -2622,6 +2633,11 @@ static int __init packet_init(void)
 	sock_register(&packet_family_ops);
 	register_pernet_subsys(&packet_net_ops);
 	register_netdevice_notifier(&packet_netdev_notifier);
+
+#ifdef ENABLE_CACHEPROB_WORKAROUND
+	printk(KERN_INFO "af_packet: cache coherency workaround for kirkwood is active!\n");
+#endif
+
 out:
 	return rc;
 }
-- 
1.7.3.4

* Re: [PATCH] af_packet: flush complete kernel cache in packet_sendmsg
  2011-09-02 11:08   ` [PATCH] af_packet: flush complete kernel cache in packet_sendmsg Phil Sutter
@ 2011-09-02 13:46     ` Ben Hutchings
  2011-09-02 13:59       ` Phil Sutter
  2011-09-02 17:28       ` Russell King - ARM Linux
       [not found]     ` <D3F292ADF945FB49B35E96C94C2061B90A239361@nsmail.netscout.com>
  1 sibling, 2 replies; 19+ messages in thread
From: Ben Hutchings @ 2011-09-02 13:46 UTC
  To: Phil Sutter; +Cc: linux-arm-kernel, netdev, Russell King, David S. Miller

On Fri, 2011-09-02 at 13:08 +0200, Phil Sutter wrote:
> This flushes the cache before and after accessing the mmapped packet
> buffer. It seems like the call to flush_dcache_page from inside
> __packet_get_status is not enough on Kirkwood (or ARM in general).
> ---
> I know this is far from an optimal solution, but it's in fact the only working
> one I found.
[...]

This is ridiculous.  If flush_dcache_page() isn't doing everything it
should, you need to fix that.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: [PATCH] af_packet: flush complete kernel cache in packet_sendmsg
  2011-09-02 13:46     ` Ben Hutchings
@ 2011-09-02 13:59       ` Phil Sutter
  2011-09-02 17:28       ` Russell King - ARM Linux
  1 sibling, 0 replies; 19+ messages in thread
From: Phil Sutter @ 2011-09-02 13:59 UTC
  To: Ben Hutchings; +Cc: linux-arm-kernel, netdev, Russell King, David S. Miller

On Fri, Sep 02, 2011 at 02:46:17PM +0100, Ben Hutchings wrote:
> On Fri, 2011-09-02 at 13:08 +0200, Phil Sutter wrote:
> > This flushes the cache before and after accessing the mmapped packet
> > buffer. It seems like the call to flush_dcache_page from inside
> > __packet_get_status is not enough on Kirkwood (or ARM in general).
> > ---
> > I know this is far from an optimal solution, but it's in fact the only working
> > one I found.
> [...]
> 
> This is ridiculous.  If flush_dcache_page() isn't doing everything it
> should, you need to fix that.

You're absolutely correct. But in fact this problem goes way too deep
for me to find its cause. And since my time is finite, I doubt this
will change in the near future. So I asked for help, a pointer in
whatever direction or anything I could try to help analyze it further -
without any response (unless I missed it, in which case I apologize).

Please don't get me wrong. I have no intention of getting this patch
into mainline, I just want to give others with the same problem a
starting point.

Greetings, Phil
-- 
Viprinet GmbH
Mainzer Str. 43
55411 Bingen am Rhein
Germany

Zentrale:     +49-6721-49030-0
Durchwahl:    +49-6721-49030-134
Fax:          +49-6721-49030-209

phil.sutter@viprinet.com
http://www.viprinet.com

Sitz der Gesellschaft: Bingen am Rhein
Handelsregister: Amtsgericht Mainz HRB40380
Geschäftsführer: Simon Kissel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: FW: [PATCH] af_packet: flush complete kernel cache in packet_sendmsg
       [not found]     ` <D3F292ADF945FB49B35E96C94C2061B90A239361@nsmail.netscout.com>
@ 2011-09-02 14:00       ` chetan loke
  2011-09-02 15:31         ` Phil Sutter
  0 siblings, 1 reply; 19+ messages in thread
From: chetan loke @ 2011-09-02 14:00 UTC (permalink / raw)
  To: phil.sutter, linux-arm-kernel; +Cc: netdev, linux, davem

>
> This flushes the cache before and after accessing the mmapped packet
> buffer. It seems like the call to flush_dcache_page from inside
> __packet_get_status is not enough on Kirkwood (or ARM in general).



> +       kw_extra_cache_flush();
> +       rc = po->tx_ring.pg_vec ? tpacket_snd(po, msg) :
> +                       packet_snd(sock, msg, len);
> +       kw_extra_cache_flush();
> +       return rc;
>  }

If a workaround is needed for mmap, then why not change tpacket_snd?

Also, is this workaround actually working for all the cases? Because
packet_get_status is not being touched in your patch.

Also, I don't see any changes for the Rx-path. Is that working ok?


Chetan Loke

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: FW: [PATCH] af_packet: flush complete kernel cache in packet_sendmsg
  2011-09-02 14:00       ` FW: " chetan loke
@ 2011-09-02 15:31         ` Phil Sutter
  2011-09-02 16:49           ` chetan loke
  0 siblings, 1 reply; 19+ messages in thread
From: Phil Sutter @ 2011-09-02 15:31 UTC (permalink / raw)
  To: chetan loke; +Cc: linux-arm-kernel, netdev, linux, davem

On Fri, Sep 02, 2011 at 10:00:16AM -0400, chetan loke wrote:
> >
> > This flushes the cache before and after accessing the mmapped packet
> > buffer. It seems like the call to flush_dcache_page from inside
> > __packet_get_status is not enough on Kirkwood (or ARM in general).
> 
> 
> 
> > +       kw_extra_cache_flush();
> > +       rc = po->tx_ring.pg_vec ? tpacket_snd(po, msg) :
> > +                       packet_snd(sock, msg, len);
> > +       kw_extra_cache_flush();
> > +       return rc;
> >  }
> 
> If a workaround is needed for mmap, then why not change tpacket_snd?

I did not verify that packet_snd() is not affected. OTOH, adding it
there was quite "intuitive".

> Also, is this workaround actually working for all the cases? Because
> packet_get_status is not being touched in your patch.
> 
> Also, I don't see any changes for the Rx-path. Is that working ok?

So far we haven't noticed problems in that direction. I just tried an
explicit test: having tcpdump print local timestamps (not the pcap ones)
for every received packet, activating icmp_echo_ignore_all and pinging
the host on a dedicated line. I expected to sometimes see a one-second
difference between the two timestamps: as with sending, from time to
time a packet should get "lost" in the cache and only appear to
userspace after the next one arrived. Maybe my test is broken, or RX is
indeed unaffected.

Greetings and thanks for the hints, Phil

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: FW: [PATCH] af_packet: flush complete kernel cache in packet_sendmsg
  2011-09-02 15:31         ` Phil Sutter
@ 2011-09-02 16:49           ` chetan loke
  2011-09-06  9:44             ` Phil Sutter
  0 siblings, 1 reply; 19+ messages in thread
From: chetan loke @ 2011-09-02 16:49 UTC (permalink / raw)
  To: Phil Sutter; +Cc: linux-arm-kernel, netdev, linux, davem

On Fri, Sep 2, 2011 at 11:31 AM, Phil Sutter <phil.sutter@viprinet.com> wrote:

> So far we haven't noticed problems in that direction. I just tried some
> explicit test: having tcpdump print local timestamps (not the pcap-ones)
> on every received packet, activating icmp_echo_ignore_all and pinging
> the host on a dedicated line. I expected to sometimes see a second
> difference between the two timestamps, as like with sending from time to
> time a packet should get "lost" in the cache, and then occur to
> userspace after the next one arrived. Maybe my test is broken, or RX is
> indeed unaffected.
>

You will need a high traffic rate. If interested, you could try
pktgen (with varying packet load). Keep the packet payload under 1500
bytes (don't send jumbo frames) unless you have the following fix:
commit cc9f01b246ca8e4fa245991840b8076394f86707

Your Tx path is working because flush_cache_all gets triggered before
flush_dcache_page. On the Rx path, since you don't have that
workaround, you will eventually (it's just a matter of time) see this
problem.

Or, delete your patch and try this workaround (in
__packet_get/set_status) and you may be able to cover both Tx and Rx
paths.


diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 2ea3d63..35d71dc 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -412,11 +412,19 @@ static void __packet_set_status(struct packet_sock *po, void *frame, int status)
        switch (po->tp_version) {
        case TPACKET_V1:
                h.h1->tp_status = status;
-               flush_dcache_page(pgv_to_page(&h.h1->tp_status));
+               #ifndef ENABLE_CACHEPROB_WORKAROUND
+                       flush_dcache_page(pgv_to_page(&h.h1->tp_status));
+               #else
+                       kw_extra_cache_flush();
+               #endif
                break;
        case TPACKET_V2:
                h.h2->tp_status = status;
-               flush_dcache_page(pgv_to_page(&h.h2->tp_status));
+               #ifndef ENABLE_CACHEPROB_WORKAROUND
+                       flush_dcache_page(pgv_to_page(&h.h2->tp_status));
+               #else
+                       kw_extra_cache_flush();
+               #endif
                break;
        case TPACKET_V3:
        default:
@@ -437,13 +445,19 @@ static int __packet_get_status(struct packet_sock *po, void *frame)

        smp_rmb();

+       kw_extra_cache_flush();
+
        h.raw = frame;
        switch (po->tp_version) {
        case TPACKET_V1:
-               flush_dcache_page(pgv_to_page(&h.h1->tp_status));
+               #ifndef ENABLE_CACHEPROB_WORKAROUND
+                       flush_dcache_page(pgv_to_page(&h.h1->tp_status));
+               #endif
                return h.h1->tp_status;
        case TPACKET_V2:
-               flush_dcache_page(pgv_to_page(&h.h2->tp_status));
+               #ifndef ENABLE_CACHEPROB_WORKAROUND
+                       flush_dcache_page(pgv_to_page(&h.h2->tp_status));
+               #endif
                return h.h2->tp_status;
        case TPACKET_V3:
        default:


> Greetings and thanks for the hints, Phil

Chetan Loke

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] af_packet: flush complete kernel cache in packet_sendmsg
  2011-09-02 13:46     ` Ben Hutchings
  2011-09-02 13:59       ` Phil Sutter
@ 2011-09-02 17:28       ` Russell King - ARM Linux
  2011-09-05 19:57         ` Phil Sutter
  1 sibling, 1 reply; 19+ messages in thread
From: Russell King - ARM Linux @ 2011-09-02 17:28 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, Phil Sutter, David S. Miller, linux-arm-kernel

On Fri, Sep 02, 2011 at 02:46:17PM +0100, Ben Hutchings wrote:
> On Fri, 2011-09-02 at 13:08 +0200, Phil Sutter wrote:
> > This flushes the cache before and after accessing the mmapped packet
> > buffer. It seems like the call to flush_dcache_page from inside
> > __packet_get_status is not enough on Kirkwood (or ARM in general).
> > ---
> > I know this is far from an optimal solution, but it's in fact the only working
> > one I found.
> [...]
> 
> This is ridiculous.  If flush_dcache_page() isn't doing everything it
> should, you need to fix that.

It does do everything it should - which is to perform maintenance on
page cache pages.  It flushes the kernel mapping of the page.  It
also flushes the userspace mappings of the page which it finds by
walking the mmap list via the associated struct page.  It does not
touch vmalloc mappings because it has no way to know whether they
exist or not.

It doesn't do so much for anonymous pages - to do so would only
duplicate what flush_anon_page() does at the very same callsites.
Plus the mmap list isn't available for such pages so there's no
way to find out what userspace addresses to flush.

If the AF_PACKET buffers are created from anonymous pages and it's
using flush_dcache_page(), it's using the wrong interface.
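
A toy userspace model may help picture the aliasing described above. It
is only a sketch, not kernel code: the names toy_page, write_via(),
read_via() and flush_alias() are made up for illustration, and each
virtual alias of the page is assumed to own exactly one write-back cache
line, which is far simpler than a real virtually indexed cache.

```c
#include <assert.h>
#include <stdbool.h>

/* Three virtual aliases of one physical page (illustrative names). */
enum alias_id { ALIAS_KERNEL_DIRECT, ALIAS_VMALLOC, ALIAS_USER, ALIAS_COUNT };

struct cache_line {
    int  value;
    bool valid;
    bool dirty;
};

struct toy_page {
    int phys;                            /* the word in physical memory */
    struct cache_line line[ALIAS_COUNT]; /* one cache line per alias */
};

/* Store through one alias: only that alias's cache line changes. */
static void write_via(struct toy_page *p, enum alias_id a, int v)
{
    p->line[a].value = v;
    p->line[a].valid = true;
    p->line[a].dirty = true;
}

/* Load through one alias: refill from memory only on a miss. */
static int read_via(struct toy_page *p, enum alias_id a)
{
    if (!p->line[a].valid) {
        p->line[a].value = p->phys;
        p->line[a].valid = true;
        p->line[a].dirty = false;
    }
    return p->line[a].value;
}

/* Write back (if dirty) and invalidate the line of a single alias. */
static void flush_alias(struct toy_page *p, enum alias_id a)
{
    assert(a < ALIAS_COUNT);
    if (p->line[a].valid && p->line[a].dirty)
        p->phys = p->line[a].value;
    p->line[a].valid = false;
    p->line[a].dirty = false;
}
```

In this model, a store through ALIAS_USER followed by a flush of only
ALIAS_KERNEL_DIRECT leaves the kernel reading the old value; the new
value shows up only once the user alias has been flushed as well. That
is the situation where the flush routine cannot find the userspace
mapping.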

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] af_packet: flush complete kernel cache in packet_sendmsg
  2011-09-02 17:28       ` Russell King - ARM Linux
@ 2011-09-05 19:57         ` Phil Sutter
  2011-09-06  9:57           ` Russell King - ARM Linux
  0 siblings, 1 reply; 19+ messages in thread
From: Phil Sutter @ 2011-09-05 19:57 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Ben Hutchings, linux-arm-kernel, netdev, David S. Miller

Hi,

On Fri, Sep 02, 2011 at 06:28:50PM +0100, Russell King - ARM Linux wrote:
> On Fri, Sep 02, 2011 at 02:46:17PM +0100, Ben Hutchings wrote:
> > On Fri, 2011-09-02 at 13:08 +0200, Phil Sutter wrote:
> > > This flushes the cache before and after accessing the mmapped packet
> > > buffer. It seems like the call to flush_dcache_page from inside
> > > __packet_get_status is not enough on Kirkwood (or ARM in general).
> > > ---
> > > I know this is far from an optimal solution, but it's in fact the only working
> > > one I found.
> > [...]
> > 
> > This is ridiculous.  If flush_dcache_page() isn't doing everything it
> > should, you need to fix that.
> 
> > It does do everything it should - which is to perform maintenance on
> page cache pages.  It flushes the kernel mapping of the page.  It
> also flushes the userspace mappings of the page which it finds by
> walking the mmap list via the associated struct page.  It does not
> touch vmalloc mappings because it has no way to know whether they
> exist or not.
> 
> It doesn't do so much for anonymous pages - to do so would only
> duplicate what flush_anon_page() does at the very same callsites.
> Plus the mmap list isn't available for such pages so there's no
> way to find out what userspace addresses to flush.

Indeed very interesting information, thanks a lot!

The code in question uses __get_free_pages(), and if that fails uses
vmalloc() (see alloc_one_pg_vec_page() for reference). Both code paths
result in the same faulty behaviour.

> If the AF_PACKET buffers are created from anonymous pages and it's
> using flush_dcache_page(), it's using the wrong interface.

So, in order to fix this, which alternative would you suggest? Quite a
lot of work has been done regarding memory allocation, so I guess
changing that side is a no-go.

Greetings, Phil

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: FW: [PATCH] af_packet: flush complete kernel cache in packet_sendmsg
  2011-09-02 16:49           ` chetan loke
@ 2011-09-06  9:44             ` Phil Sutter
  0 siblings, 0 replies; 19+ messages in thread
From: Phil Sutter @ 2011-09-06  9:44 UTC (permalink / raw)
  To: chetan loke; +Cc: netdev, linux, davem, linux-arm-kernel

On Fri, Sep 02, 2011 at 12:49:47PM -0400, chetan loke wrote:
> On Fri, Sep 2, 2011 at 11:31 AM, Phil Sutter <phil.sutter@viprinet.com> wrote:
> 
> > So far we haven't noticed problems in that direction. I just tried some
> > explicit test: having tcpdump print local timestamps (not the pcap-ones)
> > on every received packet, activating icmp_echo_ignore_all and pinging
> > the host on a dedicated line. I expected to sometimes see a second
> > difference between the two timestamps, as like with sending from time to
> > time a packet should get "lost" in the cache, and then occur to
> > userspace after the next one arrived. Maybe my test is broken, or RX is
> > indeed unaffected.
> >
> 
> You will need high traffic rate. If interested, you could try
> pktgen(with varying packet-load). Keep the packet-payload under 1500
> bytes (don't send jumbo frames) unless you have the following fix:
> commit cc9f01b246ca8e4fa245991840b8076394f86707

Hmm. I don't really get your point here: with higher traffic rates, the
bug should be even harder to identify. Assuming the same behaviour as
for TX, of course. There are no packets lost, just not immediately
transmitted (or never, if it's the last packet to be sent). This is how
it goes:
1) userspace places packet into TX_RING, calls sendto()
2) kernel does not see the packet, reads TP_STATUS_AVAILABLE for the
   given field from the cache
3) userspace places second packet into TX_RING (after the first one,
   since it knows it's there)
4) something happens that makes caches flush
5) userspace calls sendto()
6) kernel sees two packets to be transmitted, sends them out

Analogously for RX, this should mean:
1) tcpdump runs pcap_loop() (which, according to strace, calls poll()
   with a timeout of 1s)
2) kernel receives packet, puts it into RX_RING, sets POLLIN
3) tcpdump's poll() returns, but an unmodified RX_RING is seen
4) something happens that makes caches flush
5) (2) happens again
6) tcpdump's poll() returns, two packets are seen in RX_RING

My tests on TX-side show that this "something that makes caches flush"
actually happens quite frequently. But nevertheless, when receiving a
packet once a second, I'm expecting to occasionally see no packet in two
seconds, and then two in the following. The higher the packet rate, the
harder it should be to notice this phenomenon.
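
The TX sequence above can be replayed with a small userspace toy model.
This is a sketch under loud assumptions: toy_ring, user_queue_packet(),
sync_views() and kernel_sendto() are hypothetical names, the two arrays
stand in for the userspace and kernel views of tp_status, and
sync_views() plays the role of "something happens that makes caches
flush". It does not touch a real TX_RING.

```c
#include <assert.h>

#define RING_FRAMES 4

enum { ST_AVAILABLE = 0, ST_SEND_REQUEST = 1 };

struct toy_ring {
    int user_view[RING_FRAMES];   /* status as written by userspace */
    int kernel_view[RING_FRAMES]; /* status as read by the kernel */
    int user_head;                /* next frame userspace fills */
    int kernel_head;              /* next frame the kernel checks */
};

/* Userspace queues one packet; the kernel view is NOT updated here. */
static void user_queue_packet(struct toy_ring *r)
{
    assert(r->user_view[r->user_head] == ST_AVAILABLE);
    r->user_view[r->user_head] = ST_SEND_REQUEST;
    r->user_head = (r->user_head + 1) % RING_FRAMES;
}

/* Stand-in for whatever event finally makes both views agree. */
static void sync_views(struct toy_ring *r)
{
    for (int i = 0; i < RING_FRAMES; i++)
        r->kernel_view[i] = r->user_view[i];
}

/* Kernel side of sendto(): send every frame it sees as SEND_REQUEST.
 * Completion is simplified: both views are reset immediately. */
static int kernel_sendto(struct toy_ring *r)
{
    int sent = 0;

    while (r->kernel_view[r->kernel_head] == ST_SEND_REQUEST) {
        r->kernel_view[r->kernel_head] = ST_AVAILABLE;
        r->user_view[r->kernel_head] = ST_AVAILABLE;
        r->kernel_head = (r->kernel_head + 1) % RING_FRAMES;
        sent++;
    }
    return sent;
}
```

Queueing one packet and calling kernel_sendto() without a sync sends
nothing; queueing a second packet, syncing, and calling it again sends
both at once, which matches the "nothing in one iteration, two in the
next" symptom.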

> Your Tx path is working because flush_cache_all gets triggered before
> flush_dcache_page. On the Rx path, since you don't have that
> workaround, you will eventually(it's just a matter of time) see this
> problem.

So you say if I called flush_cache_all() _after_ flush_dcache_page() it
wouldn't work?

> Or, delete your patch and try this workaround (in
> __packet_get/set_status) and you may be able to cover both Tx and Rx
> paths.

Oh great, thanks a lot for improving my ugly hacks! :)

Greetings, Phil

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] af_packet: flush complete kernel cache in packet_sendmsg
  2011-09-05 19:57         ` Phil Sutter
@ 2011-09-06  9:57           ` Russell King - ARM Linux
  2011-09-06 11:05             ` Phil Sutter
  0 siblings, 1 reply; 19+ messages in thread
From: Russell King - ARM Linux @ 2011-09-06  9:57 UTC (permalink / raw)
  To: Phil Sutter; +Cc: Ben Hutchings, netdev, David S. Miller, linux-arm-kernel

On Mon, Sep 05, 2011 at 09:57:14PM +0200, Phil Sutter wrote:
> Hi,
> 
> On Fri, Sep 02, 2011 at 06:28:50PM +0100, Russell King - ARM Linux wrote:
> > On Fri, Sep 02, 2011 at 02:46:17PM +0100, Ben Hutchings wrote:
> > > On Fri, 2011-09-02 at 13:08 +0200, Phil Sutter wrote:
> > > > This flushes the cache before and after accessing the mmapped packet
> > > > buffer. It seems like the call to flush_dcache_page from inside
> > > > __packet_get_status is not enough on Kirkwood (or ARM in general).
> > > > ---
> > > > I know this is far from an optimal solution, but it's in fact the only working
> > > > one I found.
> > > [...]
> > > 
> > > This is ridiculous.  If flush_dcache_page() isn't doing everything it
> > > should, you need to fix that.
> > 
> > > It does do everything it should - which is to perform maintenance on
> > page cache pages.  It flushes the kernel mapping of the page.  It
> > also flushes the userspace mappings of the page which it finds by
> > walking the mmap list via the associated struct page.  It does not
> > touch vmalloc mappings because it has no way to know whether they
> > exist or not.
> > 
> > It doesn't do so much for anonymous pages - to do so would only
> > duplicate what flush_anon_page() does at the very same callsites.
> > Plus the mmap list isn't available for such pages so there's no
> > way to find out what userspace addresses to flush.
> 
> Indeed very interesting information, thanks a lot!
> 
> The code in question uses __get_free_pages(), and if that fails uses
> vmalloc() (see alloc_one_pg_vec_page() for reference). Both code paths
> > result in the same faulty behaviour.

So, what you're wanting is cache coherency between vmalloc() and
userspace.  There is no API in the kernel to do that, and you'll see
the same failures of this interface not only on ARM but also other
architectures with virtual caches.

It sounds like we need an API to flush the cache using both the
userspace address, plus the kernel side address be that in the direct
map or the vmalloc map areas.

Or maybe the right solution is to simply disable AF_PACKET MMAP support
for virtual cached architectures - it may be that adding cache flushing
calls makes the thing too expensive and the benefits of mmap over normal
read/write are lost.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] af_packet: flush complete kernel cache in packet_sendmsg
  2011-09-06  9:57           ` Russell King - ARM Linux
@ 2011-09-06 11:05             ` Phil Sutter
  0 siblings, 0 replies; 19+ messages in thread
From: Phil Sutter @ 2011-09-06 11:05 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Ben Hutchings, netdev, David S. Miller, linux-arm-kernel

On Tue, Sep 06, 2011 at 10:57:22AM +0100, Russell King - ARM Linux wrote:
> > The code in question uses __get_free_pages(), and if that fails uses
> > vmalloc() (see alloc_one_pg_vec_page() for reference). Both code paths
> > result in the same faulty behaviour.
> 
> So, what you're wanting is cache coherency between vmalloc() and
> userspace.  There is no API in the kernel to do that, and you'll see
> the same failures of this interface not only on ARM but also other
> architectures with virtual caches.
> 
> It sounds like we need an API to flush the cache using both the
> userspace address, plus the kernel side address be that in the direct
> map or the vmalloc map areas.
> 
> Or maybe the right solution is to simply disable AF_PACKET MMAP support
> for virtual cached architectures - it may be that adding cache flushing
> calls makes the thing too expensive and the benefits of mmap over normal
> read/write are lost.

OK, that's horrible. Of course we depend on just this combination to
work flawlessly, i.e. PACKET_MMAP && VIVT. :(

Another userspace-interface I'm working on uses a different solution:
memory is allocated in userspace and accessed from kernelspace using
get_user_pages(). I did not explicitly search for the earlier described
fault pattern, but we didn't notice any problem with this approach on
the very same hardware either. I already see myself writing TPACKET_V3.
;)

What do you think?

Greetings, Phil

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2011-09-06 11:05 UTC | newest]

Thread overview: 19+ messages
2011-04-08 13:06 ARM, AF_PACKET: caching problems on Marvell Kirkwood Phil Sutter
2011-05-05 14:11 ` Phil Sutter
2011-05-05 14:56   ` Eric Dumazet
2011-05-06 16:12     ` Phil Sutter
2011-05-05 19:46   ` Andrew Lunn
2011-05-06 16:17     ` Phil Sutter
2011-05-09  8:59       ` Phil Sutter
2011-05-25 10:32       ` Phil Sutter
2011-09-02 11:08   ` [PATCH] af_packet: flush complete kernel cache in packet_sendmsg Phil Sutter
2011-09-02 13:46     ` Ben Hutchings
2011-09-02 13:59       ` Phil Sutter
2011-09-02 17:28       ` Russell King - ARM Linux
2011-09-05 19:57         ` Phil Sutter
2011-09-06  9:57           ` Russell King - ARM Linux
2011-09-06 11:05             ` Phil Sutter
     [not found]     ` <D3F292ADF945FB49B35E96C94C2061B90A239361@nsmail.netscout.com>
2011-09-02 14:00       ` FW: " chetan loke
2011-09-02 15:31         ` Phil Sutter
2011-09-02 16:49           ` chetan loke
2011-09-06  9:44             ` Phil Sutter
