Netdev List
* [PATCH 0/2] AF_PACKET fanout support
From: David Miller @ 2011-07-05  4:20 UTC (permalink / raw)
  To: victor; +Cc: netdev


This is a fully functional version; I've tested both hash and
load-balance modes successfully.  I plan to commit this to
net-next-2.6 very soon.
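As a rough model of the two modes tested here: in hash mode, every packet of a flow is steered to the same member socket, while in load-balance mode packets are handed out in rotation regardless of flow. The sketch below is a userspace illustration only. `pick_hash()` and `pick_lb()` are hypothetical names, and the kernel may derive the member index from the flow hash differently than a plain modulo.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of fanout member selection; not kernel code. */

/* Hash mode: the same flow hash always maps to the same member,
 * so all packets of one flow land on one socket. */
static unsigned int pick_hash(uint32_t flow_hash, unsigned int members)
{
	assert(members > 0);
	return flow_hash % members;
}

/* Load-balance mode: hand packets out in rotation, ignoring flows. */
static unsigned int pick_lb(unsigned int *counter, unsigned int members)
{
	assert(members > 0);
	return (*counter)++ % members;
}
```

With 4 members, repeated packets of one flow keep returning the same index from `pick_hash()`, while successive calls to `pick_lb()` cycle through 0, 1, 2, 3.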

Below is a test program that other people can play with
if they want.  It forks 4 worker processes and joins them
into one AF_PACKET fanout group.  Each worker prints its
pid in parentheses every time it receives 10 packets.
After each worker processes 10,000 packets, it exits.

Try things like "./test eth0 hash", "./test eth0 lb", etc.
Run it as root, since opening an AF_PACKET socket requires CAP_NET_RAW.

Signed-off-by: David S. Miller <davem@davemloft.net>

--------------------
#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#include <sys/types.h>
#include <sys/wait.h>
#include <sys/socket.h>
#include <sys/ioctl.h>

#include <unistd.h>

#include <linux/if_ether.h>
#include <linux/if_packet.h>

#include <arpa/inet.h>		/* htons() */
#include <net/if.h>

static const char *device_name;
static int fanout_type;
static int fanout_id;

#ifndef PACKET_FANOUT
#define PACKET_FANOUT		18
#define PACKET_FANOUT_HASH		0
#define PACKET_FANOUT_LB		1
#endif

static int setup_socket(void)
{
	int err, fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_IP));
	struct sockaddr_ll ll;
	struct ifreq ifr;
	int fanout_arg;

	if (fd < 0) {
		perror("socket");
		return -1;
	}

	/* Resolve the interface name to an ifindex. */
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, device_name, IFNAMSIZ - 1);
	err = ioctl(fd, SIOCGIFINDEX, &ifr);
	if (err < 0) {
		perror("SIOCGIFINDEX");
		close(fd);
		return -1;
	}

	/* Receive only from that interface. */
	memset(&ll, 0, sizeof(ll));
	ll.sll_family = AF_PACKET;
	ll.sll_ifindex = ifr.ifr_ifindex;
	err = bind(fd, (struct sockaddr *) &ll, sizeof(ll));
	if (err < 0) {
		perror("bind");
		close(fd);
		return -1;
	}

	/* Low 16 bits: fanout group id; high 16 bits: fanout mode. */
	fanout_arg = (fanout_id | (fanout_type << 16));
	err = setsockopt(fd, SOL_PACKET, PACKET_FANOUT,
			 &fanout_arg, sizeof(fanout_arg));
	if (err) {
		perror("setsockopt");
		close(fd);
		return -1;
	}

	return fd;
}

static void fanout_thread(void)
{
	int fd = setup_socket();
	int limit = 10000;

	if (fd < 0)
		exit(EXIT_FAILURE);

	while (limit-- > 0) {
		char buf[1600];
		int err;

		err = read(fd, buf, sizeof(buf));
		if (err < 0) {
			perror("read");
			exit(EXIT_FAILURE);
		}
		if ((limit % 10) == 0)
			fprintf(stdout, "(%d) \n", getpid());
	}

	fprintf(stdout, "%d: Received 10000 packets\n", getpid());

	close(fd);
	exit(0);
}

int main(int argc, char **argp)
{
	int i;

	if (argc != 3) {
		fprintf(stderr, "Usage: %s INTERFACE {hash|lb}\n", argp[0]);
		return EXIT_FAILURE;
	}

	if (!strcmp(argp[2], "hash"))
		fanout_type = PACKET_FANOUT_HASH;
	else if (!strcmp(argp[2], "lb"))
		fanout_type = PACKET_FANOUT_LB;
	else {
		fprintf(stderr, "Unknown fanout type [%s]\n", argp[2]);
		exit(EXIT_FAILURE);
	}

	device_name = argp[1];
	fanout_id = getpid() & 0xffff;

	for (i = 0; i < 4; i++) {
		pid_t pid = fork();

		switch (pid) {
		case 0:
			/* Child: fanout_thread() always calls exit(). */
			fanout_thread();

		case -1:
			perror("fork");
			exit(EXIT_FAILURE);
		}
	}

	for (i = 0; i < 4; i++) {
		int status;

		wait(&status);
	}

	return 0;
}
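
The setsockopt() argument in the test program packs two values into one int: the group id in the low 16 bits and the fanout mode in the high 16 bits (fanout_id | (fanout_type << 16)). The helpers below are a hypothetical sketch that makes that encoding explicit and checkable; the names are illustrative and not part of any kernel or libc API.

```c
#include <assert.h>

/* Layout assumed from the test program: group id in bits 0-15,
 * fanout mode (0 = hash, 1 = lb above) in bits 16-31. */
static int make_fanout_arg(unsigned int group_id, unsigned int mode)
{
	assert(group_id <= 0xffff && mode <= 0xffff);
	return (int)(group_id | (mode << 16));
}

/* Recover the group id from a packed argument. */
static unsigned int fanout_arg_id(int arg)
{
	return (unsigned int)arg & 0xffff;
}

/* Recover the fanout mode from a packed argument. */
static unsigned int fanout_arg_mode(int arg)
{
	return ((unsigned int)arg >> 16) & 0xffff;
}
```

This is also why the test program masks getpid() with 0xffff: anything above 16 bits would spill into the mode field.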

* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Alexey Zaytsev @ 2011-07-05  4:18 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Michael Büsch, Andrew Morton, netdev, Gary Zambrano,
	bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <CAB9v_DGSFAG9V0jqem+tDP3G-N8v6Z+_6oKdPwL-ZwhfhCOZnw@mail.gmail.com>

On Tue, Jul 5, 2011 at 08:17, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
> On Tue, Jul 5, 2011 at 08:14, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Tuesday, 5 July 2011 at 06:11 +0200, Eric Dumazet wrote:
>>> On Tuesday, 5 July 2011 at 07:56 +0400, Alexey Zaytsev wrote:
>>> > On Tue, Jul 5, 2011 at 07:44, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> > >
>>> > > I don't care about duplicate ACKs at this point.
>>> > >
>>> > > That's a separate issue (TCP layer).
>>> > >
>>> >
>>> > Maybe some tx packets are just sent out more than once? Or a single
>>> > packet is sent out instead of some other packets?
>>> > The delays between two dups are short, and they come in bursts, up to a
>>> > few hundred duplicate packets at a time.
>>> >
>>>
>>> That's a completely different problem. SSH is very expensive for your
>>> receiver (your dump1 file has small packets (560 bytes)), and it cannot
>>> cope with the stress.
>>>
>>> You're filling the b44 rx ring, and the b44 driver has no choice but to zap 200
>>> packets at once. This sure is a problem for tcp, as it stalls the thing.
>>>
>>> You could avoid this by doing this on the b44 machine (the receiver):
>>>
>>> echo "4096 32768 87380" >/proc/sys/net/ipv4/tcp_rmem
>>>
>>> So that the sender won't be able to push so many packets.
>>
>> You can also try using more packets in the rx ring (default is 200
>> packets, limit ~511):
>>
>> ethtool -G eth0 rx 400
>>
>>
> Check out starting at packet 302893. 383 _identical_ ACKs were sent
> out by the b44 machine within 30 milliseconds.

In dump1.pcap, that is.

* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Alexey Zaytsev @ 2011-07-05  4:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Michael Büsch, Andrew Morton, netdev, Gary Zambrano,
	bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <1309839258.2720.17.camel@edumazet-laptop>

On Tue, Jul 5, 2011 at 08:14, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tuesday, 5 July 2011 at 06:11 +0200, Eric Dumazet wrote:
>> On Tuesday, 5 July 2011 at 07:56 +0400, Alexey Zaytsev wrote:
>> > On Tue, Jul 5, 2011 at 07:44, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > >
>> > > I don't care about duplicate ACKs at this point.
>> > >
>> > > That's a separate issue (TCP layer).
>> > >
>> >
>> > Maybe some tx packets are just sent out more than once? Or a single
>> > packet is sent out instead of some other packets?
>> > The delays between two dups are short, and they come in bursts, up to a
>> > few hundred duplicate packets at a time.
>> >
>>
>> That's a completely different problem. SSH is very expensive for your
>> receiver (your dump1 file has small packets (560 bytes)), and it cannot
>> cope with the stress.
>>
>> You're filling the b44 rx ring, and the b44 driver has no choice but to zap 200
>> packets at once. This sure is a problem for tcp, as it stalls the thing.
>>
>> You could avoid this by doing this on the b44 machine (the receiver):
>>
>> echo "4096 32768 87380" >/proc/sys/net/ipv4/tcp_rmem
>>
>> So that the sender won't be able to push so many packets.
>
> You can also try using more packets in the rx ring (default is 200
> packets, limit ~511):
>
> ethtool -G eth0 rx 400
>
>
Check out starting at packet 302893. 383 _identical_ ACKs were sent
out by the b44 machine within 30 milliseconds.

>
>

* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Eric Dumazet @ 2011-07-05  4:14 UTC (permalink / raw)
  To: Alexey Zaytsev
  Cc: Michael Büsch, Andrew Morton, netdev, Gary Zambrano,
	bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <1309839068.2720.15.camel@edumazet-laptop>

On Tuesday, 5 July 2011 at 06:11 +0200, Eric Dumazet wrote:
> On Tuesday, 5 July 2011 at 07:56 +0400, Alexey Zaytsev wrote:
> > On Tue, Jul 5, 2011 at 07:44, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > >
> > > I don't care about duplicate ACKs at this point.
> > >
> > > That's a separate issue (TCP layer).
> > >
> > 
> > Maybe some tx packets are just sent out more than once? Or a single
> > packet is sent out instead of some other packets?
> > The delays between two dups are short, and they come in bursts, up to a
> > few hundred duplicate packets at a time.
> > 
> 
> That's a completely different problem. SSH is very expensive for your
> receiver (your dump1 file has small packets (560 bytes)), and it cannot
> cope with the stress.
> 
> You're filling the b44 rx ring, and the b44 driver has no choice but to zap 200
> packets at once. This sure is a problem for tcp, as it stalls the thing.
> 
> You could avoid this by doing this on the b44 machine (the receiver):
> 
> echo "4096 32768 87380" >/proc/sys/net/ipv4/tcp_rmem
> 
> So that the sender won't be able to push so many packets.

You can also try using more packets in the rx ring (default is 200
packets, limit ~511):

ethtool -G eth0 rx 400




* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Eric Dumazet @ 2011-07-05  4:11 UTC (permalink / raw)
  To: Alexey Zaytsev
  Cc: Michael Büsch, Andrew Morton, netdev, Gary Zambrano,
	bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <CAB9v_DFYGyYiXGfMCXn_WDeGTKz8BZPYBCuaDj_a+5VAG3Jn=g@mail.gmail.com>

On Tuesday, 5 July 2011 at 07:56 +0400, Alexey Zaytsev wrote:
> On Tue, Jul 5, 2011 at 07:44, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > I don't care about duplicate ACKs at this point.
> >
> > That's a separate issue (TCP layer).
> >
> 
> Maybe some tx packets are just sent out more than once? Or a single
> packet is sent out instead of some other packets?
> The delays between two dups are short, and they come in bursts, up to a
> few hundred duplicate packets at a time.
> 

That's a completely different problem. SSH is very expensive for your
receiver (your dump1 file has small packets (560 bytes)), and it cannot
cope with the stress.

You're filling the b44 rx ring, and the b44 driver has no choice but to zap 200
packets at once. This sure is a problem for tcp, as it stalls the thing.

You could avoid this by doing this on the b44 machine (the receiver):

echo "4096 32768 87380" >/proc/sys/net/ipv4/tcp_rmem

So that the sender won't be able to push so many packets.
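
The echo above writes three values (min, default, max, in bytes) to tcp_rmem; lowering max caps how large a socket's receive buffer can grow, and with it the window the receiver can advertise. A minimal C sketch of the same write, with the path passed in so it can be exercised on an ordinary file; write_rmem_triple() is a hypothetical name, and writing the real /proc file requires root.

```c
#include <assert.h>
#include <stdio.h>

/* Write "min default max" in the format tcp_rmem expects, like
 *   echo "4096 32768 87380" > PATH
 * Returns 0 on success, -1 on any open/write/close failure. */
static int write_rmem_triple(const char *path, long min, long def, long max)
{
	FILE *f;
	int ok;

	assert(min <= def && def <= max);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%ld %ld %ld\n", min, def, max) > 0;
	if (fclose(f) != 0)	/* flush errors surface here */
		ok = 0;
	return ok ? 0 : -1;
}
```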

> > Do you still have memory scribbles?
> Yes.

OK

> 
> >
> > I wonder if the problem is not coming from the "fast recovery" added in
> > commit 32737e934a952c (PATCH: b44 Handle RX FIFO overflow better
> > (simplified))
> >
> 
> I've tested back to 2.6.27. I did not test all releases of course, so
> maybe this was fixed, and then broken again.




* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Alexey Zaytsev @ 2011-07-05  3:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Michael Büsch, Andrew Morton, netdev, Gary Zambrano,
	bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <1309837443.2720.8.camel@edumazet-laptop>

On Tue, Jul 5, 2011 at 07:44, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tuesday, 5 July 2011 at 02:29 +0400, Alexey Zaytsev wrote:
>> On Tue, Jul 5, 2011 at 00:25, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
>> > On Mon, Jul 4, 2011 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> >> On Monday, 4 July 2011 at 16:43 +0200, Michael Büsch wrote:
>> >>> On Mon, 4 Jul 2011 16:27:26 +0200
>> >>> Michael Büsch <m@bues.ch> wrote:
>> >>> > We do this in b43, which has exactly the same DMA engine.
>> >>>
>> >>> (Ok, it turns out we don't do this in b43 (We only do it on the TX side).
>> >>>  But that's a bug. We should do a wmb() on the RX side before advancing the
>> >>>  descriptor ring pointer.)
>> >>
>> >> I am wondering what happens if RX ring is set to 64, and we receive
>> >> exactly 64 buffers in one round, B44_DMARX_PTR won't change at all?
>> >>
>> >> Alexey, could you try this patch please?
>> >
>> > Sorry, did not help.
>> >
>>
>> Ran a few rounds of tcpdump. Seeing a significant number of duplicate
>> ACKs from the problematic machine. Not seeing them when testing
>> between this machine and another Linux box. Or the illumos machine
>> and the other Linux box.
>>
>> Dumps are available here:
>>
>> http://zaytsev.su/tmp/caps/
>>
>> dump1-3 - between the problematic machine and the illumos box,
>> collected on illumos side. All show dups.
>> dump5 - between another Linux box and the illumos machine, no dups.
>> Collected on the illumos side.
>> dump-linux - between 2 Linux machines, collected on the
>> non-problematic side. No dups, no corruptions.
>>
>> 192.168.0.33 - the problematic machine.
>> 192.168.0.72 - the illumos machine.
>> 192.168.0.122 - another Linux machine.
>
> ??
>
> I don't care about duplicate ACKs at this point.
>
> That's a separate issue (TCP layer).
>

Maybe some tx packets are just sent out more than once? Or a single
packet is sent out instead of some other packets?
The delays between two dups are short, and they come in bursts, up to a
few hundred duplicate packets at a time.

> Do you still have memory scribbles?
Yes.

>
> I wonder if the problem is not coming from the "fast recovery" added in
> commit 32737e934a952c (PATCH: b44 Handle RX FIFO overflow better
> (simplified))
>

I've tested back to 2.6.27. I did not test all releases of course, so
maybe this was fixed, and then broken again.
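
Eric's question about a 64-entry RX ring comes down to modular index arithmetic: if the driver consumes and refills exactly as many descriptors as the ring holds, the new pointer equals the old one, so B44_DMARX_PTR would be rewritten with the same value. The sketch below models that in userspace together with Michael's ordering point (descriptors written before the pointer advances). It is an analogy under stated assumptions: struct rx_ring, RING_SIZE and ring_refill() are hypothetical, and a C11 release store stands in for wmb(). A real driver writes the pointer to an MMIO register, where wmb(), not C11 atomics, is the relevant barrier.

```c
#include <assert.h>
#include <stdatomic.h>

#define RING_SIZE 64	/* hypothetical ring size, as in Eric's example */

/* Userspace stand-in for an RX descriptor ring and its pointer register. */
struct rx_ring {
	unsigned int desc[RING_SIZE];	/* stand-in for DMA descriptors */
	atomic_uint ptr;		/* stand-in for B44_DMARX_PTR */
};

/* Post 'count' buffers starting at the current pointer, then publish
 * the new pointer.  The release store orders the descriptor writes
 * before the pointer update, the role wmb() plays in a driver. */
static void ring_refill(struct rx_ring *r, unsigned int count,
			unsigned int first_buf)
{
	unsigned int start = atomic_load_explicit(&r->ptr,
						  memory_order_relaxed);
	unsigned int i;

	assert(count <= RING_SIZE);
	for (i = 0; i < count; i++)
		r->desc[(start + i) % RING_SIZE] = first_buf + i;

	/* A full lap (count == RING_SIZE) yields the same pointer value. */
	atomic_store_explicit(&r->ptr, (start + count) % RING_SIZE,
			      memory_order_release);
}
```

Refilling all 64 entries starting at index 10 leaves the pointer at 10, indistinguishable from "no progress" by value alone.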

* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Eric Dumazet @ 2011-07-05  3:44 UTC (permalink / raw)
  To: Alexey Zaytsev
  Cc: Michael Büsch, Andrew Morton, netdev, Gary Zambrano,
	bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <CAB9v_DFFX6cHAFBSOGDJRniehJ6pYD7Z5XG6ygTGHy8j=z+U0g@mail.gmail.com>

On Tuesday, 5 July 2011 at 02:29 +0400, Alexey Zaytsev wrote:
> On Tue, Jul 5, 2011 at 00:25, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
> > On Mon, Jul 4, 2011 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> On Monday, 4 July 2011 at 16:43 +0200, Michael Büsch wrote:
> >>> On Mon, 4 Jul 2011 16:27:26 +0200
> >>> Michael Büsch <m@bues.ch> wrote:
> >>> > We do this in b43, which has exactly the same DMA engine.
> >>>
> >>> (Ok, it turns out we don't do this in b43 (We only do it on the TX side).
> >>>  But that's a bug. We should do a wmb() on the RX side before advancing the
> >>>  descriptor ring pointer.)
> >>
> >> I am wondering what happens if RX ring is set to 64, and we receive
> >> exactly 64 buffers in one round, B44_DMARX_PTR won't change at all?
> >>
> >> Alexey, could you try this patch please?
> >
> > Sorry, did not help.
> >
> 
> Ran a few rounds of tcpdump. Seeing a significant number of duplicate
> ACKs from the problematic machine. Not seeing them when testing
> between this machine and another Linux box. Or the illumos machine
> and the other Linux box.
> 
> Dumps are available here:
> 
> http://zaytsev.su/tmp/caps/
> 
> dump1-3 - between the problematic machine and the illumos box,
> collected on illumos side. All show dups.
> dump5 - between another Linux box and the illumos machine, no dups.
> Collected on the illumos side.
> dump-linux - between 2 Linux machines, collected on the
> non-problematic side. No dups, no corruptions.
> 
> 192.168.0.33 - the problematic machine.
> 192.168.0.72 - the illumos machine.
> 192.168.0.122 - another Linux machine.

??

I don't care about duplicate ACKs at this point.

That's a separate issue (TCP layer).

Do you still have memory scribbles?

I wonder if the problem is not coming from the "fast recovery" added in
commit 32737e934a952c (PATCH: b44 Handle RX FIFO overflow better
(simplified))

Maybe we should instead do a fast dequeue of packets (recycling them
instead of pushing them to the upper stack) when too many packets are
ready to be delivered, and always make sure the NIC has a reserve of
available buffers for DMA before it can assert ISTAT_RFO.
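
Eric's suggestion (recycle rather than deliver when too many packets are pending, so the NIC keeps a reserve of free DMA buffers) can be modeled as a simple accounting rule. This sketch is hypothetical: rx_split(), the reserve threshold, and the choice to recycle exactly the shortfall are illustrative assumptions, not b44 driver code or a proposed patch.

```c
#include <assert.h>

/* Hypothetical policy model, not driver code.
 *   ring_size: total RX descriptors
 *   ready:     filled descriptors waiting to be processed
 *   reserve:   free descriptors the NIC should always have
 * Splits 'ready' into packets handed up the stack and packets whose
 * buffers are recycled in place (dropped) to restore the reserve. */
static void rx_split(unsigned int ring_size, unsigned int ready,
		     unsigned int reserve, unsigned int *deliver,
		     unsigned int *recycle)
{
	unsigned int free_slots, deficit;

	assert(ready <= ring_size && reserve < ring_size);

	free_slots = ring_size - ready;
	/* How far the NIC is below its reserve of free buffers. */
	deficit = free_slots >= reserve ? 0 : reserve - free_slots;

	*recycle = deficit;		/* never exceeds 'ready' */
	*deliver = ready - deficit;
}
```

With the b44 default ring of 200 entries and an assumed reserve of 32, a completely full ring would deliver 168 packets and recycle 32, while a half-full ring would deliver everything.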




* Re: [PATCH net-next] net/wireless: ipw2x00: Use helpers from linux/etherdevice.h
From: David Miller @ 2011-07-05  3:31 UTC (permalink / raw)
  To: tklauser; +Cc: linville, netdev, linux-wireless
In-Reply-To: <1309773622-28510-1-git-send-email-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Mon,  4 Jul 2011 12:00:22 +0200

> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied.

* Re: [PATCH net-next] net: vxge: Use is_multicast_ether_addr helper
From: David Miller @ 2011-07-05  3:31 UTC (permalink / raw)
  To: tklauser; +Cc: jdmason, netdev
In-Reply-To: <1309773484-27514-1-git-send-email-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Mon,  4 Jul 2011 11:58:04 +0200

> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied.

* Re: [PATCH net-next] net: ewrk3: Use helpers from linux/etherdevice.h
From: David Miller @ 2011-07-05  3:31 UTC (permalink / raw)
  To: tklauser; +Cc: netdev
In-Reply-To: <1309773382-26609-1-git-send-email-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Mon,  4 Jul 2011 11:56:22 +0200

> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied.

* Re: [PATCH net-next] net: depca: Use helpers from linux/etherdevice.h
From: David Miller @ 2011-07-05  3:31 UTC (permalink / raw)
  To: tklauser; +Cc: netdev
In-Reply-To: <1309773353-26210-1-git-send-email-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Mon,  4 Jul 2011 11:55:53 +0200

> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied.

* Re: [PATCH net-next] net: rionet: Use is_multicast_ether_addr
From: David Miller @ 2011-07-05  3:31 UTC (permalink / raw)
  To: tklauser; +Cc: netdev
In-Reply-To: <1309773294-25395-1-git-send-email-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Mon,  4 Jul 2011 11:54:54 +0200

> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied.

* Re: [PATCH net-next] net: iseries_veth: Use is_unicast_ether_addr helper
From: David Miller @ 2011-07-05  3:31 UTC (permalink / raw)
  To: tklauser; +Cc: netdev
In-Reply-To: <1309773123-24049-1-git-send-email-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Mon,  4 Jul 2011 11:52:03 +0200

> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied.

* Re: [PATCH net-next] net: de4x5: Use helpers from linux/etherdevice.h
From: David Miller @ 2011-07-05  3:31 UTC (permalink / raw)
  To: tklauser; +Cc: grundler, netdev
In-Reply-To: <1309772893-22585-1-git-send-email-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Mon,  4 Jul 2011 11:48:13 +0200

> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied.

* Re: [PATCH net-next] net: igb: Use is_multicast_ether_addr helper
From: David Miller @ 2011-07-05  3:31 UTC (permalink / raw)
  To: tklauser; +Cc: e1000-devel, netdev
In-Reply-To: <1309773015-23598-1-git-send-email-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Mon,  4 Jul 2011 11:50:15 +0200

> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied.

------------------------------------------------------------------------------
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security 
threats, fraudulent activity, and more. Splunk takes this data and makes 
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2d-c2
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired

* Re: [PATCH net-next] net: e1000e: Use is_multicast_ether_addr helper
From: David Miller @ 2011-07-05  3:30 UTC (permalink / raw)
  To: tklauser; +Cc: e1000-devel, netdev
In-Reply-To: <1309772824-22157-1-git-send-email-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Mon,  4 Jul 2011 11:47:04 +0200

> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied.

* Re: [PATCHv2 0/2] Minor documentation updates for ip-related tunables
From: David Miller @ 2011-07-05  2:27 UTC (permalink / raw)
  To: makc; +Cc: linux-sctp, netdev
In-Reply-To: <20110705015718.32777815DBB1@regina.usersys.redhat.com>

From: Max Matveev <makc@redhat.com>
Date: Tue,  5 Jul 2011 11:57:18 +1000 (EST)

> Confusion about the way SCTP uses its rmem and wmem tunables
> prompted a documentation revision.
> 
> v2: incorporate suggestions by Shan Wei and Neil Horman.
> 
> Max Matveev (2):
>   Update description of net.sctp.sctp_rmem and net.sctp.sctp_wmem tunables
>   Update documented default values for various TCP/UDP tunables

Applied to net-next-2.6, thanks.

* Re: [PATCH v2 1/2] Update description of net.sctp.sctp_rmem and net.sctp.sctp_wmem tunables
From: Shan Wei @ 2011-07-05  2:01 UTC (permalink / raw)
  To: Max Matveev; +Cc: linux-sctp, netdev
In-Reply-To: <20110705015723.8B226815DBB1@regina.usersys.redhat.com>

Max Matveev wrote, at 06/20/2011 04:08 PM:
> sctp does not use second and third ("default" and "max") values
> of sctp_rmem tunable. The format is the same as tcp_rmem
> but the meaning is different so make the documentation explicit to
> avoid confusion.
> 
> sctp_wmem is not used at all.
> 
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> Signed-off-by: Max Matveev <makc@redhat.com>

Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com>


-- 
Best Regards
-----
Shan Wei

* Re: [PATCH 1/2] Update description of net.sctp.sctp_rmem and net.sctp.sctp_wmem tunables
From: Max Matveev @ 2011-07-05  2:00 UTC (permalink / raw)
  To: Neil Horman; +Cc: linux-sctp, netdev
In-Reply-To: <20110704145454.GA10310@hmsreliant.think-freely.org>

On Mon, 4 Jul 2011 10:54:54 -0400, Neil Horman wrote:

 nhorman> On Mon, Jun 20, 2011 at 06:08:10PM +1000, Max Matveev wrote:

 >> sctp_rmem - vector of 3 INTEGERs: min, default, max
 >> -	See tcp_rmem for a description.
 >> +	Only the first value ("min") is used, "default" and "max" are
 >> +	ignored and may be removed in the future versions.
 >> +

 nhorman> It's accurate to say that only the first value is used
 nhorman> currently, but because of the way this sysctl is constructed
 nhorman> (it's used by the sysctl_rmem pointer in the sctp_prot
 nhorman> struct, which expects an array of three integers in the
 nhorman> common __sk_mem_schedule function), we won't be removing
 nhorman> the other two values.

Technically it could be just a single integer (UDP uses it
that way), but I'm not going to argue; v2 of the patch removed
that bit.

max
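
As the v2 patch documents, sctp_rmem keeps the three-integer format of tcp_rmem, but SCTP only consults the first ("min") value. The sketch below parses such a vector (for instance the contents of /proc/sys/net/sctp/sctp_rmem) and returns only the value SCTP uses; parse_mem_vector() and sctp_effective_rmem_min() are hypothetical names. It also accepts a lone integer, in line with the remark that a single value would technically suffice.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a "min default max" vector; returns how many fields were read. */
static int parse_mem_vector(const char *s, long v[3])
{
	return sscanf(s, "%ld %ld %ld", &v[0], &v[1], &v[2]);
}

/* Return the only value SCTP consults ("min"), or -1 if unparsable.
 * "default" and "max", if present, are parsed but ignored. */
static long sctp_effective_rmem_min(const char *s)
{
	long v[3];

	if (parse_mem_vector(s, v) < 1)
		return -1;
	return v[0];
}
```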

* Re: [PATCH 1/2] Update description of net.sctp.sctp_rmem and net.sctp.sctp_wmem tunables
From: Max Matveev @ 2011-07-05  1:58 UTC (permalink / raw)
  To: Shan Wei; +Cc: linux-sctp, netdev
In-Reply-To: <4E126A1E.7070300@cn.fujitsu.com>

On Tue, 05 Jul 2011 09:34:22 +0800, Shan Wei wrote:

 shanwei> Max Matveev wrote, at 06/20/2011 04:08 PM:
 >> sctp does not use second and third ("default" and "max") values
 >> of sctp_(r|w)mem tunables. 

 shanwei> You set out to avoid confusion, but introduced new confusion.
 shanwei> I hope you can also correct your changelog in the next version.
Done - just sent v2 of the patch.

max

* [PATCH v2 2/2] Update documented default values for various TCP/UDP tunables
From: Max Matveev @ 2011-06-22  7:18 UTC (permalink / raw)
  To: linux-sctp; +Cc: netdev

tcp_rmem and tcp_wmem use 1 page as default value for the minimum
amount of memory to be used, same as udp_wmem_min and udp_rmem_min.
Pages are different size on different architectures - use the right
units when describing the defaults.

Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Max Matveev <makc@redhat.com>
---
 Documentation/networking/ip-sysctl.txt |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index ce09f83..940a404 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -394,7 +394,7 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max
 	min: Minimal size of receive buffer used by TCP sockets.
 	It is guaranteed to each TCP socket, even under moderate memory
 	pressure.
-	Default: 8K
+	Default: 1 page
 
 	default: initial size of receive buffer used by TCP sockets.
 	This value overrides net.core.rmem_default used by other protocols.
@@ -483,7 +483,7 @@ tcp_window_scaling - BOOLEAN
 tcp_wmem - vector of 3 INTEGERs: min, default, max
 	min: Amount of memory reserved for send buffers for TCP sockets.
 	Each TCP socket has rights to use it due to fact of its birth.
-	Default: 4K
+	Default: 1 page
 
 	default: initial size of send buffer used by TCP sockets.  This
 	value overrides net.core.wmem_default used by other protocols.
@@ -553,13 +553,13 @@ udp_rmem_min - INTEGER
 	Minimal size of receive buffer used by UDP sockets in moderation.
 	Each UDP socket is able to use the size for receiving data, even if
 	total pages of UDP sockets exceed udp_mem pressure. The unit is byte.
-	Default: 4096
+	Default: 1 page
 
 udp_wmem_min - INTEGER
 	Minimal size of send buffer used by UDP sockets in moderation.
 	Each UDP socket is able to use the size for sending data, even if
 	total pages of UDP sockets exceed udp_mem pressure. The unit is byte.
-	Default: 4096
+	Default: 1 page
 
 CIPSOv4 Variables:
 
-- 
1.7.3.3
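
Because the page size varies by architecture, "1 page" is the only portable way to state these defaults. A minimal sketch of finding what that means in bytes on a given system, using POSIX sysconf(_SC_PAGESIZE); page_bytes() is a hypothetical helper name.

```c
#include <assert.h>
#include <unistd.h>

/* Size of "1 page" in bytes on the running system: 4096 on x86,
 * larger on some other architectures. */
static long page_bytes(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	assert(sz > 0);	/* sysconf returns -1 on error */
	return sz;
}
```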


* [PATCH v2 1/2] Update description of net.sctp.sctp_rmem and net.sctp.sctp_wmem tunables
From: Max Matveev @ 2011-06-20  8:08 UTC (permalink / raw)
  To: linux-sctp; +Cc: netdev

sctp does not use second and third ("default" and "max") values
of sctp_rmem tunable. The format is the same as tcp_rmem
but the meaning is different so make the documentation explicit to
avoid confusion.

sctp_wmem is not used at all.

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Max Matveev <makc@redhat.com>
---
 Documentation/networking/ip-sysctl.txt |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index d3d653a..ce09f83 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1465,10 +1465,17 @@ sctp_mem - vector of 3 INTEGERs: min, pressure, max
 	Default is calculated at boot time from amount of available memory.
 
 sctp_rmem - vector of 3 INTEGERs: min, default, max
-	See tcp_rmem for a description.
+	Only the first value ("min") is used, "default" and "max" are
+	ignored.
+
+	min: Minimal size of receive buffer used by SCTP socket.
+	It is guaranteed to each SCTP socket (but not association) even 
+	under moderate memory pressure.
+
+	Default: 1 page
 
 sctp_wmem  - vector of 3 INTEGERs: min, default, max
-	See tcp_wmem for a description.
+	Currently this tunable has no effect.
 
 addr_scope_policy - INTEGER
 	Control IPv4 address scoping - draft-stewart-tsvwg-sctp-ipv4-00
-- 
1.7.3.3


* [PATCHv2 0/2] Minor documentation updates for ip-related tunables
From: Max Matveev @ 2011-07-05  1:57 UTC (permalink / raw)
  To: linux-sctp; +Cc: netdev

Confusion about the way SCTP uses its rmem and wmem tunables
prompted a documentation revision.

v2: incorporate suggestions by Shan Wei and Neil Horman.

Max Matveev (2):
  Update description of net.sctp.sctp_rmem and net.sctp.sctp_wmem tunables
  Update documented default values for various TCP/UDP tunables

 Documentation/networking/ip-sysctl.txt |   19 +++++++++++++------
 1 files changed, 13 insertions(+), 6 deletions(-)

-- 
1.7.3.3


* Re: [PATCH 2/2] Update documented default values for various TCP/UDP tunables
From: Shan Wei @ 2011-07-05  1:36 UTC (permalink / raw)
  To: Max Matveev; +Cc: linux-sctp, netdev
In-Reply-To: <20110704083616.43F2C8156C57@regina.usersys.redhat.com>

Max Matveev wrote, at 06/22/2011 03:18 PM:
> tcp_rmem and tcp_wmem use 1 page as default value for the minimum
> amount of memory to be used, same as udp_wmem_min and udp_rmem_min.
> Pages are different size on different architectures - use the right
> units when describing the defaults.
> 
> Signed-off-by: Max Matveev <makc@redhat.com>

Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com>


-- 
Best Regards
-----
Shan Wei

* Re: [PATCH 1/2] Update description of net.sctp.sctp_rmem and net.sctp.sctp_wmem tunables
From: Shan Wei @ 2011-07-05  1:34 UTC (permalink / raw)
  To: Max Matveev; +Cc: linux-sctp, netdev
In-Reply-To: <20110704083605.AF9C28156C57@regina.usersys.redhat.com>

Max Matveev wrote, at 06/20/2011 04:08 PM:
> sctp does not use second and third ("default" and "max") values
> of sctp_(r|w)mem tunables. 

You set out to avoid confusion, but introduced new confusion.
I hope you can also correct your changelog in the next version.


-- 
Best Regards
-----
Shan Wei
