Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH 01/17] mm: Serialize access to min_free_kbytes
From: Mel Gorman @ 2012-05-10 13:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Neil Brown,
	Peter Zijlstra, Mike Christie, Eric B Munson, Mel Gorman
In-Reply-To: <1336657510-24378-1-git-send-email-mgorman@suse.de>

There is a race between the min_free_kbytes sysctl, memory hotplug
and transparent hugepage support enablement.  Memory hotplug uses a
zonelists_mutex to avoid a race when building zonelists. Reuse it to
serialise watermark updates.

[a.p.zijlstra@chello.nl: Older patch fixed the race with spinlock]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
---
 mm/page_alloc.c |   23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a712fb9..280eabe 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4976,14 +4976,7 @@ static void setup_per_zone_lowmem_reserve(void)
 	calculate_totalreserve_pages();
 }
 
-/**
- * setup_per_zone_wmarks - called when min_free_kbytes changes
- * or when memory is hot-{added|removed}
- *
- * Ensures that the watermark[min,low,high] values for each zone are set
- * correctly with respect to min_free_kbytes.
- */
-void setup_per_zone_wmarks(void)
+static void __setup_per_zone_wmarks(void)
 {
 	unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
 	unsigned long lowmem_pages = 0;
@@ -5038,6 +5031,20 @@ void setup_per_zone_wmarks(void)
 	calculate_totalreserve_pages();
 }
 
+/**
+ * setup_per_zone_wmarks - called when min_free_kbytes changes
+ * or when memory is hot-{added|removed}
+ *
+ * Ensures that the watermark[min,low,high] values for each zone are set
+ * correctly with respect to min_free_kbytes.
+ */
+void setup_per_zone_wmarks(void)
+{
+	mutex_lock(&zonelists_mutex);
+	__setup_per_zone_wmarks();
+	mutex_unlock(&zonelists_mutex);
+}
+
 /*
  * The inactive anon list should be small enough that the VM never has to
  * do too much work, but large enough that each inactive page has a chance
-- 
1.7.9.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

* [PATCH 00/17] Swap-over-NBD without deadlocking V10
From: Mel Gorman @ 2012-05-10 13:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Neil Brown,
	Peter Zijlstra, Mike Christie, Eric B Munson, Mel Gorman

Changelog since V9
  o Rebase to 3.4-rc5
  o Clarify comment on why PF_MEMALLOC is cleared in softirq handling (akpm)
  o Only set page->pfmemalloc if ALLOC_NO_WATERMARKS was required     (rientjes)

Changelog since V8
  o Rebase to 3.4-rc2
  o Use page flag instead of slab fields to keep structures the same size
  o Properly detect allocations from softirq context that use PF_MEMALLOC
  o Ensure kswapd does not sleep while processes are throttled
  o Do not accidentally throttle !_GFP_FS processes indefinitely

Changelog since V7
  o Rebase to 3.3-rc2
  o Take greater care propagating page->pfmemalloc to skb
  o Propagate pfmemalloc from netdev_alloc_page to skb where possible
  o Release RCU lock properly on preempt kernel

Changelog since V6
  o Rebase to 3.1-rc8
  o Use wake_up instead of wake_up_interruptible()
  o Do not throttle kernel threads
  o Avoid a potential race between kswapd going to sleep and processes being
    throttled

Changelog since V5
  o Rebase to 3.1-rc5

Changelog since V4
  o Update comment clarifying what protocols can be used		(Michal)
  o Rebase to 3.0-rc3

Changelog since V3
  o Propogate pfmemalloc from packet fragment pages to skb		(Neil)
  o Rebase to 3.0-rc2

Changelog since V2
  o Document that __GFP_NOMEMALLOC overrides __GFP_MEMALLOC		(Neil)
  o Use wait_event_interruptible					(Neil)
  o Use !! when casting to bool to avoid any possibilitity of type
    truncation								(Neil)
  o Nicer logic when using skb_pfmemalloc_protocol			(Neil)

Changelog since V1
  o Rebase on top of mmotm
  o Use atomic_t for memalloc_socks		(David Miller)
  o Remove use of sk_memalloc_socks in vmscan	(Neil Brown)
  o Check throttle within prepare_to_wait	(Neil Brown)
  o Add statistics on throttling instead of printk

When a user or administrator requires swap for their application, they
create a swap partition and file, format it with mkswap and activate it
with swapon. Swap over the network is considered as an option in diskless
systems. The two likely scenarios are when blade servers are used as part
of a cluster where the form factor or maintenance costs do not allow the
use of disks and thin clients.

The Linux Terminal Server Project recommends the use of the
Network Block Device (NBD) for swap according to the manual at
https://sourceforge.net/projects/ltsp/files/Docs-Admin-Guide/LTSPManual.pdf/download
There is also documentation and tutorials on how to setup swap over NBD
at places like https://help.ubuntu.com/community/UbuntuLTSP/EnableNBDSWAP
The nbd-client also documents the use of NBD as swap. Despite this, the
fact is that a machine using NBD for swap can deadlock within minutes if
swap is used intensively. This patch series addresses the problem.

The core issue is that network block devices do not use mempools like
normal block devices do. As the host cannot control where they receive
packets from, they cannot reliably work out in advance how much memory
they might need. Some years ago, Peter Ziljstra developed a series of
patches that supported swap over an NFS that at least one distribution
is carrying within their kernels. This patch series borrows very heavily
from Peter's work to support swapping over NBD as a pre-requisite to
supporting swap-over-NFS. The bulk of the complexity is concerned with
preserving memory that is allocated from the PFMEMALLOC reserves for use
by the network layer which is needed for both NBD and NFS.

Patch 1 serialises access to min_free_kbytes. It's not strictly needed
	by this series but as the series cares about watermarks in
	general, it's a harmless fix. It could be merged independently
	and may be if CMA is merged in advance.

Patch 2 adds knowledge of the PFMEMALLOC reserves to SLAB and SLUB to
	preserve access to pages allocated under low memory situations
	to callers that are freeing memory.

Patch 3 introduces __GFP_MEMALLOC to allow access to the PFMEMALLOC
	reserves without setting PFMEMALLOC.

Patch 4 opens the possibility for softirqs to use PFMEMALLOC reserves
	for later use by network packet processing.

Patch 6 ignores memory policies when ALLOC_NO_WATERMARKS is set.

Patch 7 only sets page->pfmemalloc when ALLOC_NO_WATERMARKS was required

Patches 8-14 allows network processing to use PFMEMALLOC reserves when
	the socket has been marked as being used by the VM to clean pages. If
	packets are received and stored in pages that were allocated under
	low-memory situations and are unrelated to the VM, the packets
	are dropped.

	Patch 11 reintroduces __netdev_alloc_page which the networking
	folk may object to but is needed in some cases to propogate
	pfmemalloc from a newly allocated page to an skb. If there is a
	strong objection, this patch can be dropped with the impact being
	that swap-over-network will be slower in some cases but it should
	not fail.

Patch 14 is a micro-optimisation to avoid a function call in the
	common case.

Patch 15 tags NBD sockets as being SOCK_MEMALLOC so they can use
	PFMEMALLOC if necessary.

Patch 16 notes that it is still possible for the PFMEMALLOC reserve
	to be depleted. To prevent this, direct reclaimers get throttled on
	a waitqueue if 50% of the PFMEMALLOC reserves are depleted.  It is
	expected that kswapd and the direct reclaimers already running
	will clean enough pages for the low watermark to be reached and
	the throttled processes are woken up.

Patch 17 adds a statistic to track how often processes get throttled

Some basic performance testing was run using kernel builds, netperf
on loopback for UDP and TCP, hackbench (pipes and sockets), iozone
and sysbench. Each of them were expected to use the sl*b allocators
reasonably heavily but there did not appear to be significant
performance variances.

For testing swap-over-NBD, a machine was booted with 2G of RAM with a
swapfile backed by NBD. 8*NUM_CPU processes were started that create
anonymous memory mappings and read them linearly in a loop. The total
size of the mappings were 4*PHYSICAL_MEMORY to use swap heavily under
memory pressure.

Without the patches and using SLUB, the machine locks up within minutes and
runs to completion with them applied. With SLAB, the story is different
as an unpatched kernel run to completion. However, the patched kernel
completed the test 40% faster.

                                         3.4.0-rc2     3.4.0-rc2
                                      vanilla-slab     swapnbd
Sys Time Running Test (seconds)              87.90     73.45
User+Sys Time Running Test (seconds)         91.93     76.91
Total Elapsed Time (seconds)               4174.37   2953.96

 drivers/block/nbd.c                               |    6 +-
 drivers/net/ethernet/chelsio/cxgb4/sge.c          |    2 +-
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c        |    2 +-
 drivers/net/ethernet/intel/igb/igb_main.c         |    2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |    2 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |    3 +-
 drivers/net/usb/cdc-phonet.c                      |    2 +-
 drivers/usb/gadget/f_phonet.c                     |    2 +-
 include/linux/gfp.h                               |   13 +-
 include/linux/mm_types.h                          |    9 +
 include/linux/mmzone.h                            |    1 +
 include/linux/page-flags.h                        |   28 +++
 include/linux/sched.h                             |    7 +
 include/linux/skbuff.h                            |   83 +++++++-
 include/linux/vm_event_item.h                     |    1 +
 include/net/sock.h                                |   19 ++
 include/trace/events/gfpflags.h                   |    1 +
 kernel/softirq.c                                  |    9 +
 mm/page_alloc.c                                   |   69 +++++--
 mm/slab.c                                         |  212 +++++++++++++++++++--
 mm/slub.c                                         |   28 ++-
 mm/vmscan.c                                       |  131 ++++++++++++-
 mm/vmstat.c                                       |    1 +
 net/core/dev.c                                    |   52 ++++-
 net/core/filter.c                                 |    8 +
 net/core/skbuff.c                                 |   94 +++++++--
 net/core/sock.c                                   |   42 ++++
 net/ipv4/tcp.c                                    |    3 +-
 net/ipv4/tcp_output.c                             |   16 +-
 net/ipv6/tcp_ipv6.c                               |    8 +-
 30 files changed, 765 insertions(+), 91 deletions(-)

-- 
1.7.9.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 1/2 net] 6lowpan: add missing pskb_may_pull() check
From: Eric Dumazet @ 2012-05-10 13:30 UTC (permalink / raw)
  To: alex.bluesman.smirnov; +Cc: davem, netdev
In-Reply-To: <4fabc166.9208cc0a.4de8.ffff9211@mx.google.com>

On Thu, 2012-05-10 at 17:22 +0400, alex.bluesman.smirnov@gmail.com
wrote:
> From: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
> 
> Add pskb_may_pull() call when fetching u8 from skb.
> 
> Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
> ---
>  net/ieee802154/6lowpan.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
> index 32eb417..0ab3efe 100644
> --- a/net/ieee802154/6lowpan.c
> +++ b/net/ieee802154/6lowpan.c
> @@ -295,6 +295,8 @@ static u8 lowpan_fetch_skb_u8(struct sk_buff *skb)
>  {
>  	u8 ret;
>  
> +	BUG_ON(!pskb_may_pull(skb, 1));
> +
>  	ret = skb->data[0];
>  	skb_pull(skb, 1);
>  

No, you cant do that.

pskb_may_pull() can fail, and you crash your machine instead of graceful
error reporting.

^ permalink raw reply

* [PATCH net-next] 6lowpan: IPv6 link local address
From: Alexander Smirnov @ 2012-05-10 13:25 UTC (permalink / raw)
  To: davem; +Cc: netdev, Alexander Smirnov

According to the RFC4944 (Transmission of IPv6 Packets over
IEEE 802.15.4 Networks), chapter 7:

The IPv6 link-local address [RFC4291] for an IEEE 802.15.4 interface
is formed by appending the Interface Identifier, as defined above, to
the prefix FE80::/64.

  10 bits            54 bits                  64 bits
+----------+-----------------------+----------------------------+
|1111111010|         (zeros)       |    Interface Identifier    |
+----------+-----------------------+----------------------------+

This patch adds IPv6 address generation support for the 6lowpan
interfaces.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
---
 net/ipv6/addrconf.c |   14 +++++++++++++-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index e3b3421..8b7f100 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -66,6 +66,7 @@
 #include <net/sock.h>
 #include <net/snmp.h>
 
+#include <net/af_ieee802154.h>
 #include <net/ipv6.h>
 #include <net/protocol.h>
 #include <net/ndisc.h>
@@ -1514,6 +1515,14 @@ static int addrconf_ifid_eui48(u8 *eui, struct net_device *dev)
 	return 0;
 }
 
+static int addrconf_ifid_eui64(u8 *eui, struct net_device *dev)
+{
+	if (dev->addr_len != IEEE802154_ADDR_LEN)
+		return -1;
+	memcpy(eui, dev->dev_addr, 8);
+	return 0;
+}
+
 static int addrconf_ifid_arcnet(u8 *eui, struct net_device *dev)
 {
 	/* XXX: inherit EUI-64 from other interface -- yoshfuji */
@@ -1577,6 +1586,8 @@ static int ipv6_generate_eui64(u8 *eui, struct net_device *dev)
 		return addrconf_ifid_sit(eui, dev);
 	case ARPHRD_IPGRE:
 		return addrconf_ifid_gre(eui, dev);
+	case ARPHRD_IEEE802154:
+		return addrconf_ifid_eui64(eui, dev);
 	}
 	return -1;
 }
@@ -2438,7 +2449,8 @@ static void addrconf_dev_config(struct net_device *dev)
 	    (dev->type != ARPHRD_FDDI) &&
 	    (dev->type != ARPHRD_IEEE802_TR) &&
 	    (dev->type != ARPHRD_ARCNET) &&
-	    (dev->type != ARPHRD_INFINIBAND)) {
+	    (dev->type != ARPHRD_INFINIBAND) &&
+	    (dev->type != ARPHRD_IEEE802154)) {
 		/* Alas, we support only Ethernet autoconfiguration. */
 		return;
 	}
-- 
1.7.2.3

^ permalink raw reply related

* Re: [PATCH 9/9] sunrpc: use SKB fragment destructors to delay completion until page is released by network stack.
From: Ian Campbell @ 2012-05-10 13:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, David Miller,
	Eric Dumazet, Neil Brown, J. Bruce Fields,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20120510111948.GA9609-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1578 bytes --]

On Thu, 2012-05-10 at 12:19 +0100, Michael S. Tsirkin wrote:
> On Thu, May 03, 2012 at 03:56:11PM +0100, Ian Campbell wrote:
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index f6d8c73..1145929 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -198,7 +198,8 @@ int svc_send_common(struct socket *sock, struct xdr_buf *xdr,
> >  	while (pglen > 0) {
> >  		if (slen == size)
> >  			flags = 0;
> > -		result = kernel_sendpage(sock, *ppage, NULL, base, size, flags);
> > +		result = kernel_sendpage(sock, *ppage, xdr->destructor,
> > +					 base, size, flags);
> >  		if (result > 0)
> >  			len += result;
> >  		if (result != size)
> 
> So I tried triggering this by simply creating an nfs export on localhost
> and copying a large file out with dd, but this never seems to trigger
> this code.
> 
> Any idea how to test?

My test code, which is a bit overly complex for this because it also
tries to demonstrate corruption on the wire, is attached.

Using dd I suspect you probably need to increase the block size, and
possibly enable O_DIRECT (conv=direct?)

My typical scenario has been to mount a remote NFS and run
tcpdump -s 4096 -x -ne -v -i eth0 'host $client and ip[184:4] == 0x55555555'
to watch for on-wire corruption.

FYI I've triggered a BUG_ON in my local debug patches with your series
applied, I'm just investigating whether its my debugging or something in
the series which causes it.

After that I'll try it with local NFS and VMs with bridging etc to test
the extra aspects which your series is exercising.

Ian.

[-- Attachment #2: blktest3.c --]
[-- Type: text/x-csrc, Size: 1090 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

#include <fcntl.h>

#include <err.h>
#include <errno.h>

#define NR 256
#define SIZE 4096

int main(int argc, char **argv)
{
	int fd, rc, n, iter = 0;
	const char *path;
	static unsigned char  __attribute__ ((aligned (4096))) buf[NR][SIZE];

	if (argc != 2) {
		fprintf(stderr, "usage: blktest2 [PATH]\n");
		exit(1);
	}
	path = argv[1];

	printf("opening %s for O_DIRECT access\n", path);
	fd = open(path, O_CREAT/*|O_DIRECT*/|O_RDWR, 0666);
	if (fd < 0)
		err(1,"unable to open file");

	while(1) {
		if ((iter%10)==0)
			printf("iteration %d ...", iter);
		if (lseek(fd, (iter%NR)*SIZE, SEEK_SET) < 0)
			err(1, "seek for write %d %d\n", iter, n);
		memset(buf[iter%NR], 0xaa, SIZE);
		rc = write(fd, buf[iter%NR], SIZE);
		memset(buf[iter%NR], 0x55, SIZE);
		if (rc == -1)
			//warn("write failed");
			err(1, "write failed");
		else if (rc != SIZE)
			err(1, "only wrote %d/%d bytes\n", rc, SIZE);
		if ((iter%10)==0)
			printf("\n", iter);

		iter++;
	}
}

^ permalink raw reply

* [PATCH 2/2 net] 6lowpan: fix hop limit compression
From: alex.bluesman.smirnov @ 2012-05-10 13:22 UTC (permalink / raw)
  To: davem; +Cc: netdev, Alexander Smirnov, Tony Cheneau
In-Reply-To: <1336656163-19382-1-git-send-email-y>

From: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>

Add missing pointer shift for the 'default' case.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Tony Cheneau <tony.cheneau+zigbeedev@amnesiak.org>
---
 net/ieee802154/6lowpan.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index 0ab3efe..86f0013 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -492,6 +492,7 @@ static int lowpan_header_create(struct sk_buff *skb,
 		break;
 	default:
 		*hc06_ptr = hdr->hop_limit;
+		hc06_ptr += 1;
 		break;
 	}
 
-- 
1.7.2.3

^ permalink raw reply related

* [PATCH 1/2 net] 6lowpan: add missing pskb_may_pull() check
From: alex.bluesman.smirnov @ 2012-05-10 13:22 UTC (permalink / raw)
  To: davem; +Cc: netdev, Alexander Smirnov
In-Reply-To: <1336656163-19382-1-git-send-email-y>

From: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>

Add pskb_may_pull() call when fetching u8 from skb.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
---
 net/ieee802154/6lowpan.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index 32eb417..0ab3efe 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -295,6 +295,8 @@ static u8 lowpan_fetch_skb_u8(struct sk_buff *skb)
 {
 	u8 ret;
 
+	BUG_ON(!pskb_may_pull(skb, 1));
+
 	ret = skb->data[0];
 	skb_pull(skb, 1);
 
-- 
1.7.2.3

^ permalink raw reply related

* [PATCH 0/2 net] 6lowpan fixes
From: alex.bluesman.smirnov @ 2012-05-10 13:22 UTC (permalink / raw)
  To: davem; +Cc: netdev

Hi David,

this patch set contains 2 fixes for the 6lowpan module. Please find detailed
description in the patch headers.

With best regards,
Alex

8<--

The following changes since commit 4b31b26441fb3c5f1e61ee13832358c09f8ca12d:

  6lowpan: duplicate definition of IEEE802154_ALEN (2012-04-26 13:02:33 +0400)

are available in the git repository at:
  http://github.com/linux-wsn/kernel.git 6lowpan_dev

Alexander Smirnov (2):
      6lowpan: add missing pskb_may_pull() check
      6lowpan: fix hop limit compression

 net/ieee802154/6lowpan.c |    3 +++

From: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Subject: [PATCH 0/2 net] 6lowpan fixes
In-Reply-To: 

^ permalink raw reply

* Re: [PATCH] xfrm: take iphdr size into account for esp payload size calculation
From: Steffen Klassert @ 2012-05-10 12:18 UTC (permalink / raw)
  To: Benjamin Poirier
  Cc: netdev, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, linux-kernel
In-Reply-To: <1336602952-10479-1-git-send-email-bpoirier@suse.de>

On Wed, May 09, 2012 at 06:35:52PM -0400, Benjamin Poirier wrote:
> 
> According to what is done, mainly in esp_output(), net_header_len aka
> sizeof(struct iphdr) must be taken into account before doing the alignment
> calculation.

Why do you need to take the ip header into account here? Your patch breaks
pmtu discovery, at least on tunnel mode with aes-sha1 (aes blocksize 16 bytes).

With your patch applied:

tracepath -n 192.168.1.2
 1?: [LOCALHOST]     pmtu 1442
 1:  send failed
 1:  send failed
     Resume: pmtu 1442

Without your patch:

tracepath -n 192.168.1.2
 1?: [LOCALHOST]     pmtu 1438
 1:  192.168.1.2       0.736ms reached
 1:  192.168.1.2       0.390ms reached
     Resume: pmtu 1438 hops 1 back 64 

Your patch increases the mtu by 4 bytes. Be aware that adding
one byte of payload may increase the packet size up to 16 bytes
in the case of aes, as we have to pad the encryption payload
always to a multiple of the cipher blocksize.

> -
> -	switch (x->props.mode) {
> -	case XFRM_MODE_TUNNEL:
> -		break;
> -	default:
> -	case XFRM_MODE_TRANSPORT:
> -		/* The worst case */
> -		mtu -= blksize - 4;
> -		mtu += min_t(u32, blksize - 4, rem);
> -		break;

Btw. why we are doing the calculation above for transport mode?

^ permalink raw reply

* Re: [PATCH 3/3] tcp: Out-line tcp_try_rmem_schedule
From: Eric Dumazet @ 2012-05-10 12:00 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: David Miller, Linux Netdev List
In-Reply-To: <4FABAB7C.6040306@parallels.com>

On Thu, 2012-05-10 at 15:50 +0400, Pavel Emelyanov wrote:
> As proposed by Eric, make the tcp_input.o thinner.
> 
> add/remove: 1/1 grow/shrink: 1/4 up/down: 868/-1329 (-461)
> function                                     old     new   delta
> tcp_try_rmem_schedule                          -     864    +864
> tcp_ack                                     4811    4815      +4
> tcp_validate_incoming                        817     815      -2
> tcp_collapse                                 860     858      -2
> tcp_send_rcvq                                555     353    -202
> tcp_data_queue                              3435    3033    -402
> tcp_prune_queue                              721       -    -721
> 
> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
> ---
>  net/ipv4/tcp_input.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)

Acked-by: Eric Dumazet <edumazet@google.com>

Thanks !

^ permalink raw reply

* Re: [PATCH 2/3] tcp: Schedule rmem for rcvq repair send
From: Eric Dumazet @ 2012-05-10 12:00 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: David Miller, Linux Netdev List
In-Reply-To: <4FABAB69.70401@parallels.com>

On Thu, 2012-05-10 at 15:50 +0400, Pavel Emelyanov wrote:
> As noted by Eric, no checks are performed on the data size we're
> putting in the read queue during repair. Thus, validate the given
> data size with the common rmem management routine.
> 
> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
> ---
>  net/ipv4/tcp_input.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH 1/3] tcp: Move rcvq sending to tcp_input.c
From: Eric Dumazet @ 2012-05-10 12:00 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: David Miller, Linux Netdev List
In-Reply-To: <4FABAB55.5050803@parallels.com>

On Thu, 2012-05-10 at 15:49 +0400, Pavel Emelyanov wrote:
> It actually works on the input queue and will use its read mem
> routines, thus it's better to have in in the tcp_input.c file.
> 
> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
> ---
>  include/net/tcp.h    |    3 +--
>  net/ipv4/tcp.c       |   33 ---------------------------------
>  net/ipv4/tcp_input.c |   35 ++++++++++++++++++++++++++++++++++-
>  3 files changed, 35 insertions(+), 36 deletions(-)

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* [PATCH 3/3] tcp: Out-line tcp_try_rmem_schedule
From: Pavel Emelyanov @ 2012-05-10 11:50 UTC (permalink / raw)
  To: David Miller, Eric Dumazet; +Cc: Linux Netdev List
In-Reply-To: <4FABAB39.6090201@parallels.com>

As proposed by Eric, make the tcp_input.o thinner.

add/remove: 1/1 grow/shrink: 1/4 up/down: 868/-1329 (-461)
function                                     old     new   delta
tcp_try_rmem_schedule                          -     864    +864
tcp_ack                                     4811    4815      +4
tcp_validate_incoming                        817     815      -2
tcp_collapse                                 860     858      -2
tcp_send_rcvq                                555     353    -202
tcp_data_queue                              3435    3033    -402
tcp_prune_queue                              721       -    -721

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
---
 net/ipv4/tcp_input.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 164659f..b99ada2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4511,7 +4511,7 @@ static void tcp_ofo_queue(struct sock *sk)
 static int tcp_prune_ofo_queue(struct sock *sk);
 static int tcp_prune_queue(struct sock *sk);
 
-static inline int tcp_try_rmem_schedule(struct sock *sk, unsigned int size)
+static int tcp_try_rmem_schedule(struct sock *sk, unsigned int size)
 {
 	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
 	    !sk_rmem_schedule(sk, size)) {
-- 
1.5.5.6

^ permalink raw reply related

* [PATCH 2/3] tcp: Schedule rmem for rcvq repair send
From: Pavel Emelyanov @ 2012-05-10 11:50 UTC (permalink / raw)
  To: David Miller, Eric Dumazet; +Cc: Linux Netdev List
In-Reply-To: <4FABAB39.6090201@parallels.com>

As noted by Eric, no checks are performed on the data size we're
putting in the read queue during repair. Thus, validate the given
data size with the common rmem management routine.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
---
 net/ipv4/tcp_input.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 7c6c99d..164659f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4769,6 +4769,9 @@ int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size)
 	struct tcphdr *th;
 	bool fragstolen;
 
+	if (tcp_try_rmem_schedule(sk, size + sizeof(*th)))
+		goto err;
+
 	skb = alloc_skb(size + sizeof(*th), sk->sk_allocation);
 	if (!skb)
 		goto err;
-- 
1.5.5.6

^ permalink raw reply related

* [PATCH 1/3] tcp: Move rcvq sending to tcp_input.c
From: Pavel Emelyanov @ 2012-05-10 11:49 UTC (permalink / raw)
  To: David Miller, Eric Dumazet; +Cc: Linux Netdev List
In-Reply-To: <4FABAB39.6090201@parallels.com>

It actually works on the input queue and will use its read mem
routines, thus it's better to have in in the tcp_input.c file.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
---
 include/net/tcp.h    |    3 +--
 net/ipv4/tcp.c       |   33 ---------------------------------
 net/ipv4/tcp_input.c |   35 ++++++++++++++++++++++++++++++++++-
 3 files changed, 35 insertions(+), 36 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 92faa6a..aaf5de9 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -432,8 +432,7 @@ extern int tcp_disconnect(struct sock *sk, int flags);
 
 void tcp_connect_init(struct sock *sk);
 void tcp_finish_connect(struct sock *sk, struct sk_buff *skb);
-int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb,
-			       int hdrlen, bool *fragstolen);
+int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size);
 
 /* From syncookies.c */
 extern __u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS];
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5654062..86e2cf2 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -978,39 +978,6 @@ static inline int select_size(const struct sock *sk, bool sg)
 	return tmp;
 }
 
-static int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size)
-{
-	struct sk_buff *skb;
-	struct tcphdr *th;
-	bool fragstolen;
-
-	skb = alloc_skb(size + sizeof(*th), sk->sk_allocation);
-	if (!skb)
-		goto err;
-
-	th = (struct tcphdr *)skb_put(skb, sizeof(*th));
-	skb_reset_transport_header(skb);
-	memset(th, 0, sizeof(*th));
-
-	if (memcpy_fromiovec(skb_put(skb, size), msg->msg_iov, size))
-		goto err_free;
-
-	TCP_SKB_CB(skb)->seq = tcp_sk(sk)->rcv_nxt;
-	TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(skb)->seq + size;
-	TCP_SKB_CB(skb)->ack_seq = tcp_sk(sk)->snd_una - 1;
-
-	if (tcp_queue_rcv(sk, skb, sizeof(*th), &fragstolen)) {
-		WARN_ON_ONCE(fragstolen); /* should not happen */
-		__kfree_skb(skb);
-	}
-	return size;
-
-err_free:
-	kfree_skb(skb);
-err:
-	return -ENOMEM;
-}
-
 int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		size_t size)
 {
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index eb58b94..7c6c99d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4746,7 +4746,7 @@ end:
 		skb_set_owner_r(skb, sk);
 }
 
-int tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, int hdrlen,
+static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, int hdrlen,
 		  bool *fragstolen)
 {
 	int eaten;
@@ -4763,6 +4763,39 @@ int tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, int hdrlen,
 	return eaten;
 }
 
+int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size)
+{
+	struct sk_buff *skb;
+	struct tcphdr *th;
+	bool fragstolen;
+
+	skb = alloc_skb(size + sizeof(*th), sk->sk_allocation);
+	if (!skb)
+		goto err;
+
+	th = (struct tcphdr *)skb_put(skb, sizeof(*th));
+	skb_reset_transport_header(skb);
+	memset(th, 0, sizeof(*th));
+
+	if (memcpy_fromiovec(skb_put(skb, size), msg->msg_iov, size))
+		goto err_free;
+
+	TCP_SKB_CB(skb)->seq = tcp_sk(sk)->rcv_nxt;
+	TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(skb)->seq + size;
+	TCP_SKB_CB(skb)->ack_seq = tcp_sk(sk)->snd_una - 1;
+
+	if (tcp_queue_rcv(sk, skb, sizeof(*th), &fragstolen)) {
+		WARN_ON_ONCE(fragstolen); /* should not happen */
+		__kfree_skb(skb);
+	}
+	return size;
+
+err_free:
+	kfree_skb(skb);
+err:
+	return -ENOMEM;
+}
+
 static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 {
 	const struct tcphdr *th = tcp_hdr(skb);
-- 
1.5.5.6

^ permalink raw reply related

* [PATCH 0/3] tcp: Validate recv queue on repair and related stuff
From: Pavel Emelyanov @ 2012-05-10 11:49 UTC (permalink / raw)
  To: David Miller, Eric Dumazet; +Cc: Linux Netdev List

As noted by Eric, no checks are performed when repairing data in tcp
read queue. He also suggested that the tcp_try_rmem_schedule() gets
out-lined for this.

This set does both of the above, more details are in patch comments.

Applies to net-next.

Thanks,
Pavel

^ permalink raw reply

* Re: [PATCH 8/9] net: add paged frag destructor support to kernel_sendpage.
From: Michael S. Tsirkin @ 2012-05-10 11:48 UTC (permalink / raw)
  To: Ian Campbell; +Cc: netdev, David Miller, Eric Dumazet
In-Reply-To: <1336056971-7839-8-git-send-email-ian.campbell@citrix.com>

On Thu, May 03, 2012 at 03:56:10PM +0100, Ian Campbell wrote:
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 2d590ca..bee7864 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -822,8 +822,11 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
>  	return mss_now;
>  }
>  
> -static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffset,
> -			 size_t psize, int flags)
> +static ssize_t do_tcp_sendpages(struct sock *sk,
> +				struct page **pages,
> +				struct skb_frag_destructor *destroy,
> +				int poffset,
> +				size_t psize, int flags)
>  {
>  	struct tcp_sock *tp = tcp_sk(sk);
>  	int mss_now, size_goal;
> @@ -870,7 +873,7 @@ new_segment:
>  			copy = size;
>  
>  		i = skb_shinfo(skb)->nr_frags;
> -		can_coalesce = skb_can_coalesce(skb, i, page, NULL, offset);
> +		can_coalesce = skb_can_coalesce(skb, i, page, destroy, offset);
>  		if (!can_coalesce && i >= MAX_SKB_FRAGS) {
>  			tcp_mark_push(tp, skb);
>  			goto new_segment;
> @@ -881,8 +884,9 @@ new_segment:
>  		if (can_coalesce) {
>  			skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
>  		} else {
> -			get_page(page);
>  			skb_fill_page_desc(skb, i, page, offset, copy);
> +			skb_frag_set_destructor(skb, i, destroy);
> +			skb_frag_ref(skb, i);
>  		}
>  
>  		skb->len += copy;
> @@ -937,18 +941,20 @@ out_err:
>  	return sk_stream_error(sk, flags, err);
>  }
>  
> -int tcp_sendpage(struct sock *sk, struct page *page, int offset,
> -		 size_t size, int flags)
> +int tcp_sendpage(struct sock *sk, struct page *page,
> +		 struct skb_frag_destructor *destroy,
> +		 int offset, size_t size, int flags)
>  {
>  	ssize_t res;
>  
>  	if (!(sk->sk_route_caps & NETIF_F_SG) ||
>  	    !(sk->sk_route_caps & NETIF_F_ALL_CSUM))
> -		return sock_no_sendpage(sk->sk_socket, page, offset, size,
> -					flags);
> +		return sock_no_sendpage(sk->sk_socket, page, destroy,
> +					offset, size, flags);
>  
>  	lock_sock(sk);
> -	res = do_tcp_sendpages(sk, &page, offset, size, flags);
> +	res = do_tcp_sendpages(sk, &page, destroy,
> +			       offset, size, flags);
>  	release_sock(sk);
>  	return res;
>  }


Sorry about making more noise but I realized there's
something I don't understand here.

Is it true that all this does is stick the destructor in the frag list?

If so, could this deadlock (or delay application significantly) if tcp
has queued the skb on the write queue but is not transmitting it, while
the application is waiting for pages to complete?

-- 
MST

^ permalink raw reply

* Re: [PATCH 9/9] sunrpc: use SKB fragment destructors to delay completion until page is released by network stack.
From: Michael S. Tsirkin @ 2012-05-10 11:19 UTC (permalink / raw)
  To: Ian Campbell
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, David Miller, Eric Dumazet,
	Neil Brown, J. Bruce Fields, linux-nfs-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1336056971-7839-9-git-send-email-ian.campbell-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>

On Thu, May 03, 2012 at 03:56:11PM +0100, Ian Campbell wrote:
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index f6d8c73..1145929 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -198,7 +198,8 @@ int svc_send_common(struct socket *sock, struct xdr_buf *xdr,
>  	while (pglen > 0) {
>  		if (slen == size)
>  			flags = 0;
> -		result = kernel_sendpage(sock, *ppage, NULL, base, size, flags);
> +		result = kernel_sendpage(sock, *ppage, xdr->destructor,
> +					 base, size, flags);
>  		if (result > 0)
>  			len += result;
>  		if (result != size)

So I tried triggering this by simply creating an nfs export on localhost
and copying a large file out with dd, but this never seems to trigger
this code.

Any idea how to test?

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] netlink: connector: implement cn_netlink_reply
From: Alban Crequy @ 2012-05-10 10:39 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ben Hutchings, netdev, Vincent Sanders, Javier Martinez Canillas,
	Rodrigo Moya
In-Reply-To: <20120510004553.GB8362@ioremap.net>

On Thu, 10 May 2012 04:45:53 +0400,
Evgeniy Polyakov <zbr@ioremap.net> wrote :

> On Thu, May 10, 2012 at 01:20:48AM +0100, Ben Hutchings
> (bhutchings@solarflare.com) wrote:
> > On Wed, 2012-05-09 at 15:37 +0100, Alban Crequy wrote:
> > > In a connector callback, it was not possible to reply to a
> > > message only to a sender. This patch implements
> > > cn_netlink_reply(). It uses the connector socket to send an
> > > unicast netlink message back to the sender.
> > [...]
> > 
> > We try to avoid adding functions with no users.  You'll need to
> > submit the code that's intended to use this as well.
> 
> I have no objection against this patch, but as correctly stated it is
> useless without users. Alban, what is the code you want this
> functionality to be used in? Do you plan to submit it? Can you submit
> this change in the patch with your code?

The code to use the feature is not yet ready for submission and we will
add this patch to the front of that submission in due course.

We are just being good community members and making each patch
available early. Thanks for your feedback on this patch. Please let
me know if I can add any reviewed-by.

Alban

^ permalink raw reply

* [PATCH net-next 2/2] l2tp: fix data packet sequence number handling
From: James Chapman @ 2012-05-10  9:43 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1336642989-4785-1-git-send-email-jchapman@katalix.com>

If enabled, L2TP data packets have sequence numbers which a receiver
can use to drop out of sequence frames or try to reorder them. The
first frame has sequence number 0, but the L2TP code currently expects
it to be 1. This results in the first data frame being handled as out
of sequence.

This one-line patch fixes the problem.

Signed-off-by: James Chapman <jchapman@katalix.com>
---
 net/l2tp/l2tp_core.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index d1ab3a2..0d6aedc 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1762,7 +1762,7 @@ struct l2tp_session *l2tp_session_create(int priv_size, struct l2tp_tunnel *tunn
 
 		session->session_id = session_id;
 		session->peer_session_id = peer_session_id;
-		session->nr = 1;
+		session->nr = 0;
 
 		sprintf(&session->name[0], "sess %u/%u",
 			tunnel->tunnel_id, session->session_id);
-- 
1.7.0.4

^ permalink raw reply related

* [PATCH net-next 1/2] l2tp: fix reorder timeout recovery
From: James Chapman @ 2012-05-10  9:43 UTC (permalink / raw)
  To: netdev

When L2TP data packet reordering is enabled, packets are held in a
queue while waiting for out-of-sequence packets. If a packet gets
lost, packets will be held until the reorder timeout expires, when we
are supposed to then advance to the sequence number of the next packet
but we don't currently do so. As a result, the data channel is stuck
because we are waiting for a packet that will never arrive - all
packets age out and none are passed.

The fix is to add a flag to the session context, which is set when the
reorder timeout expires and tells the receive code to reset the next
expected sequence number to that of the next packet in the queue.

Tested in a production L2TP network with Starent and Nortel L2TP gear.

Signed-off-by: James Chapman <jchapman@katalix.com>
---
 net/l2tp/l2tp_core.c |    9 +++++++++
 net/l2tp/l2tp_core.h |    1 +
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 456b52d..d1ab3a2 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -428,6 +428,7 @@ start:
 			       session->name, L2TP_SKB_CB(skb)->ns,
 			       L2TP_SKB_CB(skb)->length, session->nr,
 			       skb_queue_len(&session->reorder_q));
+			session->reorder_skip = 1;
 			__skb_unlink(skb, &session->reorder_q);
 			kfree_skb(skb);
 			if (session->deref)
@@ -436,6 +437,14 @@ start:
 		}
 
 		if (L2TP_SKB_CB(skb)->has_seq) {
+			if (session->reorder_skip) {
+				PRINTK(session->debug, L2TP_MSG_SEQ, KERN_DEBUG,
+				       "%s: advancing nr to next pkt: %u -> %u",
+				       session->name, session->nr,
+				       L2TP_SKB_CB(skb)->ns);
+				session->reorder_skip = 0;
+				session->nr = L2TP_SKB_CB(skb)->ns;
+			}
 			if (L2TP_SKB_CB(skb)->ns != session->nr) {
 				PRINTK(session->debug, L2TP_MSG_SEQ, KERN_DEBUG,
 				       "%s: holding oos pkt %u len %d, "
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index 0bf60fc..9002634 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -123,6 +123,7 @@ struct l2tp_session {
 						 * categories */
 	int			reorder_timeout; /* configured reorder timeout
 						  * (in jiffies) */
+	int			reorder_skip;	/* set if skip to next nr */
 	int			mtu;
 	int			mru;
 	enum l2tp_pwtype	pwtype;
-- 
1.7.0.4

^ permalink raw reply related

* Re: Fwd: Memory exhaust issue with only IPsec policies configured on continuous traffic
From: Nikhil Agarwal @ 2012-05-10  9:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, netdev, herbert, benjamin.thery, davem, pstaszewski
In-Reply-To: <1336627923.12504.128.camel@edumazet-glaptop>

If i disable this cache of DST, dst are working fine. But now XFRM
state is allocated for every incoming packet and then freed. But while
freeing the xfrm state same garbage collection logic is there. Now
since packets are coming continuously garbage collector may not get
scheduled and large amount of memory is stuck to be freed causing the
system to go into non-recoverable state.

It seems that ther should be some change garbage collection scheduling
logic or  some mechnism to decide wether to cache some entry or not.

On Thu, May 10, 2012 at 11:02 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-05-10 at 07:27 +0200, Eric Dumazet wrote:
>
>> Yep, we can use DST_NOCACHE
>>
>
> Please try following patch :
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 5773f5d..172c251 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2896,6 +2896,7 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_or
>        if (rt) {
>                struct dst_entry *new = &rt->dst;
>
> +               new->flags |= DST_NOCACHE;
>                new->__use = 1;
>                new->input = dst_discard;
>                new->output = dst_discard;
>
>

^ permalink raw reply

* Re: [PATCH 07/14] batman-adv: split neigh_new function into generic and batman iv specific parts
From: Sven Eckelmann @ 2012-05-10  7:34 UTC (permalink / raw)
  To: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, lindner_marek-LWAfsSFWpa4,
	David Miller
In-Reply-To: <20120509.204111.1931607501484626500.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 753 bytes --]

On Wednesday, May 09, 2012 08:41:11 PM David Miller wrote:
[...]
> The namespace pollution of the batman-adv code needs to improve,
> and I'm putting my foot down starting with this change.
> 
> If you have a static function which is therefore private to a
> source file, name it whatever you want.
> 
> But once it gets exported out of that file, you have to give it
> an appropriate name.  Probably with a "batman_adv_" prefix or
> similar.

I aggree, but would like to like to have a shorter prefix batadv_. I know that 
you said "or similar" but there are still some developers that fear your 
response to a patch that only adds the prefix batadv_ instead of the longer 
version.

Could you please approve or disapprove this proposal.

Thanks,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH 00/13] net: Add and use ether_addr_equal
From: Emmanuel Grumbach @ 2012-05-10  7:08 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Joe Perches, David Miller, netdev, bridge, netfilter-devel,
	netfilter, coreteam, linux-wireless, linux-kernel,
	linux-bluetooth
In-Reply-To: <1336633015.4334.2.camel@jlt3.sipsolutions.net>

>> >
>> > > That case you didn't convert in mac80211 is probably the
>> > > bug Johannes was talking about which started this whole
>> > > discussion.
>> >
>> > The bug case that started it all is in net/wireless/scan.c and Emmanuel
>> > has since changed it back to memcmp(). Not sure if that's the one you
>> > were referring to or not :-)
>>
>> That's the one that I left alone.
>>
>> Post patch:
>> $ git grep -n -w compare_ether_addr net
>> net/batman-adv/main.h:198: * note: can't use compare_ether_addr() as it requires aligned memory
>> net/wireless/scan.c:381:        return compare_ether_addr(a->bssid, b->bssid);
>
> Ok, great, then that means the fix from Emmanuel won't conflict when it
> gets in.
>

Thanks Joe - as Johannes said this won't conflict with my patch. And
yes the code is now more clear.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 00/13] net: Add and use ether_addr_equal
From: Johannes Berg @ 2012-05-10  6:56 UTC (permalink / raw)
  To: Joe Perches
  Cc: netfilter, netdev, bridge, linux-wireless, linux-kernel,
	linux-bluetooth, coreteam, netfilter-devel, David Miller
In-Reply-To: <1336632860.22495.6.camel@joe2Laptop>

On Wed, 2012-05-09 at 23:54 -0700, Joe Perches wrote:
> On Thu, 2012-05-10 at 08:48 +0200, Johannes Berg wrote:
> > On Wed, 2012-05-09 at 21:21 -0400, David Miller wrote:
> > 
> > > That case you didn't convert in mac80211 is probably the
> > > bug Johannes was talking about which started this whole
> > > discussion.
> > 
> > The bug case that started it all is in net/wireless/scan.c and Emmanuel
> > has since changed it back to memcmp(). Not sure if that's the one you
> > were referring to or not :-)
> 
> That's the one that I left alone.
> 
> Post patch:
> $ git grep -n -w compare_ether_addr net
> net/batman-adv/main.h:198: * note: can't use compare_ether_addr() as it requires aligned memory
> net/wireless/scan.c:381:        return compare_ether_addr(a->bssid, b->bssid);

Ok, great, then that means the fix from Emmanuel won't conflict when it
gets in.

johannes

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox