Netdev List
* [RFC PATCH] af_packet: don't defrag shared skb
From: Eric Leblond @ 2012-12-07 18:56 UTC (permalink / raw)
  To: netdev; +Cc: Eric Leblond

This patch adds a check on the skb before trying to defragment the
packet for the hash computation in fanout mode. The goal of this
patch is to avoid a kernel crash in pskb_expand_head.
It appears that under some specific conditions a shared
skb reaches the defrag code, and this leads to a crash due to the
following code:

	if (skb_shared(skb))
		BUG();

I've observed this crash under the following conditions:
 1. a program is listening on a wifi interface (say wlan0)
 2. it is using fanout capture in flow load balancing mode
 3. the defrag option is enabled on the fanout socket
 4. the interface disconnects (radio down, for example)
 5. the interface reconnects (radio switched back on)
 6. once reconnected, a single packet is seen with skb->users=2
 7. the kernel crashes in pskb_expand_head at skbuff.c:1035

[88855.744364] [<ffffffff812a2761>] ? __pskb_pull_tail+0x43/0x26f
[88855.744395] [<ffffffff812d297b>] ? ip_check_defrag+0x3a/0x14a
[88855.744422] [<ffffffff81344459>] ? packet_rcv_fanout+0x5e/0xf9
[88855.744450] [<ffffffff812aa59b>] ? __netif_receive_skb+0x444/0x4f9
[88855.744478] [<ffffffff812aa?e1>] ? netif_receive_skb+0x6d/0x?3
[88855.744478] [<ffffffff812aa?e1>] ? ieee80211_deliver_skb+0xbd/0xfa [mac80211]
[88855.744478] [<ffffffff812aa?e1>] ? ieee80211_rx_h_data+0x1e0/0x21a [mac80211]
[88855.744478] [<ffffffff812aa?e1>] ? ieee80211_rx_handlers+0x3d5/0x480 [mac80211]
[88855.744478] [<ffffffff812aa?e1>] ? __wake_up
[88855.744478] [<ffffffff812aa?e1>] ? evdev_eventr+0xc0/0xcf [evdev]

Signed-off-by: Eric Leblond <eric@regit.org>
---
 net/packet/af_packet.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index e639645..4b453f8 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1110,7 +1110,7 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
 	switch (f->type) {
 	case PACKET_FANOUT_HASH:
 	default:
-		if (f->defrag) {
+		if (f->defrag && !skb_shared(skb)) {
 			skb = ip_check_defrag(skb, IP_DEFRAG_AF_PACKET);
 			if (!skb)
 				return 0;
-- 
1.7.10.4
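
The guard added by this patch can be illustrated outside the kernel. What
follows is only a minimal user-space sketch, assuming a simplified stand-in
for struct sk_buff that carries nothing but a reference count. The names
demo_skb, demo_skb_shared() and demo_may_defrag() are hypothetical and are
not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the reference count. */
struct demo_skb {
	int users;
};

/* Mirrors the idea behind skb_shared(): the buffer has more than one holder. */
static bool demo_skb_shared(const struct demo_skb *skb)
{
	assert(skb != NULL && skb->users >= 1);
	return skb->users != 1;
}

/* Mirrors the patched condition: defragment only when enabled and unshared. */
static bool demo_may_defrag(bool defrag_enabled, const struct demo_skb *skb)
{
	return defrag_enabled && !demo_skb_shared(skb);
}
```

With users == 2, as in step 6 of the report, demo_may_defrag() returns false
and the defrag path that would hit the BUG() is skipped.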

^ permalink raw reply related

* Re: [patch v2] bridge: make buffer larger in br_setlink()
From: Dan Carpenter @ 2012-12-07 18:53 UTC (permalink / raw)
  To: walter harms
  Cc: netdev, bridge, kernel-janitors, Thomas Graf, Stephen Hemminger,
	David S. Miller
In-Reply-To: <50C2143C.2010200@bfs.de>

On Fri, Dec 07, 2012 at 05:07:24PM +0100, walter harms wrote:
> 
> 
> Am 07.12.2012 12:10, schrieb Dan Carpenter:
> > We pass IFLA_BRPORT_MAX to nla_parse_nested() so we need
> > IFLA_BRPORT_MAX + 1 elements.  Also Smatch complains that we read past
> > the end of the array when in br_set_port_flag() when it's called with
> > IFLA_BRPORT_FAST_LEAVE.
> > 
> 
> 
> 
> I have no clue why nla_parse_nested() needs IFLA_BRPORT_MAX elements,
> but the majority of loops look like
> for (i = 0; i < max; ++i)
> and most programmers will think this way.
> So it seems the place to fix is nla_parse_nested().
> Not doing so is asking for trouble (in the long run).
> At least this function needs a big warning label that (max-1)
> is actually needed.
> 

Yeah, nla_parse_nested() is actually documented already.
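
As an illustration of why the table needs one more slot than the highest
attribute type, here is a minimal user-space sketch. It assumes a
hypothetical parser, demo_parse(), that stores each attribute at the index
equal to its type, with valid types running from 1 to maxtype inclusive.
DEMO_ATTR_MAX and struct demo_attr are made up for the example and are not
the netlink API.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ATTR_MAX 3	/* hypothetical highest attribute type */

struct demo_attr {
	int type;	/* 1..DEMO_ATTR_MAX are valid */
	int value;
};

/*
 * Store a pointer to each attribute at tb[type]. Index maxtype itself is
 * written, so tb must provide maxtype + 1 slots.
 */
static void demo_parse(const struct demo_attr **tb, int maxtype,
		       const struct demo_attr *attrs, size_t n)
{
	assert(tb != NULL && maxtype >= 0);

	for (int i = 0; i <= maxtype; i++)
		tb[i] = NULL;

	for (size_t i = 0; i < n; i++) {
		int type = attrs[i].type;

		/* Out-of-range types are skipped rather than stored. */
		if (type > 0 && type <= maxtype)
			tb[type] = &attrs[i];
	}
}
```

A caller would declare the table as tb[DEMO_ATTR_MAX + 1]. Declaring it with
only DEMO_ATTR_MAX slots makes the store for the highest type land past the
end of the array.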

regards,
dan carpenter

^ permalink raw reply

* Re: [PATCH v3 1/4] net: Add support for hardware-offloaded encapsulation
From: Joseph Gasparakis @ 2012-12-07 18:24 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Alexander Duyck, Joseph Gasparakis, davem, shemminger, chrisw,
	gospo, netdev, linux-kernel, dmitry, saeed.bishara,
	Peter P Waskiewicz Jr
In-Reply-To: <1354900385.2707.42.camel@bwh-desktop.uk.solarflarecom.com>



On Fri, 7 Dec 2012, Ben Hutchings wrote:

> On Fri, 2012-12-07 at 08:45 -0800, Alexander Duyck wrote:
> > On 12/07/2012 02:07 AM, Ben Hutchings wrote:
> > > On Thu, 2012-12-06 at 17:56 -0800, Joseph Gasparakis wrote:
> > >> This patch adds support in the kernel for offloading in the NIC Tx and Rx
> > >> checksumming for encapsulated packets (such as VXLAN and IP GRE).
> > > [...]
> > >> --- a/include/linux/netdevice.h
> > >> +++ b/include/linux/netdevice.h
> > >> @@ -1063,6 +1063,8 @@ struct net_device {
> > >>  	netdev_features_t	wanted_features;
> > >>  	/* mask of features inheritable by VLAN devices */
> > >>  	netdev_features_t	vlan_features;
> > >> +	/* mask of features inherited by encapsulating devices */
> > >> +	netdev_features_t	hw_enc_features;
> > > [...]
> > > 
> > > How will the networking core know *which* encapsulations this applies
> > > to?  I notice that your implementation in ixgbe does not set
> > > NETIF_F_HW_CSUM here, so presumably the hardware will parse headers to
> > > find which ranges should be checksummed and it won't cover the next
> > > encapsulation protocol that comes along.
> > > 
> > > Ben.
> > > 
> > 
> > Actually the offload is generic to any encapsulation that does not
> > compute a checksum on the inner headers.  So as long as you can treat
> > the outer headers as one giant L2 header you can pretty much ignore what
> > is in there as long as the inner network and transport header values are
> > set.  There are a number of tunnels that fall into that category since
> > most just use IP as the L2 and the L3 usually doesn't contain any checksum.
> 
> Yes, that should work, but it requires that the driver/hardware uses the
> header offsets from the skb rather than parsing the packet.  This is not
> currently required for devices with the NETIF_F_IP_CSUM and
> NETIF_F_IPV6_CSUM features.
> 
> Please do state explicitly which feature flags are valid in
> hw_enc_features, any changes in semantics, and in particular in what
> cases the driver/hardware is supposed to use header offsets from the skb
> vs parsing the packet.
> 
> Ben.
> 
So the idea here is that the driver will use the headers for checksumming 
if the skb->encapsulation bit is on. The bit should be set in the protocol 
driver.

To answer the second comment, the flags that we use in this series of
patches are NETIF_F_IP_CSUM, NETIF_F_IPV6_CSUM and NETIF_F_SG. These are
the bits that we propose be used for checksumming of encapsulated packets.
As per a previous comment on v2, the hw_enc_features field should also be
used in the future when NICs have more encap offloads, so the driver could
indicate these features there.

Furthermore, I submitted a patch for Rx checksumming, where NETIF_F_RXCSUM
is used, again in conjunction with the skb->encapsulation flag. As I mention
in my logs, the driver is expected to set ip_summed to UNNECESSARY and
turn skb->encapsulation on, to indicate that the inner headers are
already HW checksummed.
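
The Tx side of this can be sketched as follows. This is only an illustration
under stated assumptions: struct demo_skb is a hypothetical stand-in for the
few sk_buff fields involved, the field names are invented, and
demo_csum_start() is not a function from this series. It shows a driver
taking the checksum start from the inner header offset recorded in the skb
when the encapsulation bit is set, instead of parsing the packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the sk_buff fields this sketch needs. */
struct demo_skb {
	bool encapsulation;			/* set by the tunnel protocol driver */
	unsigned int transport_off;		/* outer transport header offset */
	unsigned int inner_transport_off;	/* inner transport header offset */
};

/*
 * Pick the offset at which checksumming starts: the inner transport header
 * when the encapsulation bit is set, the outer one otherwise.
 */
static unsigned int demo_csum_start(const struct demo_skb *skb)
{
	assert(skb != NULL);
	return skb->encapsulation ? skb->inner_transport_off
				  : skb->transport_off;
}
```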

^ permalink raw reply

* Re: [PATCH RFC 0/5] Containerize syslog
From: Eric W. Biederman @ 2012-12-07 18:21 UTC (permalink / raw)
  To: Serge Hallyn; +Cc: Andrew Morton, Rui Xiang, containers, netdev
In-Reply-To: <20121207142331.GC4004@sergelap>

Serge Hallyn <serge.hallyn@canonical.com> writes:

> Not as a separate justification admittedly, but the description was
> meant to explain it:  right now /dev/kmsg and sys_syslog are not safe
> and useful in a container;

The user namespace solves the biggest practical problem here, like it
solves so many other problems of excessive privileges in a container.

Since these patches are still mostly in the
proof-of-concept/design phase I am inclined to see how getting a usable
user namespace affects the situation on the ground.

But I do think there are issues to be solved in some fashion.  We have
the possibility of configuring firewall logging rules that are not
usable in containers.   Similarly, reasons for mount failures and a number
of other cases go silent.

I think it makes some sense to change where we put things in the kernel
log to solve these things, but it also makes sense to ask whether
there is a better solution.  Hopefully a little more experience with
these issues and time playing with ideas can make things clear.

Eric

^ permalink raw reply

* Re: [net-next PATCH] tun: correctly report an error in tun_flow_init()
From: David Miller @ 2012-12-07 18:21 UTC (permalink / raw)
  To: jasowang; +Cc: pmoore, netdev
In-Reply-To: <3078848.lXJIGSkO7f@jason-thinkpad-t430s>

From: Jason Wang <jasowang@redhat.com>
Date: Fri, 07 Dec 2012 13:51:47 +0800

> On Thursday, December 06, 2012 10:48:38 AM Paul Moore wrote:
>> On error, the error code from tun_flow_init() is lost inside
>> tun_set_iff(). This patch fixes this by assigning the tun_flow_init()
>> error code to the "err" variable, which is returned by
>> the tun_set_iff() function on error.
>> 
>> Signed-off-by: Paul Moore <pmoore@redhat.com>
> 
> Acked-by: Jason Wang <jasowang@redhat.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net] inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state
From: David Miller @ 2012-12-07 18:20 UTC (permalink / raw)
  To: ncardwell; +Cc: edumazet, netdev
In-Reply-To: <1354808546-644-1-git-send-email-ncardwell@google.com>

From: Neal Cardwell <ncardwell@google.com>
Date: Thu,  6 Dec 2012 10:42:26 -0500

> Fix inet_diag to be aware of the fact that AF_INET6 TCP connections
> instantiated for IPv4 traffic and in the SYN-RECV state were actually
> created with inet_reqsk_alloc(), instead of inet6_reqsk_alloc(). This
> means that for such connections inet6_rsk(req) returns a pointer to a
> random spot in memory up to roughly 64KB beyond the end of the
> request_sock.
> 
> With this bug, for a server using AF_INET6 TCP sockets and serving
> IPv4 traffic, an inet_diag user like `ss state SYN-RECV` would lead to
> inet_diag_fill_req() causing an oops or the export to user space of 16
> bytes of kernel memory as a garbage IPv6 address, depending on where
> the garbage inet6_rsk(req) pointed.
> 
> Signed-off-by: Neal Cardwell <ncardwell@google.com>

Thanks for this fix, but it opens up more questions.

We don't seem to validate inet_diag_hostcond's prefix_len at all.
That parameter, which we pass into bitstring_match(), can
be just about anything.

As another example, what if we do an ipv6 128-bit compare on what's
actually an ipv4 address in the inet request sock?

I think we need to validate cond->prefix_len in some way, using
cond->family.
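
One possible shape for such a check, as a user-space sketch only. The names
demo_prefix_len_valid() and demo_prefix_match() are hypothetical, this is not
the inet_diag code, and rejecting unknown families is an assumption made for
the example. The idea is to bound the prefix length by the address size of
the family before any bitwise compare runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* A prefix length is valid only if it fits the family's address size. */
static bool demo_prefix_len_valid(int family, unsigned int prefix_len)
{
	switch (family) {
	case AF_INET:
		return prefix_len <= 32;
	case AF_INET6:
		return prefix_len <= 128;
	default:
		return false;	/* assumption: reject unknown families */
	}
}

/* Compare the first 'bits' bits of two addresses of addr_len bytes each. */
static bool demo_prefix_match(const unsigned char *a, const unsigned char *b,
			      size_t addr_len, unsigned int bits)
{
	size_t whole = bits / 8;
	unsigned int rest = bits % 8;
	unsigned char mask;

	/* Without prior validation this compare would read past the address. */
	assert(bits <= addr_len * 8);

	if (memcmp(a, b, whole) != 0)
		return false;
	if (rest == 0)
		return true;

	mask = (unsigned char)(0xff << (8 - rest));
	return ((a[whole] ^ b[whole]) & mask) == 0;
}
```

In this sketch a 128-bit compare against a 4-byte IPv4 address is refused by
demo_prefix_len_valid(AF_INET, 128) before demo_prefix_match() is reached.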

^ permalink raw reply

* Re: [PATCH RFC 0/5] Containerize syslog
From: Eric W. Biederman @ 2012-12-07 18:05 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Rui Xiang, Andrew Morton,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <50C1FD9D.5020703-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>

Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:

> I keep asking myself if it isn't the case of forwarding to a container
> all messages printed in process context. That will obviously exclude all
> messages resulting from kthreads - that will always be in the initial
> namespace anyway, interrupts, etc. There is no harm, for instance, in
> delivering the same message twice: one to the container, and the other
> to the host system.

Except that there is harm in double printing.  One of the better
justifications for doing something with the kernel log is that it is
possible to overflow the kernel log with operations performed
exclusively in a container.

I do think the idea of process context printks going to the current
container is one worth playing with.

Eric

^ permalink raw reply

* Re: GPF in ip6_dst_lookup_tail
From: David Miller @ 2012-12-07 18:05 UTC (permalink / raw)
  To: davej; +Cc: netdev
In-Reply-To: <20121207180428.GA3884@redhat.com>

From: Dave Jones <davej@redhat.com>
Date: Fri, 7 Dec 2012 13:04:28 -0500

> On Fri, Dec 07, 2012 at 12:43:15PM -0500, David Miller wrote:
>  > From: Dave Jones <davej@redhat.com>
>  > Date: Fri, 7 Dec 2012 09:15:25 -0500
>  > 
>  > > I just hit this gpf in overnight testing.
>  > 
>  > Hmmm, perhaps introduced by:
>  > 
>  > commit f950c0ecc78f745e490d615280e031de4dbb1306
>  > Author: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
>  > Date:   Thu Sep 20 18:29:56 2012 +0000
>  > 
>  >     ipv6: fix return value check in fib6_add()
>  > 
>  > and fixed by:
>  > 
>  > commit 188c517a050ec5b123e72cab76ea213721e5bd9d
>  > Author: Lin Ming <mlin@ss.pku.edu.cn>
>  > Date:   Tue Sep 25 15:17:07 2012 +0000
>  > 
>  >     ipv6: return errno pointers consistently for fib6_add_1()
>  >     
> 
> That was in the build that I was running.
> (It was Linus' tree from last night, sorry I neglected to mention that)

Ok, it's something else then.

^ permalink raw reply

* Re: GPF in ip6_dst_lookup_tail
From: Dave Jones @ 2012-12-07 18:04 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20121207.124315.106563528177056022.davem@davemloft.net>

On Fri, Dec 07, 2012 at 12:43:15PM -0500, David Miller wrote:
 > From: Dave Jones <davej@redhat.com>
 > Date: Fri, 7 Dec 2012 09:15:25 -0500
 > 
 > > I just hit this gpf in overnight testing.
 > 
 > Hmmm, perhaps introduced by:
 > 
 > commit f950c0ecc78f745e490d615280e031de4dbb1306
 > Author: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
 > Date:   Thu Sep 20 18:29:56 2012 +0000
 > 
 >     ipv6: fix return value check in fib6_add()
 > 
 > and fixed by:
 > 
 > commit 188c517a050ec5b123e72cab76ea213721e5bd9d
 > Author: Lin Ming <mlin@ss.pku.edu.cn>
 > Date:   Tue Sep 25 15:17:07 2012 +0000
 > 
 >     ipv6: return errno pointers consistently for fib6_add_1()
 >     

That was in the build that I was running.
(It was Linus' tree from last night, sorry I neglected to mention that)

	Dave

^ permalink raw reply

* Re: [PULL net-next] vhost: changes for 3.8
From: David Miller @ 2012-12-07 17:58 UTC (permalink / raw)
  To: mst; +Cc: kvm, virtualization, netdev, linux-kernel, dinggnu, yongjun_wei
In-Reply-To: <20121206151800.GA3889@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Thu, 6 Dec 2012 17:18:00 +0200

> The following changes since commit b93196dc5af7729ff7cc50d3d322ab1a364aa14f:
> 
>   net: fix some compiler warning in net/core/neighbour.c (2012-12-05 21:50:37 -0500)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net-next

Pulled, thanks Michael.

In the future, please give me some descriptive text to add to the
merge commit message.

Thanks.

^ permalink raw reply

* Re: [PATCH] chelsio: remove get_clock and use ktime_get
From: David Miller @ 2012-12-07 17:56 UTC (permalink / raw)
  To: jang; +Cc: netdev
In-Reply-To: <1354785614-11468-1-git-send-email-jang@linux.vnet.ibm.com>

From: Jan Glauber <jang@linux.vnet.ibm.com>
Date: Thu,  6 Dec 2012 10:20:14 +0100

> The get_clock() of the chelsio driver clashes with the s390 one.
> The chelsio helper reads a timespec via ktime just to convert it
> back to ktime. I can see no different outcome from calling
> ktime_get directly.
> 
> Remove the get_clock and use ktime_get directly.
> 
> Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net 1/1] r8169: workaround for missing extended GigaMAC registers
From: David Miller @ 2012-12-07 17:55 UTC (permalink / raw)
  To: romieu; +Cc: netdev, jlee, udknight, hayeswang
In-Reply-To: <20121205223452.GA24164@electric-eye.fr.zoreil.com>

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Wed, 5 Dec 2012 23:34:52 +0100

> GigaMAC registers have been reported left uninitialized in several
> situations:
> - after cold boot from power-off state
> - after S3 resume
> 
> Tweaking rtl_hw_phy_config takes care of both.
> 
> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
> Cc: Hayes Wang <hayeswang@realtek.com>
> ---
>  drivers/net/ethernet/realtek/r8169.c | 42 ++++++++++++++++++++----------------
>  1 file changed, 24 insertions(+), 18 deletions(-)
> 
>  YanQing and Chun-Yi, can you add your Signed-off-by to this patch ?
>  It contains bits of everybody's work but it does not match any. :o)
> 
>  I apparently play in the safe bios league since I did not notice any
>  difference before or after the patch.
> 
>  Beware, this patch seems to apply to net-next but doing so moves
>  rtl_rar_exgmac_set from rtl8168e_2_hw_phy_config to rtl8168f_hw_phy_config.
> 
>  Hayes, your comments are welcome if any.

Francois, could you please respin this against net-next to avoid the unintended
consequence of applying the change to the wrong function?

If this change turns out to be more critical than it appears, and impacts more
people than it appears, we can queue it up for -stable later.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next 1/1] bnx2x: Prevent link flaps when booting from SAN.
From: David Miller @ 2012-12-07 17:54 UTC (permalink / raw)
  To: yuvalmin; +Cc: netdev, ariele, barak, eilong
In-Reply-To: <1354784643-22236-1-git-send-email-yuvalmin@broadcom.com>

From: "Yuval Mintz" <yuvalmin@broadcom.com>
Date: Thu, 6 Dec 2012 11:04:03 +0200

> From: Barak Witkowski <barak@broadcom.com>
> 
> It is possible that the driver is configured to operate with a certain
> link configuration which differs from the link's configuration during
> boot from SAN - this would cause the driver to flap the link.
> 
> Said flap may be missed by specific switches, causing dcbx convergence
> to take too long and the boot sequence to fail. Convergence is longer because
> the switch ignores new dcbx packets due to a counter mismatch, as only the host
> side resets the counters due to the link flap.
> 
> This patch causes the driver to ignore user's initial configuration during
> boot from SAN, and continues with the existing link configuration.
> 
> Signed-off-by: Barak Witkowski <barak@broadcom.com>
> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH 06/12] cxgb3: Use standard #defines for PCIe Capability ASPM fields
From: David Miller @ 2012-12-07 17:51 UTC (permalink / raw)
  To: bhelgaas; +Cc: linux-pci, netdev, divy
In-Reply-To: <20121205205750.13851.10893.stgit@bhelgaas.mtv.corp.google.com>

From: Bjorn Helgaas <bhelgaas@google.com>
Date: Wed, 05 Dec 2012 13:57:50 -0700

> Use the standard #defines rather than bare numbers for PCIe Capability
> ASPM fields.
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

This seems to depend upon another patch which presumably adds the
define of this new macro to a PCI header file.  So I can't apply
this to my tree.

Just merge it wherever the dependency is:

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: [PATCH v2] net: phy: smsc: force all capable mode if the phy is started in powerdown mode
From: David Miller @ 2012-12-07 17:48 UTC (permalink / raw)
  To: tremyfr
  Cc: netdev, linux-kernel, otavio, javier, jkosina, eric.jarrige,
	julien.boibessot, thomas.petazzoni
In-Reply-To: <1354647130-1854-1-git-send-email-tremyfr@yahoo.fr>

From: Philippe Reynes <tremyfr@yahoo.fr>
Date: Tue,  4 Dec 2012 19:52:10 +0100

> A SMSC PHY in power down mode can't be used.
> If a SMSC PHY is in this mode in the config_init
> stage, the mode "all capable" is set. So the PHY
> could then be used.
> 
> Signed-off-by: Philippe Reynes <tremyfr@yahoo.fr>

Applied, thanks.

^ permalink raw reply

* Re: [net-next 2/7] bna: Tx and Rx Optimizations
From: David Miller @ 2012-12-07 17:46 UTC (permalink / raw)
  To: David.Laight; +Cc: rmody, netdev, bhutchings, adapter_linux_open_src_team
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B70E5@saturn3.aculab.com>

From: "David Laight" <David.Laight@ACULAB.COM>
Date: Fri, 7 Dec 2012 10:46:22 -0000

>>  #define BNA_QE_INDX_ADD(_qe_idx, _qe_num, _q_depth)			\
>>  	((_qe_idx) = ((_qe_idx) + (_qe_num)) & ((_q_depth) - 1))
>> 
>> +#define BNA_QE_INDX_INC(_idx, _q_depth)					\
>> +do {									\
>> +	(_idx)++;							\
>> +	if ((_idx) == (_q_depth))					\
>> +		(_idx) = 0;						\
>> +} while (0)
>> +
> 
> If q_depth has to be a power of 2 (implied by BNA_QE_INDX_ADD())
> then you should mask in BNA_QE_INDX_INC() to save the conditional.
> Or just:
> #define BNA_QE_INDX_INC(_idx, _q_depth) BNA_QE_INDX_ADD(_idx, 1, _q_depth)

Agreed.
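
For reference, the equivalence being relied on can be sketched in plain C.
This is a user-space illustration with hypothetical function names, not the
bna driver code, and it holds only under the assumption that the queue depth
is a power of two.

```c
#include <assert.h>
#include <stdbool.h>

static bool demo_is_pow2(unsigned int x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* Conditional wrap, in the style of the posted BNA_QE_INDX_INC(). */
static unsigned int demo_inc_cond(unsigned int idx, unsigned int depth)
{
	idx++;
	if (idx == depth)
		idx = 0;
	return idx;
}

/* Masked wrap, in the style of BNA_QE_INDX_ADD(idx, 1, depth). */
static unsigned int demo_inc_mask(unsigned int idx, unsigned int depth)
{
	assert(demo_is_pow2(depth));	/* the mask is only correct then */
	return (idx + 1) & (depth - 1);
}
```

For a depth of 8 both helpers map 7 to 0 and agree on every other index.
For a depth such as 6 only the conditional form wraps correctly.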

^ permalink raw reply

* Re: GPF in ip6_dst_lookup_tail
From: David Miller @ 2012-12-07 17:43 UTC (permalink / raw)
  To: davej; +Cc: netdev
In-Reply-To: <20121207141525.GA20613@redhat.com>

From: Dave Jones <davej@redhat.com>
Date: Fri, 7 Dec 2012 09:15:25 -0500

> I just hit this gpf in overnight testing.

Hmmm, perhaps introduced by:

commit f950c0ecc78f745e490d615280e031de4dbb1306
Author: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Date:   Thu Sep 20 18:29:56 2012 +0000

    ipv6: fix return value check in fib6_add()

and fixed by:

commit 188c517a050ec5b123e72cab76ea213721e5bd9d
Author: Lin Ming <mlin@ss.pku.edu.cn>
Date:   Tue Sep 25 15:17:07 2012 +0000

    ipv6: return errno pointers consistently for fib6_add_1()
    

^ permalink raw reply

* Re: [PATCH net-next 0/1] fix vlan transmit performance
From: Andrew Gallatin @ 2012-12-07 17:32 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: davem, netdev
In-Reply-To: <1354837321.2828.86.camel@bwh-desktop.uk.solarflarecom.com>

On 12/06/12 18:42, Ben Hutchings wrote:
> On Thu, 2012-12-06 at 15:54 -0500, Andrew Gallatin wrote:
<..>
>> The following patch (just copy dev->features to dev->vlan_features in
>> vlan_dev_init()) seems to be the simplest way to fix it. Perhaps this
>> is wrong, and there is a better way?
> [...]
> 
> It's wrong, because those features would then be recursively transferred
> to further stacked VLAN devices.
> 

The more I play with it & try various combinations, the more
I realize how tricky stacked vlans make things,
so I think it's best for me to just leave this alone.

Sorry for the noise.

Drew

^ permalink raw reply

* Re: [PATCH net-next 02/10] tipc: eliminate aggregate sk_receive_queue limit
From: David Miller @ 2012-12-07 17:36 UTC (permalink / raw)
  To: nhorman; +Cc: paul.gortmaker, netdev, jon.maloy, ying.xue
In-Reply-To: <20121207160733.GD29819@shamino.rdu.redhat.com>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Fri, 7 Dec 2012 11:07:33 -0500

> On Fri, Dec 07, 2012 at 09:28:10AM -0500, Paul Gortmaker wrote:
>> From: Ying Xue <ying.xue@windriver.com>
>> 
>> As a complement to the per-socket sk_recv_queue limit, TIPC keeps a
>> global atomic counter for the sum of sk_recv_queue sizes across all
>> tipc sockets. When incremented, the counter is compared to an upper
>> threshold value, and if this is reached, the message is rejected
>> with error code TIPC_OVERLOAD.
>> 
>> This check was originally meant to protect the node against
>> buffer exhaustion and general CPU overload. However, all experience
>> indicates that the feature not only is redundant on Linux, but even
>> harmful. Users run into the limit very often, causing disturbances
>> for their applications, while removing it seems to have no negative
>> effects at all. We have also seen that overall performance is
>> boosted significantly when this bottleneck is removed.
>> 
>> Furthermore, we don't see any other network protocols maintaining
>> such a mechanism, something strengthening our conviction that this
>> control can be eliminated.
>> 
>> Signed-off-by: Ying Xue <ying.xue@windriver.com>
>> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
>> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
 ...
>> @@ -1241,11 +1241,6 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
>>  	}
>>  
>>  	/* Reject message if there isn't room to queue it */
>> -	recv_q_len = (u32)atomic_read(&tipc_queue_size);
>> -	if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
>> -		if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
>> -			return TIPC_ERR_OVERLOAD;
>> -	}
> If you're going to remove the one place that you read this variable, don't you
> also want to remove the points where you increment/decrement the atomic as well,
> and for that matter eliminate the definition itself?

There's another reader, a getsockopt() call.

I would just make it return zero or similar.

Paul please do so and respin this series.

Thanks.

^ permalink raw reply

* Re: [net-next 0/5][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2012-12-07 17:33 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, sassmann
In-Reply-To: <1354860695-27039-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu,  6 Dec 2012 22:11:30 -0800

> This series contains updates to igb and ixgbe.
> 
> The following are changes since commit b93196dc5af7729ff7cc50d3d322ab1a364aa14f:
>   net: fix some compiler warning in net/core/neighbour.c
> and are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Pulled, thanks Jeff.

^ permalink raw reply

* [RFC PATCH] ipv6: do not create neighbor entries for local delivery
From: Marcelo Ricardo Leitner @ 2012-12-07 17:30 UTC (permalink / raw)
  To: netdev; +Cc: Thomas Graf, Flavio Leitner

[-- Attachment #1: Type: text/plain, Size: 1702 bytes --]

RFC comment:

Please note that I do not fully understand the impact of this, hence the
RFC. I don't know if this is the best way to address the issue.

I have a report that when using TPROXY and IPv6, the neighbor cache gets
flooded with empty entries, while this does not happen with IPv4. These
empty entries look like:

# ip -6 neigh show nud all   (I masked some bits)
...
20xx::f0xx:x:3xdb dev lo lladdr 00:00:00:00:00:00 NOARP
...

Note that this address was not directly reachable by this host. It was 1 
hop away, and still got a neighbor entry.

These entries do not seem to be used during input. I disabled their
creation and did not notice any abnormal results.

Below the dashes is my original patch description. It applies to net-next.

Please advise.

Thanks,
Marcelo.

----------------

They will be created at output, if ever needed. This avoids creating
empty neighbor entries when TPROXYing/Forwarding packets for addresses
that are not even directly reachable.

Note that IPv4 already handles it this way. No neighbor entries are
created for local input.
---
  net/ipv6/route.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index e229a3bc345dc4138a188282c4ab4f1717882832..e6058ab0bb94233da1eec3349e098175d5abf831 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -928,7 +928,7 @@ restart:
  	dst_hold(&rt->dst);
  	read_unlock_bh(&table->tb6_lock);
-	if (!rt->n && !(rt->rt6i_flags & RTF_NONEXTHOP))
+	if (!rt->n && !(rt->rt6i_flags & (RTF_NONEXTHOP|RTF_LOCAL)))
  		nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
  	else if (!(rt->dst.flags & DST_HOST))
  		nrt = rt6_alloc_clone(rt, &fl6->daddr);
-- 
1.7.11.7





^ permalink raw reply

* Re: [PATCH rfc] netfilter: two xtables matches
From: Willem de Bruijn @ 2012-12-07 17:26 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Jan Engelhardt, netfilter-devel, netdev, Eric Dumazet,
	David Miller, Patrick McHardy
In-Reply-To: <20121207132014.GB3019@1984>

On Fri, Dec 7, 2012 at 8:20 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Thu, Dec 06, 2012 at 04:12:10PM -0500, Willem de Bruijn wrote:
>> On Thu, Dec 6, 2012 at 12:22 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> > On Wed, Dec 05, 2012 at 09:00:36PM +0100, Jan Engelhardt wrote:
>> >> On Wednesday 2012-12-05 20:28, Willem de Bruijn wrote:
>> >>
>> >> >Somehow, the first part of this email went missing. Not critical,
>> >> >but for completeness:
>> >> >
>> >> >These two patches each add an xtables match.
>> >> >
>> >> >The xt_priority match is a straightforward addition in the style of
>> >> >xt_mark, adding the option to filter on one more sk_buff field. I
>> >> >have an immediate application for this. The amount of code (in
>> >> >kernel + userspace) to add a single check proved quite large.
>> >>
>> >> Hm so yeah, can't we just place this in xt_mark.c?
>> >
>> > I don't feel this belongs to xt_mark at all.
>>
>> Do you have other concerns, or can I resubmit as is for merging in a
>> few days if no one raises additional issues?
>
> In nftables we have the 'meta' extension that allows to match all
> skbuff fields (among other things):
>
> http://1984.lsi.us.es/git/nf-next/tree/net/netfilter/nft_meta.c?h=nf_tables8
>
> I think it's the way to go so we stop adding small matches for each
> skbuff field.
>
> I don't mind the name if it's xt_skbuff or xt_meta.

Okay. I'll respin right now with one more field to select the skb field to
match on, as a patch against tree nf-next, and will send that to
netfilter-devel.

^ permalink raw reply

* Re: [PATCH] ipv4/route/rtnl: get mcast attributes when dst is multicast
From: David Miller @ 2012-12-07 17:25 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev
In-Reply-To: <1354618987-15794-1-git-send-email-nicolas.dichtel@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Tue,  4 Dec 2012 12:03:07 +0100

> Commit f1ce3062c538 (ipv4: Remove 'rt_dst' from 'struct rtable') removes the
> call to ipmr_get_route(), which will get multicast parameters of the route.
> 
> I revert the part of the patch that removes this call. I think the goal was only
> to get rid of the rt_dst field.
> 
> The patch is only compile-tested. My first idea was to remove ipmr_get_route()
> because rt_fill_info() was the only user, but it seems the previous patch cleans
> the code a bit too much ;-)
> 
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Good catch, applied to net-next.

Thanks.

^ permalink raw reply

* Re: [patch net-next] net: call notifiers for mtu change even if iface is not up
From: David Miller @ 2012-12-07 17:23 UTC (permalink / raw)
  To: nhorman; +Cc: jiri, netdev, edumazet, bhutchings, psimerda
In-Reply-To: <20121207150708.GA29819@shamino.rdu.redhat.com>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Fri, 7 Dec 2012 10:07:08 -0500

> On Fri, Dec 07, 2012 at 01:29:20PM +0100, Jiri Pirko wrote:
>> Mon, Dec 03, 2012 at 04:22:50PM CET, nhorman@tuxdriver.com wrote:
>> >On Mon, Dec 03, 2012 at 03:22:29PM +0100, Jiri Pirko wrote:
>> >> Mon, Dec 03, 2012 at 03:18:23PM CET, nhorman@tuxdriver.com wrote:
>> >> >On Mon, Dec 03, 2012 at 12:16:32PM +0100, Jiri Pirko wrote:
>> >> >> Do the same thing as in set mac. Call notifiers every time.
>> >> >> 
>> >> >> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
 ...
> Acked-by: Neil Horman <nhorman@tuxdriver.com>

Applied.

^ permalink raw reply

* Re: [PATCH stable] ipv4: do not cache looped multicasts
From: David Miller @ 2012-12-07 17:14 UTC (permalink / raw)
  To: caiqian; +Cc: netdev, mbizon, ja
In-Reply-To: <1620782878.5271574.1354895934053.JavaMail.root@redhat.com>


Please stop submitting networking -stable patches.

Thank you.

^ permalink raw reply

