Netdev List


* Re: ipv4: Use standard iovec primitive in raw_probe_proto_opt
From: Al Viro @ 2014-11-06  6:43 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David Miller, netdev, linux-kernel, bcrl, Masahide Nakamura,
	Hideaki YOSHIFUJI
In-Reply-To: <20141106055023.GA28865@gondor.apana.org.au>

On Thu, Nov 06, 2014 at 01:50:23PM +0800, Herbert Xu wrote:
> +	/* We only need the first two bytes. */
> +	err = memcpy_fromiovecend((void *)&icmph, msg->msg_iov, 0, 2);
> +	if (err)
> +		return err;
> +
> +	fl4->fl4_icmp_type = icmph.type;
> +	fl4->fl4_icmp_code = icmph.code;

That's more readable, but that exposes another problem in there - we read
the same piece of userland data twice, with no promise whatsoever that we'll
get the same value both times...
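
To make the concern concrete, here is a minimal user-space sketch, not kernel
code. gather_prefix() is a hypothetical stand-in for a "copy the first len
bytes of an iovec array" primitive (it is not memcpy_fromiovecend()), and
struct icmp_probe and probe_type_code() are illustrative names. The point is
that the two bytes are copied once into a local buffer and every later
decision uses only that snapshot:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper: copy the first len bytes described by an iovec
 * array into dst, walking across segments (zero-length ones are skipped). */
static int gather_prefix(void *dst, const struct iovec *iov, size_t iovcnt,
			 size_t len)
{
	unsigned char *out = dst;
	size_t i;

	for (i = 0; i < iovcnt && len > 0; i++) {
		size_t chunk = iov[i].iov_len < len ? iov[i].iov_len : len;

		if (!chunk)
			continue;
		memcpy(out, iov[i].iov_base, chunk);
		out += chunk;
		len -= chunk;
	}
	return len ? -1 : 0;	/* -1: the vector was shorter than requested */
}

struct icmp_probe {
	unsigned char type;
	unsigned char code;
};

/* Take one snapshot of the first two bytes and decide from it only. */
static int probe_type_code(struct icmp_probe *p, const struct iovec *iov,
			   size_t iovcnt)
{
	unsigned char hdr[2];

	if (gather_prefix(hdr, iov, iovcnt, sizeof(hdr)))
		return -1;
	p->type = hdr[0];
	p->code = hdr[1];
	return 0;
}
```

If the same source buffer is copied again later, nothing guarantees that the
second copy matches this snapshot, which is the problem described above.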

^ permalink raw reply

* [PATCH] bridge: missing null bridge device check causing null pointer dereference (bugfix)
From: Su-Hyun Park @ 2014-11-06  6:26 UTC (permalink / raw)
  To: Stephen Hemminger, David S. Miller
  Cc: bridge, netdev, linux-kernel, Su-Hyun Park

The bridge device can be NULL if the bridge is being deleted while a packet
is being processed, which causes a NULL pointer dereference in the switch
statement.

crash dump snippet:

<1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000021
<1>IP: [<ffffffff814179f6>] br_handle_frame+0xe6/0x270

<0>Code: 4c 0f 44 f0 89 f8 66 33 15 32 52 24 00 66 33 05 29 52 24 00 09 c2 89 
f0 66 33 05 22 52 24 00 80 e4 f0 66 09 c2 0f 84 eb 00 00 00 <41> 0f b6 46 21 
3c 02 74 61 3c 03 74 1d 48 89 df e8 d5 bc f0 ff
---
 net/bridge/br_input.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 6fd5522..7e899ca 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -176,6 +176,8 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
 		return RX_HANDLER_CONSUMED;

 	p = br_port_get_rcu(skb->dev);
+	if (!p)
+		goto drop;

 	if (unlikely(is_link_local_ether_addr(dest))) {
 		u16 fwd_mask = p->br->group_fwd_mask_required;
-- 
1.8.1.4

^ permalink raw reply related

* [PATCH net-next] PPC: bpf_jit_comp: add SKF_AD_HATYPE instruction
From: Denis Kirjanov @ 2014-11-06  6:02 UTC (permalink / raw)
  To: netdev
  Cc: linuxppc-dev, Denis Kirjanov, Alexei Starovoitov, Daniel Borkmann,
	Philippe Bergheaud

Add the BPF extension SKF_AD_HATYPE to the ppc JIT to check
the hardware type of the interface.

JIT off:
[   69.106783] test_bpf: #20 LD_HATYPE 48 48 PASS
JIT on:
[   64.721757] test_bpf: #20 LD_HATYPE 7 6 PASS

CC: Alexei Starovoitov<alexei.starovoitov@gmail.com>
CC: Daniel Borkmann<dborkman@redhat.com>
CC: Philippe Bergheaud<felix@linux.vnet.ibm.com>
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
---
 arch/powerpc/net/bpf_jit_comp.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index d110e28..8bf4fc2 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -412,6 +412,22 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 			PPC_ANDI(r_A, r_A, PKT_TYPE_MAX);
 			PPC_SRWI(r_A, r_A, 5);
 			break;
+		case BPF_ANC | SKF_AD_HATYPE:
+			BUILD_BUG_ON(FIELD_SIZEOF(struct net_device, type) != 2);
+			PPC_LD_OFFS(r_scratch1, r_skb, offsetof(struct sk_buff,
+								dev));
+			PPC_CMPDI(r_scratch1, 0);
+			if (ctx->pc_ret0 != -1) {
+				PPC_BCC(COND_EQ, addrs[ctx->pc_ret0]);
+			} else {
+				/* Exit, returning 0; first pass hits here. */
+				PPC_BCC_SHORT(COND_NE, (ctx->idx*4)+12);
+				PPC_LI(r_ret, 0);
+				PPC_JMP(exit_addr);
+			}
+			PPC_LHZ_OFFS(r_A, r_scratch1,
+				     offsetof(struct net_device, type));
+			break;
 		case BPF_ANC | SKF_AD_CPU:
 #ifdef CONFIG_SMP
 			/*
-- 
2.1.0

^ permalink raw reply related

* ipv4: Use standard iovec primitive in raw_probe_proto_opt
From: Herbert Xu @ 2014-11-06  5:50 UTC (permalink / raw)
  To: Al Viro
  Cc: David Miller, netdev, linux-kernel, bcrl, Masahide Nakamura,
	Hideaki YOSHIFUJI
In-Reply-To: <20141106032533.GU7996@ZenIV.linux.org.uk>

On Thu, Nov 06, 2014 at 03:25:34AM +0000, Al Viro wrote:
>
> 	* there's some really weird stuff in there.  Just what is this
> static int raw_probe_proto_opt(struct flowi4 *fl4, struct msghdr *msg)
> {

It looks like newbie coding, that's all.  There's nothing tricky
here as far as I can tell.  We're just trying to fetch the ICMP
header to seed the IPsec lookup.

So how about this rewrite? I'm assuming that you're not going
to get rid of memcpy_fromiovecend/memcpy_toiovecend; if you
are, let me know and I'll redo this with iterators.

ipv4: Use standard iovec primitive in raw_probe_proto_opt

The function raw_probe_proto_opt tries to extract the first two
bytes from the user input in order to seed the IPsec lookup for
ICMP packets.  In doing so it's processing iovec by hand and
overcomplicating things.

This patch replaces the manual iovec processing with a call to
memcpy_fromiovecend.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 739db31..04f67e1 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -422,48 +422,20 @@ error:
 
 static int raw_probe_proto_opt(struct flowi4 *fl4, struct msghdr *msg)
 {
-	struct iovec *iov;
-	u8 __user *type = NULL;
-	u8 __user *code = NULL;
-	int probed = 0;
-	unsigned int i;
+	struct icmphdr icmph;
+	int err;
 
-	if (!msg->msg_iov)
+	if (fl4->flowi4_proto != IPPROTO_ICMP)
 		return 0;
 
-	for (i = 0; i < msg->msg_iovlen; i++) {
-		iov = &msg->msg_iov[i];
-		if (!iov)
-			continue;
-
-		switch (fl4->flowi4_proto) {
-		case IPPROTO_ICMP:
-			/* check if one-byte field is readable or not. */
-			if (iov->iov_base && iov->iov_len < 1)
-				break;
-
-			if (!type) {
-				type = iov->iov_base;
-				/* check if code field is readable or not. */
-				if (iov->iov_len > 1)
-					code = type + 1;
-			} else if (!code)
-				code = iov->iov_base;
-
-			if (type && code) {
-				if (get_user(fl4->fl4_icmp_type, type) ||
-				    get_user(fl4->fl4_icmp_code, code))
-					return -EFAULT;
-				probed = 1;
-			}
-			break;
-		default:
-			probed = 1;
-			break;
-		}
-		if (probed)
-			break;
-	}
+	/* We only need the first two bytes. */
+	err = memcpy_fromiovecend((void *)&icmph, msg->msg_iov, 0, 2);
+	if (err)
+		return err;
+
+	fl4->fl4_icmp_type = icmph.type;
+	fl4->fl4_icmp_code = icmph.code;
+
 	return 0;
 }
 

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related

* [PATCH net-next 1/3] r8152: move r8152b_get_version
From: Hayes Wang @ 2014-11-06  4:47 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang
In-Reply-To: <1394712342-15778-84-Taiwan-albertk@realtek.com>

Move r8152b_get_version() before rtl_ops_init() so that
rtl_ops_init() can use tp->version.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index fd41675..4b6db8a 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -3833,6 +3833,7 @@ static int rtl8152_probe(struct usb_interface *intf,
 	tp->netdev = netdev;
 	tp->intf = intf;
 
+	r8152b_get_version(tp);
 	ret = rtl_ops_init(tp, id);
 	if (ret)
 		goto out;
@@ -3866,11 +3867,9 @@ static int rtl8152_probe(struct usb_interface *intf,
 	tp->mii.phy_id_mask = 0x3f;
 	tp->mii.reg_num_mask = 0x1f;
 	tp->mii.phy_id = R8152_PHY_ID;
-	tp->mii.supports_gmii = 0;
 
 	intf->needs_remote_wakeup = 1;
 
-	r8152b_get_version(tp);
 	tp->rtl_ops.init(tp);
 	set_ethernet_addr(tp);
 
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next 3/3] r8152: remove the definitions of the PID
From: Hayes Wang @ 2014-11-06  4:47 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang
In-Reply-To: <1394712342-15778-84-Taiwan-albertk@realtek.com>

The PIDs are only used in the id table, so the definitions are
unnecessary. Removing them causes no confusion.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index cf1b8a7..66b139a 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -461,11 +461,7 @@ enum rtl8152_flags {
 
 /* Define these values to match your device */
 #define VENDOR_ID_REALTEK		0x0bda
-#define PRODUCT_ID_RTL8152		0x8152
-#define PRODUCT_ID_RTL8153		0x8153
-
 #define VENDOR_ID_SAMSUNG		0x04e8
-#define PRODUCT_ID_SAMSUNG		0xa101
 
 #define MCU_TYPE_PLA			0x0100
 #define MCU_TYPE_USB			0x0000
@@ -3898,9 +3894,9 @@ static void rtl8152_disconnect(struct usb_interface *intf)
 
 /* table of devices that work with this driver */
 static struct usb_device_id rtl8152_table[] = {
-	{USB_DEVICE(VENDOR_ID_REALTEK, PRODUCT_ID_RTL8152)},
-	{USB_DEVICE(VENDOR_ID_REALTEK, PRODUCT_ID_RTL8153)},
-	{USB_DEVICE(VENDOR_ID_SAMSUNG, PRODUCT_ID_SAMSUNG)},
+	{USB_DEVICE(VENDOR_ID_REALTEK, 0x8152)},
+	{USB_DEVICE(VENDOR_ID_REALTEK, 0x8153)},
+	{USB_DEVICE(VENDOR_ID_SAMSUNG, 0xa101)},
 	{}
 };
 
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next 2/3] r8152: modify rtl_ops_init
From: Hayes Wang @ 2014-11-06  4:47 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang
In-Reply-To: <1394712342-15778-84-Taiwan-albertk@realtek.com>

Use tp->version instead of the VID/PID to initialize the ops.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 79 ++++++++++++++++++-------------------------------
 1 file changed, 28 insertions(+), 51 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 4b6db8a..cf1b8a7 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -3742,66 +3742,43 @@ static void rtl8153_unload(struct r8152 *tp)
 	r8153_power_cut_en(tp, false);
 }
 
-static int rtl_ops_init(struct r8152 *tp, const struct usb_device_id *id)
+static int rtl_ops_init(struct r8152 *tp)
 {
 	struct rtl_ops *ops = &tp->rtl_ops;
-	int ret = -ENODEV;
-
-	switch (id->idVendor) {
-	case VENDOR_ID_REALTEK:
-		switch (id->idProduct) {
-		case PRODUCT_ID_RTL8152:
-			ops->init		= r8152b_init;
-			ops->enable		= rtl8152_enable;
-			ops->disable		= rtl8152_disable;
-			ops->up			= rtl8152_up;
-			ops->down		= rtl8152_down;
-			ops->unload		= rtl8152_unload;
-			ops->eee_get		= r8152_get_eee;
-			ops->eee_set		= r8152_set_eee;
-			ret = 0;
-			break;
-		case PRODUCT_ID_RTL8153:
-			ops->init		= r8153_init;
-			ops->enable		= rtl8153_enable;
-			ops->disable		= rtl8153_disable;
-			ops->up			= rtl8153_up;
-			ops->down		= rtl8153_down;
-			ops->unload		= rtl8153_unload;
-			ops->eee_get		= r8153_get_eee;
-			ops->eee_set		= r8153_set_eee;
-			ret = 0;
-			break;
-		default:
-			break;
-		}
+	int ret = 0;
+
+	switch (tp->version) {
+	case RTL_VER_01:
+	case RTL_VER_02:
+		ops->init		= r8152b_init;
+		ops->enable		= rtl8152_enable;
+		ops->disable		= rtl8152_disable;
+		ops->up			= rtl8152_up;
+		ops->down		= rtl8152_down;
+		ops->unload		= rtl8152_unload;
+		ops->eee_get		= r8152_get_eee;
+		ops->eee_set		= r8152_set_eee;
 		break;
 
-	case VENDOR_ID_SAMSUNG:
-		switch (id->idProduct) {
-		case PRODUCT_ID_SAMSUNG:
-			ops->init		= r8153_init;
-			ops->enable		= rtl8153_enable;
-			ops->disable		= rtl8153_disable;
-			ops->up			= rtl8153_up;
-			ops->down		= rtl8153_down;
-			ops->unload		= rtl8153_unload;
-			ops->eee_get		= r8153_get_eee;
-			ops->eee_set		= r8153_set_eee;
-			ret = 0;
-			break;
-		default:
-			break;
-		}
+	case RTL_VER_03:
+	case RTL_VER_04:
+	case RTL_VER_05:
+		ops->init		= r8153_init;
+		ops->enable		= rtl8153_enable;
+		ops->disable		= rtl8153_disable;
+		ops->up			= rtl8153_up;
+		ops->down		= rtl8153_down;
+		ops->unload		= rtl8153_unload;
+		ops->eee_get		= r8153_get_eee;
+		ops->eee_set		= r8153_set_eee;
 		break;
 
 	default:
+		ret = -ENODEV;
+		netif_err(tp, probe, tp->netdev, "Unknown Device\n");
 		break;
 	}
 
-	if (ret)
-		netif_err(tp, probe, tp->netdev, "Unknown Device\n");
-
 	return ret;
 }
 
@@ -3834,7 +3811,7 @@ static int rtl8152_probe(struct usb_interface *intf,
 	tp->intf = intf;
 
 	r8152b_get_version(tp);
-	ret = rtl_ops_init(tp, id);
+	ret = rtl_ops_init(tp);
 	if (ret)
 		goto out;
 
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next 0/3] r8152: rtl_ops_init modify
From: Hayes Wang @ 2014-11-06  4:47 UTC (permalink / raw)
  To: netdev
  Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang

Initialize the ops through tp->version. This avoids checking
each VID/PID.

Hayes Wang (3):
  r8152: move r8152b_get_version
  r8152: modify rtl_ops_init
  r8152: remove the definitions of the PID

 drivers/net/usb/r8152.c | 92 +++++++++++++++++--------------------------------
 1 file changed, 32 insertions(+), 60 deletions(-)

-- 
1.9.3


^ permalink raw reply

* Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter
From: Al Viro @ 2014-11-06  3:25 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, netdev, linux-kernel, bcrl
In-Reply-To: <20141105.165719.835728206041332333.davem@davemloft.net>

On Wed, Nov 05, 2014 at 04:57:19PM -0500, David Miller wrote:
> From: Al Viro <viro@ZenIV.linux.org.uk>
> Date: Wed, 5 Nov 2014 21:07:45 +0000
> 
> > Ping me when you put it there, OK?  I'll rebase the rest of old stuff on
> > top of it (similar helpers, mostly).
> 
> I just pushed it into net-next, thanks Al.

OK, I've taken the beginning of the old queue on top of net-next; it's
in git://git.kernel.org//pub/scm/linux/kernel/git/viro/vfs.git iov_iter-net.

From the quick look at the remaining ->msg_iov users:

	* I'll need to add several iov_iter primitives - counterparts of
checksum.h stuff (copy_and_csum_{from,to}_iter(), maybe some more).  Not
a big deal, I'll do that tomorrow.  That will give us a clean iov_iter-based
counterpart of skb_copy_and_csum_datagram_iovec().

	* a new helper: zerocopy_sg_from_iter().  I have it, actually,
but I'd rather not step on Herbert's toes - it's too close to the areas
his series will touch, so that's probably for when his series goes in.
It will be needed for complete macvtap conversion...

	* why doesn't verify_iovec() use rw_copy_check_uvector()?  The only
real differences I see are that (a) you do allocation in callers (same as
rw_copy_check_uvector() would've done), (b) you return EMSGSIZE in case of
too long vector, while rw_copy_check_uvector() returns EINVAL in that case
and (c) you don't do access_ok().  The last one is described as optimization,
but for iov_iter primitives it's a serious PITA - for iovec-backed instances
they are using __copy_from_user()/__copy_to_user(), etc.
	It certainly would be nice to have the same code doing all copying
of iovecs from userland - readv/writev/aio/sendmsg/recvmsg/etc.  Am I
missing some subtle semantic difference in there?  EMSGSIZE vs EINVAL
is trivial (we can lift that check into the callers, if nothing else), but
I could miss something more interesting...

	* various getfrag will need to grow iov_iter-based counterparts,
but ip_append_output() needs no changes, AFAICS.

	* crypto stuff will be easy to convert - iov_iter_get_pages()
would suffice for a primitive

	* there's some really weird stuff in there.  Just what is this
static int raw_probe_proto_opt(struct flowi4 *fl4, struct msghdr *msg)
{
        struct iovec *iov;
        u8 __user *type = NULL;
        u8 __user *code = NULL;
        int probed = 0;
        unsigned int i;

        if (!msg->msg_iov)
                return 0;

        for (i = 0; i < msg->msg_iovlen; i++) {
                iov = &msg->msg_iov[i];
                if (!iov)
                        continue;
trying to do?  "If non-NULL pointer + i somehow happened to be NULL, skip it
and try to use the same pointer + i + 1"?  Huh?  Had been that way since
the function first went in back in 2004 ("[IPV4] XFRM: probe icmp type/code
when sending packets via raw socket.", according to historical tree)...

	* rds, bluetooth and vsock are doing something odd; need to RTFS some
more.

	* not sure I understand what TIPC is doing - does it prohibit too
short first segment of ->msg_iov?  net/tipc/socket.c:dest_name_check() looks
odd _and_ potentially racy - we read the same data twice and hope our checks
still apply.  I asked TIPC folks about that race back in April, but it
looks like that fell through the cracks...

Overall, so far it looks more or less feasible - other than the missing csum
primitives, current mm/iov_iter.c should suffice.  I have _not_ seriously
looked into sendpage yet; that might very well require some more.

^ permalink raw reply

* Re: [PATCH net-next] fou: Fix typo in returning flags in netlink
From: David Miller @ 2014-11-06  3:18 UTC (permalink / raw)
  To: therbert; +Cc: netdev
In-Reply-To: <1415234978-31931-1-git-send-email-therbert@google.com>

From: Tom Herbert <therbert@google.com>
Date: Wed,  5 Nov 2014 16:49:38 -0800

> When filling netlink info, dport is being returned as flags. Fix
> instances to return correct value.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>

Applied, thanks Tom.

^ permalink raw reply

* Re: [PATCH net-next] r8152: disable the tasklet by default
From: David Miller @ 2014-11-06  3:17 UTC (permalink / raw)
  To: hayeswang; +Cc: netdev, nic_swsd, linux-kernel, linux-usb
In-Reply-To: <1394712342-15778-83-Taiwan-albertk@realtek.com>

From: Hayes Wang <hayeswang@realtek.com>
Date: Wed, 5 Nov 2014 10:17:02 +0800

> Let the tasklet only be enabled after open(), and be disabled for
> the other situation. The tasklet is only necessary after open() for
> tx/rx, so it could be disabled by default.
> 
> Signed-off-by: Hayes Wang <hayeswang@realtek.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net v4] ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs
From: David Miller @ 2014-11-06  3:14 UTC (permalink / raw)
  To: dborkman; +Cc: lw1a2.jing, netdev, edumazet, hannes, david.stevens
In-Reply-To: <1415215658-10054-1-git-send-email-dborkman@redhat.com>

From: Daniel Borkmann <dborkman@redhat.com>
Date: Wed,  5 Nov 2014 20:27:38 +0100

> It has been reported that generating an MLD listener report on
> devices with large MTUs (e.g. 9000) and a high number of IPv6
> addresses can trigger a skb_over_panic():
 ...
> mld_newpack() skb allocations are usually requested with dev->mtu
> in size, since commit 72e09ad107e7 ("ipv6: avoid high order allocations")
> we have changed the limit in order to be less likely to fail.
> 
> However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
> macros, which determine if we may end up doing an skb_put() for
> adding another record. To avoid possible fragmentation, we check
> the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
> assumption as the actual max allocation size can be much smaller.
> 
> The IGMP case doesn't have this issue as commit 57e1ab6eaddc
> ("igmp: refine skb allocations") stores the allocation size in
> the cb[].
> 
> Set a reserved_tailroom to make it fit into the MTU and use
> skb_availroom() helper instead. This also allows to get rid of
> igmp_skb_size().
> 
> Reported-by: Wei Liu <lw1a2.jing@gmail.com>
> Fixes: 72e09ad107e7 ("ipv6: avoid high order allocations")
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>

This has always been a tricky area, applied and queued up for
-stable, thanks everyone.

^ permalink raw reply

* Re: [PATCH v2 net-next] udp: Increment UDP_MIB_IGNOREDMULTI for arriving unmatched multicasts
From: David Miller @ 2014-11-06  3:11 UTC (permalink / raw)
  To: raj; +Cc: netdev
In-Reply-To: <20141104234710.7FC7C290039D@tardy>

From: raj@tardy.usa.hp.com (Rick Jones)
Date: Tue,  4 Nov 2014 15:47:10 -0800 (PST)

> @@ -1656,6 +1657,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
>  	int dif = skb->dev->ifindex;
>  	unsigned int count = 0, offset = offsetof(typeof(*sk), sk_nulls_node);
>  	unsigned int hash2 = 0, hash2_any = 0, use_hash2 = (hslot->count > 10);
> +	unsigned int inner_flushed = 0;
>  
>  	if (use_hash2) {
>  		hash2_any = udp4_portaddr_hash(net, htonl(INADDR_ANY), hnum) &
 ...
> @@ -781,6 +781,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
>  	int dif = inet6_iif(skb);
>  	unsigned int count = 0, offset = offsetof(typeof(*sk), sk_nulls_node);
>  	unsigned int hash2 = 0, hash2_any = 0, use_hash2 = (hslot->count > 10);
> +	int inner_flushed = 0;

Please use bool/true/false for inner_flushed in these two functions.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next] net: Convert SEQ_START_TOKEN/seq_printf to seq_puts
From: David Miller @ 2014-11-06  3:05 UTC (permalink / raw)
  To: joe; +Cc: netdev
In-Reply-To: <1415144223.1508.1.camel@perches.com>

From: Joe Perches <joe@perches.com>
Date: Tue, 04 Nov 2014 15:37:03 -0800

> Using a single fixed string is smaller code size than using
> a format and many string arguments.
> 
> Reduces overall code size a little.
> 
> $ size net/ipv4/igmp.o* net/ipv6/mcast.o* net/ipv6/ip6_flowlabel.o*
>    text	   data	    bss	    dec	    hex	filename
>   34269	   7012	  14824	  56105	   db29	net/ipv4/igmp.o.new
>   34315	   7012	  14824	  56151	   db57	net/ipv4/igmp.o.old
>   30078	   7869	  13200	  51147	   c7cb	net/ipv6/mcast.o.new
>   30105	   7869	  13200	  51174	   c7e6	net/ipv6/mcast.o.old
>   11434	   3748	   8580	  23762	   5cd2	net/ipv6/ip6_flowlabel.o.new
>   11491	   3748	   8580	  23819	   5d0b	net/ipv6/ip6_flowlabel.o.old
> 
> Signed-off-by: Joe Perches <joe@perches.com>

Ok, I'm fine with this, applied.

Thanks Joe.

^ permalink raw reply

* Re: [PATCH] rtlwifi: Add more checks for get_btc_status callback
From: Mike Galbraith @ 2014-11-06  3:03 UTC (permalink / raw)
  To: Larry Finger
  Cc: Murilo Opsfelder Araujo, linux-kernel, linux-wireless, netdev,
	Chaoming Li, John W. Linville, Thadeu Cascardo, troy_tan
In-Reply-To: <545A6894.7040506@lwfinger.net>

On Wed, 2014-11-05 at 12:12 -0600, Larry Finger wrote:

> Yes, I am aware that rtl8192se is failing, and now that I am back from vacation, 
> I am working on the problem. If you want to use the driver with kernel 3.18, 
> clone the repo at http://github.com/lwfinger/rtlwifi_new.git and build and 
> install either the master or kernel_version branches. Both work.

Nah, no hurry.  My lappy is about to go on 4 weeks vacation, and has a
bulging suitcase full of kernels to wear :)

-Mike

^ permalink raw reply

* Re: [PATCH net-next] fast_hash: avoid indirect function calls
From: David Miller @ 2014-11-06  3:03 UTC (permalink / raw)
  To: hannes; +Cc: netdev, kernel, dborkman, tgraf
In-Reply-To: <8214a3fdc8b7f97bb782c8722e9f1e65037553fe.1415142006.git.hannes@stressinduktion.org>

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Wed,  5 Nov 2014 00:23:04 +0100

> By default the arch_fast_hash hashing function pointers are initialized
> to jhash(2). If during boot-up a CPU with SSE4.2 is detected they get
> updated to the CRC32 ones. This dispatching scheme incurs a function
> pointer lookup and indirect call for every hashing operation.
> 
> rhashtable as a user of arch_fast_hash e.g. stores pointers to hashing
> functions in its structure, too, causing two indirect branches per
> hashing operation.
> 
> Using alternative_call we can get away with one of those indirect branches.
> 
> Acked-by: Daniel Borkmann <dborkman@redhat.com>
> Cc: Thomas Graf <tgraf@suug.ch>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Applied, thanks Hannes.

> Would it make sense to start suppressing the generation of local
> functions for static inline functions whose address is taken?
> 
> E.g. we could use extern inline in a few cases (dst_output is often used
> as a function pointer but marked static inline).  We could mark it as
> extern inline and copy&paste the code to a .c file to prevent multiple
> copies of machine code for this function. But because of the copy&paste I
> did not in this case.

I'd say that perhaps dst_output() can be handled in the "traditional"
way, by not inlining it ever.

If we have indirect function invocations and non-direct inlines, maybe
in the end it's better to have it in a single hot cache location, no?

^ permalink raw reply

* Re: [PATCH net-next v1 00/12] amd-xgbe: AMD XGBE driver updates 2014-11-04
From: David Miller @ 2014-11-06  3:00 UTC (permalink / raw)
  To: thomas.lendacky; +Cc: netdev
In-Reply-To: <20141104220620.24738.10070.stgit@tlendack-t1.amdoffice.net>

From: Tom Lendacky <thomas.lendacky@amd.com>
Date: Tue, 4 Nov 2014 16:06:20 -0600

> The following series of patches includes functional updates to the
> driver as well as some trivial changes for function renaming and
> spelling fixes.
> 
> - Move channel and ring structure allocation into the device open path
> - Rename the pre_xmit function to dev_xmit
> - Explicitly use the u32 data type for the device descriptors
> - Use page allocation for the receive buffers
> - Add support for split header/payload receive
> - Add support for per DMA channel interrupts
> - Add support for receive side scaling (RSS)
> - Add support for ethtool receive side scaling commands
> - Fix the spelling of descriptors
> - After a PCS reset, sync the PCS and PHY modes
> - Add dependency on HAS_IOMEM to both the amd-xgbe and amd-xgbe-phy
>   drivers
> 
> This patch series is based on net-next.

Series applied, this series looked really nice.

Thanks.

^ permalink raw reply

* Re: [PATCH net 3/5] fm10k: Implement ndo_gso_check()
From: Alexander Duyck @ 2014-11-06  2:54 UTC (permalink / raw)
  To: Joe Stringer, netdev
  Cc: sathya.perla, jeffrey.t.kirsher, linux.nics, amirv, shahed.shaikh,
	Dept-GELinuxNICDev, therbert, linux-kernel
In-Reply-To: <1415138202-1197-4-git-send-email-joestringer@nicira.com>

On 11/04/2014 01:56 PM, Joe Stringer wrote:
> ndo_gso_check() was recently introduced to allow NICs to report the
> offloading support that they have on a per-skb basis. Add an
> implementation for this driver which checks for something that looks
> like VXLAN.
>
> Implementation shamelessly stolen from Tom Herbert:
> http://thread.gmane.org/gmane.linux.network/332428/focus=333111
>
> Signed-off-by: Joe Stringer <joestringer@nicira.com>
> ---
> Should this driver report support for GSO on packets with tunnel headers
> up to 64B like the i40e driver does?
> ---
>  drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |   12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> index 8811364..b9ef622 100644
> --- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> +++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> @@ -1350,6 +1350,17 @@ static void fm10k_dfwd_del_station(struct net_device *dev, void *priv)
>  	}
>  }
>  
> +static bool fm10k_gso_check(struct sk_buff *skb, struct net_device *dev)
> +{
> +	if ((skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL) &&
> +	    (skb->inner_protocol_type != ENCAP_TYPE_ETHER ||
> +	     skb->inner_protocol != htons(ETH_P_TEB) ||
> +	     skb_inner_mac_header(skb) - skb_transport_header(skb) != 16))
> +		return false;
> +
> +	return true;
> +}
> +
>  static const struct net_device_ops fm10k_netdev_ops = {
>  	.ndo_open		= fm10k_open,
>  	.ndo_stop		= fm10k_close,
> @@ -1372,6 +1383,7 @@ static const struct net_device_ops fm10k_netdev_ops = {
>  	.ndo_do_ioctl		= fm10k_ioctl,
>  	.ndo_dfwd_add_station	= fm10k_dfwd_add_station,
>  	.ndo_dfwd_del_station	= fm10k_dfwd_del_station,
> +	.ndo_gso_check		= fm10k_gso_check,
>  };
>  
>  #define DEFAULT_DEBUG_LEVEL_SHIFT 3

I'm thinking this check is far too simplistic.  If you look, the fm10k
driver already has fm10k_tx_encap_offload() in the TSO function for
verifying if it can support offloading tunnels or not.  I would
recommend starting there or possibly even just adapting that function to
suit your purpose.

Thanks,

Alex

^ permalink raw reply

* Re: [PATCH 02/13] net_sched: introduce qdisc_peek() helper function
From: Cong Wang @ 2014-11-06  2:50 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Stephen Hemminger, Linux Kernel Network Developers
In-Reply-To: <20141105034929.GA19857@gondor.apana.org.au>

On Tue, Nov 4, 2014 at 7:49 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Tue, Nov 4, 2014 at 10:45 AM, Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
>>> On Tue,  4 Nov 2014 09:56:25 -0800
>>> Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>
>>>> +static inline void qdisc_warn_nonwc(void *func, struct Qdisc *qdisc)
>>>> +{
>>>> +     if (!(qdisc->flags & TCQ_F_WARN_NONWC)) {
>>>> +             pr_warn("%pf: %s qdisc %X: is non-work-conserving?\n",
>>>> +                     func, qdisc->ops->id, qdisc->handle >> 16);
>>>> +             qdisc->flags |= TCQ_F_WARN_NONWC;
>>>> +     }
>>>> +}
>>>> +
>>>
>>> Inlining this and creating N copies of the same message is not a step forward.
>>
>> Hmm, I think gcc merges identical string literals when building the Linux kernel?
>> But I never verified this.
>
> In general you should try to avoid inlining code that's not in
> the fast path as that leads to binary code size bloat.  As errors
> shouldn't be in the fast path, this function should not be inlined.

Makes sense.

Thanks!
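
The split discussed above can be sketched in plain user-space C. This is only
an illustration under assumptions: struct queue, F_WARNED, queue_warn_once()
and queue_peek_len() are hypothetical names, not the net_sched API. The
warn-once slow path is an ordinary out-of-line function, so its body and
message exist once, while the small fast-path helper stays inline and only
calls out in the rare error case:

```c
#include <assert.h>
#include <stdio.h>

#define F_WARNED 0x1u	/* hypothetical flag, in the role of TCQ_F_WARN_NONWC */

struct queue {
	unsigned int flags;
	unsigned int handle;
	int warnings;	/* counts emitted warnings, for demonstration only */
};

void queue_warn_once(const char *caller, struct queue *q);

/* Slow path: out of line, so the code and the format string are emitted
 * once instead of once per call site. Would normally live in a .c file. */
void queue_warn_once(const char *caller, struct queue *q)
{
	if (q->flags & F_WARNED)
		return;
	fprintf(stderr, "%s: queue %X: is non-work-conserving?\n",
		caller, q->handle >> 16);
	q->flags |= F_WARNED;
	q->warnings++;
}

/* Fast path: small enough to inline; calls out only on the rare error. */
static inline int queue_peek_len(struct queue *q, int len)
{
	if (len < 0) {
		queue_warn_once(__func__, q);
		return 0;
	}
	return len;
}
```

Repeated errors on the same queue then print a single warning, and every
caller of queue_peek_len() shares one copy of the warning code.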

^ permalink raw reply

* Re: [PATCH net 0/5] Implement ndo_gso_check() for vxlan nics
From: Tom Herbert @ 2014-11-06  2:44 UTC (permalink / raw)
  To: David Miller
  Cc: Joe Stringer, Or Gerlitz, Linux Netdev List, Sathya Perla,
	Jeff Kirsher, linux.nics, Amir Vadai, shahed.shaikh,
	dept-gelinuxnicdev, LKML
In-Reply-To: <20141105.211558.969082848816106943.davem@davemloft.net>

On Wed, Nov 5, 2014 at 6:15 PM, David Miller <davem@davemloft.net> wrote:
> From: Joe Stringer <joestringer@nicira.com>
> Date: Wed, 5 Nov 2014 17:06:46 -0800
>
>> My impression was that the changes are more likely to be
>> hardware-specific (like the i40e changes) rather than software-specific,
>> like changes that might be integrated into the helper.
>
> I think there is more commonality amongst hardware capabilities,
> and this is why I want the helper to play itself out.
>
>> That said, I can rework for one helper. The way I see it would be the
>> same code as these patches, as "vxlan_gso_check(struct sk_buff *)" in
>> drivers/net/vxlan.c which would be called from each driver. Is that what
>> you had in mind?
>
> Yes.

Note that this code is not VXLAN specific, it will also accept NVGRE
and GRE/UDP with keyid and TEB. I imagine all these cases should be
indistinguishable to the hardware so they probably just work (which
would be cool!). It might be better to name and locate the helper
function to reflect that.
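
As a rough illustration of why the check is not VXLAN specific: the test under
discussion only compares a header gap against 16. A minimal sketch, assuming
that 16 is the 8-byte UDP header plus one 8-byte encapsulation header (the
size of a VXLAN header); struct pkt_offsets and tunnel_hdr_len_ok() are
hypothetical names for this example, not kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UDP_HDR_LEN	8	/* fixed UDP header size */
#define ENCAP_HDR_LEN	8	/* assumed: 8-byte encapsulation header, as in VXLAN */

/* Hypothetical packet model: byte offsets of the outer transport header
 * and of the inner Ethernet header from the start of the buffer. */
struct pkt_offsets {
	size_t transport;
	size_t inner_mac;
};

/* True when exactly UDP plus one 8-byte encapsulation header sit between
 * the outer transport header and the inner MAC header. */
static bool tunnel_hdr_len_ok(const struct pkt_offsets *p)
{
	if (p->inner_mac < p->transport)
		return false;
	return p->inner_mac - p->transport == UDP_HDR_LEN + ENCAP_HDR_LEN;
}
```

Any encapsulation with the same header length passes this test, whatever the
header contains, so a helper of this shape says nothing about VXLAN itself.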

^ permalink raw reply

* Re: [PATCH net-next] net: gro: add a per device gro flush timer
From: Eric Dumazet @ 2014-11-06  2:39 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Miller, netdev, Or Gerlitz, Willem de Bruijn
In-Reply-To: <1415240055.13896.57.camel@edumazet-glaptop2.roam.corp.google.com>

On Wed, 2014-11-05 at 18:14 -0800, Eric Dumazet wrote:
> On Wed, 2014-11-05 at 17:38 -0800, Rick Jones wrote:
> 
> > Speaking of QPS, what happens to 200 TCP_RR tests when the feature is 
> > enabled?

The possible reduction of QPS happens when you have a single flow like
TCP_RR  -- -r 40000,40000

(Because we have one single TCP packet with 40000 bytes of payload,
the application is woken up once, when the Push flag is received)

So CPU efficiency is way better, but the application has to copy 40000 bytes
in one go _after_ the Push flag, instead of being able to copy part of the
data _before_ receiving the Push flag.

lpaa5:~# echo 0 >/sys/class/net/eth0/gro_flush_timeout
lpaa6:~# echo 0 >/sys/class/net/eth0/gro_flush_timeout
lpaa5:~# ./netperf -H lpaa6 -t TCP_RR -l 20 -Cc -- -r 40000,40000
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa6.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

16384  87380  40000   40000  20.00   9023.86  2.02   1.70   107.513  90.561 
16384  87380 

lpaa5:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
lpaa6:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
lpaa5:~# ./netperf -H lpaa6 -t TCP_RR -l 20 -Cc -- -r 40000,40000
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa6.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

16384  87380  40000   40000  20.00   8651.26  0.66   1.02   36.502  56.710 
16384  87380 
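
To put rough numbers on the two runs above, a small helper can compute the
relative change in transaction rate and service demand. This is only a
sketch: pct_change() and print_gro_timeout_deltas() are hypothetical names,
not part of netperf, and the inputs are copied from the output above.

```c
#include <stdio.h>

/* Hypothetical helper (not part of netperf): relative change from
 * 'before' to 'after', in percent. Negative means a reduction. */
double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Print the deltas between gro_flush_timeout=0 and gro_flush_timeout=2000,
 * using the figures from the two netperf runs quoted above. */
void print_gro_timeout_deltas(void)
{
	printf("trans/sec:    %+.1f%%\n", pct_change(9023.86, 8651.26));
	printf("local us/Tr:  %+.1f%%\n", pct_change(107.513, 36.502));
	printf("remote us/Tr: %+.1f%%\n", pct_change(90.561, 56.710));
}
```

Read this way, the transaction rate drops by roughly 4% while the local
service demand falls by about two thirds.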

^ permalink raw reply

* Re: [PATCH net 0/5] Implement ndo_gso_check() for vxlan nics
From: David Miller @ 2014-11-06  2:15 UTC (permalink / raw)
  To: joestringer
  Cc: gerlitz.or, therbert, netdev, sathya.perla, jeffrey.t.kirsher,
	linux.nics, amirv, shahed.shaikh, Dept-GELinuxNICDev,
	linux-kernel
In-Reply-To: <20141106010501.GA18339@gmail.com>

From: Joe Stringer <joestringer@nicira.com>
Date: Wed, 5 Nov 2014 17:06:46 -0800

> My impression was that the changes are more likely to be
> hardware-specific (like the i40e changes) rather than software-specific,
> like changes that might be integrated into the helper.

I think there is more commonality amongst hardware capabilities,
and this is why I want the helper to play itself out.

> That said, I can rework for one helper. The way I see it would be the
> same code as these patches, as "vxlan_gso_check(struct sk_buff *)" in
> drivers/net/vxlan.c which would be called from each driver. Is that what
> you had in mind?

Yes.

^ permalink raw reply

* Re: [PATCH net-next] net: gro: add a per device gro flush timer
From: Eric Dumazet @ 2014-11-06  2:14 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Miller, netdev, Or Gerlitz, Willem de Bruijn
In-Reply-To: <545AD11B.5050603@hp.com>

On Wed, 2014-11-05 at 17:38 -0800, Rick Jones wrote:

> Speaking of QPS, what happens to 200 TCP_RR tests when the feature is 
> enabled?

Nothing at all (but the usual noise I guess)

200 TCP_RR send packets with 1 byte of payload and Push flag,
so no packet ever sits in napi->gro_list

lpaa5:~# echo 0 >/sys/class/net/eth0/gro_flush_timeout
lpaa6:~# echo 0 >/sys/class/net/eth0/gro_flush_timeout
lpaa5:~# time ./super_netperf 200 -H lpaa6 -t TCP_RR -l 20
3.13827e+06

real	0m32.170s
user	0m32.885s
sys	7m38.868s

lpaa5:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
lpaa6:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
lpaa5:~# time ./super_netperf 200 -H lpaa6 -t TCP_RR -l 20
3.19013e+06

real	0m37.152s
user	0m33.477s
sys	7m30.586s

Now let's try TCP_RR with -- -r 4000,4000 ;)

Reducing the number of ACK packets allows us to make better use of the
10GbE bandwidth for payload, so QPS actually increases.

lpaa5:~# echo 0 >/sys/class/net/eth0/gro_flush_timeout
lpaa6:~# echo 0 >/sys/class/net/eth0/gro_flush_timeout
lpaa5:~# time ./super_netperf 200 -H lpaa6 -t TCP_RR -l 20 -- -r 4000,4000
379645

real	0m32.201s
user	0m4.390s
sys	0m59.501s

lpaa5:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
lpaa6:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
lpaa5:~# time ./super_netperf 200 -H lpaa6 -t TCP_RR -l 20 -- -r 4000,4000
400610

real	0m37.159s
user	0m4.501s
sys	0m59.665s

^ permalink raw reply

* Re: M_CAN message RAM initialization AppNote  - was: Re: [PATCH V3 3/3] can: m_can: workaround for transmit data less than 4 bytes
From: Dong Aisheng @ 2014-11-06  1:57 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Marc Kleine-Budde, linux-can, wg, varkabhadram, netdev,
	linux-arm-kernel
In-Reply-To: <545A692E.40002@hartkopp.net>

On Wed, Nov 05, 2014 at 07:15:10PM +0100, Oliver Hartkopp wrote:
> Hi all,
> 
> just to close this application note relevant point ...
> 
> I got an answer from Florian Hartwich (Mr. CAN) from Bosch regarding
> the bit error detection found by Dong Aisheng.
> 
> The relevant interrupts IR.BEU or IR.BEC monitor the message RAM:
> 
> Bit 21 BEU: Bit Error Uncorrected
> Message RAM bit error detected, uncorrected. Controlled by input
> signal m_can_aeim_berr[1] generated by an optional external parity /
> ECC logic attached to the Message RAM. An uncorrected Message RAM
> bit error sets CCCR.INIT to ‘1’. This is done to avoid transmission
> of corrupted data.
> 
> 0= No bit error detected when reading from Message RAM
> 1= Bit error detected, uncorrected (e.g. parity logic)
> 
> Bit 20 BEC: Bit Error Corrected
> Message RAM bit error detected and corrected. Controlled by input
> signal m_can_aeim_berr[0] generated by an optional external parity /
> ECC logic attached to the Message RAM.
> 
> 0= No bit error detected when reading from Message RAM
> 1= Bit error detected and corrected (e.g. ECC)
> 
> ---
> 
> The Message RAM is usually equipped with a parity or ECC functionality.
> But RAM cells suffer a hardware reset and can therefore hold
> arbitrary content at startup - including parity and/or ECC bits.
> 
> So when you write only the CAN ID and the first four bytes the last
> four bytes remain untouched. Then the M_CAN starts to read in 32bit
> words from the start of the Tx Message element. So it is very likely
> to trigger the message RAM error when reading the uninitialized
> 32bit word from the last four bytes.
> 
> Finally it turns out that an initial writing (with any kind of data)
> to the entire message RAM is mandatory to create valid parity/ECC
> checksums.
> 
> That's it.
> 

Thanks for sharing this information.
Does it mean this issue is related to the nature of Message RAM and is
supposed to exist on all M_CAN IP versions?

> Regards,
> Oliver
> 

Regards
Dong Aisheng
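
The conclusion quoted above (write the entire message RAM once so that valid
parity/ECC checksums exist) could be sketched roughly as follows. This is not
the driver's code: mram_init_sketch(), 'mram' and 'size_bytes' are
hypothetical names, and a real driver would use its own register accessors
and the RAM layout configured for the platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch only: 'mram' stands in for the mapped M_CAN Message RAM and
 * 'size_bytes' for its size; both are assumptions for illustration. */
void mram_init_sketch(volatile uint32_t *mram, size_t size_bytes)
{
	size_t i;

	/* Write every 32-bit word once so that the external parity/ECC
	 * logic holds a valid checksum for each word before the
	 * controller reads it. The value written does not matter. */
	for (i = 0; i < size_bytes / sizeof(uint32_t); i++)
		mram[i] = 0;
}
```

With the whole RAM written once at startup, a later Tx element that only
fills the CAN ID and the first four data bytes would no longer leave an
uninitialized 32-bit word behind for the controller to read.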

^ permalink raw reply

* Re: [PATCH V3 1/3] can: add can_is_canfd_skb() API
From: Dong Aisheng @ 2014-11-06  1:52 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Eric Dumazet, linux-can, mkl, wg, varkabhadram, netdev,
	linux-arm-kernel
In-Reply-To: <545A5F55.7050307@hartkopp.net>

On Wed, Nov 05, 2014 at 06:33:09PM +0100, Oliver Hartkopp wrote:
> On 05.11.2014 17:22, Eric Dumazet wrote:
> >On Wed, 2014-11-05 at 21:16 +0800, Dong Aisheng wrote:
> 
> >
> >This looks a bit strange to assume that skb->len == magical_value is CAN
> >FD. A comment would be nice.
> >
> 
> Yes. Due to exactly two types of struct can(fd)_frame which can be
> contained in a skb the skbs are distinguished by the length which
> can be either CAN_MTU or CANFD_MTU.
> 
> >>+static inline int can_is_canfd_skb(struct sk_buff *skb)
> >
> >static inline bool can_is_canfd_skb(const struct sk_buff *skb)
> >
> 
> ok.
> 

Got it.

> >>+{
> 
> What about:
> 
> 	/* the CAN specific type of skb is identified by its data length */
> 

Looks good to me.
I will send an updated version with these changes.

> >>+	return skb->len == CANFD_MTU;
> >>+}
> >>+
> >>  /* get data length from can_dlc with sanitized can_dlc */
> >>  u8 can_dlc2len(u8 can_dlc);
> 
> Regards,
> Oliver
>

Regards
Dong Aisheng
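
For reference, the helper with the changes agreed above (bool return type,
const argument, explanatory comment) might look roughly like this. It is a
standalone sketch, not the final patch: struct sk_buff_stub and the _sketch
suffix are hypothetical stand-ins so it compiles outside the kernel tree, and
the MTU values are written out as the sizes of struct can_frame and
struct canfd_frame.

```c
#include <stdbool.h>

/* Stand-ins so the sketch compiles outside the kernel tree. In the
 * kernel, CAN_MTU and CANFD_MTU are sizeof(struct can_frame) and
 * sizeof(struct canfd_frame), i.e. 16 and 72 bytes. */
#define CAN_MTU		16
#define CANFD_MTU	72

struct sk_buff_stub {		/* hypothetical minimal sk_buff */
	unsigned int len;
};

bool can_is_canfd_skb_sketch(const struct sk_buff_stub *skb)
{
	/* the CAN specific type of skb is identified by its data length */
	return skb->len == CANFD_MTU;
}
```

An skb carrying a classic CAN frame has len == CAN_MTU and is therefore
reported as not CAN FD.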

^ permalink raw reply
