Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: net/ipv6: slab-out-of-bounds in ip6_tnl_xmit
From: Cong Wang @ 2017-04-25  5:04 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Eric Dumazet,
	Dmitry Vyukov, Kostya Serebryany, syzkaller
In-Reply-To: <CAM_iQpXLHx=NhSNYU_o4F_V6nNXUHGBSdV=QQnML0B+HD7MbBQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 889 bytes --]

On Mon, Apr 24, 2017 at 9:47 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> We use ipv4 dst in ip6_tunnel and cast an IPv4 neigh key as an
> IPv6 address...
>
>
>                 neigh = dst_neigh_lookup(skb_dst(skb),
>                                          &ipv6_hdr(skb)->daddr);
>                 if (!neigh)
>                         goto tx_err_link_failure;
>
>                 addr6 = (struct in6_addr *)&neigh->primary_key; // <=== HERE
>                 addr_type = ipv6_addr_type(addr6);
>
>                 if (addr_type == IPV6_ADDR_ANY)
>                         addr6 = &ipv6_hdr(skb)->daddr;
>
>                 memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));
>
> Also the network header of the skb at this point should be still IPv4?

Please try the attached patch.

I am not sure how we could handle 4in6 case better than just relying on
the config of ip6 tunnel.

[-- Attachment #2: ip6_tunnel.diff --]
[-- Type: text/plain, Size: 1888 bytes --]

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 75fac93..a9692ec 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1037,7 +1037,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
 	struct ip6_tnl *t = netdev_priv(dev);
 	struct net *net = t->net;
 	struct net_device_stats *stats = &t->dev->stats;
-	struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+	struct ipv6hdr *ipv6h;
 	struct ipv6_tel_txoption opt;
 	struct dst_entry *dst = NULL, *ndst = NULL;
 	struct net_device *tdev;
@@ -1057,26 +1057,28 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
 
 	/* NBMA tunnel */
 	if (ipv6_addr_any(&t->parms.raddr)) {
-		struct in6_addr *addr6;
-		struct neighbour *neigh;
-		int addr_type;
+		if (skb->protocol == htons(ETH_P_IPV6)) {
+			struct in6_addr *addr6;
+			struct neighbour *neigh;
+			int addr_type;
 
-		if (!skb_dst(skb))
-			goto tx_err_link_failure;
+			if (!skb_dst(skb))
+				goto tx_err_link_failure;
 
-		neigh = dst_neigh_lookup(skb_dst(skb),
-					 &ipv6_hdr(skb)->daddr);
-		if (!neigh)
-			goto tx_err_link_failure;
+			neigh = dst_neigh_lookup(skb_dst(skb),
+						 &ipv6_hdr(skb)->daddr);
+			if (!neigh)
+				goto tx_err_link_failure;
 
-		addr6 = (struct in6_addr *)&neigh->primary_key;
-		addr_type = ipv6_addr_type(addr6);
+			addr6 = (struct in6_addr *)&neigh->primary_key;
+			addr_type = ipv6_addr_type(addr6);
 
-		if (addr_type == IPV6_ADDR_ANY)
-			addr6 = &ipv6_hdr(skb)->daddr;
+			if (addr_type == IPV6_ADDR_ANY)
+				addr6 = &ipv6_hdr(skb)->daddr;
 
-		memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));
-		neigh_release(neigh);
+			memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));
+			neigh_release(neigh);
+		}
 	} else if (!(t->parms.flags &
 		     (IP6_TNL_F_USE_ORIG_TCLASS | IP6_TNL_F_USE_ORIG_FWMARK))) {
 		/* enable the cache only only if the routing decision does

^ permalink raw reply related

* Re: [PATCH RFC] ptr_ring: add ptr_ring_unconsume
From: Jason Wang @ 2017-04-25  4:07 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, netdev
In-Reply-To: <20170424145632-mutt-send-email-mst@kernel.org>



On 2017年04月24日 20:00, Michael S. Tsirkin wrote:
> On Mon, Apr 24, 2017 at 07:54:18PM +0800, Jason Wang wrote:
>> On 2017年04月24日 07:28, Michael S. Tsirkin wrote:
>>> On Tue, Apr 18, 2017 at 11:07:42AM +0800, Jason Wang wrote:
>>>> On 2017年04月17日 07:19, Michael S. Tsirkin wrote:
>>>>> Applications that consume a batch of entries in one go
>>>>> can benefit from ability to return some of them back
>>>>> into the ring.
>>>>>
>>>>> Add an API for that - assuming there's space. If there's no space
>>>>> naturally we can't do this and have to drop entries, but this implies
>>>>> ring is full so we'd likely drop some anyway.
>>>>>
>>>>> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
>>>>> ---
>>>>>
>>>>> Jason, in my mind the biggest issue with your batching patchset is the
>>>>> backet drops on disconnect.  This API will help avoid that in the common
>>>>> case.
>>>> Ok, I will rebase the series on top of this. (Though I don't think we care
>>>> the packet loss).
>>> E.g. I care - I often start sending packets to VM before it's
>>> fully booted. Several vhost resets might follow.
>> Ok.
>>
>>>>> I would still prefer that we understand what's going on,
>>>> I try to reply in another thread, does it make sense?
>>>>
>>>>>     and I would
>>>>> like to know what's the smallest batch size that's still helpful,
>>>> Yes, I've replied in another thread, the result is:
>>>>
>>>>
>>>> no batching   1.88Mpps
>>>> RX_BATCH=1    1.93Mpps
>>>> RX_BATCH=4    2.11Mpps
>>>> RX_BATCH=16   2.14Mpps
>>>> RX_BATCH=64   2.25Mpps
>>>> RX_BATCH=256  2.18Mpps
>>> Essentially 4 is enough, other stuf looks more like noise
>>> to me. What about 2?
>> The numbers are pretty stable, so probably not noise. Retested on top of
>> batch zeroing:
>>
>> no  1.97Mpps
>> 1   2.09Mpps
>> 2   2.11Mpps
>> 4   2.16Mpps
>> 8   2.19Mpps
>> 16  2.21Mpps
>> 32  2.25Mpps
>> 64  2.30Mpps
>> 128 2.21Mpps
>> 256 2.21Mpps
>>
>> 64 performs best.
>>
>> Thanks
> OK but it might be e.g. a function of the ring size, host cache size or
> whatever. As we don't really understand the why, if we just optimize for
> your setup we risk regressions in others.  64 entries is a lot, it
> increases the queue size noticeably.  Could this be part of the effect?
> Could you try changing the queue size to see what happens?

I increase tx_queue_len to 1100, but only see less than 1% improvement 
on pps number (batch = 1) in my machine. If you care about the 
regression, we probably can leave the choice to user through e.g module 
parameter. But I'm afraid we have already had too much choices for them. 
Or I can test this with different CPU types.

Thanks

>

^ permalink raw reply

* [PATCH net 1/1] qed: Fix error in the dcbx app meta data initialization.
From: Sudarsana Reddy Kalluru @ 2017-04-25  3:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, Yuval.Mintz

DCBX app_data array is initialized with the incorrect values for
personality field. This would  prevent offloaded protocols from
honoring the PFC.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
---
 drivers/net/ethernet/qlogic/qed/qed_dcbx.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
index a6e2bbe..cfdadb6 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
@@ -64,11 +64,11 @@
 	((u32)(prio_tc_tbl >> ((7 - prio) * 4)) & 0x7)
 
 static const struct qed_dcbx_app_metadata qed_dcbx_app_update[] = {
-	{DCBX_PROTOCOL_ISCSI, "ISCSI", QED_PCI_DEFAULT},
-	{DCBX_PROTOCOL_FCOE, "FCOE", QED_PCI_DEFAULT},
-	{DCBX_PROTOCOL_ROCE, "ROCE", QED_PCI_DEFAULT},
-	{DCBX_PROTOCOL_ROCE_V2, "ROCE_V2", QED_PCI_DEFAULT},
-	{DCBX_PROTOCOL_ETH, "ETH", QED_PCI_ETH}
+	{DCBX_PROTOCOL_ISCSI, "ISCSI", QED_PCI_ISCSI},
+	{DCBX_PROTOCOL_FCOE, "FCOE", QED_PCI_FCOE},
+	{DCBX_PROTOCOL_ROCE, "ROCE", QED_PCI_ETH_ROCE},
+	{DCBX_PROTOCOL_ROCE_V2, "ROCE_V2", QED_PCI_ETH_ROCE},
+	{DCBX_PROTOCOL_ETH, "ETH", QED_PCI_ETH},
 };
 
 static bool qed_dcbx_app_ethtype(u32 app_info_bitmap)
-- 
1.8.3.1

^ permalink raw reply related

* Re: qed*: debug infrastructures
From: Jakub Kicinski @ 2017-04-25  3:44 UTC (permalink / raw)
  To: Elior, Ariel
  Cc: David Miller, netdev@vger.kernel.org, Mintz, Yuval, Tayar, Tomer,
	Dupuis, Chad
In-Reply-To: <CY1PR0701MB1337FA26F1503FC9928B0340901F0@CY1PR0701MB1337.namprd07.prod.outlook.com>

On Mon, 24 Apr 2017 17:38:57 +0000, Elior, Ariel wrote:
> Hi Dave,

Hi Ariel!

I'm not Dave but let me share my perspective :)

> According to the recent messages on the list indicating debugfs is not the way
> to go, I am looking for some guidance on what is. dpipe approach was
> mentioned as favorable, but I wanted to make sure molding our debug features to
> this infrastructure will result in something acceptable. A few points:
> 
> 1.
> One of our HW debug features is a signal recording feature. HW is configured to
> output specific signals, which are then continuously dumped into a cyclic
> buffer on host. There are ~8000 signals, which can be logically divided to ~100
> groups. I believe this can be modeled in dpipe (or similar tool) as a set of
> ~100 tables with ~80 entries each, and user would be able to see them all and
> choose what they like. The output data itself is binary, and meaningless to
> "the user". The amount of data is basically as large a contiguous buffer as
> driver can allocate, i.e. usually 4MB. When user selects the signals, and sets
> meta data regarding to mode of operations, some device configuration will have
> to take place. Does that sound reasonable?
> This debug feature already exists out of tree for bnx2x and qed* drivers and is
> *very* effective in field deployments. I would very much like to see this as an
> in-tree feature via some infrastructure or another.

Sorry for even mentioning it, new debug interfaces would be cool, but
for FW/HW state dumps which are meaningless to the user why not just
use ethtool get-dump/set-dump?  Do you really need the ability to
toggle those 8k signals one-by-one or are there reasonable sets you
could provide from the driver that you could encode on the available
32bits of flags?

What could be useful would be some form of start/stop commands for
debugging to tell the driver/FW when to record events selected by
set-dump and maybe a way for the user to discover what dumps the driver
can provide (a'la ethtool private flags).

> 2.
> Sometimes we want to debug the probe or removal flow of the driver. ethtool has
> the disadvantage of only being available once network device is available. If a
> bug stops the load flow before the ethtool debug paths are available, there is
> no way to collect a dump. Similarly, removal flows which hit a bug but do remove
> the network device, can't be debugged from ethtool. Does dpipe suffer from the
> same problem? qed* (like mlx*) has a common-functionality module. This allows
> creating debugfs nodes even before the network drivers are probed, providing a
> solution for this (debug nodes are also available after network driver removal).
> If dpipe does hold an answer here (e.g. provide preconfiguration which would
> kick in when network device registers) then we might want to port all of our
> register dump logic over there for this advantage. Does that sound reasonable?

Porting the debug/dump infrastructure to devlink would be very much
appreciated.  I'm not sure it would fit into dpipe or be a separate
command.

> 3.
> Yuval mentioned this, but I wanted to reiterate that the same is necessary for
> our storage drivers (qedi/qedf). debugfs does have the advantage of being non
> sub-system specific. Is there perhaps another non subsystem specific debug
> infrastructure which *is* acceptable to the networking subsystem? My guess is
> that the storage drivers will turn to debugfs (in their own subsystem).

devlink is not ethernet-specific, it should be a good fit for iSCSI and
FCOE drivers, too.

^ permalink raw reply

* [PATCH net-next v2] sparc64: Improve 64-bit constant loading in eBPF JIT.
From: David Miller @ 2017-04-25  3:38 UTC (permalink / raw)
  To: netdev; +Cc: sparclinux, ast, daniel


Doing a full 64-bit decomposition is really stupid especially for
simple values like 0 and -1.

But if we are going to optimize this, go all the way and try for all 2
and 3 instruction sequences not requiring a temporary register as
well.

First we do the easy cases where it's a zero or sign extended 32-bit
number (sethi+or, sethi+xor, respectively).

Then we try to find a range of set bits we can load simply then shift
up into place, in various ways.

Then we try negating the constant and see if we can do a simple
sequence using that with a xor at the end.  (f.e. the range of set
bits can't be loaded simply, but for the negated value it can)

The final optimized strategy involves 4 instructions sequences not
needing a temporary register.

Otherwise we sadly fully decompose using a temp..

Example, from ALU64_XOR_K: 0x0000ffffffff0000 ^ 0x0 = 0x0000ffffffff0000:

0000000000000000 <foo>:
   0:   9d e3 bf 50     save  %sp, -176, %sp
   4:   01 00 00 00     nop
   8:   90 10 00 18     mov  %i0, %o0
   c:   13 3f ff ff     sethi  %hi(0xfffffc00), %o1
  10:   92 12 63 ff     or  %o1, 0x3ff, %o1     ! ffffffff <foo+0xffffffff>
  14:   93 2a 70 10     sllx  %o1, 0x10, %o1
  18:   15 3f ff ff     sethi  %hi(0xfffffc00), %o2
  1c:   94 12 a3 ff     or  %o2, 0x3ff, %o2     ! ffffffff <foo+0xffffffff>
  20:   95 2a b0 10     sllx  %o2, 0x10, %o2
  24:   92 1a 60 00     xor  %o1, 0, %o1
  28:   12 e2 40 8a     cxbe  %o1, %o2, 38 <foo+0x38>
  2c:   9a 10 20 02     mov  2, %o5
  30:   10 60 00 03     b,pn   %xcc, 3c <foo+0x3c>
  34:   01 00 00 00     nop
  38:   9a 10 20 01     mov  1, %o5     ! 1 <foo+0x1>
  3c:   81 c7 e0 08     ret
  40:   91 eb 40 00     restore  %o5, %g0, %o0

Signed-off-by: David S. Miller <davem@davemloft.net>
---

v2:
 - fix spaces (Alexei)
 - put details into commit msg (Alexei)

 arch/sparc/net/bpf_jit_comp_64.c | 249 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 245 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c
index 2b2f3c3..ec7d10d 100644
--- a/arch/sparc/net/bpf_jit_comp_64.c
+++ b/arch/sparc/net/bpf_jit_comp_64.c
@@ -28,6 +28,11 @@ static inline bool is_simm5(unsigned int value)
 	return value + 0x10 < 0x20;
 }
 
+static inline bool is_sethi(unsigned int value)
+{
+	return (value & ~0x3fffff) == 0;
+}
+
 static void bpf_flush_icache(void *start_, void *end_)
 {
 	/* Cheetah's I-cache is fully coherent.  */
@@ -367,16 +372,252 @@ static void emit_loadimm_sext(s32 K, unsigned int dest, struct jit_ctx *ctx)
 	}
 }
 
+static void analyze_64bit_constant(u32 high_bits, u32 low_bits,
+				   int *hbsp, int *lbsp, int *abbasp)
+{
+	int lowest_bit_set, highest_bit_set, all_bits_between_are_set;
+	int i;
+
+	lowest_bit_set = highest_bit_set = -1;
+	i = 0;
+	do {
+		if ((lowest_bit_set == -1) && ((low_bits >> i) & 1))
+			lowest_bit_set = i;
+		if ((highest_bit_set == -1) && ((high_bits >> (32 - i - 1)) & 1))
+			highest_bit_set = (64 - i - 1);
+	}  while (++i < 32 && (highest_bit_set == -1 ||
+			       lowest_bit_set == -1));
+	if (i == 32) {
+		i = 0;
+		do {
+			if (lowest_bit_set == -1 && ((high_bits >> i) & 1))
+				lowest_bit_set = i + 32;
+			if (highest_bit_set == -1 &&
+			    ((low_bits >> (32 - i - 1)) & 1))
+				highest_bit_set = 32 - i - 1;
+		} while (++i < 32 && (highest_bit_set == -1 ||
+				      lowest_bit_set == -1));
+	}
+
+	all_bits_between_are_set = 1;
+	for (i = lowest_bit_set; i <= highest_bit_set; i++) {
+		if (i < 32) {
+			if ((low_bits & (1 << i)) != 0)
+				continue;
+		} else {
+			if ((high_bits & (1 << (i - 32))) != 0)
+				continue;
+		}
+		all_bits_between_are_set = 0;
+		break;
+	}
+	*hbsp = highest_bit_set;
+	*lbsp = lowest_bit_set;
+	*abbasp = all_bits_between_are_set;
+}
+
+static unsigned long create_simple_focus_bits(unsigned long high_bits,
+					      unsigned long low_bits,
+					      int lowest_bit_set, int shift)
+{
+	long hi, lo;
+
+	if (lowest_bit_set < 32) {
+		lo = (low_bits >> lowest_bit_set) << shift;
+		hi = ((high_bits << (32 - lowest_bit_set)) << shift);
+	} else {
+		lo = 0;
+		hi = ((high_bits >> (lowest_bit_set - 32)) << shift);
+	}
+	return hi | lo;
+}
+
+static bool const64_is_2insns(unsigned long high_bits,
+			      unsigned long low_bits)
+{
+	int highest_bit_set, lowest_bit_set, all_bits_between_are_set;
+
+	if (high_bits == 0 || high_bits == 0xffffffff)
+		return true;
+
+	analyze_64bit_constant(high_bits, low_bits,
+			       &highest_bit_set, &lowest_bit_set,
+			       &all_bits_between_are_set);
+
+	if ((highest_bit_set == 63 || lowest_bit_set == 0) &&
+	    all_bits_between_are_set != 0)
+		return true;
+
+	if (highest_bit_set - lowest_bit_set < 21)
+		return true;
+
+	return false;
+}
+
+static void sparc_emit_set_const64_quick2(unsigned long high_bits,
+					  unsigned long low_imm,
+					  unsigned int dest,
+					  int shift_count, struct jit_ctx *ctx)
+{
+	emit_loadimm32(high_bits, dest, ctx);
+
+	/* Now shift it up into place.  */
+	emit_alu_K(SLLX, dest, shift_count, ctx);
+
+	/* If there is a low immediate part piece, finish up by
+	 * putting that in as well.
+	 */
+	if (low_imm != 0)
+		emit(OR | IMMED | RS1(dest) | S13(low_imm) | RD(dest), ctx);
+}
+
 static void emit_loadimm64(u64 K, unsigned int dest, struct jit_ctx *ctx)
 {
+	int all_bits_between_are_set, lowest_bit_set, highest_bit_set;
 	unsigned int tmp = bpf2sparc[TMP_REG_1];
-	u32 high_part = (K >> 32);
-	u32 low_part = (K & 0xffffffff);
+	u32 low_bits = (K & 0xffffffff);
+	u32 high_bits = (K >> 32);
+
+	/* These two tests also take care of all of the one
+	 * instruction cases.
+	 */
+	if (high_bits == 0xffffffff && (low_bits & 0x80000000))
+		return emit_loadimm_sext(K, dest, ctx);
+	if (high_bits == 0x00000000)
+		return emit_loadimm32(K, dest, ctx);
+
+	analyze_64bit_constant(high_bits, low_bits, &highest_bit_set,
+			       &lowest_bit_set, &all_bits_between_are_set);
+
+	/* 1) mov	-1, %reg
+	 *    sllx	%reg, shift, %reg
+	 * 2) mov	-1, %reg
+	 *    srlx	%reg, shift, %reg
+	 * 3) mov	some_small_const, %reg
+	 *    sllx	%reg, shift, %reg
+	 */
+	if (((highest_bit_set == 63 || lowest_bit_set == 0) &&
+	     all_bits_between_are_set != 0) ||
+	    ((highest_bit_set - lowest_bit_set) < 12)) {
+		int shift = lowest_bit_set;
+		long the_const = -1;
+
+		if ((highest_bit_set != 63 && lowest_bit_set != 0) ||
+		    all_bits_between_are_set == 0) {
+			the_const =
+				create_simple_focus_bits(high_bits, low_bits,
+							 lowest_bit_set, 0);
+		} else if (lowest_bit_set == 0)
+			shift = -(63 - highest_bit_set);
+
+		emit(OR | IMMED | RS1(G0) | S13(the_const) | RD(dest), ctx);
+		if (shift > 0)
+			emit_alu_K(SLLX, dest, shift, ctx);
+		else if (shift < 0)
+			emit_alu_K(SRLX, dest, -shift, ctx);
+
+		return;
+	}
+
+	/* Now a range of 22 or less bits set somewhere.
+	 * 1) sethi	%hi(focus_bits), %reg
+	 *    sllx	%reg, shift, %reg
+	 * 2) sethi	%hi(focus_bits), %reg
+	 *    srlx	%reg, shift, %reg
+	 */
+	if ((highest_bit_set - lowest_bit_set) < 21) {
+		unsigned long focus_bits =
+			create_simple_focus_bits(high_bits, low_bits,
+						 lowest_bit_set, 10);
+
+		emit(SETHI(focus_bits, dest), ctx);
+
+		/* If lowest_bit_set == 10 then a sethi alone could
+		 * have done it.
+		 */
+		if (lowest_bit_set < 10)
+			emit_alu_K(SRLX, dest, 10 - lowest_bit_set, ctx);
+		else if (lowest_bit_set > 10)
+			emit_alu_K(SLLX, dest, lowest_bit_set - 10, ctx);
+		return;
+	}
+
+	/* Ok, now 3 instruction sequences.  */
+	if (low_bits == 0) {
+		emit_loadimm32(high_bits, dest, ctx);
+		emit_alu_K(SLLX, dest, 32, ctx);
+		return;
+	}
+
+	/* We may be able to do something quick
+	 * when the constant is negated, so try that.
+	 */
+	if (const64_is_2insns((~high_bits) & 0xffffffff,
+			      (~low_bits) & 0xfffffc00)) {
+		/* NOTE: The trailing bits get XOR'd so we need the
+		 * non-negated bits, not the negated ones.
+		 */
+		unsigned long trailing_bits = low_bits & 0x3ff;
+
+		if ((((~high_bits) & 0xffffffff) == 0 &&
+		     ((~low_bits) & 0x80000000) == 0) ||
+		    (((~high_bits) & 0xffffffff) == 0xffffffff &&
+		     ((~low_bits) & 0x80000000) != 0)) {
+			unsigned long fast_int = (~low_bits & 0xffffffff);
+
+			if ((is_sethi(fast_int) &&
+			     (~high_bits & 0xffffffff) == 0)) {
+				emit(SETHI(fast_int, dest), ctx);
+			} else if (is_simm13(fast_int)) {
+				emit(OR | IMMED | RS1(G0) | S13(fast_int) | RD(dest), ctx);
+			} else {
+				emit_loadimm64(fast_int, dest, ctx);
+			}
+		} else {
+			u64 n = ((~low_bits) & 0xfffffc00) |
+				(((unsigned long)((~high_bits) & 0xffffffff))<<32);
+			emit_loadimm64(n, dest, ctx);
+		}
+
+		low_bits = -0x400 | trailing_bits;
+
+		emit(XOR | IMMED | RS1(dest) | S13(low_bits) | RD(dest), ctx);
+		return;
+	}
+
+	/* 1) sethi	%hi(xxx), %reg
+	 *    or	%reg, %lo(xxx), %reg
+	 *    sllx	%reg, yyy, %reg
+	 */
+	if ((highest_bit_set - lowest_bit_set) < 32) {
+		unsigned long focus_bits =
+			create_simple_focus_bits(high_bits, low_bits,
+						 lowest_bit_set, 0);
+
+		/* So what we know is that the set bits straddle the
+		 * middle of the 64-bit word.
+		 */
+		sparc_emit_set_const64_quick2(focus_bits, 0, dest,
+					      lowest_bit_set, ctx);
+		return;
+	}
+
+	/* 1) sethi	%hi(high_bits), %reg
+	 *    or	%reg, %lo(high_bits), %reg
+	 *    sllx	%reg, 32, %reg
+	 *    or	%reg, low_bits, %reg
+	 */
+	if (is_simm13(low_bits) && ((int)low_bits > 0)) {
+		sparc_emit_set_const64_quick2(high_bits, low_bits,
+					      dest, 32, ctx);
+		return;
+	}
 
+	/* Oh well, we tried... Do a full 64-bit decomposition.  */
 	ctx->tmp_1_used = true;
 
-	emit_set_const(high_part, tmp, ctx);
-	emit_set_const(low_part, dest, ctx);
+	emit_loadimm32(high_bits, tmp, ctx);
+	emit_loadimm32(low_bits, dest, ctx);
 	emit_alu_K(SLLX, tmp, 32, ctx);
 	emit(OR | RS1(dest) | RS2(tmp) | RD(dest), ctx);
 }
-- 
2.1.2.532.g19b5d50


^ permalink raw reply related

* Re: [PATCH net-next] sparc64: Improve 64-bit constant loading in eBPF JIT.
From: David Miller @ 2017-04-25  3:28 UTC (permalink / raw)
  To: ast; +Cc: netdev, sparclinux, daniel
In-Reply-To: <74973b5c-5380-b27b-3cf6-7b1444cade02@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Mon, 24 Apr 2017 20:18:56 -0700

> all gcc-ism actually gives confidence that most likely
> it's all good and battle proven logic :)

Yes, this is old code written 15 years ago :)

I'll fix up the spaces.

^ permalink raw reply

* Re: [PATCH net] netvsc: fix calculation of available send sections
From: Greg Rose @ 2017-04-25  3:20 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev, Stephen Hemminger
In-Reply-To: <20170425013338.2653-1-sthemmin@microsoft.com>

On Mon, 2017-04-24 at 18:33 -0700, Stephen Hemminger wrote:
> My change (introduced in 4.11) to use find_first_clear_bit
> incorrectly assumed that the size argument was words, not bits.

Oops...

> The effect was only a small limited number of the available send
> sections were being actually used. This can cause performance loss
> with some workloads.
> 
> Since map_words is now used only during initialization, it can
> be on stack instead of in per-device data.
> 
> Fixes: b58a185801da ("netvsc: simplify get next send section")
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>

Looks good to me.

Reviewed by Greg Rose <gvrose8192@gmail.com>

> ---
>  drivers/net/hyperv/hyperv_net.h | 1 -
>  drivers/net/hyperv/netvsc.c     | 9 ++++-----
>  2 files changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index f9f3dba7a588..db23cb36ae5c 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -751,7 +751,6 @@ struct netvsc_device {
>  	u32 send_section_cnt;
>  	u32 send_section_size;
>  	unsigned long *send_section_map;
> -	int map_words;
>  
>  	/* Used for NetVSP initialization protocol */
>  	struct completion channel_init_wait;
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 8dd0b8770328..15ef713d96c0 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -236,6 +236,7 @@ static int netvsc_init_buf(struct hv_device *device)
>  	struct netvsc_device *net_device;
>  	struct nvsp_message *init_packet;
>  	struct net_device *ndev;
> +	size_t map_words;
>  	int node;
>  
>  	net_device = get_outbound_net_device(device);
> @@ -401,11 +402,9 @@ static int netvsc_init_buf(struct hv_device *device)
>  		   net_device->send_section_size, net_device->send_section_cnt);
>  
>  	/* Setup state for managing the send buffer. */
> -	net_device->map_words = DIV_ROUND_UP(net_device->send_section_cnt,
> -					     BITS_PER_LONG);
> +	map_words = DIV_ROUND_UP(net_device->send_section_cnt, BITS_PER_LONG);
>  
> -	net_device->send_section_map = kcalloc(net_device->map_words,
> -					       sizeof(ulong), GFP_KERNEL);
> +	net_device->send_section_map = kcalloc(map_words, sizeof(ulong), GFP_KERNEL);
>  	if (net_device->send_section_map == NULL) {
>  		ret = -ENOMEM;
>  		goto cleanup;
> @@ -683,7 +682,7 @@ static u32 netvsc_get_next_send_section(struct netvsc_device *net_device)
>  	unsigned long *map_addr = net_device->send_section_map;
>  	unsigned int i;
>  
> -	for_each_clear_bit(i, map_addr, net_device->map_words) {
> +	for_each_clear_bit(i, map_addr, net_device->send_section_cnt) {
>  		if (sync_test_and_set_bit(i, map_addr) == 0)
>  			return i;
>  	}

^ permalink raw reply

* Re: [PATCH net-next] sparc64: Improve 64-bit constant loading in eBPF JIT.
From: Alexei Starovoitov @ 2017-04-25  3:18 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: sparclinux, daniel
In-Reply-To: <20170424.225334.898469015162314835.davem@davemloft.net>

On 4/24/17 7:53 PM, David Miller wrote:
>
> Doing a full 64-bit decomposition is really stupid especially for
> simple values like 0 and -1.
>
> But if we are going to optimize this, go all the way and try for all 2
> and 3 instruction sequences not requiring a temporary register as
> well.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>
> This one is dedicated to all the bit twiddlers out there.
>
> First we do the easy cases where it's a zero or sign extended
> 32-bit number (sethi+or, sethi+xor, respectively).
>
> Then we try to find a range of set bits we can load simply then shift
> up into place, in various ways.
>
> Then we try negating the constant and see if we can do a simple
> sequence using that with a xor at the end.  (f.e. the range of set
> bits can't be loaded simply, but for the negated value it can)
>
> The final optimized strategy involves 4 instructions sequences not
> needing a temporary register.
>
> Otherwise we sadly fully decompose using a temp..
>
> Example, from ALU64_XOR_K: 0x0000ffffffff0000 ^ 0x0 = 0x0000ffffffff0000:
>
> 0000000000000000 <foo>:
>    0:   9d e3 bf 50     save  %sp, -176, %sp
>    4:   01 00 00 00     nop
>    8:   90 10 00 18     mov  %i0, %o0
>    c:   13 3f ff ff     sethi  %hi(0xfffffc00), %o1
>   10:   92 12 63 ff     or  %o1, 0x3ff, %o1     ! ffffffff <foo+0xffffffff>
>   14:   93 2a 70 10     sllx  %o1, 0x10, %o1
>   18:   15 3f ff ff     sethi  %hi(0xfffffc00), %o2
>   1c:   94 12 a3 ff     or  %o2, 0x3ff, %o2     ! ffffffff <foo+0xffffffff>
>   20:   95 2a b0 10     sllx  %o2, 0x10, %o2
>   24:   92 1a 60 00     xor  %o1, 0, %o1
>   28:   12 e2 40 8a     cxbe  %o1, %o2, 38 <foo+0x38>
>   2c:   9a 10 20 02     mov  2, %o5
>   30:   10 60 00 03     b,pn   %xcc, 3c <foo+0x3c>
>   34:   01 00 00 00     nop
>   38:   9a 10 20 01     mov  1, %o5     ! 1 <foo+0x1>
>   3c:   81 c7 e0 08     ret
>   40:   91 eb 40 00     restore  %o5, %g0, %o0

all of the above is very useful info that can be in commit log.

> Last optimization tonight, I promise :-)
>
>  arch/sparc/net/bpf_jit_comp_64.c | 249 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 245 insertions(+), 4 deletions(-)
>
> diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c
> index 2b2f3c3..6869ec0 100644
> --- a/arch/sparc/net/bpf_jit_comp_64.c
> +++ b/arch/sparc/net/bpf_jit_comp_64.c
> @@ -28,6 +28,11 @@ static inline bool is_simm5(unsigned int value)
>  	return value + 0x10 < 0x20;
>  }
>
> +static inline bool is_sethi(unsigned int value)
> +{
> +	return (value & ~0x3fffff) == 0;
> +}
> +
>  static void bpf_flush_icache(void *start_, void *end_)
>  {
>  	/* Cheetah's I-cache is fully coherent.  */
> @@ -367,16 +372,252 @@ static void emit_loadimm_sext(s32 K, unsigned int dest, struct jit_ctx *ctx)
>  	}
>  }
>
> +static void analyze_64bit_constant(u32 high_bits, u32 low_bits,
> +				   int *hbsp, int *lbsp, int *abbasp)
> +{
> +	int lowest_bit_set, highest_bit_set, all_bits_between_are_set;
> +	int i;
> +
> +	lowest_bit_set = highest_bit_set = -1;
> +	i = 0;
> +	do {
> +		if ((lowest_bit_set == -1) && ((low_bits >> i) & 1))
> +			lowest_bit_set = i;
> +		if ((highest_bit_set == -1) && ((high_bits >> (32 - i - 1)) & 1))
> +			highest_bit_set = (64 - i - 1);
> +	}  while (++i < 32 && ((highest_bit_set == -1) ||
> +			       (lowest_bit_set == -1)));

copy pasta from gcc ? ;)
the number of extra () looks excessive.

> +static bool const64_is_2insns (unsigned long high_bits,
> +			       unsigned long low_bits)

extra space before (

> +{
> +	int highest_bit_set, lowest_bit_set, all_bits_between_are_set;
> +
> +	if (high_bits == 0 || high_bits == 0xffffffff)
> +		return true;
> +
> +	analyze_64bit_constant (high_bits, low_bits,

another space

> +				&highest_bit_set, &lowest_bit_set,
> +				&all_bits_between_are_set);
> +
> +	if ((highest_bit_set == 63 || lowest_bit_set == 0) &&
> +	    all_bits_between_are_set != 0)
> +		return true;
> +
> +	if ((highest_bit_set - lowest_bit_set) < 21)
> +		return true;

extra ()

> +static void sparc_emit_set_const64_quick2 (unsigned long high_bits,

extra space

> +	/* Now a range of 22 or less bits set somewhere.
> +	 * 1) sethi	%hi(focus_bits), %reg
> +	 *    sllx	%reg, shift, %reg
> +	 * 2) sethi	%hi(focus_bits), %reg
> +	 *    srlx	%reg, shift, %reg
> +	 */

was thinking whether any of these optimizations can be generalized to
other JITs. Probably not. sethi semantic is too sparc specific.

> +	if (const64_is_2insns ((~high_bits) & 0xffffffff,
> +			       (~low_bits) & 0xfffffc00)) {

all gcc-ism actually gives confidence that most likely
it's all good and battle proven logic :)


^ permalink raw reply

* [PATCH net-next] sparc64: Improve 64-bit constant loading in eBPF JIT.
From: David Miller @ 2017-04-25  2:53 UTC (permalink / raw)
  To: netdev; +Cc: sparclinux, ast, daniel


Doing a full 64-bit decomposition is really stupid especially for
simple values like 0 and -1.

But if we are going to optimize this, go all the way and try for all 2
and 3 instruction sequences not requiring a temporary register as
well.

Signed-off-by: David S. Miller <davem@davemloft.net>
---

This one is dedicated to all the bit twiddlers out there.

First we do the easy cases where it's a zero or sign extended
32-bit number (sethi+or, sethi+xor, respectively).

Then we try to find a range of set bits we can load simply then shift
up into place, in various ways.

Then we try negating the constant and see if we can do a simple
sequence using that with a xor at the end.  (f.e. the range of set
bits can't be loaded simply, but for the negated value it can)

The final optimized strategy involves 4 instructions sequences not
needing a temporary register.

Otherwise we sadly fully decompose using a temp..

Example, from ALU64_XOR_K: 0x0000ffffffff0000 ^ 0x0 = 0x0000ffffffff0000:

0000000000000000 <foo>:
   0:   9d e3 bf 50     save  %sp, -176, %sp
   4:   01 00 00 00     nop
   8:   90 10 00 18     mov  %i0, %o0
   c:   13 3f ff ff     sethi  %hi(0xfffffc00), %o1
  10:   92 12 63 ff     or  %o1, 0x3ff, %o1     ! ffffffff <foo+0xffffffff>
  14:   93 2a 70 10     sllx  %o1, 0x10, %o1
  18:   15 3f ff ff     sethi  %hi(0xfffffc00), %o2
  1c:   94 12 a3 ff     or  %o2, 0x3ff, %o2     ! ffffffff <foo+0xffffffff>
  20:   95 2a b0 10     sllx  %o2, 0x10, %o2
  24:   92 1a 60 00     xor  %o1, 0, %o1
  28:   12 e2 40 8a     cxbe  %o1, %o2, 38 <foo+0x38>
  2c:   9a 10 20 02     mov  2, %o5
  30:   10 60 00 03     b,pn   %xcc, 3c <foo+0x3c>
  34:   01 00 00 00     nop
  38:   9a 10 20 01     mov  1, %o5     ! 1 <foo+0x1>
  3c:   81 c7 e0 08     ret
  40:   91 eb 40 00     restore  %o5, %g0, %o0

Last optimization tonight, I promise :-)

 arch/sparc/net/bpf_jit_comp_64.c | 249 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 245 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c
index 2b2f3c3..6869ec0 100644
--- a/arch/sparc/net/bpf_jit_comp_64.c
+++ b/arch/sparc/net/bpf_jit_comp_64.c
@@ -28,6 +28,11 @@ static inline bool is_simm5(unsigned int value)
 	return value + 0x10 < 0x20;
 }
 
+static inline bool is_sethi(unsigned int value)
+{
+	return (value & ~0x3fffff) == 0;
+}
+
 static void bpf_flush_icache(void *start_, void *end_)
 {
 	/* Cheetah's I-cache is fully coherent.  */
@@ -367,16 +372,252 @@ static void emit_loadimm_sext(s32 K, unsigned int dest, struct jit_ctx *ctx)
 	}
 }
 
+static void analyze_64bit_constant(u32 high_bits, u32 low_bits,
+				   int *hbsp, int *lbsp, int *abbasp)
+{
+	int lowest_bit_set, highest_bit_set, all_bits_between_are_set;
+	int i;
+
+	lowest_bit_set = highest_bit_set = -1;
+	i = 0;
+	do {
+		if ((lowest_bit_set == -1) && ((low_bits >> i) & 1))
+			lowest_bit_set = i;
+		if ((highest_bit_set == -1) && ((high_bits >> (32 - i - 1)) & 1))
+			highest_bit_set = (64 - i - 1);
+	}  while (++i < 32 && ((highest_bit_set == -1) ||
+			       (lowest_bit_set == -1)));
+	if (i == 32) {
+		i = 0;
+		do {
+			if ((lowest_bit_set == -1) && ((high_bits >> i) & 1))
+				lowest_bit_set = i + 32;
+			if ((highest_bit_set == -1) &&
+			    ((low_bits >> (32 - i - 1)) & 1))
+				highest_bit_set = 32 - i - 1;
+		} while (++i < 32 && ((highest_bit_set == -1)
+				      || (lowest_bit_set == -1)));
+	}
+
+	all_bits_between_are_set = 1;
+	for (i = lowest_bit_set; i <= highest_bit_set; i++) {
+		if (i < 32) {
+			if ((low_bits & (1 << i)) != 0)
+				continue;
+		} else {
+			if ((high_bits & (1 << (i - 32))) != 0)
+				continue;
+		}
+		all_bits_between_are_set = 0;
+		break;
+	}
+	*hbsp = highest_bit_set;
+	*lbsp = lowest_bit_set;
+	*abbasp = all_bits_between_are_set;
+}
+
+static unsigned long create_simple_focus_bits(unsigned long high_bits,
+					      unsigned long low_bits,
+					      int lowest_bit_set, int shift)
+{
+	long hi, lo;
+
+	if (lowest_bit_set < 32) {
+		lo = (low_bits >> lowest_bit_set) << shift;
+		hi = ((high_bits << (32 - lowest_bit_set)) << shift);
+	} else {
+		lo = 0;
+		hi = ((high_bits >> (lowest_bit_set - 32)) << shift);
+	}
+	return hi | lo;
+}
+
+static bool const64_is_2insns (unsigned long high_bits,
+			       unsigned long low_bits)
+{
+	int highest_bit_set, lowest_bit_set, all_bits_between_are_set;
+
+	if (high_bits == 0 || high_bits == 0xffffffff)
+		return true;
+
+	analyze_64bit_constant (high_bits, low_bits,
+				&highest_bit_set, &lowest_bit_set,
+				&all_bits_between_are_set);
+
+	if ((highest_bit_set == 63 || lowest_bit_set == 0) &&
+	    all_bits_between_are_set != 0)
+		return true;
+
+	if ((highest_bit_set - lowest_bit_set) < 21)
+		return true;
+
+	return false;
+}
+
+static void sparc_emit_set_const64_quick2 (unsigned long high_bits,
+					   unsigned long low_imm,
+					   unsigned int dest,
+					   int shift_count, struct jit_ctx *ctx)
+{
+	emit_loadimm32(high_bits, dest, ctx);
+
+	/* Now shift it up into place.  */
+	emit_alu_K(SLLX, dest, shift_count, ctx);
+
+	/* If there is a low immediate part piece, finish up by
+	 * putting that in as well.
+	 */
+	if (low_imm != 0)
+		emit(OR | IMMED | RS1(dest) | S13(low_imm) | RD(dest), ctx);
+}
+
 static void emit_loadimm64(u64 K, unsigned int dest, struct jit_ctx *ctx)
 {
+	int all_bits_between_are_set, lowest_bit_set, highest_bit_set;
 	unsigned int tmp = bpf2sparc[TMP_REG_1];
-	u32 high_part = (K >> 32);
-	u32 low_part = (K & 0xffffffff);
+	u32 low_bits = (K & 0xffffffff);
+	u32 high_bits = (K >> 32);
+
+	/* These two tests also take care of all of the one
+	 * instruction cases.
+	 */
+	if (high_bits == 0xffffffff && (low_bits & 0x80000000))
+		return emit_loadimm_sext(K, dest, ctx);
+	if (high_bits == 0x00000000)
+		return emit_loadimm32(K, dest, ctx);
+
+	analyze_64bit_constant(high_bits, low_bits, &highest_bit_set,
+			       &lowest_bit_set, &all_bits_between_are_set);
+
+	/* 1) mov	-1, %reg
+	 *    sllx	%reg, shift, %reg
+	 * 2) mov	-1, %reg
+	 *    srlx	%reg, shift, %reg
+	 * 3) mov	some_small_const, %reg
+	 *    sllx	%reg, shift, %reg
+	 */
+	if (((highest_bit_set == 63 || lowest_bit_set == 0) &&
+	     all_bits_between_are_set != 0) ||
+	    ((highest_bit_set - lowest_bit_set) < 12)) {
+		int shift = lowest_bit_set;
+		long the_const = -1;
+
+		if ((highest_bit_set != 63 && lowest_bit_set != 0) ||
+		    all_bits_between_are_set == 0) {
+			the_const =
+				create_simple_focus_bits(high_bits, low_bits,
+							 lowest_bit_set, 0);
+		} else if (lowest_bit_set == 0)
+			shift = -(63 - highest_bit_set);
+
+		emit(OR | IMMED | RS1(G0) | S13(the_const) | RD(dest), ctx);
+		if (shift > 0)
+			emit_alu_K(SLLX, dest, shift, ctx);
+		else if (shift < 0)
+			emit_alu_K(SRLX, dest, -shift, ctx);
+
+		return;
+	}
+
+	/* Now a range of 22 or less bits set somewhere.
+	 * 1) sethi	%hi(focus_bits), %reg
+	 *    sllx	%reg, shift, %reg
+	 * 2) sethi	%hi(focus_bits), %reg
+	 *    srlx	%reg, shift, %reg
+	 */
+	if ((highest_bit_set - lowest_bit_set) < 21) {
+		unsigned long focus_bits =
+			create_simple_focus_bits(high_bits, low_bits,
+						 lowest_bit_set, 10);
+
+		emit(SETHI(focus_bits, dest), ctx);
+
+		/* If lowest_bit_set == 10 then a sethi alone could
+		 * have done it.
+		 */
+		if (lowest_bit_set < 10)
+			emit_alu_K(SRLX, dest, 10 - lowest_bit_set, ctx);
+		else if (lowest_bit_set > 10)
+			emit_alu_K(SLLX, dest, lowest_bit_set - 10, ctx);
+		return;
+	}
+
+	/* Ok, now 3 instruction sequences.  */
+	if (low_bits == 0) {
+		emit_loadimm32(high_bits, dest, ctx);
+		emit_alu_K(SLLX, dest, 32, ctx);
+		return;
+	}
+
+	/* We may be able to do something quick
+	 * when the constant is negated, so try that.
+	 */
+	if (const64_is_2insns ((~high_bits) & 0xffffffff,
+			       (~low_bits) & 0xfffffc00)) {
+		/* NOTE: The trailing bits get XOR'd so we need the
+		 * non-negated bits, not the negated ones.
+		 */
+		unsigned long trailing_bits = low_bits & 0x3ff;
+
+		if ((((~high_bits) & 0xffffffff) == 0 &&
+		     ((~low_bits) & 0x80000000) == 0) ||
+		    (((~high_bits) & 0xffffffff) == 0xffffffff &&
+		     ((~low_bits) & 0x80000000) != 0)) {
+			unsigned long fast_int = (~low_bits & 0xffffffff);
+
+			if ((is_sethi(fast_int) &&
+			     (~high_bits & 0xffffffff) == 0)) {
+				emit(SETHI(fast_int, dest), ctx);
+			} else if (is_simm13(fast_int)) {
+				emit(OR | IMMED | RS1(G0) | S13(fast_int) | RD(dest), ctx);
+			} else {
+				emit_loadimm64(fast_int, dest, ctx);
+			}
+		} else {
+			u64 n = ((~low_bits) & 0xfffffc00) |
+				(((unsigned long)((~high_bits) & 0xffffffff))<<32);
+			emit_loadimm64(n, dest, ctx);
+		}
+
+		low_bits = -0x400 | trailing_bits;
+
+		emit(XOR | IMMED | RS1(dest) | S13(low_bits) | RD(dest), ctx);
+		return;
+	}
+
+	/* 1) sethi	%hi(xxx), %reg
+	 *    or	%reg, %lo(xxx), %reg
+	 *    sllx	%reg, yyy, %reg
+	 */
+	if ((highest_bit_set - lowest_bit_set) < 32) {
+		unsigned long focus_bits =
+			create_simple_focus_bits(high_bits, low_bits,
+						 lowest_bit_set, 0);
+
+		/* So what we know is that the set bits straddle the
+		 * middle of the 64-bit word.
+		 */
+		sparc_emit_set_const64_quick2(focus_bits, 0, dest,
+					      lowest_bit_set, ctx);
+		return;
+	}
+
+	/* 1) sethi	%hi(high_bits), %reg
+	 *    or	%reg, %lo(high_bits), %reg
+	 *    sllx	%reg, 32, %reg
+	 *    or	%reg, low_bits, %reg
+	 */
+	if (is_simm13(low_bits) && ((int)low_bits > 0)) {
+		sparc_emit_set_const64_quick2(high_bits, low_bits,
+					      dest, 32, ctx);
+		return;
+	}
 
+	/* Oh well, we tried... Do a full 64-bit decomposition.  */
 	ctx->tmp_1_used = true;
 
-	emit_set_const(high_part, tmp, ctx);
-	emit_set_const(low_part, dest, ctx);
+	emit_loadimm32(high_bits, tmp, ctx);
+	emit_loadimm32(low_bits, dest, ctx);
 	emit_alu_K(SLLX, tmp, 32, ctx);
 	emit(OR | RS1(dest) | RS2(tmp) | RD(dest), ctx);
 }
-- 
2.1.2.532.g19b5d50

^ permalink raw reply related

* Re: [PATCH v2] net: core: Prevent from dereferencing null pointer when
From: Myungho Jung @ 2017-04-25  2:39 UTC (permalink / raw)
  To: David Miller; +Cc: edumazet, netdev
In-Reply-To: <20170424.214450.2215530406378773476.davem@davemloft.net>

On Mon, Apr 24, 2017 at 09:44:50PM -0400, David Miller wrote:
> From: Myungho Jung <mhjungk@gmail.com>
> Date: Mon, 24 Apr 2017 18:00:52 -0700
> 
> > On Mon, Apr 24, 2017 at 12:02:35PM -0400, David Miller wrote:
> >> From: Myungho Jung <mhjungk@gmail.com>
> >> Date: Thu, 20 Apr 2017 16:59:20 -0700
> >> 
> >> > Added NULL check to make __dev_kfree_skb_irq consistent with kfree
> >> > family of functions.
> >> > 
> >> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=195289
> >> > 
> >> > Signed-off-by: Myungho Jung <mhjungk@gmail.com>
> >> > ---
> >> > Changes in v2:
> >> >  - Correct category in subject
> >> 
> >> This subject line is an incomplete sentence.
> >> 
> >> This patch prevents dereferenccing a null pointer when "what"?
> > 
> > As it was reported on bugzilla, it would happen when plugging p54 usb device
> > to RPi2. But, i'm not 100% sure that this patch will resolve the issue because
> > the reporter didn't try my patch yet and I don't have the device to test.
> > 
> > And there might be some other places calling the function without checking
> > null pointer. The thing is that only the function don't check null among
> > kfree functions. So, I just hope this patch will prevent potential oops
> > issues.
> 
> It doesn't check for a NULL pointer because it is almost exclusively
> used in the interrupt paths where we know we have a non-NULL skb.

Yes, actually null is checked before calling the function in most
cases. That's why my first patch was applied not to net/core but to
p54 driver because I was worried about duplicated checking.

But, Christian suggested this patch to make it consistent with other
kfree functions and consume_skb, and Eric agreed with that.

Thanks,
Myungho

^ permalink raw reply

* Re: [PATCH net-next 0/2] flower: add MPLS matching support
From: Jakub Kicinski @ 2017-04-25  2:07 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: David Miller, benjamin.lahaise, netdev, bcrl, Jiri Pirko
In-Reply-To: <8f6e7952-b87b-8959-b45a-ab687b69fb1b@mojatatu.com>

On Mon, 24 Apr 2017 22:06:08 -0400, Jamal Hadi Salim wrote:
> On 17-04-24 10:00 PM, Jamal Hadi Salim wrote:
> > On 17-04-24 09:48 PM, Jamal Hadi Salim wrote:  
> 
> >
> > Hrm. maybe I am wrong.
> > Lets say user sets all of the 8 bits in BOS,
> > what does setting
> > key_val->mpls_bos = nla_get_u8 do?
> >
> > Same with the 20 bits for the label in the u32
> > or 3 bit bits in the u8 tc.  
> 
> The label and tc are masked - maybe just the BOS
> needs something similar?

Indeed, good catch!

^ permalink raw reply

* Re: [PATCH net-next 0/2] flower: add MPLS matching support
From: Jamal Hadi Salim @ 2017-04-25  2:06 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: David Miller, benjamin.lahaise, netdev, bcrl, Jiri Pirko
In-Reply-To: <e58fd7ee-1bd0-0cd5-ab15-c510ec4e70bd@mojatatu.com>

On 17-04-24 10:00 PM, Jamal Hadi Salim wrote:
> On 17-04-24 09:48 PM, Jamal Hadi Salim wrote:

>
> Hrm. maybe I am wrong.
> Lets say user sets all of the 8 bits in BOS,
> what does setting
> key_val->mpls_bos = nla_get_u8 do?
>
> Same with the 20 bits for the label in the u32
> or 3 bit bits in the u8 tc.

The label and tc are masked - maybe just the BOS
needs something similar?

cheers,
jamal

^ permalink raw reply

* [PATCH net-next] bpf: map_get_next_key to return first key on NULL
From: Alexei Starovoitov @ 2017-04-25  2:00 UTC (permalink / raw)
  To: David S . Miller; +Cc: Daniel Borkmann, Teng Qin, netdev, kernel-team

From: Teng Qin <qinteng@fb.com>

When iterating through a map, we need to find a key that does not exist
in the map so map_get_next_key will give us the first key of the map.
This often requires a lot of guessing in production systems.

This patch makes map_get_next_key return the first key when the key
pointer in the parameter is NULL.

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/trace/events/bpf.h              | 10 +++++++---
 kernel/bpf/arraymap.c                   |  2 +-
 kernel/bpf/hashtab.c                    |  9 +++++----
 kernel/bpf/syscall.c                    | 20 ++++++++++++--------
 tools/testing/selftests/bpf/test_maps.c | 29 +++++++++++++++++++++++++----
 5 files changed, 50 insertions(+), 20 deletions(-)

diff --git a/include/trace/events/bpf.h b/include/trace/events/bpf.h
index c3a53fd47ff1..52c8425d144b 100644
--- a/include/trace/events/bpf.h
+++ b/include/trace/events/bpf.h
@@ -321,11 +321,14 @@ TRACE_EVENT(bpf_map_next_key,
 		__dynamic_array(u8, key, map->key_size)
 		__dynamic_array(u8, nxt, map->key_size)
 		__field(bool, key_trunc)
+		__field(bool, key_null)
 		__field(int, ufd)
 	),
 
 	TP_fast_assign(
-		memcpy(__get_dynamic_array(key), key, map->key_size);
+		if (key)
+			memcpy(__get_dynamic_array(key), key, map->key_size);
+		__entry->key_null = !key;
 		memcpy(__get_dynamic_array(nxt), key_next, map->key_size);
 		__entry->type      = map->map_type;
 		__entry->key_len   = min(map->key_size, 16U);
@@ -336,8 +339,9 @@ TRACE_EVENT(bpf_map_next_key,
 	TP_printk("map type=%s ufd=%d key=[%s%s] next=[%s%s]",
 		  __print_symbolic(__entry->type, __MAP_TYPE_SYM_TAB),
 		  __entry->ufd,
-		  __print_hex(__get_dynamic_array(key), __entry->key_len),
-		  __entry->key_trunc ? " ..." : "",
+		  __entry->key_null ? "NULL" : __print_hex(__get_dynamic_array(key),
+							   __entry->key_len),
+		  __entry->key_trunc && !__entry->key_null ? " ..." : "",
 		  __print_hex(__get_dynamic_array(nxt), __entry->key_len),
 		  __entry->key_trunc ? " ..." : "")
 );
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index ec621df5a97a..5e00b2333c26 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -182,7 +182,7 @@ int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
 static int array_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 {
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
-	u32 index = *(u32 *)key;
+	u32 index = key ? *(u32 *)key : U32_MAX;
 	u32 *next = (u32 *)next_key;
 
 	if (index >= array->map.max_entries) {
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index bc80c038e430..004334ea13ba 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -540,12 +540,15 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 	struct hlist_nulls_head *head;
 	struct htab_elem *l, *next_l;
 	u32 hash, key_size;
-	int i;
+	int i = 0;
 
 	WARN_ON_ONCE(!rcu_read_lock_held());
 
 	key_size = map->key_size;
 
+	if (!key)
+		goto find_first_elem;
+
 	hash = htab_map_hash(key, key_size);
 
 	head = select_bucket(htab, hash);
@@ -553,10 +556,8 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 	/* lookup the key */
 	l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets);
 
-	if (!l) {
-		i = 0;
+	if (!l)
 		goto find_first_elem;
-	}
 
 	/* key was found, get next key in the same bucket */
 	next_l = hlist_nulls_entry_safe(rcu_dereference_raw(hlist_nulls_next_rcu(&l->hash_node)),
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b89288e2b589..13642c73dca0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -536,14 +536,18 @@ static int map_get_next_key(union bpf_attr *attr)
 	if (IS_ERR(map))
 		return PTR_ERR(map);
 
-	err = -ENOMEM;
-	key = kmalloc(map->key_size, GFP_USER);
-	if (!key)
-		goto err_put;
-
-	err = -EFAULT;
-	if (copy_from_user(key, ukey, map->key_size) != 0)
-		goto free_key;
+	if (ukey) {
+		err = -ENOMEM;
+		key = kmalloc(map->key_size, GFP_USER);
+		if (!key)
+			goto err_put;
+
+		err = -EFAULT;
+		if (copy_from_user(key, ukey, map->key_size) != 0)
+			goto free_key;
+	} else {
+		key = NULL;
+	}
 
 	err = -ENOMEM;
 	next_key = kmalloc(map->key_size, GFP_USER);
diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index 20f1871874df..a977c4f7b0ce 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -28,7 +28,7 @@ static int map_flags;
 
 static void test_hashmap(int task, void *data)
 {
-	long long key, next_key, value;
+	long long key, next_key, first_key, value;
 	int fd;
 
 	fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value),
@@ -89,10 +89,13 @@ static void test_hashmap(int task, void *data)
 	assert(bpf_map_delete_elem(fd, &key) == -1 && errno == ENOENT);
 
 	/* Iterate over two elements. */
+	assert(bpf_map_get_next_key(fd, NULL, &first_key) == 0 &&
+	       (first_key == 1 || first_key == 2));
 	assert(bpf_map_get_next_key(fd, &key, &next_key) == 0 &&
-	       (next_key == 1 || next_key == 2));
+	       (next_key == first_key));
 	assert(bpf_map_get_next_key(fd, &next_key, &next_key) == 0 &&
-	       (next_key == 1 || next_key == 2));
+	       (next_key == 1 || next_key == 2) &&
+	       (next_key != first_key));
 	assert(bpf_map_get_next_key(fd, &next_key, &next_key) == -1 &&
 	       errno == ENOENT);
 
@@ -105,6 +108,8 @@ static void test_hashmap(int task, void *data)
 
 	key = 0;
 	/* Check that map is empty. */
+	assert(bpf_map_get_next_key(fd, NULL, &next_key) == -1 &&
+	       errno == ENOENT);
 	assert(bpf_map_get_next_key(fd, &key, &next_key) == -1 &&
 	       errno == ENOENT);
 
@@ -133,7 +138,7 @@ static void test_hashmap_percpu(int task, void *data)
 {
 	unsigned int nr_cpus = bpf_num_possible_cpus();
 	long long value[nr_cpus];
-	long long key, next_key;
+	long long key, next_key, first_key;
 	int expected_key_mask = 0;
 	int fd, i;
 
@@ -193,7 +198,13 @@ static void test_hashmap_percpu(int task, void *data)
 	assert(bpf_map_delete_elem(fd, &key) == -1 && errno == ENOENT);
 
 	/* Iterate over two elements. */
+	assert(bpf_map_get_next_key(fd, NULL, &first_key) == 0 &&
+	       ((expected_key_mask & first_key) == first_key));
 	while (!bpf_map_get_next_key(fd, &key, &next_key)) {
+		if (first_key) {
+			assert(next_key == first_key);
+			first_key = 0;
+		}
 		assert((expected_key_mask & next_key) == next_key);
 		expected_key_mask &= ~next_key;
 
@@ -219,6 +230,8 @@ static void test_hashmap_percpu(int task, void *data)
 
 	key = 0;
 	/* Check that map is empty. */
+	assert(bpf_map_get_next_key(fd, NULL, &next_key) == -1 &&
+	       errno == ENOENT);
 	assert(bpf_map_get_next_key(fd, &key, &next_key) == -1 &&
 	       errno == ENOENT);
 
@@ -264,6 +277,8 @@ static void test_arraymap(int task, void *data)
 	assert(bpf_map_lookup_elem(fd, &key, &value) == -1 && errno == ENOENT);
 
 	/* Iterate over two elements. */
+	assert(bpf_map_get_next_key(fd, NULL, &next_key) == 0 &&
+	       next_key == 0);
 	assert(bpf_map_get_next_key(fd, &key, &next_key) == 0 &&
 	       next_key == 0);
 	assert(bpf_map_get_next_key(fd, &next_key, &next_key) == 0 &&
@@ -319,6 +334,8 @@ static void test_arraymap_percpu(int task, void *data)
 	assert(bpf_map_lookup_elem(fd, &key, values) == -1 && errno == ENOENT);
 
 	/* Iterate over two elements. */
+	assert(bpf_map_get_next_key(fd, NULL, &next_key) == 0 &&
+	       next_key == 0);
 	assert(bpf_map_get_next_key(fd, &key, &next_key) == 0 &&
 	       next_key == 0);
 	assert(bpf_map_get_next_key(fd, &next_key, &next_key) == 0 &&
@@ -400,6 +417,8 @@ static void test_map_large(void)
 	       errno == E2BIG);
 
 	/* Iterate through all elements. */
+	assert(bpf_map_get_next_key(fd, NULL, &key) == 0);
+	key.c = -1;
 	for (i = 0; i < MAP_SIZE; i++)
 		assert(bpf_map_get_next_key(fd, &key, &key) == 0);
 	assert(bpf_map_get_next_key(fd, &key, &key) == -1 && errno == ENOENT);
@@ -499,6 +518,7 @@ static void test_map_parallel(void)
 	       errno == EEXIST);
 
 	/* Check that all elements were inserted. */
+	assert(bpf_map_get_next_key(fd, NULL, &key) == 0);
 	key = -1;
 	for (i = 0; i < MAP_SIZE; i++)
 		assert(bpf_map_get_next_key(fd, &key, &key) == 0);
@@ -518,6 +538,7 @@ static void test_map_parallel(void)
 
 	/* Nothing should be left. */
 	key = -1;
+	assert(bpf_map_get_next_key(fd, NULL, &key) == -1 && errno == ENOENT);
 	assert(bpf_map_get_next_key(fd, &key, &key) == -1 && errno == ENOENT);
 }
 
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH net-next 0/2] flower: add MPLS matching support
From: Jamal Hadi Salim @ 2017-04-25  2:00 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: David Miller, benjamin.lahaise, netdev, bcrl, Jiri Pirko
In-Reply-To: <6ae82894-fc8b-2ad9-7dfd-cf74cd6042a1@mojatatu.com>

On 17-04-24 09:48 PM, Jamal Hadi Salim wrote:
> On 17-04-24 09:20 PM, Jakub Kicinski wrote:
>> On Mon, 24 Apr 2017 20:58:18 -0400, Jamal Hadi Salim wrote:
>>> On 17-04-24 02:32 PM, David Miller wrote:
>
>>> You have 3 TLVs, one of which is u8 that only allows use of 3 bits.
>>> The other is a u32 which allows only 20 bits to be set.
>>
>> I don't think we will ever reuse bits in a field which is called
>> MPLS_LABEL for anything else than an MPLS label.
>
> That is true. So maybe bad example. It also helps the mpls disector
> wont for example allow copying of more than 20 bits for a label.

Hrm. maybe I am wrong.
Lets say user sets all of the 8 bits in BOS,
what does setting
key_val->mpls_bos = nla_get_u8 do?

Same with the 20 bits for the label in the u32
or 3 bit bits in the u8 tc.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH 7/7] liquidio: use pcie_flr instead of duplicating it
From: Felix Manlunas @ 2017-04-25  1:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, Giovanni Cabiddu, Salvatore Benedetto,
	Mike Marciniszyn, Dennis Dalessandro, Derek Chickles,
	Satanand Burla, Felix Manlunas, Raghu Vatsavayi, Jeff Kirsher,
	linux-pci, qat-linux, linux-crypto, linux-rdma, netdev,
	linux-kernel
In-Reply-To: <20170414191131.14286-8-hch@lst.de>

From: Christoph Hellwig <hch@lst.de> 
Date: Fri, 14 Apr 2017 21:11:31 +0200

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 15 +--------------
>  1 file changed, 1 insertion(+), 14 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
> index 9d5e03502c76..afdbf7fa016e 100644
> --- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
> +++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
> @@ -869,20 +869,7 @@ static void octeon_pci_flr(struct octeon_device *oct)
>  	pci_write_config_word(oct->pci_dev, PCI_COMMAND,
>  			      PCI_COMMAND_INTX_DISABLE);
>  
> -	/* Wait for Transaction Pending bit clean */
> -	msleep(100);
> -	pcie_capability_read_word(oct->pci_dev, PCI_EXP_DEVSTA, &status);
> -	if (status & PCI_EXP_DEVSTA_TRPND) {
> -		dev_info(&oct->pci_dev->dev, "Function reset incomplete after 100ms, sleeping for 5 seconds\n");
> -		ssleep(5);
> -		pcie_capability_read_word(oct->pci_dev, PCI_EXP_DEVSTA,
> -					  &status);
> -		if (status & PCI_EXP_DEVSTA_TRPND)
> -			dev_info(&oct->pci_dev->dev, "Function reset still incomplete after 5s, reset anyway\n");
> -	}
> -	pcie_capability_set_word(oct->pci_dev, PCI_EXP_DEVCTL,
> -				 PCI_EXP_DEVCTL_BCR_FLR);
> -	mdelay(100);
> +	pcie_flr(oct->pci_dev);
>  
>  	pci_cfg_access_unlock(oct->pci_dev);
>  
> -- 
> 2.11.0
> 

This patch works.  I tested it on a LiquidIO NIC and the "next" branch of the
PCI git tree.

But the patch causes a gcc warning:

    .../lio_vf_main.c: In function 'octeon_pci_flr':
    .../lio_vf_main.c:862:6: warning: unused variable 'status' [-Wunused-variable]

Can you rework the patch to get rid of the warning?  Thanks.

^ permalink raw reply

* Re: [PATCH net-next 0/2] flower: add MPLS matching support
From: Jamal Hadi Salim @ 2017-04-25  1:48 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: David Miller, benjamin.lahaise, netdev, bcrl, Jiri Pirko
In-Reply-To: <20170424182054.116d1a99@cakuba.netronome.com>

On 17-04-24 09:20 PM, Jakub Kicinski wrote:
> On Mon, 24 Apr 2017 20:58:18 -0400, Jamal Hadi Salim wrote:
>> On 17-04-24 02:32 PM, David Miller wrote:

>> You have 3 TLVs, one of which is u8 that only allows use of 3 bits.
>> The other is a u32 which allows only 20 bits to be set.
>
> I don't think we will ever reuse bits in a field which is called
> MPLS_LABEL for anything else than an MPLS label.

That is true. So maybe bad example. It also helps the mpls disector
wont for example allow copying of more than 20 bits for a label.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH v2] net: core: Prevent from dereferencing null pointer when
From: David Miller @ 2017-04-25  1:44 UTC (permalink / raw)
  To: mhjungk; +Cc: edumazet, netdev
In-Reply-To: <20170425010052.GA8717@fqdn.specialj.com>

From: Myungho Jung <mhjungk@gmail.com>
Date: Mon, 24 Apr 2017 18:00:52 -0700

> On Mon, Apr 24, 2017 at 12:02:35PM -0400, David Miller wrote:
>> From: Myungho Jung <mhjungk@gmail.com>
>> Date: Thu, 20 Apr 2017 16:59:20 -0700
>> 
>> > Added NULL check to make __dev_kfree_skb_irq consistent with kfree
>> > family of functions.
>> > 
>> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=195289
>> > 
>> > Signed-off-by: Myungho Jung <mhjungk@gmail.com>
>> > ---
>> > Changes in v2:
>> >  - Correct category in subject
>> 
>> This subject line is an incomplete sentence.
>> 
>> This patch prevents dereferenccing a null pointer when "what"?
> 
> As it was reported on bugzilla, it would happen when plugging p54 usb device
> to RPi2. But, i'm not 100% sure that this patch will resolve the issue because
> the reporter didn't try my patch yet and I don't have the device to test.
> 
> And there might be some other places calling the function without checking
> null pointer. The thing is that only the function don't check null among
> kfree functions. So, I just hope this patch will prevent potential oops
> issues.

It doesn't check for a NULL pointer because it is almost exclusively
used in the interrupt paths where we know we have a non-NULL skb.

^ permalink raw reply

* Re: [PATCH v2] net: core: Prevent from dereferencing null pointer when
From: Myungho Jung @ 2017-04-25  1:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <CANn89iL1n67fkZm0ZL0zN0_AL_uWZDKZXY3mOrn0H96vZTX7zQ@mail.gmail.com>

On Mon, Apr 24, 2017 at 06:10:32PM -0700, Eric Dumazet wrote:
> On Mon, Apr 24, 2017 at 6:00 PM, Myungho Jung <mhjungk@gmail.com> wrote:
> > On Mon, Apr 24, 2017 at 12:02:35PM -0400, David Miller wrote:
> >> From: Myungho Jung <mhjungk@gmail.com>
> >> Date: Thu, 20 Apr 2017 16:59:20 -0700
> >>
> >> > Added NULL check to make __dev_kfree_skb_irq consistent with kfree
> >> > family of functions.
> >> >
> >> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=195289
> >> >
> >> > Signed-off-by: Myungho Jung <mhjungk@gmail.com>
> >> > ---
> >> > Changes in v2:
> >> >  - Correct category in subject
> >>
> >> This subject line is an incomplete sentence.
> >>
> >> This patch prevents dereferenccing a null pointer when "what"?
> >
> > As it was reported on bugzilla, it would happen when plugging p54 usb device
> > to RPi2. But, i'm not 100% sure that this patch will resolve the issue because
> > the reporter didn't try my patch yet and I don't have the device to test.
> >
> > And there might be some other places calling the function without checking
> > null pointer. The thing is that only the function don't check null among
> > kfree functions. So, I just hope this patch will prevent potential oops
> > issues.
> 
> Okay, but your patch title was : "net: core: Prevent from
> dereferencing null pointer when"
> 
> "when" is usually followed by other words.

Oh, sorry that was typo. I'll remove "when" from subject.

Thanks,
Myungho

^ permalink raw reply

* [PATCH net] netvsc: fix calculation of available send sections
From: Stephen Hemminger @ 2017-04-25  1:33 UTC (permalink / raw)
  To: davem; +Cc: netdev, Stephen Hemminger

My change (introduced in 4.11) to use find_first_clear_bit
incorrectly assumed that the size argument was words, not bits.
The effect was only a small limited number of the available send
sections were being actually used. This can cause performance loss
with some workloads.

Since map_words is now used only during initialization, it can
be on stack instead of in per-device data.

Fixes: b58a185801da ("netvsc: simplify get next send section")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h | 1 -
 drivers/net/hyperv/netvsc.c     | 9 ++++-----
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index f9f3dba7a588..db23cb36ae5c 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -751,7 +751,6 @@ struct netvsc_device {
 	u32 send_section_cnt;
 	u32 send_section_size;
 	unsigned long *send_section_map;
-	int map_words;
 
 	/* Used for NetVSP initialization protocol */
 	struct completion channel_init_wait;
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 8dd0b8770328..15ef713d96c0 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -236,6 +236,7 @@ static int netvsc_init_buf(struct hv_device *device)
 	struct netvsc_device *net_device;
 	struct nvsp_message *init_packet;
 	struct net_device *ndev;
+	size_t map_words;
 	int node;
 
 	net_device = get_outbound_net_device(device);
@@ -401,11 +402,9 @@ static int netvsc_init_buf(struct hv_device *device)
 		   net_device->send_section_size, net_device->send_section_cnt);
 
 	/* Setup state for managing the send buffer. */
-	net_device->map_words = DIV_ROUND_UP(net_device->send_section_cnt,
-					     BITS_PER_LONG);
+	map_words = DIV_ROUND_UP(net_device->send_section_cnt, BITS_PER_LONG);
 
-	net_device->send_section_map = kcalloc(net_device->map_words,
-					       sizeof(ulong), GFP_KERNEL);
+	net_device->send_section_map = kcalloc(map_words, sizeof(ulong), GFP_KERNEL);
 	if (net_device->send_section_map == NULL) {
 		ret = -ENOMEM;
 		goto cleanup;
@@ -683,7 +682,7 @@ static u32 netvsc_get_next_send_section(struct netvsc_device *net_device)
 	unsigned long *map_addr = net_device->send_section_map;
 	unsigned int i;
 
-	for_each_clear_bit(i, map_addr, net_device->map_words) {
+	for_each_clear_bit(i, map_addr, net_device->send_section_cnt) {
 		if (sync_test_and_set_bit(i, map_addr) == 0)
 			return i;
 	}
-- 
2.11.0

^ permalink raw reply related

* [PATCH net-next] selftests/net: Fix broken test case in psock_fanout
From: Mike Maloney @ 2017-04-25  1:29 UTC (permalink / raw)
  To: netdev, davem; +Cc: Mike Maloney

From: Mike Maloney <maloney@google.com>

The error return falue form sock_fanout_open is -1, not zero.  One test
case was checking for 0 instead of -1.

Tested: Built and tested in clean client.
Signed-off-by: Mike Maloney <maloney@google.com>
---
 tools/testing/selftests/net/psock_fanout.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/psock_fanout.c b/tools/testing/selftests/net/psock_fanout.c
index b4b1d91fcea5..989f917068d1 100644
--- a/tools/testing/selftests/net/psock_fanout.c
+++ b/tools/testing/selftests/net/psock_fanout.c
@@ -305,7 +305,7 @@ static void test_unique_fanout_group_ids(void)
 		exit(1);
 	}
 
-	if (sock_fanout_open(PACKET_FANOUT_CPU, first_group_id)) {
+	if (sock_fanout_open(PACKET_FANOUT_CPU, first_group_id) != -1) {
 		fprintf(stderr, "ERROR: joined group with wrong type.\n");
 		exit(1);
 	}
-- 
2.13.0.rc0.306.g87b477812d-goog

^ permalink raw reply related

* MONTHLY UPDATE
From: IT Department @ 2017-04-24 23:17 UTC (permalink / raw)
  To: netdev

Recently, we have detect some unusual activity on your account and as a result, all email users are urged to update their email account within 24 hours of receiving this e-mail, please click the link http://beam.to/6128 to confirm that your email account is up to date with the institution requirement.

---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

^ permalink raw reply

* Re: [PATCH net-next 0/2] flower: add MPLS matching support
From: Jamal Hadi Salim @ 2017-04-25  1:25 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: David Miller, netdev, bcrl, Jiri Pirko
In-Reply-To: <20170425010558.GC24425@nvt-d.home.kvack.org>

On 17-04-24 09:05 PM, Benjamin LaHaise wrote:
> On Mon, Apr 24, 2017 at 08:58:18PM -0400, Jamal Hadi Salim wrote:
>> On 17-04-24 02:32 PM, David Miller wrote:

>> You have 3 TLVs, one of which is u8 that only allows use of 3 bits.
>> The other is a u32 which allows only 20 bits to be set.
>
> What are the new rules for TLVs -- do you want a new type added for the
> subset values that are limited to the number of bits actually used?  A
> pointed here would be helpful.
>

The rules are to check for bits that are not used.
For example in the BOS TLV it would be required to
verify that the user did not set any other bit other than the
bos field. It is required to reject the whole thing if the user
set any other of the u8 bits.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH net-next 0/2] flower: add MPLS matching support
From: Jakub Kicinski @ 2017-04-25  1:20 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: David Miller, benjamin.lahaise, netdev, bcrl, Jiri Pirko
In-Reply-To: <d54cca4b-6212-bf7d-93e2-8314177df42c@mojatatu.com>

On Mon, 24 Apr 2017 20:58:18 -0400, Jamal Hadi Salim wrote:
> On 17-04-24 02:32 PM, David Miller wrote:
> > From: Benjamin LaHaise <benjamin.lahaise@netronome.com>  
> 
> >
> > Series applied, but in the future:
> >
> > 1) Put the "v2", "v3", whatever in the inital [PATCH xxx] bracketed
> >    part of the subject line, otherwise it ends up in the GIT commit
> >    message header line and that's not desired.
> >
> > 2) Please cut it with these double signoffs, and add an appropriate
> >    entry to the email aliases file.  
> 
> I know i should have spoken earlier, wanted to but got distracted - but
> shouldnt the new rules have applied to this patch too? ;->
> 
> You have 3 TLVs, one of which is u8 that only allows use of 3 bits.
> The other is a u32 which allows only 20 bits to be set.

I don't think we will ever reuse bits in a field which is called
MPLS_LABEL for anything else than an MPLS label.  My understanding of
the conclusions from recent discussions was to either (a) make
attributes single purpose (i.e. separate u8 per flag); or (b) make
attributes u32/word size but validate the unused bits are zero.  This
patch would fall under (a).

^ permalink raw reply

* EXTREMELY IMPORTANT
From: Ms. Katherine @ 2017-04-24 22:33 UTC (permalink / raw)
  To: Recipients

Dear Beloved, Sorry for the inconvenience, I am getting in touch with you regarding an extremely important and urgent matter, Please I need your urgent assistance and idea to finish up a project (Orphanage Home) Due to my health condition, Everything is available to finish up the project, All I need is your idea and trust...............

Geachte Geliefde, Sorry voor het ongemak, ik kom met je in contact met betrekking tot een zeer belangrijke en dringende zaak. Gelieve mijn dringende hulp en idee nodig om een project af te ronden (Weeshuishuis) Vanwege mijn gezondheidstoestand is alles beschikbaar voor Voltooi het project, alles wat ik nodig heb is jouw idee en vertrouwen.

Please contact me for more details.
Contact Email: kathrynnmison@gmail.com

Thanks

---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

^ permalink raw reply

* Re: [PATCH v2] net: core: Prevent from dereferencing null pointer when
From: Eric Dumazet @ 2017-04-25  1:10 UTC (permalink / raw)
  To: Myungho Jung; +Cc: David Miller, netdev
In-Reply-To: <20170425010052.GA8717@fqdn.specialj.com>

On Mon, Apr 24, 2017 at 6:00 PM, Myungho Jung <mhjungk@gmail.com> wrote:
> On Mon, Apr 24, 2017 at 12:02:35PM -0400, David Miller wrote:
>> From: Myungho Jung <mhjungk@gmail.com>
>> Date: Thu, 20 Apr 2017 16:59:20 -0700
>>
>> > Added NULL check to make __dev_kfree_skb_irq consistent with kfree
>> > family of functions.
>> >
>> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=195289
>> >
>> > Signed-off-by: Myungho Jung <mhjungk@gmail.com>
>> > ---
>> > Changes in v2:
>> >  - Correct category in subject
>>
>> This subject line is an incomplete sentence.
>>
>> This patch prevents dereferenccing a null pointer when "what"?
>
> As it was reported on bugzilla, it would happen when plugging p54 usb device
> to RPi2. But, i'm not 100% sure that this patch will resolve the issue because
> the reporter didn't try my patch yet and I don't have the device to test.
>
> And there might be some other places calling the function without checking
> null pointer. The thing is that only the function don't check null among
> kfree functions. So, I just hope this patch will prevent potential oops
> issues.

Okay, but your patch title was : "net: core: Prevent from
dereferencing null pointer when"

"when" is usually followed by other words.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox