Netdev List
 help / color / mirror / Atom feed
* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
From: John David Anglin @ 2019-01-30 19:01 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <20190130172818.GJ21904@lunn.ch>

On 2019-01-30 12:28 p.m., Andrew Lunn wrote:
> On Wed, Jan 30, 2019 at 12:08:39PM -0500, John David Anglin wrote:
>> On 2019-01-22 7:22 p.m., Andrew Lunn wrote:
>>> >From my Espressobin
>>>
>>> cat /proc/interrupts
>>> ...
>>>  44:          0          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
>>>  46:          0          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
>>>  48:         38         24  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
>>>  51:          0          1  mv88e6xxx-g2   1 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:11
>>>  52:          0          0  mv88e6xxx-g2   2 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:12
>>>  53:         38         23  mv88e6xxx-g2   3 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:13
>>>
>>> These are PHY interrupts.
>> If we come back to my trying to use the INTn pin on the esspressobin, I
>> have found that clearing and resetting the interrupt
>> enable bits in the global control register (offset 0x4) restarts link
>> detection when the device is stuck.  This suggests that the
>> INTn connection to MPP2_23 is low when the the GIC interrupt is enabled
>> on this pin.  Possibly, this is caused by the fact
>> that EEIntEn is set to 1 on reset.  INTn then goes low when EEPROM
>> loading is done.  Another possibility might be race conditions
>> in processing interrupts.
>>
>> Thoughts?
> Hi David
>
> You need active low interrupts. Without it, i think you are always
> going to have race conditions which will cause interrupts to get
> stuck/lost.
>
> I would suggest you remove the interrupt from your device tree and use
> the mv88e6xxx polling method. If i remember correctly, it currently
> polls 10 per second, so PHY link up/down is going to be 5 times faster
> on average than when phylib is polling the PHY.
Hi Andrew,

The main motivation in doing this is to try to enable the AVB interrupt
and to improve the PTP support.
I agree that polling is perfectly fine for PHY link interrupts. 
Possibly, ATU and VTU might need faster
support but I'm not using that at the moment.

I have hacked on the time stamp code quit a bit to try and improve
things but there are still issues with
lost or overwritten time stamps:

Jan 28 11:15:05 localhost kernel: [234850.840883] mv88e6085
d0032004.mdio-mii:01
: timestamp discarded
Jan 28 11:15:05 localhost ptp4l: [234852.998] port 3: received SYNC
without time
stamp

I think when PTP packets other than PDELAY are too closely spaced, we
have problems accessing
the timestamp quick enough.  Also, timestamp access is dependent on CPU
speed and HZ.

It looks like I can easily connect MPP2_23 to MPP1_16 on the edge
connector P8.  I believe the northbridge
pins support level interrupts.

In /proc/interrupts, the switch interrupts are shown as edge.  The only
place that I see where this
is potentially set is mv88e6xxx_g2_watchdog_setup() where the call to
request_threaded_irq() passes
"IRQF_ONESHOT | IRQF_TRIGGER_FALLING".  Does this need to change?

Dave

-- 
John David Anglin  dave.anglin@bell.net



^ permalink raw reply

* WARNING in strp_done (2)
From: syzbot @ 2019-01-30 18:53 UTC (permalink / raw)
  To: ast, daniel, davejwatson, davem, doronrk, kafai, linux-kernel,
	netdev, songliubraving, syzkaller-bugs, vakul.garg, yhs,
	yuehaibing

Hello,

syzbot found the following crash on:

HEAD commit:    02495e76ded5 Add linux-next specific files for 20190130
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=16a00c2f400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=a2b2e9c0bc43c14d
dashboard link: https://syzkaller.appspot.com/bug?extid=ea38a133bb90dd367b6e
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=14b059ef400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+ea38a133bb90dd367b6e@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_hsr: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): hsr_slave_1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): bridge0: link becomes ready
WARNING: CPU: 0 PID: 7927 at net/strparser/strparser.c:526  
strp_done+0xca/0xf0 net/strparser/strparser.c:526
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 7927 Comm: kworker/0:3 Not tainted 5.0.0-rc4-next-20190130 #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Workqueue: events sk_psock_destroy_deferred
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
  panic+0x2cb/0x65c kernel/panic.c:214
  __warn.cold+0x20/0x48 kernel/panic.c:571
  report_bug+0x263/0x2b0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:178 [inline]
  fixup_bug arch/x86/kernel/traps.c:173 [inline]
  do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
  do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
RIP: 0010:strp_done+0xca/0xf0 net/strparser/strparser.c:526
Code: 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 75 31 48 c7 43 18 00 00  
00 00 e8 82 bc 40 fa 5b 41 5c 41 5d 5d c3 e8 76 bc 40 fa <0f> 0b eb 81 e8  
1d 7e 85 fa e9 5c ff ff ff 4c 89 e7 e8 70 7e 85 fa
RSP: 0018:ffff88808a7a7900 EFLAGS: 00010293
RAX: ffff88808f9343c0 RBX: ffff8880a7fa9240 RCX: ffffffff87415c5a
RDX: 0000000000000000 RSI: ffffffff87415cda RDI: 0000000000000001
RBP: ffff88808a7a7918 R08: ffff88808f9343c0 R09: ffffed1015cc5b80
R10: ffffed1015cc5b7f R11: ffff8880ae62dbfb R12: 0000000000000001
R13: 1ffff110114f4f71 R14: ffff8880a7fa9200 R15: ffff88809fda3e00
  sk_psock_destroy_deferred+0x8b/0x7f0 net/core/skmsg.c:557
  process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153
  worker_thread+0x143/0x14a0 kernel/workqueue.c:2296
  kthread+0x357/0x430 kernel/kthread.c:247
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* [PATCH v3] lib/test_rhashtable: Make test_insert_dup() allocate its hash table dynamically
From: Bart Van Assche @ 2019-01-30 18:42 UTC (permalink / raw)
  To: David Miller; +Cc: tgraf, herbert, netdev, linux-kernel, Bart Van Assche

The test_insert_dup() function from lib/test_rhashtable.c passes a
pointer to a stack object to rhltable_init(). Allocate the hash table
dynamically to avoid that the following is reported with object
debugging enabled:

ODEBUG: object (ptrval) is on stack (ptrval), but NOT annotated.
WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:368 __debug_object_init+0x312/0x480
Modules linked in:
EIP: __debug_object_init+0x312/0x480
Call Trace:
 ? debug_object_init+0x1a/0x20
 ? __init_work+0x16/0x30
 ? rhashtable_init+0x1e1/0x460
 ? sched_clock_cpu+0x57/0xe0
 ? rhltable_init+0xb/0x20
 ? test_insert_dup+0x32/0x20f
 ? trace_hardirqs_on+0x38/0xf0
 ? ida_dump+0x10/0x10
 ? jhash+0x130/0x130
 ? my_hashfn+0x30/0x30
 ? test_rht_init+0x6aa/0xab4
 ? ida_dump+0x10/0x10
 ? test_rhltable+0xc5c/0xc5c
 ? do_one_initcall+0x67/0x28e
 ? trace_hardirqs_off+0x22/0xe0
 ? restore_all_kernel+0xf/0x70
 ? trace_hardirqs_on_thunk+0xc/0x10
 ? restore_all_kernel+0xf/0x70
 ? kernel_init_freeable+0x142/0x213
 ? rest_init+0x230/0x230
 ? kernel_init+0x10/0x110
 ? schedule_tail_wrapper+0x9/0xc
 ? ret_from_fork+0x19/0x24

Cc: Thomas Graf <tgraf@suug.ch>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---

Changes compared to v2: fixed build error.

Changes compared to v1: instead of modifying rhashtable_init(), modify its
   caller.

lib/test_rhashtable.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index 6a8ac7626797..e52f8cafe227 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -541,38 +541,45 @@ static unsigned int __init print_ht(struct rhltable *rhlt)
 static int __init test_insert_dup(struct test_obj_rhl *rhl_test_objects,
 				  int cnt, bool slow)
 {
-	struct rhltable rhlt;
+	struct rhltable *rhlt;
 	unsigned int i, ret;
 	const char *key;
 	int err = 0;
 
-	err = rhltable_init(&rhlt, &test_rht_params_dup);
-	if (WARN_ON(err))
+	rhlt = kmalloc(sizeof(*rhlt), GFP_KERNEL);
+	if (WARN_ON(!rhlt))
+		return -EINVAL;
+
+	err = rhltable_init(rhlt, &test_rht_params_dup);
+	if (WARN_ON(err)) {
+		kfree(rhlt);
 		return err;
+	}
 
 	for (i = 0; i < cnt; i++) {
 		rhl_test_objects[i].value.tid = i;
-		key = rht_obj(&rhlt.ht, &rhl_test_objects[i].list_node.rhead);
+		key = rht_obj(&rhlt->ht, &rhl_test_objects[i].list_node.rhead);
 		key += test_rht_params_dup.key_offset;
 
 		if (slow) {
-			err = PTR_ERR(rhashtable_insert_slow(&rhlt.ht, key,
+			err = PTR_ERR(rhashtable_insert_slow(&rhlt->ht, key,
 							     &rhl_test_objects[i].list_node.rhead));
 			if (err == -EAGAIN)
 				err = 0;
 		} else
-			err = rhltable_insert(&rhlt,
+			err = rhltable_insert(rhlt,
 					      &rhl_test_objects[i].list_node,
 					      test_rht_params_dup);
 		if (WARN(err, "error %d on element %d/%d (%s)\n", err, i, cnt, slow? "slow" : "fast"))
 			goto skip_print;
 	}
 
-	ret = print_ht(&rhlt);
+	ret = print_ht(rhlt);
 	WARN(ret != cnt, "missing rhltable elements (%d != %d, %s)\n", ret, cnt, slow? "slow" : "fast");
 
 skip_print:
-	rhltable_destroy(&rhlt);
+	rhltable_destroy(rhlt);
+	kfree(rhlt);
 
 	return 0;
 }
-- 
2.20.1.495.gaa96b0ce6b-goog


^ permalink raw reply related

* Re: [PATCH] ucc_geth: Reset BQL queue when stopping device
From: David Miller @ 2019-01-30 18:37 UTC (permalink / raw)
  To: mathias.thore
  Cc: leoyang.li, netdev, linuxppc-dev, david.gounaris,
	joakim.tjernlund
In-Reply-To: <20190128090747.15851-1-mathias.thore@infinera.com>

From: Mathias Thore <mathias.thore@infinera.com>
Date: Mon, 28 Jan 2019 10:07:47 +0100

> After a timeout event caused by for example a broadcast storm, when
> the MAC and PHY are reset, the BQL TX queue needs to be reset as
> well. Otherwise, the device will exhibit severe performance issues
> even after the storm has ended.
> 
> Co-authored-by: David Gounaris <david.gounaris@infinera.com>
> Signed-off-by: Mathias Thore <mathias.thore@infinera.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: bpf memory model. Was: [PATCH v4 bpf-next 1/9] bpf: introduce bpf_spin_lock
From: Paul E. McKenney @ 2019-01-30 18:36 UTC (permalink / raw)
  To: Will Deacon
  Cc: Alexei Starovoitov, Peter Zijlstra, Alexei Starovoitov, davem,
	daniel, jakub.kicinski, netdev, kernel-team, mingo, jannh
In-Reply-To: <20190130181100.GA18558@fuggles.cambridge.arm.com>

On Wed, Jan 30, 2019 at 06:11:00PM +0000, Will Deacon wrote:
> Hi Alexei,
> 
> On Mon, Jan 28, 2019 at 01:56:24PM -0800, Alexei Starovoitov wrote:
> > On Mon, Jan 28, 2019 at 10:24:08AM +0100, Peter Zijlstra wrote:
> > > On Fri, Jan 25, 2019 at 04:17:26PM -0800, Alexei Starovoitov wrote:
> > > > What I want to avoid is to define the whole execution ordering model upfront.
> > > > We cannot say that BPF ISA is weakly ordered like alpha.
> > > > Most of the bpf progs are written and running on x86. We shouldn't
> > > > twist bpf developer's arm by artificially relaxing memory model.
> > > > BPF memory model is equal to memory model of underlying architecture.
> > > > What we can do is to make it bpf progs a bit more portable with
> > > > smp_rmb instructions, but we must not force weak execution on the developer.
> > > 
> > > Well, I agree with only introducing bits you actually need, and my
> > > smp_rmb() example might have been poorly chosen, smp_load_acquire() /
> > > smp_store_release() might have been a far more useful example.
> > > 
> > > But I disagree with the last part; we have to pick a model now;
> > > otherwise you'll pain yourself into a corner.
> > > 
> > > Also; Alpha isn't very relevant these days; however ARM64 does seem to
> > > be gaining a lot of attention and that is very much a weak architecture.
> > > Adding strongly ordered assumptions to BPF now, will penalize them in
> > > the long run.
> > 
> > arm64 is gaining attention just like riscV is gaining it too.
> > BPF jit for arm64 is very solid, while BPF jit for riscV is being worked on.
> > BPF is not picking sides in CPU HW and ISA battles.
> 
> It's not about picking a side, it's about providing an abstraction of the
> various CPU architectures out there so that the programmer doesn't need to
> worry about where their program may run. Hell, even if you just said "eBPF
> follows x86 semantics" that would be better than saying nothing (and then we
> could have a discussion about whether x86 semantics are really what you
> want).

To reinforce this point, the Linux-kernel memory model (tools/memory-model)
is that abstraction for the Linux kernel.  Why not just use that for BPF?

							Thanx, Paul

> > Memory model is CPU HW design decision. BPF ISA cannot dictate HW design.
> > We're not saying today that BPF is strongly ordered.
> > BPF load/stores are behaving differently on x86 vs arm64.
> > We can add new instructions, but we cannot 'define' how load/stores behave
> > from memory model perspective.
> > For example, take atomicity of single byte load/store.
> > Not all archs have them atomic, but we cannot say to bpf programmers
> > to always assume non-atomic byte loads.
> 
> Hmm, I don't think this is a good approach to take for the future of eBPF.
> Assuming that a desirable property of an eBPF program is portability between
> CPU architectures, then you're effectively forcing the programmer to "assume
> the worst", where the worst is almost certainly unusable for practical
> purposes.
> 
> One easy thing you could consider would be to allow tagging of an eBPF
> program with its supported target architectures (the JIT will refuse to
> accept it for other architectures). This would at least prevent remove the
> illusion of portability and force the programmer to be explicit.
> 
> However, I think we'd much better off if we defined some basic ordering
> primitives such as relaxed and RCpc-style acquire/release operations
> (including atomic RmW), single-copy atomic memory accesses up to the native
> machine size and a full-fence instruction. If your program uses something
> that the underlying arch doesn't support, then it is rejected (e.g. 64-bit
> READ_ONCE on a 32-bit arch)
> 
> That should map straightforwardly to all modern architectures and allow for
> efficient codegen on x86 and arm64. It would probably require a bunch of new
> BPF instructions that would be defined to be atomic (you already have XADD
> as a relaxed atomic add).
> 
> Apologies if this sounds patronising, but I'm keen to help figure out the
> semantics *now* so that we don't end up having to infer them later on, which
> is the usual painful case for memory models. I suspect Peter and Paul would
> also prefer to attack it that way around. I appreciate that the temptation
> is to avoid the problem by deferring to the underlying hardware memory
> model, but I think that will create more problems than it solves and we're
> here to help you get this right.
> 
> Will
> 


^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH net] ixgbe: fix potential RX buffer starvation for AF_XDP
From: Jakub Kicinski @ 2019-01-30 18:22 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Magnus Karlsson, Network Development, Björn Töpel,
	intel-wired-lan, Magnus Karlsson
In-Reply-To: <CAJ+HfNg=vUqUszLX+C6LvYRq8J_X_UiU4x63FCaXWqdRjV5DGA@mail.gmail.com>

On Wed, 30 Jan 2019 15:16:15 +0100, Björn Töpel wrote:
> Den ons 30 jan. 2019 kl 10:35 skrev Magnus Karlsson:
> >
> > On Tue, Jan 29, 2019 at 8:15 PM Jakub Kicinski wrote: 
> > >
> > > I may not understand the problem fully, but isn't it kind of normal
> > > that if you create a ring empty you'll never receive packets?  And it
> > > should be reasonably easy to catch while writing an app from scratch
> > > (i.e. it behaves deterministically).  
> >
> > Agree that this should be the normal behavior for a NIC. The question
> > is how to get out of this situation.There are two options: punt this
> > to the application writer or fix this in the driver. I chose to fix
> > the driver since this removes complexity in the application.
> >  
> 
> Magnus' fix addresses a race/timing issue. At zero-copy initialization
> point, if the fill ring was empty, the driver (both i40e and ixgbe)
> would stop retrying to "allocate" zero-copy frames from the fill
> ring. So, frames would never be received, even if the fill ring was
> filled at a later point.
> 
> If the driver runs-dry in terms of Rx buffer if one or more frames has
> been received, the driver will retry polling the fill-ring. However at
> initialization point, if the fill-ring was empty, the driver would
> just give up and never retry.
> 
> As Magnus stated, there is no "notify the kernel that the items has
> appeared in the fill ring" (other than a HW mechanism where the tail
> pointer is a door bell) on the Rx side, so for the Intel drivers it's
> up to the driver to solve this.

Oh, I see, so its a missing piece in an otherwise very fault tolerant
implementation :)  Okay, no objection.

The nfp prototype I did simply fails the program load if there are not
enough buffers to do the initial fill, but that's just a personal
preference.  I tend to return errors in unclear situations more than
most people.

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH] iavf: Use printf instead of gnu_printf for iavf_debug_d
From: Bowers, AndrewX @ 2019-01-30 18:20 UTC (permalink / raw)
  To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	intel-wired-lan@lists.osuosl.org
In-Reply-To: <20190110042157.8445-1-natechancellor@gmail.com>

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Nathan Chancellor
> Sent: Wednesday, January 9, 2019 8:22 PM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: netdev@vger.kernel.org; Nick Desaulniers <ndesaulniers@google.com>;
> linux-kernel@vger.kernel.org; Miguel Ojeda
> <miguel.ojeda.sandonis@gmail.com>; intel-wired-lan@lists.osuosl.org;
> Nathan Chancellor <natechancellor@gmail.com>; David S. Miller
> <davem@davemloft.net>
> Subject: [Intel-wired-lan] [PATCH] iavf: Use printf instead of gnu_printf for
> iavf_debug_d
> 
> Clang warns:
> 
> In file included from drivers/net/ethernet/intel/iavf/iavf_main.c:4:
> In file included from drivers/net/ethernet/intel/iavf/iavf.h:37:
> In file included from drivers/net/ethernet/intel/iavf/iavf_type.h:8:
> drivers/net/ethernet/intel/iavf/iavf_osdep.h:49:18: warning: 'format'
> attribute argument not supported: gnu_printf [-Wignored-attributes]
>         __attribute__ ((format(gnu_printf, 3, 4)));
>                         ^
> 1 warning generated.
> 
> We can convert from gnu_printf to printf without any side effects for two
> reasons:
> 
> 1. All iavf_debug instances use standard printf formats, as pointed out
>    by Miguel Ojeda at the below link, meaning gnu_printf is not strictly
>    required.
> 
> 2. However, GCC has aliased printf to gnu_printf on Linux since at least
>    2010 based on git history.
> 
>    From gcc/c-family/c-format.c:
> 
>    /* Attributes such as "printf" are equivalent to those such as
>       "gnu_printf" unless this is overridden by a target.  */
>    static const target_ovr_attr gnu_target_overrides_format_attributes[] =
>    {
>      { "gnu_printf",   "printf" },
>      { "gnu_scanf",    "scanf" },
>      { "gnu_strftime", "strftime" },
>      { "gnu_strfmon",  "strfmon" },
>      { NULL,           NULL }
>    };
> 
> The mentioned override only happens on Windows (mingw32). Changing
> from gnu_printf to printf is a no-op for GCC and stops Clang from warning.
> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/111
> Suggested-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> ---
>  drivers/net/ethernet/intel/iavf/iavf_osdep.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply

* Re: [PATCH net 0/4] various compat ioctl fixes
From: David Miller @ 2019-01-30 18:20 UTC (permalink / raw)
  To: viro; +Cc: johannes, netdev, robert
In-Reply-To: <20190130154009.GJ2217@ZenIV.linux.org.uk>

From: Al Viro <viro@zeniv.linux.org.uk>
Date: Wed, 30 Jan 2019 15:40:09 +0000

> On Mon, Jan 28, 2019 at 10:32:30PM +0100, Johannes Berg wrote:
> 
>> At the same time, fixing all this _completely_ is not very realistic, it
>> would require passing the ifreq size through to lots of places and
>> making the user copy there take the size rather than sizeof(ifreq),
>> obviously the very least to the method decnet uses, i.e. sock->ioctl() I
>> think, but clearly that affects every other protocol too.
>> This was what my previous patch had done partially for the directly
>> handled ioctls (the revert of which is the first patch in this series).
>> 
>> > From what I can see this looks like probably the simplest way to
>> > fix this in net and -stable currently.
>> 
>> I tend to agree, at least to fix the regression.
>> 
>> We can still deliberate separately if we want to fix decnet for compat
>> or if nobody cares now. But perhaps better decnet broken (quite
>> obviously and detectably) like it basically always was, than IP broken
>> (subtly, if your struct ends up landing at the end of a page).
>> 
>> Al, care to speak up about this here?
> 
> Umm...  Short-term I don't see anything better; long-term I would really
> like to see compat_alloc_user_space()/copy_in_user() crap gone and
> copyin-copyout for anything more or less generic lifted up as far as
> cleanly possible, but let's not mix it with regression fixing.

It's a real shame, I thought it was a super clever solution to that
problem space at the time we added it.

> So for the lack of better short-term solutions,
> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
> on the series.

Ok, series applied, thanks everyone.

I'll queue this up for -stable too.

^ permalink raw reply

* [PATCH] selftests: net: fix "from" match test in fib_rule_tests.sh
From: Marcelo Henrique Cerri @ 2019-01-30 18:19 UTC (permalink / raw)
  To: David S. Miller, Shuah Khan, netdev, linux-kselftest,
	linux-kernel
  Cc: Marcelo Henrique Cerri

Fix the IPv4 address of the dummy0 interface and ensure that ip_forward
is enabled in the network space to get a valid response when checking
for routes between the gateway and other hosts.

Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
---
 tools/testing/selftests/net/fib_rule_tests.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/fib_rule_tests.sh b/tools/testing/selftests/net/fib_rule_tests.sh
index d4cfb6a7a086..552a9784e759 100755
--- a/tools/testing/selftests/net/fib_rule_tests.sh
+++ b/tools/testing/selftests/net/fib_rule_tests.sh
@@ -54,9 +54,11 @@ setup()
 
 	$IP link add dummy0 type dummy
 	$IP link set dev dummy0 up
-	$IP address add 198.51.100.1/24 dev dummy0
+	$IP address add 192.51.100.1/24 dev dummy0
 	$IP -6 address add 2001:db8:1::1/64 dev dummy0
 
+	ip netns exec testns sysctl -w net.ipv4.ip_forward=1
+
 	set +e
 }
 
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH bpf-next v4 5/7] samples/bpf: Add a "force" flag to XDP samples
From: Maciej Fijalkowski @ 2019-01-30 18:12 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: daniel, ast, netdev, jakub.kicinski
In-Reply-To: <20190129123405.787746f7@redhat.com>

On Tue, 29 Jan 2019 12:34:05 +0100
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> On Tue, 29 Jan 2019 09:00:00 +0100
> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> 
> > On Mon, 28 Jan 2019 20:16:11 +0100
> > Maciej Fijalkowski <maciejromanfijalkowski@gmail.com> wrote:
> >   
> > > From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > 
> > > Make xdp samples consistent with iproute2 behavior and set the
> > > XDP_FLAGS_UPDATE_IF_NOEXIST by default when setting the xdp program on
> > > interface. Provide an option for user to force the program loading,
> > > which as a result will not include the mentioned flag in
> > > bpf_set_link_xdp_fd call.    
> > 
> > I like the idea, but what is the error message users get after this
> > change?  
> 
> $ sudo ./xdp1 mlx5p1 &
> [1] 9768
> 
> $ sudo ./xdp1 mlx5p1 
> link set xdp fd failed
> 
> This error message is a little too generic to be a good user experience.
> The kernel (in dev_change_xdp_fd) will return errno -EBUSY (-16), but
> we don't use or report the return value in these sample programs.
> 
> If my QA see this error message, I will still get an error report
> bugzilla that I need to spend time on investigating.  Can we please
> improve this error message? 
> 
> If you are really cool you get inspired by (or use) libbpf_strerror()
> code avail in tools/lib/bpf/libbpf_errno.c.  Default strerror(EBUSY)
> will return "Device or resource busy", but maybe we can do slightly
> better and report something more meaningful for this XDP context.
> 
I'll post a v5 with libbpf_strerror() usage when bpf_set_link_xdp_fd failed in
samples but at this point it will only give us a standard "device or resource
busy" error, so if we need some more meaningful message that libbpf will give
us then I guess we need to define a new libbpf_errno entry (as well as entry in
libbpf_strerror_table for this new errno value) and set the errno in
bpf_set_link_xdp_fd in case of a failure?

^ permalink raw reply

* Re: bpf memory model. Was: [PATCH v4 bpf-next 1/9] bpf: introduce bpf_spin_lock
From: Will Deacon @ 2019-01-30 18:11 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Alexei Starovoitov, davem, daniel, jakub.kicinski,
	netdev, kernel-team, mingo, Paul McKenney, jannh
In-Reply-To: <20190128215623.6eqskzhklydhympa@ast-mbp>

Hi Alexei,

On Mon, Jan 28, 2019 at 01:56:24PM -0800, Alexei Starovoitov wrote:
> On Mon, Jan 28, 2019 at 10:24:08AM +0100, Peter Zijlstra wrote:
> > On Fri, Jan 25, 2019 at 04:17:26PM -0800, Alexei Starovoitov wrote:
> > > What I want to avoid is to define the whole execution ordering model upfront.
> > > We cannot say that BPF ISA is weakly ordered like alpha.
> > > Most of the bpf progs are written and running on x86. We shouldn't
> > > twist bpf developer's arm by artificially relaxing memory model.
> > > BPF memory model is equal to memory model of underlying architecture.
> > > What we can do is to make it bpf progs a bit more portable with
> > > smp_rmb instructions, but we must not force weak execution on the developer.
> > 
> > Well, I agree with only introducing bits you actually need, and my
> > smp_rmb() example might have been poorly chosen, smp_load_acquire() /
> > smp_store_release() might have been a far more useful example.
> > 
> > But I disagree with the last part; we have to pick a model now;
> > otherwise you'll pain yourself into a corner.
> > 
> > Also; Alpha isn't very relevant these days; however ARM64 does seem to
> > be gaining a lot of attention and that is very much a weak architecture.
> > Adding strongly ordered assumptions to BPF now, will penalize them in
> > the long run.
> 
> arm64 is gaining attention just like riscV is gaining it too.
> BPF jit for arm64 is very solid, while BPF jit for riscV is being worked on.
> BPF is not picking sides in CPU HW and ISA battles.

It's not about picking a side, it's about providing an abstraction of the
various CPU architectures out there so that the programmer doesn't need to
worry about where their program may run. Hell, even if you just said "eBPF
follows x86 semantics" that would be better than saying nothing (and then we
could have a discussion about whether x86 semantics are really what you
want).

> Memory model is CPU HW design decision. BPF ISA cannot dictate HW design.
> We're not saying today that BPF is strongly ordered.
> BPF load/stores are behaving differently on x86 vs arm64.
> We can add new instructions, but we cannot 'define' how load/stores behave
> from memory model perspective.
> For example, take atomicity of single byte load/store.
> Not all archs have them atomic, but we cannot say to bpf programmers
> to always assume non-atomic byte loads.

Hmm, I don't think this is a good approach to take for the future of eBPF.
Assuming that a desirable property of an eBPF program is portability between
CPU architectures, then you're effectively forcing the programmer to "assume
the worst", where the worst is almost certainly unusable for practical
purposes.

One easy thing you could consider would be to allow tagging of an eBPF
program with its supported target architectures (the JIT will refuse to
accept it for other architectures). This would at least prevent remove the
illusion of portability and force the programmer to be explicit.

However, I think we'd much better off if we defined some basic ordering
primitives such as relaxed and RCpc-style acquire/release operations
(including atomic RmW), single-copy atomic memory accesses up to the native
machine size and a full-fence instruction. If your program uses something
that the underlying arch doesn't support, then it is rejected (e.g. 64-bit
READ_ONCE on a 32-bit arch)

That should map straightforwardly to all modern architectures and allow for
efficient codegen on x86 and arm64. It would probably require a bunch of new
BPF instructions that would be defined to be atomic (you already have XADD
as a relaxed atomic add).

Apologies if this sounds patronising, but I'm keen to help figure out the
semantics *now* so that we don't end up having to infer them later on, which
is the usual painful case for memory models. I suspect Peter and Paul would
also prefer to attack it that way around. I appreciate that the temptation
is to avoid the problem by deferring to the underlying hardware memory
model, but I think that will create more problems than it solves and we're
here to help you get this right.

Will

^ permalink raw reply

* Re: [PATCH -next] mISDN: hfcsusb: Fix potential NULL pointer dereference
From: David Miller @ 2019-01-30 18:10 UTC (permalink / raw)
  To: yuehaibing; +Cc: isdn, gustavo, bigeasy, linux-kernel, netdev
In-Reply-To: <20190130101902.19744-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Wed, 30 Jan 2019 18:19:02 +0800

> There is a potential NULL pointer dereference in case
> kzalloc() fails and returns NULL.
> 
> Fixes: 69f52adb2d53 ("mISDN: Add HFC USB driver")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---
>  drivers/isdn/hardware/mISDN/hfcsusb.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/isdn/hardware/mISDN/hfcsusb.c b/drivers/isdn/hardware/mISDN/hfcsusb.c
> index 124ff53..5660d5a 100644
> --- a/drivers/isdn/hardware/mISDN/hfcsusb.c
> +++ b/drivers/isdn/hardware/mISDN/hfcsusb.c
> @@ -263,6 +263,8 @@ hfcsusb_ph_info(struct hfcsusb *hw)
>  	int i;
>  
>  	phi = kzalloc(struct_size(phi, bch, dch->dev.nrbchan), GFP_ATOMIC);
> +	if (!phi)
> +		return;

If we fail with an error and do not perform the operation we were requested to
make, we must return an error to the caller, and the caller must do something
reasonable with that error (perhaps return it to it's caller) and so on and
so forth.

^ permalink raw reply

* Re: [PATCH net-next 0/5] mlxsw: spectrum_acl: Include delta bits into hashtable key
From: David Miller @ 2019-01-30 18:07 UTC (permalink / raw)
  To: idosch; +Cc: netdev, jiri, mlxsw
In-Reply-To: <20190130085813.32161-1-idosch@mellanox.com>

From: Ido Schimmel <idosch@mellanox.com>
Date: Wed, 30 Jan 2019 08:58:29 +0000

> The Spectrum-2 ASIC allows multiple rules to use the same mask provided
> that the difference between their masks is small enough (up to 8
> consecutive delta bits). A more detailed explanation is provided in
> merge commit 756cd36626f7 ("Merge branch
> 'mlxsw-Introduce-algorithmic-TCAM-support'").
> 
> These delta bits are part of the rule's key and therefore rules that
> only differ in their delta bits can be inserted with the same A-TCAM
> mask. In case two rules share the same key and only differ in their
> priority, then the second will spill to the C-TCAM.
> 
> Current code does not take the delta bits into account when checking for
> duplicate rules, which leads to unnecessary spillage to the C-TCAM.
> This may result in reduced scale and performance.
> 
> Patch #1 includes the delta bits in the rule's key to avoid the above
> mentioned problem.
> 
> Patch #2 adds a tracepoint when a rule is inserted into the C-TCAM.
> 
> Patches #3-#5 add test cases to make sure unnecessary spillage into the
> C-TCAM does not occur.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH bpf-next v4 6/7] libbpf: Add a support for getting xdp prog id on ifindex
From: Maciej Fijalkowski @ 2019-01-30 17:53 UTC (permalink / raw)
  To: John Fastabend; +Cc: daniel, ast, netdev, jakub.kicinski, brouer
In-Reply-To: <a9a2cef6-9628-39de-2b05-eb3f88d8ed85@gmail.com>

On Mon, 28 Jan 2019 13:07:49 -0800
John Fastabend <john.fastabend@gmail.com> wrote:

> On 1/28/19 11:16 AM, Maciej Fijalkowski wrote:
> > From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > 
> > Since we have a dedicated netlink attributes for xdp setup on a
> > particular interface, it is now possible to retrieve the program id that
> > is currently attached to the interface. The use case is targeted for
> > sample xdp programs, which will store the program id just after loading
> > bpf program onto iface. On shutdown, the sample will make sure that it
> > can unload the program by querying again the iface and verifying that
> > both program id's matches.
> > 
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > ---  
> 
> small nit.
> 
> > +
> > +int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags)
> > +{
> > +	struct xdp_id_md xdp_id = {};
> > +	int sock, ret;
> > +	__u32 nl_pid;
> > +	__u32 mask;
> > +
> > +	if (flags & ~XDP_FLAGS_MASK)
> > +		return -EINVAL;
> > +
> > +	/* Check whether the single {HW,DRV,SKB} mode is set */
> > +	flags &= XDP_FLAGS_MODES;
> > +	mask = flags - 1;
> > +	if (flags && flags & mask)
> > +		return -EINVAL;
> > +
> > +	sock = libbpf_netlink_open(&nl_pid);
> > +	if (sock < 0)
> > +		return sock;
> > +
> > +	xdp_id.ifindex = ifindex;
> > +	xdp_id.flags = flags;
> > +
> > +	ret = libbpf_nl_get_link(sock, nl_pid, get_xdp_id, &xdp_id);
> > +	*prog_id = xdp_id.id;  
> 
> just a nit but should we really set prog_id from user if there is
> an error. Probably friendlier not to set caller data in error
> case.
>
Agree, besides checking the return value from libbpf_nl_get_link() we should
also check that we have assigned a value to xdp_id.id so that would mean we
have gone through whole get_xdp_id(). I'll post a v5 within several hours with
following check:

if (!ret && xdp_id.id)
	*prog_id = xdp_id;

Hope that makes sense.
> > +
> > +	close(sock);
> > +	return ret;
> > +}
> > +
> >  int libbpf_nl_get_link(int sock, unsigned int nl_pid,
> >  		       libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
> >  {
> >   
> 


^ permalink raw reply

* [PATCH net 2/9] net/smc: allow 16 byte pnetids in netlink policy
From: Ursula Braun @ 2019-01-30 17:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20190130175108.103221-1-ubraun@linux.ibm.com>

From: Hans Wippel <hwippel@linux.ibm.com>

Currently, users can only send pnetids with a maximum length of 15 bytes
over the SMC netlink interface although the maximum pnetid length is 16
bytes. This patch changes the SMC netlink policy to accept 16 byte
pnetids.

Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/smc_pnet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c
index 7cb3e4f07c10..632c3109dee5 100644
--- a/net/smc/smc_pnet.c
+++ b/net/smc/smc_pnet.c
@@ -27,7 +27,7 @@
 static struct nla_policy smc_pnet_policy[SMC_PNETID_MAX + 1] = {
 	[SMC_PNETID_NAME] = {
 		.type = NLA_NUL_STRING,
-		.len = SMC_MAX_PNETID_LEN - 1
+		.len = SMC_MAX_PNETID_LEN
 	},
 	[SMC_PNETID_ETHNAME] = {
 		.type = NLA_NUL_STRING,
-- 
2.16.4


^ permalink raw reply related

* [PATCH net 5/9] net/smc: recvmsg and splice_read should return 0 after shutdown
From: Ursula Braun @ 2019-01-30 17:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20190130175108.103221-1-ubraun@linux.ibm.com>

From: Karsten Graul <kgraul@linux.ibm.com>

When a socket was connected and is now shut down for read, return 0 to
indicate end of data in recvmsg and splice_read (like TCP) and do not
return ENOTCONN. This behavior is required by the socket api.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/af_smc.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index c4e56602e0c6..b04a813fc865 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1505,6 +1505,11 @@ static int smc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 
 	smc = smc_sk(sk);
 	lock_sock(sk);
+	if (sk->sk_state == SMC_CLOSED && (sk->sk_shutdown & RCV_SHUTDOWN)) {
+		/* socket was connected before, no more data to read */
+		rc = 0;
+		goto out;
+	}
 	if ((sk->sk_state == SMC_INIT) ||
 	    (sk->sk_state == SMC_LISTEN) ||
 	    (sk->sk_state == SMC_CLOSED))
@@ -1840,7 +1845,11 @@ static ssize_t smc_splice_read(struct socket *sock, loff_t *ppos,
 
 	smc = smc_sk(sk);
 	lock_sock(sk);
-
+	if (sk->sk_state == SMC_CLOSED && (sk->sk_shutdown & RCV_SHUTDOWN)) {
+		/* socket was connected before, no more data to read */
+		rc = 0;
+		goto out;
+	}
 	if (sk->sk_state == SMC_INIT ||
 	    sk->sk_state == SMC_LISTEN ||
 	    sk->sk_state == SMC_CLOSED)
-- 
2.16.4


^ permalink raw reply related

* [PATCH net 7/9] net/smc: call smc_cdc_msg_send() under send_lock
From: Ursula Braun @ 2019-01-30 17:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20190130175108.103221-1-ubraun@linux.ibm.com>

From: Karsten Graul <kgraul@linux.ibm.com>

Call smc_cdc_msg_send() under the connection send_lock to make sure all
send operations for one connection are serialized.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/smc_cdc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
index db83332ac1c8..1c5333d494e9 100644
--- a/net/smc/smc_cdc.c
+++ b/net/smc/smc_cdc.c
@@ -125,7 +125,10 @@ static int smcr_cdc_get_slot_and_msg_send(struct smc_connection *conn)
 	if (rc)
 		return rc;
 
-	return smc_cdc_msg_send(conn, wr_buf, pend);
+	spin_lock_bh(&conn->send_lock);
+	rc = smc_cdc_msg_send(conn, wr_buf, pend);
+	spin_unlock_bh(&conn->send_lock);
+	return rc;
 }
 
 int smc_cdc_get_slot_and_msg_send(struct smc_connection *conn)
-- 
2.16.4


^ permalink raw reply related

* [PATCH net 9/9] net/smc: fix use of variable in cleared area
From: Ursula Braun @ 2019-01-30 17:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20190130175108.103221-1-ubraun@linux.ibm.com>

From: Karsten Graul <kgraul@linux.ibm.com>

Do not use pend->idx as index for the arrays because its value is
located in the cleared area. Use the existing local variable instead.
Without this fix the wrong area might be cleared.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/smc_wr.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index c2694750a6a8..1dc88c32d6bb 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -218,10 +218,10 @@ int smc_wr_tx_put_slot(struct smc_link *link,
 		u32 idx = pend->idx;
 
 		/* clear the full struct smc_wr_tx_pend including .priv */
-		memset(&link->wr_tx_pends[pend->idx], 0,
-		       sizeof(link->wr_tx_pends[pend->idx]));
-		memset(&link->wr_tx_bufs[pend->idx], 0,
-		       sizeof(link->wr_tx_bufs[pend->idx]));
+		memset(&link->wr_tx_pends[idx], 0,
+		       sizeof(link->wr_tx_pends[idx]));
+		memset(&link->wr_tx_bufs[idx], 0,
+		       sizeof(link->wr_tx_bufs[idx]));
 		test_and_clear_bit(idx, link->wr_tx_mask);
 		return 1;
 	}
-- 
2.16.4


^ permalink raw reply related

* [PATCH net 8/9] net/smc: use device link provided in qp_context
From: Ursula Braun @ 2019-01-30 17:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20190130175108.103221-1-ubraun@linux.ibm.com>

From: Karsten Graul <kgraul@linux.ibm.com>

The device field of the IB event structure does not always point to the
SMC IB device. Load the pointer from the qp_context which is always
provided to smc_ib_qp_event_handler() in the priv field. And for qp
events the affected port is given in the qp structure of the ibevent,
derive it from there.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/smc_ib.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index e519ef29c0ff..76487a16934e 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -289,8 +289,8 @@ int smc_ib_create_protection_domain(struct smc_link *lnk)
 
 static void smc_ib_qp_event_handler(struct ib_event *ibevent, void *priv)
 {
-	struct smc_ib_device *smcibdev =
-		(struct smc_ib_device *)ibevent->device;
+	struct smc_link *lnk = (struct smc_link *)priv;
+	struct smc_ib_device *smcibdev = lnk->smcibdev;
 	u8 port_idx;
 
 	switch (ibevent->event) {
@@ -298,7 +298,7 @@ static void smc_ib_qp_event_handler(struct ib_event *ibevent, void *priv)
 	case IB_EVENT_GID_CHANGE:
 	case IB_EVENT_PORT_ERR:
 	case IB_EVENT_QP_ACCESS_ERR:
-		port_idx = ibevent->element.port_num - 1;
+		port_idx = ibevent->element.qp->port - 1;
 		set_bit(port_idx, &smcibdev->port_event_mask);
 		schedule_work(&smcibdev->port_event_work);
 		break;
-- 
2.16.4


^ permalink raw reply related

* [PATCH net 6/9] net/smc: do not wait under send_lock
From: Ursula Braun @ 2019-01-30 17:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20190130175108.103221-1-ubraun@linux.ibm.com>

From: Karsten Graul <kgraul@linux.ibm.com>

smc_cdc_get_free_slot() might wait for free transfer buffers when using
SMC-R. This wait should not be done under the send_lock, which is a
spin_lock. This fixes a cpu loop in parallel threads waiting for the
send_lock.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/smc_tx.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index f99951f3f7fd..36af3de731b9 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -488,25 +488,23 @@ static int smcr_tx_sndbuf_nonempty(struct smc_connection *conn)
 	struct smc_wr_buf *wr_buf;
 	int rc;
 
-	spin_lock_bh(&conn->send_lock);
 	rc = smc_cdc_get_free_slot(conn, &wr_buf, &pend);
 	if (rc < 0) {
 		if (rc == -EBUSY) {
 			struct smc_sock *smc =
 				container_of(conn, struct smc_sock, conn);
 
-			if (smc->sk.sk_err == ECONNABORTED) {
-				rc = sock_error(&smc->sk);
-				goto out_unlock;
-			}
+			if (smc->sk.sk_err == ECONNABORTED)
+				return sock_error(&smc->sk);
 			rc = 0;
 			if (conn->alert_token_local) /* connection healthy */
 				mod_delayed_work(system_wq, &conn->tx_work,
 						 SMC_TX_WORK_DELAY);
 		}
-		goto out_unlock;
+		return rc;
 	}
 
+	spin_lock_bh(&conn->send_lock);
 	if (!conn->local_tx_ctrl.prod_flags.urg_data_present) {
 		rc = smc_tx_rdma_writes(conn);
 		if (rc) {
-- 
2.16.4


^ permalink raw reply related

* [PATCH net 4/9] net/smc: don't wait for send buffer space when data was already sent
From: Ursula Braun @ 2019-01-30 17:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20190130175108.103221-1-ubraun@linux.ibm.com>

From: Karsten Graul <kgraul@linux.ibm.com>

When there is no more send buffer space and at least 1 byte was already
sent then return to user space. The wait is only done when no data was
sent by the sendmsg() call.
This fixes smc_tx_sendmsg() which tried to always send all user data and
started to wait for free send buffer space when needed. During this wait
the user space program was blocked in the sendmsg() call and hence not
able to receive incoming data. When both sides were in such a situation
then the connection stalled forever.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/smc_tx.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index d8366ed51757..f99951f3f7fd 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -165,12 +165,11 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len)
 			conn->local_tx_ctrl.prod_flags.urg_data_pending = 1;
 
 		if (!atomic_read(&conn->sndbuf_space) || conn->urg_tx_pend) {
+			if (send_done)
+				return send_done;
 			rc = smc_tx_wait(smc, msg->msg_flags);
-			if (rc) {
-				if (send_done)
-					return send_done;
+			if (rc)
 				goto out_err;
-			}
 			continue;
 		}
 
-- 
2.16.4


^ permalink raw reply related

* [PATCH net 3/9] net/smc: prevent races between smc_lgr_terminate() and smc_conn_free()
From: Ursula Braun @ 2019-01-30 17:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20190130175108.103221-1-ubraun@linux.ibm.com>

From: Karsten Graul <kgraul@linux.ibm.com>

To prevent races between smc_lgr_terminate() and smc_conn_free() add an
extra check of the lgr field before accessing it, and cancel a delayed
free_work when a new smc connection is created.
This fixes the problem that free_work cleared the lgr variable but
smc_lgr_terminate() or smc_conn_free() still access it in parallel.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/smc_core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 35c1cdc93e1c..097c798983ca 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -128,6 +128,8 @@ static void smc_lgr_unregister_conn(struct smc_connection *conn)
 {
 	struct smc_link_group *lgr = conn->lgr;
 
+	if (!lgr)
+		return;
 	write_lock_bh(&lgr->conns_lock);
 	if (conn->alert_token_local) {
 		__smc_lgr_unregister_conn(conn);
@@ -628,6 +630,8 @@ int smc_conn_create(struct smc_sock *smc, bool is_smcd, int srv_first_contact,
 			local_contact = SMC_REUSE_CONTACT;
 			conn->lgr = lgr;
 			smc_lgr_register_conn(conn); /* add smc conn to lgr */
+			if (delayed_work_pending(&lgr->free_work))
+				cancel_delayed_work(&lgr->free_work);
 			write_unlock_bh(&lgr->conns_lock);
 			break;
 		}
-- 
2.16.4


^ permalink raw reply related

* [PATCH net 1/9] net/smc: fix another sizeof to int comparison
From: Ursula Braun @ 2019-01-30 17:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20190130175108.103221-1-ubraun@linux.ibm.com>

Comparing an int to a size, which is unsigned, causes the int to become
unsigned, giving the wrong result. kernel_sendmsg can return a negative
error code.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/smc_clc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c
index 776e9dfc915d..d53fd588d1f5 100644
--- a/net/smc/smc_clc.c
+++ b/net/smc/smc_clc.c
@@ -378,7 +378,7 @@ int smc_clc_send_decline(struct smc_sock *smc, u32 peer_diag_info)
 	vec.iov_len = sizeof(struct smc_clc_msg_decline);
 	len = kernel_sendmsg(smc->clcsock, &msg, &vec, 1,
 			     sizeof(struct smc_clc_msg_decline));
-	if (len < sizeof(struct smc_clc_msg_decline))
+	if (len < 0 || len < sizeof(struct smc_clc_msg_decline))
 		len = -EPROTO;
 	return len > 0 ? 0 : len;
 }
-- 
2.16.4


^ permalink raw reply related

* [PATCH net 0/9] net/smc: fixes 2019-01-30
From: Ursula Braun @ 2019-01-30 17:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun

Dave,

here are some fixes in different areas of the smc code for the net
tree.

Thanks, Ursula

Hans Wippel (1):
  net/smc: allow 16 byte pnetids in netlink policy

Karsten Graul (7):
  net/smc: prevent races between smc_lgr_terminate() and smc_conn_free()
  net/smc: don't wait for send buffer space when data was already sent
  net/smc: recvmsg and splice_read should return 0 after shutdown
  net/smc: do not wait under send_lock
  net/smc: call smc_cdc_msg_send() under send_lock
  net/smc: use device link provided in qp_context
  net/smc: fix use of variable in cleared area

Ursula Braun (1):
  net/smc: fix another sizeof to int comparison

 net/smc/af_smc.c   | 11 ++++++++++-
 net/smc/smc_cdc.c  |  5 ++++-
 net/smc/smc_clc.c  |  2 +-
 net/smc/smc_core.c |  4 ++++
 net/smc/smc_ib.c   |  6 +++---
 net/smc/smc_pnet.c |  2 +-
 net/smc/smc_tx.c   | 17 +++++++----------
 net/smc/smc_wr.c   |  8 ++++----
 8 files changed, 34 insertions(+), 21 deletions(-)

-- 
2.16.4


^ permalink raw reply

* RE: [PATCH] arm64: dts: lx2160aqds: Add mdio mux nodes
From: Leo Li @ 2019-01-30 17:48 UTC (permalink / raw)
  To: Pankaj Bansal, Shawn Guo, Andrew Lunn, Florian Fainelli
  Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org
In-Reply-To: <VI1PR0401MB24968C7FA8CE3CEB291FDC94F1900@VI1PR0401MB2496.eurprd04.prod.outlook.com>



> -----Original Message-----
> From: Pankaj Bansal
> Sent: Wednesday, January 30, 2019 4:37 AM
> To: Pankaj Bansal <pankaj.bansal@nxp.com>; Shawn Guo
> <shawnguo@kernel.org>; Leo Li <leoyang.li@nxp.com>; Andrew Lunn
> <andrew@lunn.ch>; Florian Fainelli <f.fainelli@gmail.com>
> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> Subject: RE: [PATCH] arm64: dts: lx2160aqds: Add mdio mux nodes
> 
> HI Shawn/Leo,
> 
> Can you please review this patch and include it in your tree ?
> 
> Regards,
> Pankaj Bansal
> 
> -----Original Message-----
> From: Pankaj Bansal [mailto:pankaj.bansal@nxp.com]
> Sent: Thursday, 15 November, 2018 05:42 PM
> To: Shawn Guo <shawnguo@kernel.org>; Leo Li <leoyang.li@nxp.com>;
> Andrew Lunn <andrew@lunn.ch>; Florian Fainelli <f.fainelli@gmail.com>
> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; Pankaj
> Bansal <pankaj.bansal@nxp.com>
> Subject: [PATCH] arm64: dts: lx2160aqds: Add mdio mux nodes
> 
> The two external MDIO buses used to communicate with phy devices that
> are external to SOC are muxed in LX2160AQDS board.
> 
> These buses can be routed to any one of the eight IO slots on LX2160AQDS
> board depending on value in fpga register 0x54.
> 
> Additionally the external MDIO1 is used to communicate to the onboard
> RGMII phy devices.
> 
> The mdio1 is controlled by bits 4-7 of fpga register and mdio2 is controlled by
> bits 0-3 of fpga register.
> 
> Signed-off-by: Pankaj Bansal <pankaj.bansal@nxp.com>
> ---
> 
> Notes:
>     This patch depends on following patches:
>     [1]https://patchwork.kernel.org/cover/10658863/
>     [2]https://patchwork.codeaurora.org/patch/637861/
> 
>  .../boot/dts/freescale/fsl-lx2160a-qds.dts   | 116 +++++++++++++++++
>  .../boot/dts/freescale/fsl-lx2160a.dtsi      |  23 ++++
>  2 files changed, 139 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> b/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> index 8a0305a2b778..39aa2731ddfa 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> @@ -54,6 +54,121 @@
>  &i2c0 {
>  	status = "okay";
> 
> +	fpga@66 {
> +		compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c";
> +		reg = <0x66>;
> +		#address-cells = <1>;
> +		#size-cells = <0>;
> +
> +		mdio-mux-1@54 {

No compatible for this node?  What is the binding used with this node?

> +			mdio-parent-bus = <&emdio1>;
> +			reg = <0x54>;		 /* BRDCFG4 */
> +			mux-mask = <0xf8>;      /* EMI1_MDIO */
> +			#address-cells=<1>;
> +			#size-cells = <0>;
> +
> +			mdio@0 {
> +				reg = <0x00>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@40 {
> +				reg = <0x40>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@c0 {
> +				reg = <0xc0>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@c8 {
> +				reg = <0xc8>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@d0 {
> +				reg = <0xd0>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@d8 {
> +				reg = <0xd8>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@e0 {
> +				reg = <0xe0>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@e8 {
> +				reg = <0xe8>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@f0 {
> +				reg = <0xf0>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@f8 {
> +				reg = <0xf8>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +		};
> +
> +		mdio-mux-2@54 {

Same comment as the previous one.

> +			mdio-parent-bus = <&emdio2>;
> +			reg = <0x54>;		 /* BRDCFG4 */
> +			mux-mask = <0x07>;      /* EMI2_MDIO */
> +			#address-cells=<1>;
> +			#size-cells = <0>;
> +
> +			mdio@0 {
> +				reg = <0x00>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@1 {
> +				reg = <0x01>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@2 {
> +				reg = <0x02>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@3 {
> +				reg = <0x03>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@4 {
> +				reg = <0x04>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@5 {
> +				reg = <0x05>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@6 {
> +				reg = <0x06>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@7 {
> +				reg = <0x07>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +		};
> +	};
> +
>  	i2c-mux@77 {
>  		compatible = "nxp,pca9547";
>  		reg = <0x77>;
> @@ -118,3 +233,4 @@
>  &usb1 {
>  	status = "okay";
>  };
> +

No need to add this new line.

> diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> index 6ce0677c3096..518882b05f03 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> @@ -780,5 +780,28 @@
>  				     <GIC_SPI 209 IRQ_TYPE_LEVEL_HIGH>;
>  			dma-coherent;
>  		};
> +		/* TODO: WRIOP (CCSR?) */

What is this TODO?  Can we address it now?

> +		/* WRIOP0: 0x8B8_0000, E-MDIO1: 0x1_6000 */
> +		emdio1: mdio@0x8B96000 {
> +			compatible = "fsl,fman-memac-mdio";

This node doesn't actually match the binding Documentation/devicetree/bindings/net/fsl-fman.txt

> +			reg = <0x0 0x8B96000 0x0 0x1000>;
> +			device_type = "mdio";	/* TODO: is this necessary? */

Not needed

> +			little-endian;	/* force the driver in LE mode */
> +
> +			/* Not necessary on the QDS, but needed on the
> RDB*/

Why?  And we shouldn't discuss boards in the soc dtsi file.

> +			#address-cells = <1>;
> +			#size-cells = <0>;
> +		};
> +		/* WRIOP0: 0x8B8_0000, E-MDIO2: 0x1_7000 */
> +		emdio2: mdio@0x8B97000 {
> +			compatible = "fsl,fman-memac-mdio";
> +			reg = <0x0 0x8B97000 0x0 0x1000>;
> +			device_type = "mdio";	/* TODO: is this necessary? */
> +			little-endian;	/* force the driver in LE mode */
> +
> +			#address-cells = <1>;
> +			#size-cells = <0>;
> +		};
>  	};
>  };
> +
> --
> 2.17.1


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox