* Re: [PATCH net-next] ipv6: Do not use this_cpu_ptr() in preemptible context
From: Eric Dumazet @ 2017-10-08 16:03 UTC (permalink / raw)
To: Ido Schimmel; +Cc: netdev, davem, weiwan, mlxsw
In-Reply-To: <20171008151839.24519-1-idosch@mellanox.com>
On Sun, 2017-10-08 at 18:18 +0300, Ido Schimmel wrote:
> Without the rwlock and with PREEMPT_RCU we're no longer guaranteed to be
> in non-preemptible context when performing a route lookup, so use
> raw_cpu_ptr() instead.
>
> Takes care of the following splat:
> [ 122.221814] BUG: using smp_processor_id() in preemptible [00000000] code: sshd/2672
> [ 122.221845] caller is debug_smp_processor_id+0x17/0x20
> [ 122.221866] CPU: 0 PID: 2672 Comm: sshd Not tainted 4.14.0-rc3-idosch-next-custom #639
> [ 122.221880] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
> [ 122.221893] Call Trace:
> [ 122.221919] dump_stack+0xb1/0x10c
> [ 122.221946] ? _atomic_dec_and_lock+0x124/0x124
> [ 122.221974] ? ___ratelimit+0xfe/0x240
> [ 122.222020] check_preemption_disabled+0x173/0x1b0
> [ 122.222060] debug_smp_processor_id+0x17/0x20
> [ 122.222083] ip6_pol_route+0x1482/0x24a0
> ...
>
> Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
Thanks Ido for this patch.
IMO, we no longer play this read_lock() -> write_lock() game since
ip6_dst_gc() could be called from rt6_make_pcpu_route()
So we might simplify things quite a bit, by blocking BH (and thus
preventing preemption)
Something like :
net/ipv6/route.c | 26 ++++++--------------------
1 file changed, 6 insertions(+), 20 deletions(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 399d1bceec4a6e6736c367e706dd2acbd4093d58..606e80325b21c0e10a02e9c7d5b3fcfbfc26a003 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1136,15 +1136,7 @@ static struct rt6_info *rt6_make_pcpu_route(struct rt6_info *rt)
dst_hold(&pcpu_rt->dst);
p = this_cpu_ptr(rt->rt6i_pcpu);
prev = cmpxchg(p, NULL, pcpu_rt);
- if (prev) {
- /* If someone did it before us, return prev instead */
- /* release refcnt taken by ip6_rt_pcpu_alloc() */
- dst_release_immediate(&pcpu_rt->dst);
- /* release refcnt taken by above dst_hold() */
- dst_release_immediate(&pcpu_rt->dst);
- dst_hold(&prev->dst);
- pcpu_rt = prev;
- }
+ BUG_ON(prev);
rt6_dst_from_metrics_check(pcpu_rt);
return pcpu_rt;
@@ -1739,31 +1731,25 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
struct rt6_info *pcpu_rt;
dst_use_noref(&rt->dst, jiffies);
+ local_bh_disable();
pcpu_rt = rt6_get_pcpu_route(rt);
- if (pcpu_rt) {
- rcu_read_unlock();
- } else {
+ if (!pcpu_rt) {
/* atomic_inc_not_zero() is needed when using rcu */
if (atomic_inc_not_zero(&rt->rt6i_ref)) {
- /* We have to do the read_unlock first
- * because rt6_make_pcpu_route() may trigger
- * ip6_dst_gc() which will take the write_lock.
- *
- * No dst_hold() on rt is needed because grabbing
+ /* No dst_hold() on rt is needed because grabbing
* rt->rt6i_ref makes sure rt can't be released.
*/
- rcu_read_unlock();
pcpu_rt = rt6_make_pcpu_route(rt);
rt6_release(rt);
} else {
/* rt is already removed from tree */
- rcu_read_unlock();
pcpu_rt = net->ipv6.ip6_null_entry;
dst_hold(&pcpu_rt->dst);
}
}
-
+ local_bh_enable();
+ rcu_read_unlock();
trace_fib6_table_lookup(net, pcpu_rt, table->tb6_id, fl6);
return pcpu_rt;
}
^ permalink raw reply related
* Re: [PATCH net-next RFC 0/9] net: dsa: PTP timestamping for mv88e6xxx
From: Richard Cochran @ 2017-10-08 15:38 UTC (permalink / raw)
To: Brandon Streiff
Cc: netdev, linux-kernel, David S. Miller, Florian Fainelli,
Andrew Lunn, Vivien Didelot, Erik Hons
In-Reply-To: <20170929094323.cwh2ubv4odknlyot@localhost>
On Fri, Sep 29, 2017 at 05:43:23AM -0400, Richard Cochran wrote:
> I happy to see this series. I just finished porting an out-of-tree
> PHC driver for the Marvell mv88e635x, and I want to mainline it, but I
> also have a few uglies.
This series looks really good. I won't even post my mine, as that
would now be too embarrassing.
I will try to get my hands on some HW, perhaps by the end of October,
in order to test and complete your driver...
Thanks,
Richard
^ permalink raw reply
* Re: [PATCH net-next RFC 8/9] net: dsa: mv88e6xxx: add rx/tx timestamping support
From: Richard Cochran @ 2017-10-08 15:29 UTC (permalink / raw)
To: Brandon Streiff
Cc: netdev, linux-kernel, David S. Miller, Florian Fainelli,
Andrew Lunn, Vivien Didelot, Erik Hons
In-Reply-To: <1506612341-18061-9-git-send-email-brandon.streiff@ni.com>
On Thu, Sep 28, 2017 at 10:25:40AM -0500, Brandon Streiff wrote:
> +void mv88e6xxx_port_txtstamp(struct dsa_switch *ds, int port,
> + struct sk_buff *clone, unsigned int type)
> +{
> + struct mv88e6xxx_chip *chip = ds->priv;
> + struct mv88e6xxx_port_hwtstamp *ps = &chip->port_hwtstamp[port];
> +
> + if (!chip->info->ptp_support)
> + return;
> +
> + if (port < 0 || port >= mv88e6xxx_num_ports(chip))
> + goto out;
> +
> + if (unlikely(skb_shinfo(clone)->tx_flags & SKBTX_HW_TSTAMP) &&
> + mv88e6xxx_should_tstamp(chip, port, clone, type)) {
> + __be16 *seq_ptr = (__be16 *)(_get_ptp_header(clone, type) +
> + OFF_PTP_SEQUENCE_ID);
> +
> + if (!test_and_set_bit_lock(MV88E6XXX_HWTSTAMP_TX_IN_PROGRESS,
> + &ps->state)) {
> + ps->tx_skb = clone;
> + ps->tx_tstamp_start = jiffies;
> + ps->tx_seq_id = be16_to_cpup(seq_ptr);
> +
> + /* Fetching the timestamp is high-priority work because
> + * 802.1AS bounds the time for a response.
Can you please use this?
commit d9535cb7b7603aeb549c697ecdf92024e4d0a650
Author: Grygorii Strashko <grygorii.strashko@ti.com>
Date: Fri Jul 28 17:30:02 2017 -0500
ptp: introduce ptp auxiliary worker
Many PTP drivers required to perform some asynchronous or periodic work,
like periodically handling PHC counter overflow or handle delayed timestamp
for RX/TX network packets. In most of the cases, such work is implemented
using workqueues. Unfortunately, Kernel workqueues might introduce
significant delay in work scheduling under high system load and on -RT,
which could cause misbehavior of PTP drivers due to internal counter
overflow, for example, and there is no way to tune its execution policy and
priority manuallly.
Hence, The kthread_worker can be used insted of workqueues, as it create
separte named kthread for each worker and its its execution policy and
priority can be configured using chrt tool.
> + * No need to check result of queue_work(). ps->tx_skb
> + * check ensures work item is not pending (it may be
> + * waiting to exit)
> + */
> + queue_work(system_highpri_wq, &ps->tx_tstamp_work);
> + return;
> + }
> +
> + /* Otherwise we're already in progress... */
> + dev_dbg(chip->dev,
> + "p%d: tx timestamp already in progress, discarding",
> + port);
> + }
> +
> +out:
> + /* We don't need it after all. */
> + kfree_skb(clone);
> +}
Thanks,
Richard
^ permalink raw reply
* [PATCH net-next] ipv6: Do not use this_cpu_ptr() in preemptible context
From: Ido Schimmel @ 2017-10-08 15:18 UTC (permalink / raw)
To: netdev; +Cc: davem, weiwan, mlxsw, Ido Schimmel
Without the rwlock and with PREEMPT_RCU we're no longer guaranteed to be
in non-preemptible context when performing a route lookup, so use
raw_cpu_ptr() instead.
Takes care of the following splat:
[ 122.221814] BUG: using smp_processor_id() in preemptible [00000000] code: sshd/2672
[ 122.221845] caller is debug_smp_processor_id+0x17/0x20
[ 122.221866] CPU: 0 PID: 2672 Comm: sshd Not tainted 4.14.0-rc3-idosch-next-custom #639
[ 122.221880] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[ 122.221893] Call Trace:
[ 122.221919] dump_stack+0xb1/0x10c
[ 122.221946] ? _atomic_dec_and_lock+0x124/0x124
[ 122.221974] ? ___ratelimit+0xfe/0x240
[ 122.222020] check_preemption_disabled+0x173/0x1b0
[ 122.222060] debug_smp_processor_id+0x17/0x20
[ 122.222083] ip6_pol_route+0x1482/0x24a0
...
Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
net/ipv6/route.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 399d1bceec4a..579d4b73beb1 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1112,7 +1112,7 @@ static struct rt6_info *rt6_get_pcpu_route(struct rt6_info *rt)
{
struct rt6_info *pcpu_rt, **p;
- p = this_cpu_ptr(rt->rt6i_pcpu);
+ p = raw_cpu_ptr(rt->rt6i_pcpu);
pcpu_rt = *p;
if (pcpu_rt && ip6_hold_safe(NULL, &pcpu_rt, false))
@@ -1134,7 +1134,7 @@ static struct rt6_info *rt6_make_pcpu_route(struct rt6_info *rt)
}
dst_hold(&pcpu_rt->dst);
- p = this_cpu_ptr(rt->rt6i_pcpu);
+ p = raw_cpu_ptr(rt->rt6i_pcpu);
prev = cmpxchg(p, NULL, pcpu_rt);
if (prev) {
/* If someone did it before us, return prev instead */
--
2.13.6
^ permalink raw reply related
* [BUG] stmmac: A possible sleep-in-atomic bug in stmmac_suspend
From: Jia-Ju Bai @ 2017-10-08 15:13 UTC (permalink / raw)
To: peppe.cavallaro, alexandre.torgue, davem
Cc: netdev, Linux Kernel Mailing List
According to stmmac_main.c, the driver may sleep under a spinlock,
and the function call path is:
stmmac_suspend (acquire the spinlock)
stmmac_disable_all_queues
napi_disable
might_sleep --> may sleep
msleep --> may sleep
This bug is found by my static analysis tool and my code review.
Thanks,
Jia-Ju Bai
^ permalink raw reply
* Re: [PATCH net-next RFC 8/9] net: dsa: mv88e6xxx: add rx/tx timestamping support
From: Richard Cochran @ 2017-10-08 15:12 UTC (permalink / raw)
To: Brandon Streiff
Cc: netdev, linux-kernel, David S. Miller, Florian Fainelli,
Andrew Lunn, Vivien Didelot, Erik Hons
In-Reply-To: <1506612341-18061-9-git-send-email-brandon.streiff@ni.com>
On Thu, Sep 28, 2017 at 10:25:40AM -0500, Brandon Streiff wrote:
> We also utilize a feature of the "generation 3" PTP hardware that lets
> us to embed the timestamp value into one of the reserved fields in the
> PTP header. This lets us extract the timestamp out of the header and
> avoid an SMI access in the RX codepath. (This implementation does not
> presently support the older generations.)
That is fine for the later models, but we really need the code to read
over MDIO as well. You added .ptp_support = true for those older
switches, and so the present series won't work.
If it helps, maybe I can adapt the relevant code from my driver to
your work.
Thanks,
Richard
^ permalink raw reply
* Re: [PATCH 41/47] netfilter: convert hook list to an array
From: Tariq Toukan @ 2017-10-08 15:07 UTC (permalink / raw)
To: Pablo Neira Ayuso, netfilter-devel, Aaron Conole,
Florian Westphal
Cc: davem, netdev
In-Reply-To: <1504478574-13281-6-git-send-email-pablo@netfilter.org>
On 04/09/2017 1:42 AM, Pablo Neira Ayuso wrote:
> From: Aaron Conole <aconole@bytheb.org>
>
> This converts the storage and layout of netfilter hook entries from a
> linked list to an array. After this commit, hook entries will be
> stored adjacent in memory. The next pointer is no longer required.
>
> The ops pointers are stored at the end of the array as they are only
> used in the register/unregister path and in the legacy br_netfilter code.
>
> nf_unregister_net_hooks() is slower than needed as it just calls
> nf_unregister_net_hook in a loop (i.e. at least n synchronize_net()
> calls), this will be addressed in followup patch.
>
> Test setup:
> - ixgbe 10gbit
> - netperf UDP_STREAM, 64 byte packets
> - 5 hooks: (raw + mangle prerouting, mangle+filter input, inet filter):
> empty mangle and raw prerouting, mangle and filter input hooks:
> 353.9
> this patch:
> 364.2
>
> Signed-off-by: Aaron Conole <aconole@bytheb.org>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
Hi,
We experience a regression in server with iommu enabled.
After installing kernel and rebooting the server, it crashes during boot.
Please see trace below.
Bisecting points to this patch.
Any idea what's wrong?
Regards,
Tariq Toukan
[ 25.590816] RIP: 0010:_raw_read_lock_bh+0x15/0x40
[ 25.596160] RSP: 0018:ffffc90007db77a0 EFLAGS: 00010286
[ 25.602089] RAX: 0000000000000100 RBX: 0000000000000003 RCX:
0000000000000000
[ 25.610152] RDX: 0000000000000000 RSI: ffffc90007db7898 RDI:
000000000000003c
[ 25.618470] RBP: ffffc90007db7840 R08: 0000000000000001 R09:
0000000087c10eef
[ 25.626786] R10: ffff88180f21f040 R11: ffffea005feeaf00 R12:
0000000000000000
[ 25.635103] R13: ffffc90007db7898 R14: ffff8817fbabdc00 R15:
ffff8817fbabdc00
[ 25.643421] FS: 00007fcdb7771740(0000) GS:ffff88180f200000(0000)
knlGS:0000000000000000
[ 25.653056] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 25.659818] CR2: 000000000000003c CR3: 0000001809ae0001 CR4:
00000000001606e0
[ 25.668136] Call Trace:
[ 25.671215] ? ebt_do_table+0x3d/0x6e8 [ebtables]
[ 25.676817] ebt_nat_out+0x1f/0x30 [ebtable_nat]
[ 25.682326] nf_hook_slow+0x3c/0xb0
[ 25.686576] __br_forward+0xb1/0x1b0 [bridge]
[ 25.691786] ? br_dev_queue_push_xmit+0x170/0x170 [bridge]
[ 25.704333] br_flood+0x130/0x1b0 [bridge]
[ 25.709254] br_dev_xmit+0x1e5/0x2a0 [bridge]
[ 25.714468] dev_hard_start_xmit+0xa1/0x210
[ 25.719485] __dev_queue_xmit+0x4f6/0x610
[ 25.724304] dev_queue_xmit+0x10/0x20
[ 25.728739] ip_finish_output2+0x233/0x320
[ 25.733656] ip_finish_output+0x12a/0x1d0
[ 25.738474] ? netif_rx_ni+0x33/0x80
[ 25.742805] ip_mc_output+0x84/0x250
[ 25.747140] ip_local_out+0x35/0x40
[ 25.751377] ip_send_skb+0x19/0x40
[ 25.755583] udp_send_skb+0x172/0x280
[ 25.760013] udp_sendmsg+0x2c0/0xa30
[ 25.764348] ? ip_reply_glue_bits+0x50/0x50
[ 25.769366] ? import_iovec+0x2c/0xc0
[ 25.773801] inet_sendmsg+0x31/0xb0
[ 25.778042] sock_sendmsg+0x38/0x50
[ 25.782276] ___sys_sendmsg+0x25c/0x270
[ 25.786904] ? file_update_time+0x3a/0xf0
[ 25.791727] ? __wake_up_sync_key+0x50/0x60
[ 25.796741] ? pipe_write+0x3cc/0x420
[ 25.801175] ? __vfs_write+0xd0/0x130
[ 25.805608] __sys_sendmsg+0x45/0x80
[ 25.809938] SyS_sendmsg+0x12/0x20
[ 25.814077] entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 25.819577] RIP: 0033:0x7fcdb64ac7a0
[ 25.823908] RSP: 002b:00007ffe2b98cb98 EFLAGS: 00000246 ORIG_RAX:
000000000000002e
[ 25.832961] RAX: ffffffffffffffda RBX: 00007ffe2b98c630 RCX:
00007fcdb64ac7a0
[ 25.841270] RDX: 0000000000000000 RSI: 00007ffe2b98cc50 RDI:
000000000000000c
[ 25.849583] RBP: 00007fcdb69018f8 R08: 00007ffe2b98cbc3 R09:
0000000000000004
[ 25.857901] R10: 0000000000000019 R11: 0000000000000246 R12:
0000000000000000
[ 25.866213] R13: 0000000000000000 R14: 00007ffe2b98c6c0 R15:
00007ffe2b98c6e0
[ 25.874520] Code: 55 48 89 e5 e8 bd 74 82 ff 5d c3 66 66 2e 0f 1f 84
00 00 00 00 00 0f 1f 44 00 00 65 81 05 68 78 74 7e 00 02 00 00 b8 00 01
00 00 <f0> 0f c1 07 8d b0 00 01 00 00 40 84
[ 25.896497] RIP: _raw_read_lock_bh+0x15/0x40 RSP: ffffc90007db77a0
[ 25.903744] CR2: 000000000000003c
[ 25.907808] ---[ end trace 4f824a5c467b1872 ]---
[ 25.907811] BUG: unable to handle kernel NULL pointer dereference at
000000000000003c
[ 25.907828] IP: _raw_read_lock_bh+0x15/0x40
[ 25.907830] PGD 0 P4D 0
[ 25.907834] Oops: 0002 [#2] SMP
[ 25.907836] Modules linked in: ebtable_nat(+) ebtables ib_ucm mlx4_en
mlx4_ib rpcrdma mlx4_core rdma_ucm ib_uverbs ib_iser ib_umad rdma_cm
ib_ipoib iw_cm ib_cm mlx5_ib bridge stp llc sge
[ 25.907895] CPU: 12 PID: 0 Comm: swapper/12 Tainted: G D
4.13.0-for-linust-perf-2017-09-10_06-48-01-64 #1
[ 25.907896] Hardware name: Dell Inc. PowerEdge R720/0HJK12, BIOS
2.2.3 05/20/2014
[ 25.907898] task: ffff880c0c2f8000 task.stack: ffffc90006318000
[ 25.907901] RIP: 0010:_raw_read_lock_bh+0x15/0x40
[ 25.907902] RSP: 0018:ffff880c0f9839d0 EFLAGS: 00010286
[ 25.907904] RAX: 0000000000000100 RBX: 0000000000000003 RCX:
0000000000000000
[ 25.907905] RDX: 0000000000000000 RSI: ffff880c0f983ac8 RDI:
000000000000003c
[ 25.907906] RBP: ffff880c0f983a70 R08: 0000000000000001 R09:
0000000000000000
[ 25.907907] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[ 25.907909] R13: ffff880c0f983ac8 R14: ffff880bfcfdda00 R15:
ffff880bfcfdda00
[ 25.907911] FS: 0000000000000000(0000) GS:ffff880c0f980000(0000)
knlGS:0000000000000000
[ 25.907912] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 25.907913] CR2: 000000000000003c CR3: 0000001809a5e001 CR4:
00000000001606e0
[ 25.907915] Call Trace:
[ 25.907918] <IRQ>
[ 25.907925] ? ebt_do_table+0x3d/0x6e8 [ebtables]
[ 25.907929] ? lock_timer_base+0x7d/0xa0
[ 25.907932] ? mod_timer+0xa9/0x2c0
[ 25.907937] ebt_nat_out+0x1f/0x30 [ebtable_nat]
[ 25.907946] nf_hook_slow+0x3c/0xb0
[ 25.907958] __br_forward+0xb1/0x1b0 [bridge]
[ 25.907966] ? br_dev_queue_push_xmit+0x170/0x170 [bridge]
[ 25.907972] br_flood+0x130/0x1b0 [bridge]
[ 25.907979] br_dev_xmit+0x1e5/0x2a0 [bridge]
[ 25.907987] dev_hard_start_xmit+0xa1/0x210
[ 25.907990] __dev_queue_xmit+0x4f6/0x610
[ 25.907993] ? _raw_read_unlock_bh+0x20/0x30
[ 25.907996] dev_queue_xmit+0x10/0x20
[ 25.908001] ip6_finish_output2+0x3b5/0x4c0
[ 25.908005] ip6_finish_output+0xa5/0x100
[ 25.908007] ip6_output+0x5b/0xf0
[ 25.908012] NF_HOOK.constprop.43+0x30/0x90
[ 25.908015] ? icmp6_dst_alloc+0xd2/0x110
[ 25.908018] mld_sendpack+0x168/0x220
[ 25.908021] mld_ifc_timer_expire+0x17f/0x290
[ 25.908024] ? mld_dad_timer_expire+0x60/0x60
[ 25.908026] call_timer_fn+0x35/0x140
[ 25.908028] run_timer_softirq+0x1ce/0x410
[ 25.908031] ? timerqueue_add+0x59/0x90
[ 25.908036] ? sched_clock+0x9/0x10
[ 25.908039] ? sched_clock_cpu+0x11/0xb0
[ 25.908042] __do_softirq+0xd1/0x27f
[ 25.908046] irq_exit+0xb5/0xc0
[ 25.908048] smp_apic_timer_interrupt+0x69/0x130
[ 25.908050] apic_timer_interrupt+0x93/0xa0
[ 25.908052] </IRQ>
[ 25.908056] RIP: 0010:cpuidle_enter_state+0xe9/0x280
[ 25.908057] RSP: 0018:ffffc9000631be88 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff10
[ 25.908059] RAX: ffff880c0f99bdc0 RBX: ffffe8f400180270 RCX:
000000000000001f
[ 25.908060] RDX: 0000000000000000 RSI: ffff7761f8923d16 RDI:
0000000000000000
[ 25.908061] RBP: ffffc9000631bec0 R08: 00000000000002a1 R09:
0000000000000390
[ 25.908062] R10: 000000000000037e R11: 0000000000000018 R12:
0000000000000004
[ 25.908063] R13: 000000000000000c R14: ffffe8f400180270 R15:
00000005f7b4d9b4
[ 25.908068] ? cpuidle_enter_state+0xc5/0x280
[ 25.908071] cpuidle_enter+0x17/0x20
[ 25.908074] call_cpuidle+0x23/0x40
[ 25.908077] do_idle+0x172/0x1e0
[ 25.908079] cpu_startup_entry+0x1d/0x30
[ 25.908084] start_secondary+0x103/0x130
[ 25.908087] secondary_startup_64+0xa5/0xa5
[ 25.908089] Code: 55 48 89 e5 e8 bd 74 82 ff 5d c3 66 66 2e 0f 1f 84
00 00 00 00 00 0f 1f 44 00 00 65 81 05 68 78 74 7e 00 02 00 00 b8 00 01
00 00 <f0> 0f c1 07 8d b0 00 01 00 00 40 84
[ 25.908124] RIP: _raw_read_lock_bh+0x15/0x40 RSP: ffff880c0f9839d0
[ 25.908124] CR2: 000000000000003c
[ 25.908154] ---[ end trace 4f824a5c467b1873 ]---
[ 25.913089] Kernel panic - not syncing: Fatal exception in interrupt
[ 26.964216] Shutting down cpus with NMI
[ 26.968841] Kernel Offset: disabled
[ 26.975644] ---[ end Kernel panic - not syncing: Fatal exception in
interrupt
^ permalink raw reply
* Re: [PATCH net-next RFC 4/9] net: dsa: mv88e6xxx: add support for event capture
From: Richard Cochran @ 2017-10-08 15:06 UTC (permalink / raw)
To: Brandon Streiff
Cc: netdev, linux-kernel, David S. Miller, Florian Fainelli,
Andrew Lunn, Vivien Didelot, Erik Hons
In-Reply-To: <1506612341-18061-5-git-send-email-brandon.streiff@ni.com>
There are some issues here.
On Thu, Sep 28, 2017 at 10:25:36AM -0500, Brandon Streiff wrote:
> +static int mv88e6xxx_config_periodic_trig(struct mv88e6xxx_chip *chip,
> + u32 ns, u16 picos)
> +{
> + int err;
> + u16 global_config;
> +
> + if (picos >= 1000)
> + return -ERANGE;
> +
> + /* TRIG generation is in units of 8 ns clock periods. Convert ns
> + * and ps into 8 ns clock periods and up to 8000 additional ps
> + */
> + picos += (ns & 0x7) * 1000;
> + ns = ns >> 3;
Again, the 8 nanosecounds shouldn't be hard coded.
...
> + return err;
> +}
> +static void mv88e6xxx_tai_event_work(struct work_struct *ugly)
> +{
> + struct delayed_work *dw = to_delayed_work(ugly);
> + struct mv88e6xxx_chip *chip =
> + container_of(dw, struct mv88e6xxx_chip, tai_event_work);
> + u16 ev_status[4];
> + int err;
> +
> + mutex_lock(&chip->reg_lock);
> +
> + err = mv88e6xxx_tai_read(chip, MV88E6XXX_TAI_EVENT_STATUS,
> + ev_status, ARRAY_SIZE(ev_status));
> + if (err) {
> + mutex_unlock(&chip->reg_lock);
> + return;
> + }
> +
> + if (ev_status[0] & MV88E6XXX_TAI_EVENT_STATUS_ERROR)
> + dev_warn(chip->dev, "missed event capture\n");
> +
> + if (ev_status[0] & MV88E6XXX_TAI_EVENT_STATUS_VALID) {
Avoid IfOk.
> + struct ptp_clock_event ev;
> + u32 raw_ts = ((u32)ev_status[2] << 16) | ev_status[1];
> +
> + /* Clear the valid bit so the next timestamp can come in */
> + ev_status[0] &= ~MV88E6XXX_TAI_EVENT_STATUS_VALID;
> + err = mv88e6xxx_tai_write(chip, MV88E6XXX_TAI_EVENT_STATUS,
> + ev_status[0]);
> +
> + if (ev_status[0] & MV88E6XXX_TAI_EVENT_STATUS_CAP_TRIG) {
> + /* TAI is configured to timestamp internal events.
> + * This will be a PPS event.
> + */
> + ev.type = PTP_CLOCK_PPS;
> + } else {
> + /* Otherwise this is an external timestamp */
> + ev.type = PTP_CLOCK_EXTTS;
> + }
> + /* We only have one timestamping channel. */
> + ev.index = 0;
> + ev.timestamp = timecounter_cyc2time(&chip->tstamp_tc, raw_ts);
> +
> + ptp_clock_event(chip->ptp_clock, &ev);
> + }
> +
> + mutex_unlock(&chip->reg_lock);
> +
> + schedule_delayed_work(&chip->tai_event_work, TAI_EVENT_WORK_INTERVAL);
> +}
> +
> +static int mv88e6xxx_ptp_enable_perout(struct mv88e6xxx_chip *chip,
> + struct ptp_clock_request *rq, int on)
> +{
> + struct timespec ts;
> + u64 ns;
> + int pin;
> + int err;
> +
> + pin = ptp_find_pin(chip->ptp_clock, PTP_PF_PEROUT, rq->extts.index);
> +
> + if (pin < 0)
> + return -EBUSY;
> +
> + ts.tv_sec = rq->perout.period.sec;
> + ts.tv_nsec = rq->perout.period.nsec;
> + ns = timespec_to_ns(&ts);
> +
> + if (ns > U32_MAX)
> + return -ERANGE;
> +
> + mutex_lock(&chip->reg_lock);
> +
> + err = mv88e6xxx_config_periodic_trig(chip, (u32)ns, 0);
Here you ignore the phase of the signal given in the trq->perout.start
field. That is not what the user expects. For periodic outputs where
the phase cannot be set, we really would need a new ioctl.
However, in this case, you should just drop this functionality. I
understand that this works with your adjustable external oscillator,
but we cannot support that in mainline (at least, not yet).
Thanks,
Richard
> + if (err)
> + goto out;
> +
> + if (on) {
> + err = mv88e6xxx_g2_set_gpio_config(
> + chip, pin, MV88E6XXX_G2_SCRATCH_GPIO_MODE_TRIG,
> + MV88E6XXX_G2_SCRATCH_GPIO_DIR_OUT);
> + } else {
> + err = mv88e6xxx_g2_set_gpio_config(
> + chip, pin, MV88E6XXX_G2_SCRATCH_GPIO_MODE_GPIO,
> + MV88E6XXX_G2_SCRATCH_GPIO_DIR_IN);
> + }
> +
> +out:
> + mutex_unlock(&chip->reg_lock);
> +
> + return err;
> +}
^ permalink raw reply
* Re: devlink dump of mlxsw_adj table triggers a panic
From: David Ahern @ 2017-10-08 15:04 UTC (permalink / raw)
To: Arkadi Sharshevsky, Jiri Pirko; +Cc: netdev@vger.kernel.org
In-Reply-To: <c7286c2b-09de-1221-303a-0aed3ed77c21@mellanox.com>
On 10/8/17 1:43 AM, Arkadi Sharshevsky wrote:
> Thanks, will check it out. How many nexthops groups & overall number of
> nexthops you configured?
8 ports with 62 VLANs on each (496 total vlan devices) and 62 VRFs. BGP
is exchanging routes with neighbors. No multipath routes.
^ permalink raw reply
* Re: [PATCH net-next RFC 2/9] net: dsa: mv88e6xxx: expose switch time as a PTP hardware clock
From: Richard Cochran @ 2017-10-08 14:52 UTC (permalink / raw)
To: Brandon Streiff
Cc: netdev, linux-kernel, David S. Miller, Florian Fainelli,
Andrew Lunn, Vivien Didelot, Erik Hons
In-Reply-To: <1506612341-18061-3-git-send-email-brandon.streiff@ni.com>
On Thu, Sep 28, 2017 at 10:25:34AM -0500, Brandon Streiff wrote:
> +static int mv88e6xxx_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
> +{
> + if (scaled_ppm == 0)
> + return 0;
> +
> + return -EOPNOTSUPP;
> +}
We really want to have an adjustable clock here. More below.
> +int mv88e6xxx_ptp_setup(struct mv88e6xxx_chip *chip)
> +{
> + /* Set up the cycle counter */
> + memset(&chip->tstamp_cc, 0, sizeof(chip->tstamp_cc));
> + chip->tstamp_cc.read = mv88e6xxx_ptp_clock_read;
> + chip->tstamp_cc.mask = CYCLECOUNTER_MASK(32);
> + /* Raw timestamps are in units of 8-ns clock periods. */
> + chip->tstamp_cc.mult = 8;
> + chip->tstamp_cc.shift = 0;
First of all, the switch can use an external clock, and so at the very
least, the period should be a macro so that if and when we support the
external clock, the macro may be converted into a variable.
Secondly, the mult/shift should be chosen to allow the finest possible
frequency adjustment. Here is what I did:
---
#define N 28
#define CC_MULT (8 << N)
int mv88e635x_setup(struct dsa_switch *ds)
{
struct mv88e6xxx_chip *ps = ds->priv;
ps->cc.read = mv88e635x_global_time_read;
ps->cc.mask = CLOCKSOURCE_MASK(32);
ps->cc.mult = CC_MULT;
ps->cc.shift = N;
timecounter_init(&ps->tc, &ps->cc, ktime_to_ns(ktime_get_real()));
...
}
static int mv88e635x_ptp_adjfreq(struct ptp_clock_info *ptp, s32 ppb)
{
u64 adj;
u32 diff, mult;
int neg_adj = 0;
struct mv88e6xxx_chip *ps =
container_of(ptp, struct mv88e6xxx_chip, ptp_info);
if (ppb < 0) {
neg_adj = 1;
ppb = -ppb;
}
mult = CC_MULT;
adj = mult;
adj *= ppb;
diff = div_u64(adj, 1000000000ULL);
mutex_lock(&ps->clock_mutex);
timecounter_read(&ps->tc);
ps->cc.mult = neg_adj ? mult - diff : mult + diff;
mutex_unlock(&ps->clock_mutex);
return 0;
}
---
(This is the legacy adjfreq method, but you can easily convert it into
the adjfine method.)
Of course, this means that you'll have to drop the periodic output
signal code.
Thanks,
Richard
^ permalink raw reply
* [PATCH iproute2 net-next] ip: mroute: Print offload indication
From: Yotam Gigi @ 2017-10-08 14:43 UTC (permalink / raw)
To: netdev; +Cc: stephen, davem, mlxsw, Yotam Gigi
Since kernel net-next commit c7c0bbeae950 ("net: ipmr: Add MFC offload
indication") the kernel indicates on an MFC entry whether it was offloaded
using the RTNH_F_OFFLOAD flag. Update the "ip mroute show" command to
indicate when a route is offloaded, similarly to the "ip route show"
command.
Example output:
$ ip mroute
(0.0.0.0, 239.255.0.1) Iif: sw1p7 Oifs: t_br0 State: resolved offload
(192.168.1.1, 239.255.0.1) Iif: sw1p7 Oifs: sw1p4 State: resolved offload
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
---
ip/ipmroute.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/ip/ipmroute.c b/ip/ipmroute.c
index b51c23c..453a6cf 100644
--- a/ip/ipmroute.c
+++ b/ip/ipmroute.c
@@ -161,6 +161,8 @@ int print_mroute(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
}
fprintf(fp, " State: %s",
r->rtm_flags & RTNH_F_UNRESOLVED ? "unresolved" : "resolved");
+ if (r->rtm_flags & RTNH_F_OFFLOAD)
+ fprintf(fp, " offload");
if (show_stats && tb[RTA_MFC_STATS]) {
struct rta_mfc_stats *mfcs = RTA_DATA(tb[RTA_MFC_STATS]);
--
2.8.4
^ permalink raw reply related
* [PATCH] tests: Remove bashisms (s/source/.)
From: Petr Vorel @ 2017-10-08 14:39 UTC (permalink / raw)
To: netdev; +Cc: Petr Vorel
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
---
testsuite/tests/ip/link/new_link.t | 2 +-
testsuite/tests/ip/link/show_dev_wo_vf_rate.t | 2 +-
testsuite/tests/ip/netns/set_nsid.t | 2 +-
testsuite/tests/ip/netns/set_nsid_batch.t | 2 +-
testsuite/tests/ip/route/add_default_route.t | 4 ++--
testsuite/tests/ip/tunnel/add_tunnel.t | 2 +-
testsuite/tests/tc/cls-testbed.t | 2 +-
testsuite/tests/tc/dsmark.t | 2 +-
testsuite/tests/tc/pedit.t | 2 +-
9 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/testsuite/tests/ip/link/new_link.t b/testsuite/tests/ip/link/new_link.t
index 699adbcd..c17650a2 100755
--- a/testsuite/tests/ip/link/new_link.t
+++ b/testsuite/tests/ip/link/new_link.t
@@ -1,6 +1,6 @@
#!/bin/sh
-source lib/generic.sh
+. lib/generic.sh
ts_log "[Testing add/del virtual links]"
diff --git a/testsuite/tests/ip/link/show_dev_wo_vf_rate.t b/testsuite/tests/ip/link/show_dev_wo_vf_rate.t
index a600ba65..5b3c004e 100755
--- a/testsuite/tests/ip/link/show_dev_wo_vf_rate.t
+++ b/testsuite/tests/ip/link/show_dev_wo_vf_rate.t
@@ -1,6 +1,6 @@
#!/bin/sh
-source lib/generic.sh
+. lib/generic.sh
NL_FILE="tests/ip/link/dev_wo_vf_rate.nl"
ts_ip "$0" "Show VF devices w/o VF rate info" -d monitor file $NL_FILE
diff --git a/testsuite/tests/ip/netns/set_nsid.t b/testsuite/tests/ip/netns/set_nsid.t
index 606d45ab..8f8c7792 100755
--- a/testsuite/tests/ip/netns/set_nsid.t
+++ b/testsuite/tests/ip/netns/set_nsid.t
@@ -1,6 +1,6 @@
#!/bin/sh
-source lib/generic.sh
+. lib/generic.sh
ts_log "[Testing netns nsid]"
diff --git a/testsuite/tests/ip/netns/set_nsid_batch.t b/testsuite/tests/ip/netns/set_nsid_batch.t
index abb3f1bb..196fd4b3 100755
--- a/testsuite/tests/ip/netns/set_nsid_batch.t
+++ b/testsuite/tests/ip/netns/set_nsid_batch.t
@@ -1,6 +1,6 @@
#!/bin/sh
-source lib/generic.sh
+. lib/generic.sh
ts_log "[Testing netns nsid in batch mode]"
diff --git a/testsuite/tests/ip/route/add_default_route.t b/testsuite/tests/ip/route/add_default_route.t
index e5ea6473..0b566f1f 100755
--- a/testsuite/tests/ip/route/add_default_route.t
+++ b/testsuite/tests/ip/route/add_default_route.t
@@ -1,6 +1,6 @@
-#!/bin/sh
+#!/bin/bash
-source lib/generic.sh
+. lib/generic.sh
ts_log "[Testing add default route]"
diff --git a/testsuite/tests/ip/tunnel/add_tunnel.t b/testsuite/tests/ip/tunnel/add_tunnel.t
index 18f6e370..3f5a9d3c 100755
--- a/testsuite/tests/ip/tunnel/add_tunnel.t
+++ b/testsuite/tests/ip/tunnel/add_tunnel.t
@@ -1,6 +1,6 @@
#!/bin/sh
-source lib/generic.sh
+. lib/generic.sh
TUNNEL_NAME="tunnel_test_ip"
diff --git a/testsuite/tests/tc/cls-testbed.t b/testsuite/tests/tc/cls-testbed.t
index 2afc26fc..d5c21e5c 100755
--- a/testsuite/tests/tc/cls-testbed.t
+++ b/testsuite/tests/tc/cls-testbed.t
@@ -1,7 +1,7 @@
#!/bin/bash
# vim: ft=sh
-source lib/generic.sh
+. lib/generic.sh
QDISCS="cbq htb dsmark"
diff --git a/testsuite/tests/tc/dsmark.t b/testsuite/tests/tc/dsmark.t
index 6934165e..177585e6 100755
--- a/testsuite/tests/tc/dsmark.t
+++ b/testsuite/tests/tc/dsmark.t
@@ -1,7 +1,7 @@
#!/bin/bash
# vim: ft=sh
-source lib/generic.sh
+. lib/generic.sh
ts_qdisc_available "dsmark"
if [ $? -eq 0 ]; then
diff --git a/testsuite/tests/tc/pedit.t b/testsuite/tests/tc/pedit.t
index e9b6c333..8d531a05 100755
--- a/testsuite/tests/tc/pedit.t
+++ b/testsuite/tests/tc/pedit.t
@@ -1,6 +1,6 @@
#!/bin/sh
-source lib/generic.sh
+. lib/generic.sh
DEV="$(rand_dev)"
ts_ip "$0" "Add $DEV dummy interface" link add dev $DEV type dummy
--
2.14.2
^ permalink raw reply related
* [PATCH iproute2 1/1] color: Fix ip segfault in color_fprintf() when using --color switch
From: Petr Vorel @ 2017-10-08 14:38 UTC (permalink / raw)
To: netdev; +Cc: Petr Vorel, Julien Fortin, Stephen Hemminger
This fixes two regressions:
Commit 959f1428 ("color: add new COLOR_NONE and disable_color function")
caused segfault, when running ip with --color switch, as 'attr + 8' in
color_fprintf() access array item out of bounds.
Changing latter value of ternar operator in attr_colors[] index is for
restoring the same colors.
Reproduce the bug with:
$ ip -c a
Commit d0e72011 ("ip: ipaddress.c: add support for json output")
introduced passing -1 as enum color_attr. This is not only wrong as no
color_attr has value -1, but also causes another segfault in color_fprintf()
on this setup as there is no item with index -1 in array of enum attr_colors[].
Using 0 is valid option.
Reproduce the bug with:
$ COLORFGBG='0;15' ip -c a
NOTE: COLORFGBG is environmental variable used for defining whether user
has light or dark background.
COLORFGBG="0;15" is used to ask for color set suitable for light background,
COLORFGBG="15;0" is used to ask for color set suitable for dark background.
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
---
include/json_print.h | 2 +-
lib/color.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/json_print.h b/include/json_print.h
index b6ce1f9f..2f3f07c8 100644
--- a/include/json_print.h
+++ b/include/json_print.h
@@ -53,7 +53,7 @@ void close_json_array(enum output_type type, const char *delim);
const char *fmt, \
type value) \
{ \
- print_color_##type_name(t, -1, key, fmt, value); \
+ print_color_##type_name(t, 0, key, fmt, value); \
}
_PRINT_FUNC(int, int);
_PRINT_FUNC(bool, bool);
diff --git a/lib/color.c b/lib/color.c
index 79d5e289..e597798f 100644
--- a/lib/color.c
+++ b/lib/color.c
@@ -110,7 +110,7 @@ int color_fprintf(FILE *fp, enum color_attr attr, const char *fmt, ...)
}
ret += fprintf(fp, "%s",
- color_codes[attr_colors[is_dark_bg ? attr + 8 : attr]]);
+ color_codes[attr_colors[is_dark_bg ? attr + 6 : attr - 1]]);
ret += vfprintf(fp, fmt, args);
ret += fprintf(fp, "%s", color_codes[C_CLEAR]);
--
2.14.2
^ permalink raw reply related
* Re: [PATCH net-next RFC 1/9] net: dsa: mv88e6xxx: add accessors for PTP/TAI registers
From: Richard Cochran @ 2017-10-08 14:32 UTC (permalink / raw)
To: Brandon Streiff
Cc: netdev, linux-kernel, David S. Miller, Florian Fainelli,
Andrew Lunn, Vivien Didelot, Erik Hons
In-Reply-To: <1506612341-18061-2-git-send-email-brandon.streiff@ni.com>
On Thu, Sep 28, 2017 at 10:25:33AM -0500, Brandon Streiff wrote:
> This patch implements support for accessing PTP/TAI registers through
To avoid confusion, it would be helpful to mention what TAI stands for
here and also in the source code comments!
Thanks,
Richard
^ permalink raw reply
* Re: [PATCH net-next RFC 8/9] net: dsa: mv88e6xxx: add rx/tx timestamping support
From: Richard Cochran @ 2017-10-08 14:24 UTC (permalink / raw)
To: Brandon Streiff
Cc: netdev, linux-kernel, David S. Miller, Florian Fainelli,
Andrew Lunn, Vivien Didelot, Erik Hons
In-Reply-To: <1506612341-18061-9-git-send-email-brandon.streiff@ni.com>
On Thu, Sep 28, 2017 at 10:25:40AM -0500, Brandon Streiff wrote:
> +static bool mv88e6xxx_should_tstamp(struct mv88e6xxx_chip *chip, int port,
> + struct sk_buff *skb, unsigned int type)
> +{
> + struct mv88e6xxx_port_hwtstamp *ps = &chip->port_hwtstamp[port];
> + u8 *ptp_hdr, *msgtype;
> + bool ret;
> +
> + if (port < 0 || port >= mv88e6xxx_num_ports(chip))
> + return false;
> +
> + ptp_hdr = _get_ptp_header(skb, type);
> + if (IS_ERR(ptp_hdr))
> + return false;
> +
> + if (unlikely(type & PTP_CLASS_V1))
> + msgtype = ptp_hdr + OFF_PTP_CONTROL;
> + else
> + msgtype = ptp_hdr;
> +
> + ret = test_bit(MV88E6XXX_HWTSTAMP_ENABLED, &ps->state);
This should be the first test, don't you think?
> + dev_dbg(chip->dev,
> + "p%d: PTP message classification 0x%x type 0x%x, tstamp? %d",
> + port, type, *msgtype, (int)ret);
> +
> + return ret;
> +}
> +
> +/* rxtstamp will be called in interrupt context so we don't to do
> + * anything like read PTP registers over SMI.
> + */
> +bool mv88e6xxx_port_rxtstamp(struct dsa_switch *ds, int port,
> + struct sk_buff *skb, unsigned int type)
> +{
> + struct mv88e6xxx_chip *chip = ds->priv;
> + struct skb_shared_hwtstamps *shhwtstamps;
> + __be32 *ptp_rx_ts;
> + u8 *ptp_hdr;
> + u32 raw_ts;
> + u64 ns;
> +
> + if (!chip->info->ptp_support)
> + return false;
> +
> + if (port < 0 || port >= mv88e6xxx_num_ports(chip))
> + return false;
This test is duplicated in mv88e6xxx_should_tstamp().
> + if (!mv88e6xxx_should_tstamp(chip, port, skb, type))
> + return false;
> +
> + shhwtstamps = skb_hwtstamps(skb);
> + memset(shhwtstamps, 0, sizeof(*shhwtstamps));
> +
> + /* Because we configured the arrival timestamper to put the counter
> + * into the 32-bit "reserved" field of the PTP header, we can retrieve
> + * the value from the packet directly instead of having to retrieve it
> + * via SMI.
> + */
> + ptp_hdr = _get_ptp_header(skb, type);
> + if (IS_ERR(ptp_hdr))
> + return false;
> + ptp_rx_ts = (__be32 *)(ptp_hdr + OFF_PTP_RESERVED);
> + raw_ts = __be32_to_cpu(*ptp_rx_ts);
> + ns = timecounter_cyc2time(&chip->tstamp_tc, raw_ts);
> + shhwtstamps->hwtstamp = ns_to_ktime(ns);
> +
> + dev_dbg(chip->dev, "p%d: rxtstamp %llx\n", port, ns);
> +
> + return false;
> +}
> +
> +static void mv88e6xxx_txtstamp_work(struct work_struct *ugly)
> +{
> + struct mv88e6xxx_port_hwtstamp *ps = container_of(
> + ugly, struct mv88e6xxx_port_hwtstamp, tx_tstamp_work);
> + struct mv88e6xxx_chip *chip = container_of(
> + ps, struct mv88e6xxx_chip, port_hwtstamp[ps->port_id]);
> + struct sk_buff *tmp_skb;
> + unsigned long tmp_tstamp_start;
> + int err;
> + u16 departure_block[4];
> + u16 tmp_seq_id;
> +
> + if (!test_bit(MV88E6XXX_HWTSTAMP_TX_IN_PROGRESS, &ps->state))
> + return;
> +
> + tmp_skb = ps->tx_skb;
> + tmp_seq_id = ps->tx_seq_id;
> + tmp_tstamp_start = ps->tx_tstamp_start;
> +
> + if (!tmp_skb)
> + return;
> +
> + mutex_lock(&chip->reg_lock);
> + err = mv88e6xxx_port_ptp_read(chip, ps->port_id,
> + MV88E6XXX_PORT_PTP_DEP_STS,
> + departure_block,
> + ARRAY_SIZE(departure_block));
> + mutex_unlock(&chip->reg_lock);
> +
> + if (err)
> + goto free_and_clear_skb;
> +
> + if (departure_block[0] & MV88E6XXX_PTP_TS_VALID) {
You can avoid the IfOk anti-pattern here. Make the test for !VALID
and move the 'else' block up.
> + struct skb_shared_hwtstamps shhwtstamps;
> + u64 ns;
> + u32 time_raw;
> + u16 status;
> +
> + /* We have the timestamp; go ahead and clear valid now */
> + mutex_lock(&chip->reg_lock);
> + mv88e6xxx_port_ptp_write(chip, ps->port_id,
> + MV88E6XXX_PORT_PTP_DEP_STS, 0);
> + mutex_unlock(&chip->reg_lock);
> +
> + status = departure_block[0] &
> + MV88E6XXX_PTP_TS_STATUS_MASK;
> + if (status != MV88E6XXX_PTP_TS_STATUS_NORMAL) {
> + dev_warn(chip->dev, "p%d: tx timestamp overrun\n",
> + ps->port_id);
> + goto free_and_clear_skb;
> + }
> +
> + if (departure_block[3] != tmp_seq_id) {
> + dev_warn(chip->dev, "p%d: unexpected sequence id\n",
> + ps->port_id);
> + goto free_and_clear_skb;
> + }
> +
> + memset(&shhwtstamps, 0, sizeof(shhwtstamps));
> + time_raw = ((u32)departure_block[2] << 16) |
> + departure_block[1];
> + ns = timecounter_cyc2time(&chip->tstamp_tc, time_raw);
> + shhwtstamps.hwtstamp = ns_to_ktime(ns);
> +
> + dev_dbg(chip->dev,
> + "p%d: txtstamp %llx status 0x%04x skb ID 0x%04x hw ID 0x%04x\n",
> + ps->port_id, ktime_to_ns(shhwtstamps.hwtstamp),
> + departure_block[0], tmp_seq_id, departure_block[3]);
> +
> + /* skb_complete_tx_timestamp() will free up the client to make
> + * another timestamp-able transmit. We have to be ready for it
> + * -- by clearing the ps->tx_skb "flag" -- beforehand.
> + */
> +
> + ps->tx_skb = NULL;
> + clear_bit_unlock(MV88E6XXX_HWTSTAMP_TX_IN_PROGRESS, &ps->state);
> +
> + skb_complete_tx_timestamp(tmp_skb, &shhwtstamps);
> +
> + } else {
> + if (time_is_before_jiffies(
> + tmp_tstamp_start + TX_TSTAMP_TIMEOUT)) {
> + dev_warn(chip->dev, "p%d: clearing tx timestamp hang\n",
> + ps->port_id);
> + goto free_and_clear_skb;
> + }
> +
> + /* The timestamp should be available quickly, while getting it
> + * is high priority and time bounded to only 10ms. A poll is
> + * warranted and this is the nicest way to realize it in a work
> + * item.
> + */
> + queue_work(system_highpri_wq, &ps->tx_tstamp_work);
> + }
> +
> + return;
> +
> +free_and_clear_skb:
> + ps->tx_skb = NULL;
> + clear_bit_unlock(MV88E6XXX_HWTSTAMP_TX_IN_PROGRESS, &ps->state);
> +
> + dev_kfree_skb_any(tmp_skb);
> +}
> +
> +void mv88e6xxx_port_txtstamp(struct dsa_switch *ds, int port,
> + struct sk_buff *clone, unsigned int type)
> +{
> + struct mv88e6xxx_chip *chip = ds->priv;
> + struct mv88e6xxx_port_hwtstamp *ps = &chip->port_hwtstamp[port];
> +
> + if (!chip->info->ptp_support)
> + return;
> +
> + if (port < 0 || port >= mv88e6xxx_num_ports(chip))
> + goto out;
This test is duplicated in mv88e6xxx_should_tstamp().
> + if (unlikely(skb_shinfo(clone)->tx_flags & SKBTX_HW_TSTAMP) &&
> + mv88e6xxx_should_tstamp(chip, port, clone, type)) {
Please avoid the IfOk anti-pattern here as well.
> + __be16 *seq_ptr = (__be16 *)(_get_ptp_header(clone, type) +
> + OFF_PTP_SEQUENCE_ID);
> +
> + if (!test_and_set_bit_lock(MV88E6XXX_HWTSTAMP_TX_IN_PROGRESS,
> + &ps->state)) {
> + ps->tx_skb = clone;
> + ps->tx_tstamp_start = jiffies;
> + ps->tx_seq_id = be16_to_cpup(seq_ptr);
> +
> + /* Fetching the timestamp is high-priority work because
> + * 802.1AS bounds the time for a response.
> + *
> + * No need to check result of queue_work(). ps->tx_skb
> + * check ensures work item is not pending (it may be
> + * waiting to exit)
> + */
> + queue_work(system_highpri_wq, &ps->tx_tstamp_work);
> + return;
> + }
> +
> + /* Otherwise we're already in progress... */
> + dev_dbg(chip->dev,
> + "p%d: tx timestamp already in progress, discarding",
> + port);
> + }
> +
> +out:
> + /* We don't need it after all. */
> + kfree_skb(clone);
How about moving this logic should into the caller, letting the tx
callback return a code that tells whether the clone was accepted or
not?
> +}
Thanks,
Richard
^ permalink raw reply
* [PATCHv2] Add a driver for Renesas uPD60620 and uPD60620A PHYs
From: Bernd Edlinger @ 2017-10-08 13:40 UTC (permalink / raw)
To: netdev@vger.kernel.org; +Cc: Andrew Lunn, Florian Fainelli
Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
---
drivers/net/phy/Kconfig | 5 +++
drivers/net/phy/Makefile | 1 +
drivers/net/phy/uPD60620.c | 109 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 115 insertions(+)
create mode 100644 drivers/net/phy/uPD60620.c
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index cd931cf..e2cf8ff 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -366,6 +366,11 @@ config REALTEK_PHY
---help---
Supports the Realtek 821x PHY.
+config RENESAS_PHY
+ tristate "Driver for Renesas PHYs"
+ ---help---
+ Supports the Renesas PHYs uPD60620 and uPD60620A.
+
config ROCKCHIP_PHY
tristate "Driver for Rockchip Ethernet PHYs"
---help---
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 416df92..1404ad3 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -72,6 +72,7 @@ obj-$(CONFIG_MICROSEMI_PHY) += mscc.o
obj-$(CONFIG_NATIONAL_PHY) += national.o
obj-$(CONFIG_QSEMI_PHY) += qsemi.o
obj-$(CONFIG_REALTEK_PHY) += realtek.o
+obj-$(CONFIG_RENESAS_PHY) += uPD60620.o
obj-$(CONFIG_ROCKCHIP_PHY) += rockchip.o
obj-$(CONFIG_SMSC_PHY) += smsc.o
obj-$(CONFIG_STE10XP) += ste10Xp.o
diff --git a/drivers/net/phy/uPD60620.c b/drivers/net/phy/uPD60620.c
new file mode 100644
index 0000000..96b3347
--- /dev/null
+++ b/drivers/net/phy/uPD60620.c
@@ -0,0 +1,109 @@
+/*
+ * Driver for the Renesas PHY uPD60620.
+ *
+ * Copyright (C) 2015 Softing Industrial Automation GmbH
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/phy.h>
+
+#define UPD60620_PHY_ID 0xb8242824
+
+/* Extended Registers and values */
+/* PHY Special Control/Status */
+#define PHY_PHYSCR 0x1F /* PHY.31 */
+#define PHY_PHYSCR_10MB 0x0004 /* PHY speed = 10mb */
+#define PHY_PHYSCR_100MB 0x0008 /* PHY speed = 100mb */
+#define PHY_PHYSCR_DUPLEX 0x0010 /* PHY Duplex */
+
+/* PHY Special Modes */
+#define PHY_SPM 0x12 /* PHY.18 */
+
+/* Init PHY */
+
+static int upd60620_config_init(struct phy_device *phydev)
+{
+ /* Enable support for passive HUBs (could be a strap option) */
+ /* PHYMODE: All speeds, HD in parallel detect */
+ return phy_write(phydev, PHY_SPM, 0x0180 | phydev->mdio.addr);
+}
+
+/* Get PHY status from common registers */
+
+static int upd60620_read_status(struct phy_device *phydev)
+{
+ int phy_state;
+
+ /* Read negotiated state */
+ phy_state = phy_read(phydev, MII_BMSR);
+ if (phy_state < 0)
+ return phy_state;
+
+ phydev->link = 0;
+ phydev->lp_advertising = 0;
+ phydev->pause = 0;
+ phydev->asym_pause = 0;
+
+ if (phy_state & (BMSR_ANEGCOMPLETE | BMSR_LSTATUS)) {
+ phy_state = phy_read(phydev, PHY_PHYSCR);
+ if (phy_state < 0)
+ return phy_state;
+
+ if (phy_state & (PHY_PHYSCR_10MB | PHY_PHYSCR_100MB)) {
+ phydev->link = 1;
+ phydev->speed = SPEED_10;
+ phydev->duplex = DUPLEX_HALF;
+
+ if (phy_state & PHY_PHYSCR_100MB)
+ phydev->speed = SPEED_100;
+ if (phy_state & PHY_PHYSCR_DUPLEX)
+ phydev->duplex = DUPLEX_FULL;
+
+ phy_state = phy_read(phydev, MII_LPA);
+ if (phy_state < 0)
+ return phy_state;
+
+ phydev->lp_advertising
+ = mii_lpa_to_ethtool_lpa_t(phy_state);
+
+ if (phydev->duplex == DUPLEX_FULL) {
+ if (phy_state & LPA_PAUSE_CAP)
+ phydev->pause = 1;
+ if (phy_state & LPA_PAUSE_ASYM)
+ phydev->asym_pause = 1;
+ }
+ }
+ }
+ return 0;
+}
+
+MODULE_DESCRIPTION("Renesas uPD60620 PHY driver");
+MODULE_AUTHOR("Bernd Edlinger <bernd.edlinger@hotmail.de>");
+MODULE_LICENSE("GPL");
+
+static struct phy_driver upd60620_driver[1] = { {
+ .phy_id = UPD60620_PHY_ID,
+ .phy_id_mask = 0xfffffffe,
+ .name = "Renesas uPD60620",
+ .features = PHY_BASIC_FEATURES,
+ .flags = 0,
+ .config_init = upd60620_config_init,
+ .config_aneg = genphy_config_aneg,
+ .read_status = upd60620_read_status,
+} };
+
+module_phy_driver(upd60620_driver);
+
+static struct mdio_device_id __maybe_unused upd60620_tbl[] = {
+ { UPD60620_PHY_ID, 0xfffffffe },
+ { }
+};
+
+MODULE_DEVICE_TABLE(mdio, upd60620_tbl);
--
1.9.1
^ permalink raw reply related
* Re: [PATCH] Add a driver for Renesas uPD60620 and uPD60620A PHYs
From: Bernd Edlinger @ 2017-10-08 13:31 UTC (permalink / raw)
To: Andrew Lunn; +Cc: netdev@vger.kernel.org, Florian Fainelli
In-Reply-To: <ca1763c4-fb7b-cf49-6bcf-d9e2c29c7363@hotmail.de>
Hi Andrew,
sorry for delayed reply.
Looks like I did not receive a copy of your e-mail.
>> Do you suggest that there are cases where auto negotiation does not
>> reach completion, and still provides a usable link status?
>
> My experience is that it often return 10/half, since everything should
> support that. And depending on what the MAC is doing, packets can
> sometime get across the link.
>>
>> I have tried to connect to link partners with fixed configuration
>> but even then the auto negotiation always competes normally.
>
> Which is a bit odd.
>
> There are a few different possibilities here. The peer PHY driver is
> broken. Rather than doing fixed, it actually set the possible
> negotiation options to just the one setting you tried to fix it
> to. And hence the uPD60620 device negotiated fine. Or the uPD60620 is
> broken is said it negotiated, but in fact it failed.
>
> What was the result? 10/Half, or the fixed values you set the peer to?
This is a dual-channel PHY, so I did just connect both ports and
played with the mii-tool -F / -A in different combinations on each
port and observed what happens when the cable is plugged in.
What happens is that the port with autonegotiation enabled detects
the correct speed and always half duplex, so the ASIC _pretends_ that
autonegotiatiation completes, when in fact only parallel detection
succeeded. Of course the other phy may be in full-duplex mode, but
that can not be detected by parallel detection.
The duplex mode would be full duplex by default, but my initialization
overrides a possible strap option and changes that to half duplex:
+ /* Enable support for passive HUBs (could be a strap option) */
+ /* PHYMODE: All speeds, HD in parallel detect */
+ return phy_write(phydev, PHY_SPM, 0x0180 | phydev->mdio.addr);
>>
>> Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
>
> Please send this is a new patch. If we were to take this is is, all
> the comments above would end up in the commit message.
>
> ---
>
> Under the --- you can however add comments which don't go into the
> commit log. Good practice is to list the things you changed since the
> previous version.
Thanks, I did not know that.
I will re-send the patch in a new thread.
Bernd.
^ permalink raw reply
* Re: [PATCH net-next RFC 5/9] net: dsa: forward hardware timestamping ioctls to switch driver
From: Richard Cochran @ 2017-10-08 13:12 UTC (permalink / raw)
To: Florian Fainelli
Cc: Brandon Streiff, netdev, linux-kernel, David S. Miller,
Andrew Lunn, Vivien Didelot, Erik Hons
In-Reply-To: <cb36aecc-3649-e22a-0fd4-0babbb691c8c@gmail.com>
On Thu, Sep 28, 2017 at 10:25:34AM -0700, Florian Fainelli wrote:
> This echoes back to Andrew's comments in patch 2, but we may have to
> prefer PHY timestamping over MAC timestamping if both are available?
> Richard, is that usually how the preference should be made?
No, if the MAC supports time stamping, then it will take precedence,
because the MAC driver doesn't know that the PHY also supports this.
In the case where a board design includes the PHYTER (the one and only
PHY PHC) and a MAC PHC, the user must de-select the MAC support in the
Kconfig in order to use the PHYTER.
So in general, we don't support PHC/timestamping simultaneously in the
MAC and PHY. It would be a lot of work to support this, and the user
timestamping API would have to be extended yet again, and so I think
it is not worth the effort.
Getting back to this patch, it should fall back to PHY timestamping
when the switch device doesn't support timestamping:
case SIOCGHWTSTAMP:
if (ds->ops->port_hwtstamp_get)
return ds->ops->port_hwtstamp_get(ds, port, ifr);
else
return phy_mii_ioctl(dev->phydev, ifr, cmd);
That way, if someone combines a PHYTER with a non-PTP capable switch,
it will just work.
Thanks,
Richard
^ permalink raw reply
* [PATCH] wcn36xx: Remove unnecessary rcu_read_unlock in wcn36xx_bss_info_changed
From: Jia-Ju Bai @ 2017-10-08 13:06 UTC (permalink / raw)
To: k.eugene.e, kvalo
Cc: wcn36xx, linux-wireless, netdev, linux-kernel, Jia-Ju Bai
No rcu_read_lock is called, but rcu_read_unlock is still called.
Thus rcu_read_unlock should be removed.
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
---
drivers/net/wireless/ath/wcn36xx/main.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/net/wireless/ath/wcn36xx/main.c b/drivers/net/wireless/ath/wcn36xx/main.c
index 35bd50b..b83f01d 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -812,7 +812,6 @@ static void wcn36xx_bss_info_changed(struct ieee80211_hw *hw,
if (!sta) {
wcn36xx_err("sta %pM is not found\n",
bss_conf->bssid);
- rcu_read_unlock();
goto out;
}
sta_priv = wcn36xx_sta_to_priv(sta);
--
1.7.9.5
^ permalink raw reply related
* Re: [PATCH net-next RFC 2/9] net: dsa: mv88e6xxx: expose switch time as a PTP hardware clock
From: Richard Cochran @ 2017-10-08 12:07 UTC (permalink / raw)
To: Brandon Streiff
Cc: Andrew Lunn, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
David S. Miller, Florian Fainelli, Vivien Didelot, Erik Hons
In-Reply-To: <CO2PR04MB218426561C8915B89947A79E9F7E0@CO2PR04MB2184.namprd04.prod.outlook.com>
On Fri, Sep 29, 2017 at 03:17:02PM +0000, Brandon Streiff wrote:
>
> Although now that I'm looking it over again, I'm also not certain of
> the need. Even if we're called more frequently than we expect, that
> doesn't seem to be harmful with regard to timekeeping. Hmm.
Just keep it simple and drop the extra logic. It doesn't hurt to
over-sample the clock. Here is what I did:
/* Covers both a 100 or a 125 MHz input clock. */
#define MV88E635X_OVERFLOW_PERIOD (HZ * 16)
static void mv88e635x_overflow_check(struct work_struct *ws)
{
struct timespec64 ts;
struct mv88e6xxx_chip *ps =
container_of(ws, struct mv88e6xxx_chip, oflow_work.work);
mv88e635x_ptp_gettime(&ps->ptp_info, &ts);
pr_debug("mv88e635x overflow check at %lld.%09lu\n",
ts.tv_sec, ts.tv_nsec);
schedule_delayed_work(&ps->oflow_work, MV88E635X_OVERFLOW_PERIOD);
}
Thanks,
Richard
^ permalink raw reply
* Re: [PATCH net-next RFC 2/9] net: dsa: mv88e6xxx: expose switch time as a PTP hardware clock
From: Richard Cochran @ 2017-10-08 11:59 UTC (permalink / raw)
To: Brandon Streiff
Cc: Andrew Lunn, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
David S. Miller, Florian Fainelli, Vivien Didelot, Erik Hons
In-Reply-To: <CO2PR04MB218421921CC649BC026F34669F7E0@CO2PR04MB2184.namprd04.prod.outlook.com>
On Fri, Sep 29, 2017 at 03:28:02PM +0000, Brandon Streiff wrote:
>
> NETWORK_PHY_TIMESTAMPING implies NET_PTP_CLASSIFY (which I do use)
> and net/core/timestamping.c (which I didn't). It probably makes more
> sense to just depend on NET_PTP_CLASSIFY directly.
Yes, that makes sense to do, if you can make it work.
With my driver I tried depending on NET_PTP_CLASSIFY, but there was
some Kconfig issue, and rather than figuring it out I did the lazy
thing and used NETWORK_PHY_TIMESTAMPING.
Thanks,
Richard
^ permalink raw reply
* Re: [patch net-next 1/6] net: bridge: Use the MDB_RTR_TYPE_TEMP on bridge device too
From: Yotam Gigi @ 2017-10-08 10:30 UTC (permalink / raw)
To: Nikolay Aleksandrov, Jiri Pirko, netdev
Cc: davem, idosch, nogahf, mlxsw, ivecera, andrew, stephen, nbd,
roopa
In-Reply-To: <52b8e2d1-ce5a-f0d5-cfa7-b483bf223f78@cumulusnetworks.com>
On 10/08/2017 12:42 PM, Nikolay Aleksandrov wrote:
> On 08/10/17 12:39, Nikolay Aleksandrov wrote:
>> On 08/10/17 08:23, Yotam Gigi wrote:
>>> On 10/05/2017 03:09 PM, Nikolay Aleksandrov wrote:
>>>> On 05/10/17 13:36, Jiri Pirko wrote:
>>>>> From: Yotam Gigi <yotamg@mellanox.com>
>>>>>
>>>>> Every bridge port is in one of four mcast router port states:
>>>>> - MDB_RTR_TYPE_PERM - the port is set by the user to be an mrouter port
>>>>> regardless of IGMP queries.
>>>>> - MDB_RTR_TYPE_DISABLED - the port is set by the user to not be an mrouter
>>>>> port regardless of IGMP queries.
>>>>> - MDB_RTR_TYPE_TEMP - the port is set by the user to be in mcast router
>>>>> learning state, but currently it is not an mrouter port as no IGMP query
>>>>> has been received by it for the last multicast_querier_interval.
>>>>> - MDB_RTR_TYPE_TEMP_QUERY - the port is set by the user to be in mcast
>>>>> router learning state, and currently it is an mrouter port due to an
>>>>> IGMP query that has been received by it during the passed
>>>>> multicast_querier_interval.
>>>> I think you got the last two partially mixed up, MDB_RTR_TYPE_TEMP marks the port as a router
>>>> regardless if there were any igmp queries, while TYPE_TEMP_QUERY means it's in learning
>>>> state. It is the timer (armed vs not) that defines if currently the port is a router
>>>> when one of the TEMP/TEMP_QUERY are set. In the _TEMP case it is always armed as it
>>>> is refreshed by user or igmp queries which was the point of that mode.
>>>> So this means in br_multicast_router() just check for the timer_pending or perm mode.
>>>
>>> As much as I tried to make this clear, it seems like I failed :)
>>>
>>> The 4 states I described are currently the "bridged port" states, not the
>>> "bridge device" state. A bridged port has these 4 states, all can be set by the
>>> user, while the bridge device only uses 3 of these states. This patch makes the
>>> bridge device use the 4 states too. I thought it makes sense.
>> (disclaimer: this is all about bridge ports, not bridge device)
>> Right, I'll try to explain again: _TEMP always marks the port as a mcast router,
>> it does not put it into just learning state waiting for an igmp query and it can
>> be refreshed by either a query or the user again setting the port in _TEMP.
>> While _TEMP_QUERY puts the port in learning state waiting for a query to become
>> a router, and _TEMP downgrades to _TEMP_QUERY if it expires.
>>
>> Does that make it clearer ?
>>
>> so for _TEMP you say:
>>>>> - MDB_RTR_TYPE_TEMP - the port is set by the user to be in mcast router
>>>>> learning state, but currently it is not an mrouter port as no IGMP query
>>>>> has been received by it for the last multicast_querier_interval.
>> which is not the case, it is always a router when that mode is set on a port.
>> Same for _TEMP_QUERY.
> Err, sorry by same I meant it is not correct, not that the _TEMP definition is the same.
> Need to get coffee :-)
Ho, I see that I was clear but not right :)
I had a look at the code and seems like you are right - for some reason I
thought that _TEMP is learning-active and _TEMP_QUERY is learning-inactive, and
the ports change states when according to IGMP queries.
>
>>> The first paragraph describes the current states of a bridged port, and the
>>> second one explains the difference between bridged port and bridge device. I
>>> will (try to) make it clearer if we agree on resending this patch.
>>>
>>> Is it clearer now?
Yes it is.
>>>
>>>
>>>> In the port code you have the following transitions:
>>>> _TEMP -> TEMP_QUERY (on timer fire or user-set val, port becomes learning only)
>>>> _TEMP -> _TEMP (noop on user refresh or igmp query, timer refreshes)
>>>> _TEMP_QUERY -> _TEMP_QUERY (on igmp query the timer is armed, port becomes router)
>>>>
>>>> you never have _TEMP_QUERY -> _TEMP, which you're using here to denote the timer
>>>> getting armed and the bridge becoming a router.
>>>
>>> I am not sure I got this one. I do address that: when an IGMP query is recieved
>>> and the current state is _TEMP_QUERY, I arm the timer and set the state to
>>> _TEMP. I marked that place on the patch, so you can see below.
>>>
>> Exactly, there is no such transition for the ports. I tried to say that you're using
>> the router type to distinguish between when a query is received and it is just learning.
>> I get that you need to do so, but that deviates from how ports are handled, thus I
>> suggested to use the timer state instead and drop the _TEMP for bridge device altogether.
>> If it's possible then the patch will be much simpler and you will not need the hacks
>> to hide the state from user-space which is the part I really don't like.
Yeah, I agree. Thanks for the feedback and sorry for the late answer :)
>>
>>>>> The bridge device (brX) itself can also be configured by the user to be
>>>>> either fixed, disabled or learning mrouter port states, but currently there
>>>>> is no distinction between the MDB_RTR_TYPE_TEMP_QUERY and MDB_RTR_TYPE_TEMP
>>>>> in the bridge internal state. Due to that, when an IGMP query is received,
>>>>> it is not straightforward to tell whether it changes the bridge device
>>>>> mrouter port status or not.
>>>> But before this patch the bridge device could not get that set.
>>>>
>>>>> Further patches in this patch-set will introduce notifications upon the
>>>>> bridge device mrouter port state. In order to prevent resending bridge
>>>>> mrouter notification when it is not needed, such distinction is necessary.
>>>>>
>>>> Granted the bridge device hasn't got a way to clearly distinguish the transitions
>>>> without the chance for a race and if using the timer one could get an unnecessary
>>> Is there a race condition?
>>>
>> With this state - no, if you try to use the timer state though there might be
>> an extra notification as I've explained below, that's why I asked if it would be
>> a problem. I think it is worth looking into using the timer state because it will
>> remove a lot of the hacks for hiding the state needed in this patch.
>>
>>>> notification but that seems like a corner case when the timer fires exactly at the
>>>> same time as the igmp query is received. Can't it be handled by just checking if
>>>> the new state is different in the notification receiver ?
>>>> If it can't and is a problem then I'd prefer to add a new boolean to denote that
>>>> router on/off transition rather than doing this.
>>>>
>>>>> Hence, add the distinction between MDB_RTR_TYPE_TEMP and
>>>>> MDB_RTR_TYPE_TEMP_QUERY states for the bridge device, similarly to any
>>>>> other bridge port.
>>>>>
>>>> This does not add proper MDB_RTR_TYPE_TEMP support for the bridge device
>>>> but seems to abuse it to distinguish the timer state, and changes
>>>> the meaning of MDB_RTR_TYPE_TEMP. Can't you just use the timer instead ?
>>>> I think it will simplify the set and avoid all of this.
>>>>
>>>>> In order to not break the current kernel-user API, don't propagate the new
>>>>> state to the user and use it only in the bridge internal state. Thus, if
>>>>> the user reads (either via sysfs or netlink) the bridge device mrouter
>>>>> state, he will get the MDB_RTR_TYPE_TEMP_QUERY state even if the current
>>>>> bridge state is MDB_RTR_TYPE_TEMP.
>>>>>
>>>>> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
>>>>> Reviewed-by: Nogah Frankel <nogahf@mellanox.com>
>>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>>>> ---
>>>>> net/bridge/br_multicast.c | 25 +++++++++++++++++++++----
>>>>> net/bridge/br_netlink.c | 3 ++-
>>>>> net/bridge/br_private.h | 13 ++++++++++---
>>>>> net/bridge/br_sysfs_br.c | 3 ++-
>>>>> 4 files changed, 35 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
>>>>> index 8dc5c8d..b86307b 100644
>>>>> --- a/net/bridge/br_multicast.c
>>>>> +++ b/net/bridge/br_multicast.c
>>>>> @@ -861,6 +861,17 @@ static void br_multicast_router_expired(unsigned long data)
>>>>>
>>>>> static void br_multicast_local_router_expired(unsigned long data)
>>>>> {
>>>>> + struct net_bridge *br = (struct net_bridge *) data;
>>>>> +
>>>>> + spin_lock(&br->multicast_lock);
>>>>> + if (br->multicast_router == MDB_RTR_TYPE_DISABLED ||
>>>>> + br->multicast_router == MDB_RTR_TYPE_PERM ||
>>>>> + timer_pending(&br->multicast_router_timer))
>>>>> + goto out;
>>>>> +
>>>>> + br->multicast_router = MDB_RTR_TYPE_TEMP_QUERY;
>>>>> +out:
>>>>> + spin_unlock(&br->multicast_lock);
>>>>> }
>>>>>
>>>>> static void br_multicast_querier_expired(struct net_bridge *br,
>>>>> @@ -1364,9 +1375,12 @@ static void br_multicast_mark_router(struct net_bridge *br,
>>>>> unsigned long now = jiffies;
>>>>>
>>>>> if (!port) {
>>>>> - if (br->multicast_router == MDB_RTR_TYPE_TEMP_QUERY)
>>>>> + if (br->multicast_router == MDB_RTR_TYPE_TEMP_QUERY ||
>>>>> + br->multicast_router == MDB_RTR_TYPE_TEMP) {
>>>>> mod_timer(&br->multicast_router_timer,
>>>>> now + br->multicast_querier_interval);
>>>>> + br->multicast_router = MDB_RTR_TYPE_TEMP;
>>>>> + }
>>>
>>> These are the transitions:
>>> _TEMP -> _TEMP_QUERY
>>> _TEMP_QUERY -> _TEMP_QUERY
>> You clearly always set multicast_router to _TEMP, where did you see a transition
>> _TEMP_QUERY -> _TEMP_QUERY or _TEMP -> _TEMP_QUERY ?
>> All transitions are -> _TEMP, because you use it to differentiate between learning
>> state and when a query was received. Maybe we have different definition for
>> a transition :-)
>>
>>> In both transitions the timer is armed.
>>>
>>>
>>>>> return;
>>>>> }
>>>>>
>>>>> @@ -1952,7 +1966,7 @@ void br_multicast_init(struct net_bridge *br)
>>>>>
>>>>> spin_lock_init(&br->multicast_lock);
>>>>> setup_timer(&br->multicast_router_timer,
>>>>> - br_multicast_local_router_expired, 0);
>>>>> + br_multicast_local_router_expired, (unsigned long)br);
>>>>> setup_timer(&br->ip4_other_query.timer,
>>>>> br_ip4_multicast_querier_expired, (unsigned long)br);
>>>>> setup_timer(&br->ip4_own_query.timer, br_ip4_multicast_query_expired,
>>>>> @@ -2043,11 +2057,14 @@ int br_multicast_set_router(struct net_bridge *br, unsigned long val)
>>>>> case MDB_RTR_TYPE_DISABLED:
>>>>> case MDB_RTR_TYPE_PERM:
>>>>> del_timer(&br->multicast_router_timer);
>>>>> - /* fall through */
>>>>> - case MDB_RTR_TYPE_TEMP_QUERY:
>>>>> br->multicast_router = val;
>>>>> err = 0;
>>>>> break;
>>>>> + case MDB_RTR_TYPE_TEMP_QUERY:
>>>>> + if (br->multicast_router != MDB_RTR_TYPE_TEMP)
>>>>> + br->multicast_router = val;
>>>>> + err = 0;
>>>>> + break;
>>>>> }
>>>>>
>>>>> spin_unlock_bh(&br->multicast_lock);
>>>>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>>>>> index dea88a2..cee5016 100644
>>>>> --- a/net/bridge/br_netlink.c
>>>>> +++ b/net/bridge/br_netlink.c
>>>>> @@ -1357,7 +1357,8 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev)
>>>>> return -EMSGSIZE;
>>>>> #endif
>>>>> #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
>>>>> - if (nla_put_u8(skb, IFLA_BR_MCAST_ROUTER, br->multicast_router) ||
>>>>> + if (nla_put_u8(skb, IFLA_BR_MCAST_ROUTER,
>>>>> + br_multicast_router_translate(br->multicast_router)) ||
>>>>> nla_put_u8(skb, IFLA_BR_MCAST_SNOOPING, !br->multicast_disabled) ||
>>>>> nla_put_u8(skb, IFLA_BR_MCAST_QUERY_USE_IFADDR,
>>>>> br->multicast_query_use_ifaddr) ||
>>>>> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
>>>>> index ab4df24..e6e3fec 100644
>>>>> --- a/net/bridge/br_private.h
>>>>> +++ b/net/bridge/br_private.h
>>>>> @@ -649,9 +649,8 @@ void br_multicast_get_stats(const struct net_bridge *br,
>>>>>
>>>>> static inline bool br_multicast_is_router(struct net_bridge *br)
>>>>> {
>>>>> - return br->multicast_router == 2 ||
>>>>> - (br->multicast_router == 1 &&
>>>>> - timer_pending(&br->multicast_router_timer));
>>>>> + return br->multicast_router == MDB_RTR_TYPE_PERM ||
>>>>> + br->multicast_router == MDB_RTR_TYPE_TEMP;
>>>>> }
>>>>>
>>>>> static inline bool
>>>>> @@ -790,6 +789,14 @@ static inline int br_multicast_igmp_type(const struct sk_buff *skb)
>>>>> }
>>>>> #endif
>>>>>
>>>>> +static inline unsigned char
>>>> u8
>>>>
>>>>> +br_multicast_router_translate(unsigned char multicast_router)
>>>> u8, if need be change the type of the struct member
>>>
>>> Sorry, didn't quite get that. Currently, both the bridge_port and bridge have
>>> the multicast_router field, and in both cases it is of type unsigned char. Do
>>> you suggest to change both to be "u8"?
>>>
>> Right, and this is unnecessarily long and requires to be broken ugly like that with
>> the type above and name below. So I suggested to use u8 to avoid that.
>>
>>>>> +{
>>>>> + if (multicast_router == MDB_RTR_TYPE_TEMP)
>>>>> + return MDB_RTR_TYPE_TEMP_QUERY;
>>>>> + return multicast_router;
>>>>> +}
>>>>> +
>>>>> /* br_vlan.c */
>>>>> #ifdef CONFIG_BRIDGE_VLAN_FILTERING
>>>>> bool br_allowed_ingress(const struct net_bridge *br,
>>>>> diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
>>>>> index 723f25e..9b9c597 100644
>>>>> --- a/net/bridge/br_sysfs_br.c
>>>>> +++ b/net/bridge/br_sysfs_br.c
>>>>> @@ -340,7 +340,8 @@ static ssize_t multicast_router_show(struct device *d,
>>>>> struct device_attribute *attr, char *buf)
>>>>> {
>>>>> struct net_bridge *br = to_bridge(d);
>>>>> - return sprintf(buf, "%d\n", br->multicast_router);
>>>>> + return sprintf(buf, "%d\n",
>>>>> + br_multicast_router_translate(br->multicast_router));
>>>>> }
>>>>>
>>>>> static ssize_t multicast_router_store(struct device *d,
>>>>>
^ permalink raw reply
* [patch net-next repost 2/2] mlxsw: spectrum: Propagate extack further for bridge enslavements
From: Jiri Pirko @ 2017-10-08 9:57 UTC (permalink / raw)
To: netdev; +Cc: davem, idosch, mlxsw, dsahern
In-Reply-To: <20171008095756.2139-1-jiri@resnulli.us>
From: Ido Schimmel <idosch@mellanox.com>
The code that actually takes care of bridge offload introduces a few
more non-trivial constraints with regards to bridge enslavements.
Propagate extack there to indicate the reason.
$ ip link add link enp1s0np1 name enp1s0np1.10 type vlan id 10
$ ip link add link enp1s0np1 name enp1s0np1.20 type vlan id 20
$ ip link add name br0 type bridge
$ ip link set dev enp1s0np1.10 master br0
$ ip link set dev enp1s0np1.20 master br0
Error: spectrum: Can not bridge VLAN uppers of the same port.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
---
drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 6 ++++--
drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 3 ++-
.../ethernet/mellanox/mlxsw/spectrum_switchdev.c | 25 +++++++++++++++-------
3 files changed, 23 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 5ab4fd7..321988a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -4299,7 +4299,8 @@ static int mlxsw_sp_netdevice_port_upper_event(struct net_device *lower_dev,
if (info->linking)
err = mlxsw_sp_port_bridge_join(mlxsw_sp_port,
lower_dev,
- upper_dev);
+ upper_dev,
+ extack);
else
mlxsw_sp_port_bridge_leave(mlxsw_sp_port,
lower_dev,
@@ -4416,7 +4417,8 @@ static int mlxsw_sp_netdevice_port_vlan_event(struct net_device *vlan_dev,
if (info->linking)
err = mlxsw_sp_port_bridge_join(mlxsw_sp_port,
vlan_dev,
- upper_dev);
+ upper_dev,
+ extack);
else
mlxsw_sp_port_bridge_leave(mlxsw_sp_port,
vlan_dev,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index ae67e60..8e45183 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -326,7 +326,8 @@ void
mlxsw_sp_port_vlan_bridge_leave(struct mlxsw_sp_port_vlan *mlxsw_sp_port_vlan);
int mlxsw_sp_port_bridge_join(struct mlxsw_sp_port *mlxsw_sp_port,
struct net_device *brport_dev,
- struct net_device *br_dev);
+ struct net_device *br_dev,
+ struct netlink_ext_ack *extack);
void mlxsw_sp_port_bridge_leave(struct mlxsw_sp_port *mlxsw_sp_port,
struct net_device *brport_dev,
struct net_device *br_dev);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 0f9eac5..2cfdf22 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -46,6 +46,7 @@
#include <linux/workqueue.h>
#include <linux/jiffies.h>
#include <linux/rtnetlink.h>
+#include <linux/netlink.h>
#include <net/switchdev.h>
#include "spectrum.h"
@@ -107,7 +108,8 @@ struct mlxsw_sp_bridge_vlan {
struct mlxsw_sp_bridge_ops {
int (*port_join)(struct mlxsw_sp_bridge_device *bridge_device,
struct mlxsw_sp_bridge_port *bridge_port,
- struct mlxsw_sp_port *mlxsw_sp_port);
+ struct mlxsw_sp_port *mlxsw_sp_port,
+ struct netlink_ext_ack *extack);
void (*port_leave)(struct mlxsw_sp_bridge_device *bridge_device,
struct mlxsw_sp_bridge_port *bridge_port,
struct mlxsw_sp_port *mlxsw_sp_port);
@@ -1735,12 +1737,15 @@ static const struct switchdev_ops mlxsw_sp_port_switchdev_ops = {
static int
mlxsw_sp_bridge_8021q_port_join(struct mlxsw_sp_bridge_device *bridge_device,
struct mlxsw_sp_bridge_port *bridge_port,
- struct mlxsw_sp_port *mlxsw_sp_port)
+ struct mlxsw_sp_port *mlxsw_sp_port,
+ struct netlink_ext_ack *extack)
{
struct mlxsw_sp_port_vlan *mlxsw_sp_port_vlan;
- if (is_vlan_dev(bridge_port->dev))
+ if (is_vlan_dev(bridge_port->dev)) {
+ NL_SET_ERR_MSG(extack, "spectrum: Can not enslave a VLAN device to a VLAN-aware bridge");
return -EINVAL;
+ }
mlxsw_sp_port_vlan = mlxsw_sp_port_vlan_find_by_vid(mlxsw_sp_port, 1);
if (WARN_ON(!mlxsw_sp_port_vlan))
@@ -1797,13 +1802,16 @@ mlxsw_sp_port_is_br_member(const struct mlxsw_sp_port *mlxsw_sp_port,
static int
mlxsw_sp_bridge_8021d_port_join(struct mlxsw_sp_bridge_device *bridge_device,
struct mlxsw_sp_bridge_port *bridge_port,
- struct mlxsw_sp_port *mlxsw_sp_port)
+ struct mlxsw_sp_port *mlxsw_sp_port,
+ struct netlink_ext_ack *extack)
{
struct mlxsw_sp_port_vlan *mlxsw_sp_port_vlan;
u16 vid;
- if (!is_vlan_dev(bridge_port->dev))
+ if (!is_vlan_dev(bridge_port->dev)) {
+ NL_SET_ERR_MSG(extack, "spectrum: Only VLAN devices can be enslaved to a VLAN-unaware bridge");
return -EINVAL;
+ }
vid = vlan_dev_vlan_id(bridge_port->dev);
mlxsw_sp_port_vlan = mlxsw_sp_port_vlan_find_by_vid(mlxsw_sp_port, vid);
@@ -1811,7 +1819,7 @@ mlxsw_sp_bridge_8021d_port_join(struct mlxsw_sp_bridge_device *bridge_device,
return -EINVAL;
if (mlxsw_sp_port_is_br_member(mlxsw_sp_port, bridge_device->dev)) {
- netdev_err(mlxsw_sp_port->dev, "Can't bridge VLAN uppers of the same port\n");
+ NL_SET_ERR_MSG(extack, "spectrum: Can not bridge VLAN uppers of the same port");
return -EINVAL;
}
@@ -1854,7 +1862,8 @@ static const struct mlxsw_sp_bridge_ops mlxsw_sp_bridge_8021d_ops = {
int mlxsw_sp_port_bridge_join(struct mlxsw_sp_port *mlxsw_sp_port,
struct net_device *brport_dev,
- struct net_device *br_dev)
+ struct net_device *br_dev,
+ struct netlink_ext_ack *extack)
{
struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
struct mlxsw_sp_bridge_device *bridge_device;
@@ -1867,7 +1876,7 @@ int mlxsw_sp_port_bridge_join(struct mlxsw_sp_port *mlxsw_sp_port,
bridge_device = bridge_port->bridge_device;
err = bridge_device->ops->port_join(bridge_device, bridge_port,
- mlxsw_sp_port);
+ mlxsw_sp_port, extack);
if (err)
goto err_port_join;
--
2.9.5
^ permalink raw reply related
* [patch net-next repost 1/2] mlxsw: spectrum: Add extack for VLAN enslavements
From: Jiri Pirko @ 2017-10-08 9:57 UTC (permalink / raw)
To: netdev; +Cc: davem, idosch, mlxsw, dsahern
In-Reply-To: <20171008095756.2139-1-jiri@resnulli.us>
From: Ido Schimmel <idosch@mellanox.com>
Similar to physical ports, enslavement of VLAN devices can also fail.
Use extack to indicate why the enslavement failed.
$ ip link add link enp1s0np1 name enp1s0np1.10 type vlan id 10
$ ip link add name bond0 type bond mode 802.3ad
$ ip link set dev enp1s0np1.10 master bond0
Error: spectrum: VLAN devices only support bridge and VRF uppers.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
---
drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 5cd4df0..5ab4fd7 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -53,6 +53,7 @@
#include <linux/notifier.h>
#include <linux/dcbnl.h>
#include <linux/inetdevice.h>
+#include <linux/netlink.h>
#include <net/switchdev.h>
#include <net/pkt_cls.h>
#include <net/tc_act/tc_mirred.h>
@@ -4389,18 +4390,25 @@ static int mlxsw_sp_netdevice_port_vlan_event(struct net_device *vlan_dev,
{
struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
struct netdev_notifier_changeupper_info *info = ptr;
+ struct netlink_ext_ack *extack;
struct net_device *upper_dev;
int err = 0;
+ extack = netdev_notifier_info_to_extack(&info->info);
+
switch (event) {
case NETDEV_PRECHANGEUPPER:
upper_dev = info->upper_dev;
- if (!netif_is_bridge_master(upper_dev))
+ if (!netif_is_bridge_master(upper_dev)) {
+ NL_SET_ERR_MSG(extack, "spectrum: VLAN devices only support bridge and VRF uppers");
return -EINVAL;
+ }
if (!info->linking)
break;
- if (netdev_has_any_upper_dev(upper_dev))
+ if (netdev_has_any_upper_dev(upper_dev)) {
+ NL_SET_ERR_MSG(extack, "spectrum: Enslaving a port to a device that already has an upper device is not supported");
return -EINVAL;
+ }
break;
case NETDEV_CHANGEUPPER:
upper_dev = info->upper_dev;
--
2.9.5
^ permalink raw reply related
* [patch net-next repost 0/2] mlxsw: Add more extack error reporting
From: Jiri Pirko @ 2017-10-08 9:57 UTC (permalink / raw)
To: netdev; +Cc: davem, idosch, mlxsw, dsahern
From: Jiri Pirko <jiri@mellanox.com>
Ido says:
Add error messages to VLAN and bridge enslavements to help users
understand why the enslavement failed.
Ido Schimmel (2):
mlxsw: spectrum: Add extack for VLAN enslavements
mlxsw: spectrum: Propagate extack further for bridge enslavements
drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 18 ++++++++++++----
drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 3 ++-
.../ethernet/mellanox/mlxsw/spectrum_switchdev.c | 25 +++++++++++++++-------
3 files changed, 33 insertions(+), 13 deletions(-)
--
2.9.5
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox