* Re: [PATCH net] pppoe: fix reception of frames with no mac header
From: Guillaume Nault @ 2018-09-14 14:35 UTC (permalink / raw)
To: netdev; +Cc: Alexander Potapenko, Michal Ostrowski, Eric Dumazet
In-Reply-To: <274ac54fa02052104201d4738a6326a637e87a83.1536935190.git.g.nault@alphalink.fr>
On Fri, Sep 14, 2018 at 04:28:05PM +0200, Guillaume Nault wrote:
> pppoe_rcv() needs to look back at the Ethernet header in order to
> lookup the PPPoE session. Therefore we need to ensure that the mac
> header is big enough to contain an Ethernet header. Otherwise
> eth_hdr(skb)->h_source might access invalid data.
>
Forgot to Cc Alexander :/
Sorry...
BTW, thanks for your first analysis.
^ permalink raw reply
* [PATCH net] pppoe: fix reception of frames with no mac header
From: Guillaume Nault @ 2018-09-14 14:28 UTC (permalink / raw)
To: netdev; +Cc: Michal Ostrowski, Eric Dumazet
pppoe_rcv() needs to look back at the Ethernet header in order to
lookup the PPPoE session. Therefore we need to ensure that the mac
header is big enough to contain an Ethernet header. Otherwise
eth_hdr(skb)->h_source might access invalid data.
==================================================================
BUG: KMSAN: uninit-value in __get_item drivers/net/ppp/pppoe.c:172 [inline]
BUG: KMSAN: uninit-value in get_item drivers/net/ppp/pppoe.c:236 [inline]
BUG: KMSAN: uninit-value in pppoe_rcv+0xcef/0x10e0 drivers/net/ppp/pppoe.c:450
CPU: 0 PID: 4543 Comm: syz-executor355 Not tainted 4.16.0+ #87
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x185/0x1d0 lib/dump_stack.c:53
kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
__msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
__get_item drivers/net/ppp/pppoe.c:172 [inline]
get_item drivers/net/ppp/pppoe.c:236 [inline]
pppoe_rcv+0xcef/0x10e0 drivers/net/ppp/pppoe.c:450
__netif_receive_skb_core+0x47df/0x4a90 net/core/dev.c:4562
__netif_receive_skb net/core/dev.c:4627 [inline]
netif_receive_skb_internal+0x49d/0x630 net/core/dev.c:4701
netif_receive_skb+0x230/0x240 net/core/dev.c:4725
tun_rx_batched drivers/net/tun.c:1555 [inline]
tun_get_user+0x740f/0x7c60 drivers/net/tun.c:1962
tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990
call_write_iter include/linux/fs.h:1782 [inline]
new_sync_write fs/read_write.c:469 [inline]
__vfs_write+0x7fb/0x9f0 fs/read_write.c:482
vfs_write+0x463/0x8d0 fs/read_write.c:544
SYSC_write+0x172/0x360 fs/read_write.c:589
SyS_write+0x55/0x80 fs/read_write.c:581
do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x4447c9
RSP: 002b:00007fff64c8fc28 EFLAGS: 00000297 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004447c9
RDX: 000000000000fd87 RSI: 0000000020000600 RDI: 0000000000000004
RBP: 00000000006cf018 R08: 00007fff64c8fda8 R09: 00007fff00006bda
R10: 0000000000005fe7 R11: 0000000000000297 R12: 00000000004020d0
R13: 0000000000402160 R14: 0000000000000000 R15: 0000000000000000
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
slab_post_alloc_hook mm/slab.h:445 [inline]
slab_alloc_node mm/slub.c:2737 [inline]
__kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
__kmalloc_reserve net/core/skbuff.c:138 [inline]
__alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
alloc_skb include/linux/skbuff.h:984 [inline]
alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
tun_alloc_skb drivers/net/tun.c:1532 [inline]
tun_get_user+0x2242/0x7c60 drivers/net/tun.c:1829
tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990
call_write_iter include/linux/fs.h:1782 [inline]
new_sync_write fs/read_write.c:469 [inline]
__vfs_write+0x7fb/0x9f0 fs/read_write.c:482
vfs_write+0x463/0x8d0 fs/read_write.c:544
SYSC_write+0x172/0x360 fs/read_write.c:589
SyS_write+0x55/0x80 fs/read_write.c:581
do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
==================================================================
Fixes: 224cf5ad14c0 ("ppp: Move the PPP drivers")
Reported-by: syzbot+f5f6080811c849739212@syzkaller.appspotmail.com
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
drivers/net/ppp/pppoe.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index ce61231e96ea..62dc564b251d 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -429,6 +429,9 @@ static int pppoe_rcv(struct sk_buff *skb, struct net_device *dev,
if (!skb)
goto out;
+ if (skb_mac_header_len(skb) < ETH_HLEN)
+ goto drop;
+
if (!pskb_may_pull(skb, sizeof(struct pppoe_hdr)))
goto drop;
--
2.19.0
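
For illustration, the length check added by this patch can be modelled in userspace C. This is only a minimal sketch under stated assumptions: `struct fake_skb`, `fake_mac_header_len()` and `fake_has_eth_header()` are hypothetical stand-ins, not kernel code. Only the value of `ETH_HLEN` (14) and the idea behind `skb_mac_header_len()` (network header offset minus mac header offset) are taken from the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

#define ETH_HLEN 14 /* size of an Ethernet header, same value as in the kernel */

/* Hypothetical, simplified stand-in for struct sk_buff. */
struct fake_skb {
	size_t mac_header;     /* offset of the link-layer header */
	size_t network_header; /* offset of the data that follows it */
};

/* Same idea as skb_mac_header_len(): bytes between the two offsets. */
size_t fake_mac_header_len(const struct fake_skb *skb)
{
	return skb->network_header - skb->mac_header;
}

/* True only if a full Ethernet header sits in front of the payload. */
bool fake_has_eth_header(const struct fake_skb *skb)
{
	return fake_mac_header_len(skb) >= ETH_HLEN;
}
```

A frame injected with no link-layer header (both offsets equal) fails the check and would be dropped, whereas a frame with a 14-byte mac header passes.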
^ permalink raw reply related
* Re: [PATCH net] veth: Orphan skb before GRO
From: Paolo Abeni @ 2018-09-14 14:16 UTC (permalink / raw)
To: Toshiaki Makita, David S. Miller; +Cc: netdev, Eric Dumazet
In-Reply-To: <1536899624-2438-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>
On Fri, 2018-09-14 at 13:33 +0900, Toshiaki Makita wrote:
> GRO expects skbs not to be owned by sockets, but when XDP is enabled veth
> passed skbs owned by sockets. It caused corrupted sk_wmem_alloc.
>
> Paolo Abeni reported the following splat:
>
> [ 362.098904] refcount_t overflow at skb_set_owner_w+0x5e/0xa0 in iperf3[1644], uid/euid: 0/0
> [ 362.108239] WARNING: CPU: 0 PID: 1644 at kernel/panic.c:648 refcount_error_report+0xa0/0xa4
> [ 362.117547] Modules linked in: tcp_diag inet_diag veth intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf ipmi_ssif iTCO_wdt sg ipmi_si iTCO_vendor_support ipmi_devintf mxm_wmi ipmi_msghandler pcspkr dcdbas mei_me wmi mei lpc_ich acpi_power_meter pcc_cpufreq xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ixgbe igb ttm ahci mdio libahci ptp crc32c_intel drm pps_core libata i2c_algo_bit dca dm_mirror dm_region_hash dm_log dm_mod
> [ 362.176622] CPU: 0 PID: 1644 Comm: iperf3 Not tainted 4.19.0-rc2.vanilla+ #2025
> [ 362.184777] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
> [ 362.193124] RIP: 0010:refcount_error_report+0xa0/0xa4
> [ 362.198758] Code: 08 00 00 48 8b 95 80 00 00 00 49 8d 8c 24 80 0a 00 00 41 89 c1 44 89 2c 24 48 89 de 48 c7 c7 18 4d e7 9d 31 c0 e8 30 fa ff ff <0f> 0b eb 88 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 49 89 fc
> [ 362.219711] RSP: 0018:ffff9ee6ff603c20 EFLAGS: 00010282
> [ 362.225538] RAX: 0000000000000000 RBX: ffffffff9de83e10 RCX: 0000000000000000
> [ 362.233497] RDX: 0000000000000001 RSI: ffff9ee6ff6167d8 RDI: ffff9ee6ff6167d8
> [ 362.241457] RBP: ffff9ee6ff603d78 R08: 0000000000000490 R09: 0000000000000004
> [ 362.249416] R10: 0000000000000000 R11: ffff9ee6ff603990 R12: ffff9ee664b94500
> [ 362.257377] R13: 0000000000000000 R14: 0000000000000004 R15: ffffffff9de615f9
> [ 362.265337] FS: 00007f1d22d28740(0000) GS:ffff9ee6ff600000(0000) knlGS:0000000000000000
> [ 362.274363] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 362.280773] CR2: 00007f1d222f35d0 CR3: 0000001fddfec003 CR4: 00000000001606f0
> [ 362.288733] Call Trace:
> [ 362.291459] <IRQ>
> [ 362.293702] ex_handler_refcount+0x4e/0x80
> [ 362.298269] fixup_exception+0x35/0x40
> [ 362.302451] do_trap+0x109/0x150
> [ 362.306048] do_error_trap+0xd5/0x130
> [ 362.315766] invalid_op+0x14/0x20
> [ 362.319460] RIP: 0010:skb_set_owner_w+0x5e/0xa0
> [ 362.324512] Code: ef ff ff 74 49 48 c7 43 60 20 7b 4a 9d 8b 85 f4 01 00 00 85 c0 75 16 8b 83 e0 00 00 00 f0 01 85 44 01 00 00 0f 88 d8 23 16 00 <5b> 5d c3 80 8b 91 00 00 00 01 8b 85 f4 01 00 00 89 83 a4 00 00 00
> [ 362.345465] RSP: 0018:ffff9ee6ff603e20 EFLAGS: 00010a86
> [ 362.351291] RAX: 0000000000001100 RBX: ffff9ee65deec700 RCX: ffff9ee65e829244
> [ 362.359250] RDX: 0000000000000100 RSI: ffff9ee65e829100 RDI: ffff9ee65deec700
> [ 362.367210] RBP: ffff9ee65e829100 R08: 000000000002a380 R09: 0000000000000000
> [ 362.375169] R10: 0000000000000002 R11: fffff1a4bf77bb00 R12: ffffc0754661d000
> [ 362.383130] R13: ffff9ee65deec200 R14: ffff9ee65f597000 R15: 00000000000000aa
> [ 362.391092] veth_xdp_rcv+0x4e4/0x890 [veth]
> [ 362.399357] veth_poll+0x4d/0x17a [veth]
> [ 362.403731] net_rx_action+0x2af/0x3f0
> [ 362.407912] __do_softirq+0xdd/0x29e
> [ 362.411897] do_softirq_own_stack+0x2a/0x40
> [ 362.416561] </IRQ>
> [ 362.418899] do_softirq+0x4b/0x70
> [ 362.422594] __local_bh_enable_ip+0x50/0x60
> [ 362.427258] ip_finish_output2+0x16a/0x390
> [ 362.431824] ip_output+0x71/0xe0
> [ 362.440670] __tcp_transmit_skb+0x583/0xab0
> [ 362.445333] tcp_write_xmit+0x247/0xfb0
> [ 362.449609] __tcp_push_pending_frames+0x2d/0xd0
> [ 362.454760] tcp_sendmsg_locked+0x857/0xd30
> [ 362.459424] tcp_sendmsg+0x27/0x40
> [ 362.463216] sock_sendmsg+0x36/0x50
> [ 362.467104] sock_write_iter+0x87/0x100
> [ 362.471382] __vfs_write+0x112/0x1a0
> [ 362.475369] vfs_write+0xad/0x1a0
> [ 362.479062] ksys_write+0x52/0xc0
> [ 362.482759] do_syscall_64+0x5b/0x180
> [ 362.486841] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 362.492473] RIP: 0033:0x7f1d22293238
> [ 362.496458] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 c5 54 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
> [ 362.517409] RSP: 002b:00007ffebaef8008 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 362.525855] RAX: ffffffffffffffda RBX: 0000000000002800 RCX: 00007f1d22293238
> [ 362.533816] RDX: 0000000000002800 RSI: 00007f1d22d36000 RDI: 0000000000000005
> [ 362.541775] RBP: 00007f1d22d36000 R08: 00000002db777a30 R09: 0000562b70712b20
> [ 362.549734] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005
> [ 362.557693] R13: 0000000000002800 R14: 00007ffebaef8060 R15: 0000562b70712260
>
> In order to avoid this, orphan the skb before entering GRO.
>
> Fixes: 948d4f214fde ("veth: Add driver XDP")
> Reported-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> ---
> drivers/net/veth.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 8d679c8..41a00cd 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -463,6 +463,8 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, struct sk_buff *skb,
> int mac_len, delta, off;
> struct xdp_buff xdp;
>
> + skb_orphan(skb);
> +
> rcu_read_lock();
> xdp_prog = rcu_dereference(rq->xdp_prog);
> if (unlikely(!xdp_prog)) {
> @@ -508,8 +510,6 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, struct sk_buff *skb,
> skb_copy_header(nskb, skb);
> head_off = skb_headroom(nskb) - skb_headroom(skb);
> skb_headers_offset_update(nskb, head_off);
> - if (skb->sk)
> - skb_set_owner_w(nskb, skb->sk);
> consume_skb(skb);
> skb = nskb;
> }
I just gave it a run in my test environment, and it fixes the reported
issue.
Tested-by: Paolo Abeni <pabeni@redhat.com>
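
The ownership accounting behind this fix can be sketched in userspace C. This is a simplified model, not the kernel implementation: `struct fake_sock`, `struct fake_skb` and the `fake_*` functions are hypothetical, and the real code uses a `refcount_t` for `sk_wmem_alloc` rather than a plain counter. The point it illustrates is that orphaning runs the destructor once and clears the owner, so nothing further down the receive path can charge or uncharge the socket again.

```c
#include <stddef.h>

/* Hypothetical, simplified socket: bytes charged by in-flight buffers. */
struct fake_sock {
	long wmem_alloc;
};

/* Hypothetical, simplified buffer with an optional owning socket. */
struct fake_skb {
	struct fake_sock *sk;
	long truesize;
	void (*destructor)(struct fake_skb *skb);
};

/* Uncharge the owning socket, as a write-side destructor would. */
void fake_wfree(struct fake_skb *skb)
{
	skb->sk->wmem_alloc -= skb->truesize;
}

/* Charge the buffer to a socket (modelled on skb_set_owner_w()). */
void fake_set_owner_w(struct fake_skb *skb, struct fake_sock *sk)
{
	skb->sk = sk;
	skb->destructor = fake_wfree;
	sk->wmem_alloc += skb->truesize;
}

/* Drop ownership (modelled on skb_orphan()): run destructor, clear owner. */
void fake_orphan(struct fake_skb *skb)
{
	if (skb->destructor) {
		skb->destructor(skb);
		skb->destructor = NULL;
		skb->sk = NULL;
	}
}
```

Calling `fake_orphan()` a second time is a no-op in this model, which is why orphaning early is safe regardless of what later stages do with the buffer.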
^ permalink raw reply
* [PATCH iproute2] q_cake: Also print nonat, nowash and no-ack-filter keywords
From: Toke Høiland-Jørgensen @ 2018-09-14 13:51 UTC (permalink / raw)
To: netdev; +Cc: cake, Toke Høiland-Jørgensen
Similar to the previous patch for no-split-gso, the negative keywords for
'nat', 'wash' and 'ack-filter' were not printed either. Add those as well.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
tc/q_cake.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tc/q_cake.c b/tc/q_cake.c
index 077bf84f..e827e3f1 100644
--- a/tc/q_cake.c
+++ b/tc/q_cake.c
@@ -468,6 +468,8 @@ static int cake_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
if (nat)
print_string(PRINT_FP, NULL, "nat ", NULL);
+ else
+ print_string(PRINT_FP, NULL, "nonat ", NULL);
print_bool(PRINT_JSON, "nat", NULL, nat);
if (tb[TCA_CAKE_WASH] &&
@@ -508,6 +510,8 @@ static int cake_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
if (wash)
print_string(PRINT_FP, NULL, "wash ", NULL);
+ else
+ print_string(PRINT_FP, NULL, "nowash ", NULL);
print_bool(PRINT_JSON, "wash", NULL, wash);
if (ingress)
@@ -520,7 +524,7 @@ static int cake_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
else if (ack_filter == CAKE_ACK_FILTER)
print_string(PRINT_ANY, "ack-filter", "ack-filter ", "enabled");
else
- print_string(PRINT_JSON, "ack-filter", NULL, "disabled");
+ print_string(PRINT_ANY, "ack-filter", "no-ack-filter ", "disabled");
if (split_gso)
print_string(PRINT_FP, NULL, "split-gso ", NULL);
--
2.18.0
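
The printing rule this patch applies (always emit either the keyword or its negated form) can be sketched as a standalone helper. `format_keyword()` is a hypothetical function written for illustration; tc/q_cake.c calls print_string() directly, as the diff above shows. The negated spelling is passed in explicitly because it is not uniform ("nonat" and "nowash" versus "no-ack-filter").

```c
#include <stdio.h>

/*
 * Write "<name> " when the option is on and "<negated> " when it is off.
 * Hypothetical helper; returns what snprintf() returns.
 */
int format_keyword(char *buf, size_t len, int enabled,
		   const char *name, const char *negated)
{
	return snprintf(buf, len, "%s ", enabled ? name : negated);
}
```

With this shape, the plain-text output always contains one of the two spellings, so it stays consistent with the options that were passed on the command line.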
^ permalink raw reply related
* Re: [PATCH iproute2] q_cake: Add printing of no-split-gso option
From: Toke Høiland-Jørgensen @ 2018-09-14 13:40 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20180912130743.1adfe86b@xeon-e3>
Stephen Hemminger <stephen@networkplumber.org> writes:
> On Wed, 12 Sep 2018 00:32:16 +0200
> Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>
>> When the GSO splitting was turned into dual split-gso/no-split-gso options,
>> the printing of the latter was left out. Add that, so output is consistent
>> with the options passed.
>>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>
> Applied. I noticed that nat/nonat and wash/nowash have similar missing
> output.
Thanks! And yeah, you're right; I'll send another patch :)
-Toke
^ permalink raw reply
* Re: [PATCH] net/mlx4_core: print firmware version during driver loading
From: Qing Huang @ 2018-09-14 18:33 UTC (permalink / raw)
To: Andrew Lunn
Cc: Leon Romanovsky, netdev, linux-rdma, linux-kernel, tariqt, davem
In-Reply-To: <20180914181718.GD3811@lunn.ch>
On 9/14/2018 11:17 AM, Andrew Lunn wrote:
> On Fri, Sep 14, 2018 at 10:15:48AM -0700, Qing Huang wrote:
>> The FW version is actually a very crucial piece of information and only
>> printed once here
>> when the driver is loaded. People tend to get confused when switching
>> multiple FW files
>> back and forth without running separate utility tools, especially at
>> customer sites.
>> IMHO, this information is very useful and only takes up very little log file
>> space. :-)
> Why not use ethtool -i ?
>
> $ sudo ethtool -i eth0
> driver: r8169
> version: 2.3LK-NAPI
> firmware-version: rtl8168g-2_0.0.1 02/06/13
>
> Andrew
Sure. You can also use the ibstat or ibv_devinfo tools if they are installed.
But it's not very convenient in some cases.
E.g.
A customer upgrades FW on HCAs and encounters issues. During triage,
it's much easier to study customer uploaded log files when remotely
testing different FW files.
Thanks.
^ permalink raw reply
* KMSAN: uninit-value in do_ip_vs_set_ctl
From: syzbot @ 2018-09-14 18:23 UTC (permalink / raw)
To: coreteam, davem, fw, horms, ja, kadlec, linux-kernel, lvs-devel,
netdev, netfilter-devel, pablo, syzkaller-bugs, wensong
Hello,
syzbot found the following crash on:
HEAD commit: 06b2df0593a8 kmsan: unpoison only the created pages in get..
git tree: https://github.com/google/kmsan.git/master
console output: https://syzkaller.appspot.com/x/log.txt?x=11a6ae37800000
kernel config: https://syzkaller.appspot.com/x/.config?x=4ca1e57bafa8ab1f
dashboard link: https://syzkaller.appspot.com/bug?extid=23b5f9e7caf61d9a3898
compiler: clang version 7.0.0 (trunk 329391)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14008417800000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11deb017800000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+23b5f9e7caf61d9a3898@syzkaller.appspotmail.com
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
==================================================================
BUG: KMSAN: uninit-value in do_ip_vs_set_ctl+0x15ac/0x2760 net/netfilter/ipvs/ip_vs_ctl.c:2424
CPU: 1 PID: 4464 Comm: syz-executor844 Not tainted 4.17.0-rc3+ #94
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x185/0x1d0 lib/dump_stack.c:113
kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1084
__msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
do_ip_vs_set_ctl+0x15ac/0x2760 net/netfilter/ipvs/ip_vs_ctl.c:2424
nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
nf_setsockopt+0x476/0x4d0 net/netfilter/nf_sockopt.c:115
ip_setsockopt+0x24b/0x2b0 net/ipv4/ip_sockglue.c:1253
raw_setsockopt+0x2e5/0x350 net/ipv4/raw.c:868
sock_common_setsockopt+0x136/0x170 net/core/sock.c:3039
__sys_setsockopt+0x4af/0x560 net/socket.c:1903
__do_sys_setsockopt net/socket.c:1914 [inline]
__se_sys_setsockopt net/socket.c:1911 [inline]
__x64_sys_setsockopt+0x15c/0x1c0 net/socket.c:1911
do_syscall_64+0x154/0x220 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x43fca9
RSP: 002b:00007fff7a4795b8 EFLAGS: 00000213 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fca9
RDX: 0000000000000480 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
R10: 0000000000000000 R11: 0000000000000213 R12: 00000000004015d0
R13: 0000000000401660 R14: 0000000000000000 R15: 0000000000000000
Local variable description: ----arg@do_ip_vs_set_ctl
Variable was created at:
read_pnet include/net/net_namespace.h:288 [inline]
sock_net include/net/sock.h:2306 [inline]
do_ip_vs_set_ctl+0x93/0x2760 net/netfilter/ipvs/ip_vs_ctl.c:2347
nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
nf_setsockopt+0x476/0x4d0 net/netfilter/nf_sockopt.c:115
==================================================================
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches
^ permalink raw reply
* Re: [PATCH] net/mlx4_core: print firmware version during driver loading
From: Andrew Lunn @ 2018-09-14 18:17 UTC (permalink / raw)
To: Qing Huang
Cc: Leon Romanovsky, netdev, linux-rdma, linux-kernel, tariqt, davem
In-Reply-To: <c580ad9d-b63d-743b-2278-1c4cf3553186@oracle.com>
On Fri, Sep 14, 2018 at 10:15:48AM -0700, Qing Huang wrote:
> The FW version is actually a very crucial piece of information and only
> printed once here
> when the driver is loaded. People tend to get confused when switching
> multiple FW files
> back and forth without running separate utility tools, especially at
> customer sites.
> IMHO, this information is very useful and only takes up very little log file
> space. :-)
Why not use ethtool -i ?
$ sudo ethtool -i eth0
driver: r8169
version: 2.3LK-NAPI
firmware-version: rtl8168g-2_0.0.1 02/06/13
Andrew
^ permalink raw reply
* Re: [PATCH net-next 4/4] bnxt_en: Always forward VF MAC address to the PF.
From: Siwei Liu @ 2018-09-14 12:49 UTC (permalink / raw)
To: Michael Chan; +Cc: David Miller, Netdev, si-wei liu
In-Reply-To: <1525763921-20698-5-git-send-email-michael.chan@broadcom.com>
This commit is toxic; if possible, I hope it can be reverted and
reworked with a new patch.
First, the patch introduced backward-incompatible changes to the bnxt_en
VF driver that cause issues when interoperating with an old PF
driver that lacks this commit. In that case, VF probing fails from
within the VM:
[ 5.660331] Broadcom NetXtreme-C/E driver bnxt_en v1.9.1
[ 5.663653] bnxt_en 0000:00:03.0 (unnamed net_device) (uninitialized): hwrm req_type 0xf seq id 0x6 error 0x4
[ 5.665804] bnxt_en 0000:00:03.0 (unnamed net_device) (uninitialized): VF MAC address 00:01:02:03:04:05 not approved by the PF
[ 5.668268] bnxt_en 0000:00:03.0: Unable to initialize mac address.
[ 5.670974] bnxt_en: probe of 0000:00:03.0 failed with error -99
Second, this commit contains driver changes to both PF and VF side,
and incorrectly assumes that both PF and VF can/should be updated at
the same time to resolve the original issue (zero VF MAC address in
'ip link show') it tried to address. In fact that is not warranted. A
potential warranted fix is for VF driver to ignore what
bnxt_approve_mac() may return when it got a valid MAC address from the
firmware. The only purpose for the bnxt_approve_mac call for this case
is a best-effort attempt to inform PF of the MAC address, instead of
failing the VF driver probe when talking to an old PF driver.
Canonical reported a similar issue a few days back due to the same cause.
https://www.spinics.net/lists/netdev/msg521428.html
Regards,
-Siwei
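
The VF-side policy suggested above can be sketched as follows. This is an illustrative model only, not driver code: `vf_init_mac()`, the `approve_mac_fn` callback type and the two PF stubs are hypothetical names. The sketch assumes the approval request returns 0 on success and a negative value on failure.

```c
#include <stdbool.h>

/* Hypothetical PF approval callback: 0 on success, negative on error. */
typedef int (*approve_mac_fn)(const unsigned char *mac);

/* Hypothetical stub for an old PF driver that rejects the request. */
int old_pf_approve(const unsigned char *mac)
{
	(void)mac;
	return -1;
}

/* Hypothetical stub for a PF driver that accepts the request. */
int new_pf_approve(const unsigned char *mac)
{
	(void)mac;
	return 0;
}

/*
 * MAC provided by firmware: inform the PF, but do not fail the probe if
 * the PF rejects the request (best effort only).
 * MAC generated randomly by the VF: the PF still has to approve it.
 */
int vf_init_mac(const unsigned char *mac, bool mac_from_firmware,
		approve_mac_fn approve)
{
	int rc = approve(mac);

	if (mac_from_firmware)
		return 0;
	return rc;
}
```

Under this policy a VF that received a valid MAC address from the firmware still probes successfully against an old PF, while a randomly generated address keeps requiring approval.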
On Tue, May 8, 2018 at 12:18 AM, Michael Chan <michael.chan@broadcom.com> wrote:
> The current code already forwards the VF MAC address to the PF, except
> in one case. If the VF driver gets a valid MAC address from the firmware
> during probe time, it will not forward the MAC address to the PF,
> incorrectly assuming that the PF already knows the MAC address. This
> causes "ip link show" to show zero VF MAC addresses for this case.
>
> This assumption is not correct. Newer firmware remembers the VF MAC
> address last used by the VF and provides it to the VF driver during
> probe. So we need to always forward the VF MAC address to the PF.
>
> The forwarded MAC address may now be the PF assigned MAC address and so we
> need to make sure we approve it for this case.
>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> ---
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
> drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 3 ++-
> 2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index cd3ab78..dfa0839 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -8678,8 +8678,8 @@ static int bnxt_init_mac_addr(struct bnxt *bp)
> memcpy(bp->dev->dev_addr, vf->mac_addr, ETH_ALEN);
> } else {
> eth_hw_addr_random(bp->dev);
> - rc = bnxt_approve_mac(bp, bp->dev->dev_addr);
> }
> + rc = bnxt_approve_mac(bp, bp->dev->dev_addr);
> #endif
> }
> return rc;
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
> index cc21d87..a649108 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
> @@ -923,7 +923,8 @@ static int bnxt_vf_configure_mac(struct bnxt *bp, struct bnxt_vf_info *vf)
> if (req->enables & cpu_to_le32(FUNC_VF_CFG_REQ_ENABLES_DFLT_MAC_ADDR)) {
> if (is_valid_ether_addr(req->dflt_mac_addr) &&
> ((vf->flags & BNXT_VF_TRUST) ||
> - (!is_valid_ether_addr(vf->mac_addr)))) {
> + !is_valid_ether_addr(vf->mac_addr) ||
> + ether_addr_equal(req->dflt_mac_addr, vf->mac_addr))) {
> ether_addr_copy(vf->vf_mac_addr, req->dflt_mac_addr);
> return bnxt_hwrm_exec_fwd_resp(bp, vf, msg_size);
> }
> --
> 1.8.3.1
>
^ permalink raw reply
* Re: [PATCH 5/7] MIPS: mscc: ocelot: add GPIO4 pinmuxing DT node
From: Alexandre Belloni @ 2018-09-14 18:02 UTC (permalink / raw)
To: Quentin Schulz
Cc: ralf, paul.burton, jhogan, robh+dt, mark.rutland, davem, andrew,
f.fainelli, allan.nielsen, linux-mips, devicetree, linux-kernel,
netdev, thomas.petazzoni, antoine.tenart
In-Reply-To: <20180914162638.fgzzjin2bzgx74de@qschulz>
On 14/09/2018 18:26:38+0200, Quentin Schulz wrote:
> Hi Alexandre,
>
> On Fri, Sep 14, 2018 at 04:54:46PM +0200, Alexandre Belloni wrote:
> > Hi,
> >
> > On 14/09/2018 11:44:26+0200, Quentin Schulz wrote:
> > > In order to use GPIO4 as a GPIO, we need to mux it in this mode so let's
> > > declare a new pinctrl DT node for it.
> > >
> > > Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
> > > ---
> > > arch/mips/boot/dts/mscc/ocelot.dtsi | 5 +++++
> > > 1 file changed, 5 insertions(+)
> > >
> > > diff --git a/arch/mips/boot/dts/mscc/ocelot.dtsi b/arch/mips/boot/dts/mscc/ocelot.dtsi
> > > index 8ce317c..b5c4c74 100644
> > > --- a/arch/mips/boot/dts/mscc/ocelot.dtsi
> > > +++ b/arch/mips/boot/dts/mscc/ocelot.dtsi
> > > @@ -182,6 +182,11 @@
> > > interrupts = <13>;
> > > #interrupt-cells = <2>;
> > >
> > > + gpio4: gpio4 {
> > > + pins = "GPIO_4";
> > > + function = "gpio";
> > > + };
> > > +
> >
> > For a GPIO, I would do that in the board dts because it is not used
> > directly in the dtsi.
> >
>
> And the day we've two boards using this pinctrl we move it to a dtsi. Is
> that the plan?
>
Not really, at least not for gpios. I've included the pinctrl for the
uart, i2c and spi because they are the only option if you are to use
those peripherals. Otherwise, I would have left the pinctrl to the board
file. From my point of view, the gpios are too board specific to be in a
soc dtsi.
--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
^ permalink raw reply
* Re: [PATCH net-next 3/7] net: phy: mscc: split config_init in two functions for VSC8584
From: Florian Fainelli @ 2018-09-14 17:57 UTC (permalink / raw)
To: Quentin Schulz, alexandre.belloni, ralf, paul.burton, jhogan,
robh+dt, mark.rutland, davem, andrew
Cc: allan.nielsen, linux-mips, devicetree, linux-kernel, netdev,
thomas.petazzoni, antoine.tenart
In-Reply-To: <5daa7f3e467b218410238ef0fb97f01779f8f49f.1536916714.git-series.quentin.schulz@bootlin.com>
On 09/14/2018 02:44 AM, Quentin Schulz wrote:
> Part of the config init is common between the VSC8584 and the VSC8574,
> so to prepare the upcoming support for VSC8574, separate config_init
> PHY-specific code to config_pre_init function which is set in the probe
> function of the PHY and used in config_init.
>
> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
> ---
> drivers/net/phy/mscc.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/phy/mscc.c b/drivers/net/phy/mscc.c
> index b450489..69cc3cf 100644
> --- a/drivers/net/phy/mscc.c
> +++ b/drivers/net/phy/mscc.c
> @@ -355,6 +355,7 @@ struct vsc8531_private {
> u64 *stats;
> int nstats;
> bool pkg_init;
> + int (*config_pre_init)(struct mii_bus *bus, int phy);
Isn't this overkill? Given that you have a reference to the phy_device,
you could check the phy_id to know which exact type you have and
call the appropriate pre_init function.
unsigned int phy might be more appropriate.
--
Florian
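
The alternative raised in this reply, dispatching on the PHY ID instead of storing a function pointer, could look roughly like the sketch below. All names and ID values here are hypothetical placeholders chosen for illustration; the real IDs, mask and pre-init routines belong to the driver and are not reproduced here.

```c
#include <stdint.h>

/* Illustrative IDs and mask only, not the real MSCC values. */
#define EXAMPLE_PHY_ID_A    0x00000a00u
#define EXAMPLE_PHY_ID_B    0x00000b00u
#define EXAMPLE_PHY_ID_MASK 0xfffffff0u

/* Hypothetical per-model pre-init routines. */
int example_pre_init_a(void) { return 1; }
int example_pre_init_b(void) { return 2; }

/* Choose the pre-init routine from the PHY ID rather than a stored pointer. */
int example_config_pre_init(uint32_t phy_id)
{
	switch (phy_id & EXAMPLE_PHY_ID_MASK) {
	case EXAMPLE_PHY_ID_A:
		return example_pre_init_a();
	case EXAMPLE_PHY_ID_B:
		return example_pre_init_b();
	default:
		return -1; /* unknown PHY model */
	}
}
```

The trade-off is one switch statement in place of one pointer in the private struct; the mask discards the low revision bits so that all revisions of a model share a case.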
^ permalink raw reply
* Re: [PATCH net-next v4 18/20] crypto: port ChaCha20 to Zinc
From: Jason A. Donenfeld @ 2018-09-14 17:49 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: LKML, Netdev, Linux Crypto Mailing List, David Miller,
Greg Kroah-Hartman, Samuel Neves, Andrew Lutomirski,
Jean-Philippe Aumasson, Eric Biggers
In-Reply-To: <CAKv+Gu-wwFJOL82+iJYCu8rbzeDWLYH=5PtGOJBUouB1zdiZjg@mail.gmail.com>
On Fri, Sep 14, 2018 at 7:38 PM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> so could we please bring that discussion to a close before we drop the ARM code?
My understanding is that either these will find their way up to AndyP
and then back down here, or Eric or you will augment the .S in this
patch at a later date with an improvement commit that includes some
benchmarks.
Jason
^ permalink raw reply
* Re: [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel
From: Jason A. Donenfeld @ 2018-09-14 17:47 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: LKML, Netdev, Linux Crypto Mailing List, David Miller,
Greg Kroah-Hartman
In-Reply-To: <CAKv+Gu_LYsNs88uF4+G1xfOtWvNPOjiiYZKqZf7qSBkvn6iEoA@mail.gmail.com>
On Fri, Sep 14, 2018 at 7:40 PM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> > - Move away from makefile ifdef maze and instead prefer kconfig values,
> > which also makes the design a bit more modular too, which could help
> > in the future.
>
> Could you elaborate on this? From the patches, it is not clear to me
> how this has improved.
Feature detection was previously done with a confusing set of ifeq and
ifdefs. Instead, I've now put the logic for this into the kconfig,
which makes the makefiles and header files a bit simpler. This also
makes it easier to later on modularize Zinc itself if deemed
necessary.
^ permalink raw reply
* Re: [PATCH net-next v4 08/20] zinc: Poly1305 ARM and ARM64 implementations
From: Jason A. Donenfeld @ 2018-09-14 17:45 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: LKML, Netdev, Linux Crypto Mailing List, David Miller,
Greg Kroah-Hartman, Samuel Neves, Andrew Lutomirski,
Jean-Philippe Aumasson, Andy Polyakov, Russell King - ARM Linux,
linux-arm-kernel
In-Reply-To: <CAKv+Gu8BD=fLk3zm8tvRQ3H-yiePqzXOrKLEz1BLFSRRz2opOQ@mail.gmail.com>
Hi Ard,
On Fri, Sep 14, 2018 at 7:27 PM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> As I asked in response to v3, could we please have this as a separate
> patch on top? The diff below is corrupted.
I had played with that originally, but thought it made things actually
harder to review, whereas here you have the changes presented pretty
straightforwardly, and I'd appreciate your review of them. If you and
Eric both prefer I split this into two commits, with the first one
just plopping down the CRYPTOGAMS code as is and the second one
bringing it up to kernel-snuff, I can do that.
> Also, both Andy and Eric have offered to get involved in upstreaming
> these changes to OpenSSL, so there is no delta to begin with.
Yes, I think this is probably a good long-term plan, which we can act
on sometime after Zinc is merged.
> I still don't like the GCC -includes, especially because these .h
> files contain function and variable definitions so they are not
> actually header files to begin with.
I very very strongly disagree with you here. I think doing it via
-include is significantly cleaner than any of the alternatives, and
allows the code to be cleanly expressed as conditionals that the
optimizer trivially compiles out in the case of stub functions
returning false and branch optimizes when the stub functions return
true. It is extremely important that these compile together as one
compilation unit. Yes, this is a different design than the crypto
API's approach, but I believe the approach presented here poses
significant improvements and is a lot cleaner.
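
The pattern described in the paragraph above can be sketched in plain C. This is a toy model under stated assumptions, not Zinc code: `hash_bytes()`, `hash_bytes_arch()` and `hash_bytes_generic()` are hypothetical names, and the "hash" is a trivial checksum. It assumes the arch hook and the generic code end up in one compilation unit, so that a stub which always returns false lets the compiler discard the branch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stub, standing in for what an arch-specific file would
 * provide on a platform without accelerated code. It is static inline
 * and always returns false, so the branch in hash_bytes() can be
 * compiled out.
 */
static inline bool hash_bytes_arch(unsigned int *state,
				   const unsigned char *src, size_t len)
{
	(void)state;
	(void)src;
	(void)len;
	return false;
}

/* Portable fallback; a toy checksum, not a real cipher or MAC. */
static void hash_bytes_generic(unsigned int *state,
			       const unsigned char *src, size_t len)
{
	for (size_t i = 0; i < len; i++)
		*state = *state * 31u + src[i];
}

/* Single entry point: try the arch path first, else fall back. */
void hash_bytes(unsigned int *state, const unsigned char *src, size_t len)
{
	if (hash_bytes_arch(state, src, len))
		return;
	hash_bytes_generic(state, src, len);
}
```

On a platform with an accelerated implementation, the stub would be replaced by a function that does the work and returns true, and the caller stays unchanged.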
> Also, you mentioned in the commit log that you got rid of defines and
> made the code more modular, but as far as I can tell, libzinc is still
> a single monolithic binary that is essentially always builtin once we
> move random.c to it.
Yes, it's still monolithic, but it's now trivial to split up when the
time comes to do that. If you and AndyL think that it should be split
into multiple modules _now_, then I can go ahead and do that for v5.
But if it's not essential, it seems simpler to keep it as is. I'll
wait for word from you two on this.
Jason
^ permalink raw reply
* Re: [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel
From: Ard Biesheuvel @ 2018-09-14 17:39 UTC (permalink / raw)
To: Jason A. Donenfeld
Cc: Linux Kernel Mailing List, <netdev@vger.kernel.org>,
open list:HARDWARE RANDOM NUMBER GENERATOR CORE, David S. Miller,
Greg Kroah-Hartman
In-Reply-To: <20180914161954.7325-1-Jason@zx2c4.com>
On 14 September 2018 at 18:19, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Changes v3->v4:
> - Remove mistaken double 07/17 patch.
> - Fix whitespace issues in blake2s assembly.
> - It's not possible to put compound literals into __initconst, so
> we now instead just use boring fixed size struct members.
> - Move away from makefile ifdef maze and instead prefer kconfig values,
> which also makes the design a bit more modular too, which could help
> in the future.
Could you elaborate on this? From the patches, it is not clear to me
how this has improved.
> - Port old crypto API implementations (ChaCha20 and Poly1305) to Zinc.
> - Port security/keys/big_key to Zinc as second example of a good usage of
> Zinc.
> - Document precisely what is different between the kernel code and
> CRYPTOGAMS code when the CRYPTOGAMS code is used.
> - Move changelog to top of 00/20 message so that people can
> actually find it.
>
> -----------------------------------------------------------
>
> This patchset is available on git.kernel.org in this branch, where it may be
> pulled directly for inclusion into net-next:
>
> * https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/linux.git/log/?h=jd/wireguard
>
> -----------------------------------------------------------
>
> WireGuard is a secure network tunnel written especially for Linux, which
> has faced around three years of serious development, deployment, and
> scrutiny. It delivers excellent performance and is extremely easy to
> use and configure. It has been designed with the primary goal of being
> both easy to audit by virtue of being small and highly secure from a
> cryptography and systems security perspective. WireGuard is used by some
> massive companies pushing enormous amounts of traffic, and likely
> already today you've consumed bytes that at some point transited through
> a WireGuard tunnel. Even as an out-of-tree module, WireGuard has been
> integrated into various userspace tools, Linux distributions, mobile
> phones, and data centers. There are ports in several languages to
> several operating systems, and even commercial hardware and services
> sold integrating WireGuard. It is time, therefore, for WireGuard to be
> properly integrated into Linux.
>
> Ample information, including documentation, installation instructions,
> and project details, is available at:
>
> * https://www.wireguard.com/
> * https://www.wireguard.com/papers/wireguard.pdf
>
> As it is currently an out-of-tree module, it lives in its own git repo
> and has its own mailing list, and every commit for the module is tested
> against every stable kernel since 3.10 on a variety of architectures
> using an extensive test suite:
>
> * https://git.zx2c4.com/WireGuard
> https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/WireGuard.git/
> * https://lists.zx2c4.com/mailman/listinfo/wireguard
> * https://www.wireguard.com/build-status/
>
> The project has been broadly discussed at conferences, and was presented
> to the Netdev developers in Seoul last November, where a paper was
> released detailing some interesting aspects of the project. Dave asked
> me after the talk if I would consider sending in a v1 "sooner rather
> than later", hence this patchset. A decision is still waiting from the
> Linux Plumbers Conference, but an update on these topics may be presented
> in Vancouver in a few months. Prior presentations:
>
> * https://www.wireguard.com/presentations/
> * https://www.wireguard.com/papers/wireguard-netdev22.pdf
>
> The cryptography in the protocol itself has been formally verified by
> several independent academic teams with positive results, and I know of
> two additional efforts on their way to further corroborate those
> findings. The version 1 protocol is "complete", and so the purpose of
> this review is to assess the implementation of the protocol. However, it
> still may be of interest to know that the thing you're reviewing uses a
> protocol with various nice security properties:
>
> * https://www.wireguard.com/formal-verification/
>
> This patchset is divided into four segments. The first introduces a very
> simple helper for working with the FPU state for the purposes of amortizing
> SIMD operations. The second segment is a small collection of cryptographic
> primitives, split up into several commits by primitive and by hardware. The
> third shows usage of Zinc within the existing crypto API and as a replacement
> to the existing crypto API. The last is WireGuard itself, presented as an
> unintrusive and self-contained virtual network driver.
>
> It is intended that this entire patch series enter the kernel through
> DaveM's net-next tree. Subsequently, WireGuard patches will go through
> DaveM's net-next tree, while Zinc patches will go through Greg KH's tree.
>
> Enjoy,
> Jason
* Re: [PATCH net-next 2/7] net: phy: mscc: add support for VSC8584 PHY
From: Andrew Lunn @ 2018-09-14 17:27 UTC (permalink / raw)
To: Quentin Schulz
Cc: alexandre.belloni, ralf, paul.burton, jhogan, robh+dt,
mark.rutland, davem, f.fainelli, allan.nielsen, linux-mips,
devicetree, linux-kernel, netdev, thomas.petazzoni,
antoine.tenart
In-Reply-To: <a61d9affd3f1ec9deb60c882cce1daf37fbe2427.1536916714.git-series.quentin.schulz@bootlin.com>
> struct vsc8531_private {
> int rate_magic;
> u16 supp_led_modes;
> @@ -181,6 +354,7 @@ struct vsc8531_private {
> struct vsc85xx_hw_stat *hw_stats;
> u64 *stats;
> int nstats;
> + bool pkg_init;
> +/* bus->mdio_lock should be locked when using this function */
> +static int vsc8584_cmd(struct mii_bus *bus, int phy, u16 val)
> +{
> + unsigned long deadline;
> + u16 reg_val;
> +
> + __mdiobus_write(bus, phy, MSCC_EXT_PAGE_ACCESS,
> + MSCC_PHY_PAGE_EXTENDED_GPIO);
> +
> + __mdiobus_write(bus, phy, MSCC_PHY_PROC_CMD, PROC_CMD_NCOMPLETED | val);
Hi Quentin
All the __mdiobus_write() look a bit ugly. Maybe add bus and base_addr
to the vsc8531_private structure. Then add helpers
phy_write_base_phy(priv, reg, val) and phy_read_base_phy(priv, reg).
You could also add in:
	if (unlikely(!mutex_is_locked(&priv->bus->mdio_lock))) {
		dev_err(&priv->bus->dev, "MDIO bus lock not held!\n");
		dump_stack();
	}
Having such code in the mv88e6xxx driver has found a few bugs for me.
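As a rough illustration of the helper shape, here is a userspace sketch,
not driver code. struct demo_bus and struct demo_priv are hypothetical
stand-ins for struct mii_bus and struct vsc8531_private, the boolean
"locked" flag stands in for mutex_is_locked(&bus->mdio_lock), and a
counter plus fprintf() replaces dev_err()/dump_stack():

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct mii_bus. */
struct demo_bus {
	bool locked;			/* models the mdio_lock state */
	uint16_t regs[32][32];		/* [phy address][register] */
	unsigned int lock_violations;	/* counts unlocked accesses */
};

/* Hypothetical stand-in for the driver's private structure. */
struct demo_priv {
	struct demo_bus *bus;
	unsigned int base_addr;		/* address of the base PHY */
};

/* Complain (but carry on) when the caller forgot to take the lock. */
static void demo_assert_bus_locked(struct demo_priv *priv)
{
	if (!priv->bus->locked) {
		fprintf(stderr, "MDIO bus lock not held!\n");
		priv->bus->lock_violations++;
	}
}

/* Write a register of the base PHY; bus and address come from priv. */
static int phy_write_base_phy(struct demo_priv *priv, uint32_t reg,
			      uint16_t val)
{
	demo_assert_bus_locked(priv);
	priv->bus->regs[priv->base_addr & 0x1f][reg & 0x1f] = val;
	return 0;
}

/* Read a register of the base PHY. */
static int phy_read_base_phy(struct demo_priv *priv, uint32_t reg)
{
	demo_assert_bus_locked(priv);
	return priv->bus->regs[priv->base_addr & 0x1f][reg & 0x1f];
}
```

With the bus and address carried in priv, call sites shrink to
phy_write_base_phy(priv, reg, val), and the lock check lives in exactly
one place.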
Andrew
* Re: [PATCH net-next v4 08/20] zinc: Poly1305 ARM and ARM64 implementations
From: Ard Biesheuvel @ 2018-09-14 17:27 UTC (permalink / raw)
To: Jason A. Donenfeld
Cc: Linux Kernel Mailing List, <netdev@vger.kernel.org>,
open list:HARDWARE RANDOM NUMBER GENERATOR CORE, David S. Miller,
Greg Kroah-Hartman, Samuel Neves, Andy Lutomirski,
Jean-Philippe Aumasson, Andy Polyakov, Russell King,
linux-arm-kernel
In-Reply-To: <20180914162240.7925-9-Jason@zx2c4.com>
On 14 September 2018 at 18:22, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> These NEON and non-NEON implementations come from Andy Polyakov's
> implementation. They are exactly the same as Andy Polyakov's original,
> with the following exceptions:
>
> - Entries and exits use the proper kernel convention macro.
> - CPU feature checking is done in C by the glue code, so that has been
> removed from the assembly.
> - The function names have been renamed to fit kernel conventions.
> - Labels have been renamed to fit kernel conventions.
> - The neon code can jump to the scalar code when it makes sense to do
> so.
>
> After '/^#/d;/^\..*[^:]$/d', the code has the following diff in actual
> instructions from the original.
>
As I asked in response to v3, could we please have this as a separate
patch on top? The diff below is corrupted.
Also, both Andy and Eric have offered to get involved in upstreaming
these changes to OpenSSL, so that there is no delta to begin with.
> ARM:
>
> -poly1305_init:
> -.Lpoly1305_init:
> +ENTRY(poly1305_init_arm)
> stmdb sp!,{r4-r11}
>
> eor r3,r3,r3
> @@ -18,8 +25,6 @@
> moveq r0,#0
> beq .Lno_key
>
> - adr r11,.Lpoly1305_init
> - ldr r12,.LOPENSSL_armcap
> ldrb r4,[r1,#0]
> mov r10,#0x0fffffff
> ldrb r5,[r1,#1]
> @@ -34,8 +39,6 @@
> ldrb r7,[r1,#6]
> and r4,r4,r10
>
> - ldr r12,[r11,r12] @ OPENSSL_armcap_P
> - ldr r12,[r12]
> ldrb r8,[r1,#7]
> orr r5,r5,r6,lsl#8
> ldrb r6,[r1,#8]
> @@ -45,22 +48,6 @@
> ldrb r8,[r1,#10]
> and r5,r5,r3
>
> - tst r12,#ARMV7_NEON @ check for NEON
> - adr r9,poly1305_blocks_neon
> - adr r11,poly1305_blocks
> - it ne
> - movne r11,r9
> - adr r12,poly1305_emit
> - adr r10,poly1305_emit_neon
> - it ne
> - movne r12,r10
> - itete eq
> - addeq r12,r11,#(poly1305_emit-.Lpoly1305_init)
> - addne r12,r11,#(poly1305_emit_neon-.Lpoly1305_init)
> - addeq r11,r11,#(poly1305_blocks-.Lpoly1305_init)
> - addne r11,r11,#(poly1305_blocks_neon-.Lpoly1305_init)
> - orr r12,r12,#1 @ thumb-ify address
> - orr r11,r11,#1
> ldrb r9,[r1,#11]
> orr r6,r6,r7,lsl#8
> ldrb r7,[r1,#12]
> @@ -79,17 +66,16 @@
> str r6,[r0,#8]
> and r7,r7,r3
> str r7,[r0,#12]
> - stmia r2,{r11,r12} @ fill functions table
> - mov r0,#1
> - mov r0,#0
> .Lno_key:
> ldmia sp!,{r4-r11}
> bx lr @ bx lr
> tst lr,#1
> moveq pc,lr @ be binary compatible with V4, yet
> .word 0xe12fff1e @ interoperable with Thumb ISA:-)
> -poly1305_blocks:
> -.Lpoly1305_blocks:
> +ENDPROC(poly1305_init_arm)
> +
> +ENTRY(poly1305_blocks_arm)
> +.Lpoly1305_blocks_arm:
> stmdb sp!,{r3-r11,lr}
>
> ands r2,r2,#-16
> @@ -231,10 +217,11 @@
> tst lr,#1
> moveq pc,lr @ be binary compatible with V4, yet
> .word 0xe12fff1e @ interoperable with Thumb ISA:-)
> -poly1305_emit:
> +ENDPROC(poly1305_blocks_arm)
> +
> +ENTRY(poly1305_emit_arm)
> stmdb sp!,{r4-r11}
> .Lpoly1305_emit_enter:
> -
> ldmia r0,{r3-r7}
> adds r8,r3,#5 @ compare to modulus
> adcs r9,r4,#0
> @@ -305,8 +292,12 @@
> tst lr,#1
> moveq pc,lr @ be binary compatible with V4, yet
> .word 0xe12fff1e @ interoperable with Thumb ISA:-)
> +ENDPROC(poly1305_emit_arm)
> +
> +
>
> -poly1305_init_neon:
> +ENTRY(poly1305_init_neon)
> +.Lpoly1305_init_neon:
> ldr r4,[r0,#20] @ load key base 2^32
> ldr r5,[r0,#24]
> ldr r6,[r0,#28]
> @@ -515,8 +506,9 @@
> vst1.32 {d8[1]},[r7]
>
> bx lr @ bx lr
> +ENDPROC(poly1305_init_neon)
>
> -poly1305_blocks_neon:
> +ENTRY(poly1305_blocks_neon)
> ldr ip,[r0,#36] @ is_base2_26
> ands r2,r2,#-16
> beq .Lno_data_neon
> @@ -524,7 +516,7 @@
> cmp r2,#64
> bhs .Lenter_neon
> tst ip,ip @ is_base2_26?
> - beq .Lpoly1305_blocks
> + beq .Lpoly1305_blocks_arm
>
> .Lenter_neon:
> stmdb sp!,{r4-r7}
> @@ -534,7 +526,7 @@
> bne .Lbase2_26_neon
>
> stmdb sp!,{r1-r3,lr}
> - bl poly1305_init_neon
> + bl .Lpoly1305_init_neon
>
> ldr r4,[r0,#0] @ load hash value base 2^32
> ldr r5,[r0,#4]
> @@ -989,8 +981,9 @@
> ldmia sp!,{r4-r7}
> .Lno_data_neon:
> bx lr @ bx lr
> +ENDPROC(poly1305_blocks_neon)
>
> -poly1305_emit_neon:
> +ENTRY(poly1305_emit_neon)
> ldr ip,[r0,#36] @ is_base2_26
>
> stmdb sp!,{r4-r11}
> @@ -1055,6 +1048,6 @@
>
> ldmia sp!,{r4-r11}
> bx lr @ bx lr
> +ENDPROC(poly1305_emit_neon)
>
> ARM64:
>
> -poly1305_init:
> +ENTRY(poly1305_init_arm)
> cmp x1,xzr
> stp xzr,xzr,[x0] // zero hash value
> stp xzr,xzr,[x0,#16] // [along with is_base2_26]
> @@ -11,14 +15,9 @@
> csel x0,xzr,x0,eq
> b.eq .Lno_key
>
> - ldrsw x11,.LOPENSSL_armcap_P
> - ldr x11,.LOPENSSL_armcap_P
In the original, this looks like
#ifdef __ILP32__
	ldrsw	$t1,.LOPENSSL_armcap_P
#else
	ldr	$t1,.LOPENSSL_armcap_P
#endif
so I guess git commit ate those lines.
> - adr x10,.LOPENSSL_armcap_P
> -
> ldp x7,x8,[x1] // load key
> mov x9,#0xfffffffc0fffffff
> movk x9,#0x0fff,lsl#48
> - ldr w17,[x10,x11]
> rev x7,x7 // flip bytes
> rev x8,x8
> and x7,x7,x9 // &=0ffffffc0fffffff
> @@ -26,24 +25,11 @@
> and x8,x8,x9 // &=0ffffffc0ffffffc
> stp x7,x8,[x0,#32] // save key value
>
> - tst w17,#ARMV7_NEON
> -
> - adr x12,poly1305_blocks
> - adr x7,poly1305_blocks_neon
> - adr x13,poly1305_emit
> - adr x8,poly1305_emit_neon
> -
> - csel x12,x12,x7,eq
> - csel x13,x13,x8,eq
> -
> - stp w12,w13,[x2]
> - stp x12,x13,[x2]
> -
> - mov x0,#1
> .Lno_key:
> ret
> +ENDPROC(poly1305_init_arm)
>
> -poly1305_blocks:
> +ENTRY(poly1305_blocks_arm)
> ands x2,x2,#-16
> b.eq .Lno_data
>
> @@ -100,8 +86,9 @@
>
> .Lno_data:
> ret
> +ENDPROC(poly1305_blocks_arm)
>
> -poly1305_emit:
> +ENTRY(poly1305_emit_arm)
> ldp x4,x5,[x0] // load hash base 2^64
> ldr x6,[x0,#16]
> ldp x10,x11,[x2] // load nonce
> @@ -124,7 +111,9 @@
> stp x4,x5,[x1] // write result
>
> ret
> -poly1305_mult:
> +ENDPROC(poly1305_emit_arm)
> +
> +__poly1305_mult:
> mul x12,x4,x7 // h0*r0
> umulh x13,x4,x7
>
> @@ -158,7 +147,7 @@
>
> ret
>
> -poly1305_splat:
> +__poly1305_splat:
> and x12,x4,#0x03ffffff // base 2^64 -> base 2^26
> ubfx x13,x4,#26,#26
> extr x14,x5,x4,#52
> @@ -182,11 +171,11 @@
>
> ret
>
> -poly1305_blocks_neon:
> +ENTRY(poly1305_blocks_neon)
> ldr x17,[x0,#24]
> cmp x2,#128
> b.hs .Lblocks_neon
> - cbz x17,poly1305_blocks
> + cbz x17,poly1305_blocks_arm
>
> .Lblocks_neon:
> stp x29,x30,[sp,#-80]!
> @@ -232,7 +221,7 @@
> adcs x5,x5,x13
> adc x6,x6,x3
>
> - bl poly1305_mult
> + bl __poly1305_mult
> ldr x30,[sp,#8]
>
> cbz x3,.Lstore_base2_64_neon
> @@ -274,7 +263,7 @@
> adcs x5,x5,x13
> adc x6,x6,x3
>
> - bl poly1305_mult
> + bl __poly1305_mult
>
> .Linit_neon:
> and x10,x4,#0x03ffffff // base 2^64 -> base 2^26
> @@ -301,19 +290,19 @@
> mov x5,x8
> mov x6,xzr
> add x0,x0,#48+12
> - bl poly1305_splat
> + bl __poly1305_splat
>
> - bl poly1305_mult // r^2
> + bl __poly1305_mult // r^2
> sub x0,x0,#4
> - bl poly1305_splat
> + bl __poly1305_splat
>
> - bl poly1305_mult // r^3
> + bl __poly1305_mult // r^3
> sub x0,x0,#4
> - bl poly1305_splat
> + bl __poly1305_splat
>
> - bl poly1305_mult // r^4
> + bl __poly1305_mult // r^4
> sub x0,x0,#4
> - bl poly1305_splat
> + bl __poly1305_splat
> ldr x30,[sp,#8]
>
> add x16,x1,#32
> @@ -743,10 +732,11 @@
> .Lno_data_neon:
> ldr x29,[sp],#80
> ret
> +ENDPROC(poly1305_blocks_neon)
>
> -poly1305_emit_neon:
> +ENTRY(poly1305_emit_neon)
> ldr x17,[x0,#24]
> - cbz x17,poly1305_emit
> + cbz x17,poly1305_emit_arm
>
> ldp w10,w11,[x0] // load hash value base 2^26
> ldp w12,w13,[x0,#8]
> @@ -788,6 +778,6 @@
> stp x4,x5,[x1] // write result
>
> ret
> +ENDPROC(poly1305_emit_neon)
>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Samuel Neves <sneves@dei.uc.pt>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
> Cc: Andy Polyakov <appro@openssl.org>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: linux-arm-kernel@lists.infradead.org
> ---
> lib/zinc/Makefile | 8 +
> lib/zinc/poly1305/poly1305-arm-glue.h | 69 ++
> lib/zinc/poly1305/poly1305-arm.S | 1117 +++++++++++++++++++++++++
> lib/zinc/poly1305/poly1305-arm64.S | 822 ++++++++++++++++++
> 4 files changed, 2016 insertions(+)
> create mode 100644 lib/zinc/poly1305/poly1305-arm-glue.h
> create mode 100644 lib/zinc/poly1305/poly1305-arm.S
> create mode 100644 lib/zinc/poly1305/poly1305-arm64.S
>
> diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
> index d1e3892e06d9..f37df89a3f87 100644
> --- a/lib/zinc/Makefile
> +++ b/lib/zinc/Makefile
> @@ -25,6 +25,14 @@ endif
>
> ifeq ($(CONFIG_ZINC_POLY1305),y)
> zinc-y += poly1305/poly1305.o
> +ifeq ($(CONFIG_ZINC_ARCH_ARM),y)
> +zinc-y += poly1305/poly1305-arm.o
> +CFLAGS_poly1305.o += -include $(srctree)/$(src)/poly1305/poly1305-arm-glue.h
> +endif
> +ifeq ($(CONFIG_ZINC_ARCH_ARM64),y)
> +zinc-y += poly1305/poly1305-arm64.o
> +CFLAGS_poly1305.o += -include $(srctree)/$(src)/poly1305/poly1305-arm-glue.h
> +endif
> endif
>
I still don't like the GCC -includes, especially because these .h
files contain function and variable definitions so they are not
actually header files to begin with.
Also, you mentioned in the commit log that you got rid of defines and
made the code more modular, but as far as I can tell, libzinc is still
a single monolithic binary that is essentially always builtin once we
move random.c to it.
> zinc-y += main.o
> diff --git a/lib/zinc/poly1305/poly1305-arm-glue.h b/lib/zinc/poly1305/poly1305-arm-glue.h
> new file mode 100644
> index 000000000000..53f8fec7f858
> --- /dev/null
> +++ b/lib/zinc/poly1305/poly1305-arm-glue.h
> @@ -0,0 +1,69 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
> + */
> +
> +#include <zinc/poly1305.h>
> +#include <asm/hwcap.h>
> +#include <asm/neon.h>
> +
> +asmlinkage void poly1305_init_arm(void *ctx, const u8 key[16]);
> +asmlinkage void poly1305_blocks_arm(void *ctx, const u8 *inp, const size_t len,
> + const u32 padbit);
> +asmlinkage void poly1305_emit_arm(void *ctx, u8 mac[16], const u32 nonce[4]);
> +#if IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && \
> + (defined(CONFIG_64BIT) || __LINUX_ARM_ARCH__ >= 7)
> +#define ARM_USE_NEON
> +asmlinkage void poly1305_blocks_neon(void *ctx, const u8 *inp, const size_t len,
> + const u32 padbit);
> +asmlinkage void poly1305_emit_neon(void *ctx, u8 mac[16], const u32 nonce[4]);
> +#endif
> +
> +static bool poly1305_use_neon __ro_after_init;
> +
> +void __init poly1305_fpu_init(void)
> +{
> +#if defined(CONFIG_ARM64)
> + poly1305_use_neon = elf_hwcap & HWCAP_ASIMD;
> +#elif defined(CONFIG_ARM)
> + poly1305_use_neon = elf_hwcap & HWCAP_NEON;
> +#endif
> +}
> +
> +static inline bool poly1305_init_arch(void *ctx,
> + const u8 key[POLY1305_KEY_SIZE],
> + simd_context_t simd_context)
> +{
> + poly1305_init_arm(ctx, key);
> + return true;
> +}
> +
> +static inline bool poly1305_blocks_arch(void *ctx, const u8 *inp,
> + const size_t len, const u32 padbit,
> + simd_context_t simd_context)
> +{
> +#if defined(ARM_USE_NEON)
> + if (simd_context == HAVE_FULL_SIMD && poly1305_use_neon) {
> + poly1305_blocks_neon(ctx, inp, len, padbit);
> + return true;
> + }
> +#endif
> + poly1305_blocks_arm(ctx, inp, len, padbit);
> + return true;
> +}
> +
> +static inline bool poly1305_emit_arch(void *ctx, u8 mac[POLY1305_MAC_SIZE],
> + const u32 nonce[4],
> + simd_context_t simd_context)
> +{
> +#if defined(ARM_USE_NEON)
> + if (simd_context == HAVE_FULL_SIMD && poly1305_use_neon) {
> + poly1305_emit_neon(ctx, mac, nonce);
> + return true;
> + }
> +#endif
> + poly1305_emit_arm(ctx, mac, nonce);
> + return true;
> +}
> +
> +#define HAVE_POLY1305_ARCH_IMPLEMENTATION
We shouldn't #define HAVE_xxx constants in code but only in Kconfig.
> diff --git a/lib/zinc/poly1305/poly1305-arm.S b/lib/zinc/poly1305/poly1305-arm.S
> new file mode 100644
> index 000000000000..110f4317b5d7
> --- /dev/null
> +++ b/lib/zinc/poly1305/poly1305-arm.S
> @@ -0,0 +1,1117 @@
> +/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> + *
> + * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
> + * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
> + *
> + * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
> + */
> +
> +#include <linux/linkage.h>
> +
> +.text
> +#if defined(__thumb2__)
> +.syntax unified
> +.thumb
> +#else
> +.code 32
> +#endif
> +
> +.align 5
> +ENTRY(poly1305_init_arm)
> + stmdb sp!,{r4-r11}
> +
> + eor r3,r3,r3
> + cmp r1,#0
> + str r3,[r0,#0] @ zero hash value
> + str r3,[r0,#4]
> + str r3,[r0,#8]
> + str r3,[r0,#12]
> + str r3,[r0,#16]
> + str r3,[r0,#36] @ is_base2_26
> + add r0,r0,#20
> +
> +#ifdef __thumb2__
> + it eq
> +#endif
> + moveq r0,#0
> + beq .Lno_key
> +
> + ldrb r4,[r1,#0]
> + mov r10,#0x0fffffff
> + ldrb r5,[r1,#1]
> + and r3,r10,#-4 @ 0x0ffffffc
> + ldrb r6,[r1,#2]
> + ldrb r7,[r1,#3]
> + orr r4,r4,r5,lsl#8
> + ldrb r5,[r1,#4]
> + orr r4,r4,r6,lsl#16
> + ldrb r6,[r1,#5]
> + orr r4,r4,r7,lsl#24
> + ldrb r7,[r1,#6]
> + and r4,r4,r10
> +
> + ldrb r8,[r1,#7]
> + orr r5,r5,r6,lsl#8
> + ldrb r6,[r1,#8]
> + orr r5,r5,r7,lsl#16
> + ldrb r7,[r1,#9]
> + orr r5,r5,r8,lsl#24
> + ldrb r8,[r1,#10]
> + and r5,r5,r3
> +
> + ldrb r9,[r1,#11]
> + orr r6,r6,r7,lsl#8
> + ldrb r7,[r1,#12]
> + orr r6,r6,r8,lsl#16
> + ldrb r8,[r1,#13]
> + orr r6,r6,r9,lsl#24
> + ldrb r9,[r1,#14]
> + and r6,r6,r3
> +
> + ldrb r10,[r1,#15]
> + orr r7,r7,r8,lsl#8
> + str r4,[r0,#0]
> + orr r7,r7,r9,lsl#16
> + str r5,[r0,#4]
> + orr r7,r7,r10,lsl#24
> + str r6,[r0,#8]
> + and r7,r7,r3
> + str r7,[r0,#12]
> +.Lno_key:
> + ldmia sp!,{r4-r11}
> +#if __LINUX_ARM_ARCH__ >= 5
> + bx lr @ bx lr
> +#else
> + tst lr,#1
> + moveq pc,lr @ be binary compatible with V4, yet
> + .word 0xe12fff1e @ interoperable with Thumb ISA:-)
> +#endif
> +ENDPROC(poly1305_init_arm)
> +
> +.align 5
> +ENTRY(poly1305_blocks_arm)
> +.Lpoly1305_blocks_arm:
> + stmdb sp!,{r3-r11,lr}
> +
> + ands r2,r2,#-16
> + beq .Lno_data
> +
> + cmp r3,#0
> + add r2,r2,r1 @ end pointer
> + sub sp,sp,#32
> +
> + ldmia r0,{r4-r12} @ load context
> +
> + str r0,[sp,#12] @ offload stuff
> + mov lr,r1
> + str r2,[sp,#16]
> + str r10,[sp,#20]
> + str r11,[sp,#24]
> + str r12,[sp,#28]
> + b .Loop
> +
> +.Loop:
> +#if __LINUX_ARM_ARCH__ < 7
> + ldrb r0,[lr],#16 @ load input
> +#ifdef __thumb2__
> + it hi
> +#endif
> + addhi r8,r8,#1 @ 1<<128
> + ldrb r1,[lr,#-15]
> + ldrb r2,[lr,#-14]
> + ldrb r3,[lr,#-13]
> + orr r1,r0,r1,lsl#8
> + ldrb r0,[lr,#-12]
> + orr r2,r1,r2,lsl#16
> + ldrb r1,[lr,#-11]
> + orr r3,r2,r3,lsl#24
> + ldrb r2,[lr,#-10]
> + adds r4,r4,r3 @ accumulate input
> +
> + ldrb r3,[lr,#-9]
> + orr r1,r0,r1,lsl#8
> + ldrb r0,[lr,#-8]
> + orr r2,r1,r2,lsl#16
> + ldrb r1,[lr,#-7]
> + orr r3,r2,r3,lsl#24
> + ldrb r2,[lr,#-6]
> + adcs r5,r5,r3
> +
> + ldrb r3,[lr,#-5]
> + orr r1,r0,r1,lsl#8
> + ldrb r0,[lr,#-4]
> + orr r2,r1,r2,lsl#16
> + ldrb r1,[lr,#-3]
> + orr r3,r2,r3,lsl#24
> + ldrb r2,[lr,#-2]
> + adcs r6,r6,r3
> +
> + ldrb r3,[lr,#-1]
> + orr r1,r0,r1,lsl#8
> + str lr,[sp,#8] @ offload input pointer
> + orr r2,r1,r2,lsl#16
> + add r10,r10,r10,lsr#2
> + orr r3,r2,r3,lsl#24
> +#else
> + ldr r0,[lr],#16 @ load input
> +#ifdef __thumb2__
> + it hi
> +#endif
> + addhi r8,r8,#1 @ padbit
> + ldr r1,[lr,#-12]
> + ldr r2,[lr,#-8]
> + ldr r3,[lr,#-4]
> +#ifdef __ARMEB__
> + rev r0,r0
> + rev r1,r1
> + rev r2,r2
> + rev r3,r3
> +#endif
> + adds r4,r4,r0 @ accumulate input
> + str lr,[sp,#8] @ offload input pointer
> + adcs r5,r5,r1
> + add r10,r10,r10,lsr#2
> + adcs r6,r6,r2
> +#endif
> + add r11,r11,r11,lsr#2
> + adcs r7,r7,r3
> + add r12,r12,r12,lsr#2
> +
> + umull r2,r3,r5,r9
> + adc r8,r8,#0
> + umull r0,r1,r4,r9
> + umlal r2,r3,r8,r10
> + umlal r0,r1,r7,r10
> + ldr r10,[sp,#20] @ reload r10
> + umlal r2,r3,r6,r12
> + umlal r0,r1,r5,r12
> + umlal r2,r3,r7,r11
> + umlal r0,r1,r6,r11
> + umlal r2,r3,r4,r10
> + str r0,[sp,#0] @ future r4
> + mul r0,r11,r8
> + ldr r11,[sp,#24] @ reload r11
> + adds r2,r2,r1 @ d1+=d0>>32
> + eor r1,r1,r1
> + adc lr,r3,#0 @ future r6
> + str r2,[sp,#4] @ future r5
> +
> + mul r2,r12,r8
> + eor r3,r3,r3
> + umlal r0,r1,r7,r12
> + ldr r12,[sp,#28] @ reload r12
> + umlal r2,r3,r7,r9
> + umlal r0,r1,r6,r9
> + umlal r2,r3,r6,r10
> + umlal r0,r1,r5,r10
> + umlal r2,r3,r5,r11
> + umlal r0,r1,r4,r11
> + umlal r2,r3,r4,r12
> + ldr r4,[sp,#0]
> + mul r8,r9,r8
> + ldr r5,[sp,#4]
> +
> + adds r6,lr,r0 @ d2+=d1>>32
> + ldr lr,[sp,#8] @ reload input pointer
> + adc r1,r1,#0
> + adds r7,r2,r1 @ d3+=d2>>32
> + ldr r0,[sp,#16] @ reload end pointer
> + adc r3,r3,#0
> + add r8,r8,r3 @ h4+=d3>>32
> +
> + and r1,r8,#-4
> + and r8,r8,#3
> + add r1,r1,r1,lsr#2 @ *=5
> + adds r4,r4,r1
> + adcs r5,r5,#0
> + adcs r6,r6,#0
> + adcs r7,r7,#0
> + adc r8,r8,#0
> +
> + cmp r0,lr @ done yet?
> + bhi .Loop
> +
> + ldr r0,[sp,#12]
> + add sp,sp,#32
> + stmia r0,{r4-r8} @ store the result
> +
> +.Lno_data:
> +#if __LINUX_ARM_ARCH__ >= 5
> + ldmia sp!,{r3-r11,pc}
> +#else
> + ldmia sp!,{r3-r11,lr}
> + tst lr,#1
> + moveq pc,lr @ be binary compatible with V4, yet
> + .word 0xe12fff1e @ interoperable with Thumb ISA:-)
> +#endif
> +ENDPROC(poly1305_blocks_arm)
> +
> +.align 5
> +ENTRY(poly1305_emit_arm)
> + stmdb sp!,{r4-r11}
> +.Lpoly1305_emit_enter:
> + ldmia r0,{r3-r7}
> + adds r8,r3,#5 @ compare to modulus
> + adcs r9,r4,#0
> + adcs r10,r5,#0
> + adcs r11,r6,#0
> + adc r7,r7,#0
> + tst r7,#4 @ did it carry/borrow?
> +
> +#ifdef __thumb2__
> + it ne
> +#endif
> + movne r3,r8
> + ldr r8,[r2,#0]
> +#ifdef __thumb2__
> + it ne
> +#endif
> + movne r4,r9
> + ldr r9,[r2,#4]
> +#ifdef __thumb2__
> + it ne
> +#endif
> + movne r5,r10
> + ldr r10,[r2,#8]
> +#ifdef __thumb2__
> + it ne
> +#endif
> + movne r6,r11
> + ldr r11,[r2,#12]
> +
> + adds r3,r3,r8
> + adcs r4,r4,r9
> + adcs r5,r5,r10
> + adc r6,r6,r11
> +
> +#if __LINUX_ARM_ARCH__ >= 7
> +#ifdef __ARMEB__
> + rev r3,r3
> + rev r4,r4
> + rev r5,r5
> + rev r6,r6
> +#endif
> + str r3,[r1,#0]
> + str r4,[r1,#4]
> + str r5,[r1,#8]
> + str r6,[r1,#12]
> +#else
> + strb r3,[r1,#0]
> + mov r3,r3,lsr#8
> + strb r4,[r1,#4]
> + mov r4,r4,lsr#8
> + strb r5,[r1,#8]
> + mov r5,r5,lsr#8
> + strb r6,[r1,#12]
> + mov r6,r6,lsr#8
> +
> + strb r3,[r1,#1]
> + mov r3,r3,lsr#8
> + strb r4,[r1,#5]
> + mov r4,r4,lsr#8
> + strb r5,[r1,#9]
> + mov r5,r5,lsr#8
> + strb r6,[r1,#13]
> + mov r6,r6,lsr#8
> +
> + strb r3,[r1,#2]
> + mov r3,r3,lsr#8
> + strb r4,[r1,#6]
> + mov r4,r4,lsr#8
> + strb r5,[r1,#10]
> + mov r5,r5,lsr#8
> + strb r6,[r1,#14]
> + mov r6,r6,lsr#8
> +
> + strb r3,[r1,#3]
> + strb r4,[r1,#7]
> + strb r5,[r1,#11]
> + strb r6,[r1,#15]
> +#endif
> + ldmia sp!,{r4-r11}
> +#if __LINUX_ARM_ARCH__ >= 5
> + bx lr @ bx lr
> +#else
> + tst lr,#1
> + moveq pc,lr @ be binary compatible with V4, yet
> + .word 0xe12fff1e @ interoperable with Thumb ISA:-)
> +#endif
> +ENDPROC(poly1305_emit_arm)
> +
> +
> +#if __LINUX_ARM_ARCH__ >= 7
> +.fpu neon
> +
> +.align 5
> +ENTRY(poly1305_init_neon)
> +.Lpoly1305_init_neon:
> + ldr r4,[r0,#20] @ load key base 2^32
> + ldr r5,[r0,#24]
> + ldr r6,[r0,#28]
> + ldr r7,[r0,#32]
> +
> + and r2,r4,#0x03ffffff @ base 2^32 -> base 2^26
> + mov r3,r4,lsr#26
> + mov r4,r5,lsr#20
> + orr r3,r3,r5,lsl#6
> + mov r5,r6,lsr#14
> + orr r4,r4,r6,lsl#12
> + mov r6,r7,lsr#8
> + orr r5,r5,r7,lsl#18
> + and r3,r3,#0x03ffffff
> + and r4,r4,#0x03ffffff
> + and r5,r5,#0x03ffffff
> +
> + vdup.32 d0,r2 @ r^1 in both lanes
> + add r2,r3,r3,lsl#2 @ *5
> + vdup.32 d1,r3
> + add r3,r4,r4,lsl#2
> + vdup.32 d2,r2
> + vdup.32 d3,r4
> + add r4,r5,r5,lsl#2
> + vdup.32 d4,r3
> + vdup.32 d5,r5
> + add r5,r6,r6,lsl#2
> + vdup.32 d6,r4
> + vdup.32 d7,r6
> + vdup.32 d8,r5
> +
> + mov r5,#2 @ counter
> +
> +.Lsquare_neon:
> + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> + @ d0 = h0*r0 + h4*5*r1 + h3*5*r2 + h2*5*r3 + h1*5*r4
> + @ d1 = h1*r0 + h0*r1 + h4*5*r2 + h3*5*r3 + h2*5*r4
> + @ d2 = h2*r0 + h1*r1 + h0*r2 + h4*5*r3 + h3*5*r4
> + @ d3 = h3*r0 + h2*r1 + h1*r2 + h0*r3 + h4*5*r4
> + @ d4 = h4*r0 + h3*r1 + h2*r2 + h1*r3 + h0*r4
> +
> + vmull.u32 q5,d0,d0[1]
> + vmull.u32 q6,d1,d0[1]
> + vmull.u32 q7,d3,d0[1]
> + vmull.u32 q8,d5,d0[1]
> + vmull.u32 q9,d7,d0[1]
> +
> + vmlal.u32 q5,d7,d2[1]
> + vmlal.u32 q6,d0,d1[1]
> + vmlal.u32 q7,d1,d1[1]
> + vmlal.u32 q8,d3,d1[1]
> + vmlal.u32 q9,d5,d1[1]
> +
> + vmlal.u32 q5,d5,d4[1]
> + vmlal.u32 q6,d7,d4[1]
> + vmlal.u32 q8,d1,d3[1]
> + vmlal.u32 q7,d0,d3[1]
> + vmlal.u32 q9,d3,d3[1]
> +
> + vmlal.u32 q5,d3,d6[1]
> + vmlal.u32 q8,d0,d5[1]
> + vmlal.u32 q6,d5,d6[1]
> + vmlal.u32 q7,d7,d6[1]
> + vmlal.u32 q9,d1,d5[1]
> +
> + vmlal.u32 q8,d7,d8[1]
> + vmlal.u32 q5,d1,d8[1]
> + vmlal.u32 q6,d3,d8[1]
> + vmlal.u32 q7,d5,d8[1]
> + vmlal.u32 q9,d0,d7[1]
> +
> + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> + @ lazy reduction as discussed in "NEON crypto" by D.J. Bernstein
> + @ and P. Schwabe
> + @
> + @ H0>>+H1>>+H2>>+H3>>+H4
> + @ H3>>+H4>>*5+H0>>+H1
> + @
> + @ Trivia.
> + @
> + @ Result of multiplication of n-bit number by m-bit number is
> + @ n+m bits wide. However! Even though 2^n is a n+1-bit number,
> + @ m-bit number multiplied by 2^n is still n+m bits wide.
> + @
> + @ Sum of two n-bit numbers is n+1 bits wide, sum of three - n+2,
> + @ and so is sum of four. Sum of 2^m n-m-bit numbers and n-bit
> + @ one is n+1 bits wide.
> + @
> + @ >>+ denotes Hnext += Hn>>26, Hn &= 0x3ffffff. This means that
> + @ H0, H2, H3 are guaranteed to be 26 bits wide, while H1 and H4
> + @ can be 27. However! In cases when their width exceeds 26 bits
> + @ they are limited by 2^26+2^6. This in turn means that *sum*
> + @ of the products with these values can still be viewed as sum
> + @ of 52-bit numbers as long as the amount of addends is not a
> + @ power of 2. For example,
> + @
> + @ H4 = H4*R0 + H3*R1 + H2*R2 + H1*R3 + H0 * R4,
> + @
> + @ which can't be larger than 5 * (2^26 + 2^6) * (2^26 + 2^6), or
> + @ 5 * (2^52 + 2*2^32 + 2^12), which in turn is smaller than
> + @ 8 * (2^52) or 2^55. However, the value is then multiplied by
> + @ by 5, so we should be looking at 5 * 5 * (2^52 + 2^33 + 2^12),
> + @ which is less than 32 * (2^52) or 2^57. And when processing
> + @ data we are looking at triple as many addends...
> + @
> + @ In key setup procedure pre-reduced H0 is limited by 5*4+1 and
> + @ 5*H4 - by 5*5 52-bit addends, or 57 bits. But when hashing the
> + @ input H0 is limited by (5*4+1)*3 addends, or 58 bits, while
> + @ 5*H4 by 5*5*3, or 59[!] bits. How is this relevant? vmlal.u32
> + @ instruction accepts 2x32-bit input and writes 2x64-bit result.
> + @ This means that result of reduction have to be compressed upon
> + @ loop wrap-around. This can be done in the process of reduction
> + @ to minimize amount of instructions [as well as amount of
> + @ 128-bit instructions, which benefits low-end processors], but
> + @ one has to watch for H2 (which is narrower than H0) and 5*H4
> + @ not being wider than 58 bits, so that result of right shift
> + @ by 26 bits fits in 32 bits. This is also useful on x86,
> + @ because it allows to use paddd in place for paddq, which
> + @ benefits Atom, where paddq is ridiculously slow.
> +
> + vshr.u64 q15,q8,#26
> + vmovn.i64 d16,q8
> + vshr.u64 q4,q5,#26
> + vmovn.i64 d10,q5
> + vadd.i64 q9,q9,q15 @ h3 -> h4
> + vbic.i32 d16,#0xfc000000 @ &=0x03ffffff
> + vadd.i64 q6,q6,q4 @ h0 -> h1
> + vbic.i32 d10,#0xfc000000
> +
> + vshrn.u64 d30,q9,#26
> + vmovn.i64 d18,q9
> + vshr.u64 q4,q6,#26
> + vmovn.i64 d12,q6
> + vadd.i64 q7,q7,q4 @ h1 -> h2
> + vbic.i32 d18,#0xfc000000
> + vbic.i32 d12,#0xfc000000
> +
> + vadd.i32 d10,d10,d30
> + vshl.u32 d30,d30,#2
> + vshrn.u64 d8,q7,#26
> + vmovn.i64 d14,q7
> + vadd.i32 d10,d10,d30 @ h4 -> h0
> + vadd.i32 d16,d16,d8 @ h2 -> h3
> + vbic.i32 d14,#0xfc000000
> +
> + vshr.u32 d30,d10,#26
> + vbic.i32 d10,#0xfc000000
> + vshr.u32 d8,d16,#26
> + vbic.i32 d16,#0xfc000000
> + vadd.i32 d12,d12,d30 @ h0 -> h1
> + vadd.i32 d18,d18,d8 @ h3 -> h4
> +
> + subs r5,r5,#1
> + beq .Lsquare_break_neon
> +
> + add r6,r0,#(48+0*9*4)
> + add r7,r0,#(48+1*9*4)
> +
> + vtrn.32 d0,d10 @ r^2:r^1
> + vtrn.32 d3,d14
> + vtrn.32 d5,d16
> + vtrn.32 d1,d12
> + vtrn.32 d7,d18
> +
> + vshl.u32 d4,d3,#2 @ *5
> + vshl.u32 d6,d5,#2
> + vshl.u32 d2,d1,#2
> + vshl.u32 d8,d7,#2
> + vadd.i32 d4,d4,d3
> + vadd.i32 d2,d2,d1
> + vadd.i32 d6,d6,d5
> + vadd.i32 d8,d8,d7
> +
> + vst4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]!
> + vst4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]!
> + vst4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]!
> + vst4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]!
> + vst1.32 {d8[0]},[r6,:32]
> + vst1.32 {d8[1]},[r7,:32]
> +
> + b .Lsquare_neon
> +
> +.align 4
> +.Lsquare_break_neon:
> + add r6,r0,#(48+2*4*9)
> + add r7,r0,#(48+3*4*9)
> +
> + vmov d0,d10 @ r^4:r^3
> + vshl.u32 d2,d12,#2 @ *5
> + vmov d1,d12
> + vshl.u32 d4,d14,#2
> + vmov d3,d14
> + vshl.u32 d6,d16,#2
> + vmov d5,d16
> + vshl.u32 d8,d18,#2
> + vmov d7,d18
> + vadd.i32 d2,d2,d12
> + vadd.i32 d4,d4,d14
> + vadd.i32 d6,d6,d16
> + vadd.i32 d8,d8,d18
> +
> + vst4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]!
> + vst4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]!
> + vst4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]!
> + vst4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]!
> + vst1.32 {d8[0]},[r6]
> + vst1.32 {d8[1]},[r7]
> +
> + bx lr @ bx lr
> +ENDPROC(poly1305_init_neon)
> +
> +.align 5
> +ENTRY(poly1305_blocks_neon)
> + ldr ip,[r0,#36] @ is_base2_26
> + ands r2,r2,#-16
> + beq .Lno_data_neon
> +
> + cmp r2,#64
> + bhs .Lenter_neon
> + tst ip,ip @ is_base2_26?
> + beq .Lpoly1305_blocks_arm
> +
> +.Lenter_neon:
> + stmdb sp!,{r4-r7}
> + vstmdb sp!,{d8-d15} @ ABI specification says so
> +
> + tst ip,ip @ is_base2_26?
> + bne .Lbase2_26_neon
> +
> + stmdb sp!,{r1-r3,lr}
> + bl .Lpoly1305_init_neon
> +
> + ldr r4,[r0,#0] @ load hash value base 2^32
> + ldr r5,[r0,#4]
> + ldr r6,[r0,#8]
> + ldr r7,[r0,#12]
> + ldr ip,[r0,#16]
> +
> + and r2,r4,#0x03ffffff @ base 2^32 -> base 2^26
> + mov r3,r4,lsr#26
> + veor d10,d10,d10
> + mov r4,r5,lsr#20
> + orr r3,r3,r5,lsl#6
> + veor d12,d12,d12
> + mov r5,r6,lsr#14
> + orr r4,r4,r6,lsl#12
> + veor d14,d14,d14
> + mov r6,r7,lsr#8
> + orr r5,r5,r7,lsl#18
> + veor d16,d16,d16
> + and r3,r3,#0x03ffffff
> + orr r6,r6,ip,lsl#24
> + veor d18,d18,d18
> + and r4,r4,#0x03ffffff
> + mov r1,#1
> + and r5,r5,#0x03ffffff
> + str r1,[r0,#36] @ is_base2_26
> +
> + vmov.32 d10[0],r2
> + vmov.32 d12[0],r3
> + vmov.32 d14[0],r4
> + vmov.32 d16[0],r5
> + vmov.32 d18[0],r6
> + adr r5,.Lzeros
> +
> + ldmia sp!,{r1-r3,lr}
> + b .Lbase2_32_neon
> +
> +.align 4
> +.Lbase2_26_neon:
> + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> + @ load hash value
> +
> + veor d10,d10,d10
> + veor d12,d12,d12
> + veor d14,d14,d14
> + veor d16,d16,d16
> + veor d18,d18,d18
> + vld4.32 {d10[0],d12[0],d14[0],d16[0]},[r0]!
> + adr r5,.Lzeros
> + vld1.32 {d18[0]},[r0]
> + sub r0,r0,#16 @ rewind
> +
> +.Lbase2_32_neon:
> + add r4,r1,#32
> + mov r3,r3,lsl#24
> + tst r2,#31
> + beq .Leven
> +
> + vld4.32 {d20[0],d22[0],d24[0],d26[0]},[r1]!
> + vmov.32 d28[0],r3
> + sub r2,r2,#16
> + add r4,r1,#32
> +
> +#ifdef __ARMEB__
> + vrev32.8 q10,q10
> + vrev32.8 q13,q13
> + vrev32.8 q11,q11
> + vrev32.8 q12,q12
> +#endif
> + vsri.u32 d28,d26,#8 @ base 2^32 -> base 2^26
> + vshl.u32 d26,d26,#18
> +
> + vsri.u32 d26,d24,#14
> + vshl.u32 d24,d24,#12
> + vadd.i32 d29,d28,d18 @ add hash value and move to #hi
> +
> + vbic.i32 d26,#0xfc000000
> + vsri.u32 d24,d22,#20
> + vshl.u32 d22,d22,#6
> +
> + vbic.i32 d24,#0xfc000000
> + vsri.u32 d22,d20,#26
> + vadd.i32 d27,d26,d16
> +
> + vbic.i32 d20,#0xfc000000
> + vbic.i32 d22,#0xfc000000
> + vadd.i32 d25,d24,d14
> +
> + vadd.i32 d21,d20,d10
> + vadd.i32 d23,d22,d12
> +
> + mov r7,r5
> + add r6,r0,#48
> +
> + cmp r2,r2
> + b .Long_tail
> +
> +.align 4
> +.Leven:
> + subs r2,r2,#64
> + it lo
> + movlo r4,r5
> +
> + vmov.i32 q14,#1<<24 @ padbit, yes, always
> + vld4.32 {d20,d22,d24,d26},[r1] @ inp[0:1]
> + add r1,r1,#64
> + vld4.32 {d21,d23,d25,d27},[r4] @ inp[2:3] (or 0)
> + add r4,r4,#64
> + itt hi
> + addhi r7,r0,#(48+1*9*4)
> + addhi r6,r0,#(48+3*9*4)
> +
> +#ifdef __ARMEB__
> + vrev32.8 q10,q10
> + vrev32.8 q13,q13
> + vrev32.8 q11,q11
> + vrev32.8 q12,q12
> +#endif
> + vsri.u32 q14,q13,#8 @ base 2^32 -> base 2^26
> + vshl.u32 q13,q13,#18
> +
> + vsri.u32 q13,q12,#14
> + vshl.u32 q12,q12,#12
> +
> + vbic.i32 q13,#0xfc000000
> + vsri.u32 q12,q11,#20
> + vshl.u32 q11,q11,#6
> +
> + vbic.i32 q12,#0xfc000000
> + vsri.u32 q11,q10,#26
> +
> + vbic.i32 q10,#0xfc000000
> + vbic.i32 q11,#0xfc000000
> +
> + bls .Lskip_loop
> +
> + vld4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^2
> + vld4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^4
> + vld4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]!
> + vld4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]!
> + b .Loop_neon
> +
> +.align 5
> +.Loop_neon:
> + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> + @ ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2
> + @ ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^3+inp[7]*r
> + @ \___________________/
> + @ ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2+inp[8])*r^2
> + @ ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^4+inp[7]*r^2+inp[9])*r
> + @ \___________________/ \____________________/
> + @
> + @ Note that we start with inp[2:3]*r^2. This is because it
> + @ doesn't depend on reduction in previous iteration.
> + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> + @ d4 = h4*r0 + h3*r1 + h2*r2 + h1*r3 + h0*r4
> + @ d3 = h3*r0 + h2*r1 + h1*r2 + h0*r3 + h4*5*r4
> + @ d2 = h2*r0 + h1*r1 + h0*r2 + h4*5*r3 + h3*5*r4
> + @ d1 = h1*r0 + h0*r1 + h4*5*r2 + h3*5*r3 + h2*5*r4
> + @ d0 = h0*r0 + h4*5*r1 + h3*5*r2 + h2*5*r3 + h1*5*r4
> +
> + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> + @ inp[2:3]*r^2
> +
> + vadd.i32 d24,d24,d14 @ accumulate inp[0:1]
> + vmull.u32 q7,d25,d0[1]
> + vadd.i32 d20,d20,d10
> + vmull.u32 q5,d21,d0[1]
> + vadd.i32 d26,d26,d16
> + vmull.u32 q8,d27,d0[1]
> + vmlal.u32 q7,d23,d1[1]
> + vadd.i32 d22,d22,d12
> + vmull.u32 q6,d23,d0[1]
> +
> + vadd.i32 d28,d28,d18
> + vmull.u32 q9,d29,d0[1]
> + subs r2,r2,#64
> + vmlal.u32 q5,d29,d2[1]
> + it lo
> + movlo r4,r5
> + vmlal.u32 q8,d25,d1[1]
> + vld1.32 d8[1],[r7,:32]
> + vmlal.u32 q6,d21,d1[1]
> + vmlal.u32 q9,d27,d1[1]
> +
> + vmlal.u32 q5,d27,d4[1]
> + vmlal.u32 q8,d23,d3[1]
> + vmlal.u32 q9,d25,d3[1]
> + vmlal.u32 q6,d29,d4[1]
> + vmlal.u32 q7,d21,d3[1]
> +
> + vmlal.u32 q8,d21,d5[1]
> + vmlal.u32 q5,d25,d6[1]
> + vmlal.u32 q9,d23,d5[1]
> + vmlal.u32 q6,d27,d6[1]
> + vmlal.u32 q7,d29,d6[1]
> +
> + vmlal.u32 q8,d29,d8[1]
> + vmlal.u32 q5,d23,d8[1]
> + vmlal.u32 q9,d21,d7[1]
> + vmlal.u32 q6,d25,d8[1]
> + vmlal.u32 q7,d27,d8[1]
> +
> + vld4.32 {d21,d23,d25,d27},[r4] @ inp[2:3] (or 0)
> + add r4,r4,#64
> +
> + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> + @ (hash+inp[0:1])*r^4 and accumulate
> +
> + vmlal.u32 q8,d26,d0[0]
> + vmlal.u32 q5,d20,d0[0]
> + vmlal.u32 q9,d28,d0[0]
> + vmlal.u32 q6,d22,d0[0]
> + vmlal.u32 q7,d24,d0[0]
> + vld1.32 d8[0],[r6,:32]
> +
> + vmlal.u32 q8,d24,d1[0]
> + vmlal.u32 q5,d28,d2[0]
> + vmlal.u32 q9,d26,d1[0]
> + vmlal.u32 q6,d20,d1[0]
> + vmlal.u32 q7,d22,d1[0]
> +
> + vmlal.u32 q8,d22,d3[0]
> + vmlal.u32 q5,d26,d4[0]
> + vmlal.u32 q9,d24,d3[0]
> + vmlal.u32 q6,d28,d4[0]
> + vmlal.u32 q7,d20,d3[0]
> +
> + vmlal.u32 q8,d20,d5[0]
> + vmlal.u32 q5,d24,d6[0]
> + vmlal.u32 q9,d22,d5[0]
> + vmlal.u32 q6,d26,d6[0]
> + vmlal.u32 q8,d28,d8[0]
> +
> + vmlal.u32 q7,d28,d6[0]
> + vmlal.u32 q5,d22,d8[0]
> + vmlal.u32 q9,d20,d7[0]
> + vmov.i32 q14,#1<<24 @ padbit, yes, always
> + vmlal.u32 q6,d24,d8[0]
> + vmlal.u32 q7,d26,d8[0]
> +
> + vld4.32 {d20,d22,d24,d26},[r1] @ inp[0:1]
> + add r1,r1,#64
> +#ifdef __ARMEB__
> + vrev32.8 q10,q10
> + vrev32.8 q11,q11
> + vrev32.8 q12,q12
> + vrev32.8 q13,q13
> +#endif
> +
> + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> + @ lazy reduction interleaved with base 2^32 -> base 2^26 of
> + @ inp[0:3] previously loaded to q10-q13 and smashed to q10-q14.
> +
> + vshr.u64 q15,q8,#26
> + vmovn.i64 d16,q8
> + vshr.u64 q4,q5,#26
> + vmovn.i64 d10,q5
> + vadd.i64 q9,q9,q15 @ h3 -> h4
> + vbic.i32 d16,#0xfc000000
> + vsri.u32 q14,q13,#8 @ base 2^32 -> base 2^26
> + vadd.i64 q6,q6,q4 @ h0 -> h1
> + vshl.u32 q13,q13,#18
> + vbic.i32 d10,#0xfc000000
> +
> + vshrn.u64 d30,q9,#26
> + vmovn.i64 d18,q9
> + vshr.u64 q4,q6,#26
> + vmovn.i64 d12,q6
> + vadd.i64 q7,q7,q4 @ h1 -> h2
> + vsri.u32 q13,q12,#14
> + vbic.i32 d18,#0xfc000000
> + vshl.u32 q12,q12,#12
> + vbic.i32 d12,#0xfc000000
> +
> + vadd.i32 d10,d10,d30
> + vshl.u32 d30,d30,#2
> + vbic.i32 q13,#0xfc000000
> + vshrn.u64 d8,q7,#26
> + vmovn.i64 d14,q7
> + vaddl.u32 q5,d10,d30 @ h4 -> h0 [widen for a sec]
> + vsri.u32 q12,q11,#20
> + vadd.i32 d16,d16,d8 @ h2 -> h3
> + vshl.u32 q11,q11,#6
> + vbic.i32 d14,#0xfc000000
> + vbic.i32 q12,#0xfc000000
> +
> + vshrn.u64 d30,q5,#26 @ re-narrow
> + vmovn.i64 d10,q5
> + vsri.u32 q11,q10,#26
> + vbic.i32 q10,#0xfc000000
> + vshr.u32 d8,d16,#26
> + vbic.i32 d16,#0xfc000000
> + vbic.i32 d10,#0xfc000000
> + vadd.i32 d12,d12,d30 @ h0 -> h1
> + vadd.i32 d18,d18,d8 @ h3 -> h4
> + vbic.i32 q11,#0xfc000000
> +
> + bhi .Loop_neon
> +
> +.Lskip_loop:
> + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> + @ multiply (inp[0:1]+hash) or inp[2:3] by r^2:r^1
> +
> + add r7,r0,#(48+0*9*4)
> + add r6,r0,#(48+1*9*4)
> + adds r2,r2,#32
> + it ne
> + movne r2,#0
> + bne .Long_tail
> +
> + vadd.i32 d25,d24,d14 @ add hash value and move to #hi
> + vadd.i32 d21,d20,d10
> + vadd.i32 d27,d26,d16
> + vadd.i32 d23,d22,d12
> + vadd.i32 d29,d28,d18
> +
> +.Long_tail:
> + vld4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^1
> + vld4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^2
> +
> + vadd.i32 d24,d24,d14 @ can be redundant
> + vmull.u32 q7,d25,d0
> + vadd.i32 d20,d20,d10
> + vmull.u32 q5,d21,d0
> + vadd.i32 d26,d26,d16
> + vmull.u32 q8,d27,d0
> + vadd.i32 d22,d22,d12
> + vmull.u32 q6,d23,d0
> + vadd.i32 d28,d28,d18
> + vmull.u32 q9,d29,d0
> +
> + vmlal.u32 q5,d29,d2
> + vld4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]!
> + vmlal.u32 q8,d25,d1
> + vld4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]!
> + vmlal.u32 q6,d21,d1
> + vmlal.u32 q9,d27,d1
> + vmlal.u32 q7,d23,d1
> +
> + vmlal.u32 q8,d23,d3
> + vld1.32 d8[1],[r7,:32]
> + vmlal.u32 q5,d27,d4
> + vld1.32 d8[0],[r6,:32]
> + vmlal.u32 q9,d25,d3
> + vmlal.u32 q6,d29,d4
> + vmlal.u32 q7,d21,d3
> +
> + vmlal.u32 q8,d21,d5
> + it ne
> + addne r7,r0,#(48+2*9*4)
> + vmlal.u32 q5,d25,d6
> + it ne
> + addne r6,r0,#(48+3*9*4)
> + vmlal.u32 q9,d23,d5
> + vmlal.u32 q6,d27,d6
> + vmlal.u32 q7,d29,d6
> +
> + vmlal.u32 q8,d29,d8
> + vorn q0,q0,q0 @ all-ones, can be redundant
> + vmlal.u32 q5,d23,d8
> + vshr.u64 q0,q0,#38
> + vmlal.u32 q9,d21,d7
> + vmlal.u32 q6,d25,d8
> + vmlal.u32 q7,d27,d8
> +
> + beq .Lshort_tail
> +
> + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> + @ (hash+inp[0:1])*r^4:r^3 and accumulate
> +
> + vld4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^3
> + vld4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^4
> +
> + vmlal.u32 q7,d24,d0
> + vmlal.u32 q5,d20,d0
> + vmlal.u32 q8,d26,d0
> + vmlal.u32 q6,d22,d0
> + vmlal.u32 q9,d28,d0
> +
> + vmlal.u32 q5,d28,d2
> + vld4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]!
> + vmlal.u32 q8,d24,d1
> + vld4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]!
> + vmlal.u32 q6,d20,d1
> + vmlal.u32 q9,d26,d1
> + vmlal.u32 q7,d22,d1
> +
> + vmlal.u32 q8,d22,d3
> + vld1.32 d8[1],[r7,:32]
> + vmlal.u32 q5,d26,d4
> + vld1.32 d8[0],[r6,:32]
> + vmlal.u32 q9,d24,d3
> + vmlal.u32 q6,d28,d4
> + vmlal.u32 q7,d20,d3
> +
> + vmlal.u32 q8,d20,d5
> + vmlal.u32 q5,d24,d6
> + vmlal.u32 q9,d22,d5
> + vmlal.u32 q6,d26,d6
> + vmlal.u32 q7,d28,d6
> +
> + vmlal.u32 q8,d28,d8
> + vorn q0,q0,q0 @ all-ones
> + vmlal.u32 q5,d22,d8
> + vshr.u64 q0,q0,#38
> + vmlal.u32 q9,d20,d7
> + vmlal.u32 q6,d24,d8
> + vmlal.u32 q7,d26,d8
> +
> +.Lshort_tail:
> + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> + @ horizontal addition
> +
> + vadd.i64 d16,d16,d17
> + vadd.i64 d10,d10,d11
> + vadd.i64 d18,d18,d19
> + vadd.i64 d12,d12,d13
> + vadd.i64 d14,d14,d15
> +
> + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> + @ lazy reduction, but without narrowing
> +
> + vshr.u64 q15,q8,#26
> + vand.i64 q8,q8,q0
> + vshr.u64 q4,q5,#26
> + vand.i64 q5,q5,q0
> + vadd.i64 q9,q9,q15 @ h3 -> h4
> + vadd.i64 q6,q6,q4 @ h0 -> h1
> +
> + vshr.u64 q15,q9,#26
> + vand.i64 q9,q9,q0
> + vshr.u64 q4,q6,#26
> + vand.i64 q6,q6,q0
> + vadd.i64 q7,q7,q4 @ h1 -> h2
> +
> + vadd.i64 q5,q5,q15
> + vshl.u64 q15,q15,#2
> + vshr.u64 q4,q7,#26
> + vand.i64 q7,q7,q0
> + vadd.i64 q5,q5,q15 @ h4 -> h0
> + vadd.i64 q8,q8,q4 @ h2 -> h3
> +
> + vshr.u64 q15,q5,#26
> + vand.i64 q5,q5,q0
> + vshr.u64 q4,q8,#26
> + vand.i64 q8,q8,q0
> + vadd.i64 q6,q6,q15 @ h0 -> h1
> + vadd.i64 q9,q9,q4 @ h3 -> h4
> +
> + cmp r2,#0
> + bne .Leven
> +
> + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> + @ store hash value
> +
> + vst4.32 {d10[0],d12[0],d14[0],d16[0]},[r0]!
> + vst1.32 {d18[0]},[r0]
> +
> + vldmia sp!,{d8-d15} @ epilogue
> + ldmia sp!,{r4-r7}
> +.Lno_data_neon:
> + bx lr @ bx lr
> +ENDPROC(poly1305_blocks_neon)
> +
> +.align 5
> +ENTRY(poly1305_emit_neon)
> + ldr ip,[r0,#36] @ is_base2_26
> +
> + stmdb sp!,{r4-r11}
> +
> + tst ip,ip
> + beq .Lpoly1305_emit_enter
> +
> + ldmia r0,{r3-r7}
> + eor r8,r8,r8
> +
> + adds r3,r3,r4,lsl#26 @ base 2^26 -> base 2^32
> + mov r4,r4,lsr#6
> + adcs r4,r4,r5,lsl#20
> + mov r5,r5,lsr#12
> + adcs r5,r5,r6,lsl#14
> + mov r6,r6,lsr#18
> + adcs r6,r6,r7,lsl#8
> + adc r7,r8,r7,lsr#24 @ can be partially reduced ...
> +
> + and r8,r7,#-4 @ ... so reduce
> + and r7,r6,#3
> + add r8,r8,r8,lsr#2 @ *= 5
> + adds r3,r3,r8
> + adcs r4,r4,#0
> + adcs r5,r5,#0
> + adcs r6,r6,#0
> + adc r7,r7,#0
> +
> + adds r8,r3,#5 @ compare to modulus
> + adcs r9,r4,#0
> + adcs r10,r5,#0
> + adcs r11,r6,#0
> + adc r7,r7,#0
> + tst r7,#4 @ did it carry/borrow?
> +
> + it ne
> + movne r3,r8
> + ldr r8,[r2,#0]
> + it ne
> + movne r4,r9
> + ldr r9,[r2,#4]
> + it ne
> + movne r5,r10
> + ldr r10,[r2,#8]
> + it ne
> + movne r6,r11
> + ldr r11,[r2,#12]
> +
> + adds r3,r3,r8 @ accumulate nonce
> + adcs r4,r4,r9
> + adcs r5,r5,r10
> + adc r6,r6,r11
> +
> +#ifdef __ARMEB__
> + rev r3,r3
> + rev r4,r4
> + rev r5,r5
> + rev r6,r6
> +#endif
> + str r3,[r1,#0] @ store the result
> + str r4,[r1,#4]
> + str r5,[r1,#8]
> + str r6,[r1,#12]
> +
> + ldmia sp!,{r4-r11}
> + bx lr @ bx lr
> +ENDPROC(poly1305_emit_neon)
> +
> +.align 5
> +.Lzeros:
> +.long 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
> +#endif
> diff --git a/lib/zinc/poly1305/poly1305-arm64.S b/lib/zinc/poly1305/poly1305-arm64.S
> new file mode 100644
> index 000000000000..c20023544183
> --- /dev/null
> +++ b/lib/zinc/poly1305/poly1305-arm64.S
> @@ -0,0 +1,822 @@
> +/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> + *
> + * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
> + * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
> + *
> + * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
> + */
> +
> +#include <linux/linkage.h>
> +.text
> +
> +.align 5
> +ENTRY(poly1305_init_arm)
> + cmp x1,xzr
> + stp xzr,xzr,[x0] // zero hash value
> + stp xzr,xzr,[x0,#16] // [along with is_base2_26]
> +
> + csel x0,xzr,x0,eq
> + b.eq .Lno_key
> +
> + ldp x7,x8,[x1] // load key
> + mov x9,#0xfffffffc0fffffff
> + movk x9,#0x0fff,lsl#48
> +#ifdef __ARMEB__
> + rev x7,x7 // flip bytes
> + rev x8,x8
> +#endif
> + and x7,x7,x9 // &=0ffffffc0fffffff
> + and x9,x9,#-4
> + and x8,x8,x9 // &=0ffffffc0ffffffc
> + stp x7,x8,[x0,#32] // save key value
> +
> +.Lno_key:
> + ret
> +ENDPROC(poly1305_init_arm)
> +
> +.align 5
> +ENTRY(poly1305_blocks_arm)
> + ands x2,x2,#-16
> + b.eq .Lno_data
> +
> + ldp x4,x5,[x0] // load hash value
> + ldp x7,x8,[x0,#32] // load key value
> + ldr x6,[x0,#16]
> + add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2)
> + b .Loop
> +
> +.align 5
> +.Loop:
> + ldp x10,x11,[x1],#16 // load input
> + sub x2,x2,#16
> +#ifdef __ARMEB__
> + rev x10,x10
> + rev x11,x11
> +#endif
> + adds x4,x4,x10 // accumulate input
> + adcs x5,x5,x11
> +
> + mul x12,x4,x7 // h0*r0
> + adc x6,x6,x3
> + umulh x13,x4,x7
> +
> + mul x10,x5,x9 // h1*5*r1
> + umulh x11,x5,x9
> +
> + adds x12,x12,x10
> + mul x10,x4,x8 // h0*r1
> + adc x13,x13,x11
> + umulh x14,x4,x8
> +
> + adds x13,x13,x10
> + mul x10,x5,x7 // h1*r0
> + adc x14,x14,xzr
> + umulh x11,x5,x7
> +
> + adds x13,x13,x10
> + mul x10,x6,x9 // h2*5*r1
> + adc x14,x14,x11
> + mul x11,x6,x7 // h2*r0
> +
> + adds x13,x13,x10
> + adc x14,x14,x11
> +
> + and x10,x14,#-4 // final reduction
> + and x6,x14,#3
> + add x10,x10,x14,lsr#2
> + adds x4,x12,x10
> + adcs x5,x13,xzr
> + adc x6,x6,xzr
> +
> + cbnz x2,.Loop
> +
> + stp x4,x5,[x0] // store hash value
> + str x6,[x0,#16]
> +
> +.Lno_data:
> + ret
> +ENDPROC(poly1305_blocks_arm)
> +
> +.align 5
> +ENTRY(poly1305_emit_arm)
> + ldp x4,x5,[x0] // load hash base 2^64
> + ldr x6,[x0,#16]
> + ldp x10,x11,[x2] // load nonce
> +
> + adds x12,x4,#5 // compare to modulus
> + adcs x13,x5,xzr
> + adc x14,x6,xzr
> +
> + tst x14,#-4 // see if it's carried/borrowed
> +
> + csel x4,x4,x12,eq
> + csel x5,x5,x13,eq
> +
> +#ifdef __ARMEB__
> + ror x10,x10,#32 // flip nonce words
> + ror x11,x11,#32
> +#endif
> + adds x4,x4,x10 // accumulate nonce
> + adc x5,x5,x11
> +#ifdef __ARMEB__
> + rev x4,x4 // flip output bytes
> + rev x5,x5
> +#endif
> + stp x4,x5,[x1] // write result
> +
> + ret
> +ENDPROC(poly1305_emit_arm)
> +
> +.align 5
> +__poly1305_mult:
> + mul x12,x4,x7 // h0*r0
> + umulh x13,x4,x7
> +
> + mul x10,x5,x9 // h1*5*r1
> + umulh x11,x5,x9
> +
> + adds x12,x12,x10
> + mul x10,x4,x8 // h0*r1
> + adc x13,x13,x11
> + umulh x14,x4,x8
> +
> + adds x13,x13,x10
> + mul x10,x5,x7 // h1*r0
> + adc x14,x14,xzr
> + umulh x11,x5,x7
> +
> + adds x13,x13,x10
> + mul x10,x6,x9 // h2*5*r1
> + adc x14,x14,x11
> + mul x11,x6,x7 // h2*r0
> +
> + adds x13,x13,x10
> + adc x14,x14,x11
> +
> + and x10,x14,#-4 // final reduction
> + and x6,x14,#3
> + add x10,x10,x14,lsr#2
> + adds x4,x12,x10
> + adcs x5,x13,xzr
> + adc x6,x6,xzr
> +
> + ret
> +
> +__poly1305_splat:
> + and x12,x4,#0x03ffffff // base 2^64 -> base 2^26
> + ubfx x13,x4,#26,#26
> + extr x14,x5,x4,#52
> + and x14,x14,#0x03ffffff
> + ubfx x15,x5,#14,#26
> + extr x16,x6,x5,#40
> +
> + str w12,[x0,#16*0] // r0
> + add w12,w13,w13,lsl#2 // r1*5
> + str w13,[x0,#16*1] // r1
> + add w13,w14,w14,lsl#2 // r2*5
> + str w12,[x0,#16*2] // s1
> + str w14,[x0,#16*3] // r2
> + add w14,w15,w15,lsl#2 // r3*5
> + str w13,[x0,#16*4] // s2
> + str w15,[x0,#16*5] // r3
> + add w15,w16,w16,lsl#2 // r4*5
> + str w14,[x0,#16*6] // s3
> + str w16,[x0,#16*7] // r4
> + str w15,[x0,#16*8] // s4
> +
> + ret
> +
> +.align 5
> +ENTRY(poly1305_blocks_neon)
> + ldr x17,[x0,#24]
> + cmp x2,#128
> + b.hs .Lblocks_neon
> + cbz x17,poly1305_blocks_arm
> +
> +.Lblocks_neon:
> + stp x29,x30,[sp,#-80]!
> + add x29,sp,#0
> +
> + ands x2,x2,#-16
> + b.eq .Lno_data_neon
> +
> + cbz x17,.Lbase2_64_neon
> +
> + ldp w10,w11,[x0] // load hash value base 2^26
> + ldp w12,w13,[x0,#8]
> + ldr w14,[x0,#16]
> +
> + tst x2,#31
> + b.eq .Leven_neon
> +
> + ldp x7,x8,[x0,#32] // load key value
> +
> + add x4,x10,x11,lsl#26 // base 2^26 -> base 2^64
> + lsr x5,x12,#12
> + adds x4,x4,x12,lsl#52
> + add x5,x5,x13,lsl#14
> + adc x5,x5,xzr
> + lsr x6,x14,#24
> + adds x5,x5,x14,lsl#40
> + adc x14,x6,xzr // can be partially reduced...
> +
> + ldp x12,x13,[x1],#16 // load input
> + sub x2,x2,#16
> + add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2)
> +
> + and x10,x14,#-4 // ... so reduce
> + and x6,x14,#3
> + add x10,x10,x14,lsr#2
> + adds x4,x4,x10
> + adcs x5,x5,xzr
> + adc x6,x6,xzr
> +
> +#ifdef __ARMEB__
> + rev x12,x12
> + rev x13,x13
> +#endif
> + adds x4,x4,x12 // accumulate input
> + adcs x5,x5,x13
> + adc x6,x6,x3
> +
> + bl __poly1305_mult
> + ldr x30,[sp,#8]
> +
> + cbz x3,.Lstore_base2_64_neon
> +
> + and x10,x4,#0x03ffffff // base 2^64 -> base 2^26
> + ubfx x11,x4,#26,#26
> + extr x12,x5,x4,#52
> + and x12,x12,#0x03ffffff
> + ubfx x13,x5,#14,#26
> + extr x14,x6,x5,#40
> +
> + cbnz x2,.Leven_neon
> +
> + stp w10,w11,[x0] // store hash value base 2^26
> + stp w12,w13,[x0,#8]
> + str w14,[x0,#16]
> + b .Lno_data_neon
> +
> +.align 4
> +.Lstore_base2_64_neon:
> + stp x4,x5,[x0] // store hash value base 2^64
> + stp x6,xzr,[x0,#16] // note that is_base2_26 is zeroed
> + b .Lno_data_neon
> +
> +.align 4
> +.Lbase2_64_neon:
> + ldp x7,x8,[x0,#32] // load key value
> +
> + ldp x4,x5,[x0] // load hash value base 2^64
> + ldr x6,[x0,#16]
> +
> + tst x2,#31
> + b.eq .Linit_neon
> +
> + ldp x12,x13,[x1],#16 // load input
> + sub x2,x2,#16
> + add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2)
> +#ifdef __ARMEB__
> + rev x12,x12
> + rev x13,x13
> +#endif
> + adds x4,x4,x12 // accumulate input
> + adcs x5,x5,x13
> + adc x6,x6,x3
> +
> + bl __poly1305_mult
> +
> +.Linit_neon:
> + and x10,x4,#0x03ffffff // base 2^64 -> base 2^26
> + ubfx x11,x4,#26,#26
> + extr x12,x5,x4,#52
> + and x12,x12,#0x03ffffff
> + ubfx x13,x5,#14,#26
> + extr x14,x6,x5,#40
> +
> + stp d8,d9,[sp,#16] // meet ABI requirements
> + stp d10,d11,[sp,#32]
> + stp d12,d13,[sp,#48]
> + stp d14,d15,[sp,#64]
> +
> + fmov d24,x10
> + fmov d25,x11
> + fmov d26,x12
> + fmov d27,x13
> + fmov d28,x14
> +
> + ////////////////////////////////// initialize r^n table
> + mov x4,x7 // r^1
> + add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2)
> + mov x5,x8
> + mov x6,xzr
> + add x0,x0,#48+12
> + bl __poly1305_splat
> +
> + bl __poly1305_mult // r^2
> + sub x0,x0,#4
> + bl __poly1305_splat
> +
> + bl __poly1305_mult // r^3
> + sub x0,x0,#4
> + bl __poly1305_splat
> +
> + bl __poly1305_mult // r^4
> + sub x0,x0,#4
> + bl __poly1305_splat
> + ldr x30,[sp,#8]
> +
> + add x16,x1,#32
> + adr x17,.Lzeros
> + subs x2,x2,#64
> + csel x16,x17,x16,lo
> +
> + mov x4,#1
> + str x4,[x0,#-24] // set is_base2_26
> + sub x0,x0,#48 // restore original x0
> + b .Ldo_neon
> +
> +.align 4
> +.Leven_neon:
> + add x16,x1,#32
> + adr x17,.Lzeros
> + subs x2,x2,#64
> + csel x16,x17,x16,lo
> +
> + stp d8,d9,[sp,#16] // meet ABI requirements
> + stp d10,d11,[sp,#32]
> + stp d12,d13,[sp,#48]
> + stp d14,d15,[sp,#64]
> +
> + fmov d24,x10
> + fmov d25,x11
> + fmov d26,x12
> + fmov d27,x13
> + fmov d28,x14
> +
> +.Ldo_neon:
> + ldp x8,x12,[x16],#16 // inp[2:3] (or zero)
> + ldp x9,x13,[x16],#48
> +
> + lsl x3,x3,#24
> + add x15,x0,#48
> +
> +#ifdef __ARMEB__
> + rev x8,x8
> + rev x12,x12
> + rev x9,x9
> + rev x13,x13
> +#endif
> + and x4,x8,#0x03ffffff // base 2^64 -> base 2^26
> + and x5,x9,#0x03ffffff
> + ubfx x6,x8,#26,#26
> + ubfx x7,x9,#26,#26
> + add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32
> + extr x8,x12,x8,#52
> + extr x9,x13,x9,#52
> + add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32
> + fmov d14,x4
> + and x8,x8,#0x03ffffff
> + and x9,x9,#0x03ffffff
> + ubfx x10,x12,#14,#26
> + ubfx x11,x13,#14,#26
> + add x12,x3,x12,lsr#40
> + add x13,x3,x13,lsr#40
> + add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32
> + fmov d15,x6
> + add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32
> + add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32
> + fmov d16,x8
> + fmov d17,x10
> + fmov d18,x12
> +
> + ldp x8,x12,[x1],#16 // inp[0:1]
> + ldp x9,x13,[x1],#48
> +
> + ld1 {v0.4s,v1.4s,v2.4s,v3.4s},[x15],#64
> + ld1 {v4.4s,v5.4s,v6.4s,v7.4s},[x15],#64
> + ld1 {v8.4s},[x15]
> +
> +#ifdef __ARMEB__
> + rev x8,x8
> + rev x12,x12
> + rev x9,x9
> + rev x13,x13
> +#endif
> + and x4,x8,#0x03ffffff // base 2^64 -> base 2^26
> + and x5,x9,#0x03ffffff
> + ubfx x6,x8,#26,#26
> + ubfx x7,x9,#26,#26
> + add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32
> + extr x8,x12,x8,#52
> + extr x9,x13,x9,#52
> + add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32
> + fmov d9,x4
> + and x8,x8,#0x03ffffff
> + and x9,x9,#0x03ffffff
> + ubfx x10,x12,#14,#26
> + ubfx x11,x13,#14,#26
> + add x12,x3,x12,lsr#40
> + add x13,x3,x13,lsr#40
> + add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32
> + fmov d10,x6
> + add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32
> + add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32
> + movi v31.2d,#-1
> + fmov d11,x8
> + fmov d12,x10
> + fmov d13,x12
> + ushr v31.2d,v31.2d,#38
> +
> + b.ls .Lskip_loop
> +
> +.align 4
> +.Loop_neon:
> + ////////////////////////////////////////////////////////////////
> + // ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2
> + // ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^3+inp[7]*r
> + // \___________________/
> + // ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2+inp[8])*r^2
> + // ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^4+inp[7]*r^2+inp[9])*r
> + // \___________________/ \____________________/
> + //
> + // Note that we start with inp[2:3]*r^2. This is because it
> + // doesn't depend on reduction in previous iteration.
> + ////////////////////////////////////////////////////////////////
> + // d4 = h0*r4 + h1*r3 + h2*r2 + h3*r1 + h4*r0
> + // d3 = h0*r3 + h1*r2 + h2*r1 + h3*r0 + h4*5*r4
> + // d2 = h0*r2 + h1*r1 + h2*r0 + h3*5*r4 + h4*5*r3
> + // d1 = h0*r1 + h1*r0 + h2*5*r4 + h3*5*r3 + h4*5*r2
> + // d0 = h0*r0 + h1*5*r4 + h2*5*r3 + h3*5*r2 + h4*5*r1
> +
> + subs x2,x2,#64
> + umull v23.2d,v14.2s,v7.s[2]
> + csel x16,x17,x16,lo
> + umull v22.2d,v14.2s,v5.s[2]
> + umull v21.2d,v14.2s,v3.s[2]
> + ldp x8,x12,[x16],#16 // inp[2:3] (or zero)
> + umull v20.2d,v14.2s,v1.s[2]
> + ldp x9,x13,[x16],#48
> + umull v19.2d,v14.2s,v0.s[2]
> +#ifdef __ARMEB__
> + rev x8,x8
> + rev x12,x12
> + rev x9,x9
> + rev x13,x13
> +#endif
> +
> + umlal v23.2d,v15.2s,v5.s[2]
> + and x4,x8,#0x03ffffff // base 2^64 -> base 2^26
> + umlal v22.2d,v15.2s,v3.s[2]
> + and x5,x9,#0x03ffffff
> + umlal v21.2d,v15.2s,v1.s[2]
> + ubfx x6,x8,#26,#26
> + umlal v20.2d,v15.2s,v0.s[2]
> + ubfx x7,x9,#26,#26
> + umlal v19.2d,v15.2s,v8.s[2]
> + add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32
> +
> + umlal v23.2d,v16.2s,v3.s[2]
> + extr x8,x12,x8,#52
> + umlal v22.2d,v16.2s,v1.s[2]
> + extr x9,x13,x9,#52
> + umlal v21.2d,v16.2s,v0.s[2]
> + add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32
> + umlal v20.2d,v16.2s,v8.s[2]
> + fmov d14,x4
> + umlal v19.2d,v16.2s,v6.s[2]
> + and x8,x8,#0x03ffffff
> +
> + umlal v23.2d,v17.2s,v1.s[2]
> + and x9,x9,#0x03ffffff
> + umlal v22.2d,v17.2s,v0.s[2]
> + ubfx x10,x12,#14,#26
> + umlal v21.2d,v17.2s,v8.s[2]
> + ubfx x11,x13,#14,#26
> + umlal v20.2d,v17.2s,v6.s[2]
> + add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32
> + umlal v19.2d,v17.2s,v4.s[2]
> + fmov d15,x6
> +
> + add v11.2s,v11.2s,v26.2s
> + add x12,x3,x12,lsr#40
> + umlal v23.2d,v18.2s,v0.s[2]
> + add x13,x3,x13,lsr#40
> + umlal v22.2d,v18.2s,v8.s[2]
> + add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32
> + umlal v21.2d,v18.2s,v6.s[2]
> + add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32
> + umlal v20.2d,v18.2s,v4.s[2]
> + fmov d16,x8
> + umlal v19.2d,v18.2s,v2.s[2]
> + fmov d17,x10
> +
> + ////////////////////////////////////////////////////////////////
> + // (hash+inp[0:1])*r^4 and accumulate
> +
> + add v9.2s,v9.2s,v24.2s
> + fmov d18,x12
> + umlal v22.2d,v11.2s,v1.s[0]
> + ldp x8,x12,[x1],#16 // inp[0:1]
> + umlal v19.2d,v11.2s,v6.s[0]
> + ldp x9,x13,[x1],#48
> + umlal v23.2d,v11.2s,v3.s[0]
> + umlal v20.2d,v11.2s,v8.s[0]
> + umlal v21.2d,v11.2s,v0.s[0]
> +#ifdef __ARMEB__
> + rev x8,x8
> + rev x12,x12
> + rev x9,x9
> + rev x13,x13
> +#endif
> +
> + add v10.2s,v10.2s,v25.2s
> + umlal v22.2d,v9.2s,v5.s[0]
> + umlal v23.2d,v9.2s,v7.s[0]
> + and x4,x8,#0x03ffffff // base 2^64 -> base 2^26
> + umlal v21.2d,v9.2s,v3.s[0]
> + and x5,x9,#0x03ffffff
> + umlal v19.2d,v9.2s,v0.s[0]
> + ubfx x6,x8,#26,#26
> + umlal v20.2d,v9.2s,v1.s[0]
> + ubfx x7,x9,#26,#26
> +
> + add v12.2s,v12.2s,v27.2s
> + add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32
> + umlal v22.2d,v10.2s,v3.s[0]
> + extr x8,x12,x8,#52
> + umlal v23.2d,v10.2s,v5.s[0]
> + extr x9,x13,x9,#52
> + umlal v19.2d,v10.2s,v8.s[0]
> + add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32
> + umlal v21.2d,v10.2s,v1.s[0]
> + fmov d9,x4
> + umlal v20.2d,v10.2s,v0.s[0]
> + and x8,x8,#0x03ffffff
> +
> + add v13.2s,v13.2s,v28.2s
> + and x9,x9,#0x03ffffff
> + umlal v22.2d,v12.2s,v0.s[0]
> + ubfx x10,x12,#14,#26
> + umlal v19.2d,v12.2s,v4.s[0]
> + ubfx x11,x13,#14,#26
> + umlal v23.2d,v12.2s,v1.s[0]
> + add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32
> + umlal v20.2d,v12.2s,v6.s[0]
> + fmov d10,x6
> + umlal v21.2d,v12.2s,v8.s[0]
> + add x12,x3,x12,lsr#40
> +
> + umlal v22.2d,v13.2s,v8.s[0]
> + add x13,x3,x13,lsr#40
> + umlal v19.2d,v13.2s,v2.s[0]
> + add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32
> + umlal v23.2d,v13.2s,v0.s[0]
> + add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32
> + umlal v20.2d,v13.2s,v4.s[0]
> + fmov d11,x8
> + umlal v21.2d,v13.2s,v6.s[0]
> + fmov d12,x10
> + fmov d13,x12
> +
> + /////////////////////////////////////////////////////////////////
> + // lazy reduction as discussed in "NEON crypto" by D.J. Bernstein
> + // and P. Schwabe
> + //
> + // [see discussion in poly1305-armv4 module]
> +
> + ushr v29.2d,v22.2d,#26
> + xtn v27.2s,v22.2d
> + ushr v30.2d,v19.2d,#26
> + and v19.16b,v19.16b,v31.16b
> + add v23.2d,v23.2d,v29.2d // h3 -> h4
> + bic v27.2s,#0xfc,lsl#24 // &=0x03ffffff
> + add v20.2d,v20.2d,v30.2d // h0 -> h1
> +
> + ushr v29.2d,v23.2d,#26
> + xtn v28.2s,v23.2d
> + ushr v30.2d,v20.2d,#26
> + xtn v25.2s,v20.2d
> + bic v28.2s,#0xfc,lsl#24
> + add v21.2d,v21.2d,v30.2d // h1 -> h2
> +
> + add v19.2d,v19.2d,v29.2d
> + shl v29.2d,v29.2d,#2
> + shrn v30.2s,v21.2d,#26
> + xtn v26.2s,v21.2d
> + add v19.2d,v19.2d,v29.2d // h4 -> h0
> + bic v25.2s,#0xfc,lsl#24
> + add v27.2s,v27.2s,v30.2s // h2 -> h3
> + bic v26.2s,#0xfc,lsl#24
> +
> + shrn v29.2s,v19.2d,#26
> + xtn v24.2s,v19.2d
> + ushr v30.2s,v27.2s,#26
> + bic v27.2s,#0xfc,lsl#24
> + bic v24.2s,#0xfc,lsl#24
> + add v25.2s,v25.2s,v29.2s // h0 -> h1
> + add v28.2s,v28.2s,v30.2s // h3 -> h4
> +
> + b.hi .Loop_neon
> +
> +.Lskip_loop:
> + dup v16.2d,v16.d[0]
> + add v11.2s,v11.2s,v26.2s
> +
> + ////////////////////////////////////////////////////////////////
> + // multiply (inp[0:1]+hash) or inp[2:3] by r^2:r^1
> +
> + adds x2,x2,#32
> + b.ne .Long_tail
> +
> + dup v16.2d,v11.d[0]
> + add v14.2s,v9.2s,v24.2s
> + add v17.2s,v12.2s,v27.2s
> + add v15.2s,v10.2s,v25.2s
> + add v18.2s,v13.2s,v28.2s
> +
> +.Long_tail:
> + dup v14.2d,v14.d[0]
> + umull2 v19.2d,v16.4s,v6.4s
> + umull2 v22.2d,v16.4s,v1.4s
> + umull2 v23.2d,v16.4s,v3.4s
> + umull2 v21.2d,v16.4s,v0.4s
> + umull2 v20.2d,v16.4s,v8.4s
> +
> + dup v15.2d,v15.d[0]
> + umlal2 v19.2d,v14.4s,v0.4s
> + umlal2 v21.2d,v14.4s,v3.4s
> + umlal2 v22.2d,v14.4s,v5.4s
> + umlal2 v23.2d,v14.4s,v7.4s
> + umlal2 v20.2d,v14.4s,v1.4s
> +
> + dup v17.2d,v17.d[0]
> + umlal2 v19.2d,v15.4s,v8.4s
> + umlal2 v22.2d,v15.4s,v3.4s
> + umlal2 v21.2d,v15.4s,v1.4s
> + umlal2 v23.2d,v15.4s,v5.4s
> + umlal2 v20.2d,v15.4s,v0.4s
> +
> + dup v18.2d,v18.d[0]
> + umlal2 v22.2d,v17.4s,v0.4s
> + umlal2 v23.2d,v17.4s,v1.4s
> + umlal2 v19.2d,v17.4s,v4.4s
> + umlal2 v20.2d,v17.4s,v6.4s
> + umlal2 v21.2d,v17.4s,v8.4s
> +
> + umlal2 v22.2d,v18.4s,v8.4s
> + umlal2 v19.2d,v18.4s,v2.4s
> + umlal2 v23.2d,v18.4s,v0.4s
> + umlal2 v20.2d,v18.4s,v4.4s
> + umlal2 v21.2d,v18.4s,v6.4s
> +
> + b.eq .Lshort_tail
> +
> + ////////////////////////////////////////////////////////////////
> + // (hash+inp[0:1])*r^4:r^3 and accumulate
> +
> + add v9.2s,v9.2s,v24.2s
> + umlal v22.2d,v11.2s,v1.2s
> + umlal v19.2d,v11.2s,v6.2s
> + umlal v23.2d,v11.2s,v3.2s
> + umlal v20.2d,v11.2s,v8.2s
> + umlal v21.2d,v11.2s,v0.2s
> +
> + add v10.2s,v10.2s,v25.2s
> + umlal v22.2d,v9.2s,v5.2s
> + umlal v19.2d,v9.2s,v0.2s
> + umlal v23.2d,v9.2s,v7.2s
> + umlal v20.2d,v9.2s,v1.2s
> + umlal v21.2d,v9.2s,v3.2s
> +
> + add v12.2s,v12.2s,v27.2s
> + umlal v22.2d,v10.2s,v3.2s
> + umlal v19.2d,v10.2s,v8.2s
> + umlal v23.2d,v10.2s,v5.2s
> + umlal v20.2d,v10.2s,v0.2s
> + umlal v21.2d,v10.2s,v1.2s
> +
> + add v13.2s,v13.2s,v28.2s
> + umlal v22.2d,v12.2s,v0.2s
> + umlal v19.2d,v12.2s,v4.2s
> + umlal v23.2d,v12.2s,v1.2s
> + umlal v20.2d,v12.2s,v6.2s
> + umlal v21.2d,v12.2s,v8.2s
> +
> + umlal v22.2d,v13.2s,v8.2s
> + umlal v19.2d,v13.2s,v2.2s
> + umlal v23.2d,v13.2s,v0.2s
> + umlal v20.2d,v13.2s,v4.2s
> + umlal v21.2d,v13.2s,v6.2s
> +
> +.Lshort_tail:
> + ////////////////////////////////////////////////////////////////
> + // horizontal add
> +
> + addp v22.2d,v22.2d,v22.2d
> + ldp d8,d9,[sp,#16] // meet ABI requirements
> + addp v19.2d,v19.2d,v19.2d
> + ldp d10,d11,[sp,#32]
> + addp v23.2d,v23.2d,v23.2d
> + ldp d12,d13,[sp,#48]
> + addp v20.2d,v20.2d,v20.2d
> + ldp d14,d15,[sp,#64]
> + addp v21.2d,v21.2d,v21.2d
> +
> + ////////////////////////////////////////////////////////////////
> + // lazy reduction, but without narrowing
> +
> + ushr v29.2d,v22.2d,#26
> + and v22.16b,v22.16b,v31.16b
> + ushr v30.2d,v19.2d,#26
> + and v19.16b,v19.16b,v31.16b
> +
> + add v23.2d,v23.2d,v29.2d // h3 -> h4
> + add v20.2d,v20.2d,v30.2d // h0 -> h1
> +
> + ushr v29.2d,v23.2d,#26
> + and v23.16b,v23.16b,v31.16b
> + ushr v30.2d,v20.2d,#26
> + and v20.16b,v20.16b,v31.16b
> + add v21.2d,v21.2d,v30.2d // h1 -> h2
> +
> + add v19.2d,v19.2d,v29.2d
> + shl v29.2d,v29.2d,#2
> + ushr v30.2d,v21.2d,#26
> + and v21.16b,v21.16b,v31.16b
> + add v19.2d,v19.2d,v29.2d // h4 -> h0
> + add v22.2d,v22.2d,v30.2d // h2 -> h3
> +
> + ushr v29.2d,v19.2d,#26
> + and v19.16b,v19.16b,v31.16b
> + ushr v30.2d,v22.2d,#26
> + and v22.16b,v22.16b,v31.16b
> + add v20.2d,v20.2d,v29.2d // h0 -> h1
> + add v23.2d,v23.2d,v30.2d // h3 -> h4
> +
> + ////////////////////////////////////////////////////////////////
> + // write the result, can be partially reduced
> +
> + st4 {v19.s,v20.s,v21.s,v22.s}[0],[x0],#16
> + st1 {v23.s}[0],[x0]
> +
> +.Lno_data_neon:
> + ldr x29,[sp],#80
> + ret
> +ENDPROC(poly1305_blocks_neon)
> +
> +.align 5
> +ENTRY(poly1305_emit_neon)
> + ldr x17,[x0,#24]
> + cbz x17,poly1305_emit_arm
> +
> + ldp w10,w11,[x0] // load hash value base 2^26
> + ldp w12,w13,[x0,#8]
> + ldr w14,[x0,#16]
> +
> + add x4,x10,x11,lsl#26 // base 2^26 -> base 2^64
> + lsr x5,x12,#12
> + adds x4,x4,x12,lsl#52
> + add x5,x5,x13,lsl#14
> + adc x5,x5,xzr
> + lsr x6,x14,#24
> + adds x5,x5,x14,lsl#40
> + adc x6,x6,xzr // can be partially reduced...
> +
> + ldp x10,x11,[x2] // load nonce
> +
> + and x12,x6,#-4 // ... so reduce
> + add x12,x12,x6,lsr#2
> + and x6,x6,#3
> + adds x4,x4,x12
> + adcs x5,x5,xzr
> + adc x6,x6,xzr
> +
> + adds x12,x4,#5 // compare to modulus
> + adcs x13,x5,xzr
> + adc x14,x6,xzr
> +
> + tst x14,#-4 // see if it's carried/borrowed
> +
> + csel x4,x4,x12,eq
> + csel x5,x5,x13,eq
> +
> +#ifdef __ARMEB__
> + ror x10,x10,#32 // flip nonce words
> + ror x11,x11,#32
> +#endif
> + adds x4,x4,x10 // accumulate nonce
> + adc x5,x5,x11
> +#ifdef __ARMEB__
> + rev x4,x4 // flip output bytes
> + rev x5,x5
> +#endif
> + stp x4,x5,[x1] // write result
> +
> + ret
> +ENDPROC(poly1305_emit_neon)
> +
> +.align 5
> +.Lzeros:
> +.long 0,0,0,0,0,0,0,0
> --
> 2.19.0
>
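For readers following the scalar tail of poly1305_emit_neon quoted above: its first few instructions repack the hash from five 26-bit limbs into three 64-bit words, propagating carries between the words. Below is a minimal C sketch of that repacking step only. The helper name and the array layout are illustrative assumptions, not part of the patch, and the final reduction modulo 2^130 - 5 and the nonce addition are left out.

```c
#include <stdint.h>

/*
 * Repack h = h[0] + h[1]*2^26 + h[2]*2^52 + h[3]*2^78 + h[4]*2^104
 * into out[0] + out[1]*2^64 + out[2]*2^128.
 * Limbs are 32-bit values and may be only partially reduced.
 * Hypothetical helper, for illustration only.
 */
static void ex_limbs26_to_words64(const uint32_t h[5], uint64_t out[3])
{
	uint64_t lo, mid, hi, t, carry;

	/* h[0] < 2^32 and h[1] << 26 < 2^58, so this sum cannot overflow */
	lo = (uint64_t)h[0] + ((uint64_t)h[1] << 26);

	/* the low 12 bits of h[2] land in bits 52..63 of the low word */
	t = (uint64_t)h[2] << 52;
	lo += t;
	carry = lo < t;		/* carry out of bit 63 */

	/* the remaining bits of h[2], plus h[3], start the middle word */
	mid = ((uint64_t)h[2] >> 12) + ((uint64_t)h[3] << 14) + carry;

	/* the low 24 bits of h[4] land in bits 40..63 of the middle word */
	t = (uint64_t)h[4] << 40;
	mid += t;
	carry = mid < t;

	/* whatever is left of h[4] goes to the top word */
	hi = ((uint64_t)h[4] >> 24) + carry;

	out[0] = lo;
	out[1] = mid;
	out[2] = hi;
}
```

With all five limbs set to 0x3ffffff (that is, h = 2^130 - 1), the two low words come out as all ones and the top word as 3.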
* [PATCH net-next] cxgb4: update supported DCB version
From: Ganesh Goudar @ 2018-09-14 12:05 UTC (permalink / raw)
To: netdev, davem; +Cc: nirranjan, indranil, dt, varun, Ganesh Goudar
- In the CXGB4_DCB_STATE_FW_INCOMPLETE state, check whether the DCB
version has changed and update the supported DCB version accordingly.
- Also, fill in the priority code point (PCP) value for priority-based
flow control.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.c | 27 ++++++++++++++++++++++++++
drivers/net/ethernet/chelsio/cxgb4/l2t.c | 6 ++++--
2 files changed, 31 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.c
index b34f0f0..6ba3104 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.c
@@ -114,6 +114,24 @@ void cxgb4_dcb_reset(struct net_device *dev)
cxgb4_dcb_state_init(dev);
}
+/* update the dcb port support, if version is IEEE then set it to
+ * FW_PORT_DCB_VER_IEEE and if DCB_CAP_DCBX_VER_CEE is already set then
+ * clear that. and if it is set to CEE then set dcb supported to
+ * DCB_CAP_DCBX_VER_CEE & if DCB_CAP_DCBX_VER_IEEE is set, clear it
+ */
+static inline void cxgb4_dcb_update_support(struct port_dcb_info *dcb)
+{
+ if (dcb->dcb_version == FW_PORT_DCB_VER_IEEE) {
+ if (dcb->supported & DCB_CAP_DCBX_VER_CEE)
+ dcb->supported &= ~DCB_CAP_DCBX_VER_CEE;
+ dcb->supported |= DCB_CAP_DCBX_VER_IEEE;
+ } else if (dcb->dcb_version == FW_PORT_DCB_VER_CEE1D01) {
+ if (dcb->supported & DCB_CAP_DCBX_VER_IEEE)
+ dcb->supported &= ~DCB_CAP_DCBX_VER_IEEE;
+ dcb->supported |= DCB_CAP_DCBX_VER_CEE;
+ }
+}
+
/* Finite State machine for Data Center Bridging.
*/
void cxgb4_dcb_state_fsm(struct net_device *dev,
@@ -165,6 +183,15 @@ void cxgb4_dcb_state_fsm(struct net_device *dev,
}
case CXGB4_DCB_STATE_FW_INCOMPLETE: {
+ if (transition_to != CXGB4_DCB_INPUT_FW_DISABLED) {
+ /* during this CXGB4_DCB_STATE_FW_INCOMPLETE state,
+ * check if the dcb version is changed (there can be
+ * mismatch in default config & the negotiated switch
+ * configuration at FW, so update the dcb support
+ * accordingly.
+ */
+ cxgb4_dcb_update_support(dcb);
+ }
switch (transition_to) {
case CXGB4_DCB_INPUT_FW_ENABLED: {
/* we're alreaady in firmware DCB mode */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/l2t.c b/drivers/net/ethernet/chelsio/cxgb4/l2t.c
index 301c4df..99022c0 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/l2t.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/l2t.c
@@ -433,10 +433,12 @@ struct l2t_entry *cxgb4_l2t_get(struct l2t_data *d, struct neighbour *neigh,
else
lport = netdev2pinfo(physdev)->lport;
- if (is_vlan_dev(neigh->dev))
+ if (is_vlan_dev(neigh->dev)) {
vlan = vlan_dev_vlan_id(neigh->dev);
- else
+ vlan |= vlan_dev_get_egress_qos_mask(neigh->dev, priority);
+ } else {
vlan = VLAN_NONE;
+ }
write_lock_bh(&d->lock);
for (e = d->l2tab[hash].first; e; e = e->next)
--
2.1.0
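For context on the l2t.c hunk: an 802.1Q tag control field carries the 12-bit VLAN ID in its low bits and the 3-bit priority code point (PCP) in its top three bits, which is why a QoS mask can simply be OR-ed into the VLAN ID. Below is a minimal standalone sketch, assuming the mask is already shifted into the PCP position. The EX_-prefixed constants and the helper name are illustrative and are not the kernel's definitions.

```c
#include <stdint.h>

/* Illustrative constants mirroring the 802.1Q TCI layout. */
#define EX_VLAN_VID_MASK   0x0fffu	/* bits 0..11: VLAN ID */
#define EX_VLAN_PRIO_SHIFT 13		/* bits 13..15: PCP */
#define EX_VLAN_PRIO_MASK  0xe000u

/* Combine a VLAN ID and a 3-bit priority into one 16-bit TCI value. */
static uint16_t ex_vlan_tci(uint16_t vid, uint8_t pcp)
{
	uint16_t qos = (uint16_t)(((unsigned int)pcp << EX_VLAN_PRIO_SHIFT) &
				  EX_VLAN_PRIO_MASK);

	return (uint16_t)((vid & EX_VLAN_VID_MASK) | qos);
}
```

For example, VLAN ID 100 with priority 5 yields 0xa064, while priority 0 leaves the VLAN ID unchanged.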
* Re: [PATCH] net/mlx4_core: print firmware version during driver loading
From: Qing Huang @ 2018-09-14 17:15 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: netdev, linux-rdma, linux-kernel, tariqt, davem
In-Reply-To: <20180914044314.GC5257@mtr-leonro.mtl.com>
The FW version is actually a very crucial piece of information and is only
printed once here, when the driver is loaded. People tend to get confused
when switching multiple FW files back and forth without running separate
utility tools, especially at customer sites.
IMHO, this information is very useful and takes up very little log file
space. :-)
I was also thinking of doing something slightly different. Maybe we just
trim down the output string and add something like this?
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2208,6 +2208,11 @@ static int mlx4_init_fw(struct mlx4_dev *dev)
return err;
}
+ mlx4_info(dev, "Installed FW version is %d.%d.%03d.\n",
+ (int) (dev->caps.fw_ver >> 32),
+ (int) (dev->caps.fw_ver >> 16) & 0xffff,
+ (int) dev->caps.fw_ver & 0xffff);
+
err = mlx4_load_fw(dev);
if (err) {
mlx4_err(dev, "Failed to start FW, aborting\n");
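As an aside, the format arguments in the snippet above imply that fw_ver keeps the major number above bit 32, the minor number in bits 16-31 and the sub-minor number in bits 0-15. Below is a small userspace sketch of the same decoding. The helper name is hypothetical and any version value passed to it here is made up.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a packed 64-bit firmware version as "major.minor.subminor",
 * using the same "%d.%d.%03d" layout as the proposed log message.
 * Returns what snprintf() returns.
 */
static int ex_format_fw_ver(uint64_t fw_ver, char *buf, size_t len)
{
	return snprintf(buf, len, "%d.%d.%03d",
			(int)(fw_ver >> 32),
			(int)((fw_ver >> 16) & 0xffff),
			(int)(fw_ver & 0xffff));
}
```

A packed value built from major 2, minor 42 and sub-minor 7 formats as "2.42.007".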
Thanks,
Qing
On 9/13/2018 9:43 PM, Leon Romanovsky wrote:
> On Thu, Sep 13, 2018 at 05:25:14PM -0700, Qing Huang wrote:
>> When debugging firmware related issues, it's very helpful to have
> ^^^^^^^^^^ exactly, this is why we set this print as mlx4_dbg and
> not mlx4_info.
>
>> the installed FW version info in the kernel log when the driver is
>> loaded. It's easier to match error/warning messages with different
>> FW versions in the log other than running a separate tool to get
>> the information back and forth.
>>
>> Signed-off-by: Qing Huang <qing.huang@oracle.com>
>> ---
>> drivers/net/ethernet/mellanox/mlx4/fw.c | 10 +++++-----
>> 1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
>> index babcfd9..e1c5218 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/fw.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
>> @@ -1686,11 +1686,11 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev)
>> MLX4_GET(lg, outbox, QUERY_FW_MAX_CMD_OFFSET);
>> cmd->max_cmds = 1 << lg;
>>
>> - mlx4_dbg(dev, "FW version %d.%d.%03d (cmd intf rev %d), max commands %d\n",
>> - (int) (dev->caps.fw_ver >> 32),
>> - (int) (dev->caps.fw_ver >> 16) & 0xffff,
>> - (int) dev->caps.fw_ver & 0xffff,
>> - cmd_if_rev, cmd->max_cmds);
>> + mlx4_info(dev, "FW version %d.%d.%03d (cmd intf rev %d), max commands %d\n",
>> + (int)(dev->caps.fw_ver >> 32),
>> + (int)(dev->caps.fw_ver >> 16) & 0xffff,
>> + (int)dev->caps.fw_ver & 0xffff,
>> + cmd_if_rev, cmd->max_cmds);
>>
>> MLX4_GET(fw->catas_offset, outbox, QUERY_FW_ERR_START_OFFSET);
>> MLX4_GET(fw->catas_size, outbox, QUERY_FW_ERR_SIZE_OFFSET);
>> --
>> 2.9.3
>>
* Re: [PATCH 5/7] MIPS: mscc: ocelot: add GPIO4 pinmuxing DT node
From: Andrew Lunn @ 2018-09-14 17:02 UTC (permalink / raw)
To: Quentin Schulz
Cc: Alexandre Belloni, ralf, paul.burton, jhogan, robh+dt,
mark.rutland, davem, f.fainelli, allan.nielsen, linux-mips,
devicetree, linux-kernel, netdev, thomas.petazzoni,
antoine.tenart
In-Reply-To: <20180914162638.fgzzjin2bzgx74de@qschulz>
On Fri, Sep 14, 2018 at 06:26:38PM +0200, Quentin Schulz wrote:
> Hi Alexandre,
>
> On Fri, Sep 14, 2018 at 04:54:46PM +0200, Alexandre Belloni wrote:
> > Hi,
> >
> > On 14/09/2018 11:44:26+0200, Quentin Schulz wrote:
> > > In order to use GPIO4 as a GPIO, we need to mux it in this mode so let's
> > > declare a new pinctrl DT node for it.
> > >
> > > Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
> > > ---
> > > arch/mips/boot/dts/mscc/ocelot.dtsi | 5 +++++
> > > 1 file changed, 5 insertions(+)
> > >
> > > diff --git a/arch/mips/boot/dts/mscc/ocelot.dtsi b/arch/mips/boot/dts/mscc/ocelot.dtsi
> > > index 8ce317c..b5c4c74 100644
> > > --- a/arch/mips/boot/dts/mscc/ocelot.dtsi
> > > +++ b/arch/mips/boot/dts/mscc/ocelot.dtsi
> > > @@ -182,6 +182,11 @@
> > > interrupts = <13>;
> > > #interrupt-cells = <2>;
> > >
> > > + gpio4: gpio4 {
> > > + pins = "GPIO_4";
> > > + function = "gpio";
> > > + };
> > > +
> >
> > For a GPIO, I would do that in the board dts because it is not used
> > directly in the dtsi.
> >
>
> And the day we've two boards using this pinctrl we move it to a dtsi. Is
> that the plan?
Hi Quentin
gpio4 appears to be pretty arbitrary. Could a different design use a
different gpio? To me, this seems like a board property.
Andrew
* Re: [PATCH net-next 2/7] net: phy: mscc: add support for VSC8584 PHY
From: Andrew Lunn @ 2018-09-14 16:58 UTC (permalink / raw)
To: Quentin Schulz
Cc: alexandre.belloni, ralf, paul.burton, jhogan, robh+dt,
mark.rutland, davem, f.fainelli, allan.nielsen, linux-mips,
devicetree, linux-kernel, netdev, thomas.petazzoni,
antoine.tenart
In-Reply-To: <20180914162828.5e75ffh5sig4om3d@qschulz>
> Confirmed by HW engineers, it only impacts PHYs in the same package.
Hi Quentin
Thanks for checking. As you said, it would be counter intuitive,
meaning a lot of confusion if it actually did happen.
Maybe you can add "in package" before broadcast in the commit message
and the code comments.
Andrew
* [PATCH] net: hp100: fix always-true check for link up state
From: Colin King @ 2018-09-14 16:39 UTC (permalink / raw)
To: Jaroslav Kysela, David S . Miller, netdev; +Cc: kernel-janitors, linux-kernel
From: Colin Ian King <colin.king@canonical.com>
The expression ~(hp100_inb(VG_LAN_CFG_1) & HP100_LINK_UP_ST) evaluates to
a value that is always non-zero, hence the wait for the link to drop
always terminates prematurely. Fix this by using a logical not operator
instead of a bitwise complement. This issue has been in the driver since
pre-2.6.12-rc2.
Detected by CoverityScan, CID#114157 ("Logical vs. bitwise operator")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
drivers/net/ethernet/hp/hp100.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hp/hp100.c b/drivers/net/ethernet/hp/hp100.c
index c8c7ad2eff77..9b5a68b65432 100644
--- a/drivers/net/ethernet/hp/hp100.c
+++ b/drivers/net/ethernet/hp/hp100.c
@@ -2634,7 +2634,7 @@ static int hp100_login_to_vg_hub(struct net_device *dev, u_short force_relogin)
/* Wait for link to drop */
time = jiffies + (HZ / 10);
do {
- if (~(hp100_inb(VG_LAN_CFG_1) & HP100_LINK_UP_ST))
+ if (!(hp100_inb(VG_LAN_CFG_1) & HP100_LINK_UP_ST))
break;
if (!in_interrupt())
schedule_timeout_interruptible(1);
--
2.17.1
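To illustrate the commit message: when the mask is narrower than the integer type, the bitwise complement of a masked value is never zero, whereas the logical not is true exactly when the masked bit is clear. Below is a minimal standalone sketch. The EX_LINK_UP_ST value and the helper names are made up for illustration and are not the driver's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_LINK_UP_ST 0x10u	/* hypothetical link-up status bit */

/* Buggy form: ~ flips every bit, so the result is never zero. */
static bool ex_link_down_buggy(uint8_t reg)
{
	return ~(reg & EX_LINK_UP_ST) != 0;
}

/* Fixed form: true only when the link-up bit is clear. */
static bool ex_link_down_fixed(uint8_t reg)
{
	return !(reg & EX_LINK_UP_ST);
}
```

The buggy helper reports "link down" whether or not the bit is set, which is why the wait loop exited on its first pass.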
* [PATCH] net: ethernet: ti: add missing GENERIC_ALLOCATOR dependency
From: Corentin Labbe @ 2018-09-14 11:20 UTC (permalink / raw)
To: davem; +Cc: linux-kernel, netdev, Corentin Labbe
This patch makes TI_DAVINCI_CPDMA select GENERIC_ALLOCATOR.
Without that, the following sparc64 build failure happens:
drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_check_free_tx_desc':
(.text+0x278): undefined reference to `gen_pool_avail'
drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_chan_submit':
(.text+0x340): undefined reference to `gen_pool_alloc'
(.text+0x5c4): undefined reference to `gen_pool_free'
drivers/net/ethernet/ti/davinci_cpdma.o: In function `__cpdma_chan_free':
davinci_cpdma.c:(.text+0x64c): undefined reference to `gen_pool_free'
drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_desc_pool_destroy.isra.6':
davinci_cpdma.c:(.text+0x17ac): undefined reference to `gen_pool_size'
davinci_cpdma.c:(.text+0x17b8): undefined reference to `gen_pool_avail'
davinci_cpdma.c:(.text+0x1824): undefined reference to `gen_pool_size'
davinci_cpdma.c:(.text+0x1830): undefined reference to `gen_pool_avail'
drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_ctlr_create':
(.text+0x19f8): undefined reference to `devm_gen_pool_create'
(.text+0x1a90): undefined reference to `gen_pool_add_virt'
Makefile:1011: recipe for target 'vmlinux' failed
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
---
drivers/net/ethernet/ti/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/ti/Kconfig b/drivers/net/ethernet/ti/Kconfig
index 9263d63..f932923 100644
--- a/drivers/net/ethernet/ti/Kconfig
+++ b/drivers/net/ethernet/ti/Kconfig
@@ -41,6 +41,7 @@ config TI_DAVINCI_MDIO
config TI_DAVINCI_CPDMA
tristate "TI DaVinci CPDMA Support"
depends on ARCH_DAVINCI || ARCH_OMAP2PLUS || COMPILE_TEST
+ select GENERIC_ALLOCATOR
---help---
This driver supports TI's DaVinci CPDMA dma engine.
--
2.7.4
* Re: [PATCH net-next 2/7] net: phy: mscc: add support for VSC8584 PHY
From: Quentin Schulz @ 2018-09-14 16:28 UTC (permalink / raw)
To: Andrew Lunn
Cc: alexandre.belloni, ralf, paul.burton, jhogan, robh+dt,
mark.rutland, davem, f.fainelli, allan.nielsen, linux-mips,
devicetree, linux-kernel, netdev, thomas.petazzoni,
antoine.tenart
In-Reply-To: <20180914132930.fphdm3dm2incetbq@qschulz>
Hi Andrew,
On Fri, Sep 14, 2018 at 03:29:30PM +0200, Quentin Schulz wrote:
> Hi Andrew,
>
> On Fri, Sep 14, 2018 at 03:18:46PM +0200, Andrew Lunn wrote:
> > > Most of the init sequence of a PHY of the package is common to all PHYs
> > > in the package, thus we use the SMI broadcast feature which enables us
> > > to propagate a write in one register of one PHY to all PHYs in the
> > > package.
> >
> > Hi Quinten
> >
> > Could you say a bit more about the broadcast. Does the SMI broadcast
> > go to all PHY everywhere on an MDIO bus, or only all PHYs within one
> > package? I'm just thinking about the case you need two of these
> > packages to cover 8 switch ports.
> >
>
> Ah sorry, that wasn't very explicit. That's a feature on the PHY side so
> my wildest guess is that it wouldn't impact any other PHY outside of
> this package. Affecting any other PHY on the bus is counter-intuitive to
> me but I'll ask the HW engineers for confirmation.
>
Confirmed by HW engineers, it only impacts PHYs in the same package.
Quentin
* Re: [PATCH 5/7] MIPS: mscc: ocelot: add GPIO4 pinmuxing DT node
From: Quentin Schulz @ 2018-09-14 16:26 UTC (permalink / raw)
To: Alexandre Belloni
Cc: ralf, paul.burton, jhogan, robh+dt, mark.rutland, davem, andrew,
f.fainelli, allan.nielsen, linux-mips, devicetree, linux-kernel,
netdev, thomas.petazzoni, antoine.tenart
In-Reply-To: <20180914145446.GQ14988@piout.net>
Hi Alexandre,
On Fri, Sep 14, 2018 at 04:54:46PM +0200, Alexandre Belloni wrote:
> Hi,
>
> On 14/09/2018 11:44:26+0200, Quentin Schulz wrote:
> > In order to use GPIO4 as a GPIO, we need to mux it in this mode so let's
> > declare a new pinctrl DT node for it.
> >
> > Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
> > ---
> > arch/mips/boot/dts/mscc/ocelot.dtsi | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/arch/mips/boot/dts/mscc/ocelot.dtsi b/arch/mips/boot/dts/mscc/ocelot.dtsi
> > index 8ce317c..b5c4c74 100644
> > --- a/arch/mips/boot/dts/mscc/ocelot.dtsi
> > +++ b/arch/mips/boot/dts/mscc/ocelot.dtsi
> > @@ -182,6 +182,11 @@
> > interrupts = <13>;
> > #interrupt-cells = <2>;
> >
> > + gpio4: gpio4 {
> > + pins = "GPIO_4";
> > + function = "gpio";
> > + };
> > +
>
> For a GPIO, I would do that in the board dts because it is not used
> directly in the dtsi.
>
And the day we've two boards using this pinctrl we move it to a dtsi. Is
that the plan?
Thanks,
Quentin