Netdev List
* Re: STMMAC driver with TSO enabled issue
From: Jose Abreu @ 2018-05-25 14:05 UTC (permalink / raw)
  To: Bhadram Varka, Jose Abreu, netdev@vger.kernel.org, Joao Pinto
In-Reply-To: <3143ce1f-30e7-6b8d-06b5-6048abab54bc@nvidia.com>

Hi Bhadram,

On 25-05-2018 05:41, Bhadram Varka wrote:
> Hi Jose,
>
> On 5/24/2018 3:01 PM, Jose Abreu wrote:
>> Hi Bhadram,
>>
>> On 24-05-2018 06:58, Bhadram Varka wrote:
>>>
>>> After some time if check Tx descriptor status - then I see only
>>> below
>>>
>>> [..]
>>> [85788.286730] 027 [0x827951b0]: 0xf854f000 0x0 0x16d8
>>> 0x90000000
>>>
>>> index 025 and 026 descriptors processed but not index 027.
>>>
>>> At this stage Tx DMA is always in below state -
>>>
>>> ■ 3'b011: Running (Reading Data from system memory
>>> buffer and queuing it to the Tx buffer (Tx FIFO))
>>
>> That's strange, I think the descriptors look okay though. I will
>> need the registers values (before the lock) and, if possible, the
>> git bisect output.
>
> Attaching the register dump file after the issue observed.
> Please check once.
>

----->8-----
0x112c = 0x0000003F
0x11ac = 0x0000003F
0x122c = 0x0000003F
0x12ac = 0x0000003F

0x1130 = 0x0000003F
0x11b0 = 0x0000003F
0x1230 = 0x0000003F
0x12b0 = 0x0000003F
----->8-----

This can't be right, it should be DMA_{RX/TX}_SIZE - 1 = 511. Did
you change these values in the code?
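
As a rough sketch of the arithmetic behind that remark: the ring-length registers are programmed with the ring size minus one, so 0x3F corresponds to 64 descriptors and a 512-entry ring would read back as 511. The helper names below are hypothetical and are not taken from the stmmac driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not stmmac code: the ring-length register is
 * assumed to hold (number of descriptors - 1). */
static inline uint32_t ring_len_reg_from_size(uint32_t ring_size)
{
	assert(ring_size > 0);	/* an empty ring has no valid encoding */
	return ring_size - 1;
}

static inline uint32_t ring_size_from_len_reg(uint32_t reg)
{
	return reg + 1;
}
```

With these helpers, ring_size_from_len_reg(0x3F) gives 64 and ring_len_reg_from_size(512) gives 511 (0x1FF), which is the value expected for an unmodified DMA_{RX/TX}_SIZE.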

Thanks and Best Regards,
Jose Miguel Abreu

* KASAN: use-after-free Write in tls_push_record
From: syzbot @ 2018-05-25 14:16 UTC (permalink / raw)
  To: aviadye, borisp, davejwatson, davem, linux-kernel, netdev,
	syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    13405468f49d bpfilter: don't pass O_CREAT when opening con..
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=109ad82f800000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8be0182d69f8d422
dashboard link: https://syzkaller.appspot.com/bug?extid=709f2810a6a05f11d4d3
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=151ec3a7800000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=154d302f800000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com

RDX: 00000000fffffdef RSI: 00000000200005c0 RDI: 0000000000000003
RBP: 00007ffd6ccdd780 R08: 0000000020000000 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000212 R12: 0000000000000004
R13: ffffffffffffffff R14: 0000000000000000 R15: 0000000000000000
==================================================================
BUG: KASAN: use-after-free in tls_fill_prepend include/net/tls.h:339 [inline]
BUG: KASAN: use-after-free in tls_push_record+0x1023/0x13e0 net/tls/tls_sw.c:240
Write of size 1 at addr ffff8801d88d5000 by task syz-executor377/4600

CPU: 1 PID: 4600 Comm: syz-executor377 Not tainted 4.17.0-rc6+ #61
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
  __asan_report_store1_noabort+0x17/0x20 mm/kasan/report.c:435
  tls_fill_prepend include/net/tls.h:339 [inline]
  tls_push_record+0x1023/0x13e0 net/tls/tls_sw.c:240
  tls_sw_sendmsg+0x9de/0x12b0 net/tls/tls_sw.c:484
  inet_sendmsg+0x19f/0x690 net/ipv4/af_inet.c:798
  sock_sendmsg_nosec net/socket.c:629 [inline]
  sock_sendmsg+0xd5/0x120 net/socket.c:639
  __sys_sendto+0x3d7/0x670 net/socket.c:1789
  __do_sys_sendto net/socket.c:1801 [inline]
  __se_sys_sendto net/socket.c:1797 [inline]
  __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1797
  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4416d9
RSP: 002b:00007ffd6ccdd758 EFLAGS: 00000212 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004416d9
RDX: 00000000fffffdef RSI: 00000000200005c0 RDI: 0000000000000003
RBP: 00007ffd6ccdd780 R08: 0000000020000000 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000212 R12: 0000000000000004
R13: ffffffffffffffff R14: 0000000000000000 R15: 0000000000000000

The buggy address belongs to the page:
page:ffffea0007623540 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x2fffc0000000000()
raw: 02fffc0000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: ffffea0007592b60 ffff8801dae2fdd8 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801d88d4f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8801d88d4f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff8801d88d5000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                    ^
  ffff8801d88d5080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ffff8801d88d5100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
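
For readers decoding the memory state dump above, here is a minimal sketch of how the shadow bytes are usually read. Each shadow byte describes 8 bytes of memory, so each row of 16 shadow bytes covers 0x80 bytes. The poison values are quoted from memory of mm/kasan/kasan.h around v4.17 and should be treated as assumptions; the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical classification of KASAN shadow bytes. The numeric values
 * (0xFC, 0xFB, 0xFF) are assumed from mm/kasan/kasan.h of that era. */
enum shadow_kind {
	SHADOW_ACCESSIBLE,	/* 0x00: all 8 bytes valid */
	SHADOW_PARTIAL,		/* 0x01..0x07: only the first N bytes valid */
	SHADOW_KMALLOC_REDZONE,	/* 0xFC: redzone around a slab object */
	SHADOW_KMALLOC_FREE,	/* 0xFB: freed slab object */
	SHADOW_FREE_PAGE,	/* 0xFF: freed page */
	SHADOW_OTHER
};

static enum shadow_kind classify_shadow(uint8_t s)
{
	if (s == 0x00)
		return SHADOW_ACCESSIBLE;
	if (s < 8)
		return SHADOW_PARTIAL;
	switch (s) {
	case 0xFC: return SHADOW_KMALLOC_REDZONE;
	case 0xFB: return SHADOW_KMALLOC_FREE;
	case 0xFF: return SHADOW_FREE_PAGE;
	default:   return SHADOW_OTHER;
	}
}

/* True if addr is the first byte of a 4 KiB page. */
static int is_page_aligned(uint64_t addr)
{
	return (addr & 0xFFFULL) == 0;
}
```

Read this way, the row at ffff8801d88d4f80 is redzone (fc), and the faulting address ffff8801d88d5000 is the first byte of a page whose shadow is ff. That is consistent with a one-byte write landing at the very start of a freed page, though the report alone does not prove which allocation was involved.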


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

* Re: [patch iproute2/net-next 2/2] devlink: introduce support for showing port number and split subport number
From: David Ahern @ 2018-05-25 14:19 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, idosch, jakub.kicinski, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, simon.horman,
	pieter.jansenvanvuuren, john.hurley, dirk.vandermerwe,
	alexander.h.duyck, ogerlitz, vijaya.guvva, satananda.burla,
	raghu.vatsavayi, felix.manlunas, gospo, sathya.perla,
	vasundhara-v.volam, tariqt, eranbe, jeffrey.t.kirsher, roopa
In-Reply-To: <20180524063952.GC2295@nanopsycho>

On 5/24/18 12:39 AM, Jiri Pirko wrote:
> Wed, May 23, 2018 at 10:05:49PM CEST, dsahern@gmail.com wrote:
>> On 5/20/18 2:15 AM, Jiri Pirko wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>> ---
>>>  devlink/devlink.c            | 6 ++++++
>>>  include/uapi/linux/devlink.h | 2 ++
>>>  2 files changed, 8 insertions(+)
>>>
>>> diff --git a/devlink/devlink.c b/devlink/devlink.c
>>> index df2c66dac1c7..b0ae17767dab 100644
>>> --- a/devlink/devlink.c
>>> +++ b/devlink/devlink.c
>>> @@ -1737,9 +1737,15 @@ static void pr_out_port(struct dl *dl, struct nlattr **tb)
>>>  
>>>  		pr_out_str(dl, "flavour", port_flavour_name(port_flavour));
>>>  	}
>>> +	if (tb[DEVLINK_ATTR_PORT_NUMBER])
>>> +		pr_out_uint(dl, "number",
>>> +			    mnl_attr_get_u32(tb[DEVLINK_ATTR_PORT_NUMBER]));
>>
>> "number" as a label means nothing. "port" is more descriptive.
> 
> That attribute name is "port_number". As the other attributes are
> named "port_something", and the "something" is printed out here, the
> "number" is consistent with it. Each line represents a port with a list
> of attributes.

The name of the attribute is not relevant here. That's an API that very
few people will see. I am looking at this from a user perspective and
the word "number" followed by a number is not clear.

> 
>>
>> # ./devlink port
>> pci/0000:03:00.0/1: type eth netdev swp17 flavour physical number 17
>> pci/0000:03:00.0/3: type eth netdev swp18 flavour physical number 18
>> pci/0000:03:00.0/5: type eth netdev swp19 flavour physical number 19
>> pci/0000:03:00.0/7: type eth netdev swp20 flavour physical number 20
>> pci/0000:03:00.0/9: type eth netdev swp21 flavour physical number 21
>> ...
>> pci/0000:03:00.0/61: type eth netdev swp1s0 flavour physical number 1
>> split_group 1 subport 0
>> pci/0000:03:00.0/62: type eth netdev swp1s1 flavour physical number 1
>> split_group 1 subport 1
>>

* Re: [PATCH] ath10k: transmit queued frames after waking queues
From: Niklas Cassel @ 2018-05-25 14:21 UTC (permalink / raw)
  To: Bob Copeland
  Cc: Adrian Chadd, Kalle Valo, David Miller, ath10k, linux-wireless,
	netdev, Linux Kernel Mailing List
In-Reply-To: <20180525125023.alc42lkgehc6iodg@localhost>

On Fri, May 25, 2018 at 08:50:23AM -0400, Bob Copeland wrote:
> On Fri, May 25, 2018 at 02:36:56PM +0200, Niklas Cassel wrote:
> > A spin lock does have the advantage of ordering: memory operations issued
> > before the spin_unlock_bh() will be completed before the spin_unlock_bh()
> > operation has completed.
> > 
> > However, ath10k_htt_tx_dec_pending() was called earlier in the same function,
> > which decreases htt->num_pending_tx, so that write will be completed before
> > our read. That is the only ordering we care about here (if we should call
> > ath10k_mac_tx_push_pending() or not).
> 
> Sure.  I also understand that reading inside a lock and operating on the
> value outside the lock isn't really the definition of synchronization
> (doesn't really matter in this case though).
> 
> I was just suggesting that the implicit memory barrier in the spin unlock
> that we are already paying for would be sufficient here too, and it matches
> the semantic of "tx fields under tx_lock."  On the other hand, maybe it's
> just me, but I tend to look askance at just-in-case READ_ONCEs sprinkled
> about.

I agree, because of the implicit memory barrier from spin_unlock_bh(),
READ_ONCE shouldn't really be needed in this case.

I think that it's a good thing to be critical of all "just-in-case" things,
however, it's not always that obvious if you actually need READ_ONCE or not.

E.g. you might need to use it even when you are holding a spin_lock.

Some people recommend using it for all concurrent non-read-only shared memory
accesses: https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
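
To make the discussion concrete, here is a simplified userspace stand-in for what READ_ONCE amounts to: a single volatile load that the compiler may not repeat, merge, or drop. The macro and variable names are hypothetical, the real kernel macros in include/linux/compiler.h handle more cases, and __typeof__ is a GCC/Clang extension.

```c
/* Simplified sketch of READ_ONCE()/WRITE_ONCE(); not the kernel's
 * implementation. A volatile access constrains the compiler only,
 * it does not order accesses between CPUs. */
#define READ_ONCE_SKETCH(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical counter standing in for htt->num_pending_tx. */
static int num_pending_tx;

/* Decide whether more frames may be pushed, based on one snapshot. */
static int tx_has_room(int max_pending)
{
	/* Exactly one load; later uses work on the local copy. */
	int pending = READ_ONCE_SKETCH(num_pending_tx);

	return pending < max_pending;
}
```

The point of the sketch is that the snapshot is taken once and then used consistently; whether that snapshot is fresh enough is a separate question that the lock or an explicit barrier has to answer.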

Is there a better guideline somewhere..?


Kind regards,
Niklas

* Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks
From: David Miller @ 2018-05-25 14:23 UTC (permalink / raw)
  To: qing.huang
  Cc: tariqt, haakon.bugge, yanjun.zhu, netdev, linux-rdma,
	linux-kernel, gi-oh.kim
In-Reply-To: <20180523232246.20445-1-qing.huang@oracle.com>

From: Qing Huang <qing.huang@oracle.com>
Date: Wed, 23 May 2018 16:22:46 -0700

> When a system is under memory pressure (high usage with fragments),
> the original 256KB ICM chunk allocations will likely trigger kernel
> memory management to enter slow path doing memory compact/migration
> ops in order to complete high order memory allocations.
> 
> When that happens, user processes calling uverb APIs may get stuck
> for more than 120s easily even though there are a lot of free pages
> in smaller chunks available in the system.
> 
> Syslog:
> ...
> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
> oracle_205573_e:205573 blocked for more than 120 seconds.
> ...
> 
> With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.
> 
> However in order to support smaller ICM chunk size, we need to fix
> another issue in large size kcalloc allocations.
> 
> E.g.
> Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
> size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
> entry). So we need a 16MB allocation for a table->icm pointer array to
> hold 2M pointers which can easily cause kcalloc to fail.
> 
> The solution is to use kvzalloc to replace kcalloc which will fall back
> to vmalloc automatically if kmalloc fails.
> 
> Signed-off-by: Qing Huang <qing.huang@oracle.com>
> Acked-by: Daniel Jurgens <danielj@mellanox.com>
> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>

Applied, thanks.
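
The sizing argument in the quoted commit message can be checked with a few lines of arithmetic. This is only a sketch of the calculation, not mlx4 code; the function names are hypothetical and the 8-byte entry and pointer sizes are taken from the commit message (pointer size assumes a 64-bit kernel).

```c
#include <stdint.h>

#define MTT_ENTRY_SIZE 8ULL	/* bytes per MTT entry, per the commit message */

/* Number of ICM chunks needed to hold 2^log_num_mtt MTT entries. */
static uint64_t icm_chunks_needed(unsigned int log_num_mtt, uint64_t chunk_bytes)
{
	uint64_t entries = 1ULL << log_num_mtt;
	uint64_t per_chunk = chunk_bytes / MTT_ENTRY_SIZE;

	return (entries + per_chunk - 1) / per_chunk;	/* round up */
}

/* Size of the table->icm pointer array, one pointer per chunk. */
static uint64_t icm_ptr_array_bytes(unsigned int log_num_mtt,
				    uint64_t chunk_bytes, uint64_t ptr_size)
{
	return icm_chunks_needed(log_num_mtt, chunk_bytes) * ptr_size;
}
```

For log_num_mtt=30 and 4KB chunks this gives 2M chunks and a 16MB pointer array, matching the commit message. With the original 256KB chunks the same table needs only 32K pointers (256KB), which is why the large kcalloc only becomes a problem after the chunk size is reduced.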

* Re: STMMAC driver with TSO enabled issue
From: Bhadram Varka @ 2018-05-25 14:25 UTC (permalink / raw)
  To: Jose Abreu, netdev@vger.kernel.org, Joao Pinto
In-Reply-To: <78914761-0375-5929-ed88-5225e0e260b9@synopsys.com>

Hi Jose,

On 5/25/2018 7:35 PM, Jose Abreu wrote:
> Hi Bhadram,
> 
> On 25-05-2018 05:41, Bhadram Varka wrote:
>> Hi Jose,
>>
>> On 5/24/2018 3:01 PM, Jose Abreu wrote:
>>> Hi Bhadram,
>>>
>>> On 24-05-2018 06:58, Bhadram Varka wrote:
>>>>
>>>> After some time if check Tx descriptor status - then I see only
>>>> below
>>>>
>>>> [..]
>>>> [85788.286730] 027 [0x827951b0]: 0xf854f000 0x0 0x16d8
>>>> 0x90000000
>>>>
>>>> index 025 and 026 descriptors processed but not index 027.
>>>>
>>>> At this stage Tx DMA is always in below state -
>>>>
>>>> ■ 3'b011: Running (Reading Data from system memory
>>>> buffer and queuing it to the Tx buffer (Tx FIFO))
>>>
>>> That's strange, I think the descriptors look okay though. I will
>>> need the registers values (before the lock) and, if possible, the
>>> git bisect output.
>>
>> Attaching the register dump file after the issue observed.
>> Please check once.
>>
> 
> ----->8-----
> 0x112c = 0x0000003F
> 0x11ac = 0x0000003F
> 0x122c = 0x0000003F
> 0x12ac = 0x0000003F
> 
> 0x1130 = 0x0000003F
> 0x11b0 = 0x0000003F
> 0x1230 = 0x0000003F
> 0x12b0 = 0x0000003F
> ----->8-----
> 
> This can't be right, it should be DMA_{RX/TX}_SIZE - 1 = 511. Did
> you change these values in the code?
> 

Yes. I have changed the descriptor length to 64 - so that searching for 
the current descriptor status would be easy.

-- 
Thanks,
Bhadram.

* [PATCH net-next] net: sched: shrink struct Qdisc
From: Paolo Abeni @ 2018-05-25 14:28 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Jamal Hadi Salim, Cong Wang, Jiri Pirko

The struct Qdisc has a lot of holes, especially after commit
a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback"),
which, as a side effect, moved the fields just after 'busylock'
onto a new cacheline.

Since both 'padded' and 'refcnt' are not updated frequently, and
there is a hole before 'gso_skb', we can move such fields there,
saving a cacheline without any performance side effect.

Before this commit:

pahole -C Qdisc net/sched/sch_generic.o
	# ...
        /* size: 384, cachelines: 6, members: 25 */
        /* sum members: 236, holes: 3, sum holes: 92 */
        /* padding: 56 */

After this commit:
pahole -C Qdisc net/sched/sch_generic.o
	# ...
	/* size: 320, cachelines: 5, members: 25 */
	/* sum members: 236, holes: 2, sum holes: 28 */
	/* padding: 56 */
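
The pahole numbers above add up as members + holes + padding (236 + 92 + 56 = 384 before, 236 + 28 + 56 = 320 after). As a small illustration of the same effect, the sketch below uses two toy structs that are unrelated to struct Qdisc; the names and the 64-byte cacheline size are assumptions for the example.

```c
#include <stddef.h>

#define CACHELINE_BYTES 64	/* assumed; matches 384 bytes = 6 cachelines */

/* Number of cachelines a structure of the given size spans. */
static size_t cachelines(size_t bytes)
{
	return (bytes + CACHELINE_BYTES - 1) / CACHELINE_BYTES;
}

/* Toy example: alignment of 'b' forces a hole after 'a' and padding
 * after 'c'. */
struct with_holes {
	char a;
	long b;
	char c;
};

/* Same members; placing the small fields together removes the hole. */
struct reordered {
	long b;
	char a;
	char c;
};
```

On a typical LP64 target the first struct occupies 24 bytes and the second 16, without dropping any member. The patch applies the same idea by moving 'padded' and 'refcnt' into an existing hole.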

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/sch_generic.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 98c10a28cd01..827a3711dc68 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -85,6 +85,8 @@ struct Qdisc {
 	struct net_rate_estimator __rcu *rate_est;
 	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
 	struct gnet_stats_queue	__percpu *cpu_qstats;
+	int			padded;
+	refcount_t		refcnt;
 
 	/*
 	 * For performance sake on SMP, we put highly modified fields at the end
@@ -97,8 +99,6 @@ struct Qdisc {
 	unsigned long		state;
 	struct Qdisc            *next_sched;
 	struct sk_buff_head	skb_bad_txq;
-	int			padded;
-	refcount_t		refcnt;
 
 	spinlock_t		busylock ____cacheline_aligned_in_smp;
 	spinlock_t		seqlock;
-- 
2.17.0

* Re: STMMAC driver with TSO enabled issue
From: Jose Abreu @ 2018-05-25 14:32 UTC (permalink / raw)
  To: Bhadram Varka, Jose Abreu, netdev@vger.kernel.org, Joao Pinto
In-Reply-To: <94cda7c4-127c-cae1-e51e-8853224065e2@nvidia.com>

On 25-05-2018 15:25, Bhadram Varka wrote:
> Hi Jose,
>
> On 5/25/2018 7:35 PM, Jose Abreu wrote:
>> Hi Bhadram,
>>
>> On 25-05-2018 05:41, Bhadram Varka wrote:
>>> Hi Jose,
>>>
>>> On 5/24/2018 3:01 PM, Jose Abreu wrote:
>>>> Hi Bhadram,
>>>>
>>>> On 24-05-2018 06:58, Bhadram Varka wrote:
>>>>>
>>>>> After some time if check Tx descriptor status - then I see
>>>>> only
>>>>> below
>>>>>
>>>>> [..]
>>>>> [85788.286730] 027 [0x827951b0]: 0xf854f000 0x0 0x16d8
>>>>> 0x90000000
>>>>>
>>>>> index 025 and 026 descriptors processed but not index 027.
>>>>>
>>>>> At this stage Tx DMA is always in below state -
>>>>>
>>>>> ■ 3'b011: Running (Reading Data from system memory
>>>>> buffer and queuing it to the Tx buffer (Tx FIFO))
>>>>
>>>> That's strange, I think the descriptors look okay though. I will
>>>> need the registers values (before the lock) and, if
>>>> possible, the
>>>> git bisect output.
>>>
>>> Attaching the register dump file after the issue observed.
>>> Please check once.
>>>
>>
>> ----->8-----
>> 0x112c = 0x0000003F
>> 0x11ac = 0x0000003F
>> 0x122c = 0x0000003F
>> 0x12ac = 0x0000003F
>>
>> 0x1130 = 0x0000003F
>> 0x11b0 = 0x0000003F
>> 0x1230 = 0x0000003F
>> 0x12b0 = 0x0000003F
>> ----->8-----
>>
>> This can't be right, it should be DMA_{RX/TX}_SIZE - 1 = 511. Did
>> you change these values in the code?
>>
>
> Yes. I have changed the descriptor length to 64 - so that
> searching for the current descriptor status would be easy.

Ok, that shouldn't impact anything. The only other thing I can think of
is that TSO may not be enabled on every DMA channel (the HW
configuration allows this). Please check whether TSO works with a
single queue.

Thanks and Best Regards,
Jose Miguel Abreu

* Re: [PATCH net-next] net:sched: add action inheritdsfield to skbmod
From: Marcelo Ricardo Leitner @ 2018-05-25 14:34 UTC (permalink / raw)
  To: Fu, Qiaobin
  Cc: davem@davemloft.net, netdev@vger.kernel.org, jhs@mojatatu.com,
	Michel Machado
In-Reply-To: <C7516012-947F-4485-B5DA-DD9AD45427F8@bu.edu>

On Fri, May 25, 2018 at 05:45:03AM +0000, Fu, Qiaobin wrote:
> Hi Marcelo,
> 
> Thanks for pointing out these style issues. Below is the updated version:

Hi Qiaobin,

Looks good to me. Now you have to submit it like you submitted the
original patch, but add the version tag to the summary. Like '[PATCH
v2 net-next] ....'
And without the text before the changelog.

Thanks.

> 
> ---
> The new action inheritdsfield copies the field DS of
> IPv4 and IPv6 packets into skb->priority. This enables
> later classification of packets based on the DS field.
> 
> Original idea by Jamal Hadi Salim <jhs@mojatatu.com>
> 
> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
> Reviewed-by: Michel Machado <michel@digirati.com.br>
> ---
> 
> Note that the motivation for this patch is found in the following discussion:
> https://www.spinics.net/lists/netdev/msg501061.html
> ---
> 
> diff --git a/include/uapi/linux/tc_act/tc_skbmod.h b/include/uapi/linux/tc_act/tc_skbmod.h
> index 38c072f..0718b48 100644
> --- a/include/uapi/linux/tc_act/tc_skbmod.h
> +++ b/include/uapi/linux/tc_act/tc_skbmod.h
> @@ -19,6 +19,7 @@
>  #define SKBMOD_F_SMAC	0x2
>  #define SKBMOD_F_ETYPE	0x4
>  #define SKBMOD_F_SWAPMAC 0x8
> +#define SKBMOD_F_INHERITDSFIELD 0x10
>  
>  struct tc_skbmod {
>  	tc_gen;
> diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
> index ad050d7..e2082f6 100644
> --- a/net/sched/act_skbmod.c
> +++ b/net/sched/act_skbmod.c
> @@ -16,6 +16,9 @@
>  #include <linux/rtnetlink.h>
>  #include <net/netlink.h>
>  #include <net/pkt_sched.h>
> +#include <net/ip.h>
> +#include <net/ipv6.h>
> +#include <net/dsfield.h>
>  
>  #include <linux/tc_act/tc_skbmod.h>
>  #include <net/tc_act/tc_skbmod.h>
> @@ -72,6 +75,26 @@ static int tcf_skbmod_run(struct sk_buff *skb, const struct tc_action *a,
>  		ether_addr_copy(eth_hdr(skb)->h_source, (u8 *)tmpaddr);
>  	}
>  
> +	if (flags & SKBMOD_F_INHERITDSFIELD) {
> +		int wlen = skb_network_offset(skb);
> +
> +		switch (tc_skb_protocol(skb)) {
> +		case htons(ETH_P_IP):
> +			wlen += sizeof(struct iphdr);
> +			if (!pskb_may_pull(skb, wlen))
> +				return TC_ACT_SHOT;
> +			skb->priority = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
> +			break;
> +
> +		case htons(ETH_P_IPV6):
> +			wlen += sizeof(struct ipv6hdr);
> +			if (!pskb_may_pull(skb, wlen))
> +				return TC_ACT_SHOT;
> +			skb->priority = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
> +			break;
> +		}
> +	}
> +
>  	return action;
>  }
>  
> @@ -127,6 +150,9 @@ static int tcf_skbmod_init(struct net *net, struct nlattr *nla,
>  	if (parm->flags & SKBMOD_F_SWAPMAC)
>  		lflags = SKBMOD_F_SWAPMAC;
>  
> +	if (parm->flags & SKBMOD_F_INHERITDSFIELD)
> +		lflags |= SKBMOD_F_INHERITDSFIELD;
> +
>  	exists = tcf_idr_check(tn, parm->index, a, bind);
>  	if (exists && bind)
>  		return 0;
> 
> > On May 23, 2018, at 2:06 PM, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> > 
> > Hi,
> > 
> > Some style fixes:
> > 
> > On Thu, May 17, 2018 at 07:33:08PM +0000, Fu, Qiaobin wrote:
> >> net/sched: add action inheritdsfield to skbmod
> > 
> > This extra line above should not be here.
> > 
> >> 
> >> The new action inheritdsfield copies the field DS of
> >> IPv4 and IPv6 packets into skb->prioriry. This enables
> >                              typo -----^
> > 
> >> later classification of packets based on the DS field.
> >> 
> >> Original idea by Jamal Hadi Salim <jhs@mojatatu.com>
> >> 
> >> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
> >> Reviewed-by: Michel Machado <michel@digirati.com.br>
> >> ---
> >> 
> >> Note that the motivation for this patch is found in the following discussion:
> >> https://www.spinics.net/lists/netdev/msg501061.html
> >> ---
> >> 
> >> diff --git a/include/uapi/linux/tc_act/tc_skbmod.h b/include/uapi/linux/tc_act/tc_skbmod.h
> >> index 38c072f..0718b48 100644
> >> --- a/include/uapi/linux/tc_act/tc_skbmod.h
> >> +++ b/include/uapi/linux/tc_act/tc_skbmod.h
> >> @@ -19,6 +19,7 @@
> >> #define SKBMOD_F_SMAC	0x2
> >> #define SKBMOD_F_ETYPE	0x4
> >> #define SKBMOD_F_SWAPMAC 0x8
> >> +#define SKBMOD_F_INHERITDSFIELD 0x10
> >> 
> >> struct tc_skbmod {
> >> 	tc_gen;
> >> diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
> >> index ad050d7..21d5bec 100644
> >> --- a/net/sched/act_skbmod.c
> >> +++ b/net/sched/act_skbmod.c
> >> @@ -16,6 +16,9 @@
> >> #include <linux/rtnetlink.h>
> >> #include <net/netlink.h>
> >> #include <net/pkt_sched.h>
> >> +#include <net/ip.h>
> >> +#include <net/ipv6.h>
> >> +#include <net/dsfield.h>
> >> 
> >> #include <linux/tc_act/tc_skbmod.h>
> >> #include <net/tc_act/tc_skbmod.h>
> >> @@ -72,6 +75,25 @@ static int tcf_skbmod_run(struct sk_buff *skb, const struct tc_action *a,
> >> 		ether_addr_copy(eth_hdr(skb)->h_source, (u8 *)tmpaddr);
> >> 	}
> >> 
> >> +	if (flags & SKBMOD_F_INHERITDSFIELD) {
> >> +		int wlen = skb_network_offset(skb);
> > 
> > You need a blank line here, between var declaration and the rest.
> > 
> >> +		switch (tc_skb_protocol(skb)) {
> >> +		case htons(ETH_P_IP):
> >> +			wlen += sizeof(struct iphdr);
> >> +			if (!pskb_may_pull(skb, wlen))
> >> +				return TC_ACT_SHOT;
> >> +			skb->priority = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
> >> +			break;
> >> +
> >> +		case htons(ETH_P_IPV6):
> >> +			wlen += sizeof(struct ipv6hdr);
> >> +			if (!pskb_may_pull(skb, wlen))
> >> +				return TC_ACT_SHOT;
> >> +			skb->priority = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
> >> +			break;
> >> +		}
> >> +	}
> >> +
> >> 	return action;
> >> }
> >> 
> >> @@ -127,6 +149,9 @@ static int tcf_skbmod_init(struct net *net, struct nlattr *nla,
> >> 	if (parm->flags & SKBMOD_F_SWAPMAC)
> >> 		lflags = SKBMOD_F_SWAPMAC;
> >> 
> >> +	if (parm->flags & SKBMOD_F_INHERITDSFIELD)
> >> +		lflags |= SKBMOD_F_INHERITDSFIELD;
> >> +
> >> 	exists = tcf_idr_check(tn, parm->index, a, bind);
> >> 	if (exists && bind)
> >> 		return 0;
> 
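
For readers unfamiliar with the ">> 2" in the quoted patch: the DS field is 8 bits wide, and its low two bits carry ECN, so shifting right by two leaves the 6-bit DSCP. The sketch below shows the same extraction on raw header bytes. It is a userspace illustration with hypothetical function names, not the kernel's ipv4_get_dsfield()/ipv6_get_dsfield() helpers.

```c
#include <stdint.h>

/* IPv4: the DS (former TOS) byte is the second byte of the header. */
static uint8_t ipv4_dscp(const uint8_t *iphdr)
{
	return iphdr[1] >> 2;	/* drop the two ECN bits */
}

/* IPv6: the header starts with version (4 bits), traffic class (8 bits),
 * flow label (20 bits), so the traffic class straddles bytes 0 and 1. */
static uint8_t ipv6_dscp(const uint8_t *ip6hdr)
{
	uint8_t tclass = (uint8_t)((ip6hdr[0] << 4) | (ip6hdr[1] >> 4));

	return tclass >> 2;	/* drop the two ECN bits */
}
```

For example, an IPv4 header beginning 0x45 0xB8 and an IPv6 header beginning 0x6B 0x80 both carry DS field 0xB8, which yields DSCP 46 (Expedited Forwarding). That is the value the proposed action would store in skb->priority.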

* refactor 32-bit dma limit quirks
From: Christoph Hellwig @ 2018-05-25 14:35 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Tony Luck, Fenghua Yu,
	Greg Kroah-Hartman
  Cc: netdev, iommu, x86, linux-ia64, linux-kernel

Hi all,

x86 currently has some quirks to force lower dma masks.  They are mostly
useful for certain VIA systems that otherwise corrupt data, but otherwise
don't make much sense given that the modern DMA APIs do the right thing
automatically.

This series dumps a few of these old kernel command lines (including their
not really working version on ia64), and moves the VIA quirk to a flag
in struct device so that it can be applied generically.  This will be needed
to support Xilinx root ports with a similar issue that show up in common
RISC-V boards.

* [PATCH 1/7] core, dma-direct: add a flag 32-bit dma limits
From: Christoph Hellwig @ 2018-05-25 14:35 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Tony Luck, Fenghua Yu,
	Greg Kroah-Hartman
  Cc: netdev, iommu, x86, linux-ia64, linux-kernel
In-Reply-To: <20180525143512.1466-1-hch@lst.de>

Various PCI bridges (VIA PCI, Xilinx PCIe) limit DMA to only 32 bits
even if the device itself supports more.  Add a single bit flag to
struct device (to be moved into the dma extension once we get around
to it) to flag such devices and reject larger DMA to them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/device.h | 3 +++
 lib/dma-direct.c       | 6 ++++++
 2 files changed, 9 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index 477956990f5e..fa317e45f5e6 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -904,6 +904,8 @@ struct dev_links_info {
  * @offline:	Set after successful invocation of bus type's .offline().
  * @of_node_reused: Set if the device-tree node is shared with an ancestor
  *              device.
+ * @dma_32bit_limit: bridge limited to 32bit DMA even if the device itself
+ *		indicates support for a higher limit in the dma_mask field.
  *
  * At the lowest level, every device in a Linux system is represented by an
  * instance of struct device. The device structure contains the information
@@ -992,6 +994,7 @@ struct device {
 	bool			offline_disabled:1;
 	bool			offline:1;
 	bool			of_node_reused:1;
+	bool			dma_32bit_limit:1;
 };
 
 static inline struct device *kobj_to_dev(struct kobject *kobj)
diff --git a/lib/dma-direct.c b/lib/dma-direct.c
index bbfb229aa067..0151a7b2bc87 100644
--- a/lib/dma-direct.c
+++ b/lib/dma-direct.c
@@ -165,6 +165,12 @@ int dma_direct_supported(struct device *dev, u64 mask)
 	if (mask < DMA_BIT_MASK(32))
 		return 0;
 #endif
+	/*
+	 * Various PCI/PCIe bridges have broken support for > 32bit DMA even
+	 * if the device itself might support it.
+	 */
+	if (dev->dma_32bit_limit && mask > DMA_BIT_MASK(32))
+		return 0;
 	return 1;
 }
 
-- 
2.17.0
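
The effect of the new check can be modelled outside the kernel. In the sketch below, DMA_BIT_MASK() is reproduced as it is commonly defined in the kernel headers, while struct fake_device and dma_supported_sketch() are hypothetical stand-ins that model only the added condition, not the rest of dma_direct_supported().

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low n bits set; n == 64 is special-cased to avoid an
 * undefined 64-bit shift. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-in for the single struct device field used here. */
struct fake_device {
	bool dma_32bit_limit;
};

/* Models only the check added by this patch. */
static int dma_supported_sketch(const struct fake_device *dev, uint64_t mask)
{
	/* A bridge limited to 32-bit DMA rejects any wider mask. */
	if (dev->dma_32bit_limit && mask > DMA_BIT_MASK(32))
		return 0;
	return 1;
}
```

A device behind such a bridge would see a 64-bit mask rejected and a 32-bit mask accepted. Many drivers already retry with a 32-bit mask when the wider one fails, so the rejection is how the limit would reach them.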

* [PATCH 2/7] ia64: remove the dead iommu_sac_force variable
From: Christoph Hellwig @ 2018-05-25 14:35 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Tony Luck, Fenghua Yu,
	Greg Kroah-Hartman
  Cc: x86, iommu, linux-kernel, linux-ia64, netdev
In-Reply-To: <20180525143512.1466-1-hch@lst.de>

Looks like copy and paste from x86 that never actually got used.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/ia64/kernel/pci-dma.c | 19 -------------------
 1 file changed, 19 deletions(-)

diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index b5df084c0af4..50b6ad282a90 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -18,8 +18,6 @@
 dma_addr_t bad_dma_address __read_mostly;
 EXPORT_SYMBOL(bad_dma_address);
 
-static int iommu_sac_force __read_mostly;
-
 int no_iommu __read_mostly;
 #ifdef CONFIG_IOMMU_DEBUG
 int force_iommu __read_mostly = 1;
@@ -61,23 +59,6 @@ int iommu_dma_supported(struct device *dev, u64 mask)
 	if (mask < DMA_BIT_MASK(24))
 		return 0;
 
-	/* Tell the device to use SAC when IOMMU force is on.  This
-	   allows the driver to use cheaper accesses in some cases.
-
-	   Problem with this is that if we overflow the IOMMU area and
-	   return DAC as fallback address the device may not handle it
-	   correctly.
-
-	   As a special case some controllers have a 39bit address
-	   mode that is as efficient as 32bit (aic79xx). Don't force
-	   SAC for these.  Assume all masks <= 40 bits are of this
-	   type. Normally this doesn't make any difference, but gives
-	   more gentle handling of IOMMU overflow. */
-	if (iommu_sac_force && (mask >= DMA_BIT_MASK(40))) {
-		dev_info(dev, "Force SAC with mask %llx\n", mask);
-		return 0;
-	}
-
 	return 1;
 }
 EXPORT_SYMBOL(iommu_dma_supported);
-- 
2.17.0

* [PATCH 3/7] ia64: remove iommu_dma_supported
From: Christoph Hellwig @ 2018-05-25 14:35 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Tony Luck, Fenghua Yu,
	Greg Kroah-Hartman
  Cc: netdev, iommu, x86, linux-ia64, linux-kernel
In-Reply-To: <20180525143512.1466-1-hch@lst.de>

The generic dma_direct_supported helper already used by intel-iommu on
x86 does a better job than the ia64 reimplementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/ia64/kernel/pci-dma.c  | 13 -------------
 drivers/iommu/intel-iommu.c |  2 --
 2 files changed, 15 deletions(-)

diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index 50b6ad282a90..3c2884bef3d4 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -51,18 +51,6 @@ iommu_dma_init(void)
 	return;
 }
 
-int iommu_dma_supported(struct device *dev, u64 mask)
-{
-	/* Copied from i386. Doesn't make much sense, because it will
-	   only work for pci_alloc_coherent.
-	   The caller just has to use GFP_DMA in this case. */
-	if (mask < DMA_BIT_MASK(24))
-		return 0;
-
-	return 1;
-}
-EXPORT_SYMBOL(iommu_dma_supported);
-
 void __init pci_iommu_alloc(void)
 {
 	dma_ops = &intel_dma_ops;
@@ -71,7 +59,6 @@ void __init pci_iommu_alloc(void)
 	intel_dma_ops.sync_sg_for_cpu = machvec_dma_sync_sg;
 	intel_dma_ops.sync_single_for_device = machvec_dma_sync_single;
 	intel_dma_ops.sync_sg_for_device = machvec_dma_sync_sg;
-	intel_dma_ops.dma_supported = iommu_dma_supported;
 
 	/*
 	 * The order of these functions is important for
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 749d8f235346..5e0bef3754d1 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3841,9 +3841,7 @@ const struct dma_map_ops intel_dma_ops = {
 	.map_page = intel_map_page,
 	.unmap_page = intel_unmap_page,
 	.mapping_error = intel_mapping_error,
-#ifdef CONFIG_X86
 	.dma_supported = dma_direct_supported,
-#endif
 };
 
 static inline int iommu_domain_cache_init(void)
-- 
2.17.0

* [PATCH 4/7] x86: remove a stray reference to pci-nommu.c
From: Christoph Hellwig @ 2018-05-25 14:35 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Tony Luck, Fenghua Yu,
	Greg Kroah-Hartman
  Cc: netdev, iommu, x86, linux-ia64, linux-kernel
In-Reply-To: <20180525143512.1466-1-hch@lst.de>

This is just the minimal workaround.  The file is mostly either stale
and/or duplicative of Documentation/admin-guide/kernel-parameters.txt,
but that is much more work than I'm willing to do right now.

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 Documentation/x86/x86_64/boot-options.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index b297c48389b9..153b3a57fba2 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -187,9 +187,9 @@ PCI
 
 IOMMU (input/output memory management unit)
 
- Currently four x86-64 PCI-DMA mapping implementations exist:
+ Multiple x86-64 PCI-DMA mapping implementations exist, for example:
 
-   1. <arch/x86_64/kernel/pci-nommu.c>: use no hardware/software IOMMU at all
+   1. <lib/dma-direct.c>: use no hardware/software IOMMU at all
       (e.g. because you have < 3 GB memory).
       Kernel boot message: "PCI-DMA: Disabling IOMMU"
 
-- 
2.17.0

^ permalink raw reply related

* [PATCH 5/7] x86: remove the experimental forcesac boot option
From: Christoph Hellwig @ 2018-05-25 14:35 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Tony Luck, Fenghua Yu,
	Greg Kroah-Hartman
  Cc: x86, iommu, linux-kernel, linux-ia64, netdev
In-Reply-To: <20180525143512.1466-1-hch@lst.de>

Limiting the dma mask to avoid PCI (pre-PCIe) DAC cycles while paying
the huge overhead of an IOMMU is rather pointless, and this seriously
gets in the way of dma mapping work.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
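For readers following along, here is a minimal user-space sketch of what
the removed check did and how a driver-side fallback loop (as in the sfc
comments touched below) interacts with it.  This is not kernel code:
old_arch_dma_supported(), pick_dma_mask(), try_set_mask_fn and the two
accept_* helpers are hypothetical names, and the callback only stands in
for dma_set_mask_and_coherent().

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the kernel macro: a mask with the low n bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Model of the check this patch removes: with forcesac set, any mask of
 * 40 bits or more was rejected so that the driver would retry smaller. */
static int old_arch_dma_supported(int sac_force, uint64_t mask)
{
	if (sac_force && mask >= DMA_BIT_MASK(40))
		return 0;
	return 1;
}

/* Hypothetical stand-in for dma_set_mask_and_coherent(): 0 on success. */
typedef int (*try_set_mask_fn)(uint64_t mask);

/* Driver-side fallback: halve the mask until the platform accepts it or
 * it would drop below 32 bits.  Returns the accepted mask, or 0 if no
 * mask was accepted. */
static uint64_t pick_dma_mask(uint64_t dma_mask, try_set_mask_fn try_set)
{
	assert(try_set);
	while (dma_mask > 0x7fffffffULL) {
		if (try_set(dma_mask) == 0)
			return dma_mask;
		dma_mask >>= 1;
	}
	return 0;
}

/* Example platform with the old forcesac behaviour. */
static int accept_forcesac(uint64_t mask)
{
	return old_arch_dma_supported(1, mask) ? 0 : -1;
}

/* Example platform without forcesac: every mask is accepted. */
static int accept_all(uint64_t mask)
{
	return old_arch_dma_supported(0, mask) ? 0 : -1;
}
```

In this model a 46-bit request settles on the 39-bit mask when forcesac
is set, because the removed check rejected every mask that was greater
than or equal to DMA_BIT_MASK(40).  Without forcesac the first attempt
succeeds.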
 .../admin-guide/kernel-parameters.txt         |  1 -
 Documentation/x86/x86_64/boot-options.txt     |  4 +---
 arch/x86/kernel/pci-dma.c                     | 21 +------------------
 drivers/net/ethernet/sfc/efx.c                |  5 ++---
 drivers/net/ethernet/sfc/falcon/efx.c         |  5 ++---
 5 files changed, 6 insertions(+), 30 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f2040d46f095..cc0ac035b8fe 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1705,7 +1705,6 @@
 		nopanic
 		merge
 		nomerge
-		forcesac
 		soft
 		pt		[x86, IA-64]
 		nobypass	[PPC/POWERNV]
diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index 153b3a57fba2..341588ec4e29 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -208,7 +208,7 @@ IOMMU (input/output memory management unit)
       Kernel boot message: "PCI-DMA: Using Calgary IOMMU"
 
  iommu=[<size>][,noagp][,off][,force][,noforce][,leak[=<nr_of_leak_pages>]
-	[,memaper[=<order>]][,merge][,forcesac][,fullflush][,nomerge]
+	[,memaper[=<order>]][,merge][,fullflush][,nomerge]
 	[,noaperture][,calgary]
 
   General iommu options:
@@ -235,8 +235,6 @@ IOMMU (input/output memory management unit)
                        (experimental).
     nomerge            Don't do scatter-gather (SG) merging.
     noaperture         Ask the IOMMU not to touch the aperture for AGP.
-    forcesac           Force single-address cycle (SAC) mode for masks <40bits
-                       (experimental).
     noagp              Don't initialize the AGP driver and use full aperture.
     allowdac           Allow double-address cycle (DAC) mode, i.e. DMA >4GB.
                        DAC is used with 32-bit PCI to push a 64-bit address in
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 77625b60a510..91dff954b745 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -20,8 +20,6 @@ static int forbid_dac __read_mostly;
 const struct dma_map_ops *dma_ops = &dma_direct_ops;
 EXPORT_SYMBOL(dma_ops);
 
-static int iommu_sac_force __read_mostly;
-
 #ifdef CONFIG_IOMMU_DEBUG
 int panic_on_overflow __read_mostly = 1;
 int force_iommu __read_mostly = 1;
@@ -125,7 +123,7 @@ static __init int iommu_setup(char *p)
 		if (!strncmp(p, "nomerge", 7))
 			iommu_merge = 0;
 		if (!strncmp(p, "forcesac", 8))
-			iommu_sac_force = 1;
+			pr_warn("forcesac option ignored.\n");
 		if (!strncmp(p, "allowdac", 8))
 			forbid_dac = 0;
 		if (!strncmp(p, "nodac", 5))
@@ -165,23 +163,6 @@ int arch_dma_supported(struct device *dev, u64 mask)
 	}
 #endif
 
-	/* Tell the device to use SAC when IOMMU force is on.  This
-	   allows the driver to use cheaper accesses in some cases.
-
-	   Problem with this is that if we overflow the IOMMU area and
-	   return DAC as fallback address the device may not handle it
-	   correctly.
-
-	   As a special case some controllers have a 39bit address
-	   mode that is as efficient as 32bit (aic79xx). Don't force
-	   SAC for these.  Assume all masks <= 40 bits are of this
-	   type. Normally this doesn't make any difference, but gives
-	   more gentle handling of IOMMU overflow. */
-	if (iommu_sac_force && (mask >= DMA_BIT_MASK(40))) {
-		dev_info(dev, "Force SAC with mask %Lx\n", mask);
-		return 0;
-	}
-
 	return 1;
 }
 EXPORT_SYMBOL(arch_dma_supported);
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index a4ebd8715494..661828e8fdcf 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1289,9 +1289,8 @@ static int efx_init_io(struct efx_nic *efx)
 
 	pci_set_master(pci_dev);
 
-	/* Set the PCI DMA mask.  Try all possibilities from our
-	 * genuine mask down to 32 bits, because some architectures
-	 * (e.g. x86_64 with iommu_sac_force set) will allow 40 bit
+	/* Set the PCI DMA mask.  Try all possibilities from our genuine mask
+	 * down to 32 bits, because some architectures will allow 40 bit
 	 * masks event though they reject 46 bit masks.
 	 */
 	while (dma_mask > 0x7fffffffUL) {
diff --git a/drivers/net/ethernet/sfc/falcon/efx.c b/drivers/net/ethernet/sfc/falcon/efx.c
index 3d6c91e96589..dd5530a4f8c8 100644
--- a/drivers/net/ethernet/sfc/falcon/efx.c
+++ b/drivers/net/ethernet/sfc/falcon/efx.c
@@ -1242,9 +1242,8 @@ static int ef4_init_io(struct ef4_nic *efx)
 
 	pci_set_master(pci_dev);
 
-	/* Set the PCI DMA mask.  Try all possibilities from our
-	 * genuine mask down to 32 bits, because some architectures
-	 * (e.g. x86_64 with iommu_sac_force set) will allow 40 bit
+	/* Set the PCI DMA mask.  Try all possibilities from our genuine mask
+	 * down to 32 bits, because some architectures will allow 40 bit
 	 * masks event though they reject 46 bit masks.
 	 */
 	while (dma_mask > 0x7fffffffUL) {
-- 
2.17.0

^ permalink raw reply related

* [PATCH 6/7] x86: remove the explicit nodac and allowdac option
From: Christoph Hellwig @ 2018-05-25 14:35 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Tony Luck, Fenghua Yu,
	Greg Kroah-Hartman
  Cc: x86, iommu, linux-kernel, linux-ia64, netdev
In-Reply-To: <20180525143512.1466-1-hch@lst.de>

This is something drivers should decide (modulo chipset quirks like
for VIA), which as far as I can tell is how things have been handled
for the last 15 years.

Note that we keep the usedac option for now, as it is used in the wild
to override the too generic VIA quirk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
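A rough user-space sketch of the parsing behaviour after this patch, in
case it helps review: the legacy options are still recognised but only
counted as ignored, while usedac keeps its effect.  The names
parse_iommu_opts() and struct iommu_opts are made up for this example,
and unlike iommu_setup() the sketch does not return early on usedac.

```c
#include <assert.h>
#include <string.h>

/* Result of parsing a comma-separated option string (illustrative). */
struct iommu_opts {
	int usedac;   /* "usedac" seen: override of the VIA quirk requested */
	int ignored;  /* number of recognised-but-ignored legacy options */
};

static void parse_iommu_opts(const char *p, struct iommu_opts *out)
{
	assert(p && out);
	out->usedac = 0;
	out->ignored = 0;

	while (*p) {
		/* Prefix match against the current token. */
		if (!strncmp(p, "forcesac", 8) ||
		    !strncmp(p, "allowdac", 8) ||
		    !strncmp(p, "nodac", 5))
			out->ignored++;
		if (!strncmp(p, "usedac", 6))
			out->usedac = 1;

		/* Skip to the start of the next token, if any. */
		p += strcspn(p, ",");
		if (*p == ',')
			p++;
	}
}
```

Because the comparison is a prefix match, any token that merely starts
with one of these names is treated the same way; the sketch keeps that
property rather than tightening it.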
 Documentation/x86/x86_64/boot-options.txt | 5 -----
 arch/x86/kernel/pci-dma.c                 | 4 ++--
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index 341588ec4e29..8d109ef67ab6 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -236,11 +236,6 @@ IOMMU (input/output memory management unit)
     nomerge            Don't do scatter-gather (SG) merging.
     noaperture         Ask the IOMMU not to touch the aperture for AGP.
     noagp              Don't initialize the AGP driver and use full aperture.
-    allowdac           Allow double-address cycle (DAC) mode, i.e. DMA >4GB.
-                       DAC is used with 32-bit PCI to push a 64-bit address in
-                       two cycles. When off all DMA over >4GB is forced through
-                       an IOMMU or software bounce buffering.
-    nodac              Forbid DAC mode, i.e. DMA >4GB.
     panic              Always panic when IOMMU overflows.
     calgary            Use the Calgary IOMMU if it is available
 
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 91dff954b745..b5cbef974bd1 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -125,9 +125,9 @@ static __init int iommu_setup(char *p)
 		if (!strncmp(p, "forcesac", 8))
 			pr_warn("forcesac option ignored.\n");
 		if (!strncmp(p, "allowdac", 8))
-			forbid_dac = 0;
+			pr_warn("allowdac option ignored.\n");
 		if (!strncmp(p, "nodac", 5))
-			forbid_dac = 1;
+			pr_warn("nodac option ignored.\n");
 		if (!strncmp(p, "usedac", 6)) {
 			forbid_dac = -1;
 			return 1;
-- 
2.17.0

^ permalink raw reply related

* [PATCH 7/7] x86: switch the VIA 32-bit DMA quirk to use the struct device flag
From: Christoph Hellwig @ 2018-05-25 14:35 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Tony Luck, Fenghua Yu,
	Greg Kroah-Hartman
  Cc: x86, iommu, linux-kernel, linux-ia64, netdev
In-Reply-To: <20180525143512.1466-1-hch@lst.de>

Instead of globally disabling > 32bit DMA using the arch_dma_supported
hook, walk the PCI bus under the actually affected bridge and mark every
device with the dma_32bit_limit flag.  This also gets rid of the
arch_dma_supported hook entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
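To make the intended effect easier to picture, here is a toy user-space
model of the walk: every device below the affected bridge gets the flag,
devices elsewhere are untouched.  struct toy_dev, struct toy_bus and
toy_walk_bus() are stand-ins invented for this sketch, not the PCI core
types or pci_walk_bus() itself, and the early-stop-on-non-zero contract
is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_bus;

/* Toy stand-in for a PCI device. */
struct toy_dev {
	bool dma_32bit_limit;        /* flag the quirk sets per device */
	struct toy_bus *subordinate; /* non-NULL if this device is a bridge */
	struct toy_dev *next;        /* next device on the same bus */
};

/* Toy stand-in for a PCI bus: a singly linked list of devices. */
struct toy_bus {
	struct toy_dev *devices;
};

/* Depth-first walk over a bus and everything below it.  Stops early and
 * returns the callback's value if the callback returns non-zero. */
static int toy_walk_bus(struct toy_bus *bus,
			int (*cb)(struct toy_dev *, void *), void *data)
{
	struct toy_dev *dev;
	int ret;

	assert(cb);
	if (!bus)
		return 0;
	for (dev = bus->devices; dev; dev = dev->next) {
		ret = cb(dev, data);
		if (ret)
			return ret;
		ret = toy_walk_bus(dev->subordinate, cb, data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Same idea as via_no_dac_cb() in this patch: mark and keep walking. */
static int toy_no_dac_cb(struct toy_dev *dev, void *data)
{
	(void)data;
	dev->dma_32bit_limit = true;
	return 0;
}
```

The point of the per-device flag is visible here: only the subtree that
is handed to the walk is limited, instead of one global switch for the
whole system.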
 arch/x86/include/asm/dma-mapping.h |  3 ---
 arch/x86/kernel/pci-dma.c          | 27 ++++++++++-----------------
 include/linux/dma-mapping.h        | 11 -----------
 3 files changed, 10 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 89ce4bfd241f..eb4e1352e403 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -30,9 +30,6 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 	return dma_ops;
 }
 
-int arch_dma_supported(struct device *dev, u64 mask);
-#define arch_dma_supported arch_dma_supported
-
 bool arch_dma_alloc_attrs(struct device **dev, gfp_t *gfp);
 #define arch_dma_alloc_attrs arch_dma_alloc_attrs
 
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index b5cbef974bd1..0d6fd0d1c14f 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -15,7 +15,7 @@
 #include <asm/x86_init.h>
 #include <asm/iommu_table.h>
 
-static int forbid_dac __read_mostly;
+static bool disable_dac_quirk __read_mostly;
 
 const struct dma_map_ops *dma_ops = &dma_direct_ops;
 EXPORT_SYMBOL(dma_ops);
@@ -129,7 +129,7 @@ static __init int iommu_setup(char *p)
 		if (!strncmp(p, "nodac", 5))
 			pr_warn("nodac option ignored.\n");
 		if (!strncmp(p, "usedac", 6)) {
-			forbid_dac = -1;
+			disable_dac_quirk = true;
 			return 1;
 		}
 #ifdef CONFIG_SWIOTLB
@@ -154,19 +154,6 @@ static __init int iommu_setup(char *p)
 }
 early_param("iommu", iommu_setup);
 
-int arch_dma_supported(struct device *dev, u64 mask)
-{
-#ifdef CONFIG_PCI
-	if (mask > 0xffffffff && forbid_dac > 0) {
-		dev_info(dev, "PCI: Disallowing DAC for device\n");
-		return 0;
-	}
-#endif
-
-	return 1;
-}
-EXPORT_SYMBOL(arch_dma_supported);
-
 static int __init pci_iommu_init(void)
 {
 	struct iommu_table_entry *p;
@@ -190,11 +177,17 @@ rootfs_initcall(pci_iommu_init);
 #ifdef CONFIG_PCI
 /* Many VIA bridges seem to corrupt data for DAC. Disable it here */
 
+static int via_no_dac_cb(struct pci_dev *pdev, void *data)
+{
+	pdev->dev.dma_32bit_limit = true;
+	return 0;
+}
+
 static void via_no_dac(struct pci_dev *dev)
 {
-	if (forbid_dac == 0) {
+	if (!disable_dac_quirk) {
 		dev_info(&dev->dev, "disabling DAC on VIA PCI bridge\n");
-		forbid_dac = 1;
+		pci_walk_bus(dev->subordinate, via_no_dac_cb, NULL);
 	}
 }
 DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID,
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f8ab1c0f589e..0249bce7c5e7 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -572,14 +572,6 @@ static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 	return 0;
 }
 
-/*
- * This is a hack for the legacy x86 forbid_dac and iommu_sac_force. Please
- * don't use this in new code.
- */
-#ifndef arch_dma_supported
-#define arch_dma_supported(dev, mask)	(1)
-#endif
-
 static inline void dma_check_mask(struct device *dev, u64 mask)
 {
 	if (sme_active() && (mask < (((u64)sme_get_me_mask() << 1) - 1)))
@@ -592,9 +584,6 @@ static inline int dma_supported(struct device *dev, u64 mask)
 
 	if (!ops)
 		return 0;
-	if (!arch_dma_supported(dev, mask))
-		return 0;
-
 	if (!ops->dma_supported)
 		return 1;
 	return ops->dma_supported(dev, mask);
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH] net: stmmac: Use mutex instead of spinlock
From: Bhadram Varka @ 2018-05-25 14:36 UTC (permalink / raw)
  To: Thierry Reding, David S. Miller
  Cc: Giuseppe Cavallaro, Alexandre Torgue, Jon Hunter, netdev,
	linux-kernel
In-Reply-To: <20180524140907.24197-1-thierry.reding@gmail.com>

Hi,

On 5/24/2018 7:39 PM, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> Some drivers, such as DWC EQOS on Tegra, need to perform operations that
> can sleep under this lock (clk_set_rate() in tegra_eqos_fix_speed()) for
> proper operation. Since there is no need for this lock to be a spinlock,
> convert it to a mutex instead.
> 
> Fixes: e6ea2d16fc61 ("net: stmmac: dwc-qos: Add Tegra186 support")
> Reported-by: Jon Hunter <jonathanh@nvidia.com>
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---

Tested on P3310 Tegra186 platform.

Tested-by: Bhadram Varka <vbhadram@nvidia.com>

-- 
Bhadram

^ permalink raw reply

* Re: [PATCH net-next] net: sched: shrink struct Qdisc
From: Jiri Pirko @ 2018-05-25 14:41 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, David S. Miller, Jamal Hadi Salim, Cong Wang
In-Reply-To: <607936fe39bf1e78ca8b520e2ef25b7b326a767f.1527258390.git.pabeni@redhat.com>

Fri, May 25, 2018 at 04:28:44PM CEST, pabeni@redhat.com wrote:
>The struct Qdisc has a lot of holes, especially after commit
>a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback"),
>which as a side effect, moved the fields just after 'busylock'
>on a new cacheline.
>
>Since both 'padded' and 'refcnt' are not updated frequently, and
>there is a hole before 'gso_skb', we can move such fields there,
>saving a cacheline without any performance side effect.
>
>Before this commit:
>
>pahole -C Qdisc net/sched/sch_generic.o
>	# ...
>        /* size: 384, cachelines: 6, members: 25 */
>        /* sum members: 236, holes: 3, sum holes: 92 */
>        /* padding: 56 */
>
>After this commit:
>pahole -C Qdisc net/sched/sch_generic.o
>	# ...
>	/* size: 320, cachelines: 5, members: 25 */
>	/* sum members: 236, holes: 2, sum holes: 28 */
>	/* padding: 56 */
>
>Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>
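
As a small illustration of why moving fields into an existing hole
shrinks a structure, here is a generic user-space sketch.  The two
structs and their field names are invented for the example and are not
the layout of struct Qdisc; exact sizes depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout with holes: each 1-byte field is followed by padding so that
 * the next 8-byte field is suitably aligned. */
struct with_holes {
	uint8_t  flag_a;
	uint64_t counter;
	uint8_t  flag_b;
	uint64_t bytes;
};

/* Same members, reordered so that the small fields sit together. */
struct reordered {
	uint64_t counter;
	uint64_t bytes;
	uint8_t  flag_a;
	uint8_t  flag_b;
};

/* Bytes of padding in a struct of `total` bytes whose members sum to
 * `members` bytes. */
static size_t padding_bytes(size_t total, size_t members)
{
	assert(total >= members);
	return total - members;
}
```

Both structs carry 18 bytes of members; only the amount of padding
differs, which is the kind of difference the pahole output above
reports as "sum holes".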

^ permalink raw reply

* Re: [PATCH 1/7] core, dma-direct: add a flag 32-bit dma limits
From: Greg Kroah-Hartman @ 2018-05-25 14:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Fenghua Yu, Tony Luck, linux-ia64-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, x86-DgEjT+Ai2ygdnm+yROfE0A,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Ingo Molnar,
	Thomas Gleixner
In-Reply-To: <20180525143512.1466-2-hch-jcswGhMUV9g@public.gmane.org>

On Fri, May 25, 2018 at 04:35:06PM +0200, Christoph Hellwig wrote:
> Various PCI bridges (VIA PCI, Xilinx PCIe) limit DMA to only 32-bits
> even if the device itself supports more.  Add a single bit flag to
> struct device (to be moved into the dma extension once we around it)

"once we around it"?  I don't understand, sorry.

> to flag such devices and reject larger DMA to them.
> 
> Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> ---
>  include/linux/device.h | 3 +++
>  lib/dma-direct.c       | 6 ++++++
>  2 files changed, 9 insertions(+)

For the patch, no objection from me:

Reviewed-by: Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>

^ permalink raw reply

* Re: [PATCH] drivers/net/phy/micrel: Fix for PHY KSZ8061 errrata: Potential link-up failure when Ethernet cable is connected slowly
From: Florian Fainelli @ 2018-05-25 15:17 UTC (permalink / raw)
  To: Alexander Onnasch; +Cc: Andrew Lunn, netdev, linux-kernel
In-Reply-To: <1527251853-22218-1-git-send-email-alexander.onnasch@landisgyr.com>



On 05/25/2018 05:37 AM, Alexander Onnasch wrote:
> Signed-off-by: Alexander Onnasch <alexander.onnasch@landisgyr.com>

You would want to make the commit subject shorter (ideally capped
somewhere around 72 characters) and provide a commit message which
explains the issue and why the workaround is effective.

Thank you!

[snip]

> 
> P PLEASE CONSIDER OUR ENVIRONMENT BEFORE PRINTING THIS EMAIL.
> 
> This e-mail (including any attachments) is confidential and may be legally privileged. If you are not an intended recipient or an authorized representative of an intended recipient, you are prohibited from using, copying or distributing the information in this e-mail or its attachments. If you have received this e-mail in error, please notify the sender immediately by return e-mail and delete all copies of this message and any attachments. Thank you.

You need to remove that footer; otherwise we cannot accept your patch.
-- 
Florian

^ permalink raw reply

* Re: [PATCH iproute2] ip link: Do not call ll_name_to_index when creating a new link
From: Stephen Hemminger @ 2018-05-25 15:21 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev
In-Reply-To: <054faf87-f311-61a1-3c9a-e7d20cac8279@gmail.com>

On Fri, 18 May 2018 17:40:05 -0600
David Ahern <dsahern@gmail.com> wrote:

> On 5/18/18 4:08 PM, Stephen Hemminger wrote:
> > 
> > What about just pushing the lookup down to the leaf functions that need it?
> >   
> 
> That should work as well. You want to re-send a formal patch?
> 

I just pushed it up as a formal patch (with your text).

^ permalink raw reply

* Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino
From: Alban Crequy @ 2018-05-25 15:21 UTC (permalink / raw)
  To: Y Song
  Cc: Iago López Galeiras, netdev, Linux Containers, LKML,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Alexei Starovoitov
In-Reply-To: <CAH3MdRVdfw52atavT3KL8MpPw7zDM_hR6aUcqDP1PogLn_sH+w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Wed, May 23, 2018 at 4:34 AM Y Song <ys114321-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> I did a quick prototyping and the above interface seems working fine.

Thanks! I gave your kernel patch & userspace program a try and it works for
me on cgroup-v2.

Also, I found out how to get my containers to use both cgroup-v1 and
cgroup-v2 (by enabling systemd's hybrid cgroup mode and docker's
'--exec-opt native.cgroupdriver=systemd' option). So I should be able to
use the BPF helper function without having to add support for all the
cgroup-v1 hierarchies.

> The kernel change:
> ===============

> [yhs@localhost bpf-next]$ git diff
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 97446bbe2ca5..669b7383fddb 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1976,7 +1976,8 @@ union bpf_attr {
>          FN(fib_lookup),                 \
>          FN(sock_hash_update),           \
>          FN(msg_redirect_hash),          \
> -       FN(sk_redirect_hash),
> +       FN(sk_redirect_hash),           \
> +       FN(get_current_cgroup_id),

>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>    * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index ce2cbbff27e4..e11e3298f911 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -493,6 +493,21 @@ static const struct bpf_func_proto bpf_current_task_under_cgroup_proto = {
>          .arg2_type      = ARG_ANYTHING,
>   };

> +BPF_CALL_0(bpf_get_current_cgroup_id)
> +{
> +       struct cgroup *cgrp = task_dfl_cgroup(current);
> +       if (!cgrp)
> +               return -EINVAL;
> +
> +       return cgrp->kn->id.id;
> +}
> +
> +static const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
> +       .func           = bpf_get_current_cgroup_id,
> +       .gpl_only       = false,
> +       .ret_type       = RET_INTEGER,
> +};
> +
>   BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
>             const void *, unsafe_ptr)
>   {
> @@ -563,6 +578,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>                  return &bpf_get_prandom_u32_proto;
>          case BPF_FUNC_probe_read_str:
>                  return &bpf_probe_read_str_proto;
> +       case BPF_FUNC_get_current_cgroup_id:
> +               return &bpf_get_current_cgroup_id_proto;
>          default:
>                  return NULL;
>          }

> The following program can be used to print out a cgroup id given a cgroup path.
> [yhs@localhost cg]$ cat get_cgroup_id.c
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>

> int main(int argc, char **argv)
> {
>      int dirfd, err, flags, mount_id, fhsize;
>      struct file_handle *fhp;
>      char *pathname;

>      if (argc != 2) {
>          printf("usage: %s <cgroup_path>\n", argv[0]);
>          return 1;
>      }

>      pathname = argv[1];
>      dirfd = AT_FDCWD;
>      flags = 0;

>      fhsize = sizeof(*fhp);
>      fhp = malloc(fhsize);
>      if (!fhp)
>          return 1;
>      /* start with an empty handle so the first call reports the size */
>      fhp->handle_bytes = 0;

>      err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
>      if (err >= 0) {
>          printf("error\n");
>          return 1;
>      }

>      fhsize = sizeof(struct file_handle) + fhp->handle_bytes;
>      fhp = realloc(fhp, fhsize);
>      if (!fhp)
>          return 1;

>      err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
>      if (err < 0)
>          perror("name_to_handle_at");
>      else {
>          int i;

>          printf("dir = %s, mount_id = %d\n", pathname, mount_id);
>          printf("handle_bytes = %d, handle_type = %d\n", fhp->handle_bytes,
>              fhp->handle_type);
>          if (fhp->handle_bytes != 8)
>              return 1;

>          printf("cgroup_id = 0x%llx\n", *(unsigned long long *)fhp->f_handle);
>      }

>      return 0;
> }
> [yhs@localhost cg]$

> Given a cgroup path, the user can get cgroup_id and use it in their bpf
> program for filtering purpose.

> I run a simple program t.c
>     int main() { while(1) sleep(1); return 0; }
> in the cgroup v2 directory /home/yhs/tmp/yhs
>     none on /home/yhs/tmp type cgroup2 (rw,relatime,seclabel)

> $ ./get_cgroup_id /home/yhs/tmp/yhs
> dir = /home/yhs/tmp/yhs, mount_id = 124
> handle_bytes = 8, handle_type = 1
> cgroup_id = 0x1000006b2

> // the below command to get cgroup_id from the kernel for the
> // process compiled with t.c and ran under /home/yhs/tmp/yhs:
> $ sudo ./trace.py -p 4067 '__x64_sys_nanosleep "cgid = %llx", $cgid'
> PID     TID     COMM            FUNC             -
> 4067    4067    a.out           __x64_sys_nanosleep cgid = 1000006b2
> 4067    4067    a.out           __x64_sys_nanosleep cgid = 1000006b2
> 4067    4067    a.out           __x64_sys_nanosleep cgid = 1000006b2
> ^C[yhs@localhost tools]$

> The kernel and user space cgid matches. Will provide a
> formal patch later.




> On Mon, May 21, 2018 at 5:24 PM, Y Song <ys114321-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > On Mon, May 21, 2018 at 9:26 AM, Alexei Starovoitov
> > <alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >> On Sun, May 13, 2018 at 07:33:18PM +0200, Alban Crequy wrote:
> >>>
> >>> +BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
> >>> +{
> >>> +     // TODO: pick the correct hierarchy instead of the mem controller
> >>> +     struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
> >>> +
> >>> +     if (unlikely(!cgrp))
> >>> +             return -EINVAL;
> >>> +     if (unlikely(hierarchy))
> >>> +             return -EINVAL;
> >>> +     if (unlikely(flags))
> >>> +             return -EINVAL;
> >>> +
> >>> +     return cgrp->kn->id.ino;
> >>
> >> ino only is not enough to identify cgroup. It needs generation number too.
> >> I don't quite see how hierarchy and flags can be used in the future.
> >> Also why limit it to memcg?
> >>
> >> How about something like this instead:
> >>
> >> BPF_CALL_0(bpf_get_current_cgroup_id)
> >> {
> >>         struct cgroup *cgrp = task_dfl_cgroup(current);
> >>
> >>         return cgrp->kn->id.id;
> >> }
> >> The user space can use fhandle api to get the same 64-bit id.
> >
> > I think this should work. This will also be useful to bcc as user
> > space can encode the desired id in the bpf program and compare that
> > id to the current cgroup id, so we can have cgroup level tracing
> > (esp. stat collection) support. To cope with cgroup hierarchy, the
> > user can use a cgroup-array based approach or explicitly compare
> > against multiple cgroup ids.
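
For what it is worth, the 64-bit id printed in the test run quoted above
(0x1000006b2) can be read as an inode number plus a generation number,
which is the point made about ino alone not being enough.  The sketch
below is user-space only and rests on an assumption: that the id is laid
out with the inode number in the low 32 bits and the generation in the
high 32 bits, as I understand the kernfs node id on a little-endian
machine.  struct cgid_parts and the three helpers are names invented
here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed split of the 64-bit id: low half inode, high half generation. */
struct cgid_parts {
	uint32_t ino;
	uint32_t generation;
};

/* Copy the 8-byte handle payload into an integer without relying on an
 * unaligned pointer cast. */
static uint64_t cgroup_id_from_handle(const unsigned char *f_handle,
				      unsigned int handle_bytes)
{
	uint64_t id;

	assert(f_handle && handle_bytes == sizeof(id));
	memcpy(&id, f_handle, sizeof(id));
	return id;
}

static struct cgid_parts split_cgroup_id(uint64_t id)
{
	struct cgid_parts p;

	p.ino = (uint32_t)(id & 0xffffffffu);
	p.generation = (uint32_t)(id >> 32);
	return p;
}

static uint64_t join_cgroup_id(struct cgid_parts p)
{
	return ((uint64_t)p.generation << 32) | p.ino;
}
```

Under that assumption the id from the test run splits into inode 0x6b2
and generation 1, and two cgroups that reuse the same inode number would
still differ in the upper half.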

^ permalink raw reply

* [PATCH v2 net-next] net: stmmac: Add PPS and Flexible PPS support
From: Jose Abreu @ 2018-05-25 15:32 UTC (permalink / raw)
  To: netdev
  Cc: Jose Abreu, David S. Miller, Joao Pinto, Vitor Soares,
	Giuseppe Cavallaro, Alexandre Torgue, Richard Cochran

This adds support for PPS output and Flexible PPS (which is equivalent
to the per_out output of the PTP subsystem).

Tested using an oscilloscope and the following commands:

1) Start PTP4L:
	# ptp4l -A -4 -H -m -i eth0 &
2) Set Flexible PPS frequency:
	# echo <idx> <ts> <tns> <ps> <pns> > /sys/class/ptp/ptpX/period

Where ts/tns is the start time, ps/pns is the period, and ptpX is the PTP
device of eth0.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Vitor Soares <soares@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Richard Cochran <richardcochran@gmail.com>
---
Changes from v1:
	- Correct kbuild errors in some archs
---
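For review purposes, the interval and width arithmetic done in
dwmac5_flex_pps_config() can be checked in user space with the sketch
below.  It follows the same steps as the patch (period in nanoseconds
divided by the sub-second increment, interval = ticks - 1, width = half
of that minus one), but flex_pps_compute() and struct flex_pps_regs are
names made up for the example and no register is touched.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Values that would be written to the interval and width registers. */
struct flex_pps_regs {
	uint32_t interval; /* value for MAC_PPSx_INTERVAL */
	uint32_t width;    /* value for MAC_PPSx_WIDTH */
};

/* Returns 0 on success, -1 if sub_second_inc is zero or the period is
 * too short to be programmed.  *regs is only written on success. */
static int flex_pps_compute(uint64_t period_sec, uint64_t period_nsec,
			    uint32_t sub_second_inc,
			    struct flex_pps_regs *regs)
{
	uint64_t ticks, half;

	assert(regs);
	if (!sub_second_inc)
		return -1;

	/* Period expressed in units of the sub-second increment. */
	ticks = (period_sec * NSEC_PER_SEC + period_nsec) / sub_second_inc;
	if (ticks <= 1)
		return -1;

	/* Half of the period is spent high, half low. */
	half = ticks >> 1;
	if (half <= 1)
		return -1;

	regs->interval = (uint32_t)(ticks - 1);
	regs->width = (uint32_t)(half - 1);
	return 0;
}
```

As an example, a 1 s period with a 20 ns increment gives 50000000 ticks,
so the interval value is 49999999 and the width value is 24999999.  One
difference from the patch is deliberate: the sketch validates both
values before storing either of them.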
 drivers/net/ethernet/stmicro/stmmac/common.h      |    2 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h      |    1 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c |    2 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c  |    2 +
 drivers/net/ethernet/stmicro/stmmac/dwmac5.c      |   68 +++++++++++++++++++++
 drivers/net/ethernet/stmicro/stmmac/dwmac5.h      |   23 +++++++
 drivers/net/ethernet/stmicro/stmmac/hwif.h        |   10 +++
 drivers/net/ethernet/stmicro/stmmac/stmmac.h      |   12 ++++
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |    4 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c  |   50 ++++++++++++++-
 10 files changed, 170 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index a679cb7..78fd0f8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -346,6 +346,8 @@ struct dma_features {
 	/* TX and RX number of queues */
 	unsigned int number_rx_queues;
 	unsigned int number_tx_queues;
+	/* PPS output */
+	unsigned int pps_out_num;
 	/* Alternate (enhanced) DESC mode */
 	unsigned int enh_desc;
 	/* TX and RX FIFO sizes */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index 6330a55..eb013d5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -187,6 +187,7 @@ enum power_event {
 #define GMAC_HW_RXFIFOSIZE		GENMASK(4, 0)
 
 /* MAC HW features2 bitmap */
+#define GMAC_HW_FEAT_PPSOUTNUM		GENMASK(26, 24)
 #define GMAC_HW_FEAT_TXCHCNT		GENMASK(21, 18)
 #define GMAC_HW_FEAT_RXCHCNT		GENMASK(15, 12)
 #define GMAC_HW_FEAT_TXQCNT		GENMASK(9, 6)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index a7121a7..d46e784 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -796,6 +796,8 @@ static void dwmac4_debug(void __iomem *ioaddr, struct stmmac_extra_stats *x,
 	.safety_feat_irq_status = dwmac5_safety_feat_irq_status,
 	.safety_feat_dump = dwmac5_safety_feat_dump,
 	.rxp_config = dwmac5_rxp_config,
+	.pps_config = dwmac5_pps_config,
+	.flex_pps_config = dwmac5_flex_pps_config,
 };
 
 int dwmac4_setup(struct stmmac_priv *priv)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
index bf8e5a1..d37f17c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
@@ -373,6 +373,8 @@ static void dwmac4_get_hw_feature(void __iomem *ioaddr,
 		((hw_cap & GMAC_HW_FEAT_RXQCNT) >> 0) + 1;
 	dma_cap->number_tx_queues =
 		((hw_cap & GMAC_HW_FEAT_TXQCNT) >> 6) + 1;
+	/* PPS output */
+	dma_cap->pps_out_num = (hw_cap & GMAC_HW_FEAT_PPSOUTNUM) >> 24;
 
 	/* IEEE 1588-2002 */
 	dma_cap->time_stamp = 0;
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac5.c b/drivers/net/ethernet/stmicro/stmmac/dwmac5.c
index b2becb8..d12fa94 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac5.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac5.c
@@ -8,6 +8,7 @@
 #include "dwmac4.h"
 #include "dwmac5.h"
 #include "stmmac.h"
+#include "stmmac_ptp.h"
 
 struct dwmac5_error_desc {
 	bool valid;
@@ -494,3 +495,70 @@ int dwmac5_rxp_config(void __iomem *ioaddr, struct stmmac_tc_entry *entries,
 	writel(old_val, ioaddr + GMAC_CONFIG);
 	return ret;
 }
+
+int dwmac5_pps_config(void __iomem *ioaddr, bool enable)
+{
+	u32 val = readl(ioaddr + MAC_PPS_CONTROL);
+
+	/* There is no way to disable fixed PPS output so we just reset
+	 * the values to make sure it's in fixed PPS mode */
+	val &= ~PPSx_MASK(0);
+	val |= TRGTMODSELx(0, 0x2);
+
+	writel(val, ioaddr + MAC_PPS_CONTROL);
+	return 0;
+}
+
+int dwmac5_flex_pps_config(void __iomem *ioaddr, int index,
+			   struct stmmac_pps_cfg *cfg, bool enable,
+			   u32 sub_second_inc, u32 systime_flags)
+{
+	u32 tnsec = readl(ioaddr + MAC_PPSx_TARGET_TIME_NSEC(index));
+	u32 val = readl(ioaddr + MAC_PPS_CONTROL);
+	u64 period;
+
+	if (!cfg->available)
+		return -EINVAL;
+	if (tnsec & TRGTBUSY0)
+		return -EBUSY;
+	if (!sub_second_inc || !systime_flags)
+		return -EINVAL;
+
+	val &= ~PPSx_MASK(index);
+
+	if (!enable) {
+		val |= PPSCMDx(index, 0x5);
+		writel(val, ioaddr + MAC_PPS_CONTROL);
+		return 0;
+	}
+
+	val |= PPSCMDx(index, 0x2);
+	val |= TRGTMODSELx(index, 0x2);
+	val |= PPSEN0;
+
+	writel(cfg->start.tv_sec, ioaddr + MAC_PPSx_TARGET_TIME_SEC(index));
+
+	if (!(systime_flags & PTP_TCR_TSCTRLSSR))
+		cfg->start.tv_nsec = (cfg->start.tv_nsec * 1000) / 465;
+	writel(cfg->start.tv_nsec, ioaddr + MAC_PPSx_TARGET_TIME_NSEC(index));
+
+	period = cfg->period.tv_sec * 1000000000;
+	period += cfg->period.tv_nsec;
+
+	do_div(period, sub_second_inc);
+
+	if (period <= 1)
+		return -EINVAL;
+
+	writel(period - 1, ioaddr + MAC_PPSx_INTERVAL(index));
+
+	period >>= 1;
+	if (period <= 1)
+		return -EINVAL;
+
+	writel(period - 1, ioaddr + MAC_PPSx_WIDTH(index));
+
+	/* Finally, activate it */
+	writel(val, ioaddr + MAC_PPS_CONTROL);
+	return 0;
+}
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac5.h b/drivers/net/ethernet/stmicro/stmmac/dwmac5.h
index cc810af..d0a12cf 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac5.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac5.h
@@ -11,6 +11,25 @@
 #define PRTYEN				BIT(1)
 #define TMOUTEN				BIT(0)
 
+#define MAC_PPS_CONTROL			0x00000b70
+#define PPS_MAXIDX(x)			((((x) + 1) * 8) - 1)
+#define PPS_MINIDX(x)			((x) * 8)
+#define PPSx_MASK(x)			GENMASK(PPS_MAXIDX(x), PPS_MINIDX(x))
+#define MCGRENx(x)			BIT(PPS_MAXIDX(x))
+#define TRGTMODSELx(x, val)		\
+	(GENMASK(PPS_MAXIDX(x) - 1, PPS_MAXIDX(x) - 2) & \
+	((val) << (PPS_MAXIDX(x) - 2)))
+#define PPSCMDx(x, val)			\
+	(GENMASK(PPS_MINIDX(x) + 3, PPS_MINIDX(x)) & \
+	((val) << PPS_MINIDX(x)))
+#define PPSEN0				BIT(4)
+#define MAC_PPSx_TARGET_TIME_SEC(x)	(0x00000b80 + ((x) * 0x10))
+#define MAC_PPSx_TARGET_TIME_NSEC(x)	(0x00000b84 + ((x) * 0x10))
+#define TRGTBUSY0			BIT(31)
+#define TTSL0				GENMASK(30, 0)
+#define MAC_PPSx_INTERVAL(x)		(0x00000b88 + ((x) * 0x10))
+#define MAC_PPSx_WIDTH(x)		(0x00000b8c + ((x) * 0x10))
+
 #define MTL_RXP_CONTROL_STATUS		0x00000ca0
 #define RXPI				BIT(31)
 #define NPE				GENMASK(23, 16)
@@ -61,5 +80,9 @@ int dwmac5_safety_feat_dump(struct stmmac_safety_stats *stats,
 			int index, unsigned long *count, const char **desc);
 int dwmac5_rxp_config(void __iomem *ioaddr, struct stmmac_tc_entry *entries,
 		      unsigned int count);
+int dwmac5_pps_config(void __iomem *ioaddr, bool enable);
+int dwmac5_flex_pps_config(void __iomem *ioaddr, int index,
+			   struct stmmac_pps_cfg *cfg, bool enable,
+			   u32 sub_second_inc, u32 systime_flags);
 
 #endif /* __DWMAC5_H__ */
diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.h b/drivers/net/ethernet/stmicro/stmmac/hwif.h
index f499a7f..44ea531 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.h
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.h
@@ -241,6 +241,7 @@ struct stmmac_dma_ops {
 struct rgmii_adv;
 struct stmmac_safety_stats;
 struct stmmac_tc_entry;
+struct stmmac_pps_cfg;
 
 /* Helpers to program the MAC core */
 struct stmmac_ops {
@@ -313,6 +314,11 @@ struct stmmac_ops {
 	/* Flexible RX Parser */
 	int (*rxp_config)(void __iomem *ioaddr, struct stmmac_tc_entry *entries,
 			  unsigned int count);
+	/* PPS and Flexible PPS */
+	int (*pps_config)(void __iomem *ioaddr, bool enable);
+	int (*flex_pps_config)(void __iomem *ioaddr, int index,
+			       struct stmmac_pps_cfg *cfg, bool enable,
+			       u32 sub_second_inc, u32 systime_flags);
 };
 
 #define stmmac_core_init(__priv, __args...) \
@@ -379,6 +385,10 @@ struct stmmac_ops {
 	stmmac_do_callback(__priv, mac, safety_feat_dump, __args)
 #define stmmac_rxp_config(__priv, __args...) \
 	stmmac_do_callback(__priv, mac, rxp_config, __args)
+#define stmmac_pps_config(__priv, __args...) \
+	stmmac_do_callback(__priv, mac, pps_config, __args)
+#define stmmac_flex_pps_config(__priv, __args...) \
+	stmmac_do_callback(__priv, mac, flex_pps_config, __args)
 
 /* PTP and HW Timer helpers */
 struct stmmac_hwtimestamp {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 4d425b1..d1a4cb7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -100,6 +100,13 @@ struct stmmac_tc_entry {
 	} __packed val;
 };
 
+#define STMMAC_PPS_MAX		4
+struct stmmac_pps_cfg {
+	bool available;
+	struct timespec64 start;
+	struct timespec64 period;
+};
+
 struct stmmac_priv {
 	/* Frequently used values are kept adjacent for cache effect */
 	u32 tx_count_frames;
@@ -160,6 +167,8 @@ struct stmmac_priv {
 	struct ptp_clock *ptp_clock;
 	struct ptp_clock_info ptp_clock_ops;
 	unsigned int default_addend;
+	u32 sub_second_inc;
+	u32 systime_flags;
 	u32 adv_ts;
 	int use_riwt;
 	int irq_wake;
@@ -181,6 +190,9 @@ struct stmmac_priv {
 	unsigned int tc_entries_max;
 	unsigned int tc_off_max;
 	struct stmmac_tc_entry *tc_entries;
+
+	/* Pulse Per Second output */
+	struct stmmac_pps_cfg pps[STMMAC_PPS_MAX];
 };
 
 enum stmmac_state {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c32de53..14361c8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -722,6 +722,10 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, struct ifreq *ifr)
 				priv->plat->has_gmac4, &sec_inc);
 		temp = div_u64(1000000000ULL, sec_inc);
 
+		/* Store sub second increment and flags for later use */
+		priv->sub_second_inc = sec_inc;
+		priv->systime_flags = value;
+
 		/* calculate default added value:
 		 * formula is :
 		 * addend = (2^32)/freq_div_ratio;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
index 7d3a5c7..35c6d0c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
@@ -140,19 +140,50 @@ static int stmmac_set_time(struct ptp_clock_info *ptp,
 static int stmmac_enable(struct ptp_clock_info *ptp,
 			 struct ptp_clock_request *rq, int on)
 {
-	return -EOPNOTSUPP;
+	struct stmmac_priv *priv =
+	    container_of(ptp, struct stmmac_priv, ptp_clock_ops);
+	struct stmmac_pps_cfg *cfg;
+	int ret = -EOPNOTSUPP;
+	unsigned long flags;
+
+	switch (rq->type) {
+	case PTP_CLK_REQ_PEROUT:
+		cfg = &priv->pps[rq->perout.index];
+
+		cfg->start.tv_sec = rq->perout.start.sec;
+		cfg->start.tv_nsec = rq->perout.start.nsec;
+		cfg->period.tv_sec = rq->perout.period.sec;
+		cfg->period.tv_nsec = rq->perout.period.nsec;
+
+		spin_lock_irqsave(&priv->ptp_lock, flags);
+		ret = stmmac_flex_pps_config(priv, priv->ioaddr,
+					     rq->perout.index, cfg, on,
+					     priv->sub_second_inc,
+					     priv->systime_flags);
+		spin_unlock_irqrestore(&priv->ptp_lock, flags);
+		break;
+	case PTP_CLK_REQ_PPS:
+		spin_lock_irqsave(&priv->ptp_lock, flags);
+		ret = stmmac_pps_config(priv, priv->ioaddr, on);
+		spin_unlock_irqrestore(&priv->ptp_lock, flags);
+		break;
+	default:
+		break;
+	}
+
+	return ret;
 }
 
 /* structure describing a PTP hardware clock */
-static const struct ptp_clock_info stmmac_ptp_clock_ops = {
+static struct ptp_clock_info stmmac_ptp_clock_ops = {
 	.owner = THIS_MODULE,
 	.name = "stmmac_ptp_clock",
 	.max_adj = 62500000,
 	.n_alarm = 0,
 	.n_ext_ts = 0,
-	.n_per_out = 0,
+	.n_per_out = 0, /* will be overwritten in stmmac_ptp_register */
 	.n_pins = 0,
-	.pps = 0,
+	.pps = 0, /* will be overwritten in stmmac_ptp_register */
 	.adjfreq = stmmac_adjust_freq,
 	.adjtime = stmmac_adjust_time,
 	.gettime64 = stmmac_get_time,
@@ -168,6 +199,17 @@ static int stmmac_enable(struct ptp_clock_info *ptp,
  */
 void stmmac_ptp_register(struct stmmac_priv *priv)
 {
+	int i;
+
+	for (i = 0; i < priv->dma_cap.pps_out_num; i++) {
+		if (i >= STMMAC_PPS_MAX)
+			break;
+		priv->pps[i].available = true;
+	}
+
+	stmmac_ptp_clock_ops.pps = priv->dma_cap.pps_out_num > 0;
+	stmmac_ptp_clock_ops.n_per_out = priv->dma_cap.pps_out_num;
+
 	spin_lock_init(&priv->ptp_lock);
 	priv->ptp_clock_ops = stmmac_ptp_clock_ops;
 
-- 
1.7.1
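
For readers following the register arithmetic in the patch above, here is a
minimal userspace sketch of the per-output bit fields in MAC_PPS_CONTROL and
of the interval/width computation done in dwmac5_flex_pps_config(). It is not
driver code: genmask32(), pps_field_mask(), pps_cmd(), pps_trgtmodsel() and
pps_interval_width() are hypothetical names introduced only for illustration,
and genmask32() is a stand-in for the kernel's GENMASK(). Unlike the patch,
the sketch validates both values before storing either one.

```c
#include <assert.h>
#include <stdint.h>

/* Each PPS output owns one byte of MAC_PPS_CONTROL, as in the patch. */
#define PPS_MINIDX(x)	((x) * 8)
#define PPS_MAXIDX(x)	((((x) + 1) * 8) - 1)

/* Userspace stand-in for the kernel's GENMASK(h, l), 32-bit only. */
static uint32_t genmask32(unsigned int h, unsigned int l)
{
	return (~UINT32_C(0) >> (31 - h)) & (~UINT32_C(0) << l);
}

/* All control bits that belong to PPS output 'index' (cf. PPSx_MASK). */
static uint32_t pps_field_mask(unsigned int index)
{
	assert(index < 4);	/* STMMAC_PPS_MAX in the patch */
	return genmask32(PPS_MAXIDX(index), PPS_MINIDX(index));
}

/* Command value placed in the low nibble of the byte (cf. PPSCMDx). */
static uint32_t pps_cmd(unsigned int index, uint32_t cmd)
{
	assert(index < 4);
	return genmask32(PPS_MINIDX(index) + 3, PPS_MINIDX(index)) &
	       (cmd << PPS_MINIDX(index));
}

/* Target mode select in bits 6:5 of the byte (cf. TRGTMODSELx). */
static uint32_t pps_trgtmodsel(unsigned int index, uint32_t mode)
{
	assert(index < 4);
	return genmask32(PPS_MAXIDX(index) - 1, PPS_MAXIDX(index) - 2) &
	       (mode << (PPS_MAXIDX(index) - 2));
}

/*
 * Convert a period in nanoseconds into the values written to the
 * INTERVAL and WIDTH registers: the period is expressed in units of
 * the sub-second increment, and the width is half of it (50% duty).
 * Returns 0 on success, -1 if the period is too short to encode.
 */
static int pps_interval_width(uint64_t period_ns, uint32_t sub_second_inc,
			      uint32_t *interval, uint32_t *width)
{
	uint64_t ticks, half;

	if (!sub_second_inc)
		return -1;

	ticks = period_ns / sub_second_inc;
	half = ticks >> 1;
	if (ticks <= 1 || half <= 1)
		return -1;

	*interval = (uint32_t)(ticks - 1);
	*width = (uint32_t)(half - 1);
	return 0;
}
```

As an example, with an assumed sub-second increment of 20 ns, a period of
1 s (1000000000 ns) gives 50000000 ticks, so the sketch yields an interval
of 49999999 and a width of 24999999. A period of 40 ns gives only 2 ticks
and is rejected, since the halved value cannot be encoded.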
