* Re: [PATCH net 00/13] cls_u32 cleanups and fixes.
From: Al Viro @ 2018-09-09 14:15 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: netdev, Cong Wang, Jiri Pirko
In-Reply-To: <5f835727-3f86-1532-889f-d11ae17e16d4@mojatatu.com>
On Sun, Sep 09, 2018 at 08:58:50AM -0400, Jamal Hadi Salim wrote:
>
> Since you have the momentum here: i noticed something
> unusual while i was trying to craft a test that would
> vet some of your changes. This has nothing to do with
> your changes, same happens on my stock debian laptop
> with kernel:
> 4.17.0-0.bpo.3-amd64 #1 SMP Debian 4.17.17-1~bpo9+1 (2018-08-27)
>
> Looking at git - possibly introduced around the time u32
> lockless was being introduced and maybe even earlier
> than that.
It's always been that way, actually - before that point the old
knode simply got reused, which excluded any chance of changing
n->sel.
> Unfortunately i dont have time to dig
> further.
>
> To reproduce what i am referring to, here's a setup:
>
> $tc filter add dev eth0 parent ffff: protocol ip prio 102 u32 \
> classid 1:2 match ip src 192.168.8.0/8
> $tc filter replace dev eth0 parent ffff: protocol ip prio 102 \
> handle 800:0:800 u32 classid 1:2 match ip src 1.1.0.0/24
>
> u32_change() code path should have allowed changing of the
> keynode.
Umm... Interesting - TCA_U32_SEL is not the only thing that
gets ignored there; TCA_U32_MARK gets the same treatment.
And then there's a lovely question what to do with n->pf -
it's an array of n->sel.nkeys counters, and apparently we
want (at least in common cases) to avoid resetting those.
*If* we declare that ->nkeys mismatch means failure, it's
all relatively easy to implement. Alternatively, we could
declare that selector change means resetting the stats.
Preferences?
^ permalink raw reply
* Re: [lxc-users] Containers and checkpoint/restart micro-conference at LPC2018
From: Lucas Oketch @ 2018-09-09 19:08 UTC (permalink / raw)
To: lucas.oketch-Re5JQEeQqe8AvxtiuMwx3w
Cc: containers-cunTk1MwBs98uUxBSJOaYoYkZiVZrdSR2LY78lusg7I,
Miklos Szeredi, Netdev, overlayfs, LSM List,
lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
linux-fsdevel, Vivek Goyal
In-Reply-To: <CAOQ4uxiQEpofdS97kxnii8LtVW2QiKAGvjjaH0Px-Bj3eHVCFA@mail.gmail.com>
[-- Attachment #1: Type: text/html, Size: 171 bytes --]
[-- Attachment #2: Type: text/plain, Size: 159 bytes --]
_______________________________________________
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel
^ permalink raw reply
* Re: [PATCH net] qmi_wwan: Support dynamic config on Quectel EP06
From: Kristian Evensen @ 2018-09-09 14:37 UTC (permalink / raw)
To: Bjørn Mork; +Cc: Network Development
In-Reply-To: <87musst6wo.fsf@miraculix.mork.no>
Hei Bjørn,
Thanks for the feedback!
On Sat, Sep 8, 2018 at 4:12 PM Bjørn Mork <bjorn@mork.no> wrote:
> That's annoying, but hardly surprising. They obviously try to make life
> as hard as possible for the drivers. I wonder what the Windows drivers
> do here, if there are any? Or are these modules only used in embedded
> Linux devices?
There are Windows-drivers, but I don't have access to them (or a
machine to test on if I could get them). I would not be surprised if
the modules turn up in Windows-machines, they seems to be quite
popular. EP06/EG06/EM06 all refer to the same module, they just have
different connectors. P is mini-PCIe, G is embedded and M is M.2.
Perhaps the assumption is that Windows-users wont mess around with
these settings? :)
> Could this break if more/other functions are enabled? Are you sure
> there can't be any other type of serial function with 3 endpoints and
> ff/ff/ff?
I forgot to include the listing of all interfaces in my commit, so I
have inserted it at the bottom of this reply. With the current
firmware and interfaces, only QMI has three endpoints and ff/ff/ff. I
guess we just have do as you suggest, wait and refine the approach if
or when something breaks.
Output of lsusb -v for EG06 with all interfaces enabled (I have
checked that EP06 produces the same output):
Bus 002 Device 008: ID 2c7c:0306 Quectel Wireless Solutions Co., Ltd.
EG06/EP06/EM06 LTE-A modem
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 3.00
bDeviceClass 239 Miscellaneous Device
bDeviceSubClass 2 ?
bDeviceProtocol 1 Interface Association
bMaxPacketSize0 9
idVendor 0x2c7c Quectel Wireless Solutions Co., Ltd.
idProduct 0x0306 EG06/EP06/EM06 LTE-A modem
bcdDevice 3.10
iManufacturer 1 Quectel
iProduct 2 LTE-A Module
iSerial 3 c1494706
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 328
bNumInterfaces 6
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xa0
(Bus Powered)
Remote Wakeup
MaxPower 126mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x01 EP 1 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 0
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 0
bInterfaceProtocol 0
iInterface 0
** UNRECOGNIZED: 05 24 00 10 01
** UNRECOGNIZED: 05 24 01 00 00
** UNRECOGNIZED: 04 24 02 02
** UNRECOGNIZED: 05 24 06 00 00
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x83 EP 3 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x000a 1x 10 bytes
bInterval 9
bMaxBurst 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x82 EP 2 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02 EP 2 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 0
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 2
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 0
bInterfaceProtocol 0
iInterface 0
** UNRECOGNIZED: 05 24 00 10 01
** UNRECOGNIZED: 05 24 01 00 00
** UNRECOGNIZED: 04 24 02 02
** UNRECOGNIZED: 05 24 06 00 00
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x85 EP 5 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x000a 1x 10 bytes
bInterval 9
bMaxBurst 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x84 EP 4 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x03 EP 3 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 0
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 3
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 0
bInterfaceProtocol 0
iInterface 0
** UNRECOGNIZED: 05 24 00 10 01
** UNRECOGNIZED: 05 24 01 00 00
** UNRECOGNIZED: 04 24 02 02
** UNRECOGNIZED: 05 24 06 00 00
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x87 EP 7 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x000a 1x 10 bytes
bInterval 9
bMaxBurst 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x86 EP 6 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x04 EP 4 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 0
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 4
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x89 EP 9 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0008 1x 8 bytes
bInterval 9
bMaxBurst 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x88 EP 8 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x05 EP 5 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 0
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 5
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 66
bInterfaceProtocol 1
iInterface 4 ADB Interface
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x06 EP 6 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x8a EP 10 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 0
Binary Object Store Descriptor:
bLength 5
bDescriptorType 15
wTotalLength 22
bNumDeviceCaps 2
USB 2.0 Extension Device Capability:
bLength 7
bDescriptorType 16
bDevCapabilityType 2
bmAttributes 0x00000002
Link Power Management (LPM) Supported
SuperSpeed USB Device Capability:
bLength 10
bDescriptorType 16
bDevCapabilityType 3
bmAttributes 0x00
wSpeedsSupported 0x000f
Device can operate at Low Speed (1Mbps)
Device can operate at Full Speed (12Mbps)
Device can operate at High Speed (480Mbps)
Device can operate at SuperSpeed (5Gbps)
bFunctionalitySupport 1
Lowest fully-functional device speed is Full Speed (12Mbps)
bU1DevExitLat 1 micro seconds
bU2DevExitLat 500 micro seconds
Device Status: 0x0000
(Bus Powered)
Order of interfaces is diag, nmea, at, ppp, qmi, adb.
BR,
Kristian
^ permalink raw reply
* Re: [PATCH net-next, net v2] net/tls: Set count of SG entries if sk_alloc_sg returns -ENOSPC
From: David Miller @ 2018-09-09 15:10 UTC (permalink / raw)
To: vakul.garg; +Cc: netdev, borisp, aviadye, davejwatson, doronrk
In-Reply-To: <20180906161140.14710-1-vakul.garg@nxp.com>
From: Vakul Garg <vakul.garg@nxp.com>
Date: Thu, 6 Sep 2018 21:41:40 +0530
> tls_sw_sendmsg() allocates plaintext and encrypted SG entries using
> function sk_alloc_sg(). In case the number of SG entries hit
> MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for
> current SG index to '0'. This leads to calling of function
> tls_push_record() with 'sg_encrypted_num_elem = 0' and later causes
> kernel crash. To fix this, set the number of SG elements to the number
> of elements in plaintext/encrypted SG arrays in case sk_alloc_sg()
> returns -ENOSPC.
>
> Fixes: 3c4d7559159b ("tls: kernel TLS support")
> Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Applied to 'net' and queued up for -stable.
^ permalink raw reply
* Re: Containers and checkpoint/restart micro-conference at LPC2018
From: James Bottomley @ 2018-09-09 15:30 UTC (permalink / raw)
To: Stéphane Graber, lxc-devel, lxc-users, containers,
linux-security-module, linux-fsdevel, netdev
In-Reply-To: <20180813161015.GD18492@castiana>
[-- Attachment #1: Type: text/plain, Size: 380 bytes --]
On Mon, 2018-08-13 at 12:10 -0400, Stéphane Graber wrote:
> Details can be found here:
>
> https://discuss.linuxcontainers.org/t/containers-micro-conference-a
> t-linux-plumbers-2018/2417
This website was giving a 503 error when I looked.
However, if you want a discussion on the requirements for shiftfs (and
whether we still need it), I'm up for that.
James
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply
* Re: [PATCH net 00/13] cls_u32 cleanups and fixes.
From: Al Viro @ 2018-09-09 15:48 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: netdev, Cong Wang, Jiri Pirko
In-Reply-To: <20180909141538.GG19965@ZenIV.linux.org.uk>
On Sun, Sep 09, 2018 at 03:15:38PM +0100, Al Viro wrote:
> Umm... Interesting - TCA_U32_SEL is not the only thing that
> gets ignored there; TCA_U32_MARK gets the same treatment.
> And then there's a lovely question what to do with n->pf -
> it's an array of n->sel.nkeys counters, and apparently we
> want (at least in common cases) to avoid resetting those.
>
> *If* we declare that ->nkeys mismatch means failure, it's
> all relatively easy to implement. Alternatively, we could
> declare that selector change means resetting the stats.
> Preferences?
BTW, shouldn't we issue u32_clear_hw_hnode() every time
we destroy an hnode? It's done on u32_delete(), it's
done (for root ht) on u32_destroy(), but it's not done
for any other hnodes when you remove the entire (not shared)
filter. Looks fishy...
^ permalink raw reply
* Re: [PATCH] [RFC v2] Drop all 00-INDEX files from Documentation/
From: Jonathan Corbet @ 2018-09-09 21:12 UTC (permalink / raw)
To: Henrik Austad
Cc: Mark Rutland, linux-mips, linux-fbdev, x86, kvm, linux-doc,
Peter Zijlstra, James Hogan, Will Deacon, dri-devel,
Masahiro Yamada, Jan Kandziora, Paul Mackerras, Henrik Austad,
Pavel Machek, H. Peter Anvin, Evgeniy Polyakov, linux-s390,
Ian Kent, linux-security-module, Paul Moore, Michael Ellerman,
Helge Deller, Radim Krčmář, James E.J. Bottomley,
Ingo Molnar
In-Reply-To: <1536012923-16275-1-git-send-email-henrik@austad.us>
On Tue, 4 Sep 2018 00:15:23 +0200
Henrik Austad <henrik@austad.us> wrote:
> I don't really have an opinion to whether or not we /should/ have 00-INDEX,
> but the above 00-INDEX should either be removed or be kept up to date. If
> we should keep the files, I can try to keep them updated, but I rather not
> if we just want to delete them anyway.
>
> As a starting point, remove all index-files and references to 00-INDEX and
> see where the discussion is going.
Applied, thanks.
jon
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply
* Re: [PATCH net V2 0/7] bug fixes for ENA Ethernet driver
From: David Miller @ 2018-09-09 16:41 UTC (permalink / raw)
To: netanel
Cc: netdev, dwmw, zorik, matua, saeedb, msw, aliguori, nafea, evgenys,
gtzalik
In-Reply-To: <20180909081526.83979-1-netanel@amazon.com>
From: <netanel@amazon.com>
Date: Sun, 9 Sep 2018 08:15:19 +0000
> From: Netanel Belgazal <netanel@amazon.com>
>
> Netanel Belgazal (7):
> net: ena: fix surprise unplug NULL dereference kernel crash
> net: ena: fix driver when PAGE_SIZE == 64kB
> net: ena: fix device destruction to gracefully free resources
> net: ena: fix potential double ena_destroy_device()
> net: ena: fix missing lock during device destruction
> net: ena: fix missing calls to READ_ONCE
> net: ena: fix incorrect usage of memory barriers
Series applied, thanks.
^ permalink raw reply
* Re: [PATCH] r8169: inform about CLKRUN protocol issue when behind a CardBus bridge
From: Maciej S. Szmigiero @ 2018-09-09 22:01 UTC (permalink / raw)
To: David Miller; +Cc: nic_swsd, netdev, linux-kernel
In-Reply-To: <20180909.080907.992434701890231030.davem@davemloft.net>
On 09.09.2018 17:09, David Miller wrote:
> From: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
> Date: Thu, 6 Sep 2018 18:10:53 +0200
>
>> It turns out that at least some r8169 CardBus cards don't operate correctly
>> when CLKRUN protocol is enabled - the symptoms are recurring timeouts
>> during PHY reads / writes and a very high packet drop rate.
>> This is true of at least RTL8169sc/8110sc (XID 18000000) chip in
>> Sunrich C-160 CardBus NIC.
>>
>> Such behavior was observed on two separate laptops, the first one has
>> TI PCIxx12 CardBus bridge, while the second one has Ricoh RL5c476II.
>>
>> Setting CLKRUN_En bit in CONFIG 3 register via an EEPROM write didn't
>> improve things in either case (this is probably why it wasn't set by the
>> card manufacturer).
>> The only way to fix the issue was to disable the CLKRUN protocol either
>> in the CardBus bridge (only possible in the TI one) or in the southbridge.
>>
>> Since the problem takes some time to debug let's warn people that have
>> the suspect configuration (Conventional PCI r8169 NIC behind a CardBus
>> bridge) so they know what they can do if they encounter it.
>>
>> Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
>
> I don't know about this.
>
> Barking at the user in the kernel log about an obscure knob (which btw
> doesn't exist for all cardbus bridges without other patches you are
> posting elsewhere) is rarely effective.
>
> We should just disable clkrun automatically we know it causes problems.
Unfortunately, as you wrote above, this workaround is only available on
TI CardBus bridges (and I hope will be available for two Ricoh ones soon,
too), while for other CardBus bridges it is either not implemented or
not available at all.
So we can't reliably just turn it on automatically when needed.
BTW, it seems that my RTL8169 card isn't the only model affected.
In fact, the original CLKRUN protocol disabling workaround on TI bridges
was implemented in 2005 because somebody's RTL8139 also had this
problem: https://lkml.org/lkml/2005/2/5/129
The main reason I wanted to add this warning is to save people time
debugging this issue, as it is rather unobvious.
But if this solution is unacceptable then I hope at least this
description will pop out in search results when searching for some
related keywords.
Thanks,
Maciej
^ permalink raw reply
* Re: Containers and checkpoint/restart micro-conference at LPC2018
From: Steve French @ 2018-09-09 17:38 UTC (permalink / raw)
To: James Bottomley
Cc: stgraber, containers, linux-security-module, linux-fsdevel,
Network Development
In-Reply-To: <1536507039.3192.5.camel@HansenPartnership.com>
On Sun, Sep 9, 2018 at 10:32 AM James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> On Mon, 2018-08-13 at 12:10 -0400, Stéphane Graber wrote:
> > Details can be found here:
> >
> > https://discuss.linuxcontainers.org/t/containers-micro-conference-a
> > t-linux-plumbers-2018/2417
>
> This website was giving a 503 error when I looked.
>
> However, if you want a discussion on the requirements for shiftfs (and
> whether we still need it), I'm up for that.
I also put in a proposal for a fs miniconference if we have any file
system developers that will be at plumbers
--
Thanks,
Steve
^ permalink raw reply
* [PATCH net V2] ip: frags: fix crash in ip_do_fragment()
From: Taehee Yoo @ 2018-09-09 17:47 UTC (permalink / raw)
To: davem, posk, netdev; +Cc: ap420073, pablo, fw, edumazet
A kernel crash occurrs when defragmented packet is fragmented
in ip_do_fragment().
In defragment routine, skb_orphan() is called and
skb->ip_defrag_offset is set. but skb->sk and
skb->ip_defrag_offset are same union member. so that
frag->sk is not NULL.
Hence crash occurrs in skb->sk check routine in ip_do_fragment() when
defragmented packet is fragmented.
test commands:
%iptables -t nat -I POSTROUTING -j MASQUERADE
%hping3 192.168.4.2 -s 1000 -p 2000 -d 60000
splat looks like:
[ 261.069429] kernel BUG at net/ipv4/ip_output.c:636!
[ 261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
[ 261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
[ 261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
[ 261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
[ 261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
[ 261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
[ 261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
[ 261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
[ 261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
[ 261.174169] FS: 00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
[ 261.183012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
[ 261.198158] Call Trace:
[ 261.199018] ? dst_output+0x180/0x180
[ 261.205011] ? save_trace+0x300/0x300
[ 261.209018] ? ip_copy_metadata+0xb00/0xb00
[ 261.213034] ? sched_clock_local+0xd4/0x140
[ 261.218158] ? kill_l4proto+0x120/0x120 [nf_conntrack]
[ 261.223014] ? rt_cpu_seq_stop+0x10/0x10
[ 261.227014] ? find_held_lock+0x39/0x1c0
[ 261.233008] ip_finish_output+0x51d/0xb50
[ 261.237006] ? ip_fragment.constprop.56+0x220/0x220
[ 261.243011] ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
[ 261.250152] ? rcu_is_watching+0x77/0x120
[ 261.255010] ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
[ 261.261033] ? nf_hook_slow+0xb1/0x160
[ 261.265007] ip_output+0x1c7/0x710
[ 261.269005] ? ip_mc_output+0x13f0/0x13f0
[ 261.273002] ? __local_bh_enable_ip+0xe9/0x1b0
[ 261.278152] ? ip_fragment.constprop.56+0x220/0x220
[ 261.282996] ? nf_hook_slow+0xb1/0x160
[ 261.287007] raw_sendmsg+0x21f9/0x4420
[ 261.291008] ? dst_output+0x180/0x180
[ 261.297003] ? sched_clock_cpu+0x126/0x170
[ 261.301003] ? find_held_lock+0x39/0x1c0
[ 261.306155] ? stop_critical_timings+0x420/0x420
[ 261.311004] ? check_flags.part.36+0x450/0x450
[ 261.315005] ? _raw_spin_unlock_irq+0x29/0x40
[ 261.320995] ? _raw_spin_unlock_irq+0x29/0x40
[ 261.326142] ? cyc2ns_read_end+0x10/0x10
[ 261.330139] ? raw_bind+0x280/0x280
[ 261.334138] ? sched_clock_cpu+0x126/0x170
[ 261.338995] ? check_flags.part.36+0x450/0x450
[ 261.342991] ? __lock_acquire+0x4500/0x4500
[ 261.348994] ? inet_sendmsg+0x11c/0x500
[ 261.352989] ? dst_output+0x180/0x180
[ 261.357012] inet_sendmsg+0x11c/0x500
[ ... ]
v2:
- clear skb->sk at reassembly routine.(Eric Dumarzet)
Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---
net/ipv4/ip_fragment.c | 1 +
net/ipv6/netfilter/nf_conntrack_reasm.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 88281fbce88c..e7227128df2c 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -599,6 +599,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
nextp = &fp->next;
fp->prev = NULL;
memset(&fp->rbnode, 0, sizeof(fp->rbnode));
+ fp->sk = NULL;
head->data_len += fp->len;
head->len += fp->len;
if (head->ip_summed != fp->ip_summed)
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 2a14d8b65924..8f68a518d9db 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -445,6 +445,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *prev, struct net_devic
else if (head->ip_summed == CHECKSUM_COMPLETE)
head->csum = csum_add(head->csum, fp->csum);
head->truesize += fp->truesize;
+ fp->sk = NULL;
}
sub_frag_mem_limit(fq->q.net, head->truesize);
--
2.17.1
^ permalink raw reply related
* Re: [PATCH v4 3/3] IB/ipoib: Log sysfs 'dev_id' accesses from userspace
From: Arseny Maslennikov @ 2018-09-09 18:11 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: linux-rdma, Doug Ledford, netdev
In-Reply-To: <20180907154359.GA14627@ziepe.ca>
[-- Attachment #1: Type: text/plain, Size: 3223 bytes --]
On Fri, Sep 07, 2018 at 09:43:59AM -0600, Jason Gunthorpe wrote:
> On Thu, Sep 06, 2018 at 05:51:12PM +0300, Arseny Maslennikov wrote:
> > Some tools may currently be using only the deprecated attribute;
> > let's print an elaborate and clear deprecation notice to kmsg.
> >
> > To do that, we have to replace the whole sysfs file, since we inherit
> > the original one from netdev.
> >
> > Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru>
> > drivers/infiniband/ulp/ipoib/ipoib_main.c | 31 +++++++++++++++++++++++
> > 1 file changed, 31 insertions(+)
> >
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > index 30f840f874b3..74732726ec6f 100644
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > @@ -2386,6 +2386,35 @@ int ipoib_add_pkey_attr(struct net_device *dev)
> > return device_create_file(&dev->dev, &dev_attr_pkey);
> > }
> >
> > +/*
> > + * We erroneously exposed the iface's port number in the dev_id
> > + * sysfs field long after dev_port was introduced for that purpose[1],
> > + * and we need to stop everyone from relying on that.
> > + * Let's overload the shower routine for the dev_id file here
> > + * to gently bring the issue up.
> > + *
> > + * [1] https://www.spinics.net/lists/netdev/msg272123.html
> > + */
> > +static ssize_t dev_id_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct net_device *ndev = to_net_dev(dev);
> > +
> > + if (ndev->dev_id == ndev->dev_port)
> > + netdev_info_once(ndev,
> > + "\"%s\" wants to know my dev_id. Should it look at dev_port instead? See Documentation/ABI/testing/sysfs-class-net for more info.\n",
> > + current->comm);
> > +
> > + return sprintf(buf, "%#x\n", ndev->dev_id);
> > +}
> > +static DEVICE_ATTR_RO(dev_id);
> > +
> > +int ipoib_intercept_dev_id_attr(struct net_device *dev)
> > +{
> > + device_remove_file(&dev->dev, &dev_attr_dev_id);
> > + return device_create_file(&dev->dev, &dev_attr_dev_id);
> > +}
>
> Isn't this racey with userspace? Ie what happens if udev is querying
> the dev_id right here?
udev in particular does not use dev_id at all since 2014, because "why
would we keep using dev_id if it is not the right thing to use?".
>
> Do we know there is no userspace doing this?
>
Not for sure.
If we move all the sysfs handling stuff we introduce in _add_port():
- pkey
- umcast
- {create,delete}_child
- connected/datagram mode
to _ndo_init(), which is called by register_netdev before it sends
the netlink message, would that suffice to eliminate the race?
(Sysfs files for {create,delete}_child go to _parent_init() then).
> > static struct net_device *ipoib_add_port(const char *format,
> > struct ib_device *hca, u8 port)
> > {
> > @@ -2427,6 +2456,8 @@ static struct net_device *ipoib_add_port(const char *format,
> > */
> > ndev->priv_destructor = ipoib_intf_free;
> >
> > + if (ipoib_intercept_dev_id_attr(ndev))
> > + goto sysfs_failed;
>
> No device_remove_file needed?
>
As far as I understand, unregister_netdevice takes care of it
while destroying the related kobject.
> Jason
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH net V2] ip: frags: fix crash in ip_do_fragment()
From: Eric Dumazet @ 2018-09-09 18:24 UTC (permalink / raw)
To: Taehee Yoo
Cc: David Miller, Peter Oskolkov, netdev, Pablo Neira Ayuso,
Florian Westphal
In-Reply-To: <20180909174705.3677-1-ap420073@gmail.com>
On Sun, Sep 9, 2018 at 10:47 AM Taehee Yoo <ap420073@gmail.com> wrote:
>
> A kernel crash occurrs when defragmented packet is fragmented
> in ip_do_fragment().
> In defragment routine, skb_orphan() is called and
> skb->ip_defrag_offset is set. but skb->sk and
> skb->ip_defrag_offset are same union member. so that
> frag->sk is not NULL.
> Hence crash occurrs in skb->sk check routine in ip_do_fragment() when
> defragmented packet is fragmented.
>
> test commands:
> %iptables -t nat -I POSTROUTING -j MASQUERADE
> %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000
>
> splat looks like:
>
> v2:
> - clear skb->sk at reassembly routine.(Eric Dumarzet)
>
> Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.")
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> ---
> net/ipv4/ip_fragment.c | 1 +
> net/ipv6/netfilter/nf_conntrack_reasm.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index 88281fbce88c..e7227128df2c 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -599,6 +599,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
> nextp = &fp->next;
> fp->prev = NULL;
> memset(&fp->rbnode, 0, sizeof(fp->rbnode));
> + fp->sk = NULL;
> head->data_len += fp->len;
> head->len += fp->len;
> if (head->ip_summed != fp->ip_summed)
> diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
> index 2a14d8b65924..8f68a518d9db 100644
> --- a/net/ipv6/netfilter/nf_conntrack_reasm.c
> +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
> @@ -445,6 +445,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *prev, struct net_devic
> else if (head->ip_summed == CHECKSUM_COMPLETE)
> head->csum = csum_add(head->csum, fp->csum);
> head->truesize += fp->truesize;
> + fp->sk = NULL;
This is not needed. IPv6 paths were not changed by recent commits.
> }
> sub_frag_mem_limit(fq->q.net, head->truesize);
>
> --
> 2.17.1
>
^ permalink raw reply
* Re: [PATCH net V2] ip: frags: fix crash in ip_do_fragment()
From: Taehee Yoo @ 2018-09-09 18:54 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Miller, Peter Oskolkov, netdev, Pablo Neira Ayuso,
Florian Westphal
In-Reply-To: <CANn89i+pBEuH-dg=f+3rxiheQyjJfg2=SqgoRmB4S7VEaVgeow@mail.gmail.com>
2018-09-10 3:24 GMT+09:00 Eric Dumazet <edumazet@google.com>:
> On Sun, Sep 9, 2018 at 10:47 AM Taehee Yoo <ap420073@gmail.com> wrote:
>>
>> A kernel crash occurrs when defragmented packet is fragmented
>> in ip_do_fragment().
>> In defragment routine, skb_orphan() is called and
>> skb->ip_defrag_offset is set. but skb->sk and
>> skb->ip_defrag_offset are same union member. so that
>> frag->sk is not NULL.
>> Hence crash occurrs in skb->sk check routine in ip_do_fragment() when
>> defragmented packet is fragmented.
>>
>> test commands:
>> %iptables -t nat -I POSTROUTING -j MASQUERADE
>> %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000
>>
>> splat looks like:
>>
>> v2:
>> - clear skb->sk at reassembly routine.(Eric Dumarzet)
>>
>> Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.")
>> Suggested-by: Eric Dumazet <edumazet@google.com>
>> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
>> ---
>> net/ipv4/ip_fragment.c | 1 +
>> net/ipv6/netfilter/nf_conntrack_reasm.c | 1 +
>> 2 files changed, 2 insertions(+)
>>
>> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
>> index 88281fbce88c..e7227128df2c 100644
>> --- a/net/ipv4/ip_fragment.c
>> +++ b/net/ipv4/ip_fragment.c
>> @@ -599,6 +599,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
>> nextp = &fp->next;
>> fp->prev = NULL;
>> memset(&fp->rbnode, 0, sizeof(fp->rbnode));
>> + fp->sk = NULL;
>> head->data_len += fp->len;
>> head->len += fp->len;
>> if (head->ip_summed != fp->ip_summed)
>> diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
>> index 2a14d8b65924..8f68a518d9db 100644
>> --- a/net/ipv6/netfilter/nf_conntrack_reasm.c
>> +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
>> @@ -445,6 +445,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *prev, struct net_devic
>> else if (head->ip_summed == CHECKSUM_COMPLETE)
>> head->csum = csum_add(head->csum, fp->csum);
>> head->truesize += fp->truesize;
>> + fp->sk = NULL;
>
> This is not needed. IPv6 paths were not changed by recent commits.
>
>> }
>> sub_frag_mem_limit(fq->q.net, head->truesize);
>>
>> --
>> 2.17.1
>>
Hi Eric,
Thank you for review!
I think netfilter side ipv6 code change is needed
because netfilter ipv6 defrag routine also set fp->ip_defrag_offset value
so that fp->sk will not be NULL.
And I think these crash in ip_do_fragment() and ip6_fragment() are
actually same bug.
My ipv6 test environment is below
PC1<----------------->FW<----------------->PC2
cd::02/64 cd::01/64 ab::01/64 ab::02/64
FW command:
%ip6tables -t nat -I POSTROUTING -j MASQUERADE
PC2 command:
%ping6 cd::02 -s 60000
FW crash message:
[ 502.676552] kernel BUG at net/ipv6/ip6_output.c:658!
[ 502.682641] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 502.683545] CPU: 1 PID: 17 Comm: ksoftirqd/1 Not tainted 4.19.0-rc2+ #12
[ 502.692231] Hardware name: To be filled by O.E.M. To be filled by
O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[ 502.692231] RIP: 0010:ip6_fragment+0x16fa/0x3580
[ 502.692231] Code: 45 00 f8 0f 85 c3 03 00 00 49 8d 7f 18 48 89 f8
48 c1 e8 03 42 80 3c 28 00 0f 85 23 18 00 00 49 83 7f 18 00 0f 84 22
fe ff ff <0f> 0b 48 85 c0 0f 85 43 02 00 00 49 83 e6 fe 48 b8 00 00 00
00 00
[ 502.692231] RSP: 0018:ffff88011561eb58 EFLAGS: 00010202
[ 502.692231] RAX: 1ffff1002142e37b RBX: ffff8801062c16c0 RCX: 0000000000000004
[ 502.692231] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88010a171bd8
[ 502.692231] RBP: ffffed0022ac3d89 R08: ffffed002142e395 R09: ffffed002142e395
[ 502.692231] R10: 0000000000000001 R11: ffffed002142e394 R12: 0000000000000028
[ 502.692231] R13: dffffc0000000000 R14: ffff88010a171ca4 R15: ffff88010a171bc0
[ 502.692231] FS: 0000000000000000(0000) GS:ffff880116600000(0000)
knlGS:0000000000000000
[ 502.692231] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 502.692231] CR2: 0000000002452c64 CR3: 0000000105fea000 CR4: 00000000001006e0
[ 502.692231] Call Trace:
[ 502.692231] ? ip6_forward_finish+0x370/0x370
[ 502.692231] ? ip6_forward+0x34e0/0x34e0
[ 502.813859] ? ip6_mtu+0x24d/0x310
[ 502.813859] ? ip6_negative_advice+0x160/0x160
[ 502.813859] ? rcu_is_watching+0x77/0x120
[ 502.813859] ? nf_nat_ipv6_out+0x206/0x4d0 [nf_nat_ipv6]
[ 502.813859] ? ip6_finish_output+0x35e/0x920
[ 502.813859] ip6_output+0x1e5/0x870
[ 502.813859] ? ip6_finish_output+0x920/0x920
[ 502.813859] ? __lock_acquire+0x4500/0x4500
[ 502.813859] ? __nf_nat_alloc_null_binding+0x20c/0x340 [nf_nat]
[ 502.813859] ? ip6_fragment+0x3580/0x3580
[ 502.813859] ip6_forward+0x1328/0x34e0
[ 502.813859] ? nf_nat_inet_fn+0x446/0x970 [nf_nat]
[ 502.813859] ? ip6_autoflowlabel+0xb0/0xb0
[ 502.813859] ? ip6_route_input+0x639/0xc80
[ 502.813859] ? save_trace+0x300/0x300
[ 502.813859] ? ip6_route_info_create+0x35e0/0x35e0
[ 502.813859] ? ip6_rcv_finish_core.isra.10+0x152/0x520
[ 502.813859] ? nf_hook.constprop.19+0x780/0x780
[ 502.813859] ? nf_nat_ipv6_out+0x4d0/0x4d0 [nf_nat_ipv6]
[ 502.813859] ? ipv6_rcv+0x40b/0x500
[ 502.813859] ? ip6_autoflowlabel+0xb0/0xb0
[ 502.813859] ipv6_rcv+0x40b/0x500
[ 502.813859] ? ip6_rcv_core.isra.14+0x1dd0/0x1dd0
[ 502.813859] ? lock_acquire+0x196/0x470
[ 502.813859] ? ip6_rcv_finish_core.isra.10+0x520/0x520
[ 502.813859] ? ip6_rcv_core.isra.14+0x1dd0/0x1dd0
[ 502.813859] __netif_receive_skb_one_core+0x115/0x1a0
[ 502.813859] ? rcu_is_watching+0x77/0x120
[ 502.813859] ? __netif_receive_skb_core+0x2ac0/0x2ac0
[ 502.813859] ? _rcu_barrier_trace+0x400/0x400
[ 502.813859] ? rcu_pm_notify+0xc0/0xc0
[ 502.813859] netif_receive_skb_internal+0xd8/0x570
[ 502.813859] ? dev_cpu_dead+0x980/0x980
[ 502.813859] ? rcu_read_lock_sched_held+0x114/0x130
[ 502.813859] ? kmem_cache_alloc+0x1ea/0x280
[ 502.813859] ? memset+0x1f/0x40
[ 502.813859] ? rcu_pm_notify+0xc0/0xc0
[ 502.813859] napi_gro_receive+0x344/0x410
[ 502.813859] ? dev_gro_receive+0x29b0/0x29b0
[ 502.813859] ? build_skb+0x63/0x2b0
[ 502.813859] ? eth_gro_receive+0x8f0/0x8f0
[ 502.813859] ? __build_skb+0x3b0/0x3b0
[ 502.813859] igb_poll+0x16e1/0x4650
[ 502.813859] ? igb_alloc_rx_buffers+0xa80/0xa80
[ 502.813859] ? sched_clock_cpu+0x126/0x170
[ 502.813859] ? stop_critical_timings+0x420/0x420
[ 502.813859] ? net_rx_action+0x3f0/0x1440
[ 502.813859] ? net_rx_action+0x3f0/0x1440
[ 502.813859] ? rcu_pm_notify+0xc0/0xc0
[ 502.813859] net_rx_action+0x5db/0x1440
[ 502.813859] ? _raw_spin_unlock+0x24/0x30
[ 502.813859] ? napi_complete_done+0x4c0/0x4c0
[ 502.813859] ? pfifo_fast_peek+0x1c0/0x1c0
[ 502.813859] ? cyc2ns_read_end+0x10/0x10
[ 502.813859] ? save_trace+0x300/0x300
[ 502.813859] ? save_trace+0x300/0x300
[ 502.813859] ? sched_clock_cpu+0x126/0x170
[ 502.813859] ? find_held_lock+0x39/0x1c0
[ 502.813859] ? lock_acquire+0x196/0x470
[ 502.813859] ? check_flags.part.36+0x450/0x450
[ 502.813859] ? check_flags.part.36+0x450/0x450
[ 502.813859] ? __run_timers+0x6ff/0x990
[ 502.813859] ? do_raw_spin_unlock+0xa5/0x330
[ 502.813859] ? do_raw_spin_trylock+0x1a0/0x1a0
[ 502.813859] ? do_raw_spin_trylock+0x101/0x1a0
[ 502.813859] ? _raw_spin_unlock+0x24/0x30
[ 502.813859] ? net_tx_action+0x6cb/0xad0
[ 502.813859] ? dev_queue_xmit_nit+0xe70/0xe70
[ 502.813859] ? _raw_spin_unlock_irq+0x31/0x40
[ 502.813859] ? stop_critical_timings+0x420/0x420
[ 502.813859] ? finish_task_switch+0x183/0x740
[ 502.813859] ? __switch_to_asm+0x30/0x60
[ 502.813859] ? __switch_to_asm+0x24/0x60
[ 502.813859] ? __do_softirq+0x263/0xa11
[ 502.813859] ? __do_softirq+0x263/0xa11
[ 502.813859] ? rcu_pm_notify+0xc0/0xc0
[ 502.813859] __do_softirq+0x2a5/0xa11
[ 502.813859] ? __sched_text_start+0x8/0x8
[ 502.813859] ? __irqentry_text_end+0x1fa51c/0x1fa51c
[ 502.813859] ? smpboot_thread_fn+0xd5/0x700
[ 502.813859] ? tracer_hardirqs_on+0x420/0x420
[ 502.813859] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 502.813859] ? run_ksoftirqd+0xb/0x50
[ 502.813859] ? trace_hardirqs_off+0x6b/0x210
[ 502.813859] ? trace_hardirqs_on_caller+0x210/0x210
[ 502.813859] ? takeover_tasklets+0x860/0x860
[ 502.813859] ? takeover_tasklets+0x860/0x860
[ 502.813859] run_ksoftirqd+0x24/0x50
[ 502.813859] smpboot_thread_fn+0x3cd/0x700
[ 502.813859] ? sort_range+0x20/0x20
[ 502.813859] ? __kthread_parkme+0xb6/0x180
[ 502.813859] ? sort_range+0x20/0x20
[ 502.813859] kthread+0x322/0x3e0
[ 502.813859] ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 502.813859] ret_from_fork+0x3a/0x50
[ 502.813859] Modules linked in: ip6t_MASQUERADE ip6table_nat
nf_nat_ipv6 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
ip6_tables ip_tables x_tables
[ 503.279587] ---[ end trace 49127255558e40c3 ]---
[ 503.284881] RIP: 0010:ip6_fragment+0x16fa/0x3580
[ 503.290150] Code: 45 00 f8 0f 85 c3 03 00 00 49 8d 7f 18 48 89 f8
48 c1 e8 03 42 80 3c 28 00 0f 85 23 18 00 00 49 83 7f 18 00 0f 84 22
fe ff ff <0f> 0b 48 85 c0 0f 85 43 02 00 00 49 83 e6 fe 48 b8 00 00 00
00 00
[ 503.311356] RSP: 0018:ffff88011561eb58 EFLAGS: 00010202
[ 503.317328] RAX: 1ffff1002142e37b RBX: ffff8801062c16c0 RCX: 0000000000000004
[ 503.325438] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88010a171bd8
[ 503.333574] RBP: ffffed0022ac3d89 R08: ffffed002142e395 R09: ffffed002142e395
[ 503.341686] R10: 0000000000000001 R11: ffffed002142e394 R12: 0000000000000028
[ 503.349803] R13: dffffc0000000000 R14: ffff88010a171ca4 R15: ffff88010a171bc0
[ 503.357916] FS: 0000000000000000(0000) GS:ffff880116600000(0000)
knlGS:0000000000000000
[ 503.367102] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 503.373642] CR2: 0000000002452c64 CR3: 0000000105fea000 CR4: 00000000001006e0
[ 503.381803] Kernel panic - not syncing: Fatal exception in interrupt
[ 503.382721] Kernel Offset: 0xa000000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 503.382721] Rebooting in 5 seconds..
^ permalink raw reply
* [PATCH v2 net] MIPS: lantiq: dma: add dev pointer
From: Hauke Mehrtens @ 2018-09-09 19:26 UTC (permalink / raw)
To: davem
Cc: netdev, andrew, f.fainelli, john, linux-mips, hauke.mehrtens,
paul.burton, Hauke Mehrtens
dma_zalloc_coherent() now crashes if no dev pointer is given.
Add a dev pointer to the ltq_dma_channel structure and fill it in the
driver using it.
This fixes a bug introduced in kernel 4.19.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
---
no changes since v1.
This should go into kernel 4.19 and I have some other patches adding new
features for kernel 4.20 which are depending on this, so I would prefer
if this goes through the net tree.
arch/mips/include/asm/mach-lantiq/xway/xway_dma.h | 1 +
arch/mips/lantiq/xway/dma.c | 4 ++--
drivers/net/ethernet/lantiq_etop.c | 1 +
3 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/mips/include/asm/mach-lantiq/xway/xway_dma.h b/arch/mips/include/asm/mach-lantiq/xway/xway_dma.h
index 4901833498f7..8441b2698e64 100644
--- a/arch/mips/include/asm/mach-lantiq/xway/xway_dma.h
+++ b/arch/mips/include/asm/mach-lantiq/xway/xway_dma.h
@@ -40,6 +40,7 @@ struct ltq_dma_channel {
int desc; /* the current descriptor */
struct ltq_dma_desc *desc_base; /* the descriptor base */
int phys; /* physical addr */
+ struct device *dev;
};
enum {
diff --git a/arch/mips/lantiq/xway/dma.c b/arch/mips/lantiq/xway/dma.c
index 4b9fbb6744ad..664f2f7f55c1 100644
--- a/arch/mips/lantiq/xway/dma.c
+++ b/arch/mips/lantiq/xway/dma.c
@@ -130,7 +130,7 @@ ltq_dma_alloc(struct ltq_dma_channel *ch)
unsigned long flags;
ch->desc = 0;
- ch->desc_base = dma_zalloc_coherent(NULL,
+ ch->desc_base = dma_zalloc_coherent(ch->dev,
LTQ_DESC_NUM * LTQ_DESC_SIZE,
&ch->phys, GFP_ATOMIC);
@@ -182,7 +182,7 @@ ltq_dma_free(struct ltq_dma_channel *ch)
if (!ch->desc_base)
return;
ltq_dma_close(ch);
- dma_free_coherent(NULL, LTQ_DESC_NUM * LTQ_DESC_SIZE,
+ dma_free_coherent(ch->dev, LTQ_DESC_NUM * LTQ_DESC_SIZE,
ch->desc_base, ch->phys);
}
EXPORT_SYMBOL_GPL(ltq_dma_free);
diff --git a/drivers/net/ethernet/lantiq_etop.c b/drivers/net/ethernet/lantiq_etop.c
index 7a637b51c7d2..e08301d833e2 100644
--- a/drivers/net/ethernet/lantiq_etop.c
+++ b/drivers/net/ethernet/lantiq_etop.c
@@ -274,6 +274,7 @@ ltq_etop_hw_init(struct net_device *dev)
struct ltq_etop_chan *ch = &priv->ch[i];
ch->idx = ch->dma.nr = i;
+ ch->dma.dev = &priv->pdev->dev;
if (IS_TX(i)) {
ltq_dma_alloc_tx(&ch->dma);
--
2.11.0
^ permalink raw reply related
* [PATCH v3 net-next 0/6] Add support for Lantiq / Intel vrx200 network
From: Hauke Mehrtens @ 2018-09-09 20:16 UTC (permalink / raw)
To: davem
Cc: netdev, andrew, vivien.didelot, f.fainelli, john, linux-mips, dev,
hauke.mehrtens, devicetree, Hauke Mehrtens
This adds basic support for the GSWIP (Gigabit Switch) found in the
VRX200 SoC.
There are different versions of this IP core used in different SoCs, but
this driver was currently only tested on the VRX200 SoC line, for other
SoCs this driver probably need some adoptions to work.
I also plan to add Layer 2 offloading to the DSA driver and later also
layer 3 offloading which is supported by the PPE HW block.
All these patches should go through the net-next tree.
This depends on the patch "MIPS: lantiq: dma: add dev pointer" which
should go into 4.19.
Changes since:
v2:
* Send patch "MIPS: lantiq: dma: add dev pointer" separately
* all: removed return in register write functions
* switch: uses phylink
* switch: uses hardware MDIO auto polling
* switch: use usleep_range() in MDIO busy check
* switch: configure MDIO bus to 2.5 MHz
* switch: disable xMII link when it is not used
* Ethernet: use NAPI for TX cleanups
* Ethernet: enable clock in open callback
* Ethernet: improve skb allocation
* Ethernet: use net_dev->stats
v1:
* Add "MIPS: lantiq: dma: add dev pointer"
* checkpatch fixes a all patches
* Added binding documentation
* use readx_poll_timeout function and ETIMEOUT error code
* integrate GPHY firmware loading into DSA driver
* renamed to NET_DSA_LANTIQ_GSWIP
* removed some needed casts
* added of_device_id.data information about the detected switch
* fixed John's email address
Hauke Mehrtens (6):
MIPS: lantiq: Do not enable IRQs in dma open
net: dsa: Add Lantiq / Intel GSWIP tag support
dt-bindings: net: Add lantiq,xrx200-net DT bindings
net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver
dt-bindings: net: dsa: Add lantiq,xrx200-gswip DT bindings
net: dsa: Add Lantiq / Intel DSA driver for vrx200
.../devicetree/bindings/net/dsa/lantiq-gswip.txt | 141 +++
.../devicetree/bindings/net/lantiq,xrx200-net.txt | 21 +
MAINTAINERS | 9 +
arch/mips/lantiq/xway/dma.c | 1 -
arch/mips/lantiq/xway/sysctrl.c | 14 +-
drivers/net/dsa/Kconfig | 8 +
drivers/net/dsa/Makefile | 1 +
drivers/net/dsa/lantiq_gswip.c | 1169 ++++++++++++++++++++
drivers/net/dsa/lantiq_pce.h | 153 +++
drivers/net/ethernet/Kconfig | 7 +
drivers/net/ethernet/Makefile | 1 +
drivers/net/ethernet/lantiq_etop.c | 1 +
drivers/net/ethernet/lantiq_xrx200.c | 564 ++++++++++
include/net/dsa.h | 1 +
net/dsa/Kconfig | 3 +
net/dsa/Makefile | 1 +
net/dsa/dsa.c | 3 +
net/dsa/dsa_priv.h | 3 +
net/dsa/tag_gswip.c | 109 ++
19 files changed, 2202 insertions(+), 8 deletions(-)
create mode 100644 Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt
create mode 100644 Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt
create mode 100644 drivers/net/dsa/lantiq_gswip.c
create mode 100644 drivers/net/dsa/lantiq_pce.h
create mode 100644 drivers/net/ethernet/lantiq_xrx200.c
create mode 100644 net/dsa/tag_gswip.c
--
2.11.0
^ permalink raw reply
* [PATCH v3 net-next 1/6] MIPS: lantiq: Do not enable IRQs in dma open
From: Hauke Mehrtens @ 2018-09-09 20:16 UTC (permalink / raw)
To: davem
Cc: netdev, andrew, vivien.didelot, f.fainelli, john, linux-mips, dev,
hauke.mehrtens, devicetree, Hauke Mehrtens
In-Reply-To: <20180909201647.32727-1-hauke@hauke-m.de>
When a DMA channel is opened the IRQ should not get activated
automatically, this allows it to pull data out manually without the help
of interrupts. This is needed for a workaround in the vrx200 Ethernet
driver.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Paul Burton <paul.burton@mips.com>
---
arch/mips/lantiq/xway/dma.c | 1 -
drivers/net/ethernet/lantiq_etop.c | 1 +
2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/mips/lantiq/xway/dma.c b/arch/mips/lantiq/xway/dma.c
index 664f2f7f55c1..982859f2b2a3 100644
--- a/arch/mips/lantiq/xway/dma.c
+++ b/arch/mips/lantiq/xway/dma.c
@@ -106,7 +106,6 @@ ltq_dma_open(struct ltq_dma_channel *ch)
spin_lock_irqsave(<q_dma_lock, flag);
ltq_dma_w32(ch->nr, LTQ_DMA_CS);
ltq_dma_w32_mask(0, DMA_CHAN_ON, LTQ_DMA_CCTRL);
- ltq_dma_w32_mask(0, 1 << ch->nr, LTQ_DMA_IRNEN);
spin_unlock_irqrestore(<q_dma_lock, flag);
}
EXPORT_SYMBOL_GPL(ltq_dma_open);
diff --git a/drivers/net/ethernet/lantiq_etop.c b/drivers/net/ethernet/lantiq_etop.c
index e08301d833e2..379db19a303c 100644
--- a/drivers/net/ethernet/lantiq_etop.c
+++ b/drivers/net/ethernet/lantiq_etop.c
@@ -439,6 +439,7 @@ ltq_etop_open(struct net_device *dev)
if (!IS_TX(i) && (!IS_RX(i)))
continue;
ltq_dma_open(&ch->dma);
+ ltq_dma_enable_irq(&ch->dma);
napi_enable(&ch->napi);
}
phy_start(dev->phydev);
--
2.11.0
^ permalink raw reply related
* [PATCH v3 net-next 2/6] net: dsa: Add Lantiq / Intel GSWIP tag support
From: Hauke Mehrtens @ 2018-09-09 20:16 UTC (permalink / raw)
To: davem
Cc: netdev, andrew, vivien.didelot, f.fainelli, john, linux-mips, dev,
hauke.mehrtens, devicetree, Hauke Mehrtens
In-Reply-To: <20180909201647.32727-1-hauke@hauke-m.de>
This handles the tag added by the PMAC on the VRX200 SoC line.
The GSWIP uses internally a GSWIP special tag which is located after the
Ethernet header. The PMAC which connects the GSWIP to the CPU converts
this special tag used by the GSWIP into the PMAC special tag which is
added in front of the Ethernet header.
This was tested with GSWIP 2.1 found in the VRX200 SoCs, other GSWIP
versions use slightly different PMAC special tags.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
---
MAINTAINERS | 6 +++
include/net/dsa.h | 1 +
net/dsa/Kconfig | 3 ++
net/dsa/Makefile | 1 +
net/dsa/dsa.c | 3 ++
net/dsa/dsa_priv.h | 3 ++
net/dsa/tag_gswip.c | 109 ++++++++++++++++++++++++++++++++++++++++++++++++++++
7 files changed, 126 insertions(+)
create mode 100644 net/dsa/tag_gswip.c
diff --git a/MAINTAINERS b/MAINTAINERS
index a5b256b25905..4b2ee65f6086 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8166,6 +8166,12 @@ S: Maintained
F: net/l3mdev
F: include/net/l3mdev.h
+LANTIQ / INTEL Ethernet drivers
+M: Hauke Mehrtens <hauke@hauke-m.de>
+L: netdev@vger.kernel.org
+S: Maintained
+F: net/dsa/tag_gswip.c
+
LANTIQ MIPS ARCHITECTURE
M: John Crispin <john@phrozen.org>
L: linux-mips@linux-mips.org
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 461e8a7661b7..23690c44e167 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -35,6 +35,7 @@ enum dsa_tag_protocol {
DSA_TAG_PROTO_BRCM_PREPEND,
DSA_TAG_PROTO_DSA,
DSA_TAG_PROTO_EDSA,
+ DSA_TAG_PROTO_GSWIP,
DSA_TAG_PROTO_KSZ,
DSA_TAG_PROTO_LAN9303,
DSA_TAG_PROTO_MTK,
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index 4183e4ba27a5..48c41918fb35 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -38,6 +38,9 @@ config NET_DSA_TAG_DSA
config NET_DSA_TAG_EDSA
bool
+config NET_DSA_TAG_GSWIP
+ bool
+
config NET_DSA_TAG_KSZ
bool
diff --git a/net/dsa/Makefile b/net/dsa/Makefile
index 9e4d3536f977..6e721f7a2947 100644
--- a/net/dsa/Makefile
+++ b/net/dsa/Makefile
@@ -9,6 +9,7 @@ dsa_core-$(CONFIG_NET_DSA_TAG_BRCM) += tag_brcm.o
dsa_core-$(CONFIG_NET_DSA_TAG_BRCM_PREPEND) += tag_brcm.o
dsa_core-$(CONFIG_NET_DSA_TAG_DSA) += tag_dsa.o
dsa_core-$(CONFIG_NET_DSA_TAG_EDSA) += tag_edsa.o
+dsa_core-$(CONFIG_NET_DSA_TAG_GSWIP) += tag_gswip.o
dsa_core-$(CONFIG_NET_DSA_TAG_KSZ) += tag_ksz.o
dsa_core-$(CONFIG_NET_DSA_TAG_LAN9303) += tag_lan9303.o
dsa_core-$(CONFIG_NET_DSA_TAG_MTK) += tag_mtk.o
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index e63c554e0623..81212109c507 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -54,6 +54,9 @@ const struct dsa_device_ops *dsa_device_ops[DSA_TAG_LAST] = {
#ifdef CONFIG_NET_DSA_TAG_EDSA
[DSA_TAG_PROTO_EDSA] = &edsa_netdev_ops,
#endif
+#ifdef CONFIG_NET_DSA_TAG_GSWIP
+ [DSA_TAG_PROTO_GSWIP] = &gswip_netdev_ops,
+#endif
#ifdef CONFIG_NET_DSA_TAG_KSZ
[DSA_TAG_PROTO_KSZ] = &ksz_netdev_ops,
#endif
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 3964c6f7a7c0..824ca07a30aa 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -205,6 +205,9 @@ extern const struct dsa_device_ops dsa_netdev_ops;
/* tag_edsa.c */
extern const struct dsa_device_ops edsa_netdev_ops;
+/* tag_gswip.c */
+extern const struct dsa_device_ops gswip_netdev_ops;
+
/* tag_ksz.c */
extern const struct dsa_device_ops ksz_netdev_ops;
diff --git a/net/dsa/tag_gswip.c b/net/dsa/tag_gswip.c
new file mode 100644
index 000000000000..49e9b73f1be3
--- /dev/null
+++ b/net/dsa/tag_gswip.c
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Intel / Lantiq GSWIP V2.0 PMAC tag support
+ *
+ * Copyright (C) 2017 - 2018 Hauke Mehrtens <hauke@hauke-m.de>
+ */
+
+#include <linux/bitops.h>
+#include <linux/etherdevice.h>
+#include <linux/skbuff.h>
+#include <net/dsa.h>
+
+#include "dsa_priv.h"
+
+#define GSWIP_TX_HEADER_LEN 4
+
+/* special tag in TX path header */
+/* Byte 0 */
+#define GSWIP_TX_SLPID_SHIFT 0 /* source port ID */
+#define GSWIP_TX_SLPID_CPU 2
+#define GSWIP_TX_SLPID_APP1 3
+#define GSWIP_TX_SLPID_APP2 4
+#define GSWIP_TX_SLPID_APP3 5
+#define GSWIP_TX_SLPID_APP4 6
+#define GSWIP_TX_SLPID_APP5 7
+
+/* Byte 1 */
+#define GSWIP_TX_CRCGEN_DIS BIT(7)
+#define GSWIP_TX_DPID_SHIFT 0 /* destination group ID */
+#define GSWIP_TX_DPID_ELAN 0
+#define GSWIP_TX_DPID_EWAN 1
+#define GSWIP_TX_DPID_CPU 2
+#define GSWIP_TX_DPID_APP1 3
+#define GSWIP_TX_DPID_APP2 4
+#define GSWIP_TX_DPID_APP3 5
+#define GSWIP_TX_DPID_APP4 6
+#define GSWIP_TX_DPID_APP5 7
+
+/* Byte 2 */
+#define GSWIP_TX_PORT_MAP_EN BIT(7)
+#define GSWIP_TX_PORT_MAP_SEL BIT(6)
+#define GSWIP_TX_LRN_DIS BIT(5)
+#define GSWIP_TX_CLASS_EN BIT(4)
+#define GSWIP_TX_CLASS_SHIFT 0
+#define GSWIP_TX_CLASS_MASK GENMASK(3, 0)
+
+/* Byte 3 */
+#define GSWIP_TX_DPID_EN BIT(0)
+#define GSWIP_TX_PORT_MAP_SHIFT 1
+#define GSWIP_TX_PORT_MAP_MASK GENMASK(6, 1)
+
+#define GSWIP_RX_HEADER_LEN 8
+
+/* special tag in RX path header */
+/* Byte 7 */
+#define GSWIP_RX_SPPID_SHIFT 4
+#define GSWIP_RX_SPPID_MASK GENMASK(6, 4)
+
+static struct sk_buff *gswip_tag_xmit(struct sk_buff *skb,
+ struct net_device *dev)
+{
+ struct dsa_port *dp = dsa_slave_to_port(dev);
+ int err;
+ u8 *gswip_tag;
+
+ err = skb_cow_head(skb, GSWIP_TX_HEADER_LEN);
+ if (err)
+ return NULL;
+
+ skb_push(skb, GSWIP_TX_HEADER_LEN);
+
+ gswip_tag = skb->data;
+ gswip_tag[0] = GSWIP_TX_SLPID_CPU;
+ gswip_tag[1] = GSWIP_TX_DPID_ELAN;
+ gswip_tag[2] = GSWIP_TX_PORT_MAP_EN | GSWIP_TX_PORT_MAP_SEL;
+ gswip_tag[3] = BIT(dp->index + GSWIP_TX_PORT_MAP_SHIFT) & GSWIP_TX_PORT_MAP_MASK;
+ gswip_tag[3] |= GSWIP_TX_DPID_EN;
+
+ return skb;
+}
+
+static struct sk_buff *gswip_tag_rcv(struct sk_buff *skb,
+ struct net_device *dev,
+ struct packet_type *pt)
+{
+ int port;
+ u8 *gswip_tag;
+
+ if (unlikely(!pskb_may_pull(skb, GSWIP_RX_HEADER_LEN)))
+ return NULL;
+
+ gswip_tag = skb->data - ETH_HLEN;
+
+ /* Get source port information */
+ port = (gswip_tag[7] & GSWIP_RX_SPPID_MASK) >> GSWIP_RX_SPPID_SHIFT;
+ skb->dev = dsa_master_find_slave(dev, 0, port);
+ if (!skb->dev)
+ return NULL;
+
+ /* remove GSWIP tag */
+ skb_pull_rcsum(skb, GSWIP_RX_HEADER_LEN);
+
+ return skb;
+}
+
+const struct dsa_device_ops gswip_netdev_ops = {
+ .xmit = gswip_tag_xmit,
+ .rcv = gswip_tag_rcv,
+};
--
2.11.0
^ permalink raw reply related
* [PATCH v3 net-next 3/6] dt-bindings: net: Add lantiq,xrx200-net DT bindings
From: Hauke Mehrtens @ 2018-09-09 20:16 UTC (permalink / raw)
To: davem
Cc: netdev, andrew, vivien.didelot, f.fainelli, john, linux-mips, dev,
hauke.mehrtens, devicetree, Hauke Mehrtens
In-Reply-To: <20180909201647.32727-1-hauke@hauke-m.de>
This adds the binding for the PMAC core between the CPU and the GSWIP
switch found on the xrx200 / VR9 Lantiq / Intel SoC.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: devicetree@vger.kernel.org
---
.../devicetree/bindings/net/lantiq,xrx200-net.txt | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
create mode 100644 Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt
diff --git a/Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt b/Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt
new file mode 100644
index 000000000000..8a2fe5200cdc
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt
@@ -0,0 +1,21 @@
+Lantiq xRX200 GSWIP PMAC Ethernet driver
+==================================
+
+Required properties:
+
+- compatible : "lantiq,xrx200-net" for the PMAC of the embedded
+ : GSWIP in the xXR200
+- reg : memory range of the PMAC core inside of the GSWIP core
+- interrupts : TX and RX DMA interrupts. Use interrupt-names "tx" for
+ : the TX interrupt and "rx" for the RX interrupt.
+
+Example:
+
+eth0: eth@E10B308 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "lantiq,xrx200-net";
+ reg = <0xE10B308 0x30>;
+ interrupts = <73>, <72>;
+ interrupt-names = "tx", "rx";
+};
--
2.11.0
^ permalink raw reply related
* [PATCH v3 net-next 4/6] net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver
From: Hauke Mehrtens @ 2018-09-09 20:16 UTC (permalink / raw)
To: davem
Cc: netdev, andrew, vivien.didelot, f.fainelli, john, linux-mips, dev,
hauke.mehrtens, devicetree, Hauke Mehrtens
In-Reply-To: <20180909201647.32727-1-hauke@hauke-m.de>
This drives the PMAC between the GSWIP Switch and the CPU in the VRX200
SoC. This is currently only the very basic version of the Ethernet
driver.
When the DMA channel is activated we receive some packets which were
send to the SoC while it was still in U-Boot, these packets have the
wrong header. Resetting the IP cores did not work so we read out the
extra packets at the beginning and discard them.
This also adapts the clock code in sysctrl.c to use the default name of
the device node so that the driver gets the correct clock. sysctrl.c
should be replaced with a proper common clock driver later.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
---
MAINTAINERS | 1 +
arch/mips/lantiq/xway/sysctrl.c | 6 +-
drivers/net/ethernet/Kconfig | 7 +
drivers/net/ethernet/Makefile | 1 +
drivers/net/ethernet/lantiq_xrx200.c | 564 +++++++++++++++++++++++++++++++++++
5 files changed, 576 insertions(+), 3 deletions(-)
create mode 100644 drivers/net/ethernet/lantiq_xrx200.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 4b2ee65f6086..ffff912d31b5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8171,6 +8171,7 @@ M: Hauke Mehrtens <hauke@hauke-m.de>
L: netdev@vger.kernel.org
S: Maintained
F: net/dsa/tag_gswip.c
+F: drivers/net/ethernet/lantiq_xrx200.c
LANTIQ MIPS ARCHITECTURE
M: John Crispin <john@phrozen.org>
diff --git a/arch/mips/lantiq/xway/sysctrl.c b/arch/mips/lantiq/xway/sysctrl.c
index e0af39b33e28..eeb89a37e27e 100644
--- a/arch/mips/lantiq/xway/sysctrl.c
+++ b/arch/mips/lantiq/xway/sysctrl.c
@@ -505,7 +505,7 @@ void __init ltq_soc_init(void)
clkdev_add_pmu("1a800000.pcie", "msi", 1, 1, PMU1_PCIE2_MSI);
clkdev_add_pmu("1a800000.pcie", "pdi", 1, 1, PMU1_PCIE2_PDI);
clkdev_add_pmu("1a800000.pcie", "ctl", 1, 1, PMU1_PCIE2_CTL);
- clkdev_add_pmu("1e108000.eth", NULL, 0, 0, PMU_SWITCH | PMU_PPE_DP);
+ clkdev_add_pmu("1e10b308.eth", NULL, 0, 0, PMU_SWITCH | PMU_PPE_DP);
clkdev_add_pmu("1da00000.usif", "NULL", 1, 0, PMU_USIF);
clkdev_add_pmu("1e103100.deu", NULL, 1, 0, PMU_DEU);
} else if (of_machine_is_compatible("lantiq,ar10")) {
@@ -513,7 +513,7 @@ void __init ltq_soc_init(void)
ltq_ar10_fpi_hz(), ltq_ar10_pp32_hz());
clkdev_add_pmu("1e101000.usb", "otg", 1, 0, PMU_USB0);
clkdev_add_pmu("1e106000.usb", "otg", 1, 0, PMU_USB1);
- clkdev_add_pmu("1e108000.eth", NULL, 0, 0, PMU_SWITCH |
+ clkdev_add_pmu("1e10b308.eth", NULL, 0, 0, PMU_SWITCH |
PMU_PPE_DP | PMU_PPE_TC);
clkdev_add_pmu("1da00000.usif", "NULL", 1, 0, PMU_USIF);
clkdev_add_pmu("1f203020.gphy", NULL, 1, 0, PMU_GPHY);
@@ -536,7 +536,7 @@ void __init ltq_soc_init(void)
clkdev_add_pmu(NULL, "ahb", 1, 0, PMU_AHBM | PMU_AHBS);
clkdev_add_pmu("1da00000.usif", "NULL", 1, 0, PMU_USIF);
- clkdev_add_pmu("1e108000.eth", NULL, 0, 0,
+ clkdev_add_pmu("1e10b308.eth", NULL, 0, 0,
PMU_SWITCH | PMU_PPE_DPLUS | PMU_PPE_DPLUM |
PMU_PPE_EMA | PMU_PPE_TC | PMU_PPE_SLL01 |
PMU_PPE_QSB | PMU_PPE_TOP);
diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index 6fde68aa13a4..885e00d17807 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -108,6 +108,13 @@ config LANTIQ_ETOP
---help---
Support for the MII0 inside the Lantiq SoC
+config LANTIQ_XRX200
+ tristate "Lantiq / Intel xRX200 PMAC network driver"
+ depends on SOC_TYPE_XWAY
+ ---help---
+ Support for the PMAC of the Gigabit switch (GSWIP) inside the
+ Lantiq / Intel VRX200 VDSL SoC
+
source "drivers/net/ethernet/marvell/Kconfig"
source "drivers/net/ethernet/mediatek/Kconfig"
source "drivers/net/ethernet/mellanox/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index b45d5f626b59..7b5bf9682066 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -49,6 +49,7 @@ obj-$(CONFIG_NET_VENDOR_XSCALE) += xscale/
obj-$(CONFIG_JME) += jme.o
obj-$(CONFIG_KORINA) += korina.o
obj-$(CONFIG_LANTIQ_ETOP) += lantiq_etop.o
+obj-$(CONFIG_LANTIQ_XRX200) += lantiq_xrx200.o
obj-$(CONFIG_NET_VENDOR_MARVELL) += marvell/
obj-$(CONFIG_NET_VENDOR_MEDIATEK) += mediatek/
obj-$(CONFIG_NET_VENDOR_MELLANOX) += mellanox/
diff --git a/drivers/net/ethernet/lantiq_xrx200.c b/drivers/net/ethernet/lantiq_xrx200.c
new file mode 100644
index 000000000000..c8b6d908f0cc
--- /dev/null
+++ b/drivers/net/ethernet/lantiq_xrx200.c
@@ -0,0 +1,564 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Lantiq / Intel PMAC driver for XRX200 SoCs
+ *
+ * Copyright (C) 2010 Lantiq Deutschland
+ * Copyright (C) 2012 John Crispin <john@phrozen.org>
+ * Copyright (C) 2017 - 2018 Hauke Mehrtens <hauke@hauke-m.de>
+ */
+
+#include <linux/etherdevice.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/clk.h>
+#include <linux/delay.h>
+
+#include <linux/of_net.h>
+#include <linux/of_platform.h>
+
+#include <xway_dma.h>
+
+/* DMA */
+#define XRX200_DMA_DATA_LEN 0x600
+#define XRX200_DMA_RX 0
+#define XRX200_DMA_TX 1
+
+/* cpu port mac */
+#define PMAC_RX_IPG 0x0024
+#define PMAC_RX_IPG_MASK 0xf
+
+#define PMAC_HD_CTL 0x0000
+/* Add Ethernet header to packets from DMA to PMAC */
+#define PMAC_HD_CTL_ADD BIT(0)
+/* Add VLAN tag to Packets from DMA to PMAC */
+#define PMAC_HD_CTL_TAG BIT(1)
+/* Add CRC to packets from DMA to PMAC */
+#define PMAC_HD_CTL_AC BIT(2)
+/* Add status header to packets from PMAC to DMA */
+#define PMAC_HD_CTL_AS BIT(3)
+/* Remove CRC from packets from PMAC to DMA */
+#define PMAC_HD_CTL_RC BIT(4)
+/* Remove Layer-2 header from packets from PMAC to DMA */
+#define PMAC_HD_CTL_RL2 BIT(5)
+/* Status header is present from DMA to PMAC */
+#define PMAC_HD_CTL_RXSH BIT(6)
+/* Add special tag from PMAC to switch */
+#define PMAC_HD_CTL_AST BIT(7)
+/* Remove specail Tag from PMAC to DMA */
+#define PMAC_HD_CTL_RST BIT(8)
+/* Check CRC from DMA to PMAC */
+#define PMAC_HD_CTL_CCRC BIT(9)
+/* Enable reaction to Pause frames in the PMAC */
+#define PMAC_HD_CTL_FC BIT(10)
+
+struct xrx200_chan {
+ int tx_free;
+
+ struct napi_struct napi;
+ struct ltq_dma_channel dma;
+ struct sk_buff *skb[LTQ_DESC_NUM];
+
+ struct xrx200_priv *priv;
+};
+
+struct xrx200_priv {
+ struct clk *clk;
+
+ struct xrx200_chan chan_tx;
+ struct xrx200_chan chan_rx;
+
+ struct net_device *net_dev;
+ struct device *dev;
+
+ __iomem void *pmac_reg;
+};
+
+static u32 xrx200_pmac_r32(struct xrx200_priv *priv, u32 offset)
+{
+ return __raw_readl(priv->pmac_reg + offset);
+}
+
+static void xrx200_pmac_w32(struct xrx200_priv *priv, u32 val, u32 offset)
+{
+ __raw_writel(val, priv->pmac_reg + offset);
+}
+
+static void xrx200_pmac_mask(struct xrx200_priv *priv, u32 clear, u32 set,
+ u32 offset)
+{
+ u32 val = xrx200_pmac_r32(priv, offset);
+
+ val &= ~(clear);
+ val |= set;
+ xrx200_pmac_w32(priv, val, offset);
+}
+
+/* drop all the packets from the DMA ring */
+static void xrx200_flush_dma(struct xrx200_chan *ch)
+{
+ int i;
+
+ for (i = 0; i < LTQ_DESC_NUM; i++) {
+ struct ltq_dma_desc *desc = &ch->dma.desc_base[ch->dma.desc];
+
+ if ((desc->ctl & (LTQ_DMA_OWN | LTQ_DMA_C)) != LTQ_DMA_C)
+ break;
+
+ desc->ctl = LTQ_DMA_OWN | LTQ_DMA_RX_OFFSET(NET_IP_ALIGN) |
+ XRX200_DMA_DATA_LEN;
+ ch->dma.desc++;
+ ch->dma.desc %= LTQ_DESC_NUM;
+ }
+}
+
+static int xrx200_open(struct net_device *net_dev)
+{
+ struct xrx200_priv *priv = netdev_priv(net_dev);
+ int err;
+
+ /* enable clock gate */
+ err = clk_prepare_enable(priv->clk);
+ if (err)
+ return err;
+
+ napi_enable(&priv->chan_tx.napi);
+ ltq_dma_open(&priv->chan_tx.dma);
+ ltq_dma_enable_irq(&priv->chan_tx.dma);
+
+ napi_enable(&priv->chan_rx.napi);
+ ltq_dma_open(&priv->chan_rx.dma);
+ /* The boot loader does not always deactivate the receiving of frames
+ * on the ports and then some packets queue up in the PPE buffers.
+ * They already passed the PMAC so they do not have the tags
+ * configured here. Read the these packets here and drop them.
+ * The HW should have written them into memory after 10us
+ */
+ usleep_range(20, 40);
+ xrx200_flush_dma(&priv->chan_rx);
+ ltq_dma_enable_irq(&priv->chan_rx.dma);
+
+ netif_wake_queue(net_dev);
+
+ return 0;
+}
+
+static int xrx200_close(struct net_device *net_dev)
+{
+ struct xrx200_priv *priv = netdev_priv(net_dev);
+
+ netif_stop_queue(net_dev);
+
+ napi_disable(&priv->chan_rx.napi);
+ ltq_dma_close(&priv->chan_rx.dma);
+
+ napi_disable(&priv->chan_tx.napi);
+ ltq_dma_close(&priv->chan_tx.dma);
+
+ clk_disable_unprepare(priv->clk);
+
+ return 0;
+}
+
+static int xrx200_alloc_skb(struct xrx200_chan *ch)
+{
+ int ret = 0;
+
+ ch->skb[ch->dma.desc] = netdev_alloc_skb_ip_align(ch->priv->net_dev,
+ XRX200_DMA_DATA_LEN);
+ if (!ch->skb[ch->dma.desc]) {
+ ret = -ENOMEM;
+ goto skip;
+ }
+
+ ch->dma.desc_base[ch->dma.desc].addr = dma_map_single(ch->priv->dev,
+ ch->skb[ch->dma.desc]->data, XRX200_DMA_DATA_LEN,
+ DMA_FROM_DEVICE);
+ if (unlikely(dma_mapping_error(ch->priv->dev,
+ ch->dma.desc_base[ch->dma.desc].addr))) {
+ dev_kfree_skb_any(ch->skb[ch->dma.desc]);
+ ret = -ENOMEM;
+ goto skip;
+ }
+
+skip:
+ ch->dma.desc_base[ch->dma.desc].ctl =
+ LTQ_DMA_OWN | LTQ_DMA_RX_OFFSET(NET_IP_ALIGN) |
+ XRX200_DMA_DATA_LEN;
+
+ return ret;
+}
+
+static int xrx200_hw_receive(struct xrx200_chan *ch)
+{
+ struct xrx200_priv *priv = ch->priv;
+ struct ltq_dma_desc *desc = &ch->dma.desc_base[ch->dma.desc];
+ struct sk_buff *skb = ch->skb[ch->dma.desc];
+ int len = (desc->ctl & LTQ_DMA_SIZE_MASK);
+ struct net_device *net_dev = priv->net_dev;
+ int ret;
+
+ ret = xrx200_alloc_skb(ch);
+
+ ch->dma.desc++;
+ ch->dma.desc %= LTQ_DESC_NUM;
+
+ if (ret) {
+ netdev_err(net_dev, "failed to allocate new rx buffer\n");
+ return ret;
+ }
+
+ skb_put(skb, len);
+ skb->protocol = eth_type_trans(skb, net_dev);
+ netif_receive_skb(skb);
+ net_dev->stats.rx_packets++;
+ net_dev->stats.rx_bytes += len - ETH_FCS_LEN;
+
+ return 0;
+}
+
+static int xrx200_poll_rx(struct napi_struct *napi, int budget)
+{
+ struct xrx200_chan *ch = container_of(napi,
+ struct xrx200_chan, napi);
+ int rx = 0;
+ int ret;
+
+ while (rx < budget) {
+ struct ltq_dma_desc *desc = &ch->dma.desc_base[ch->dma.desc];
+
+ if ((desc->ctl & (LTQ_DMA_OWN | LTQ_DMA_C)) == LTQ_DMA_C) {
+ ret = xrx200_hw_receive(ch);
+ if (ret)
+ return ret;
+ rx++;
+ } else {
+ break;
+ }
+ }
+
+ if (rx < budget) {
+ napi_complete(&ch->napi);
+ ltq_dma_enable_irq(&ch->dma);
+ }
+
+ return rx;
+}
+
+static int xrx200_tx_housekeeping(struct napi_struct *napi, int budget)
+{
+ struct xrx200_chan *ch = container_of(napi,
+ struct xrx200_chan, napi);
+ struct net_device *net_dev = ch->priv->net_dev;
+ int pkts = 0;
+ int bytes = 0;
+
+ while (pkts < budget) {
+ struct ltq_dma_desc *desc = &ch->dma.desc_base[ch->tx_free];
+
+ if ((desc->ctl & (LTQ_DMA_OWN | LTQ_DMA_C)) == LTQ_DMA_C) {
+ struct sk_buff *skb = ch->skb[ch->tx_free];
+
+ pkts++;
+ bytes += skb->len;
+ ch->skb[ch->tx_free] = NULL;
+ consume_skb(skb);
+ memset(&ch->dma.desc_base[ch->tx_free], 0,
+ sizeof(struct ltq_dma_desc));
+ ch->tx_free++;
+ ch->tx_free %= LTQ_DESC_NUM;
+ } else {
+ break;
+ }
+ }
+
+ net_dev->stats.tx_packets += pkts;
+ net_dev->stats.tx_bytes += bytes;
+ netdev_completed_queue(ch->priv->net_dev, pkts, bytes);
+
+ if (pkts < budget) {
+ napi_complete(&ch->napi);
+ ltq_dma_enable_irq(&ch->dma);
+ }
+
+ return pkts;
+}
+
+static int xrx200_start_xmit(struct sk_buff *skb, struct net_device *net_dev)
+{
+ struct xrx200_priv *priv = netdev_priv(net_dev);
+ struct xrx200_chan *ch = &priv->chan_tx;
+ struct ltq_dma_desc *desc = &ch->dma.desc_base[ch->dma.desc];
+ u32 byte_offset;
+ dma_addr_t mapping;
+ int len;
+
+ skb->dev = net_dev;
+ if (skb_put_padto(skb, ETH_ZLEN)) {
+ net_dev->stats.tx_dropped++;
+ return NETDEV_TX_OK;
+ }
+
+ len = skb->len;
+
+ if ((desc->ctl & (LTQ_DMA_OWN | LTQ_DMA_C)) || ch->skb[ch->dma.desc]) {
+ netdev_err(net_dev, "tx ring full\n");
+ netif_stop_queue(net_dev);
+ return NETDEV_TX_BUSY;
+ }
+
+ ch->skb[ch->dma.desc] = skb;
+
+ mapping = dma_map_single(priv->dev, skb->data, len, DMA_TO_DEVICE);
+ if (unlikely(dma_mapping_error(priv->dev, mapping)))
+ goto err_drop;
+
+ /* dma needs to start on a 16 byte aligned address */
+ byte_offset = mapping % 16;
+
+ desc->addr = mapping - byte_offset;
+ /* Make sure the address is written before we give it to HW */
+ wmb();
+ desc->ctl = LTQ_DMA_OWN | LTQ_DMA_SOP | LTQ_DMA_EOP |
+ LTQ_DMA_TX_OFFSET(byte_offset) | (len & LTQ_DMA_SIZE_MASK);
+ ch->dma.desc++;
+ ch->dma.desc %= LTQ_DESC_NUM;
+ if (ch->dma.desc == ch->tx_free)
+ netif_stop_queue(net_dev);
+
+ netdev_sent_queue(net_dev, len);
+
+ return NETDEV_TX_OK;
+
+err_drop:
+ dev_kfree_skb(skb);
+ net_dev->stats.tx_dropped++;
+ net_dev->stats.tx_errors++;
+ return NETDEV_TX_OK;
+}
+
+static const struct net_device_ops xrx200_netdev_ops = {
+ .ndo_open = xrx200_open,
+ .ndo_stop = xrx200_close,
+ .ndo_start_xmit = xrx200_start_xmit,
+ .ndo_set_mac_address = eth_mac_addr,
+ .ndo_validate_addr = eth_validate_addr,
+ .ndo_change_mtu = eth_change_mtu,
+};
+
+static irqreturn_t xrx200_dma_irq(int irq, void *ptr)
+{
+ struct xrx200_chan *ch = ptr;
+
+ ltq_dma_disable_irq(&ch->dma);
+ ltq_dma_ack_irq(&ch->dma);
+
+ napi_schedule(&ch->napi);
+
+ return IRQ_HANDLED;
+}
+
+static int xrx200_dma_init(struct xrx200_priv *priv)
+{
+ struct xrx200_chan *ch_rx = &priv->chan_rx;
+ struct xrx200_chan *ch_tx = &priv->chan_tx;
+ int ret = 0;
+ int i;
+
+ ltq_dma_init_port(DMA_PORT_ETOP);
+
+ ch_rx->dma.nr = XRX200_DMA_RX;
+ ch_rx->dma.dev = priv->dev;
+ ch_rx->priv = priv;
+
+ ltq_dma_alloc_rx(&ch_rx->dma);
+ for (ch_rx->dma.desc = 0; ch_rx->dma.desc < LTQ_DESC_NUM;
+ ch_rx->dma.desc++) {
+ ret = xrx200_alloc_skb(ch_rx);
+ if (ret)
+ goto rx_free;
+ }
+ ch_rx->dma.desc = 0;
+ ret = devm_request_irq(priv->dev, ch_rx->dma.irq, xrx200_dma_irq, 0,
+ "xrx200_net_rx", &priv->chan_rx);
+ if (ret) {
+ dev_err(priv->dev, "failed to request RX irq %d\n",
+ ch_rx->dma.irq);
+ goto rx_ring_free;
+ }
+
+ ch_tx->dma.nr = XRX200_DMA_TX;
+ ch_tx->dma.dev = priv->dev;
+ ch_tx->priv = priv;
+
+ ltq_dma_alloc_tx(&ch_tx->dma);
+ ret = devm_request_irq(priv->dev, ch_tx->dma.irq, xrx200_dma_irq, 0,
+ "xrx200_net_tx", &priv->chan_tx);
+ if (ret) {
+ dev_err(priv->dev, "failed to request TX irq %d\n",
+ ch_tx->dma.irq);
+ goto tx_free;
+ }
+
+ return ret;
+
+tx_free:
+ ltq_dma_free(&ch_tx->dma);
+
+rx_ring_free:
+ /* free the allocated RX ring */
+ for (i = 0; i < LTQ_DESC_NUM; i++) {
+ if (priv->chan_rx.skb[i])
+ dev_kfree_skb_any(priv->chan_rx.skb[i]);
+ }
+
+rx_free:
+ ltq_dma_free(&ch_rx->dma);
+ return ret;
+}
+
+static void xrx200_hw_cleanup(struct xrx200_priv *priv)
+{
+ int i;
+
+ ltq_dma_free(&priv->chan_tx.dma);
+ ltq_dma_free(&priv->chan_rx.dma);
+
+ /* free the allocated RX ring */
+ for (i = 0; i < LTQ_DESC_NUM; i++)
+ dev_kfree_skb_any(priv->chan_rx.skb[i]);
+}
+
+static int xrx200_probe(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+ struct device_node *np = dev->of_node;
+ struct resource *res;
+ struct xrx200_priv *priv;
+ struct net_device *net_dev;
+ const u8 *mac;
+ int err;
+
+ /* alloc the network device */
+ net_dev = devm_alloc_etherdev(dev, sizeof(struct xrx200_priv));
+ if (!net_dev)
+ return -ENOMEM;
+
+ priv = netdev_priv(net_dev);
+ priv->net_dev = net_dev;
+ priv->dev = dev;
+
+ net_dev->netdev_ops = &xrx200_netdev_ops;
+ SET_NETDEV_DEV(net_dev, dev);
+ net_dev->min_mtu = ETH_ZLEN;
+ net_dev->max_mtu = XRX200_DMA_DATA_LEN;
+
+ /* load the memory ranges */
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!res) {
+ dev_err(dev, "failed to get resources\n");
+ return -ENOENT;
+ }
+
+ priv->pmac_reg = devm_ioremap_resource(dev, res);
+ if (!priv->pmac_reg) {
+ dev_err(dev, "failed to request and remap io ranges\n");
+ return -ENOMEM;
+ }
+
+ priv->chan_rx.dma.irq = platform_get_irq_byname(pdev, "rx");
+ if (priv->chan_rx.dma.irq < 0) {
+ dev_err(dev, "failed to get RX IRQ, %i\n",
+ priv->chan_rx.dma.irq);
+ return -ENOENT;
+ }
+ priv->chan_tx.dma.irq = platform_get_irq_byname(pdev, "tx");
+ if (priv->chan_tx.dma.irq < 0) {
+ dev_err(dev, "failed to get TX IRQ, %i\n",
+ priv->chan_tx.dma.irq);
+ return -ENOENT;
+ }
+
+ /* get the clock */
+ priv->clk = devm_clk_get(dev, NULL);
+ if (IS_ERR(priv->clk)) {
+ dev_err(dev, "failed to get clock\n");
+ return PTR_ERR(priv->clk);
+ }
+
+ mac = of_get_mac_address(np);
+ if (mac && is_valid_ether_addr(mac))
+ ether_addr_copy(net_dev->dev_addr, mac);
+ else
+ eth_hw_addr_random(net_dev);
+
+ /* bring up the dma engine and IP core */
+ err = xrx200_dma_init(priv);
+ if (err)
+ return err;
+
+ /* set IPG to 12 */
+ xrx200_pmac_mask(priv, PMAC_RX_IPG_MASK, 0xb, PMAC_RX_IPG);
+
+ /* enable status header, enable CRC */
+ xrx200_pmac_mask(priv, 0,
+ PMAC_HD_CTL_RST | PMAC_HD_CTL_AST | PMAC_HD_CTL_RXSH |
+ PMAC_HD_CTL_AS | PMAC_HD_CTL_AC | PMAC_HD_CTL_RC,
+ PMAC_HD_CTL);
+
+ /* setup NAPI */
+ netif_napi_add(net_dev, &priv->chan_rx.napi, xrx200_poll_rx, 32);
+ netif_napi_add(net_dev, &priv->chan_tx.napi, xrx200_tx_housekeeping, 32);
+
+ platform_set_drvdata(pdev, priv);
+
+ err = register_netdev(net_dev);
+ if (err)
+ goto err_uninit_dma;
+ return err;
+
+err_uninit_dma:
+ xrx200_hw_cleanup(priv);
+
+ return 0;
+}
+
+static int xrx200_remove(struct platform_device *pdev)
+{
+ struct xrx200_priv *priv = platform_get_drvdata(pdev);
+ struct net_device *net_dev = priv->net_dev;
+
+ /* free stack related instances */
+ netif_stop_queue(net_dev);
+ netif_napi_del(&priv->chan_tx.napi);
+ netif_napi_del(&priv->chan_rx.napi);
+
+ /* remove the actual device */
+ unregister_netdev(net_dev);
+
+ /* shut down hardware */
+ xrx200_hw_cleanup(priv);
+
+ return 0;
+}
+
+static const struct of_device_id xrx200_match[] = {
+ { .compatible = "lantiq,xrx200-net" },
+ {},
+};
+MODULE_DEVICE_TABLE(of, xrx200_match);
+
+static struct platform_driver xrx200_driver = {
+ .probe = xrx200_probe,
+ .remove = xrx200_remove,
+ .driver = {
+ .name = "lantiq,xrx200-net",
+ .of_match_table = xrx200_match,
+ },
+};
+
+module_platform_driver(xrx200_driver);
+
+MODULE_AUTHOR("John Crispin <john@phrozen.org>");
+MODULE_DESCRIPTION("Lantiq SoC XRX200 ethernet");
+MODULE_LICENSE("GPL");
--
2.11.0
^ permalink raw reply related
* [PATCH v3 net-next 5/6] dt-bindings: net: dsa: Add lantiq,xrx200-gswip DT bindings
From: Hauke Mehrtens @ 2018-09-09 20:20 UTC (permalink / raw)
To: davem
Cc: netdev, andrew, vivien.didelot, f.fainelli, john, linux-mips, dev,
hauke.mehrtens, devicetree
In-Reply-To: <20180909201647.32727-1-hauke@hauke-m.d>
This adds the binding for the GSWIP (Gigabit switch) core found in the
xrx200 / VR9 Lantiq / Intel SoC.
This part takes care of the switch, MDIO bus, and loading the FW into
the embedded GPHYs.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: devicetree@vger.kernel.org
---
.../devicetree/bindings/net/dsa/lantiq-gswip.txt | 141 +++++++++++++++++++++
1 file changed, 141 insertions(+)
create mode 100644 Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt
diff --git a/Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt b/Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt
new file mode 100644
index 000000000000..a089f5856778
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt
@@ -0,0 +1,141 @@
+Lantiq GSWIP Ethernet switches
+==================================
+
+Required properties for GSWIP core:
+
+- compatible : "lantiq,xrx200-gswip" for the embedded GSWIP in the
+ xRX200 SoC
+- reg : memory range of the GSWIP core registers
+ : memory range of the GSWIP MDIO registers
+ : memory range of the GSWIP MII registers
+
+See Documentation/devicetree/bindings/net/dsa/dsa.txt for a list of
+additional required and optional properties.
+
+
+Required properties for MDIO bus:
+- compatible : "lantiq,xrx200-mdio" for the MDIO bus inside the GSWIP
+ core of the xRX200 SoC and the PHYs connected to it.
+
+See Documentation/devicetree/bindings/net/mdio.txt for a list of additional
+required and optional properties.
+
+
+Required properties for GPHY firmware loading:
+- compatible : "lantiq,gphy-fw" and "lantiq,xrx200-gphy-fw",
+ "lantiq,xrx200a1x-gphy-fw", "lantiq,xrx200a2x-gphy-fw",
+ "lantiq,xrx300-gphy-fw", or "lantiq,xrx330-gphy-fw"
+ for the loading of the firmware into the embedded
+ GPHY core of the SoC.
+- lantiq,rcu : reference to the rcu syscon
+
+The GPHY firmware loader has a list of GPHY entries, one for each
+embedded GPHY
+
+- reg : Offset of the GPHY firmware register in the RCU
+ register range
+- resets : list of resets of the embedded GPHY
+- reset-names : list of names of the resets
+
+Example:
+
+Ethernet switch on the VRX200 SoC:
+
+gswip: gswip@E108000 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "lantiq,xrx200-gswip";
+ reg = < 0xE108000 0x3000 /* switch */
+ 0xE10B100 0x70 /* mdio */
+ 0xE10B1D8 0x30 /* mii */
+ >;
+ dsa,member = <0 0>;
+
+ ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ port@0 {
+ reg = <0>;
+ label = "lan3";
+ phy-mode = "rgmii";
+ phy-handle = <&phy0>;
+ };
+
+ port@1 {
+ reg = <1>;
+ label = "lan4";
+ phy-mode = "rgmii";
+ phy-handle = <&phy1>;
+ };
+
+ port@2 {
+ reg = <2>;
+ label = "lan2";
+ phy-mode = "internal";
+ phy-handle = <&phy11>;
+ };
+
+ port@4 {
+ reg = <4>;
+ label = "lan1";
+ phy-mode = "internal";
+ phy-handle = <&phy13>;
+ };
+
+ port@5 {
+ reg = <5>;
+ label = "wan";
+ phy-mode = "rgmii";
+ phy-handle = <&phy5>;
+ };
+
+ port@6 {
+ reg = <0x6>;
+ label = "cpu";
+ ethernet = <ð0>;
+ };
+ };
+
+ mdio@0 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "lantiq,xrx200-mdio";
+ reg = <0>;
+
+ phy0: ethernet-phy@0 {
+ reg = <0x0>;
+ };
+ phy1: ethernet-phy@1 {
+ reg = <0x1>;
+ };
+ phy5: ethernet-phy@5 {
+ reg = <0x5>;
+ };
+ phy11: ethernet-phy@11 {
+ reg = <0x11>;
+ };
+ phy13: ethernet-phy@13 {
+ reg = <0x13>;
+ };
+ };
+
+ gphy-fw {
+ compatible = "lantiq,xrx200-gphy-fw", "lantiq,gphy-fw";
+ lantiq,rcu = <&rcu0>;
+
+ gphy@20 {
+ reg = <0x20>;
+
+ resets = <&reset0 31 30>;
+ reset-names = "gphy";
+ };
+
+ gphy@68 {
+ reg = <0x68>;
+
+ resets = <&reset0 29 28>;
+ reset-names = "gphy";
+ };
+ };
+};
--
2.11.0
^ permalink raw reply related
* [PATCH v3 net-next 6/6] net: dsa: Add Lantiq / Intel DSA driver for vrx200
From: Hauke Mehrtens @ 2018-09-09 20:20 UTC (permalink / raw)
To: davem
Cc: netdev, andrew, vivien.didelot, f.fainelli, john, linux-mips, dev,
hauke.mehrtens, devicetree
In-Reply-To: <20180909201647.32727-1-hauke@hauke-m.d>
This adds the DSA driver for the GSWIP Switch found in the VRX200 SoC.
This switch is integrated in the DSL SoC, this SoC uses a GSWIP version
2.1, there are other SoCs using different versions of this IP block, but
this driver was only tested with the version found in the VRX200.
Currently only the basic features are implemented which will forward all
packages to the CPU and let the CPU do the forwarding. The hardware also
support Layer 2 offloading which is not yet implemented in this driver.
The GPHY FW loaded is now done by this driver and not any more by the
separate driver in drivers/soc/lantiq/gphy.c, I will remove this driver
is a separate patch. to make use of the GPHY this switch driver is
needed anyway. Other SoCs have more embedded GPHYs so this driver should
support a variable number of GPHYs. After the firmware was loaded the
GPHY can be probed on the MDIO bus and it behaves like an external GPHY,
without the firmware it can not be probed on the MDIO bus.
The clock names in the sysctrl.c file have to be changed because the
clocks are now used by a different driver. This should be cleaned up and
a real common clock driver should provide the clocks instead.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
---
MAINTAINERS | 2 +
arch/mips/lantiq/xway/sysctrl.c | 8 +-
drivers/net/dsa/Kconfig | 8 +
drivers/net/dsa/Makefile | 1 +
drivers/net/dsa/lantiq_gswip.c | 1169 +++++++++++++++++++++++++++++++++++++++
drivers/net/dsa/lantiq_pce.h | 153 +++++
6 files changed, 1337 insertions(+), 4 deletions(-)
create mode 100644 drivers/net/dsa/lantiq_gswip.c
create mode 100644 drivers/net/dsa/lantiq_pce.h
diff --git a/MAINTAINERS b/MAINTAINERS
index ffff912d31b5..5ae7385840e3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8172,6 +8172,8 @@ L: netdev@vger.kernel.org
S: Maintained
F: net/dsa/tag_gswip.c
F: drivers/net/ethernet/lantiq_xrx200.c
+F: drivers/net/dsa/lantiq_pce.h
+F: drivers/net/dsa/intel_gswip.c
LANTIQ MIPS ARCHITECTURE
M: John Crispin <john@phrozen.org>
diff --git a/arch/mips/lantiq/xway/sysctrl.c b/arch/mips/lantiq/xway/sysctrl.c
index eeb89a37e27e..fe25c99089b7 100644
--- a/arch/mips/lantiq/xway/sysctrl.c
+++ b/arch/mips/lantiq/xway/sysctrl.c
@@ -516,8 +516,8 @@ void __init ltq_soc_init(void)
clkdev_add_pmu("1e10b308.eth", NULL, 0, 0, PMU_SWITCH |
PMU_PPE_DP | PMU_PPE_TC);
clkdev_add_pmu("1da00000.usif", "NULL", 1, 0, PMU_USIF);
- clkdev_add_pmu("1f203020.gphy", NULL, 1, 0, PMU_GPHY);
- clkdev_add_pmu("1f203068.gphy", NULL, 1, 0, PMU_GPHY);
+ clkdev_add_pmu("1e108000.gswip", "gphy0", 0, 0, PMU_GPHY);
+ clkdev_add_pmu("1e108000.gswip", "gphy1", 0, 0, PMU_GPHY);
clkdev_add_pmu("1e103100.deu", NULL, 1, 0, PMU_DEU);
clkdev_add_pmu("1e116000.mei", "afe", 1, 2, PMU_ANALOG_DSL_AFE);
clkdev_add_pmu("1e116000.mei", "dfe", 1, 0, PMU_DFE);
@@ -540,8 +540,8 @@ void __init ltq_soc_init(void)
PMU_SWITCH | PMU_PPE_DPLUS | PMU_PPE_DPLUM |
PMU_PPE_EMA | PMU_PPE_TC | PMU_PPE_SLL01 |
PMU_PPE_QSB | PMU_PPE_TOP);
- clkdev_add_pmu("1f203020.gphy", NULL, 0, 0, PMU_GPHY);
- clkdev_add_pmu("1f203068.gphy", NULL, 0, 0, PMU_GPHY);
+ clkdev_add_pmu("1e108000.gswip", "gphy0", 0, 0, PMU_GPHY);
+ clkdev_add_pmu("1e108000.gswip", "gphy1", 0, 0, PMU_GPHY);
clkdev_add_pmu("1e103000.sdio", NULL, 1, 0, PMU_SDIO);
clkdev_add_pmu("1e103100.deu", NULL, 1, 0, PMU_DEU);
clkdev_add_pmu("1e116000.mei", "dfe", 1, 0, PMU_DFE);
diff --git a/drivers/net/dsa/Kconfig b/drivers/net/dsa/Kconfig
index d3ce1e4cb4d3..7c09d8f195ae 100644
--- a/drivers/net/dsa/Kconfig
+++ b/drivers/net/dsa/Kconfig
@@ -23,6 +23,14 @@ config NET_DSA_LOOP
This enables support for a fake mock-up switch chip which
exercises the DSA APIs.
+config NET_DSA_LANTIQ_GSWIP
+ tristate "Lantiq / Intel GSWIP"
+ depends on NET_DSA
+ select NET_DSA_TAG_GSWIP
+ ---help---
+ This enables support for the Lantiq / Intel GSWIP 2.1 found in
+ the xrx200 / VR9 SoC.
+
config NET_DSA_MT7530
tristate "Mediatek MT7530 Ethernet switch support"
depends on NET_DSA
diff --git a/drivers/net/dsa/Makefile b/drivers/net/dsa/Makefile
index 46c1cba91ffe..82e5d794c41f 100644
--- a/drivers/net/dsa/Makefile
+++ b/drivers/net/dsa/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_NET_DSA_LOOP) += dsa_loop.o
ifdef CONFIG_NET_DSA_LOOP
obj-$(CONFIG_FIXED_PHY) += dsa_loop_bdinfo.o
endif
+obj-$(CONFIG_NET_DSA_LANTIQ_GSWIP) += lantiq_gswip.o
obj-$(CONFIG_NET_DSA_MT7530) += mt7530.o
obj-$(CONFIG_NET_DSA_MV88E6060) += mv88e6060.o
obj-$(CONFIG_NET_DSA_QCA8K) += qca8k.o
diff --git a/drivers/net/dsa/lantiq_gswip.c b/drivers/net/dsa/lantiq_gswip.c
new file mode 100644
index 000000000000..9c28d0b2fcdb
--- /dev/null
+++ b/drivers/net/dsa/lantiq_gswip.c
@@ -0,0 +1,1169 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Lantiq / Intel GSWIP switch driver for VRX200 SoCs
+ *
+ * Copyright (C) 2010 Lantiq Deutschland
+ * Copyright (C) 2012 John Crispin <john@phrozen.org>
+ * Copyright (C) 2017 - 2018 Hauke Mehrtens <hauke@hauke-m.de>
+ */
+
+#include <linux/clk.h>
+#include <linux/etherdevice.h>
+#include <linux/firmware.h>
+#include <linux/if_bridge.h>
+#include <linux/if_vlan.h>
+#include <linux/iopoll.h>
+#include <linux/mfd/syscon.h>
+#include <linux/module.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+#include <linux/of_platform.h>
+#include <linux/phy.h>
+#include <linux/phylink.h>
+#include <linux/platform_device.h>
+#include <linux/regmap.h>
+#include <linux/reset.h>
+#include <net/dsa.h>
+#include <dt-bindings/mips/lantiq_rcu_gphy.h>
+
+#include "lantiq_pce.h"
+
+/* GSWIP MDIO Registers */
+#define GSWIP_MDIO_GLOB 0x00
+#define GSWIP_MDIO_GLOB_ENABLE BIT(15)
+#define GSWIP_MDIO_CTRL 0x08
+#define GSWIP_MDIO_CTRL_BUSY BIT(12)
+#define GSWIP_MDIO_CTRL_RD BIT(11)
+#define GSWIP_MDIO_CTRL_WR BIT(10)
+#define GSWIP_MDIO_CTRL_PHYAD_MASK 0x1f
+#define GSWIP_MDIO_CTRL_PHYAD_SHIFT 5
+#define GSWIP_MDIO_CTRL_REGAD_MASK 0x1f
+#define GSWIP_MDIO_READ 0x09
+#define GSWIP_MDIO_WRITE 0x0A
+#define GSWIP_MDIO_MDC_CFG0 0x0B
+#define GSWIP_MDIO_MDC_CFG1 0x0C
+#define GSWIP_MDIO_PHYp(p) (0x15 - (p))
+#define GSWIP_MDIO_PHY_LINK_MASK 0x6000
+#define GSWIP_MDIO_PHY_LINK_AUTO 0x0000
+#define GSWIP_MDIO_PHY_LINK_DOWN 0x4000
+#define GSWIP_MDIO_PHY_LINK_UP 0x2000
+#define GSWIP_MDIO_PHY_SPEED_MASK 0x1800
+#define GSWIP_MDIO_PHY_SPEED_AUTO 0x1800
+#define GSWIP_MDIO_PHY_SPEED_M10 0x0000
+#define GSWIP_MDIO_PHY_SPEED_M100 0x0800
+#define GSWIP_MDIO_PHY_SPEED_G1 0x1000
+#define GSWIP_MDIO_PHY_FDUP_MASK 0x0600
+#define GSWIP_MDIO_PHY_FDUP_AUTO 0x0000
+#define GSWIP_MDIO_PHY_FDUP_EN 0x0200
+#define GSWIP_MDIO_PHY_FDUP_DIS 0x0600
+#define GSWIP_MDIO_PHY_FCONTX_MASK 0x0180
+#define GSWIP_MDIO_PHY_FCONTX_AUTO 0x0000
+#define GSWIP_MDIO_PHY_FCONTX_EN 0x0100
+#define GSWIP_MDIO_PHY_FCONTX_DIS 0x0180
+#define GSWIP_MDIO_PHY_FCONRX_MASK 0x0060
+#define GSWIP_MDIO_PHY_FCONRX_AUTO 0x0000
+#define GSWIP_MDIO_PHY_FCONRX_EN 0x0020
+#define GSWIP_MDIO_PHY_FCONRX_DIS 0x0060
+#define GSWIP_MDIO_PHY_ADDR_MASK 0x001f
+#define GSWIP_MDIO_PHY_MASK (GSWIP_MDIO_PHY_ADDR_MASK | \
+ GSWIP_MDIO_PHY_FCONRX_MASK | \
+ GSWIP_MDIO_PHY_FCONTX_MASK | \
+ GSWIP_MDIO_PHY_LINK_MASK | \
+ GSWIP_MDIO_PHY_SPEED_MASK | \
+ GSWIP_MDIO_PHY_FDUP_MASK)
+
+/* GSWIP MII Registers */
+#define GSWIP_MII_CFG0 0x00
+#define GSWIP_MII_CFG1 0x02
+#define GSWIP_MII_CFG5 0x04
+#define GSWIP_MII_CFG_EN BIT(14)
+#define GSWIP_MII_CFG_LDCLKDIS BIT(12)
+#define GSWIP_MII_CFG_MODE_MIIP 0x0
+#define GSWIP_MII_CFG_MODE_MIIM 0x1
+#define GSWIP_MII_CFG_MODE_RMIIP 0x2
+#define GSWIP_MII_CFG_MODE_RMIIM 0x3
+#define GSWIP_MII_CFG_MODE_RGMII 0x4
+#define GSWIP_MII_CFG_MODE_MASK 0xf
+#define GSWIP_MII_CFG_RATE_M2P5 0x00
+#define GSWIP_MII_CFG_RATE_M25 0x10
+#define GSWIP_MII_CFG_RATE_M125 0x20
+#define GSWIP_MII_CFG_RATE_M50 0x30
+#define GSWIP_MII_CFG_RATE_AUTO 0x40
+#define GSWIP_MII_CFG_RATE_MASK 0x70
+#define GSWIP_MII_PCDU0 0x01
+#define GSWIP_MII_PCDU1 0x03
+#define GSWIP_MII_PCDU5 0x05
+#define GSWIP_MII_PCDU_TXDLY_MASK GENMASK(2, 0)
+#define GSWIP_MII_PCDU_RXDLY_MASK GENMASK(9, 7)
+
+/* GSWIP Core Registers */
+#define GSWIP_SWRES 0x000
+#define GSWIP_SWRES_R1 BIT(1) /* GSWIP Software reset */
+#define GSWIP_SWRES_R0 BIT(0) /* GSWIP Hardware reset */
+#define GSWIP_VERSION 0x013
+#define GSWIP_VERSION_REV_SHIFT 0
+#define GSWIP_VERSION_REV_MASK GENMASK(7, 0)
+#define GSWIP_VERSION_MOD_SHIFT 8
+#define GSWIP_VERSION_MOD_MASK GENMASK(15, 8)
+#define GSWIP_VERSION_2_0 0x100
+#define GSWIP_VERSION_2_1 0x021
+#define GSWIP_VERSION_2_2 0x122
+#define GSWIP_VERSION_2_2_ETC 0x022
+
+#define GSWIP_BM_RAM_VAL(x) (0x043 - (x))
+#define GSWIP_BM_RAM_ADDR 0x044
+#define GSWIP_BM_RAM_CTRL 0x045
+#define GSWIP_BM_RAM_CTRL_BAS BIT(15)
+#define GSWIP_BM_RAM_CTRL_OPMOD BIT(5)
+#define GSWIP_BM_RAM_CTRL_ADDR_MASK GENMASK(4, 0)
+#define GSWIP_BM_QUEUE_GCTRL 0x04A
+#define GSWIP_BM_QUEUE_GCTRL_GL_MOD BIT(10)
+/* buffer management Port Configuration Register */
+#define GSWIP_BM_PCFGp(p) (0x080 + ((p) * 2))
+#define GSWIP_BM_PCFG_CNTEN BIT(0) /* RMON Counter Enable */
+#define GSWIP_BM_PCFG_IGCNT BIT(1) /* Ingres Special Tag RMON count */
+/* buffer management Port Control Register */
+#define GSWIP_BM_RMON_CTRLp(p) (0x81 + ((p) * 2))
+#define GSWIP_BM_CTRL_RMON_RAM1_RES BIT(0) /* Software Reset for RMON RAM 1 */
+#define GSWIP_BM_CTRL_RMON_RAM2_RES BIT(1) /* Software Reset for RMON RAM 2 */
+
+/* PCE */
+#define GSWIP_PCE_TBL_KEY(x) (0x447 - (x))
+#define GSWIP_PCE_TBL_MASK 0x448
+#define GSWIP_PCE_TBL_VAL(x) (0x44D - (x))
+#define GSWIP_PCE_TBL_ADDR 0x44E
+#define GSWIP_PCE_TBL_CTRL 0x44F
+#define GSWIP_PCE_TBL_CTRL_BAS BIT(15)
+#define GSWIP_PCE_TBL_CTRL_TYPE BIT(13)
+#define GSWIP_PCE_TBL_CTRL_VLD BIT(12)
+#define GSWIP_PCE_TBL_CTRL_KEYFORM BIT(11)
+#define GSWIP_PCE_TBL_CTRL_GMAP_MASK GENMASK(10, 7)
+#define GSWIP_PCE_TBL_CTRL_OPMOD_MASK GENMASK(6, 5)
+#define GSWIP_PCE_TBL_CTRL_OPMOD_ADRD 0x00
+#define GSWIP_PCE_TBL_CTRL_OPMOD_ADWR 0x20
+#define GSWIP_PCE_TBL_CTRL_OPMOD_KSRD 0x40
+#define GSWIP_PCE_TBL_CTRL_OPMOD_KSWR 0x60
+#define GSWIP_PCE_TBL_CTRL_ADDR_MASK GENMASK(4, 0)
+#define GSWIP_PCE_PMAP1 0x453 /* Monitoring port map */
+#define GSWIP_PCE_PMAP2 0x454 /* Default Multicast port map */
+#define GSWIP_PCE_PMAP3 0x455 /* Default Unknown Unicast port map */
+#define GSWIP_PCE_GCTRL_0 0x456
+#define GSWIP_PCE_GCTRL_0_MC_VALID BIT(3)
+#define GSWIP_PCE_GCTRL_0_VLAN BIT(14) /* VLAN aware Switching */
+#define GSWIP_PCE_GCTRL_1 0x457
+#define GSWIP_PCE_GCTRL_1_MAC_GLOCK BIT(2) /* MAC Address table lock */
+#define GSWIP_PCE_GCTRL_1_MAC_GLOCK_MOD BIT(3) /* Mac address table lock forwarding mode */
+#define GSWIP_PCE_PCTRL_0p(p) (0x480 + ((p) * 0xA))
+#define GSWIP_PCE_PCTRL_0_INGRESS BIT(11)
+#define GSWIP_PCE_PCTRL_0_PSTATE_LISTEN 0x0
+#define GSWIP_PCE_PCTRL_0_PSTATE_RX 0x1
+#define GSWIP_PCE_PCTRL_0_PSTATE_TX 0x2
+#define GSWIP_PCE_PCTRL_0_PSTATE_LEARNING 0x3
+#define GSWIP_PCE_PCTRL_0_PSTATE_FORWARDING 0x7
+#define GSWIP_PCE_PCTRL_0_PSTATE_MASK GENMASK(2, 0)
+
+#define GSWIP_MAC_FLEN 0x8C5
+#define GSWIP_MAC_CTRL_2p(p) (0x905 + ((p) * 0xC))
+#define GSWIP_MAC_CTRL_2_MLEN BIT(3) /* Maximum Untagged Frame Lnegth */
+
+/* Ethernet Switch Fetch DMA Port Control Register */
+#define GSWIP_FDMA_PCTRLp(p) (0xA80 + ((p) * 0x6))
+#define GSWIP_FDMA_PCTRL_EN BIT(0) /* FDMA Port Enable */
+#define GSWIP_FDMA_PCTRL_STEN BIT(1) /* Special Tag Insertion Enable */
+#define GSWIP_FDMA_PCTRL_VLANMOD_MASK GENMASK(4, 3) /* VLAN Modification Control */
+#define GSWIP_FDMA_PCTRL_VLANMOD_SHIFT 3 /* VLAN Modification Control */
+#define GSWIP_FDMA_PCTRL_VLANMOD_DIS (0x0 << GSWIP_FDMA_PCTRL_VLANMOD_SHIFT)
+#define GSWIP_FDMA_PCTRL_VLANMOD_PRIO (0x1 << GSWIP_FDMA_PCTRL_VLANMOD_SHIFT)
+#define GSWIP_FDMA_PCTRL_VLANMOD_ID (0x2 << GSWIP_FDMA_PCTRL_VLANMOD_SHIFT)
+#define GSWIP_FDMA_PCTRL_VLANMOD_BOTH (0x3 << GSWIP_FDMA_PCTRL_VLANMOD_SHIFT)
+
+/* Ethernet Switch Store DMA Port Control Register */
+#define GSWIP_SDMA_PCTRLp(p) (0xBC0 + ((p) * 0x6))
+#define GSWIP_SDMA_PCTRL_EN BIT(0) /* SDMA Port Enable */
+#define GSWIP_SDMA_PCTRL_FCEN BIT(1) /* Flow Control Enable */
+#define GSWIP_SDMA_PCTRL_PAUFWD BIT(1) /* Pause Frame Forwarding */
+
+#define XRX200_GPHY_FW_ALIGN (16 * 1024)
+
+struct gswip_hw_info {
+ int max_ports;
+ int cpu_port;
+};
+
+struct xway_gphy_match_data {
+ char *fe_firmware_name;
+ char *ge_firmware_name;
+};
+
+struct gswip_gphy_fw {
+ struct clk *clk_gate;
+ struct reset_control *reset;
+ u32 fw_addr_offset;
+ char *fw_name;
+};
+
+struct gswip_priv {
+ __iomem void *gswip;
+ __iomem void *mdio;
+ __iomem void *mii;
+ const struct gswip_hw_info *hw_info;
+ const struct xway_gphy_match_data *gphy_fw_name_cfg;
+ struct dsa_switch *ds;
+ struct device *dev;
+ struct regmap *rcu_regmap;
+ int num_gphy_fw;
+ struct gswip_gphy_fw *gphy_fw;
+};
+
+struct gswip_rmon_cnt_desc {
+ unsigned int size;
+ unsigned int offset;
+ const char *name;
+};
+
+#define MIB_DESC(_size, _offset, _name) {.size = _size, .offset = _offset, .name = _name}
+
+static const struct gswip_rmon_cnt_desc gswip_rmon_cnt[] = {
+ /** Receive Packet Count (only packets that are accepted and not discarded). */
+ MIB_DESC(1, 0x1F, "RxGoodPkts"),
+ MIB_DESC(1, 0x23, "RxUnicastPkts"),
+ MIB_DESC(1, 0x22, "RxMulticastPkts"),
+ MIB_DESC(1, 0x21, "RxFCSErrorPkts"),
+ MIB_DESC(1, 0x1D, "RxUnderSizeGoodPkts"),
+ MIB_DESC(1, 0x1E, "RxUnderSizeErrorPkts"),
+ MIB_DESC(1, 0x1B, "RxOversizeGoodPkts"),
+ MIB_DESC(1, 0x1C, "RxOversizeErrorPkts"),
+ MIB_DESC(1, 0x20, "RxGoodPausePkts"),
+ MIB_DESC(1, 0x1A, "RxAlignErrorPkts"),
+ MIB_DESC(1, 0x12, "Rx64BytePkts"),
+ MIB_DESC(1, 0x13, "Rx127BytePkts"),
+ MIB_DESC(1, 0x14, "Rx255BytePkts"),
+ MIB_DESC(1, 0x15, "Rx511BytePkts"),
+ MIB_DESC(1, 0x16, "Rx1023BytePkts"),
+ /** Receive Size 1024-1522 (or more, if configured) Packet Count. */
+ MIB_DESC(1, 0x17, "RxMaxBytePkts"),
+ MIB_DESC(1, 0x18, "RxDroppedPkts"),
+ MIB_DESC(1, 0x19, "RxFilteredPkts"),
+ MIB_DESC(2, 0x24, "RxGoodBytes"),
+ MIB_DESC(2, 0x26, "RxBadBytes"),
+ MIB_DESC(1, 0x11, "TxAcmDroppedPkts"),
+ MIB_DESC(1, 0x0C, "TxGoodPkts"),
+ MIB_DESC(1, 0x06, "TxUnicastPkts"),
+ MIB_DESC(1, 0x07, "TxMulticastPkts"),
+ MIB_DESC(1, 0x00, "Tx64BytePkts"),
+ MIB_DESC(1, 0x01, "Tx127BytePkts"),
+ MIB_DESC(1, 0x02, "Tx255BytePkts"),
+ MIB_DESC(1, 0x03, "Tx511BytePkts"),
+ MIB_DESC(1, 0x04, "Tx1023BytePkts"),
+ /** Transmit Size 1024-1522 (or more, if configured) Packet Count. */
+ MIB_DESC(1, 0x05, "TxMaxBytePkts"),
+ MIB_DESC(1, 0x08, "TxSingleCollCount"),
+ MIB_DESC(1, 0x09, "TxMultCollCount"),
+ MIB_DESC(1, 0x0A, "TxLateCollCount"),
+ MIB_DESC(1, 0x0B, "TxExcessCollCount"),
+ MIB_DESC(1, 0x0D, "TxPauseCount"),
+ MIB_DESC(1, 0x10, "TxDroppedPkts"),
+ MIB_DESC(2, 0x0E, "TxGoodBytes"),
+};
+
+static u32 gswip_switch_r(struct gswip_priv *priv, u32 offset)
+{
+ return __raw_readl(priv->gswip + (offset * 4));
+}
+
+static void gswip_switch_w(struct gswip_priv *priv, u32 val, u32 offset)
+{
+ __raw_writel(val, priv->gswip + (offset * 4));
+}
+
+static void gswip_switch_mask(struct gswip_priv *priv, u32 clear, u32 set,
+ u32 offset)
+{
+ u32 val = gswip_switch_r(priv, offset);
+
+ val &= ~(clear);
+ val |= set;
+ gswip_switch_w(priv, val, offset);
+}
+
+static u32 gswip_switch_r_timeout(struct gswip_priv *priv, u32 offset,
+ u32 cleared)
+{
+ u32 val;
+
+ return readx_poll_timeout(__raw_readl, priv->gswip + (offset * 4), val,
+ (val & cleared) == 0, 20, 50000);
+}
+
+static u32 gswip_mdio_r(struct gswip_priv *priv, u32 offset)
+{
+ return __raw_readl(priv->mdio + (offset * 4));
+}
+
+static void gswip_mdio_w(struct gswip_priv *priv, u32 val, u32 offset)
+{
+ __raw_writel(val, priv->mdio + (offset * 4));
+}
+
+static void gswip_mdio_mask(struct gswip_priv *priv, u32 clear, u32 set,
+ u32 offset)
+{
+ u32 val = gswip_mdio_r(priv, offset);
+
+ val &= ~(clear);
+ val |= set;
+ gswip_mdio_w(priv, val, offset);
+}
+
+static u32 gswip_mii_r(struct gswip_priv *priv, u32 offset)
+{
+ return __raw_readl(priv->mii + (offset * 4));
+}
+
+static void gswip_mii_w(struct gswip_priv *priv, u32 val, u32 offset)
+{
+ __raw_writel(val, priv->mii + (offset * 4));
+}
+
+static void gswip_mii_mask(struct gswip_priv *priv, u32 clear, u32 set,
+ u32 offset)
+{
+ u32 val = gswip_mii_r(priv, offset);
+
+ val &= ~(clear);
+ val |= set;
+ gswip_mii_w(priv, val, offset);
+}
+
+static void gswip_mii_mask_cfg(struct gswip_priv *priv, u32 clear, u32 set,
+ int port)
+{
+ switch (port) {
+ case 0:
+ gswip_mii_mask(priv, clear, set, GSWIP_MII_CFG0);
+ break;
+ case 1:
+ gswip_mii_mask(priv, clear, set, GSWIP_MII_CFG1);
+ break;
+ case 5:
+ gswip_mii_mask(priv, clear, set, GSWIP_MII_CFG5);
+ break;
+ }
+}
+
+static void gswip_mii_mask_pcdu(struct gswip_priv *priv, u32 clear, u32 set,
+ int port)
+{
+ switch (port) {
+ case 0:
+ gswip_mii_mask(priv, clear, set, GSWIP_MII_PCDU0);
+ break;
+ case 1:
+ gswip_mii_mask(priv, clear, set, GSWIP_MII_PCDU1);
+ break;
+ case 5:
+ gswip_mii_mask(priv, clear, set, GSWIP_MII_PCDU5);
+ break;
+ }
+}
+
+static int gswip_mdio_poll(struct gswip_priv *priv)
+{
+ int cnt = 100;
+
+ while (likely(cnt--)) {
+ u32 ctrl = gswip_mdio_r(priv, GSWIP_MDIO_CTRL);
+
+ if ((ctrl & GSWIP_MDIO_CTRL_BUSY) == 0)
+ return 0;
+ usleep_range(20, 40);
+ }
+
+ return -ETIMEDOUT;
+}
+
+static int gswip_mdio_wr(struct mii_bus *bus, int addr, int reg, u16 val)
+{
+ struct gswip_priv *priv = bus->priv;
+ int err;
+
+ err = gswip_mdio_poll(priv);
+ if (err) {
+ dev_err(&bus->dev, "waiting for MDIO bus busy timed out\n");
+ return err;
+ }
+
+ gswip_mdio_w(priv, val, GSWIP_MDIO_WRITE);
+ gswip_mdio_w(priv, GSWIP_MDIO_CTRL_BUSY | GSWIP_MDIO_CTRL_WR |
+ ((addr & GSWIP_MDIO_CTRL_PHYAD_MASK) << GSWIP_MDIO_CTRL_PHYAD_SHIFT) |
+ (reg & GSWIP_MDIO_CTRL_REGAD_MASK),
+ GSWIP_MDIO_CTRL);
+
+ return 0;
+}
+
+static int gswip_mdio_rd(struct mii_bus *bus, int addr, int reg)
+{
+ struct gswip_priv *priv = bus->priv;
+ int err;
+
+ err = gswip_mdio_poll(priv);
+ if (err) {
+ dev_err(&bus->dev, "waiting for MDIO bus busy timed out\n");
+ return err;
+ }
+
+ gswip_mdio_w(priv, GSWIP_MDIO_CTRL_BUSY | GSWIP_MDIO_CTRL_RD |
+ ((addr & GSWIP_MDIO_CTRL_PHYAD_MASK) << GSWIP_MDIO_CTRL_PHYAD_SHIFT) |
+ (reg & GSWIP_MDIO_CTRL_REGAD_MASK),
+ GSWIP_MDIO_CTRL);
+
+ err = gswip_mdio_poll(priv);
+ if (err) {
+ dev_err(&bus->dev, "waiting for MDIO bus busy timed out\n");
+ return err;
+ }
+
+ return gswip_mdio_r(priv, GSWIP_MDIO_READ);
+}
+
+static int gswip_mdio(struct gswip_priv *priv, struct device_node *mdio_np)
+{
+ struct dsa_switch *ds = priv->ds;
+
+ ds->slave_mii_bus = devm_mdiobus_alloc(priv->dev);
+ if (!ds->slave_mii_bus)
+ return -ENOMEM;
+
+ ds->slave_mii_bus->priv = priv;
+ ds->slave_mii_bus->read = gswip_mdio_rd;
+ ds->slave_mii_bus->write = gswip_mdio_wr;
+ ds->slave_mii_bus->name = "lantiq,xrx200-mdio";
+ snprintf(ds->slave_mii_bus->id, MII_BUS_ID_SIZE, "%s-mii",
+ dev_name(priv->dev));
+ ds->slave_mii_bus->parent = priv->dev;
+ ds->slave_mii_bus->phy_mask = ~ds->phys_mii_mask;
+
+ return of_mdiobus_register(ds->slave_mii_bus, mdio_np);
+}
+
+static int gswip_port_enable(struct dsa_switch *ds, int port,
+ struct phy_device *phydev)
+{
+ struct gswip_priv *priv = ds->priv;
+
+ /* RMON Counter Enable for port */
+ gswip_switch_w(priv, GSWIP_BM_PCFG_CNTEN, GSWIP_BM_PCFGp(port));
+
+ /* enable port fetch/store dma & VLAN Modification */
+ gswip_switch_mask(priv, 0, GSWIP_FDMA_PCTRL_EN |
+ GSWIP_FDMA_PCTRL_VLANMOD_BOTH,
+ GSWIP_FDMA_PCTRLp(port));
+ gswip_switch_mask(priv, 0, GSWIP_SDMA_PCTRL_EN,
+ GSWIP_SDMA_PCTRLp(port));
+ gswip_switch_mask(priv, 0, GSWIP_PCE_PCTRL_0_INGRESS,
+ GSWIP_PCE_PCTRL_0p(port));
+
+ if (!dsa_is_cpu_port(ds, port)) {
+ u32 macconf = GSWIP_MDIO_PHY_LINK_AUTO |
+ GSWIP_MDIO_PHY_SPEED_AUTO |
+ GSWIP_MDIO_PHY_FDUP_AUTO |
+ GSWIP_MDIO_PHY_FCONTX_AUTO |
+ GSWIP_MDIO_PHY_FCONRX_AUTO |
+ (phydev->mdio.addr & GSWIP_MDIO_PHY_ADDR_MASK);
+
+ gswip_mdio_w(priv, macconf, GSWIP_MDIO_PHYp(port));
+ /* Activate MDIO auto polling */
+ gswip_mdio_mask(priv, 0, BIT(port), GSWIP_MDIO_MDC_CFG0);
+ }
+
+ return 0;
+}
+
+static void gswip_port_disable(struct dsa_switch *ds, int port,
+ struct phy_device *phy)
+{
+ struct gswip_priv *priv = ds->priv;
+
+ if (!dsa_is_cpu_port(ds, port)) {
+ gswip_mdio_mask(priv, GSWIP_MDIO_PHY_LINK_DOWN,
+ GSWIP_MDIO_PHY_LINK_MASK,
+ GSWIP_MDIO_PHYp(port));
+ /* Deactivate MDIO auto polling */
+ gswip_mdio_mask(priv, BIT(port), 0, GSWIP_MDIO_MDC_CFG0);
+ }
+
+ gswip_switch_mask(priv, GSWIP_FDMA_PCTRL_EN, 0,
+ GSWIP_FDMA_PCTRLp(port));
+ gswip_switch_mask(priv, GSWIP_SDMA_PCTRL_EN, 0,
+ GSWIP_SDMA_PCTRLp(port));
+}
+
+static int gswip_pce_load_microcode(struct gswip_priv *priv)
+{
+ int i;
+ int err;
+
+ gswip_switch_mask(priv, GSWIP_PCE_TBL_CTRL_ADDR_MASK |
+ GSWIP_PCE_TBL_CTRL_OPMOD_MASK,
+ GSWIP_PCE_TBL_CTRL_OPMOD_ADWR, GSWIP_PCE_TBL_CTRL);
+ gswip_switch_w(priv, 0, GSWIP_PCE_TBL_MASK);
+
+ for (i = 0; i < ARRAY_SIZE(gswip_pce_microcode); i++) {
+ gswip_switch_w(priv, i, GSWIP_PCE_TBL_ADDR);
+ gswip_switch_w(priv, gswip_pce_microcode[i].val_0,
+ GSWIP_PCE_TBL_VAL(0));
+ gswip_switch_w(priv, gswip_pce_microcode[i].val_1,
+ GSWIP_PCE_TBL_VAL(1));
+ gswip_switch_w(priv, gswip_pce_microcode[i].val_2,
+ GSWIP_PCE_TBL_VAL(2));
+ gswip_switch_w(priv, gswip_pce_microcode[i].val_3,
+ GSWIP_PCE_TBL_VAL(3));
+
+ /* start the table access: */
+ gswip_switch_mask(priv, 0, GSWIP_PCE_TBL_CTRL_BAS,
+ GSWIP_PCE_TBL_CTRL);
+ err = gswip_switch_r_timeout(priv, GSWIP_PCE_TBL_CTRL,
+ GSWIP_PCE_TBL_CTRL_BAS);
+ if (err)
+ return err;
+ }
+
+ /* tell the switch that the microcode is loaded */
+ gswip_switch_mask(priv, 0, GSWIP_PCE_GCTRL_0_MC_VALID,
+ GSWIP_PCE_GCTRL_0);
+
+ return 0;
+}
+
+static int gswip_setup(struct dsa_switch *ds)
+{
+ struct gswip_priv *priv = ds->priv;
+ unsigned int cpu_port = priv->hw_info->cpu_port;
+ int i;
+ int err;
+
+ gswip_switch_w(priv, GSWIP_SWRES_R0, GSWIP_SWRES);
+ usleep_range(5000, 10000);
+ gswip_switch_w(priv, 0, GSWIP_SWRES);
+
+ /* disable port fetch/store dma on all ports */
+ for (i = 0; i < priv->hw_info->max_ports; i++)
+ gswip_port_disable(ds, i, NULL);
+
+ /* enable Switch */
+ gswip_mdio_mask(priv, 0, GSWIP_MDIO_GLOB_ENABLE, GSWIP_MDIO_GLOB);
+
+ err = gswip_pce_load_microcode(priv);
+ if (err) {
+ dev_err(priv->dev, "writing PCE microcode failed, %i", err);
+ return err;
+ }
+
+ /* Default unknown Broadcast/Multicast/Unicast port maps */
+ gswip_switch_w(priv, BIT(cpu_port), GSWIP_PCE_PMAP1);
+ gswip_switch_w(priv, BIT(cpu_port), GSWIP_PCE_PMAP2);
+ gswip_switch_w(priv, BIT(cpu_port), GSWIP_PCE_PMAP3);
+
+ /* disable PHY auto polling */
+ gswip_mdio_w(priv, 0x0, GSWIP_MDIO_MDC_CFG0);
+ /* Configure the MDIO Clock 2.5 MHz */
+ gswip_mdio_mask(priv, 0xff, 0x09, GSWIP_MDIO_MDC_CFG1);
+
+ /* Disable the xMII link */
+ gswip_mii_mask_cfg(priv, GSWIP_MII_CFG_EN, 0, 0);
+ gswip_mii_mask_cfg(priv, GSWIP_MII_CFG_EN, 0, 1);
+ gswip_mii_mask_cfg(priv, GSWIP_MII_CFG_EN, 0, 5);
+
+ /* enable special tag insertion on cpu port */
+ gswip_switch_mask(priv, 0, GSWIP_FDMA_PCTRL_STEN,
+ GSWIP_FDMA_PCTRLp(cpu_port));
+
+ gswip_switch_mask(priv, 0, GSWIP_MAC_CTRL_2_MLEN,
+ GSWIP_MAC_CTRL_2p(cpu_port));
+ gswip_switch_w(priv, VLAN_ETH_FRAME_LEN + 8, GSWIP_MAC_FLEN);
+ gswip_switch_mask(priv, 0, GSWIP_BM_QUEUE_GCTRL_GL_MOD,
+ GSWIP_BM_QUEUE_GCTRL);
+
+ /* VLAN aware Switching */
+ gswip_switch_mask(priv, 0, GSWIP_PCE_GCTRL_0_VLAN, GSWIP_PCE_GCTRL_0);
+
+ /* Mac Address Table Lock */
+ gswip_switch_mask(priv, 0, GSWIP_PCE_GCTRL_1_MAC_GLOCK |
+ GSWIP_PCE_GCTRL_1_MAC_GLOCK_MOD,
+ GSWIP_PCE_GCTRL_1);
+
+ gswip_port_enable(ds, cpu_port, NULL);
+ return 0;
+}
+
+static enum dsa_tag_protocol gswip_get_tag_protocol(struct dsa_switch *ds,
+ int port)
+{
+ return DSA_TAG_PROTO_GSWIP;
+}
+
+static void gswip_phylink_validate(struct dsa_switch *ds, int port,
+ unsigned long *supported,
+ struct phylink_link_state *state)
+{
+ __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
+
+ switch (port) {
+ case 0:
+ case 1:
+ if (!phy_interface_mode_is_rgmii(state->interface) &&
+ state->interface != PHY_INTERFACE_MODE_MII &&
+ state->interface != PHY_INTERFACE_MODE_REVMII &&
+ state->interface != PHY_INTERFACE_MODE_RMII) {
+ bitmap_zero(supported, __ETHTOOL_LINK_MODE_MASK_NBITS);
+ dev_err(ds->dev,
+ "Unsupported interface: %d\n", state->interface);
+ return;
+ }
+ break;
+ case 2:
+ case 3:
+ case 4:
+ if (state->interface != PHY_INTERFACE_MODE_INTERNAL) {
+ bitmap_zero(supported, __ETHTOOL_LINK_MODE_MASK_NBITS);
+ dev_err(ds->dev,
+ "Unsupported interface: %d\n", state->interface);
+ return;
+ }
+ break;
+ case 5:
+ if (!phy_interface_mode_is_rgmii(state->interface) &&
+ state->interface != PHY_INTERFACE_MODE_INTERNAL) {
+ bitmap_zero(supported, __ETHTOOL_LINK_MODE_MASK_NBITS);
+ dev_err(ds->dev,
+ "Unsupported interface: %d\n", state->interface);
+ return;
+ }
+ break;
+ }
+
+ /* Allow all the expected bits */
+ phylink_set(mask, Autoneg);
+ phylink_set_port_modes(mask);
+ phylink_set(mask, Pause);
+ phylink_set(mask, Asym_Pause);
+
+ /* With the exclusion of MII and Reverse MII, we support Gigabit,
+ * including Half duplex
+ */
+ if (state->interface != PHY_INTERFACE_MODE_MII &&
+ state->interface != PHY_INTERFACE_MODE_REVMII) {
+ phylink_set(mask, 1000baseT_Full);
+ phylink_set(mask, 1000baseT_Half);
+ }
+
+ phylink_set(mask, 10baseT_Half);
+ phylink_set(mask, 10baseT_Full);
+ phylink_set(mask, 100baseT_Half);
+ phylink_set(mask, 100baseT_Full);
+
+ bitmap_and(supported, supported, mask,
+ __ETHTOOL_LINK_MODE_MASK_NBITS);
+ bitmap_and(state->advertising, state->advertising, mask,
+ __ETHTOOL_LINK_MODE_MASK_NBITS);
+}
+
+static void gswip_phylink_mac_config(struct dsa_switch *ds, int port,
+ unsigned int mode,
+ const struct phylink_link_state *state)
+{
+ struct gswip_priv *priv = ds->priv;
+ u32 miicfg = 0;
+
+ miicfg |= GSWIP_MII_CFG_LDCLKDIS;
+
+ switch (state->interface) {
+ case PHY_INTERFACE_MODE_MII:
+ case PHY_INTERFACE_MODE_INTERNAL:
+ miicfg |= GSWIP_MII_CFG_MODE_MIIM;
+ break;
+ case PHY_INTERFACE_MODE_REVMII:
+ miicfg |= GSWIP_MII_CFG_MODE_MIIP;
+ break;
+ case PHY_INTERFACE_MODE_RMII:
+ miicfg |= GSWIP_MII_CFG_MODE_RMIIM;
+ break;
+ case PHY_INTERFACE_MODE_RGMII:
+ case PHY_INTERFACE_MODE_RGMII_ID:
+ case PHY_INTERFACE_MODE_RGMII_RXID:
+ case PHY_INTERFACE_MODE_RGMII_TXID:
+ miicfg |= GSWIP_MII_CFG_MODE_RGMII;
+ break;
+ default:
+ dev_err(ds->dev,
+ "Unsupported interface: %d\n", state->interface);
+ return;
+ }
+ gswip_mii_mask_cfg(priv, GSWIP_MII_CFG_MODE_MASK, miicfg, port);
+
+ switch (state->interface) {
+ case PHY_INTERFACE_MODE_RGMII_ID:
+ gswip_mii_mask_pcdu(priv, GSWIP_MII_PCDU_TXDLY_MASK |
+ GSWIP_MII_PCDU_RXDLY_MASK, 0, port);
+ break;
+ case PHY_INTERFACE_MODE_RGMII_RXID:
+ gswip_mii_mask_pcdu(priv, GSWIP_MII_PCDU_RXDLY_MASK, 0, port);
+ break;
+ case PHY_INTERFACE_MODE_RGMII_TXID:
+ gswip_mii_mask_pcdu(priv, GSWIP_MII_PCDU_TXDLY_MASK, 0, port);
+ break;
+ default:
+ break;
+ }
+}
+
+static void gswip_phylink_mac_link_down(struct dsa_switch *ds, int port,
+ unsigned int mode,
+ phy_interface_t interface)
+{
+ struct gswip_priv *priv = ds->priv;
+
+ gswip_mii_mask_cfg(priv, GSWIP_MII_CFG_EN, 0, port);
+}
+
+static void gswip_phylink_mac_link_up(struct dsa_switch *ds, int port,
+ unsigned int mode,
+ phy_interface_t interface,
+ struct phy_device *phydev)
+{
+ struct gswip_priv *priv = ds->priv;
+
+ /* Enable the xMII interface only for the external PHY */
+ if (interface != PHY_INTERFACE_MODE_INTERNAL)
+ gswip_mii_mask_cfg(priv, 0, GSWIP_MII_CFG_EN, port);
+}
+
+static void gswip_get_strings(struct dsa_switch *ds, int port, u32 stringset,
+ uint8_t *data)
+{
+ int i;
+
+ if (stringset != ETH_SS_STATS)
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(gswip_rmon_cnt); i++)
+ strncpy(data + i * ETH_GSTRING_LEN, gswip_rmon_cnt[i].name,
+ ETH_GSTRING_LEN);
+}
+
+static u32 gswip_bcm_ram_entry_read(struct gswip_priv *priv, u32 table,
+ u32 index)
+{
+ u32 result;
+ int err;
+
+ gswip_switch_w(priv, index, GSWIP_BM_RAM_ADDR);
+ gswip_switch_mask(priv, GSWIP_BM_RAM_CTRL_ADDR_MASK |
+ GSWIP_BM_RAM_CTRL_OPMOD,
+ table | GSWIP_BM_RAM_CTRL_BAS,
+ GSWIP_BM_RAM_CTRL);
+
+ err = gswip_switch_r_timeout(priv, GSWIP_BM_RAM_CTRL,
+ GSWIP_BM_RAM_CTRL_BAS);
+ if (err) {
+ dev_err(priv->dev, "timeout while reading table: %u, index: %u",
+ table, index);
+ return 0;
+ }
+
+ result = gswip_switch_r(priv, GSWIP_BM_RAM_VAL(0));
+ result |= gswip_switch_r(priv, GSWIP_BM_RAM_VAL(1)) << 16;
+
+ return result;
+}
+
+static void gswip_get_ethtool_stats(struct dsa_switch *ds, int port,
+ uint64_t *data)
+{
+ struct gswip_priv *priv = ds->priv;
+ const struct gswip_rmon_cnt_desc *rmon_cnt;
+ int i;
+ u64 high;
+
+ for (i = 0; i < ARRAY_SIZE(gswip_rmon_cnt); i++) {
+ rmon_cnt = &gswip_rmon_cnt[i];
+
+ data[i] = gswip_bcm_ram_entry_read(priv, port,
+ rmon_cnt->offset);
+ if (rmon_cnt->size == 2) {
+ high = gswip_bcm_ram_entry_read(priv, port,
+ rmon_cnt->offset + 1);
+ data[i] |= high << 32;
+ }
+ }
+}
+
+static int gswip_get_sset_count(struct dsa_switch *ds, int port, int sset)
+{
+ if (sset != ETH_SS_STATS)
+ return 0;
+
+ return ARRAY_SIZE(gswip_rmon_cnt);
+}
+
+static const struct dsa_switch_ops gswip_switch_ops = {
+ .get_tag_protocol = gswip_get_tag_protocol,
+ .setup = gswip_setup,
+ .port_enable = gswip_port_enable,
+ .port_disable = gswip_port_disable,
+ .phylink_validate = gswip_phylink_validate,
+ .phylink_mac_config = gswip_phylink_mac_config,
+ .phylink_mac_link_down = gswip_phylink_mac_link_down,
+ .phylink_mac_link_up = gswip_phylink_mac_link_up,
+ .get_strings = gswip_get_strings,
+ .get_ethtool_stats = gswip_get_ethtool_stats,
+ .get_sset_count = gswip_get_sset_count,
+};
+
+static const struct xway_gphy_match_data xrx200a1x_gphy_data = {
+ .fe_firmware_name = "lantiq/xrx200_phy22f_a14.bin",
+ .ge_firmware_name = "lantiq/xrx200_phy11g_a14.bin",
+};
+
+static const struct xway_gphy_match_data xrx200a2x_gphy_data = {
+ .fe_firmware_name = "lantiq/xrx200_phy22f_a22.bin",
+ .ge_firmware_name = "lantiq/xrx200_phy11g_a22.bin",
+};
+
+static const struct xway_gphy_match_data xrx300_gphy_data = {
+ .fe_firmware_name = "lantiq/xrx300_phy22f_a21.bin",
+ .ge_firmware_name = "lantiq/xrx300_phy11g_a21.bin",
+};
+
+static const struct of_device_id xway_gphy_match[] = {
+ { .compatible = "lantiq,xrx200-gphy-fw", .data = NULL },
+ { .compatible = "lantiq,xrx200a1x-gphy-fw", .data = &xrx200a1x_gphy_data },
+ { .compatible = "lantiq,xrx200a2x-gphy-fw", .data = &xrx200a2x_gphy_data },
+ { .compatible = "lantiq,xrx300-gphy-fw", .data = &xrx300_gphy_data },
+ { .compatible = "lantiq,xrx330-gphy-fw", .data = &xrx300_gphy_data },
+ {},
+};
+
+static int gswip_gphy_fw_load(struct gswip_priv *priv, struct gswip_gphy_fw *gphy_fw)
+{
+ struct device *dev = priv->dev;
+ const struct firmware *fw;
+ void *fw_addr;
+ dma_addr_t dma_addr;
+ dma_addr_t dev_addr;
+ size_t size;
+ int ret;
+
+ ret = clk_prepare_enable(gphy_fw->clk_gate);
+ if (ret)
+ return ret;
+
+ reset_control_assert(gphy_fw->reset);
+
+ ret = request_firmware(&fw, gphy_fw->fw_name, dev);
+ if (ret) {
+ dev_err(dev, "failed to load firmware: %s, error: %i\n",
+ gphy_fw->fw_name, ret);
+ return ret;
+ }
+
+ /* GPHY cores need the firmware code in a persistent and contiguous
+ * memory area with a 16 kB boundary aligned start address.
+ */
+ size = fw->size + XRX200_GPHY_FW_ALIGN;
+
+ fw_addr = dmam_alloc_coherent(dev, size, &dma_addr, GFP_KERNEL);
+ if (fw_addr) {
+ fw_addr = PTR_ALIGN(fw_addr, XRX200_GPHY_FW_ALIGN);
+ dev_addr = ALIGN(dma_addr, XRX200_GPHY_FW_ALIGN);
+ memcpy(fw_addr, fw->data, fw->size);
+ } else {
+ dev_err(dev, "failed to alloc firmware memory\n");
+ release_firmware(fw);
+ return -ENOMEM;
+ }
+
+ release_firmware(fw);
+
+ ret = regmap_write(priv->rcu_regmap, gphy_fw->fw_addr_offset, dev_addr);
+ if (ret)
+ return ret;
+
+ reset_control_deassert(gphy_fw->reset);
+
+ return ret;
+}
+
+static int gswip_gphy_fw_probe(struct gswip_priv *priv,
+ struct gswip_gphy_fw *gphy_fw,
+ struct device_node *gphy_fw_np, int i)
+{
+ struct device *dev = priv->dev;
+ u32 gphy_mode;
+ int ret;
+ char gphyname[10];
+
+ snprintf(gphyname, sizeof(gphyname), "gphy%d", i);
+
+ gphy_fw->clk_gate = devm_clk_get(dev, gphyname);
+ if (IS_ERR(gphy_fw->clk_gate)) {
+ dev_err(dev, "Failed to lookup gate clock\n");
+ return PTR_ERR(gphy_fw->clk_gate);
+ }
+
+ ret = of_property_read_u32(gphy_fw_np, "reg", &gphy_fw->fw_addr_offset);
+ if (ret)
+ return ret;
+
+ ret = of_property_read_u32(gphy_fw_np, "lantiq,gphy-mode", &gphy_mode);
+ /* Default to GE mode */
+ if (ret)
+ gphy_mode = GPHY_MODE_GE;
+
+ switch (gphy_mode) {
+ case GPHY_MODE_FE:
+ gphy_fw->fw_name = priv->gphy_fw_name_cfg->fe_firmware_name;
+ break;
+ case GPHY_MODE_GE:
+ gphy_fw->fw_name = priv->gphy_fw_name_cfg->ge_firmware_name;
+ break;
+ default:
+ dev_err(dev, "Unknown GPHY mode %d\n", gphy_mode);
+ return -EINVAL;
+ }
+
+ gphy_fw->reset = of_reset_control_array_get_exclusive(gphy_fw_np);
+ if (IS_ERR(priv->gphy_fw)) {
+ if (PTR_ERR(priv->gphy_fw) != -EPROBE_DEFER)
+ dev_err(dev, "Failed to lookup gphy reset\n");
+ return PTR_ERR(priv->gphy_fw);
+ }
+
+ return gswip_gphy_fw_load(priv, gphy_fw);
+}
+
+static void gswip_gphy_fw_remove(struct gswip_priv *priv,
+ struct gswip_gphy_fw *gphy_fw)
+{
+ int ret;
+
+ /* check if the device was fully probed */
+ if (!gphy_fw->fw_name)
+ return;
+
+ ret = regmap_write(priv->rcu_regmap, gphy_fw->fw_addr_offset, 0);
+ if (ret)
+ dev_err(priv->dev, "can not reset GPHY FW pointer");
+
+ clk_disable_unprepare(gphy_fw->clk_gate);
+
+ reset_control_put(gphy_fw->reset);
+}
+
+static int gswip_gphy_fw_list(struct gswip_priv *priv,
+ struct device_node *gphy_fw_list_np, u32 version)
+{
+ struct device *dev = priv->dev;
+ struct device_node *gphy_fw_np;
+ const struct of_device_id *match;
+ int err;
+ int i = 0;
+
+ /* The The VRX200 rev 1.1 uses the GSWIP 2.0 and needs the older
+ * GPHY firmware. The VRX200 rev 1.2 uses the GSWIP 2.1 and also
+ * needs a different GPHY firmware.
+ */
+ if (of_device_is_compatible(gphy_fw_list_np, "lantiq,xrx200-gphy-fw")) {
+ switch (version) {
+ case GSWIP_VERSION_2_0:
+ priv->gphy_fw_name_cfg = &xrx200a1x_gphy_data;
+ break;
+ case GSWIP_VERSION_2_1:
+ priv->gphy_fw_name_cfg = &xrx200a2x_gphy_data;
+ break;
+ default:
+ dev_err(dev, "unknown GSWIP version: 0x%x", version);
+ return -ENOENT;
+ }
+ }
+
+ match = of_match_node(xway_gphy_match, gphy_fw_list_np);
+ if (match && match->data)
+ priv->gphy_fw_name_cfg = match->data;
+
+ if (!priv->gphy_fw_name_cfg) {
+ dev_err(dev, "GPHY compatible type not supported");
+ return -ENOENT;
+ }
+
+ priv->num_gphy_fw = of_get_available_child_count(gphy_fw_list_np);
+ if (!priv->num_gphy_fw)
+ return -ENOENT;
+
+ priv->rcu_regmap = syscon_regmap_lookup_by_phandle(gphy_fw_list_np,
+ "lantiq,rcu");
+ if (IS_ERR(priv->rcu_regmap))
+ return PTR_ERR(priv->rcu_regmap);
+
+ priv->gphy_fw = devm_kmalloc_array(dev, priv->num_gphy_fw,
+ sizeof(*priv->gphy_fw),
+ GFP_KERNEL | __GFP_ZERO);
+ if (!priv->gphy_fw)
+ return -ENOMEM;
+
+ for_each_available_child_of_node(gphy_fw_list_np, gphy_fw_np) {
+ err = gswip_gphy_fw_probe(priv, &priv->gphy_fw[i],
+ gphy_fw_np, i);
+ if (err)
+ goto remove_gphy;
+ i++;
+ }
+
+ return 0;
+
+remove_gphy:
+ for (i = 0; i < priv->num_gphy_fw; i++)
+ gswip_gphy_fw_remove(priv, &priv->gphy_fw[i]);
+ return err;
+}
+
+static int gswip_probe(struct platform_device *pdev)
+{
+ struct gswip_priv *priv;
+ struct resource *gswip_res, *mdio_res, *mii_res;
+ struct device_node *mdio_np, *gphy_fw_np;
+ struct device *dev = &pdev->dev;
+ int err;
+ int i;
+ u32 version;
+
+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ gswip_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ priv->gswip = devm_ioremap_resource(dev, gswip_res);
+ if (!priv->gswip)
+ return -ENOMEM;
+
+ mdio_res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+ priv->mdio = devm_ioremap_resource(dev, mdio_res);
+ if (!priv->mdio)
+ return -ENOMEM;
+
+ mii_res = platform_get_resource(pdev, IORESOURCE_MEM, 2);
+ priv->mii = devm_ioremap_resource(dev, mii_res);
+ if (!priv->mii)
+ return -ENOMEM;
+
+ priv->hw_info = of_device_get_match_data(dev);
+ if (!priv->hw_info)
+ return -EINVAL;
+
+ priv->ds = dsa_switch_alloc(dev, priv->hw_info->max_ports);
+ if (!priv->ds)
+ return -ENOMEM;
+
+ priv->ds->priv = priv;
+ priv->ds->ops = &gswip_switch_ops;
+ priv->dev = dev;
+ version = gswip_switch_r(priv, GSWIP_VERSION);
+
+ /* bring up the mdio bus */
+ gphy_fw_np = of_find_compatible_node(pdev->dev.of_node, NULL,
+ "lantiq,gphy-fw");
+ if (gphy_fw_np) {
+ err = gswip_gphy_fw_list(priv, gphy_fw_np, version);
+ if (err) {
+ dev_err(dev, "gphy fw probe failed\n");
+ return err;
+ }
+ }
+
+ /* bring up the mdio bus */
+ mdio_np = of_find_compatible_node(pdev->dev.of_node, NULL,
+ "lantiq,xrx200-mdio");
+ if (mdio_np) {
+ err = gswip_mdio(priv, mdio_np);
+ if (err) {
+ dev_err(dev, "mdio probe failed\n");
+ goto gphy_fw;
+ }
+ }
+
+ err = dsa_register_switch(priv->ds);
+ if (err) {
+ dev_err(dev, "dsa switch register failed: %i\n", err);
+ goto mdio_bus;
+ }
+ if (priv->ds->dst->cpu_dp->index != priv->hw_info->cpu_port) {
+ dev_err(dev, "wrong CPU port defined, HW only supports port: %i",
+ priv->hw_info->cpu_port);
+ err = -EINVAL;
+ goto mdio_bus;
+ }
+
+ platform_set_drvdata(pdev, priv);
+
+ dev_info(dev, "probed GSWIP version %lx mod %lx\n",
+ (version & GSWIP_VERSION_REV_MASK) >> GSWIP_VERSION_REV_SHIFT,
+ (version & GSWIP_VERSION_MOD_MASK) >> GSWIP_VERSION_MOD_SHIFT);
+ return 0;
+
+mdio_bus:
+ if (mdio_np)
+ mdiobus_unregister(priv->ds->slave_mii_bus);
+gphy_fw:
+ for (i = 0; i < priv->num_gphy_fw; i++)
+ gswip_gphy_fw_remove(priv, &priv->gphy_fw[i]);
+ return err;
+}
+
+static int gswip_remove(struct platform_device *pdev)
+{
+ struct gswip_priv *priv = platform_get_drvdata(pdev);
+ int i;
+
+ if (!priv)
+ return 0;
+
+ /* disable the switch */
+ gswip_mdio_mask(priv, GSWIP_MDIO_GLOB_ENABLE, 0, GSWIP_MDIO_GLOB);
+
+ dsa_unregister_switch(priv->ds);
+
+ if (priv->ds->slave_mii_bus)
+ mdiobus_unregister(priv->ds->slave_mii_bus);
+
+ for (i = 0; i < priv->num_gphy_fw; i++)
+ gswip_gphy_fw_remove(priv, &priv->gphy_fw[i]);
+
+ return 0;
+}
+
+static const struct gswip_hw_info gswip_xrx200 = {
+ .max_ports = 7,
+ .cpu_port = 6,
+};
+
+static const struct of_device_id gswip_of_match[] = {
+ { .compatible = "lantiq,xrx200-gswip", .data = &gswip_xrx200 },
+ {},
+};
+MODULE_DEVICE_TABLE(of, gswip_of_match);
+
+static struct platform_driver gswip_driver = {
+ .probe = gswip_probe,
+ .remove = gswip_remove,
+ .driver = {
+ .name = "gswip",
+ .of_match_table = gswip_of_match,
+ },
+};
+
+module_platform_driver(gswip_driver);
+
+MODULE_AUTHOR("Hauke Mehrtens <hauke@hauke-m.de>");
+MODULE_DESCRIPTION("Lantiq / Intel GSWIP driver");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/net/dsa/lantiq_pce.h b/drivers/net/dsa/lantiq_pce.h
new file mode 100644
index 000000000000..180663138e75
--- /dev/null
+++ b/drivers/net/dsa/lantiq_pce.h
@@ -0,0 +1,153 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PCE microcode extracted from UGW 7.1.1 switch api
+ *
+ * Copyright (c) 2012, 2014, 2015 Lantiq Deutschland GmbH
+ * Copyright (C) 2012 John Crispin <john@phrozen.org>
+ * Copyright (C) 2017 - 2018 Hauke Mehrtens <hauke@hauke-m.de>
+ */
+
+enum {
+ OUT_MAC0 = 0,
+ OUT_MAC1,
+ OUT_MAC2,
+ OUT_MAC3,
+ OUT_MAC4,
+ OUT_MAC5,
+ OUT_ETHTYP,
+ OUT_VTAG0,
+ OUT_VTAG1,
+ OUT_ITAG0,
+ OUT_ITAG1, /*10 */
+ OUT_ITAG2,
+ OUT_ITAG3,
+ OUT_IP0,
+ OUT_IP1,
+ OUT_IP2,
+ OUT_IP3,
+ OUT_SIP0,
+ OUT_SIP1,
+ OUT_SIP2,
+ OUT_SIP3, /*20*/
+ OUT_SIP4,
+ OUT_SIP5,
+ OUT_SIP6,
+ OUT_SIP7,
+ OUT_DIP0,
+ OUT_DIP1,
+ OUT_DIP2,
+ OUT_DIP3,
+ OUT_DIP4,
+ OUT_DIP5, /*30*/
+ OUT_DIP6,
+ OUT_DIP7,
+ OUT_SESID,
+ OUT_PROT,
+ OUT_APP0,
+ OUT_APP1,
+ OUT_IGMP0,
+ OUT_IGMP1,
+ OUT_IPOFF, /*39*/
+ OUT_NONE = 63,
+};
+
+/* parser's microcode length type */
+#define INSTR 0
+#define IPV6 1
+#define LENACCU 2
+
+/* parser's microcode flag type */
+enum {
+ FLAG_ITAG = 0,
+ FLAG_VLAN,
+ FLAG_SNAP,
+ FLAG_PPPOE,
+ FLAG_IPV6,
+ FLAG_IPV6FL,
+ FLAG_IPV4,
+ FLAG_IGMP,
+ FLAG_TU,
+ FLAG_HOP,
+ FLAG_NN1, /*10 */
+ FLAG_NN2,
+ FLAG_END,
+ FLAG_NO, /*13*/
+};
+
+struct gswip_pce_microcode {
+ u16 val_3;
+ u16 val_2;
+ u16 val_1;
+ u16 val_0;
+};
+
+#define MC_ENTRY(val, msk, ns, out, len, type, flags, ipv4_len) \
+ { val, msk, ((ns) << 10 | (out) << 4 | (len) >> 1),\
+ ((len) & 1) << 15 | (type) << 13 | (flags) << 9 | (ipv4_len) << 8 }
+static const struct gswip_pce_microcode gswip_pce_microcode[] = {
+ /* value mask ns fields L type flags ipv4_len */
+ MC_ENTRY(0x88c3, 0xFFFF, 1, OUT_ITAG0, 4, INSTR, FLAG_ITAG, 0),
+ MC_ENTRY(0x8100, 0xFFFF, 2, OUT_VTAG0, 2, INSTR, FLAG_VLAN, 0),
+ MC_ENTRY(0x88A8, 0xFFFF, 1, OUT_VTAG0, 2, INSTR, FLAG_VLAN, 0),
+ MC_ENTRY(0x8100, 0xFFFF, 1, OUT_VTAG0, 2, INSTR, FLAG_VLAN, 0),
+ MC_ENTRY(0x8864, 0xFFFF, 17, OUT_ETHTYP, 1, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0800, 0xFFFF, 21, OUT_ETHTYP, 1, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x86DD, 0xFFFF, 22, OUT_ETHTYP, 1, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x8863, 0xFFFF, 16, OUT_ETHTYP, 1, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0000, 0xF800, 10, OUT_NONE, 0, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0000, 0x0000, 40, OUT_ETHTYP, 1, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0600, 0x0600, 40, OUT_ETHTYP, 1, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0000, 0x0000, 12, OUT_NONE, 1, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0xAAAA, 0xFFFF, 14, OUT_NONE, 1, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0300, 0xFF00, 41, OUT_NONE, 0, INSTR, FLAG_SNAP, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_DIP7, 3, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0000, 0x0000, 18, OUT_DIP7, 3, INSTR, FLAG_PPPOE, 0),
+ MC_ENTRY(0x0021, 0xFFFF, 21, OUT_NONE, 1, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0057, 0xFFFF, 22, OUT_NONE, 1, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0000, 0x0000, 40, OUT_NONE, 0, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x4000, 0xF000, 24, OUT_IP0, 4, INSTR, FLAG_IPV4, 1),
+ MC_ENTRY(0x6000, 0xF000, 27, OUT_IP0, 3, INSTR, FLAG_IPV6, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0000, 0x0000, 25, OUT_IP3, 2, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0000, 0x0000, 26, OUT_SIP0, 4, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0000, 0x0000, 40, OUT_NONE, 0, LENACCU, FLAG_NO, 0),
+ MC_ENTRY(0x1100, 0xFF00, 39, OUT_PROT, 1, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0600, 0xFF00, 39, OUT_PROT, 1, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0000, 0xFF00, 33, OUT_IP3, 17, INSTR, FLAG_HOP, 0),
+ MC_ENTRY(0x2B00, 0xFF00, 33, OUT_IP3, 17, INSTR, FLAG_NN1, 0),
+ MC_ENTRY(0x3C00, 0xFF00, 33, OUT_IP3, 17, INSTR, FLAG_NN2, 0),
+ MC_ENTRY(0x0000, 0x0000, 39, OUT_PROT, 1, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0000, 0x00E0, 35, OUT_NONE, 0, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0000, 0x0000, 40, OUT_NONE, 0, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0000, 0xFF00, 33, OUT_NONE, 0, IPV6, FLAG_HOP, 0),
+ MC_ENTRY(0x2B00, 0xFF00, 33, OUT_NONE, 0, IPV6, FLAG_NN1, 0),
+ MC_ENTRY(0x3C00, 0xFF00, 33, OUT_NONE, 0, IPV6, FLAG_NN2, 0),
+ MC_ENTRY(0x0000, 0x0000, 40, OUT_PROT, 1, IPV6, FLAG_NO, 0),
+ MC_ENTRY(0x0000, 0x0000, 40, OUT_SIP0, 16, INSTR, FLAG_NO, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_APP0, 4, INSTR, FLAG_IGMP, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+ MC_ENTRY(0x0000, 0x0000, 41, OUT_NONE, 0, INSTR, FLAG_END, 0),
+};
--
2.11.0
^ permalink raw reply related
* Re: [PATCH net V2] ip: frags: fix crash in ip_do_fragment()
From: Eric Dumazet @ 2018-09-09 20:51 UTC (permalink / raw)
To: Taehee Yoo
Cc: David Miller, Peter Oskolkov, netdev, Pablo Neira Ayuso,
Florian Westphal
In-Reply-To: <CAMArcTWjDRy-LpEKXzv4Zs-MiMGC25SkzCmNkB5Ks35_MWhoKQ@mail.gmail.com>
> Hi Eric,
> Thank you for review!
>
> I think netfilter side ipv6 code change is needed
> because netfilter ipv6 defrag routine also set fp->ip_defrag_offset value
> so that fp->sk will not be NULL.
> And I think these crash in ip_do_fragment() and ip6_fragment() are
> actually same bug.
>
Right you are, thanks.
Reviewed-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply
* Re: [PATCH v4 3/3] IB/ipoib: Log sysfs 'dev_id' accesses from userspace
From: Arseny Maslennikov @ 2018-09-09 20:55 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: linux-rdma, Doug Ledford, netdev
In-Reply-To: <20180909181146.GA29632@cello.Dlink>
[-- Attachment #1: Type: text/plain, Size: 3764 bytes --]
On Sun, Sep 09, 2018 at 09:11:46PM +0300, Arseny Maslennikov wrote:
> On Fri, Sep 07, 2018 at 09:43:59AM -0600, Jason Gunthorpe wrote:
> > On Thu, Sep 06, 2018 at 05:51:12PM +0300, Arseny Maslennikov wrote:
> > > Some tools may currently be using only the deprecated attribute;
> > > let's print an elaborate and clear deprecation notice to kmsg.
> > >
> > > To do that, we have to replace the whole sysfs file, since we inherit
> > > the original one from netdev.
> > >
> > > Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru>
> > > drivers/infiniband/ulp/ipoib/ipoib_main.c | 31 +++++++++++++++++++++++
> > > 1 file changed, 31 insertions(+)
> > >
> > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > index 30f840f874b3..74732726ec6f 100644
> > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > @@ -2386,6 +2386,35 @@ int ipoib_add_pkey_attr(struct net_device *dev)
> > > return device_create_file(&dev->dev, &dev_attr_pkey);
> > > }
> > >
> > > +/*
> > > + * We erroneously exposed the iface's port number in the dev_id
> > > + * sysfs field long after dev_port was introduced for that purpose[1],
> > > + * and we need to stop everyone from relying on that.
> > > + * Let's overload the shower routine for the dev_id file here
> > > + * to gently bring the issue up.
> > > + *
> > > + * [1] https://www.spinics.net/lists/netdev/msg272123.html
> > > + */
> > > +static ssize_t dev_id_show(struct device *dev,
> > > + struct device_attribute *attr, char *buf)
> > > +{
> > > + struct net_device *ndev = to_net_dev(dev);
> > > +
> > > + if (ndev->dev_id == ndev->dev_port)
> > > + netdev_info_once(ndev,
> > > + "\"%s\" wants to know my dev_id. Should it look at dev_port instead? See Documentation/ABI/testing/sysfs-class-net for more info.\n",
> > > + current->comm);
> > > +
> > > + return sprintf(buf, "%#x\n", ndev->dev_id);
> > > +}
> > > +static DEVICE_ATTR_RO(dev_id);
> > > +
> > > +int ipoib_intercept_dev_id_attr(struct net_device *dev)
> > > +{
> > > + device_remove_file(&dev->dev, &dev_attr_dev_id);
> > > + return device_create_file(&dev->dev, &dev_attr_dev_id);
> > > +}
> >
> > Isn't this racey with userspace? Ie what happens if udev is querying
> > the dev_id right here?
>
> udev in particular does not use dev_id at all since 2014, because "why
> would we keep using dev_id if it is not the right thing to use?".
>
> >
> > Do we know there is no userspace doing this?
> >
>
> Not for sure.
>
> If we move all the sysfs handling stuff we introduce in _add_port():
> - pkey
> - umcast
> - {create,delete}_child
> - connected/datagram mode
> to _ndo_init(), which is called by register_netdev before it sends
> the netlink message, would that suffice to eliminate the race?
> (Sysfs files for {create,delete}_child go to _parent_init() then).
>
No, we can't, sorry for the noise. ndo_init() runs before the kobject
becomes available.
Anyway, our sysfs attributes being racy is unrelated to the patch series
subject, and I can't come up with any other ideas what to do with them
that do not involve adjustments to register_netdev.
> > > static struct net_device *ipoib_add_port(const char *format,
> > > struct ib_device *hca, u8 port)
> > > {
> > > @@ -2427,6 +2456,8 @@ static struct net_device *ipoib_add_port(const char *format,
> > > */
> > > ndev->priv_destructor = ipoib_intf_free;
> > >
> > > + if (ipoib_intercept_dev_id_attr(ndev))
> > > + goto sysfs_failed;
> >
> > No device_remove_file needed?
> >
>
> As far as I understand, unregister_netdevice takes care of it
> while destroying the related kobject.
>
> > Jason
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH net-next v2 3/5] virtio_ring: add packed ring support
From: Tiwei Bie @ 2018-09-10 2:03 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: virtio-dev, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180907083500-mutt-send-email-mst@kernel.org>
On Fri, Sep 07, 2018 at 09:49:14AM -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 11, 2018 at 10:27:09AM +0800, Tiwei Bie wrote:
> > This commit introduces the support (without EVENT_IDX) for
> > packed ring.
> >
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
> > drivers/virtio/virtio_ring.c | 495 ++++++++++++++++++++++++++++++++++-
> > 1 file changed, 487 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index c4f8abc7445a..f317b485ba54 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -55,12 +55,21 @@
> > #define END_USE(vq)
> > #endif
> >
> > +#define _VRING_DESC_F_AVAIL(b) ((__u16)(b) << 7)
> > +#define _VRING_DESC_F_USED(b) ((__u16)(b) << 15)
>
> Underscore followed by an upper case letter isn't a good idea
> for a symbol. And it's not nice that this does not
> use the VRING_DESC_F_USED from the header.
> How about ((b) ? VRING_DESC_F_USED : 0) instead?
> Is produced code worse then?
Yes, I think the produced code is worse when we use
conditional expression. Below is a simple test:
#define foo1(b) ((b) << 10)
#define foo2(b) ((b) ? (1 << 10) : 0)
unsigned short bar(unsigned short b)
{
return foo1(b);
}
unsigned short baz(unsigned short b)
{
return foo2(b);
}
With `gcc -O3 -S`, I got below assembly code:
.file "tmp.c"
.text
.p2align 4,,15
.globl bar
.type bar, @function
bar:
.LFB0:
.cfi_startproc
movl %edi, %eax
sall $10, %eax
ret
.cfi_endproc
.LFE0:
.size bar, .-bar
.p2align 4,,15
.globl baz
.type baz, @function
baz:
.LFB1:
.cfi_startproc
xorl %eax, %eax
testw %di, %di
setne %al
sall $10, %eax
ret
.cfi_endproc
.LFE1:
.size baz, .-baz
.ident "GCC: (Debian 8.2.0-4) 8.2.0"
.section .note.GNU-stack,"",@progbits
That is to say, for `((b) << 10)`, it will shift the
register directly. But for `((b) ? (1 << 10) : 0)`,
in above code, it will zero eax first, and set al
to 1 depend on whether di is 0, and shift eax.
>
> > +
> > struct vring_desc_state {
> > void *data; /* Data for callback. */
> > struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> > };
> >
> > struct vring_desc_state_packed {
> > + void *data; /* Data for callback. */
> > + struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
>
> Include vring_desc_state for these?
Sure.
>
> > + int num; /* Descriptor list length. */
> > + dma_addr_t addr; /* Buffer DMA addr. */
> > + u32 len; /* Buffer length. */
> > + u16 flags; /* Descriptor flags. */
>
> This seems only to be used for indirect. Check indirect field above
> instead and drop this.
The `flags` is also needed to know the direction, i.e. DMA_FROM_DEVICE
or DMA_TO_DEVICE when do DMA unmap.
>
> > int next; /* The next desc state. */
>
> Packing things into 16 bit integers will help reduce
> cache pressure here.
>
> Also, this is only used for unmap, so when dma API is not used,
> maintaining addr and len list management is unnecessary. How about we
> maintain addr/len in a separate array, avoiding accesses on unmap in that case?
Sure. I'll give it a try.
>
> Also, lots of architectures have a nop unmap in the DMA API.
> See a proposed patch at the end for optimizing that case.
Got it. Thanks.
>
> > };
> >
> > @@ -660,7 +669,6 @@ static bool virtqueue_poll_split(struct virtqueue *_vq, unsigned last_used_idx)
> > {
> > struct vring_virtqueue *vq = to_vvq(_vq);
> >
> > - virtio_mb(vq->weak_barriers);
> > return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
> > }
>
> why is this changing the split queue implementation?
Because above barrier is also needed by virtqueue_poll_packed(),
so I moved it to virtqueue_poll() and also added some comments
for it.
>
>
> >
> > @@ -757,6 +765,72 @@ static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
> > & ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
> > }
> >
> > +static void vring_unmap_state_packed(const struct vring_virtqueue *vq,
> > + struct vring_desc_state_packed *state)
> > +{
> > + u16 flags;
> > +
> > + if (!vring_use_dma_api(vq->vq.vdev))
> > + return;
> > +
> > + flags = state->flags;
> > +
> > + if (flags & VRING_DESC_F_INDIRECT) {
> > + dma_unmap_single(vring_dma_dev(vq),
> > + state->addr, state->len,
> > + (flags & VRING_DESC_F_WRITE) ?
> > + DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > + } else {
> > + dma_unmap_page(vring_dma_dev(vq),
> > + state->addr, state->len,
> > + (flags & VRING_DESC_F_WRITE) ?
> > + DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > + }
> > +}
> > +
> > +static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
> > + struct vring_packed_desc *desc)
> > +{
> > + u16 flags;
> > +
> > + if (!vring_use_dma_api(vq->vq.vdev))
> > + return;
> > +
> > + flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
>
> I see no reason to use virtioXX wrappers for the packed ring.
> That's there to support legacy devices. Just use leXX.
Okay.
>
> > +
> > + if (flags & VRING_DESC_F_INDIRECT) {
> > + dma_unmap_single(vring_dma_dev(vq),
> > + virtio64_to_cpu(vq->vq.vdev, desc->addr),
> > + virtio32_to_cpu(vq->vq.vdev, desc->len),
> > + (flags & VRING_DESC_F_WRITE) ?
> > + DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > + } else {
> > + dma_unmap_page(vring_dma_dev(vq),
> > + virtio64_to_cpu(vq->vq.vdev, desc->addr),
> > + virtio32_to_cpu(vq->vq.vdev, desc->len),
> > + (flags & VRING_DESC_F_WRITE) ?
> > + DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > + }
> > +}
> > +
> > +static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
> > + unsigned int total_sg,
> > + gfp_t gfp)
> > +{
> > + struct vring_packed_desc *desc;
> > +
> > + /*
> > + * We require lowmem mappings for the descriptors because
> > + * otherwise virt_to_phys will give us bogus addresses in the
> > + * virtqueue.
>
> Where is virt_to_phys used? I don't see it in this patch.
In vring_map_single(), virt_to_phys() will be used to translate
the address to phys if dma api isn't used:
https://github.com/torvalds/linux/blob/a49a9dcce802/drivers/virtio/virtio_ring.c#L197
>
> > + */
> > + gfp &= ~__GFP_HIGHMEM;
> > +
> > + desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
> > +
> > + return desc;
> > +}
> > +
> > static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > struct scatterlist *sgs[],
> > unsigned int total_sg,
> > @@ -766,47 +840,449 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > void *ctx,
> > gfp_t gfp)
> > {
> > + struct vring_virtqueue *vq = to_vvq(_vq);
> > + struct vring_packed_desc *desc;
> > + struct scatterlist *sg;
> > + unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
> > + __virtio16 uninitialized_var(head_flags), flags;
> > + u16 head, avail_wrap_counter, id, curr;
> > + bool indirect;
> > +
> > + START_USE(vq);
> > +
> > + BUG_ON(data == NULL);
> > + BUG_ON(ctx && vq->indirect);
> > +
> > + if (unlikely(vq->broken)) {
> > + END_USE(vq);
> > + return -EIO;
> > + }
> > +
> > +#ifdef DEBUG
> > + {
> > + ktime_t now = ktime_get();
> > +
> > + /* No kick or get, with .1 second between? Warn. */
> > + if (vq->last_add_time_valid)
> > + WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
> > + > 100);
> > + vq->last_add_time = now;
> > + vq->last_add_time_valid = true;
> > + }
> > +#endif
>
> Add incline helpers for this debug stuff rather than
> duplicate it from split ring?
Sure. I'd like to do that.
>
>
> > +
> > + BUG_ON(total_sg == 0);
> > +
> > + head = vq->next_avail_idx;
> > + avail_wrap_counter = vq->avail_wrap_counter;
> > +
> > + if (virtqueue_use_indirect(_vq, total_sg))
> > + desc = alloc_indirect_packed(_vq, total_sg, gfp);
> > + else {
> > + desc = NULL;
> > + WARN_ON_ONCE(total_sg > vq->vring_packed.num && !vq->indirect);
> > + }
>
>
> Apparently this attempts to treat indirect descriptors same as
> direct ones. This is how it is in the split ring, but not in
> the packed one. I think you want two separate functions, for
> direct and indirect.
Okay, I'll do that.
>
> > +
> > + if (desc) {
> > + /* Use a single buffer which doesn't continue */
> > + indirect = true;
> > + /* Set up rest to use this indirect table. */
> > + i = 0;
> > + descs_used = 1;
> > + } else {
> > + indirect = false;
> > + desc = vq->vring_packed.desc;
> > + i = head;
> > + descs_used = total_sg;
> > + }
> > +
> > + if (vq->vq.num_free < descs_used) {
> > + pr_debug("Can't add buf len %i - avail = %i\n",
> > + descs_used, vq->vq.num_free);
> > + /* FIXME: for historical reasons, we force a notify here if
> > + * there are outgoing parts to the buffer. Presumably the
> > + * host should service the ring ASAP. */
> > + if (out_sgs)
> > + vq->notify(&vq->vq);
> > + if (indirect)
> > + kfree(desc);
> > + END_USE(vq);
> > + return -ENOSPC;
> > + }
> > +
> > + id = vq->free_head;
> > + BUG_ON(id == vq->vring_packed.num);
> > +
> > + curr = id;
> > + for (n = 0; n < out_sgs + in_sgs; n++) {
> > + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > + dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
> > + DMA_TO_DEVICE : DMA_FROM_DEVICE);
> > + if (vring_mapping_error(vq, addr))
> > + goto unmap_release;
> > +
> > + flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> > + (n < out_sgs ? 0 : VRING_DESC_F_WRITE) |
> > + _VRING_DESC_F_AVAIL(vq->avail_wrap_counter) |
> > + _VRING_DESC_F_USED(!vq->avail_wrap_counter));
>
> Spec says:
> The VIRTQ_DESC_F_WRITE flags bit is the only valid flag for descriptors in the
> indirect table.
>
> All this logic isn't needed for indirect.
>
> Also, why re-calculate avail/used flags every time? They only change
> when you wrap around.
Will do that. Thanks.
>
>
> > + if (!indirect && i == head)
> > + head_flags = flags;
> > + else
> > + desc[i].flags = flags;
> > +
> > + desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> > + desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> > + i++;
> > + if (!indirect) {
> > + if (vring_use_dma_api(_vq->vdev)) {
> > + vq->desc_state_packed[curr].addr = addr;
> > + vq->desc_state_packed[curr].len =
> > + sg->length;
> > + vq->desc_state_packed[curr].flags =
> > + virtio16_to_cpu(_vq->vdev,
> > + flags);
> > + }
> > + curr = vq->desc_state_packed[curr].next;
> > +
> > + if (i >= vq->vring_packed.num) {
> > + i = 0;
> > + vq->avail_wrap_counter ^= 1;
> > + }
> > + }
> > + }
> > + }
> > +
> > + prev = (i > 0 ? i : vq->vring_packed.num) - 1;
> > + desc[prev].id = cpu_to_virtio16(_vq->vdev, id);
>
> Is it easier to write this out in all descriptors, to avoid the need to
> calculate prev?
Yeah, I'll do that.
>
> > +
> > + /* Last one doesn't continue. */
> > + if (total_sg == 1)
> > + head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > + else
> > + desc[prev].flags &= cpu_to_virtio16(_vq->vdev,
> > + ~VRING_DESC_F_NEXT);
>
> Wouldn't it be easier to avoid setting VRING_DESC_F_NEXT
> in the first place?
> if (n != in_sgs - 1 && n != out_sgs - 1)
Will this affect the branch prediction?
>
> must be better than writing descriptor an extra time.
Not quite sure about this. I think this descriptor has just
been written, it should still be in the cache.
>
> > +
> > + if (indirect) {
> > + /* Now that the indirect table is filled in, map it. */
> > + dma_addr_t addr = vring_map_single(
> > + vq, desc, total_sg * sizeof(struct vring_packed_desc),
> > + DMA_TO_DEVICE);
> > + if (vring_mapping_error(vq, addr))
> > + goto unmap_release;
> > +
> > + head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
> > + _VRING_DESC_F_AVAIL(avail_wrap_counter) |
> > + _VRING_DESC_F_USED(!avail_wrap_counter));
> > + vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev,
> > + addr);
> > + vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
> > + total_sg * sizeof(struct vring_packed_desc));
> > + vq->vring_packed.desc[head].id = cpu_to_virtio16(_vq->vdev, id);
> > +
> > + if (vring_use_dma_api(_vq->vdev)) {
> > + vq->desc_state_packed[id].addr = addr;
> > + vq->desc_state_packed[id].len = total_sg *
> > + sizeof(struct vring_packed_desc);
> > + vq->desc_state_packed[id].flags =
> > + virtio16_to_cpu(_vq->vdev, head_flags);
> > + }
> > + }
> > +
> > + /* We're using some buffers from the free list. */
> > + vq->vq.num_free -= descs_used;
> > +
> > + /* Update free pointer */
> > + if (indirect) {
> > + n = head + 1;
> > + if (n >= vq->vring_packed.num) {
> > + n = 0;
> > + vq->avail_wrap_counter ^= 1;
> > + }
> > + vq->next_avail_idx = n;
> > + vq->free_head = vq->desc_state_packed[id].next;
> > + } else {
> > + vq->next_avail_idx = i;
> > + vq->free_head = curr;
> > + }
> > +
> > + /* Store token and indirect buffer state. */
> > + vq->desc_state_packed[id].num = descs_used;
> > + vq->desc_state_packed[id].data = data;
> > + if (indirect)
> > + vq->desc_state_packed[id].indir_desc = desc;
> > + else
> > + vq->desc_state_packed[id].indir_desc = ctx;
> > +
> > + /* A driver MUST NOT make the first descriptor in the list
> > + * available before all subsequent descriptors comprising
> > + * the list are made available. */
> > + virtio_wmb(vq->weak_barriers);
> > + vq->vring_packed.desc[head].flags = head_flags;
> > + vq->num_added += descs_used;
> > +
> > + pr_debug("Added buffer head %i to %p\n", head, vq);
> > + END_USE(vq);
> > +
> > + return 0;
> > +
> > +unmap_release:
> > + err_idx = i;
> > + i = head;
> > +
> > + for (n = 0; n < total_sg; n++) {
> > + if (i == err_idx)
> > + break;
> > + vring_unmap_desc_packed(vq, &desc[i]);
> > + i++;
> > + if (!indirect && i >= vq->vring_packed.num)
> > + i = 0;
> > + }
> > +
> > + vq->avail_wrap_counter = avail_wrap_counter;
> > +
> > + if (indirect)
> > + kfree(desc);
> > +
> > + END_USE(vq);
> > return -EIO;
> > }
> >
> > static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> > {
> > - return false;
> > + struct vring_virtqueue *vq = to_vvq(_vq);
> > + u16 flags;
> > + bool needs_kick;
> > + u32 snapshot;
> > +
> > + START_USE(vq);
> > + /* We need to expose the new flags value before checking notification
> > + * suppressions. */
> > + virtio_mb(vq->weak_barriers);
> > +
> > + snapshot = READ_ONCE(*(u32 *)vq->vring_packed.device);
> > + flags = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot >> 16)) & 0x3;
>
> What are all these hard-coded things? Also either we do READ_ONCE
> everywhere or nowhere. Why is this place special? Any why read 32 bit
> if you only want the 16?
Yeah, READ_ONCE() and reading 32bits are not really needed in
this patch. But it's needed in the next patch. I thought it's
not wrong to do this, so I introduced them from the first place.
Just to double check: is the below code (apart from the
hard-coded value and virtio16) from the next patch OK?
"""
snapshot = READ_ONCE(*(u32 *)vq->vring_packed.device);
+ off_wrap = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot & 0xffff));
flags = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot >> 16)) & 0x3;
"""
>
> And doesn't sparse complain about cast to __virtio16?
I'll give it a try in the next version.
>
> > +
> > +#ifdef DEBUG
> > + if (vq->last_add_time_valid) {
> > + WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
> > + vq->last_add_time)) > 100);
> > + }
> > + vq->last_add_time_valid = false;
> > +#endif
> > +
> > + needs_kick = (flags != VRING_EVENT_F_DISABLE);
> > + END_USE(vq);
> > + return needs_kick;
> > +}
> > +
> > +static void detach_buf_packed(struct vring_virtqueue *vq,
> > + unsigned int id, void **ctx)
> > +{
> > + struct vring_desc_state_packed *state = NULL;
> > + struct vring_packed_desc *desc;
> > + unsigned int curr, i;
> > +
> > + /* Clear data ptr. */
> > + vq->desc_state_packed[id].data = NULL;
> > +
> > + curr = id;
> > + for (i = 0; i < vq->desc_state_packed[id].num; i++) {
> > + state = &vq->desc_state_packed[curr];
> > + vring_unmap_state_packed(vq, state);
> > + curr = state->next;
> > + }
> > +
> > + BUG_ON(state == NULL);
> > + vq->vq.num_free += vq->desc_state_packed[id].num;
> > + state->next = vq->free_head;
> > + vq->free_head = id;
> > +
> > + if (vq->indirect) {
> > + u32 len;
> > +
> > + /* Free the indirect table, if any, now that it's unmapped. */
> > + desc = vq->desc_state_packed[id].indir_desc;
> > + if (!desc)
> > + return;
> > +
> > + if (vring_use_dma_api(vq->vq.vdev)) {
> > + len = vq->desc_state_packed[id].len;
> > + for (i = 0; i < len / sizeof(struct vring_packed_desc);
> > + i++)
> > + vring_unmap_desc_packed(vq, &desc[i]);
> > + }
> > + kfree(desc);
> > + vq->desc_state_packed[id].indir_desc = NULL;
> > + } else if (ctx) {
> > + *ctx = vq->desc_state_packed[id].indir_desc;
> > + }
> > +}
> > +
> > +static inline bool is_used_desc_packed(const struct vring_virtqueue *vq,
> > + u16 idx, bool used_wrap_counter)
> > +{
> > + u16 flags;
> > + bool avail, used;
> > +
> > + flags = virtio16_to_cpu(vq->vq.vdev,
> > + vq->vring_packed.desc[idx].flags);
> > + avail = !!(flags & VRING_DESC_F_AVAIL);
> > + used = !!(flags & VRING_DESC_F_USED);
> > +
> > + return avail == used && used == used_wrap_counter;
>
> I think that you don't need to look at avail flag to detect a used
> descriptor. The reason device writes it is to avoid confusing
> *device* next time descriptor wraps.
Okay, I'll just check the used flag.
>
> > }
> >
> > static inline bool more_used_packed(const struct vring_virtqueue *vq)
> > {
> > - return false;
> > + return is_used_desc_packed(vq, vq->last_used_idx,
> > + vq->used_wrap_counter);
> > }
> >
> > static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> > unsigned int *len,
> > void **ctx)
> > {
> > - return NULL;
> > + struct vring_virtqueue *vq = to_vvq(_vq);
> > + u16 last_used, id;
> > + void *ret;
> > +
> > + START_USE(vq);
> > +
> > + if (unlikely(vq->broken)) {
> > + END_USE(vq);
> > + return NULL;
> > + }
> > +
> > + if (!more_used_packed(vq)) {
> > + pr_debug("No more buffers in queue\n");
> > + END_USE(vq);
> > + return NULL;
> > + }
> > +
> > + /* Only get used elements after they have been exposed by host. */
> > + virtio_rmb(vq->weak_barriers);
> > +
> > + last_used = vq->last_used_idx;
> > + id = virtio16_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].id);
> > + *len = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].len);
> > +
> > + if (unlikely(id >= vq->vring_packed.num)) {
> > + BAD_RING(vq, "id %u out of range\n", id);
> > + return NULL;
> > + }
> > + if (unlikely(!vq->desc_state_packed[id].data)) {
> > + BAD_RING(vq, "id %u is not a head!\n", id);
> > + return NULL;
> > + }
> > +
> > + vq->last_used_idx += vq->desc_state_packed[id].num;
> > + if (vq->last_used_idx >= vq->vring_packed.num) {
> > + vq->last_used_idx -= vq->vring_packed.num;
> > + vq->used_wrap_counter ^= 1;
> > + }
> > +
> > + /* detach_buf_packed clears data, so grab it now. */
> > + ret = vq->desc_state_packed[id].data;
> > + detach_buf_packed(vq, id, ctx);
> > +
> > +#ifdef DEBUG
> > + vq->last_add_time_valid = false;
> > +#endif
> > +
> > + END_USE(vq);
> > + return ret;
> > }
> >
> > static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
> > {
> > + struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > + if (vq->event_flags_shadow != VRING_EVENT_F_DISABLE) {
> > + vq->event_flags_shadow = VRING_EVENT_F_DISABLE;
> > + vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > + vq->event_flags_shadow);
> > + }
> > }
> >
> > static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
> > {
> > - return 0;
> > + struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > + START_USE(vq);
> > +
> > + /* We optimistically turn back on interrupts, then check if there was
> > + * more to do. */
> > +
> > + if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > + vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> > + vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > + vq->event_flags_shadow);
> > + }
> > +
> > + END_USE(vq);
> > + return vq->last_used_idx | ((u16)vq->used_wrap_counter << 15);
> > }
> >
> > -static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
> > +static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned off_wrap)
> > {
> > - return false;
> > + struct vring_virtqueue *vq = to_vvq(_vq);
> > + bool wrap_counter;
> > + u16 used_idx;
> > +
> > + wrap_counter = off_wrap >> 15;
> > + used_idx = off_wrap & ~(1 << 15);
> > +
> > + return is_used_desc_packed(vq, used_idx, wrap_counter);
>
> These >> 15 << 15 all over the place duplicate info.
> Pls put 15 in the header.
Sure.
>
> Also can you maintain the counters properly shifted?
> Then just use them.
Then, we may need to maintain both of the shifted wrapper
counters and un-shifted wrapper counters at the same time.
>
>
> > }
> >
> > static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> > {
> > - return false;
> > + struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > + START_USE(vq);
> > +
> > + /* We optimistically turn back on interrupts, then check if there was
> > + * more to do. */
> > +
> > + if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > + vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> > + vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > + vq->event_flags_shadow);
> > + /* We need to enable interrupts first before re-checking
> > + * for more used buffers. */
> > + virtio_mb(vq->weak_barriers);
> > + }
> > +
> > + if (more_used_packed(vq)) {
> > + END_USE(vq);
> > + return false;
> > + }
> > +
> > + END_USE(vq);
> > + return true;
> > }
> >
> > static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
> > {
> > + struct vring_virtqueue *vq = to_vvq(_vq);
> > + unsigned int i;
> > + void *buf;
> > +
> > + START_USE(vq);
> > +
> > + for (i = 0; i < vq->vring_packed.num; i++) {
> > + if (!vq->desc_state_packed[i].data)
> > + continue;
> > + /* detach_buf clears data, so grab it now. */
> > + buf = vq->desc_state_packed[i].data;
> > + detach_buf_packed(vq, i, NULL);
> > + END_USE(vq);
> > + return buf;
> > + }
> > + /* That should have freed everything. */
> > + BUG_ON(vq->vq.num_free != vq->vring_packed.num);
> > +
> > + END_USE(vq);
> > return NULL;
> > }
> >
> > @@ -1083,6 +1559,9 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned last_used_idx)
> > {
> > struct vring_virtqueue *vq = to_vvq(_vq);
> >
> > + /* We need to enable interrupts first before re-checking
> > + * for more used buffers. */
> > + virtio_mb(vq->weak_barriers);
> > return vq->packed ? virtqueue_poll_packed(_vq, last_used_idx) :
> > virtqueue_poll_split(_vq, last_used_idx);
> > }
>
> Possible optimization for when dma API is in use:
Got it. Will give it a try!
Best regards,
Tiwei Bie
>
> --->
>
> dma: detecting nop unmap
>
> drivers need to maintain the dma address for unmap purposes,
> but these cycles are wasted when unmap callback is not
> defined. Add an API for drivers to check that and avoid
> unmap completely. Debug builds still have unmap.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> ----
>
> diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
> index a785f2507159..38b2c27c8449 100644
> --- a/include/linux/dma-debug.h
> +++ b/include/linux/dma-debug.h
> @@ -42,6 +42,11 @@ extern void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
> extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
> size_t size, int direction, bool map_single);
>
> +static inline bool has_debug_dma_unmap(struct device *dev)
> +{
> + return true;
> +}
> +
> extern void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
> int nents, int mapped_ents, int direction);
>
> @@ -121,6 +126,11 @@ static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
> {
> }
>
> +static inline bool has_debug_dma_unmap(struct device *dev)
> +{
> + return false;
> +}
> +
> static inline void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
> int nents, int mapped_ents, int direction)
> {
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 1db6a6b46d0d..656f3e518166 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -241,6 +241,14 @@ static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
> return addr;
> }
>
> +static inline bool has_dma_unmap(struct device *dev)
> +{
> + const struct dma_map_ops *ops = get_dma_ops(dev);
> +
> + return ops->unmap_page || ops->unmap_sg || ops->unmap_resource ||
> + has_dma_unmap(dev);
> +}
> +
> static inline void dma_unmap_single_attrs(struct device *dev, dma_addr_t addr,
> size_t size,
> enum dma_data_direction dir,
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox