* linux-next: manual merge of the staging tree with the net-next tree
From: Stephen Rothwell @ 2018-03-15 7:35 UTC (permalink / raw)
To: Greg KH, David Miller, Networking
Cc: Linux-Next Mailing List, Linux Kernel Mailing List,
Denys Vlasenko
[-- Attachment #1: Type: text/plain, Size: 800 bytes --]
Hi all,
Today's linux-next merge of the staging tree got a conflict in:
drivers/staging/irda/net/af_irda.c
between commit:
9b2c45d479d0 ("net: make getname() functions return length rather than use int* parameter")
from the net-next tree and commit:
d64c2a76123f ("staging: irda: remove the irda network stack and drivers")
from the staging tree.
I fixed it up (I just removed the file) and can carry the fix as
necessary. This is now fixed as far as linux-next is concerned, but any
non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging. You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.
--
Cheers,
Stephen Rothwell
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* Re: [PATCH V2] brcmfmac: drop Inter-Access Point Protocol packets by default
From: Rafał Miłecki @ 2018-03-15 7:34 UTC (permalink / raw)
To: Kalle Valo
Cc: Arend van Spriel, Franky Lin, Hante Meuleman, Chi-Hsien Lin,
Wright Feng, Pieter-Paul Giesberts, James Hughes,
linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
open list:BROADCOM BRCM80211 IEEE802.11n WIRELESS DRIVER,
open list:BROADCOM BRCM80211 IEEE802.11n WIRELESS DRIVER <brcm80211-dev-list.pdl-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>,,
Network Development, Linus Lüssing, Felix Fietkau, bridge
In-Reply-To: <20180315072909.1512-1-zajec5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
On 15 March 2018 at 08:29, Rafał Miłecki <zajec5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> From: Rafał Miłecki <rafal-g1n6cQUeyibVItvQsEIGlw@public.gmane.org>
>
> Testing brcmfmac with more recent firmwares resulted in AP interfaces
> not working in some specific setups. Debugging resulted in discovering
> support for IAPP in Broadcom's firmwares.
>
> Older firmwares were only generating 802.11f frames. Newer ones like:
> 1) 10.10 (TOB) (r663589)
> 2) 10.10.122.20 (r683106)
> for 4366b1 and 4366c0 respectively seem to also /respect/ 802.11f frames
> in the Tx path by performing a STA disassociation.
>
> This obsoleted standard and its implementation is something that:
> 1) Most people don't need / want to use
> 2) Can allow local DoS attacks
> 3) Breaks AP interfaces in some specific bridge setups
>
> To solve issues it can cause this commit modifies brcmfmac to drop IAPP
> packets. If affects:
> 1) Rx path: driver won't be sending these unwanted packets up.
> 2) Tx path: driver will reject packets that would trigger STA
> disassociation perfromed by a firmware (possible local DoS attack).
>
> It appears there are some Broadcom's clients/users who care about this
> feature despite the drawbacks. They can switch it on using a new module
> param.
>
> This change results in only two more comparisons (check for module param
> and check for Ethernet packet length) for 99.9% of packets. Its overhead
> should be very minimal.
>
> Signed-off-by: Rafał Miłecki <rafal-g1n6cQUeyibVItvQsEIGlw@public.gmane.org>
> ---
I forgot to include the changelog, sorry.
V2: Use module param to don't /abuse/ Kconfig
Slightly optimize brcmf_skb_is_iapp
Move some description from Kconfig to the code
Update commit description: specify affected fws & mention impact
^ permalink raw reply
* [PATCH V2] brcmfmac: drop Inter-Access Point Protocol packets by default
From: Rafał Miłecki @ 2018-03-15 7:29 UTC (permalink / raw)
To: Kalle Valo
Cc: Arend van Spriel, Franky Lin, Hante Meuleman, Chi-Hsien Lin,
Wright Feng, Pieter-Paul Giesberts, James Hughes,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
brcm80211-dev-list.pdl-dY08KVG/lbpWk0Htik3J/w,
brcm80211-dev-list-+wT8y+m8/X5BDgjK7y7TUQ,
netdev-u79uwXL29TY76Z2rM5mHXA, Linus Lüssing, Felix Fietkau,
bridge-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Rafał Miłecki
From: Rafał Miłecki <rafal-g1n6cQUeyibVItvQsEIGlw@public.gmane.org>
Testing brcmfmac with more recent firmwares resulted in AP interfaces
not working in some specific setups. Debugging resulted in discovering
support for IAPP in Broadcom's firmwares.
Older firmwares were only generating 802.11f frames. Newer ones like:
1) 10.10 (TOB) (r663589)
2) 10.10.122.20 (r683106)
for 4366b1 and 4366c0 respectively seem to also /respect/ 802.11f frames
in the Tx path by performing a STA disassociation.
This obsoleted standard and its implementation is something that:
1) Most people don't need / want to use
2) Can allow local DoS attacks
3) Breaks AP interfaces in some specific bridge setups
To solve issues it can cause this commit modifies brcmfmac to drop IAPP
packets. If affects:
1) Rx path: driver won't be sending these unwanted packets up.
2) Tx path: driver will reject packets that would trigger STA
disassociation perfromed by a firmware (possible local DoS attack).
It appears there are some Broadcom's clients/users who care about this
feature despite the drawbacks. They can switch it on using a new module
param.
This change results in only two more comparisons (check for module param
and check for Ethernet packet length) for 99.9% of packets. Its overhead
should be very minimal.
Signed-off-by: Rafał Miłecki <rafal-g1n6cQUeyibVItvQsEIGlw@public.gmane.org>
---
.../wireless/broadcom/brcm80211/brcmfmac/common.c | 5 ++
.../wireless/broadcom/brcm80211/brcmfmac/common.h | 1 +
.../wireless/broadcom/brcm80211/brcmfmac/core.c | 57 ++++++++++++++++++++++
3 files changed, 63 insertions(+)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c
index 70ef9835b647..5532ef39439f 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c
@@ -75,6 +75,10 @@ static int brcmf_roamoff;
module_param_named(roamoff, brcmf_roamoff, int, S_IRUSR);
MODULE_PARM_DESC(roamoff, "Do not use internal roaming engine");
+static int brcmf_iapp_enable;
+module_param_named(iapp, brcmf_iapp_enable, int, 0);
+MODULE_PARM_DESC(iapp, "Enable partial support for the obsoleted Inter-Access Point Protocol");
+
#ifdef DEBUG
/* always succeed brcmf_bus_started() */
static int brcmf_ignore_probe_fail;
@@ -438,6 +442,7 @@ struct brcmf_mp_device *brcmf_get_module_param(struct device *dev,
settings->feature_disable = brcmf_feature_disable;
settings->fcmode = brcmf_fcmode;
settings->roamoff = !!brcmf_roamoff;
+ settings->iapp = !!brcmf_iapp_enable;
#ifdef DEBUG
settings->ignore_probe_fail = !!brcmf_ignore_probe_fail;
#endif
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.h b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.h
index a62f8e70b320..ef914619e8e1 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.h
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.h
@@ -58,6 +58,7 @@ struct brcmf_mp_device {
unsigned int feature_disable;
int fcmode;
bool roamoff;
+ bool iapp;
bool ignore_probe_fail;
struct brcmfmac_pd_cc *country_codes;
union {
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
index 19048526b4af..ca97a8b4c59f 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
@@ -230,6 +230,37 @@ static void brcmf_netdev_set_multicast_list(struct net_device *ndev)
schedule_work(&ifp->multicast_work);
}
+/**
+ * brcmf_skb_is_iapp - checks if skb is an IAPP packet
+ *
+ * @skb: skb to check
+ */
+static bool brcmf_skb_is_iapp(struct sk_buff *skb)
+{
+ static const u8 iapp_l2_update_packet[6] __aligned(2) = {
+ 0x00, 0x01, 0xaf, 0x81, 0x01, 0x00,
+ };
+ unsigned char *eth_data;
+#if !defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+ const u16 *a, *b;
+#endif
+
+ if (skb->len - skb->mac_len != 6 ||
+ !is_multicast_ether_addr(eth_hdr(skb)->h_dest))
+ return false;
+
+ eth_data = skb_mac_header(skb) + ETH_HLEN;
+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+ return !(((*(const u32 *)eth_data) ^ (*(const u32 *)iapp_l2_update_packet)) |
+ ((*(const u16 *)(eth_data + 4)) ^ (*(const u16 *)(iapp_l2_update_packet + 4))));
+#else
+ a = (const u16 *)eth_data;
+ b = (const u16 *)iapp_l2_update_packet;
+
+ return !((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2]));
+#endif
+}
+
static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
struct net_device *ndev)
{
@@ -250,6 +281,23 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
goto done;
}
+ /* Some recent Broadcom's firmwares disassociate STA when they receive
+ * an 802.11f ADD frame. This behavior can lead to a local DoS security
+ * issue. Attacker may trigger disassociation of any STA by sending a
+ * proper Ethernet frame to the wireless interface.
+ *
+ * Moreover this feature may break AP interfaces in some specific
+ * setups. This applies e.g. to the bridge with hairpin mode enabled and
+ * IFLA_BRPORT_MCAST_TO_UCAST set. IAPP packet generated by a firmware
+ * will get passed back to the wireless interface and cause immediate
+ * disassociation of a just-connected STA.
+ */
+ if (!drvr->settings->iapp && brcmf_skb_is_iapp(skb)) {
+ dev_kfree_skb(skb);
+ ret = -EINVAL;
+ goto done;
+ }
+
/* Make sure there's enough writeable headroom */
if (skb_headroom(skb) < drvr->hdrlen || skb_header_cloned(skb)) {
head_delta = max_t(int, drvr->hdrlen - skb_headroom(skb), 0);
@@ -325,6 +373,15 @@ void brcmf_txflowblock_if(struct brcmf_if *ifp,
void brcmf_netif_rx(struct brcmf_if *ifp, struct sk_buff *skb)
{
+ /* Most of Broadcom's firmwares send 802.11f ADD frame every time a new
+ * STA connects to the AP interface. This is an obsoleted standard most
+ * users don't use, so don't pass these frames up unless requested.
+ */
+ if (!ifp->drvr->settings->iapp && brcmf_skb_is_iapp(skb)) {
+ brcmu_pkt_buf_free_skb(skb);
+ return;
+ }
+
if (skb->pkt_type == PACKET_MULTICAST)
ifp->ndev->stats.multicast++;
--
2.11.0
^ permalink raw reply related
* [PATCH] net: ethernet: ti: cpsw: add check for in-band mode setting with RGMII PHY interface
From: SZ Lin (林上智) @ 2018-03-15 6:41 UTC (permalink / raw)
Cc: SZ Lin (林上智), Schuyler Patton,
Grygorii Strashko, David S. Miller, Ivan Khoronzhuk, Keerthy,
Sekhar Nori, linux-omap, netdev, linux-kernel
According to AM335x TRM[1] 14.3.6.2, AM437x TRM[2] 15.3.6.2 and
DRA7 TRM[3] 24.11.4.8.7.3.3, in-band mode in EXT_EN(bit18) register is only
available when PHY is configured in RGMII mode with 10Mbps speed. It will
cause some networking issues without RGMII mode, such as carrier sense
errors and low throughput. TI also mentioned this issue in their forum[4].
This patch adds the check mechanism for PHY interface with RGMII interface
type, the in-band mode can only be set in RGMII mode with 10Mbps speed.
References:
[1]: https://www.ti.com/lit/ug/spruh73p/spruh73p.pdf
[2]: http://www.ti.com/lit/ug/spruhl7h/spruhl7h.pdf
[3]: http://www.ti.com/lit/ug/spruic2b/spruic2b.pdf
[4]: https://e2e.ti.com/support/arm/sitara_arm/f/791/p/640765/2392155
Suggested-by: Holsety Chen (陳憲輝) <Holsety.Chen@moxa.com>
Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com>
Signed-off-by: Schuyler Patton <spatton@ti.com>
---
drivers/net/ethernet/ti/cpsw.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 1b1b78fdc138..3bbf22ed59cf 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1014,7 +1014,13 @@ static void _cpsw_adjust_link(struct cpsw_slave *slave,
/* set speed_in input in case RMII mode is used in 100Mbps */
if (phy->speed == 100)
mac_control |= BIT(15);
- else if (phy->speed == 10)
+
+ /* in band mode only works in 10Mbps RGMII mode */
+ else if ((phy->speed == 10) &&
+ ((phy->interface == PHY_INTERFACE_MODE_RGMII) ||
+ (phy->interface == PHY_INTERFACE_MODE_RGMII_ID) ||
+ (phy->interface == PHY_INTERFACE_MODE_RGMII_RXID) ||
+ (phy->interface == PHY_INTERFACE_MODE_RGMII_TXID)))
mac_control |= BIT(18); /* In Band mode */
if (priv->rx_pause)
--
2.16.2
^ permalink raw reply related
* Re: [PATCH 00/47] arch-removal: device drivers
From: Boris Brezillon @ 2018-03-15 5:32 UTC (permalink / raw)
To: Arnd Bergmann
Cc: ulf.hansson, linux-usb, wsa, linux-iio, viresh.kumar,
linus.walleij, alexandre.belloni, linux-ide, netdev, linux-mtd,
linux-i2c, linux-rtc, lars, herbert, corbet, linux-doc, stern,
linux-serial, jslaby, linux-mmc, shli, wg, linux-crypto,
linux-pwm, linux-watchdog, alsa-devel, b.zolnierkie, linux-input,
linux-can, linux-raid, linux-gpio, broonie, bp, linux-fbdev,
mchehab, lin
In-Reply-To: <20180314153603.3127932-1-arnd@arndb.de>
Hi Arnd,
On Wed, 14 Mar 2018 16:35:13 +0100
Arnd Bergmann <arnd@arndb.de> wrote:
> Hi driver maintainers,
>
> I just posted one series with the removal of eight architectures,
> see https://lkml.org/lkml/2018/3/14/505 for details, or
> https://lwn.net/Articles/748074/ for more background.
>
> These are the device drivers that go along with them. I have already
> picked up the drivers for arch/metag/ into my tree, they were reviewed
> earlier.
>
> Please let me know if you have any concerns with the patch, or if you
> prefer to pick up the patches in your respective trees. I created
> the patches with 'git format-patch -D', so they will not apply without
> manually removing those files.
>
> For anything else, I'd keep the removal patches in my asm-generic tree
> and will send a pull request for 4.17 along with the actual arch removal.
>
> Arnd
>
> Arnd Bergmann
> edac: remove tile driver
> net: tile: remove ethernet drivers
> net: adi: remove blackfin ethernet drivers
> net: 8390: remove m32r specific bits
> net: remove cris etrax ethernet driver
> net: smsc: remove m32r specific smc91x configuration
> raid: remove tile specific raid6 implementation
> rtc: remove tile driver
> rtc: remove bfin driver
> char: remove obsolete ds1302 rtc driver
> char: remove tile-srom.c
> char: remove blackfin OTP driver
> pcmcia: remove m32r drivers
> pcmcia: remove blackfin driver
> ASoC: remove blackfin drivers
> video/logo: remove obsolete logo files
> fbdev: remove blackfin drivers
> fbdev: s1d13xxxfb: remove m32r specific hacks
> crypto: remove blackfin CRC driver
> media: platform: remove blackfin capture driver
> media: platform: remove m32r specific arv driver
> cpufreq: remove blackfin driver
> cpufreq: remove cris specific drivers
> gpio: remove etraxfs driver
> pinctrl: remove adi2/blackfin drivers
> ata: remove bf54x driver
> input: keyboard: remove bf54x driver
> input: misc: remove blackfin rotary driver
> mmc: remove bfin_sdh driver
> can: remove bfin_can driver
> watchdog: remove bfin_wdt driver
> mtd: maps: remove bfin-async-flash driver
> mtd: nand: remove bf5xx_nand driver
If you don't mind, I'd like to take the mtd patches through the MTD
tree. As you've probably noticed, nand code has been moved around and
it's easier for me to carry those 2 simple changes in my tree than
creating an immutable branch.
Let me know if this is a problem.
Regards,
Boris
--
Boris Brezillon, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com
______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
^ permalink raw reply
* Re: [PATCH v2 net] net: sched: fix uses after free
From: John Fastabend @ 2018-03-15 5:19 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller; +Cc: netdev, Eric Dumazet, Jamal Hadi Salim
In-Reply-To: <5b7c5a08-4225-9f20-2a2c-57767a36a967@gmail.com>
On 03/14/2018 08:10 PM, John Fastabend wrote:
> On 03/14/2018 06:53 PM, Eric Dumazet wrote:
>> syzbot reported one use-after-free in pfifo_fast_enqueue() [1]
>>
>> Issue here is that we can not reuse skb after a successful skb_array_produce()
>> since another cpu might have consumed it already.
>>
>> I believe a similar problem exists in try_bulk_dequeue_skb_slow()
>> in case we put an skb into qdisc_enqueue_skb_bad_txq() for lockless qdisc.
>>
>
> [...]
>
>> Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array")
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Reported-by: syzbot+ed43b6903ab968b16f54@syzkaller.appspotmail.com
>> Cc: John Fastabend <john.fastabend@gmail.com>
>> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
>> Cc: Cong Wang <xiyou.wangcong@gmail.com>
>> Cc: Jiri Pirko <jiri@resnulli.us>
>> ---
>> net/sched/sch_generic.c | 22 +++++++++++++---------
>> 1 file changed, 13 insertions(+), 9 deletions(-)
>>
>
> Thanks!
>
> Acked-by: John Fastabend <john.fastabend@gmail.com>
>
>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>> index 190570f21b208d5a17943360a3a6f85e1c2a2187..7e3fbe9cc936be376b66a5b12bf8957c3b601f2c 100644
>> --- a/net/sched/sch_generic.c
>> +++ b/net/sched/sch_generic.c
>> @@ -106,6 +106,14 @@ static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q,
>>
>> __skb_queue_tail(&q->skb_bad_txq, skb);
>>
>> + if (qdisc_is_percpu_stats(q)) {
>> + qdisc_qstats_cpu_backlog_inc(q, skb);
>
> So I guess the skb access above needs to be removed as
> well per your comment in the commit description. But that
> can be another patch.
>
Actually this is fine I read this too quickly on first
read.
Sorry for the noise. Looks good to me.
>> + qdisc_qstats_cpu_qlen_inc(q);
>> + } else {
>> + qdisc_qstats_backlog_inc(q, skb);
>> + q->q.qlen++;
>> + }
>> +
>> if (lock)
>> spin_unlock(lock);
>> }
>
>
^ permalink raw reply
* Re: [PATCH 06/47] net: smsc: remove m32r specific smc91x configuration
From: Finn Thain @ 2018-03-15 4:58 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Arnd Bergmann, linux-kernel, David S. Miller, netdev
In-Reply-To: <nycvar.YSQ.7.76.1803141144200.28583@knanqh.ubzr>
On Wed, 14 Mar 2018, Nicolas Pitre wrote:
> On Wed, 14 Mar 2018, Arnd Bergmann wrote:
>
> > The m32r architecture is getting removed, so this part can be
> > cleaned up as well.
> >
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> Acked-by: Nicolas Pitre <nico@fluxnic.net>
>
> > ---
> > drivers/net/ethernet/smsc/Kconfig | 4 ++--
> > drivers/net/ethernet/smsc/smc91x.h | 26 --------------------------
> > 2 files changed, 2 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/smsc/Kconfig b/drivers/net/ethernet/smsc/Kconfig
> > index 4c2f612e4414..1c2b6663f7ce 100644
> > --- a/drivers/net/ethernet/smsc/Kconfig
> > +++ b/drivers/net/ethernet/smsc/Kconfig
> > @@ -37,8 +37,8 @@ config SMC91X
> > select CRC32
> > select MII
> > depends on !OF || GPIOLIB
> > - depends on ARM || ARM64 || ATARI_ETHERNAT || BLACKFIN || COLDFIRE || \
> > - M32R || MIPS || MN10300 || NIOS2 || SUPERH || XTENSA || H8300
> > + depends on ARM || ARM64 || ATARI_ETHERNAT || COLDFIRE || \
> > + MIPS || NIOS2 || SUPERH || XTENSA || H8300
> > ---help---
> > This is a driver for SMC's 91x series of Ethernet chipsets,
> > including the SMC91C94 and the SMC91C111. Say Y if you want it
> > diff --git a/drivers/net/ethernet/smsc/smc91x.h b/drivers/net/ethernet/smsc/smc91x.h
> > index 08b17adf0a65..b337ee97e0c0 100644
> > --- a/drivers/net/ethernet/smsc/smc91x.h
> > +++ b/drivers/net/ethernet/smsc/smc91x.h
> > @@ -144,32 +144,6 @@ static inline void _SMC_outw_align4(u16 val, void __iomem *ioaddr, int reg,
> >
> > #define SMC_IRQ_FLAGS (0)
> >
> > -#elif defined(CONFIG_M32R)
> > -
> > -#define SMC_CAN_USE_8BIT 0
> > -#define SMC_CAN_USE_16BIT 1
> > -#define SMC_CAN_USE_32BIT 0
> > -
> > -#define SMC_inb(a, r) inb(((u32)a) + (r))
> > -#define SMC_inw(a, r) inw(((u32)a) + (r))
> > -#define SMC_outb(v, a, r) outb(v, ((u32)a) + (r))
> > -#define SMC_outw(lp, v, a, r) outw(v, ((u32)a) + (r))
> > -#define SMC_insw(a, r, p, l) insw(((u32)a) + (r), p, l)
> > -#define SMC_outsw(a, r, p, l) outsw(((u32)a) + (r), p, l)
> > -
> > -#define SMC_IRQ_FLAGS (0)
> > -
> > -#define RPC_LSA_DEFAULT RPC_LED_TX_RX
> > -#define RPC_LSB_DEFAULT RPC_LED_100_10
> > -
> > -#elif defined(CONFIG_MN10300)
MN103 is a separate architecture, unrelated to M32R afaict...
--
> > -
> > -/*
> > - * MN10300/AM33 configuration
> > - */
> > -
> > -#include <unit/smc91111.h>
> > -
> > #elif defined(CONFIG_ATARI)
> >
> > #define SMC_CAN_USE_8BIT 1
> > --
> > 2.9.0
> >
> >
>
^ permalink raw reply
* Re: [PATCH RFC bpf-next 1/6] bpf: Hooks for sys_bind
From: Alexei Starovoitov @ 2018-03-15 3:37 UTC (permalink / raw)
To: Eric Dumazet
Cc: Alexei Starovoitov, David S. Miller, Daniel Borkmann,
Network Development, Kernel Team
In-Reply-To: <97dc8c66-9701-7970-bb38-750d79f767c8@gmail.com>
On Wed, Mar 14, 2018 at 05:17:54PM -0700, Eric Dumazet wrote:
>
>
> On 03/14/2018 11:41 AM, Alexei Starovoitov wrote:
> > On Wed, Mar 14, 2018 at 11:00 AM, Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> >>
> >>> It seems this is exactly the case where a netns would be the correct answer.
> >>
> >> Unfortuantely that's not the case. That's what I tried to explain
> >> in the cover letter:
> >> "The setup involves per-container IPs, policy, etc, so traditional
> >> network-only solutions that involve VRFs, netns, acls are not applicable."
> >> To elaborate more on that:
> >> netns is l2 isolation.
> >> vrf is l3 isolation.
> >> whereas to containerize an application we need to punch connectivity holes
> >> in these layered techniques.
> >> We also considered resurrecting Hannes's afnetns work
> >> and even went as far as designing a new namespace for L4 isolation.
> >> Unfortunately all hierarchical namespace abstraction don't work.
> >> To run an application inside cgroup container that was not written
> >> with containers in mind we have to make an illusion of running
> >> in non-containerized environment.
> >> In some cases we remember the port and container id in the post-bind hook
> >> in a bpf map and when some other task in a different container is trying
> >> to connect to a service we need to know where this service is running.
> >> It can be remote and can be local. Both client and service may or may not
> >> be written with containers in mind and this sockaddr rewrite is providing
> >> connectivity and load balancing feature that you simply cannot do
> >> with hierarchical networking primitives.
> >
> > have to explain this a bit further...
> > We also considered hacking these 'connectivity holes' in
> > netns and/or vrf, but that would be real layering violation,
> > since clean l2, l3 abstraction would suddenly support
> > something that breaks through the layers.
> > Just like many consider ipvlan a bad hack that punches
> > through the layers and connects l2 abstraction of netns
> > at l3 layer, this is not something kernel should ever do.
> > We really didn't want another ipvlan-like hack in the kernel.
> > Instead bpf programs at bind/connect time _help_
> > applications discover and connect to each other.
> > All containers are running in init_nens and there are no vrfs.
> > After bind/connect the normal fib/neighbor core networking
> > logic works as it should always do. The whole system is
> > clean from network point of view.
>
>
> We apparently missed something when deploying ipvlan and one netns per
> container/job
Hanness expressed the reasons why RHEL doesn't support ipvlan long ago.
I couldn't find the complete link. This one mentions some of the issues:
https://www.mail-archive.com/netdev@vger.kernel.org/msg157614.html
Since ipvlan works for you, great, but it's clearly a layering violation.
ipvlan connects L2 namespaces via L3 by doing its own fib lookups.
To me it's a definition 'punch connectivity hole' in L2 abstraction.
In normal L2 setup of netns+veth the traffic from one netns should
have went into another netns via full L2. ipvlan cheats by giving
L3 connectivity. It's not clean to me. There are still neighbour
tables in netnses that are duplicated.
Because netns is L2 there is full requeuing for traffic across netnses.
I guess google doesn't prioritize container to container traffic
while outside into netns via ipvlan works ok similar to bond, but
imo it's cheating too.
imo afnetns would have been much better alternative for your
use case without ipvlan pitfalls, but as you said ipvlan already
in the tree and afnetns is not.
With afnetns early demux would have worked not only for traffic from
the network, but for traffic across afnetns-es.
> I find netns isolation very clean, powerful, and it is there already.
netns+veth is a clean abstraction, but netns+ipvlan is imo not.
imo VRF is another clean L3 abstraction. Yet some folks tried
to do VRF-like things with netns.
David Ahern wrote nice blog about issues with that.
I suspect VRF also could have worked for google use case
and would have been easier to use than netns+ipvlan.
But since ipvlan works for you in the current shape, great,
I'm not going to argue further.
Let's agree to disagree on cleanliness of the solution.
> It also works with UDP just fine. Are you considering adding a hook
> later for sendmsg() (unconnected socket or not), or do you want to use
> the existing one in ip_finish_output(), adding per-packet overhead ?
Currently that's indeed the case. Existing cgroup-bpf hooks
at ip_finish_output work for many use cases, but per-packet overhead
is bad. With bind/connect hooks we avoid that overhead for
good traffic (which is tcp and connected udp). We still need
to solve it for unconnected udp. Rough idea is to do similar
sockaddr rewrite/drop in unconnected part of udp_sendmsg.
^ permalink raw reply
* Re: [PATCH v2 iproute2-next 0/6] cm_id, cq, mr, and pd resource tracking
From: Jason Gunthorpe @ 2018-03-15 3:29 UTC (permalink / raw)
To: David Ahern; +Cc: Leon Romanovsky, stephen, Steve Wise, netdev, linux-rdma
In-Reply-To: <e0daff32-9957-e90b-e82e-b1352f42d6e4@gmail.com>
On Wed, Mar 14, 2018 at 08:14:53PM -0700, David Ahern wrote:
> On 3/13/18 2:13 PM, Jason Gunthorpe wrote:
> > Could you pull the uapi headers from linux-next? That tree will have
> > both netdev and rdma stuff merged together properly.
>
> What's the merge history between linux-next, Linus' tree, net-next +
> rdma-next?
Not sure I understand the question?
linux-next is thrown away every cycle, so for instance you can't say
"took uapi headers from linux-next commit XYZ" in the iproute git
commit..
Otherwise, linux-next is built, I think daily/weekly and includes the
latest of both rdma and netdev next trees, so it certainly has the
right content.
A script could figure out the stable netdev and rdma commit IDs from
linux-next and record those, if that is interest..
Jason
^ permalink raw reply
* Re: [PATCH v2 iproute2-next 0/6] cm_id, cq, mr, and pd resource tracking
From: David Ahern @ 2018-03-15 3:14 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Leon Romanovsky, stephen, Steve Wise, netdev, linux-rdma
In-Reply-To: <20180313211355.GD21498@ziepe.ca>
On 3/13/18 2:13 PM, Jason Gunthorpe wrote:
> Could you pull the uapi headers from linux-next? That tree will have
> both netdev and rdma stuff merged together properly.
What's the merge history between linux-next, Linus' tree, net-next +
rdma-next?
^ permalink raw reply
* Re: [PATCH v2 net] net: sched: fix uses after free
From: John Fastabend @ 2018-03-15 3:10 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller; +Cc: netdev, Eric Dumazet, Jamal Hadi Salim
In-Reply-To: <20180315015300.233327-1-edumazet@google.com>
On 03/14/2018 06:53 PM, Eric Dumazet wrote:
> syzbot reported one use-after-free in pfifo_fast_enqueue() [1]
>
> Issue here is that we can not reuse skb after a successful skb_array_produce()
> since another cpu might have consumed it already.
>
> I believe a similar problem exists in try_bulk_dequeue_skb_slow()
> in case we put an skb into qdisc_enqueue_skb_bad_txq() for lockless qdisc.
>
[...]
> Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: syzbot+ed43b6903ab968b16f54@syzkaller.appspotmail.com
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> ---
> net/sched/sch_generic.c | 22 +++++++++++++---------
> 1 file changed, 13 insertions(+), 9 deletions(-)
>
Thanks!
Acked-by: John Fastabend <john.fastabend@gmail.com>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 190570f21b208d5a17943360a3a6f85e1c2a2187..7e3fbe9cc936be376b66a5b12bf8957c3b601f2c 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -106,6 +106,14 @@ static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q,
>
> __skb_queue_tail(&q->skb_bad_txq, skb);
>
> + if (qdisc_is_percpu_stats(q)) {
> + qdisc_qstats_cpu_backlog_inc(q, skb);
So I guess the skb access above needs to be removed as
well per your comment in the commit description. But that
can be another patch.
> + qdisc_qstats_cpu_qlen_inc(q);
> + } else {
> + qdisc_qstats_backlog_inc(q, skb);
> + q->q.qlen++;
> + }
> +
> if (lock)
> spin_unlock(lock);
> }
^ permalink raw reply
* [PATCH] mlx5: Remove call to ida_pre_get
From: Matthew Wilcox @ 2018-03-15 2:57 UTC (permalink / raw)
To: Saeed Mahameed, Matan Barak, Leon Romanovsky; +Cc: netdev, linux-rdma
From: Matthew Wilcox <mawilcox@microsoft.com>
The mlx5 driver calls ida_pre_get() in a loop for no readily apparent
reason. The driver uses ida_simple_get() which will call ida_pre_get()
by itself and there's no need to use ida_pre_get() unless using
ida_get_new().
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 10e16381f20a..3ba07c7096ef 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1647,7 +1647,6 @@ try_add_to_existing_fg(struct mlx5_flow_table *ft,
list_for_each_entry(iter, match_head, list) {
nested_down_read_ref_node(&iter->g->node, FS_LOCK_PARENT);
- ida_pre_get(&iter->g->fte_allocator, GFP_KERNEL);
}
search_again_locked:
^ permalink raw reply related
* [PATCH v2] rsi: Remove stack VLA usage
From: Tobin C. Harding @ 2018-03-15 2:31 UTC (permalink / raw)
To: Kalle Valo
Cc: Tobin C. Harding, kernel-hardening, linux-kernel, netdev,
linux-wireless, Tycho Andersen, Kees Cook, Larry Finger
The use of stack Variable Length Arrays needs to be avoided, as they
can be a vector for stack exhaustion, which can be both a runtime bug
(kernel Oops) or a security flaw (overwriting memory beyond the
stack). Also, in general, as code evolves it is easy to lose track of
how big a VLA can get. Thus, we can end up having runtime failures
that are hard to debug. As part of the directive[1] to remove all VLAs
from the kernel, and build with -Wvla.
Currently rsi code uses a VLA based on a function argument to
`rsi_sdio_load_data_master_write()`. The function call chain is
Both these functions
rsi_sdio_reinit_device()
rsi_probe()
start the call chain:
rsi_hal_device_init()
rsi_load_fw()
auto_fw_upgrade()
ping_pong_write()
rsi_sdio_load_data_master_write()
[Without familiarity with the code] it appears that none of the 4 locks
mutex
rx_mutex
tx_mutex
tx_bus_mutex
are held when `rsi_sdio_load_data_master_write()` is called. It is therefore
safe to use kmalloc with GFP_KERNEL.
We can avoid using the VLA by using `kmalloc()` and free'ing the memory on all
exit paths.
Change buffer from 'u8 array' to 'u8 *'. Call `kmalloc()` to allocate memory for
the buffer. Using goto statement to call `kfree()` on all return paths.
It can be expected that this patch will result in a small increase in overhead
due to the use of `kmalloc()` however this code is only called on initialization
(and re-initialization) so this overhead should not degrade performance.
[1] https://lkml.org/lkml/2018/3/7/621
Signed-off-by: Tobin C. Harding <me@tobin.cc>
---
This applies onto tip of wireless-drivers next, commit
(28bf8312a983 mwifiex: get_channel from firmware
v2:
- Use kmalloc instead of #define (suggested by Larry)
- (Apply on top of wireless-drivers-next tree)
- (Fix up user name on patchwork.kernel.org)
drivers/net/wireless/rsi/rsi_91x_sdio.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/drivers/net/wireless/rsi/rsi_91x_sdio.c b/drivers/net/wireless/rsi/rsi_91x_sdio.c
index 98c7d1dae18e..13705fca59dd 100644
--- a/drivers/net/wireless/rsi/rsi_91x_sdio.c
+++ b/drivers/net/wireless/rsi/rsi_91x_sdio.c
@@ -576,7 +576,7 @@ static int rsi_sdio_load_data_master_write(struct rsi_hw *adapter,
{
u32 num_blocks, offset, i;
u16 msb_address, lsb_address;
- u8 temp_buf[block_size];
+ u8 *temp_buf;
int status;
num_blocks = instructions_sz / block_size;
@@ -585,11 +585,15 @@ static int rsi_sdio_load_data_master_write(struct rsi_hw *adapter,
rsi_dbg(INFO_ZONE, "ins_size: %d, num_blocks: %d\n",
instructions_sz, num_blocks);
+ temp_buf = kmalloc(block_size, GFP_KERNEL);
+ if (!temp_buf)
+ return -ENOMEM;
+
/* Loading DM ms word in the sdio slave */
status = rsi_sdio_master_access_msword(adapter, msb_address);
if (status < 0) {
rsi_dbg(ERR_ZONE, "%s: Unable to set ms word reg\n", __func__);
- return status;
+ goto out_free;
}
for (offset = 0, i = 0; i < num_blocks; i++, offset += block_size) {
@@ -601,7 +605,7 @@ static int rsi_sdio_load_data_master_write(struct rsi_hw *adapter,
temp_buf, block_size);
if (status < 0) {
rsi_dbg(ERR_ZONE, "%s: failed to write\n", __func__);
- return status;
+ goto out_free;
}
rsi_dbg(INFO_ZONE, "%s: loading block: %d\n", __func__, i);
base_address += block_size;
@@ -616,7 +620,7 @@ static int rsi_sdio_load_data_master_write(struct rsi_hw *adapter,
rsi_dbg(ERR_ZONE,
"%s: Unable to set ms word reg\n",
__func__);
- return status;
+ goto out_free;
}
}
}
@@ -632,12 +636,16 @@ static int rsi_sdio_load_data_master_write(struct rsi_hw *adapter,
temp_buf,
instructions_sz % block_size);
if (status < 0)
- return status;
+ goto out_free;
rsi_dbg(INFO_ZONE,
"Written Last Block in Address 0x%x Successfully\n",
offset | RSI_SD_REQUEST_MASTER);
}
- return 0;
+
+ status = 0;
+out_free:
+ kfree(temp_buf);
+ return status;
}
#define FLASH_SIZE_ADDR 0x04000016
--
2.7.4
^ permalink raw reply related
* Re: [PATCH 2/2] net: phy: relax error checking when creating sysfs link netdev->phydev
From: Greg Kroah-Hartman @ 2018-03-15 2:26 UTC (permalink / raw)
To: Grygorii Strashko
Cc: David S. Miller, netdev, Andrew Lunn, Florian Fainelli,
Sekhar Nori, linux-kernel, linux-omap
In-Reply-To: <20180314222624.12744-3-grygorii.strashko@ti.com>
On Wed, Mar 14, 2018 at 05:26:24PM -0500, Grygorii Strashko wrote:
> Some ethernet drivers (like TI CPSW) may connect and manage >1 Net PHYs per
> one netdevice, as result such drivers will produce warning during system
> boot and fail to connect second phy to netdevice when PHYLIB framework
> will try to create sysfs link netdev->phydev for second PHY
> in phy_attach_direct(), because sysfs link with the same name has been
> created already for the first PHY. As result, second CPSW external
> port will became unusable.
>
> Fix it by relaxing error checking when PHYLIB framework is creating sysfs
> link netdev->phydev in phy_attach_direct(), suppressing warning by using
> sysfs_create_link_nowarn() and adding debug message instead.
>
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> Fixes: a3995460491d ("net: phy: Relax error checking on sysfs_create_link()")
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
> ---
> drivers/net/phy/phy_device.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 478405e..fe16f58 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -1012,10 +1012,17 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
> err = sysfs_create_link(&phydev->mdio.dev.kobj, &dev->dev.kobj,
> "attached_dev");
> if (!err) {
> - err = sysfs_create_link(&dev->dev.kobj, &phydev->mdio.dev.kobj,
> - "phydev");
> - if (err)
> - goto error;
> + err = sysfs_create_link_nowarn(&dev->dev.kobj,
> + &phydev->mdio.dev.kobj,
> + "phydev");
> + if (err) {
> + dev_err(&dev->dev, "could not add device link to %s err %d\n",
> + kobject_name(&phydev->mdio.dev.kobj),
> + err);
dev_err() is not a "debugging" message :)
What is a user going to do with this new error? If it's not important
at all, why care about it?
> + /* non-fatal - some net drivers can use one netdevice
> + * with more then one phy
> + */
What about devices that do not have more than one phy and this call
fails for? Shouldn't you check for that?
thanks,
greg k-h
^ permalink raw reply
* Re: [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-15 2:17 UTC (permalink / raw)
To: Alexander Duyck
Cc: Timur Tabi, Netdev, sulrich, linux-arm-msm, linux-arm-kernel,
Jeff Kirsher, intel-wired-lan, LKML
In-Reply-To: <CAKgT0UeBHGDmr0uXfuPQJKOomiH9uvXRVYpyDwMOzjkxAZMSYg@mail.gmail.com>
On 3/14/2018 9:44 PM, Alexander Duyck wrote:
> On Wed, Mar 14, 2018 at 3:57 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>> Hi Alexander,
>>
>> On 3/14/2018 5:49 PM, Alexander Duyck wrote:
>>> On Wed, Mar 14, 2018 at 5:13 AM, <okaya@codeaurora.org> wrote:
>>>> On 2018-03-14 01:08, Timur Tabi wrote:
>>>>>
>>>>> On 3/13/18 10:20 PM, Sinan Kaya wrote:
>>
>>> Actually I would argue this whole patch set is pointless. For starters
>>> why is it we are only updating the Intel Ethernet drivers here?
>>
>> I did a regex search for wmb() followed by writel() in each drivers directory.
>> I scrubbed the ones I care about and posted this series. Note also that
>> I have one Infiniband patch in the series.
>
> I didn't see it as it I was only looking at the patches that had ended
> up in intel-wired-lan. Also was there a cover page, I couldn't seem to
> find that on LKML.
Yeah, I didn't have a cover page. These patches were sitting on my branch
for a while. I wanted to get them out without putting too much effort into
it. I'll add it on the next version.
>
>> I considered "ease of change", "popular usage" and "performance critical
>> path" as the determining criteria for my filtering.
>
> It might be advisable to break things up to subsystem or family. So
> for example if you are going to update the Intel Ethernet drivers I
> would focus on that and maybe spin the infiniband patch of into a
> separate set that can be applied to a separate tree. This is something
> I would consider more of a driver optimization than a fix. In our case
> it makes it easier for us to maintain the patches to the Intel drivers
> if you could submit just those to Jeff and Intel-wired-lan so that we
> can take care of test and review as well as figure out what other
> drivers will would still need to update in order to handle all the
> cases involved in this.
>
>>> This
>>> seems like something that is going to impact the whole kernel tree
>>> since many of us have been writing drivers for some time assuming x86
>>> style behavior.
>>
>> That's true. We used relaxed API heavily on ARM for a long time but
>> it did not exist on other architectures. For this reason, relaxed
>> architectures have been paying double penalty in order to use the common
>> drivers.
>>
>> Now that relaxed API is present on all architectures, we can go and scrub
>> all drivers to see what needs to change and what can remain.
>
> My only real objection is that you are going to be having to scrub
> pretty much ALL the drivers. It seems a little like trying to fix a
> bad tire on your car by paving the road to match the shape of the
> tire.
Or we start with mostly used ones and hope to increase the coverage over time.
It will take a while to cover all drivers.
We could do several iterations like you are suggesting for each subsystem.
>
>>>
>>> It doesn't make sense to be optimizing the drivers for one subset of
>>> architectures. The scope of the work needed to update the drivers for
>>> this would be ridiculous. Also I don't see how this could be expected
>>> to work on any other architecture when we pretty much need to have a
>>> wmb() before calling the writel on x86 to deal with accesses between
>>> coherent and non-coherent memory. It seems to me more like somebody
>>> added what they considered to be an optimization somewhere that is a
>>> workaround for a poorly written driver. Either that or the barrier is
>>> serving a different purpose then the one we were using.
>>
>> Is there a semantic problem with the definition of wmb() vs. writel() vs.
>> writel_relaxed()? I thought everything is well described in barriers
>> document about what to expect from these APIs.
>>
>> AFAIK, writel() is equal to writel_relaxed() on x86 architecture.
>> It doesn't really change anything for x86 but it saves barriers on
>> other architectures.
>
> Yeah. I had to go through and do some review since my concerns have
> been PowerPC, IA64, and x86 historically. From what I can tell all
> those architectures are setup the same way so that shouldn't be an
> issue.
OK, glad that we are in common understanding.
>
>>>
>>> It would make more sense to put in the effort making writel and
>>> writel_relaxed consistent between architectures before we go through
>>> and start modifying driver code to support different architectures.
>>>
>>
>> Is there an arch problem that I'm not seeing?
>>
>> Sinan
>
> It isn't really an arch problem I have as a logistical one. It just
> seems like this is really backwards in terms of how this has been
> handled. For the x86 we have historically had to deal with the
> barriers for this kind of stuff ourselves, now for ARM and a couple
> other architectures they seem to have incorporated the barriers into
> writel and are expecting everyone to move over to writel_relaxed.
You want to move to writel_relaxed() only if you know that your register
accesses won't have any side effects.
If you require some memory update to be observable to the HW before
doing a register write, right thing is to do
wmb() + writel_relaxed()
wmb() + writel() is clearly the wrong choice and that's what the goal of
this change.
If we know that all writel() adaptations in all architectures guarantee
observability, another option is to get rid of wmb() and just keep writel().
I'm not so convinced about this and hoping that someone will correct me.
wmb() on x86 seems to have an sfence but writel() seems to have a compiler
barrier in it. So, the type of barrier wmb() is using is different.
we can't say
(wmb()+ writel_relaxed()) == writel()
for all architectures, but maybe I'm wrong.
We really don't want to convert all writel() to writel_relaxed() blindly
without giving much thought into it.
This was also another reason why I limited the changes to wmb() + writel()
combinations only.
If there is wmb() + code + writel() and if we convert this to wmb() + code +
writel_relaxed(), code will not be observed by the HW and this might break
the driver.
> It
> seems like instead of going that route they should have probably just
> looked at pushing the ARM drivers to something like a "writel_strict"
> and adopted the behavior of the other architectures for writel.
>
> I'll go back through and review. It looks like a number of items were missed.
>
OK, I'll take a look at each one. Saw the code concern on the e1000e suggestion.
I wanted to raise it here first before inspecting the rest.
--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
^ permalink raw reply
* linux-next: manual merge of the net-next tree with the net tree
From: Stephen Rothwell @ 2018-03-15 1:55 UTC (permalink / raw)
To: David Miller, Networking
Cc: Linux-Next Mailing List, Linux Kernel Mailing List,
Sabrina Dubroca, David Ahern
[-- Attachment #1: Type: text/plain, Size: 1306 bytes --]
Hi all,
Today's linux-next merge of the net-next tree got a conflict in:
net/ipv4/xfrm4_policy.c
between commit:
d52e5a7e7ca4 ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu")
from the net tree and commit:
68e813aa4307 ("net/ipv4: Remove fib table id from rtable")
from the net-next tree.
I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging. You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.
--
Cheers,
Stephen Rothwell
diff --cc net/ipv4/xfrm4_policy.c
index fbebda67ac1b,0c752dc3f93b..000000000000
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@@ -100,10 -100,7 +100,9 @@@ static int xfrm4_fill_dst(struct xfrm_d
xdst->u.rt.rt_gateway = rt->rt_gateway;
xdst->u.rt.rt_uses_gateway = rt->rt_uses_gateway;
xdst->u.rt.rt_pmtu = rt->rt_pmtu;
+ xdst->u.rt.rt_mtu_locked = rt->rt_mtu_locked;
- xdst->u.rt.rt_table_id = rt->rt_table_id;
INIT_LIST_HEAD(&xdst->u.rt.rt_uncached);
+ rt_add_uncached_list(&xdst->u.rt);
return 0;
}
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* Re: [PATCH net] net: sched: fix uses after free
From: Eric Dumazet @ 2018-03-15 1:54 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Eric Dumazet, John Fastabend, Jamal Hadi Salim
In-Reply-To: <20180315014800.232628-1-edumazet@google.com>
On Wed, Mar 14, 2018 at 6:48 PM Eric Dumazet <edumazet@google.com> wrote:
> syzbot reported one use-after-free in pfifo_fast_enqueue() [1]
> Issue here is that we can not reuse skb after a successful
skb_array_produce()
> since another cpu might have consumed it already.
> I believe a similar problem exists in try_bulk_dequeue_skb_slow()
> in case we put an skb into qdisc_enqueue_skb_bad_txq() for lockless qdisc.
> [1]
> ==================================================================
I sent a V2 without this ====== line that is fooling patchwork, sorry for
the noise.
^ permalink raw reply
* Re: [PATCH 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
From: Alexander Duyck @ 2018-03-15 1:53 UTC (permalink / raw)
To: Sinan Kaya
Cc: Netdev, Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel,
Jeff Kirsher, intel-wired-lan, LKML
In-Reply-To: <1520997599-17298-1-git-send-email-okaya@codeaurora.org>
On Tue, Mar 13, 2018 at 8:19 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier
> on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 2 +-
> drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index e554aa6cf..7028516 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -1375,7 +1375,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
> * such as IA-64).
> */
> wmb();
> - writel(val, rx_ring->tail);
> + writel_relaxed(val, rx_ring->tail);
> }
>
> /**
> diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
> index 357d605..2d323fc 100644
> --- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
> @@ -667,7 +667,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
> * such as IA-64).
> */
> wmb();
> - writel(val, rx_ring->tail);
> + writel_relaxed(val, rx_ring->tail);
> }
>
So this one missed writel calls in:
i40e:
i40e_program_fdir_filter
i40e_clean_rx_irq
i40e_tx_map
i40evf:
i40e_clean_rx_irq
i40e_tx_map
^ permalink raw reply
* [PATCH v2 net] net: sched: fix uses after free
From: Eric Dumazet @ 2018-03-15 1:53 UTC (permalink / raw)
To: David S . Miller
Cc: netdev, Eric Dumazet, Eric Dumazet, John Fastabend,
Jamal Hadi Salim
syzbot reported one use-after-free in pfifo_fast_enqueue() [1]
Issue here is that we can not reuse skb after a successful skb_array_produce()
since another cpu might have consumed it already.
I believe a similar problem exists in try_bulk_dequeue_skb_slow()
in case we put an skb into qdisc_enqueue_skb_bad_txq() for lockless qdisc.
[1]
BUG: KASAN: use-after-free in qdisc_pkt_len include/net/sch_generic.h:610 [inline]
BUG: KASAN: use-after-free in qdisc_qstats_cpu_backlog_inc include/net/sch_generic.h:712 [inline]
BUG: KASAN: use-after-free in pfifo_fast_enqueue+0x4bc/0x5e0 net/sched/sch_generic.c:639
Read of size 4 at addr ffff8801cede37e8 by task syzkaller717588/5543
CPU: 1 PID: 5543 Comm: syzkaller717588 Not tainted 4.16.0-rc4+ #265
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x24d lib/dump_stack.c:53
print_address_description+0x73/0x250 mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report+0x23c/0x360 mm/kasan/report.c:412
__asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
qdisc_pkt_len include/net/sch_generic.h:610 [inline]
qdisc_qstats_cpu_backlog_inc include/net/sch_generic.h:712 [inline]
pfifo_fast_enqueue+0x4bc/0x5e0 net/sched/sch_generic.c:639
__dev_xmit_skb net/core/dev.c:3216 [inline]
Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+ed43b6903ab968b16f54@syzkaller.appspotmail.com
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
---
net/sched/sch_generic.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 190570f21b208d5a17943360a3a6f85e1c2a2187..7e3fbe9cc936be376b66a5b12bf8957c3b601f2c 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -106,6 +106,14 @@ static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q,
__skb_queue_tail(&q->skb_bad_txq, skb);
+ if (qdisc_is_percpu_stats(q)) {
+ qdisc_qstats_cpu_backlog_inc(q, skb);
+ qdisc_qstats_cpu_qlen_inc(q);
+ } else {
+ qdisc_qstats_backlog_inc(q, skb);
+ q->q.qlen++;
+ }
+
if (lock)
spin_unlock(lock);
}
@@ -196,14 +204,6 @@ static void try_bulk_dequeue_skb_slow(struct Qdisc *q,
break;
if (unlikely(skb_get_queue_mapping(nskb) != mapping)) {
qdisc_enqueue_skb_bad_txq(q, nskb);
-
- if (qdisc_is_percpu_stats(q)) {
- qdisc_qstats_cpu_backlog_inc(q, nskb);
- qdisc_qstats_cpu_qlen_inc(q);
- } else {
- qdisc_qstats_backlog_inc(q, nskb);
- q->q.qlen++;
- }
break;
}
skb->next = nskb;
@@ -628,6 +628,7 @@ static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc,
int band = prio2band[skb->priority & TC_PRIO_MAX];
struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
struct skb_array *q = band2list(priv, band);
+ unsigned int pkt_len = qdisc_pkt_len(skb);
int err;
err = skb_array_produce(q, skb);
@@ -636,7 +637,10 @@ static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc,
return qdisc_drop_cpu(skb, qdisc, to_free);
qdisc_qstats_cpu_qlen_inc(qdisc);
- qdisc_qstats_cpu_backlog_inc(qdisc, skb);
+ /* Note: skb can not be used after skb_array_produce(),
+ * so we better not use qdisc_qstats_cpu_backlog_inc()
+ */
+ this_cpu_add(qdisc->cpu_qstats->backlog, pkt_len);
return NET_XMIT_SUCCESS;
}
--
2.16.2.804.g6dcf76e118-goog
^ permalink raw reply related
* Re: [PATCH 5/7] igb: eliminate duplicate barriers on weakly-ordered archs
From: Alexander Duyck @ 2018-03-15 1:50 UTC (permalink / raw)
To: Sinan Kaya
Cc: Netdev, Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel,
Jeff Kirsher, intel-wired-lan, LKML
In-Reply-To: <1520997629-17361-5-git-send-email-okaya@codeaurora.org>
On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier
> on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
> drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index b88fae7..ba8ccb5 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -8072,7 +8072,7 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
> * such as IA-64).
> */
> wmb();
> - writel(i, rx_ring->tail);
> + writel_relaxed(i, rx_ring->tail);
> }
> }
>
This one missed the writel at the end of igb_tx_map().
^ permalink raw reply
* Re: [PATCH 4/7] igbvf: eliminate duplicate barriers on weakly-ordered archs
From: Alexander Duyck @ 2018-03-15 1:48 UTC (permalink / raw)
To: Sinan Kaya
Cc: Netdev, Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel,
Jeff Kirsher, intel-wired-lan, LKML
In-Reply-To: <1520997629-17361-4-git-send-email-okaya@codeaurora.org>
On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier
> on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
> drivers/net/ethernet/intel/igbvf/netdev.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
> index 4214c15..fe3441b 100644
> --- a/drivers/net/ethernet/intel/igbvf/netdev.c
> +++ b/drivers/net/ethernet/intel/igbvf/netdev.c
> @@ -251,7 +251,7 @@ static void igbvf_alloc_rx_buffers(struct igbvf_ring *rx_ring,
> * such as IA-64).
> */
> wmb();
> - writel(i, adapter->hw.hw_addr + rx_ring->tail);
> + writel_relaxed(i, adapter->hw.hw_addr + rx_ring->tail);
> }
> }
>
This one missed the writel at the end of igbvf_tx_queue_adv().
^ permalink raw reply
* [PATCH net] net: sched: fix uses after free
From: Eric Dumazet @ 2018-03-15 1:48 UTC (permalink / raw)
To: David S . Miller
Cc: netdev, Eric Dumazet, Eric Dumazet, John Fastabend,
Jamal Hadi Salim
syzbot reported one use-after-free in pfifo_fast_enqueue() [1]
Issue here is that we can not reuse skb after a successful skb_array_produce()
since another cpu might have consumed it already.
I believe a similar problem exists in try_bulk_dequeue_skb_slow()
in case we put an skb into qdisc_enqueue_skb_bad_txq() for lockless qdisc.
[1]
==================================================================
BUG: KASAN: use-after-free in qdisc_pkt_len include/net/sch_generic.h:610 [inline]
BUG: KASAN: use-after-free in qdisc_qstats_cpu_backlog_inc include/net/sch_generic.h:712 [inline]
BUG: KASAN: use-after-free in pfifo_fast_enqueue+0x4bc/0x5e0 net/sched/sch_generic.c:639
Read of size 4 at addr ffff8801cede37e8 by task syzkaller717588/5543
CPU: 1 PID: 5543 Comm: syzkaller717588 Not tainted 4.16.0-rc4+ #265
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x24d lib/dump_stack.c:53
print_address_description+0x73/0x250 mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report+0x23c/0x360 mm/kasan/report.c:412
__asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
qdisc_pkt_len include/net/sch_generic.h:610 [inline]
qdisc_qstats_cpu_backlog_inc include/net/sch_generic.h:712 [inline]
pfifo_fast_enqueue+0x4bc/0x5e0 net/sched/sch_generic.c:639
__dev_xmit_skb net/core/dev.c:3216 [inline]
Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+ed43b6903ab968b16f54@syzkaller.appspotmail.com
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
---
net/sched/sch_generic.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 190570f21b208d5a17943360a3a6f85e1c2a2187..7e3fbe9cc936be376b66a5b12bf8957c3b601f2c 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -106,6 +106,14 @@ static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q,
__skb_queue_tail(&q->skb_bad_txq, skb);
+ if (qdisc_is_percpu_stats(q)) {
+ qdisc_qstats_cpu_backlog_inc(q, skb);
+ qdisc_qstats_cpu_qlen_inc(q);
+ } else {
+ qdisc_qstats_backlog_inc(q, skb);
+ q->q.qlen++;
+ }
+
if (lock)
spin_unlock(lock);
}
@@ -196,14 +204,6 @@ static void try_bulk_dequeue_skb_slow(struct Qdisc *q,
break;
if (unlikely(skb_get_queue_mapping(nskb) != mapping)) {
qdisc_enqueue_skb_bad_txq(q, nskb);
-
- if (qdisc_is_percpu_stats(q)) {
- qdisc_qstats_cpu_backlog_inc(q, nskb);
- qdisc_qstats_cpu_qlen_inc(q);
- } else {
- qdisc_qstats_backlog_inc(q, nskb);
- q->q.qlen++;
- }
break;
}
skb->next = nskb;
@@ -628,6 +628,7 @@ static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc,
int band = prio2band[skb->priority & TC_PRIO_MAX];
struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
struct skb_array *q = band2list(priv, band);
+ unsigned int pkt_len = qdisc_pkt_len(skb);
int err;
err = skb_array_produce(q, skb);
@@ -636,7 +637,10 @@ static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc,
return qdisc_drop_cpu(skb, qdisc, to_free);
qdisc_qstats_cpu_qlen_inc(qdisc);
- qdisc_qstats_cpu_backlog_inc(qdisc, skb);
+ /* Note: skb can not be used after skb_array_produce(),
+ * so we better not use qdisc_qstats_cpu_backlog_inc()
+ */
+ this_cpu_add(qdisc->cpu_qstats->backlog, pkt_len);
return NET_XMIT_SUCCESS;
}
--
2.16.2.804.g6dcf76e118-goog
^ permalink raw reply related
* Re: [PATCH 2/7] ixgbe: eliminate duplicate barriers on weakly-ordered archs
From: Alexander Duyck @ 2018-03-15 1:47 UTC (permalink / raw)
To: Sinan Kaya
Cc: Netdev, Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel,
Jeff Kirsher, intel-wired-lan, LKML
In-Reply-To: <1520997629-17361-2-git-send-email-okaya@codeaurora.org>
On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
In this patch you missed the writel at the end of ixgbe_tx_map.
^ permalink raw reply
* Re: [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
From: Alexander Duyck @ 2018-03-15 1:45 UTC (permalink / raw)
To: Sinan Kaya
Cc: Netdev, Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel,
Jeff Kirsher, intel-wired-lan, LKML
In-Reply-To: <1520997629-17361-7-git-send-email-okaya@codeaurora.org>
On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
> drivers/net/ethernet/intel/ixgbevf/ixgbevf.h | 5 ++++-
> drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 7 +++++++
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
> index f695242..64d0e0b 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
> @@ -244,9 +244,12 @@ static inline u16 ixgbevf_desc_unused(struct ixgbevf_ring *ring)
> return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
> }
>
> +/* Assumes caller has executed a write barrier to order memory and device
> + * requests.
> + */
> static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 value)
> {
> - writel(value, ring->tail);
> + writel_relaxed(value, ring->tail);
> }
>
> #define IXGBEVF_RX_DESC(R, i) \
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> index 9b3d43d..0ba7f59 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> @@ -3643,6 +3643,13 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
>
> tx_ring->next_to_use = i;
>
> + /* Force memory writes to complete before letting h/w
> + * know there are new descriptors to fetch. (Only
> + * applicable for weak-ordered memory model archs,
> + * such as IA-64).
> + */
> + wmb();
This memory barrier is redundant. There is a wmb() that is called
about 10 lines before this.
> +
> /* notify HW of packet */
> ixgbevf_write_tail(tx_ring, i);
>
> --
> 2.7.4
>
^ permalink raw reply
* Re: [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
From: Alexander Duyck @ 2018-03-15 1:44 UTC (permalink / raw)
To: Sinan Kaya
Cc: Timur Tabi, Netdev, sulrich, linux-arm-msm, linux-arm-kernel,
Jeff Kirsher, intel-wired-lan, LKML
In-Reply-To: <d565406d-1bc7-6fb0-3f36-87a28ac69070@codeaurora.org>
On Wed, Mar 14, 2018 at 3:57 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Hi Alexander,
>
> On 3/14/2018 5:49 PM, Alexander Duyck wrote:
>> On Wed, Mar 14, 2018 at 5:13 AM, <okaya@codeaurora.org> wrote:
>>> On 2018-03-14 01:08, Timur Tabi wrote:
>>>>
>>>> On 3/13/18 10:20 PM, Sinan Kaya wrote:
>
>> Actually I would argue this whole patch set is pointless. For starters
>> why is it we are only updating the Intel Ethernet drivers here?
>
> I did a regex search for wmb() followed by writel() in each drivers directory.
> I scrubbed the ones I care about and posted this series. Note also that
> I have one Infiniband patch in the series.
I didn't see it as it I was only looking at the patches that had ended
up in intel-wired-lan. Also was there a cover page, I couldn't seem to
find that on LKML.
> I considered "ease of change", "popular usage" and "performance critical
> path" as the determining criteria for my filtering.
It might be advisable to break things up to subsystem or family. So
for example if you are going to update the Intel Ethernet drivers I
would focus on that and maybe spin the infiniband patch of into a
separate set that can be applied to a separate tree. This is something
I would consider more of a driver optimization than a fix. In our case
it makes it easier for us to maintain the patches to the Intel drivers
if you could submit just those to Jeff and Intel-wired-lan so that we
can take care of test and review as well as figure out what other
drivers will would still need to update in order to handle all the
cases involved in this.
>> This
>> seems like something that is going to impact the whole kernel tree
>> since many of us have been writing drivers for some time assuming x86
>> style behavior.
>
> That's true. We used relaxed API heavily on ARM for a long time but
> it did not exist on other architectures. For this reason, relaxed
> architectures have been paying double penalty in order to use the common
> drivers.
>
> Now that relaxed API is present on all architectures, we can go and scrub
> all drivers to see what needs to change and what can remain.
My only real objection is that you are going to be having to scrub
pretty much ALL the drivers. It seems a little like trying to fix a
bad tire on your car by paving the road to match the shape of the
tire.
>>
>> It doesn't make sense to be optimizing the drivers for one subset of
>> architectures. The scope of the work needed to update the drivers for
>> this would be ridiculous. Also I don't see how this could be expected
>> to work on any other architecture when we pretty much need to have a
>> wmb() before calling the writel on x86 to deal with accesses between
>> coherent and non-coherent memory. It seems to me more like somebody
>> added what they considered to be an optimization somewhere that is a
>> workaround for a poorly written driver. Either that or the barrier is
>> serving a different purpose then the one we were using.
>
> Is there a semantic problem with the definition of wmb() vs. writel() vs.
> writel_relaxed()? I thought everything is well described in barriers
> document about what to expect from these APIs.
>
> AFAIK, writel() is equal to writel_relaxed() on x86 architecture.
> It doesn't really change anything for x86 but it saves barriers on
> other architectures.
Yeah. I had to go through and do some review since my concerns have
been PowerPC, IA64, and x86 historically. From what I can tell all
those architectures are setup the same way so that shouldn't be an
issue.
>>
>> It would make more sense to put in the effort making writel and
>> writel_relaxed consistent between architectures before we go through
>> and start modifying driver code to support different architectures.
>>
>
> Is there an arch problem that I'm not seeing?
>
> Sinan
It isn't really an arch problem I have as a logistical one. It just
seems like this is really backwards in terms of how this has been
handled. For the x86 we have historically had to deal with the
barriers for this kind of stuff ourselves, now for ARM and a couple
other architectures they seem to have incorporated the barriers into
writel and are expecting everyone to move over to writel_relaxed. It
seems like instead of going that route they should have probably just
looked at pushing the ARM drivers to something like a "writel_strict"
and adopted the behavior of the other architectures for writel.
I'll go back through and review. It looks like a number of items were missed.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox