Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v2 net-next] sfc: stop the TX queue before pushing new buffers
From: Edward Cree @ 2018-05-25 11:57 UTC (permalink / raw)
  To: Martin Habets, linux-net-drivers, davem; +Cc: netdev, jarod
In-Reply-To: <f2749cb0-f1ce-4572-04fe-f1c11fe09237@solarflare.com>

On 24/05/18 10:14, Martin Habets wrote:
> efx_enqueue_skb() can push new buffers for the xmit_more functionality.
> We must stops the TX queue before this or else the TX queue does not get
> restarted and we get a netdev watchdog.
>
> In the error handling we may now need to unwind more than 1 packet, and
> we may need to push the new buffers onto the partner queue.
>
> v2: In the error leg also push this queue if xmit_more is set
>
> Fixes: e9117e5099ea ("sfc: Firmware-Assisted TSO version 2")
> Reported-by: Jarod Wilson <jarod@redhat.com>
> Tested-by: Jarod Wilson <jarod@redhat.com>
> Signed-off-by: Martin Habets <mhabets@solarflare.com>
Acked-by: Edward Cree <ecree@solarflare.com>

^ permalink raw reply

* Re: [PATCH 0/4] RFC CPSW switchdev mode
From: Andrew Lunn @ 2018-05-25 11:59 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Ivan Vecera, Jiri Pirko, netdev, grygorii.strashko,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton
In-Reply-To: <20180525102803.GA30627@apalos>

> So it's only static FBDs left.

Please describe the use case. Why would i want to put static FDB
addresses on a CPU port? What in the linux network stack needs this?

	  Andrew

^ permalink raw reply

* Re: [PATCH net-next 1/7] net: bridge: Extract boilerplate around switchdev_port_obj_*()
From: Nikolay Aleksandrov @ 2018-05-25 12:04 UTC (permalink / raw)
  To: Petr Machata, netdev, devel, bridge
  Cc: jiri, idosch, davem, razvan.stefanescu, gregkh, stephen, andrew,
	vivien.didelot, f.fainelli
In-Reply-To: <af1bed5ce2b236544edd591b8cfb8425c2cb5ca0.1527173527.git.petrm@mellanox.com>

On 24/05/18 18:10, Petr Machata wrote:
> A call to switchdev_port_obj_add() or switchdev_port_obj_del() involves
> initializing a struct switchdev_obj_port_vlan, a piece of code that
> repeats on each call site almost verbatim. While in the current codebase
> there is just one duplicated add call, the follow-up patches add more of
> both add and del calls.
> 
> Thus to remove the duplication, extract the repetition into named
> functions and reuse.
> 
> Signed-off-by: Petr Machata <petrm@mellanox.com>
> ---
>  net/bridge/br_vlan.c | 44 +++++++++++++++++++++++---------------------
>  1 file changed, 23 insertions(+), 21 deletions(-)
> 

Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

^ permalink raw reply

* Re: [PATCH v2 1/1] qed: Add firmware 8.37.2.0
From: Josh Boyer @ 2018-05-25 12:07 UTC (permalink / raw)
  To: Rahul Verma
  Cc: Kyle McMartin, netdev, linux-rdma, linux-scsi, Linux Firmware,
	Ben Hutchings, ariel.elior
In-Reply-To: <20180521135141.11091-1-rahul.verma@cavium.com>

On Mon, May 21, 2018 at 10:25 AM Rahul Verma <rahul.verma@cavium.com> wrote:

> This patch add a new qed firmware with fixes and added
> support for new features.
> -Fix aRFS for tunneled traffic without inner IP.
> -Fix chip configuration which may fail under heavy traffic conditions.
> -Fix iSCSI recovery flow.
> -Support receiving any-VNI in VXLAN and GENEVE RX classification.
> -Support additional RoCE statistics for error cases.
> -Support RoCE T10DIF.

> v1->v2:
> Added signoff for Ariel Elior.

> Signed-off-by: Rahul Verma <rahul.verma@cavium.com>
> Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
> ---
>   WHENCE                                  |   1 +
>   qed/qed_init_values_zipped-8.37.2.0.bin | Bin 0 -> 867472 bytes
>   2 files changed, 1 insertion(+)
>   create mode 100755 qed/qed_init_values_zipped-8.37.2.0.bin

Applied and pushed out.

josh

^ permalink raw reply

* Re: [PATCH net-next 6/7] net: bridge: Notify about bridge VLANs
From: Nikolay Aleksandrov @ 2018-05-25 12:08 UTC (permalink / raw)
  To: Petr Machata, netdev, devel, bridge
  Cc: jiri, idosch, davem, razvan.stefanescu, gregkh, stephen, andrew,
	vivien.didelot, f.fainelli
In-Reply-To: <b0b6b46a239eb330d722cea89137ab6ec5bae6e3.1527173527.git.petrm@mellanox.com>

On 24/05/18 18:10, Petr Machata wrote:
> A driver might need to react to changes in settings of brentry VLANs.
> Therefore send switchdev port notifications for these as well. Reuse
> SWITCHDEV_OBJ_ID_PORT_VLAN for this purpose. Listeners should use
> netif_is_bridge_master() on orig_dev to determine whether the
> notification is about a bridge port or a bridge.
> 
> Signed-off-by: Petr Machata <petrm@mellanox.com>
> ---
>  net/bridge/br_vlan.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 

Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

^ permalink raw reply

* Re: [PATCH 0/4] RFC CPSW switchdev mode
From: Andrew Lunn @ 2018-05-25 12:09 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Ivan Vecera, Jiri Pirko, netdev, grygorii.strashko,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton
In-Reply-To: <20180525062902.GA11526@apalos>

> Understood, if we missed back anything on handling multicast for
> the cpu port we'll go back and fix it (i am assuming snooping is the answer
> here).

It is part of the answer. You should also look at .ndo_set_rx_mode.
When the switch ports are not in a bridge, this call i used to pass a
list of MAC addresses which the network stack would like to receiver.
You probably want to turn that list into MBD and FDBs.

    Andrew

^ permalink raw reply

* Re: [PATCH net-next] net: dsa: dsa_loop: Make dynamic debugging helpful
From: Andrew Lunn @ 2018-05-25 12:16 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, Vivien Didelot, David S. Miller, open list
In-Reply-To: <20180525035215.19341-1-f.fainelli@gmail.com>

On Thu, May 24, 2018 at 08:52:14PM -0700, Florian Fainelli wrote:
> Remove redundant debug prints from phy_read/write since we can trace those
> calls through trace events. Enhance dynamic debug prints to print arguments
> which helps figuring how what is going on at the driver level with higher level
> configuration interfaces.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next 0/7] net: bridge: Notify about bridge VLANs
From: Ilias Apalodimas @ 2018-05-25 12:33 UTC (permalink / raw)
  To: Petr Machata
  Cc: Florian Fainelli, netdev, devel, bridge, jiri, idosch, davem,
	razvan.stefanescu, gregkh, stephen, andrew, vivien.didelot,
	nikolay
In-Reply-To: <wihbmd4gi6t.fsf@dev-r-vrt-156.mtr.labs.mlnx>

On Fri, May 25, 2018 at 01:09:46PM +0300, Petr Machata wrote:
> Florian Fainelli <f.fainelli@gmail.com> writes:
> 
> > You seem to have approached the bridge changes a little differently from
> > this series:
> >
> > https://lists.linux-foundation.org/pipermail/bridge/2016-November/010112.html
> 
> It pretty much extends the patchset to also send the notifications for
> the CPU port.
> 
> I missed this e-mail yesterday and now I see you already found out for
> yourself how it behaves.
> 
> > Both have the same intent that by targeting the bridge device itself,
> > you can propagate that through switchdev to the switch drivers, and in
> > turn create configurations where for instance, you have:
> >
> > - CPU/management port present in specific VLANs that is a subset or
> > superset of the VLANs configured on front-panel ports
> > - CPU/management port tagged/untagged in specific VLANs which can be a
> > different setting from the front-panel ports
> >
> > One problem we have in DSA at the moment is that we always add the CPU
> > port to the VLANs configured to the front-panel port but we do this with
> > the same attributes as the front panel ports! For instance, if you add
> > Port 0 to VLAN1 untagged, the the CPU port also gets added to that
> > VLAN1, also untagged. As long as there is just one VLAN untagged, this
> > is not much of a problem. Now do this with another VLAN or another port,
> > and the CPU can no longer differentiate the traffic from which VLAN it
> > is coming from, no bueno.
> 
> Yep, with this patchset you should be able to use the CPU port
> notifications to configure things exactly.
> 
> > bridge vlan add vid 2 dev port0 pvid untagged
> > 	-> port0 (e.g: switch port 0) gets programmed
> > 	-> CPU port gets programmed
> > bridge vlan add vid 2 dev br0 self
> > 	-> CPU port gets programmed
> > bridge vlan add vid 2 dev port0
> > 	-> port0 (switch port 0) gets programmed
> >
> > Are these use cases possible with your series? It seems to me like it is
> > if we drop the netif_is_bridge_master() checks and resolve orig_dev as
> > being a hint for the CPU/management port.
> 
> Yeah, that's how it behaves. If you accept the events where
> netif_is_bridge_master(orig_dev), you can tell the CPU port-related
> events from the rest by BRIDGE_VLAN_INFO_BRENTRY.
This also addresses the issue i am having trying to add switchdev functionality
to a driver that needs separate configuration for cpu port.  
https://www.spinics.net/lists/netdev/msg504577.html

Tested it and it works fine

Thanks,
Ilias

^ permalink raw reply

* Re: [PATCH] ath10k: transmit queued frames after waking queues
From: Niklas Cassel @ 2018-05-25 12:36 UTC (permalink / raw)
  To: Bob Copeland
  Cc: Adrian Chadd, Kalle Valo, David Miller, ath10k, linux-wireless,
	netdev, Linux Kernel Mailing List
In-Reply-To: <20180524155034.xse7cfm5figfktuj@localhost>

On Thu, May 24, 2018 at 11:50:34AM -0400, Bob Copeland wrote:
> On Mon, May 21, 2018 at 10:37:01PM +0200, Niklas Cassel wrote:
> > On Thu, May 17, 2018 at 03:26:25PM -0700, Adrian Chadd wrote:
> > > On Thu, 17 May 2018 at 16:16, Niklas Cassel <niklas.cassel@linaro.org>
> > > wrote:
> > > 
> > > > diff --git a/drivers/net/wireless/ath/ath10k/txrx.c
> > > b/drivers/net/wireless/ath/ath10k/txrx.c
> > > > index cda164f6e9f6..1d3b2d2c3fee 100644
> > > > --- a/drivers/net/wireless/ath/ath10k/txrx.c
> > > > +++ b/drivers/net/wireless/ath/ath10k/txrx.c
> > > > @@ -95,6 +95,9 @@ int ath10k_txrx_tx_unref(struct ath10k_htt *htt,
> > > >                  wake_up(&htt->empty_tx_wq);
> > > >          spin_unlock_bh(&htt->tx_lock);
> > > 
> > > > +       if (htt->num_pending_tx <= 3 && !list_empty(&ar->txqs))
> > > > +               ath10k_mac_tx_push_pending(ar);
> > > > +
> > > 
> > > Just sanity checking - what's protecting htt->num_pending_tx? or is it
> > > serialised some other way?
> [...]
> > I can't see that any of the examples applies, but let's add READ_ONCE(),
> > to make sure that the compiler doesn't try to optimize this.
> 
> Couldn't you just move the num_pending_tx read inside tx_lock which is 2 lines
> above?  I think all the other manipulations are protected by tx_lock.

Hello Bob,

There is both a V2 and V3 of this patchset, V3 moves this to a sdio.c and
calls ath10k_mac_tx_push_pending() unconditionally.

But to answer your question,
it shouldn't matter if the read is done while holding the lock or not.

Sure, the compiler could do things so that the code path is always or never
executed, but that is why I added READ_ONCE() in V2.

A spin lock does have the advantage of ordering: memory operations issued
before the spin_unlock_bh() will be completed before the spin_unlock_bh()
operation has completed.

However, ath10k_htt_tx_dec_pending() was called earlier in the same function,
which decreases htt->num_pending_tx, so that write will be completed before
our read. That is the only ordering we care about here (if we should call
ath10k_mac_tx_push_pending() or not).

Kind regards,
Niklas

^ permalink raw reply

* [PATCH] drivers/net/phy/micrel: Fix for PHY KSZ8061 errrata: Potential link-up failure when Ethernet cable is connected slowly
From: Alexander Onnasch @ 2018-05-25 12:37 UTC (permalink / raw)
  Cc: alexander.onnasch, Andrew Lunn, Florian Fainelli, netdev,
	linux-kernel

Signed-off-by: Alexander Onnasch <alexander.onnasch@landisgyr.com>
---
 drivers/net/phy/micrel.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 6c45ff6..7d80a00 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -339,6 +339,28 @@ static int ksz8041_config_aneg(struct phy_device *phydev)
        return genphy_config_aneg(phydev);
 }

+#define MII_KSZ8061RN_MMD_CTRL_REG         0x0d
+#define MII_KSZ8061RN_MMD_REGDATA_REG  0x0e
+
+static int ksz8061_extended_write(struct phy_device *phydev,
+                                 u8 mode, u32 dev_addr, u32 regnum, u16 val)
+{
+       phy_write(phydev, MII_KSZ8061RN_MMD_CTRL_REG, dev_addr);
+       phy_write(phydev, MII_KSZ8061RN_MMD_REGDATA_REG, regnum);
+       phy_write(phydev, MII_KSZ8061RN_MMD_CTRL_REG, (mode << 14) | dev_addr);
+       return phy_write(phydev, MII_KSZ8061RN_MMD_REGDATA_REG, val);
+}
+
+static int ksz8061_config_init(struct phy_device *phydev)
+{
+       int ret;
+
+       ret = ksz8061_extended_write(phydev, 0x1, 0x1, 0x2, 0xB61A);
+       if (ret)
+               return ret;
+       return kszphy_config_init(phydev);
+}
+
 static int ksz9021_load_values_from_of(struct phy_device *phydev,
                                       const struct device_node *of_node,
                                       u16 reg,
@@ -938,7 +960,7 @@ static struct phy_driver ksphy_driver[] = {
        .phy_id_mask    = MICREL_PHY_ID_MASK,
        .features       = PHY_BASIC_FEATURES,
        .flags          = PHY_HAS_INTERRUPT,
-       .config_init    = kszphy_config_init,
+       .config_init    = ksz8061_config_init,
        .config_aneg    = genphy_config_aneg,
        .read_status    = genphy_read_status,
        .ack_interrupt  = kszphy_ack_interrupt,
--
2.7.4


P PLEASE CONSIDER OUR ENVIRONMENT BEFORE PRINTING THIS EMAIL.

This e-mail (including any attachments) is confidential and may be legally privileged. If you are not an intended recipient or an authorized representative of an intended recipient, you are prohibited from using, copying or distributing the information in this e-mail or its attachments. If you have received this e-mail in error, please notify the sender immediately by return e-mail and delete all copies of this message and any attachments. Thank you.

^ permalink raw reply related

* Hello,
From: Mrs. Annabelle Edo. @ 2018-05-25 12:42 UTC (permalink / raw)


Hello,

Good day, How you are today and your family, i hope you are in good
health, my name is Mrs. Annabelle, I saw your email on Web. Media, I
have something to discuss with you my late unclaimed fund, please
reply me for more information,

Mrs. Annabelle Ed....

^ permalink raw reply

* Re: [PATCH] ath10k: transmit queued frames after waking queues
From: Bob Copeland @ 2018-05-25 12:50 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Adrian Chadd, Kalle Valo, David Miller, ath10k, linux-wireless,
	netdev, Linux Kernel Mailing List
In-Reply-To: <20180525123656.GB8780@localhost.localdomain>

On Fri, May 25, 2018 at 02:36:56PM +0200, Niklas Cassel wrote:
> A spin lock does have the advantage of ordering: memory operations issued
> before the spin_unlock_bh() will be completed before the spin_unlock_bh()
> operation has completed.
> 
> However, ath10k_htt_tx_dec_pending() was called earlier in the same function,
> which decreases htt->num_pending_tx, so that write will be completed before
> our read. That is the only ordering we care about here (if we should call
> ath10k_mac_tx_push_pending() or not).

Sure.  I also understand that reading inside a lock and operating on the
value outside the lock isn't really the definition of synchronization
(doesn't really matter in this case though).

I was just suggesting that the implicit memory barrier in the spin unlock
that we are already paying for would be sufficient here too, and it matches
the semantic of "tx fields under tx_lock."  On the other hand, maybe it's
just me, but I tend to look askance at just-in-case READ_ONCEs sprinkled
about.

-- 
Bob Copeland %% https://bobcopeland.com/

^ permalink raw reply

* [PATCH] net: netsec: reduce DMA mask to 40 bits
From: Ard Biesheuvel @ 2018-05-25 12:50 UTC (permalink / raw)
  To: netdev
  Cc: davem, Ard Biesheuvel, Robin Murphy, Jassi Brar, Masahisa Kojima,
	Ilias Apalodimas

The netsec network controller IP can drive 64 address bits for DMA, and
the DMA mask is set accordingly in the driver. However, the SynQuacer
SoC, which is the only silicon incorporating this IP at the moment,
integrates this IP in a manner that leaves address bits [63:40]
unconnected.

Up until now, this has not resulted in any problems, given that the DDR
controller doesn't decode those bits to begin with. However, recent
firmware updates for platforms incorporating this SoC allow the IOMMU
to be enabled, which does decode address bits [47:40], and allocates
top down from the IOVA space, producing DMA addresses that have bits
set that have been left unconnected.

Both the DT and ACPI (IORT) descriptions of the platform take this into
account, and only describe a DMA address space of 40 bits (using either
dma-ranges DT properties, or DMA address limits in IORT named component
nodes). However, even though our IOMMU and bus layers may take such
limitations into account by setting a narrower DMA mask when creating
the platform device, the netsec probe() entrypoint follows the common
practice of setting the DMA mask uncondionally, according to the
capabilities of the IP block itself rather than to its integration into
the chip.

It is currently unclear what the correct fix is here. We could hack around
it by only setting the DMA mask if it deviates from its default value of
DMA_BIT_MASK(32). However, this makes it impossible for the bus layer to
use DMA_BIT_MASK(32) as the bus limit, and so it appears that a more
comprehensive approach is required to take DMA limits imposed by the
SoC as a whole into account.

In the mean time, let's limit the DMA mask to 40 bits. Given that there
is currently only one SoC that incorporates this IP, this is a reasonable
approach that can be backported to -stable and buys us some time to come
up with a proper fix going forward.

Fixes: 533dd11a12f6 ("net: socionext: Add Synquacer NetSec driver")
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Jassi Brar <jaswinder.singh@linaro.org>
Cc: Masahisa Kojima <masahisa.kojima@linaro.org>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
Please cc to -stable (v4.16+)

 drivers/net/ethernet/socionext/netsec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index f4c0b02ddad8..59fbf74dcada 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -1674,8 +1674,8 @@ static int netsec_probe(struct platform_device *pdev)
 	if (ret)
 		goto unreg_napi;

-	if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)))
-		dev_warn(&pdev->dev, "Failed to enable 64-bit DMA\n");
+	if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(40)))
+		dev_warn(&pdev->dev, "Failed to set DMA mask\n");

 	ret = register_netdev(ndev);
 	if (ret) {
-- 
2.17.0

^ permalink raw reply related

* [PATCH net-next] qmi_wwan: apply SET_DTR quirk to the SIMCOM shared device ID
From: Bjørn Mork @ 2018-05-25 13:00 UTC (permalink / raw)
  To: netdev; +Cc: linux-usb, Bjørn Mork, Reinhard Speyerer

SIMCOM are reusing a single device ID for many (all of their?)
different modems, based on different chipsets and firmwares. Newer
Qualcomm chipset generations require setting DTR to wake the QMI
function.  The SIM7600E modem is using such a chipset, making it
fail to work with this driver despite the device ID match.

Fix by unconditionally enabling the SET_DTR quirk for all SIMCOM
modems using this specific device ID.  This is similar to what
we already have done for another case of device IDs recycled over
multiple chipset generations: 14cf4a771b30 ("drivers: net: usb:
qmi_wwan: add QMI_QUIRK_SET_DTR for Telit PID 0x1201")

Initial testing on an older SIM7100 modem shows no immediate side
effects.

Reported-by: Sebastian Sjoholm <sebastian.sjoholm@gmail.com>
Cc: Reinhard Speyerer <rspmn@arcor.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
---
I propose this for net-next for now to get some extra testing
exposure before going into stable.  I'll send a separate request
for stable backport as soon as I'm satisfied that there are no
hidden issues with some specific modem.

Bjørn 

 drivers/net/usb/qmi_wwan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 42565dd33aa6..148e78f8b48c 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -1248,7 +1248,7 @@ static const struct usb_device_id products[] = {
 	{QMI_FIXED_INTF(0x03f0, 0x4e1d, 8)},	/* HP lt4111 LTE/EV-DO/HSPA+ Gobi 4G Module */
 	{QMI_FIXED_INTF(0x03f0, 0x9d1d, 1)},	/* HP lt4120 Snapdragon X5 LTE */
 	{QMI_FIXED_INTF(0x22de, 0x9061, 3)},	/* WeTelecom WPD-600N */
-	{QMI_FIXED_INTF(0x1e0e, 0x9001, 5)},	/* SIMCom 7230E */
+	{QMI_QUIRK_SET_DTR(0x1e0e, 0x9001, 5)},	/* SIMCom 7100E, 7230E, 7600E ++ */
 	{QMI_QUIRK_SET_DTR(0x2c7c, 0x0125, 4)},	/* Quectel EC25, EC20 R2.0  Mini PCIe */
 	{QMI_QUIRK_SET_DTR(0x2c7c, 0x0121, 4)},	/* Quectel EC21 Mini PCIe */
 	{QMI_FIXED_INTF(0x2c7c, 0x0296, 4)},	/* Quectel BG96 */
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH] net: netsec: reduce DMA mask to 40 bits
From: Robin Murphy @ 2018-05-25 13:08 UTC (permalink / raw)
  To: Ard Biesheuvel, netdev
  Cc: davem, Jassi Brar, Masahisa Kojima, Ilias Apalodimas
In-Reply-To: <20180525125037.779-1-ard.biesheuvel@linaro.org>

On 25/05/18 13:50, Ard Biesheuvel wrote:
> The netsec network controller IP can drive 64 address bits for DMA, and
> the DMA mask is set accordingly in the driver. However, the SynQuacer
> SoC, which is the only silicon incorporating this IP at the moment,
> integrates this IP in a manner that leaves address bits [63:40]
> unconnected.
> 
> Up until now, this has not resulted in any problems, given that the DDR
> controller doesn't decode those bits to begin with. However, recent
> firmware updates for platforms incorporating this SoC allow the IOMMU
> to be enabled, which does decode address bits [47:40], and allocates
> top down from the IOVA space, producing DMA addresses that have bits
> set that have been left unconnected.
> 
> Both the DT and ACPI (IORT) descriptions of the platform take this into
> account, and only describe a DMA address space of 40 bits (using either
> dma-ranges DT properties, or DMA address limits in IORT named component
> nodes). However, even though our IOMMU and bus layers may take such
> limitations into account by setting a narrower DMA mask when creating
> the platform device, the netsec probe() entrypoint follows the common
> practice of setting the DMA mask uncondionally, according to the
> capabilities of the IP block itself rather than to its integration into
> the chip.
> 
> It is currently unclear what the correct fix is here. We could hack around
> it by only setting the DMA mask if it deviates from its default value of
> DMA_BIT_MASK(32). However, this makes it impossible for the bus layer to
> use DMA_BIT_MASK(32) as the bus limit, and so it appears that a more
> comprehensive approach is required to take DMA limits imposed by the
> SoC as a whole into account.
> 
> In the mean time, let's limit the DMA mask to 40 bits. Given that there
> is currently only one SoC that incorporates this IP, this is a reasonable
> approach that can be backported to -stable and buys us some time to come
> up with a proper fix going forward.

It's a little bit of a dodge, but until another SoC comes along with 
different requirements I agree it's the reasonable thing to do.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Fixes: 533dd11a12f6 ("net: socionext: Add Synquacer NetSec driver")
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Jassi Brar <jaswinder.singh@linaro.org>
> Cc: Masahisa Kojima <masahisa.kojima@linaro.org>
> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
> Please cc to -stable (v4.16+)
> 
>   drivers/net/ethernet/socionext/netsec.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
> index f4c0b02ddad8..59fbf74dcada 100644
> --- a/drivers/net/ethernet/socionext/netsec.c
> +++ b/drivers/net/ethernet/socionext/netsec.c
> @@ -1674,8 +1674,8 @@ static int netsec_probe(struct platform_device *pdev)
>   	if (ret)
>   		goto unreg_napi;
>   
> -	if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)))
> -		dev_warn(&pdev->dev, "Failed to enable 64-bit DMA\n");
> +	if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(40)))
> +		dev_warn(&pdev->dev, "Failed to set DMA mask\n");
>   
>   	ret = register_netdev(ndev);
>   	if (ret) {
> 

^ permalink raw reply

* Re: [PATCH] ath6kl: mark expected switch fall-throughs
From: Kalle Valo @ 2018-05-25 13:30 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Gustavo A. R. Silva, David S. Miller, linux-wireless, netdev,
	linux-kernel
In-Reply-To: <060f93a0-7210-79b6-37a7-cd8900719d40@cogentembedded.com>

Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> writes:

> On 5/25/2018 2:13 AM, Gustavo A. R. Silva wrote:
>
>> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
>> where we are expecting to fall through.
>>
>> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
>> ---
>>   drivers/net/wireless/ath/ath6kl/cfg80211.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/wireless/ath/ath6kl/cfg80211.c b/drivers/net/wireless/ath/ath6kl/cfg80211.c
>> index 2ba8cf3..29e32cd 100644
>> --- a/drivers/net/wireless/ath/ath6kl/cfg80211.c
>> +++ b/drivers/net/wireless/ath/ath6kl/cfg80211.c
>> @@ -3898,17 +3898,17 @@ int ath6kl_cfg80211_init(struct ath6kl *ar)
>>   	wiphy->max_scan_ie_len = 1000; /* FIX: what is correct limit? */
>>   	switch (ar->hw.cap) {
>>   	case WMI_11AN_CAP:
>> -		ht = true;
>> +		ht = true; /* fall through */
>>   	case WMI_11A_CAP:
>>   		band_5gig = true;
>>   		break;
>>   	case WMI_11GN_CAP:
>> -		ht = true;
>> +		ht = true; /* fall through */
>>   	case WMI_11G_CAP:
>>   		band_2gig = true;
>>   		break;
>>   	case WMI_11AGN_CAP:
>> -		ht = true;
>> +		ht = true; /* fall through */
>>   	case WMI_11AG_CAP:
>>   		band_2gig = true;
>>   		band_5gig = true;
>
>    Hm, typically such comments are done on a line of their own, have
> never seen this style...

Yeah, I was wondering the same. Was there a particular reason for this?

-- 
Kalle Valo

^ permalink raw reply

* Re: [PATCH net] tun: Fix NULL pointer dereference in XDP redirect
From: Toshiaki Makita @ 2018-05-25 13:43 UTC (permalink / raw)
  To: Jason Wang, Toshiaki Makita, David S. Miller; +Cc: netdev
In-Reply-To: <491862dc-b6d2-854d-4d5f-8dfe34da882d@redhat.com>

On 18/05/25 (金) 18:59, Jason Wang wrote:
> On 2018年05月25日 12:32, Toshiaki Makita wrote:
>> Calling XDP redirection requires preempt/bh disabled. Especially softirq
>> can call another XDP function and redirection functions, then percpu
>> value ri->map can be overwritten to NULL.
>>
>> This is a generic XDP case called from tun.
>>
>> [ 3535.736058] BUG: unable to handle kernel NULL pointer dereference 
>> at 0000000000000018
>> [ 3535.743974] PGD 0 P4D 0
>> [ 3535.746530] Oops: 0000 [#1] SMP PTI
>> [ 3535.750049] Modules linked in: vhost_net vhost tap tun bridge stp 
>> llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter 
>> sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm 
>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif 
>> irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses 
>> aesni_intel crypto_simd cryptd enclosure hpwdt hpilo glue_helper 
>> ipmi_si pcspkr wmi mei_me ioatdma mei ipmi_devintf shpchp dca 
>> ipmi_msghandler lpc_ich acpi_power_meter sch_fq_codel ip_tables xfs 
>> libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea 
>> sysfillrect sysimgblt fb_sys_fops ttm drm smartpqi i40e crc32c_intel 
>> scsi_transport_sas tg3 i2c_core ptp pps_core
>> [ 3535.813456] CPU: 5 PID: 1630 Comm: vhost-1614 Not tainted 
>> 4.17.0-rc4 #2
>> [ 3535.820127] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 
>> Gen10, BIOS U32 11/14/2017
>> [ 3535.828732] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30
>> [ 3535.833740] RSP: 0018:ffffb4bc47bf7c58 EFLAGS: 00010246
>> [ 3535.839009] RAX: ffff9fdfcfea1c40 RBX: 0000000000000000 RCX: 
>> ffff9fdf27fe3100
>> [ 3535.846205] RDX: ffff9fdfca769200 RSI: 0000000000000000 RDI: 
>> 0000000000000000
>> [ 3535.853402] RBP: ffffb4bc491d9000 R08: 00000000000045ad R09: 
>> 0000000000000ec0
>> [ 3535.860597] R10: 0000000000000001 R11: ffff9fdf26c3ce4e R12: 
>> ffff9fdf9e72c000
>> [ 3535.867794] R13: 0000000000000000 R14: fffffffffffffff2 R15: 
>> ffff9fdfc82cdd00
>> [ 3535.874990] FS:  0000000000000000(0000) GS:ffff9fdfcfe80000(0000) 
>> knlGS:0000000000000000
>> [ 3535.883152] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 3535.888948] CR2: 0000000000000018 CR3: 0000000bde724004 CR4: 
>> 00000000007626e0
>> [ 3535.896145] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
>> 0000000000000000
>> [ 3535.903342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
>> 0000000000000400
>> [ 3535.910538] PKRU: 55555554
>> [ 3535.913267] Call Trace:
>> [ 3535.915736]  xdp_do_generic_redirect+0x7a/0x310
>> [ 3535.920310]  do_xdp_generic.part.117+0x285/0x370
>> [ 3535.924970]  tun_get_user+0x5b9/0x1260 [tun]
>> [ 3535.929279]  tun_sendmsg+0x52/0x70 [tun]
>> [ 3535.933237]  handle_tx+0x2ad/0x5f0 [vhost_net]
>> [ 3535.937721]  vhost_worker+0xa5/0x100 [vhost]
>> [ 3535.942030]  kthread+0xf5/0x130
>> [ 3535.945198]  ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost]
>> [ 3535.950031]  ? kthread_bind+0x10/0x10
>> [ 3535.953727]  ret_from_fork+0x35/0x40
>> [ 3535.957334] Code: 0e 74 15 83 f8 10 75 05 e9 49 aa b3 ff f3 c3 0f 
>> 1f 80 00 00 00 00 f3 c3 e9 29 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 
>> 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 49 a9 b3 ff 31 
>> c0 c3
>> [ 3535.976387] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffffb4bc47bf7c58
>> [ 3535.982883] CR2: 0000000000000018
>> [ 3535.987096] ---[ end trace 383b299dd1430240 ]---
>> [ 3536.131325] Kernel panic - not syncing: Fatal exception
>> [ 3536.137484] Kernel Offset: 0x26a00000 from 0xffffffff81000000 
>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> [ 3536.281406] ---[ end Kernel panic - not syncing: Fatal exception ]---
>>
>> And a kernel with generic case fixed still panics in tun driver XDP
>> redirect, because it did not disable bh.
>>
>> [ 2055.128746] BUG: unable to handle kernel NULL pointer dereference 
>> at 0000000000000018
>> [ 2055.136662] PGD 0 P4D 0
>> [ 2055.139219] Oops: 0000 [#1] SMP PTI
>> [ 2055.142736] Modules linked in: vhost_net vhost tap tun bridge stp 
>> llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter 
>> sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm 
>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass 
>> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses aesni_intel 
>> ipmi_ssif crypto_simd enclosure cryptd hpwdt glue_helper ioatdma hpilo 
>> wmi dca pcspkr ipmi_si acpi_power_meter ipmi_devintf shpchp mei_me 
>> ipmi_msghandler mei lpc_ich sch_fq_codel ip_tables xfs libcrc32c 
>> sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect 
>> sysimgblt fb_sys_fops ttm drm i40e smartpqi tg3 scsi_transport_sas 
>> crc32c_intel i2c_core ptp pps_core
>> [ 2055.206142] CPU: 6 PID: 1693 Comm: vhost-1683 Tainted: G        
>> W         4.17.0-rc5-fix-tun+ #1
>> [ 2055.215011] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 
>> Gen10, BIOS U32 11/14/2017
>> [ 2055.223617] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30
>> [ 2055.228624] RSP: 0018:ffff998b07607cc0 EFLAGS: 00010246
>> [ 2055.233892] RAX: ffff8dbd8e235700 RBX: ffff8dbd8ff21c40 RCX: 
>> 0000000000000004
>> [ 2055.241089] RDX: ffff998b097a9000 RSI: 0000000000000000 RDI: 
>> 0000000000000000
>> [ 2055.248286] RBP: 0000000000000000 R08: 00000000000065a8 R09: 
>> 0000000000005d80
>> [ 2055.255483] R10: 0000000000000040 R11: ffff8dbcf0100000 R12: 
>> ffff998b097a9000
>> [ 2055.262681] R13: ffff8dbd8c98c000 R14: 0000000000000000 R15: 
>> ffff998b07607d78
>> [ 2055.269879] FS:  0000000000000000(0000) GS:ffff8dbd8ff00000(0000) 
>> knlGS:0000000000000000
>> [ 2055.278039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 2055.283834] CR2: 0000000000000018 CR3: 0000000c0c8cc005 CR4: 
>> 00000000007626e0
>> [ 2055.291030] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
>> 0000000000000000
>> [ 2055.298227] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
>> 0000000000000400
>> [ 2055.305424] PKRU: 55555554
>> [ 2055.308153] Call Trace:
>> [ 2055.310624]  xdp_do_redirect+0x7b/0x380
>> [ 2055.314499]  tun_get_user+0x10fe/0x12a0 [tun]
>> [ 2055.318895]  tun_sendmsg+0x52/0x70 [tun]
>> [ 2055.322852]  handle_tx+0x2ad/0x5f0 [vhost_net]
>> [ 2055.327337]  vhost_worker+0xa5/0x100 [vhost]
>> [ 2055.331646]  kthread+0xf5/0x130
>> [ 2055.334813]  ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost]
>> [ 2055.339646]  ? kthread_bind+0x10/0x10
>> [ 2055.343343]  ret_from_fork+0x35/0x40
>> [ 2055.346950] Code: 0e 74 15 83 f8 10 75 05 e9 e9 aa b3 ff f3 c3 0f 
>> 1f 80 00 00 00 00 f3 c3 e9 c9 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 
>> 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 e9 a9 b3 ff 31 
>> c0 c3
>> [ 2055.366004] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffff998b07607cc0
>> [ 2055.372500] CR2: 0000000000000018
>> [ 2055.375856] ---[ end trace 2a2dcc5e9e174268 ]---
>> [ 2055.523626] Kernel panic - not syncing: Fatal exception
>> [ 2055.529796] Kernel Offset: 0x2e000000 from 0xffffffff81000000 
>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> [ 2055.677539] ---[ end Kernel panic - not syncing: Fatal exception ]---
>>
>> Fixes: 761876c857cb ("tap: XDP support")
>> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
>> ---
>>   drivers/net/tun.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 45d8077..4fc7dbf 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1650,6 +1650,7 @@ static struct sk_buff *tun_build_skb(struct 
>> tun_struct *tun,
>>       else
>>           *skb_xdp = 0;
>> +    local_bh_disable();
>>       preempt_disable();
>>       rcu_read_lock();
>>       xdp_prog = rcu_dereference(tun->xdp_prog);
>> @@ -1676,6 +1677,7 @@ static struct sk_buff *tun_build_skb(struct 
>> tun_struct *tun,
>>                   goto err_redirect;
>>               rcu_read_unlock();
>>               preempt_enable();
>> +            local_bh_enable();
>>               return NULL;
>>           case XDP_TX:
>>               get_page(alloc_frag->page);
>> @@ -1685,6 +1687,7 @@ static struct sk_buff *tun_build_skb(struct 
>> tun_struct *tun,
>>               tun_xdp_flush(tun->dev);
>>               rcu_read_unlock();
>>               preempt_enable();
>> +            local_bh_enable();
>>               return NULL;
>>           case XDP_PASS:
>>               delta = orig_data - xdp.data;
>> @@ -1704,6 +1707,7 @@ static struct sk_buff *tun_build_skb(struct 
>> tun_struct *tun,
>>       if (!skb) {
>>           rcu_read_unlock();
>>           preempt_enable();
>> +        local_bh_enable();
>>           return ERR_PTR(-ENOMEM);
>>       }
>> @@ -1714,6 +1718,7 @@ static struct sk_buff *tun_build_skb(struct 
>> tun_struct *tun,
>>       rcu_read_unlock();
>>       preempt_enable();
>> +    local_bh_enable();
>>       return skb;
>> @@ -1722,6 +1727,7 @@ static struct sk_buff *tun_build_skb(struct 
>> tun_struct *tun,
>>   err_xdp:
>>       rcu_read_unlock();
>>       preempt_enable();
>> +    local_bh_enable();
>>       this_cpu_inc(tun->pcpu_stats->rx_dropped);
>>       return NULL;
>>   }
>> @@ -1917,16 +1923,22 @@ static ssize_t tun_get_user(struct tun_struct 
>> *tun, struct tun_file *tfile,
>>           struct bpf_prog *xdp_prog;
>>           int ret;
>> +        local_bh_disable();
>> +        preempt_disable();
>>           rcu_read_lock();
>>           xdp_prog = rcu_dereference(tun->xdp_prog);
>>           if (xdp_prog) {
>>               ret = do_xdp_generic(xdp_prog, skb);
>>               if (ret != XDP_PASS) {
>>                   rcu_read_unlock();
>> +                preempt_enable();
>> +                local_bh_enable();
>>                   return total_len;
>>               }
>>           }
>>           rcu_read_unlock();
>> +        preempt_enable();
>> +        local_bh_enable();
>>       }
>>       rcu_read_lock();
> 
> Good catch, thanks.
> 
> But I think we can just replace preempt_disable()/enable() with 
> local_bh_disable()/local_bh_enable() ?

I actually thought the same, but noticed this patch.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9ea4c380066fbe

It looks like they do not think local_bh_disable() implies 
preempt_disable(). But I'm not sure why..

Toshiaki Makita

^ permalink raw reply

* Re: KASAN: use-after-free Read in ccid2_hc_tx_packet_recv
From: syzbot @ 2018-05-25 13:52 UTC (permalink / raw)
  To: alexey.kodanev, davem, dccp, edumazet, gerrit, keescook,
	linux-kernel, netdev, soheil, syzkaller-bugs
In-Reply-To: <0000000000003872fd0568da185f@google.com>

syzbot has found a reproducer for the following crash on:

HEAD commit:    b50694381cfc Merge branch 'stable/for-linus-4.17' of git:/..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=17151cb7800000
kernel config:  https://syzkaller.appspot.com/x/.config?x=982e2df1b9e60b02
dashboard link: https://syzkaller.appspot.com/bug?extid=554ccde221001ab5479a
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1363ccb7800000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1272e2b7800000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+554ccde221001ab5479a@syzkaller.appspotmail.com

Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
==================================================================
  fail_dump lib/fault-inject.c:51 [inline]
  should_fail.cold.4+0xa/0x1a lib/fault-inject.c:149
BUG: KASAN: use-after-free in ccid2_hc_tx_packet_recv+0x2383/0x275e  
net/dccp/ccids/ccid2.c:597
Read of size 1 at addr ffff8801ba4911c2 by task syz-executor940/4542

  __should_failslab+0x124/0x180 mm/failslab.c:32
  should_failslab+0x9/0x14 mm/slab_common.c:1522
  slab_pre_alloc_hook mm/slab.h:423 [inline]
  slab_alloc mm/slab.c:3378 [inline]
  kmem_cache_alloc_trace+0x4b/0x780 mm/slab.c:3618
  kmalloc include/linux/slab.h:512 [inline]
  dccp_ackvec_parsed_add+0xa1/0x310 net/dccp/ackvec.c:352
  ccid2_hc_tx_parse_options+0x9a/0xb0 net/dccp/ccids/ccid2.c:510
  ccid_hc_tx_parse_options net/dccp/ccid.h:207 [inline]
  dccp_parse_options+0x658/0x11f0 net/dccp/options.c:233
  dccp_rcv_established+0x44/0xb0 net/dccp/input.c:374
  dccp_v4_do_rcv+0x153/0x180 net/dccp/ipv4.c:654
  sk_backlog_rcv include/net/sock.h:909 [inline]
  __release_sock+0x12f/0x3a0 net/core/sock.c:2335
  release_sock+0xa4/0x2b0 net/core/sock.c:2850
  dccp_sendmsg+0x771/0x1020 net/dccp/proto.c:820
  inet_sendmsg+0x19f/0x690 net/ipv4/af_inet.c:798
  sock_sendmsg_nosec net/socket.c:629 [inline]
  sock_sendmsg+0xd5/0x120 net/socket.c:639
  ___sys_sendmsg+0x525/0x940 net/socket.c:2117
  __sys_sendmmsg+0x240/0x6f0 net/socket.c:2212
  __do_sys_sendmmsg net/socket.c:2241 [inline]
  __se_sys_sendmmsg net/socket.c:2238 [inline]
  __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2238
  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x441819
RSP: 002b:00007ffdb9a9df08 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441819
RDX: 04000000000001e6 RSI: 0000000020000c00 RDI: 0000000000000005
RBP: 00007ffdb9a9df20 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffff
R13: 04000000000001e6 R14: 0000000000000006 R15: 0000000000000000
CPU: 0 PID: 4542 Comm: syz-executor940 Not tainted 4.17.0-rc6+ #66
dccp_parse_options: DCCP(        (ptrval)): Option 38 (len=1) error=5
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
  __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
  ccid2_hc_tx_packet_recv+0x2383/0x275e net/dccp/ccids/ccid2.c:597
  ccid_hc_tx_packet_recv net/dccp/ccid.h:192 [inline]
  dccp_deliver_input_to_ccids+0x203/0x280 net/dccp/input.c:186
  dccp_rcv_established+0x87/0xb0 net/dccp/input.c:378
  dccp_v4_do_rcv+0x153/0x180 net/dccp/ipv4.c:654
  sk_backlog_rcv include/net/sock.h:909 [inline]
  __release_sock+0x12f/0x3a0 net/core/sock.c:2335
  release_sock+0xa4/0x2b0 net/core/sock.c:2850
  dccp_sendmsg+0x771/0x1020 net/dccp/proto.c:820
  inet_sendmsg+0x19f/0x690 net/ipv4/af_inet.c:798
  sock_sendmsg_nosec net/socket.c:629 [inline]
  sock_sendmsg+0xd5/0x120 net/socket.c:639
  ___sys_sendmsg+0x525/0x940 net/socket.c:2117
  __sys_sendmmsg+0x240/0x6f0 net/socket.c:2212
  __do_sys_sendmmsg net/socket.c:2241 [inline]
  __se_sys_sendmmsg net/socket.c:2238 [inline]
  __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2238
  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x441819
RSP: 002b:00007ffdb9a9df08 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441819
RDX: 04000000000001e6 RSI: 0000000020000c00 RDI: 0000000000000005
RBP: 00007ffdb9a9df20 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffff
R13: 04000000000001e6 R14: 0000000000000006 R15: 0000000000000000

Allocated by task 4542:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
  __do_kmalloc_node mm/slab.c:3682 [inline]
  __kmalloc_node_track_caller+0x47/0x70 mm/slab.c:3696
  __kmalloc_reserve.isra.38+0x3a/0xe0 net/core/skbuff.c:137
  __alloc_skb+0x14d/0x780 net/core/skbuff.c:205
  alloc_skb include/linux/skbuff.h:987 [inline]
  dccp_send_ack+0xd2/0x340 net/dccp/output.c:580
  ccid2_hc_rx_packet_recv+0x139/0x1b0 net/dccp/ccids/ccid2.c:776
  ccid_hc_rx_packet_recv net/dccp/ccid.h:185 [inline]
  dccp_deliver_input_to_ccids+0xf0/0x280 net/dccp/input.c:180
  dccp_rcv_established+0x87/0xb0 net/dccp/input.c:378
  dccp_v4_do_rcv+0x153/0x180 net/dccp/ipv4.c:654
  sk_backlog_rcv include/net/sock.h:909 [inline]
  __sk_receive_skb+0x3a2/0xd60 net/core/sock.c:513
  dccp_v4_rcv+0x10e5/0x1f3f net/dccp/ipv4.c:875
  ip_local_deliver_finish+0x2e3/0xd80 net/ipv4/ip_input.c:215
  NF_HOOK include/linux/netfilter.h:288 [inline]
  ip_local_deliver+0x1e1/0x720 net/ipv4/ip_input.c:256
  dst_input include/net/dst.h:450 [inline]
  ip_rcv_finish+0x81b/0x2200 net/ipv4/ip_input.c:396
  NF_HOOK include/linux/netfilter.h:288 [inline]
  ip_rcv+0xb70/0x143d net/ipv4/ip_input.c:492
  __netif_receive_skb_core+0x26f5/0x3630 net/core/dev.c:4592
  __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4657
  process_backlog+0x219/0x760 net/core/dev.c:5337
  napi_poll net/core/dev.c:5735 [inline]
  net_rx_action+0x7b7/0x1930 net/core/dev.c:5801
  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285

Freed by task 4542:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kfree+0xd9/0x260 mm/slab.c:3813
  skb_free_head+0x99/0xc0 net/core/skbuff.c:550
  skb_release_data+0x690/0x860 net/core/skbuff.c:570
  skb_release_all+0x4a/0x60 net/core/skbuff.c:627
  __kfree_skb net/core/skbuff.c:641 [inline]
  kfree_skb+0x195/0x560 net/core/skbuff.c:659
  dccp_v4_do_rcv+0x12b/0x180 net/dccp/ipv4.c:689
  sk_backlog_rcv include/net/sock.h:909 [inline]
  __release_sock+0x12f/0x3a0 net/core/sock.c:2335
  release_sock+0xa4/0x2b0 net/core/sock.c:2850
  dccp_sendmsg+0x771/0x1020 net/dccp/proto.c:820
  inet_sendmsg+0x19f/0x690 net/ipv4/af_inet.c:798
  sock_sendmsg_nosec net/socket.c:629 [inline]
  sock_sendmsg+0xd5/0x120 net/socket.c:639
  ___sys_sendmsg+0x525/0x940 net/socket.c:2117
  __sys_sendmmsg+0x240/0x6f0 net/socket.c:2212
  __do_sys_sendmmsg net/socket.c:2241 [inline]
  __se_sys_sendmmsg net/socket.c:2238 [inline]
  __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2238
  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801ba490d40
  which belongs to the cache kmalloc-2048 of size 2048
The buggy address is located 1154 bytes inside of
  2048-byte region [ffff8801ba490d40, ffff8801ba491540)
The buggy address belongs to the page:
page:ffffea0006e92400 count:1 mapcount:0 mapping:ffff8801ba4904c0 index:0x0  
compound_mapcount: 0
flags: 0x2fffc0000008100(slab|head)
raw: 02fffc0000008100 ffff8801ba4904c0 0000000000000000 0000000100000003
raw: ffffea0006ed9a20 ffffea0006fa5fa0 ffff8801da800c40 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801ba491080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801ba491100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8801ba491180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                            ^
  ffff8801ba491200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801ba491280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

^ permalink raw reply

* Re: [PATCH bpf-next 0/5] fix test_sockmap
From: John Fastabend @ 2018-05-25 14:01 UTC (permalink / raw)
  To: Prashant Bhole, Alexei Starovoitov, Daniel Borkmann
  Cc: David S . Miller, Shuah Khan, netdev
In-Reply-To: <a7a84334-b650-23e7-9df5-43a942ac9666@lab.ntt.co.jp>

On 05/25/2018 01:28 AM, Prashant Bhole wrote:
> 
> 
> On 5/24/2018 1:58 PM, John Fastabend wrote:
>> On 05/23/2018 09:47 PM, Prashant Bhole wrote:
>>>
>>>
>>> On 5/23/2018 6:44 PM, Prashant Bhole wrote:
>>>>
>>>>
>>>> On 5/22/2018 2:08 AM, John Fastabend wrote:
>>>>> On 05/20/2018 10:13 PM, Prashant Bhole wrote:
>>>>>>
>>>>>>
>>>>>> On 5/19/2018 1:42 AM, John Fastabend wrote:
>>>>>>> On 05/18/2018 12:17 AM, Prashant Bhole wrote:
>>>>>>>> This series fixes bugs in test_sockmap code. They weren't caught
>>>>>>>> previously because failure in RX/TX thread was not notified to the
>>>>>>>> main thread.
>>>>>>>>
>>>>>>>> Also fixed data verification logic and slightly improved test
>>>>>>>> output
>>>>>>>> such that parameters values (cork, apply, start, end) of failed
>>>>>>>> test
>>>>>>>> can be easily seen.
>>>>>>>>
>>>>>>>
>>>>>>> Great, this was on my list so thanks for taking care of it.
>>>>>>>
>>>>>>>> Note: Even after fixing above problems there are issues with tests
>>>>>>>> which set cork parameter. Tests fail (RX thread timeout) when cork
>>>>>>>> value is non-zero and overall data sent by TX thread isn't
>>>>>>>> multiples
>>>>>>>> of cork value.
>>>>>>>
>>>>>>>
>>>>>>> This is expected. When 'cork' is set the sender should only xmit
>>>>>>> the data when 'cork' bytes are available. If the user doesn't
>>>>>>> provide the N bytes the data is cork'ed waiting for the bytes and
>>>>>>> if the socket is closed the state is cleaned up. What these tests
>>>>>>> are testing is the cleanup path when a user doesn't provide the
>>>>>>> N bytes. In practice this is used to validate headers and prevent
>>>>>>> users from sending partial headers. We want to keep these tests
>>>>>>> because
>>>>>>> they verify a tear-down path in the code.
>>>>>>
>>>>>> Ok.
>>>>>>
>>>>>>>
>>>>>>> After your changes do these get reported as failures? If so we
>>>>>>> need to account for the above in the calculations.
>>>>>>
>>>>>> Yes, cork related test are reported as failures because of RX thread
>>>>>> timeout.
>>>>>>
>>>>>> So with your above description, I think we need to differentiate cork
>>>>>> tests with partial data and full data. In partial data test we can
>>>>>> have
>>>>>> something like "timeout_expected" flag. Any other way to fix it?
>>>>>>
>>>>>
>>>>> Adding a flag seems reasonable to me. Lets do this for now. Also I
>>>>> plan to add more negative tests so we can either use the same
>>>>> flag or a new one for those cases as well.
>>>>>
>>>>
>>>> John,
>>>> I worked on this for some time and noticed that the RX-timeout of
>>>> tests with cork parameter is dependent on various parameters. So we
>>>> can not set a flag like the way 'drop_expected' flag is set before
>>>> executing the test.
>>>>
>>>> So I decided to write a function which judges all parameters before
>>>> each test and decides whether a test with cork parameter will
>>>> timeout or not. Then the conditions in the function became
>>>> complicated. For example some tests fail if opt->rate < 17 (with
>>>> some other conditions). Here is 17 is related to FRAGS_PER_SKB.
>>>> Consider following two examples.
>>> I'm sorry. Correction: s/FRAGS_PER_SKB/MAX_SKB_FRAGS/
>>>
>>>>
>>>> ./test_sockmap --cgroup /mnt/cgroup2 -r 16 -i 1 -l 30 -t sendpage
>>>> --txmsg --txmsg_cork 1024   # RX timeout occurs
>>>>
>>>> ./test_sockmap --cgroup /mnt/cgroup2 -r 17 -i 1 -l 30 -t sendpage
>>>> --txmsg --txmsg_cork 1024   # Success!
>>>>
>>
>> Ah yes this hits the buffer limit and flushes the queue. The kernel
>> side doesn't know how to merge those specific sendpage requests so
>> it gives each request its own buffer and when the limit is reached
>> we flush it.
>>
>>>> Do we need to keep such tests? if yes, then I will continue with
>>>> adding such conditions in the function.
>>>>
>>
>> Yes, these tests are needed because they are testing the edge cases.
>> These are probably the most important tests because my normal usage
>> will catch any issues in the "good" cases its these types of things
>> that can go unnoticed (at least for a short while) if we don't have
>> specific tests for them.
> 
> I tried but it is difficult to come up with a right set of conditions
> which lead to test failure.
> 

Agreed, it can be yes. How about adding your logic for all tests except
"cork" cases. If there is a flag to set if the timeout is expected we
can always manually set it in the test invocation. Might not be as
nice as automatically learning the expected results but possibly easier
than building some complicated logic to figure it out.

Would you mind submitting your series again without the "cork" tests
being tracked? And if you want add a bit to tell if the "cork" tests are
going to timeout or not setting it per test manually. But I think
your series can just omit the cork test for now and still be useful.

> -Prashant
>>
>> Thanks for doing this.
>> John
> 
> 

^ permalink raw reply

* Re: [PATCH] PCI: allow drivers to limit the number of VFs to 0
From: Bjorn Helgaas @ 2018-05-25 14:02 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Bjorn Helgaas, linux-pci, netdev, Sathya Perla, Felix Manlunas,
	alexander.duyck, john.fastabend, Jacob Keller, Donald Dutile,
	oss-drivers, Christoph Hellwig
In-Reply-To: <20180524182015.1af7b4c9@cakuba>

On Thu, May 24, 2018 at 06:20:15PM -0700, Jakub Kicinski wrote:
> Hi Bjorn!
> 
> On Thu, 24 May 2018 18:57:48 -0500, Bjorn Helgaas wrote:
> > On Mon, Apr 02, 2018 at 03:46:52PM -0700, Jakub Kicinski wrote:
> > > Some user space depends on enabling sriov_totalvfs number of VFs
> > > to not fail, e.g.:
> > > 
> > > $ cat .../sriov_totalvfs > .../sriov_numvfs
> > > 
> > > For devices which VF support depends on loaded FW we have the
> > > pci_sriov_{g,s}et_totalvfs() API.  However, this API uses 0 as
> > > a special "unset" value, meaning drivers can't limit sriov_totalvfs
> > > to 0.  Remove the special values completely and simply initialize
> > > driver_max_VFs to total_VFs.  Then always use driver_max_VFs.
> > > Add a helper for drivers to reset the VF limit back to total.  
> > 
> > I still can't really make sense out of the changelog.
> >
> > I think part of the reason it's confusing is because there are two
> > things going on:
> > 
> >   1) You want this:
> >   
> >        pci_sriov_set_totalvfs(dev, 0);
> >        x = pci_sriov_get_totalvfs(dev) 
> > 
> >      to return 0 instead of total_VFs.  That seems to connect with
> >      your subject line.  It means "sriov_totalvfs" in sysfs could be
> >      0, but I don't know how that is useful (I'm sure it is; just
> >      educate me :))
> 
> Let me just quote the bug report that got filed on our internal bug
> tracker :)
> 
>   When testing Juju Openstack with Ubuntu 18.04, enabling SR-IOV causes
>   errors because Juju gets the sriov_totalvfs for SR-IOV-capable device
>   then tries to set that as the sriov_numvfs parameter.
> 
>   For SR-IOV incapable FW, the sriov_totalvfs parameter should be 0, 
>   but it's set to max.  When FW is switched to flower*, the correct 
>   sriov_totalvfs value is presented.
> 
> * flower is a project name

>From the point of view of the PCI core (which knows nothing about
device firmware and relies on the architected config space described
by the PCIe spec), this sounds like an erratum: with some firmware
installed, the device is not capable of SR-IOV, but still advertises
an SR-IOV capability with "TotalVFs > 0".

Regardless of whether that's an erratum, we do allow PF drivers to use
pci_sriov_set_totalvfs() to limit the number of VFs that may be
enabled by writing to the PF's "sriov_numvfs" sysfs file.

But the current implementation does not allow a PF driver to limit VFs
to 0, and that does seem nonsensical.

> My understanding is OpenStack uses sriov_totalvfs to determine how many
> VFs can be enabled, looks like this is the code:
> 
> http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n464
> 
> >   2) You're adding the pci_sriov_reset_totalvfs() interface.  I'm not
> >      sure what you intend for this.  Is *every* driver supposed to
> >      call it in .remove()?  Could/should this be done in the core
> >      somehow instead of depending on every driver?
> 
> Good question, I was just thinking yesterday we may want to call it
> from the core, but I don't think it's strictly necessary nor always
> sufficient (we may reload FW without re-probing).
> 
> We have a device which supports different number of VFs based on the FW
> loaded.  Some legacy FWs does not inform the driver how many VFs it can
> support, because it supports max.  So the flow in our driver is this:
> 
> load_fw(dev);
> ...
> max_vfs = ask_fw_for_max_vfs(dev);
> if (max_vfs >= 0)
> 	return pci_sriov_set_totalvfs(dev, max_vfs);
> else /* FW didn't tell us, assume max */
> 	return pci_sriov_reset_totalvfs(dev); 
> 
> We also reset the max on device remove, but that's not strictly
> necessary.
> 
> Other users of pci_sriov_set_totalvfs() always know the value to set
> the total to (either always get it from FW or it's a constant).
> 
> If you prefer we can work out the correct max for those legacy cases in
> the driver as well, although it seemed cleaner to just ask the core,
> since it already has total_VFs value handy :)
> 
> > I'm also having a hard time connecting your user-space command example
> > with the rest of this.  Maybe it will make more sense to me tomorrow
> > after some coffee.
> 
> OpenStack assumes it will always be able to set sriov_numvfs to
> sriov_totalvfs, see this 'if':
> 
> http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n512

Thanks for educating me.  I think there are two issues here that we
can separate.  I extracted the patch below for the first.

The second is the question of resetting driver_max_VFs.  I think we
currently have a general issue in the core:

  - load PF driver 1
  - driver calls pci_sriov_set_totalvfs() to reduce driver_max_VFs
  - unload PF driver 1
  - load PF driver 2

Now driver_max_VFs is still stuck at the lower value set by driver 1.
I don't think that's the way this should work.

I guess this is partly a consequence of setting driver_max_VFs in
sriov_init(), which is called before driver attach and should only
depend on hardware characteristics, so it is related to the patch
below.  But I think we should fix it in general, not just for
netronome.


commit 4a338bc6f94b9ad824ac944f5dfc249d6838719c
Author: Jakub Kicinski <jakub.kicinski@netronome.com>
Date:   Fri May 25 08:18:34 2018 -0500

    PCI/IOV: Allow PF drivers to limit total_VFs to 0
    
    Some SR-IOV PF drivers implement .sriov_configure(), which allows
    user-space to enable VFs by writing the desired number of VFs to the sysfs
    "sriov_numvfs" file (see sriov_numvfs_store()).
    
    The PCI core limits the number of VFs to the TotalVFs advertised by the
    device in its SR-IOV capability.  The PF driver can limit the number of VFs
    to even fewer (it may have pre-allocated data structures or knowledge of
    device limitations) by calling pci_sriov_set_totalvfs(), but previously it
    could not limit the VFs to 0.
    
    Change pci_sriov_get_totalvfs() so it always respects the VF limit imposed
    by the PF driver, even if the limit is 0.
    
    This sequence:
    
      pci_sriov_set_totalvfs(dev, 0);
      x = pci_sriov_get_totalvfs(dev);
    
    previously set "x" to TotalVFs from the SR-IOV capability.  Now it will set
    "x" to 0.
    
    Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 192b82898a38..d0d73dbbd5ca 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -469,6 +469,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
 	iov->nres = nres;
 	iov->ctrl = ctrl;
 	iov->total_VFs = total;
+	iov->driver_max_VFs = total;
 	pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, &iov->vf_device);
 	iov->pgsz = pgsz;
 	iov->self = dev;
@@ -827,10 +828,7 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
 	if (!dev->is_physfn)
 		return 0;
 
-	if (dev->sriov->driver_max_VFs)
-		return dev->sriov->driver_max_VFs;
-
-	return dev->sriov->total_VFs;
+	return dev->sriov->driver_max_VFs;
 }
 EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
 

^ permalink raw reply related

* Re: STMMAC driver with TSO enabled issue
From: Jose Abreu @ 2018-05-25 14:05 UTC (permalink / raw)
  To: Bhadram Varka, Jose Abreu, netdev@vger.kernel.org, Joao Pinto
In-Reply-To: <3143ce1f-30e7-6b8d-06b5-6048abab54bc@nvidia.com>

Hi Bhadram,

On 25-05-2018 05:41, Bhadram Varka wrote:
> Hi Jose,
>
> On 5/24/2018 3:01 PM, Jose Abreu wrote:
>> Hi Bhadram,
>>
>> On 24-05-2018 06:58, Bhadram Varka wrote:
>>>
>>> After some time if check Tx descriptor status - then I see only
>>> below
>>>
>>> [..]
>>> [85788.286730] 027 [0x827951b0]: 0xf854f000 0x0 0x16d8
>>> 0x90000000
>>>
>>> index 025 and 026 descriptors processed but not index 027.
>>>
>>> At this stage Tx DMA is always in below state -
>>>
>>> ■ 3'b011: Running (Reading Data from system memory
>>> buffer and queuing it to the Tx buffer (Tx FIFO))
>>
>> Thats strange, I think the descriptors look okay though. I will
>> need the registers values (before the lock) and, if possible, the
>> git bisect output.
>
> Attaching the register dump file after the issue observed.
> Please check once.
>

----->8-----
0x112c = 0x0000003F
0x11ac = 0x0000003F
0x122c = 0x0000003F
0x12ac = 0x0000003F

0x1130 = 0x0000003F
0x11b0 = 0x0000003F
0x1230 = 0x0000003F
0x12b0 = 0x0000003F
----->8-----

This can't be right, it should be DMA_{RX/TX}_SIZE - 1 = 511. Did
you change these values in the code?

Thanks and Best Regards,
Jose Miguel Abreu

^ permalink raw reply

* KASAN: use-after-free Write in tls_push_record
From: syzbot @ 2018-05-25 14:16 UTC (permalink / raw)
  To: aviadye, borisp, davejwatson, davem, linux-kernel, netdev,
	syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    13405468f49d bpfilter: don't pass O_CREAT when opening con..
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=109ad82f800000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8be0182d69f8d422
dashboard link: https://syzkaller.appspot.com/bug?extid=709f2810a6a05f11d4d3
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=151ec3a7800000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=154d302f800000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com

RDX: 00000000fffffdef RSI: 00000000200005c0 RDI: 0000000000000003
RBP: 00007ffd6ccdd780 R08: 0000000020000000 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000212 R12: 0000000000000004
R13: ffffffffffffffff R14: 0000000000000000 R15: 0000000000000000
==================================================================
BUG: KASAN: use-after-free in tls_fill_prepend include/net/tls.h:339  
[inline]
BUG: KASAN: use-after-free in tls_push_record+0x1023/0x13e0  
net/tls/tls_sw.c:240
Write of size 1 at addr ffff8801d88d5000 by task syz-executor377/4600

CPU: 1 PID: 4600 Comm: syz-executor377 Not tainted 4.17.0-rc6+ #61
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
  __asan_report_store1_noabort+0x17/0x20 mm/kasan/report.c:435
  tls_fill_prepend include/net/tls.h:339 [inline]
  tls_push_record+0x1023/0x13e0 net/tls/tls_sw.c:240
  tls_sw_sendmsg+0x9de/0x12b0 net/tls/tls_sw.c:484
  inet_sendmsg+0x19f/0x690 net/ipv4/af_inet.c:798
  sock_sendmsg_nosec net/socket.c:629 [inline]
  sock_sendmsg+0xd5/0x120 net/socket.c:639
  __sys_sendto+0x3d7/0x670 net/socket.c:1789
  __do_sys_sendto net/socket.c:1801 [inline]
  __se_sys_sendto net/socket.c:1797 [inline]
  __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1797
  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4416d9
RSP: 002b:00007ffd6ccdd758 EFLAGS: 00000212 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004416d9
RDX: 00000000fffffdef RSI: 00000000200005c0 RDI: 0000000000000003
RBP: 00007ffd6ccdd780 R08: 0000000020000000 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000212 R12: 0000000000000004
R13: ffffffffffffffff R14: 0000000000000000 R15: 0000000000000000

The buggy address belongs to the page:
page:ffffea0007623540 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x2fffc0000000000()
raw: 02fffc0000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: ffffea0007592b60 ffff8801dae2fdd8 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801d88d4f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8801d88d4f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff8801d88d5000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                    ^
  ffff8801d88d5080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ffff8801d88d5100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [patch iproute2/net-next 2/2] devlink: introduce support for showing port number and split subport number
From: David Ahern @ 2018-05-25 14:19 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, idosch, jakub.kicinski, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, simon.horman,
	pieter.jansenvanvuuren, john.hurley, dirk.vandermerwe,
	alexander.h.duyck, ogerlitz, vijaya.guvva, satananda.burla,
	raghu.vatsavayi, felix.manlunas, gospo, sathya.perla,
	vasundhara-v.volam, tariqt, eranbe, jeffrey.t.kirsher, roopa
In-Reply-To: <20180524063952.GC2295@nanopsycho>

On 5/24/18 12:39 AM, Jiri Pirko wrote:
> Wed, May 23, 2018 at 10:05:49PM CEST, dsahern@gmail.com wrote:
>> On 5/20/18 2:15 AM, Jiri Pirko wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>> ---
>>>  devlink/devlink.c            | 6 ++++++
>>>  include/uapi/linux/devlink.h | 2 ++
>>>  2 files changed, 8 insertions(+)
>>>
>>> diff --git a/devlink/devlink.c b/devlink/devlink.c
>>> index df2c66dac1c7..b0ae17767dab 100644
>>> --- a/devlink/devlink.c
>>> +++ b/devlink/devlink.c
>>> @@ -1737,9 +1737,15 @@ static void pr_out_port(struct dl *dl, struct nlattr **tb)
>>>  
>>>  		pr_out_str(dl, "flavour", port_flavour_name(port_flavour));
>>>  	}
>>> +	if (tb[DEVLINK_ATTR_PORT_NUMBER])
>>> +		pr_out_uint(dl, "number",
>>> +			    mnl_attr_get_u32(tb[DEVLINK_ATTR_PORT_NUMBER]));
>>
>> "number" is a label means nothing. "port" is more descriptive.
> 
> That attribute name is "port_number". As the other attributes are
> named "port_something", and the "something" is printed out here, the
> "number" is consistent with it. Each line represents a port with a list
> of attributes.

The name of the attribute is not relevant here. That's an API that very
few people will see. I am looking at this from a user perspective and
the word "number" followed by a number is not clear.

> 
>>
>> # ./devlink port
>> pci/0000:03:00.0/1: type eth netdev swp17 flavour physical number 17
>> pci/0000:03:00.0/3: type eth netdev swp18 flavour physical number 18
>> pci/0000:03:00.0/5: type eth netdev swp19 flavour physical number 19
>> pci/0000:03:00.0/7: type eth netdev swp20 flavour physical number 20
>> pci/0000:03:00.0/9: type eth netdev swp21 flavour physical number 21
>> ...
>> pci/0000:03:00.0/61: type eth netdev swp1s0 flavour physical number 1
>> split_group 1 subport 0
>> pci/0000:03:00.0/62: type eth netdev swp1s1 flavour physical number 1
>> split_group 1 subport 1
>>

^ permalink raw reply

* Re: [PATCH] ath10k: transmit queued frames after waking queues
From: Niklas Cassel @ 2018-05-25 14:21 UTC (permalink / raw)
  To: Bob Copeland
  Cc: Adrian Chadd, Kalle Valo, David Miller, ath10k, linux-wireless,
	netdev, Linux Kernel Mailing List
In-Reply-To: <20180525125023.alc42lkgehc6iodg@localhost>

On Fri, May 25, 2018 at 08:50:23AM -0400, Bob Copeland wrote:
> On Fri, May 25, 2018 at 02:36:56PM +0200, Niklas Cassel wrote:
> > A spin lock does have the advantage of ordering: memory operations issued
> > before the spin_unlock_bh() will be completed before the spin_unlock_bh()
> > operation has completed.
> > 
> > However, ath10k_htt_tx_dec_pending() was called earlier in the same function,
> > which decreases htt->num_pending_tx, so that write will be completed before
> > our read. That is the only ordering we care about here (if we should call
> > ath10k_mac_tx_push_pending() or not).
> 
> Sure.  I also understand that reading inside a lock and operating on the
> value outside the lock isn't really the definition of synchronization
> (doesn't really matter in this case though).
> 
> I was just suggesting that the implicit memory barrier in the spin unlock
> that we are already paying for would be sufficient here too, and it matches
> the semantic of "tx fields under tx_lock."  On the other hand, maybe it's
> just me, but I tend to look askance at just-in-case READ_ONCEs sprinkled
> about.

I agree, because of the implicit memory barrier from spin_unlock_bh(),
READ_ONCE shouldn't really be needed in this case.

I think that it's a good thing to be critical of all "just-in-case" things,
however, it's not always that obvious if you actually need READ_ONCE or not.

E.g. you might need to use it even when you are holding a spin_lock.

Some people recommend to use it for all concurrent non-read-only shared memory
accesses: https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE

Is there a better guideline somewhere..?


Kind regards,
Niklas

^ permalink raw reply

* Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks
From: David Miller @ 2018-05-25 14:23 UTC (permalink / raw)
  To: qing.huang
  Cc: tariqt, haakon.bugge, yanjun.zhu, netdev, linux-rdma,
	linux-kernel, gi-oh.kim
In-Reply-To: <20180523232246.20445-1-qing.huang@oracle.com>

From: Qing Huang <qing.huang@oracle.com>
Date: Wed, 23 May 2018 16:22:46 -0700

> When a system is under memory presure (high usage with fragments),
> the original 256KB ICM chunk allocations will likely trigger kernel
> memory management to enter slow path doing memory compact/migration
> ops in order to complete high order memory allocations.
> 
> When that happens, user processes calling uverb APIs may get stuck
> for more than 120s easily even though there are a lot of free pages
> in smaller chunks available in the system.
> 
> Syslog:
> ...
> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
> oracle_205573_e:205573 blocked for more than 120 seconds.
> ...
> 
> With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.
> 
> However in order to support smaller ICM chunk size, we need to fix
> another issue in large size kcalloc allocations.
> 
> E.g.
> Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
> size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
> entry). So we need a 16MB allocation for a table->icm pointer array to
> hold 2M pointers which can easily cause kcalloc to fail.
> 
> The solution is to use kvzalloc to replace kcalloc which will fall back
> to vmalloc automatically if kmalloc fails.
> 
> Signed-off-by: Qing Huang <qing.huang@oracle.com>
> Acked-by: Daniel Jurgens <danielj@mellanox.com>
> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>

Applied, thanks.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox