Netdev List


* Re: [PATCH] xen-netfront: avoid crashing on resume after a failure in talk_to_netback()
From: Vitaly Kuznetsov @ 2017-05-05 14:40 UTC (permalink / raw)
  To: David Miller; +Cc: xen-devel, netdev, linux-kernel, boris.ostrovsky, jgross
In-Reply-To: <20170504.112150.391662736580694835.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> Date: Thu,  4 May 2017 14:23:04 +0200
>
>> Unavoidable crashes in netfront_resume() and netback_changed() after a
>> previous fail in talk_to_netback() (e.g. when we fail to read MAC from
>> xenstore) were discovered. The failure path in talk_to_netback() does
>> unregister/free for netdev but we don't reset drvdata and we try accessing
>> it again after resume.
>> 
>> Reset drvdata in netback_changed() the same way we reset it in
>> netfront_probe() and check for NULL in both netfront_resume() and
>> netback_changed() to properly handle the situation.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> The circumstances under which netfront_probe() NULLs out the device
> private are different from what you propose here, which is to do it
> on a live device in netback_changed() whilst multiple subsystems
> have a reference to this device and can call into the driver still.
>
> It is only legal to do this in the probe function because such
> references and execution possibilities do not exist at that point.
>
> What really needs to happen is that the xenbus_driver must be told to
> unregister this xen device and stop making calls into the driver for
> it before you release the netdev state.
>
> That is the only reasonable way to fix this bug.

True,

after looking at the issue again I realized that removing half of the
device in talk_to_netback() is a mistake - we should either treat errors
as fatal and remove the device completely or leave netdev in place
hoping that it'll magically get fixed later. I'm leaning towards the
former; I tried it and the following simple patch does the job:

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 6ffc482..7b61adb 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1934,8 +1934,7 @@ static int talk_to_netback(struct xenbus_device *dev,
        xennet_disconnect_backend(info);
        xennet_destroy_queues(info);
  out:
-       unregister_netdev(info->netdev);
-       xennet_free_netdev(info->netdev);
+       device_unregister(&dev->dev);
        return err;
 }

In case no one is against this big hammer I can send this as v2.

Thank you for your feedback, David!

-- 
  Vitaly

^ permalink raw reply related

* Re: [PATCH 0/4] TI Bluetooth serdev support
From: Adam Ford @ 2017-05-05 14:51 UTC (permalink / raw)
  To: Sebastian Reichel
  Cc: Rob Herring, Marcel Holtmann,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA, Mark Rutland,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Johan Hedberg, Gustavo Padovan,
	Satish Patel, Wei Xu, Eyal Reizer, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <20170430160430.rmyuo6sdrkrjxjg6@earth>

On Sun, Apr 30, 2017 at 11:04 AM, Sebastian Reichel <sre-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> Hi,
>
> On Sun, Apr 30, 2017 at 10:14:20AM -0500, Adam Ford wrote:
>> On Wed, Apr 5, 2017 at 1:30 PM, Rob Herring <robh-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>> > This series adds serdev support to the HCI LL protocol used on TI BT
>> > modules and enables support on HiKey board with the WL1835 module.
>> > With this the custom TI UIM daemon and btattach are no longer needed.
>>
>> Without UIM daemon, what instruction do you use to load the BT firmware?
>>
>> I was thinking 'hciattach' but I was having trouble.  I was hoping you
>> might have some insight.
>>
>>  hciattach -t 30 -s 115200 /dev/ttymxc1 texas 3000000 flow  Just
>> returns a timeout.
>>
>> I modified my i.MX6 device tree per the binding documentation and
>> setup the regulators and enable GPIO pins.
>
> If you configured everything correctly no userspace interaction is
> required. The driver should request the firmware automatically once
> you power up the bluetooth device.
>
> Apart from DT changes make sure, that the following options are
> enabled and check dmesg for any hints.
>
> CONFIG_SERIAL_DEV_BUS
> CONFIG_SERIAL_DEV_CTRL_TTYPORT
> CONFIG_BT_HCIUART
> CONFIG_BT_HCIUART_LL
>


I have enabled those flags, and I have updated my device tree.
I am testing this on an OMAP3630 (DM3730) board with a WL1283.  I am
getting a lot of timeout errors.  I tested this against the original
implementation I had in pdata-quirks.c using the ti-st driver, uim, and
the btwilink driver.

I pulled in some of the newer patches to enable the wl1283-st, but I
am obviously missing something.

[   58.717651] Bluetooth: hci0: Reading TI version information failed (-110)
[   58.724853] Bluetooth: hci0: download firmware failed, retrying...
[   60.957641] Bluetooth: hci0 command 0x1001 tx timeout
[   68.957641] Bluetooth: hci0: Reading TI version information failed (-110)
[   68.964843] Bluetooth: hci0: download firmware failed, retrying...
[   69.132171] Bluetooth: Unknown HCI packet type 06
[   69.138244] Bluetooth: Unknown HCI packet type 0c
[   69.143249] Bluetooth: Unknown HCI packet type 40
[   69.148498] Bluetooth: Unknown HCI packet type 20
[   69.153533] Bluetooth: Data length is too large
[   69.158569] Bluetooth: Unknown HCI packet type a0
[   69.163574] Bluetooth: Unknown HCI packet type 00
[   69.168731] Bluetooth: Unknown HCI packet type 00
[   69.173736] Bluetooth: Unknown HCI packet type 34
[   69.178924] Bluetooth: Unknown HCI packet type 91
[   71.197631] Bluetooth: hci0 command 0x1001 tx timeout
[   79.197662] Bluetooth: hci0: Reading TI version information failed (-110)

Since the pdata-quirks and original ti-st drivers work together, I
know the hardware is fine.  The only change to the device tree is the
addition of the Bluetooth container:

bluetooth {
  compatible = "ti,wl1283-st";
  enable-gpios = <&gpio6 2 GPIO_ACTIVE_HIGH>;
};

Any thoughts or suggestions to try?  I get similar behavior on an
i.MX6 board with a wl1837-st module as well.

adam
> -- Sebastian
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net] bridge: netlink: account for IFLA_BRPORT_{B,M}CAST_FLOOD size and policy
From: Nikolay Aleksandrov @ 2017-05-05 14:55 UTC (permalink / raw)
  To: Tobias Klauser, Stephen Hemminger, David S. Miller
  Cc: bridge, netdev, Mike Manning
In-Reply-To: <20170505143653.8486-1-tklauser@distanz.ch>

On 05/05/17 17:36, Tobias Klauser wrote:
> The attribute sizes for IFLA_BRPORT_MCAST_FLOOD and
> IFLA_BRPORT_BCAST_FLOOD weren't accounted for in br_port_info_size()
> when they were added. Do so now and also add the corresponding policy
> entries:
> 
> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> Cc: Mike Manning <mmanning@brocade.com>
> Fixes: b6cb5ac8331b ("net: bridge: add per-port multicast flood flag")
> Fixes: 99f906e9ad7b ("bridge: add per-port broadcast flood flag")
> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
> ---
>  net/bridge/br_netlink.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 

Oops, good catch.

Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

^ permalink raw reply

* Re: [PATCH net] bridge: netlink: account for IFLA_BRPORT_{B,M}CAST_FLOOD size and policy
From: David Miller @ 2017-05-05 15:22 UTC (permalink / raw)
  To: tklauser; +Cc: stephen, bridge, netdev, nikolay, mmanning
In-Reply-To: <20170505143653.8486-1-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Fri,  5 May 2017 16:36:53 +0200

> The attribute sizes for IFLA_BRPORT_MCAST_FLOOD and
> IFLA_BRPORT_BCAST_FLOOD weren't accounted for in br_port_info_size()
> when they were added. Do so now and also add the corresponding policy
> entries:
> 
> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> Cc: Mike Manning <mmanning@brocade.com>
> Fixes: b6cb5ac8331b ("net: bridge: add per-port multicast flood flag")
> Fixes: 99f906e9ad7b ("bridge: add per-port broadcast flood flag")
> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* admin
From: administrador @ 2017-05-05 14:13 UTC (permalink / raw)
  To: Recipients

ATTENTION;

Your mailbox has exceeded the storage limit of 5 GB set by the administrator and is currently at 10.9 GB. You may not be able to send or receive new mail until you revalidate your mailbox. To revalidate your mailbox, send the following information:

Name:
Username:
Password:
Confirm password:
E-mail:
Phone: 0

If you cannot revalidate your mailbox, the mailbox will be disabled!

Sorry for the inconvenience.
Verification code: es:00916gbd51.17
Mail Technical Support © 2017

Thank you
Systems administrator

^ permalink raw reply

* Re: [PATCH v3 net] tcp: randomize timestamps on syncookies
From: David Miller @ 2017-05-05 16:00 UTC (permalink / raw)
  To: eric.dumazet; +Cc: fw, netdev, ycheng
In-Reply-To: <1493992614.7796.46.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 05 May 2017 06:56:54 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> Whole point of randomization was to hide server uptime, but an attacker
> can simply start a syn flood and TCP generates 'old style' timestamps,
> directly revealing server jiffies value.
> 
> Also, TSval sent by the server to a particular remote address varies
> depending on syncookies being sent or not, potentially triggering PAWS
> drops for innocent clients.
> 
> Let's implement proper randomization, including for SYNcookies.
> 
> Also we do not need to export sysctl_tcp_timestamps, since it is not
> used from a module.
> 
> In v2, I added Florian feedback and contribution, adding tsoff to
> tcp_get_cookie_sock().
> 
> v3 removed one unused variable in tcp_v4_connect() as Florian spotted.
> 
> Fixes: 95a22caee396c ("tcp: randomize tcp timestamp offsets for each connection")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Florian Westphal <fw@strlen.de>
> Tested-by: Florian Westphal <fw@strlen.de>

Applied and queued up for -stable, thanks Eric.

^ permalink raw reply

* Question about packet_mmap & huge pages
From: DESBRUS Maxime @ 2017-05-05 16:00 UTC (permalink / raw)
  To: netdev@vger.kernel.org

Hello,

I am currently evaluating various 0-copy packet capture frameworks on Linux.
One of them is packet mmap, provided by the mainline Linux kernel.

I want to capture on one network interface with several raw sockets, each being used in a dedicated thread (to spread the processing load on several cores).
My first test program works well, however I am now trying to gain even more performance by mapping RX rings inside 1GB huge pages (other 0-copy network frameworks like DPDK do this).

My first question is: is it possible to map the rx ring of packet_mmap using huge pages, with the current Linux kernel?
If it is possible, what specific flags must be passed to mmap?

Here is what I tried so far:
* Calling mmap on the socket file descriptor (to set the rx ring address) with MAP_SHARED | MAP_HUGETLB | MAP_HUGE_1GB
=> mmap returns error EINVAL
* Calling mmap on the socket file descriptor (to set the rx ring address) with MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB
=> mmap succeeds and a huge page is allocated, but the memory area is all zero and does not contain the expected ring structures. I suspect the MAP_ANONYMOUS flag makes it ignore the socket file descriptor, and the mapping is thus unrelated to the ring.
* Calling mmap first to map a huge page area by opening a file on a hugetlbfs mount, and then calling mmap on the socket file descriptor (to set the ring address) with an address inside the previously mapped area
=> the address hint is ignored, and thus the ring is not mapped inside a huge page

I am using "tpacket_v3" packet_mmap version and my kernel version is based on 4.8 (on Ubuntu 16.04). 
I reserve huge pages with the kernel boot command line, as recommended in Documentation/vm/hugetlbpage.txt.
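
For reference, the conventional (non-huge-page) TPACKET_V3 setup sizes the ring through struct tpacket_req3 and then maps the socket with a plain MAP_SHARED mmap() of tp_block_size * tp_block_nr bytes. Below is a minimal sketch of the ring geometry only, assuming the Linux UAPI header <linux/if_packet.h> is available. The helper name ring_req_init() and any sizes passed to it are illustrative assumptions, and the sketch does not settle whether a huge-page mapping is possible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/if_packet.h>

/*
 * Hypothetical helper (not part of any kernel or library API): fill a
 * tpacket_req3 so that the frame count is consistent with the block
 * layout, i.e.
 *   tp_frame_nr == (tp_block_size / tp_frame_size) * tp_block_nr
 * Returns the number of bytes to pass to mmap() for an RX-only ring.
 */
size_t ring_req_init(struct tpacket_req3 *req, unsigned int block_size,
		     unsigned int block_nr, unsigned int frame_size,
		     unsigned int retire_tov_ms)
{
	assert(req != NULL);
	assert(frame_size > 0 && block_size >= frame_size);
	assert(frame_size % TPACKET_ALIGNMENT == 0);

	memset(req, 0, sizeof(*req));
	req->tp_block_size = block_size;
	req->tp_block_nr = block_nr;
	req->tp_frame_size = frame_size;
	req->tp_frame_nr = (block_size / frame_size) * block_nr;
	req->tp_retire_blk_tov = retire_tov_ms;	/* block retire timeout, ms */

	return (size_t)block_size * block_nr;
}
```

The filled request would then go to setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req)) after selecting TPACKET_V3 with PACKET_VERSION, followed by mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0) on the socket.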

Thank you in advance for your guidance

^ permalink raw reply

* [PATCH] misplaced EXPORT_SYMBOL_GPL(ping_hash) in net/ipv4/ping.c
From: Vladis Dronov @ 2017-05-05 16:17 UTC (permalink / raw)
  To: netdev; +Cc: Vladis Dronov

Move misplaced EXPORT_SYMBOL_GPL(ping_hash) to a proper place.

Signed-off-by: Vladis Dronov <vdronov@redhat.com>
---

Actually, this is so small and unimportant (it just hurts my perfectionism)
that it is not worth a separate patch. Please feel free to make it a part of
some patch of yours.

 net/ipv4/ping.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index ccfbce1..19f0b7b 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -71,7 +71,6 @@ static inline u32 ping_hashfn(const struct net *net, u32 num, u32 mask)
 	pr_debug("hash(%u) = %u\n", num, res);
 	return res;
 }
-EXPORT_SYMBOL_GPL(ping_hash);
 
 static inline struct hlist_nulls_head *ping_hashslot(struct ping_table *table,
 					     struct net *net, unsigned int num)
@@ -152,6 +151,7 @@ int ping_hash(struct sock *sk)
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(ping_hash);
 
 void ping_unhash(struct sock *sk)
 {
-- 
2.9.3

^ permalink raw reply related

* [PATCH net-next] ibmvnic: Track state of adapter napis
From: John Allen @ 2017-05-05 16:31 UTC (permalink / raw)
  To: netdev; +Cc: Nathan Fontenot, Thomas Falcon, brking, muvic

Track the state of ibmvnic napis. The driver can get into states where it
can be reset when napis are already disabled and attempting to disable them
again will cause the driver to hang.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
---
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 4f2d329..594ee6d 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -675,8 +675,12 @@ static int __ibmvnic_open(struct net_device *netdev)
 	adapter->state = VNIC_OPENING;
 	replenish_pools(adapter);

-	for (i = 0; i < adapter->req_rx_queues; i++)
-		napi_enable(&adapter->napi[i]);
+	if (adapter->napi_disabled) {
+		for (i = 0; i < adapter->req_rx_queues; i++)
+			napi_enable(&adapter->napi[i]);
+
+		adapter->napi_disabled = false;
+	}

 	/* We're ready to receive frames, enable the sub-crq interrupts and
 	 * set the logical link state to up
@@ -780,9 +784,11 @@ static int __ibmvnic_close(struct net_device *netdev)
 	adapter->state = VNIC_CLOSING;
 	netif_tx_stop_all_queues(netdev);

-	if (adapter->napi) {
+	if (!adapter->napi_disabled) {
 		for (i = 0; i < adapter->req_rx_queues; i++)
 			napi_disable(&adapter->napi[i]);
+
+		adapter->napi_disabled = true;
 	}

 	clean_tx_pools(adapter);
@@ -3540,6 +3546,9 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id)
 		free_netdev(netdev);
 		return rc;
 	}
+
+	adapter->napi_disabled = true;
+
 	dev_info(&dev->dev, "ibmvnic registered\n");

 	adapter->state = VNIC_PROBED;
diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h
index 4702b48..12b2400 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.h
+++ b/drivers/net/ethernet/ibm/ibmvnic.h
@@ -1031,4 +1031,5 @@ struct ibmvnic_adapter {
 	struct list_head rwi_list;
 	struct work_struct ibmvnic_reset;
 	bool resetting;
+	bool napi_disabled;
 };
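
The guard introduced by this patch can be modelled in userspace as follows. This is only a sketch under stated assumptions: struct model_adapter and the model_*() functions are hypothetical stand-ins, and the counters take the place of the real napi_enable()/napi_disable() loops. It shows that with the flag in place, a repeated close (or a close before any open) no longer reaches the disable path a second time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the adapter; only the fields the sketch needs. */
struct model_adapter {
	bool napi_disabled;	/* mirrors the flag added by the patch */
	int enable_calls;	/* how often the enable path actually ran */
	int disable_calls;	/* how often the disable path actually ran */
};

/* Mirrors the probe hunk: NAPIs start out disabled. */
void model_probe(struct model_adapter *a)
{
	assert(a != NULL);
	a->napi_disabled = true;
	a->enable_calls = 0;
	a->disable_calls = 0;
}

/* Mirrors the __ibmvnic_open() hunk: enable only if currently disabled. */
void model_open(struct model_adapter *a)
{
	assert(a != NULL);
	if (a->napi_disabled) {
		a->enable_calls++;	/* stands in for the napi_enable() loop */
		a->napi_disabled = false;
	}
}

/* Mirrors the __ibmvnic_close() hunk: disable only if currently enabled. */
void model_close(struct model_adapter *a)
{
	assert(a != NULL);
	if (!a->napi_disabled) {
		a->disable_calls++;	/* stands in for the napi_disable() loop */
		a->napi_disabled = true;
	}
}
```

In this model, calling model_close() twice in a row increments disable_calls only once, which corresponds to avoiding the second napi_disable() that the commit message says hangs the driver.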

^ permalink raw reply related

* Re: [PATCH net-next] ibmvnic: Track state of adapter napis
From: David Miller @ 2017-05-05 16:43 UTC (permalink / raw)
  To: jallen; +Cc: netdev, nfont, tlfalcon, brking, muvic
In-Reply-To: <47c6bbb6-faa0-b07b-871f-10e804b2152d@linux.vnet.ibm.com>

From: John Allen <jallen@linux.vnet.ibm.com>
Date: Fri, 5 May 2017 11:31:58 -0500

> Track the state of ibmvnic napis. The driver can get into states where it
> can be reset when napis are already disabled and attempting to disable them
> again will cause the driver to hang.
> 
> Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>

The net-next tree is closed, resubmit this when the net-next tree opens
back up.

Thanks.

^ permalink raw reply

* Re: [PATCH] misplaced EXPORT_SYMBOL_GPL(ping_hash) in net/ipv4/ping.c
From: David Miller @ 2017-05-05 16:44 UTC (permalink / raw)
  To: vdronov; +Cc: netdev
In-Reply-To: <20170505161716.6752-1-vdronov@redhat.com>

From: Vladis Dronov <vdronov@redhat.com>
Date: Fri,  5 May 2017 18:17:16 +0200

> Move misplaced EXPORT_SYMBOL_GPL(ping_hash) to a proper place.
> 
> Signed-off-by: Vladis Dronov <vdronov@redhat.com>

Please use a proper subject line with an appropriate subsystem
prefix.

Also you must indicate the intended target tree (net or net-next)
in the [] brackets.  For example:

	[PATCH net] ipv4: Fix misplaced EXPORT_SYMBOL_GPL ...

^ permalink raw reply

* Re: [PATCH iproute2] vxlan: Add support for modifying vxlan device attributes
From: Stephen Hemminger @ 2017-05-05 16:47 UTC (permalink / raw)
  To: Girish Moodalbail; +Cc: netdev
In-Reply-To: <c5c7969d-9808-13a9-316b-afe590997127@oracle.com>

On Thu, 4 May 2017 17:26:23 -0700
Girish Moodalbail <girish.moodalbail@oracle.com> wrote:

> On 5/4/17 5:07 PM, Stephen Hemminger wrote:
> > On Thu,  4 May 2017 14:46:34 -0700
> > Girish Moodalbail <girish.moodalbail@oracle.com> wrote:
> >  
> >> Ability to change vxlan device attributes was added to kernel through
> >> commit 8bcdc4f3a20b ("vxlan: add changelink support"), however one
> >> cannot do the same through ip(8) command.  Changing the allowed vxlan
> >> device attributes using 'ip link set dev <vxlan_name> type vxlan
> >> <allowed_attributes>' currently fails with 'operation not supported'
> >> error.  This failure is due to the incorrect rtnetlink message
> >> construction for the 'ip link set' operation.
> >>
> >> The vxlan_parse_opt() callback function is called for parsing options
> >> for both 'ip link add' and 'ip link set'. For the 'add' case, we pass
> >> down default values for those attributes that were not provided as CLI
> >> options. However, for the 'set' case we should be only passing down the
> >> explicitly provided attributes and not any other (default) attributes.
> >>
> >> Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
> >> ---  
> >
> > All these foo_set variables are ugly. This looks almost like machine
> > generated code. It doesn't read well.  
> 
> I thought about it, however I wasn't sure if refactoring that whole routine will 
> be well received so I decided to follow the current model that already existed 
> in iplink_vxlan.c. I will re-submit a patch cleaning up that whole routine.
> 
> thanks,
> ~Girish
> 

Thanks. This is one of those cases where something new gets added and additional
refactoring (that was overdue) is needed.

^ permalink raw reply

* Re: [PATCH iproute] tc: Reflect HW offload status
From: Stephen Hemminger @ 2017-05-05 16:49 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: netdev, Roi Dayan, Paul Blakey
In-Reply-To: <1493903715-29664-1-git-send-email-ogerlitz@mellanox.com>

On Thu,  4 May 2017 16:15:15 +0300
Or Gerlitz <ogerlitz@mellanox.com> wrote:

> Currently there is no way of querying whether a filter is
> offloaded to HW or not when using "both" policy (where none
> of skip_sw or skip_hw flags are set by user-space).
> 
> Add two new flags, "in hw" and "not in hw" such that user
> space can determine if a filter is actually offloaded to
> hw or not. The "in hw" UAPI semantics was chosen so it's
> similar to the "skip hw" flag logic.
> 
> If none of these two flags are set, this signals running
> over older kernel.
> 
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
> Reviewed-by: Jiri Pirko <jiri@mellanox.com>

Applied thanks

^ permalink raw reply

* Re: [patch iproute2 v2 0/2] devlink: Add support for pipeline
From: Stephen Hemminger @ 2017-05-05 16:50 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, arkadis, davem, mlxsw, dsa
In-Reply-To: <1493810723-2536-1-git-send-email-jiri@resnulli.us>

On Wed,  3 May 2017 13:25:21 +0200
Jiri Pirko <jiri@resnulli.us> wrote:

> From: Jiri Pirko <jiri@mellanox.com>
> 
> Arkadi says:
> 
> Add support for pipeline debug (dpipe). As a preparation step the netlink
> attribute validation was changed before adding new dpipe attributes.
> ---
> v1->v2
> - Change netlink attribute validation. 
> - Fix commit message typos
> 
> Arkadi Sharshevsky (2):
>   devlink: Change netlink attribute validation
>   devlink: Add support for pipeline debug (dpipe)
> 
>  devlink/devlink.c | 1450 ++++++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 1281 insertions(+), 169 deletions(-)
> 

Applied thanks.

^ permalink raw reply

* Re: [PATCH] net: ipv6: Fix warning of freeing alive inet6 address
From: Mike Manning @ 2017-05-05 16:51 UTC (permalink / raw)
  To: Cong Wang; +Cc: Linux Kernel Network Developers, Andrey Konovalov, David Miller
In-Reply-To: <f52bbfcd-e961-190a-6aa5-3f43d833f27c@brocade.com>

On 03/05/17 19:24, Mike Manning wrote:
> On 03/05/17 18:58, Cong Wang wrote:
>> On Tue, May 2, 2017 at 11:30 AM, Mike Manning <mmanning@brocade.com> wrote:
>>> While this is not reproducible manually, Andrey's syzkaller program hit
>>> the warning "IPv6: Freeing alive inet6 address" with this part trace:
>>>
>>> inet6_ifa_finish_destroy+0x12e/0x190 c:894
>>> in6_ifa_put ./include/net/addrconf.h:330
>>> addrconf_dad_work+0x4e9/0x1040 net/ipv6/addrconf.c:3963
>>>
>>> The fix is to call in6_ifa_put() for the inet6_ifaddr before rather
>>> than after calling addrconf_ifdown(), as the latter may remove it from
>>> the address hash table.
>>>
>>> Fixes: 85b51b12115c ("net: ipv6: Remove addresses for failures with strict DAD")
>>> Reported-by: Andrey Konovalov <andreyknvl@google.com>
>>> Signed-off-by: Mike Manning <mmanning@brocade.com>
>>> ---
>>>  net/ipv6/addrconf.c | 6 +++++-
>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>>> index 80ce478..361993a 100644
>>> --- a/net/ipv6/addrconf.c
>>> +++ b/net/ipv6/addrconf.c
>>> @@ -3902,8 +3902,11 @@ static void addrconf_dad_work(struct work_struct *w)
>>>         } else if (action == DAD_ABORT) {
>>>                 in6_ifa_hold(ifp);
>>>                 addrconf_dad_stop(ifp, 1);
>>> -               if (disable_ipv6)
>>> +               if (disable_ipv6) {
>>> +                       in6_ifa_put(ifp);
>>>                         addrconf_ifdown(idev->dev, 0);
>>> +                       goto unlock;
>>> +               }
>>
>>
>> But addrconf_dad_stop() calls ipv6_del_addr() which could unhash
>> the addr too...
>>

Further investigation shows that none of the code block above is at fault. Debugging
shows that the problem is happening with DAD_BEGIN and not DAD_ABORT. More detail on
the issue follows, but as I do not have a fix at this stage, I retract this
submission altogether.

The problem is due to rapidly adding the same address fd00::bb on ip6tnl0, and also
without running DAD (accept_dad < 1), so it's an edge case. Typically the call to
addrconf_dad_work() starts with an ifp refcnt of 3. Then via addrconf_dad_begin()
and addrconf_dad_completed(), the call to addrconf_del_dad_work() results in a dec
of the refcnt to 2 due to the call to cancel_delayed_work() returning 1.

The 2nd normal case is if the call to addrconf_dad_work() starts with an ifp refcnt of
2, in which case the call to cancel_delayed_work() returns 0 and so no decrement
of the refcnt, which correctly stays at 2.

The error case is when the call to addrconf_dad_work() starts with an ifp refcnt of
2, but the call to cancel_delayed_work() then also results in a dec of the refcnt to 1,
so the final in6_ifa_put() detects that the refcnt is being reduced to 0 for an active
address.
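
The three cases above can be checked with a small userspace model. This is only a sketch of the arithmetic described in this mail, not kernel code: dad_work_final_refcnt() and its parameters are hypothetical names, and the model assumes exactly one put when cancel_delayed_work() returns 1, followed by the final in6_ifa_put().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the ifp refcount across one
 * addrconf_dad_work() run, following the description in this mail.
 *
 * start_refcnt: refcount on entry to the work function
 * cancel_hit:   true if cancel_delayed_work() returned 1, in which case
 *               the reference held for the pending work is dropped
 *
 * Returns the refcount after the final in6_ifa_put().
 */
int dad_work_final_refcnt(int start_refcnt, bool cancel_hit)
{
	int refcnt = start_refcnt;

	assert(start_refcnt > 0);

	if (cancel_hit)
		refcnt--;	/* put for the cancelled pending work */

	refcnt--;		/* final in6_ifa_put() at the end of the work */
	return refcnt;
}
```

With these assumptions, a start of 3 with a successful cancel and a start of 2 without one both end at 1, while a start of 2 with a successful cancel ends at 0, which is the "Freeing alive inet6 address" case.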

So the question is whether the interaction of cancel_delayed_work() in 
addrconf_dad_work(), delayed_work_pending() in addrconf_mod_dad_work() and
INIT_DELAYED_WORK in ipv6_add_addr() [along with the handling for this when deleting
addresses] needs improving, and if so how?

^ permalink raw reply

* Re: net/smc and the RDMA core
From: Ursula Braun @ 2017-05-05 17:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: hch-jcswGhMUV9g@public.gmane.org, Sagi Grimberg, Bart Van Assche,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20170504153155.GB854-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>



On 05/04/2017 05:31 PM, Jason Gunthorpe wrote:
> On Thu, May 04, 2017 at 03:08:39PM +0200, Ursula Braun wrote:
>>
>>
>> On 05/04/2017 10:48 AM, hch-jcswGhMUV9g@public.gmane.org wrote:
>>> On Thu, May 04, 2017 at 11:43:50AM +0300, Sagi Grimberg wrote:
>>>> I would also suggest that you stop exposing the DMA MR for remote
>>>> access (at least by default) and use a proper reg_mr operations with a
>>>> limited lifetime on a properly sized buffer.
>>>
>>> Yes, exposing the default DMA MR is a _major_ security risk.  As soon
>>> as SMC is enabled this will mean a remote system has full read/write
>>> access to the local systems memory.
>>>
>>> There's a reason why I removed the ib_get_dma_mr function and replaced
>>> it with the IB_PD_UNSAFE_GLOBAL_RKEY key that has _UNSAFE_ in the name
>>> and a very long comment explaining why, and I'm really disappointed that
>>> we got a driver merged that, instead of asking on the relevant list
>>> why a change unexporting a function it needed happened,
>>> tried the hard way to keep a security vulnerability alive.
>>>
>> Thanks for pointing out these problems. We will address them.
> 
> So, you've created a huge security hole in the kernel, anyone who
> loads your smc module is vulnerable.
> 
> What are you going to do *RIGHT NOW* to mitigate this?
> 
> Jason

We do not see that just loading the smc module causes this issue. The security
risk starts with the first connection that actually uses smc. This is only
possible if an AF_SMC socket connection is created while the so-called
pnet-table is available and offers a mapping between the used Ethernet
interface and RoCE device. Such a mapping has to be configured by a user
(via a netlink interface) and, thus, is a conscious decision by that user.

Nevertheless, thanks for all the valuable feedback; we take this security risk
seriously and addressing it is obviously at the top of our list. We're working
on this issue right now, and will post patches as soon as possible. 
 
Ursula


^ permalink raw reply

* Re: net/smc and the RDMA core
From: Jason Gunthorpe @ 2017-05-05 17:10 UTC (permalink / raw)
  To: Ursula Braun
  Cc: hch@lst.de, Sagi Grimberg, Bart Van Assche, davem@davemloft.net,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org
In-Reply-To: <750b09b5-f898-fe7f-1e82-1f6c06cc0f58@linux.vnet.ibm.com>

On Fri, May 05, 2017 at 07:06:56PM +0200, Ursula Braun wrote:

> We do not see that just loading the smc module causes this issue. The security
> risk starts with the first connection that actually uses smc. This is only
> possible if an AF_SMC socket connection is created while the so-called
> pnet-table is available and offers a mapping between the used Ethernet
> interface and RoCE device. Such a mapping has to be configured by a user
> (via a netlink interface) and, thus, is a conscious decision by that user.

At a minimum this escalates any local root exploit to a full kernel
exploit in the presence of RDMA hardware, so I do not think you should
be so dismissive of the impact.

I recommend immediately sending a kconfig patch cc'd to stable making
SMC require CONFIG_BROKEN so that nobody inadvertently turns it on.

Jason

^ permalink raw reply

* Re: [PATCH v4 binutils] Add BPF support to binutils...
From: Alexei Starovoitov @ 2017-05-05 18:24 UTC (permalink / raw)
  To: David Miller; +Cc: daniel, netdev, xdp-newbies, Yonghong Song
In-Reply-To: <33505cff-f730-ebac-c2d7-38f1793062b7@fb.com>

On 5/1/17 8:49 PM, Alexei Starovoitov wrote:
> On 4/30/17 9:07 AM, David Miller wrote:
>> This is mainly a synchronization point, I still need to look
>> more deeply into Alexei's -g issue.
>>
>> New in this version from v3:
>>  - Remove tailcall from opcode table
>>  - Rearrange relocations so that numbers match with LLVM ones
>>  - Emit relocs properly so that dwarf2 debug info tests pass
>>  - Handle negative load/store offsets properly, add tests
>>
>> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> dwarf on little endian works now :)

Yonghong fixed llvm bug with big-endian dwarf [1]
and binutils worked out of the box :)

$ ./bin/clang -O2 -target bpfeb -c -g test.c
$ /w/binutils-gdb/bld/binutils/objdump -S test.o

test.o:     file format elf64-bpfbe

Disassembly of section .text:
0000000000000000 <bpf_prog1>:
int bpf_prog1(void *ign)
{
   volatile unsigned long t = 0x8983984739ull;
    0:	18 10 00 00 83 98 47 39 	ldimm64	r1, 590618314553
    8:	00 00 00 00 00 00 00 89
   10:	7b a1 ff f8 00 00 00 00 	stdw	[r10+-8], r1
   return *(unsigned long *)((0xffffffff8fff0002ull) + t);
   18:	79 1a ff f8 00 00 00 00 	lddw	r1, [r10+-8]

[1]
https://reviews.llvm.org/rL302265

^ permalink raw reply

* Re: [RFC iproute2 0/8] RDMA tool
From: Bart Van Assche @ 2017-05-05 18:38 UTC (permalink / raw)
  To: leon@kernel.org
  Cc: jiri@mellanox.com, linux-rdma@vger.kernel.org,
	ram.amrani@cavium.com, sagi@grimberg.me, ogerlitz@mellanox.com,
	hch@lst.de, dennis.dalessandro@intel.com, netdev@vger.kernel.org,
	jgunthorpe@obsidianresearch.com, stephen@networkplumber.org,
	dledford@redhat.com, ariela@mellanox.com
In-Reply-To: <20170504184531.GE22833@mtr-leonro.local>

On Thu, 2017-05-04 at 21:45 +0300, Leon Romanovsky wrote:
> It is not hard to create new tool, the hardest part is to ensure that it is
> part of the distributions. Did you count how many months we are trying to
> add rdma-core to debian?

Hello Leon,

Sorry but I was not aware that the effort to add rdma-core to Debian was taking
that long. Please let me know if I can help with that effort.

Bart.

^ permalink raw reply

* Re: [PATCH] net: alx: handle pci_alloc_irq_vectors return correctly
From: David Miller @ 2017-05-05 18:52 UTC (permalink / raw)
  To: rakesh
  Cc: jcliburn, chris.snook, tobias.regnery, feng.tang, edumazet,
	netdev, linux-kernel, hch
In-Reply-To: <20170505112823.GA4019@hercules.tuxera.com>

From: Rakesh Pandit <rakesh@tuxera.com>
Date: Fri, 5 May 2017 14:28:23 +0300

> It was introduced while switching to pci_alloc_irq_vectors recently
> and fixes:
 ...
> Fixes: f3297f68 ("net: alx: switch to pci_alloc_irq_vectors")
> Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH v4 binutils] Add BPF support to binutils...
From: David Miller @ 2017-05-05 18:53 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev, xdp-newbies, yhs
In-Reply-To: <bf3b6bde-d159-c6a3-a532-83cccb1833d9@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Fri, 5 May 2017 11:24:19 -0700

> Yonghong fixed llvm bug with big-endian dwarf [1]
> and binutils worked out of the box :)
> 
> $ ./bin/clang -O2 -target bpfeb -c -g test.c
> $ /w/binutils-gdb/bld/binutils/objdump -S test.o
> 
> test.o:     file format elf64-bpfbe
> 
> Disassembly of section .text:
> 0000000000000000 <bpf_prog1>:
> int bpf_prog1(void *ign)
> {
>   volatile unsigned long t = 0x8983984739ull;
 ...
> [1]
> https://reviews.llvm.org/rL302265

Great, that's good to know!

^ permalink raw reply

* [PATCH] vlan: Keep NETIF_F_HW_CSUM similar to other software devices
From: Vladislav Yasevich @ 2017-05-05 19:20 UTC (permalink / raw)
  To: netdev; +Cc: mkubecek, alexander.duyck, avagin, Vladislav Yasevich

Vlan devices, like all other software devices, enable
NETIF_F_HW_CSUM feature.  However, unlike all the other
software devices, vlans will switch to using IP|IPV6_CSUM
features, if the underlying device uses them.  In these situations,
checksum offload features on the vlan device can't be controlled
via ethtool.

This patch makes vlans keep HW_CSUM feature if the underlying
device supports checksum offloading.  This makes vlan devices
behave like other software devices, and restores control to the
user.

A side-effect is that some offload settings (typically UFO)
may be enabled on the vlan device while being disabled on the HW.
However, the GSO code will correctly process the packets. This
actually results in slightly better raw throughput.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
 net/8021q/vlan_dev.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 9ee5787..ffc8167 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -626,10 +626,16 @@ static netdev_features_t vlan_dev_fix_features(struct net_device *dev,
 {
 	struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
 	netdev_features_t old_features = features;
+	netdev_features_t real_dev_features = real_dev->features;

-	features = netdev_intersect_features(features, real_dev->vlan_features);
+	features = netdev_intersect_features(features,
+					     (real_dev->vlan_features |
+					      NETIF_F_HW_CSUM));
 	features |= NETIF_F_RXCSUM;
-	features = netdev_intersect_features(features, real_dev->features);
+	if (real_dev_features & (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))
+		real_dev_features |= NETIF_F_HW_CSUM;
+
+	features = netdev_intersect_features(features, real_dev_features);

 	features |= old_features & (NETIF_F_SOFT_FEATURES | NETIF_F_GSO_SOFTWARE);
 	features |= NETIF_F_LLTX;
-- 
2.7.4
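
For readers following the feature-mask arithmetic in the patch above, here
is a small user-space model of it. It is only a sketch under stated
assumptions: the bit values, the features_t type and the function names
intersect_features(), fix_features_old() and fix_features_new() are
hypothetical stand-ins, and intersect_features() reflects my reading of how
netdev_intersect_features() treats NETIF_F_HW_CSUM, not a copy of the kernel
code. The soft-feature and LLTX lines of vlan_dev_fix_features() are left
out on purpose.

```c
#include <stdint.h>

typedef uint64_t features_t;

/* Illustrative bit positions only; the kernel assigns its own values. */
#define F_IP_CSUM   (1ULL << 0)
#define F_HW_CSUM   (1ULL << 1)
#define F_IPV6_CSUM (1ULL << 2)
#define F_RXCSUM    (1ULL << 3)

/*
 * Assumed model of netdev_intersect_features(): when only one side has
 * HW_CSUM, that side is treated as also covering IP_CSUM and IPV6_CSUM
 * before the two masks are ANDed together.
 */
static features_t intersect_features(features_t f1, features_t f2)
{
	if ((f1 ^ f2) & F_HW_CSUM) {
		if (f1 & F_HW_CSUM)
			f1 |= F_IP_CSUM | F_IPV6_CSUM;
		else
			f2 |= F_IP_CSUM | F_IPV6_CSUM;
	}
	return f1 & f2;
}

/* Checksum part of vlan_dev_fix_features() before the patch. */
static features_t fix_features_old(features_t features,
				   features_t real_vlan_features,
				   features_t real_features)
{
	features = intersect_features(features, real_vlan_features);
	features |= F_RXCSUM;
	return intersect_features(features, real_features);
}

/* Checksum part of vlan_dev_fix_features() after the patch. */
static features_t fix_features_new(features_t features,
				   features_t real_vlan_features,
				   features_t real_features)
{
	features = intersect_features(features,
				      real_vlan_features | F_HW_CSUM);
	features |= F_RXCSUM;
	/* A real device doing IP or IPv6 checksums counts as HW_CSUM here. */
	if (real_features & (F_IP_CSUM | F_IPV6_CSUM))
		real_features |= F_HW_CSUM;
	return intersect_features(features, real_features);
}
```

In this model, a vlan that asks for F_HW_CSUM on top of a device advertising
F_IP_CSUM|F_IPV6_CSUM ends up with the two protocol-specific bits under the
old logic and keeps F_HW_CSUM under the new one. A real device with no
checksum offload at all still yields no checksum bits on the vlan.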

^ permalink raw reply related

* Re: [PATCH v4 binutils] Add BPF support to binutils...
From: David Miller @ 2017-05-05 19:43 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev, xdp-newbies
In-Reply-To: <20170501.235158.89801140704756675.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Mon, 01 May 2017 23:51:58 -0400 (EDT)

> From: Alexei Starovoitov <ast@fb.com>
> Date: Mon, 1 May 2017 20:49:21 -0700
> 
>> (gdb) x/10i bpf_prog1
>>    0x0 <bpf_prog1>:	ldimm64	r0, 590618314553
>>    0x10 <bpf_prog1+16>:	stdw	[r1+-8], r10
>>    0x18 <bpf_prog1+24>:	lddw	r10, [r1+-8]
>>    0x20 <bpf_prog1+32>:	add	r0, -1879113726
>>    0x28 <bpf_prog1+40>:	lddw	r1, [r0+0]
>>    0x30 <bpf_prog1+48>:	exit
>>    0x38:	Cannot access memory at address 0x38
>> 
>> the last line also seems wrong. Off by 1 error?
> 
> Maybe, I'll look into it tomorrow.

This is not a BPF specific problem, GDB does this for any non-linked
object you try to inspect under it.  F.e. on a sparc object:

(gdb) x/10i 0
   0x0 <foo>:   retl
   0x4 <foo+4>: clr  %o0
   0x8: Cannot access memory at address 0x8
(gdb)

Same behavior.

^ permalink raw reply

* [PATCH net] tcp: make congestion control optionally skip slow start after idle
From: Wei Wang @ 2017-05-05 19:53 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: Yuchung Cheng, Neal Cardwell, Eric Dumazet, Wei Wang

From: Wei Wang <weiwan@google.com>

Congestion control modules that want full control over congestion
control behavior do not want the cwnd modifications controlled by
the sysctl_tcp_slow_start_after_idle code path.
So skip those code paths for CC modules that use the cong_control()
API.
As an example, those cwnd effects are not desired for the BBR congestion
control algorithm.

Fixes: c0402760f565 ("tcp: new CC hook to set sending rate with rate_sample in any CA state")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 include/net/tcp.h     | 4 +++-
 net/ipv4/tcp_output.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 270e5cc43c99..4e16486802fc 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1234,10 +1234,12 @@ void tcp_cwnd_restart(struct sock *sk, s32 delta);
 
 static inline void tcp_slow_start_after_idle_check(struct sock *sk)
 {
+	const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
 	struct tcp_sock *tp = tcp_sk(sk);
 	s32 delta;
 
-	if (!sysctl_tcp_slow_start_after_idle || tp->packets_out)
+	if (!sysctl_tcp_slow_start_after_idle || tp->packets_out ||
+	    ca_ops->cong_control)
 		return;
 	delta = tcp_time_stamp - tp->lsndtime;
 	if (delta > inet_csk(sk)->icsk_rto)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 60111a0fc201..4858e190f6ac 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1514,6 +1514,7 @@ static void tcp_cwnd_application_limited(struct sock *sk)
 
 static void tcp_cwnd_validate(struct sock *sk, bool is_cwnd_limited)
 {
+	const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
 	struct tcp_sock *tp = tcp_sk(sk);
 
 	/* Track the maximum number of outstanding packets in each
@@ -1536,7 +1537,8 @@ static void tcp_cwnd_validate(struct sock *sk, bool is_cwnd_limited)
 			tp->snd_cwnd_used = tp->packets_out;
 
 		if (sysctl_tcp_slow_start_after_idle &&
-		    (s32)(tcp_time_stamp - tp->snd_cwnd_stamp) >= inet_csk(sk)->icsk_rto)
+		    (s32)(tcp_time_stamp - tp->snd_cwnd_stamp) >= inet_csk(sk)->icsk_rto &&
+		    !ca_ops->cong_control)
 			tcp_cwnd_application_limited(sk);
 
 		/* The following conditions together indicate the starvation
-- 
2.13.0.rc1.294.g07d810a77f-goog
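
The early-return condition added to tcp_slow_start_after_idle_check() can be
modelled outside the kernel roughly as follows. This is a sketch, not kernel
code: struct cc_ops, idle_restart_wanted() and dummy_cong_control() are
hypothetical names, the timestamps are plain integers, and the function only
reports whether the idle restart path would be taken instead of calling
tcp_cwnd_restart().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct tcp_congestion_ops; only the hook of interest. */
struct cc_ops {
	void (*cong_control)(void); /* non-NULL: module manages cwnd itself */
};

/* Placeholder hook for a module that implements cong_control(). */
static void dummy_cong_control(void)
{
}

/*
 * Returns true when the modelled check would go on to restart cwnd
 * after an idle period, false when it would return early.
 */
static bool idle_restart_wanted(bool sysctl_enabled, uint32_t packets_out,
				const struct cc_ops *ops, uint32_t now,
				uint32_t lsndtime, uint32_t rto)
{
	int32_t delta;

	/* The patch adds the third test: skip for cong_control() users. */
	if (!sysctl_enabled || packets_out || ops->cong_control)
		return false;

	delta = (int32_t)(now - lsndtime);
	return delta > (int32_t)rto;
}
```

With a cc_ops whose cong_control is NULL, an idle time longer than the RTO
still triggers the restart; pointing cong_control at dummy_cong_control makes
the same call return false, which is the behaviour the patch wants for
modules such as BBR.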

^ permalink raw reply related
