Netdev List
* [PATCH v4 22/25] virtio_scsi: fix race on device removal
From: Michael S. Tsirkin @ 2014-10-13  7:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, kvm, linux-scsi, Christian Borntraeger, netdev,
	James E.J. Bottomley, virtualization, Paolo Bonzini, Amit Shah,
	v9fs-developer, David S. Miller
In-Reply-To: <1413114332-626-1-git-send-email-mst-v4@redhat.com>

We cancel event work on device removal, but an interrupt
could trigger immediately after this, and queue it
again.

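To fix, set a flag under the event_vq lock before cancelling the work,
and skip queueing new work once the flag is set.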

Loosely based on patch by Paolo Bonzini

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/scsi/virtio_scsi.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 501838d..327eba0 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -110,6 +110,9 @@ struct virtio_scsi {
 	/* CPU hotplug notifier */
 	struct notifier_block nb;
 
+	/* Protected by event_vq lock */
+	bool stop_events;
+
 	struct virtio_scsi_vq ctrl_vq;
 	struct virtio_scsi_vq event_vq;
 	struct virtio_scsi_vq req_vqs[];
@@ -303,6 +306,11 @@ static void virtscsi_cancel_event_work(struct virtio_scsi *vscsi)
 {
 	int i;
 
+	/* Stop scheduling work before calling cancel_work_sync.  */
+	spin_lock_irq(&vscsi->event_vq.vq_lock);
+	vscsi->stop_events = true;
+	spin_unlock_irq(&vscsi->event_vq.vq_lock);
+
 	for (i = 0; i < VIRTIO_SCSI_EVENT_LEN; i++)
 		cancel_work_sync(&vscsi->event_list[i].work);
 }
@@ -390,7 +398,8 @@ static void virtscsi_complete_event(struct virtio_scsi *vscsi, void *buf)
 {
 	struct virtio_scsi_event_node *event_node = buf;
 
-	queue_work(system_freezable_wq, &event_node->work);
+	if (!vscsi->stop_events)
+		queue_work(system_freezable_wq, &event_node->work);
 }
 
 static void virtscsi_event_done(struct virtqueue *vq)
-- 
MST

^ permalink raw reply related

* [PATCH v4 23/25] virtio_balloon: enable VQs early on restore
From: Michael S. Tsirkin @ 2014-10-13  7:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Rusty Russell, virtualization, linux-scsi, linux-s390,
	v9fs-developer, netdev, kvm, Amit Shah, Cornelia Huck,
	Christian Borntraeger, David S. Miller, Paolo Bonzini
In-Reply-To: <1413114332-626-1-git-send-email-mst-v4@redhat.com>

The virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after resume returns. virtio balloon
violated this rule by adding bufs, which causes the VQ to be used
directly within restore.

To fix, call virtio_device_ready before using VQ.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/virtio/virtio_balloon.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 25ebe8e..9629fad 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -538,6 +538,8 @@ static int virtballoon_restore(struct virtio_device *vdev)
 	if (ret)
 		return ret;
 
+	virtio_device_ready(vdev);
+
 	fill_balloon(vb, towards_target(vb));
 	update_balloon_size(vb);
 	return 0;
-- 
MST

^ permalink raw reply related

* [PATCH v4 24/25] virtio_scsi: drop scan callback
From: Michael S. Tsirkin @ 2014-10-13  7:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Rusty Russell, virtualization, linux-scsi, linux-s390,
	v9fs-developer, netdev, kvm, Amit Shah, Cornelia Huck,
	Christian Borntraeger, David S. Miller, Paolo Bonzini,
	James E.J. Bottomley
In-Reply-To: <1413114332-626-1-git-send-email-mst-v4@redhat.com>

Enable VQs early like we do for restore.
This makes it possible to drop the scan callback,
moving scanning into the probe function, and making
code simpler.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/scsi/virtio_scsi.c | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 327eba0..5f022ff 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -860,17 +860,6 @@ static void virtscsi_init_vq(struct virtio_scsi_vq *virtscsi_vq,
 	virtscsi_vq->vq = vq;
 }
 
-static void virtscsi_scan(struct virtio_device *vdev)
-{
-	struct Scsi_Host *shost = virtio_scsi_host(vdev);
-	struct virtio_scsi *vscsi = shost_priv(shost);
-
-	if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG))
-		virtscsi_kick_event_all(vscsi);
-
-	scsi_scan_host(shost);
-}
-
 static void virtscsi_remove_vqs(struct virtio_device *vdev)
 {
 	struct Scsi_Host *sh = virtio_scsi_host(vdev);
@@ -1007,10 +996,13 @@ static int virtscsi_probe(struct virtio_device *vdev)
 	err = scsi_add_host(shost, &vdev->dev);
 	if (err)
 		goto scsi_add_host_failed;
-	/*
-	 * scsi_scan_host() happens in virtscsi_scan() via virtio_driver->scan()
-	 * after VIRTIO_CONFIG_S_DRIVER_OK has been set..
-	 */
+
+	virtio_device_ready(vdev);
+
+	if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG))
+		virtscsi_kick_event_all(vscsi);
+
+	scsi_scan_host(shost);
 	return 0;
 
 scsi_add_host_failed:
@@ -1090,7 +1082,6 @@ static struct virtio_driver virtio_scsi_driver = {
 	.driver.owner = THIS_MODULE,
 	.id_table = id_table,
 	.probe = virtscsi_probe,
-	.scan = virtscsi_scan,
 #ifdef CONFIG_PM_SLEEP
 	.freeze = virtscsi_freeze,
 	.restore = virtscsi_restore,
-- 
MST



^ permalink raw reply related

* [PATCH v4 25/25] virtio-rng: refactor probe error handling
From: Michael S. Tsirkin @ 2014-10-13  7:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Rusty Russell, virtualization, linux-scsi, linux-s390,
	v9fs-developer, netdev, kvm, Amit Shah, Cornelia Huck,
	Christian Borntraeger, David S. Miller, Paolo Bonzini,
	Matt Mackall, Herbert Xu, Amos Kong, Sasha Levin
In-Reply-To: <1413114332-626-1-git-send-email-mst-v4@redhat.com>

Code like
	vi->vq = NULL;
	kfree(vi)
does not make sense.

Clean it up, use goto error labels for cleanup.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/char/hw_random/virtio-rng.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c
index 132c9cc..72295ea 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -109,8 +109,8 @@ static int probe_common(struct virtio_device *vdev)
 
 	vi->index = index = ida_simple_get(&rng_index_ida, 0, 0, GFP_KERNEL);
 	if (index < 0) {
-		kfree(vi);
-		return index;
+		err = index;
+		goto err_ida;
 	}
 	sprintf(vi->name, "virtio_rng.%d", index);
 	init_completion(&vi->have_data);
@@ -128,13 +128,16 @@ static int probe_common(struct virtio_device *vdev)
 	vi->vq = virtio_find_single_vq(vdev, random_recv_done, "input");
 	if (IS_ERR(vi->vq)) {
 		err = PTR_ERR(vi->vq);
-		vi->vq = NULL;
-		kfree(vi);
-		ida_simple_remove(&rng_index_ida, index);
-		return err;
+		goto err_find;
 	}
 
 	return 0;
+
+err_find:
+	ida_simple_remove(&rng_index_ida, index);
+err_ida:
+	kfree(vi);
+	return err;
 }
 
 static void remove_common(struct virtio_device *vdev)
-- 
MST

^ permalink raw reply related

* Re: [PATCH] net: can: esd_usb2: fix memory leak on disconnect
From: Matthias Fuchs @ 2014-10-13  8:05 UTC (permalink / raw)
  To: Alexey Khoroshilov, Wolfgang Grandegger, Marc Kleine-Budde
  Cc: linux-can@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, ldv-project@linuxtesting.org
In-Reply-To: <1412973067-29707-1-git-send-email-khoroshilov@ispras.ru>

Hi Alexey,

On 10/10/2014 10:31 PM, Alexey Khoroshilov wrote:
> It seems struct esd_usb2 dev is not deallocated on disconnect.
> 
> The patch adds the deallocation.
> 
> Found by Linux Driver Verification project (linuxtesting.org).
> 
> Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
> ---
>  drivers/net/can/usb/esd_usb2.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/can/usb/esd_usb2.c b/drivers/net/can/usb/esd_usb2.c
> index b7c9e8b11460..7a90075529c3 100644
> --- a/drivers/net/can/usb/esd_usb2.c
> +++ b/drivers/net/can/usb/esd_usb2.c
> @@ -1143,6 +1143,7 @@ static void esd_usb2_disconnect(struct usb_interface *intf)
>  			}
>  		}
>  		unlink_all_urbs(dev);
> +		kfree(dev);
>  	}
>  }
>  
> 
Thanks for pointing this out. Marc, can you please pick this up?

Matthias

Acked-by: Matthias Fuchs <matthias.fuchs@esd.eu>

^ permalink raw reply

* Re: ipv4: net namespace does not inherit network configurations
From: zhuyj @ 2014-10-13  8:20 UTC (permalink / raw)
  To: Cong Wang
  Cc: David S. Miller, Hong Zhiguo, LKML, netdev, Tao, Yue,
	Alexandre Dietsch, zhuyj
In-Reply-To: <CAHA+R7MC2gKeSWqR8pDDX26D-Th4BG1AveM+HseMMPjLBJuWDw@mail.gmail.com>

Hi Miller and Cong,

Can we merge this patch into mainline? The behavior of IPv4 and IPv6
is still inconsistent even in the latest Linux kernel (3.17-rc7):
a new net namespace is independent of init_net for IPv6 but not for IPv4.

Thanks a lot.
Zhu Yanjun

On 07/30/2014 01:48 AM, Cong Wang wrote:
> On Tue, Jul 29, 2014 at 2:29 AM, zhuyj <zyjzyj2000@gmail.com> wrote:
>> Hi,all
>>
>> I did a test on kernel3.16 rc6:
>>
>> root@qemu1:~# echo 1 > /proc/sys/net/ipv6/conf/all/forwarding
>> root@qemu1:~# echo 1 > /proc/sys/net/ipv4/conf/all/forwarding
>> root@qemu1:~# ip netns list
>> root@qemu1:~# ip netns add fib1
>> root@qemu1:~# ip netns exec fib1 bash
>> root@qemu1:~# cat /proc/sys/net/ipv6/conf/all/forwarding
>> 0
>> root@qemu1:~# cat /proc/sys/net/ipv4/conf/all/forwarding
>> 1
>>
>> The behavior of ipv4 and ipv6 is very inconsistent. I checked
>> the kernel source code. I found that from this patch
>> [ipv6: fix bad free of addrconf_init_net], the above difference
>> appeared.
>>
>> A net namespace is independent of the others; that is, there
>> is no relationship between net namespaces. So the behavior
>> of ipv4 is not correct.
>>
> Well, they are already independent, not shared, just that the initial
> value is duplicated from init_net for IPv4.
>
> This change might break existing applications which rely on this
> behavior, but given IPv6 change is almost the same, I think it's ok.
>
> BTW, you need to submit a patch as normal, instead of as an attachment.
>

^ permalink raw reply

* RE: [PATCH] flow-dissector: Fix alignment issue in __skb_flow_get_ports
From: David Laight @ 2014-10-13  8:32 UTC (permalink / raw)
  To: 'David Miller', alexander.h.duyck@redhat.com
  Cc: eric.dumazet@gmail.com, alexander.duyck@gmail.com,
	netdev@vger.kernel.org
In-Reply-To: <20141010.135851.1743803688676076555.davem@davemloft.net>

From: David Miller
> From: Alexander Duyck <alexander.h.duyck@redhat.com>
> Date: Fri, 10 Oct 2014 09:50:17 -0700
> 
> > If I just use get_unaligned that is pretty easy in terms of cleanup
> > for the ports and IPv4 addresses, the IPv6 will still be a significant
> > hurdle to overcome though.
> 
> Actually, it's not that simple.
> 
> When the compiler sees things like "th->doff" it will load the 32-bit
> word that 4-bit field contains and extract the value using shifts and
> masking.
> 
> So we might need to sprinkle a "attribute((packed))" here and there
> to make it work.

Marking a structure 'packed' forces the compiler to generate byte accesses.
It is enough to mark the 32bit members with __attribute__((aligned(2)))
(or use a typedef for u32 that includes that attribute).
Then the compiler will use 16bit accesses for that field.
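
A minimal userspace sketch of the typedef approach described above, assuming
GCC or Clang. The names u32_a2 and struct hdr_example are hypothetical and do
not come from the kernel. The static assertions only check the alignment the
compiler assumes for the type; they say nothing about which instructions it
emits on a given architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical typedef: a 32-bit integer that the compiler may only
 * assume to be 2-byte aligned (GCC/Clang attribute syntax). */
typedef uint32_t __attribute__((aligned(2))) u32_a2;

/* Hypothetical header layout with a 32-bit field at offset 2. */
struct hdr_example {
	uint16_t sport;
	u32_a2 seq;
};

static_assert(_Alignof(u32_a2) == 2, "typedef lowers alignment to 2");
static_assert(offsetof(struct hdr_example, seq) == 2, "no padding before seq");
static_assert(sizeof(struct hdr_example) == 6, "no tail padding");

/* Read the field through a pointer that is only 2-byte aligned. */
static uint32_t read_seq(const struct hdr_example *h)
{
	return h->seq;
}
```

Unlike 'packed', this leaves the other members with their natural alignment,
so only the marked 32-bit fields are affected.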

	David

^ permalink raw reply

* RE: [PATCH] net: wireless: brcm80211: brcmfmac: dhd_sdio.c: Cleaning up missing null-terminate in conjunction with strncpy
From: David Laight @ 2014-10-13  8:55 UTC (permalink / raw)
  To: 'Rickard Strandqvist', Brett Rudley, Arend van Spriel
  Cc: Hante Meuleman, John W. Linville, Pieter-Paul Giesberts,
	Daniel Kim, linux-wireless@vger.kernel.org,
	brcm80211-dev-list@broadcom.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <1413071551-15372-1-git-send-email-rickard_strandqvist@spectrumdigital.se>

From: Rickard Strandqvist
> Replace strncpy with strlcpy to avoid strings that lack a null terminator.
> Switching from strncpy to strlcpy also simplifies the code.

I think you should return an error if the strings get truncated.
Silent truncation is going to lead to issues at some point in the future
(in some places).
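
A minimal userspace sketch of what returning an error on truncation could look
like. It uses snprintf() rather than the kernel's strlcpy()/strlcat(), and
build_fw_path() is a hypothetical helper, not a function from the driver. The
choice of -ENAMETOOLONG is also an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<dir>/<file>" into dst.
 * Returns 0 on success, -ENAMETOOLONG if the result would not fit.
 */
static int build_fw_path(char *dst, size_t dst_len,
			 const char *dir, const char *file)
{
	size_t dir_len;
	const char *sep;
	int n;

	assert(dst && dir && file);

	/* Add a separator only when dir is non-empty and lacks a trailing '/'. */
	dir_len = strlen(dir);
	sep = (dir_len > 0 && dir[dir_len - 1] != '/') ? "/" : "";

	/* snprintf() returns the length the full string would have had. */
	n = snprintf(dst, dst_len, "%s%s%s", dir, sep, file);
	if (n < 0 || (size_t)n >= dst_len)
		return -ENAMETOOLONG;
	return 0;
}
```

Building the whole string in one call also means the result does not depend on
the previous contents of the destination buffer.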

> Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
> ---
>  drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c |   25 ++++++++++----------
>  1 file changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
> b/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
> index f55f625..d20d4e6 100644
> --- a/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
> +++ b/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
> @@ -670,7 +670,6 @@ static int brcmf_sdio_get_fwnames(struct brcmf_chip *ci,
>  				  struct brcmf_sdio_dev *sdiodev)
>  {
>  	int i;
> -	uint fw_len, nv_len;
>  	char end;
> 
>  	for (i = 0; i < ARRAY_SIZE(brcmf_fwname_data); i++) {
> @@ -684,25 +683,25 @@ static int brcmf_sdio_get_fwnames(struct brcmf_chip *ci,
>  		return -ENODEV;
>  	}
> 
> -	fw_len = sizeof(sdiodev->fw_name) - 1;
> -	nv_len = sizeof(sdiodev->nvram_name) - 1;
>  	/* check if firmware path is provided by module parameter */
>  	if (brcmf_firmware_path[0] != '\0') {
> -		strncpy(sdiodev->fw_name, brcmf_firmware_path, fw_len);
> -		strncpy(sdiodev->nvram_name, brcmf_firmware_path, nv_len);
> -		fw_len -= strlen(sdiodev->fw_name);
> -		nv_len -= strlen(sdiodev->nvram_name);
> +		strlcpy(sdiodev->fw_name, brcmf_firmware_path,
> +			sizeof(sdiodev->fw_name));
> +		strlcpy(sdiodev->nvram_name, brcmf_firmware_path,
> +			sizeof(sdiodev->nvram_name));
> 
>  		end = brcmf_firmware_path[strlen(brcmf_firmware_path) - 1];

If you are doing a strlen() here, you could use the length for the copy
and/or use it to avoid the strcat().

>  		if (end != '/') {
> -			strncat(sdiodev->fw_name, "/", fw_len);
> -			strncat(sdiodev->nvram_name, "/", nv_len);
> -			fw_len--;
> -			nv_len--;
> +			strlcat(sdiodev->fw_name, "/",
> +				sizeof(sdiodev->fw_name));
> +			strlcat(sdiodev->nvram_name, "/",
> +				sizeof(sdiodev->nvram_name));
>  		}
>  	}
> -	strncat(sdiodev->fw_name, brcmf_fwname_data[i].bin, fw_len);
> -	strncat(sdiodev->nvram_name, brcmf_fwname_data[i].nv, nv_len);
> +	strlcat(sdiodev->fw_name, brcmf_fwname_data[i].bin,
> +		sizeof(sdiodev->fw_name));
> +	strlcat(sdiodev->nvram_name, brcmf_fwname_data[i].nv,
> +		sizeof(sdiodev->nvram_name));

I assume something ensures that fw_name[0] == 0 here.

	David

^ permalink raw reply

* Re: vxlan gro problem ?
From: yinpeijun @ 2014-10-13  9:14 UTC (permalink / raw)
  To: Or Gerlitz, qinchuanyu
  Cc: netdev, linux-kernel, lichunhe, wangfakai, liuyongan
In-Reply-To: <543ADBA3.5030305@mellanox.com>

On 2014/10/13 3:50, Or Gerlitz wrote:
> On 10/8/2014 10:46 AM, yinpeijun wrote:
>> Hi all,
>>          Linux 3.14 was released recently, and I see that networking has added UDP GRO and VXLAN GRO functions. I am testing with Red Hat 7.0 (which also includes these functions).
>> I use the kernel vxlan module to create a vxlan device and attach it to an OVS bridge; the configuration is as follows:
>>         root@25:~$ ip link
>>          15: vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master ovs-system state UNKNOWN mode DEFAULT
>>              link/ether be:e1:ae:3d:8b:f2 brd ff:ff:ff:ff:ff:ff
>>          16: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc mq master ovs-system state UNKNOWN mode DEFAULT qlen 5000
>>                 root@25:~$ ovs-vsctl show
>>          aa1294f3-9952-4393-b2b5-54e9a6eb76ee
>>          Bridge ovs-vx
>>              Port ovs-vx
>>                  Interface ovs-vx
>>                      type: internal
>>              Port "vnet0"
>>                  Interface "vnet0"
>>              Port "vxlan0"
>>                  Interface "vxlan0"
>>          ovs_version: "2.0.2"
>>
>> vnet0 is a VM backend device, and the other end has the same configuration. I then use netperf to test throughput in the VM (netperf -H **** -t TCP_STREAM -l 10 -- -m 1460).
>> The result is 3-4 Gbit/sec, so the improvement is not obvious, and I am also confused that there are no aggregated packets (length > mtu) in the VM at the other end. So I want to know what
>> is wrong, or how to test the function?
>>
>
> As things are set in 3.14 and AFAIK also in RHEL 7.0, for GRO/VXLAN to come into play you need to run over a NIC which supports RX checksum offload too, is this the case?
>
> Also, the configuration you run with isn't the typical play of VXLAN with OVS... I didn't try it out and this week being out to LPC.
>
> Did you try the usual track of running an OVS VXLAN port, e.g. as explained in the Example section of [1]?
>
> Or.
>
> [1] http://community.mellanox.com/docs/DOC-1446
>
Thank you for your reply, Gerlitz.

My test environment uses a Mellanox ConnectX-3 Pro NIC, which as far as I know supports RX checksum offload. But I am not sure whether I need any special configuration,
or whether the NIC driver or firmware needs an update. I have also tested with the Red Hat 7.0 OVS VXLAN port, using a configuration similar to the one before, but there is no improvement either.

The NIC information:

04:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

root@localhost:~# ethtool -i eth4
driver: mlx4_en
version: 2.0(Dec 2011)
firmware-version:  2.31.5050
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

^ permalink raw reply

* Re: [PATCH] ipv6: notify userspace when we added or changed an ipv6 token
From: Daniel Borkmann @ 2014-10-13  9:46 UTC (permalink / raw)
  To: Lubomir Rintel
  Cc: netdev, linux-kernel, David S. Miller, Hannes Frederic Sowa
In-Reply-To: <1412950112-15593-1-git-send-email-lkundrak@v3.sk>

On 10/10/2014 04:08 PM, Lubomir Rintel wrote:
> NetworkManager might want to know that it changed when the router advertisement
> arrives.
>
> Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Cc: Daniel Borkmann <dborkman@redhat.com>
> ---
>   net/ipv6/addrconf.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 3e118df..3d11390 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -4528,6 +4528,7 @@ static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token)
>   	}
>
>   	write_unlock_bh(&idev->lock);
> +	netdev_state_change(dev);

I'm wondering why netdev_state_change()? You are probably
only after the netlink notification that is being invoked,
i.e. rtmsg_ifinfo(RTM_NEWLINK, ...), and don't strictly want
to call the device notifier chain.

Perhaps it might be better to define a new RTM_SETTOKEN, and
just call inet6_ifinfo_notify(RTM_SETTOKEN, idev) as this is
only idev specific anyway?

>   	addrconf_verify_rtnl();
>   	return 0;
>   }
>

^ permalink raw reply

* [patch net] ipv4: fix nexthop attlen check in fib_nh_match
From: Jiri Pirko @ 2014-10-13  9:54 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuznet, jmorris, yoshfuji, kaber

fib_nh_match does not match nexthops correctly. Example:

This command is not successful and route is removed. After this patch
is applied, the route is correctly matched and the result is:
RTNETLINK answers: No such process

Please consider this for stable trees as well.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/ipv4/fib_semantics.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 5b6efb3..f99f41b 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -537,7 +537,7 @@ int fib_nh_match(struct fib_config *cfg, struct fib_info *fi)
 			return 1;
 
 		attrlen = rtnh_attrlen(rtnh);
-		if (attrlen < 0) {
+		if (attrlen > 0) {
 			struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
 
 			nla = nla_find(attrs, attrlen, RTA_GATEWAY);
-- 
1.9.3

^ permalink raw reply related

* [PATCH] ipv4: dst_entry leak in ip_append_data()
From: Vasily Averin @ 2014-10-13 10:17 UTC (permalink / raw)
  To: netdev, David S. Miller
  Cc: Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, Eric Dumazet

Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")

If sk_write_queue is empty, ip_append_data() executes ip_setup_cork(),
which "steals" the dst entry from rt into cork. Later it calls
__ip_append_data(), which creates an skb and adds it to sk_write_queue.

If the skb was added successfully, the following ip_push_pending_frames() call
reassigns the dst entry from cork to the skb, and kfree_skb() frees the dst_entry.

However, nobody frees the stolen dst_entry if the skb was not added to
sk_write_queue.

Signed-off-by: Vasily Averin <vvs@parallels.com>
---
 net/ipv4/ip_output.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e35b712..cc7b579 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1120,6 +1120,15 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 	return 0;
 }
 
+static void ip_cork_release(struct inet_cork *cork)
+{
+	cork->flags &= ~IPCORK_OPT;
+	kfree(cork->opt);
+	cork->opt = NULL;
+	dst_release(cork->dst);
+	cork->dst = NULL;
+}
+
 /*
  *	ip_append_data() and ip_append_page() can make one large IP datagram
  *	from many pieces of data. Each pieces will be holded on the socket
@@ -1152,9 +1161,14 @@ int ip_append_data(struct sock *sk, struct flowi4 *fl4,
 		transhdrlen = 0;
 	}
 
-	return __ip_append_data(sk, fl4, &sk->sk_write_queue, &inet->cork.base,
+	err = __ip_append_data(sk, fl4, &sk->sk_write_queue, &inet->cork.base,
 				sk_page_frag(sk), getfrag,
 				from, length, transhdrlen, flags);
+
+	if (skb_queue_empty(&sk->sk_write_queue))
+		ip_cork_release(&inet->cork.base);
+
+	return err;
 }
 
 ssize_t	ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
@@ -1304,15 +1318,6 @@ error:
 	return err;
 }
 
-static void ip_cork_release(struct inet_cork *cork)
-{
-	cork->flags &= ~IPCORK_OPT;
-	kfree(cork->opt);
-	cork->opt = NULL;
-	dst_release(cork->dst);
-	cork->dst = NULL;
-}
-
 /*
  *	Combined all pending IP fragments on the socket as one IP datagram
  *	and push them out.
-- 
1.9.1

^ permalink raw reply related

* Regarding tx-nocache-copy in the Sheevaplug
From: Lluís Batlle i Rossell @ 2014-10-13 10:52 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: Carles Pagès, linux-arm-kernel

Hello,

on the 7th of January 2014 this patch was applied:
https://lkml.org/lkml/2014/1/7/307

[PATCH v2] net: Do not enable tx-nocache-copy by default
        
On the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this caused packets to be
sent corrupted. I think this machine has something special about the cache.

After re-enabling tx-nocache-copy (as it was before the patch), transfers
work fine again. I think that most people who encounter this problem
disable tx offload completely instead of re-enabling this setting.

Is this an ARM kernel problem regarding this platform?

Thank you,
Lluís

^ permalink raw reply

* Re: [PATCH linux v3 1/1] fs/proc: use a rb tree for the directory entries
From: Nicolas Dichtel @ 2014-10-13 11:14 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: davem, ebiederm, akpm, adobriyan, rui.xiang, viro, oleg, gorcunov,
	kirill.shutemov, grant.likely, tytso, Linus Torvalds
In-Reply-To: <1412672559-5256-2-git-send-email-nicolas.dichtel@6wind.com>

Le 07/10/2014 11:02, Nicolas Dichtel a écrit :
> The current implementation for the directories in /proc is using a single
> linked list. This is slow when handling directories with large numbers of
> entries (eg netdevice-related entries when lots of tunnels are opened).
>
> This patch replaces this linked list by a red-black tree.
>
> Here are some numbers:
>
> dummy30000.batch contains 30 000 times 'link add type dummy'.
>
> Before the patch:
> $ time ip -b dummy30000.batch
> real	2m31.950s
> user	0m0.440s
> sys	2m21.440s
> $ time rmmod dummy
> real	1m35.764s
> user	0m0.000s
> sys	1m24.088s
>
> After the patch:
> $ time ip -b dummy30000.batch
> real	2m0.874s
> user	0m0.448s
> sys	1m49.720s
> $ time rmmod dummy
> real	1m13.988s
> user	0m0.000s
> sys	1m1.008s
>
> The idea of improving this part was suggested by
> Thierry Herbelot <thierry.herbelot@6wind.com>.
>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Acked-by: David S. Miller <davem@davemloft.net>
> ---

I'm not sure who is in charge of taking this patch. Should I resend it to
someone else or is it already included in a tree?


Thank you,
Nicolas

^ permalink raw reply

* Fwd: micrel: ksz8051 badly detected as ksz8031
From: Angelo Dureghello @ 2014-10-13 11:23 UTC (permalink / raw)
  To: netdev@vger.kernel.org
In-Reply-To: <543A7A2A.5040700@gmail.com>



I confirm this seems to be an issue with recent versions of the kernel
driver "micrel.c".

I just compiled kernel 3.17.0 with micrel.c driver 3.5.1, and the link is
up and running.

If you need debug info, or to test a patch, let me know.

Regards
angelo

^ permalink raw reply

* Re: [PATCH] netfilter: release skbuf when nlmsg put fail
From: Florian Westphal @ 2014-10-13 11:42 UTC (permalink / raw)
  To: Houcheng Lin
  Cc: pablo, kaber, kadlec, davem, netfilter-devel, coreteam, netdev,
	Linux Kernel Mailing List
In-Reply-To: <CAL8JtxAqDhOXooLtOebSBHtKxwE=sLFqW8B-VgtCzsr-M4OD7g@mail.gmail.com>

Houcheng Lin <houcheng@gmail.com> wrote:
> When the system is under heavy load, __nfulnl_send() may fail
> to put the nlmsg into the skbuf of the nfulnl_instance. If the skbuf is not
> cleared on failure, __nfulnl_send() will still try to put the next nlmsg onto
> this half-full skbuf, so the user program can never receive packets.
> 
> This patch fixes the issue by releasing the skbuf immediately after the nlmsg
> put fails.

Did you observe such a problem, or is this based on code reading?
I ask because nflog should make sure we always have enough room left in
skb to append a done message, see nfulnl_log_packet():

if (inst->skb &&
    size > skb_tailroom(inst->skb) - sizeof(struct nfgenmsg)) {
	/* flush skb */

Your patch fixes such 'can never send' skb condition by leaking the
skb.  So at the very least you would need to call kfree_skb(), and
perhaps also add WARN_ON() so we catch this and can fix up the size
accounting?
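
A minimal userspace model of the size accounting described above, to make the
invariant explicit. struct batch_buf and both helpers are hypothetical
stand-ins, not nfnetlink_log code; 'reserve' plays the role of the space kept
for the trailing done message. The underflow guard is an addition in this
sketch and is not taken from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the batching skb: only tracks space. */
struct batch_buf {
	size_t capacity;
	size_t used;
};

/* Bytes still free at the end of the buffer. */
static size_t batch_tailroom(const struct batch_buf *b)
{
	assert(b->used <= b->capacity);
	return b->capacity - b->used;
}

/*
 * Returns true when the buffer must be flushed before appending 'size'
 * bytes, so that 'reserve' bytes always stay free for the done message.
 */
static bool batch_needs_flush(const struct batch_buf *b, size_t size,
			      size_t reserve)
{
	size_t room = batch_tailroom(b);

	/* Guard against unsigned underflow when less than the reserve is left. */
	if (room < reserve)
		return true;
	return size > room - reserve;
}
```

If this check holds before every append, a later put of the done message
cannot run out of room, which is why a failure there would point at broken
accounting rather than at load.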

^ permalink raw reply

* Re: [patch net] ipv4: fix nexthop attlen check in fib_nh_match
From: Eric Dumazet @ 2014-10-13 12:22 UTC (permalink / raw)
  To: Jiri Pirko, Thomas Graf; +Cc: netdev, davem, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <1413194063-10354-1-git-send-email-jiri@resnulli.us>

On Mon, 2014-10-13 at 11:54 +0200, Jiri Pirko wrote:
> fib_nh_match does not match nexthops correctly. Example:
> 
> This command is not successful and route is removed. After this patch
> is applied, the route is correctly matched and the result is:
> RTNETLINK answers: No such process
> 
> Please consider this for stable trees as well.
> 
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  net/ipv4/fib_semantics.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index 5b6efb3..f99f41b 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -537,7 +537,7 @@ int fib_nh_match(struct fib_config *cfg, struct fib_info *fi)
>  			return 1;
>  
>  		attrlen = rtnh_attrlen(rtnh);
> -		if (attrlen < 0) {
> +		if (attrlen > 0) {
>  			struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
>  
>  			nla = nla_find(attrs, attrlen, RTA_GATEWAY);

Fixes: 4e902c57417c4 ("[IPv4]: FIB configuration using struct fib_config")

Good catch, thanks !

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: Regarding tx-nocache-copy in the Sheevaplug
From: Eric Dumazet @ 2014-10-13 12:26 UTC (permalink / raw)
  To: Lluís Batlle i Rossell
  Cc: linux-kernel, netdev, Carles Pagès, linux-arm-kernel
In-Reply-To: <20141013105246.GD1972@vicerveza.homeunix.net>

On Mon, 2014-10-13 at 12:52 +0200, Lluís Batlle i Rossell wrote:
> Hello,
> 
> on the 7th of January 2014 this patch was applied:
> https://lkml.org/lkml/2014/1/7/307
> 
> [PATCH v2] net: Do not enable tx-nocache-copy by default
>         
> On the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this caused packets to be
> sent corrupted. I think this machine has something special about the cache.
> 
> After re-enabling tx-nocache-copy (as it was before the patch), transfers
> work fine again. I think that most people who encounter this problem
> disable tx offload completely instead of re-enabling this setting.
> 
> Is this an ARM kernel problem regarding this platform?

Which NIC and driver is this exactly ?

^ permalink raw reply

* Re: Regarding tx-nocache-copy in the Sheevaplug
From: Lluís Batlle i Rossell @ 2014-10-13 12:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, netdev, Carles Pagès, linux-arm-kernel
In-Reply-To: <1413203171.9362.81.camel@edumazet-glaptop2.roam.corp.google.com>

On Mon, Oct 13, 2014 at 05:26:11AM -0700, Eric Dumazet wrote:
> On Mon, 2014-10-13 at 12:52 +0200, Lluís Batlle i Rossell wrote:
> > Hello,
> > 
> > on the 7th of January 2014 this patch was applied:
> > https://lkml.org/lkml/2014/1/7/307
> > 
> > [PATCH v2] net: Do not enable tx-nocache-copy by default
> >         
> > On the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this caused packets to be
> > sent corrupted. I think this machine has something special about the cache.
> > 
> > After re-enabling tx-nocache-copy (as it was before the patch), transfers
> > work fine again. I think that most people who encounter this problem
> > disable tx offload completely instead of re-enabling this setting.
> > 
> > Is this an ARM kernel problem regarding this platform?
> 
> Which NIC and driver is this exactly ?

According to dmesg in 3.10.1:
[    7.858872] mv643xx_eth: MV-643xx 10/100/1000 ethernet driver version 1.4
[    7.866001] mv643xx_eth_port mv643xx_eth_port.0 eth0: port 0 with MAC address 00:50:43:01:d1:bb

Regards,
Lluís.

^ permalink raw reply

* [PATCH net] tcp: TCP Small Queues and strange attractors
From: Eric Dumazet @ 2014-10-13 13:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

TCP Small Queues tries to keep the number of packets in qdisc
as small as possible, and depends on a tasklet to feed following
packets at TX completion time.
The choice of a tasklet was driven by latency requirements.

Then, TCP stack tries to avoid reorders, by locking flows with
outstanding packets in qdisc in a given TX queue.

What can happen is that many flows get attracted by a low performing
TX queue, and cpu servicing TX completion has to feed packets for all of
them, making this cpu 100% busy in softirq mode.

This became particularly visible with the latest skb->xmit_more support.

The strategy adopted in this patch is to detect when tcp_wfree() is called
from ksoftirqd and let the outstanding queue for this flow drain
before feeding additional packets, so that skb->ooo_okay can be set
to allow select_queue() to select the optimal queue.

Incoming ACKS are normally handled by different cpus, so this patch
gives more chance for these cpus to take over the burden of feeding
qdisc with future packets.

Tested:

lpaa23:~# ./super_netperf 1400 --google-pacing-rate 3028000 -H lpaa24 -l 3600 &

lpaa23:~# sar -n DEV 1 10 | grep eth1
06:16:18 AM      eth1 595448.00 1190564.00  38381.09 1760253.12      0.00      0.00      1.00
06:16:19 AM      eth1 594858.00 1189686.00  38340.76 1758952.72      0.00      0.00      0.00
06:16:20 AM      eth1 597017.00 1194019.00  38480.79 1765370.29      0.00      0.00      1.00
06:16:21 AM      eth1 595450.00 1190936.00  38380.19 1760805.05      0.00      0.00      0.00
06:16:22 AM      eth1 596385.00 1193096.00  38442.56 1763976.29      0.00      0.00      1.00
06:16:23 AM      eth1 598155.00 1195978.00  38552.97 1768264.60      0.00      0.00      0.00
06:16:24 AM      eth1 594405.00 1188643.00  38312.57 1757414.89      0.00      0.00      1.00
06:16:25 AM      eth1 593366.00 1187154.00  38252.16 1755195.83      0.00      0.00      0.00
06:16:26 AM      eth1 593188.00 1186118.00  38232.88 1753682.57      0.00      0.00      1.00
06:16:27 AM      eth1 596301.00 1192241.00  38440.94 1762733.09      0.00      0.00      0.00
Average:         eth1 595457.30 1190843.50  38381.69 1760664.84      0.00      0.00      0.50
lpaa23:~# ./tc -s -d qd sh dev eth1 | grep backlog
 backlog 7606336b 2513p requeues 167982 
 backlog 224072b 74p requeues 566 
 backlog 581376b 192p requeues 5598 
 backlog 181680b 60p requeues 1070 
 backlog 5305056b 1753p requeues 110166    // Here, this TX queue is attracting flows
 backlog 157456b 52p requeues 1758 
 backlog 672216b 222p requeues 3025 
 backlog 60560b 20p requeues 24541 
 backlog 448144b 148p requeues 21258 

lpaa23:~# echo 1 >/proc/sys/net/ipv4/tcp_tsq_enable_tcp_wfree_ksoftirqd_detect

Immediate jump to full bandwidth, and traffic is properly
sharded across all TX queues.

lpaa23:~# sar -n DEV 1 10 | grep eth1
06:16:46 AM      eth1 1397632.00 2795397.00  90081.87 4133031.26      0.00      0.00      1.00
06:16:47 AM      eth1 1396874.00 2793614.00  90032.99 4130385.46      0.00      0.00      0.00
06:16:48 AM      eth1 1395842.00 2791600.00  89966.46 4127409.67      0.00      0.00      1.00
06:16:49 AM      eth1 1395528.00 2791017.00  89946.17 4126551.24      0.00      0.00      0.00
06:16:50 AM      eth1 1397891.00 2795716.00  90098.74 4133497.39      0.00      0.00      1.00
06:16:51 AM      eth1 1394951.00 2789984.00  89908.96 4125022.51      0.00      0.00      0.00
06:16:52 AM      eth1 1394608.00 2789190.00  89886.90 4123851.36      0.00      0.00      1.00
06:16:53 AM      eth1 1395314.00 2790653.00  89934.33 4125983.09      0.00      0.00      0.00
06:16:54 AM      eth1 1396115.00 2792276.00  89984.25 4128411.21      0.00      0.00      1.00
06:16:55 AM      eth1 1396829.00 2793523.00  90030.19 4130250.28      0.00      0.00      0.00
Average:         eth1 1396158.40 2792297.00  89987.09 4128439.35      0.00      0.00      0.50

lpaa23:~# tc -s -d qd sh dev eth1 | grep backlog
 backlog 7900052b 2609p requeues 173287 
 backlog 878120b 290p requeues 589 
 backlog 1068884b 354p requeues 5621 
 backlog 996212b 329p requeues 1088 
 backlog 984100b 325p requeues 115316 
 backlog 956848b 316p requeues 1781 
 backlog 1080996b 357p requeues 3047 
 backlog 975016b 322p requeues 24571 
 backlog 990156b 327p requeues 21274 

(All 8 TX queues get a fair share of the traffic)

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_output.c |   26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8d4eac793700..4a7e97811d71 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -839,26 +839,38 @@ void tcp_wfree(struct sk_buff *skb)
 {
 	struct sock *sk = skb->sk;
 	struct tcp_sock *tp = tcp_sk(sk);
+	int wmem;
+
+	/* Keep one reference on sk_wmem_alloc.
+	 * Will be released by sk_free() from here or tcp_tasklet_func()
+	 */
+	wmem = atomic_sub_return(skb->truesize - 1, &sk->sk_wmem_alloc);
+
+	/* If this softirq is serviced by ksoftirqd, we are likely under stress.
+	 * Wait until our queues (qdisc + devices) are drained.
+	 * This gives :
+	 * - less callbacks to tcp_write_xmit(), reducing stress (batches)
+	 * - chance for incoming ACK (processed by another cpu maybe)
+	 *   to migrate this flow (skb->ooo_okay will be eventually set)
+	 */
+	if (wmem >= SKB_TRUESIZE(1) && this_cpu_ksoftirqd() == current)
+		goto out;
 
 	if (test_and_clear_bit(TSQ_THROTTLED, &tp->tsq_flags) &&
 	    !test_and_set_bit(TSQ_QUEUED, &tp->tsq_flags)) {
 		unsigned long flags;
 		struct tsq_tasklet *tsq;
 
-		/* Keep a ref on socket.
-		 * This last ref will be released in tcp_tasklet_func()
-		 */
-		atomic_sub(skb->truesize - 1, &sk->sk_wmem_alloc);
-
 		/* queue this socket to tasklet queue */
 		local_irq_save(flags);
 		tsq = &__get_cpu_var(tsq_tasklet);
 		list_add(&tp->tsq_node, &tsq->head);
 		tasklet_schedule(&tsq->tasklet);
 		local_irq_restore(flags);
-	} else {
-		sock_wfree(skb);
+		return;
 	}
+out:
+	sk_free(sk);
 }
 
 /* This routine actually transmits TCP packets queued in by

^ permalink raw reply related

* Re: Regarding tx-nocache-copy in the Sheevaplug
From: Andrew Lunn @ 2014-10-13 14:21 UTC (permalink / raw)
  To: Lluís Batlle i Rossell
  Cc: linux-kernel, netdev, Carles Pagès, linux-arm-kernel
In-Reply-To: <20141013105246.GD1972@vicerveza.homeunix.net>

On Mon, Oct 13, 2014 at 12:52:46PM +0200, Lluís Batlle i Rossell wrote:
> Hello,
> 
> on the 7th of January 2014 this patch was applied:
> https://lkml.org/lkml/2014/1/7/307
> 
> [PATCH v2] net: Do not enable tx-nocache-copy by default
>         
> In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this caused packets to be
> sent corrupted. I think this machine has something special about the cache.

Hi Lluís

Please could you describe your test setup. I would like to try to
reproduce the problem. I have a machine based on kirkwood 6282 and the
same ethernet.

Thanks
	Andrew

^ permalink raw reply

* Network optimality (was Re: [PATCH net-next] qdisc: validate skb without holding lock_
From: Dave Taht @ 2014-10-13 14:22 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David Miller, Eric Dumazet, netdev@vger.kernel.org, Tom Herbert,
	Hannes Frederic Sowa, Florian Westphal, Daniel Borkmann,
	Jamal Hadi Salim, Alexander Duyck, John Fastabend,
	Toke Høiland-Jørgensen

When I first got cc'd on these threads, and saw netperf-wrapper being
used on it, I thought: "Oh god, I've created a monster." My intent in
helping create such a measurement tool was not to routinely drive a
network to saturation, but to be able to measure the impact on latency
of doing so.

I was trying to get reasonable behavior when a "router" went into overload.

Servers, on the other hand, have more options to avoid overload than
routers do. There's been a great deal of really nice work on that
front. I love all that.

And I like BQL because it provides enough backpressure to be able to
do smarter things about scheduling packets higher in the stack. (life
pre-BQL cost some hair)

But Tom once told me "BQL's objective is to keep the hardware busy".
It uses an MIAD controller instead of a more sane AIMD one; in particular,
I'd much rather it ramped down to smaller values after
absorbing a burst.
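
For readers unfamiliar with the two controller families, here is a generic
sketch of the difference. It is not BQL's actual code; the function names,
parameters and step sizes are made up for illustration. AIMD grows a limit
additively and cuts it multiplicatively, MIAD does the opposite.

```c
/* Hypothetical helpers for illustration; this is not BQL's actual code. */

/* AIMD: grow the limit by a fixed step, cut it by a factor when backing off. */
unsigned int aimd_update(unsigned int limit, int back_off,
			 unsigned int step, unsigned int divisor)
{
	if (back_off)
		return limit / divisor;
	return limit + step;
}

/* MIAD: grow the limit by a factor, shrink it by a fixed step. */
unsigned int miad_update(unsigned int limit, int back_off,
			 unsigned int factor, unsigned int step)
{
	if (back_off)
		return limit > step ? limit - step : 0;
	return limit * factor;
}
```

Starting from the same limit, one back-off halves it under AIMD (with a
divisor of 2) but only trims one step under MIAD, which is why an MIAD
limit stays large for a long time after absorbing a burst.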

My objective is always to keep the *network's behavior optimal*,
minimizing bursts, and subsequent tail loss on the other side,
and responding quickly to loss, and doing that by
preserving to the highest extent possible the ack clocking that a
fluid model has. I LOVE BQL for providing more backpressure than has
ever existed before, and I know it's incredibly difficult to have fluid models
in a conventional cpu architecture that has to do other stuff.

But in order to get the best results for network behavior I'm willing to
sacrifice a great deal of cpu, interrupts, whatever it takes! to get the
most packets to all the destinations specified, whatever the workload,
with the *minimum amount of latency between ack and reply* possible.

What I'd hoped for in the new bulking and rcu stuff was to be able to
see a net reduction in TSO/GSO Size, and/or BQL's size, and I also did
keep hoping for some profiles on sch_fq, and for more complex
benchmarking of dozens or hundreds of realistically sized TCP flows
(in both directions) to exercise it all.

Some of the data presented showed that a single BQL'd queue was >400K,
and with hardware multi-queue, 128K, when TSO and GSO were used, but
with hardware multi-queue and no TSO/GSO, BQL was closer to 30K.

This said to me that the maximum "right" size for a TSO/GSO "packet" was
closer to 12k in this environment, and the right size for BQL, 30k,
before it started exerting backpressure to the qdisc.

This would reduce the potential inter-flow network latency by a factor
of 10 on the single hw queue scenario, and 4 in the multi queue one.
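
The factors above follow from the queue sizes, since at a fixed line rate
the time to drain a queue scales linearly with its size. A quick sketch of
the arithmetic, assuming the sizes are in bytes and K means 1000; the
function names are made up, and the link speed is a parameter because it
is not stated in this thread:

```c
/* Hypothetical helpers; the link speed is an assumption supplied by the caller. */

/* Time in microseconds to drain queued_bytes at link_bits_per_sec. */
double drain_time_usec(double queued_bytes, double link_bits_per_sec)
{
	return queued_bytes * 8.0 / link_bits_per_sec * 1e6;
}

/* Ratio of worst-case queueing delay before and after capping the queue. */
double latency_ratio(double before_bytes, double after_bytes)
{
	return before_bytes / after_bytes;
}
```

With these assumptions, latency_ratio(400000, 30000) is about 13 and
latency_ratio(128000, 30000) is about 4.3, in line with the rough factors
of 10 and 4 quoted above. The ratios do not depend on the link speed.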

It would probably cost some interrupts, and in scenarios lacking
packet loss, throughput, but in other scenarios with lots of flows
each flow will ramp up in speed, faster, as you reduce the RTTs.
Paying attention to this will also push profiling activities into
areas of the stack that might be profitable.

I would very much like to have profiles of what happens now both here and
elsewhere in the stack with this new code, with TSO/GSO sizes capped
as above, BQL capped to 30k, and a smarter qdisc like fq used.

2) Most of the time, a server is not driving the wire to saturation. If
   it is, you are doing something wrong. The BQL queues are empty, or
   nearly so, so the instant someone creates a qdisc queue, it
   drains.

   But: if there are two or more flows under contention, creating a qdisc
   queue to better multiplex the results is highly desirable, and the stack
   should be smart enough to make that overload only last briefly.

   This is part of why I'm unfond of the deep and persistent BQL queues
   we get today.

3) Pure ack-only workloads are rare. It is a useful test case, but...

4) I thought the ring-cleanup optimization was rather interesting and
   could be made more dynamic.

5) I remain amazed at the vast improvements in throughput, reductions in
interrupts, lockless operation and the RCU stuff that have come out of
this so far, but had to make these points in the hope that the big picture
is retained.

It does no good to blast packets through the network unless there is a
high probability that they will actually be received on the other side.

thanks for listening.

^ permalink raw reply

* Re: Regarding tx-nocache-copy in the Sheevaplug
From: Lluís Batlle i Rossell @ 2014-10-13 14:31 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: linux-kernel, netdev, Carles Pagès, linux-arm-kernel
In-Reply-To: <20141013142156.GE26864@lunn.ch>

With tx offload enabled and tx-nocache-copy disabled, making the machine *send*
a lot of ssh traffic (sftp, for example) makes ssh fail its HMAC check. It's
quite easy to reproduce here.

As for the hardware, it's an old sheevaplug board.

On Mon, Oct 13, 2014 at 04:21:56PM +0200, Andrew Lunn wrote:
> On Mon, Oct 13, 2014 at 12:52:46PM +0200, Lluís Batlle i Rossell wrote:
> > Hello,
> > 
> > on the 7th of January 2014 this patch was applied:
> > https://lkml.org/lkml/2014/1/7/307
> > 
> > [PATCH v2] net: Do not enable tx-nocache-copy by default
> >         
> > In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this caused packets to be
> > sent corrupted. I think this machine has something special about the cache.
> 
> Hi Lluís
> 
> Please could you describe your test setup. I would like to try to
> reproduce the problem. I have a machine based on kirkwood 6282 and the
> same ethernet.
> 
> Thanks
> 	Andrew

^ permalink raw reply

* [patch net repost] ipv4: fix nexthop attlen check in fib_nh_match
From: Jiri Pirko @ 2014-10-13 14:34 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuznet, jmorris, yoshfuji, kaber, edumazet, tgraf

fib_nh_match does not match nexthops correctly. Example:

ip route add 172.16.10/24 nexthop via 192.168.122.12 dev eth0 \
                          nexthop via 192.168.122.13 dev eth0
ip route del 172.16.10/24 nexthop via 192.168.122.14 dev eth0 \
                          nexthop via 192.168.122.15 dev eth0

The del command succeeds and the route is removed, even though the
nexthops given do not match. With this patch applied, the nexthops are
compared correctly and the result is:
RTNETLINK answers: No such process
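
Why the old check let the delete through can be sketched with a simplified
userspace model. The struct and function below are hypothetical and are not
the kernel's data structures; the model only assumes that the attribute
length of a well-formed nexthop is never negative, so a "< 0" test never
looks at the gateway.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one requested nexthop. */
struct nh_request {
	int attrlen;		/* bytes of attributes after the nexthop header */
	uint32_t gateway;	/* requested gateway, valid when attrlen > 0 */
};

/* Returns true when the request conflicts with the installed gateway. */
bool nh_mismatch(const struct nh_request *req, uint32_t installed_gw,
		 bool buggy_check)
{
	/* The old code tested "attrlen < 0", the fixed code "attrlen > 0". */
	bool has_attrs = buggy_check ? (req->attrlen < 0) : (req->attrlen > 0);

	if (has_attrs && req->gateway != installed_gw)
		return true;
	return false;
}
```

With the old test, a request for gateway 192.168.122.14 against an
installed 192.168.122.12 reports no mismatch, so the route is deleted;
with the fixed test the mismatch is detected.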

Please consider this for stable trees as well.

Fixes: 4e902c57417c4 ("[IPv4]: FIB configuration using struct fib_config")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
---
reposted with example (it was missing for some reason in the original post)

 net/ipv4/fib_semantics.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 5b6efb3..f99f41b 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -537,7 +537,7 @@ int fib_nh_match(struct fib_config *cfg, struct fib_info *fi)
 			return 1;
 
 		attrlen = rtnh_attrlen(rtnh);
-		if (attrlen < 0) {
+		if (attrlen > 0) {
 			struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
 
 			nla = nla_find(attrs, attrlen, RTA_GATEWAY);
-- 
1.9.3

^ permalink raw reply related

