* Please reply
From: Jose Calvache @ 2014-10-13 11:50 UTC (permalink / raw)
Dear Sir/Madam, Here is a pdf attachment of my proposal to you. Please
read and reply I would be grateful. Jose Calvache
^ permalink raw reply
* Re: [PATCH] netfilter: release skbuf when nlmsg put fail
From: Florian Westphal @ 2014-10-13 11:42 UTC (permalink / raw)
To: Houcheng Lin
Cc: pablo, kaber, kadlec, davem, netfilter-devel, coreteam, netdev,
Linux Kernel Mailing List
In-Reply-To: <CAL8JtxAqDhOXooLtOebSBHtKxwE=sLFqW8B-VgtCzsr-M4OD7g@mail.gmail.com>
Houcheng Lin <houcheng@gmail.com> wrote:
> When system is under heavy loading, the __nfulnl_send() may may failed
> to put nlmsg into skbuf of nfulnl_instance. If not clear the skbuff on failed,
> the __nfulnl_send() will still try to put next nlmsg onto this half-full skbuf
> and cause the user program can never receive packet.
>
> This patch fix this issue by releasing skbuf immediately after nlmst put
> failed.
Did you observe such problem or is this based on code reading?
I ask because nflog should make sure we always have enough room left in
skb to append a done message, see nfulnl_log_packet():
if (inst->skb &&
size > skb_tailroom(inst->skb) - sizeof(struct nfgenmsg)) {
/* flush skb */
Your patch fixes such 'can never send' skb condition by leaking the
skb. So at the very least you would need to call kfree_skb(), and
perhaps also add WARN_ON() so we catch this and can fix up the size
accounting?
^ permalink raw reply
* Fwd: micrel: ksz8051 badly detected as ksz8031
From: Angelo Dureghello @ 2014-10-13 11:23 UTC (permalink / raw)
To: netdev@vger.kernel.org
In-Reply-To: <543A7A2A.5040700@gmail.com>
I confirm this seems to be an issue of recent versions of kernel driver
"micrel.c".
I just compiled kernel 3.17.0 with micrel.c driver 3.5.1, link is up and
running.
If you need debug info, or to test a patch, let me know.
Regards
angelo
^ permalink raw reply
* Re: [PATCH linux v3 1/1] fs/proc: use a rb tree for the directory entries
From: Nicolas Dichtel @ 2014-10-13 11:14 UTC (permalink / raw)
To: netdev, linux-kernel
Cc: davem, ebiederm, akpm, adobriyan, rui.xiang, viro, oleg, gorcunov,
kirill.shutemov, grant.likely, tytso, Linus Torvalds
In-Reply-To: <1412672559-5256-2-git-send-email-nicolas.dichtel@6wind.com>
Le 07/10/2014 11:02, Nicolas Dichtel a écrit :
> The current implementation for the directories in /proc is using a single
> linked list. This is slow when handling directories with large numbers of
> entries (eg netdevice-related entries when lots of tunnels are opened).
>
> This patch replaces this linked list by a red-black tree.
>
> Here are some numbers:
>
> dummy30000.batch contains 30 000 times 'link add type dummy'.
>
> Before the patch:
> $ time ip -b dummy30000.batch
> real 2m31.950s
> user 0m0.440s
> sys 2m21.440s
> $ time rmmod dummy
> real 1m35.764s
> user 0m0.000s
> sys 1m24.088s
>
> After the patch:
> $ time ip -b dummy30000.batch
> real 2m0.874s
> user 0m0.448s
> sys 1m49.720s
> $ time rmmod dummy
> real 1m13.988s
> user 0m0.000s
> sys 1m1.008s
>
> The idea of improving this part was suggested by
> Thierry Herbelot <thierry.herbelot@6wind.com>.
>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Acked-by: David S. Miller <davem@davemloft.net>
> ---
I'm not sure who is in charge of taking this patch. Should I resend it to
someone else or is it already included in a tree?
Thank you,
Nicolas
^ permalink raw reply
* Regarding tx-nocache-copy in the Sheevaplug
From: Lluís Batlle i Rossell @ 2014-10-13 10:52 UTC (permalink / raw)
To: linux-kernel, netdev; +Cc: Carles Pagès, linux-arm-kernel
Hello,
on the 7th of January 2014 ths patch was applied:
https://lkml.org/lkml/2014/1/7/307
[PATCH v2] net: Do not enable tx-nocache-copy by default
In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
sent corrupted. I think this machine has something special about the cache.
Enabling back this tx-nocache-copy (as it used to be before the patch) the
transfers work fine again. I think that most people, encountering this problem,
completely disable the tx offload instead of enabling back this setting.
Is this an ARM kernel problem regarding this platform?
Thank you,
Lluís
^ permalink raw reply
* [PATCH] ipv4: dst_entry leak in ip_append_data()
From: Vasily Averin @ 2014-10-13 10:17 UTC (permalink / raw)
To: netdev, David S. Miller
Cc: Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
Patrick McHardy, Eric Dumazet
Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
that creates skb and adds it to sk_write_queue.
If skb was added successfully following ip_push_pending_frames() call
reassign dst entry from cork to skb, and kfree_skb frees dst_entry.
However nobody frees stolen dst_entry if skb was not added into sk_write_queue.
Signed-off-by: Vasily Averin <vvs@parallels.com>
---
net/ipv4/ip_output.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e35b712..cc7b579 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1120,6 +1120,15 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
return 0;
}
+static void ip_cork_release(struct inet_cork *cork)
+{
+ cork->flags &= ~IPCORK_OPT;
+ kfree(cork->opt);
+ cork->opt = NULL;
+ dst_release(cork->dst);
+ cork->dst = NULL;
+}
+
/*
* ip_append_data() and ip_append_page() can make one large IP datagram
* from many pieces of data. Each pieces will be holded on the socket
@@ -1152,9 +1161,14 @@ int ip_append_data(struct sock *sk, struct flowi4 *fl4,
transhdrlen = 0;
}
- return __ip_append_data(sk, fl4, &sk->sk_write_queue, &inet->cork.base,
+ err = __ip_append_data(sk, fl4, &sk->sk_write_queue, &inet->cork.base,
sk_page_frag(sk), getfrag,
from, length, transhdrlen, flags);
+
+ if (skb_queue_empty(&sk->sk_write_queue))
+ ip_cork_release(&inet->cork.base);
+
+ return err;
}
ssize_t ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
@@ -1304,15 +1318,6 @@ error:
return err;
}
-static void ip_cork_release(struct inet_cork *cork)
-{
- cork->flags &= ~IPCORK_OPT;
- kfree(cork->opt);
- cork->opt = NULL;
- dst_release(cork->dst);
- cork->dst = NULL;
-}
-
/*
* Combined all pending IP fragments on the socket as one IP datagram
* and push them out.
--
1.9.1
^ permalink raw reply related
* [patch net] ipv4: fix nexthop attlen check in fib_nh_match
From: Jiri Pirko @ 2014-10-13 9:54 UTC (permalink / raw)
To: netdev; +Cc: davem, kuznet, jmorris, yoshfuji, kaber
fib_nh_match does not match nexthops correctly. Example:
This command is not successful and route is removed. After this patch
applied, the route is correctly matched and result is:
RTNETLINK answers: No such process
Please consider this for stable trees as well.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
net/ipv4/fib_semantics.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 5b6efb3..f99f41b 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -537,7 +537,7 @@ int fib_nh_match(struct fib_config *cfg, struct fib_info *fi)
return 1;
attrlen = rtnh_attrlen(rtnh);
- if (attrlen < 0) {
+ if (attrlen > 0) {
struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
nla = nla_find(attrs, attrlen, RTA_GATEWAY);
--
1.9.3
^ permalink raw reply related
* Re: [PATCH] ipv6: notify userspace when we added or changed an ipv6 token
From: Daniel Borkmann @ 2014-10-13 9:46 UTC (permalink / raw)
To: Lubomir Rintel
Cc: netdev, linux-kernel, David S. Miller, Hannes Frederic Sowa
In-Reply-To: <1412950112-15593-1-git-send-email-lkundrak@v3.sk>
On 10/10/2014 04:08 PM, Lubomir Rintel wrote:
> NetworkManager might want to know that it changed when the router advertisement
> arrives.
>
> Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Cc: Daniel Borkmann <dborkman@redhat.com>
> ---
> net/ipv6/addrconf.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 3e118df..3d11390 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -4528,6 +4528,7 @@ static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token)
> }
>
> write_unlock_bh(&idev->lock);
> + netdev_state_change(dev);
I'm wondering why netdev_state_change()? You are probably
only after the netlink notification that is being invoked,
i.e. rtmsg_ifinfo(RTM_NEWLINK, ...), and don't strictly want
to call the device notifier chain.
Perhaps it might be better to define a new RTM_SETTOKEN, and
just call inet6_ifinfo_notify(RTM_SETTOKEN, idev) as this is
only idev specific anyway?
> addrconf_verify_rtnl();
> return 0;
> }
>
^ permalink raw reply
* Re: vxlan gro problem ?
From: yinpeijun @ 2014-10-13 9:14 UTC (permalink / raw)
To: Or Gerlitz, qinchuanyu
Cc: netdev, linux-kernel, lichunhe, wangfakai, liuyongan
In-Reply-To: <543ADBA3.5030305@mellanox.com>
On 2014/10/13 3:50, Or Gerlitz wrote:
> On 10/8/2014 10:46 AM, yinpeijun wrote:
>> Hi all,
>> recently Linux 3.14 has been released and I find the networking has added udp gro and vxlan gro funtion, then I use the redhat 7.0(there is also add this funtion)
>> to test, I use kernel vxlan module and create a vxlan device then attach the device to ovs bridge , the configure as follow:
>> root@25:~$ ip link
>> 15: vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master ovs-system state UNKNOWN mode DEFAULT
>> link/ether be:e1:ae:3d:8b:f2 brd ff:ff:ff:ff:ff:ff
>> 16: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc mq master ovs-system state UNKNOWN mode DEFAULT qlen 5000
>> root@25:~$ ovs-vsctl show
>> aa1294f3-9952-4393-b2b5-54e9a6eb76ee
>> Bridge ovs-vx
>> Port ovs-vx
>> Interface ovs-vx
>> type: internal
>> Port "vnet0"
>> Interface "vnet0"
>> Port "vxlan0"
>> Interface "vxlan0"
>> ovs_version: "2.0.2"
>>
>> vnet0 is a vm backend device, and the end is the same configuration. then I use netperf to test throughput in vm (netperf -H **** -t TCP_STREAM -l 10 -- -m 1460),
>> the result is 3-4 Gbit/sec, the improvement is not obvious, and I also confused there is no aggregation packets (length > mtu) in the end vm. so I want to know what
>> wrong ? or how to test the function ?
>>
>
> As things are set in 3.14 and AFAIK also in RHEL 7.0, for GRO/VXLAN to come into play you need to run over a NIC which supports RX checksum offload too, is this the case?
>
> Also, the configuration you run with isn't the typical play of VXLAN with OVS... I didn't try it out and this week being out to LPC.
>
> Did you try the usual track of running OVS VXLAN port?e.g as explained in the Example section of [1]
>
> Or.
>
> [1] http://community.mellanox.com/docs/DOC-1446
>
> Or.
>
>
>
> .
>
thank you for your reply, Gerlitz .
my test environment use mellanox ConnectX-3 Pro nic , as I know the nic support Rx checksum offload. but I am not confirm if should I do some special configure?
or the nic driver or firmware need update ? also , I have used redhat7.0 ovs vxlan to test with the similar configure as before, but there is also no improvement .
the nic infomation:
04:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
root@localhost:~# ethtool -i eth4
driver: mlx4_en
version: 2.0(Dec 2011)
firmware-version: 2.31.5050
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
^ permalink raw reply
* RE: [PATCH] net: wireless: brcm80211: brcmfmac: dhd_sdio.c: Cleaning up missing null-terminate in conjunction with strncpy
From: David Laight @ 2014-10-13 8:55 UTC (permalink / raw)
To: 'Rickard Strandqvist', Brett Rudley, Arend van Spriel
Cc: Hante Meuleman, John W. Linville, Pieter-Paul Giesberts,
Daniel Kim, linux-wireless@vger.kernel.org,
brcm80211-dev-list@broadcom.com, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <1413071551-15372-1-git-send-email-rickard_strandqvist@spectrumdigital.se>
From: Rickard Strandqvist
> Replacing strncpy with strlcpy to avoid strings that lacks null terminate.
> And changed from using strncpy to strlcpy to simplify code.
I think you should return an error if the strings get truncated.
Silent truncation is going to lead to issues at some point in the future
(in some places).
> Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
> ---
> drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c | 25 ++++++++++----------
> 1 file changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
> b/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
> index f55f625..d20d4e6 100644
> --- a/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
> +++ b/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
> @@ -670,7 +670,6 @@ static int brcmf_sdio_get_fwnames(struct brcmf_chip *ci,
> struct brcmf_sdio_dev *sdiodev)
> {
> int i;
> - uint fw_len, nv_len;
> char end;
>
> for (i = 0; i < ARRAY_SIZE(brcmf_fwname_data); i++) {
> @@ -684,25 +683,25 @@ static int brcmf_sdio_get_fwnames(struct brcmf_chip *ci,
> return -ENODEV;
> }
>
> - fw_len = sizeof(sdiodev->fw_name) - 1;
> - nv_len = sizeof(sdiodev->nvram_name) - 1;
> /* check if firmware path is provided by module parameter */
> if (brcmf_firmware_path[0] != '\0') {
> - strncpy(sdiodev->fw_name, brcmf_firmware_path, fw_len);
> - strncpy(sdiodev->nvram_name, brcmf_firmware_path, nv_len);
> - fw_len -= strlen(sdiodev->fw_name);
> - nv_len -= strlen(sdiodev->nvram_name);
> + strlcpy(sdiodev->fw_name, brcmf_firmware_path,
> + sizeof(sdiodev->fw_name));
> + strlcpy(sdiodev->nvram_name, brcmf_firmware_path,
> + sizeof(sdiodev->nvram_name));
>
> end = brcmf_firmware_path[strlen(brcmf_firmware_path) - 1];
If you are doing a strlen() here, you could use the length for the copy
and/or use it to avoid the strcat().
> if (end != '/') {
> - strncat(sdiodev->fw_name, "/", fw_len);
> - strncat(sdiodev->nvram_name, "/", nv_len);
> - fw_len--;
> - nv_len--;
> + strlcat(sdiodev->fw_name, "/",
> + sizeof(sdiodev->fw_name));
> + strlcat(sdiodev->nvram_name, "/",
> + sizeof(sdiodev->nvram_name));
> }
> }
> - strncat(sdiodev->fw_name, brcmf_fwname_data[i].bin, fw_len);
> - strncat(sdiodev->nvram_name, brcmf_fwname_data[i].nv, nv_len);
> + strlcat(sdiodev->fw_name, brcmf_fwname_data[i].bin,
> + sizeof(sdiodev->fw_name));
> + strlcat(sdiodev->nvram_name, brcmf_fwname_data[i].nv,
> + sizeof(sdiodev->nvram_name));
I assume something ensures that fw_name[0] == 0 here.
David
^ permalink raw reply
* RE: [PATCH] flow-dissector: Fix alignment issue in __skb_flow_get_ports
From: David Laight @ 2014-10-13 8:32 UTC (permalink / raw)
To: 'David Miller', alexander.h.duyck@redhat.com
Cc: eric.dumazet@gmail.com, alexander.duyck@gmail.com,
netdev@vger.kernel.org
In-Reply-To: <20141010.135851.1743803688676076555.davem@davemloft.net>
From: David Miller
> From: Alexander Duyck <alexander.h.duyck@redhat.com>
> Date: Fri, 10 Oct 2014 09:50:17 -0700
>
> > If I just use get_unaligned that is pretty easy in terms of cleanup
> > for the ports and IPv4 addresses, the IPv6 will still be a significant
> > hurdle to overcome though.
>
> Actually, it's not that simple.
>
> When the compiler sees things like "th->doff" it will load the 32-bit
> word that 4-bit field contains and extract the value using shifts and
> masking.
>
> So we might need to sprinkle a "attribute((packed))" here and there
> to make it work.
Marking a structure 'packed' forces the compiler to generate byte accesses.
It is enough to mark the 32bit members with __attribute__((aligned(2)))
(or use a typedef for u32 that includes that attribute).
Then the compiler will use 16bit accesses for that field.
David
^ permalink raw reply
* Re: ipv4: net namespace does not inherit network configurations
From: zhuyj @ 2014-10-13 8:20 UTC (permalink / raw)
To: Cong Wang
Cc: David S. Miller, Hong Zhiguo, LKML, netdev, Tao, Yue,
Alexandre Dietsch, zhuyj
In-Reply-To: <CAHA+R7MC2gKeSWqR8pDDX26D-Th4BG1AveM+HseMMPjLBJuWDw@mail.gmail.com>
Hi, Miller && Cong
Can we merge this patch into kernel mainline? since the independence
between ipv4 and ipv6 is inconsistent even in the latest linux
kernel(3.17-rc7),
that is, the net namespace is independent in ipv6 while it is not in ipv4.
Thanks a lot.
Zhu Yanjun
On 07/30/2014 01:48 AM, Cong Wang wrote:
> On Tue, Jul 29, 2014 at 2:29 AM, zhuyj <zyjzyj2000@gmail.com> wrote:
>> Hi,all
>>
>> I did a test on kernel3.16 rc6:
>>
>> root@qemu1:~# echo 1 > /proc/sys/net/ipv6/conf/all/forwarding
>> root@qemu1:~# echo 1 > /proc/sys/net/ipv4/conf/all/forwarding
>> root@qemu1:~# ip netns list
>> root@qemu1:~# ip netns add fib1
>> root@qemu1:~# ip netns exec fib1 bash
>> root@qemu1:~# cat /proc/sys/net/ipv6/conf/all/forwarding
>> 0
>> root@qemu1:~# cat /proc/sys/net/ipv4/conf/all/forwarding
>> 1
>>
>> The behavior of ipv4 and ipv6 is very inconsistent. I checked
>> the kernel source code. I found that from this patch
>> [ipv6: fix bad free of addrconf_init_net], the above difference
>> appeared.
>>
>> Since a net namespace is independent to another. That is, there
>> is no any relationship between the net namespaces. So the behavior
>> of ipv4 is not correct.
>>
> Well, they are already independent, not shared, just that the initial
> value is duplicated from init_net for IPv4.
>
> This change might break existing applications which rely on this
> behavior, but given IPv6 change is almost the same, I think it's ok.
>
> BTW, you need to submit a patch as normal, instead of as an attachment.
>
^ permalink raw reply
* Re: [PATCH] net: can: esd_usb2: fix memory leak on disconnect
From: Matthias Fuchs @ 2014-10-13 8:05 UTC (permalink / raw)
To: Alexey Khoroshilov, Wolfgang Grandegger, Marc Kleine-Budde
Cc: linux-can@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, ldv-project@linuxtesting.org
In-Reply-To: <1412973067-29707-1-git-send-email-khoroshilov@ispras.ru>
Hi Alexey,
On 10/10/2014 10:31 PM, Alexey Khoroshilov wrote:
> It seems struct esd_usb2 dev is not deallocated on disconnect.
>
> The patch adds the deallocation.
>
> Found by Linux Driver Verification project (linuxtesting.org).
>
> Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
> ---
> drivers/net/can/usb/esd_usb2.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/can/usb/esd_usb2.c b/drivers/net/can/usb/esd_usb2.c
> index b7c9e8b11460..7a90075529c3 100644
> --- a/drivers/net/can/usb/esd_usb2.c
> +++ b/drivers/net/can/usb/esd_usb2.c
> @@ -1143,6 +1143,7 @@ static void esd_usb2_disconnect(struct usb_interface *intf)
> }
> }
> unlink_all_urbs(dev);
> + kfree(dev);
> }
> }
>
>
thanks for pointing this out. Marc, can you please catch this up.
Matthias
Acked-by: Matthias Fuchs <matthias.fuchs@esd.eu>
^ permalink raw reply
* [PATCH v4 25/25] virtio-rng: refactor probe error handling
From: Michael S. Tsirkin @ 2014-10-13 7:51 UTC (permalink / raw)
To: linux-kernel
Cc: Rusty Russell, virtualization, linux-scsi, linux-s390,
v9fs-developer, netdev, kvm, Amit Shah, Cornelia Huck,
Christian Borntraeger, David S. Miller, Paolo Bonzini,
Matt Mackall, Herbert Xu, Amos Kong, Sasha Levin
In-Reply-To: <1413114332-626-1-git-send-email-mst-v4@redhat.com>
Code like
vi->vq = NULL;
kfree(vi)
does not make sense.
Clean it up, use goto error labels for cleanup.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/char/hw_random/virtio-rng.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c
index 132c9cc..72295ea 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -109,8 +109,8 @@ static int probe_common(struct virtio_device *vdev)
vi->index = index = ida_simple_get(&rng_index_ida, 0, 0, GFP_KERNEL);
if (index < 0) {
- kfree(vi);
- return index;
+ err = index;
+ goto err_ida;
}
sprintf(vi->name, "virtio_rng.%d", index);
init_completion(&vi->have_data);
@@ -128,13 +128,16 @@ static int probe_common(struct virtio_device *vdev)
vi->vq = virtio_find_single_vq(vdev, random_recv_done, "input");
if (IS_ERR(vi->vq)) {
err = PTR_ERR(vi->vq);
- vi->vq = NULL;
- kfree(vi);
- ida_simple_remove(&rng_index_ida, index);
- return err;
+ goto err_find;
}
return 0;
+
+err_find:
+ ida_simple_remove(&rng_index_ida, index);
+err_ida:
+ kfree(vi);
+ return err;
}
static void remove_common(struct virtio_device *vdev)
--
MST
^ permalink raw reply related
* [PATCH v4 24/25] virtio_scsi: drop scan callback
From: Michael S. Tsirkin @ 2014-10-13 7:51 UTC (permalink / raw)
To: linux-kernel
Cc: Rusty Russell, virtualization, linux-scsi, linux-s390,
v9fs-developer, netdev, kvm, Amit Shah, Cornelia Huck,
Christian Borntraeger, David S. Miller, Paolo Bonzini,
James E.J. Bottomley
In-Reply-To: <1413114332-626-1-git-send-email-mst-v4@redhat.com>
Enable VQs early like we do for restore.
This makes it possible to drop the scan callback,
moving scanning into the probe function, and making
code simpler.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/scsi/virtio_scsi.c | 23 +++++++----------------
1 file changed, 7 insertions(+), 16 deletions(-)
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 327eba0..5f022ff 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -860,17 +860,6 @@ static void virtscsi_init_vq(struct virtio_scsi_vq *virtscsi_vq,
virtscsi_vq->vq = vq;
}
-static void virtscsi_scan(struct virtio_device *vdev)
-{
- struct Scsi_Host *shost = virtio_scsi_host(vdev);
- struct virtio_scsi *vscsi = shost_priv(shost);
-
- if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG))
- virtscsi_kick_event_all(vscsi);
-
- scsi_scan_host(shost);
-}
-
static void virtscsi_remove_vqs(struct virtio_device *vdev)
{
struct Scsi_Host *sh = virtio_scsi_host(vdev);
@@ -1007,10 +996,13 @@ static int virtscsi_probe(struct virtio_device *vdev)
err = scsi_add_host(shost, &vdev->dev);
if (err)
goto scsi_add_host_failed;
- /*
- * scsi_scan_host() happens in virtscsi_scan() via virtio_driver->scan()
- * after VIRTIO_CONFIG_S_DRIVER_OK has been set..
- */
+
+ virtio_device_ready(vdev);
+
+ if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG))
+ virtscsi_kick_event_all(vscsi);
+
+ scsi_scan_host(shost);
return 0;
scsi_add_host_failed:
@@ -1090,7 +1082,6 @@ static struct virtio_driver virtio_scsi_driver = {
.driver.owner = THIS_MODULE,
.id_table = id_table,
.probe = virtscsi_probe,
- .scan = virtscsi_scan,
#ifdef CONFIG_PM_SLEEP
.freeze = virtscsi_freeze,
.restore = virtscsi_restore,
--
MST
^ permalink raw reply related
* [PATCH v4 23/25] virtio_balloon: enable VQs early on restore
From: Michael S. Tsirkin @ 2014-10-13 7:51 UTC (permalink / raw)
To: linux-kernel
Cc: Rusty Russell, virtualization, linux-scsi, linux-s390,
v9fs-developer, netdev, kvm, Amit Shah, Cornelia Huck,
Christian Borntraeger, David S. Miller, Paolo Bonzini
In-Reply-To: <1413114332-626-1-git-send-email-mst-v4@redhat.com>
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after resume returns, virtio balloon
violated this rule by adding bufs, which causes the VQ to be used
directly within restore.
To fix, call virtio_device_ready before using VQ.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/virtio/virtio_balloon.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 25ebe8e..9629fad 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -538,6 +538,8 @@ static int virtballoon_restore(struct virtio_device *vdev)
if (ret)
return ret;
+ virtio_device_ready(vdev);
+
fill_balloon(vb, towards_target(vb));
update_balloon_size(vb);
return 0;
--
MST
^ permalink raw reply related
* [PATCH v4 22/25] virtio_scsi: fix race on device removal
From: Michael S. Tsirkin @ 2014-10-13 7:51 UTC (permalink / raw)
To: linux-kernel
Cc: linux-s390, kvm, linux-scsi, Christian Borntraeger, netdev,
James E.J. Bottomley, virtualization, Paolo Bonzini, Amit Shah,
v9fs-developer, David S. Miller
In-Reply-To: <1413114332-626-1-git-send-email-mst-v4@redhat.com>
We cancel event work on device removal, but an interrupt
could trigger immediately after this, and queue it
again.
To fix, set a flag.
Loosely based on patch by Paolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/scsi/virtio_scsi.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 501838d..327eba0 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -110,6 +110,9 @@ struct virtio_scsi {
/* CPU hotplug notifier */
struct notifier_block nb;
+ /* Protected by event_vq lock */
+ bool stop_events;
+
struct virtio_scsi_vq ctrl_vq;
struct virtio_scsi_vq event_vq;
struct virtio_scsi_vq req_vqs[];
@@ -303,6 +306,11 @@ static void virtscsi_cancel_event_work(struct virtio_scsi *vscsi)
{
int i;
+ /* Stop scheduling work before calling cancel_work_sync. */
+ spin_lock_irq(&vscsi->event_vq.vq_lock);
+ vscsi->stop_events = true;
+ spin_unlock_irq(&vscsi->event_vq.vq_lock);
+
for (i = 0; i < VIRTIO_SCSI_EVENT_LEN; i++)
cancel_work_sync(&vscsi->event_list[i].work);
}
@@ -390,7 +398,8 @@ static void virtscsi_complete_event(struct virtio_scsi *vscsi, void *buf)
{
struct virtio_scsi_event_node *event_node = buf;
- queue_work(system_freezable_wq, &event_node->work);
+ if (!vscsi->stop_events)
+ queue_work(system_freezable_wq, &event_node->work);
}
static void virtscsi_event_done(struct virtqueue *vq)
--
MST
^ permalink raw reply related
* [PATCH v4 21/25] virito_scsi: use freezable WQ for events
From: Michael S. Tsirkin @ 2014-10-13 7:51 UTC (permalink / raw)
To: linux-kernel
Cc: Rusty Russell, virtualization, linux-scsi, linux-s390,
v9fs-developer, netdev, kvm, Amit Shah, Cornelia Huck,
Christian Borntraeger, David S. Miller, Paolo Bonzini,
James E.J. Bottomley
In-Reply-To: <1413114332-626-1-git-send-email-mst-v4@redhat.com>
From: Paolo Bonzini <pbonzini@redhat.com>
Michael S. Tsirkin noticed a race condition:
we reset device on freeze, but system WQ is still
running so it might try adding bufs to a VQ meanwhile.
To fix, switch to handling events from the freezable WQ.
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/scsi/virtio_scsi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 2b565b3..501838d 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -390,7 +390,7 @@ static void virtscsi_complete_event(struct virtio_scsi *vscsi, void *buf)
{
struct virtio_scsi_event_node *event_node = buf;
- schedule_work(&event_node->work);
+ queue_work(system_freezable_wq, &event_node->work);
}
static void virtscsi_event_done(struct virtqueue *vq)
--
MST
^ permalink raw reply related
* [PATCH v4 20/25] virtio_net: enable VQs early on restore
From: Michael S. Tsirkin @ 2014-10-13 7:51 UTC (permalink / raw)
To: linux-kernel
Cc: Rusty Russell, virtualization, linux-scsi, linux-s390,
v9fs-developer, netdev, kvm, Amit Shah, Cornelia Huck,
Christian Borntraeger, David S. Miller, Paolo Bonzini
In-Reply-To: <1413114332-626-1-git-send-email-mst-v4@redhat.com>
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after restore returns, virtio net violated this
rule by using receive VQs within restore.
To fix, call virtio_device_ready before using VQs.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
drivers/net/virtio_net.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3551417..6b6e136 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1912,6 +1912,8 @@ static int virtnet_restore(struct virtio_device *vdev)
if (err)
return err;
+ virtio_device_ready(vdev);
+
if (netif_running(vi->dev)) {
for (i = 0; i < vi->curr_queue_pairs; i++)
if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
--
MST
^ permalink raw reply related
* [PATCH v4 19/25] virtio_console: enable VQs early on restore
From: Michael S. Tsirkin @ 2014-10-13 7:51 UTC (permalink / raw)
To: linux-kernel
Cc: linux-s390, kvm, linux-scsi, Christian Borntraeger, netdev,
Greg Kroah-Hartman, virtualization, Arnd Bergmann, Paolo Bonzini,
Amit Shah, v9fs-developer, David S. Miller
In-Reply-To: <1413114332-626-1-git-send-email-mst-v4@redhat.com>
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after resume returns, virtio console violated this
rule by adding inbufs, which causes the VQ to be used directly within
restore.
To fix, call virtio_device_ready before using VQs.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/char/virtio_console.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 6ebe8f6..2ae843f 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -2184,6 +2184,8 @@ static int virtcons_restore(struct virtio_device *vdev)
if (ret)
return ret;
+ virtio_device_ready(portdev->vdev);
+
if (use_multiport(portdev))
fill_queue(portdev->c_ivq, &portdev->c_ivq_lock);
--
MST
^ permalink raw reply related
* [PATCH v4 18/25] virtio_scsi: enable VQs early on restore
From: Michael S. Tsirkin @ 2014-10-13 7:51 UTC (permalink / raw)
To: linux-kernel
Cc: Rusty Russell, virtualization, linux-scsi, linux-s390,
v9fs-developer, netdev, kvm, Amit Shah, Cornelia Huck,
Christian Borntraeger, David S. Miller, Paolo Bonzini,
James E.J. Bottomley
In-Reply-To: <1413114332-626-1-git-send-email-mst-v4@redhat.com>
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after restore returns, virtio scsi violated
this rule on restore by kicking event vq within restore.
To fix, call virtio_device_ready before using event queue.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/scsi/virtio_scsi.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 0642ce3..2b565b3 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -1054,6 +1054,8 @@ static int virtscsi_restore(struct virtio_device *vdev)
return err;
}
+ virtio_device_ready(vdev);
+
if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG))
virtscsi_kick_event_all(vscsi);
--
MST
^ permalink raw reply related
* [PATCH v4 17/25] virtio_blk: enable VQs early on restore
From: Michael S. Tsirkin @ 2014-10-13 7:51 UTC (permalink / raw)
To: linux-kernel
Cc: Rusty Russell, virtualization, linux-scsi, linux-s390,
v9fs-developer, netdev, kvm, Amit Shah, Cornelia Huck,
Christian Borntraeger, David S. Miller, Paolo Bonzini
In-Reply-To: <1413114332-626-1-git-send-email-mst-v4@redhat.com>
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after restore returns, virtio block violated
this rule on restore by restarting queues, which might in theory
cause the VQ to be used directly within restore.
To fix, call virtio_device_ready before using starting queues.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/block/virtio_blk.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 46b04bf..1c95af5 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -804,10 +804,13 @@ static int virtblk_restore(struct virtio_device *vdev)
int ret;
ret = init_vq(vdev->priv);
- if (!ret)
- blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
+ if (ret)
+ return ret;
+
+ virtio_device_ready(vdev);
- return ret;
+ blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
+ return 0;
}
#endif
--
MST
^ permalink raw reply related
* [PATCH v4 16/25] virtio_scsi: move kick event out from virtscsi_init
From: Michael S. Tsirkin @ 2014-10-13 7:51 UTC (permalink / raw)
To: linux-kernel
Cc: Rusty Russell, virtualization, linux-scsi, linux-s390,
v9fs-developer, netdev, kvm, Amit Shah, Cornelia Huck,
Christian Borntraeger, David S. Miller, Paolo Bonzini,
James E.J. Bottomley
In-Reply-To: <1413114332-626-1-git-send-email-mst-v4@redhat.com>
We currently kick event within virtscsi_init,
before host is fully initialized.
This can in theory confuse guest if device
consumes the buffers immediately.
To fix, move virtscsi_kick_event_all out to scan/restore.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/scsi/virtio_scsi.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index eee1bc0..0642ce3 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -853,7 +853,11 @@ static void virtscsi_init_vq(struct virtio_scsi_vq *virtscsi_vq,
static void virtscsi_scan(struct virtio_device *vdev)
{
- struct Scsi_Host *shost = (struct Scsi_Host *)vdev->priv;
+ struct Scsi_Host *shost = virtio_scsi_host(vdev);
+ struct virtio_scsi *vscsi = shost_priv(shost);
+
+ if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG))
+ virtscsi_kick_event_all(vscsi);
scsi_scan_host(shost);
}
@@ -916,9 +920,6 @@ static int virtscsi_init(struct virtio_device *vdev,
virtscsi_config_set(vdev, cdb_size, VIRTIO_SCSI_CDB_SIZE);
virtscsi_config_set(vdev, sense_size, VIRTIO_SCSI_SENSE_SIZE);
- if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG))
- virtscsi_kick_event_all(vscsi);
-
err = 0;
out:
@@ -1048,8 +1049,13 @@ static int virtscsi_restore(struct virtio_device *vdev)
return err;
err = register_hotcpu_notifier(&vscsi->nb);
- if (err)
+ if (err) {
vdev->config->del_vqs(vdev);
+ return err;
+ }
+
+ if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG))
+ virtscsi_kick_event_all(vscsi);
return err;
}
--
MST
^ permalink raw reply related
* [PATCH v4 15/25] virtio_net: fix use after free on allocation failure
From: Michael S. Tsirkin @ 2014-10-13 7:50 UTC (permalink / raw)
To: linux-kernel
Cc: linux-s390, kvm, linux-scsi, Christian Borntraeger, netdev,
virtualization, Paolo Bonzini, Amit Shah, v9fs-developer,
David S. Miller
In-Reply-To: <1413114332-626-1-git-send-email-mst-v4@redhat.com>
In the extremely unlikely event that driver initialization fails after
RX buffers are added, virtio net frees RX buffers while VQs are
still active, potentially causing device to use a freed buffer.
To fix, reset device first - same as we do on device removal.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
drivers/net/virtio_net.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 430f3ae..3551417 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1830,6 +1830,8 @@ static int virtnet_probe(struct virtio_device *vdev)
return 0;
free_recv_bufs:
+ vi->vdev->config->reset(vdev);
+
free_receive_bufs(vi);
unregister_netdev(dev);
free_vqs:
--
MST
^ permalink raw reply related
* [PATCH v4 14/25] 9p/trans_virtio: enable VQs early
From: Michael S. Tsirkin @ 2014-10-13 7:50 UTC (permalink / raw)
To: linux-kernel
Cc: linux-s390, Ron Minnich, kvm, linux-scsi, Christian Borntraeger,
Eric Van Hensbergen, netdev, virtualization, Paolo Bonzini,
Amit Shah, v9fs-developer, David S. Miller
In-Reply-To: <1413114332-626-1-git-send-email-mst-v4@redhat.com>
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, but virtio 9p device
adds self to channel list within probe, at which point VQ can be
used in violation of the spec.
To fix, call virtio_device_ready before using VQs.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
net/9p/trans_virtio.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 6940d8f..766ba48 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -575,6 +575,8 @@ static int p9_virtio_probe(struct virtio_device *vdev)
/* Ceiling limit to avoid denial of service attacks */
chan->p9_max_pages = nr_free_buffer_pages()/4;
+ virtio_device_ready(vdev);
+
mutex_lock(&virtio_9p_lock);
list_add_tail(&chan->chan_list, &virtio_chan_list);
mutex_unlock(&virtio_9p_lock);
--
MST
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox