* [PATCH v3] net: tipc: Replace GFP_ATOMIC with GFP_KERNEL in tipc_mon_create
From: Jia-Ju Bai @ 2018-04-11 10:24 UTC (permalink / raw)
To: jon.maloy, ying.xue, davem
Cc: netdev, tipc-discussion, linux-kernel, Jia-Ju Bai
tipc_mon_create() is never called in atomic context.
The call chain ending up at tipc_mon_create() is:
[1] tipc_mon_create() <- tipc_enable_bearer() <- tipc_nl_bearer_enable()
tipc_nl_bearer_enable() calls rtnl_lock(), which indicates this function
is not called in atomic context.
Despite never getting called from atomic context,
tipc_mon_create() calls kzalloc() with GFP_ATOMIC,
which does not sleep for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
which can sleep and improve the possibility of successful allocation.
This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
v2:
* Modify the description of GFP_ATOMIC in v1.
Thank Eric for good advice.
v3:
* Modify wrong text in description in v2.
Thank Ying for good advice.
---
net/tipc/monitor.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/tipc/monitor.c b/net/tipc/monitor.c
index 9e109bb..9714d80 100644
--- a/net/tipc/monitor.c
+++ b/net/tipc/monitor.c
@@ -604,9 +604,9 @@ int tipc_mon_create(struct net *net, int bearer_id)
if (tn->monitors[bearer_id])
return 0;
- mon = kzalloc(sizeof(*mon), GFP_ATOMIC);
- self = kzalloc(sizeof(*self), GFP_ATOMIC);
- dom = kzalloc(sizeof(*dom), GFP_ATOMIC);
+ mon = kzalloc(sizeof(*mon), GFP_KERNEL);
+ self = kzalloc(sizeof(*self), GFP_KERNEL);
+ dom = kzalloc(sizeof(*dom), GFP_KERNEL);
if (!mon || !self || !dom) {
kfree(mon);
kfree(self);
--
1.9.1
^ permalink raw reply related
* Re: [PATCH v3] net: tipc: Replace GFP_ATOMIC with GFP_KERNEL in tipc_mon_create
From: Ying Xue @ 2018-04-11 10:26 UTC (permalink / raw)
To: Jia-Ju Bai, jon.maloy, davem; +Cc: netdev, tipc-discussion, linux-kernel
In-Reply-To: <1523442262-4823-1-git-send-email-baijiaju1990@gmail.com>
On 04/11/2018 06:24 PM, Jia-Ju Bai wrote:
> tipc_mon_create() is never called in atomic context.
>
> The call chain ending up at tipc_mon_create() is:
> [1] tipc_mon_create() <- tipc_enable_bearer() <- tipc_nl_bearer_enable()
> tipc_nl_bearer_enable() calls rtnl_lock(), which indicates this function
> is not called in atomic context.
>
> Despite never getting called from atomic context,
> tipc_mon_create() calls kzalloc() with GFP_ATOMIC,
> which does not sleep for allocation.
> GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
> which can sleep and improve the possibility of successful allocation.
>
> This is found by a static analysis tool named DCNS written by myself.
> And I also manually check it.
>
> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
> ---
> v2:
> * Modify the description of GFP_ATOMIC in v1.
> Thank Eric for good advice.
> v3:
> * Modify wrong text in description in v2.
> Thank Ying for good advice.
> ---
> net/tipc/monitor.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/net/tipc/monitor.c b/net/tipc/monitor.c
> index 9e109bb..9714d80 100644
> --- a/net/tipc/monitor.c
> +++ b/net/tipc/monitor.c
> @@ -604,9 +604,9 @@ int tipc_mon_create(struct net *net, int bearer_id)
> if (tn->monitors[bearer_id])
> return 0;
>
> - mon = kzalloc(sizeof(*mon), GFP_ATOMIC);
> - self = kzalloc(sizeof(*self), GFP_ATOMIC);
> - dom = kzalloc(sizeof(*dom), GFP_ATOMIC);
> + mon = kzalloc(sizeof(*mon), GFP_KERNEL);
> + self = kzalloc(sizeof(*self), GFP_KERNEL);
> + dom = kzalloc(sizeof(*dom), GFP_KERNEL);
> if (!mon || !self || !dom) {
> kfree(mon);
> kfree(self);
>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
^ permalink raw reply
* Re: TCP one-by-one acking - RFC interpretation question
From: Michal Kubecek @ 2018-04-11 10:58 UTC (permalink / raw)
To: netdev; +Cc: Eric Dumazet, Yuchung Cheng, Neal Cardwell,
Kenneth Klette Jonassen
In-Reply-To: <f98ec5a8-391d-5b42-2690-0f58e8122f5a@gmail.com>
On Fri, Apr 06, 2018 at 09:49:58AM -0700, Eric Dumazet wrote:
> Cc Neal and Yuchung if they missed this thread.
>
> On 04/06/2018 08:03 AM, Michal Kubecek wrote:
> > On Fri, Apr 06, 2018 at 05:01:29AM -0700, Eric Dumazet wrote:
> >>
> >>
> >> On 04/06/2018 03:05 AM, Michal Kubecek wrote:
> >>> Hello
> >>>
> >>> I encountered a strange behaviour of some (non-linux) TCP stack which
> >>> I believe is incorrect but support engineers from the company producing
> >>> it claim is OK.
> >>>
> >>> Assume a client (sender, Linux 4.4 kernel) sends a stream of MSS sized
> >>> segments but segments 2, 4 and 6 do not reach the server (receiver):
> >>>
> >>> ACK SAK SAK SAK
> >>> +-------+-------+-------+-------+-------+-------+-------+
> >>> | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
> >>> +-------+-------+-------+-------+-------+-------+-------+
> >>> 34273 35701 37129 38557 39985 41413 42841 44269
> >>>
> >>> When segment 2 is retransmitted after RTO timeout, normal response would
> >>> be ACK-ing segment 3 (38557) with SACK for 5 and 7 (39985-41413 and
> >>> 42841-44269).
> >>>
> >>> However, this server stack responds with two separate ACKs:
> >>>
> >>> - ACK 37129, SACK 37129-38557 39985-41413 42841-44269
> >>> - ACK 38557, SACK 39985-41413 42841-44269
> >>
> >> Hmmm... Yes this seems very very wrong and lazy.
> >>
> >> Have you verified behavior of more recent linux kernel to such threats ?
> >
> > No, unfortunately the problem was only encountered by our customer in
> > production environment (they tried to reproduce in a test lab but no
> > luck). They are running backups to NFS server and it happens from time
> > to time (in the order of hours, IIUC). So it would be probably hard to
> > let them try with more recent kernel.
> >
> > On the other hand, they reported that SLE11 clients (kernel 3.0) do not
> > run into this kind of problem. It was originally reported as a
> > a regression on migration from SLE11-SP4 (3.0 kernel) to SLE12-SP2 (4.4
> > kernel) and the problem was reported as "SLE12-SP2 is ignoring dupacks"
> > (which seems to be mostly caused by the switch to RACK).
> >
> > It also seems that part of the problem is specific packet loss pattern
> > where at some point, many packets are lost in "every second" pattern.
> > The customer finally started to investigate this problem and it seems it
> > has something to do with their bonding setup (they provided no details,
> > my guess is packets are divided over two paths and one of them fails).
> >
> >> packetdrill test would be relatively easy to write.
> >
> > I'll try but I have very little experience with writing packetdrill
> > scripts so it will probably take some time.
> >
> >> Regardless of this broken alien stack, we might be able to work around
> >> this faster than the vendor is able to fix and deploy a new stack.
> >>
> >> ( https://en.wikipedia.org/wiki/Robustness_principle )
> >> Be conservative in what you do, be liberal in what you accept from
> >> others...
> >
> > I was thinking about this a bit. "Fixing" the acknowledgment number
> > could do the trick but it doesn't feel correct. We might use the fact
> > that TSecr of both ACKs above matches TSval of the retransmission which
> > triggered them so that RTT calculated from timestamp would be the right
> > one. So perhaps something like "prefer timestamp RTT if measured RTT
> > seems way too off". But I'm not sure if it couldn't break other use
> > cases where (high) measured RTT is actually correct, rather than (low)
> > timestamp RTT.
I stared at the code some more and apparently I was wrong. I put my
tracing right after the check
if (flag & FLAG_SACK_RENEGING) {
in tcp_check_sack_reneging() and completely missed Neal's commit
5ae344c949e7 ("tcp: reduce spurious retransmits due to transient SACK
reneging") which deals with exactly this kind of broken TCP stacks
sending a series of acks for each segment from out of order queue.
Therefore it seems the events in my log weren't actual SACK reneging
events and the SACK scoreboard wasn't cleared.
There is something else I don't understand, though. In the case of
acking previously sacked and never retransmitted segment,
tcp_clean_rtx_queue() calculates the parameters for tcp_ack_update_rtt()
using
if (sack->first_sackt.v64) {
sack_rtt_us = skb_mstamp_us_delta(&now,
&sack->first_sackt);
ca_rtt_us = skb_mstamp_us_delta(&now,
&sack->last_sackt);
}
(in 4.4; mainline code replaces &now with tp->tcp_mstamp). If I read the
code correctly, both sack->first_sackt and sack->last_sackt contain
timestamps of initial segment transmission. This would mean we use the
time difference between the initial transmission and now, i.e. including
the RTO of the lost packet).
IMHO we should take the actual round trip time instead, i.e. the
difference between the original transmission and the time the packet
sacked (first time). It seems we have been doing this before commit
31231a8a8730 ("tcp: improve RTT from SACK for CC").
Michal Kubecek
^ permalink raw reply
* [RFC/PATCH] Add a socketoption IPV6_MULTICAST_ALL analogue to the IPV4 version
From: Andre Naujoks @ 2018-04-11 11:02 UTC (permalink / raw)
To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, David Ahern,
Greg Kroah-Hartman, Andre Naujoks, Maciej Żenczykowski,
Shaohua Li, Paolo Abeni, Thomas Gleixner, Kate Stewart,
Philippe Ombredanne, linux-kernel, netdev
Hi.
I was running into a problem, when trying to join multiple multicast groups
on a single socket and thus binding to the any-address on said socket. I
received traffic from multicast groups, I did not join on that socket and
was at first surprised by that. After reading some old e-mails/threads,
which came to the conclusion "It is, as it is."
(e.g https://marc.info/?l=linux-kernel&m=115815686626791&w=2), I discovered
the IPv4 socketoption IP_MULTICAST_ALL, which, when disabled, does exactly
what I would expect from a socket by default.
I propose a socket option for IPv6, which does the same and has the same
default as the IPv4 version. My first thought was, to just apply
IP_MULTICAST_ALL to a ipv6 socket, but that would change the behavior of
current applications and would probably be a big no-no.
Regards
Andre
>From 473653086c05a3de839c3504885053f6254c7bc5 Mon Sep 17 00:00:00 2001
From: Andre Naujoks <nautsch2@gmail.com>
Date: Wed, 11 Apr 2018 12:38:28 +0200
Subject: [PATCH] Add a socketoption IPV6_MULTICAST_ALL analogue to the IPV4
version
The socket option will be enabled by default to ensure current behaviour
is not changed. This is the same for the IPv4 version.
A socket bound to in6addr_any and a specific port will receive all traffic
on that port. Analogue to IP_MULTICAST_ALL, disable this behaviour, if
one or more multicast groups were joined (using said socket) and only
pass on multicast traffic from groups, which were explicitly joined via
this socket.
Without this option disabled a socket (system even) joined to multiple
multicast groups is very hard to get right. Filtering by destination
address has to take place in user space to avoid receiving multicast
traffic from other multicast groups, which might have traffic on the same
port.
The extension of the IP_MULTICAST_ALL socketoption to just apply to ipv6,
too, is not done to avoid changing the behaviour of current applications.
Signed-off-by: Andre Naujoks <nautsch2@gmail.com>
---
include/linux/ipv6.h | 3 ++-
include/uapi/linux/in6.h | 1 +
net/ipv6/af_inet6.c | 1 +
net/ipv6/ipv6_sockglue.c | 11 +++++++++++
net/ipv6/mcast.c | 2 +-
5 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 8415bf1a9776..495e834c1367 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -274,7 +274,8 @@ struct ipv6_pinfo {
*/
dontfrag:1,
autoflowlabel:1,
- autoflowlabel_set:1;
+ autoflowlabel_set:1,
+ mc_all:1;
__u8 min_hopcount;
__u8 tclass;
__be32 rcv_flowinfo;
diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
index ed291e55f024..71d82fe15b03 100644
--- a/include/uapi/linux/in6.h
+++ b/include/uapi/linux/in6.h
@@ -177,6 +177,7 @@ struct in6_flowlabel_req {
#define IPV6_V6ONLY 26
#define IPV6_JOIN_ANYCAST 27
#define IPV6_LEAVE_ANYCAST 28
+#define IPV6_MULTICAST_ALL 29
/* IPV6_MTU_DISCOVER values */
#define IPV6_PMTUDISC_DONT 0
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 8da0b513f188..7844cd9d2f10 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -209,6 +209,7 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
np->hop_limit = -1;
np->mcast_hops = IPV6_DEFAULT_MCASTHOPS;
np->mc_loop = 1;
+ np->mc_all = 1;
np->pmtudisc = IPV6_PMTUDISC_WANT;
np->repflow = net->ipv6.sysctl.flowlabel_reflect;
sk->sk_ipv6only = net->ipv6.sysctl.bindv6only;
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 4d780c7f0130..b2bc1942a2ee 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -664,6 +664,13 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
retv = ipv6_sock_ac_drop(sk, mreq.ipv6mr_ifindex, &mreq.ipv6mr_acaddr);
break;
}
+ case IPV6_MULTICAST_ALL:
+ if (optlen < sizeof(int))
+ goto e_inval;
+ np->mc_all = valbool;
+ retv = 0;
+ break;
+
case MCAST_JOIN_GROUP:
case MCAST_LEAVE_GROUP:
{
@@ -1255,6 +1262,10 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
val = np->mcast_oif;
break;
+ case IPV6_MULTICAST_ALL:
+ val = np->mc_all;
+ break;
+
case IPV6_UNICAST_IF:
val = (__force int)htonl((__u32) np->ucast_oif);
break;
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 793159d77d8a..623ad00eb3c2 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -622,7 +622,7 @@ bool inet6_mc_check(struct sock *sk, const struct in6_addr *mc_addr,
}
if (!mc) {
rcu_read_unlock();
- return true;
+ return np->mc_all;
}
read_lock(&mc->sflock);
psl = mc->sflist;
--
2.17.0
^ permalink raw reply related
* [PATCH] lan78xx: Avoid spurious kevent 4 "error"
From: Phil Elwell @ 2018-04-11 11:02 UTC (permalink / raw)
To: Woojung Huh, Microchip Linux Driver Support, netdev, linux-usb,
linux-kernel
Cc: Phil Elwell
lan78xx_defer_event generates an error message whenever the work item
is already scheduled. lan78xx_open defers three events -
EVENT_STAT_UPDATE, EVENT_DEV_OPEN and EVENT_LINK_RESET. Being aware
of the likelihood (or certainty) of an error message, the DEV_OPEN
event is added to the set of pending events directly, relying on
the subsequent deferral of the EVENT_LINK_RESET call to schedule the
work. Take the same precaution with EVENT_STAT_UPDATE to avoid a
totally unnecessary error message.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
---
drivers/net/usb/lan78xx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 32cf217..3102374 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2507,7 +2507,7 @@ static void lan78xx_init_stats(struct lan78xx_net *dev)
dev->stats.rollover_max.eee_tx_lpi_transitions = 0xFFFFFFFF;
dev->stats.rollover_max.eee_tx_lpi_time = 0xFFFFFFFF;
- lan78xx_defer_kevent(dev, EVENT_STAT_UPDATE);
+ set_bit(EVENT_STAT_UPDATE, &dev->flags);
}
static int lan78xx_open(struct net_device *net)
--
2.7.4
^ permalink raw reply related
* [PATCH v2 net] slip: Check if rstate is initialized before uncompressing
From: Tejaswi Tanikella @ 2018-04-11 11:04 UTC (permalink / raw)
To: netdev, davem; +Cc: g.nault
On receiving a packet the state index points to the rstate which must be
used to fill up IP and TCP headers. But if the state index points to a
rstate which is unitialized, i.e. filled with zeros, it gets stuck in an
infinite loop inside ip_fast_csum trying to compute the ip checsum of a
header with zero length.
89.666953: <2> [<ffffff9dd3e94d38>] slhc_uncompress+0x464/0x468
89.666965: <2> [<ffffff9dd3e87d88>] ppp_receive_nonmp_frame+0x3b4/0x65c
89.666978: <2> [<ffffff9dd3e89dd4>] ppp_receive_frame+0x64/0x7e0
89.666991: <2> [<ffffff9dd3e8a708>] ppp_input+0x104/0x198
89.667005: <2> [<ffffff9dd3e93868>] pppopns_recv_core+0x238/0x370
89.667027: <2> [<ffffff9dd4428fc8>] __sk_receive_skb+0xdc/0x250
89.667040: <2> [<ffffff9dd3e939e4>] pppopns_recv+0x44/0x60
89.667053: <2> [<ffffff9dd4426848>] __sock_queue_rcv_skb+0x16c/0x24c
89.667065: <2> [<ffffff9dd4426954>] sock_queue_rcv_skb+0x2c/0x38
89.667085: <2> [<ffffff9dd44f7358>] raw_rcv+0x124/0x154
89.667098: <2> [<ffffff9dd44f7568>] raw_local_deliver+0x1e0/0x22c
89.667117: <2> [<ffffff9dd44c8ba0>] ip_local_deliver_finish+0x70/0x24c
89.667131: <2> [<ffffff9dd44c92f4>] ip_local_deliver+0x100/0x10c
./scripts/faddr2line vmlinux slhc_uncompress+0x464/0x468 output:
ip_fast_csum at arch/arm64/include/asm/checksum.h:40
(inlined by) slhc_uncompress at drivers/net/slip/slhc.c:615
Adding a variable to indicate if the current rstate is initialized. If
such a packet arrives, move to toss state.
Signed-off-by: Tejaswi Tanikella <tejaswit@codeaurora.org>
---
drivers/net/slip/slhc.c | 5 +++++
include/net/slhc_vj.h | 1 +
2 files changed, 6 insertions(+)
diff --git a/drivers/net/slip/slhc.c b/drivers/net/slip/slhc.c
index 5782733..f4e93f5 100644
--- a/drivers/net/slip/slhc.c
+++ b/drivers/net/slip/slhc.c
@@ -509,6 +509,10 @@ struct slcompress *
if(x < 0 || x > comp->rslot_limit)
goto bad;
+ /* Check if the cstate is initialized */
+ if (!comp->rstate[x].initialized)
+ goto bad;
+
comp->flags &=~ SLF_TOSS;
comp->recv_current = x;
} else {
@@ -673,6 +677,7 @@ struct slcompress *
if (cs->cs_tcp.doff > 5)
memcpy(cs->cs_tcpopt, icp + ihl*4 + sizeof(struct tcphdr), (cs->cs_tcp.doff - 5) * 4);
cs->cs_hsize = ihl*2 + cs->cs_tcp.doff*2;
+ cs->initialized = true;
/* Put headers back on packet
* Neither header checksum is recalculated
*/
diff --git a/include/net/slhc_vj.h b/include/net/slhc_vj.h
index 8716d59..8fcf890 100644
--- a/include/net/slhc_vj.h
+++ b/include/net/slhc_vj.h
@@ -127,6 +127,7 @@
*/
struct cstate {
byte_t cs_this; /* connection id number (xmit) */
+ bool initialized; /* true if initialized */
struct cstate *next; /* next in ring (xmit) */
struct iphdr cs_ip; /* ip/tcp hdr from most recent packet */
struct tcphdr cs_tcp;
--
1.9.1
^ permalink raw reply related
* [PATCH] cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
From: Bassem Boubaker @ 2018-04-11 11:15 UTC (permalink / raw)
Cc: bassem.boubaker, Oliver Neukum, linux-usb, netdev, linux-kernel
The Cinterion AHS8 is a 3G device with one embedded WWAN interface
using cdc_ether as a driver.
The modem is controlled via AT commands through the exposed TTYs.
AT+CGDCONT write command can be used to activate or deactivate a WWAN
connection for a PDP context defined with the same command. UE supports
one WWAN adapter.
Signed-off-by: Bassem Boubaker <bassem.boubaker@actia.fr>
---
drivers/net/usb/cdc_ether.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index fff4b13..5c42cf8 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -902,6 +902,12 @@ static const struct usb_device_id products[] = {
USB_CDC_PROTO_NONE),
.driver_info = (unsigned long)&wwan_info,
}, {
+ /* Cinterion AHS3 modem by GEMALTO */
+ USB_DEVICE_AND_INTERFACE_INFO(0x1e2d, 0x0055, USB_CLASS_COMM,
+ USB_CDC_SUBCLASS_ETHERNET,
+ USB_CDC_PROTO_NONE),
+ .driver_info = (unsigned long)&wwan_info,
+}, {
USB_INTERFACE_INFO(USB_CLASS_COMM, USB_CDC_SUBCLASS_ETHERNET,
USB_CDC_PROTO_NONE),
.driver_info = (unsigned long) &cdc_info,
--
2.7.4
^ permalink raw reply related
* Re: [PATCH] cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
From: Oliver Neukum @ 2018-04-11 11:25 UTC (permalink / raw)
To: Bassem Boubaker; +Cc: linux-kernel, linux-usb, netdev
In-Reply-To: <1523445353-7631-1-git-send-email-bassem.boubaker@actia.fr>
Am Mittwoch, den 11.04.2018, 13:15 +0200 schrieb Bassem Boubaker:
> The Cinterion AHS8 is a 3G device with one embedded WWAN interface
> using cdc_ether as a driver.
>
> The modem is controlled via AT commands through the exposed TTYs.
>
> AT+CGDCONT write command can be used to activate or deactivate a WWAN
> connection for a PDP context defined with the same command. UE supports
> one WWAN adapter.
>
> Signed-off-by: Bassem Boubaker <bassem.boubaker@actia.fr>
Acked-by: Oliver Neukum <oneukum@suse.com>
^ permalink raw reply
* Re: TCP one-by-one acking - RFC interpretation question
From: Michal Kubecek @ 2018-04-11 12:06 UTC (permalink / raw)
To: netdev; +Cc: Eric Dumazet, Yuchung Cheng, Neal Cardwell,
Kenneth Klette Jonassen
In-Reply-To: <20180411105837.bwnpoqvbra43kjub@unicorn.suse.cz>
On Wed, Apr 11, 2018 at 12:58:37PM +0200, Michal Kubecek wrote:
> There is something else I don't understand, though. In the case of
> acking previously sacked and never retransmitted segment,
> tcp_clean_rtx_queue() calculates the parameters for tcp_ack_update_rtt()
> using
>
> if (sack->first_sackt.v64) {
> sack_rtt_us = skb_mstamp_us_delta(&now,
> &sack->first_sackt);
> ca_rtt_us = skb_mstamp_us_delta(&now,
> &sack->last_sackt);
> }
>
> (in 4.4; mainline code replaces &now with tp->tcp_mstamp). If I read the
> code correctly, both sack->first_sackt and sack->last_sackt contain
> timestamps of initial segment transmission. This would mean we use the
> time difference between the initial transmission and now, i.e. including
> the RTO of the lost packet).
>
> IMHO we should take the actual round trip time instead, i.e. the
> difference between the original transmission and the time the packet
> sacked (first time). It seems we have been doing this before commit
> 31231a8a8730 ("tcp: improve RTT from SACK for CC").
Sorry for the noise, this was my misunderstanding, the first_sackt and
last_sackt values are only taken from segments newly sacked by ack
received right now, not those which were already sacked before.
The actual problem and unrealistic RTT measurements come from another
RFC violation I didn't mention before: the NAS doesn't follow RFC 2018
section 4 rule for ordering of SACK blocks. Rather than sending SACK
blocks three most recently received out-of-order blocks, it simply sends
first three ordered by sequence numbers. In the earlier example (odd
packets were received, even lost)
ACK SAK SAK SAK
+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
+-------+-------+-------+-------+-------+-------+-------+-------+-------+
34273 35701 37129 38557 39985 41413 42841 44269 45697 47125
it responds to retransmitted segment 2 by
1. ACK 37129, SACK 37129-38557 39985-41413 42841-44269
2. ACK 38557, SACK 39985-41413 42841-44269 45697-47125
This new SACK block 45697-47125 has not been retransmitted and as it
wasn't sacked before, it is considered newly sacked. Therefore it gets
processed and its deemed RTT (time since its original transmit time)
"poisons" the RTT calculation, leading to RTO spiraling up.
Thus if we want to work around the NAS behaviour, we would need to
recognize such new SACK block as "not really new" and ignore it for
first_sackt/last_sackt. I'm not sure if it's possible without
misinterpreting actually delayed out of order packets. Of course, it is
not clear if it's worth the effort to work around so severely broken TCP
implementations (two obvious RFC violations, even if we don't count the
one-by-one acking).
Michal Kubecek
^ permalink raw reply
* Re: [PATCH linux] net: fix deadlock while clearing neighbor proxy table
From: Wolfgang Bumiller @ 2018-04-11 12:17 UTC (permalink / raw)
To: David Miller; +Cc: netdev, yoshfuji
In-Reply-To: <20180410.110229.1597289057689247263.davem@davemloft.net>
David Miller wrote:
> From: Wolfgang Bumiller <w.bumiller@proxmox.com>
> Date: Tue, 10 Apr 2018 11:15:14 +0200
>
> > diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> > index 7b7a14abba28..601df647588c 100644
> > --- a/net/core/neighbour.c
> > +++ b/net/core/neighbour.c
> > @@ -292,7 +292,6 @@ int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
> > write_lock_bh(&tbl->lock);
> > neigh_flush_dev(tbl, dev);
> > pneigh_ifdown(tbl, dev);
> > - write_unlock_bh(&tbl->lock);
>
> If we are going to fix it this way, we need to annotate the code here in some
> way so that future readers understand why the tbl->lock is not being released
> here.
A better way would of course be nice, too, but I find it hard to find
one given how "far away" the IGMP and then output code are from this
point.
> One way is to add a comment.
>
> Another way is to rename pneigh_ifdown() to "pneigh_ifdown_and_unlock()".
Sure, I can send a v2 with whichever is preferred - personally I prefer
the rename as it'll be visible at both the calling & implementation
side.
^ permalink raw reply
* Re: [RFC PATCH v2 00/14] Introducing AF_XDP support
From: Björn Töpel @ 2018-04-11 12:17 UTC (permalink / raw)
To: William Tu
Cc: Karlsson, Magnus, Alexander Duyck, Alexander Duyck,
John Fastabend, Alexei Starovoitov, Jesper Dangaard Brouer,
Willem de Bruijn, Daniel Borkmann,
Linux Kernel Network Developers, Björn Töpel,
michael.lundkvist, Brandeburg, Jesse, Anjali Singhai Jain,
Zhang, Qi Z, ravineet.singh
In-Reply-To: <CALDO+SYvxrnqKy=7P2c_agWDJEn2spwNQr0ynE2-JTfWPAaEEg@mail.gmail.com>
2018-04-10 16:14 GMT+02:00 William Tu <u9012063@gmail.com>:
> On Mon, Apr 9, 2018 at 11:47 PM, Björn Töpel <bjorn.topel@gmail.com> wrote:
[...]
>>>
>>
>> So you've setup two identical UMEMs? Then you can just forward the
>> incoming Rx descriptor to the other netdev's Tx queue. Note, that you
>> only need to copy the descriptor, not the actual frame data.
>>
>
> Thanks!
> I will give it a try, I guess you're saying I can do below:
>
> int sfd1; // for device1
> int sfd2; // for device2
> ...
> // create 2 umem
> umem1 = calloc(1, sizeof(*umem));
> umem2 = calloc(1, sizeof(*umem));
>
> // allocate 1 shared buffer, 1 xdp_umem_reg
> posix_memalign(&bufs, ...)
> mr.addr = (__u64)bufs; // shared for umem1,2
> ...
>
> // umem reg the same mr
> setsockopt(sfd1, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr))
> setsockopt(sfd2, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr))
>
> // setup fill, completion, mmap for sfd1 and sfd2
> ...
>
> Since both device can put frame data in 'bufs', I only need to copy
> the descs between 2 umem1 and umem2. Am I understand correct?
>
Yup, spot on! umem1 and umem2 have the same layout/index "address
space", so you can just forward the descriptors and never touch the
data.
In the current RFC you are required to create both an Rx and Tx queue
to bind the socket, which is just weird for your "Rx on one device, Tx
to another" scenario. I'll fix that in the next RFC.
Björn
> Regards,
> William
^ permalink raw reply
* [PATCH net 0/2] Aquantia atlantic critical fixes 04/2018
From: Igor Russkikh @ 2018-04-11 12:23 UTC (permalink / raw)
To: David S . Miller; +Cc: netdev, David Arcari, Pavel Belous, Igor Russkikh
Two regressions on latest 4.16 driver reported by users
Some of old FW (1.5.44) had a link management logic which prevents
driver to make clean reset. Driver of 4.16 has a full hardware reset
implemented and that broke the link and traffic on such a cards.
Second is oops on shutdown callback in case interface is already
closed or was never opened.
Igor Russkikh (2):
net: aquantia: Regression on reset with 1.x firmware
net: aquantia: oops when shutdown on already stopped device
drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 8 +++++---
.../net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c | 16 ++++++++++++++++
2 files changed, 21 insertions(+), 3 deletions(-)
--
2.7.4
^ permalink raw reply
* [PATCH net 1/2] net: aquantia: Regression on reset with 1.x firmware
From: Igor Russkikh @ 2018-04-11 12:23 UTC (permalink / raw)
To: David S . Miller; +Cc: netdev, David Arcari, Pavel Belous, Igor Russkikh
In-Reply-To: <cover.1523449097.git.igor.russkikh@aquantia.com>
On ASUS XG-C100C with 1.5.44 firmware a special mode called "dirty wake"
is active. With this mode when motherboard gets powered (but no poweron
happens yet), NIC automatically enables powersave link and watches
for WOL packet.
This normally allows to powerup the PC after AC power failures.
Not all motherboards or bios settings gives power to PCI slots,
so this mode is not enabled on all the hardware.
4.16 linux driver introduced full hardware reset sequence
This is required since before that we had no NIC hardware
reset implemented and there were side effects of "not clean start".
But this full reset is incompatible with "dirty wake" WOL feature
it keeps the PHY link in a special mode forever. As a consequence,
driver sees no link and no traffic.
To fix this we forcibly change FW state to idle state before doing
the full reset. This makes FW to restore link state.
Fixes: c8c82eb net: aquantia: Introduce global AQC hardware reset sequence
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
.../net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
index 84d7f4d..e652d86 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
@@ -48,6 +48,8 @@
#define FORCE_FLASHLESS 0
static int hw_atl_utils_ver_match(u32 ver_expected, u32 ver_actual);
+static int hw_atl_utils_mpi_set_state(struct aq_hw_s *self,
+ enum hal_atl_utils_fw_state_e state);
int hw_atl_utils_initfw(struct aq_hw_s *self, const struct aq_fw_ops **fw_ops)
{
@@ -247,6 +249,20 @@ int hw_atl_utils_soft_reset(struct aq_hw_s *self)
self->rbl_enabled = (boot_exit_code != 0);
+ /* FW 1.x may bootup in an invalid POWER state (WOL feature).
+ * We should work around this by forcing its state back to DEINIT
+ */
+ if (!hw_atl_utils_ver_match(HW_ATL_FW_VER_1X,
+ aq_hw_read_reg(self,
+ HW_ATL_MPI_FW_VERSION))) {
+ int err = 0;
+
+ hw_atl_utils_mpi_set_state(self, MPI_DEINIT);
+ AQ_HW_WAIT_FOR((aq_hw_read_reg(self, HW_ATL_MPI_STATE_ADR) &
+ HW_ATL_MPI_STATE_MSK) == MPI_DEINIT,
+ 10, 1000U);
+ }
+
if (self->rbl_enabled)
return hw_atl_utils_soft_reset_rbl(self);
else
--
2.7.4
^ permalink raw reply related
* [PATCH net 2/2] net: aquantia: oops when shutdown on already stopped device
From: Igor Russkikh @ 2018-04-11 12:23 UTC (permalink / raw)
To: David S . Miller; +Cc: netdev, David Arcari, Pavel Belous, Igor Russkikh
In-Reply-To: <cover.1523449097.git.igor.russkikh@aquantia.com>
In case netdev is closed at the moment of pci shutdown, aq_nic_stop
gets called second time. napi_disable in that case hangs indefinitely.
In other case, if device was never opened at all, we get oops because
of null pointer access.
We should invoke aq_nic_stop conditionally, only if device is running
at the moment of shutdown.
Reported-by: David Arcari <darcari@redhat.com>
Fixes: 90869ddfefeb ("net: aquantia: Implement pci shutdown callback")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index c96a921..32f6d2e 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -951,9 +951,11 @@ void aq_nic_shutdown(struct aq_nic_s *self)
netif_device_detach(self->ndev);
- err = aq_nic_stop(self);
- if (err < 0)
- goto err_exit;
+ if (netif_running(self->ndev)) {
+ err = aq_nic_stop(self);
+ if (err < 0)
+ goto err_exit;
+ }
aq_nic_deinit(self);
err_exit:
--
2.7.4
^ permalink raw reply related
* Re: [PATCH v1 net 0/3] lan78xx: Fixes and enhancements
From: Andrew Lunn @ 2018-04-11 12:55 UTC (permalink / raw)
To: Raghuram Chary J; +Cc: davem, netdev, unglinuxdriver, woojung.huh
In-Reply-To: <20180411072450.9809-1-raghuramchary.jallipalli@microchip.com>
On Wed, Apr 11, 2018 at 12:54:47PM +0530, Raghuram Chary J wrote:
> These series of patches have fix and enhancements for
> lan78xx driver.
Hi Raghuram
Please separate the fixes from the enhancements. The enhancements need
to wait until net-next re-opens in a weeks time. The first patch,
which is a real fix, can probably be accepted now.
Andrew
>
> Raghuram Chary J (3):
> lan78xx: PHY DSP registers initialization to address EEE link drop
> issues with long cables
> lan78xx: Add support to dump lan78xx registers
> lan78xx: Lan7801 Support for Fixed PHY
>
> drivers/net/phy/microchip.c | 178 ++++++++++++++++++++++++++++++++++++++++++-
> drivers/net/usb/Kconfig | 1 +
> drivers/net/usb/lan78xx.c | 96 ++++++++++++++++++++++-
> include/linux/microchipphy.h | 8 ++
> 4 files changed, 278 insertions(+), 5 deletions(-)
>
> --
> 2.16.2
>
^ permalink raw reply
* Re: [PATCH] lan78xx: Correctly indicate invalid OTP
From: Andrew Lunn @ 2018-04-11 12:57 UTC (permalink / raw)
To: Phil Elwell
Cc: Woojung Huh, Microchip Linux Driver Support, netdev, linux-usb,
linux-kernel
In-Reply-To: <1523440757-127451-1-git-send-email-phil@raspberrypi.org>
On Wed, Apr 11, 2018 at 10:59:17AM +0100, Phil Elwell wrote:
> lan78xx_read_otp tries to return -EINVAL in the event of invalid OTP
> content, but the value gets overwritten before it is returned and the
> read goes ahead anyway. Make the read conditional as it should be
> and preserve the error code.
Hi Phil
Do you know that the Fixes: tag should be for this? When did it break?
Thanks
Andrew
^ permalink raw reply
* [PATCH net] sctp: do not check port in sctp_inet6_cmp_addr
From: Xin Long @ 2018-04-11 12:58 UTC (permalink / raw)
To: network dev, linux-sctp; +Cc: davem, Marcelo Ricardo Leitner, Neil Horman
pf->cmp_addr() is called before binding a v6 address to the sock. It
should not check ports, like in sctp_inet_cmp_addr.
But sctp_inet6_cmp_addr checks the addr by invoking af(6)->cmp_addr,
sctp_v6_cmp_addr where it also compares the ports.
This would cause that setsockopt(SCTP_SOCKOPT_BINDX_ADD) could bind
multiple duplicated IPv6 addresses after Commit 40b4f0fd74e4 ("sctp:
lack the check for ports in sctp_v6_cmp_addr").
This patch is to remove af->cmp_addr called in sctp_inet6_cmp_addr,
but do the proper check for both v6 addrs and v4mapped addrs.
Fixes: 40b4f0fd74e4 ("sctp: lack the check for ports in sctp_v6_cmp_addr")
Reported-by: Jianwen Ji <jiji@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
net/sctp/ipv6.c | 27 ++++++++++++++++++++++++---
1 file changed, 24 insertions(+), 3 deletions(-)
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index f1fc48e..be4b72c 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -846,8 +846,8 @@ static int sctp_inet6_cmp_addr(const union sctp_addr *addr1,
const union sctp_addr *addr2,
struct sctp_sock *opt)
{
- struct sctp_af *af1, *af2;
struct sock *sk = sctp_opt2sk(opt);
+ struct sctp_af *af1, *af2;
af1 = sctp_get_af_specific(addr1->sa.sa_family);
af2 = sctp_get_af_specific(addr2->sa.sa_family);
@@ -863,10 +863,31 @@ static int sctp_inet6_cmp_addr(const union sctp_addr *addr1,
if (sctp_is_any(sk, addr1) || sctp_is_any(sk, addr2))
return 1;
- if (addr1->sa.sa_family != addr2->sa.sa_family)
+ if (addr1->sa.sa_family != addr2->sa.sa_family) {
+ if (addr1->sa.sa_family == AF_INET &&
+ addr2->sa.sa_family == AF_INET6 &&
+ ipv6_addr_v4mapped(&addr2->v6.sin6_addr))
+ if (addr2->v6.sin6_addr.s6_addr32[3] ==
+ addr1->v4.sin_addr.s_addr)
+ return 1;
+ if (addr2->sa.sa_family == AF_INET &&
+ addr1->sa.sa_family == AF_INET6 &&
+ ipv6_addr_v4mapped(&addr1->v6.sin6_addr))
+ if (addr1->v6.sin6_addr.s6_addr32[3] ==
+ addr2->v4.sin_addr.s_addr)
+ return 1;
+ return 0;
+ }
+
+ if (!ipv6_addr_equal(&addr1->v6.sin6_addr, &addr2->v6.sin6_addr))
+ return 0;
+
+ if ((ipv6_addr_type(&addr1->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) &&
+ addr1->v6.sin6_scope_id && addr2->v6.sin6_scope_id &&
+ addr1->v6.sin6_scope_id != addr2->v6.sin6_scope_id)
return 0;
- return af1->cmp_addr(addr1, addr2);
+ return 1;
}
/* Verify that the provided sockaddr looks bindable. Common verification,
--
2.1.0
^ permalink raw reply related
* RE: [PATCH v3] net: tipc: Replace GFP_ATOMIC with GFP_KERNEL in tipc_mon_create
From: Jon Maloy @ 2018-04-11 13:01 UTC (permalink / raw)
To: Ying Xue, Jia-Ju Bai, davem@davemloft.net
Cc: netdev@vger.kernel.org, tipc-discussion@lists.sourceforge.net,
linux-kernel@vger.kernel.org
In-Reply-To: <7bf48d79-54fe-421a-e04c-f2d6cd2c71e2@windriver.com>
> -----Original Message-----
> From: Ying Xue [mailto:ying.xue@windriver.com]
> Sent: Wednesday, April 11, 2018 06:27
> To: Jia-Ju Bai <baijiaju1990@gmail.com>; Jon Maloy
> <jon.maloy@ericsson.com>; davem@davemloft.net
> Cc: netdev@vger.kernel.org; tipc-discussion@lists.sourceforge.net; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH v3] net: tipc: Replace GFP_ATOMIC with GFP_KERNEL in
> tipc_mon_create
>
> On 04/11/2018 06:24 PM, Jia-Ju Bai wrote:
> > tipc_mon_create() is never called in atomic context.
> >
> > The call chain ending up at tipc_mon_create() is:
> > [1] tipc_mon_create() <- tipc_enable_bearer() <-
> > tipc_nl_bearer_enable()
> > tipc_nl_bearer_enable() calls rtnl_lock(), which indicates this
> > function is not called in atomic context.
> >
> > Despite never getting called from atomic context,
> > tipc_mon_create() calls kzalloc() with GFP_ATOMIC, which does not
> > sleep for allocation.
> > GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
> which
> > can sleep and improve the possibility of successful allocation.
> >
> > This is found by a static analysis tool named DCNS written by myself.
> > And I also manually check it.
> >
> > Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
>
> Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
>
> > ---
> > v2:
> > * Modify the description of GFP_ATOMIC in v1.
> > Thank Eric for good advice.
> > v3:
> > * Modify wrong text in description in v2.
> > Thank Ying for good advice.
> > ---
> > net/tipc/monitor.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/tipc/monitor.c b/net/tipc/monitor.c index
> > 9e109bb..9714d80 100644
> > --- a/net/tipc/monitor.c
> > +++ b/net/tipc/monitor.c
> > @@ -604,9 +604,9 @@ int tipc_mon_create(struct net *net, int bearer_id)
> > if (tn->monitors[bearer_id])
> > return 0;
> >
> > - mon = kzalloc(sizeof(*mon), GFP_ATOMIC);
> > - self = kzalloc(sizeof(*self), GFP_ATOMIC);
> > - dom = kzalloc(sizeof(*dom), GFP_ATOMIC);
> > + mon = kzalloc(sizeof(*mon), GFP_KERNEL);
> > + self = kzalloc(sizeof(*self), GFP_KERNEL);
> > + dom = kzalloc(sizeof(*dom), GFP_KERNEL);
> > if (!mon || !self || !dom) {
> > kfree(mon);
> > kfree(self);
> >
^ permalink raw reply
* Re: [PATCH] lan78xx: Correctly indicate invalid OTP
From: Phil Elwell @ 2018-04-11 13:03 UTC (permalink / raw)
To: Andrew Lunn
Cc: Woojung Huh, Microchip Linux Driver Support, netdev, linux-usb,
linux-kernel
In-Reply-To: <20180411125737.GB6119@lunn.ch>
Hi Andrew.
On 11/04/2018 13:57, Andrew Lunn wrote:
> On Wed, Apr 11, 2018 at 10:59:17AM +0100, Phil Elwell wrote:
>> lan78xx_read_otp tries to return -EINVAL in the event of invalid OTP
>> content, but the value gets overwritten before it is returned and the
>> read goes ahead anyway. Make the read conditional as it should be
>> and preserve the error code.
>
> Hi Phil
>
> Do you know that the Fixes: tag should be for this? When did it break?
It's been broken since day 1, so:
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
^ permalink raw reply
* [PATCH][next] iwlwifi: mvm: remove division by size of sizeof(struct ieee80211_wmm_rule)
From: Colin King @ 2018-04-11 13:05 UTC (permalink / raw)
To: Johannes Berg, Emmanuel Grumbach, Luca Coelho,
Intel Linux Wireless, Kalle Valo, linux-wireless, netdev
Cc: kernel-janitors, linux-kernel
From: Colin Ian King <colin.king@canonical.com>
The subtraction of two struct ieee80211_wmm_rule pointers leaves a result
that is automatically scaled down by the size of the size of pointed-to
type, hence the division by sizeof(struct ieee80211_wmm_rule) is
bogus and should be removed.
Detected by CoverityScan, CID#1467777 ("Extra sizeof expression")
Fixes: 77e30e10ee28 ("iwlwifi: mvm: query regdb for wmm rule if needed")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
index ca0174680af9..e78219057d2f 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
@@ -1007,8 +1007,7 @@ iwl_parse_nvm_mcc_info(struct device *dev, const struct iwl_cfg *cfg,
continue;
copy_rd->reg_rules[i].wmm_rule = d_wmm +
- (regd->reg_rules[i].wmm_rule - s_wmm) /
- sizeof(struct ieee80211_wmm_rule);
+ (regd->reg_rules[i].wmm_rule - s_wmm);
}
out:
--
2.17.0
^ permalink raw reply related
* Re: [PATCH v3 1/2] vhost: fix vhost_vq_access_ok() log check
From: Michael S. Tsirkin @ 2018-04-11 13:23 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: virtualization, Linus Torvalds, jasowang, netdev, syzkaller-bugs,
kvm, linux-kernel
In-Reply-To: <20180411023541.15776-2-stefanha@redhat.com>
On Wed, Apr 11, 2018 at 10:35:40AM +0800, Stefan Hajnoczi wrote:
> Commit d65026c6c62e7d9616c8ceb5a53b68bcdc050525 ("vhost: validate log
> when IOTLB is enabled") introduced a regression. The logic was
> originally:
>
> if (vq->iotlb)
> return 1;
> return A && B;
>
> After the patch the short-circuit logic for A was inverted:
>
> if (A || vq->iotlb)
> return A;
> return B;
>
> This patch fixes the regression by rewriting the checks in the obvious
> way, no longer returning A when vq->iotlb is non-NULL (which is hard to
> understand).
>
> Reported-by: syzbot+65a84dde0214b0387ccd@syzkaller.appspotmail.com
> Cc: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> drivers/vhost/vhost.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index bec722e41f58..fc805b7fad9d 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1244,10 +1244,12 @@ static int vq_log_access_ok(struct vhost_virtqueue *vq,
> /* Caller should have vq mutex and device mutex */
> int vhost_vq_access_ok(struct vhost_virtqueue *vq)
> {
> - int ret = vq_log_access_ok(vq, vq->log_base);
> + if (!vq_log_access_ok(vq, vq->log_base))
> + return 0;
>
> - if (ret || vq->iotlb)
> - return ret;
> + /* Access validation occurs at prefetch time with IOTLB */
> + if (vq->iotlb)
> + return 1;
>
> return vq_access_ok(vq, vq->num, vq->desc, vq->avail, vq->used);
> }
> --
> 2.14.3
^ permalink raw reply
* Re: [PATCH v3 2/2] vhost: return bool from *_access_ok() functions
From: Michael S. Tsirkin @ 2018-04-11 13:23 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: kvm, netdev, syzkaller-bugs, linux-kernel, virtualization,
Linus Torvalds
In-Reply-To: <20180411023541.15776-3-stefanha@redhat.com>
On Wed, Apr 11, 2018 at 10:35:41AM +0800, Stefan Hajnoczi wrote:
> Currently vhost *_access_ok() functions return int. This is error-prone
> because there are two popular conventions:
>
> 1. 0 means failure, 1 means success
> 2. -errno means failure, 0 means success
>
> Although vhost mostly uses #1, it does not do so consistently.
> umem_access_ok() uses #2.
>
> This patch changes the return type from int to bool so that false means
> failure and true means success. This eliminates a potential source of
> errors.
>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> drivers/vhost/vhost.h | 4 ++--
> drivers/vhost/vhost.c | 66 +++++++++++++++++++++++++--------------------------
> 2 files changed, 35 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index d8ee85ae8fdc..6c844b90a168 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -178,8 +178,8 @@ void vhost_dev_cleanup(struct vhost_dev *);
> void vhost_dev_stop(struct vhost_dev *);
> long vhost_dev_ioctl(struct vhost_dev *, unsigned int ioctl, void __user *argp);
> long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp);
> -int vhost_vq_access_ok(struct vhost_virtqueue *vq);
> -int vhost_log_access_ok(struct vhost_dev *);
> +bool vhost_vq_access_ok(struct vhost_virtqueue *vq);
> +bool vhost_log_access_ok(struct vhost_dev *);
>
> int vhost_get_vq_desc(struct vhost_virtqueue *,
> struct iovec iov[], unsigned int iov_count,
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index fc805b7fad9d..0fcb51a9940c 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -641,14 +641,14 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> }
> EXPORT_SYMBOL_GPL(vhost_dev_cleanup);
>
> -static int log_access_ok(void __user *log_base, u64 addr, unsigned long sz)
> +static bool log_access_ok(void __user *log_base, u64 addr, unsigned long sz)
> {
> u64 a = addr / VHOST_PAGE_SIZE / 8;
>
> /* Make sure 64 bit math will not overflow. */
> if (a > ULONG_MAX - (unsigned long)log_base ||
> a + (unsigned long)log_base > ULONG_MAX)
> - return 0;
> + return false;
>
> return access_ok(VERIFY_WRITE, log_base + a,
> (sz + VHOST_PAGE_SIZE * 8 - 1) / VHOST_PAGE_SIZE / 8);
> @@ -661,30 +661,30 @@ static bool vhost_overflow(u64 uaddr, u64 size)
> }
>
> /* Caller should have vq mutex and device mutex. */
> -static int vq_memory_access_ok(void __user *log_base, struct vhost_umem *umem,
> - int log_all)
> +static bool vq_memory_access_ok(void __user *log_base, struct vhost_umem *umem,
> + int log_all)
> {
> struct vhost_umem_node *node;
>
> if (!umem)
> - return 0;
> + return false;
>
> list_for_each_entry(node, &umem->umem_list, link) {
> unsigned long a = node->userspace_addr;
>
> if (vhost_overflow(node->userspace_addr, node->size))
> - return 0;
> + return false;
>
>
> if (!access_ok(VERIFY_WRITE, (void __user *)a,
> node->size))
> - return 0;
> + return false;
> else if (log_all && !log_access_ok(log_base,
> node->start,
> node->size))
> - return 0;
> + return false;
> }
> - return 1;
> + return true;
> }
>
> static inline void __user *vhost_vq_meta_fetch(struct vhost_virtqueue *vq,
> @@ -701,13 +701,13 @@ static inline void __user *vhost_vq_meta_fetch(struct vhost_virtqueue *vq,
>
> /* Can we switch to this memory table? */
> /* Caller should have device mutex but not vq mutex */
> -static int memory_access_ok(struct vhost_dev *d, struct vhost_umem *umem,
> - int log_all)
> +static bool memory_access_ok(struct vhost_dev *d, struct vhost_umem *umem,
> + int log_all)
> {
> int i;
>
> for (i = 0; i < d->nvqs; ++i) {
> - int ok;
> + bool ok;
> bool log;
>
> mutex_lock(&d->vqs[i]->mutex);
> @@ -717,12 +717,12 @@ static int memory_access_ok(struct vhost_dev *d, struct vhost_umem *umem,
> ok = vq_memory_access_ok(d->vqs[i]->log_base,
> umem, log);
> else
> - ok = 1;
> + ok = true;
> mutex_unlock(&d->vqs[i]->mutex);
> if (!ok)
> - return 0;
> + return false;
> }
> - return 1;
> + return true;
> }
>
> static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len,
> @@ -959,21 +959,21 @@ static void vhost_iotlb_notify_vq(struct vhost_dev *d,
> spin_unlock(&d->iotlb_lock);
> }
>
> -static int umem_access_ok(u64 uaddr, u64 size, int access)
> +static bool umem_access_ok(u64 uaddr, u64 size, int access)
> {
> unsigned long a = uaddr;
>
> /* Make sure 64 bit math will not overflow. */
> if (vhost_overflow(uaddr, size))
> - return -EFAULT;
> + return false;
>
> if ((access & VHOST_ACCESS_RO) &&
> !access_ok(VERIFY_READ, (void __user *)a, size))
> - return -EFAULT;
> + return false;
> if ((access & VHOST_ACCESS_WO) &&
> !access_ok(VERIFY_WRITE, (void __user *)a, size))
> - return -EFAULT;
> - return 0;
> + return false;
> + return true;
> }
>
> static int vhost_process_iotlb_msg(struct vhost_dev *dev,
> @@ -988,7 +988,7 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
> ret = -EFAULT;
> break;
> }
> - if (umem_access_ok(msg->uaddr, msg->size, msg->perm)) {
> + if (!umem_access_ok(msg->uaddr, msg->size, msg->perm)) {
> ret = -EFAULT;
> break;
> }
> @@ -1135,10 +1135,10 @@ static int vhost_iotlb_miss(struct vhost_virtqueue *vq, u64 iova, int access)
> return 0;
> }
>
> -static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
> - struct vring_desc __user *desc,
> - struct vring_avail __user *avail,
> - struct vring_used __user *used)
> +static bool vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
> + struct vring_desc __user *desc,
> + struct vring_avail __user *avail,
> + struct vring_used __user *used)
>
> {
> size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
> @@ -1161,8 +1161,8 @@ static void vhost_vq_meta_update(struct vhost_virtqueue *vq,
> vq->meta_iotlb[type] = node;
> }
>
> -static int iotlb_access_ok(struct vhost_virtqueue *vq,
> - int access, u64 addr, u64 len, int type)
> +static bool iotlb_access_ok(struct vhost_virtqueue *vq,
> + int access, u64 addr, u64 len, int type)
> {
> const struct vhost_umem_node *node;
> struct vhost_umem *umem = vq->iotlb;
> @@ -1220,7 +1220,7 @@ EXPORT_SYMBOL_GPL(vq_iotlb_prefetch);
>
> /* Can we log writes? */
> /* Caller should have device mutex but not vq mutex */
> -int vhost_log_access_ok(struct vhost_dev *dev)
> +bool vhost_log_access_ok(struct vhost_dev *dev)
> {
> return memory_access_ok(dev, dev->umem, 1);
> }
> @@ -1228,8 +1228,8 @@ EXPORT_SYMBOL_GPL(vhost_log_access_ok);
>
> /* Verify access for write logging. */
> /* Caller should have vq mutex and device mutex */
> -static int vq_log_access_ok(struct vhost_virtqueue *vq,
> - void __user *log_base)
> +static bool vq_log_access_ok(struct vhost_virtqueue *vq,
> + void __user *log_base)
> {
> size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
>
> @@ -1242,14 +1242,14 @@ static int vq_log_access_ok(struct vhost_virtqueue *vq,
>
> /* Can we start vq? */
> /* Caller should have vq mutex and device mutex */
> -int vhost_vq_access_ok(struct vhost_virtqueue *vq)
> +bool vhost_vq_access_ok(struct vhost_virtqueue *vq)
> {
> if (!vq_log_access_ok(vq, vq->log_base))
> - return 0;
> + return false;
>
> /* Access validation occurs at prefetch time with IOTLB */
> if (vq->iotlb)
> - return 1;
> + return true;
>
> return vq_access_ok(vq, vq->num, vq->desc, vq->avail, vq->used);
> }
> --
> 2.14.3
^ permalink raw reply
* Re: [PATCH v3 0/2] vhost: fix vhost_vq_access_ok() log check
From: Michael S. Tsirkin @ 2018-04-11 13:24 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: kvm, netdev, syzkaller-bugs, linux-kernel, virtualization,
Linus Torvalds
In-Reply-To: <20180411023541.15776-1-stefanha@redhat.com>
On Wed, Apr 11, 2018 at 10:35:39AM +0800, Stefan Hajnoczi wrote:
> v3:
> * Rebased onto net/master and resolved conflict [DaveM]
>
> v2:
> * Rewrote the conditional to make the vq access check clearer [Linus]
> * Added Patch 2 to make the return type consistent and harder to misuse [Linus]
>
> The first patch fixes the vhost virtqueue access check which was recently
> broken. The second patch replaces the int return type with bool to prevent
> future bugs.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
We need the 1st one on stable I think.
> Stefan Hajnoczi (2):
> vhost: fix vhost_vq_access_ok() log check
> vhost: return bool from *_access_ok() functions
>
> drivers/vhost/vhost.h | 4 +--
> drivers/vhost/vhost.c | 70 ++++++++++++++++++++++++++-------------------------
> 2 files changed, 38 insertions(+), 36 deletions(-)
>
> --
> 2.14.3
^ permalink raw reply
* [PATCH] vhost: Fix vhost_copy_to_user()
From: Eric Auger @ 2018-04-11 13:30 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, linux-kernel, virtualization, netdev,
kvm, jasowang, mst, kvmarm
Cc: stefanha
vhost_copy_to_user is used to copy vring used elements to userspace.
We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC.
Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
This fixes a stall observed when running an aarch64 guest with
virtual smmu
---
drivers/vhost/vhost.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index bec722e..f44aead 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -744,7 +744,7 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, void __user *to,
struct iov_iter t;
void __user *uaddr = vhost_vq_meta_fetch(vq,
(u64)(uintptr_t)to, size,
- VHOST_ADDR_DESC);
+ VHOST_ADDR_USED);
if (uaddr)
return __copy_to_user(uaddr, from, size);
--
2.5.5
^ permalink raw reply related
* Re: [PATCH] vhost: Fix vhost_copy_to_user()
From: Jason Wang @ 2018-04-11 13:44 UTC (permalink / raw)
To: Eric Auger, eric.auger.pro, linux-kernel, virtualization, netdev,
kvm, mst, kvmarm
Cc: stefanha
In-Reply-To: <1523453438-4266-1-git-send-email-eric.auger@redhat.com>
On 2018年04月11日 21:30, Eric Auger wrote:
> vhost_copy_to_user is used to copy vring used elements to userspace.
> We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC.
>
> Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache")
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>
> ---
>
> This fixes a stall observed when running an aarch64 guest with
> virtual smmu
> ---
> drivers/vhost/vhost.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index bec722e..f44aead 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -744,7 +744,7 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, void __user *to,
> struct iov_iter t;
> void __user *uaddr = vhost_vq_meta_fetch(vq,
> (u64)(uintptr_t)to, size,
> - VHOST_ADDR_DESC);
> + VHOST_ADDR_USED);
>
> if (uaddr)
> return __copy_to_user(uaddr, from, size);
Acked-by: Jason Wang <jasowang@redhat.com>
Thanks!
Stable material I think.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox