* Re: [PATCH 4.14] tcp: fix tcp_rtx_queue_tail in case of empty retransmit queue
From: maowenan @ 2019-09-03 8:55 UTC (permalink / raw)
To: Tim Froidcoeur, eric.dumazet
Cc: David Miller, cpaasch@apple.com, jonathan.lemon@gmail.com,
stable@vger.kernel.org, gregkh@linuxfoundation.org,
matthieu.baerts@tessares.net, aprout@ll.mit.edu,
edumazet@google.com, jtl@netflix.com,
linux-kernel@vger.kernel.org, mkubecek@suse.cz,
ncardwell@google.com, sashal@kernel.org, ycheng@google.com,
netdev@vger.kernel.org
In-Reply-To: <CAOj+RUvXMaoVKzSeDab4oTn3p=-BJtuhgqwKDCUuhCQWHO7bgQ@mail.gmail.com>
On 2019/9/3 14:58, Tim Froidcoeur wrote:
> Hi,
>
> I also tried to reproduce this in a targeted way, and run into the
> same difficulty as you: satisfying the first condition “
> (sk->sk_wmem_queued >> 1) > limit “.
> I will not have bandwidth the coming days to try and reproduce it in
> this way. Maybe simply forcing a very small send buffer using sysctl
> net.ipv4.tcp_wmem might even do the trick?
>
> I suspect that the bug is easier to trigger with the MPTCP patch like
> I did originally, due to the way this patch manages the tcp subflow
> buffers (it can temporarily overfill the buffers, satisfying that
> first condition more often).
>
> another thing, the stacktrace you shared before seems caused by
> another issue (corrupted socket?), it will not be solved by the patch
> we submitted.
The trace shows zero window probe message can be BUG_ON in skb_queue_prev,
this is reproduced on our platform with syzkaller. It can be resolved by
your fix patch.
The thing I need to think is why the first condition can be satisfied?
Eric, Do you have any comments to reproduce it as the first condition
is hard to be true?
(sk->sk_wmem_queued >> 1) > limit
>
> kind regards,
>
> Tim
>
>
> On Tue, Sep 3, 2019 at 5:22 AM maowenan <maowenan@huawei.com> wrote:
>>
>> Hi Tim,
>>
>>
>>
>> I try to reproduce it with packetdrill or user application, but I can’t.
>>
>> The first condition “ (sk->sk_wmem_queued >> 1) > limit “ can’t be satisfied,
>>
>> This condition is to avoid tiny SO_SNDBUF values set by user.
>>
>> It also adds the some room due to the fact that tcp_sendmsg()
>>
>> and tcp_sendpage() might overshoot sk_wmem_queued by about one full
>>
>> TSO skb (64KB size).
>>
>>
>>
>> limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);
>>
>> if (unlikely((sk->sk_wmem_queued >> 1) > limit &&
>>
>> skb != tcp_rtx_queue_head(sk) &&
>>
>> skb != tcp_rtx_queue_tail(sk))) {
>>
>> NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
>>
>> return -ENOMEM;
>>
>> }
>>
>>
>>
>> Can you try to reproduce it with packetdrill or C socket application?
>>
>>
>
>
>
^ permalink raw reply
* [PATCH net] ipmr: remove cache_resolve_queue_len
From: Hangbin Liu @ 2019-09-03 8:43 UTC (permalink / raw)
To: netdev
Cc: Phil Karn, Sukumar Gopalakrishnan, David S . Miller,
Alexey Kuznetsov, Hideaki YOSHIFUJI, Hangbin Liu
This is a re-post of previous patch wrote by David Miller[1].
Phil Karn reported[2] that on busy networks with lots of unresolved
multicast routing entries, the creation of new multicast group routes
can be extremely slow and unreliable.
The reason is we hard-coded multicast route entries with unresolved source
addresses(cache_resolve_queue_len) to 10. If some multicast route never
resolves and the unresolved source addresses increased, there will
be no ability to create new multicast route cache.
To resolve this issue, we need either add a sysctl entry to make the
cache_resolve_queue_len configurable, or just remove cache_resolve_queue_len
directly, as we already have the socket receive queue limits of mrouted
socket, pointed by David.
From my side, I'd perfer to remove the cache_resolve_queue_len instead
of creating two more(IPv4 and IPv6 version) sysctl entry.
[1] https://lkml.org/lkml/2018/7/22/11
[2] https://lkml.org/lkml/2018/7/21/343
Reported-by: Phil Karn <karn@ka9q.net>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
include/linux/mroute_base.h | 2 --
net/ipv4/ipmr.c | 27 ++++++++++++++++++---------
net/ipv6/ip6mr.c | 10 +++-------
3 files changed, 21 insertions(+), 18 deletions(-)
diff --git a/include/linux/mroute_base.h b/include/linux/mroute_base.h
index 34de06b426ef..50fb89533029 100644
--- a/include/linux/mroute_base.h
+++ b/include/linux/mroute_base.h
@@ -235,7 +235,6 @@ struct mr_table_ops {
* @mfc_hash: Hash table of all resolved routes for easy lookup
* @mfc_cache_list: list of resovled routes for possible traversal
* @maxvif: Identifier of highest value vif currently in use
- * @cache_resolve_queue_len: current size of unresolved queue
* @mroute_do_assert: Whether to inform userspace on wrong ingress
* @mroute_do_pim: Whether to receive IGMP PIMv1
* @mroute_reg_vif_num: PIM-device vif index
@@ -252,7 +251,6 @@ struct mr_table {
struct rhltable mfc_hash;
struct list_head mfc_cache_list;
int maxvif;
- atomic_t cache_resolve_queue_len;
bool mroute_do_assert;
bool mroute_do_pim;
bool mroute_do_wrvifwhole;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index c07bc82cbbe9..6c5278b45afb 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -744,8 +744,6 @@ static void ipmr_destroy_unres(struct mr_table *mrt, struct mfc_cache *c)
struct sk_buff *skb;
struct nlmsgerr *e;
- atomic_dec(&mrt->cache_resolve_queue_len);
-
while ((skb = skb_dequeue(&c->_c.mfc_un.unres.unresolved))) {
if (ip_hdr(skb)->version == 0) {
struct nlmsghdr *nlh = skb_pull(skb,
@@ -1133,9 +1131,11 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
}
if (!found) {
+ bool was_empty;
+
/* Create a new entry if allowable */
- if (atomic_read(&mrt->cache_resolve_queue_len) >= 10 ||
- (c = ipmr_cache_alloc_unres()) == NULL) {
+ c = ipmr_cache_alloc_unres();
+ if (!c) {
spin_unlock_bh(&mfc_unres_lock);
kfree_skb(skb);
@@ -1161,11 +1161,11 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
return err;
}
- atomic_inc(&mrt->cache_resolve_queue_len);
+ was_empty = list_empty(&mrt->mfc_unres_queue);
list_add(&c->_c.list, &mrt->mfc_unres_queue);
mroute_netlink_event(mrt, c, RTM_NEWROUTE);
- if (atomic_read(&mrt->cache_resolve_queue_len) == 1)
+ if (was_empty)
mod_timer(&mrt->ipmr_expire_timer,
c->_c.mfc_un.unres.expires);
}
@@ -1272,7 +1272,6 @@ static int ipmr_mfc_add(struct net *net, struct mr_table *mrt,
if (uc->mfc_origin == c->mfc_origin &&
uc->mfc_mcastgrp == c->mfc_mcastgrp) {
list_del(&_uc->list);
- atomic_dec(&mrt->cache_resolve_queue_len);
found = true;
break;
}
@@ -1328,7 +1327,7 @@ static void mroute_clean_tables(struct mr_table *mrt, int flags)
}
if (flags & MRT_FLUSH_MFC) {
- if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
+ if (!list_empty(&mrt->mfc_unres_queue)) {
spin_lock_bh(&mfc_unres_lock);
list_for_each_entry_safe(c, tmp, &mrt->mfc_unres_queue, list) {
list_del(&c->list);
@@ -2750,9 +2749,19 @@ static int ipmr_rtm_route(struct sk_buff *skb, struct nlmsghdr *nlh,
return ipmr_mfc_delete(tbl, &mfcc, parent);
}
+static int queue_count(struct mr_table *mrt)
+{
+ struct list_head *pos;
+ int count = 0;
+
+ list_for_each(pos, &mrt->mfc_unres_queue)
+ count++;
+ return count;
+}
+
static bool ipmr_fill_table(struct mr_table *mrt, struct sk_buff *skb)
{
- u32 queue_len = atomic_read(&mrt->cache_resolve_queue_len);
+ u32 queue_len = queue_count(mrt);
if (nla_put_u32(skb, IPMRA_TABLE_ID, mrt->id) ||
nla_put_u32(skb, IPMRA_TABLE_CACHE_RES_QUEUE_LEN, queue_len) ||
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index e80d36c5073d..d02f0f903559 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -768,8 +768,6 @@ static void ip6mr_destroy_unres(struct mr_table *mrt, struct mfc6_cache *c)
struct net *net = read_pnet(&mrt->net);
struct sk_buff *skb;
- atomic_dec(&mrt->cache_resolve_queue_len);
-
while ((skb = skb_dequeue(&c->_c.mfc_un.unres.unresolved)) != NULL) {
if (ipv6_hdr(skb)->version == 0) {
struct nlmsghdr *nlh = skb_pull(skb,
@@ -1148,8 +1146,8 @@ static int ip6mr_cache_unresolved(struct mr_table *mrt, mifi_t mifi,
* Create a new entry if allowable
*/
- if (atomic_read(&mrt->cache_resolve_queue_len) >= 10 ||
- (c = ip6mr_cache_alloc_unres()) == NULL) {
+ c = ip6mr_cache_alloc_unres();
+ if (!c) {
spin_unlock_bh(&mfc_unres_lock);
kfree_skb(skb);
@@ -1176,7 +1174,6 @@ static int ip6mr_cache_unresolved(struct mr_table *mrt, mifi_t mifi,
return err;
}
- atomic_inc(&mrt->cache_resolve_queue_len);
list_add(&c->_c.list, &mrt->mfc_unres_queue);
mr6_netlink_event(mrt, c, RTM_NEWROUTE);
@@ -1468,7 +1465,6 @@ static int ip6mr_mfc_add(struct net *net, struct mr_table *mrt,
if (ipv6_addr_equal(&uc->mf6c_origin, &c->mf6c_origin) &&
ipv6_addr_equal(&uc->mf6c_mcastgrp, &c->mf6c_mcastgrp)) {
list_del(&_uc->list);
- atomic_dec(&mrt->cache_resolve_queue_len);
found = true;
break;
}
@@ -1526,7 +1522,7 @@ static void mroute_clean_tables(struct mr_table *mrt, int flags)
}
if (flags & MRT6_FLUSH_MFC) {
- if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
+ if (!list_empty(&mrt->mfc_unres_queue)) {
spin_lock_bh(&mfc_unres_lock);
list_for_each_entry_safe(c, tmp, &mrt->mfc_unres_queue, list) {
list_del(&c->list);
--
2.19.2
^ permalink raw reply related
* Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.
From: Allan W. Nielsen @ 2019-09-03 8:14 UTC (permalink / raw)
To: Ido Schimmel
Cc: Jiri Pirko, David Miller, andrew, horatiu.vultur,
alexandre.belloni, UNGLinuxDriver, ivecera, f.fainelli, netdev,
linux-kernel
In-Reply-To: <20190903061324.GA6149@splinter>
Hi Ido,
The 09/03/2019 09:13, Ido Schimmel wrote:
> On Mon, Sep 02, 2019 at 07:42:31PM +0200, Allan W. Nielsen wrote:
> > I have been reading through this thread several times and I still do not get it.
> I kept thinking about this and I want to make sure that I correctly
> understand the end result.
So do I, and regardless of this being merged or not, I'm really happy that we
have the discussion as there are clearly a need for synchronizing the meaning of
promiscuity.
One of the reasons why we started working on this patch was the comments we got
when requesting comments on the "[PATCH] net: bridge: Allow bridge to join
multicast groups" where we want to be able to setup mac-entries for L2 multicast
MAC addresses. Here it is important for us to be able to control which ports a
static configured multicast address needs to be forwarded to (including the
CPU). When working on this patch it was pointed out that the interfaces are in
promisc mode anyway (if member of a bridge, and if flooding and learning are
enabled), and it therefore this optimization should not matter anyway.
I found that this was a fair comment, and it has been on the list of things we
wanted to "fix" for a long time.
The optimization does matter to us, as we have until now worked around the issue
by not implementing promisc mode.
When debugging, it is very useful for us to be able to see all the traffic RXed
on the interface, and to us it did make a lot of sense to "just" make promisc
mode work like we was used to.
But, this is for debugging, it is not the most important patch we have on the
backlog, and I would therefore prefer finding a solution which we are all happy
with.
> With these patches applied I assume I will see the following traffic
> when running tcpdump on one of the netdevs exposed by the ocelot driver:
>
> - Ingress: All
> - Egress: Only locally generated traffic and traffic forwarded by the
> kernel from interfaces not belonging to the ocelot driver
>
> The above means I will not see any offloaded traffic transmitted by the
> port. Is that correct?
Correct - but maybe we should change this.
In Ocelot and in LANxxxx (the part we are working on now), we can come pretty
close. We can get the offloaded TX traffic to the CPU, but it will not be
re-written (it will look like the ingress frame, which is not always the same as
the egress frame, vlan tags an others may be re-written).
In some of our chips we can actually do this (not Ocelot, and not the LANxxxx
part we are working on now) after the frame as been re-written.
> I see that the driver is setting 'offload_fwd_mark' for any traffic trapped
> from bridged ports, which means the bridge will drop it before it traverses
> the packet taps on egress.
Correct.
> Large parts of the discussion revolve around the fact that switch ports
> are not any different than other ports. Dave wrote "Please stop
> portraying switches as special in this regard" and Andrew wrote "[The
> user] just wants tcpdump to work like on their desktop."
And we are trying to get as close to this as practical possible, knowing that it
may not be exactly the same.
> But if anything, this discussion proves that switch ports are special in
> this regard and that tcpdump will not work like on the desktop.
I think it can come really close. Some drivers may be able to fix the TX issue
you point out, others may not.
> Beside the fact that I don't agree (but gave up) with the new
> interpretation of promisc mode, I wonder if we're not asking for trouble
> with this patchset. Users will see all offloaded traffic on ingress, but
> none of it on egress. This is in contrast to the sever/desktop, where
> Linux is much more dominant in comparison to switches (let alone hw
> accelerated ones) and where all the traffic is visible through tcpdump.
> I can already see myself having to explain this over and over again to
> confused users.
>
> Now, I understand that showing egress traffic is inherently difficult.
> It means one of two things:
>
> 1. We allow packets to be forwarded by both the software and the
> hardware
> 2. We trap all ingressing traffic from all the ports
If the HW cannot copy the egress traffic to the CPU (which our HW cannot), then
you need to do both. All ingress traffic needs to go to the CPU, you need to
make all the forwarding decisions in the CPU, to figure out what traffic happens
to go to the port you want to monitor.
I really doubt this will work in real life. Too much traffic, and HW may make
different forwarding decision that the SW (tc rules in HW but not in SW), which
means that it will not be good for debugging anyway.
> Both options can have devastating effects on the network and therefore
> should not be triggered by a supposedly innocent invocation of tcpdump.
Agree.
> I again wonder if it would not be wiser to solve this by introducing two
> new flags to tcpdump for ingress/egress (similar to -Q in/out) capturing
> of offloaded traffic. The capturing of egress offloaded traffic can be
> documented with the appropriate warnings.
Not sure I agree, but I will try to spend some more time considering it.
In the mean while, what TC action was it that Jiri suggestion we should use? The
trap action is no good, and it prevents the forwarding in silicon, and I'm not
aware of a "COPY-TO-CPU" action.
> Anyway, I don't want to hold you up, I merely want to make sure that the
> above (assuming it's correct) is considered before the patches are
> applied.
Sounds good, and thanks for all the time spend on reviewing and asking the
critical questions.
/Allan
^ permalink raw reply
* request for stable (was Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput)
From: Michael S. Tsirkin @ 2019-09-03 8:02 UTC (permalink / raw)
To: Stefano Garzarella
Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
virtualization, Jason Wang, kvm
In-Reply-To: <20190717113030.163499-1-sgarzare@redhat.com>
Patches 1,3 and 4 are needed for stable.
Dave, could you queue them there please?
On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> This series tries to increase the throughput of virtio-vsock with slight
> changes.
> While I was testing the v2 of this series I discovered an huge use of memory,
> so I added patch 1 to mitigate this issue. I put it in this series in order
> to better track the performance trends.
>
> v4:
> - rebased all patches on current master (conflicts is Patch 4)
> - Patch 1: added Stefan's R-b
> - Patch 3: removed lock when buf_alloc is written [David];
> moved this patch after "vsock/virtio: reduce credit update messages"
> to make it clearer
> - Patch 4: vhost_exceeds_weight() is recently introduced, so I've solved some
> conflicts
>
> v3: https://patchwork.kernel.org/cover/10970145
>
> v2: https://patchwork.kernel.org/cover/10938743
>
> v1: https://patchwork.kernel.org/cover/10885431
>
> Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> support. As Micheal suggested in the v1, I booted host and guest with 'nosmap'.
>
> A brief description of patches:
> - Patches 1: limit the memory usage with an extra copy for small packets
> - Patches 2+3: reduce the number of credit update messages sent to the
> transmitter
> - Patches 4+5: allow the host to split packets on multiple buffers and use
> VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
>
> host -> guest [Gbps]
> pkt_size before opt p 1 p 2+3 p 4+5
>
> 32 0.032 0.030 0.048 0.051
> 64 0.061 0.059 0.108 0.117
> 128 0.122 0.112 0.227 0.234
> 256 0.244 0.241 0.418 0.415
> 512 0.459 0.466 0.847 0.865
> 1K 0.927 0.919 1.657 1.641
> 2K 1.884 1.813 3.262 3.269
> 4K 3.378 3.326 6.044 6.195
> 8K 5.637 5.676 10.141 11.287
> 16K 8.250 8.402 15.976 16.736
> 32K 13.327 13.204 19.013 20.515
> 64K 21.241 21.341 20.973 21.879
> 128K 21.851 22.354 21.816 23.203
> 256K 21.408 21.693 21.846 24.088
> 512K 21.600 21.899 21.921 24.106
>
> guest -> host [Gbps]
> pkt_size before opt p 1 p 2+3 p 4+5
>
> 32 0.045 0.046 0.057 0.057
> 64 0.089 0.091 0.103 0.104
> 128 0.170 0.179 0.192 0.200
> 256 0.364 0.351 0.361 0.379
> 512 0.709 0.699 0.731 0.790
> 1K 1.399 1.407 1.395 1.427
> 2K 2.670 2.684 2.745 2.835
> 4K 5.171 5.199 5.305 5.451
> 8K 8.442 8.500 10.083 9.941
> 16K 12.305 12.259 13.519 15.385
> 32K 11.418 11.150 11.988 24.680
> 64K 10.778 10.659 11.589 35.273
> 128K 10.421 10.339 10.939 40.338
> 256K 10.300 9.719 10.508 36.562
> 512K 9.833 9.808 10.612 35.979
>
> As Stefan suggested in the v1, I measured also the efficiency in this way:
> efficiency = Mbps / (%CPU_Host + %CPU_Guest)
>
> The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
> but it's provided for free from iperf3 and could be an indication.
>
> host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt p 1 p 2+3 p 4+5
>
> 32 0.35 0.45 0.79 1.02
> 64 0.56 0.80 1.41 1.54
> 128 1.11 1.52 3.03 3.12
> 256 2.20 2.16 5.44 5.58
> 512 4.17 4.18 10.96 11.46
> 1K 8.30 8.26 20.99 20.89
> 2K 16.82 16.31 39.76 39.73
> 4K 30.89 30.79 74.07 75.73
> 8K 53.74 54.49 124.24 148.91
> 16K 80.68 83.63 200.21 232.79
> 32K 132.27 132.52 260.81 357.07
> 64K 229.82 230.40 300.19 444.18
> 128K 332.60 329.78 331.51 492.28
> 256K 331.06 337.22 339.59 511.59
> 512K 335.58 328.50 331.56 504.56
>
> guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt p 1 p 2+3 p 4+5
>
> 32 0.43 0.43 0.53 0.56
> 64 0.85 0.86 1.04 1.10
> 128 1.63 1.71 2.07 2.13
> 256 3.48 3.35 4.02 4.22
> 512 6.80 6.67 7.97 8.63
> 1K 13.32 13.31 15.72 15.94
> 2K 25.79 25.92 30.84 30.98
> 4K 50.37 50.48 58.79 59.69
> 8K 95.90 96.15 107.04 110.33
> 16K 145.80 145.43 143.97 174.70
> 32K 147.06 144.74 146.02 282.48
> 64K 145.25 143.99 141.62 406.40
> 128K 149.34 146.96 147.49 489.34
> 256K 156.35 149.81 152.21 536.37
> 512K 151.65 150.74 151.52 519.93
>
> [1] https://github.com/stefano-garzarella/iperf/
>
> Stefano Garzarella (5):
> vsock/virtio: limit the memory used per-socket
> vsock/virtio: reduce credit update messages
> vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
> vhost/vsock: split packets to send using multiple buffers
> vsock/virtio: change the maximum packet size allowed
>
> drivers/vhost/vsock.c | 68 ++++++++++++-----
> include/linux/virtio_vsock.h | 4 +-
> net/vmw_vsock/virtio_transport.c | 1 +
> net/vmw_vsock/virtio_transport_common.c | 99 ++++++++++++++++++++-----
> 4 files changed, 134 insertions(+), 38 deletions(-)
>
> --
> 2.20.1
^ permalink raw reply
* Re: [PATCH] hostap: remove set but not used variable 'copied' in prism2_io_debug_proc_read
From: zhong jiang @ 2019-09-03 8:01 UTC (permalink / raw)
To: zhong jiang; +Cc: kvalo, davem, linux-wireless, netdev, linux-kernel
In-Reply-To: <1567497430-22539-1-git-send-email-zhongjiang@huawei.com>
Please ignore the patch. Because the hostap_proc.c is marked as 'obsolete'.
Thanks,
zhong jiang
On 2019/9/3 15:57, zhong jiang wrote:
> Obviously, variable 'copied' is initialized to zero. But it is not used.
> hence just remove it.
>
> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
> ---
> drivers/net/wireless/intersil/hostap/hostap_proc.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/net/wireless/intersil/hostap/hostap_proc.c b/drivers/net/wireless/intersil/hostap/hostap_proc.c
> index 703d74c..6151d8d 100644
> --- a/drivers/net/wireless/intersil/hostap/hostap_proc.c
> +++ b/drivers/net/wireless/intersil/hostap/hostap_proc.c
> @@ -234,7 +234,7 @@ static int prism2_io_debug_proc_read(char *page, char **start, off_t off,
> {
> local_info_t *local = (local_info_t *) data;
> int head = local->io_debug_head;
> - int start_bytes, left, copy, copied;
> + int start_bytes, left, copy;
>
> if (off + count > PRISM2_IO_DEBUG_SIZE * 4) {
> *eof = 1;
> @@ -243,7 +243,6 @@ static int prism2_io_debug_proc_read(char *page, char **start, off_t off,
> count = PRISM2_IO_DEBUG_SIZE * 4 - off;
> }
>
> - copied = 0;
> start_bytes = (PRISM2_IO_DEBUG_SIZE - head) * 4;
> left = count;
>
^ permalink raw reply
* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
From: Stefano Garzarella @ 2019-09-03 8:00 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
virtualization, Jason Wang, kvm
In-Reply-To: <20190903034747-mutt-send-email-mst@kernel.org>
On Tue, Sep 03, 2019 at 03:52:24AM -0400, Michael S. Tsirkin wrote:
> On Tue, Sep 03, 2019 at 09:45:54AM +0200, Stefano Garzarella wrote:
> > On Tue, Sep 03, 2019 at 12:39:19AM -0400, Michael S. Tsirkin wrote:
> > > On Mon, Sep 02, 2019 at 11:57:23AM +0200, Stefano Garzarella wrote:
> > > > >
> > > > > Assuming we miss nothing and buffers < 4K are broken,
> > > > > I think we need to add this to the spec, possibly with
> > > > > a feature bit to relax the requirement that all buffers
> > > > > are at least 4k in size.
> > > > >
> > > >
> > > > Okay, should I send a proposal to virtio-dev@lists.oasis-open.org?
> > >
> > > How about we also fix the bug for now?
> >
> > This series unintentionally fix the bug because we are introducing a way
> > to split the packet depending on the buffer size ([PATCH 4/5] vhost/vsock:
> > split packets to send using multiple buffers) and we removed the limit
> > to 4K buffers ([PATCH 5/5] vsock/virtio: change the maximum packet size
> > allowed).
> >
> > I discovered that there was a bug while we discussed memory accounting.
> >
> > Do you think it's enough while we introduce the feature bit in the spec?
> >
> > Thanks,
> > Stefano
>
> Well locking is also broken (patch 3/5). It seems that 3/5 and 4/5 work
> by themselves, right? So how about we ask Dave to send these to stable?
Yes, they work by themselves and I agree that should be send to stable.
> Also, how about 1/5? Also needed for stable?
I think so, without this patch if we flood the guest with 1-byte packets,
we can consume ~ 1 GB of guest memory per-socket.
Thanks,
Stefano
^ permalink raw reply
* [PATCH] hostap: remove set but not used variable 'copied' in prism2_io_debug_proc_read
From: zhong jiang @ 2019-09-03 7:57 UTC (permalink / raw)
To: kvalo, davem; +Cc: zhongjiang, linux-wireless, netdev, linux-kernel
Obviously, variable 'copied' is initialized to zero. But it is not used.
hence just remove it.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
drivers/net/wireless/intersil/hostap/hostap_proc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/wireless/intersil/hostap/hostap_proc.c b/drivers/net/wireless/intersil/hostap/hostap_proc.c
index 703d74c..6151d8d 100644
--- a/drivers/net/wireless/intersil/hostap/hostap_proc.c
+++ b/drivers/net/wireless/intersil/hostap/hostap_proc.c
@@ -234,7 +234,7 @@ static int prism2_io_debug_proc_read(char *page, char **start, off_t off,
{
local_info_t *local = (local_info_t *) data;
int head = local->io_debug_head;
- int start_bytes, left, copy, copied;
+ int start_bytes, left, copy;
if (off + count > PRISM2_IO_DEBUG_SIZE * 4) {
*eof = 1;
@@ -243,7 +243,6 @@ static int prism2_io_debug_proc_read(char *page, char **start, off_t off,
count = PRISM2_IO_DEBUG_SIZE * 4 - off;
}
- copied = 0;
start_bytes = (PRISM2_IO_DEBUG_SIZE - head) * 4;
left = count;
--
1.7.12.4
^ permalink raw reply related
* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
From: Michael S. Tsirkin @ 2019-09-03 7:52 UTC (permalink / raw)
To: Stefano Garzarella
Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
virtualization, Jason Wang, kvm
In-Reply-To: <20190903074554.mq6spyivftuodahy@steredhat>
On Tue, Sep 03, 2019 at 09:45:54AM +0200, Stefano Garzarella wrote:
> On Tue, Sep 03, 2019 at 12:39:19AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Sep 02, 2019 at 11:57:23AM +0200, Stefano Garzarella wrote:
> > > >
> > > > Assuming we miss nothing and buffers < 4K are broken,
> > > > I think we need to add this to the spec, possibly with
> > > > a feature bit to relax the requirement that all buffers
> > > > are at least 4k in size.
> > > >
> > >
> > > Okay, should I send a proposal to virtio-dev@lists.oasis-open.org?
> >
> > How about we also fix the bug for now?
>
> This series unintentionally fix the bug because we are introducing a way
> to split the packet depending on the buffer size ([PATCH 4/5] vhost/vsock:
> split packets to send using multiple buffers) and we removed the limit
> to 4K buffers ([PATCH 5/5] vsock/virtio: change the maximum packet size
> allowed).
>
> I discovered that there was a bug while we discussed memory accounting.
>
> Do you think it's enough while we introduce the feature bit in the spec?
>
> Thanks,
> Stefano
Well locking is also broken (patch 3/5). It seems that 3/5 and 4/5 work
by themselves, right? So how about we ask Dave to send these to stable?
Also, how about 1/5? Also needed for stable?
--
MST
^ permalink raw reply
* Re: [PATCH net-next] vsock/virtio: a better comment on credit update
From: Stefano Garzarella @ 2019-09-03 7:47 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, Stefan Hajnoczi, David S. Miller, kvm,
virtualization, netdev
In-Reply-To: <20190903073748.25214-1-mst@redhat.com>
On Tue, Sep 03, 2019 at 03:38:16AM -0400, Michael S. Tsirkin wrote:
> The comment we have is just repeating what the code does.
> Include the *reason* for the condition instead.
>
> Cc: Stefano Garzarella <sgarzare@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> net/vmw_vsock/virtio_transport_common.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 94cc0fa3e848..5bb70c692b1e 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -307,8 +307,13 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
>
> spin_unlock_bh(&vvs->rx_lock);
>
> - /* We send a credit update only when the space available seen
> - * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE
> + /* To reduce the number of credit update messages,
> + * don't update credits as long as lots of space is available.
> + * Note: the limit chosen here is arbitrary. Setting the limit
> + * too high causes extra messages. Too low causes transmitter
> + * stalls. As stalls are in theory more expensive than extra
> + * messages, we set the limit to a high value. TODO: experiment
> + * with different values.
> */
> if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
> virtio_transport_send_credit_update(vsk,
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
^ permalink raw reply
* Re: [PATCH] Clock-independent TCP ISN generation
From: kbuild test robot @ 2019-09-03 7:45 UTC (permalink / raw)
To: Cyrus Sh; +Cc: kbuild-all, davem, shiraz.saleem, jgg, arnd, netdev, sirus
In-Reply-To: <70c41960-6d14-3943-31ca-75598ad3d2d7@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 7349 bytes --]
Hi Cyrus,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linus/master]
[cannot apply to v5.3-rc7 next-20190902]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Cyrus-Sh/Clock-independent-TCP-ISN-generation/20190903-131719
config: x86_64-randconfig-e003-201935 (attached as .config)
compiler: gcc-7 (Debian 7.4.0-11) 7.4.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
>> kernel/sysctl_binary.c:335:42: error: expected '}' before string constant
{CTL_INT, NET_IPV4_TCP_RANDOM_ISN "tcp_random_isn"}
^~~~~~~~~~~~~~~~
>> kernel/sysctl_binary.c:336:2: error: expected '}' before '{' token
{CTL_INT, NET_IPV4_FORWARD, "ip_forward" },
^
vim +335 kernel/sysctl_binary.c
333
334 static const struct bin_table bin_net_ipv4_table[] = {
> 335 {CTL_INT, NET_IPV4_TCP_RANDOM_ISN "tcp_random_isn"}
> 336 {CTL_INT, NET_IPV4_FORWARD, "ip_forward" },
337
338 { CTL_DIR, NET_IPV4_CONF, "conf", bin_net_ipv4_conf_table },
339 { CTL_DIR, NET_IPV4_NEIGH, "neigh", bin_net_neigh_table },
340 { CTL_DIR, NET_IPV4_ROUTE, "route", bin_net_ipv4_route_table },
341 /* NET_IPV4_FIB_HASH unused */
342 { CTL_DIR, NET_IPV4_NETFILTER, "netfilter", bin_net_ipv4_netfilter_table },
343
344 { CTL_INT, NET_IPV4_TCP_TIMESTAMPS, "tcp_timestamps" },
345 { CTL_INT, NET_IPV4_TCP_WINDOW_SCALING, "tcp_window_scaling" },
346 { CTL_INT, NET_IPV4_TCP_SACK, "tcp_sack" },
347 { CTL_INT, NET_IPV4_TCP_RETRANS_COLLAPSE, "tcp_retrans_collapse" },
348 { CTL_INT, NET_IPV4_DEFAULT_TTL, "ip_default_ttl" },
349 /* NET_IPV4_AUTOCONFIG unused */
350 { CTL_INT, NET_IPV4_NO_PMTU_DISC, "ip_no_pmtu_disc" },
351 { CTL_INT, NET_IPV4_NONLOCAL_BIND, "ip_nonlocal_bind" },
352 { CTL_INT, NET_IPV4_TCP_SYN_RETRIES, "tcp_syn_retries" },
353 { CTL_INT, NET_TCP_SYNACK_RETRIES, "tcp_synack_retries" },
354 { CTL_INT, NET_TCP_MAX_ORPHANS, "tcp_max_orphans" },
355 { CTL_INT, NET_TCP_MAX_TW_BUCKETS, "tcp_max_tw_buckets" },
356 { CTL_INT, NET_IPV4_DYNADDR, "ip_dynaddr" },
357 { CTL_INT, NET_IPV4_TCP_KEEPALIVE_TIME, "tcp_keepalive_time" },
358 { CTL_INT, NET_IPV4_TCP_KEEPALIVE_PROBES, "tcp_keepalive_probes" },
359 { CTL_INT, NET_IPV4_TCP_KEEPALIVE_INTVL, "tcp_keepalive_intvl" },
360 { CTL_INT, NET_IPV4_TCP_RETRIES1, "tcp_retries1" },
361 { CTL_INT, NET_IPV4_TCP_RETRIES2, "tcp_retries2" },
362 { CTL_INT, NET_IPV4_TCP_FIN_TIMEOUT, "tcp_fin_timeout" },
363 { CTL_INT, NET_TCP_SYNCOOKIES, "tcp_syncookies" },
364 { CTL_INT, NET_TCP_TW_RECYCLE, "tcp_tw_recycle" },
365 { CTL_INT, NET_TCP_ABORT_ON_OVERFLOW, "tcp_abort_on_overflow" },
366 { CTL_INT, NET_TCP_STDURG, "tcp_stdurg" },
367 { CTL_INT, NET_TCP_RFC1337, "tcp_rfc1337" },
368 { CTL_INT, NET_TCP_MAX_SYN_BACKLOG, "tcp_max_syn_backlog" },
369 { CTL_INT, NET_IPV4_LOCAL_PORT_RANGE, "ip_local_port_range" },
370 { CTL_INT, NET_IPV4_IGMP_MAX_MEMBERSHIPS, "igmp_max_memberships" },
371 { CTL_INT, NET_IPV4_IGMP_MAX_MSF, "igmp_max_msf" },
372 { CTL_INT, NET_IPV4_INET_PEER_THRESHOLD, "inet_peer_threshold" },
373 { CTL_INT, NET_IPV4_INET_PEER_MINTTL, "inet_peer_minttl" },
374 { CTL_INT, NET_IPV4_INET_PEER_MAXTTL, "inet_peer_maxttl" },
375 { CTL_INT, NET_IPV4_INET_PEER_GC_MINTIME, "inet_peer_gc_mintime" },
376 { CTL_INT, NET_IPV4_INET_PEER_GC_MAXTIME, "inet_peer_gc_maxtime" },
377 { CTL_INT, NET_TCP_ORPHAN_RETRIES, "tcp_orphan_retries" },
378 { CTL_INT, NET_TCP_FACK, "tcp_fack" },
379 { CTL_INT, NET_TCP_REORDERING, "tcp_reordering" },
380 { CTL_INT, NET_TCP_ECN, "tcp_ecn" },
381 { CTL_INT, NET_TCP_DSACK, "tcp_dsack" },
382 { CTL_INT, NET_TCP_MEM, "tcp_mem" },
383 { CTL_INT, NET_TCP_WMEM, "tcp_wmem" },
384 { CTL_INT, NET_TCP_RMEM, "tcp_rmem" },
385 { CTL_INT, NET_TCP_APP_WIN, "tcp_app_win" },
386 { CTL_INT, NET_TCP_ADV_WIN_SCALE, "tcp_adv_win_scale" },
387 { CTL_INT, NET_TCP_TW_REUSE, "tcp_tw_reuse" },
388 { CTL_INT, NET_TCP_FRTO, "tcp_frto" },
389 { CTL_INT, NET_TCP_FRTO_RESPONSE, "tcp_frto_response" },
390 { CTL_INT, NET_TCP_LOW_LATENCY, "tcp_low_latency" },
391 { CTL_INT, NET_TCP_NO_METRICS_SAVE, "tcp_no_metrics_save" },
392 { CTL_INT, NET_TCP_MODERATE_RCVBUF, "tcp_moderate_rcvbuf" },
393 { CTL_INT, NET_TCP_TSO_WIN_DIVISOR, "tcp_tso_win_divisor" },
394 { CTL_STR, NET_TCP_CONG_CONTROL, "tcp_congestion_control" },
395 { CTL_INT, NET_TCP_MTU_PROBING, "tcp_mtu_probing" },
396 { CTL_INT, NET_TCP_BASE_MSS, "tcp_base_mss" },
397 { CTL_INT, NET_IPV4_TCP_WORKAROUND_SIGNED_WINDOWS, "tcp_workaround_signed_windows" },
398 { CTL_INT, NET_TCP_SLOW_START_AFTER_IDLE, "tcp_slow_start_after_idle" },
399 { CTL_INT, NET_CIPSOV4_CACHE_ENABLE, "cipso_cache_enable" },
400 { CTL_INT, NET_CIPSOV4_CACHE_BUCKET_SIZE, "cipso_cache_bucket_size" },
401 { CTL_INT, NET_CIPSOV4_RBM_OPTFMT, "cipso_rbm_optfmt" },
402 { CTL_INT, NET_CIPSOV4_RBM_STRICTVALID, "cipso_rbm_strictvalid" },
403 /* NET_TCP_AVAIL_CONG_CONTROL "tcp_available_congestion_control" no longer used */
404 { CTL_STR, NET_TCP_ALLOWED_CONG_CONTROL, "tcp_allowed_congestion_control" },
405 { CTL_INT, NET_TCP_MAX_SSTHRESH, "tcp_max_ssthresh" },
406
407 { CTL_INT, NET_IPV4_ICMP_ECHO_IGNORE_ALL, "icmp_echo_ignore_all" },
408 { CTL_INT, NET_IPV4_ICMP_ECHO_IGNORE_BROADCASTS, "icmp_echo_ignore_broadcasts" },
409 { CTL_INT, NET_IPV4_ICMP_IGNORE_BOGUS_ERROR_RESPONSES, "icmp_ignore_bogus_error_responses" },
410 { CTL_INT, NET_IPV4_ICMP_ERRORS_USE_INBOUND_IFADDR, "icmp_errors_use_inbound_ifaddr" },
411 { CTL_INT, NET_IPV4_ICMP_RATELIMIT, "icmp_ratelimit" },
412 { CTL_INT, NET_IPV4_ICMP_RATEMASK, "icmp_ratemask" },
413
414 { CTL_INT, NET_IPV4_IPFRAG_HIGH_THRESH, "ipfrag_high_thresh" },
415 { CTL_INT, NET_IPV4_IPFRAG_LOW_THRESH, "ipfrag_low_thresh" },
416 { CTL_INT, NET_IPV4_IPFRAG_TIME, "ipfrag_time" },
417
418 { CTL_INT, NET_IPV4_IPFRAG_SECRET_INTERVAL, "ipfrag_secret_interval" },
419 /* NET_IPV4_IPFRAG_MAX_DIST "ipfrag_max_dist" no longer used */
420
421 { CTL_INT, 2088 /* NET_IPQ_QMAX */, "ip_queue_maxlen" },
422
423 /* NET_TCP_DEFAULT_WIN_SCALE unused */
424 /* NET_TCP_BIC_BETA unused */
425 /* NET_IPV4_TCP_MAX_KA_PROBES unused */
426 /* NET_IPV4_IP_MASQ_DEBUG unused */
427 /* NET_TCP_SYN_TAILDROP unused */
428 /* NET_IPV4_ICMP_SOURCEQUENCH_RATE unused */
429 /* NET_IPV4_ICMP_DESTUNREACH_RATE unused */
430 /* NET_IPV4_ICMP_TIMEEXCEED_RATE unused */
431 /* NET_IPV4_ICMP_PARAMPROB_RATE unused */
432 /* NET_IPV4_ICMP_ECHOREPLY_RATE unused */
433 /* NET_IPV4_ALWAYS_DEFRAG unused */
434 {}
435 };
436
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 24350 bytes --]
^ permalink raw reply
* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
From: Stefano Garzarella @ 2019-09-03 7:45 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
virtualization, Jason Wang, kvm
In-Reply-To: <20190903003823-mutt-send-email-mst@kernel.org>
On Tue, Sep 03, 2019 at 12:39:19AM -0400, Michael S. Tsirkin wrote:
> On Mon, Sep 02, 2019 at 11:57:23AM +0200, Stefano Garzarella wrote:
> > >
> > > Assuming we miss nothing and buffers < 4K are broken,
> > > I think we need to add this to the spec, possibly with
> > > a feature bit to relax the requirement that all buffers
> > > are at least 4k in size.
> > >
> >
> > Okay, should I send a proposal to virtio-dev@lists.oasis-open.org?
>
> How about we also fix the bug for now?
This series unintentionally fix the bug because we are introducing a way
to split the packet depending on the buffer size ([PATCH 4/5] vhost/vsock:
split packets to send using multiple buffers) and we removed the limit
to 4K buffers ([PATCH 5/5] vsock/virtio: change the maximum packet size
allowed).
I discovered that there was a bug while we discussed memory accounting.
Do you think it's enough while we introduce the feature bit in the spec?
Thanks,
Stefano
^ permalink raw reply
* Re: [PATCH] Clock-independent TCP ISN generation
From: Eric Dumazet @ 2019-09-03 7:41 UTC (permalink / raw)
To: Cyrus Sh, davem; +Cc: shiraz.saleem, jgg, arnd, netdev, sirus
In-Reply-To: <70c41960-6d14-3943-31ca-75598ad3d2d7@gmail.com>
On 9/3/19 7:06 AM, Cyrus Sh wrote:
> This patch addresses the privacy issue of TCP ISN generation in Linux
> kernel. Currently an adversary can deanonymize a user behind an anonymity
> network by inducing a load pattern on the target machine and correlating
> its clock skew with the pattern. Since the kernel adds a clock-based
> counter to generated ISNs, the adversary can observe SYN packets with
> similar IP and port numbers to find out the clock skew of the target
> machine and this can help them identify the user. To resolve this problem
> I have changed the related function to generate the initial sequence
> numbers randomly and independent from the cpu clock. This feature is
> controlled by a new sysctl option called "tcp_random_isn" which I've added
> to the kernel. Once enabled the initial sequence numbers are guaranteed to
> be generated independently from each other and from the hardware clock of
> the machine. If the option is off, ISNs are generated as before. To get
> more information about this patch and its effectiveness you can refer to my
> post here:
> https://bitguard.wordpress.com/?p=982
<quote>
Firstly it’s unlikely that this happens at all,
</quote>
Sorry this happens all the time.
Some people use very disturbing setups really, and they are not trying to be malicious.
Clock skew seems quite secondary. Some firewall rules should prevent this kind of attacks ?
> and to see a discussion about the issue you can read this:
> https://trac.torproject.org/projects/tor/ticket/16659
Four years old discussion. Does not seem urgent matter :/
>
> Signed-off-by: Sirus Shahini <sirus.shahini@gmail.com>
> ---
> include/net/tcp.h | 1 +
> include/uapi/linux/sysctl.h | 1 +
> kernel/sysctl_binary.c | 1 +
> net/core/secure_seq.c | 24 +++++++++++++++++++++++-
> net/ipv4/sysctl_net_ipv4.c | 7 +++++++
> net/ipv4/tcp_input.c | 1 +
> 6 files changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 81e8ade..4ad1bbf 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -241,6 +241,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
>
> /* sysctl variables for tcp */
> extern int sysctl_tcp_max_orphans;
> +extern int sysctl_tcp_random_isn;
All TCP sysctls are per netns these days. (Look in include/net/netns/ipv4.h )
> extern long sysctl_tcp_mem[3];
>
> #define TCP_RACK_LOSS_DETECTION 0x1 /* Use RACK to detect losses */
> diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
> index 87aa2a6..ba8927e 100644
> --- a/include/uapi/linux/sysctl.h
> +++ b/include/uapi/linux/sysctl.h
> @@ -426,6 +426,7 @@ enum
> NET_TCP_ALLOWED_CONG_CONTROL=123,
> NET_TCP_MAX_SSTHRESH=124,
> NET_TCP_FRTO_RESPONSE=125,
> + NET_IPV4_TCP_RANDOM_ISN=126,
Nope, we do not add new sysctls there anymore.
Everybody should now have /proc files to tune the values.
> };
>
> enum {
> diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c
> index 73c1320..0faf7d4 100644
> --- a/kernel/sysctl_binary.c
> +++ b/kernel/sysctl_binary.c
> @@ -332,6 +332,7 @@ static const struct bin_table bin_net_ipv4_netfilter_table[] = {
> };
>
> static const struct bin_table bin_net_ipv4_table[] = {
> + {CTL_INT, NET_IPV4_TCP_RANDOM_ISN "tcp_random_isn"}
Same remark. We no longer add stuff there.
> {CTL_INT, NET_IPV4_FORWARD, "ip_forward" },
>
> { CTL_DIR, NET_IPV4_CONF, "conf", bin_net_ipv4_conf_table },
> diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
> index 7b6b1d2..b644bbe 100644
> --- a/net/core/secure_seq.c
> +++ b/net/core/secure_seq.c
> @@ -22,6 +22,7 @@
>
> static siphash_key_t net_secret __read_mostly;
> static siphash_key_t ts_secret __read_mostly;
> +static siphash_key_t last_secret = {{0,0}} ;
>
> static __always_inline void net_secret_init(void)
> {
> @@ -134,8 +135,29 @@ u32 secure_tcp_seq(__be32 saddr, __be32 daddr,
> __be16 sport, __be16 dport)
> {
> u32 hash;
> -
> + u32 temp;
> +
> net_secret_init();
> +
> + if (sysctl_tcp_random_isn){
> + if (!last_secret.key[0] && !last_secret.key[1]){
> + memcpy(&last_secret,&net_secret,sizeof(last_secret));
> +
> + }else{
Check your patch using scripts/checkpatch.pl
All these missing spaces should be spotted.
> + temp = *((u32*)&(net_secret.key[0]));
> + temp >>= 8;
> + last_secret.key[0]+=temp;
> + temp = *((u32*)&(net_secret.key[1]));
> + temp >>= 8;
> + last_secret.key[1]+=temp;
Why not simply use an official random generator, instead of these convoluted
and not SMP safe attempts ?
Check drivers/char/random.c for officially maintained generators.
> + }
> + hash = siphash_3u32((__force u32)saddr, (__force u32)daddr,
> + (__force u32)sport << 16 | (__force u32)dport,
> + &last_secret);
> + return hash;
> + }
> +
> +
> hash = siphash_3u32((__force u32)saddr, (__force u32)daddr,
> (__force u32)sport << 16 | (__force u32)dport,
> &net_secret);
^ permalink raw reply
* Re: [PATCH v4 2/5] vsock/virtio: reduce credit update messages
From: Michael S. Tsirkin @ 2019-09-03 7:38 UTC (permalink / raw)
To: Stefano Garzarella
Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
virtualization, Jason Wang, kvm
In-Reply-To: <20190903073120.kefllalytkvidcvh@steredhat>
On Tue, Sep 03, 2019 at 09:31:20AM +0200, Stefano Garzarella wrote:
> On Tue, Sep 03, 2019 at 12:38:02AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 17, 2019 at 01:30:27PM +0200, Stefano Garzarella wrote:
> > > In order to reduce the number of credit update messages,
> > > we send them only when the space available seen by the
> > > transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
> > >
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > ---
> > > include/linux/virtio_vsock.h | 1 +
> > > net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++++---
> > > 2 files changed, 14 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > > index 7d973903f52e..49fc9d20bc43 100644
> > > --- a/include/linux/virtio_vsock.h
> > > +++ b/include/linux/virtio_vsock.h
> > > @@ -41,6 +41,7 @@ struct virtio_vsock_sock {
> > >
> > > /* Protected by rx_lock */
> > > u32 fwd_cnt;
> > > + u32 last_fwd_cnt;
> > > u32 rx_bytes;
> > > struct list_head rx_queue;
> > > };
> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > index 095221f94786..a85559d4d974 100644
> > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > @@ -211,6 +211,7 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
> > > void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
> > > {
> > > spin_lock_bh(&vvs->tx_lock);
> > > + vvs->last_fwd_cnt = vvs->fwd_cnt;
> > > pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
> > > pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
> > > spin_unlock_bh(&vvs->tx_lock);
> > > @@ -261,6 +262,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> > > struct virtio_vsock_sock *vvs = vsk->trans;
> > > struct virtio_vsock_pkt *pkt;
> > > size_t bytes, total = 0;
> > > + u32 free_space;
> > > int err = -EFAULT;
> > >
> > > spin_lock_bh(&vvs->rx_lock);
> > > @@ -291,11 +293,19 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> > > virtio_transport_free_pkt(pkt);
> > > }
> > > }
> > > +
> > > + free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
> > > +
> > > spin_unlock_bh(&vvs->rx_lock);
> > >
> > > - /* Send a credit pkt to peer */
> > > - virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
> > > - NULL);
> > > + /* We send a credit update only when the space available seen
> > > + * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE
> >
> > This is just repeating what code does though.
> > Please include the *reason* for the condition.
> > E.g. here's a better comment:
> >
> > /* To reduce number of credit update messages,
> > * don't update credits as long as lots of space is available.
> > * Note: the limit chosen here is arbitrary. Setting the limit
> > * too high causes extra messages. Too low causes transmitter
> > * stalls. As stalls are in theory more expensive than extra
> > * messages, we set the limit to a high value. TODO: experiment
> > * with different values.
> > */
> >
>
> Yes, it is better, sorry for that. I'll try to avoid unnecessary comments,
> explaining the reason for certain changes.
>
> Since this patch is already queued in net-next, should I send another
> patch to fix the comment?
>
> Thanks,
> Stefano
I just sent a patch like that, pls ack it.
--
MST
^ permalink raw reply
* [PATCH net-next] vsock/virtio: a better comment on credit update
From: Michael S. Tsirkin @ 2019-09-03 7:38 UTC (permalink / raw)
To: linux-kernel
Cc: Stefano Garzarella, Stefan Hajnoczi, David S. Miller, kvm,
virtualization, netdev
The comment we have is just repeating what the code does.
Include the *reason* for the condition instead.
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
net/vmw_vsock/virtio_transport_common.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 94cc0fa3e848..5bb70c692b1e 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -307,8 +307,13 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
spin_unlock_bh(&vvs->rx_lock);
- /* We send a credit update only when the space available seen
- * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE
+ /* To reduce the number of credit update messages,
+ * don't update credits as long as lots of space is available.
+ * Note: the limit chosen here is arbitrary. Setting the limit
+ * too high causes extra messages. Too low causes transmitter
+ * stalls. As stalls are in theory more expensive than extra
+ * messages, we set the limit to a high value. TODO: experiment
+ * with different values.
*/
if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
virtio_transport_send_credit_update(vsk,
--
MST
^ permalink raw reply related
* RE: [PATCH net-next] r8152: modify rtl8152_set_speed function
From: Hayes Wang @ 2019-09-03 7:36 UTC (permalink / raw)
To: Heiner Kallweit, netdev@vger.kernel.org
Cc: nic_swsd, linux-kernel@vger.kernel.org
In-Reply-To: <32d490ae-70af-ba86-93de-be342a2a7e39@gmail.com>
Heiner Kallweit [mailto:hkallweit1@gmail.com]
> Sent: Tuesday, September 03, 2019 3:13 PM
[...]
> > Some of our chips support the feature of UPS. When satisfying certain
> > condition, the hw would recover the settings of speed. Therefore, I have
> > to record the settings of the speed, and set them to hw.
> >
> Not knowing the UPS feature in detail:
> In net-next I changed the software "PHY speed-down" implementation to
> be more generic. It stores the old advertised settings in a new
> phy_device member adv_old, and restores them in phy_speed_up().
> Maybe what you need is similar.
It is a feature about power saving. When some conditions are
satisfied, the power of PHY would be cut. And the hw would
restore the PHY settings including the speed automatically,
when leaving power saving mode.
Best Regards,
Hayes
^ permalink raw reply
* Re: [PATCH v3] tun: fix use-after-free when register netdev failed
From: Yang Yingliang @ 2019-09-03 7:35 UTC (permalink / raw)
To: Jason Wang
Cc: David Miller, netdev, eric dumazet, xiyou wangcong, weiyongjun1
In-Reply-To: <71e17457-d4bc-15be-dfb3-d0a977fd7556@redhat.com>
On 2019/9/3 14:06, Jason Wang wrote:
>
> On 2019/9/3 下午1:42, Yang Yingliang wrote:
>>
>>
>> On 2019/9/3 11:03, Jason Wang wrote:
>>>
>>> On 2019/9/3 上午9:45, Yang Yingliang wrote:
>>>>
>>>>
>>>> On 2019/9/2 13:32, Jason Wang wrote:
>>>>>
>>>>> On 2019/8/23 下午5:36, Yang Yingliang wrote:
>>>>>>
>>>>>>
>>>>>> On 2019/8/23 11:05, Jason Wang wrote:
>>>>>>> ----- Original Message -----
>>>>>>>>
>>>>>>>> On 2019/8/22 14:07, Yang Yingliang wrote:
>>>>>>>>>
>>>>>>>>> On 2019/8/22 10:13, Jason Wang wrote:
>>>>>>>>>> On 2019/8/20 上午10:28, Jason Wang wrote:
>>>>>>>>>>> On 2019/8/20 上午9:25, David Miller wrote:
>>>>>>>>>>>> From: Yang Yingliang <yangyingliang@huawei.com>
>>>>>>>>>>>> Date: Mon, 19 Aug 2019 21:31:19 +0800
>>>>>>>>>>>>
>>>>>>>>>>>>> Call tun_attach() after register_netdevice() to make sure
>>>>>>>>>>>>> tfile->tun
>>>>>>>>>>>>> is not published until the netdevice is registered. So the
>>>>>>>>>>>>> read/write
>>>>>>>>>>>>> thread can not use the tun pointer that may freed by
>>>>>>>>>>>>> free_netdev().
>>>>>>>>>>>>> (The tun and dev pointer are allocated by
>>>>>>>>>>>>> alloc_netdev_mqs(), they
>>>>>>>>>>>>> can
>>>>>>>>>>>>> be freed by netdev_freemem().)
>>>>>>>>>>>> register_netdevice() must always be the last operation in
>>>>>>>>>>>> the order of
>>>>>>>>>>>> network device setup.
>>>>>>>>>>>>
>>>>>>>>>>>> At the point register_netdevice() is called, the device is
>>>>>>>>>>>> visible
>>>>>>>>>>>> globally
>>>>>>>>>>>> and therefore all of it's software state must be fully
>>>>>>>>>>>> initialized and
>>>>>>>>>>>> ready for us.
>>>>>>>>>>>>
>>>>>>>>>>>> You're going to have to find another solution to these
>>>>>>>>>>>> problems.
>>>>>>>>>>>
>>>>>>>>>>> The device is loosely coupled with sockets/queues. Each side is
>>>>>>>>>>> allowed to be go away without caring the other side. So in this
>>>>>>>>>>> case, there's a small window that network stack think the
>>>>>>>>>>> device has
>>>>>>>>>>> one queue but actually not, the code can then safely drop them.
>>>>>>>>>>> Maybe it's ok here with some comments?
>>>>>>>>>>>
>>>>>>>>>>> Or if not, we can try to hold the device before tun_attach
>>>>>>>>>>> and drop
>>>>>>>>>>> it after register_netdevice().
>>>>>>>>>>
>>>>>>>>>> Hi Yang:
>>>>>>>>>>
>>>>>>>>>> I think maybe we can try to hold refcnt instead of playing
>>>>>>>>>> real num
>>>>>>>>>> queues here. Do you want to post a V4?
>>>>>>>>> I think the refcnt can prevent freeing the memory in this case.
>>>>>>>>> When register_netdevice() failed, free_netdev() will be called
>>>>>>>>> directly,
>>>>>>>>> dev->pcpu_refcnt and dev are freed without checking refcnt of
>>>>>>>>> dev.
>>>>>>>> How about using patch-v1 that using a flag to check whether the
>>>>>>>> device
>>>>>>>> registered successfully.
>>>>>>>>
>>>>>>> As I said, it lacks sufficient locks or barriers. To be clear, I
>>>>>>> meant
>>>>>>> something like (compile-test only):
>>>>>>>
>>>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>>>>> index db16d7a13e00..e52678f9f049 100644
>>>>>>> --- a/drivers/net/tun.c
>>>>>>> +++ b/drivers/net/tun.c
>>>>>>> @@ -2828,6 +2828,7 @@ static int tun_set_iff(struct net *net,
>>>>>>> struct file *file, struct ifreq *ifr)
>>>>>>> (ifr->ifr_flags & TUN_FEATURES);
>>>>>>> INIT_LIST_HEAD(&tun->disabled);
>>>>>>> + dev_hold(dev);
>>>>>>> err = tun_attach(tun, file, false,
>>>>>>> ifr->ifr_flags & IFF_NAPI,
>>>>>>> ifr->ifr_flags & IFF_NAPI_FRAGS);
>>>>>>> if (err < 0)
>>>>>>> @@ -2836,6 +2837,7 @@ static int tun_set_iff(struct net *net,
>>>>>>> struct file *file, struct ifreq *ifr)
>>>>>>> err = register_netdevice(tun->dev);
>>>>>>> if (err < 0)
>>>>>>> goto err_detach;
>>>>>>> + dev_put(dev);
>>>>>>> }
>>>>>>> netif_carrier_on(tun->dev);
>>>>>>> @@ -2852,11 +2854,13 @@ static int tun_set_iff(struct net *net,
>>>>>>> struct file *file, struct ifreq *ifr)
>>>>>>> return 0;
>>>>>>> err_detach:
>>>>>>> + dev_put(dev);
>>>>>>> tun_detach_all(dev);
>>>>>>> /* register_netdevice() already called
>>>>>>> tun_free_netdev() */
>>>>>>> goto err_free_dev;
>>>>>>> err_free_flow:
>>>>>>> + dev_put(dev);
>>>>>>> tun_flow_uninit(tun);
>>>>>>> security_tun_dev_free_security(tun->security);
>>>>>>> err_free_stat:
>>>>>>>
>>>>>>> What's your thought?
>>>>>>
>>>>>> The dev pointer are freed without checking the refcount in
>>>>>> free_netdev() called by err_free_dev
>>>>>>
>>>>>> path, so I don't understand how the refcount protects this pointer.
>>>>>>
>>>>>
>>>>> The refcount are guaranteed to be zero there, isn't it?
>>>> No, it's not.
>>>>
>>>> err_free_dev:
>>>> free_netdev(dev);
>>>>
>>>> void free_netdev(struct net_device *dev)
>>>> {
>>>> ...
>>>> /* pcpu_refcnt can be freed without checking refcount */
>>>> free_percpu(dev->pcpu_refcnt);
>>>> dev->pcpu_refcnt = NULL;
>>>>
>>>> /* Compatibility with error handling in drivers */
>>>> if (dev->reg_state == NETREG_UNINITIALIZED) {
>>>> /* dev can be freed without checking refcount */
>>>> netdev_freemem(dev);
>>>> return;
>>>> }
>>>> ...
>>>> }
>>>
>>>
>>> Right, but what I meant is in my patch, when code reaches
>>> free_netdev() the refcnt is zero. What did I miss?
>> Yes, but it can't fix the UAF problem.
>
>
> Well, it looks to me that the dev_put() in tun_put() won't release the
> device in this case.
The device is not released in tun_put().
This is how the UAF occurs:
CPUA CPUB
tun_set_iff()
alloc_netdev_mqs()
tun_attach()
tun_chr_read_iter()
tun_get()
tun_do_read()
tun_ring_recv()
register_netdevice() <-- inject error
goto err_detach
tun_detach_all() <-- set RCV_SHUTDOWN
free_netdev() <-- called from
err_free_dev path
netdev_freemem() <-- free the memory
without check refcount
(In this path, the refcount cannot prevent
freeing the memory of dev, and the memory
will be used by dev_put() called by
tun_chr_read_iter() on CPUB.)
(Break from tun_ring_recv(), because RCV_SHUTDOWN is set)
tun_put()
dev_put() <-- use the memory freed by netdev_freemem()
>
> Thanks
>
^ permalink raw reply
* Re: [PATCH v4 2/5] vsock/virtio: reduce credit update messages
From: Stefano Garzarella @ 2019-09-03 7:31 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
virtualization, Jason Wang, kvm
In-Reply-To: <20190903003050-mutt-send-email-mst@kernel.org>
On Tue, Sep 03, 2019 at 12:38:02AM -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 17, 2019 at 01:30:27PM +0200, Stefano Garzarella wrote:
> > In order to reduce the number of credit update messages,
> > we send them only when the space available seen by the
> > transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
> >
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > ---
> > include/linux/virtio_vsock.h | 1 +
> > net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++++---
> > 2 files changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 7d973903f52e..49fc9d20bc43 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -41,6 +41,7 @@ struct virtio_vsock_sock {
> >
> > /* Protected by rx_lock */
> > u32 fwd_cnt;
> > + u32 last_fwd_cnt;
> > u32 rx_bytes;
> > struct list_head rx_queue;
> > };
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index 095221f94786..a85559d4d974 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -211,6 +211,7 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
> > void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
> > {
> > spin_lock_bh(&vvs->tx_lock);
> > + vvs->last_fwd_cnt = vvs->fwd_cnt;
> > pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
> > pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
> > spin_unlock_bh(&vvs->tx_lock);
> > @@ -261,6 +262,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> > struct virtio_vsock_sock *vvs = vsk->trans;
> > struct virtio_vsock_pkt *pkt;
> > size_t bytes, total = 0;
> > + u32 free_space;
> > int err = -EFAULT;
> >
> > spin_lock_bh(&vvs->rx_lock);
> > @@ -291,11 +293,19 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> > virtio_transport_free_pkt(pkt);
> > }
> > }
> > +
> > + free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
> > +
> > spin_unlock_bh(&vvs->rx_lock);
> >
> > - /* Send a credit pkt to peer */
> > - virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
> > - NULL);
> > + /* We send a credit update only when the space available seen
> > + * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE
>
> This is just repeating what code does though.
> Please include the *reason* for the condition.
> E.g. here's a better comment:
>
> /* To reduce number of credit update messages,
> * don't update credits as long as lots of space is available.
> * Note: the limit chosen here is arbitrary. Setting the limit
> * too high causes extra messages. Too low causes transmitter
> * stalls. As stalls are in theory more expensive than extra
> * messages, we set the limit to a high value. TODO: experiment
> * with different values.
> */
>
Yes, it is better, sorry for that. I'll try to avoid unnecessary comments,
explaining the reason for certain changes.
Since this patch is already queued in net-next, should I send another
patch to fix the comment?
Thanks,
Stefano
^ permalink raw reply
* Re: [PATCH] net: sched: taprio: Fix potential integer overflow in taprio_set_picos_per_byte
From: Eric Dumazet @ 2019-09-03 7:19 UTC (permalink / raw)
To: Gustavo A. R. Silva, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S. Miller, Vladimir Oltean
Cc: netdev, linux-kernel
In-Reply-To: <20190903010817.GA13595@embeddedor>
On 9/3/19 3:08 AM, Gustavo A. R. Silva wrote:
> Add suffix LL to constant 1000 in order to avoid a potential integer
> overflow and give the compiler complete information about the proper
> arithmetic to use. Notice that this constant is being used in a context
> that expects an expression of type s64, but it's currently evaluated
> using 32-bit arithmetic.
>
> Addresses-Coverity-ID: 1453459 ("Unintentional integer overflow")
> Fixes: f04b514c0ce2 ("taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte")
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
> ---
> net/sched/sch_taprio.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index 8d8bc2ec5cd6..956f837436ea 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -966,7 +966,7 @@ static void taprio_set_picos_per_byte(struct net_device *dev,
>
> skip:
> picos_per_byte = div64_s64(NSEC_PER_SEC * 1000LL * 8,
> - speed * 1000 * 1000);
> + speed * 1000LL * 1000);
>
> atomic64_set(&q->picos_per_byte, picos_per_byte);
> netdev_dbg(dev, "taprio: set %s's picos_per_byte to: %lld, linkspeed: %d\n",
>
But, why even multiplying by 1,000,000 in the first place, this seems silly,
a standard 32 bit divide could be used instead.
->
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 8d8bc2ec5cd6281d811fd5d8a5c5211ebb0edd73..944b1af3215668e927d486b6c6c65c4599fb9539 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -965,8 +965,7 @@ static void taprio_set_picos_per_byte(struct net_device *dev,
speed = ecmd.base.speed;
skip:
- picos_per_byte = div64_s64(NSEC_PER_SEC * 1000LL * 8,
- speed * 1000 * 1000);
+ picos_per_byte = (USEC_PER_SEC * 8) / speed;
atomic64_set(&q->picos_per_byte, picos_per_byte);
netdev_dbg(dev, "taprio: set %s's picos_per_byte to: %lld, linkspeed: %d\n",
^ permalink raw reply related
* Re: [PATCH net-next] r8152: modify rtl8152_set_speed function
From: Heiner Kallweit @ 2019-09-03 7:12 UTC (permalink / raw)
To: Hayes Wang, netdev@vger.kernel.org; +Cc: nic_swsd, linux-kernel@vger.kernel.org
In-Reply-To: <0835B3720019904CB8F7AA43166CEEB2F18DAD2A@RTITMBSVM03.realtek.com.tw>
On 03.09.2019 08:55, Hayes Wang wrote:
> Heiner Kallweit [mailto:hkallweit1@gmail.com]
>> Sent: Tuesday, September 03, 2019 2:45 PM
> [...]
>>> Besides, I have a question. I think I don't need rtl8152_set_speed()
>>> if I implement phylib. However, I need to record some information
>>> according to the settings of speed. For now, I do it in rtl8152_set_speed().
>>> Do you have any idea about how I should do it with phylib without
>>> rtl8152_set_speed()?
>>>
>> When saying "record some information", what kind of information?
>
> Some of our chips support the feature of UPS. When satisfying certain
> condition, the hw would recover the settings of speed. Therefore, I have
> to record the settings of the speed, and set them to hw.
>
Not knowing the UPS feature in detail:
In net-next I changed the software "PHY speed-down" implementation to
be more generic. It stores the old advertised settings in a new
phy_device member adv_old, and restores them in phy_speed_up().
Maybe what you need is similar.
>> The speed itself is stored in struct phy_device, if you need to adjust
>> certain chip settings depending on negotiated speed, then you can do
>> this in a callback (parameter handler of phy_connect_direct).
>> See e.g. r8169_phylink_handler()
>
> Thanks. I would study it.
>
> Best Regards,
> Hayes
>
>
Heiner
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH] i40e: clear __I40E_VIRTCHNL_OP_PENDING on invalid min tx rate
From: Stefan Assmann @ 2019-09-03 7:09 UTC (permalink / raw)
To: Paul Menzel, intel-wired-lan; +Cc: netdev, davem
In-Reply-To: <36909884-1de6-a537-0341-b060d01e4c0d@molgen.mpg.de>
On 03.09.19 08:42, Paul Menzel wrote:
> Dear Stefan,
Hi Paul,
> On 03.09.19 08:08, Stefan Assmann wrote:
>> In the case of an invalid min tx rate being requested
>> i40e_ndo_set_vf_bw() immediately returns -EINVAL instead of releasing
>> __I40E_VIRTCHNL_OP_PENDING first.
>
> What problem does this cause?
If the __I40E_VIRTCHNL_OP_PENDING bit never gets cleared no further
virtchnl op can be processed. For example you can no longer destroy the
VFs.
Stefan
>> Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
>> ---
>> drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
>> index f8aa4deceb5e..3d2440838822 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
>> @@ -4263,7 +4263,8 @@ int i40e_ndo_set_vf_bw(struct net_device *netdev, int vf_id, int min_tx_rate,
>> if (min_tx_rate) {
>> dev_err(&pf->pdev->dev, "Invalid min tx rate (%d) (greater than 0) specified for VF %d.\n",
>> min_tx_rate, vf_id);
>> - return -EINVAL;
>> + ret = -EINVAL;
>> + goto error;
>> }
>>
>> vf = &pf->vf[vf_id];
>
>
> Kind regards,
>
> Paul
>
^ permalink raw reply
* [BACKPORT 4.14.y 7/8] ppp: mppe: Revert "ppp: mppe: Add softdep to arc4"
From: Baolin Wang @ 2019-09-03 7:00 UTC (permalink / raw)
To: stable, paulus
Cc: ebiggers, linux-ppp, netdev, arnd, baolin.wang, orsonzhai,
vincent.guittot, linux-kernel
In-Reply-To: <cover.1567492316.git.baolin.wang@linaro.org>
From: Eric Biggers <ebiggers@google.com>
Commit 0e5a610b5ca5 ("ppp: mppe: switch to RC4 library interface"),
which was merged through the crypto tree for v5.3, changed ppp_mppe.c to
use the new arc4_crypt() library function rather than access RC4 through
the dynamic crypto_skcipher API.
Meanwhile commit aad1dcc4f011 ("ppp: mppe: Add softdep to arc4") was
merged through the net tree and added a module soft-dependency on "arc4".
The latter commit no longer makes sense because the code now uses the
"libarc4" module rather than "arc4", and also due to the direct use of
arc4_crypt(), no module soft-dependency is required.
So revert the latter commit.
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
---
drivers/net/ppp/ppp_mppe.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/net/ppp/ppp_mppe.c b/drivers/net/ppp/ppp_mppe.c
index d9eda7c..6c7fd98 100644
--- a/drivers/net/ppp/ppp_mppe.c
+++ b/drivers/net/ppp/ppp_mppe.c
@@ -63,7 +63,6 @@
MODULE_DESCRIPTION("Point-to-Point Protocol Microsoft Point-to-Point Encryption support");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_ALIAS("ppp-compress-" __stringify(CI_MPPE));
-MODULE_SOFTDEP("pre: arc4");
MODULE_VERSION("1.0.2");
static unsigned int
--
1.7.9.5
^ permalink raw reply related
* Re: [PATCH 4.14] tcp: fix tcp_rtx_queue_tail in case of empty retransmit queue
From: Tim Froidcoeur @ 2019-09-03 6:58 UTC (permalink / raw)
To: maowenan
Cc: David Miller, cpaasch@apple.com, jonathan.lemon@gmail.com,
stable@vger.kernel.org, gregkh@linuxfoundation.org,
matthieu.baerts@tessares.net, aprout@ll.mit.edu,
edumazet@google.com, jtl@netflix.com,
linux-kernel@vger.kernel.org, mkubecek@suse.cz,
ncardwell@google.com, sashal@kernel.org, ycheng@google.com,
netdev@vger.kernel.org
In-Reply-To: <F95AC9340317A84688A5F0DF0246F3F21AAB8F82@dggeml512-mbx.china.huawei.com>
Hi,
I also tried to reproduce this in a targeted way, and run into the
same difficulty as you: satisfying the first condition “
(sk->sk_wmem_queued >> 1) > limit “.
I will not have bandwidth the coming days to try and reproduce it in
this way. Maybe simply forcing a very small send buffer using sysctl
net.ipv4.tcp_wmem might even do the trick?
I suspect that the bug is easier to trigger with the MPTCP patch like
I did originally, due to the way this patch manages the tcp subflow
buffers (it can temporarily overfill the buffers, satisfying that
first condition more often).
another thing, the stacktrace you shared before seems caused by
another issue (corrupted socket?), it will not be solved by the patch
we submitted.
kind regards,
Tim
On Tue, Sep 3, 2019 at 5:22 AM maowenan <maowenan@huawei.com> wrote:
>
> Hi Tim,
>
>
>
> I try to reproduce it with packetdrill or user application, but I can’t.
>
> The first condition “ (sk->sk_wmem_queued >> 1) > limit “ can’t be satisfied,
>
> This condition is to avoid tiny SO_SNDBUF values set by user.
>
> It also adds the some room due to the fact that tcp_sendmsg()
>
> and tcp_sendpage() might overshoot sk_wmem_queued by about one full
>
> TSO skb (64KB size).
>
>
>
> limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);
>
> if (unlikely((sk->sk_wmem_queued >> 1) > limit &&
>
> skb != tcp_rtx_queue_head(sk) &&
>
> skb != tcp_rtx_queue_tail(sk))) {
>
> NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
>
> return -ENOMEM;
>
> }
>
>
>
> Can you try to reproduce it with packetdrill or C socket application?
>
>
--
Tim Froidcoeur | R&D engineer HAG
tim.froidcoeur@tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium
--
Disclaimer: https://www.tessares.net/mail-disclaimer/
<https://www.tessares.net/mail-disclaimer/>
^ permalink raw reply
* [PATCH] net/mlx5: Use PTR_ERR_OR_ZERO rather than its implementation
From: zhong jiang @ 2019-09-03 6:56 UTC (permalink / raw)
To: davem; +Cc: saeedm, leon, netdev, linux-kernel, zhongjiang
PTR_ERR_OR_ZERO contains if(IS_ERR(...)) + PTR_ERR. It is better
to use it directly. hence just replace it.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 5581a80..2e0b467 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -989,10 +989,7 @@ static void mlx5e_hairpin_flow_del(struct mlx5e_priv *priv,
&flow_act, dest, dest_ix);
mutex_unlock(&priv->fs.tc.t_lock);
- if (IS_ERR(flow->rule[0]))
- return PTR_ERR(flow->rule[0]);
-
- return 0;
+ return PTR_ERR_OR_ZERO(flow->rule[0]);
}
static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
--
1.7.12.4
^ permalink raw reply related
* [BACKPORT 4.14.y 4/8] net: sctp: fix warning "NULL check before some freeing functions is not needed"
From: Baolin Wang @ 2019-09-03 6:58 UTC (permalink / raw)
To: stable, vyasevich, nhorman, davem
Cc: hariprasad.kelam, linux-sctp, netdev, arnd, baolin.wang,
orsonzhai, vincent.guittot, linux-kernel
In-Reply-To: <cover.1567492316.git.baolin.wang@linaro.org>
From: Hariprasad Kelam <hariprasad.kelam@gmail.com>
This patch removes NULL checks before calling kfree.
fixes below issues reported by coccicheck
net/sctp/sm_make_chunk.c:2586:3-8: WARNING: NULL check before some
freeing functions is not needed.
net/sctp/sm_make_chunk.c:2652:3-8: WARNING: NULL check before some
freeing functions is not needed.
net/sctp/sm_make_chunk.c:2667:3-8: WARNING: NULL check before some
freeing functions is not needed.
net/sctp/sm_make_chunk.c:2684:3-8: WARNING: NULL check before some
freeing functions is not needed.
Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
---
net/sctp/sm_make_chunk.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index f67df16..6dac492 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -2586,8 +2586,7 @@ static int sctp_process_param(struct sctp_association *asoc,
case SCTP_PARAM_STATE_COOKIE:
asoc->peer.cookie_len =
ntohs(param.p->length) - sizeof(struct sctp_paramhdr);
- if (asoc->peer.cookie)
- kfree(asoc->peer.cookie);
+ kfree(asoc->peer.cookie);
asoc->peer.cookie = kmemdup(param.cookie->body, asoc->peer.cookie_len, gfp);
if (!asoc->peer.cookie)
retval = 0;
@@ -2652,8 +2651,7 @@ static int sctp_process_param(struct sctp_association *asoc,
goto fall_through;
/* Save peer's random parameter */
- if (asoc->peer.peer_random)
- kfree(asoc->peer.peer_random);
+ kfree(asoc->peer.peer_random);
asoc->peer.peer_random = kmemdup(param.p,
ntohs(param.p->length), gfp);
if (!asoc->peer.peer_random) {
@@ -2667,8 +2665,7 @@ static int sctp_process_param(struct sctp_association *asoc,
goto fall_through;
/* Save peer's HMAC list */
- if (asoc->peer.peer_hmacs)
- kfree(asoc->peer.peer_hmacs);
+ kfree(asoc->peer.peer_hmacs);
asoc->peer.peer_hmacs = kmemdup(param.p,
ntohs(param.p->length), gfp);
if (!asoc->peer.peer_hmacs) {
@@ -2684,8 +2681,7 @@ static int sctp_process_param(struct sctp_association *asoc,
if (!ep->auth_enable)
goto fall_through;
- if (asoc->peer.peer_chunks)
- kfree(asoc->peer.peer_chunks);
+ kfree(asoc->peer.peer_chunks);
asoc->peer.peer_chunks = kmemdup(param.p,
ntohs(param.p->length), gfp);
if (!asoc->peer.peer_chunks)
--
1.7.9.5
^ permalink raw reply related
* [BACKPORT 4.14.y 2/8] ip6: fix skb leak in ip6frag_expire_frag_queue()
From: Baolin Wang @ 2019-09-03 6:56 UTC (permalink / raw)
To: stable, davem, kuznet, yoshfuji, edumazet
Cc: netdev, arnd, baolin.wang, orsonzhai, vincent.guittot,
linux-kernel
In-Reply-To: <cover.1567492316.git.baolin.wang@linaro.org>
From: Eric Dumazet <edumazet@google.com>
Since ip6frag_expire_frag_queue() now pulls the head skb
from frag queue, we should no longer use skb_get(), since
this leads to an skb leak.
Stefan Bader initially reported a problem in 4.4.stable [1] caused
by the skb_get(), so this patch should also fix this issue.
296583.091021] kernel BUG at /build/linux-6VmqmP/linux-4.4.0/net/core/skbuff.c:1207!
[296583.091734] Call Trace:
[296583.091749] [<ffffffff81740e50>] __pskb_pull_tail+0x50/0x350
[296583.091764] [<ffffffff8183939a>] _decode_session6+0x26a/0x400
[296583.091779] [<ffffffff817ec719>] __xfrm_decode_session+0x39/0x50
[296583.091795] [<ffffffff818239d0>] icmpv6_route_lookup+0xf0/0x1c0
[296583.091809] [<ffffffff81824421>] icmp6_send+0x5e1/0x940
[296583.091823] [<ffffffff81753238>] ? __netif_receive_skb+0x18/0x60
[296583.091838] [<ffffffff817532b2>] ? netif_receive_skb_internal+0x32/0xa0
[296583.091858] [<ffffffffc0199f74>] ? ixgbe_clean_rx_irq+0x594/0xac0 [ixgbe]
[296583.091876] [<ffffffffc04eb260>] ? nf_ct_net_exit+0x50/0x50 [nf_defrag_ipv6]
[296583.091893] [<ffffffff8183d431>] icmpv6_send+0x21/0x30
[296583.091906] [<ffffffff8182b500>] ip6_expire_frag_queue+0xe0/0x120
[296583.091921] [<ffffffffc04eb27f>] nf_ct_frag6_expire+0x1f/0x30 [nf_defrag_ipv6]
[296583.091938] [<ffffffff810f3b57>] call_timer_fn+0x37/0x140
[296583.091951] [<ffffffffc04eb260>] ? nf_ct_net_exit+0x50/0x50 [nf_defrag_ipv6]
[296583.091968] [<ffffffff810f5464>] run_timer_softirq+0x234/0x330
[296583.091982] [<ffffffff8108a339>] __do_softirq+0x109/0x2b0
Fixes: d4289fcc9b16 ("net: IP6 defrag: use rbtrees for IPv6 defrag")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Cc: Peter Oskolkov <posk@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
---
include/net/ipv6_frag.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/include/net/ipv6_frag.h b/include/net/ipv6_frag.h
index 28aa9b3..1f77fb4 100644
--- a/include/net/ipv6_frag.h
+++ b/include/net/ipv6_frag.h
@@ -94,7 +94,6 @@ static inline u32 ip6frag_obj_hashfn(const void *data, u32 len, u32 seed)
goto out;
head->dev = dev;
- skb_get(head);
spin_unlock(&fq->q.lock);
icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);
--
1.7.9.5
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox