* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
From: Allan W. Nielsen @ 2019-07-29 12:43 UTC (permalink / raw)
To: Ido Schimmel
Cc: Andrew Lunn, Horatiu Vultur, Nikolay Aleksandrov, roopa, davem,
bridge, netdev, linux-kernel
In-Reply-To: <20190729060923.GA16938@splinter>
Hi Ido,
The 07/29/2019 09:09, Ido Schimmel wrote:
> External E-Mail
>
>
> On Sun, Jul 28, 2019 at 09:15:59PM +0200, Allan W. Nielsen wrote:
> > If we assume that the SwitchDev driver implemented such that all multicast
> > traffic goes to the CPU, then we should really have a way to install a HW
> > offload path in the silicon, such that these packets does not go to the CPU (as
> > they are known not to be use full, and a frame every 3 us is a significant load
> > on small DMA connections and CPU resources).
> >
> > If we assume that the SwitchDev driver implemented such that only "needed"
> > multicast packets goes to the CPU, then we need a way to get these packets in
> > case we want to implement the DLR protocol.
>
> I'm not familiar with the HW you're working with, so the below might not
> be relevant.
>
> In case you don't want to send all multicast traffic to the CPU (I'll
> refer to it later), you can install an ingress tc filter that traps to
> the CPU the packets you do want to receive. Something like:
>
> # tc qdisc add dev swp1 clsact
> # tc filter add dev swp1 pref 1 ingress flower skip_sw dst_mac \
> 01:21:6C:00:00:01 action trap
I have actually been looking at this, and it may an idea to go down this road.
But so far we have chosen not to for the following reasons:
- It is not only about trapping traffic to the CPU, we also needs to capability
to limit the flooding on the front ports.
- In our case (the silicon), this feature really belongs to the MAC-table, which
is why we did prefer to do it via the FDB entries.
- But the HW does have TCAM resources, and we are planning on exposing these
resources via the tc-flower interface. It is just that we have more MAC
table resoruces than TCAM resources, which is another argument for using the
MAC table.
> If your HW supports sharing the same filter among multiple ports, then
> you can install your filter in a tc shared block and bind multiple ports
> to it.
It does, thanks for making us aware of this optimization option.
> Another option is to always send a *copy* of multicast packets to the
> CPU, but make sure the HW uses a policer that prevents the CPU from
> being overwhelmed. To avoid packets being forwarded twice (by HW and
> SW), you will need to mark such packets in your driver with
> 'skb->offload_fwd_mark = 1'.
Understood
> Now, in case user wants to allow the CPU to receive certain packets at a
> higher rate, a tc filter can be used. It will be identical to the filter
> I mentioned earlier, but with a 'police' action chained before 'trap'.
I see.
> I don't think this is currently supported by any driver, but I believe
> it's the right way to go: By default the CPU receives all the traffic it
> should receive and user can fine-tune it using ACLs.
If all the frames goes to the CPU, then how can I fine-tune frames not to go to
the CPU?? I can do a TRAP (to get it to the CPU) a DROP (to drop it before
forwarding), but how can I forward a multicast packet, but prevent it from going
to the CPU?
I have seen that the mirror command can do re-direction, but not to a list of
ports...
All in all, thanks a lot for the suggestions, but to begin with I think we will
explore the MAC table option a bit more. But we will get back to TC to support
the ACL functions.
/Allan
^ permalink raw reply
* Re: [PATCH net-next v4 1/3] flow_offload: move tc indirect block to flow offload
From: wenxu @ 2019-07-29 12:47 UTC (permalink / raw)
To: Jiri Pirko; +Cc: pablo, fw, jakub.kicinski, netfilter-devel, netdev
In-Reply-To: <20190729111350.GE2211@nanopsycho>
在 2019/7/29 19:13, Jiri Pirko 写道:
> Sun, Jul 28, 2019 at 08:52:47AM CEST, wenxu@ucloud.cn wrote:
>> From: wenxu <wenxu@ucloud.cn>
>>
>> move tc indirect block to flow_offload and rename
>> it to flow indirect block.The nf_tables can use the
>> indr block architecture.
>>
>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>> ---
>> v3: subsys_initcall for init_flow_indr_rhashtable
>> v4: no change
>>
> [...]
>
>
>> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
>> index 00b9aab..66f89bc 100644
>> --- a/include/net/flow_offload.h
>> +++ b/include/net/flow_offload.h
>> @@ -4,6 +4,7 @@
>> #include <linux/kernel.h>
>> #include <linux/list.h>
>> #include <net/flow_dissector.h>
>> +#include <linux/rhashtable.h>
>>
>> struct flow_match {
>> struct flow_dissector *dissector;
>> @@ -366,4 +367,42 @@ static inline void flow_block_init(struct flow_block *flow_block)
>> INIT_LIST_HEAD(&flow_block->cb_list);
>> }
>>
>> +typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
>> + enum tc_setup_type type, void *type_data);
>> +
>> +struct flow_indr_block_cb {
>> + struct list_head list;
>> + void *cb_priv;
>> + flow_indr_block_bind_cb_t *cb;
>> + void *cb_ident;
>> +};
> I don't understand why are you pushing this struct out of the c file to
> the header. Please don't.
>
>
>> +
>> +typedef void flow_indr_block_ing_cmd_t(struct net_device *dev,
>> + struct flow_block *flow_block,
>> + struct flow_indr_block_cb *indr_block_cb,
>> + enum flow_block_command command);
>> +
>> +struct flow_indr_block_dev {
>> + struct rhash_head ht_node;
>> + struct net_device *dev;
>> + unsigned int refcnt;
>> + struct list_head cb_list;
>> + flow_indr_block_ing_cmd_t *ing_cmd_cb;
>> + struct flow_block *flow_block;
> I don't understand why are you pushing this struct out of the c file to
> the header. Please don't.
the flow_indr_block_dev and indr_block_cb in the h file used for the function
tc_indr_block_ing_cmd in cls_api.c
>> -static void tc_indr_block_ing_cmd(struct tc_indr_block_dev *indr_dev,
>> - struct tc_indr_block_cb *indr_block_cb,
>> +static void tc_indr_block_ing_cmd(struct net_device *dev,
> I don't understand why you change struct tc_indr_block_dev * to
> struct net_device * here. If you want to do that, please do that in a
> separate patch, not it this one where only "the move" should happen.
>
^ permalink raw reply
* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
From: Nikolay Aleksandrov @ 2019-07-29 12:50 UTC (permalink / raw)
To: Allan W. Nielsen
Cc: Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel
In-Reply-To: <95315f9e-0d31-2d34-ba50-11e1bbc1465c@cumulusnetworks.com>
On 29/07/2019 15:22, Nikolay Aleksandrov wrote:
> Hi Allan,
> On 29/07/2019 15:14, Allan W. Nielsen wrote:
>> Hi Nikolay,
>>
>> First of all, as mentioned further down in this thread, I realized that our
>> implementation of the multicast floodmasks does not align with the existing SW
>> implementation. We will change this, such that all multicast packets goes to the
>> SW bridge.
>>
>> This changes things a bit, not that much.
>>
>> I actually think you summarized the issue we have (after changing to multicast
>> flood-masks) right here:
>>
>> The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
>>>>> Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
>>>>> traffic will always be flooded by the bridge (and also a copy will be locally sent up).
>>>>> Thus only the flooding may need to be controlled.
>>
>> This seems to be exactly what we need.
>>
>> Assuming we have a SW bridge (br0) with 4 slave interfaces (eth0-3). We use this
>> on a network where we want to limit the flooding of frames with dmac
>> 01:21:6C:00:00:01 (which is non IP traffic) to eth0 and eth1.
>>
>> One way of doing this could potentially be to support the following command:
>>
>> bridge fdb add 01:21:6C:00:00:01 port eth0
>> bridge fdb append 01:21:6C:00:00:01 port eth1
>>
And the fdbs become linked lists ? So we'll increase the complexity for something
that is already supported by ACLs (e.g. tc) and also bridge per-port multicast
flood flag ?
I'm sorry but that doesn't sound good to me for a case which is very rare and
there are existing ways to solve without incurring performance hits or increasing
code complexity.
>> On 25/07/2019 16:06, Nikolay Aleksandrov wrote:
>>>>>>>>> In general NLM_F_APPEND is only used in vxlan, the bridge does not
>>>>>>>>> handle that flag at all. FDB is only for *unicast*, nothing is joined
>>>>>>>>> and no multicast should be used with fdbs. MDB is used for multicast
>>>>>>>>> handling, but both of these are used for forwarding.
>> This is true, and this should have been addressed in the patch, we were too
>> focused on setting up the offload patch in the driver, and forgot to do the SW
>> implementation.
>>
>> Do you see any issues in supporting this flag, and updating the SW
>> forwarding in br_handle_frame_finish such that it can support/allow a FDB entry
>> to be a multicast?
>>
>
> Yes, all of the multicast code is handled differently, it doesn't go through the fdb
> lookup or code at all. I don't see how you'll do a lookup in the fdb table with a
> multicast mac address, take a look at br_handle_frame_finish() and you'll notice
> that when a multicast dmac is detected then we use the bridge mcast code for lookups
> and forwarding. If you're trying to achieve Rx only on the bridge of these then
> why not just use Ido's tc suggestion or even the ip maddr add offload for each port ?
>
> If you add a multicast mac in the fdb (currently allowed, but has no effect) and you
> use dev_mc_add() as suggested that'd just be a hack to pass it down and it is already
> possible to achieve via other methods, no need to go through the bridge.
>
>> /Allan
>>
>
^ permalink raw reply
* RE: [PATCH v6 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions
From: Michal Kalderon @ 2019-07-29 12:58 UTC (permalink / raw)
To: Jason Gunthorpe, galpress@amazon.com
Cc: Ariel Elior, dledford@redhat.com, galpress@amazon.com,
linux-rdma@vger.kernel.org, davem@davemloft.net,
netdev@vger.kernel.org
In-Reply-To: <20190725175540.GA18757@ziepe.ca>
> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of Jason Gunthorpe
>
> > + xa_lock(&ucontext->mmap_xa);
> > + if (check_add_overflow(ucontext->mmap_xa_page,
> > + (u32)(length >> PAGE_SHIFT),
> > + &next_mmap_page))
> > + goto err_unlock;
>
> I still don't like that this algorithm latches into a permanent failure when the
> xa_page wraps.
>
> It seems worth spending a bit more time here to tidy this.. Keep using the
> mmap_xa_page scheme, but instead do something like
>
> alloc_cyclic_range():
>
> while () {
> // Find first empty element in a cyclic way
> xa_page_first = mmap_xa_page;
> xa_find(xa, &xa_page_first, U32_MAX, XA_FREE_MARK)
>
> // Is there a enough room to have the range?
> if (check_add_overflow(xa_page_first, npages, &xa_page_end)) {
> mmap_xa_page = 0;
> continue;
> }
>
> // See if the element before intersects
> elm = xa_find(xa, &zero, xa_page_end, 0);
> if (elm && intersects(xa_page_first, xa_page_last, elm->first, elm->last)) {
> mmap_xa_page = elm->last + 1;
> continue
> }
>
> // xa_page_first -> xa_page_end should now be free
> xa_insert(xa, xa_page_start, entry);
> mmap_xa_page = xa_page_end + 1;
> return xa_page_start;
> }
>
> Approximately, please check it.
Gal & Jason,
Coming back to the mmap_xa_page algorithm. I couldn't find some background on this.
Why do you need the length to be represented in the mmap_xa_page ?
Why not simply use xa_alloc_cyclic ( like in siw )
This is simply a key to a mmap object...
Thanks,
Michal
^ permalink raw reply
* Re: [PATCH] net: stmmac: manage errors returned by of_get_mac_address()
From: Neil Armstrong @ 2019-07-29 13:03 UTC (permalink / raw)
To: Martin Blumenstingl, peppe.cavallaro, alexandre.torgue, joabreu,
davem, netdev
Cc: linux-amlogic, linux-kernel, linux-arm-kernel
In-Reply-To: <20190727192137.27881-1-martin.blumenstingl@googlemail.com>
On 27/07/2019 21:21, Martin Blumenstingl wrote:
> Commit d01f449c008a ("of_net: add NVMEM support to of_get_mac_address")
> added support for reading the MAC address from an nvmem-cell. This
> required changing the logic to return an error pointer upon failure.
>
> If stmmac is loaded before the nvmem provider driver then
> of_get_mac_address() return an error pointer with -EPROBE_DEFER.
>
> Propagate this error so the stmmac driver will be probed again after the
> nvmem provider driver is loaded.
> Default to a random generated MAC address in case of any other error,
> instead of using the error pointer as MAC address.
>
> Fixes: d01f449c008a ("of_net: add NVMEM support to of_get_mac_address")
> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
> ---
> drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> index 73fc2524372e..154daf4d1072 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> @@ -370,6 +370,13 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
> return ERR_PTR(-ENOMEM);
>
> *mac = of_get_mac_address(np);
> + if (IS_ERR(*mac)) {
> + if (PTR_ERR(*mac) == -EPROBE_DEFER)
> + return ERR_CAST(*mac);
> +
> + *mac = NULL;
> + }
> +
> plat->interface = of_get_phy_mode(np);
>
> /* Some wrapper drivers still rely on phy_node. Let's save it while
>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
^ permalink raw reply
* [PATCH ethtool] gitignore: ignore vim swapfiles and patches
From: Michal Kubecek @ 2019-07-29 13:10 UTC (permalink / raw)
To: John W. Linville; +Cc: netdev
The .*.swp files are created by vim to hold the undo/redo log. Add them to
.gitignore to prevent "git status" or "git gui" from showing them whenever
some file is open in editor.
Add also *.patch to hide patches created by e.g. "git format-patch".
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
---
.gitignore | 3 +++
1 file changed, 3 insertions(+)
diff --git a/.gitignore b/.gitignore
index f1165a2c9037..c4df588c37ea 100644
--- a/.gitignore
+++ b/.gitignore
@@ -27,3 +27,6 @@ autom4te.cache
.deps
test-*.log
test-*.trs
+
+.*.swp
+*.patch
--
2.22.0
^ permalink raw reply related
* Re: [PATCH] tcp: add new tcp_mtu_probe_floor sysctl
From: Neal Cardwell @ 2019-07-29 13:12 UTC (permalink / raw)
To: Josh Hunt; +Cc: Eric Dumazet, netdev, David Miller
In-Reply-To: <5a054ca5-4077-5e91-69d5-f1add8dc8bfa@akamai.com>
On Sun, Jul 28, 2019 at 5:14 PM Josh Hunt <johunt@akamai.com> wrote:
>
> On 7/28/19 6:54 AM, Eric Dumazet wrote:
> > On Sun, Jul 28, 2019 at 1:21 AM Josh Hunt <johunt@akamai.com> wrote:
> >>
> >> On 7/27/19 12:05 AM, Eric Dumazet wrote:
> >>> On Sat, Jul 27, 2019 at 4:23 AM Josh Hunt <johunt@akamai.com> wrote:
> >>>>
> >>>> The current implementation of TCP MTU probing can considerably
> >>>> underestimate the MTU on lossy connections allowing the MSS to get down to
> >>>> 48. We have found that in almost all of these cases on our networks these
> >>>> paths can handle much larger MTUs meaning the connections are being
> >>>> artificially limited. Even though TCP MTU probing can raise the MSS back up
> >>>> we have seen this not to be the case causing connections to be "stuck" with
> >>>> an MSS of 48 when heavy loss is present.
> >>>>
> >>>> Prior to pushing out this change we could not keep TCP MTU probing enabled
> >>>> b/c of the above reasons. Now with a reasonble floor set we've had it
> >>>> enabled for the past 6 months.
> >>>
> >>> And what reasonable value have you used ???
> >>
> >> Reasonable for some may not be reasonable for others hence the new
> >> sysctl :) We're currently running with a fairly high value based off of
> >> the v6 min MTU minus headers and options, etc. We went conservative with
> >> our setting initially as it seemed a reasonable first step when
> >> re-enabling TCP MTU probing since with no configurable floor we saw a #
> >> of cases where connections were using severely reduced mss b/c of loss
> >> and not b/c of actual path restriction. I plan to reevaluate the setting
> >> at some point, but since the probing method is still the same it means
> >> the same clients who got stuck with mss of 48 before will land at
> >> whatever floor we set. Looking forward we are interested in trying to
> >> improve TCP MTU probing so it does not penalize clients like this.
> >>
> >> A suggestion for a more reasonable floor default would be 512, which is
> >> the same as the min_pmtu. Given both mechanisms are trying to achieve
> >> the same goal it seems like they should have a similar min/floor.
> >>
> >>>
> >>>>
> >>>> The new sysctl will still default to TCP_MIN_SND_MSS (48), but gives
> >>>> administrators the ability to control the floor of MSS probing.
> >>>>
> >>>> Signed-off-by: Josh Hunt <johunt@akamai.com>
> >>>> ---
> >>>> Documentation/networking/ip-sysctl.txt | 6 ++++++
> >>>> include/net/netns/ipv4.h | 1 +
> >>>> net/ipv4/sysctl_net_ipv4.c | 9 +++++++++
> >>>> net/ipv4/tcp_ipv4.c | 1 +
> >>>> net/ipv4/tcp_timer.c | 2 +-
> >>>> 5 files changed, 18 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> >>>> index df33674799b5..49e95f438ed7 100644
> >>>> --- a/Documentation/networking/ip-sysctl.txt
> >>>> +++ b/Documentation/networking/ip-sysctl.txt
> >>>> @@ -256,6 +256,12 @@ tcp_base_mss - INTEGER
> >>>> Path MTU discovery (MTU probing). If MTU probing is enabled,
> >>>> this is the initial MSS used by the connection.
> >>>>
> >>>> +tcp_mtu_probe_floor - INTEGER
> >>>> + If MTU probing is enabled this caps the minimum MSS used for search_low
> >>>> + for the connection.
> >>>> +
> >>>> + Default : 48
> >>>> +
> >>>> tcp_min_snd_mss - INTEGER
> >>>> TCP SYN and SYNACK messages usually advertise an ADVMSS option,
> >>>> as described in RFC 1122 and RFC 6691.
> >>>> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> >>>> index bc24a8ec1ce5..c0c0791b1912 100644
> >>>> --- a/include/net/netns/ipv4.h
> >>>> +++ b/include/net/netns/ipv4.h
> >>>> @@ -116,6 +116,7 @@ struct netns_ipv4 {
> >>>> int sysctl_tcp_l3mdev_accept;
> >>>> #endif
> >>>> int sysctl_tcp_mtu_probing;
> >>>> + int sysctl_tcp_mtu_probe_floor;
> >>>> int sysctl_tcp_base_mss;
> >>>> int sysctl_tcp_min_snd_mss;
> >>>> int sysctl_tcp_probe_threshold;
> >>>> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> >>>> index 0b980e841927..59ded25acd04 100644
> >>>> --- a/net/ipv4/sysctl_net_ipv4.c
> >>>> +++ b/net/ipv4/sysctl_net_ipv4.c
> >>>> @@ -820,6 +820,15 @@ static struct ctl_table ipv4_net_table[] = {
> >>>> .extra2 = &tcp_min_snd_mss_max,
> >>>> },
> >>>> {
> >>>> + .procname = "tcp_mtu_probe_floor",
> >>>> + .data = &init_net.ipv4.sysctl_tcp_mtu_probe_floor,
> >>>> + .maxlen = sizeof(int),
> >>>> + .mode = 0644,
> >>>> + .proc_handler = proc_dointvec_minmax,
> >>>> + .extra1 = &tcp_min_snd_mss_min,
> >>>> + .extra2 = &tcp_min_snd_mss_max,
> >>>> + },
> >>>> + {
> >>>> .procname = "tcp_probe_threshold",
> >>>> .data = &init_net.ipv4.sysctl_tcp_probe_threshold,
> >>>> .maxlen = sizeof(int),
> >>>> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> >>>> index d57641cb3477..e0a372676329 100644
> >>>> --- a/net/ipv4/tcp_ipv4.c
> >>>> +++ b/net/ipv4/tcp_ipv4.c
> >>>> @@ -2637,6 +2637,7 @@ static int __net_init tcp_sk_init(struct net *net)
> >>>> net->ipv4.sysctl_tcp_min_snd_mss = TCP_MIN_SND_MSS;
> >>>> net->ipv4.sysctl_tcp_probe_threshold = TCP_PROBE_THRESHOLD;
> >>>> net->ipv4.sysctl_tcp_probe_interval = TCP_PROBE_INTERVAL;
> >>>> + net->ipv4.sysctl_tcp_mtu_probe_floor = TCP_MIN_SND_MSS;
> >>>>
> >>>> net->ipv4.sysctl_tcp_keepalive_time = TCP_KEEPALIVE_TIME;
> >>>> net->ipv4.sysctl_tcp_keepalive_probes = TCP_KEEPALIVE_PROBES;
> >>>> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> >>>> index c801cd37cc2a..dbd9d2d0ee63 100644
> >>>> --- a/net/ipv4/tcp_timer.c
> >>>> +++ b/net/ipv4/tcp_timer.c
> >>>> @@ -154,7 +154,7 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
> >>>> } else {
> >>>> mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low) >> 1;
> >>>> mss = min(net->ipv4.sysctl_tcp_base_mss, mss);
> >>>> - mss = max(mss, 68 - tcp_sk(sk)->tcp_header_len);
> >>>> + mss = max(mss, net->ipv4.sysctl_tcp_mtu_probe_floor);
> >>>> mss = max(mss, net->ipv4.sysctl_tcp_min_snd_mss);
> >>>> icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss);
> >>>> }
> >>>
> >>>
> >>> Existing sysctl should be enough ?
> >>
> >> I don't think so. Changing tcp_min_snd_mss could impact clients that
> >> really want/need a small mss. When you added the new sysctl I tried to
> >> analyze the mss values we're seeing to understand what we could possibly
> >> raise it to. While not a huge amount, we see more clients than I
> >> expected announcing mss values in the 180-512 range. Given that I would
> >> not feel comfortable setting tcp_min_snd_mss to say 512 as I suggested
> >> above.
> >
> > If these clients need mss values in 180-512 ranges, how MTU probing
> > would work for them,
> > if you set a floor to 512 ?
>
> First, we already seem to be fine with ignoring these paths with ICMP
> based PMTU discovery b/c of our min_pmtu default of 512 and that is
> configurable. Second by adding this sysctl we're giving administrators
> the choice to decide if they'd like to attempt to support these very
> very small # of paths which may be below 512 (MSS <= 512 does not mean
> MTU <= 512) or cover themselves by being able to raise the floor to not
> penalize clients who may be on very lossy networks.
>
> >
> > Are we sure the intent of tcp_base_mss was not to act as a floor ?
>
> My understanding is that tcp_base_mss is meant to be the initial value
> of search_low (as per Docs). Then in RFC 4821 [1] Sections 7.2, shows
> search_low should be configurable, and 7.7 we see that in response to
> successive black hole detection search_low should be halved. So I don't
> think it was meant to be a floor, but just the initial search_low param.
That matches my reading of the RFC and code as well. But in that case
IMHO an additional commit should fix this comment to reflect the fact
thatTCP_BASE_MSS is the initial value, rather than a floor:
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 42728239cdbe..05575ac70333 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -75,7 +75,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
/* Minimal accepted MSS. It is (60+60+8) - (20+20). */
#define TCP_MIN_MSS 88U
-/* The least MTU to use for probing */
+/* The initial MTU to use for probing */
#define TCP_BASE_MSS 1024
/* probing interval, default to 10 minutes as per RFC4821 */
neal
^ permalink raw reply related
* Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput
From: Stefan Hajnoczi @ 2019-07-29 13:12 UTC (permalink / raw)
To: Stefano Garzarella
Cc: Stefan Hajnoczi, kvm, Michael S. Tsirkin, netdev, linux-kernel,
virtualization, David S. Miller
In-Reply-To: <20190722091434.tzf7lxw3tvrs5w5v@steredhat>
[-- Attachment #1: Type: text/plain, Size: 1683 bytes --]
On Mon, Jul 22, 2019 at 11:14:34AM +0200, Stefano Garzarella wrote:
> On Mon, Jul 22, 2019 at 10:08:35AM +0100, Stefan Hajnoczi wrote:
> > On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> > > This series tries to increase the throughput of virtio-vsock with slight
> > > changes.
> > > While I was testing the v2 of this series I discovered an huge use of memory,
> > > so I added patch 1 to mitigate this issue. I put it in this series in order
> > > to better track the performance trends.
> > >
> > > v4:
> > > - rebased all patches on current master (conflicts is Patch 4)
> > > - Patch 1: added Stefan's R-b
> > > - Patch 3: removed lock when buf_alloc is written [David];
> > > moved this patch after "vsock/virtio: reduce credit update messages"
> > > to make it clearer
> > > - Patch 4: vhost_exceeds_weight() is recently introduced, so I've solved some
> > > conflicts
> >
> > Stefano: Do you want to continue experimenting before we merge this
> > patch series? The code looks functionally correct and the performance
> > increases, so I'm happy for it to be merged.
>
> I think we can merge this series.
>
> I'll continue to do other experiments (e.g. removing TX workers, allocating
> pages, etc.) but I think these changes are prerequisites for the other patches,
> so we can merge them.
>
> Thank you very much for the reviews!
All patches have been reviewed by here. Have an Ack for good measure:
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
The topics discussed in sub-threads relate to longer-term optimization
work that doesn't block this series. Please merge.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* Re: [PATCH net-next v4 1/3] flow_offload: move tc indirect block to flow offload
From: Jiri Pirko @ 2019-07-29 13:13 UTC (permalink / raw)
To: wenxu; +Cc: pablo, fw, jakub.kicinski, netfilter-devel, netdev
In-Reply-To: <c218d9bb-1da7-2ed6-d5b0-afddbe3d0bd7@ucloud.cn>
Mon, Jul 29, 2019 at 02:47:07PM CEST, wenxu@ucloud.cn wrote:
>
>在 2019/7/29 19:13, Jiri Pirko 写道:
>> Sun, Jul 28, 2019 at 08:52:47AM CEST, wenxu@ucloud.cn wrote:
>>> From: wenxu <wenxu@ucloud.cn>
>>>
>>> move tc indirect block to flow_offload and rename
>>> it to flow indirect block.The nf_tables can use the
>>> indr block architecture.
>>>
>>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>>> ---
>>> v3: subsys_initcall for init_flow_indr_rhashtable
>>> v4: no change
>>>
>> [...]
>>
>>
>>> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
>>> index 00b9aab..66f89bc 100644
>>> --- a/include/net/flow_offload.h
>>> +++ b/include/net/flow_offload.h
>>> @@ -4,6 +4,7 @@
>>> #include <linux/kernel.h>
>>> #include <linux/list.h>
>>> #include <net/flow_dissector.h>
>>> +#include <linux/rhashtable.h>
>>>
>>> struct flow_match {
>>> struct flow_dissector *dissector;
>>> @@ -366,4 +367,42 @@ static inline void flow_block_init(struct flow_block *flow_block)
>>> INIT_LIST_HEAD(&flow_block->cb_list);
>>> }
>>>
>>> +typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
>>> + enum tc_setup_type type, void *type_data);
>>> +
>>> +struct flow_indr_block_cb {
>>> + struct list_head list;
>>> + void *cb_priv;
>>> + flow_indr_block_bind_cb_t *cb;
>>> + void *cb_ident;
>>> +};
>> I don't understand why are you pushing this struct out of the c file to
>> the header. Please don't.
>>
>>
>>> +
>>> +typedef void flow_indr_block_ing_cmd_t(struct net_device *dev,
>>> + struct flow_block *flow_block,
>>> + struct flow_indr_block_cb *indr_block_cb,
>>> + enum flow_block_command command);
>>> +
>>> +struct flow_indr_block_dev {
>>> + struct rhash_head ht_node;
>>> + struct net_device *dev;
>>> + unsigned int refcnt;
>>> + struct list_head cb_list;
>>> + flow_indr_block_ing_cmd_t *ing_cmd_cb;
>>> + struct flow_block *flow_block;
>> I don't understand why are you pushing this struct out of the c file to
>> the header. Please don't.
>
>the flow_indr_block_dev and indr_block_cb in the h file used for the function
You don't need it, same as before. Please don't expose this struct.
>
>tc_indr_block_ing_cmd in cls_api.c
>
>>> -static void tc_indr_block_ing_cmd(struct tc_indr_block_dev *indr_dev,
>>> - struct tc_indr_block_cb *indr_block_cb,
>>> +static void tc_indr_block_ing_cmd(struct net_device *dev,
>> I don't understand why you change struct tc_indr_block_dev * to
>> struct net_device * here. If you want to do that, please do that in a
>> separate patch, not it this one where only "the move" should happen.
Did you see the rest of my comments???
>>
^ permalink raw reply
* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
From: Allan W. Nielsen @ 2019-07-29 13:14 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel
In-Reply-To: <95315f9e-0d31-2d34-ba50-11e1bbc1465c@cumulusnetworks.com>
The 07/29/2019 15:22, Nikolay Aleksandrov wrote:
> Yes, all of the multicast code is handled differently, it doesn't go through the fdb
> lookup or code at all. I don't see how you'll do a lookup in the fdb table with a
> multicast mac address, take a look at br_handle_frame_finish() and you'll notice
> that when a multicast dmac is detected then we use the bridge mcast code for lookups
> and forwarding.
Here is my thinking (needs much more elaboration, which will come if we do a
patch to test it out):
In br_pkt_type
Rename BR_PKT_MULTICAST to BR_PKT_MULTICAST_IP
Add a new type called BR_PKT_MULTICAST_L2
In br_handle_frame_finish
if (is_multicast_ether_addr(dest)) {
/* by definition the broadcast is also a multicast address */
if (is_broadcast_ether_addr(dest)) {
pkt_type = BR_PKT_BROADCAST;
local_rcv = true;
} else {
pkt_type = BR_PKT_MULTICAST;
if (br_multicast_rcv(br, p, skb, vid))
goto drop;
}
}
Change the code above to detect if it is a BR_PKT_MULTICAST_IP or a
BR_PKT_MULTICAST_L2
In this section:
switch (pkt_type) {
....
}
if (dst) {
} else {
}
Add awareness to the BR_PKT_MULTICAST_L2 type, and allow it do forwarding
according to the static entry if it is there.
> If you're trying to achieve Rx only on the bridge of these then
> why not just use Ido's tc suggestion or even the ip maddr add offload for each port ?
>
> If you add a multicast mac in the fdb (currently allowed, but has no effect) and you
> use dev_mc_add() as suggested that'd just be a hack to pass it down and it is already
> possible to achieve via other methods, no need to go through the bridge.
Well, I wanted the SW bridge implementation to behave the same with an without
HW offload.
And also, I believe that is conceptually belongs to the MAC tables.
/Allan
^ permalink raw reply
* [PATCH stable 4.9] tcp: reset sk_send_head in tcp_write_queue_purge
From: Mao Wenan @ 2019-07-29 13:21 UTC (permalink / raw)
To: gregkh, stable; +Cc: netdev, linux-kernel
From: Soheil Hassas Yeganeh <soheil@google.com>
tcp_write_queue_purge clears all the SKBs in the write queue
but does not reset the sk_send_head. As a result, we can have
a NULL pointer dereference anywhere that we use tcp_send_head
instead of the tcp_write_queue_tail.
For example, after a27fd7a8ed38 (tcp: purge write queue upon RST),
we can purge the write queue on RST. Prior to
75c119afe14f (tcp: implement rb-tree based retransmit queue),
tcp_push will only check tcp_send_head and then accesses
tcp_write_queue_tail to send the actual SKB. As a result, it will
dereference a NULL pointer.
This has been reported twice for 4.14 where we don't have
75c119afe14f:
By Timofey Titovets:
[ 422.081094] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000038
[ 422.081254] IP: tcp_push+0x42/0x110
[ 422.081314] PGD 0 P4D 0
[ 422.081364] Oops: 0002 [#1] SMP PTI
By Yongjian Xu:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: tcp_push+0x48/0x120
PGD 80000007ff77b067 P4D 80000007ff77b067 PUD 7fd989067 PMD 0
Oops: 0002 [#18] SMP PTI
Modules linked in: tcp_diag inet_diag tcp_bbr sch_fq iTCO_wdt
iTCO_vendor_support pcspkr ixgbe mdio i2c_i801 lpc_ich joydev input_leds shpchp
e1000e igb dca ptp pps_core hwmon mei_me mei ipmi_si ipmi_msghandler sg ses
scsi_transport_sas enclosure ext4 jbd2 mbcache sd_mod ahci libahci megaraid_sas
wmi ast ttm dm_mirror dm_region_hash dm_log dm_mod dax
CPU: 6 PID: 14156 Comm: [ET_NET 6] Tainted: G D 4.14.26-1.el6.x86_64 #1
Hardware name: LENOVO ThinkServer RD440 /ThinkServer RD440, BIOS A0TS80A
09/22/2014
task: ffff8807d78d8140 task.stack: ffffc9000e944000
RIP: 0010:tcp_push+0x48/0x120
RSP: 0018:ffffc9000e947a88 EFLAGS: 00010246
RAX: 00000000000005b4 RBX: ffff880f7cce9c00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffff8807d00f5000
RBP: ffffc9000e947aa8 R08: 0000000000001c84 R09: 0000000000000000
R10: ffff8807d00f5158 R11: 0000000000000000 R12: ffff8807d00f5000
R13: 0000000000000020 R14: 00000000000256d4 R15: 0000000000000000
FS: 00007f5916de9700(0000) GS:ffff88107fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000038 CR3: 00000007f8226004 CR4: 00000000001606e0
Call Trace:
tcp_sendmsg_locked+0x33d/0xe50
tcp_sendmsg+0x37/0x60
inet_sendmsg+0x39/0xc0
sock_sendmsg+0x49/0x60
sock_write_iter+0xb6/0x100
do_iter_readv_writev+0xec/0x130
? rw_verify_area+0x49/0xb0
do_iter_write+0x97/0xd0
vfs_writev+0x7e/0xe0
? __wake_up_common_lock+0x80/0xa0
? __fget_light+0x2c/0x70
? __do_page_fault+0x1e7/0x530
do_writev+0x60/0xf0
? inet_shutdown+0xac/0x110
SyS_writev+0x10/0x20
do_syscall_64+0x6f/0x140
? prepare_exit_to_usermode+0x8b/0xa0
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x3135ce0c57
RSP: 002b:00007f5916de4b00 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000003135ce0c57
RDX: 0000000000000002 RSI: 00007f5916de4b90 RDI: 000000000000606f
RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f5916de8c38
R10: 0000000000000000 R11: 0000000000000293 R12: 00000000000464cc
R13: 00007f5916de8c30 R14: 00007f58d8bef080 R15: 0000000000000002
Code: 48 8b 97 60 01 00 00 4c 8d 97 58 01 00 00 41 b9 00 00 00 00 41 89 f3 4c 39
d2 49 0f 44 d1 41 81 e3 00 80 00 00 0f 85 b0 00 00 00 <80> 4a 38 08 44 8b 8f 74
06 00 00 44 89 8f 7c 06 00 00 83 e6 01
RIP: tcp_push+0x48/0x120 RSP: ffffc9000e947a88
CR2: 0000000000000038
---[ end trace 8d545c2e93515549 ]---
There is other scenario which found in stable 4.4:
Allocated:
[<ffffffff82f380a6>] __alloc_skb+0xe6/0x600 net/core/skbuff.c:218
[<ffffffff832466c3>] alloc_skb_fclone include/linux/skbuff.h:856 [inline]
[<ffffffff832466c3>] sk_stream_alloc_skb+0xa3/0x5d0 net/ipv4/tcp.c:833
[<ffffffff83249164>] tcp_sendmsg+0xd34/0x2b00 net/ipv4/tcp.c:1178
[<ffffffff83300ef3>] inet_sendmsg+0x203/0x4d0 net/ipv4/af_inet.c:755
Freed:
[<ffffffff82f372fd>] __kfree_skb+0x1d/0x20 net/core/skbuff.c:676
[<ffffffff83288834>] sk_wmem_free_skb include/net/sock.h:1447 [inline]
[<ffffffff83288834>] tcp_write_queue_purge include/net/tcp.h:1460 [inline]
[<ffffffff83288834>] tcp_connect_init net/ipv4/tcp_output.c:3122 [inline]
[<ffffffff83288834>] tcp_connect+0xb24/0x30c0 net/ipv4/tcp_output.c:3261
[<ffffffff8329b991>] tcp_v4_connect+0xf31/0x1890 net/ipv4/tcp_ipv4.c:246
BUG: KASAN: use-after-free in tcp_skb_pcount include/net/tcp.h:796 [inline]
BUG: KASAN: use-after-free in tcp_init_tso_segs net/ipv4/tcp_output.c:1619 [inline]
BUG: KASAN: use-after-free in tcp_write_xmit+0x3fc2/0x4cb0 net/ipv4/tcp_output.c:2056
[<ffffffff81515cd5>] kasan_report.cold.7+0x175/0x2f7 mm/kasan/report.c:408
[<ffffffff814f9784>] __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:427
[<ffffffff83286582>] tcp_skb_pcount include/net/tcp.h:796 [inline]
[<ffffffff83286582>] tcp_init_tso_segs net/ipv4/tcp_output.c:1619 [inline]
[<ffffffff83286582>] tcp_write_xmit+0x3fc2/0x4cb0 net/ipv4/tcp_output.c:2056
[<ffffffff83287a40>] __tcp_push_pending_frames+0xa0/0x290 net/ipv4/tcp_output.c:2307
stable 4.4 and stable 4.9 don't have the commit abb4a8b870b5 ("tcp: purge write queue upon RST")
which is referred in dbbf2d1e4077,
in tcp_connect_init, it calls tcp_write_queue_purge, and does not reset sk_send_head, then UAF.
stable 4.14 have the commit abb4a8b870b5 ("tcp: purge write queue upon RST"),
in tcp_reset, it calls tcp_write_queue_purge(sk), and does not reset sk_send_head, then UAF.
So this patch can be used to fix stable 4.4 and 4.9.
Fixes: a27fd7a8ed38 (tcp: purge write queue upon RST)
Reported-by: Timofey Titovets <nefelim4ag@gmail.com>
Reported-by: Yongjian Xu <yongjianchn@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Tested-by: Yongjian Xu <yongjianchn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Mao Wenan <maowenan@huawei.com>
---
include/net/tcp.h | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index d7047de952f0..1eda31f7f013 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1512,6 +1512,11 @@ struct sock *tcp_try_fastopen(struct sock *sk, struct sk_buff *skb,
void tcp_fastopen_init_key_once(bool publish);
#define TCP_FASTOPEN_KEY_LENGTH 16
+static inline void tcp_init_send_head(struct sock *sk)
+{
+ sk->sk_send_head = NULL;
+}
+
/* Fastopen key context */
struct tcp_fastopen_context {
struct crypto_cipher *tfm;
@@ -1528,6 +1533,7 @@ static inline void tcp_write_queue_purge(struct sock *sk)
sk_wmem_free_skb(sk, skb);
sk_mem_reclaim(sk);
tcp_clear_all_retrans_hints(tcp_sk(sk));
+ tcp_init_send_head(sk);
inet_csk(sk)->icsk_backoff = 0;
}
@@ -1589,11 +1595,6 @@ static inline void tcp_check_send_head(struct sock *sk, struct sk_buff *skb_unli
tcp_sk(sk)->highest_sack = NULL;
}
-static inline void tcp_init_send_head(struct sock *sk)
-{
- sk->sk_send_head = NULL;
-}
-
static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
{
__skb_queue_tail(&sk->sk_write_queue, skb);
--
2.20.1
^ permalink raw reply related
* [PATCH stable 4.4] tcp: reset sk_send_head in tcp_write_queue_purge
From: Mao Wenan @ 2019-07-29 13:22 UTC (permalink / raw)
To: gregkh, stable; +Cc: netdev, linux-kernel
From: Soheil Hassas Yeganeh <soheil@google.com>
tcp_write_queue_purge clears all the SKBs in the write queue
but does not reset the sk_send_head. As a result, we can have
a NULL pointer dereference anywhere that we use tcp_send_head
instead of the tcp_write_queue_tail.
For example, after a27fd7a8ed38 (tcp: purge write queue upon RST),
we can purge the write queue on RST. Prior to
75c119afe14f (tcp: implement rb-tree based retransmit queue),
tcp_push will only check tcp_send_head and then accesses
tcp_write_queue_tail to send the actual SKB. As a result, it will
dereference a NULL pointer.
This has been reported twice for 4.14 where we don't have
75c119afe14f:
By Timofey Titovets:
[ 422.081094] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000038
[ 422.081254] IP: tcp_push+0x42/0x110
[ 422.081314] PGD 0 P4D 0
[ 422.081364] Oops: 0002 [#1] SMP PTI
By Yongjian Xu:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: tcp_push+0x48/0x120
PGD 80000007ff77b067 P4D 80000007ff77b067 PUD 7fd989067 PMD 0
Oops: 0002 [#18] SMP PTI
Modules linked in: tcp_diag inet_diag tcp_bbr sch_fq iTCO_wdt
iTCO_vendor_support pcspkr ixgbe mdio i2c_i801 lpc_ich joydev input_leds shpchp
e1000e igb dca ptp pps_core hwmon mei_me mei ipmi_si ipmi_msghandler sg ses
scsi_transport_sas enclosure ext4 jbd2 mbcache sd_mod ahci libahci megaraid_sas
wmi ast ttm dm_mirror dm_region_hash dm_log dm_mod dax
CPU: 6 PID: 14156 Comm: [ET_NET 6] Tainted: G D 4.14.26-1.el6.x86_64 #1
Hardware name: LENOVO ThinkServer RD440 /ThinkServer RD440, BIOS A0TS80A
09/22/2014
task: ffff8807d78d8140 task.stack: ffffc9000e944000
RIP: 0010:tcp_push+0x48/0x120
RSP: 0018:ffffc9000e947a88 EFLAGS: 00010246
RAX: 00000000000005b4 RBX: ffff880f7cce9c00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffff8807d00f5000
RBP: ffffc9000e947aa8 R08: 0000000000001c84 R09: 0000000000000000
R10: ffff8807d00f5158 R11: 0000000000000000 R12: ffff8807d00f5000
R13: 0000000000000020 R14: 00000000000256d4 R15: 0000000000000000
FS: 00007f5916de9700(0000) GS:ffff88107fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000038 CR3: 00000007f8226004 CR4: 00000000001606e0
Call Trace:
tcp_sendmsg_locked+0x33d/0xe50
tcp_sendmsg+0x37/0x60
inet_sendmsg+0x39/0xc0
sock_sendmsg+0x49/0x60
sock_write_iter+0xb6/0x100
do_iter_readv_writev+0xec/0x130
? rw_verify_area+0x49/0xb0
do_iter_write+0x97/0xd0
vfs_writev+0x7e/0xe0
? __wake_up_common_lock+0x80/0xa0
? __fget_light+0x2c/0x70
? __do_page_fault+0x1e7/0x530
do_writev+0x60/0xf0
? inet_shutdown+0xac/0x110
SyS_writev+0x10/0x20
do_syscall_64+0x6f/0x140
? prepare_exit_to_usermode+0x8b/0xa0
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x3135ce0c57
RSP: 002b:00007f5916de4b00 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000003135ce0c57
RDX: 0000000000000002 RSI: 00007f5916de4b90 RDI: 000000000000606f
RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f5916de8c38
R10: 0000000000000000 R11: 0000000000000293 R12: 00000000000464cc
R13: 00007f5916de8c30 R14: 00007f58d8bef080 R15: 0000000000000002
Code: 48 8b 97 60 01 00 00 4c 8d 97 58 01 00 00 41 b9 00 00 00 00 41 89 f3 4c 39
d2 49 0f 44 d1 41 81 e3 00 80 00 00 0f 85 b0 00 00 00 <80> 4a 38 08 44 8b 8f 74
06 00 00 44 89 8f 7c 06 00 00 83 e6 01
RIP: tcp_push+0x48/0x120 RSP: ffffc9000e947a88
CR2: 0000000000000038
---[ end trace 8d545c2e93515549 ]---
There is other scenario which found in stable 4.4:
Allocated:
[<ffffffff82f380a6>] __alloc_skb+0xe6/0x600 net/core/skbuff.c:218
[<ffffffff832466c3>] alloc_skb_fclone include/linux/skbuff.h:856 [inline]
[<ffffffff832466c3>] sk_stream_alloc_skb+0xa3/0x5d0 net/ipv4/tcp.c:833
[<ffffffff83249164>] tcp_sendmsg+0xd34/0x2b00 net/ipv4/tcp.c:1178
[<ffffffff83300ef3>] inet_sendmsg+0x203/0x4d0 net/ipv4/af_inet.c:755
Freed:
[<ffffffff82f372fd>] __kfree_skb+0x1d/0x20 net/core/skbuff.c:676
[<ffffffff83288834>] sk_wmem_free_skb include/net/sock.h:1447 [inline]
[<ffffffff83288834>] tcp_write_queue_purge include/net/tcp.h:1460 [inline]
[<ffffffff83288834>] tcp_connect_init net/ipv4/tcp_output.c:3122 [inline]
[<ffffffff83288834>] tcp_connect+0xb24/0x30c0 net/ipv4/tcp_output.c:3261
[<ffffffff8329b991>] tcp_v4_connect+0xf31/0x1890 net/ipv4/tcp_ipv4.c:246
BUG: KASAN: use-after-free in tcp_skb_pcount include/net/tcp.h:796 [inline]
BUG: KASAN: use-after-free in tcp_init_tso_segs net/ipv4/tcp_output.c:1619 [inline]
BUG: KASAN: use-after-free in tcp_write_xmit+0x3fc2/0x4cb0 net/ipv4/tcp_output.c:2056
[<ffffffff81515cd5>] kasan_report.cold.7+0x175/0x2f7 mm/kasan/report.c:408
[<ffffffff814f9784>] __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:427
[<ffffffff83286582>] tcp_skb_pcount include/net/tcp.h:796 [inline]
[<ffffffff83286582>] tcp_init_tso_segs net/ipv4/tcp_output.c:1619 [inline]
[<ffffffff83286582>] tcp_write_xmit+0x3fc2/0x4cb0 net/ipv4/tcp_output.c:2056
[<ffffffff83287a40>] __tcp_push_pending_frames+0xa0/0x290 net/ipv4/tcp_output.c:2307
stable 4.4 and stable 4.9 don't have the commit abb4a8b870b5 ("tcp: purge write queue upon RST")
which is referred in dbbf2d1e4077,
in tcp_connect_init, it calls tcp_write_queue_purge, and does not reset sk_send_head, then UAF.
stable 4.14 have the commit abb4a8b870b5 ("tcp: purge write queue upon RST"),
in tcp_reset, it calls tcp_write_queue_purge(sk), and does not reset sk_send_head, then UAF.
So this patch can be used to fix stable 4.4 and 4.9.
Fixes: a27fd7a8ed38 (tcp: purge write queue upon RST)
Reported-by: Timofey Titovets <nefelim4ag@gmail.com>
Reported-by: Yongjian Xu <yongjianchn@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Tested-by: Yongjian Xu <yongjianchn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Mao Wenan <maowenan@huawei.com>
---
include/net/tcp.h | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index bf8a0dae977a..77438a8406ec 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1443,6 +1443,11 @@ struct sock *tcp_try_fastopen(struct sock *sk, struct sk_buff *skb,
void tcp_fastopen_init_key_once(bool publish);
#define TCP_FASTOPEN_KEY_LENGTH 16
+static inline void tcp_init_send_head(struct sock *sk)
+{
+ sk->sk_send_head = NULL;
+}
+
/* Fastopen key context */
struct tcp_fastopen_context {
struct crypto_cipher *tfm;
@@ -1459,6 +1464,7 @@ static inline void tcp_write_queue_purge(struct sock *sk)
sk_wmem_free_skb(sk, skb);
sk_mem_reclaim(sk);
tcp_clear_all_retrans_hints(tcp_sk(sk));
+ tcp_init_send_head(sk);
inet_csk(sk)->icsk_backoff = 0;
}
@@ -1520,11 +1526,6 @@ static inline void tcp_check_send_head(struct sock *sk, struct sk_buff *skb_unli
tcp_sk(sk)->highest_sack = NULL;
}
-static inline void tcp_init_send_head(struct sock *sk)
-{
- sk->sk_send_head = NULL;
-}
-
static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
{
__skb_queue_tail(&sk->sk_write_queue, skb);
--
2.20.1
^ permalink raw reply related
* Re: [PATCH v3] net: dsa: qca8k: enable port flow control
From: Andrew Lunn @ 2019-07-29 13:23 UTC (permalink / raw)
To: xiaofeis
Cc: davem, vkoul, netdev, linux-arm-msm, bjorn.andersson,
vivien.didelot, f.fainelli, niklas.cassel, xiazha
In-Reply-To: <fa444b03b42a2cb72037bc73a62f1976@codeaurora.org>
> But our qca8k HW can auto sync the pause status to MAC from phy with the
> auto-negotiated result.
> So no need to set in qca8k_adjust_link, since there is one setting in
> qca8k_port_set_status: mask |= QCA8K_PORT_STATUS_LINK_AUTO;
How does the auto-sync actually work? Does the MAC make MDIO reads to
the PHY? That is generally unsafe, since some PHYs support pages, and
the PHY driver might be using a different page while the MAC tries to
access the auto-neg results.
Do any of the ports support an external PHY? The auto-sync might not
work in that condition as well. Different register layout, c45 not
c22, etc.
The safest option is to explicitly set the MAC flow configuration
based on the values in phydev.
Andrew
^ permalink raw reply
* Re: [PATCH 2/4] net: phy: Add mdio-aspeed
From: Andrew Lunn @ 2019-07-29 13:32 UTC (permalink / raw)
To: Andrew Jeffery
Cc: netdev, davem, robh+dt, mark.rutland, joel, f.fainelli,
hkallweit1, devicetree, linux-arm-kernel, linux-aspeed,
linux-kernel
In-Reply-To: <20190729043926.32679-3-andrew@aj.id.au>
On Mon, Jul 29, 2019 at 02:09:24PM +0930, Andrew Jeffery wrote:
> The AST2600 design separates the MDIO controllers from the MAC, which is
> where they were placed in the AST2400 and AST2500. Further, the register
> interface is reworked again, so now we have three possible different
> interface implementations, however this driver only supports the
> interface provided by the AST2600. The AST2400 and AST2500 will continue
> to be supported by the MDIO support embedded in the FTGMAC100 driver.
>
> Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
> ---
> drivers/net/phy/Kconfig | 13 +++
> drivers/net/phy/Makefile | 1 +
> drivers/net/phy/mdio-aspeed.c | 159 ++++++++++++++++++++++++++++++++++
> 3 files changed, 173 insertions(+)
> create mode 100644 drivers/net/phy/mdio-aspeed.c
>
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index 20f14c5fbb7e..206d8650ee7f 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -21,6 +21,19 @@ config MDIO_BUS
>
> if MDIO_BUS
>
> +config MDIO_ASPEED
> + tristate "ASPEED MDIO bus controller"
> + depends on ARCH_ASPEED || COMPILE_TEST
> + depends on OF_MDIO && HAS_IOMEM
> + help
> + This module provides a driver for the independent MDIO bus
> + controllers found in the ASPEED AST2600 SoC. This is a driver for the
> + third revision of the ASPEED MDIO register interface - the first two
> + revisions are the "old" and "new" interfaces found in the AST2400 and
> + AST2500, embedded in the MAC. For legacy reasons, FTGMAC100 driver
> + continues to drive the embedded MDIO controller for the AST2400 and
> + AST2500 SoCs, so say N if AST2600 support is not required.
> +
> config MDIO_BCM_IPROC
> tristate "Broadcom iProc MDIO bus controller"
> depends on ARCH_BCM_IPROC || COMPILE_TEST
> diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
> index 839acb292c38..ba07c27e4208 100644
> --- a/drivers/net/phy/Makefile
> +++ b/drivers/net/phy/Makefile
> @@ -22,6 +22,7 @@ libphy-$(CONFIG_LED_TRIGGER_PHY) += phy_led_triggers.o
> obj-$(CONFIG_PHYLINK) += phylink.o
> obj-$(CONFIG_PHYLIB) += libphy.o
>
> +obj-$(CONFIG_MDIO_ASPEED) += mdio-aspeed.o
> obj-$(CONFIG_MDIO_BCM_IPROC) += mdio-bcm-iproc.o
> obj-$(CONFIG_MDIO_BCM_UNIMAC) += mdio-bcm-unimac.o
> obj-$(CONFIG_MDIO_BITBANG) += mdio-bitbang.o
> diff --git a/drivers/net/phy/mdio-aspeed.c b/drivers/net/phy/mdio-aspeed.c
> new file mode 100644
> index 000000000000..71496a9ff54a
> --- /dev/null
> +++ b/drivers/net/phy/mdio-aspeed.c
> @@ -0,0 +1,159 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/* Copyright (C) 2019 IBM Corp. */
> +
> +#include <linux/bitfield.h>
> +#include <linux/delay.h>
> +#include <linux/mdio.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/of_mdio.h>
> +#include <linux/phy.h>
> +#include <linux/platform_device.h>
> +
> +#define DRV_NAME "mdio-aspeed"
> +
> +#define ASPEED_MDIO_CTRL 0x0
> +#define ASPEED_MDIO_CTRL_FIRE BIT(31)
> +#define ASPEED_MDIO_CTRL_ST BIT(28)
> +#define ASPEED_MDIO_CTRL_ST_C45 0
> +#define ASPEED_MDIO_CTRL_ST_C22 1
> +#define ASPEED_MDIO_CTRL_OP GENMASK(27, 26)
> +#define MDIO_C22_OP_WRITE 0b01
> +#define MDIO_C22_OP_READ 0b10
> +#define ASPEED_MDIO_CTRL_PHYAD GENMASK(25, 21)
> +#define ASPEED_MDIO_CTRL_REGAD GENMASK(20, 16)
> +#define ASPEED_MDIO_CTRL_MIIWDATA GENMASK(15, 0)
> +
> +#define ASPEED_MDIO_DATA 0x4
> +#define ASPEED_MDIO_DATA_MDC_THRES GENMASK(31, 24)
> +#define ASPEED_MDIO_DATA_MDIO_EDGE BIT(23)
> +#define ASPEED_MDIO_DATA_MDIO_LATCH GENMASK(22, 20)
> +#define ASPEED_MDIO_DATA_IDLE BIT(16)
> +#define ASPEED_MDIO_DATA_MIIRDATA GENMASK(15, 0)
> +
> +#define ASPEED_MDIO_RETRIES 10
> +
> +struct aspeed_mdio {
> + void __iomem *base;
> +};
> +
> +static int aspeed_mdio_read(struct mii_bus *bus, int addr, int regnum)
> +{
> + struct aspeed_mdio *ctx = bus->priv;
> + u32 ctrl;
> + int i;
> +
> + dev_dbg(&bus->dev, "%s: addr: %d, regnum: %d\n", __func__, addr,
> + regnum);
> +
> + /* Just clause 22 for the moment */
> + ctrl = ASPEED_MDIO_CTRL_FIRE
Hi Andrew
In the binding, you say C45 is supported. Here you don't. It would be
nice to be consistent.
> + | FIELD_PREP(ASPEED_MDIO_CTRL_ST, ASPEED_MDIO_CTRL_ST_C22)
> + | FIELD_PREP(ASPEED_MDIO_CTRL_OP, MDIO_C22_OP_READ)
> + | FIELD_PREP(ASPEED_MDIO_CTRL_PHYAD, addr)
> + | FIELD_PREP(ASPEED_MDIO_CTRL_REGAD, regnum);
> +
> + iowrite32(ctrl, ctx->base + ASPEED_MDIO_CTRL);
> +
> + for (i = 0; i < ASPEED_MDIO_RETRIES; i++) {
> + u32 data;
> +
> + data = ioread32(ctx->base + ASPEED_MDIO_DATA);
> + if (data & ASPEED_MDIO_DATA_IDLE)
> + return FIELD_GET(ASPEED_MDIO_DATA_MIIRDATA, data);
> +
> + udelay(100);
> + }
One of the readx_poll_timeout functions could be used.
> +
> + dev_err(&bus->dev, "MDIO read failed\n");
> + return -EIO;
> +}
> +
> +static int aspeed_mdio_write(struct mii_bus *bus, int addr, int regnum, u16 val)
> +{
> + struct aspeed_mdio *ctx = bus->priv;
> + u32 ctrl;
> + int i;
> +
> + dev_dbg(&bus->dev, "%s: addr: %d, regnum: %d, val: 0x%x\n",
> + __func__, addr, regnum, val);
> +
> + /* Just clause 22 for the moment */
> + ctrl = ASPEED_MDIO_CTRL_FIRE
> + | FIELD_PREP(ASPEED_MDIO_CTRL_ST, ASPEED_MDIO_CTRL_ST_C22)
> + | FIELD_PREP(ASPEED_MDIO_CTRL_OP, MDIO_C22_OP_WRITE)
> + | FIELD_PREP(ASPEED_MDIO_CTRL_PHYAD, addr)
> + | FIELD_PREP(ASPEED_MDIO_CTRL_REGAD, regnum)
> + | FIELD_PREP(ASPEED_MDIO_CTRL_MIIWDATA, val);
> +
> + iowrite32(ctrl, ctx->base + ASPEED_MDIO_CTRL);
> +
> + for (i = 0; i < ASPEED_MDIO_RETRIES; i++) {
> + ctrl = ioread32(ctx->base + ASPEED_MDIO_CTRL);
> + if (!(ctrl & ASPEED_MDIO_CTRL_FIRE))
> + return 0;
> +
> + udelay(100);
> + }
readx_poll_timeout() here as well.
Otherwise this looks good.
Andrew
^ permalink raw reply
* [PATCH v4 04/14] NFC: nxp-nci: Convert to use GPIO descriptor
From: Andy Shevchenko @ 2019-07-29 13:35 UTC (permalink / raw)
To: Clément Perrochaud, Charles Gorand, netdev, David S. Miller,
Sedat Dilek
Cc: Andy Shevchenko, Sedat Dilek
In-Reply-To: <20190729133514.13164-1-andriy.shevchenko@linux.intel.com>
Since we got rid of platform data, the driver may use
GPIO descriptor directly.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
---
drivers/nfc/nxp-nci/core.c | 1 -
drivers/nfc/nxp-nci/i2c.c | 60 ++++++++++----------------------------
2 files changed, 15 insertions(+), 46 deletions(-)
diff --git a/drivers/nfc/nxp-nci/core.c b/drivers/nfc/nxp-nci/core.c
index aed18ca60170..a0ce95a287c5 100644
--- a/drivers/nfc/nxp-nci/core.c
+++ b/drivers/nfc/nxp-nci/core.c
@@ -11,7 +11,6 @@
*/
#include <linux/delay.h>
-#include <linux/gpio.h>
#include <linux/module.h>
#include <linux/nfc.h>
diff --git a/drivers/nfc/nxp-nci/i2c.c b/drivers/nfc/nxp-nci/i2c.c
index 47b3b7e612e6..713c267acf88 100644
--- a/drivers/nfc/nxp-nci/i2c.c
+++ b/drivers/nfc/nxp-nci/i2c.c
@@ -21,8 +21,6 @@
#include <linux/module.h>
#include <linux/nfc.h>
#include <linux/gpio/consumer.h>
-#include <linux/of_gpio.h>
-#include <linux/of_irq.h>
#include <asm/unaligned.h>
#include <net/nfc/nfc.h>
@@ -37,8 +35,8 @@ struct nxp_nci_i2c_phy {
struct i2c_client *i2c_dev;
struct nci_dev *ndev;
- unsigned int gpio_en;
- unsigned int gpio_fw;
+ struct gpio_desc *gpiod_en;
+ struct gpio_desc *gpiod_fw;
int hard_fault; /*
* < 0 if hardware error occurred (e.g. i2c err)
@@ -51,8 +49,8 @@ static int nxp_nci_i2c_set_mode(void *phy_id,
{
struct nxp_nci_i2c_phy *phy = (struct nxp_nci_i2c_phy *) phy_id;
- gpio_set_value(phy->gpio_fw, (mode == NXP_NCI_MODE_FW) ? 1 : 0);
- gpio_set_value(phy->gpio_en, (mode != NXP_NCI_MODE_COLD) ? 1 : 0);
+ gpiod_set_value(phy->gpiod_fw, (mode == NXP_NCI_MODE_FW) ? 1 : 0);
+ gpiod_set_value(phy->gpiod_en, (mode != NXP_NCI_MODE_COLD) ? 1 : 0);
usleep_range(10000, 15000);
if (mode == NXP_NCI_MODE_COLD)
@@ -252,30 +250,18 @@ static irqreturn_t nxp_nci_i2c_irq_thread_fn(int irq, void *phy_id)
static int nxp_nci_i2c_parse_devtree(struct i2c_client *client)
{
struct nxp_nci_i2c_phy *phy = i2c_get_clientdata(client);
- struct device_node *pp;
- int r;
-
- pp = client->dev.of_node;
- if (!pp)
- return -ENODEV;
- r = of_get_named_gpio(pp, "enable-gpios", 0);
- if (r == -EPROBE_DEFER)
- r = of_get_named_gpio(pp, "enable-gpios", 0);
- if (r < 0) {
- nfc_err(&client->dev, "Failed to get EN gpio, error: %d\n", r);
- return r;
+ phy->gpiod_en = devm_gpiod_get(&client->dev, "enable", GPIOD_OUT_LOW);
+ if (IS_ERR(phy->gpiod_en)) {
+ nfc_err(&client->dev, "Failed to get EN gpio\n");
+ return PTR_ERR(phy->gpiod_en);
}
- phy->gpio_en = r;
- r = of_get_named_gpio(pp, "firmware-gpios", 0);
- if (r == -EPROBE_DEFER)
- r = of_get_named_gpio(pp, "firmware-gpios", 0);
- if (r < 0) {
- nfc_err(&client->dev, "Failed to get FW gpio, error: %d\n", r);
- return r;
+ phy->gpiod_fw = devm_gpiod_get(&client->dev, "firmware", GPIOD_OUT_LOW);
+ if (IS_ERR(phy->gpiod_fw)) {
+ nfc_err(&client->dev, "Failed to get FW gpio\n");
+ return PTR_ERR(phy->gpiod_fw);
}
- phy->gpio_fw = r;
return 0;
}
@@ -283,19 +269,15 @@ static int nxp_nci_i2c_parse_devtree(struct i2c_client *client)
static int nxp_nci_i2c_acpi_config(struct nxp_nci_i2c_phy *phy)
{
struct i2c_client *client = phy->i2c_dev;
- struct gpio_desc *gpiod_en, *gpiod_fw;
- gpiod_en = devm_gpiod_get_index(&client->dev, NULL, 2, GPIOD_OUT_LOW);
- gpiod_fw = devm_gpiod_get_index(&client->dev, NULL, 1, GPIOD_OUT_LOW);
+ phy->gpiod_en = devm_gpiod_get_index(&client->dev, NULL, 2, GPIOD_OUT_LOW);
+ phy->gpiod_fw = devm_gpiod_get_index(&client->dev, NULL, 1, GPIOD_OUT_LOW);
- if (IS_ERR(gpiod_en) || IS_ERR(gpiod_fw)) {
+ if (IS_ERR(phy->gpiod_en) || IS_ERR(phy->gpiod_fw)) {
nfc_err(&client->dev, "No GPIOs\n");
return -EINVAL;
}
- phy->gpio_en = desc_to_gpio(gpiod_en);
- phy->gpio_fw = desc_to_gpio(gpiod_fw);
-
return 0;
}
@@ -331,24 +313,12 @@ static int nxp_nci_i2c_probe(struct i2c_client *client,
r = nxp_nci_i2c_acpi_config(phy);
if (r < 0)
goto probe_exit;
- goto nci_probe;
} else {
nfc_err(&client->dev, "No platform data\n");
r = -EINVAL;
goto probe_exit;
}
- r = devm_gpio_request_one(&phy->i2c_dev->dev, phy->gpio_en,
- GPIOF_OUT_INIT_LOW, "nxp_nci_en");
- if (r < 0)
- goto probe_exit;
-
- r = devm_gpio_request_one(&phy->i2c_dev->dev, phy->gpio_fw,
- GPIOF_OUT_INIT_LOW, "nxp_nci_fw");
- if (r < 0)
- goto probe_exit;
-
-nci_probe:
r = nxp_nci_probe(phy, &client->dev, &i2c_phy_ops,
NXP_NCI_I2C_MAX_PAYLOAD, &phy->ndev);
if (r < 0)
--
2.20.1
^ permalink raw reply related
* [PATCH v4 02/14] NFC: nxp-nci: Add NXP1001 to the ACPI ID table
From: Andy Shevchenko @ 2019-07-29 13:35 UTC (permalink / raw)
To: Clément Perrochaud, Charles Gorand, netdev, David S. Miller,
Sedat Dilek
Cc: Andy Shevchenko, Sedat Dilek
In-Reply-To: <20190729133514.13164-1-andriy.shevchenko@linux.intel.com>
It seems a lot of laptops are equipped with NXP NFC300 chip with
the ACPI ID NXP1001 as per DSDT.
Append it to the driver's ACPI ID table.
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
---
drivers/nfc/nxp-nci/i2c.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/nfc/nxp-nci/i2c.c b/drivers/nfc/nxp-nci/i2c.c
index 4aeb3861b409..5db71869f04b 100644
--- a/drivers/nfc/nxp-nci/i2c.c
+++ b/drivers/nfc/nxp-nci/i2c.c
@@ -396,6 +396,7 @@ MODULE_DEVICE_TABLE(of, of_nxp_nci_i2c_match);
#ifdef CONFIG_ACPI
static struct acpi_device_id acpi_id[] = {
+ { "NXP1001" },
{ "NXP7471" },
{ },
};
--
2.20.1
^ permalink raw reply related
* [PATCH v4 09/14] NFC: nxp-nci: Drop of_match_ptr() use
From: Andy Shevchenko @ 2019-07-29 13:35 UTC (permalink / raw)
To: Clément Perrochaud, Charles Gorand, netdev, David S. Miller,
Sedat Dilek
Cc: Andy Shevchenko, Sedat Dilek
In-Reply-To: <20190729133514.13164-1-andriy.shevchenko@linux.intel.com>
There is no need to guard OF device ID table with of_match_ptr().
Otherwise we would get a defined but not used data.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
---
drivers/nfc/nxp-nci/i2c.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nfc/nxp-nci/i2c.c b/drivers/nfc/nxp-nci/i2c.c
index 4e71962dc557..f2c8a560e265 100644
--- a/drivers/nfc/nxp-nci/i2c.c
+++ b/drivers/nfc/nxp-nci/i2c.c
@@ -342,7 +342,7 @@ static struct i2c_driver nxp_nci_i2c_driver = {
.driver = {
.name = NXP_NCI_I2C_DRIVER_NAME,
.acpi_match_table = ACPI_PTR(acpi_id),
- .of_match_table = of_match_ptr(of_nxp_nci_i2c_match),
+ .of_match_table = of_nxp_nci_i2c_match,
},
.probe = nxp_nci_i2c_probe,
.id_table = nxp_nci_i2c_id_table,
--
2.20.1
^ permalink raw reply related
* [PATCH v4 00/14] NFC: nxp-nci: clean up and new device support
From: Andy Shevchenko @ 2019-07-29 13:35 UTC (permalink / raw)
To: Clément Perrochaud, Charles Gorand, netdev, David S. Miller,
Sedat Dilek
Cc: Andy Shevchenko
Few people reported that some laptops are coming with new ACPI ID for the
devices should be supported by nxp-nci driver.
This series adds new ID (patch 2), cleans up the driver from legacy platform
data and unifies GPIO request for Device Tree and ACPI (patches 3-6), removes
dead or unneeded code (patches 7, 9, 11), constifies ID table (patch 8),
removes comma in terminator line for better maintenance (patch 10) and
rectifies Kconfig entry (patches 12-14).
It also contains a fix for NFC subsystem as suggested by Sedat.
Series has been tested by Sedat.
Changelog v4:
- rebased on top of latest linux-next
- appended cover letter
- elaborated removal of pr_fmt() in the patch 11 (David)
Andrey Konovalov (1):
NFC: fix attrs checks in netlink interface
Andy Shevchenko (11):
NFC: nxp-nci: Add NXP1001 to the ACPI ID table
NFC: nxp-nci: Get rid of platform data
NFC: nxp-nci: Convert to use GPIO descriptor
NFC: nxp-nci: Add GPIO ACPI mapping table
NFC: nxp-nci: Get rid of code duplication in ->probe()
NFC: nxp-nci: Get rid of useless label
NFC: nxp-nci: Constify acpi_device_id
NFC: nxp-nci: Drop of_match_ptr() use
NFC: nxp-nci: Drop comma in terminator lines
NFC: nxp-nci: Remove unused macro pr_fmt()
NFC: nxp-nci: Remove 'default n' for the core
Sedat Dilek (2):
NFC: nxp-nci: Clarify on supported chips
NFC: nxp-nci: Fix recommendation for NFC_NXP_NCI_I2C Kconfig
MAINTAINERS | 1 -
drivers/nfc/nxp-nci/Kconfig | 7 +-
drivers/nfc/nxp-nci/core.c | 2 -
drivers/nfc/nxp-nci/i2c.c | 134 +++++++-------------------
drivers/nfc/nxp-nci/nxp-nci.h | 1 -
include/linux/platform_data/nxp-nci.h | 19 ----
net/nfc/netlink.c | 6 +-
7 files changed, 41 insertions(+), 129 deletions(-)
delete mode 100644 include/linux/platform_data/nxp-nci.h
--
2.20.1
^ permalink raw reply
* [PATCH v4 10/14] NFC: nxp-nci: Drop comma in terminator lines
From: Andy Shevchenko @ 2019-07-29 13:35 UTC (permalink / raw)
To: Clément Perrochaud, Charles Gorand, netdev, David S. Miller,
Sedat Dilek
Cc: Andy Shevchenko, Sedat Dilek
In-Reply-To: <20190729133514.13164-1-andriy.shevchenko@linux.intel.com>
There is no need to have a comma after terminator entry
in the arrays of IDs.
This may prevent the misguided addition behind the terminator
without compiler notice.
Drop the comma in terminator lines for good.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
---
drivers/nfc/nxp-nci/i2c.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/nfc/nxp-nci/i2c.c b/drivers/nfc/nxp-nci/i2c.c
index f2c8a560e265..59b0a02a813d 100644
--- a/drivers/nfc/nxp-nci/i2c.c
+++ b/drivers/nfc/nxp-nci/i2c.c
@@ -325,7 +325,7 @@ MODULE_DEVICE_TABLE(i2c, nxp_nci_i2c_id_table);
static const struct of_device_id of_nxp_nci_i2c_match[] = {
{ .compatible = "nxp,nxp-nci-i2c", },
- {},
+ {}
};
MODULE_DEVICE_TABLE(of, of_nxp_nci_i2c_match);
@@ -333,7 +333,7 @@ MODULE_DEVICE_TABLE(of, of_nxp_nci_i2c_match);
static const struct acpi_device_id acpi_id[] = {
{ "NXP1001" },
{ "NXP7471" },
- { },
+ { }
};
MODULE_DEVICE_TABLE(acpi, acpi_id);
#endif
--
2.20.1
^ permalink raw reply related
* [PATCH v4 12/14] NFC: nxp-nci: Remove 'default n' for the core
From: Andy Shevchenko @ 2019-07-29 13:35 UTC (permalink / raw)
To: Clément Perrochaud, Charles Gorand, netdev, David S. Miller,
Sedat Dilek
Cc: Andy Shevchenko, Sedat Dilek
In-Reply-To: <20190729133514.13164-1-andriy.shevchenko@linux.intel.com>
It seems contributors follow the style of Kconfig entries where explicit
'default n' is present. The default 'default' is 'n' already, thus, drop
these lines from Kconfig to make it more clear.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
---
drivers/nfc/nxp-nci/Kconfig | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/nfc/nxp-nci/Kconfig b/drivers/nfc/nxp-nci/Kconfig
index 12df2c8cc51d..ed6cbdf0f0b4 100644
--- a/drivers/nfc/nxp-nci/Kconfig
+++ b/drivers/nfc/nxp-nci/Kconfig
@@ -2,7 +2,6 @@
config NFC_NXP_NCI
tristate "NXP-NCI NFC driver"
depends on NFC_NCI
- default n
---help---
Generic core driver for NXP NCI chips such as the NPC100
or PN7150 families.
--
2.20.1
^ permalink raw reply related
* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
From: Nikolay Aleksandrov @ 2019-07-29 13:42 UTC (permalink / raw)
To: Allan W. Nielsen
Cc: Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel
In-Reply-To: <20190729131420.tqukz55tz26jkg73@lx-anielsen.microsemi.net>
On 29/07/2019 16:14, Allan W. Nielsen wrote:
> The 07/29/2019 15:22, Nikolay Aleksandrov wrote:
>> Yes, all of the multicast code is handled differently, it doesn't go through the fdb
>> lookup or code at all. I don't see how you'll do a lookup in the fdb table with a
>> multicast mac address, take a look at br_handle_frame_finish() and you'll notice
>> that when a multicast dmac is detected then we use the bridge mcast code for lookups
>> and forwarding.
>
> Here is my thinking (needs much more elaboration, which will come if we do a
> patch to test it out):
>
>
> In br_pkt_type
>
> Rename BR_PKT_MULTICAST to BR_PKT_MULTICAST_IP
> Add a new type called BR_PKT_MULTICAST_L2
>
> In br_handle_frame_finish
>
> if (is_multicast_ether_addr(dest)) {
> /* by definition the broadcast is also a multicast address */
> if (is_broadcast_ether_addr(dest)) {
> pkt_type = BR_PKT_BROADCAST;
> local_rcv = true;
> } else {
> pkt_type = BR_PKT_MULTICAST;
> if (br_multicast_rcv(br, p, skb, vid))
> goto drop;
> }
> }
>
> Change the code above to detect if it is a BR_PKT_MULTICAST_IP or a
> BR_PKT_MULTICAST_L2
>
>
> In this section:
>
> switch (pkt_type) {
> ....
> }
>
> if (dst) {
> } else {
> }
>
> Add awareness to the BR_PKT_MULTICAST_L2 type, and allow it do forwarding
> according to the static entry if it is there.
>
All of these add overhead to everyone - standard unicast and multicast forwarding.
Also as mentioned in my second email the fdb would need changes which would further
increase that overhead and also the code complexity for everyone for a use-case which
is very rare when there are alternatives today which avoid all of that. Offload tc rules
and add a rule to allow that traffic on ingress for particular ports, then either use
the bridge multicast flood flag or tc again to restrict flood to other egress ports.
Alternatively use ip maddr add which calls dev_mc_add() which is what the patch was
trying to do in the first place to allow the Rx of these packets and then control
the flooding via one of the above methods.
Alternatively offload nft and use that to control the traffic.
I'm sure there are more ways that I'm missing. :)
If you find a way to achieve this without incurring a performance hit or significant
code complexity increase, and without breaking current use-cases (e.g. unexpected default
forwarding behaviour changes) then please send a patch and we can discuss it further with
all details present. People have provided enough alternatives which avoid all of the
problems.
>> If you're trying to achieve Rx only on the bridge of these then
>> why not just use Ido's tc suggestion or even the ip maddr add offload for each port ?
>>
>> If you add a multicast mac in the fdb (currently allowed, but has no effect) and you
>> use dev_mc_add() as suggested that'd just be a hack to pass it down and it is already
>> possible to achieve via other methods, no need to go through the bridge.
>
> Well, I wanted the SW bridge implementation to behave the same with an without
> HW offload.
>
> And also, I believe that is conceptually belongs to the MAC tables.
>
> /Allan
>
^ permalink raw reply
* Re: [PATCH] net: phy: phy_led_triggers: Fix a possible null-pointer dereference in phy_led_trigger_change_speed()
From: Andrew Lunn @ 2019-07-29 13:45 UTC (permalink / raw)
To: Jia-Ju Bai; +Cc: f.fainelli, hkallweit1, davem, netdev, linux-kernel
In-Reply-To: <20190729092424.30928-1-baijiaju1990@gmail.com>
On Mon, Jul 29, 2019 at 05:24:24PM +0800, Jia-Ju Bai wrote:
> In phy_led_trigger_change_speed(), there is an if statement on line 48
> to check whether phy->last_triggered is NULL:
> if (!phy->last_triggered)
>
> When phy->last_triggered is NULL, it is used on line 52:
> led_trigger_event(&phy->last_triggered->trigger, LED_OFF);
>
> Thus, a possible null-pointer dereference may occur.
>
> To fix this bug, led_trigger_event(&phy->last_triggered->trigger,
> LED_OFF) is called when phy->last_triggered is not NULL.
>
> This bug is found by a static analysis tool STCheck written by us.
Who is 'us'?
Thanks
Andrew
^ permalink raw reply
* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
From: Allan W. Nielsen @ 2019-07-29 13:52 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel
In-Reply-To: <3cc69103-d194-2eca-e7dd-e2fa6a730223@cumulusnetworks.com>
The 07/29/2019 15:50, Nikolay Aleksandrov wrote:
> On 29/07/2019 15:22, Nikolay Aleksandrov wrote:
> > Hi Allan,
> > On 29/07/2019 15:14, Allan W. Nielsen wrote:
> >> First of all, as mentioned further down in this thread, I realized that our
> >> implementation of the multicast floodmasks does not align with the existing SW
> >> implementation. We will change this, such that all multicast packets goes to the
> >> SW bridge.
> >>
> >> This changes things a bit, not that much.
> >>
> >> I actually think you summarized the issue we have (after changing to multicast
> >> flood-masks) right here:
> >>
> >> The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
> >>>>> Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
> >>>>> traffic will always be flooded by the bridge (and also a copy will be locally sent up).
> >>>>> Thus only the flooding may need to be controlled.
> >>
> >> This seems to be exactly what we need.
> >>
> >> Assuming we have a SW bridge (br0) with 4 slave interfaces (eth0-3). We use this
> >> on a network where we want to limit the flooding of frames with dmac
> >> 01:21:6C:00:00:01 (which is non IP traffic) to eth0 and eth1.
> >>
> >> One way of doing this could potentially be to support the following command:
> >>
> >> bridge fdb add 01:21:6C:00:00:01 port eth0
> >> bridge fdb append 01:21:6C:00:00:01 port eth1
> >>
>
> And the fdbs become linked lists?
Yes, it will most likely become a linked list
> So we'll increase the complexity for something that is already supported by
> ACLs (e.g. tc) and also bridge per-port multicast flood flag ?
I do not think it can be supported with the facilities we have today in tc.
We can do half of it (copy more fraems to the CPU) with tc, but we can not limit
the floodmask of a frame with tc (say we want it to flood to 2 out of 4 slave
ports).
> I'm sorry but that doesn't sound good to me for a case which is very rare and
> there are existing ways to solve without incurring performance hits or increasing
> code complexity.
I do not consider it rarely, controling the forwarding of L2 multicast frames is
quite common in the applications we are doing.
> If you find a way to achieve this without incurring a performance hit or significant
> code complexity increase, and without breaking current use-cases (e.g. unexpected default
> forwarding behaviour changes) then please send a patch and we can discuss it further with
> all details present. People have provided enough alternatives which avoid all of the
> problems.
Will do, thanks for the guidance.
/Allan
^ permalink raw reply
* Re: [PATCH v6 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions
From: Gal Pressman @ 2019-07-29 13:53 UTC (permalink / raw)
To: Michal Kalderon, Jason Gunthorpe
Cc: Ariel Elior, dledford@redhat.com, linux-rdma@vger.kernel.org,
davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <MN2PR18MB3182F4557BC042EE37A3C565A1DD0@MN2PR18MB3182.namprd18.prod.outlook.com>
On 29/07/2019 15:58, Michal Kalderon wrote:
>> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
>> owner@vger.kernel.org> On Behalf Of Jason Gunthorpe
>>
>>> + xa_lock(&ucontext->mmap_xa);
>>> + if (check_add_overflow(ucontext->mmap_xa_page,
>>> + (u32)(length >> PAGE_SHIFT),
>>> + &next_mmap_page))
>>> + goto err_unlock;
>>
>> I still don't like that this algorithm latches into a permanent failure when the
>> xa_page wraps.
>>
>> It seems worth spending a bit more time here to tidy this.. Keep using the
>> mmap_xa_page scheme, but instead do something like
>>
>> alloc_cyclic_range():
>>
>> while () {
>> // Find first empty element in a cyclic way
>> xa_page_first = mmap_xa_page;
>> xa_find(xa, &xa_page_first, U32_MAX, XA_FREE_MARK)
>>
>> // Is there a enough room to have the range?
>> if (check_add_overflow(xa_page_first, npages, &xa_page_end)) {
>> mmap_xa_page = 0;
>> continue;
>> }
>>
>> // See if the element before intersects
>> elm = xa_find(xa, &zero, xa_page_end, 0);
>> if (elm && intersects(xa_page_first, xa_page_last, elm->first, elm->last)) {
>> mmap_xa_page = elm->last + 1;
>> continue
>> }
>>
>> // xa_page_first -> xa_page_end should now be free
>> xa_insert(xa, xa_page_start, entry);
>> mmap_xa_page = xa_page_end + 1;
>> return xa_page_start;
>> }
>>
>> Approximately, please check it.
> Gal & Jason,
>
> Coming back to the mmap_xa_page algorithm. I couldn't find some background on this.
> Why do you need the length to be represented in the mmap_xa_page ?
> Why not simply use xa_alloc_cyclic ( like in siw )
> This is simply a key to a mmap object...
The intention was that the entry would "occupy" number of xarray elements
according to its size (in pages). It wasn't initially like this, but IIRC this
was preferred by Jason.
^ permalink raw reply
* Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput
From: Michael S. Tsirkin @ 2019-07-29 13:59 UTC (permalink / raw)
To: Stefano Garzarella
Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
virtualization, Jason Wang, kvm
In-Reply-To: <20190717113030.163499-1-sgarzare@redhat.com>
On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> This series tries to increase the throughput of virtio-vsock with slight
> changes.
> While I was testing the v2 of this series I discovered an huge use of memory,
> so I added patch 1 to mitigate this issue. I put it in this series in order
> to better track the performance trends.
Series:
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Can this go into net-next?
> v4:
> - rebased all patches on current master (conflicts is Patch 4)
> - Patch 1: added Stefan's R-b
> - Patch 3: removed lock when buf_alloc is written [David];
> moved this patch after "vsock/virtio: reduce credit update messages"
> to make it clearer
> - Patch 4: vhost_exceeds_weight() is recently introduced, so I've solved some
> conflicts
>
> v3: https://patchwork.kernel.org/cover/10970145
>
> v2: https://patchwork.kernel.org/cover/10938743
>
> v1: https://patchwork.kernel.org/cover/10885431
>
> Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> support. As Micheal suggested in the v1, I booted host and guest with 'nosmap'.
>
> A brief description of patches:
> - Patches 1: limit the memory usage with an extra copy for small packets
> - Patches 2+3: reduce the number of credit update messages sent to the
> transmitter
> - Patches 4+5: allow the host to split packets on multiple buffers and use
> VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
>
> host -> guest [Gbps]
> pkt_size before opt p 1 p 2+3 p 4+5
>
> 32 0.032 0.030 0.048 0.051
> 64 0.061 0.059 0.108 0.117
> 128 0.122 0.112 0.227 0.234
> 256 0.244 0.241 0.418 0.415
> 512 0.459 0.466 0.847 0.865
> 1K 0.927 0.919 1.657 1.641
> 2K 1.884 1.813 3.262 3.269
> 4K 3.378 3.326 6.044 6.195
> 8K 5.637 5.676 10.141 11.287
> 16K 8.250 8.402 15.976 16.736
> 32K 13.327 13.204 19.013 20.515
> 64K 21.241 21.341 20.973 21.879
> 128K 21.851 22.354 21.816 23.203
> 256K 21.408 21.693 21.846 24.088
> 512K 21.600 21.899 21.921 24.106
>
> guest -> host [Gbps]
> pkt_size before opt p 1 p 2+3 p 4+5
>
> 32 0.045 0.046 0.057 0.057
> 64 0.089 0.091 0.103 0.104
> 128 0.170 0.179 0.192 0.200
> 256 0.364 0.351 0.361 0.379
> 512 0.709 0.699 0.731 0.790
> 1K 1.399 1.407 1.395 1.427
> 2K 2.670 2.684 2.745 2.835
> 4K 5.171 5.199 5.305 5.451
> 8K 8.442 8.500 10.083 9.941
> 16K 12.305 12.259 13.519 15.385
> 32K 11.418 11.150 11.988 24.680
> 64K 10.778 10.659 11.589 35.273
> 128K 10.421 10.339 10.939 40.338
> 256K 10.300 9.719 10.508 36.562
> 512K 9.833 9.808 10.612 35.979
>
> As Stefan suggested in the v1, I measured also the efficiency in this way:
> efficiency = Mbps / (%CPU_Host + %CPU_Guest)
>
> The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
> but it's provided for free from iperf3 and could be an indication.
>
> host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt p 1 p 2+3 p 4+5
>
> 32 0.35 0.45 0.79 1.02
> 64 0.56 0.80 1.41 1.54
> 128 1.11 1.52 3.03 3.12
> 256 2.20 2.16 5.44 5.58
> 512 4.17 4.18 10.96 11.46
> 1K 8.30 8.26 20.99 20.89
> 2K 16.82 16.31 39.76 39.73
> 4K 30.89 30.79 74.07 75.73
> 8K 53.74 54.49 124.24 148.91
> 16K 80.68 83.63 200.21 232.79
> 32K 132.27 132.52 260.81 357.07
> 64K 229.82 230.40 300.19 444.18
> 128K 332.60 329.78 331.51 492.28
> 256K 331.06 337.22 339.59 511.59
> 512K 335.58 328.50 331.56 504.56
>
> guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt p 1 p 2+3 p 4+5
>
> 32 0.43 0.43 0.53 0.56
> 64 0.85 0.86 1.04 1.10
> 128 1.63 1.71 2.07 2.13
> 256 3.48 3.35 4.02 4.22
> 512 6.80 6.67 7.97 8.63
> 1K 13.32 13.31 15.72 15.94
> 2K 25.79 25.92 30.84 30.98
> 4K 50.37 50.48 58.79 59.69
> 8K 95.90 96.15 107.04 110.33
> 16K 145.80 145.43 143.97 174.70
> 32K 147.06 144.74 146.02 282.48
> 64K 145.25 143.99 141.62 406.40
> 128K 149.34 146.96 147.49 489.34
> 256K 156.35 149.81 152.21 536.37
> 512K 151.65 150.74 151.52 519.93
>
> [1] https://github.com/stefano-garzarella/iperf/
>
> Stefano Garzarella (5):
> vsock/virtio: limit the memory used per-socket
> vsock/virtio: reduce credit update messages
> vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
> vhost/vsock: split packets to send using multiple buffers
> vsock/virtio: change the maximum packet size allowed
>
> drivers/vhost/vsock.c | 68 ++++++++++++-----
> include/linux/virtio_vsock.h | 4 +-
> net/vmw_vsock/virtio_transport.c | 1 +
> net/vmw_vsock/virtio_transport_common.c | 99 ++++++++++++++++++++-----
> 4 files changed, 134 insertions(+), 38 deletions(-)
>
> --
> 2.20.1
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox