Netdev List

* Re: [PATCH net-next] bnxt_en: Add page_pool_destroy() during RX ring cleanup.
From: Andy Gospodarek @ 2019-07-09 13:18 UTC
  To: Michael Chan; +Cc: davem, netdev, Ilias Apalodimas
In-Reply-To: <1562658607-30048-1-git-send-email-michael.chan@broadcom.com>

On Tue, Jul 09, 2019 at 03:50:07AM -0400, Michael Chan wrote:
> Add page_pool_destroy() in bnxt_free_rx_rings() during normal RX ring
> cleanup, as Ilias has informed us that the following commit has been
> merged:
> 
> 1da4bbeffe41 ("net: core: page_pool: add user refcnt and reintroduce page_pool_destroy")
> 
> The special error handling code to call page_pool_free() can now be
> removed.  bnxt_free_rx_rings() will always be called during normal
> shutdown or any error paths.
> 
> Fixes: 322b87ca55f2 ("bnxt_en: add page_pool support")
> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Cc: Andy Gospodarek <gospo@broadcom.com>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index e9d3bd8..2b5b0ab 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -2500,6 +2500,7 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
>  		if (xdp_rxq_info_is_reg(&rxr->xdp_rxq))
>  			xdp_rxq_info_unreg(&rxr->xdp_rxq);
>  
> +		page_pool_destroy(rxr->page_pool);
>  		rxr->page_pool = NULL;
>  
>  		kfree(rxr->rx_tpa);
> @@ -2560,19 +2561,14 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
>  			return rc;
>  
>  		rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i);
> -		if (rc < 0) {
> -			page_pool_free(rxr->page_pool);
> -			rxr->page_pool = NULL;
> +		if (rc < 0)
>  			return rc;
> -		}
>  
>  		rc = xdp_rxq_info_reg_mem_model(&rxr->xdp_rxq,
>  						MEM_TYPE_PAGE_POOL,
>  						rxr->page_pool);
>  		if (rc) {
>  			xdp_rxq_info_unreg(&rxr->xdp_rxq);
> -			page_pool_free(rxr->page_pool);
> -			rxr->page_pool = NULL;

Rather than deleting these lines it would also be acceptable to do:

                if (rc) {
                        xdp_rxq_info_unreg(&rxr->xdp_rxq);
-                       page_pool_free(rxr->page_pool);
+                       page_pool_destroy(rxr->page_pool);
                        rxr->page_pool = NULL;
                        return rc;
                }

but any time bnxt_alloc_rx_rings fails, the driver immediately follows
it up with a call to bnxt_free_rx_rings, so page_pool_destroy will be
called anyway.
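
The reasoning can be shown with a toy model in Python. This is only a sketch: names such as `RxRing` and `open_rings` are invented for illustration and are not driver code. The alloc path returns on error without cleaning up, and the single free path destroys whatever pools exist.

```python
class RxRing:
    """Toy stand-in for one RX ring; only tracks its page pool."""
    def __init__(self):
        self.page_pool = None


def alloc_rx_rings(rings, fail_at=None):
    """Create a pool per ring; on failure return an error without cleanup."""
    for i, ring in enumerate(rings):
        ring.page_pool = f"pool{i}"
        if i == fail_at:
            return -1  # simulated registration failure, pool left in place
    return 0


def free_rx_rings(rings):
    """Single cleanup path: destroy every pool that exists, then clear it."""
    destroyed = []
    for ring in rings:
        if ring.page_pool is not None:
            destroyed.append(ring.page_pool)  # models page_pool_destroy()
        ring.page_pool = None
    return destroyed


def open_rings(rings, fail_at=None):
    """Caller pattern from this mail: a failed alloc is followed by the free."""
    rc = alloc_rx_rings(rings, fail_at)
    destroyed = free_rx_rings(rings) if rc else []
    return rc, destroyed
```

With three rings and a failure on the second one, `open_rings()` reports the error and both pools created so far end up destroyed by the common free path.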

Thanks for pushing this out so quickly!

Acked-by: Andy Gospodarek <gospo@broadcom.com> 


* Re: i.mx6ul with DSA in multi chip addressing mode - no MDIO access
From: Benjamin Beckmeyer @ 2019-07-09 13:20 UTC
  To: Andrew Lunn; +Cc: netdev
In-Reply-To: <20190708145733.GA9027@lunn.ch>

>> Hi Andrew,
>> I got it working a little bit better. When I'm fast enough I can read
>> the registers I want but it isn't a solution.
> Why do you need to read registers?
>
> What you actually might be interested in is the debugfs patches in
> Viviens github tree.
>
>> Here is an output of the tracing even with my custom accesses.
>> mii -i 2 0 0x9b60; mii -i 2 1
>> phyid:2, reg:0x01 -> 0xc801
>>
>> Do you know how to delete EEInt bit? It is always one. And now all 
>> accesses coming from the kworker thread. Maybe this is your polling 
>> function?
> EEInt should clear on read for older chips. For 6390, it might be
> necessary to read global 2, register 0x13, index 03.
>  
>> I view the INT pin on an oscilloscope but it never changed. So maybe
>> this is the problem. We just soldered a pull-up to that pin but it 
>> still never changend. Maybe you have an idea?
> The EEInt bit is probably masked. So it will not generate in
> interrupt.
>
>> So what I think is, because of the EEInt bit is never set back to one 
>> i will poll it as fast as possible.
> Is it forever looping in mv88e6xxx_g1_irq_thread_work? Or is it the
> polling code, mv88e6xxx_irq_poll() firing every 100ms?
>
> 	Andrew

Hi Andrew,
good news first, it seems to be running ;-).

The interrupt GPIO pin was not correctly configured in the device tree.

For now we see around 68 accesses per second. I think this is okay,
because we use indirect access, so the bus is necessarily busier.
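
For reference, in multi-chip addressing mode every switch register access is tunnelled through a command register and a data register on the switch's own MDIO address, which is why the `mii` sequence quoted above first writes 0x9b60 to register 0 and then reads register 1. The sketch below decodes that command word. The bit layout is an assumption based on how the mv88e6xxx driver builds its SMI commands; it is not spelled out anywhere in this thread, and the helper names are made up for illustration.

```python
# Assumed bit layout of the 16-bit SMI command word (not stated in the mail).
SMI_CMD_BUSY = 0x8000         # bit 15: start / busy
SMI_CMD_MODE_22 = 0x1000      # bit 12: clause 22 framing
SMI_CMD_OP_MASK = 0x0C00      # bits 11:10: operation
SMI_CMD_OP_22_WRITE = 0x0400
SMI_CMD_OP_22_READ = 0x0800


def encode_smi_read(dev, reg):
    """Build the command word for a clause 22 read of (dev, reg)."""
    return (SMI_CMD_BUSY | SMI_CMD_MODE_22 | SMI_CMD_OP_22_READ
            | ((dev & 0x1F) << 5) | (reg & 0x1F))


def decode_smi_cmd(cmd):
    """Split a command word into its fields."""
    ops = {SMI_CMD_OP_22_READ: "read", SMI_CMD_OP_22_WRITE: "write"}
    return {
        "busy": bool(cmd & SMI_CMD_BUSY),
        "clause22": bool(cmd & SMI_CMD_MODE_22),
        "op": ops.get(cmd & SMI_CMD_OP_MASK, "other"),
        "dev": (cmd >> 5) & 0x1F,
        "reg": cmd & 0x1F,
    }


fields = decode_smi_cmd(0x9B60)
print(fields)
```

Under that assumption 0x9b60 is a clause 22 read of register 0x00 on device 0x1b, so one logical read costs at least one write plus one read on the bus, before counting any busy polling.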

What do you think about it?

The reason we need access to the bus is that we have some software which
used the DSDT driver, and we now want to switch to the UMSD driver.
But we hope that we can forget about all the UMSD driver stuff and the
DSDT driver stuff as well and just use the DSA part from the kernel.
To be honest, so far I don't know which functions we need from the driver
that DSA doesn't support.

Thanks again for your help and patience.

Cheers,
Benny

* Re: IPv6 flow label reflection behave for RST packets
From: Eric Dumazet @ 2019-07-09 13:22 UTC
  To: Marek Majkowski, Eric Dumazet
  Cc: kuznet, yoshfuji, Jakub Sitnicki, netdev, kernel-team
In-Reply-To: <CAJPywTKXL=_8h3aoC=n-c8o_Uo7P6RnKOgm6CpvrNsPQuw4C9A@mail.gmail.com>



On 7/9/19 2:33 PM, Marek Majkowski wrote:
> Ha, thanks. I missed that.
> 
> There is a caveat though. I don't think it's working as intended...


Note that my commit only looked at a fraction of the cases ;)

commit 323a53c41292a0d7efc8748856c623324c8d7c21

    ipv6: tcp: enable flowlabel reflection in some RST packets
    
    When RST packets are sent because no socket could be found,
    it makes sense to use flowlabel_reflect sysctl to decide
    if a reflection of the flowlabel is requested.
    

In your case a socket is most probably found, and np->repflow seems to be ignored.

I'll take a look, thanks.

> Running my script:
> 
> $ sysctl -w net.ipv6.flowlabel_reflect=3
> 
> $ tail reflect.py
> cd2.close()
> cd.send(b"a")
> 
> $ python3 reflect.py
> IP6 (flowlabel 0xf2927, hlim 64) ::1.1235 > ::1.60246: Flags [F.]
> IP6 (flowlabel 0xf2927, hlim 64) ::1.60246 > ::1.1235: Flags [P.]
> IP6 (flowlabel 0x58ecd, hlim 64) ::1.1235 > ::1.60246: Flags [R]
> 
> Note. The RST is opportunistic, depending on timing I sometimes get a
> proper FIN, without RST.
> 
> If I change the script to introduce some delay:
> 
> $ tail reflect.py
> cd2.close()
> time.sleep(0.1)
> cd.send(b"a")
> 
> $ python3 reflect.py
> IP6 (flowlabel 0x2f60c, hlim 64) ::1.60326 > ::1.1235: Flags [.]
> IP6 (flowlabel 0x2f60c, hlim 64) ::1.60326 > ::1.1235: Flags [P.]
> IP6 (flowlabel 0x2f60c, hlim 64) ::1.1235 > ::1.60326: Flags [R]
> 
> Now it seem to work reliably. Tested on net-next under virtme.
> 
> Marek
> 
> On Tue, Jul 9, 2019 at 1:19 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>
>>
>> On 7/9/19 1:10 PM, Marek Majkowski wrote:
>>> Morning,
>>>
>>> I'm experimenting with flow label reflection from a server point of
>>> view. I'm able to get it working in both supported ways:
>>>
>>> (a) per-socket with flow manager IPV6_FL_F_REFLECT and flowlabel_consistency=0
>>>
>>> (b) with global flowlabel_reflect sysctl
>>>
>>> However, I was surprised to see that RST after the connection is torn
>>> down, doesn't have the correct flow label value:
>>>
>>> IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [S]
>>> IP6 (flowlabel 0x3ba3d) ::1.1235 > ::1.59276: Flags [S.]
>>> IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [.]
>>> IP6 (flowlabel 0x3ba3d) ::1.1235 > ::1.59276: Flags [F.]
>>> IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [P.]
>>> IP6 (flowlabel 0xdfc46) ::1.1235 > ::1.59276: Flags [R]
>>>
>>> Notice, the last RST packet has inconsistent flow label. Perhaps we
>>> can argue this behaviour might be acceptable for a per-socket
>>> IPV6_FL_F_REFLECT option, but with global flowlabel_reflect, I would
>>> expect the RST to preserve the reflected flow label value.
>>>
>>> I suspect the same behaviour is true for kernel-generated ICMPv6.
>>>
>>> Prepared test case:
>>> https://gist.github.com/majek/139081b84f9b5b6187c8ccff802e3ab3
>>>
>>> This behaviour is not necessarily a bug, more of a surprise. Flow
>>> label reflection is mostly useful in deployments where Linux servers
>>> stand behind ECMP router, which uses flow-label to compute the hash.
>>> Flow label reflection allows ICMP PTB message to be routed back to
>>> correct server.
>>>
>>> It's hard to imagine a situation where generated RST or ICMP echo
>>> response would trigger a ICMP PTB. Flow label reflection is explained
>>> here:
>>> https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
>>> and:
>>> https://tools.ietf.org/html/rfc7098
>>> https://tools.ietf.org/html/rfc6438
>>>
>>> Cheers,
>>>     Marek
>>>
>>>
>>> (Note: the unrelated "fwmark_reflect" toggle is about something
>>> different - flow marks, but also addresses RST and ICMP generated by
>>> the server)
>>>
>>
>> Please check the recent commits, scheduled for linux-5.3
>>
>> a346abe051bd2bd0d5d0140b2da9ec95639acad7 ipv6: icmp: allow flowlabel reflection in echo replies
>> c67b85558ff20cb1ff20874461d12af456bee5d0 ipv6: tcp: send consistent autoflowlabel in TIME_WAIT state
>> 392096736a06bc9d8f2b42fd4bb1a44b245b9fed ipv6: tcp: fix potential NULL deref in tcp_v6_send_reset()
>> 50a8accf10627b343109a9c9d5c361751bf753b0 ipv6: tcp: send consistent flowlabel in TIME_WAIT state
>> 323a53c41292a0d7efc8748856c623324c8d7c21 ipv6: tcp: enable flowlabel reflection in some RST packets
>>

* RE: [PATCH] tipc: ensure skb->lock is initialised
From: Jon Maloy @ 2019-07-09 13:25 UTC
  To: Eric Dumazet, Chris Packham, ying.xue@windriver.com,
	davem@davemloft.net
  Cc: netdev@vger.kernel.org, tipc-discussion@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
In-Reply-To: <b862a74b-9f1e-fb64-0641-550a83b64664@gmail.com>



> -----Original Message-----
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Sent: 9-Jul-19 03:31
> To: Chris Packham <Chris.Packham@alliedtelesis.co.nz>; Eric Dumazet
> <eric.dumazet@gmail.com>; Jon Maloy <jon.maloy@ericsson.com>;
> ying.xue@windriver.com; davem@davemloft.net
> Cc: netdev@vger.kernel.org; tipc-discussion@lists.sourceforge.net; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH] tipc: ensure skb->lock is initialised
> 
> 
> 
> On 7/8/19 11:13 PM, Chris Packham wrote:
> > On 9/07/19 8:43 AM, Chris Packham wrote:
> >> On 8/07/19 8:18 PM, Eric Dumazet wrote:
> >>>
> >>>
> >>> On 7/8/19 12:53 AM, Chris Packham wrote:
> >>>> tipc_named_node_up() creates a skb list. It passes the list to
> >>>> tipc_node_xmit() which has some code paths that can call
> >>>> skb_queue_purge() which relies on the list->lock being initialised.
> >>>> Ensure tipc_named_node_up() uses skb_queue_head_init() so that the
> >>>> lock is explicitly initialised.
> >>>>
> >>>> Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
> >>>
> >>> I would rather change the faulty skb_queue_purge() to
> >>> __skb_queue_purge()
> >>>
> >>
> >> Makes sense. I'll look at that for v2.
> >>
> >
> > Actually maybe not. tipc_rcast_xmit(), tipc_node_xmit_skb(),
> > tipc_send_group_msg(), __tipc_sendmsg(), __tipc_sendstream(), and
> > tipc_sk_timeout() all use skb_queue_head_init(). So my original change
> > brings tipc_named_node_up() into line with them.
> >
> > I think it should be safe for tipc_node_xmit() to use
> > __skb_queue_purge() since all the callers seem to have exclusive
> > access to the list of skbs. It still seems that the callers should all
> > use
> > skb_queue_head_init() for consistency.

I agree with that.

> >
> 
> No, tipc does not use the list lock (it relies on the socket lock)  and therefore
> should consistently use __skb_queue_head_init() instead of
> skb_queue_head_init()

TIPC uses the list lock at message reception within the scope of tipc_sk_rcv()/tipc_skb_peek_port(), so it is essential that the lock is always correctly initialized.

> 
[...]
> 
> tipc_link_xmit() for example never acquires the spinlock, yet uses skb_peek()
> and __skb_dequeue()


You should look at tipc_node_xmit() instead. Node-local messages are sent directly to tipc_sk_rcv(), and never go through tipc_link_xmit().

Regards
///jon



* Re: [PATCH net-next v2 4/4] bnxt_en: add page_pool support
From: Andy Gospodarek @ 2019-07-09 13:29 UTC
  To: Ilias Apalodimas; +Cc: Michael Chan, davem, netdev, hawk, ast
In-Reply-To: <20190709062746.GA621@apalos>

On Tue, Jul 09, 2019 at 09:27:46AM +0300, Ilias Apalodimas wrote:
> 
> Thanks and sorry for the inconvenience :(
> /Ilias

No worries.  I didn't know Ivan's patch was going to go in so quickly!



* Re: [PATCH net-next,v3 10/11] net: flow_offload: add flow_block_cb_is_busy() and use it
From: Jiri Pirko @ 2019-07-09 13:36 UTC
  To: Pablo Neira Ayuso
  Cc: netdev, davem, thomas.lendacky, f.fainelli, ariel.elior,
	michael.chan, madalin.bucur, yisen.zhuang, salil.mehta,
	jeffrey.t.kirsher, tariqt, saeedm, jiri, idosch, jakub.kicinski,
	peppe.cavallaro, grygorii.strashko, andrew, vivien.didelot,
	alexandre.torgue, joabreu, linux-net-drivers, ogerlitz,
	Manish.Chopra, marcelo.leitner, mkubecek, venkatkumar.duvvuru,
	maxime.chevallier, cphealy, netfilter-devel
In-Reply-To: <20190708160614.2226-11-pablo@netfilter.org>

Mon, Jul 08, 2019 at 06:06:12PM CEST, pablo@netfilter.org wrote:
>This patch adds a function to check if flow block callback is already in
>use.  Call this new function from flow_block_cb_setup_simple() and from
>drivers.
>
>Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>---
>v3: formerly known as "net: flow_offload: don't allow subsystem to reuse blocks"
>    add flow_block_cb_is_busy() helper. Call it per driver to make it easier
>    to remove this whenever the first driver client support for multiple
>    subsystem offloads.
>
> drivers/net/ethernet/mellanox/mlx5/core/en_rep.c    |  4 ++++
> drivers/net/ethernet/mellanox/mlxsw/spectrum.c      |  4 ++++
> drivers/net/ethernet/mscc/ocelot_tc.c               |  3 +++
> drivers/net/ethernet/netronome/nfp/flower/offload.c |  4 ++++
> include/net/flow_offload.h                          |  3 +++
> net/core/flow_offload.c                             | 18 ++++++++++++++++++
> net/dsa/slave.c                                     |  3 +++
> 7 files changed, 39 insertions(+)
>
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
>index 19133b9e121a..e303149053e4 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
>@@ -721,6 +721,10 @@ mlx5e_rep_indr_setup_tc_block(struct net_device *netdev,
> 		if (indr_priv)
> 			return -EEXIST;
> 
>+		if (flow_block_cb_is_busy(mlx5e_rep_indr_setup_block_cb,
>+					  indr_priv, &mlx5e_block_cb_list))

As I already asked on another patch in this set, it would be much
better to have a wrapping struct here instead of a plain list head.
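
To show what such a wrapper could look like, here is a rough Python model. Everything in it is hypothetical: `FlowBlock`, `FlowBlockCb` and the rule that a callback counts as busy when both `cb` and `cb_ident` match are assumptions for illustration, not the API proposed in the patch.

```python
class FlowBlockCb:
    """One registered callback, identified by the pair (cb, cb_ident)."""
    def __init__(self, cb, cb_ident):
        self.cb = cb
        self.cb_ident = cb_ident


class FlowBlock:
    """Hypothetical wrapper that owns the driver's callback list."""
    def __init__(self):
        self.cb_list = []

    def is_busy(self, cb, cb_ident):
        """True if this exact (cb, cb_ident) pair is already registered."""
        return any(e.cb is cb and e.cb_ident is cb_ident
                   for e in self.cb_list)

    def add(self, cb, cb_ident):
        """Register the callback unless it is already in use."""
        if self.is_busy(cb, cb_ident):
            return False
        self.cb_list.append(FlowBlockCb(cb, cb_ident))
        return True
```

The point of the wrapper is that callers pass one typed object around and the busy check lives next to the list it inspects, instead of every driver handing a bare list head to a free function.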

[...]

* Re: IPv6 flow label reflection behave for RST packets
From: Eric Dumazet @ 2019-07-09 13:36 UTC
  To: Eric Dumazet, Marek Majkowski
  Cc: kuznet, yoshfuji, Jakub Sitnicki, netdev, kernel-team
In-Reply-To: <8e2fca44-6fe7-42fc-8684-2cdd52c67103@gmail.com>



On 7/9/19 3:22 PM, Eric Dumazet wrote:
> 
> 
> On 7/9/19 2:33 PM, Marek Majkowski wrote:
>> Ha, thanks. I missed that.
>>
>> There is a caveat though. I don't think it's working as intended...
> 
> 
> Note that my commit really took a look at a fraction of the cases ;)
> 
> commit 323a53c41292a0d7efc8748856c623324c8d7c21
> 
>     ipv6: tcp: enable flowlabel reflection in some RST packets
>     
>     When RST packets are sent because no socket could be found,
>     it makes sense to use flowlabel_reflect sysctl to decide
>     if a reflection of the flowlabel is requested.
>     
> 
> In your case, a socket is found, most probably, and np->repflow seems to be ignored.
> 
> I'll take a look, thanks.

I guess a possible fix would be:

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index d56a9019a0feb5a34312ec353c555f44b8c09b3d..2a298835317c0f6b1d82fb118dc4ba9647a2a110 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -984,8 +984,13 @@ static void tcp_v6_send_reset(const struct sock *sk, struct sk_buff *skb)
 
        if (sk) {
                oif = sk->sk_bound_dev_if;
-               if (sk_fullsock(sk))
+               if (sk_fullsock(sk)) {
+                       struct ipv6_pinfo *np = tcp_inet6_sk(sk);
+
                        trace_tcp_send_reset(sk, skb);
+                       if (np->repflow)
+                               label = ip6_flowlabel(ipv6h);
+               }
                if (sk->sk_state == TCP_TIME_WAIT)
                        label = cpu_to_be32(inet_twsk(sk)->tw_flowlabel);
        } else {
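
To make the intended behaviour easier to follow, here is a small userspace model in Python of how the RST label would be picked with the change above. It is only a sketch: `rst_flowlabel()` and its parameters are hypothetical names, the `sysctl_reflects_rst` flag stands in for whatever the flowlabel_reflect sysctl decides in the no-socket case, and a result of 0 just means this path did not choose a label.

```python
import struct

IPV6_FLOWLABEL_MASK = 0x000FFFFF  # low 20 bits of the first header word


def ip6_flowlabel(ipv6_header):
    """Return the 20-bit flow label from a raw IPv6 header."""
    (first_word,) = struct.unpack("!I", ipv6_header[:4])
    return first_word & IPV6_FLOWLABEL_MASK


def rst_flowlabel(ipv6_header, sk_state=None, repflow=False,
                  tw_flowlabel=0, sysctl_reflects_rst=False):
    """Toy model of the label choice for an outgoing RST.

    sk_state is None (no socket found), "TIME_WAIT", or any other
    string for a full socket.
    """
    label = 0
    if sk_state is None:
        # No socket: only the sysctl can ask for reflection.
        if sysctl_reflects_rst:
            label = ip6_flowlabel(ipv6_header)
    elif sk_state == "TIME_WAIT":
        # Timewait socket: reuse the label remembered for the flow.
        label = tw_flowlabel
    elif repflow:
        # Full socket with reflection enabled: the case the diff adds.
        label = ip6_flowlabel(ipv6_header)
    return label
```

For an incoming header whose first word is 0x6003ba3d, a full socket with `repflow=True` reflects 0x3ba3d, while the same socket with `repflow=False` leaves the label unset in this model.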


> 
>> Running my script:
>>
>> $ sysctl -w net.ipv6.flowlabel_reflect=3
>>
>> $ tail reflect.py
>> cd2.close()
>> cd.send(b"a")
>>
>> $ python3 reflect.py
>> IP6 (flowlabel 0xf2927, hlim 64) ::1.1235 > ::1.60246: Flags [F.]
>> IP6 (flowlabel 0xf2927, hlim 64) ::1.60246 > ::1.1235: Flags [P.]
>> IP6 (flowlabel 0x58ecd, hlim 64) ::1.1235 > ::1.60246: Flags [R]
>>
>> Note. The RST is opportunistic, depending on timing I sometimes get a
>> proper FIN, without RST.
>>
>> If I change the script to introduce some delay:
>>
>> $ tail reflect.py
>> cd2.close()
>> time.sleep(0.1)
>> cd.send(b"a")
>>
>> $ python3 reflect.py
>> IP6 (flowlabel 0x2f60c, hlim 64) ::1.60326 > ::1.1235: Flags [.]
>> IP6 (flowlabel 0x2f60c, hlim 64) ::1.60326 > ::1.1235: Flags [P.]
>> IP6 (flowlabel 0x2f60c, hlim 64) ::1.1235 > ::1.60326: Flags [R]
>>
>> Now it seem to work reliably. Tested on net-next under virtme.
>>
>> Marek
>>
>> On Tue, Jul 9, 2019 at 1:19 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>
>>>
>>>
>>> On 7/9/19 1:10 PM, Marek Majkowski wrote:
>>>> Morning,
>>>>
>>>> I'm experimenting with flow label reflection from a server point of
>>>> view. I'm able to get it working in both supported ways:
>>>>
>>>> (a) per-socket with flow manager IPV6_FL_F_REFLECT and flowlabel_consistency=0
>>>>
>>>> (b) with global flowlabel_reflect sysctl
>>>>
>>>> However, I was surprised to see that RST after the connection is torn
>>>> down, doesn't have the correct flow label value:
>>>>
>>>> IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [S]
>>>> IP6 (flowlabel 0x3ba3d) ::1.1235 > ::1.59276: Flags [S.]
>>>> IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [.]
>>>> IP6 (flowlabel 0x3ba3d) ::1.1235 > ::1.59276: Flags [F.]
>>>> IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [P.]
>>>> IP6 (flowlabel 0xdfc46) ::1.1235 > ::1.59276: Flags [R]
>>>>
>>>> Notice, the last RST packet has inconsistent flow label. Perhaps we
>>>> can argue this behaviour might be acceptable for a per-socket
>>>> IPV6_FL_F_REFLECT option, but with global flowlabel_reflect, I would
>>>> expect the RST to preserve the reflected flow label value.
>>>>
>>>> I suspect the same behaviour is true for kernel-generated ICMPv6.
>>>>
>>>> Prepared test case:
>>>> https://gist.github.com/majek/139081b84f9b5b6187c8ccff802e3ab3
>>>>
>>>> This behaviour is not necessarily a bug, more of a surprise. Flow
>>>> label reflection is mostly useful in deployments where Linux servers
>>>> stand behind ECMP router, which uses flow-label to compute the hash.
>>>> Flow label reflection allows ICMP PTB message to be routed back to
>>>> correct server.
>>>>
>>>> It's hard to imagine a situation where generated RST or ICMP echo
>>>> response would trigger a ICMP PTB. Flow label reflection is explained
>>>> here:
>>>> https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
>>>> and:
>>>> https://tools.ietf.org/html/rfc7098
>>>> https://tools.ietf.org/html/rfc6438
>>>>
>>>> Cheers,
>>>>     Marek
>>>>
>>>>
>>>> (Note: the unrelated "fwmark_reflect" toggle is about something
>>>> different - flow marks, but also addresses RST and ICMP generated by
>>>> the server)
>>>>
>>>
>>> Please check the recent commits, scheduled for linux-5.3
>>>
>>> a346abe051bd2bd0d5d0140b2da9ec95639acad7 ipv6: icmp: allow flowlabel reflection in echo replies
>>> c67b85558ff20cb1ff20874461d12af456bee5d0 ipv6: tcp: send consistent autoflowlabel in TIME_WAIT state
>>> 392096736a06bc9d8f2b42fd4bb1a44b245b9fed ipv6: tcp: fix potential NULL deref in tcp_v6_send_reset()
>>> 50a8accf10627b343109a9c9d5c361751bf753b0 ipv6: tcp: send consistent flowlabel in TIME_WAIT state
>>> 323a53c41292a0d7efc8748856c623324c8d7c21 ipv6: tcp: enable flowlabel reflection in some RST packets
>>>

* Re: [PATCH net-next,v3 04/11] net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()
From: Jiri Pirko @ 2019-07-09 13:37 UTC
  To: Pablo Neira Ayuso
  Cc: netdev, davem, thomas.lendacky, f.fainelli, ariel.elior,
	michael.chan, madalin.bucur, yisen.zhuang, salil.mehta,
	jeffrey.t.kirsher, tariqt, saeedm, jiri, idosch, jakub.kicinski,
	peppe.cavallaro, grygorii.strashko, andrew, vivien.didelot,
	alexandre.torgue, joabreu, linux-net-drivers, ogerlitz,
	Manish.Chopra, marcelo.leitner, mkubecek, venkatkumar.duvvuru,
	maxime.chevallier, cphealy, netfilter-devel
In-Reply-To: <20190708160614.2226-5-pablo@netfilter.org>

Mon, Jul 08, 2019 at 06:06:06PM CEST, pablo@netfilter.org wrote:

[...]


>+struct flow_block_cb *flow_block_cb_alloc(struct net *net, tc_setup_cb_t *cb,

You don't use net any longer.


>+					  void *cb_ident, void *cb_priv,
>+					  void (*release)(void *cb_priv))

[...]

* Re: [PATCH nf-next 1/3] netfilter: nf_nat_proto: add nf_nat_bridge_ops support
From: wenxu @ 2019-07-09 13:38 UTC
  To: Florian Westphal; +Cc: pablo, netfilter-devel, netdev
In-Reply-To: <20190709104206.gy6l52rx2dat3743@breakpoint.cc>

On 2019/7/9 18:42, Florian Westphal wrote:
> wenxu <wenxu@ucloud.cn> wrote:
>>> For NAT on bridge, it should be possible already to push such packets
>>> up the stack by
>>>
>>> bridge input meta iif eth0 ip saddr 192.168.0.0/16 \
>>>        meta pkttype set unicast ether daddr set 00:11:22:33:44:55
>> yes, packet can be push up to IP stack to handle the nat through bridge device. 
>>
>> In my case dnat 2.2.1.7 to 10.0.0.7, It assume the mac address of the two address
>> is the same known by outer.
> I think that in general they will have different MAC addresses, so plain
> replacement of ip addresses won't work.
>
>> But in This case modify the packet dmac to bridge device, the packet push up through bridge device
>> Then do nat and route send back to bridge device.
> Are you saying that you can use the send-to-ip-layer approach?
>
> We might need/want a more convenient way to do this.
> There are two ways that I can see:
>
> 1. a redirect support for nftables bridge family.
>    The redirect expression would be same as "ether daddr set
>    <bridge_mac>", but there is no need to know the bridge mac address.
>
> 2. Support ebtables -t broute in nftables.
>    The route rework for ebtables has been completed already, so
>    this needs a new expression.  Packet that is brouted behaves
>    as if the bridge port was not part of the bridge.

This is my scenario:

Take, for example, a virtual machine with address 10.0.0.7, internet address 2.2.1.7 and default router
10.0.0.1. There is both east-west and north-south traffic. The outer device vnet0 is connected to bridge
br0, which has address 10.0.0.1. A flow-based (metadata_dst) vxlan device, vxlan0, is also added to the bridge.

So there are three kinds of traffic to handle:

1. 10.0.0.7 <-----> 10.0.0.8: both ingress and egress packets go through the bridge, using the vlan id to VNI feature.

2. 10.0.0.7 <-----> 10.0.1.8: egress packets are pushed up to the stack through br0 to be routed. The route sends them
through vxlan0 to the peer with a static MAC (maybe the route could send through br0). Ingress packets always go through
the bridge to the VM.

3. 10.0.0.7 <-----> 1.1.1.7: egress packets are pushed up to the stack through br0 for routing and NAT. The route sends
them through vxlan0 to the router. With this patch, the router assumes the same MAC address for 10.0.0.7 and 2.2.1.7,
so NAT can be done under the bridge and the packet sent on to the VM.

I think the biggest problem is that the single vxlan0 device is always attached to br0. For L3 (routed) traffic, egress
packets are pushed up to the stack and routed through br0. For ingress, I would like packets to only go through the
bridge to the VM for all three kinds of traffic above.

* Re: [PATCH net-next,v3 10/11] net: flow_offload: add flow_block_cb_is_busy() and use it
From: Jiri Pirko @ 2019-07-09 13:39 UTC
  To: Pablo Neira Ayuso
  Cc: netdev, davem, thomas.lendacky, f.fainelli, ariel.elior,
	michael.chan, madalin.bucur, yisen.zhuang, salil.mehta,
	jeffrey.t.kirsher, tariqt, saeedm, jiri, idosch, jakub.kicinski,
	peppe.cavallaro, grygorii.strashko, andrew, vivien.didelot,
	alexandre.torgue, joabreu, linux-net-drivers, ogerlitz,
	Manish.Chopra, marcelo.leitner, mkubecek, venkatkumar.duvvuru,
	maxime.chevallier, cphealy, netfilter-devel
In-Reply-To: <20190708160614.2226-11-pablo@netfilter.org>

Mon, Jul 08, 2019 at 06:06:12PM CEST, pablo@netfilter.org wrote:

[...]

>+bool flow_block_cb_is_busy(tc_setup_cb_t *cb, void *cb_ident,

There should be another patch before this one renaming tc_setup_cb_t and
ndo_setup_tc. This is not TC-specific anymore, so the current names might
confuse the reader.

[...]

* Re: [PATCH net-next v6 04/15] ethtool: introduce ethtool netlink interface
From: Jiri Pirko @ 2019-07-09 13:42 UTC
  To: Michal Kubecek
  Cc: netdev, David Miller, Jakub Kicinski, Andrew Lunn,
	Florian Fainelli, John Linville, Stephen Hemminger, Johannes Berg,
	linux-kernel
In-Reply-To: <20190708202219.GE24474@unicorn.suse.cz>

Mon, Jul 08, 2019 at 10:22:19PM CEST, mkubecek@suse.cz wrote:
>On Mon, Jul 08, 2019 at 09:26:29PM +0200, Jiri Pirko wrote:
>> Mon, Jul 08, 2019 at 07:27:29PM CEST, mkubecek@suse.cz wrote:
>> >
>> >There are two reasons for this design. First is to reduce the number of
>> >requests needed to get the information. This is not so much a problem of
>> >ethtool itself; the only existing commands that would result in multiple
>> >request messages would be "ethtool <dev>" and "ethtool -s <dev>". Maybe
>> >also "ethtool -x/-X <dev>" but even if the indirection table and hash
>> >key have different bits assigned now, they don't have to be split even
>> >if we split other commands. It may be bigger problem for daemons wanting
>> >to keep track of system configuration which would have to issue many
>> >requests whenever a new device appears.
>> >
>> >Second reason is that with 8-bit genetlink command/message id, the space
>> >is not as infinite as it might seem. I counted quickly, right now the
>> >full series uses 14 ids for kernel messages, with split you propose it
>> >would most likely grow to 44. For full implementation of all ethtool
>> >functionality, we could get to ~60 ids. It's still only 1/4 of the
>> >available space but it's not clear what the future development will look
>> >like. We would certainly need to be careful not to start allocating new
>> >commands for single parameters and try to be foreseeing about what can
>> >be grouped together. But we will need to do that in any case.
>> >
>> >On kernel side, splitting existing messages would make some things a bit
>> >easier. It would also reduce the number of scenarios where only part of
>> >requested information is available or only part of a SET request fails.
>> 
>> Okay, I got your point. So why don't we look at if from the other angle.
>> Why don't we have only single get/set command that would be in general
>> used to get/set ALL info from/to the kernel. Where we can have these
>> bits (perhaps rather varlen bitfield) to for user to indicate which data
>> is he interested in? This scales. The other commands would be
>> just for action.
>> 
>> Something like RTM_GETLINK/RTM_SETLINK. Makes sense?
>
>It's certainly an option but at the first glance it seems as just moving
>what I tried to avoid one level lower. It would work around the u8 issue
>(but as Johannes pointed out, we can handle it with genetlink when/if
>the time comes). We would almost certainly have to split the replies
>into multiple messages to keep the packet size reasonable. I'll have to
>think more about the consequences for both kernel and userspace.
>
>My gut feeling is that out of the two extreme options (one universal
>message type and message types corresponding to current infomask bits),
>the latter is more appealing. After all, ethtool has been gathering
>features that would need those ~60 message types for 20 years.

Yeah, but I think that we have to do one or the other. Anything in between
makes the code complex and the uapi confusing. Let's start clean :)

* Re: [PATCH] tipc: ensure skb->lock is initialised
From: Eric Dumazet @ 2019-07-09 13:45 UTC
  To: Jon Maloy, Eric Dumazet, Chris Packham, ying.xue@windriver.com,
	davem@davemloft.net
  Cc: netdev@vger.kernel.org, tipc-discussion@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
In-Reply-To: <MN2PR15MB35811151C4A627C0AF364CAC9AF10@MN2PR15MB3581.namprd15.prod.outlook.com>



On 7/9/19 3:25 PM, Jon Maloy wrote:
> 
> 
>> -----Original Message-----
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Sent: 9-Jul-19 03:31
>> To: Chris Packham <Chris.Packham@alliedtelesis.co.nz>; Eric Dumazet
>> <eric.dumazet@gmail.com>; Jon Maloy <jon.maloy@ericsson.com>;
>> ying.xue@windriver.com; davem@davemloft.net
>> Cc: netdev@vger.kernel.org; tipc-discussion@lists.sourceforge.net; linux-
>> kernel@vger.kernel.org
>> Subject: Re: [PATCH] tipc: ensure skb->lock is initialised
>>
>>
>>
>> On 7/8/19 11:13 PM, Chris Packham wrote:
>>> On 9/07/19 8:43 AM, Chris Packham wrote:
>>>> On 8/07/19 8:18 PM, Eric Dumazet wrote:
>>>>>
>>>>>
>>>>> On 7/8/19 12:53 AM, Chris Packham wrote:
>>>>>> tipc_named_node_up() creates a skb list. It passes the list to
>>>>>> tipc_node_xmit() which has some code paths that can call
>>>>>> skb_queue_purge() which relies on the list->lock being initialised.
>>>>>> Ensure tipc_named_node_up() uses skb_queue_head_init() so that the
>>>>>> lock is explicitly initialised.
>>>>>>
>>>>>> Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
>>>>>
>>>>> I would rather change the faulty skb_queue_purge() to
>>>>> __skb_queue_purge()
>>>>>
>>>>
>>>> Makes sense. I'll look at that for v2.
>>>>
>>>
>>> Actually maybe not. tipc_rcast_xmit(), tipc_node_xmit_skb(),
>>> tipc_send_group_msg(), __tipc_sendmsg(), __tipc_sendstream(), and
>>> tipc_sk_timeout() all use skb_queue_head_init(). So my original change
>>> brings tipc_named_node_up() into line with them.
>>>
>>> I think it should be safe for tipc_node_xmit() to use
>>> __skb_queue_purge() since all the callers seem to have exclusive
>>> access to the list of skbs. It still seems that the callers should all
>>> use
>>> skb_queue_head_init() for consistency.
> 
> I agree with that.
> 
>>>
>>
>> No, tipc does not use the list lock (it relies on the socket lock)  and therefore
>> should consistently use __skb_queue_head_init() instead of
>> skb_queue_head_init()
> 
> TIPC is using the list lock at message reception within the scope of tipc_sk_rcv()/tipc_skb_peek_port(), so it is fundamental that the lock always is correctly initialized.

Where is the lock acquired, and why is it only acquired by the queue purge and not by normal dequeues?

> 
>>
> [...]
>>
>> tipc_link_xmit() for example never acquires the spinlock, yet uses skb_peek()
>> and __skb_dequeue()
> 
> 
> You should look at tipc_node_xmit instead. Node local messages are sent directly to tipc_sk_rcv(), and never go through tipc_link_xmit()

tipc_node_xmit() calls tipc_link_xmit() eventually, right ?

Please show me where the head->lock is acquired, and why it is needed.

If this is mandatory, then more fixes are needed than just initializing the lock for lockdep purposes.


^ permalink raw reply

* Re: [PATCH v2 05/10] net: hisilicon: HI13X1_GMAX need dreq reset at first
From: Jiangfeng Xiao @ 2019-07-09 13:48 UTC (permalink / raw)
  To: Sergei Shtylyov, davem, robh+dt, yisen.zhuang, salil.mehta,
	mark.rutland, dingtianhong
  Cc: netdev, devicetree, linux-kernel, leeyou.li, nixiaoming,
	jianping.liu, xiekunxun
In-Reply-To: <890c48d1-76b8-5aea-e175-aa7d9967acd2@cogentembedded.com>



On 2019/7/9 17:35, Sergei Shtylyov wrote:
> Hello!
> 
> On 09.07.2019 6:31, Jiangfeng Xiao wrote:
> 
>> HI13X1_GMAC delete request for soft reset at first,
>> otherwise, the subsequent initialization will not
>> take effect.
>>
>> Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
>> ---
>>   drivers/net/ethernet/hisilicon/hip04_eth.c | 24 ++++++++++++++++++++++++
>>   1 file changed, 24 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
>> index fe61b01..19d8cfd 100644
>> --- a/drivers/net/ethernet/hisilicon/hip04_eth.c
>> +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
> [...]
>> @@ -853,6 +867,15 @@ static int hip04_mac_probe(struct platform_device *pdev)
>>           goto init_fail;
>>       }
>>   +#if defined(CONFIG_HI13X1_GMAC)
>> +    res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
>> +    priv->sysctrl_base = devm_ioremap_resource(d, res);
> 
>    There's devm_platform_ioremap_resource() now.

Thank you for your review. Good point, this makes my code more concise.

I will fix it in v3, or submit a separate patch if the maintainer applies
this patch series first.


^ permalink raw reply

* Re: i.mx6ul with DSA in multi chip addressing mode - no MDIO access
From: Andrew Lunn @ 2019-07-09 13:52 UTC (permalink / raw)
  To: Benjamin Beckmeyer; +Cc: netdev
In-Reply-To: <0d595637-0081-662d-2812-0a174ee1a901@eks-engel.de>

> Hi Andrew,
> good news first, it seems to be running ;-).

Great.

> 
> The interrupt GPIO pin was not correctly configured in the device tree.
> 
> For now we have around 68 accesses per second, I think this is okay 
> because we even have indirect access, so the bus must be more busy.

That sounds reasonable.

> Why we need access to the bus is because we have some software which was 
> using the DSDT driver and now we want to switch to the UMSD driver.
> But we hope that we can forget about all the UMSD driver stuff and the 
> DSDT driver stuff as well and just use the DSA part from the kernel.
> To be honest, so far I don't know what functions we need from the driver
> which aren't supported by the DSA.

You should take a close look at what you actually need. Using
DSDT/UMSD at the same time as mainline DSA does not sound like a good
idea. One can stomp over the other.

If you do decide to do this, you are going to need to add a new API to
allow DSDT/UMSD to get reliable access to the registers. You need to
take the chip->reg_lock to give you exclusive access to the
indirection registers. That also won't be accepted into mainline. We
don't want user space drivers...

      Andrew

^ permalink raw reply

* Re: IPv6 flow label reflection behave for RST packets
From: Marek Majkowski @ 2019-07-09 14:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: kuznet, yoshfuji, Jakub Sitnicki, netdev, kernel-team
In-Reply-To: <1cf380b3-843e-599a-105a-d1879852def1@gmail.com>

I can confirm the patch works for the RST case I checked.

Thanks!

On Tue, Jul 9, 2019 at 3:37 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 7/9/19 3:22 PM, Eric Dumazet wrote:
> >
> >
> > On 7/9/19 2:33 PM, Marek Majkowski wrote:
> >> Ha, thanks. I missed that.
> >>
> >> There is a caveat though. I don't think it's working as intended...
> >
> >
> > Note that my commit really took a look at a fraction of the cases ;)
> >
> > commit 323a53c41292a0d7efc8748856c623324c8d7c21
> >
> >     ipv6: tcp: enable flowlabel reflection in some RST packets
> >
> >     When RST packets are sent because no socket could be found,
> >     it makes sense to use flowlabel_reflect sysctl to decide
> >     if a reflection of the flowlabel is requested.
> >
> >
> > In your case, a socket is found, most probably, and np->repflow seems to be ignored.
> >
> > I'll take a look, thanks.
>
> I guess a possible fix would be :
>
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index d56a9019a0feb5a34312ec353c555f44b8c09b3d..2a298835317c0f6b1d82fb118dc4ba9647a2a110 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -984,8 +984,13 @@ static void tcp_v6_send_reset(const struct sock *sk, struct sk_buff *skb)
>
>         if (sk) {
>                 oif = sk->sk_bound_dev_if;
> -               if (sk_fullsock(sk))
> +               if (sk_fullsock(sk)) {
> +                       struct ipv6_pinfo *np = tcp_inet6_sk(sk);
> +
>                         trace_tcp_send_reset(sk, skb);
> +                       if (np->repflow)
> +                               label = ip6_flowlabel(ipv6h);
> +               }
>                 if (sk->sk_state == TCP_TIME_WAIT)
>                         label = cpu_to_be32(inet_twsk(sk)->tw_flowlabel);
>         } else {
>
>
> >
> >> Running my script:
> >>
> >> $ sysctl -w net.ipv6.flowlabel_reflect=3
> >>
> >> $ tail reflect.py
> >> cd2.close()
> >> cd.send(b"a")
> >>
> >> $ python3 reflect.py
> >> IP6 (flowlabel 0xf2927, hlim 64) ::1.1235 > ::1.60246: Flags [F.]
> >> IP6 (flowlabel 0xf2927, hlim 64) ::1.60246 > ::1.1235: Flags [P.]
> >> IP6 (flowlabel 0x58ecd, hlim 64) ::1.1235 > ::1.60246: Flags [R]
> >>
> >> Note. The RST is opportunistic, depending on timing I sometimes get a
> >> proper FIN, without RST.
> >>
> >> If I change the script to introduce some delay:
> >>
> >> $ tail reflect.py
> >> cd2.close()
> >> time.sleep(0.1)
> >> cd.send(b"a")
> >>
> >> $ python3 reflect.py
> >> IP6 (flowlabel 0x2f60c, hlim 64) ::1.60326 > ::1.1235: Flags [.]
> >> IP6 (flowlabel 0x2f60c, hlim 64) ::1.60326 > ::1.1235: Flags [P.]
> >> IP6 (flowlabel 0x2f60c, hlim 64) ::1.1235 > ::1.60326: Flags [R]
> >>
> >> Now it seems to work reliably. Tested on net-next under virtme.
> >>
> >> Marek
> >>
> >> On Tue, Jul 9, 2019 at 1:19 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>>
> >>>
> >>>
> >>> On 7/9/19 1:10 PM, Marek Majkowski wrote:
> >>>> Morning,
> >>>>
> >>>> I'm experimenting with flow label reflection from a server point of
> >>>> view. I'm able to get it working in both supported ways:
> >>>>
> >>>> (a) per-socket with flow manager IPV6_FL_F_REFLECT and flowlabel_consistency=0
> >>>>
> >>>> (b) with global flowlabel_reflect sysctl
> >>>>
> >>>> However, I was surprised to see that RST after the connection is torn
> >>>> down, doesn't have the correct flow label value:
> >>>>
> >>>> IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [S]
> >>>> IP6 (flowlabel 0x3ba3d) ::1.1235 > ::1.59276: Flags [S.]
> >>>> IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [.]
> >>>> IP6 (flowlabel 0x3ba3d) ::1.1235 > ::1.59276: Flags [F.]
> >>>> IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [P.]
> >>>> IP6 (flowlabel 0xdfc46) ::1.1235 > ::1.59276: Flags [R]
> >>>>
> >>>> Notice, the last RST packet has inconsistent flow label. Perhaps we
> >>>> can argue this behaviour might be acceptable for a per-socket
> >>>> IPV6_FL_F_REFLECT option, but with global flowlabel_reflect, I would
> >>>> expect the RST to preserve the reflected flow label value.
> >>>>
> >>>> I suspect the same behaviour is true for kernel-generated ICMPv6.
> >>>>
> >>>> Prepared test case:
> >>>> https://gist.github.com/majek/139081b84f9b5b6187c8ccff802e3ab3
> >>>>
> >>>> This behaviour is not necessarily a bug, more of a surprise. Flow
> >>>> label reflection is mostly useful in deployments where Linux servers
> >>>> stand behind ECMP router, which uses flow-label to compute the hash.
> >>>> Flow label reflection allows ICMP PTB message to be routed back to
> >>>> correct server.
> >>>>
> >>>> It's hard to imagine a situation where generated RST or ICMP echo
> >>>> response would trigger a ICMP PTB. Flow label reflection is explained
> >>>> here:
> >>>> https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
> >>>> and:
> >>>> https://tools.ietf.org/html/rfc7098
> >>>> https://tools.ietf.org/html/rfc6438
> >>>>
> >>>> Cheers,
> >>>>     Marek
> >>>>
> >>>>
> >>>> (Note: the unrelated "fwmark_reflect" toggle is about something
> >>>> different - flow marks, but also addresses RST and ICMP generated by
> >>>> the server)
> >>>>
> >>>
> >>> Please check the recent commits, scheduled for linux-5.3
> >>>
> >>> a346abe051bd2bd0d5d0140b2da9ec95639acad7 ipv6: icmp: allow flowlabel reflection in echo replies
> >>> c67b85558ff20cb1ff20874461d12af456bee5d0 ipv6: tcp: send consistent autoflowlabel in TIME_WAIT state
> >>> 392096736a06bc9d8f2b42fd4bb1a44b245b9fed ipv6: tcp: fix potential NULL deref in tcp_v6_send_reset()
> >>> 50a8accf10627b343109a9c9d5c361751bf753b0 ipv6: tcp: send consistent flowlabel in TIME_WAIT state
> >>> 323a53c41292a0d7efc8748856c623324c8d7c21 ipv6: tcp: enable flowlabel reflection in some RST packets
> >>>
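
For readers following the thread, the reflection decision discussed above can be sketched in plain userspace C. This is only an illustration, not the kernel code: the names sketch_flowlabel() and sketch_reply_label() are hypothetical, and the only fact relied on is that the flow label is the low 20 bits of the first 32-bit word of the IPv6 header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extract the 20-bit flow label from the first four bytes of an IPv6
 * header (version: 4 bits, traffic class: 8 bits, flow label: 20 bits).
 */
static uint32_t sketch_flowlabel(const uint8_t *hdr)
{
	uint32_t word;

	assert(hdr != NULL);
	/* Assemble the word in network (big-endian) order by hand. */
	word = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
	       ((uint32_t)hdr[2] << 8) | (uint32_t)hdr[3];
	return word & 0x000FFFFFu;
}

/* Hypothetical helper: pick the label for a reply such as a RST.
 * When reflection is requested, reuse the label of the received packet,
 * otherwise fall back to the sender's own label.
 */
static uint32_t sketch_reply_label(const uint8_t *rx_hdr, int reflect,
				   uint32_t own_label)
{
	return reflect ? sketch_flowlabel(rx_hdr) : (own_label & 0x000FFFFFu);
}
```

With the bytes 60 0f 29 27 as input, the sketch yields 0xf2927, the label seen on the first two packets of the first trace above; a reflected RST would carry that value instead of 0x58ecd.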

^ permalink raw reply

* Re: [PATCH net-next v6 06/15] ethtool: netlink bitset handling
From: Jiri Pirko @ 2019-07-09 14:18 UTC (permalink / raw)
  To: Michal Kubecek
  Cc: netdev, David Miller, Jakub Kicinski, Andrew Lunn,
	Florian Fainelli, John Linville, Stephen Hemminger, Johannes Berg,
	linux-kernel
In-Reply-To: <20190704115236.GR20101@unicorn.suse.cz>

Thu, Jul 04, 2019 at 01:52:36PM CEST, mkubecek@suse.cz wrote:
>On Thu, Jul 04, 2019 at 10:04:35AM +0200, Jiri Pirko wrote:
>> Wed, Jul 03, 2019 at 08:18:51PM CEST, mkubecek@suse.cz wrote:
>> >On Wed, Jul 03, 2019 at 01:49:33PM +0200, Jiri Pirko wrote:
>> >> Tue, Jul 02, 2019 at 01:50:09PM CEST, mkubecek@suse.cz wrote:
>> >> >+Compact form: nested (bitset) attribute contents:
>> >> >+
>> >> >+    ETHTOOL_A_BITSET_LIST	(flag)		no mask, only a list
>> >> >+    ETHTOOL_A_BITSET_SIZE	(u32)		number of significant bits
>> >> >+    ETHTOOL_A_BITSET_VALUE	(binary)	bitmap of bit values
>> >> >+    ETHTOOL_A_BITSET_MASK	(binary)	bitmap of valid bits
>> >> >+
>> >> >+Value and mask must have length at least ETHTOOL_A_BITSET_SIZE bits rounded up
>> >> >+to a multiple of 32 bits. They consist of 32-bit words in host byte order,
>> >> 
>> >> Looks like the blocks are similar to NLA_BITFIELD32. Why don't you use
>> >> nested array of NLA_BITFIELD32 instead?
>> >
>> >That would mean a layout like
>> >
>> >  4 bytes of attr header
>> >  4 bytes of value
>> >  4 bytes of mask
>> >  4 bytes of attr header
>> >  4 bytes of value
>> >  4 bytes of mask
>> >  ...
>> >
>> >i.e. interleaved headers, words of value and words of mask. Having value
>> >and mask contiguous looks cleaner to me. Also, I can quickly check the
>> >sizes without iterating through a (potentially long) array.
>> 
>> Yeah, if you are not happy with this, I suggest to introduce
>> NLA_BITFIELD with arbitrary size. That would be probably cleanest.
>
>There is still the question if it should be implemented as a nested
>attribute which could look like the current compact form without the
>"list" flag (if there is no mask, it's a list). Or an unstructured data
>block consisting of u32 bit length and one or two bitmaps of
>corresponding length. I would prefer the nested attribute, netlink was
>designed to represent structured data, passing structures as binary goes
>against the design (just looked at VFINFO in rtnetlink few days ago,
>it's awful, IMHO).
>
>Either way, I would still prefer to have bitmaps represented as an array
>of 32-bit blocks in host byte order. This would be easy to handle in
>kernel both in places where we have u32 based bitmaps and unsigned long
>based ones. Other options seem less appealing:
>
>  - u8 based: only complicates processing
>  - u64 based: have to care about alignment
>  - unsigned long based: alignment and also problems with 64-bit kernel
>    vs. 32-bit userspace
>
>> >> This is quite complex and confusing. Having the same API for 2 APIs is
>> >> odd. The API should be crystal clear, easy to use.
>> >> 
>> >> Why can't you have 2 commands, one working with bit arrays only, one
>> >> working with strings? Something like:
>> >> X_GET
>> >>    ETHTOOL_A_BITS (nested)
>> >>       ETHTOOL_A_BIT_ARRAY (BITFIELD32)
>> >> X_NAMES_GET
>> >>    ETHTOOL_A_BIT_NAMES (nested)
>> >> 	ETHTOOL_A_BIT_INDEX
>> >> 	ETHTOOL_A_BIT_NAME
>> >> 
>> >> For set, you can also have multiple cmds:
>> >> X_SET  - to set many at once, by bit index
>> >>    ETHTOOL_A_BITS (nested)
>> >>       ETHTOOL_A_BIT_ARRAY (BITFIELD32)
>> >> X_ONE_SET   - to set one, by bit index
>> >>    ETHTOOL_A_BIT_INDEX
>> >>    ETHTOOL_A_BIT_VALUE
>> >> X_ONE_SET   - to set one, by name
>> >>    ETHTOOL_A_BIT_NAME
>> >>    ETHTOOL_A_BIT_VALUE
>> >
>> >This looks as if you assume there is nothing except the bitset in the
>> >message but that is not true. Even with your proposed breaking of
>> >current groups, you would still have e.g. 4 bitsets in reply to netdev
>> >features query, 3 in timestamping info GET request and often bitsets
>> >combined with other data (e.g. WoL modes and optional WoL password).
>> >If you wanted to further refine the message granularity to the level of
>> >single parameters, we might be out of message type ids already.
>> 
>> You can still have multiple bitsets(bitfields) in single message and
>> have separate cmd/cmds to get string-bit mapping. No need to mangle it.
>
>Let's take a look at what it means in practice, the command is
>
>  ethtool --set-priv-flags eth3 legacy-rx on
>
>on an ixgbe card. Currently, ethtool (from the github repository) does
>
>------------------------------------------------------------------------
>ETHTOOL_CMD_SETTINGS_SET (U->K, 68 bytes)
>    ETHTOOL_A_HEADER
>        ETHTOOL_A_DEV_NAME = "eth3"
>    ETHTOOL_A_SETTINGS_PRIV_FLAGS
>        ETHTOOL_A_BITSET_BITS
>            ETHTOOL_A_BITS_BIT
>                ETHTOOL_A_BIT_NAME = "legacy-rx"
>                ETHTOOL_A_BIT_VALUE
>
>NLMSG_ERR (K->U, 36 bytes) err = 0
>------------------------------------------------------------------------
>
>If we had only compact form (or some of the NLA_BITFIELD solutions we
>are talking about), you would need
>
>------------------------------------------------------------------------
>ETHTOOL_CMD_STRSET_GET (U->K, 52 bytes)
>    ETHTOOL_A_HEADER
>        ETHTOOL_A_DEV_NAME = "eth3"
>    ETHTOOL_A_STRSET_STRINGSETS
>        ETHTOOL_A_STRINGSETS_STRINGSET
>            ETHTOOL_A_STRINGSET_ID = 2 (ETH_SS_PRIV_FLAGS)
>
>ETHTOOL_CMD_STRSET_GET_REPLY (K->U, 128 bytes)
>    ETHTOOL_A_HEADER
>        ETHTOOL_A_DEV_INDEX = 9
>        ETHTOOL_A_DEV_NAME = "eth3"
>    ETHTOOL_A_STRSET_STRINGSETS
>        ETHTOOL_A_STRINGSETS_STRINGSET
>            ETHTOOL_A_STRINGSET_ID = 2 (ETH_SS_PRIV_FLAGS)
>            ETHTOOL_A_STRINGSET_COUNT = 2
>            ETHTOOL_A_STRINGSET_STRINGS
>                ETHTOOL_A_STRINGS_STRING
>                    ETHTOOL_A_STRING_INDEX = 0
>                    ETHTOOL_A_STRING_VALUE = "legacy-rx"
>                ETHTOOL_A_STRINGS_STRING
>                    ETHTOOL_A_STRING_INDEX = 1
>                    ETHTOOL_A_STRING_VALUE = "vf-ipsec"
>
>NLMSG_ERR (K->U, 36 bytes) err = 0
>
>ETHTOOL_CMD_SETTINGS_SET (U->K, 64 bytes)
>    ETHTOOL_A_HEADER
>        ETHTOOL_A_DEV_NAME = "eth3"
>    ETHTOOL_A_SETTINGS_PRIV_FLAGS
>        ETHTOOL_A_BITSET_SIZE = 2
>        ETHTOOL_A_BITSET_VALUE = 00000001
>        ETHTOOL_A_BITSET_MASK = 00000001
>
>NLMSG_ERR (K->U, 36 bytes) err = 0
>------------------------------------------------------------------------
>
>That's an extra roundtrip, lot more chat and the SETTINGS_SET message is
>only 4 bytes shorter in the end. And we can consider ourselves lucky
>this NIC has only two private flags. Or that we didn't need to enable or
>disable a netdev feature (56 bits) or link mode (69 bits and growing).
>
>We could reduce the overhead by allowing STRSET_GET query to only ask
>for specific string(s) but there would still be the extra roundtrip
>which I dislike in the ioctl interface. Florian also said in the v5
>discussion that he would like if it was possible to get names and data
>together in one request.

I understand. So how about avoiding the bitfield altogether and just
having an array of either bits or strings, or a combination of both?

ETHTOOL_CMD_SETTINGS_SET (U->K)
    ETHTOOL_A_HEADER
        ETHTOOL_A_DEV_NAME = "eth3"
    ETHTOOL_A_SETTINGS_PRIV_FLAGS
        ETHTOOL_A_SETTINGS_PRIV_FLAG
            ETHTOOL_A_FLAG_NAME = "legacy-rx"
            ETHTOOL_A_FLAG_VALUE   (NLA_FLAG)

or the same with index instead of string

ETHTOOL_CMD_SETTINGS_SET (U->K)
    ETHTOOL_A_HEADER
        ETHTOOL_A_DEV_NAME = "eth3"
    ETHTOOL_A_SETTINGS_PRIV_FLAGS
        ETHTOOL_A_SETTINGS_PRIV_FLAG
            ETHTOOL_A_FLAG_INDEX = 0
            ETHTOOL_A_FLAG_VALUE   (NLA_FLAG)


For set you can combine both when you want to set multiple bits:

ETHTOOL_CMD_SETTINGS_SET (U->K)
    ETHTOOL_A_HEADER
        ETHTOOL_A_DEV_NAME = "eth3"
    ETHTOOL_A_SETTINGS_PRIV_FLAGS
        ETHTOOL_A_SETTINGS_PRIV_FLAG
            ETHTOOL_A_FLAG_INDEX = 2
            ETHTOOL_A_FLAG_VALUE   (NLA_FLAG)
        ETHTOOL_A_SETTINGS_PRIV_FLAG
            ETHTOOL_A_FLAG_INDEX = 8
            ETHTOOL_A_FLAG_VALUE   (NLA_FLAG)
        ETHTOOL_A_SETTINGS_PRIV_FLAG
            ETHTOOL_A_FLAG_NAME = "legacy-rx"
            ETHTOOL_A_FLAG_VALUE   (NLA_FLAG)


For get this might be a bit bigger message:

ETHTOOL_CMD_SETTINGS_GET_REPLY (K->U)
    ETHTOOL_A_HEADER
        ETHTOOL_A_DEV_NAME = "eth3"
    ETHTOOL_A_SETTINGS_PRIV_FLAGS
        ETHTOOL_A_SETTINGS_PRIV_FLAG
            ETHTOOL_A_FLAG_INDEX = 0
            ETHTOOL_A_FLAG_NAME = "legacy-rx"
            ETHTOOL_A_FLAG_VALUE   (NLA_FLAG)
        ETHTOOL_A_SETTINGS_PRIV_FLAG
            ETHTOOL_A_FLAG_INDEX = 1
            ETHTOOL_A_FLAG_NAME = "vf-ipsec"
            ETHTOOL_A_FLAG_VALUE   (NLA_FLAG)
        ETHTOOL_A_SETTINGS_PRIV_FLAG
            ETHTOOL_A_FLAG_INDEX = 8
            ETHTOOL_A_FLAG_NAME = "something-else"
            ETHTOOL_A_FLAG_VALUE   (NLA_FLAG)


>
>Michal
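
As a side note for readers, the compact form quoted above (value and mask as arrays of 32-bit words, with the size rounded up to a multiple of 32 bits) can be sketched as follows. The helper names are hypothetical and this is a userspace illustration under those stated assumptions, not the code from the patch series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 32-bit words needed to hold nbits bits. */
static size_t sketch_bitset_words(uint32_t nbits)
{
	return ((size_t)nbits + 31) / 32;
}

/* Request that bit idx take the value val: the bit is marked valid in
 * mask and set or cleared in value. Returns 0 on success, -1 if idx is
 * outside the bitset.
 */
static int sketch_bitset_update(uint32_t *value, uint32_t *mask,
				uint32_t nbits, uint32_t idx, int val)
{
	uint32_t bit = 1u << (idx % 32);

	assert(value != NULL && mask != NULL);
	if (idx >= nbits)
		return -1;

	mask[idx / 32] |= bit;
	if (val)
		value[idx / 32] |= bit;
	else
		value[idx / 32] &= ~bit;
	return 0;
}
```

For the two-flag example above, setting index 0 ("legacy-rx") gives a one-word value of 00000001 and a mask of 00000001, matching the compact SETTINGS_SET message; a 69-bit link mode bitset would need three words for each bitmap.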

^ permalink raw reply

* [PATCH v6 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions
From: Michal Kalderon @ 2019-07-09 14:17 UTC (permalink / raw)
  To: michal.kalderon, ariel.elior, jgg, dledford, galpress
  Cc: linux-rdma, davem, netdev
In-Reply-To: <20190709141735.19193-1-michal.kalderon@marvell.com>

Create common APIs for adding entries to the per-ucontext mmap xarray
(mmap_xa), searching for an entry, and freeing all entries.

The code was copied from the efa driver almost as is, with the functions
renamed to be generic rather than efa-specific.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/core/device.c      |   1 +
 drivers/infiniband/core/rdma_core.c   |   1 +
 drivers/infiniband/core/uverbs_cmd.c  |   1 +
 drivers/infiniband/core/uverbs_main.c | 135 ++++++++++++++++++++++++++++++++++
 include/rdma/ib_verbs.h               |  46 ++++++++++++
 5 files changed, 184 insertions(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 8a6ccb936dfe..a830c2c5d691 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2521,6 +2521,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
 	SET_DEVICE_OP(dev_ops, map_phys_fmr);
 	SET_DEVICE_OP(dev_ops, mmap);
+	SET_DEVICE_OP(dev_ops, mmap_free);
 	SET_DEVICE_OP(dev_ops, modify_ah);
 	SET_DEVICE_OP(dev_ops, modify_cq);
 	SET_DEVICE_OP(dev_ops, modify_device);
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index ccf4d069c25c..1ed01b02401f 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -816,6 +816,7 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
 
 	rdma_restrack_del(&ucontext->res);
 
+	rdma_user_mmap_entries_remove_free(ucontext);
 	ib_dev->ops.dealloc_ucontext(ucontext);
 	kfree(ucontext);
 
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 7ddd0e5bc6b3..44c0600245e4 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -254,6 +254,7 @@ static int ib_uverbs_get_context(struct uverbs_attr_bundle *attrs)
 
 	mutex_init(&ucontext->per_mm_list_lock);
 	INIT_LIST_HEAD(&ucontext->per_mm_list);
+	xa_init(&ucontext->mmap_xa);
 
 	ret = get_unused_fd_flags(O_CLOEXEC);
 	if (ret < 0)
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 11c13c1381cf..4b909d7b97de 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -965,6 +965,141 @@ int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL(rdma_user_mmap_io);
 
+static inline u64
+rdma_user_mmap_get_key(const struct rdma_user_mmap_entry *entry)
+{
+	return (u64)entry->mmap_page << PAGE_SHIFT;
+}
+
+/**
+ * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa.
+ *
+ * @ucontext: associated user context.
+ * @key: The key received from rdma_user_mmap_entry_insert which
+ *     is provided by user as the address to map.
+ * @len: The length the user wants to map
+ *
+ * This function is called when a user tries to mmap a key it
+ * initially received from the driver. The key was created by
+ * the function rdma_user_mmap_entry_insert.
+ *
+ * Return: the entry if it exists, or NULL if there is no match.
+ */
+struct rdma_user_mmap_entry *
+rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len)
+{
+	struct rdma_user_mmap_entry *entry;
+	u64 mmap_page;
+
+	mmap_page = key >> PAGE_SHIFT;
+	if (mmap_page > U32_MAX)
+		return NULL;
+
+	entry = xa_load(&ucontext->mmap_xa, mmap_page);
+	if (!entry || entry->length != len)
+		return NULL;
+
+	ibdev_dbg(ucontext->device,
+		  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] found\n",
+		  entry->obj, key, entry->address, entry->length);
+
+	return entry;
+}
+EXPORT_SYMBOL(rdma_user_mmap_entry_get);
+
+/**
+ * rdma_user_mmap_entry_insert() - Allocate and insert an entry to the mmap_xa.
+ *
+ * @ucontext: associated user context.
+ * @obj: opaque driver object that will be stored in the entry.
+ * @address: The address that will be mmapped to the user
+ * @length: Length of the address that will be mmapped
+ * @mmap_flag: opaque driver flags related to the address (For
+ *           example could be used for cacheability)
+ *
+ * This function should be called by drivers that use the rdma_user_mmap
+ * interface for handling user mmapped addresses. The database is handled in
+ * the core and helper functions are provided to insert entries into the
+ * database and extract entries when the user calls mmap with the given key.
+ * The function returns a unique key that should be provided to user, the user
+ * will use the key to map the given address.
+ *
+ * Note this locking scheme cannot support removal of entries,
+ * except during ucontext destruction when the core code
+ * guarantees no concurrency.
+ *
+ * Return: unique key or RDMA_USER_MMAP_INVALID if entry was not added.
+ */
+u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, void *obj,
+				u64 address, u64 length, u8 mmap_flag)
+{
+	struct rdma_user_mmap_entry *entry;
+	u32 next_mmap_page;
+	int err;
+
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		return RDMA_USER_MMAP_INVALID;
+
+	entry->obj = obj;
+	entry->address = address;
+	entry->length = length;
+	entry->mmap_flag = mmap_flag;
+
+	xa_lock(&ucontext->mmap_xa);
+	if (check_add_overflow(ucontext->mmap_xa_page,
+			       (u32)(length >> PAGE_SHIFT),
+			       &next_mmap_page))
+		goto err_unlock;
+
+	entry->mmap_page = ucontext->mmap_xa_page;
+	ucontext->mmap_xa_page = next_mmap_page;
+	err = __xa_insert(&ucontext->mmap_xa, entry->mmap_page, entry,
+			  GFP_KERNEL);
+	if (err)
+		goto err_unlock;
+
+	xa_unlock(&ucontext->mmap_xa);
+
+	ibdev_dbg(ucontext->device,
+		  "mmap: obj[0x%p] addr[%#llx], len[%#llx], key[%#llx] inserted\n",
+		  entry->obj, entry->address, entry->length,
+		  rdma_user_mmap_get_key(entry));
+
+	return rdma_user_mmap_get_key(entry);
+
+err_unlock:
+	xa_unlock(&ucontext->mmap_xa);
+	kfree(entry);
+	return RDMA_USER_MMAP_INVALID;
+}
+EXPORT_SYMBOL(rdma_user_mmap_entry_insert);
+
+/*
+ * This is only called when the ucontext is destroyed and there can be no
+ * concurrent query via mmap or allocate on the xarray, thus we can be sure no
+ * other thread is using the entry pointer. We also know that all the BAR
+ * pages have either been zap'd or munmapped at this point.  Normal pages are
+ * refcounted and will be freed at the proper time.
+ */
+void rdma_user_mmap_entries_remove_free(struct ib_ucontext *ucontext)
+{
+	struct rdma_user_mmap_entry *entry;
+	unsigned long mmap_page;
+
+	xa_for_each(&ucontext->mmap_xa, mmap_page, entry) {
+		xa_erase(&ucontext->mmap_xa, mmap_page);
+
+		ibdev_dbg(ucontext->device,
+			  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
+			  entry->obj, rdma_user_mmap_get_key(entry),
+			  entry->address, entry->length);
+		if (ucontext->device->ops.mmap_free)
+			ucontext->device->ops.mmap_free(entry);
+		kfree(entry);
+	}
+}
+
 void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
 {
 	struct rdma_umap_priv *priv, *next_priv;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 26e9c2594913..1ba29a00f584 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1425,6 +1425,8 @@ struct ib_ucontext {
 	 * Implementation details of the RDMA core, don't use in drivers:
 	 */
 	struct rdma_restrack_entry res;
+	struct xarray mmap_xa;
+	u32 mmap_xa_page;
 };
 
 struct ib_uobject {
@@ -2199,6 +2201,17 @@ struct iw_cm_conn_param;
 
 #define DECLARE_RDMA_OBJ_SIZE(ib_struct) size_t size_##ib_struct
 
+#define RDMA_USER_MMAP_FLAG_SHIFT 56
+#define RDMA_USER_MMAP_PAGE_MASK GENMASK(RDMA_USER_MMAP_FLAG_SHIFT - 1, 0)
+#define RDMA_USER_MMAP_INVALID U64_MAX
+struct rdma_user_mmap_entry {
+	void *obj;
+	u64 address;
+	u64 length;
+	u32 mmap_page;
+	u8 mmap_flag;
+};
+
 /**
  * struct ib_device_ops - InfiniBand device operations
  * This structure defines all the InfiniBand device operations, providers will
@@ -2311,6 +2324,19 @@ struct ib_device_ops {
 			      struct ib_udata *udata);
 	void (*dealloc_ucontext)(struct ib_ucontext *context);
 	int (*mmap)(struct ib_ucontext *context, struct vm_area_struct *vma);
+	/**
+	 * Memory that is mapped to the user can only be freed once the
+	 * ucontext of the application is destroyed. This is for
+	 * security reasons where we don't want an application to have a
+	 * mapping to physical memory that is freed and allocated to
+	 * another application. For this reason, all the entries are
+	 * stored in ucontext and once ucontext is freed mmap_free is
+	 * called on each of the entries. The type of the memory that
+	 * was mapped may differ between entries and is opaque to the
+	 * rdma_user_mmap interface, so freeing it must be implemented
+	 * by the driver in mmap_free.
+	 */
+	void (*mmap_free)(struct rdma_user_mmap_entry *entry);
 	void (*disassociate_ucontext)(struct ib_ucontext *ibcontext);
 	int (*alloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
 	void (*dealloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
@@ -2709,6 +2735,11 @@ void ib_set_device_ops(struct ib_device *device,
 #if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)
 int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
 		      unsigned long pfn, unsigned long size, pgprot_t prot);
+u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, void *obj,
+				u64 address, u64 length, u8 mmap_flag);
+struct rdma_user_mmap_entry *
+rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len);
+void rdma_user_mmap_entries_remove_free(struct ib_ucontext *ucontext);
 #else
 static inline int rdma_user_mmap_io(struct ib_ucontext *ucontext,
 				    struct vm_area_struct *vma,
@@ -2717,6 +2748,21 @@ static inline int rdma_user_mmap_io(struct ib_ucontext *ucontext,
 {
 	return -EINVAL;
 }
+
+static inline u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
+		void *obj, u64 address, u64 length, u8 mmap_flag)
+{
+	return RDMA_USER_MMAP_INVALID;
+}
+
+static inline struct rdma_user_mmap_entry *
+rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len)
+{
+	return NULL;
+}
+
+static inline void
+rdma_user_mmap_entries_remove_free(struct ib_ucontext *ucontext) {}
 #endif
 
 static inline int ib_copy_from_udata(void *dest, struct ib_udata *udata, size_t len)
-- 
2.14.5
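
The key scheme used by rdma_user_mmap_entry_insert() and rdma_user_mmap_entry_get() in the patch above can be modelled in userspace as a rough sketch. Everything prefixed with sketch_ or SKETCH_ is hypothetical, a 4 KiB page size is assumed, and the xarray itself is left out; only the page-counter arithmetic and the key-to-page conversion are shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12		/* assumption: 4 KiB pages */
#define SKETCH_MMAP_INVALID UINT64_MAX	/* mirrors RDMA_USER_MMAP_INVALID */

/* Hypothetical stand-in for the per-ucontext counter mmap_xa_page. */
struct sketch_ctx {
	uint32_t next_page;
};

/* Reserve (length >> PAGE_SHIFT) page slots and return the key of the
 * first one, or SKETCH_MMAP_INVALID if the 32-bit counter would wrap.
 */
static uint64_t sketch_reserve_key(struct sketch_ctx *ctx, uint64_t length)
{
	uint32_t npages = (uint32_t)(length >> SKETCH_PAGE_SHIFT);
	uint32_t first;

	assert(ctx != NULL);
	first = ctx->next_page;

	/* Same intent as check_add_overflow() in the patch. */
	if (npages > UINT32_MAX - first)
		return SKETCH_MMAP_INVALID;

	ctx->next_page = first + npages;
	return (uint64_t)first << SKETCH_PAGE_SHIFT;
}

/* Convert a user-supplied key back to a page index, rejecting keys whose
 * page index does not fit in 32 bits.
 */
static bool sketch_key_to_page(uint64_t key, uint32_t *page)
{
	uint64_t p = key >> SKETCH_PAGE_SHIFT;

	if (p > UINT32_MAX)
		return false;
	*page = (uint32_t)p;
	return true;
}
```

Because each entry advances the counter by its own length in pages, the key ranges of different entries never overlap, which is what lets the user pass the key back as the mmap offset.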


^ permalink raw reply related

* [PATCH v6 rdma-next 4/6] qed*: Change dpi_addr to be denoted with __iomem
From: Michal Kalderon @ 2019-07-09 14:17 UTC (permalink / raw)
  To: michal.kalderon, ariel.elior, jgg, dledford, galpress
  Cc: linux-rdma, davem, netdev
In-Reply-To: <20190709141735.19193-1-michal.kalderon@marvell.com>

Several casts were required around the dpi_addr parameter in qed_rdma_if.h.
This is an address on the doorbell BAR and should therefore be marked
with __iomem.

Reported-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/hw/qedr/main.c          | 2 +-
 drivers/infiniband/hw/qedr/qedr.h          | 2 +-
 drivers/net/ethernet/qlogic/qed/qed_rdma.c | 5 ++---
 include/linux/qed/qed_rdma_if.h            | 2 +-
 4 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/main.c b/drivers/infiniband/hw/qedr/main.c
index a0a7ba0a5af4..3db4b6ba5ad6 100644
--- a/drivers/infiniband/hw/qedr/main.c
+++ b/drivers/infiniband/hw/qedr/main.c
@@ -815,7 +815,7 @@ static int qedr_init_hw(struct qedr_dev *dev)
 	if (rc)
 		goto out;
 
-	dev->db_addr = (void __iomem *)(uintptr_t)out_params.dpi_addr;
+	dev->db_addr = out_params.dpi_addr;
 	dev->db_phys_addr = out_params.dpi_phys_addr;
 	dev->db_size = out_params.dpi_size;
 	dev->dpi = out_params.dpi;
diff --git a/drivers/infiniband/hw/qedr/qedr.h b/drivers/infiniband/hw/qedr/qedr.h
index 97c90d1e525d..7e80ce521d8d 100644
--- a/drivers/infiniband/hw/qedr/qedr.h
+++ b/drivers/infiniband/hw/qedr/qedr.h
@@ -227,7 +227,7 @@ struct qedr_ucontext {
 	struct ib_ucontext ibucontext;
 	struct qedr_dev *dev;
 	struct qedr_pd *pd;
-	u64 dpi_addr;
+	void __iomem *dpi_addr;
 	u64 dpi_phys_addr;
 	u32 dpi_size;
 	u16 dpi;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
index 7873d6dfd91f..fb3fe60a1a68 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
@@ -799,9 +799,8 @@ static int qed_rdma_add_user(void *rdma_cxt,
 	/* Calculate the corresponding DPI address */
 	dpi_start_offset = p_hwfn->dpi_start_offset;
 
-	out_params->dpi_addr = (u64)((u8 __iomem *)p_hwfn->doorbells +
-				     dpi_start_offset +
-				     ((out_params->dpi) * p_hwfn->dpi_size));
+	out_params->dpi_addr = p_hwfn->doorbells + dpi_start_offset +
+			       out_params->dpi * p_hwfn->dpi_size;
 
 	out_params->dpi_phys_addr = p_hwfn->cdev->db_phys_addr +
 				    dpi_start_offset +
diff --git a/include/linux/qed/qed_rdma_if.h b/include/linux/qed/qed_rdma_if.h
index d15f8e4815e3..834166809a6c 100644
--- a/include/linux/qed/qed_rdma_if.h
+++ b/include/linux/qed/qed_rdma_if.h
@@ -225,7 +225,7 @@ struct qed_rdma_start_in_params {
 
 struct qed_rdma_add_user_out_params {
 	u16 dpi;
-	u64 dpi_addr;
+	void __iomem *dpi_addr;
 	u64 dpi_phys_addr;
 	u32 dpi_size;
 	u16 wid_count;
-- 
2.14.5


^ permalink raw reply related

* [PATCH v6 rdma-next 6/6] RDMA/qedr: Add iWARP doorbell recovery support
From: Michal Kalderon @ 2019-07-09 14:17 UTC (permalink / raw)
  To: michal.kalderon, ariel.elior, jgg, dledford, galpress
  Cc: linux-rdma, davem, netdev
In-Reply-To: <20190709141735.19193-1-michal.kalderon@marvell.com>

This patch adds the iWARP-specific doorbells to the doorbell
recovery mechanism.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/hw/qedr/qedr.h  | 12 +++++++-----
 drivers/infiniband/hw/qedr/verbs.c | 37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/qedr.h b/drivers/infiniband/hw/qedr/qedr.h
index 8aed24b32de6..dc9ebbf625d2 100644
--- a/drivers/infiniband/hw/qedr/qedr.h
+++ b/drivers/infiniband/hw/qedr/qedr.h
@@ -234,6 +234,11 @@ struct qedr_ucontext {
 	bool db_rec;
 };
 
+union db_prod32 {
+	struct rdma_pwm_val16_data data;
+	u32 raw;
+};
+
 union db_prod64 {
 	struct rdma_pwm_val32_data data;
 	u64 raw;
@@ -265,6 +270,8 @@ struct qedr_userq {
 	struct qedr_user_db_rec *db_rec_data;
 	u64 db_rec_phys;
 	u64 db_rec_key;
+	void __iomem *db_rec_db2_addr;
+	union db_prod32 db_rec_db2_data;
 };
 
 struct qedr_cq {
@@ -300,11 +307,6 @@ struct qedr_pd {
 	struct qedr_ucontext *uctx;
 };
 
-union db_prod32 {
-	struct rdma_pwm_val16_data data;
-	u32 raw;
-};
-
 struct qedr_qp_hwq_info {
 	/* WQE Elements */
 	struct qed_chain pbl;
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index b0b9ec70f2fd..64190de4ce23 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -1684,6 +1684,10 @@ static void qedr_cleanup_user(struct qedr_dev *dev, struct qedr_qp *qp)
 	if (qp->urq.db_rec_data)
 		qedr_db_recovery_del(dev, qp->urq.db_addr,
 				     &qp->urq.db_rec_data->db_data);
+
+	if (rdma_protocol_iwarp(&dev->ibdev, 1))
+		qedr_db_recovery_del(dev, qp->urq.db_rec_db2_addr,
+				     &qp->urq.db_rec_db2_data);
 }
 
 static int qedr_create_user_qp(struct qedr_dev *dev,
@@ -1758,6 +1762,17 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 	qp->usq.db_addr = ctx->dpi_addr + uresp.sq_db_offset;
 	qp->urq.db_addr = ctx->dpi_addr + uresp.rq_db_offset;
 
+	if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
+		qp->urq.db_rec_db2_addr = ctx->dpi_addr + uresp.rq_db2_offset;
+
+		/* calculate the db_rec_db2 data since it is constant so no
+		 *  need to reflect from user
+		 */
+		qp->urq.db_rec_db2_data.data.icid = cpu_to_le16(qp->icid);
+		qp->urq.db_rec_db2_data.data.value =
+			cpu_to_le16(DQ_TCM_IWARP_POST_RQ_CF_CMD);
+	}
+
 	rc = qedr_db_recovery_add(dev, qp->usq.db_addr,
 				  &qp->usq.db_rec_data->db_data,
 				  DB_REC_WIDTH_32B,
@@ -1771,6 +1786,15 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 				  DB_REC_USER);
 	if (rc)
 		goto err;
+
+	if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
+		rc = qedr_db_recovery_add(dev, qp->urq.db_rec_db2_addr,
+					  &qp->urq.db_rec_db2_data,
+					  DB_REC_WIDTH_32B,
+					  DB_REC_USER);
+		if (rc)
+			goto err;
+	}
 	qedr_qp_user_print(dev, qp);
 
 	return rc;
@@ -1811,7 +1835,13 @@ static int qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
 				  &qp->rq.db_data,
 				  DB_REC_WIDTH_32B,
 				  DB_REC_KERNEL);
+	if (rc)
+		return rc;
 
+	rc = qedr_db_recovery_add(dev, qp->rq.iwarp_db2,
+				  &qp->rq.iwarp_db2_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_KERNEL);
 	return rc;
 }
 
@@ -1940,8 +1970,13 @@ static void qedr_cleanup_kernel(struct qedr_dev *dev, struct qedr_qp *qp)
 
 	qedr_db_recovery_del(dev, qp->sq.db, &qp->sq.db_data);
 
-	if (!qp->srq)
+	if (!qp->srq) {
 		qedr_db_recovery_del(dev, qp->rq.db, &qp->rq.db_data);
+
+		if (rdma_protocol_iwarp(&dev->ibdev, 1))
+			qedr_db_recovery_del(dev, qp->rq.iwarp_db2,
+					     &qp->rq.iwarp_db2_data);
+	}
 }
 
 static int qedr_create_kernel_qp(struct qedr_dev *dev,
-- 
2.14.5



* [PATCH v6 rdma-next 3/6] RDMA/qedr: Use the common mmap API
From: Michal Kalderon @ 2019-07-09 14:17 UTC (permalink / raw)
  To: michal.kalderon, ariel.elior, jgg, dledford, galpress
  Cc: linux-rdma, davem, netdev
In-Reply-To: <20190709141735.19193-1-michal.kalderon@marvell.com>

Remove all functions related to mmap from qedr and use the common
API instead.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/hw/qedr/qedr.h  |  13 ----
 drivers/infiniband/hw/qedr/verbs.c | 153 +++++++++++++------------------------
 drivers/infiniband/hw/qedr/verbs.h |   2 +-
 3 files changed, 52 insertions(+), 116 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/qedr.h b/drivers/infiniband/hw/qedr/qedr.h
index 6175d1e98717..97c90d1e525d 100644
--- a/drivers/infiniband/hw/qedr/qedr.h
+++ b/drivers/infiniband/hw/qedr/qedr.h
@@ -231,11 +231,6 @@ struct qedr_ucontext {
 	u64 dpi_phys_addr;
 	u32 dpi_size;
 	u16 dpi;
-
-	struct list_head mm_head;
-
-	/* Lock to protect mm list */
-	struct mutex mm_list_lock;
 };
 
 union db_prod64 {
@@ -298,14 +293,6 @@ struct qedr_pd {
 	struct qedr_ucontext *uctx;
 };
 
-struct qedr_mm {
-	struct {
-		u64 phy_addr;
-		unsigned long len;
-	} key;
-	struct list_head entry;
-};
-
 union db_prod32 {
 	struct rdma_pwm_val16_data data;
 	u32 raw;
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 27d90a84ea01..f33f0f1e7d76 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -58,6 +58,10 @@
 
 #define DB_ADDR_SHIFT(addr)		((addr) << DB_PWM_ADDR_OFFSET_SHIFT)
 
+enum {
+	QEDR_USER_MMAP_IO_WC = 0,
+};
+
 static inline int qedr_ib_copy_to_udata(struct ib_udata *udata, void *src,
 					size_t len)
 {
@@ -256,60 +260,6 @@ int qedr_modify_port(struct ib_device *ibdev, u8 port, int mask,
 	return 0;
 }
 
-static int qedr_add_mmap(struct qedr_ucontext *uctx, u64 phy_addr,
-			 unsigned long len)
-{
-	struct qedr_mm *mm;
-
-	mm = kzalloc(sizeof(*mm), GFP_KERNEL);
-	if (!mm)
-		return -ENOMEM;
-
-	mm->key.phy_addr = phy_addr;
-	/* This function might be called with a length which is not a multiple
-	 * of PAGE_SIZE, while the mapping is PAGE_SIZE grained and the kernel
-	 * forces this granularity by increasing the requested size if needed.
-	 * When qedr_mmap is called, it will search the list with the updated
-	 * length as a key. To prevent search failures, the length is rounded up
-	 * in advance to PAGE_SIZE.
-	 */
-	mm->key.len = roundup(len, PAGE_SIZE);
-	INIT_LIST_HEAD(&mm->entry);
-
-	mutex_lock(&uctx->mm_list_lock);
-	list_add(&mm->entry, &uctx->mm_head);
-	mutex_unlock(&uctx->mm_list_lock);
-
-	DP_DEBUG(uctx->dev, QEDR_MSG_MISC,
-		 "added (addr=0x%llx,len=0x%lx) for ctx=%p\n",
-		 (unsigned long long)mm->key.phy_addr,
-		 (unsigned long)mm->key.len, uctx);
-
-	return 0;
-}
-
-static bool qedr_search_mmap(struct qedr_ucontext *uctx, u64 phy_addr,
-			     unsigned long len)
-{
-	bool found = false;
-	struct qedr_mm *mm;
-
-	mutex_lock(&uctx->mm_list_lock);
-	list_for_each_entry(mm, &uctx->mm_head, entry) {
-		if (len != mm->key.len || phy_addr != mm->key.phy_addr)
-			continue;
-
-		found = true;
-		break;
-	}
-	mutex_unlock(&uctx->mm_list_lock);
-	DP_DEBUG(uctx->dev, QEDR_MSG_MISC,
-		 "searched for (addr=0x%llx,len=0x%lx) for ctx=%p, result=%d\n",
-		 mm->key.phy_addr, mm->key.len, uctx, found);
-
-	return found;
-}
-
 int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 {
 	struct ib_device *ibdev = uctx->device;
@@ -318,6 +268,7 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 	struct qedr_alloc_ucontext_resp uresp = {};
 	struct qedr_dev *dev = get_qedr_dev(ibdev);
 	struct qed_rdma_add_user_out_params oparams;
+	u64 key;
 
 	if (!udata)
 		return -EFAULT;
@@ -334,13 +285,17 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 	ctx->dpi_addr = oparams.dpi_addr;
 	ctx->dpi_phys_addr = oparams.dpi_phys_addr;
 	ctx->dpi_size = oparams.dpi_size;
-	INIT_LIST_HEAD(&ctx->mm_head);
-	mutex_init(&ctx->mm_list_lock);
+
+	key = rdma_user_mmap_entry_insert(uctx, ctx,
+					  ctx->dpi_phys_addr, ctx->dpi_size,
+					  QEDR_USER_MMAP_IO_WC);
+	if (key == RDMA_USER_MMAP_INVALID)
+		return -ENOMEM;
 
 	uresp.dpm_enabled = dev->user_dpm_enabled;
 	uresp.wids_enabled = 1;
 	uresp.wid_count = oparams.wid_count;
-	uresp.db_pa = ctx->dpi_phys_addr;
+	uresp.db_pa = key;
 	uresp.db_size = ctx->dpi_size;
 	uresp.max_send_wr = dev->attr.max_sqe;
 	uresp.max_recv_wr = dev->attr.max_rqe;
@@ -356,10 +311,6 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 
 	ctx->dev = dev;
 
-	rc = qedr_add_mmap(ctx, ctx->dpi_phys_addr, ctx->dpi_size);
-	if (rc)
-		return rc;
-
 	DP_DEBUG(dev, QEDR_MSG_INIT, "Allocating user context %p\n",
 		 &ctx->ibucontext);
 	return 0;
@@ -368,66 +319,64 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 void qedr_dealloc_ucontext(struct ib_ucontext *ibctx)
 {
 	struct qedr_ucontext *uctx = get_qedr_ucontext(ibctx);
-	struct qedr_mm *mm, *tmp;
 
 	DP_DEBUG(uctx->dev, QEDR_MSG_INIT, "Deallocating user context %p\n",
 		 uctx);
 	uctx->dev->ops->rdma_remove_user(uctx->dev->rdma_ctx, uctx->dpi);
-
-	list_for_each_entry_safe(mm, tmp, &uctx->mm_head, entry) {
-		DP_DEBUG(uctx->dev, QEDR_MSG_MISC,
-			 "deleted (addr=0x%llx,len=0x%lx) for ctx=%p\n",
-			 mm->key.phy_addr, mm->key.len, uctx);
-		list_del(&mm->entry);
-		kfree(mm);
-	}
 }
 
-int qedr_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
+int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma)
 {
-	struct qedr_ucontext *ucontext = get_qedr_ucontext(context);
-	struct qedr_dev *dev = get_qedr_dev(context->device);
-	unsigned long phys_addr = vma->vm_pgoff << PAGE_SHIFT;
-	unsigned long len = (vma->vm_end - vma->vm_start);
-	unsigned long dpi_start;
+	struct ib_device *dev = ucontext->device;
+	u64 length = vma->vm_end - vma->vm_start;
+	u64 key = vma->vm_pgoff << PAGE_SHIFT;
+	struct rdma_user_mmap_entry *entry;
+	u64 pfn;
+	int err;
 
-	dpi_start = dev->db_phys_addr + (ucontext->dpi * ucontext->dpi_size);
+	ibdev_dbg(dev,
+		  "start %#lx, end %#lx, length = %#llx, key = %#llx\n",
+		  vma->vm_start, vma->vm_end, length, key);
 
-	DP_DEBUG(dev, QEDR_MSG_INIT,
-		 "mmap invoked with vm_start=0x%pK, vm_end=0x%pK,vm_pgoff=0x%pK; dpi_start=0x%pK dpi_size=0x%x\n",
-		 (void *)vma->vm_start, (void *)vma->vm_end,
-		 (void *)vma->vm_pgoff, (void *)dpi_start, ucontext->dpi_size);
-
-	if ((vma->vm_start & (PAGE_SIZE - 1)) || (len & (PAGE_SIZE - 1))) {
-		DP_ERR(dev,
-		       "failed mmap, addresses must be page aligned: start=0x%pK, end=0x%pK\n",
-		       (void *)vma->vm_start, (void *)vma->vm_end);
+	if (length % PAGE_SIZE != 0 || !(vma->vm_flags & VM_SHARED)) {
+		ibdev_dbg(dev,
+			  "length[%#llx] is not page size aligned[%#lx] or VM_SHARED is not set [%#lx]\n",
+			  length, PAGE_SIZE, vma->vm_flags);
 		return -EINVAL;
 	}
 
-	if (!qedr_search_mmap(ucontext, phys_addr, len)) {
-		DP_ERR(dev, "failed mmap, vm_pgoff=0x%lx is not authorized\n",
-		       vma->vm_pgoff);
-		return -EINVAL;
+	if (vma->vm_flags & VM_EXEC) {
+		ibdev_dbg(dev, "Mapping executable pages is not permitted\n");
+		return -EPERM;
 	}
+	vma->vm_flags &= ~VM_MAYEXEC;
 
-	if (phys_addr < dpi_start ||
-	    ((phys_addr + len) > (dpi_start + ucontext->dpi_size))) {
-		DP_ERR(dev,
-		       "failed mmap, pages are outside of dpi; page address=0x%pK, dpi_start=0x%pK, dpi_size=0x%x\n",
-		       (void *)phys_addr, (void *)dpi_start,
-		       ucontext->dpi_size);
+	entry = rdma_user_mmap_entry_get(ucontext, key, length);
+	if (!entry) {
+		ibdev_dbg(dev, "key[%#llx] does not have valid entry\n",
+			  key);
 		return -EINVAL;
 	}
 
-	if (vma->vm_flags & VM_READ) {
-		DP_ERR(dev, "failed mmap, cannot map doorbell bar for read\n");
-		return -EINVAL;
+	ibdev_dbg(dev,
+		  "Mapping address[%#llx], length[%#llx], mmap_flag[%d]\n",
+		  entry->address, length, entry->mmap_flag);
+
+	pfn = entry->address >> PAGE_SHIFT;
+	switch (entry->mmap_flag) {
+	case QEDR_USER_MMAP_IO_WC:
+		err = rdma_user_mmap_io(ucontext, vma, pfn, length,
+					pgprot_writecombine(vma->vm_page_prot));
+		break;
+	default:
+		err = -EINVAL;
 	}
 
-	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
-	return io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, len,
-				  vma->vm_page_prot);
+	ibdev_dbg(dev,
+		  "Couldn't mmap address[%#llx] length[%#llx] mmap_flag[%d] err[%d]\n",
+		  entry->address, length, entry->mmap_flag, err);
+
+	return err;
 }
 
 int qedr_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
diff --git a/drivers/infiniband/hw/qedr/verbs.h b/drivers/infiniband/hw/qedr/verbs.h
index 9aaa90283d6e..724d0983e972 100644
--- a/drivers/infiniband/hw/qedr/verbs.h
+++ b/drivers/infiniband/hw/qedr/verbs.h
@@ -46,7 +46,7 @@ int qedr_query_pkey(struct ib_device *, u8 port, u16 index, u16 *pkey);
 int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata);
 void qedr_dealloc_ucontext(struct ib_ucontext *uctx);
 
-int qedr_mmap(struct ib_ucontext *, struct vm_area_struct *vma);
+int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma);
 int qedr_alloc_pd(struct ib_pd *pd, struct ib_udata *udata);
 void qedr_dealloc_pd(struct ib_pd *pd, struct ib_udata *udata);
 
-- 
2.14.5



* [PATCH v6 rdma-next 2/6] RDMA/efa: Use the common mmap_xa helpers
From: Michal Kalderon @ 2019-07-09 14:17 UTC (permalink / raw)
  To: michal.kalderon, ariel.elior, jgg, dledford, galpress
  Cc: linux-rdma, davem, netdev
In-Reply-To: <20190709141735.19193-1-michal.kalderon@marvell.com>

Remove the functions related to managing the mmap_xa database.
This code was copied to ib_core. Use the common APIs instead.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/hw/efa/efa.h       |   3 +-
 drivers/infiniband/hw/efa/efa_main.c  |   1 +
 drivers/infiniband/hw/efa/efa_verbs.c | 186 ++++++++--------------------------
 3 files changed, 44 insertions(+), 146 deletions(-)

diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h
index 119f8efec564..350754ac240e 100644
--- a/drivers/infiniband/hw/efa/efa.h
+++ b/drivers/infiniband/hw/efa/efa.h
@@ -71,8 +71,6 @@ struct efa_dev {
 
 struct efa_ucontext {
 	struct ib_ucontext ibucontext;
-	struct xarray mmap_xa;
-	u32 mmap_xa_page;
 	u16 uarn;
 };
 
@@ -147,6 +145,7 @@ int efa_alloc_ucontext(struct ib_ucontext *ibucontext, struct ib_udata *udata);
 void efa_dealloc_ucontext(struct ib_ucontext *ibucontext);
 int efa_mmap(struct ib_ucontext *ibucontext,
 	     struct vm_area_struct *vma);
+void efa_mmap_free(struct rdma_user_mmap_entry *entry);
 int efa_create_ah(struct ib_ah *ibah,
 		  struct rdma_ah_attr *ah_attr,
 		  u32 flags,
diff --git a/drivers/infiniband/hw/efa/efa_main.c b/drivers/infiniband/hw/efa/efa_main.c
index dd1c6d49466f..65508c73accd 100644
--- a/drivers/infiniband/hw/efa/efa_main.c
+++ b/drivers/infiniband/hw/efa/efa_main.c
@@ -215,6 +215,7 @@ static const struct ib_device_ops efa_dev_ops = {
 	.get_link_layer = efa_port_link_layer,
 	.get_port_immutable = efa_get_port_immutable,
 	.mmap = efa_mmap,
+	.mmap_free = efa_mmap_free,
 	.modify_qp = efa_modify_qp,
 	.query_device = efa_query_device,
 	.query_gid = efa_query_gid,
diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index df77bc312a25..419170952760 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -13,10 +13,6 @@
 
 #include "efa.h"
 
-#define EFA_MMAP_FLAG_SHIFT 56
-#define EFA_MMAP_PAGE_MASK GENMASK(EFA_MMAP_FLAG_SHIFT - 1, 0)
-#define EFA_MMAP_INVALID U64_MAX
-
 enum {
 	EFA_MMAP_DMA_PAGE = 0,
 	EFA_MMAP_IO_WC,
@@ -27,20 +23,6 @@ enum {
 	(BIT(EFA_ADMIN_FATAL_ERROR) | BIT(EFA_ADMIN_WARNING) | \
 	 BIT(EFA_ADMIN_NOTIFICATION) | BIT(EFA_ADMIN_KEEP_ALIVE))
 
-struct efa_mmap_entry {
-	void  *obj;
-	u64 address;
-	u64 length;
-	u32 mmap_page;
-	u8 mmap_flag;
-};
-
-static inline u64 get_mmap_key(const struct efa_mmap_entry *efa)
-{
-	return ((u64)efa->mmap_flag << EFA_MMAP_FLAG_SHIFT) |
-	       ((u64)efa->mmap_page << PAGE_SHIFT);
-}
-
 #define EFA_CHUNK_PAYLOAD_SHIFT       12
 #define EFA_CHUNK_PAYLOAD_SIZE        BIT(EFA_CHUNK_PAYLOAD_SHIFT)
 #define EFA_CHUNK_PAYLOAD_PTR_SIZE    8
@@ -145,106 +127,6 @@ static void *efa_zalloc_mapped(struct efa_dev *dev, dma_addr_t *dma_addr,
 	return addr;
 }
 
-/*
- * This is only called when the ucontext is destroyed and there can be no
- * concurrent query via mmap or allocate on the xarray, thus we can be sure no
- * other thread is using the entry pointer. We also know that all the BAR
- * pages have either been zap'd or munmaped at this point.  Normal pages are
- * refcounted and will be freed at the proper time.
- */
-static void mmap_entries_remove_free(struct efa_dev *dev,
-				     struct efa_ucontext *ucontext)
-{
-	struct efa_mmap_entry *entry;
-	unsigned long mmap_page;
-
-	xa_for_each(&ucontext->mmap_xa, mmap_page, entry) {
-		xa_erase(&ucontext->mmap_xa, mmap_page);
-
-		ibdev_dbg(
-			&dev->ibdev,
-			"mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
-			entry->obj, get_mmap_key(entry), entry->address,
-			entry->length);
-		if (entry->mmap_flag == EFA_MMAP_DMA_PAGE)
-			/* DMA mapping is already gone, now free the pages */
-			free_pages_exact(phys_to_virt(entry->address),
-					 entry->length);
-		kfree(entry);
-	}
-}
-
-static struct efa_mmap_entry *mmap_entry_get(struct efa_dev *dev,
-					     struct efa_ucontext *ucontext,
-					     u64 key, u64 len)
-{
-	struct efa_mmap_entry *entry;
-	u64 mmap_page;
-
-	mmap_page = (key & EFA_MMAP_PAGE_MASK) >> PAGE_SHIFT;
-	if (mmap_page > U32_MAX)
-		return NULL;
-
-	entry = xa_load(&ucontext->mmap_xa, mmap_page);
-	if (!entry || get_mmap_key(entry) != key || entry->length != len)
-		return NULL;
-
-	ibdev_dbg(&dev->ibdev,
-		  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
-		  entry->obj, key, entry->address, entry->length);
-
-	return entry;
-}
-
-/*
- * Note this locking scheme cannot support removal of entries, except during
- * ucontext destruction when the core code guarentees no concurrency.
- */
-static u64 mmap_entry_insert(struct efa_dev *dev, struct efa_ucontext *ucontext,
-			     void *obj, u64 address, u64 length, u8 mmap_flag)
-{
-	struct efa_mmap_entry *entry;
-	u32 next_mmap_page;
-	int err;
-
-	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
-	if (!entry)
-		return EFA_MMAP_INVALID;
-
-	entry->obj = obj;
-	entry->address = address;
-	entry->length = length;
-	entry->mmap_flag = mmap_flag;
-
-	xa_lock(&ucontext->mmap_xa);
-	if (check_add_overflow(ucontext->mmap_xa_page,
-			       (u32)(length >> PAGE_SHIFT),
-			       &next_mmap_page))
-		goto err_unlock;
-
-	entry->mmap_page = ucontext->mmap_xa_page;
-	ucontext->mmap_xa_page = next_mmap_page;
-	err = __xa_insert(&ucontext->mmap_xa, entry->mmap_page, entry,
-			  GFP_KERNEL);
-	if (err)
-		goto err_unlock;
-
-	xa_unlock(&ucontext->mmap_xa);
-
-	ibdev_dbg(
-		&dev->ibdev,
-		"mmap: obj[0x%p] addr[%#llx], len[%#llx], key[%#llx] inserted\n",
-		entry->obj, entry->address, entry->length, get_mmap_key(entry));
-
-	return get_mmap_key(entry);
-
-err_unlock:
-	xa_unlock(&ucontext->mmap_xa);
-	kfree(entry);
-	return EFA_MMAP_INVALID;
-
-}
-
 int efa_query_device(struct ib_device *ibdev,
 		     struct ib_device_attr *props,
 		     struct ib_udata *udata)
@@ -488,45 +370,53 @@ static int qp_mmap_entries_setup(struct efa_qp *qp,
 				 struct efa_com_create_qp_params *params,
 				 struct efa_ibv_create_qp_resp *resp)
 {
+	u64 address;
+	u64 length;
+
 	/*
 	 * Once an entry is inserted it might be mmapped, hence cannot be
 	 * cleaned up until dealloc_ucontext.
 	 */
 	resp->sq_db_mmap_key =
-		mmap_entry_insert(dev, ucontext, qp,
-				  dev->db_bar_addr + resp->sq_db_offset,
-				  PAGE_SIZE, EFA_MMAP_IO_NC);
-	if (resp->sq_db_mmap_key == EFA_MMAP_INVALID)
+		rdma_user_mmap_entry_insert(&ucontext->ibucontext, qp,
+					    dev->db_bar_addr +
+					    resp->sq_db_offset,
+					    PAGE_SIZE, EFA_MMAP_IO_NC);
+	if (resp->sq_db_mmap_key == RDMA_USER_MMAP_INVALID)
 		return -ENOMEM;
 
 	resp->sq_db_offset &= ~PAGE_MASK;
 
+	address = dev->mem_bar_addr + resp->llq_desc_offset;
+	length = PAGE_ALIGN(params->sq_ring_size_in_bytes +
+			    (resp->llq_desc_offset & ~PAGE_MASK));
 	resp->llq_desc_mmap_key =
-		mmap_entry_insert(dev, ucontext, qp,
-				  dev->mem_bar_addr + resp->llq_desc_offset,
-				  PAGE_ALIGN(params->sq_ring_size_in_bytes +
-					     (resp->llq_desc_offset & ~PAGE_MASK)),
-				  EFA_MMAP_IO_WC);
-	if (resp->llq_desc_mmap_key == EFA_MMAP_INVALID)
+		rdma_user_mmap_entry_insert(&ucontext->ibucontext, qp,
+					    address,
+					    length,
+					    EFA_MMAP_IO_WC);
+	if (resp->llq_desc_mmap_key == RDMA_USER_MMAP_INVALID)
 		return -ENOMEM;
 
 	resp->llq_desc_offset &= ~PAGE_MASK;
 
 	if (qp->rq_size) {
+		address = dev->db_bar_addr + resp->rq_db_offset;
 		resp->rq_db_mmap_key =
-			mmap_entry_insert(dev, ucontext, qp,
-					  dev->db_bar_addr + resp->rq_db_offset,
-					  PAGE_SIZE, EFA_MMAP_IO_NC);
-		if (resp->rq_db_mmap_key == EFA_MMAP_INVALID)
+			rdma_user_mmap_entry_insert(&ucontext->ibucontext, qp,
+						    address, PAGE_SIZE,
+						    EFA_MMAP_IO_NC);
+		if (resp->rq_db_mmap_key == RDMA_USER_MMAP_INVALID)
 			return -ENOMEM;
 
 		resp->rq_db_offset &= ~PAGE_MASK;
 
+		address = virt_to_phys(qp->rq_cpu_addr);
 		resp->rq_mmap_key =
-			mmap_entry_insert(dev, ucontext, qp,
-					  virt_to_phys(qp->rq_cpu_addr),
-					  qp->rq_size, EFA_MMAP_DMA_PAGE);
-		if (resp->rq_mmap_key == EFA_MMAP_INVALID)
+			rdma_user_mmap_entry_insert(&ucontext->ibucontext, qp,
+						    address, qp->rq_size,
+						    EFA_MMAP_DMA_PAGE);
+		if (resp->rq_mmap_key == RDMA_USER_MMAP_INVALID)
 			return -ENOMEM;
 
 		resp->rq_mmap_size = qp->rq_size;
@@ -875,11 +765,14 @@ void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 static int cq_mmap_entries_setup(struct efa_dev *dev, struct efa_cq *cq,
 				 struct efa_ibv_create_cq_resp *resp)
 {
+	struct efa_ucontext *ucontext = cq->ucontext;
+
 	resp->q_mmap_size = cq->size;
-	resp->q_mmap_key = mmap_entry_insert(dev, cq->ucontext, cq,
-					     virt_to_phys(cq->cpu_addr),
-					     cq->size, EFA_MMAP_DMA_PAGE);
-	if (resp->q_mmap_key == EFA_MMAP_INVALID)
+	resp->q_mmap_key =
+		rdma_user_mmap_entry_insert(&ucontext->ibucontext, cq,
+					    virt_to_phys(cq->cpu_addr),
+					    cq->size, EFA_MMAP_DMA_PAGE);
+	if (resp->q_mmap_key == RDMA_USER_MMAP_INVALID)
 		return -ENOMEM;
 
 	return 0;
@@ -1531,7 +1424,6 @@ int efa_alloc_ucontext(struct ib_ucontext *ibucontext, struct ib_udata *udata)
 		goto err_out;
 
 	ucontext->uarn = result.uarn;
-	xa_init(&ucontext->mmap_xa);
 
 	resp.cmds_supp_udata_mask |= EFA_USER_CMDS_SUPP_UDATA_QUERY_DEVICE;
 	resp.cmds_supp_udata_mask |= EFA_USER_CMDS_SUPP_UDATA_CREATE_AH;
@@ -1560,19 +1452,25 @@ void efa_dealloc_ucontext(struct ib_ucontext *ibucontext)
 	struct efa_ucontext *ucontext = to_eucontext(ibucontext);
 	struct efa_dev *dev = to_edev(ibucontext->device);
 
-	mmap_entries_remove_free(dev, ucontext);
 	efa_dealloc_uar(dev, ucontext->uarn);
 }
 
+void efa_mmap_free(struct rdma_user_mmap_entry *entry)
+{
+	/* DMA mapping is already gone, now free the pages */
+	if (entry->mmap_flag == EFA_MMAP_DMA_PAGE)
+		free_pages_exact(phys_to_virt(entry->address), entry->length);
+}
+
 static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
 		      struct vm_area_struct *vma, u64 key, u64 length)
 {
-	struct efa_mmap_entry *entry;
+	struct rdma_user_mmap_entry *entry;
 	unsigned long va;
 	u64 pfn;
 	int err;
 
-	entry = mmap_entry_get(dev, ucontext, key, length);
+	entry = rdma_user_mmap_entry_get(&ucontext->ibucontext, key, length);
 	if (!entry) {
 		ibdev_dbg(&dev->ibdev, "key[%#llx] does not have valid entry\n",
 			  key);
-- 
2.14.5



* [PATCH v6 rdma-next 0/6] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA
From: Michal Kalderon @ 2019-07-09 14:17 UTC (permalink / raw)
  To: michal.kalderon, ariel.elior, jgg, dledford, galpress
  Cc: linux-rdma, davem, netdev

This patch series uses the doorbell overflow recovery mechanism
introduced in
commit 36907cd5cd72 ("qed: Add doorbell overflow recovery mechanism")
for RDMA (RoCE and iWARP).
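
For readers new to the mechanism, here is a rough userspace sketch of
the idea as this series uses it: each doorbell is registered as a pair
of (doorbell address, pointer to the last data written), and after an
overflow the registered values are written again. This is only a toy
model. The table, its size and the replay function are assumptions made
for illustration; the function names merely echo the db_recovery_add /
db_recovery_del callbacks and do not reproduce the qed implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define DB_REC_MAX 8	/* hypothetical table size, for illustration only */

/* One registered doorbell: where to write, and where the latest value lives. */
struct db_rec_entry {
	volatile uint32_t *db_addr;	/* stand-in for the device doorbell */
	const uint32_t *db_data;	/* shadow copy maintained by the owner */
	int in_use;
};

static struct db_rec_entry db_rec_table[DB_REC_MAX];

/* Register a doorbell. Returns 0 on success, -1 if the table is full. */
static int db_recovery_add(volatile uint32_t *db_addr, const uint32_t *db_data)
{
	for (size_t i = 0; i < DB_REC_MAX; i++) {
		if (!db_rec_table[i].in_use) {
			db_rec_table[i] =
				(struct db_rec_entry){ db_addr, db_data, 1 };
			return 0;
		}
	}
	return -1;
}

/* Unregister the doorbell identified by its shadow-data pointer. */
static void db_recovery_del(const uint32_t *db_data)
{
	for (size_t i = 0; i < DB_REC_MAX; i++)
		if (db_rec_table[i].in_use &&
		    db_rec_table[i].db_data == db_data)
			db_rec_table[i].in_use = 0;
}

/* After an overflow, rewrite every registered doorbell with its last value.
 * Returns the number of doorbells replayed.
 */
static int db_recovery_replay(void)
{
	int replayed = 0;

	for (size_t i = 0; i < DB_REC_MAX; i++) {
		if (!db_rec_table[i].in_use)
			continue;
		*db_rec_table[i].db_addr = *db_rec_table[i].db_data;
		replayed++;
	}
	return replayed;
}
```

The point of keeping a pointer to the data rather than a copy is that
the owner keeps updating the shadow value as usual, and a replay always
picks up the most recent one.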

The first three patches modify the core code to contain helper
functions for managing the mmap_xa: inserting, getting and freeing
entries. The code was taken almost as-is from the efa driver.
There is still an open discussion on whether we should take
this even further and make the entire mmap generic. Until a
decision is made, I only created the database API and modified
the efa and qedr drivers to use it. The doorbell recovery code will be
based on the common code.
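
As a rough illustration of the insert/get flow, here is a small
userspace model. It is a sketch under stated assumptions, not the core
code: all toy_* names are hypothetical, a fixed array stands in for the
xarray, and the key is simply the first reserved page index shifted by
the page shift, loosely following the key scheme the efa driver used
before this series.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT   12		/* assumed 4 KiB pages */
#define TOY_MAX_ENTRIES  16		/* hypothetical capacity */
#define TOY_MMAP_INVALID UINT64_MAX

struct toy_mmap_entry {
	uint64_t address;	/* physical address to be mapped later */
	uint64_t length;	/* length in bytes, a multiple of the page size */
	uint32_t mmap_page;	/* first page index reserved for this entry */
	uint8_t mmap_flag;	/* driver-defined mapping type */
};

struct toy_mmap_db {
	struct toy_mmap_entry entries[TOY_MAX_ENTRIES];
	size_t count;
	uint32_t next_page;	/* next free page index in the key space */
};

/* Reserve a key range for (address, length) and return the key that user
 * space would later pass as the mmap offset, or TOY_MMAP_INVALID on failure.
 */
static uint64_t toy_mmap_entry_insert(struct toy_mmap_db *db, uint64_t address,
				      uint64_t length, uint8_t mmap_flag)
{
	uint64_t npages = length >> TOY_PAGE_SHIFT;
	struct toy_mmap_entry *e;

	if (db->count == TOY_MAX_ENTRIES || npages == 0 ||
	    npages > UINT32_MAX - db->next_page)
		return TOY_MMAP_INVALID;

	e = &db->entries[db->count++];
	e->address = address;
	e->length = length;
	e->mmap_flag = mmap_flag;
	e->mmap_page = db->next_page;
	db->next_page += (uint32_t)npages;

	return (uint64_t)e->mmap_page << TOY_PAGE_SHIFT;
}

/* Look up an entry by key. The requested length must match the length
 * given at insert time, otherwise NULL is returned.
 */
static const struct toy_mmap_entry *
toy_mmap_entry_get(const struct toy_mmap_db *db, uint64_t key, uint64_t length)
{
	for (size_t i = 0; i < db->count; i++) {
		const struct toy_mmap_entry *e = &db->entries[i];

		if (((uint64_t)e->mmap_page << TOY_PAGE_SHIFT) == key &&
		    e->length == length)
			return e;
	}
	return NULL;
}
```

In this model user space never sees a physical address: it only gets
the opaque key back, and the mmap handler translates the key (plus the
requested length) into the address and mapping type recorded at insert
time.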

The efa driver was compile-tested only.

rdma-core pull request #493

Changes from V5:
- Switch between driver dealloc_ucontext and mmap_entries_remove.
- No need to verify the key after using the key to load an entry from
  the mmap_xa.
- Change mmap_free api to pass an 'entry' object.
- Add documentation for mmap_free and for newly exported functions.
- Fix some extra/missing line breaks.

Changes from V4:
- Add common mmap database and cookie helper functions.

Changes from V3:
- Remove casts from void * to u8 *. Pointer arithmetic can be done on
  void *.
- Rebase to tip of rdma-next.

Changes from V2:
- Don't use long-lived kmap. Instead use user-trigger mmap for the
  doorbell recovery entries.
- Modify dpi_addr to be denoted with __iomem and avoid redundant
  casts

Changes from V1:
- Call kmap to map virtual address into kernel space.
- Modify db_rec_delete to be void.
- Remove some cpu_to_le16 conversions that were added to the previous
  patch. They are correct but not related to the overflow recovery
  mechanism, and will be submitted as part of a different patch.


Michal Kalderon (6):
  RDMA/core: Create mmap database and cookie helper functions
  RDMA/efa: Use the common mmap_xa helpers
  RDMA/qedr: Use the common mmap API
  qed*: Change dpi_addr to be denoted with __iomem
  RDMA/qedr: Add doorbell overflow recovery support
  RDMA/qedr: Add iWARP doorbell recovery support

 drivers/infiniband/core/device.c           |   1 +
 drivers/infiniband/core/rdma_core.c        |   1 +
 drivers/infiniband/core/uverbs_cmd.c       |   1 +
 drivers/infiniband/core/uverbs_main.c      | 135 +++++++++
 drivers/infiniband/hw/efa/efa.h            |   3 +-
 drivers/infiniband/hw/efa/efa_main.c       |   1 +
 drivers/infiniband/hw/efa/efa_verbs.c      | 186 +++---------
 drivers/infiniband/hw/qedr/main.c          |   3 +-
 drivers/infiniband/hw/qedr/qedr.h          |  32 +-
 drivers/infiniband/hw/qedr/verbs.c         | 463 ++++++++++++++++++++---------
 drivers/infiniband/hw/qedr/verbs.h         |   4 +-
 drivers/net/ethernet/qlogic/qed/qed_rdma.c |   5 +-
 include/linux/qed/qed_rdma_if.h            |   2 +-
 include/rdma/ib_verbs.h                    |  46 +++
 include/uapi/rdma/qedr-abi.h               |  25 ++
 15 files changed, 600 insertions(+), 308 deletions(-)

-- 
2.14.5



* [PATCH v6 rdma-next 5/6] RDMA/qedr: Add doorbell overflow recovery support
From: Michal Kalderon @ 2019-07-09 14:17 UTC (permalink / raw)
  To: michal.kalderon, ariel.elior, jgg, dledford, galpress
  Cc: linux-rdma, davem, netdev
In-Reply-To: <20190709141735.19193-1-michal.kalderon@marvell.com>

Use the doorbell recovery mechanism to register RDMA-related doorbells
that will be restored if a doorbell overflow attention occurs.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/infiniband/hw/qedr/main.c  |   1 +
 drivers/infiniband/hw/qedr/qedr.h  |   7 +
 drivers/infiniband/hw/qedr/verbs.c | 273 ++++++++++++++++++++++++++++++++-----
 drivers/infiniband/hw/qedr/verbs.h |   2 +
 include/uapi/rdma/qedr-abi.h       |  25 ++++
 5 files changed, 273 insertions(+), 35 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/main.c b/drivers/infiniband/hw/qedr/main.c
index 3db4b6ba5ad6..34225c88f03d 100644
--- a/drivers/infiniband/hw/qedr/main.c
+++ b/drivers/infiniband/hw/qedr/main.c
@@ -206,6 +206,7 @@ static const struct ib_device_ops qedr_dev_ops = {
 	.get_link_layer = qedr_link_layer,
 	.map_mr_sg = qedr_map_mr_sg,
 	.mmap = qedr_mmap,
+	.mmap_free = qedr_mmap_free,
 	.modify_port = qedr_modify_port,
 	.modify_qp = qedr_modify_qp,
 	.modify_srq = qedr_modify_srq,
diff --git a/drivers/infiniband/hw/qedr/qedr.h b/drivers/infiniband/hw/qedr/qedr.h
index 7e80ce521d8d..8aed24b32de6 100644
--- a/drivers/infiniband/hw/qedr/qedr.h
+++ b/drivers/infiniband/hw/qedr/qedr.h
@@ -231,6 +231,7 @@ struct qedr_ucontext {
 	u64 dpi_phys_addr;
 	u32 dpi_size;
 	u16 dpi;
+	bool db_rec;
 };
 
 union db_prod64 {
@@ -258,6 +259,12 @@ struct qedr_userq {
 	struct qedr_pbl *pbl_tbl;
 	u64 buf_addr;
 	size_t buf_len;
+
+	/* doorbell recovery */
+	void __iomem *db_addr;
+	struct qedr_user_db_rec *db_rec_data;
+	u64 db_rec_phys;
+	u64 db_rec_key;
 };
 
 struct qedr_cq {
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index f33f0f1e7d76..b0b9ec70f2fd 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -60,6 +60,7 @@
 
 enum {
 	QEDR_USER_MMAP_IO_WC = 0,
+	QEDR_USER_MMAP_PHYS_PAGE,
 };
 
 static inline int qedr_ib_copy_to_udata(struct ib_udata *udata, void *src,
@@ -266,6 +267,7 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 	int rc;
 	struct qedr_ucontext *ctx = get_qedr_ucontext(uctx);
 	struct qedr_alloc_ucontext_resp uresp = {};
+	struct qedr_alloc_ucontext_req ureq = {};
 	struct qedr_dev *dev = get_qedr_dev(ibdev);
 	struct qed_rdma_add_user_out_params oparams;
 	u64 key;
@@ -273,6 +275,17 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
 	if (!udata)
 		return -EFAULT;
 
+	if (udata->inlen) {
+		rc = ib_copy_from_udata(&ureq, udata,
+					min(sizeof(ureq), udata->inlen));
+		if (rc) {
+			DP_ERR(dev, "Problem copying data from user space\n");
+			return -EFAULT;
+		}
+
+		ctx->db_rec = !!(ureq.context_flags & QEDR_ALLOC_UCTX_DB_REC);
+	}
+
 	rc = dev->ops->rdma_add_user(dev->rdma_ctx, &oparams);
 	if (rc) {
 		DP_ERR(dev,
@@ -325,6 +338,13 @@ void qedr_dealloc_ucontext(struct ib_ucontext *ibctx)
 	uctx->dev->ops->rdma_remove_user(uctx->dev->rdma_ctx, uctx->dpi);
 }
 
+void qedr_mmap_free(struct rdma_user_mmap_entry *entry)
+{
+	/* DMA mapping is already gone, now free the pages */
+	if (entry->mmap_flag == QEDR_USER_MMAP_PHYS_PAGE)
+		free_page((unsigned long)phys_to_virt(entry->address));
+}
+
 int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma)
 {
 	struct ib_device *dev = ucontext->device;
@@ -368,6 +388,11 @@ int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma)
 		err = rdma_user_mmap_io(ucontext, vma, pfn, length,
 					pgprot_writecombine(vma->vm_page_prot));
 		break;
+	case QEDR_USER_MMAP_PHYS_PAGE:
+		err = vm_insert_page(vma, vma->vm_start, pfn_to_page(pfn));
+		if (err)
+			break;
+		break;
 	default:
 		err = -EINVAL;
 	}
@@ -606,16 +631,48 @@ static void qedr_populate_pbls(struct qedr_dev *dev, struct ib_umem *umem,
 	}
 }
 
+static int qedr_db_recovery_add(struct qedr_dev *dev,
+				void __iomem *db_addr,
+				void *db_data,
+				enum qed_db_rec_width db_width,
+				enum qed_db_rec_space db_space)
+{
+	if (!db_data) {
+		DP_DEBUG(dev, QEDR_MSG_INIT, "avoiding db rec since old lib\n");
+		return 0;
+	}
+
+	return dev->ops->common->db_recovery_add(dev->cdev, db_addr, db_data,
+						 db_width, db_space);
+}
+
+static void qedr_db_recovery_del(struct qedr_dev *dev,
+				 void __iomem *db_addr,
+				 void *db_data)
+{
+	if (!db_data) {
+		DP_DEBUG(dev, QEDR_MSG_INIT, "avoiding db rec since old lib\n");
+		return;
+	}
+
+	/* Ignore return code as there is not much we can do about it. Error
+	 * log will be printed inside.
+	 */
+	dev->ops->common->db_recovery_del(dev->cdev, db_addr, db_data);
+}
+
 static int qedr_copy_cq_uresp(struct qedr_dev *dev,
-			      struct qedr_cq *cq, struct ib_udata *udata)
+			      struct qedr_cq *cq, struct ib_udata *udata,
+			      u32 db_offset)
 {
 	struct qedr_create_cq_uresp uresp;
 	int rc;
 
 	memset(&uresp, 0, sizeof(uresp));
 
-	uresp.db_offset = DB_ADDR_SHIFT(DQ_PWM_OFFSET_UCM_RDMA_CQ_CONS_32BIT);
+	uresp.db_offset = db_offset;
 	uresp.icid = cq->icid;
+	uresp.db_rec_addr = cq->q.db_rec_key;
 
 	rc = qedr_ib_copy_to_udata(udata, &uresp, sizeof(uresp));
 	if (rc)
@@ -643,10 +700,42 @@ static inline int qedr_align_cq_entries(int entries)
 	return aligned_size / QEDR_CQE_SIZE;
 }
 
+static int qedr_init_user_db_rec(struct ib_udata *udata,
+				 struct qedr_dev *dev, struct qedr_userq *q,
+				 bool requires_db_rec)
+{
+	struct qedr_ucontext *uctx =
+		rdma_udata_to_drv_context(udata, struct qedr_ucontext,
+					  ibucontext);
+
+	/* Aborting for non doorbell userqueue (SRQ) or non-supporting lib */
+	if (requires_db_rec == 0 || !uctx->db_rec)
+		return 0;
+
+	/* Allocate a page for doorbell recovery, add to mmap */
+	q->db_rec_data = (void *)get_zeroed_page(GFP_KERNEL);
+	if (!q->db_rec_data) {
+		DP_ERR(dev,
+		       "get_free_page failed\n");
+		return -ENOMEM;
+	}
+
+	q->db_rec_phys = virt_to_phys(q->db_rec_data);
+	q->db_rec_key = rdma_user_mmap_entry_insert(&uctx->ibucontext, q,
+						    q->db_rec_phys,
+						    PAGE_SIZE,
+						    QEDR_USER_MMAP_PHYS_PAGE);
+	if (q->db_rec_key == RDMA_USER_MMAP_INVALID)
+		return -ENOMEM;
+
+	return 0;
+}
+
 static inline int qedr_init_user_queue(struct ib_udata *udata,
 				       struct qedr_dev *dev,
 				       struct qedr_userq *q, u64 buf_addr,
-				       size_t buf_len, int access, int dmasync,
+				       size_t buf_len, bool requires_db_rec,
+				       int access, int dmasync,
 				       int alloc_and_init)
 {
 	u32 fw_pages;
@@ -684,7 +773,8 @@ static inline int qedr_init_user_queue(struct ib_udata *udata,
 		}
 	}
 
-	return 0;
+	/* mmap the user address used to store doorbell data for recovery */
+	return qedr_init_user_db_rec(udata, dev, q, requires_db_rec);
 
 err0:
 	ib_umem_release(q->umem);
@@ -770,6 +860,7 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	int entries = attr->cqe;
 	struct qedr_cq *cq = get_qedr_cq(ibcq);
 	int chain_entries;
+	u32 db_offset;
 	int page_cnt;
 	u64 pbl_ptr;
 	u16 icid;
@@ -789,8 +880,12 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	chain_entries = qedr_align_cq_entries(entries);
 	chain_entries = min_t(int, chain_entries, QEDR_MAX_CQES);
 
+	/* calc db offset. user will add DPI base, kernel will add db addr */
+	db_offset = DB_ADDR_SHIFT(DQ_PWM_OFFSET_UCM_RDMA_CQ_CONS_32BIT);
+
 	if (udata) {
-		if (ib_copy_from_udata(&ureq, udata, sizeof(ureq))) {
+		if (ib_copy_from_udata(&ureq, udata, min(sizeof(ureq),
+							 udata->inlen))) {
 			DP_ERR(dev,
 			       "create cq: problem copying data from user space\n");
 			goto err0;
@@ -805,8 +900,9 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		cq->cq_type = QEDR_CQ_TYPE_USER;
 
 		rc = qedr_init_user_queue(udata, dev, &cq->q, ureq.addr,
-					  ureq.len, IB_ACCESS_LOCAL_WRITE, 1,
-					  1);
+					  ureq.len, true,
+					  IB_ACCESS_LOCAL_WRITE,
+					  1, 1);
 		if (rc)
 			goto err0;
 
@@ -814,6 +910,7 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		page_cnt = cq->q.pbl_info.num_pbes;
 
 		cq->ibcq.cqe = chain_entries;
+		cq->q.db_addr = ctx->dpi_addr + db_offset;
 	} else {
 		cq->cq_type = QEDR_CQ_TYPE_KERNEL;
 
@@ -844,14 +941,21 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	spin_lock_init(&cq->cq_lock);
 
 	if (udata) {
-		rc = qedr_copy_cq_uresp(dev, cq, udata);
+		rc = qedr_copy_cq_uresp(dev, cq, udata, db_offset);
+		if (rc)
+			goto err3;
+
+		rc = qedr_db_recovery_add(dev, cq->q.db_addr,
+					  &cq->q.db_rec_data->db_data,
+					  DB_REC_WIDTH_64B,
+					  DB_REC_USER);
 		if (rc)
 			goto err3;
+
 	} else {
 		/* Generate doorbell address. */
-		cq->db_addr = dev->db_addr +
-		    DB_ADDR_SHIFT(DQ_PWM_OFFSET_UCM_RDMA_CQ_CONS_32BIT);
 		cq->db.data.icid = cq->icid;
+		cq->db_addr = dev->db_addr + db_offset;
 		cq->db.data.params = DB_AGG_CMD_SET <<
 		    RDMA_PWM_VAL32_DATA_AGG_CMD_SHIFT;
 
@@ -861,6 +965,11 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		cq->latest_cqe = NULL;
 		consume_cqe(cq);
 		cq->cq_cons = qed_chain_get_cons_idx_u32(&cq->pbl);
+
+		rc = qedr_db_recovery_add(dev, cq->db_addr, &cq->db.data,
+					  DB_REC_WIDTH_64B, DB_REC_KERNEL);
+		if (rc)
+			goto err3;
 	}
 
 	DP_DEBUG(dev, QEDR_MSG_CQ,
@@ -879,8 +988,18 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	else
 		dev->ops->common->chain_free(dev->cdev, &cq->pbl);
 err1:
-	if (udata)
+	if (udata) {
 		ib_umem_release(cq->q.umem);
+		if (cq->q.db_rec_data) {
+			qedr_db_recovery_del(dev, cq->q.db_addr,
+					     &cq->q.db_rec_data->db_data);
+			if (cq->q.db_rec_key == RDMA_USER_MMAP_INVALID)
+				free_page((unsigned long)cq->q.db_rec_data);
+			/* o/w will be freed by ib_uverbs on context free */
+		}
+	} else {
+		qedr_db_recovery_del(dev, cq->db_addr, &cq->db.data);
+	}
 err0:
 	return -EINVAL;
 }
@@ -911,8 +1030,10 @@ void qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 	cq->destroyed = 1;
 
 	/* GSIs CQs are handled by driver, so they don't exist in the FW */
-	if (cq->cq_type == QEDR_CQ_TYPE_GSI)
+	if (cq->cq_type == QEDR_CQ_TYPE_GSI) {
+		qedr_db_recovery_del(dev, cq->db_addr, &cq->db.data);
 		return;
+	}
 
 	iparams.icid = cq->icid;
 	dev->ops->rdma_destroy_cq(dev->rdma_ctx, &iparams, &oparams);
@@ -921,6 +1042,12 @@ void qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 	if (udata) {
 		qedr_free_pbl(dev, &cq->q.pbl_info, cq->q.pbl_tbl);
 		ib_umem_release(cq->q.umem);
+
+		if (cq->q.db_rec_data)
+			qedr_db_recovery_del(dev, cq->q.db_addr,
+					     &cq->q.db_rec_data->db_data);
+	} else {
+		qedr_db_recovery_del(dev, cq->db_addr, &cq->db.data);
 	}
 
 	/* We don't want the IRQ handler to handle a non-existing CQ so we
@@ -1085,8 +1212,8 @@ static int qedr_copy_srq_uresp(struct qedr_dev *dev,
 }
 
 static void qedr_copy_rq_uresp(struct qedr_dev *dev,
-			       struct qedr_create_qp_uresp *uresp,
-			       struct qedr_qp *qp)
+			      struct qedr_create_qp_uresp *uresp,
+			      struct qedr_qp *qp)
 {
 	/* iWARP requires two doorbells per RQ. */
 	if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
@@ -1099,6 +1226,7 @@ static void qedr_copy_rq_uresp(struct qedr_dev *dev,
 	}
 
 	uresp->rq_icid = qp->icid;
+	uresp->rq_db_rec_addr = qp->urq.db_rec_key;
 }
 
 static void qedr_copy_sq_uresp(struct qedr_dev *dev,
@@ -1112,22 +1240,24 @@ static void qedr_copy_sq_uresp(struct qedr_dev *dev,
 		uresp->sq_icid = qp->icid;
 	else
 		uresp->sq_icid = qp->icid + 1;
+
+	uresp->sq_db_rec_addr = qp->usq.db_rec_key;
 }
 
 static int qedr_copy_qp_uresp(struct qedr_dev *dev,
-			      struct qedr_qp *qp, struct ib_udata *udata)
+			      struct qedr_qp *qp, struct ib_udata *udata,
+			      struct qedr_create_qp_uresp *uresp)
 {
-	struct qedr_create_qp_uresp uresp;
 	int rc;
 
-	memset(&uresp, 0, sizeof(uresp));
-	qedr_copy_sq_uresp(dev, &uresp, qp);
-	qedr_copy_rq_uresp(dev, &uresp, qp);
+	memset(uresp, 0, sizeof(*uresp));
+	qedr_copy_sq_uresp(dev, uresp, qp);
+	qedr_copy_rq_uresp(dev, uresp, qp);
 
-	uresp.atomic_supported = dev->atomic_cap != IB_ATOMIC_NONE;
-	uresp.qp_id = qp->qp_id;
+	uresp->atomic_supported = dev->atomic_cap != IB_ATOMIC_NONE;
+	uresp->qp_id = qp->qp_id;
 
-	rc = qedr_ib_copy_to_udata(udata, &uresp, sizeof(uresp));
+	rc = qedr_ib_copy_to_udata(udata, uresp, sizeof(*uresp));
 	if (rc)
 		DP_ERR(dev,
 		       "create qp: failed a copy to user space with qp icid=0x%x.\n",
@@ -1171,16 +1301,35 @@ static void qedr_set_common_qp_params(struct qedr_dev *dev,
 		 qp->sq.max_sges, qp->sq_cq->icid);
 }
 
-static void qedr_set_roce_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
+static int qedr_set_roce_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
 {
+	int rc;
+
 	qp->sq.db = dev->db_addr +
 		    DB_ADDR_SHIFT(DQ_PWM_OFFSET_XCM_RDMA_SQ_PROD);
 	qp->sq.db_data.data.icid = qp->icid + 1;
+	rc = qedr_db_recovery_add(dev, qp->sq.db,
+				  &qp->sq.db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_KERNEL);
+	if (rc)
+		return rc;
+
 	if (!qp->srq) {
 		qp->rq.db = dev->db_addr +
 			    DB_ADDR_SHIFT(DQ_PWM_OFFSET_TCM_ROCE_RQ_PROD);
 		qp->rq.db_data.data.icid = qp->icid;
+
+		rc = qedr_db_recovery_add(dev, qp->rq.db,
+					  &qp->rq.db_data,
+					  DB_REC_WIDTH_32B,
+					  DB_REC_KERNEL);
+		if (rc)
+			qedr_db_recovery_del(dev, qp->sq.db,
+					     &qp->sq.db_data);
 	}
+
+	return rc;
 }
 
 static int qedr_check_srq_params(struct qedr_dev *dev,
@@ -1234,7 +1383,7 @@ static int qedr_init_srq_user_params(struct ib_udata *udata,
 	int rc;
 
 	rc = qedr_init_user_queue(udata, srq->dev, &srq->usrq, ureq->srq_addr,
-				  ureq->srq_len, access, dmasync, 1);
+				  ureq->srq_len, false, access, dmasync, 1);
 	if (rc)
 		return rc;
 
@@ -1330,7 +1479,8 @@ int qedr_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init_attr,
 	hw_srq->max_sges = init_attr->attr.max_sge;
 
 	if (udata) {
-		if (ib_copy_from_udata(&ureq, udata, sizeof(ureq))) {
+		if (ib_copy_from_udata(&ureq, udata, min(sizeof(ureq),
+							 udata->inlen))) {
 			DP_ERR(dev,
 			       "create srq: problem copying data from user space\n");
 			goto err0;
@@ -1526,6 +1676,14 @@ static void qedr_cleanup_user(struct qedr_dev *dev, struct qedr_qp *qp)
 
 	ib_umem_release(qp->urq.umem);
 	qp->urq.umem = NULL;
+
+	if (qp->usq.db_rec_data)
+		qedr_db_recovery_del(dev, qp->usq.db_addr,
+				     &qp->usq.db_rec_data->db_data);
+
+	if (qp->urq.db_rec_data)
+		qedr_db_recovery_del(dev, qp->urq.db_addr,
+				     &qp->urq.db_rec_data->db_data);
 }
 
 static int qedr_create_user_qp(struct qedr_dev *dev,
@@ -1537,12 +1695,14 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 	struct qed_rdma_create_qp_in_params in_params;
 	struct qed_rdma_create_qp_out_params out_params;
 	struct qedr_pd *pd = get_qedr_pd(ibpd);
+	struct qedr_create_qp_uresp uresp;
+	struct qedr_ucontext *ctx = NULL;
 	struct qedr_create_qp_ureq ureq;
 	int alloc_and_init = rdma_protocol_roce(&dev->ibdev, 1);
 	int rc = -EINVAL;
 
 	memset(&ureq, 0, sizeof(ureq));
-	rc = ib_copy_from_udata(&ureq, udata, sizeof(ureq));
+	rc = ib_copy_from_udata(&ureq, udata, min(sizeof(ureq), udata->inlen));
 	if (rc) {
 		DP_ERR(dev, "Problem copying data from user space\n");
 		return rc;
@@ -1550,14 +1710,16 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 
 	/* SQ - read access only (0), dma sync not required (0) */
 	rc = qedr_init_user_queue(udata, dev, &qp->usq, ureq.sq_addr,
-				  ureq.sq_len, 0, 0, alloc_and_init);
+				  ureq.sq_len, true, 0, 0,
+				  alloc_and_init);
 	if (rc)
 		return rc;
 
 	if (!qp->srq) {
 		/* RQ - read access only (0), dma sync not required (0) */
 		rc = qedr_init_user_queue(udata, dev, &qp->urq, ureq.rq_addr,
-					  ureq.rq_len, 0, 0, alloc_and_init);
+					  ureq.rq_len, true,
+					  0, 0, alloc_and_init);
 		if (rc)
 			return rc;
 	}
@@ -1587,13 +1749,31 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 	qp->qp_id = out_params.qp_id;
 	qp->icid = out_params.icid;
 
-	rc = qedr_copy_qp_uresp(dev, qp, udata);
+	rc = qedr_copy_qp_uresp(dev, qp, udata, &uresp);
 	if (rc)
 		goto err;
 
+	/* db offset was calculated in copy_qp_uresp, now set in the user q */
+	ctx = pd->uctx;
+	qp->usq.db_addr = ctx->dpi_addr + uresp.sq_db_offset;
+	qp->urq.db_addr = ctx->dpi_addr + uresp.rq_db_offset;
+
+	rc = qedr_db_recovery_add(dev, qp->usq.db_addr,
+				  &qp->usq.db_rec_data->db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_USER);
+	if (rc)
+		goto err;
+
+	rc = qedr_db_recovery_add(dev, qp->urq.db_addr,
+				  &qp->urq.db_rec_data->db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_USER);
+	if (rc)
+		goto err;
 	qedr_qp_user_print(dev, qp);
 
-	return 0;
+	return rc;
 err:
 	rc = dev->ops->rdma_destroy_qp(dev->rdma_ctx, qp->qed_qp);
 	if (rc)
@@ -1604,12 +1784,21 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
 	return rc;
 }
 
-static void qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
+static int qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
 {
+	int rc;
+
 	qp->sq.db = dev->db_addr +
 	    DB_ADDR_SHIFT(DQ_PWM_OFFSET_XCM_RDMA_SQ_PROD);
 	qp->sq.db_data.data.icid = qp->icid;
 
+	rc = qedr_db_recovery_add(dev, qp->sq.db,
+				  &qp->sq.db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_KERNEL);
+	if (rc)
+		return rc;
+
 	qp->rq.db = dev->db_addr +
 		    DB_ADDR_SHIFT(DQ_PWM_OFFSET_TCM_IWARP_RQ_PROD);
 	qp->rq.db_data.data.icid = qp->icid;
@@ -1617,6 +1806,13 @@ static void qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
 			   DB_ADDR_SHIFT(DQ_PWM_OFFSET_TCM_FLAGS);
 	qp->rq.iwarp_db2_data.data.icid = qp->icid;
 	qp->rq.iwarp_db2_data.data.value = DQ_TCM_IWARP_POST_RQ_CF_CMD;
+
+	rc = qedr_db_recovery_add(dev, qp->rq.db,
+				  &qp->rq.db_data,
+				  DB_REC_WIDTH_32B,
+				  DB_REC_KERNEL);
+
+	return rc;
 }
 
 static int
@@ -1664,8 +1860,7 @@ qedr_roce_create_kernel_qp(struct qedr_dev *dev,
 	qp->qp_id = out_params.qp_id;
 	qp->icid = out_params.icid;
 
-	qedr_set_roce_db_info(dev, qp);
-	return rc;
+	return qedr_set_roce_db_info(dev, qp);
 }
 
 static int
@@ -1723,8 +1918,7 @@ qedr_iwarp_create_kernel_qp(struct qedr_dev *dev,
 	qp->qp_id = out_params.qp_id;
 	qp->icid = out_params.icid;
 
-	qedr_set_iwarp_db_info(dev, qp);
-	return rc;
+	return qedr_set_iwarp_db_info(dev, qp);
 
 err:
 	dev->ops->rdma_destroy_qp(dev->rdma_ctx, qp->qed_qp);
@@ -1739,6 +1933,15 @@ static void qedr_cleanup_kernel(struct qedr_dev *dev, struct qedr_qp *qp)
 
 	dev->ops->common->chain_free(dev->cdev, &qp->rq.pbl);
 	kfree(qp->rqe_wr_id);
+
+	/* GSI qp is not registered to db mechanism so no need to delete */
+	if (qp->qp_type == IB_QPT_GSI)
+		return;
+
+	qedr_db_recovery_del(dev, qp->sq.db, &qp->sq.db_data);
+
+	if (!qp->srq)
+		qedr_db_recovery_del(dev, qp->rq.db, &qp->rq.db_data);
 }
 
 static int qedr_create_kernel_qp(struct qedr_dev *dev,
diff --git a/drivers/infiniband/hw/qedr/verbs.h b/drivers/infiniband/hw/qedr/verbs.h
index 724d0983e972..830c86561e23 100644
--- a/drivers/infiniband/hw/qedr/verbs.h
+++ b/drivers/infiniband/hw/qedr/verbs.h
@@ -47,6 +47,8 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata);
 void qedr_dealloc_ucontext(struct ib_ucontext *uctx);
 
 int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma);
+void qedr_mmap_free(struct rdma_user_mmap_entry *entry);
+
 int qedr_alloc_pd(struct ib_pd *pd, struct ib_udata *udata);
 void qedr_dealloc_pd(struct ib_pd *pd, struct ib_udata *udata);
 
diff --git a/include/uapi/rdma/qedr-abi.h b/include/uapi/rdma/qedr-abi.h
index 7a10b3a325fa..c022ee26089b 100644
--- a/include/uapi/rdma/qedr-abi.h
+++ b/include/uapi/rdma/qedr-abi.h
@@ -38,6 +38,15 @@
 #define QEDR_ABI_VERSION		(8)
 
 /* user kernel communication data structures. */
+enum qedr_alloc_ucontext_flags {
+	QEDR_ALLOC_UCTX_RESERVED	= 1 << 0,
+	QEDR_ALLOC_UCTX_DB_REC		= 1 << 1
+};
+
+struct qedr_alloc_ucontext_req {
+	__u32 context_flags;
+	__u32 reserved;
+};
 
 struct qedr_alloc_ucontext_resp {
 	__aligned_u64 db_pa;
@@ -74,6 +83,7 @@ struct qedr_create_cq_uresp {
 	__u32 db_offset;
 	__u16 icid;
 	__u16 reserved;
+	__aligned_u64 db_rec_addr;
 };
 
 struct qedr_create_qp_ureq {
@@ -109,6 +119,13 @@ struct qedr_create_qp_uresp {
 
 	__u32 rq_db2_offset;
 	__u32 reserved;
+
+	/* address of SQ doorbell recovery user entry */
+	__aligned_u64 sq_db_rec_addr;
+
+	/* address of RQ doorbell recovery user entry */
+	__aligned_u64 rq_db_rec_addr;
+
 };
 
 struct qedr_create_srq_ureq {
@@ -128,4 +145,12 @@ struct qedr_create_srq_uresp {
 	__u32 reserved1;
 };
 
+/* doorbell recovery entry allocated and populated by userspace doorbelling
+ * entities and mapped to kernel. Kernel uses this to register doorbell
+ * information with doorbell drop recovery mechanism.
+ */
+struct qedr_user_db_rec {
+	__aligned_u64 db_data; /* doorbell data */
+};
+
 #endif /* __QEDR_USER_H__ */
-- 
2.14.5
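
A note on the min(sizeof(ureq), udata->inlen) copies in the patch above: the apparent intent is that a library built against the older, shorter request layout keeps working, because fields it never sent are read as zero. The following is a minimal userspace sketch of that idea, not the driver code. It assumes the request struct is zeroed before the copy, as qedr_create_user_qp() does with memset(). The names alloc_ucontext_req, UCTX_DB_REC, copy_req() and wants_db_rec() are hypothetical stand-ins for the uapi definitions and for ib_copy_from_udata().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for QEDR_ALLOC_UCTX_DB_REC (bit 1). */
#define UCTX_DB_REC (1u << 1)

/* Hypothetical stand-in for struct qedr_alloc_ucontext_req. */
struct alloc_ucontext_req {
	uint32_t context_flags;
	uint32_t reserved;
};

/*
 * Copy at most sizeof(*req) bytes from the caller's buffer.
 * The struct is zeroed first, so fields the caller did not
 * supply (short or empty input) read back as zero.
 */
static void copy_req(struct alloc_ucontext_req *req,
		     const void *in, size_t inlen)
{
	size_t n = inlen < sizeof(*req) ? inlen : sizeof(*req);

	assert(req != NULL);
	memset(req, 0, sizeof(*req));
	if (n)
		memcpy(req, in, n);
}

/* Returns 1 when the caller asked for doorbell recovery, else 0. */
static int wants_db_rec(const void *in, size_t inlen)
{
	struct alloc_ucontext_req req;

	copy_req(&req, in, inlen);
	return !!(req.context_flags & UCTX_DB_REC);
}
```

With an empty request (inlen == 0) the flag reads as unset, which corresponds to the "old lib" case that the patch's db_rec checks skip. A request longer than the struct is truncated to the fields this side knows about.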



* Re: [PATCH] crypto: user - make NETLINK_CRYPTO work inside netns
From: Herbert Xu @ 2019-07-09 14:38 UTC (permalink / raw)
  To: Ondrej Mosnacek
  Cc: linux-crypto, netdev, David S . Miller, Stephan Mueller,
	Steffen Klassert, Don Zickus
In-Reply-To: <20190709111124.31127-1-omosnace@redhat.com>

On Tue, Jul 09, 2019 at 01:11:24PM +0200, Ondrej Mosnacek wrote:
> Currently, NETLINK_CRYPTO works only in the init network namespace. It
> doesn't make much sense to cut it out of the other network namespaces,
> so do the minor plumbing work necessary to make it work in any network
> namespace. Code inspired by net/core/sock_diag.c.
> 
> Tested using kcapi-dgst from libkcapi [1]:
> Before:
>     # unshare -n kcapi-dgst -c sha256 </dev/null | wc -c
>     libkcapi - Error: Netlink error: sendmsg failed
>     libkcapi - Error: Netlink error: sendmsg failed
>     libkcapi - Error: NETLINK_CRYPTO: cannot obtain cipher information for hmac(sha512) (is required crypto_user.c patch missing? see documentation)
>     0
> 
> After:
>     # unshare -n kcapi-dgst -c sha256 </dev/null | wc -c
>     32
> 
> [1] https://github.com/smuellerDD/libkcapi
> 
> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>

Should we really let root inside a namespace manipulate crypto
algorithms which are global?

I think we should only allow the query operations without deeper
surgery.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Fw: [Bug 204099] New: systemd-networkd fails on 5.2 - same version works on 5.1.16
From: Stephen Hemminger @ 2019-07-09 14:43 UTC (permalink / raw)
  To: netdev

Looks like the stricter netlink validation broke userspace.
This is bad.

Begin forwarded message:

Date: Tue, 09 Jul 2019 00:44:01 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 204099] New: systemd-networkd fails on 5.2 - same version works on 5.1.16


https://bugzilla.kernel.org/show_bug.cgi?id=204099

            Bug ID: 204099
           Summary: systemd-networkd fails on 5.2 - same version works on
                    5.1.16
           Product: Networking
           Version: 2.5
    Kernel Version: 5.2
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: low
          Priority: P1
         Component: Other
          Assignee: stephen@networkplumber.org
          Reporter: Ian.kumlien@gmail.com
        Regression: No

This is more FYI, I haven't had time to properly debug it.

Booting 5.2 causes systemd-networkd to fail to bring any interface up; it
fails with: "Could not bring up interface: Invalid argument"

However, booting 5.1.16 with the same software works just fine.

Sounds like something was changed in what I assume is the netlink API.


