Netdev List


* [BACKPORT 4.14.y 2/8] ip6: fix skb leak in ip6frag_expire_frag_queue()
From: Baolin Wang @ 2019-09-03  6:56 UTC (permalink / raw)
  To: stable, davem, kuznet, yoshfuji, edumazet
  Cc: netdev, arnd, baolin.wang, orsonzhai, vincent.guittot,
	linux-kernel
In-Reply-To: <cover.1567492316.git.baolin.wang@linaro.org>

From: Eric Dumazet <edumazet@google.com>

Since ip6frag_expire_frag_queue() now pulls the head skb
from frag queue, we should no longer use skb_get(), since
this leads to an skb leak.

Stefan Bader initially reported a problem in 4.4.stable [1] caused
by the skb_get(), so this patch should also fix this issue.

[296583.091021] kernel BUG at /build/linux-6VmqmP/linux-4.4.0/net/core/skbuff.c:1207!
[296583.091734] Call Trace:
[296583.091749]  [<ffffffff81740e50>] __pskb_pull_tail+0x50/0x350
[296583.091764]  [<ffffffff8183939a>] _decode_session6+0x26a/0x400
[296583.091779]  [<ffffffff817ec719>] __xfrm_decode_session+0x39/0x50
[296583.091795]  [<ffffffff818239d0>] icmpv6_route_lookup+0xf0/0x1c0
[296583.091809]  [<ffffffff81824421>] icmp6_send+0x5e1/0x940
[296583.091823]  [<ffffffff81753238>] ? __netif_receive_skb+0x18/0x60
[296583.091838]  [<ffffffff817532b2>] ? netif_receive_skb_internal+0x32/0xa0
[296583.091858]  [<ffffffffc0199f74>] ? ixgbe_clean_rx_irq+0x594/0xac0 [ixgbe]
[296583.091876]  [<ffffffffc04eb260>] ? nf_ct_net_exit+0x50/0x50 [nf_defrag_ipv6]
[296583.091893]  [<ffffffff8183d431>] icmpv6_send+0x21/0x30
[296583.091906]  [<ffffffff8182b500>] ip6_expire_frag_queue+0xe0/0x120
[296583.091921]  [<ffffffffc04eb27f>] nf_ct_frag6_expire+0x1f/0x30 [nf_defrag_ipv6]
[296583.091938]  [<ffffffff810f3b57>] call_timer_fn+0x37/0x140
[296583.091951]  [<ffffffffc04eb260>] ? nf_ct_net_exit+0x50/0x50 [nf_defrag_ipv6]
[296583.091968]  [<ffffffff810f5464>] run_timer_softirq+0x234/0x330
[296583.091982]  [<ffffffff8108a339>] __do_softirq+0x109/0x2b0

Fixes: d4289fcc9b16 ("net: IP6 defrag: use rbtrees for IPv6 defrag")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Cc: Peter Oskolkov <posk@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
---
 include/net/ipv6_frag.h |    1 -
 1 file changed, 1 deletion(-)

diff --git a/include/net/ipv6_frag.h b/include/net/ipv6_frag.h
index 28aa9b3..1f77fb4 100644
--- a/include/net/ipv6_frag.h
+++ b/include/net/ipv6_frag.h
@@ -94,7 +94,6 @@ static inline u32 ip6frag_obj_hashfn(const void *data, u32 len, u32 seed)
 		goto out;
 
 	head->dev = dev;
-	skb_get(head);
 	spin_unlock(&fq->q.lock);
 
 	icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);
-- 
1.7.9.5
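
The leak described in the commit message can be illustrated with a small
userspace model. This is only a sketch, not kernel code: struct model_skb,
model_skb_get(), model_kfree_skb() and model_expire() are hypothetical
stand-ins for sk_buff reference counting, and the model assumes the expire
path drops exactly one reference after sending the ICMPv6 error.

```c
/* Userspace model of skb reference counting; all names are hypothetical. */
#include <assert.h>
#include <stdlib.h>

struct model_skb {
	int users;		/* models the skb reference count */
};

static int live_skbs;		/* buffers allocated and not yet freed */

static struct model_skb *model_alloc_skb(void)
{
	struct model_skb *skb = malloc(sizeof(*skb));

	assert(skb != NULL);
	skb->users = 1;		/* the allocator hands out one reference */
	live_skbs++;
	return skb;
}

/* Models skb_get(): take one extra reference. */
static struct model_skb *model_skb_get(struct model_skb *skb)
{
	skb->users++;
	return skb;
}

/* Models kfree_skb(): drop one reference, free on the last one. */
static void model_kfree_skb(struct model_skb *skb)
{
	if (--skb->users == 0) {
		free(skb);
		live_skbs--;
	}
}

/*
 * Models the expire path after the head skb has been pulled out of the
 * frag queue, so the caller already owns the only reference.
 * Returns the number of buffers still allocated when the path finishes.
 */
static int model_expire(int take_extra_ref)
{
	int before = live_skbs;
	struct model_skb *head = model_alloc_skb();

	if (take_extra_ref)
		model_skb_get(head);	/* the call the patch removes */

	/* ... the ICMPv6 error would be sent here ... */

	model_kfree_skb(head);		/* assumed single drop afterwards */
	return live_skbs - before;
}
```

In this model, model_expire(0) leaves no buffer behind, while
model_expire(1) leaves one: the extra reference is never dropped, which is
the leak the patch addresses by deleting the skb_get() call.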


^ permalink raw reply related

* Re: [PATCH net-next 0/5] net/tls: minor cleanups
From: Boris Pismenny @ 2019-09-03  6:56 UTC (permalink / raw)
  To: Jakub Kicinski, davem@davemloft.net
  Cc: netdev@vger.kernel.org, oss-drivers@netronome.com,
	davejwatson@fb.com, Aviad Yehezkel, john.fastabend@gmail.com,
	daniel@iogearbox.net
In-Reply-To: <20190903043106.27570-1-jakub.kicinski@netronome.com>

On 9/3/2019 7:31 AM, Jakub Kicinski wrote:
> Hi!
>
> This set is a grab bag of TLS cleanups accumulated in my tree
> in an attempt to avoid merge problems with net. Nothing stands
> out. First patch dedups context information. Next control path
> locking is very slightly optimized. Fourth patch cleans up
> ugly #ifdefs.
>
> Jakub Kicinski (5):
>   net/tls: use the full sk_proto pointer
>   net/tls: don't jump to return
>   net/tls: narrow down the critical area of device_offload_lock
>   net/tls: clean up the number of #ifdefs for CONFIG_TLS_DEVICE
>   net/tls: dedup the record cleanup
>
>  drivers/crypto/chelsio/chtls/chtls_main.c |  6 +-
>  include/net/tls.h                         | 48 +++++++++-----
>  net/tls/tls_device.c                      | 78 +++++++++++------------
>  net/tls/tls_main.c                        | 46 ++++---------
>  net/tls/tls_sw.c                          |  6 +-
>  5 files changed, 85 insertions(+), 99 deletions(-)

LGTM

Reviewed-by: Boris Pismenny <borisp@mellanox.com>


^ permalink raw reply

* RE: [PATCH net-next] r8152: modify rtl8152_set_speed function
From: Hayes Wang @ 2019-09-03  6:55 UTC (permalink / raw)
  To: Heiner Kallweit, netdev@vger.kernel.org
  Cc: nic_swsd, linux-kernel@vger.kernel.org
In-Reply-To: <56675c6b-c792-245e-54d0-eacd50e7a139@gmail.com>

Heiner Kallweit [mailto:hkallweit1@gmail.com]
> Sent: Tuesday, September 03, 2019 2:45 PM
[...]
> > Besides, I have a question. I think I don't need rtl8152_set_speed()
> > if I implement phylib. However, I need to record some information
> > according to the settings of speed. For now, I do it in rtl8152_set_speed().
> > Do you have any idea about how I should do it with phylib without
> > rtl8152_set_speed()?
> >
> When saying "record some information", what kind of information?

Some of our chips support a UPS feature. When certain conditions are
met, the hw restores the speed settings. Therefore, I have to record
the speed settings and set them to the hw.

> The speed itself is stored in struct phy_device, if you need to adjust
> certain chip settings depending on negotiated speed, then you can do
> this in a callback (parameter handler of phy_connect_direct).
> See e.g. r8169_phylink_handler()

Thanks. I will study it.

Best Regards,
Hayes



^ permalink raw reply

* [BACKPORT 4.14.y 0/8] Candidates from Spreadtrum 4.14 product kernel
From: Baolin Wang @ 2019-09-03  6:53 UTC (permalink / raw)
  To: stable, chris, airlied, davem, kuznet, yoshfuji, edumazet, peterz,
	mingo, vyasevich, nhorman, linus.walleij, natechancellor, sre,
	paulus, gregkh
  Cc: intel-gfx, dri-devel, netdev, longman, hariprasad.kelam,
	linux-sctp, linux-gpio, david, linux-pm, ebiggers, linux-ppp,
	lanqing.liu, linux-serial, arnd, baolin.wang, orsonzhai,
	vincent.guittot, linux-kernel

With the help of Arnd's script [1], I found some bugfixes that are in the
Spreadtrum 4.14 product kernel but missing from v4.14.141:

86fda90ab588 net: sctp: fix warning "NULL check before some freeing functions is not needed"
25a09ce79639 ppp: mppe: Revert "ppp: mppe: Add softdep to arc4"
d9b308b1f8a1 drm/i915/fbdev: Actually configure untiled displays
47d3d7fdb10a ip6: fix skb leak in ip6frag_expire_frag_queue()
5b9cea15a3de serial: sprd: Modify the baud rate calculation formula
513e1073d52e locking/lockdep: Add debug_locks check in __lock_downgrade()
957063c92473 pinctrl: sprd: Use define directive for sprd_pinconf_params values
87a2b65fc855 power: supply: sysfs: ratelimit property read error message

[1] https://lore.kernel.org/lkml/20190322154425.3852517-19-arnd@arndb.de/T/

Chris Wilson (1):
  drm/i915/fbdev: Actually configure untiled displays

David Lechner (1):
  power: supply: sysfs: ratelimit property read error message

Eric Biggers (1):
  ppp: mppe: Revert "ppp: mppe: Add softdep to arc4"

Eric Dumazet (1):
  ip6: fix skb leak in ip6frag_expire_frag_queue()

Hariprasad Kelam (1):
  net: sctp: fix warning "NULL check before some freeing functions is
    not needed"

Lanqing Liu (1):
  serial: sprd: Modify the baud rate calculation formula

Nathan Chancellor (1):
  pinctrl: sprd: Use define directive for sprd_pinconf_params values

Waiman Long (1):
  locking/lockdep: Add debug_locks check in __lock_downgrade()

 drivers/gpu/drm/i915/intel_fbdev.c        |   12 +++++++-----
 drivers/net/ppp/ppp_mppe.c                |    1 -
 drivers/pinctrl/sprd/pinctrl-sprd.c       |    6 ++----
 drivers/power/supply/power_supply_sysfs.c |    3 ++-
 drivers/tty/serial/sprd_serial.c          |    2 +-
 include/net/ipv6_frag.h                   |    1 -
 kernel/locking/lockdep.c                  |    3 +++
 net/sctp/sm_make_chunk.c                  |   12 ++++--------
 8 files changed, 19 insertions(+), 21 deletions(-)

-- 
1.7.9.5


^ permalink raw reply

* Re: [PATCH net-next] r8152: modify rtl8152_set_speed function
From: Heiner Kallweit @ 2019-09-03  6:44 UTC (permalink / raw)
  To: Hayes Wang, netdev@vger.kernel.org; +Cc: nic_swsd, linux-kernel@vger.kernel.org
In-Reply-To: <0835B3720019904CB8F7AA43166CEEB2F18DACE1@RTITMBSVM03.realtek.com.tw>

On 03.09.2019 08:36, Hayes Wang wrote:
> Heiner Kallweit [mailto:hkallweit1@gmail.com]
>> Sent: Tuesday, September 03, 2019 2:14 PM
> [...]
>>>> Seeing all this code it might be a good idea to switch this driver
>>>> to phylib, similar to what I did with r8169 some time ago.
>>>
>>> It is too complex to be completed for me at the moment.
>>> If this patch is unacceptable, I would submit other
>>> patches first. Thanks.
>>>
>> My remark isn't directly related to your patch and wasn't
>> meant as an immediate ToDo. It's just a hint, because I think
>> using phylib could help to significantly simplify the driver.
> 
> I would schedule this in my work. Maybe I finish submitting
> the other patches later.
> 
> Besides, I have a question. I think I don't need rtl8152_set_speed()
> if I implement phylib. However, I need to record some information
> according to the settings of speed. For now, I do it in rtl8152_set_speed().
> Do you have any idea about how I should do it with phylib without
> rtl8152_set_speed()?
> 
When saying "record some information", what kind of information?
The speed itself is stored in struct phy_device, if you need to adjust
certain chip settings depending on negotiated speed, then you can do
this in a callback (parameter handler of phy_connect_direct).
See e.g. r8169_phylink_handler()

> Best Regards,
> Hayes
> 
> 

Heiner
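
The callback approach suggested above can be sketched in userspace C. This
is a model under stated assumptions, not phylib code: struct model_phy,
model_phy_connect() and model_phy_set_link() are hypothetical, and the
handler signature here takes the phy and a private pointer purely for
illustration (the real phylib handler signature differs). The idea is only
that the driver records the speed from inside the link-change handler
instead of from a dedicated set-speed function.

```c
/* Userspace model of a link-change callback; all names are hypothetical. */
#include <assert.h>
#include <stddef.h>

struct model_phy;
typedef void (*model_link_handler_t)(struct model_phy *phy, void *priv);

struct model_phy {
	int speed;			/* negotiated speed in Mb/s */
	model_link_handler_t handler;	/* called on every link change */
	void *priv;			/* driver private data */
};

struct model_nic {
	int saved_speed;		/* what the driver wants to remember */
};

/* Driver-side handler: record the speed reported by the phy. */
static void model_nic_link_change(struct model_phy *phy, void *priv)
{
	struct model_nic *nic = priv;

	nic->saved_speed = phy->speed;
}

/* Models attaching a handler when the driver connects to the phy. */
static void model_phy_connect(struct model_phy *phy,
			      model_link_handler_t handler, void *priv)
{
	assert(handler != NULL);
	phy->handler = handler;
	phy->priv = priv;
}

/* Models the phy layer noticing a new negotiated speed. */
static void model_phy_set_link(struct model_phy *phy, int speed)
{
	phy->speed = speed;
	if (phy->handler)
		phy->handler(phy, phy->priv);
}

/* Connect, change the link once, and report what the driver recorded. */
static int model_saved_speed_after(int speed)
{
	struct model_phy phy = { 0 };
	struct model_nic nic = { 0 };

	model_phy_connect(&phy, model_nic_link_change, &nic);
	model_phy_set_link(&phy, speed);
	return nic.saved_speed;
}
```

With this shape the driver needs no separate bookkeeping call: whatever it
must remember about the speed is captured whenever the handler runs.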

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH] i40e: clear __I40E_VIRTCHNL_OP_PENDING on invalid min tx rate
From: Paul Menzel @ 2019-09-03  6:42 UTC (permalink / raw)
  To: Stefan Assmann, intel-wired-lan; +Cc: netdev, davem
In-Reply-To: <20190903060810.30775-1-sassmann@kpanic.de>

Dear Stefan,


On 03.09.19 08:08, Stefan Assmann wrote:
> In the case of an invalid min tx rate being requested
> i40e_ndo_set_vf_bw() immediately returns -EINVAL instead of releasing
> __I40E_VIRTCHNL_OP_PENDING first.

What problem does this cause?

> Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
> ---
>   drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> index f8aa4deceb5e..3d2440838822 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> @@ -4263,7 +4263,8 @@ int i40e_ndo_set_vf_bw(struct net_device *netdev, int vf_id, int min_tx_rate,
>   	if (min_tx_rate) {
>   		dev_err(&pf->pdev->dev, "Invalid min tx rate (%d) (greater than 0) specified for VF %d.\n",
>   			min_tx_rate, vf_id);
> -		return -EINVAL;
> +		ret = -EINVAL;
> +		goto error;
>   	}
>   
>   	vf = &pf->vf[vf_id];


Kind regards,

Paul

^ permalink raw reply

* RE: [PATCH net-next] r8152: modify rtl8152_set_speed function
From: Hayes Wang @ 2019-09-03  6:36 UTC (permalink / raw)
  To: Heiner Kallweit, netdev@vger.kernel.org
  Cc: nic_swsd, linux-kernel@vger.kernel.org
In-Reply-To: <aa9513ff-3cef-4b9f-ecbd-1310660a911c@gmail.com>

Heiner Kallweit [mailto:hkallweit1@gmail.com]
> Sent: Tuesday, September 03, 2019 2:14 PM
[...]
> >> Seeing all this code it might be a good idea to switch this driver
> >> to phylib, similar to what I did with r8169 some time ago.
> >
> > It is too complex to be completed for me at the moment.
> > If this patch is unacceptable, I would submit other
> > patches first. Thanks.
> >
> My remark isn't directly related to your patch and wasn't
> meant as an immediate ToDo. It's just a hint, because I think
> using phylib could help to significantly simplify the driver.

I would schedule this in my work. Maybe I finish submitting
the other patches later.

Besides, I have a question. I think I don't need rtl8152_set_speed()
if I implement phylib. However, I need to record some information
according to the settings of speed. For now, I do it in rtl8152_set_speed().
Do you have any idea about how I should do it with phylib without
rtl8152_set_speed()?

Best Regards,
Hayes



^ permalink raw reply

* Re: [PATCH net-next] r8152: modify rtl8152_set_speed function
From: Heiner Kallweit @ 2019-09-03  6:13 UTC (permalink / raw)
  To: Hayes Wang, netdev@vger.kernel.org; +Cc: nic_swsd, linux-kernel@vger.kernel.org
In-Reply-To: <0835B3720019904CB8F7AA43166CEEB2F18DAB41@RTITMBSVM03.realtek.com.tw>

On 03.09.2019 05:16, Hayes Wang wrote:
> Heiner Kallweit [mailto:hkallweit1@gmail.com]
>> Sent: Tuesday, September 03, 2019 2:37 AM
> [...]
>> Seeing all this code it might be a good idea to switch this driver
>> to phylib, similar to what I did with r8169 some time ago.
> 
> It is too complex to be completed for me at the moment.
> If this patch is unacceptable, I would submit other
> patches first. Thanks.
> 
My remark isn't directly related to your patch and wasn't
meant as an immediate ToDo. It's just a hint, because I think
using phylib could help to significantly simplify the driver.

> Best Regards,
> Hayes
> 
> 
Heiner

^ permalink raw reply

* Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.
From: Ido Schimmel @ 2019-09-03  6:13 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Jiri Pirko, David Miller, andrew, horatiu.vultur,
	alexandre.belloni, UNGLinuxDriver, ivecera, f.fainelli, netdev,
	linux-kernel
In-Reply-To: <20190902174229.uur7r7duq4dvbnqq@lx-anielsen.microsemi.net>

On Mon, Sep 02, 2019 at 07:42:31PM +0200, Allan W. Nielsen wrote:
> I have been reading through this thread several times and I still do not get it.

Allan,

I kept thinking about this and I want to make sure that I correctly
understand the end result.

With these patches applied I assume I will see the following traffic
when running tcpdump on one of the netdevs exposed by the ocelot driver:

- Ingress: All
- Egress: Only locally generated traffic and traffic forwarded by the
  kernel from interfaces not belonging to the ocelot driver

The above means I will not see any offloaded traffic transmitted by the
port. Is that correct? I see that the driver is setting
'offload_fwd_mark' for any traffic trapped from bridged ports, which
means the bridge will drop it before it traverses the packet taps on
egress.

Large parts of the discussion revolve around the fact that switch ports
are not any different than other ports. Dave wrote "Please stop
portraying switches as special in this regard" and Andrew wrote "[The
user] just wants tcpdump to work like on their desktop."

But if anything, this discussion proves that switch ports are special in
this regard and that tcpdump will not work like on the desktop.

Besides the fact that I don't agree (but gave up) with the new
interpretation of promisc mode, I wonder if we're not asking for trouble
with this patchset. Users will see all offloaded traffic on ingress, but
none of it on egress. This is in contrast to the server/desktop, where
Linux is much more dominant in comparison to switches (let alone hw
accelerated ones) and where all the traffic is visible through tcpdump.
I can already see myself having to explain this over and over again to
confused users.

Now, I understand that showing egress traffic is inherently difficult.
It means one of two things:

1. We allow packets to be forwarded by both the software and the
hardware
2. We trap all ingressing traffic from all the ports

Both options can have devastating effects on the network and therefore
should not be triggered by a supposedly innocent invocation of tcpdump.

I again wonder if it would not be wiser to solve this by introducing two
new flags to tcpdump for ingress/egress (similar to -Q in/out) capturing
of offloaded traffic. The capturing of egress offloaded traffic can be
documented with the appropriate warnings.

Anyway, I don't want to hold you up, I merely want to make sure that the
above (assuming it's correct) is considered before the patches are
applied.

Thanks

^ permalink raw reply

* [PATCH] i40e: clear __I40E_VIRTCHNL_OP_PENDING on invalid min tx rate
From: Stefan Assmann @ 2019-09-03  6:08 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: netdev, davem, jeffrey.t.kirsher, lihong.yang, sassmann

In the case of an invalid min tx rate being requested
i40e_ndo_set_vf_bw() immediately returns -EINVAL instead of releasing
__I40E_VIRTCHNL_OP_PENDING first.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index f8aa4deceb5e..3d2440838822 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -4263,7 +4263,8 @@ int i40e_ndo_set_vf_bw(struct net_device *netdev, int vf_id, int min_tx_rate,
 	if (min_tx_rate) {
 		dev_err(&pf->pdev->dev, "Invalid min tx rate (%d) (greater than 0) specified for VF %d.\n",
 			min_tx_rate, vf_id);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto error;
 	}
 
 	vf = &pf->vf[vf_id];
-- 
2.21.0
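
The effect of returning early without releasing the pending flag can be
sketched with a userspace model. The names below are hypothetical, and the
-EAGAIN returned while the flag is held is an assumption about how the entry
check behaves, not something stated in this patch.

```c
/* Userspace model of a "pending operation" flag; names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool op_pending;	/* models __I40E_VIRTCHNL_OP_PENDING */

/*
 * release_on_error selects the behaviour being modelled:
 *   false - return directly on an invalid rate (flag stays set)
 *   true  - jump to the common exit that clears the flag
 */
static int model_set_vf_bw(int min_tx_rate, bool release_on_error)
{
	int ret = 0;

	if (op_pending)
		return -EAGAIN;	/* assumed: another operation is pending */
	op_pending = true;

	if (min_tx_rate) {
		if (!release_on_error)
			return -EINVAL;	/* early return, flag never cleared */
		ret = -EINVAL;
		goto error;
	}

	/* ... the actual bandwidth configuration would happen here ... */

error:
	assert(op_pending);	/* the flag must be held when we release it */
	op_pending = false;
	return ret;
}
```

In the model, one invalid request made with release_on_error set to false
causes every later call to fail with -EAGAIN, because nothing ever clears
the flag. Routing the error through the common exit label avoids that.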


^ permalink raw reply related

* Re: [PATCH v3] tun: fix use-after-free when register netdev failed
From: Jason Wang @ 2019-09-03  6:06 UTC (permalink / raw)
  To: Yang Yingliang
  Cc: David Miller, netdev, eric dumazet, xiyou wangcong, weiyongjun1
In-Reply-To: <5D6DFD57.7020905@huawei.com>


On 2019/9/3 下午1:42, Yang Yingliang wrote:
>
>
> On 2019/9/3 11:03, Jason Wang wrote:
>>
>> On 2019/9/3 上午9:45, Yang Yingliang wrote:
>>>
>>>
>>> On 2019/9/2 13:32, Jason Wang wrote:
>>>>
>>>> On 2019/8/23 下午5:36, Yang Yingliang wrote:
>>>>>
>>>>>
>>>>> On 2019/8/23 11:05, Jason Wang wrote:
>>>>>> ----- Original Message -----
>>>>>>>
>>>>>>> On 2019/8/22 14:07, Yang Yingliang wrote:
>>>>>>>>
>>>>>>>> On 2019/8/22 10:13, Jason Wang wrote:
>>>>>>>>> On 2019/8/20 上午10:28, Jason Wang wrote:
>>>>>>>>>> On 2019/8/20 上午9:25, David Miller wrote:
>>>>>>>>>>> From: Yang Yingliang <yangyingliang@huawei.com>
>>>>>>>>>>> Date: Mon, 19 Aug 2019 21:31:19 +0800
>>>>>>>>>>>
>>>>>>>>>>>> Call tun_attach() after register_netdevice() to make sure 
>>>>>>>>>>>> tfile->tun
>>>>>>>>>>>> is not published until the netdevice is registered. So the 
>>>>>>>>>>>> read/write
>>>>>>>>>>>> thread can not use the tun pointer that may freed by 
>>>>>>>>>>>> free_netdev().
>>>>>>>>>>>> (The tun and dev pointer are allocated by 
>>>>>>>>>>>> alloc_netdev_mqs(), they
>>>>>>>>>>>> can
>>>>>>>>>>>> be freed by netdev_freemem().)
>>>>>>>>>>> register_netdevice() must always be the last operation in 
>>>>>>>>>>> the order of
>>>>>>>>>>> network device setup.
>>>>>>>>>>>
>>>>>>>>>>> At the point register_netdevice() is called, the device is 
>>>>>>>>>>> visible
>>>>>>>>>>> globally
>>>>>>>>>>> and therefore all of it's software state must be fully 
>>>>>>>>>>> initialized and
>>>>>>>>>>> ready for us.
>>>>>>>>>>>
>>>>>>>>>>> You're going to have to find another solution to these 
>>>>>>>>>>> problems.
>>>>>>>>>>
>>>>>>>>>> The device is loosely coupled with sockets/queues. Each side is
>>>>>>>>>> allowed to be go away without caring the other side. So in this
>>>>>>>>>> case, there's a small window that network stack think the 
>>>>>>>>>> device has
>>>>>>>>>> one queue but actually not, the code can then safely drop them.
>>>>>>>>>> Maybe it's ok here with some comments?
>>>>>>>>>>
>>>>>>>>>> Or if not, we can try to hold the device before tun_attach 
>>>>>>>>>> and drop
>>>>>>>>>> it after register_netdevice().
>>>>>>>>>
>>>>>>>>> Hi Yang:
>>>>>>>>>
>>>>>>>>> I think maybe we can try to hold refcnt instead of playing 
>>>>>>>>> real num
>>>>>>>>> queues here. Do you want to post a V4?
>>>>>>>> I think the refcnt can prevent freeing the memory in this case.
>>>>>>>> When register_netdevice() failed, free_netdev() will be called 
>>>>>>>> directly,
>>>>>>>> dev->pcpu_refcnt and dev are freed without checking refcnt of dev.
>>>>>>> How about using patch-v1 that using a flag to check whether the 
>>>>>>> device
>>>>>>> registered successfully.
>>>>>>>
>>>>>> As I said, it lacks sufficient locks or barriers. To be clear, I 
>>>>>> meant
>>>>>> something like (compile-test only):
>>>>>>
>>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>>>> index db16d7a13e00..e52678f9f049 100644
>>>>>> --- a/drivers/net/tun.c
>>>>>> +++ b/drivers/net/tun.c
>>>>>> @@ -2828,6 +2828,7 @@ static int tun_set_iff(struct net *net, 
>>>>>> struct file *file, struct ifreq *ifr)
>>>>>>                                (ifr->ifr_flags & TUN_FEATURES);
>>>>>> INIT_LIST_HEAD(&tun->disabled);
>>>>>> +               dev_hold(dev);
>>>>>>                  err = tun_attach(tun, file, false, 
>>>>>> ifr->ifr_flags & IFF_NAPI,
>>>>>>                                   ifr->ifr_flags & IFF_NAPI_FRAGS);
>>>>>>                  if (err < 0)
>>>>>> @@ -2836,6 +2837,7 @@ static int tun_set_iff(struct net *net, 
>>>>>> struct file *file, struct ifreq *ifr)
>>>>>>                  err = register_netdevice(tun->dev);
>>>>>>                  if (err < 0)
>>>>>>                          goto err_detach;
>>>>>> +               dev_put(dev);
>>>>>>          }
>>>>>>            netif_carrier_on(tun->dev);
>>>>>> @@ -2852,11 +2854,13 @@ static int tun_set_iff(struct net *net, 
>>>>>> struct file *file, struct ifreq *ifr)
>>>>>>          return 0;
>>>>>>     err_detach:
>>>>>> +       dev_put(dev);
>>>>>>          tun_detach_all(dev);
>>>>>>          /* register_netdevice() already called tun_free_netdev() */
>>>>>>          goto err_free_dev;
>>>>>>     err_free_flow:
>>>>>> +       dev_put(dev);
>>>>>>          tun_flow_uninit(tun);
>>>>>> security_tun_dev_free_security(tun->security);
>>>>>>   err_free_stat:
>>>>>>
>>>>>> What's your thought?
>>>>>
>>>>> The dev pointer are freed without checking the refcount in 
>>>>> free_netdev() called by err_free_dev
>>>>>
>>>>> path, so I don't understand how the refcount protects this pointer.
>>>>>
>>>>
>>>> The refcount are guaranteed to be zero there, isn't it?
>>> No, it's not.
>>>
>>> err_free_dev:
>>>         free_netdev(dev);
>>>
>>> void free_netdev(struct net_device *dev)
>>> {
>>> ...
>>>         /* pcpu_refcnt can be freed without checking refcount */
>>>         free_percpu(dev->pcpu_refcnt);
>>>         dev->pcpu_refcnt = NULL;
>>>
>>>         /*  Compatibility with error handling in drivers */
>>>         if (dev->reg_state == NETREG_UNINITIALIZED) {
>>>                 /* dev can be freed without checking refcount */
>>>                 netdev_freemem(dev);
>>>                 return;
>>>         }
>>> ...
>>> }
>>
>>
>> Right, but what I meant is in my patch, when code reaches 
>> free_netdev() the refcnt is zero. What did I miss?
> Yes, but it can't fix the UAF problem.


Well, it looks to me that the dev_put() in tun_put() won't release the 
device in this case.

Thanks


>>
>> Thanks
>>
>>
>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>>> Thanks,
>>>>> Yang
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> .
>>>>>>
>>>>>
>>>>>
>>>>
>>>> .
>>>>
>>>
>>>
>>
>> .
>>
>
>
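
The point raised in the quoted free_netdev() excerpt can be sketched with a
userspace model. It models only that excerpt: a device that was never
registered is freed without looking at its reference count. It does not
settle whether the dev_hold()/dev_put() proposal closes the window. All
names are hypothetical, and a "freed" flag stands in for the actual free so
the model itself stays safe to run.

```c
/* Userspace model of freeing an unregistered device; names hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct model_dev {
	int refcnt;		/* models the device reference count */
	bool registered;	/* false models NETREG_UNINITIALIZED */
	bool freed;		/* stands in for the memory being released */
};

/* Models dev_hold(): take one reference. */
static void model_dev_hold(struct model_dev *dev)
{
	dev->refcnt++;
}

/* Models the quoted path: unregistered devices are freed unconditionally. */
static void model_free_netdev(struct model_dev *dev)
{
	if (!dev->registered) {
		dev->freed = true;	/* no refcount check on this path */
		return;
	}
	/* the registered case is outside the scope of this model */
}

/*
 * Registration fails (the device stays unregistered) while "holders"
 * references are outstanding. Returns true if the device was freed even
 * though someone still held a reference.
 */
static bool model_freed_with_holders(int holders)
{
	struct model_dev dev = { 0 };
	int i;

	assert(holders >= 0);
	for (i = 0; i < holders; i++)
		model_dev_hold(&dev);

	model_free_netdev(&dev);
	return dev.freed && dev.refcnt > 0;
}
```

In the model, any outstanding holder at the time of the failed registration
ends up with a reference to a freed device, which is the concern behind
"it can't fix the UAF problem"; with no holders the free is harmless.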

^ permalink raw reply

* Re: [PATCH v3] tun: fix use-after-free when register netdev failed
From: Yang Yingliang @ 2019-09-03  5:42 UTC (permalink / raw)
  To: Jason Wang
  Cc: David Miller, netdev, eric dumazet, xiyou wangcong, weiyongjun1
In-Reply-To: <4a5d84b7-f3cb-c4e1-d6fe-28d186a551ee@redhat.com>



On 2019/9/3 11:03, Jason Wang wrote:
>
> On 2019/9/3 上午9:45, Yang Yingliang wrote:
>>
>>
>> On 2019/9/2 13:32, Jason Wang wrote:
>>>
>>> On 2019/8/23 下午5:36, Yang Yingliang wrote:
>>>>
>>>>
>>>> On 2019/8/23 11:05, Jason Wang wrote:
>>>>> ----- Original Message -----
>>>>>>
>>>>>> On 2019/8/22 14:07, Yang Yingliang wrote:
>>>>>>>
>>>>>>> On 2019/8/22 10:13, Jason Wang wrote:
>>>>>>>> On 2019/8/20 上午10:28, Jason Wang wrote:
>>>>>>>>> On 2019/8/20 上午9:25, David Miller wrote:
>>>>>>>>>> From: Yang Yingliang <yangyingliang@huawei.com>
>>>>>>>>>> Date: Mon, 19 Aug 2019 21:31:19 +0800
>>>>>>>>>>
>>>>>>>>>>> Call tun_attach() after register_netdevice() to make sure 
>>>>>>>>>>> tfile->tun
>>>>>>>>>>> is not published until the netdevice is registered. So the 
>>>>>>>>>>> read/write
>>>>>>>>>>> thread can not use the tun pointer that may freed by 
>>>>>>>>>>> free_netdev().
>>>>>>>>>>> (The tun and dev pointer are allocated by 
>>>>>>>>>>> alloc_netdev_mqs(), they
>>>>>>>>>>> can
>>>>>>>>>>> be freed by netdev_freemem().)
>>>>>>>>>> register_netdevice() must always be the last operation in the 
>>>>>>>>>> order of
>>>>>>>>>> network device setup.
>>>>>>>>>>
>>>>>>>>>> At the point register_netdevice() is called, the device is 
>>>>>>>>>> visible
>>>>>>>>>> globally
>>>>>>>>>> and therefore all of it's software state must be fully 
>>>>>>>>>> initialized and
>>>>>>>>>> ready for us.
>>>>>>>>>>
>>>>>>>>>> You're going to have to find another solution to these problems.
>>>>>>>>>
>>>>>>>>> The device is loosely coupled with sockets/queues. Each side is
>>>>>>>>> allowed to be go away without caring the other side. So in this
>>>>>>>>> case, there's a small window that network stack think the 
>>>>>>>>> device has
>>>>>>>>> one queue but actually not, the code can then safely drop them.
>>>>>>>>> Maybe it's ok here with some comments?
>>>>>>>>>
>>>>>>>>> Or if not, we can try to hold the device before tun_attach and 
>>>>>>>>> drop
>>>>>>>>> it after register_netdevice().
>>>>>>>>
>>>>>>>> Hi Yang:
>>>>>>>>
>>>>>>>> I think maybe we can try to hold refcnt instead of playing real 
>>>>>>>> num
>>>>>>>> queues here. Do you want to post a V4?
>>>>>>> I think the refcnt can prevent freeing the memory in this case.
>>>>>>> When register_netdevice() failed, free_netdev() will be called 
>>>>>>> directly,
>>>>>>> dev->pcpu_refcnt and dev are freed without checking refcnt of dev.
>>>>>> How about using patch-v1 that using a flag to check whether the 
>>>>>> device
>>>>>> registered successfully.
>>>>>>
>>>>> As I said, it lacks sufficient locks or barriers. To be clear, I 
>>>>> meant
>>>>> something like (compile-test only):
>>>>>
>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>>> index db16d7a13e00..e52678f9f049 100644
>>>>> --- a/drivers/net/tun.c
>>>>> +++ b/drivers/net/tun.c
>>>>> @@ -2828,6 +2828,7 @@ static int tun_set_iff(struct net *net, 
>>>>> struct file *file, struct ifreq *ifr)
>>>>>                                (ifr->ifr_flags & TUN_FEATURES);
>>>>>                    INIT_LIST_HEAD(&tun->disabled);
>>>>> +               dev_hold(dev);
>>>>>                  err = tun_attach(tun, file, false, ifr->ifr_flags 
>>>>> & IFF_NAPI,
>>>>>                                   ifr->ifr_flags & IFF_NAPI_FRAGS);
>>>>>                  if (err < 0)
>>>>> @@ -2836,6 +2837,7 @@ static int tun_set_iff(struct net *net, 
>>>>> struct file *file, struct ifreq *ifr)
>>>>>                  err = register_netdevice(tun->dev);
>>>>>                  if (err < 0)
>>>>>                          goto err_detach;
>>>>> +               dev_put(dev);
>>>>>          }
>>>>>            netif_carrier_on(tun->dev);
>>>>> @@ -2852,11 +2854,13 @@ static int tun_set_iff(struct net *net, 
>>>>> struct file *file, struct ifreq *ifr)
>>>>>          return 0;
>>>>>     err_detach:
>>>>> +       dev_put(dev);
>>>>>          tun_detach_all(dev);
>>>>>          /* register_netdevice() already called tun_free_netdev() */
>>>>>          goto err_free_dev;
>>>>>     err_free_flow:
>>>>> +       dev_put(dev);
>>>>>          tun_flow_uninit(tun);
>>>>>          security_tun_dev_free_security(tun->security);
>>>>>   err_free_stat:
>>>>>
>>>>> What's your thought?
>>>>
>>>> The dev pointer are freed without checking the refcount in 
>>>> free_netdev() called by err_free_dev
>>>>
>>>> path, so I don't understand how the refcount protects this pointer.
>>>>
>>>
>>> The refcount are guaranteed to be zero there, isn't it?
>> No, it's not.
>>
>> err_free_dev:
>>         free_netdev(dev);
>>
>> void free_netdev(struct net_device *dev)
>> {
>> ...
>>         /* pcpu_refcnt can be freed without checking refcount */
>>         free_percpu(dev->pcpu_refcnt);
>>         dev->pcpu_refcnt = NULL;
>>
>>         /*  Compatibility with error handling in drivers */
>>         if (dev->reg_state == NETREG_UNINITIALIZED) {
>>                 /* dev can be freed without checking refcount */
>>                 netdev_freemem(dev);
>>                 return;
>>         }
>> ...
>> }
>
>
> Right, but what I meant is in my patch, when code reaches 
> free_netdev() the refcnt is zero. What did I miss?
Yes, but it can't fix the UAF problem.
>
> Thanks
>
>
>>
>>>
>>> Thanks
>>>
>>>
>>>> Thanks,
>>>> Yang
>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> .
>>>>>
>>>>
>>>>
>>>
>>> .
>>>
>>
>>
>
> .
>



^ permalink raw reply

* [PATCH] rtl8xxxu: add bluetooth co-existence support for single antenna
From: Chris Chiu @ 2019-09-03  5:37 UTC (permalink / raw)
  To: Jes.Sorensen, kvalo, davem; +Cc: linux-wireless, netdev, linux-kernel, linux

The RTL8723BU suffers from wifi disconnections while a bluetooth
device is connected. While wifi is doing tx/rx, bluetooth scans
return no results. This is because wifi and bluetooth share the same
single antenna for RF communication and need a mechanism to
collaborate.

BT information is provided via the packet sent from the co-processor
to the host (C2H). It contains the status of BT, but
rtl8723bu_handle_c2h does not really handle it, and there is no
bluetooth coexistence mechanism to deal with it.

This commit adds a workqueue to set the tdma configurations and
coefficient table per the parsed bluetooth link status and given
wifi connection state. The tdma/coef table comes from the vendor
driver code of the RTL8192EU and RTL8723BU. However, this commit
covers only the single-antenna scenario, whereas the RTL8192EU is
dual antenna by default. The rtl8xxxu_parse_rxdesc24 which invokes
the handle_c2h is only for 8723b and 8192e, so the mechanism is
expected to work on both chips with a single antenna. Note that the
RTL8192EU dual antenna configuration is not supported.

Signed-off-by: Chris Chiu <chiu@endlessm.com>
---
 .../net/wireless/realtek/rtl8xxxu/rtl8xxxu.h  |  37 +++
 .../realtek/rtl8xxxu/rtl8xxxu_8723b.c         |   2 -
 .../wireless/realtek/rtl8xxxu/rtl8xxxu_core.c | 243 +++++++++++++++++-
 3 files changed, 275 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
index 582c2a346cec..22e95b11bfbb 100644
--- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
+++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
@@ -1220,6 +1220,37 @@ enum ratr_table_mode_new {
 	RATEID_IDX_BGN_3SS = 14,
 };
 
+#define BT_INFO_8723B_1ANT_B_FTP		BIT(7)
+#define BT_INFO_8723B_1ANT_B_A2DP		BIT(6)
+#define BT_INFO_8723B_1ANT_B_HID		BIT(5)
+#define BT_INFO_8723B_1ANT_B_SCO_BUSY		BIT(4)
+#define BT_INFO_8723B_1ANT_B_ACL_BUSY		BIT(3)
+#define BT_INFO_8723B_1ANT_B_INQ_PAGE		BIT(2)
+#define BT_INFO_8723B_1ANT_B_SCO_ESCO		BIT(1)
+#define BT_INFO_8723B_1ANT_B_CONNECTION	BIT(0)
+
+enum _BT_8723B_1ANT_STATUS {
+	BT_8723B_1ANT_STATUS_NON_CONNECTED_IDLE      = 0x0,
+	BT_8723B_1ANT_STATUS_CONNECTED_IDLE          = 0x1,
+	BT_8723B_1ANT_STATUS_INQ_PAGE                = 0x2,
+	BT_8723B_1ANT_STATUS_ACL_BUSY                = 0x3,
+	BT_8723B_1ANT_STATUS_SCO_BUSY                = 0x4,
+	BT_8723B_1ANT_STATUS_ACL_SCO_BUSY            = 0x5,
+	BT_8723B_1ANT_STATUS_MAX
+};
+
+struct rtl8xxxu_btcoex {
+	u8      bt_status;
+	bool	bt_busy;
+	bool	has_sco;
+	bool	has_a2dp;
+	bool    has_hid;
+	bool    has_pan;
+	bool	hid_only;
+	bool	a2dp_only;
+	bool    c2h_bt_inquiry;
+};
+
 #define RTL8XXXU_RATR_STA_INIT 0
 #define RTL8XXXU_RATR_STA_HIGH 1
 #define RTL8XXXU_RATR_STA_MID  2
@@ -1340,6 +1371,10 @@ struct rtl8xxxu_priv {
 	 */
 	struct ieee80211_vif *vif;
 	struct delayed_work ra_watchdog;
+	struct work_struct c2hcmd_work;
+	struct sk_buff_head c2hcmd_queue;
+	spinlock_t c2hcmd_lock;
+	struct rtl8xxxu_btcoex bt_coex;
 };
 
 struct rtl8xxxu_rx_urb {
@@ -1486,6 +1521,8 @@ void rtl8xxxu_fill_txdesc_v2(struct ieee80211_hw *hw, struct ieee80211_hdr *hdr,
 			     struct rtl8xxxu_txdesc32 *tx_desc32, bool sgi,
 			     bool short_preamble, bool ampdu_enable,
 			     u32 rts_rate);
+void rtl8723bu_set_ps_tdma(struct rtl8xxxu_priv *priv,
+			   u8 arg1, u8 arg2, u8 arg3, u8 arg4, u8 arg5);
 
 extern struct rtl8xxxu_fileops rtl8192cu_fops;
 extern struct rtl8xxxu_fileops rtl8192eu_fops;
diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8723b.c b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8723b.c
index ceffe05bd65b..9ba661b3d767 100644
--- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8723b.c
+++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8723b.c
@@ -1580,9 +1580,7 @@ static void rtl8723b_enable_rf(struct rtl8xxxu_priv *priv)
 	/*
 	 * Software control, antenna at WiFi side
 	 */
-#ifdef NEED_PS_TDMA
 	rtl8723bu_set_ps_tdma(priv, 0x08, 0x00, 0x00, 0x00, 0x00);
-#endif
 
 	rtl8xxxu_write32(priv, REG_BT_COEX_TABLE1, 0x55555555);
 	rtl8xxxu_write32(priv, REG_BT_COEX_TABLE2, 0x55555555);
diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
index a6f358b9e447..4f72c2d14d44 100644
--- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
+++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
@@ -3820,9 +3820,8 @@ void rtl8xxxu_power_off(struct rtl8xxxu_priv *priv)
 	rtl8xxxu_write8(priv, REG_RSV_CTRL, 0x0e);
 }
 
-#ifdef NEED_PS_TDMA
-static void rtl8723bu_set_ps_tdma(struct rtl8xxxu_priv *priv,
-				  u8 arg1, u8 arg2, u8 arg3, u8 arg4, u8 arg5)
+void rtl8723bu_set_ps_tdma(struct rtl8xxxu_priv *priv,
+			   u8 arg1, u8 arg2, u8 arg3, u8 arg4, u8 arg5)
 {
 	struct h2c_cmd h2c;
 
@@ -3835,7 +3834,6 @@ static void rtl8723bu_set_ps_tdma(struct rtl8xxxu_priv *priv,
 	h2c.b_type_dma.data5 = arg5;
 	rtl8xxxu_gen2_h2c_cmd(priv, &h2c, sizeof(h2c.b_type_dma));
 }
-#endif
 
 void rtl8xxxu_gen2_disable_rf(struct rtl8xxxu_priv *priv)
 {
@@ -5186,12 +5184,239 @@ static void rtl8xxxu_rx_urb_work(struct work_struct *work)
 	}
 }
 
+void rtl8723bu_set_coex_with_type(struct rtl8xxxu_priv *priv, u8 type)
+{
+	switch (type) {
+	case 0:
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE1, 0x55555555);
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE2, 0x55555555);
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE3, 0x00ffffff);
+		rtl8xxxu_write8(priv, REG_BT_COEX_TABLE4, 0x03);
+		break;
+	case 1:
+	case 3:
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE1, 0x55555555);
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE2, 0x5a5a5a5a);
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE3, 0x00ffffff);
+		rtl8xxxu_write8(priv, REG_BT_COEX_TABLE4, 0x03);
+		break;
+	case 2:
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE1, 0x5a5a5a5a);
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE2, 0x5a5a5a5a);
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE3, 0x00ffffff);
+		rtl8xxxu_write8(priv, REG_BT_COEX_TABLE4, 0x03);
+		break;
+	case 4:
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE1, 0x5a5a5a5a);
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE2, 0xaaaa5a5a);
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE3, 0x00ffffff);
+		rtl8xxxu_write8(priv, REG_BT_COEX_TABLE4, 0x03);
+		break;
+	case 5:
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE1, 0x5a5a5a5a);
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE2, 0xaa5a5a5a);
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE3, 0x00ffffff);
+		rtl8xxxu_write8(priv, REG_BT_COEX_TABLE4, 0x03);
+		break;
+	case 6:
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE1, 0x55555555);
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE2, 0xaaaaaaaa);
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE3, 0x00ffffff);
+		rtl8xxxu_write8(priv, REG_BT_COEX_TABLE4, 0x03);
+		break;
+	case 7:
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE1, 0xaaaaaaaa);
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE2, 0xaaaaaaaa);
+		rtl8xxxu_write32(priv, REG_BT_COEX_TABLE3, 0x00ffffff);
+		rtl8xxxu_write8(priv, REG_BT_COEX_TABLE4, 0x03);
+		break;
+	default:
+		break;
+	}
+}
+
+void rtl8723bu_update_bt_link_info(struct rtl8xxxu_priv *priv, u8 bt_info)
+{
+	struct rtl8xxxu_btcoex *btcoex = &priv->bt_coex;
+
+	if (bt_info & BT_INFO_8723B_1ANT_B_INQ_PAGE)
+		btcoex->c2h_bt_inquiry = true;
+	else
+		btcoex->c2h_bt_inquiry = false;
+
+	if (!(bt_info & BT_INFO_8723B_1ANT_B_CONNECTION)) {
+		btcoex->bt_status = BT_8723B_1ANT_STATUS_NON_CONNECTED_IDLE;
+		btcoex->has_sco = false;
+		btcoex->has_hid = false;
+		btcoex->has_pan = false;
+		btcoex->has_a2dp = false;
+	} else {
+		if ((bt_info & 0x1f) == BT_INFO_8723B_1ANT_B_CONNECTION)
+			btcoex->bt_status = BT_8723B_1ANT_STATUS_CONNECTED_IDLE;
+		else if ((bt_info & BT_INFO_8723B_1ANT_B_SCO_ESCO) ||
+			 (bt_info & BT_INFO_8723B_1ANT_B_SCO_BUSY))
+			btcoex->bt_status = BT_8723B_1ANT_STATUS_SCO_BUSY;
+		else if (bt_info & BT_INFO_8723B_1ANT_B_ACL_BUSY)
+			btcoex->bt_status = BT_8723B_1ANT_STATUS_ACL_BUSY;
+		else
+			btcoex->bt_status = BT_8723B_1ANT_STATUS_MAX;
+
+		if (bt_info & BT_INFO_8723B_1ANT_B_FTP)
+			btcoex->has_pan = true;
+		else
+			btcoex->has_pan = false;
+
+		if (bt_info & BT_INFO_8723B_1ANT_B_A2DP)
+			btcoex->has_a2dp = true;
+		else
+			btcoex->has_a2dp = false;
+
+		if (bt_info & BT_INFO_8723B_1ANT_B_HID)
+			btcoex->has_hid = true;
+		else
+			btcoex->has_hid = false;
+
+		if (bt_info & BT_INFO_8723B_1ANT_B_SCO_ESCO)
+			btcoex->has_sco = true;
+		else
+			btcoex->has_sco = false;
+	}
+
+	if (!btcoex->has_a2dp &&
+	    !btcoex->has_sco &&
+	    !btcoex->has_pan &&
+	    btcoex->has_hid)
+		btcoex->hid_only = true;
+	else
+		btcoex->hid_only = false;
+
+	if (!btcoex->has_sco &&
+	    !btcoex->has_pan &&
+	    !btcoex->has_hid &&
+	    btcoex->has_a2dp)
+		btcoex->a2dp_only = true;
+	else
+		btcoex->a2dp_only = false;
+
+	if (btcoex->bt_status == BT_8723B_1ANT_STATUS_SCO_BUSY ||
+	    btcoex->bt_status == BT_8723B_1ANT_STATUS_ACL_BUSY)
+		btcoex->bt_busy = true;
+	else
+		btcoex->bt_busy = false;
+}
+
+static void rtl8xxxu_c2hcmd_callback(struct work_struct *work)
+{
+	struct rtl8xxxu_priv *priv;
+	struct rtl8723bu_c2h *c2h;
+	struct ieee80211_vif *vif;
+	struct device *dev;
+	struct sk_buff *skb = NULL;
+	unsigned long flags;
+	int len;
+	u8 bt_info = 0;
+	struct rtl8xxxu_btcoex *btcoex;
+
+	priv = container_of(work, struct rtl8xxxu_priv, c2hcmd_work);
+	vif = priv->vif;
+	btcoex = &priv->bt_coex;
+	dev = &priv->udev->dev;
+
+	if (priv->rf_paths > 1)
+		goto out;
+
+	while (!skb_queue_empty(&priv->c2hcmd_queue)) {
+		spin_lock_irqsave(&priv->c2hcmd_lock, flags);
+		skb = __skb_dequeue(&priv->c2hcmd_queue);
+		spin_unlock_irqrestore(&priv->c2hcmd_lock, flags);
+
+		c2h = (struct rtl8723bu_c2h *)skb->data;
+		len = skb->len - 2;
+
+		switch (c2h->id) {
+		case C2H_8723B_BT_INFO:
+			bt_info = c2h->bt_info.bt_info;
+
+			rtl8723bu_update_bt_link_info(priv, bt_info);
+
+			if (btcoex->c2h_bt_inquiry) {
+				if (vif && !vif->bss_conf.assoc) {
+					rtl8723bu_set_ps_tdma(priv, 0x8, 0x0, 0x0, 0x0, 0x0);
+					rtl8723bu_set_coex_with_type(priv, 0);
+				} else if (btcoex->has_sco ||
+					   btcoex->has_hid ||
+					   btcoex->has_a2dp) {
+					rtl8723bu_set_ps_tdma(priv, 0x61, 0x35, 0x3, 0x11, 0x11);
+					rtl8723bu_set_coex_with_type(priv, 4);
+				} else if (btcoex->has_pan) {
+					rtl8723bu_set_ps_tdma(priv, 0x61, 0x3f, 0x3, 0x11, 0x11);
+					rtl8723bu_set_coex_with_type(priv, 4);
+				} else {
+					rtl8723bu_set_ps_tdma(priv, 0x8, 0x0, 0x0, 0x0, 0x0);
+					rtl8723bu_set_coex_with_type(priv, 7);
+				}
+
+				goto out;
+			}
+
+			if (vif && vif->bss_conf.assoc) {
+				u32 val32 = 0;
+				u32 high_prio_tx = 0, high_prio_rx = 0;
+
+				val32 = rtl8xxxu_read32(priv, 0x770);
+				high_prio_tx = val32 & 0x0000ffff;
+				high_prio_rx = (val32  & 0xffff0000) >> 16;
+
+				if (btcoex->bt_busy) {
+					if (btcoex->hid_only) {
+						rtl8723bu_set_ps_tdma(priv, 0x61, 0x20, 0x3, 0x11, 0x11);
+						rtl8723bu_set_coex_with_type(priv, 5);
+					} else if (btcoex->a2dp_only) {
+						rtl8723bu_set_ps_tdma(priv, 0x61, 0x35, 0x3, 0x11, 0x11);
+						rtl8723bu_set_coex_with_type(priv, 4);
+					} else if ((btcoex->has_a2dp &&
+						    btcoex->has_pan) ||
+						   (btcoex->has_hid &&
+						    btcoex->has_a2dp &&
+						    btcoex->has_pan)) {
+						rtl8723bu_set_ps_tdma(priv, 0x51, 0x21, 0x3, 0x10, 0x10);
+						rtl8723bu_set_coex_with_type(priv, 4);
+					} else if (btcoex->has_hid &&
+						 btcoex->has_a2dp) {
+						rtl8723bu_set_ps_tdma(priv, 0x51, 0x21, 0x3, 0x10, 0x10);
+						rtl8723bu_set_coex_with_type(priv, 3);
+					} else {
+						rtl8723bu_set_ps_tdma(priv, 0x61, 0x35, 0x3, 0x11, 0x11);
+						rtl8723bu_set_coex_with_type(priv, 4);
+					}
+				} else {
+					rtl8723bu_set_ps_tdma(priv, 0x8, 0x0, 0x0, 0x0, 0x0);
+					if (high_prio_tx + high_prio_rx <= 60)
+						rtl8723bu_set_coex_with_type(priv, 2);
+					else
+						rtl8723bu_set_coex_with_type(priv, 7);
+				}
+			} else {
+				rtl8723bu_set_ps_tdma(priv, 0x8, 0x0, 0x0, 0x0, 0x0);
+				rtl8723bu_set_coex_with_type(priv, 0);
+			}
+			break;
+		default:
+			break;
+		}
+	}
+
+out:
+	dev_kfree_skb(skb);
+}
+
 static void rtl8723bu_handle_c2h(struct rtl8xxxu_priv *priv,
 				 struct sk_buff *skb)
 {
 	struct rtl8723bu_c2h *c2h = (struct rtl8723bu_c2h *)skb->data;
 	struct device *dev = &priv->udev->dev;
 	int len;
+	unsigned long flags;
 
 	len = skb->len - 2;
 
@@ -5229,6 +5454,12 @@ static void rtl8723bu_handle_c2h(struct rtl8xxxu_priv *priv,
 			       16, 1, c2h->raw.payload, len, false);
 		break;
 	}
+
+	spin_lock_irqsave(&priv->c2hcmd_lock, flags);
+	__skb_queue_tail(&priv->c2hcmd_queue, skb);
+	spin_unlock_irqrestore(&priv->c2hcmd_lock, flags);
+
+	schedule_work(&priv->c2hcmd_work);
 }
 
 int rtl8xxxu_parse_rxdesc16(struct rtl8xxxu_priv *priv, struct sk_buff *skb)
@@ -5353,7 +5584,6 @@ int rtl8xxxu_parse_rxdesc24(struct rtl8xxxu_priv *priv, struct sk_buff *skb)
 		struct device *dev = &priv->udev->dev;
 		dev_dbg(dev, "%s: C2H packet\n", __func__);
 		rtl8723bu_handle_c2h(priv, skb);
-		dev_kfree_skb(skb);
 		return RX_TYPE_C2H;
 	}
 
@@ -6272,6 +6502,9 @@ static int rtl8xxxu_probe(struct usb_interface *interface,
 	spin_lock_init(&priv->rx_urb_lock);
 	INIT_WORK(&priv->rx_urb_wq, rtl8xxxu_rx_urb_work);
 	INIT_DELAYED_WORK(&priv->ra_watchdog, rtl8xxxu_watchdog_callback);
+	spin_lock_init(&priv->c2hcmd_lock);
+	INIT_WORK(&priv->c2hcmd_work, rtl8xxxu_c2hcmd_callback);
+	skb_queue_head_init(&priv->c2hcmd_queue);
 
 	usb_set_intfdata(interface, hw);
 
-- 
2.20.1



* [PATCH] Clock-independent TCP ISN generation
From: Cyrus Sh @ 2019-09-03  5:06 UTC (permalink / raw)
  To: davem; +Cc: shiraz.saleem, jgg, arnd, arnd, netdev, sirus

This patch addresses a privacy issue in TCP ISN generation in the Linux
kernel. Currently an adversary can deanonymize a user behind an anonymity
network by inducing a load pattern on the target machine and correlating
its clock skew with the pattern. Since the kernel adds a clock-based
counter to generated ISNs, the adversary can observe SYN packets with
similar IP addresses and port numbers to find out the clock skew of the
target machine, which can help them identify the user.

To resolve this problem I have changed the related function to generate
the initial sequence numbers randomly and independently of the CPU clock.
This feature is controlled by a new sysctl option called "tcp_random_isn",
which this patch adds. Once it is enabled, the initial sequence numbers
are guaranteed to be generated independently of each other and of the
hardware clock of the machine. If the option is off, ISNs are generated
as before.

For more information about this patch and its effectiveness, see my post:
https://bitguard.wordpress.com/?p=982
For a discussion of the issue, see:
https://trac.torproject.org/projects/tor/ticket/16659

Signed-off-by: Sirus Shahini <sirus.shahini@gmail.com>
---
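To make the clock dependence concrete, here is a minimal userspace
sketch. It assumes the existing path adds a counter with 64 ns
granularity to the per-flow hash, which is my understanding of
seq_scale() in net/core/secure_seq.c (that helper is not shown in this
patch). All function names below are hypothetical.

```c
#include <stdint.h>

/* Clock term: one tick per 64 ns, truncated to 32 bits (assumed to
 * match what seq_scale() adds to the hash).
 */
static uint32_t isn_clock_term(uint64_t now_ns)
{
	return (uint32_t)(now_ns >> 6);
}

/* Clock-based ISN: a per-flow hash plus the clock term. */
static uint32_t isn_clock_based(uint32_t flow_hash, uint64_t now_ns)
{
	return flow_hash + isn_clock_term(now_ns);
}

/* What an observer recovers from two SYNs of the same flow: the hash
 * cancels out and only the elapsed clock ticks remain.
 */
static uint32_t observed_ticks(uint32_t isn_a, uint32_t isn_b)
{
	return isn_b - isn_a;
}
```

With the same flow hash, two ISNs taken 6400 ns apart differ by exactly
100 ticks regardless of the hash value. That difference is the signal
tcp_random_isn is meant to remove.
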
 include/net/tcp.h           |  1 +
 include/uapi/linux/sysctl.h |  1 +
 kernel/sysctl_binary.c      |  1 +
 net/core/secure_seq.c       | 24 +++++++++++++++++++++++-
 net/ipv4/sysctl_net_ipv4.c  |  7 +++++++
 net/ipv4/tcp_input.c        |  1 +
 6 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 81e8ade..4ad1bbf 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -241,6 +241,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
 
 /* sysctl variables for tcp */
 extern int sysctl_tcp_max_orphans;
+extern int sysctl_tcp_random_isn;
 extern long sysctl_tcp_mem[3];
 
 #define TCP_RACK_LOSS_DETECTION  0x1 /* Use RACK to detect losses */
diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
index 87aa2a6..ba8927e 100644
--- a/include/uapi/linux/sysctl.h
+++ b/include/uapi/linux/sysctl.h
@@ -426,6 +426,7 @@ enum
 	NET_TCP_ALLOWED_CONG_CONTROL=123,
 	NET_TCP_MAX_SSTHRESH=124,
 	NET_TCP_FRTO_RESPONSE=125,
+	NET_IPV4_TCP_RANDOM_ISN=126,
 };
 
 enum {
diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c
index 73c1320..0faf7d4 100644
--- a/kernel/sysctl_binary.c
+++ b/kernel/sysctl_binary.c
@@ -332,6 +332,7 @@ static const struct bin_table bin_net_ipv4_netfilter_table[] = {
 };
 
 static const struct bin_table bin_net_ipv4_table[] = {
+	{ CTL_INT,	NET_IPV4_TCP_RANDOM_ISN,		"tcp_random_isn" },
 	{CTL_INT,	NET_IPV4_FORWARD,			"ip_forward" },
 
 	{ CTL_DIR,	NET_IPV4_CONF,		"conf",		bin_net_ipv4_conf_table },
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 7b6b1d2..b644bbe 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -22,6 +22,7 @@
 
 static siphash_key_t net_secret __read_mostly;
 static siphash_key_t ts_secret __read_mostly;
+static siphash_key_t last_secret = {{0,0}} ;
 
 static __always_inline void net_secret_init(void)
 {
@@ -134,8 +135,29 @@ u32 secure_tcp_seq(__be32 saddr, __be32 daddr,
 		   __be16 sport, __be16 dport)
 {
 	u32 hash;
-
+	u32 temp;
+	
 	net_secret_init();
+	
+	if (sysctl_tcp_random_isn){
+		if (!last_secret.key[0] && !last_secret.key[1]){
+			memcpy(&last_secret,&net_secret,sizeof(last_secret));	
+					
+		}else{
+			temp = *((u32*)&(net_secret.key[0]));
+			temp >>= 8;
+			last_secret.key[0]+=temp;
+			temp = *((u32*)&(net_secret.key[1]));
+			temp >>= 8;
+			last_secret.key[1]+=temp;
+		}
+		hash = siphash_3u32((__force u32)saddr, (__force u32)daddr,
+			        (__force u32)sport << 16 | (__force u32)dport,
+			        &last_secret);
+		return hash;
+	}
+	
+	
 	hash = siphash_3u32((__force u32)saddr, (__force u32)daddr,
 			    (__force u32)sport << 16 | (__force u32)dport,
 			    &net_secret);
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 0b980e8..74b2b6a 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -479,6 +479,13 @@ static int proc_fib_multipath_hash_policy(struct ctl_table *table, int write,
 
 static struct ctl_table ipv4_table[] = {
 	{
+    	.procname	= "tcp_random_isn",
+		.data		= &sysctl_tcp_random_isn,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec  
+	},
+	{
 		.procname	= "tcp_max_orphans",
 		.data		= &sysctl_tcp_max_orphans,
 		.maxlen		= sizeof(int),
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c21e8a2..c6b4ebf 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -80,6 +80,7 @@
 #include <linux/jump_label_ratelimit.h>
 #include <net/busy_poll.h>
 
+int sysctl_tcp_random_isn __read_mostly = 0;
 int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
 
 #define FLAG_DATA		0x01 /* Incoming frame contained data.		*/
-- 
2.7.4


* Re: [PATCH] net-ipv6: fix excessive RTF_ADDRCONF flag on ::1/128 local route (and others)
From: Lorenzo Colitti @ 2019-09-03  4:58 UTC (permalink / raw)
  To: David Ahern
  Cc: Maciej Żenczykowski, Maciej Żenczykowski,
	David S . Miller, Linux NetDev
In-Reply-To: <cd6b7a9b-59a7-143a-0d5f-e73069d9295d@gmail.com>

On Tue, Sep 3, 2019 at 11:18 AM David Ahern <dsahern@gmail.com> wrote:
> addrconf_f6i_alloc is used for addresses added by userspace
> (ipv6_add_addr) and anycast. ie., from what I can see it is not used for RAs

Isn't ipv6_add_addr called by addrconf_prefix_rcv_add_addr, which is
called by addrconf_prefix_rcv, which is called by
ndisc_router_discovery? That is what happens when we process an RA;
AFAICS manual configuration is inet6_addr_add, not ipv6_add_addr.

Maciej, with this patch, do SLAAC addresses still have RTF_ADDRCONF?
Per my previous message, my assumption would be no, but I might be
misreading the code.


* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
From: Michael S. Tsirkin @ 2019-09-03  4:39 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190902095723.6vuvp73fdunmiogo@steredhat>

On Mon, Sep 02, 2019 at 11:57:23AM +0200, Stefano Garzarella wrote:
> > 
> > Assuming we miss nothing and buffers < 4K are broken,
> > I think we need to add this to the spec, possibly with
> > a feature bit to relax the requirement that all buffers
> > are at least 4k in size.
> > 
> 
> Okay, should I send a proposal to virtio-dev@lists.oasis-open.org?

How about we also fix the bug for now?

-- 
MST


* Re: [PATCH v4 2/5] vsock/virtio: reduce credit update messages
From: Michael S. Tsirkin @ 2019-09-03  4:38 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190717113030.163499-3-sgarzare@redhat.com>

On Wed, Jul 17, 2019 at 01:30:27PM +0200, Stefano Garzarella wrote:
> In order to reduce the number of credit update messages,
> we send them only when the space available seen by the
> transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  include/linux/virtio_vsock.h            |  1 +
>  net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++++---
>  2 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 7d973903f52e..49fc9d20bc43 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -41,6 +41,7 @@ struct virtio_vsock_sock {
>  
>  	/* Protected by rx_lock */
>  	u32 fwd_cnt;
> +	u32 last_fwd_cnt;
>  	u32 rx_bytes;
>  	struct list_head rx_queue;
>  };
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 095221f94786..a85559d4d974 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -211,6 +211,7 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
>  void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
>  {
>  	spin_lock_bh(&vvs->tx_lock);
> +	vvs->last_fwd_cnt = vvs->fwd_cnt;
>  	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
>  	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
>  	spin_unlock_bh(&vvs->tx_lock);
> @@ -261,6 +262,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
>  	struct virtio_vsock_sock *vvs = vsk->trans;
>  	struct virtio_vsock_pkt *pkt;
>  	size_t bytes, total = 0;
> +	u32 free_space;
>  	int err = -EFAULT;
>  
>  	spin_lock_bh(&vvs->rx_lock);
> @@ -291,11 +293,19 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
>  			virtio_transport_free_pkt(pkt);
>  		}
>  	}
> +
> +	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
> +
>  	spin_unlock_bh(&vvs->rx_lock);
>  
> -	/* Send a credit pkt to peer */
> -	virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
> -					    NULL);
> +	/* We send a credit update only when the space available seen
> +	 * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE

This is just repeating what code does though.
Please include the *reason* for the condition.
E.g. here's a better comment:

	/* To reduce number of credit update messages,
	 * don't update credits as long as lots of space is available.
	 * Note: the limit chosen here is arbitrary. Setting the limit
	 * too high causes extra messages. Too low causes transmitter
	 * stalls. As stalls are in theory more expensive than extra
	 * messages, we set the limit to a high value. TODO: experiment
	 * with different values.
	 */


> +	 */
> +	if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
> +		virtio_transport_send_credit_update(vsk,
> +						    VIRTIO_VSOCK_TYPE_STREAM,
> +						    NULL);
> +	}
>  
>  	return total;
>  
> -- 
> 2.20.1
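
The condition under discussion can be sketched in userspace as follows.
This is only an illustration of the arithmetic in the quoted hunk: the
struct, the helper names and the 64 KiB value used for the limit are
assumptions here, not the kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for VIRTIO_VSOCK_MAX_PKT_BUF_SIZE. */
#define MAX_PKT_BUF_SIZE	(64u * 1024u)

/* Hypothetical subset of the per-socket receive-side state. */
struct credit_state {
	uint32_t buf_alloc;	/* receive buffer size advertised to the peer */
	uint32_t fwd_cnt;	/* bytes consumed by the receiver so far */
	uint32_t last_fwd_cnt;	/* fwd_cnt value last sent to the peer */
};

/* Same expression as free_space in the quoted hunk. */
static uint32_t peer_visible_free_space(const struct credit_state *s)
{
	return s->buf_alloc - (s->fwd_cnt - s->last_fwd_cnt);
}

/* Skip the credit update while plenty of space is still visible. */
static bool should_send_credit_update(const struct credit_state *s)
{
	return peer_visible_free_space(s) < MAX_PKT_BUF_SIZE;
}
```

With a 256 KiB buffer, no update goes out after 100 KiB has been
consumed (156 KiB still visible), but one does after 200 KiB (56 KiB
visible). Once last_fwd_cnt catches up with fwd_cnt the full buffer is
visible again and updates stop.
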


* [PATCH net-next 5/5] net/tls: dedup the record cleanup
From: Jakub Kicinski @ 2019-09-03  4:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, davejwatson, borisp, aviadye, john.fastabend,
	daniel, Jakub Kicinski, John Hurley, Dirk van der Merwe
In-Reply-To: <20190903043106.27570-1-jakub.kicinski@netronome.com>

If the retransmit record hint falls into the cleanup window, we will
free it by just walking the list. No need to duplicate the code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
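A minimal userspace sketch of why the separate free is redundant,
assuming records are kept in ascending end_seq order. The array-based
"list", the freed flag and all names here are hypothetical
simplifications, not the tls_device data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wrap-safe sequence comparison, same idea as the kernel's before(). */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

struct record {
	uint32_t end_seq;
	bool freed;		/* stands in for list_del() + destroy_record() */
};

struct offload_ctx {
	struct record *records;	/* ordered by end_seq */
	size_t n;
	struct record *retransmit_hint;
};

/* Returns the number of records released by this ACK. */
static size_t clean_acked(struct offload_ctx *ctx, uint32_t acked_seq)
{
	struct record *hint = ctx->retransmit_hint;
	size_t i, deleted = 0;

	/* Only drop the reference; the walk below releases the record. */
	if (hint && !seq_before(acked_seq, hint->end_seq))
		ctx->retransmit_hint = NULL;

	for (i = 0; i < ctx->n; i++) {
		struct record *r = &ctx->records[i];

		if (r->freed)
			continue;
		if (seq_before(acked_seq, r->end_seq))
			break;
		r->freed = true;
		deleted++;
	}
	return deleted;
}
```

With records ending at 100, 200 and 300, the hint on the second record
and an ACK of 250, the walk releases the first two records (including
the hinted one) and the hint is cleared without any special casing.
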
 net/tls/tls_device.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 9e1bec1a0a28..41c106e45f01 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -159,12 +159,8 @@ static void tls_icsk_clean_acked(struct sock *sk, u32 acked_seq)
 
 	spin_lock_irqsave(&ctx->lock, flags);
 	info = ctx->retransmit_hint;
-	if (info && !before(acked_seq, info->end_seq)) {
+	if (info && !before(acked_seq, info->end_seq))
 		ctx->retransmit_hint = NULL;
-		list_del(&info->list);
-		destroy_record(info);
-		deleted_records++;
-	}
 
 	list_for_each_entry_safe(info, temp, &ctx->records_list, list) {
 		if (before(acked_seq, info->end_seq))
-- 
2.21.0



* [PATCH net-next 4/5] net/tls: clean up the number of #ifdefs for CONFIG_TLS_DEVICE
From: Jakub Kicinski @ 2019-09-03  4:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, davejwatson, borisp, aviadye, john.fastabend,
	daniel, Jakub Kicinski, John Hurley, Dirk van der Merwe
In-Reply-To: <20190903043106.27570-1-jakub.kicinski@netronome.com>

TLS code has a number of #ifdefs which make the code a little
harder to follow. Recent fixes removed the ifdef around the
TLS_HW define, so we can switch to the often used pattern
of defining tls_device functions as empty static inlines
in the header when CONFIG_TLS_DEVICE=n.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
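A minimal standalone sketch of the stub pattern described above.
FEATURE_OFFLOAD stands in for CONFIG_TLS_DEVICE, and every function name
here is hypothetical; the point is only that callers need no #ifdef once
the header supplies static inline stubs.

```c
#include <errno.h>

#ifdef FEATURE_OFFLOAD
/* Real implementations live in a separate file when the feature is on. */
void offload_init(void);
int offload_setup(int fd);
#else
/* Stubs: init does nothing, setup reports that offload is unsupported. */
static inline void offload_init(void) {}

static inline int offload_setup(int fd)
{
	(void)fd;
	return -EOPNOTSUPP;
}
#endif

enum conf { CONF_SW, CONF_HW };

/* Hypothetical software fallback; always succeeds in this sketch. */
static int software_setup(int fd)
{
	(void)fd;
	return 0;
}

/* Caller without any #ifdef: try offload first, fall back to software. */
static int setup(int fd, enum conf *conf)
{
	int rc;

	rc = offload_setup(fd);
	*conf = CONF_HW;
	if (rc) {
		rc = software_setup(fd);
		if (rc)
			return rc;
		*conf = CONF_SW;
	}
	return 0;
}
```

Built without FEATURE_OFFLOAD, setup() always ends up with CONF_SW,
since the stub fails with -EOPNOTSUPP and the compiler can drop the
offload branch entirely.
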
 include/net/tls.h  | 38 ++++++++++++++++++++++++++++++++------
 net/tls/tls_main.c | 19 +------------------
 net/tls/tls_sw.c   |  6 ++----
 3 files changed, 35 insertions(+), 28 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 6dab6683e42f..c664e6dba0d1 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -366,13 +366,9 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
 			   struct pipe_inode_info *pipe,
 			   size_t len, unsigned int flags);
 
-int tls_set_device_offload(struct sock *sk, struct tls_context *ctx);
 int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 int tls_device_sendpage(struct sock *sk, struct page *page,
 			int offset, size_t size, int flags);
-void tls_device_free_resources_tx(struct sock *sk);
-void tls_device_init(void);
-void tls_device_cleanup(void);
 int tls_tx_records(struct sock *sk, int flags);
 
 struct tls_record_info *tls_get_record(struct tls_offload_context_tx *context,
@@ -649,7 +645,6 @@ int tls_proccess_cmsg(struct sock *sk, struct msghdr *msg,
 		      unsigned char *record_type);
 void tls_register_device(struct tls_device *device);
 void tls_unregister_device(struct tls_device *device);
-int tls_device_decrypted(struct sock *sk, struct sk_buff *skb);
 int decrypt_skb(struct sock *sk, struct sk_buff *skb,
 		struct scatterlist *sgout);
 struct sk_buff *tls_encrypt_skb(struct sk_buff *skb);
@@ -662,9 +657,40 @@ int tls_sw_fallback_init(struct sock *sk,
 			 struct tls_offload_context_tx *offload_ctx,
 			 struct tls_crypto_info *crypto_info);
 
+#ifdef CONFIG_TLS_DEVICE
+void tls_device_init(void);
+void tls_device_cleanup(void);
+int tls_set_device_offload(struct sock *sk, struct tls_context *ctx);
+void tls_device_free_resources_tx(struct sock *sk);
 int tls_set_device_offload_rx(struct sock *sk, struct tls_context *ctx);
-
 void tls_device_offload_cleanup_rx(struct sock *sk);
 void tls_device_rx_resync_new_rec(struct sock *sk, u32 rcd_len, u32 seq);
+int tls_device_decrypted(struct sock *sk, struct sk_buff *skb);
+#else
+static inline void tls_device_init(void) {}
+static inline void tls_device_cleanup(void) {}
 
+static inline int
+tls_set_device_offload(struct sock *sk, struct tls_context *ctx)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void tls_device_free_resources_tx(struct sock *sk) {}
+
+static inline int
+tls_set_device_offload_rx(struct sock *sk, struct tls_context *ctx)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void tls_device_offload_cleanup_rx(struct sock *sk) {}
+static inline void
+tls_device_rx_resync_new_rec(struct sock *sk, u32 rcd_len, u32 seq) {}
+
+static inline int tls_device_decrypted(struct sock *sk, struct sk_buff *skb)
+{
+	return 0;
+}
+#endif
 #endif /* _TLS_OFFLOAD_H */
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 2df1ae8b77fa..ac88877dcade 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -286,19 +286,14 @@ static void tls_sk_proto_cleanup(struct sock *sk,
 		kfree(ctx->tx.rec_seq);
 		kfree(ctx->tx.iv);
 		tls_sw_release_resources_tx(sk);
-#ifdef CONFIG_TLS_DEVICE
 	} else if (ctx->tx_conf == TLS_HW) {
 		tls_device_free_resources_tx(sk);
-#endif
 	}
 
 	if (ctx->rx_conf == TLS_SW)
 		tls_sw_release_resources_rx(sk);
-
-#ifdef CONFIG_TLS_DEVICE
-	if (ctx->rx_conf == TLS_HW)
+	else if (ctx->rx_conf == TLS_HW)
 		tls_device_offload_cleanup_rx(sk);
-#endif
 }
 
 static void tls_sk_proto_close(struct sock *sk, long timeout)
@@ -537,26 +532,18 @@ static int do_tls_setsockopt_conf(struct sock *sk, char __user *optval,
 	}
 
 	if (tx) {
-#ifdef CONFIG_TLS_DEVICE
 		rc = tls_set_device_offload(sk, ctx);
 		conf = TLS_HW;
 		if (rc) {
-#else
-		{
-#endif
 			rc = tls_set_sw_offload(sk, ctx, 1);
 			if (rc)
 				goto err_crypto_info;
 			conf = TLS_SW;
 		}
 	} else {
-#ifdef CONFIG_TLS_DEVICE
 		rc = tls_set_device_offload_rx(sk, ctx);
 		conf = TLS_HW;
 		if (rc) {
-#else
-		{
-#endif
 			rc = tls_set_sw_offload(sk, ctx, 0);
 			if (rc)
 				goto err_crypto_info;
@@ -920,9 +907,7 @@ static int __init tls_register(void)
 	tls_sw_proto_ops = inet_stream_ops;
 	tls_sw_proto_ops.splice_read = tls_sw_splice_read;
 
-#ifdef CONFIG_TLS_DEVICE
 	tls_device_init();
-#endif
 	tcp_register_ulp(&tcp_tls_ulp_ops);
 
 	return 0;
@@ -931,9 +916,7 @@ static int __init tls_register(void)
 static void __exit tls_unregister(void)
 {
 	tcp_unregister_ulp(&tcp_tls_ulp_ops);
-#ifdef CONFIG_TLS_DEVICE
 	tls_device_cleanup();
-#endif
 }
 
 module_init(tls_register);
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 91d21b048a9b..c2b5e0d2ba1a 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1489,13 +1489,12 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb,
 	int pad, err = 0;
 
 	if (!ctx->decrypted) {
-#ifdef CONFIG_TLS_DEVICE
 		if (tls_ctx->rx_conf == TLS_HW) {
 			err = tls_device_decrypted(sk, skb);
 			if (err < 0)
 				return err;
 		}
-#endif
+
 		/* Still not decrypted after tls_device */
 		if (!ctx->decrypted) {
 			err = decrypt_internal(sk, skb, dest, NULL, chunk, zc,
@@ -2014,10 +2013,9 @@ static int tls_read_size(struct strparser *strp, struct sk_buff *skb)
 		ret = -EINVAL;
 		goto read_failure;
 	}
-#ifdef CONFIG_TLS_DEVICE
+
 	tls_device_rx_resync_new_rec(strp->sk, data_len + TLS_HEADER_SIZE,
 				     TCP_SKB_CB(skb)->seq + rxm->offset);
-#endif
 	return data_len + TLS_HEADER_SIZE;
 
 read_failure:
-- 
2.21.0



* [PATCH net-next 3/5] net/tls: narrow down the critical area of device_offload_lock
From: Jakub Kicinski @ 2019-09-03  4:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, davejwatson, borisp, aviadye, john.fastabend,
	daniel, Jakub Kicinski, John Hurley, Dirk van der Merwe
In-Reply-To: <20190903043106.27570-1-jakub.kicinski@netronome.com>

On the setsockopt path we need to hold device_offload_lock from
the moment we check that the netdev is up until the context is
fully ready to be added to the tls_device_list.

There is no need to hold it around get_netdev_for_sock().
Change the code and remove the confusing comment.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
 net/tls/tls_device.c | 46 +++++++++++++++++++++-----------------------
 1 file changed, 22 insertions(+), 24 deletions(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 2cd7318a1338..9e1bec1a0a28 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -935,17 +935,11 @@ int tls_set_device_offload(struct sock *sk, struct tls_context *ctx)
 	if (skb)
 		TCP_SKB_CB(skb)->eor = 1;
 
-	/* We support starting offload on multiple sockets
-	 * concurrently, so we only need a read lock here.
-	 * This lock must precede get_netdev_for_sock to prevent races between
-	 * NETDEV_DOWN and setsockopt.
-	 */
-	down_read(&device_offload_lock);
 	netdev = get_netdev_for_sock(sk);
 	if (!netdev) {
 		pr_err_ratelimited("%s: netdev not found\n", __func__);
 		rc = -EINVAL;
-		goto release_lock;
+		goto disable_cad;
 	}
 
 	if (!(netdev->features & NETIF_F_HW_TLS_TX)) {
@@ -956,10 +950,15 @@ int tls_set_device_offload(struct sock *sk, struct tls_context *ctx)
 	/* Avoid offloading if the device is down
 	 * We don't want to offload new flows after
 	 * the NETDEV_DOWN event
+	 *
+	 * device_offload_lock is taken in tls_device's NETDEV_DOWN
+	 * handler thus protecting from the device going down before
+	 * ctx was added to tls_device_list.
 	 */
+	down_read(&device_offload_lock);
 	if (!(netdev->flags & IFF_UP)) {
 		rc = -EINVAL;
-		goto release_netdev;
+		goto release_lock;
 	}
 
 	ctx->priv_ctx_tx = offload_ctx;
@@ -967,9 +966,10 @@ int tls_set_device_offload(struct sock *sk, struct tls_context *ctx)
 					     &ctx->crypto_send.info,
 					     tcp_sk(sk)->write_seq);
 	if (rc)
-		goto release_netdev;
+		goto release_lock;
 
 	tls_device_attach(ctx, sk, netdev);
+	up_read(&device_offload_lock);
 
 	/* following this assignment tls_is_sk_tx_device_offloaded
 	 * will return true and the context might be accessed
@@ -977,14 +977,14 @@ int tls_set_device_offload(struct sock *sk, struct tls_context *ctx)
 	 */
 	smp_store_release(&sk->sk_validate_xmit_skb, tls_validate_xmit_skb);
 	dev_put(netdev);
-	up_read(&device_offload_lock);
 
 	return 0;
 
-release_netdev:
-	dev_put(netdev);
 release_lock:
 	up_read(&device_offload_lock);
+release_netdev:
+	dev_put(netdev);
+disable_cad:
 	clean_acked_data_disable(inet_csk(sk));
 	crypto_free_aead(offload_ctx->aead_send);
 free_rec_seq:
@@ -1008,17 +1008,10 @@ int tls_set_device_offload_rx(struct sock *sk, struct tls_context *ctx)
 	if (ctx->crypto_recv.info.version != TLS_1_2_VERSION)
 		return -EOPNOTSUPP;
 
-	/* We support starting offload on multiple sockets
-	 * concurrently, so we only need a read lock here.
-	 * This lock must precede get_netdev_for_sock to prevent races between
-	 * NETDEV_DOWN and setsockopt.
-	 */
-	down_read(&device_offload_lock);
 	netdev = get_netdev_for_sock(sk);
 	if (!netdev) {
 		pr_err_ratelimited("%s: netdev not found\n", __func__);
-		rc = -EINVAL;
-		goto release_lock;
+		return -EINVAL;
 	}
 
 	if (!(netdev->features & NETIF_F_HW_TLS_RX)) {
@@ -1029,16 +1022,21 @@ int tls_set_device_offload_rx(struct sock *sk, struct tls_context *ctx)
 	/* Avoid offloading if the device is down
 	 * We don't want to offload new flows after
 	 * the NETDEV_DOWN event
+	 *
+	 * device_offload_lock is taken in tls_devices's NETDEV_DOWN
+	 * handler thus protecting from the device going down before
+	 * ctx was added to tls_device_list.
 	 */
+	down_read(&device_offload_lock);
 	if (!(netdev->flags & IFF_UP)) {
 		rc = -EINVAL;
-		goto release_netdev;
+		goto release_lock;
 	}
 
 	context = kzalloc(TLS_OFFLOAD_CONTEXT_SIZE_RX, GFP_KERNEL);
 	if (!context) {
 		rc = -ENOMEM;
-		goto release_netdev;
+		goto release_lock;
 	}
 	context->resync_nh_reset = 1;
 
@@ -1066,10 +1064,10 @@ int tls_set_device_offload_rx(struct sock *sk, struct tls_context *ctx)
 	down_read(&device_offload_lock);
 release_ctx:
 	ctx->priv_ctx_rx = NULL;
-release_netdev:
-	dev_put(netdev);
 release_lock:
 	up_read(&device_offload_lock);
+release_netdev:
+	dev_put(netdev);
 	return rc;
 }
 
-- 
2.21.0


^ permalink raw reply related
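
The label reordering in the 3/5 patch above follows from taking the lock after the device reference: resources are released in the reverse order of acquisition, so release_lock must now come before release_netdev. A minimal userspace sketch of that shape follows. Every name in it (fake_dev, fake_rwsem, setup_offload, the flags and the error codes chosen) is a hypothetical stand-in for illustration, not the kernel API.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel objects; not the real API. */
struct fake_dev { int refs; bool up; };
struct fake_rwsem { int readers; };

static struct fake_rwsem offload_lock;

static void lock_read(struct fake_rwsem *s)   { s->readers++; }
static void unlock_read(struct fake_rwsem *s) { s->readers--; }
static void dev_hold_ref(struct fake_dev *d)  { d->refs++; }
static void dev_put_ref(struct fake_dev *d)   { d->refs--; }

int setup_offload(struct fake_dev *dev, bool features_ok, bool hw_add_fails)
{
	int rc;

	dev_hold_ref(dev);		/* acquired first, released last */

	if (!features_ok) {		/* checked before the lock is taken */
		rc = -EOPNOTSUPP;	/* error code is an assumption */
		goto release_dev;
	}

	lock_read(&offload_lock);	/* lock covers only the "is it up" window */
	if (!dev->up) {
		rc = -EINVAL;
		goto release_lock;
	}
	if (hw_add_fails) {
		rc = -EIO;		/* error code is an assumption */
		goto release_lock;
	}

	/* Success path: its own releases, no jump into the error labels. */
	unlock_read(&offload_lock);
	dev_put_ref(dev);
	return 0;

release_lock:
	unlock_read(&offload_lock);
release_dev:
	dev_put_ref(dev);
	return rc;
}
```

Whichever exit is taken, the reader count and the reference count return to their starting values, which is the property the reordered labels preserve.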

* [PATCH net-next 2/5] net/tls: don't jump to return
From: Jakub Kicinski @ 2019-09-03  4:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, davejwatson, borisp, aviadye, john.fastabend,
	daniel, Jakub Kicinski, John Hurley, Dirk van der Merwe
In-Reply-To: <20190903043106.27570-1-jakub.kicinski@netronome.com>

Reusing parts of the error path for the normal exit will make the
next commit harder to read; untangle the two.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
 net/tls/tls_device.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index e188139f0464..2cd7318a1338 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -838,22 +838,18 @@ int tls_set_device_offload(struct sock *sk, struct tls_context *ctx)
 	struct net_device *netdev;
 	char *iv, *rec_seq;
 	struct sk_buff *skb;
-	int rc = -EINVAL;
 	__be64 rcd_sn;
+	int rc;
 
 	if (!ctx)
-		goto out;
+		return -EINVAL;
 
-	if (ctx->priv_ctx_tx) {
-		rc = -EEXIST;
-		goto out;
-	}
+	if (ctx->priv_ctx_tx)
+		return -EEXIST;
 
 	start_marker_record = kmalloc(sizeof(*start_marker_record), GFP_KERNEL);
-	if (!start_marker_record) {
-		rc = -ENOMEM;
-		goto out;
-	}
+	if (!start_marker_record)
+		return -ENOMEM;
 
 	offload_ctx = kzalloc(TLS_OFFLOAD_CONTEXT_SIZE_TX, GFP_KERNEL);
 	if (!offload_ctx) {
@@ -982,7 +978,8 @@ int tls_set_device_offload(struct sock *sk, struct tls_context *ctx)
 	smp_store_release(&sk->sk_validate_xmit_skb, tls_validate_xmit_skb);
 	dev_put(netdev);
 	up_read(&device_offload_lock);
-	goto out;
+
+	return 0;
 
 release_netdev:
 	dev_put(netdev);
@@ -999,7 +996,6 @@ int tls_set_device_offload(struct sock *sk, struct tls_context *ctx)
 	ctx->priv_ctx_tx = NULL;
 free_marker_record:
 	kfree(start_marker_record);
-out:
 	return rc;
 }
 
@@ -1058,7 +1054,11 @@ int tls_set_device_offload_rx(struct sock *sk, struct tls_context *ctx)
 		goto free_sw_resources;
 
 	tls_device_attach(ctx, sk, netdev);
-	goto release_netdev;
+	up_read(&device_offload_lock);
+
+	dev_put(netdev);
+
+	return 0;
 
 free_sw_resources:
 	up_read(&device_offload_lock);
-- 
2.21.0


^ permalink raw reply related
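
The 2/5 patch above can be read as two rules: return directly while nothing has been acquired yet, and keep the goto labels for failures that actually have something to undo. A minimal userspace sketch of that structure, assuming hypothetical names (setup_ctx, sketch_setup, the late_failure flag) and arbitrary allocation sizes that are not taken from the kernel code:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical context; only illustrates the control flow. */
struct setup_ctx {
	void *marker;
	void *priv;
};

int sketch_setup(struct setup_ctx *ctx, bool late_failure)
{
	void *marker, *priv;
	int rc;

	/* Nothing acquired yet: plain returns, no "goto out". */
	if (!ctx)
		return -EINVAL;
	if (ctx->priv)
		return -EEXIST;

	marker = malloc(16);
	if (!marker)
		return -ENOMEM;

	priv = calloc(1, 64);
	if (!priv) {
		rc = -ENOMEM;
		goto free_marker;
	}

	if (late_failure) {	/* stands in for any later setup step failing */
		rc = -EIO;
		goto free_priv;
	}

	/* Normal exit is separate from the error labels. */
	ctx->marker = marker;
	ctx->priv = priv;
	return 0;

free_priv:
	free(priv);
free_marker:
	free(marker);
	return rc;
}
```

With the success path ending in its own return, rc no longer needs an initial value, which matches the declaration change in the diff.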

* [PATCH net-next 1/5] net/tls: use the full sk_proto pointer
From: Jakub Kicinski @ 2019-09-03  4:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, davejwatson, borisp, aviadye, john.fastabend,
	daniel, Jakub Kicinski, John Hurley, Dirk van der Merwe
In-Reply-To: <20190903043106.27570-1-jakub.kicinski@netronome.com>

Since we already have the pointer to the full original sk_proto
stored, use that instead of storing all individual callback
pointers as well.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
 drivers/crypto/chelsio/chtls/chtls_main.c |  6 +++--
 include/net/tls.h                         | 10 ---------
 net/tls/tls_main.c                        | 27 +++++++++--------------
 3 files changed, 14 insertions(+), 29 deletions(-)

diff --git a/drivers/crypto/chelsio/chtls/chtls_main.c b/drivers/crypto/chelsio/chtls/chtls_main.c
index 635bb4b447fb..e6df5b95ed47 100644
--- a/drivers/crypto/chelsio/chtls/chtls_main.c
+++ b/drivers/crypto/chelsio/chtls/chtls_main.c
@@ -474,7 +474,8 @@ static int chtls_getsockopt(struct sock *sk, int level, int optname,
 	struct tls_context *ctx = tls_get_ctx(sk);
 
 	if (level != SOL_TLS)
-		return ctx->getsockopt(sk, level, optname, optval, optlen);
+		return ctx->sk_proto->getsockopt(sk, level,
+						 optname, optval, optlen);
 
 	return do_chtls_getsockopt(sk, optval, optlen);
 }
@@ -541,7 +542,8 @@ static int chtls_setsockopt(struct sock *sk, int level, int optname,
 	struct tls_context *ctx = tls_get_ctx(sk);
 
 	if (level != SOL_TLS)
-		return ctx->setsockopt(sk, level, optname, optval, optlen);
+		return ctx->sk_proto->setsockopt(sk, level,
+						 optname, optval, optlen);
 
 	return do_chtls_setsockopt(sk, optname, optval, optlen);
 }
diff --git a/include/net/tls.h b/include/net/tls.h
index ec3c3ed2c6c3..6dab6683e42f 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -275,16 +275,6 @@ struct tls_context {
 	struct proto *sk_proto;
 
 	void (*sk_destruct)(struct sock *sk);
-	void (*sk_proto_close)(struct sock *sk, long timeout);
-
-	int  (*setsockopt)(struct sock *sk, int level,
-			   int optname, char __user *optval,
-			   unsigned int optlen);
-	int  (*getsockopt)(struct sock *sk, int level,
-			   int optname, char __user *optval,
-			   int __user *optlen);
-	int  (*hash)(struct sock *sk);
-	void (*unhash)(struct sock *sk);
 
 	union tls_crypto_context crypto_send;
 	union tls_crypto_context crypto_recv;
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 277f7c209fed..2df1ae8b77fa 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -331,7 +331,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
 		tls_sw_strparser_done(ctx);
 	if (ctx->rx_conf == TLS_SW)
 		tls_sw_free_ctx_rx(ctx);
-	ctx->sk_proto_close(sk, timeout);
+	ctx->sk_proto->close(sk, timeout);
 
 	if (free_ctx)
 		tls_ctx_free(sk, ctx);
@@ -451,7 +451,8 @@ static int tls_getsockopt(struct sock *sk, int level, int optname,
 	struct tls_context *ctx = tls_get_ctx(sk);
 
 	if (level != SOL_TLS)
-		return ctx->getsockopt(sk, level, optname, optval, optlen);
+		return ctx->sk_proto->getsockopt(sk, level,
+						 optname, optval, optlen);
 
 	return do_tls_getsockopt(sk, optname, optval, optlen);
 }
@@ -609,7 +610,8 @@ static int tls_setsockopt(struct sock *sk, int level, int optname,
 	struct tls_context *ctx = tls_get_ctx(sk);
 
 	if (level != SOL_TLS)
-		return ctx->setsockopt(sk, level, optname, optval, optlen);
+		return ctx->sk_proto->setsockopt(sk, level, optname, optval,
+						 optlen);
 
 	return do_tls_setsockopt(sk, optname, optval, optlen);
 }
@@ -624,10 +626,7 @@ static struct tls_context *create_ctx(struct sock *sk)
 		return NULL;
 
 	rcu_assign_pointer(icsk->icsk_ulp_data, ctx);
-	ctx->setsockopt = sk->sk_prot->setsockopt;
-	ctx->getsockopt = sk->sk_prot->getsockopt;
-	ctx->sk_proto_close = sk->sk_prot->close;
-	ctx->unhash = sk->sk_prot->unhash;
+	ctx->sk_proto = sk->sk_prot;
 	return ctx;
 }
 
@@ -683,9 +682,6 @@ static int tls_hw_prot(struct sock *sk)
 
 			spin_unlock_bh(&device_spinlock);
 			tls_build_proto(sk);
-			ctx->hash = sk->sk_prot->hash;
-			ctx->unhash = sk->sk_prot->unhash;
-			ctx->sk_proto_close = sk->sk_prot->close;
 			ctx->sk_destruct = sk->sk_destruct;
 			sk->sk_destruct = tls_hw_sk_destruct;
 			ctx->rx_conf = TLS_HW_RECORD;
@@ -717,7 +713,7 @@ static void tls_hw_unhash(struct sock *sk)
 		}
 	}
 	spin_unlock_bh(&device_spinlock);
-	ctx->unhash(sk);
+	ctx->sk_proto->unhash(sk);
 }
 
 static int tls_hw_hash(struct sock *sk)
@@ -726,7 +722,7 @@ static int tls_hw_hash(struct sock *sk)
 	struct tls_device *dev;
 	int err;
 
-	err = ctx->hash(sk);
+	err = ctx->sk_proto->hash(sk);
 	spin_lock_bh(&device_spinlock);
 	list_for_each_entry(dev, &device_list, dev_list) {
 		if (dev->hash) {
@@ -816,7 +812,6 @@ static int tls_init(struct sock *sk)
 
 	ctx->tx_conf = TLS_BASE;
 	ctx->rx_conf = TLS_BASE;
-	ctx->sk_proto = sk->sk_prot;
 	update_sk_prot(sk, ctx);
 out:
 	write_unlock_bh(&sk->sk_callback_lock);
@@ -828,12 +823,10 @@ static void tls_update(struct sock *sk, struct proto *p)
 	struct tls_context *ctx;
 
 	ctx = tls_get_ctx(sk);
-	if (likely(ctx)) {
-		ctx->sk_proto_close = p->close;
+	if (likely(ctx))
 		ctx->sk_proto = p;
-	} else {
+	else
 		sk->sk_prot = p;
-	}
 }
 
 static int tls_get_info(const struct sock *sk, struct sk_buff *skb)
-- 
2.21.0


^ permalink raw reply related
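
The idea behind the 1/5 patch above is that a layer which overrides a socket's operations table only needs to remember one pointer to the original table; each individual callback can be reached through it. A minimal userspace sketch of that pattern, where all types, functions and the SKETCH_LEVEL value are hypothetical and only mimic the shape of the kernel structures:

```c
#include <stddef.h>

#define SKETCH_LEVEL 1	/* hypothetical option level owned by the wrapper */

struct sketch_sock;

struct sketch_proto {
	int (*getsockopt)(struct sketch_sock *sk, int level, int optname);
	void (*close)(struct sketch_sock *sk);
};

struct sketch_ctx {
	const struct sketch_proto *sk_proto;	/* the single saved pointer */
};

struct sketch_sock {
	const struct sketch_proto *prot;
	struct sketch_ctx *ctx;
	int closed;
};

/* "Original" protocol operations. */
static int base_getsockopt(struct sketch_sock *sk, int level, int optname)
{
	(void)sk;
	return level * 100 + optname;
}

static void base_close(struct sketch_sock *sk)
{
	sk->closed = 1;
}

static const struct sketch_proto base_proto = {
	.getsockopt = base_getsockopt,
	.close = base_close,
};

/* Wrapper operations delegate through the saved table. */
static int wrap_getsockopt(struct sketch_sock *sk, int level, int optname)
{
	if (level != SKETCH_LEVEL)
		return sk->ctx->sk_proto->getsockopt(sk, level, optname);
	return -optname;	/* placeholder for the wrapper's own handling */
}

static void wrap_close(struct sketch_sock *sk)
{
	sk->ctx->sk_proto->close(sk);
}

static const struct sketch_proto wrap_proto = {
	.getsockopt = wrap_getsockopt,
	.close = wrap_close,
};

/* Save the original table once, then install the wrapper. */
void sketch_attach(struct sketch_sock *sk, struct sketch_ctx *ctx)
{
	ctx->sk_proto = sk->prot;
	sk->ctx = ctx;
	sk->prot = &wrap_proto;
}
```

Because the saved pointer is the only copy of the original operations, replacing it later (as tls_update() does in the diff) updates every delegated callback at once, with no per-callback field to keep in sync.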

* [PATCH net-next 0/5] net/tls: minor cleanups
From: Jakub Kicinski @ 2019-09-03  4:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, davejwatson, borisp, aviadye, john.fastabend,
	daniel, Jakub Kicinski

Hi!

This set is a grab bag of TLS cleanups accumulated in my tree
in an attempt to avoid merge problems with net. Nothing stands
out. The first patch dedups context information. Next, control
path locking is very slightly optimized. The fourth patch cleans
up ugly #ifdefs.

Jakub Kicinski (5):
  net/tls: use the full sk_proto pointer
  net/tls: don't jump to return
  net/tls: narrow down the critical area of device_offload_lock
  net/tls: clean up the number of #ifdefs for CONFIG_TLS_DEVICE
  net/tls: dedup the record cleanup

 drivers/crypto/chelsio/chtls/chtls_main.c |  6 +-
 include/net/tls.h                         | 48 +++++++++-----
 net/tls/tls_device.c                      | 78 +++++++++++------------
 net/tls/tls_main.c                        | 46 ++++---------
 net/tls/tls_sw.c                          |  6 +-
 5 files changed, 85 insertions(+), 99 deletions(-)

-- 
2.21.0


^ permalink raw reply

* RE: [PATCH v2 5/6] mdev: Update sysfs documentation
From: Parav Pandit @ 2019-09-03  3:53 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: alex.williamson@redhat.com, Jiri Pirko, kwankhede@nvidia.com,
	davem@davemloft.net, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20190902163658.51fc48d2.cohuck@redhat.com>



> -----Original Message-----
> From: Cornelia Huck <cohuck@redhat.com>
> Sent: Monday, September 2, 2019 8:07 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: alex.williamson@redhat.com; Jiri Pirko <jiri@mellanox.com>;
> kwankhede@nvidia.com; davem@davemloft.net; kvm@vger.kernel.org;
> linux-kernel@vger.kernel.org; netdev@vger.kernel.org
> Subject: Re: [PATCH v2 5/6] mdev: Update sysfs documentation
> 
> On Fri, 30 Aug 2019 13:10:17 +0000
> Parav Pandit <parav@mellanox.com> wrote:
> 
> > > -----Original Message-----
> > > From: Cornelia Huck <cohuck@redhat.com>
> > > Sent: Friday, August 30, 2019 6:19 PM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: alex.williamson@redhat.com; Jiri Pirko <jiri@mellanox.com>;
> > > kwankhede@nvidia.com; davem@davemloft.net; kvm@vger.kernel.org;
> > > linux-kernel@vger.kernel.org; netdev@vger.kernel.org
> > > Subject: Re: [PATCH v2 5/6] mdev: Update sysfs documentation
> > >
> > > On Thu, 29 Aug 2019 06:19:03 -0500
> > > Parav Pandit <parav@mellanox.com> wrote:
> > >
> > > > Updated documentation for optional read only sysfs attribute.
> > >
> > > I'd probably merge this into the patch introducing the attribute.
> > >
> > Ok. I will spin v3.
> >
> > > >
> > > > Signed-off-by: Parav Pandit <parav@mellanox.com>
> > > > ---
> > > >  Documentation/driver-api/vfio-mediated-device.rst | 5 +++++
> > > >  1 file changed, 5 insertions(+)
> > > >
> > > > diff --git a/Documentation/driver-api/vfio-mediated-device.rst
> > > > b/Documentation/driver-api/vfio-mediated-device.rst
> > > > index 25eb7d5b834b..0ab03d3f5629 100644
> > > > --- a/Documentation/driver-api/vfio-mediated-device.rst
> > > > +++ b/Documentation/driver-api/vfio-mediated-device.rst
> > > > @@ -270,6 +270,7 @@ Directories and Files Under the sysfs for Each
> > > > mdev
> > > Device
> > > >           |--- remove
> > > >           |--- mdev_type {link to its type}
> > > >           |--- vendor-specific-attributes [optional]
> > > > +         |--- alias [optional]
> > >
> > > "optional" implies "not always present" to me, not "might return a
> > > read error if not available". Don't know if there's a better way to
> > > tag this? Or make it really optional? :)
> >
> > May be write it as,
> >
> > alias [ optional when requested by parent ]
> 
> I'm not sure what 'optional when requested' is supposed to mean...
> maybe something like 'content optional' or so?
> 
> >
> > >
> > > >
> > > >  * remove (write only)
> > > >
> > > > @@ -281,6 +282,10 @@ Example::
> > > >
> > > >  	# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove
> > > >
> > > > +* alias (read only)
> > > > +Whenever a parent requested to generate an alias, each mdev is
> > > > +assigned a unique alias by the mdev core. This file shows the
> > > > +alias of the
> > > mdev device.
> > >
> > > It's not really the parent, but the vendor driver requesting this,
> > > right? Also,
> > At mdev level, it only knows parent->ops structure, whether parent is
> registered by vendor driver or something else.
> 
> Who else is supposed to create the mdev device?
If you nitpick the language, what is the vendor id for the sample mtty driver?
mtty is not a 'vendor driver' per se.

> 
> >
> > > "each mdev" is a bit ambiguous,
> > It is in context of the parent. Sentence is not starting with "each mdev".
> > But may be more verbosely written as,
> >
> > Whenever a parent requested to generate an alias, Each mdev device of
> > such parent is assigned unique alias by the mdev core. This file shows the
> alias of the mdev device.
> 
> I'd really leave the parent out of this: this seems more like an
> implementation detail. It's more that alias may either contain an alias, or
> return a read error if no alias has been generated. Who requested the alias
> to be generated is probably not really of interest to the userspace reader.
>

The documentation is for both users and developers.
The claim that only the user cares about this is not right.
Otherwise all the .ko diagrams, API descriptions, etc. would not make any sense to the user.

For the user it doesn't matter whether the alias length is provided by the 'vendor driver' or the 'registered parent'.
This note on who should specify the alias length is mainly for the developers.
 
> >
> > > created via that driver. Lastly, if we stick with the "returns an
> > > error if not implemented" approach, that should also be mentioned
> here.
> > Ok. Will spin v3 to describe it.
> >
> > >
> > > > +
> > > >  Mediated device Hot plug
> > > >  ------------------------
> > > >
> >


^ permalink raw reply
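
From the userspace side, the behaviour discussed in the thread above means a reader of the alias file has to handle two outcomes: a string, or a failed open or read. A minimal sketch of such a reader follows, assuming the attribute sits next to remove under /sys/bus/mdev/devices/$mdev_UUID/. The function name read_sysfs_attr is hypothetical, and which error the kernel reports when no alias exists is not settled in this thread, so the code only passes errno through.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Read one line from a sysfs-style attribute file into buf.
 * Returns 0 on success or a negative errno value on failure.
 */
int read_sysfs_attr(const char *path, char *buf, size_t len)
{
	FILE *f;
	int err;

	if (!buf || len == 0)
		return -EINVAL;

	f = fopen(path, "r");
	if (!f)
		return -errno;		/* e.g. attribute not present */

	errno = 0;
	if (!fgets(buf, (int)len, f)) {
		if (ferror(f)) {	/* read error reported by the kernel */
			err = errno ? errno : EIO;
			fclose(f);
			return -err;
		}
		buf[0] = '\0';		/* empty file: no content */
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';	/* strip the trailing newline */
	return 0;
}
```

A caller would build the path from the mdev UUID and treat any negative return as "no alias available" rather than as a fatal error.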

* RE: [PATCH v2 1/6] mdev: Introduce sha1 based mdev alias
From: Parav Pandit @ 2019-09-03  3:47 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: alex.williamson@redhat.com, Jiri Pirko, kwankhede@nvidia.com,
	davem@davemloft.net, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20190902164604.1d04614f.cohuck@redhat.com>



> -----Original Message-----
> From: Cornelia Huck <cohuck@redhat.com>
> Sent: Monday, September 2, 2019 8:16 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: alex.williamson@redhat.com; Jiri Pirko <jiri@mellanox.com>;
> kwankhede@nvidia.com; davem@davemloft.net; kvm@vger.kernel.org;
> linux-kernel@vger.kernel.org; netdev@vger.kernel.org
> Subject: Re: [PATCH v2 1/6] mdev: Introduce sha1 based mdev alias
> 
> On Fri, 30 Aug 2019 15:45:13 +0000
> Parav Pandit <parav@mellanox.com> wrote:
> 
> > > > > > > This detour via the local variable looks weird to me. Can
> > > > > > > you either create the alias directly in the mdev (would need
> > > > > > > to happen later in the function, but I'm not sure why you
> > > > > > > generate the alias before checking for duplicates anyway), or do
> an explicit copy?
> > > > > > Alias duplicate check is done after generating it, because
> > > > > > duplicate alias are
> > > > > not allowed.
> > > > > > The probability of collision is rare.
> > > > > > So it is speculatively generated without hold the lock,
> > > > > > because there is no
> > > > > need to hold the lock.
> > > > > > It is compared along with guid while mutex lock is held in single
> loop.
> > > > > > And if it is duplicate, there is no need to allocate mdev.
> > > > > >
> > > > > > It will be sub optimal to run through the mdev list 2nd time
> > > > > > after mdev
> > > > > creation and after generating alias for duplicate check.
> > > > >
> > > > > Ok, but what about copying it? I find this "set local variable
> > > > > to NULL after ownership is transferred" pattern a bit unintuitive.
> > > > > Copying it to the mdev (and then unconditionally freeing it)
> > > > > looks more
> > > obvious to me.
> > > > Its not unconditionally freed.
> > >
> > > That's not what I have been saying :(
> > >
> > Ah I see. You want to allocate alias memory twice; once inside mdev device
> and another one in _create() function.
> > _create() one you want to free unconditionally.
> >
> > Well, passing pointer is fine.
> 
> It's not that it doesn't work, but it feels fragile due to its non-obviousness.
And it's well commented, as Alex asked.

> 
> > mdev_register_device() has similar little tricky pattern that makes parent =
> NULL on __find_parent_device() finds duplicate one.
> 
> I don't think that the two are comparable.
>
They are very similar.
Why else should parent be set to NULL?

> >
> > Ownership transfer is more straight forward code.
> 
> I have to disagree here.
>
Ok. It is better than allocating memory twice. So I prefer to stick to this method.
 
> >
> > It is similar to device_initialize(), device init sequence code, where once
> device_initialize is done, freeing the device memory will be left to the
> put_device(), we don't call kfree() on mdev device.
> 
> This does not really look similar to me: devices are refcounted structures,
> while strings aren't; you transfer a local pointer to a refcounted structure
> and then discard the local reference.

^ permalink raw reply
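
The two approaches debated in the thread above can be put side by side: transferring ownership of the generated alias and clearing the local pointer, versus copying it into the device and always freeing the local buffer. A minimal userspace sketch, in which every name (sketch_mdev, dup_str, create_by_transfer, create_by_copy, the is_duplicate flag) is hypothetical and the real mdev code is not reproduced:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct sketch_mdev {
	char *alias;
};

/* Portable stand-in for strdup(). */
static char *dup_str(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/* Variant A: hand the buffer over, then clear the local pointer. */
int create_by_transfer(struct sketch_mdev *dev, const char *name,
		       int is_duplicate)
{
	char *alias = dup_str(name);	/* generated before any checks */
	int rc = 0;

	if (!alias)
		return -ENOMEM;
	if (is_duplicate) {
		rc = -EEXIST;
		goto out;
	}

	dev->alias = alias;
	alias = NULL;			/* ownership moved to dev */
out:
	free(alias);			/* no-op after a transfer */
	return rc;
}

/* Variant B: copy into the device, always free the local buffer. */
int create_by_copy(struct sketch_mdev *dev, const char *name,
		   int is_duplicate)
{
	char *alias = dup_str(name);
	int rc = 0;

	if (!alias)
		return -ENOMEM;
	if (is_duplicate) {
		rc = -EEXIST;
		goto out;
	}

	dev->alias = dup_str(alias);	/* second allocation */
	if (!dev->alias)
		rc = -ENOMEM;
out:
	free(alias);
	return rc;
}
```

Variant A allocates once but relies on the reader noticing the NULL assignment; variant B is easier to follow at the cost of a second allocation and one more failure case, which is the trade-off both sides of the thread describe.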
