Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] brcmfmac: stop watchdog before detach and free everything
From: Arend van Spriel @ 2018-05-29  9:25 UTC (permalink / raw)
  To: Michael Trimarchi
  Cc: Franky Lin, Hante Meuleman, Chi-Hsien Lin, Wright Feng,
	Kalle Valo, David S. Miller, Pieter-Paul Giesberts, Ian Molton,
	linux-wireless, brcm80211-dev-list.pdl, brcm80211-dev-list,
	netdev, linux-kernel
In-Reply-To: <1527493857-2220-1-git-send-email-michael@amarulasolutions.com>

On 5/28/2018 9:50 AM, Michael Trimarchi wrote:
> Watchdog need to be stopped in brcmf_sdio_remove to avoid
> i
> The system is going down NOW!
> [ 1348.110759] Unable to handle kernel NULL pointer dereference at virtual address 000002f8
> Sent SIGTERM to all processes

[snip]

Please send a V2 with your configuration details to the commit message, 
ie. using built-in driver, no firmware in place, etc.

Reviewed-by: Arend van Spriel <arend.vanspriel@broadcom.com>
> Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com>
> ---
>   drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 7 +++++++
>   1 file changed, 7 insertions(+)

^ permalink raw reply

* Re: [PATCH] mac80211_hwsim: add error check to call to rhashtable_init
From: Colin Ian King @ 2018-05-29  9:23 UTC (permalink / raw)
  To: Johannes Berg, Kalle Valo, David S . Miller, linux-wireless,
	netdev
  Cc: kernel-janitors, linux-kernel
In-Reply-To: <1527585428.6955.7.camel@sipsolutions.net>

On 29/05/18 10:17, Johannes Berg wrote:
> On Tue, 2018-05-29 at 10:14 +0100, Colin King wrote:
>>
>> @@ -3573,7 +3573,9 @@ static int __init init_mac80211_hwsim(void)
>>  	hwsim_wq = alloc_workqueue("hwsim_wq", 0, 0);
>>  	if (!hwsim_wq)
>>  		return -ENOMEM;
>> -	rhashtable_init(&hwsim_radios_rht, &hwsim_rht_params);
>> +	err = rhashtable_init(&hwsim_radios_rht, &hwsim_rht_params);
>> +	if (err)
>> +		return err;
> 
> That's missing a workqueue free, but I can fix that while applying if
> you prefer.

Please do. Thanks
> 
> johannes
> 

^ permalink raw reply

* Re: [PATCH] brcmfmac: stop watchdog before detach and free everything
From: Arend van Spriel @ 2018-05-29  9:22 UTC (permalink / raw)
  To: Michael Nazzareno Trimarchi, Andy Shevchenko
  Cc: Franky Lin, Hante Meuleman, Chi-Hsien Lin, Wright Feng,
	Kalle Valo, David S. Miller, Pieter-Paul Giesberts, Ian Molton,
	open list:TI WILINK WIRELES...,
	open list:BROADCOM BRCM80211 IEEE802.11n WIRELESS DRIVER,
	brcm80211-dev-list, netdev, LKML
In-Reply-To: <CAOf5uwkLT_-esTkqj-Mhbsn_f-JETs+XF+vd8v=U+FQSoSPcQA@mail.gmail.com>

On 5/28/2018 5:33 PM, Michael Nazzareno Trimarchi wrote:
> Hi Andy
>
> The problem seems really easy to solve:
>
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> index 412a05b..ba60b151 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> @@ -4227,13 +4227,7 @@ struct brcmf_sdio *brcmf_sdio_probe(struct
> brcmf_sdio_dev *sdiodev)
>          timer_setup(&bus->timer, brcmf_sdio_watchdog, 0);
>          /* Initialize watchdog thread */
>          init_completion(&bus->watchdog_wait);
> -       bus->watchdog_tsk = kthread_run(brcmf_sdio_watchdog_thread,
> -                                       bus, "brcmf_wdog/%s",
> -                                       dev_name(&sdiodev->func1->dev));
> -       if (IS_ERR(bus->watchdog_tsk)) {
> -               pr_warn("brcmf_watchdog thread failed to start\n");
> -               bus->watchdog_tsk = NULL;
> -       }
> +
>          /* Initialize DPC thread */
>          bus->dpc_triggered = false;
>          bus->dpc_running = false;
> @@ -4281,6 +4275,14 @@ struct brcmf_sdio *brcmf_sdio_probe(struct
> brcmf_sdio_dev *sdiodev)
>                  goto fail;
>          }
>
> +       bus->watchdog_tsk = kthread_run(brcmf_sdio_watchdog_thread,
> +                                       bus, "brcmf_wdog/%s",
> +                                       dev_name(&sdiodev->func1->dev));
> +       if (IS_ERR(bus->watchdog_tsk)) {
> +               pr_warn("brcmf_watchdog thread failed to start\n");
> +               bus->watchdog_tsk = NULL;
> +       }
> +
>          return bus;

Hi Michael,

That makes no sense. Or are you saying the function 
brcmf_fw_get_firmwares() fails for you? Oh wait, you mentioned the 
driver was built-in, right? The above change does not solve the issue. 
It just makes it less likely to occur. So I think your initial fix is 
the best solution for this.

Regards,
Arend

^ permalink raw reply

* Re: [PATCH] mac80211_hwsim: add error check to call to rhashtable_init
From: Johannes Berg @ 2018-05-29  9:17 UTC (permalink / raw)
  To: Colin King, Kalle Valo, David S . Miller, linux-wireless, netdev
  Cc: kernel-janitors, linux-kernel
In-Reply-To: <20180529091412.8530-1-colin.king@canonical.com>

On Tue, 2018-05-29 at 10:14 +0100, Colin King wrote:
> 
> @@ -3573,7 +3573,9 @@ static int __init init_mac80211_hwsim(void)
>  	hwsim_wq = alloc_workqueue("hwsim_wq", 0, 0);
>  	if (!hwsim_wq)
>  		return -ENOMEM;
> -	rhashtable_init(&hwsim_radios_rht, &hwsim_rht_params);
> +	err = rhashtable_init(&hwsim_radios_rht, &hwsim_rht_params);
> +	if (err)
> +		return err;

That's missing a workqueue free, but I can fix that while applying if
you prefer.

johannes

^ permalink raw reply

* [PATCH] mac80211_hwsim: add error check to call to rhashtable_init
From: Colin King @ 2018-05-29  9:14 UTC (permalink / raw)
  To: Johannes Berg, Kalle Valo, David S . Miller, linux-wireless,
	netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

The call to rhashtable_init is missing an error return check, it is
possible for this to fail various different ways, so fix this by adding
an error check.

Detected by: CoverityScan, CID#1469446 ("Unchecked return value")

Fixes: c6509cc3b3e8 ("mac80211_hwsim: add hashtable with mac address keys for faster lookup")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/wireless/mac80211_hwsim.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index 89fc22520d40..f4b4f5690b16 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -3573,7 +3573,9 @@ static int __init init_mac80211_hwsim(void)
 	hwsim_wq = alloc_workqueue("hwsim_wq", 0, 0);
 	if (!hwsim_wq)
 		return -ENOMEM;
-	rhashtable_init(&hwsim_radios_rht, &hwsim_rht_params);
+	err = rhashtable_init(&hwsim_radios_rht, &hwsim_rht_params);
+	if (err)
+		return err;
 
 	err = register_pernet_device(&hwsim_net_ops);
 	if (err)
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH net-next] net: remove bypassed check in sch_direct_xmit()
From: Sergei Shtylyov @ 2018-05-29  8:58 UTC (permalink / raw)
  To: Song Liu, netdev
  Cc: kernel-team, John Fastabend, Alexei Starovoitov, David S . Miller
In-Reply-To: <20180528213632.1412353-1-songliubraving@fb.com>

Hello!

On 5/29/2018 12:36 AM, Song Liu wrote:

> Check sch_direct_xmit() at the end of sch_direct_xmit() will be bypassed.

    "Checking netif_xmit_frozen_or_stopped()", perhaps? Else it doesn't make 
much sense...

> This is because "ret" from sch_direct_xmit() will be either NETDEV_TX_OK
> or NETDEV_TX_BUSY, and only ret == NETDEV_TX_OK == 0 will reach the
> condition:
> 
>      if (ret && netif_xmit_frozen_or_stopped(txq))
>          return false;
> 
> This patch cleans up the code by removing  the whole condition.
> 
> For more discussion about this, please refer to
>     https://marc.info/?t=152727195700008
> 
> Signed-off-by: Song Liu <songliubraving@fb.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: David S. Miller <davem@davemloft.net>
[...]

MBR, Sergei

^ permalink raw reply

* Re: [PATCH 0/4 RFCv2] Realtek SMI RTL836x DSA driver
From: Linus Walleij @ 2018-05-29  8:49 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Vivien Didelot, Florian Fainelli, netdev,
	OpenWrt Development List, LEDE Development List
In-Reply-To: <20180528182037.GC27177@lunn.ch>

On Mon, May 28, 2018 at 8:20 PM, Andrew Lunn <andrew@lunn.ch> wrote:
> On Mon, May 28, 2018 at 07:47:48PM +0200, Linus Walleij wrote:
>> This is a second RFC version of the DSA driver for Realtek
>> RTL8366x especially RTL8366RB.
>>
>> I've been beating my head against this one and I'm not really
>> clear on why my ethernet frames are not coming through to the
>> CPU port on the chip.
>>
>> It appears when using ethtool -S on the ports that packets
>> are passing fine into the router fabric and through to the
>> CPU port but the ethernet driver where the fixed link is
>> connected refuse to accept the packages.
>
> Hi Linus
>
> Have you played with RGMII delays?

No not like I changed them or anything... the SoC has some
set-up for skew and delay on the nanosecond level, but I used the
vendor defaults, verified to be the same in their custom
kernel tree.

It's this stuff from the DTS:

+                                       conf0 {
+                                               pins = "V8 GMAC0
RXDV", "T10 GMAC1 RXDV";
+                                               skew-delay = <0>;
+                                       };
+                                       conf1 {
+                                               pins = "Y7 GMAC0 RXC",
"Y11 GMAC1 RXC";
+                                               skew-delay = <15>;
+                                       };
+                                       conf2 {
+                                               pins = "T8 GMAC0
TXEN", "W11 GMAC1 TXEN";
+                                               skew-delay = <7>;
+                                       };
+                                       conf3 {
+                                               pins = "U8 GMAC0 TXC";
+                                               skew-delay = <11>;
+                                       };
+                                       conf4 {
+                                               pins = "V11 GMAC1 TXC";
+                                               skew-delay = <10>;
+                                       };

Yours,
Linus Walleij

^ permalink raw reply

* Re: [PATCH bpf-next v2 0/3] bpf: add boot parameters for sysctl knobs
From: Jesper Dangaard Brouer @ 2018-05-29  8:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eugene Syromiatnikov, netdev, linux-kernel, linux-doc, Kees Cook,
	Kai-Heng Feng, Daniel Borkmann, Alexei Starovoitov,
	Jonathan Corbet, Jiri Olsa, brouer
In-Reply-To: <20180525194359.qckwj3tnsbufyapz@ast-mbp>

On Fri, 25 May 2018 12:44:01 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Fri, May 25, 2018 at 06:50:09PM +0200, Eugene Syromiatnikov wrote:
> > On Thu, May 24, 2018 at 04:34:51PM -0700, Alexei Starovoitov wrote:  
> > > On Thu, May 24, 2018 at 09:41:08AM +0200, Jesper Dangaard Brouer wrote:  
> > > > On Wed, 23 May 2018 15:02:45 -0700
> > > > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > > >   
> > > > > On Wed, May 23, 2018 at 02:18:19PM +0200, Eugene Syromiatnikov wrote:  
> > > > > > Some BPF sysctl knobs affect the loading of BPF programs, and during
> > > > > > system boot/init stages these sysctls are not yet configured.
> > > > > > A concrete example is systemd, that has implemented loading of BPF
> > > > > > programs.
> > > > > > 
> > > > > > Thus, to allow controlling these setting at early boot, this patch set
> > > > > > adds the ability to change the default setting of these sysctl knobs
> > > > > > as well as option to override them via a boot-time kernel parameter
> > > > > > (in order to avoid rebuilding kernel each time a need of changing these
> > > > > > defaults arises).
> > > > > > 
> > > > > > The sysctl knobs in question are kernel.unprivileged_bpf_disable,
> > > > > > net.core.bpf_jit_harden, and net.core.bpf_jit_kallsyms.    
> > > > > 
> > > > > - systemd is root. today it only uses cgroup-bpf progs which require root,
> > > > >   so disabling unpriv during boot time makes no difference to systemd.
> > > > >   what is the actual reason to present time?  
> > systemd also runs a lot of code, some of which is unprivileged.  
> 
> systemd processes sysctl configs first. It's essential for system
> security to do so. If you have concerns in how systemd does that
> please bring it up with systemd folks.
> 
> > > > > - say in the future systemd wants to use so_reuseport+bpf for faster
> > > > >   networking. With unpriv disable during boot, it will force systemd
> > > > >   to do such networking from root, which will lower its security barrier.  
> > No, it will force systemd not to use SO_REUSEPORT BPF.  
> 
> sorry this argument makes no sense to me.
> 
> > > > > - bpf_jit_kallsyms sysctl has immediate effect on loaded programs.
> > > > >   Flipping it during the boot or right after or any time after
> > > > >   is the same thing. Why add such boot flag then?  
> > Well, that one was for completeness.  
> 
> Should we convert all sysctls to bootparams for 'completeness' ?

The bpf_jit_kallsyms option is not there for 'completeness', it is
there because I know Daniel talked about changing the default, and this
would provide an easy option for allowing changing this default (to on)
in different distros before flipping it on in some kernel release.


> > > > > - jit_harden can be turned on by systemd. so turning it during the boot
> > > > >   will make systemd progs to be constant blinded.
> > > > >   Constant blinding protects kernel from unprivileged JIT spraying.
> > > > >   Are you worried that systemd will attack the kernel with JIT spraying?  
> > I'm worried that systemd can be exploited for a JIT spraying attack.  
> 
> I'm afraid we're not on the same page with definition of 'JIT spraying attack'.
> 
> > Another thing I'm concerned with is that the generated code is different,
> > which introduces additional complication during debugging.  
> 
> specifically what kind of complication?
> 

I think kernel.unprivileged_bpf_disable is the best example, why we
need this facility.  Once kernel.unprivileged_bpf_disable is set to 1,
it cannot be turned back to 0.

It is a policy decision that I want unprivileged_bpf_disable=1.
Setting this policy in the sysctl step is too late, because as you
write above, you expect that e.g. systemd or other boot-scripts
want/can load bpf-progs in unpriv mode.  Thus, they can violate my
policy choice.

This also leads to the before mentioned inconsistency issue. The
init-script/systemd-service will during boot successfully load unpriv
bpf-prog, but when I afterwards when I reload (or stop/start) the
service, it will fail as it no longer can load the unpriv bpf program.

If a distribution makes this policy choice, then users cannot change it
back to unprivileged_bpf_disable=0 via sysctl.  Thus, we need a boot
param to allow users to change this policy.  If the disto didn't change
this default, then we still need the boot-param, to allow users to
enforce this policy choice towards init-scripts (as explained above).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH net-next 3/3] bnxt_en: Implement .ndo_set_vf_queues().
From: Michael Chan @ 2018-05-29  8:18 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1527581920-3778-1-git-send-email-michael.chan@broadcom.com>

Implement .ndo_set_vf_queues() on the PF driver to configure the queues
parameters for individual VFs.  This allows the admin. on the host to
increase or decrease queues for individual VFs.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c       |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 130 ++++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h |   2 +
 3 files changed, 133 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index dfa0839..2ce9779 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -8373,6 +8373,7 @@ static int bnxt_swdev_port_attr_get(struct net_device *dev,
 	.ndo_set_vf_link_state	= bnxt_set_vf_link_state,
 	.ndo_set_vf_spoofchk	= bnxt_set_vf_spoofchk,
 	.ndo_set_vf_trust	= bnxt_set_vf_trust,
+	.ndo_set_vf_queues	= bnxt_set_vf_queues,
 #endif
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= bnxt_poll_controller,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index 7a92125..a34a32f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -138,6 +138,136 @@ int bnxt_set_vf_trust(struct net_device *dev, int vf_id, bool trusted)
 	return 0;
 }
 
+static bool bnxt_param_ok(int new, u16 curr, u16 avail)
+{
+	int delta;
+
+	if (new <= curr)
+		return true;
+
+	delta = new - curr;
+	if (delta <= avail)
+		return true;
+	return false;
+}
+
+static void bnxt_adjust_ring_resc(struct bnxt *bp, struct bnxt_vf_info *vf,
+				  struct hwrm_func_vf_resource_cfg_input *req)
+{
+	struct bnxt_hw_resc *hw_resc = &bp->hw_resc;
+	u16 avail_cp_rings, avail_stat_ctx;
+	u16 avail_vnics, avail_ring_grps;
+	u16 cp, grp, stat, vnic;
+	u16 min_tx, min_rx;
+
+	min_tx = le16_to_cpu(req->min_tx_rings);
+	min_rx = le16_to_cpu(req->min_rx_rings);
+	avail_cp_rings = hw_resc->max_cp_rings - bp->cp_nr_rings;
+	avail_stat_ctx = hw_resc->max_stat_ctxs - bp->num_stat_ctxs;
+	avail_ring_grps = hw_resc->max_hw_ring_grps - bp->rx_nr_rings;
+	avail_vnics = hw_resc->max_vnics - bp->nr_vnics;
+
+	cp = max_t(u16, 2 * min_tx, min_rx);
+	if (cp > vf->min_cp_rings)
+		cp = min_t(u16, cp, avail_cp_rings + vf->min_cp_rings);
+	grp = min_tx;
+	if (grp > vf->min_ring_grps)
+		grp = min_t(u16, grp, avail_ring_grps + vf->min_ring_grps);
+	stat = min_rx;
+	if (stat > vf->min_stat_ctxs)
+		stat = min_t(u16, stat, avail_stat_ctx + vf->min_stat_ctxs);
+	vnic = min_rx;
+	if (vnic > vf->min_vnics)
+		vnic = min_t(u16, vnic, avail_vnics + vf->min_vnics);
+
+	req->min_cmpl_rings = req->max_cmpl_rings = cpu_to_le16(cp);
+	req->min_hw_ring_grps = req->max_hw_ring_grps = cpu_to_le16(grp);
+	req->min_stat_ctx = req->max_stat_ctx = cpu_to_le16(stat);
+	req->min_vnics = req->max_vnics = cpu_to_le16(vnic);
+}
+
+static void bnxt_record_ring_resc(struct bnxt *bp, struct bnxt_vf_info *vf,
+				  struct hwrm_func_vf_resource_cfg_input *req)
+{
+	struct bnxt_hw_resc *hw_resc = &bp->hw_resc;
+
+	hw_resc->max_tx_rings += vf->min_tx_rings;
+	hw_resc->max_rx_rings += vf->min_rx_rings;
+	vf->min_tx_rings = le16_to_cpu(req->min_tx_rings);
+	vf->max_tx_rings = le16_to_cpu(req->max_tx_rings);
+	vf->min_rx_rings = le16_to_cpu(req->min_rx_rings);
+	vf->max_rx_rings = le16_to_cpu(req->max_rx_rings);
+	hw_resc->max_tx_rings -= vf->min_tx_rings;
+	hw_resc->max_rx_rings -= vf->min_rx_rings;
+	if (bp->pf.vf_resv_strategy == BNXT_VF_RESV_STRATEGY_MAXIMAL) {
+		hw_resc->max_cp_rings += vf->min_cp_rings;
+		hw_resc->max_hw_ring_grps += vf->min_ring_grps;
+		hw_resc->max_stat_ctxs += vf->min_stat_ctxs;
+		hw_resc->max_vnics += vf->min_vnics;
+		vf->min_cp_rings = le16_to_cpu(req->min_cmpl_rings);
+		vf->min_ring_grps = le16_to_cpu(req->min_hw_ring_grps);
+		vf->min_stat_ctxs = le16_to_cpu(req->min_stat_ctx);
+		vf->min_vnics = le16_to_cpu(req->min_vnics);
+		hw_resc->max_cp_rings -= vf->min_cp_rings;
+		hw_resc->max_hw_ring_grps -= vf->min_ring_grps;
+		hw_resc->max_stat_ctxs -= vf->min_stat_ctxs;
+		hw_resc->max_vnics -= vf->min_vnics;
+	}
+}
+
+int bnxt_set_vf_queues(struct net_device *dev, int vf_id, int min_txq,
+		       int max_txq, int min_rxq, int max_rxq)
+{
+	struct hwrm_func_vf_resource_cfg_input req = {0};
+	struct bnxt *bp = netdev_priv(dev);
+	u16 avail_tx_rings, avail_rx_rings;
+	struct bnxt_hw_resc *hw_resc;
+	struct bnxt_vf_info *vf;
+	int rc;
+
+	if (bnxt_vf_ndo_prep(bp, vf_id))
+		return -EINVAL;
+
+	if (!(bp->flags & BNXT_FLAG_NEW_RM) || bp->hwrm_spec_code < 0x10902)
+		return -EOPNOTSUPP;
+
+	vf = &bp->pf.vf[vf_id];
+	hw_resc = &bp->hw_resc;
+
+	avail_tx_rings = hw_resc->max_tx_rings - bp->tx_nr_rings;
+	if (bp->flags & BNXT_FLAG_AGG_RINGS)
+		avail_rx_rings = hw_resc->max_rx_rings - bp->rx_nr_rings * 2;
+	else
+		avail_rx_rings = hw_resc->max_rx_rings - bp->rx_nr_rings;
+
+	if (!bnxt_param_ok(min_txq, vf->min_tx_rings, avail_tx_rings))
+		return -ENOBUFS;
+	if (!bnxt_param_ok(min_rxq, vf->min_rx_rings, avail_rx_rings))
+		return -ENOBUFS;
+	if (!bnxt_param_ok(max_txq, vf->max_tx_rings, avail_tx_rings))
+		return -ENOBUFS;
+	if (!bnxt_param_ok(max_rxq, vf->max_rx_rings, avail_rx_rings))
+		return -ENOBUFS;
+
+	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FUNC_VF_RESOURCE_CFG, -1, -1);
+	memcpy(&req, &bp->vf_resc_cfg_input, sizeof(req));
+	req.vf_id = cpu_to_le16(vf->fw_fid);
+	req.min_tx_rings = cpu_to_le16(min_txq);
+	req.min_rx_rings = cpu_to_le16(min_rxq);
+	req.max_tx_rings = cpu_to_le16(max_txq);
+	req.max_rx_rings = cpu_to_le16(max_rxq);
+
+	if (bp->pf.vf_resv_strategy == BNXT_VF_RESV_STRATEGY_MAXIMAL)
+		bnxt_adjust_ring_resc(bp, vf, &req);
+
+	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	if (rc)
+		return -EIO;
+
+	bnxt_record_ring_resc(bp, vf, &req);
+	return 0;
+}
+
 int bnxt_get_vf_config(struct net_device *dev, int vf_id,
 		       struct ifla_vf_info *ivi)
 {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
index e9b20cd..325b412 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
@@ -35,6 +35,8 @@
 int bnxt_set_vf_link_state(struct net_device *, int, int);
 int bnxt_set_vf_spoofchk(struct net_device *, int, bool);
 int bnxt_set_vf_trust(struct net_device *dev, int vf_id, bool trust);
+int bnxt_set_vf_queues(struct net_device *dev, int vf_id, int min_txq,
+		       int max_txq, int min_rxq, int max_rxq);
 int bnxt_sriov_configure(struct pci_dev *pdev, int num_vfs);
 void bnxt_sriov_disable(struct bnxt *);
 void bnxt_hwrm_exec_fwd_req(struct bnxt *);
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next 2/3] bnxt_en: Store min/max tx/rx rings for individual VFs.
From: Michael Chan @ 2018-05-29  8:18 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1527581920-3778-1-git-send-email-michael.chan@broadcom.com>

With new infrastructure to configure queues differently for each VF,
we need to store the current min/max rx/tx rings and other resources
for each VF.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h       |  9 +++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 27 +++++++++++++++++++++----
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 9b14eb6..531c77d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -837,6 +837,14 @@ struct bnxt_vf_info {
 	u32	func_flags; /* func cfg flags */
 	u32	min_tx_rate;
 	u32	max_tx_rate;
+	u16	min_tx_rings;
+	u16	max_tx_rings;
+	u16	min_rx_rings;
+	u16	max_rx_rings;
+	u16	min_cp_rings;
+	u16	min_stat_ctxs;
+	u16	min_ring_grps;
+	u16	min_vnics;
 	void	*hwrm_cmd_req_addr;
 	dma_addr_t	hwrm_cmd_req_dma_addr;
 };
@@ -1351,6 +1359,7 @@ struct bnxt {
 #ifdef CONFIG_BNXT_SRIOV
 	int			nr_vfs;
 	struct bnxt_vf_info	vf;
+	struct hwrm_func_vf_resource_cfg_input vf_resc_cfg_input;
 	wait_queue_head_t	sriov_cfg_wait;
 	bool			sriov_cfg;
 #define BNXT_SRIOV_CFG_WAIT_TMO	msecs_to_jiffies(10000)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index a649108..7a92125 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -171,6 +171,10 @@ int bnxt_get_vf_config(struct net_device *dev, int vf_id,
 		ivi->linkstate = IFLA_VF_LINK_STATE_ENABLE;
 	else
 		ivi->linkstate = IFLA_VF_LINK_STATE_DISABLE;
+	ivi->min_tx_queues = vf->min_tx_rings;
+	ivi->max_tx_queues = vf->max_tx_rings;
+	ivi->min_rx_queues = vf->min_rx_rings;
+	ivi->max_rx_queues = vf->max_rx_rings;
 
 	return 0;
 }
@@ -498,6 +502,8 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
 
 	mutex_lock(&bp->hwrm_cmd_lock);
 	for (i = 0; i < num_vfs; i++) {
+		struct bnxt_vf_info *vf = &pf->vf[i];
+
 		req.vf_id = cpu_to_le16(pf->first_vf_id + i);
 		rc = _hwrm_send_message(bp, &req, sizeof(req),
 					HWRM_CMD_TIMEOUT);
@@ -506,7 +512,15 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
 			break;
 		}
 		pf->active_vfs = i + 1;
-		pf->vf[i].fw_fid = pf->first_vf_id + i;
+		vf->fw_fid = pf->first_vf_id + i;
+		vf->min_tx_rings = le16_to_cpu(req.min_tx_rings);
+		vf->max_tx_rings = vf_tx_rings;
+		vf->min_rx_rings = le16_to_cpu(req.min_rx_rings);
+		vf->max_rx_rings = vf_rx_rings;
+		vf->min_cp_rings = le16_to_cpu(req.min_cmpl_rings);
+		vf->min_stat_ctxs = le16_to_cpu(req.min_stat_ctx);
+		vf->min_ring_grps = le16_to_cpu(req.min_hw_ring_grps);
+		vf->min_vnics = le16_to_cpu(req.min_vnics);
 	}
 	mutex_unlock(&bp->hwrm_cmd_lock);
 	if (pf->active_vfs) {
@@ -521,6 +535,7 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
 		hw_resc->max_stat_ctxs -= le16_to_cpu(req.min_stat_ctx) * n;
 		hw_resc->max_vnics -= le16_to_cpu(req.min_vnics) * n;
 
+		memcpy(&bp->vf_resc_cfg_input, &req, sizeof(req));
 		rc = pf->active_vfs;
 	}
 	return rc;
@@ -585,6 +600,7 @@ static int bnxt_hwrm_func_cfg(struct bnxt *bp, int num_vfs)
 
 	mutex_lock(&bp->hwrm_cmd_lock);
 	for (i = 0; i < num_vfs; i++) {
+		struct bnxt_vf_info *vf = &pf->vf[i];
 		int vf_tx_rsvd = vf_tx_rings;
 
 		req.fid = cpu_to_le16(pf->first_vf_id + i);
@@ -593,12 +609,15 @@ static int bnxt_hwrm_func_cfg(struct bnxt *bp, int num_vfs)
 		if (rc)
 			break;
 		pf->active_vfs = i + 1;
-		pf->vf[i].fw_fid = le16_to_cpu(req.fid);
-		rc = __bnxt_hwrm_get_tx_rings(bp, pf->vf[i].fw_fid,
-					      &vf_tx_rsvd);
+		vf->fw_fid = le16_to_cpu(req.fid);
+		rc = __bnxt_hwrm_get_tx_rings(bp, vf->fw_fid, &vf_tx_rsvd);
 		if (rc)
 			break;
 		total_vf_tx_rings += vf_tx_rsvd;
+		vf->min_tx_rings = vf_tx_rsvd;
+		vf->max_tx_rings = vf_tx_rsvd;
+		vf->min_rx_rings = vf_rx_rings;
+		vf->max_rx_rings = vf_rx_rings;
 	}
 	mutex_unlock(&bp->hwrm_cmd_lock);
 	if (rc)
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
From: Michael Chan @ 2018-05-29  8:18 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1527581920-3778-1-git-send-email-michael.chan@broadcom.com>

VF Queue resources are always limited and there is currently no
infrastructure to allow the admin. on the host to add or reduce queue
resources for any particular VF.  With ever increasing number of VFs
being supported, it is desirable to allow the admin. to configure queue
resources differently for the VFs.  Some VFs may require more or fewer
queues due to different bandwidth requirements or different number of
vCPUs in the VM.  This patch adds the infrastructure to do that by
adding IFLA_VF_QUEUES netlink attribute and a new .ndo_set_vf_queues()
to the net_device_ops.

Four parameters are exposed for each VF:

o min_tx_queues - Guaranteed tx queues available to the VF.

o max_tx_queues - Maximum but not necessarily guaranteed tx queues
  available to the VF.

o min_rx_queues - Guaranteed rx queues available to the VF.

o max_rx_queues - Maximum but not necessarily guaranteed rx queues
  available to the VF.

The "ip link set" command will subsequently be patched to support the new
operation to set the above parameters.

After the admin. makes a change to the above parameters, the corresponding
VF will have a new range of channels to set using ethtool -L.  The VF may
have to go through IF down/up before the new queues will take effect.  Up
to the min values are guaranteed.  Up to the max values are possible but not
guaranteed.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 include/linux/if_link.h      |  4 ++++
 include/linux/netdevice.h    |  6 ++++++
 include/uapi/linux/if_link.h |  9 +++++++++
 net/core/rtnetlink.c         | 32 +++++++++++++++++++++++++++++---
 4 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 622658d..8e81121 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -29,5 +29,9 @@ struct ifla_vf_info {
 	__u32 rss_query_en;
 	__u32 trusted;
 	__be16 vlan_proto;
+	__u32 min_tx_queues;
+	__u32 max_tx_queues;
+	__u32 min_rx_queues;
+	__u32 max_rx_queues;
 };
 #endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8452f72..17f5892 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1023,6 +1023,8 @@ struct dev_ifalias {
  *      with PF and querying it may introduce a theoretical security risk.
  * int (*ndo_set_vf_rss_query_en)(struct net_device *dev, int vf, bool setting);
  * int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
+ * int (*ndo_set_vf_queues)(struct net_device *dev, int vf, int min_txq,
+ *			    int max_txq, int min_rxq, int max_rxq);
  * int (*ndo_setup_tc)(struct net_device *dev, enum tc_setup_type type,
  *		       void *type_data);
  *	Called to setup any 'tc' scheduler, classifier or action on @dev.
@@ -1276,6 +1278,10 @@ struct net_device_ops {
 	int			(*ndo_set_vf_rss_query_en)(
 						   struct net_device *dev,
 						   int vf, bool setting);
+	int			(*ndo_set_vf_queues)(struct net_device *dev,
+						     int vf,
+						     int min_txq, int max_txq,
+						     int min_rxq, int max_rxq);
 	int			(*ndo_setup_tc)(struct net_device *dev,
 						enum tc_setup_type type,
 						void *type_data);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index cf01b68..81bbc4e 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -659,6 +659,7 @@ enum {
 	IFLA_VF_IB_NODE_GUID,	/* VF Infiniband node GUID */
 	IFLA_VF_IB_PORT_GUID,	/* VF Infiniband port GUID */
 	IFLA_VF_VLAN_LIST,	/* nested list of vlans, option for QinQ */
+	IFLA_VF_QUEUES,		/* Min and Max TX/RX queues */
 	__IFLA_VF_MAX,
 };
 
@@ -749,6 +750,14 @@ struct ifla_vf_trust {
 	__u32 setting;
 };
 
+struct ifla_vf_queues {
+	__u32 vf;
+	__u32 min_tx_queues;	/* min guaranteed tx queues */
+	__u32 max_tx_queues;	/* max non guaranteed tx queues */
+	__u32 min_rx_queues;	/* min guaranteed rx queues */
+	__u32 max_rx_queues;	/* max non guaranteed rx queues */
+};
+
 /* VF ports management section
  *
  *	Nested layout of set/get msg is:
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 8080254..e21ab8a 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -921,7 +921,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev,
 			 nla_total_size_64bit(sizeof(__u64)) +
 			 /* IFLA_VF_STATS_TX_DROPPED */
 			 nla_total_size_64bit(sizeof(__u64)) +
-			 nla_total_size(sizeof(struct ifla_vf_trust)));
+			 nla_total_size(sizeof(struct ifla_vf_trust)) +
+			 nla_total_size(sizeof(struct ifla_vf_queues)));
 		return size;
 	} else
 		return 0;
@@ -1181,6 +1182,7 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 	struct ifla_vf_vlan_info vf_vlan_info;
 	struct ifla_vf_spoofchk vf_spoofchk;
 	struct ifla_vf_tx_rate vf_tx_rate;
+	struct ifla_vf_queues vf_queues;
 	struct ifla_vf_stats vf_stats;
 	struct ifla_vf_trust vf_trust;
 	struct ifla_vf_vlan vf_vlan;
@@ -1198,6 +1200,10 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 	ivi.spoofchk = -1;
 	ivi.rss_query_en = -1;
 	ivi.trusted = -1;
+	ivi.min_tx_queues = -1;
+	ivi.max_tx_queues = -1;
+	ivi.min_rx_queues = -1;
+	ivi.max_rx_queues = -1;
 	/* The default value for VF link state is "auto"
 	 * IFLA_VF_LINK_STATE_AUTO which equals zero
 	 */
@@ -1217,7 +1223,8 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 		vf_spoofchk.vf =
 		vf_linkstate.vf =
 		vf_rss_query_en.vf =
-		vf_trust.vf = ivi.vf;
+		vf_trust.vf =
+		vf_queues.vf = ivi.vf;
 
 	memcpy(vf_mac.mac, ivi.mac, sizeof(ivi.mac));
 	vf_vlan.vlan = ivi.vlan;
@@ -1232,6 +1239,10 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 	vf_linkstate.link_state = ivi.linkstate;
 	vf_rss_query_en.setting = ivi.rss_query_en;
 	vf_trust.setting = ivi.trusted;
+	vf_queues.min_tx_queues = ivi.min_tx_queues;
+	vf_queues.max_tx_queues = ivi.max_tx_queues;
+	vf_queues.min_rx_queues = ivi.min_rx_queues;
+	vf_queues.max_rx_queues = ivi.max_rx_queues;
 	vf = nla_nest_start(skb, IFLA_VF_INFO);
 	if (!vf)
 		goto nla_put_vfinfo_failure;
@@ -1249,7 +1260,9 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 		    sizeof(vf_rss_query_en),
 		    &vf_rss_query_en) ||
 	    nla_put(skb, IFLA_VF_TRUST,
-		    sizeof(vf_trust), &vf_trust))
+		    sizeof(vf_trust), &vf_trust) ||
+	    nla_put(skb, IFLA_VF_QUEUES,
+		    sizeof(vf_queues), &vf_queues))
 		goto nla_put_vf_failure;
 	vfvlanlist = nla_nest_start(skb, IFLA_VF_VLAN_LIST);
 	if (!vfvlanlist)
@@ -1706,6 +1719,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	[IFLA_VF_TRUST]		= { .len = sizeof(struct ifla_vf_trust) },
 	[IFLA_VF_IB_NODE_GUID]	= { .len = sizeof(struct ifla_vf_guid) },
 	[IFLA_VF_IB_PORT_GUID]	= { .len = sizeof(struct ifla_vf_guid) },
+	[IFLA_VF_QUEUES]	= { .len = sizeof(struct ifla_vf_queues) },
 };
 
 static const struct nla_policy ifla_port_policy[IFLA_PORT_MAX+1] = {
@@ -2208,6 +2222,18 @@ static int do_setvfinfo(struct net_device *dev, struct nlattr **tb)
 		return handle_vf_guid(dev, ivt, IFLA_VF_IB_PORT_GUID);
 	}
 
+	if (tb[IFLA_VF_QUEUES]) {
+		struct ifla_vf_queues *ivq = nla_data(tb[IFLA_VF_QUEUES]);
+
+		err = -EOPNOTSUPP;
+		if (ops->ndo_set_vf_queues)
+			err = ops->ndo_set_vf_queues(dev, ivq->vf,
+					ivq->min_tx_queues, ivq->max_tx_queues,
+					ivq->min_rx_queues, ivq->max_rx_queues);
+		if (err < 0)
+			return err;
+	}
+
 	return err;
 }
 
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next 0/3] net: Add support to configure SR-IOV VF queues.
From: Michael Chan @ 2018-05-29  8:18 UTC (permalink / raw)
  To: davem; +Cc: netdev

VF Queue resources are always limited and there is currently no
infrastructure to allow the admin. on the host to add or reduce queue
resources for any particular VF.  This series adds the infrastructure
to do that and adds the functionality to the bnxt_en driver.

The "ip link set" command will subsequently be patched to support the new
operation.

v1:
- Changed the meaning of the min parameters to be strictly the minimum
guaranteed value, suggested by Jakub Kicinsky.
- More complete implementation in the bnxt_en driver.

Michael Chan (3):
  net: Add support to configure SR-IOV VF minimum and maximum queues.
  bnxt_en: Store min/max tx/rx rings for individual VFs.
  bnxt_en: Implement .ndo_set_vf_queues().

 drivers/net/ethernet/broadcom/bnxt/bnxt.c       |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.h       |   9 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 157 +++++++++++++++++++++++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h |   2 +
 include/linux/if_link.h                         |   4 +
 include/linux/netdevice.h                       |   6 +
 include/uapi/linux/if_link.h                    |   9 ++
 net/core/rtnetlink.c                            |  32 ++++-
 8 files changed, 213 insertions(+), 7 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* Re: [PATCH net-next 05/14] nfp: abm: add simple RED offload
From: Nogah Frankel @ 2018-05-29  7:53 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, jiri, xiyou.wangcong, john.fastabend, netdev, oss-drivers,
	alexei.starovoitov, nogahf, yuvalm, gerlitz.or
In-Reply-To: <20180528160507.4a2ff81a@cakuba>

On 29-May-18 2:05 AM, Jakub Kicinski wrote:

Hi Jakub,

> Hi Nogah!
> 
> On Mon, 28 May 2018 18:49:51 +0300, Nogah Frankel wrote:
>>> +static int
>>> +nfp_abm_red_replace(struct net_device *netdev, struct nfp_abm_link *alink,
>>> +		    struct tc_red_qopt_offload *opt)
>>> +{
>>> +	struct nfp_port *port = nfp_port_from_netdev(netdev);
>>> +	int err;
>>> +
>>> +	if (opt->set.min != opt->set.max || !opt->set.is_ecn) {
>>
>> I am a bit worried about the min == max.
>> sch_red doesn't really support it. It will calculate incorrect delta
>> value. (And that only if tc_red_eval_P in iproute2 won't reject it).
>> You might maybe use max = min+1,  because in real life it will probably
>> act the same but without this problem.
> 
> I remember having a long think about this when I wrote the code.
> My conclusion was that the two would operate almost the same, and
> setting min == max may be most obvious to the user.

I agree.

> 
> If min + 1 == max sch_red would act probabilistically for qavg == min,
> which is not what the card would do.
> 
> Userspace now does this:
> 
> tc_red_eval_P() {
> 	int i = qmax - qmin;
>   
> 	if (!i)
> 		return 0;
> 	if (i < 0)
> 		return -1;
> 	...
> }
> 
> And you've fixed delta to be treated as 1 to avoid division by 0 in
> commit 5c472203421a ("net_sched: red: Avoid devision by zero"):
> 
> red_set_parms() {
> 	int delta = qth_max - qth_min;
> 	u32 max_p_delta;
> 
> 	p->qth_min	= qth_min << Wlog;
> 	p->qth_max	= qth_max << Wlog;
> 	p->Wlog		= Wlog;
> 	p->Plog		= Plog;
> 	if (delta <= 0)
> 		delta = 1;
> 	p->qth_delta	= delta;
> 	...
> }

I changes it to avoid division by 0, but I wasn't sure that the delta 
value of 1 will be good, just that it is better then 0.

> 
> So we should be safe.  Targets will match.  Probability adjustment for
> adaptive should work correctly.  Which doesn't matter anyway, since we
> will never use the probabilistic action...

That makes sense, and it is a better way to set this setup (DCTCP, I 
guess?) than before.

Thanks

Nogah


> 
>> Nogah Frankel
>> (from a new mail address)
> 
> Noted :)
> 

^ permalink raw reply

* (unknown), 
From: администратор @ 2018-05-29  7:26 UTC (permalink / raw)


пользователь веб-почты

Обратите внимание, что 95% ваших писем, полученных после последнего раза, когда вам нужно обновить сервер своей веб-почты в нашей базе данных, были отложены. Регулярно получать и отправлять свои сообщения. Техническая команда нашей электронной почты обновит вашу учетную запись в течение 3 рабочих дней. Если вы этого не сделаете, ваша учетная запись будет временно приостановлена нашими службами. Чтобы снова проверить свой почтовый ящик, пришлите следующую информацию:

обычный:
Имя пользователя:
пароль:
Подтвердите Пароль:

Предупреждение!! Если это откажется обновлять аккаунты в течение двух
дней с момента получения электронной почты, он навсегда потеряет счета
владельцы учетных записей электронной почты.

Спасибо за сотрудничество!

Copyright © 2017-2018 Служба технической поддержки WebMail, Inc.

^ permalink raw reply

* Re: rtlwifi: remove duplicate code
From: Kalle Valo @ 2018-05-29  7:28 UTC (permalink / raw)
  To: Gustavo A. R. Silva
  Cc: Ping-Ke Shih, David S. Miller,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Gustavo A. R. Silva
In-Reply-To: <20180524185450.GA2875-L1vi/lXTdts+Va1GwOuvDg@public.gmane.org>

"Gustavo A. R. Silva" <gustavo-L1vi/lXTdts+Va1GwOuvDg@public.gmane.org> wrote:

> Remove and refactor some code in order to avoid having identical code
> for different branches.
> 
> Notice that the logic has been there since 2014.
> 
> Addresses-Coverity-ID: 1426199 ("Identical code for different branches")
> Signed-off-by: Gustavo A. R. Silva <gustavo-L1vi/lXTdts+Va1GwOuvDg@public.gmane.org>

Patch applied to wireless-drivers-next.git, thanks.

06895b1627fe rtlwifi: remove duplicate code

-- 
https://patchwork.kernel.org/patch/10425379/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: mwifiex: mark expected switch fall-throughs
From: Kalle Valo @ 2018-05-29  7:26 UTC (permalink / raw)
  To: Gustavo A. R. Silva
  Cc: Amitkumar Karwar, Nishant Sarmukadam, Ganapathi Bhat, Xinming Hu,
	David S. Miller, linux-wireless, netdev, linux-kernel,
	Gustavo A. R. Silva
In-Reply-To: <20180525213854.GA13392@embeddedor.com>

"Gustavo A. R. Silva" <gustavo@embeddedor.com> wrote:

> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
> 
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>

Patch applied to wireless-drivers-next.git, thanks.

666cc438f369 mwifiex: mark expected switch fall-throughs

-- 
https://patchwork.kernel.org/patch/10428577/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: rsi: fix spelling mistake "Uknown" -> "Unknown"
From: Kalle Valo @ 2018-05-29  7:21 UTC (permalink / raw)
  To: Colin Ian King
  Cc: David S . Miller, linux-wireless, netdev, kernel-janitors,
	linux-kernel
In-Reply-To: <20180526150019.10288-1-colin.king@canonical.com>

Colin Ian King <colin.king@canonical.com> wrote:

> From: Colin Ian King <colin.king@canonical.com>
> 
> Trivial fix to spelling mistake in rsi_dbg message text
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Patch applied to wireless-drivers-next.git, thanks.

a53e8f3edaa0 rsi: fix spelling mistake "Uknown" -> "Unknown"

-- 
https://patchwork.kernel.org/patch/10429003/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: atmel: Add missing call to pci_disable_device()
From: Kalle Valo @ 2018-05-29  7:20 UTC (permalink / raw)
  To: YueHaibing; +Cc: simon, davem, netdev, linux-wireless, YueHaibing
In-Reply-To: <20180523103445.8372-1-yuehaibing@huawei.com>

YueHaibing <yuehaibing@huawei.com> wrote:

> add pci_disable_device in error handling while init_atmel_card failed.
> 
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Patch applied to wireless-drivers-next.git, thanks.

88027c8ff0a3 atmel: Add missing call to pci_disable_device()

-- 
https://patchwork.kernel.org/patch/10420905/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [PATCH rdma-next v1 11/13] IB/mlx5: Add flow counters binding support
From: Leon Romanovsky @ 2018-05-29  7:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Boris Pismenny, Matan Barak,
	Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev
In-Reply-To: <20180528172752.GD17505@ziepe.ca>

[-- Attachment #1: Type: text/plain, Size: 920 bytes --]

On Mon, May 28, 2018 at 11:27:52AM -0600, Jason Gunthorpe wrote:
> On Sun, May 27, 2018 at 01:23:44PM +0300, Leon Romanovsky wrote:
>
> > +	if (!mcounters->hw_cntrs_hndl) {
> > +		mcounters->hw_cntrs_hndl =
> > +			(void *)mlx5_fc_create(to_mdev(ibcounters->device)->mdev,
> > +					       false);
>
> No unnecessary casts

hw_cntrs_hndl was intended to be generic enough to be connected with
different type of counters and not only mlx5_fc. However, i don't see
any coming feature are using this generic interface, so I'll remove.

>
> > +struct mlx5_ib_flow_counters_data {
> > +	__aligned_u64   counters_data;
>
> I think this is supposed to use RDMA_UAPI_PTR() these days.

It is a little bit problematic,
1. That RDMA_UAPI_PTR is declared in ib_user_ioctl_verbs.h, while
mlx5_ib_flow_counters_data in mlx5-abi.h.
2. Other similar __aligned_u64 needs to be updated in other mlx5-abi.h structures.

Thanks

>
> Jason

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* [PATCH net] vhost_net: flush batched heads before trying to busy polling
From: Jason Wang @ 2018-05-29  6:18 UTC (permalink / raw)
  To: mst, jasowang; +Cc: netdev, linux-kernel, kvm, virtualization

After commit e2b3b35eb989 ("vhost_net: batch used ring update in rx"),
we tend to batch updating used heads. But it doesn't flush batched
heads before trying to do busy polling, this will cause vhost to wait
for guest TX which waits for the used RX. Fixing by flush batched
heads before busy loop.

1 byte TCP_RR performance recovers from 13107.83 to 50402.65.

Fixes: e2b3b35eb989 ("vhost_net: batch used ring update in rx")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 986058a..eeaf673 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -105,7 +105,9 @@ struct vhost_net_virtqueue {
 	/* vhost zerocopy support fields below: */
 	/* last used idx for outstanding DMA zerocopy buffers */
 	int upend_idx;
-	/* first used idx for DMA done zerocopy buffers */
+	/* For TX, first used idx for DMA done zerocopy buffers
+	 * For RX, number of batched heads
+	 */
 	int done_idx;
 	/* an array of userspace buffers info */
 	struct ubuf_info *ubuf_info;
@@ -626,6 +628,18 @@ static int sk_has_rx_data(struct sock *sk)
 	return skb_queue_empty(&sk->sk_receive_queue);
 }
 
+static void vhost_rx_signal_used(struct vhost_net_virtqueue *nvq)
+{
+	struct vhost_virtqueue *vq = &nvq->vq;
+	struct vhost_dev *dev = vq->dev;
+
+	if (!nvq->done_idx)
+		return;
+
+	vhost_add_used_and_signal_n(dev, vq, vq->heads, nvq->done_idx);
+	nvq->done_idx = 0;
+}
+
 static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
 {
 	struct vhost_net_virtqueue *rvq = &net->vqs[VHOST_NET_VQ_RX];
@@ -635,6 +649,8 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
 	int len = peek_head_len(rvq, sk);
 
 	if (!len && vq->busyloop_timeout) {
+		/* Flush batched heads first */
+		vhost_rx_signal_used(rvq);
 		/* Both tx vq and rx socket were polled here */
 		mutex_lock_nested(&vq->mutex, 1);
 		vhost_disable_notify(&net->dev, vq);
@@ -762,7 +778,7 @@ static void handle_rx(struct vhost_net *net)
 	};
 	size_t total_len = 0;
 	int err, mergeable;
-	s16 headcount, nheads = 0;
+	s16 headcount;
 	size_t vhost_hlen, sock_hlen;
 	size_t vhost_len, sock_len;
 	struct socket *sock;
@@ -790,8 +806,8 @@ static void handle_rx(struct vhost_net *net)
 	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) {
 		sock_len += sock_hlen;
 		vhost_len = sock_len + vhost_hlen;
-		headcount = get_rx_bufs(vq, vq->heads + nheads, vhost_len,
-					&in, vq_log, &log,
+		headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
+					vhost_len, &in, vq_log, &log,
 					likely(mergeable) ? UIO_MAXIOV : 1);
 		/* On error, stop handling until the next kick. */
 		if (unlikely(headcount < 0))
@@ -862,12 +878,9 @@ static void handle_rx(struct vhost_net *net)
 			vhost_discard_vq_desc(vq, headcount);
 			goto out;
 		}
-		nheads += headcount;
-		if (nheads > VHOST_RX_BATCH) {
-			vhost_add_used_and_signal_n(&net->dev, vq, vq->heads,
-						    nheads);
-			nheads = 0;
-		}
+		nvq->done_idx += headcount;
+		if (nvq->done_idx > VHOST_RX_BATCH)
+			vhost_rx_signal_used(nvq);
 		if (unlikely(vq_log))
 			vhost_log_write(vq, vq_log, log, vhost_len);
 		total_len += vhost_len;
@@ -878,9 +891,7 @@ static void handle_rx(struct vhost_net *net)
 	}
 	vhost_net_enable_vq(net, vq);
 out:
-	if (nheads)
-		vhost_add_used_and_signal_n(&net->dev, vq, vq->heads,
-					    nheads);
+	vhost_rx_signal_used(nvq);
 	mutex_unlock(&vq->mutex);
 }
 
-- 
2.7.4

^ permalink raw reply related

* Re: [RFC v5 3/5] virtio_ring: add packed ring support
From: Jason Wang @ 2018-05-29  6:16 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann
In-Reply-To: <20180529051125.GA2976@debian>



On 2018年05月29日 13:11, Tiwei Bie wrote:
> On Tue, May 29, 2018 at 11:18:57AM +0800, Jason Wang wrote:
>> On 2018年05月22日 16:16, Tiwei Bie wrote:
> [...]
>>> +static void detach_buf_packed(struct vring_virtqueue *vq,
>>> +			      unsigned int id, void **ctx)
>>> +{
>>> +	struct vring_desc_state_packed *state;
>>> +	struct vring_packed_desc *desc;
>>> +	unsigned int i;
>>> +	int *next;
>>> +
>>> +	/* Clear data ptr. */
>>> +	vq->desc_state_packed[id].data = NULL;
>>> +
>>> +	next = &id;
>>> +	for (i = 0; i < vq->desc_state_packed[id].num; i++) {
>>> +		state = &vq->desc_state_packed[*next];
>>> +		vring_unmap_state_packed(vq, state);
>>> +		next = &vq->desc_state_packed[*next].next;
>> Have a short discussion with Michael. It looks like the only thing we care
>> is DMA address and length here.
> The length tracked by desc_state_packed[] is also necessary
> when INDIRECT is used and the buffers are writable.
>
>> So a thought is to avoid using desc_state_packed() is vring_use_dma_api() is
>> false? Probably need another id allocator or just use desc_state_packed for
>> free id list.
> I think it's a good suggestion. Thanks!
>
> I won't track the addr/len/flags for non-indirect descs
> when vring_use_dma_api() returns false.
>
>>> +	}
>>> +
>>> +	vq->vq.num_free += vq->desc_state_packed[id].num;
>>> +	*next = vq->free_head;
>> Using pointer seems not elegant and unnecessary here. You can just track
>> next in integer I think?
> It's possible to use integer, but will need one more var
> to track `prev` (desc_state_packed[prev].next == next in
> this case).

Yes, please do this.

>
>>> +	vq->free_head = id;
>>> +
>>> +	if (vq->indirect) {
>>> +		u32 len;
>>> +
>>> +		/* Free the indirect table, if any, now that it's unmapped. */
>>> +		desc = vq->desc_state_packed[id].indir_desc;
>>> +		if (!desc)
>>> +			return;
>>> +
>>> +		len = vq->desc_state_packed[id].len;
>>> +		for (i = 0; i < len / sizeof(struct vring_packed_desc); i++)
>>> +			vring_unmap_desc_packed(vq, &desc[i]);
>>> +
>>> +		kfree(desc);
>>> +		vq->desc_state_packed[id].indir_desc = NULL;
>>> +	} else if (ctx) {
>>> +		*ctx = vq->desc_state_packed[id].indir_desc;
>>> +	}
>>>    }
>>>    static inline bool more_used_packed(const struct vring_virtqueue *vq)
>>>    {
>>> -	return false;
>>> +	u16 last_used, flags;
>>> +	u8 avail, used;
>>> +
>>> +	last_used = vq->last_used_idx;
>>> +	flags = virtio16_to_cpu(vq->vq.vdev,
>>> +				vq->vring_packed.desc[last_used].flags);
>>> +	avail = !!(flags & VRING_DESC_F_AVAIL(1));
>>> +	used = !!(flags & VRING_DESC_F_USED(1));
>>> +
>>> +	return avail == used && used == vq->used_wrap_counter;
>> Spec does not check avail == used? So there's probably a bug in either of
>> the two places.
>>
>> In what condition that avail != used but used == vq_used_wrap_counter? A
>> buggy device or device set the two bits in two writes?
> Currently, spec doesn't forbid devices to set the status
> bits as: avail != used but used == vq_used_wrap_counter.
> So I think driver cannot assume this won't happen.

Right, but I could not find a situation that device need to something 
like this. We should either forbid this in the spec or change the 
example code in the spec.

Let's see how Michael think about this.

Thanks

^ permalink raw reply

* Re: [RFC v5 2/5] virtio_ring: support creating packed ring
From: Jason Wang @ 2018-05-29  6:13 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann,
	Cornelia Huck
In-Reply-To: <20180529052418.GB2976@debian>



On 2018年05月29日 13:24, Tiwei Bie wrote:
> On Tue, May 29, 2018 at 10:49:11AM +0800, Jason Wang wrote:
>> On 2018年05月22日 16:16, Tiwei Bie wrote:
>>> This commit introduces the support for creating packed ring.
>>> All split ring specific functions are added _split suffix.
>>> Some necessary stubs for packed ring are also added.
>>>
>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>> ---
>>>    drivers/virtio/virtio_ring.c | 801 +++++++++++++++++++++++------------
>>>    include/linux/virtio_ring.h  |   8 +-
>>>    2 files changed, 546 insertions(+), 263 deletions(-)
>>>
>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>> index 71458f493cf8..f5ef5f42a7cf 100644
>>> --- a/drivers/virtio/virtio_ring.c
>>> +++ b/drivers/virtio/virtio_ring.c
>>> @@ -61,11 +61,15 @@ struct vring_desc_state {
>>>    	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
>>>    };
>>> +struct vring_desc_state_packed {
>>> +	int next;			/* The next desc state. */
>>> +};
>>> +
>>>    struct vring_virtqueue {
>>>    	struct virtqueue vq;
>>> -	/* Actual memory layout for this queue */
>>> -	struct vring vring;
>>> +	/* Is this a packed ring? */
>>> +	bool packed;
>>>    	/* Can we use weak barriers? */
>>>    	bool weak_barriers;
>>> @@ -87,11 +91,39 @@ struct vring_virtqueue {
>>>    	/* Last used index we've seen. */
>>>    	u16 last_used_idx;
>>> -	/* Last written value to avail->flags */
>>> -	u16 avail_flags_shadow;
>>> +	union {
>>> +		/* Available for split ring */
>>> +		struct {
>>> +			/* Actual memory layout for this queue. */
>>> +			struct vring vring;
>>> -	/* Last written value to avail->idx in guest byte order */
>>> -	u16 avail_idx_shadow;
>>> +			/* Last written value to avail->flags */
>>> +			u16 avail_flags_shadow;
>>> +
>>> +			/* Last written value to avail->idx in
>>> +			 * guest byte order. */
>>> +			u16 avail_idx_shadow;
>>> +		};
>>> +
>>> +		/* Available for packed ring */
>>> +		struct {
>>> +			/* Actual memory layout for this queue. */
>>> +			struct vring_packed vring_packed;
>>> +
>>> +			/* Driver ring wrap counter. */
>>> +			u8 avail_wrap_counter;
>>> +
>>> +			/* Device ring wrap counter. */
>>> +			u8 used_wrap_counter;
>> How about just use boolean?
> I can change it to bool if you want.

Yes, please.

>
> [...]
>>> -static int vring_mapping_error(const struct vring_virtqueue *vq,
>>> -			       dma_addr_t addr)
>>> -{
>>> -	if (!vring_use_dma_api(vq->vq.vdev))
>>> -		return 0;
>>> -
>>> -	return dma_mapping_error(vring_dma_dev(vq), addr);
>>> -}
>> It looks to me if you keep vring_mapping_error behind
>> vring_unmap_one_split(), lots of changes were unncessary.
>>
> [...]
>>> +	}
>>> +	/* That should have freed everything. */
>>> +	BUG_ON(vq->vq.num_free != vq->vring.num);
>>> +
>>> +	END_USE(vq);
>>> +	return NULL;
>>> +}
>> I think the those copy-and-paste hunks could be avoided and the diff should
>> only contains renaming of the function. If yes, it would be very welcomed
>> since it requires to compare the changes verbatim otherwise.
> Michael suggested to lay out the code as:
>
> XXX_split
>
> XXX_packed
>
> XXX wrappers
>
> https://lkml.org/lkml/2018/4/13/410
>
> That's why I moved some code.

I see, then no need to change but it still looks unnecessary.

>
>>> +
>>> +/*
>>> + * The layout for the packed ring is a continuous chunk of memory
>>> + * which looks like this.
>>> + *
>>> + * struct vring_packed {
>>> + *	// The actual descriptors (16 bytes each)
>>> + *	struct vring_packed_desc desc[num];
>>> + *
>>> + *	// Padding to the next align boundary.
>>> + *	char pad[];
>>> + *
>>> + *	// Driver Event Suppression
>>> + *	struct vring_packed_desc_event driver;
>>> + *
>>> + *	// Device Event Suppression
>>> + *	struct vring_packed_desc_event device;
>>> + * };
>>> + */
>>> +static inline void vring_init_packed(struct vring_packed *vr, unsigned int num,
>>> +				     void *p, unsigned long align)
>>> +{
>>> +	vr->num = num;
>>> +	vr->desc = p;
>>> +	vr->driver = (void *)(((uintptr_t)p + sizeof(struct vring_packed_desc)
>>> +		* num + align - 1) & ~(align - 1));
>> If we choose not to go uapi, maybe we can use ALIGN() macro here?
> Okay.
>
>>> +	vr->device = vr->driver + 1;
>>> +}
> [...]
>>> +/* Only available for split ring */
>>>    const struct vring *virtqueue_get_vring(struct virtqueue *vq)
>>>    {
>> A possible issue with this is:
>>
>> After commit d4674240f31f8c4289abba07d64291c6ddce51bc ("KVM: s390:
>> virtio-ccw revision 1 SET_VQ"). CCW tries to use
>> virtqueue_get_avail()/virtqueue_get_used(). Looks like a bug either here or
>> ccw code.
> Do we still need to support:
>
> include/linux/virtio.h
> /*
>   * Legacy accessors -- in almost all cases, these are the wrong functions
>   * to use.
>   */
> static inline void *virtqueue_get_desc(struct virtqueue *vq)
> {
>          return virtqueue_get_vring(vq)->desc;
> }
> static inline void *virtqueue_get_avail(struct virtqueue *vq)
> {
>          return virtqueue_get_vring(vq)->avail;
> }
> static inline void *virtqueue_get_used(struct virtqueue *vq)
> {
>          return virtqueue_get_vring(vq)->used;
> }
>
> in packed ring?

I think it was probably a bug in ccw, they should use e.g 
virtqueue_get_desc_addr() instead.

Thanks

>
> If so, I think maybe it's better to expose them as
> symbols and implement them in virtio_ring.c.
>
> Best regards,
> Tiwei Bie

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Christoph Hellwig @ 2018-05-29  5:41 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: David Miller, Networking, Al Viro, Linux-Next Mailing List,
	Linux Kernel Mailing List, Björn Töpel,
	Alexei Starovoitov, Christoph Hellwig
In-Reply-To: <20180529134209.4b8353f1@canb.auug.org.au>

> [Christoph, I noticed that the comment above datagram_poll in
> net/core/datagram.c needs the function name updated]

I'll send a fix for that.

> 
> I have added the following merge fix patch for today:

This looks correct, thanks!

^ permalink raw reply

* Re: [RFC v5 2/5] virtio_ring: support creating packed ring
From: Tiwei Bie @ 2018-05-29  5:24 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann,
	Cornelia Huck
In-Reply-To: <654feee2-3414-e304-0aec-2bb19a7e0c87@redhat.com>

On Tue, May 29, 2018 at 10:49:11AM +0800, Jason Wang wrote:
> On 2018年05月22日 16:16, Tiwei Bie wrote:
> > This commit introduces the support for creating packed ring.
> > All split ring specific functions are added _split suffix.
> > Some necessary stubs for packed ring are also added.
> > 
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
> >   drivers/virtio/virtio_ring.c | 801 +++++++++++++++++++++++------------
> >   include/linux/virtio_ring.h  |   8 +-
> >   2 files changed, 546 insertions(+), 263 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 71458f493cf8..f5ef5f42a7cf 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -61,11 +61,15 @@ struct vring_desc_state {
> >   	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
> >   };
> > +struct vring_desc_state_packed {
> > +	int next;			/* The next desc state. */
> > +};
> > +
> >   struct vring_virtqueue {
> >   	struct virtqueue vq;
> > -	/* Actual memory layout for this queue */
> > -	struct vring vring;
> > +	/* Is this a packed ring? */
> > +	bool packed;
> >   	/* Can we use weak barriers? */
> >   	bool weak_barriers;
> > @@ -87,11 +91,39 @@ struct vring_virtqueue {
> >   	/* Last used index we've seen. */
> >   	u16 last_used_idx;
> > -	/* Last written value to avail->flags */
> > -	u16 avail_flags_shadow;
> > +	union {
> > +		/* Available for split ring */
> > +		struct {
> > +			/* Actual memory layout for this queue. */
> > +			struct vring vring;
> > -	/* Last written value to avail->idx in guest byte order */
> > -	u16 avail_idx_shadow;
> > +			/* Last written value to avail->flags */
> > +			u16 avail_flags_shadow;
> > +
> > +			/* Last written value to avail->idx in
> > +			 * guest byte order. */
> > +			u16 avail_idx_shadow;
> > +		};
> > +
> > +		/* Available for packed ring */
> > +		struct {
> > +			/* Actual memory layout for this queue. */
> > +			struct vring_packed vring_packed;
> > +
> > +			/* Driver ring wrap counter. */
> > +			u8 avail_wrap_counter;
> > +
> > +			/* Device ring wrap counter. */
> > +			u8 used_wrap_counter;
> 
> How about just use boolean?

I can change it to bool if you want.

> 
[...]
> > -static int vring_mapping_error(const struct vring_virtqueue *vq,
> > -			       dma_addr_t addr)
> > -{
> > -	if (!vring_use_dma_api(vq->vq.vdev))
> > -		return 0;
> > -
> > -	return dma_mapping_error(vring_dma_dev(vq), addr);
> > -}
> 
> It looks to me if you keep vring_mapping_error behind
> vring_unmap_one_split(), lots of changes were unncessary.
> 
[...]
> > +	}
> > +	/* That should have freed everything. */
> > +	BUG_ON(vq->vq.num_free != vq->vring.num);
> > +
> > +	END_USE(vq);
> > +	return NULL;
> > +}
> 
> I think the those copy-and-paste hunks could be avoided and the diff should
> only contains renaming of the function. If yes, it would be very welcomed
> since it requires to compare the changes verbatim otherwise.

Michael suggested to lay out the code as:

XXX_split

XXX_packed

XXX wrappers

https://lkml.org/lkml/2018/4/13/410

That's why I moved some code.

> 
> > +
> > +/*
> > + * The layout for the packed ring is a continuous chunk of memory
> > + * which looks like this.
> > + *
> > + * struct vring_packed {
> > + *	// The actual descriptors (16 bytes each)
> > + *	struct vring_packed_desc desc[num];
> > + *
> > + *	// Padding to the next align boundary.
> > + *	char pad[];
> > + *
> > + *	// Driver Event Suppression
> > + *	struct vring_packed_desc_event driver;
> > + *
> > + *	// Device Event Suppression
> > + *	struct vring_packed_desc_event device;
> > + * };
> > + */
> > +static inline void vring_init_packed(struct vring_packed *vr, unsigned int num,
> > +				     void *p, unsigned long align)
> > +{
> > +	vr->num = num;
> > +	vr->desc = p;
> > +	vr->driver = (void *)(((uintptr_t)p + sizeof(struct vring_packed_desc)
> > +		* num + align - 1) & ~(align - 1));
> 
> If we choose not to go uapi, maybe we can use ALIGN() macro here?

Okay.

> 
> > +	vr->device = vr->driver + 1;
> > +}
[...]
> > +/* Only available for split ring */
> >   const struct vring *virtqueue_get_vring(struct virtqueue *vq)
> >   {
> 
> A possible issue with this is:
> 
> After commit d4674240f31f8c4289abba07d64291c6ddce51bc ("KVM: s390:
> virtio-ccw revision 1 SET_VQ"). CCW tries to use
> virtqueue_get_avail()/virtqueue_get_used(). Looks like a bug either here or
> ccw code.

Do we still need to support:

include/linux/virtio.h
/*
 * Legacy accessors -- in almost all cases, these are the wrong functions
 * to use.
 */
static inline void *virtqueue_get_desc(struct virtqueue *vq)
{
        return virtqueue_get_vring(vq)->desc;
}
static inline void *virtqueue_get_avail(struct virtqueue *vq)
{
        return virtqueue_get_vring(vq)->avail;
}
static inline void *virtqueue_get_used(struct virtqueue *vq)
{
        return virtqueue_get_vring(vq)->used;
}

in packed ring?

If so, I think maybe it's better to expose them as
symbols and implement them in virtio_ring.c.

Best regards,
Tiwei Bie

^ permalink raw reply

* Re: [RFC v5 3/5] virtio_ring: add packed ring support
From: Tiwei Bie @ 2018-05-29  5:11 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann
In-Reply-To: <30bd7505-0ded-8584-e9ab-152756a667d6@redhat.com>

On Tue, May 29, 2018 at 11:18:57AM +0800, Jason Wang wrote:
> On 2018年05月22日 16:16, Tiwei Bie wrote:
[...]
> > +static void detach_buf_packed(struct vring_virtqueue *vq,
> > +			      unsigned int id, void **ctx)
> > +{
> > +	struct vring_desc_state_packed *state;
> > +	struct vring_packed_desc *desc;
> > +	unsigned int i;
> > +	int *next;
> > +
> > +	/* Clear data ptr. */
> > +	vq->desc_state_packed[id].data = NULL;
> > +
> > +	next = &id;
> > +	for (i = 0; i < vq->desc_state_packed[id].num; i++) {
> > +		state = &vq->desc_state_packed[*next];
> > +		vring_unmap_state_packed(vq, state);
> > +		next = &vq->desc_state_packed[*next].next;
> 
> Have a short discussion with Michael. It looks like the only thing we care
> is DMA address and length here.

The length tracked by desc_state_packed[] is also necessary
when INDIRECT is used and the buffers are writable.

> 
> So a thought is to avoid using desc_state_packed() is vring_use_dma_api() is
> false? Probably need another id allocator or just use desc_state_packed for
> free id list.

I think it's a good suggestion. Thanks!

I won't track the addr/len/flags for non-indirect descs
when vring_use_dma_api() returns false.

> 
> > +	}
> > +
> > +	vq->vq.num_free += vq->desc_state_packed[id].num;
> > +	*next = vq->free_head;
> 
> Using pointer seems not elegant and unnecessary here. You can just track
> next in integer I think?

It's possible to use integer, but will need one more var
to track `prev` (desc_state_packed[prev].next == next in
this case).

> 
> > +	vq->free_head = id;
> > +
> > +	if (vq->indirect) {
> > +		u32 len;
> > +
> > +		/* Free the indirect table, if any, now that it's unmapped. */
> > +		desc = vq->desc_state_packed[id].indir_desc;
> > +		if (!desc)
> > +			return;
> > +
> > +		len = vq->desc_state_packed[id].len;
> > +		for (i = 0; i < len / sizeof(struct vring_packed_desc); i++)
> > +			vring_unmap_desc_packed(vq, &desc[i]);
> > +
> > +		kfree(desc);
> > +		vq->desc_state_packed[id].indir_desc = NULL;
> > +	} else if (ctx) {
> > +		*ctx = vq->desc_state_packed[id].indir_desc;
> > +	}
> >   }
> >   static inline bool more_used_packed(const struct vring_virtqueue *vq)
> >   {
> > -	return false;
> > +	u16 last_used, flags;
> > +	u8 avail, used;
> > +
> > +	last_used = vq->last_used_idx;
> > +	flags = virtio16_to_cpu(vq->vq.vdev,
> > +				vq->vring_packed.desc[last_used].flags);
> > +	avail = !!(flags & VRING_DESC_F_AVAIL(1));
> > +	used = !!(flags & VRING_DESC_F_USED(1));
> > +
> > +	return avail == used && used == vq->used_wrap_counter;
> 
> Spec does not check avail == used? So there's probably a bug in either of
> the two places.
> 
> In what condition that avail != used but used == vq_used_wrap_counter? A
> buggy device or device set the two bits in two writes?

Currently, spec doesn't forbid devices to set the status
bits as: avail != used but used == vq_used_wrap_counter.
So I think driver cannot assume this won't happen.

> 
> >   }
[...]
> >   static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
> >   {
> > -	return 0;
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +	START_USE(vq);
> > +
> > +	/* We optimistically turn back on interrupts, then check if there was
> > +	 * more to do. */
> > +
> > +	if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > +		virtio_wmb(vq->weak_barriers);
> 
> Missing comments for the barrier.

Will add some comments.

> 
> > +		vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> > +		vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > +							vq->event_flags_shadow);
> > +	}
> > +
> > +	END_USE(vq);
> > +	return vq->last_used_idx;
> >   }
> >   static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
> >   {
> > -	return false;
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +	u8 avail, used;
> > +	u16 flags;
> > +
> > +	virtio_mb(vq->weak_barriers);
> > +	flags = virtio16_to_cpu(vq->vq.vdev,
> > +			vq->vring_packed.desc[last_used_idx].flags);
> > +	avail = !!(flags & VRING_DESC_F_AVAIL(1));
> > +	used = !!(flags & VRING_DESC_F_USED(1));
> > +
> > +	return avail == used && used == vq->used_wrap_counter;
> >   }
> >   static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> >   {
> > -	return false;
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +	START_USE(vq);
> > +
> > +	/* We optimistically turn back on interrupts, then check if there was
> > +	 * more to do. */
> > +
> > +	if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > +		virtio_wmb(vq->weak_barriers);
> 
> Need comments for the barrier.

Will add some comments.

Best regards,
Tiwei Bie

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox